# Wikiapi functions

## Agent Authentication

Initialize the agent with a user-agent string that includes contact information, so that the site operators can reach you if they need to communicate something to you.

In [1]:
import wikipediaapi
agent = wikipediaapi.Wikipedia('MyProjectName (merlin@example.com)', 'en')

## Page Object

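`agent.page(title)` returns a page object. The cell below prints several of its attributes, then lists the public names that `dir()` reports for it.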

In [18]:

page_py = agent.page('Python_(programming_language)')

print("fullurl:", page_py.fullurl)
print("canonicalurl:", page_py.canonicalurl)
print("pageid:", page_py.pageid)
print("displaytitle:", page_py.displaytitle)
print("talkid:", page_py.talkid)

print("Page - Exists: %s" % page_py.exists())
print("Page - Title: %s" % page_py.title)
print("Page - Summary: %s" % page_py.summary[:60])
print("Page - Language: %s" % page_py.language)
print("Page - Namespace: %s" % page_py.namespace)
print("Page - Sections: %s" % [section.title for section in page_py.sections])

print("\n\n")

for attribute in dir(page_py):
    if not attribute.startswith("_"):
        print(attribute)

fullurl: https://en.wikipedia.org/wiki/Python_(programming_language)
canonicalurl: https://en.wikipedia.org/wiki/Python_(programming_language)
pageid: 23862
displaytitle: Python (programming language)
talkid: 24235
Page - Exists: True
Page - Title: Python (programming language)
Page - Summary: Python is a high-level, general-purpose programming language
Page - Language: en
Page - Namespace: 0
Page - Sections: ['History', 'Design philosophy and features', 'Syntax and semantics', 'Programming examples', 'Libraries', 'Development environments', 'Implementations', 'Development', 'API documentation generators', 'Naming', 'Popularity', 'Uses', 'Languages influenced by Python', 'See also', 'References', 'Further reading', 'External links']



ATTRIBUTES_MAPPING
backlinks
categories
categorymembers
exists
langlinks
language
links
namespace
section_by_title
sections
sections_by_title
summary
text
title
wiki
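The `dir()` listing above includes `ATTRIBUTES_MAPPING` but omits `fullurl`, `pageid`, and the other attributes printed earlier. One plausible explanation is that those attributes are resolved on demand through `__getattr__` rather than stored as ordinary attributes. The toy class below only illustrates that general Python pattern. `LazyPage`, `_fetch`, `fetch_count`, and the example URL are hypothetical names invented for this sketch; it is not the library's code.

```python
class LazyPage:
    """Toy class (hypothetical): resolves some attributes only on first access."""

    # Names that are looked up on demand instead of being set in __init__.
    ATTRIBUTES_MAPPING = {"pageid": "info", "fullurl": "info"}

    def __init__(self, title):
        self.title = title
        self._attributes = {}
        self.fetch_count = 0

    def _fetch(self, call):
        # Stand-in for a network request; returns fixed illustrative values.
        self.fetch_count += 1
        self._attributes.update(
            {"pageid": 1, "fullurl": "https://example.org/" + self.title}
        )

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails.
        if name not in type(self).ATTRIBUTES_MAPPING:
            raise AttributeError(name)
        if name not in self._attributes:
            self._fetch(type(self).ATTRIBUTES_MAPPING[name])
        return self._attributes[name]


page = LazyPage("Example")
public_names = [n for n in dir(page) if not n.startswith("_")]
print(public_names)      # lazily resolved names are absent from dir()
print(page.fullurl)      # first access triggers the stand-in fetch
print(page.pageid)       # already cached, so no second fetch
print(page.fetch_count)
```

In this sketch, `dir()` reports only the class attribute and the names set in `__init__`, which mirrors the shape of the listing above.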


## Get the categories of a page

In [20]:
def print_categories(page):
    categories = page.categories
    for title in sorted(categories.keys()):
        print("%s: %s" % (title, categories[title]))


print("Categories")
print_categories(page_py)

Categories
Category:All articles containing potentially dated statements: Category:All articles containing potentially dated statements (id: ??, ns: 14)
Category:Articles containing potentially dated statements from 2008: Category:Articles containing potentially dated statements from 2008 (id: ??, ns: 14)
Category:Articles containing potentially dated statements from 2020: Category:Articles containing potentially dated statements from 2020 (id: ??, ns: 14)
Category:Articles containing potentially dated statements from December 2022: Category:Articles containing potentially dated statements from December 2022 (id: ??, ns: 14)
Category:Articles containing potentially dated statements from June 2023: Category:Articles containing potentially dated statements from June 2023 (id: ??, ns: 14)
Category:Articles containing potentially dated statements from March 2024: Category:Articles containing potentially dated statements from March 2024 (id: ??, ns: 14)
Category:Articles containing potentia

## Get all category members

In [4]:
def print_categorymembers(categorymembers, level=0, max_level=1):
    for c in categorymembers.values():
        print("%s: %s (ns: %d)" % ("*" * (level + 1), c.title, c.ns))
        if c.ns == wikipediaapi.Namespace.CATEGORY and level < max_level:
            print_categorymembers(c.categorymembers, level=level + 1, max_level=max_level)

agent_es = wikipediaapi.Wikipedia('MyProjectName (merlin@example.com)', 'es')
cat = agent_es.page("Category:Ciencias de la vida")
print("Category members: Category:Ciencias de la vida")


print_categorymembers(cat.categorymembers, level=0, max_level=3)

Category members: Category:Ciencias de la vida
*: Ciencias de la vida (ns: 0)
*: Euroliga de Ciencias de la Vida (ns: 0)
*: Categoría:Bioingeniería (ns: 14)
**: Agricultura molecular (ns: 0)
**: Amortiguador reológico (ns: 0)
**: Bioastronáutica (ns: 0)
**: Biocibernética (ns: 0)
**: Biocompatibilidad (ns: 0)
**: Biofabricación (ns: 0)
**: Bioimpresión 3D (ns: 0)
**: Bioinformática (ns: 0)
**: BIOMED (ns: 0)
**: Biorreactor (ns: 0)
**: Cerámica piezoeléctrica (ns: 0)
**: E-NABLE (ns: 0)
**: Edición de calidad (ns: 0)
**: Fed-batch (ns: 0)
**: Gene targeting (ns: 0)
**: Hígado artificial (ns: 0)
**: Ingeniería biológica (ns: 0)
**: Ingeniería biónica (ns: 0)
**: Ingeniería de tejidos (ns: 0)
**: Ingeniería genética humana (ns: 0)
**: Instituto de Investigación Biomédica de Lérida (ns: 0)
**: Instituto de Investigación Biomédica (ns: 0)
**: Instituto Maimónides de Investigación Biomédica de Córdoba (ns: 0)
**: Instituto Nacional de Bioinformática (ns: 0)
**: Material biocompatible (ns: 0

KeyboardInterrupt: 

## Get all articles in a category and its subcategories

In [62]:
import wikipediaapi
import concurrent.futures
from tqdm import tqdm

def get_subcategories(category, depth, max_depth):
    if depth > max_depth:
        return []
    subcategories = []
    for subcategory in category.categorymembers.values():
        if subcategory.ns == wikipediaapi.Namespace.CATEGORY:
            subcategories.append(subcategory)
            subcategories.extend(get_subcategories(subcategory, depth + 1, max_depth))
    return subcategories

def get_articles(category, articles_set):
    articles = []
    for member in category.categorymembers.values():
        if member.ns == wikipediaapi.Namespace.MAIN:
            if member.title not in articles_set:
                articles.append(member)
                articles_set.add(member.title)
    return articles

def fetch_subcategories(subcategory):
    return get_subcategories(subcategory, 1, max_depth)

def fetch_articles(subcategory):
    return get_articles(subcategory, articles_set)

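# Use a descriptive user agent with contact information, as in the authentication section.
wiki_wiki = wikipediaapi.Wikipedia('MyProjectName (merlin@example.com)', 'es')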

category_name = "Categoría:Ciencias de la vida"
category = wiki_wiki.page(category_name)

if not category.exists():
    print(f"Category '{category_name}' does not exist.")

max_depth = 5
print(f"Get subcategories from Category:{category}")

# Using ThreadPoolExecutor to parallelize the fetching of subcategories
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(fetch_subcategories, subcategory) for subcategory in category.categorymembers.values() if subcategory.ns == wikipediaapi.Namespace.CATEGORY]
    
    subcategories = []
    for future in tqdm(concurrent.futures.as_completed(futures), total=len(futures)):
        subcategories.extend(future.result())

print(f"Get articles from Category:{category}")
articles_set = set()
articles = get_articles(category, articles_set)

print(f"Subcategories: {len(subcategories)}")
print(f"Articles: {len(articles)}")

# Using ThreadPoolExecutor to parallelize the fetching of articles from subcategories
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(fetch_articles, subcategory) for subcategory in subcategories]

    for future in tqdm(concurrent.futures.as_completed(futures), total=len(futures)):
        articles.extend(future.result())


Get subcategories from Category:Categoría:Ciencias de la vida (id: 4591756, ns: 14)


100%|██████████| 7/7 [49:10<00:00, 421.50s/it]


Get articles from Category:Categoría:Ciencias de la vida (id: 4591756, ns: 14)
Subcategories: 66325
Articles: 2


100%|██████████| 66325/66325 [15:24<00:00, 71.74it/s]  
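A category can be reachable through more than one parent, so a recursive walk like `get_subcategories` may return the same subcategory several times and may revisit categories it has already expanded. Whether that accounts for the count above is not verified here. The sketch below shows one way to visit each category title at most once, using a breadth-first walk with a `seen` set. `FakePage` is a hypothetical stand-in that mimics only the `title`, `ns`, and `categorymembers` attributes used in this notebook, so the example runs without network access; the namespace numbers 14 and 0 are taken from the outputs above.

```python
from collections import deque
from dataclasses import dataclass, field

CATEGORY_NS = 14  # namespace number printed for categories in the output above
MAIN_NS = 0       # namespace number printed for articles


@dataclass
class FakePage:
    """Hypothetical stand-in for a page object (title, ns, categorymembers only)."""
    title: str
    ns: int
    categorymembers: dict = field(default_factory=dict)


def collect_unique_subcategories(root, max_depth):
    """Breadth-first walk that records each subcategory title at most once."""
    seen = {root.title}
    found = []
    queue = deque([(root, 1)])
    while queue:
        category, depth = queue.popleft()
        if depth > max_depth:
            continue
        for member in category.categorymembers.values():
            if member.ns == CATEGORY_NS and member.title not in seen:
                seen.add(member.title)
                found.append(member)
                queue.append((member, depth + 1))
    return found


# Small illustrative graph: C is shared by A and B, and links back to the root.
shared = FakePage("Cat:C", CATEGORY_NS)
a = FakePage("Cat:A", CATEGORY_NS, {"Cat:C": shared})
b = FakePage("Cat:B", CATEGORY_NS, {"Cat:C": shared})
root = FakePage(
    "Cat:Root",
    CATEGORY_NS,
    {"Cat:A": a, "Cat:B": b, "Article": FakePage("Article", MAIN_NS)},
)
shared.categorymembers["Cat:Root"] = root  # cycle back to the start

unique = collect_unique_subcategories(root, max_depth=5)
print([c.title for c in unique])
```

With real page objects, each `categorymembers` access still triggers a request, so the main saving would come from not expanding the same category twice.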


In [61]:
import concurrent.futures
from tqdm import tqdm

# Assuming the definitions for get_subcategories and get_articles are already available

max_depth = 5
print(f"Get subcategories from Category:{category}")
subcategories = get_subcategories(category, 4, max_depth)

print(f"Get articles from Category:{category}")
articles_set = set()
articles = get_articles(category, articles_set)

print(f"Subcategories: {len(subcategories)}")
print(f"Articles: {len(articles)}")

def fetch_articles(subcategory):
    return get_articles(subcategory, articles_set)

# Using ThreadPoolExecutor to parallelize the fetching of articles from subcategories
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(fetch_articles, subcategory) for subcategory in subcategories]

    for future in tqdm(concurrent.futures.as_completed(futures), total=len(futures)):
        articles.extend(future.result())


Get subcategories from Category:Categoría:Ciencias de la vida (id: 4591756, ns: 14)


KeyboardInterrupt: 

[Ciencias de la vida (id: ??, ns: 0),
 Euroliga de Ciencias de la Vida (id: ??, ns: 0),
 Agricultura molecular (id: ??, ns: 0),
 Amortiguador reológico (id: ??, ns: 0),
 Bioastronáutica (id: ??, ns: 0),
 Biocibernética (id: ??, ns: 0),
 Biocompatibilidad (id: ??, ns: 0),
 Biofabricación (id: ??, ns: 0),
 Bioimpresión 3D (id: ??, ns: 0),
 Bioinformática (id: ??, ns: 0),
 BIOMED (id: ??, ns: 0),
 Biorreactor (id: ??, ns: 0),
 Cerámica piezoeléctrica (id: ??, ns: 0),
 E-NABLE (id: ??, ns: 0),
 Edición de calidad (id: ??, ns: 0),
 Fed-batch (id: ??, ns: 0),
 Gene targeting (id: ??, ns: 0),
 Hígado artificial (id: ??, ns: 0),
 Ingeniería biológica (id: ??, ns: 0),
 Ingeniería biónica (id: ??, ns: 0),
 Ingeniería de tejidos (id: ??, ns: 0),
 Ingeniería genética humana (id: ??, ns: 0),
 Instituto de Investigación Biomédica de Lérida (id: ??, ns: 0),
 Instituto de Investigación Biomédica (id: ??, ns: 0),
 Instituto Maimónides de Investigación Biomédica de Córdoba (id: ??, ns: 0),
 Instituto Na

In [56]:
len(subcategories)

505

In [57]:
len(articles)

11881
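In the threaded cells above, every worker checks and updates the shared `articles_set`. A check followed by an add is not a single atomic step, so two threads could in principle both add the same title. A minimal sketch of one alternative, assuming workers only return their results and the main thread alone deduplicates, is shown below. `titles_in` and `fake_subcategories` are hypothetical placeholders for the real per-subcategory fetch and its results.

```python
import concurrent.futures


def titles_in(members):
    """Hypothetical worker: returns one subcategory's titles, touching no shared state."""
    return list(members)


# Hypothetical data standing in for the article titles of three subcategories.
fake_subcategories = [
    ["Bioinformática", "Biorreactor"],
    ["Biorreactor", "Fed-batch"],
    ["Fed-batch", "Gene targeting"],
]

unique_titles = set()
ordered_titles = []
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(titles_in, members) for members in fake_subcategories]
    for future in concurrent.futures.as_completed(futures):
        # Deduplication happens only here, in the main thread.
        for title in future.result():
            if title not in unique_titles:
                unique_titles.add(title)
                ordered_titles.append(title)

print(sorted(ordered_titles))
```

The order of `ordered_titles` depends on which future completes first, so the sketch sorts before printing.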

In [None]:
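import wikipediaapi

# Sequential version: same helpers as above, without the thread pool.
def get_subcategories(category, depth, max_depth):
    if depth > max_depth:
        return []
    subcategories = []
    for subcategory in category.categorymembers.values():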
        if subcategory.ns == wikipediaapi.Namespace.CATEGORY:
            subcategories.append(subcategory)
            subcategories.extend(get_subcategories(subcategory, depth + 1, max_depth))
    return subcategories

def get_articles(category, articles_set):
    articles = []
    for member in category.categorymembers.values():
        if member.ns == wikipediaapi.Namespace.MAIN:
            if member.title not in articles_set:
                articles.append(member)
                articles_set.add(member.title)
    return articles

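# Use a descriptive user agent with contact information, as in the authentication section.
wiki_wiki = wikipediaapi.Wikipedia('MyProjectName (merlin@example.com)', 'es')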

category_name = "Categoría:Ciencias de la vida"
category = wiki_wiki.page(category_name)

if not category.exists():
    print(f"Category '{category_name}' does not exist.")


max_depth = 5
subcategories = get_subcategories(category, 4, max_depth)

articles_set = set()
articles = get_articles(category, articles_set)

for subcategory in subcategories:
    articles.extend(get_articles(subcategory, articles_set))

for article in articles:
    print(article.title, article.fullurl)

## Get language links

In [19]:
def print_langlinks(page):
    langlinks = page.langlinks
    for k in sorted(langlinks.keys()):
        v = langlinks[k]
        print("%s: %s - %s: %s" % (k, v.language, v.title, v.fullurl))

print_langlinks(page_py)

af: af - Python (programmeertaal): https://af.wikipedia.org/wiki/Python_(programmeertaal)
als: als - Python (Programmiersprache): https://als.wikipedia.org/wiki/Python_(Programmiersprache)
an: an - Python: https://an.wikipedia.org/wiki/Python
ar: ar - بايثون (لغة برمجة): https://ar.wikipedia.org/wiki/%D8%A8%D8%A7%D9%8A%D8%AB%D9%88%D9%86_(%D9%84%D8%BA%D8%A9_%D8%A8%D8%B1%D9%85%D8%AC%D8%A9)
as: as - পাইথন: https://as.wikipedia.org/wiki/%E0%A6%AA%E0%A6%BE%E0%A6%87%E0%A6%A5%E0%A6%A8
ast: ast - Python: https://ast.wikipedia.org/wiki/Python
az: az - Python (proqramlaşdırma dili): https://az.wikipedia.org/wiki/Python_(proqramla%C5%9Fd%C4%B1rma_dili)
azb: azb - پایتون: https://azb.wikipedia.org/wiki/%D9%BE%D8%A7%DB%8C%D8%AA%D9%88%D9%86
ban: ban - Python: https://ban.wikipedia.org/wiki/Python
be: be - Python (мова праграмавання): https://be.wikipedia.org/wiki/Python_(%D0%BC%D0%BE%D0%B2%D0%B0_%D0%BF%D1%80%D0%B0%D0%B3%D1%80%D0%B0%D0%BC%D0%B0%D0%B2%D0%B0%D0%BD%D0%BD%D1%8F)
bg: bg - Python: https://