# Return a list of names from Wikipedia

The input for 'Return_death_dates' is a list of names. I need to query the Wikipedia API and return a list of names. Ideally this should be a list of all people on wikipedia. Extra desirable information is profession (i.e., muscician), music genre, success.

## Use the Wikipedia API

In [1]:
import wikipediaapi

wiki_en = wikipediaapi.Wikipedia('en')  # call an instance of english

## Explore .categories

In [18]:
cat_Musicians_page = wiki_en.page("Category:Musicians")

f'Using the .page() method on a Category returns a {type(cat_Musicians_page)}.'

"Using the .page() method on a Category returns a <class 'wikipediaapi.WikipediaPage'>."

In [19]:
cat_Musicians_members = cat_Musicians_page.categorymembers

f'We can then use the attribute .categorymembers to return a {type(cat_Musicians_members)}'

"We can then use the attribute .categorymembers to return a <class 'dict'>"

In [21]:
length = len(list(cat_Musicians_members.keys())) # number of keys in dict

import random

onesample = random.choice(list(cat_Musicians_members.items()))

f'The dictionary has {length} elements of the following form: {onesample}'

"The dictionary has 65 elements of the following form: ('Category:Singers', Category:Singers (id: ??, ns: 14))"

Of note is the tuple that includes 'ns'. This tells us that there are 14 more levels that we can dig.

## Define a function to return a dictionary of category members

In [2]:
def wiki_cat_dictionary(wiki_Category):
    """
    Using a category input, return a dictionary.
    Dictionary contains category members
    
    category syntax == "Category:Musicians"
    """
    wiki_en = wikipediaapi.Wikipedia('en')
    
    return wiki_en.page(wiki_Category).categorymembers

This includes base members (ns = 0) and category members (ns > 0)

In [292]:
wiki_cat_dictionary('Category:Musicians')

{'Musician': Musician (id: ??, ns: 0),
 'Songster': Songster (id: ??, ns: 0),
 'Troubadour': Troubadour (id: ??, ns: 0),
 'Virtuoso': Virtuoso (id: ??, ns: 0),
 'Category:Musicians by band': Category:Musicians by band (id: ??, ns: 14),
 'Category:Musicians by century': Category:Musicians by century (id: ??, ns: 14),
 'Category:Musicians by genre': Category:Musicians by genre (id: ??, ns: 14),
 'Category:Musicians by geographical categorization': Category:Musicians by geographical categorization (id: ??, ns: 14),
 'Category:Musicians by instrument': Category:Musicians by instrument (id: ??, ns: 14),
 'Category:Artists by record label': Category:Artists by record label (id: ??, ns: 14),
 'Category:Musicians of Indian descent': Category:Musicians of Indian descent (id: ??, ns: 14),
 'Category:Musicians of Iranian descent': Category:Musicians of Iranian descent (id: ??, ns: 14),
 'Category:Lists of musicians': Category:Lists of musicians (id: ??, ns: 14),
 'Category:Musical groups': Catego

This includes only category members (ns > 0)

In [285]:
for c in wiki_cat_dictionary('Category:LGBT musicians by nationality').values():
    if c.title[0:8] == 'Category':
        print(c.title)

Category:LGBT musicians from Algeria
Category:LGBT musicians from Argentina
Category:LGBT musicians from Australia
Category:LGBT musicians from Belgium
Category:LGBT musicians from Brazil
Category:LGBT musicians from Bulgaria
Category:LGBT musicians from Canada
Category:LGBT musicians from Chile
Category:LGBT musicians from China
Category:LGBT musicians from Cuba
Category:LGBT musicians from Denmark
Category:LGBT musicians from Finland
Category:LGBT musicians from France
Category:LGBT musicians from Germany
Category:LGBT musicians from Hong Kong
Category:LGBT musicians from Iceland
Category:LGBT musicians from Ireland
Category:LGBT musicians from Israel
Category:LGBT musicians from Italy
Category:LGBT musicians from Japan
Category:LGBT musicians from the Philippines
Category:LGBT musicians from the Netherlands
Category:LGBT musicians from New Zealand
Category:LGBT musicians from Norway
Category:LGBT musicians from Poland
Category:LGBT musicians from Russia
Category:LGBT musicians from 

## Define a function to return all base category members

This will reflect musicians if we choose appropriate categories to start with.

In [277]:
def list_bottomcategorymembers(categorymembers, return_list, parent_list):
    """
    Using a dictionary of wikipedia page category members, return a list of all bottom category members (ns == 0)
    Provide an empty list that will be populated with bottom category members (return_list)
    Provide an empty list that will be populated with parent categories (parent_list)
    """
           
    seen = set(return_list)  # functionality to prevent duplicates; code source: https://stackoverflow.com/questions/19834806/is-there-a-more-pythonic-way-to-prevent-adding-a-duplicate-to-a-list
    
    for c in categorymembers.values():
        
        # check for duplicates
        if c.title not in seen:
               
            if c.ns == 0:
                return_list.append(c.title)

            else:
                # sometimes category members do not begin with 'Category'; will result in read time error
                if c.title[0:8] != 'Category':
                    continue

                # some artists have categories; not interested in digging deeper
                if c.title == parent_list[-1]:
                    continue

                # some categories are empty; will result in read time error
                if wiki_cat_dictionary(c.title) == {}:
                    continue

                # keep track of previous categories - helps with debugging
                parent_list.append(c.title)
                
                # keep digging down
                list_bottomcategorymembers(c.categorymembers, return_list, parent_list)
            
    return return_list

## Execute

Define an empty list 'return_list' this will be filled with base category members. The parent list will keep track of of categories. Handy for troubleshooting and avoiding musicians that have non-base categories (e.g, Bruce Springsteen).

In [290]:
return_list = []
parent_list = [-99]

In [291]:
male_musicians = list_bottomcategorymembers(wiki_cat_dictionary('Category:Male musicians by nationality'), return_list, parent_list)

In [293]:
female_musicians = list_bottomcategorymembers(wiki_cat_dictionary('Category:Female musicians by nationality'), return_list, parent_list)

In [294]:
lgbt_musicians = list_bottomcategorymembers(wiki_cat_dictionary('Category:LGBT musicians by nationality'), return_list, parent_list)

In [297]:
f'The Wikipedia API returned {len(lgbt_musicians)} musicians.'

'The Wikipedia API returned 76326 musicians.'

The following can be used to inspect the results.

In [220]:
return_list[-20:]

['Brianna Thomas',
 'Tracie Thoms',
 'Amelia Tilghman',
 'Linda Tillery',
 'Barbara Johnson Tucker',
 'Lisa Tucker (singer)',
 'Leslie Uggams',
 'Stephanie Umoh',
 'Charenee Wade',
 'Ella Washington',
 'Rose Weaver',
 'Jane White',
 'Lillias White',
 'Terri White',
 'Alyson Williams',
 'Thomasina Winslow',
 'Carol Woods',
 'Renn Woods',
 'Norma Jean Wright',
 'Wynter Gordon']

In [282]:
parent_list[-20:]

['Category:Ugandan female musicians',
 'Category:Ugandan female singers',
 'Category:Ugandan girl groups',
 'Category:Ukrainian female musicians',
 'Category:Ukrainian female singers',
 'Category:Ukrainian contraltos',
 'Category:Ukrainian mezzo-sopranos',
 'Category:Ukrainian sopranos',
 'Category:Ukrainian operatic sopranos',
 'Category:Ukrainian girl groups',
 'Category:Ukrainian women pianists',
 'Category:Venezuelan female musicians',
 'Category:Venezuelan female composers',
 'Category:Venezuelan female classical composers',
 'Category:Venezuelan female singers',
 'Category:Venezuelan female singer-songwriters',
 'Category:Venezuelan women pianists',
 'Category:Vietnamese female musicians',
 'Category:Vietnamese female singers',
 'Category:Vietnamese female rappers']