# Accessing the Wikipedia API
This notebook pulls some information from the Wikipedia API. This API is nice because it doesn't require authentication. (The Twitter API does require authentication. That's a necessary process to go through, but it takes some work.)

In [None]:
import requests

wikipedia_api_url = "https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmlimit=10"

We can start by building a simple query that gets 10 people born in 1973 (that's the `cmlimit=10` at the end of the URL above). We will use the delightful and amazing [`requests`](http://docs.python-requests.org/en/master/) library in Python. The format of the URL is based on a bunch of reading about the [Wikipedia API](https://www.mediawiki.org/wiki/API:Categorymembers) and trial and error. And error.

In [None]:
full_url = wikipedia_api_url + "&cmtitle=Category:1973_births"

print(full_url)

r = requests.get(full_url)

Feel free to click on the link above. You'll see a `pprint` version of what was returned. Thanks Wikipedia!
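
If you're curious how a URL like this gets put together, here is a minimal sketch that builds the same query string from a dictionary instead of gluing strings together. It uses `urllib.parse` from the standard library, which is my choice for illustration (the notebook itself uses `requests`), and the names `base`, `params`, and `built_url` are made up for this example. Nothing here touches the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Same endpoint and parameters as the URL we built by hand above.
base = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "format": "json",
    "list": "categorymembers",
    "cmlimit": 10,
    "cmtitle": "Category:1973_births",
}

# urlencode turns the dictionary into key=value pairs joined by '&'
# and percent-encodes special characters such as ':'.
built_url = base + "?" + urlencode(params)
print(built_url)

# Going the other way: pull the parameters back out of the URL.
parsed = parse_qs(urlsplit(built_url).query)
print(parsed["cmtitle"])
```

The colon in `Category:1973_births` comes out percent-encoded in `built_url`; decoding the query string gives back the original value.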

In this next cell, type `r.` + tab and look at all the attributes and methods available on the response object.

In [None]:
r.

One of the most useful is `r.json()`.

In [None]:
r.json()

Compare these results to the entry: https://en.wikipedia.org/wiki/Category:1973_births.

JSON objects look a lot like Python dictionaries, and `r.json()` hands the result back as one. In this case, we've got three main keys: `batchcomplete`, `continue`, and `query`.

In [None]:
for item in r.json() :
    print(item)

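`batchcomplete` signals that all the data for the current batch of items has been returned. `continue` holds the values we send back with the next request to pick up where this one left off, which we need because we can't request more than 500 items at once. And `query` has the results.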
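
Here's a small sketch of how the `continue` values feed into the next request. The response dictionary below is made up: it is shaped like what the API returns, but every value is a placeholder, so nothing here hits the network. The names `sample_result`, `req`, and `next_req` are just for this illustration.

```python
# Hypothetical response with placeholder values (not real API output).
sample_result = {
    "batchcomplete": "",
    "continue": {"cmcontinue": "PLACEHOLDER_TOKEN", "continue": "-||"},
    "query": {
        "categorymembers": [{"pageid": 0, "ns": 0, "title": "Placeholder Title"}]
    },
}

# The parameters of the request we (pretend to have) just sent.
req = {
    "action": "query",
    "format": "json",
    "list": "categorymembers",
    "cmlimit": 10,
    "cmtitle": "Category:1973_births",
}

if "continue" in sample_result:
    # Merge the continuation values into a copy of the original parameters.
    next_req = {**req, **sample_result["continue"]}
else:
    # No 'continue' key means there is nothing more to fetch.
    next_req = None

print(next_req)
```

The next request is the original one plus whatever was in `continue`; the original `req` is left untouched.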

In [None]:
for item in r.json()['query']['categorymembers'] :
    print(item)

Now let's build a list of everyone born in 1973. I've added a way to get out using an iteration counter. Change the `iteration > n` line (currently `iteration > 300`) to get a different number of pages of results; something like 50 should be enough to get all the names.

In [None]:
# Let's build up our request in a more sustainable way
req = {'action':'query',
       'format':'json',
       'list':'categorymembers',
       'cmlimit':500, # move the limit up to the max we can do.
       'cmtitle':'Category:1973_births'}

last_continue = {} # used to keep track of how far we've gone. 
iteration = 1
pages = 0

names = []

while True :
    # Modify it with the values returned in the 'continue' section of the last result.
    req.update(last_continue)
    
    # Call API
    result = requests.get('https://en.wikipedia.org/w/api.php', params=req).json() 
    
    pages += 1
    
    # Grab the names
    for item in result['query']['categorymembers'] :
        names.append(item['title'])
    
    # keep track of our iteration so we can exit if this runs forever
    iteration += 1
    
    # Can we get out?
    if 'continue' not in result :
        break
    else :
        last_continue = result['continue']
    
    if iteration > 300 :
        # it's useful to have a way out of while statements
        break 

print("We pulled {} pages".format(pages))

Let's talk through the above code. 
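
One way to talk through the loop is to restate it as a function. This is a sketch, not a replacement for the cell above: `collect_titles` and `fake_fetch` are hypothetical names, and `fake_fetch` is a stand-in that returns two canned pages of placeholder titles so the logic can be followed without calling Wikipedia.

```python
def collect_titles(fetch, base_req, max_pages=300):
    """Follow 'continue' values until none come back or max_pages is reached.

    `fetch` is any function that takes a dict of parameters and returns
    the decoded JSON response as a dict.
    """
    titles = []
    req = dict(base_req)  # copy so the caller's dict is not modified
    pages = 0
    while pages < max_pages:  # the way out if something goes wrong
        result = fetch(req)
        pages += 1
        # Grab the titles from this page of results.
        titles.extend(item["title"] for item in result["query"]["categorymembers"])
        if "continue" not in result:
            break  # no more pages
        # Carry the continuation values into the next request.
        req.update(result["continue"])
    return titles, pages


def fake_fetch(req):
    """Hypothetical stand-in for the API: two pages of placeholder titles."""
    if "cmcontinue" not in req:
        return {
            "continue": {"cmcontinue": "page2", "continue": "-||"},
            "query": {"categorymembers": [{"title": "Placeholder A"},
                                          {"title": "Placeholder B"}]},
        }
    return {
        "batchcomplete": "",
        "query": {"categorymembers": [{"title": "Placeholder C"}]},
    }


titles, pages = collect_titles(fake_fetch, {"list": "categorymembers"})
print(titles, pages)
```

To run it against the real API, pass a function that wraps the same call used in the cell above, for example `lambda req: requests.get('https://en.wikipedia.org/w/api.php', params=req).json()`.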

To see what's going on, I'll print the first 10 names and the last 10 names.

In [None]:
print(names[:10])
print(names[-10:])

Now your turn. Pick a year, pull all the names for people born in that year, and count up the most common first names and last names. 
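
If you want a hint for the counting part, here is one possible starting point. It assumes the first word of a title is the first name and the last word is the last name, and it drops a trailing parenthetical such as `(singer)`. Those are simplifying assumptions that won't hold for every title. `split_name` is a hypothetical helper, and `sample_names` is a made-up list of placeholder names; swap in your own `names` list.

```python
import re
from collections import Counter


def split_name(title):
    """Return (first, last) for a page title, or None if the title is empty.

    Assumes first word = first name and last word = last name.
    Single-word titles get a last name of None.
    """
    # Drop a trailing disambiguator such as " (singer)".
    cleaned = re.sub(r"\s*\([^)]*\)\s*$", "", title).strip()
    parts = cleaned.split()
    if not parts:
        return None
    last = parts[-1] if len(parts) > 1 else None
    return parts[0], last


# Made-up placeholder names, not real API results.
sample_names = ["Alex Example", "Alex Placeholder (singer)", "Sam Example"]

first_counts = Counter()
last_counts = Counter()
for title in sample_names:
    parsed_name = split_name(title)
    if parsed_name is None:
        continue
    first, last = parsed_name
    first_counts[first] += 1
    if last is not None:
        last_counts[last] += 1

print(first_counts.most_common(5))
print(last_counts.most_common(5))
```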