# A Basic Introduction to using APIs for Acquiring Data
---

First, I load the two libraries I'll use throughout:

In [None]:
# Import two libraries
import requests # this handles getting data from the web
import pandas as pd # this is pandas, imported under the short name 'pd'

In [None]:
import sys
sys.path.append('/home/alistair/.keys')
import alistair_keys

## Open Notify (who's in Space!)

In [None]:
# Set the variable to the output of the requests.get() function,
# pointed at a specific URL (the API endpoint)
resp = requests.get('http://api.open-notify.org/astros.json')
# This is an if statement
if resp.status_code==200 : # the condition here is a 200 status code (something like a 404 or 403 would be an error!)
    print("Good GET") # everything that's indented is run only if the condition is true
received_dict=resp.json()
# The value of the last line in a cell is displayed in the notebook
received_dict

* Explaining the output: the braces `{ }` tell us this is a `dict` data type (a dictionary). 
* Each entry is a `key: value` pair: a `'key'` that describes the entry, and a `value` associated with that key. 
   * So the `'message'` key has value `'success'`. 
   * The `'people'` key has a value which is a list `[ ]`
   * Each entry in the `people` list is another dictionary with two keys 
      * A `craft` for the name of the vessel
      * A `name` for the astronaut
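To see the same structure without a network call, here is a minimal sketch that parses a hand-written JSON string shaped like the Open Notify response (the names and crafts are made-up placeholders, not real API output). The standard library's `json.loads` does essentially what `resp.json()` does for us:

```python
import json

# Hypothetical payload shaped like the astros.json response (placeholder values)
raw = '''{"message": "success",
          "number": 2,
          "people": [{"craft": "ISS", "name": "Astronaut A"},
                     {"craft": "Tiangong", "name": "Astronaut B"}]}'''

data = json.loads(raw)                 # JSON object -> dict, JSON array -> list
print(type(data).__name__)             # dict
print(type(data["people"]).__name__)   # list

# Walk the nested structure: dict -> list -> dict
for person in data["people"]:
    print(person["name"], "is aboard", person["craft"])
```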

To extract the `people` entry, we index the dictionary with its key:

In [None]:
spacePeople=received_dict['people']
spacePeople

Similarly, we can access any entry in a list by its numerical index (where the first entry is 0):

In [None]:
spacePeople[0]['craft']

In [None]:
spacePeople[6]

Often in Python it's good to figure out how long things are, or what data type they are:

In [None]:
print(  len( spacePeople )  )  # len() tells me the length of a list-like object
type(spacePeople)  # type() confirms this is a list data type

Each element of this list is a `dict` (a dictionary) holding the key-value pairs for one person.

Here I'm just going to extract the type for each entry in turn and print it to output:

In [None]:
for entry in spacePeople:
    print(type(entry ))

Thankfully, it's easy to bring lists of dictionaries like this into pandas as a dataframe:

In [None]:
df=pd.DataFrame(resp.json()['people'])
df
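pandas builds one row per dictionary and one column per key; if a dictionary lacks a key, that cell is filled with `NaN`. A small sketch with hypothetical records (placeholder names) where one entry is missing its `craft`:

```python
import pandas as pd

# Hypothetical records; the second one lacks a "craft" key
records = [
    {"craft": "ISS", "name": "Astronaut A"},
    {"name": "Astronaut B"},
    {"craft": "ISS", "name": "Astronaut C"},
]

df_demo = pd.DataFrame(records)        # one row per dict, one column per key
print(df_demo)
print(df_demo["craft"].isna().sum())   # number of rows missing a craft
```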

DataFrames in pandas are similar to data frames in R, where each column can be a different data type. 

Thankfully, pandas also has lots of built-in helper methods that operate on the columns, such as `.unique()`:

In [None]:
for spacecraft in df['craft'].unique(): #( for each unique entry in the column)
    print(spacecraft) # print out its name
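Beyond `.unique()`, the `.value_counts()` method tallies how many rows share each value. A sketch on a hypothetical crew list (placeholder names, not current data):

```python
import pandas as pd

# Hypothetical crew list (placeholder names)
crew = pd.DataFrame({
    "craft": ["ISS", "ISS", "Tiangong", "ISS"],
    "name":  ["A", "B", "C", "D"],
})

print(crew["craft"].unique())           # distinct values, in order of first appearance
counts = crew["craft"].value_counts()   # how many people per craft, largest first
print(counts)
```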

Because you may want to get data via Python, but analyze it in R, it is often convenient to export the dataframe you've constructed to a csv file.

In [None]:
df.to_csv('CurrentSpacePeople.csv',columns=["craft","name"])
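By default `to_csv` also writes the dataframe's row index as an unnamed first column; passing `index=False` leaves it out, which is often tidier when reading the file into R. A minimal round-trip sketch on placeholder data, writing to a temporary directory so nothing is left behind:

```python
import os
import tempfile
import pandas as pd

df_out = pd.DataFrame({"craft": ["ISS", "Tiangong"], "name": ["A", "B"]})  # placeholder data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "CurrentSpacePeople.csv")
    df_out.to_csv(path, index=False)   # omit the row index column
    df_back = pd.read_csv(path)        # read it back to check

print(df_back)
```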

## FBI Most Wanted

Anyone who makes an API available will normally provide some documentation about how to make requests. For the FBI Most Wanted API, the documentation is fairly minimal, but it is available [here](https://www.fbi.gov/wanted/api).

I read this, and will use it to access their data.

In [None]:
# Make an empty list
resp_list=[]
# Add the first page of results to the list 
resp_list.append( requests.get('https://api.fbi.gov/wanted/v1/list'))
# get the status code for the page (remember 200 codes are good!)
print(resp_list[0].status_code) # note that in python, lists start at element zero!

In [None]:
type(resp_list[0])
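Status codes come in families: the 200s mean success, the 400s mean a problem with the request (e.g. 404 Not Found, 403 Forbidden), and the 500s mean a server-side error. A small sketch using the standard library's `http.HTTPStatus` to label a code; the helper name `describe_status` is just illustrative:

```python
from http import HTTPStatus

def describe_status(code: int) -> str:
    """Return a short description of an HTTP status code (illustrative helper)."""
    phrase = HTTPStatus(code).phrase   # e.g. 404 -> "Not Found"
    if 200 <= code < 300:
        kind = "success"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "other"
    return f"{code} {phrase} ({kind})"

for code in (200, 403, 404, 503):
    print(describe_status(code))
```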

The response's `.json()` method parses the body of the response into Python objects:

In [None]:
resp_list[0].json() # This will be quite long!

It's probably easier to just iterate through the top-level keys:

In [None]:
i=1 # initialize variable i to 1
# For each key in the dictionary, print the key name
for key in resp_list[0].json(): 
    print("key "+str(i)+": "+key)
    i=i+1
# Now that this is no longer indented, it's not part of the for loop
print (str(i-1) +" keys printed") # Summarize what we just did

The `page` key here just records which page of entries the request got from the website.

In [None]:
len(resp_list[0].json()['items'])

In [None]:
print("There are "+ str( resp_list[0].json()['total'] )+ " total entries in the Most Wanted list")
print("But there are only "+str(  len(resp_list[0].json()['items']) ) + " entries in the Items list!")

The reason for this is that our GET request only returned the first page. From reading the documentation, we can pass a `page` parameter to the API to ask for a different page:

In [None]:
resp_list.append(requests.get('https://api.fbi.gov/wanted/v1/list', params={'page': 2}))
print(resp_list[1].status_code)
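`requests` turns the `params` dictionary into a URL query string for us. The standard library's `urllib.parse.urlencode` performs the same kind of encoding, so it is a handy way to preview the URL a request will hit (a sketch only; `requests` does its own encoding internally):

```python
from urllib.parse import urlencode, urlparse, parse_qs

base = "https://api.fbi.gov/wanted/v1/list"
params = {"page": 2, "field_offices": "pittsburgh"}

url = base + "?" + urlencode(params)   # dict -> "page=2&field_offices=pittsburgh"
print(url)

# Going the other way: recover the parameters from a URL
print(parse_qs(urlparse(url).query))
```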

Let's get multiple pages using a for loop... but before we do, let's look at one intricacy of python:

In [None]:
for ii in range(0,5): # range(5) would be equivalent
    print(ii)

So the range command here runs over 5 total entries, but because it starts at zero, the last entry is 4.

So how many pages are we going to need?

In [None]:
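import math
# Round up, since a partially filled last page still needs its own request
math.ceil( resp_list[0].json()['total'] / len( resp_list[0].json()['items'] ) )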

Initializing our list:

In [None]:
resp_list=[]

So we'll explicitly tell it to look through pages 1 to 8

In [None]:
for ii in range(1,9): # for pages 1 to 8 (note it doesn't include 9) do the following: 
    resp_list.append(requests.get('https://api.fbi.gov/wanted/v1/list', params={'page': ii}))
    if resp_list[ii-1].status_code==200: print("Page "+str(ii)+" captured") # this is indented, so it's within the for loop
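Hard-coding pages 1 to 8 works for a one-off, but the page count changes as the list does. A sketch of an alternative that keeps requesting pages until it has `total` items or a page comes back empty. Here `fetch_page` is a hypothetical stand-in for a function wrapping `requests.get(..., params={'page': n}).json()`, so the loop can be tried offline with a fake API:

```python
def fetch_all_items(fetch_page, max_pages=100):
    """Collect 'items' across pages until 'total' is reached or a page is empty."""
    items = []
    page = 1
    while page <= max_pages:
        data = fetch_page(page)          # expected: {'total': int, 'items': [...]}
        batch = data.get("items", [])
        if not batch:                    # an empty page means we're past the end
            break
        items.extend(batch)
        if len(items) >= data.get("total", 0):
            break
        page += 1
    return items

# Fake API with 45 items served 20 per page (hypothetical numbers)
ALL = [{"id": i} for i in range(45)]
def fake_fetch(page, per_page=20):
    start = (page - 1) * per_page
    return {"total": len(ALL), "items": ALL[start:start + per_page]}

collected = fetch_all_items(fake_fetch)
print(len(collected))
```

The `max_pages` cap is a safety net so a misbehaving API can't keep the loop running forever.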

In [None]:
resp_list # let's check they all have 200 codes!

I'm going to combine the entries from all the pages into a single dataframe:

In [None]:
df_list=[] # create empty list
for resp in resp_list:
    df_list.append( pd.DataFrame(resp.json()['items']) ) # append the items entry in the json response as a new entry in the list
# No indent, so this is after the for loop...
dfJoined=pd.concat(df_list, ignore_index=True) # Join all the separate dataframes in the list together, ignoring the separate indices...
dfJoined
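Each page's dataframe starts its index at 0, so without `ignore_index=True` the joined frame would contain repeated index labels. A sketch with two hypothetical "pages":

```python
import pandas as pd

# Two hypothetical "pages" of results
page1 = pd.DataFrame({"title": ["A", "B"]})
page2 = pd.DataFrame({"title": ["C"]})

kept = pd.concat([page1, page2])                           # index: 0, 1, 0 (repeats)
renumbered = pd.concat([page1, page2], ignore_index=True)  # index: 0, 1, 2
print(kept.index.tolist(), renumbered.index.tolist())
```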

In [None]:
list_of_lists=dfJoined["field_offices"].to_list()
field_office_list=set() # a set is an unordered collection with no repeated entries...
for sublist in list_of_lists: 
    if sublist is not None: # If the sublist has some entries
        for office in sublist: # for each office in the sublist
            if office is not None: field_office_list.add(office) # if it isn't None type, add to the set
field_office_list # print the set at the end!
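The nested loop above can also be written as a single set comprehension. A sketch on a hypothetical column that, like `field_offices`, mixes lists with `None`:

```python
# Hypothetical values resembling the field_offices column
list_of_lists = [["pittsburgh", "newyork"], None, ["newyork"], ["boston", None]]

offices = {
    office
    for sublist in list_of_lists if sublist is not None   # skip missing lists
    for office in sublist if office is not None           # skip missing entries
}
print(sorted(offices))
```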

We could also be more precise and ask the API about the Pittsburgh field office specifically, by passing it as a parameter:

In [None]:
response = requests.get('https://api.fbi.gov/wanted/v1/list', params={
    'field_offices': 'pittsburgh'
})
pittsburghList=response.json()['items']
len(pittsburghList)

In [None]:
pghFBI=pd.DataFrame(pittsburghList)
pghFBI[ ['title','images','description'] ] 

There are 18 entries (so indices 0 to 17 in the list).

In [None]:
imgList=[]
imgLoc=pghFBI[["title","images"]]
for index, row in imgLoc.iterrows(): # Here I'm iterating over each row in the dataframe
    imgList.append( {'name':row['title'],'thumb':row['images'][0]['thumb']})
# Here I'll use a library called IPython to display the output in a web format:
from IPython.display import Image, display
# Now use IPython to display them
for img in imgList[ : 5]: # Here I'm getting the first 5 images
    display(img['name'],Image(url=img['thumb'], width=150))
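`row['images'][0]` assumes every entry has at least one image; an entry with an empty list would raise an `IndexError`. A sketch that skips such rows, using a hypothetical dataframe shaped like `pghFBI[["title","images"]]` (titles and URLs are placeholders):

```python
import pandas as pd

imgLoc_demo = pd.DataFrame({
    "title": ["PERSON A", "PERSON B"],
    "images": [[{"thumb": "https://example.com/a_thumb.jpg"}], []],  # B has no images
})

imgList_demo = []
for title, images in zip(imgLoc_demo["title"], imgLoc_demo["images"]):
    if images:  # an empty list (or None) is falsy, so it gets skipped
        imgList_demo.append({"name": title, "thumb": images[0]["thumb"]})

print(imgList_demo)
```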

___
## Bing Search
This code requires an API key (which is like a password) to get access. I signed up for an account (with a $100 credit; student signup [here](https://azure.microsoft.com/en-us/free/students/)), and set up this access account under the *free* tier. There is documentation for the API [here](https://learn.microsoft.com/en-us/bing/search-apis/bing-web-search/quickstarts/rest/python) (in fact, I copied some of their code!)

Here I just read in my key from a file (that you don't have!)

In [None]:
bingKey=alistair_keys.bing['key']

In [None]:
server = "https://api.bing.microsoft.com/v7.0/search"
search_term = "Alistair Wilson MQE"
headers = {"Ocp-Apim-Subscription-Key": bingKey} # Here I'm including my API key to get permission to use this resource
params = {"q": search_term, "textDecorations": True, "textFormat": "HTML" }
resp = requests.get(server,
                    headers=headers, params=params
                   )
search_results = resp.json()

In [None]:
print(resp.status_code) # This should be 200

In [None]:
i=1
for key in search_results: 
    print("key "+str(i)+": "+key)
    i=i+1
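The manual counter `i` can be replaced by Python's built-in `enumerate`, which yields a position alongside each key. A sketch on a hypothetical dictionary standing in for `search_results` (the key names are illustrative):

```python
# Hypothetical dictionary standing in for search_results
example = {"_type": "SearchResponse", "queryContext": {}, "webPages": {}}

lines = [f"key {i}: {key}" for i, key in enumerate(example, start=1)]
for line in lines:
    print(line)
```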

In [None]:
type(search_results["webPages"])

In [None]:
webResults=search_results["webPages"]
i=1
for key in webResults: 
    print("key "+str(i)+": "+key)
    i=i+1

In [None]:
webResults["totalEstimatedMatches"]

The main results from this query are in the `"value"` key

In [None]:
pd.DataFrame(webResults["value"])

Microsoft's help documentation also gave me the following snippet, which uses IPython to display the results as an HTML table:

In [None]:
from IPython.display import HTML

rows = "\n".join(["""<tr>
                       <td><a href=\"{0}\">{1}</a></td>
                       <td>{2}</td>
                     </tr>""".format(v["url"], v["name"], v["snippet"])
                  for v in search_results["webPages"]["value"]])
HTML("<table>{0}</table>".format(rows))
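If you leave out the HTML text options (so the results come back as plain text), a common precaution is to escape each field with the standard library's `html.escape` before building the table, so stray `<`, `>` or `&` characters display literally. A sketch with placeholder results (not real search output); the function name `results_table` is illustrative:

```python
from html import escape

# Placeholder results shaped like search_results["webPages"]["value"]
results = [
    {"url": "https://example.com/a?x=1&y=2", "name": "Page <A>", "snippet": "Tom & Jerry"},
]

def results_table(values):
    """Build an HTML table string, escaping every field."""
    rows = "\n".join(
        '<tr><td><a href="{0}">{1}</a></td><td>{2}</td></tr>'.format(
            escape(v["url"]), escape(v["name"]), escape(v["snippet"]))
        for v in values)
    return "<table>{0}</table>".format(rows)

table_html = results_table(results)
print(table_html)
```

In the notebook, wrapping the string as `HTML(table_html)` would render it.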

___
## Canvas API example

This is another API where you need an access key. Again, I've stored it locally; technically, with this one, anyone holding the key could change anything I have access to in Canvas!

As a student in Canvas I believe you can create your own API key in the settings menu.

In [None]:
# Open the file, read the key, and strip the trailing whitespace
with open("../../.canvas.txt", "r") as f:
    canvasKey=f.read().rstrip()
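A common alternative to a key file at a fixed path is an environment variable. A sketch that checks an environment variable first and falls back to a file; the helper `load_key` and the variable names `CANVAS_API_KEY` and `DEMO_API_KEY` are hypothetical, not something Canvas defines:

```python
import os
import tempfile
from pathlib import Path

def load_key(env_var, fallback_path=None):
    """Return an API key from an environment variable, else from a file (hypothetical helper)."""
    key = os.environ.get(env_var)
    if key:
        return key.strip()
    if fallback_path is not None and Path(fallback_path).exists():
        return Path(fallback_path).read_text().strip()
    raise RuntimeError(f"No key found in ${env_var} or {fallback_path}")

# Offline demo: make sure the hypothetical variable is unset, then read from a temp file
os.environ.pop("DEMO_API_KEY", None)
with tempfile.TemporaryDirectory() as tmp:
    key_file = Path(tmp) / "key.txt"
    key_file.write_text("secret-123\n")
    demo_key = load_key("DEMO_API_KEY", key_file)
print(demo_key)

# In this notebook it might be used as:
# canvasKey = load_key("CANVAS_API_KEY", "../../.canvas.txt")
```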

See the [Canvas API reference](https://canvas.instructure.com/doc/api) for details on the setup here.

In [None]:
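server = "https://canvas.instructure.com/api/v1/courses"
headers = {"Authorization": "Bearer " + canvasKey} # the access key is sent as a Bearer token
# Note: params is not passed to requests.get() below, so Canvas uses its default page size
params = {'per_page': 25}
resp = requests.get(server,
                    headers=headers
                   )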
if resp.status_code==200 : print("Good GET") 
print( type(resp.json() ) )
print( type(resp.json()[0])  ) 

This returns a list of entries, where the first entry in that list is a dictionary. 

Let's look at the keys in that dictionary! But to do this let's define a function to do it for us, so we can use this function repeatedly.

In [None]:
def keyPrintFunction(dictIn) : # Define a function (everything within the function needs to be indented one level)
    # take as input a dict
    if type(dictIn) is dict: # this if line checks that it is a dict input
        typeList=[] # initialize a list
        for key in dictIn:
            # for each entry in the dictIn variable find its type
            typeList.append(type(dictIn[key]).__name__) 
            # The __name__ is just to make sure this variable is a string
        # Now outside of the loop, make a dataframe out of the typeList, where we set the index to the keys 
        dfX = pd.DataFrame( data={'dataType': typeList}, index=list(  dictIn.keys()   )   )
        return dfX # This tells the function to return the dataframe as the output
    else : # Everything above was inside the if block; this else branch just prints a message 
        print("Input not a dict type")
        
# The function is ended whenever we're outside the first level of indentation
# So anything you write down here won't be part of the function
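A more compact sketch of the same idea uses `isinstance` (which also accepts subclasses of `dict`) and returns a pandas `Series`; the name `key_types` is just illustrative. Raising an error instead of printing a message means a wrong input stops the cell rather than silently returning `None`:

```python
import pandas as pd

def key_types(d):
    """Return a Series mapping each key of a dict to the type name of its value."""
    if not isinstance(d, dict):
        raise TypeError(f"expected a dict, got {type(d).__name__}")
    return pd.Series({key: type(value).__name__ for key, value in d.items()}, name="dataType")

print(key_types({"id": 1, "name": "Course", "enrollments": [], "user": {}}))
```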


But when we run this on the full response, it gets caught by our `else` branch...

In [None]:
keyPrintFunction(resp.json())

That's because the JSON data is a list rather than a dict:

In [None]:
type(resp.json())

So let's apply our function to the first entry in our list:

In [None]:
keyPrintFunction(resp.json()[0])

In [None]:
# Let's convert the entire output into a data frame with pandas
courseList=pd.DataFrame( resp.json() )
# And now I'm asking it to display a subset of the columns in the list [ "name" , "id"]
courseList[   [ "name"  , "id"]    ] 

So our course is not in that list (it's on the next page)
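Canvas splits long results into pages and, per its API documentation on pagination, points to the next page in the HTTP `Link` header; `requests` parses that header into `resp.links`. A sketch that follows the `next` links until none remain. `get` is any function with the same call shape as `requests.get`; a fake one (with hypothetical URLs and data) is used here so it runs offline:

```python
def get_all_pages(get, url, headers=None, params=None):
    """Follow rel="next" Link headers, concatenating each page's JSON list."""
    results = []
    while url:
        resp = get(url, headers=headers, params=params)
        resp.raise_for_status()
        results.extend(resp.json())
        url = resp.links.get("next", {}).get("url")  # None when there is no next page
        params = None  # the next URL already carries the query string
    return results

# --- Offline demo with a fake response object (hypothetical data) ---
class FakeResp:
    def __init__(self, data, next_url=None):
        self._data = data
        self.links = {"next": {"url": next_url}} if next_url else {}
    def json(self):
        return self._data
    def raise_for_status(self):
        pass

PAGES = {
    "https://example.test/courses?page=1": FakeResp([{"id": 1}, {"id": 2}], "https://example.test/courses?page=2"),
    "https://example.test/courses?page=2": FakeResp([{"id": 3}]),
}
def fake_get(url, headers=None, params=None):
    return PAGES[url]

courses = get_all_pages(fake_get, "https://example.test/courses?page=1")
print([c["id"] for c in courses])
```

With the real API, the call might look like `get_all_pages(requests.get, server, headers=headers, params={'per_page': 25})`.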

In [None]:
# get the course id in Canvas for our course
mqe_course_id=139970000000187972

From here I simply switch to a different part of the API, where I query the enrollments in this particular course ID:

In [None]:
server = "https://canvas.instructure.com/api/v1/courses"
headersIn = {"Authorization": "Bearer " +canvasKey}
# The request URL I've assembled comes from looking at the API documentation at Canvas
resp = requests.get(  server+"/"+str(mqe_course_id)+"/enrollments",
                    headers=headersIn
                   )
# Store the output as a pandas dataframe
studentList = pd.DataFrame(resp.json())
if resp.status_code==200 : print("Good GET") 

In [None]:
print(  type(resp.json())  )
print(  type(resp.json()[0])  )

Since `resp.json()[0]` is a dictionary, we can apply the function we wrote above to it to get the variable types. First, though, let's check how many enrollments came back:

In [None]:
len(resp.json())

In [None]:
keyPrintFunction(resp.json()[0])

Looking at the output, there's actually another dictionary embedded within the `"user"` value of this dictionary, so let's look at that:

In [None]:
resp.json()[8]

In [None]:
keyPrintFunction(resp.json()[0]["user"])

And now I just load that information!

In [None]:
# initialize the list
studentList=[]
# for each record in the enrollments json response
for record in resp.json():
    # print the short_name field
    print(record["user"]["short_name"])
    # append the short name to the studentList
    studentList.append(record["user"]["short_name"])
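`pandas.json_normalize` can flatten a nested dictionary like `"user"` into dotted column names in one step. A sketch on made-up enrollment records (only a few fields, placeholder values):

```python
import pandas as pd

enrollments = [
    {"id": 10, "type": "StudentEnrollment", "user": {"id": 1, "short_name": "Student A"}},
    {"id": 11, "type": "StudentEnrollment", "user": {"id": 2, "short_name": "Student B"}},
]

flat = pd.json_normalize(enrollments)   # nested keys become "user.id", "user.short_name"
print(flat.columns.tolist())
names = flat["user.short_name"].tolist()
print(names)
```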

In [None]:
# import the random library
import random 
# use it to print a random student name
print(random.choice(studentList))

In [None]:
print(random.choice(studentList))
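Each call to `random.choice` gives a fresh draw. To reproduce a selection, you can use a dedicated `random.Random` instance with a fixed seed, and `random.sample` draws several distinct names at once. A sketch with placeholder names:

```python
import random

names = ["Student A", "Student B", "Student C", "Student D"]  # placeholder names

rng = random.Random(2024)          # seeded generator, independent of the global one
first = rng.sample(names, 2)       # two distinct students

rng_again = random.Random(2024)    # same seed -> same sequence of draws
second = rng_again.sample(names, 2)
print(first, second)
```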

## Prebuilt Libraries for using APIs
---

Often, if an API is popular, others have made libraries to engage with it and get data. Here I just searched for "Canvas API Python" and found the `canvasapi` package.

In [None]:
# load in the library
# Import the Canvas class
from canvasapi import Canvas
import random # I need this library to Randomize later

canvasUrl='https://canvas.instructure.com'
# Initialize a new Canvas object
canvas = Canvas(canvasUrl, canvasKey)

In [None]:
type(canvas)

In [None]:
alistair=canvas.get_current_user()
logins=alistair.get_user_logins()
for login in logins:
    print(login)

In [None]:
courseList=alistair.get_courses()
pd.DataFrame([{'id': course.id,'created': course.created_at_date, 'name': course.name } for course in courseList if course.enrollments[0]['type']=='teacher'])

In [None]:
mqeCourse=canvas.get_course(139970000000187972)

In [None]:
studentUsers = mqeCourse.get_users(enrollment_type='student')

Here I just make a list of the students, and grab their picture if one is set in Canvas:

In [None]:
# For each student: [short name, avatar URL, email]
studentListIn=[[student.short_name, student.get_profile()['avatar_url'], student.email] for student in studentUsers]
studentList=[]
for student in studentListIn:
    # The default Canvas avatar means no picture is set, so substitute a local placeholder image
    if student[1]=="https://canvas.instructure.com/images/messages/avatar-50.png":
        studentList.append([student[0],"../../img/Avatar_"+ student[2] +".svg",student[2]])
    else:
        studentList.append(student)

Now I create a function that draws k random students from the class list and uses IPython to display them as a Markdown table:

In [None]:
from IPython.display import display_markdown
def random_student(k):
    selStudents=random.sample(studentList,k)
    strOut="| # | Student | Img | \n | --- | --- | --- |\n"
    i=1
    for student in selStudents:
        strOut=strOut+"| "+ str(i) + "| **" + student[0] + "** | <img src="+student[1]+ " width='128' height='100'>\n"
        i=i+1
    return display_markdown(strOut , raw=True)

In [None]:
random_student(2)
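Separating building the Markdown string from displaying it makes the table easy to check (and closes each row with a trailing `|`). A sketch of that split; the name `students_markdown` is illustrative, and it takes the same `[name, image_path, email]` entries as `studentList`:

```python
def students_markdown(students):
    """Build a Markdown table with a numbered row per [name, image_path, ...] entry."""
    lines = ["| # | Student | Img |", "| --- | --- | --- |"]
    for i, (name, img, *_rest) in enumerate(students, start=1):
        lines.append(f"| {i} | **{name}** | <img src='{img}' width='128' height='100'> |")
    return "\n".join(lines) + "\n"

table_md = students_markdown([["Student A", "../../img/a.svg", "a@example.com"]])
print(table_md)
```

In the notebook this could be shown with `display_markdown(students_markdown(random.sample(studentList, k)), raw=True)`.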

___
## US Census

This one also requires an access token, but they'll email you one if you just ask for it!

You can request one at the [US Census website](https://www.census.gov/data/developers/data-sets.html), where you can also see the more extensive documentation of their API.

In [None]:
# Open the file, read the authorization key, and strip the trailing whitespace
with open("../../census.txt", "r") as f:
    censusKey=f.read().rstrip()

In [None]:
server = "https://api.census.gov/data/2014/pep/natstprc"
paramsIn = {"get": "STNAME,POP","for" : "state:*", "DATE_" : 7, "key": censusKey}
resp = requests.get(server,
                    params=paramsIn
                   )
if resp.status_code==200 : print("Good GET") 
print( type(resp.json() ) )
print( type(resp.json()[0])  ) 

In [None]:
headColumn=resp.json()[0] # the first entry here is the column heads
dfState=pd.DataFrame(resp.json()[1:])
dfState.columns=headColumn
dfState

In [None]:
CensusStateDict=dfState[["state","STNAME"]].set_index("state").to_dict()["STNAME"]

In [None]:
CensusStateDict['05']
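Because the Census API returns a list of lists whose first row is the header, the header step can be folded into the constructor via `columns=`, and the lookup dictionary can be built with `dict(zip(...))`. A sketch on a hand-written response in the same shape (placeholder names and numbers, not real population figures):

```python
import pandas as pd

rows = [
    ["STNAME", "POP", "state"],
    ["State One", "1000", "01"],
    ["State Two", "2500", "02"],
]

dfState_demo = pd.DataFrame(rows[1:], columns=rows[0])
dfState_demo["POP"] = pd.to_numeric(dfState_demo["POP"])  # values arrive as strings
state_names = dict(zip(dfState_demo["state"], dfState_demo["STNAME"]))  # like CensusStateDict
print(state_names)
```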

### 2010 Census data

In [None]:
# Get the variables JSON file for the Decennial census from 2010
# (named census_vars to avoid shadowing Python's built-in vars())
census_vars=requests.get("https://api.census.gov/data/2010/dec/sf1/variables.json").json()['variables']
print(census_vars["P012005"])
print(census_vars["P012029"])
print(census_vars["P012016"])
print(census_vars["P012040"])
print(census_vars["COUNTY"])

In [None]:
server = "https://api.census.gov/data/2010/dec/sf1"
paramsIn = {"get": "P012005,P012029,P012016,P012040","for" : "COUNTY:*", "key": censusKey}
resp = requests.get(server,
                    params=paramsIn
                   )
if resp.status_code==200 : print("Good GET") 
print( type(resp.json() ) )
print( type(resp.json()[0])  ) 

In [None]:
# The first entry is the column heads, but codes like P012005 aren't informative, so rename them
df=pd.DataFrame(resp.json()[1:])
df.columns=["male_10_to_14", "female_10_to_14","male_50_to_54", "female_50_to_54", "state","county"]
df

In [None]:
server = "https://api.census.gov/data/2010/dec/sf1"
paramsIn = {"get": "P012005,P012029,P012016,P012040","for" : "STATE:*", "key": censusKey}
resp = requests.get(server,
                    params=paramsIn
                   )
if resp.status_code==200 : print("Good GET") 
print( type(resp.json() ) )
print( type(resp.json()[0])  ) 

In [None]:
df=pd.DataFrame(resp.json()[1:])
df.columns=["male_10_to_14","female_10_to_14","male_50_to_54", "female_50_to_54","state"]
df

In [None]:
# Add the state names using the dictionary we defined above and the map() method
df["state_name"]=df.state.map(CensusStateDict) 

In [None]:
df

In [None]:
# Here I change the data from strings to integers for the numeric fields
df=df.astype({'male_10_to_14': 'int32','female_10_to_14':'int32','male_50_to_54': 'int32','female_50_to_54':'int32'})
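`astype('int32')` works here because every value is a digit string, but it raises an error if any value is missing or non-numeric. `pd.to_numeric(..., errors="coerce")` instead turns such values into `NaN`. A sketch on placeholder data with one missing value:

```python
import pandas as pd

df_demo = pd.DataFrame({"male_10_to_14": ["120", "95"], "female_10_to_14": ["110", None]})

num_cols = ["male_10_to_14", "female_10_to_14"]
df_demo[num_cols] = df_demo[num_cols].apply(pd.to_numeric, errors="coerce")  # bad/missing -> NaN
print(df_demo.dtypes)
```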

In [None]:
df["diff_young"]=df.male_10_to_14-df.female_10_to_14
df["diff_old"]=df.male_50_to_54-df.female_50_to_54

In [None]:
# Overall percentage of males at ages 10-14
100*(df["male_10_to_14"].sum() )/(df["male_10_to_14"].sum()+df["female_10_to_14"].sum() )

In [None]:
# Overall percentage of males at ages 50-54
100*(df["male_50_to_54"].sum() )/(df["male_50_to_54"].sum()+df["female_50_to_54"].sum() )

In [None]:
df["sex_ratio_10_to_14"]=100*df["male_10_to_14"]/(df["male_10_to_14"]+df["female_10_to_14"] )
df["sex_ratio_50_to_54"]=100*df["male_50_to_54"]/(df["male_50_to_54"]+df["female_50_to_54"] )
df["drop"]=df["sex_ratio_10_to_14"]-df["sex_ratio_50_to_54"]
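The `sex_ratio_*` columns above hold the percentage of each age group that is male. Demographers more often define the sex ratio as males per 100 females; the two carry the same information, since with male share s (in percent) the ratio is r = 100 * s / (100 - s). A sketch computing both for placeholder counts:

```python
import pandas as pd

counts = pd.DataFrame({"male": [51, 500], "female": [49, 500]})  # placeholder counts

counts["pct_male"] = 100 * counts["male"] / (counts["male"] + counts["female"])
counts["males_per_100_females"] = 100 * counts["male"] / counts["female"]
# Converting the percentage into the ratio: r = 100 * s / (100 - s)
check = 100 * counts["pct_male"] / (100 - counts["pct_male"])
print(counts)
```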

In [None]:
df[ ["state_name", "sex_ratio_10_to_14" ,"sex_ratio_50_to_54", "drop"] ].sort_values(by=['drop'])

### As a Package
---
Again, searching online, I quickly found a [python package](https://pypi.org/project/CensusData/) for engaging with this API, which includes several Jupyter notebooks in its `/docs` folder.

In [None]:
%pip install censusdata

In [None]:
import censusdata

In [None]:
# Download ACS 2011-2015 5-year estimates for Pittsburgh city, Pennsylvania on population size, median age, and median household income.
censusdata.download('acs5', 2015,censusdata.censusgeo([('state', '42'), ('place', '61000')]), ['B01001_001E', 'B01002_001E', 'B19013_001E'], key=censusKey)

## Packages that interact with APIs
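Many large organizations offer some type of API (though it may not be open to everyone), and people often build Python packages on top of them. Searching around for these types of resources can be very useful! Here I use the `nba_api` package, which pulls data from the NBA.com stats site.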

In [None]:
from nba_api.stats.static import players
from nba_api.stats.endpoints import playergamelog, playerprofilev2, commonplayerinfo
from nba_api.stats.library.parameters import SeasonAll

In [None]:
player_list = players.get_active_players()

In [None]:
player_list[10]

In [None]:
bamAdebayo=players.find_players_by_full_name('Bam Adebayo')[0]
bamAdebayo

In [None]:
bamAdebayo['id']

In [None]:
bamPlayerInfo=commonplayerinfo.CommonPlayerInfo(player_id=bamAdebayo['id']).get_normalized_dict()
pd.DataFrame(bamPlayerInfo['CommonPlayerInfo'])

In [None]:
bamData=playergamelog.PlayerGameLog(player_id=bamAdebayo['id'],season=SeasonAll.all).get_data_frames()[0]
bamData

And so now I can calculate his career average stats:

In [None]:
bamData[["PTS","REB","AST"]].mean()

___
## OpenAI API

In [None]:
import sys
sys.path.append('/home/alistair/.keys')
import alistair_keys
import os

In [None]:
from openai import OpenAI
client =OpenAI(api_key=alistair_keys.open_ai['key'])
response = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {
      "role": "user",
      "content": "Write a Python function that takes as input a file path to an image, loads the image into memory as a numpy array, then crops the rows and columns around the perimeter if they are darker than a threshold value. Use the mean value of rows and columns to decide if they should be marked for deletion."
    }
  ],
  temperature=0.7,
  max_tokens=64, # caps the reply length; a complete function will likely need more tokens than this
  top_p=1
)

In [None]:
response
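The `response` object holds much more than the generated text. With the `openai` Python package (v1 and later), the reply is in `response.choices[0].message.content`, and a `finish_reason` of `"length"` signals the reply was cut off by `max_tokens`. A sketch, tested on a stand-in object built with `SimpleNamespace` rather than a real API call; the helper name `reply_text` is illustrative:

```python
from types import SimpleNamespace

def reply_text(response):
    """Return the first choice's text, warning if it hit the token limit."""
    choice = response.choices[0]
    if choice.finish_reason == "length":
        print("Warning: reply truncated by max_tokens")
    return choice.message.content

# Stand-in mimicking the attributes of a chat completion (placeholder text)
fake = SimpleNamespace(choices=[SimpleNamespace(
    finish_reason="length",
    message=SimpleNamespace(content="def crop_dark_border(path):"))])

text = reply_text(fake)
print(text)
```

In the notebook, `print(reply_text(response))` would show just the generated code.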