In [None]:
# Repeat from last week - Nested Dictionaries
import pprint # pprint is the Pretty Print package - it prints out complex data like nested dictionaries into something readable!

# Don't worry! You'll almost never have to craft your own dictionaries from scratch.
# This is a sample of data I got from the FAA's database.
dca_airport_data = {'IATA': 'DCA', 'ICAO': 'KDCA', 'city': 'Washington', 'delay': 'false', 'name': 'Ronald Reagan Washington National',
 'state': 'District of Columbia', 'status': {'avgDelay': '','closureBegin': '','closureEnd': '',
'endTime': '','maxDelay': '','minDelay': '','reason': 'No known delays for this airport.','trend': '',
'type': ''},'weather': {'meta': {'credit': "NOAA's National Weather Service",'updated': '8:52 PM Local','url': 'http://weather.gov/'},
'temp': '77.0 F (25.0 C)','visibility': 10.0,'weather': 'Mostly Cloudy','wind': 'North at 5.8mph'}}

print("Here is the regular print nested dictionary:\n")
print(dca_airport_data) # Normal print looks like a total mess!

print("\nHere is the pretty print nested dictionary:\n")
pprint.pprint(dca_airport_data) # Pretty Print does a much better job!

In [None]:
print(dca_airport_data['weather']['temp'])
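
The lookup above chains one pair of square brackets per level of nesting. Here is a minimal sketch of the same idea on a trimmed-down, made-up copy of the airport dictionary (the `airport` variable and the `runway` key are invented for illustration). It also shows `.get()`, which returns a default value instead of raising a KeyError when a key is missing.

```python
# Hypothetical, trimmed-down copy of the airport dictionary, for illustration only
airport = {
    "IATA": "DCA",
    "weather": {"temp": "77.0 F (25.0 C)", "meta": {"updated": "8:52 PM Local"}},
}

# Chain one pair of square brackets per level of nesting
temp = airport["weather"]["temp"]
updated = airport["weather"]["meta"]["updated"]

# .get() returns a default instead of raising KeyError when a key is missing
runway = airport.get("runway", "unknown")

print(temp, updated, runway)
```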

In [None]:
## List Comprehensions - One-Line For Loops - OPTIONAL PART
# Before we get into cool stuff like webpages, we have to learn one more semi-boring programming concept
# The List Comprehension is a special way of making a list from a for loop with super-little code
# They're not REQUIRED to know, but they can be really useful to make your code shorter and more readable
# I've seen powerful list comprehensions replace as many as 10 lines of code with a single line!
# In the long term, these will save you countless hours of work
### Example 1: Using list comprehensions with lists

generic_list = ["Alpha","Beta","Gamma","Delta","Epsilon"]

# my_regular_old_list below will be identical to generic_list in every way
# Note that element doesn't exist prior to using it here
# just like you'd write "for element in generic_list:", element is a new variable
# Thus, in their most basic form, list comprehensions don't change the list at all
my_regular_old_list = [element for element in generic_list]
print(my_regular_old_list)

In [None]:
# A list comprehension with a filter, such as
#     [element for element in generic_list if element != "Beta"]
# is equal to:
another_list = []
for element in generic_list:
    if element != "Beta":
        another_list.append(element)
print(another_list)
# 5 lines of code in 2 lines. And this is a very basic usage.

In [None]:
## Turn a string into a list of letters
print([letter for letter in "Hello, World!"])

In [None]:
## One of the most powerful features of list comprehensions is to modify every element of a list a certain way
## Add an exclamation point to every letter (and obviously cut the string up into a list of letter-! pairs)
print([letter + "!" for letter in "Hello, World!"])

In [None]:
# A fancy technique used alongside list comprehensions on strings is the string object's .join() method
# .join combines a list of strings together into a single string.
# If we modify the previous list comprehension...
# Since join is a method, you have to actually call it on a string. In this case, I called it on a string that
# includes just a single space. What this means is that in-between every letter-! pair, there will be a space
print(" ".join([letter+"!" for letter in "Hello, World!"]))

# If we do the same exact thing without a space in the string (i.e. an empty string), what will it print?
print("".join([letter+"!" for letter in "Hello, World!"]))

In [None]:
## A multiplication table for the 9's
print([9*num for num in range(0,13)])


In [None]:
## Make a list of letters in a string if they're not 'o'

# If you're using ONLY IF, it goes at the very end!
print([letter for letter in "Hello, World!" if letter != 'o'])

# In a rarity, Python has a little awkward syntax here - if you are using IF AND ELSE, you have to have it BEFORE
# the FOR statement. It looks awkward, but oh well.
print([letter if letter != 'o' else "?" for letter in "Hello, World!"])
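
The two placements of IF can also be combined in one comprehension. The sketch below is only an illustration (the variable names `no_os`, `masked` and `both` are made up): the IF after the FOR decides which elements survive, while the IF/ELSE before the FOR decides what each surviving element turns into.

```python
text = "Hello, World!"

# Filter only: the IF goes after the FOR and drops elements
no_os = [letter for letter in text if letter != "o"]

# Replace only: the IF/ELSE goes before the FOR and keeps every element
masked = [letter if letter != "o" else "?" for letter in text]

# Both at once: drop the spaces, then replace each "o" that remains
both = [letter if letter != "o" else "?" for letter in text if letter != " "]

print("".join(no_os))   # Hell, Wrld!
print("".join(masked))  # Hell?, W?rld!
print("".join(both))    # Hell?,W?rld!
```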

In [None]:
## OPTIONAL PART END
# OK, OK, I get it! Enough list comprehension stuff, how do I suck up websites?
# The skillset we are going to focus on from now on is called "web scraping"
# The general rule of thumb is that anything you can do online via clicking a mouse and typing in letters
# on your keyboard, you can do super-fast and repetitively with Python.

# Basic website data-grabbing
import requests

# The requests library is the most important for grabbing website data. Just call the .get() function in the requests
# library and enter the hyperlink as the parameter. It returns a Response object, which has a number of methods which
# give you access to all the critical information you'd want from a website
my_response_object = requests.get("http://mason.gmu.edu/~jlee17/python_workshop_files/example_data/index-very-simple.html")

# Let's print out what kind of object we're getting:
print("The type of the object is: " + str(type(my_response_object)))

# Two new concepts below: assert and status_code. Notice that status_code doesn't have () at the end! All this means is
# that rather than status_code being a method, it's just an attribute (i.e. a variable stored inside an object)
# As for what a status code is: for every HTTP request (that is, anytime you go to a webpage), the server sends you back
# a numeric status code. The codes are uniform across the internet! A status code of 200 means OK. For a simple GET like
# this one, any other number (and there are dozens of other status codes) usually means that there's some sort of problem.
# You've probably seen them before in your web browsing: 404 Not Found, 500 Internal Server Error, and
# 403 Forbidden are three of the most common ones.

# The assert statement is basically an error-checking capability of Python. You give assert statements conditions that
# are either True or False. If the condition is False, it will automatically raise an Exception (an AssertionError)
# and stop further processing.
# Imagine you have a list of 50 websites that you're downloading. You certainly might want to know if one of them didn't
# come back properly and stop further processing. Try changing the == to != to see what happens when the assert statement
# fails
assert my_response_object.status_code == 200

# Finally, there are two other attributes stored inside the Response object that are particularly important: Text and Content
# The difference between them is similar to the difference between "w" mode and "wb" mode when reading/writing files
# .text stores the actual text, whereas .content stores that same information in raw computer code (bytes).
# Because the website I went to is a regular old website, it's just made up of text. Thus, I want the .text attribute
# However, if you were doing requests.get() on a PDF file (see below), you'd want the .content attribute

print(my_response_object.text)
my_text = my_response_object.text
# To prove to yourself that the text is the literal real deal, copy-paste the hyperlink above into your browser. 
# When you get there, right click and there should be an option that looks like "View Source". 
# Click it, and you will see a new page identical to what is printed below.

# BUT WAIT! How do we extract the critical stuff we want FROM the text? Don't worry, that comes next week :)
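
If you want to look up what a status code means without going online, Python's standard library knows the official names. This is just a small sketch: `describe_status` and `is_ok` are helper names I made up for illustration and are not part of requests.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short label such as '404 Not Found' for a numeric status code."""
    status = HTTPStatus(code)
    return str(status.value) + " " + status.phrase

def is_ok(code):
    """True only for the plain 200 OK status code."""
    return code == HTTPStatus.OK

for code in (200, 403, 404, 500):
    print(describe_status(code))
```

In a real script, you would pass in `my_response_object.status_code` instead of a hard-coded number.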

In [None]:
# Downloading a PDF (or really any file)
# As I said, it's really easy to download a PDF file with Python
import requests

# Get the data, which Python interprets as a RESPONSE object, from the internet and put it into a variable.
# You can see for yourself it's a regular old PDF file, copy paste it into your browser!
response = requests.get("http://ku.ac.bd/wp-content/uploads/2016/05/pdf-sample.pdf")

# Make sure it got here OK and produce a Big Fat Error if it didn't
assert response.status_code == 200

# Open up a new file and prepare for download. Note the "wb" instead of "w" - this means Write in Byte mode.
# Without the "b", it is in text-writing mode, which will refuse to write BYTES
file = open("my_downloaded_file.pdf", "wb")

# Finally, use the file's write() method to write the response's content attribute.
# NOTE: Content vs. text!! Text is for regular webpages. Content is for non-text files
file.write(response.content)

# Close the file
file.close()
# Tell the user it's done
print("Completed")
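
You can see the "w" vs. "wb" difference without downloading anything. The sketch below uses a made-up bytes value as a stand-in for `response.content` and writes it to a temporary folder. It also uses a `with` block, which closes the file for you, so you can't forget the `close()` call.

```python
import os
import tempfile

# Made-up stand-in for response.content (these are not the bytes of a real PDF)
fake_download = b"%PDF-1.4 pretend these are the bytes of a PDF"

folder = tempfile.mkdtemp()
path = os.path.join(folder, "demo.pdf")

# "wb" accepts bytes
with open(path, "wb") as file:
    file.write(fake_download)

# "w" is text mode and rejects bytes with a TypeError
try:
    with open(path + ".txt", "w") as file:
        file.write(fake_download)
    text_mode_failed = False
except TypeError:
    text_mode_failed = True

# Read the file back in byte mode to confirm nothing was changed
with open(path, "rb") as file:
    round_trip = file.read()

print(text_mode_failed, round_trip == fake_download)
```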

In [None]:
my_list = ["MIA","DCA","BWI","IAD","RSW"]
my_airport_dictionaries = []
import pprint

for my_element in my_list:
    url = 'http://services.faa.gov/airport/status/' + my_element + '?format=application/json'
    response = requests.get(url)
    assert response.status_code == 200
    dict_file = response.json()
    my_airport_dictionaries.append(dict_file)
    pprint.pprint(dict_file)

In [None]:
# That nested dictionary I keep showing you... How did I get it in the first place?
# By accessing the FAA's Database with Python!
import requests
import pprint
# The Big Word: API - Application Programming Interface
# All API means is "we have configured our data so that you can access it super-easily with a programming language"

# Look below: you start out with a normal URL, but look what's at the end! 
# A weird little question mark and stuff is after it
# This means you're sending a variable TO the website. In this case, we're sending a format variable
# and it's equal to application/json. Note that for sending variables inside URLs, you do NOT use quotes!
# You can find the documentation (i.e. How I figured out that 'format' was the variable I needed to set)
# on the FAA's official API website.
url = 'http://services.faa.gov/airport/status/BWI?format=application/json'
# Let's do requests.get on it!
response = requests.get(url)

# Here is the critical line! Oftentimes APIs will send you data in a special web format called JSON.
# JSONs are cousins of Python dictionaries, and Python can easily convert them for you out of the Response object
dict_file = response.json()

# Let's print it!
pprint.pprint(dict_file)
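
To see the JSON-to-dictionary conversion on its own, without the internet, you can feed JSON text to Python's built-in json module. As far as I know, `response.json()` does roughly this with the text of the response. The `raw_text` below is made up and only loosely shaped like the airport data. Notice how JSON's `false` and `null` become Python's `False` and `None`.

```python
import json

# Made-up JSON text, loosely shaped like the airport data above
raw_text = '{"IATA": "BWI", "delay": false, "status": {"reason": "No known delays", "avgDelay": null}, "runways": [1, 2]}'

# json.loads turns JSON text into nested Python dictionaries and lists
parsed = json.loads(raw_text)

print(type(parsed))
print(parsed["status"]["reason"])
print(parsed["delay"], parsed["status"]["avgDelay"])
```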

In [None]:
# There is one improvement we can make, however - it is possible to require multiple variables sometimes.
# And while you can add multiple variables with the & sign (That is, www.somewebsite.com/stuff?var_one=blah&var2=derp) 
# it's really annoying to write it that way. Imagine if you had to set 8 variables???
# Thankfully, we have the params parameter to the rescue!
# Rather than stick the variables on the end, you just write out the core website
# (that is, everything BEFORE the "?"). All the variables, you put into a dictionary and attach it to the GET function
# Let's see how this works

import requests
import pprint

# Configure basic URL
url = 'http://services.faa.gov/airport/status/MIA'

# Make new dictionary
payload = {}
# Set dictionary key format (before the "=") to dictionary value application/json (after the "=")
# Rinse and repeat this as necessary
payload['format'] = 'application/json'

# Finally, just write a requests.get with the params=your_dictionary argument at the end
response = requests.get(url, params = payload)
dict_file = response.json()
pprint.pprint(dict_file)
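
Curious what the finished URL looks like once a dictionary is attached? The standard library can build one offline. This is a sketch of the idea only: `build_url` is a helper name I made up, and the exact escaping requests produces internally may differ slightly.

```python
from urllib.parse import urlencode

def build_url(base_url, params):
    """Hypothetical helper: join a base URL and a dictionary of variables."""
    return base_url + "?" + urlencode(params)

one_variable = build_url("http://services.faa.gov/airport/status/MIA",
                         {"format": "application/json"})
several = build_url("https://api.openaq.org/v1/cities",
                    {"country": "US", "limit": 100, "page": 7})

print(one_variable)
print(several)
```

The "/" inside application/json comes out as %2F. That is URL escaping, and the server decodes it back on its end.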

In [None]:
# Sadly, government isn't always super-competent, and the JSONs can be hellishly complex to understand at times
# Try running this cell and prepare to scream

import requests
import pprint

url = 'https://data.cdc.gov/api/views/rqg5-mkef/rows.json'
payload = {}
payload['accessType'] = "DOWNLOAD"

response = requests.get(url, params=payload)
# Note that writing out 'https://data.cdc.gov/api/views/rqg5-mkef/rows.json?accessType=DOWNLOAD' would also work
# But using payload is usually easier to read and modify
dict_file = response.json()
pprint.pprint(dict_file)

In [None]:
import unicodecsv as csv
import pprint
columns = dict_file['meta']['view']['columns']
#pprint.pprint(columns)
my_columns = [element['fieldName'] for element in columns]
print(my_columns)

In [None]:
len(dict_file['data'][0])

In [None]:
# Nevertheless, with sufficient effort, you can still parse your way through it...
# I didn't document this section on purpose, try to understand what I'm doing?
import unicodecsv as csv

columns = dict_file['meta']['view']['columns']
headers = [element['fieldName'] for element in columns]
print(headers)
csv_list = []
csv_list.append(headers)

for row in dict_file['data']:
    csv_list.append(row)

current_file = open("newcsv.csv","wb")
csv_file = csv.writer(current_file)

for row in csv_list:
    csv_file.writerow(row)
current_file.close()
print("Done!")
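
Here is the same headers-plus-rows idea on a tiny made-up dictionary, so you can see every moving part at once. Two assumptions to flag: the `sample` data is invented, and I'm using Python's built-in csv module with an in-memory buffer rather than unicodecsv and a real file.

```python
import csv
import io

# Made-up miniature of the structure used above: column descriptions under
# ['meta']['view']['columns'] and one list per row under ['data']
sample = {
    "meta": {"view": {"columns": [{"fieldName": "year"}, {"fieldName": "state"}, {"fieldName": "rate"}]}},
    "data": [[2015, "VA", 4.2], [2016, "MD", 3.9]],
}

# Pull just the column names out of the list of column dictionaries
headers = [column["fieldName"] for column in sample["meta"]["view"]["columns"]]

buffer = io.StringIO()  # in-memory stand-in for a file on disk
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(headers)           # first row: the column names
writer.writerows(sample["data"])   # remaining rows: one per data record

csv_text = buffer.getvalue()
print(csv_text)
```

With the built-in csv module on Python 3, a real file would be opened in text mode, e.g. `open("newcsv.csv", "w", newline="")`, instead of "wb".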

In [None]:
# The geocoder library is really quite useful! It's a wrapper around web (REST) APIs, which we'll discuss in the workshop
import geocoder
g = geocoder.google('Mountain View, CA')
print(g.latlng)
g = geocoder.reverse(g.latlng)
print(g)

In [None]:
import requests
import pprint

base_url = "https://api.nytimes.com/svc/search/v2/articlesearch.json"
my_p = {}
my_p['q'] = 'Bill Clinton\'s Son'
my_p['api-key'] = '3e1c23485f254bd2a5fb02796ce16d42'
response = requests.get(base_url,params=my_p)
assert response.status_code == 200
dictionary = response.json()
pprint.pprint(dictionary)

In [None]:
# Get the FAA data, flatten it, and then write it out to a file
# Combining File Writing PLUS grabbing internet data PLUS basic database access

import unicodecsv as csv
import requests
import pprint

import flatdict
# The Big Word: API - Application Programming Interface

url = 'http://services.faa.gov/airport/status/MIA'

# Configure file name and payload
payload = {}
payload['format'] = 'application/json'
file_name = 'FAA_Data.csv'

# Send out a GET request to the server, making sure to set the params parameter (if applicable)
resp = requests.get(url, params=payload)
assert resp.status_code == 200

# Convert the response into a JSON object - represented in Python as a nested dictionary
# JSON is a standard option for receiving complex, nested data in Python. However, it only works on properly structured data!
json_object = resp.json()
# We can print either the whole thing...
pprint.pprint(json_object)

# Before we write it, flatdict is a really useful library to download. We use it to flatten our dictionary!
flat_json = flatdict.FlatDict(json_object)

# Define the header row and the value row. We're only gonna have two rows, since we're only looking at one airport
data_key_row = flat_json.keys()
data_value_row = flat_json.values()

# Finally, write our information to a CSV file
opened_file = open(file_name,'wb')
my_csv_file = csv.writer(opened_file)
my_csv_file.writerow(data_key_row)
my_csv_file.writerow(data_value_row)
opened_file.close()
print("I am complete!")
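
What does "flattening" actually do? Below is a hand-rolled sketch of the idea. It is NOT how flatdict is implemented: the `flatten` function, its ":" delimiter, and the `sample` data are all my own assumptions, and flatdict's exact key format may differ.

```python
def flatten(nested, parent_key="", delimiter=":"):
    """Collapse nested dictionaries into one level, joining keys with delimiter."""
    flat = {}
    for key, value in nested.items():
        full_key = parent_key + delimiter + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into the inner dictionary, carrying the key path along
            flat.update(flatten(value, full_key, delimiter))
        else:
            flat[full_key] = value
    return flat

# Made-up sample, loosely shaped like the airport data
sample = {"IATA": "MIA", "weather": {"temp": "80.0 F", "meta": {"url": "http://weather.gov/"}}}
flat = flatten(sample)

print(list(flat.keys()))
print(list(flat.values()))
```

Every value now sits at the top level, so the keys can serve as a header row and the values as a data row.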

In [None]:
# NEW ENTRY - Air Quality API. Let's practice with multiple payload variables
# Check out https://docs.openaq.org/
import requests
import pprint

my_payload = {}
my_payload['country'] = 'US'
my_payload['limit'] = 100
my_payload['page'] = 7
response = requests.get("https://api.openaq.org/v1/cities", params=my_payload)
assert response.status_code == 200
my_dict = response.json()
pprint.pprint(my_dict)

In [None]:
# Now, let's check out DC area air quality!

my_payload = {}
my_payload['city'] = 'Washington-Arlington-Alexandria'
response = requests.get("https://api.openaq.org/v1/latest", params=my_payload)
assert response.status_code == 200
my_dict = response.json()
pprint.pprint(my_dict)

In [None]:
# USEFUL CONCEPT: Custom Sorting

# We should remember Python's basic sorting capabilities from earlier
my_list = ["ABC","7","ZBV","TElephone","1BVX"]
my_list.sort()
print(my_list)
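
Two details of basic sorting are worth seeing before we customize anything. First, `sort()` changes the list in place and returns None, while `sorted()` leaves the original alone and returns a new list. Second, the default order for strings puts digits first, then uppercase letters, then lowercase letters. A small sketch (the variable names are just for illustration):

```python
my_list = ["ABC", "7", "ZBV", "TElephone", "1BVX"]

new_list = sorted(my_list)   # returns a sorted copy
untouched = list(my_list)    # my_list is still in its original order here

result = my_list.sort()      # sorts in place and returns None

print(new_list)
print(result)
print(sorted(["b", "a", "B", "1"]))
```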

In [None]:
# But now, let's talk about custom sorting
# Excel can do somewhat complicated sorting, but for truly intricate sorting, nothing beats a programming language
# There are two ways of custom sorting in Python - What I will "unofficially" refer to as 
# "Sort by Return" and "Sort by Comparison"
# First, here is Sort by Return

# Get the results value from the dictionary we made in the previous cell
results = my_dict['results']

# Create a function with a single input argument. This argument will refer to a given element in the list you're sorting
def custom_sorting_function(element):
    return element['count']

# Finally, the magic - the sort() method has an optional parameter key, which lets you override its default sorting
# capabilities.
# However, this will appear strange: You DON'T actually include the () at the end of the function!
# This is because we're not actually *calling* custom_sorting_function() ourselves. Rather,
# we're telling Python "please temporarily overwrite YOUR method of sorting with MY CUSTOM ONE" for this case only.
results.sort(key=custom_sorting_function)

# Now let's print what we've done
pprint.pprint(results)
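
You can try "Sort by Return" offline with a few made-up rows. The sketch below assumes each row has a 'count' key like the rows above (the city names and numbers are invented). It also shows two common variations: `reverse=True` for biggest-first, and a lambda in place of a named function.

```python
# Made-up rows, each with a 'count' key
results = [
    {"city": "A", "count": 30},
    {"city": "B", "count": 5},
    {"city": "C", "count": 12},
]

def by_count(element):
    return element["count"]

# sorted() returns a new list, so results itself stays in its original order
ascending = sorted(results, key=by_count)
descending = sorted(results, key=by_count, reverse=True)
with_lambda = sorted(results, key=lambda element: element["count"])

print([row["city"] for row in ascending])
print([row["city"] for row in descending])
```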

In [None]:
# Functools is a useful little library which covers "functions that act on or return other functions"
# To broadly oversimplify, we're using it to explain to Python that "when I give you a function to use
# with the key parameter, I don't want you to do the standard 'Sort by Return', but instead I want you to
# do a 'Sort by Comparison'"
import functools

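# Note that sort() is an in-place method: my_dict['results'] is the very same list we sorted in the previous cell,
# so it is already in sorted order here. Re-run the cell that builds my_dict if you want to start from unsorted data.
results = my_dict['results']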

# This time, our function looks quite different! It takes in two parameters, because any sorting algorithm boils down
# to many one-to-one comparisons. In this function, we're explaining to Python how it should determine which value is "greater"
# Thus, in this function we need to create a generalizable rule such that no matter which element in the list is 'a' and
# which element is 'b', it will always return the proper value.
# - If it returns -1, it means that the 'a' value is smaller
# - If it returns 1, it means that the 'a' value is greater
# - If it returns 0, it means that the values are considered equal to one another
def comparator_function(a,b):
    if a['count'] < b['count']:
        return -1
    elif a['count'] > b['count']:
        return 1
    else:
        return 0
# Finally, remember what I noted above about functools - we're using the cmp_to_key function to essentially tell Python
# that we're using the 'Sort by Comparison' method, not the 'Sort by Return' method we used above
results.sort(key=functools.cmp_to_key(comparator_function))

# And, let's print!
pprint.pprint(results)
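
"Sort by Comparison" earns its keep when the rule has several parts. In this sketch, the rows and the rule are both made up: bigger counts come first, and ties are broken alphabetically by city. The last line shows that the same rule can often be written as a "Sort by Return" key that returns a tuple.

```python
import functools

# Made-up rows; two of them tie on 'count'
results = [
    {"city": "Delta", "count": 5},
    {"city": "Alpha", "count": 12},
    {"city": "Beta", "count": 5},
]

def compare(a, b):
    # Larger counts first
    if a["count"] != b["count"]:
        return -1 if a["count"] > b["count"] else 1
    # Same count: alphabetical by city
    if a["city"] < b["city"]:
        return -1
    if a["city"] > b["city"]:
        return 1
    return 0

ordered = sorted(results, key=functools.cmp_to_key(compare))
print([row["city"] for row in ordered])

# The same rule as a plain key function that returns a tuple
by_key = sorted(results, key=lambda row: (-row["count"], row["city"]))
```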

In [None]:
"""
In-Class Assignment: YOU figure out the documentation!
You have become a scholar studying Pokemon! You need to use the PokeAPI to create a spreadsheet
containing critical Poke-information.

Core Functionality:
(1) Visit the Poke-API's primary documentation page at http://pokeapi.co 
and study their documentation
(2) Construct requests.get() functions to get the information from the 
FIRST 151 POKEMON (HINT: FOR loop)
(3) Inside a spreadsheet, each pokemon should be represented by a row. 
GOTTA CATCH'EM ALL!!!
hyperlink = http://pokeapi.co/api/v2/pokemon/

Hint:
for number in range(1,152):
    current_hyperlink = ...

Hint 2:
Note that the base stats are inside of a list - 
A dictionary inside a list inside another dictionary. IE:
my_dict['stats'][0]['base_stat']

Hint 3:
Pokemon's name can be found at:
my_dict['name']

Hint 4:
Don't forget to use pretty print (pprint.pprint) when trying to parse through
the nested dictionary JSON. It should make it a little more readable.

Concentrate on the "{" and the "["!

In each row, insert the following information from the response JSON:
- Pokemon Name
- Pokemon Weight
- Pokemon Base Speed Stat
- Pokemon Base Defense Stat
- Pokemon Base Attack Stat
- Pokemon Base HP Stat
- Pokemon Type(s) - you only need the first one if there are multiple
(4) Write the spreadsheet to a file and close the file.
(HINT: You DON'T need to worry about a Pokemon's "attacks/moves" - you can skip a LOT in the middle)

Advanced Functionality:
(1) Before you send them to be outputted, sort the pokemon in PYTHON via their Base Speed Stat - The lower the
Base Speed Stat, the higher up in the spreadsheet the Pokemon should be!
"""