# Pulling Data from APIs
As a Data Analyst, there are usually two scenarios in which you will have to work with APIs:

1. You need to analyse or build reporting on your organization's data that is only available via one or multiple APIs
2. You want to enrich your organization's data with external data to improve decision making and drive business value

Therefore, being able to connect to and query from APIs is an invaluable skill and will make you more flexible when it comes to working with different data sources. We know working with APIs can be overwhelming and in order to spare you unnecessary frustration, this notebook will provide you with a structured step-by-step approach to make working with APIs as easy and fun as possible.

In this notebook we will be using the OpenWeather API https://openweathermap.org/ with the goal of enriching our existing flights data with additional weather data.

Furthermore, you're going to

1. Learn about OpenWeather API's available data and limitations
2. Sign-up to the OpenWeather API and use your API key to make your first call to the OpenWeather API
3. Learn how to adjust your API calls to get the data you need
4. Learn how to access and extract data from your JSON
5. Learn how to flatten nested JSON data and transform it into a DataFrame for future analysis
6. Learn how to make multiple calls to the API with different parameters in an automated way


## Introduction - How to use OpenWeather API

<img src="images/OpenWeather_API_Logo.jpg" width="600">

APIs all work very similarly in that they use standard methods to make it easy to request and use data. Apart from their functionality, APIs can differ in multiple ways, such as the type, amount, level of granularity and accuracy of the data, the authentication method, how many calls can be sent per second/minute/day/in total, and the rules on using and publishing the data.

Before you register an account, check out their homepage and documentation (https://openweathermap.org/api) and try to answer the questions below.

* Does the API have the data I need for my use case in terms of type, completeness, granularity and accuracy?
* Is it free or paid?
* How many API calls can I make per month?
* How many API calls can I make per second?

*// Please put your answers here*

### Exercises

Please go through the use cases below and answer the questions.

Use case #1: Your organization wants you to record the current weather for 100 locations every hour.

* Which API provides this data?
* Will you have to purchase a paid API plan?
* Which limitation(s) could be problematic?

*// Please put your answers here*

Use case #2: Your boss wants you to retrieve weather data for the last 12 months for 10 of your company's store locations to find out if certain weather conditions had an impact on sales. Your budget for getting this data is $100.

* Which API provides this data?
* How much will it cost you to get this data?
* Will the budget be sufficient?

*// Please put your answers here*

Use case #3: Your boss loves chocolate pudding. What she hates is rain. Unfortunately, she also hates you. One day your boss is in full rage mode, because it rained on her way back from the lunch break and she got wet. The only thing that can save her day is her beloved chocolate pudding she put in the fridge in the morning. Unfortunately, you ate it, because as chance would have it, you hate your boss even more. Your boss orders you into her office and she is furious. She tells you the following: "If I ever get wet from rain again, I will fire you." Oh no, this is not what you had in mind. You can't get fired, you have to pay alimony for 4 children from 5 ex-wives every month. Fortunately, you are smart and have an idea: you want to write a little app that queries the weather forecast from the OpenWeather API and sends your boss a push notification whenever it is about to rain. In order to get the most recent and accurate forecast and to minimise the risk of getting fired, you plan on querying the API every second. Because you're poor, you want to use the free API plan.

* Which API provides the data you need?
* How accurate is the forecasted weather data?

*// Please put your answers here*

You start querying the data and after exactly 16min40sec you get the following error message:
{ "cod": 429,
"message": "Your account is temporary blocked due to exceeding of requests limitation of your subscription type. 
Please choose the proper subscription http://openweathermap.org/price"
}
* Why are you getting this error message?

*// Please put your answer here*

When looking at the data you have queried so far, you see the same data being returned multiple times. After investigating further, you find a pattern where data changes only after every 600 rows. 
* Why is that?

*// Please put your answer here*

Now that you know why you are getting the error message and why data changes only after every 600 rows, you reconsider your initial plan.
* Taking the above information into consideration, can you use the API you initially selected with a free plan to build the weather forecast app for your boss? Why / why not?

*// Please put your answers here*

In a parallel universe you decided to go with the free API plan to build your weather forecast app. Your app works as expected and your boss receives push notification warnings whenever it's about to rain. One week later you receive a letter of termination: you got fired. Reason: Your boss got wet in the rain. You are confused. You check your code and the app again. Everything works fine. You check the logs to see what happened the day your boss got wet in the rain. You find the API didn't return data for 5 minutes shortly before it started raining. You decide to sue OpenWeather API, you want them to pay damages because you didn't do anything wrong and it's their fault that you got fired.
* Why will you suffer a humiliating defeat in court?

*// Please put your answer here*

## Sign-up - Registering an account with OpenWeather API

Now that you've gotten familiar with what the OpenWeather API offers and what limitations to consider, you're going to register an account. Having an account will provide you with an API key; without it, you will not be able to query the API.

1. Go to https://openweathermap.org/
2. Click on "Sign In" and "Create an Account"
3. Enter your credentials and create an account
4. Go to your emails and activate your account in the confirmation email

## Query Data - Making calls to the API
When you type www.google.de into your web browser and press enter, your browser sends a request to the server that hosts Google's website, asking for the contents of the website so it can display them to you. Getting data from APIs works in a similar way. A request is sent to the address of the API, but instead of returning HTML and JavaScript files that will be interpreted and rendered by your browser, the API almost always returns data in JSON format. Also, while you could use a web browser to send the request, it's not needed, and it is more convenient to use a programming language like Python to do it.

In this notebook we are going to use the requests library. It is one of the most downloaded Python packages today, pulling in around 14M downloads / week. You can find out more about it here: https://pypi.org/project/requests/

In [1]:
import requests

Next, as mentioned above the URL of the API has to be specified. In this example we are going to pull current weather data. The documentation for this API can be found here: https://openweathermap.org/current

The url shown below is what needs to be used in order to connect to the API.

We can see that the url has
* a fixed part: 'http://api.openweathermap.org/data/2.5/weather?'
* and a variable part: 'q={city name},{state code},{country code}&appid={API key}'

<img src="images/Current_Weather_API_Call.png" width="600">

Let's take the fixed part to define a url variable.

In [2]:
url = 'http://api.openweathermap.org/data/2.5/weather?'

While the fixed part stays constant, the variable part consists of query strings or parameters, some optional, some mandatory, which can be used to select or filter data or to define the format or unit of measurement of the response.
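To see the fixed and the variable part side by side, a request URL can be taken apart with Python's standard library. This is only an illustrative sketch: the URL is an example and `YOUR_API_KEY` is a placeholder, not a working key.

```python
from urllib.parse import urlsplit, parse_qs

# Example request URL; YOUR_API_KEY is a placeholder, not a real key
example_url = (
    "http://api.openweathermap.org/data/2.5/weather"
    "?q=Hamburg&units=metric&appid=YOUR_API_KEY"
)

parts = urlsplit(example_url)

# Fixed part: scheme, host and path, followed by the "?" separator
fixed_part = f"{parts.scheme}://{parts.netloc}{parts.path}?"

# Variable part: the query string, split into parameter names and values
# (parse_qs returns every value as a list)
query_params = parse_qs(parts.query)

print(fixed_part)
print(query_params)
```

Each key in `query_params` corresponds to one parameter of the API call.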

Below are the optional and required parameters when pulling current weather data by city name.

<img src="images/Current_Weather_API_Call_Parameters.png" width="600">

Current weather data can be pulled not only by city name but also by

* city ID
* geographic coordinates (latitude, longitude)
* ZIP code
* cities within a rectangular zone
* cities in circle

Depending on which of the listed methods is used, the number and names of the available parameters can change. Read the documentation carefully to avoid running into error messages.

In the following example we will stick with pulling the data by city name.

Let's define two variables:

1. a location variable that we will use to set the "q" parameter of the API call
2. a unit variable that we will use to set the "units" parameter

**Note: When using more than one parameter, we need to add the "&" sign, just like we would do when checking multiple conditions in an IF clause for example (p1 AND p2 AND p3).**

In [3]:
location = "q=Hamburg"
unit = "&units=metric"

Next, we need to define a variable that includes our API key that will be used for the appid parameter. Why? Because we need to authenticate ourselves to the API when we send a request, so the API knows that we have a registered account and can notify us in case we reach our query limit.

To do this,

1. Go to your user profile and go to "My API keys" (https://home.openweathermap.org/api_keys)
2. Copy your API key and assign it to the variable api_key below 

In [4]:
api_key = "&appid=" #copy & paste your API key after the "&appid="
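If you would rather not paste your key into the notebook (for example because you share it with others), one common alternative is to read it from an environment variable. The sketch below assumes a variable named `OPENWEATHER_API_KEY`; that name and the helper `appid_param` are made up for this example and are not something OpenWeather prescribes.

```python
import os

def appid_param(env=os.environ, var_name="OPENWEATHER_API_KEY"):
    """Return the '&appid=...' URL fragment, reading the key from the environment.

    OPENWEATHER_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return "&appid=" + key

# Usage with a stand-in mapping instead of the real environment
print(appid_param({"OPENWEATHER_API_KEY": "YOUR_API_KEY"}))
```

Calling `appid_param()` without arguments reads from the real environment and raises an error if the variable is missing.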

Now we're almost ready to make our first API call!

You might be wondering why we split the URL into multiple parts. The answer is that it makes it easy to change our desired output in the future. Imagine you have multiple API calls in your script and you want to change the location parameter for all of them at the same time. All you need to do is change one line of code and all API calls will be adjusted. This is much more convenient and less error-prone than having to find and adjust each individual API call. This approach will also be helpful when we want to make multiple calls, for example for different locations, in one code block using loops, which we will be doing later.

The last step before we can finally query the API is to combine the multiple parts into one final URL. To do this, we can simply concatenate all strings into one. Check out the code below.

In [None]:
url_f = url + location + unit + api_key
print(url_f)
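String concatenation works, but it leaves escaping of special characters (such as the space in "New York") to you. As an alternative sketch, the standard library can build the query string from a dictionary. The names `base_url`, `params` and `full_url` are chosen for this example, and the key is a placeholder. The requests library offers a similar convenience through the `params` argument of `requests.get()`.

```python
from urllib.parse import urlencode

base_url = "http://api.openweathermap.org/data/2.5/weather"

# One entry per query parameter; YOUR_API_KEY is a placeholder
params = {"q": "New York", "units": "metric", "appid": "YOUR_API_KEY"}

# urlencode joins the pairs with "&" and escapes special characters
full_url = base_url + "?" + urlencode(params)
print(full_url)
```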

Now that we have the url, we can finally send our first API request. Since we want to get data, we need to send a GET request. This is done using the get() function from the requests library. The get() function will send a request and retrieve the response for us.

In [None]:
r = requests.get(url_f)
print(r)

Weirdly, when we print the response, all we see is `<Response [200]>` and no data.

Did we do something wrong? No! 

Code 200 actually means that our request was successful. Check out https://en.wikipedia.org/wiki/List_of_HTTP_status_codes to see a list of response codes and their definitions. Knowing what the different response codes mean will help you understand error messages and resolve issues faster in the future.
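Python's standard library already knows the official names of the HTTP status codes, which can help when you want a readable message instead of a bare number. The helper `describe_status` below is a small sketch written for this notebook, not part of the requests library.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short, human-readable description of an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    if 200 <= code < 300:
        category = "success"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "other"
    return f"{code} {status.phrase} ({category})"

for code in (200, 401, 404, 429):
    print(describe_status(code))
```

With a real response object you would pass `r.status_code` to the helper.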

So where is the data? The response that the API sent back is actually an object containing multiple elements such as cookies, encoding, headers and content. Run the code blocks below and check out the different elements of the response.

In [None]:
print(r.headers)

In [None]:
print(r.cookies)

In [None]:
print(r.encoding)

In [None]:
print(r.content)

Found it! The content body holds the data. Since by default it's in JSON format, we can use the built-in JSON decoder to increase readability and prepare the data for future data manipulation.

In [None]:
weather_hh = r.json()
print(weather_hh)

This already looks better. But we can make it look even better! To do this, we are going to use Python's json library and its functions to parse the raw data into a Python object and print it with proper indentation.

In [None]:
import json
json_object = json.loads(r.content)
print(json.dumps(json_object, indent = 3))

Awesome! Now it's super easy to identify the different key-value pairs and assess the available information.

You might be saying: "Yeah, this is cool and all, but I'm not a weather expert, what do all these fields mean?" 

-> **Read the documentation!**

Go to https://openweathermap.org/current#current_JSON and you will find a list of all fields and their definition that can be found in our JSON response.

As mentioned above, we can see the different key-value pairs inside the JSON output. You've probably heard the term key-value pairs before. Do you remember where? Dictionaries, exactly! If you need a quick refresher on what dictionaries are and what they do, check out the official Python documentation here: https://docs.python.org/3/tutorial/datastructures.html#dictionaries.

Printing the type of our weather_hh variable, which is the API response decoded by the JSON decoder, we get the confirmation that we are working with a dictionary.

In [None]:
print(type(weather_hh))

We can use different techniques to access all or only specific parts of the data. Run the code blocks below to see some of the methods in action.

In [None]:
# Loop through and print all key-value pairs
for key, value in weather_hh.items():
    print(key + ':', value)

In [None]:
# Print specific key-value pairs
print(weather_hh['base'])
print(weather_hh['main'])
print(weather_hh['main']['temp'])
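Not every response contains every field, and indexing a missing key with square brackets raises a KeyError. The sketch below uses a shortened, made-up dictionary in the shape of a current-weather response (the values are invented) to contrast direct indexing with the more forgiving `.get()` method.

```python
# Shortened, made-up sample in the shape of a current-weather response
sample = {
    "name": "Hamburg",
    "main": {"temp": 12.3, "humidity": 81},
    "weather": [{"id": 500, "main": "Rain", "description": "light rain"}],
}

# Direct indexing: raises KeyError if one of the keys is missing
temp = sample["main"]["temp"]

# .get() with a default: returns None instead of raising when "wind" is absent
gust = sample.get("wind", {}).get("gust")

# "weather" holds a list, so pick an element by position before using a key
first_condition = sample["weather"][0]["description"]

print(temp, gust, first_condition)
```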

## Working with (nested) JSONs
Now that you've learned how to pull current weather data from the OpenWeather API and explore the output, it's time to convert it to a DataFrame. Having the data in a DataFrame allows us to perform data exploration, manipulation or visualization.

The DataFrame() function in the pandas package allows us to transform a dictionary into a DataFrame.
Run the code below to import the library and transform the current weather in Hamburg into a DataFrame.

In [None]:
import pandas as pd
print(pd.DataFrame(weather_hh)) # This should throw an error.

Weird, we're getting an error message. Do you have an idea what could be the issue?

Don't worry if you have no clue what the problem is. Let's look at our JSON output again. Execute the code below.

In [None]:
print(json.dumps(json_object, indent = 3))

We can see that some key-value pairs have further key-value pairs as values. The key "weather" for example has four key value pairs nested within: "id", "main", "description", "icon". The DataFrame() function expects dictionaries with only one level of key-value pairs, but since we have nested pairs, it throws an error.

There are multiple solutions to this problem and the one that will be the most useful to you depends on what parts of the data you ultimately need for your analysis. We need all data in our output and therefore the method we are going to use is to flatten the JSON data using the json_normalize() function from the pandas package. It normalizes our JSON data into a flat table and transforms it into a DataFrame. Execute the code below and check the output.

In [None]:
print(type(pd.json_normalize(weather_hh, sep="_")))
weather_hh_norm = pd.json_normalize(weather_hh, sep="_")
print(weather_hh_norm)

Awesome! Now we have a DataFrame with 1 row and 24 columns. But wait, the "weather" column still contains data in JSON format. This is because its nested key-value pairs sit inside a list, probably because a location can report more than one weather condition at a time. To resolve this, we need to use the function's advanced parameters.

* record_path = the key whose value is a list of records; each record becomes one row
* meta = the remaining fields to attach to every row; nested fields are given as a path, e.g. ["coord", "lon"]
* record_prefix = adds a prefix to the column names coming from record_path to avoid duplicate column names
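To get a feeling for how these three parameters interact, here is a minimal sketch on a tiny, made-up dictionary with two entries in its "weather" list. The city name and all values are invented for illustration.

```python
import pandas as pd

# Made-up sample with two records in the nested "weather" list
sample = {
    "name": "Sampletown",
    "coord": {"lon": 10.0, "lat": 53.5},
    "weather": [
        {"id": 500, "main": "Rain", "description": "light rain"},
        {"id": 701, "main": "Mist", "description": "mist"},
    ],
}

flat = pd.json_normalize(
    sample,
    record_path="weather",                               # one row per list entry
    meta=["name", ["coord", "lon"], ["coord", "lat"]],   # repeated on every row
    record_prefix="weather_",                            # prefix for record columns
    sep="_",                                             # separator for nested meta names
)

print(flat.shape)
print(flat.columns.tolist())
```

Because the list has two entries, the result has two rows, and the meta fields are copied onto both of them.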

Execute the code below and compare to the output above and spot the differences.

In [None]:
weather_hh_df = pd.json_normalize(weather_hh, 
                                  sep="_", 
                                  record_path="weather", 
                                  meta=[["coord", "lon"], 
                                        ["coord", "lat"], 
                                        "base",
                                        ["main", "temp"],
                                        ["main", "feels_like"],
                                        ["main", "temp_min"],
                                        ["main", "temp_max"], 
                                        ["main", "pressure"], 
                                        ["main", "humidity"], 
                                        "visibility", 
                                        ["wind", "speed"], 
                                        ["wind", "deg"], 
                                        ["clouds", "all"], 
                                        "dt", 
                                        ["sys", "type"], 
                                        ["sys", "id"],
                                        ["sys", "country"],
                                        ["sys", "sunrise"],
                                        ["sys", "sunset"],
                                        "timezone",
                                        "id",
                                        "name",
                                        "cod"], 
                                  record_prefix="weather_")
print(weather_hh_df)

You're probably thinking: "Wow, this requires a lot of manual work typing all those keys in the parameters". And yes, you're right. Unfortunately this is one of the easier methods to solve the issue. Also, it's not perfect and comes with its own limitations. If you have data that is nested even deeper, you often have no other choice but to write your own flattening function. Fortunately there are other Python specialists out there who have done so and shared their work here:

* https://towardsdatascience.com/flattening-json-objects-in-python-f5343c794b10
* https://stackoverflow.com/questions/52795561/flattening-nested-json-in-pandas-data-frame
* https://medium.com/swlh/converting-nested-json-structures-to-pandas-dataframes-e8106c59976e
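To give an idea of what such a hand-written flattening function can look like, here is a minimal recursive sketch. It is not taken from the articles above; the function name `flatten` and the choice to use list positions as part of the column name are assumptions made for this example.

```python
def flatten(obj, parent_key="", sep="_"):
    """Recursively flatten nested dicts and lists into a single-level dict."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            flat.update(flatten(value, new_key, sep))
    elif isinstance(obj, list):
        # List entries get their position as part of the key
        for index, value in enumerate(obj):
            new_key = f"{parent_key}{sep}{index}" if parent_key else str(index)
            flat.update(flatten(value, new_key, sep))
    else:
        # A plain value: store it under the key path built so far
        flat[parent_key] = obj
    return flat

# Made-up sample in the shape of a current-weather response
nested = {"coord": {"lon": 10.0}, "weather": [{"id": 500}], "name": "Sampletown"}
print(flatten(nested))
```

The resulting flat dictionary can be wrapped in a list and passed to `pd.DataFrame()` to get a one-row table.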


Alright, now that the data is in the right format, we don't have to worry about nested JSONs anymore. Right now we only have one row, so to make our data more interesting, let's pull current weather data for more cities and create a proper table with multiple rows. In order to do that, we're going to combine, extend and apply everything we've learnt so far:

* Define the parameters in the request URL
* Send a request and retrieve the response using the get() function
* Decode the response using the JSON decoder .json()
* Flatten the JSON file and transform it into a DataFrame

The last step will be to combine all dataframes into one final table.

This time we want to get the current weather data of multiple locations. Unfortunately the location parameter "q" only takes one location at a time. Using a for-loop we should be able to make multiple API calls while iterating through several locations.

First, let's collect all the pieces we need below and

1. Change the location variable to locations and have it hold a list with multiple locations
2. Define an empty DataFrame variable
3. Add a for-loop that iterates through all locations, flattens the JSON output and appends it to the DataFrame

**IMPORTANT: Don't run the code below too often per minute. Since you only have 60 API calls per minute, you can quickly run into a temporary query limit!**
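One simple way to stay under a per-minute limit is to pause between calls. The generator `throttled` below is a sketch written for this notebook. The `sleep` argument is only there so the pause can be swapped out for demonstration; by default the real `time.sleep` is used.

```python
import time

def throttled(items, calls_per_minute=60, sleep=time.sleep):
    """Yield items one by one, pausing between them to respect a per-minute limit."""
    pause = 60.0 / calls_per_minute  # seconds to wait between two calls
    for index, item in enumerate(items):
        if index > 0:  # no need to wait before the very first call
            sleep(pause)
        yield item

# Usage sketch: a stand-in sleep function records the pauses instead of waiting
pauses = []
for city in throttled(["Hamburg", "Berlin", "London"], calls_per_minute=60, sleep=pauses.append):
    print("would request weather for", city)
print(pauses)
```

In a real loop you would leave out the `sleep` argument and place the API call where the `print` is.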

In [None]:
import time  # only needed if you uncomment time.sleep() below

url = 'http://api.openweathermap.org/data/2.5/weather?'
#location = "q=Hamburg" #not needed anymore
locations = ["q=Hamburg", 
             "q=Berlin", 
             "q=London", 
             "q=Madrid",
             "q=New York",
             "q=Moscow",
             "q=New York",
             "q=Ankara",
             "q=Baghdad",
             "q=Kabul",
             "q=Tokyo",
             "q=Taipei",
             "q=Manila",
             "q=Auckland",
             "q=Seoul"]
unit = "&units=metric"
api_key = "&appid=" #Add your API key here
weather_df = pd.DataFrame([]) #Empty DataFrame, used to append each location's weather data

for location in locations:
    url_f = url + location + unit + api_key
    r = requests.get(url_f)
    # time.sleep(1) #uncomment if you run into a query limit
    weather_temp = r.json()
    weather_temp_df = pd.json_normalize(weather_temp, 
                                        sep="_", 
                                        record_path="weather", 
                                        meta=[["coord", "lon"], 
                                              ["coord", "lat"], 
                                              "base",
                                              ["main", "temp"],
                                              ["main", "feels_like"],
                                              ["main", "temp_min"],
                                              ["main", "temp_max"], 
                                              ["main", "pressure"], 
                                              ["main", "humidity"], 
                                              "visibility", 
                                              ["wind", "speed"], 
                                              ["wind", "deg"], 
                                              ["clouds", "all"], 
                                              "dt", 
                                              ["sys", "type"], 
                                              ["sys", "id"],
                                              ["sys", "country"],
                                              ["sys", "sunrise"],
                                              ["sys", "sunset"],
                                              "timezone",
                                              "id",
                                              "name",
                                              "cod"], 
                                        record_prefix="weather_")
    # DataFrame.append() was removed in pandas 2.0; pd.concat() does the same job
    weather_df = pd.concat([weather_df, weather_temp_df], ignore_index=True)

print(weather_df)

Fantastic! Now we have a DataFrame with current weather information for 15 locations. What's even better, we didn't have to write a lot of complicated code. How cool is that!?
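One thing the loop does not handle yet is a failed call, for example for a misspelled city name. As a hedged sketch, decoded responses could be sorted into successful and failed ones by looking at their "cod" field before flattening. The helper `split_responses` and the sample payloads are made up for illustration, and comparing "cod" as a string is a defensive assumption in case the API returns it as a number in some responses and as text in others.

```python
def split_responses(payloads):
    """Separate decoded API responses into successful and failed ones."""
    ok, failed = [], []
    for payload in payloads:
        # Compare as text so that 200 and "200" are treated the same way
        if str(payload.get("cod")) == "200":
            ok.append(payload)
        else:
            failed.append(payload)
    return ok, failed

# Made-up payloads for illustration
payloads = [
    {"cod": 200, "name": "Hamburg"},
    {"cod": "404", "message": "city not found"},
    {"cod": 429, "message": "rate limit exceeded"},
]

ok, failed = split_responses(payloads)
print(len(ok), "successful,", len(failed), "failed")
```

Only the entries in `ok` would then be passed on to `pd.json_normalize()`.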