<a href="https://colab.research.google.com/github/michalis0/Business-Intelligence-and-Analytics/blob/master/labs/03%20-%20Pandas%20and%20Python/walkthroughs/API_walkthrough.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1 align="center"> API Walkthrough - Lab 3</h1>

<div>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/2/2b/Logo_Universit%C3%A9_de_Lausanne.svg/2000px-Logo_Universit%C3%A9_de_Lausanne.svg.png" style="padding-right:10px;width:240px;float:left"/>
<h2 style="white-space: nowrap">Business Intelligence and Analytics</h2>
<hr style="clear:both">
</div>

# Introduction

> This exercise illustrates the practical use of API requests to obtain data from the Internet. We will show you how to request data from various websites in `json` format through their APIs. Then, you will use a small program that retrieves this data in Colab. From that point onwards, the data will be available for you to work on.

> We are going to use **2 examples**: one where you retrieve IMDb data about your **favorite movie** and another where you retrieve **current weather** data from any city in the world.

_Note_: data processing is covered in the following labs, "Pandas and Python" and "Data cleaning" (respectively the 2nd and 3rd labs). This exercise only shows you how to obtain _real-time_ online data from an API.

_Reminder_: data obtained in this way is called _external_ data.




# 1. Data from your favorite movie

In this example, you will learn how to retrieve data about the movie of your choice.

First, you will make a request to an API to obtain the data.

Second, you will load this data into Colab.

## Making an OMDb API request

**Let us take a look at the [OMDb API](https://www.omdbapi.com/)**

> You can see that you can make an HTTP request with this URL: ```http://www.omdbapi.com/?apikey=[yourkey]&```

For this, you will need a **key** (see theory), which the API uses to control your requests. For the OMDb API, free requests are limited to 1,000 per day.

> You can get your own key [here](https://www.omdbapi.com/apikey.aspx) by entering your email address.

> You can also use the key I generated: `6be81d95`.

You will then need to add **filters** (query parameters) to your HTTP request so that the API knows what you are looking for. They are appended to the end of your URL.

For example, say I want to search for the movie "The Big Short" with a full plot.
> The filters would be the following: `t=The+Big+Short&plot=full`. You can see that `t` indicates the title of the movie (for this API, either the title `t` or an IMDb ID `i` is required) and `plot` is set to the option `full`. Spaces in the title are written as `+`.

> The filters are always separated by an `&` sign.

Therefore, the URL of my request would be the following:

```https://www.omdbapi.com/?apikey=6be81d95&t=The+Big+Short&plot=full```

**Try entering that URL in your browser and see what happens!**
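The same URL can also be assembled in Python instead of being typed by hand. The following is a minimal sketch that assumes the standard-library function `urllib.parse.urlencode`; the variable names `base_url` and `filters` are illustrative and not part of the original exercise.

```python
from urllib.parse import urlencode

base_url = 'https://www.omdbapi.com/'

# Each filter is one key/value pair of the query string.
filters = {
    'apikey': '6be81d95',   # the key given above
    't': 'The Big Short',   # movie title, written with ordinary spaces
    'plot': 'full',         # ask for the full plot
}

# urlencode joins the pairs with '&' and encodes spaces as '+'.
url = base_url + '?' + urlencode(filters)
print(url)
```

Building the query string from a dictionary makes it easier to change one filter (for example the title) without retyping the whole URL.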


### Result

If everything goes well, you get a webpage in `json` format!

You might recognize the **semi-structured** data type from your theory class.

> However, all of this is still happening in your web browser, where you cannot yet process the data with a program of your choice.
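To give an idea of what "processing" JSON text means, here is a small sketch using Python's standard `json` module. The string `raw_text` is a shortened, hand-written sample in the style of the API's answer (an assumption for illustration, not a live response).

```python
import json

# Shortened, hand-written sample of the text a browser would display.
raw_text = '{"Director": "Adam McKay", "Metascore": "81", "Country": "United States"}'

# json.loads turns JSON text into a Python dictionary.
movie = json.loads(raw_text)

print(type(movie))
print(movie['Director'])

# Values arrive as strings here, so numbers must be converted before doing arithmetic.
metascore = int(movie['Metascore'])
print(metascore + 1)
```

Once the text is a dictionary, each field can be reached by its key, which is not possible while the data is only displayed in a browser tab.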

## Loading the data in Colab

More on this process can be found [here](https://colab.research.google.com/github/nestauk/im-tutorials/blob/3-ysi-tutorial/notebooks/APIs/API_tutorial.ipynb). This exercise is largely inspired by the official Colab tutorial.

In [1]:
import pprint  # Pretty-prints nested data structures

import requests  # HTTP library used to call the API

# Query URL: movie 'The Big Short' with a full plot
url = 'https://www.omdbapi.com/?apikey=6be81d95&t=The+Big+Short&plot=full'

print(url)

response = requests.get(url)  # Make a GET request to the URL.

# Print status code (and associated text).
print(f"Request returned {response.status_code} : '{response.reason}'")

# Note: status 200 means the request was successful. Search online for documentation about HTTP status codes.

# Parse the JSON body of the response into a Python dictionary
payload = response.json()

pp = pprint.PrettyPrinter(indent=1)
pp.pprint(payload)

https://www.omdbapi.com/?apikey=6be81d95&t=The+Big+Short&plot=full
Request returned 200 : 'OK'
{'Actors': 'Christian Bale, Steve Carell, Ryan Gosling',
 'Awards': 'Won 1 Oscar. 37 wins & 81 nominations total',
 'BoxOffice': '$70,259,870',
 'Country': 'United States',
 'DVD': 'N/A',
 'Director': 'Adam McKay',
 'Genre': 'Biography, Comedy, Drama',
 'Language': 'English',
 'Metascore': '81',
 'Plot': 'Three separate but parallel stories of the U.S mortgage housing '
         'crisis of 2005 are told. Michael Burry, an eccentric ex-physician '
         'turned one-eyed Scion Capital hedge fund manager, has traded '
         'traditional office attire for shorts, bare feet and a Supercuts '
         'haircut. He believes that the US housing market is built on a bubble '
         'that will burst within the next few years. Autonomy within the '
         'company allows Burry to do largely as he pleases, so Burry proceeds '
         'to bet against the housing market with the banks, who are m

**You did it!** You can now process the movie data you retrieved.
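Before using the data, it is worth checking the status code printed above. As a small sketch, the standard-library `http.HTTPStatus` enumeration returns the standard phrase for a code; the helper name `describe_status` is hypothetical and not part of the original exercise.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short, human-readable description of an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    outcome = 'success' if 200 <= code < 300 else 'not a success'
    return f"{code} {status.phrase} ({outcome})"


for code in (200, 401, 404):
    print(describe_status(code))
```

With `requests`, a comparable check is `response.raise_for_status()`, which raises an exception for 4xx and 5xx codes.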

In [2]:
# Show the actors that played in the movie

payload['Actors']

'Christian Bale, Steve Carell, Ryan Gosling'
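The actors come back as a single comma-separated string. A minimal sketch of turning it into a list, using the value shown above as a literal so that it runs without calling the API:

```python
# Value of payload['Actors'] as shown in the output above.
actors_text = 'Christian Bale, Steve Carell, Ryan Gosling'

# Split on commas and remove the surrounding spaces of each name.
actors = [name.strip() for name in actors_text.split(',')]

print(actors)
print(len(actors))
```

A list is easier to loop over or count than one long string.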

In [3]:
# Show the plot of the movie

plot = payload['Plot']

print(plot)

Three separate but parallel stories of the U.S mortgage housing crisis of 2005 are told. Michael Burry, an eccentric ex-physician turned one-eyed Scion Capital hedge fund manager, has traded traditional office attire for shorts, bare feet and a Supercuts haircut. He believes that the US housing market is built on a bubble that will burst within the next few years. Autonomy within the company allows Burry to do largely as he pleases, so Burry proceeds to bet against the housing market with the banks, who are more than happy to accept his proposal for something that has never happened in American history. The banks believe that Burry is a crackpot and therefore are confident in that they will win the deal. Jared Vennett with Deutschebank gets wind of what Burry is doing and, as an investor believes he too can cash in on Burry's beliefs. An errant telephone call to FrontPoint Partners gets this information into the hands of Mark Baum, an idealist who is fed up with the corruption in the f

In [4]:
# Count the number of words in the plot
print(type(plot))

plot_words = plot.split()  # Split the string on whitespace into a list of words

print(len(plot_words))  # The length of the list is the number of words

<class 'str'>
364
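The list of words can be used for more than a total count. As an illustrative sketch, `collections.Counter` from the standard library counts how often each word appears; only the first sentence of the plot is used here so that the example stays self-contained.

```python
from collections import Counter

# First sentence of the plot shown above.
sentence = ('Three separate but parallel stories of the U.S mortgage housing '
            'crisis of 2005 are told.')

# Lowercase first so that 'Of' and 'of' would be counted together.
words = sentence.lower().split()
print(len(words))

word_counts = Counter(words)
print(word_counts.most_common(1))  # the most frequent word and its count
```

Note that `split()` only separates on whitespace, so punctuation stays attached to words (for example `told.`).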


## Your turn!

Here, you will be able to search for your own favorite movie and retrieve the corresponding IMDb data.

In [5]:
your_url = ('Your url here')

# You will need the key and the name of the movie from omdbapi.com (see the first part of this example)

In [None]:
import requests

print(your_url) # The URL of your choice.

response = requests.get(your_url)  # Make a GET request to the URL.

# Print status code (and associated text).
print(f"Request returned {response.status_code} : '{response.reason}'")


# Print data returned (parsing as JSON)
your_movie = response.json()

import pprint
pp = pprint.PrettyPrinter(indent=1)
pp.pprint(your_movie)
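A mistyped title can still produce a status code of 200 while the payload itself reports a problem. The sketch below assumes that a failed lookup carries the fields `'Response': 'False'` and `'Error'`; check this against what your own request returns. The function name `find_omdb_error` and both sample dictionaries are hypothetical.

```python
def find_omdb_error(movie):
    """Return the error message if the payload signals a failure, else None."""
    # Assumption: a failed lookup has 'Response' set to the string 'False'.
    if movie.get('Response') == 'False':
        return movie.get('Error', 'Unknown error')
    return None


# Hand-written sample payloads (not live API responses).
found = {'Title': 'The Big Short', 'Response': 'True'}
missing = {'Response': 'False', 'Error': 'Movie not found!'}

print(find_omdb_error(found))
print(find_omdb_error(missing))
```

Calling such a check on `your_movie` before indexing it avoids a confusing `KeyError` when the movie was not found.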

In [None]:
# Find the ratings of your movie!

your_movie['Your code here']

In [None]:
# Find the writer of your movie!

'Your code here'

# 2. Current weather data

In this example, you will see how to obtain real-time weather data on the city of your choice.

> We will use the [OpenWeather API](https://openweathermap.org/api). Its documentation is available at the same link.

The process is relatively similar to the first example, and hopefully by now you will have understood the intuition. This will therefore be a more concise illustration.

## Current weather in Lausanne

We have already generated an API key: `APPID=c3b012a09cda741d210d8739c796c3da`

The only parameters provided here are the name of the city and the country code: `Lausanne,ch`.

You could also pass the latitude and longitude, as described in [the documentation](https://openweathermap.org/current).
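As a sketch of the coordinate-based variant, the request below uses the parameter names `lat` and `lon` from the linked documentation together with approximate coordinates for Lausanne. The variable names are illustrative, and the URL is only built and printed, not requested.

```python
from urllib.parse import urlencode

base_url = 'https://api.openweathermap.org/data/2.5/weather'

params = {
    'lat': 46.516,    # latitude of Lausanne (approximate)
    'lon': 6.6328,    # longitude of Lausanne (approximate)
    'APPID': 'c3b012a09cda741d210d8739c796c3da',  # the key given above
}

url_by_coordinates = base_url + '?' + urlencode(params)
print(url_by_coordinates)
```

Coordinates avoid ambiguity when several cities share the same name.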

> The OpenWeather API returns its answer as `JSON`-formatted data.

In [5]:
url = ('https://api.openweathermap.org/data/2.5/weather?q=Lausanne,ch&APPID=c3b012a09cda741d210d8739c796c3da') # City of Lausanne, CH

We automated the input of the city and the country code for you:

**Run the following cells, type the requested value at each prompt, then press Enter.**

In [6]:
url_city = input("Enter the city: ")

Enter the city: Lausanne


In [7]:
url_city = url_city.lower().capitalize()

url_country = input("Enter the country code:")

Enter the country code:ch


In [8]:
url_country = url_country.lower()
url_data = url_city + ',' + url_country

url = ('https://api.openweathermap.org/data/2.5/weather?q={}&APPID=c3b012a09cda741d210d8739c796c3da').format(url_data)
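User input can contain stray spaces or city names made of several words. The following is a hedged sketch of a helper that cleans the input and URL-encodes it with the standard-library function `urllib.parse.quote`; the function name `build_weather_url` is hypothetical and not part of the original exercise.

```python
from urllib.parse import quote

API_KEY = 'c3b012a09cda741d210d8739c796c3da'  # the key given above


def build_weather_url(city, country_code, api_key=API_KEY):
    """Build a current-weather URL from raw user input."""
    city = ' '.join(city.split())                # trim and collapse whitespace
    country_code = country_code.strip().lower()  # e.g. ' CH ' becomes 'ch'
    # Encode spaces as %20 but keep the comma between city and country.
    query = quote(f'{city},{country_code}', safe=',')
    return f'https://api.openweathermap.org/data/2.5/weather?q={query}&APPID={api_key}'


print(build_weather_url('  Lausanne ', 'CH'))
print(build_weather_url('New  York', 'us'))
```

Wrapping the steps in one function keeps the cleaning and the URL construction in a single place if you later query several cities.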

We can now repeat the same operation: we fetch the data from the API using the URL built from the values you just entered.

In [9]:
import pprint  # Pretty-prints nested data structures

import requests  # HTTP library used to call the API

print(url)

response = requests.get(url)  # Make a GET request to the URL.

# Print status code (and associated text).
print(f"Request returned {response.status_code} : '{response.reason}'")

# Note: status 200 means the request was successful. Search online for documentation about HTTP status codes.

# Parse the JSON body of the response into a Python dictionary
payload = response.json()

pp = pprint.PrettyPrinter(indent=1)
pp.pprint(payload)

https://api.openweathermap.org/data/2.5/weather?q=Lausanne,ch&APPID=c3b012a09cda741d210d8739c796c3da
Request returned 200 : 'OK'
{'base': 'stations',
 'clouds': {'all': 0},
 'cod': 200,
 'coord': {'lat': 46.516, 'lon': 6.6328},
 'dt': 1738705843,
 'id': 2659994,
 'main': {'feels_like': 273.3,
          'grnd_level': 973,
          'humidity': 94,
          'pressure': 1033,
          'sea_level': 1033,
          'temp': 273.3,
          'temp_max': 276.13,
          'temp_min': 272.93},
 'name': 'Lausanne',
 'sys': {'country': 'CH',
         'id': 2036162,
         'sunrise': 1738652001,
         'sunset': 1738687285,
         'type': 2},
 'timezone': 3600,
 'visibility': 10000,
 'weather': [{'description': 'clear sky',
              'icon': '01n',
              'id': 800,
              'main': 'Clear'}],
 'wind': {'deg': 327, 'gust': 1.42, 'speed': 0.4}}


_Note_: You can play with this request by changing the URL with the city of your choice.
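The fields `dt`, `sunrise` and `sunset` in the output above are Unix timestamps (seconds since 1970, in UTC), and `timezone` is the offset from UTC in seconds. A minimal sketch of converting them to local time with the standard `datetime` module follows; the values are copied from the output above, and the helper name `to_local_time` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the response shown above.
sample = {
    'dt': 1738705843,
    'timezone': 3600,
    'sys': {'sunrise': 1738652001, 'sunset': 1738687285},
}


def to_local_time(unix_seconds, offset_seconds):
    """Convert a Unix timestamp to a datetime in the location's own time zone."""
    local_zone = timezone(timedelta(seconds=offset_seconds))
    return datetime.fromtimestamp(unix_seconds, tz=local_zone)


sunrise = to_local_time(sample['sys']['sunrise'], sample['timezone'])
sunset = to_local_time(sample['sys']['sunset'], sample['timezone'])
measured_at = to_local_time(sample['dt'], sample['timezone'])

print(sunrise.isoformat())      # 2025-02-04T07:53:21+01:00
print(sunset.isoformat())       # 2025-02-04T17:41:25+01:00
print(measured_at.isoformat())  # 2025-02-04T22:50:43+01:00
print(sunset - sunrise)         # length of the day: 9:48:04
```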

In [10]:
# What is the temperature?
print(type(payload['main']))

main_info = payload['main']
main_info

<class 'dict'>


{'temp': 273.3,
 'feels_like': 273.3,
 'temp_min': 272.93,
 'temp_max': 276.13,
 'pressure': 1033,
 'humidity': 94,
 'sea_level': 1033,
 'grnd_level': 973}

In [11]:
# `main_info` is a dictionary, so you can read the temperature directly by its key:
payload['main']['temp']

273.3

In [12]:
# You can also translate this data into a pandas DataFrame (cf. pandas walkthrough)
import pandas as pd

name = payload['name']
main_index = []
main_index.append(name)

df_main = pd.DataFrame(main_info, index=main_index)
df_main

| | temp | feels_like | temp_min | temp_max | pressure | humidity | sea_level | grnd_level |
|---|---|---|---|---|---|---|---|---|
| Lausanne | 273.3 | 273.3 | 272.93 | 276.13 | 1033 | 94 | 1033 | 973 |
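The table above only covers the `main` block. To place several nested blocks in a single row, the payload can first be flattened into one dictionary. This is an illustrative sketch; the function name `flatten` and the shortened `sample` payload are assumptions made for the example.

```python
def flatten(nested, prefix=''):
    """Flatten nested dictionaries into one level, joining keys with '_'."""
    flat = {}
    for key, value in nested.items():
        name = f'{prefix}{key}'
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + '_'))
        else:
            flat[name] = value  # lists and plain values are kept as they are
    return flat


# Shortened version of the payload shown earlier.
sample = {
    'name': 'Lausanne',
    'main': {'temp': 273.3, 'humidity': 94},
    'wind': {'deg': 327, 'gust': 1.42, 'speed': 0.4},
}

row = flatten(sample)
print(row)
```

The resulting dictionary can be passed to pandas as `pd.DataFrame([row])`. pandas also provides `pd.json_normalize`, which performs a similar flattening.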


In [13]:
# Temperatures are in kelvin; let us convert them to degrees Celsius

df_main[['temp', 'feels_like', 'temp_min', 'temp_max']] = df_main[['temp', 'feels_like', 'temp_min', 'temp_max']] - 273.15
df_main

| | temp | feels_like | temp_min | temp_max | pressure | humidity | sea_level | grnd_level |
|---|---|---|---|---|---|---|---|---|
| Lausanne | 0.15 | 0.15 | -0.22 | 2.98 | 1033 | 94 | 1033 | 973 |
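The same conversion can be done without pandas. Here is a small sketch on the temperature fields of the dictionary shown earlier; the helper name `kelvin_to_celsius` is hypothetical.

```python
KELVIN_OFFSET = 273.15  # 0 degrees Celsius expressed in kelvin


def kelvin_to_celsius(kelvin):
    """Convert a temperature from kelvin to degrees Celsius, rounded to 2 decimals."""
    return round(kelvin - KELVIN_OFFSET, 2)


# Temperature fields copied from the `main` block above.
temperatures_k = {'temp': 273.3, 'feels_like': 273.3, 'temp_min': 272.93, 'temp_max': 276.13}

temperatures_c = {name: kelvin_to_celsius(value) for name, value in temperatures_k.items()}
print(temperatures_c)
```

According to the OpenWeather documentation, adding `&units=metric` to the request URL should return temperatures in degrees Celsius directly, which would make this step unnecessary.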
