<div style="text-align: center;" >
<h1 style="margin-top: 0.2em; margin-bottom: 0.1em;">APIs in Python</h1>
<h4 style="margin-top: 0.7em; margin-bottom: 0.3em; font-style:italic">Application Programming Interface</h4>
</div>
<br>

### __Google Slides__

Find the slides here:
[Slides API](https://docs.google.com/presentation/d/1AjiYJL5TblgwT1Ez-yW7sxlfe1UX5-NsePa4Q1jLXpE/edit?usp=sharing)

### __Structure__

1. What is an API 

    1.1 SOAP APIs
    
    1.2 REST APIs
    
    
2. Why use APIs


3. Retrieving data from an API

    3.1 Making a request
    
    3.2 Dealing with the response


4. Interactive Example

## __1. What is an API?__

An API (Application Programming Interface) is, as the name states, an interface that enables the exchange of information and data between two agents (e.g. a website or database and a user). The API itself does not hold the data. It only handles the communication between the two sides that want to exchange data.


There are many kinds of APIs:

* The big social media platforms usually have an API you can use to retrieve data, e.g.
    * [Twitter](https://developer.twitter.com/en/docs/twitter-api) (to retrieve information on users and tweets, dates, likes and retweets; unfortunately now behind a paywall)
    * [YouTube](https://developers.google.com/youtube/v3/docs) (for information on channels, videos, followers, clicks etc.)
    * Reddit (on subreddits and comments)
    * [Spotify](https://developer.spotify.com/documentation/web-api) (on artists, songs, albums)
* Imagine your company sells a product online. Your website should display the correct stock of each item, which is stored in an external warehouse. The warehouse has its own system to keep track of the stock (and that system could change at some point). If your website gets the stock information from the warehouse system through an API, it is independent of the actual system the warehouse is using.
* You can also use APIs for services such as classification.
    * The [Perspective API](https://perspectiveapi.com/) is an API which classifies the degree of toxicity in a text (on a scale of 0 to 1)
    * The [Spotify API](https://developer.spotify.com/documentation/web-api) can determine some features of a song (danceability, tempo, instrumentalness, etc.)
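The warehouse example can be sketched in plain Python as an analogy. All classes and names below are hypothetical, and a real API would sit behind an HTTP endpoint rather than a Python method. The point is only that the website code depends on the interface (`get_stock`), not on how the warehouse stores its data:

```python
from typing import Protocol


class StockAPI(Protocol):
    """The interface the website relies on: only this method is known to it."""

    def get_stock(self, item_id: str) -> int: ...


class OldWarehouseSystem:
    """Hypothetical legacy system that stores stock in a plain dictionary."""

    def __init__(self) -> None:
        self._stock = {"mug": 12, "shirt": 0}

    def get_stock(self, item_id: str) -> int:
        return self._stock.get(item_id, 0)


class NewWarehouseSystem:
    """Hypothetical replacement that stores stock as (item, count) rows."""

    def __init__(self) -> None:
        self._rows = [("mug", 12), ("shirt", 0)]

    def get_stock(self, item_id: str) -> int:
        return next((count for item, count in self._rows if item == item_id), 0)


def render_product_page(item_id: str, api: StockAPI) -> str:
    """Website code: it only talks to the interface, never to the storage."""
    stock = api.get_stock(item_id)
    return f"{item_id}: {'in stock' if stock > 0 else 'sold out'} ({stock})"


# Swapping the warehouse system does not require changing the website code
print(render_product_page("mug", OldWarehouseSystem()))
print(render_product_page("mug", NewWarehouseSystem()))
```

If the warehouse replaces its system, only the class behind the interface changes; `render_product_page` stays the same.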



![title](src/API_Illustration.jpg)


There are different types of APIs, differing in their architectural structure and usage (for more information, see this [blog](https://blog.postman.com/soap-vs-rest/)). The two major types are:

#### __1.1 SOAP APIs__

SOAP stands for **S**imple **O**bject **A**ccess **P**rotocol and is a type of API that is usually associated with enterprises.
SOAP APIs have

* Strict security requirements and regulations (stricter contract-based usage)
* Therefore, they are a better fit for systems that send and receive highly sensitive data, e.g. financial and healthcare information
* Mostly designed around actions
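To illustrate what "designed around actions" looks like, here is a sketch that only builds (and does not send) a SOAP 1.1 request body with the standard library. The action `GetStock`, its argument `ItemId` and the service namespace are hypothetical. Real SOAP services describe their actions in a WSDL file, and the XML is usually sent via HTTP POST:

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.com/warehouse"  # hypothetical service namespace


def build_soap_request(action: str, **arguments: str) -> str:
    """Wrap a named action and its arguments in a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The body is built around an action (a verb), e.g. GetStock
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{action}")
    for name, value in arguments.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="unicode")


print(build_soap_request("GetStock", ItemId="mug"))
```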


#### __1.2 REST APIs__

REST is an acronym for **RE**presentational **S**tate **T**ransfer and is an API architecture designed for access to web services (whose structure can change often).

The REST architecture ensures consistent access based on a few principles:

* Statelessness: the client (user) provides all necessary details with each request, so the server does not have to keep track of the client's state
* Consistent interface: ensures consistent and simple interactions using common HTTP methods like GET or POST
* Standard media type support: resources are provided in standard data formats such as JSON
* Separation of concerns: ensures that client (user) and server are completely de-coupled and can evolve independently from each other
* Layered architecture: so that the system can be modular and maintainable
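Statelessness and the consistent interface can be made concrete with a small sketch: every request is a plain `GET` on a resource URL and carries everything the server needs (here the API key and a filter). The base URL and the parameter names `key` and `id` are hypothetical. `prepare()` only builds the request, so nothing is sent:

```python
import requests

BASE_URL = "https://api.example.com/v1"  # hypothetical REST API


def build_request(resource: str, api_key: str, **filters: str) -> requests.PreparedRequest:
    """Build (but do not send) a self-contained GET request for a resource."""
    # Every request carries its own credentials and filters: no server-side session
    params = {"key": api_key, **filters}
    return requests.Request("GET", f"{BASE_URL}/{resource}", params=params).prepare()


first = build_request("videos", api_key="MY_KEY", id="abc123")
second = build_request("channels", api_key="MY_KEY", id="xyz789")
print(first.method, first.url)
print(second.method, second.url)
```

Because neither request depends on the other, the server can answer them in any order, or on different machines.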


REST APIs are typically used for public APIs and are ideal for fetching data from the Web. They are much lighter and closer to the HTTP specification than SOAP.
REST APIs are what we are usually interested in.

***

## __2. Why use an API?__

Maybe you are wondering: "Why the hell should I care about APIs if I can also download a CSV file or scrape a website?"

Well, there are several reasons:

* **1. APIs provide a standardised way to access the data.** To retrieve the data, you *don't have to know or understand the internal structure* of how it is stored, only the commands (syntax) for requesting what you want. If you are scraping a web page, for example, you have to know its structure in order to select the HTML elements you want to retrieve. Imagine the website being updated or restructured: your code will stop working because the HTML structure changed. With an API, there is an interface between the actual data and you, the user, which *ensures that data access remains consistent*, even if the website's layout or the database layout changes.

* **2. You want to retrieve only small pieces of data.** When downloading CSV files, you usually get the whole dataset and then need to filter for e.g. the users or dates you are actually interested in. With APIs you can define up front which data you need (e.g. in terms of users, YouTube channels or time frames) and collect only that data.

* **3. Less duplicated work.** You can use services and functions that someone else has already programmed instead of implementing everything yourself.




APIs therefore provide a standardised way of accessing data and enable fast and efficient exchange, regardless of whether the website's layout or the database structure changes.



***

## __3. Retrieving data from an API__

In the same way as with webscraping we also have to make *requests* to an API in order to get a *response*.

![title](src/api-request.svg)


* 1. The first step is to **request** a URL. 
    - APIs have (multiple) so-called **endpoints**. These are different endings (suffixes) to the same base URL and provide access to different pieces of information. For example, the YouTube API has endpoints for channels, videos, comments, playlists etc. This way, you can clearly specify which information you need to retrieve.
        - `https://www.googleapis.com/youtube/v3/videos`
        - `https://www.googleapis.com/youtube/v3/channels`
        - `https://www.googleapis.com/youtube/v3/comments`

    - Using the API's **parameters** you can further refine your request, for example by specifying the channel IDs for which you want to retrieve data. For the YouTube `channels` endpoint these include:
        - `id`: the channel ID(s)
        - `part=statistics`: the channel's statistics (like viewCount, subscriberCount, videoCount etc.)
        - `part=status`: the channel's status (e.g. whether it is private or public)
 

* 2. Once you've sent the request, you will **get a response** in return (usually in JSON format), which you have to process further to extract and store the desired data.
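Step 1 can be sketched for the YouTube `channels` endpoint: base URL, endpoint and parameters are joined into one request URL. The sketch only builds the URL and sends nothing. `CHANNEL_ID` and `YOUR_API_KEY` are placeholders, and `urlencode` takes care of escaping special characters such as the comma between the two `part` values:

```python
from urllib.parse import urlencode

BASE_URL = "https://www.googleapis.com/youtube/v3"


def youtube_url(endpoint: str, **params: str) -> str:
    """Join base URL, endpoint and query parameters into one request URL."""
    return f"{BASE_URL}/{endpoint}?{urlencode(params)}"


url = youtube_url(
    "channels",
    part="statistics,status",  # which parts of the channel resource to return
    id="CHANNEL_ID",           # placeholder: replace with a real channel ID
    key="YOUR_API_KEY",        # placeholder: your personal API key
)
print(url)
```

Switching to another endpoint only means changing the first argument, e.g. `youtube_url("videos", ...)`, with the parameters that endpoint expects.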


**Documentation pages**

If you are wondering how to find out about the possible endpoints and parameters, each **APIs documentation page** will give you an overview. 
* The documentation pages might seem quite messy and complex at first. Take your time to read and understand them. They are usually structured in similar ways, so you will get used to working with them.
* You will find all relevant information there
    - e.g. regarding possible endpoints
    - or regarding query parameters
    
* Usually, there are also some examples or a sandbox where you can test your query in advance to see if it works
* Additionally, the documentation pages provide information on the returned response, its content and format

**API-Keys**

Not all APIs are free and publicly available. Often, you will have to register for API access.
This means you need to sign up in some way (usually by providing an e-mail address) in order to get a so-called API key for authentication.
Whether this is free or whether you need to pay for access depends on the specific API.

Once you've got an API key, it needs to be provided as a parameter in your API call (some APIs also accept it as an HTTP header).
Additionally, be aware that some APIs limit e.g. the number of requests you can make with this key.
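A common pattern (a suggestion, not a requirement of any particular API) is to keep the key out of the notebook by reading it from an environment variable, and to check for HTTP status `429 Too Many Requests`, which many APIs return once a request limit is reached. The variable name `NEWS_API_KEY` and both helper functions are hypothetical:

```python
import os

import requests


def get_api_key(env_var: str = "NEWS_API_KEY") -> str:
    """Read an API key from an environment variable instead of hard-coding it."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Please set the environment variable {env_var}")
    return key


def check_rate_limit(response: requests.Response) -> None:
    """Raise a readable error if the API says we sent too many requests."""
    if response.status_code == 429:  # HTTP 429 Too Many Requests
        raise RuntimeError("Request limit reached for this API key, try again later")
    response.raise_for_status()  # raise an HTTPError for any other 4xx/5xx status
```

This way the key does not end up in a shared notebook or a Git repository by accident.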

#### __3.1 Request__

Just as we made requests to websites when scraping them, we also have to make requests to an API. Therefore, we again use the `requests` package.

In [None]:
import requests

There are three easy (and free) APIs that predict a person's [age](https://agify.io/), [gender](https://genderize.io/) and [nationality](https://nationalize.io) based on a given name.

We are going to start with the **age-API**. 

The documentation tells us that, to request the predicted age for a name, we have to access the following URL and pass the name as a query parameter, here for example michael: https://api.agify.io?name=michael

In [None]:
# Define the name we want a prediction for
name = 'Elena'

# Define the URL
age_api_url = f'https://api.agify.io?name={name}'

# sending the request
resp = requests.get(age_api_url)
resp
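Building the URL with an f-string works for simple names, but characters such as spaces or umlauts need URL encoding. A slightly more robust variant (one option among several) lets `requests` build the query string from a `params` dictionary. The helper name `agify_request` is hypothetical, and `prepare()` only builds the request without sending it:

```python
import requests

AGIFY_URL = "https://api.agify.io"


def agify_request(name: str) -> requests.PreparedRequest:
    """Prepare (without sending) a GET request to the age API for one name."""
    # requests builds and URL-encodes the query string from the params dict
    return requests.Request("GET", AGIFY_URL, params={"name": name}).prepare()


print(agify_request("Elena").url)
print(agify_request("Anna Maria").url)  # the space is encoded automatically

# To actually send the request, pass params directly to requests.get:
# resp = requests.get(AGIFY_URL, params={"name": "Elena"})
```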

#### __3.2 Response__

Like with scraping websites, the response comes with a status code indicating whether the request worked (e.g. `200` means success).

Apart from that, we also get the response content.
This content (the body) is in JSON format, which is similar to a Python dictionary.

To access the prediction (the data), we parse the response body into Python objects by calling `.json()` on the response
(analogous to websites, where we used `soup = BeautifulSoup(response.content, 'html.parser')` to parse the HTML code).

In [None]:
# Transform into JSON
resp.json()
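Before calling `.json()`, it can be worth checking the status code, because an error response may not contain the data you expect (or may not be JSON at all). A minimal sketch with a hypothetical helper `parse_response`; it works with any response object that has a `status_code` attribute and a `.json()` method:

```python
def parse_response(resp):
    """Return the parsed JSON body, or raise if the request failed."""
    if resp.status_code != 200:
        raise ValueError(f"Request failed with status code {resp.status_code}")
    return resp.json()  # JSON text -> Python dict


# Usage with the response from above:
# data = parse_response(resp)
```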

**JSON files in Python**

JSON is short for 'JavaScript Object Notation' and is an easy-to-read text format.
Since it is independent of any programming language, it is mainly used to exchange information between applications.

JSON files are similar in structure to Python dictionaries. Below you can see an example response from the Twitter API:

![title](src/JSON_response.png)
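The mapping between JSON and Python can be seen with the standard `json` module, which parses JSON text much like `.json()` does for a response body. The JSON document below is made up for illustration:

```python
import json

# A made-up JSON text, as it might arrive in a response body
raw = """
{
  "name": "elena",
  "count": 1234,
  "verified": true,
  "age": null,
  "country": [
    {"country_id": "IT", "probability": 0.3},
    {"country_id": "ES", "probability": 0.2}
  ]
}
"""

data = json.loads(raw)  # parse JSON text into Python objects

# JSON object -> dict, array -> list, true -> True, null -> None
print(type(data).__name__, type(data["country"]).__name__)  # dict list
print(data["verified"], data["age"])  # True None

# Nested values are reached by chaining keys and list indices
print(data["country"][0]["country_id"])  # IT
```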

Returning to our example with the age API: we already got a response, which we parsed with `.json()`:

In [None]:
resp.json()

The syntax to access the desired data within the JSON response is similar to dictionaries.

In [None]:
data = resp.json()

data["age"]
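`data["age"]` raises a `KeyError` if the key is missing, whereas `dict.get()` returns `None` instead. A small sketch with a hypothetical helper `get_age`; the dictionaries below are made-up examples, not real API responses:

```python
def get_age(data: dict):
    """Return the predicted age, or None if there is no prediction."""
    # .get() returns None instead of raising KeyError when the key is missing
    return data.get("age")


print(get_age({"count": 100, "name": "elena", "age": 50}))  # 50
print(get_age({"count": 0, "name": "xqzt", "age": None}))   # None
print(get_age({"error": "something went wrong"}))           # None
```

Which of these cases actually occur for unknown or invalid names is described in the API's documentation.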

***

## __4. Interactive Tutorial Part:__



Have a look at the documentation for the [gender](https://genderize.io/) and [nationality](https://nationalize.io) APIs and get familiar with how they work.

<div class="alert alert-block alert-info">
    <b>Exercise 1 </b>: Write a function (predict_personal_data()) which returns the predicted age, gender and nationality for a given name. 
    <b>Bonus: </b> Try to handle exceptions, e.g. if you pass a name that is not valid or not covered by the API
</div>

In [None]:
def predict_personal_data(name):
    # Your code here: query the age, gender and nationality APIs for `name`
    pass

In [None]:
# Test your function

predict_personal_data('Elena')

# Test if the error handling works, e.g. with an empty name
predict_personal_data('')

<div class="alert alert-block alert-info">
    <b>Exercise 2 </b>: News API  
</div>

We now want to retrieve data from the [News API](https://newsapi.org/).

This [website](https://newsapi.org/docs/authentication) tells us that we need to authenticate in order to get access.

* 1.) Take some minutes to familiarize yourself with the website and documentation
* 2.) Sign up to get a key and be able to use the API
* 3.) Inspect the [documentation page](https://newsapi.org/docs):
    - What is the base URL?
    - Can you find information on the available endpoints?
    - What are parameters you can use to refine your search?
        * We want to get all English articles that were published during the last month and whose headline contained (Donald) Trump, sorted by popularity.

In [None]:
import requests
import pandas as pd

In [None]:
# 2) Sign up for an API key

API_key = '' # Add your personal API key here



The base URL is:

`GET https://newsapi.org`


and the three available endpoints are:

* everything: `/v2/everything`
* top-headlines: `/v2/top-headlines`
* sources: `/v2/top-headlines/sources`
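As a small sketch, the full request URL for each endpoint is simply the base URL followed by the endpoint path. The dictionary names below are our own choice; query parameters would then be appended after a `?`:

```python
BASE_URL = "https://newsapi.org"

ENDPOINTS = {
    "everything": "/v2/everything",
    "top-headlines": "/v2/top-headlines",
    "sources": "/v2/top-headlines/sources",
}

# The full request URL is the base URL followed by the endpoint path
endpoint_urls = {name: BASE_URL + path for name, path in ENDPOINTS.items()}

for name, url in endpoint_urls.items():
    print(f"{name:>14}: {url}")
```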

<div class="alert alert-block alert-info">
    <b>Exercise 2 a) </b> Which endpoint and parameters do you need to request all English articles that were published during the last month and whose headline contained (Donald) Trump, sorted by popularity? 
</div>

Parameters we need:




<div class="alert alert-block alert-info">
    <b>Exercise 2 b) </b> Define the parameters and build the URL in order to request it from the API. Send the request and receive a response. Store it as a JSON object.
</div>

In [None]:
# Solution








<div class="alert alert-block alert-info">
    <b>Exercise 2 c) </b>: Process and store the response. How many articles about Donald Trump have been published during this time? And how many does your response contain? Try to group the output by date and visualize how many articles have been published each day.
</div>

*Hint:* Have a look at the pandas `json_normalize` function.
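To see what `json_normalize` does, here is a sketch on made-up records that contain a nested `source` object (the values are invented, only the nesting matters). Nested dictionaries are flattened into columns with dotted names:

```python
import pandas as pd

# Made-up records with a nested "source" object
articles = [
    {"source": {"id": None, "name": "Paper A"}, "title": "Headline 1", "publishedAt": "2024-01-02T10:00:00Z"},
    {"source": {"id": "paper-b", "name": "Paper B"}, "title": "Headline 2", "publishedAt": "2024-01-02T15:30:00Z"},
    {"source": {"id": None, "name": "Paper A"}, "title": "Headline 3", "publishedAt": "2024-01-03T08:45:00Z"},
]

# json_normalize flattens nested dicts into dotted column names
df = pd.json_normalize(articles)
print(df.columns.tolist())
print(df[["source.name", "title"]])
```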

In [None]:
# Solution






<div class="alert alert-block alert-info">
    <b>Exercise 2 d) </b>: Find out which newspaper has published the most articles about Trump.
</div>

In [None]:
# Solution

