# LECTURE 7 - Data Sourcing with APIs

___

As a rule of thumb for data sourcing, start your analysis as close to the raw data as possible. Data can be structured (CSV, JSON, Excel, SQL), unstructured (PDF, DOC), or binary. To get data online, look first for an accessible database or API; if neither exists, web scraping is an alternative.


**API (Application Programming Interface)**

---

pandas can read many file formats, and CSV is one of them. Another format, and the one we need most often when communicating with APIs, is JSON. Many apps and software products expose APIs, which let us access certain functionality and services from the API provider. Examples are Revolut and N26.
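
To make the difference between the two formats concrete, here is a minimal sketch that parses the same records from CSV text and from JSON text. pandas offers `read_csv` and `read_json` for this; the sketch uses only the standard library so it runs without extra packages. The sample records are hand-written for illustration and are not real API output.

```python
import csv
import io
import json

# Hand-written sample data (an assumption for illustration, not real API output)
csv_text = "city,temp\nLondon,21.3\nParis,24.0\n"
json_text = '[{"city": "London", "temp": 21.3}, {"city": "Paris", "temp": 24.0}]'

# CSV carries no type information: every value arrives as a string
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# JSON keeps numbers, strings, lists and nested objects distinct
json_rows = json.loads(json_text)

print(csv_rows[0]["temp"], type(csv_rows[0]["temp"]).__name__)
print(json_rows[0]["temp"], type(json_rows[0]["temp"]).__name__)
```

The JSON version preserves the numeric type, which is one reason the format is convenient for exchanging data with APIs.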


While a normal website returns HTML, CSS, and JavaScript when you send an HTTP request, an API typically returns data, most often as JSON, that you can use directly in your code.
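
One way to tell the two kinds of response apart is the `Content-Type` header. The sketch below is a simplified illustration: `parse_body` is a hypothetical helper, and both response bodies are hand-written samples rather than real server output.

```python
import json

def parse_body(content_type, body):
    """Return parsed data for JSON bodies, or the raw text otherwise."""
    # The media type comes before any parameters such as "; charset=utf-8"
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return json.loads(body)
    return body

# Hypothetical response bodies, written by hand for illustration
html_page = parse_body("text/html; charset=utf-8", "<html><body>Hello</body></html>")
api_data = parse_body("application/json", '{"city": "London", "humidity": 51}')

print(type(html_page).__name__)
print(api_data["city"])
```

The HTML body stays a string that would still need scraping, while the JSON body becomes a Python dictionary whose fields can be read by name.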

**RESTful API**

---
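In the API world, RESTful means that resources are addressed by URLs and manipulated with standard HTTP methods, so the full CRUD spectrum (create, read, update, delete) is available.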

Example:
```
HTTP                                CRUD
GET www.airbnb.com/flats            (read)
GET www.airbnb.com/flats/:id        (read)
POST www.airbnb.com/flats           (create)
PUT www.airbnb.com/flats/:id        (modify)
DELETE www.airbnb.com/flats/:id     (delete)
```
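
The mapping in the listing above can be sketched in code. This is an illustration only: the `flats` endpoints are the hypothetical ones from the listing, `build_request` and `CRUD_TO_HTTP` are names invented here, and the requests are built with the standard library but never sent.

```python
from urllib.request import Request

# Base URL taken from the listing above; the endpoints are illustrative only
BASE_URL = "https://www.airbnb.com/flats"

# CRUD action -> HTTP method, following the listing above
CRUD_TO_HTTP = {"create": "POST", "read": "GET", "modify": "PUT", "delete": "DELETE"}

def build_request(action, flat_id=None):
    """Build (but do not send) the request for a CRUD action."""
    url = BASE_URL if flat_id is None else f"{BASE_URL}/{flat_id}"
    return Request(url, method=CRUD_TO_HTTP[action])

examples = [("read", None), ("read", 42), ("create", None), ("modify", 42), ("delete", 42)]
for action, flat_id in examples:
    req = build_request(action, flat_id)
    print(req.get_method(), req.full_url)
```

Note that the same URL can serve several operations; the HTTP method decides whether a flat is read, modified, or deleted.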

### Metaweather API

This is the API we will use in the example. Documentation is available [here](https://www.metaweather.com/api/).
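
The code in this section uses two endpoints: a location search that takes a `query` parameter, and a forecast lookup by location id (`woeid`). As a rough sketch, the URLs could be built with small helpers so that city names containing spaces are encoded safely. The helper names `search_url` and `forecast_url` are assumptions made for this illustration, and the id passed in is only a sample value.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.metaweather.com/api/"

def search_url(city):
    """URL for the location search endpoint, with the query safely encoded."""
    return f"{BASE_URL}location/search/?{urlencode({'query': city})}"

def forecast_url(woeid):
    """URL for the forecast endpoint of a location id (woeid)."""
    return f"{BASE_URL}location/{woeid}"

print(search_url("san francisco"))
print(forecast_url(44418))  # sample id, used here only for illustration
```

Encoding the query this way avoids malformed URLs when the user types spaces or special characters.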



In [12]:
# We need the requests package to call the API
import requests

In [37]:
# First we need to ask the user for a city name for the weather search
city = input('City: \n').lower()

# Then we assign the correct URLs
base_url = 'https://www.metaweather.com/api/'
location_url = f'location/search/?query={city}'

# Then we call the API to get the specific id for the city (called woeid).
# raise_for_status() raises an error if the request failed
response = requests.get(base_url + location_url)
response.raise_for_status()

# We assign the data to a variable and extract the city id
data = response.json()

# If no data is found because the city is not available or is misspelled,
# we tell the user; otherwise we save the city id
try:
    city_id = data[0]['woeid']
    check = True
except IndexError:
    print(f'Unfortunately we could not fetch any data. Either {city} was not spelled correctly or we do not have it in our database')
    check = False

if check:
    # Then we call the API to get the weather data from a new endpoint
    weather_url = f'location/{city_id}'
    response = requests.get(base_url + weather_url)
    forecast = response.json()['consolidated_weather']

    # Finally we display the data
    print(f'Here is the 6-day weather forecast for {city.capitalize()}')
    print('\n')

    for day in forecast:
        print(f'Date: {day["applicable_date"]}')
        print(f'Temperature: {round(day["the_temp"],1)}')
        print(f'Max Temperature: {round(day["max_temp"],1)}')
        print(f'Min Temperature: {round(day["min_temp"],1)}')
        print(f'Humidity: {day["humidity"]}')
        print(f'Wind Speed: {round(day["wind_speed"],1)}')
        print(f'Predictability: {day["predictability"]}')
        print('\n')


Here is the 6-day weather forecast for London


Date: 2020-07-29
Temperature: 21.3
Max Temperature: 21.9
Min Temperature: 14.9
Humidity: 51
Wind Speed: 4.8
Predictability: 71


Date: 2020-07-30
Temperature: 26.7
Max Temperature: 29.1
Min Temperature: 15.5
Humidity: 36
Wind Speed: 3.5
Predictability: 68


Date: 2020-07-31
Temperature: 31.6
Max Temperature: 32.6
Min Temperature: 19.7
Humidity: 39
Wind Speed: 7.3
Predictability: 71


Date: 2020-08-01
Temperature: 23.1
Max Temperature: 25.5
Min Temperature: 17.1
Humidity: 56
Wind Speed: 6.9
Predictability: 73


Date: 2020-08-02
Temperature: 22.3
Max Temperature: 22.9
Min Temperature: 14.9
Humidity: 48
Wind Speed: 5.9
Predictability: 70


Date: 2020-08-03
Temperature: 15.8
Max Temperature: 18.9
Min Temperature: 14.8
Humidity: 76
Wind Speed: 5.2
Predictability: 77


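
The lookup and display steps of the script above could also be factored into small functions, which makes them testable without a network connection. The following is a minimal sketch, not the lecture's own code: `find_woeid` and `format_day` are hypothetical helper names, and the sample payloads are hand-written to mimic the shape of the API responses rather than recorded from the API.

```python
def find_woeid(results):
    """Return the woeid of the first search hit, or None if there are no hits."""
    return results[0]["woeid"] if results else None

def format_day(day):
    """Render one entry of 'consolidated_weather' as the text block printed above."""
    return "\n".join([
        f"Date: {day['applicable_date']}",
        f"Temperature: {round(day['the_temp'], 1)}",
        f"Max Temperature: {round(day['max_temp'], 1)}",
        f"Min Temperature: {round(day['min_temp'], 1)}",
        f"Humidity: {day['humidity']}",
        f"Wind Speed: {round(day['wind_speed'], 1)}",
        f"Predictability: {day['predictability']}",
    ])

# Hand-written samples shaped like the API responses (not recorded responses)
sample_search = [{"title": "London", "woeid": 44418}]
sample_day = {
    "applicable_date": "2020-07-29",
    "the_temp": 21.34,
    "max_temp": 21.9,
    "min_temp": 14.92,
    "humidity": 51,
    "wind_speed": 4.81,
    "predictability": 71,
}

print(find_woeid(sample_search))
print(find_woeid([]))
print(format_day(sample_day))
```

Returning `None` for an empty search result replaces the `check` flag: the caller can test the return value directly before requesting the forecast.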
