# 0. Goal

The goal for today is to automate the collection of weather data for multiple cities.

* Explore the data received from the API. What’s useful? What’s not?
* Extract the information you see as useful and make a DataFrame from it
* Consolidate the steps you took to make the DataFrame into a single function. The function should output a DataFrame with the weather of multiple cities, when given a list of cities as an input.
* You should be able to use this function to get the weather data for the cities you web scraped yesterday

# 1. Import Libraries

In [None]:
import pandas as pd

In [None]:
import requests

In [None]:
import json

In [None]:
from datetime import datetime

In [None]:
# the API key is stored in a separate, private module `yans_keys.py` (not shared here)
from yans_keys import my_weather_key

In [None]:
API_key = my_weather_key

In [None]:
city = "Berlin"

In [None]:
url = f"http://api.openweathermap.org/data/2.5/forecast?q={city}&appid={API_key}&units=metric"

**`units=metric` is important: it returns temperatures in Celsius (the API's default unit is Kelvin).**
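The f-string above works for single-word city names. As a minimal sketch, the same request URL can be built with the standard library's `urllib.parse.urlencode`, which also encodes city names that contain spaces (such as "Frankfurt am Main"). The helper name `build_forecast_url` and the placeholder key `YOUR_KEY` are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.openweathermap.org/data/2.5/forecast"

def build_forecast_url(city, api_key, units="metric"):
    """Return the 5-day / 3-hour forecast URL for one city (hypothetical helper)."""
    # urlencode escapes spaces and special characters in the query values
    query = urlencode({"q": city, "appid": api_key, "units": units})
    return f"{BASE_URL}?{query}"

print(build_forecast_url("Frankfurt am Main", "YOUR_KEY"))
# http://api.openweathermap.org/data/2.5/forecast?q=Frankfurt+am+Main&appid=YOUR_KEY&units=metric
```

With `requests`, passing the same dictionary as `requests.get(BASE_URL, params={...})` lets the library do this encoding for you.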

In [None]:
response = requests.get(url)

In [None]:
# checking if request is successful

print("response: ", response.status_code) # 200 status code means OK!

response:  200


In [None]:
# parsing the JSON response body into a Python dictionary

wea_resp = response.json()

In [None]:
# checking the number of top-level keys in the response

len(wea_resp)

5

# 2. Exploring Data

## 2.1 Keys in `wea_resp`

In [None]:
wea_resp.keys()

dict_keys(['cod', 'message', 'cnt', 'list', 'city'])

**`cod`, `message`, `cnt` have no further keys.
We will look into `list` and `city`.**

### 2.1.1 Keys in `wea_resp` `list`

In [None]:
wea_resp["list"][0].keys()

dict_keys(['dt', 'main', 'weather', 'clouds', 'wind', 'visibility', 'pop', 'sys', 'dt_txt'])

**`dt`, `visibility`, `pop`, `dt_txt` have no further keys. We will look into `main`, `weather`, `clouds`, `wind` and `sys`.**

#### 2.1.1.1 Keys in `wea_resp` `list` `main`

In [None]:
wea_resp["list"][0]["main"].keys()

dict_keys(['temp', 'feels_like', 'temp_min', 'temp_max', 'pressure', 'sea_level', 'grnd_level', 'humidity', 'temp_kf'])

#### 2.1.1.2 Keys in `wea_resp` `list` `weather`

In [None]:
wea_resp["list"][0]["weather"][0].keys()

dict_keys(['id', 'main', 'description', 'icon'])

#### 2.1.1.3 Keys in `wea_resp` `list` `clouds`

In [None]:
wea_resp["list"][0]["clouds"].keys()

dict_keys(['all'])

#### 2.1.1.4 Keys in `wea_resp` `list` `wind`

In [None]:
wea_resp["list"][0]["wind"].keys()

dict_keys(['speed', 'deg', 'gust'])

#### 2.1.1.5 Keys in `wea_resp` `list` `sys`

In [None]:
wea_resp["list"][0]["sys"].keys()

dict_keys(['pod'])

### 2.1.2 Keys in `wea_resp` `city`

In [None]:
wea_resp["city"].keys()

dict_keys(['id', 'name', 'coord', 'country', 'population', 'timezone', 'sunrise', 'sunset'])

**Only `coord` has further keys, as shown below; the other keys hold single values.**

#### 2.1.2.1 Keys in `wea_resp` `city` `coord`

In [None]:
wea_resp["city"]["coord"].keys()

dict_keys(['lat', 'lon'])
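Section 2.1 inspects each level by hand. A minimal sketch that walks the nested structure recursively and yields every key path, looking only at the first element of each list (as done above with `[0]`). The function name `key_paths` is hypothetical, and the sample is a trimmed copy of the outputs shown above.

```python
def key_paths(obj, prefix=""):
    """Yield dotted key paths for nested dicts; lists are explored via their first element."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from key_paths(value, path)
    elif isinstance(obj, list) and obj:
        # mark list levels with [0], matching the manual exploration above
        yield from key_paths(obj[0], f"{prefix}[0]")

# trimmed sample of the forecast response shown above
sample = {
    "cnt": 40,
    "list": [{"dt_txt": "2022-11-03 09:00:00",
              "main": {"temp": 8.96, "humidity": 80},
              "weather": [{"main": "Clear", "description": "clear sky"}]}],
    "city": {"name": "Berlin", "coord": {"lat": 52.5244, "lon": 13.4105}},
}

for path in key_paths(sample):
    print(path)
```

Run on the full `wea_resp`, this lists every field in one go, including optional ones such as `rain` if the first entry has them.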

## 2.2 Trying the `pd.json_normalize` method

In [None]:
pd.json_normalize(wea_resp)

|   | cod | message | cnt | list | city.id | city.name | city.coord.lat | city.coord.lon | city.country | city.population | city.timezone | city.sunrise | city.sunset |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 200 | 0 | 40 | [{'dt': 1667466000, 'main': {'temp': 8.96, 'fe... | 2950159 | Berlin | 52.5244 | 13.4105 | DE | 1000000 | 3600 | 1667455553 | 1667489632 |


In [None]:
pd.json_normalize(wea_resp["list"]).head()

|   | dt | weather | visibility | pop | dt_txt | main.temp | main.feels_like | main.temp_min | main.temp_max | main.pressure | main.sea_level | main.grnd_level | main.humidity | main.temp_kf | clouds.all | wind.speed | wind.deg | wind.gust | sys.pod | rain.3h |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1667466000 | [{'id': 800, 'main': 'Clear', 'description': '... | 10000 | 0.0 | 2022-11-03 09:00:00 | 8.96 | 6.57 | 8.96 | 10.73 | 1008 | 1008 | 1012 | 80 | -1.77 | 0 | 4.34 | 169 | 8.85 | d |  |
| 1 | 1667476800 | [{'id': 802, 'main': 'Clouds', 'description': ... | 10000 | 0.0 | 2022-11-03 12:00:00 | 10.47 | 9.35 | 10.47 | 13.49 | 1010 | 1010 | 1009 | 68 | -3.02 | 33 | 4.13 | 159 | 7.25 | d |  |
| 2 | 1667487600 | [{'id': 803, 'main': 'Clouds', 'description': ... | 10000 | 0.0 | 2022-11-03 15:00:00 | 11.52 | 10.32 | 11.52 | 12.8 | 1011 | 1011 | 1006 | 61 | -1.28 | 67 | 4.12 | 138 | 9.05 | d |  |
| 3 | 1667498400 | [{'id': 804, 'main': 'Clouds', 'description': ... | 10000 | 0.0 | 2022-11-03 18:00:00 | 11.48 | 10.28 | 11.48 | 11.48 | 1009 | 1009 | 1004 | 61 | 0.0 | 100 | 4.77 | 140 | 11.6 | n |  |
| 4 | 1667509200 | [{'id': 804, 'main': 'Clouds', 'description': ... | 10000 | 0.0 | 2022-11-03 21:00:00 | 10.79 | 9.62 | 10.79 | 10.79 | 1007 | 1007 | 1002 | 65 | 0.0 | 100 | 3.96 | 137 | 10.18 | n |  |


In [None]:
pd.json_normalize(wea_resp["list"][0]["weather"])

|   | id | main | description | icon |
|---|---|---|---|---|
| 0 | 800 | Clear | clear sky | 01d |


- **`pd.json_normalize` gets complicated here: it flattens nested dictionaries, but columns such as `list` and `weather` hold lists of dictionaries, which it leaves as they are.**
- **Looping through the JSON response looks easier at this point.**
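For comparison, `pd.json_normalize` can unpack the nested `weather` list when told where the records are. A sketch, assuming only the first two forecast entries (trimmed copies of the values shown above): `record_path` points at the list of dictionaries and `meta` pulls fields from each parent entry.

```python
import pandas as pd

# two forecast entries, trimmed from wea_resp["list"] above
forecast_list = [
    {"dt_txt": "2022-11-03 09:00:00", "main": {"temp": 8.96},
     "weather": [{"id": 800, "main": "Clear", "description": "clear sky"}]},
    {"dt_txt": "2022-11-03 12:00:00", "main": {"temp": 10.47},
     "weather": [{"id": 802, "main": "Clouds", "description": "scattered clouds"}]},
]

flat = pd.json_normalize(
    forecast_list,
    record_path="weather",              # one row per dictionary inside "weather"
    meta=["dt_txt", ["main", "temp"]],  # parent fields; the nested path becomes column "main.temp"
)
print(flat)
```

It works, but every additional field needs its own `meta` path, so for the handful of fields chosen below the explicit loop stays easier to read.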

# 3. Extracting Useful Data

## Data that's useful

1. Time_Stamp - "list.dt_txt"
2. Temperature - "list.main.temp"
3. Feels_Temperature - "list.main.feels_like"
4. Humidity - "list.main.humidity"
5. Weather - "list.weather[0].main"
6. Weather_Desc - "list.weather[0].description"
7. Wind_Speed - "list.wind.speed"
8. Risk_Rain - "list.pop" (probability of precipitation, from 0 to 1)
9. City - "city.name"
10. Country - "city.country"

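## Other useful data

1. Time the information was retrieved (taken from `datetime.now()`, not from the API response)
2. Forecast amount of rain - "list.rain.3h" (only present when rain is forecast)
3. Forecast amount of snow - "list.snow.3h" (only present when snow is forecast)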
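The mapping above can also be written down as data. A minimal sketch, assuming a hypothetical `FIELDS` dictionary of column name to key path (integers index into lists such as `weather`) and a hypothetical helper `get_path` that returns a default when a key is missing. This is an alternative to the explicit loop used later, not the notebook's own method.

```python
# column name -> path inside one entry of wea_resp["list"] (hypothetical mapping);
# city and country come from wea_resp["city"] and are added separately
FIELDS = {
    "forecast_time": ["dt_txt"],
    "temperature": ["main", "temp"],
    "feels_like_temperature": ["main", "feels_like"],
    "humidity": ["main", "humidity"],
    "weather_outlook": ["weather", 0, "main"],
    "weather_detailed": ["weather", 0, "description"],
    "wind_speed": ["wind", "speed"],
    "risk_of_rain": ["pop"],
    "amount_of_rain": ["rain", "3h"],  # absent when no rain is forecast
}

def get_path(obj, path, default=0):
    """Follow `path` through nested dicts/lists; return `default` if any step is missing."""
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default
    return obj

# trimmed copy of wea_resp["list"][0] from the outputs in section 3
entry = {"dt_txt": "2022-11-03 09:00:00",
         "main": {"temp": 8.96, "feels_like": 6.57, "humidity": 80},
         "weather": [{"main": "Clear", "description": "clear sky"}],
         "wind": {"speed": 4.34}, "pop": 0}
row = {name: get_path(entry, path) for name, path in FIELDS.items()}
print(row)
```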

### 3.1 Extracting time_stamp

In [None]:
wea_resp["list"][0]["dt_txt"]

'2022-11-03 09:00:00'
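`dt_txt` is the readable form of the Unix timestamp `dt`, and both are in UTC. A short sketch using only `datetime` (values copied from the outputs above) that checks this and shifts the time to local Berlin time with `city.timezone`, the offset from UTC in seconds. This is worth knowing because `info_retrieved_at`, created later with `datetime.now()`, is in the local time of the machine running the notebook.

```python
from datetime import datetime, timedelta, timezone

dt = 1667466000                 # wea_resp["list"][0]["dt"]
dt_txt = "2022-11-03 09:00:00"  # wea_resp["list"][0]["dt_txt"]
tz_offset = 3600                # wea_resp["city"]["timezone"], shift from UTC in seconds

utc_time = datetime.fromtimestamp(dt, tz=timezone.utc)
print(utc_time.strftime("%Y-%m-%d %H:%M:%S") == dt_txt)  # True: dt_txt is UTC

local_time = utc_time.astimezone(timezone(timedelta(seconds=tz_offset)))
print(local_time.strftime("%Y-%m-%d %H:%M:%S"))          # 2022-11-03 10:00:00
```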

### 3.2 Extracting Temperature

In [None]:
wea_resp["list"][0]["main"]

{'temp': 8.96,
 'feels_like': 6.57,
 'temp_min': 8.96,
 'temp_max': 10.73,
 'pressure': 1008,
 'sea_level': 1008,
 'grnd_level': 1012,
 'humidity': 80,
 'temp_kf': -1.77}

In [None]:
wea_resp["list"][0]["main"]["temp"]

8.96

### 3.3 Extracting Feels-like Temperature

In [None]:
wea_resp["list"][0]["main"]["feels_like"]

6.57

### 3.4 Extracting Humidity

In [None]:
wea_resp["list"][0]["main"]["humidity"]

80

### 3.5 Extracting Weather

In [None]:
wea_resp["list"][0]["weather"][0]

{'id': 800, 'main': 'Clear', 'description': 'clear sky', 'icon': '01d'}

In [None]:
wea_resp["list"][0]["weather"][0]["main"]

'Clear'

### 3.6 Extracting Weather Description

In [None]:
wea_resp["list"][0]["weather"][0]["description"]

'clear sky'

### 3.7 Extracting Wind Speed

In [None]:
wea_resp["list"][0]["wind"]

{'speed': 4.34, 'deg': 169, 'gust': 8.85}

In [None]:
wea_resp["list"][0]["wind"]["speed"]

4.34

### 3.8 Extracting Risk of Rain

In [None]:
wea_resp["list"][0]["pop"]

0

### 3.9 Extracting City

In [None]:
wea_resp["city"]

{'id': 2950159,
 'name': 'Berlin',
 'coord': {'lat': 52.5244, 'lon': 13.4105},
 'country': 'DE',
 'population': 1000000,
 'timezone': 3600,
 'sunrise': 1667455553,
 'sunset': 1667489632}

In [None]:
wea_resp["city"]["name"]

'Berlin'

### 3.10 Extracting Country

In [None]:
wea_resp["city"]["country"]

'DE'

# 4. Looping through the JSON response

**We expect 40 entries per city for our DataFrame (one every 3 hours = 8 entries per day, for 5 days).**
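The count follows from 24 h / 3 h = 8 entries per day, times 5 days. A small sketch, assuming the `dt_txt` strings have been collected into a list, that checks consecutive forecasts really are 3 hours apart; the function name `check_three_hourly` is hypothetical.

```python
from datetime import datetime, timedelta

ENTRIES_PER_DAY = 24 // 3       # one forecast every 3 hours
EXPECTED = ENTRIES_PER_DAY * 5  # 5-day forecast -> 40 entries

def check_three_hourly(dt_txts):
    """Return True if consecutive 'YYYY-MM-DD HH:MM:SS' timestamps are exactly 3 hours apart."""
    times = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in dt_txts]
    return all(b - a == timedelta(hours=3) for a, b in zip(times, times[1:]))

print(EXPECTED)  # 40
print(check_three_hourly(["2022-11-03 09:00:00", "2022-11-03 12:00:00",
                          "2022-11-03 15:00:00"]))  # True
```

On the live response, `check_three_hourly([w["dt_txt"] for w in wea_resp["list"]])` together with `len(wea_resp["list"]) == wea_resp["cnt"]` would confirm the expected shape before building the DataFrame.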

In [None]:
# getting a timestamp for when data is retrieved

now = datetime.now()
now

datetime.datetime(2022, 11, 3, 9, 45, 15, 545806)

In [None]:
# we'll store the information in this dictionary

wea_dict = {"city": [],
            "country": [],
            "forecast_time": [],
            "weather_outlook": [],
            "weather_detailed": [],
            "temperature": [],
            "feels_like_temperature": [],
            "humidity": [],
            "wind_speed": [],
            "risk_of_rain": [],
            "rain": [],
            "snow": [],
            "info_retrieved_at": []}

# start of loop

for wea in wea_resp["list"]:
    wea_dict["city"].append(wea_resp["city"]["name"])
    wea_dict["country"].append(wea_resp["city"]["country"])
    wea_dict["forecast_time"].append(wea["dt_txt"])
    wea_dict["weather_outlook"].append(wea["weather"][0]["main"])
    wea_dict["weather_detailed"].append(wea["weather"][0]["description"])
    wea_dict["temperature"].append(wea["main"]["temp"])
    wea_dict["feels_like_temperature"].append(wea["main"]["feels_like"])
    wea_dict["humidity"].append(wea["main"]["humidity"])
    wea_dict["wind_speed"].append(wea["wind"]["speed"])
    wea_dict["risk_of_rain"].append(wea["pop"])

    # rain and snow data are sometimes missing, as it is not always raining or snowing;
    # dict.get() returns the 3-hour amount if present, otherwise the number 0
    wea_dict["rain"].append(wea.get("rain", {}).get("3h", 0))
    wea_dict["snow"].append(wea.get("snow", {}).get("3h", 0))

    wea_dict["info_retrieved_at"].append(now.strftime("%d/%m/%Y %H:%M:%S"))

# 5. Creating a DataFrame

**We will convert the dictionary `wea_dict` to a DataFrame `wea_df`.**

In [None]:
wea_df = pd.DataFrame(wea_dict)

wea_df.head()

|   | city | country | forecast_time | weather_outlook | weather_detailed | temperature | feels_like_temperature | humidity | wind_speed | risk_of_rain | rain | snow | info_retrieved_at |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Berlin | DE | 2022-11-03 09:00:00 | Clear | clear sky | 8.96 | 6.57 | 80 | 4.34 | 0.0 | 0 | 0 | 03/11/2022 09:45:15 |
| 1 | Berlin | DE | 2022-11-03 12:00:00 | Clouds | scattered clouds | 10.47 | 9.35 | 68 | 4.13 | 0.0 | 0 | 0 | 03/11/2022 09:45:15 |
| 2 | Berlin | DE | 2022-11-03 15:00:00 | Clouds | broken clouds | 11.52 | 10.32 | 61 | 4.12 | 0.0 | 0 | 0 | 03/11/2022 09:45:15 |
| 3 | Berlin | DE | 2022-11-03 18:00:00 | Clouds | overcast clouds | 11.48 | 10.28 | 61 | 4.77 | 0.0 | 0 | 0 | 03/11/2022 09:45:15 |
| 4 | Berlin | DE | 2022-11-03 21:00:00 | Clouds | overcast clouds | 10.79 | 9.62 | 65 | 3.96 | 0.0 | 0 | 0 | 03/11/2022 09:45:15 |
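`forecast_time` and `info_retrieved_at` are stored as strings in two different formats. A sketch of converting them to datetime columns with `pd.to_datetime`, using a two-row frame copied from the output above; the explicit `format` strings matter because `info_retrieved_at` puts the day first. If `rain`/`snow` were filled with the string `"0"` (as the original loop did), `pd.to_numeric` turns them back into numbers.

```python
import pandas as pd

# two rows copied from wea_df.head() above; rain holds strings as a "0"-filling loop would produce
wea_df = pd.DataFrame({
    "forecast_time": ["2022-11-03 09:00:00", "2022-11-03 12:00:00"],
    "info_retrieved_at": ["03/11/2022 09:45:15", "03/11/2022 09:45:15"],
    "rain": ["0", "0"],
})

wea_df["forecast_time"] = pd.to_datetime(wea_df["forecast_time"], format="%Y-%m-%d %H:%M:%S")
# day/month/year, as written by now.strftime("%d/%m/%Y %H:%M:%S")
wea_df["info_retrieved_at"] = pd.to_datetime(wea_df["info_retrieved_at"], format="%d/%m/%Y %H:%M:%S")
# string "0" -> number 0
wea_df["rain"] = pd.to_numeric(wea_df["rain"])

print(wea_df.dtypes)
```

With real datetime columns, sorting, filtering by day or computing how far ahead each forecast is (`forecast_time - info_retrieved_at`) become one-liners.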


# 6. Creating a function

**Wrapping the steps above in a function lets us pass a list of cities as input and get back a single DataFrame (40 rows per city).**

In [None]:
def get_wea(cities):

    API_key = my_weather_key

    now = datetime.now()

    wea_dict = {"city": [],
                "country": [],
                "forecast_time": [],
                "weather_outlook": [],
                "weather_detailed": [],
                "temperature": [],
                "feels_like_temperature": [],
                "humidity": [],
                "wind_speed": [],
                "risk_of_rain": [],
                "amount_of_rain": [],
                "amount_of_snow": [],
                "info_retrieved_at": []}

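    for city in cities:
        url = f"http://api.openweathermap.org/data/2.5/forecast?q={city}&appid={API_key}&units=metric"
        response = requests.get(url)
        response.raise_for_status()  # fail with a clear HTTP error (e.g. unknown city) instead of a KeyError later
        wea_resp = response.json()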

        for wea in wea_resp["list"]:
            wea_dict["city"].append(wea_resp["city"]["name"])
            wea_dict["country"].append(wea_resp["city"]["country"])
            wea_dict["forecast_time"].append(wea["dt_txt"])
            wea_dict["weather_outlook"].append(wea["weather"][0]["main"])
            wea_dict["weather_detailed"].append(wea["weather"][0]["description"])
            wea_dict["temperature"].append(wea["main"]["temp"])
            wea_dict["feels_like_temperature"].append(wea["main"]["feels_like"])
            wea_dict["humidity"].append(wea["main"]["humidity"])
            wea_dict["wind_speed"].append(wea["wind"]["speed"])
            wea_dict["risk_of_rain"].append(wea["pop"])

            # rain/snow keys only exist when precipitation is forecast; default to the number 0
            wea_dict["amount_of_rain"].append(wea.get("rain", {}).get("3h", 0))
            wea_dict["amount_of_snow"].append(wea.get("snow", {}).get("3h", 0))

            wea_dict["info_retrieved_at"].append(now.strftime("%d/%m/%Y %H:%M:%S"))

    return pd.DataFrame(wea_dict)


In [None]:
# Calling the function by taking a list of cities as input

get_wea(["Frankfurt", "Berlin", "Cologne", "Munich", "Hamburg"])

|   | city | country | forecast_time | weather_outlook | weather_detailed | temperature | feels_like_temperature | humidity | wind_speed | risk_of_rain | amount_of_rain | amount_of_snow | info_retrieved_at |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Frankfurt am Main | DE | 2022-11-11 12:00:00 | Clouds | overcast clouds | 9.32 | 8.61 | 90 | 1.76 | 0.00 | 0 | 0 | 11/11/2022 12:56:17 |
| 1 | Frankfurt am Main | DE | 2022-11-11 15:00:00 | Clouds | overcast clouds | 10.24 | 9.41 | 80 | 0.72 | 0.00 | 0 | 0 | 11/11/2022 12:56:17 |
| 2 | Frankfurt am Main | DE | 2022-11-11 18:00:00 | Clouds | overcast clouds | 9.86 | 9.86 | 75 | 0.66 | 0.00 | 0 | 0 | 11/11/2022 12:56:17 |
| 3 | Frankfurt am Main | DE | 2022-11-11 21:00:00 | Clouds | broken clouds | 9.18 | 9.18 | 73 | 0.70 | 0.00 | 0 | 0 | 11/11/2022 12:56:17 |
| 4 | Frankfurt am Main | DE | 2022-11-12 00:00:00 | Clouds | broken clouds | 8.49 | 8.49 | 73 | 0.84 | 0.00 | 0 | 0 | 11/11/2022 12:56:17 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | Hamburg | DE | 2022-11-15 21:00:00 | Rain | light rain | 11.09 | 10.66 | 92 | 4.10 | 0.87 | 0.65 | 0 | 11/11/2022 12:56:17 |
| 196 | Hamburg | DE | 2022-11-16 00:00:00 | Clouds | overcast clouds | 11.22 | 10.56 | 83 | 4.19 | 0.68 | 0 | 0 | 11/11/2022 12:56:17 |
| 197 | Hamburg | DE | 2022-11-16 03:00:00 | Clouds | overcast clouds | 10.95 | 10.19 | 80 | 4.13 | 0.00 | 0 | 0 | 11/11/2022 12:56:17 |
| 198 | Hamburg | DE | 2022-11-16 06:00:00 | Rain | light rain | 11.50 | 10.85 | 82 | 6.22 | 0.44 | 0.36 | 0 | 11/11/2022 12:56:17 |
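`get_wea` stops with an error at the first city name the API cannot resolve, since that response carries no forecast `list`. A minimal sketch of one way to skip such cities and report them instead. Assumptions: the function name `get_wea_skipping_missing` is hypothetical, the HTTP call is passed in as `fetch(city)` (any callable returning the parsed JSON dictionary) so the logic can be tried offline, and the "Atlantis" error body below is made up for illustration. Only a few columns are kept to keep the sketch short.

```python
import pandas as pd

def get_wea_skipping_missing(cities, fetch):
    """Build one forecast DataFrame; cities whose response has no "list" key are skipped."""
    rows, skipped = [], []
    for city in cities:
        resp = fetch(city)
        if "list" not in resp:
            # e.g. an error message instead of a forecast
            skipped.append((city, resp.get("message")))
            continue
        for wea in resp["list"]:
            # one dict per forecast keeps all columns aligned automatically
            rows.append({
                "city": resp["city"]["name"],
                "country": resp["city"]["country"],
                "forecast_time": wea["dt_txt"],
                "temperature": wea["main"]["temp"],
                "amount_of_rain": wea.get("rain", {}).get("3h", 0),
            })
    return pd.DataFrame(rows), skipped

# offline stand-in for the API; "Atlantis" and its error body are made up
fake = {
    "Berlin": {"city": {"name": "Berlin", "country": "DE"},
               "list": [{"dt_txt": "2022-11-03 09:00:00", "main": {"temp": 8.96}}]},
    "Atlantis": {"cod": "404", "message": "city not found"},
}
df, skipped = get_wea_skipping_missing(["Berlin", "Atlantis"], fake.get)
print(df)
print(skipped)  # [('Atlantis', 'city not found')]
```

With the live API, `fetch` would wrap the `requests.get(url).json()` call from section 6, for example as a small lambda that builds the URL for each city.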
