# Project 2 - Weather Predictions

## Project context

There are some days when we would have been happy to keep teleworking.
Among them are those wet and windy days when it is impossible to
maintain a decent hairstyle, despite one's best efforts. Could we
use `Python` to predict what the English call a *bad hair day*?

The aim of the project is to build a *bad hair index* from weather
and location data, and to graphically represent the evolution of this
index in order to determine in advance the days when we would do
better to stay warm at home. To get the right data, we will query
APIs.

An API (Application Programming Interface) is a set of rules and
specifications that applications follow to communicate with each
other. It allows your code to **access external features or data**,
such as those from weather databases or location services. Querying
an API is usually done via the **HTTP protocol**, which is the same
protocol used to load web pages. In this tutorial we will use the
[requests](https://fr.python-requests.org/en/latest/) package, which
simplifies sending HTTP requests and handling the responses.

The APIs we will use are:

- [Nominatim](https://nominatim.org/release-docs/latest/api/Overview/):
a geocoding API offered by **OpenStreetMap** which allows you to
convert a place name into geographic coordinates.
- [Open-Meteo Weather Forecast](https://open-meteo.com/en/docs): an
API that provides detailed weather forecasts.

Let's start by importing the packages we will need during
this project.

In [None]:
import requests
import pandas
import seaborn as sns
import matplotlib.pyplot as plt

import solutions

## Part 1: Retrieving geographic coordinates for a given location

The Open-Meteo forecast API takes as input the geographic coordinates
(latitude, longitude) of the place for which we want predictions. We
could manually retrieve the coordinates of the location we are
interested in, but this would limit the reproducibility of our
analyses for locations other than the one chosen. We will therefore
use a second API, `Nominatim`, to get these coordinates for a given
location.

When working with an API, the first step is always to read its
documentation. It tells us which address to send our requests to, in
what format, and what the API will send back. In our case, the
documentation of `Nominatim` is found at [this
address](https://nominatim.org/release-docs/develop/api/Overview/).
Do not hesitate to browse it quickly to assess what the API can do.

### Question 1

The first essential characteristic of an API is the *endpoint*,
that is, the URL to which we will send requests. In our case, we will
use the `/search` *endpoint*, since we want to find a geographic
object (coordinates) from a location name. The [documentation
page](https://nominatim.org/release-docs/develop/api/Search/)
associated with this *endpoint* gives us all the information we
need:

- the format of a query is
`https://nominatim.openstreetmap.org/search?<params>` where `<params>`
must be replaced by the query parameters, separated by the
symbol `&`
- in the [Structured
Query](https://nominatim.org/release-docs/develop/api/Search/#structured-query)
section, we see that the API accepts `country` and `city` as
parameters, which we will use to configure our query.
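
As an illustration of the query format described above, here is a
minimal sketch that assembles such a URL with the standard library's
`urllib.parse.urlencode`. The function name `build_search_url` is
hypothetical, and this is only one possible way to build the string;
plain string formatting would work as well.

```python
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(country, city):
    """Join the endpoint and the query parameters (hypothetical helper)."""
    # urlencode joins the key=value pairs with "&" and escapes
    # characters that are not allowed in a URL (spaces, accents, ...)
    params = urlencode({"country": country, "city": city})
    return f"{NOMINATIM_SEARCH}?{params}"


print(build_search_url("France", "Montrouge"))
```

Letting `urlencode` do the escaping avoids malformed links for names
that contain spaces or accented characters.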

Define a function `build_request_nominatim` that builds the query URL
for a given country and city.

#### Expected result

In [None]:
url_request_nominatim = solutions.build_request_nominatim("France", "Montrouge")
url_request_nominatim

#### Your turn!

In [None]:
def build_request_nominatim(country, city):
    # Your code here
    return url_request

In [None]:
# Checking the result
url_request_nominatim = build_request_nominatim("France", "Montrouge")
url_request_nominatim

### Question 2

The next step is to send our parameterized request to the API.
To test it beforehand, you can simply paste the address into a
browser and see what the API returns. If the results look
consistent, we can continue. If the API returns an error code, there
is surely an error to be found in the query.

To perform this query from `Python` and retrieve the results, we use
the `requests.get()` function, passing it the URL of the request as
its first argument. In return we get a “response” object, from which
the `JSON` content can be extracted as a `Python` object (a
dictionary or a list of dictionaries) by applying the `.json()`
method to it. You must then browse this object to extract the
relevant information; in our case: latitude and longitude.

To comply with the Nominatim usage policy, you have to pass a second
argument to the function: `headers={"user-agent": "phd python"}`.

Define a function `get_lat_long` that retrieves the (central)
latitude and longitude for a given country and city.
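
To make the "browse the result" step concrete, here is a small
offline sketch. It assumes the parsed `JSON` is a list of matches,
each carrying `lat` and `lon` fields encoded as strings; the helper
name `extract_lat_long` and the sample payload are hypothetical
placeholders, not actual API output. In practice the list would come
from `requests.get(url, headers={"user-agent": "phd python"}).json()`.

```python
def extract_lat_long(results):
    """Return (latitude, longitude) of the first match as floats."""
    if not results:
        raise ValueError("the API returned no match for this query")
    first = results[0]
    # Assumption: coordinates arrive as strings, hence the conversion
    return float(first["lat"]), float(first["lon"])


# Placeholder payload shaped like a parsed response (made-up values)
sample_results = [{"lat": "48.8", "lon": "2.3", "display_name": "Montrouge"}]

lat, long = extract_lat_long(sample_results)
print(lat, long)
print(type(lat))
```

Checking for an empty list first gives a clearer error than an
`IndexError` when the place name is misspelled.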

#### Expected result

In [None]:
lat, long = solutions.get_lat_long(query=url_request_nominatim)
print(lat, long)
print(type(lat))
print(type(long))

#### Your turn!

In [None]:
def get_lat_long(query):
    # Your code here
    return latitude, longitude

In [None]:
# Checking the result
lat, long = get_lat_long(query=url_request_nominatim)
print(lat, long)
print(type(lat))
print(type(long))

## Part 2: Retrieving Weather Forecasts

Now that we can retrieve the coordinates associated with a given
location, we can query the `open-meteo.com` API to get the weather
forecast data associated with these coordinates. Here again, the
first step is to look at the documentation
([homepage](https://open-meteo.com/),
[doc](https://open-meteo.com/en/docs)), which provides us with
several pieces of information:

- the *endpoint* for the forecast API is
`https://api.open-meteo.com/v1/forecast`
- the API expects as input a `latitude` and a `longitude`, as well as
the desired weather variables. For our problem, we will retrieve
information about the humidity level (`relativehumidity_2m`) and
wind speed (`windspeed_10m`)
- by default, the API returns 7-day predictions
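
The points above can be sketched as follows. This is an illustration
under assumptions rather than a reference implementation: the helper
name `build_forecast_url` is hypothetical, and the sketch assumes the
weather variables are passed as a comma-separated list through an
`hourly` parameter, which should be checked against the Open-Meteo
documentation.

```python
from urllib.parse import urlencode

OPEN_METEO_FORECAST = "https://api.open-meteo.com/v1/forecast"


def build_forecast_url(latitude, longitude):
    """Build a forecast query for the two variables we need."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        # Assumption: hourly variables form a comma-separated list
        "hourly": "relativehumidity_2m,windspeed_10m",
    }
    # safe="," keeps the comma readable instead of escaping it as %2C
    return f"{OPEN_METEO_FORECAST}?{urlencode(params, safe=',')}"


print(build_forecast_url(48.8, 2.3))
```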

### Question 3

Knowing all this information and using the documentation,
define a function `build_request_open_meteo` that builds the query
URL for a given latitude and longitude. Again, it is possible to test
the validity of the query by opening the link in a browser and
checking that the results returned appear consistent.

#### Expected result

In [None]:
url_request_open_meteo = solutions.build_request_open_meteo(latitude=lat, longitude=long)
url_request_open_meteo

#### Your turn!

In [None]:
def build_request_open_meteo(latitude, longitude):
    # Your code here
    return url_request

In [None]:
# Checking the result
url_request_open_meteo = build_request_open_meteo(latitude=lat, longitude=long)
url_request_open_meteo

### Question 4

Again, we use the `requests.get()` function to submit the
request to the API. We get a “response” object in return, from which
we can extract the `JSON` content as a `Python` dictionary by
applying the `.json()` method.

But what happens if the submitted query is invalid (typo, missing
parameters, etc.)? In this case, the API returns an error. The
response object contains a `.status_code` attribute that gives the
response code of the request. The code `200` indicates a successful
request; for our purposes, any other code is treated as an error.

Define a function `get_meteo_data` that retrieves the full dictionary
of data returned by the API in response to our request. The behavior
of the function must however depend on the response code of the
request:

- if the code is `200`, the function returns the dictionary of
predictions;
- if the code is different from `200`, the function displays the
error code and returns `None`.
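
The branching on the response code can be tried offline with a
stand-in object. In the sketch below, `FakeResponse` and
`parse_response` are hypothetical names; `FakeResponse` only mimics
the two members used here (`.status_code` and `.json()`), so that no
network call is needed to see both branches.

```python
class FakeResponse:
    """Stand-in for a `requests` response, so the sketch runs offline."""

    def __init__(self, status_code, payload):
        self.status_code = status_code
        self._payload = payload

    def json(self):
        return self._payload


def parse_response(response):
    """Return the parsed content on success, None otherwise."""
    if response.status_code == 200:
        return response.json()
    # Any other code is reported and signalled with None
    print(f"Request failed with status code {response.status_code}")
    return None


ok = parse_response(FakeResponse(200, {"hourly": {}}))
failed = parse_response(FakeResponse(400, {"error": True}))
print(ok, failed)
```

Returning `None` on failure means the caller has to test the result
before using it; raising an exception would be an equally valid
design.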

#### Expected result

In [None]:
predictions = solutions.get_meteo_data(url_request_open_meteo)
type(predictions)

In [None]:
wrong_request = solutions.build_request_open_meteo(latitude=lat, longitude="dix-sept-virgule-quatre")
output = solutions.get_meteo_data(wrong_request)
print(output)

#### Your turn!

In [None]:
def get_meteo_data(query):
    # Your code here
    return response.json()

In [None]:
# Checking the result
predictions = get_meteo_data(url_request_open_meteo)
type(predictions)

In [None]:
# Checking the result
wrong_request = build_request_open_meteo(latitude=lat, longitude="dix-sept-virgule-quatre")
output = get_meteo_data(wrong_request)
print(output)

### Question 5

In order to fully understand the structure of the data we have
retrieved, explore the dictionary of predictions returned by the API
(keys, different levels, format of predictions, format of the
variable indicating the dates/times of the predictions, etc.)

In [None]:
# Data Exploration
print(type(predictions))
print(predictions.keys())
print(type(predictions["hourly"]))
print(predictions["hourly"].keys())
print(type(predictions["hourly"]["time"]))
print()

# Show data
print(predictions['hourly']["time"][:5])
print(predictions['hourly']["time"][-5:])
print()
print(predictions['hourly']["relativehumidity_2m"][:5])
print(predictions['hourly']["windspeed_10m"][:5])

## Part 3: Building and visualizing a *bad hair index*

The objective of this last part is to calculate and graphically
represent the *bad hair index*. We define this index as the
**product of relative humidity and wind speed**. It is a playful
measure of the probability of having a “bad hair day” due to weather
conditions.

### Question 6

Define a function `preprocess_predictions` that formats the
predictions from the API as a `Pandas DataFrame` in preparation for
a statistical analysis. The steps to implement are as follows:

1. convert the predicted data into a 3-column `Pandas DataFrame`
(date and time of observation, humidity, wind speed);
2. convert the time column to `datetime` format
([documentation](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html));
3. add two new variables indicating the day of observation
and the hour of observation;
4. add a variable that calculates the *bad hair index*.
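
The four steps above can be sketched on a tiny hand-written sample.
The sample values are made up, and the function name `to_frame` as
well as the column names `day`, `hour` and `bad_hair_index` are
hypothetical choices; the sketch assumes the `hourly` entry holds
lists of equal length, as the exploration in Question 5 suggests.

```python
import pandas as pd

# Made-up sample shaped like the "hourly" part of the API response
sample = {
    "hourly": {
        "time": ["2024-01-01T00:00", "2024-01-01T01:00", "2024-01-02T00:00"],
        "relativehumidity_2m": [80, 90, 50],
        "windspeed_10m": [10.0, 20.0, 4.0],
    }
}


def to_frame(predictions):
    # Step 1: one column per list of the "hourly" dictionary
    df = pd.DataFrame(predictions["hourly"])
    # Step 2: parse the timestamps
    df["time"] = pd.to_datetime(df["time"])
    # Step 3: split day and hour of observation
    df["day"] = df["time"].dt.date
    df["hour"] = df["time"].dt.hour
    # Step 4: index = relative humidity x wind speed
    df["bad_hair_index"] = df["relativehumidity_2m"] * df["windspeed_10m"]
    return df


print(to_frame(sample))
```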

#### Expected result

In [None]:
df_preds = solutions.preprocess_predictions(predictions)
df_preds.head()

#### Your turn!

In [None]:
def preprocess_predictions(predictions):
    # Your code here
    return df

In [None]:
# Checking the result
df_preds = preprocess_predictions(predictions)
df_preds.head()

### Question 7

For graphical representation purposes, we will represent the *bad
hair index* aggregated at two levels:

- average hour by hour. This will help answer the question:
“at what time will it generally be best to stay home next week?”
- average day by day. This will help answer the question:
“which day will it generally be better to stay home next week?”

Define a function `plot_agg_avg_bhi` that calculates the aggregate
index in each case, and represents the result as a `lineplot`.
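
The aggregation step on its own can be sketched with a `groupby`
followed by a mean. The data and the names `aggregate_mean`, `day`,
`hour` and `bad_hair_index` are hypothetical; plotting is left out so
that the sketch runs without a display.

```python
import pandas as pd

# Made-up data: two days, two hours each
df = pd.DataFrame({
    "day": ["mon", "mon", "tue", "tue"],
    "hour": [0, 1, 0, 1],
    "bad_hair_index": [800.0, 1800.0, 200.0, 400.0],
})


def aggregate_mean(df, agg_var):
    """Average the index within each value of `agg_var`."""
    return df.groupby(agg_var, as_index=False)["bad_hair_index"].mean()


by_day = aggregate_mean(df, "day")
by_hour = aggregate_mean(df, "hour")
print(by_day)
print(by_hour)
```

The resulting table could then be handed to a plotting call such as
`sns.lineplot(data=by_day, x="day", y="bad_hair_index")`.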

#### Expected result

In [None]:
solutions.plot_agg_avg_bhi(df_preds, agg_var="day")

In [None]:
solutions.plot_agg_avg_bhi(df_preds, agg_var="hour")

#### Your turn!

In [None]:
def plot_agg_avg_bhi(df_preds, agg_var="day"):
    # Your code here
    return None

In [None]:
# Checking the result
plot_agg_avg_bhi(df_preds, agg_var="day")

In [None]:
# Checking the result
plot_agg_avg_bhi(df_preds, agg_var="hour")

What do you conclude for the coming week?

### Question 8

Our bad hair day prediction tool works like a charm.
But the holidays are coming soon, and a trip to Berlin is planned.
Ideally, we would like to be able to use our tool for any locality.
Fortunately, functions have been defined at each stage, which will
allow us to easily move to a “conductor” function that calls all the
others for a given locality.

Define a function `main` that represents the *bad hair index* for
a given country, city and level of aggregation.

#### Expected result

In [None]:
solutions.main(country="Germany", city="Berlin", agg_var="day")

In [None]:
solutions.main(country="Germany", city="Berlin", agg_var="hour")

#### Your turn!

In [None]:
def main(country, city, agg_var="day"):
    # Your code here
    return None

In [None]:
# Checking the result
main(country="Germany", city="Berlin", agg_var="day")

In [None]:
# Checking the result
main(country="Germany", city="Berlin", agg_var="hour")