# Students
- GHAITH Sarahnour (M2QF & ENSIIE)
- ROISEUX Thomas (M2QF & ENSIIE)

# Introduction
## Context

The goal of this project is to study the temporal evolution of temperature and wind in France over one year.

## Required packages
- `pandas`: to manipulate dataframes.
- `numpy`: to manipulate arrays.
- `matplotlib`: to plot graphs.
- `cartopy`: to plot maps.
- `IPython`: to display dataframes in Jupyter Notebook.
- `scikit-learn`: to use machine learning algorithms.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from datetime import datetime, timedelta
from typing import Dict

from IPython.display import display

from sklearn.cluster import KMeans, AgglomerativeClustering

## Data importation
### Preparing GPS dataframe

In [None]:
gps_df = pd.read_csv("dataGPS.csv", header=None, sep=";")
gps_df.columns = ["Temp ID", "Lattitude", "Longitude"]
gps_df.insert(1, "Wind ID", gps_df["Temp ID"].str.replace("TEMP", "VVENT"))

display(gps_df.head())

### Preparing temperature dataframe and wind dataframe

In [None]:
year = 2019
hours = [datetime(year, 1, 1, 0, 0, 0) + timedelta(hours=i) for i in range(8760)]
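
The hourly labels built above can also be produced with `pd.date_range`, which makes the expected length explicit. This is only a minimal sketch, assuming (as the notebook does) one value per hour of 2019; the variable name `hours_index` is introduced here for illustration.

```python
import pandas as pd

year = 2019
# 2019 is not a leap year, so it contains 365 * 24 = 8760 hours.
n_hours = 365 * 24

# One timestamp per hour, starting on 1 January at 00:00.
hours_index = pd.date_range(
    start=f"{year}-01-01", periods=n_hours, freq=pd.Timedelta(hours=1)
)

print(hours_index[0], "->", hours_index[-1])
```

Checking the first and last timestamps is a quick way to confirm that the labels cover the whole year before assigning them as column names.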

In [None]:
temp_df = pd.read_csv("dataTemp.csv", header=None, sep=";", index_col=0)
temp_df.index.name = "Temperature ID"
temp_df.columns = hours


display(temp_df.head())


wind_df = pd.read_csv("dataWind.csv", header=None, sep=";", index_col=0)
wind_df.index.name = "Wind ID"
wind_df.columns = hours


display(wind_df.head())
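
Since the GPS table links each location to a temperature ID and a derived wind ID, it can be useful to check that every ID actually appears in the measurement tables. The sketch below uses small made-up frames (`gps_toy`, `temp_toy`, `wind_toy`) and hypothetical ID strings; only the `TEMP` to `VVENT` substitution is taken from the notebook.

```python
import pandas as pd

# Hypothetical IDs, for illustration only.
gps_toy = pd.DataFrame({"Temp ID": ["TEMP_A", "TEMP_B", "TEMP_C"]})
gps_toy["Wind ID"] = gps_toy["Temp ID"].str.replace("TEMP", "VVENT", regex=False)

# Toy measurement tables indexed by ID; the wind table lacks one location on purpose.
temp_toy = pd.DataFrame({"h0": [1.0, 2.0, 3.0]}, index=["TEMP_A", "TEMP_B", "TEMP_C"])
wind_toy = pd.DataFrame({"h0": [4.0, 5.0]}, index=["VVENT_A", "VVENT_B"])

# IDs listed in the GPS table but absent from the measurement tables.
missing_temp = set(gps_toy["Temp ID"]) - set(temp_toy.index)
missing_wind = set(gps_toy["Wind ID"]) - set(wind_toy.index)

print("Missing temperature IDs:", missing_temp)
print("Missing wind IDs:", missing_wind)
```

An empty set for both checks means that `.loc` lookups by ID will not raise a `KeyError` later on.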

## Example: weather in Paris
We are going to study the weather in Paris, the capital of France, as an example.
It is located at 48.51° N, 2.21° E.

Let's first place it on a map.

In [None]:
fig = plt.figure(figsize=(20, 10))
ax = fig.add_subplot(1, 2, 1, projection=ccrs.PlateCarree())
ax.set_extent([-5, 9, 42, 52])
ax.set_title("France")
ax.stock_img()

x, y = 2.217999, 48.512381
ax.plot(x, y, "r*", markersize=15)
ax.text(x, y, "Paris")
plt.show()

Now, we are going to plot the evolution of temperature and wind in Paris over one year.

In [None]:
plt.figure(figsize=(20, 10))
plt.title("Temperature in Paris")
plt.plot(temp_df.columns, temp_df.iloc[33, :], color="blue")
plt.show()

In [None]:
plt.figure(figsize=(20, 10))
plt.title("Wind in Paris")
plt.plot(wind_df.columns, wind_df.iloc[33, :], color="blue")
plt.show()
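
With 8760 hourly points, the yearly curves are dominated by day-to-day noise. One possible way to make the seasonal trend easier to read is to average by day and then smooth over a week. The sketch below runs on a synthetic series, not on the project data; the names `series`, `daily_mean` and `weekly_smooth` are illustrative.

```python
import numpy as np
import pandas as pd

hours_index = pd.date_range("2019-01-01", periods=365 * 24, freq=pd.Timedelta(hours=1))
t = np.arange(len(hours_index))
rng = np.random.default_rng(0)

# Synthetic signal: seasonal cycle + daily cycle + noise (made-up values).
values = (
    12.0
    - 8.0 * np.cos(2 * np.pi * t / len(hours_index))
    + 3.0 * np.sin(2 * np.pi * t / 24)
    + rng.normal(scale=1.0, size=len(hours_index))
)
series = pd.Series(values, index=hours_index)

# Average the 24 hourly values of each day.
daily_mean = series.resample("D").mean()

# Centered 7-day rolling mean; the first and last 3 days are NaN.
weekly_smooth = daily_mean.rolling(window=7, center=True).mean()

print(daily_mean.head())
```

On the real data, the same two calls could be applied to a single row such as `temp_df.iloc[33, :]`, since its index is the list of hourly timestamps.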

# Preliminaries
## Cities selection
We are going to select three more cities in France, to study the weather in different regions.

We chose:
- Strasbourg (48.58° N, 7.75° E);
- Nice (43.70° N, 7.26° E);
- Brest (48.39° N, 4.48° W).

In [None]:
def find_closest_point(x: float, y: float, df: pd.DataFrame) -> Dict[str, float | str]:
    """Get the closest point to the given coordinates in the given dataframe.

    Args:
        x (float): longitude
        y (float): latitude
        df (pd.DataFrame): dataframe with columns Longitude and Lattitude

    Returns:
        dict[str, float | str]: closest point
    """
    distances = np.sqrt((df["Longitude"] - x) ** 2 + (df["Lattitude"] - y) ** 2)
    return df.iloc[np.argmin(distances)].to_dict()


strasbourg = find_closest_point(48.5734053, 7.7521113, gps_df)
print("Strasbourg:", (strasbourg["Longitude"], strasbourg["Lattitude"]))
nice = find_closest_point(43.7009358, 7.2683912, gps_df)
print("Nice:", (nice["Longitude"], nice["Lattitude"]))
brest = find_closest_point(48.390528, -4.486008, gps_df)
print("Brest:", (brest["Longitude"], brest["Lattitude"]))
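
The search above measures distance directly in degrees. At French latitudes a degree of longitude is noticeably shorter than a degree of latitude, so a great-circle distance may pick a slightly different grid point. The following is a sketch of that alternative using the haversine formula; the helper names, the `lon`/`lat` column names and the three toy grid points are assumptions made for this example, not part of the project data.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between points given in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def closest_point_haversine(lon, lat, grid):
    """Return the row of `grid` (columns `lon`, `lat`) nearest to (lon, lat)."""
    distances = haversine_km(lon, lat, grid["lon"].to_numpy(), grid["lat"].to_numpy())
    return grid.iloc[int(np.argmin(distances))].to_dict()


# Toy grid with hypothetical IDs.
toy_grid = pd.DataFrame(
    {
        "id": ["P1", "P2", "P3"],
        "lon": [7.75, 7.26, -4.49],
        "lat": [48.58, 43.70, 48.39],
    }
)

closest = closest_point_haversine(7.0, 48.0, toy_grid)
print(closest)
```

For a dense grid over France the two approaches will usually agree; the difference matters mostly when candidate points are far apart in longitude.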

In [None]:
fig = plt.figure(figsize=(20, 10))
ax = fig.add_subplot(1, 2, 1, projection=ccrs.PlateCarree())
ax.set_extent([-5, 9, 42, 52])
ax.set_title("France")
ax.stock_img()

x, y = 2.217999, 48.512381
ax.plot(x, y, "r*", markersize=15)
ax.text(x, y, "Paris")
ax.plot(strasbourg["Lattitude"], strasbourg["Longitude"], "r*", markersize=15)
ax.text(strasbourg["Lattitude"], strasbourg["Longitude"], "Strasbourg")
ax.plot(nice["Lattitude"], nice["Longitude"], "r*", markersize=15)
ax.text(nice["Lattitude"], nice["Longitude"], "Nice")
ax.plot(brest["Lattitude"], brest["Longitude"], "r*", markersize=15)
ax.text(brest["Lattitude"], brest["Longitude"], "Brest")
plt.show()

We are now going to plot the evolution of temperature and wind in these cities over one year.

In [None]:
plt.figure(figsize=(20, 10))
plt.title("Temperature")
plt.xlabel("Time")
plt.ylabel("Temperature")
# `city` avoids shadowing the built-in `dict`; only the temperature ID is needed here.
for city, name in zip((strasbourg, nice, brest), ("Strasbourg", "Nice", "Brest")):
    temp_id = city["Temp ID"]
    plt.plot(temp_df.columns, temp_df.loc[temp_id, :], label=name)

plt.legend()
plt.show()

In [None]:
plt.figure(figsize=(20, 10))
plt.title("Wind")
plt.xlabel("Time")
plt.ylabel("Wind speed")
# `city` avoids shadowing the built-in `dict`; only the wind ID is needed here.
for city, name in zip((strasbourg, nice, brest), ("Strasbourg", "Nice", "Brest")):
    wind_id = city["Wind ID"]
    plt.plot(wind_df.columns, wind_df.loc[wind_id, :], label=name)

plt.legend()
plt.show()

## Clustering using these cities
We are going to cluster the cities in France, using the temperature and wind data.
We will use 2 different clustering algorithms:
- K-means;
- hierarchical clustering.

### K-means
We will use $K=4$ for the number of clusters.

In [None]:
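# Each row of wind_df / temp_df is one location and each column one hour,
# so fitting on the dataframes directly (not their transpose) clusters the locations.
# n_init and random_state are fixed to make the result reproducible.
k_means_wind = KMeans(n_clusters=4, n_init=10, random_state=0)
k_means_temp = KMeans(n_clusters=4, n_init=10, random_state=0)
k_means_wind.fit(wind_df.to_numpy())
k_means_temp.fit(temp_df.to_numpy())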

print("Wind clusters:", k_means_wind.cluster_centers_)
print("Temperature clusters:", k_means_temp.cluster_centers_)
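
One possible reading of "clustering using these cities" is to seed K-means with the time series of the four selected cities, which would also explain the choice $K=4$. This is an assumption, not something the notebook states. The sketch below illustrates the idea on synthetic data: `toy_df`, the `LOC_*` IDs and `seed_ids` are all made up, with one row per location and one column per hour.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_hours = 48

# Four well-separated groups of five hypothetical locations each.
base_levels = [0.0, 10.0, 20.0, 30.0]
rows = [
    level + rng.normal(scale=0.5, size=n_hours)
    for level in base_levels
    for _ in range(5)
]
toy_df = pd.DataFrame(rows, index=[f"LOC_{i}" for i in range(20)])

# Use the series of four chosen locations as initial cluster centers.
seed_ids = ["LOC_0", "LOC_5", "LOC_10", "LOC_15"]
init_centers = toy_df.loc[seed_ids].to_numpy()

# With explicit initial centers, a single run (n_init=1) is sufficient.
model = KMeans(n_clusters=4, init=init_centers, n_init=1)
labels = pd.Series(
    model.fit_predict(toy_df.to_numpy()), index=toy_df.index, name="cluster"
)

print(labels.value_counts().sort_index())
```

Keeping the labels in a `Series` indexed by location ID makes it straightforward to join them back to the GPS table and color each point on a map by cluster.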