# Weather data project - Luca Sangiovanni

### Importing packages and datasets

More information about the imported packages is available in the readme.md file.
As for the datasets, I imported "tempByCity" and "tempByMajorCity" from the csv files provided on GitHub. I then created "majorCities", which lists the 100 major cities with their coordinates converted to a format that is easier to use when creating maps. To do so, I wrote the function "conversion" (located in utils.py), which calls an API to download the coordinates of all the major cities and puts them in the dataframe. I saved the resulting dataframe to a new csv file, "majorCities", in a local folder on my PC, so that the API is called only once rather than every time the program is run.
I did the same with the "tempByCity" csv: I used the same API to download the coordinates of all the cities and saved them to a csv file, "cities", which I load below.
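
The "call the API once, then reuse the csv" idea described above can be sketched as follows. This is only an illustration, not the actual "conversion" function from utils.py: the helper `load_or_build`, the stand-in `fake_geocode` and the demo file name are all hypothetical, and the real API call is replaced by a small hard-coded dataframe so that the example runs offline.

```python
import os
import tempfile

import pandas as pd


def load_or_build(csv_path, build):
    """Return the dataframe cached at csv_path, or build it once and cache it.

    Hypothetical helper: `build` stands for any slow function (for example
    one that queries a geocoding API) that returns a dataframe.
    """
    if os.path.exists(csv_path):
        return pd.read_csv(csv_path, index_col=0)
    df = build()
    df.to_csv(csv_path)
    return df


calls = []


def fake_geocode():
    # Stand-in for the real API call; it records how often it is invoked
    calls.append(1)
    return pd.DataFrame({"City": ["Rome", "Paris"],
                         "Latitude": [41.9, 48.86],
                         "Longitude": [12.5, 2.35]})


# Temporary location used only for this demo
cache_file = os.path.join(tempfile.mkdtemp(), "majorCities_demo.csv")
first = load_or_build(cache_file, fake_geocode)   # builds and saves the csv
second = load_or_build(cache_file, fake_geocode)  # reads the saved csv
print(len(calls))
```

The second call never reaches `fake_geocode`, which is the behaviour we want from the saved "majorCities" and "cities" files.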

In [None]:
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import plotly.express as px
from opencage.geocoder import OpenCageGeocode
import project.visualization as vz

In [None]:
# Raw strings keep the Windows backslashes from being read as escape sequences
path = r"C:\Users\sangi\Desktop\Info progetto python\Datasets"
tempByCity = pd.read_csv(path + r"\GlobalLandTemperaturesByCity.csv").dropna().reset_index(drop = True)
#tempByCountry = pd.read_csv(path + r"\GlobalLandTemperaturesByCountry.csv")
tempByMajorCity = pd.read_csv(path + r"\GlobalLandTemperaturesByMajorCity.csv").dropna().reset_index(drop = True).drop(["Latitude", "Longitude"], axis = 1)
#tempByState = pd.read_csv(path + r"\GlobalLandTemperaturesByState.csv")
majorCities = pd.read_csv(path + r"\majorCities.csv", index_col = 0)
cities = pd.read_csv(path + r"\cities.csv", index_col = 0)

### List of most represented countries in the dataset

I counted how many cities each country has in the dataset and plotted the 15 countries with the most cities in a bar plot. As we would expect, the most populous countries tend to be the ones with the most cities represented.
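
As a quick illustration of the counting step, here is a minimal sketch on a made-up four-row dataframe (the toy data is an assumption, not taken from the real "cities" file). It uses `value_counts`, which is one possible way to get the same ranking as the `groupby` used in this notebook.

```python
import pandas as pd

# Toy stand-in for the "cities" dataframe
toy = pd.DataFrame({"City": ["Rome", "Milan", "Paris", "Turin"],
                    "Country": ["Italy", "Italy", "France", "Italy"]})

# Number of cities per country, sorted from most to least represented
counts = toy["Country"].value_counts()
print(counts.head(15))
```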

In [None]:
def byCountry_List():
    numCities = cities.drop(["Latitude", "Longitude"], axis = 1).groupby("Country").count().sort_values(by = "City", ascending = False)
    print(numCities.head(15))

byCountry_List()

In [None]:
def byCountry_Plot():
    numCities = cities.drop(["Latitude", "Longitude"], axis = 1).groupby("Country").count().sort_values(by = "City", ascending = False)
    plt.bar(numCities.index[:15], numCities.City[:15], color = "brown")
    plt.xticks(rotation = 70)
    plt.ylabel("Number of cities in the dataset\n")
    plt.title("Number of cities in the dataset, by country\n")

byCountry_Plot()

### Location of the cities of every country

Every time we run the function below, it shows a map with all the cities of a random country in the dataset. To choose a specific country, we can write its name in place of the np.random.choice call. It is also possible to zoom in or out of the map, and to see the coordinates of a city by hovering the cursor over it.

In [None]:
def citiesByCountry():
    nation = np.random.choice(cities.Country.unique())
    byCountry = cities[cities.Country == nation]
    number = str(byCountry["City"].count())
    if number == "1":
        mapTitle = str("There is " + number + " city in " + nation )
    else:
        mapTitle = str("There are " + number + " cities in " + nation )
    fig = px.scatter_geo(byCountry, lat = byCountry["Latitude"], lon = byCountry["Longitude"], hover_name = byCountry["City"], color_discrete_sequence = ["darkred"])
    fig.update_geos(showocean = True, oceancolor = "LightBlue", fitbounds = "locations", showcountries = True, showland = True, landcolor = "LightGreen")
    fig.update_layout(title_text = mapTitle, title_x = 0.5)
    fig.show()

citiesByCountry()

### Location of major cities in the dataset

Below we can see a map showing the 100 main cities of the dataset.

In [None]:
def majorCitiesMap():
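    # The file is a plain csv without a geometry column, so pandas can read it (index_col is a pandas option)
    geo_df = pd.read_csv(r"C:\Users\sangi\Desktop\Info progetto python\Datasets\majorCities.csv", index_col = 0)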
    fig = px.scatter_geo(geo_df, lat = "Latitude", lon = "Longitude", hover_name = "City", hover_data = ["Country", "Latitude", "Longitude"])
    fig.update_geos(showocean = True, oceancolor = "grey")
    fig.update_layout(title_text = "List of major cities in the dataset\n", title_x = 0.5)
    fig.show()

majorCitiesMap()

### Change of cities' temperatures 

Now we want to see how a given city's temperature has changed over the years. The graphs below show the data for any city we choose, in both January and August (I chose these two months as representative of winter and summer), sampled once per decade in the years ending in 5. To see a random city, we just run the function as it is, and it will show a different city each time.
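
The month and year selection can also be written with real dates instead of substring matching. The sketch below runs on a made-up dataframe (the values are invented for illustration) and assumes the "dt" column uses the YYYY-MM-DD format; it keeps the January rows of the years ending in 5.

```python
import pandas as pd

# Toy stand-in for one city's rows of "tempByCity"
toy = pd.DataFrame({
    "dt": ["1995-01-01", "1995-08-01", "2005-01-01", "2005-08-01", "2006-01-01"],
    "AverageTemperature": [2.1, 24.3, 2.8, 25.0, 3.0],
})

# Parse the strings once, then filter on month and on the last digit of the year
dates = pd.to_datetime(toy["dt"])
january = toy[(dates.dt.month == 1) & (dates.dt.year % 10 == 5)]
print(january)
```

Compared with matching the text "5-01-01", this makes the filter explicit about which part of the date it is testing.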

In [None]:
def tempJanAug(city_name):
    temp = tempByCity[tempByCity["City"] == city_name]
    # Join the country names instead of printing the string form of an array
    country = ", ".join(temp["Country"].unique())
    # Keep one reading per decade: January and August of the years ending in 5
    tempJan = temp[temp["dt"].str.contains("5-01-01")]
    tempAug = temp[temp["dt"].str.contains("5-08-01")]
    fig, (ax1, ax2) = plt.subplots(2, 1)
    fig.suptitle("Temperatures in " + city_name + " (" + country + ") during the years\n")
    ax1.plot(tempJan["dt"], tempJan["AverageTemperature"], color = "b")
    ax2.plot(tempAug["dt"], tempAug["AverageTemperature"], color = "r")
    # The values are monthly averages, stored under the first day of the month
    ax1.set_title("January")
    ax2.set_title("August")
    # Show only the year on the x axis
    ax1.set_xticks(tempJan.dt, tempJan.dt.str[:4], rotation=50)
    ax2.set_xticks(tempAug.dt, tempAug.dt.str[:4], rotation=50)
    fig.supylabel("Temperatures (°C)")
    plt.subplots_adjust(bottom=0.15,top=0.85, hspace=0.8)
    plt.show()

#tempJanAug("alexandria".capitalize())
tempJanAug(np.random.choice(cities.City))

Below, instead, we compare the monthly temperatures of a random city in 1900 with those of the same city in 2012.
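
To put the two years side by side as numbers rather than lines, one option is to pivot the data so that each month is a row and each year a column. This is only a sketch on invented values, assuming the "dt" column uses the YYYY-MM-DD format; the column name "change" is my own choice.

```python
import pandas as pd

# Toy stand-in for one city's rows of "tempByCity"
toy = pd.DataFrame({
    "dt": ["1900-01-01", "1900-02-01", "2012-01-01", "2012-02-01"],
    "AverageTemperature": [1.0, 2.0, 2.5, 3.0],
})

dates = pd.to_datetime(toy["dt"])
# One row per month, one column per year
table = (toy.assign(year=dates.dt.year, month=dates.dt.month)
            .pivot(index="month", columns="year", values="AverageTemperature"))
# Difference between the two years, month by month
table["change"] = table[2012] - table[1900]
print(table)
```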

In [None]:
months = {"01": 'January', '02': 'February', '03': 'March', '04': 'April', '05': 'May', '06': 'June', '07': 'July', '08': 'August', '09': 'September', '10': 'October', '11': 'November', '12': 'December'}

def tempMonths(city_name):
    temp = tempByCity[tempByCity["City"] == city_name]
    # Join the country names instead of printing the string form of an array
    country = ", ".join(temp["Country"].unique())
    t1900 = temp[temp["dt"].str.startswith("1900")]
    t2012 = temp[temp["dt"].str.startswith("2012")]
    # Use the month number as x value, so that both years share the same axis
    plt.plot(t1900["dt"].str[5:7].astype(int), t1900["AverageTemperature"], label = "1900")
    plt.plot(t2012["dt"].str[5:7].astype(int), t2012["AverageTemperature"], label = "2012")
    plt.xticks(range(1, 13), list(months.values()), rotation = 50)
    plt.title("Temperatures in " + city_name + " (" + country + ") in 1900 and 2012\n")
    plt.ylabel("Temperatures (°C)\n")
    plt.legend()
    plt.show()

tempMonths(np.random.choice(cities["City"]))

### Temperatures from around the world

In the map below we can see the temperatures of major cities around the world. Every time we run the code, the temperatures of a random month in a random year are displayed. The color of the bubble represents the temperature.
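
Since the temperatures and the coordinates live in two different dataframes, they have to be matched by city before plotting. The sketch below shows one way to do that on invented data; the column names mirror the ones used in this notebook, while "bubbleSize" and the shifting rule are assumptions made for the example (the bubble size passed to the map cannot be negative).

```python
import pandas as pd

# Toy stand-ins for one month of "tempByMajorCity" and for "majorCities"
temps = pd.DataFrame({"City": ["Rome", "Moscow"],
                      "Country": ["Italy", "Russia"],
                      "AverageTemperature": [8.0, -9.5]})
coords = pd.DataFrame({"City": ["Moscow", "Rome"],
                       "Latitude": [55.75, 41.9],
                       "Longitude": [37.62, 12.5]})

# Match rows by city name, whatever the row order of the two dataframes
merged = temps.merge(coords, on="City", how="inner")

# Shift the temperatures so that the coldest city still gets a positive size
merged["bubbleSize"] = merged["AverageTemperature"] - merged["AverageTemperature"].min() + 1
print(merged)
```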

In [None]:
def bubbleMap(year_month):
    titleText = "Average temperature in " + months[year_month[-2:]] + " " + year_month[:4]
    tempMonthYear = tempByMajorCity[tempByMajorCity["dt"] == year_month + "-01"]
    # Attach the coordinates by city name rather than by row position
    # (assumes majorCities has one row per city, with "City", "Latitude" and "Longitude" columns)
    tempMonthYear = tempMonthYear.merge(majorCities[["City", "Latitude", "Longitude"]], on = "City")
    # Shift the temperatures so that the bubble sizes stay positive
    bubbleScale = tempMonthYear["AverageTemperature"] + 30
    fig = px.scatter_geo(tempMonthYear, lat = "Latitude", lon = "Longitude", size = bubbleScale, hover_name = "City", color = "AverageTemperature", color_continuous_scale = px.colors.sequential.Hot_r, hover_data = ["Country", "AverageTemperature"])
    fig.update_geos(showocean = True, oceancolor = "Lightblue", fitbounds = "locations")
    fig.update_layout(title_text = titleText, title_x = 0.47)
    fig.show()

y = str(np.random.randint(1891, 2013))
m = str(np.random.choice(["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"]))
date = str(y + "-" + m)

bubbleMap(date)

Here, instead, we can see some stats about a random country. Again, every time we run the code, a different country will be displayed.
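
The stats come down to finding the rows with the extreme values. A minimal sketch on invented data, assuming ISO dates (YYYY-MM-DD), which sort chronologically even as plain strings:

```python
import pandas as pd

# Toy stand-in for one country's rows of "tempByCity"
toy = pd.DataFrame({
    "dt": ["1900-01-01", "1900-07-01", "1850-07-01"],
    "City": ["Rome", "Rome", "Milan"],
    "AverageTemperature": [7.0, 25.5, 23.0],
})

# idxmax / idxmin return the row label of the extreme value,
# so the matching city and date come from the same row
hottest = toy.loc[toy["AverageTemperature"].idxmax()]
coldest = toy.loc[toy["AverageTemperature"].idxmin()]

# Earliest and latest record across all cities of the country
first_date = toy["dt"].min()
latest_date = toy["dt"].max()
print(hottest["City"], coldest["City"], first_date, latest_date)
```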

In [None]:
def countryStats():
    # Pick among the distinct countries, so that each one is equally likely
    nation = np.random.choice(cities.Country.unique())
    byNation = tempByCity[tempByCity.Country == nation]
    print(("Here are some stats about " + nation + "\n").upper())
    # ISO dates (YYYY-MM-DD) sort chronologically as strings
    first = str(byNation.dt.min())
    latest = str(byNation.dt.max())
    # Rows holding the highest and lowest monthly averages
    hottest = byNation.loc[byNation.AverageTemperature.idxmax()]
    coldest = byNation.loc[byNation.AverageTemperature.idxmin()]
    maxTemp = str(round(hottest.AverageTemperature, 2))
    minTemp = str(round(coldest.AverageTemperature, 2))
    print("First recorded temperature: " + months[first[-5:-3]] + " " + first[:4])
    print("Latest recorded temperature: " + months[latest[-5:-3]] + " " + latest[:4])
    print("Highest monthly average temperature recorded: " + maxTemp + "°C" + " in " + str(hottest.City))
    print("Lowest monthly average temperature recorded: " + minTemp + "°C" + " in " + str(coldest.City))

countryStats()