# Project 3: Temperatures Dashboard

In this project, we will analyze a dataset of daily temperatures from 10 cities around the world, extracting some interesting insights and building two kinds of charts from them: line plots and histograms. We will use Pandas and Matplotlib once more, but this time we will convert a column to a datetime (timestamp) type so that we can do some time series analysis and plotting.

Data extracted from: https://www.kaggle.com/datasets/sudalairajkumar/daily-temperature-of-major-cities (with some cleaning and modifications).


### Project Tasks:

- `3.1.` Load the dataset from the defined data_path and display the first 5 rows.

- `3.2.` Create a new column called `AvgTemperatureCelsius` that contains the temperature in Celsius degrees.

- `3.3.` How many different cities are there? Provide a list of them.

- `3.4.` What are the minimum and maximum timestamps?

- `3.5.` What are the global minimum and maximum temperatures? Find the city and the date of each of them.

- `3.6.` For a given city and a range of dates (start and end):
  - Make a line plot of that city's temperature readings during the selected time period, with the timestamp column on the x axis.
  - Make a histogram of that city's temperature readings during the selected time period.
  - Make sure that all plots include a title, axis labels and a legend.

- `3.7.` Now repeat the previous question but for a list of cities instead of a single one:
  - Make a line plot of the temperature readings of the cities in the list for the selected time period, with each city drawn as a separate line in a different color and the timestamp column on the x axis.
  - Make a histogram of the temperature readings of the selected cities for the selected time period, with each city shown as its own distribution in a different color.
  - Make sure that all plots include a title, axis labels and a legend.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
# Ex 3.1: Load the dataset from the defined data_path and display the first 5 rows.

data_path = "../data/cities_temperatures.csv"

temps_df = pd.read_csv(data_path)

temps_df.head()
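As an alternative to converting the `Date` column in a separate step, `pd.read_csv` can parse it while loading through its `parse_dates` parameter. The sketch below reads a small made-up CSV string (not the real dataset) through `io.StringIO`, assuming the file has the same `City`, `Date` and `AvgTemperatureFahrenheit` columns used in this notebook.

```python
import io

import pandas as pd

# Made-up sample in the assumed column layout of cities_temperatures.csv (values are illustrative only)
csv_text = """City,Date,AvgTemperatureFahrenheit
Munich,2008-01-01,30.2
Munich,2008-01-02,28.4
Tokyo,2008-01-01,41.0
"""

# parse_dates converts the listed columns to datetime64 while reading
sample_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(sample_df.dtypes)
print(sample_df.head())
```

With the real file, this would amount to `pd.read_csv(data_path, parse_dates=["Date"])`, making the separate conversion cell optional.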

In [None]:
# Convert the Date column to datetime so it can be analyzed and plotted as a time series
temps_df["Date"] = pd.to_datetime(temps_df["Date"])
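Once `Date` is a datetime column, the `.dt` accessor exposes calendar fields such as year and month, which is what makes time-based grouping straightforward. A minimal sketch on made-up values (standing in for `temps_df`) that averages the temperature per city and year:

```python
import pandas as pd

# Made-up readings; in the notebook this would be temps_df
df = pd.DataFrame({
    "City": ["Munich", "Munich", "Munich", "Munich"],
    "Date": pd.to_datetime(["2008-01-01", "2008-07-01", "2009-01-01", "2009-07-01"]),
    "AvgTemperatureCelsius": [-1.0, 20.0, 0.0, 22.0],
})

# Extract calendar components from the datetime column
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month

# Mean temperature per city and year
yearly_mean = df.groupby(["City", "Year"])["AvgTemperatureCelsius"].mean()
print(yearly_mean)
```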

In [None]:
# Ex 3.2: Create a new column called `AvgTemperatureCelsius` that contains the temperature in Celsius degrees.

temps_df["AvgTemperatureCelsius"] = ((temps_df["AvgTemperatureFahrenheit"] - 32) * 5 / 9).round(1)

temps_df
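The cell above applies C = (F - 32) × 5/9 and rounds to one decimal. A quick check on well-known reference points (water freezing and boiling, body temperature, and -40, where both scales meet) helps confirm the formula and the rounding. `fahrenheit_to_celsius` is a hypothetical helper written only for this check.

```python
import pandas as pd


def fahrenheit_to_celsius(fahrenheit: pd.Series) -> pd.Series:
    """Convert a Series of Fahrenheit values to Celsius, rounded to one decimal."""
    return ((fahrenheit - 32) * 5 / 9).round(1)


# Reference temperatures in Fahrenheit
reference_f = pd.Series([32.0, 212.0, -40.0, 98.6])
print(fahrenheit_to_celsius(reference_f).tolist())
```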

In [None]:
# Ex 3.3: How many different cities are there? Provide a list of them.

unique_cities_list = temps_df["City"].unique().tolist()

# TODO: print a message with the number of unique cities and the list of them

print(f"There are {len(unique_cities_list)} unique cities in the dataset:")
print(unique_cities_list)
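`unique()` returns the distinct values themselves; when only the count is needed, `Series.nunique()` gives it directly, and `Series.value_counts()` shows how many readings each city contributes, which is a quick way to spot cities with gaps in their records. A sketch on made-up rows:

```python
import pandas as pd

# Made-up rows; temps_df has one row per city per day
df = pd.DataFrame({"City": ["Munich", "Tokyo", "Munich", "Buenos Aires", "Tokyo", "Munich"]})

n_cities = df["City"].nunique()            # number of distinct cities
rows_per_city = df["City"].value_counts()  # readings per city, most frequent first

print(n_cities)
print(rows_per_city)
```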

In [None]:
# Ex 3.4: What are the minimum and maximum dates?

min_date = temps_df["Date"].min()
max_date = temps_df["Date"].max()

# TODO: print a message with the min and max dates
print(f"The dates range from {min_date.date()} to {max_date.date()}.")
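The global minimum and maximum dates do not tell whether every city covers the same period. A `groupby` with `agg(["min", "max"])` gives the first and last date per city; the sketch below uses made-up dates, not values from the dataset.

```python
import pandas as pd

# Made-up dates for two cities
df = pd.DataFrame({
    "City": ["Munich", "Munich", "Tokyo", "Tokyo"],
    "Date": pd.to_datetime(["1995-01-01", "2020-05-13", "1997-03-02", "2019-12-31"]),
})

# First and last reading for each city
date_range_per_city = df.groupby("City")["Date"].agg(["min", "max"])
print(date_range_per_city)
```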

In [None]:
# Ex 3.5: What are the global minimum and maximum temperatures? Find the city and the date of each of them.
# Fahrenheit and Celsius sort in the same order, so these row labels are valid for both columns
idx_min = temps_df["AvgTemperatureFahrenheit"].idxmin()
idx_max = temps_df["AvgTemperatureFahrenheit"].idxmax()

min_temp = temps_df.loc[idx_min, "AvgTemperatureCelsius"]
max_temp = temps_df.loc[idx_max, "AvgTemperatureCelsius"]

min_temp_city = temps_df.loc[idx_min, "City"]
min_temp_date = temps_df.loc[idx_min, "Date"]

max_temp_city = temps_df.loc[idx_max, "City"]
max_temp_date = temps_df.loc[idx_max, "Date"]

# TODO: print a message with the min temperature, its city and date, and then another message with the max temperature, its city and date
# strftime drops the 00:00:00 time component from the printed dates
print(f"The global minimum temperature was {min_temp}°C in {min_temp_city} on {min_temp_date.strftime('%Y-%m-%d')}.")
print(f"The global maximum temperature was {max_temp}°C in {max_temp_city} on {max_temp_date.strftime('%Y-%m-%d')}.")
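`idxmin`/`idxmax` return the index label of the extreme row (the first one if several rows tie), which `loc` then uses to fetch the city and date. The same idea extends to each city's own coldest day by applying `idxmin` per group. A sketch on made-up values:

```python
import pandas as pd

# Made-up readings for two cities
df = pd.DataFrame({
    "City": ["Munich", "Munich", "Tokyo", "Tokyo"],
    "Date": pd.to_datetime(["2009-01-07", "2009-07-20", "2009-01-15", "2009-08-05"]),
    "AvgTemperatureCelsius": [-12.3, 25.1, 2.4, 30.2],
})

# Index label of the coldest row within each city
coldest_idx = df.groupby("City")["AvgTemperatureCelsius"].idxmin()

# Fetch those rows, one per city
coldest_per_city = df.loc[coldest_idx, ["City", "Date", "AvgTemperatureCelsius"]]
print(coldest_per_city)
```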

In [None]:
# Ex 3.6: For a given city and a range of dates (start and end):
#   - Make a line plot of that city's temperature readings during the selected time period, with the timestamp column on the x axis.
#   - Make a histogram of that city's temperature readings during the selected time period.
#   - Make sure that all plots include a title, axis labels and a legend.

city = "Munich"
start_date = pd.to_datetime("2008-01-01").date()
end_date = pd.to_datetime("2010-12-31").date()

city_df = temps_df[temps_df["City"] == city]

city_df_period = city_df[(city_df["Date"].dt.date >= start_date) & (city_df["Date"].dt.date <= end_date)]

plt.figure(figsize=(10, 5))

# TODO: Uncomment and complete the following lines to plot the line plot using the city_df_period AvgTemperatureCelsius column as the y axis and the Date column as the x axis

plt.plot(city_df_period["Date"], city_df_period["AvgTemperatureCelsius"], label="Temp (°C)", color="blue")
plt.title(f"Temperature Trends in {city} from {start_date} to {end_date}")
plt.xlabel("Date")
plt.ylabel("Temperature (Celsius)")
plt.legend()
plt.grid(True)

plt.show()
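Comparing `.dt.date` against `datetime.date` objects works, but pandas can also compare the datetime column directly with `Timestamp` values. `Series.between` includes both ends by default, so it should select the same rows as the `>=`/`<=` filter above. A sketch on a made-up daily range around the chosen period:

```python
import pandas as pd

# Made-up daily dates spanning the boundaries of the period
df = pd.DataFrame({"Date": pd.date_range("2007-12-30", "2011-01-02", freq="D")})

start = pd.Timestamp("2008-01-01")
end = pd.Timestamp("2010-12-31")

# Filter as written in the notebook: compare plain dates
mask_dates = (df["Date"].dt.date >= start.date()) & (df["Date"].dt.date <= end.date())

# Filter on the datetime column itself (both ends included)
mask_between = df["Date"].between(start, end)

print(mask_dates.sum(), mask_between.sum())
```

The two masks agree here because the dates carry no time of day; with intra-day timestamps, an end bound at midnight would drop the later readings of the last day.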


In [None]:
# TODO: Build the histogram plot using the city_df_period AvgTemperatureCelsius column as the data to plot

plt.figure(figsize=(10, 5))

plt.hist(city_df_period["AvgTemperatureCelsius"], bins=20, label="Temp Distribution", color="yellow", edgecolor="black")
plt.title(f"Temperature Distribution in {city} from {start_date} to {end_date}")
plt.xlabel("Temperature (°C)")
plt.ylabel("Frequency")
plt.legend()

plt.show()
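`plt.hist` bins the values before drawing them (Matplotlib relies on NumPy's histogram binning for this). `numpy.histogram` returns the counts and bin edges as numbers, which is handy for checking a plot or reporting the most common temperature range. A sketch on made-up readings:

```python
import numpy as np

# Made-up daily temperatures in Celsius
temps_c = np.array([-3.1, 0.4, 2.2, 5.0, 5.5, 9.8, 14.3, 15.0, 18.7, 21.2])

# Five equal-width bins between the minimum and maximum value
counts, edges = np.histogram(temps_c, bins=5)

# Bin with the most readings
busiest = counts.argmax()
print(counts)
print(f"Most common range: {edges[busiest]:.1f} to {edges[busiest + 1]:.1f} °C")
```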

In [None]:
# Ex 3.7: Now repeat the previous question but for a list of cities:
#   - Make a line plot of the temperature readings of the cities in the list for the selected time period, with each city drawn as a separate line in a different color and the timestamp column on the x axis.
#   - Make a histogram of the temperature readings of the selected cities for the selected time period, with each city shown as its own distribution in a different color.
#   - Make sure that all plots include a title, axis labels and a legend.

selected_cities = ["Munich", "Buenos Aires", "Tokyo"]
start_date = pd.to_datetime("2008-01-01").date()
end_date = pd.to_datetime("2010-12-31").date()


plt.figure(figsize=(15, 5))

for city in selected_cities:
    # TODO: get a dataframe with the rows of the selected city
    city_df = temps_df[temps_df["City"] == city]

    # TODO: get a dataframe with the rows of the selected city and the selected period of time using the Date column and any of the <, >, <=, >= operators to compare with start_date and end_date
    # (Date was already converted to datetime right after loading, so no conversion is needed here)
    city_df_period = city_df[(city_df["Date"].dt.date >= start_date) & (city_df["Date"].dt.date <= end_date)]

    # Sort by date so each line is drawn in chronological order
    city_df_period = city_df_period.sort_values("Date")

    # TODO plot each city line and use the label parameter to set the legend name for each city
    plt.plot(city_df_period["Date"], city_df_period["AvgTemperatureCelsius"], label=city)
           
plt.title(f"Temperature comparison overtime from {start_date} to {end_date})")
plt.xlabel("Time")
plt.ylabel("Temperature (Celsius)")

plt.legend()

plt.show()
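Another way to get one line per city is to reshape the data into a wide table with `DataFrame.pivot`, one column per city indexed by date; `DataFrame.plot()` then draws each column as its own line and builds the legend from the column names. The sketch below only builds the wide table from made-up values:

```python
import pandas as pd

# Made-up long-format readings (one row per city per day), like temps_df
df = pd.DataFrame({
    "City": ["Munich", "Tokyo", "Munich", "Tokyo"],
    "Date": pd.to_datetime(["2008-01-01", "2008-01-01", "2008-01-02", "2008-01-02"]),
    "AvgTemperatureCelsius": [-1.0, 5.5, -2.5, 6.0],
})

# One row per date, one column per city
wide = df.pivot(index="Date", columns="City", values="AvgTemperatureCelsius").sort_index()
print(wide)
```

With the filtered real data, a call such as `wide.plot(figsize=(15, 5), title="...")` could replace the loop. Note that `pivot` raises an error if a city has two rows for the same date.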

In [None]:
# TODO: Build the histogram plot for the selected cities using the city_df_period AvgTemperatureCelsius column as the data to plot for each one

plt.figure(figsize=(15, 5))

for city in selected_cities:
    # TODO: get a dataframe with the rows of the selected city
    city_df = temps_df[temps_df["City"] == city]
    
    # TODO: get a dataframe with the rows of the selected city and the selected period of time using the Date column and any of the <, >, <=, >= operators to compare with start_date and end_date
    city_df_period = city_df[(city_df["Date"].dt.date >= start_date) & (city_df["Date"].dt.date <= end_date)]
    
    # TODO: plot each city histogram in the same plot and use the label parameter to set the legend name for each city 
    plt.hist(city_df_period["AvgTemperatureCelsius"], alpha=0.5, bins=20, label=city, edgecolor="black")

plt.title(f"Temperature Distribution Comparison from {start_date} to {end_date}")
plt.xlabel("Temperature (Celsius)")
plt.ylabel("Number of days")

plt.legend()

plt.show()
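Because each `plt.hist` call above picks its own 20 bins from that city's range, the bars of different cities do not line up. One option is to compute a single set of edges over all selected cities with `numpy.histogram_bin_edges` and pass it as `bins` to every call. A sketch with made-up readings (plotting omitted):

```python
import numpy as np
import pandas as pd

# Made-up readings for two cities
df = pd.DataFrame({
    "City": ["Munich"] * 4 + ["Tokyo"] * 4,
    "AvgTemperatureCelsius": [-5.0, 0.0, 10.0, 20.0, 3.0, 12.0, 25.0, 31.0],
})

# One set of 20 equal-width bins spanning every selected city's readings
shared_edges = np.histogram_bin_edges(df["AvgTemperatureCelsius"].to_numpy(), bins=20)

# Counts per city on the common bins; in the notebook, pass bins=shared_edges to plt.hist
counts_per_city = {
    city: np.histogram(group["AvgTemperatureCelsius"].to_numpy(), bins=shared_edges)[0]
    for city, group in df.groupby("City")
}
print(shared_edges[[0, -1]])
print(counts_per_city)
```

In the notebook, the edges would be computed from the period-filtered rows of all `selected_cities` before the loop.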
