# Project 3: Temperatures Dashboard

In this project, we will analyze a dataset of temperatures from 10 cities around the world, extract some interesting insights, and build two types of charts from them (line plots and histograms). We will use Pandas and Matplotlib again, but this time we will convert a column to a date type so that we can do some time series analysis and plotting.

Data extracted from: https://www.kaggle.com/datasets/sudalairajkumar/daily-temperature-of-major-cities (with some cleaning and modifications).


### Project Tasks:

- `3.1.` Load the dataset from the defined data_path and display the first 5 rows.

- `3.2.` Create a new column called `AvgTemperatureCelsius` that contains the temperature in degrees Celsius.

- `3.3.` How many different countries are there? Provide a list of them.

- `3.4.` What are the minimum and maximum timestamps?

- `3.5.` What are the global minimum and maximum temperatures? Find the city and the date of each.

- `3.6.` For a given city and a range of dates (start and end):
  - Make a line plot of the temperature readings of that city during the selected time period. The x axis has to be the timestamp column.
  - Make a histogram of the temperature readings of that city during the selected time period.
  - Make sure that all plots include a title, axis labels, and a legend.

- `3.7.` Now repeat the previous question but for a list of cities instead of a single one:
  - Make a line plot of the temperature readings of the cities in the list for the selected time period. Every city has to be a separate line with a different color, and the x axis has to be the timestamp column.
  - Make a histogram of the temperature readings of the cities in the list for the selected time period. Every city has to be its own distribution with a different color.
  - Make sure that all plots include a title, axis labels, and a legend.


In [15]:
import pandas as pd
import matplotlib.pyplot as plt

In [16]:
# Ex 3.1: Load the dataset from the defined data_path and display the first 5 rows.

data_path = "../data/cities_temperatures.csv"

temps_df = pd.read_csv(data_path)  # TODO

temps_df.head()

Unnamed: 0,Country,City,AvgTemperatureFahrenheit,Date,Month,Year
0,Argentina,Buenos Aires,79.5,2000-01-01,1,2000
1,Argentina,Buenos Aires,78.8,2000-01-02,1,2000
2,Argentina,Buenos Aires,74.3,2000-01-03,1,2000
3,Argentina,Buenos Aires,79.0,2000-01-04,1,2000
4,Argentina,Buenos Aires,77.1,2000-01-05,1,2000


In [17]:
# Convert the Date column from strings to Python date objects so that the time series is easier to analyze and plot

temps_df["Date"] = pd.to_datetime(temps_df["Date"]).dt.date
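
To see what this conversion produces, here is a minimal sketch on a made-up three-row frame (`sample_df` and its values are illustrative, not taken from the dataset). It suggests that `.dt.date` leaves the column holding plain `datetime.date` objects with an `object` dtype, which is why the later exercises compare it against `pd.to_datetime(...).date()` values.

```python
import datetime as dt

import pandas as pd

# Made-up sample rows; the real values come from cities_temperatures.csv.
sample_df = pd.DataFrame({"Date": ["2000-01-01", "2000-01-02", "2000-01-03"]})

# Same conversion as in the cell above: each value becomes a datetime.date.
sample_df["Date"] = pd.to_datetime(sample_df["Date"]).dt.date

first_value = sample_df["Date"].iloc[0]
print(type(first_value))        # a datetime.date object
print(sample_df["Date"].dtype)  # object, not datetime64

# Comparisons against another datetime.date work element by element.
start_date = dt.date(2000, 1, 2)
mask = sample_df["Date"] >= start_date
print(mask.tolist())
```

Keeping the column as `datetime64` (by dropping `.dt.date`) would be an alternative, but then the `start_date` and `end_date` values used for filtering would need to be timestamps rather than `date` objects.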

In [25]:
# Ex 3.2: Create a new column called `AvgTemperatureCelsius` that contains the temperature in degrees Celsius.

# Round the result to 1 decimal place.
temps_df["AvgTemperatureCelsius"] = ((temps_df["AvgTemperatureFahrenheit"] - 32) * 5 / 9).round(1)

temps_df

Unnamed: 0,Country,City,AvgTemperatureFahrenheit,Date,Month,Year,AvgTemperatureCelsius
0,Argentina,Buenos Aires,79.5,2000-01-01,1,2000,26.4
1,Argentina,Buenos Aires,78.8,2000-01-02,1,2000,26.0
2,Argentina,Buenos Aires,74.3,2000-01-03,1,2000,23.5
3,Argentina,Buenos Aires,79.0,2000-01-04,1,2000,26.1
4,Argentina,Buenos Aires,77.1,2000-01-05,1,2000,25.1
...,...,...,...,...,...,...,...
72727,US,Washington,45.7,2019-12-27,12,2019,7.6
72728,US,Washington,49.6,2019-12-28,12,2019,9.8
72729,US,Washington,48.9,2019-12-29,12,2019,9.4
72730,US,Washington,55.0,2019-12-30,12,2019,12.8
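
As a quick sanity check of the conversion formula, C = (F - 32) * 5 / 9, the sketch below applies it to a few known values. The helper name `fahrenheit_to_celsius` is hypothetical and only used for illustration.

```python
import pandas as pd


def fahrenheit_to_celsius(fahrenheit: pd.Series) -> pd.Series:
    """Convert Fahrenheit readings to Celsius, rounded to 1 decimal place."""
    return ((fahrenheit - 32) * 5 / 9).round(1)


# Freezing point, boiling point, and the first reading shown in the table above.
sample_f = pd.Series([32.0, 212.0, 79.5])
sample_c = fahrenheit_to_celsius(sample_f)
print(sample_c.tolist())  # [0.0, 100.0, 26.4]
```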


In [42]:
# Ex 3.3: How many different countries are there? Provide a list of them.
# The unique cities are reported as well, before the countries.

unique_cities_list = temps_df["City"].unique()
num_unique_cities = len(unique_cities_list)

unique_countries_list = temps_df["Country"].unique()
num_unique_countries = len(unique_countries_list)

print(f"There are {num_unique_cities} different cities. Here is the list:")
print(unique_cities_list)

print(f"There are {num_unique_countries} different countries. Here is the list:")
print(unique_countries_list)

There are 10 different cities. Here is the list:
['Buenos Aires' 'Canberra' 'Bogota' 'Cairo' 'Munich' 'Calcutta' 'Tokyo'
 'Dakar' 'Capetown' 'Washington']
There are 10 different countries. Here is the list:
['Argentina' 'Australia' 'Colombia' 'Egypt' 'Germany' 'India' 'Japan'
 'Senegal' 'South Africa' 'US']


In [None]:
# Ex 3.4: What are the minimum and maximum dates?

min_date = None  # TODO
max_date = None  # TODO

# TODO: print a message with the min and max dates
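
One possible way to approach Ex 3.4, sketched on made-up rows rather than the real dataset: once the `Date` column holds `date` objects, `Series.min()` and `Series.max()` should return the earliest and latest dates directly. The frame `sample_df` and its values are assumptions for illustration.

```python
import pandas as pd

# Made-up rows for illustration only.
sample_df = pd.DataFrame({
    "City": ["Munich", "Tokyo", "Dakar"],
    "Date": ["2001-03-15", "2000-01-01", "2019-12-30"],
})
sample_df["Date"] = pd.to_datetime(sample_df["Date"]).dt.date

# The earliest and latest dates in the column.
min_date = sample_df["Date"].min()
max_date = sample_df["Date"].max()

print(f"The dates range from {min_date} to {max_date}.")
```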

In [None]:
# Ex 3.5: What are the global minimum and maximum temperatures? Find the city and the date of each of them.

min_temp = None  # TODO
max_temp = None  # TODO

min_temp_city = None  # TODO
min_temp_date = None  # TODO

max_temp_city = None  # TODO
max_temp_date = None  # TODO

# TODO: print a message with the min temperature, its city and date, and then another message with the max temperature, its city and date
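
A possible approach to Ex 3.5, again on made-up rows: `idxmin()` and `idxmax()` return the index label of the extreme value, and `.loc` then gives the whole row, so the city and date come along with the temperature. Note that both methods return only the first occurrence if several rows tie.

```python
import pandas as pd

# Made-up rows for illustration only.
sample_df = pd.DataFrame({
    "City": ["Munich", "Cairo", "Tokyo", "Munich"],
    "Date": ["2005-01-10", "2005-07-20", "2005-04-02", "2005-12-28"],
    "AvgTemperatureCelsius": [-8.5, 35.2, 14.0, -3.1],
})
sample_df["Date"] = pd.to_datetime(sample_df["Date"]).dt.date

# Locate the full rows holding the coldest and the hottest reading.
min_row = sample_df.loc[sample_df["AvgTemperatureCelsius"].idxmin()]
max_row = sample_df.loc[sample_df["AvgTemperatureCelsius"].idxmax()]

min_temp = min_row["AvgTemperatureCelsius"]
min_temp_city = min_row["City"]
min_temp_date = min_row["Date"]

max_temp = max_row["AvgTemperatureCelsius"]
max_temp_city = max_row["City"]
max_temp_date = max_row["Date"]

print(f"Minimum: {min_temp} °C in {min_temp_city} on {min_temp_date}")
print(f"Maximum: {max_temp} °C in {max_temp_city} on {max_temp_date}")
```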

In [None]:
# Ex 3.6: For a given city and a range of dates (start and end):
#   - Make a line plot of the temperature readings of that city during the selected time period. The x axis has to be the timestamp column.
#   - Make a histogram of the temperature readings of that city during the selected time period.
#   - Make sure that all plots include a title, axis labels, and a legend.

city = "Munich"
start_date = pd.to_datetime("2008-01-01").date()
end_date = pd.to_datetime("2010-12-31").date()

city_df = None          # TODO: get a dataframe with the rows of the selected city

city_df_period = None   # TODO: get a dataframe with the rows of the selected city and the selected period of time using the Date column and any of the <, >, <=, >= operators to compare with start_date and end_date

plt.figure(figsize=(10, 5))

# TODO: Uncomment and complete the following lines to plot the line plot using the city_df_period AvgTemperatureCelsius column as the y axis and the Date column as the x axis

# plt.plot()    # TODO
# plt.title()   # TODO
# plt.xlabel()  # TODO
# plt.ylabel()  # TODO
plt.legend()

plt.show()
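
The filtering and plotting steps for the line plot could look roughly like the sketch below. It runs on a tiny made-up frame (`sample_df`), so the resulting chart is not representative of the real data. The key points are the two boolean masks combined with `&` and the `label` argument, which gives `plt.legend()` something to display.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up readings; two Munich rows fall inside the period, two outside.
sample_df = pd.DataFrame({
    "City": ["Munich"] * 4 + ["Tokyo"] * 2,
    "Date": pd.to_datetime([
        "2007-12-31", "2008-01-01", "2008-01-02", "2011-01-01",
        "2008-01-01", "2008-01-02",
    ]).date,
    "AvgTemperatureCelsius": [1.0, -2.5, 0.3, 4.1, 6.0, 7.2],
})

city = "Munich"
start_date = pd.to_datetime("2008-01-01").date()
end_date = pd.to_datetime("2010-12-31").date()

# Keep the rows of the selected city, then the rows inside the period.
city_df = sample_df[sample_df["City"] == city]
in_period = (city_df["Date"] >= start_date) & (city_df["Date"] <= end_date)
city_df_period = city_df[in_period]

plt.figure(figsize=(10, 5))
plt.plot(city_df_period["Date"], city_df_period["AvgTemperatureCelsius"], label=city)
plt.title(f"Daily average temperature in {city}")
plt.xlabel("Date")
plt.ylabel("Average temperature (°C)")
plt.legend()

plt.show()
```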


In [None]:
# TODO: Build the histogram plot using the city_df_period AvgTemperatureCelsius column as the data to plot

plt.figure(figsize=(10, 5))

# plt.hist()    # TODO: use the city_df_period AvgTemperatureCelsius column as the data to plot, you can use the parameter bins=20
# plt.title()   # TODO
# plt.xlabel()  # TODO
# plt.ylabel()  # TODO

plt.show()
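
For the histogram, a minimal sketch under the same assumptions (made-up data in `sample_df`, illustrative titles and labels) might be the following. Passing `label=` to `plt.hist` and calling `plt.legend()` is one way to satisfy the legend requirement from the task description.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data: 30 consecutive days of synthetic Munich readings.
sample_df = pd.DataFrame({
    "City": "Munich",
    "Date": pd.date_range("2008-01-01", periods=30, freq="D").date,
    "AvgTemperatureCelsius": [(day % 10) - 3.0 for day in range(30)],
})

city = "Munich"
start_date = pd.to_datetime("2008-01-01").date()
end_date = pd.to_datetime("2008-01-20").date()

city_df = sample_df[sample_df["City"] == city]
in_period = (city_df["Date"] >= start_date) & (city_df["Date"] <= end_date)
city_df_period = city_df[in_period]

plt.figure(figsize=(10, 5))
# plt.hist returns the per-bin counts, the bin edges, and the drawn patches.
counts, bin_edges, _ = plt.hist(
    city_df_period["AvgTemperatureCelsius"], bins=20, label=city
)
plt.title(f"Distribution of daily average temperature in {city}")
plt.xlabel("Average temperature (°C)")
plt.ylabel("Number of days")
plt.legend()

plt.show()
```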

In [None]:
# Ex 3.7: Now repeat the previous question but for a list of cities:
#   - Make a line plot of the temperature readings of the cities in the list for the selected time period. Every city has to be a separate line with a different color, and the x axis has to be the timestamp column.
#   - Make a histogram of the temperature readings of the cities in the list for the selected time period. Every city has to be its own distribution with a different color.
#   - Make sure that all plots include a title, axis labels, and a legend.

selected_cities = ["Munich", "Buenos Aires", "Tokyo"]
start_date = pd.to_datetime("2008-01-01").date()
end_date = pd.to_datetime("2010-12-31").date()


plt.figure(figsize=(15, 5))

# TODO: Uncomment and complete the following lines to plot the line plot using the city_df_period AvgTemperatureCelsius column as the y axis and the Date column as the x axis

# for city in selected_cities:
#     city_df = None            # TODO: get a dataframe with the rows of the selected city
#     city_df_period = None     # TODO: get a dataframe with the rows of the selected city and the selected period of time using the Date column and any of the <, >, <=, >= operators to compare with start_date and end_date
#     plt.plot()                # TODO plot each city line and use the label parameter to set the legend name for each city

# plt.title()   # TODO
# plt.xlabel()  # TODO
# plt.ylabel()  # TODO

plt.legend()

plt.show()
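
One way the loop could be filled in is sketched below, using synthetic readings for the three selected cities (the values in `sample_df` are invented). Matplotlib assigns a different color to each successive `plt.plot` call by default, so no explicit color handling should be needed.

```python
import matplotlib.pyplot as plt
import pandas as pd

selected_cities = ["Munich", "Buenos Aires", "Tokyo"]

# Made-up data: 10 days per city, starting two days before the period.
dates = pd.date_range("2007-12-30", periods=10, freq="D").date
frames = [
    pd.DataFrame({
        "City": name,
        "Date": dates,
        "AvgTemperatureCelsius": [offset * 5.0 + day for day in range(10)],
    })
    for offset, name in enumerate(selected_cities)
]
sample_df = pd.concat(frames, ignore_index=True)

start_date = pd.to_datetime("2008-01-01").date()
end_date = pd.to_datetime("2010-12-31").date()

plt.figure(figsize=(15, 5))

rows_per_city = {}
for city in selected_cities:
    city_df = sample_df[sample_df["City"] == city]
    in_period = (city_df["Date"] >= start_date) & (city_df["Date"] <= end_date)
    city_df_period = city_df[in_period]
    rows_per_city[city] = len(city_df_period)
    # One line per city; the label feeds the legend.
    plt.plot(city_df_period["Date"], city_df_period["AvgTemperatureCelsius"], label=city)

plt.title("Daily average temperature by city")
plt.xlabel("Date")
plt.ylabel("Average temperature (°C)")
plt.legend()
ax = plt.gca()

plt.show()
```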

In [None]:
# TODO: Build the histogram plot for the selected cities using the city_df_period AvgTemperatureCelsius column as the data to plot for each one

plt.figure(figsize=(15, 5))

# for city in selected_cities:
#     city_df = None            # TODO: get a dataframe with the rows of the selected city
#     city_df_period = None     # TODO: get a dataframe with the rows of the selected city and the selected period of time using the Date column and any of the <, >, <=, >= operators to compare with start_date and end_date
#     plt.hist()                # TODO: plot each city histogram in the same plot and use the label parameter to set the legend name for each city

# plt.title()   # TODO
# plt.xlabel()  # TODO
# plt.ylabel()  # TODO

plt.legend()

plt.show()
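
For the multi-city histogram, a sketch along the same lines follows, again on invented data. The `alpha=0.5` transparency is an optional design choice rather than something the exercise requires; it keeps overlapping distributions visible when they share the same axes.

```python
import matplotlib.pyplot as plt
import pandas as pd

selected_cities = ["Munich", "Buenos Aires", "Tokyo"]

# Made-up data: 10 days of synthetic readings per city.
dates = pd.date_range("2008-01-01", periods=10, freq="D").date
frames = [
    pd.DataFrame({
        "City": name,
        "Date": dates,
        "AvgTemperatureCelsius": [offset * 5.0 + day for day in range(10)],
    })
    for offset, name in enumerate(selected_cities)
]
sample_df = pd.concat(frames, ignore_index=True)

start_date = pd.to_datetime("2008-01-01").date()
end_date = pd.to_datetime("2010-12-31").date()

plt.figure(figsize=(15, 5))

for city in selected_cities:
    city_df = sample_df[sample_df["City"] == city]
    in_period = (city_df["Date"] >= start_date) & (city_df["Date"] <= end_date)
    city_df_period = city_df[in_period]
    # Semi-transparent bars so overlapping distributions stay readable.
    plt.hist(city_df_period["AvgTemperatureCelsius"], bins=20, alpha=0.5, label=city)

plt.title("Distribution of daily average temperature by city")
plt.xlabel("Average temperature (°C)")
plt.ylabel("Number of days")
legend = plt.legend()

plt.show()
```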
