![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fdata-viz-of-the-week&branch=main&subPath=january-temperatures/january-temperatures.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# January Temperatures

January 2023 seemed quite warm, but was it the warmest January?

We'll use data from the [Government of Canada Historical Climate Data](https://climate.weather.gc.ca/) site for the [Edmonton International Airport weather station](https://climate.weather.gc.ca/climate_data/daily_data_e.html?hlyRange=1999-02-17%7C2023-02-05&dlyRange=1999-01-01%7C2023-02-05&mlyRange=1999-01-01%7C2007-11-01&StationID=27793&Prov=AB&urlExtension=_e.html&searchType=stnName&optLimit=specDate&StartYear=1840&EndYear=2023&selRowPerPage=25&Line=1&searchMethod=contains&Month=2&Day=1&txtStationName=Edmonton&timeframe=2&Year=2023).

In [None]:
%pip install -q pyodide_http plotly nbformat folium
import pyodide_http
pyodide_http.patch_all()
import pandas as pd
import plotly.express as px

station = 27793
df = pd.DataFrame()

for year in range(1999, 2024):
    url = f'https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID={station}&Year={year}&Month=1&Day=1&time=&timeframe=2&submit=Download+Data'
    yearly_data = pd.read_csv(url)
    yearly_data = yearly_data.dropna(subset=['Max Temp (°C)', 'Mean Temp (°C)', 'Min Temp (°C)'])
    df = pd.concat([df, yearly_data])

temperature_columns = ['Max Temp (°C)', 'Mean Temp (°C)', 'Min Temp (°C)']
january_temperatures = df[df['Month'] == 1]
temperatures = january_temperatures.groupby('Year')[temperature_columns].mean().reset_index()
fig = px.line(temperatures, x='Year', y=temperature_columns, title='January Temperatures in Edmonton',
              color_discrete_sequence=['red', 'green', 'blue'])
fig.update_layout(yaxis_title='Temperature (°C)')

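# Draw a dashed line at each series' highest value, in the same color as that series
for col, color in zip(temperature_columns, ['red', 'green', 'blue']):
    fig.add_hline(temperatures[col].max(), line_dash='dash', line_color=color)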

fig.show()

So it looks like January 2001 was the warmest January in this data (1999 to 2023) at the Edmonton International Airport, but we can also look at other weather stations or other months.
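Rather than reading the warmest year off the chart, we could also ask pandas for it. The following is a minimal sketch, assuming a table shaped like `temperatures` above; the values are made up for illustration and `warmest_year` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Hypothetical yearly means, for illustration only (not real station data)
temperatures_sample = pd.DataFrame({
    'Year': [2020, 2021, 2022, 2023],
    'Mean Temp (°C)': [-14.2, -9.8, -12.5, -8.1],
})

def warmest_year(yearly, column):
    """Return the (year, value) pair with the highest value in `column`."""
    # idxmax gives the index label of the row holding the maximum
    row = yearly.loc[yearly[column].idxmax()]
    return int(row['Year']), float(row[column])

year, value = warmest_year(temperatures_sample, 'Mean Temp (°C)')
print(f'Warmest mean: {year} ({value:.1f} °C)')
```

Calling the same helper with `'Max Temp (°C)'` or `'Min Temp (°C)'` on the real `temperatures` table should show whether all three measures peak in the same year.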

First, let's get a list of weather stations, and display a map of the ones that are still reporting daily weather data.

In [None]:
import folium
from folium.plugins import MarkerCluster
stations = pd.read_csv('https://raw.githubusercontent.com/callysto/data-files/main/Mathematics/StatisticsProject/AccessingData/stations.csv')
latitude = stations['Latitude'].mean()
longitude = stations['Longitude'].mean()
station_map = folium.Map(location=[latitude,longitude], zoom_start=3)
marker_cluster = MarkerCluster()
for row in stations[stations['DLY Last Year']>2022].itertuples():
    marker_cluster.add_child(folium.Marker(location=[row.Latitude,row.Longitude], popup=row.Name+' '+str(row._4)))
station_map.add_child(marker_cluster)
station_map

We can now click on a marker to find its `Station ID`, then put that value on the first line of the code cell below, where it says `station = `.

We've chosen the Fredericton Canadian Defence Academy (`station = 30309`), but you may want to choose one near where you live.
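If clicking through the map is slow, searching the station list by name is another option. This is a sketch under assumptions: it uses the `Name`, `Station ID`, and `DLY Last Year` columns seen above, but the rows below are invented placeholders and `find_stations` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical rows standing in for the stations table (names and IDs are made up)
stations_sample = pd.DataFrame({
    'Name': ['EXAMPLE AIRPORT', 'EXAMPLE CITY CENTRE', 'SAMPLE LAKE'],
    'Station ID': [101, 102, 103],
    'DLY Last Year': [2023, 2005, 2023],
})

def find_stations(stations, text, min_last_year=2023):
    """Return stations whose name contains `text` and that still report daily data."""
    # Case-insensitive plain-text match on the station name
    matches = stations['Name'].str.contains(text, case=False, regex=False)
    # Same idea as the map filter: keep only recently reporting stations
    active = stations['DLY Last Year'] >= min_last_year
    return stations[matches & active]

result = find_stations(stations_sample, 'example')
print(result[['Name', 'Station ID']])
```

With the real table, `find_stations(stations, 'fredericton')` would be the equivalent call.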

In [None]:
station = 30309
month = 1

start_year = int(stations[stations['Station ID']==station]['DLY First Year'].values[0])
end_year = int(stations[stations['Station ID']==station]['DLY Last Year'].values[0])
station_name = stations[stations['Station ID']==station]['Name'].values[0]
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
print('Getting data for', station_name, 'from', start_year, 'to', end_year, 'for', months[month-1])
df = pd.DataFrame()

for year in range(start_year, end_year+1):
    url = f'https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID={station}&Year={year}&Month={month}&Day=1&time=&timeframe=2&submit=Download+Data'
    yearly_data = pd.read_csv(url)
    yearly_data = yearly_data.dropna(subset=['Max Temp (°C)', 'Mean Temp (°C)', 'Min Temp (°C)'])
    df = pd.concat([df, yearly_data])

temperature_columns = ['Max Temp (°C)', 'Mean Temp (°C)', 'Min Temp (°C)']
temperatures = df.groupby('Year')[temperature_columns].mean().reset_index()

fig = px.line(temperatures, x='Year', y=temperature_columns, color_discrete_sequence=['red', 'green', 'blue'], title='Average Temperatures for '+station_name+' in '+months[month-1])
fig.update_layout(yaxis_title='Temperature (°C)')
fig.add_hline(temperatures['Max Temp (°C)'].max(), line_dash='dash', line_color='red')
fig.add_hline(temperatures['Mean Temp (°C)'].max(), line_dash='dash', line_color='green')
fig.add_hline(temperatures['Min Temp (°C)'].max(), line_dash='dash', line_color='blue')
fig.show()

It looks like January 2023 temperatures in Fredericton were close to the warmest of the last twenty years. Was it the same where you are?

## Comparing Locations

We can also compare two locations with the following code cell.

In [None]:
station1 = 27793
station2 = 49568
month = 1

start_year = max([int(stations[stations['Station ID']==station1]['DLY First Year'].values[0]), int(stations[stations['Station ID']==station2]['DLY First Year'].values[0])])
end_year = min([int(stations[stations['Station ID']==station1]['DLY Last Year'].values[0]), int(stations[stations['Station ID']==station2]['DLY Last Year'].values[0])])
station_name1 = stations[stations['Station ID']==station1]['Name'].values[0]
station_name2 = stations[stations['Station ID']==station2]['Name'].values[0]
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
print('Getting data for', station_name1, 'and', station_name2, 'from', start_year, 'to', end_year, 'for', months[month-1])
df = pd.DataFrame()
for year in range(start_year, end_year+1):
    url = 'https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID='+str(station1)+'&Year='+str(year)+'&Month='+str(month)+'&Day=1&time=&timeframe=2&submit=Download+Data'
    df = pd.concat([df, pd.read_csv(url)])
    url = 'https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID='+str(station2)+'&Year='+str(year)+'&Month='+str(month)+'&Day=1&time=&timeframe=2&submit=Download+Data'
    df = pd.concat([df, pd.read_csv(url)])
df

Now that we have data for those two locations, we can create a comparison visualization.

In [None]:
month = 1

# Make sure the temperatures are numeric before averaging, then drop years with no data
monthly = df[df['Month'] == month].copy()
monthly['Mean Temp (°C)'] = pd.to_numeric(monthly['Mean Temp (°C)'], errors='coerce')
temperatures = monthly.groupby(['Station Name', 'Year'])[['Mean Temp (°C)']].mean().reset_index()
temperatures = temperatures.dropna(subset=['Mean Temp (°C)'])

temperatures = temperatures.pivot(index='Year', columns='Station Name', values='Mean Temp (°C)')
px.line(temperatures, title='Average '+months[month-1]+' Temperatures at '+station_name1+' and '+station_name2).update_layout(yaxis_title='Mean Temperature (°C)').show()
temperatures

What are some similarities and differences that you see?
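One way to put numbers on those similarities and differences is to look at the year-by-year gap between the two stations and how closely they move together. This is only a sketch, assuming a pivoted table like `temperatures` above with one column per station; the station names and values below are invented, and `compare_stations` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical pivoted table: one row per year, one column per station (made-up values)
wide = pd.DataFrame(
    {'STATION A': [-12.0, -9.5, -14.0, -8.0],
     'STATION B': [-10.0, -8.5, -11.0, -7.0]},
    index=pd.Index([2020, 2021, 2022, 2023], name='Year'),
)

def compare_stations(table, first, second):
    """Return the yearly difference (first - second) and the Pearson correlation."""
    difference = table[first] - table[second]
    correlation = table[first].corr(table[second])
    return difference, correlation

difference, correlation = compare_stations(wide, 'STATION A', 'STATION B')
print('Average difference (°C):', difference.mean())
print('Correlation:', correlation)
```

A correlation near 1 would suggest the two locations tend to have warm and cold years at the same time, even if one is consistently colder than the other.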

How is the weather where you live?

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)