![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)
<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fdata-viz-of-the-week&branch=main&subPath=climate-change-temperature/temperature.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Callysto’s Weekly Data Visualization

## Climate Change Evidence - Temperatures

### Recommended grade level: 7-12

### Instructions:
#### “Run” the cells to see the graphs
Click “Cell” and select “Run All”.<br> This will import the data and run all the code, so you can see this week's data visualization (scroll to the top after you’ve run the cells).<br> **You don’t need to do any coding**.

![instructions](https://github.com/callysto/data-viz-of-the-week/blob/main/images/instructions.png?raw=true)

### About The Notebook

Callysto's Weekly Data Visualization is a learning resource that aims to develop data literacy skills. We provide grades 5-12 teachers and students with a data visualization, like a graph, to interpret. This companion resource walks learners through how the data visualization is created and interpreted by a data scientist. 

The steps of the data analysis process are listed below and applied to each weekly topic.

1. Question - What are we trying to answer? 
2. Gather - Find the data source(s) you will need. 
3. Organize - Arrange the data so that you can easily explore it. 
4. Explore - Examine the data to look for evidence to answer our question. This includes creating visualizations. 
5. Interpret - Explain how the evidence answers our question. 
6. Communicate - Reflect on the interpretation. 

## 1. Question

Have you ever wondered whether you can see evidence of climate change? 

For instance, since the 1714 invention of the mercury thermometer (by Daniel Gabriel Fahrenheit), people have been accurately recording temperature and weather data around the world. 

Can we use this data to see any trends in the recorded temperatures in various cities?

### Goal

Our goal is to show the temperature changes over approximately a century using climate data we can access online.

We will fit a "trend line", also known as a regression line, to quantify change, if any.

There are lots of places to find temperature data. We aim to automate the procedure, so we can easily access data for several cities.

Our focus will be on Canadian cities.

## 2. Gather

### Code:

The code below will import the Python programming libraries we need to gather and organize the data to answer our question.

In [None]:
%pip install -r requirements.txt
import pyodide_http
pyodide_http.patch_all()

import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio

### Data:

There are several online sources for weather data, including the Environment Canada site:  https://climate.weather.gc.ca/historical_data/search_historic_data_e.html

We find it more convenient to access the data from **weatherstats.ca**, which is based on Environment and Climate Change Canada data: https://edmonton.weatherstats.ca/

This is a student-friendly web page. Here, we find the temperature data for several years and can simply cut-and-paste the data into a spreadsheet. It includes data for all the Canadian provincial and territorial capitals, plus Montreal, Toronto, and Vancouver.

The data can be downloaded as a spreadsheet or CSV (comma separated values) file, here:

https://edmonton.weatherstats.ca/download.html

We can select the **Climate Daily** item and set the number of rows to 80,000, in order to get several decades' worth of data.

For your convenience, we have already downloaded the following files to accompany this notebook:
- weatherstats_edmonton_daily.csv
- weatherstats_fredericton_daily.csv
- weatherstats_toronto_daily.csv
- weatherstats_vancouver_daily.csv
- weatherstats_yellowknife_daily.csv

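The file names above follow one pattern, and the cells in this notebook read them from a single GitHub folder. As a small sketch (the helper name `city_csv_url` is our own, not part of any library), the address for any of the five cities can be built like this:

```python
# Base folder used by the read_csv calls in this notebook
BASE_URL = ('https://github.com/callysto/data-files/raw/main/'
            'data-viz-of-the-week/climate-change-temperature/data/')

def city_csv_url(city):
    """Return the address of the daily weather file for one city (hypothetical helper)."""
    return BASE_URL + 'weatherstats_' + city + '_daily.csv'

print(city_csv_url('edmonton'))
```
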
### Importing the data

This next line of code will read the data file and save it in a dataframe named **df**.

In [None]:
## import data
df = pd.read_csv('https://github.com/callysto/data-files/raw/main/data-viz-of-the-week/climate-change-temperature/data/weatherstats_edmonton_daily.csv',low_memory=False)
df

### Comment on the data
We see in the above printout that there are over 50,000 rows of data going back to the year 1880, containing information about dates, temperatures, humidex, windchill and more. 

It is useful to quickly plot some of the data (1000 rows) just to see what the temperature data looks like. The following line of code plots about three years of temperature data. 

In [None]:
px.scatter(df[0:1000],x="date",y="avg_temperature",title="Edmonton daily temperature")

### Comments on the plot

Notice the seasonal variation in the temperature in Edmonton. In the summer months (July), the temperature rises into the low 20s (degrees Celsius), while in the winter (January) it drops to about -20 degrees. 

## 3. Organize

The code below will arrange the data cleanly, so we can analyze it. This is a quality control step for our data and involves examining the data to detect anything odd with the data (e.g., structure and missing values), fixing the oddities, and checking if the fixes worked. 

First, we add a **relative day** column to indicate how many days have elapsed over the history of the data collection. This will be useful when doing the mathematics for data analysis, since numbers are more useful than text strings indicating a day/month/year. 

In [None]:
df['rel_day']= (pd.to_datetime(df['date'])-pd.to_datetime(df['date'][0])).dt.days

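To see what this line does, here is a tiny made-up example with three dates (not real weather data). In this sample the newest date is listed first, so earlier dates come out as negative numbers of days.

```python
import pandas as pd

# Made-up sample: three dates, newest first
sample = pd.DataFrame({'date': ['2020-03-01', '2020-02-29', '2020-01-01']})

# Days elapsed relative to the date in the first row
dates = pd.to_datetime(sample['date'])
sample['rel_day'] = (dates - dates[0]).dt.days

print(sample)
```
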
Second, we need to remove the rows that have no data in them. On some days a **NaN** (Not a Number) is recorded because no data was available. For instance, you will find missing data on dates during World War I and World War II, as well as at the earliest dates in the data file.

The following line identifies those rows with **NaN** and removes them.

In [None]:
df = df[df['avg_temperature'].isnull() == False]
df

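Here is a small sketch of the same filtering step on made-up values, so the effect is easy to see. It uses `notna()`, which keeps the rows where a value is present and is equivalent to the `isnull() == False` test above.

```python
import numpy as np
import pandas as pd

# Made-up sample with one missing temperature
sample = pd.DataFrame({
    'date': ['2020-01-03', '2020-01-02', '2020-01-01'],
    'avg_temperature': [-12.5, np.nan, -8.0],
})

# Keep only the rows where a temperature was recorded
cleaned = sample[sample['avg_temperature'].notna()]

print(len(sample), 'rows before,', len(cleaned), 'rows after')
```
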
### Comment on the data

We notice now there are only 51,151 rows in the dataframe, when we started with 51,338. So some rows have indeed been removed. 

## 4. Explore

The code below will be used to help us look for evidence to answer our question. This can involve looking at data in table format, applying math and statistics, and creating different types of visualizations to represent our data.

In this example, we compute a **best fit** trend line to the data. This line will be in the form
$$ y = mx + b$$
where $x$ is the relative day, $y$ is the temperature, $m$ is the slope of the line and $b$ is the y-intercept. The fit returns these last two values to us in the variables m and b.

In [None]:
# data exploration
m,b = np.polyfit(df['rel_day'],df['avg_temperature'],1)
m,b

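For the curious, a best fit line can also be computed by hand. The sketch below uses the standard least-squares formulas, $m = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^2}$ and $b = \bar{y} - m\bar{x}$, on a few made-up points. The function name `best_fit_line` is our own; a degree-1 `np.polyfit` solves the same least-squares problem.

```python
def best_fit_line(xs, ys):
    """Return slope m and intercept b of the least-squares line (illustrative helper)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # slope: how x and y vary together, divided by how much x varies
    top = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    bottom = sum((x - x_mean) ** 2 for x in xs)
    m = top / bottom
    b = y_mean - m * x_mean
    return m, b

# Made-up points that lie exactly on the line y = 2x + 1
m_demo, b_demo = best_fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m_demo, b_demo)
```
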
## 5. Interpret

Below we will discuss the results of the data exploration. 

- Describe what’s happening in the data visualization (graph). What do you notice (e.g. big or small values, or trends)? 
- How does our key evidence help answer our question?

### What do m and b tell us?

After all this work, we discovered **m** = 0.000050277 and **b** = 4.485. What does this mean?

First, **b** is just a temperature (**b** = 4.485 degrees): the value of the trend line where `rel_day` is 0, which is the final day in the record. More interesting is **m**, the slope of the trend line. It is positive, which tells us the temperature is tending to increase. The exact value tells us how much the temperature is increasing **per day**, on average. 

It is more useful to change this to a number indicating how much the temperature is increasing, per century. We just multiply **m** by the number of days in a year (365.25) and by the number of years in a century (100). 

In [None]:
m*365.25*100

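The unit conversion can be written out step by step. This sketch uses the slope value quoted above; the variable names are our own.

```python
m_per_day = 0.000050277    # slope from the Edmonton fit, in degrees per day
days_per_year = 365.25     # average year length, allowing for leap years
years_per_century = 100

m_per_century = m_per_day * days_per_year * years_per_century
print(round(m_per_century, 3), 'degrees Celsius per century')
```
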
### In other words,

Based on the Edmonton data, the temperature is rising by about 1.836 degrees Celsius per century. 

## 5 a) Recap

Let's redo the calculation, in one code block.


In [None]:
## 5 lines of code to do it all
df = pd.read_csv('https://github.com/callysto/data-files/raw/main/data-viz-of-the-week/climate-change-temperature/data/weatherstats_edmonton_daily.csv',low_memory=False)
df['rel_day']= (pd.to_datetime(df['date'])-pd.to_datetime(df['date'][0])).dt.days
df = df[df['avg_temperature'].isnull() == False]
m,b = np.polyfit(df['rel_day'],df['avg_temperature'],1)
print(m*365.25*100,'degrees Celsius rise per century, in Edmonton')

## 5 b) Many cities

Now that we know how to do the analysis for one city, let's do it for the five cities where we have data. We can do this by writing a loop in Python. 

We also compare the rise of the max and min temperatures, as well as the average temperature, since we have that data available. It will be a good check for consistency. 

To be comparable, we use the same length of time for each city. We also want to have the data start and end at the same month of the year, for a robust result. This avoids the problem of starting the data in a cold month, and ending in a hot month, which could skew the results. 

We will use 75 years of data, or 75x365.25 days to be exact.

In [None]:
for city in ['edmonton','toronto','vancouver','fredericton','yellowknife']:
    df = pd.read_csv('https://github.com/callysto/data-files/raw/main/data-viz-of-the-week/climate-change-temperature/data/weatherstats_' + city + '_daily.csv',low_memory=False)
    df['rel_day']= (pd.to_datetime(df['date'])-pd.to_datetime(df['date'][0])).dt.days
    df = df[0:int(75*365.25)+1]  ## restrict to 75 years
    print(city, ' with ', int(df.shape[0]/365.25), ' years of data')
    for temp in ['max_temperature','min_temperature','avg_temperature','avg_hourly_temperature']:
        df = df[df[temp].isnull() == False]
        m,b = np.polyfit(df['rel_day'],df[temp],1)
        print(f'    Temperature increase per century {m*365.25*100:.2f} degrees C ({temp})')
 

## Comment on the results above

Notice all the cities are showing an increasing trend in temperature. Yellowknife shows the largest increase (3.5 degrees per century), while Fredericton shows the smallest (about 1.0 degrees per century).

Climate scientists worry when the increase in global average temperature exceeds 2 degrees Celsius above pre-industrial levels. 

## 5 c) Visualization

Let's plot the data and see how it looks compared to the trend line.

In [None]:
# evaluate the trend line at each row's own rel_day so it lines up with the dates
y_range = m*df['rel_day'] + b

fig = px.scatter(df, x='date', y=temp, opacity=0.05)
fig.add_traces(go.Scatter(x=df['date'], y=y_range, name='Trend Line'))
fig.update_layout(title_text=city.capitalize()+f' daily temperature, increasing at {m*365.25*100:.2f} deg/century')
fig.show()

### Comment on the plot
We see the increasing trend over the 75 years of data (1955-2020), although the slope of the line is small.

Let's look at the data over five years only.

In [None]:
df = df[0:int(5*365.25)]

# evaluate the trend line at each row's own rel_day so it lines up with the dates
y_range = m*df['rel_day'] + b

fig = px.scatter(df, x='date', y=temp, opacity=0.25, title=city+f' trend, {m*365.25*100:.2f} deg/century')
fig.add_traces(go.Scatter(x=df['date'], y=y_range, name='Trend Line'))
fig.update_layout(title_text=city.capitalize()+f' daily temperature, increasing at {m*365.25*100:.2f} deg/century')
fig.show()

### Comment on the plot

We clearly see the seasonal (up/down) variations in the temperature over five years. The trend line does go through the middle of the data. However, it is hard to see much of an increase in the trend, over such a short period.

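A little arithmetic shows why the increase is hard to see. As an illustration we use the Edmonton figure of about 1.836 degrees per century found earlier; the value for other cities and temperature measures will differ.

```python
rise_per_century = 1.836    # degrees Celsius per century (Edmonton result, for illustration)
years_shown = 5

rise_over_window = rise_per_century * years_shown / 100
print(f'Expected rise over {years_shown} years: {rise_over_window:.2f} degrees Celsius')
```

A change of under a tenth of a degree is tiny next to seasonal swings of tens of degrees.
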
## 5 d) A better visualization

With the seasonal variations, the temperature goes up and down a lot. It can be hard to see that a trend line is meaningful.

Let's try another analysis and visualization. Instead of plotting daily temperature, let's plot the average **yearly** temperatures. We can find a trend line for these averages, and plot this. 

The dataframe code allows us to loop through each year of data and compute the average, or mean, temperature for that year. The code is as follows:

In [None]:
# first we load in the data for one city
city = 'yellowknife'
temp = 'avg_temperature'

df = pd.read_csv('https://github.com/callysto/data-files/raw/main/data-viz-of-the-week/climate-change-temperature/data/weatherstats_' + city + '_daily.csv',low_memory=False)
df = df[df[temp].isnull() == False]

# Next we compute the mean (average) yearly temperature, for years 1946 to 2020
x=[]
y=[]
for year in range(1946,2021):
    x = x + [year]
    y = y + [df[df['date'].str.slice(0,4)==str(year)][temp].mean()]

# Finally we find the "trend line" and then plot the result


fig = px.scatter( x=x, y=y, opacity=1)
fig.update_xaxes(title_text='Year')
fig.update_yaxes(title_text='Annual Average Temp (Degs Celsius)')
fig.update_layout(title_text=city.capitalize()+' annual temperature')

try:
    m,b= np.polyfit(x,y,1)
    fig.add_traces(go.Scatter(x=x, y=m*np.array(x)+b, name='Trend Line'))
    fig.update_layout(title_text=city.capitalize()+f' annual temperature, increasing at {(m*100):.2f} deg/century')
except Exception:
    print("Python could not calculate best fit line.")


fig.show()

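The year-by-year loop above can also be written with pandas `groupby`. This is a sketch on made-up values, offered as an alternative and not as the notebook's own method. It groups rows by the first four characters of the date, the same way the loop picks out each year.

```python
import pandas as pd

# Made-up sample: two readings in each of two years
sample = pd.DataFrame({
    'date': ['2020-07-01', '2020-01-01', '2019-07-01', '2019-01-01'],
    'avg_temperature': [22.0, -18.0, 20.0, -20.0],
})

# Group by the year part of the date string and average each group
year = sample['date'].str.slice(0, 4)
yearly = sample.groupby(year)['avg_temperature'].mean()

print(yearly)
```
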
## Comments on the graph above

- Notice the range of temperatures (on the y axis) is much smaller than in earlier graphs. Here, it ranges from -7 to -1 degrees Celsius.

- Notice we clearly see an increasing trend in the data points, which the trendline follows quite well. 

- Notice how the average annual temperature around 1950 is about -6 degrees, while in the period 2010-2020 it is about -3.5 degrees. So we do see the increase in average temperature. 


## Customization, part one

Let's repeat the above three plots, letting the code select any one of the cities (Edmonton, Toronto, etc.) and any one of the temperature readings (daily max temp, daily min temp, daily average, etc.).

The code below lets you select which city to plot and what type of temperature measure. It is easy to adjust: change the two index numbers marked `your choice here`.

In [None]:
city_list = ['edmonton','toronto','vancouver','fredericton','yellowknife']
temp_list = ['max_temperature','min_temperature','avg_temperature','avg_hourly_temperature']

city = city_list[0] ## your choice here (0,1,2,3 or 4)
temp = temp_list[2] ## your choice here (0,1,2, or 3)

df = pd.read_csv('https://github.com/callysto/data-files/raw/main/data-viz-of-the-week/climate-change-temperature/data/weatherstats_' + city + '_daily.csv',low_memory=False)
df['rel_day']= (pd.to_datetime(df['date'])-pd.to_datetime(df['date'][0])).dt.days
df = df[0:int(75*365.25)+1]  ## restrict to 75 years
print(city, ' with ', int(df.shape[0]/365.25), ' years of data')
df = df[df[temp].isnull() == False]

fig = px.scatter(df, x='date', y=temp, opacity=0.05)
fig.update_xaxes(title_text='Year')
fig.update_yaxes(title_text=temp + ' (Degs Celsius)')
fig.update_layout(title_text=city.capitalize()+f' daily temperatures')

try:
    m,b = np.polyfit(df['rel_day'],df[temp],1)
    # evaluate the trend line at each row's own rel_day so it lines up with the dates
    y_range = m*df['rel_day'] + b
    fig.add_traces(go.Scatter(x=df['date'], y=y_range, name='Trend Line'))
    fig.update_layout(title_text=city.capitalize()+f' daily temperatures, increasing at {m*365.25*100:.2f} deg/century')
except Exception:
    print("Python could not calculate best fit line.")

fig.show()

## Customization, part two

Let's do as above, but with five years of data.

You can choose which city to plot and what type of temperature measure by changing the two index numbers marked `your choice here`.

In [None]:
city_list = ['edmonton','toronto','vancouver','fredericton','yellowknife']
temp_list = ['max_temperature','min_temperature','avg_temperature','avg_hourly_temperature']

city = city_list[0] ## your choice here (0,1,2,3 or 4)
temp = temp_list[2] ## your choice here (0,1,2 or 3)

df = pd.read_csv('https://github.com/callysto/data-files/raw/main/data-viz-of-the-week/climate-change-temperature/data/weatherstats_' + city + '_daily.csv',low_memory=False)
df['rel_day']= (pd.to_datetime(df['date'])-pd.to_datetime(df['date'][0])).dt.days
df = df[0:int(75*365.25)+1]  ## restrict to 75 years
print(city, ' with ', int(df.shape[0]/365.25), ' years of data')
df = df[df[temp].isnull() == False]
m,b = np.polyfit(df['rel_day'],df[temp],1)

## Restrict to 5 years of data
df = df[0:int(5*365.25)]

# evaluate the trend line at each row's own rel_day so it lines up with the dates
y_range = m*df['rel_day'] + b

fig = px.scatter(df, x='date', y=temp, opacity=0.25)
fig.add_traces(go.Scatter(x=df['date'], y=y_range, name='Trend Line'))
fig.update_layout(title_text=city.capitalize()+f' daily temperatures, increasing at {m*365.25*100:.2f} deg/century')
fig.update_xaxes(title_text='Year')
fig.update_yaxes(title_text=temp + ' (Degs Celsius)')
fig.show()

## Customization, part three

Let's do the average yearly temperatures, for our choice of city and temperature readings. 

In [None]:
city_list = ['edmonton','toronto','vancouver','fredericton','yellowknife']
temp_list = ['max_temperature','min_temperature','avg_temperature','avg_hourly_temperature']

city = city_list[0] ## your choice here (0,1,2,3 or 4)
temp = temp_list[2] ## your choice here (0,1, or 2.  Number 3 might not work for all cities...)

df = pd.read_csv('https://github.com/callysto/data-files/raw/main/data-viz-of-the-week/climate-change-temperature/data/weatherstats_' + city + '_daily.csv',low_memory=False)
df = df[df[temp].isnull() == False]

# Here we compute the mean (average) yearly temperature, for years 1946 to 2020
x=[]
y=[]
for year in range(1946,2021):
    x = x + [year]
    y = y + [df[df['date'].str.slice(0,4)==str(year)][temp].mean()]


fig = px.scatter( x=x, y=y, opacity=1)
fig.update_xaxes(title_text='Year')
fig.update_yaxes(title_text='Annual Average Temp (Degs Celsius)')
fig.update_layout(title_text=city.capitalize()+f' annual temperature')
    
try:   
    # here we find the "trend line" and then plot the result
    m,b= np.polyfit(x,y,1)
    fig.add_traces(go.Scatter(x=x, y=m*np.array(x)+b, name='Trend Line'))
    fig.update_layout(title_text=city.capitalize()+f' annual temperature, increasing at {(m*100):.2f} deg/century')
except Exception:
    print("Python could not calculate line of best fit.")


fig.show()

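Since the three customization cells repeat the same steps, they could be collected into one function. The sketch below is one possible way to do that; the name `century_trend` and the made-up data are our own. The sample rises by exactly 2 degrees per century, so we can check that the function recovers that value.

```python
import numpy as np
import pandas as pd

def century_trend(df, column):
    """Return the fitted temperature change in degrees per century (illustrative helper)."""
    clean = df[df[column].notna()]                   # drop missing readings
    dates = pd.to_datetime(clean['date'])
    rel_day = (dates - dates.iloc[0]).dt.days        # days since the first kept row
    m, b = np.polyfit(rel_day, clean[column], 1)     # straight-line fit
    return m * 365.25 * 100

# Made-up data: ten years of daily values rising by 2 degrees per century
days = np.arange(3653)
sample = pd.DataFrame({
    'date': pd.date_range('2000-01-01', periods=len(days)).strftime('%Y-%m-%d'),
    'avg_temperature': 5 + 2 * days / (365.25 * 100),
})
sample.loc[10, 'avg_temperature'] = np.nan           # one missing reading

print(round(century_trend(sample, 'avg_temperature'), 2))
```

With the real files, the same function could be called once per city and temperature column instead of copying the cell.
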
## 6. Communicate


### Reflect on the interpretation

#### Cause and effect

What natural phenomena and human activities affect temperature increase?

How can temperature increase affect human activities and health?

#### Interrelationships

What connections exist between cities?

#### Ethics

How can personal and societal choices impact change?

How might temperature increase impact society or the economy?

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)