![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Callysto's Weekly Data Visualization

## Rental Prices in Canada

### Recommended Grade Levels 8 - 12

### Instructions

Click "Cell" and select "Run All".

This will import the data and run all the code, so you can see this week's data visualization. Scroll back to the top after you’ve run the cells.

![instructions](https://github.com/callysto/data-viz-of-the-week/blob/main/images/instructions.png?raw=true)

**You don't need to do any coding to view the visualizations**.

The plots generated in this notebook are interactive. You can hover over and click on elements to see more information. 

Email contact@callysto.ca if you experience issues.

## About this Notebook

Callysto's Weekly Data Visualization is a learning resource that aims to develop data literacy skills. We provide Grades 5-12 teachers and students with a data visualization, like a graph, to interpret. This companion resource walks learners through how the data visualization is created and interpreted by a data scientist.

The steps of the data analysis process are listed below and applied to each weekly topic.

1. Question - What are we trying to answer?
2. Gather - Find the data source(s) you will need.
3. Organize - Arrange the data, so that you can easily explore it.
4. Explore - Examine the data to look for evidence to answer the question. This includes creating visualizations.
5. Interpret - Describe what's happening in the data visualization.
6. Communicate - Explain how the evidence answers the question.


## Question 

How do rental prices in Canada compare by province and year?




## Gather

### Code 

Run the code cells below to import the libraries we need for this project. Libraries are pre-made code that make it easier to analyze our data. 

Description of libraries:

* `pandas` is a library that helps us with data analysis

* `plotly.express` and `plotly.graph_objects` are libraries that help us make visualizations

* `sklearn` (scikit-learn) is a library that helps us do some machine learning

Without importing these libraries we would have to use much more code to analyze our data and generate visualizations. We import the libraries with abbreviations, or aliases, so that we have less typing to do in each line of our code below.


In [1]:
import pandas as pd
import plotly.express as px
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from plotly.subplots import make_subplots
import plotly.graph_objects as go

import warnings
warnings.filterwarnings("ignore")


### Data

#### Import the Data

We are using data from the [Government of Canada](https://open.canada.ca/data/en/dataset/18b0c898-393f-4465-bb2a-31c922ad4d86). This dataset contains information on the average rent in areas with a population of over **10,000**, from **1987 to 2022**.

`▶Run` the cell below to import the data. 

In [5]:
dataset = pd.read_csv('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/canada-rent/canada-rent.csv')

dataset

Unnamed: 0,REF_DATE,GEO,DGUID,Type of structure,Type of unit,UOM,UOM_ID,SCALAR_FACTOR,SCALAR_ID,VECTOR,COORDINATE,VALUE,STATUS,SYMBOL,TERMINATED,DECIMALS
0,1987,"Bay Roberts, Newfoundland and Labrador",2011S0504005,Row and apartment structures of three units an...,Bachelor units,Dollars,81,units,0,v42135513,192.3.1,,..,,,0
1,1987,"Bay Roberts, Newfoundland and Labrador",2011S0504005,Row and apartment structures of three units an...,One bedroom units,Dollars,81,units,0,v42135529,192.3.2,,..,,,0
2,1987,"Bay Roberts, Newfoundland and Labrador",2011S0504005,Row and apartment structures of three units an...,Two bedroom units,Dollars,81,units,0,v42135545,192.3.3,,..,,,0
3,1987,"Bay Roberts, Newfoundland and Labrador",2011S0504005,Row and apartment structures of three units an...,Three bedroom units,Dollars,81,units,0,v42135561,192.3.4,,..,,,0
4,1987,"Bay Roberts, Newfoundland and Labrador",2011S0504005,Row structures of three units and over,Bachelor units,Dollars,81,units,0,v42135577,192.2.1,,..,,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
122355,2022,"Yellowknife, Northwest Territories",2011S0504995,Apartment structures of three units and over,Three bedroom units,Dollars,81,units,0,v3824416,188.1.4,2133.0,,,,0
122356,2022,"Yellowknife, Northwest Territories",2011S0504995,Apartment structures of six units and over,Bachelor units,Dollars,81,units,0,v3824602,188.4.1,1279.0,,,,0
122357,2022,"Yellowknife, Northwest Territories",2011S0504995,Apartment structures of six units and over,One bedroom units,Dollars,81,units,0,v3824790,188.4.2,1563.0,,,,0
122358,2022,"Yellowknife, Northwest Territories",2011S0504995,Apartment structures of six units and over,Two bedroom units,Dollars,81,units,0,v3824978,188.4.3,1820.0,,,,0


#### Cleaning the Data

Next we want to clean up the data so that we have information that is actually valuable for us. 

The cleanup we'll be doing includes:
- `Renaming columns` to give them a more meaningful name
- `Creating new columns` based on the values of other columns
- `Removing rows` that don't contain a value
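
As a minimal sketch of these three steps, the example below applies them to a tiny made-up table. The column names match the real data, but the rent values are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Tiny made-up table with the same column names as the real data (values are invented)
toy = pd.DataFrame({
    'REF_DATE': [2020, 2020, 2021],
    'GEO': ['Gander, Newfoundland and Labrador', 'Calgary, Alberta', 'Calgary, Alberta'],
    'VALUE': [700.0, None, 1200.0],
})

# Removing rows: keep only the rows that have a rent value
toy = toy[toy['VALUE'].notna()].copy()

# Renaming columns: give them more meaningful names
toy = toy.rename(columns={'REF_DATE': 'YEAR', 'VALUE': 'MONTHLY RENT'})

# Creating new columns: the province is the text after the last ", " in GEO
toy['PROVINCE'] = toy['GEO'].map(lambda x: x.split(', ')[-1])

print(toy)
```

The row with the missing rent is gone, and each remaining row now has its own `PROVINCE` value.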

`▶Run` the cell below to clean the data.

In [6]:
#dictionaries that we'll use to create columns which can be used for machine learning
prov_codes = {'Alberta': 403,'Saskatchewan':639,'Saskatchewan/Alberta':403,'Quebec':514,'Ontario':249,'Ontario/Quebec':249,'Manitoba':204,'New Brunswick':506,'New Brunswick/Quebec':506,'Newfoundland and Labrador':709,'Northwest Territories':867,'Prince Edward Island':902,'Nova Scotia':902,'British Columbia':778}
Rooms = {'Bachelor units':0,'One bedroom units':1,'Two bedroom units':2,'Three bedroom units':3}

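# keep only the rows that have a rent value (.copy() gives us an independent table to modify)
dataset = dataset[dataset['VALUE'].notna()].copy()
dataset = dataset.rename(columns={'REF_DATE':'YEAR','VALUE':'MONTHLY RENT'})
# the province is the text after the last ', ' in the GEO column
dataset['PROVINCE'] = dataset['GEO'].map(lambda x: x.split(', ')[-1])
# fix a misspelling that appears in the source data
dataset['PROVINCE'] = dataset['PROVINCE'].replace('Saskachewan/Alberta','Saskatchewan/Alberta')
dataset['PROVINCE CODE'] = dataset['PROVINCE'].map(lambda x: prov_codes[x])
dataset['# of Rooms'] = dataset['Type of unit'].map(lambda x:Rooms[x])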

dataset

Unnamed: 0,YEAR,GEO,DGUID,Type of structure,Type of unit,UOM,UOM_ID,SCALAR_FACTOR,SCALAR_ID,VECTOR,COORDINATE,MONTHLY RENT,STATUS,SYMBOL,TERMINATED,DECIMALS,PROVINCE,PROVINCE CODE,# of Rooms
34,1987,"Corner Brook, Newfoundland and Labrador",2011S0504015,Apartment structures of six units and over,Two bedroom units,Dollars,81,units,0,v3824924,2.4.3,480.0,,,,0,Newfoundland and Labrador,709,2
49,1987,"Gander, Newfoundland and Labrador",2011A00051006009,Apartment structures of six units and over,One bedroom units,Dollars,81,units,0,v3824688,3.4.2,370.0,,,,0,Newfoundland and Labrador,709,1
50,1987,"Gander, Newfoundland and Labrador",2011A00051006009,Apartment structures of six units and over,Two bedroom units,Dollars,81,units,0,v3824876,3.4.3,414.0,,,,0,Newfoundland and Labrador,709,2
51,1987,"Gander, Newfoundland and Labrador",2011A00051006009,Apartment structures of six units and over,Three bedroom units,Dollars,81,units,0,v3825064,3.4.4,414.0,,,,0,Newfoundland and Labrador,709,3
69,1987,"Labrador City, Newfoundland and Labrador",2011A00051010032,Apartment structures of six units and over,One bedroom units,Dollars,81,units,0,v3824749,5.4.2,254.0,,,t,0,Newfoundland and Labrador,709,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
122355,2022,"Yellowknife, Northwest Territories",2011S0504995,Apartment structures of three units and over,Three bedroom units,Dollars,81,units,0,v3824416,188.1.4,2133.0,,,,0,Northwest Territories,867,3
122356,2022,"Yellowknife, Northwest Territories",2011S0504995,Apartment structures of six units and over,Bachelor units,Dollars,81,units,0,v3824602,188.4.1,1279.0,,,,0,Northwest Territories,867,0
122357,2022,"Yellowknife, Northwest Territories",2011S0504995,Apartment structures of six units and over,One bedroom units,Dollars,81,units,0,v3824790,188.4.2,1563.0,,,,0,Northwest Territories,867,1
122358,2022,"Yellowknife, Northwest Territories",2011S0504995,Apartment structures of six units and over,Two bedroom units,Dollars,81,units,0,v3824978,188.4.3,1820.0,,,,0,Northwest Territories,867,2


We will also group the data by province, year, and type of unit, and take the average of each group. This will help us better answer the question we defined at the beginning of the notebook.

In [None]:
grouped = dataset.groupby(['PROVINCE','YEAR','Type of unit'])[['MONTHLY RENT','PROVINCE CODE','# of Rooms']].mean().reset_index()
grouped
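
To see what grouping and averaging does, here is a small sketch on a made-up table (the rent values are invented). Two one-bedroom rows from the same province and year are combined into a single row holding their average.

```python
import pandas as pd

# Made-up rents for one province and year (values are invented)
toy = pd.DataFrame({
    'PROVINCE': ['Alberta', 'Alberta', 'Alberta'],
    'YEAR': [2022, 2022, 2022],
    'Type of unit': ['One bedroom units', 'One bedroom units', 'Two bedroom units'],
    'MONTHLY RENT': [1000.0, 1200.0, 1500.0],
})

# One row per (province, year, unit type), holding the average rent of that group
toy_grouped = toy.groupby(['PROVINCE', 'YEAR', 'Type of unit'])[['MONTHLY RENT']].mean().reset_index()
print(toy_grouped)
```

The two one-bedroom rents (1000 and 1200) become a single average of 1100, while the lone two-bedroom rent stays at 1500.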

## Explore

The first visualization is a line graph of how rent prices have changed over the years for a specific province. Run the code below to generate this visualization. You can also change the province we are exploring by editing the first line of the code.

In [None]:
province = 'Alberta' #change this to any province you'd like

province_grouped = grouped.loc[grouped['PROVINCE'] == province]
px.line(province_grouped,x='YEAR',y='MONTHLY RENT',color='Type of unit', title='Average Monthly Rent in ' + province + ' over Time')

We can also look at how rent across Canada changes from year to year. We can create a bar graph where the x axis is the year and the y axis is the average rent price. To have this visualization provide even more information, we can create a stacked bar graph where each color corresponds to a province. We can double-click on a specific province in the legend to see how the rent prices have changed for that province.

In [None]:
average_per_province = grouped.groupby(['PROVINCE','YEAR'])['MONTHLY RENT'].mean().reset_index()
px.bar(average_per_province,x='YEAR',y='MONTHLY RENT',color='PROVINCE',title='Average Monthly Rent by Year')
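
One thing to keep in mind when reading a stacked bar graph: the total height of each bar is the sum of the provincial averages, not a Canada-wide average. The sketch below shows the difference using made-up numbers for two provinces. The variable names are only for illustration, and the mean shown is a simple average of the provincial averages that does not account for how many rentals each province has.

```python
import pandas as pd

# Made-up provincial averages (values are invented)
toy = pd.DataFrame({
    'PROVINCE': ['Alberta', 'Ontario', 'Alberta', 'Ontario'],
    'YEAR': [2021, 2021, 2022, 2022],
    'MONTHLY RENT': [1000.0, 1400.0, 1100.0, 1500.0],
})

# Total height of each stacked bar: the provincial averages added together
stacked_height = toy.groupby('YEAR')['MONTHLY RENT'].sum()

# A simple average across provinces for each year
average_of_provinces = toy.groupby('YEAR')['MONTHLY RENT'].mean()

print(stacked_height)
print(average_of_provinces)
```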

Let's take the previous visualization one step further. We'll now add the ability to animate through the years and split up the rent prices by type of unit.

In [None]:
px.bar(grouped,x='PROVINCE',y='MONTHLY RENT',color='Type of unit',barmode='group',animation_frame='YEAR',title='Average Monthly Rent Animation')

The next visualization is another bar graph, but it only looks at a specific year. You can change the year being explored by changing the first line in the code below.

In [None]:
specific_year = 2022 # you can change this number to look at a specific year

year_info = grouped.loc[grouped['YEAR'] == specific_year]
year_info.sort_values('MONTHLY RENT',ascending=False,inplace=True)
px.bar(year_info,x='PROVINCE',y='MONTHLY RENT',color='Type of unit',barmode='group',title='Average Monthly Rent in ' + str(specific_year) + ' for each Canadian Province')

Now let's try to make some predictions about rent prices using machine learning! First we identify what we want to predict, `MONTHLY RENT`, and set that as our target. The next step is to figure out what affects rent prices. In our model below, we will use `YEAR`, `PROVINCE CODE` and `# of Rooms` as features, since they seem likely to affect how much rent would be.

We will use a `LinearRegression` model and make a prediction based on a line of best fit. Then we'll look at the predictions and true values for a specific province we're interested in (which you can change to any province or territory listed in `prov_codes`) and create a dataframe containing all the information we'll need to plot our prediction.

In [None]:
province_of_interest = 'Ontario' #change this to any province you're interested in

target = grouped['MONTHLY RENT']
features = grouped[['YEAR','PROVINCE CODE','# of Rooms']]

X_train, X_test, Y_train, Y_test = train_test_split(features,target,test_size=0.33,random_state=42)

model = LinearRegression().fit(X_train,Y_train)
y_pred = model.predict(X_test)

X_test['Y_test'] = Y_test
X_test['Y_pred'] = y_pred

province_code = prov_codes[province_of_interest]

X_test = X_test[X_test['PROVINCE CODE'] == province_code]
X_test
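
To get a feel for what a "line of best fit" is, here is a minimal sketch in plain Python that uses only one feature (the year) and made-up rent values. The model above uses three features and scikit-learn, so this is a simplified illustration rather than the same calculation. The helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = how much y changes, on average, when x goes up by 1
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

years = [2018, 2019, 2020, 2021]
rents = [1000.0, 1050.0, 1100.0, 1150.0]  # invented: rent rises by $50 each year

slope, intercept = fit_line(years, rents)

# Use the line to predict a year that was not in the data
predicted_2022 = slope * 2022 + intercept
print(round(slope, 2), round(predicted_2022, 2))
```

Because the invented rents rise by exactly $50 per year, the fitted slope is 50 and the prediction for 2022 is $1200. Real rent data is messier, so the line will not pass through every point.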


Finally, let's plot line graphs to compare our predictions with the real rent value for each unit type. The **solid blue** line is the actual rent price, while the **dashed red** line is the prediction that our machine learning model gave us.

In [None]:
X_test.sort_values(['YEAR','# of Rooms'],inplace=True)
fig = make_subplots(rows=4,cols=1,shared_xaxes=True,subplot_titles=('Bachelor Units','One Bedroom Units','Two Bedroom Units','Three Bedroom Units'))

for i in range(4):
    unit_rows = X_test[X_test['# of Rooms'] == i]  # rows for this unit type
    # solid blue line: actual rent
    fig.add_trace(go.Scatter(x=unit_rows['YEAR'], y=unit_rows['Y_test'], mode='lines', line=dict(color='royalblue', width=3)), row=i+1, col=1)
    # dashed red line: predicted rent
    fig.add_trace(go.Scatter(x=unit_rows['YEAR'], y=unit_rows['Y_pred'], mode='lines', line=dict(color='firebrick', width=4, dash='dash')), row=i+1, col=1)

fig.update_layout(height=1000,width=1000,title_text='Prediction vs Actual Rent in ' + province_of_interest,showlegend=False)
fig.show()
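
Besides comparing the lines by eye, one common way to put a number on how close the predictions are is the mean absolute error: the average size of the gap between the actual and predicted rent. This is an optional extra, not part of the notebook's own analysis. A small sketch with invented numbers follows (in the notebook, the matching columns would be `Y_test` and `Y_pred`).

```python
# Invented values for illustration only
actual = [1000.0, 1100.0, 1250.0]
predicted = [950.0, 1150.0, 1200.0]

# Average of the absolute differences between actual and predicted rent
mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
print(mae)
```

Here every prediction is off by $50, so the mean absolute error is 50. A smaller number means the predictions are closer to the real values.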


## Interpret

### Reflect on What You See

After making your visualizations, the next step is to use the data and your visualizations to answer the question. Look at and interact with the visualizations above. When you hover your mouse over the plots, you’ll notice more information appears. You can also use the legend to make plots appear and disappear.

Think about the following questions.

- What do you notice about these graphs?
- What do you wonder about the data?
- What kind of inferences can you make based on this data?
- Is there another way to visualize this data that would change your interpretation of the information?

- Did rent seem to rise faster in some years than in other years?
- Did some types of unit cost more than other types of unit?
- Were some parts of the country more expensive to rent in than others?

Use the fill-in-the-blank prompts to summarize your thoughts.

- "I used to think _______"
- "Now I think _______"
- "I wish I knew more about _______"
- "These data visualizations remind me of _______"
- "I really like _______"

## Communicate 

How can you communicate that information? What kind of product could you create to share that information with your school community and wider community?

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)