![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fdata-viz-of-the-week&branch=main&subPath=line-graph-holiday-lights/line-graph-holiday-lights.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Callysto’s Weekly Data Visualization

## Weekly Title

### Recommended grade level: 6-12

### Instructions:

#### Step 1 (your only step): “Run” the cells to see the graphs

Click “Cell” and select “Run All.” This will import the data and run all the code so you can see this week's data visualization (scroll to the top after you’ve run the cells). **You don’t need to do any coding**.

### About The Notebook:

Callysto's Weekly Data Visualization is a learning resource that helps Grades 5-12 teachers and students grow and develop data literacy skills. We do this by providing a data visualization, like a graph, and asking teachers and students to interpret it. This companion resource walks learners through how the data visualization is created and interpreted using the data science process. The steps of this process are listed below and applied to each weekly topic.

1. Question - What are we trying to answer? 
2. Gather - Find the data source(s) you will need. 
3. Organize - Arrange the data so that you can easily explore it. 
4. Explore - Examine the data to look for evidence to answer our question. This includes creating visualizations. 
5. Interpret - Explain how the evidence answers our question. 
6. Communicate - Reflect on the interpretation. 

## 1. Question
 
**How common is the use of decorative LED lights during the holidays?**
 
Light-emitting diode (LED) lights are recognized as being [more energy efficient](https://www.nrcan.gc.ca/energy/products/reference/15476) than incandescent bulbs because they emit less heat. This can make LEDs preferable for household lighting. But how common do you think it is for households to use LED holiday lights?

## 2. Gather

The code below will import the Python programming libraries we need to gather and organize the data to answer our question.

In [None]:
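# install the libraries this notebook needs (-q keeps the output short)
%pip install -q pyodide_http plotly nbformat
# let web requests work when the notebook runs in the browser (Pyodide)
import pyodide_http
pyodide_http.patch_all()
# os builds file paths, pandas handles tables, plotly draws graphs
import os
import pandas as pd
import plotly.express as px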

This code will read in a comma-separated values (CSV) file containing Statistics Canada (Stats Can) [data](https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=3810004801) about the use of holiday LEDs. This data was found as part of the [Christmas...by the numbers](https://www.statcan.gc.ca/eng/dai/smr08/2017/smr08_222_2017) data collection by Stats Can.

In [None]:
light_path = os.path.join('datasets', 'light_data.csv')
#read the csv file in and save it as a pandas dataframe
df = pd.read_csv(light_path)

The last code cell in the "gather" section gives us a first look at our data.

In [None]:
#display the column names and first few rows of data
print(df.columns)
display(df.head())

## 3. Organize
 
The code below will cleanly arrange the data so that we can analyze it. This is a quality control step for our data and involves examining the data to detect anything odd (e.g. structure, missing values) – from there, we fix the oddities and then check if our fixes worked.
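As a minimal sketch of what such a quality control check can look like, the cell below counts rows, columns and missing values. The `sample` table is made up for illustration (it is not the Stats Can data) and borrows the column names used in this notebook.

```python
import pandas as pd

# made-up sample rows for illustration only (not the Stats Can values)
sample = pd.DataFrame({
    'REF_DATE': [2011, 2013, 2011, 2013],
    'GEO': ['Province A', 'Province A', 'Province B', 'Province B'],
    'VALUE': [30.0, None, 40.0, 42.0],
})

# structure: how many rows and columns are there?
n_rows, n_cols = sample.shape
# count the missing values in each column
missing_counts = sample.isna().sum()
# keep only the rows that contain at least one missing value
rows_with_missing = sample[sample.isna().any(axis=1)]

print(n_rows, n_cols)
print(missing_counts)
print(rows_with_missing)
```

The same three checks could be run on `df` once it has been loaded.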
 
Our first issue with this dataset is that it has more columns than we need. The next code cell keeps only three columns: `REF_DATE` (the year), `GEO` (the geographic area) and `VALUE` (the percentage of households in that year and geographic area that use holiday LED lights).

In [None]:
#remove all but the specified columns:
df = df[['REF_DATE', 'GEO', 'VALUE']]
# print the dataset:
df

Our next and final step will be renaming our columns so they are more descriptive.

In [None]:
#rename the columns:
df.columns=['year', 'area', '% households using holiday LEDs']
# print the first few rows of the dataset:
df.head()
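Assigning a list to `df.columns` relies on the columns being in a known order. One possible alternative, sketched below on a made-up one-row table, is `rename` with a mapping from old names to new names, which does not depend on column order and leaves the original table unchanged.

```python
import pandas as pd

# made-up one-row table for illustration only
sample = pd.DataFrame({'REF_DATE': [2011], 'GEO': ['Province A'], 'VALUE': [30.0]})

# map each old column name to its new, more descriptive name
renamed = sample.rename(columns={
    'REF_DATE': 'year',
    'GEO': 'area',
    'VALUE': '% households using holiday LEDs',
})

print(list(renamed.columns))
```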

## 4. Explore

The code below will help us look for evidence to answer our question. This can involve looking at data in table format, applying math and statistics, and creating different types of visualizations to represent our data.

First we will reshape our current dataframe, `df`, into a new one called `df_small`. `df_small` will have one row for each area and one column for each year, which makes it an easier table to read. We will keep `df` as well, because we need it to create our plots.

In [None]:
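# reshape from long to wide: one row per area, one column per year
df_small = df.pivot(index='area', columns='year', values='% households using holiday LEDs').reset_index()
# remove the leftover 'year' label from the column header
df_small.columns.name = ""
df_small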
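To see what this kind of reshaping does, here is a small sketch on a made-up table with two areas and two years. The values are invented for illustration. The long table has one row per area and year, while the wide table has one row per area.

```python
import pandas as pd

# made-up long-format table: one row for each (area, year) pair
long_df = pd.DataFrame({
    'year': [2011, 2013, 2011, 2013],
    'area': ['Province A', 'Province A', 'Province B', 'Province B'],
    '% households using holiday LEDs': [30, 35, 40, 38],
})

# wide format: areas become rows, years become columns
wide_df = long_df.pivot(index='area', columns='year',
                        values='% households using holiday LEDs')

print(long_df.shape)
print(wide_df.shape)
print(wide_df)
```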

Next we will plot our 4 data points for each province in a line graph in order to get a sense of the dataset as a whole.

In [None]:
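# leave out the national 'Canada' rows so that only provinces are plotted,
# and rename 'area' to 'Province' so the legend title is clearer
fig = px.line(df[df['area'] != 'Canada'].rename(columns={'area': 'Province'}), 
              y='% households using holiday LEDs', 
              x='year', color='Province', 
              title='Use of LED lights during the holidays')
# draw a marker at each data point as well as the connecting lines
fig.update_traces(mode='lines+markers')
fig.show()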

Click on a province name in the plot's legend above to remove it from the graph. You can also hover over a data point to see more information.

## 5. Interpret

Below we will discuss the results of the data exploration. Here are some questions to think about, to help you interpret what you see.

- Where did the data come from? 
- How was the data gathered? 
- Describe what’s happening in the data visualization (graph). What do you notice (e.g. big or small values, or trends)? 
- How does the information we see answer our question?

Looking at the table and plot generated in step 4, we see that between 2011 and 2017 the use of holiday LED lights tended to increase, but not in every province. We also see that most provinces had a drop between 2015 and 2017.

Over this time period the province with the greatest uptake in LED lights was Alberta, which went from 29% to 42% use. That's an increase of 13 percentage points. The province with the largest drop was Nova Scotia, which went from 45% to 37%. That's a decrease of 8 percentage points.
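The arithmetic for Alberta and Nova Scotia can be checked with a short sketch. It uses only the four values quoted above and assumes they are the 2011 and 2017 figures. The table name `wide` and the new column name are chosen here for illustration. Because the values are already percentages, subtracting them gives a change in percentage points.

```python
import pandas as pd

# the starting and ending values quoted in the text above
wide = pd.DataFrame(
    {2011: [29, 45], 2017: [42, 37]},
    index=['Alberta', 'Nova Scotia'],
)

# change over the period, in percentage points
wide['change (percentage points)'] = wide[2017] - wide[2011]

# areas with the largest and smallest change in this small table
largest_increase = wide['change (percentage points)'].idxmax()
largest_decrease = wide['change (percentage points)'].idxmin()

print(wide)
print(largest_increase, largest_decrease)
```

The same subtraction could be applied to the 2011 and 2017 columns of `df_small` to compare every area at once.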

## 6. Communicate

Below we will reflect on the new information that is presented from the data. When we look at the evidence, think about what you perceive about the information. Is this perception based on what the evidence shows? If others were to view it, what perceptions might they have? These writing prompts can help you reflect.

- I used to think ____________________ but now I know ____________________. 
- I wish I knew more about ____________________. 
- This visualization reminds me of ____________________. 
- I really like ____________________.
- Why do you think Alberta had such an increase in the use of LED lights?
- Why do you think the use of LED lights in Nova Scotia decreased so much?

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)