![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Callysto’s Weekly Data Visualization

### Recommended grade level: 5-12

### Instructions

**You don’t need to do any coding to view the visualizations**.
The plots generated in this notebook are interactive. You can hover over and click on elements to see more information. 

Email contact@callysto.ca if you experience issues.

### About this Notebook

Callysto's Weekly Data Visualization is a learning resource that aims to develop data literacy skills. We provide Grades 5-12 teachers and students with a data visualization, like a graph, to interpret. This companion resource walks learners through how the data visualization is created and interpreted by a data scientist. 

The steps of the data analysis process are listed below and applied to each weekly topic.

1. Question - What are we trying to answer? 
2. Gather - Find the data source(s) you will need. 
3. Organize - Arrange the data, so that you can easily explore it. 
4. Explore - Examine the data to look for evidence to answer the question. This includes creating visualizations. 
5. Interpret - Describe what's happening in the data visualization. 
6. Communicate - Explain how the evidence answers the question. 

## Question
How much time do Canadians spend playing video games and how does this change with demographics? We will use official Statistics Canada data to examine this question.


### Goal
Our goal is to create a series of graphs showing how much time Canadians spend gaming, and to see how our class data compares with the published data.

## Gather

The code below will import the Python programming libraries we need to gather and organize the data to answer our question.

In [None]:
import plotly.express as px  # used to create interactive plots
import pandas as pd  # used to work with datasets
from datetime import datetime, date  # used to get the current date and time
import requests  # used for data collection

The code below creates lists of data from [this 2010 StatsCan table](https://www150.statcan.gc.ca/n1/pub/89-647-x/2011001/tbl/tbl31-eng.htm). The same study was repeated in 2015, but that more recent time use survey did not ask about video games.

Our lists are as follows:

|  List Name             | List Purpose                                                                             |
|------------------------|------------------------------------------------------------------------------------------|
| categories             | holds names for the age categories for our bar chart                                     |
| free_time              | holds number of minutes in "free time" activities for the average person on an average day |
| videogame_time_all     | holds number of minutes spent gaming for the average person on an average day            |
| videogame_time_players | holds number of minutes spent gaming for the average gamer on an average day             |

In [None]:
## import data
categories = ["15 to 24", "25 to 34", "35 to 44", "45 to 54", "55 to 64", "65 to 74", "75 and over"]
free_time = [5*60+57, 4*60+53, 4*60+6, 4*60+44, 5*60+55, 7*60+19, 7*60+34]
videogame_time_all = [27, 10, 4, 4, 6, 6, 4]
videogame_time_players = [2*60+44, 2*60+34, 109, 127, 118, 133, 2*60+32]

## Organize

Since our data is just four simple lists, there is no need to organize it further.
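
Although the lists can be plotted as they are, lining them up row by row can make the numbers easier to read side by side. The sketch below is one possible way to do that using only built-in Python. The `rows` name and the dictionary keys are illustrative choices, not part of the original notebook.

```python
# Values copied from the lists defined in the Gather section
categories = ["15 to 24", "25 to 34", "35 to 44", "45 to 54", "55 to 64", "65 to 74", "75 and over"]
free_time = [5*60+57, 4*60+53, 4*60+6, 4*60+44, 5*60+55, 7*60+19, 7*60+34]
videogame_time_all = [27, 10, 4, 4, 6, 6, 4]
videogame_time_players = [2*60+44, 2*60+34, 109, 127, 118, 133, 2*60+32]

# zip() pairs up the items that share the same position in each list
rows = [
    {"age": age, "free_time": free, "games_all": g_all, "games_players": g_players}
    for age, free, g_all, g_players
    in zip(categories, free_time, videogame_time_all, videogame_time_players)
]

# print one age group per line
for row in rows:
    print(row)
```

If you would rather work with a pandas table, passing the same list to `pd.DataFrame(rows)` should give an equivalent DataFrame.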

## Explore-1

The code below will be used to help us look for evidence to answer our question. This can involve looking at data in table format, applying math and statistics, and creating different types of visualizations to represent our data.

In [None]:
fig = px.bar(x=videogame_time_all, y=categories,
             title="Average Number of Minutes Spent Playing Video Games Per Day",
             labels={'y':'Age of Canadians - Years', 'x':'Minutes Gaming on Average Day'}
            )
fig.show()

## Interpret-1

Our first figure shows that 15-24 year olds spend far more time playing computer games than older Canadians, with a small bump for Canadians around early retirement age (55-64 and 65-74).

## Explore-2

In [None]:
fig = px.bar(x=videogame_time_players, y=categories,
             title="Average Number of Minutes Spent Playing Video Games Per Day",
             labels={'y':'Age of Canadians Who Play Computer Games - Years', 'x':'Minutes Gaming on Average Day'}
            )
fig.show()

## Interpret-2

There is a subtle difference between the data for the first figure and the data for this one. The first figure's averages were calculated using all respondents to the time use survey. This second figure includes only those who actually play some computer games. Essentially, this second plot ignores any respondents who game zero minutes on the average day.

We see a very different plot for this second figure. This figure is decidedly U-shaped: Canadians outside of working age seem to game the most.
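
One way to see how the two sets of averages relate: if non-players contribute zero minutes, then the average over everyone is roughly the players-only average multiplied by the fraction of people who play. The sketch below uses that relationship to back out an approximate share of players in each age group. This is an assumption-based estimate computed from the rounded minutes in the lists above, not a figure published in the StatsCan table, and `implied_share` is a name introduced here for illustration.

```python
# Values copied from the lists defined in the Gather section
categories = ["15 to 24", "25 to 34", "35 to 44", "45 to 54", "55 to 64", "65 to 74", "75 and over"]
videogame_time_all = [27, 10, 4, 4, 6, 6, 4]
videogame_time_players = [2*60+44, 2*60+34, 109, 127, 118, 133, 2*60+32]

# average over everyone ~ (share of players) * (average over players),
# so share of players ~ (average over everyone) / (average over players)
implied_share = [
    all_avg / player_avg
    for all_avg, player_avg in zip(videogame_time_all, videogame_time_players)
]

for age, share in zip(categories, implied_share):
    print(f"{age}: roughly {share:.0%} of respondents implied to be playing")
```

Because the published minutes are rounded (a value of 4 could hide anything from 3.5 to 4.5), the estimates for older age groups are especially rough.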

## Explore-3

In [None]:
fig = px.bar(x=free_time, y=categories,
             title="Average Number of Minutes Spent on Free Time Activities Per Day",
             labels={'y':'Age of Canadians - Years', 'x':'Minutes of Free Time Activities Per Day'}
            )
fig.show()

## Interpret-3

This third plot isn't directly about gaming, but it provides some context for the first two figures. It shows how much time each age group spends on free time activities, including gaming. It seems to closely match the second figure.
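
To put a number on how closely two bar charts follow each other, one option is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand for the free-time list and the players-only gaming list. The `pearson` helper is written here for illustration and is not part of the original notebook. With only seven age groups, the result is best read as a rough indication rather than strong evidence.

```python
import math

# Values copied from the lists defined in the Gather section
free_time = [5*60+57, 4*60+53, 4*60+6, 4*60+44, 5*60+55, 7*60+19, 7*60+34]
videogame_time_players = [2*60+44, 2*60+34, 109, 127, 118, 133, 2*60+32]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # how the two lists vary together
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # how much each list varies on its own
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

r = pearson(free_time, videogame_time_players)
print(f"correlation between free time and players' gaming time: {r:.2f}")
```

A value near 1 would mean the two charts rise and fall together almost perfectly, while a value near 0 would mean the resemblance is weak.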


## Activity-1: Add your data

Enter your free time and the amount of time you spend on video games to compare yourself with other Canadians.

In [None]:
## Enter your data here:
your_free_time = 120 # enter average number of minutes you spend on free time activities per day
your_videogame_time = 15  # enter average number of minutes you spend on playing video games per day

In [None]:
fig = px.bar(x=videogame_time_all, y=categories,
             title="Average Number of Minutes Spent Playing Video Games Per Day",
             labels={'y':'Age of Canadians - Years', 'x':'Minutes Gaming on Average Day'}
            )
fig.add_vline(x=your_videogame_time, line_color='red', line_dash='dash')
fig.update_xaxes(range=[0, max(your_videogame_time, max(videogame_time_all)) * 1.1])
fig.show()

fig = px.bar(x=videogame_time_players, y=categories,
             title="Average Number of Minutes Spent Playing Video Games Per Day",
             labels={'y':'Age of Canadians Who Play Computer Games - Years', 'x':'Minutes Gaming on Average Day'}
            )
fig.add_vline(x=your_videogame_time, line_color='red', line_dash='dash')
fig.update_xaxes(range=[0, max(your_videogame_time, max(videogame_time_players)) * 1.1])
fig.show()

fig = px.bar(x=free_time, y=categories,
             title="Average Number of Minutes Spent on Free Time Activities Per Day",
             labels={'y':'Age of Canadians - Years', 'x':'Minutes of Free Time Activities Per Day'}
            )
fig.add_vline(x=your_free_time, line_color='red', line_dash='dash')
fig.update_xaxes(range=[0, max(your_free_time, max(free_time)) * 1.1])
fig.show()

## Question(s)-1

- How do you compare with other Canadians on time spent on free time activities and video games per day?
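
One simple way to approach this question in code is to find the age group whose average is closest to your own number. The sketch below does that with the default values from the data-entry cell. The `closest_group` helper is a hypothetical name introduced here, and the lists are copied from the Gather section.

```python
# Values copied from the lists defined in the Gather section
categories = ["15 to 24", "25 to 34", "35 to 44", "45 to 54", "55 to 64", "65 to 74", "75 and over"]
free_time = [5*60+57, 4*60+53, 4*60+6, 4*60+44, 5*60+55, 7*60+19, 7*60+34]
videogame_time_all = [27, 10, 4, 4, 6, 6, 4]

# Default values from the data-entry cell; replace them with your own
your_free_time = 120
your_videogame_time = 15

def closest_group(your_value, group_averages):
    """Return the age category whose average is nearest to your_value."""
    # pick the position with the smallest gap between the group average and your value
    best = min(range(len(group_averages)), key=lambda i: abs(group_averages[i] - your_value))
    return categories[best]

print("Gaming time is closest to ages:", closest_group(your_videogame_time, videogame_time_all))
print("Free time is closest to ages:", closest_group(your_free_time, free_time))
```

Swapping in `videogame_time_players` as the second argument would compare you with gamers only instead of all respondents.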

## Activity-2: Compare class data with published data

In [None]:
class_code = 'callysto'  # if running multiple workshops in a day, this can be changed arbitrarily,
# or the spreadsheet can be cleaned manually
date_and_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
year, month, day = date.today().strftime("%y"), date.today().strftime("%m"), date.today().strftime("%d")
ethercalc_id = str(class_code) + str(year) + str(month) + str(day)
base_url = 'https://ethercalc.net/'
post_url = base_url + '_/' + ethercalc_id
print('data will be posted to:', base_url+ethercalc_id)

if_upload = True

Run the cell below to upload your data to the cloud. The class data will then be compared with the published data.

In [None]:
if if_upload:
    r = requests.post(post_url, data= date_and_time +','+ str(class_code).upper() +','+ 
                      str(your_free_time) +','+ str(your_videogame_time))
    if_upload = False

Wait until everybody has uploaded their data, then load the class data by running the following code cell.

In [None]:
print('reading data from', base_url+ethercalc_id)
class_data = pd.read_csv(base_url+ethercalc_id+'.csv')
class_data.columns=['Timestamp', 'class code', 'free time (min/day)', 'games (min/day)']  # rename the columns
class_data.drop(['Timestamp', 'class code'], inplace=True, axis=1)  # remove the unnecessary columns
class_data.head(5)  # look at sample data collected

In [None]:
# look at some statistics for the class data
class_data.describe()

In [None]:
# visualize the statistics using boxplots
px.box(class_data)

In [None]:
# visualize the distributions using histograms
# play with nbins (number of bins) to get a nice distribution
fig = px.histogram(class_data['games (min/day)'],
                   title="Average Number of Minutes Spent Playing Video Games Per Day",  
                   nbins=20
                  )
fig.show()
fig = px.histogram(class_data['free time (min/day)'], 
                   title="Number of Minutes Spent on Free Time Activities Per Day", 
                   nbins=20
                  )
fig.show()

In [None]:
free_time_class_avg = class_data['free time (min/day)'].mean()
games_class_avg = class_data['games (min/day)'].mean()

fig = px.bar(x=videogame_time_all, y=categories, 
             title="Average Number of Minutes Spent Playing Video Games Per Day", 
             labels={'y':'Age of Canadians - Years', 'x':'Minutes Gaming on Average Day'}
            )
fig.add_vline(x=games_class_avg, line_color='yellow', line_dash='dash')
fig.update_xaxes(range=[0, max(games_class_avg, max(videogame_time_all)) * 1.1])
fig.show()

fig = px.bar(x=videogame_time_players, y=categories, 
             title="Average Number of Minutes Spent Playing Video Games Per Day", 
             labels={'y':'Age of Canadians Who Play Computer Games - Years', 'x':'Minutes Gaming on Average Day'}
            )
fig.add_vline(x=games_class_avg, line_color='yellow', line_dash='dash')
fig.update_xaxes(range=[0, max(games_class_avg, max(videogame_time_players)) * 1.1])
fig.show()

fig = px.bar(x=free_time, y=categories, 
             title="Average Number of Minutes Spent on Free Time Activities Per Day", 
             labels={'y':'Age of Canadians - Years', 'x':'Minutes of Free Time Activities Per Day'}
            )
fig.add_vline(x=free_time_class_avg, line_color='yellow', line_dash='dash')
fig.update_xaxes(range=[0, max(free_time_class_avg, max(free_time)) * 1.1])
fig.show()

In [None]:
# look for correlations
fig = px.scatter(class_data, x="free time (min/day)", y="games (min/day)")
fig.show()

## Question(s)-2

- How does the class average compare with other Canadians on time spent on free time activities and video games per day?
- Is there any relation between the time spent on free time activities and time spent on video games per day?

## Communicate

Below, we reflect on the new information presented by the data. As you look at the evidence, think about what you perceive. Is this perception based on what the evidence shows? If others were to view it, what perceptions might they have? These writing prompts can help you reflect.

- Why do you think the second and third charts are so alike?
- What does it mean that, across the whole population of Canadians, the average 15-24 year old spends much more time gaming than the average person 75 and over, but the two groups are almost the same when you look only at people who game at least a little?
- If we had current data, how do you think these plots would look?

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)