![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Callysto’s Weekly Data Visualization

## Disability Prevalence in Canada
### Recommended Grade levels: 5-9
<br>

### Instructions

Click "Cell" and select "Run All".

This will import the data and run all the code, so you can see this week's data visualization. Scroll back to the top after you’ve run the cells.

![instructions](https://github.com/callysto/data-viz-of-the-week/blob/main/images/instructions.png?raw=true)

**You don't need to do any coding to view the visualizations**.

The plots generated in this notebook are interactive. You can hover over and click on elements to see more information. 

Email contact@callysto.ca if you experience issues.

### About this Notebook

Callysto's Weekly Data Visualization is a learning resource that aims to develop data literacy skills. We provide Grades 5-12 teachers and students with a data visualization, like a graph, to interpret. This companion resource walks learners through how the data visualization is created and interpreted by a data scientist. 

The steps of the data analysis process are listed below and applied to each weekly topic.

1. Question - What are we trying to answer?
2. Gather - Find the data source(s) you will need. 
3. Organize - Arrange the data, so that you can easily explore it. 
4. Explore - Examine the data to look for evidence to answer the question. This includes creating visualizations. 
5. Interpret - Describe what's happening in the data visualization. 
6. Communicate - Explain how the evidence answers the question. 

# Question

What proportion of the Canadian population experiences disability, compared to the proportion that does not?


### Goal

Our goal with this notebook is to inspire you with visualizations that compare the proportion of the population that identifies as having some type of disability with the proportion that does not. The data sets are taken from__. 



# Gather

### Code: 

Run the code cell below to import the libraries we need for this project. Libraries are collections of pre-made code that make it easier to analyze our data. Pandas helps us analyze data, plotly.express lets us make visualizations, and re helps us clean up text. 

In [None]:
import pandas as pd
import plotly.express as px
import re

## Data
Run the code cell below to import the data sets that we will use for this data visualization. We used data from____

In [None]:
by_pop = pd.read_csv('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/disabilities/by_pop.csv')
by_type = pd.read_csv('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/disabilities/by_type.csv')
male_female = pd.read_csv('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/disabilities/male_female_disabilities.csv')
employment = pd.read_csv("https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/disabilities/employment.csv")

To see what is in each data set, we can use the code below, which shows the first five rows of each data frame. By investigating the data frames, we can tell that there is information about the following: 

* Populations with and without disabilities in different geographic locations.
* Numbers of individuals with a certain type of disability.
* Numbers of individuals with disabilities based on potential to work or inability to work. 
* information ..

In [None]:
display(by_pop.head(), by_type.head(), male_female.head(), employment.head())
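
As a quick illustration of how `head()` behaves, here is a minimal sketch using a small made-up data frame. The name `sample` and its values are hypothetical and are not part of the data sets above. By default `head()` returns the first five rows, and passing a number such as `head(10)` asks for more.

```python
import pandas as pd

# Hypothetical data frame with 12 rows, used only to demonstrate head()
sample = pd.DataFrame({"Row": range(1, 13)})

first_five = sample.head()    # default: the first 5 rows
first_ten = sample.head(10)   # ask for the first 10 rows instead

print(len(first_five), len(first_ten))
```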

# Organize

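Before we can make visualizations, we need to clean the data. First, we split the `Geography` column into separate `City` and `Province` columns. Then we remove stray digits from the text labels and convert the `Number` columns from text (with commas) into integers.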

In [None]:
by_pop[["City", "Province"]] = by_pop['Geography'].str.split(",", n=1, expand=True)
by_pop['City'] = by_pop['City'].str.strip()
by_pop['Province'] = by_pop['Province'].str.strip()
by_pop
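
To see what the split does step by step, the sketch below applies the same calls to a tiny made-up data frame. The name `demo` and the place names in it are assumptions for illustration, and the real `Geography` values may look different.

```python
import pandas as pd

# Made-up rows for illustration only
demo = pd.DataFrame({
    "Geography": ["Calgary, Alberta", "Ottawa - Gatineau, Ontario/Quebec"]
})

# n=1 splits only at the first comma; expand=True returns separate columns
demo[["City", "Province"]] = demo["Geography"].str.split(",", n=1, expand=True)

# Remove the space left behind after the comma
demo["City"] = demo["City"].str.strip()
demo["Province"] = demo["Province"].str.strip()

print(demo)
```

Using `n=1` means that any later commas would stay inside the `Province` text instead of creating extra columns.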

In [None]:
def remove_integers(string):
    # Keep every character that is not a digit
    return ''.join(i for i in string if not i.isdigit())

def remove_commas_and_letters(value):
    # Expects a text value such as "1,234"; returns it as an integer
    value = value.replace(',', '')  # Remove commas
    value = re.sub('[^0-9]', '', value)  # Remove non-digit characters using regex
    return int(value)
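
Here is a small sketch of what the two helper functions return. The functions are repeated so the example runs on its own, and the input strings are hypothetical, chosen only to resemble labels and counts that contain extra characters.

```python
import re

def remove_integers(string):
    # Keep every character that is not a digit
    return ''.join(ch for ch in string if not ch.isdigit())

def remove_commas_and_letters(value):
    value = value.replace(',', '')        # Remove commas
    value = re.sub('[^0-9]', '', value)   # Remove anything that is not a digit
    return int(value)

# Hypothetical raw values
label = remove_integers("Persons with disabilities5")
count = remove_commas_and_letters("1,234,560E")

print(repr(label), count)
```

Note that `remove_commas_and_letters` expects text with at least one digit in it. An empty string or a missing value would raise an error.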

In [None]:
by_pop['Disability'] = by_pop["Disability"].apply(remove_integers)
by_pop['Number'] = by_pop["Number"].apply(remove_commas_and_letters)


by_type["Disability type (grouped)"] = by_type["Disability type (grouped)"].apply(remove_integers)
by_type['Number'] = by_type["Number"].apply(remove_commas_and_letters)

male_female['Potential to work'] = male_female["Potential to work"].apply(remove_integers)
male_female['Number'] = male_female["Number"].apply(remove_commas_and_letters)

In [None]:
display(by_pop.head(), by_type.head(), male_female.head(), employment.head())

In [None]:
by_type_fig = px.histogram(by_type, x="Disability type (grouped)", y="Number", color="Number")
by_type_fig.update_traces(showlegend=False).show()
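
When `px.histogram` is given both `x` and `y`, it adds up the `y` values for each `x` category by default. The sketch below shows roughly the same totals computed directly with pandas. The `demo` data frame and its numbers are made up for illustration.

```python
import pandas as pd

# Made-up counts for illustration only
demo = pd.DataFrame({
    "Disability type (grouped)": ["Pain-related", "Pain-related", "Seeing", "Hearing"],
    "Number": [100, 50, 30, 20],
})

# Add up the Number column within each disability type
totals = demo.groupby("Disability type (grouped)")["Number"].sum()

print(totals)
```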

In [None]:
provinces = px.treemap(by_pop, path=[px.Constant("Canada"), 'Province', 'City', 'Disability'], values='Number')
provinces.update_traces(root_color="lightgrey")
provinces.update_layout(margin = dict(t=50, l=35, r=35, b=35))
provinces.show()
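
The treemap shows counts as areas. To put a number on the proportions asked about in our question, one possible approach is to divide each count by the total for its province. This is a minimal sketch on made-up data: the `demo` data frame, its category labels, and its numbers are assumptions, not values from `by_pop`.

```python
import pandas as pd

# Made-up counts for illustration only
demo = pd.DataFrame({
    "Province": ["Alberta", "Alberta", "Ontario", "Ontario"],
    "Disability": ["Persons with disabilities", "Persons without disabilities"] * 2,
    "Number": [200, 800, 300, 700],
})

# Total population per province, repeated on every row of that province
province_total = demo.groupby("Province")["Number"].transform("sum")

# Share of each group within its province, as a percentage
demo["Percent"] = (demo["Number"] * 100 / province_total).round(1)

print(demo)
```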