![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

## Introduction to Data Science and Big Data For Educators

David Hay | [@misterhay](https://twitter.com/misterhay)

[Callysto.ca](https://callysto.ca) | [@callysto_canada](https://twitter.com/callysto_canada)

<a href='https://creativecommons.org/licenses/by/4.0/'><img src='images/ccby.png' alt='CC BY' width='100'></a>

The ability to critically analyse large sets of data is becoming increasingly important, and there are many applications in education. This session introduces the fundamentals of data science and looks at how you can incorporate data science into your teaching. You will come away with an increased understanding of this topic as well as some practical activities to use in your learning environment.

# Data Science

Data science involves obtaining and **communicating** information from (usually large) sets of observations.
* collecting, cleaning, manipulating, visualizing, synthesizing
* describing, diagnosing, predicting, prescribing

## Why is Data Science Important?



## What Does Data Science Look Like?

e.g. Gapminder animation

## How Can We Introduce Data Science?



# Jupyter Notebooks

A Jupyter notebook is an online document that can include both **formatted text** and (Python) `code` in different “cells” or parts of the document.

These documents run on [Callysto Hub](https://hub.callysto.ca/) as well as [Google Colab](https://colab.research.google.com/), [IBM Watson Studio](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/notebooks-parent.html), and other places.

We'll be using Python code in Jupyter notebooks for data science and computational thinking.

Links in this slideshow (and on Callysto.ca) create copies of Jupyter notebooks in your (and your students’) Callysto Hub accounts. This slideshow is also a Jupyter notebook.

## Visualizing Data

Visualizations of data help with analysis and storytelling.
* Usually include tables and graphs

In a Jupyter notebook with Python code, a graph can be as easy as:

In [None]:
import plotly.express as px
px.pie(names=['left-handed', 'right-handed'], values=[3, 21], title='Handedness of People in our Class')

In [None]:
px.scatter(x=[1, 2, 3, 4], y=[1, 4, 9, 16])

In [None]:
labels = ['English','French','Aboriginal Languages','Other']
values = [56.9,21.3,0.6,21.2]
px.bar(x=labels, y=values, title='First Languages Spoken in Canada')
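
Visualizations usually include tables as well as graphs. As a minimal sketch, the same language figures can also be shown as a table with pandas. The column names `Language` and `Percent` and the variable name `languages` are illustrative choices, not part of the original cells.

```python
import pandas as pd

# Same figures as the bar chart above
labels = ['English', 'French', 'Aboriginal Languages', 'Other']
values = [56.9, 21.3, 0.6, 21.2]

# Column names 'Language' and 'Percent' are illustrative choices
languages = pd.DataFrame({'Language': labels, 'Percent': values})

# Sort from most to least common and renumber the rows
languages = languages.sort_values('Percent', ascending=False).reset_index(drop=True)

# In a notebook, a DataFrame on the last line of a cell is displayed as a table
languages
```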

## Using Online Data

We can import data from webpages or other files hosted online.

### Examples of Data Sources

* Wikipedia
* Gapminder
* Statistics Canada
* Canada Open Data
* Alberta Open Data
* Many cities and municipalities have open data portals

In [None]:
import pandas as pd
url = 'https://en.wikipedia.org/wiki/List_of_Alberta_general_elections'
# read_html returns a list of the tables found on the page; [2] selects the third one
df = pd.read_html(url)[2]
df

In [None]:
px.histogram(df, x='Winner', title='Political Parties Elected in Alberta')
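
A histogram of a text column counts how often each value appears. As a rough sketch, the same tally can be produced as a table with the pandas `value_counts()` method. The small `sample` table here is made-up placeholder data, not the real Wikipedia table.

```python
import pandas as pd

# Hypothetical stand-in data: these are NOT real election results
sample = pd.DataFrame({'Winner': ['Party A', 'Party B', 'Party A', 'Party C', 'Party A']})

# Count how many times each value appears in the 'Winner' column
counts = sample['Winner'].value_counts()
counts
```

With the real data, the equivalent call would be `df['Winner'].value_counts()`, assuming the table has a `Winner` column as the histogram cell expects.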

### CSV Data Online

pandas can read a comma-separated values (CSV) file directly from a URL with `pd.read_csv`. Here the URL is built from the coordinates of Grande Prairie.

In [None]:
from geopy.geocoders import Nominatim
# Look up the latitude and longitude of a place name
geolocator = Nominatim(user_agent='Callysto Demonstration')
coordinates = geolocator.geocode('Grande Prairie, AB')
# The download URL contains the 'latitude$cckp$longitude' pair twice
location = f'{coordinates.latitude}$cckp${coordinates.longitude}'
temperature_url = f'https://climateknowledgeportal.worldbank.org/api/data/get-download-data/historical/tas/1901-2016/{location}/{location}'
# Read the CSV file at that URL into a DataFrame
temperatures = pd.read_csv(temperature_url)
temperatures
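
The plotting cells that follow refer to columns such as `' Year'` and `' Statistics'`, whose names begin with a space. Downloaded files often carry stray whitespace like this. A minimal sketch of tidying such names is shown here on a small made-up table (`example`) rather than on the real download, so the cells that follow keep working unchanged.

```python
import pandas as pd

# Made-up values for illustration only; the column names mimic those used in this notebook
example = pd.DataFrame({
    ' Year': [1901, 1901],
    ' Statistics': ['Jan Average', 'Feb Average'],
    'Temperature - (Celsius)': [-15.0, -12.0],
})

# Remove leading and trailing spaces from every column name
example.columns = example.columns.str.strip()
list(example.columns)
```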

In [None]:
px.scatter(temperatures, x=' Statistics', y='Temperature - (Celsius)', color=' Year',
           title='Monthly Average Temperatures in Grande Prairie from 1901-2016')

In [None]:
px.line(temperatures, x=' Year', y='Temperature - (Celsius)', color=' Statistics',
       title='Monthly Average Temperatures in Grande Prairie from 1901-2016')

In [None]:
px.bar(temperatures, x=' Statistics', y='Temperature - (Celsius)', animation_frame=' Year', 
      title='Temperatures in Grande Prairie').update_layout(yaxis_range=[-30, 30])

## Data Formatting

You may find data in "tidy" or "wide" format.

### Tidy (Long) Data

Each row holds one observation, so a student with two assignments appears in two rows.

Name|Assignment|Mark
-|-|-
Marie|Radium Report|88
Marie|Polonium Lab|84
Jane|Primate Report|94
Jane|Institute Project|77
Mae|Endeavour Launch|92
Jennifer|Genetics Project|87

### Wide Data

Each row holds one student, with a separate column for each assignment.

Name|Science Lab|Science Report|Spelling Test|Math Worksheet|Discussion Questions
-|-|-|-|-|-
Ryder|80|60|90|70|80
Marshall|60|70|70|80|90
Skye|90|80|90|90|80
Everest|80|90|80|70|90

Data can be converted from one format to another, depending on how it is going to be visualized.
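
One way to sketch this conversion is with the pandas `melt` and `pivot` methods. The snippet uses two rows and two columns from the wide table above. The variable names (`wide`, `tidy`, `wide_again`) and the `Assignment` and `Mark` column labels are choices made for this illustration.

```python
import pandas as pd

# Two rows and two columns taken from the wide table above
wide = pd.DataFrame({
    'Name': ['Ryder', 'Marshall'],
    'Science Lab': [80, 60],
    'Spelling Test': [90, 70],
})

# Wide -> tidy: one row for each (Name, Assignment) pair
tidy = wide.melt(id_vars='Name', var_name='Assignment', value_name='Mark')

# Tidy -> wide: one column per assignment again
wide_again = tidy.pivot(index='Name', columns='Assignment', values='Mark').reset_index()
```

Plotly Express functions generally work most easily with tidy data, because a single column such as `Assignment` can then be passed to arguments like `x` or `color`.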

# Markdown

For formatting text in notebooks, e.g. **bold** and *italics*.

[Markdown Cheatsheet](https://www.ibm.com/support/knowledgecenter/SSHGWL_1.2.3/analyze-data/markd-jupyter.html)

## LaTeX

Mathematical and scientific formatting, e.g. 

$m = \frac{E}{c^2}$

$6CO_2 + 6H_2O \rightarrow C_6H_{12}O_6 + 6O_2$

[LaTeX Cheatsheet](https://davidhamann.de/2017/06/12/latex-cheat-sheet)

# Curriculum Notebooks

The [Callysto](https://www.callysto.ca) project has been developing free curriculum-aligned notebooks and other resources.

<a href='https://www.callysto.ca/learning-modules/'><img src='images/learning_modules.png' target='_blank' alt="Callysto learning modules" width='90%' /></a>

## Some of my Favorite Notebooks

* [Statistics Project](https://github.com/callysto/curriculum-notebooks/tree/master/Mathematics/StatisticsProject)
* [Orphan Wells](https://github.com/callysto/curriculum-notebooks/blob/master/SocialStudies/OrphanWells/orphan-wells.ipynb)
* [Survive the Middle Ages](https://github.com/callysto/curriculum-notebooks/blob/master/SocialStudies/SurviveTheMiddleAges/survive-the-middle-ages.ipynb)
* [Asthma Rates](https://github.com/callysto/curriculum-notebooks/blob/master/Health/AsthmaRates/asthma-rates.ipynb)
* [Climate Graphs](https://github.com/callysto/curriculum-notebooks/blob/master/Science/Climatograph/climatograph.ipynb)
* [Shakespeare and Statistics](https://github.com/callysto/curriculum-notebooks/blob/master/EnglishLanguageArts/ShakespeareStatistics/shakespeare-and-statistics.ipynb)
* [Word Clouds](https://github.com/callysto/curriculum-notebooks/blob/master/EnglishLanguageArts/WordClouds/word-clouds.ipynb)

# Data Visualizations and Interesting Problems

[Weekly Data Visualizations](https://www.callysto.ca/weekly-data-visualization) are pre-made, introductory data science lessons. They are a way for students to develop critical thinking and problem-solving skills. We start with a question, find an open dataset to answer the question, and then ask students to reflect.

[Interesting Problems](https://www.callysto.ca/interesting-problems/) are notebooks, often accompanied by video series, that demonstrate critical thinking skills and use programming code to solve interesting problems.

# Hackathons

Online hackathons, either facilitated or [planned yourself](https://docs.google.com/document/d/1tnHhiE554xAmMRbU9REiJZ0rkJmxtNlkkQVCFfCoowE), enable students and educators to collaborate intensely to explore data and solve problems.

# Introducing Data Science to Students

Visualizations: [explore](https://www.youcubed.org/resource/data-talks), modify, [create](http://bit.ly/2RXTLz8)
* Can start with Callysto resources
* Consider "ask three then me"

[Educator Starter Kit](https://www.callysto.ca/starter-kit)

[Online courses](https://www.callysto.ca/distance-learning)

[Basics of Python and Jupyter](https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fpresentations&branch=master&subPath=IntroductionToJupyterAndPython/callysto-introduction-to-jupyter-and-python-1.ipynb&depth=1)

[Troubleshooting](https://www.callysto.ca/troubleshooting)

# Turtles

Another way to introduce students to Python, Jupyter, and data science.

Start with Python turtles:
* [Python Turtles student version](https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2FTMTeachingTurtles&branch=master&subPath=TMPythonTurtles/turtles-and-python-intro-student.ipynb&depth=1)
* [Python Turtles instructor version (key)](https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2FTMTeachingTurtles&branch=master&subPath=TMPythonTurtles/turtles-and-python-intro-instructor.ipynb&depth=1)

In [None]:
from mobilechelonian import Turtle
t = Turtle()
t.forward(50)
t.right(90)
t.penup()
t.forward(30)

### Enter to win an Amazon gift card by completing a Callysto feedback survey: [bit.ly/callysto-feedback](http://bit.ly/callysto-feedback)

## Contact

[contact@callysto.ca](mailto:contact@callysto.ca) or [@callysto_canada](https://twitter.com/callysto_canada) for in-class workshops, virtual hackathons, questions, etc.

Also check out [Callysto.ca](https://www.callysto.ca) and the [YouTube channel](https://www.youtube.com/channel/UCPdq1SYKA42EZBvUlNQUAng).

<img src='images/callysto_logo.png' alt='Callysto Logo' width='80%'>
<img src='images/callysto_partners2.png' alt='Callysto Partners' width='80%'>

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)