![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Introduction to Open Data

David Hay | [@misterhay](https://twitter.com/misterhay)

[Callysto.ca](https://callysto.ca) | [@callysto_canada](https://twitter.com/callysto_canada)

<a href='https://creativecommons.org/licenses/by/4.0/'><img src='https://raw.githubusercontent.com/callysto/presentations/6a92e54f75f7fe60964d889de0704a0bbe07f8b3/introduction-to-data-science-for-educators/images/ccby.png' alt='CC BY' width='100'></a>

# [Open Data](https://en.wikipedia.org/wiki/Open_data)
* freely available to use and repurpose
* usually published by governments or research institutions
* similar to [public data](https://www.google.com/publicdata/directory), [open access](https://en.wikipedia.org/wiki/Open_access), and [open science](https://en.wikipedia.org/wiki/Open_science)

To be "open", the data should be both **openly licensed** and **machine-readable** (i.e. not images of scanned documents).
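To illustrate why machine-readability matters, here is a minimal sketch using Python's built-in `csv` module. The sample text and its values are made up for illustration. Structured text like this can be parsed directly by a program, whereas a scanned image of the same table would first need to be transcribed.

```python
import csv
import io

# Hypothetical machine-readable data: plain text with a header row and delimiters.
# The values are invented for illustration only.
sample_csv = """GEO,REF_DATE,VALUE
Alberta,DEC2020,1.0
Canada,DEC2020,2.0
"""

# Because the structure is explicit, a program can turn each row into a dictionary.
rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Values arrive as strings, so convert them before doing arithmetic.
values = [float(row["VALUE"]) for row in rows]
average = sum(values) / len(values)
print(rows[0]["GEO"], average)
```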

## The Importance of Data

* making good decisions
* efficiency
* improving accountability and transparency
* confidence and engagement
* collaboration
* culture of improvement
* innovation
* "what is measured is valued"

## Issues with (Open) Data

* accuracy and timeliness are difficult
* expenses related to collecting and publishing data (and sustainability)
* stories need to come from data, not the other way around
* may be used for inappropriate purposes
* potential lack of data literacy (consider correlation and causation)
* privacy and security
* transparency trap
* "what is measured is valued"

# Finding Open Data

* [Open data in Canada](https://en.wikipedia.org/wiki/Open_data_in_Canada)
* [open.canada.ca](https://open.canada.ca)
* [open.alberta.ca](https://open.alberta.ca)
* [data.edmonton.ca](https://data.edmonton.ca)
* [data.strathcona.ca](https://data.strathcona.ca)

# Searching or Browsing

If you don't know what specifically you're looking for, there are usually ways to browse the data catalogues (e.g. by topic, popularity, or format).

Common types of open data include:
* public services
* demographic
* financial
* political
* cultural
* environmental
* geospatial

# Using Open Data

For example, in a [Jupyter notebook](https://jupyter.org) (e.g. on the [Callysto Hub](https://hub.callysto.ca)) we can download and display the data for [consumer price index changes in Alberta and Canada](https://open.alberta.ca/opendata/consumer-price-index-year-over-year-percentage-change-canada-and-alberta).

In [None]:
import pandas as pd

# Download the consumer price index CSV from the Alberta open data portal into a DataFrame
df = pd.read_csv('https://open.alberta.ca/dataset/443b43de-b8c0-4108-9aab-bde7df7532ed/resource/cfeb0607-bcb3-45e8-8947-59c5a3467118/download/stc_18-10-0004-01_consumer_price_index_csv_v29.0_2021-01-20.csv')
df
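Before plotting, it often helps to check what was loaded. The sketch below uses a tiny hand-made DataFrame with the same column names used later in this notebook (`REF_DATE`, `GEO`, `Products_and_product_groups`, `VALUE`, `Percent_Change`). The rows are invented, so run the same calls on the real `df` to see its actual contents.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded data; all values are invented.
sample = pd.DataFrame({
    'REF_DATE': ['DEC2020', 'DEC2020', 'NOV2020'],
    'GEO': ['Alberta', 'Canada', 'Alberta'],
    'Products_and_product_groups': ['Food', 'Food', 'Shelter'],
    'VALUE': [150.0, 152.0, 140.0],
    'Percent_Change': [1.5, None, 0.5],
})

print(sample.shape)            # (number of rows, number of columns)
print(sample.dtypes)           # data type of each column
print(sample['GEO'].unique())  # distinct regions in the data
missing = sample.isna().sum()  # count of missing values per column
print(missing)
```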

## Visualizing Data

While many open data portals have built-in visualization tools, we can also create visualizations of our own.

In [None]:
import plotly.express as px

# Keep only the rows for December 2020, then compare regions for each product group
data = df[df['REF_DATE'] == 'DEC2020']
px.bar(data, y='Products_and_product_groups', x='Percent_Change', color='GEO', title='Consumer Price Index Change')

In [None]:
px.scatter(df[df['Products_and_product_groups']=='Food'], x='REF_DATE', y='VALUE', title='Food Price Index')
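If `REF_DATE` holds text such as `'DEC2020'` (as the filter in the earlier cell suggests), plotting it directly treats each value as a label rather than a point in time. One possible approach, sketched here on invented values, is to parse the column with `pd.to_datetime` so that rows can be sorted chronologically. The format string `'%b%Y'` is an assumption about how the dates are written and relies on English month abbreviations, so adjust it to match the real data.

```python
import pandas as pd

# Hypothetical rows, deliberately out of chronological order.
food = pd.DataFrame({
    'REF_DATE': ['DEC2020', 'JAN2020', 'JUN2020'],
    'VALUE': [152.0, 150.0, 151.0],
})

# Assumed format: three-letter month abbreviation followed by a four-digit year.
food['date'] = pd.to_datetime(food['REF_DATE'], format='%b%Y')

# Sorting by the parsed date puts the rows in time order.
food = food.sort_values('date').reset_index(drop=True)
print(food)
```

The parsed column could then be used as the x-axis, for example `px.scatter(food, x='date', y='VALUE')`.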

### Animated Charts

In [None]:
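# Drop rows with missing values, then animate one frame per reference date.
# A fixed range_x keeps the axis from rescaling between frames.
px.bar(df.dropna(), y='Products_and_product_groups', x='Percent_Change', color='GEO',
       animation_frame='REF_DATE', range_x=[df['Percent_Change'].min(), df['Percent_Change'].max()],
       orientation='h', barmode='group', title='Consumer Price Index Change')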

### Maps

In [None]:
import folium
from folium.plugins import MarkerCluster

# Load the tree data; low_memory=False reads the file in one pass so column types are inferred consistently
trees = pd.read_csv('https://prod-hub-indexer.s3.amazonaws.com/files/82841132047d47659508f60c52f6346a/0/full/4326/82841132047d47659508f60c52f6346a_0_full_4326.csv', low_memory=False)

# Centre the map on the mean latitude (Y) and longitude (X) of the trees
tree_map = folium.Map(location=[trees['Y'].mean(), trees['X'].mean()], zoom_start=11)
# Group nearby markers into clusters so the map stays readable when zoomed out
marker_cluster = MarkerCluster()
for row in trees.itertuples():
    marker_cluster.add_child(folium.Marker(location=[row.Y, row.X], popup=row.AssetID, tooltip=row.species))
tree_map.add_child(marker_cluster)
tree_map
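Large data sets can make the marker loop slow, and rows without coordinates cannot be placed on a map. Below is a minimal sketch of one way to prepare the data first, using invented rows with the same column names as above (`X`, `Y`, `AssetID`, `species`): drop rows with missing coordinates, then take a repeatable random sample. The sample size and `random_state` are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical tree records; coordinates and species are invented.
sample_trees = pd.DataFrame({
    'X': [-113.50, -113.48, None, -113.52, -113.49],
    'Y': [53.54, 53.55, 53.53, None, 53.56],
    'AssetID': [1, 2, 3, 4, 5],
    'species': ['Elm', 'Ash', 'Spruce', 'Pine', 'Elm'],
})

# Keep only rows where both longitude (X) and latitude (Y) are present.
located = sample_trees.dropna(subset=['X', 'Y'])

# Take at most 2 rows; random_state makes the sample repeatable.
subset = located.sample(n=min(2, len(located)), random_state=0)
print(len(located), len(subset))
```

The resulting `subset` could be looped over with `itertuples()` in the same way as `trees` in the cell above.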

# Open Data in Education

We [produce](https://open.alberta.ca/opendata?q=education&sort=score+desc) open data in education, but educators and students can also use data sets (and combinations of data sets) for [real-world learning](https://github.com/callysto/curriculum-notebooks/tree/master/Mathematics/StatisticsProject).