[![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)](https://callysto.ca)


<h2 align='center'>Data Literacy through Sports Analytics</h2>
<h3 align='center'>2022</h3>

<h3 align='center'>Byron Chu (Cybera)</h3><br>

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)

<center><img src='./images/ccby.png' alt="CC BY logo" width='300' /></center>

<p><center><a href='https://creativecommons.org/licenses/by/4.0/'>CC BY</a>:<br>
This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format,<br>
so long as attribution is given to the creator. 
</center></p>

In [None]:
import pandas as pd
from pandas import read_csv
import plotly.graph_objects as go
import plotly.express as px
from plotly import subplots
pd.options.plotting.backend = "plotly"
from IPython.display import YouTubeVideo
from ipysheet import sheet, cell, cell_range

# Overview

- Data literacy via sports
- The learning progression
- Examples of learning and data analysis
  - Professional soccer
  - Ice hockey
  - Field hockey
- Python, Jupyter, and Callysto

<center><img src='./images/data_literacy.png' alt='data literacy' width='85%' /></center>

#### Content and context

(Alberta Education, 2000, 2007, updated 2016, 2017)

## Example: professional soccer event data
Data source: https://github.com/metrica-sports/sample-data

In [None]:
df_soccer = pd.read_csv("https://raw.githubusercontent.com/metrica-sports/sample-data/master/data/Sample_Game_1/Sample_Game_1_RawEventsData.csv")
df_soccer

**Home team passes, second half**

In [None]:
df_soccer[(df_soccer['Team'] == 'Home') & (df_soccer['Period'] == 2) & (df_soccer['Type'] == 'PASS')] \
    .plot.scatter(x = "Start X", y = "Start Y")

## Bridging expert to novice

## Data visualization learning progression
<img src='./images/creating_scaffolding.png' alt='scaffolding' width='95%' />

## Data visualization learning progression
<img src='./images/creating_adapting.png' alt='adapting' width='95%' />

## Data gathering learning progression
<br>
<center><img src='./images/data_gathering.png' alt='data gathering' width='85%' /></center>
<br><br><br>Source: <a href='http://oceansofdata.org/sites/oceansofdata.org/files/pervasive-and-persistent-understandings-01-14.pdf'>Pervasive and Persistent Understandings about Data</a>, Kastens (2014)

## Authentic learning approach

- Learning design based on interdisciplinary<br>
connections and real-world examples
- Industry-aligned data science analysis process
- Python, an all-purpose programming language
- Jupyter notebook, a free industry-standard tool for data scientists
- CallystoHub, free cloud computing

## Learning design and athlete development

### U15 training to train
- Promotes tactical strategies for in-game decision making, reading the situation and inferring 
- Focuses on the team and the process
- Situates personal goals within a team approach

## Learning design and U15 sports analytics - current

Online communication, sometimes through shared video analysis spaces

Video replay during games and training

Post-game video analysis, limited statistics

## Learning design and flexibility around data sources
<br>
<img src='./images/flexibility.png' alt='flexibility' width='90%' />

## Two data examples

1. Import a csv file and use a Python spreadsheet<br>to create shot maps (ice hockey)
2. Gather data from video to analyze and make decisions (field hockey)

## Data example 1:
## Using IIHF World Junior Championship data to create graphs and a shot map

## Defining ice hockey net zones:<br> What factors can lead to scoring?
<!--USA Hockey Goaltender Basics https://www.usahockeygoaltending.com/page/show/890039-stance-->
||
|-|-|
|<img src='./images/hockey_net_zones.png' width='100%'/>|<img src='https://cdn.hockeycanada.ca/hockey-canada/Team-Canada/Men/Under-18/2014-15/2014-15_goalie_camp.jpg?q=60' />|
||<a href='https://www.hockeycanada.ca/en-ca/news/34-goaltenders-invited-to-2014-poe-camp'>Image source: Hockey Canada</a>|

In [None]:
%%html 
<h2>Data source IIHF: Shot charts</h2><br>
<iframe width="1200" height="600" src="https://www.iihf.com/pdf/503/ihm503a13_77a_3_0" frameborder="0" ></iframe>

## Tally chart
<img src='./images/hockey_tally.png' alt='tally chart' width='85%' />

## Generating a csv file

Zone,Austria,Canada,Czech_Republic,Finland,Germany,Russia,Switzerland,Slovakia,Sweden,USA,Total<br>
one,0,7,0,3,2,0,0,0,3,3,18<br>
two,0,1,1,0,1,0,0,0,0,0,3<br>
three,0,5,0,2,2,4,1,0,3,6,23<br>
four,0,4,3,2,1,1,0,1,0,3,15<br>
five,0,1,0,2,1,0,0,0,0,0,4<br>
six,1,1,2,4,0,2,0,1,0,2,13<br>
seven,0,6,0,1,3,3,1,1,0,9,24<br>
eight,0,5,1,2,2,3,1,2,3,2,21<br>
nine,0,3,3,0,2,3,2,0,5,0,18<br>
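Before analyzing a hand-built file, it can help to check that the tallies were typed in consistently. The sketch below is one possible check, using only Python's standard library rather than the pandas workflow in this notebook: it parses the CSV text shown above and confirms that each zone's `Total` matches the sum of its team columns. The variable names are illustrative.

```python
import csv
import io

# CSV text copied from the slide above (one row per net zone, one column per team).
csv_text = """Zone,Austria,Canada,Czech_Republic,Finland,Germany,Russia,Switzerland,Slovakia,Sweden,USA,Total
one,0,7,0,3,2,0,0,0,3,3,18
two,0,1,1,0,1,0,0,0,0,0,3
three,0,5,0,2,2,4,1,0,3,6,23
four,0,4,3,2,1,1,0,1,0,3,15
five,0,1,0,2,1,0,0,0,0,0,4
six,1,1,2,4,0,2,0,1,0,2,13
seven,0,6,0,1,3,3,1,1,0,9,24
eight,0,5,1,2,2,3,1,2,3,2,21
nine,0,3,3,0,2,3,2,0,5,0,18
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
team_columns = [name for name in rows[0] if name not in ("Zone", "Total")]

# Recompute each zone's total from the team columns and compare it with the Total column.
mismatches = []
for row in rows:
    recomputed = sum(int(row[team]) for team in team_columns)
    if recomputed != int(row["Total"]):
        mismatches.append(row["Zone"])

grand_total = sum(int(row["Total"]) for row in rows)
print("Zones with mismatched totals:", mismatches)
print("Goals across all zones:", grand_total)
```

An empty list of mismatches suggests the tally chart was transcribed without arithmetic slips.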

## Exploring scoring on net zones 

In [None]:
hockey_goals_df = pd.read_csv('./data/hockey_goals.csv')
hockey_goals_df.head(9)

### What do measures of central tendency tell us about the total goals per net zone?

In [None]:
hockey_goals_df['Total'].sum()

In [None]:
hockey_goals_df['Total'].min()

In [None]:
hockey_goals_df['Total'].max()

In [None]:
#Total goals per net zone scatter plot
hockey_goals_df.plot.scatter(x="Zone",y="Total",title="Total goals per net zone")

In [None]:
hockey_goals_df['Total'].mean()

In [None]:
hockey_goals_df['Total'].median()

In [None]:
hockey_goals_df['Total'].mode()
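As a cross-check on the pandas results above, the same measures can be computed with Python's built-in `statistics` module. This is a minimal sketch that assumes the `Total` column holds the nine values from the CSV slide; the list is typed in by hand here so the example runs on its own.

```python
import statistics

# Total goals for zones one through nine, copied from the CSV slide.
totals = [18, 3, 23, 15, 4, 13, 24, 21, 18]

mean_goals = statistics.mean(totals)      # sum of the totals divided by the number of zones
median_goals = statistics.median(totals)  # middle value once the totals are sorted
mode_goals = statistics.mode(totals)      # most frequently occurring total

print("Mean:", round(mean_goals, 2))
print("Median:", median_goals)
print("Mode:", mode_goals)
```

Comparing the mean with the median is one way to start a conversation about how the low-scoring zones pull the average down.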

### Which net zones score above the median? 

In [None]:
hockey_goals_df.sort_values('Total', ascending=False)
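Sorting shows the ranking, but the question can also be answered directly with a filter. The sketch below is one way to do that in plain Python, assuming the zone totals from the CSV slide; the dictionary and variable names are illustrative.

```python
import statistics

# Zone totals copied from the CSV slide.
zone_totals = {
    "one": 18, "two": 3, "three": 23,
    "four": 15, "five": 4, "six": 13,
    "seven": 24, "eight": 21, "nine": 18,
}

median_goals = statistics.median(zone_totals.values())

# Keep only the zones whose total is strictly greater than the median.
above_median = [zone for zone, total in zone_totals.items() if total > median_goals]
at_median = [zone for zone, total in zone_totals.items() if total == median_goals]

print("Median:", median_goals)
print("Above the median:", above_median)
print("Equal to the median:", at_median)
```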

In [None]:
#Total goals by net zone bar chart
px.bar(hockey_goals_df,x="Zone", y="Total", title="Total goals by net zone")

### What connections exist between goalie position and scoring?

In [None]:
hockey_goals_df = pd.read_csv('./data/hockey_goals.csv') 
hockey_goals_df.Total

In [None]:
#prepare spreadsheet and goal net image
spread_sheet_hockey_net = sheet(rows=3, columns=3)

my_cells_net = cell_range([[18,3,23],[15,4,13],[24,21,18]],row_start=0,col_start=0,numeric_format="int")

figure_hockey_net = go.Figure(data=go.Heatmap(
          z =list(reversed(my_cells_net.value)),
          type = 'heatmap',
          colorscale = 'greys',opacity = 1.0))

axis_template = dict(range = [0,5], autorange = True,
             showgrid = False, zeroline = False,
             showticklabels = False,
             ticks = '' )

figure_hockey_net.update_layout(margin = dict(t=50,r=200,b=200,l=200),
    xaxis = axis_template,
    yaxis = axis_template,
    showlegend = False,
    width = 800, height = 500, title="Ice hockey net zones",
    autosize = True )

# Add image in the background
nLanes = 3
nZones = 3

figure_hockey_net.add_layout_image(
        dict(
            source="images/hockey_net.png",
            xref="x",
            yref="y",
            x=-0.5,
            y=-.5 + nLanes,  #this adjusts the placement of the image
            sizex=nZones,
            sizey=nLanes,
            sizing="fill",
            opacity=1.0,
            layer="above")
)

# changes in my_cells_net should trigger this function
def calculate(change):
    figure_hockey_net.update_traces(z=list(reversed(my_cells_net.value)))
    
my_cells_net.observe(calculate, 'value')
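The `list(reversed(...))` call in the cell above is easy to overlook. A Plotly heatmap draws the first row of `z` at the bottom of the y-axis by default, so the rows are reversed to make the spreadsheet's first row appear at the top of the figure. The plain-Python sketch below shows only the reordering; the variable names are illustrative.

```python
# Rows as they are typed into the 3x3 spreadsheet, first row listed first.
sheet_rows = [
    [18, 3, 23],
    [15, 4, 13],
    [24, 21, 18],
]

# Row order handed to the heatmap: the last spreadsheet row comes first,
# so it is drawn at the bottom and the first spreadsheet row ends up on top.
heatmap_rows = list(reversed(sheet_rows))

for row in heatmap_rows:
    print(row)
```

The values within each row keep their left-to-right order; only the order of the rows changes.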

In [None]:
spread_sheet_hockey_net

In [None]:
# Share of all goals scored in the highest-scoring zone (24 of 139)
24/139
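The `24/139` calculation gives the share of goals for a single zone. The sketch below extends the same arithmetic to every zone, assuming the totals from the CSV slide; it is an illustration in plain Python rather than part of the original analysis.

```python
# Zone totals copied from the CSV slide.
zone_totals = {
    "one": 18, "two": 3, "three": 23,
    "four": 15, "five": 4, "six": 13,
    "seven": 24, "eight": 21, "nine": 18,
}

all_goals = sum(zone_totals.values())

# Fraction of all goals scored in each zone.
zone_share = {zone: total / all_goals for zone, total in zone_totals.items()}

for zone, share in zone_share.items():
    print(f"{zone}: {share:.1%}")
```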

In [None]:
figure_hockey_net.update()  # Press Shift+Return to run this cell and update the figure

## Data example 2:

## Analyzing youth field hockey data to make decisions

<center><img src='./images/learning_cycle1.png' alt="Learning design and context" width='90%' /></center>

#### Learning design and context notes

The context is physical education, and the content is statistics. Within physical education, in-game skills, fair play, teamwork, and goal setting are integrated. Those outcomes can be applied to in-game decision making. The goal setting can also be part of the communication resulting from the data analysis. When considering in-game decision making, we can define an action as the result of a decision. Decision making is part of a learning cycle that incorporates a technological feedback loop.

(Field Hockey Alberta, 2020; Field Hockey Canada, 2020; Alberta Education, 2000)

<center><img src='./images/learning_cycle5.png' alt="Learning cycle" width='90%' /></center>

#### Learning cycle notes
The real situation occurs on the field, where a decision is made and an action is executed. Then, the athlete forms a mental representation, processing occurs, and a real model is formed. The real model is integrated into the computational model, which results in technological feedback, and a connection is then made back into game play.

(Butler & Winne, 1995; Cleary & Zimmerman, 2001; Hadwin et al., 2017; Leard & Hadwin, 2001)

<center><img src='./images/computational_thinking.png' alt="Computational thinking" width='90%' /></center>

#### Computational modelling and data literacy notes

The definition of computational thinking can vary.

Computational thinking is math reasoning combined with critical thinking plus the power of computers. We can use computers to do work more efficiently for us, such as processing thousands of lines of data.

Under that definition of computational thinking, we can apply computational thinking strategies. The foundational process is decomposing to look for patterns. We can use computer programming to design algorithms to look for patterns. With these algorithms, we can infer through abstractions.

The abstractions can be in the form of computational models: data visualizations (including graphs from the curriculum), data analyses, and simulations of probability models. The data visualizations can extend beyond the curriculum to support math reasoning.

(Berikan & Özdemir, 2019; Gadanidis, 2020; Guadalupe & Gómez-Blancarte, 2019; Leard & Hadwin, 2001)

<center><img src='./images/analysis_process.png' alt="Data science analysis process" width='90%' /></center>

#### Data science analysis process notes

This data science analysis process was adapted from how expert data scientists analyze data and aligned with several provincial curricula.

There are six steps:

1. Understand the problem. What questions are we trying to answer?
2. Gather the data. Find the data sources, which can extend to big data sets.
3. Organize the data so we can explore it, usually in the form of a table.
4. Explore the data to create computational models. Usually, there is more than one model. Look for evidence to answer our questions.
5. Interpret the data through inferences. Explain how the evidence answers our questions.
6. Communicate the results. In the context of sports analytics, the communication might be within a team to decide tactical strategies for game play.

(Alberta Education, 2007, updated 2016; Ferri, 2006; Leard & Hadwin, 2001; Manitoba Education and Training, 2020; Ontario Ministry of Education, 2020)

<center><img src='./images/collective.png' alt="Collective decision making" width='90%' /></center>

#### Learning cycle notes

We can also consider how the individual makes decisions within the collective responsibilities and actions of the team. In-game decision making involves in-game communication with team members, with each athlete referring to their own real model.

While in-game decision making will always produce a real model, athletes also need to decide when it is appropriate to connect the real model to the computational model and integrate that connection back into game play.

(BC Ministry of Education, 2020; Hadwin et al., 2017; Leard & Hadwin, 2001)

<center><img src='./images/field_hockey_game.png' alt="Field hockey" width='90%' /></center>

<center><img src='./images/understand1.png' alt="Understand actions" width='90%' /></center>

<center><img src='./images/actions.png' alt="Understand viewpoints" width='90%' /></center>

In [None]:
print('Passes received')
# Play only the clip between 2893 s and 2915 s of the video
YouTubeVideo('mIwiiJO7Rk4', start=2893, end=2915, width=600, height=355)

<center><img src='./images/gather4.png' alt="Gather" width='90%' /></center>

<center><img src='./images/collection_passing.png' alt="Passing" width='90%' /></center>

## 3. Organize

In [None]:
possession_time_df = read_csv('data/field_hockey_possession_time.csv')
possession_time_df.head(8)

## 4. Explore: How can the data be analyzed to represent viewpoints?

Game summary:

Final score: Home 0, Away 1<br><br>
Home dominated shots on net with 2 shots in the third quarter<br>while Away shot once on net in the first quarter and scored.

**Attack viewpoint: How does ball possession<br>time affect the outcomes of the game?**

In [None]:
#time of possession bar chart
px.bar(possession_time_df,x="Possession Time (seconds)",y="Quarter",title="Possession per quarter",color="Team")

**Attack and defensive transition viewpoints**

In [None]:
lanes_home_passes_df = read_csv('data/field_hockey_lanes_home_passes.csv')
lanes_home_passes_df.head()

In [None]:
#Home team passes pie chart
px.pie(lanes_home_passes_df, values="Count", names="Action", 
       title="Passes received, intercepted, and missed for Home team")

## 4. Explore
**Home attack as passes received per quarter:<br>
What stays the same and what changes across zones?**

In [None]:
df_passes_home = pd.read_csv('data/field_hockey_home_passes.csv')
df_passes_home

In [None]:
#splitting up the dataset by quarter
df_temp_1 = df_passes_home.loc[lambda df: (df['Phase of Play'] == 'attack') & (df['Quarter'] == 'first')]
df_temp_2 = df_passes_home.loc[lambda df: (df['Phase of Play'] == 'attack') & (df['Quarter'] == 'second')]
df_temp_3 = df_passes_home.loc[lambda df: (df['Phase of Play'] == 'attack') & (df['Quarter'] == 'third')]
df_temp_4 = df_passes_home.loc[lambda df: (df['Phase of Play'] == 'attack') & (df['Quarter'] == 'fourth')]

In [None]:
#configuring the heatmap plots and their side-by-side placement
fig_all = subplots.make_subplots(rows=1, cols=4, subplot_titles=("Q1", "Q2", "Q3", "Q4"))
fig_all.add_trace(go.Heatmap(x=df_temp_1['Lane'],y=df_temp_1['Zone'], z=df_temp_1['Count'], 
                                              colorscale='blues',zmin=0,zmax=9),row=1,col=1)
fig_all.add_trace(go.Heatmap(x=df_temp_2['Lane'],y=df_temp_2['Zone'], z=df_temp_2['Count'], 
                                              colorscale='blues',zmin=0,zmax=9),row=1,col=2)
fig_all.add_trace(go.Heatmap(x=df_temp_3['Lane'],y=df_temp_3['Zone'], z=df_temp_3['Count'], 
                                              colorscale='blues',zmin=0,zmax=9),row=1,col=3)
fig_all.add_trace(go.Heatmap(x=df_temp_4['Lane'],y=df_temp_4['Zone'], z=df_temp_4['Count'], 
                                              colorscale='blues',zmin=0,zmax=9),row=1,col=4)
fig_all.update_xaxes(showticklabels = False, linecolor='black')
fig_all.update_yaxes(showticklabels = False, linecolor='black')
fig_all.update_traces(showscale=True)
fig_all.update_layout(title="Home team attack phase passes by quarter")

In [None]:
fig_all.show()

In [None]:
df_passes_home.loc[lambda df: (df['Lane'] == 1) &(df['Phase of Play'] == 'attack') &(df['Quarter'].isin(['first', 'second'])) ].sum()

In [None]:
df_passes_home.loc[lambda df: (df['Lane'] == 1) &(df['Phase of Play'] == 'attack') &(df['Quarter']== 'first') ].sum()

In [None]:
df_passes_home.loc[lambda df: (df['Lane'] == 1) &(df['Phase of Play'] == 'attack') &(df['Quarter']== 'second') ].sum()

In [None]:
df_passes_home.loc[lambda df: (df['Lane'] == 5) &(df['Phase of Play'] == 'attack') &(df['Quarter'].isin(['third', 'fourth'])) ].sum()

In [None]:
df_passes_home.loc[lambda df: (df['Lane'] == 5) &(df['Phase of Play'] == 'attack') &(df['Quarter']== 'third') ].sum()

In [None]:
df_passes_home.loc[lambda df: (df['Lane'] == 5) &(df['Phase of Play'] == 'attack') &(df['Quarter']== 'fourth') ].sum()

- The most passes in a zone occur toward the left outside lane of the opponent's net across quarters: first quarter, 14/49 (29%); second quarter, 13/32 (41%); third quarter, 16/42 (38%); fourth quarter, 8/29 (28%).
- In the second half, the defence keeps passes out of the zone nearest the net: no passes occur in that top horizontal zone.
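The percentages in the first bullet come from dividing the passes in the busiest zone by all attack-phase passes in that quarter. The sketch below reproduces that arithmetic from the counts quoted above; the dictionary name is illustrative.

```python
# (passes in the busiest zone, all attack-phase passes) per quarter, as quoted above.
busiest_zone_passes = {
    "first": (14, 49),
    "second": (13, 32),
    "third": (16, 42),
    "fourth": (8, 29),
}

# Convert each pair to a whole-number percentage.
percentages = {
    quarter: round(100 * in_zone / all_passes)
    for quarter, (in_zone, all_passes) in busiest_zone_passes.items()
}

for quarter, percent in percentages.items():
    in_zone, all_passes = busiest_zone_passes[quarter]
    print(f"{quarter} quarter: {in_zone}/{all_passes} = {percent}%")
```

Reporting both the raw counts and the percentages matters here, because the number of passes per quarter varies quite a bit.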

## 5. Interpret<br> How can the data exploration inform decision making?

> - keeping ball possession by carrying the ball
> - dominating play by keeping the ball out of the zone near the net
> - attacking on the outer lanes, especially toward the left side of the opponent's net

# The technology in this talk

- **Jupyter** notebooks, **Python** programming, **Pandas** for data
- Free to teachers and students
- **Callysto.ca** project (CanCode, Cybera, PIMS)
- This slideshow **IS** a Jupyter notebook!  (take a tour)

## Callysto resources

- <a href="https://www.callysto.ca/starter-kit/">Callysto starter kit</a>  Getting started
- <a href="https://courses.callysto.ca">courses.callysto.ca</a>  Online courses
- <a href="https://www.callysto.ca/lesson-plans/">Callysto lesson plans</a>  Lesson Plans including soccer analytics
- <a href="https://www.callysto.ca/weekly-data-visualization/">Weekly data visualizations</a> Quickies
- <a href="https://app.lucidchart.com/documents/view/8e3186f7-bdfe-46af-9c7f-9c426b80d083">Connecting data literacy and sports</a> About Sports Analytics

<center><a href='https://www.callysto.ca/learning-modules/' target='_blank'><img src='./images/learning_modules.png' alt="Callysto learning modules" width='90%' /></a></center>
<center>All free, all open source, aimed at teachers and students</center>

<center><img src='./images/qrcode-mailchimp-callysto-feedback-survey.png' alt="QR code feedback survey" width='300' /></center>

<center><h3>https://bit.ly/callysto-feedback</h3></center>

<br>
<p><center>Contact us at <a href="mailto:contact@callysto.ca">contact@callysto.ca</a><br>
for in-class workshops, virtual hackathons and more information<br>
    <a href="https://twitter.com/callysto_canada">@callysto_canada</a><br>
    <a href="https://callysto.ca">callysto.ca</a><br>
    <a href="https://www.youtube.com/channel/UCPdq1SYKA42EZBvUlNQUAng">YouTube</a>
</center></p>

<center><img src='./images/callysto_logo.png' alt="Callysto logo" width='80%' /></center>
<center><img src='./images/callysto_partners2.png' alt='Callysto partners' width='80%' /></center>

## Presentation slides available at:

<center><img src='./images/qrcode-data-literacy-sports-github.png' alt="QR code GitHub Repo" width='300' /></center>

<center><h3>https://github.com/callysto/data-literacy-through-sports-analytics</h3></center>

### References

Alberta Education. (2000). *Physical education* [Program of Studies]. https://education.alberta.ca/media/160191/phys2000.pdf

Alberta Education. (2007, updated 2016). *Mathematics kindergarten to grade 9* [Program of Studies]. https://education.alberta.ca/media/3115252/2016_k_to_9_math_pos.pdf

Alberta Education. (2017). *Career and technology foundations* [Program of Studies]. https://education.alberta.ca/media/3795641/ctf-program-of-studies-jan-4-2019.pdf

BC Ministry of Education. (2020). *BC's digital literacy framework*. https://www2.gov.bc.ca/assets/gov/education/kindergarten-to-grade-12/teach/teaching-tools/digital-literacy-framework.pdf

Berikan, B., & Özdemir, S. (2019). Investigating “problem-solving with datasets” as an implementation of computational thinking: A literature review. *Journal of Educational Computing Research, 58*(2), 502–534. https://doi.org/10.1177/0735633119845694 

Butler, D. L., & Winne, P. H. (1995). Feedback and self-regulated learning: A theoretical synthesis. *Review of Educational Research, 65*(3), 245–281. https://doi.org/10.3102/00346543065003245  

Cleary, T. J., & Zimmerman, B. J. (2001). Self-regulation differences during athletic practice by experts, non-experts, and novices. *Journal of Applied Sport Psychology, 13*(2), 185–206. https://doi.org/10.1080/104132001753149883

Ferri, R. B. (2006). Theoretical and empirical differentiations of phases in the modelling process. *ZDM, 38*(2), 86–95. https://doi.org/10.1007/bf02655883 

Field Hockey Alberta. (2020). *Tactical seminars*. http://www.fieldhockey.ab.ca/content/tactical-seminars

Field Hockey Canada. (2020). *Ahead of the game*. http://www.fieldhockey.ca/ahead-of-the-game-field-hockey-canada-webinar-series/

Gadanidis, G. (2020, September 2). *Shifting from computational thinking to computational modelling in math education* [Online plenary talk]. Changing the Culture 2020, Pacific Institute for the Mathematical Sciences.

Guadalupe, T., & Gómez-Blancarte, A. (2019). Assessment of informal and formal inferential reasoning: A critical research review. *Statistics Education Research Journal, 18*, 8–25. https://www.researchgate.net/publication/335057564_ASSESSMENT_OF_INFORMAL_AND_FORMAL_INFERENTIAL_REASONING_A_CRITICAL_RESEARCH_REVIEW

Hadwin, A., Järvelä, S., & Miller, M. (2017). Self-Regulation, Co-Regulation, and Shared Regulation in Collaborative Learning Environments. *Handbook of Self-Regulation of Learning and Performance*, 83–106. https://doi.org/10.4324/9781315697048-6 

Kastens, K. (2014). *Pervasive and Persistent Understandings about Data*. Oceans of Data Institute. http://oceansofdata.org/sites/oceansofdata.org/files/pervasive-and-persistent-understandings-01-14.pdf

Leard, T., & Hadwin, A. F. (2001, May). *Analyzing logfile data to produce navigation profiles of studying as self-regulated learning* [Paper presentation]. Canadian Society for the Study of Education, Quebec City, Quebec, Canada.

Manitoba Education and Training. (2020). *Literacy with ICT across the curriculum: A model for 21st century learning from K-12*. https://www.edu.gov.mb.ca/k12/tech/lict/index.html

Ontario Ministry of Education. (2020). *The Ontario curriculum grades 1‐8: Mathematics* [Program of Studies]. https://www.dcp.edu.gov.on.ca/en/curriculum/elementary-mathematics