## Northeastern University Open Sports Data & Analytics Conference
### Getting Started with IMPECT Open Event Data and [Kloppy](https://kloppy.pysport.org/) (powered by [PySport](https://pysport.org/))

### Install Packages:
- Install Python 3.11+ if you don't have it already.
- Create a virtual environment to hold the Python packages for this project.
- Activate the virtual environment (select it as the kernel for this Jupyter Notebook).

Install the following packages to use this notebook:

In [None]:
!pip install "kloppy>=3.18.0" polars pyarrow
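Before running the rest of the notebook, it can help to confirm that the active kernel meets these requirements. The following is a minimal sketch using only the standard library. The helper name `installed_version` is hypothetical, and the check only reports versions; it does not compare them against the `>=3.18.0` pin.

```python
from __future__ import annotations

import sys
from importlib import metadata


def installed_version(package: str) -> str | None:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The notebook asks for Python 3.11 or newer.
python_ok = sys.version_info >= (3, 11)
status = "OK" if python_ok else "older than 3.11"
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")

# Report what is installed in the active environment (the kernel).
for package in ("kloppy", "polars", "pyarrow"):
    print(f"{package}: {installed_version(package) or 'not installed'}")
```

If a package shows up as "not installed", the kernel is most likely not pointing at the virtual environment where `pip install` was run.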

### Kloppy

Kloppy is _the_ industry-standard open-source soccer data standardization package, used by clubs in the English Premier League, Italian Serie A, La Liga, German Bundesliga, Major League Soccer, Dutch Eredivisie and other leagues. Each data provider uses its own proprietary formats, event definitions and coordinate systems, so Kloppy standardizes data from different providers into a single format.

We can use Kloppy to directly load and access [Open IMPECT Event Data](https://github.com/ImpectAPI/open-data).

### IMPECT Open Event Data

[IMPECT](https://www.impect.com/en/) is a large data provider that offers free event data for the 2023/24 Bundesliga season for research purposes.

### 306 Bundesliga Games

We can list all publicly available matches using the code below.

1. We load the [**matches file**](https://github.com/ImpectAPI/open-data/blob/main/data/matches/matches_743.json) and the [**squads file**](https://github.com/ImpectAPI/open-data/blob/main/data/squads/squads_743.json) directly from GitHub.
2. We unnest, rename and drop columns from the JSON files using [**Polars**](https://pola.rs/), a faster alternative to pandas, to obtain `matches` and `squads`.

In [None]:
import polars as pl
import requests
import io

from kloppy.utils import github_resolve_raw_data_url

# 1. Load matches and squads data from IMPECT Open Data repository
match_url = github_resolve_raw_data_url(
    repository="ImpectAPI/open-data",
    branch="main",
    file="data/matches/matches_743.json"
)
squads_url = github_resolve_raw_data_url(
    repository="ImpectAPI/open-data",
    branch="main",
    file="data/squads/squads_743.json"
)

# 2. Load and process matches data
response = requests.get(match_url)
response.raise_for_status()  # fail early if the download did not succeed
matches = (
    pl.read_json(io.StringIO(response.text))
    .unnest("matchDay")  # expand the nested matchDay struct into columns
    .rename({'iterationId': 'competitionId', 'id': 'matchId'})
    .drop(['idMappings', 'lastCalculationDate', 'name', 'available'])
    .with_columns([
        (pl.col("index") + 1).alias("matchDay")  # shift index by one so match days start at 1
    ])
    .drop("index")
)

# Load and process squads data
response = requests.get(squads_url)
response.raise_for_status()
squads = (
    pl.read_json(io.StringIO(response.text))
    .drop(['type', 'gender', 'imageUrl', 'idMappings', 'access', 'countryId'])
)
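To make the chain above easier to follow, here is a plain-Python sketch of what the unnest, rename, drop and `matchDay` steps do to a single record. It is an illustration only and does not use Polars. The helper `flatten_match` is hypothetical, and the sample record is an assumption: the ids and date come from the first match in this notebook, while the `name` value and the exact placement of fields inside the `matchDay` struct are made up for the example.

```python
def flatten_match(record: dict) -> dict:
    """Mimic unnest -> rename -> drop -> 1-based matchDay for one record."""
    # unnest: the fields of the nested "matchDay" struct become top-level keys
    flat = {key: value for key, value in record.items() if key != "matchDay"}
    flat.update(record["matchDay"])

    # rename
    flat["competitionId"] = flat.pop("iterationId")
    flat["matchId"] = flat.pop("id")

    # drop columns that are not needed later
    for unused in ("idMappings", "lastCalculationDate", "name", "available"):
        flat.pop(unused, None)

    # with_columns + drop: replace "index" with a match day that starts at 1
    flat["matchDay"] = flat.pop("index") + 1
    return flat


# Hypothetical record shaped like one entry of the matches file (assumed layout).
sample = {
    "id": 122838,
    "iterationId": 743,
    "homeSquadId": 38,
    "awaySquadId": 33,
    "scheduledDate": "2023-08-18T18:30:00Z",
    "lastCalculationDate": None,
    "idMappings": [],
    "matchDay": {"index": 0, "name": "made-up label", "available": True},
}

print(flatten_match(sample))
```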


3. We join `matches` with `squads` to add the `homeTeam` and `awayTeam` names, which come from the `squads` file.

In [47]:
matches = (
    matches
    .join(
        squads.rename({"name": "homeTeam"}),
        left_on="homeSquadId",
        right_on="id",
        how="left"
    )
    .join(
        squads.rename({"name": "awayTeam"}),
        left_on="awaySquadId",
        right_on="id",
        how="left"
    )
)

| competitionId | matchId | homeSquadId | awaySquadId | scheduledDate | matchDay | homeTeam | awayTeam |
| --- | --- | --- | --- | --- | --- | --- | --- |
| i64 | i64 | i64 | i64 | str | i64 | str | str |
| 743 | 122838 | 38 | 33 | "2023-08-18T18:30:00Z" | 1 | "SV Werder Bremen" | "FC Bayern München" |
| 743 | 122839 | 41 | 37 | "2023-08-19T13:30:00Z" | 1 | "Bayer 04 Leverkusen" | "RasenBallsport Leipzig" |
| 743 | 122840 | 30 | 432 | "2023-08-19T13:30:00Z" | 1 | "VfL Wolfsburg" | "1. FC Heidenheim 1846" |
| 743 | 122841 | 31 | 34 | "2023-08-19T13:30:00Z" | 1 | "TSG 1899 Hoffenheim" | "SC Freiburg" |
| 743 | 122842 | 46 | 416 | "2023-08-19T13:30:00Z" | 1 | "VfB Stuttgart" | "VfL Bochum" |
| … | … | … | … | … | … | … | … |
| 743 | 123139 | 46 | 32 | "2024-05-18T13:30:00Z" | 34 | "VfB Stuttgart" | "Borussia Mönchengladbach" |
| 743 | 123140 | 432 | 27 | "2024-05-18T13:30:00Z" | 34 | "1. FC Heidenheim 1846" | "1. FC Köln" |
| 743 | 123141 | 38 | 416 | "2024-05-18T13:30:00Z" | 34 | "SV Werder Bremen" | "VfL Bochum" |
| 743 | 123142 | 41 | 39 | "2024-05-18T13:30:00Z" | 34 | "Bayer 04 Leverkusen" | "FC Augsburg" |
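A left join keeps every match even when a squad id has no entry in `squads`; the team name is then null. The plain-Python sketch below mirrors that behaviour with a dictionary lookup, using squad ids and names taken from the output above. The helper `add_team_names` is hypothetical and is not part of Polars or Kloppy, and squad 37 is deliberately left out of the lookup to show the unmatched case.

```python
# Squad id -> name, copied from the first rows of the joined output.
squad_names = {
    38: "SV Werder Bremen",
    33: "FC Bayern München",
    41: "Bayer 04 Leverkusen",
    # 37 is intentionally missing to demonstrate an unmatched id
}


def add_team_names(match: dict, names: dict) -> dict:
    """Attach homeTeam and awayTeam; unmatched ids yield None, like a left join."""
    return {
        **match,
        "homeTeam": names.get(match["homeSquadId"]),
        "awayTeam": names.get(match["awaySquadId"]),
    }


matches_sample = [
    {"matchId": 122838, "homeSquadId": 38, "awaySquadId": 33},
    {"matchId": 122839, "homeSquadId": 41, "awaySquadId": 37},
]

named = [add_team_names(match, squad_names) for match in matches_sample]
for row in named:
    print(row["matchId"], row["homeTeam"], "vs", row["awayTeam"])
```

Both matches survive the lookup, and the second one carries `None` as its away team. An inner join would have dropped that row instead.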


Now we can load one game at a time using Kloppy's `impect.load_open_data` function.

In [None]:
from kloppy import impect

match_id = 122838 
dataset = impect.load_open_data(
    match_id=match_id,
    competition_id=743,
)


In [49]:
dataset

<EventDataset record_count=3057>

### Basic Kloppy Operations

- Transform the [**coordinate system**](https://kloppy.pysport.org/user-guide/concepts/coordinates/) to meters, such that X $\in$ (-52.5, 52.5) and Y $\in$ (-34.0, 34.0) (called "secondspectrum").
    Note: Kloppy supports many different coordinate systems, including custom ones.
- Filter for Passes and Shots
- Output to [Polars](https://pola.rs/) dataframe

In [11]:
(
    dataset
    .transform(to_coordinate_system="secondspectrum")  
    .filter(lambda event: event.event_type.name in ["PASS", "SHOT"])
    .to_df(engine="polars")  # or engine="pandas"
)

| event_id | event_type | period_id | timestamp | end_timestamp | ball_state | ball_owning_team | team_id | player_id | coordinates_x | coordinates_y | end_coordinates_x | end_coordinates_y | receiver_player_id | set_piece_type | body_part_type | result | success | is_under_pressure | pass_type | is_counter_attack |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| str | str | i64 | duration[μs] | duration[μs] | str | str | str | str | f64 | f64 | f64 | f64 | str | str | str | str | bool | bool | str | bool |
| "549b2296-55e3-4768-9a2d-3435b9… | "PASS" | 1 | 311ms | 1s 300383µs | "alive" | "941" | "941" | "2988" | -0.04575 | 0.04575 | -13.417143 | 0.68625 | "8118" | "KICK_OFF" | "LEFT_FOOT" | "COMPLETE" | true |  |  |  |
| "13848ab5-86ee-4183-8c62-1b01af… | "PASS" | 1 | 3s 152ms | 4s 294334µs | "alive" | "941" | "941" | "8118" | -13.997143 | 0.59475 | -18.802857 | -10.305833 | "6994" |  | "RIGHT_FOOT" | "COMPLETE" | true | true |  |  |
| "b2384b09-5349-4d4c-9c17-fb01a9… | "PASS" | 1 | 5s 219ms | 6s 608891µs | "alive" | "941" | "941" | "6994" | -19.962857 | -11.2225 | -34.311875 | -16.080833 | "21809" |  | "LEFT_FOOT" | "COMPLETE" | true |  |  |  |
| "88c1b8a6-498b-4dea-9965-64e698… | "PASS" | 1 | 7s 320ms | 9s 355382µs | "alive" | "941" | "941" | "21809" | -36.504167 | -15.439167 | -47.9625 | -3.24825 | "8326" |  | "RIGHT_FOOT" | "COMPLETE" | true |  |  |  |
| "f89d0b16-0081-4a47-a38f-88d4b9… | "PASS" | 1 | 10s 710ms | 14s 693779µs | "alive" | "941" | "941" | "8326" | -46.129167 | 1.41825 | -7.73175 | 33.961556 |  |  | "LEFT_FOOT" | "OUT" | false | true | "LONG_BALL" |  |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| "a88269fa-48d1-4dea-adc1-afed97… | "PASS" | 2 | 50m 22s 15ms | 50m 23s 131081µs | "alive" | "1839" | "1839" | "22008" | -22.448571 | -29.348222 | -28.911429 | -29.578889 | "12751" |  | "RIGHT_FOOT" | "COMPLETE" | true |  |  |  |
| "58655b6a-3921-46f9-96d3-b55997… | "PASS" | 2 | 50m 23s 211ms | 50m 25s 76969µs | "alive" | "1839" | "1839" | "12751" | -28.911429 | -29.578889 | -9.274286 | -11.405833 | "23789" |  | "LEFT_FOOT" | "COMPLETE" | true |  | "HIGH_PASS" |  |
| "bd21538a-0a93-4924-bb3e-b392c0… | "PASS" | 2 | 50m 25s 117ms | 50m 27s 259333µs | "alive" | "1839" | "1839" | "23789" | -9.274286 | -11.405833 | -2.79075 | -5.080833 |  |  |  | "INCOMPLETE" | false | true | "HIGH_PASS" |  |
| "bfbee75b-474c-484f-b9b2-7c27fd… | "PASS" | 2 | 51m 3s 887ms | 51m 6s 898384µs | "alive" | "941" | "941" | "8326" | -6.17625 | 5.814167 | 33.581875 | 24.965556 | "7829" | "FREE_KICK" | "LEFT_FOOT" | "COMPLETE" | true |  | "LONG_BALL" |  |
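To make the coordinate transformation concrete, the sketch below converts a point from a normalized 0 to 1 pitch into the centered, metric ranges used in this section (X in (-52.5, 52.5), Y in (-34.0, 34.0)), which correspond to a 105 m by 68 m pitch. This is a plain-Python illustration of the idea, not Kloppy's implementation. The normalized source system, the function name `normalized_to_centered_meters`, and the assumption that both systems share the same axis directions are all hypothetical.

```python
PITCH_LENGTH_M = 105.0  # X spans (-52.5, 52.5)
PITCH_WIDTH_M = 68.0    # Y spans (-34.0, 34.0)


def normalized_to_centered_meters(x: float, y: float) -> tuple[float, float]:
    """Map (x, y) in [0, 1] x [0, 1] to meters with the origin at the center spot.

    Assumes the source and target systems point their axes the same way.
    """
    return ((x - 0.5) * PITCH_LENGTH_M, (y - 0.5) * PITCH_WIDTH_M)


# The center spot maps to the origin; the corners map to the range limits.
print(normalized_to_centered_meters(0.5, 0.5))
print(normalized_to_centered_meters(0.0, 0.0))
print(normalized_to_centered_meters(1.0, 1.0))
```

Real providers also differ in axis orientation and origin placement, which is why it is safer to let `dataset.transform(...)` handle the conversion than to hand-roll it for each provider.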


### Basic Kloppy Functionalities
- [EventDataset](https://kloppy.pysport.org/user-guide/concepts/event-data/)
- [Metadata (players, team names etc.)](https://kloppy.pysport.org/user-guide/concepts/metadata/)
- [Coordinate Systems](https://kloppy.pysport.org/user-guide/concepts/coordinates/#built-in-coordinate-systems)
- [Transformations](https://kloppy.pysport.org/user-guide/transformations/coordinates/)
- [Filter](https://kloppy.pysport.org/user-guide/getting-started/#filtering-data)
- [Exporting to pandas / polars DataFrames](https://kloppy.pysport.org/user-guide/exporting-data/dataframes/)

### Plotting

Use `mplsoccer` and `matplotlib` to plot the event data in different ways.

See [Plotting Examples](https://kloppy.pysport.org/user-guide/getting-started/#exec-51--__tabbed_1_2)