## Getting Started 

[Kloppy](https://kloppy.pysport.org/) by [PySport](https://pysport.org/) is _the_ industry-standard open-source football data standardization package. Kloppy simplifies football data processing by offering a single place to [**load**](https://kloppy.pysport.org/user-guide/loading-data/skillcorner/), [**filter**](https://kloppy.pysport.org/user-guide/getting-started/#filtering-data), [**transform**](https://kloppy.pysport.org/user-guide/transformations/coordinates/) and [**export**](https://kloppy.pysport.org/user-guide/exporting-data/) your football data in a standardized way.

We can use Kloppy to directly load data from GitHub (see below).

---

### Before we get started:
- Download Python 3.11+ if you don't have it already.
- Create a virtual environment to hold all the Python packages for this project.
- Activate the virtual environment (select it as the kernel for this Jupyter Notebook).
- [Install Kloppy](https://kloppy.pysport.org/user-guide/installation/) (`pip install kloppy`) into the virtual environment.
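
A quick way to confirm the first three steps from inside the notebook is sketched below. It uses only the standard library, the variable names are illustrative, and the check assumes a `venv`-style environment (a Conda environment, for example, is not detected this way).

```python
import sys

# This guide asks for Python 3.11 or newer.
python_ok = sys.version_info >= (3, 11)

# Inside a venv-style virtual environment, sys.prefix differs from sys.base_prefix.
in_virtualenv = sys.prefix != sys.base_prefix

print(f"Python {sys.version_info.major}.{sys.version_info.minor} (3.11+: {python_ok})")
print(f"Running inside a virtual environment: {in_virtualenv}")
```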

---

### Kloppy

Using Kloppy we can now easily load the tracking data directly from GitHub using the URL to the raw files.


In [5]:
# !pip install kloppy polars

In [2]:
from kloppy import skillcorner

match_id = 1886347

tracking_data_github_url = f"https://media.githubusercontent.com/media/SkillCorner/opendata/master/data/matches/{match_id}/{match_id}_tracking_extrapolated.jsonl"
meta_data_github_url = f"https://raw.githubusercontent.com/SkillCorner/opendata/master/data/matches/{match_id}/{match_id}_match.json"

dataset = skillcorner.load(
    meta_data=meta_data_github_url,
    raw_data=tracking_data_github_url,
    # Optional Parameters
    coordinates="skillcorner",  # or specify a different coordinate system
    sample_rate=(1 / 2),  # changes the data from 10fps to 5fps
    limit=100,  # only load the first 100 frames
)
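
To make the effect of `sample_rate=(1 / 2)` concrete, here is a minimal standard-library sketch of downsampling a 10 fps frame sequence to 5 fps. It only illustrates the idea of keeping every second frame; it is not Kloppy's implementation, and the `downsample` helper and the example frame ids are assumptions made for illustration.

```python
from datetime import timedelta

SOURCE_FPS = 10  # frame rate of the raw tracking data, per the comment in the loading call


def downsample(frame_ids, sample_rate):
    """Keep every (1 / sample_rate)-th frame id (illustrative helper, not part of Kloppy)."""
    step = round(1 / sample_rate)
    return frame_ids[::step]


# Ten consecutive raw frames, i.e. one second of data at 10 fps.
raw_frame_ids = list(range(10, 20))

kept = downsample(raw_frame_ids, sample_rate=1 / 2)

# Time elapsed since the first frame, computed from the raw frame rate.
timestamps = [
    timedelta(milliseconds=(frame_id - raw_frame_ids[0]) * 1000 // SOURCE_FPS)
    for frame_id in kept
]

for frame_id, ts in zip(kept, timestamps):
    print(frame_id, ts)
```

With half the frames kept, consecutive frame ids differ by 2 and consecutive timestamps by 200 ms.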

### Basic Kloppy Operations

Kloppy has many built-in functionalities. Below is a basic example that:
- Transforms the coordinate system so that the home team attacks from left to right in both periods
- Keeps only the frames from the first half
- Outputs to a [Polars](https://pola.rs/) DataFrame. You can also output to `"pandas"`

In [9]:
dataset.to_df(
        engine="polars"
    )

period_id,timestamp,frame_id,ball_state,ball_owning_team_id,ball_x,ball_y,ball_z,ball_speed,51009_x,51009_y,51009_d,51009_s,176224_x,176224_y,176224_d,176224_s,51649_x,51649_y,51649_d,51649_s,50983_x,50983_y,50983_d,50983_s,735578_x,735578_y,735578_d,735578_s,50978_x,50978_y,50978_d,50978_s,735574_x,735574_y,735574_d,735574_s,…,51667_s,33697_x,33697_y,33697_d,33697_s,51713_x,51713_y,51713_d,51713_s,133498_x,133498_y,133498_d,133498_s,14736_x,14736_y,14736_d,14736_s,23418_x,23418_y,23418_d,23418_s,133501_x,133501_y,133501_d,133501_s,965685_x,965685_y,965685_d,965685_s,50951_x,50951_y,50951_d,50951_s,38673_x,38673_y,38673_d,38673_s
i64,duration[μs],i64,str,i64,f64,f64,f64,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,…,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null
1,0µs,10,"""dead""",,0.32,0.38,0.13,,-39.63,-0.08,,,-19.21,-9.18,,,-21.83,0.47,,,-1.16,-32.47,,,-18.88,15.73,,,-7.41,7.13,,,-9.51,-5.01,,,…,,16.78,-3.67,,,17.03,14.69,,,17.55,-13.6,,,11.7,6.73,,,10.16,-2.12,,,0.91,18.96,,,7.74,-16.27,,,0.4,-8.28,,,2.67,9.94,,
1,200ms,12,"""dead""",,0.57,-0.07,0.19,,-40.06,-0.18,,,-19.24,-9.27,,,-21.81,0.4,,,-1.13,-32.66,,,-19.07,15.73,,,-7.32,7.14,,,-9.46,-5.15,,,…,,16.83,-3.72,,,17.13,14.55,,,17.59,-13.66,,,11.46,6.66,,,10.09,-2.31,,,1.05,18.74,,,7.8,-16.43,,,0.48,-8.49,,,2.56,9.87,,
1,400ms,14,"""dead""",,0.59,-0.03,0.14,,-40.39,-0.25,,,-19.26,-9.36,,,-21.78,0.34,,,-1.11,-32.82,,,-19.25,15.73,,,-7.23,7.13,,,-9.44,-5.24,,,…,,16.88,-3.74,,,17.19,14.43,,,17.62,-13.71,,,11.22,6.61,,,10.02,-2.45,,,1.17,18.53,,,7.84,-16.57,,,0.53,-8.64,,,2.45,9.79,,
1,600ms,16,"""dead""",,0.65,0.03,0.14,,-40.63,-0.3,,,-19.28,-9.43,,,-21.75,0.29,,,-1.09,-32.93,,,-19.41,15.73,,,-7.14,7.11,,,-9.43,-5.3,,,…,,16.93,-3.74,,,17.2,14.33,,,17.63,-13.75,,,11.0,6.58,,,9.93,-2.56,,,1.26,18.35,,,7.85,-16.69,,,0.56,-8.73,,,2.35,9.7,,
1,800ms,18,"""dead""",,0.67,0.06,0.15,,-40.72,-0.29,,,-19.27,-9.5,,,-21.72,0.27,,,-1.08,-32.99,,,-19.57,15.7,,,-7.05,7.04,,,-9.47,-5.3,,,…,,16.94,-3.7,,,17.15,14.27,,,17.64,-13.78,,,10.77,6.58,,,9.84,-2.61,,,1.35,18.22,,,7.81,-16.77,,,0.55,-8.74,,,2.26,9.61,,
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
1,38s 800ms,398,"""alive""",4177,-45.74,-0.33,0.81,,-45.55,-1.7,,,-45.0,-9.49,,,-47.36,2.84,,,-40.5,-18.28,,,-45.49,11.98,,,-30.92,-3.72,,,-33.66,-16.0,,,…,,-0.65,-11.21,,,-14.19,11.86,,,-16.35,-21.37,,,-18.48,-1.31,,,-38.57,-18.13,,,-36.17,11.17,,,-34.25,-19.54,,,-30.79,-12.05,,,-38.52,1.4,,
1,39s,400,"""alive""",4177,-45.53,-0.28,0.71,,-45.24,-1.58,,,-45.01,-9.86,,,-47.3,3.29,,,-39.9,-18.5,,,-45.32,12.59,,,-30.67,-3.15,,,-33.3,-15.74,,,…,,-0.54,-10.84,,,-13.89,12.13,,,-15.79,-21.15,,,-18.17,-1.01,,,-37.56,-17.88,,,-35.52,11.32,,,-34.04,-19.36,,,-30.8,-11.62,,,-38.12,1.69,,
1,39s 200ms,402,"""alive""",4177,-45.07,-0.45,0.59,,-44.94,-1.48,,,-45.06,-10.24,,,-47.24,3.67,,,-39.32,-18.73,,,-45.14,13.19,,,-30.46,-2.58,,,-32.95,-15.47,,,…,,-0.44,-10.46,,,-13.59,12.38,,,-15.24,-20.92,,,-17.87,-0.71,,,-36.57,-17.63,,,-34.9,11.46,,,-33.83,-19.16,,,-30.82,-11.2,,,-37.73,1.98,,
1,39s 400ms,404,"""alive""",1805,-44.79,-0.43,0.73,,-44.66,-1.41,,,-45.13,-10.62,,,-47.17,3.99,,,-38.74,-18.96,,,-44.98,13.78,,,-30.27,-2.02,,,-32.61,-15.19,,,…,,-0.33,-10.09,,,-13.28,12.63,,,-14.71,-20.67,,,-17.59,-0.43,,,-35.59,-17.38,,,-34.3,11.59,,,-33.61,-18.98,,,-30.85,-10.78,,,-37.37,2.26,,
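
The DataFrame above is wide: after a handful of frame-level and ball columns, every player contributes a group of columns named `<player_id>_x`, `<player_id>_y`, `<player_id>_d` and `<player_id>_s`. The sketch below groups such column names by player id using only the standard library. The `group_player_columns` helper is hypothetical, and reading `d` and `s` as distance and speed is an assumption that the output itself does not confirm (both are null in this sample).

```python
from collections import defaultdict


def group_player_columns(columns):
    """Group '<player_id>_<attribute>' column names by player id (illustrative helper)."""
    grouped = defaultdict(list)
    for name in columns:
        prefix, _, attribute = name.rpartition("_")
        # Only purely numeric prefixes are treated as player ids,
        # which skips columns such as 'period_id' or 'ball_x'.
        if prefix.isdigit():
            grouped[prefix].append(attribute)
    return dict(grouped)


# The first columns of the DataFrame printed above.
columns = [
    "period_id", "timestamp", "frame_id", "ball_state", "ball_owning_team_id",
    "ball_x", "ball_y", "ball_z", "ball_speed",
    "51009_x", "51009_y", "51009_d", "51009_s",
    "176224_x", "176224_y", "176224_d", "176224_s",
]

player_columns = group_player_columns(columns)
print(player_columns)
```

With a real DataFrame, the same helper could be fed `df.columns`.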


df = (
    dataset.transform(
        to_orientation="STATIC_HOME_AWAY"
    )  # Now the home team attacks from left to right in both periods
    .filter(lambda frame: frame.period.id == 1)  # Only keep frames from the first half
    .to_df(
        engine="polars"
    )  # Convert to a Polars DataFrame, or use engine="pandas" for a Pandas DataFrame
)

In [7]:
df

period_id,timestamp,frame_id,ball_state,ball_owning_team_id,ball_x,ball_y,ball_z,ball_speed,51009_x,51009_y,51009_d,51009_s,176224_x,176224_y,176224_d,176224_s,51649_x,51649_y,51649_d,51649_s,50983_x,50983_y,50983_d,50983_s,735578_x,735578_y,735578_d,735578_s,50978_x,50978_y,50978_d,50978_s,735574_x,735574_y,735574_d,735574_s,…,51667_s,33697_x,33697_y,33697_d,33697_s,51713_x,51713_y,51713_d,51713_s,133498_x,133498_y,133498_d,133498_s,14736_x,14736_y,14736_d,14736_s,23418_x,23418_y,23418_d,23418_s,133501_x,133501_y,133501_d,133501_s,965685_x,965685_y,965685_d,965685_s,50951_x,50951_y,50951_d,50951_s,38673_x,38673_y,38673_d,38673_s
i64,duration[μs],i64,str,i64,f64,f64,f64,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,…,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null,f64,f64,null,null
1,0µs,10,"""dead""",,-0.32,-0.38,0.13,,39.63,0.08,,,19.21,9.18,,,21.83,-0.47,,,1.16,32.47,,,18.88,-15.73,,,7.41,-7.13,,,9.51,5.01,,,…,,-16.78,3.67,,,-17.03,-14.69,,,-17.55,13.6,,,-11.7,-6.73,,,-10.16,2.12,,,-0.91,-18.96,,,-7.74,16.27,,,-0.4,8.28,,,-2.67,-9.94,,
1,200ms,12,"""dead""",,-0.57,0.07,0.19,,40.06,0.18,,,19.24,9.27,,,21.81,-0.4,,,1.13,32.66,,,19.07,-15.73,,,7.32,-7.14,,,9.46,5.15,,,…,,-16.83,3.72,,,-17.13,-14.55,,,-17.59,13.66,,,-11.46,-6.66,,,-10.09,2.31,,,-1.05,-18.74,,,-7.8,16.43,,,-0.48,8.49,,,-2.56,-9.87,,
1,400ms,14,"""dead""",,-0.59,0.03,0.14,,40.39,0.25,,,19.26,9.36,,,21.78,-0.34,,,1.11,32.82,,,19.25,-15.73,,,7.23,-7.13,,,9.44,5.24,,,…,,-16.88,3.74,,,-17.19,-14.43,,,-17.62,13.71,,,-11.22,-6.61,,,-10.02,2.45,,,-1.17,-18.53,,,-7.84,16.57,,,-0.53,8.64,,,-2.45,-9.79,,
1,600ms,16,"""dead""",,-0.65,-0.03,0.14,,40.63,0.3,,,19.28,9.43,,,21.75,-0.29,,,1.09,32.93,,,19.41,-15.73,,,7.14,-7.11,,,9.43,5.3,,,…,,-16.93,3.74,,,-17.2,-14.33,,,-17.63,13.75,,,-11.0,-6.58,,,-9.93,2.56,,,-1.26,-18.35,,,-7.85,16.69,,,-0.56,8.73,,,-2.35,-9.7,,
1,800ms,18,"""dead""",,-0.67,-0.06,0.15,,40.72,0.29,,,19.27,9.5,,,21.72,-0.27,,,1.08,32.99,,,19.57,-15.7,,,7.05,-7.04,,,9.47,5.3,,,…,,-16.94,3.7,,,-17.15,-14.27,,,-17.64,13.78,,,-10.77,-6.58,,,-9.84,2.61,,,-1.35,-18.22,,,-7.81,16.77,,,-0.55,8.74,,,-2.26,-9.61,,
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
1,38s 800ms,398,"""alive""",4177,45.74,0.33,0.81,,45.55,1.7,,,45.0,9.49,,,47.36,-2.84,,,40.5,18.28,,,45.49,-11.98,,,30.92,3.72,,,33.66,16.0,,,…,,0.65,11.21,,,14.19,-11.86,,,16.35,21.37,,,18.48,1.31,,,38.57,18.13,,,36.17,-11.17,,,34.25,19.54,,,30.79,12.05,,,38.52,-1.4,,
1,39s,400,"""alive""",4177,45.53,0.28,0.71,,45.24,1.58,,,45.01,9.86,,,47.3,-3.29,,,39.9,18.5,,,45.32,-12.59,,,30.67,3.15,,,33.3,15.74,,,…,,0.54,10.84,,,13.89,-12.13,,,15.79,21.15,,,18.17,1.01,,,37.56,17.88,,,35.52,-11.32,,,34.04,19.36,,,30.8,11.62,,,38.12,-1.69,,
1,39s 200ms,402,"""alive""",4177,45.07,0.45,0.59,,44.94,1.48,,,45.06,10.24,,,47.24,-3.67,,,39.32,18.73,,,45.14,-13.19,,,30.46,2.58,,,32.95,15.47,,,…,,0.44,10.46,,,13.59,-12.38,,,15.24,20.92,,,17.87,0.71,,,36.57,17.63,,,34.9,-11.46,,,33.83,19.16,,,30.82,11.2,,,37.73,-1.98,,
1,39s 400ms,404,"""alive""",1805,44.79,0.43,0.73,,44.66,1.41,,,45.13,10.62,,,47.17,-3.99,,,38.74,18.96,,,44.98,-13.78,,,30.27,2.02,,,32.61,15.19,,,…,,0.33,10.09,,,13.28,-12.63,,,14.71,20.67,,,17.59,0.43,,,35.59,17.38,,,34.3,-11.59,,,33.61,18.98,,,30.85,10.78,,,37.37,-2.26,,
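
Comparing the two printed DataFrames, every `_x` and `_y` value in these first-half frames has changed sign while `ball_z` is unchanged. In a pitch-centred coordinate system, that corresponds to mirroring each position through the centre spot. The sketch below reproduces this on a few values from the first row. It is an illustration of the observed effect, not Kloppy's implementation, and `flip_frame` is a hypothetical helper.

```python
def flip_frame(row):
    """Negate every x and y coordinate in a row; leave all other values untouched."""
    flipped = {}
    for key, value in row.items():
        if key.endswith(("_x", "_y")) and value is not None:
            flipped[key] = -value
        else:
            flipped[key] = value
    return flipped


# A few values from the first row of the untransformed DataFrame.
original = {
    "frame_id": 10,
    "ball_x": 0.32,
    "ball_y": 0.38,
    "ball_z": 0.13,
    "51009_x": -39.63,
    "51009_y": -0.08,
}

flipped = flip_frame(original)
print(flipped)
```

The result matches the first row of the transformed DataFrame for these columns.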


---

### Metadata

Kloppy also stores all relevant [**metadata**](https://kloppy.pysport.org/user-guide/concepts/metadata/), which you can use to easily obtain player names, jersey numbers, playing positions, etc.

In [7]:
home_team, away_team = dataset.metadata.teams

for player in home_team.players:
    print(f"{player.jersey_no} - {player.name} - {player.starting_position}")

10 - Guillermo Luis May Bartesaghi - Striker
17 - Callan Elliot - Unknown
22 - Jake Brimmer - Striker
15 - Francis De Vries - Unknown
4 - Nando Pijnaker - Left Center Back
23 - Daniel Hall - Right Center Back
6 - Louis Verstraete - Defensive Midfield
14 - Liam  Gillion - Left Wing
25 - Neyder Stiven Moreno Betancur - Right Wing
5 - Tommy Smith - Center Back
9 - Max Mata - Striker
27 - Logan Rogerson - Right Wing
28 - Luis Felipe Gallegos Leiva - Attacking Midfield
3 - Scott Galloway - Unknown
1 - Michael Woud - Unknown
8 - Luis Toomey - Unknown
7 - Cameron Drew Howieson - Unknown
12 - Alex Noah Paulsen - Unknown


---

### Basic Kloppy Functionalities

Below is a non-exhaustive list of other Kloppy functionalities.

- [TrackingDataset](https://kloppy.pysport.org/user-guide/concepts/tracking-data/)
- [Metadata (players, team names etc.)](https://kloppy.pysport.org/user-guide/concepts/metadata/)
- [Coordinate Systems](https://kloppy.pysport.org/user-guide/concepts/coordinates/#built-in-coordinate-systems)
- [Transformations](https://kloppy.pysport.org/user-guide/transformations/coordinates/)
- [Filter](https://kloppy.pysport.org/user-guide/getting-started/#filtering-data)
- [Exporting to pandas / polars DataFrames](https://kloppy.pysport.org/user-guide/exporting-data/dataframes/)

## Existing Open Source Projects
Below is a non-exhaustive list of publicly available **open source** football analytics tools. Feel free to use these as inspiration, build on top of them, or develop them further. We advise Analytics Cup participants not to replicate existing work, but rather to use it to build new open source tools.

**Plotting**

- [`mplsoccer`](https://mplsoccer.readthedocs.io/en/latest/) and `matplotlib` to plot different configurations of tracking data.
- [`d3-soccer`](https://github.com/probberechts/d3-soccer)
- And check out [Kloppy's Plotting Examples](https://kloppy.pysport.org/user-guide/getting-started/#exec-51--__tabbed_1_2)

**Resources for Positional Tracking Data**
- [floodlight](https://floodlight.readthedocs.io/en/latest/)
- [databallpy](https://databallpy.readthedocs.io/en/latest/?badge=latest)
- [Hyunsung Kim @ KAIST](https://github.com/hyunsungkim-ds)
- [Friends of Tracking](https://github.com/Friends-of-Tracking-Data-FoTD)
- [unravelsports](https://github.com/UnravelSports/UnravelSports)

**Other Football Analytics Resources**
- [socceraction](https://socceraction.readthedocs.io/en/latest/)
- [Footballdata](https://Footballdata.readthedocs.io/en/latest/)
- [ETSY](https://github.com/ML-KULeuven/ETSY) 
- [soccer-xg](https://github.com/ML-KULeuven/soccer_xg)
- [OpenSTARLab](https://openstarlab.readthedocs.io/en/latest/)
- [Soccer Analytics Handbook](https://github.com/devinpleuler/analytics-handbook)
- [penaltyblog](https://pypi.org/project/penaltyblog/)
