# Converting events to actions

Each event data provider uses a proprietary format to describe the events that happened during a game. SoccerAction, in contrast, uses a standardized, tabular, action-oriented data format called SPADL. The distinction is that actions are the subset of events that a player performs. For example, a passing event is an action, whereas an event signifying the end of the game is not. Additionally, SPADL always stores the same attributes for each action. Excluding optional information snippets lets us store the data in a table and more easily apply automatic analysis tools.
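To make the event/action distinction concrete, here is a minimal sketch in plain Python. The event records, field names, and the `events_to_actions` helper are hypothetical and only illustrate the idea; they are not the SoccerAction converter or any provider's real format.

```python
# Toy provider events. Field names and values are made up for illustration.
raw_events = [
    {"type": "Pass", "player": "A", "x": 50.0, "y": 30.0, "extra": {"height": "low"}},
    {"type": "Half End", "player": None},
    {"type": "Shot", "player": "B", "x": 95.0, "y": 34.0},
]

# Every action keeps exactly these attributes; optional extras are dropped.
ACTION_COLUMNS = ("type", "player", "x", "y")


def events_to_actions(events):
    """Keep events performed by a player and project them onto a fixed schema."""
    actions = []
    for event in events:
        if event.get("player") is None:
            continue  # e.g. "Half End": nobody performs it, so it is not an action
        actions.append({col: event.get(col) for col in ACTION_COLUMNS})
    return actions


actions = events_to_actions(raw_events)
for action in actions:
    print(action)
```

Because every resulting record has the same keys, the list maps directly onto the rows of a table, which is the property the paragraph above relies on.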

SoccerAction implements two versions of this action-oriented data format: SPADL and Atomic-SPADL. For a detailed description of these formats, [see the documentation](https://socceraction.readthedocs.io/en/latest/documentation/spadl/index.html).

This notebook illustrates how to convert the provider-specific events to the SPADL and Atomic-SPADL representations. It assumes that you've already run the `1-load-statsbomb-data` notebook.

## Setup
We will use our dataset of StatsBomb events for the Big 5 leagues in 2015/16.

In [1]:
from pathlib import Path
from socceraction.data import HDFDataset, PartitionIdentifier

data_dir = Path("../../data")

# create a HDF dataset
dataset = HDFDataset(
    path=(data_dir / "statsbomb-bigfive-1516.h5"), 
    mode="a"
)

In [2]:
dataset.games()

Unnamed: 0,game_id,season_id,competition_id,competition_stage,game_day,game_date,home_team_id,away_team_id,home_score,away_score,venue,referee
0,3754058,27,2,Regular Season,20,2016-01-02 16:00:00,22,28,0,0,King Power Stadium,Andre Marriner
1,3754245,27,2,Regular Season,9,2015-10-17 16:00:00,27,41,1,0,The Hawthorns,Martin Atkinson
2,3754136,27,2,Regular Season,17,2015-12-19 18:30:00,37,59,1,1,St. James' Park,Martin Atkinson
3,3754037,27,2,Regular Season,36,2016-04-30 16:00:00,29,28,2,1,Goodison Park,Neil Swarbrick
4,3754039,27,2,Regular Season,26,2016-02-13 16:00:00,31,23,1,2,Selhurst Park,Robert Madley
...,...,...,...,...,...,...,...,...,...,...,...,...
375,3878544,27,12,Regular Season,1,2015-08-23 20:45:00,2256,233,1,0,Stadio Renzo Barbera,Massimiliano Irrati
376,3878543,27,12,Regular Season,1,2015-08-23 20:45:00,238,228,1,0,Stadio Giuseppe Meazza,Gianpaolo Calvarese\t
377,3878542,27,12,Regular Season,1,2015-08-23 20:45:00,239,243,2,0,Stadio Artemio Franchi \t,Paolo Valeri
378,3878541,27,12,Regular Season,1,2015-08-22 18:00:00,226,229,1,1,Stadio Marc'Antonio Bentegodi,Marco Guida


In [5]:
dataset.events(game_id=375405)



KeyError: 'No data found'

In [4]:
"games/test_15".split("/", 1)

['games', 'test_15']

In [5]:
dataset.read_table("events", PartitionIdentifier(game_id=123))



KeyError: 'No data found'

## SPADL representation of event data

In [2]:
from socceraction.data.transforms import StatsBombEventsToActions

In [3]:
game = dataset.games().loc[3754058]
events = dataset.events(3754058).reset_index()

In [4]:
# suppress warnings regarding data version
import warnings
warnings.filterwarnings("ignore", message=".*fidelity.*")

to_actions = StatsBombEventsToActions()
to_actions(game, events)

Unnamed: 0,game_id,original_event_id,period_id,time_seconds,team_id,player_id,start_x,start_y,end_x,end_y,type_id,result_id,bodypart_id,action_id
0,3754058,2ca23eea-a984-47e4-8243-8f00880ad1c9,1,1.753,28,3343.0,51.66875,34.0425,52.19375,37.0175,0,1,5,0
1,3754058,a936e18c-3979-4576-8cc0-94114f1599db,1,2.061,28,3346.0,52.19375,37.0175,52.19375,37.0175,21,1,0,1
2,3754058,0fee7719-7e69-49c5-be81-3f2b77da604e,1,2.077,28,3346.0,52.19375,37.0175,63.04375,35.4025,0,1,5,2
3,3754058,6362aa69-892f-4d11-8644-21a680ea7c66,1,3.010,28,3344.0,63.04375,35.4025,72.23125,64.6425,0,1,4,3
4,3754058,d031d1d6-600c-4234-ac87-8e9eb9efdeee,1,5.465,28,6409.0,72.23125,64.6425,72.23125,63.3675,21,1,0,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2059,3754058,6a43c3ea-81d4-4450-8228-1e845730ce72,2,2965.427,22,3815.0,18.94375,41.3525,29.70625,51.3825,0,1,5,2059
2060,3754058,984207a2-bbc4-4f30-b20f-45644afeaf38,2,2966.815,22,3812.0,29.70625,51.3825,29.70625,50.3625,21,1,0,2060
2061,3754058,df1b681b-a2fb-4216-a9e2-efe528375436,2,2966.842,22,3812.0,29.70625,50.3625,26.99375,33.1075,0,1,4,2061
2062,3754058,20baabc3-44a5-4792-8336-60dc513cc0bd,2,2969.281,22,3813.0,26.99375,33.1075,43.70625,15.2575,21,1,0,2062
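The `type_id`, `result_id`, and `bodypart_id` columns in the output above are integer codes. The sketch below decodes the first rows by hand. The lookup tables are an assumed excerpt of the SPADL vocabulary as listed in the SPADL documentation and cover only the ids that appear in this output; check them against your installed version before relying on them.

```python
# Assumed excerpt of the SPADL vocabulary (id -> name). Only the ids that
# appear in the output above are listed; verify against the SPADL docs.
TYPE_NAMES = {0: "pass", 21: "dribble"}
RESULT_NAMES = {0: "fail", 1: "success"}
BODYPART_NAMES = {0: "foot", 4: "foot_left", 5: "foot_right"}

# (type_id, result_id, bodypart_id) for rows 0 to 3 of the output above.
rows = [(0, 1, 5), (21, 1, 0), (0, 1, 5), (0, 1, 4)]


def decode(type_id, result_id, bodypart_id):
    """Translate the three integer codes of one action into readable names."""
    return (
        TYPE_NAMES[type_id],
        RESULT_NAMES[result_id],
        BODYPART_NAMES[bodypart_id],
    )


decoded = [decode(*row) for row in rows]
for names in decoded:
    print(names)
```

Read this way, the opening sequence is a successful right-footed pass, a dribble, another right-footed pass, and a left-footed pass.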


In [5]:
dataset.transform(to_actions)

UnboundLocalError: cannot access local variable 'game_ids' where it is not associated with a value

In [None]:
for (path, groups, leaves) in dataset.walk():
    print(groups, leaves)

In [None]:
import matplotsoccer

# Select the 5 actions preceding the 2-0
shot = 2201
a = actions[shot-4:shot+1].copy()

# Print the game date and timestamp of the goal
g = game.iloc[0]
minute = int((a.period_id.values[0]-1) * 45 + a.time_seconds.values[0] // 60)
game_info = f"{g.game_date} {g.home_team_name} {g.home_score}-{g.away_score} {g.away_team_name} {minute + 1}'"
print(game_info)

# Plot the actions
def nice_time(row):
    minute = int((row.period_id - 1) * 45 + row.time_seconds // 60)
    second = int(row.time_seconds % 60)
    return f"{minute}m{second}s"

a["nice_time"] = a.apply(nice_time, axis=1)
labels = a[["nice_time", "type_name", "player_name", "team_name"]]

ax = matplotsoccer.actions(
    location=a[["start_x", "start_y", "end_x", "end_y"]],
    action_type=a.type_name,
    team=a.team_name,
    result=a.result_name == "success",
    label=labels,
    labeltitle=["time", "actiontype", "player", "team"],
    zoom=False,
    figsize=6
)
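As a worked example of the timestamp arithmetic in `nice_time`, the sketch below applies the same formula to two rows of the action table shown earlier. It assumes, as the formula does, that `time_seconds` restarts at zero in each period and that each preceding period contributes exactly 45 minutes. The standalone `format_clock` function is a hypothetical stand-in that takes plain numbers instead of a DataFrame row.

```python
def format_clock(period_id, time_seconds):
    """Same arithmetic as nice_time, but on plain numbers instead of a row."""
    minute = int((period_id - 1) * 45 + time_seconds // 60)
    second = int(time_seconds % 60)
    return f"{minute}m{second}s"


# First and last rows of the action table: (period_id, time_seconds).
first = format_clock(1, 1.753)
last = format_clock(2, 2969.281)
print(first, last)
```

The last action falls 49 minutes into the second half, so it is labelled as minute 94. Under this formula, first-half stoppage time overlaps with the start of the second half on the match clock.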

## Atomic-SPADL representation of event data

In [6]:
dataset.close()