# Pulling data from the Opta Vision Remote (MA35) Feed

This notebook outlines a basic framework to pull event-by-event data enriched with Opta Vision from the MA35 SDAPI feed.

In particular, the feed contains pass information produced by our pass models xPass, xReceiver, and xThreat:
- **xPass** is the likelihood that a pass will be completed to a player
- **xReceiver** is the likelihood that the ball carrier is passing to a player
- **xThreat** is the likelihood that a pass to a player will be followed by a shot within the next 10 seconds
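
To make the three definitions concrete, the sketch below builds a small table of pass options for one hypothetical pass and flags the options that clear an xReceiver threshold. The column names (`xR`, `xP`, `xT`), the player ids, and all values are invented for illustration and are not taken from the feed; the 0.7 threshold is the one used later in this notebook.

```python
import pandas as pd

# Hypothetical options for a single pass: one row per teammate (values are invented)
options = pd.DataFrame({
    "playerId": ["a", "b", "c"],
    "xR": [0.75, 0.20, 0.05],   # xReceiver: how likely this player is the intended target
    "xP": [0.90, 0.60, 0.30],   # xPass: how likely a pass to this player is completed
    "xT": [0.01, 0.04, 0.10],   # xThreat: how likely a shot follows within 10 seconds
})

# Same threshold as xReceiver_limit further down in this notebook
xreceiver_limit = 0.7
options["is_realistic"] = options["xR"] >= xreceiver_limit

# The most likely receiver and the most threatening option need not be the same player
most_likely_receiver = options.loc[options["xR"].idxmax(), "playerId"]
most_threatening = options.loc[options["xT"].idxmax(), "playerId"]
```

In this toy case the most likely receiver is the safe option, while the most threatening option is the least likely to be chosen, which is the kind of trade-off the later analysis sections try to quantify.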

To run this notebook, make sure the file 'qualifier_names.csv' is located in the same folder as the notebook.
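
A quick pre-flight check can catch a missing file before any data is loaded. The helper below is a minimal sketch; `missing_files` is a hypothetical name, and it assumes the notebook's working directory is the folder the notebook lives in, which is the usual Jupyter default but not guaranteed.

```python
from pathlib import Path

def missing_files(folder, required=("qualifier_names.csv",)):
    """Return the names in `required` that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in required if not (folder / name).is_file()]

# Check the current working directory (assumed to be the notebook's folder)
missing = missing_files(Path.cwd())
if missing:
    print(f"Missing required file(s): {missing}")
```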

In [None]:
import pandas as pd
import json
import os
import numpy as np

pd.options.display.max_columns = 999

## Load Opta Vision data

In [None]:
base_path = '{path to pass prediction files}'
game_id = 1
file_name = f'{game_id}.json'

# load data from file (the with block closes the file automatically)
with open(f'{base_path}/{file_name}') as f:
    data = json.load(f)
 
# transform data into pandas dataframe
events = pd.json_normalize(data['liveData']['event'])

# make a copy of it for later usage
events_all = events.copy()

## Get Player Info

In [None]:
# merge on the player names from the Opta Vision data
# player_info = events_all[['playerId', 'playerName']].drop_duplicates()

## Get qualifier info

To do deeper analysis we're going to need qualifier info on each event.

Qualifier info is stored in the feed as a nested structure: each event carries a list of qualifier dictionaries. We will merge on the qualifier names from a local dataset for ease of use here.

The method below shows how to unpack this qualifier info. One drawback remains:
- It is an expensive operation. For a full PL season I expect ~4 million qualifiers, which could be rather slow to handle.
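
The unpacking pattern can be tried on a toy frame before running it on a full match. The sketch below assumes the same nested shape as described above (a list of qualifier dictionaries per event); the event ids, values, and qualifier id 999 are invented. It also filters to pass events (`typeId == 1`, as used later in this notebook) before exploding, which is one possible way to reduce the number of rows that need flattening.

```python
import pandas as pd

# Toy events in the assumed nested shape (ids and values are invented)
events = pd.DataFrame({
    "id": [101, 102, 103],
    "typeId": [1, 1, 5],
    "qualifier": [
        [{"qualifierId": 140, "value": "55.2"}, {"qualifierId": 141, "value": "30.1"}],
        [{"qualifierId": 140, "value": "70.0"}, {"qualifierId": 999, "value": "x"}],
        None,  # an event with no qualifiers
    ],
})

# Restrict to the events of interest *before* exploding to keep the row count down
passes_only = events.loc[events["typeId"] == 1, ["id", "qualifier"]]

# One row per (event, qualifier), dropping events without qualifiers
long = passes_only.explode("qualifier").dropna(subset=["qualifier"])

# Flatten each qualifier dict into columns and re-attach the event id
flat = pd.json_normalize(long["qualifier"].tolist())
flat["event_id"] = long["id"].to_numpy()
```

Whether filtering first is worthwhile depends on how many non-pass qualifiers are actually needed downstream; if all qualifiers are required, the full explode is unavoidable.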

In [None]:
# read in qualifier list
qualifier_names = pd.read_csv("qualifier_names.csv")

In [None]:
# explode converts each element in each list to a separate row
cols = ['id', 'qualifier']
qualifiers = events_all[cols].explode('qualifier')

In [None]:
# remove NA
qualifiers = qualifiers[qualifiers.qualifier.notna()].reset_index(drop=True)

In [None]:
# save corresponding event ids for each qualifier
event_ids = qualifiers.id.tolist()

In [None]:
qualifiers = pd.json_normalize(qualifiers[qualifiers.qualifier.notna()]['qualifier'])

In [None]:
qualifiers['event_id'] = event_ids

In [None]:
qualifiers = qualifiers.merge(qualifier_names, how='left', on='qualifierId')

In [None]:
qualifiers.head()

### Keep only Pass End qualifiers (for now)

To expand this, add the qualifier ids we need.
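
As a sketch of what expanding the list could look like, the example below pivots three qualifier ids instead of two on toy data. Ids 140 and 141 are the Pass End ids used in this notebook; id 999 and the name "Example Qualifier" are invented placeholders, and `wanted_ids` is a hypothetical variable name.

```python
import pandas as pd

# Toy long-format qualifiers (999 / "Example Qualifier" are invented placeholders)
qualifiers = pd.DataFrame({
    "event_id": [101, 101, 101, 102, 102],
    "qualifierId": [140, 141, 999, 140, 141],
    "qualifier": ["Pass End X", "Pass End Y", "Example Qualifier",
                  "Pass End X", "Pass End Y"],
    "value": ["55.2", "30.1", "12", "70.0", "44.4"],
})

wanted_ids = [140, 141, 999]  # extend this list as needed

# Wide format: one row per event, one column per qualifier name
wide = (qualifiers[qualifiers["qualifierId"].isin(wanted_ids)]
        .pivot(index="event_id", columns="qualifier", values="value")
        .reset_index())
wide.columns.name = None

# Values arrive as strings; convert the numeric ones explicitly
numeric = ["Pass End X", "Pass End Y"]
wide[numeric] = wide[numeric].astype(float)
```

Events that lack one of the requested qualifiers get a missing value in that column. Note that `pivot` raises an error if the same qualifier appears twice on one event, so duplicates would need to be dropped or aggregated first.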

In [None]:
pass_end_qualifiers = qualifiers[qualifiers.qualifierId.isin([140,141])].copy()

In [None]:
# convert to wide format with separate column for each qualifier
pass_end_qualifiers = pass_end_qualifiers.pivot(index='event_id', columns='qualifier', values='value').reset_index()

In [None]:
pass_end_qualifiers['Pass End X'] = pass_end_qualifiers['Pass End X'].astype(float)
pass_end_qualifiers['Pass End Y'] = pass_end_qualifiers['Pass End Y'].astype(float)

In [None]:
pass_end_qualifiers.head()

In [None]:
# Add the qualifiers to the events
events = events_all.merge(pass_end_qualifiers,
                                     how='left',
                                     left_on='id',
                                     right_on='event_id')

## Pass Data

In [None]:
# filter to only pass events
is_pass_event = events.typeId == 1
is_without_predictions = events['passOption.player'].isna()
remove_passes_without_predictions = events['passOption.player'].notna()
remove_passes_without_target_predictions = events['passTarget.player'].notna()

In [None]:
pass_events = events.loc[is_pass_event & remove_passes_without_predictions & remove_passes_without_target_predictions]
pass_events_without_predictions = events.loc[is_pass_event & is_without_predictions]

In [None]:
print(f"There are {len(pass_events)} passes with predictions")
print(f"There are {len(pass_events_without_predictions)} set piece passes without predictions")

In [None]:
# flatten json for pass options
pass_options = pass_events[['id', 'passOption.player']].explode('passOption.player')
pass_options_flat = pd.json_normalize(pass_options['passOption.player'])

# save corresponding event ids for each pass option
event_ids = pass_options[pass_options['passOption.player'].notna()].id.tolist()
pass_options_flat['event_id'] = event_ids

In [None]:
# merge on the player names
# player_cols = ['playerId']
# pass_options_flat = pass_options_flat.merge(player_info[player_cols].drop_duplicates(),
#                                             how='left',
#                                             on=['playerId'])

In [None]:
# flatten json for pass target
pass_targets = pass_events[['id', 'passTarget.player']].explode('passTarget.player')
pass_targets_flat = pd.json_normalize(pass_targets['passTarget.player'])

# save corresponding event ids for each pass target
target_event_ids = pass_targets[pass_targets['passTarget.player'].notna()].id.tolist()
pass_targets_flat['event_id'] = target_event_ids

In [None]:
# merge on the player names
# pass_targets_flat = pass_targets_flat.merge(player_info[player_cols].drop_duplicates(),
#                                             how='left',
#                                             on=['playerId'])

In [None]:
# join on the options and targets to the original passes
pass_events_with_info = (pass_events
                         .merge(pass_options_flat,
                                how='left',
                                on='event_id',
                                suffixes=("", "_option"))
                         .merge(pass_targets_flat,
                                how='left',
                                on='event_id',
                                suffixes=("", "_target"))
                        )

In [None]:
pass_events_with_info.head()

In [None]:
useful_pass_cols = ['id', 'typeId',
                    'periodId', 'timeMin', 'timeSec', 'playerId',
                    'outcome', 'x', 'y', 'Pass End X', 'Pass End Y', 'timeStamp', 'passOption.player',
                    'xThreat.applied', 'keyPass',
                    'assist', 'playerId_option', 'shirtNumber',
                    'predictions.expectedPassReceiver.value', 'predictions.expectedPassCompletion.value',
                    'predictions.expectedThreat.value', 'predictions.passOptionQuality.value', 'passTarget.player',
                    'playerId_target', 'shirtNumber_target', 'predictions.expectedPassReceiver.value_target',
                    'predictions.expectedPassCompletion.value_target', 'predictions.expectedThreat.value_target',
                    'predictions.passOptionQuality.value_target']

In [None]:
# tidy column names
passes = pass_events_with_info[useful_pass_cols].rename(columns={"typeId":"event_type_id",
                                                                 "id":"event_id",
                                                                 "periodId": "period",
                                                                 "timeMin": "minute",
                                                                 "timeSec": "second",
                                                                 "Pass End X": "endx",
                                                                 "Pass End Y": "endy",
                                                                 "xThreat.applied": "xThreat",
                                                                 "keyPass": "chance_created",
                                                                 "positionX": "x_option",
                                                                 "positionY": "y_option",
                                                                 "predictions.expectedPassReceiver.value": "xR_option",
                                                                 "predictions.expectedPassCompletion.value": "xP_option",
                                                                 "predictions.expectedThreat.value": "xT_option",
                                                                 "predictions.passOptionQuality.value": "pass_option_quality",
                                                                 "positionX_target": "x_target",
                                                                 "positionY_target": "y_target",
                                                                 "predictions.expectedPassReceiver.value_target": "xR_target",
                                                                 "predictions.expectedPassCompletion.value_target": "xP_target",
                                                                 "predictions.expectedThreat.value_target": "xT_target",
                                                                 "predictions.passOptionQuality.value_target": "pass_option_quality_target"
                                                                })

In [None]:
# assign columns as numeric
numeric_cols = ['xR_option', 'xP_option', 'xT_option', 'xR_target', 'xP_target', 'xT_target', 'x', 'y']
passes[numeric_cols] = passes[numeric_cols].apply(pd.to_numeric)

In [None]:
# assign pass columns
xReceiver_limit = 0.7

passes.loc[:, "is_realistic_pass"] = np.where(passes.xR_option >= xReceiver_limit,
                                              1, 0)

passes.loc[:, "is_target_pass"] = np.where(passes.playerId_target == passes.playerId_option,
                                           1, 0)

passes.loc[:, "is_realistic_and_target_pass"] = np.where((passes.playerId_target == passes.playerId_option) &
                                                         (passes.xR_option >= xReceiver_limit),
                                                         1, 0)

passes.loc[:, "starts_in_own_third"] = np.where(passes.x <= 33.33, 1, 0)

In [None]:
passes.head()

## Availability Analysis

In [None]:
availability_player_cols = ['playerId_option']

In [None]:
availability_analysis = (passes
                         .groupby(availability_player_cols)
                         .agg(total_passes_while_on_pitch = ('event_id', 'count'),
                              was_targeted = ('is_target_pass', 'sum'),
                              was_realistic_option = ('is_realistic_pass', 'sum')
                             )
                         .reset_index()
                        )

In [None]:
availability_analysis.loc[:, "targeted_perc"] = round(100*availability_analysis.was_targeted/
                                                      availability_analysis.total_passes_while_on_pitch,1)

availability_analysis.loc[:, "used_when_available_perc"] = round(100*availability_analysis.was_targeted/
                                                      availability_analysis.was_realistic_option,1)

availability_analysis.loc[:, "availability_perc"] = round(100*availability_analysis.was_realistic_option/
                                                          availability_analysis.total_passes_while_on_pitch,1)

In [None]:
availability_analysis.head()

In [None]:
# set filters for availability
# It would be better to use a minutes filter here
# 1364 is the average Ligue 1 passes per team per game for three games
is_mininum_passes_while_on_pitch = availability_analysis.total_passes_while_on_pitch >= 10

In [None]:
# let's look at the players most often used when they were a realistic option
(availability_analysis
 .loc[is_mininum_passes_while_on_pitch]
 .sort_values(by='used_when_available_perc', ascending=False)
 .head(10)
)

## Pass Predictions Analysis

In [None]:
# each pass has the options of passes to all other teammates. Filter to only 'realistic' passes.
is_realistic_pass = passes.is_realistic_pass == 1

In [None]:
# take a copy so new columns can be added without a SettingWithCopyWarning
filtered_passes = passes.loc[is_realistic_pass].copy()

In [None]:
xP_average = filtered_passes.xP_target.mean()
xP_upper_quartile = filtered_passes.xP_target.quantile(0.75)
xT_average = filtered_passes.xT_target.mean()
xT_upper_quartile = filtered_passes.xT_target.quantile(0.75)
xT_upper_2nd_decile = filtered_passes.xT_target.quantile(0.8)
xT_upper_decile = filtered_passes.xT_target.quantile(0.9)

In [None]:
print("xP upper quartile = " + str(round(xP_upper_quartile,4)) +
      " & xT upper quartile = " + str(round(xT_upper_quartile,4)))

In [None]:
# set new columns
filtered_passes.loc[:, "is_good_opportunity"] = np.where((filtered_passes.xT_option >= xT_upper_decile) &
                                                         (filtered_passes.xP_option >= 0.8) &
                                                         (filtered_passes.xR_option >= xReceiver_limit), 1, 0)

filtered_passes.loc[:, "good_opportunity_taken"] = np.where((filtered_passes.is_target_pass == 1) &
                                                            (filtered_passes.is_good_opportunity == 1), 1, 0)

In [None]:
filtered_passes.head()

In [None]:
group_pass_cols = ['event_id', 'playerId', 'outcome', 'period', 'minute',
                   'second', 'x', 'y', 'endx', 'endy', 'playerId_target',
                   'xR_target', 'xP_target', 'xT_target']

pass_analysis = (filtered_passes
                 .groupby(group_pass_cols)
                 .agg(options = ('playerId_option', 'count'),
                      good_passing_opportunites = ('is_good_opportunity', 'sum'),
                      good_passing_opportunites_taken = ('good_opportunity_taken', 'sum'),
                      xR_max = ('xR_option', 'max'),
                      xR_min = ('xR_option', 'min'),
                      xP_max = ('xP_option', 'max'),
                      xP_min = ('xP_option', 'min'),
                      xT_max = ('xT_option', 'max'),
                      xT_min = ('xT_option', 'min')
                     )
                 .reset_index()
                )

In [None]:
pass_analysis.loc[:, "is_safest_pass"] = np.where(pass_analysis.xP_target >= pass_analysis.xP_max, 1, 0)
pass_analysis.loc[:, "is_most_dangerous_pass"] = np.where(pass_analysis.xT_target >= pass_analysis.xT_max, 1, 0)
pass_analysis.loc[:, "is_safe_pass"] = np.where(pass_analysis.xP_target >= xP_upper_quartile, 1, 0)
pass_analysis.loc[:, "is_dangerous_pass"] = np.where(pass_analysis.xT_target >= xT_upper_quartile, 1, 0)
pass_analysis.loc[:, "is_unpredictable_pass"] = np.where(pass_analysis.xR_target < xReceiver_limit, 1, 0)
pass_analysis.loc[:, "good_passing_opportunites_missed"] = np.where((pass_analysis.good_passing_opportunites > 0) &
                                                                (pass_analysis.good_passing_opportunites_taken == 0) &
                                                                    (pass_analysis.is_most_dangerous_pass == 0) &
                                                                    #(pass_analysis.is_dangerous_pass == 0)
                                                                    (pass_analysis.xT_target >= xT_upper_decile)
                                                                    ,
                                                                    1, 0)

## Overall Player Pass Analysis

Overall summaries of the passes played by each player, using all passes rather than only the realistic ones.

In [None]:
passes.loc[:, "is_good_opportunity"] = np.where((passes.xT_option >= xT_upper_decile) &
                                                (passes.xP_option >= 0.8) &
                                                (passes.xR_option >= xReceiver_limit), 1, 0)

passes.loc[:, "good_opportunity_taken"] = np.where((passes.is_target_pass == 1) &
                                                   (passes.is_good_opportunity == 1), 1, 0)

all_pass_analysis = (passes
                     .groupby(group_pass_cols)
                     .agg(
                          options = ('playerId_option', 'count'),
                          good_passing_opportunites = ('is_good_opportunity', 'sum'),
                          good_passing_opportunites_taken = ('good_opportunity_taken', 'sum'),
                          xR_max = ('xR_option', 'max'),
                          xR_min = ('xR_option', 'min'),
                          xP_max = ('xP_option', 'max'),
                          xP_min = ('xP_option', 'min'),
                          xT_max = ('xT_option', 'max'),
                          xT_min = ('xT_option', 'min')
                     )
                 .reset_index()
                )

In [None]:
all_pass_analysis.loc[:, "is_safest_pass"] = np.where(all_pass_analysis.xP_target >= all_pass_analysis.xP_max, 1, 0)
all_pass_analysis.loc[:, "is_most_dangerous_pass"] = np.where(all_pass_analysis.xT_target >= all_pass_analysis.xT_max, 1, 0)
all_pass_analysis.loc[:, "is_safe_pass"] = np.where(all_pass_analysis.xP_target >= xP_upper_quartile, 1, 0)
all_pass_analysis.loc[:, "is_dangerous_pass"] = np.where(all_pass_analysis.xT_target >= xT_upper_quartile, 1, 0)
all_pass_analysis.loc[:, "is_unpredictable_pass"] = np.where(all_pass_analysis.xR_target < xReceiver_limit, 1, 0)
all_pass_analysis.loc[:, "good_passing_opportunites_missed"] = np.where((all_pass_analysis.good_passing_opportunites > 0) &
                                                                (all_pass_analysis.good_passing_opportunites_taken == 0) &
                                                                    (all_pass_analysis.is_most_dangerous_pass == 0) &
                                                                    #(all_pass_analysis.is_dangerous_pass == 0),
                                                                    (all_pass_analysis.xT_target >= xT_upper_decile),
                                                                    1, 0)

In [None]:
player_cols = ['playerId']

In [None]:
player_analysis_all_passes = (all_pass_analysis
                              .groupby(player_cols)
                              .agg(total_passes = ('event_id', 'count'),
                                  )
                              .reset_index()
                             )

## Typical Player Pass Analysis

In [None]:
player_cols = ['playerId']

In [None]:
player_analysis = (pass_analysis
                   .groupby(player_cols)
                   .agg(total_passes = ('event_id', 'count'),
                        safe_passes_taken = ('is_safe_pass', 'sum'),
                        dangerous_option_taken = ('is_dangerous_pass', 'sum'),
                        unpredictable_option_taken = ('is_unpredictable_pass', 'sum'),
                        safest_pass_taken = ('is_safest_pass', 'sum'),
                        most_dangerous_option_taken = ('is_most_dangerous_pass', 'sum'),
                        good_passing_opportunites_missed = ('good_passing_opportunites_missed', 'sum'),
                        good_passing_opportunites_taken = ('good_passing_opportunites_taken', 'sum'),
                        average_options = ('options', 'mean'),
                        average_xR = ('xR_target', 'mean'),
                        average_xP = ('xP_target', 'mean'),
                        average_xT = ('xT_target', 'mean')
                       )
                   .reset_index()
                  )

In [None]:
player_analysis.head()

In [None]:
player_analysis.loc[:, "safest_pass_perc"] = round(100*player_analysis.safest_pass_taken/
                                                   player_analysis.total_passes,1)

player_analysis.loc[:, "most_dangerous_option_perc"] = round(100*player_analysis.most_dangerous_option_taken/
                                                             player_analysis.total_passes,1)

player_analysis.loc[:, "unpredictable_option_perc"] = round(100*player_analysis.unpredictable_option_taken/
                                                             player_analysis.total_passes,1)

player_analysis.loc[:, "safe_pass_perc"] = round(100*player_analysis.safe_passes_taken/
                                                 player_analysis.total_passes,1)

player_analysis.loc[:, "dangerous_option_perc"] = round(100*player_analysis.dangerous_option_taken/
                                                        player_analysis.total_passes,1)

player_analysis.loc[:,
                    "good_passing_opportunites_missed_perc"] = round(100*
                                                                     player_analysis.good_passing_opportunites_missed/
                                                                     (player_analysis.good_passing_opportunites_missed +
                                                                      player_analysis.good_passing_opportunites_taken),1)

## Summary Stats

In [None]:
def opta_vision_summaries(data):
    
    # create summaries of the given dataset
    total_passes = data.event_id.nunique()
    safest_passes = data.is_safest_pass.sum()
    most_dangerous_passes = data.is_most_dangerous_pass.sum()
    safe_passes = data.is_safe_pass.sum()
    dangerous_passes = data.is_dangerous_pass.sum()
    unpredictable_passes = data.is_unpredictable_pass.sum()
    average_options = data.options.mean().round(1)
    safest_pass_perc = round(100*safest_passes/total_passes,1)
    most_dangerous_pass_perc = round(100*most_dangerous_passes/total_passes,1)
    safe_pass_perc = round(100*safe_passes/total_passes,1)
    dangerous_pass_perc = round(100*dangerous_passes/total_passes,1)
    unpredictable_pass_perc = round(100*unpredictable_passes/total_passes,1)
    missed_opportunites = data.good_passing_opportunites_missed.sum()
    good_opportunites = (data.good_passing_opportunites_taken.sum()+data.good_passing_opportunites_missed.sum())
    missed_opp_perc = round(100*missed_opportunites/good_opportunites,1)
    

    print(f"There are {total_passes} total passes")
    print(f"On average, a player has {average_options} options when they make a pass")
    print(f"A player takes the safest option {safest_pass_perc}% of the time")
    print(f"A player takes the most dangerous option {most_dangerous_pass_perc}% of the time")
    print(f"A player takes a safe option {safe_pass_perc}% of the time")
    print(f"A player takes a dangerous option {dangerous_pass_perc}% of the time")
    print(f"A player takes an unlikely option {unpredictable_pass_perc}% of the time")
    print(f"Good opportunities are missed {missed_opp_perc}% of the time")

In [None]:
print("ALL PASSES:")
opta_vision_summaries(all_pass_analysis)

In [None]:
print("REALISTIC PASSES:")
opta_vision_summaries(pass_analysis)

## Final Player Outputs

In [None]:
player_id_list = events_all.playerId.dropna().unique().tolist()[:3]

In [None]:
is_chosen_players = player_analysis.playerId.isin(player_id_list)
is_merged_players = player_analysis_all_passes.playerId.isin(player_id_list)
is_availability_players = availability_analysis.playerId_option.isin(player_id_list)

In [None]:
merged_cols = ['playerId']

In [None]:
pressure_cols = ['playerId', 'playerName',
                 'pressures', 'high_pressures', 'high_pressure_perc'
                ]

In [None]:
option_cols = ['playerId', 'total_passes',
               'average_options',
               'safest_pass_perc', 'most_dangerous_option_perc', 'unpredictable_option_perc', 'safe_pass_perc',
               'dangerous_option_perc', 'good_passing_opportunites_missed_perc']

In [None]:
availability_cols = ['playerId_option',
                     'total_passes_while_on_pitch', 
                     'targeted_perc', 
                     'used_when_available_perc', 
                     'availability_perc']

In [None]:
player_analysis.loc[is_chosen_players, option_cols]

In [None]:
availability_analysis.loc[is_availability_players, availability_cols]

## Insights

In [None]:
is_mininum_passes = player_analysis.total_passes >= 10

In [None]:
(player_analysis
 .loc[is_mininum_passes, option_cols]
 .sort_values(by='safest_pass_perc',
              ascending=True)
 .head(10)
)

In [None]:
(player_analysis
 .loc[is_mininum_passes, option_cols]
 .sort_values(by='most_dangerous_option_perc',
              ascending=False)
 .head(10)
)