# FEATURE EXTRACTION - POSSESSION<a class="anchor" id="up"></a>

Each function returns a DataFrame with `teamId` and one feature column; once merged into `feats`, the table has the form


| teamId | teamName | feature |
| --- | --- | --- |
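
The merge pattern used throughout the notebook can be sketched on toy data as follows. The frames `toy_feats` and `toy_feature` and the value `4.13` are hypothetical; `validate='one_to_one'` is an optional safeguard that the notebook itself does not use, added here to make pandas raise if a feature table ever contains a duplicated `teamId`.

```python
import pandas as pd

# Hypothetical stand-ins for `feats` and for the output of one feature function
toy_feats = pd.DataFrame({'teamId': [1609, 1631],
                          'teamName': ['Arsenal', 'Leicester City']})
toy_feature = pd.DataFrame({'teamId': [1609], 'feature': [4.13]})

# Left merge keeps every team in `toy_feats`; teams without a value get NaN.
# validate='one_to_one' raises MergeError if teamId is duplicated on either side.
merged = pd.merge(toy_feats, toy_feature, on='teamId', how='left',
                  validate='one_to_one')
print(merged)
```

With a left merge, a team that is missing from a feature table stays in the result with a missing value rather than being dropped.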


The features are

* [Chain length](#chain_length)
* [Players per chain](#chain_players)
* [Weighted ball possession](#weighted_ball)
* [Pass distance](#avg_distance)
* [Defenders in attack](#attacking_defenders)

    


In [1]:
import numpy as np
import pandas as pd
from scipy.spatial import distance
import pickle
import warnings
warnings.filterwarnings('ignore')

In [2]:
df = pd.read_csv('clean/events_no_champions.csv')
if 'Unnamed: 0' in df.columns:
    del df['Unnamed: 0']
    
feats = pd.read_csv('clean/feats.csv')
    
display(df.head(2))
display(feats.head(2))

Unnamed: 0,eventId,subEventName,tags,playerId,matchId,eventName,teamId,matchPeriod,eventSec,subEventId,id,League,x0,y0,x1,y1,teamName,playerName,playerRole
0,8,Simple pass,1801,25413,2499719,Pass,1609,1H,2.758649,85.0,177959171,England,49,49,31.0,78.0,Arsenal,A. Lacazette,Forward
1,8,High pass,1801,370224,2499719,Pass,1609,1H,4.94685,83.0,177959172,England,31,78,51.0,75.0,Arsenal,R. Holding,Defender


Unnamed: 0,teamId,teamName
0,1609,Arsenal
1,1631,Leicester City


In [3]:
with open('clean/passes.pickle', 'rb') as fr:
    passes = pickle.load(fr)

***
***

### Average length of pass chains <a class="anchor" id="chain_length"></a>[up](#up)

Average length of the chains of consecutive passes. A chain consists of consecutive events named `Pass` by the same team, not interrupted by an opponent's intervention or by events of any other type.

In [4]:
def avg_chain_length(feats):
    """Mean number of passes per chain for each team, pooled over all of its matches."""
    chain_len = []

    for team in feats['teamId']:
        # Length of every pass chain the team produced, across all matches
        pass_len = []
        for match in passes[team]:
            pass_len += [len(chain) for chain in passes[team][match]]

        chain_len.append(np.mean(pass_len))

    return pd.DataFrame({'teamId': feats['teamId'], 'avg_chain_length': chain_len})

a = avg_chain_length(feats)
feats = pd.merge(feats, a, on = 'teamId', how = 'left')
feats.head()

Unnamed: 0,teamId,teamName,avg_chain_length
0,1609,Arsenal,4.133464
1,1631,Leicester City,2.756963
2,1625,Manchester City,5.370112
3,1651,Brighton & Hove Albion,2.891741
4,1646,Burnley,2.534475
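
As a minimal sketch of the computation above, assume `passes` is a nested dictionary `{teamId: {matchId: [chain, ...]}}` in which each chain is a list of pass records. That layout is inferred from how the notebook indexes `passes`. The toy data, the second match id and the helper `toy_avg_chain_length` are hypothetical.

```python
import numpy as np

# Hypothetical data mirroring the assumed layout of `passes`
toy_passes = {
    1609: {
        2499719: [
            [{'player1': 1, 'player2': 2}, {'player1': 2, 'player2': 3}],  # 2 passes
            [{'player1': 3, 'player2': 1}],                                # 1 pass
        ],
        2499720: [
            [{'player1': 1, 'player2': 2}] * 3,                            # 3 passes
        ],
    },
}

def toy_avg_chain_length(passes_by_team, team_id):
    """Mean number of passes per chain, pooled over all matches of one team."""
    lengths = [len(chain)
               for chains in passes_by_team[team_id].values()
               for chain in chains]
    return float(np.mean(lengths))

toy_mean_length = toy_avg_chain_length(toy_passes, 1609)
print(toy_mean_length)  # 2.0, i.e. (2 + 1 + 3) / 3
```

Chains are pooled across matches before averaging, so a match with many chains weighs more than a match with few.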


### Average number of players per pass chain <a class="anchor" id="chain_players"></a>[up](#up)

Average number of distinct players involved in each of a team's pass chains. A chain consists of consecutive events named `Pass` by the same team, not interrupted by an opponent's intervention or by events of any other type.

In [5]:
def avg_chain_players(feats):
    """Mean number of distinct players involved per pass chain, for each team."""
    avg_players = []

    for team in feats['teamId']:
        # Number of distinct players (passers and receivers) in every chain
        players_per_chain = []
        for match in passes[team]:
            for chain in passes[team][match]:
                players = {p['player1'] for p in chain} | {p['player2'] for p in chain}
                players_per_chain.append(len(players))

        avg_players.append(np.mean(players_per_chain))

    return pd.DataFrame({'teamId': feats['teamId'], 'avg_chain_players': avg_players})

a = avg_chain_players(feats)
feats = pd.merge(feats, a, on = 'teamId', how = 'left')
feats.head()

Unnamed: 0,teamId,teamName,avg_chain_length,avg_chain_players
0,1609,Arsenal,4.133464,3.609002
1,1631,Leicester City,2.756963,2.918625
2,1625,Manchester City,5.370112,3.981445
3,1651,Brighton & Hove Albion,2.891741,3.026132
4,1646,Burnley,2.534475,2.871482
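
The count of distinct players can be illustrated on two hypothetical chains. The `player1` and `player2` keys are taken from the notebook code; the player ids and the helper `players_in_chain` are made up for illustration.

```python
import numpy as np

def players_in_chain(chain):
    """Distinct player ids appearing as passer or receiver in one chain."""
    return {p['player1'] for p in chain} | {p['player2'] for p in chain}

# Hypothetical chains
toy_chains = [
    # Two players exchanging the ball: 2 passes, 2 distinct players
    [{'player1': 10, 'player2': 11}, {'player1': 11, 'player2': 10}],
    # The ball moves through four players: 3 passes, 4 distinct players
    [{'player1': 10, 'player2': 11}, {'player1': 11, 'player2': 12},
     {'player1': 12, 'player2': 13}],
]

players_per_chain = [len(players_in_chain(c)) for c in toy_chains]
toy_mean_players = float(np.mean(players_per_chain))
print(players_per_chain, toy_mean_players)  # [2, 4] 3.0
```

A chain of `n` passes involves at most `n + 1` players, and fewer when the ball returns to someone who has already touched it.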


### Weighted ball possession <a class="anchor" id="weighted_ball"></a>[up](#up)

Average ball possession for each team.

Possession is estimated, match by match, as each team's share of the `Pass` events and won `GroundAttackingDuel` events recorded for the two teams.

Each event is weighted by $$\frac{1}{\text{distance}^2}$$

where `distance` is the distance from the opponent's goal, computed in the code as `101 - x0`.
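
To show how strongly this weighting favours events near the opponent's goal, here is a small sketch on hypothetical events from one match. The team ids, match id and positions are invented. The distance is taken as `101 - x0`, as in the function in the next cell.

```python
import pandas as pd

# Hypothetical events from a single match; x0 is on the 0-100 scale
toy_events = pd.DataFrame({
    'teamId':  [1, 1, 2, 2],
    'matchId': [100, 100, 100, 100],
    'x0':      [91, 51, 51, 51],
})

# distance = 101 - x0, so x0 = 91 gives 1/10**2 and x0 = 51 gives 1/50**2
toy_events['weight'] = 1 / (101 - toy_events['x0']) ** 2

# Sum the weights per team and match, then divide by the match total
team_weight = toy_events.groupby(['matchId', 'teamId'])['weight'].sum()
match_total = team_weight.groupby(level='matchId').transform('sum')
share = 100 * team_weight / match_total
print(share.round(2))
```

Both teams have two events, but team 1 ends up with about 92.86% of the weighted possession because a single event at `x0 = 91` weighs 25 times as much as one at `x0 = 51`.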


In [6]:
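def weighted_ball(df):
    # Passes, plus ground attacking duels (subEventId 11) whose tags contain '703' (won).
    # The tag test is a plain substring match on the raw `tags` string.
    is_pass = df['eventName'] == 'Pass'
    is_won_duel = (df['subEventId'] == 11) & (df['tags'].str.contains('703'))
    x = df.loc[is_pass | is_won_duel].copy()  # copy so the new column is not set on a view

    # Weight = 1 / distance^2 with distance = 101 - x0 (never zero for x0 up to 100)
    x['possesso'] = 1/(101-x['x0'])**2
    x = x[['teamId','matchId','possesso']].groupby(['teamId','matchId']).sum().reset_index()

    # Total weight of both teams in each match
    matches = x[['matchId','possesso']].groupby('matchId').sum().reset_index()
    matches.columns = ['matchId', 'possesso_tot']

    # Team share of the match total, averaged over matches, as a percentage
    x = pd.merge(x, matches, on = 'matchId')
    x['possesso'] /= x['possesso_tot']
    x = x[['teamId','possesso']].groupby(['teamId']).mean().reset_index()
    x['possesso'] = round(100*x['possesso'], 2)
    return x[['teamId', 'possesso']]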

a = weighted_ball(df)
feats = pd.merge(feats, a, on = 'teamId', how = 'left')
feats.head()

Unnamed: 0,teamId,teamName,avg_chain_length,avg_chain_players,possesso
0,1609,Arsenal,4.133464,3.609002,60.97
1,1631,Leicester City,2.756963,2.918625,50.64
2,1625,Manchester City,5.370112,3.981445,74.83
3,1651,Brighton & Hove Albion,2.891741,3.026132,44.67
4,1646,Burnley,2.534475,2.871482,40.35


### Average pass distance <a class="anchor" id="avg_distance"></a>[up](#up)

Average distance of the passes attempted by each team.

In [7]:
def pass_distance(df):
    x = df.loc[df['eventName'] == 'Pass'].copy()  # copy so the new column is not set on a view
    # Euclidean distance between start (x0, y0) and end (x1, y1), in coordinate units
    x['pass_dist'] = np.hypot(x['x1'] - x['x0'], x['y1'] - x['y0'])

    x = x[['teamId','pass_dist']].groupby('teamId').mean().reset_index()
    return x

a = pass_distance(df)
feats = pd.merge(feats, a, on = 'teamId', how = 'left')
feats.head()

Unnamed: 0,teamId,teamName,avg_chain_length,avg_chain_players,possesso,pass_dist
0,1609,Arsenal,4.133464,3.609002,60.97,23.062594
1,1631,Leicester City,2.756963,2.918625,50.64,25.396924
2,1625,Manchester City,5.370112,3.981445,74.83,22.55847
3,1651,Brighton & Hove Albion,2.891741,3.026132,44.67,25.923639
4,1646,Burnley,2.534475,2.871482,40.35,26.795647
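
The values of `pass_dist` are in the units of the event coordinates, not in metres. The sketch below recomputes the distance for the two sample passes shown at the top of the notebook and then rescales it to metres. The rescaling assumes that `x` and `y` are percentages of a 105 m by 68 m pitch; the notebook does not state this, so `PITCH_LENGTH_M` and `PITCH_WIDTH_M` are hypothetical constants.

```python
import numpy as np
import pandas as pd

# Assumption (not stated in the notebook): coordinates are percentages
# of a 105 m x 68 m pitch
PITCH_LENGTH_M = 105.0
PITCH_WIDTH_M = 68.0

# Start and end points of the two sample passes from the events preview
toy_pass_events = pd.DataFrame({
    'x0': [49, 31], 'y0': [49, 78],
    'x1': [31.0, 51.0], 'y1': [78.0, 75.0],
})

dx = toy_pass_events['x1'] - toy_pass_events['x0']
dy = toy_pass_events['y1'] - toy_pass_events['y0']

# Euclidean distance in coordinate units, as in pass_distance
toy_pass_events['pass_dist'] = np.hypot(dx, dy)
# Same distance after rescaling each axis to metres (under the assumption above)
toy_pass_events['pass_dist_m'] = np.hypot(dx * PITCH_LENGTH_M / 100,
                                          dy * PITCH_WIDTH_M / 100)
print(toy_pass_events[['pass_dist', 'pass_dist_m']].round(2))
```

Because the two axes would be scaled differently, the unscaled distance overstates lateral passes relative to vertical ones if the assumption holds. Comparisons between teams are still meaningful as long as every team is measured the same way.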


### Defenders' touches in attack <a class="anchor" id="attacking_defenders"></a>[up](#up)

Average number of touches per match by defenders in the opponent's half, i.e. events with `x0 > 50`.

In [8]:
def attacking_defenders(df):
    # Events by defenders that start in the opponent's half
    x = df.loc[((df['playerRole'] == 'Defender') & (df['x0'] > 50)),]

    # Count events per team and match, then average the per-match counts for each team
    x = x[['teamId','matchId','x0']].groupby(['teamId', 'matchId']).count().reset_index()
    x = x[['teamId','x0']].groupby(['teamId']).mean().reset_index()
    x.columns = ['teamId', 'attacking_defenders']
    return x

a = attacking_defenders(df)
feats = pd.merge(feats, a, on = 'teamId', how = 'left')
feats.head()

Unnamed: 0,teamId,teamName,avg_chain_length,avg_chain_players,possesso,pass_dist,attacking_defenders
0,1609,Arsenal,4.133464,3.609002,60.97,23.062594,137.342105
1,1631,Leicester City,2.756963,2.918625,50.64,25.396924,91.921053
2,1625,Manchester City,5.370112,3.981445,74.83,22.55847,134.789474
3,1651,Brighton & Hove Albion,2.891741,3.026132,44.67,25.923639,70.684211
4,1646,Burnley,2.534475,2.871482,40.35,26.795647,75.289474
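
The two aggregation steps can be followed on a handful of hypothetical events. The team id, match ids and positions are invented; the column names are those of the events table.

```python
import pandas as pd

# Hypothetical events for one team over two matches
toy_touches = pd.DataFrame({
    'teamId':     [1, 1, 1, 1, 1],
    'matchId':    [100, 100, 100, 101, 101],
    'playerRole': ['Defender', 'Defender', 'Forward', 'Defender', 'Defender'],
    'x0':         [60, 40, 80, 55, 70],
})

# Keep only defenders' events in the opponent's half
in_attack = toy_touches[(toy_touches['playerRole'] == 'Defender')
                        & (toy_touches['x0'] > 50)]

# Step 1: events per team and match (match 100 -> 1, match 101 -> 2)
per_match = in_attack.groupby(['teamId', 'matchId']).size()
# Step 2: average of the per-match counts for each team
toy_attacking_defenders = per_match.groupby(level='teamId').mean()
print(toy_attacking_defenders)  # team 1 -> 1.5
```

A match in which a team's defenders record no event beyond `x0 = 50` produces no group at all, so it is left out of the average instead of counting as zero. The same applies to `attacking_defenders` above, although such matches are probably rare in practice.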


In [9]:
feats.to_csv('clean/feats_possesso.csv', index = False)