# FEATURE EXTRACTION - ATTACK<a class="anchor" id="up"></a>

Every function returns a DataFrame of the form


| teamId | feature |
| --- | --- |


* [Betweenness variance](#betweenness) 
* [Center vs. flanks](#center_sides)
* [Fluidity of forwards' positions](#horizontal_attack) 
* [Shot distance](#shot_distance)
* [Long passes](#long_passes)


In [1]:
import numpy as np
import pandas as pd
from scipy.spatial import distance
import pickle
import re
import warnings
import networkx as nx
warnings.filterwarnings('ignore')

In [2]:
df = pd.read_csv('clean/events_no_champions.csv')
if 'Unnamed: 0' in df.columns:
    del df['Unnamed: 0']
    
feats = pd.read_csv('clean/feats.csv')
    
display(df.head(2))
display(feats.head(2))

Unnamed: 0,eventId,subEventName,tags,playerId,matchId,eventName,teamId,matchPeriod,eventSec,subEventId,id,League,x0,y0,x1,y1,teamName,playerName,playerRole
0,8,Simple pass,1801,25413,2499719,Pass,1609,1H,2.758649,85.0,177959171,England,49,49,31.0,78.0,Arsenal,A. Lacazette,Forward
1,8,High pass,1801,370224,2499719,Pass,1609,1H,4.94685,83.0,177959172,England,31,78,51.0,75.0,Arsenal,R. Holding,Defender


Unnamed: 0,teamId,teamName
0,1609,Arsenal
1,1631,Leicester City


In [3]:
with open('clean/passes.pickle', 'rb') as fr:
    passes = pickle.load(fr)

***
***

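### Betweenness variance <a class="anchor" id="betweenness"></a>[up](#up)

Variance of the betweenness centrality in a team's passing network. As written, the function below builds one undirected graph per match and takes the variance of the node degrees as a proxy, rather than calling a betweenness-centrality routine; the per-match values are then averaged per team.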

In [4]:
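def betweenness():
    # For every team: per-match variance of node degree in the passing
    # network, averaged over that team's matches.
    betw = []
    for team in feats['teamId']:
        
        betw_team = []
        for match in passes[team]:
            # Collect (player1, player2) pairs from every pass chain of the match
            passlist = []
            for chain in range(len(passes[team][match])):
                passlist += [(p['player1'], p['player2']) for p in passes[team][match][chain]]
            
            # Undirected, unweighted graph: repeated passes collapse into one edge
            G = nx.Graph()
            G.add_edges_from(passlist)
            # NOTE: variance of node degree, not of nx.betweenness_centrality
            degree_sequence = [d for n, d in G.degree()]
            betw_team += [np.var(degree_sequence)]
        
        betw += [np.mean(betw_team)]

    # The column name 'betweennes' (sic) is kept: later cells and outputs use it
    betweennes = pd.DataFrame({'teamId': feats['teamId'], 'betweennes': betw})
    return betweennes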


a = betweenness()
feats = pd.merge(feats, a, on = 'teamId', how = 'left')
feats.head()

Unnamed: 0,teamId,teamName,betweennes
0,1609,Arsenal,7.285386
1,1631,Leicester City,8.530305
2,1625,Manchester City,7.381291
3,1651,Brighton & Hove Albion,8.081609
4,1646,Burnley,7.395548
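
The heading speaks of betweenness centrality, while `betweenness()` above takes the variance of node degrees. As a minimal sketch of the former, assuming the same undirected, unweighted graph construction, `nx.betweenness_centrality` can stand in for the degree sequence. The helper name `betweenness_variance` and the toy edge lists are hypothetical and not part of the original notebook, and values computed this way are not comparable with the `betweennes` column shown above.

```python
import networkx as nx
import numpy as np

def betweenness_variance(edges):
    """Variance of betweenness centrality over the nodes of one passing graph.

    `edges` is a list of (player1, player2) pairs, as collected per match
    in the notebook. Hypothetical helper, for illustration only.
    """
    G = nx.Graph()
    G.add_edges_from(edges)
    # dict: player -> normalized betweenness centrality (normalized=True by default)
    centrality = nx.betweenness_centrality(G)
    return float(np.var(list(centrality.values())))

# Toy networks (made-up players): one hub that every pass goes through,
# and a triangle where nobody sits between two other players.
star = [("A", "B"), ("A", "C"), ("A", "D")]
triangle = [("A", "B"), ("B", "C"), ("C", "A")]

print(betweenness_variance(star))      # hub-dependent team: 0.1875
print(betweenness_variance(triangle))  # evenly connected team: 0.0
```

A high variance indicates that a few players act as bridges for most of the passing, while a low variance indicates that the ball circulates without an obligatory hub.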


### Play through the center or down the flanks <a class="anchor" id="center_sides"></a>[up](#up)

Share of `key passes` and `assist` events played from the center (`33 < y0 < 66`) rather than from the flanks.

In [5]:
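ids = '301|302'  # tag ids selected here; the text above calls them key passes and assists

def center_side(df):
    # Passes whose tag string matches one of the ids; .copy() avoids writing to a view of df
    tmp = df.loc[(df['eventName'] == 'Pass') & (df['tags'].str.contains(ids))].copy()
    # 1 if the pass starts in the central band of the pitch width, 0 otherwise
    tmp['center_side'] = 0
    tmp.loc[((tmp['y0'] > 33) & (tmp['y0'] < 66)), 'center_side'] = 1
    
    # Mean of the 0/1 flag = share of central passes per team
    tmp = tmp[['teamId','center_side']].groupby('teamId').mean().reset_index()
    
    return tmp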

a = center_side(df)
feats = pd.merge(feats, a, on = 'teamId', how = 'left')
feats.head()

Unnamed: 0,teamId,teamName,betweennes,center_side
0,1609,Arsenal,7.285386,0.317708
1,1631,Leicester City,8.530305,0.27027
2,1625,Manchester City,7.381291,0.300493
3,1651,Brighton & Hove Albion,8.081609,0.146341
4,1646,Burnley,7.395548,0.177778
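
`str.contains('301|302')` matches substrings, so a tag id that merely contains those digits would also be selected. Whether such ids occur in this dataset is not established here; the sketch below only shows a precaution, assuming that `tags` is a string in which ids are separated by non-digit characters (the `301-1801` format and the toy rows are made up for illustration).

```python
import pandas as pd

# Toy events; the tag string format is an assumption, not taken from the data.
toy = pd.DataFrame({
    "teamId":    [1, 1, 1, 2, 2],
    "eventName": ["Pass"] * 5,
    "tags":      ["301-1801", "1801", "302", "13010", "302-1801"],
    "y0":        [50, 10, 20, 50, 40],
})

# Substring match, as in center_side(): "13010" is selected by accident.
substring_hits = toy["tags"].str.contains("301|302")

# Whole-id match: \b requires a boundary on both sides of the id.
exact_hits = toy["tags"].str.contains(r"\b(?:301|302)\b")

selected = toy.loc[exact_hits].copy()
# Same central band as in the notebook: 33 < y0 < 66
selected["center_side"] = ((selected["y0"] > 33) & (selected["y0"] < 66)).astype(int)
share = selected.groupby("teamId")["center_side"].mean()

print(substring_hits.tolist())  # [True, False, True, True, True]
print(exact_hits.tolist())      # [True, False, True, False, True]
print(share.to_dict())          # {1: 0.5, 2: 1.0}
```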


### Fluidity of forwards' positions <a class="anchor" id="horizontal_attack"></a>[up](#up)

Variance of the forwards' `y` position, computed per match and then averaged per team.

In [6]:
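def forward_position(df):
    # Events performed by players whose role is Forward
    x = df.loc[df['playerRole'] == 'Forward']
    # Sample variance of y0 per team and match (pandas uses ddof=1 by default)
    x = x[['teamId','matchId','y0']].groupby(['teamId','matchId']).var().reset_index()
    # Average the per-match variances for each team
    x = x[['teamId','y0']].groupby(['teamId']).mean().reset_index()
    x.columns = ['teamId', 'forward_position']
    
    return x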

a = forward_position(df)
feats = pd.merge(feats, a, on = 'teamId', how = 'left')
display(feats.head(2))

Unnamed: 0,teamId,teamName,betweennes,center_side,forward_position
0,1609,Arsenal,7.285386,0.317708,659.156252
1,1631,Leicester City,8.530305,0.27027,726.319416
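
The two-step aggregation in `forward_position` can be checked by hand on a tiny example. The sketch below uses made-up events and also shows that pandas computes the sample variance (`ddof=1`), whereas `np.var`, used in the betweenness cell, defaults to the population variance (`ddof=0`), so the two features are not on the same variance convention.

```python
import numpy as np
import pandas as pd

# Toy forward events (hypothetical values)
toy = pd.DataFrame({
    "teamId":  [1, 1, 1, 1, 2, 2, 2],
    "matchId": [10, 10, 20, 20, 30, 30, 30],
    "y0":      [40, 60, 50, 50, 20, 80, 50],
})

# Step 1: variance of y0 within each (team, match); pandas default is ddof=1
per_match = toy.groupby(["teamId", "matchId"])["y0"].var()

# Step 2: mean of the per-match variances for each team
per_team = per_match.groupby(level="teamId").mean()

print(per_match.to_dict())  # {(1, 10): 200.0, (1, 20): 0.0, (2, 30): 900.0}
print(per_team.to_dict())   # {1: 100.0, 2: 900.0}

# Same two values for match 10, population variance instead of sample variance
print(np.var([40, 60]))     # 100.0
```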


### Shot distance <a class="anchor" id="shot_distance"></a>[up](#up)

Mean distance of the shots attempted by each team, measured along the lengthwise (`x`) dimension only, i.e. `100 - x0`.

In [7]:
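def shot_distance(df):
    # .copy() avoids writing to a view of df
    x = df.loc[df['eventName'] == 'Shot'].copy()
    # Distance from the end line at x = 100, in the same units as x0
    x['shot_distance'] = 100 - x['x0']
    
    # Mean distance per team
    x = x[['teamId','shot_distance']].groupby('teamId').mean().reset_index()
    return x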

a = shot_distance(df)
feats = pd.merge(feats, a, on = 'teamId', how = 'left')
feats.head()

Unnamed: 0,teamId,teamName,betweennes,center_side,forward_position,shot_distance
0,1609,Arsenal,7.285386,0.317708,659.156252,14.550186
1,1631,Leicester City,8.530305,0.27027,726.319416,15.096
2,1625,Manchester City,7.381291,0.300493,706.44502,14.729685
3,1651,Brighton & Hove Albion,8.081609,0.146341,793.636856,15.630986
4,1646,Burnley,7.395548,0.177778,635.833471,13.880814
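
Because `shot_distance` uses `x` only, a shot from a wide position counts as close to goal. For comparison, here is a minimal sketch of a two-dimensional variant. It rests on assumptions not stated in the notebook: coordinates are percentages of the pitch, the goal center sits at `(100, 50)`, and the pitch measures 105 m by 68 m. The function name and the toy shots are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed pitch size in meters; not taken from the notebook.
PITCH_LENGTH_M = 105.0
PITCH_WIDTH_M = 68.0

def euclidean_shot_distance(shots):
    """Mean straight-line distance (meters) to the assumed goal center, per team."""
    # Convert percentage offsets from (100, 50) into meters
    dx = (100 - shots["x0"]) / 100 * PITCH_LENGTH_M
    dy = (50 - shots["y0"]) / 100 * PITCH_WIDTH_M
    out = shots[["teamId"]].copy()
    out["shot_distance_m"] = np.hypot(dx, dy)
    return out.groupby("teamId", as_index=False).mean()

# Toy shots: team 1 shoots from central positions, team 2 from the corner
toy_shots = pd.DataFrame({
    "teamId": [1, 1, 2],
    "x0":     [90, 100, 100],
    "y0":     [50, 50, 0],
})
result = euclidean_shot_distance(toy_shots)
print(result)
```

On these toy rows the corner shot of team 2 has distance `0` under `100 - x0`, while the two-dimensional variant places it 34 m from the goal center.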


### Ratio of long passes to all passes <a class="anchor" id="long_passes"></a>[up](#up)

Proportion of long passes among all passes: $$\frac{\text{Cross} + \text{High pass} + \text{Launch}}{\text{Pass}}$$

In [8]:
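long_passes = ['Cross', 'High pass', 'Launch']

def long_short_passes(df):
    # All passes; .copy() avoids writing to a view of df
    x = df.loc[df['eventName'] == 'Pass'].copy()
    # 1 for a long pass (cross, high pass, launch), 0 for any other pass
    x['long_short_passes'] = 0
    x.loc[x['subEventName'].isin(long_passes), 'long_short_passes'] = 1
    
    # Mean of the 0/1 flag = share of long passes per team
    x = x[['teamId','long_short_passes']].groupby('teamId').mean().reset_index()
    return x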

a = long_short_passes(df)
feats = pd.merge(feats, a, on = 'teamId', how = 'left')
feats.head()

Unnamed: 0,teamId,teamName,betweennes,center_side,forward_position,shot_distance,long_short_passes
0,1609,Arsenal,7.285386,0.317708,659.156252,14.550186,0.090422
1,1631,Leicester City,8.530305,0.27027,726.319416,15.096,0.173541
2,1625,Manchester City,7.381291,0.300493,706.44502,14.729685,0.069324
3,1651,Brighton & Hove Albion,8.081609,0.146341,793.636856,15.630986,0.170539
4,1646,Burnley,7.395548,0.177778,635.833471,13.880814,0.210654
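
The mean of a 0/1 indicator is exactly the proportion in the formula above, which is why `long_short_passes` can use `groupby(...).mean()`. A small sketch on made-up passes confirms this; the toy rows and the `Smart pass` label are for illustration only.

```python
import pandas as pd

long_passes = ["Cross", "High pass", "Launch"]

# Toy pass events (hypothetical)
toy = pd.DataFrame({
    "teamId": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "subEventName": [
        "Simple pass", "Cross", "High pass", "Simple pass",
        "Launch", "Simple pass", "Smart pass", "Simple pass", "Simple pass",
    ],
})

# Boolean indicator: True for a long pass
is_long = toy["subEventName"].isin(long_passes)

# Mean of the indicator per team = (long passes) / (all passes)
share = is_long.groupby(toy["teamId"]).mean()

# Same quantity as an explicit ratio of counts
ratio = is_long.groupby(toy["teamId"]).sum() / toy.groupby("teamId").size()

print(share.to_dict())  # {1: 0.5, 2: 0.2}
```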


In [9]:
feats.to_csv('clean/feats_attacco.csv', index = False)