# Mentorship Practical - Data Analysis and Visualization

---

### Imports

In [1]:
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy as sp

from collections import OrderedDict
from IPython.display import display

import warnings
warnings.filterwarnings('ignore')

In [2]:
sns.set_style("whitegrid")
sns.set_context('talk')

In [3]:
# Set a seed for reproducibility
np.random.seed(1)

---

### Loading the Datasets

In [4]:
df_player = pd.read_csv('./Datasets/football_player.csv')
df_team = pd.read_csv('./Datasets/football_team.csv')
df_match = pd.read_csv('./Datasets/football_match.csv')

### Exploring the Datasets

> #### Players Dataset

In [5]:
print("Shape = {}".format(df_player.shape))

Shape = (9925, 42)


In [6]:
df_player.sample(10)

Unnamed: 0,player_name,birthday,height,weight,overall_rating,potential,preferred_foot,attacking_work_rate,defensive_work_rate,crossing,...,vision,penalties,marking,standing_tackle,sliding_tackle,gk_diving,gk_handling,gk_kicking,gk_positioning,gk_reflexes
858,Ariel Borysiuk,1991-07-29,180.34,154,66.12,74.38,right,medium,high,56.92,...,70.46,49.21,51.58,63.58,63.08,12.71,13.04,18.71,15.54,13.04
8529,Sava Miladinovic Bento,1991-01-02,182.88,159,58.0,64.43,right,medium,medium,51.07,...,59.64,49.5,41.64,46.57,40.07,8.0,8.0,8.0,7.0,14.0
2527,Dusan Tadic,1988-11-20,180.34,168,78.16,81.88,left,medium,medium,81.52,...,84.32,76.28,39.96,35.56,27.56,10.16,10.16,12.56,8.16,15.16
8473,Samuel Souprayen,1989-02-18,187.96,165,64.24,71.76,left,medium,medium,58.29,...,46.52,42.71,65.62,65.19,66.9,8.33,10.95,15.19,14.19,14.19
1958,Daniele Croce,1982-09-09,172.72,150,67.68,67.68,right,high,medium,63.32,...,68.47,59.74,52.26,56.89,59.53,11.74,11.74,5.74,7.74,12.74
4555,John Arne Riise,1980-09-24,187.96,181,76.32,77.64,left,high,medium,84.0,...,64.14,70.59,75.45,79.59,80.59,13.27,10.09,33.14,13.5,13.68
8408,Saidy Janko,1995-10-22,177.8,154,62.13,76.53,right,high,medium,58.73,...,41.0,51.2,58.53,65.87,64.6,5.27,9.27,7.27,13.27,7.27
3743,Helder Postiga,1982-08-02,180.34,168,76.04,76.93,right,high,high,59.33,...,67.15,70.59,25.37,28.11,27.19,12.0,9.81,20.67,16.37,14.59
2314,Denzel Slager,1993-05-02,182.88,179,61.5,70.75,left,high,medium,60.25,...,47.0,50.0,20.62,20.0,21.0,14.0,8.0,11.0,11.0,15.0
2992,Fernando Marcal,1989-02-19,177.8,159,70.88,75.41,left,high,medium,72.53,...,51.53,42.35,65.47,70.0,71.53,5.35,10.35,13.35,12.35,10.35


In [7]:
df_player.dtypes

player_name             object
birthday                object
height                 float64
weight                   int64
overall_rating         float64
potential              float64
preferred_foot          object
attacking_work_rate     object
defensive_work_rate     object
crossing               float64
finishing              float64
heading_accuracy       float64
short_passing          float64
volleys                float64
dribbling              float64
curve                  float64
free_kick_accuracy     float64
long_passing           float64
ball_control           float64
acceleration           float64
sprint_speed           float64
agility                float64
reactions              float64
balance                float64
shot_power             float64
jumping                float64
stamina                float64
strength               float64
long_shots             float64
aggression             float64
interceptions          float64
positioning            float64
vision  

> #### Teams Dataset

In [8]:
print("Shape = {}".format(df_team.shape))

Shape = (288, 22)


In [9]:
df_team.sample(10)

Unnamed: 0,team_long_name,team_short_name,buildUpPlaySpeed,buildUpPlaySpeedClass,buildUpPlayDribblingClass,buildUpPlayPassing,buildUpPlayPassingClass,buildUpPlayPositioningClass,chanceCreationPassing,chanceCreationPassingClass,...,chanceCreationShooting,chanceCreationShootingClass,chanceCreationPositioningClass,defencePressure,defencePressureClass,defenceAggression,defenceAggressionClass,defenceTeamWidth,defenceTeamWidthClass,defenceDefenderLineClass
276,BSC Young Boys,YB,53.83,Balanced,Little,63.0,Mixed,Organised,46.0,Normal,...,52.0,Normal,Organised,46.5,Medium,40.0,Press,53.5,Normal,Cover
98,VfL Wolfsburg,WOL,61.33,Balanced,Little,51.33,Mixed,Organised,67.0,Risky,...,57.17,Normal,Organised,55.0,Medium,47.17,Press,53.0,Normal,Cover
186,Widzew Łódź,LOD,65.25,Balanced,Little,62.75,Long,Organised,56.75,Normal,...,53.0,Normal,Organised,34.25,Deep,42.75,Press,59.0,Normal,Cover
73,FC Sochaux-Montbéliard,SOC,61.33,Balanced,Little,46.0,Mixed,Organised,53.67,Normal,...,46.83,Normal,Organised,54.33,Medium,42.83,Press,54.33,Normal,Cover
269,RC Celta de Vigo,CEL,48.67,Balanced,Little,49.67,Mixed,Organised,52.67,Normal,...,56.0,Normal,Organised,42.67,Medium,47.0,Press,59.5,Normal,Cover
226,Heart of Midlothian,HEA,59.6,Balanced,Little,60.0,Mixed,Organised,58.4,Normal,...,64.0,Normal,Organised,53.4,Medium,59.0,Press,61.4,Normal,Cover
35,Middlesbrough,MID,62.67,Balanced,Little,55.83,Mixed,Organised,51.0,Normal,...,56.0,Normal,Organised,39.33,Medium,47.0,Press,42.83,Normal,Cover
154,Vitesse,VIT,42.0,Balanced,Little,39.0,Mixed,Organised,53.83,Normal,...,59.83,Normal,Organised,45.17,Medium,50.17,Press,52.33,Normal,Cover
253,Real Betis Balompié,BET,52.33,Balanced,Little,40.67,Mixed,Organised,55.67,Normal,...,56.5,Normal,Organised,54.0,Medium,46.83,Press,56.67,Normal,Cover
108,Karlsruher SC,KAR,57.4,Balanced,Little,47.4,Mixed,Organised,60.0,Normal,...,54.6,Normal,Organised,43.4,Medium,44.8,Press,45.4,Normal,Cover


In [10]:
df_team.dtypes

team_long_name                     object
team_short_name                    object
buildUpPlaySpeed                  float64
buildUpPlaySpeedClass              object
buildUpPlayDribblingClass          object
buildUpPlayPassing                float64
buildUpPlayPassingClass            object
buildUpPlayPositioningClass        object
chanceCreationPassing             float64
chanceCreationPassingClass         object
chanceCreationCrossing            float64
chanceCreationCrossingClass        object
chanceCreationShooting            float64
chanceCreationShootingClass        object
chanceCreationPositioningClass     object
defencePressure                   float64
defencePressureClass               object
defenceAggression                 float64
defenceAggressionClass             object
defenceTeamWidth                  float64
defenceTeamWidthClass              object
defenceDefenderLineClass           object
dtype: object

> #### Matches Dataset

In [11]:
print("Shape = {}".format(df_match.shape))

Shape = (25979, 14)


In [12]:
df_match.sample(10)

Unnamed: 0,country_name,league_name,season,stage,date,home_team_long_name,home_short_long_name,away_team_long_name,away_short_long_name,home_team_goal,away_team_goal,B365H,B365D,B365A
15289,Netherlands,Netherlands Eredivisie,2014/2015,28,2015-03-20,FC Utrecht,UTR,NAC Breda,NAC,3,4,1.5,3.9,7.0
6697,France,France Ligue 1,2013/2014,11,2013-10-26,Valenciennes FC,VAL,Évian Thonon Gaillard FC,ETG,0,1,2.0,3.3,3.8
15489,Netherlands,Netherlands Eredivisie,2015/2016,17,2015-12-19,Heracles Almelo,HER,FC Groningen,GRO,2,1,2.2,3.4,3.2
2579,England,England Premier League,2010/2011,18,2011-01-26,Liverpool,LIV,Fulham,FUL,1,0,1.53,3.8,7.0
10264,Italy,Italy Serie A,2008/2009,1,2008-08-31,Torino,TOR,Lecce,LEC,3,0,1.8,3.1,5.25
17444,Poland,Poland Ekstraklasa,2015/2016,14,2015-10-30,Górnik Łęczna,LEC,Cracovia,CKR,1,0,,,
11088,Italy,Italy Serie A,2010/2011,16,2010-12-12,Brescia,BRE,Sampdoria,SAM,1,0,2.9,3.1,2.55
467,Belgium,Belgium Jupiler League,2009/2010,30,2010-03-21,Standard de Liège,STL,KAA Gent,GEN,0,2,1.85,3.6,4.0
17525,Poland,Poland Ekstraklasa,2015/2016,23,2016-02-21,Polonia Bytom,GOR,Ruch Chorzów,CHO,0,2,,,
15773,Poland,Poland Ekstraklasa,2008/2009,15,2008-11-22,GKS Bełchatów,BEL,Jagiellonia Białystok,BIA,2,0,,,


In [13]:
df_match.dtypes

country_name             object
league_name              object
season                   object
stage                     int64
date                     object
home_team_long_name      object
home_short_long_name     object
away_team_long_name      object
away_short_long_name     object
home_team_goal            int64
away_team_goal            int64
B365H                   float64
B365D                   float64
B365A                   float64
dtype: object

---

### Exercises

> Exercise 1

Compute statistics such as:
* Mode
* Mean
* Median
* Standard deviation
* Minimum and maximum

for variables such as the players' height and weight.

Check whether they follow any known distribution.
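One possible starting point for Exercise 1 is sketched below. It is not the expected solution: the `summarize` helper is a hypothetical name, and `toy_players` is a synthetic stand-in for `df_player[["height", "weight"]]` so that the block runs on its own. The normality check uses `scipy.stats.normaltest` (D'Agostino-Pearson), which is only one of several tests that could be applied here.

```python
import numpy as np
import pandas as pd
from scipy import stats


def summarize(series: pd.Series) -> pd.Series:
    """Return the statistics listed in Exercise 1 for one numeric column."""
    clean = series.dropna()
    return pd.Series({
        "mode": clean.mode().iloc[0],  # first mode when there are ties
        "mean": clean.mean(),
        "median": clean.median(),
        "std": clean.std(),            # sample standard deviation (ddof=1)
        "min": clean.min(),
        "max": clean.max(),
    })


# Synthetic stand-in for df_player[["height", "weight"]]; values are made up.
rng = np.random.default_rng(1)
toy_players = pd.DataFrame({
    "height": rng.normal(182.0, 6.5, size=500).round(2),
    "weight": rng.normal(168.0, 15.0, size=500).round(0),
})

# One column of statistics per variable.
summary = toy_players.apply(summarize)
print(summary)

# D'Agostino-Pearson test: the null hypothesis is that the sample is normal.
for column in toy_players.columns:
    statistic, p_value = stats.normaltest(toy_players[column])
    print(f"{column}: statistic={statistic:.3f}, p-value={p_value:.3f}")
```

In the notebook, the same helper could be applied to `df_player[["height", "weight"]]`, and a histogram such as `df_player["height"].hist(bins=30)` would give a visual check of the shape of the distribution.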

> Exercise 2

Perform an outlier analysis.
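A minimal sketch of one common approach to Exercise 2, the interquartile-range (IQR) rule, follows. The exercise does not prescribe a method, so the IQR rule and the multiplier `k = 1.5` are assumptions. The `iqr_outliers` helper is a hypothetical name, and `toy_weight` is synthetic data with two planted extremes standing in for `df_player["weight"]`.

```python
import numpy as np
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Synthetic weights plus two planted extreme values at the end.
rng = np.random.default_rng(1)
toy_weight = pd.Series(
    np.append(rng.normal(168.0, 15.0, size=200), [60.0, 300.0]),
    name="weight",
)

mask = iqr_outliers(toy_weight)
print(f"{mask.sum()} outliers out of {len(toy_weight)} values")
print(toy_weight[mask])
```

A box plot, for example `sns.boxplot(x=df_player["weight"])`, draws its whiskers with the same rule by default, so it can serve as a visual cross-check.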

> Exercise 3

Explain how the metrics vary when broken down by each player's preferred foot (right or left). Compare both distributions qualitatively and graphically.
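The breakdown asked for in Exercise 3 can be sketched with a `groupby` on `preferred_foot`. The frame below is synthetic (the 25/75 left/right split and all values are made up) and only mirrors the column names shown in the Players dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df_player; only the column names match the dataset.
rng = np.random.default_rng(1)
toy_players = pd.DataFrame({
    "preferred_foot": rng.choice(["left", "right"], size=300, p=[0.25, 0.75]),
    "height": rng.normal(182.0, 6.5, size=300),
    "weight": rng.normal(168.0, 15.0, size=300),
})

# Same statistics as Exercise 1, computed separately for each foot.
by_foot = (
    toy_players
    .groupby("preferred_foot")[["height", "weight"]]
    .agg(["mean", "median", "std", "min", "max"])
)
print(by_foot.round(2))
```

For the graphical comparison, a call such as `sns.boxplot(data=df_player, x="preferred_foot", y="height")` places the two distributions side by side.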

> Exercise 4

Compute the correlation between the player variables.
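A hedged sketch for Exercise 4: restrict the frame to numeric columns and call `DataFrame.corr`. Pearson correlation is an assumption here, since the exercise does not name a method. The toy data is synthetic, with `weight` deliberately built to correlate with `height`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df_player; weight is constructed from height on purpose.
rng = np.random.default_rng(1)
height = rng.normal(182.0, 6.5, size=300)
toy_players = pd.DataFrame({
    "player_name": [f"player_{i}" for i in range(300)],
    "height": height,
    "weight": 0.9 * height + rng.normal(0.0, 8.0, size=300),
    "agility": rng.normal(65.0, 10.0, size=300),
})

# Drop text columns such as player_name before computing the matrix.
corr = toy_players.select_dtypes(include="number").corr(method="pearson")
print(corr.round(2))
```

With the real data, `sns.heatmap(corr, annot=True)` is one way to display the matrix. With around forty numeric columns the annotations may become hard to read.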

> Exercise 5: Questions

* Which European league has the most matches? Plot it.
* Which season had the most matches? Plot it.
* The 10 teams with the most home goals. Plot them.
* The 10 teams with the most away goals. Plot them.
* The team that scores the most goals overall. Plot it.
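The questions above reduce to counting rows and summing goals per group. The sketch below shows the idea on a five-row toy frame that reuses the column names of `df_match`. The league and team names are placeholders, not real data.

```python
import pandas as pd

# Placeholder matches; only the column names mirror df_match.
toy_match = pd.DataFrame({
    "league_name": ["League A", "League A", "League B", "League A", "League B"],
    "season": ["2014/2015", "2015/2016", "2015/2016", "2015/2016", "2014/2015"],
    "home_team_long_name": ["Team X", "Team Y", "Team Z", "Team X", "Team W"],
    "away_team_long_name": ["Team Y", "Team X", "Team W", "Team Y", "Team Z"],
    "home_team_goal": [3, 1, 0, 2, 1],
    "away_team_goal": [4, 0, 2, 2, 1],
})

# Matches per league and per season: each row is one match.
matches_per_league = toy_match["league_name"].value_counts()
matches_per_season = toy_match["season"].value_counts()

# Goals scored at home and away, summed per team.
home_goals = toy_match.groupby("home_team_long_name")["home_team_goal"].sum()
away_goals = toy_match.groupby("away_team_long_name")["away_team_goal"].sum()
top_home = home_goals.nlargest(10)
top_away = away_goals.nlargest(10)

# Total goals per team; fill_value=0 covers teams missing from one side.
total_goals = home_goals.add(away_goals, fill_value=0).sort_values(ascending=False)

print(matches_per_league.idxmax(), matches_per_season.idxmax())
print(total_goals.head(1))
```

Each resulting Series can be plotted directly in the notebook, for instance with `top_home.plot(kind="bar")`.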

**Extra:** If you can think of any other metric that can be extracted from the datasets, feel free to compute it.

---