# Pandas
Pandas is a Python library designed to make working with relational or labeled data easy and intuitive. It is mainly used for data analysis.

Pandas works with different kinds of data, such as:

*   Tabular data (SQL tables, Excel spreadsheets)
*   Time series
*   Data in matrix form
*   Any other form of statistical data.




Pandas has two main data structures: **Series** and **DataFrame**. A Series is one-dimensional, while a DataFrame is two-dimensional.


Keep in mind that Pandas is itself built on top of NumPy.
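As a rough illustration of the two structures and their NumPy foundation, the sketch below builds a small Series and a small DataFrame and inspects their dimensions and underlying arrays. The values and the names `s` and `frame` are invented for this example.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
s = pd.Series([1.5, 2.5, 3.5])
frame = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

print(s.ndim)      # number of dimensions of a Series
print(frame.ndim)  # number of dimensions of a DataFrame

# Both structures can hand their data back as NumPy arrays
print(type(s.to_numpy()))
print(frame.to_numpy().shape)
```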


In [1]:
import numpy as np
import pandas as pd

## Series
A Series is a one-dimensional array capable of storing any type of data (integers, strings, floating point numbers, Python objects, etc.).

What sets a Series apart is that we can specify its index labels.

In [2]:
# syntax:
# pd.Series(data, index=index)
# np.nan is the floating-point "not a number" value, used to mark missing data
#[ 20 , 12 , 14.5, '.', '*', '50000000']

serie = pd.Series([4,7,9,1,np.nan])
serie

0    4.0
1    7.0
2    9.0
3    1.0
4    NaN
dtype: float64

In [3]:
serie = pd.Series([4,7,9,1,np.nan], index = ["a","b","c","d","e"])
serie

a    4.0
b    7.0
c    9.0
d    1.0
e    NaN
dtype: float64

In [4]:
# Series can also be built from dictionaries
d = {
    "p" : 6,
    "a" : "hola",
     "b" : 7.0
}
pd.Series(d)


p       6
a    hola
b     7.0
dtype: object

A Series behaves much like an ndarray and is a valid argument for most NumPy functions; however, some operations affect both the data and the index.

In [None]:
print(serie)
print(serie[2:])
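print(serie.iloc[[1, 2]])  # select by position; plain serie[[1, 2]] relies on a positional fallback that newer pandas versions deprecate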
print(np.exp(serie))

a    4.0
b    7.0
c    9.0
d    1.0
e    NaN
dtype: float64
c    9.0
d    1.0
e    NaN
dtype: float64
b    7.0
c    9.0
dtype: float64
a      54.598150
b    1096.633158
c    8103.083928
d       2.718282
e            NaN
dtype: float64
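One case where the index takes part in an operation is arithmetic between two Series: values are matched by label rather than by position. The following is a minimal sketch with made-up Series `a` and `b`.

```python
import numpy as np
import pandas as pd

# Two toy Series whose labels only partly overlap
a = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
b = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Addition pairs values by label, not by position;
# labels present in only one operand produce NaN
total = a + b
print(total)
```

Labels `b` and `c` exist in both operands and are summed, while `a` and `d` have no partner and come out as `NaN`.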


In [None]:
# Series can also be accessed as if they were dictionaries
serie["a"]

4.0
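The dictionary analogy extends a little further than bracket access. A short sketch, using a toy Series `s` that is not part of the notebook:

```python
import pandas as pd

s = pd.Series([4.0, 7.0], index=["a", "b"])

print("a" in s)         # membership is tested against the index labels
print(s.get("z"))       # a missing label returns None instead of raising KeyError
print(s.get("z", 0.0))  # or a default value of our choice
```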

In [5]:
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)

In [9]:
df

Unnamed: 0,col1,col2
0,1,3
1,2,4


In [7]:
# a DataFrame is a table
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   col1    2 non-null      int64
 1   col2    2 non-null      int64
dtypes: int64(2)
memory usage: 160.0 bytes


In [8]:
df.columns

Index(['col1', 'col2'], dtype='object')

A .csv file can be imported as a DataFrame

In [10]:
dfcsv = pd.read_csv('./Fullmetadata.csv')
print(dfcsv)

       player_id          player_name  games  time  goals        xG  assists  \
0           8865        Ollie Watkins      9   810      6  6.108615        1   
1            675        Jack Grealish      9   810      5  3.136721        5   
2            592         Ross Barkley      6   454      2  1.515623        1   
3           1024         Tyrone Mings      9   810      2  1.362349        1   
4           7726     Ezri Konsa Ngoyo      9   810      2  0.972794        0   
...          ...                  ...    ...   ...    ...       ...      ...   
18628       4267     Florian Hartherz     10   775      0  0.008183        2   
18629       4268  Christian Strohdiek     22  1608      0  0.192864        0   
18630       4311           Idir Ouali      6   296      0  0.608952        0   
18631       4334         Mirnes Pepic      2    20      0  0.000000        0   
18632       4363       Thomas Bertels      1     3      0  0.000000        0   

             xA  shots  key_passes  yel

In [11]:
dfcsv.columns

Index(['player_id', 'player_name', 'games', 'time', 'goals', 'xG', 'assists',
       'xA', 'shots', 'key_passes', 'yellow_cards', 'red_cards', 'position',
       'team_name', 'npg', 'npxG', 'xGChain', 'xGBuildup', 'year'],
      dtype='object')

In [12]:
dfcsv.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18633 entries, 0 to 18632
Data columns (total 19 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   player_id     18633 non-null  int64  
 1   player_name   18633 non-null  object 
 2   games         18633 non-null  int64  
 3   time          18633 non-null  int64  
 4   goals         18633 non-null  int64  
 5   xG            18633 non-null  float64
 6   assists       18633 non-null  int64  
 7   xA            18633 non-null  float64
 8   shots         18633 non-null  int64  
 9   key_passes    18633 non-null  int64  
 10  yellow_cards  18633 non-null  int64  
 11  red_cards     18633 non-null  int64  
 12  position      18633 non-null  object 
 13  team_name     18633 non-null  object 
 14  npg           18633 non-null  int64  
 15  npxG          18633 non-null  float64
 16  xGChain       18633 non-null  float64
 17  xGBuildup     18633 non-null  float64
 18  year          18633 non-nu

In [15]:
# first 5 rows
dfcsv.head()

Unnamed: 0,player_id,player_name,games,time,goals,xG,assists,xA,shots,key_passes,yellow_cards,red_cards,position,team_name,npg,npxG,xGChain,xGBuildup,year
0,8865,Ollie Watkins,9,810,6,6.108615,1,1.294178,22,12,0,0,F,Aston Villa,5,4.586278,6.570879,1.832243,2020
1,675,Jack Grealish,9,810,5,3.136721,5,3.884479,26,26,2,0,F M,Aston Villa,5,3.136721,7.380403,2.577331,2020
2,592,Ross Barkley,6,454,2,1.515623,1,1.767355,17,15,0,0,M,Aston Villa,2,1.515623,3.5464,0.667209,2020
3,1024,Tyrone Mings,9,810,2,1.362349,1,0.068672,7,2,2,0,D,Aston Villa,2,1.362349,1.677874,1.661455,2020
4,7726,Ezri Konsa Ngoyo,9,810,2,0.972794,0,0.0,7,0,0,0,D,Aston Villa,2,0.972794,0.627418,0.627418,2020


In [16]:
# last 5 rows
dfcsv.tail()

Unnamed: 0,player_id,player_name,games,time,goals,xG,assists,xA,shots,key_passes,yellow_cards,red_cards,position,team_name,npg,npxG,xGChain,xGBuildup,year
18628,4267,Florian Hartherz,10,775,0,0.008183,2,0.341444,1,5,1,1,D S,Paderborn,0,0.008183,1.099364,0.908752,2014
18629,4268,Christian Strohdiek,22,1608,0,0.192864,0,0.095853,6,3,2,0,D S,Paderborn,0,0.192864,1.002443,0.89272,2014
18630,4311,Idir Ouali,6,296,0,0.608952,0,0.054503,4,1,1,0,M S,Paderborn,0,0.608952,0.983527,0.374576,2014
18631,4334,Mirnes Pepic,2,20,0,0.0,0,0.0,0,0,0,0,S,Paderborn,0,0.0,0.0,0.0,2014
18632,4363,Thomas Bertels,1,3,0,0.0,0,0.0,0,0,0,0,S,Paderborn,0,0.0,0.0,0.0,2014


A column can also be selected as the index. In this case we select the $player\_name$ column

In [23]:
# analogous syntaxes: attribute access and bracket access
#print(dfcsv.time)
print(dfcsv["games"])


0         9
1         9
2         6
3         9
4         9
         ..
18628    10
18629    22
18630     6
18631     2
18632     1
Name: games, Length: 18633, dtype: int64


In [24]:
# array with the values
print(dfcsv["games"].values)

[9 9 6 ... 6 2 1]


In [26]:
dfcsv = pd.read_csv('./Fullmetadata.csv', index_col='player_name')
dfcsv

Unnamed: 0_level_0,player_id,games,time,goals,xG,assists,xA,shots,key_passes,yellow_cards,red_cards,position,team_name,npg,npxG,xGChain,xGBuildup,year
player_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
Ollie Watkins,8865,9,810,6,6.108615,1,1.294178,22,12,0,0,F,Aston Villa,5,4.586278,6.570879,1.832243,2020
Jack Grealish,675,9,810,5,3.136721,5,3.884479,26,26,2,0,F M,Aston Villa,5,3.136721,7.380403,2.577331,2020
Ross Barkley,592,6,454,2,1.515623,1,1.767355,17,15,0,0,M,Aston Villa,2,1.515623,3.546400,0.667209,2020
Tyrone Mings,1024,9,810,2,1.362349,1,0.068672,7,2,2,0,D,Aston Villa,2,1.362349,1.677874,1.661455,2020
Ezri Konsa Ngoyo,7726,9,810,2,0.972794,0,0.000000,7,0,0,0,D,Aston Villa,2,0.972794,0.627418,0.627418,2020
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Florian Hartherz,4267,10,775,0,0.008183,2,0.341444,1,5,1,1,D S,Paderborn,0,0.008183,1.099364,0.908752,2014
Christian Strohdiek,4268,22,1608,0,0.192864,0,0.095853,6,3,2,0,D S,Paderborn,0,0.192864,1.002443,0.892720,2014
Idir Ouali,4311,6,296,0,0.608952,0,0.054503,4,1,1,0,M S,Paderborn,0,0.608952,0.983527,0.374576,2014
Mirnes Pepic,4334,2,20,0,0.000000,0,0.000000,0,0,0,0,S,Paderborn,0,0.000000,0.000000,0.000000,2014
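To try the `index_col` argument without the original file, the sketch below reads a tiny in-memory CSV through `io.StringIO`. The three rows are copied from the output shown earlier; `csv_text`, `by_name`, `plain` and `also_by_name` are names chosen for this example. It also shows `set_index`, which gives the same result after loading.

```python
import io
import pandas as pd

# Tiny in-memory CSV standing in for the real file
csv_text = """player_id,player_name,games
8865,Ollie Watkins,9
675,Jack Grealish,9
592,Ross Barkley,6
"""

# Choose the index column while reading
by_name = pd.read_csv(io.StringIO(csv_text), index_col="player_name")

# Alternative: load with the default index, then promote a column
plain = pd.read_csv(io.StringIO(csv_text))
also_by_name = plain.set_index("player_name")

print(by_name.loc["Ross Barkley", "games"])
print(by_name.equals(also_by_name))
```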


For example, we now select $player\_id$ as the index

In [36]:
dfcsv = pd.read_csv('./data/Fullmetadata.csv', index_col='player_id')
dfcsv

Unnamed: 0_level_0,player_name,games,time,goals,xG,assists,xA,shots,key_passes,yellow_cards,red_cards,position,team_name,npg,npxG,xGChain,xGBuildup,year
player_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
8865,Ollie Watkins,9,810,6,6.108615,1,1.294178,22,12,0,0,F,Aston Villa,5,4.586278,6.570879,1.832243,2020
675,Jack Grealish,9,810,5,3.136721,5,3.884479,26,26,2,0,F M,Aston Villa,5,3.136721,7.380403,2.577331,2020
592,Ross Barkley,6,454,2,1.515623,1,1.767355,17,15,0,0,M,Aston Villa,2,1.515623,3.546400,0.667209,2020
1024,Tyrone Mings,9,810,2,1.362349,1,0.068672,7,2,2,0,D,Aston Villa,2,1.362349,1.677874,1.661455,2020
7726,Ezri Konsa Ngoyo,9,810,2,0.972794,0,0.000000,7,0,0,0,D,Aston Villa,2,0.972794,0.627418,0.627418,2020
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4267,Florian Hartherz,10,775,0,0.008183,2,0.341444,1,5,1,1,D S,Paderborn,0,0.008183,1.099364,0.908752,2014
4268,Christian Strohdiek,22,1608,0,0.192864,0,0.095853,6,3,2,0,D S,Paderborn,0,0.192864,1.002443,0.892720,2014
4311,Idir Ouali,6,296,0,0.608952,0,0.054503,4,1,1,0,M S,Paderborn,0,0.608952,0.983527,0.374576,2014
4334,Mirnes Pepic,2,20,0,0.000000,0,0.000000,0,0,0,0,S,Paderborn,0,0.000000,0.000000,0.000000,2014


A DataFrame is an object; as seen in the OOP class, this means it has various methods and attributes
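The difference between attributes and methods can be seen on a toy frame (the name `frame` and its values are only for illustration):

```python
import pandas as pd

frame = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Attributes are read without parentheses
print(frame.shape)
print(list(frame.columns))

# Methods are called with parentheses and may take arguments
print(frame.head(1))
print(frame.sum())
```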

In [None]:
# RETURNS THE FIRST 5 ROWS
dfcsv.head()

Unnamed: 0_level_0,player_name,games,time,goals,xG,assists,xA,shots,key_passes,yellow_cards,red_cards,position,team_name,npg,npxG,xGChain,xGBuildup,year
player_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
8865,Ollie Watkins,9,810,6,6.108615,1,1.294178,22,12,0,0,F,Aston Villa,5,4.586278,6.570879,1.832243,2020
675,Jack Grealish,9,810,5,3.136721,5,3.884479,26,26,2,0,F M,Aston Villa,5,3.136721,7.380403,2.577331,2020
592,Ross Barkley,6,454,2,1.515623,1,1.767355,17,15,0,0,M,Aston Villa,2,1.515623,3.5464,0.667209,2020
1024,Tyrone Mings,9,810,2,1.362349,1,0.068672,7,2,2,0,D,Aston Villa,2,1.362349,1.677874,1.661455,2020
7726,Ezri Konsa Ngoyo,9,810,2,0.972794,0,0.0,7,0,0,0,D,Aston Villa,2,0.972794,0.627418,0.627418,2020


In [None]:
# RETURNS THE LAST 5 ROWS
dfcsv.tail()

Unnamed: 0_level_0,player_name,games,time,goals,xG,assists,xA,shots,key_passes,yellow_cards,red_cards,position,team_name,npg,npxG,xGChain,xGBuildup,year
player_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
4267,Florian Hartherz,10,775,0,0.008183,2,0.341444,1,5,1,1,D S,Paderborn,0,0.008183,1.099364,0.908752,2014
4268,Christian Strohdiek,22,1608,0,0.192864,0,0.095853,6,3,2,0,D S,Paderborn,0,0.192864,1.002443,0.89272,2014
4311,Idir Ouali,6,296,0,0.608952,0,0.054503,4,1,1,0,M S,Paderborn,0,0.608952,0.983527,0.374576,2014
4334,Mirnes Pepic,2,20,0,0.0,0,0.0,0,0,0,0,S,Paderborn,0,0.0,0.0,0.0,2014
4363,Thomas Bertels,1,3,0,0.0,0,0.0,0,0,0,0,S,Paderborn,0,0.0,0.0,0.0,2014


In [31]:
# RETURNS 10 RANDOM ROWS
dfcsv.sample(10)

Unnamed: 0_level_0,player_name,games,time,goals,xG,assists,xA,shots,key_passes,yellow_cards,red_cards,position,team_name,npg,npxG,xGChain,xGBuildup,year
player_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
5196,Otegui,3,42,0,0.0,0,0.0,0,0,0,0,S,Osasuna,0,0.0,0.0,0.0,2016
1953,Karim Laribi,11,481,0,0.669416,0,1.08774,10,11,0,0,M S,Sassuolo,0,0.669416,1.661837,0.932126,2015
1557,Carlos Carmona,8,466,0,0.0598,0,0.196788,1,3,2,1,M S,Atalanta,0,0.0598,1.708201,1.451613,2015
1317,Andrés Tello,18,814,0,0.229302,0,0.55854,8,11,6,0,M S,Empoli,0,0.229302,1.657358,1.042832,2016
2164,Pablo Hernández,4,155,0,0.730133,0,0.092895,5,1,0,0,M S,Leeds,0,0.730133,1.448521,0.658298,2020
1306,Vincent Laurini,15,1078,0,0.04996,0,0.060422,2,2,1,0,D S,Empoli,0,0.04996,1.169131,1.058749,2014
3683,Ivan Cavaleiro,10,700,1,1.811039,0,0.374166,13,10,1,0,F M S,Fulham,0,0.288701,1.55936,1.092859,2020
5985,Jeremy Grimm,23,779,1,0.469312,0,0.141456,15,4,3,0,M S,Strasbourg,1,0.469312,2.049396,1.656551,2017
3799,Wesley Said,34,2261,4,5.300151,1,2.453087,59,25,4,0,D F M S,Dijon,4,4.540055,7.262549,2.374438,2018
586,John Stones,3,270,0,0.036258,0,0.010289,1,1,0,0,D,Manchester City,0,0.036258,0.520413,0.520413,2020


In [None]:
# RETURNS THE DIMENSIONS OF THE DATA FRAME
# AS (ROWS, COLUMNS)
print(dfcsv.shape)

(18633, 18)


In [None]:
# RETURNS THE TOTAL NUMBER OF ELEMENTS IN THE MATRIX
print(dfcsv.size)

335394
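The value of `size` is simply rows times columns; for the frame above, 18633 × 18 = 335394. A minimal check on a toy frame (the data is made up):

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

rows, cols = frame.shape
# size counts every cell, so it equals rows * cols
print(frame.size, rows * cols)
```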


In [None]:
# RETURNS INFORMATION ABOUT THE NON-NULL VALUES
# AND THE DATA TYPE OF EACH COLUMN
dfcsv.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18633 entries, 8865 to 4363
Data columns (total 18 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   player_name   18633 non-null  object 
 1   games         18633 non-null  int64  
 2   time          18633 non-null  int64  
 3   goals         18633 non-null  int64  
 4   xG            18633 non-null  float64
 5   assists       18633 non-null  int64  
 6   xA            18633 non-null  float64
 7   shots         18633 non-null  int64  
 8   key_passes    18633 non-null  int64  
 9   yellow_cards  18633 non-null  int64  
 10  red_cards     18633 non-null  int64  
 11  position      18633 non-null  object 
 12  team_name     18633 non-null  object 
 13  npg           18633 non-null  int64  
 14  npxG          18633 non-null  float64
 15  xGChain       18633 non-null  float64
 16  xGBuildup     18633 non-null  float64
 17  year          18633 non-null  int64  
dtypes: float64(5), int64(10)

In [None]:
# COMPUTES SOME DESCRIPTIVE STATISTICS FOR THE NUMERIC COLUMNS OF THE DATASET
dfcsv.describe()

Unnamed: 0,games,time,goals,xG,assists,xA,shots,key_passes,yellow_cards,red_cards,npg,npxG,xGChain,xGBuildup,year
count,18633.0,18633.0,18633.0,18633.0,18633.0,18633.0,18633.0,18633.0,18633.0,18633.0,18633.0,18633.0,18633.0,18633.0,18633.0
mean,17.072291,1207.483282,1.624054,1.660439,1.138088,1.165425,15.323888,11.354801,2.453979,0.127569,1.478399,1.517196,4.345903,2.547434,2016.937691
std,11.375527,975.753086,3.252418,2.974699,1.936238,1.735366,20.934645,15.065863,2.685745,0.370656,2.891849,2.620168,5.023077,2.95503,1.976606
min,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2014.0
25%,7.0,326.0,0.0,0.060762,0.0,0.04579,1.0,1.0,0.0,0.0,0.0,0.060527,0.771935,0.423074,2015.0
50%,16.0,976.0,0.0,0.560141,0.0,0.478022,8.0,5.0,2.0,0.0,0.0,0.544881,2.721288,1.623131,2017.0
75%,27.0,2004.0,2.0,1.900353,2.0,1.575897,21.0,16.0,4.0,0.0,2.0,1.78033,6.126256,3.599779,2019.0
max,38.0,3420.0,48.0,39.308761,20.0,20.620707,227.0,146.0,17.0,5.0,38.0,32.117727,54.75361,28.058852,2020.0


In [37]:
# SORTS THE DATASET BY A COLUMN
dfcsv.sort_values('player_name', ascending=False)

Unnamed: 0_level_0,player_name,games,time,goals,xG,assists,xA,shots,key_passes,yellow_cards,red_cards,position,team_name,npg,npxG,xGChain,xGBuildup,year
player_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
250,Ørjan Nyland,6,540,0,0.000000,0,0.000000,0,0,0,0,GK,Ingolstadt,0,0.000000,0.167960,0.167960,2015
250,Ørjan Nyland,12,1021,0,0.000000,0,0.000000,0,0,0,0,GK S,Ingolstadt,0,0.000000,0.593702,0.593702,2016
250,Ørjan Nyland,7,536,0,0.000000,0,0.000000,0,0,0,0,GK S,Aston Villa,0,0.000000,0.194507,0.194507,2019
2646,Özkan Yildirim,1,60,0,0.000000,0,0.000000,0,0,0,0,M,Werder Bremen,0,0.000000,0.013800,0.013800,2014
2646,Özkan Yildirim,1,13,0,0.083136,0,0.000000,1,0,0,0,S,Werder Bremen,0,0.083136,0.083136,0.000000,2015
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
534,Aaron Cresswell,20,1586,0,0.419239,1,0.937578,11,17,1,0,D S,West Ham,0,0.419239,2.039328,1.581602,2018
534,Aaron Cresswell,11,990,0,0.264341,3,1.606536,4,20,0,0,D,West Ham,0,0.264341,3.518969,2.893820,2020
534,Aaron Cresswell,31,2739,3,1.032538,0,3.006657,18,26,7,0,D,West Ham,3,1.032538,6.834942,4.891464,2019
7991,Aaron Connolly,24,1279,3,4.553526,1,0.562017,36,6,0,0,F S,Brighton,3,4.553526,4.356782,0.354199,2019
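`sort_values` also accepts a list of columns, with one sort direction per column. The following sketch uses invented rows to sort by name and, within each name, by year from newest to oldest.

```python
import pandas as pd

# Toy rows, invented for illustration
frame = pd.DataFrame({
    "player_name": ["B", "A", "A"],
    "year": [2019, 2018, 2020],
    "goals": [3, 1, 2],
})

# Name ascending, then year descending within each name
ordered = frame.sort_values(["player_name", "year"], ascending=[True, False])
print(ordered)

# sort_values returns a new object; the original order is unchanged
print(frame["player_name"].tolist())
```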


Besides the methods that display information about the data, DataFrames can also be filtered and subsets of them selected.


In [40]:
# homework
dfcsv["player_name"].unique()

array(['Ollie Watkins', 'Jack Grealish', 'Ross Barkley', ...,
       'Idir Ouali', 'Mirnes Pepic', 'Thomas Bertels'], dtype=object)

In [None]:
# RETURNS THE UNIQUE VALUES OF A COLUMN
dfcsv['team_name'].unique()

array(['Aston Villa', 'Everton', 'Southampton', 'Leicester',
       'West Bromwich Albion', 'Crystal Palace', 'Chelsea', 'West Ham',
       'Tottenham', 'Arsenal', 'Newcastle United', 'Liverpool',
       'Manchester City', 'Manchester United', 'Burnley', 'Brighton',
       'Fulham', 'Wolverhampton Wanderers', 'Sheffield United', 'Leeds',
       'Sevilla', 'Real Sociedad', 'Getafe', 'Atletico Madrid',
       'Valencia', 'Athletic Club', 'Barcelona', 'Real Madrid', 'Levante',
       'Celta Vigo', 'Real Betis', 'Villarreal', 'Granada', 'Eibar',
       'Osasuna', 'Alaves', 'Elche', 'Real Valladolid', 'SD Huesca',
       'Cadiz', 'Verona', 'Roma', 'Lazio', 'Bologna', 'Juventus',
       'Udinese', 'Genoa', 'Sampdoria', 'Sassuolo', 'Napoli', 'Inter',
       'Atalanta', 'Fiorentina', 'AC Milan', 'Torino', 'Crotone',
       'Cagliari', 'Benevento', 'Parma Calcio 1913', 'Spezia', 'Lille',
       'Paris Saint Germain', 'Rennes', 'Marseille', 'Montpellier',
       'Angers', 'Nantes', 'Nice', 'Mona

In [41]:
(dfcsv.goals==4)

player_id
8865    False
675     False
592     False
1024    False
7726    False
        ...  
4267    False
4268    False
4311    False
4334    False
4363    False
Name: goals, Length: 18633, dtype: bool

In [None]:
# A COMPARISON ON A DATA FRAME COLUMN RETURNS A BOOLEAN SERIES:
# EACH ROW IS MAPPED TO True IF IT SATISFIES THE EXPRESSION AND TO False OTHERWISE
seleccionados = (dfcsv.goals==4)
seleccionados

player_id
8865    False
675     False
592     False
1024    False
7726    False
        ...  
4267    False
4268    False
4311    False
4334    False
4363    False
Name: goals, Length: 18633, dtype: bool

In [None]:
# SELECTS ALL THE ROWS MAPPED TO True IN THE PREVIOUS CELL
dfcsv[seleccionados]

Unnamed: 0_level_0,player_name,games,time,goals,xG,assists,xA,shots,key_passes,yellow_cards,red_cards,position,team_name,npg,npxG,xGChain,xGBuildup,year
player_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
843,James Ward-Prowse,11,990,4,0.714341,3,1.256335,10,18,2,0,M,Southampton,4,0.714341,2.517873,2.041089,2020
65,Timo Werner,11,943,4,5.500642,3,2.101302,28,10,0,0,F M,Chelsea,4,5.500642,7.629157,2.278263,2020
935,Kurt Zouma,10,900,4,0.743740,0,0.018804,8,1,1,0,D,Chelsea,4,0.743740,1.932639,1.932639,2020
1776,Jarrod Bowen,11,921,4,3.402587,1,1.321204,27,10,0,0,F M,West Ham,4,3.402587,5.957856,2.169729,2020
838,Sadio Mané,10,825,4,4.921544,1,1.395934,32,16,2,0,F M S,Liverpool,4,4.921544,7.383594,1.892225,2020
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
81,Jonathan Schmid,33,2826,4,5.148357,8,6.221347,60,64,2,0,F M S,Freiburg,4,5.148357,9.149545,3.101696,2014
194,Admir Mehmedi,28,2254,4,5.144821,1,1.570053,51,29,3,0,F M S,Freiburg,4,4.387044,8.266530,4.162098,2014
4246,Mike Frantz,25,1191,4,3.493235,1,1.234888,23,12,3,0,F M S,Freiburg,4,3.493235,5.093012,1.453917,2014
62,Lukas Rupp,31,2010,4,2.587367,1,2.207702,32,30,3,0,F M S,Paderborn,4,2.587367,6.888414,2.880639,2014
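Boolean masks can be counted and combined. The sketch below, on a small invented frame, counts the matching rows and then filters on two conditions at once.

```python
import pandas as pd

# Toy data, invented for illustration
frame = pd.DataFrame({
    "goals": [4, 4, 2, 6],
    "games": [11, 33, 9, 9],
})

mask = frame["goals"] == 4
print(mask.sum())  # True counts as 1, so this is the number of matching rows

# Combine conditions with & (and), | (or), ~ (not);
# each condition needs its own parentheses
few_games = frame[(frame["goals"] == 4) & (frame["games"] < 20)]
print(few_games)
```

The parentheses matter because `&` binds more tightly than `==` and `<` in Python.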


In [46]:
#from        where              select
otrodf = dfcsv.loc[dfcsv["goals"] == 4 , ["games", "goals"]]
otrodf

Unnamed: 0_level_0,games,goals
player_id,Unnamed: 1_level_1,Unnamed: 2_level_1
843,11,4
65,11,4
935,10,4
1776,11,4
838,10,4
...,...,...
81,33,4
194,28,4
4246,25,4
62,31,4


In [48]:
dfcsv

Unnamed: 0_level_0,player_name,games,time,goals,xG,assists,xA,shots,key_passes,yellow_cards,red_cards,position,team_name,npg,npxG,xGChain,xGBuildup,year
player_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
8865,Ollie Watkins,9,810,6,6.108615,1,1.294178,22,12,0,0,F,Aston Villa,5,4.586278,6.570879,1.832243,2020
675,Jack Grealish,9,810,5,3.136721,5,3.884479,26,26,2,0,F M,Aston Villa,5,3.136721,7.380403,2.577331,2020
592,Ross Barkley,6,454,2,1.515623,1,1.767355,17,15,0,0,M,Aston Villa,2,1.515623,3.546400,0.667209,2020
1024,Tyrone Mings,9,810,2,1.362349,1,0.068672,7,2,2,0,D,Aston Villa,2,1.362349,1.677874,1.661455,2020
7726,Ezri Konsa Ngoyo,9,810,2,0.972794,0,0.000000,7,0,0,0,D,Aston Villa,2,0.972794,0.627418,0.627418,2020
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4267,Florian Hartherz,10,775,0,0.008183,2,0.341444,1,5,1,1,D S,Paderborn,0,0.008183,1.099364,0.908752,2014
4268,Christian Strohdiek,22,1608,0,0.192864,0,0.095853,6,3,2,0,D S,Paderborn,0,0.192864,1.002443,0.892720,2014
4311,Idir Ouali,6,296,0,0.608952,0,0.054503,4,1,1,0,M S,Paderborn,0,0.608952,0.983527,0.374576,2014
4334,Mirnes Pepic,2,20,0,0.000000,0,0.000000,0,0,0,0,S,Paderborn,0,0.000000,0.000000,0.000000,2014


Note that the filtering above extends to any boolean function, including functions we define ourselves.

In [47]:
def es_cuatro(n):
  return n==4

# APPLIES THE COMPARISON TO EVERY ROW AND STORES THE RESULT IN seleccionadosPorFuncion
seleccionadosPorFuncion = es_cuatro(dfcsv.goals)

dfcsv[seleccionadosPorFuncion]

Unnamed: 0_level_0,player_name,games,time,goals,xG,assists,xA,shots,key_passes,yellow_cards,red_cards,position,team_name,npg,npxG,xGChain,xGBuildup,year
player_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
843,James Ward-Prowse,11,990,4,0.714341,3,1.256335,10,18,2,0,M,Southampton,4,0.714341,2.517873,2.041089,2020
65,Timo Werner,11,943,4,5.500642,3,2.101302,28,10,0,0,F M,Chelsea,4,5.500642,7.629157,2.278263,2020
935,Kurt Zouma,10,900,4,0.743740,0,0.018804,8,1,1,0,D,Chelsea,4,0.743740,1.932639,1.932639,2020
1776,Jarrod Bowen,11,921,4,3.402587,1,1.321204,27,10,0,0,F M,West Ham,4,3.402587,5.957856,2.169729,2020
838,Sadio Mané,10,825,4,4.921544,1,1.395934,32,16,2,0,F M S,Liverpool,4,4.921544,7.383594,1.892225,2020
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
81,Jonathan Schmid,33,2826,4,5.148357,8,6.221347,60,64,2,0,F M S,Freiburg,4,5.148357,9.149545,3.101696,2014
194,Admir Mehmedi,28,2254,4,5.144821,1,1.570053,51,29,3,0,F M S,Freiburg,4,4.387044,8.266530,4.162098,2014
4246,Mike Frantz,25,1191,4,3.493235,1,1.234888,23,12,3,0,F M S,Freiburg,4,3.493235,5.093012,1.453917,2014
62,Lukas Rupp,31,2010,4,2.587367,1,2.207702,32,30,3,0,F M S,Paderborn,4,2.587367,6.888414,2.880639,2014
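`es_cuatro` can take the whole column because `==` is applied element by element. A function written for a single value, for instance one that uses `if`, cannot be called on the column directly; one option is `Series.apply`. The function `is_high_scorer` and its threshold are hypothetical, chosen only for this sketch.

```python
import pandas as pd

# Toy data, invented for illustration
frame = pd.DataFrame({"goals": [4, 0, 12, 4]})

def is_high_scorer(n):
    # Plain Python logic for a single value; the threshold 10 is arbitrary
    if n >= 10:
        return True
    return False

# apply calls the function once per element and returns a boolean Series
mask = frame["goals"].apply(is_high_scorer)
print(frame[mask])
```

For large frames a vectorized comparison such as `frame["goals"] >= 10` is usually faster than `apply`.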


However, the syntax pandas provides is much more intuitive; the above can be done as follows:

In [None]:
dfcsv[dfcsv.goals==4]

Unnamed: 0_level_0,player_name,games,time,goals,xG,assists,xA,shots,key_passes,yellow_cards,red_cards,position,team_name,npg,npxG,xGChain,xGBuildup,year
player_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
843,James Ward-Prowse,11,990,4,0.714341,3,1.256335,10,18,2,0,M,Southampton,4,0.714341,2.517873,2.041089,2020
65,Timo Werner,11,943,4,5.500642,3,2.101302,28,10,0,0,F M,Chelsea,4,5.500642,7.629157,2.278263,2020
935,Kurt Zouma,10,900,4,0.743740,0,0.018804,8,1,1,0,D,Chelsea,4,0.743740,1.932639,1.932639,2020
1776,Jarrod Bowen,11,921,4,3.402587,1,1.321204,27,10,0,0,F M,West Ham,4,3.402587,5.957856,2.169729,2020
838,Sadio Mané,10,825,4,4.921544,1,1.395934,32,16,2,0,F M S,Liverpool,4,4.921544,7.383594,1.892225,2020
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
81,Jonathan Schmid,33,2826,4,5.148357,8,6.221347,60,64,2,0,F M S,Freiburg,4,5.148357,9.149545,3.101696,2014
194,Admir Mehmedi,28,2254,4,5.144821,1,1.570053,51,29,3,0,F M S,Freiburg,4,4.387044,8.266530,4.162098,2014
4246,Mike Frantz,25,1191,4,3.493235,1,1.234888,23,12,3,0,F M S,Freiburg,4,3.493235,5.093012,1.453917,2014
62,Lukas Rupp,31,2010,4,2.587367,1,2.207702,32,30,3,0,F M S,Paderborn,4,2.587367,6.888414,2.880639,2014


The $.query()$ method can also be used

In [None]:
dfcsv.query('goals == 4')

Unnamed: 0_level_0,player_name,games,time,goals,xG,assists,xA,shots,key_passes,yellow_cards,red_cards,position,team_name,npg,npxG,xGChain,xGBuildup,year
player_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
843,James Ward-Prowse,11,990,4,0.714341,3,1.256335,10,18,2,0,M,Southampton,4,0.714341,2.517873,2.041089,2020
65,Timo Werner,11,943,4,5.500642,3,2.101302,28,10,0,0,F M,Chelsea,4,5.500642,7.629157,2.278263,2020
935,Kurt Zouma,10,900,4,0.743740,0,0.018804,8,1,1,0,D,Chelsea,4,0.743740,1.932639,1.932639,2020
1776,Jarrod Bowen,11,921,4,3.402587,1,1.321204,27,10,0,0,F M,West Ham,4,3.402587,5.957856,2.169729,2020
838,Sadio Mané,10,825,4,4.921544,1,1.395934,32,16,2,0,F M S,Liverpool,4,4.921544,7.383594,1.892225,2020
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
81,Jonathan Schmid,33,2826,4,5.148357,8,6.221347,60,64,2,0,F M S,Freiburg,4,5.148357,9.149545,3.101696,2014
194,Admir Mehmedi,28,2254,4,5.144821,1,1.570053,51,29,3,0,F M S,Freiburg,4,4.387044,8.266530,4.162098,2014
4246,Mike Frantz,25,1191,4,3.493235,1,1.234888,23,12,3,0,F M S,Freiburg,4,3.493235,5.093012,1.453917,2014
62,Lukas Rupp,31,2010,4,2.587367,1,2.207702,32,30,3,0,F M S,Paderborn,4,2.587367,6.888414,2.880639,2014
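A query string can combine conditions and refer to Python variables with the `@` prefix. The sketch below uses an invented frame and a variable named `target`, both assumptions for this example.

```python
import pandas as pd

# Toy data, invented for illustration
frame = pd.DataFrame({
    "goals": [4, 2, 4],
    "year": [2020, 2020, 2014],
})

target = 4
# @target refers to the Python variable above;
# 'and' combines conditions inside the query string
result = frame.query("goals == @target and year == 2020")
print(result)
```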


The $isin$ method can be used to select the rows whose value belongs to a given set of values.

In [49]:
df

Unnamed: 0,col1,col2
0,1,3
1,2,4


In [50]:
players = ['Cristiano Ronaldo', 'Lionel Messi', 'Luis Suárez']
dfcsv.player_name.isin(players)

player_id
8865    False
675     False
592     False
1024    False
7726    False
        ...  
4267    False
4268    False
4311    False
4334    False
4363    False
Name: player_name, Length: 18633, dtype: bool

In [52]:
players = ['Cristiano Ronaldo', 'Lionel Messi', 'Luis Suárez']
top_players = dfcsv[ (dfcsv.player_name.isin(players))]

top_players

Unnamed: 0_level_0,player_name,games,time,goals,xG,assists,xA,shots,key_passes,yellow_cards,red_cards,position,team_name,npg,npxG,xGChain,xGBuildup,year
player_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
2098,Luis Suárez,7,442,5,3.594253,1,0.601812,27,6,2,0,F S,Atletico Madrid,5,3.594253,4.075823,0.15847,2020
2097,Lionel Messi,10,851,4,6.6598,0,2.558279,49,22,3,0,F M S,Barcelona,2,5.173245,8.966301,5.945141,2020
8978,Luis Suárez,8,378,2,0.94555,0,0.234944,8,5,2,0,F M S,Granada,2,0.94555,1.863301,0.682806,2020
2371,Cristiano Ronaldo,6,470,8,5.79958,1,0.725525,29,7,0,0,F S,Juventus,6,4.276982,5.001183,1.742834,2020
2097,Lionel Messi,33,2876,25,20.849667,20,16.593363,159,88,4,0,F M S,Barcelona,20,17.133279,34.923467,13.537658,2019
2098,Luis Suárez,28,1989,16,13.913964,8,3.169745,79,26,4,0,F S,Barcelona,15,13.170687,18.988987,4.466442,2019
2371,Cristiano Ronaldo,33,2920,31,29.431679,5,6.067226,208,51,3,0,F,Juventus,19,19.534956,26.553609,7.224642,2019
2097,Lionel Messi,34,2704,36,25.997169,13,15.335166,170,93,3,0,F S,Barcelona,32,22.280909,38.459877,10.698799,2018
2098,Luis Suárez,33,2832,21,24.394436,6,7.323391,112,47,5,0,F S,Barcelona,17,21.421453,36.808274,12.378092,2018
2371,Cristiano Ronaldo,31,2692,21,23.32404,8,5.193874,175,48,3,0,F M S,Juventus,16,18.756287,21.85417,7.177339,2018
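The mask returned by `isin` can be inverted with `~` to keep everything outside the list. This is a minimal sketch on a three-row toy frame; the goal counts are illustrative only.

```python
import pandas as pd

# Toy data, for illustration only
frame = pd.DataFrame({
    "player_name": ["Lionel Messi", "Kurt Zouma", "Luis Suárez"],
    "goals": [25, 4, 16],
})

players = ["Lionel Messi", "Luis Suárez"]

# ~ inverts the boolean mask, keeping the rows whose name is NOT in the list
others = frame[~frame["player_name"].isin(players)]
print(others)
```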


Using the $.loc[]$ indexer (written with square brackets, not called like a function) we can select rows or columns by label or by a boolean condition

In [None]:
dfcsv.loc[dfcsv.goals == 4]

Unnamed: 0_level_0,player_name,games,time,goals,xG,assists,xA,shots,key_passes,yellow_cards,red_cards,position,team_name,npg,npxG,xGChain,xGBuildup,year
player_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
843,James Ward-Prowse,11,990,4,0.714341,3,1.256335,10,18,2,0,M,Southampton,4,0.714341,2.517873,2.041089,2020
65,Timo Werner,11,943,4,5.500642,3,2.101302,28,10,0,0,F M,Chelsea,4,5.500642,7.629157,2.278263,2020
935,Kurt Zouma,10,900,4,0.743740,0,0.018804,8,1,1,0,D,Chelsea,4,0.743740,1.932639,1.932639,2020
1776,Jarrod Bowen,11,921,4,3.402587,1,1.321204,27,10,0,0,F M,West Ham,4,3.402587,5.957856,2.169729,2020
838,Sadio Mané,10,825,4,4.921544,1,1.395934,32,16,2,0,F M S,Liverpool,4,4.921544,7.383594,1.892225,2020
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
81,Jonathan Schmid,33,2826,4,5.148357,8,6.221347,60,64,2,0,F M S,Freiburg,4,5.148357,9.149545,3.101696,2014
194,Admir Mehmedi,28,2254,4,5.144821,1,1.570053,51,29,3,0,F M S,Freiburg,4,4.387044,8.266530,4.162098,2014
4246,Mike Frantz,25,1191,4,3.493235,1,1.234888,23,12,3,0,F M S,Freiburg,4,3.493235,5.093012,1.453917,2014
62,Lukas Rupp,31,2010,4,2.587367,1,2.207702,32,30,3,0,F M S,Paderborn,4,2.587367,6.888414,2.880639,2014
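For the label-based side of `.loc`, the sketch below builds a two-row frame indexed by `player_id`. The rows reuse values shown above; the name `frame` is an assumption for this example.

```python
import pandas as pd

frame = pd.DataFrame(
    {"player_name": ["Timo Werner", "Kurt Zouma"], "goals": [4, 4], "games": [11, 10]},
    index=pd.Index([65, 935], name="player_id"),
)

# Row label only: returns that row as a Series
print(frame.loc[65])

# Row label and column label: returns a single value
print(frame.loc[935, "games"])

# Condition for the rows, list of labels for the columns
print(frame.loc[frame["games"] > 10, ["player_name", "goals"]])
```

In the full dataset a `player_id` appears once per season, so a lookup such as `dfcsv.loc[250]` would return several rows rather than a single Series.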
