# Pandas Tutorial

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/ed/Pandas_logo.svg/1200px-Pandas_logo.svg.png" width="30%">

In [38]:
import pandas as pd

## Series and DataFrames

The two core objects in pandas are the `Series` and the `DataFrame`.

A `Series` is essentially a single column, and a `DataFrame` is a two-dimensional table made up of a collection of `Series`.

<img src="https://storage.googleapis.com/lds-media/images/series-and-dataframe.width-1200.png" width=600px />
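
As a minimal sketch of that relationship, the snippet below builds two `Series` and combines them into a `DataFrame`. The names `price`, `stock`, and `inventory` and their values are made up for illustration and are not part of the tutorial's data.

```python
import pandas as pd

# Two hypothetical Series sharing the default 0..2 index.
price = pd.Series([1.5, 0.8, 2.0], name="price")
stock = pd.Series([10, 25, 7], name="stock")

# Placing the Series side by side (axis=1) yields a DataFrame.
inventory = pd.concat([price, stock], axis=1)

# Selecting a single column hands back a Series again.
column = inventory["price"]

print(type(inventory).__name__)
print(type(column).__name__)
```

Each column of the resulting table keeps the name of the `Series` it came from.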



### Creating DataFrames from scratch

A DataFrame can be created from a plain `dict`.

In this example we have a fruit stand that sells apples and oranges. We want one column per fruit and one row per customer purchase.


In [39]:
data = {
    'apples': [3, 2, 0, 1], 
    'oranges': [0, 3, 7, 2]
}

In [40]:
purchases = pd.DataFrame(data)

purchases

,apples,oranges
0,3,0
1,2,3
2,0,7
3,1,2


The **index** of this DataFrame was created automatically when we built it, using the numbers 0-3, but we can assign any labels we like.

Here the customers' names become the index:

In [41]:
purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David'])

purchases

,apples,oranges
June,3,0
Robert,2,3
Lily,0,7
David,1,2


Now we can look up a customer's order by name:

In [42]:
purchases.loc['June']

apples     3
oranges    0
Name: June, dtype: int64

We can also access the data by column:

In [43]:
purchases['oranges']

June      0
Robert    3
Lily      7
David     2
Name: oranges, dtype: int64

### Reading data from a CSV


In [44]:
df = pd.read_csv('purchases.csv')

df

,Unnamed: 0,apples,oranges
0,June,3,0
1,Robert,2,3
2,Lily,0,7
3,David,1,2


When reading the file, we can choose which column becomes the index with `index_col`:

In [45]:
df = pd.read_csv('purchases.csv', index_col=0)

df

,apples,oranges
June,3,0
Robert,2,3
Lily,0,7
David,1,2
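
The contents of `purchases.csv` are not shown here, so the sketch below stands in for the file with an in-memory buffer (`io.StringIO`). It assumes the file was produced by calling `.to_csv()` on the `purchases` DataFrame, which would explain where the `Unnamed: 0` column comes from.

```python
import io
import pandas as pd

purchases = pd.DataFrame(
    {"apples": [3, 2, 0, 1], "oranges": [0, 3, 7, 2]},
    index=["June", "Robert", "Lily", "David"],
)

# Serialise to CSV text in memory instead of writing a file (assumption:
# this mirrors how purchases.csv was created).
csv_text = purchases.to_csv()

# Without index_col, the saved index comes back as an ordinary column
# whose header was empty, so pandas names it "Unnamed: 0".
plain = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column is used as the index again.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(plain.columns.tolist())
print(indexed.index.tolist())
```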


## Core DataFrame operations

Let's load the IMDB movie list:

In [46]:
movies_df = pd.read_csv("IMDB-Movie-Data.csv", index_col="Title")

### Viewing your data

We print the first few rows with `.head()`:

In [47]:
movies_df.head()

Title,Rank,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
Guardians of the Galaxy,1,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76.0
Prometheus,2,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0
Split,3,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.12,62.0
Sing,4,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016,108,7.2,60545,270.32,59.0
Suicide Squad,5,"Action,Adventure,Fantasy",A secret government agency recruits some of th...,David Ayer,"Will Smith, Jared Leto, Margot Robbie, Viola D...",2016,123,6.2,393727,325.02,40.0


`.head()` shows the first **five** rows by default, but you can pass a different number, for example `movies_df.head(10)`.

To see the last rows, use `.tail()`, which also accepts a number:

In [48]:
movies_df.tail(2)

Title,Rank,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
Search Party,999,"Adventure,Comedy",A pair of friends embark on a mission to reuni...,Scot Armstrong,"Adam Pally, T.J. Miller, Thomas Middleditch,Sh...",2014,93,5.6,4881,,22.0
Nine Lives,1000,"Comedy,Family,Fantasy",A stuffy businessman finds himself trapped ins...,Barry Sonnenfeld,"Kevin Spacey, Jennifer Garner, Robbie Amell,Ch...",2016,87,5.3,12435,19.64,11.0
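
The defaults of `.head()` and `.tail()` are easy to check on a small frame. This is only a sketch on a hypothetical ten-row DataFrame called `numbers`, not the movie data.

```python
import pandas as pd

# Hypothetical frame with the values 0..9 in a single column.
numbers = pd.DataFrame({"n": range(10)})

first_default = numbers.head()   # no argument: first five rows
first_three = numbers.head(3)    # explicit count
last_default = numbers.tail()    # no argument: last five rows
last_two = numbers.tail(2)       # explicit count

print(first_three)
print(last_two)
```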


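### Getting information about your data

`.info()` should be one of the first methods you call after loading your data. It reports the number of rows, the dtype of each column, and how many non-null values each column holds: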

In [49]:
movies_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1000 entries, Guardians of the Galaxy to Nine Lives
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Rank                1000 non-null   int64  
 1   Genre               1000 non-null   object 
 2   Description         1000 non-null   object 
 3   Director            1000 non-null   object 
 4   Actors              1000 non-null   object 
 5   Year                1000 non-null   int64  
 6   Runtime (Minutes)   1000 non-null   int64  
 7   Rating              1000 non-null   float64
 8   Votes               1000 non-null   int64  
 9   Revenue (Millions)  872 non-null    float64
 10  Metascore           936 non-null    float64
dtypes: float64(3), int64(4), object(4)
memory usage: 93.8+ KB


In [50]:
movies_df.shape

(1000, 11)
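
The non-null counts that `.info()` prints can also be obtained as values you can work with. The sketch below uses a small hypothetical frame, `films`, with a few missing entries; the column names are invented for illustration.

```python
import pandas as pd

# Hypothetical data with missing values in two columns.
films = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "revenue": [10.5, None, 3.2, None],
    "score": [70.0, 55.0, None, 81.0],
})

rows, cols = films.shape        # (number of rows, number of columns)
non_null = films.count()        # non-null values per column, as in .info()
missing = films.isna().sum()    # missing values per column

print(rows, cols)
print(missing)
```

Comparing `non_null` with `rows` shows at a glance which columns are incomplete, which is what the 872 and 936 counts in the `.info()` output above indicate for the revenue and metascore columns.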

### Renaming columns



In [51]:
movies_df.columns

Index(['Rank', 'Genre', 'Description', 'Director', 'Actors', 'Year',
       'Runtime (Minutes)', 'Rating', 'Votes', 'Revenue (Millions)',
       'Metascore'],
      dtype='object')

In [52]:
movies_df.rename(columns={
        'Runtime (Minutes)': 'Runtime', 
        'Revenue (Millions)': 'Revenue_millions'
    }, inplace=True)


movies_df.columns

Index(['Rank', 'Genre', 'Description', 'Director', 'Actors', 'Year', 'Runtime',
       'Rating', 'Votes', 'Revenue_millions', 'Metascore'],
      dtype='object')

Alternatively, suppose we want every column name in lowercase. Instead of using `.rename()`, we can assign a new list of names to `.columns`:

In [53]:
movies_df.columns = ['rank', 'genre', 'description', 'director', 'actors', 'year', 'runtime', 
                     'rating', 'votes', 'revenue_millions', 'metascore']


movies_df.columns

Index(['rank', 'genre', 'description', 'director', 'actors', 'year', 'runtime',
       'rating', 'votes', 'revenue_millions', 'metascore'],
      dtype='object')

That is a lot of typing, though. Instead of writing out each column name by hand, we can use a list comprehension:

In [54]:
movies_df.columns = [col.lower() for col in movies_df]

movies_df.columns

Index(['rank', 'genre', 'description', 'director', 'actors', 'year', 'runtime',
       'rating', 'votes', 'revenue_millions', 'metascore'],
      dtype='object')
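
Two other ways to lowercase column names are sketched here as alternatives to the list comprehension: the `.str` accessor on the column index, and `.rename()` with a function. The empty frame `films` and its column names are a hypothetical stand-in.

```python
import pandas as pd

# Hypothetical empty frame; only the column labels matter here.
films = pd.DataFrame(columns=["Rank", "Runtime (Minutes)", "Revenue (Millions)"])

# Option 1: vectorised string method on the column Index.
lowered = films.columns.str.lower()

# Option 2: rename() accepts a function applied to every label and
# returns a new DataFrame, leaving the original untouched.
renamed = films.rename(columns=str.lower)

print(lowered.tolist())
print(renamed.columns.tolist())
```

Assigning `lowered` back to `films.columns` would change the frame in place, in the same way as the list comprehension.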

### Understanding your variables

Calling `.describe()` gives a summary of the distribution of every numeric variable:

In [55]:
movies_df.describe()

,rank,year,runtime,rating,votes,revenue_millions,metascore
count,1000.0,1000.0,1000.0,1000.0,1000.0,872.0,936.0
mean,500.5,2012.783,113.172,6.7232,169808.3,82.956376,58.985043
std,288.819436,3.205962,18.810908,0.945429,188762.6,103.25354,17.194757
min,1.0,2006.0,66.0,1.9,61.0,0.0,11.0
25%,250.75,2010.0,100.0,6.2,36309.0,13.27,47.0
50%,500.5,2014.0,111.0,6.8,110799.0,47.985,59.5
75%,750.25,2016.0,123.0,7.4,239909.8,113.715,72.0
max,1000.0,2016.0,191.0,9.0,1791916.0,936.63,100.0




`.describe()` can also be used on categorical variables:

In [56]:
movies_df['genre'].describe()

count                        1000
unique                        207
top       Action,Adventure,Sci-Fi
freq                           50
Name: genre, dtype: object

In [57]:
movies_df['genre'].value_counts().head(10)

Action,Adventure,Sci-Fi       50
Drama                         48
Comedy,Drama,Romance          35
Comedy                        32
Drama,Romance                 31
Animation,Adventure,Comedy    27
Action,Adventure,Fantasy      27
Comedy,Drama                  27
Comedy,Romance                26
Crime,Drama,Thriller          24
Name: genre, dtype: int64
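
To see how the `count`, `unique`, `top`, and `freq` fields relate to `.value_counts()`, here is a small sketch on a hypothetical `genres` Series with made-up values.

```python
import pandas as pd

# Hypothetical categorical data: "Drama" appears most often.
genres = pd.Series(
    ["Drama", "Comedy", "Drama", "Action", "Drama", "Comedy"],
    name="genre",
)

summary = genres.describe()     # count, unique, top, freq
counts = genres.value_counts()  # occurrences per value, most frequent first

print(summary)
print(counts)
```

`top` and `freq` in the summary correspond to the first entry of `value_counts()`.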

#### Correlation between continuous variables

Use the `.corr()` method:

In [58]:
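# Keep only the numeric columns; recent pandas versions raise an error on text columns
movies_df.select_dtypes(include='number').corr()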

,rank,year,runtime,rating,votes,revenue_millions,metascore
rank,1.0,-0.261605,-0.221739,-0.219555,-0.283876,-0.271592,-0.191869
year,-0.261605,1.0,-0.1649,-0.211219,-0.411904,-0.12679,-0.079305
runtime,-0.221739,-0.1649,1.0,0.392214,0.407062,0.267953,0.211978
rating,-0.219555,-0.211219,0.392214,1.0,0.511537,0.217654,0.631897
votes,-0.283876,-0.411904,0.407062,0.511537,1.0,0.639661,0.325684
revenue_millions,-0.271592,-0.12679,0.267953,0.217654,0.639661,1.0,0.142397
metascore,-0.191869,-0.079305,0.211978,0.631897,0.325684,0.142397,1.0
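
Reading a correlation matrix is easier with data whose relationships are known in advance. In this sketch the hypothetical frame `films` is built so that `rating` rises linearly with `runtime` and `votes` falls linearly with it.

```python
import pandas as pd

# Hypothetical data constructed to be perfectly linear.
films = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "runtime": [90, 100, 110, 120],
    "rating": [6.0, 6.5, 7.0, 7.5],   # increases with runtime
    "votes": [400, 300, 200, 100],    # decreases with runtime
})

# Drop the text column before computing pairwise Pearson correlations.
numeric = films.select_dtypes(include="number")
corr = numeric.corr()

print(corr.round(2))
```

Values close to 1 mean the two variables move together, values close to -1 mean they move in opposite directions, and values near 0 mean there is little linear relationship.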


### DataFrame: slicing, selecting, and extracting



#### By column


In [59]:
genre_col = movies_df['genre']

type(genre_col)

pandas.core.series.Series

In [60]:
genre_col = movies_df[['genre']]

type(genre_col)

pandas.core.frame.DataFrame

In [61]:
subset = movies_df[['genre', 'rating']]

subset.head()

Title,genre,rating
Guardians of the Galaxy,"Action,Adventure,Sci-Fi",8.1
Prometheus,"Adventure,Mystery,Sci-Fi",7.0
Split,"Horror,Thriller",7.3
Sing,"Animation,Comedy,Family",7.2
Suicide Squad,"Action,Adventure,Fantasy",6.2
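
The cells above differ only in the brackets used, which decides the type of the result. The sketch below repeats the idea on a tiny hypothetical frame named `films`.

```python
import pandas as pd

# Hypothetical two-row frame.
films = pd.DataFrame(
    {"genre": ["Drama", "Comedy"], "rating": [7.1, 6.4], "year": [2010, 2012]},
    index=["A", "B"],
)

as_series = films["genre"]            # single label -> Series
as_frame = films[["genre"]]           # list with one label -> DataFrame
two_cols = films[["genre", "rating"]] # list with several labels -> DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(two_cols.columns.tolist())
```

Passing a list of labels always returns a DataFrame, even when the list holds a single name.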


#### By row

 

- `.loc` - selects rows by label (here, the movie title)
- `.iloc` - selects rows by integer position



In [62]:
prom = movies_df.loc["Prometheus"]

prom

rank                                                                2
genre                                        Adventure,Mystery,Sci-Fi
description         Following clues to the origin of mankind, a te...
director                                                 Ridley Scott
actors              Noomi Rapace, Logan Marshall-Green, Michael Fa...
year                                                             2012
runtime                                                           124
rating                                                              7
votes                                                          485820
revenue_millions                                               126.46
metascore                                                          65
Name: Prometheus, dtype: object

In [63]:
prom = movies_df.iloc[1]
prom

rank                                                                2
genre                                        Adventure,Mystery,Sci-Fi
description         Following clues to the origin of mankind, a te...
director                                                 Ridley Scott
actors              Noomi Rapace, Logan Marshall-Green, Michael Fa...
year                                                             2012
runtime                                                           124
rating                                                              7
votes                                                          485820
revenue_millions                                               126.46
metascore                                                          65
Name: Prometheus, dtype: object

In [64]:
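# .loc slices by label and includes the end label ('Sing')
movie_subset = movies_df.loc['Prometheus':'Sing']

# .iloc slices by position and excludes the stop position (4), so both lines select the same rows
movie_subset = movies_df.iloc[1:4]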

movie_subset

Title,rank,genre,description,director,actors,year,runtime,rating,votes,revenue_millions,metascore
Prometheus,2,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0
Split,3,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.12,62.0
Sing,4,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016,108,7.2,60545,270.32,59.0
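
The two slices above return the same rows because `.loc` includes its end label while `.iloc` excludes its stop position. The sketch below checks this on a hypothetical five-row frame, `films`, with shortened titles as the index.

```python
import pandas as pd

# Hypothetical frame indexed by (shortened) titles.
films = pd.DataFrame(
    {"rank": [1, 2, 3, 4, 5]},
    index=["Guardians", "Prometheus", "Split", "Sing", "Squad"],
)

by_label = films.loc["Prometheus":"Sing"]  # end label is included
by_position = films.iloc[1:4]              # position 4 is excluded
same = by_label.equals(by_position)

# Using 3 as the stop position would leave "Sing" out.
shorter = films.iloc[1:3]

print(by_label.index.tolist())
print(shorter.index.tolist())
print(same)
```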




#### Conditional selection


In [65]:
condition = (movies_df['director'] == "Ridley Scott")

condition.head()

Title
Guardians of the Galaxy    False
Prometheus                  True
Split                      False
Sing                       False
Suicide Squad              False
Name: director, dtype: bool

In [66]:
movies_df[movies_df['director'] == "Ridley Scott"].head()

Title,rank,genre,description,director,actors,year,runtime,rating,votes,revenue_millions,metascore
Prometheus,2,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0
The Martian,103,"Adventure,Drama,Sci-Fi",An astronaut becomes stranded on Mars after hi...,Ridley Scott,"Matt Damon, Jessica Chastain, Kristen Wiig, Ka...",2015,144,8.0,556097,228.43,80.0
Robin Hood,388,"Action,Adventure,Drama","In 12th century England, Robin and his band of...",Ridley Scott,"Russell Crowe, Cate Blanchett, Matthew Macfady...",2010,140,6.7,221117,105.22,53.0
American Gangster,471,"Biography,Crime,Drama","In 1970s America, a detective works to bring d...",Ridley Scott,"Denzel Washington, Russell Crowe, Chiwetel Eji...",2007,157,7.8,337835,130.13,76.0
Exodus: Gods and Kings,517,"Action,Adventure,Drama",The defiant leader Moses rises up against the ...,Ridley Scott,"Christian Bale, Joel Edgerton, Ben Kingsley, S...",2014,150,6.0,137299,65.01,52.0


In [67]:
movies_df[movies_df['rating'] >= 8.6].head(3)

Title,rank,genre,description,director,actors,year,runtime,rating,votes,revenue_millions,metascore
Interstellar,37,"Adventure,Drama,Sci-Fi",A team of explorers travel through a wormhole ...,Christopher Nolan,"Matthew McConaughey, Anne Hathaway, Jessica Ch...",2014,169,8.6,1047747,187.99,74.0
The Dark Knight,55,"Action,Crime,Drama",When the menace known as the Joker wreaks havo...,Christopher Nolan,"Christian Bale, Heath Ledger, Aaron Eckhart,Mi...",2008,152,9.0,1791916,533.32,82.0
Inception,81,"Action,Adventure,Sci-Fi","A thief, who steals corporate secrets through ...",Christopher Nolan,"Leonardo DiCaprio, Joseph Gordon-Levitt, Ellen...",2010,148,8.8,1583625,292.57,74.0
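
Boolean conditions like the ones above can be combined. The following sketch, on a hypothetical frame `films` with invented values, uses `&` (and), `|` (or), and `.isin()`; each condition needs its own parentheses because `&` and `|` bind more tightly than comparisons.

```python
import pandas as pd

# Hypothetical data for illustration only.
films = pd.DataFrame(
    {
        "director": ["Nolan", "Scott", "Nolan", "Gunn"],
        "rating": [8.6, 7.0, 9.0, 8.1],
        "year": [2014, 2012, 2008, 2014],
    },
    index=["A", "B", "C", "D"],
)

# Both conditions must hold.
both = films[(films["director"] == "Nolan") & (films["rating"] >= 9.0)]

# At least one condition must hold.
either = films[(films["year"] == 2012) | (films["rating"] > 8.5)]

# Membership test against a list of allowed values.
selected = films[films["director"].isin(["Scott", "Gunn"])]

print(both.index.tolist())
print(either.index.tolist())
print(selected.index.tolist())
```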


# Exercise

Show the directors who have directed a Sci-Fi movie with a rating above 8.

In [80]:
movies_df.director[(movies_df.rating > 8) & (movies_df.genre.str.contains("Sci-Fi"))]

Title
Guardians of the Galaxy           James Gunn
Interstellar               Christopher Nolan
The Prestige               Christopher Nolan
Mad Max: Fury Road             George Miller
The Avengers                     Joss Whedon
Inception                  Christopher Nolan
Name: director, dtype: object
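
The result above lists one row per movie, so a director with several matching films appears more than once. If only the distinct names are wanted, one possible variation is to call `.unique()` on the selection. The sketch below shows this on a small hypothetical frame, `films`, rather than on the real dataset.

```python
import pandas as pd

# Hypothetical data; "Nolan" has two matching films.
films = pd.DataFrame(
    {
        "director": ["Nolan", "Gunn", "Nolan", "Ayer"],
        "genre": [
            "Adventure,Drama,Sci-Fi",
            "Action,Adventure,Sci-Fi",
            "Action,Adventure,Sci-Fi",
            "Action,Adventure,Fantasy",
        ],
        "rating": [8.6, 8.1, 8.8, 6.2],
    }
)

# Same filter as in the exercise: rating above 8 and "Sci-Fi" in the genre.
mask = (films["rating"] > 8) & films["genre"].str.contains("Sci-Fi", regex=False)

# .unique() keeps each name once, in order of first appearance.
directors = films.loc[mask, "director"].unique()

print(list(directors))
```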