# Pandas

Pandas is a library focused on data manipulation and analysis.

- It is fast for vectorized operations because it is built on top of NumPy.
- It needs little code to manipulate data.
- It supports multiple file formats.
- It aligns data automatically by index labels.

It can handle **large amounts of data** (as long as they fit in memory), run analytics, and prepare the data behind dashboards.

## Series

A `Series` is a one-dimensional, ordered sequence that can hold values of different types, each paired with an index label. In this it resembles a list, and in fact a `Series` can be created from a list.

In [1]:
import pandas as pd
import numpy as np

In [14]:
students = pd.Series(['Robert', 'Charly', 'George', 'Leo'], index=[1, 30, 10, 7])
students

1     Robert
30    Charly
10    George
7        Leo
dtype: object

In [7]:
students = pd.Series(['Robert', 'Charly', 'George', 'Leo'])
print(students)

0    Robert
1    Charly
2    George
3       Leo
dtype: object


In [4]:
students = pd.Series(['Robert', 'Charly', 'George', 'Leo'], index=['a', 'b', 'c', 'd'])
students

a    Robert
b    Charly
c    George
d       Leo
dtype: object

A `Series` can also be created from a dictionary:

In [5]:
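# Use a descriptive name instead of `dict`, which would shadow the Python built-in
students_dict = {1:'Robert', 7:'Charly', 10:'George', 30:'Leo'}
pd.Series(students_dict)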

1     Robert
7     Charly
10    George
30       Leo
dtype: object

A pandas `Series` can be created from a list, a dictionary, or even JSON data.
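
The three sources mentioned above can be compared side by side. This is a minimal sketch: the names `from_list`, `from_dict`, `from_json` and the JSON payload are illustrative assumptions, and the JSON text is parsed with the standard `json` module before being handed to `pd.Series`.

```python
import json

import pandas as pd

# From a list: pandas assigns the default integer index 0..n-1
from_list = pd.Series(['Robert', 'Charly'])

# From a dictionary: the keys become the index labels
from_dict = pd.Series({1: 'Robert', 7: 'Charly'})

# From JSON text (hypothetical payload): parse it first, then build the Series
json_text = '{"a": "Robert", "b": "Charly"}'
from_json = pd.Series(json.loads(json_text))

print(from_list.index.tolist())
print(from_dict.index.tolist())
print(from_json.index.tolist())
```

In all three cases the values are the same kind of data; only the index labels depend on the source.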

### Accessing the elements of a `Series`

To access an element of a `Series`, use the `index` label associated with that element.

In [12]:
students[7]

'Leo'

In [15]:
students[0:3]

1     Robert
30    Charly
10    George
dtype: object
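
In the two cells above, `students[7]` looks up the label `7`, while the slice `students[0:3]` works by position. The sketch below makes that distinction explicit with `.loc` (labels) and `.iloc` (positions); it reuses the `students` data from this section, and the other variable names are illustrative.

```python
import pandas as pd

students = pd.Series(['Robert', 'Charly', 'George', 'Leo'], index=[1, 30, 10, 7])

by_label = students.loc[7]        # element whose index label is 7
by_position = students.iloc[0]    # first element, whatever its label
first_three = students.iloc[0:3]  # positions 0, 1 and 2 (end excluded)
label_slice = students.loc[1:10]  # from label 1 through label 10 (end included)

print(by_label, by_position)
print(first_three.tolist())
print(label_slice.tolist())
```

Using `.loc` and `.iloc` avoids ambiguity when the index labels are integers that do not match the positions.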

### Exercise 1. Series

1. Create a `Series` from the names stored in the `ejecutivo_` variables. The variable `sueldos` has not been filled in yet; your challenge is to assign those employees' salary information to it.
2. Once the salaries are defined, create a `Series` so that the code block that prints the data works.

In [None]:
ejecutivo_1 = 'Marco P.'
ejecutivo_2 = 'Jenny'
ejecutivo_3 = 'Britney Baby'
ejecutivo_4 = 'Pepe Guardabosques'
ejecutivo_5 = 'Lombardo El Destructor'

sueldos = []

In [None]:
# Create the Series
sueldos = 

In [None]:
print('== Sueldos de los principales ejecutivos de EyePoker Inc. ==\n')

print(f'{("Ejecutivo"):25} | {("Sueldo")}')
print('----------------------------------------')
print(f'{ejecutivo_1:25} | ${(sueldos.loc[ejecutivo_1])} MXN')
print(f'{ejecutivo_2:25} | ${(sueldos.loc[ejecutivo_2])} MXN')
print(f'{ejecutivo_3:25} | ${(sueldos.loc[ejecutivo_3])} MXN')
print(f'{ejecutivo_4:25} | ${(sueldos.loc[ejecutivo_4])} MXN')
print(f'{ejecutivo_5:25} | ${(sueldos.loc[ejecutivo_5])} MXN')

## DataFrames

A `DataFrame` is a two-dimensional data structure. There are countless ways to create one; a simple way is from a dictionary of lists.

In [16]:
# Named `players` rather than `dict` to avoid shadowing the Python built-in
players = {'Jugador' : ['Navas', 'Mbappe', 'Neymar', 'Messi'],
           'Altura' : [183.0, 170.0, 170.0, 165.0],
           'Goles': [2, 200, 200, 200]}

In [17]:
df_players = pd.DataFrame(players, index=[1, 7, 10, 30])
df_players

Unnamed: 0,Jugador,Altura,Goles
1,Navas,183.0,2
7,Mbappe,170.0,200
10,Neymar,170.0,200
30,Messi,165.0,200


In [18]:
df_players.columns

Index(['Jugador', 'Altura', 'Goles'], dtype='object')

In [19]:
df_players.index

Index([1, 7, 10, 30], dtype='int64')
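
As one more of the many ways to build a `DataFrame`, the sketch below starts from a list of records (one dictionary per row) instead of a dictionary of lists. The variable names `records` and `df_records` are illustrative; the data is a subset of the players used above.

```python
import pandas as pd

# One dictionary per row; the keys become the column names
records = [
    {'Jugador': 'Navas', 'Altura': 183.0, 'Goles': 2},
    {'Jugador': 'Messi', 'Altura': 165.0, 'Goles': 200},
]
df_records = pd.DataFrame(records, index=[1, 30])

print(df_records.shape)             # (rows, columns)
print(df_records.columns.tolist())
print(df_records.index.tolist())
```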

### Exercise 2. DataFrames

From the following data and their corresponding index, create a `DataFrame` using the variables `datos_productos` and `indice`:

In [None]:
datos_productos = {
    "nombre": ["Pokemaster", "Cegatron", "Pikame Mucho", "Lazarillo de Tormes", "Stevie Wonder", "Needle", "El AyMeDuele"],
    "precio": [10000, 5500, 3500, 750, 15500, 12250, 23000],
    "peso": [1.2, 1.5, 2.3, 5.5, 3.4, 2.4, 8.8],
    "capacidad de destrucción retinal": [3, 7, 6, 8, 9, 2, 10],
    "disponible": [True, False, True, True, False, False, True]
}

indice = [1, 2, 3, 4, 5, 6, 7]

In [None]:
df_productos =

In [None]:
df_productos

## CSV and JSON files

In [20]:
df_books = pd.read_csv('bestsellers-with-categories.csv', sep=',', header=0)

In [21]:
df_books

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction
1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018,Non Fiction
3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017,Fiction
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,12,2019,Non Fiction
...,...,...,...,...,...,...,...
545,Wrecking Ball (Diary of a Wimpy Kid Book 14),Jeff Kinney,4.9,9413,8,2019,Fiction
546,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2016,Non Fiction
547,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2017,Non Fiction
548,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2018,Non Fiction


In [22]:
df_books.columns

Index(['Name', 'Author', 'User Rating', 'Reviews', 'Price', 'Year', 'Genre'], dtype='object')
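
The `read_csv` call above needs the CSV file on disk. To try the same parameters without the file, the sketch below reads CSV text from memory through `io.StringIO`. The two rows are copied from the output above; `csv_text`, `df_small` and the use of `usecols` are illustrative assumptions, not part of the original notebook.

```python
from io import StringIO

import pandas as pd

# In-memory CSV text standing in for the real file
csv_text = (
    "Name,Author,Year\n"
    "1984 (Signet Classics),George Orwell,2017\n"
    "11/22/63: A Novel,Stephen King,2011\n"
)

# sep=',' is the field delimiter; header=0 takes the column names from the first line
# usecols keeps only the listed columns
df_small = pd.read_csv(StringIO(csv_text), sep=',', header=0, usecols=['Name', 'Year'])

print(df_small)
```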

Reading a JSON file

In [23]:
# The documented values for `typ` are 'frame' and 'series' (lowercase)
df_hardware = pd.read_json('hpcharactersdataraw.json', typ='series')

In [24]:
df_hardware

0       {'Name': 'Mrs. Abbott', 'Link': 'https://www.h...
1       {'Name': 'Hannah Abbott', 'Link': 'https://www...
2       {'Name': 'Abel Treetops', 'Link': 'https://www...
3       {'Name': 'Euan Abercrombie', 'Link': 'https://...
4       {'Name': 'Aberforth Dumbledore', 'Link': 'http...
                              ...                        
1935    {'Name': 'Georgi Zdravko', 'Link': 'https://ww...
1936    {'Name': 'Zograf', 'Link': 'https://www.hp-lex...
1937    {'Name': 'Zonko', 'Link': 'https://www.hp-lexi...
1938    {'Name': 'Valentina Vázquez', 'Link': 'https:/...
1939    {'Name': 'Zygmunt Budge', 'Link': 'https://www...
Length: 1940, dtype: object

In [25]:
import json

# Load the raw JSON with the standard library; `with` closes the file automatically
with open('hpcharactersdataraw.json', 'r') as f:
    json_data = json.load(f)

df = pd.DataFrame.from_dict(json_data)


In [26]:
df

Unnamed: 0,Name,Link,Descr,Gender,Species/Race,Blood,School,Profession
0,Mrs. Abbott,https://www.hp-lexicon.org/character/abbott-fa...,"Mrs. Abbott was the mother of Hannah Abbott, a...",Female,Witch,Muggle-born,Unknown,Unknown
1,Hannah Abbott,https://www.hp-lexicon.org/character/abbott-fa...,Hannah Abbott is a Hufflepuff student in Harry...,Female,Witch,Half-blood,Hogwarts - Hufflepuff,Landlady of the Leaky Cauldron
2,Abel Treetops,https://www.hp-lexicon.org/character/abel-tree...,Abel Treetops was a wizard from Cincinnati who...,Male,Wizard,Unknown,Unknown,Unknown
3,Euan Abercrombie,https://www.hp-lexicon.org/character/abercromb...,Euan Abercrombie was a small boy with prominen...,Male,Wizard,Unknown,Hogwarts - Gryffindor,Unknown
4,Aberforth Dumbledore,https://www.hp-lexicon.org/character/dumbledor...,"Aberforth Dumbledore was a tall, thin, grumpy-...",Male,Wizard,Half-blood,Hogwarts - Student,Barman
...,...,...,...,...,...,...,...,...
1935,Georgi Zdravko,https://www.hp-lexicon.org/character/georgi-zd...,Georgi Zdravko played Keeper for the Bulgarian...,Male,Wizard,Unknown,Unknown,Quidditch player (Seeker)
1936,Zograf,https://www.hp-lexicon.org/character/zograf/,Zograf played Keeper for the Bulgarian Nationa...,,Wizard,Unknown,Unknown,Quidditch player (Keeper)
1937,Zonko,https://www.hp-lexicon.org/character/zonko/,Founder(?) of Zonko’s Joke Shop. Possibly a re...,,Unknown,Unknown,Unknown,Unknown
1938,Valentina Vázquez,https://www.hp-lexicon.org/character/valentina...,Valentina Vázquez was President of the Argenti...,Female,Witch,Unknown,Unknown,President of the Argentinian Council of Magic


## Indexing: loc & iloc

The advanced indexing methods are useful when exploring and processing data: `loc` selects rows and columns by label, while `iloc` selects them by integer position.

In [35]:
df_books[0:4]

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction
1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018,Non Fiction
3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017,Fiction


In [28]:
df_books['Name']

0                          10-Day Green Smoothie Cleanse
1                                      11/22/63: A Novel
2                12 Rules for Life: An Antidote to Chaos
3                                 1984 (Signet Classics)
4      5,000 Awesome Facts (About Everything!) (Natio...
                             ...                        
545         Wrecking Ball (Diary of a Wimpy Kid Book 14)
546    You Are a Badass: How to Stop Doubting Your Gr...
547    You Are a Badass: How to Stop Doubting Your Gr...
548    You Are a Badass: How to Stop Doubting Your Gr...
549    You Are a Badass: How to Stop Doubting Your Gr...
Name: Name, Length: 550, dtype: object

In [29]:
df_books[['Name', 'Author', 'Year']]

Unnamed: 0,Name,Author,Year
0,10-Day Green Smoothie Cleanse,JJ Smith,2016
1,11/22/63: A Novel,Stephen King,2011
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,2018
3,1984 (Signet Classics),George Orwell,2017
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,2019
...,...,...,...
545,Wrecking Ball (Diary of a Wimpy Kid Book 14),Jeff Kinney,2019
546,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,2016
547,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,2017
548,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,2018


In [31]:
df_books.loc[:]

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction
1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018,Non Fiction
3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017,Fiction
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,12,2019,Non Fiction
...,...,...,...,...,...,...,...
545,Wrecking Ball (Diary of a Wimpy Kid Book 14),Jeff Kinney,4.9,9413,8,2019,Fiction
546,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2016,Non Fiction
547,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2017,Non Fiction
548,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2018,Non Fiction


In [36]:
df_books.loc[0:4]

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction
1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018,Non Fiction
3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017,Fiction
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,12,2019,Non Fiction


In [37]:
df_books.loc[0:4, ['Name', 'Author']]

Unnamed: 0,Name,Author
0,10-Day Green Smoothie Cleanse,JJ Smith
1,11/22/63: A Novel,Stephen King
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson
3,1984 (Signet Classics),George Orwell
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids


In [38]:
df_books.loc[:, ['Reviews']] * -1

Unnamed: 0,Reviews
0,-17350
1,-2052
2,-18979
3,-21424
4,-7665
...,...
545,-9413
546,-14331
547,-14331
548,-14331


In [39]:
df_books.loc[:, ['Author']] == 'JJ Smith'

Unnamed: 0,Author
0,True
1,False
2,False
3,False
4,False
...,...
545,False
546,False
547,False
548,False


In [40]:
df_books.iloc[:]

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction
1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018,Non Fiction
3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017,Fiction
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,12,2019,Non Fiction
...,...,...,...,...,...,...,...
545,Wrecking Ball (Diary of a Wimpy Kid Book 14),Jeff Kinney,4.9,9413,8,2019,Fiction
546,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2016,Non Fiction
547,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2017,Non Fiction
548,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2018,Non Fiction


In [41]:
df_books.iloc[:, 0:3]

Unnamed: 0,Name,Author,User Rating
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7
1,11/22/63: A Novel,Stephen King,4.6
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7
3,1984 (Signet Classics),George Orwell,4.7
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8
...,...,...,...
545,Wrecking Ball (Diary of a Wimpy Kid Book 14),Jeff Kinney,4.9
546,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7
547,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7
548,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7


In [42]:
df_books.iloc[1, 3] * -1

np.int64(-2052)

In [43]:
df_books.iloc[:2, 2:]

Unnamed: 0,User Rating,Reviews,Price,Year,Genre
0,4.7,17350,8,2016,Non Fiction
1,4.6,2052,22,2011,Fiction
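
One detail in the cells above is easy to miss: `df_books.loc[0:4]` returned five rows while `df_books[0:4]` returned four. Label slices with `loc` include the end label; position slices with `iloc` (and plain `[]` slices) exclude the end position. A minimal sketch on a small stand-in frame (`df_demo` and its values are illustrative):

```python
import pandas as pd

df_demo = pd.DataFrame(
    {'Name': ['A', 'B', 'C', 'D', 'E'], 'Reviews': [10, 20, 30, 40, 50]}
)

rows_loc = df_demo.loc[0:2]          # labels 0, 1 and 2: end label included
rows_iloc = df_demo.iloc[0:2]        # positions 0 and 1: end position excluded
cell = df_demo.iloc[1, 1]            # row position 1, column position 1
subset = df_demo.loc[1:3, ['Name']]  # labels 1 to 3, one column selected by name

print(len(rows_loc), len(rows_iloc), cell, subset.shape)
```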


### Exercise 3. Indexing

Go back to exercise 2 and index your DataFrame to obtain the required subsets. The products in stock have a specific order in the database: the one defined in `datos_productos`. This means that "Pokemaster" has index `1` and is the first product, and "El AyMeDuele" has index `7` and is the last product.

Perform the indexing operations below. Remember to keep the products in the order in which the Analyst mentions them:

In [None]:
# I want a DataFrame that contains the products "Pikame Mucho" and "Stevie Wonder"
pm_sw =

# I want a DataFrame that contains everything from product #4 to the last one
p4_final =

# I want a DataFrame that contains the products "El AyMeDuele", "Lazarillo de Tormes" and "Needle"
amd_lt_n =

# I want a DataFrame that contains everything from the first product to product #5
primer_p5 =

# I want a DataFrame that contains the products "Pikame Mucho" and "Lazarillo de Tormes", but only with the columns "nombre", "precio" and "peso"
pm_lt_pp =

# I want a DataFrame that contains all the products but only with the columns 'nombre', 'precio' and 'capacidad de destrucción retinal'
t_pcdr =

# I want a DataFrame that contains products #3 to #6, but only the columns 'nombre', 'precio' and 'disponible'
p3_p6_pd =

## Adding or removing data with Pandas

In [44]:
df_books.head(5)

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction
1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018,Non Fiction
3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017,Fiction
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,12,2019,Non Fiction


In [49]:
# drop columns
df_books.drop('Genre', axis = 1)

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016
1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018
3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,12,2019
...,...,...,...,...,...,...
545,Wrecking Ball (Diary of a Wimpy Kid Book 14),Jeff Kinney,4.9,9413,8,2019
546,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2016
547,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2017
548,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2018


In [50]:
df_books

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction
1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018,Non Fiction
3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017,Fiction
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,12,2019,Non Fiction
...,...,...,...,...,...,...,...
545,Wrecking Ball (Diary of a Wimpy Kid Book 14),Jeff Kinney,4.9,9413,8,2019,Fiction
546,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2016,Non Fiction
547,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2017,Non Fiction
548,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2018,Non Fiction


In [51]:
df_books.drop('Genre', axis= 1, inplace=True)
# inplace=True removes the column from df_books itself, not only from the displayed output

In [52]:
df_books.head()

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016
1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018
3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,12,2019


In [53]:
df_books = df_books.drop('Year', axis=1)
# Reassigning the result is another way to make the drop persist in df_books

In [54]:
df_books.head(2)

Unnamed: 0,Name,Author,User Rating,Reviews,Price
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8
1,11/22/63: A Novel,Stephen King,4.6,2052,22


In [55]:
del df_books['Price']
# `del` is a Python statement, not a pandas method

In [56]:
df_books

Unnamed: 0,Name,Author,User Rating,Reviews
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350
1,11/22/63: A Novel,Stephen King,4.6,2052
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979
3,1984 (Signet Classics),George Orwell,4.7,21424
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665
...,...,...,...,...
545,Wrecking Ball (Diary of a Wimpy Kid Book 14),Jeff Kinney,4.9,9413
546,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331
547,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331
548,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331
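
The cells above show three ways to remove a column. The sketch below contrasts the two `drop` styles on a small stand-in frame; `df_demo` is illustrative, and `columns=` is used as an alternative spelling of `axis=1`.

```python
import pandas as pd

df_demo = pd.DataFrame({'Name': ['A', 'B'], 'Price': [8, 22], 'Year': [2016, 2011]})

# Without inplace, drop returns a new DataFrame and leaves df_demo untouched
trimmed = df_demo.drop(columns=['Price', 'Year'])

# With inplace=True, df_demo itself changes and the call returns None
returned = df_demo.drop(columns=['Year'], inplace=True)

print(trimmed.columns.tolist())
print(df_demo.columns.tolist())
print(returned)
```

Because `inplace=True` returns `None`, writing `df = df.drop(..., inplace=True)` would overwrite the frame with `None`; use either reassignment or `inplace`, not both.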


In [57]:
# Drop rows
df_books.drop(0, axis=0).head(2)

Unnamed: 0,Name,Author,User Rating,Reviews
1,11/22/63: A Novel,Stephen King,4.6,2052
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979


In [58]:
df_books.drop([0,1,2], axis=0).head(2)

Unnamed: 0,Name,Author,User Rating,Reviews
3,1984 (Signet Classics),George Orwell,4.7,21424
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665


In [59]:
df_books.drop(range(0,10), axis=0).head(2)

Unnamed: 0,Name,Author,User Rating,Reviews
10,A Man Called Ove: A Novel,Fredrik Backman,4.6,23848
11,A Patriot's History of the United States: From...,Larry Schweikart,4.6,460


In [60]:
# Add columns
df_books.head(2)

Unnamed: 0,Name,Author,User Rating,Reviews
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350
1,11/22/63: A Novel,Stephen King,4.6,2052


In [61]:
df_books['New_column'] = np.nan

In [62]:
df_books

Unnamed: 0,Name,Author,User Rating,Reviews,New_column
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,
1,11/22/63: A Novel,Stephen King,4.6,2052,
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,
3,1984 (Signet Classics),George Orwell,4.7,21424,
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,
...,...,...,...,...,...
545,Wrecking Ball (Diary of a Wimpy Kid Book 14),Jeff Kinney,4.9,9413,
546,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,
547,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,
548,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,


In [63]:
data = np.arange(0, df_books.shape[0])
df_books['Range'] = data

In [64]:
df_books

Unnamed: 0,Name,Author,User Rating,Reviews,New_column,Range
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,,0
1,11/22/63: A Novel,Stephen King,4.6,2052,,1
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,,2
3,1984 (Signet Classics),George Orwell,4.7,21424,,3
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,,4
...,...,...,...,...,...,...
545,Wrecking Ball (Diary of a Wimpy Kid Book 14),Jeff Kinney,4.9,9413,,545
546,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,,546
547,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,,547
548,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,,548


In [72]:
# Add rows: the original index labels are kept, so every label appears twice
# (pass ignore_index=True to pd.concat to renumber the rows instead)
df_books_new = pd.concat([df_books, df_books])

df_books_new

Unnamed: 0,Name,Author,User Rating,Reviews,New_column,Range
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,,0
1,11/22/63: A Novel,Stephen King,4.6,2052,,1
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,,2
3,1984 (Signet Classics),George Orwell,4.7,21424,,3
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,,4
...,...,...,...,...,...,...
545,Wrecking Ball (Diary of a Wimpy Kid Book 14),Jeff Kinney,4.9,9413,,545
546,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,,546
547,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,,547
548,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,,548


In [73]:
df_books_new.loc[3, :]

Unnamed: 0,Name,Author,User Rating,Reviews,New_column,Range
3,1984 (Signet Classics),George Orwell,4.7,21424,,3
3,1984 (Signet Classics),George Orwell,4.7,21424,,3


### Exercise 4. Column manipulation



In [None]:
datos_productos = {
    "nombre": ["Pokemaster", "Cegatron", "Pikame Mucho", "Lazarillo de Tormes", "Stevie Wonder", "Needle", "El AyMeDuele"],
    "precio": [10000, 5500, 3500, 750, 15500, 12250, 23000],
    "peso": [1.2, 1.5, 2.3, 5.5, 3.4, 2.4, 8.8],
    "capacidad de destrucción retinal": [3, 7, 6, 8, 9, 2, 10],
    "disponible": [True, False, True, True, False, False, True]
}

indice = [1, 2, 3, 4, 5, 6, 7]

Tasks: create a new column, assign new data to an existing column, and delete a couple of columns. Create a DataFrame from `datos_productos` and `indice`, carry out the requests below, and submit them for verification:

In [None]:
df_productos = 

In [None]:
# Please add a new column to `df_productos_mas_columna_nueva` with the column name "nivel de dolor"
columna_nueva = [4, 7, 6, 8, 9, 7, 3]
df_productos_mas_columna_nueva = df_productos.copy()

# Please change the `DataFrame` `df_productos_descuento` by replacing the column `precio` with the data in `precios_descuento`
precios_descuento = [8000, 4000, 2000, 500, 14000, 10000, 15000]
df_productos_descuento = df_productos.copy()

# Please remove the columns "precio" and "peso" from `df_productos` and assign the result to `df_productos_sin_precio_ni_peso`
df_productos_sin_precio_ni_peso =

## Handling null data

In [None]:
# Named `data` rather than `dict` to avoid shadowing the Python built-in
data = {'col1': [1, 2, 3, np.nan],
        'col2': [4, np.nan, 6, 7],
        'col3': ['a', 'b', 'c', None]}

df = pd.DataFrame(data)
df

In [None]:
df.isna()

In [None]:
df.isnull()

In [None]:
df.isnull()*1
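
Multiplying the boolean mask by 1 turns `True`/`False` into `1`/`0`, which hints at a common follow-up: summing the mask to count missing values. A minimal sketch, assuming the same three-column data as above (the names `df_demo`, `nans_per_column` and `nans_per_row` are illustrative):

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({'col1': [1, 2, 3, np.nan],
                        'col2': [4, np.nan, 6, 7],
                        'col3': ['a', 'b', 'c', None]})

# True counts as 1 when summed, so the sum of the mask is the number of NaNs
nans_per_column = df_demo.isna().sum()      # axis=0 is the default
nans_per_row = df_demo.isna().sum(axis=1)

print(nans_per_column)
print(nans_per_row)
```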

In [None]:
df.fillna('Missing')

In [None]:
df = df.drop(['col3'], axis=1)
df.fillna(df.mean())

In [None]:
df.interpolate()
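
The last two cells fill the gaps in different ways: `fillna(df.mean())` uses each column's mean, while `interpolate()` (linear by default) estimates each gap from its neighbours. The sketch below applies both to the numeric columns used above so the results can be compared; `df_num` and the result names are illustrative.

```python
import numpy as np
import pandas as pd

df_num = pd.DataFrame({'col1': [1, 2, 3, np.nan],
                       'col2': [4, np.nan, 6, 7]})

# Each NaN is replaced by the mean of its own column
filled_mean = df_num.fillna(df_num.mean())

# Linear interpolation: a gap between 4 and 6 becomes 5;
# a trailing NaN is filled with the last valid value
filled_interp = df_num.interpolate()

print(filled_mean)
print(filled_interp)
```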

### Cleaning NaNs by rows



In [None]:
datos = {
    'precio': [34, 54, np.nan, np.nan, 56, 12, 34],
    'cantidad_en_stock': [3, 6, 14, np.nan, 5, 2, 10],
    'productos_vendidos': [3, 45, 23, np.nan, 24, 6, np.nan]
}

df = pd.DataFrame(datos, index=["Pokemaster", "Cegatron", "Pikame Mucho", "Lazarillo de Tormes", "Stevie Wonder", "Needle", "El AyMeDuele"])

In [None]:
df

To remove the rows that contain at least one `NaN` value, use `dropna(axis=0, how='any')`:

In [None]:
df.dropna(axis=0, how='any')

With `axis=0` the removal is done by rows. With `how='any'`, every row that has at least one `NaN` is removed.

To remove only the rows where all values are `NaN`, use `how='all'`:

In [None]:
df.dropna(axis=0, how='all')

### Cleaning NaNs by columns

In [None]:
df['descuento'] = np.nan
df

As with rows, removing `NaNs` by column can also be done with `any` and `all`. The only difference is that `axis=1` is used to remove by columns:

In [None]:
df.dropna(axis=1, how='any')

In [None]:
df_dropped = df.dropna(axis=1, how='all')

In [None]:
df_dropped

### Filling NaNs with values

A common strategy is to fill the `NaN` values with some value, to avoid discarding data.

For example, consider the dataset:

In [None]:
df

First, remove the rows and columns where all values are `NaN`:

In [None]:
df_no_nans = df.dropna(axis=0, how='all')
df_no_nans = df_no_nans.dropna(axis=1, how='all')

df_no_nans

Now, suppose that a `NaN` in "productos_vendidos" means the product has not been sold yet. In that case the `NaN` can be filled using `fillna`:

In [None]:
df_no_nans['productos_vendidos'] = df_no_nans['productos_vendidos'].fillna(0)
df_no_nans

Finally, "precio" is a very important variable, so we drop the rows that still have `NaNs`:

In [None]:
df_no_nans.dropna(axis=0)
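
The final `dropna(axis=0)` removes a row if any column is missing. When only one column matters, `dropna` also accepts `subset` to restrict the check to certain columns, and `thresh` to require a minimum number of non-missing values. A minimal sketch on a reduced stand-in for the dataset (`df_demo` and the result names are illustrative):

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(
    {'precio': [34, np.nan, 56], 'cantidad_en_stock': [3, 14, np.nan]},
    index=['Pokemaster', 'Pikame Mucho', 'Stevie Wonder'],
)

# Drop a row only when 'precio' is missing; other columns are not checked
only_priced = df_demo.dropna(subset=['precio'])

# Keep only the rows that have at least 2 non-missing values
at_least_two = df_demo.dropna(thresh=2)

print(only_priced.index.tolist())
print(at_least_two.index.tolist())
```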

### Exercise: Identifying and cleaning NaNs

In [None]:
datos = {
    'precio': [12000, 5500, np.nan, 4800, 8900, np.nan, 1280, 1040, 23100, np.nan, 15000, 13400, np.nan],
    'cantidad_en_stock': [34, 54, np.nan, 78, 56, np.nan, 34, 4, 0, 18, 45, 23, 5],
    'cantidad_vendidos': [120, 34, np.nan, 9, 15, np.nan, 103, np.nan, np.nan, 23, 10, 62, 59],
    'descuentos': [np.nan] * 13
}

df = pd.DataFrame(datos, index=["Pokemaster", "Cegatron", "Pikame Mucho", "Lazarillo de Tormes", "Stevie Wonder", "Needle", "El AyMeDuele", "El Desretinador", "Sacamel Ojocles", "Desojado", "Maribel Buenas Noches", "Cíclope", "El Cuatro Ojos"])


In [None]:
df

Follow these steps to clean your dataset:

1. Count how many `NaNs` there are in each row and in each column.
2. Remove the rows and columns where all values are `NaN`.
3. Since the column `cantidad_vendidos` is not that important, replace the `NaNs` in that column with 0.
4. Since the column `precio` is very important, remove the remaining rows that have a `NaN` in that column.

Perform all your transformations on the `DataFrame` `df_copy`.

In [None]:
df_copy = df.copy()

## Write your transformations here
##

In [None]:
df_copy

## Filtering by conditions

In [None]:
df_books = pd.read_csv('bestsellers-with-categories.csv', sep=',', header=0)
df_books.head()

In [None]:
df_books['Year'] > 2016

In [None]:
gt_2016 = df_books['Year'] > 2016
df_books[gt_2016]

In [None]:
df_books[df_books['Year'] > 2016]

In [None]:
genre_fiction = df_books['Genre'] == 'Fiction'

In [None]:
df_books[genre_fiction & gt_2016]

In [None]:
df_books[~gt_2016]
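
The cells above combine boolean masks with `&` (and) and `~` (not). The sketch below adds `|` (or), `isin` and `query` on a small stand-in frame; `df_demo` and the result names are illustrative. When conditions are written inline, each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` or `==`.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Year': [2015, 2017, 2018, 2019],
    'Genre': ['Fiction', 'Fiction', 'Non Fiction', 'Fiction'],
})

# AND: both conditions must hold
recent_fiction = df_demo[(df_demo['Year'] > 2016) & (df_demo['Genre'] == 'Fiction')]

# OR: at least one condition must hold
recent_or_fiction = df_demo[(df_demo['Year'] > 2016) | (df_demo['Genre'] == 'Fiction')]

# Membership test against a list of values
selected_years = df_demo[df_demo['Year'].isin([2015, 2019])]

# The same AND filter written as a query string
same_with_query = df_demo.query("Year > 2016 and Genre == 'Fiction'")

print(recent_fiction.index.tolist())
print(selected_years.index.tolist())
```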

## Main Pandas functions

In [None]:
# Without parentheses this only returns the bound method; the next cell actually calls it
df_books.info

In [None]:
df_books.info()

In [None]:
# Numeric columns only
df_books.describe()

In [None]:
df_books.tail()

In [None]:
# Check how much memory the DataFrame uses; for very large frames consider processing in chunks or in parallel
df_books.memory_usage(deep=True)

In [None]:
df_books['Author'].value_counts()

In [None]:
df_books.iloc[0]

In [None]:
# DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; use pd.concat (next cell) instead
# df_books = df_books.append(df_books.iloc[0])

In [None]:
df_books = pd.concat([df_books, df_books.iloc[0:1]], ignore_index=True)
df_books.reset_index()

In [None]:
df_books.tail()

In [None]:
df_books.drop_duplicates(keep='last')

In [None]:
df_books.sort_values('Year', ascending=True)

## Groupby

In [None]:
df_books.groupby('Author').count()

In [None]:
df_books.groupby('Author').min()


In [None]:
df_books.groupby('Author').max()


In [None]:
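# numeric_only=True skips text columns such as Name and Genre, which cannot be averaged
df_books.groupby('Author').mean(numeric_only=True)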


In [None]:
df_books[['Author', 'User Rating', 'Reviews']].groupby('Author').mean()

In [None]:
df_books.groupby('Author').sum()

In [None]:
df_books.groupby('Author').sum().loc['William Davis']

In [None]:
df_books.groupby('Author').sum().reset_index()

In [None]:
df_books.groupby('Author').agg(['min', 'max'])

In [None]:
df_books.groupby('Author').agg({'Reviews' : ['min', 'max'], 'User Rating' : 'sum'})

In [None]:
df_books.groupby(['Author', 'Year']).count()
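
The dictionary form of `agg` used above produces two-level column names. A related option is named aggregation, where each output column is given as `name=(column, function)` and the result has flat column names. A minimal sketch on stand-in data (`df_demo`, `summary` and the output column names are illustrative):

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Author': ['A', 'A', 'B'],
    'Reviews': [10, 30, 5],
    'User Rating': [4.0, 5.0, 3.0],
})

# Each keyword is an output column: (source column, aggregation function)
summary = df_demo.groupby('Author').agg(
    min_reviews=('Reviews', 'min'),
    max_reviews=('Reviews', 'max'),
    mean_rating=('User Rating', 'mean'),
).reset_index()  # turn the group label back into a regular column

print(summary)
```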

## Combining DataFrames

## Concat

The `concat` function of the pandas library combines objects such as `DataFrames` or `Series` along a given axis (rows or columns). It is very useful when data has to be joined flexibly, without the objects necessarily sharing the same index.

`concat` has the following general syntax:

````python
pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, ...)
````

Key parameters

1. `objs`: A list or other container of pandas objects (`DataFrame` or `Series`) to concatenate.
2. `axis`: The axis along which the objects are concatenated.
    - `axis=0` (default): concatenates along the rows (stacks).
    - `axis=1`: concatenates along the columns (side by side).
3. `join`: How to handle indexes or columns that do not match:
    - `'outer'` (default): union, keeps all indexes or columns.
    - `'inner'`: intersection, keeps only the shared indexes or columns.
4. `ignore_index`: If `True`, the result gets a new index `0..n-1` along the concatenation axis, ignoring the original labels.
5. `keys`: Labels that identify each concatenated block through a `MultiIndex`.

Examples:


In [None]:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
 'B': ['B0', 'B1', 'B2', 'B3'],
 'C': ['C0', 'C1', 'C2', 'C3'],
 'D': ['D0', 'D1', 'D2', 'D3']})
df1

In [None]:
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
 'B': ['B4', 'B5', 'B6', 'B7'],
 'C': ['C4', 'C5', 'C6', 'C7'],
 'D': ['D4', 'D5', 'D6', 'D7']})
df2

1. Concatenate DataFrames along the rows

In [None]:
pd.concat([df1, df2])

Here the original indexes are kept. To reindex:

In [None]:
pd.concat([df1, df2], ignore_index=True)

2. Concatenate DataFrames along the columns

In [None]:
pd.concat([df1, df2], axis=1)

3. Concatenate with different indexes and handle missing values

In [None]:
df3 = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
df4 = pd.DataFrame({'B': [3, 4]}, index=[2, 3])

result = pd.concat([df3, df4], axis=1, join='outer')
print(result)

4. Use `keys` to label the concatenated blocks

In [None]:
result = pd.concat([df1, df2], keys=['First', 'Second'])
print(result)
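
The examples above all rely on the default `join='outer'`. As a complementary sketch, the block below stacks two frames that share only one column and compares `'outer'` with `'inner'`; the frames `left` and `right` are illustrative and not part of the original notebook.

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
right = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

# outer: keeps the union of the columns and fills the gaps with NaN
outer = pd.concat([left, right], ignore_index=True)

# inner: keeps only the columns present in both frames
inner = pd.concat([left, right], join='inner', ignore_index=True)

print(outer)
print(inner)
```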

## Merge

In [None]:
izq = pd.DataFrame({'key': ['k0', 'k1', 'k2', 'k3'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})

der = pd.DataFrame({'key': ['k0', 'k1', 'k2', 'k3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']})

In [None]:
izq

In [None]:
der

In [None]:
izq.merge(der, on='key')

In [None]:
izq = pd.DataFrame({'key': ['k0', 'k1', 'k2', 'k3'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})

der = pd.DataFrame({'key2': ['k0', 'k1', 'k2', 'k3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']})

In [None]:
izq

In [None]:
der

In [None]:
# Raises KeyError: `der` no longer has a 'key' column (it is now 'key2'); see the next cell
izq.merge(der, on='key')

In [None]:
izq.merge(der, left_on='key', right_on='key2')

In [None]:
izq = pd.DataFrame({'key': ['k0', 'k1', 'k2', 'k3'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})

der = pd.DataFrame({'key2': ['k0', 'k1', 'k2', 'np.nan'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']})

In [None]:
izq.merge(der, left_on='key', right_on='key2')

In [None]:
izq.merge(der, left_on='key', right_on='key2', how='left')

In [None]:
izq.merge(der, left_on='key', right_on='key2', how='right')

In [None]:
izq.merge(der, left_on='key', right_on='key2', how='inner')
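
To see which rows each `how` option keeps, `merge` accepts `indicator=True`, which adds a `_merge` column saying whether a row came from the left frame, the right frame, or both. A minimal sketch with simplified stand-ins for `izq` and `der` (both use a shared `key` column here, which is an assumption made for brevity):

```python
import pandas as pd

izq = pd.DataFrame({'key': ['k0', 'k1', 'k2'], 'A': ['A0', 'A1', 'A2']})
der = pd.DataFrame({'key': ['k0', 'k1', 'k3'], 'C': ['C0', 'C1', 'C3']})

# outer keeps every key from both sides; _merge records where each row came from
merged = izq.merge(der, on='key', how='outer', indicator=True)
origin = dict(zip(merged['key'], merged['_merge'].astype(str)))

# inner keeps only the keys present on both sides
inner = izq.merge(der, on='key', how='inner')

print(merged)
print(inner['key'].tolist())
```

A `how='left'` merge would keep the rows marked `both` and `left_only`; `how='right'` would keep `both` and `right_only`.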

## Join

In [None]:
izq = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']}, 
index=['k0', 'k1', 'k2'])

der = pd.DataFrame({'C': ['C0', 'C1', 'C2'],
'D': ['D0', 'D1', 'D2']},
index= ['k0', 'k2', 'k3'])


In [None]:
izq.join(der)

In [None]:
izq.join(der, how='inner')

In [None]:
izq.join(der, how='outer')
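
`join` matches rows on the index and defaults to `how='left'`, so it can be read as shorthand for an index-based `merge`. The sketch below checks that equivalence on reduced versions of `izq` and `der` (one column each, to keep it short):

```python
import pandas as pd

izq = pd.DataFrame({'A': ['A0', 'A1', 'A2']}, index=['k0', 'k1', 'k2'])
der = pd.DataFrame({'C': ['C0', 'C1', 'C2']}, index=['k0', 'k2', 'k3'])

# join aligns on the index; the default how='left' keeps every row of izq
joined = izq.join(der)

# The same operation spelled out with merge
merged = izq.merge(der, left_index=True, right_index=True, how='left')

print(joined)
print(joined.equals(merged))
```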

## Pivot and Melt

...


## Apply

In [None]:
def two_times(value):
    return value * 2

In [None]:
df_books.head()

In [None]:
df_books['User Rating'].apply(two_times)

In [None]:
# More concise than an explicit for loop; for simple arithmetic, a vectorized expression such as df_books['User Rating'] * 2 is faster still
df_books['Rating_2'] = df_books['User Rating'].apply(two_times)

In [None]:
df_books.head()

In [None]:
df_books['Rating_2'] = df_books['User Rating'].apply(lambda x : x * 3)
df_books.head()

In [None]:
df_books['Rating_2'] = df_books.apply(lambda x : x['User Rating'] * 2 if x['Genre'] == 'Fiction' else x['User Rating'], axis=1)
df_books.head()
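
Row-wise `apply` with `axis=1` calls the lambda once per row, which can be slow on large frames. For a simple condition like the one above, `np.where` is a vectorized alternative that yields the same values. A minimal sketch on stand-in data (`df_demo`, `with_apply` and `with_where` are illustrative names):

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    'User Rating': [4.7, 4.6, 4.8],
    'Genre': ['Non Fiction', 'Fiction', 'Fiction'],
})

# Row by row: the lambda runs once for every row
with_apply = df_demo.apply(
    lambda row: row['User Rating'] * 2 if row['Genre'] == 'Fiction' else row['User Rating'],
    axis=1,
)

# Vectorized: np.where(condition, value_if_true, value_if_false) works on whole columns
with_where = np.where(
    df_demo['Genre'] == 'Fiction',
    df_demo['User Rating'] * 2,
    df_demo['User Rating'],
)

print(with_apply.tolist())
print(with_where.tolist())
```

`apply` remains useful when the per-row logic cannot be expressed with column operations.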