# The `pandas` library


[`pandas`](https://pandas.pydata.org/) is a Python library specialized in handling and analyzing structured data.

The main features of this library are:
* It defines new data structures built on the `array` type from *NumPy*, with added functionality.
* It makes it easy to read and write CSV files, Excel files, and SQL databases.
* It lets you access data through indices or names for rows and columns.
* It offers methods to reorder, split, and combine datasets.
* It supports working with time series.

## Reading CSV files

* First, we upload the files we will need: `data_xml.xml`, `page.html`, `salesman.txt`, `times.csv`, and `tmdb_5000_movies.csv`. These files are in the course repository.

In [None]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Saving times.csv to times.csv
User uploaded file "times.csv" with length 130 bytes


* We import the `pandas` library to use it throughout this notebook.

In [37]:
import pandas as pd
import numpy as np

* Next, we load the data from the file into a pandas `DataFrame`. A `DataFrame` is a dataset structured as a table in which each row is a separate record and the columns are the record's features.

* A `DataFrame` has two indices, one for the rows and one for the columns. Values can be accessed through the row index and the column names (or positions).

In [None]:
df = pd.read_csv('tmdb_5000_movies.csv')
# The `head` method displays the first rows of a DataFrame.
df.head(2)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500


In [None]:
# The `info` method displays the data type and non-null count of each column (feature).
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4803 non-null   int64  
 1   genres                4803 non-null   object 
 2   homepage              1712 non-null   object 
 3   id                    4803 non-null   int64  
 4   keywords              4803 non-null   object 
 5   original_language     4803 non-null   object 
 6   original_title        4803 non-null   object 
 7   overview              4800 non-null   object 
 8   popularity            4803 non-null   float64
 9   production_companies  4803 non-null   object 
 10  production_countries  4803 non-null   object 
 11  release_date          4802 non-null   object 
 12  revenue               4803 non-null   int64  
 13  runtime               4801 non-null   float64
 14  spoken_languages      4803 non-null   object 
 15  status               

In [None]:
# The `describe` method gives basic summary statistics for the numeric columns.
df.describe()

Unnamed: 0,budget,id,popularity,revenue,runtime,vote_average,vote_count
count,4803.0,4803.0,4803.0,4803.0,4801.0,4803.0,4803.0
mean,29045040.0,57165.484281,21.492301,82260640.0,106.875859,6.092172,690.217989
std,40722390.0,88694.614033,31.81665,162857100.0,22.611935,1.194612,1234.585891
min,0.0,5.0,0.0,0.0,0.0,0.0,0.0
25%,790000.0,9014.5,4.66807,0.0,94.0,5.6,54.0
50%,15000000.0,14629.0,12.921594,19170000.0,103.0,6.2,235.0
75%,40000000.0,58610.5,28.313505,92917190.0,118.0,6.8,737.0
max,380000000.0,459488.0,875.581305,2787965000.0,338.0,10.0,13752.0
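
* The three inspection methods above can be tried without any external file. The following is a minimal sketch that reads a small in-memory CSV; the column names and values are hypothetical and were invented only for this illustration.

```python
import io
import pandas as pd

# Hypothetical data, invented for this illustration only.
csv_text = """title,budget,runtime
Film A,100,90.0
Film B,200,
Film C,300,120.0
"""

movies = pd.read_csv(io.StringIO(csv_text))

print(movies.head(2))        # first two rows
movies.info()                # dtypes and non-null counts per column
summary = movies.describe()  # statistics for the numeric columns only
print(summary)
```

* Note that the empty `runtime` field is read as `NaN`, so `describe` reports a count of 2 for that column.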


* `pandas` can read any CSV-formatted file, whatever its file extension.

In [None]:
df = pd.read_csv('salesman.txt')
df.head()

Unnamed: 0,salesman_id,name,city,commission
0,5001,James Hoog,New York,0.15
1,5002,Nail Knite,Paris,0.13
2,5005,Pit Alex,London,0.11
3,5006,Mc Lyon,Paris,0.14
4,5007,Paul Adam,Rome,0.13


* It also works when the file uses a separator other than `,`.

In [None]:
df = pd.read_csv('times.csv', sep=';')
df.head()

Unnamed: 0,1,6,12:01:03,"0,50",WORST
0,2,16,07:42:51,32,BEST
1,3,19,12:01:29,50,
2,4,13,03:22:50,14,INTERMEDIATE
3,5,8,09:30:03,40,WORST


* We can also assign names to the columns (when the file has no header row).

In [None]:
df = pd.read_csv('times.csv', sep=';', names=['ID', 'TIME','SCORE','CLASS'])
df.head()

Unnamed: 0,ID,TIME,SCORE,CLASS
1,6,12:01:03,50,WORST
2,16,07:42:51,32,BEST
3,19,12:01:29,50,
4,13,03:22:50,14,INTERMEDIATE
5,8,09:30:03,40,WORST
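
* In the cell above only four names are given for five fields, so pandas uses the first field as the row index. The sketch below names every field instead; the data and the column name `ROW` are hypothetical, chosen only for this illustration.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data with no header row.
raw = "1;6;12:01:03;50;WORST\n2;16;07:42:51;32;BEST\n3;19;12:01:29;50;\n"

times = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    header=None,  # the first line is data, not column names
    names=["ROW", "ID", "TIME", "SCORE", "CLASS"],
)
print(times)
```

* Passing `names` already implies `header=None`; it is written out here only to make the intent explicit.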


* Sometimes we only need some of the columns. For example, from the file `tmdb_5000_movies.csv` we only need the columns `id` (3), `original_title` (6), `popularity` (8), `vote_average` (18), and `vote_count` (19), counting positions from 0.

In [None]:
df = pd.read_csv('tmdb_5000_movies.csv', usecols=[3, 6, 8, 18, 19])
df.head(2)

Unnamed: 0,id,original_title,popularity,vote_average,vote_count
0,19995,Avatar,150.437577,7.2,11800
1,285,Pirates of the Caribbean: At World's End,139.082615,6.9,4500
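
* Column positions are easy to miscount, so it can be safer to pass column names to `usecols`. This is a minimal sketch on hypothetical in-memory data, not on the course file.

```python
import io
import pandas as pd

# Hypothetical data, invented for this illustration only.
csv_text = "id,title,popularity,votes\n10,Film A,1.5,100\n20,Film B,2.5,200\n"

# Selecting by name keeps working even if the column order changes.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["id", "title", "votes"])
print(subset)
```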


## Reading HTML files

* We can use `pandas` to read all the tables contained in an HTML file. `read_html` returns a list with one `DataFrame` per table.

In [None]:
df = pd.read_html('page.html')
print('Total tables:', len(df))
print(df[0])
print(df[1])

Total tables: 2
                        Company           Contact  Country
0           Alfreds Futterkiste      Maria Anders  Germany
1    Centro comercial Moctezuma   Francisco Chang   Mexico
2                  Ernst Handel     Roland Mendel  Austria
3                Island Trading     Helen Bennett       UK
4  Laughing Bacchus Winecellars   Yoshi Tannamuri   Canada
5  Magazzini Alimentari Riuniti  Giovanni Rovelli    Italy
             HEAD1            HEAD2
0  Row 1, Column 1  Row 1, Column 2
1  Row 2, Column 1  Row 2, Column 2


In [None]:
table1 = df[0]
table1.head()

Unnamed: 0,Company,Contact,Country
0,Alfreds Futterkiste,Maria Anders,Germany
1,Centro comercial Moctezuma,Francisco Chang,Mexico
2,Ernst Handel,Roland Mendel,Austria
3,Island Trading,Helen Bennett,UK
4,Laughing Bacchus Winecellars,Yoshi Tannamuri,Canada


* We can also extract only the tables whose text matches a given string or regular expression, using the `match` parameter.

In [None]:
df = pd.read_html('https://en.wikipedia.org/wiki/Economy_of_the_United_States', match='Nominal GDP')
print('Total tables:', len(df))
df_GDP = df[0]
df_GDP.head(4)

Total tables: 1


Unnamed: 0,No.,Country/Economy,Nominal GDP,Agri.,Indus.,Serv.
0,1,United States,18624450,204868.95,3613143.3,14806437.75
1,*Percentages from CIA World Factbook[130],*Percentages from CIA World Factbook[130],*Percentages from CIA World Factbook[130],*Percentages from CIA World Factbook[130],*Percentages from CIA World Factbook[130],*Percentages from CIA World Factbook[130]


## Reading XML files
* XML stands for Extensible Markup Language: a markup language that defines a set of rules for encoding documents. A markup language is a set of codes applied to text so that data created by computers or people can be read and analyzed. XML provides a framework for defining custom elements, so you can create your own format and markup vocabulary.

In [None]:
df = pd.read_xml('data_xml.xml')
df.head()

Unnamed: 0,name,price,description,calories
0,Belgian Waffles,$5.95,Two of our famous Belgian Waffles with plenty ...,650
1,Strawberry Belgian Waffles,$7.95,Light Belgian waffles covered with strawberrie...,900
2,Berry-Berry Belgian Waffles,$8.95,Belgian waffles covered with assorted fresh be...,900
3,French Toast,$4.50,Thick slices made from our homemade sourdough ...,600
4,Homestyle Breakfast,$6.95,"Two eggs, bacon or sausage, toast, and our eve...",950
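
* To see how elements map to columns, here is a minimal sketch that parses a small XML string. The document below is hypothetical, and `parser="etree"` is chosen here only so the example runs without the optional `lxml` dependency.

```python
import io
import pandas as pd

# Hypothetical XML document, invented for this illustration.
xml_text = """<menu>
  <food><name>Waffles</name><price>$5.95</price><calories>650</calories></food>
  <food><name>Toast</name><price>$4.50</price><calories>600</calories></food>
</menu>
"""

# Each child of the root becomes a row; its sub-elements become columns.
# parser="etree" uses Python's standard library instead of lxml.
menu = pd.read_xml(io.StringIO(xml_text), parser="etree")
print(menu)
```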


## Creating a `Series` from a list of values

In [39]:
s = pd.Series([1, 3, 5, np.nan, 9, 11])
s

0     1.0
1     3.0
2     5.0
3     NaN
4     9.0
5    11.0
dtype: float64

In [40]:
s[0]

1.0
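
* A `Series` can also carry its own labels. The following sketch, with hypothetical values and labels, shows access by label and how missing values are treated by default.

```python
import numpy as np
import pandas as pd

prices = pd.Series([1.0, 3.0, np.nan, 9.0], index=["a", "b", "c", "d"])

print(prices["b"])          # access by label
print(prices.isna().sum())  # number of missing values
print(prices.sum())         # NaN is skipped by default
```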

## Creating a `DataFrame`

* We will create a `DataFrame` of random numbers whose index holds the dates from January 1, 2020 through January 6, 2020. The `DataFrame` will have 4 columns named `A`, `B`, `C`, and `D`.

In [41]:
dates = pd.date_range('20200101', periods=6)
dates

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05', '2020-01-06'],
              dtype='datetime64[ns]', freq='D')

In [42]:
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df.head()

Unnamed: 0,A,B,C,D
2020-01-01,0.335348,0.926352,0.394239,0.050463
2020-01-02,-0.289123,0.470974,0.345542,0.919596
2020-01-03,1.099639,1.190708,-0.976664,0.130905
2020-01-04,-1.294632,-1.74977,1.50152,-2.418377
2020-01-05,0.147514,0.225269,0.224972,0.614694
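
* Because the values are random, they change on every run. If reproducible numbers are needed, one option is a seeded NumPy generator, as in this sketch; the seed value and the variable names are arbitrary choices made for the illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # the seed value is arbitrary
dates = pd.date_range("2020-01-01", periods=6)
frame = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))
print(frame.shape)
```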


* We can also create a `DataFrame` from a dictionary whose values are objects of different types.

In [43]:
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20200101'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': pd.Series(2, index=list(range(4)), dtype='float32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})
df2

Unnamed: 0,A,B,C,D,E,F
0,1.0,2020-01-01,1.0,2.0,test,foo
1,1.0,2020-01-01,1.0,2.0,train,foo
2,1.0,2020-01-01,1.0,2.0,test,foo
3,1.0,2020-01-01,1.0,2.0,train,foo


In [45]:
df2.dtypes

A           float64
B    datetime64[ns]
C           float32
D           float32
E          category
F            object
dtype: object

## Viewing the first and last rows of a `DataFrame`

In [46]:
df.head(2)

Unnamed: 0,A,B,C,D
2020-01-01,0.335348,0.926352,0.394239,0.050463
2020-01-02,-0.289123,0.470974,0.345542,0.919596


In [47]:
df.tail(2)

Unnamed: 0,A,B,C,D
2020-01-05,0.147514,0.225269,0.224972,0.614694
2020-01-06,-0.187903,0.051085,0.496914,-0.854606


## Transposing the data

* In other words, turn the columns into the index and the index into columns.

In [48]:
df.T

Unnamed: 0,2020-01-01,2020-01-02,2020-01-03,2020-01-04,2020-01-05,2020-01-06
A,0.335348,-0.289123,1.099639,-1.294632,0.147514,-0.187903
B,0.926352,0.470974,1.190708,-1.74977,0.225269,0.051085
C,0.394239,0.345542,-0.976664,1.50152,0.224972,0.496914
D,0.050463,0.919596,0.130905,-2.418377,0.614694,-0.854606


## Selecting columns

* To select the values of column `A`, we have the following options:

In [49]:
df["A"]

2020-01-01    0.335348
2020-01-02   -0.289123
2020-01-03    1.099639
2020-01-04   -1.294632
2020-01-05    0.147514
2020-01-06   -0.187903
Freq: D, Name: A, dtype: float64

In [50]:
df['A']

2020-01-01    0.335348
2020-01-02   -0.289123
2020-01-03    1.099639
2020-01-04   -1.294632
2020-01-05    0.147514
2020-01-06   -0.187903
Freq: D, Name: A, dtype: float64

In [51]:
df.A

2020-01-01    0.335348
2020-01-02   -0.289123
2020-01-03    1.099639
2020-01-04   -1.294632
2020-01-05    0.147514
2020-01-06   -0.187903
Freq: D, Name: A, dtype: float64
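
* The options above return a single column as a `Series`. This sketch, on hypothetical data, shows how to select several columns at once and why bracket access is the more general form.

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2], "B": [3, 4], "unit price": [5.0, 6.0]})

one_column = frame["A"]          # a Series
two_columns = frame[["A", "B"]]  # a DataFrame (note the list of names)
spaced = frame["unit price"]     # attribute access cannot express this name
```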

## Selecting rows

In [52]:
df[0:3]

Unnamed: 0,A,B,C,D
2020-01-01,0.335348,0.926352,0.394239,0.050463
2020-01-02,-0.289123,0.470974,0.345542,0.919596
2020-01-03,1.099639,1.190708,-0.976664,0.130905


In [53]:
df['2020-01-01':'2020-01-03']

Unnamed: 0,A,B,C,D
2020-01-01,0.335348,0.926352,0.394239,0.050463
2020-01-02,-0.289123,0.470974,0.345542,0.919596
2020-01-03,1.099639,1.190708,-0.976664,0.130905


## Selecting by position

* To select the rows for 2020-01-04 and 2020-01-05 together with columns `A` and `B`:

In [54]:
df.iloc[3:5, 0:2]

Unnamed: 0,A,B
2020-01-04,-1.294632,-1.74977
2020-01-05,0.147514,0.225269


* And to select the first value of the first row:

In [56]:
df.iloc[0,0]

0.3353479453489034
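
* The same selection can be written with labels instead of positions. The sketch below, on hypothetical data, contrasts `loc` (label-based, end label included) with `iloc` (position-based, end position excluded).

```python
import pandas as pd

dates = pd.date_range("2020-01-01", periods=6)
frame = pd.DataFrame({"A": range(6), "B": range(10, 16)}, index=dates)

by_label = frame.loc["2020-01-04":"2020-01-05", ["A", "B"]]  # end label included
by_position = frame.iloc[3:5, 0:2]                          # end position excluded
```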

## Filtering values

* Keep only the values of the `DataFrame` `df` that are greater than 0; every other entry becomes `NaN`.

In [57]:
df[df > 0]

Unnamed: 0,A,B,C,D
2020-01-01,0.335348,0.926352,0.394239,0.050463
2020-01-02,,0.470974,0.345542,0.919596
2020-01-03,1.099639,1.190708,,0.130905
2020-01-04,,,1.50152,
2020-01-05,0.147514,0.225269,0.224972,0.614694
2020-01-06,,0.051085,0.496914,


* Select the rows of the `DataFrame` `df` in which column `A` is greater than 0.

In [58]:
df[df.A > 0]

Unnamed: 0,A,B,C,D
2020-01-01,0.335348,0.926352,0.394239,0.050463
2020-01-03,1.099639,1.190708,-0.976664,0.130905
2020-01-05,0.147514,0.225269,0.224972,0.614694


* Select the rows of the `DataFrame` `df2` in which column `E` equals `test` or `otro`.

In [59]:
df2[df2['E'].isin(['test','otro'])]

Unnamed: 0,A,B,C,D,E,F
0,1.0,2020-01-01,1.0,2.0,test,foo
2,1.0,2020-01-01,1.0,2.0,test,foo
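
* Conditions can be combined with `&` (and), `|` (or), and `~` (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons. A minimal sketch on hypothetical data:

```python
import pandas as pd

frame = pd.DataFrame({"A": [-1, 2, 3, -4], "B": [5, -6, 7, 8]})

both_positive = frame[(frame["A"] > 0) & (frame["B"] > 0)]
either_positive = frame[(frame["A"] > 0) | (frame["B"] > 0)]
not_positive_a = frame[~(frame["A"] > 0)]
```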


## Working with missing data

In [63]:
df3 = df[df > 0]
df3

Unnamed: 0,A,B,C,D
2020-01-01,0.335348,0.926352,0.394239,0.050463
2020-01-02,,0.470974,0.345542,0.919596
2020-01-03,1.099639,1.190708,,0.130905
2020-01-04,,,1.50152,
2020-01-05,0.147514,0.225269,0.224972,0.614694
2020-01-06,,0.051085,0.496914,


* Drop the rows that have a missing value (`NaN`) in any column.

In [64]:
df3.dropna(how='any')

Unnamed: 0,A,B,C,D
2020-01-01,0.335348,0.926352,0.394239,0.050463
2020-01-05,0.147514,0.225269,0.224972,0.614694


* Fill the missing values (`NaN`) with the value 10.

In [67]:
df3.fillna(value=10)

Unnamed: 0,A,B,C,D
2020-01-01,0.335348,0.926352,0.394239,0.050463
2020-01-02,10.0,0.470974,0.345542,0.919596
2020-01-03,1.099639,1.190708,10.0,0.130905
2020-01-04,10.0,10.0,1.50152,10.0
2020-01-05,0.147514,0.225269,0.224972,0.614694
2020-01-06,10.0,0.051085,0.496914,10.0
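
* A few other common variants are sketched below on hypothetical data: dropping only fully empty rows, requiring a value in one specific column, and filling each gap with its column mean instead of a constant.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, np.nan, 6.0]})

all_missing_dropped = frame.dropna(how="all")  # drop rows where every value is NaN
a_required = frame.dropna(subset=["A"])        # drop rows where A is NaN
filled_with_mean = frame.fillna(frame.mean())  # fill with each column's mean
```

* None of these calls modifies `frame`; each returns a new `DataFrame`.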


## Deleting a row

In [90]:
df4 = pd.DataFrame({ 'A': [1,2,3,4, 5], 
                   'B': [10,20,30,40, 50],
                   'C': [20,40,60,80, 100]
                  }, 
                  index=['Linea 1', 'Linea 2', 'Linea 3', 'Linea 4', 'Linea 5'])
df4

Unnamed: 0,A,B,C
Linea 1,1,10,20
Linea 2,2,20,40
Linea 3,3,30,60
Linea 4,4,40,80
Linea 5,5,50,100


In [93]:
# Drop the fifth row: look up its label by position (4), then drop that label.
df5 = df4.drop(df4.index[[4]])
df5

Unnamed: 0,A,B,C
Linea 1,1,10,20
Linea 2,2,20,40
Linea 3,3,30,60
Linea 4,4,40,80
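
* When the label is already known, it can be passed to `drop` directly, and the same method removes columns. A minimal sketch on hypothetical data:

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]},
                     index=["Line 1", "Line 2", "Line 3"])

without_row = frame.drop(index="Line 3")  # drop a row by its label
without_col = frame.drop(columns="B")     # drop a column by its name
```

* `drop` returns a new `DataFrame` and leaves `frame` unchanged.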


## Grouping values

In [94]:
df6 = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar','foo', 'bar', 'foo', 'foo'],
                    'B': ['one', 'one', 'two', 'three',
                    'two', 'two', 'one', 'three'],
                    'C': np.random.randn(8),
                    'D': np.random.randn(8)})
df6

Unnamed: 0,A,B,C,D
0,foo,one,-1.455822,0.175207
1,bar,one,1.060497,-1.669135
2,foo,two,-1.035946,0.058227
3,bar,three,-0.01183,-0.301112
4,foo,two,-0.96774,-2.221251
5,bar,two,-0.652096,0.388022
6,foo,one,0.148518,-1.361776
7,foo,three,1.328199,0.44842


* Group the rows of the `DataFrame` `df6` by column `A` and sum columns `C` and `D`.

In [95]:
# numeric_only=True leaves the text column `B` out of the aggregation.
df6.groupby('A').sum(numeric_only=True)

Unnamed: 0_level_0,C,D
A,Unnamed: 1_level_1,Unnamed: 2_level_1
bar,0.39657,-1.582224
foo,-1.982791,-2.901173


In [96]:
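# The mean is not defined for the text column `B`, so restrict to numeric columns.
df6.groupby('A').mean(numeric_only=True)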

Unnamed: 0_level_0,C,D
A,Unnamed: 1_level_1,Unnamed: 2_level_1
bar,0.13219,-0.527408
foo,-0.396558,-0.580235


* Group the rows of the `DataFrame` `df6` by columns `A` and `B`, then sum columns `C` and `D`.

In [97]:
df6.groupby(['A','B']).sum()

Unnamed: 0_level_0,Unnamed: 1_level_0,C,D
A,B,Unnamed: 2_level_1,Unnamed: 3_level_1
bar,one,1.060497,-1.669135
bar,three,-0.01183,-0.301112
bar,two,-0.652096,0.388022
foo,one,-1.307304,-1.186569
foo,three,1.328199,0.44842
foo,two,-2.003686,-2.163024
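
* Several statistics can be computed in one pass with `agg`. The sketch below uses fixed, hypothetical values so the result is easy to check by hand.

```python
import pandas as pd

frame = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "B": ["one", "one", "two", "two"],
    "C": [1.0, 2.0, 3.0, 4.0],
})

# Selecting "C" before aggregating keeps the text column "B" out of the result.
totals = frame.groupby("A")["C"].sum()
stats = frame.groupby("A")["C"].agg(["sum", "mean", "count"])
print(stats)
```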


## Sorting

* Sort by a single key (here `A`, which is the index after grouping).

In [98]:
sortA = df6.groupby('A').sum(numeric_only=True)
sortA = sortA.sort_values(by='A', ascending=False)
sortA

Unnamed: 0_level_0,C,D
A,Unnamed: 1_level_1,Unnamed: 2_level_1
foo,-1.982791,-2.901173
bar,0.39657,-1.582224


* Sort by several keys.

In [99]:
sortA = df6.groupby(['A','B']).sum()
sortA = sortA.sort_values(by=['A','B'], ascending=True)
sortA

Unnamed: 0_level_0,Unnamed: 1_level_0,C,D
A,B,Unnamed: 2_level_1,Unnamed: 3_level_1
bar,one,1.060497,-1.669135
bar,three,-0.01183,-0.301112
bar,two,-0.652096,0.388022
foo,one,-1.307304,-1.186569
foo,three,1.328199,0.44842
foo,two,-2.003686,-2.163024
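
* `sort_values` also works on ordinary columns and accepts one direction per key. A minimal sketch on hypothetical data:

```python
import pandas as pd

frame = pd.DataFrame({"name": ["a", "b", "c", "d"],
                      "group": [2, 1, 2, 1],
                      "score": [0.5, 0.9, 0.7, 0.1]})

by_score = frame.sort_values(by="score", ascending=False)
# Ascending by group, then descending by score within each group.
by_group_then_score = frame.sort_values(by=["group", "score"], ascending=[True, False])
by_index = by_score.sort_index()  # restores the original row order
```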


## Merging

* `merge` combines two `DataFrame` objects into one. For this to work, both must share a key column.

In [78]:
customer_product = {'Customer_id':pd.Series([1,2,3,4,5,6]),
  'Product':pd.Series(['Oven','Oven','Oven','Television','Television','Television'])}
customer_product = pd.DataFrame(customer_product)
customer_product

Unnamed: 0,Customer_id,Product
0,1,Oven
1,2,Oven
2,3,Oven
3,4,Television
4,5,Television
5,6,Television


In [79]:
customer = {'Customer_id':pd.Series([2,4,6]),
    'State':pd.Series(['California','California','Texas'])}
customer = pd.DataFrame(customer)
customer

Unnamed: 0,Customer_id,State
0,2,California
1,4,California
2,6,Texas


* Perform a left join, which keeps every row of the left table (`customer`).

In [80]:
pd.merge(customer, customer_product, on='Customer_id', how='left')

Unnamed: 0,Customer_id,State,Product
0,2,California,Oven
1,4,California,Television
2,6,Texas,Television


* Perform a right join, which keeps every row of the right table (`customer_product`).

In [81]:
pd.merge(customer, customer_product, on='Customer_id', how='right')

Unnamed: 0,Customer_id,State,Product
0,1,,Oven
1,2,California,Oven
2,3,,Oven
3,4,California,Television
4,5,,Television
5,6,Texas,Television


* Or return only the rows whose key appears in both tables (inner join).

In [82]:
pd.merge(customer, customer_product, on='Customer_id', how='inner')

Unnamed: 0,Customer_id,State,Product
0,2,California,Oven
1,4,California,Television
2,6,Texas,Television
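
* A fourth option, `how='outer'`, keeps the rows of both tables. Combined with `indicator=True`, it also records where each row came from. The tables below are hypothetical and smaller than the ones above.

```python
import pandas as pd

orders = pd.DataFrame({"Customer_id": [1, 2, 3],
                       "Product": ["Oven", "Oven", "Television"]})
states = pd.DataFrame({"Customer_id": [2, 3, 4],
                       "State": ["California", "Texas", "Ohio"]})

# The extra `_merge` column reports left_only, right_only, or both.
merged = pd.merge(orders, states, on="Customer_id", how="outer", indicator=True)
print(merged)
```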


## The `apply` method

* This method applies a function along an axis of a `DataFrame`, or to each value of a `Series`.

In [100]:
df7 = pd.DataFrame({ 'A': [1,2,3,4], 
                   'B': [10,20,30,40],
                   'C': [20,40,60,80]
                  }, 
                  index=['Linea 1', 'Linea 2', 'Linea 3', 'Linea 4'])
df7

Unnamed: 0,A,B,C
Linea 1,1,10,20
Linea 2,2,20,40
Linea 3,3,30,60
Linea 4,4,40,80


A function can be applied to each row or to each column.

In [101]:
# Mean function
def custom_mean(row):
    return row.mean()

# axis=0: the function receives each column
# axis=1: the function receives each row
df7['D'] = df7.apply(custom_mean, axis=1)
df7

Unnamed: 0,A,B,C,D
Linea 1,1,10,20,10.333333
Linea 2,2,20,40,20.666667
Linea 3,3,30,60,31.0
Linea 4,4,40,80,41.333333


In [102]:
def custom_sum(col):
    return col.sum()

df7.loc['Sum'] = df7.apply(custom_sum, axis=0)
df7

Unnamed: 0,A,B,C,D
Linea 1,1.0,10.0,20.0,10.333333
Linea 2,2.0,20.0,40.0,20.666667
Linea 3,3.0,30.0,60.0,31.0
Linea 4,4.0,40.0,80.0,41.333333
Sum,10.0,100.0,200.0,103.333333
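
* `apply` is most useful for logic that has no built-in equivalent. The sketch below, on hypothetical data, computes a range (max minus min) per row and per column, and shows the built-in reduction that could replace `custom_mean`.

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2], "B": [10, 20], "C": [20, 40]})

row_range = frame.apply(lambda row: row.max() - row.min(), axis=1)  # one value per row
col_range = frame.apply(lambda col: col.max() - col.min(), axis=0)  # one value per column

# Built-in reductions give the same kind of result without a Python-level function.
row_mean = frame.mean(axis=1)
```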


## The `map` method

* The `map` method transforms each value of a `Series`.

In [103]:
s = pd.Series(['estufa','hornito','xbox','switch'])
s

0     estufa
1    hornito
2       xbox
3     switch
dtype: object

* We will classify each element as an appliance (`electrodoméstico`) or a console (`consola`).

In [104]:
s.map({'estufa':'electrodoméstico','hornito':'electrodoméstico','xbox':'consola','switch':'consola'})

0    electrodoméstico
1    electrodoméstico
2             consola
3             consola
dtype: object
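
* Two details are worth knowing: values missing from the dictionary become `NaN`, and `map` also accepts a function. The sketch below illustrates both; the extra item `radio` and the label `unknown` are hypothetical.

```python
import pandas as pd

items = pd.Series(["estufa", "xbox", "radio"])
categories = {"estufa": "electrodoméstico", "xbox": "consola"}

mapped = items.map(categories)            # "radio" has no entry, so it becomes NaN
lengths = items.map(len)                  # a function is applied to each element
with_default = mapped.fillna("unknown")   # replace the gaps with a fallback label
```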

## The `applymap` method

* It applies a function to every element of a `DataFrame`. In pandas 2.1 and later, `applymap` is deprecated in favor of `DataFrame.map`.

In [105]:
df8 = pd.DataFrame({ 'A': [1,2,3,4], 
                   'B': [10,20,30,40],
                   'C': [20,40,60,80]
                  }, 
                  index=['Linea 1', 'Linea 2', 'Linea 3', 'Linea 4'])
df8

Unnamed: 0,A,B,C
Linea 1,1,10,20
Linea 2,2,20,40
Linea 3,3,30,60
Linea 4,4,40,80


* Compute the square of every number in the `DataFrame`.

In [106]:
df8.applymap(np.square)

Unnamed: 0,A,B,C
Linea 1,1,100,400
Linea 2,4,400,1600
Linea 3,9,900,3600
Linea 4,16,1600,6400
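
* For arithmetic such as squaring, a vectorized expression gives the same result without calling a function per element. The sketch below, on hypothetical data, also shows one way to pick the element-wise method that exists in the installed pandas version.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"A": [1, 2], "B": [10, 20]})

squared = frame ** 2             # vectorized arithmetic on the whole table
also_squared = np.square(frame)  # NumPy functions accept a DataFrame directly

# For functions with no vectorized form, use whichever element-wise method exists:
# DataFrame.map (pandas 2.1+) or DataFrame.applymap (older versions).
elementwise = frame.map if hasattr(pd.DataFrame, "map") else frame.applymap
labels = elementwise(lambda v: f"#{v}")
```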
