<img src="https://user-images.githubusercontent.com/7065401/75165824-badf4680-5701-11ea-9c5b-5475b0a33abf.png"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# Pandas - DataFrames

Probably the most important data structure in pandas is the `DataFrame`. It is a tabular structure tightly integrated with `Series`.


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Hands on!

In [None]:
import numpy as np
import pandas as pd

We'll continue our analysis of the G7 countries and now look at DataFrames. As mentioned, a DataFrame looks a lot like a table (like the one you can see [here](https://docs.google.com/spreadsheets/d/1IlorV2-Oh9Da1JAZ7weVw86PQrQydSMp-ydVMH135iI/edit?usp=sharing)):

<img width="700" src="https://user-images.githubusercontent.com/872296/38153492-72c032ca-3443-11e8-80f4-9de9060a5127.png" />

Creating `DataFrame`s manually can be tedious. Most of the time you'll pull the data from a database, a CSV file or the web. Still, you can create a DataFrame by specifying the columns and their values:


In [None]:
df = pd.DataFrame({
    'Population': [35.467, 63.951, 80.94 , 60.665, 127.061, 64.511, 318.523],
    'GDP': [
        1785387,
        2833687,
        3874437,
        2167744,
        4602367,
        2950039,
        17348075
    ],
    'Surface Area': [
        9984670,
        640679,
        357114,
        301336,
        377930,
        242495,
        9525067
    ],
    'HDI': [
        0.913,
        0.888,
        0.916,
        0.873,
        0.891,
        0.907,
        0.915
    ],
    'Continent': [
        'America',
        'Europe',
        'Europe',
        'Europe',
        'Asia',
        'Europe',
        'America'
    ]
}, columns=['Population', 'GDP', 'Surface Area', 'HDI', 'Continent'])

_(The `columns` argument is optional. I'm using it to keep the same order as in the image above.)_
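To make the effect of `columns` concrete, here is a minimal sketch on made-up data (the names `data`, `default_order` and `custom_order` are hypothetical and not part of the G7 example). It only shows that the listed order wins over the dictionary's insertion order.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the G7 table.
data = {'b': [1, 2], 'a': [3, 4]}

# Without `columns`, the dictionary's insertion order is kept.
default_order = pd.DataFrame(data)

# With `columns`, the listed order is used instead.
custom_order = pd.DataFrame(data, columns=['a', 'b'])

print(list(default_order.columns))  # ['b', 'a']
print(list(custom_order.columns))   # ['a', 'b']
```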


In [None]:
df

Unnamed: 0,Population,GDP,Surface Area,HDI,Continent
0,35.467,1785387,9984670,0.913,America
1,63.951,2833687,640679,0.888,Europe
2,80.94,3874437,357114,0.916,Europe
3,60.665,2167744,301336,0.873,Europe
4,127.061,4602367,377930,0.891,Asia
5,64.511,2950039,242495,0.907,Europe
6,318.523,17348075,9525067,0.915,America


`DataFrame`s also have indexes. As you can see in the "table" above, pandas has automatically assigned a numeric, auto-incrementing index to each "row" of our DataFrame. In our case, we know that each row represents a country, so we'll simply reassign the index:


In [None]:
df.index = [
    'Canada',
    'France',
    'Germany',
    'Italy',
    'Japan',
    'United Kingdom',
    'United States',
]
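When the labels are known up front, the index can also be passed at construction time through the `index` argument. A minimal sketch, using a hypothetical two-row `toy` frame instead of the full G7 data:

```python
import pandas as pd

# Hypothetical two-row frame; values copied from the first rows above.
toy = pd.DataFrame(
    {'Population': [35.467, 63.951]},
    index=['Canada', 'France'],  # labels supplied at creation time
)

print(toy.index.tolist())  # ['Canada', 'France']
```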

In [None]:
df

Unnamed: 0,Population,GDP,Surface Area,HDI,Continent
Canada,35.467,1785387,9984670,0.913,America
France,63.951,2833687,640679,0.888,Europe
Germany,80.94,3874437,357114,0.916,Europe
Italy,60.665,2167744,301336,0.873,Europe
Japan,127.061,4602367,377930,0.891,Asia
United Kingdom,64.511,2950039,242495,0.907,Europe
United States,318.523,17348075,9525067,0.915,America


In [None]:
df.columns

Index(['Population', 'GDP', 'Surface Area', 'HDI', 'Continent'], dtype='object')

In [None]:
df.index

Index(['Canada', 'France', 'Germany', 'Italy', 'Japan', 'United Kingdom',
       'United States'],
      dtype='object')

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 7 entries, Canada to United States
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Population    7 non-null      float64
 1   GDP           7 non-null      int64  
 2   Surface Area  7 non-null      int64  
 3   HDI           7 non-null      float64
 4   Continent     7 non-null      object 
dtypes: float64(2), int64(2), object(1)
memory usage: 336.0+ bytes


In [None]:
df.size

35

In [None]:
df.shape

(7, 5)

In [None]:
df.describe()

Unnamed: 0,Population,GDP,Surface Area,HDI
count,7.0,7.0,7.0,7.0
mean,107.302571,5080248.0,3061327.0,0.900429
std,97.24997,5494020.0,4576187.0,0.016592
min,35.467,1785387.0,242495.0,0.873
25%,62.308,2500716.0,329225.0,0.8895
50%,64.511,2950039.0,377930.0,0.907
75%,104.0005,4238402.0,5082873.0,0.914
max,318.523,17348080.0,9984670.0,0.916


In [None]:
df.dtypes

Population      float64
GDP               int64
Surface Area      int64
HDI             float64
Continent        object
dtype: object

In [None]:
df.dtypes.value_counts()

float64    2
int64      2
object     1
dtype: int64

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Indexing, selection and slicing

Individual columns in the DataFrame can be selected with regular indexing (`df['Population']`), and individual rows with `loc` and `iloc`. In both cases the result is a `Series`:


In [None]:
df

Unnamed: 0,Population,GDP,Surface Area,HDI,Continent
Canada,35.467,1785387,9984670,0.913,America
France,63.951,2833687,640679,0.888,Europe
Germany,80.94,3874437,357114,0.916,Europe
Italy,60.665,2167744,301336,0.873,Europe
Japan,127.061,4602367,377930,0.891,Asia
United Kingdom,64.511,2950039,242495,0.907,Europe
United States,318.523,17348075,9525067,0.915,America


In [None]:
df.loc['Canada']

Population       35.467
GDP             1785387
Surface Area    9984670
HDI               0.913
Continent       America
Name: Canada, dtype: object

In [None]:
df.iloc[-1]

Population       318.523
GDP             17348075
Surface Area     9525067
HDI                0.915
Continent        America
Name: United States, dtype: object

In [None]:
df['Population']

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: Population, dtype: float64

Note that the `index` of the returned Series is the same as the DataFrame's, and its `name` is the column name. If you're working in a notebook and want a more DataFrame-like format, you can use the `to_frame` method:


In [None]:
df['Population'].to_frame()
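As a quick check of what `to_frame` produces, the sketch below uses a hypothetical standalone Series (`s`) rather than the notebook's `df`. The Series name becomes the single column name.

```python
import pandas as pd

# Hypothetical Series mimicking df['Population'] for two countries.
s = pd.Series([35.467, 63.951], index=['Canada', 'France'], name='Population')

frame = s.to_frame()  # one-column DataFrame; the index is preserved

print(type(frame).__name__)    # DataFrame
print(frame.columns.tolist())  # ['Population']
```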

Multiple columns can also be selected by passing a list of names, similarly to `numpy` and `Series`:


In [None]:
df[['Population', 'GDP']]

In this case, the result is another `DataFrame`. Slicing works differently: it acts at the "row" level, which can be a bit counterintuitive:


In [None]:
df[1:3]
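The contrast between the two bracket forms can be sketched on a small hypothetical frame (`toy`): a label inside the brackets picks a column, while a slice picks rows by position.

```python
import pandas as pd

# Hypothetical three-row frame; values copied from the table above.
toy = pd.DataFrame(
    {'Population': [35.467, 63.951, 80.94], 'HDI': [0.913, 0.888, 0.916]},
    index=['Canada', 'France', 'Germany'],
)

by_label = toy['HDI']  # a label selects a column (Series)
by_slice = toy[1:3]    # a slice selects rows at positions 1 and 2

print(by_slice.index.tolist())  # ['France', 'Germany']
```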

Row-level selection works better with `loc` and `iloc`, **which are recommended** over direct "slicing" (`df[:]`).

`loc` selects the rows matching the given index label:


In [None]:
df.loc['Italy']

Population       60.665
GDP             2167744
Surface Area     301336
HDI               0.873
Continent        Europe
Name: Italy, dtype: object

In [None]:
df.loc['France': 'Italy']

Unnamed: 0,Population,GDP,Surface Area,HDI,Continent
France,63.951,2833687,640679,0.888,Europe
Germany,80.94,3874437,357114,0.916,Europe
Italy,60.665,2167744,301336,0.873,Europe


As a second "argument", you can pass the column(s) you'd like to select:

In [None]:
df.loc['France': 'Italy', 'Population']

France     63.951
Germany    80.940
Italy      60.665
Name: Population, dtype: float64

In [None]:
df.loc['France': 'Italy', ['Population', 'GDP']]

Unnamed: 0,Population,GDP
France,63.951,2833687
Germany,80.94,3874437
Italy,60.665,2167744


`iloc` works with the (numeric) "position" of the index:


In [None]:
df

Unnamed: 0,Population,GDP,Surface Area,HDI,Continent
Canada,35.467,1785387,9984670,0.913,America
France,63.951,2833687,640679,0.888,Europe
Germany,80.94,3874437,357114,0.916,Europe
Italy,60.665,2167744,301336,0.873,Europe
Japan,127.061,4602367,377930,0.891,Asia
United Kingdom,64.511,2950039,242495,0.907,Europe
United States,318.523,17348075,9525067,0.915,America


In [None]:
df.iloc[0]

Population       35.467
GDP             1785387
Surface Area    9984670
HDI               0.913
Continent       America
Name: Canada, dtype: object

In [None]:
df.iloc[-1]

Population       318.523
GDP             17348075
Surface Area     9525067
HDI                0.915
Continent        America
Name: United States, dtype: object

In [None]:
df.iloc[[0, 1, -1]]

Unnamed: 0,Population,GDP,Surface Area,HDI,Continent
Canada,35.467,1785387,9984670,0.913,America
France,63.951,2833687,640679,0.888,Europe
United States,318.523,17348075,9525067,0.915,America


In [None]:
df.iloc[1:3]

Unnamed: 0,Population,GDP,Surface Area,HDI,Continent
France,63.951,2833687,640679,0.888,Europe
Germany,80.94,3874437,357114,0.916,Europe


In [None]:
df.iloc[1:3, 3]

France     0.888
Germany    0.916
Name: HDI, dtype: float64

In [None]:
df.iloc[1:3, [0, 3]]

In [None]:
df.iloc[1:3, 1:3]

> **RECOMMENDED: Always use `loc` and `iloc` to reduce ambiguity, especially with `DataFrame`s that have numeric indexes.**
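The ambiguity is easiest to see when the integer labels do not match the positions. The following is a minimal sketch on a hypothetical frame (`toy`, with labels 1 to 3); it also shows that `loc` slices include the end label while `iloc` slices exclude the stop position.

```python
import pandas as pd

# Hypothetical frame whose integer labels start at 1, not 0.
toy = pd.DataFrame({'value': ['a', 'b', 'c']}, index=[1, 2, 3])

by_label = toy.loc[1, 'value']   # label 1 -> first row
by_position = toy.iloc[1, 0]     # position 1 -> second row

label_slice = toy.loc[1:2]       # labels 1 and 2, end label included
position_slice = toy.iloc[1:2]   # position 1 only, stop excluded

print(by_label, by_position)            # a b
print(label_slice.index.tolist())       # [1, 2]
print(position_slice.index.tolist())    # [2]
```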


![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Conditional selection (boolean arrays)

We saw conditional selection applied to `Series`, and it works the same way for `DataFrame`s. After all, a `DataFrame` is a collection of `Series`:


In [None]:
df

Unnamed: 0,Population,GDP,Surface Area,HDI,Continent
Canada,35.467,1785387,9984670,0.913,America
France,63.951,2833687,640679,0.888,Europe
Germany,80.94,3874437,357114,0.916,Europe
Italy,60.665,2167744,301336,0.873,Europe
Japan,127.061,4602367,377930,0.891,Asia
United Kingdom,64.511,2950039,242495,0.907,Europe
United States,318.523,17348075,9525067,0.915,America


In [None]:
df['Population'] > 70

Canada            False
France            False
Germany            True
Italy             False
Japan              True
United Kingdom    False
United States      True
Name: Population, dtype: bool

In [None]:
df.loc[df['Population'] > 70]

Unnamed: 0,Population,GDP,Surface Area,HDI,Continent
Germany,80.94,3874437,357114,0.916,Europe
Japan,127.061,4602367,377930,0.891,Asia
United States,318.523,17348075,9525067,0.915,America


The boolean matching is done at the index level, so you can filter with any boolean Series, as long as it carries the right index labels. Column selection still works as expected:


In [None]:
df.loc[df['Population'] > 70, 'Population']

Germany           80.940
Japan            127.061
United States    318.523
Name: Population, dtype: float64

In [None]:
df.loc[df['Population'] > 70, ['Population', 'GDP']]

Unnamed: 0,Population,GDP
Germany,80.94,3874437
Japan,127.061,4602367
United States,318.523,17348075
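Several conditions can be combined with `&` (and) and `|` (or), wrapping each comparison in parentheses. A minimal sketch on a hypothetical three-country frame (`toy`):

```python
import pandas as pd

# Hypothetical subset of the G7 table.
toy = pd.DataFrame(
    {
        'Population': [35.467, 80.94, 127.061],
        'Continent': ['America', 'Europe', 'Asia'],
    },
    index=['Canada', 'Germany', 'Japan'],
)

# Both conditions must hold.
both = toy.loc[(toy['Population'] > 70) & (toy['Continent'] == 'Europe')]

# At least one condition must hold.
either = toy.loc[(toy['Population'] < 40) | (toy['Continent'] == 'Asia')]

print(both.index.tolist())    # ['Germany']
print(either.index.tolist())  # ['Canada', 'Japan']
```

The parentheses are needed because `&` and `|` bind more tightly than the comparison operators in Python.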


![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Dropping elements

The counterpart of selection is "dropping". Instead of pointing out which values you'd like to _select_, you point out which ones you'd like to `drop`:


In [None]:
df.drop('Canada')

Unnamed: 0,Population,GDP,Surface Area,HDI,Continent
France,63.951,2833687,640679,0.888,Europe
Germany,80.94,3874437,357114,0.916,Europe
Italy,60.665,2167744,301336,0.873,Europe
Japan,127.061,4602367,377930,0.891,Asia
United Kingdom,64.511,2950039,242495,0.907,Europe
United States,318.523,17348075,9525067,0.915,America


In [None]:
df.drop(['Canada', 'Japan'])

Unnamed: 0,Population,GDP,Surface Area,HDI,Continent
France,63.951,2833687,640679,0.888,Europe
Germany,80.94,3874437,357114,0.916,Europe
Italy,60.665,2167744,301336,0.873,Europe
United Kingdom,64.511,2950039,242495,0.907,Europe
United States,318.523,17348075,9525067,0.915,America


In [None]:
df.drop(columns=['Population', 'HDI'])

Unnamed: 0,GDP,Surface Area,Continent
Canada,1785387,9984670,America
France,2833687,640679,Europe
Germany,3874437,357114,Europe
Italy,2167744,301336,Europe
Japan,4602367,377930,Asia
United Kingdom,2950039,242495,Europe
United States,17348075,9525067,America


In [None]:
df.drop(['Italy', 'Canada'], axis=0)

Unnamed: 0,Population,GDP,Surface Area,HDI,Continent
France,63.951,2833687,640679,0.888,Europe
Germany,80.94,3874437,357114,0.916,Europe
Japan,127.061,4602367,377930,0.891,Asia
United Kingdom,64.511,2950039,242495,0.907,Europe
United States,318.523,17348075,9525067,0.915,America


In [None]:
df.drop(['Population', 'HDI'], axis=1)

Unnamed: 0,GDP,Surface Area,Continent
Canada,1785387,9984670,America
France,2833687,640679,Europe
Germany,3874437,357114,Europe
Italy,2167744,301336,Europe
Japan,4602367,377930,Asia
United Kingdom,2950039,242495,Europe
United States,17348075,9525067,America


In [None]:
df.drop(['Population', 'HDI'], axis='columns')

Unnamed: 0,GDP,Surface Area,Continent
Canada,1785387,9984670,America
France,2833687,640679,Europe
Germany,3874437,357114,Europe
Italy,2167744,301336,Europe
Japan,4602367,377930,Asia
United Kingdom,2950039,242495,Europe
United States,17348075,9525067,America


In [None]:
df.drop(['Canada', 'Germany'], axis='rows')

Unnamed: 0,Population,GDP,Surface Area,HDI,Continent
France,63.951,2833687,640679,0.888,Europe
Italy,60.665,2167744,301336,0.873,Europe
Japan,127.061,4602367,377930,0.891,Asia
United Kingdom,64.511,2950039,242495,0.907,Europe
United States,318.523,17348075,9525067,0.915,America


All these `drop` calls return a new `DataFrame`. If you'd like to modify it "in place", you can use the `inplace` argument (there's an example below).
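The difference between the two behaviours can be sketched on a hypothetical two-by-two frame (`toy`): a plain `drop` leaves the original alone, while `inplace=True` mutates it and returns `None`.

```python
import pandas as pd

# Hypothetical frame used only for this illustration.
toy = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['x', 'y'])

dropped = toy.drop('x')              # new DataFrame without row 'x'
rows_after_drop = toy.index.tolist() # the original still has both rows

returned = toy.drop(columns='b', inplace=True)  # mutates `toy`

print(dropped.index.tolist())  # ['y']
print(rows_after_drop)         # ['x', 'y']
print(returned)                # None
print(toy.columns.tolist())    # ['a']
```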


![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Operations

In [None]:
df[['Population', 'GDP']]

Unnamed: 0,Population,GDP
Canada,35.467,1785387
France,63.951,2833687
Germany,80.94,3874437
Italy,60.665,2167744
Japan,127.061,4602367
United Kingdom,64.511,2950039
United States,318.523,17348075


In [None]:
df[['Population', 'GDP']] / 100

Unnamed: 0,Population,GDP
Canada,0.35467,17853.87
France,0.63951,28336.87
Germany,0.8094,38744.37
Italy,0.60665,21677.44
Japan,1.27061,46023.67
United Kingdom,0.64511,29500.39
United States,3.18523,173480.75


**Operations with Series** are aligned on the column labels and broadcast down the rows (which can be counterintuitive).


In [None]:
crisis = pd.Series([-1_000_000, -0.3], index=['GDP', 'HDI'])
crisis

GDP   -1000000.0
HDI         -0.3
dtype: float64


In [None]:
df[['GDP', 'HDI']]

Unnamed: 0,GDP,HDI
Canada,1785387,0.913
France,2833687,0.888
Germany,3874437,0.916
Italy,2167744,0.873
Japan,4602367,0.891
United Kingdom,2950039,0.907
United States,17348075,0.915


In [None]:
df[['GDP', 'HDI']] + crisis

Unnamed: 0,GDP,HDI
Canada,785387.0,0.613
France,1833687.0,0.588
Germany,2874437.0,0.616
Italy,1167744.0,0.573
Japan,3602367.0,0.591
United Kingdom,1950039.0,0.607
United States,16348075.0,0.615
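A smaller version of the same alignment, on a hypothetical frame (`toy`) with round numbers, makes the rule easier to follow: the Series index is matched against the column names, and a column with no matching label ends up as `NaN`.

```python
import pandas as pd

# Hypothetical frame and adjustments, chosen so the arithmetic is exact.
toy = pd.DataFrame({'GDP': [10, 20], 'HDI': [0.5, 0.75]}, index=['A', 'B'])
shift = pd.Series({'GDP': -1, 'HDI': 0.25})

# Each Series value is applied to its matching column, on every row.
result = toy + shift

# A Series that lacks a label leaves that column as NaN.
partial = toy + pd.Series({'GDP': 1})

print(result['GDP'].tolist())         # [9.0, 19.0]
print(result['HDI'].tolist())         # [0.75, 1.0]
print(partial['HDI'].isna().all())    # True
```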


![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Modifying DataFrames

It's simple and intuitive. You can add columns or replace the values of existing columns without issues:


### Adding a new column

In [None]:
langs = pd.Series(
    ['French', 'German', 'Italian'],
    index=['France', 'Germany', 'Italy'],
    name='Language'
)

In [None]:
langs

France      French
Germany     German
Italy      Italian
Name: Language, dtype: object

In [None]:
df['Language'] = langs

In [None]:
df

Unnamed: 0,Population,GDP,Surface Area,HDI,Continent,Language
Canada,35.467,1785387,9984670,0.913,America,
France,63.951,2833687,640679,0.888,Europe,French
Germany,80.94,3874437,357114,0.916,Europe,German
Italy,60.665,2167744,301336,0.873,Europe,Italian
Japan,127.061,4602367,377930,0.891,Asia,
United Kingdom,64.511,2950039,242495,0.907,Europe,
United States,318.523,17348075,9525067,0.915,America,


---
### Replacing values per column


In [None]:
df['Language'] = 'English'

In [None]:
df

Unnamed: 0,Population,GDP,Surface Area,HDI,Continent,Language
Canada,35.467,1785387,9984670,0.913,America,English
France,63.951,2833687,640679,0.888,Europe,English
Germany,80.94,3874437,357114,0.916,Europe,English
Italy,60.665,2167744,301336,0.873,Europe,English
Japan,127.061,4602367,377930,0.891,Asia,English
United Kingdom,64.511,2950039,242495,0.907,Europe,English
United States,318.523,17348075,9525067,0.915,America,English


---
### Renaming columns


In [None]:
df.rename(
    columns={
        'HDI': 'Human Development Index',
        'Anual Popcorn Consumption': 'APC'
    }, index={
        'United States': 'USA',
        'United Kingdom': 'UK',
        'Argentina': 'AR'
    })

Unnamed: 0,Population,GDP,Surface Area,Human Development Index,Continent,Language
Canada,35.467,1785387,9984670,0.913,America,English
France,63.951,2833687,640679,0.888,Europe,English
Germany,80.94,3874437,357114,0.916,Europe,English
Italy,60.665,2167744,301336,0.873,Europe,English
Japan,127.061,4602367,377930,0.891,Asia,English
UK,64.511,2950039,242495,0.907,Europe,English
USA,318.523,17348075,9525067,0.915,America,English


In [None]:
df.rename(index=str.upper)

In [None]:
df.rename(index=lambda x: x.lower())
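Two details of `rename` are worth spelling out: keys that do not exist are ignored by default, and the call returns a new DataFrame. A minimal sketch on a hypothetical one-row frame (`toy`), where the `'Missing'` key is deliberately absent:

```python
import pandas as pd

# Hypothetical one-row frame.
toy = pd.DataFrame({'HDI': [0.915]}, index=['United States'])

renamed = toy.rename(
    columns={'HDI': 'Human Development Index', 'Missing': 'X'},  # 'Missing' is ignored
    index={'United States': 'USA'},
)

print(renamed.columns.tolist())  # ['Human Development Index']
print(renamed.index.tolist())    # ['USA']
print(toy.columns.tolist())      # ['HDI'] (the original is unchanged)
```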

---
### Dropping columns

In [None]:
df.drop(columns='Language', inplace=True)

---
### Adding values

In [None]:
df.append(pd.Series({
    'Population': 3,
    'GDP': 5
}, name='China'))

Unnamed: 0,Population,GDP,Surface Area,HDI,Continent,Language
Canada,35.467,1785387.0,9984670.0,0.913,America,English
France,63.951,2833687.0,640679.0,0.888,Europe,English
Germany,80.94,3874437.0,357114.0,0.916,Europe,English
Italy,60.665,2167744.0,301336.0,0.873,Europe,English
Japan,127.061,4602367.0,377930.0,0.891,Asia,English
United Kingdom,64.511,2950039.0,242495.0,0.907,Europe,English
United States,318.523,17348075.0,9525067.0,0.915,America,English
China,3.0,5.0,,,,


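`append` returns a new `DataFrame`, so `df` itself is unchanged (note that `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0 in favor of `pd.concat`):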

In [None]:
df

Unnamed: 0,Population,GDP,Surface Area,HDI,Continent,Language
Canada,35.467,1785387,9984670,0.913,America,English
France,63.951,2833687,640679,0.888,Europe,English
Germany,80.94,3874437,357114,0.916,Europe,English
Italy,60.665,2167744,301336,0.873,Europe,English
Japan,127.061,4602367,377930,0.891,Asia,English
United Kingdom,64.511,2950039,242495,0.907,Europe,English
United States,318.523,17348075,9525067,0.915,America,English


You can directly assign the new index label and its values to the `DataFrame`:


In [None]:
df.loc['China'] = pd.Series({'Population': 1_400_000_000, 'Continent': 'Asia'})

In [None]:
df

We can use `drop` to remove a row by its index label:


In [None]:
df.drop('China', inplace=True)

In [None]:
df

---
### More radical index changes

In [None]:
df.reset_index()

In [None]:
df.set_index('Population')
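Both methods return a new DataFrame. As a minimal sketch on a hypothetical two-row frame (`toy`): `reset_index` moves the labels into a regular column (named `index` when the index has no name) and `set_index` promotes a column back to the index.

```python
import pandas as pd

# Hypothetical two-row frame.
toy = pd.DataFrame({'Population': [35.467, 63.951]}, index=['Canada', 'France'])

flat = toy.reset_index()            # labels become a column called 'index'
restored = flat.set_index('index')  # that column becomes the index again

print(flat.columns.tolist())    # ['index', 'Population']
print(flat.index.tolist())      # [0, 1]
print(restored.index.tolist())  # ['Canada', 'France']
```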

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Creating columns from other columns

Altering a DataFrame often involves combining different columns into another. For example, in our countries analysis, we could try to calculate the "GDP per capita", which is just `GDP / Population`.


In [None]:
df[['Population', 'GDP']]

The usual way to express that in pandas is simply to divide one series by the other:


In [None]:
df['GDP'] / df['Population']

Canada            50339.385908
France            44310.284437
Germany           47868.013343
Italy             35733.025633
Japan             36221.712406
United Kingdom    45729.239975
United States     54464.120330
China                      NaN
dtype: float64

The result of that operation is just another series that you can add to the original `DataFrame`:


In [None]:
df['GDP Per Capita'] = df['GDP'] / df['Population']

In [None]:
df

Unnamed: 0,Population,GDP,Surface Area,HDI,Continent,Language,GDP Per Capita
Canada,35.467,1785387.0,9984670.0,0.913,America,English,50339.385908
France,63.951,2833687.0,640679.0,0.888,Europe,English,44310.284437
Germany,80.94,3874437.0,357114.0,0.916,Europe,English,47868.013343
Italy,60.665,2167744.0,301336.0,0.873,Europe,English,35733.025633
Japan,127.061,4602367.0,377930.0,0.891,Asia,English,36221.712406
United Kingdom,64.511,2950039.0,242495.0,0.907,Europe,English,45729.239975
United States,318.523,17348075.0,9525067.0,0.915,America,English,54464.12033
China,1400000000.0,,,,Asia,,
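When the original frame should stay untouched, `assign` is an alternative that returns a copy with the extra column. A minimal sketch on a hypothetical frame (`toy`) with round numbers; the dictionary unpacking is only needed because the column name contains spaces.

```python
import pandas as pd

# Hypothetical frame chosen so the division is exact.
toy = pd.DataFrame({'Population': [2.0, 4.0], 'GDP': [10, 40]}, index=['A', 'B'])

# assign returns a new DataFrame; `toy` keeps its two columns.
with_ratio = toy.assign(**{'GDP Per Capita': toy['GDP'] / toy['Population']})

print(with_ratio['GDP Per Capita'].tolist())  # [5.0, 10.0]
print(toy.columns.tolist())                   # ['Population', 'GDP']
```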


![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Statistical info

You've already seen the `describe` method, which gives you a good "summary" of the `DataFrame`. Let's explore other methods in more detail:


In [None]:
df.head()

Unnamed: 0,Population,GDP,Surface Area,HDI,Continent,Language,GDP Per Capita
Canada,35.467,1785387.0,9984670.0,0.913,America,English,50339.385908
France,63.951,2833687.0,640679.0,0.888,Europe,English,44310.284437
Germany,80.94,3874437.0,357114.0,0.916,Europe,English,47868.013343
Italy,60.665,2167744.0,301336.0,0.873,Europe,English,35733.025633
Japan,127.061,4602367.0,377930.0,0.891,Asia,English,36221.712406


In [None]:
df.describe()

Unnamed: 0,Population,GDP,Surface Area,HDI,GDP Per Capita
count,8.0,7.0,7.0,7.0,7.0
mean,175000100.0,5080248.0,3061327.0,0.900429,44952.254576
std,494974700.0,5494020.0,4576187.0,0.016592,6954.983875
min,35.467,1785387.0,242495.0,0.873,35733.025633
25%,63.1295,2500716.0,329225.0,0.8895,40265.998421
50%,72.7255,2950039.0,377930.0,0.907,45729.239975
75%,174.9265,4238402.0,5082873.0,0.914,49103.699626
max,1400000000.0,17348080.0,9984670.0,0.916,54464.12033


In [None]:
population = df['Population']

In [None]:
population.min(), population.max()

In [None]:
population.sum()

In [None]:
population.sum() / len(population)

In [None]:
population.mean()

In [None]:
population.std()

In [None]:
population.median()

In [None]:
population.describe()

In [None]:
population.quantile(.25)

In [None]:
population.quantile([.2, .4, .6, .8, 1])
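How these methods relate to each other can be checked on a tiny hypothetical Series (`values`). The second Series (`with_gap`) illustrates that `mean` skips missing values, so `sum() / len()` only matches it when there are no `NaN`s; `count()` is the safer denominator.

```python
import pandas as pd

# Hypothetical values, chosen so the results are exact.
values = pd.Series([1.0, 2.0, 3.0, 4.0])

print(values.sum() / len(values))  # 2.5
print(values.mean())               # 2.5
print(values.quantile(0.5))        # 2.5, same as the median
print(values.median())             # 2.5

# With a missing value, len() counts the NaN but mean() skips it.
with_gap = pd.Series([1.0, 3.0, float('nan')])

print(with_gap.mean())                    # 2.0
print(with_gap.sum() / with_gap.count())  # 2.0
print(len(with_gap), with_gap.count())    # 3 2
```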

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)


## Loading data from different sources

In [None]:
# Google Drive
from google.colab import drive

In [None]:
drive.mount("/content/Drive")

Mounted at /content/Drive


In [None]:
file_path = "/content/Drive/MyDrive/datos_personas.csv"
df = pd.read_csv(file_path)
df

Unnamed: 0,ID,Nombre,Apellido,Edad,Peso (kg),Estatura (m)
0,1001,Carlos,Gómez,39,80.4,1.72
1,1002,Elena,Rodríguez,33,80.8,1.84
2,1003,María,Ramírez,41,55.2,1.85
3,1004,Sofía,Pérez,50,51.8,1.58
4,1005,Luis,Martínez,32,92.0,1.69
...,...,...,...,...,...,...
95,1096,Paula,Torres,20,80.7,1.59
96,1097,Luis,Torres,37,99.5,1.66
97,1098,Paula,Pérez,37,59.7,1.68
98,1099,Ana,Pérez,35,72.4,1.94


In [None]:
# GitHub
! git clone https://github.com/ThomyRa/datos.git

Cloning into 'datos'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (2/2), done.
Receiving objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)


In [None]:
file_path = "/content/datos/datos_personas.csv"
df = pd.read_csv(file_path)
df

Unnamed: 0,ID,Nombre,Apellido,Edad,Peso (kg),Estatura (m)
0,1001,Carlos,Gómez,39,80.4,1.72
1,1002,Elena,Rodríguez,33,80.8,1.84
2,1003,María,Ramírez,41,55.2,1.85
3,1004,Sofía,Pérez,50,51.8,1.58
4,1005,Luis,Martínez,32,92.0,1.69
...,...,...,...,...,...,...
95,1096,Paula,Torres,20,80.7,1.59
96,1097,Luis,Torres,37,99.5,1.66
97,1098,Paula,Pérez,37,59.7,1.68
98,1099,Ana,Pérez,35,72.4,1.94
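`read_csv` accepts any file-like object, not only paths, which is handy for trying it out without Colab or a mounted drive. The sketch below feeds it an in-memory string; the two rows and three columns are a hypothetical excerpt modelled on the file above, and `index_col` is shown as one optional way to use the `ID` column as the index.

```python
import io

import pandas as pd

# Hypothetical excerpt with the same column style as datos_personas.csv.
csv_text = "ID,Nombre,Edad\n1001,Carlos,39\n1002,Elena,33\n"

# Default: a fresh 0..n-1 index is created.
people = pd.read_csv(io.StringIO(csv_text))

# Optional: use the ID column as the index instead.
by_id = pd.read_csv(io.StringIO(csv_text), index_col='ID')

print(people.shape)            # (2, 3)
print(by_id.index.tolist())    # [1001, 1002]
print(by_id.columns.tolist())  # ['Nombre', 'Edad']
```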
