# Inspecting a DataFrame Object

## About the Data
In this notebook, we will be working with earthquake data from September 18, 2018 through October 13, 2018, obtained from the US Geological Survey (USGS) using the [USGS API](https://earthquake.usgs.gov/fdsnws/event/1/).

## Setup
We will be working with the `data/earthquakes.csv` file again, so we need to handle our imports and read it in.

In [1]:
import numpy as np
import pandas as pd

df = pd.read_csv('data/earthquakes.csv')

## Examining dataframes
### Is it empty?

In [None]:
df.empty
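
As a small illustration of what `empty` reports, the sketch below uses two hypothetical toy frames (not the earthquake data). A frame counts as empty only when one of its axes has length zero, so a frame holding nothing but missing values is still not empty.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames for illustration only
no_rows = pd.DataFrame(columns=['mag', 'place'])  # columns defined, zero rows
only_nan = pd.DataFrame({'mag': [np.nan]})        # one row holding a missing value

print(no_rows.empty)   # True: there are no rows
print(only_nan.empty)  # False: a row of NaN is still a row
```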

### What are the dimensions?

In [None]:
df.shape
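
Since `shape` is a `(rows, columns)` tuple, it can be unpacked directly. A minimal sketch on a hypothetical three-row frame (the values are made up to resemble the data):

```python
import pandas as pd

# Hypothetical toy frame for illustration only
toy = pd.DataFrame({
    'mag': [1.35, 1.29, 3.42],
    'magType': ['ml', 'ml', 'ml'],
})

n_rows, n_cols = toy.shape  # shape is a plain tuple
print(f'{n_rows} rows x {n_cols} columns')  # 3 rows x 2 columns
```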

### What columns do we have?
We know there are 26 columns, but what are they? Let's use the `columns` attribute to see:

In [None]:
df.columns

### What does the data look like?
View rows from the top with `head()`:

In [2]:
df.head()

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,sources,status,time,title,tsunami,type,types,tz,updated,url
0,,,37389218,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.008693,,85.0,",ci37389218,",1.35,ml,...,",ci,",automatic,1539475168010,"M 1.4 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475395144,https://earthquake.usgs.gov/earthquakes/eventp...
1,,,37389202,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02003,,79.0,",ci37389202,",1.29,ml,...,",ci,",automatic,1539475129610,"M 1.3 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475253925,https://earthquake.usgs.gov/earthquakes/eventp...
2,,4.4,37389194,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02137,28.0,21.0,",ci37389194,",3.42,ml,...,",ci,",automatic,1539475062610,"M 3.4 - 8km NE of Aguanga, CA",0,earthquake,",dyfi,focal-mechanism,geoserve,nearby-cities,o...",-480.0,1539536756176,https://earthquake.usgs.gov/earthquakes/eventp...
3,,,37389186,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02618,,39.0,",ci37389186,",0.44,ml,...,",ci,",automatic,1539474978070,"M 0.4 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475196167,https://earthquake.usgs.gov/earthquakes/eventp...
4,,,73096941,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.07799,,192.0,",nc73096941,",2.16,md,...,",nc,",automatic,1539474716050,"M 2.2 - 10km NW of Avenal, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,scit...",-480.0,1539477547926,https://earthquake.usgs.gov/earthquakes/eventp...


View rows from the bottom with `tail()`. Let's view 2 rows:

In [None]:
df.tail(2)

*Tip: we can modify the display options in order to see more columns:*

```python
# check the max columns setting
>>> pd.get_option('display.max_columns')
20

# set the max columns to show when printing the dataframe to 26
>>> pd.set_option('display.max_columns', 26)
# OR
>>> pd.options.display.max_columns = 26

# reset the option
>>> pd.reset_option('display.max_columns')

# get information on all display options
>>> pd.describe_option('display')
```

*More information can be found in the documentation [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html).*
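
If the wider display is only needed for one cell, one possible alternative to setting and resetting the option by hand is the `pd.option_context()` context manager, which restores the previous value on exit. A minimal sketch (the variable names are illustrative):

```python
import pandas as pd

before = pd.get_option('display.max_columns')

# Temporarily raise the limit; the previous value is restored when the block ends
with pd.option_context('display.max_columns', 26):
    inside = pd.get_option('display.max_columns')

after = pd.get_option('display.max_columns')
print(inside)           # 26
print(after == before)  # True
```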

### What data types do we have?

In [None]:
df.dtypes

### Getting extra info and finding nulls

In [None]:
df.info()
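
`info()` reports the non-null count per column. To get the number of nulls directly, one common approach is `isna().sum()`. The sketch below uses a hypothetical toy frame rather than the earthquake data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame for illustration only
toy = pd.DataFrame({
    'alert': [np.nan, 'green', np.nan],
    'mag': [1.35, np.nan, 3.42],
    'tsunami': [0, 0, 1],
})

null_counts = toy.isna().sum()  # number of missing values in each column
print(null_counts)

# The same numbers follow from the non-null counts that info() displays
print((len(toy) - toy.count()).equals(null_counts))
```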

## Describing and Summarizing
### Get summary statistics

In [6]:
df.describe()

Unnamed: 0,cdi,dmin,felt,gap,mag,mmi,nst,rms,sig,time,tsunami,tz,updated
count,329.0,6139.0,329.0,6164.0,9331.0,93.0,5364.0,9332.0,9332.0,9332.0,9332.0,9331.0,9332.0
mean,2.754711,0.544925,12.31003,121.506588,1.497345,3.651398,19.053878,0.362122,56.899914,1538284000000.0,0.006537,-451.99014,1538537000000.0
std,1.010637,2.214305,48.954944,72.962363,1.203347,1.790523,15.492315,0.317784,91.872163,608030600.0,0.080589,231.752571,656413500.0
min,0.0,0.000648,0.0,12.0,-1.26,0.0,0.0,0.0,0.0,1537229000000.0,0.0,-720.0,1537230000000.0
25%,2.0,0.020425,1.0,66.1425,0.72,2.68,8.0,0.119675,8.0,1537793000000.0,0.0,-540.0,1537996000000.0
50%,2.7,0.05905,2.0,105.0,1.3,3.72,15.0,0.21,26.0,1538245000000.0,0.0,-480.0,1538621000000.0
75%,3.3,0.17725,5.0,159.0,1.9,4.57,25.0,0.59,56.0,1538766000000.0,0.0,-480.0,1539110000000.0
max,8.4,53.737,580.0,355.91,7.5,9.12,172.0,1.91,2015.0,1539475000000.0,1.0,720.0,1539537000000.0


Specifying the 5<sup>th</sup> and 95<sup>th</sup> percentiles:

In [4]:
df.describe(percentiles=[0.05, 0.95])

Unnamed: 0,cdi,dmin,felt,gap,mag,mmi,nst,rms,sig,time,tsunami,tz,updated
count,329.0,6139.0,329.0,6164.0,9331.0,93.0,5364.0,9332.0,9332.0,9332.0,9332.0,9331.0,9332.0
mean,2.754711,0.544925,12.31003,121.506588,1.497345,3.651398,19.053878,0.362122,56.899914,1538284000000.0,0.006537,-451.99014,1538537000000.0
std,1.010637,2.214305,48.954944,72.962363,1.203347,1.790523,15.492315,0.317784,91.872163,608030600.0,0.080589,231.752571,656413500.0
min,0.0,0.000648,0.0,12.0,-1.26,0.0,0.0,0.0,0.0,1537229000000.0,0.0,-720.0,1537230000000.0
5%,2.0,0.005491,1.0,35.0,-0.04,0.0,4.0,0.03,0.0,1537344000000.0,0.0,-600.0,1537387000000.0
50%,2.7,0.05905,2.0,105.0,1.3,3.72,15.0,0.21,26.0,1538245000000.0,0.0,-480.0,1538621000000.0
95%,4.3,2.6789,40.2,276.0,4.4,6.38,49.0,0.96,298.0,1539319000000.0,0.0,-60.0,1539400000000.0
max,8.4,53.737,580.0,355.91,7.5,9.12,172.0,1.91,2015.0,1539475000000.0,1.0,720.0,1539537000000.0
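
The percentile rows produced by `describe()` should agree with what `quantile()` returns for the same fractions. A minimal sketch on a hypothetical `Series` of magnitudes (values made up for illustration):

```python
import pandas as pd

# Hypothetical magnitudes for illustration only
mags = pd.Series([0.4, 0.7, 1.3, 1.4, 1.9, 2.2, 3.4, 4.4, 7.5])

summary = mags.describe(percentiles=[0.05, 0.95])
q05, q95 = mags.quantile([0.05, 0.95])

# The '5%' and '95%' rows of the summary match the quantile() results
print(summary['5%'], q05)
print(summary['95%'], q95)
```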


Describe specific data types:

In [5]:
df.describe(include=object)

Unnamed: 0,alert,code,detail,ids,magType,net,place,sources,status,title,type,types,url
count,59,9332,9332,9332,9331,9332,9332,9332,9332,9332,9332,9332,9332
unique,2,9332,9332,9332,10,14,5433,52,2,7807,5,42,9332
top,green,37389218,https://earthquake.usgs.gov/fdsnws/event/1/que...,",ci37389218,",ml,ak,"10km NE of Aguanga, CA",",ak,",reviewed,"M 0.4 - 10km NE of Aguanga, CA",earthquake,",geoserve,origin,phase-data,",https://earthquake.usgs.gov/earthquakes/eventp...
freq,58,1,1,1,6803,3166,306,2981,7797,55,9081,5301,1


Or describe all of them:

In [7]:
df.describe(include='all')

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,sources,status,time,title,tsunami,type,types,tz,updated,url
count,59,329.0,9332.0,9332,6139.0,329.0,6164.0,9332,9331.0,9331,...,9332,9332,9332.0,9332,9332.0,9332,9332,9331.0,9332.0,9332
unique,2,,9332.0,9332,,,,9332,,10,...,52,2,,7807,,5,42,,,9332
top,green,,37389218.0,https://earthquake.usgs.gov/fdsnws/event/1/que...,,,,",ci37389218,",,ml,...,",ak,",reviewed,,"M 0.4 - 10km NE of Aguanga, CA",,earthquake,",geoserve,origin,phase-data,",,,https://earthquake.usgs.gov/earthquakes/eventp...
freq,58,,1.0,1,,,,1,,6803,...,2981,7797,,55,,9081,5301,,,1
mean,,2.754711,,,0.544925,12.31003,121.506588,,1.497345,,...,,,1538284000000.0,,0.006537,,,-451.99014,1538537000000.0,
std,,1.010637,,,2.214305,48.954944,72.962363,,1.203347,,...,,,608030600.0,,0.080589,,,231.752571,656413500.0,
min,,0.0,,,0.000648,0.0,12.0,,-1.26,,...,,,1537229000000.0,,0.0,,,-720.0,1537230000000.0,
25%,,2.0,,,0.020425,1.0,66.1425,,0.72,,...,,,1537793000000.0,,0.0,,,-540.0,1537996000000.0,
50%,,2.7,,,0.05905,2.0,105.0,,1.3,,...,,,1538245000000.0,,0.0,,,-480.0,1538621000000.0,
75%,,3.3,,,0.17725,5.0,159.0,,1.9,,...,,,1538766000000.0,,0.0,,,-480.0,1539110000000.0,


This also works on columns:

In [8]:
df.felt.describe()

count    329.000000
mean      12.310030
std       48.954944
min        0.000000
25%        1.000000
50%        2.000000
75%        5.000000
max      580.000000
Name: felt, dtype: float64

There are also methods for specific statistics. Here is a sampling of them:

| Method | Description | Data types |
| --- | --- | --- |
| `count()` | The number of non-null observations | Any |
| `nunique()` | The number of unique values | Any |
| `sum()` | The total of the values | Numeric or Boolean |
| `mean()` | The mean of the values | Numeric or Boolean |
| `median()` | The median of the values | Numeric |
| `min()` | The minimum of the values | Numeric |
| `idxmin()` | The index where the minimum value occurs | Numeric |
| `max()` | The maximum of the values | Numeric |
| `idxmax()` | The index where the maximum value occurs | Numeric |
| `abs()` | The absolute values of the data | Numeric |
| `std()` | The standard deviation | Numeric |
| `var()` | The variance | Numeric |
| `cov()` | The covariance between two `Series`, or a covariance matrix for all column combinations in a `DataFrame` | Numeric |
| `corr()` | The correlation between two `Series`, or a correlation matrix for all column combinations in a `DataFrame` | Numeric |
| `quantile()` | Calculates a specific quantile | Numeric |
| `cumsum()` | The cumulative sum | Numeric or Boolean |
| `cummin()` | The cumulative minimum | Numeric |
| `cummax()` | The cumulative maximum | Numeric |
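
To make a few of these methods concrete, here is a short sketch on a hypothetical five-row frame (the column names mirror the earthquake data, but the values are made up). Note how `sum()` and `mean()` on a Boolean column give the count and share of `True` values:

```python
import pandas as pd

# Hypothetical toy frame for illustration only
toy = pd.DataFrame({
    'mag': [1.35, 1.29, 3.42, 0.44, 2.16],
    'tsunami': [False, False, True, False, False],
})

print(toy.mag.max())              # 3.42
print(toy.mag.idxmax())           # 2, the index label of the largest magnitude
print(toy.tsunami.sum())          # 1, the number of True values
print(toy.tsunami.mean())         # 0.2, the fraction of True values
print(toy.mag.cummax().tolist())  # [1.35, 1.35, 3.42, 3.42, 3.42]
```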

For example, finding the unique values in the `alert` column:

In [None]:
df.alert.unique()

We can then use `value_counts()` to see how many of each unique value we have:

In [None]:
df.alert.value_counts()
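
By default `value_counts()` leaves missing values out, which matters for a sparsely populated column such as `alert`. The sketch below, on a hypothetical `Series`, shows the `dropna` and `normalize` parameters:

```python
import numpy as np
import pandas as pd

# Hypothetical alert values for illustration only
alerts = pd.Series(['green', np.nan, 'green', 'red', np.nan, np.nan], name='alert')

counts = alerts.value_counts()                    # missing values are skipped
with_missing = alerts.value_counts(dropna=False)  # missing values get their own row
shares = alerts.value_counts(normalize=True)      # proportions of the non-null values

print(counts)
print(with_missing)
print(shares)
```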

Note that `Index` objects also have several methods to help describe and summarize our data:

| Method | Description |
| --- | --- |
| `argmax()`/`argmin()` | Find the location of the maximum/minimum value in the index |
| `equals()` | Compare the index to another `Index` object for equality |
| `isin()` | Check if the index values are in a list of values and return an array of Booleans |
| `max()`/`min()` | Find the maximum/minimum value in the index |
| `nunique()` | Get the number of unique values in the index |
| `to_series()` | Create a `Series` object from the index |
| `unique()` | Find the unique values of the index |
| `value_counts()` | Create a frequency table for the unique values in the index |
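
A brief sketch of several of these `Index` methods, using a small hypothetical index that is not taken from the earthquake data:

```python
import pandas as pd

# Hypothetical index for illustration only
idx = pd.Index([3, 1, 4, 1, 5])

print(idx.argmax())                           # 4, the position of the maximum
print(idx.max())                              # 5, the maximum value itself
print(idx.nunique())                          # 4
print(idx.isin([1, 5]).tolist())              # [False, True, False, True, True]
print(idx.equals(pd.Index([3, 1, 4, 1, 5])))  # True
print(idx.unique().tolist())                  # [3, 1, 4, 5]
```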
