## Example 5: Filters

### 1. Objectives:
    - Learn how filters work
    - Apply several filters to see them in action
 
---
    
### 2. Walkthrough:

In [1]:
import pandas as pd

In [3]:
df = pd.read_csv('new_york_times_bestsellers-dirty.csv', index_col=0)
df.head(3)

Unnamed: 0,amazon_product_url,author,description,publisher,title,oid,bestsellers_date.numberLong,published_date.numberLong,rank.numberInt,rank_last_week.numberInt,weeks_on_list.numberInt,price.numberDouble
0,http://www.amazon.com/The-Host-Novel-Stephenie...,Stephenie Meyer,Descr: Aliens have taken control of the minds ...,"Little, Brown",THE HOST,5b4aa4ead3089013507db18c,2008-05-24 00:00:00,1212883200000,2,1,3,25.99
1,http://www.amazon.com/Love-Youre-With-Emily-Gi...,Emily Giffin,Descr: A woman's happy marriage is shaken when...,St. Martin's,LOVE THE ONE YOU'RE WITH,5b4aa4ead3089013507db18d,2008-05-24 00:00:00,1212883200000,3,2,2,24.95
2,http://www.amazon.com/The-Front-Garano-Patrici...,Patricia Cornwell,Descr: A Massachusetts state investigator and ...,Putnam,THE FRONT,5b4aa4ead3089013507db18e,2008-05-24 00:00:00,1212883200000,4,0,1,22.95


Say we want all the records where the author's name starts with 'R'. First, we use `comparison operators` (or, in this case, the `str.startswith(-pattern-)` method) to build our filter:

In [4]:
# create your condition here
df['author'].str.startswith('R')

0       False
1       False
2       False
3       False
5       False
        ...  
3027    False
3028    False
3029    False
3030    False
3031    False
Name: author, Length: 2266, dtype: bool

What we get back is a boolean `Series` with the same length as the original `Series`. The method or comparison is applied to each element of the original `Series` and returns `True` or `False` for each value. The resulting `Series` collects those `True` and `False` values, one per row.
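To make the idea concrete, here is a minimal sketch on small made-up `Series` (the names and values below are illustrative, not taken from the bestsellers file). It shows that both a string method and a comparison return one boolean per element, and that `str.startswith` accepts an `na` argument that decides what missing values become.

```python
import pandas as pd

# Hypothetical toy data, not from the bestsellers dataset
authors = pd.Series(['Robert Crais', 'Emily Giffin', None, 'Ridley Pearson'])
prices = pd.Series([25.95, 24.95, 22.95, 19.99])

# A string method applied element by element; na=False maps missing values to False
starts_with_r = authors.str.startswith('R', na=False)

# A comparison operator applied element by element
over_20 = prices > 20

print(starts_with_r.tolist())  # [True, False, False, True]
print(over_20.tolist())        # [True, True, True, False]
```

Without `na=False`, a missing author would not produce a clean `True`/`False` value in the mask, so setting it explicitly is a reasonable habit when a column may contain gaps.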

Next, when we pass this filter to the `DataFrame`'s `indexing operator`, every row that corresponds to a `True` is kept, while every row that corresponds to a `False` is left out of the resulting subset:

`dataframe[ -boolean Series used as filter- ]`
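As a rough illustration of this indexing step, the sketch below builds a small hypothetical `DataFrame` (the values are invented for the example) and filters it with a boolean `Series`. The surviving rows keep their original index labels, and `.loc` accepts the same mask when you also want to pick columns.

```python
import pandas as pd

# Hypothetical toy data, only for illustration
books = pd.DataFrame({
    'author': ['Robert Crais', 'Emily Giffin', 'Ridley Pearson'],
    'price': [25.95, 24.95, 24.95],
})

mask = books['author'].str.startswith('R')  # boolean Series: True, False, True

subset = books[mask]                      # keeps only the rows where mask is True
authors_only = books.loc[mask, 'author']  # same rows, a single column

print(subset.index.tolist())   # [0, 2]
print(authors_only.tolist())   # ['Robert Crais', 'Ridley Pearson']
```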

In [5]:
# create your filter here
df[df['author'].str.startswith('R')].head(3)

Unnamed: 0,amazon_product_url,author,description,publisher,title,oid,bestsellers_date.numberLong,published_date.numberLong,rank.numberInt,rank_last_week.numberInt,weeks_on_list.numberInt,price.numberDouble
79,http://www.amazon.com/Chasing-Darkness-Elvis-N...,Robert Crais,Descr: he Los Angeles private eye Elvis Cole r...,Simon & Schuster,CHASING DARKNESS,5b4aa4ead3089013507db209,2008-07-05 00:00:00,1216512000000,7,0,1,25.95
94,http://www.amazon.com/Chasing-Darkness-Elvis-N...,Robert Crais,Descr: Is the Los Angeles private eye Elvis Co...,Simon & Schuster,CHASING DARKNESS,5b4aa4ead3089013507db221,2008-07-12 00:00:00,1217116800000,11,7,2,25.95
110,http://www.amazon.com/Killer-View-Fleming-Ridl...,Ridley Pearson,"Descr: A sheriff in Sun Valley, Idaho, investi...",Putnam,KILLER VIEW,5b4aa4ead3089013507db239,2008-07-19 00:00:00,1217721600000,15,0,1,24.95


We can also store our conditions in variables and use them later. For example, to find all the entries whose price is greater than 20:

In [9]:
mayor_a_20 = df['price.numberDouble'] > 20

In [10]:
df[mayor_a_20].head(3)

Unnamed: 0,amazon_product_url,author,description,publisher,title,oid,bestsellers_date.numberLong,published_date.numberLong,rank.numberInt,rank_last_week.numberInt,weeks_on_list.numberInt,price.numberDouble
0,http://www.amazon.com/The-Host-Novel-Stephenie...,Stephenie Meyer,Descr: Aliens have taken control of the minds ...,"Little, Brown",THE HOST,5b4aa4ead3089013507db18c,2008-05-24 00:00:00,1212883200000,2,1,3,25.99
1,http://www.amazon.com/Love-Youre-With-Emily-Gi...,Emily Giffin,Descr: A woman's happy marriage is shaken when...,St. Martin's,LOVE THE ONE YOU'RE WITH,5b4aa4ead3089013507db18d,2008-05-24 00:00:00,1212883200000,3,2,2,24.95
2,http://www.amazon.com/The-Front-Garano-Patrici...,Patricia Cornwell,Descr: A Massachusetts state investigator and ...,Putnam,THE FRONT,5b4aa4ead3089013507db18e,2008-05-24 00:00:00,1212883200000,4,0,1,22.95


We can even combine two or more filters using `logical operators`. In pandas, the `and` operator is written as `&` and the `or` operator is written as `|`.
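Here is a minimal sketch of combining conditions, using invented data. One detail worth keeping in mind: `&` and `|` bind more tightly than comparisons such as `>` or `==`, so conditions written inline need their own parentheses. The `~` operator negates a filter.

```python
import pandas as pd

# Hypothetical toy data, only for illustration
books = pd.DataFrame({
    'rank': [1, 2, 1, 3],
    'price': [25.99, 18.50, 15.00, 27.00],
})

# Parentheses are needed around each inline condition
rank_one_and_pricey = books[(books['rank'] == 1) & (books['price'] > 20)]
rank_one_or_pricey = books[(books['rank'] == 1) | (books['price'] > 20)]
not_rank_one = books[~(books['rank'] == 1)]

print(rank_one_and_pricey.index.tolist())  # [0]
print(rank_one_or_pricey.index.tolist())   # [0, 2, 3]
print(not_rank_one.index.tolist())         # [1, 3]
```

Python's own `and` and `or` keywords do not work element by element on a `Series`, which is why the symbol operators are used instead.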

So we can get all the entries with rank 1 whose price is also greater than 20:

In [14]:
# 'rank.numberInt' has dtype object (see df.dtypes), so we compare against the string '1'
rank_numero_uno = df['rank.numberInt'] == '1'

In [16]:
df[mayor_a_20 & rank_numero_uno].shape

(135, 12)

In [13]:
df.dtypes

amazon_product_url              object
author                          object
description                     object
publisher                       object
title                           object
oid                             object
bestsellers_date.numberLong     object
published_date.numberLong        int64
rank.numberInt                  object
rank_last_week.numberInt         int64
weeks_on_list.numberInt          int64
price.numberDouble             float64
dtype: object
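
The `dtypes` output shows `rank.numberInt` stored as `object` rather than as an integer type, which is presumably why the rank condition compares against the string `'1'`. The sketch below uses invented values to show the difference, along with one possible way to get numbers back using `pd.to_numeric`. Whether the real column contains unparseable entries is an assumption here.

```python
import pandas as pd

# Hypothetical column of ranks stored as text, with one unparseable entry
ranks = pd.Series(['1', '2', '1', 'n/a'], dtype=object)

print((ranks == 1).sum())    # 0: an integer never equals a string
print((ranks == '1').sum())  # 2: comparing text with text works

# errors='coerce' turns values that cannot be parsed into NaN
ranks_numeric = pd.to_numeric(ranks, errors='coerce')
print((ranks_numeric == 1).sum())  # 2
```

Converting the column once keeps later filters free of quoting surprises, at the cost of deciding what to do with the values that become `NaN`.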

---

## Challenge 5: Filters

### 1. Objectives:
    - Practice using filters to obtain subsets of data
    
---
    
### 2. Walkthrough:

#### a) Filtering by dates, booleans, and numeric values

We will work with the same dataset you saved in the previous Challenge. This Challenge consists of the following:

Using filters, create 3 subsets of the data:

1. A subset named `df_hazardous` containing only the records for objects where `is_potentially_hazardous_asteroid` is `True` (or `1`).
2. A subset named `df_greater_than_1000` containing only the records where `estimated_diameter.meters.estimated_diameter_max` is greater than 1000 meters.
3. A subset named `df_february` containing only the records that fall exactly within February 1995. Remember that the data in the `epoch_date_close_approach` column is in milliseconds.


In [1]:
import pandas as pd

In [2]:
df_reto_5 = pd.read_csv('near_earth_objects-jan_feb_1995-reto_4.csv', index_col=0)
df_reto_5.head(3)

Unnamed: 0,id_name,is_potentially_hazardous_asteroid,estimated_diameter.meters.estimated_diameter_min,estimated_diameter.meters.estimated_diameter_max,close_approach_date,epoch_date_close_approach,orbiting_body,relative_velocity.kilometers_per_second,relative_velocity.kilometers_per_hour,orbit_class_description,id,name,relative_velocity.kilometers_per_minute,proportion_of_max_diameter_to_earth
0,2154652-154652 (2004 EP20),0,483.676488,1081.533507,1995-01-07,1995-01-07 08:33:00,Earth,16.142864,58114.308667,Near Earth asteroid orbits similar to that of ...,2154652,154652 (2004 EP20),968.571811,8.5e-05
1,3153509-(2003 HM),1,96.506147,215.794305,1995-01-07,1995-01-07 15:09:00,Earth,12.351044,44463.757734,Near Earth asteroid orbits which cross the Ear...,3153509,(2003 HM),741.062629,1.7e-05
2,3837644-(2019 AY3),0,46.190746,103.285648,1995-01-07,1995-01-07 21:25:00,Earth,22.478615,80923.015021,Near Earth asteroid orbits similar to that of ...,3837644,(2019 AY3),1348.716917,8e-06


In [3]:
# your code here
df_hazardous = df_reto_5[df_reto_5['is_potentially_hazardous_asteroid'] == 1]
df_greater_than_1000 = df_reto_5[df_reto_5['estimated_diameter.meters.estimated_diameter_max'] > 1000]
# the dates are read as text; ISO-formatted strings sort in chronological order
df_february = df_reto_5[df_reto_5['epoch_date_close_approach'] > '1995-01-31 23:59:59']
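
The date filter above compares text, which works because the timestamps are ISO-formatted, but it sets no upper bound. As an alternative sketch, assuming the column holds date strings like those in the preview, the dates can be parsed with `pd.to_datetime` and filtered with both bounds. The rows below are invented. For a column that really holds epoch milliseconds, `pd.to_datetime(column, unit='ms')` would be the corresponding conversion.

```python
import pandas as pd

# Hypothetical rows, only for illustration
neos = pd.DataFrame({
    'epoch_date_close_approach': [
        '1995-01-31 23:59:00',
        '1995-02-01 00:00:00',
        '1995-02-28 23:59:00',
        '1995-03-01 00:00:00',
    ],
})

# Parse the text into real datetimes so comparisons are chronological
when = pd.to_datetime(neos['epoch_date_close_approach'])

# Half-open interval: from Feb 1 (inclusive) up to Mar 1 (exclusive)
in_february = (when >= pd.Timestamp('1995-02-01')) & (when < pd.Timestamp('1995-03-01'))

df_feb_sketch = neos[in_february]
print(df_feb_sketch.index.tolist())  # [1, 2]
```

The half-open interval avoids having to spell out the last second of the month.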

In [5]:
df_hazardous.head()

Unnamed: 0,id_name,is_potentially_hazardous_asteroid,estimated_diameter.meters.estimated_diameter_min,estimated_diameter.meters.estimated_diameter_max,close_approach_date,epoch_date_close_approach,orbiting_body,relative_velocity.kilometers_per_second,relative_velocity.kilometers_per_hour,orbit_class_description,id,name,relative_velocity.kilometers_per_minute,proportion_of_max_diameter_to_earth
1,3153509-(2003 HM),1,96.506147,215.794305,1995-01-07,1995-01-07 15:09:00,Earth,12.351044,44463.757734,Near Earth asteroid orbits which cross the Ear...,3153509,(2003 HM),741.062629,1.7e-05
6,2446862-446862 (2001 VB76),1,231.502122,517.654482,1995-01-08,1995-01-08 09:13:00,Earth,7.590711,27326.560174,Near Earth asteroid orbits similar to that of ...,2446862,446862 (2001 VB76),455.44267,4.1e-05
15,3766463-(2017 AY13),1,133.215567,297.879063,1995-01-03,1995-01-03 01:31:00,Earth,14.235092,51246.330073,Near Earth asteroid orbits which cross the Ear...,3766463,(2017 AY13),854.105501,2.3e-05
16,3342323-(2006 SF6),1,278.326768,622.357573,1995-01-03,1995-01-03 08:00:00,Earth,5.248637,18895.092087,Near Earth asteroid orbits which cross the Ear...,3342323,(2006 SF6),314.918201,4.9e-05
17,2452807-452807 (2006 KV89),1,146.067964,326.617897,1995-01-03,1995-01-03 03:33:00,Earth,5.201917,18726.902889,Near Earth asteroid orbits which cross the Ear...,2452807,452807 (2006 KV89),312.115048,2.6e-05


In [9]:
df_greater_than_1000.head(8)

Unnamed: 0,id_name,is_potentially_hazardous_asteroid,estimated_diameter.meters.estimated_diameter_min,estimated_diameter.meters.estimated_diameter_max,close_approach_date,epoch_date_close_approach,orbiting_body,relative_velocity.kilometers_per_second,relative_velocity.kilometers_per_hour,orbit_class_description,id,name,relative_velocity.kilometers_per_minute,proportion_of_max_diameter_to_earth
0,2154652-154652 (2004 EP20),0,483.676488,1081.533507,1995-01-07,1995-01-07 08:33:00,Earth,16.142864,58114.308667,Near Earth asteroid orbits similar to that of ...,2154652,154652 (2004 EP20),968.571811,8.5e-05
5,3824107-(2018 JB3),0,802.703167,1794.898848,1995-01-08,1995-01-08 10:54:00,Earth,32.160753,115778.710301,Near Earth asteroid orbits which cross the Ear...,3824107,(2018 JB3),1929.645172,0.000141
8,3645123-(2013 NX23),0,483.676488,1081.533507,1995-01-05,1995-01-05 22:31:00,Earth,21.605199,77778.715682,Near Earth asteroid orbits similar to that of ...,3645123,(2013 NX23),1296.311928,8.5e-05
13,2137062-137062 (1998 WM),0,1272.198785,2844.722965,1995-01-06,1995-01-06 04:47:00,Earth,9.604877,34577.556386,Near Earth asteroid orbits similar to that of ...,2137062,137062 (1998 WM),576.292606,0.000223
23,2138947-138947 (2001 BA40),0,506.471459,1132.504611,1995-01-04,1995-01-04 12:16:00,Earth,10.155092,36558.329695,Near Earth asteroid orbits which cross the Ear...,2138947,138947 (2001 BA40),609.305495,8.9e-05
53,2002062-2062 Aten (1976 AA),0,1010.543415,2259.643771,1995-01-12,1995-01-12 01:44:00,Earth,10.324052,37166.587108,Near Earth asteroid orbits similar to that of ...,2002062,2062 Aten (1976 AA),619.443118,0.000177
58,2152964-152964 (2000 GP82),0,766.575574,1714.115092,1995-01-13,1995-01-13 12:45:00,Earth,10.734807,38645.303568,Near Earth asteroid orbits which cross the Ear...,2152964,152964 (2000 GP82),644.088393,0.000135
77,3643994-(2013 LV28),0,461.90746,1032.856481,1995-01-11,1995-01-11 20:09:00,Earth,25.508069,91829.048391,Near Earth asteroid orbits which cross the Ear...,3643994,(2013 LV28),1530.48414,8.1e-05


In [8]:
df_february

Unnamed: 0,id_name,is_potentially_hazardous_asteroid,estimated_diameter.meters.estimated_diameter_min,estimated_diameter.meters.estimated_diameter_max,close_approach_date,epoch_date_close_approach,orbiting_body,relative_velocity.kilometers_per_second,relative_velocity.kilometers_per_hour,orbit_class_description,id,name,relative_velocity.kilometers_per_minute,proportion_of_max_diameter_to_earth
156,3405174-(2008 ED8),0,44.111820,98.637028,1995-02-04,1995-02-04 09:06:00,Earth,13.317721,47943.796767,Near Earth asteroid orbits similar to that of ...,3405174,(2008 ED8),799.063279,0.000008
157,3792431-(2017 XT61),0,60.891262,136.157002,1995-02-04,1995-02-04 00:37:00,Earth,9.359303,33693.492339,Near Earth asteroid orbits which cross the Ear...,3792431,(2017 XT61),561.558206,0.000011
158,3511508-(2010 CW180),0,253.837029,567.596853,1995-02-05,1995-02-05 10:31:00,Earth,11.657770,41967.973343,An asteroid orbit contained entirely within th...,3511508,(2010 CW180),699.466222,0.000045
159,3781883-(2017 SN14),0,461.907460,1032.856481,1995-02-05,1995-02-05 18:17:00,Earth,15.857860,57088.295988,Near Earth asteroid orbits similar to that of ...,3781883,(2017 SN14),951.471600,0.000081
160,2403039-403039 (2008 AE),0,305.179233,682.401509,1995-02-05,1995-02-05 12:05:00,Earth,6.078498,21882.592129,An asteroid orbit contained entirely within th...,2403039,403039 (2008 AE),364.709869,0.000054
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
296,2311554-311554 (2006 BQ147),0,483.676488,1081.533507,1995-02-21,1995-02-21 17:29:00,Earth,15.474761,55709.139812,Near Earth asteroid orbits similar to that of ...,2311554,311554 (2006 BQ147),928.485664,0.000085
297,2267136-267136 (2000 EF104),0,441.118200,986.370281,1995-02-21,1995-02-21 04:17:00,Earth,16.180392,58249.410194,Near Earth asteroid orbits similar to that of ...,2267136,267136 (2000 EF104),970.823503,0.000077
298,3360486-(2006 WE4),0,441.118200,986.370281,1995-02-21,1995-02-21 15:44:00,Earth,15.106140,54382.104639,Near Earth asteroid orbits which cross the Ear...,3360486,(2006 WE4),906.368411,0.000077
299,3656919-(2014 BG3),0,160.160338,358.129403,1995-02-21,1995-02-21 12:08:00,Earth,20.343173,73235.423517,An asteroid orbit contained entirely within th...,3656919,(2014 BG3),1220.590392,0.000028
