## Ejemplo 5: Filtros

### 1. Objetivos:
    - Aprender cómo funcionan los filtros
    - Aplicar varios filtros para verlos en acción
 
---
    
### 2. Desarrollo:

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('../../Datasets/new_york_times_bestsellers-dirty.csv', index_col=0)

df.head()

Unnamed: 0,amazon_product_url,author,description,publisher,title,oid,bestsellers_date.numberLong,published_date.numberLong,rank.numberInt,rank_last_week.numberInt,weeks_on_list.numberInt,price.numberDouble
0,http://www.amazon.com/The-Host-Novel-Stephenie...,Stephenie Meyer,Descr: Aliens have taken control of the minds ...,"Little, Brown",THE HOST,5b4aa4ead3089013507db18c,2008-05-24 00:00:00,1212883200000,2,1,3,25.99
1,http://www.amazon.com/Love-Youre-With-Emily-Gi...,Emily Giffin,Descr: A woman's happy marriage is shaken when...,St. Martin's,LOVE THE ONE YOU'RE WITH,5b4aa4ead3089013507db18d,2008-05-24 00:00:00,1212883200000,3,2,2,24.95
2,http://www.amazon.com/The-Front-Garano-Patrici...,Patricia Cornwell,Descr: A Massachusetts state investigator and ...,Putnam,THE FRONT,5b4aa4ead3089013507db18e,2008-05-24 00:00:00,1212883200000,4,0,1,22.95
3,http://www.amazon.com/Snuff-Chuck-Palahniuk/dp...,Chuck Palahniuk,Descr: An aging porn queens aims to cap her ca...,Doubleday,SNUFF,5b4aa4ead3089013507db18f,2008-05-24 00:00:00,1212883200000,5,0,1,24.95
5,http://www.amazon.com/Phantom-Prey-John-Sandfo...,John Sandford,Descr: The Minneapolis detective Lucas Davenpo...,Putnam,PHANTOM PREY,5b4aa4ead3089013507db191,2008-05-24 00:00:00,1212883200000,7,4,3,26.95


Digamos que queremos todas los registros donde el nombre del autor empiece con 'R'. Primero, usamos `operadores de comparación` (o en este caso, el método `str.startswith`) para obtener nuestro filtro:

In [3]:
# la función startwith("texto") se queda con los registros
# que empiezan con la cadena "texto"
df['author'].str.startswith('R')
# Noten que regresa una serie de booleanos

0       False
1       False
2       False
3       False
5       False
        ...  
3027    False
3028    False
3029    False
3030    False
3031    False
Name: author, Length: 2266, dtype: bool

Lo que obtenemos de regreso es una `Serie` con la misma longitud que la `Serie` original. Se aplicó el método o la comparación a cada elemento de la `Serie` original. Estos métodos o comparaciones regresan `True` o `False` dependiendo de cada valor. La `Serie` resultante acumula los `Trues` y `Falses` que obtengamos de la comparación o de la aplicación del método.

Después, al pasar este filtro al `operador de indexación` del `DataFrame`, todas las filas a las que les corresponda un `True` se mantienen, mientras que las filas a las que les corresponde un `False` se dejan fuera del subconjunto resultante:

In [7]:
mi_condicion = df['author'].str.startswith('R')
# en mi_condicion vienen puros Trues y Falses
df[mi_condicion] # quedate sólo con los que cumplieron
# mi_condicion, i.e con los Trues

Podemos también guardar nuestros filtros en variables y después utilizarlos:

In [8]:
# me quedo con los libros que cuestan mas de 20
filtro_precio_mayor_a_20 = df['price.numberDouble'] > 20

In [9]:
filtro_precio_mayor_a_20

0       True
1       True
2       True
3       True
5       True
        ... 
3027    True
3028    True
3029    True
3030    True
3031    True
Name: price.numberDouble, Length: 2266, dtype: bool

In [11]:
# lo aplico a mi dataframe con los corchetes cuadrados
df[filtro_precio_mayor_a_20]#.head()

Unnamed: 0,amazon_product_url,author,description,publisher,title,oid,bestsellers_date.numberLong,published_date.numberLong,rank.numberInt,rank_last_week.numberInt,weeks_on_list.numberInt,price.numberDouble
0,http://www.amazon.com/The-Host-Novel-Stephenie...,Stephenie Meyer,Descr: Aliens have taken control of the minds ...,"Little, Brown",THE HOST,5b4aa4ead3089013507db18c,2008-05-24 00:00:00,1212883200000,2,1,3,25.99
1,http://www.amazon.com/Love-Youre-With-Emily-Gi...,Emily Giffin,Descr: A woman's happy marriage is shaken when...,St. Martin's,LOVE THE ONE YOU'RE WITH,5b4aa4ead3089013507db18d,2008-05-24 00:00:00,1212883200000,3,2,2,24.95
2,http://www.amazon.com/The-Front-Garano-Patrici...,Patricia Cornwell,Descr: A Massachusetts state investigator and ...,Putnam,THE FRONT,5b4aa4ead3089013507db18e,2008-05-24 00:00:00,1212883200000,4,0,1,22.95
3,http://www.amazon.com/Snuff-Chuck-Palahniuk/dp...,Chuck Palahniuk,Descr: An aging porn queens aims to cap her ca...,Doubleday,SNUFF,5b4aa4ead3089013507db18f,2008-05-24 00:00:00,1212883200000,5,0,1,24.95
5,http://www.amazon.com/Phantom-Prey-John-Sandfo...,John Sandford,Descr: The Minneapolis detective Lucas Davenpo...,Putnam,PHANTOM PREY,5b4aa4ead3089013507db191,2008-05-24 00:00:00,1212883200000,7,4,3,26.95
...,...,...,...,...,...,...,...,...,...,...,...,...
3027,http://www.amazon.com/Unintended-Consequences-...,Stuart Woods,Descr: The New York lawyer Stone Barrington di...,Putnam,UNINTENDED CONSEQUENCES,5b4aa4ead3089013507dc592,2013-04-20 00:00:00,1367712000000,8,4,2,26.95
3028,http://www.amazon.com/Six-Years-Harlan-Coben/d...,Harlan Coben,Descr: Jake Fisher discovers that neither the ...,Dutton,SIX YEARS,5b4aa4ead3089013507dc593,2013-04-20 00:00:00,1367712000000,9,8,5,27.95
3029,http://www.amazon.com/The-Interestings-Novel-M...,Meg Wolitzer,Descr: Six friends meet in the 1970s at a summ...,Riverhead,THE INTERESTINGS,5b4aa4ead3089013507dc595,2013-04-20 00:00:00,1367712000000,11,11,2,27.95
3030,http://www.amazon.com/Man-Without-Breath-Berni...,Philip Kerr,"Descr: Bernie Gunther, the Berlin cop, is sent...",Marian Wood/Putnam,A MAN WITHOUT BREATH,5b4aa4ead3089013507dc597,2013-04-20 00:00:00,1367712000000,13,0,1,26.95


Podemos incluso aplicar dos o más filtros utilizando `operadores lógicos`. En este caso, nuestro operador `and` se representa con un `&` y el operador `or` se representa con `|`:

In [13]:
# me quedo sólo con los que tengan ranking 1
filtro_rank_numero_uno = df['rank.numberInt'] == '1'

In [14]:
# me quedo con los que cumplen los dos filtros
# i.e. cuestan más de 20 y son ranking 1
df[filtro_precio_mayor_a_20 & filtro_rank_numero_uno].head()

Unnamed: 0,amazon_product_url,author,description,publisher,title,oid,bestsellers_date.numberLong,published_date.numberLong,rank.numberInt,rank_last_week.numberInt,weeks_on_list.numberInt,price.numberDouble
51,http://www.amazon.com/Fearless-Fourteen-Janet-...,Janet Evanovich,Descr: Stephanie Plum and her boyfriend Joe Mo...,St. Martin’s,FEARLESS FOURTEEN,5b4aa4ead3089013507db1db,2008-06-21 00:00:00,1215302400000,1,0,1,27.95
63,http://www.amazon.com/Fearless-Fourteen-Janet-...,Janet Evanovich,Descr: Stephanie Plum and her boyfriend Joe Mo...,St. Martin’s,FEARLESS FOURTEEN,5b4aa4ead3089013507db1ef,2008-06-28 00:00:00,1215907200000,1,1,2,27.95
85,http://www.amazon.com/Tribute-Nora-Roberts/dp/...,Nora Roberts,Descr: A former child star returns to Virginia...,Putnam,TRIBUTE,5b4aa4ead3089013507db217,2008-07-12 00:00:00,1217116800000,1,0,1,26.95
98,http://www.amazon.com/Tribute-Nora-Roberts/dp/...,Nora Roberts,Descr: A former child star returns to Virginia...,Putnam,TRIBUTE,5b4aa4ead3089013507db22b,2008-07-19 00:00:00,1217721600000,1,1,2,26.95
111,http://www.amazon.com/Secret-Servant-Gabriel-A...,Daniel Silva,"Descr: Gabriel Allon, an art restorer and an o...",Putnam,THE SECRET SERVANT,5b4aa4ead3089013507db23f,2008-07-26 00:00:00,1218326400000,1,0,1,26.95
