# Filtering DataFrames and Series

In [1]:
import pandas as pd

We are going to use a dataset that has Airbnb listing information in Lisbon.

In [2]:
df = pd.read_csv('data/airbnb.csv', index_col='room_id')

In [3]:
df.shape

(13232, 8)

In [4]:
df.head()

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
6499,14455,Entire home/apt,Belém,8,5.0,2,1.0,57.0
17031,66015,Entire home/apt,Alvalade,0,0.0,2,1.0,46.0
25659,107347,Entire home/apt,Santa Maria Maior,63,5.0,3,1.0,69.0
29248,125768,Entire home/apt,Santa Maria Maior,225,4.5,4,1.0,58.0
29396,126415,Entire home/apt,Santa Maria Maior,132,5.0,4,1.0,67.0


# Selecting rows

## Selecting rows by their position - iloc

We use the [iloc](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html) indexer to select specific rows of a dataframe by position, regardless of the index.

With `iloc` we select rows by their row number, starting at 0.

In [5]:
df.iloc[0]

host_id                           14455
room_type               Entire home/apt
neighborhood                      Belém
reviews                               8
overall_satisfaction                  5
accommodates                          2
bedrooms                              1
price                                57
Name: 6499, dtype: object

In [6]:
type(df.iloc[0])

pandas.core.series.Series

If we want the selection to be a dataframe (instead of a Series), we can pass a list of positions, which gives double brackets `[[]]`:

In [7]:
df.iloc[[0]]

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
6499,14455,Entire home/apt,Belém,8,5.0,2,1.0,57.0


We can select multiple rows at once:

In [8]:
df.iloc[[0, 3,5]]

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
6499,14455,Entire home/apt,Belém,8,5.0,2,1.0,57.0
29248,125768,Entire home/apt,Santa Maria Maior,225,4.5,4,1.0,58.0
29720,128075,Entire home/apt,Estrela,14,5.0,16,9.0,1154.0


Or use slices like with arrays:

In [9]:
df.iloc[2:10]

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
25659,107347,Entire home/apt,Santa Maria Maior,63,5.0,3,1.0,69.0
29248,125768,Entire home/apt,Santa Maria Maior,225,4.5,4,1.0,58.0
29396,126415,Entire home/apt,Santa Maria Maior,132,5.0,4,1.0,67.0
29720,128075,Entire home/apt,Estrela,14,5.0,16,9.0,1154.0
29872,128698,Entire home/apt,Alcântara,25,5.0,2,1.0,75.0
29891,128792,Entire home/apt,Misericórdia,28,5.0,3,1.0,49.0
29915,128890,Entire home/apt,Avenidas Novas,28,4.5,3,1.0,58.0
33312,144398,Entire home/apt,Misericórdia,24,4.5,4,1.0,66.0
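
The `iloc` variants above can be tried on a small table. This is a minimal sketch that assumes a hypothetical four-row dataframe (`toy`) standing in for the Airbnb data, so the values are illustrative only.

```python
import pandas as pd

# Hypothetical toy data standing in for the Airbnb listings.
toy = pd.DataFrame(
    {"price": [57.0, 46.0, 69.0, 58.0], "bedrooms": [1, 1, 1, 2]},
    index=pd.Index([6499, 17031, 25659, 29248], name="room_id"),
)

first_as_series = toy.iloc[0]    # single position -> Series
first_as_frame = toy.iloc[[0]]   # list of positions -> DataFrame
middle = toy.iloc[1:3]           # positions 1 and 2; the end (3) is excluded
last = toy.iloc[-1]              # negative positions count from the end

print(type(first_as_series).__name__, type(first_as_frame).__name__)
print(middle.index.tolist())
```

As with Python lists, the slice `1:3` stops before position 3, and negative positions count from the end.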


## Selecting rows by their index value - loc

* With [.loc](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html) we can select rows based on their index value.

Since we have set the dataframe index to the Airbnb listing id (`room_id`), we can select a specific room based on its id, for example, the listing 10186098.

In [10]:
df.loc[10186098]

host_id                            520897
room_type                 Entire home/apt
neighborhood            Santa Maria Maior
reviews                                 2
overall_satisfaction                    0
accommodates                            4
bedrooms                                1
price                                  64
Name: 10186098, dtype: object

Selecting an index value that doesn't exist will fail, just as looking up a missing key in a dictionary would:

In [11]:
df.loc[[5]]

KeyError: "None of [Int64Index([5], dtype='int64', name='room_id')] are in the [index]"
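
If a label might be missing, there are a few common ways to avoid the error. The sketch below assumes a hypothetical three-row dataframe (`toy`) and a made-up `wanted` id. It is one possible approach, not the only one.

```python
import pandas as pd

# Hypothetical toy data standing in for the Airbnb listings.
toy = pd.DataFrame(
    {"price": [57.0, 46.0, 69.0]},
    index=pd.Index([6499, 17031, 25659], name="room_id"),
)

wanted = 5  # a room_id that is not in the index

# Option 1: test membership first, as with a dictionary key.
found = wanted in toy.index

# Option 2: catch the KeyError.
try:
    row = toy.loc[wanted]
except KeyError:
    row = None

# Option 3: reindex returns NaN rows for missing labels instead of failing.
partial = toy.reindex([6499, wanted])
print(found, row, partial["price"].tolist())
```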

Same as with .iloc, we can select multiple values at once.

In [12]:
df.loc[[29872, 19188572, 4612503 ]]

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
29872,128698,Entire home/apt,Alcântara,25,5.0,2,1.0,75.0
19188572,134216988,Private room,Arroios,0,0.0,4,1.0,58.0
4612503,22078192,Entire home/apt,Santa Maria Maior,12,5.0,3,1.0,113.0


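We can also pass a boolean array (a list of True and False values) to `loc` to select multiple rows. This is called a **mask**: rows marked True are kept and rows marked False are dropped. Recent pandas versions expect the mask to have one value per row, so a short list like the one below may raise an error there.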

In [13]:
df.loc[[True, True, False, True]]

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
6499,14455,Entire home/apt,Belém,8,5.0,2,1.0,57.0
17031,66015,Entire home/apt,Alvalade,0,0.0,2,1.0,46.0
29248,125768,Entire home/apt,Santa Maria Maior,225,4.5,4,1.0,58.0


We see that we have selected the first, second and fourth rows of the dataframe:

In [14]:
df.head()

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
6499,14455,Entire home/apt,Belém,8,5.0,2,1.0,57.0
17031,66015,Entire home/apt,Alvalade,0,0.0,2,1.0,46.0
25659,107347,Entire home/apt,Santa Maria Maior,63,5.0,3,1.0,69.0
29248,125768,Entire home/apt,Santa Maria Maior,225,4.5,4,1.0,58.0
29396,126415,Entire home/apt,Santa Maria Maior,132,5.0,4,1.0,67.0
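
A mask with exactly one boolean per row should work regardless of pandas version. The following sketch assumes a hypothetical four-row dataframe (`toy`) and shows both a hand-written mask and one computed from a column.

```python
import pandas as pd

# Hypothetical toy data standing in for the Airbnb listings.
toy = pd.DataFrame(
    {"price": [57.0, 46.0, 69.0, 58.0]},
    index=pd.Index([6499, 17031, 25659, 29248], name="room_id"),
)

# One boolean per row: True keeps the row, False drops it.
mask = [True, True, False, True]
selected = toy.loc[mask]

# In practice the mask usually comes from a comparison on a column.
cheap = toy.loc[toy["price"] < 60]

print(selected.index.tolist())
print(cheap.index.tolist())
```

A comparison such as `toy["price"] < 60` returns a boolean Series aligned with the index, which is why it can be used directly as a mask.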


# Column Selection

## Selecting columns by their name

We can select columns using dot notation **(as long as the column name has no spaces or other non-alphanumeric characters and does not clash with an existing dataframe attribute or method)**

In [15]:
df.room_type

room_id
6499        Entire home/apt
17031       Entire home/apt
25659       Entire home/apt
29248       Entire home/apt
29396       Entire home/apt
29720       Entire home/apt
29872       Entire home/apt
29891       Entire home/apt
29915       Entire home/apt
33312       Entire home/apt
33348          Private room
34783          Private room
34977       Entire home/apt
40817       Entire home/apt
42172       Entire home/apt
42519       Entire home/apt
44043       Entire home/apt
46567          Private room
47717       Entire home/apt
50108       Entire home/apt
55116          Private room
56906       Entire home/apt
57850       Entire home/apt
59227       Entire home/apt
65553       Entire home/apt
65878       Entire home/apt
72807          Private room
73764       Entire home/apt
75171       Entire home/apt
77130       Entire home/apt
                 ...       
19360001       Private room
19361749    Entire home/apt
19362076    Entire home/apt
19362417       Private room
19363100    

Which is the same as doing:

In [16]:
df['room_type']

room_id
6499        Entire home/apt
17031       Entire home/apt
25659       Entire home/apt
29248       Entire home/apt
29396       Entire home/apt
29720       Entire home/apt
29872       Entire home/apt
29891       Entire home/apt
29915       Entire home/apt
33312       Entire home/apt
33348          Private room
34783          Private room
34977       Entire home/apt
40817       Entire home/apt
42172       Entire home/apt
42519       Entire home/apt
44043       Entire home/apt
46567          Private room
47717       Entire home/apt
50108       Entire home/apt
55116          Private room
56906       Entire home/apt
57850       Entire home/apt
59227       Entire home/apt
65553       Entire home/apt
65878       Entire home/apt
72807          Private room
73764       Entire home/apt
75171       Entire home/apt
77130       Entire home/apt
                 ...       
19360001       Private room
19361749    Entire home/apt
19362076    Entire home/apt
19362417       Private room
19363100    

When we select one column we receive a `pd.Series`. To select multiple columns we pass a list of names, which gives double brackets. Selecting with a list always returns a dataframe, even if the list has only one name.

In [17]:
df[["room_type", "price"]].head()

room_id,room_type,price
6499,Entire home/apt,57.0
17031,Entire home/apt,46.0
25659,Entire home/apt,69.0
29248,Entire home/apt,58.0
29396,Entire home/apt,67.0


We can also select columns with `loc`, using `:` to keep every row:

In [18]:
df.loc[:, "room_type"][:10]

room_id
6499     Entire home/apt
17031    Entire home/apt
25659    Entire home/apt
29248    Entire home/apt
29396    Entire home/apt
29720    Entire home/apt
29872    Entire home/apt
29891    Entire home/apt
29915    Entire home/apt
33312    Entire home/apt
Name: room_type, dtype: object
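
The column selection styles can be compared side by side. This sketch assumes a hypothetical two-row dataframe (`toy`) with an invented column name containing spaces, to show where dot notation cannot be used.

```python
import pandas as pd

# Hypothetical toy data; "price per night" is an invented column name.
toy = pd.DataFrame(
    {
        "room_type": ["Entire home/apt", "Private room"],
        "price per night": [57.0, 22.0],
    },
    index=pd.Index([6499, 783766], name="room_id"),
)

as_series = toy["room_type"]          # same result as toy.room_type
as_frame = toy[["room_type"]]         # list of names -> DataFrame
with_spaces = toy["price per night"]  # dot notation cannot express this name
via_loc = toy.loc[:, "room_type"]     # all rows, one column

print(type(as_series).__name__, type(as_frame).__name__)
```

Bracket notation works for any column name, so it is the safer default in scripts.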

The index doesn't have to be unique. For example, we can set the neighborhood as the index.

In [19]:
df.head()

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
6499,14455,Entire home/apt,Belém,8,5.0,2,1.0,57.0
17031,66015,Entire home/apt,Alvalade,0,0.0,2,1.0,46.0
25659,107347,Entire home/apt,Santa Maria Maior,63,5.0,3,1.0,69.0
29248,125768,Entire home/apt,Santa Maria Maior,225,4.5,4,1.0,58.0
29396,126415,Entire home/apt,Santa Maria Maior,132,5.0,4,1.0,67.0


In [20]:
df = df.set_index("neighborhood")

In [21]:
df.loc["Belém"].head()

neighborhood,host_id,room_type,reviews,overall_satisfaction,accommodates,bedrooms,price
Belém,14455,Entire home/apt,8,5.0,2,1.0,57.0
Belém,992647,Entire home/apt,54,4.0,2,1.0,45.0
Belém,2083563,Private room,2,0.0,2,1.0,127.0
Belém,2341627,Entire home/apt,64,4.5,4,1.0,67.0
Belém,3168004,Entire home/apt,57,4.5,3,2.0,46.0


Now we set the index to `host_id`. We pass `drop=False` so that pandas keeps `host_id` as a regular column in addition to using it as the index. Note that the previous index (`neighborhood`) is discarded.

In [22]:
df = df.set_index("host_id", drop=False)
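
The effect of `set_index` and its `drop` argument can be checked on a small example. The sketch assumes a hypothetical three-row dataframe (`toy`) with two listings in the same neighborhood.

```python
import pandas as pd

# Hypothetical toy data standing in for the Airbnb listings.
toy = pd.DataFrame(
    {
        "host_id": [14455, 992647, 66015],
        "neighborhood": ["Belém", "Belém", "Alvalade"],
        "price": [57.0, 45.0, 46.0],
    }
)

by_area = toy.set_index("neighborhood")      # column moves into the index
belem = by_area.loc["Belém"]                 # two rows share this label

dropped = toy.set_index("host_id")           # default drop=True
kept = toy.set_index("host_id", drop=False)  # host_id stays as a column too

print(len(belem), by_area.index.is_unique)
print(dropped.columns.tolist(), kept.columns.tolist())
```

With a non-unique index, `loc` returns every row carrying the requested label, so the result is a dataframe rather than a single Series.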

## Mask

The method [mask](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.mask.html) allows us to "hide" the parts of a dataframe that match a certain condition by replacing them with `NaN`.

In [23]:
df.mask(df.overall_satisfaction == 5.0)

host_id,host_id,room_type,reviews,overall_satisfaction,accommodates,bedrooms,price
14455,,,,,,,
66015,66015.0,Entire home/apt,0.0,0.0,2.0,1.0,46.0
107347,,,,,,,
125768,125768.0,Entire home/apt,225.0,4.5,4.0,1.0,58.0
126415,,,,,,,
128075,,,,,,,
128698,,,,,,,
128792,,,,,,,
128890,128890.0,Entire home/apt,28.0,4.5,3.0,1.0,58.0
144398,144398.0,Entire home/apt,24.0,4.5,4.0,1.0,66.0


We see that the rows that match the condition appear as `NaN`, which stands for **Not a Number**, a standard way of saying *"there is no relevant data here"*. Most pandas aggregations (such as `sum` or `mean`) skip `NaN` values by default.

## Where

On the other hand, [where](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.where.html) hides the rows that don't match the condition, so `where` is the opposite of `mask`.

In [24]:
df.where(df.overall_satisfaction == 5.0)

#you can use the any or all functions to allow for multiple conditions 

host_id,host_id,room_type,reviews,overall_satisfaction,accommodates,bedrooms,price
14455,14455.0,Entire home/apt,8.0,5.0,2.0,1.0,57.0
66015,,,,,,,
107347,107347.0,Entire home/apt,63.0,5.0,3.0,1.0,69.0
125768,,,,,,,
126415,126415.0,Entire home/apt,132.0,5.0,4.0,1.0,67.0
128075,128075.0,Entire home/apt,14.0,5.0,16.0,9.0,1154.0
128698,128698.0,Entire home/apt,25.0,5.0,2.0,1.0,75.0
128792,128792.0,Entire home/apt,28.0,5.0,3.0,1.0,49.0
128890,,,,,,,
144398,,,,,,,
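
The complementary behaviour of `mask` and `where` is easy to verify on a small table. This sketch assumes a hypothetical three-row dataframe (`toy`). It also uses the `other` argument, which replaces hidden values with something other than `NaN`.

```python
import pandas as pd

# Hypothetical toy data standing in for the Airbnb listings.
toy = pd.DataFrame(
    {"overall_satisfaction": [5.0, 0.0, 4.5], "price": [57.0, 46.0, 58.0]},
    index=pd.Index([6499, 17031, 29248], name="room_id"),
)
is_top = toy["overall_satisfaction"] == 5.0

masked = toy.mask(is_top)   # rows where the condition is True become NaN
kept = toy.where(is_top)    # rows where the condition is False become NaN

# `other` replaces the hidden values with something other than NaN.
filled = toy.where(is_top, other=0.0)

print(masked)
print(kept)
print(filled)
```

All three results have the same shape and index as `toy`. Only the hidden cells differ.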


# Filtering with []

We can also filter by using brackets.
The difference between filtering with brackets and using `mask`/`where` is that with brackets we receive only a subset of the dataframe (fewer rows), while with `mask`/`where` we receive a dataframe with the same rows and index as the original one.

For example, we can filter the dataframe to see all the listings in `Belém`:

In [28]:
df = pd.read_csv('data/airbnb.csv', index_col='room_id')

In [29]:
df.head()

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
6499,14455,Entire home/apt,Belém,8,5.0,2,1.0,57.0
17031,66015,Entire home/apt,Alvalade,0,0.0,2,1.0,46.0
25659,107347,Entire home/apt,Santa Maria Maior,63,5.0,3,1.0,69.0
29248,125768,Entire home/apt,Santa Maria Maior,225,4.5,4,1.0,58.0
29396,126415,Entire home/apt,Santa Maria Maior,132,5.0,4,1.0,67.0


In [30]:
df.where(df.neighborhood=="Belém").shape

(13232, 8)

If we use brackets, the dataframe we get back is smaller, because only the matching rows are kept:

In [31]:
df[df.neighborhood == 'Belém']

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
6499,14455,Entire home/apt,Belém,8,5.0,2,1.0,57.0
202654,992647,Entire home/apt,Belém,54,4.0,2,1.0,45.0
418945,2083563,Private room,Belém,2,0.0,2,1.0,127.0
472183,2341627,Entire home/apt,Belém,64,4.5,4,1.0,67.0
635674,3168004,Entire home/apt,Belém,57,4.5,3,2.0,46.0
758927,3999677,Entire home/apt,Belém,10,5.0,4,2.0,69.0
783766,4132746,Private room,Belém,49,4.5,3,1.0,22.0
787134,3634413,Entire home/apt,Belém,48,4.0,5,3.0,58.0
820798,1756107,Entire home/apt,Belém,4,4.5,4,1.0,68.0
862418,4520597,Entire home/apt,Belém,84,4.5,4,1.0,46.0


In [32]:
df[df.neighborhood == 'Belém'].shape

(254, 8)

We can select the inverse of a condition by putting `~` in front of it.

For example, to select all listings that are not in Belém, we can do this:

In [33]:
df[~(df.neighborhood ==  "Belém")].shape

(12978, 8)
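
The difference in shape between `where` and bracket filtering, and the effect of `~`, can be reproduced on a small table. The sketch assumes a hypothetical four-row dataframe (`toy`).

```python
import pandas as pd

# Hypothetical toy data standing in for the Airbnb listings.
toy = pd.DataFrame(
    {"neighborhood": ["Belém", "Alvalade", "Belém", "Estrela"],
     "price": [57.0, 46.0, 45.0, 1154.0]},
    index=pd.Index([6499, 17031, 202654, 29720], name="room_id"),
)
in_belem = toy["neighborhood"] == "Belém"

same_shape = toy.where(in_belem)  # 4 rows, non-matching ones are NaN
only_belem = toy[in_belem]        # 2 rows
not_belem = toy[~in_belem]        # 2 rows, same as filtering with !=

# Dropping the all-NaN rows from `where` leaves the same rows as the bracket filter.
roundtrip = toy.where(in_belem).dropna(how="all")

print(same_shape.shape, only_belem.shape, not_belem.shape)
```

A condition and its negation split the dataframe into two parts whose row counts add up to the original, just as 254 + 12978 = 13232 above.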

# Multiple Selection

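We can filter a dataframe based on multiple conditions.

We can select rows that match all of several conditions by combining the conditions with `&`. Each condition must be wrapped in parentheses, because `&` binds more tightly than comparison operators such as `==` and `>`.

For example, if we want those listings in Belém with more than 3 bedrooms: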

In [34]:
df[(df.neighborhood == 'Belém') & (df.bedrooms > 3)]

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
2127428,8810620,Entire home/apt,Belém,17,5.0,9,5.0,184.0
5737018,29744618,Entire home/apt,Belém,46,4.5,10,4.0,128.0
6884183,11926451,Entire home/apt,Belém,1,0.0,10,5.0,138.0
9522737,46064752,Entire home/apt,Belém,27,5.0,8,4.0,78.0
15272166,17263208,Entire home/apt,Belém,1,0.0,6,4.0,288.0
17228964,101182323,Entire home/apt,Belém,2,0.0,16,7.0,260.0


In the same way, we can select rows that match one condition OR the other with the pipe (`|`):

In [35]:
df[(df.neighborhood == "Belém") | (df.neighborhood == "Benfica")]

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
6499,14455,Entire home/apt,Belém,8,5.0,2,1.0,57.0
202654,992647,Entire home/apt,Belém,54,4.0,2,1.0,45.0
212915,1097919,Private room,Benfica,0,0.0,6,1.0,93.0
418945,2083563,Private room,Belém,2,0.0,2,1.0,127.0
472183,2341627,Entire home/apt,Belém,64,4.5,4,1.0,67.0
573313,2820083,Entire home/apt,Benfica,12,4.5,4,2.0,74.0
635674,3168004,Entire home/apt,Belém,57,4.5,3,2.0,46.0
758927,3999677,Entire home/apt,Belém,10,5.0,4,2.0,69.0
783766,4132746,Private room,Belém,49,4.5,3,1.0,22.0
787134,3634413,Entire home/apt,Belém,48,4.0,5,3.0,58.0
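
Naming each condition first can make combined filters easier to read. This sketch assumes a hypothetical four-row dataframe (`toy`). The variable names are illustrative.

```python
import pandas as pd

# Hypothetical toy data standing in for the Airbnb listings.
toy = pd.DataFrame(
    {"neighborhood": ["Belém", "Belém", "Benfica", "Estrela"],
     "bedrooms": [5.0, 1.0, 2.0, 9.0]},
    index=pd.Index([2127428, 6499, 573313, 29720], name="room_id"),
)

in_belem = toy["neighborhood"] == "Belém"
in_benfica = toy["neighborhood"] == "Benfica"
is_large = toy["bedrooms"] > 3

large_in_belem = toy[in_belem & is_large]      # both must hold
belem_or_benfica = toy[in_belem | in_benfica]  # either may hold

# Inline conditions need parentheses, because & and | bind more tightly than == and >.
inline = toy[(toy["neighborhood"] == "Belém") & (toy["bedrooms"] > 3)]

print(large_in_belem.index.tolist())
print(belem_or_benfica.index.tolist())
```

Python's `and` and `or` keywords do not work element-wise on a Series, which is why `&` and `|` are used here.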


## Isnull/Notnull

Sometimes we simply want to select those rows where there are no null (`NaN`) values.

We can select rows where a column is null by doing `column.isnull()`.

In [36]:
df[df.overall_satisfaction.isnull()]

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
1424459,7665008,Entire home/apt,Misericórdia,63,,8,4.0,104.0
3112523,15814568,Entire home/apt,Misericórdia,1,,3,1.0,69.0
4460918,6317960,Entire home/apt,Arroios,75,,4,1.0,40.0
5569600,24291881,Entire home/apt,Misericórdia,151,,4,2.0,45.0
10257022,52731804,Entire home/apt,Santa Maria Maior,21,,2,0.0,46.0
10413859,53629861,Entire home/apt,Arroios,70,,4,1.0,58.0
11255567,56974375,Entire home/apt,Arroios,8,,2,1.0,52.0
15963308,55556833,Entire home/apt,Santo António,0,,3,2.0,97.0
17848221,119340757,Entire home/apt,Misericórdia,1,,4,1.0,45.0
19225610,62988799,Entire home/apt,Misericórdia,0,,2,0.0,74.0


Likewise, we can select those rows where a column is not null by using `notnull()`.

In [37]:
df[df.overall_satisfaction.notnull()].head()

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
6499,14455,Entire home/apt,Belém,8,5.0,2,1.0,57.0
17031,66015,Entire home/apt,Alvalade,0,0.0,2,1.0,46.0
25659,107347,Entire home/apt,Santa Maria Maior,63,5.0,3,1.0,69.0
29248,125768,Entire home/apt,Santa Maria Maior,225,4.5,4,1.0,58.0
29396,126415,Entire home/apt,Santa Maria Maior,132,5.0,4,1.0,67.0


An easy way to check whether any column has null values is `df.notnull().all()` (`all` returns True for a column only if every row matches the condition):

In [38]:
df.notnull().all()

host_id                  True
room_type                True
neighborhood             True
reviews                  True
overall_satisfaction    False
accommodates             True
bedrooms                 True
price                    True
dtype: bool

So we see that the column `overall_satisfaction` has some null values in it.

We can find the rows that have at least one null like this (`any` returns True if any value is True, and `axis=1` means we check across each row instead of down each column):

In [39]:
df[df.isnull().any(axis=1)]

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
1424459,7665008,Entire home/apt,Misericórdia,63,,8,4.0,104.0
3112523,15814568,Entire home/apt,Misericórdia,1,,3,1.0,69.0
4460918,6317960,Entire home/apt,Arroios,75,,4,1.0,40.0
5569600,24291881,Entire home/apt,Misericórdia,151,,4,2.0,45.0
10257022,52731804,Entire home/apt,Santa Maria Maior,21,,2,0.0,46.0
10413859,53629861,Entire home/apt,Arroios,70,,4,1.0,58.0
11255567,56974375,Entire home/apt,Arroios,8,,2,1.0,52.0
15963308,55556833,Entire home/apt,Santo António,0,,3,2.0,97.0
17848221,119340757,Entire home/apt,Misericórdia,1,,4,1.0,45.0
19225610,62988799,Entire home/apt,Misericórdia,0,,2,0.0,74.0
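
The null checks can be tried on a table where we know which values are missing. This sketch assumes a hypothetical three-row dataframe (`toy`) with two missing ratings, and compares the row filter with `dropna`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing ratings.
toy = pd.DataFrame(
    {"reviews": [8, 63, 1], "overall_satisfaction": [5.0, np.nan, np.nan]},
    index=pd.Index([6499, 1424459, 3112523], name="room_id"),
)

complete_columns = toy.notnull().all()  # one boolean per column
rows_with_null = toy[toy.isnull().any(axis=1)]
rows_without_null = toy[toy.notnull().all(axis=1)]

# dropna() with default arguments gives the same rows as the last filter.
same_as_dropna = rows_without_null.equals(toy.dropna())

print(complete_columns)
print(rows_with_null.index.tolist(), same_as_dropna)
```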


# Isin

We can check if an element belongs to a python list like this:

In [40]:
"potato" in ["potato", "tomato", "lettuce"]

True

We can use a similar approach with pandas dataframes using `.isin`. For example, if we want to select those listings where the neighborhood is in a specific list we can do it like this:

In [41]:
favorite_neighbourhoods = ["Belém", "Parque das Nações"]

listings_i_like = df[df.neighborhood.isin(favorite_neighbourhoods)]

listings_i_like.head()

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
6499,14455,Entire home/apt,Belém,8,5.0,2,1.0,57.0
105962,106839925,Entire home/apt,Parque das Nações,29,5.0,4,1.0,69.0
129822,639944,Private room,Parque das Nações,50,4.5,2,1.0,55.0
137434,671930,Entire home/apt,Parque das Nações,0,0.0,6,2.0,138.0
184888,887175,Entire home/apt,Parque das Nações,7,4.5,6,2.0,115.0
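
`isin` is a shorthand for several equality checks joined with `|`, and it can be negated with `~`. The sketch below assumes a hypothetical four-row dataframe (`toy`) and an illustrative `favorites` list.

```python
import pandas as pd

# Hypothetical toy data standing in for the Airbnb listings.
toy = pd.DataFrame(
    {"neighborhood": ["Belém", "Parque das Nações", "Arroios", "Belém"]},
    index=pd.Index([6499, 105962, 4460918, 202654], name="room_id"),
)
favorites = ["Belém", "Parque das Nações"]

liked = toy[toy["neighborhood"].isin(favorites)]
chained = toy[(toy["neighborhood"] == "Belém")
              | (toy["neighborhood"] == "Parque das Nações")]
others = toy[~toy["neighborhood"].isin(favorites)]

print(liked.index.tolist(), others.index.tolist())
```

The `isin` form stays readable as the list of values grows, whereas the chained version needs one more comparison per value.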


# Query

The method `.query` lets us select rows with a condition written as a string, in a syntax similar to a SQL `WHERE` clause.

In [42]:
df.query("neighborhood=='Belém' and price>150")

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
1803927,9456814,Entire home/apt,Belém,0,0.0,6,3.0,173.0
2127428,8810620,Entire home/apt,Belém,17,5.0,9,5.0,184.0
2608025,9456814,Entire home/apt,Belém,0,0.0,10,2.0,288.0
2945967,2083563,Entire home/apt,Belém,1,0.0,2,1.0,202.0
2993599,15262252,Private room,Belém,0,0.0,2,1.0,404.0
3024528,15263699,Entire home/apt,Belém,0,0.0,4,2.0,692.0
3473364,3953109,Entire home/apt,Belém,0,0.0,5,3.0,209.0
3566413,3455184,Entire home/apt,Belém,5,5.0,6,3.0,173.0
4101878,21277737,Entire home/apt,Belém,0,0.0,4,2.0,230.0
5245908,5776233,Entire home/apt,Belém,14,5.0,4,1.0,173.0
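
A `query` string selects the same rows as the equivalent boolean filter, and it can refer to local Python variables with an `@` prefix. This sketch assumes a hypothetical three-row dataframe (`toy`) and an invented `max_price` threshold.

```python
import pandas as pd

# Hypothetical toy data standing in for the Airbnb listings.
toy = pd.DataFrame(
    {"neighborhood": ["Belém", "Belém", "Estrela"],
     "price": [184.0, 57.0, 1154.0]},
    index=pd.Index([2127428, 6499, 29720], name="room_id"),
)

via_query = toy.query("neighborhood == 'Belém' and price > 150")
via_mask = toy[(toy["neighborhood"] == "Belém") & (toy["price"] > 150)]

# Local Python variables can be referenced with an @ prefix.
max_price = 100  # hypothetical threshold
affordable = toy.query("price <= @max_price")

print(via_query.index.tolist(), affordable.index.tolist())
```

Inside the query string, `and` and `or` can be used in place of `&` and `|`, and no parentheses are needed around simple comparisons.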


## Filtering based on datatypes

In [43]:
df.dtypes

host_id                   int64
room_type                object
neighborhood             object
reviews                   int64
overall_satisfaction    float64
accommodates              int64
bedrooms                float64
price                   float64
dtype: object

We can use the method `select_dtypes` to select those columns that have specific types. 

For example, if we want to select only the columns that are floats, we can do:

In [44]:
df.select_dtypes(include=[float]).head()

room_id,overall_satisfaction,bedrooms,price
6499,5.0,1.0,57.0
17031,0.0,1.0,46.0
25659,5.0,1.0,69.0
29248,4.5,1.0,58.0
29396,5.0,1.0,67.0


We can also use the parameter `exclude` to filter excluding certain data types. 

For example, if we want to exclude the columns that hold Python objects (string columns are stored as objects here), we can do so like this:

In [45]:
df.select_dtypes(exclude=[object]).head()

room_id,host_id,reviews,overall_satisfaction,accommodates,bedrooms,price
6499,14455,8,5.0,2,1.0,57.0
17031,66015,0,0.0,2,1.0,46.0
25659,107347,63,5.0,3,1.0,69.0
29248,125768,225,4.5,4,1.0,58.0
29396,126415,132,5.0,4,1.0,67.0
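
`select_dtypes` also accepts dtype names as strings, including the generic `"number"`, which covers both integers and floats. The sketch assumes a hypothetical two-row dataframe (`toy`).

```python
import pandas as pd

# Hypothetical toy data standing in for the Airbnb listings.
toy = pd.DataFrame(
    {"host_id": [14455, 66015],
     "room_type": ["Entire home/apt", "Private room"],
     "price": [57.0, 46.0]},
    index=pd.Index([6499, 17031], name="room_id"),
)

floats = toy.select_dtypes(include=["float64"])
numeric = toy.select_dtypes(include="number")      # ints and floats together
non_numeric = toy.select_dtypes(exclude="number")  # everything else

print(floats.columns.tolist())
print(numeric.columns.tolist())
print(non_numeric.columns.tolist())
```

Selecting by `"number"` is convenient before computing statistics, since it keeps only the columns on which arithmetic makes sense.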
