# Filtering the pandas.DataFrame with boolean arrays (masks)

## Data

In [1]:
import pandas as pd

df_countries = pd.read_csv('data/gapminder.csv', index_col=0)
df_countries

| country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|
| Afghanistan | Asia | 2007 | 43.828 | 31889923 | 974.580338 | AFG | 4 |
| Albania | Europe | 2007 | 76.423 | 3600523 | 5937.029526 | ALB | 8 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| Zambia | Africa | 2007 | 42.384 | 11746035 | 1271.211593 | ZMB | 894 |
| Zimbabwe | Africa | 2007 | 43.487 | 12311143 | 469.709298 | ZWE | 716 |


[**See `filter` instructions**](https://datons.craft.me/h3f5pSQSE7l6RW) to complete the following exercises.

## Single condition

**Exercise**: Filter countries from `Asia`.

### Categorical

#### Create mask

```python
mask = df['column'] == 'value'
```

In [2]:
mask_asia = df_countries['continent'] == 'Asia'
mask_asia

country
Afghanistan     True
Albania        False
               ...  
Zambia         False
Zimbabwe       False
Name: continent, Length: 142, dtype: bool
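
The mask is an ordinary boolean `Series` that shares the DataFrame's index, so it can be inspected before it is used for filtering. The sketch below illustrates this on `df_sample`, a small hand-built DataFrame (a hypothetical stand-in with a few values copied from this notebook, not the full `gapminder.csv` file).

```python
import pandas as pd

# Hypothetical mini-sample; values copied from the tables in this notebook
df_sample = pd.DataFrame(
    {
        "continent": ["Asia", "Europe", "Asia", "Africa"],
        "lifeExp": [43.828, 76.423, 82.603, 42.384],
    },
    index=pd.Index(["Afghanistan", "Albania", "Japan", "Zambia"], name="country"),
)

# Comparing a column with a value yields one True/False per row
mask_asia_sample = df_sample["continent"] == "Asia"

# True counts as 1, so sum() counts matching rows and mean() gives their share
n_asia = int(mask_asia_sample.sum())
share_asia = float(mask_asia_sample.mean())

print(mask_asia_sample)
print(n_asia, share_asia)
```

Checking `mask.sum()` first is a cheap way to confirm that a condition matches the number of rows you expect.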

#### Filter DataFrame with mask

```python
df[mask]
```

In [3]:
df_countries[mask_asia]

| country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|
| Afghanistan | Asia | 2007 | 43.828 | 31889923 | 974.580338 | AFG | 4 |
| Bahrain | Asia | 2007 | 75.635 | 708573 | 29796.04834 | BHR | 48 |
| Bangladesh | Asia | 2007 | 64.062 | 150448339 | 1391.253792 | BGD | 50 |
| Cambodia | Asia | 2007 | 59.723 | 14131858 | 1713.778686 | KHM | 116 |
| China | Asia | 2007 | 72.961 | 1318683096 | 4959.114854 | CHN | 156 |
| Hong Kong, China | Asia | 2007 | 82.208 | 6980412 | 39724.97867 | HKG | 344 |
| India | Asia | 2007 | 64.698 | 1110396331 | 2452.210407 | IND | 356 |
| Indonesia | Asia | 2007 | 70.65 | 223547000 | 3540.651564 | IDN | 360 |
| Iran | Asia | 2007 | 70.964 | 69453570 | 11605.71449 | IRN | 364 |
| Iraq | Asia | 2007 | 59.545 | 27499638 | 4471.061906 | IRQ | 368 |
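
Besides `df[mask]`, a mask can also be passed to `.loc`, which additionally lets you pick columns in the same step. The following is a minimal sketch on a hypothetical mini-sample (`df_sample`, with values copied from this notebook), not on the full dataset.

```python
import pandas as pd

# Hypothetical mini-sample; values copied from the tables in this notebook
df_sample = pd.DataFrame(
    {
        "continent": ["Asia", "Europe", "Asia", "Africa"],
        "lifeExp": [43.828, 76.423, 82.603, 42.384],
    },
    index=pd.Index(["Afghanistan", "Albania", "Japan", "Zambia"], name="country"),
)

mask = df_sample["continent"] == "Asia"

# Both forms keep the rows where the mask is True
by_brackets = df_sample[mask]
by_loc = df_sample.loc[mask]

# .loc also accepts a column selection after the mask
asia_life = df_sample.loc[mask, "lifeExp"]

print(by_loc)
print(asia_life)
```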


### Numerical

**Exercise**: Filter countries with `lifeExp` greater than 80 years.

#### Create mask

```python
mask = df['column'] > number
```

In [4]:
mask_80 = df_countries['lifeExp'] > 80
mask_80

country
Afghanistan    False
Albania        False
               ...  
Zambia         False
Zimbabwe       False
Name: lifeExp, Length: 142, dtype: bool
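
The same pattern works with the other comparison operators (`>=`, `<`, `<=`, `!=`), and `Series.between` covers a range in one call. A small sketch, again on a hypothetical hand-built sample (`df_sample`) rather than the full file:

```python
import pandas as pd

# Hypothetical mini-sample; values copied from the tables in this notebook
df_sample = pd.DataFrame(
    {"lifeExp": [43.828, 76.423, 82.603, 80.196]},
    index=pd.Index(["Afghanistan", "Albania", "Japan", "Norway"], name="country"),
)

# Greater than or equal to a threshold
mask_ge_80 = df_sample["lifeExp"] >= 80

# Values inside a range; both bounds are included by default
mask_70_to_80 = df_sample["lifeExp"].between(70, 80)

print(df_sample[mask_ge_80])
print(df_sample[mask_70_to_80])
```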

#### Filter DataFrame with mask

```python
df[mask]
```

In [5]:
df_countries[mask_80]

| country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|
| Australia | Oceania | 2007 | 81.235 | 20434176 | 34435.36744 | AUS | 36 |
| Canada | Americas | 2007 | 80.653 | 33390141 | 36319.23501 | CAN | 124 |
| France | Europe | 2007 | 80.657 | 61083916 | 30470.0167 | FRA | 250 |
| Hong Kong, China | Asia | 2007 | 82.208 | 6980412 | 39724.97867 | HKG | 344 |
| Iceland | Europe | 2007 | 81.757 | 301931 | 36180.78919 | ISL | 352 |
| Israel | Asia | 2007 | 80.745 | 6426679 | 25523.2771 | ISR | 376 |
| Italy | Europe | 2007 | 80.546 | 58147733 | 28569.7197 | ITA | 380 |
| Japan | Asia | 2007 | 82.603 | 127467972 | 31656.06806 | JPN | 392 |
| New Zealand | Oceania | 2007 | 80.204 | 4115771 | 25185.00911 | NZL | 554 |
| Norway | Europe | 2007 | 80.196 | 4627926 | 49357.19017 | NOR | 578 |


## Combine multiple conditions

**Exercise**: Filter countries from `Asia` with `lifeExp` greater than 80 years.

### Intersection

```python
mask = mask1 & mask2 # intersection (true on both conditions)
```

In [6]:
mask = mask_asia & mask_80
df_countries[mask]

| country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|
| Hong Kong, China | Asia | 2007 | 82.208 | 6980412 | 39724.97867 | HKG | 344 |
| Israel | Asia | 2007 | 80.745 | 6426679 | 25523.2771 | ISR | 376 |
| Japan | Asia | 2007 | 82.603 | 127467972 | 31656.06806 | JPN | 392 |
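
When the conditions are written inline instead of being stored in variables first, each comparison needs its own parentheses, because `&` binds more tightly than `==` and `>` in Python. The sketch below shows this on a hypothetical mini-sample (`df_sample`, values copied from this notebook).

```python
import pandas as pd

# Hypothetical mini-sample; values copied from the tables in this notebook
df_sample = pd.DataFrame(
    {
        "continent": ["Asia", "Asia", "Europe", "Africa"],
        "lifeExp": [43.828, 82.603, 80.196, 42.384],
    },
    index=pd.Index(["Afghanistan", "Japan", "Norway", "Zambia"], name="country"),
)

# Parentheses around each comparison are required with & and |
asia_over_80 = df_sample[
    (df_sample["continent"] == "Asia") & (df_sample["lifeExp"] > 80)
]

print(asia_over_80)
```

Note that the Python keywords `and` / `or` do not work element-wise on a `Series`; use `&` / `|` for masks.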


### Union

```python
mask = mask1 | mask2 # union (true on at least one condition)
```

In [7]:
mask = mask_asia | mask_80
df_countries[mask]

| country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|
| Afghanistan | Asia | 2007 | 43.828 | 31889923 | 974.580338 | AFG | 4 |
| Australia | Oceania | 2007 | 81.235 | 20434176 | 34435.36744 | AUS | 36 |
| Bahrain | Asia | 2007 | 75.635 | 708573 | 29796.04834 | BHR | 48 |
| Bangladesh | Asia | 2007 | 64.062 | 150448339 | 1391.253792 | BGD | 50 |
| Cambodia | Asia | 2007 | 59.723 | 14131858 | 1713.778686 | KHM | 116 |
| Canada | Americas | 2007 | 80.653 | 33390141 | 36319.23501 | CAN | 124 |
| China | Asia | 2007 | 72.961 | 1318683096 | 4959.114854 | CHN | 156 |
| France | Europe | 2007 | 80.657 | 61083916 | 30470.0167 | FRA | 250 |
| Hong Kong, China | Asia | 2007 | 82.208 | 6980412 | 39724.97867 | HKG | 344 |
| Iceland | Europe | 2007 | 81.757 | 301931 | 36180.78919 | ISL | 352 |
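
A mask can also be inverted with `~`, which is handy for selecting the rows that a union leaves out. This is a minimal sketch on a hypothetical mini-sample (`df_sample`, values copied from this notebook), shown only as a complement to `&` and `|`.

```python
import pandas as pd

# Hypothetical mini-sample; values copied from the tables in this notebook
df_sample = pd.DataFrame(
    {
        "continent": ["Asia", "Asia", "Europe", "Africa"],
        "lifeExp": [43.828, 82.603, 80.196, 42.384],
    },
    index=pd.Index(["Afghanistan", "Japan", "Norway", "Zambia"], name="country"),
)

mask_asia_sample = df_sample["continent"] == "Asia"
mask_80_sample = df_sample["lifeExp"] > 80

# Union: True where at least one condition holds
mask_union = mask_asia_sample | mask_80_sample

# Negation: True where neither condition holds
mask_neither = ~mask_union

print(df_sample[mask_union])
print(df_sample[mask_neither])
```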
