# Filtering the DataFrame with Boolean Arrays (Masks)

- Keep [this cheat sheet](https://www.craft.do/s/G80r1dqrQKrjTb/b/F80131CD-4914-414F-8B93-C03B5D1AFCD5/DataFrame) open as a reference while you work through the following exercises.

## Load the data

In [8]:
import plotly.express as px  # provides the gapminder sample dataset

df_countries = px.data.gapminder()
df_countries = df_countries.query('year == 2007')
df_countries = df_countries.set_index('country')
df_countries

country,continent,year,lifeExp,pop,gdpPercap,iso_alpha,iso_num
Afghanistan,Asia,2007,43.828,31889923,974.580338,AFG,4
Albania,Europe,2007,76.423,3600523,5937.029526,ALB,8
Algeria,Africa,2007,72.301,33333216,6223.367465,DZA,12
Angola,Africa,2007,42.731,12420476,4797.231267,AGO,24
Argentina,Americas,2007,75.320,40301927,12779.379640,ARG,32
...,...,...,...,...,...,...,...
Vietnam,Asia,2007,74.249,85262356,2441.576404,VNM,704
West Bank and Gaza,Asia,2007,73.422,4018332,3025.349798,PSE,275
"Yemen, Rep.",Asia,2007,62.698,22211743,2280.769906,YEM,887
Zambia,Africa,2007,42.384,11746035,1271.211593,ZMB,894


## Select only Asian countries in the DataFrame

### Create the mask of Trues and Falses based on the condition

In [11]:
mask = df_countries.continent == "Asia"
mask

country
Afghanistan            True
Albania               False
Algeria               False
Angola                False
Argentina             False
                      ...  
Vietnam                True
West Bank and Gaza     True
Yemen, Rep.            True
Zambia                False
Zimbabwe              False
Name: continent, Length: 142, dtype: bool
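
A mask is simply a boolean `Series` that shares the index of the DataFrame. The sketch below illustrates this on a hypothetical four-row table (`df_small` and `mask_small` are names made up for this example; the values are copied from rows shown in this notebook), so it runs without loading the full dataset.

```python
import pandas as pd

# Hypothetical mini-table; values copied from rows shown in this notebook.
df_small = pd.DataFrame(
    {
        "continent": ["Asia", "Europe", "Asia", "Europe"],
        "lifeExp": [43.828, 76.423, 82.603, 80.657],
    },
    index=pd.Index(["Afghanistan", "Albania", "Japan", "France"], name="country"),
)

# Comparing a column with a scalar is evaluated row by row.
mask_small = df_small["continent"] == "Asia"

# True counts as 1, so sum() gives the number of matching rows.
n_asia = int(mask_small.sum())

print(mask_small.tolist())  # [True, False, True, False]
print(n_asia)               # 2
```

Counting the `True` values before filtering is a quick sanity check that the condition matches the number of rows you expect.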

### Filter the DataFrame with the `mask` of Trues

In [12]:
df_Asia = df_countries[mask].copy()
df_Asia

country,continent,year,lifeExp,pop,gdpPercap,iso_alpha,iso_num
Afghanistan,Asia,2007,43.828,31889923,974.580338,AFG,4
Bahrain,Asia,2007,75.635,708573,29796.04834,BHR,48
Bangladesh,Asia,2007,64.062,150448339,1391.253792,BGD,50
Cambodia,Asia,2007,59.723,14131858,1713.778686,KHM,116
China,Asia,2007,72.961,1318683096,4959.114854,CHN,156
"Hong Kong, China",Asia,2007,82.208,6980412,39724.97867,HKG,344
India,Asia,2007,64.698,1110396331,2452.210407,IND,356
Indonesia,Asia,2007,70.65,223547000,3540.651564,IDN,360
Iran,Asia,2007,70.964,69453570,11605.71449,IRN,364
Iraq,Asia,2007,59.545,27499638,4471.061906,IRQ,368
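
Two related idioms may be useful here: `.loc[mask]` selects the same rows as `df[mask]`, and `~mask` inverts the mask. The sketch below uses a hypothetical mini-table (`df_small` and the other names are invented for illustration) and also shows what `.copy()` achieves: changes to the filtered subset do not reach the original.

```python
import pandas as pd

# Hypothetical mini-table; values copied from rows shown in this notebook.
df_small = pd.DataFrame(
    {
        "continent": ["Asia", "Europe", "Asia", "Europe"],
        "lifeExp": [43.828, 76.423, 82.603, 80.657],
    },
    index=pd.Index(["Afghanistan", "Albania", "Japan", "France"], name="country"),
)

mask_small = df_small["continent"] == "Asia"

# .loc with a boolean mask keeps the rows where the mask is True.
df_asia_small = df_small.loc[mask_small].copy()

# ~ flips every True to False and vice versa.
df_rest_small = df_small.loc[~mask_small].copy()

# Because of .copy(), editing the subset leaves df_small untouched.
df_asia_small["lifeExp"] = 0.0

print(list(df_asia_small.index))  # ['Afghanistan', 'Japan']
print(list(df_rest_small.index))  # ['Albania', 'France']
```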


## Which countries have a life expectancy greater than 80 years?

### Create the mask of Trues and Falses based on the condition

In [13]:
df_countries.columns

Index(['continent', 'year', 'lifeExp', 'pop', 'gdpPercap', 'iso_alpha',
       'iso_num'],
      dtype='object')

In [14]:
mask = df_countries.lifeExp > 80
mask

country
Afghanistan           False
Albania               False
Algeria               False
Angola                False
Argentina             False
                      ...  
Vietnam               False
West Bank and Gaza    False
Yemen, Rep.           False
Zambia                False
Zimbabwe              False
Name: lifeExp, Length: 142, dtype: bool

### Filter the DataFrame with the `mask` of Trues

In [15]:
df_Age = df_countries[mask].copy()
df_Age

country,continent,year,lifeExp,pop,gdpPercap,iso_alpha,iso_num
Australia,Oceania,2007,81.235,20434176,34435.36744,AUS,36
Canada,Americas,2007,80.653,33390141,36319.23501,CAN,124
France,Europe,2007,80.657,61083916,30470.0167,FRA,250
"Hong Kong, China",Asia,2007,82.208,6980412,39724.97867,HKG,344
Iceland,Europe,2007,81.757,301931,36180.78919,ISL,352
Israel,Asia,2007,80.745,6426679,25523.2771,ISR,376
Italy,Europe,2007,80.546,58147733,28569.7197,ITA,380
Japan,Asia,2007,82.603,127467972,31656.06806,JPN,392
New Zealand,Oceania,2007,80.204,4115771,25185.00911,NZL,554
Norway,Europe,2007,80.196,4627926,49357.19017,NOR,578
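
The same filter can also be written with `DataFrame.query`, which this notebook already uses to pick the year 2007. The following is a minimal sketch on a hypothetical mini-table (`df_small`, `df_by_mask` and `df_by_query` are illustrative names) that checks both spellings return the same rows.

```python
import pandas as pd

# Hypothetical mini-table; values copied from rows shown in this notebook.
df_small = pd.DataFrame(
    {
        "continent": ["Europe", "Asia", "Europe", "Oceania"],
        "lifeExp": [76.423, 82.603, 80.657, 81.235],
    },
    index=pd.Index(["Albania", "Japan", "France", "Australia"], name="country"),
)

# Boolean-mask spelling.
df_by_mask = df_small[df_small["lifeExp"] > 80]

# query spelling: the condition is passed as a string.
df_by_query = df_small.query("lifeExp > 80")

print(list(df_by_mask.index))          # ['Japan', 'France', 'Australia']
print(df_by_mask.equals(df_by_query))  # True
```

Which form to use is mostly a matter of taste; an explicit mask is handy when you want to inspect or reuse the boolean `Series` itself.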


## Countries from Asia with a life expectancy greater than 80 years

### Compute the masks

In [19]:
mask1 = (df_countries.lifeExp > 80) & (df_countries.continent == "Asia")
mask1

country
Afghanistan           False
Albania               False
Algeria               False
Angola                False
Argentina             False
                      ...  
Vietnam               False
West Bank and Gaza    False
Yemen, Rep.           False
Zambia                False
Zimbabwe              False
Length: 142, dtype: bool
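
Conditions on a `Series` are combined with the element-wise operator `&`, and each comparison is wrapped in parentheses because `&` binds more tightly than `>` or `==`. The sketch below, on a hypothetical mini-table (all names are invented for illustration), also shows that the Python keyword `and` does not work here: it asks for a single truth value of the whole `Series` and raises a `ValueError`.

```python
import pandas as pd

# Hypothetical mini-table; values copied from rows shown in this notebook.
df_small = pd.DataFrame(
    {
        "continent": ["Asia", "Asia", "Europe", "Asia"],
        "lifeExp": [43.828, 82.603, 80.657, 80.745],
    },
    index=pd.Index(["Afghanistan", "Japan", "France", "Israel"], name="country"),
)

is_old = df_small["lifeExp"] > 80
is_asia = df_small["continent"] == "Asia"

# Element-wise AND: True only where both conditions hold.
both = is_old & is_asia
print(both.tolist())  # [False, True, False, True]

# The keyword `and` needs one truth value per operand, which a Series cannot give.
and_failed = False
try:
    is_old and is_asia
except ValueError:
    and_failed = True
print(and_failed)  # True
```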

In [23]:
mask2 = (df_countries.lifeExp > 80) | (df_countries.continent == "Asia")
mask2

country
Afghanistan            True
Albania               False
Algeria               False
Angola                False
Argentina             False
                      ...  
Vietnam                True
West Bank and Gaza     True
Yemen, Rep.            True
Zambia                False
Zimbabwe              False
Length: 142, dtype: bool

### Filter the `DataFrame` based on multiple conditions

#### Intersection

In [21]:
df_Age = df_countries[mask1].copy()
df_Age

country,continent,year,lifeExp,pop,gdpPercap,iso_alpha,iso_num
"Hong Kong, China",Asia,2007,82.208,6980412,39724.97867,HKG,344
Israel,Asia,2007,80.745,6426679,25523.2771,ISR,376
Japan,Asia,2007,82.603,127467972,31656.06806,JPN,392


#### Union

In [24]:
df_Age = df_countries[mask2].copy()
df_Age

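
For a union, the DataFrame has to be indexed with one combined boolean `Series`, built with the element-wise operator `|`. The sketch below is a minimal illustration on a hypothetical mini-table (`df_small`, `df_union` and `df_both` are names made up for this example); it also checks that every row of the intersection appears in the union.

```python
import pandas as pd

# Hypothetical mini-table; values copied from rows shown in this notebook.
df_small = pd.DataFrame(
    {
        "continent": ["Asia", "Europe", "Asia", "Europe"],
        "lifeExp": [43.828, 76.423, 82.603, 80.657],
    },
    index=pd.Index(["Afghanistan", "Albania", "Japan", "France"], name="country"),
)

is_old = df_small["lifeExp"] > 80
is_asia = df_small["continent"] == "Asia"

# Union: rows that satisfy at least one of the conditions.
df_union = df_small[is_old | is_asia].copy()

# Intersection: rows that satisfy both conditions.
df_both = df_small[is_old & is_asia].copy()

print(list(df_union.index))  # ['Afghanistan', 'Japan', 'France']
print(list(df_both.index))   # ['Japan']

# Every row of the intersection is also part of the union.
print(bool(df_both.index.isin(df_union.index).all()))  # True
```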
