In [1]:
import pandas as pd

# Boolean Indexing

In [2]:
# Sample data

df = pd.DataFrame({
    "Name": ["Onkar", "Amit", "Sara", "Rohit", "Neha"],
    "Age": [21, 25, 23, 29, 20],
    "City": ["Pune", "Mumbai", "Nashik", "Pune", "Nagpur"],
    "Salary": [50000, 65000, 55000, 70000, 48000]
})

df

|   | Name  | Age | City   | Salary |
|---|-------|-----|--------|--------|
| 0 | Onkar | 21  | Pune   | 50000  |
| 1 | Amit  | 25  | Mumbai | 65000  |
| 2 | Sara  | 23  | Nashik | 55000  |
| 3 | Rohit | 29  | Pune   | 70000  |
| 4 | Neha  | 20  | Nagpur | 48000  |


## 1. Boolean Indexing -> Using a True/False array to select rows

A comparison such as `df["Age"] > 23` creates a True/False mask: a boolean Series with one value per row.

In [3]:
df["Age"] > 23

0    False
1     True
2    False
3     True
4    False
Name: Age, dtype: bool

When we pass that mask to the DataFrame (`df[mask]`), it returns only the rows where the mask is True.

In [4]:
df[df["Age"]>23]

|   | Name  | Age | City   | Salary |
|---|-------|-----|--------|--------|
| 1 | Amit  | 25  | Mumbai | 65000  |
| 3 | Rohit | 29  | Pune   | 70000  |
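The same mask also works with `.loc`, which lets you choose columns in the same step. A minimal sketch reusing the sample data; the column list `["Name", "City"]` is just an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Onkar", "Amit", "Sara", "Rohit", "Neha"],
    "Age": [21, 25, 23, 29, 20],
    "City": ["Pune", "Mumbai", "Nashik", "Pune", "Nagpur"],
    "Salary": [50000, 65000, 55000, 70000, 48000],
})

# Rows are picked by the boolean mask, columns by the list
result = df.loc[df["Age"] > 23, ["Name", "City"]]
print(result)
```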


## 2. Storing a mask in a variable

In [5]:
mask = df["Salary"] > 60000
mask

0    False
1     True
2    False
3     True
4    False
Name: Salary, dtype: bool

In [6]:
df[mask]

|   | Name  | Age | City   | Salary |
|---|-------|-----|--------|--------|
| 1 | Amit  | 25  | Mumbai | 65000  |
| 3 | Rohit | 29  | Pune   | 70000  |


Storing the mask is useful when you want to apply the same filter multiple times.
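A small sketch of that reuse: the mask is built once and then used to count, to filter one column, and to filter the whole DataFrame. The variable names `n_high`, `high_names` and `high_rows` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Onkar", "Amit", "Sara", "Rohit", "Neha"],
    "Age": [21, 25, 23, 29, 20],
    "City": ["Pune", "Mumbai", "Nashik", "Pune", "Nagpur"],
    "Salary": [50000, 65000, 55000, 70000, 48000],
})

mask = df["Salary"] > 60000  # built once

# Reuse 1: count matching rows (True counts as 1, False as 0)
n_high = int(mask.sum())
# Reuse 2: filter a single column
high_names = df.loc[mask, "Name"].tolist()
# Reuse 3: filter the whole DataFrame
high_rows = df[mask]

print(n_high, high_names)  # 2 ['Amit', 'Rohit']
```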

## 3. Combine multiple masks

In [8]:
mask1 = ((df["Age"]>23) & (df["City"]=="Pune"))
df[mask1]

|   | Name  | Age | City | Salary |
|---|-------|-----|------|--------|
| 3 | Rohit | 29  | Pune | 70000  |


In [9]:
mask2 = ((df["Age"]>23) & (df["Salary"]>60000))
df[mask2]

|   | Name  | Age | City   | Salary |
|---|-------|-----|--------|--------|
| 1 | Amit  | 25  | Mumbai | 65000  |
| 3 | Rohit | 29  | Pune   | 70000  |
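Besides `&` (and), masks can be combined with `|` (or). Each comparison is wrapped in parentheses because `&` and `|` bind more tightly than `>` and `==`. Python's own `and`/`or` cannot be used here, since they need a single True/False value rather than a whole Series. A minimal sketch of both points:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Onkar", "Amit", "Sara", "Rohit", "Neha"],
    "Age": [21, 25, 23, 29, 20],
    "City": ["Pune", "Mumbai", "Nashik", "Pune", "Nagpur"],
    "Salary": [50000, 65000, 55000, 70000, 48000],
})

# | means "or": rows from Pune OR earning more than 60000
either = df[(df["City"] == "Pune") | (df["Salary"] > 60000)]
print(either["Name"].tolist())  # ['Onkar', 'Amit', 'Rohit']

# `and` tries to turn the whole Series into one bool, which pandas refuses
try:
    df[(df["Age"] > 23) and (df["City"] == "Pune")]
    and_failed = False
except ValueError as err:
    and_failed = True
    print("ValueError:", err)
```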


In [12]:
mask3 = ~(df["Age"]<23)
df[mask3]

|   | Name  | Age | City   | Salary |
|---|-------|-----|--------|--------|
| 1 | Amit  | 25  | Mumbai | 65000  |
| 2 | Sara  | 23  | Nashik | 55000  |
| 3 | Rohit | 29  | Pune   | 70000  |


In [13]:
mask3 = ~(df["Age"]<=23)
df[mask3]

|   | Name  | Age | City   | Salary |
|---|-------|-----|--------|--------|
| 1 | Amit  | 25  | Mumbai | 65000  |
| 3 | Rohit | 29  | Pune   | 70000  |
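`~` flips every True to False and vice versa. With complete data, `~(df["Age"] < 23)` selects the same rows as `df["Age"] >= 23`. The sketch below checks that, and also shows one case (a missing value, `NaN`) where the two are not interchangeable; the small Series used here are illustrative:

```python
import numpy as np
import pandas as pd

ages = pd.Series([21, 25, 23, 29, 20])

# With no missing values, negating "<" gives the same mask as ">="
print((~(ages < 23)).equals(ages >= 23))  # True

# With a missing value they differ: NaN < 23 is False, so ~ turns it into True,
# while NaN >= 23 is also False
ages_nan = pd.Series([21, np.nan, 29])
print((~(ages_nan < 23)).tolist())  # [False, True, True]
print((ages_nan >= 23).tolist())    # [False, False, True]
```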


## 4. Mask with `.isin()`

In [14]:
mask = df["City"].isin(["Pune", "Mumbai"])
df[mask]

|   | Name  | Age | City   | Salary |
|---|-------|-----|--------|--------|
| 0 | Onkar | 21  | Pune   | 50000  |
| 1 | Amit  | 25  | Mumbai | 65000  |
| 3 | Rohit | 29  | Pune   | 70000  |


In [15]:
mask = ~df["City"].isin(["Pune", "Mumbai"])
df[mask]

|   | Name | Age | City   | Salary |
|---|------|-----|--------|--------|
| 2 | Sara | 23  | Nashik | 55000  |
| 4 | Neha | 20  | Nagpur | 48000  |
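`.isin()` is a shorter way to write several `==` checks joined with `|`. A quick sketch confirming the two masks match on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Onkar", "Amit", "Sara", "Rohit", "Neha"],
    "Age": [21, 25, 23, 29, 20],
    "City": ["Pune", "Mumbai", "Nashik", "Pune", "Nagpur"],
    "Salary": [50000, 65000, 55000, 70000, 48000],
})

via_isin = df["City"].isin(["Pune", "Mumbai"])
via_or = (df["City"] == "Pune") | (df["City"] == "Mumbai")

print(via_isin.equals(via_or))          # True
print(df[~via_isin]["City"].tolist())   # ['Nashik', 'Nagpur']
```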


## 5. Boolean mask from string operation

In [18]:
mask = df["Name"].str.contains("a", case=False)
df[mask]

|   | Name  | Age | City   | Salary |
|---|-------|-----|--------|--------|
| 0 | Onkar | 21  | Pune   | 50000  |
| 1 | Amit  | 25  | Mumbai | 65000  |
| 2 | Sara  | 23  | Nashik | 55000  |
| 4 | Neha  | 20  | Nagpur | 48000  |


In [19]:
mask = df["Name"].str.contains("a")
df[mask]

|   | Name  | Age | City   | Salary |
|---|-------|-----|--------|--------|
| 0 | Onkar | 21  | Pune   | 50000  |
| 2 | Sara  | 23  | Nashik | 55000  |
| 4 | Neha  | 20  | Nagpur | 48000  |
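Two `str.contains` details matter when the mask is used for filtering: missing values do not produce a plain True/False unless you pass `na=False`, and the pattern is treated as a regular expression by default (`regex=True`). A small sketch with a hypothetical list of names that includes a missing entry and a literal `.`:

```python
import pandas as pd

names = pd.Series(["Onkar", None, "Dr. Sara"])

# na=False treats the missing name as "no match", so the mask is purely True/False
mask = names.str.contains("a", case=False, na=False)
print(mask.tolist())  # [True, False, True]

# By default the pattern is a regex: "." matches any character
regex_dot = names.str.contains(".", na=False)
# regex=False matches a literal "." only
literal_dot = names.str.contains(".", regex=False, na=False)
print(regex_dot.tolist())    # [True, False, True]
print(literal_dot.tolist())  # [False, False, True]

print(names[mask].tolist())  # ['Onkar', 'Dr. Sara']
```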


## 6. Boolean mask with multiple column logic

In [24]:
mask = ((df["Salary"]>=50000) & (df["Salary"]<=65000) & (df["Age"]<28))
df[mask]

|   | Name  | Age | City   | Salary |
|---|-------|-----|--------|--------|
| 0 | Onkar | 21  | Pune   | 50000  |
| 1 | Amit  | 25  | Mumbai | 65000  |
| 2 | Sara  | 23  | Nashik | 55000  |
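For a range check like the salary condition above, `Series.between()` is a compact alternative; by default it includes both endpoints, matching `>=` and `<=`. A sketch comparing it with the longer form:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Onkar", "Amit", "Sara", "Rohit", "Neha"],
    "Age": [21, 25, 23, 29, 20],
    "City": ["Pune", "Mumbai", "Nashik", "Pune", "Nagpur"],
    "Salary": [50000, 65000, 55000, 70000, 48000],
})

# Long form, as in the cell above
original = (df["Salary"] >= 50000) & (df["Salary"] <= 65000) & (df["Age"] < 28)
# between() is inclusive on both ends by default
short = df["Salary"].between(50000, 65000) & (df["Age"] < 28)

print(short.equals(original))        # True
print(df[short]["Name"].tolist())    # ['Onkar', 'Amit', 'Sara']
```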


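## 7. Important rule

A boolean list or array used as a mask must have exactly as many values as the DataFrame has rows; if the lengths differ, pandas raises a `ValueError`. A boolean Series is matched to the DataFrame by index label, so it should share the DataFrame's index.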
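A quick sketch of the length rule: a hand-written list that is too short fails, while a mask built from the DataFrame's own column always lines up. `short_mask` is a hypothetical example:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Onkar", "Amit", "Sara", "Rohit", "Neha"],
    "Age": [21, 25, 23, 29, 20],
    "City": ["Pune", "Mumbai", "Nashik", "Pune", "Nagpur"],
    "Salary": [50000, 65000, 55000, 70000, 48000],
})

short_mask = [True, False, True]  # only 3 values for 5 rows
try:
    df[short_mask]
    length_error = False
except ValueError as err:
    length_error = True
    print("ValueError:", err)

# A mask derived from a column of df has one value per row
ok_mask = df["Age"] > 23
print(len(ok_mask) == len(df))  # True
```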

## 8. What boolean indexing really does

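It simply keeps the rows where the mask is True and drops the rest. The result is a new DataFrame that keeps the original index labels (for example 1 and 3 in the outputs above); `df` itself is not modified.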
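To make that concrete, the sketch below spells out the same selection by hand (collect the positions where the mask is True, then take those rows with `.iloc`) and checks it against `df[mask]`. This manual version is only for illustration, not how pandas implements it internally:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Onkar", "Amit", "Sara", "Rohit", "Neha"],
    "Age": [21, 25, 23, 29, 20],
    "City": ["Pune", "Mumbai", "Nashik", "Pune", "Nagpur"],
    "Salary": [50000, 65000, 55000, 70000, 48000],
})

mask = df["Age"] > 23

# By hand: positions where the mask is True, then take those rows
keep_positions = [i for i, keep in enumerate(mask) if keep]
manual = df.iloc[keep_positions]
print(manual.equals(df[mask]))  # True

# Original index labels are kept; reset_index(drop=True) renumbers from 0
print(df[mask].index.tolist())                         # [1, 3]
print(df[mask].reset_index(drop=True).index.tolist())  # [0, 1]

# df itself is unchanged
print(len(df))  # 5
```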

# Summary

Boolean indexing uses a True/False mask to select (filter) rows.