## Filtering Series and DataFrames

As with NumPy arrays, boolean arrays can be used to filter Series and DataFrames. A filter can be applied to a Series, to a DataFrame column (which is itself a Series), or to a whole DataFrame, provided the boolean array has the same length as the axis being filtered.
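As a minimal sketch of the sizing requirement, the example below filters a small Series with a boolean array of matching length, then tries one that is too short. The Series `s` and its values are made up for illustration. The exact exception type is an assumption about current pandas behavior, so the code catches both likely candidates.

```python
import numpy as np
import pandas as pd

# Hypothetical three-element Series used only for this illustration.
s = pd.Series([10, 20, 30], index=["x", "y", "z"])

# A boolean array with one entry per element keeps the True positions.
keep = np.array([True, False, True])
filtered = s[keep]

# A boolean array of the wrong length is rejected rather than silently padded.
try:
    s[np.array([True, False])]
    length_error = False
except (IndexError, ValueError):
    length_error = True
```

Here `filtered` holds the entries labeled `x` and `z`, and `length_error` ends up `True`.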

In [6]:
import pandas as pd
import numpy as np

#### Row Filtering

In [7]:
df1 = pd.DataFrame(np.random.rand(3, 4), columns=["a", "b", "c", "d"], index=[1, 2, 3])
df1

Unnamed: 0,a,b,c,d
1,0.817405,0.608123,0.938207,0.574088
2,0.637347,0.521438,0.705501,0.221501
3,0.23052,0.044124,0.197023,0.06227


##### Boolean Filtering

In [8]:
df1.loc[np.array([True, False, True])]

Unnamed: 0,a,b,c,d
1,0.817405,0.608123,0.938207,0.574088
3,0.23052,0.044124,0.197023,0.06227


##### Index Filtering

In [9]:
df1.loc[[1, 2]]

Unnamed: 0,a,b,c,d
1,0.817405,0.608123,0.938207,0.574088
2,0.637347,0.521438,0.705501,0.221501


#### Column Filtering

In [12]:
df1.loc[:, np.array([True, False, False, True])]

Unnamed: 0,a,d
1,0.817405,0.574088
2,0.637347,0.221501
3,0.23052,0.06227
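The column mask does not have to be typed by hand. As a sketch, it can be computed from the data itself, for example by keeping only the columns whose mean exceeds a threshold. The DataFrame `df` and the 0.5 threshold are illustrative assumptions.

```python
import pandas as pd

# Hypothetical data with fixed values so the result is reproducible.
df = pd.DataFrame({"a": [0.9, 0.7], "b": [0.1, 0.2], "c": [0.6, 0.8]})

# df.mean() returns one value per column, so the comparison yields
# a boolean Series indexed by column label.
col_mask = df.mean() > 0.5

# Use the mask on the column axis of .loc; all rows are kept.
selected = df.loc[:, col_mask]
```

With these values, `selected` contains columns `a` and `c`.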


#### Filtering By Comparison Mask

In [25]:
mask = df1 > 0.8
mask 

Unnamed: 0,a,b,c,d
1,True,False,True,False
2,False,False,False,False
3,False,False,False,False


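You can reduce a mask with `any` or `all`. With `axis=0` (the default) the reduction runs down each column and returns one value per column; with `axis=1` it runs across each row and returns one value per row.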
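To make the axis behavior concrete, here is a small sketch with fixed values instead of random ones. The DataFrame `df` and the names of the four results are hypothetical.

```python
import pandas as pd

# Hypothetical data chosen so every reduction gives a different answer.
df = pd.DataFrame({"a": [0.9, 0.1], "b": [0.85, 0.95]}, index=[1, 2])
mask = df > 0.8
# mask is:
#        a     b
# 1   True  True
# 2  False  True

any_by_row = mask.any(axis=1)  # is at least one value in the row True?
all_by_row = mask.all(axis=1)  # are all values in the row True?
any_by_col = mask.any(axis=0)  # is at least one value in the column True?
all_by_col = mask.all(axis=0)  # are all values in the column True?
```

The row-wise results are indexed by the row labels (`1`, `2`), and the column-wise results by the column names (`a`, `b`).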

In [17]:
mask.any(axis=1)

1     True
2    False
3    False
dtype: bool

In [20]:
mask.all()

a    False
b    False
c    False
d    False
dtype: bool

In [21]:
df1

Unnamed: 0,a,b,c,d
1,0.817405,0.608123,0.938207,0.574088
2,0.637347,0.521438,0.705501,0.221501
3,0.23052,0.044124,0.197023,0.06227


In [22]:
mask

Unnamed: 0,a,b,c,d
1,True,False,True,False
2,False,False,False,False
3,False,False,False,False


In [26]:
df1.loc[mask.any(axis=1), :]

Unnamed: 0,a,b,c,d
1,0.817405,0.608123,0.938207,0.574088


### Filtering Inline

#### Filtering based on individual values.

In [27]:
df2 = pd.DataFrame(np.random.rand(3, 4), columns=['a', 'b', 'c', 'd'], index=[1, 2, 3])
df2

Unnamed: 0,a,b,c,d
1,0.489635,0.727563,0.374323,0.924405
2,0.657168,0.85021,0.880022,0.422601
3,0.409413,0.584174,0.847283,0.734742


In [30]:
df2[df2 > 0.8]

Unnamed: 0,a,b,c,d
1,,,,0.924405
2,,0.85021,0.880022,
3,,,0.847283,
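Indexing a DataFrame with a boolean DataFrame of the same shape keeps every row and column and puts `NaN` wherever the mask is `False`. As a sketch, the example below shows that this matches `DataFrame.where` and that fully empty rows can then be dropped with `dropna`. The data and variable names are illustrative.

```python
import pandas as pd

# Hypothetical data with fixed values.
df = pd.DataFrame({"a": [0.1, 0.9, 0.2], "b": [0.85, 0.2, 0.3]}, index=[1, 2, 3])

# Element-wise filtering: shape is preserved, non-matching cells become NaN.
masked = df[df > 0.8]

# The same result written explicitly with where().
same = df.where(df > 0.8)

# Optionally drop rows in which no value passed the filter.
non_empty_rows = masked.dropna(how="all")
```

Row `3` has no value above 0.8, so it is the only row removed from `non_empty_rows`.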


#### Filtering based on column values.

In [31]:
df2["color"] = ["blue", "green", "blue"]
df2

Unnamed: 0,a,b,c,d,color
1,0.489635,0.727563,0.374323,0.924405,blue
2,0.657168,0.85021,0.880022,0.422601,green
3,0.409413,0.584174,0.847283,0.734742,blue


In [44]:
df2[df2["color"] == "blue"]

Unnamed: 0,a,b,c,d,color
1,0.489635,0.727563,0.374323,0.924405,blue
3,0.409413,0.584174,0.847283,0.734742,blue


In [33]:
df2.loc[df2["color"] == "green", 'b']

2    0.85021
Name: b, dtype: float64

##### Combining Filtering With Assignment.

In [35]:
df2.loc[df2['color'] == 'blue', 'b'] = 0
df2

Unnamed: 0,a,b,c,d,color
1,0.489635,0.0,0.374323,0.924405,blue
2,0.657168,0.85021,0.880022,0.422601,green
3,0.409413,0.0,0.847283,0.734742,blue
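The same pattern extends to more than one condition. The sketch below combines two column comparisons with `&` before assigning; `|` would express "or" instead. The DataFrame and the 0.45 threshold are illustrative assumptions.

```python
import pandas as pd

# Hypothetical data with fixed values.
df = pd.DataFrame(
    {"a": [0.4, 0.7, 0.5], "b": [0.6, 0.8, 0.9], "color": ["blue", "green", "blue"]},
    index=[1, 2, 3],
)

# Each comparison needs its own parentheses because & binds more
# tightly than == and >.
blue_and_large_a = (df["color"] == "blue") & (df["a"] > 0.45)

# Assign through .loc so the change is written to df itself.
df.loc[blue_and_large_a, "b"] = 0
```

Only row `3` is both blue and has `a` above 0.45, so only its `b` value is set to 0.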


#### Fellowship eligibility
Consider a DataFrame `grades` containing the grades earned by three friends (Bing, Mahima, and Otto) in four introductory university courses: Python 101, Statistics 101, Visualization 101, and Physics 101. The DataFrame is provided below:

In [36]:
grades = pd.DataFrame({'python_101' : [78, 87, 71],
                      'statistics_101' : [88, 67, 90],
                      'visualization_101' : [82, 76, 39],
                      'physics_101' : [65, 39, 56]},
                     index = ['Bing','Mahima','Otto'])
grades

Unnamed: 0,python_101,statistics_101,visualization_101,physics_101
Bing,78,88,82,65
Mahima,87,67,76,39
Otto,71,90,39,56


The university wants to analyze the students' grades in order to choose the recipient(s) of a fellowship. To receive the fellowship, a student must have a grade of at least 65 in every subject. Write code that returns `True` for each student who fulfills this requirement and `False` for each student who does not.

In [43]:
(grades >= 65).all(axis = 1)

Bing       True
Mahima    False
Otto      False
dtype: bool
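The boolean result can itself be used as a row filter. As a short sketch, the example below rebuilds `grades` so it runs on its own and then pulls out the names and grades of the eligible students. The names `eligible`, `recipient_grades`, and `recipients` are introduced here for illustration.

```python
import pandas as pd

grades = pd.DataFrame(
    {
        "python_101": [78, 87, 71],
        "statistics_101": [88, 67, 90],
        "visualization_101": [82, 76, 39],
        "physics_101": [65, 39, 56],
    },
    index=["Bing", "Mahima", "Otto"],
)

# One boolean per student: True only if every grade is at least 65.
eligible = (grades >= 65).all(axis=1)

# Filter the rows with the mask, then read the names off the index.
recipient_grades = grades[eligible]
recipients = recipient_grades.index.tolist()
```

With the grades above, `recipients` contains only `"Bing"`.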