## Filtering Pandas DataFrames
You saw a step-by-step approach to filtering observations from a DataFrame based on boolean arrays. Let's start simple and try to find all observations in cars where drives_right is True.

In [1]:
# Pre-defined lists
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr =  [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]

# Import pandas as pd
import pandas as pd

# Create dictionary my_dict with three key:value pairs from the lists above
my_dict = {
    'country': names,
    'drives_right': dr,
    'cpc': cpc
}

# Build a DataFrame cars from my_dict: cars
cars = pd.DataFrame(my_dict)

# Print cars
print(cars)

         country  drives_right  cpc
0  United States          True  809
1      Australia         False  731
2          Japan         False  588
3          India         False   18
4         Russia          True  200
5        Morocco          True   70
6          Egypt          True   45


In [2]:
# Extract drives_right column as Series: dr
dr = cars['drives_right']

# Use dr to subset cars: sel
sel = cars[dr] 

# Print sel
print(sel)

         country  drives_right  cpc
0  United States          True  809
4         Russia          True  200
5        Morocco          True   70
6          Egypt          True   45


The code in the previous example works fine, but the intermediate variable dr is unnecessary. You can get the same result without it: put the expression that computes dr directly inside the square brackets that select observations from cars.

In [3]:
# Convert code to a one-liner
sel = cars[cars['drives_right']]


# Print sel
print(sel)

         country  drives_right  cpc
0  United States          True  809
4         Russia          True  200
5        Morocco          True   70
6          Egypt          True   45


Let's stick with the cars data a little longer. This time you want to find out which countries have a high cars per capita figure, stored in the cpc column. In other words: in which countries do many people have a car, or maybe multiple cars?

In [5]:
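# Extract the cpc (cars per capita) column as a Series: cpc
cpc = cars['cpc']

# Boolean Series that is True where cpc is over 500: many_cars
many_cars = cpc > 500

# Create car_maniac: observations that have a cpc over 500
car_maniac = cars[many_cars]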

# Print car_maniac
print(car_maniac)

         country  drives_right  cpc
0  United States          True  809
1      Australia         False  731
2          Japan         False  588
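
To see why this works, it can help to look at the intermediate boolean Series. The sketch below rebuilds a small cars DataFrame with the same values as above (so that it runs on its own) and shows that the comparison produces one True/False value per row. It then applies the same filter as a one-liner, following the pattern used earlier for drives_right.

```python
import pandas as pd

# Same data as the cars DataFrame above, rebuilt so this cell is self-contained
cars = pd.DataFrame({
    'country': ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt'],
    'drives_right': [True, False, False, False, True, True, True],
    'cpc': [809, 731, 588, 18, 200, 70, 45],
})

# The comparison is evaluated element-wise and returns a boolean Series
many_cars = cars['cpc'] > 500
print(many_cars.tolist())
# [True, True, True, False, False, False, False]

# Same filter as a one-liner, without intermediate variables
car_maniac = cars[cars['cpc'] > 500]
print(car_maniac['country'].tolist())
# ['United States', 'Australia', 'Japan']
```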


Remember np.logical_and(), np.logical_or() and np.logical_not(), the NumPy variants of the and, or and not operators? You can also use them on pandas Series to do more advanced filtering operations.

In [6]:
# Import numpy, you'll need this
import numpy as np

# Create medium: observations with cpc between 100 and 500
cpc = cars['cpc']
between = np.logical_and(cpc > 100, cpc < 500)
medium = cars[between]

# Print medium
print(medium)

  country  drives_right  cpc
4  Russia          True  200
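
The cell above only uses np.logical_and(). As a minimal sketch, the other two functions can be applied to the same data in the same way. The thresholds 50 and 700 are arbitrary values chosen for illustration, and the names extreme, left and medium_ops are hypothetical. The last line shows the element-wise operator form (&, with | and ~ as the counterparts for or and not); the parentheses around each comparison are needed because these operators bind more tightly than < and >.

```python
import numpy as np
import pandas as pd

# Same data as the cars DataFrame above, rebuilt so this cell is self-contained
cars = pd.DataFrame({
    'country': ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt'],
    'drives_right': [True, False, False, False, True, True, True],
    'cpc': [809, 731, 588, 18, 200, 70, 45],
})
cpc = cars['cpc']

# np.logical_or: very few (< 50) OR very many (> 700) cars per capita
# (illustrative thresholds, not from the original exercise)
extreme = cars[np.logical_or(cpc < 50, cpc > 700)]
print(extreme['country'].tolist())
# ['United States', 'Australia', 'India', 'Egypt']

# np.logical_not: countries where drives_right is False
left = cars[np.logical_not(cars['drives_right'])]
print(left['country'].tolist())
# ['Australia', 'Japan', 'India']

# Operator form of the np.logical_and filter used for medium
medium_ops = cars[(cpc > 100) & (cpc < 500)]
print(medium_ops['country'].tolist())
# ['Russia']
```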
