## V11: Filtering a pandas DataFrame

Get columns using:
    
    1. brics["area"] -- This returns a pandas Series
    2. brics.loc[:, "area"] -- This also returns a pandas Series; use brics.loc[:, ["area"]] to get a DataFrame
    3. brics.iloc[:, 2] -- This also returns a pandas Series; use brics.iloc[:, [2]] to get a DataFrame
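
Whether a selection comes back as a Series or a DataFrame depends on whether a single label or a list of labels is passed. The sketch below checks this on a small stand-in for `brics`; the column layout and the values are assumptions made for illustration, since the notes do not show the data itself.

```python
import pandas as pd

# Illustrative stand-in for brics; layout and values are assumptions
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# A single label (or position) selects one column as a Series
s1 = brics["area"]
s2 = brics.loc[:, "area"]
s3 = brics.iloc[:, 2]
print(all(isinstance(s, pd.Series) for s in (s1, s2, s3)))

# A list of labels (or positions) keeps the result a DataFrame
d1 = brics[["area"]]
d2 = brics.loc[:, ["area"]]
d3 = brics.iloc[:, [2]]
print(all(isinstance(d, pd.DataFrame) for d in (d1, d2, d3)))
```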
    
### Find countries with an area greater than 8 million km^2
We can do this in three steps:
  1. Select the area column
      * brics["area"]
  2. Do a comparison on the area column
      * brics["area"] > 8
  3. Use the resulting boolean Series to select countries
      * brics[brics["area"] > 8]
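
The three steps can be run one at a time to see each intermediate result. This is a minimal sketch on a stand-in for `brics`; the area values (in millions of km^2) are assumptions made for illustration.

```python
import pandas as pd

# Illustrative stand-in for brics; areas in millions of km^2 are assumptions
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Step 1: select the area column as a Series
area = brics["area"]

# Step 2: the comparison yields a boolean Series with the same index
is_large = area > 8
print(is_large)

# Step 3: the boolean Series keeps only the rows where it is True
large = brics[is_large]
print(large)
```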
  
            
### Using Logical Boolean operators on DataFrame:
        
        import numpy as np
        
        brics[np.logical_and(brics['area'] > 8, brics['area'] < 10)]
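
Each comparison has to be applied to the column itself, so the closing bracket of `brics['area']` comes before the `>`. The sketch below runs the filter on a stand-in for `brics` (values are assumptions) and also shows the equivalent pandas `&` operator, which needs parentheses around each comparison.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for brics; areas in millions of km^2 are assumptions
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Element-wise AND of two boolean Series
mask = np.logical_and(brics["area"] > 8, brics["area"] < 10)
mid_np = brics[mask]

# Same filter with the & operator; the parentheses are required
mid_op = brics[(brics["area"] > 8) & (brics["area"] < 10)]

print(mid_np)
print(mid_np.equals(mid_op))
```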
        
        
          

## Example 1: Driving right (1)
Remember that cars dataset, containing the cars per 1000 people (cars_per_cap) and whether people drive right (drives_right) for different countries (country)? The code that imports this data in CSV format into Python as a DataFrame is included in the script.

In the video, you saw a step-by-step approach to filter observations from a DataFrame based on boolean arrays. Let's start simple and try to find all observations in cars where drives_right is True.

drives_right is a boolean column, so you'll have to extract it as a Series and then use this boolean Series to select observations from cars.

### Steps: 
1. Extract the drives_right column as a Pandas Series and store it as dr.
2. Use dr, a boolean Series, to subset the cars DataFrame. Store the resulting selection in sel.
3. Print sel, and assert that drives_right is True for all observations.
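
The check in step 3 can also be written as an explicit assertion rather than done by eye. This is a minimal sketch that rebuilds `cars` inline from the values printed in this example's output, so it runs without `cars.csv`.

```python
import pandas as pd

# cars rebuilt inline from the printed output, in place of reading cars.csv
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
        "country": ["United States", "Australia", "Japan", "India",
                    "Russia", "Morocco", "Egypt"],
        "drives_right": [True, False, False, False, True, True, True],
    },
    index=["US", "AUS", "JPN", "IN", "RU", "MOR", "EG"],
)

dr = cars["drives_right"]   # boolean Series
sel = cars[dr]              # rows where dr is True

# Every selected row drives on the right, and no such row was dropped
assert sel["drives_right"].all()
assert len(sel) == dr.sum()
print(sel)
```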

In [1]:

import pandas as pd
import numpy as np

# Import cars data
cars = pd.read_csv('cars.csv', index_col = 0)
print(cars)

# Extract drives_right column as Series: dr
dr=cars.loc[:,'drives_right']
print(dr)
# Use dr to subset cars: sel
sel=cars[dr]

# Print sel
print(sel)


# Second way to implement it:

dr2 = cars['drives_right']
print('\nDrives_Right column:\n',dr2)

# 2. Use dr, a boolean Series, to subset the cars DataFrame. Store the resulting selection in sel

sel2 = cars[dr2]

print('Drives_right2 is true', sel2)

     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JPN           588          Japan         False
IN             18          India         False
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True
US      True
AUS    False
JPN    False
IN     False
RU      True
MOR     True
EG      True
Name: drives_right, dtype: bool
     cars_per_cap        country  drives_right
US            809  United States          True
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True

Drives_Right column:
 US      True
AUS    False
JPN    False
IN     False
RU      True
MOR     True
EG      True
Name: drives_right, dtype: bool
Drives_right2 is true      cars_per_cap        country  drives_right
US            809  United States          True
RU  

## Example 2: Driving right (2)
The code in the previous example worked fine, but you actually unnecessarily created a new variable dr. You can achieve the same result without this intermediate variable. Put the code that computes dr straight into the square brackets that select observations from cars.

### Steps:
Convert the code to a one-liner that calculates the variable sel as before.



In [2]:
import pandas as pd

cars21 = pd.read_csv('cars.csv')

sel21 = cars21[cars21['drives_right']]
print(sel21)

  Unnamed: 0  cars_per_cap        country  drives_right
0         US           809  United States          True
4         RU           200         Russia          True
5        MOR            70        Morocco          True
6         EG            45          Egypt          True


## Example 3: Cars per capita (1)
Let's stick to the cars data some more. This time you want to find out which countries have a high cars per capita figure. In other words, in which countries do many people have a car, or maybe multiple cars.

Similar to the previous example, you'll want to build up a boolean Series, that you can then use to subset the cars DataFrame to select certain observations. If you want to do this in a one-liner, that's perfectly fine!

### Steps:

1. Select the cars_per_cap column from cars as a Pandas Series and store it as cpc.
2. Use cpc in combination with a comparison operator and 500. You want to end up with a boolean Series that's True if the corresponding country has a cars_per_cap of more than 500 and False otherwise. Store this boolean Series as many_cars.
3. Use many_cars to subset cars, similar to what you did before. Store the result as car_maniac.
4. Print out car_maniac to see if you got it right.


In [10]:
cars3 = pd.read_csv('cars.csv', index_col=0)
print(cars3)

cpc = cars3['cars_per_cap']
many_cars = cpc> 500
car_maniac= cars3[many_cars]

print('\n\n\n',car_maniac)

     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JPN           588          Japan         False
IN             18          India         False
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True



      cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JPN           588          Japan         False


## Example 4: Cars per capita (2)
Remember about np.logical_and(), np.logical_or() and np.logical_not(), the NumPy variants of the and, or and not operators? You can also use them on Pandas Series to do more advanced filtering operations.

Take this example that selects the observations that have a cars_per_cap between 10 and 80. Try out these lines of code step by step to see what's happening.

    cpc = cars['cars_per_cap']
    between = np.logical_and(cpc > 10, cpc < 80)
    medium = cars[between]
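
The other two NumPy variants mentioned above are used the same way. The sketch below is illustrative only: `cars` is rebuilt inline from the output printed in the earlier examples, and the thresholds 50 and 700 are arbitrary choices, not part of the exercise.

```python
import numpy as np
import pandas as pd

# cars rebuilt inline from the printed output, in place of reading cars.csv
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
        "country": ["United States", "Australia", "Japan", "India",
                    "Russia", "Morocco", "Egypt"],
        "drives_right": [True, False, False, False, True, True, True],
    },
    index=["US", "AUS", "JPN", "IN", "RU", "MOR", "EG"],
)
cpc = cars["cars_per_cap"]

# logical_or: very low OR very high cars per capita (arbitrary thresholds)
extremes = cars[np.logical_or(cpc < 50, cpc > 700)]
print(extremes)

# logical_not: flip the boolean column to get countries that drive on the left
drives_left = cars[np.logical_not(cars["drives_right"])]
print(drives_left)
```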

### Steps:
1. Use the code sample provided to create a DataFrame medium, that includes all the observations of cars that have a cars_per_cap between 100 and 500.
2. Print out medium.

In [14]:
## print(cars)

# Build the boolean mask first, then use it to subset cars into medium
between = np.logical_and(cpc > 100, cpc < 500)
medium = cars[between]
print(medium)


    cars_per_cap country  drives_right
RU           200  Russia          True
