# Intermediate Python for Data Science

## Chapter 3 - Logic, Control Flow and Filtering

### Comparison Operators
Comparison operators tell you how two Python values relate, and each comparison results in a boolean.
* Less than ( < )
* Greater than ( > )
* Less than or equal to ( <= )
* Greater than or equal to ( >= )
* Equal to ( == )
* Not equal to ( != )

Comparison operators work on strings as well as numbers. Strings are compared alphabetically, so "carl" comes before "chris" and the statement below evaluates to True.

In [1]:
"carl" < "chris"

True
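More precisely, "alphabetical" here means a character-by-character comparison of Unicode code points, so letter case matters. A small sketch of that behaviour (the example strings are illustrative and not part of the course material):

```python
# Strings are compared character by character using Unicode code points.
print(ord("Z"), ord("a"))        # 90 97

# Every uppercase ASCII letter sorts before every lowercase one.
upper_first = "Zebra" < "apple"
print(upper_first)               # True

# Normalising case gives the dictionary-style ordering most people expect.
case_insensitive = "Zebra".lower() < "apple".lower()
print(case_insensitive)          # False
```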

What happens when you try to compare a string to an integer? You get an error. Python can't tell how two objects with different types relate.

In [2]:
3 < "chris"

TypeError: '<' not supported between instances of 'int' and 'str'
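If a script might meet mixed types, one option is to catch the error, and another is to convert both values to a common type first. A minimal sketch, assuming the illustrative variable names value and name:

```python
value = 3
name = "chris"

try:
    result = value < name
except TypeError as err:
    # Ordering comparisons between int and str are not defined.
    result = None
    print("Cannot compare:", err)

# Equality checks do not raise; values of unrelated types are simply unequal.
print(value == name)             # False

# Converting both sides to the same type makes the ordering comparison valid.
as_strings = str(value) < name   # compares "3" with "chris"
print(as_strings)                # True: digits sort before letters
```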

Different numeric types, such as floats and integers, are an exception: Python can compare them directly. In general, though, make sure you compare objects of the same type.

In [3]:
3 < 4.1

True

Another exception arises when we revisit the BMI example from Introduction to Python, Chapter 4, NumPy Arrays. When we compare the NumPy array bmi with the integer 23, NumPy figures out that you want to compare every element in the array with 23 and returns the corresponding booleans.

In [4]:
#Copy over BMI NumPy arrays from Introduction to Python
import numpy as np
height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]
np_height = np.array(height)
np_weight = np.array(weight)
bmi = np_weight/np_height**2

bmi > 23

array([False, False, False,  True, False])

Behind the scenes, NumPy behaves as if it builds a NumPy array of the same size filled with 23 and then performs an element-wise comparison.
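NumPy calls this mechanism broadcasting. The sketch below, which uses rounded BMI values purely for illustration, checks that comparing against a scalar gives the same result as comparing against an explicitly built array of 23s:

```python
import numpy as np

# Rounded BMI values, for illustration only.
bmi = np.array([21.85, 20.98, 21.75, 24.75, 21.44])

# Explicit version: build an array of 23s with the same shape, then compare.
threshold = np.full(bmi.shape, 23)
explicit = bmi > threshold

# Broadcast version: the scalar is treated as if it were stretched to bmi's shape.
broadcast = bmi > 23

print(np.array_equal(explicit, broadcast))  # True
```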

### Exercise 1

#### Equality
To check if two Python values, or variables, are equal you can use ==. To check for inequality, you need !=. As a refresher, have a look at the following examples that all result in True. Feel free to try them out in the IPython Shell.
<br>
2 == (1 + 1)<br>
"intermediate" != "python"<br>
True != False<br>
"Python" != "python"<br>
When you write these comparisons in a script, you will need to wrap a print() function around them to see the output.<br>
<br>
__Instructions:__
* In the editor on the right, write code to see if True equals False.
* Write Python code to check if -5 * 15 is not equal to 75.
* Ask Python whether the strings "pyscript" and "PyScript" are equal.
* What happens if you compare booleans and integers? Write code to see if True and 1 are equal.

In [1]:
#Compare booleans
print(True == False)

#Compare integers
print(-5*15 != 75)

#Compare strings
print('pyscript' == "PyScript")

#Compare booleans to integers
print(True == 1)

False
True
False
True
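The last result is True because, in Python, bool is a subclass of int: True behaves like 1 and False like 0. A short sketch of what that implies (the checks list is made up for illustration):

```python
# bool is a subclass of int: True behaves like 1 and False like 0.
print(isinstance(True, int))   # True
print(True + True)             # 2

# This makes counting matches easy: sum() adds up the True values.
checks = [3 < 4, "a" == "b", 10 >= 10]
n_true = sum(checks)
print(n_true)                  # 2
```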


#### Greater and less than
In the video, Filip also talked about the less than and greater than signs, < and > in Python. You can combine them with an equals sign: <= and >=. Pay attention: <= is valid syntax, but =< is not.
<br>
All Python expressions in the following code chunk evaluate to True:
<br>
3 < 4<br>
3 <= 4<br>
"alpha" <= "beta"<br>
Remember that for string comparison, Python determines the relationship based on alphabetical order.

__Instructions:__
* Write Python expressions, wrapped in a print() function, to check whether:
> x is greater than or equal to -10. x has already been defined for you.<br>
>"test" is less than or equal to y. y has already been defined for you.<br>
>True is greater than False.

In [1]:
#Compare integers
x = -3 * 6
print(x >= -10)

#Compare strings
y = 'test'
print( "test" <= y)

#Compare booleans
print(True > False)

False
True
True

#### Compare arrays
Out of the box, you can also use comparison operators with NumPy arrays.
<br>
Remember areas, the list of area measurements for different rooms in your house from Introduction to Python? This time there are two NumPy arrays: my_house and your_house. They both contain the areas for the kitchen, living room, bedroom and bathroom in the same order, so you can compare them.

__Instructions:__
* Using comparison operators, generate boolean arrays that answer the following questions:
 * Which areas in my_house are greater than or equal to 18?<br>
 * You can also compare two NumPy arrays element-wise. Which areas in my_house are smaller than the ones in your_house?<br>
 * Make sure to wrap both commands in a print() call so that you can inspect the output!

In [3]:
# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# my_house greater than or equal to 18
print(my_house >=18)

# my_house less than your_house
print(my_house < your_house)

[ True  True False False]
[False  True  True False]
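A boolean array can also be summarised rather than printed in full. The sketch below reuses the two arrays from the exercise and relies on the sum(), any() and all() methods of NumPy arrays:

```python
import numpy as np

my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

smaller = my_house < your_house       # element-wise comparison

n_smaller = int(smaller.sum())        # True counts as 1, so this counts matches
any_smaller = bool(smaller.any())     # is at least one room smaller?
all_smaller = bool(smaller.all())     # is every room smaller?

print(n_smaller, any_smaller, all_smaller)   # 2 True False
```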


### Boolean Operators

The next step in using comparison operators is to combine them using boolean operators. The 3 most common are and, or and not.

#### And Operator
The and operator takes two booleans and returns True only if both operands are True. This means that True and True will evaluate to True, but all other combinations of True and False will result in False.

In [8]:
True and True

True

In [9]:
False and True

False

In [10]:
True and False

False

In [11]:
False and False

False

Instead of using booleans, you can use results of comparisons. Suppose you have a variable x = 12. To check if this variable is greater than 5 and less than 15, use the and operator. Since both parts of the statement, before and after the and, will evaluate True, the result will be True.

In [12]:
x = 12
x > 5 and x < 15  

True
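Python also allows comparisons to be chained, which is a more compact way to write a range check like this one. A brief sketch showing that both forms agree:

```python
x = 12

# Explicit form with the and operator.
explicit = x > 5 and x < 15

# Chained form: Python reads this as (5 < x) and (x < 15).
chained = 5 < x < 15

print(explicit, chained)   # True True
```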

#### Or Operator
The or operator works similarly. The difference is that only one of the statements needs to be True for the result to be True. Like and, the or operator can also be used with the results of comparisons on variables.

In [13]:
True or True

True

In [14]:
False or True

True

In [15]:
True or False

True

In [16]:
False or False

False

In [17]:
y = 5 
y < 7 or y > 13

True

#### Not Operator
This operator negates the boolean value you use it on. 

In [19]:
not True

False

In [20]:
not False

True

The not operator is useful when you combine boolean operators and then want to negate that result. 
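As a sketch of that use, the example below negates a combined range check. The parentheses matter: without them, not would apply only to the first comparison, because not binds more tightly than and.

```python
x = 12

inside = x > 5 and x < 15          # True: x lies in the range
outside = not (x > 5 and x < 15)   # negate the whole combined result

# De Morgan's law gives an equivalent form without not.
outside_alt = x <= 5 or x >= 15

print(inside, outside, outside_alt)   # True False False
```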

#### NumPy Arrays and Comparison Operators
Combining comparison operators is different with NumPy arrays. Using the BMI example, let's say we want to find the BMIs greater than 21 and less than 22. Each individual comparison works as expected, but when you combine them with and, you get an error.

In [21]:
bmi > 21

array([ True, False,  True,  True,  True])

In [22]:
bmi < 22

array([ True,  True,  True, False,  True])

In [23]:
bmi > 21 and bmi < 22

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The NumPy documentation lists the following functions for combining comparisons on a NumPy array:
* logical_and()
* logical_or()
* logical_not()

In [24]:
np.logical_and(bmi > 21, bmi < 22)

array([ True, False,  True, False,  True])

To select the values of the array that satisfy the combined condition, place the comparison inside square brackets.

In [25]:
bmi[np.logical_and(bmi > 21, bmi < 22)]

array([21.85171573, 21.75028214, 21.44127836])
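NumPy arrays also support the element-wise operators & (and), | (or) and ~ (not), which are commonly used as shorthand for the logical_ functions on boolean arrays. The sketch below uses rounded BMI values for illustration and checks that both spellings give the same mask:

```python
import numpy as np

# Rounded BMI values, for illustration only.
bmi = np.array([21.85, 20.98, 21.75, 24.75, 21.44])

with_function = np.logical_and(bmi > 21, bmi < 22)

# The parentheses are required: & binds more tightly than > and <.
with_operator = (bmi > 21) & (bmi < 22)

print(np.array_equal(with_function, with_operator))  # True
print(bmi[with_operator])
```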

### Exercise 2 

#### and, or, not (1)
A boolean is either 1 or 0, True or False. With boolean operators such as and, or and not, you can combine these booleans to perform more advanced queries on your data.
<br>
In the sample code on the right, two variables are defined: my_kitchen and your_kitchen, representing areas.

__Instructions:__
* Write Python expressions, wrapped in a print() function, to check whether:
 * my_kitchen is bigger than 10 and smaller than 18.
 * my_kitchen is smaller than 14 or bigger than 17.
 * double the area of my_kitchen is smaller than triple the area of your_kitchen.

In [26]:
#Define variables
my_kitchen = 18.0
your_kitchen = 14.0

# my_kitchen bigger than 10 and smaller than 18?
print(10 < my_kitchen and my_kitchen < 18)

# my_kitchen smaller than 14 or bigger than 17?
print(my_kitchen < 14 or my_kitchen > 17)

# Double my_kitchen smaller than triple your_kitchen?
print(my_kitchen*2 < your_kitchen*3)

False
True
True


#### Boolean operators with Numpy
Before, the comparison operators like < and >= worked with NumPy arrays out of the box. Unfortunately, this is not true for the boolean operators and, or, and not.<br>
<br>
To use these operators with NumPy, you will need np.logical_and(), np.logical_or() and np.logical_not(). Here's an example on the my_house and your_house arrays from before to give you an idea:<br>
<br>
np.logical_and(my_house > 13, your_house < 15)<br>
<br>
__Instructions:__
* Generate boolean arrays that answer the following questions:
 * Which areas in my_house are greater than 18.5 or smaller than 10?
 * Which areas are smaller than 11 in both my_house and your_house?
 * Make sure to wrap both commands in a print() call so that you can inspect the output.

In [4]:
# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# my_house greater than 18.5 or smaller than 10
print(np.logical_or(10>my_house, my_house>18.5))

# Both my_house and your_house smaller than 11
print(np.logical_and(my_house<11, your_house<11))

[False  True False  True]
[False False False  True]


### if, elif, else

Comparison and boolean operators can be used to control how Python code behaves, depending on the outcome of a condition, through the conditional statements if, elif and else.

#### if
In the code below, the if statement checks whether z modulo 2 equals 0. Since z is 4, the condition is True, so the indented code on the line(s) below the colon will execute.

In [28]:
#Example of if conditional statement
z = 4
if z % 2 == 0:
    print('z is even')

z is even


The general recipe for an if statement is:

if CONDITION :
    EXPRESSION

If the condition is True, the expression is executed. Notice the colon at the end of the if line, and indent the following code by 4 spaces to tell Python what to do when the condition holds. To exit the if statement, continue with Python code that is not indented.

There can be multiple lines of code in the expression:

In [30]:
z = 4
if z % 2 == 0:
    print("checking " + str(z))
    print("z is even")

checking 4
z is even


If the condition does not hold, none of the indented lines are executed. We can see this if we change z to 5: the code below prints nothing.

In [31]:
z = 5
if z % 2 == 0:
    print("checking " + str(z))
    print("z is even")

Suppose you want to print "z is odd" when that is the case. You can simply add an else statement.

In [32]:
z = 5
if z % 2 == 0:
    print("checking " + str(z))
    print("z is even")
else: 
    print("z is odd")

z is odd


The general recipe for an if, else statement looks like this:
if condition:
    expression
else:
    expression

The else statement takes no condition. Its expression is executed whenever the condition of the if statement does not hold.

#### elif
There are cases where even more customized behavior is needed. Say you want different printouts for numbers that are divisible by 2 and for numbers that are divisible by 3. You can add an elif to check multiple conditions, one at a time.

In [33]:
z = 3
if z % 2 == 0:
    print("z is even")
elif z % 3 == 0:
    print("z is divisible by 3")
else: 
    print("z is neither divisible by 2 nor by 3")

z is divisible by 3


What happens if we change z to 6? Both conditions would hold, but as soon as Python finds a condition that is True, it executes the corresponding expression and skips the remaining branches, so the elif condition is never evaluated.

In [35]:
z = 6
if z % 2 == 0:
    print("z is divisible by 2")
elif z % 3 == 0:
    print("z is divisible by 3")
else: 
    print("z is neither divisible by 2 nor by 3")

z is divisible by 2
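Because only the first matching branch runs, the order of the conditions matters. One way to report both divisors is to test the most specific condition first. The sketch below wraps this in a hypothetical helper function, describe, which is not part of the course code:

```python
def describe(z):
    """Return a message about z; only the first matching branch runs."""
    if z % 6 == 0:                 # most specific condition goes first
        return "z is divisible by 2 and by 3"
    elif z % 2 == 0:
        return "z is divisible by 2"
    elif z % 3 == 0:
        return "z is divisible by 3"
    else:
        return "z is neither divisible by 2 nor by 3"

for z in [6, 4, 3, 5]:
    print(z, "->", describe(z))
```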


The general recipe for an if, elif, else statement looks like this:
if condition:
    expression
elif condition:
    expression
else:
    expression

### Exercise 3
#### if
It's time to take a closer look around in your house. Two variables are defined in the sample code: room, a string that tells you which room of the house we're looking at, and area, the area of that room.<br>
<br>
__Instructions:__
* Examine the if statement that prints out "looking around in the kitchen." if room equals "kit".
* Write another if statement that prints out "big place!" if area is greater than 15.

In [36]:
# Define variables
room = "kit"
area = 14.0

# if statement for room
if room == "kit" :
    print("looking around in the kitchen.")

# if statement for area
if area > 15:
    print("big place!")

looking around in the kitchen.


#### Add else
On the right, the if construct for room has been extended with an else statement so that "looking around elsewhere." is printed if the condition room == "kit" evaluates to False.<br>
<br>
Can you do a similar thing to add more functionality to the if construct for area?

__Instructions:__
* Add an else statement to the second control structure so that "pretty small." is printed out if area > 15 evaluates to False.

In [37]:
# Define variables
room = "kit"
area = 14.0

# if-else construct for room
if room == "kit" :
    print("looking around in the kitchen.")
else :
    print("looking around elsewhere.")

# if-else construct for area
if area > 15 :
    print("big place!")
else:
    print("pretty small.")

looking around in the kitchen.
pretty small.


#### Customize further: elif
It's also possible to have a look around in the bedroom. The sample code contains an elif part that checks if room equals "bed". In that case, "looking around in the bedroom." is printed out.<br>
<br>
It's up to you now! Make a similar addition to the second control structure to further customize the messages for different values of area.<br>
<br>
__Instructions:__
* Add an elif to the second control structure such that "medium size, nice!" is printed out if area is greater than 10.

In [38]:
# Define variables
room = "bed"
area = 14.0

# if-elif-else construct for room
if room == "kit" :
    print("looking around in the kitchen.")
elif room == "bed":
    print("looking around in the bedroom.")
else :
    print("looking around elsewhere.")

# if-elif-else construct for area
if area > 15 :
    print("big place!")
elif area > 10 :
    print("medium size, nice!")
else :
    print("pretty small.")

looking around in the bedroom.
medium size, nice!


### Filtering Pandas DataFrames
First, let's import the brics data from the csv file.

In [2]:
import pandas as pd
brics = pd.read_csv('C:\\datacamp\\02-IntermediatePython\\data\\brics.csv', index_col = 0)
brics

Unnamed: 0,country,capital,area,population
BR,Brazil,Brasilia,8.516,200.4
RU,Russia,Moscow,17.1,143.5
IN,India,New Delhi,3.286,1252.0
CH,China,Beijing,9.597,1357.0
SA,South Africa,Pretoria,1.221,52.98


Suppose now you want to select the countries with an area greater than 8 million square kilometers. There are 3 steps:
* Get the area column from brics as a Pandas Series, not a DataFrame
* Do the comparison on the area column and store the results
* Use the result to select the countries from the brics DataFrame

In [41]:
#Step 1: Get Column
brics['area']

BR     8.516
RU    17.100
IN     3.286
CH     9.597
SA     1.221
Name: area, dtype: float64

In [42]:
#Step 2: Do the Comparison by appending > 8
brics['area'] > 8

BR     True
RU     True
IN    False
CH     True
SA    False
Name: area, dtype: bool

In [43]:
#Store the Pandas Series as is_huge
is_huge = brics['area'] > 8

In [44]:
#Final step is to pass the boolean series within square brackets to subset the Pandas Dataframe
brics[is_huge]

Unnamed: 0,country,capital,area,population
BR,Brazil,Brasilia,8.516,200.4
RU,Russia,Moscow,17.1,143.5
CH,China,Beijing,9.597,1357.0


In [45]:
#All 3 steps can be combined
brics[brics["area"] > 8]

Unnamed: 0,country,capital,area,population
BR,Brazil,Brasilia,8.516,200.4
RU,Russia,Moscow,17.1,143.5
CH,China,Beijing,9.597,1357.0
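The same boolean Series can also be passed to .loc, which additionally lets you pick columns in the same step. The sketch below rebuilds the brics table from the values printed above so that it runs without the CSV file. The variable name huge_countries is illustrative.

```python
import pandas as pd

# Rebuild the brics table from the printed values, so no CSV file is needed.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.1, 3.286, 9.597, 1.221],
        "population": [200.4, 143.5, 1252.0, 1357.0, 52.98],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

is_huge = brics["area"] > 8            # boolean Series, one value per row

# .loc takes the same boolean Series, plus an optional column selection.
huge_countries = brics.loc[is_huge, "country"]
print(huge_countries.tolist())         # ['Brazil', 'Russia', 'China']
```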


Because Pandas is built on NumPy, the NumPy logical functions can also be used on Pandas Series. Simply import NumPy.

In [46]:
import numpy as np
np.logical_and(brics['area'] > 8, brics['area']<10)

BR     True
RU    False
IN    False
CH     True
SA    False
Name: area, dtype: bool

Place the conditional operator code above inside square brackets to subset the brics DataFrame.

In [47]:
brics[np.logical_and(brics['area'] > 8, brics['area']<10)]

Unnamed: 0,country,capital,area,population
BR,Brazil,Brasilia,8.516,200.4
CH,China,Beijing,9.597,1357.0
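Pandas Series also support the element-wise operators & (and), | (or) and ~ (not), which are often used in place of the np.logical_ functions. A minimal sketch on the area values shown above, with each comparison wrapped in parentheses because & binds more tightly than > and <:

```python
import pandas as pd

# Area values copied from the brics table above.
area = pd.Series([8.516, 17.1, 3.286, 9.597, 1.221],
                 index=["BR", "RU", "IN", "CH", "SA"], name="area")

# Same condition as np.logical_and(area > 8, area < 10).
mask = (area > 8) & (area < 10)
print(area[mask].index.tolist())   # ['BR', 'CH']

# ~ negates the mask element by element.
not_mid = ~mask
print(int(not_mid.sum()))          # 3
```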


### Exercise 4

#### Driving right (1)
Remember that cars dataset, containing the cars per 1000 people (cars_per_cap) and whether people drive right (drives_right) for different countries (country)? The code that imports this data in CSV format into Python as a DataFrame is available on the right.<br>
<br>
In the video, you saw a step-by-step approach to filter observations from a DataFrame based on boolean arrays. Let's start simple and try to find all observations in cars where drives_right is True.<br>
<br>
drives_right is a boolean column, so you'll have to extract it as a Series and then use this boolean Series to select observations from cars.<br>

__Instructions:__
* Extract the drives_right column as a Pandas Series and store it as dr.
* Use dr, a boolean Series, to subset the cars DataFrame. Store the resulting selection in sel.
* Print sel, and assert that drives_right is True for all observations.

In [1]:
# Import cars data
import pandas as pd
cars = pd.read_csv('C:\\datacamp\\02-IntermediatePython\\data\\cars.csv', index_col = 0)

# Extract drives_right column as Series: dr
dr = cars["drives_right"]

# Use dr to subset cars: sel
sel = cars[dr]

# Print sel
print(sel)

     cars_per_cap        country  drives_right
US            809  United States          True
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True


#### Driving right (2)
The code in the previous example worked fine, but you actually unnecessarily created a new variable dr. You can achieve the same result without this intermediate variable. Put the code that computes dr straight into the square brackets that select observations from cars.<br>

__Instructions:__
* Convert the code to a one-liner that calculates the variable sel as before.

In [49]:
#Convert code to one line
sel = cars[cars['drives_right']]
print(sel)

     cars_per_cap        country  drives_right
US            809  United States          True
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True


#### Cars per capita (1)
Let's stick to the cars data some more. This time you want to find out which countries have a high cars per capita figure. In other words, in which countries do many people have a car, or maybe multiple cars.<br>
<br>
Similar to the previous example, you'll want to build up a boolean Series, that you can then use to subset the cars DataFrame to select certain observations. If you want to do this in a one-liner, that's perfectly fine!<br>

__Instructions:__
* Select the cars_per_cap column from cars as a Pandas Series and store it as cpc.
* Use cpc in combination with a comparison operator and 500. You want to end up with a boolean Series that's True if the corresponding country has a cars_per_cap of more than 500 and False otherwise. Store this boolean Series as many_cars.
* Use many_cars to subset cars, similar to what you did before. Store the result as car_maniac.
* Print out car_maniac to see if you got it right.

In [52]:
#Create car_maniac: observations that have cars_per_cap > 500
many_cars = cars['cars_per_cap'] > 500
car_maniac = cars[many_cars]
print(car_maniac)

     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JAP           588          Japan         False


#### Cars per capita (2)
Remember np.logical_and(), np.logical_or() and np.logical_not(), the NumPy variants of the and, or and not operators? You can also use them on Pandas Series to do more advanced filtering operations.<br>
<br>
Take this example that selects the observations that have a cars_per_cap between 10 and 80. Try out these lines of code step by step to see what's happening.<br>
<br>
cpc = cars['cars_per_cap']<br>
between = np.logical_and(cpc > 10, cpc < 80)<br>
medium = cars[between]<br>

__Instructions:__
* Use the code sample above to create a DataFrame medium, that includes all the observations of cars that have a cars_per_cap between 100 and 500.
* Print out medium.

In [2]:
#Import NumPy
import numpy as np

#Create medium: observations with cars_per_cap between 100 and 500
cpc = cars['cars_per_cap']
between = np.logical_and(cpc > 100, cpc < 500)
medium = cars[between]
print(medium)

    cars_per_cap country  drives_right
RU           200  Russia          True
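The other two functions work the same way on Pandas Series. The sketch below is self-contained and uses only the rows visible in the outputs above, so it is an assumption that stands in for cars.csv, which may contain more countries. The names extremes and drives_left are illustrative.

```python
import numpy as np
import pandas as pd

# Only the rows visible in the outputs above; the real cars.csv may contain more.
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588, 200, 70, 45],
        "country": ["United States", "Australia", "Japan", "Russia", "Morocco", "Egypt"],
        "drives_right": [True, False, False, True, True, True],
    },
    index=["US", "AUS", "JAP", "RU", "MOR", "EG"],
)

cpc = cars["cars_per_cap"]

# Outside the 100-500 band: fewer than 100 or more than 500 cars per 1000 people.
extremes = cars[np.logical_or(cpc < 100, cpc > 500)]
print(extremes.index.tolist())     # ['US', 'AUS', 'JAP', 'MOR', 'EG']

# Countries where people drive on the left: negate the boolean column.
drives_left = cars[np.logical_not(cars["drives_right"])]
print(drives_left.index.tolist())  # ['AUS', 'JAP']
```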
