# Python day 2

## Making choices

In our last lesson, we discovered something suspicious was going on in our inflammation data by drawing some plots. How can we use Python to automatically recognize the different features we saw, and take a different action for each? In this lesson, we’ll learn how to write code that runs only when certain conditions are true.

In [1]:
# if-statement
num = 37
if num > 100:
    print("greater")
else:
    print("not greater")
print("done")

not greater
done


In [2]:
# If-statements don't require an else
num = 57
print("before conditional")
if num > 100:
    print(num, "is greater than 100")
print("after conditional")

before conditional
after conditional


Read conditional statements carefully and check which cases they handle and which cases you want them to handle. In the example above, nothing is printed about `num` because `num > 100` is false and there is no `else` branch. Sometimes this is the behaviour you want; at other times you are missing specific conditions:

- What if `num` is less than or equal to 100?
- What if `num` is complex?
- What if `num` is negative?
- What if `num` is a string?
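
One way to answer these questions is to test each case explicitly. The sketch below uses a hypothetical helper, `compare_to_100`, which is not part of the lesson. It assumes we want to reject anything that is not an ordinary integer or float before comparing, because Python 3 raises a `TypeError` when a string or a complex number is compared to an integer with `>`. The `isinstance` check is also not covered in this lesson.

```python
def compare_to_100(num):
    """Describe how num relates to 100 (hypothetical helper for illustration)."""
    # Strings and complex numbers cannot be ordered against 100,
    # so handle them before any comparison is attempted.
    if not isinstance(num, (int, float)):
        return "cannot be compared with 100"
    elif num > 100:
        return "greater than 100"
    elif num == 100:
        return "equal to 100"
    else:
        # Remaining integers and floats, including negative ones.
        return "less than 100"


for value in [157, 100, 57, -3, "57", 2 + 3j]:
    print(repr(value), compare_to_100(value))
```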

In [7]:
# Chain multiple if-statements
num = -3

if num > 0:
    print(num, "is positive")
elif num == 0:
    print(num, "is zero")
else:
    print(num, "is negative")
    

-3 is negative


We can also combine tests using `and` and `or`. `and` is only true if both parts are true:

In [8]:
# Combine comparisons with and
if (1 > 0) and (-1 >= 0):
    print("both parts are true")
else:
    print("at least one part is false")

at least one part is false


In [9]:
# Combine with or
if (1 > 0) or (-1 >= 0):
    print("at least one test is true")

at least one test is true


## Checking our data

1. Let's rerun the `inflammation_analysis.ipynb` 
1. Discuss data 

Let's catch the suspicious data


From the first couple of plots, we saw that maximum daily inflammation exhibits a strange behavior and rises by one unit a day. Wouldn’t it be a good idea to detect such behavior and report it as suspicious? Let’s do that! First, we should obtain the suspicious data:

In [10]:
import numpy

In [11]:
data = numpy.loadtxt(fname="data/inflammation-01.csv", delimiter=",")

In [12]:
max_inflammation = numpy.max(data, axis=0)
print(max_inflammation)

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17.
 18. 19. 20. 19. 18. 17. 16. 15. 14. 13. 12. 11. 10.  9.  8.  7.  6.  5.
  4.  3.  2.  1.]


However, instead of checking every single day of the study, let’s merely check if maximum inflammation in the beginning (day 0) and in the middle (day 20) of the study are equal to the corresponding day numbers.

In [14]:
# Check that day_0 == 0 and day_20 == 20
if (max_inflammation[0] == 0) and (max_inflammation[20] == 20):
    print("Suspicious looking maxima!")

Suspicious looking maxima!
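
Checking every day is also possible. The following is a sketch under the assumption that the suspicious pattern is exactly the ramp printed above: up from 0 to 20, then back down to 1. The values are rebuilt with `numpy.arange` instead of being loaded from the file so that the example runs on its own, and `numpy.array_equal` does the element-by-element comparison.

```python
import numpy

# Rebuild the maxima printed above: 0, 1, ..., 20, then 19, 18, ..., 1.
rising = numpy.arange(0, 21)
falling = numpy.arange(19, 0, -1)
max_inflammation = numpy.concatenate([rising, falling]).astype(float)

# The pattern we consider suspicious: the value equals the day number
# up to day 20, then falls by one per day.
days = numpy.arange(max_inflammation.size)
expected = numpy.where(days <= 20, days, 40 - days)

if numpy.array_equal(max_inflammation, expected):
    print("Maxima follow the ramp on every day")
else:
    print("Maxima deviate from the ramp on at least one day")
```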


In [15]:
data = numpy.loadtxt(fname="data/inflammation-03.csv", delimiter=",")

In [16]:
min_inflammation = numpy.min(data, axis=0)
print(min_inflammation)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


We also saw a different problem in the third dataset: the minima per day were all zero (it looks like a healthy person snuck into our study). We can check for this by testing whether the minima add up to zero:

In [17]:
# Let's use sum == 0
if numpy.sum(min_inflammation) == 0:
    print("Minima add up to zero")

Minima add up to zero


### Optional: Let's test all our datasets

1. Find files
1. Create for-loop over filenames
1. Load data
1. Test data

In [43]:
import glob
import numpy

filenames = glob.glob('data/inflammation*.csv')
for filename in filenames:
    data = numpy.loadtxt(fname=filename, delimiter=',')
    
    # Data to test
    max_inflammation = numpy.max(data, axis=0)
    min_inflammation = numpy.min(data, axis=0)
    
    if (max_inflammation[0] == 0) and (max_inflammation[20] == 20):
        print('Suspicious looking maxima in:', filename) 
    elif numpy.sum(min_inflammation) == 0:
        print('Minima add up to zero in:', filename) 
    else:
        print(filename, ' looks OK') 

Suspicious looking maxima in: data\inflammation-01.csv
Suspicious looking maxima in: data\inflammation-02.csv
Minima add up to zero in: data\inflammation-03.csv
Suspicious looking maxima in: data\inflammation-04.csv
Suspicious looking maxima in: data\inflammation-05.csv
Suspicious looking maxima in: data\inflammation-06.csv
Suspicious looking maxima in: data\inflammation-07.csv
Minima add up to zero in: data\inflammation-08.csv
Suspicious looking maxima in: data\inflammation-09.csv
Suspicious looking maxima in: data\inflammation-10.csv
Minima add up to zero in: data\inflammation-11.csv
Suspicious looking maxima in: data\inflammation-12.csv


In this way, we have asked Python to do something different depending on the condition of our data. Here we printed messages in all cases, but we could also imagine not using the `else` catch-all so that messages are only printed when something is wrong, freeing us from having to manually examine every plot for features we’ve seen before.

## Creating functions

At this point, we’ve written code to draw some interesting features in our inflammation data, loop over all our data files to quickly draw these plots for each of them, and have Python make decisions based on what it sees in our data. But, our code is getting pretty long and complicated; what if we had thousands of datasets, and didn’t want to generate a figure for every single one?

Cutting and pasting it is going to make our code get very long and very repetitive, very quickly. We’d like a way to package our code so that it is easier to reuse, and Python provides for this by letting us define things called `functions`, a shorthand way of re-executing longer pieces of code. Let’s start by defining a function `fahr_to_celsius` that converts temperatures from Fahrenheit to Celsius:

In [18]:
def fahr_to_celsius(temp_F):
    temp_C = (temp_F - 32) * (5/9)
    return temp_C

Let’s try running our function. This command calls our function with 32 as the input and returns the result. In fact, calling our own function is no different from calling any other function:

In [19]:
fahr_to_celsius(32)

0.0

Now that we’ve seen how to turn Fahrenheit into Celsius, we can also write the function to turn Celsius into Kelvin:

In [20]:
# Second example to convert Celsius to Kelvin
def celsius_to_kelvin(temp_C):
    return temp_C + 273.15

In [21]:
print("freezing point of water in Kelvin:", celsius_to_kelvin(0))

freezing point of water in Kelvin: 273.15


What about converting Fahrenheit to Kelvin? We could write out the formula, but we don’t need to. Instead, we can compose the two functions we have already created:

In [24]:
def fahr_to_kelvin(temp_F):
    temp_C = fahr_to_celsius(temp_F)
    temp_K = celsius_to_kelvin(temp_C)
    return temp_K

In [25]:
print("boiling point of water in Kelvin:", fahr_to_kelvin(212))

boiling point of water in Kelvin: 373.15


This is our first taste of how larger programs are built: we define basic operations, then combine them in ever-larger chunks to get the effect we want. Real-life functions will usually be larger than the ones shown here (typically half a dozen to a few dozen lines), but they shouldn’t ever be much longer than that, or the next person who reads them won’t be able to understand what’s going on. As a general guideline, a function should perform one task.

### Variable scope

In composing our temperature conversion functions, we created variables inside of those functions: `temp_C`, `temp_F`, and `temp_K`. We refer to these as local variables because they no longer exist once the function is done executing. If we try to access their values outside of the function, we will encounter an error:

In [28]:
print("Temperature in Kelvin was:", temp_K)

NameError: name 'temp_K' is not defined


If you want to reuse the temperature in Kelvin after you have calculated it with `fahr_to_kelvin`, you can store the result of the function call in a variable:

In [27]:
temp_K = fahr_to_kelvin(212)
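
To see the scoping rule in action without the session stopping on an error, here is a small sketch. It redefines `fahr_to_kelvin` in one piece so the example is self-contained, and it wraps the access in `try`/`except`, which is not covered in this lesson and is used here only to report the `NameError`. The names `result_K` and `local_leaked` are made up for illustration.

```python
def fahr_to_kelvin(temp_F):
    # temp_C exists only while the function is running.
    temp_C = (temp_F - 32) * (5 / 9)
    return temp_C + 273.15


# The returned value is kept because we store it in a variable.
result_K = fahr_to_kelvin(212)
print("Stored result:", result_K)

try:
    # temp_C was local to the function, so this name is unknown here.
    print(temp_C)
    local_leaked = True
except NameError as error:
    local_leaked = False
    print("NameError:", error)
```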

# Modular Code

1. Update `inflammation_analysis.ipynb` by introducing functions
2. Create a new script called `processing.py` and copy the functions there
3. Create a new file called `inflammation_analysis_refactored.ipynb` and import our functions to run the analysis
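
As a starting point for step 2, `processing.py` might contain something like the sketch below. The function name `detect_problems` and its return messages are assumptions made for illustration; the checks themselves are the ones used in the loop over all datasets earlier in this lesson.

```python
import numpy


def detect_problems(data):
    """Return a short message describing the first problem found in data.

    Hypothetical helper: data is a 2D array with one row per patient
    and one column per day, as returned by numpy.loadtxt.
    """
    max_inflammation = numpy.max(data, axis=0)
    min_inflammation = numpy.min(data, axis=0)

    if (max_inflammation[0] == 0) and (max_inflammation[20] == 20):
        return "Suspicious looking maxima"
    elif numpy.sum(min_inflammation) == 0:
        return "Minima add up to zero"
    else:
        return "Looks OK"
```

In the refactored notebook, the function could then be brought in with `from processing import detect_problems` and called as `print(detect_problems(data))` after loading a file.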

## Errors and exceptions
Every programmer encounters errors, both those who are just beginning, and those who have been programming for years. Encountering errors and exceptions can be very frustrating at times, and can make coding feel like a hopeless endeavour. However, understanding what the different types of errors are and when you are likely to encounter them can help a lot. Once you know why you get certain types of errors, they become much easier to fix.

In [31]:
def favorite_ice_cream():
    ice_cream = ["chocolate", "vanilla", "strawberry"]
    print(ice_cream[3])
    
favorite_ice_cream()

IndexError: list index out of range

The full traceback for this error (abbreviated above to its last line) has two levels. You can determine the number of levels by counting the arrows on the left-hand side. In this case:

1. The first shows code from the cell above, with an arrow pointing to Line 5 (which is `favorite_ice_cream()`).

1. The second shows some code in the function `favorite_ice_cream`, with an arrow pointing to Line 3 (which is `print(ice_cream[3])`).

The last level is the actual place where the error occurred. The other level(s) show what function the program executed to get to the next level down. So, in this case, the program first performed a function call to the function `favorite_ice_cream`. Inside this function, the program encountered an error when it tried to run the code `print(ice_cream[3])`.

**Long Tracebacks**  
Sometimes, you might see a traceback that is very long – sometimes they might even be 20 levels deep! This can make it seem like something horrible happened, but the length of the error message does not reflect its severity; rather, it indicates that your program called many functions before it encountered the error. Most of the time, the actual place where the error occurred is at the bottom-most level, so you can skip down the traceback to the bottom.

So what error did the program actually encounter? In the last line of the traceback, Python helpfully tells us the category or type of error (in this case, it is an `IndexError`) and a more detailed error message (in this case, it says “list index out of range”).
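
The message makes sense once we count the items: the list has three elements, so its valid non-negative indices are 0, 1 and 2, and index 3 points past the end. As a sketch of how the conditionals from earlier could guard against this, the hypothetical variant below, `pick_ice_cream`, takes the index as a parameter (the original function takes none) and checks it against `len` before using it.

```python
def pick_ice_cream(index):
    """Return the flavor at index, or None if index is out of range."""
    ice_cream = ["chocolate", "vanilla", "strawberry"]
    # Non-negative indices are valid from 0 up to, but not including,
    # len(ice_cream).
    if 0 <= index < len(ice_cream):
        return ice_cream[index]
    else:
        return None


print(pick_ice_cream(2))
print(pick_ice_cream(3))
```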

If you encounter an error and don’t know what it means, it is still important to read the traceback closely. That way, if you fix the error, but encounter a new one, you can tell that the error changed. Additionally, sometimes knowing where the error occurred is enough to fix it, even if you don’t entirely understand the message.

# Defensive programming

The previous lessons have introduced the basic tools of programming: variables and lists, file I/O, loops, conditionals, and functions. What they haven’t done is show us how to tell whether a program is getting the right answer, and how to tell if it’s still getting the right answer as we make changes to it.

With that, I have some bad news for you: You will make mistakes! However, with that fact also comes knowledge. The first step toward getting the right answers from our programs is to assume that mistakes will happen and to guard against them. 

This is called defensive programming, and the most common way to do it is to add **assertions** to our code so that it checks itself as it runs. An assertion is simply a statement that something must be true at a certain point in a program. When Python sees one, it evaluates the assertion’s condition. If it’s true, Python does nothing, but if it’s false, Python halts the program immediately and prints the error message if one is provided. The examples below show an assertion that passes, an assertion that fails, and a loop that would halt as soon as it encountered a value that isn’t positive:

In [32]:
# Some assertion examples
assert 0 < 1

In [34]:
assert 0 > 1, "0 should be smaller than 1"

AssertionError: 0 should be smaller than 1

In [37]:
# Calculate sum of positive numbers
numbers = [1.5, 2, 0.7, 0.3, 4.4]
total = 0
for num in numbers:
    assert num > 0, "Data should only contain positive numbers"
    total = total + num
print("total is", total)

total is 8.9
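
All the numbers above are positive, so the assertion never fires. To see it halt the loop, here is a sketch with one negative value added (the list is made up for illustration). The `try`/`except` wrapper is not covered in this lesson; it is only there so the example can report where the loop stopped rather than ending with a traceback.

```python
numbers = [1.5, 2.3, 0.7, -0.001, 4.4]
total = 0.0
try:
    for num in numbers:
        assert num > 0.0, "Data should only contain positive numbers"
        total = total + num
except AssertionError as error:
    # The loop stops at -0.001, so 4.4 is never added.
    print("AssertionError:", error)
print("total when the loop stopped:", total)
```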


### How can we catch empty datasets in our analysis?

In [38]:
import numpy

In [39]:
array = numpy.array([1, 1])
empty_array = numpy.array([])

print(array)
print(empty_array)

[1 1]
[]


In [40]:
array.size

2

In [41]:
empty_array.size

0

In [42]:
assert array.size > 0, "Expected non-empty array"
assert empty_array.size > 0, "Expected non-empty array"

AssertionError: Expected non-empty array
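
In the analysis itself, the same check could sit at the top of a function so that an empty dataset is reported before any statistics are computed. This is a sketch only: `daily_maxima` is a hypothetical name, the small `example` array is made up, and the code assumes the patients-by-days layout used throughout this lesson.

```python
import numpy


def daily_maxima(data):
    """Return the maximum inflammation per day (hypothetical helper)."""
    # Fail early with a clear message instead of letting numpy.max
    # raise a less obvious error on an empty array.
    assert data.size > 0, "Expected non-empty array"
    return numpy.max(data, axis=0)


# Two patients, three days.
example = numpy.array([[0.0, 1.0, 2.0],
                       [0.0, 3.0, 1.0]])
print(daily_maxima(example))
```

Calling `daily_maxima(numpy.array([]))` would stop with `AssertionError: Expected non-empty array`, the same message as in the cell above.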