# Chapter 5: Flow Control and Conditional Statements

In the last chapter, you learned how to create user-defined functions, a little about built-in functions, and how to import built-in modules to gain access to additional methods for some common calculations. In earlier chapters, you learned about different data structures in Python (e.g., lists and dictionaries) and how to use *for* loops to perform calculations with them. These are useful concepts to master, but notice that in order to use functions, methods, and loops, you have to make sure that the data you want, and only the data you want, is processed. When working with realistically sized data sets, it can be incredibly time consuming to organize and select the data you want by hand. Python scripts can help.

For example, we might want to examine only a subset of the observations in a data set. In Python (and other programming languages), we can use *if*, *elif*, and *else* statements in combination with comparison operators (see chapter 2) to select only the data we want to analyze. Recall that comparison operators (==, !=, >, <, >=, <=) compare two values and return a Boolean (True or False).
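
As a quick refresher, the short sketch below shows a few comparisons and the Boolean each one returns. The variable names reactionTime and isFast and the numbers are made up purely for illustration.

```python
reactionTime = 1.4  # hypothetical value, used only for illustration

print(reactionTime == 1.4)  # True: the two values are equal
print(reactionTime != 1.4)  # False: the two values are not different
print(reactionTime > 2.0)   # False: 1.4 is not greater than 2.0
print(reactionTime <= 1.4)  # True: 1.4 is less than or equal to 1.4

# The result of a comparison can be stored in a variable like any other value
isFast = reactionTime < 2.0
print(type(isFast))         # <class 'bool'>
```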

As in previous chapters, some of the exercises in this chapter are intended mainly as learning exercises, but some may also be practically useful should you encounter a similar situation "in the wild".

## *if*, *elif*, *else* and *is*

*If* statements specify a condition and what to do if that condition is true. So, what a program does next depends on whether the condition is true and on which statements (*elif* or *else*) follow the *if*. An *elif* statement runs only if the conditions of all the preceding *if* and *elif* statements in the same chain are false and its own condition is true. An *else* statement runs if every preceding *if* and *elif* condition in the chain is false, no matter what. Notice that both *elif* and *else* statements run only if the preceding conditions are false. The difference is that an *elif* statement requires another condition to be true before anything happens, whereas an *else* statement does not.
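
The rules above can be sketched with a small function. This is only an illustration: the function name describeValue and the bounds passed to it are hypothetical and are not used elsewhere in the chapter.

```python
def describeValue(x, lb, ub):
    if x < lb:        # checked first
        return "too low"
    elif x > ub:      # checked only if the if condition was false
        return "too high"
    else:             # runs only if every condition above was false
        return "in range"

print(describeValue(0.1, 0.2, 3.0))  # too low
print(describeValue(7.8, 0.2, 3.0))  # too high
print(describeValue(1.5, 0.2, 3.0))  # in range
```

Only one branch runs for any given value, because Python stops checking as soon as one condition in the chain is true.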

Also notice that I snuck in some new Python keywords, [is](https://www.w3schools.com/python/ref_keyword_is.asp), [and](https://www.w3schools.com/python/ref_keyword_and.asp), and [or](https://www.w3schools.com/python/ref_keyword_or.asp). 

The *is* keyword functions somewhat like the comparison operator ==. For our purposes, the important thing to know is that the *is* keyword checks whether two variables point to the same thing in memory, whereas the == operator checks whether the values of two variables are equal. The *and* and *or* keywords are logical operators that combine conditions: *and* is true only when both conditions are true, and *or* is true when at least one of them is.
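
A minimal sketch of the difference between *is* and ==, followed by *and* and *or*. The variable names (listA, listB, listC, x) and values are invented for this illustration.

```python
listA = [1.0, 2.0]
listB = [1.0, 2.0]  # same values as listA, but a separate list in memory
listC = listA       # a second name for the very same list as listA

print(listA == listB)  # True: the values are equal
print(listA is listB)  # False: they are two different objects
print(listA is listC)  # True: both names point to the same object

x = 1.5
print(x > 0.2 and x < 3.0)  # True: both conditions are true
print(x < 0.2 or x > 3.0)   # False: neither condition is true
```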

The examples below should be copied and pasted into your own Jupyter notebook because they build on each other. So, if you haven't already, go ahead and open a new notebook.

### Example 1: Selecting one type of data

Let's pretend you have some data stored in a list, and the data is of different types. For illustrative purposes, we'll just assume the data consists of floating point numbers and strings (i.e. text). What we want to do is go through the list and compute the mean (or whatever else we might want to do) for the valid numbers. The code below will do that. Notice that there is only one *if* statement.

In [8]:
import statistics #import the statistics library

rawData = [1.0,2.0,"3.0","4.0","Bob"]#list with floating point numbers and strings

#function for filtering data
def getFloats(data):
    justFloats = []#initialize empty list to put the floats into
    for i in data:
        if type(i) is float: #if statement, with the *is* keyword snuck in for good measure
            justFloats.append(i)
    return justFloats #return the result (here, a new list); without a return statement the function would return None

goodData = getFloats(rawData)#use function to create list of just floats

meanGoodData = statistics.mean(goodData)#calculate the mean

print("The mean is " + str(meanGoodData))#print the mean

The mean is 1.5


### Example 2: Selecting two types of data

In the previous exercise, the function just checked the data type of each element and made a new list for the floats. What if we also wanted a list of the strings?  The code below does that. Notice the differences in the *return* statements of example 1 above and example 2 below.

In [7]:
rawData = [1.0,2.0,"3.0","4.0","Bob"]#list with floating point numbers and strings

def sepFloatsStrings(data):#separates floats and strings
    justFloats = []
    justStrings = []
    for i in data:
        if type(i) is float:
            justFloats.append(i)
        elif type(i) is str:#use elif because this check only needs to run when the item is not a float
            justStrings.append(i)
    return justFloats, justStrings#this returns two lists. The two lists are stored in a tuple.

FloatString = sepFloatsStrings(rawData)
print(FloatString)

([1.0, 2.0], ['3.0', '4.0', 'Bob'])


Notice the output has two lists (one of floats and one of strings) contained within parentheses. This happened because our function returned two lists. The result is that our two outputs are stored in a data structure called a *tuple*. For our purposes here, you can think of a tuple as a list that can't be changed or updated. Read more about tuples [here](https://www.w3schools.com/python/python_tuples.asp) if you want. Items in tuples are indexed and can be accessed just like items in a list or individual characters in a string. (I should point out here that I did this just for illustration. It's typically easier to follow code when functions return one thing, and not two, so think hard about whether it's better to create one or two functions.)
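
As a small sketch of how the two lists can be pulled back out of the tuple: the tuple below is typed in by hand so that the cell runs on its own, and the names floats and strings are hypothetical.

```python
# The tuple returned in Example 2, typed in by hand for this illustration
FloatString = ([1.0, 2.0], ['3.0', '4.0', 'Bob'])

print(FloatString[0])     # [1.0, 2.0]: the first item in the tuple
print(FloatString[1][2])  # Bob: the third item of the second list

# A tuple can also be unpacked into separate variables in one step
floats, strings = FloatString
print(strings)            # ['3.0', '4.0', 'Bob']
```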

### Example 3: Getting rid of extreme values

The preceding examples selected data based on the type of variable. Once we have the correct kind of data, sometimes we want to further select data based on its value. For example, in reaction time experiments, it's common to eliminate extreme values or outliers. There are several methods for determining what an outlier is, and which method to use is highly dependent on context. Perhaps the simplest method is to specify the lowest and highest acceptable values. This method is quite simple, but is useful in cases where extreme values can be identified *a priori*, as is often the case in reaction time experiments.

In [9]:
extremeData = [0.1,2.3,1.4,1.2,1.5,1.6,2.9,3.1,4.3,1.2,1.7,0.8,2.6,7.8]

def trimData(x,lb,ub):
    trimmedData = []
    for i in x:
        if lb < i < ub:
            trimmedData.append(i)
    return trimmedData

nonExtremeData = trimData(extremeData,.2,3.0)

print(nonExtremeData)


[2.3, 1.4, 1.2, 1.5, 1.6, 2.9, 1.2, 1.7, 0.8, 2.6]


Notice that the code inside each of the functions in the examples above follows the same general pattern: set up variables to store the value(s) that need to be returned, use a *for* loop to pull out the required values, and end with a *return* statement that returns the variables that were set up at the beginning of the function.

### Example 4: Selecting only extreme values

Anytime you're analyzing data and discarding values, it's necessary to keep track of how much data you're discarding. In some cases, it might actually be interesting to examine just the extreme values. The example below counts the number of values rejected for being either too low or too high.

In [12]:
extremeData = [0.1,2.3,1.4,1.2,1.5,1.6,2.9,3.1,4.3,1.2,1.7,0.8,2.6,7.8]

def countRejects(x,lb,ub):
    tooLow = 0
    tooHigh = 0
    for i in x:
        if i < lb:
            tooLow += 1
        elif i > ub:
            tooHigh += 1
    return tooLow, tooHigh

rejectData = countRejects(extremeData,.2,3.0)

print(rejectData)

(1, 3)
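
Because it is important to keep track of how much data is discarded, the counts can also be expressed as a share of the whole data set. The sketch below is one possible way to do that. The counts and the size of the data set are typed in by hand from this example so the cell runs on its own, and the names numValues, numRejected, and percentRejected are hypothetical.

```python
rejectData = (1, 3)  # (tooLow, tooHigh), as printed above
numValues = 14       # number of values in extremeData

numRejected = rejectData[0] + rejectData[1]  # total number of rejected values
percentRejected = round(100 * numRejected / numValues, 1)

print(str(numRejected) + " of " + str(numValues) + " values were rejected (" + str(percentRejected) + "%)")
# 4 of 14 values were rejected (28.6%)
```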


### Example 5: Combining previous examples into one example

Remember that the point of writing user-defined functions is to make it easy to reuse the code. If you've been following along and executing each of the examples in the same Jupyter notebook (and not deleting cells as you go), then this example will work. It reuses the getFloats, trimData, and countRejects functions (and the statistics import from Example 1) to filter data and then calculate the mean and standard deviation.

In [19]:
messyData = [0.1,2.3,1.4,1.2,1.5,0.1,1.6,"Tata",2.9,3.1,4.3,1.2,1.7,0.8,2.6,7.8,"3.0",9.7,"4.0","Bob"]
lb = 0.2
ub = 3.0

justFloats = getFloats(messyData)
noOutliers = trimData(justFloats, lb, ub)
numRejectData = countRejects(justFloats, lb, ub)

theMean = statistics.mean(noOutliers)
theSD = statistics.stdev(noOutliers)

print("The mean is " + str(theMean))
print("The standard deviation is " + str(theSD))
print(str(numRejectData[0]) + " data points were rejected as too low")
print(str(numRejectData[1]) + " data points were rejected as too high")




The mean is 1.72
The standard deviation is 0.6713171133426189
2 data points were rejected as too low
4 data points were rejected as too high


(Need to add an example separating data into two groups)
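
One possible version of such an example, offered only as a sketch: it follows the same pattern as the functions above, but uses *else* so that every value ends up in one of two groups. The function name splitAtCutoff, the group names, and the cutoff of 1.5 are all hypothetical choices made for illustration.

```python
someData = [2.3, 1.4, 1.2, 1.5, 1.6, 2.9, 1.2, 1.7, 0.8, 2.6]

def splitAtCutoff(x, cutoff):
    below = []      # values less than the cutoff
    atOrAbove = []  # values equal to or greater than the cutoff
    for i in x:
        if i < cutoff:
            below.append(i)
        else:       # everything that is not below the cutoff lands here
            atOrAbove.append(i)
    return below, atOrAbove

fastGroup, slowGroup = splitAtCutoff(someData, 1.5)

print(fastGroup)  # [1.4, 1.2, 1.2, 0.8]
print(slowGroup)  # [2.3, 1.5, 1.6, 2.9, 1.7, 2.6]
```

Using *else* rather than a second *if* guarantees that no value is dropped or counted twice.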

## Exercises

### Exercise 1