# Data collection project tutorials: Flow control and file handling

During the first tutorial, we introduced the basic data types and data structures in Python. We also touched on indexing and looping, but didn't cover them in detail. That's what we're going to do today!

## Flow control 

### "for" loops

These loops let us visit every element of a sequence, for instance, a list. So let's start with a list:

In [None]:
my_list = ['hello', 'everybody', 'how', 'are', 'you']

Let's first simply print each element of the list with a for loop

In [None]:
for thing in my_list:
    print(thing)

And we also saw that we can operate with each element in a for loop:

In [None]:
for word in my_list:
    sentence = 'My program is going to print the word: ' + word
    print(sentence)

We can **nest** loops within loops, for instance, to obtain all possible combinations of elements in a list:

In [None]:
for word in my_list:
    for other_word in my_list:
        print(word + ' ' + other_word)
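As a quick check of how nested loops scale, the sketch below collects the combinations in a list instead of printing them. The inner loop runs in full for every step of the outer loop, so a list of 5 words yields 5 x 5 = 25 pairs. The name `pairs` is introduced here only for illustration.

```python
my_list = ['hello', 'everybody', 'how', 'are', 'you']

pairs = []  # illustrative name, not part of the tutorial so far
for word in my_list:
    for other_word in my_list:
        # The inner loop restarts from the first element for every outer word
        pairs.append(word + ' ' + other_word)

print(len(pairs))  # 25
print(pairs[0])    # hello hello
print(pairs[-1])   # you you
```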

At this point it might be convenient to introduce a flow visualizer tool, so we can see which line of the script Python is executing at each step: http://www.pythontutor.com/visualize.html#mode=edit

For now, we'll leave the for loops aside until we use them in more complex code, and move on.

### "if" statements

An "if" statement, also known as an "if-then" statement, works like a decision based on information in real life. 

For example, let's say you're the bouncer at a club, and you don't allow anyone under the age of 18 in. If you were an "if" statement, you would check their ID card, evaluate the condition person_age >= 18, and, if it is true, let them in. Let's butcher this process in a programmatic way:

In [None]:
person_0 = 17
person_1 = 18
person_2 = 21

Let's evaluate for each case whether they are 18 or older:

In [None]:
print(person_0 >= 18)
print(person_1 >= 18)
print(person_2 >= 18)

We might also want to redirect people who are exactly 18 to another bouncer to check whether the ID is real. In that case we need a different comparison, equality (==). But I'm getting tired of typing so much, so let me put the ages in a list to make this faster. 

In [None]:
ages_list = [person_0, person_1, person_2]

print(ages_list)

In [None]:
for person in ages_list:
    print(person == 18)

So here we used a for loop to iterate through each element in a list. Then, for each element, we evaluated the condition person == 18 and got the boolean "True" or "False". An "if" statement performs this kind of evaluation and only runs the code inside it if the result is "True". Example:

In [None]:
for person in ages_list:
    if person >= 18:
        print('Person is ' + str(person) + ' years old, therefore they are allowed in')
    else:  # This means, any other condition. If all the evaluated conditions above are "False", the actions defined in 'else' will be executed
        print('Person is ' + str(person) + ' years old, they can not come in')

But we forgot about the second check! If the ID says 18, we need to send the person on to check whether the ID is real! Let's add a second condition (or as many as we need) using "elif". 

In [None]:
for person in ages_list:
    if person >= 18:
        print('Person is ' + str(person) + ' years old, therefore they are allowed in')
    elif person == 18:
        print('Person is ' + str(person) + ' years old, needs a second check')
    else:  # This means, any other condition. If all the evaluated conditions above are "False", the actions defined in 'else' will be executed
        print('Person is ' + str(person) + ' years old, they can not come in')

**Wait, what happened here?** Is my computer broken? No. 

Computers follow pure logic: Python evaluates the conditions in the order they appear and executes the code of the first one that is "True", skipping the rest. The first condition was whether the person was at least 18, which is true for an 18-year-old, so the first action is executed and the "elif" is never reached. We need to modify the code to get the logic that we want. 

In [None]:
for person in ages_list:
    if person > 18:  # Now it only works for those older than 18, so anyone that is 19 or older
        print('Person is ' + str(person) + ' years old, therefore they are allowed in')
    elif person == 18:
        print('Person is ' + str(person) + ' years old, needs a second check')
    else:  # This means, any other condition. If all the evaluated conditions above are "False", the actions defined in 'else' will be executed
        print('Person is ' + str(person) + ' years old, they can not come in')
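Another way to get the same result, sketched here as an alternative rather than as the tutorial's method, is to keep `>=` and test the most specific condition first. Because only the first "True" branch runs, putting `person == 18` at the top means the `>=` branch is reached only by people older than 18. The names `decision` and `decisions` are illustrative.

```python
ages_list = [17, 18, 21]

decisions = []  # illustrative list to collect the outcome for each person
for person in ages_list:
    if person == 18:      # most specific condition first
        decision = 'needs a second check'
    elif person >= 18:    # only reached when person is not exactly 18
        decision = 'allowed in'
    else:                 # everything else, that is, younger than 18
        decision = 'can not come in'
    decisions.append(decision)
    print('Person is ' + str(person) + ' years old: ' + decision)
```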

### Homework 1

We are now going to implement the "second check" of the bouncer, to see if the ID is real or fake. Use the list that I define below to classify the people into the following outputs:

- "This person is [older than 18, put the number] years old, therefore they are allowed in without further checking"
- "This person is 18, and their id is real, so they are allowed in"
- "This person is 18, but their id is fake, so they are NOT allowed in"
- "This person is [younger than 18], they are NOT allowed in"

And remember to replace the text between brackets with the right age.

In [None]:
persons_ids_list = [[15, 'fake'], [33, 'real'], [25, 'real'], [20, 'fake'], [18, 'real'], [18, 'fake'], [15, 'real'], [42, 'real']]

And just as hints for your task, pay attention to the following lines of code:

In [None]:
print(persons_ids_list[2])
print(persons_ids_list[2][0])
print(persons_ids_list[2][1])
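The same indexing works inside a for loop, since each element of the outer list is itself a small list. The sketch below uses a made-up two-entry list (`small_list`, not the homework data) to show this, followed by an optional shortcut called unpacking that assigns both elements in one step. Neither form is required for the homework.

```python
small_list = [[30, 'real'], [16, 'fake']]  # hypothetical data with the same layout

for entry in small_list:
    age = entry[0]        # first element of the inner list
    id_status = entry[1]  # second element of the inner list
    print(str(age) + ' ' + id_status)

# Unpacking: both elements of each inner list are assigned at once
for age, id_status in small_list:
    print(str(age) + ' ' + id_status)
```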

**GOOD LUCK**

## File input and output

We are reusing the iris file from our previous exercise. This time we're going to read it in a more fundamental way than with pandas, as we did last time. We are going to read it the way **any** text file can be read in Python, independently of its format:

In [None]:
f = open('../1_intro/iris.csv', 'r')
for line in f:
    print(line)
f.close()

Now we have read every line in the file and closed it. The contents are no longer available through `f`, so we need to reopen the file if we want to read from the top again. 

We are reading a comma-separated values file (.csv), so let's also make Python understand that the elements need to be read separately and put in lists:
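To see why reopening is needed, here is a small sketch that does not depend on iris.csv: it first writes a throwaway three-line file (the path and contents are made up, and the 'w' mode used to create it is explained later in this tutorial). A file object remembers its position, so after one full loop it sits at the end and a second loop finds nothing, even before the file is closed.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'example.csv')
f = open(path, 'w')
f.write('a,b\n1,2\n3,4\n')
f.close()

f = open(path, 'r')
first_pass = []
for line in f:
    first_pass.append(line)
second_pass = []
for line in f:           # the file object is already at the end
    second_pass.append(line)
f.close()

print(len(first_pass))   # 3
print(len(second_pass))  # 0

f = open(path, 'r')      # reopening starts from the top again
third_pass = []
for line in f:
    third_pass.append(line)
f.close()
print(len(third_pass))   # 3
```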

In [None]:
f = open('../1_intro/iris.csv', 'r')
for line in f:
    print(line.split(','))
f.close()

Do you see that "\n" at the end of the last element of each list? That's the newline character, which tells Python to start a new line. For example:

In [None]:
print("This\nmessage\ngoes\nin\ndifferent\nlines")

We can get rid of it if we use the rstrip() method before splitting the line into a list:

In [None]:
f = open('../1_intro/iris.csv', 'r')
for line in f:
    print(line.rstrip().split(','))  # first rstrip() is executed, then split(',') is applied to the output of rstrip()
f.close()
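The effect is easier to see on a single string. The sketch below uses a made-up line in the same layout as the file. Note that rstrip() with no argument removes all trailing whitespace (spaces and tabs as well as the newline), while rstrip('\n') removes only trailing newlines.

```python
line = '5.1,3.5,1.4,0.2,setosa\n'  # made-up line, not read from the file

without_strip = line.split(',')
with_strip = line.rstrip().split(',')

print(without_strip)  # the last element still ends with '\n'
print(with_strip)     # the last element is clean

# rstrip('\n') leaves trailing spaces alone, rstrip() removes them too
padded = 'setosa  \n'
print(repr(padded.rstrip('\n')))
print(repr(padded.rstrip()))
```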

We can also put the elements back together however we want using join():

In [None]:
f = open('../1_intro/iris.csv', 'r')
for line in f:
    temp = line.rstrip().split(',')  # first rstrip() is executed, then split(',') is applied to the output of rstrip()
    print(' quarantine sucks '.join(temp))
f.close()

Or however we want with indexing:

In [None]:
f = open('../1_intro/iris.csv', 'r')
for line in f:
    temp = line.rstrip().split(',')  # first rstrip() is executed, then split(',') is applied to the output of rstrip()
    print(temp[0] + ' yoooo ' + temp[2] + ' ' + temp[-1] + ' etc...')
f.close()
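One thing to keep in mind when indexing the split line: every element is a string, even the ones that look like numbers. The sketch below, using a made-up line, shows that `+` glues strings together and that float() is needed before doing arithmetic. If the file has a header row with column names, float() would raise a ValueError on those, so such a row has to be skipped first.

```python
line = '5.0,3.5,1.5,0.25,setosa\n'  # made-up line, not read from the file
temp = line.rstrip().split(',')

print(temp[0] + temp[1])  # string concatenation: 5.03.5

total = float(temp[0]) + float(temp[1])  # convert to numbers, then add
print(total)              # 8.5
```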

But let's say that we only want to print the lines that contain information about the variety virginica:

In [None]:
f = open('../1_intro/iris.csv', 'r')
for line in f:
    temp = line.rstrip().split(',')  # first rstrip() is executed, then split(',') is applied to the output of rstrip()
    if temp[-1] == 'virginica':
        print(' quarantine sucks '.join(temp))
f.close()

### Write output in file

But now we want to write this selection to a file. For that, we first need to open a file in write mode ('w') and then write each line for which the desired condition is true:

In [None]:
outfile = open('virginica_output.txt', 'w')

infile = open('../1_intro/iris.csv', 'r')
for line in infile:
    temp = line.rstrip().split(',')  # first rstrip() is executed, then split(',') is applied to the output of rstrip()
    if temp[-1] == 'virginica':
        outfile.write(' quarantine sucks '.join(temp))
infile.close()
outfile.close()

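So what happened with the file? Is this the output that you wanted? Why is it not splitting the lines? Unlike print(), write() does not add a newline at the end of the text, so we have to add the "\n" ourselves: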

In [None]:
outfile = open('virginica_output.txt', 'w')

infile = open('../1_intro/iris.csv', 'r')
for line in infile:
    temp = line.rstrip().split(',')  # first rstrip() is executed, then split(',') is applied to the output of rstrip()
    if temp[-1] == 'virginica':
        outfile.write(' quarantine sucks '.join(temp) + '\n')
infile.close()
outfile.close()

**GOOD**
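A common alternative to calling close() by hand is the `with` statement, which closes the files automatically when its block ends, even if an error occurs inside it. The sketch below is self-contained: it builds a tiny stand-in for iris.csv in a temporary folder (the rows and file names are made up, and the column layout is assumed to match) and then filters it the same way as above.

```python
import os
import tempfile

# Build a tiny stand-in for iris.csv (made-up rows, layout assumed)
tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, 'mini_iris.csv')
out_path = os.path.join(tmp_dir, 'virginica_output.txt')
with open(in_path, 'w') as infile:
    infile.write('5.1,3.5,1.4,0.2,setosa\n')
    infile.write('6.3,3.3,6.0,2.5,virginica\n')
    infile.write('5.8,2.7,5.1,1.9,virginica\n')

# Both files are closed automatically when the "with" block ends
with open(in_path, 'r') as infile, open(out_path, 'w') as outfile:
    for line in infile:
        temp = line.rstrip().split(',')
        if temp[-1] == 'virginica':
            outfile.write(','.join(temp) + '\n')

# Read the result back to check what was written
with open(out_path, 'r') as check:
    written = check.read()
print(written)
```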

### Homework 2

Using the same iris dataset, write to the file "ratios_setosa.txt" the length/width ratios of the petals and sepals so that each line looks like this:

`Petal ratio is [number], sepal ratio is [number], variety is setosa`

Make sure that you only calculate the ratios for the setosa entries.

**Good luck!**