# Learning to Program: Building Blocks Part 2 (IO, Control Structures, Dictionaries)

*"Every great developer you know got there by solving problems they were unqualified to solve until they actually did it."*<br>*- Patrick McKenzie*

## IO (Cont'd)

At the end of the previous class we looked at how we could print things to output using the **print** command. For example, we could calculate the length of a list and print its value to output using the following code:

In [None]:
title_list = ['Love in the Time of Cholera', '100 Years of Solitude', 'Chronicle of a Death Foretold']
title_list_count = 0
for title in title_list:
    title_list_count = title_list_count + 1
print("The length of the list is", title_list_count)

Let's recall what our list of book titles looked like:

In [None]:
title_list

What if we wanted this list to have more titles? What if all of these titles were stored in a file somewhere? Wouldn't it be much easier to simply "transfer" the contents of that file into our program? 

![read.png](attachment:read.png)

To open a file in Python and read its contents, you can use the following framework:

In [None]:
with open('marquez_works.txt') as text_file:
    for line in text_file:
        line = line.strip()
        # code to perform with each line

Note that there are several parts to this framework:
1. We first see the **open** command. It takes one argument: the name of the file we want to open, as a string (in this case, 'marquez_works.txt').
2. **with open('marquez_works.txt') as text_file**: this is much the same as saying **text_file = open('marquez_works.txt')**, except that Python also closes the file for us automatically as soon as the indented block ends, so we never have to remember to do it ourselves. 
3. **for line in text_file**: loops over the lines of the file, one at a time. **Each line is a string**. 
4. **line = line.strip()**: each line read from the file normally ends with a newline character ('\n'). For example, the first line of 'marquez_works.txt' is actually 'Leaf Storm\n'. **strip()** removes whitespace, including that newline, from both ends of the string, so we end up with just 'Leaf Storm'. 
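To see what **strip()** and the **with** block actually do, here is a small sketch. So that it runs on its own, it first creates a hypothetical file, 'sample_titles.txt', in a temporary folder (writing files is covered just below). It then reads the file back and keeps each line both before and after **strip()**.

```python
import os
import tempfile

# Setup: create a small, hypothetical file so the example is self-contained.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'sample_titles.txt')
with open(path, 'w') as out_file:
    out_file.write('Leaf Storm\nIn Evil Hour\n')

raw_lines = []    # each line exactly as Python reads it
clean_lines = []  # the same line after strip()
with open(path) as text_file:
    for line in text_file:
        raw_lines.append(line)
        clean_lines.append(line.strip())

print(raw_lines)         # ['Leaf Storm\n', 'In Evil Hour\n']
print(clean_lines)       # ['Leaf Storm', 'In Evil Hour']
print(text_file.closed)  # True: the with block closed the file for us
```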

**Exercise:** Open the file called 'marquez_works_extended.txt' and **print** its contents. 

In [None]:
# Solution 
with open('marquez_works_extended.txt') as text_file:
    for line in text_file:
        line = line.strip()
        print(line)

*Fun fact*: You just implemented a common tool called **cat** that programmers use to view the contents of a file. No small feat - congratulations! 

So far we have mostly been printing the results of our programs to this notebook. We could save those results by copying and pasting them into a file. But if (or, more likely, when) our results start piling up, copying and pasting becomes tedious. Is there a way to 'write' the results of our Python code directly to a file, without copying and pasting? 

We can accomplish this with a similar framework to the one we used to read a file:

In [None]:
with open('our_work.txt', 'w') as text_file:
    for title in title_list:
        title_with_newline = title + '\n'
        text_file.write(title_with_newline)

Here we are sending the contents of title_list to a file called 'our_work.txt'. Notice the following:
1. **open('our_work.txt', 'w')**: here we have given the **open** command an extra argument, the string 'w'. This little string tells **open** to prepare the file our_work.txt to be **written** to. If the file does not exist yet, Python creates it; if it already exists, its old contents are erased first.
2. **title_with_newline = title + '\n'**: you are probably puzzled by this line. Since when can we add strings together? It turns out we can; we'll see more of this in future classes. For now, just know that this line adds a **newline** character ('\n') to the end of a title. The **newline** character is the invisible character created every time we press "Enter" on our keyboard to start a new line. We need to add it explicitly because, unlike **print**, **write** does not start a new line for us: without it, all the titles would end up run together on a single line. 
3. **text_file.write(title_with_newline)**: this is where we actually write to the file. The **write** command always takes a string as its argument (the value that goes between the parentheses in **write( )**).  
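To check the claim that **write** adds nothing after the string, the sketch below writes the same two titles to two hypothetical files in a temporary folder, once without and once with the '\n', and then reads each file back as a single string with **read()**.

```python
import os
import tempfile

titles = ['Leaf Storm', 'In Evil Hour']
folder = tempfile.mkdtemp()
# Hypothetical file names, used only for this comparison.
without_path = os.path.join(folder, 'without_newlines.txt')
with_path = os.path.join(folder, 'with_newlines.txt')

with open(without_path, 'w') as text_file:
    for title in titles:
        text_file.write(title)          # write() adds nothing after the string

with open(with_path, 'w') as text_file:
    for title in titles:
        text_file.write(title + '\n')   # we add the newline ourselves

# read() returns the whole file as one string.
with open(without_path) as text_file:
    without_text = text_file.read()
with open(with_path) as text_file:
    with_text = text_file.read()

print(repr(without_text))  # 'Leaf StormIn Evil Hour'
print(repr(with_text))     # 'Leaf Storm\nIn Evil Hour\n'
```

Using **repr()** makes the otherwise invisible '\n' characters show up in the printed output.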

**Exercise:** In a file called 'my_classmates.txt', write the names of the three people who are sitting closest to you in this room, one name per line. 

In [None]:
# Solution
friends_list = ['Curly', 'Larry', 'Moe']
with open('my_classmates.txt', 'w') as text_file:
    for friend in friends_list:
        # Add a newline so that each name goes on its own line
        friend_with_newline = friend + '\n'
        text_file.write(friend_with_newline)

## Control structures (if-else statements)

Suppose now that we are avid readers of García Márquez and we wish to find out whether our favorite book by the author, say *The Autumn of the Patriarch*, is present in the list of works in 'marquez_works.txt'. How could we do this? 

Let's start with a simpler task. How could we find out whether *The Autumn of the Patriarch* is in the first line of the file 'marquez_works.txt'? From our previous exercise we know that the first line of the file (once the newline has been stripped) is:

In [None]:
first_line = 'Leaf Storm'

So to accomplish our task, we could compare the value of first_line with the string 'The Autumn of the Patriarch'. If they are the same, we have our answer. Otherwise, we do nothing and continue with the next item. 

Now the golden question is, how can we compare one value to another? And based on the result of that comparison, how can we pick one course of action over another? 

The answer is, through an **if** statement. An **if** statement is composed of three parts: a **condition** and two **actions**:

In [None]:
### Do not run this cell
if CONDITION evaluates to true:
    ACTION 1
else:
    ACTION 2

CONDITION is something that is either true or false. ACTION 1 is taken if the CONDITION is true, and ACTION 2 is taken if the condition is false. For example:

In [None]:
### Do not run this cell
if <I do the exercises>:
    <I will understand the concepts>
else:
    <I will not understand the concepts>

In Python, conditions can't be written in natural language as we did above. Instead, a condition is an expression that evaluates to one of two possible values: **True** or **False**. 
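A condition does not have to be a comparison; any value that is True or False will do. Here is a sketch of the example above in runnable form, where the variable did_exercises is a hypothetical stand-in for "I do the exercises".

```python
did_exercises = True  # hypothetical stand-in: try changing this to False

if did_exercises:
    outcome = 'I will understand the concepts'
else:
    outcome = 'I will not understand the concepts'

print(outcome)  # I will understand the concepts
```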

Here are some examples of conditions in Python that evaluate to True:

In [None]:
17.5 == 17.5

In [None]:
'Leaf Storm' == 'Leaf Storm'

In [None]:
[1967,1981,1985] == [1967,1981,1985]

In [None]:
novel = 'Chronicle of a Death Foretold'
novel == 'Chronicle of a Death Foretold'

In [None]:
17.5 + 15.5 == 33.0

In [None]:
funding = 33.0
17.5 + 15.5 == funding

In [None]:
'Chronicle of a Death Foretold' in ['One Hundred Years of Solitude', 'Chronicle of a Death Foretold', 'Love in the Time of Cholera']

Notice here that **x in y** evaluates to **True** if x is a member of list y, and **False** otherwise. 
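Two details are worth checking for yourself: the result of a condition is itself a value (of type **bool**) that can be stored in a variable, and **in** on a list looks for a whole element, not for part of one. A small sketch with a hypothetical list called titles:

```python
titles = ['One Hundred Years of Solitude', 'Chronicle of a Death Foretold']

is_listed = 'Chronicle of a Death Foretold' in titles  # an exact element
partial = 'Chronicle' in titles                       # only part of an element

print(is_listed)        # True
print(partial)          # False
print(type(is_listed))  # <class 'bool'>
```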

And here are examples of conditions that evaluate to False: 

In [None]:
20.5 < 20.5

In [None]:
'Leaf Storm' != 'Leaf Storm'

In [None]:
[1967,1981,1985] == [1967,1981,1985,1992]

In [None]:
novel = 'Chronicle of a Death Foretold'
novel == 'Love in the Time of Cholera'

In [None]:
not (17.5 + 15.5 == 33.0)

In [None]:
funding = 20.0
17.5 + 15.5 == funding

In [None]:
'Chronicle of a Death Foretold' not in ['One Hundred Years of Solitude', 'Chronicle of a Death Foretold', 'Love in the Time of Cholera']

**Exercise:** Which of these expressions evaluate to True? Which of them evaluate to False?

In [None]:
'chronicle of a Death Foretold' == 'Chronicle of a Death Foretold' # False

In [None]:
19 // 3 == 6.33 # False

In [None]:
1967 not in [1958, 1987, 1965, 1976] # True
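If the first two answers surprised you, this sketch breaks them down. **//** is floor division, which drops the remainder, while **/** is ordinary division; and comparing strings is case-sensitive, which **lower()** can work around by lowercasing both sides first.

```python
floor_result = 19 // 3  # floor division drops the remainder
exact_result = 19 / 3   # ordinary division keeps it

print(floor_result)                    # 6
print(exact_result)                    # 6.333333333333333
print(floor_result == 6.33)            # False: 6 is not 6.33
print(round(exact_result, 2) == 6.33)  # True

# Comparing strings is case-sensitive; lower() lowercases both sides first.
first = 'chronicle of a Death Foretold'
second = 'Chronicle of a Death Foretold'
print(first == second)                  # False
print(first.lower() == second.lower())  # True
```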

Recall our **if** statement:

In [None]:
### Do not run this cell
if CONDITION evaluates to true:
    ACTION 1
else:
    ACTION 2

What types of commands can ACTION 1 and ACTION 2 be? 

In short, pretty much anything. We can have a variable assignment, a for-loop, a sequence of variable assignments and a for-loop, print commands - we can even put another **if** statement! 

For example, we could have:

In [None]:
title = 'Chronicle of a Death Foretold'
title_list = ['One Hundred Years of Solitude', 'Chronicle of a Death Foretold', 'Love in the Time of Cholera']
if title in title_list:
    print("I read", title)
    print("... after school finished.")
else:
    pass

Notice the **pass** keyword. It tells Python to do nothing; we use it in an **if** statement (or anywhere else Python expects an action) when we don't want anything to happen. 

Note that once Python has executed the code in the 'if' part of the **if** statement, it skips the 'else' part entirely and continues with the first line after the whole **if** statement.
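One way to watch which blocks actually run is to record each one in a list as it happens. The sketch below does that, and also shows that an action can itself be another **if** statement (a "nested" if). The list name steps is just for illustration.

```python
title = 'Chronicle of a Death Foretold'
title_list = ['One Hundred Years of Solitude', 'Chronicle of a Death Foretold', 'Love in the Time of Cholera']
steps = []  # records, in order, which parts of the code ran

if title in title_list:
    steps.append('if block')
    # An action can be another if statement (a "nested" if).
    if title == 'Chronicle of a Death Foretold':
        steps.append('nested if block')
else:
    steps.append('else block')  # skipped, because the condition was True

steps.append('after the if statement')
print(steps)  # ['if block', 'nested if block', 'after the if statement']
```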

**Exercise:** Recall why we introduced the **if** statement: we wanted to find out whether *The Autumn of the Patriarch* is in the first line of the file 'marquez_works.txt'. 

In [None]:
first_line = 'Leaf Storm\n'

And we decided to do the following to accomplish this task:

1. Compare the value of first_line with the string 'The Autumn of the Patriarch'. If they are the same, we have our answer. Otherwise, we do nothing and continue with the next item. 

Implement the above in the code box below. Code for removing the '\n' at the end of first_line has been given to you. Don't worry about moving on to the next item just yet: if they are not the same, just put **pass**. If they are the same, print the string 'Bazinga!'. 

In [None]:
# Solution - nothing should be printed
# Code to remove the '\n' at the end of the string first_line 
first_line = first_line.strip()
if first_line == 'The Autumn of the Patriarch':
    print('Bazinga!')
else:
    pass

You may be wondering why our previous example even has an **else** in its **if** statement, if we're not doing anything there. The truth is, we don't actually need it! We could have written the example above as:

In [None]:
title = 'Chronicle of a Death Foretold'
title_list = ['One Hundred Years of Solitude', 'Chronicle of a Death Foretold', 'Love in the Time of Cholera']
if title in title_list:
    print("I read", title)
    print("... after school finished.")

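If you have more than one condition to check, you can chain them together with **elif** (short for "else if"). The conditions are evaluated in *sequential order*, from top to bottom: Python runs the block under the first condition that is True, skips all the others, and falls back to the **else** block only if none of them is True: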

In [None]:
# Code to select a random title from title_list - don't worry about this (for now)! 
import random
random_title = random.choice(title_list)

if random_title == 'Chronicle of a Death Foretold':
    print(1981)
elif random_title == 'One Hundred Years of Solitude':
    print(1967)
elif random_title == 'Love in the Time of Cholera':
    print(1985)
else:
    print('Something funny happened!')
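Because the random choice changes every time the cell runs, it can be hard to see that only the first True condition matters. In this deterministic sketch, the hypothetical value 1981 makes both of the first two conditions True, yet only the first block runs.

```python
year = 1981

if year > 1980:
    label = 'published after 1980'
elif year > 1960:
    label = 'published after 1960'  # also True, but never reached
else:
    label = 'published in 1960 or earlier'

print(label)  # published after 1980
```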

**Exercise:** Reorder the print statements of the following **if** statement so that it prints the string 'Mario Vargas Llosa':

*Note*: The original question asked to reorder the *conditions* of the if statement. This is an error. It should have said: "Reorder the *print statements* of the following if statement ...". 

In [None]:
# Original question
if 19 // 3 == 6.33:
    print('Jose Emilio Pacheco')
elif 1967 not in [1958, 1987, 1965, 1976]:
    print('Derek Bickerton')
elif 17.5 + 15.5 == 33.0:
    print('Mario Vargas Llosa')
else:
    print('Ray Jackendoff')

In [None]:
# Solution
if 19 // 3 == 6.33:
    print('Jose Emilio Pacheco')
elif 1967 not in [1958, 1987, 1965, 1976]:
    print('Mario Vargas Llosa')
elif 17.5 + 15.5 == 33.0:
    print('Derek Bickerton')
else:
    print('Ray Jackendoff')

## Dictionaries

Recall the code we had for our if statement that checks for multiple conditions:

In [None]:
# Code to select a random title from title_list - don't worry about this (for now)! 
import random
random_title = random.choice(title_list)

if random_title == 'Chronicle of a Death Foretold':
    print(1981)
elif random_title == 'One Hundred Years of Solitude':
    print(1967)
elif random_title == 'Love in the Time of Cholera':
    print(1985)
else:
    print('Something funny happened!')

We can see here that for each title, we have printed a specific number with it (in this case, the year of its publication). In a way, we can say that each title is **associated** with its year of publication. 

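The problem with this approach is that the association is only implicit: it lives in the structure of the **if** statement, so we cannot look up a title's year anywhere else in our program, and every new title means writing another **elif**. Could there be a more explicit way to associate each title with its year of publication in Python?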

Yes: through a **data structure** called a **dictionary**. 

When we think of a dictionary, we think of words associated with definitions:

"book": "physical objects consisting of a number of pages bound together"<br>
"sword": "a cutting or thrusting weapon that has a long metal blade"<br>
"pie": "dish baked in pastry-lined pan often with a pastry top"<br>

In Python, dictionaries work in a similar way. Each "word" (or **key**) is associated with a "definition" (or **value**). For most text processing tasks, keys are usually strings and values are usually other strings or numbers. 
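As a preview (the syntax is explained step by step below), here is a sketch of the three entries above stored in a Python dictionary, named definitions purely for illustration, with each word as a key and its definition as the value:

```python
definitions = {
    'book': 'physical objects consisting of a number of pages bound together',
    'sword': 'a cutting or thrusting weapon that has a long metal blade',
    'pie': 'dish baked in pastry-lined pan often with a pastry top',
}

# Looking up a key gives back its value.
print(definitions['sword'])  # a cutting or thrusting weapon that has a long metal blade
```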

For our book titles example, this is what a dictionary would look like:

In [None]:
title_and_year = {}
title_and_year['Chronicle of a Death Foretold'] = 1981
title_and_year['One Hundred Years of Solitude'] = 1967
title_and_year['Love in the Time of Cholera'] = 1985
print(title_and_year)

We first start by creating an "empty" dictionary. We then associate the key 'Chronicle of a Death Foretold' (in the square brackets) with the value 1981. We then do the same for the other titles and the years associated with them. 

We could alternatively have built our dictionary like this:

In [None]:
title_and_year = {
    'Chronicle of a Death Foretold' : 1981,
    'One Hundred Years of Solitude' : 1967,
    'Love in the Time of Cholera' : 1985
}
print(title_and_year)

Python makes it easy to look up the value associated with a key in a dictionary: 

In [None]:
print(title_and_year['Chronicle of a Death Foretold'])
print(title_and_year['One Hundred Years of Solitude'])
print(title_and_year['Love in the Time of Cholera'])

Python also makes it easy to change the value associated with a key:

In [None]:
title_and_year['One Hundred Years of Solitude'] = 1970
print(title_and_year)

And we can add new key-value pairs to our dictionary just as easily:

In [None]:
title_and_year['Leaf Storm'] = 1955
print(title_and_year)
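Changing a value and adding a key use exactly the same syntax; what differs is whether the key already exists. A dictionary holds each key at most once, so assigning to an existing key replaces its value instead of adding a second entry. A small sketch with a hypothetical counts dictionary:

```python
counts = {'the': 1}

counts['the'] = 2   # 'the' is already a key: its value is replaced
counts['of'] = 1    # 'of' is a new key: a new pair is added

print(counts)       # {'the': 2, 'of': 1}
print(len(counts))  # 2: there is still only one entry for 'the'
```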

**Exercise:** Modify the code we created for the if statement with multiple conditions so that it uses dictionaries to print values instead of printing the values directly. 

In [None]:
import random
random_title = random.choice(title_list)

# Solution 1 
if random_title == 'Chronicle of a Death Foretold':
    year = title_and_year[random_title]
    print(year)
elif random_title == 'One Hundred Years of Solitude':
    year = title_and_year[random_title]
    print(year)
elif random_title == 'Love in the Time of Cholera':
    year = title_and_year[random_title]
    print(year)
else:
    print('Something funny happened!')

In [None]:
import random
random_title = random.choice(title_list)

# Solution 2 
year = title_and_year[random_title]

if random_title == 'Chronicle of a Death Foretold':
    print(year)
elif random_title == 'One Hundred Years of Solitude':
    print(year)
elif random_title == 'Love in the Time of Cholera':
    print(year)
else:
    print('Something funny happened!')
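Notice that in both solutions every branch does the same thing: print the year stored under random_title. Since the dictionary already holds the association, the **if** statement can be dropped entirely, as in the sketch below. It assumes every title in the list is also a key in the dictionary, and it uses new names (titles, publication_years) so it does not overwrite the title_and_year dictionary used in the next exercise.

```python
import random

titles = ['One Hundred Years of Solitude', 'Chronicle of a Death Foretold', 'Love in the Time of Cholera']
publication_years = {
    'Chronicle of a Death Foretold': 1981,
    'One Hundred Years of Solitude': 1967,
    'Love in the Time of Cholera': 1985,
}

random_title = random.choice(titles)
year = publication_years[random_title]  # the dictionary does the matching for us
print(random_title, year)
```

If a title could be missing from the dictionary, you would still need a check before looking it up.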

We can also use a for loop to show all of the elements in a dictionary, much like we did with a list in the previous class. Well, almost: we need to apply the **items( )** command, which is specific to dictionaries and gives us each key together with its value: 

In [None]:
for key,value in title_and_year.items():
    print(key,value)
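For comparison, here is a sketch on a small hypothetical dictionary: looping over a dictionary directly gives you only its keys, while **items()** gives you each key paired with its value.

```python
years = {'Leaf Storm': 1955, 'Love in the Time of Cholera': 1985}

keys_seen = []
for key in years:                  # looping directly yields the keys only
    keys_seen.append(key)

pairs_seen = []
for key, value in years.items():  # items() yields (key, value) pairs
    pairs_seen.append((key, value))

print(keys_seen)   # ['Leaf Storm', 'Love in the Time of Cholera']
print(pairs_seen)  # [('Leaf Storm', 1955), ('Love in the Time of Cholera', 1985)]
```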

**Exercise:** For each title in our title_and_year dictionary, increase its value by 10. So by the end, if you print title_and_year you should get: 

In [None]:
# do not run this code
{'Chronicle of a Death Foretold': 1991, 
 'One Hundred Years of Solitude': 1980, 
 'Love in the Time of Cholera': 1995, 
 'Leaf Storm': 1965}

In [None]:
# Solution

for key,value in title_and_year.items():
    title_and_year[key] = value + 10

print(title_and_year)

Another important task we often perform when processing text is determining whether a key is in a dictionary. For example, if we wanted to ask whether 'Eyes of a Blue Dog' is in our title_and_year dictionary, we could do the following:

In [None]:
'Eyes of a Blue Dog' in title_and_year
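This check matters because asking for a key that is not in the dictionary (for example title_and_year['Eyes of a Blue Dog']) stops the program with a **KeyError**. A common pattern, sketched here with a hypothetical years dictionary, is to test with **in** before looking the key up:

```python
years = {'Leaf Storm': 1955, 'Love in the Time of Cholera': 1985}
messages = []

for title in ['Leaf Storm', 'Eyes of a Blue Dog']:
    if title in years:
        # Safe: we know the key exists.
        messages.append(title + ': ' + str(years[title]))
    else:
        messages.append(title + ': not in the dictionary')

print(messages)  # ['Leaf Storm: 1955', 'Eyes of a Blue Dog: not in the dictionary']
```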

**Exercise:** Print the word count for each word that appears in 'page_1_hundred_years.txt'. To do this, you will need to do the following: 
1. Initialize a dictionary that will keep track of our word counts. So here our **keys** will be words, and our **values** will be numbers corresponding to the word counts. 
2. Open the file using the framework we used above. 
3. For each line in the file, do the following:<br>
    3.1. Each line is one big string with words separated by spaces. To break the line into a list of its words (dropping the spaces), apply the split( ) command to the line (e.g. word_list = line.split( ) ). <br>
    3.2. For each word in the word list, add it to the dictionary and increase its count. Be careful: when you add a word to the dictionary for the first time, its count should be 1. *Hint*: use an **if** statement that checks whether the word is already in the dictionary. 
4. Loop over the dictionary and print its keys and values. 
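Before working with the file, it can help to try steps 3.1 and 3.2 on a single string. In this sketch the sentence is an arbitrary stand-in for one line of the file. Note that **split()** treats any run of spaces as a single separator, and that it keeps punctuation and capitalization, so 'Be' and 'be,' would be counted as different words.

```python
line = 'to be or not to be'
word_list = line.split()        # ['to', 'be', 'or', 'not', 'to', 'be']
print('to   be'.split())        # ['to', 'be']: extra spaces are ignored

word_counts = {}
for word in word_list:
    if word in word_counts:
        word_counts[word] = word_counts[word] + 1  # seen before: add one
    else:
        word_counts[word] = 1                      # first time: start at 1

print(word_counts)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```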

In [None]:
# Solution 

# Step 1 - initialize dictionary 
word_counts = {}

# Step 2 - open file using framework 
filename = 'page_1_hundred_years.txt'
with open(filename) as text_file:
    for line in text_file:
        line = line.strip()
        # Step 3.1 - remove the white spaces and gather all the words in a list
        # using the split() command
        word_list = line.split()
        # Step 3.2 - for each word in the word list, add the word as a key to the 
        # dictionary and increase its count 
        for word in word_list:
            if word in word_counts:
                # We can only increase the count of a word that is already
                # in the dictionary, that is, a word whose count has already
                # been initialized. Otherwise, the right-hand side of the assignment
                # below would give us an error (a KeyError), since word_counts[word]
                # doesn't have a value yet, and we can't add 1 to a value that
                # doesn't exist! 
                word_counts[word] = word_counts[word] + 1
            else:
                # If the word is not in the dictionary yet, we are seeing it for
                # the first time, so we initialize its count to 1. This is like
                # initializing our counter variable when we calculated the 
                # length of a list 
                word_counts[word] = 1

# Step 4 - Loop over the dictionary and print its keys and values
for key,value in word_counts.items():
    print(key,":",value)

# Works Cited

Hovy, Dirk. "Python Programming for Linguists - A Gentle Introduction." USC. pp. 15-19. 

Karsdorp, Folgert and Maarten van Gompel. "Chapter 1: Getting started." *Python Programming for the Humanities*. KNAW Meertens Institute. 