# Learning to Program: Functions, More on Strings and Lists

*"Any fool can write code that a computer can understand. Good programmers write code that humans can understand."*<br>*- Martin Fowler*

## Functions

Recall the code we created to calculate the length of a list from our first class:

In [None]:
title_list = ['Love in the Time of Cholera', '100 Years of Solitude', 'Chronicle of a Death Foretold']
title_list_count = 0
for title in title_list:
    title_list_count = title_list_count + 1
title_list_count

This works fine and does what it is supposed to do. But what if we wanted to calculate the length of another list? We would just copy and paste the code we have in our previous box and simply change the value of the variable title_list:

In [None]:
title_list = ["A User's Guide to Thought and Meaning", 'Foundations of Language']
title_list_count = 0
for title in title_list:
    title_list_count = title_list_count + 1
title_list_count

But what if we wanted to calculate the length of *another* list? We could repeat this code again, and again. 

The end result? Our Jupyter notebook ends up covered with snippets of code that do essentially the same thing over and over.

Is there a smarter way to go about doing this? Could there be a way we could re-use code over and over again and only need to write it once?

Python allows us to do this with **functions**. 

A function for the task above (calculating the length of a list) would look like:

In [None]:
def list_length(t_list):
    title_list_count = 0
    for title in t_list:
        title_list_count = title_list_count + 1
    print(title_list_count)

There are several parts to a function. Let's take a look at them:

1. **def list_length(t_list)**: This is called the **function signature**. It contains three parts: 
- The **def** keyword, which simply lets Python know that we are defining a function. 
- The **name** of the function (here it's list_length, but it can be any sequence of characters that follows the same restrictions as naming Python variables) 
- The **parameters** of the function. Here the parameter is **t_list**, a variable name. 

You may have noticed that **t_list** is not defined anywhere: we haven't assigned it a value! However, this is ok: here the parameter is just a name for the variables we change in our repeated snippets of code. Since here we only change the value of the variable title_list from snippet to snippet, we only need to create one parameter, namely t_list. 

You may also be asking: why didn't we use the name "title_list" to name our parameter? Why go through the trouble of renaming it "t_list"? We do this since we already defined the variable title_list above, and it is good programming practice to keep definitions separate.  If we hadn't defined title_list, we could have absolutely used this name as a parameter. 

- The **body** of the function: contains the code of the task we want to perform (in this case, the code we use to calculate the length of a list). It is always indented with respect to the other parts of the function. 

- **print(title_list_count)**: prints the final result of our list_length task. Why can't we simply write the name of the variable, like we did before? A Jupyter cell automatically displays the value of its last line, but a function does not: a bare variable name at the end of a function is evaluated and then silently discarded. That is why we need to print the value of the variable. 

Now for the million dollar question: how can we actually use this function on the different lists we have?

Ideally we need something like (for one of our lists):

In [None]:
t_list = ["A User's Guide to Thought and Meaning", 'Foundations of Language']
title_list_count = 0
for title in t_list:
    title_list_count = title_list_count + 1
print(title_list_count)

It turns out that the following code is equivalent:

In [None]:
list_length(["A User's Guide to Thought and Meaning", 'Foundations of Language'])

This is called a **function call**. It has two parts: 
- The name of the function (list_length)
- The arguments of the function (here the list ["A User's Guide to Thought and Meaning", 'Foundations of Language'])

A function call 'activates' a function. Here is what happens during a function call:
1. Each argument is assigned as a value to a parameter of the function. Recall from above that the parameter to our list_length function was   

In [None]:
### Do not run this code
t_list

and our argument in our function call is 

In [None]:
### Do not run this code
["A User's Guide to Thought and Meaning", 'Foundations of Language']

So what essentially happens is the parameter t_list is assigned the value of the argument passed

In [None]:
### Do not run this code
t_list = ["A User's Guide to Thought and Meaning", 'Foundations of Language']

**inside our function**. Notice that t_list would **not** be available outside of the function list_length. 

2. The function's body (the indented code) is executed with the parameters assigned. So for our example we would get:

In [None]:
### Do not run this code
t_list = ["A User's Guide to Thought and Meaning", 'Foundations of Language']
title_list_count = 0
for title in t_list:
    title_list_count = title_list_count + 1
print(title_list_count)

Let's verify that this is indeed the case: 

In [None]:
# Function definition
def list_length(t_list):
    # Print t_list to show that it gets its value from the
    # argument of the function call
    print("The value of t_list is", t_list)
    title_list_count = 0
    for title in t_list:
        title_list_count = title_list_count + 1
    print(title_list_count)
    
# Function call    
list_length(["A User's Guide to Thought and Meaning", 'Foundations of Language'])
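We can also check the claim that a parameter is not available outside of its function. The cell below is a small sketch: the function name show_items and the parameter name items_param are made up for this illustration, and it uses try/except, which we have not covered yet, only to catch the error so the cell keeps running.

```python
# Hypothetical function, used only to illustrate where a parameter "lives"
def show_items(items_param):
    print("Inside the function, items_param is", items_param)

show_items(['Foundations of Language', 'Curious George'])

# Outside the function the name items_param does not exist,
# so Python raises a NameError when we try to use it.
try:
    print(items_param)
    parameter_visible_outside = True
except NameError:
    parameter_visible_outside = False

print("Parameter visible outside the function:", parameter_visible_outside)
```

The last line should report False: the parameter only exists while the function body is running.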

**Exercise:** Write a function called print_file that prints the contents of a file, and then call that function with the file name 'five_grams.csv'. 

In [None]:
# delete this and type your answer here. 

Now let's suppose that after applying our list_length function we wanted to store the result in a variable. We should just do a variable assignment, right?

In [None]:
list1 = ["A User's Guide to Thought and Meaning", 'Foundations of Language']
len_list1 = list_length(list1)
print(len_list1)

Wait, *what*? 

Why do we get this strange value "None" instead of the number 2? 

The reason is that printing a value only *displays* it on the screen; it does not hand the value back to the code that called the function. So in the case of our list_length function, the line 

In [None]:
### do not run this code
print(title_list_count)

just displays the value of title_list_count, but doesn't pass it back for further use. A function that never hands a value back produces the special value None, which is what ended up in len_list1. If we want the function call to produce a value we can use, we need to tell Python to do so. We can do this with the **return** keyword. 

In [None]:
def list_length(t_list):
    title_list_count = 0
    for title in t_list:
        title_list_count = title_list_count + 1
    return title_list_count

Now, when we call this function, we'll get a value that we can assign to a variable. 

In [None]:
list1 = ["A User's Guide to Thought and Meaning", 'Foundations of Language']
len_list1 = list_length(list1)
print(len_list1)

Note that the value of the variable that is returned in a function doesn't necessarily have to be an integer - it can be any one of the types of values we have seen so far (string, float, list - even a dictionary!). 
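To make this concrete, here is a short sketch of two functions that return something other than a number. Both function names are invented for this example, and the 0 stored in the dictionary is just a placeholder meaning "year unknown".

```python
# Hypothetical helper: returns a list containing the title
def title_as_list(title):
    return [title]

# Hypothetical helper: returns a dictionary with a single entry
def title_with_unknown_year(title):
    entry = {}
    entry[title] = 0  # 0 is a placeholder for "year unknown"
    return entry

returned_list = title_as_list('Foundations of Language')
returned_dict = title_with_unknown_year('Foundations of Language')
print(returned_list)
print(returned_dict)
```

In both cases the returned value can be assigned to a variable and used like any other list or dictionary.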

**Exercise:** Recall the dictionary we created that associated a book title with its year of publication: 

In [1]:
title_and_year = {}
title_and_year['Chronicle of a Death Foretold'] = 1981
title_and_year['One Hundred Years of Solitude'] = 1967
title_and_year['Love in the Time of Cholera'] = 1985
print(title_and_year)

{'Chronicle of a Death Foretold': 1981, 'One Hundred Years of Solitude': 1967, 'Love in the Time of Cholera': 1985}


Create a function that takes in as a parameter a book title and returns the year of publication. If the book title taken in as a parameter is not in the dictionary, return 0. Check that your function works by running the function calls below. 

In [2]:
# Solution 1
def publication_year(title):
    if title in title_and_year:
        pub_year = title_and_year[title]
        return pub_year
    else:
        return 0

# Function calls
chronicle_pub_year = publication_year('Chronicle of a Death Foretold')
aspects_pub_year = publication_year('Aspects of the Theory of Syntax')
print(chronicle_pub_year)
print(aspects_pub_year)

1981
0


In [3]:
# Solution 2
def publication_year(title):
    value_to_return = 0
    if title in title_and_year:
        value_to_return = title_and_year[title]
    return value_to_return

# Function calls
chronicle_pub_year = publication_year('Chronicle of a Death Foretold')
aspects_pub_year = publication_year('Aspects of the Theory of Syntax')
print(chronicle_pub_year)
print(aspects_pub_year)

1981
0


*Side note*: Recall our framework for reading a file from the previous class:

In [None]:
# No need to run this code
with open('marquez_works.txt') as text_file:
    for line in text_file:
        line = line.strip()
        # code to perform with each line

You may have been confused by the line

In [None]:
line = line.strip()

But in fact, we can see this now for what it is: it's a function call! Some types - such as lists, dictionaries and strings - have *built-in functions* (formally called *methods*): these are functions that were made by the creators of Python which encapsulate common tasks, such as adding an item to a list, listing the items of a dictionary, or removing whitespace (including the endline character, '\n') from both ends of a string, like the strip( ) function. We'll take a deeper look at this in the next section. 
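Here is a quick sketch of what strip( ) does to a line like the ones we read from a file. The variable names are made up for the example, and repr( ) is used only so that the spaces and the '\n' are visible when printed.

```python
# A line as it might come out of a file: leading spaces and a trailing newline
raw_line = '  Love in the Time of Cholera\n'

# strip() returns a new string without the whitespace at either end
clean_line = raw_line.strip()

print(repr(raw_line))    # shows the spaces and the \n
print(repr(clean_line))  # shows the cleaned-up string
```

Notice that strip( ) returns a new string, which is why the framework assigns the result back to the variable line.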

For those keen observers, you may have noticed a slight change in terminology: what I referred to before as **commands** are in fact **function calls**. From this point on I will refer to commands as what they are, function calls. 

Notice that a function doesn't necessarily have to have only one parameter. It can take no parameters at all (our call to strip( ) above passed no arguments), as well as two or more parameters. If the function takes in two or more parameters, each parameter is separated by a comma, as in the example below:

In [None]:
def add_two_numbers(number1, number2):
    number_sum = number1 + number2
    return number_sum

s = add_two_numbers(3,5)
print(s)

**Exercise:** Write a function that takes as parameters a list of strings and a string, and returns True if the string is in the list of strings, and False otherwise. 

In [4]:
# Solution 1
def in_list(list_of_strings, s):
    if s in list_of_strings:
        return True
    else:
        return False

list1 = ["A User's Guide to Thought and Meaning", 'Foundations of Language']
s1 = "A User's Guide to Thought and Meaning"
result = in_list(list1, s1)
print(result)

s2 = "Curious George"
result = in_list(list1, s2)
print(result)

True
False


In [5]:
# Solution 2
def in_list(list_of_strings, s):
    value_to_return = s in list_of_strings
    return value_to_return

list1 = ["A User's Guide to Thought and Meaning", 'Foundations of Language']
s1 = "A User's Guide to Thought and Meaning"
result = in_list(list1, s1)
print(result)

s2 = "Curious George"
result = in_list(list1, s2)
print(result)

True
False


Let's sum up what we have learned about functions so far:

1. A function is made up of the **function signature** and the **function body**. 
2. The **function signature** consists of the **function name** and the **parameters**. Function names follow the same rules as variable names, as do the names of the parameters. A function can have zero, one, or two or more parameters. If it has two or more parameters, they are separated by commas. 
3. The **function body** is the **indented sequence of commands** that the function executes when it is called. It may or may not return a value. If it returns a value, the **return** keyword must be used. 
4. A **function call** executes the sequence of commands of a function. To call a function, you need its name and its arguments. The number of arguments of the call must be the same as the number of parameters of the function. When a function is called, each argument is assigned as a value to its corresponding parameter - it's like a variable assignment occurring inside the function body. 
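What happens if the number of arguments does not match the number of parameters? The sketch below reuses add_two_numbers from earlier and calls it with only one argument. It relies on try/except, which we have not covered, purely to catch the error and print it instead of stopping the notebook.

```python
def add_two_numbers(number1, number2):
    number_sum = number1 + number2
    return number_sum

# Calling with one argument instead of two raises a TypeError
try:
    add_two_numbers(3)
    call_succeeded = True
except TypeError as error:
    call_succeeded = False
    print("Python refused the call:", error)

print("Call succeeded:", call_succeeded)
```

Python checks the argument count before running the function body, so none of the code inside add_two_numbers is executed in the failed call.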

## More on Lists

Recall what we learned about lists in our initial class:

1. A list is a type of value in Python 
2. A list allows us to group values together. 
3. A list is created by comma-separated values in square brackets (['one','two','three']). 
4. The order of the elements in a list matters. 

Up to this point we have assumed that lists are built beforehand and cannot be modified afterwards. But for most of our text processing tasks, we are going to need to modify our lists somehow. For instance, we will need to add elements to our list, or delete elements from it, even put two lists together. Python makes this easy. 

### Adding an element to a list

Suppose we have a list of nouns we want to analyze:

In [7]:
noun_list = ['champions', 'Raptors', 'NBA']
print(noun_list)

['champions', 'Raptors', 'NBA']


What if we wanted to add the elements 'Warriors', 'losers' to the **end of the list**? How could we do this?

One way of doing this would be to create a new list and manually add those elements: 

In [None]:
noun_list_extended = ['champions', 'Raptors', 'NBA', 'Warriors', 'losers']
print(noun_list_extended)

But then the first list is still lying around and we don't really need it anymore. What we really want is to modify the first list in place so we don't have to create a second list at all. We can do this with a call to the built-in list append function.  

In [None]:
noun_list.append('Warriors')
print(noun_list)

noun_list.append('losers')
print(noun_list)

Notice here that since the list is modified in place, we don't need to assign it as a value to any variable. No value is produced as a result of this operation - only the original list is modified. 
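A common slip is to assign the result of append to a variable anyway. The small sketch below (with a made-up list name, team_nouns) shows what we would get in that case.

```python
team_nouns = ['champions', 'Raptors', 'NBA']

# append modifies the list in place and hands back None
result = team_nouns.append('Warriors')

print(result)      # the "result" of append
print(team_nouns)  # the list itself has been modified
```

The variable result holds None, while team_nouns now contains the new element.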

**Exercise:** Write a function that takes in as parameter a list of words (a list of strings) and a word (a string), and only adds the element to the end of the list if the number of letters in the word is equal to 4. A few things to keep in mind while writing this function:
1. Recall that you can calculate the number of characters in a string using the built-in **len( )** function:

In [None]:
word = 'hello'
number_characters = len(word)
print(number_characters)

2. Think about whether you really need to return anything in your function. Remember, the append function modifies lists in place, even those that are passed through functions! 
3. An if statement might be handy here. 

In [6]:
# Solution
def add_four_letter_words_only(word_list, word):
    if len(word) == 4:
        word_list.append(word)

wl = ['pear','buns','jobs','time']

add_four_letter_words_only(wl, 'bird')
# Should print ['pear','buns','jobs','time', 'bird']
print(wl)
add_four_letter_words_only(wl, 'pizza')
# Should print ['pear','buns','jobs','time', 'bird']
print(wl)

['pear', 'buns', 'jobs', 'time', 'bird']
['pear', 'buns', 'jobs', 'time', 'bird']


### Adding two lists together

Let's say you have two lists, one containing a list of shape adjectives and another one containing size adjectives:

In [10]:
shape_adjectives = ['round','square','triangular']
size_adjectives = ['huge','small','tall','thin','thick']

Joining these lists together to create a large list of adjectives is easy in Python:

In [11]:
adjectives = shape_adjectives + size_adjectives
print(adjectives)

['round', 'square', 'triangular', 'huge', 'small', 'tall', 'thin', 'thick']


This operation (**+**) is called **concatenation**. 

Notice that unlike the append function, the concatenation operator:
- Does produce a new value (a new list, which can be assigned to a variable).  
- Does **not** modify the original lists.  
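We can verify both points with a small sketch. The lists colors and textures are made up for this illustration.

```python
colors = ['red', 'blue']
textures = ['rough', 'smooth']

# Concatenation builds a brand new list
combined = colors + textures

# Changing the new list does not touch the originals
combined.append('shiny')

print(colors)
print(textures)
print(combined)
```

Only combined contains 'shiny'; colors and textures are exactly as we created them.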

**Exercise**: Concatenate the adjective list and the noun list we created previously. Then append the newline character ("\n") to this new list.  

In [12]:
# Solution
nouns_and_adjectives = noun_list + adjectives
nouns_and_adjectives.append("\n")
print(nouns_and_adjectives)

['champions', 'Raptors', 'NBA', 'round', 'square', 'triangular', 'huge', 'small', 'tall', 'thin', 'thick', '\n']


### Deleting an element from a list

Before we see how to delete an element from a list, it would be wise to first see how Python  numbers elements in a list. 

In programming, unlike in real life, lists are numbered starting at 0. 

Therefore, for our adjective list:

In [None]:
adjectives = shape_adjectives + size_adjectives
print(adjectives)

'round' is the **0th** element of the list, 'square' is the **1st** element of the list, 'triangular' is the **2nd** element of the list, and so on. Another way of saying this is that the **index** of 'round' is 0, the index of 'square' is 1, etc. 

In Python, deleting an element is usually done by using its index, using the **pop** function. For instance, if we wanted to delete 'triangular' (the 2nd element) from our list of adjectives, we would do the following:

In [None]:
adjectives.pop(2)
print(adjectives)

Be careful to use a valid index (from 0 to len(list)-1) when using pop(). Otherwise, you'll get an error. 
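The sketch below, using a made-up list called sizes, shows two details of pop: it hands back the element it removed, and it raises an IndexError when the index is out of range. The try/except is there only to catch that error so the cell keeps running.

```python
sizes = ['huge', 'small', 'tall']

# pop returns the element it removes
removed = sizes.pop(1)
print("Removed:", removed)
print("List after pop:", sizes)

# Index 5 does not exist in a two-element list
try:
    sizes.pop(5)
    pop_succeeded = True
except IndexError:
    pop_succeeded = False

print("pop(5) succeeded:", pop_succeeded)
```

Unlike append, pop does produce a value, so assigning its result to a variable is useful when we still need the deleted element.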

In addition, notice that in Python we can retrieve an element of a list by its index in square brackets:

In [None]:
adjective0 = adjectives[0]
adjective1 = adjectives[1]
adjective4 = adjectives[4]
print(adjective0)
print(adjective1)
print(adjective4)

**Exercise:** Write a function that, given a list, deletes its last element, and returns the new last element in the list. You may assume that the list has at least two elements.  

In [13]:
# Solution 
def delete_last_element(word_list):
    # delete last element
    last_index = len(word_list) - 1
    word_list.pop(last_index)
    # return new last element
    new_last_index = len(word_list) - 1
    new_last_element = word_list[new_last_index]
    return new_last_element

new_last_element = delete_last_element(['magazine','game','marathon'])
# Should print 'game'
print(new_last_element)

game


### Get part of a list 

In our text processing tasks, we'll sometimes need only a certain portion of a list. Python offers some pretty cool ways to get only certain parts of a list, based on the **indexes** of the elements of the list, through the **slicing** operator. The slicing operator in Python consists of a colon ( **:** ), as used below:  

In [None]:
# do not run this code
words[start:stop] # retrieves all items of list words from index start to index stop-1
words[start:] # retrieves all items starting from index start to the end of the list 
words[:stop] # retrieves all items starting at the beginning of the list to index stop-1

Let's see what this looks like in the context of our adjectives list:

In [None]:
print("Original adjective list:", adjectives)

# 1st to 3rd adjectives
adj_1_to_3 = adjectives[1:4]
print("1st to 3rd items:", adj_1_to_3)

# All adjectives except for the 0th one and 1st one 
all_except_first_2 = adjectives[2:]
print("Everything except the first two:", all_except_first_2)

# all adjectives except for the last one 
last_index = len(adjectives)-1
all_except_last_one = adjectives[:last_index]
print("Everything except for the last one:", all_except_last_one)

Notice that the start and stop indexes could also be negative indexes (what?!). This means that the slicing mechanism starts counting from the end instead of the beginning, like we have been doing up to this point:

![indexes.png](attachment:indexes.png)

In [14]:
print("Original adjective list:", adjectives)

last_item = adjectives[-1] # last item in the list 
print("Last element in the list:", last_item)

last_2_items = adjectives[-2:] # last 2 items in the list
print("Last 2 elements in the list:", last_2_items)

all_except_last_two = adjectives[:-2] # everything except the last 2 items
print("Everything except for the last two:", all_except_last_two)

Original adjective list: ['round', 'square', 'triangular', 'huge', 'small', 'tall', 'thin', 'thick']
Last element in the list: thick
Last 2 elements in the list: ['thin', 'thick']
Everything except for the last two: ['round', 'square', 'triangular', 'huge', 'small', 'tall']
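One way to think about a negative index is as a shortcut for "length of the list minus something". The sketch below checks this on a small made-up list, adjective_sample.

```python
adjective_sample = ['round', 'square', 'triangular', 'huge']
n = len(adjective_sample)

# Index -1 refers to the same element as index n - 1
same_last = adjective_sample[-1] == adjective_sample[n - 1]

# A slice starting at -2 is the same as a slice starting at n - 2
same_slice = adjective_sample[-2:] == adjective_sample[n - 2:]

print(same_last)
print(same_slice)
```

Both comparisons print True, so negative indexes are simply a more convenient way of counting from the end.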


**Exercise:** Write a function that given a list, returns a list that contains the first two elements and the last two elements of the list. Do not modify the original list. 

In [15]:
# Solution
def first_two_and_last_two(word_list):
    first_two = word_list[:2]
    last_two = word_list[-2:]
    result = first_two + last_two
    return result

elements = first_two_and_last_two(['magazine','game','marathon','book','phone','jacket'])
# Should print ['magazine','game','phone','jacket']
print(elements)

['magazine', 'game', 'phone', 'jacket']


Let's sum up what we have learned about additional list operations:

1. We can **add** an element to a list using the **append** function. 
2. We can **add two lists together to create a new list** using the **concatenation ( + )** operator. 
3. Lists are numbered (or **indexed**) starting at 0. 
4. We can **delete** an element from a list using the **pop** function, passing in the index of the element we wish to remove. 
5. We can **retrieve a single element** from a list using the index of the element in square brackets. 
6. We can **retrieve parts of a list** using the **slicing ( : )** operator. We can use positive and negative indices for this.  

## More on Strings

Recall what we learned about strings in the first class:

1. Strings are sequences of characters. 
2. They can be enclosed with single quotes or double quotes. 
3. Single quotes can be used inside double quotes and vice versa. If we need to use both, to avoid confusion we can apply the escape ( \ ) character on the inner quotes.

Now we can add the following fact to this list: 

If we consider a **character** as a mini-string that contains a letter or symbol enclosed in single quotes (like 'a'), then a **string behaves like a list of characters, with the caveat that it cannot be modified**. 

In other words, we cannot **add** or **delete** elements from a string. We can, however: 
- **Add two strings together to create a new string** 
- **Retrieve a single character** from the string using the index of the character in square brackets. 
- **Retrieve parts of a string** using the **slicing** operator.  

In [None]:
# Add two strings together to create a new string
topping = "pepperoni "
pizza = "pizza"
pizza_with_topping = topping + pizza
print(pizza_with_topping)

# Retrieve a single character from a string
first_letter = topping[0]
print(first_letter)

# Retrieve part of a string
topping_length = len(topping)
pizza_without_topping = pizza_with_topping[topping_length:]
print(pizza_without_topping)
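What does "cannot be modified" look like in practice? The sketch below tries to replace the first character of a string, which Python rejects with a TypeError, and then builds a new string instead. The variable names are made up, and try/except is used only to catch the error.

```python
pizza_word = 'pizza'

# Trying to change a character in place is not allowed
try:
    pizza_word[0] = 'P'
    assignment_worked = True
except TypeError:
    assignment_worked = False

print("In-place change worked:", assignment_worked)

# Instead, we build a new string from a new first letter and a slice
capitalized = 'P' + pizza_word[1:]
print(capitalized)
print(pizza_word)  # the original string is unchanged
```

Concatenation and slicing always give us new strings, which is why they are allowed while in-place changes are not.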

**Exercise:** Suppose you have a list of words that are tagged as nouns or verbs, as so: 

In [None]:
word_list = ['jump_VERB','television_NOUN','eat_VERB','book_NOUN','cook_VERB']

Write a function that takes in a word list as above and returns a list with the tags removed. For the above example, the returned list would look like:

In [None]:
word_list_without_tags = ['jump','television','eat','book','cook']

For this exercise, at some point you'll have to use:
- List append 
- String slicing

In [16]:
# Solution
def remove_tags(words):
    new_list = []
    
    for word in words:
        word_without_tag = word[:-5]
        new_list.append(word_without_tag)
        
    return new_list

words_with_tags = ['jump_VERB','television_NOUN','eat_VERB','book_NOUN','cook_VERB']
words_without_tags = remove_tags(words_with_tags)
# Should print ['jump','television','eat','book','cook']
print(words_without_tags)

['jump', 'television', 'eat', 'book', 'cook']


One last thing about strings that comes up often in text processing: let's say we have a big string made up of words that are separated by spaces, like this one:

In [None]:
a_big_string = "there is nothing wrong with big strings"

It would be convenient if we had a way of separating this big string into the words that make it up. The **split( )** function allows us to do this: splitting the string into a list of smaller strings.  

In [None]:
words = a_big_string.split()
print(words)

What's even more useful about the split( ) function is that it is not limited to spaces. We can also split strings that are composed of words separated by commas, by simply passing in the comma string as an argument to the split( ) function.

In [None]:
comma_separated_string = "there,are,never,too,many,books,on,my,shelf"
words = comma_separated_string.split(',')
print(words)
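One detail worth knowing: calling split( ) with no argument is not quite the same as calling split(' '). The sketch below uses a made-up string with two spaces between its words to show the difference.

```python
# Two spaces between the words
spaced_string = "big  strings"

# With no argument, any run of whitespace counts as one separator
default_split = spaced_string.split()

# With ' ' as the argument, every single space is a separator,
# so the extra space produces an empty string in the result
space_split = spaced_string.split(' ')

print(default_split)
print(space_split)
```

For text where the spacing may be irregular, calling split( ) with no argument is usually the more forgiving choice.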

**Exercise:** Recall the print_file function we implemented at the beginning of this class. Implement a function that instead of printing each line of five_grams.csv, prints the list of words that make up each line. 

In [17]:
# Solution
def print_words(filename):
    with open(filename) as text_file:
        for line in text_file:
            line = line.strip()
            words = line.split(',')
            print(words)

print_words('five_grams.csv')

['a', 'starting', 'point', 'for', 'the']
['at', 'the', 'same', 'rate', 'as']
['based', 'on', 'the', 'study', 'of']
['be', 'returned', 'on', 'or', 'before']
['by', 'the', 'library', 'rules', 'or']
['by', 'and', 'with', 'the', 'advice']
['draws', 'attention', 'to', 'its', 'status']
['due', 'on', 'the', 'latest', 'date']
['European', 'Journal', 'of', 'Industrial', 'Relations']
['even', 'in', 'the', 'presence', 'of']


### Works Cited
Hewgill, Greg. "Understanding slice notation". *Stack Overflow*, February 3rd, 2019. 