# Chapter 2: Collections

-------------------------------

## Methods

Consider the sentence below:

In [38]:
sentence = "Python's name is derived from the television series Monty Python's Flying Circus."

Words are made up of characters, and so are strings in Python, like the string stored in the variable `sentence` in the block above. For the sentence above, it might seem more natural for humans to describe it as a series of words, rather than as a series of characters. Say we want to access the first word in our sentence. If we type in:

In [2]:
first_word = sentence[0]
print(first_word)

P


Python only prints the first *character* of our sentence. (Think about this if you do not understand why.) We can transform our sentence into a `list` of words (represented by strings) using the `split()` function as follows: 

In [40]:
words = sentence.split()
print(words)

["Python's", 'name', 'is', 'derived', 'from', 'the', 'television', 'series', 'Monty', "Python's", 'Flying', 'Circus.']


Make sure that you understand the syntax of this code! We call things like `split()` 'functions': functions provide small pieces of helpful, ready-made functionality that we can use in our own code. Here, we apply the `split()` function to the variable `sentence` and we assign the result of the function (we call this the 'return value' of the function) to the new variable `words`.

By default, the `split()` function in Python will split strings on whitespace between consecutive words and it will return a list of words. However, we can pass an argument to `split()` that explicitly specifies the string we would like to split on. In the code block below, we will split a string on commas, instead of spaces. Do you get the syntax?

In [None]:
fruitstring = "banana,pear,apple"
fruitlist = fruitstring.split(",")
print(fruitlist)

The reverse of the `split()` function can be accomplished with `join()`, it turns a list into a string, with a specific 'delimiter' or the string you want to use to join the items.

In [29]:
print(",".join(['banana', 'pear', 'apple']))

banana,pear,apple


The above four lines can be accomplished in a single line if code, can you figure out how? (Tip: replace all variables by their values)

In [32]:
# insert your oneliner here!
sentence = "Python's name is derived from the television series Monty Python's Flying Circus."
name = "Bart"
words = sentence.split("Python")
print(words)
# print(name.join(words))


['', "'s name is derived from the television series Monty ", "'s Flying Circus."]


As you can see, as a delimiter we now used a string containing both a comma and a space, instead of just a comma. By using a different delimiter than the one we used to split the original string, we have basically implemented the same functionality as what you can achieve with a search and replace operation in e.g. Microsoft Word.

However, Python also provides an easier solution.

### replace()

The `replace()` function is a function which can be called on a string. It will replace all occurrences of a specified substring with another string. Consider the lines in the code block below - and mind the order in which you pass the arguments to the function!

In [34]:
text = "You can not compare apples and pears"
text = text.replace("pears", "apples")
text = text.replace("not ", "")
print(text)

You can compare apples and apples


## DIY

In [35]:
text = "Research has shown that it is often still possible to understand text even if all vowels are removed"
# insert your code here.. I suppose it's obvious what we want you to do ;-)
text = text.replace("a", "")
print(text)
text = text.replace("e", "")
print(text)
text = text.replace("i", "")
print(text)
text = text.replace("o", "")
print(text)
text = text.replace("u", "")
print(text)

Reserch hs shown tht it is often still possible to understnd text even if ll vowels re removed
Rsrch hs shown tht it is oftn still possibl to undrstnd txt vn if ll vowls r rmovd
Rsrch hs shown tht t s oftn stll possbl to undrstnd txt vn f ll vowls r rmovd
Rsrch hs shwn tht t s ftn stll pssbl t undrstnd txt vn f ll vwls r rmvd
Rsrch hs shwn tht t s ftn stll pssbl t ndrstnd txt vn f ll vwls r rmvd


Python also has functions for changing the case of a string. `lower()` converts a string to lowercase characters and `upper()` returns an uppercased version:

In [36]:
my_string = "AllCaps"
print(my_string)
my_string_upper = my_string.upper()
print(my_string_upper)
my_string_lower = my_string.lower()
print(my_string_lower)
my_string_capped = my_string.capitalize()
print(my_string_capped)

AllCaps
ALLCAPS
allcaps
Allcaps


## DIY

-  Can you come up with your own sentence `my_sentence` and split it into words along spaces? Print the new list of words.
-  Is there a difference in length between the variables `sentence` and `words`? (Use functions to find this out!)

In [None]:
# your DIY code goes here...

You can recognize functions because they are always followed by (round brackets). Apart from the `split()` function, we already encountered other functions, also in the previous chapter. Can you think of some and describe their functionality? Do you notice any differences in terms of syntax when you compare these to how `split()` is used?

Answer: examples of other functions include `len()` and `print()`. Like `split()`, they can take arguments between the round brackets. In the case of `len()`, the argument is the object of which you want to know the length, for `print()` it can be any number of objects you want to print. The syntax of `split()` is different because it cannot be used as a standalone function. Instead it must always be called on a string object. It is attached to this object using dot notation: `string.split()`, so e.g.:
* `"a,b,c".split(",")`
* `variable_containing_a_string.split(",")`

We call these object-specific functions *methods* (but we may continue to use the term *function* to refer to functions and methods alike). You cannot use `split()` on other types of objects, such as ints or lists. Other examples of string methods are `.lower()` and `.replace()`. Similarly, list methods are methods that can only be called on lists. One example is `.join()`.

---

## Lists

By introducing `split()` and `join()`, we have started using a new type of object: the *list*.

In many ways, lists are very similar to strings. We can for example access its components using indexes and we can use slice indexes to access parts of the list. Let's try this out.

Write a small program that defines a variable `first_word` and assign to it the first word of our word list `words` from above. Do the same for the fifth word, the last word and the penultimate word. Also, try to extract a slice from `words` and isolate the string of words between `derived` and `Flying` (the slice should not include `derived` and `Flying`). Also, make a slice of words that is identical to the title of the television series in `words`.   

In [57]:
# insert your code here
name = "Mr. White"
print(name[6:])
l = [1,2,3]
l.append(4)

words = sentence.split()
print(words)
first_word = words[0]
print(first_word)
slice = words[:10]
print(slice)

ite
["Python's", 'name', 'is', 'derived', 'from', 'the', 'television', 'series', 'Monty', "Python's", 'Flying', 'Circus.']
Python's
["Python's", 'name', 'is', 'derived', 'from', 'the', 'television', 'series', 'Monty', "Python's"]


A `list` acts like some kind of container in which we can store all kinds of information. We can access a list using indexes and slices. We can also add new items to a list. For that you use the `append()` method. Let's see how that works. Say we want to keep a list of all our good reads. We first declare an empty list using square brackets. Next, we add some books to the list:

In [93]:
# start with an empty list
good_reads = ["1984", "1985"]
list_of_vowels = ["a", "e", "i", "o", "u"]
print(list_of_vowels)
list_of_vowels = ["aeiou"]
empty_list = list()
print(list_of_vowels)

# print(good_reads)
# good_reads += "The Hunger Games"
# print(good_reads)
# # print(good_reads)
# # good_reads.append(["A Clockwork Orange", "1986"])
# # print(good_reads)

['a', 'e', 'i', 'o', 'u']
['aeiou']


As a reminder: do you get the syntax that goes with the `append()` method? The list we wish to append the item to goes first and we join `append()` to this list using a dot (`.`). In between the round brackets that go with the function name, we place the actual string that we wish to add to the list. We call such an input value an *argument* or a *parameter* that we *pass* to a function. Next, the function may return a *return value*, i.e. an object that we can print or assign to a variable (like in `books = ", ".join(good_reads)`, where the return value is a string).
Make sure that you are familiar with this terminology because you will often come across such terms when you look for help online!

Now, if for some reason we don't like a particular book anymore, we can replace it with a new book as follows, using the old book's index:

In [85]:
print(good_reads)
print(good_reads[1])
good_reads[1] = "Pride and Prejudice"
print(good_reads)

['1984', '1985']
1985
['1984', 'Pride and Prejudice']


As you see, it is no problem to reset or update an individual item in a list. This is different, however, for strings. Run the following code in which we try to change a single character in a string. This will raise an error: this is your computer signalling that something is wrong. This is because `strings` (and some other types) are *immutable*. That means that they cannot be changed using the index, as opposed to `list`s which *are* mutable.

In [82]:
name = list("Bonny")
name
name[2] = "l"
name
name[3] = "l"
print(repr(str(name)))
print("".join(name))
# name[2] = "X" # This would raise a TypeError
list_chars = list(name)
print(list_chars)
list_chars[2] = "X"
print(list_chars)
delimiter = ""
print(delimiter.join(list_chars))

"['B', 'o', 'l', 'l', 'y']"
Bolly
['B', 'o', 'l', 'l', 'y']
['B', 'o', 'X', 'l', 'y']
BoXly


## DIY

Here's another small DIY! Add two new titles to the list of `good_reads`. Then, try to change the title of the second book in our good reads collection:

In [98]:
# insert your code here
print(good_reads)
print(type(good_reads))
good_reads.append("1989")
good_reads.append("1990")
print(good_reads)

['1984', '1985', '1989', '1989', '1990', '1989', '1990', '1989', '1990']
<class 'list'>
['1984', '1985', '1989', '1989', '1990', '1989', '1990', '1989', '1990', '1989', '1990']


Lists are a really powerful way of dealing with your data in Python. Let's explore some other ways in which we can manipulate lists.

#### remove()

Let's assume our good reads collection has grown a lot and we would like to remove some of the books from the list. Python provides the function `remove()` that you can call on a list and which takes as argument the item we would like to remove.

In [99]:
good_reads = ["The Hunger Games", "A Clockwork Orange", 
             "Pride and Prejudice", "Water for Elephants", "Illias", "Water for Elephants", "Water for Elephants"]
print(good_reads)
good_reads.remove("Water for Elephants")
print(good_reads)
good_reads.remove("Water for Elephants")
print(good_reads)

['The Hunger Games', 'A Clockwork Orange', 'Pride and Prejudice', 'Water for Elephants', 'Illias', 'Water for Elephants', 'Water for Elephants']
['The Hunger Games', 'A Clockwork Orange', 'Pride and Prejudice', 'Illias', 'Water for Elephants', 'Water for Elephants']
['The Hunger Games', 'A Clockwork Orange', 'Pride and Prejudice', 'Illias', 'Water for Elephants']


If we try to remove a book that is not in our collection, Python raises an error to signal that something is wrong.

In [100]:
good_reads.remove("White Oleander")

ValueError: list.remove(x): x not in list

Note, however, that `remove()` will only delete the *first* item in the list that is identical to the argument which you passed to the function. Execute the code in the block below and you will see that only the first instance of "Pride and Prejudice" gets deleted.

In [None]:
good_reads = ["The Hunger Games", "A Clockwork Orange", 
             "Pride and Prejudice", "Water for Elephants", "Pride and Prejudice"]
good_reads.remove("Pride and Prejudice")
print(good_reads)

Just as with strings, we can concatenate two lists using the `+` operator. Here is an example:

In [101]:
# first we specify two lists of strings:
good_reads = ["A Clockwork Orange", 
              "Pride and Prejudice", "Water for Elephants",
              "The Shadow of the Wind", "Bel Canto"]
bad_reads = ["Fifty Shades of Grey", "Twilight", "The Hunger Games"]

# then we combine them
all_reads = good_reads + bad_reads
print(all_reads)

good_reads += bad_reads
print(good_reads)

['A Clockwork Orange', 'Pride and Prejudice', 'Water for Elephants', 'The Shadow of the Wind', 'Bel Canto', 'Fifty Shades of Grey', 'Twilight', 'The Hunger Games']
['A Clockwork Orange', 'Pride and Prejudice', 'Water for Elephants', 'The Shadow of the Wind', 'Bel Canto', 'Fifty Shades of Grey', 'Twilight', 'The Hunger Games']


#### sort() and sorted()

It is always nice to organise your bookshelf. We can sort our collection alphabetically with the following expressions:

In [104]:
sorted_reads = sorted(good_reads)
print(good_reads)
print(sorted_reads)

good_reads.reverse()
print(good_reads)

['A Clockwork Orange', 'Bel Canto', 'Fifty Shades of Grey', 'Pride and Prejudice', 'The Hunger Games', 'The Shadow of the Wind', 'Twilight', 'Water for Elephants']
['A Clockwork Orange', 'Bel Canto', 'Fifty Shades of Grey', 'Pride and Prejudice', 'The Hunger Games', 'The Shadow of the Wind', 'Twilight', 'Water for Elephants']
['Water for Elephants', 'Twilight', 'The Shadow of the Wind', 'The Hunger Games', 'Pride and Prejudice', 'Fifty Shades of Grey', 'Bel Canto', 'A Clockwork Orange']


Can you spot the difference between `sort()` and `sorted()`?
* `sorted()` is a function that can take an object (typically a list, but it works on strings too) and returns the sorted version, which you can assign to a variable. This does not change the original list!
* `.sort()` is a list method. It does not return anything; instead, it sorts the list in-place (altering it). There is no need for assigning to a new variable. In fact, if you do, this new variable will contain `None` (since `.sort()` does not return anything). Don't worry if this confuses you - it is confusing at first. If you run into bugs in the future, just remember that it may be caused by this difference in behaviour.

#### nested lists

Until now, our lists only consisted of strings. However, a list can contain all kinds of data types, such as integers and even lists! Do you understand what is happening in the following example? Have a close look at the square brackets used.

In [111]:
nested_list = [[1, 2, 3, 4], [5, 6, 7, 8]]

l = [1,2,3,4]
sublist = [5,6,7]
l.append(sublist)
print(l)
l.append("string")
print(l)

# print(nested_list[0])
# print(nested_list[0][2])
# print(nested_list[1][2])
# print(nested_list[0][2])

[1, 2, 3, 4, [5, 6, 7]]
[1, 2, 3, 4, [5, 6, 7], 'string']


We can put this to use to enhance our good reads collection with a score for every book we have. An entry in our collection will consist of a score within the range of `1` to `10` and the title of our book. The first element is the title, the second the score: `[title, score]`. We initialize an empty list, and add two books to it:

In [112]:
good_reads = []
good_reads.append(["Pride and Prejudice", 8])
good_reads.append(["A Clockwork Orange", 9])
print(good_reads)

[['Pride and Prejudice', 8], ['A Clockwork Orange', 9]]


## DIY

Update the `good_reads` collection with some of your own books and give them all a score and a publication year by nesting lists. Can you print out the score you gave to the first book in the list? And the publication year of the third item in your list? (Hint: you can pile up indexes using square brackets!)

In [117]:
# insert your code here
# print(good_reads)
# # good_reads[0].append(1884)
# good_reads[1].append(8909)
# print(good_reads)
good_reads.append(["Pride and Prejudice", 8, 1884])
good_reads.append(["A Clockwork Orange", 9, 9080])
print(good_reads)
print(good_reads[2][1])

[['Pride and Prejudice', 8, 1884], ['A Clockwork Orange', 9, 8909], ['Pride and Prejudice', 8, 1884], ['A Clockwork Orange', 9, 9080], ['Pride and Prejudice', 8, 1884], ['A Clockwork Orange', 9, 9080], ['Pride and Prejudice', 8, 1884], ['A Clockwork Orange', 9, 9080]]
8


##### What we have learnt

To finish this section, here is an overview of the new concepts you have learnt. Go through them and make sure you understand them all.

- function syntax 
- lists
- nested lists
- *mutable* versus *immutable*
- `.split()` vs. `.join()`
- `.append()`
- `.remove()`
- `.sort()`
- `.upper()` vs. `.lower()`

-------------

## Dictionaries

Our little good reads collection is starting to look quite impressive and we can perform all kinds of manipulations on it. Now, imagine that our list is large and we would like to look up the score we gave to a particular book. How are we going to find that book? For this purpose Python provides another more appropriate data structure, named `dictionary`, or `dict` for short. A `dictionary` is similar to the dictionaries you have at home. It consists of entries, or *keys*, that hold a value. Let's define one:

In [None]:
my_dict = {"book": "physical objects consisting of a number of pages bound together",
           "sword": "a cutting or thrusting weapon that has a long metal blade"}
print(my_dict)

Take a close look at the new syntax. Notice the curly brackets and the colons.

You can define an empty dictionary with `my_dict = {}` or `my_dict = dict()`.

To look up the value of a given key, we *index* the dictionary using that key (again, between square brackets):

In [None]:
description = my_dict["sword"]
print(description)

We can also add new entries, or update existing ones:

In [None]:
my_dict["pie"] = "dish baked in pastry-lined pan often with a pastry top"
print(my_dict)
my_dict["sword"] = "a pointy metal stick"
print(my_dict)

Like lists, dictionaries are mutable which means we can add and remove entries from it. Let's define an empty dictionary and add some books to it. The titles will be our keys and the scores their values.

In [None]:
good_reads = {}
good_reads["Pride and Prejudice"] = 8
good_reads["A Clockwork Orange"] = 9
print(good_reads["Pride and Prejudice"])

In a way, this is similar to what we have seen before when we altered our book `list`. There we indexed the list using a integer to access a particular book. Here we directly use the title of the book. Note that the keys in a dictionary must be unique: why would that be?

## DIY

Update the new good reads data structure with your own books. Try to print out the score you gave for one of the books which you added.

In [None]:
# put your code here

#### keys(), values()

To retrieve a list of all the books we have in our collection, we can ask the dictionary to return its keys as a list:

In [None]:
keys = good_reads.keys()
print(keys) # You can see that the dictionary method .keys() does not really return a list
keys = list(keys) # But we can cast the dict_keys object as a list
print(keys)

Similarly we can ask for the values:

In [None]:
print(list(good_reads.values()))

An important property of dictionaries that you should keep in mind, is that they have no concept of order. Unlike lists, which remember the order of the elements it contains, **dictionaries are unordered**. The reason for this is mainly technical (it makes dictionary lookup much much faster), and in practice it does not usually matter (because you will typically look up things in the dictionary instead of reading it from beginning to end). Just keep it in mind when you use the `.keys()` and `.values()` methods: the order in which they present the results is semi-random.

##### What we have learnt

To finish this section, here is an overview of the new concepts and functions you have learnt. Make sure you understand them all.

-  dictionary
-  indexing dictionaries and accessing values through their keys
-  adding items to a dictionary
-  `.keys()`
-  `.values()`

------------------------------------

## Final Exercises Chapter 2

Inspired by *Think Python* by Allen B. Downey (http://thinkpython.com), *Introduction to Programming Using Python* by Y. Liang (Pearson, 2013). Some exercises below have been taken from: http://www.ling.gu.se/~lager/python_exercises.html.

- Ex. 1: Consider the strings `sentence1 = "Brad and Angelina kick the bucket"` and `sentence2 = "Bonny and Clyde are really famous"`. Split these strings into words and create the following strings via list manipulation: `sentence3 = "Brad and Angelina are really famous"` and `sentence4="Bonny+and+Clyde+kick+the+bucket"` (mind the plus signs!). Can you print the middle letter of the fourth sentence?

In [None]:
# sentences

-  Ex. 2: Consider the `lookup` dictionary below. The following letters are still missing from it: `'k':'kilo', 'l':'lima', 'm':'mike'`. Add them to `lookup` one by one! Could you spell the word "marvellous" in code language now? Collect these codes into a list object called `msg`. Next, join the items in this list together with a comma and print the spelled out version!

In [None]:
# lookup code
lookup = {'a':'alfa', 'b':'bravo', 'c':'charlie', 'd':'delta', 'e':'echo',
          'f':'foxtrot', 'g':'golf', 'h':'hotel', 'i':'india', 'j':'juliett',
          'n':'november', 'o':'oscar', 'p':'papa', 'q':'quebec', 'r':'romeo',
          's':'sierra', 't':'tango', 'u':'uniform', 'v':'victor', 'w':'whiskey',
          'x':'x-ray', 'y':'yankee', 'z':'zulu'} # Don't change this line

-  Ex. 3: Collect the code terms in the lookup dict (`alpha`, `bravo`, ...) from the previous exercise into a list called `code_words`. Is this list alphabetically sorted? No? Then make sure that this list is sorted alphabetically. Now remove the items `victor`, `india` and `papa`. Append the words `pigeon` and `potato` at the end of this list. Combine this new list of items into a single string, using a semicolon as a delimiter and print this string. 

In [None]:
# follow-up lookup code

-----------------------------------------------------------------

You've reached the end of Chapter 2! Ignore the code block below -- it's only there to make the page prettier.

In [None]:
from IPython.core.display import HTML
def css_styling():
    styles = open("styles/custom.css", "r").read()
    return HTML(styles)
css_styling()