# Week 3 – Data Structures
## 3.1 Sequences and Lists
### Data From Data
On Engage we mentioned that in many programming languages (such as Java), a string is made up of multiple characters. Character is the primitive type, and string is a composite. Don't worry about the exact syntax right now, but you can even see this in the names of the types when we write Java code, which requires us to declare the type of each variable:

```java
char my_character = 's';
String my_string = "string";
```

The type `char` is written with a lower case first letter, but the type `String` is written with an upper case one.

Enough about Java. In Python, strings *can* be broken down into characters, but each character is just a single letter string. The *object* you get as a result of *indexing* the string is itself another string:

In [1]:
text = "hello"
character = text[0]
type(text) == type(character)

True

Strings in Python are an example of a *sequence* data type. Sequences are *collections* – objects which “hold” more than one object. Sequences must also support certain operations, like indexing, **slicing** (e.g. `text[3:5]`), and so on.
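
As a quick sketch of those sequence operations on a string (the string and variable names here are made up for illustration):

```python
text = "hello world"

first = text[0]       # indexing: a single item, chosen by position
piece = text[3:5]     # slicing: positions 3 and 4 (the end index is excluded)
length = len(text)    # the number of items in the sequence

print(first)   # h
print(piece)   # lo
print(length)  # 11
```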

But what if we don't just want to store characters? It's finally time to introduce a Python feature which may feel long overdue: the list.

### Lists
A **list** is an ordered collection of arbitrary objects. *Ordered* means that the items in the list are in some specific *order* (it does not mean *sorted*!). In other words, the *first* item of a list is always the same item and it is *before* the *second* item in the list. This means we can also index the list, and hence it is a *sequence* type data structure. 

Lists are written with square brackets. We can write a list containing the numbers from `1` to `5` like this:

In [2]:
[1, 2, 3, 4, 5]

[1, 2, 3, 4, 5]

Or a list containing just the number `31`:

In [3]:
[31]

[31]

Or a list containing the string `"Python"`:

In [4]:
["Python"]

['Python']

Or a list containing nothing:

In [5]:
[] 

[]

Or a list containing the number 31, the string `"Python"`, and a list containing the numbers `1` to `5` (yes, we can have lists inside lists!):

In [6]:
[31, "Python", [1, 2, 3, 4, 5]]

[31, 'Python', [1, 2, 3, 4, 5]]

Lists support the exact same sequence operations that you have already learned from strings, including finding the length with `len` and *indexing* the list using square brackets `[i]`.

In [7]:
my_list = [31, "Python", [1, 2, 3, 4, 5]]
print(f"The length of my list is {len(my_list)} and the first element is {my_list[0]}")

The length of my list is 3 and the first element is 31


***Task***: modify the code above to produce a *sub-list* in the same way we were able to produce a *substring*. In other words, take a *slice* of the list.

Unlike strings, lists are **mutable**, meaning that we can change them after they have been created. Specifically we can change their contents. 

Mutability is a really important concept, so let's make this clear. We can *always* reassign a variable:

In [8]:
text = "hello"
text = "goodbye"
print(text)

goodbye


The value of the variable named `text` changed from `"hello"` to `"goodbye"`. But we didn't *change the string* `"hello"`, we just replaced it. In the same way that if we wrote:

In [9]:
num = 10
num = 20
print(num)

20


We are not changing the value `10`, we're just replacing it with a different value `20`. `10` still exists and works just as we'd expect!

With strings, we in fact *cannot* change the values of the characters within the string itself. If we try to assign something into the first position of the string, we will get an error:

In [10]:
text = "hello"
text[0] = "g"

TypeError: 'str' object does not support item assignment

However, with a list, the item assignment operation will work:

In [None]:
my_list = ["h", "e", "l", "l", "o"]
my_list[0] = "g"
print(my_list)

We have modified the contents of the list. The *contents* of the variable have changed, rather than the entire variable being *replaced*.
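
One way to see the difference for yourself is the built-in `id` function, which returns a number identifying an object for as long as that object exists. This is only a sketch to illustrate the idea, and the variable names are invented for the example:

```python
my_list = ["h", "e", "l", "l", "o"]
id_before = id(my_list)

# item assignment changes the contents, but it is still the same list object
my_list[0] = "g"
same_after_item_assignment = id(my_list) == id_before

# reassignment makes the name refer to a brand new list object
my_list = ["j", "e", "l", "l", "o"]
same_after_reassignment = id(my_list) == id_before

print(same_after_item_assignment)  # True
print(same_after_reassignment)     # False
```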

As with a string, we cannot index a position beyond the end of a list, even if we are trying to add an item to it:

In [None]:
my_list = [1, 2, 3, 4, 5]
my_list[5] = 6

But we *can* add items to lists using the *method* called `append`:

In [None]:
my_list = [1, 2, 3, 4, 5]
my_list.append(6)
my_list

### List Methods
If you'll recall, in week 1 we showed you a few useful *methods* for strings. A *method* is a subroutine that applies to a specific object, and is called with a `.` between the name of the object and subroutine. Like strings, lists also have many useful methods, including `append`. 

As with strings, we can search for the position of an item within a list:

In [None]:
my_list = [1, 2, 3, 4, 5]
my_list.index(3)

`my_list.index(obj)` returns the index of the object `obj` in the list `my_list`. If the object is not found then this causes an error. 

This is in contrast to `text.find(ss)` which would return `-1` if the substring `ss` was not found in the string `text`. Interestingly you can also use `.index` with strings if you *want* the method to cause an error when the substring cannot be found. Sometimes this is useful – one of the common principles when developing good Python code is that it is better to “fail loudly”. Returning `-1` might go unnoticed until much later, which might then become a bug that someone has to hunt down. If it had caused an error instead the code would crash, but you would immediately know where the problem was.
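
To see how a silent `-1` can go unnoticed, here is a small sketch (the string and variable names are just for illustration). Python accepts negative indices, which count from the end of a sequence, so the result of a failed `find` can be used as an index without any complaint:

```python
text = "hello"

position = text.find("z")   # "z" is not in the string, so this is -1
print(position)             # -1

# -1 is a valid index meaning "the last item", so no error is raised here
# and the mistake slips through silently
print(text[position])       # o
```

Nothing crashes, but the code is now working with the wrong character.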

As we mentioned last week when we talked about errors, you should ***not*** be using `try` and `except` if you can prevent the error from happening in the first place. Here you can do this by checking whether the item is in the list *before* you try to get the index. Python has a really nice special keyword `in` to do this, which works like this:

In [None]:
my_list = [1, 2, 3, 4, 5]
3 in my_list

Think for a second about this new keyword. It actually looks a lot like some of the *operators* we saw earlier. The syntax is:
```python
obj in collection
```

It has two inputs: the object on the left can be any object and the object on the right must be a collection. The return type is a Boolean: `True` or `False`. As with any Boolean expression, it is common to see this syntax used as the subject of an if statement:

In [None]:
my_list = [1, 2, 3, 4, 5]
if 3 in my_list:
    print(f"3 is at index {my_list.index(3)}")
else:
    print("3 is not in the list :(")

Here is the elegance of Python in action. The English sentence "if 3 is in my_list" translates almost perfectly into code: `if 3 in my_list`. But as we discussed when we first introduced operators like `<`, you know what is really happening. This is really a *Boolean expression* which is being evaluated, and the result is being fed into the if statement. So `if 3 in list_one or in list_two` is not going to work (what should it be?).

We are seeing lots of parallels between strings and lists, so you may not be surprised to hear that we can use this keyword `in` for testing substrings as well:

In [None]:
"gg" in "eggs"

Lists support many of the string operations you've seen before, plus a whole host of other methods. As with strings I recommend searching for them or reading the documentation [online](https://docs.python.org/3/tutorial/datastructures.html) as and when you need them. Rather than memorising a handful of methods, it is better to get into the habit of reading the documentation. That said, here are a few more code examples you can read and play around with to learn a few more list methods:

In [None]:
# like strings, we can concatenate lists
[1, 2, 3] + ["another", "list"]

In [None]:
# and repeat list contents
[1, 2] * 3

In [None]:
# [] is an empty list
my_list = []
my_list.append(1)
my_list.append(2)
my_list.append(3)
my_list

In [None]:
# if we use .append to append a list to a list it will add it *as a single item*, not concatenate them
my_list = [1, 2, 3]
my_list.append([4, 5, 6])
my_list

In [None]:
# but .extend might be what we were really looking for
my_list = [1, 2, 3]
my_list.extend([4, 5, 6])
my_list

In [None]:
my_list = [1, 2, 3]

# insert(i, o) will insert o at position i
my_list.insert(0, 4)
my_list

In [None]:
my_list = [1, 2, 3]

# pop removes the last element of the list and returns it
x = my_list.pop()
print(x)
print(my_list)

In [None]:
my_list = [1, 0, 0, 1, 1, 0, 1, 0]

# count returns the number of occurrences of a particular object
my_list.count(0)

There are also some useful inbuilt functions which work on any collection, including lists:

In [None]:
my_list = [1, 2, 3]
print("my_list = [1, 2, 3]")

# you have already seen len
print(f"len(my_list): {len(my_list)}")

# sum will sum (add) the elements of a list, if they are numeric
print(f"sum(my_list): {sum(my_list)}")

#### Function or Procedure?
Do you remember the difference between a function and a procedure? If not, go back and reread section 2.1 from week 1! 

We specifically pointed out the fact that string methods did *not* work like procedures – they return a new string, they do not modify the existing string:

In [None]:
text = "hello"
text.replace('e', 'u')
print(text)

But similar looking methods on lists ***do*** modify the object, they *are* procedures:

In [None]:
my_list = [1, 2, 3, 4, 5]
my_list.reverse()
print(my_list)

And this can lead to some really confusing mistakes, because these procedures specifically *do not* return values:

In [None]:
my_list = [1, 2, 3, 4, 5]
new_list = my_list.reverse()
print(new_list)

But other methods of a list object ***are*** functions so they *do* return values:

In [None]:
my_list = [1, 2, 3, 4, 5]
new_list = my_list.copy()
print(new_list)

Unfortunately this is something you simply have to get used to. Remember you can always consult the documentation for any function, either online or using built-in tools, which should explain how it works. Alternatively, if you are ever at all unsure, just try it out in a Jupyter cell or Python shell instance!

In [None]:
help(my_list.copy)
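
Sorting is one more place where both flavours exist side by side. Treat this as an extra sketch, since sorting is not covered above: the list method `.sort()` is a procedure that sorts the list in place and returns `None`, while the built-in function `sorted` leaves the original list alone and returns a new sorted list.

```python
my_list = [4, 1, 3, 2, 5]

new_list = sorted(my_list)   # function: returns a new list
print(new_list)              # [1, 2, 3, 4, 5]
print(my_list)               # [4, 1, 3, 2, 5] (unchanged)

result = my_list.sort()      # procedure: sorts in place, returns nothing
print(result)                # None
print(my_list)               # [1, 2, 3, 4, 5]
```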

### Lists in Loops
Lists are another type of object that can be the target of a *for-each loop*. So we can do things like this:

In [None]:
my_list = [4, 1, 3, 2, 5]
lowest = my_list[0]
for num in my_list:
    lowest = min(lowest, num)

lowest

This is actually a really common pattern, to find the biggest or smallest item in a collection with a for loop. We can also pass the list directly into the `min` function to achieve the same goal:


In [None]:
min(my_list)

But we are not always dealing with plain lists of numbers! The pattern in the previous cell (set up a `lowest` variable, go through each element in a for loop, for each new element keep track of whichever is smaller, the previously saved one or the new one) will pop up in all sorts of applications.
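
For instance, here is a sketch of the same pattern applied to a list of strings, keeping track of the shortest word instead of the smallest number. The list `words` is made up for this example:

```python
words = ["banana", "kiwi", "apple", "fig", "cherry"]

shortest = words[0]
for word in words:
    # keep whichever is shorter: the previously saved word or the new one
    if len(word) < len(shortest):
        shortest = word

print(shortest)  # fig
```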

Lists are useful because they store any number of items. We often want to do the same or a similar thing to multiple items within a list, and so the for-each loop is perfect for this.

Of course, you can use a regular for loop (using `range(len(my_list))`) if you want to have an index variable instead. Then you can find each element from the list easily using your index variable. 
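
As a rough sketch of that index-based style, here is one way to find *where* the lowest value sits, not just what it is. The name `lowest_index` is invented for this example:

```python
my_list = [4, 1, 3, 2, 5]

lowest_index = 0
for i in range(len(my_list)):
    # i is the index, my_list[i] is the element stored at that index
    if my_list[i] < my_list[lowest_index]:
        lowest_index = i

print(f"The lowest value is {my_list[lowest_index]}, at index {lowest_index}")
```

This prints `The lowest value is 1, at index 1`.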

If you want to have both you can use a builtin function called `enumerate`. This is great if you want to apply some kind of operation to every element in a list:

In [None]:
def square_each_element(in_list):
    for i, item in enumerate(in_list):
        in_list[i] = item ** 2
        print(i)

my_list = [2, -2, 5, 10]
square_each_element(my_list)
my_list

#### ⚠️ Careful ⚠️
Notice `square_each_element` is a *procedure*! It has no return statement, it modifies the object that was passed in. We previously wrote functions that took integers and strings as parameters, but since these are *immutable* we could not modify them. Lists are *mutable*, so if you modify an input list in the body of a subroutine, you will modify the original list. This can be intended (as in `square_each_element`), but it can also be accidental. Just including a return statement *does not* prevent this behaviour.

#### 🚨 Extra Careful! 🚨
Multiple variable *names* can refer to the same *object*. We've seen this above: `in_list` inside the subroutine referred to the same list as `my_list` outside the subroutine. But this can happen even within the same *scope*. This can really mess with your head, so it's okay if this seems a bit confusing, but it's important and learning this now can prevent some headaches later. Read the following example carefully, and guess what it outputs before running it:

In [None]:
def list_of_inverts(in_list):
    out_list = in_list
    for i, item in enumerate(out_list):
        out_list[i] = -1 * item
    return out_list

old_list = [1, 2, 3, 4, 5]
new_list = list_of_inverts(old_list)

print(f"The value of new_list is: {new_list}")
print(f"The value of old_list is: {old_list}")

Remember to try to guess the output before running the cell.
<br /> <br /> <br /><br /> <br /> <br />
The value of `new_list` will not be a surprise, but the value of `old_list` might be. This line:
```python
out_list = in_list
```
creates a new named variable called `out_list` which refers to ***the exact same list*** as `in_list`. Not just the same values, *literally the same data in memory*. If you modify one, it modifies the other.

This is why Python lists have the `.copy()` method that I used earlier. Let's fix that code. Notice the small change:

In [None]:
def list_of_inverts(in_list):
    out_list = in_list.copy()
    for i, item in enumerate(out_list):
        out_list[i] = -1 * item
    return out_list

old_list = [1, 2, 3, 4, 5]
new_list = list_of_inverts(old_list)

print(f"The value of new_list is: {new_list}")
print(f"The value of old_list is: {old_list}")
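
If you ever want to check whether two names refer to the very same object, Python's `is` operator does exactly that, whereas `==` only compares values. A small sketch, with names invented for the example:

```python
a = [1, 2, 3]
b = a           # a second name for the same list
c = a.copy()    # a separate list holding equal values

print(a is b)   # True
print(a == c)   # True
print(a is c)   # False

b.append(4)     # modifying through one name...
print(a)        # [1, 2, 3, 4]  (...is visible through the other)
print(c)        # [1, 2, 3]     (the copy is unaffected)
```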

### Questions
#### Interactive Quiz 
Run the cell below to do the quiz about lists.

In [None]:
%run ../scripts/interactive_questions ./questions/3.1.1q.txt

#### Question 1: All Collatz Steps
Remember back in section 2.4 we introduced a sequence which was part of an unsolved mathematical problem called the Collatz conjecture? The sequence was formed by repeatedly applying the following function:
$$f(n) = \begin{cases} 
      3n+1 & n \text{ is odd} \\
      \frac{n}{2} & n \text{ is even} 
   \end{cases}$$
   
Interestingly, the sequence always seems to eventually produce a $1$. 

Previously we wrote functions which could count the number of applications (steps) required to reach $1$ from any number. What if we wanted a function which would calculate every value in the sequence between an input $n$ and $1$?

We could write something like this which uses a `print` statement:
```python
def collatz_sequence(n):
    while n != 1:
        print(n)
        if n % 2 == 0:
            n = n // 2
        else:
            n = 3*n + 1
    print(n)
```

But if you were paying attention last week, you will know this is no good! The function does not return anything, it just prints the values. What if we want to use this information somewhere else? Maybe we want to find the Collatz sequence for all the numbers from 1 to 50 and display the longest.

So, instead, we should write a function that *returns a list* containing the sequence from `n` to `1` (inclusive).

In [11]:
%run ../scripts/show_examples.py ./questions/3.1/collatz_sequence

Example tests for function collatz_sequence

Test 1/5: collatz_sequence(2) -> [2, 1]
Test 2/5: collatz_sequence(3) -> [3, 10, 5, 16, 8, 4, 2, 1]
Test 3/5: collatz_sequence(4) -> [4, 2, 1]
Test 4/5: collatz_sequence(5) -> [5, 16, 8, 4, 2, 1]
Test 5/5: collatz_sequence(6) -> [6, 3, 10, 5, 16, 8, 4, 2, 1]


In [17]:
def collatz_sequence(n):
    steps = 1
    result = []
    while(n != 1):
        steps += 1
        print(n)
        if n % 2 == 0:
            n = n // 2
        else:
            n = 3*n + 1
        result.append(n)
    result = [steps] + result
    return result
            
%run -i ../scripts/function_tester.py ./questions/3.1/collatz_sequence

Running tests on function collatz_sequence

2
Test 1/9: 
	inputs: 2
	expected: [2, 1]
	actual: [2, 1]
	result: PASS
3
10
5
16
8
4
2
Test 2/9: 
	inputs: 3
	expected: [3, 10, 5, 16, 8, 4, 2, 1]
	actual: [8, 10, 5, 16, 8, 4, 2, 1]
	result: FAIL

Try editing your code and re-running the cell.


#### Question 2: Filter Less Than
Given a list of integers `my_list` and another integer `limit`, return a new list which only contains the elements from `my_list` that are less than `limit`.

In [None]:
%run ../scripts/show_examples.py ./questions/3.1/filter_less_than

In [None]:
def filter_less_than(my_list, limit):
    pass

%run -i ../scripts/function_tester.py ./questions/3.1/filter_less_than

## What Next?
When you are done with this notebook, go back to Engage and move onto the next section.