# Chapter 4 – Data Structures
## 4.1 Sequences and Lists
### Data From Data
We have often mentioned the importance of *types* in programming. Python is more flexible than many languages about typing, but it's still really important that you can always work out what the expected types are in any given situation. Sometimes it's the only way to make the code work: `"Number " + 1` will give an error, but `"Number " + str(1)` produces the desired result. A line of code might do many different things depending on the exact types, so keeping track of what type you expect a variable to be can make debugging much easier.

Many of the types we've used so far are what some languages might call *primitives*. While this is not a word used in the official Python documentation in this context, it is commonly used in other guides and tutorials. A *primitive* is an *atomic* data type – it is its own thing, you cannot break it down into smaller parts. In Python this includes integers, floats, strings, and Booleans.

In another programming language called Java, *characters* are a *primitive* data type, and strings are made up of multiple characters. This makes the string a *composite* data type in Java.

A **data structure** is a composite data type where data is organised in a way that provides certain benefits – often speed or space efficiency. For some problems, a suitable choice of data structure can make the solution more efficient and easier to write.

In Python, strings *can* be broken down into characters, but each character is just a single-character string. The *object* you get as a result of *indexing* the string is itself another string:

In [2]:
text = "hello"
character = text[0]
type(text) == type(character)

True

Strings in Python are an example of a *sequence* data type. Sequences are collections which support certain operations like indexing, slicing (e.g. `text[3:5]`), and so on.
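
As a quick sketch of those two operations on a string (the variable names `first`, `middle` and `tail` are just made up for this illustration):

```python
text = "hello"

# Indexing picks out a single character (counting starts at 0)
first = text[0]        # "h"

# Slicing picks out a range: the start index is included, the end index is not
middle = text[1:4]     # "ell"
tail = text[3:5]       # "lo"

# The result of a slice is still a string
print(first, middle, tail, type(tail) == type(text))
```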

Another sequence type you will use a lot is the list.

### Lists
A **list** is an ordered collection of objects. It is a sequence type data structure. Lists are written with square brackets, so we can write a list containing the numbers from `1` to `5`:

In [4]:
[1, 2, 3, 4, 5]

[1, 2, 3, 4, 5]

Or a list containing just the number `31`:

In [29]:
[31]

[31]

Or a list containing the string `"Python"`:

In [8]:
["Python"]

['Python']

Or a list containing nothing:

In [28]:
[] 

[]

Or a list containing the number 31, the string `"Python"`, and a list containing the numbers `1` to `5` (we can have lists inside lists):

In [10]:
[31, "Python", [1, 2, 3, 4, 5]]

[31, 'Python', [1, 2, 3, 4, 5]]

Lists support the exact same sequence operations that you have already learned from strings:

In [18]:
my_list = [31, "Python", [1, 2, 3, 4, 5]]
print(f"The length of my list is {len(my_list)} and the first element is {my_list[0]}")

The length of my list is 3 and the first element is 31
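
Here is a short sketch of a few more sequence operations on the same list. It assumes that negative indices and slices behave for lists as they do for strings; the names `last`, `first_two` and `inner` are only for illustration:

```python
my_list = [31, "Python", [1, 2, 3, 4, 5]]

last = my_list[-1]          # negative indices count from the end
first_two = my_list[0:2]    # a slice of a list is itself a list
inner = my_list[2][0]       # index twice to reach inside a nested list

print(last)       # [1, 2, 3, 4, 5]
print(first_two)  # [31, 'Python']
print(inner)      # 1
```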


Unlike strings, lists are **mutable**, meaning that we can change them after they have been created. Specifically we can change their contents. We can always reassign a variable:

In [12]:
text = "hello"
text = "goodbye"
print(text)

goodbye


But we could not change the values of the string itself:

In [14]:
text = "hello"
text[0] = "g"

TypeError: 'str' object does not support item assignment

However, with a list, this item assignment operation will work:

In [16]:
my_list = [1, 2, 3, 4, 5]
my_list[0] = 5
my_list[1] = 4
print(my_list)

[5, 4, 3, 4, 5]


But, as with a string, we still cannot index a position beyond the end of a list, even if we are trying to add an item to it:

In [35]:
my_list = [1, 2, 3, 4, 5]
my_list[5] = 6

IndexError: list assignment index out of range

But we *can* add items to lists using the *method* called `append`:

In [36]:
my_list = [1, 2, 3, 4, 5]
my_list.append(6)
my_list

[1, 2, 3, 4, 5, 6]

### List Methods
Lists, like strings, have many useful methods, which you'll remember are subroutines that are called with a `.` between the name of the object and subroutine, like `append`. As with strings, we can search for the position of an item within a list:

In [30]:
my_list = [1, 2, 3, 4, 5]
my_list.index(3)

2

`my_list.index(obj)` returns the index of the first occurrence of the object `obj` in the list `my_list`. If the object is not found then this raises an error (a `ValueError`). This is in contrast to `text.find(ss)`, which returns `-1` if the substring `ss` is not found in the string `text`. You can use `.index` with strings but you cannot use `.find` with lists.
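
To see the difference side by side, here is a minimal sketch. It uses a `try`/`except` block to catch the error, which is an assumption on my part about what you have seen so far; the names `missing_in_text` and `position` are made up for this example:

```python
text = "hello"
my_list = [1, 2, 3, 4, 5]

# find reports a missing substring by returning -1
missing_in_text = text.find("z")

# index raises a ValueError for a missing item, which we catch here
try:
    position = my_list.index(9)
except ValueError:
    position = None

print(missing_in_text)  # -1
print(position)         # None
```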

To avoid this error, you can first check whether the item is in the list. We can use the Python keyword `in` to do this.

In [38]:
my_list = [1, 2, 3, 4, 5]
3 in my_list

True

As with any Boolean expression, it is common to see this as the condition of an if statement:

In [39]:
my_list = [1, 2, 3, 4, 5]
if 3 in my_list:
    print(f"3 is at index {my_list.index(3)}")
else:
    print("3 is not in the list :(")

3 is at index 2


We can use this keyword `in` for testing substrings as well:

In [40]:
"gg" in "eggs"

True

Lists support many of the string operations you've seen before, plus a whole host of other methods. As with strings, I recommend searching for them or reading the documentation [online](https://docs.python.org/3/tutorial/datastructures.html) as and when you need them. Rather than memorising a handful of methods, it is better to get into the habit of reading the documentation. That said, here are a few more code examples you can read and play around with to learn a few more list methods:

In [80]:
# like strings, we can concatenate lists
[1, 2, 3] + ["another", "list"]

[1, 2, 3, 'another', 'list']

In [81]:
# and repeat list contents
[1, 2] * 3

[1, 2, 1, 2, 1, 2]

In [60]:
# [] is an empty list
my_list = []
my_list.append(1)
my_list.append(2)
my_list.append(3)
my_list

[1, 2, 3]

In [66]:
# if we append a list to a list it will add it *as a single item*, not concatenate them
my_list = [1, 2, 3]
my_list.append([4, 5, 6])
my_list

[1, 2, 3, [4, 5, 6]]
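
If you do want the items added one by one, lists also have a method called `extend`. It is not covered elsewhere in this section, so treat this as a small sketch for comparison (the names `appended` and `extended` are just for illustration):

```python
appended = [1, 2, 3]
appended.append([4, 5, 6])   # the whole list becomes one new item

extended = [1, 2, 3]
extended.extend([4, 5, 6])   # each element is added individually

print(appended)   # [1, 2, 3, [4, 5, 6]]
print(extended)   # [1, 2, 3, 4, 5, 6]
print(len(appended), len(extended))  # 4 6
```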

In [57]:
my_list = [1, 2, 3]

# insert(i, o) will insert o at position i
my_list.insert(0, 4)
my_list

[4, 1, 2, 3]

In [63]:
my_list = [1, 2, 3]

# pop removes the last element of the list and returns it
x = my_list.pop()
print(x)
print(my_list)

3
[1, 2]
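
`pop` can also take an index, in which case it removes and returns the item at that position instead of the last one. A small sketch, with made-up values:

```python
my_list = [10, 20, 30, 40]

first = my_list.pop(0)   # remove and return the item at index 0
last = my_list.pop()     # with no argument, remove and return the last item

print(first, last)  # 10 40
print(my_list)      # [20, 30]
```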


In [82]:
my_list = [1, 2, 3]

# there are some general functions that work on any collections
# sum will sum (add) the elements of a list
s = sum(my_list)
s

6

In [65]:
my_list = [1, 0, 0, 1, 1, 0, 1, 0]

# count returns the number of occurrences of a particular object
my_list.count(0)

4

#### Function or Procedure?
Do you remember the difference between a function and a procedure? (If not, go back to [Section 2.1](../Chapter%202/2.1.ipynb)!)

We specifically pointed out the fact that string methods did *not* work like procedures – they return a new string, they do not modify the existing string:

In [21]:
text = "hello"
text.replace('e', 'u')
print(text)

hello


But similar looking methods on lists ***do*** modify the object, they *are* procedures:

In [23]:
my_list = [1, 2, 3, 4, 5]
my_list.reverse()
print(my_list)

[5, 4, 3, 2, 1]


And this can lead to some really confusing mistakes, because these procedures specifically *do not* return values:

In [27]:
my_list = [1, 2, 3, 4, 5]
new_list = my_list.reverse()
print(new_list)

None
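
If what you wanted was a new reversed list with the original left alone, one possible approach is a slice with a step of `-1`. This assumes you are happy using the step part of slice notation, which may not have come up yet:

```python
my_list = [1, 2, 3, 4, 5]

# Slicing with a step of -1 builds a new, reversed list
new_list = my_list[::-1]

print(new_list)  # [5, 4, 3, 2, 1]
print(my_list)   # [1, 2, 3, 4, 5] (unchanged)
```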


But other methods of a list object ***are*** functions so they *do* return values:

In [26]:
my_list = [1, 2, 3, 4, 5]
new_list = my_list.copy()
print(new_list)

[1, 2, 3, 4, 5]


Unfortunately this is something you simply have to get used to. Remember you can always consult the documentation for any function, either online or using built-in tools, which should explain how it works. Alternatively, if you are ever at all unsure, just try it out in a Jupyter cell or Python interpreter instance!

In [32]:
help(my_list.copy)

Help on built-in function copy:

copy() method of builtins.list instance
    Return a shallow copy of the list.



### Lists in Loops
Lists are another type of object that can be the target of a *for-each loop*. So we can do things like this:

In [34]:
my_list = [4, 1, 3, 2, 5]
lowest = my_list[0]
for num in my_list:
    lowest = min(lowest, num)

lowest

1

This is actually a really common pattern. Lists are useful because they store any number of items. We often want to do the same or a similar thing to multiple items within a list, and so the for-each loop is perfect for this.

Of course you can use a regular for loop (using `range(len(my_list))`) if you want to have an index variable instead. If you want to have both, you can use a built-in function called `enumerate`. This is great if you want to apply some kind of operation to every element in a list:

In [42]:
def square_each_element(in_list):
    for i, item in enumerate(in_list):
        in_list[i] = item ** 2

my_list = [2, -2, 5, 10]
square_each_element(my_list)
my_list

[4, 4, 25, 100]
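
For comparison, here is a sketch of the same procedure written with `range(len(...))` instead of `enumerate`. The name `square_each_element_by_index` is made up for this example:

```python
def square_each_element_by_index(in_list):
    # range(len(in_list)) yields the indices 0, 1, ..., len(in_list) - 1
    for i in range(len(in_list)):
        in_list[i] = in_list[i] ** 2

my_list = [2, -2, 5, 10]
square_each_element_by_index(my_list)
print(my_list)  # [4, 4, 25, 100]
```

Both versions do the same job; `enumerate` just saves you from writing `in_list[i]` to read the current item.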

#### ⚠️ Careful ⚠️
Notice `square_each_element` is a *procedure*! It has no return statement, it modifies the object that was passed in. We previously wrote functions that took integers and strings as parameters, but since these are *immutable* we could not modify them. Lists are *mutable*, so if you modify an input list in the body of a subroutine, you will modify the original list. This can be intended (as in `square_each_element`), but it can also be accidental. Including a return statement does not prevent this behaviour.

#### 🚨 Extra Careful! 🚨
Multiple variable *names* can refer to the same *object*. We've seen this above: `in_list` inside the subroutine referred to the same list as `my_list` outside the subroutine. But this can happen even within the same *scope*. This can really mess with your head, so it's okay if this seems a bit confusing, but it's important and learning this now can prevent some headaches later. Read the following example carefully, and guess what it outputs before running it:

In [None]:
def list_of_inverts(in_list):
    out_list = in_list
    for i, item in enumerate(out_list):
        out_list[i] = -1 * item
    return out_list

old_list = [1, 2, 3, 4, 5]
new_list = list_of_inverts(old_list)

print(f"The value of new_list is: {new_list}")
print(f"The value of old_list is: {old_list}")

<br /> <br />
The value of `new_list` will not be a surprise, but the value of `old_list` might be. This line:
```python
out_list = in_list
```
creates a new named variable called `out_list` which refers to ***the exact same list*** as `in_list`. Not just the same values, *literally the same data in memory*. If you modify one, it modifies the other.
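
You can check this for yourself with the Python keyword `is`, which tests whether two names refer to the same object. A minimal sketch, with variable names chosen only for illustration:

```python
first = [1, 2, 3]
second = first           # a second name for the same list, not a copy
third = [1, 2, 3]        # a separate list that happens to hold equal values

second[0] = 99           # changing the list through one name...

print(first)             # [99, 2, 3]  ...is visible through the other
print(third)             # [1, 2, 3]
print(second is first)   # True: the same object
print(third is first)    # False: a different object
```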

This is why Python lists have the `.copy()` method that I used earlier. Let's fix that code. Notice the small change:

In [46]:
def list_of_inverts(in_list):
    out_list = in_list.copy()
    for i, item in enumerate(out_list):
        out_list[i] = -1 * item
    return out_list

old_list = [1, 2, 3, 4, 5]
new_list = list_of_inverts(old_list)

print(f"The value of new_list is: {new_list}")
print(f"The value of old_list is: {old_list}")

The value of new_list is: [-1, -2, -3, -4, -5]
The value of old_list is: [1, 2, 3, 4, 5]
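
One caveat: the help text shown earlier describes `.copy()` as a *shallow* copy. As far as I can sketch it here, that means the outer list is new, but any lists *inside* it are still shared. The names `nested` and `shallow` are just for illustration:

```python
nested = [[1, 2], [3, 4]]
shallow = nested.copy()

shallow.append([5, 6])   # affects only the new outer list
shallow[0][0] = 99       # the inner list is shared, so both names see this

print(nested)   # [[99, 2], [3, 4]]
print(shallow)  # [[99, 2], [3, 4], [5, 6]]
```

For a flat list of numbers or strings, like the one in `list_of_inverts`, this makes no difference.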


### Questions
#### Interactive Quiz 
Run the cell below to do the quiz about lists.

In [None]:
%run ../scripts/interactive_questions ./questions/4.1.1q.txt

#### Question 1: All Collatz Steps
Remember back in [Section 2.4](../Chapter%202/2.4.ipynb) we introduced a sequence which was part of an unsolved mathematical problem called the Collatz conjecture? The sequence was formed by repeatedly applying the following function:
$$f(n) = \begin{cases} 
      3n+1 & n \text{ is odd} \\
      \frac{n}{2} & n \text{ is even} 
   \end{cases}$$
   
Interestingly, the sequence always seems to eventually produce a $1$. 

Previously we wrote functions which could count the number of applications (steps) required to reach $1$ from any number. What if we wanted a function which would calculate every value in the sequence between an input $n$ and $1$?

We could write something like this which uses a `print` statement:
```python
def collatz_sequence(n):
    while n != 1:
        print(n)
        if n % 2 == 0:
            n = n // 2
        else:
            n = 3*n + 1
    print(n)
```

But if you were paying attention [last chapter](../Chapter%203/3.3.ipynb), you will know this is no good! The function does not return anything; it only prints each value. What if we want to use this information somewhere else? Maybe we want to find the Collatz sequences for all the numbers from 1 to 50 and display the longest.

So, instead, we should write a function that *returns a list* containing the sequence from `n` to `1` (inclusive).

In [76]:
%run ../scripts/show_examples.py ./questions/4.1/collatz_sequence

Example tests for function collatz_sequence

Test 1/5: collatz_sequence(2) -> [2, 1]
Test 2/5: collatz_sequence(3) -> [3, 10, 5, 16, 8, 4, 2, 1]
Test 3/5: collatz_sequence(4) -> [4, 2, 1]
Test 4/5: collatz_sequence(5) -> [5, 16, 8, 4, 2, 1]
Test 5/5: collatz_sequence(6) -> [6, 3, 10, 5, 16, 8, 4, 2, 1]


In [None]:
def collatz_sequence(n):
    pass
            
%run -i ../scripts/function_tester.py ./questions/4.1/collatz_sequence

#### Question 2: Filter Less Than
Given a list of integers `my_list` and another integer `limit`, return a new list which only contains the elements from `my_list` that are less than `limit`.

In [4]:
%run ../scripts/show_examples.py ./questions/4.1/filter_less_than

Example tests for function filter_less_than

Test 1/5: filter_less_than([1, 2, 3, 4, 5], 4) -> [1, 2, 3]
Test 2/5: filter_less_than([1, 2, 3, 4, 5], 2) -> [1]
Test 3/5: filter_less_than([1, 2, 3, 4, 5], 1) -> []
Test 4/5: filter_less_than([], 5) -> []
Test 5/5: filter_less_than([-100, 0, 1000, 50], 100) -> [-100, 0, 50]


In [None]:
def filter_less_than(my_list, limit):
    pass

%run -i ../scripts/function_tester.py ./questions/4.1/filter_less_than

Once you are done you can move on to the [next section](4.2.ipynb).