<a href="https://colab.research.google.com/github/dental-informatics-org/dental.informatics.org/blob/main/Python_Course/lists.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Lists
A **list** is an ordered collection of values. The values that make up a list are called its **elements**, or its **items**. We will use the term element or item to mean the same thing. Lists are similar to strings, which are ordered collections of characters, except that the elements of a list can be of any type. Lists and strings — and other collections that maintain the order of their items — are called **sequences.**


# 1.1 List values

There are several ways to create a new list; the simplest is to enclose the elements in square brackets ([ and ]):

In [1]:
ps = [10, 20, 30, 40]
qs = ["spam", "bungee", "swallow"]


The first example is a list of four integers. The second is a list of three strings. The elements of a list don’t have to be the same type. The following list contains a string, a float, an integer, and (amazingly) another list:

In [3]:
zs = ["hello", 2.0, 5, [10, 20]]


A list within another list is said to be nested.

Finally, a list with no elements is called an empty list, and is denoted [].

We have already seen that we can assign list values to variables or pass lists as parameters to functions:

In [2]:
vocabulary = ["apple", "cheese", "dog"]
numbers = [17, 123]
an_empty_list = []
print(vocabulary, numbers, an_empty_list)

['apple', 'cheese', 'dog'] [17, 123] []


# 1.2 Accessing elements

The syntax for accessing the elements of a list is the same as the syntax for accessing the characters of a string — the index operator: [] (not to be confused with an empty list). The expression inside the brackets specifies the index. Remember that the indices start at 0:

In [4]:
numbers[0]

17

Any expression evaluating to an integer can be used as an index:

In [5]:
numbers[9-8]

123

In [6]:
numbers[1.0]

TypeError: list indices must be integers or slices, not float

If you try to access or assign to an element that does not exist, you get a runtime error:

In [7]:
numbers[2]

IndexError: list index out of range

It is common to use a loop variable as a list index.

In [8]:
horsemen = ["war", "famine", "pestilence", "death"]
for i in [0, 1, 2, 3]:
    print(horsemen[i])

war
famine
pestilence
death


# 1.3 List length

The function ***len*** returns the length of a list, which is equal to the number of its elements. If you are going to use an integer index to access the list, it is a good idea to use this value as the upper bound of a loop instead of a constant. That way, if the size of the list changes, you won’t have to go through the program changing all the loops; they will work correctly for any size list:



In [9]:
horsemen = ["war", "famine", "pestilence", "death"]

for i in range(len(horsemen)):
    print(horsemen[i])

war
famine
pestilence
death


The last time the body of the loop is executed, i is len(horsemen) - 1, which is the index of the last element. (But the version without the index looks even better now!)

Although a list can contain another list, the nested list still counts as a single element in its parent list. The length of this list is 4:

In [10]:
len(["car makers", 1, ["Ford", "Toyota", "BMW"], [1, 2, 3]])

4

# 1.4 List membership

***in*** and ***not in*** are Boolean operators that test membership in a sequence. We used them previously with strings, but they also work with lists and other sequences:

In [17]:
horsemen = ["war", "famine", "pestilence", "death"]  # remove the # symbol from a line below to verify it

"pestilence" in horsemen

#"debauchery" in horsemen

#"debauchery" not in horsemen


True

Using this produces a more elegant version of the nested loop program we previously used to count the number of students doing Computer Science:

In [11]:
students = [
    ("John", ["CompSci", "Physics"]),
    ("Vusi", ["Maths", "CompSci", "Stats"]),
    ("Jess", ["CompSci", "Accounting", "Economics", "Management"]),
    ("Sarah", ["InfSys", "Accounting", "Economics", "CommLaw"]),
    ("Zuki", ["Sociology", "Economics", "Law", "Stats", "Music"])]

# Count how many students are taking CompSci
counter = 0
for (name, subjects) in students:
    if "CompSci" in subjects:
        counter += 1

print("The number of students taking CompSci is", counter)

The number of students taking CompSci is 3
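
As a small sketch, the three membership tests on the horsemen list can be run together. The second half uses an illustrative list named nested (introduced here, not part of the example above) to show that ***in*** compares whole elements and does not search inside nested lists:

```python
horsemen = ["war", "famine", "pestilence", "death"]

print("pestilence" in horsemen)        # True
print("debauchery" in horsemen)        # False
print("debauchery" not in horsemen)    # True

# `in` compares against each top-level element only.
nested = ["hello", 2.0, 5, [10, 20]]
print(10 in nested)          # False: 10 is inside the inner list
print([10, 20] in nested)    # True: the inner list itself is an element
```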


# 1.5 List operations

The + operator concatenates lists:

In [12]:
a = [1, 2, 3]
b = [4, 5, 6]
c = a + b
c


[1, 2, 3, 4, 5, 6]

Similarly, the * operator repeats a list a given number of times:

In [13]:
print([0] * 4)



[0, 0, 0, 0]


In [14]:
print([1, 2, 3] * 3)


[1, 2, 3, 1, 2, 3, 1, 2, 3]


The first example repeats [0] four times. The second example repeats the list [1, 2, 3] three times.
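
The following sketch highlights one point that is easy to miss: both operators build a new list and leave their operands unchanged. The name zeros is chosen here for illustration only:

```python
a = [1, 2, 3]
b = [4, 5, 6]

c = a + b          # a new list; a and b are not modified
zeros = [0] * 4    # a common way to pre-fill a list of known length

print(a, b)        # [1, 2, 3] [4, 5, 6]
print(c)           # [1, 2, 3, 4, 5, 6]
print(zeros)       # [0, 0, 0, 0]
```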

# 1.6 List slices

The slice operations we saw previously with strings let us work with sublists:

In [18]:
a_list = ["a", "b", "c", "d", "e", "f"]

In [19]:
a_list[1:3]

['b', 'c']

In [20]:
a_list[:4]

['a', 'b', 'c', 'd']

In [21]:
a_list[3:]

['d', 'e', 'f']

In [22]:
a_list[:]

['a', 'b', 'c', 'd', 'e', 'f']
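
Two further properties of slices, shown as a short sketch: a slice is a new list, so changing it leaves the original alone, and a slice bound past the end of the list is clipped instead of raising an error. The variable name middle is illustrative:

```python
a_list = ["a", "b", "c", "d", "e", "f"]

middle = a_list[1:3]
middle[0] = "X"          # changes the slice only

print(middle)            # ['X', 'c']
print(a_list)            # ['a', 'b', 'c', 'd', 'e', 'f']

# Unlike a single index, an out-of-range slice bound is not an error.
print(a_list[4:100])     # ['e', 'f']
```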

# 1.7 Lists are mutable

Unlike strings, lists are **mutable**, which means we can change their elements. Using the index operator on the left side of an assignment, we can update one of the elements:

In [23]:
fruit = ["banana", "apple", "quince"]

In [26]:
fruit[0] = "pear"
fruit[2] = "orange"
fruit

['pear', 'apple', 'orange']

The bracket operator applied to a list can appear anywhere in an expression. When it appears on the left side of an assignment, it changes one of the elements in the list, so the first element of fruit has been changed from ***"banana"*** to ***"pear"***, and the last from ***"quince"*** to ***"orange"***. An assignment to an element of a list is called **item assignment**. Item assignment does not work for strings:

In [27]:
my_string = "TEST"

In [28]:
my_string[2] = "X"

TypeError: 'str' object does not support item assignment

but it does for lists:

In [29]:
my_list = ["T", "E", "S", "T"]
my_list[2] = "X"
my_list

['T', 'E', 'X', 'T']

With the slice operator we can update a whole sublist at once:

In [30]:
a_list = ["a", "b", "c", "d", "e", "f"]
a_list[1:3] = ["x", "y"]
a_list

['a', 'x', 'y', 'd', 'e', 'f']

We can also remove elements from a list by assigning an empty list to them:



In [31]:
a_list = ["a", "b", "c", "d", "e", "f"]
a_list[1:3] = []
a_list


['a', 'd', 'e', 'f']

And we can add elements to a list by squeezing them into an empty slice at the desired location:

In [32]:
a_list = ["a", "d", "f"]
a_list[1:1] = ["b", "c"]
a_list

['a', 'b', 'c', 'd', 'f']

In [33]:
a_list[4:4] = ["e"]
a_list

['a', 'b', 'c', 'd', 'e', 'f']

# 1.8 List deletion
Using slices to delete list elements can be error-prone. Python provides an alternative that is more readable. The del statement removes an element from a list:

In [34]:
a = ["one", "two", "three"]
del a[1]
a

['one', 'three']

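The del statement also accepts a slice, such as del a_list[1:5], and removes every element that the slice selects. As usual, the sublist selected by a slice contains all the elements up to, but not including, the second index.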
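
A minimal sketch of del applied to a slice, using the six-letter list from the earlier sections:

```python
a_list = ["a", "b", "c", "d", "e", "f"]

del a_list[1:5]      # removes the elements at indices 1, 2, 3 and 4
print(a_list)        # ['a', 'f']
```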

# 1.9 Objects and references

After we execute these assignment statements

In [35]:
a = "banana"
b = "banana"

we know that a and b will refer to a string object with the letters ***"banana"***. But we don’t know yet whether they point to the same string object.

There are two possible ways the Python interpreter could arrange its memory:

![image](https://openbookproject.net/thinkcs/python/english3e/_images/list1.png) 


In one case, ***a*** and ***b*** refer to two different objects that have the same value. In the second case, they refer to the same object.

We can test whether two names refer to the same object using the ***is*** operator:

In [36]:
a is b

True

This tells us that both a and b refer to the same object, and that it is the second of the two state snapshots that accurately describes the relationship.

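Since strings are ***immutable***, Python is free to optimize resources by making two names that refer to the same string value refer to the same object. This sharing is an implementation choice rather than a guarantee, so compare string values with == rather than ***is***.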

This is not the case with lists:

In [37]:
a = [1, 2, 3]
b = [1, 2, 3]



In [38]:
a == b

True

In [39]:
a is b

False

The state snapshot here looks like this:

![image](https://openbookproject.net/thinkcs/python/english3e/_images/mult_references2.png)

***a*** and ***b*** have the same value but do not refer to the same object.
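
The difference between equal values and the same object can be sketched with the built-in id function, which returns an integer identifying an object for as long as that object exists:

```python
a = [1, 2, 3]
b = [1, 2, 3]

print(a == b)            # True: the two lists hold equal values
print(a is b)            # False: they are two separate objects
print(id(a) == id(b))    # False: `is` compares these identities
```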

# 1.10 Aliasing
Since variables refer to objects, if we assign one variable to another, both variables refer to the same object:

In [40]:
a = [1, 2, 3]
b = a
a is b

True

In this case, the state snapshot looks like this:

![image](https://openbookproject.net/thinkcs/python/english3e/_images/mult_references3.png)

Because the same list has two different names, a and b, we say that it is **aliased**. Changes made with one alias affect the other:

In [41]:
b[0] = 5
a

[5, 2, 3]

Although this behavior can be useful, it is sometimes unexpected or undesirable. In general, it is safer to avoid aliasing when you are working with mutable objects (at this point in our textbook that means lists, but we’ll meet more mutable objects as we cover classes and objects, dictionaries and sets). Of course, for immutable objects (e.g. strings, tuples), there’s no problem: it is just not possible to change something and get a surprise when you access an alias name. That’s why Python is free to alias strings (and any other immutable kinds of data) when it sees an opportunity to economize.

# 1.11 Cloning lists
If we want to modify a list and also keep a copy of the original, we need to be able to make a copy of the list itself, not just the reference. This process is sometimes called **cloning**, to avoid the ambiguity of the word copy.

The easiest way to clone a list is to use the slice operator:

In [42]:
a = [1, 2, 3]
b = a[:]
b

[1, 2, 3]

Taking any slice of a creates a new list. In this case the slice happens to consist of the whole list. So now the relationship is like this:

![image](https://openbookproject.net/thinkcs/python/english3e/_images/mult_references2.png)

Now we are free to make changes to b without worrying that we’ll inadvertently be changing a:

In [43]:
b[0] = 5
a

[1, 2, 3]
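
The slice is not the only way to clone a list. The sketch below also uses the list constructor and the copy method, which behave the same way for this purpose. It then shows a caveat that matters once lists are nested: all three make a shallow copy, so inner lists are still shared. The names outer and clone are illustrative:

```python
a = [1, 2, 3]

b = a[:]        # slice copy, as in the text
c = list(a)     # the list constructor builds a new list
d = a.copy()    # so does the copy method

b[0] = 5
print(a, b)     # [1, 2, 3] [5, 2, 3]
print(c is a, d is a)   # False False

# The copy is shallow: the inner lists are shared, not cloned.
outer = [[1, 2], [3, 4]]
clone = outer[:]
clone[0][0] = 99
print(outer)    # [[99, 2], [3, 4]]
```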

# 1.12 Lists and for loops
The ***for*** loop also works with lists, as we’ve already seen. The generalized syntax of a ***for*** loop is:

In [None]:
for VARIABLE in LIST:
    BODY

So, as we’ve seen

In [45]:
friends = ["Joe", "Zoe", "Brad", "Angelina", "Zuki", "Thandi", "Paris"]
for friend in friends:
    print(friend)

Joe
Zoe
Brad
Angelina
Zuki
Thandi
Paris


It almost reads like English: For (every) friend in (the list of) friends, print (the name of the) friend.

Any expression that produces a list, or another sequence such as a ***range***, can be used in a for loop:

In [46]:
for number in range(20):
    if number % 3 == 0:
        print(number)

for fruit in ["banana", "apple", "quince"]:
    print("I like to eat " + fruit + "s!")

0
3
6
9
12
15
18
I like to eat bananas!
I like to eat apples!
I like to eat quinces!


The first example prints all the multiples of 3 between 0 and 19. The second example expresses enthusiasm for various fruits.

Since lists are mutable, we often want to traverse a list, changing each of its elements. The following squares all the numbers in the list xs:

In [47]:
xs = [1, 2, 3, 4, 5]

for i in range(len(xs)):
    xs[i] = xs[i]**2

Take a moment to think about ***range(len(xs))*** until you understand how it works.

In this example we are interested in both the ***value*** of an item (we want to square that value) and its ***index*** (so that we can assign the new value to that position). This pattern is common enough that Python provides a nicer way to implement it:

In [48]:
xs = [1, 2, 3, 4, 5]

for (i, val) in enumerate(xs):
    xs[i] = val**2

***enumerate*** generates pairs of both (index, value) during the list traversal. Try this next example to see more clearly how ***enumerate*** works:

In [49]:
for (i, v) in enumerate(["banana", "apple", "pear", "lemon"]):
    print(i, v)

0 banana
1 apple
2 pear
3 lemon


# 1.13 List parameters

Passing a list as an argument actually passes a reference to the list, not a copy or clone of the list. So parameter passing creates an alias for you: the caller has one variable referencing the list, and the called function has an alias, but there is only one underlying list object. For example, the function below takes a list as an argument and multiplies each element in the list by 2:

In [50]:
def double_stuff(a_list):
    """ Overwrite each element in a_list with double its value. """
    for (idx, val) in enumerate(a_list):
        a_list[idx] = 2 * val

If we add the following onto our script:

In [51]:
things = [2, 5, 9]
double_stuff(things)
print(things)

[4, 10, 18]


In the function above, the parameter a_list and the variable things are aliases for the same object. So before any changes to the elements in the list, the state snapshot looks like this:

![image](https://openbookproject.net/thinkcs/python/english3e/_images/mult_references4.png)

Since the list object is shared by two frames, we drew it between them.

If a function modifies the items of a list parameter, the caller sees the change.
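
A sketch of the distinction that follows from this: changing the items of the list through the parameter is visible to the caller, but assigning a new list to the parameter name only rebinds the local name. Both function names below are hypothetical and exist only for this illustration:

```python
def double_in_place(a_list):
    """Modify the caller's list through the alias."""
    for idx, val in enumerate(a_list):
        a_list[idx] = 2 * val


def rebind_only(a_list):
    """Point the local name at a new list; the caller's list is untouched."""
    a_list = [0, 0, 0]


things = [2, 5, 9]

double_in_place(things)
print(things)    # [4, 10, 18]

rebind_only(things)
print(things)    # [4, 10, 18]
```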

# 1.14 List methods
The dot operator can also be used to access built-in methods of list objects. We’ll start with the most useful method for adding something onto the end of an existing list:

In [52]:
mylist = []
mylist.append(5)
mylist.append(27)
mylist.append(3)
mylist.append(12)
mylist

[5, 27, 3, 12]

***append*** is a list method which adds the argument passed to it to the end of the list. We’ll use it heavily when we’re creating new lists. Continuing with this example, we show several other list methods:

In [72]:
mylist.insert(1, 12)  # Insert 12 at pos 1, shift other items up
mylist


[5, 12, 27, 3, 12]

In [56]:
mylist.count(12)       # How many times is 12 in mylist?

2

In [57]:
mylist.extend([5, 9, 5, 11])   # Put whole list onto end of mylist
mylist

[5, 12, 27, 3, 12, 5, 9, 5, 11]

In [58]:
mylist.index(9)                # Find index of first 9 in mylist

6

In [59]:
mylist.reverse()
mylist

[11, 5, 9, 5, 12, 3, 27, 12, 5]

In [60]:
mylist.sort()
mylist

[3, 5, 5, 5, 9, 11, 12, 12, 27]

In [61]:
mylist.remove(12)             # Remove the first 12 in the list
mylist

[3, 5, 5, 5, 9, 11, 12, 27]

Experiment and play with the list methods shown here, and read their documentation until you feel confident that you understand how they work.
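
While experimenting, two behaviors are worth checking for yourself. This short sketch shows that methods such as sort change the list in place and return None, and that remove raises a ValueError when the value is absent, so a membership test makes a simple guard:

```python
mylist = [5, 27, 3, 12]

result = mylist.sort()     # sorts in place
print(result)              # None
print(mylist)              # [3, 5, 12, 27]

# 99 is not in the list, so remove is skipped instead of raising ValueError.
if 99 in mylist:
    mylist.remove(99)
print(mylist)              # [3, 5, 12, 27]
```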

# 1.15 Pure functions and modifiers
Functions which take lists as arguments and change them during execution are called **modifiers** and the changes they make are called **side effects.**

A **pure function** does not produce side effects. It communicates with the calling program only through parameters, which it does not modify, and a return value. Here is ***double_stuff*** written as a pure function:

In [75]:
def double_stuff(a_list):
    """ Return a new list which contains
        doubles of the elements in a_list.
    """
    new_list = []
    for value in a_list:
        new_elem = 2 * value
        new_list.append(new_elem)

    return new_list

This version of double_stuff does not change its arguments:

In [76]:
things = [2, 5, 9]
xs = double_stuff(things)

print(things)

print(xs)


[2, 5, 9]
[4, 10, 18]


An early rule we saw for assignment said “first evaluate the right hand side, then assign the resulting value to the variable”. So it is quite safe to assign the function result to the same variable that was passed to the function:

In [77]:
things = [2, 5, 9]
things = double_stuff(things)
things

[4, 10, 18]

# 2 Functions that produce lists

The pure version of *double_stuff* above made use of an important **pattern** for your toolbox. Whenever you need to write a function that creates and returns a list, the pattern is usually:

In [None]:
initialize a result variable to be an empty list
loop
   create a new element
   append it to result
return the result

Let us show another use of this pattern. Assume you already have a function *is_prime(x)* that can test if x is prime. Write a function to return a list of all prime numbers less than n:

In [78]:
def primes_lessthan(n):
    """ Return a list of all prime numbers less than n. """
    result = []
    for i in range(2, n):
        if is_prime(i):
            result.append(i)
    return result
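
To try primes_lessthan, some version of is_prime is needed. The one below is a hypothetical stand-in based on simple trial division, written only so the example runs; it is not necessarily the function the text has in mind:

```python
def is_prime(x):
    """Hypothetical helper: return True if x is prime (trial division)."""
    if x < 2:
        return False
    for d in range(2, x):
        if d * d > x:        # no divisor up to the square root was found
            break
        if x % d == 0:
            return False
    return True


def primes_lessthan(n):
    """ Return a list of all prime numbers less than n. """
    result = []
    for i in range(2, n):
        if is_prime(i):
            result.append(i)
    return result


print(primes_lessthan(20))   # [2, 3, 5, 7, 11, 13, 17, 19]
```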

# 2.1 Strings and lists

Two of the most useful methods on strings involve conversion to and from lists of substrings. The *split* method (which we’ve already seen) breaks a string into a list of words. By default, any number of whitespace characters is considered a word boundary:

In [79]:
song = "The rain in Spain..."
wds = song.split()
wds

['The', 'rain', 'in', 'Spain...']

An optional argument called a **delimiter** can be used to specify which string to use as the boundary marker between substrings. The following example uses the string *ai* as the delimiter:

In [80]:
song.split("ai")

['The r', 'n in Sp', 'n...']

Notice that the delimiter doesn’t appear in the result.

The inverse of the *split* method is *join*. You choose a desired **separator** string (often called the glue) and join the list with the glue between each of the elements:

In [81]:
glue = ";"
s = glue.join(wds)
s

'The;rain;in;Spain...'

The list that you glue together (*wds* in this example) is not modified. Also, as these next examples show, you can use empty glue or multi-character strings as glue:

In [82]:
" --- ".join(wds)

'The --- rain --- in --- Spain...'

In [83]:
"".join(wds)

'TheraininSpain...'
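
A sketch of how the two methods interact. With no argument, split treats any run of whitespace as one boundary, so joining with a single space normalizes the spacing. With an explicit delimiter, split keeps empty fields, and joining with the same delimiter restores the original string. The strings below are made up for this illustration:

```python
song = "The   rain in\tSpain..."
wds = song.split()
print(wds)               # ['The', 'rain', 'in', 'Spain...']
print(" ".join(wds))     # The rain in Spain...

line = "a,b,,d"
fields = line.split(",")
print(fields)                      # ['a', 'b', '', 'd']
print(",".join(fields) == line)    # True
```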

# 2.2 list and range

Python has a built-in type conversion function called *list* that tries to turn whatever you give it into a list.

In [84]:
xs = list("Crunchy Frog")
xs

['C', 'r', 'u', 'n', 'c', 'h', 'y', ' ', 'F', 'r', 'o', 'g']

In [85]:
"".join(xs)

'Crunchy Frog'

One particular feature of *range* is that it doesn’t instantly compute all its values: it “puts off” the computation, and does it on demand, or “lazily”. We’ll say that it gives a **promise** to produce the values when they are needed. This is very convenient if your computation short-circuits a search and returns early, as in this case:

In [87]:
def f(n):
    """ Find the first integer from 101 up to, but not
        including, n that is divisible by 21.
    """
    for i in range(101, n):
        if i % 21 == 0:
            return i


print(f(110) == 105)
print(f(1000000000) == 105)

True
True


In the second test, if *range* were to eagerly go about building a list with all those elements, you would soon exhaust your computer’s available memory and crash the program. But it is cleverer than that! This computation works just fine, because the *range* object is just a promise to produce the elements if and when they are needed. Once the condition in the *if* becomes true, no further elements are generated, and the function returns. (Note: Before Python 3, range was not lazy and built the whole list up front. If you use an earlier version of Python, your results may differ!)

You’ll sometimes find the lazy *range* wrapped in a call to *list*. This forces Python to turn the lazy promise into an actual list:

In [88]:
range(10)           # Create a lazy promise

range(0, 10)

list(range(10))     # Call in the promise, to produce a list.

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
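
A sketch of what the promise allows in Python 3: a range object can report its length, be indexed and be tested for membership without its elements ever being stored. The name big is illustrative:

```python
big = range(10**9)     # no billion-element list is built

print(len(big))        # 1000000000
print(big[5])          # 5
print(999 in big)      # True

first_ten = list(range(10))    # only now is a real list created
print(first_ten)       # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```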

# 3 Nested lists

A nested list is a list that appears as an element in another list. In this list, the element with index 3 is a nested list:

In [89]:
nested = ["hello", 2.0, 5, [10, 20]]

If we output the element at index 3, we get:

In [90]:
print(nested[3])

[10, 20]


To extract an element from the nested list, we can proceed in two steps:

In [92]:
elem = nested[3]

print(elem[0])

10


Or we can combine them:

In [93]:
nested[3][1]

20

Bracket operators evaluate from left to right, so this expression gets the element at index 3 of nested and then extracts the element at index 1 from it.

# 4 Matrices

Nested lists are often used to represent matrices. For example, the matrix:

![image](https://openbookproject.net/thinkcs/python/english3e/_images/matrix2.png)

might be represented as:

In [94]:
mx = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

*mx* is a list with three elements, where each element is a row of the matrix. We can select an entire row from the matrix in the usual way:

In [95]:
mx[1]

[4, 5, 6]

Or we can extract a single element from the matrix using the double-index form:

In [96]:
mx[1][2]

6

The first index selects the row, and the second index selects the column. Although this way of representing matrices is common, it is not the only possibility. A small variation is to use a list of columns instead of a list of rows. Later we will see a more radical alternative using a dictionary.
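
A sketch of working with this representation, using only loops and append. It extracts one column from the list of rows and then builds the list-of-columns variation mentioned above. The names col, cols and column are introduced here for illustration:

```python
mx = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Build one column by taking the same index from every row.
col = []
for row in mx:
    col.append(row[2])
print(col)           # [3, 6, 9]

# Build the list-of-columns form of the same matrix.
cols = []
for j in range(len(mx[0])):
    column = []
    for row in mx:
        column.append(row[j])
    cols.append(column)
print(cols)          # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

# With columns first, the two indices swap roles.
print(mx[1][2], cols[2][1])    # 6 6
```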