## 4.6 Lists

Python provides various sequence types. The ones we've seen so far
(`str`, `range` and `tuple`) are immutable: the sequence can't be modified.
Python's `list` data type is the most flexible one:
**lists** are mutable and possibly heterogeneous sequences.
All operations on tuples work in the same way on lists,
so I won't repeat them.
This section focuses on the operations that change lists,
so let's first revisit the sequence ADT.

<div class="alert alert-info">
<strong>Info:</strong> Some texts use 'list' as a synonym of 'sequence'.
In M269 the term refers to Python's data type.
</div>

### 4.6.1 Modifying sequences

Besides operations to inspect and create sequences,
we need operations to remove and add individual items.
They modify the input sequence, i.e. they aren't mathematical functions.
To define them, we replace 'Function:' with 'Operation:' in the template and
add an entry 'Inputs/Outputs:' for variables that have their value changed by
the operation. When we must distinguish the value of an input/output variable
_x_ before and after the operation, we use pre-_x_ and post-_x_ respectively.
Here's how we can define the operation to remove the item at a given index _i_,
which we write as 'remove $s_i$' or 'remove _s_[*i*]' in algorithms in English.

**Operation**: remove\
**Inputs/Outputs**: _values_, a sequence\
**Inputs**: _index_, an integer\
**Preconditions**: 0 ≤ _index_ < │*values*│\
**Postconditions**: post-_values_ =
(pre-_values_[0], ..., pre-_values_[_index_ - 1],
pre-_values_[_index_ + 1], ..., pre-_values_[│pre-_values_│ - 1])

The postcondition states that the value at the given index has 'disappeared'
but all other items remain in the same order.
Without the 'pre-' and 'post-' indications,
the postcondition would be ambiguous.
For example, if the sequence has length&nbsp;5 before an item is removed,
it has length&nbsp;4 afterwards, so which length would │*values*│ refer to?
Writing │pre-_values_│ or │post-_values_│ makes it clear.

We assume that removing the item at index _i_ from a sequence of length _n_ is
implemented by copying each of the subsequent _n_ - _i_ - 1 items one position down.
Also, the length has to be updated whenever the sequence changes,
so that the size operation can just look it up in constant time.
Copying values in RAM and subtracting one from an integer take constant time,
so the removal operation does _n_ - _i_ - 1 + 1 constant-time operations.
The complexity is Θ(_n_ - _i_) or Θ(│*values*│ - _index_).
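
The shifting assumed above can be sketched in Python. This is only an illustration of the assumption, not how Python's lists are actually implemented. The function name `remove_at`, the explicit `length` input and the copy counter are all made up for this example, and a Python list stands in for a block of memory slots.

```python
def remove_at(slots: list, length: int, index: int) -> tuple:
    """Remove slots[index] by shifting; return (new length, items copied)."""
    copies = 0
    # copy each subsequent item one position down
    for position in range(index + 1, length):
        slots[position - 1] = slots[position]
        copies = copies + 1
    # the slot at position length - 1 is now unused: only the length changes
    return (length - 1, copies)

slots = ['a', 'b', 'c', 'd', 'e']           # a sequence of length 5
length, copies = remove_at(slots, 5, 1)     # remove the item at index 1
print(slots[:length])                       # ['a', 'c', 'd', 'e']
print(copies)                               # 3, i.e. n - i - 1
```

The closer the index is to the end of the sequence, the fewer items are copied.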

Using the pre-_x_ notation in preconditions or complexity expressions is
unnecessary. By definition, they can only refer to input values.

A **subsequence** is obtained by deleting zero or more items from the input
sequence.
Every substring is a subsequence, but not every subsequence is a substring.
For example, (1, 3, 5) is a subsequence of (1, 2, 3, 4, 5) but not a substring,
because 1, 3 and 5 aren't consecutive items in the longer sequence.
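
To make the difference concrete, here's a minimal sketch of two checks on tuples. The function names `is_subsequence` and `is_substring` are invented for this example; they aren't built-in Python functions.

```python
def is_subsequence(short: tuple, long: tuple) -> bool:
    """Check whether short can be obtained by deleting items from long."""
    position = 0    # index of the next item of short still to be found
    for item in long:
        if position < len(short) and item == short[position]:
            position = position + 1
    return position == len(short)

def is_substring(short: tuple, long: tuple) -> bool:
    """Check whether the items of short appear consecutively in long."""
    for start in range(len(long) - len(short) + 1):
        if long[start:start + len(short)] == short:
            return True
    return False

print(is_subsequence((1, 3, 5), (1, 2, 3, 4, 5)))   # True
print(is_substring((1, 3, 5), (1, 2, 3, 4, 5)))     # False
print(is_substring((2, 3, 4), (1, 2, 3, 4, 5)))     # True
```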

The operation to add an item, more precisely to insert it at a given position,
can be defined like this:

**Operation**: insert\
**Inputs/Outputs**: _values_, a sequence\
**Inputs**: _index_, an integer; _value_, an object\
**Preconditions**: 0 ≤ _index_ ≤ │*values*│\
**Postconditions**: post-_values_ =
(pre-_values_[0], ..., pre-_values_[_index_ - 1], _value_,
pre-_values_[*index*], ..., pre-_values_[│pre-_values_│ - 1])

The postcondition defines what inserting at a certain position means:
to shift all items from that position onwards to the next position,
in order to 'make space' for the new item.
As the postcondition explicitly shows, post-_values_[*index*] = _value_.
If _index_ = │pre-_values_│, as the preconditions allow,
then the item is effectively added to the end of the sequence.
This special case of insertion is so common it has a name: **appending**.
In algorithms in English, we write these operations respectively as

- insert _value_ in _values_ at _index_
- append _value_ to _values_.

The insertion operation shifts not just the items after the index, but also
the item at the index itself, so it copies _n_ - _i_ values and increments the
length. The complexity is therefore Θ(│*values*│ - _index_ + 1).
In the grand scheme of things, i.e. for long sequences,
the operation spends most of its time shifting items.
The update of the length hardly impacts the overall run-time.
So, we write simply Θ(│*values*│ - _index_).
More generally, a fixed number doesn't affect the growth of the run-time.

<div class="alert alert-warning">
<strong>Note:</strong> In complexity analysis, Θ(<em>e</em> + <em>c</em>) = Θ(<em>e</em> - <em>c</em>) = Θ(<em>e</em>)
if <em>c</em> is an integer constant and
<em>e</em> is an expression involving the input values or sizes.
</div>

If the value is appended then the complexity is Θ(1), as no shifting takes place.
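
The following sketch illustrates the shifting done by an insertion. It's an illustration under assumptions, not Python's actual implementation: the name `insert_at`, the explicit `length` input and the copy counter are made up, and the list of slots is assumed to have at least one spare slot after the last item.

```python
def insert_at(slots: list, length: int, index: int, value: object) -> tuple:
    """Shift items up from index and store value; return (new length, copies)."""
    copies = 0
    # go from the last item down to the item at index, to not overwrite items
    for position in range(length - 1, index - 1, -1):
        slots[position + 1] = slots[position]
        copies = copies + 1
    slots[index] = value
    return (length + 1, copies)

slots = ['a', 'b', 'c', None]               # 3 items and one spare slot
length, copies = insert_at(slots, 3, 1, 'x')
print(slots, copies)                        # ['a', 'x', 'b', 'c'] 2

slots = ['a', 'b', 'c', None]
length, copies = insert_at(slots, 3, 3, 'x')    # append: index = length
print(slots, copies)                        # ['a', 'b', 'c', 'x'] 0
```

Inserting at index 1 in a sequence of length 3 copies 3 - 1 = 2 items, while appending copies none.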

Replacing the item at a given index with a new item can be achieved by
first removing it and then inserting the new item at the same index.
However, shifting all subsequent items first down and then up is very inefficient.
The assignment 'let $s_i$ be _x_' does the replacement directly, in constant time.

#### Exercise 4.6.1

The following defines a function that outputs the reverse of the input sequence.

**Function**: reversed sequence\
**Inputs**: _values_, a sequence\
**Preconditions**: true\
**Output**: _reversed_, a sequence\
**Postconditions**: _reversed_ =
(_values_[│*values*│ - 1], _values_[│*values*│ - 2], ..., _values_[1], _values_[0])

Modify the definition so that it reverses the input sequence instead of
creating a new sequence.

_Write your answer here._

[Hint](../31_Hints/Hints_04_6_01.ipynb)
[Answer](../32_Answers/Answers_04_6_01.ipynb)

### 4.6.2 Creating lists

Let's now see how Python implements the sequence ADT with type `list`.

As with tuples and strings, lists can be created in various ways.
The simplest is to write a list literal.
It looks like a tuple literal but with square brackets instead of parentheses.
The empty list is `[]` and `[1, [2, 3], ('A', 'B'), True]` is
a heterogeneous list with four elements.
Unlike a tuple, a list of length&nbsp;1 doesn't need an extra comma,
because the square brackets can't be confused with redundant parentheses.

As with strings, printing a list and
displaying it (by just writing the variable name) produce different outputs.

In [1]:
to_do = [
    'finish writing this chapter',
    'write the next one',
    'rinse and repeat for another 20+ chapters'
]
to_do

['finish writing this chapter',
 'write the next one',
 'rinse and repeat for another 20+ chapters']

In [2]:
print(to_do)

['finish writing this chapter', 'write the next one', 'rinse and repeat for another 20+ chapters']


Lists can also be created by slicing or concatenating existing lists.
Repeated concatenation is particularly convenient for creating
long homogeneous lists, with all elements initialised to the same value.

In [3]:
10 * [0]     # create an integer list initialised to zeroes

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

We can use the `list` constructor to convert another sequence to a
list.

In [4]:
list(range(1, 10))

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [5]:
list('Hello, world!')

['H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd', '!']

In [6]:
board = (('X', 'O', 'X'), (' ', ' ', ' '), ('O', 'X', ' '))
list(board)     # doesn't convert nested tuples

[('X', 'O', 'X'), (' ', ' ', ' '), ('O', 'X', ' ')]

Finally, we can use the `sorted` function to obtain a sorted list from
any sequence of pairwise comparable items.
The function has an optional Boolean parameter, `reverse`, to indicate whether
to sort in descending order.

In [7]:
sorted('Hello, world!')

[' ', '!', ',', 'H', 'd', 'e', 'l', 'l', 'l', 'o', 'o', 'r', 'w']

In [8]:
sorted([2, 4, -3, 4.1])

[-3, 2, 4, 4.1]

In [9]:
sorted([2, 4, -3, 4.1], reverse=True)   # sort in descending order

[4.1, 4, 2, -3]

We'll look at sorting algorithms in a later chapter. For the moment we assume
that sorting takes linear time in the length of the sequence in the best case
and quadratic time in the worst case. In the best case, a single pass
over the sequence detects that it's already sorted. In the worst case,
a sorting algorithm must compare every item to every other item,
which takes a quadratic number of comparisons,
to know where each item appears in the sorted sequence.
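
The best-case assumption relies on a single pass being enough to detect a sequence that is already sorted. Here's a minimal sketch of such a check. The function `is_sorted` is a made-up helper for illustration, not a built-in, and it's not necessarily what Python's sorting does internally.

```python
def is_sorted(values: list) -> bool:
    """Check with one pass whether values is in ascending order."""
    for index in range(1, len(values)):
        # compare each item to the previous one
        if values[index - 1] > values[index]:
            return False
    return True

print(is_sorted([-3, 2, 4, 4.1]))   # True
print(is_sorted([2, 4, -3, 4.1]))   # False
```

For a sequence of length _n_, the check makes at most _n_ - 1 comparisons, which is linear in the length.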

### 4.6.3 Mistakes

In Python, some arguments of some functions have to be named:
you can't just pass a value.
If you do, the interpreter doesn't know what to do with it
and says that the function is being called with too many arguments.

In [10]:
sorted([1, 2, 3], True)     # sort in descending order

TypeError: sorted expected 1 argument, got 2

Forgetting a comma before a nested list makes the interpreter think we're
trying to index the previous item because indexing uses square brackets too.
This leads to a type error if the item's type doesn't include the indexing operation.

In [11]:
[1 [2, 3], True]        # comma missing after 1

TypeError: 'int' object is not subscriptable

You can't concatenate sequences of different types.

In [12]:
['first task', 'second task'] + 'third task'

TypeError: can only concatenate list (not "str") to list

In [13]:
(1, 2) + [3, 4]

TypeError: can only concatenate tuple (not "list") to tuple
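
One way around these errors is to make both operands the same type before concatenating. The variable names below are just for illustration.

```python
# put the string in a list of length 1, so that both operands are lists
tasks = ['first task', 'second task'] + ['third task']
print(tasks)        # ['first task', 'second task', 'third task']

# convert the tuple to a list ...
numbers = list((1, 2)) + [3, 4]
print(numbers)      # [1, 2, 3, 4]

# ... or the list to a tuple
as_tuple = (1, 2) + tuple([3, 4])
print(as_tuple)     # (1, 2, 3, 4)
```

Note that `list('third task')` wouldn't help in the first case: as shown earlier, converting a string to a list produces a list of its characters.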

### 4.6.4 Modifying lists

A list can be changed by replacing, removing or inserting an item.
As mentioned above, we replace an item with an assignment, e.g.

1. let _daily temperature_ be [-5, -2, 0, 1, -1]
1. let _daily temperature_[1] be -4

In Python this becomes

In [14]:
daily_temperature = [-5, -2, 0, 1, -1]
daily_temperature[1] = -4

The list has indeed changed:

In [15]:
daily_temperature

[-5, -4, 0, 1, -1]

The `insert` method adds an item at a given position.
A **method** is a function that is only known in the context of
a particular data type. Whereas `print` and `len` are functions that
can be applied to various data types,
`insert` only applies to lists, so the syntax is different.
It uses **dot notation**:
first write an expression (typically a variable) of the required data type,
then a dot, then the method name with the remaining inputs in parentheses.
In the case of the `insert` method, the remaining inputs are
first the index and then the item to be inserted.

In [16]:
daily_temperature.insert(0, -6)     # insert -6 at index 0
daily_temperature

[-6, -5, -4, 0, 1, -1]

In [17]:
daily_temperature.insert(-1, 'week 2:')
daily_temperature

[-6, -5, -4, 0, 1, 'week 2:', -1]

Hmm, this didn't work. I wanted to add the text to the end of the list,
but instead it got into the penultimate position.
Can you explain why and figure out a way of making it appear last?

___

The operation worked as described. I asked to insert the text
in the last position, so the value -1 at that position shifted right.
Let's start afresh.

In [18]:
daily_temperature = [-5, -2, 0, 1, -1]

The insertion operation takes the index of where the item will be put.
If I want it to appear after the current last item then
I can't use the index of that item: I must use the next index.

In [19]:
daily_temperature.insert(len(daily_temperature), 'week 2:')
daily_temperature

[-5, -2, 0, 1, -1, 'week 2:']
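
As a quick check of this explanation, the following sketch compares inserting at index -1 with inserting at the index of the last item. The list names are made up for the comparison.

```python
first = [-5, -2, 0, 1, -1]
second = [-5, -2, 0, 1, -1]
first.insert(-1, 'week 2:')                     # negative index of the last item
second.insert(len(second) - 1, 'week 2:')       # same position, counted from the left
print(first)                # [-5, -2, 0, 1, 'week 2:', -1]
print(first == second)      # True
```

Both calls refer to the position of the current last item, which is why the text ends up in the penultimate position.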

The `append` method adds an item to the end of a list:
`a_list.append(an_item)` is short for `a_list.insert(len(a_list), an_item)`.

In [20]:
daily_temperature.append(6)
daily_temperature

[-5, -2, 0, 1, -1, 'week 2:', 6]

To remove an item, we use the `pop` method, indicating the index of the item
to be removed. For convenience, the method returns the item that was removed.

In [21]:
daily_temperature.pop(0)

-5

In [22]:
daily_temperature

[-2, 0, 1, -1, 'week 2:', 6]
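
A small aside, not covered above: the index is optional. If it's omitted, `pop` removes and returns the last item. The list below is just for illustration.

```python
temperatures = [-2, 0, 1]
last = temperatures.pop()   # no index given: remove the last item
print(last)                 # 1
print(temperatures)         # [-2, 0]
```

Under the shifting assumption made earlier, removing the last item requires no copying of other items.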

If you want to see a method's documentation, you must use dot notation to
indicate which data type has that method.

In [23]:
help(list.insert)

Help on method_descriptor:

insert(self, index, object, /)
    Insert object before index.



Lists have a further method to sort a list,
instead of creating a new sorted list.
It has the same optional parameter as `sorted`.

In [24]:
numbers = [1, 4, -2, 3]
numbers.sort()
numbers

[-2, 1, 3, 4]

In [25]:
numbers.sort(reverse=True)
numbers

[4, 3, 1, -2]
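
The difference between the `sorted` function and the `sort` method can be seen in this short sketch: the function leaves its input unchanged and returns a new list, whereas the method modifies the list and returns `None`. The variable names are only for illustration.

```python
numbers = [1, 4, -2, 3]
in_order = sorted(numbers)  # creates a new list
print(in_order)             # [-2, 1, 3, 4]
print(numbers)              # [1, 4, -2, 3], unchanged

result = numbers.sort()     # modifies the list in place
print(numbers)              # [-2, 1, 3, 4]
print(result)               # None
```

A consequence is that writing `numbers = numbers.sort()` would replace the list with `None`.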

#### Exercise 4.6.2

We can also append an item using concatenation:

In [26]:
daily_temperature = daily_temperature + [6]

From a complexity point of view, is this a good idea?

_Write your answer here._

[Hint](../31_Hints/Hints_04_6_02.ipynb)
[Answer](../32_Answers/Answers_04_6_02.ipynb)

#### Exercise 4.6.3

Here's the board games table again, with one game per row,
but this time as a list of lists, so that it can be modified.

In [27]:
games_by_row = [
    ['Board game', 'Rating', 'Owned'],
    ['Power Grid',      10 ,   True ],
    [   'Vintage',       8 ,   True ],
    [  'Pandemic',       9 ,  False ]
]

Write code that adds one more column with game prices
and one more row with another game. Use fictitious data.
The new column should be the right-most column.
The new game should be listed first, before Power Grid.

In [28]:
# replace this by your code to change the table
games_by_row    # display the new table

[Hint](../31_Hints/Hints_04_6_03.ipynb)
[Answer](../32_Answers/Answers_04_6_03.ipynb)

### 4.6.5 Mistakes

Replacing, removing or adding an item of an immutable sequence leads to
a type error: the operation is attempted on a type that doesn't support it.

In [29]:
text = 'i love grilled fish'
text[0] = 'I'

TypeError: 'str' object does not support item assignment
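
Since a string can't be modified, one way to obtain the intended text is to create a new string from the new first character and a slice with the remaining characters. This is a sketch of one possible approach.

```python
text = 'i love grilled fish'
text = 'I' + text[1:]       # new string: 'I' followed by all but the first character
print(text)                 # I love grilled fish
```

The variable `text` now refers to a new string; the original string wasn't changed.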

Calling a method without using dot notation leads to a name error,
because there's no such function in the 'global' context.

In [30]:
pop([1, 2, 3], 1)       # there's no pop function ...

NameError: name 'pop' is not defined

A method's name is only known in the context of a particular data type.
The interpreter must know the type before it's told the method name.
The dot notation does exactly that:
the interpreter, which reads code left to right like you and I do,
first looks at the type of the expression before the dot
and then checks whether that data type has a method with the name after the dot.

In [31]:
[1, 2, 3].pop(1)       # but lists do have a method named pop

2
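
To see which types know a given method name, we can ask the interpreter. The sketch below uses the built-in function `hasattr`, which isn't introduced in this chapter, so treat it as an optional aside.

```python
print(hasattr([1, 2, 3], 'pop'))    # True: lists have a method named pop
print(hasattr((1, 2, 3), 'pop'))    # False: tuples don't
print(hasattr('123', 'pop'))        # False: strings don't either
```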
