## 6.7 Linked lists

Arrays keep all data (or pointers to data) in contiguous memory to support
constant-time access. Another approach to implementing sequences scatters the
data over memory. There's a reason and a method to this apparent madness.

A **linked list** is a chain of **nodes**, each with an element of the sequence
(or pointer to it) and a pointer to the next node.
The first node is called the **head** of the linked list.
The last node has a **null pointer**, as there's no next node.
The next figure shows a linked list with integers 0, 1 and 2.
The X represents the null pointer and marks the end of the linked list.

![The figure shows a linked list with three nodes, each depicted by a pair of boxes.
The variable 'head' points to (i.e. has an arrow to) a pair of boxes:
the left box has number 0; the right box has an arrow leading to
another pair of boxes. This second pair has number 1 in the left box and
in the right box an arrow to another pair of boxes. This third pair of boxes
has number 2 in the left box and a cross in the right box.](06_7_ll.png)

The _head_ variable refers to the first node.
If _head_ is the null pointer, then the list is empty.

### 6.7.1 Traversing a linked list

Assuming each node is an object with two instance variables _item_ and _next_,
the algorithm to traverse a linked list and process each item is:

1. let _current_ be _head_
2. while _current_ isn't the null pointer:
   1. process _current.item_
   2. let _current_ be _current.next_

Step&nbsp;2.1 does whatever is needed for the problem at hand and
step&nbsp;2.2 updates the reference to now refer to the next node.
If the linked list is empty, the step&nbsp;2 condition is false and
the loop doesn't execute.

This algorithm can be adapted to access the item at a given index of the
sequence by replacing the while-loop with a for-loop.
This means that accessing an item takes linear time with linked lists,
more specifically Θ(_i_) to get the item at index _i_,
whereas with arrays it takes constant time.
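As a minimal sketch, the traversal and the indexed access could look as follows in Python. The stand-alone `Node` class and the function names `items` and `item_at` are my own assumptions for illustration, with `None` playing the role of the null pointer. Here 'processing' an item simply means collecting it in a Python list.

```python
class Node:
    """Hypothetical stand-alone node: an item and a reference to the next node."""

    def __init__(self, item: object, next: 'Node' = None):
        self.item = item
        self.next = next


def items(head: Node) -> list:
    """Traverse the linked list and collect every item."""
    result = []
    current = head                      # step 1
    while current is not None:          # step 2: None stands for the null pointer
        result.append(current.item)     # step 2.1: 'process' the item
        current = current.next          # step 2.2: move to the next node
    return result


def item_at(head: Node, index: int) -> object:
    """Return the item at the given index, assuming 0 <= index < length."""
    current = head
    for _ in range(index):              # follow exactly 'index' pointers
        current = current.next
    return current.item


head = Node(0, Node(1, Node(2)))        # the linked list in the figure
print(items(head))                      # [0, 1, 2]
print(item_at(head, 2))                 # 2
print(items(None))                      # [] (empty list: the loop never runs)
```

The for-loop in `item_at` follows _index_ pointers before it reaches the wanted node, which is where the Θ(_i_) complexity comes from.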

### 6.7.2 Inserting an item

The algorithm to insert an item _value_ at position _index_ is more subtle.
Here's a description of it:

> First iterate over the linked list to obtain references to
> the nodes that will be before and after the new item.
> The 'before' node is at position _index_ − 1;
> the 'after' node is at position _index_.
> Create a new node with the item to be inserted. Make
> the 'before' node point to the new node, and the new node point to the 'after' node.
> In this way the new node is now at position _index_ and the 'after' node
> (and the rest of the list) has been 'pushed' one position up.

To illustrate the algorithm, let's insert integer 3 in the sequence (0, 1, 2)
at index&nbsp;2, i.e. the resulting sequence should be (0, 1, 3, 2).

First we obtain references to the nodes at positions _index_ − 1 and _index_:

1. let _before_ be _head_
1. repeat _index_ − 1 times:
   1. let _before_ be _before.next_
1. let _after_ be _before.next_

Next we create a new node with the item to be inserted
but without a next node:

4. let _new_ be a node with _item_ = _value_ and _next_ = null pointer

For our example the situation at this stage is:

![This figure shows two linked lists. The first list is the same as
in the previous figure, with three nodes with numbers 0, 1 and 2 and
with variable 'head' pointing to the first node with number 0.
There are two more variables: 'before' points to the second node,
which has number 1, and variable 'after' points to the third node,
which has number 2. The second list has a single node pointed to by
variable 'new'. The node has number 3 in the left box and
a cross in the right box.](06_7_ll1.png)

Finally we change the pointers to put the new node between _before_ and _after_.

5. let _before.next_ be _new_
6. let _new.next_ be _after_

![The figure is the same as the previous one, but with two differences. First,
the arrow going from the node with number 1 to the node with number 2 now
goes to the node with number 3. Second, the cross in the node with number 3
has been replaced with an arrow going to the node with number 2.](06_7_ll2.png)

Once the algorithm knows where to insert the new item, the insertion itself
takes constant time: no copying of values takes place.
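The six steps can be transcribed almost literally into Python. This is only a sketch of this first draft of the algorithm, exercised on the worked example. The `Node` class and the function name `insert_first_draft` are assumptions made for illustration.

```python
class Node:
    """Hypothetical stand-alone node with an item and no next node."""

    def __init__(self, item: object):
        self.item = item
        self.next = None


def insert_first_draft(head: Node, index: int, value: object) -> None:
    """Insert value at the given index, following steps 1 to 6 literally."""
    before = head                   # step 1
    for _ in range(index - 1):      # step 2: move to position index - 1
        before = before.next
    after = before.next             # step 3
    new = Node(value)               # step 4
    before.next = new               # step 5
    new.next = after                # step 6


# Build the linked list for the sequence (0, 1, 2).
head = Node(0)
head.next = Node(1)
head.next.next = Node(2)

insert_first_draft(head, 2, 3)

# Traverse the list to see the resulting sequence.
values = []
current = head
while current is not None:
    values.append(current.item)
    current = current.next
print(values)                       # [0, 1, 3, 2]
```

Only steps 5 and 6 modify the linked list, and each is a single assignment: no existing item is moved or copied.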

Let's check the algorithm with our test table.

Case | Pre-_values_ | _index_ | Post-_values_
-|-|-|-
length&nbsp;0  | ( ) | 0 | ('!')
length&nbsp;1, before  | ( 0 )  | 0  | ('!', 0)
length&nbsp;1, after  | ( 0 )  |  1 | (0, '!')
length&nbsp;2, before  | (0, 1)  | 0  | ('!', 0, 1)
length&nbsp;2, middle  | (0, 1)  | 1  | (0, '!', 1)
length&nbsp;2, after  | (0, 1)  | 2  | (0, 1, '!')

For the first test case (empty sequence),
the _head_ variable is the null pointer
and so is _before_ after step&nbsp;1. Variable _index_ has value zero,
so the loop is skipped, because it can't be executed minus one times.
Step&nbsp;3 tries to access instance variable _next_,
but _before_ is not pointing to a node.
This kind of error is called a **null pointer dereference**:
we're trying to dereference (i.e. access the object pointed to by) _before_,
but _before_ is not a valid pointer.

A quick fix to the algorithm is to move step&nbsp;4 (the creation of the new node)
to the beginning and then handle empty and non-empty sequences separately.

1. let _new_ be a node with _item_ = _value_ and _next_ = null pointer
1. if _head_ is the null pointer:
   1. let _head_ be _new_
1. otherwise:
   1. let _before_ be _head_
   1. repeat _index_ − 1 times:
      1. let _before_ be _before.next_
   1. let _after_ be _before.next_
   1. let _before.next_ be _new_
   1. let _new.next_ be _after_

Let's move on to the second test:
inserting the item at the start of a sequence of length one.
Is the algorithm correct for this case?

___

Alas, it isn't. If an item is inserted at the start, we must update the _head_
variable to refer to the new node, but the algorithm never does so.

We can fix the algorithm by treating this edge case separately.
The current head node becomes the node after the new node,
which in turn becomes the head node.

1. let _new_ be a node with _item_ = _value_ and _next_ = null pointer
1. if _head_ is the null pointer:
   1. let _head_ be _new_
1. otherwise if _index_ = 0:
   1. let _after_ be _head_
   2. let _head_ be _new_
   3. let _new.next_ be _after_
1. otherwise:
   1. let _before_ be _head_
   1. repeat _index_ − 1 times:
      1. let _before_ be _before.next_
   1. let _after_ be _before.next_
   1. let _before.next_ be _new_
   1. let _new.next_ be _after_


Let's move on to the third test, which inserts the item at index one of
a sequence of length one, i.e. it appends the item.
Is the algorithm correct for this case?

____

Since the sequence isn't empty and the index isn't zero, the algorithm
executes step&nbsp;1 and then step&nbsp;4.1, making _before_ refer to
the first and only node in the linked list.
The loop is repeated zero times, because _index_ − 1 = 0.
Step&nbsp;4.3 sets _after_ to be the null pointer. The situation is:

![The figure shows two linked lists, each with one node.
One node has the number 0 in the left box and a cross in the right box. The other
node has an exclamation mark in the left box and a cross in the right box.
The variable 'new' points to the second node, with the exclamation mark.
The variables 'head' and 'before' both point to the first node, with number 0.
Additionally, the variable 'after' points to a separate x,
indicating that 'after' is a null pointer.](06_7_ll3.png)

Step&nbsp;4.4 links the 'before' node to the new node. Step&nbsp;4.5 has no practical
effect, because the new node's _next_ variable is already the null pointer.
The final situation is as follows. The algorithm correctly appends items.

![This figure is like the previous one except that
the cross in the right half of the node with number 0 has been replaced with
an arrow that leads to the other node, with the exclamation mark.
The two nodes now form part of a single linked list.](06_7_ll4.png)

The algorithm works when there's no node after the new node.
This makes me realise
that the part that handles insertions at the start (steps 3 to 3.3)
also works for the empty list, when _head_ and _after_ are the null pointer.
I can eliminate steps 2 and 2.1. I actually don't need variable _after_ and
can reduce steps 3.1 to 3.3 and 4.3 to 4.5 to just two steps each.
Here's my final algorithm:

1. let _new_ be a node with _item_ = _value_ and _next_ = null pointer
1. if _index_ = 0:
   1. let _new.next_ be _head_
   1. let _head_ be _new_
1. otherwise:
   1. let _before_ be _head_
   1. repeat _index_ − 1 times:
      1. let _before_ be _before.next_
   1. let _new.next_ be _before.next_
   1. let _before.next_ be _new_
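To gain some confidence in the final algorithm, here is a sketch that runs it against all six cases of the test table. Since a plain Python function can't reassign the caller's _head_ variable, I assume the function returns the (possibly new) head instead. The names `Node`, `insert`, `from_values` and `to_values` are hypothetical helpers for this check only.

```python
class Node:
    """Hypothetical stand-alone node with an item and no next node."""

    def __init__(self, item: object):
        self.item = item
        self.next = None


def insert(head: Node, index: int, value: object) -> Node:
    """Insert value at index (0 <= index <= length) and return the head."""
    new = Node(value)
    if index == 0:
        new.next = head             # also works when head is None
        return new                  # the new node becomes the head
    before = head
    for _ in range(index - 1):
        before = before.next
    new.next = before.next          # may be None when appending
    before.next = new
    return head


def from_values(values: tuple) -> Node:
    """Build a linked list with the given values; return its head."""
    head = None
    for value in reversed(values):
        head = insert(head, 0, value)
    return head


def to_values(head: Node) -> tuple:
    """Return the items of the linked list as a tuple."""
    result = []
    current = head
    while current is not None:
        result.append(current.item)
        current = current.next
    return tuple(result)


# Each test: values before, index, expected values after inserting '!'.
tests = [
    ((), 0, ('!',)),
    ((0,), 0, ('!', 0)),
    ((0,), 1, (0, '!')),
    ((0, 1), 0, ('!', 0, 1)),
    ((0, 1), 1, (0, '!', 1)),
    ((0, 1), 2, (0, 1, '!')),
]
for pre, index, post in tests:
    assert to_values(insert(from_values(pre), index, '!')) == post
print('All test cases pass')
```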

#### Exercise 6.7.1

Describe an algorithm to remove the item at a given position _index_.

[Hint](../31_Hints/Hints_06_7_01.ipynb)
[Answer](../32_Answers/Answers_06_7_01.ipynb)

### 6.7.3 The `LinkedSequence` class

Python doesn't allow us to manipulate pointers (memory addresses)
directly: we can only refer to objects via variables.
The most natural choice is to represent the null pointer by
the value `None`, but any other object that hasn't got instance
variables named 'next' and 'item' can be used: in this way
a null pointer dereference becomes an attribute error in Python.

In [1]:
node = None
node.next

AttributeError: 'NoneType' object has no attribute 'next'

The node objects are just data: they don't have operations.
Writing a class with two instance variables and four methods to access
and modify the variables is overkill.
Since nodes are only meaningful in the context of linked lists, I create
an **inner** `Node` class, defined within the `LinkedSequence` class, and
let the methods of the latter access the instance variables of nodes.
This saves us from writing four trivial methods
while keeping nodes hidden from users of sequences.
Since `Node` is an inner class, the constructor must be called by
its full name: `LinkedSequence.Node(item)`.

Here's the class. Most methods of `Sequence` are implemented by
traversing the linked list as explained above.

In [2]:
%run -i ../m269_sequence

import math

class LinkedSequence(Sequence):
    """A linked list implementation of the sequence ADT."""

    class Node:
        """A node in a linked list."""

        def __init__(self, item: object):
            """Initialise the node with the given item."""
            self.item = item
            self.next = None

    def __init__(self):
        """Initialise the sequence to be empty."""
        self.head = None

    def capacity(self) -> float:
        return math.inf     # infinite capacity

    def length(self) -> int:
        size = 0
        current = self.head
        while current is not None:
            size = size + 1
            current = current.next
        return size

    def get_item(self, index: int) -> object:
        current = self.head
        for times in range(index):
            current = current.next
        return current.item

    def set_item(self, index: int, item: object) -> None:
        current = self.head
        for times in range(index):
            current = current.next
        current.item = item

    def insert(self, index: int, item: object) -> None:
        new = LinkedSequence.Node(item)
        if index == 0:
            new.next = self.head
            self.head = new
        else:
            before = self.head
            for times in range(index - 1):
                before = before.next
            new.next = before.next
            before.next = new

And again, let's test the operations.

In [3]:
%run -i ../m269_check

test_init(LinkedSequence())
for length in range(10):
    print('Testing length', length)
    test_append(LinkedSequence(), length)
    test_insert_start(LinkedSequence(), length)
    test_set_item(LinkedSequence(), length)

Testing length 0
Testing length 1
Testing length 2
Testing length 3
Testing length 4
Testing length 5
Testing length 6
Testing length 7
Testing length 8
Testing length 9


#### Exercise 6.7.2 (optional)

Add the remove operation to the `LinkedSequence` class and
run the test function you wrote previously.

### 6.7.4 Linked list v. array

The sequence ADT can be implemented with dynamic arrays and with linked lists.
The choice depends on which operations we require to be most efficient.
Here's a table of the complexities for some operations on sequence _s_
and index _i_.

Sequence operation | Dynamic array | Linked list
:-|:-|:-
get item at _i_  | Θ(1)  | Θ(_i_)
replace item at _i_ | Θ(1)  | Θ(_i_)
insert at _i_ = 0  | Θ(│*s*│)  | Θ(1)
insert at _i_ = │*s*│ (append) | amortised Θ(1) | Θ(│*s*│)
insert elsewhere  | Θ(│*s*│ − _i_) | Θ(_i_)

The main advantage of arrays over linked lists is the constant-time access
to items, whereas linked lists have to be traversed.
Operations at the start of a linked list are efficient, because
no items are copied when one is inserted or removed,
and we'll take advantage of that in the next chapter.
Linked lists are never resized.
They require more memory than arrays (one pointer per item),
but dynamic arrays may also waste memory.

Some operations on linked lists can become more efficient with extra data.
The implementation above computes the length in linear time,
by counting items while iterating over the linked list.
It's also possible to obtain the length in constant time,
by adding an instance variable that is initially zero and
is incremented (or decremented) when an item is inserted (or removed),
as done with dynamic arrays.
This is an example of a **space–time tradeoff**:
we are willing to increase the memory usage of a linked list object
to reduce the run-time of an operation.
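The following sketch illustrates the idea with a deliberately small class that is not part of the code above. The names `CountedLinkedList`, `size` and `insert_start` are assumptions, and only insertion at the start is shown, to keep the example short.

```python
class CountedLinkedList:
    """Hypothetical linked list that stores its length in an extra variable."""

    class Node:
        """A node in a linked list."""

        def __init__(self, item: object):
            self.item = item
            self.next = None

    def __init__(self):
        """Initialise the list to be empty."""
        self.head = None
        self.size = 0                   # extra memory: one integer per list

    def insert_start(self, item: object) -> None:
        """Insert the item at index 0 in constant time."""
        new = CountedLinkedList.Node(item)
        new.next = self.head
        self.head = new
        self.size = self.size + 1       # keep the counter up to date

    def length(self) -> int:
        """Return the number of items in constant time, without traversal."""
        return self.size


numbers = CountedLinkedList()
for number in (2, 1, 0):
    numbers.insert_start(number)
print(numbers.length())                 # 3
```

The price is that every operation that inserts or removes an item must update the counter, otherwise `length` returns a wrong value.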

<div class="alert alert-warning">
<strong>Note:</strong> Each data structure makes some operations more efficient than others. The best
data structure for the problem at hand is the one that favours the operations we
need more frequently.
</div>

#### Exercise 6.7.3

How could you make the append operation take constant time on linked lists?

_Write your answer here._

[Hint](../31_Hints/Hints_06_7_03.ipynb)
[Answer](../32_Answers/Answers_06_7_03.ipynb)
