## 6.6 Using dynamic arrays

With dynamic arrays we can implement the unrestricted sequence ADT,
without any additional arguments or preconditions due to capacity issues.

Since at each point in time the sequence is stored in a static array,
the implementation of every operation stays the same, so I won't repeat them.
I will just focus on the main issue when using dynamic arrays:
when should the array be resized and by how much?

Clearly, the dynamic array must grow when an item is inserted into a full
array. Two simple policies are to grow the capacity by a fixed amount,
or by an amount proportional to the current capacity, say by 100% (doubling).
The former may lead to many linear-time resize operations as the sequence grows,
while the latter risks wasting capacity if the sequence doesn't grow much more.
So which policy is better?

### 6.6.1 Amortised complexity

I'm going to compare doubling the capacity against increasing it by 100,
when the array gets full. It doesn't matter where the item is inserted when
it triggers a resize, so I'm going to consider the fastest insertion: appending.

In the best case, appending an item doesn't require resizing the array and thus
takes constant time. In the worst case, appending an item to a full array
takes linear time, because every item must be copied to a new, larger array:
if the full array has length and capacity _n_,
then the new array has length _n_ + 1 (after appending the new item) and
capacity _n_ + 100 or 2×_n_, depending on the policy.

The best- and worst-case complexities only consider a single execution of the
operation. However, often the same operation is executed repeatedly.
Users are interested in the total time,
not whether some executions were faster than others.
If some function does 1000 appends, and 999 of them take 1&nbsp;ms but one takes
1001&nbsp;ms, for the user it's the same as if each had taken 2&nbsp;ms:
the total time waiting for the result is 2&nbsp;seconds.

In cases where the best- and worst-case complexities differ widely,
a more realistic view is the average complexity over a series of executions.
The **amortised complexity** of an operation is
the total complexity of _n_ executions of that operation, divided by _n_.
If the amortised complexity is similar to the worst-case complexity,
then the worst case occurs often; if it's similar to the best-case complexity,
then the worst case occurs infrequently over a series of executions.
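
The definition can be tried on the earlier example of 1000 appends. This is only a sketch with made-up run-times: the names `run_times_ms`, `total_ms` and `amortised_ms` are my own and the numbers are the illustrative ones from the text, not measurements.

```python
# Hypothetical run-times in milliseconds: 999 fast appends and one slow one.
run_times_ms = [1] * 999 + [1001]

total_ms = sum(run_times_ms)                    # what the user waits for
amortised_ms = total_ms / len(run_times_ms)     # total divided by n executions

print('total:', total_ms, 'ms')
print('amortised:', amortised_ms, 'ms per append')
print('worst single append:', max(run_times_ms), 'ms')
```

The amortised time is close to the best case, not to the worst case, because the slow append happens only once in the series.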

<div class="alert alert-info">
<strong>Info:</strong> The average-case complexity is not the same as amortised complexity. The former
is the complexity of a single execution averaged over all possible inputs of
the same size. The latter is the average over consecutive executions which may
make the input grow.
</div>

Let's calculate the amortised complexity of the append operation for each
policy, looking at the total complexity from one resize operation to the next.
Let's assume that a resize occurs at length _L_.

In the fixed-growth policy, the resize makes the array grow by 100 positions to
length _L_ + 100 and hence has complexity Θ(_L_ + 100).
The previous resize was 100 appends ago, when the length was _L_ - 100.
So, the total complexity of the 100 appends, including the resize, is
100 × Θ(1) + Θ(_L_ + 100) = Θ(_L_),
which gives each append an average complexity of Θ(_L_) / 100 = Θ(_L_).
(Remember that in complexity analysis, constant complexity and
fixed additive and multiplicative factors can be ignored.)

An informal way to see that each successive append has on average
linear complexity is to realise that each resize takes increasingly longer,
because the sequence is growing, and yet its run-time can only be amortised
over a fixed number of appends. Therefore, as the sequence grows,
the average run-time of each append also grows.
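
The same conclusion follows from adding up the work over a whole series of appends. The following is a sketch under my own simplifying assumptions, which are not part of the analysis above: the array starts empty with capacity 100, _n_ is a multiple of 100, storing a new item costs one step, and a resize at length 100 × _k_ costs 100 × _k_ steps for copying the items. Reaching length _n_ then requires resizes at lengths 100, 200, ..., _n_ - 100.

$$
\underbrace{n}_{\text{storing the new items}}
+ \underbrace{\sum_{k=1}^{n/100-1} 100k}_{\text{copying during resizes}}
= n + 100 \cdot \frac{(n/100 - 1)(n/100)}{2}
= \frac{n^2}{200} + \frac{n}{2}
$$

Dividing the total by the _n_ appends gives _n_/200 + 1/2 steps per append, which is Θ(_n_): the amortised complexity is linear in the length of the sequence.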

In the relative-growth policy, the resize at length _L_ grows the dynamic array
to length&nbsp;2 × _L_, with complexity Θ(2 × _L_).
The previous resize was at length _L_/2.
The total complexity of the _L_/2 appends leading from length _L_/2 to _L_,
including the resize, is _L_/2 × Θ(1) + Θ(2 × _L_) = Θ(_L_) + Θ(_L_) = Θ(_L_).
The average complexity of each append is thus Θ(_L_) / (_L_/2) = Θ(1).

Informally, although each resize takes longer as the sequence grows,
because the capacity grows proportionally to the current length,
the number of appends until the next resize grows by the same factor, so
the average run-time per append remains constant.

To sum up, while an individual append may take linear time when it triggers a
resize operation, the amortised and best-case complexities are both constant,
if the dynamic array grows proportionally to its length.
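
The difference between the two policies can also be checked by counting. The sketch below doesn't use the M269 classes: `count_item_writes`, `fixed` and `doubling` are hypothetical helpers of mine, the initial capacity of 1 is an assumption, and the cost model is a simplification in which each append writes one item and each resize additionally copies every item already stored.

```python
def count_item_writes(appends: int, grow) -> int:
    """Return the number of item writes made by a series of appends.

    `grow` is a function that maps the current capacity to the new capacity.
    """
    capacity = 1                    # assumed initial capacity
    length = 0
    writes = 0
    for _ in range(appends):
        if length == capacity:      # array full: resize before appending
            writes += length        # copy the items already stored
            capacity = grow(capacity)
        writes += 1                 # store the new item
        length += 1
    return writes


def fixed(capacity: int) -> int:
    """Grow the capacity by 100 positions."""
    return capacity + 100


def doubling(capacity: int) -> int:
    """Grow the capacity by 100%."""
    return 2 * capacity


for appends in (1_000, 10_000, 100_000):
    for policy in (fixed, doubling):
        total = count_item_writes(appends, policy)
        print(appends, 'appends,', policy.__name__, 'policy:',
              round(total / appends, 2), 'writes per append')
```

Under these assumptions, the writes per append for the fixed policy grow roughly tenfold each time the number of appends grows tenfold, whereas for the doubling policy they stay below three.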

The policy of doubling the array on each resize can waste considerable memory.
If the dynamic array starts with a length of one,
then its length is always a power of two. A sequence of
length&nbsp;600 requires an array of length&nbsp;1024 (giving 424 unused positions), a sequence
of length&nbsp;1200 requires an array of length&nbsp;2048 (giving 848 free positions), etc.
As the capacity of the array doubles, the number of unused positions may
increase, if the sequence doesn't grow enough to fill most of the array.
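
The figures above can be reproduced with a few lines of code. This is a sketch with a hypothetical helper, `doubling_capacity`, which assumes the array starts with length one and doubles only when an item is inserted into a full array.

```python
def doubling_capacity(length: int) -> int:
    """Return the smallest power of two that is at least `length`."""
    capacity = 1
    while capacity < length:
        capacity = 2 * capacity
    return capacity


for length in (600, 1200, 1025):
    capacity = doubling_capacity(length)
    print('length', length, 'needs capacity', capacity,
          'leaving', capacity - length, 'unused positions')
```

The last length shows the least favourable situation: just after a resize, almost half of the array is unused.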

<div class="alert alert-info">
<strong>Info:</strong> Python interpreters usually use a growth factor of less than two, to waste less memory.
</div>
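
One way to observe this is to watch the memory size of a list as it grows. The sketch below assumes the CPython interpreter, where `sys.getsizeof` reports the memory allocated for a list, including its spare capacity. Other interpreters may report sizes differently or not support the function at all, and the exact growth pattern is an implementation detail that can change between versions.

```python
import sys

items = []
previous_size = sys.getsizeof(items)
resizes = 0
for value in range(1000):
    items.append(value)
    size = sys.getsizeof(items)
    if size != previous_size:       # the allocated memory changed: a resize
        resizes += 1
        previous_size = size

print('1000 appends triggered', resizes, 'resizes')
```

Only a small fraction of the appends change the allocated size, which is consistent with the list growing by more than one position at a time.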

### 6.6.2 The `ArraySequence` class

A dynamic array allows us to access and replace items in constant time,
and only adds an amortised constant-time overhead to inserting (or appending)
an item. It is therefore the usual choice for implementing Python's lists.

To implement the sequence ADT with dynamic arrays, the interpreter
needs the definitions of the `Sequence` and `DynamicArray` classes.

In [1]:
%run -i ../m269_array
%run -i ../m269_sequence

The name and docstring of the new subclass reveal which data structure is used,
so that users know that they can count on constant-time indexing.

In [2]:
# this code is also in m269_sequence.py

import math

class ArraySequence(Sequence):
    """A dynamic array implementation of the sequence ADT."""

    def __init__(self):
        """Create an empty sequence."""
        self.items = DynamicArray(1)
        self.size = 0

    def capacity(self) -> float:
        return math.inf     # infinite capacity

    def length(self) -> int:
        return self.size

    def get_item(self, index: int) -> object:
        return self.items.get_item(index)

    def set_item(self, index: int, item: object) -> None:
        self.items.set_item(index, item)

    def insert(self, index: int, item: object) -> None:
        if self.size == self.items.length():    # array full
            self.items.resize(2 * self.size)    # double the capacity

        # Shift the items from index onwards one position to the right,
        # starting with the last item so that no item is overwritten.
        for position in range(self.size - 1, index - 1, -1):
            self.items.set_item(position + 1, self.items.get_item(position))
        self.items.set_item(index, item)
        self.size = self.size + 1

The following accesses the instance variables on purpose
to show how the internal static array evolves.
The array is printed with the `__str__` method inherited from `StaticArray`
and the sequence is printed with the `__str__` method inherited from `Sequence`.

In [3]:
sequence = ArraySequence()
print('array', sequence.items, 'stores sequence', sequence)
for value in range(0, 5):
    sequence.append(value)
    print('array', sequence.items, 'stores sequence', sequence)

array [None] stores sequence []
array [0] stores sequence [0]
array [0, 1] stores sequence [0, 1]
array [0, 1, 2, None] stores sequence [0, 1, 2]
array [0, 1, 2, 3] stores sequence [0, 1, 2, 3]
array [0, 1, 2, 3, 4, None, None, None] stores sequence [0, 1, 2, 3, 4]


As we can see, the length of the static array doubles step-wise from 1 to 8,
and the unused positions have value `None`.
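
The doubling and shifting done by the `insert` method can also be traced in isolation. The sketch below is not part of the M269 code: `insert_into` is a hypothetical function that uses a plain Python list as a stand-in for the static array, and it assumes 0 ≤ `index` ≤ `size` ≤ `len(array)` and a non-empty array.

```python
def insert_into(array: list, size: int, index: int, item: object) -> list:
    """Insert item at index among the first `size` items; return the array.

    The returned array is a new, longer list if the given one was full.
    """
    if size == len(array):                      # array full
        array = array + [None] * len(array)     # copy into an array twice as long
    # Shift items to the right, last one first, to open a gap at index.
    for position in range(size - 1, index - 1, -1):
        array[position + 1] = array[position]
    array[index] = item
    return array


array = insert_into(['a', 'b', 'd', None], 3, 2, 'c')   # spare capacity: no resize
print(array)
array = insert_into(array, 4, 0, 'z')                   # full: doubles from 4 to 8
print(array)
```

The second call shows the worst case: the array is full and the item is inserted at the start, so every item is first copied and then shifted.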

Finally, let's test each method.

In [4]:
%run -i ../m269_check

test_init(ArraySequence())
for length in range(10):
    print('Testing length', length)
    test_append(ArraySequence(), length)
    test_insert_start(ArraySequence(), length)
    test_set_item(ArraySequence(), length)

Testing length 0
Testing length 1
Testing length 2
Testing length 3
Testing length 4
Testing length 5
Testing length 6
Testing length 7
Testing length 8
Testing length 9


#### Exercise 6.6.1 (optional)

Implement and test the remove operation.
The algorithm to remove the item is the same as for a bounded sequence, so
you can copy your previous code.

After the tests pass, add code to the remove method to shrink the array if the
sequence, after the item was removed, is much shorter than the array.
The exact policy is up to you.

Remember that you must shrink to a capacity that is proportional to the current
one in order to achieve amortised constant complexity for the append operation.
Leave some spare capacity after shrinking:
otherwise the next insert or append operation makes the array grow again.
Having to grow a dynamic array immediately after shrinking it is not efficient.

[Hint](../31_Hints/Hints_06_6_01.ipynb)
