# Sequential Data Structures

## Linked Lists

Arrays are a sequence of memory cells with a fixed length. Elements can be accessed quickly by their rank, but expanding an array is expensive: a new, larger array must be allocated and every element copied into it.

Each element in a singly linked list is made of two memory cells: the element itself and a pointer to the next element in the sequence. A doubly linked list is similar, except that each node also has a pointer to the previous element. The advantage of linked lists is that they are cheap to expand (allocate a new node and change a few pointers); however, accessing an element by rank is slower than with an array, because the list has to be traversed from the head.
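To make the two node layouts concrete, here is a minimal Python sketch (Python is used only for illustration; the notes themselves use pseudocode). The names `SNode`, `DNode`, `prepend` and `element_at_rank` are hypothetical. Prepending allocates one node and sets one pointer, while reaching rank $r$ has to follow $r$ pointers from the head.

```python
from __future__ import annotations

from typing import Any, Optional


class SNode:
    """Singly linked node: the element plus a pointer to the next node."""
    __slots__ = ("element", "next")

    def __init__(self, element: Any, nxt: Optional[SNode] = None) -> None:
        self.element = element
        self.next = nxt


class DNode:
    """Doubly linked node: additionally points back to the previous node."""
    __slots__ = ("element", "prev", "next")

    def __init__(self, element: Any, prev: Optional[DNode] = None,
                 nxt: Optional[DNode] = None) -> None:
        self.element = element
        self.prev = prev
        self.next = nxt


def prepend(head: Optional[SNode], e: Any) -> SNode:
    # Cheap expansion: allocate one node and set one pointer, O(1).
    return SNode(e, head)


def element_at_rank(head: Optional[SNode], r: int) -> Any:
    # Access by rank has to walk r pointers from the head, O(r).
    if r < 0:
        raise IndexError("rank out of range")
    node = head
    for _ in range(r):
        if node is None:
            raise IndexError("rank out of range")
        node = node.next
    if node is None:
        raise IndexError("rank out of range")
    return node.element
```

For example, prepending `3`, `2`, `1` in that order to an empty list (`head = None`) builds the list `1 -> 2 -> 3`, and `element_at_rank(head, 2)` then walks two pointers to reach `3`.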

## Stacks

A _stack_ is an ADT for storing a collection of elements with the following methods. It's a _LIFO_ collection, meaning the last element inserted is the first element out.

- `push(e)`: Insert `e` at the top of the stack
- `pop()`: Remove the most recently inserted element and return it (error if empty)
- `isEmpty()`: `true` if the stack is empty, `false` otherwise

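Stacks can be implemented with dynamic arrays or linked lists. In both cases all the operations are $O(1)$, although with a dynamic array `push` is only _amortized_ $O(1)$, because it occasionally has to expand the array (see Dynamic Arrays).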
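A sketch of the dynamic-array option in Python, using the built-in `list` as the dynamic array: appending to and popping from the end of a list are amortized $O(1)$. The class name `ArrayStack` and the method name `is_empty` (for `isEmpty`) are hypothetical, and raising `IndexError` is just one way to report the "error if empty" case.

```python
from typing import Any, List


class ArrayStack:
    """LIFO stack backed by a Python list (a dynamic array)."""

    def __init__(self) -> None:
        self._data: List[Any] = []

    def push(self, e: Any) -> None:
        # The top of the stack is the end of the array: amortized O(1).
        self._data.append(e)

    def pop(self) -> Any:
        # Remove and return the most recently pushed element.
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._data.pop()

    def is_empty(self) -> bool:
        return len(self._data) == 0
```

Keeping the top at the end of the array is the design choice that matters: pushing or popping at the front would shift every element and cost $O(n)$.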

## Queue

A _queue_ also stores a collection of elements, however it is a _FIFO_ collection, meaning the first element in is the first element out.

- `enqueue(e)`: Inserts `e` at the rear of the queue
- `dequeue()`: Removes the oldest inserted element and returns it (error if empty)
- `isEmpty()`: Returns `true` if the queue is empty, `false` otherwise

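Queues can also be implemented with dynamic arrays (used as a circular buffer, so that `dequeue` does not have to shift the remaining elements) or with linked lists that keep pointers to both the front and the rear. Again, in both cases all the operations are $O(1)$, amortized for `enqueue` on a dynamic array.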
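A sketch of the linked-list option, assuming the list keeps a pointer to its front (head) and to its rear (tail) so that neither operation needs to traverse the list. `LinkedQueue`, `_Node` and `is_empty` (for `isEmpty`) are hypothetical names; an empty `dequeue` raises `IndexError` as one possible "error".

```python
from __future__ import annotations

from typing import Any, Optional


class _Node:
    __slots__ = ("element", "next")

    def __init__(self, element: Any) -> None:
        self.element = element
        self.next: Optional[_Node] = None


class LinkedQueue:
    """FIFO queue: dequeue at the head, enqueue at the tail, both O(1)."""

    def __init__(self) -> None:
        self._head: Optional[_Node] = None  # oldest element
        self._tail: Optional[_Node] = None  # newest element

    def enqueue(self, e: Any) -> None:
        node = _Node(e)
        if self._tail is None:      # queue was empty
            self._head = node
        else:
            self._tail.next = node
        self._tail = node

    def dequeue(self) -> Any:
        if self._head is None:
            raise IndexError("dequeue from empty queue")
        node = self._head
        self._head = node.next
        if self._head is None:      # queue became empty
            self._tail = None
        return node.element

    def is_empty(self) -> bool:
        return self._head is None
```

Removing at the head and inserting at the tail is what keeps both operations constant time with a singly linked list; the opposite assignment would make `dequeue` search for the node before the tail.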

## Vector

A vector is an ADT for storing a sequence $S$ of $n$ elements.

- `elemAtRank(r)`: Return the element at rank $r$, error if $r < 0$ or $r > n - 1$
- `replaceAtRank(r, e)`: Replace the element at rank $r$, error if $r < 0$ or $r > n - 1$
- `insertAtRank(r, e)`: Insert a new element $e$ at rank $r$, increasing the rank of all elements following $e$, error if $r < 0$ or $r > n$
- `removeAtRank(r)`: Remove element $e$ at rank $r$, decreasing the rank of all elements following $e$, error if $r < 0$ or $r > n - 1$
- `size()`: Return $n$, the number of elements in $S$
 
The easiest implementation is an array together with a variable storing $n$. Thus `elemAtRank`, `replaceAtRank` and `size` have trivial implementations with $O(1)$ runtimes. `insertAtRank` and `removeAtRank` are not as simple, since they have to shift elements; for example:
 
```
Algorithm insertAtRank(r, e)

    if r < 0 or r > n then error
    for i <- n downto r + 1 do
        A[i] <- A[i-1]
    A[r] <- e
    n <- n + 1
```
 
If `n = A.length`, we first need to reallocate the array (let's assume this costs $O(1)$ for now). If $r = 0$, the for loop runs $n$ times and all other lines are constant time, so the worst case is $O(1) + O(n) = O(n)$.
 
A linked list implementation would be even worse since `elemAtRank` and `replaceAtRank` require iterating through the nodes, thus $O(n)$.
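A Python sketch of the array-based vector described above, assuming a fixed-size Python list stands in for the array $A$ and that a full array is replaced by one of length $2(n+1)$ (the growth rule from Dynamic Arrays). The class name `ArrayVector` and the snake_case method names (`elem_at_rank` for `elemAtRank`, and so on) are hypothetical; out-of-range ranks raise `IndexError`.

```python
from typing import Any, List, Optional


class ArrayVector:
    """Vector ADT on a fixed-capacity array A plus a count n."""

    def __init__(self, capacity: int = 1) -> None:
        self._A: List[Optional[Any]] = [None] * capacity
        self._n = 0

    def size(self) -> int:
        return self._n

    def elem_at_rank(self, r: int) -> Any:
        self._check(r, self._n - 1)
        return self._A[r]                    # O(1)

    def replace_at_rank(self, r: int, e: Any) -> None:
        self._check(r, self._n - 1)
        self._A[r] = e                       # O(1)

    def insert_at_rank(self, r: int, e: Any) -> None:
        self._check(r, self._n)              # r == n appends at the end
        if self._n == len(self._A):          # full: reallocate first
            self._grow()
        # Same loop as the pseudocode: i = n downto r + 1 shifts A[r..n-1] right.
        for i in range(self._n, r, -1):
            self._A[i] = self._A[i - 1]
        self._A[r] = e
        self._n += 1

    def remove_at_rank(self, r: int) -> Any:
        self._check(r, self._n - 1)
        e = self._A[r]
        # Shift A[r+1..n-1] one slot to the left.
        for i in range(r, self._n - 1):
            self._A[i] = self._A[i + 1]
        self._n -= 1
        self._A[self._n] = None              # clear the now-unused slot
        return e

    def _check(self, r: int, hi: int) -> None:
        if r < 0 or r > hi:
            raise IndexError(f"rank {r} out of range")

    def _grow(self) -> None:
        new: List[Optional[Any]] = [None] * (2 * (self._n + 1))
        for i in range(self._n):
            new[i] = self._A[i]
        self._A = new
```

The two shifting loops are where the $O(n)$ worst case comes from: inserting or removing at rank $0$ moves all of the other stored elements.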

## Lists

Lists are an ADT that reflects the properties of a linked list: elements are accessed through positions rather than ranks.

- `element(p)`: Return the element at position `p`
- `first()`: Return the position of the first element, error if empty
- `isEmpty()`: Return `true` if the list is empty, `false` otherwise
- `next(p)`: Return the position of the element following the position `p`, error if `p` is the last position
- `isLast(p)`: Return `true` if `p` is the last position, `false` otherwise
- `replace(p, e)`: Replace the element at position `p` with `e`
- `insertFirst(e)`: Insert `e` as the first element of the list
- `insertAfter(p, e)`: Insert `e` after position `p`
- `remove(p)`: Remove the element at position `p`

There are also `last()`, `previous(p)`, `isFirst(p)`, `insertLast(e)` and `insertBefore(p, e)`, which are symmetric to `first`, `next`, `isLast`, `insertFirst` and `insertAfter`. The most natural realization of a list is a doubly linked list, where positions are the nodes of the list.

It's easy to see that all these methods run in $O(1)$ time, since we are given a direct "pointer" `p` to the node and only a constant number of pointers change. For example:

```
Algorithm insertAfter(p, e)

    create node q
    q.element <- e
    q.next <- p.next
    q.previous <- p
    p.next <- q
    q.next.previous <- q
```
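As written, the last line of `insertAfter` dereferences `q.next`, which assumes `p` always has a successor. A common way to guarantee this (an assumption here, not something the notes state) is to bracket the list with two sentinel nodes, a header and a trailer. The Python sketch below takes that approach; `DoublyLinkedList`, `_Node` and the snake_case method names are hypothetical, and positions are simply node objects.

```python
from __future__ import annotations

from typing import Any, Iterator, Optional


class _Node:
    """A position: one node of the doubly linked list."""
    __slots__ = ("element", "prev", "next")

    def __init__(self, element: Any = None) -> None:
        self.element = element
        self.prev: Optional[_Node] = None
        self.next: Optional[_Node] = None


class DoublyLinkedList:
    def __init__(self) -> None:
        # Sentinels: header sits before the first node, trailer after the last.
        self._header = _Node()
        self._trailer = _Node()
        self._header.next = self._trailer
        self._trailer.prev = self._header

    def is_empty(self) -> bool:
        return self._header.next is self._trailer

    def first(self) -> _Node:
        if self.is_empty():
            raise IndexError("first() on empty list")
        return self._header.next

    def is_last(self, p: _Node) -> bool:
        return p.next is self._trailer

    def next(self, p: _Node) -> _Node:
        if self.is_last(p):
            raise IndexError("next() of last position")
        return p.next

    def element(self, p: _Node) -> Any:
        return p.element

    def replace(self, p: _Node, e: Any) -> None:
        p.element = e

    def insert_after(self, p: _Node, e: Any) -> _Node:
        # Same pointer updates as the pseudocode; the trailer guarantees
        # q.next is never None. O(1).
        q = _Node(e)
        q.next = p.next
        q.prev = p
        p.next = q
        q.next.prev = q
        return q

    def insert_first(self, e: Any) -> _Node:
        return self.insert_after(self._header, e)

    def remove(self, p: _Node) -> Any:
        # Unlink p by joining its two neighbours. O(1).
        p.prev.next = p.next
        p.next.prev = p.prev
        p.prev = p.next = None
        return p.element

    def __iter__(self) -> Iterator[Any]:
        node = self._header.next
        while node is not self._trailer:
            yield node.element
            node = node.next
```

With sentinels, `insert_first` is just `insert_after` applied to the header, so no method needs a special case for the ends of the list.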

## Dynamic Arrays

If we insert an element into an array-based vector when the vector is full, we need to expand the array as follows:

```
n' <- 2(n+1)
create new array A' of length n'
for i <- 0 to n - 1 do
    A'[i] <- A[i]
A <- A'
```

Clearly this is $O(n)$, since we need to copy $n$ elements. However, because the new length `n' <- 2(n+1)` roughly doubles the array, expansions become rarer as more elements are added, and we can show that the _amortized_ (average) cost of an insertion is $O(1)$.
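A Python sketch of appending with this expansion rule, starting from an empty array (capacity $0$). The class `DynamicArray` and its `copies` counter are hypothetical additions for illustration: the counter records how many elements the expansions copy in total, which is the quantity the analysis bounds.

```python
from typing import Any, List, Optional


class DynamicArray:
    """Append-only array that grows with the rule n' = 2(n + 1)."""

    def __init__(self) -> None:
        self._A: List[Optional[Any]] = []   # underlying fixed-size array, capacity 0
        self._n = 0                         # number of stored elements
        self.copies = 0                     # elements copied by all expansions so far

    def append(self, e: Any) -> None:
        if self._n == len(self._A):          # full: expand, O(n)
            new: List[Optional[Any]] = [None] * (2 * (self._n + 1))
            for i in range(self._n):
                new[i] = self._A[i]
            self.copies += self._n
            self._A = new
        self._A[self._n] = e                 # ordinary O(1) write
        self._n += 1

    def capacity(self) -> int:
        return len(self._A)
```

After 1000 appends the capacity has gone through $2, 6, 14, \ldots, 1022$, and the expansions have copied 1004 elements in total, close to one copy per append rather than one per stored element per append.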

Consider adding $m$ elements to an array-based vector. Let $I(1), ..., I(m)$ denote the $m$ insertions. Most insertions take $\Theta(1)$ time since they don't need to expand the array; if insertion $I(i)$ needs to expand the array, it takes $\Theta(i)$. Let $I(i_1), ..., I(i_l)$, where $1 \leq i_1 < ... < i_l \leq m$, denote the expensive insertions. The overall time is thus:

$$
\sum_{j=1}^{l} \Theta(i_j) + \sum_{\substack{1 \leq i \leq m \\ i \ne i_1, ..., i_l}} \Theta(1)
$$

Using the fact that $f = \Theta(g) \iff f = O(g) \text{ and } f = \Omega(g)$, we first bound the total from above:

$$
\sum_{j=1}^{l} O(i_j) + \sum_{\substack{1 \leq i \leq m \\ i \ne i_1, ..., i_l}} O(1) \leq
\sum_{j=1}^{l} O(i_j) + \sum_{i = 1}^m O(1) \leq
\sum_{j=1}^{l} O(i_j) + O(m)
$$

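In the worst case the array starts empty and the expensive insertions are insertions $1, 3, 7, 15, ...$: an expansion at insertion $i$ leaves room for $2i$ elements, so the next expansion happens at insertion $2i + 1$. In general $i_{j+1} = 2 i_{j} + 1$, and by induction we can show that $2^{j-1} \leq i_j < 2^j$, thus: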

$$
\sum_{j=1}^{l} i_j \leq \sum_{j=1}^{l} 2^j = 2^{l+1} - 2
$$
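For completeness, the induction can be carried out by solving the recurrence with $i_1 = 1$; the closed form gives both bounds at once:

$$
i_1 = 1 = 2^1 - 1, \qquad
i_j = 2^j - 1 \implies i_{j+1} = 2 i_j + 1 = 2(2^j - 1) + 1 = 2^{j+1} - 1
$$

$$
\text{hence } i_j = 2^j - 1, \text{ and since } 2^{j-1} \geq 1 \text{ for } j \geq 1: \quad 2^{j-1} \leq 2^j - 1 < 2^j
$$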

Since $2^{l-1} \leq i_l \leq m$, it follows that $l \leq \lg(m) + 1$, thus

$$
2^{l+1} - 2 \leq 2^{\lg(m) + 1 + 1} - 2 = 4m - 2 = O(m)
$$
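The upper-bound argument can be checked numerically with a small simulation, sketched in Python under the same assumptions (array starts empty, grows with $n' = 2(n+1)$). The function `expensive_insertions` is hypothetical; it returns the indices $i_1, \ldots, i_l$ of the insertions that expand the array.

```python
import math
from typing import List


def expensive_insertions(m: int) -> List[int]:
    # Simulate m insertions into an initially empty array that grows with
    # n' = 2(n + 1); return the 1-based indices of the expanding insertions.
    capacity, n = 0, 0
    expensive: List[int] = []
    for i in range(1, m + 1):
        if n == capacity:          # full: this insertion copies n = i - 1 elements
            capacity = 2 * (n + 1)
            expensive.append(i)
        n += 1
    return expensive


m = 1000
idx = expensive_insertions(m)
print(idx)                             # [1, 3, 7, 15, 31, 63, 127, 255, 511]
print(sum(idx) <= 4 * m - 2)           # True
print(len(idx) <= math.log2(m) + 1)    # True
```

The printed indices follow $i_{j+1} = 2 i_j + 1$, and both $\sum_j i_j \leq 4m - 2$ and $l \leq \lg(m) + 1$ hold for this $m$.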

Let us now consider the lower bound, $\Omega$:

$$
\sum_{j=1}^l \Omega(i_j) + \sum_{\substack{1 \leq i \leq m \\ i \ne i_1, ..., i_l}} \Omega(1) \geq \sum_{\substack{1 \leq i \leq m \\ i \ne i_1, ..., i_l}} \Omega(1)
$$

For any $k \in \mathbb{N}$ with $k \geq 4$, the set $\{1, 2, ..., k\}$ contains at most $\frac{k}{2}$ of the indices $i_j$. Hence, for $m \geq 4$, at least $\frac{m}{2}$ of the insertions are cheap, and

$$
\sum_{\substack{1 \leq i \leq m \\ i \ne i_1, ..., i_l}} \Omega(1) \geq \frac{m}{2} \; \Omega(1) = \Omega(m)
$$
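The counting claim can also be checked directly, sketched in Python using the closed form $i_j = 2^j - 1$ that follows from $i_{j+1} = 2 i_j + 1$ with $i_1 = 1$. The helper `count_expensive` is hypothetical: it counts the $j \geq 1$ with $2^j - 1 \leq k$, i.e. $\lfloor \log_2(k+1) \rfloor$, computed exactly with `int.bit_length`.

```python
def count_expensive(k: int) -> int:
    # i_j = 2**j - 1 <= k  <=>  2**j <= k + 1, so the count is
    # floor(log2(k + 1)), which equals (k + 1).bit_length() - 1.
    return (k + 1).bit_length() - 1


# k = 3 shows why k >= 4 is required: insertions 1 and 3 are both
# expensive, and 2 > 3/2.
print(count_expensive(3), 3 / 2)                                      # 2 1.5
print(all(count_expensive(k) <= k / 2 for k in range(4, 100_000)))    # True
```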

Thus the running time for $m$ insertions is $\Theta(m)$, so the amortized (average) time for a single insertion is $\Theta(1)$.