# ArrayList
## Specifications
- All elements are of the same type
- All elements are stored in adjacent memory locations
- The element at index `i` is at memory address `x + b * i`, where `x` is the address of the first cell and `b` is the size of one element in bytes => we can access any specific element very quickly: in O(1) time. (This is called **random access**)

```
You create an array of integers (assume each integer is exactly 4 bytes) in memory, and the beginning of the array (i.e., the start of the very first cell of the array) happens to be at memory address 1000 (in decimal, not binary). What is the memory address of the start of cell 6 of the array, assuming 0-based indexing (i.e., cell 0 is the first cell of the array)?
```

0: 1000

1: 1004

2: 1008

3: 1012

4: 1016

5: 1020

6: `1024`
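
The arithmetic above is just `x + b * i` with `x = 1000` and `b = 4`. A minimal sketch of that formula (`cellAddress` is a made-up name for illustration, not a standard function):

```c++
#include <cstddef>

// Address of cell i, given the start address x of the array
// and the size b (in bytes) of each element.
// cellAddress is an illustrative name, not part of any library.
constexpr std::size_t cellAddress(std::size_t x, std::size_t b, std::size_t i) {
    return x + b * i;
}
```

Calling `cellAddress(1000, 4, 6)` reproduces the answer for cell 6.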

## Assumptions
- We will assume that a user can only add elements at indices between 0 and n (inclusive), where `n = number of elements that exist in the list prior to the new insertion`
- The user can only add elements to the front or back of the array (push/pop)

## Dynamic ArrayList
1. Allocate some default "large" amount of memory initially
2. Insert elements into this initial array
3. Once the array is full, create a new, larger array (typically twice as large as the old array)
4. Copy all elements from the old array into the new array
5. Replace any references to the old array with references to the new array

In C++, the equivalent is `std::vector`.
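
The five steps above can be sketched roughly as follows. This is only an illustration: `IntArrayList`, its member names, and the initial capacity of 4 are assumptions made for the example, and real implementations such as `std::vector` are more involved.

```c++
#include <cassert>
#include <cstddef>

// Minimal growable array of ints. IntArrayList and the initial capacity
// of 4 are illustrative choices, not taken from any library.
class IntArrayList {
public:
    IntArrayList() : data_(new int[4]), size_(0), capacity_(4) {} // step 1
    ~IntArrayList() { delete[] data_; }
    IntArrayList(const IntArrayList&) = delete;
    IntArrayList& operator=(const IntArrayList&) = delete;

    void pushBack(int value) {
        if (size_ == capacity_) {
            grow();                 // the array is full
        }
        data_[size_++] = value;     // step 2
    }

    int get(std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }

private:
    void grow() {
        std::size_t newCapacity = capacity_ * 2;
        int* bigger = new int[newCapacity];          // step 3: new, larger array
        for (std::size_t i = 0; i < size_; i++) {    // step 4: copy everything over
            bigger[i] = data_[i];
        }
        delete[] data_;
        data_ = bigger;                              // step 5: switch to the new array
        capacity_ = newCapacity;
    }

    int* data_;
    std::size_t size_;
    std::size_t capacity_;
};
```

Pushing a fifth element into a fresh `IntArrayList` triggers one `grow()`, which doubles the capacity from 4 to 8.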

## Thinking challenge
```
Array structures (e.g. the array, or the Java ArrayList, or the C++ vector, etc.) require that all elements be the same size. However, array structures can contain strings, which can be different lengths (and thus different sizes in memory). How is this possible?
```

Answer: store pointers to the strings in the array. All pointers have the same size, no matter how long the string they point to is.

## Insertion
- Worst case: `O(n)` = inserting at the first index (you have to move every other element up by one index)
- Best case: `O(1)` = inserting at the end (as long as the backing array still has room)

## Binary search in ArrayList
**Everything is under the assumption that elements are sorted in advance**

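1. Have the array sorted in advance
2. Compare the element being searched for against the middle element; if they are equal, you are done
3. If it is less than the middle element, repeat the search on the left half of the array / If it is greater, repeat on the right half. Stop when the remaining range is empty (the element is not in the array)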
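
A minimal sketch of these steps on a `std::vector<int>`, assuming ascending order. `binarySearch` is an illustrative name; the standard library already offers `std::binary_search` for real use.

```c++
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if `target` is in `sorted`, which must be in ascending order.
bool binarySearch(const std::vector<int>& sorted, int target) {
    // Debug-only check of the "sorted in advance" assumption.
    assert(std::is_sorted(sorted.begin(), sorted.end()));

    std::size_t lo = 0;
    std::size_t hi = sorted.size();   // search the half-open range [lo, hi)
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] == target) {
            return true;
        }
        if (target < sorted[mid]) {
            hi = mid;                 // continue in the left half
        } else {
            lo = mid + 1;             // continue in the right half
        }
    }
    return false;                     // the range became empty
}
```

Each iteration halves the range, which is where the `O(log n)` comes from. It only works because random access lets us jump to `sorted[mid]` in O(1).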

## Removal
- Best case: `O(1)` = removing at the end
- Worst case: `O(n)` = removing at the beginning (comes from our restriction that all elements in our array must be contiguous)

## Thinking challenge
```
When we remove from the very beginning of the backing array of an Array List, even before we move the remaining elements to the left, the remaining elements are all still contiguous, so our restriction is satisfied. Can we do something clever with our implementation to avoid having to perform this move operation when we remove from the very beginning of our list?
```

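Answer: instead of moving anything, keep track of which cell currently holds the first element. To remove `arr[0]`, just treat the old index 1 as the new index 0, 2 as 1, 3 as 2, ... and n as n-1. No element is copied, so the removal can be done in `O(1)`. In short: adjust indices.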
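
One way to picture "adjusting indices" is to store a head offset next to the backing array. The sketch below is only an illustration of that idea; `FrontTrimList` and its members are made-up names.

```c++
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A list view over a backing array that removes from the front in O(1)
// by advancing a head offset instead of shifting elements.
// FrontTrimList is an illustrative name.
class FrontTrimList {
public:
    explicit FrontTrimList(std::vector<int> items)
        : backing_(std::move(items)), head_(0) {}

    void removeFront() {
        assert(size() > 0);
        head_++;   // the old first cell is simply ignored from now on
    }

    // List index i maps to backing index head_ + i.
    int get(std::size_t i) const {
        assert(i < size());
        return backing_[head_ + i];
    }

    std::size_t size() const { return backing_.size() - head_; }

private:
    std::vector<int> backing_;
    std::size_t head_;
};
```

The trade-off is that the skipped cells at the front are wasted unless the implementation reuses them somehow.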

# LinkedList
- Alternative to ArrayList with different time & space trade-offs
- Uses nodes
- Head & tail pointer
- Singly-linked list: each node has 1 pointer, to the next node (towards the tail)
- Doubly-linked list: each node has 2 pointers, one to the next node and one to the previous node

## `find`
- Takes `O(n)` in the worst case. Start from the head (or from the tail, in a doubly-linked list) and follow the pointers.

```c++
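// Returns true if `element` is stored in the list starting at `node`.
bool find(Node* node, int element) {
    // Walk node by node; the null check also covers an empty list.
    while (node != nullptr) {
        if (node->value == element) {
            return true;
        }
        node = node->next;
    }
    return false;
}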
```

## `insert`
- find the insertion site, 
- rearrange pointers to fit the new node in its rightful spot.

```c++
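// Inserts newnode so that it ends up at position `index` (0-based).
// `index` must be between 0 and the current length of the list.
// `head` is taken by reference so that inserting at index 0 can update it.
void insert(Node*& head, Node* newnode, int index) {
    if (index == 0) {
        newnode->next = head;
        head = newnode;
        return;
    }
    Node* crnt = head;
    // Stop at the node just before the insertion site.
    for (int i = 0; i < index - 1; i++) {
        crnt = crnt->next;
    }
    newnode->next = crnt->next;
    crnt->next = newnode;
}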
```

## `remove`

```c++
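// Unlinks the node at position `index` (0-based); does nothing if out of range.
// `head` is a reference to the link that points at the current node,
// so unlinking the very first node works too. The node is not freed here.
void remove(Node*& head, int index){
    if (head == nullptr){
        return;
    }
    if (index == 0){
        // reached the target index
        head = head->next;
    }
    else{
        // not yet there
        remove(head->next, index - 1);
    }
}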
```

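## Summary
- add/remove: O(1) at the front or back (removing at the back in O(1) needs a doubly-linked list); in the middle you first have to walk to the position, which is O(n)
- find: O(n) in the worst case. Cannot do something like binary search, because there is no random access.
- no unused pre-allocated cells, compared to ArrayList (but every node stores extra pointers)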


# Skip lists

## Review

### ArrayList
- Worst case for `find`: `O(log n)` with binary search (for a sorted one)
- Worst case for `insert`/`delete`: `O(n)`.

### LinkedList
- Worst case for `insert`/`remove` to front/back of the structure : `O(1)` 
- `find`: `O(n)`, because LinkedList does not have **random access** property. 

## Then skip list?
- Composed of nodes, each of them containing 1 key and multiple forward pointers
- Multiple layers. On each layer it takes part in, a node has one forward pointer
![skip list](https://ucarecdn.com/af5d2e1d-a7d7-4af0-9355-6c2725f4f49d/)
![skip list 2](https://ucarecdn.com/4e97db37-7f47-47ca-ae3c-96d810c06009/)
- For each layer `i`, the `i`-th pointer in head points to the first node that has a height of at least `i`.
- The head also has multiple layers.

## `find`
![find](https://ucarecdn.com/82bf4049-7178-49d9-b3d6-225535f038a4/)
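- start at the top layer of head
- follow the forward pointer as long as it does not overshoot the key you are looking for; when it would, drop down one layer and continue from the same node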
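
A rough sketch of this traversal, under a few assumptions: keys are `int`s in ascending order, each node keeps its forward pointers in a vector (`next[i]` is the pointer on layer `i`), and `head` is a sentinel that stores no real key. `SkipNode` and `skipFind` are illustrative names.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative skip list node: one key plus one forward pointer per layer.
struct SkipNode {
    int key;
    std::vector<SkipNode*> next;   // next[i] is the forward pointer on layer i
};

// Returns true if `key` is stored in the list that starts at the sentinel `head`.
bool skipFind(const SkipNode& head, int key) {
    const SkipNode* current = &head;
    // Start on the top layer and work down to layer 0.
    for (std::size_t layer = head.next.size(); layer-- > 0;) {
        // Every node we stand on must be tall enough for this layer.
        assert(current->next.size() > layer);
        // Move forward while the next key is still smaller than the target.
        while (current->next[layer] != nullptr && current->next[layer]->key < key) {
            current = current->next[layer];
        }
        // Going further would overshoot (or hit the end): drop down one layer.
    }
    // On layer 0, the next node is either the key or the first larger key.
    const SkipNode* candidate = head.next.empty() ? nullptr : current->next[0];
    return candidate != nullptr && candidate->key == key;
}
```

This variant only checks for equality once, at the bottom layer. Stopping early as soon as a forward pointer hits the key would work as well.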

## `remove`
![remove 1](https://ucarecdn.com/a1667d0f-9a1d-4c35-82f4-3b3aff40122d/)
![remove 2](https://ucarecdn.com/1787d4ab-ad99-48ed-9f11-89a6cbea6cac/)
- do `find`
- rearrange pointers

## Time complexity for the worst-case
![worst-case](https://ucarecdn.com/db5e6407-7dfd-4cbf-9820-456a1c913c30/)
- worst-case: if the Skip List's node heights are all equal or descending
- no pointer lets you skip over nodes you have not seen yet, so a search may have to visit all `n` nodes one by one.
- worst-case time complexity for `find`/`remove`: O(n).

## Time complexity for the optimally-distributed SkipList

![optimally-distributed](https://ucarecdn.com/28eab8f8-c850-477c-a8e1-4fa7ee7cddee/)
- O(logn)
- each "jump" allows you to traverse half of the remainder of the list
- The size of the search space starts at n, but then you jump to the middle and cut the search space in half (n/2), then cut in half again (n/4), until eventually the search space becomes 1 (the element you want). The series n, n/2, n/4, ..., 2, 1 has O(log n) elements
- **The distribution of heights really matters. With optimal distribution, `find` runs in O(log n), which is effectively the TC of binary search.**

## How to design an optimally distributed SkipList

1. First, find where the new node should be inserted
![find first](https://ucarecdn.com/eece60fd-8987-460f-8356-eec07f4a4ea3/)
2. Determine the height of the node
    - start at height = 0
    - flip a coin whose probability of heads is p.
    - If we flip heads, we increase our height by 1 and flip again.
    - If we flip tails, we stop playing the game and keep our current height.
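
The coin game can be sketched as below. The `maxHeight` cap is an added assumption so that the loop always terminates at a bounded height; `randomHeight` is an illustrative name.

```c++
#include <cassert>
#include <random>

// Plays the coin game: keep flipping a coin with P(heads) = p and add 1 to
// the height for every heads, stopping at the first tails.
// The maxHeight cap is an assumption added for this sketch.
int randomHeight(double p, int maxHeight, std::mt19937& rng) {
    assert(p >= 0.0 && p < 1.0);
    std::bernoulli_distribution heads(p);   // true with probability p
    int height = 0;
    while (height < maxHeight && heads(rng)) {
        height++;
    }
    return height;
}
```

The generator is passed in by reference so that consecutive calls keep drawing fresh random flips.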
   
## Why do we use the probability `p`?
- we draw the new node's height by repeating a Bernoulli trial (the coin flip), which means that we "randomly" sample the height from a distribution.
- There is absolutely no guarantee of optimality in a randomized structure: the hope is that, on average, the results will be good

## No multiple coin-flips
- we do not need to simulate every single coin flip to decide whether to increase or stop increasing the height.
- a single coin flip follows a **Bernoulli distribution (or a binary distribution), which is just the formal name for a probability distribution in which we only have two possible outcomes: success and failure**.
- what we really care about is how many flips happen before we stop. say, `k`.
- so use **the formula for P(X = k) of the geometric distribution, which describes a Bernoulli trial repeated until a particular outcome first occurs (n = k in the picture)**, and sample `k` from it directly.

![Geometric distribution Expectation](https://slideplayer.com/slide/259730/1/images/27/Geometric+Probability+Formula.jpg)

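The picture models the trials up to the first success. But we want the number of successes (heads) before the first failure (tails). So reverse the roles:
$$P(X = k) = p^{k}(1-p)$$
Note that **k = the number of heads flipped before the first failure in this formula**, which is exactly the height of the new node.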
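
In C++ the height can be drawn in one call with `std::geometric_distribution`. That class counts the failures before the first success of a trial with success probability `q`, so passing `q = 1 - p` (treating tails as the "success" that ends the game) gives the number of heads before the first tails. `sampleHeight` is an illustrative name, and the sketch has no maximum height.

```c++
#include <cassert>
#include <random>

// Samples a node height in a single draw instead of flipping repeatedly.
// std::geometric_distribution<int>(q) yields k with probability q * (1 - q)^k,
// so with q = 1 - p we get P(X = k) = p^k * (1 - p).
int sampleHeight(double p, std::mt19937& rng) {
    assert(p > 0.0 && p < 1.0);
    std::geometric_distribution<int> headsBeforeTails(1.0 - p);
    return headsBeforeTails(rng);
}
```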

## Problem
```
To determine the heights of new nodes, you use a coin with a probability of success of p = 0.3. What is the probability that a new node will have a height of 0? (Enter your answer as a decimal rounded to the nearest thousandth)
```
Answer: 1 - 0.3 = 0.7

## Problem 2
```
To determine the heights of new nodes, you use a coin with a probability of success of p = 0.3. What is the probability that a new node will have a height of 2? (Enter your answer as a decimal rounded to the nearest thousandth)
```

Answer: 0.3 * 0.3 * 0.7 = 0.063 (two "successes" and one "fail")
- Height 0: 0.7
- Height 1: 0.3 * 0.7 = 0.21
- Height 2: 0.3 * 0.3 * 0.7 = 0.063
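
Both problems are direct applications of $P(X = k) = p^{k}(1-p)$. A tiny helper to check such answers (`heightProbability` is an illustrative name):

```c++
#include <cassert>
#include <cmath>

// Probability that a new node gets height k when the coin shows heads
// with probability p: k heads followed by one tails.
double heightProbability(double p, int k) {
    assert(k >= 0);
    return std::pow(p, k) * (1.0 - p);
}
```

With `p = 0.3`, `heightProbability(0.3, 0)` is the answer to the first problem and `heightProbability(0.3, 2)` the answer to the second.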

## TC
- Worst-case TC to find/insert/remove is O(n) (poorly distributed heights)
- Average-case TC is O(log n) (optimally distributed heights). => Has a proof
- The expected number of comparisons is $1 + \frac{1}{p}\log_{1/p}(n) + \frac{1}{1-p}$
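
To get a feel for the formula, it can be evaluated for a given `p` and `n`. The sketch below just transcribes the expression from the last bullet; `expectedComparisons` is an illustrative name.

```c++
#include <cassert>
#include <cmath>

// Evaluates 1 + (1/p) * log_{1/p}(n) + 1/(1 - p).
double expectedComparisons(double p, double n) {
    assert(p > 0.0 && p < 1.0 && n >= 1.0);
    // Change of base: log_{1/p}(n) = ln(n) / ln(1/p).
    double logBaseInvP = std::log(n) / std::log(1.0 / p);
    return 1.0 + logBaseInvP / p + 1.0 / (1.0 - p);
}
```

Calling it for a fixed `n` over a range of `p` values is one way to compare candidate values of `p`.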

## Finding a 'good' `p` 
![Expected number of operations for n = 1,000,000](https://ucarecdn.com/43ddb841-ec68-4911-88e8-04ed5496fd0c/-/crop/1062x626/72,20/-/preview/)
- Assume you already know roughly how big `n` is going to be
- The curve begins to flatten at around p = 0.025, so that might be a good value to pick for p.
- **Why should you do this? Because as p increases, the amount of space we need to use to store pointers also increases.**

## Finding a max height
Once we've chosen a value for p, it can be formally proven (via a proof that is a bit out-of-scope) that, for good performance, the maximum height of the Skip List should be no smaller than **$\log_{1/p} n$**.

```
Imagine you are implementing a Skip List, and you chose a value for p = 0.1. If you are expecting that you will be inserting roughly n = 1,000 elements, what should you pick as your Skip List's maximum height?
```

$\log_{1/0.1} 1000 = \log_{10} 10^{3} = 3$
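
When $\log_{1/p} n$ is not a whole number, one reasonable choice (an assumption here, since the rule only says "no smaller than") is to round up. The sketch below finds the smallest whole height `h` with $(1/p)^h \ge n$ by repeated multiplication, which avoids rounding surprises from floating-point logarithms. `suggestedMaxHeight` is an illustrative name.

```c++
#include <cassert>

// Smallest whole number h such that (1/p)^h >= n,
// i.e. log_{1/p}(n) rounded up.
int suggestedMaxHeight(double p, double n) {
    assert(p > 0.0 && p < 1.0);
    int height = 0;
    double reach = 1.0;        // holds (1/p)^height
    while (reach < n) {
        reach *= 1.0 / p;
        height++;
    }
    return height;
}
```

For the question above, `suggestedMaxHeight(0.1, 1000)` gives the same 3.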

## Summary
- SkipList works efficiently only with optimizations:
    - Choose a good `p` for a geometric probability distribution
    - Choose a good maximum `height`
- Otherwise, it would work just like LinkedList.

# Circular array
You want to only get the good parts from Array Lists and LinkedLists: 
- random access
- inserting at the beginning and at the end in O(1) time

A circular array is a:
- Array list mimicking the behavior of a Linked List. 
- Array list that has head (first) and tail (last) indices

## More explained
![circular array 1](https://ucarecdn.com/afbcd288-7fe5-43ae-9685-a4cc4627a895/)
- You only care about the head (1) & tail (5) indices
- So you can represent the array like a circle as well:
![circular array 2](https://ucarecdn.com/0fa763d6-7778-4d3b-9505-5c652b1ad20a/)

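Note: the elements occupy the cells from the head index forward to the tail index, wrapping around the end of the backing array if necessary.
- e.g. (head, tail) = (0, 2), (5, 7), (6, 0), or (7, 1) in a circular array of capacity 8 that contains 3 elements.
- NOT (1, 7) or (0, 6), which would span 7 cells.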

## When the array becomes full
- create a new backing array (typically of twice the size)
- copy all elements from the old backing array into the new backing array.
- To keep the same order, copy them in list order: the element at list index `i` (0 through n-1) goes to index `i` of the new array, so the head index becomes 0 and the tail index becomes n-1.

![circular array doubles](https://ucarecdn.com/85280665-7678-45cb-83ef-3f4b28def5d1/)

## Worst-case insertion at the front / back of a circular array
- the backing array can be full 
- need to allocate a new backing array
- copy all n elements from the old array to the new one
- O(n)

## Worst-case insertion at the front / back of a circular array that's not full
- trivial. O(1). Just insert at the empty index.

## Accessing elements in the middle
- `(head + i) % array.length` gives you the correct index in the real (backing) array for the element at list index `i`, where `head` is the backing-array index of the first element.
- For example, with head = 7 and a backing array of length 8, the element at i = 2 of our list is at index (7 + 2) % 8 = 9 % 8 = 1 of the backing array
![circular array random access](https://ucarecdn.com/ff66b7b0-a3d0-4994-b57a-9b83fddc511c/)
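
Putting the circular array sections together, here is a rough sketch with O(1) insertion at both ends, regrowth when full, and random access through `(head + i) % capacity`. It tracks `head` and the element count instead of an explicit tail index (the tail is then `(head + size - 1) % capacity`). `CircularDeque`, its members, and the default capacity of 8 are illustrative choices.

```c++
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative circular array of ints.
class CircularDeque {
public:
    explicit CircularDeque(std::size_t capacity = 8)
        : backing_(capacity), head_(0), size_(0) {
        assert(capacity > 0);
    }

    void addBack(int value) {
        growIfFull();
        backing_[physical(size_)] = value;   // first free cell after the tail
        size_++;
    }

    void addFront(int value) {
        growIfFull();
        // Step head one cell back, wrapping from 0 to the last index.
        head_ = (head_ + backing_.size() - 1) % backing_.size();
        backing_[head_] = value;
        size_++;
    }

    int removeFront() {
        assert(size_ > 0);
        int value = backing_[head_];
        head_ = (head_ + 1) % backing_.size();
        size_--;
        return value;
    }

    int removeBack() {
        assert(size_ > 0);
        size_--;
        return backing_[physical(size_)];
    }

    // Random access: list index i -> backing index (head + i) % capacity.
    int get(std::size_t i) const {
        assert(i < size_);
        return backing_[physical(i)];
    }

    std::size_t size() const { return size_; }

private:
    std::size_t physical(std::size_t i) const {
        return (head_ + i) % backing_.size();
    }

    void growIfFull() {
        if (size_ < backing_.size()) {
            return;
        }
        // Copy in list order so that list index i lands at backing index i.
        std::vector<int> bigger(backing_.size() * 2);
        for (std::size_t i = 0; i < size_; i++) {
            bigger[i] = backing_[physical(i)];
        }
        backing_ = std::move(bigger);
        head_ = 0;
    }

    std::vector<int> backing_;
    std::size_t head_;
    std::size_t size_;
};
```

Only `growIfFull` ever touches all n elements, which matches the O(n) worst case for insertion when the backing array is full.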

# Abstract data types
- We don't necessarily care about how the data structure executes these tasks: we just care that it gets the job done.
- An Abstract Data Type (ADT) is a model for data types where the **data type is defined by its behavior from the point of view of a user of the data** (i.e., by what functions the user claims it needs to have).

## ADT vs Data structures
- ADT: it cares about **what functions it should be able to perform, but it does not at all depend on how it actually goes about doing** those functions (i.e., it is not implementation-specific)
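- Data structure: a concrete implementation that specifies **how the data is actually stored and how each function is carried out** (i.e., it is implementation-specific)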

## Again explained
- An Abstract Data Type does NOT contain details on how it should be implemented
- An Abstract Data Type is designed from the perspective of a user, not an implementer
- Any implementations of an Abstract Data Type have a strict set of functions they must support

# Deque (Dequeue/Double-ended queue)
- works pretty similar to browser history functions

## Deque ADT

```
addFront(element): Add element to the front of the Deque
addBack(element): Add element to the back of the Deque
peekFront(): Look at the element at the front of the Deque
peekBack(): Look at the element at the back of the Deque
removeFront(): Remove the element at the front of the Deque
removeBack(): Remove the element at the back of the Deque
```
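
In C++ terms, the ADT can be written as a pure interface, with any number of data structures behind it. The sketch below is one possible rendering: `IntDeque` and `StdIntDeque` are illustrative names, the element type is fixed to `int` for brevity, and the implementation simply forwards to the standard `std::deque`.

```c++
#include <cassert>
#include <deque>

// The ADT: it names the operations but says nothing about how data is stored.
class IntDeque {
public:
    virtual ~IntDeque() = default;
    virtual void addFront(int element) = 0;
    virtual void addBack(int element) = 0;
    virtual int peekFront() const = 0;
    virtual int peekBack() const = 0;
    virtual void removeFront() = 0;
    virtual void removeBack() = 0;
};

// One possible data structure behind the ADT, backed by std::deque.
class StdIntDeque : public IntDeque {
public:
    void addFront(int element) override { items_.push_front(element); }
    void addBack(int element) override { items_.push_back(element); }
    int peekFront() const override {
        assert(!items_.empty());
        return items_.front();
    }
    int peekBack() const override {
        assert(!items_.empty());
        return items_.back();
    }
    void removeFront() override {
        assert(!items_.empty());
        items_.pop_front();
    }
    void removeBack() override {
        assert(!items_.empty());
        items_.pop_back();
    }

private:
    std::deque<int> items_;
};
```

Code written against `IntDeque` keeps working if the backing structure is later swapped for, say, a doubly linked list or a circular array.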

## Which data structures to choose?
- Doubly Linked List: 
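    - would do O(1) for all the Deque operations listed above
    - but accessing in the middle would require O(n)
- Circular array:
    - peek/remove at the front or back, and access by index: O(1)
    - insert at the front or back: O(1), or O(n) when the backing array is full and has to be replaced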

# Queues 