# Data Structures

**Data structures** let us store and organize data so that we can process it efficiently.

We want to build up a mental toolbox of data structures that we can apply to solve our problems. We want to be able to pick the right data structure for any given job.

Different structures have different strengths and weaknesses: their operations have different runtimes.

When designing a program, we consider the operations we need to perform and pick the data structures that support those operations most efficiently.

All data structures have a handful of common operations:
- Add an element
- Remove an element
- Access into the data structure
- Search for an element
- etc.

There are many types of data structures:
- arrays
- dictionaries
- sets
- linked lists
- trees
- graphs
- stacks
- queues

## Arrays

Under the hood, Python lists are implemented as arrays.

An **array** is stored in consecutive memory locations. 

All variables in our program are stored in RAM. To the computer, RAM is just a large block of bytes, numbered from 0 up to some maximum address, in which data can be stored.

To create an array, we need to know what type of data we want to store and how many elements we want to store.

Why? Different types of data take up different amounts of space. The OS needs to reserve space for the entire array when it is created.

Different types of data could be: ints, floats, or characters.

In C, there are different data types for whole numbers. Typical sizes (the exact sizes depend on the platform) are:
- short: 2 bytes
- int: 4 bytes
- long: 8 bytes

If we want to create an array to store 10 ints, how many bytes are needed? With 4-byte ints, we need 10 × 4 = 40 bytes.
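The arithmetic is simply the element count times the element size. As a rough check, the sketch below uses Python's standard `ctypes` module to ask how large a C `int` is on the current machine and how many bytes an array of 10 of them occupies. The names `int_size`, `TenInts`, and `total_bytes` are illustrative only, and the printed numbers depend on the platform.

```python
import ctypes

# Size of one C int on this machine (commonly 4 bytes, but platform-dependent).
int_size = ctypes.sizeof(ctypes.c_int)

# A C array type holding exactly 10 ints; its size is fixed by the type.
TenInts = ctypes.c_int * 10
total_bytes = ctypes.sizeof(TenInts)

print(int_size, total_bytes)
```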

When we create an array to hold 10 ints, it can never hold more than 10 ints. The size of an array is fixed when it is created.

## Operations on an Array

- Access an element at an index
- Insert into the beginning of the array
- Append an element to the array

### Element access

Index access is $O(1)$ because the memory location of any element can be computed directly: the array's starting address plus the index times the element size.
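To make the address calculation concrete, here is a minimal sketch. The function name `element_address` and the starting address 1000 are made up for illustration; real addresses are chosen by the operating system.

```python
def element_address(base_address, index, element_size):
    """Return the address of the element at `index`.

    This is one multiplication and one addition, no matter how long
    the array is, which is why index access is constant time.
    """
    return base_address + index * element_size


# Hypothetical array of 4-byte ints starting at address 1000.
print(element_address(1000, 0, 4))  # 1000
print(element_address(1000, 3, 4))  # 1012
```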

### Insertion at the beginning of an array

Suppose we have an array that can hold 10 elements:

```
-------------------------------
|  |  |  |  |  |  |  |  |  |  |
-------------------------------
 0  1  2  3  4  5  6  7  8  9
```

It currently stores these five values:

```
[3, 5, 7, 11, 13]
```

```
---------------------------------------------------
| 3  | 5  | 7  | 11 | 13 |    |    |    |    |    |
---------------------------------------------------
  0    1    2    3    4    5    6    7    8    9
```

To insert an element at index 0, we first have to shift every existing element one position to the right.

```
---------------------------------------------------
|    | 3  | 5  | 7  | 11 | 13 |    |    |    |    |
---------------------------------------------------
  0    1    2    3    4    5    6    7    8    9
```

Then we can insert the value:

```
---------------------------------------------------
| 2  | 3  | 5  | 7  | 11 | 13 |    |    |    |    |
---------------------------------------------------
  0    1    2    3    4    5    6    7    8    9
```

The runtime depends on the number of elements in the array.

If we have `n` elements in the array, we have to perform `n` copies.

The runtime is $O(n)$.
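The shifting can be sketched in Python as follows. This is only a simulation: a Python list padded with `None` stands in for a fixed block of memory, and `insert_front`, `slots`, and `count` are names chosen for this example.

```python
def insert_front(slots, count, value):
    """Insert `value` at index 0 of a fixed-capacity array.

    `slots` is a list used as a stand-in for a fixed block of memory and
    `count` is the number of slots in use. Returns the new count.
    """
    if count == len(slots):
        raise OverflowError("array is full")
    # Shift from the right end so no value is overwritten before it is copied.
    for i in range(count, 0, -1):
        slots[i] = slots[i - 1]
    slots[0] = value
    return count + 1


slots = [3, 5, 7, 11, 13, None, None, None, None, None]
count = insert_front(slots, 5, 2)
print(slots[:count])  # [2, 3, 5, 7, 11, 13]
```

The loop runs once per stored element, which is where the `n` copies come from.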

### Append an element to the array

```
---------------------------------------------------
| 2  | 3  | 5  | 7  | 11 | 13 |    |    |    |    |
---------------------------------------------------
  0    1    2    3    4    5    6    7    8    9
```

To append an element to an array that is not full, we can just assign the value to the next free slot.

This is $O(1)$.

What if the array is full?

```
---------------------------------------------------
| 2  | 3  | 5  | 7  | 11 | 13 | 17 | 19 | 23 | 29 |
---------------------------------------------------
  0    1    2    3    4    5    6    7    8    9
```

We can't append to a full array in place. To append a new element, we need to create a new, larger array, copy everything into it, and then append to the new array.

**The original array:**
```
---------------------------------------------------
| 2  | 3  | 5  | 7  | 11 | 13 | 17 | 19 | 23 | 29 |
---------------------------------------------------
  0    1    2    3    4    5    6    7    8    9
```

**Create a new, larger array:**
```
----------------------------------------------------------------------------
|    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
----------------------------------------------------------------------------
  0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
```

**Copy everything into it:**
```
---------------------------------------------------
| 2  | 3  | 5  | 7  | 11 | 13 | 17 | 19 | 23 | 29 |
---------------------------------------------------
  |    |    |    |    |    |    |    |    |    |
  v    v    v    v    v    v    v    v    v    v
----------------------------------------------------------------------------
| 2  | 3  | 5  | 7  | 11 | 13 | 17 | 19 | 23 | 29 |    |    |    |    |    |
----------------------------------------------------------------------------
  0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
```

**Append the next value:**
```
----------------------------------------------------------------------------
| 2  | 3  | 5  | 7  | 11 | 13 | 17 | 19 | 23 | 29 | 31 |    |    |    |    |
----------------------------------------------------------------------------
  0    1    2    3    4    5    6    7    8    9    10   11   12   13   14
```

Since we have to copy every element over, the runtime is $O(n)$.
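A minimal sketch of both append cases, again simulating a fixed block of memory with a Python list. The function name `append` and the choice to grow by 5 slots (to match the diagram) are assumptions made for this example, not a rule.

```python
def append(slots, count, value):
    """Append `value`, moving to a larger array first if `slots` is full.

    Returns the (possibly new) array and the new count.
    """
    if count == len(slots):
        # Full: allocate a larger array and copy every element over, O(n).
        # Growing by 5 slots mirrors the diagram; it is an arbitrary choice.
        bigger = [None] * (len(slots) + 5)
        for i in range(count):
            bigger[i] = slots[i]
        slots = bigger
    slots[count] = value  # O(1) once there is room
    return slots, count + 1


full = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
grown, count = append(full, len(full), 31)
print(len(grown), grown[:count])
```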

## Runtime Summary

- Access an element at an index
    - $O(1)$
- Insert into the beginning of the array
    - $O(n)$
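- Append an element to the array
    - $O(1)$ if there is free space
    - $O(n)$ if the array is full, because every element must be copied into a new array
- Insert element into arbitrary index
    - $O(n)$
    - In the worst case, the index is 0 and every element must be shifted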

# Linked Lists

A linked list is NOT an array and is not stored in a set of consecutive memory locations.

A **Singly Linked List** is a set of **nodes** where each node contains a single element of the list and a pointer to the next node in the list. 

In a **Doubly Linked List**, each node also has a pointer to the previous node.

### Diagram

```
[2] -> [3] -> [5] -> [7] -> [11]
 ^                           ^
 H                           T
```

The first node is the **Head** and the last is the **Tail**.

A Linked List is dynamic. It grows and shrinks as we add/remove elements.

## Operations

- Append element
- Insert to beginning
- Access by index
- Insert at arbitrary index

### Implementing a Linked List

We can create a `Node` class and then a `LinkedList` class that uses nodes.

### Class Node

Attributes:
- the element it contains
- a reference to the next node

### Class LinkedList

Attributes:
- head
- tail
- size

### append()

If the list is empty when we add the first element, both the head and tail should point to it:

```
 [2]
 / \
H   T
```

If the list isn't empty, we need to add a new node after the tail.

1) create the node
2) point the tail to it
3) update the tail
4) increment the size

```
 [2] -> [3]
  |      |
  H      T
```

### General insertion

If the list is empty, handle it just like append.

If the list is not empty, how do we insert?

```
[2] -> [3] -> [7] -> [11]
 ^                    ^
 H                    T
```

Insert 5 at index 2.

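We want to:
1) create [5]
2) point [5] to [7]
3) point [3] to [5]

We need to get a reference to [3] by iterating through the list from the head until we reach it.

The order of the above steps matters!

If we did step 3 first, pointing [3] -> [5], we would lose our only reference to [7], and the rest of the list would become unreachable. Done in the right order, [5] is linked in between [3] and [7]: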

```
          [5]
         /   \
[2] -> [3]   [7] -> [11]
 ^                    ^
 H                    T
```


```
[2] -> [3] -> [5] -> [7] -> [11]
 ^                           ^
 H                           T
```

```python
class Node:
    def __init__(self, element):
        self.element = element
        self.next = None

    def __str__(self):
        # Render the node, plus an arrow if another node follows it.
        node_string = "[{}]".format(self.element)
        if self.next is not None:
            node_string += " -> "
        return node_string


class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.size = 0

    def append(self, element):
        node = Node(element)
        if self.is_empty():
            self.head = node
            self.tail = node
        else:
            self.tail.next = node
            self.tail = node
        self.size += 1

    def insert(self, element, index):
        # Valid indices run from 0 to size inclusive (size means "after the tail").
        # Anything else is rejected rather than silently ignored.
        if index < 0 or index > self.size:
            raise IndexError("insert index out of range")
        node = Node(element)
        if self.is_empty():
            self.head = node
            self.tail = node
        elif index == 0:
            # Should we replace insertion into the beginning with a
            # prepend function? Maybe.
            node.next = self.head
            self.head = node
        elif index == self.size:
            # handle case where we insert after tail
            self.tail.next = node
            self.tail = node
        else:
            # Walk to the node just before the insertion point.
            prev = self.head
            for _ in range(index - 1):
                prev = prev.next
            # Link the new node to its successor first, then link prev to it.
            node.next = prev.next
            prev.next = node
        self.size += 1

    def is_empty(self):
        return self.size == 0

    def __str__(self):
        # Also works for an empty list, which renders as an empty string.
        parts = []
        current = self.head
        while current is not None:
            parts.append(str(current))
            current = current.next
        return "".join(parts)


lst = LinkedList()
lst.append(2)
print(lst)
lst.append(3)
print(lst)
lst.append(7)
print(lst)

lst.insert(5, 2)
print(lst)

lst.insert(1, 0)
print(lst)

lst.insert(11, 5)
print(lst)
```

Output:

```
[2]
[2] -> [3]
[2] -> [3] -> [7]
[2] -> [3] -> [5] -> [7]
[1] -> [2] -> [3] -> [5] -> [7]
[1] -> [2] -> [3] -> [5] -> [7] -> [11]
```


## Runtime Analysis

### Append

$O(1)$, because we only need to update the tail, no matter the size of the list.

### Insert at beginning

$O(1)$, for the same reason as append, except we're updating the head.
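As a sketch, insertion at the beginning could be pulled out into its own method. The `prepend` method below is hypothetical (the notes only raise it as a possibility), and the classes are trimmed-down versions kept here so the example runs on its own.

```python
class Node:
    def __init__(self, element):
        self.element = element
        self.next = None


class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.size = 0

    def prepend(self, element):
        """Insert at the front. No traversal is needed, so this is O(1)."""
        node = Node(element)
        if self.head is None:
            # First element: it is both the head and the tail.
            self.tail = node
        node.next = self.head
        self.head = node
        self.size += 1


lst = LinkedList()
for value in (3, 2, 1):
    lst.prepend(value)
print(lst.head.element, lst.tail.element, lst.size)  # 1 3 3
```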

### Arbitrary Insertion

$O(n)$ because we have to iterate through the list to perform insertion at an arbitrary index.

### Contains

$O(n)$ because we have to perform a linear search starting from the head.
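A linear search over the nodes might look like the sketch below. The standalone function `contains` and the `next` parameter on `Node` are conveniences for this example and are not part of the class shown earlier.

```python
class Node:
    def __init__(self, element, next=None):
        self.element = element
        self.next = next


def contains(head, target):
    """Walk the list from `head`; in the worst case every node is visited."""
    current = head
    while current is not None:
        if current.element == target:
            return True
        current = current.next
    return False


head = Node(2, Node(3, Node(5)))
print(contains(head, 5))  # True
print(contains(head, 4))  # False
```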

### Access at an index

$O(n)$ because we have to start at the head and iterate to this index.
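Index access can be sketched the same way: follow `next` pointers `index` times. The function name `get` and the `next` parameter on `Node` are assumptions made for this example.

```python
class Node:
    def __init__(self, element, next=None):
        self.element = element
        self.next = next


def get(head, index):
    """Return the element at `index`, following `index` next pointers."""
    if index < 0:
        raise IndexError("index out of range")
    current = head
    for _ in range(index):
        if current is None:
            raise IndexError("index out of range")
        current = current.next
    if current is None:
        raise IndexError("index out of range")
    return current.element


head = Node(2, Node(3, Node(5)))
print(get(head, 0))  # 2
print(get(head, 2))  # 5
```

Unlike an array, there is no formula that jumps straight to the node, so the cost grows with the index.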

## Strengths and weaknesses

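Linked Lists are very efficient for inserting at the beginning or end of the list and for removing from the beginning. Removing from the end is only efficient in a doubly linked list, because a singly linked list must first walk to the node before the tail.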

This fact is exploited to build other data structures (Stacks and Queues).

Linked lists are not efficient for inserting, accessing, or removing elements in the middle of the list.
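As one illustration of how the cheap head operations can back another structure, here is a minimal sketch of a stack built on linked nodes. The `Stack` class and its `push` and `pop` methods are assumptions made for this example, not an implementation from these notes.

```python
class Node:
    def __init__(self, element, next=None):
        self.element = element
        self.next = next


class Stack:
    def __init__(self):
        self.head = None

    def push(self, element):
        # The new node becomes the head: O(1).
        self.head = Node(element, self.head)

    def pop(self):
        # Remove and return the head element: O(1).
        if self.head is None:
            raise IndexError("pop from empty stack")
        node = self.head
        self.head = node.next
        return node.element


stack = Stack()
for value in (1, 2, 3):
    stack.push(value)
first = stack.pop()
second = stack.pop()
print(first, second)  # 3 2
```

Both operations touch only the head, so neither depends on how many elements are stored.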
