# Data Structures

In this set of notes, we will outline and use linked lists, but before we get to them, it's useful to define what a data structure is.

A **data structure** is an object which stores data and provides operations for inserting, accessing, and removing data from it.

Different data structures actually structure the data in different ways. This leads to different efficiencies for their operations. 

A few data structures are shown below:


<img src = "figures/data-structures.jpeg" width = "100%">

On the left we have a **Linked List**; in the middle, a **Graph**; and on the right, a **Binary Search Tree**.



Beyond the intellectual interest in understanding how they work (there are interesting ideas, designs, and algorithms within them!), data structures matter to us because each one has different strengths and weaknesses, which makes it more or less suited to particular applications.

In studying how we can structure and access data, you are building up a toolbox of data structures that you can apply to solve problems.

In practice, these strengths and weaknesses show up as different runtimes for the same operations.

Most data structures support a handful of common operations:

- Add an element
- Remove an element
- Access an element in the data structure
- Search for an element
- etc.

When thinking about a program, we consider the operations we need to perform and pick the data structure that supports those operations most efficiently.

As we learn about various data structures, we will focus on a few things:

1. Their structure

2. How we can implement them

3. The runtimes of common operations

We'll also discuss applications, and, especially for graphs, algorithms which can be run on them.

# Python Lists (NOT Linked Lists)

Before we get to Linked Lists, it is useful to discuss Python lists and how they work.

Under the hood, Python lists are stored as **arrays**.

An **array** is just a consecutive set of memory locations which store values of a particular type.

For example, we can have an array which stores 10 integers which we can represent as follows.

<img src = "figures/array-empty.jpeg" width = "60%">

It has ten spaces, labelled with their indices 0-9.


When this array is created, enough space is allocated for exactly 10 integers, and it can never hold more than 10 numbers. Why? Because the values in an array must be consecutive, and the memory right after the array may already be used to store other things.

This seems pretty restrictive, and it is. Python hides these low-level details from us by providing a more abstract data structure, its list.

It's worthwhile to consider some basic operations on an array, how they work, and what their runtime will be.


## Operations on an Array

- Access an element at an index
- Insert into the array
- Remove from an array

### Element access

Because of the way arrays are stored, accessing into one is trivial. Every array has a starting memory location, and given an index, it is a simple calculation to determine the memory address of that spot. Index access is thus $O(1)$.
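
To make that calculation concrete, here is a minimal sketch. The helper name `address_of`, the starting address, and the item size are all made up for illustration; real arrays do this arithmetic inside the language runtime.

```python
def address_of(base_address, index, item_size):
    """Return the memory address of the slot at `index`.

    Hypothetical helper: assumes the slots are stored back to back,
    each taking `item_size` bytes, starting at `base_address`.
    """
    return base_address + index * item_size


# An array of 8-byte integers that starts at the (made-up) address 1000.
for i in range(3):
    print(i, address_of(1000, i, 8))
```

The work is one multiplication and one addition, no matter how large the index is.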

#### Insertion

We'll consider two cases, inserting into the beginning of an array and appending to an array.

##### Inserting into the beginning of an array

Suppose we have an array with some number of elements in it:



<img src = "figures/array-insert-00.jpeg" width = "60%">

If we want to insert `2` at index `0` in this array, we have to make room for it by shifting everything over. Only then can we insert `2`.

<img src = "figures/array-insert-01.jpeg" width = "60%">

##### Runtime

Since we have to shift everything over, if we have `n` elements in the array, we have to perform `n` copies.

The runtime is $O(n)$.
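
The shifting can be sketched in a few lines. This is only an illustration: a Python list stands in for a fixed block of memory, and the function name `insert_front` and the `count` argument (how many slots are in use) are assumptions of this sketch.

```python
def insert_front(slots, count, value):
    """Insert `value` at index 0 of a fixed-size array.

    `slots` stands in for a fixed block of memory and `count` is the
    number of slots in use. Assumes there is room: count < len(slots).
    Returns the number of elements that had to be copied.
    """
    copies = 0
    # Shift from the back so no value is overwritten before it is moved.
    for i in range(count, 0, -1):
        slots[i] = slots[i - 1]
        copies += 1
    slots[0] = value
    return copies


slots = [3, 5, 7, 11, None, None]   # 4 of 6 slots in use
copies = insert_front(slots, 4, 2)
print(slots, copies)   # [2, 3, 5, 7, 11, None] 4
```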

##### Inserting at the end of an array

What if we want to insert at the end of an array? 

In the worst case, the array is full. Since an array can't grow, the only way to append a new element is to create a new larger array, copy everything into it, then append to that new array.

<img src = "figures/array-insert-02.jpeg" width = "70%">

##### Runtime

Again, the runtime of this operation depends on the size of the array. If there are `n` elements in the array, we have to perform `n` copies in this worst case.

The runtime is $O(n)$.
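
Here is a minimal sketch of that worst case. The name `append_when_full` is made up, and doubling the capacity is an assumption of this sketch; the notes only say the new array is larger.

```python
def append_when_full(slots, value):
    """Append to a full fixed-size array by copying into a larger one.

    Doubling the capacity is an assumption of this sketch.
    Returns the new array and the number of elements copied.
    """
    bigger = [None] * max(1, 2 * len(slots))
    copies = 0
    # Every existing element has to be copied into the new array.
    for i in range(len(slots)):
        bigger[i] = slots[i]
        copies += 1
    # Only now is there room for the new element.
    bigger[len(slots)] = value
    return bigger, copies


full = [2, 3, 5, 7]
bigger, copies = append_when_full(full, 11)
print(bigger, copies)   # [2, 3, 5, 7, 11, None, None, None] 4
```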

#### Removal

Removing an element from an array is similarly inefficient. An array is not allowed to have any "holes" in it. It must consist of consecutive values.

As a result, if an element is removed, all elements that follow it must be shifted over to fill in the gap.

Removal is therefore also $O(n)$.
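
The gap-filling shift can be sketched the same way. As before, a Python list stands in for fixed memory, and `remove_at` and `count` are names invented for this illustration.

```python
def remove_at(slots, count, index):
    """Remove the element at `index` from a fixed-size array.

    `count` is the number of slots in use; assumes 0 <= index < count.
    Returns the number of elements that had to be shifted.
    """
    shifts = 0
    # Slide every later element one slot to the left to close the gap.
    for i in range(index, count - 1):
        slots[i] = slots[i + 1]
        shifts += 1
    slots[count - 1] = None   # the last used slot is now free
    return shifts


slots = [2, 3, 5, 7, 11, None]   # 5 of 6 slots in use
shifts = remove_at(slots, 5, 0)
print(slots, shifts)   # [3, 5, 7, 11, None, None] 4
```

Removing the first element is the worst case, since every other element has to move.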

### Summary

For a general array (e.g., the array underlying a basic Python list), our major operations have the following worst-case runtimes:

- Access: $O(1)$
- Insertion: $O(n)$
- Removal: $O(n)$

This isn't great. We have efficient access, but inefficient modification. With this as a background, let's examine a different form of a list: the Linked List.

# Linked Lists

<img src = "figures/linked-list.jpeg" width = "60%">

A **Linked List** is a linked set of **Nodes**. Each **Node** contains one element of the list and references to its next and previous nodes. 

The linked list itself only ever remembers the **HEAD** and **TAIL** nodes of the list. The **HEAD** and **TAIL** are the first and last nodes in the list respectively. Every other node is accessed by starting at either of those and iterating through the list to get to it.

With this information, we can sketch out a `Node` and `LinkedList` class:

```python
class Node:
    def __init__(self, element):
        """
        Construct a Node

        Parameters
        ----------
        element : AnyType
            An element to be stored in a Linked List
        """
        self.element = element
        # Create two attributes, next and prev and initialize them
        # to be None
        # These are set when a Node is added to a Linked List
        self.next = None
        self.prev = None

    def __str__(self):
        if self.next is None:
            return "[{}]".format(self.element)
        else:
            return "[{}]<->".format(self.element)

class LinkedList:
    def __init__(self):
        self.size = 0
        self.HEAD = None
        self.TAIL = None

    def is_empty(self):
        return self.size == 0
```
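
To see what the `next` and `prev` references do before we write any list methods, here is a small sketch that links three nodes by hand and walks them in both directions. The `Node` class is restated in shortened form so the example runs on its own; the variable names are just for illustration.

```python
class Node:
    # Shortened restatement of the Node class from these notes.
    def __init__(self, element):
        self.element = element
        self.next = None
        self.prev = None


# Build 17 <-> 19 <-> 23 by hand.
a, b, c = Node(17), Node(19), Node(23)
a.next, b.prev = b, a
b.next, c.prev = c, b

# Walk forward from the first node...
forward = []
node = a
while node is not None:
    forward.append(node.element)
    node = node.next

# ...and backward from the last node.
backward = []
node = c
while node is not None:
    backward.append(node.element)
    node = node.prev

print(forward)    # [17, 19, 23]
print(backward)   # [23, 19, 17]
```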

## Operations

What operations should we support? 

As with most data structures, we should support operations:

- insert
    - insert a new element into the list
- contains
    - check if an element is in the list
- get
    - get the element at a given index
- remove
    - remove an element from the list

## Inserting into a Linked List

First let's consider insertion. How can we insert a new element into a Linked List?

This depends on where we are inserting into. We can break this problem into three cases:

1. Inserting into the very beginning of the list
    - i.e., prepend
2. Inserting into the middle of the list
    - i.e., general insert
3. Inserting at the very end of the list
    - i.e., append

For the first and third cases, we will have to update either the `HEAD` or the `TAIL`. In the second case, since a linked list only has references to the `HEAD` and `TAIL`, we will have to navigate to the location to insert before inserting.

In our Linked List, we'll implement a method for each of these cases.



### Append

Appending and prepending are relatively simple. Let's implement `append` first.

To append an element to a linked list, if the list is empty, we need to:

1. Create a new `Node` containing the element
2. Point the `HEAD` and `TAIL` to that node
3. Increment the size of the list. 

Example: Starting from an empty list, append 5 to the list. The relevant part for each step is highlighted in blue.

1. Create the Node containing 5.
2. Point the HEAD and TAIL to it.
3. Increment the size.

<img src = "figures/ll-append-empty.jpeg" width = "40%">


If the list isn't empty, we need to:

1. Create a new `Node` containing the element
2. Update the pointers between the new node and previous tail
3. Update the `TAIL` to point to this new node
4. Increment the size of the list

<img src = "figures/ll-append.jpeg" width = "60%">

Putting it together in code, it looks like:

``` python
def append(self, element):
    node = Node(element)
    if self.is_empty():
        self.HEAD = node
        self.TAIL = node
    else:
        node.prev = self.TAIL
        self.TAIL.next = node
        self.TAIL = node
    self.size += 1
```

Rather than copying the whole Linked List code throughout this notebook, we'll write the functions in-line in the notebook. We'll consolidate everything into a Linked List class later.
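
If you want to try `append` right away, the sketch below restates just enough of `Node` and `LinkedList` to run standalone. The variable name `numbers` and the values are only for illustration.

```python
class Node:
    def __init__(self, element):
        self.element = element
        self.next = None
        self.prev = None


class LinkedList:
    def __init__(self):
        self.size = 0
        self.HEAD = None
        self.TAIL = None

    def is_empty(self):
        return self.size == 0

    def append(self, element):
        # Same logic as the append shown above.
        node = Node(element)
        if self.is_empty():
            self.HEAD = node
            self.TAIL = node
        else:
            node.prev = self.TAIL
            self.TAIL.next = node
            self.TAIL = node
        self.size += 1


numbers = LinkedList()
numbers.append(5)
# After the first append, HEAD and TAIL are the same node.
print(numbers.HEAD is numbers.TAIL)   # True

numbers.append(7)
numbers.append(9)
print(numbers.HEAD.element, numbers.TAIL.element, numbers.size)   # 5 9 3
print(numbers.TAIL.prev.element)   # 7
```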

### Prepend (You will implement)

`prepend` is very similar to `append`. The only difference is that we are inserting before the `HEAD` rather than after the `TAIL`.

I will leave it to you to implement `prepend`.

```python
def prepend(self, element):
    # You will implement this
    pass
```

### General Insert

For general insertion, we want to be able to insert at a particular position.

For example, given an `element` and an `index`, after the insertion, that `element` should be at that `index`. The node that was there will be "pushed back".

To perform a general insert, we need to:

1. Create a new node, `node`, containing the element
2. Iterate to the node BEFORE the insertion position
3. Update the pointers between `node` and the nodes before and after it
4. Increment the size of the list


<img src = "figures/ll-insert.jpeg" width = "60%">

### Insert Implementation

Before jumping into the code, we need to note that for general insertion, the discussion above assumes the case where an index is given for somewhere in the middle of the list. 

If an index is given for the very end or beginning of the list, our strategy won't work; we need to update either the `HEAD` or the `TAIL`. This isn't a problem, though: we can handle these special cases using `prepend` or `append`!



``` python
def insert(self, element, index):
    # Handle the special cases of 
    # inserting at index 0
    if index == 0:
        self.prepend(element)
        return
    # or inserting after the tail
    if index == self.size:
        self.append(element)
        return
    
    # otherwise, general case
    # 1. Create the new node
    node = Node(element)
    # 2. iterate to the node before the insertion position.
    # if we want to insert at index 3, we need a reference
    # to the node at index 2
    #  0          1          2          3       
    # [17]  <->  [19]  <->  [23]  <->  [31]
    #                        ⬆︎    ⬆︎
    #                     BEFORE   Insert-Position
    # for ease, use function iterate_to_position
    before = self.iterate_to_position(index-1)
    after = before.next
    # 3. update the pointers.
    before.next = node
    after.prev = node
    node.prev = before
    node.next = after
    # 4. increment the size
    self.size += 1

def iterate_to_position(self, index):
    # iterate to the node at the given index.
    # if we want a reference to the node at
    # index 2, we start at the HEAD and move
    # forward two spots from the HEAD
    #  0          1          2          3       
    # [17]  <->  [19]  <->  [23]  <->  [31]
    #                        ⬆︎  
    node = self.HEAD
    for i in range(index):
        node = node.next
    return node
```
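
To watch the four pointer updates in isolation, here is a sketch that performs the same splice on a hand-built chain, without going through the `LinkedList` class (so it does not depend on `prepend`). The helpers `build_chain` and `to_list` are hypothetical and exist only for this example.

```python
class Node:
    def __init__(self, element):
        self.element = element
        self.next = None
        self.prev = None


def build_chain(values):
    """Hypothetical helper: link the values into nodes, return the head."""
    head = None
    tail = None
    for value in values:
        node = Node(value)
        if head is None:
            head = node
        else:
            tail.next = node
            node.prev = tail
        tail = node
    return head


def to_list(head):
    """Hypothetical helper: collect the elements by walking forward."""
    values = []
    node = head
    while node is not None:
        values.append(node.element)
        node = node.next
    return values


head = build_chain([17, 19, 23, 31])

# Insert 29 at index 3: first walk to the node at index 2.
before = head
for _ in range(2):
    before = before.next
after = before.next

# The same four pointer updates as in insert.
node = Node(29)
before.next = node
after.prev = node
node.prev = before
node.next = after

print(to_list(head))   # [17, 19, 23, 29, 31]
```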

## Contains

After implementing insert, `contains` will be much shorter. `contains` returns `True` if a key is in the list and `False` otherwise.

To implement contains, we have to iterate through the list, checking whether any of the elements match the `key`. This is linear search!

```python
def contains(self, key):
    # guard against an empty list
    if self.is_empty():
        return False
    node = self.HEAD
    while True:
        # check each element
        if node.element == key:
            return True
        # if we are at the tail, stop
        if node == self.TAIL:
            break
        # move to the next node
        node = node.next
    return False
```
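
The same linear search can also be written as a loop that stops once it runs off the end of the list. This is an alternative sketch, not the version used in these notes: it is a standalone function (the name `contains_from` is made up) and it relies on the last node's `next` being `None`, which holds for lists built with the `append` above.

```python
class Node:
    def __init__(self, element):
        self.element = element
        self.next = None
        self.prev = None


def contains_from(head, key):
    """Linear search starting at `head`; an empty list is head = None."""
    node = head
    while node is not None:
        if node.element == key:
            return True
        node = node.next
    return False


# Build 17 <-> 19 by hand.
first, second = Node(17), Node(19)
first.next, second.prev = second, first

print(contains_from(first, 19))   # True
print(contains_from(first, 42))   # False
print(contains_from(None, 17))    # False
```

Written this way, the empty list needs no separate guard, because the loop body never runs when `head` is `None`.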

## Get (You will implement)

`get` returns the element at a given `index`.

Since a linked list only has references to the `HEAD` and `TAIL`, we need to start at the `HEAD` and iterate to the `index` in order to return that element.

This is another function that I will leave to you to implement.

```python
def get(self, index):
    # You will implement this
    pass
```

## Remove

`remove` takes an element and removes it from the list.

Just like `insert`, we have three cases to consider:

- The element is in the `HEAD` of the list
    - we will implement this as `remove_first`, as in remove the first element
- The element is in the middle of the list
    - this is our general remove
- The element is in the `TAIL` of the list
    - we will implement this as `remove_last`

### Remove First

To remove the first element in the list, we need to:

1. Move the `HEAD` forward to the next node
2. Break the pointers between the previous head and the new head
3. Decrement the size of the list

<img src = "figures/ll-remove-first.jpeg" width = "60%">

This all assumes that there are multiple elements in the list. If there is only a single element in the list, there is nothing to move the HEAD forward to. We can handle this as a special case.

If the list only has one element, set both the `HEAD` and `TAIL` to `None` and the `size` to `0`.

Likewise, as a second special case, if the list is empty, do nothing.

### Remove First Implementation



```python
def remove_first(self):
    if self.size == 0:
        return # empty list, do nothing
    if self.size == 1:
        self.HEAD = None
        self.TAIL = None
        self.size -= 1
        return
    # move head forward, remembering previous head
    prev_head = self.HEAD
    self.HEAD = self.HEAD.next
    # break pointers
    prev_head.next = None
    self.HEAD.prev = None
    # decrement size
    self.size -= 1
```
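
The sketch below exercises all three cases of `remove_first`: the empty list, the general case, and the single-element list. It restates a cut-down `LinkedList` with only `append` and `remove_first` so that it runs on its own; the variable name `numbers` is just for illustration.

```python
class Node:
    def __init__(self, element):
        self.element = element
        self.next = None
        self.prev = None


class LinkedList:
    def __init__(self):
        self.size = 0
        self.HEAD = None
        self.TAIL = None

    def append(self, element):
        node = Node(element)
        if self.size == 0:
            self.HEAD = node
            self.TAIL = node
        else:
            node.prev = self.TAIL
            self.TAIL.next = node
            self.TAIL = node
        self.size += 1

    def remove_first(self):
        # Same logic as the remove_first shown above.
        if self.size == 0:
            return
        if self.size == 1:
            self.HEAD = None
            self.TAIL = None
            self.size -= 1
            return
        prev_head = self.HEAD
        self.HEAD = self.HEAD.next
        prev_head.next = None
        self.HEAD.prev = None
        self.size -= 1


numbers = LinkedList()
numbers.remove_first()            # empty list: nothing happens
print(numbers.size)               # 0

numbers.append(5)
numbers.append(7)
numbers.remove_first()            # general case: HEAD moves forward to 7
print(numbers.HEAD.element, numbers.HEAD.prev, numbers.size)   # 7 None 1

numbers.remove_first()            # single element: the list becomes empty
print(numbers.HEAD, numbers.TAIL, numbers.size)   # None None 0
```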

### Remove Last (You will implement)

This operation mirrors `remove_first`. I will leave it to you to implement this.

```python
def remove_last(self):
    # You will implement this
    pass
```

## General Remove (You will implement)

`remove` takes in an element and removes the first instance of it from the list.

It is similar to general insert except that rather than iterating to a particular spot to perform the insertion, we must iterate to the element we want to remove. 

To remove an element from the list, you need to:

1. Iterate to the element
2. Update the pointers of the nodes on either side of it so that they point to each other rather than it.
3. Decrement the size of the list.

You will implement this function. As we've done throughout this notebook, I recommend that you draw out an example to refer to as you implement it. This is a great strategy not only for linked lists, but also for any data structure or algorithm that you are working to comprehend!

```python
def remove(self, element):
    # You will implement this
    pass
```

# Runtime

We have been deep in the details of implementing the operations for a Linked List. Let's step back and consider the runtimes of our operations.



We have covered the following operations:

- Insertion
    - append
    - prepend
    - insert
- contains
- get
- Removal
    - remove_first
    - remove_last
    - remove

#### Append

$O(1)$ because we perform the same fixed number of steps, no matter the size of the list.

#### Prepend

$O(1)$ for the same reason as append.

#### Arbitrary Insertion

$O(n)$ because we have to iterate through the list to perform insertion at an arbitrary index.

#### Contains

$O(n)$ because we have to perform a linear search starting from the head.

#### Access at an index

$O(n)$ because we have to start at the head and iterate to this index.
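
One way to see this growth is to count how many `next` links must be followed to reach the last index of lists of different sizes. This is only a rough sketch; `build_chain` and `count_hops` are helper names invented for the example.

```python
class Node:
    def __init__(self, element):
        self.element = element
        self.next = None
        self.prev = None


def build_chain(n):
    """Hypothetical helper: link the values 0..n-1 and return the head.

    Assumes n >= 1.
    """
    head = Node(0)
    tail = head
    for value in range(1, n):
        node = Node(value)
        tail.next = node
        node.prev = tail
        tail = node
    return head


def count_hops(head, index):
    """Follow `next` links from the head to `index`, counting each hop."""
    node = head
    hops = 0
    while hops < index:
        node = node.next
        hops += 1
    return hops


for n in (10, 100, 1000):
    head = build_chain(n)
    print(n, count_hops(head, n - 1))
```

The number of hops grows in step with the size of the list, which is what $O(n)$ describes.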

#### Remove First

$O(1)$. Like append and prepend, removing the first element only requires the same set number of steps, no matter the size of the list.

#### Remove Last

$O(1)$, same as remove first.

#### General Removal

$O(n)$ because we have to iterate to the element to remove before removing it.

## Strengths and weaknesses

Linked Lists are very efficient for inserting or removing from the beginning or end of the list.

This fact is exploited to build other data structures (Stacks and Queues).
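
As a small preview of why cheap operations at both ends matter, the sketch below shows stack and queue behavior using Python's standard `collections.deque`, which is documented as supporting fast appends and pops at either end. Using `deque` here is a stand-in chosen for illustration; it is not the Linked List class from these notes.

```python
from collections import deque

# Stack: last in, first out. Add and remove at the same end.
stack = deque()
stack.append(1)
stack.append(2)
stack.append(3)
top = stack.pop()

# Queue: first in, first out. Add at one end, remove from the other.
queue = deque()
queue.append("a")
queue.append("b")
queue.append("c")
front = queue.popleft()

print(top, list(stack))     # 3 [1, 2]
print(front, list(queue))   # a ['b', 'c']
```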

Linked Lists are not efficient for inserting, accessing, or removing elements in the middle of the list.


# Linked List Code

Here is the implementation for a Linked List that we have built so far. Implementing the stubbed out methods will be a part of the next lab.

```python
class Node:
    def __init__(self, element):
        """
        Construct a Node

        Parameters
        ----------
        element : AnyType
            An element to be stored in a Linked List
        """
        self.element = element
        # Create two attributes, next and prev and initialize them
        # to be None
        # These are set when a Node is added to a Linked List
        self.next = None
        self.prev = None

    def __str__(self):
        if self.next is None:
            return "[{}]".format(self.element)
        else:
            return "[{}]<->".format(self.element)

class LinkedList:
    def __init__(self):
        self.size = 0
        self.HEAD = None
        self.TAIL = None

    def is_empty(self):
        return self.size == 0

    def __str__(self):
        # guard against an empty list, where HEAD is None
        if self.is_empty():
            return ""
        node = self.HEAD
        list_str = ""
        while True:
            # concatenate the string for each node
            list_str += str(node)
            # if we are at the tail, stop
            if node == self.TAIL:
                break
            # move to the next node
            node = node.next
        return list_str

    def append(self, element):
        node = Node(element)
        if self.is_empty():
            self.HEAD = node
            self.TAIL = node
        else:
            node.prev = self.TAIL
            self.TAIL.next = node
            self.TAIL = node
        self.size += 1

    def prepend(self, element):
        # TO-DO
        # Implement this
        # Insert this element at the HEAD of the list
        pass

    def insert(self, element, index):
        # Handle the special cases of 
        # inserting at index 0
        if index == 0:
            self.prepend(element)
            return
        # or inserting after the tail
        if index == self.size:
            self.append(element)
            return
        
        # otherwise, general case
        # Create the new node
        node = Node(element)
        # iterate to the node before the insertion position.
        # if we want to insert at index 3, we need a reference
        # to the node at index 2
        #  0          1          2          3       
        # [17]  <->  [19]  <->  [23]  <->  [31]
        #                        ⬆︎    ⬆︎
        #                     BEFORE   Insert-Position
        # for ease, use function iterate_to_position
        before = self.iterate_to_position(index-1)
        after = before.next
        # update the pointers.
        before.next = node
        after.prev = node
        node.prev = before
        node.next = after
        # increment the size
        self.size += 1

    def iterate_to_position(self, index):
        # iterate to the node at the given index.
        # if we want a reference to the node at
        # index 2, we start at the HEAD and move
        # forward two spots from the HEAD
        #  0          1          2          3       
        # [17]  <->  [19]  <->  [23]  <->  [31]
        #                        ⬆︎  
        node = self.HEAD
        for i in range(index):
            node = node.next
        return node

    def contains(self, key):
        # guard against an empty list
        if self.is_empty():
            return False
        node = self.HEAD
        while True:
            # check each element
            if node.element == key:
                return True
            # if we are at the tail, stop
            if node == self.TAIL:
                break
            # move to the next node
            node = node.next
        return False

    def get(self, index):
        # TO-DO
        # Implement this
        pass

    def remove_first(self):
        if self.size == 0:
            return # empty list, do nothing
        if self.size == 1:
            self.HEAD = None
            self.TAIL = None
            self.size -= 1
            return
        # move head forward, remembering previous head
        prev_head = self.HEAD
        self.HEAD = self.HEAD.next
        # break pointers
        prev_head.next = None
        self.HEAD.prev = None
        # decrement size
        self.size -= 1

    def remove_last(self):
        # TO-DO
        # Implement this
        # remove the element at the TAIL of the list
        pass

    def remove(self, element):
        # TO-DO
        # Implement this
        # remove the first instance of this element from the list
        pass
```