# Why Data Structures? 
- Data structures separate average programmers from great ones: code is most efficient when the programmer knows which data structure to use.


# Abstract Data Types

An Abstract Data Type (ADT) is an abstraction of a data structure (DS). It provides only the *interface* of the structure, i.e. the operations you must implement if you create one, and says nothing about how they are implemented.

examples:
- ADT: List, DS Implementation: Dynamic Array, Linked List
- ADT: Queue, DS Implementation: Linked-List-based Queue, Array-based Queue, Stack-based Queue
- ADT: Map, DS Implementation: Tree Map, Hash Map, Hash Table
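
In Python, one way to express an ADT is an abstract base class from the standard `abc` module: the base class fixes the interface, and each concrete class is a different data structure behind it. A minimal sketch for the Queue example above; the class names are hypothetical, chosen only for illustration.

```python
from abc import ABC, abstractmethod
from collections import deque


class Queue(ABC):
    """Queue ADT: only the interface (which operations exist), no storage details."""

    @abstractmethod
    def enqueue(self, item): ...

    @abstractmethod
    def dequeue(self): ...

    @abstractmethod
    def is_empty(self): ...


class DequeQueue(Queue):
    """One implementation: items stored in a collections.deque."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        return self._items.popleft()  # oldest item leaves first

    def is_empty(self):
        return not self._items


class TwoStackQueue(Queue):
    """Another implementation: a stack-based queue built from two Python lists."""

    def __init__(self):
        self._inbox = []   # new items are pushed here
        self._outbox = []  # items are popped from here in FIFO order

    def enqueue(self, item):
        self._inbox.append(item)

    def dequeue(self):
        if not self._outbox:
            # reverse the inbox so the oldest item ends up on top of the outbox
            while self._inbox:
                self._outbox.append(self._inbox.pop())
        return self._outbox.pop()

    def is_empty(self):
        return not self._inbox and not self._outbox


for q in (DequeQueue(), TwoStackQueue()):
    for x in (1, 2, 3):
        q.enqueue(x)
    print(type(q).__name__, q.dequeue(), q.dequeue(), q.dequeue())
```

Code written against `Queue` works with either class; only the underlying data structure (and its performance characteristics) changes.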

# Big - O Overview

#### *Time* : 
* Big O notation gives an upper bound on running time as the input size becomes large; it is most often quoted for the worst case.
* Because we only care about large inputs, we ignore constants (multiplicative and additive) and keep only the fastest-growing term:
    - $O(n+c) = O(n)$
    - $O(cn) = O(n), c > 0$
    - if the runtime function is $$f(n) = 7\log(n)^3 + 15n^2 + 2n^3 + 8$$ then the "Big O" of $f(n)$ is $$f(n) = O(n^3)$$
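
To see why the lower-order terms can be dropped, here is a worked bound using the standard definition ($f(n) = O(g(n))$ if $f(n) \le c \cdot g(n)$ for some constant $c > 0$ and all $n \ge n_0$). It assumes $\log$ is base 2, so that $0 \le \log n \le n$ for $n \ge 1$:

$$
\begin{aligned}
f(n) &= 7\log(n)^3 + 15n^2 + 2n^3 + 8 \\
&\le 7n^3 + 15n^3 + 2n^3 + 8n^3 \qquad (n \ge 1) \\
&= 32n^3
\end{aligned}
$$

so the constants $c = 32$ and $n_0 = 1$ show that $f(n) = O(n^3)$.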


*Ex 1*

```python
# O(1) example
# does NOT depend on input size: the loop always runs exactly 11 times
a = 1
b = 2
c = a + 3*b
i = 0

while i < 11:
    i = i + 1
```

In this next example, $f(n) = n$, so the runtime is $O(n)$.

Ex 2

```python
# O(n) example
# runs in linear time in the worst case: the loop body runs n times
n = 100  # input size (variable)
i = 0
while i < n:
    i = i + 1
```


In this next example, $f(n) = n / 3$ (think about why), and the runtime is still $O(n)$, since we ignore multiplicative constants in Big O.

Ex 3

```python
# O(n) example
# runs in linear time in the worst case
n = 100  # input size (variable)
i = 0
while i < n:
    i = i + 3
```
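
One way to check the $n/3$ count is to instrument the loop. The helper `count_iterations` below is a hypothetical name used only for illustration: it runs the same `while` loop with a configurable step and counts how many times the body executes.

```python
import math


def count_iterations(n, step):
    """Count how many times `while i < n: i = i + step` executes its body."""
    i = 0
    iterations = 0
    while i < n:
        i = i + step
        iterations += 1
    return iterations


for n in (10, 100, 1000):
    # Ex 2 uses step 1, Ex 3 uses step 3
    print(n, count_iterations(n, 1), count_iterations(n, 3), math.ceil(n / 3))
```

With step 3 the count is $\lceil n/3 \rceil$, which still grows linearly with $n$, so dropping the constant $1/3$ leaves $O(n)$.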

Remember to drop multiplicative constants. Here is a situation where it is tempting not to, but you should: both of the following code snippets run in $O(n)$.

Ex 5

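```python
# min and max 1: two separate passes over the array
arr = [1, 2, 3, 4, 5]  # n = 5, input array

# start the running max at -infinity and the running min at +infinity
# (Python ints have no fixed maximum, unlike Java's Integer.MAX_VALUE)
max_val = float("-inf")
min_val = float("inf")

for x in arr:  # first pass: n comparisons
    if x > max_val:
        max_val = x

for x in arr:  # second pass: n comparisons
    if x < min_val:
        min_val = x
```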

```python
# min and max 2: one pass, two comparisons per element
arr = [1, 2, 3, 4, 5]  # n = 5, input array

# start the running max at -infinity and the running min at +infinity
max_val = float("-inf")
min_val = float("inf")

for x in arr:
    if x > max_val:
        max_val = x
    if x < min_val:
        min_val = x
```
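
Counting comparisons makes the point concrete. The two functions below are hypothetical instrumented versions of the two min/max snippets; each returns the minimum, the maximum, and how many element comparisons it made.

```python
def min_max_two_pass(arr):
    """Version 1: separate loops for max and min; returns (min, max, comparisons)."""
    max_val, min_val, comparisons = float("-inf"), float("inf"), 0
    for x in arr:
        comparisons += 1
        if x > max_val:
            max_val = x
    for x in arr:
        comparisons += 1
        if x < min_val:
            min_val = x
    return min_val, max_val, comparisons


def min_max_one_pass(arr):
    """Version 2: one loop doing both comparisons; returns (min, max, comparisons)."""
    max_val, min_val, comparisons = float("-inf"), float("inf"), 0
    for x in arr:
        comparisons += 2
        if x > max_val:
            max_val = x
        if x < min_val:
            min_val = x
    return min_val, max_val, comparisons


arr = [3, 1, 4, 1, 5, 9, 2, 6]  # n = 8
print(min_max_two_pass(arr))  # (1, 9, 16)
print(min_max_one_pass(arr))  # (1, 9, 16)
```

Both versions make exactly $2n$ comparisons, so $f(n) = 2n$ in either case and both are $O(n)$: a single loop is not "twice as fast" as two loops.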
        

Ex 6

What is the runtime of the snippet?

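```python
arr = [1, 2, 3, 4, 5]  # n = 5 is the input size
total = 0  # named total to avoid shadowing Python's built-in sum()
prod = 1

for x in range(0, len(arr)):  # first pass: n additions
    total = total + arr[x]

for x in range(0, len(arr)):  # second pass: n multiplications
    prod = prod * arr[x]

print(total, "and", prod)
```

Output:

```
15 and 120
```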


Solution: Even though we are adding, multiplying, and going through the array twice, runtime $f(n) = n + n = 2n$ is $O(n)$. 

Ex 7

Runtime of snippet?

### Amortized Time
- Amortized time describes the cost per operation averaged over a long sequence of operations. It applies to data structures where an operation is very expensive ("bad") every once in a while but cheap in every other case, so the rare expensive cost is spread across the many cheap operations.
- Example of amortized time: an ArrayList is a dynamically resizing array, so whenever it runs out of space, it creates a new array with twice the length of the current array. That means that every so often, adding an element to the ArrayList takes $O(n)$ time, since it has to copy all $n$ elements over to the new array. These copies happen only when the size reaches $1, 2, 4, \dots$, so $n$ appends copy at most $1 + 2 + 4 + \dots + n < 2n$ elements in total, which makes the amortized time $O(1)$ per append.
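
A minimal Python sketch of this doubling strategy, simulating fixed-size storage with a preallocated list. The class name `DynamicArray` and its `copies` counter are illustrative assumptions, not the actual Java `ArrayList`; the counter shows the total copy work staying below $2n$.

```python
class DynamicArray:
    """Minimal growable array that doubles its capacity when full."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None] * self.capacity
        self.copies = 0  # total elements copied during all resizes

    def append(self, value):
        if self.size == self.capacity:
            self._resize(2 * self.capacity)  # the occasional O(n) step
        self.data[self.size] = value  # the usual O(1) step
        self.size += 1

    def _resize(self, new_capacity):
        new_data = [None] * new_capacity
        for i in range(self.size):
            new_data[i] = self.data[i]
            self.copies += 1
        self.data = new_data
        self.capacity = new_capacity

    def __getitem__(self, index):
        if not 0 <= index < self.size:
            raise IndexError("index out of range")
        return self.data[index]


arr = DynamicArray()
for k in range(1000):
    arr.append(k)
print(arr.size, arr.capacity, arr.copies)  # 1000 1024 1023
```

After 1000 appends the array has resized at sizes 1, 2, 4, ..., 512, copying $1 + 2 + \dots + 512 = 1023$ elements: about one copy per append on average.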

### Recursive Runtimes


Ex: What is the runtime of this code snippet?

```python
n = 100  # initial input

def f(n):
    if n < 1:
        return 1
    return f(n - 1) + f(n - 1)
```

Solution: Notice that every call to f makes two more calls to f, until we hit n = 0. Thus, in an imaginary function call tree, each call branches into two, and the total number of function calls is $2^0 + 2^1 + 2^2 + ... + 2^n = 2^{n+1}-1$. After dropping constants (the factor of 2 and the $-1$), the runtime is $O(2^n)$.
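
To check the count empirically, this sketch adds a hypothetical global counter `calls` to `f` and compares it with $2^{n+1} - 1$ for small $n$ (with $n = 100$ the call tree would be far too large to run).

```python
calls = 0


def f(n):
    global calls
    calls += 1  # count every invocation of f
    if n < 1:
        return 1
    return f(n - 1) + f(n - 1)


for n in range(11):
    calls = 0
    f(n)
    print(n, calls, 2 ** (n + 1) - 1)  # the last two columns match
```

Each increase of $n$ by one roughly doubles the number of calls, which is the signature of $O(2^n)$ growth.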

# Arrays :

### Dynamic and Static Arrays

### Dynamic Array Code

# Linked Lists :

### Linked Lists Intro

### Doubly Linked List Code

# Stack :

### Stack Introduction

### Stack Implementation

### Stack Code

# Queues :

### Queue Introduction

### Queue Code

### Priority Queue Introduction

### Priority Queue Min Heaps and Max Heaps

### Priority Queue Inserting Elements

### Priority Queue Removing Elements

### Priority Queue Code

# Union Find : 

### Union Find Introduction

### Union Find Kruskal's Algorithm

### Union Find - Union Find Operations

### Union Find Path Compression

# Searching: 

## Depth - First Search:

## Breadth - First Search:

- Vertex-based technique for finding the shortest path (fewest edges) in an unweighted GRAPH.

- implemented using a Queue (First In, First Out), which makes it explore vertices in order of their distance from the start.
- one vertex is dequeued at a time; each of its unvisited adjacent vertices is marked as visited and stored in the queue.
- not suitable for decision trees (e.g. in games or puzzles), since it visits all neighbors first.

#### Time Complexity:

If $V$ = number of vertices and $E$ = number of edges, BFS runs in $O(V + E)$ when an adjacency list is used and $O(V^2)$ when an adjacency matrix is used.
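
A minimal sketch of BFS for shortest paths, assuming the graph is unweighted and stored as an adjacency list (a Python dict mapping each vertex to a list of neighbors); the function name and the example graph are hypothetical. Each vertex is enqueued at most once and each adjacency list is scanned once, which is where $O(V + E)$ comes from.

```python
from collections import deque


def bfs_shortest_path(graph, start, goal):
    """Return a shortest path (fewest edges) from start to goal, or None.

    graph: adjacency list, a dict mapping each vertex to a list of neighbors.
    """
    visited = {start}
    parent = {start: None}  # remembers how each vertex was first reached
    queue = deque([start])  # FIFO queue of vertices waiting to be expanded

    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            # walk the parent links back to start, then reverse
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = parent[vertex]
            return path[::-1]
        for neighbor in graph.get(vertex, []):
            if neighbor not in visited:
                visited.add(neighbor)  # mark on enqueue so it is queued only once
                parent[neighbor] = vertex
                queue.append(neighbor)
    return None


graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D", "E"],
    "D": ["B", "C", "F"],
    "E": ["C", "F"],
    "F": ["D", "E"],
}
print(bfs_shortest_path(graph, "A", "F"))  # ['A', 'B', 'D', 'F']
```

Several paths from A to F have 3 edges; BFS returns the first one it discovers, which depends on the order of the neighbor lists.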



example problems:
1. [Islands](https://leetcode.com/problems/number-of-islands/discuss/813511/DFS-and-BFS-with-Easy-Explanation)

Sources:
1. [GFG](https://www.geeksforgeeks.org/difference-between-bfs-and-dfs)


# **Trees and Graphs** :
## Tree Definition:
*def* - for all types of trees:
- each tree has a root (node)
- the root node has $c \geq 0$ children
- each child in turn has $c \geq 0$ children of its own
- No cycles. 
- nodes might or might not link back to their parents. 
- nodes might or might not be in a particular order. 
- if a node has no children, it is a "leaf"

```python
# tree definition:
class Node:
    # every node has a name and a list of child Nodes
    def __init__(self, name, nodes):
        self.name = name
        self.nodes = nodes
```

```python
# optional - wrap Node with a Tree class:
class Tree:
    def __init__(self, root: Node):
        self.root = root
```
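
As a quick usage sketch, the snippet below repeats the `Node` class so it runs on its own, builds a small tree, and collects its leaves (nodes with an empty child list). The helper `leaves` is a hypothetical name for this example.

```python
class Node:
    # same shape as the Node class above: a name and a list of child Nodes
    def __init__(self, name, nodes):
        self.name = name
        self.nodes = nodes


def leaves(node):
    """Return the names of all leaves (nodes with no children) under node."""
    if not node.nodes:
        return [node.name]
    result = []
    for child in node.nodes:
        result.extend(leaves(child))
    return result


# root has three children; only "b" has children of its own
root = Node("root", [
    Node("a", []),
    Node("b", [Node("b1", []), Node("b2", [])]),
    Node("c", []),
])
print(leaves(root))  # ['a', 'b1', 'b2', 'c']
```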

## Types of Trees: 
### Binary *Search* Tree:

def: 

- tree which is binary (2 children max per node) 
- ALL left descendants $\leq$ n $<$ ALL right descendants

ex: NOT a BST
![not a bst](fig/img/no-BST.png)
takeaway: ALWAYS make sure you are working with a BST, or clarify what type of tree it is (is it just binary, not a search tree?)
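
The BST definition can be checked in code. A sketch, assuming a hypothetical `BinaryNode` class with `val`, `left` and `right`: it passes down the range each value must fall in, because comparing a node only with its direct children misses cases like a large value deep inside a left subtree.

```python
import math


class BinaryNode:
    # hypothetical binary tree node used only for this sketch
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def is_bst(node, lo=-math.inf, hi=math.inf):
    """True if every left descendant <= node.val < every right descendant."""
    if node is None:
        return True
    # lo: exclusive lower bound from ancestors this node sits to the right of
    # hi: inclusive upper bound from ancestors this node sits to the left of
    if not (lo < node.val <= hi):
        return False
    return is_bst(node.left, lo, node.val) and is_bst(node.right, node.val, hi)


good = BinaryNode(8, BinaryNode(4, BinaryNode(2), BinaryNode(6)), BinaryNode(10))
# 12 sits in the LEFT subtree of 8 but is larger than 8
bad = BinaryNode(8, BinaryNode(4, BinaryNode(2), BinaryNode(12)), BinaryNode(10))
print(is_bst(good), is_bst(bad))  # True False
```

In `bad`, the node 12 passes a parent-only check (12 > 4 as a right child), which is why the bounds from all ancestors are needed.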


### Balanced VS Unbalanced Trees
Any type of tree can be balanced or unbalanced. 
- balanced $\nRightarrow$ (does not imply) perfect tree (L and R subtrees exactly the same size): a balanced tree only needs to be balanced enough to guarantee $O(\log n)$ insert and find.

### Balanced Tree
Operation Runtimes:
- Insert : O(log n)
- Find : O(log n)

Examples:
- Red-Black Trees
- AVL Trees


### Complete Binary Tree

- every level is fully filled, except possibly the last level
- if the last level is not full, it is filled from left to right (nodes as far left as possible). 


### Full Binary Tree

- every node has EITHER 2 OR 0 children. no 1-child nodes. 

### Perfect Binary Tree

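- has the maximum number of nodes for its height: every level is completely filled, every internal node has exactly 2 children, and all leaves are on the same level, so the left and right subtrees have identical shapes. 
- has exactly $2^k - 1$ nodes, where $k$ = number of levels. 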
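
The full and perfect definitions translate directly into checks. A sketch, assuming a hypothetical `BinaryNode` class; `is_perfect` uses the node-count property: a binary tree with $k$ levels has at most $2^k - 1$ nodes, with equality exactly when every level is filled.

```python
class BinaryNode:
    # hypothetical binary tree node used only for this sketch
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def is_full(node):
    """Full: every node has either 0 or 2 children."""
    if node is None:
        return True
    if (node.left is None) != (node.right is None):
        return False  # exactly one child
    return is_full(node.left) and is_full(node.right)


def count_nodes(node):
    return 0 if node is None else 1 + count_nodes(node.left) + count_nodes(node.right)


def levels(node):
    return 0 if node is None else 1 + max(levels(node.left), levels(node.right))


def is_perfect(node):
    """Perfect: a tree with k levels has exactly 2^k - 1 nodes."""
    return count_nodes(node) == 2 ** levels(node) - 1


perfect = BinaryNode(1, BinaryNode(2, BinaryNode(4), BinaryNode(5)),
                     BinaryNode(3, BinaryNode(6), BinaryNode(7)))
full_only = BinaryNode(1, BinaryNode(2, BinaryNode(4), BinaryNode(5)), BinaryNode(3))
print(is_full(perfect), is_perfect(perfect))      # True True
print(is_full(full_only), is_perfect(full_only))  # True False
```

`full_only` is full (no node has exactly one child) but has 5 nodes across 3 levels instead of 7, so it is not perfect.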

### Binary Tree Traversals:

#### In Order (most common):

- visit the left subtree, then the current node, then the right subtree. 
- in a BST, this visits the nodes in ascending order. 


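```python
# in-order traversal of a binary tree
# `node` is any binary tree node with .left and .right attributes;
# `visit` is the function to apply to each node (e.g. print)
def inOrderTraversal(node, visit):
    if node is not None:
        # visit left subtree
        inOrderTraversal(node.left, visit)
        # visit current node
        visit(node)
        # visit right subtree
        inOrderTraversal(node.right, visit)
```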
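
The recursive in-order traversal uses the call stack implicitly. A sketch of the same left, node, right order with an explicit stack (the `BinaryNode` class and `in_order_values` function are hypothetical names for this example) confirms that a BST comes out in ascending order.

```python
class BinaryNode:
    # hypothetical binary tree node used only for this sketch
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def in_order_values(root):
    """Iterative in-order traversal: left subtree, node, right subtree."""
    result = []
    stack = []  # nodes whose left subtree is still being processed
    node = root
    while stack or node is not None:
        # walk as far left as possible, remembering the path
        while node is not None:
            stack.append(node)
            node = node.left
        node = stack.pop()       # leftmost node not yet visited
        result.append(node.val)  # "visit" it
        node = node.right        # then handle its right subtree
    return result


#        8
#      /   \
#     4     10
#    / \      \
#   2   6      20
bst = BinaryNode(8, BinaryNode(4, BinaryNode(2), BinaryNode(6)),
                 BinaryNode(10, None, BinaryNode(20)))
print(in_order_values(bst))  # [2, 4, 6, 8, 10, 20]
```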

#### Pre Order

- visits the current node before visiting its children. 

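```python
# pre-order traversal of a binary tree
# `node` has .left and .right attributes; `visit` is applied to each node
def preOrderTraversal(node, visit):
    if node is not None:
        # visit current node
        visit(node)
        # then visit children, left before right
        preOrderTraversal(node.left, visit)
        preOrderTraversal(node.right, visit)
```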

#### Post Order

### Binary Heaps (Min Heap, Max Heap)


### Tries (Prefix Trees)

## Graphs 