# Data Structures

Think of your computer as a giant set of drawers🗄️, each with its own address. When you want to store an item in memory, you request space, and the computer provides you with an address for that item. 

For multiple items, you typically use two basic structures: arrays and linked lists.

Source/s:
- Grokking Algorithms by Aditya Bhargava

#### Terminologies
- **Index**: The position of an element in a data structure.
- **Read**: Operation to access data in a data structure.
- **Write**: Operation to modify data in a data structure.
- **Search**: Operation to find an element in a data structure.
- **Update**: Operation to change an element in a data structure.
- **Delete**: Operation to remove an element from a data structure.
- **Insert**: Operation to add an element to a data structure.
- **Append**: Operation to add an element to the end of a data structure.

## Access Types
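- **Random Access**: Any element can be reached directly by its index in $O(1)$ time. Arrays support this, which makes their reads faster than reads in linked lists.
- **Sequential Access**: Elements are read one by one, starting from the first. Linked lists only support this kind of access.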

## Arrays
An array is a collection of items stored at **contiguous** memory locations, meaning they are right next to each other. This arrangement allows easy access to elements, as you can calculate the address of any item based on its index.

**Pros of Arrays**:
- **Direct Access**: You can read any item in $O(1)$ time because its address follows from its index. For example, if you have five items starting at address $00$ and each item takes one address, the last item is at address $04$.

**Limitations**:
- **Fixed Size**: If you want to add more elements than the allocated size, you may need to copy everything to a new location.
- **Insertions**: Inserting an item in the middle requires shifting all subsequent elements, which can be cumbersome.
- **Deletions**: Deleting an item involves moving all subsequent elements up to fill the gap.
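
The address calculation and the shifting cost described above can be sketched as follows. This is only an illustration: a Python list stands in for a fixed block of memory, and `BASE_ADDRESS`, `ITEM_SIZE`, `address_of`, and `insert_with_shift` are hypothetical names chosen for this example.

```python
BASE_ADDRESS = 0  # assumed address of the first slot
ITEM_SIZE = 1     # assumed size of one slot, in address units


def address_of(index):
    """Address of the item at `index` in a contiguous array."""
    return BASE_ADDRESS + index * ITEM_SIZE


def insert_with_shift(slots, count, index, value):
    """Insert `value` at `index` in a fixed-size array holding `count` items.

    Returns how many items had to be shifted one slot to the right.
    """
    if count >= len(slots):
        raise OverflowError("array is full; a larger array would be needed")
    shifted = 0
    # Walk backwards from the end so no item is overwritten.
    for i in range(count, index, -1):
        slots[i] = slots[i - 1]
        shifted += 1
    slots[index] = value
    return shifted


slots = [10, 20, 30, 40, None]  # five slots, four in use
moved = insert_with_shift(slots, 4, 1, 15)
print(address_of(4))  # address of the fifth slot
print(slots, moved)
```

Inserting near the front moves almost every item, which is why insertion into an array is $O(n)$ in the worst case.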

## Linked Lists
A linked list is a sequence of elements where each element points to the next, allowing for flexible memory allocation. Because the elements can live anywhere in memory, a linked list never needs one large contiguous block, and it avoids reserving array slots that may never be used.

**Pros of Linked Lists**:
- **Dynamic Size**: You can add as many elements as you want without worrying about the size.
- **Insertions and Deletions**: Once you are at the right position, adding or removing an element is efficient because you only need to update the pointers of the surrounding elements.

**Limitations**:
- **Sequential Access**: You must traverse the list from the beginning to access an element, which can be time-consuming.
- **Memory Overhead**: Each element in a linked list requires additional memory to store the pointer to the next element.

**Common Practice**:
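- It's beneficial to maintain pointers to both the first and last elements of a linked list. This enables $O(1)$ insertion at either end and $O(1)$ deletion of the first element. Deleting the last element in $O(1)$ additionally requires a doubly linked list, because the new last element has to be reachable from the old one.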
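
A minimal sketch of a singly linked list that keeps both a head and a tail pointer. The class and method names (`Node`, `SinglyLinkedList`, `append`, `prepend`, `pop_front`, `get`, `to_list`) are assumptions made for this example, not a standard API.

```python
class Node:
    """One element of the list: a value plus a pointer to the next node."""

    def __init__(self, value):
        self.value = value
        self.next = None


class SinglyLinkedList:
    def __init__(self):
        self.head = None  # first node
        self.tail = None  # last node

    def append(self, value):
        """O(1): link the new node after the current tail."""
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def prepend(self, value):
        """O(1): the new node becomes the head."""
        node = Node(value)
        node.next = self.head
        self.head = node
        if self.tail is None:
            self.tail = node

    def pop_front(self):
        """O(1): remove and return the first value."""
        if self.head is None:
            raise IndexError("pop from empty list")
        node = self.head
        self.head = node.next
        if self.head is None:
            self.tail = None
        return node.value

    def get(self, index):
        """O(n): sequential access, walking from the head."""
        if index < 0:
            raise IndexError("index out of range")
        node = self.head
        for _ in range(index):
            if node is None:
                break
            node = node.next
        if node is None:
            raise IndexError("index out of range")
        return node.value

    def to_list(self):
        """Collect all values in order (handy for printing)."""
        values = []
        node = self.head
        while node is not None:
            values.append(node.value)
            node = node.next
        return values


names = SinglyLinkedList()
names.append("b")
names.append("c")
names.prepend("a")
print(names.to_list())
print(names.get(2))
```

Note that `get` has to walk the chain, which is the sequential-access limitation listed above.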

### Types of Linked Lists
1. **Singly Linked List**: Each element points to the next element.
2. **Doubly Linked List**: Each element points to the next and previous elements.
3. **Circular Linked List**: The last element points back to the first element.

## Hybrid Arrays and Linked Lists
A hybrid data structure combines the benefits of both arrays and linked lists. In this structure, each item in an array can point to a linked list, allowing for dynamic sizing while still enabling efficient access. 

**Benefits of Hybrid Structures**:
- **Flexible Memory Use**: You can store a varying number of elements without wasting memory.
- **Efficient Insertions and Deletions**: You can quickly update the linked list portion for dynamic changes while using the array for faster access.

A practical example: suppose we want to store a list of names. We can use an array with one slot per starting letter, where each slot points to a linked list of the names that begin with that letter.

> An example of a hybrid structure is a **hash table**, where each array element points to a linked list of items.

This hybrid approach is useful when you need the performance benefits of arrays but also require the flexibility of linked lists, especially when handling large datasets with unpredictable sizes.
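
The names example can be sketched like this. It assumes every name starts with an ASCII letter, and it uses plain Python lists as stand-ins for the linked lists; `slots`, `slot_index`, `add_name`, and `names_starting_with` are hypothetical names for this illustration.

```python
import string

# One array slot per letter A-Z; each slot holds the names for that letter.
# Plain Python lists stand in for the linked lists here.
slots = [[] for _ in string.ascii_uppercase]


def slot_index(name):
    """Map a name to the array slot of its first letter (A -> 0, B -> 1, ...)."""
    index = ord(name[0].upper()) - ord("A")
    if not 0 <= index < len(slots):
        raise ValueError("name must start with a letter A-Z")
    return index


def add_name(name):
    slots[slot_index(name)].append(name)


def names_starting_with(letter):
    return slots[slot_index(letter)]


for name in ["Alice", "Bob", "Adit"]:
    add_name(name)

print(names_starting_with("a"))
print(names_starting_with("C"))
```

The array gives a direct jump to the right slot, and only the short list inside that slot has to be scanned.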

# Key Takeaways
- **Arrays** are useful for direct access to elements, but they have a fixed size and can be inefficient for insertions and deletions.
- **Linked lists** are dynamic and efficient for insertions and deletions, but they require sequential access to find elements.
- **Hybrid structures** combine the benefits of arrays and linked lists, providing flexible memory use and efficient operations.


## Stack
- A stack is a collection of elements with two main operations: **push** and **pop**.
- **Push**: Adds an element to the top of the stack.
- **Pop**: Removes the top element from the stack.

**LIFO Principle**: Last In, First Out
- The last element added to the stack is the first one to be removed.
- Example: after pushing 1, 2, and then 3, the stack is `[3, 2, 1]` from top to bottom, so 3 comes out first.

**Common Uses**:
- **Function Calls**: Storing function calls to track the order of execution.
   - We call this the **call stack**.
- **Undo Mechanisms**: Reversing actions by popping the last element.   
    - For example, undoing text edits or closing windows.
- **Expression Evaluation**: Evaluating expressions by converting them to postfix notation.
    - For example, converting infix expressions to postfix expressions.
    - Infix: $a + b$
    - Postfix: $a b +$
- **Backtracking**: Storing states to backtrack to previous decisions.
    - For example, solving mazes or puzzles.
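
As a small illustration of the expression-evaluation use case, here is a sketch that evaluates a postfix expression with a stack. It assumes tokens are separated by spaces and only supports the four basic operators; `evaluate_postfix` and `OPERATORS` are hypothetical names.

```python
import operator

OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate_postfix(expression):
    """Evaluate a space-separated postfix expression such as '3 4 +'."""
    stack = []
    for token in expression.split():
        if token in OPERATORS:
            # The right operand was pushed last, so it is popped first.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


print(evaluate_postfix("3 4 +"))      # 7.0
print(evaluate_postfix("2 3 4 * +"))  # 14.0
```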

**Implementation**:
- You can implement a stack using arrays or linked lists.   
    - **Arrays**: Efficient for fixed-size stacks.
    - **Linked Lists**: Useful for dynamic-size stacks.
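
A minimal array-backed sketch, using a Python list (a dynamic array) as the storage. The `Stack` class and its method names are assumptions for this example; in everyday Python code a plain list with `append` and `pop` is often enough.

```python
class Stack:
    """LIFO stack backed by a Python list; the end of the list is the top."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        if not self._items:
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def is_empty(self):
        return not self._items


stack = Stack()
for number in (1, 2, 3):
    stack.push(number)

print(stack.pop())   # 3, the last element pushed
print(stack.peek())  # 2 is now on top
```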

## Hash Table
- A hash table is a data structure that maps keys to values.

Alright let's dumb this down. I'm certain I'll forget what a hash table is in the future, so let's make it simple.
- Say you have a drawer full of books, and you want to find a specific book.
- If they're not organized, you'd have to search through every book to find the one you want (linear search - `O(n)`).
- If they're organized in alphabetical order, you can quickly find the book you're looking for (binary search - `O(log n)`).
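
The two search strategies can be sketched as follows. `linear_search` and `binary_search` are names chosen for this example, and the book titles are made-up sample data; both functions return the index of the match, or `-1` if there is none.

```python
def linear_search(items, target):
    """Check every item in turn: O(n)."""
    for index, item in enumerate(items):
        if item == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """Halve the search range each step: O(log n). Requires sorted input."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1


books = ["Dune", "Emma", "Hamlet", "Ulysses"]  # already in alphabetical order
print(linear_search(books, "Hamlet"))
print(binary_search(books, "Hamlet"))
```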

We've tackled **Arrays** and **Linked Lists**. 
- We can sort the books in an array
    - **Pros**: Direct access to any book
    - **Cons**: Fixed size, inefficient for insertions and deletions
- If we run binary search on the array, we can quickly find the book we're looking for at `O(log n)` time complexity.
    - **Pros**: Efficient search
    - **Cons**: Inefficient insertions and deletions
> But what if we want to find it at `O(1)` time complexity?

Now, let's introduce the **Hash Table**.
- A hash table is like a drawer full of books, but each book has a unique label.
- When you want to find a book, you look at the label and go straight to the book.
- This way, you can find the book you're looking for in `O(1)` time complexity.

That's magic! 🎩✨ How do we do that?

A hash table uses a **hash function** to convert keys into indices, allowing for direct access to values.
- The hash function must be **consistent**, meaning the same key should always produce the same index.

**How a hash function works**:
1. The first time you store a key, the hash function converts the key into an index.
2. The hash table stores the value at that index.
3. When you want to retrieve the value, the hash function converts the key into the same index.
4. You can then access the value at that index.

The hash function knows how big the hash table is and maps keys to indices within that range.
- It only returns valid indices, from 0 up to the table's size minus 1.
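
The store and retrieve steps can be sketched with Python's built-in `hash()` reduced to the table size with the modulo operator. This is a deliberately naive illustration: `TABLE_SIZE`, `table`, `index_for`, `put`, and `get` are hypothetical names, and two keys that land on the same index would simply overwrite each other here.

```python
TABLE_SIZE = 8
table = [None] * TABLE_SIZE


def index_for(key):
    """Turn any hashable key into a valid index: 0 <= index < TABLE_SIZE."""
    return hash(key) % TABLE_SIZE


def put(key, value):
    table[index_for(key)] = value


def get(key):
    return table[index_for(key)]


put("apple", 0.67)
print(index_for("apple"))  # same index every time within one run
print(get("apple"))
```

For strings, Python randomizes `hash()` between runs by default, so the printed index can differ from run to run while staying consistent within a single run.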

> Hash tables are smarter than arrays and linked lists for lookups by key because they provide direct access to values with `O(1)` time complexity on average.
> In worst-case scenarios, hash tables have a time complexity of `O(n)`. But compared to arrays and linked lists, hash tables are more efficient for most operations.

| Operation | Hash Table (Average) | Hash Table (Worst Case) | Array | Linked List |
| --- | --- | --- | --- | --- |
| Search | O(1) | O(n) | O(n) | O(n) |
| Insert | O(1) | O(n) | O(n) | O(1) |
| Delete | O(1) | O(n) | O(n) | O(1) |

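Hash tables get the best of both worlds:
- As fast as reading an array element by its index: looking up a key takes `O(1)` on average (the array column above refers to searching an unsorted array by value)
- As fast as linked lists for insertions and deletions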




**Hash Function in Action**:

We never have to implement a hash function from scratch. Programming languages provide built-in hash functions that we can use to create hash tables. Whew! 😅

But for the curious minds, here's how a hash function works:
1. It takes a key as input.
2. It performs some calculations on the key.
3. It returns an index based on the calculations, e.g. `hash('apple') = 4`.
4. The hash table stores the value at that index.

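Sample of the built-in hash function in Python:
```python
hash('apple')  # returns an integer; for strings it changes between runs unless PYTHONHASHSEED is set
```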

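Define a sample hash function (named `simple_hash` here so that it does not shadow the built-in `hash()`):
```python
def simple_hash(key):
    # The key's length modulo 10 is always an index in the range 0 to 9.
    return len(key) % 10
```

In this example, the hash function calculates the length of the key and returns the remainder when divided by 10. This way, the hash function maps keys to indices within the range of 0 to 9. It is a poor choice in practice, because all keys of the same length end up at the same index.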

### Collision Handling
- A collision occurs when two keys map to the same index.
- To resolve collisions, you can use different techniques:
    - **Separate Chaining**: Each index stores a linked list of all the key-value pairs that map to it.
    - **Open Addressing**: When a collision occurs, the table looks for another empty slot, using one of these probing strategies:
        - **Linear Probing**: Checking the following slots one by one until a free one is found.
        - **Quadratic Probing**: Using a quadratic function of the attempt number to pick the next slot.
        - **Double Hashing**: Using a second hash function to find the next slot.
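
A rough sketch of open addressing with linear probing. `LinearProbingTable` and its methods are hypothetical names; deletion is left out on purpose, since it needs extra bookkeeping that goes beyond these notes. The usage relies on small integers hashing to themselves in CPython, so the keys 1, 9, and 17 all start at slot 1 of an 8-slot table.

```python
class LinearProbingTable:
    def __init__(self, size=8):
        # Each slot is either None (empty) or a (key, value) pair.
        self._slots = [None] * size

    def _probe(self, key):
        """Yield slot indices starting at the key's home slot, wrapping around."""
        size = len(self._slots)
        start = hash(key) % size
        for step in range(size):
            yield (start + step) % size

    def put(self, key, value):
        for index in self._probe(key):
            slot = self._slots[index]
            if slot is None or slot[0] == key:
                self._slots[index] = (key, value)
                return
        raise OverflowError("table is full")

    def get(self, key):
        for index in self._probe(key):
            slot = self._slots[index]
            if slot is None:
                break  # an empty slot means the key was never stored
            if slot[0] == key:
                return slot[1]
        raise KeyError(key)


prices = LinearProbingTable(size=8)
prices.put(1, "one")
prices.put(9, "nine")        # collides with key 1, moves to the next slot
prices.put(17, "seventeen")  # collides twice, moves two slots along
print(prices.get(9))
```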

The choice of hash function is essential. Ideally, it maps keys evenly across the hash table so that collisions stay rare.
- In Separate Chaining, if one linked list grows too long, lookups in that slot degrade toward `O(n)` and slow down the hash table.
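
A minimal sketch of separate chaining, with Python lists standing in for the linked lists at each index. `ChainedHashTable`, `put`, `get`, and `longest_chain` are hypothetical names; the usage relies on small integers hashing to themselves in CPython, so the keys 1, 5, and 9 all land in bucket 1 of a 4-slot table.

```python
class ChainedHashTable:
    def __init__(self, size=8):
        # One chain (list of (key, value) pairs) per slot.
        self._buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # key already present: overwrite
                return
        bucket.append((key, value))

    def get(self, key):
        # Only the chain at this key's index is scanned.
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)

    def longest_chain(self):
        return max(len(bucket) for bucket in self._buckets)


table = ChainedHashTable(size=4)
table.put(1, "one")
table.put(5, "five")
table.put(9, "nine")
print(table.get(5))
print(table.longest_chain())  # all three keys share one chain
```

The longer the longest chain, the closer a lookup gets to a plain linear search.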


A Hash Table has keys and values.
- **Keys**: Unique identifiers for values.
- **Values**: Data associated with keys.

In Python, we have a built-in data structure called a **dictionary** that acts as a hash table.

```python
basket = dict()

basket["Bawang"] = 20
basket["Sibuyas"] = 10
basket["Kamatis"] = 15
basket["Pitso"] = 150
basket["Toyo"] = 25

print(basket)
```

```
{'Bawang': 20, 'Sibuyas': 10, 'Kamatis': 15, 'Pitso': 150, 'Toyo': 25}
```


```python
# We can easily ask for the value of a key
print("The price of Bawang is Php", basket["Bawang"])
```

```
The price of Bawang is Php 20
```


#### Real-World Applications
Mainly used in lookups and mappings:
- **Phone Books**: Mapping names to phone numbers.
- **Domain Name Systems (DNS)**: Mapping domain names to IP addresses.
- **Caching**: Storing frequently accessed data for quick retrieval.
    - To avoid recalculating the same data, we can store it in a hash table for quick access.
    - This way, we can retrieve the data in `O(1)` time complexity.
    - Websites remember the data instead of recalculating it every time.
    - Makes fetching data faster and less work for the server.
- **Voting Systems**: Mapping voters to only one vote.

```python
# System to check whether a voter already voted

voters = dict()

# When someone votes, we add them to the list.
# If they are already in the list, we tell them they already voted.
# Otherwise, we add them to the list.

def hasVoted(name):
    if name in voters:
        print(name, "has already voted. Cannot vote again.")
    else:
        voters[name] = True
        print(name, "has been added to the list of voters.")

hasVoted("Alice")
hasVoted("Bob")
hasVoted("Alice")
```

```
Alice has been added to the list of voters.
Bob has been added to the list of voters.
Alice has already voted. Cannot vote again.
```


##### Array vs. Hash Table
If we store the voters in an array, checking can eventually become slow:
- Every time someone votes, we need to search through the whole list to check whether they already voted (`O(n)`).

But with a hash table, we can quickly check if they already voted in `O(1)` time complexity.

```python
# Sample Caching System

cache = dict()

def getPage(url):
    if url in cache:
        print("Returning", url, "from cache.")
        return cache[url] # Return the cached page
    else:
        print("Retrieving", url, "from the web.")
        data = getDataFromWeb(url) # Get the data from the web
        cache[url] = data
        return data


def getDataFromWeb(url):
    # Pretend we are getting the data
    return "This is the data from the web for " + url


print(getPage("http://www.google.com"))
print(getPage("http://www.yahoo.com"))
print(getPage("http://www.google.com"))
```

```
Retrieving http://www.google.com from the web.
This is the data from the web for http://www.google.com
Retrieving http://www.yahoo.com from the web.
This is the data from the web for http://www.yahoo.com
Returning http://www.google.com from cache.
This is the data from the web for http://www.google.com
```



### Avoiding Collisions
Collisions can't be ruled out completely, but to keep them rare you need:
- A low load factor
- A good hash function

#### Load factor
Load factor = $\dfrac{\text{number of items in the hash table}}{\text{total number of slots in the hash table}}$
- This measures how full the hash table is. For example, 7 items in 10 slots gives a load factor of 0.7.
- A low load factor means fewer collisions and faster operations.
- Having a load factor of more than 1 means there are more items than slots in the hash table. You may need to **resize** the hash table to accommodate more items.

**Resizing the Hash Table**:
- When the load factor exceeds a certain threshold, you can resize the hash table to accommodate more items.
1. Create a new hash table with more slots.
   - Rule of thumb: Double the number of slots.
2. Rehash all items from the old hash table to the new hash table.
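
The resize steps can be sketched on top of a small table that stores `(key, value)` pairs in a list per slot. Everything here is illustrative: `ResizingHashTable`, `MAX_LOAD_FACTOR`, and the method names are hypothetical, and the starting size of 4 is an arbitrary assumption.

```python
class ResizingHashTable:
    MAX_LOAD_FACTOR = 0.7  # resize once the table is more than 70% full

    def __init__(self, size=4):
        self._buckets = [[] for _ in range(size)]
        self._count = 0

    def slot_count(self):
        return len(self._buckets)

    def load_factor(self):
        return self._count / len(self._buckets)

    def put(self, key, value):
        bucket = self._buckets[hash(key) % len(self._buckets)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self._count += 1
        if self.load_factor() > self.MAX_LOAD_FACTOR:
            self._resize()

    def get(self, key):
        bucket = self._buckets[hash(key) % len(self._buckets)]
        for existing_key, value in bucket:
            if existing_key == key:
                return value
        raise KeyError(key)

    def _resize(self):
        old_items = [pair for bucket in self._buckets for pair in bucket]
        # Step 1: create a new table with double the number of slots.
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        # Step 2: rehash every item, since its index depends on the table size.
        for key, value in old_items:
            self._buckets[hash(key) % len(self._buckets)].append((key, value))


prices = ResizingHashTable(size=4)
for name, price in [("Bawang", 20), ("Sibuyas", 10), ("Kamatis", 15)]:
    prices.put(name, price)

print(prices.slot_count())   # grew from 4 to 8 after the third item
print(prices.load_factor())  # 3 items / 8 slots
```

Rehashing touches every item, so a resize is expensive, but doubling keeps resizes rare enough that insertions stay `O(1)` on average.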

#### Good Hash Function
A good hash function distributes values evenly across the hash table, reducing the likelihood of collisions.
- A bad hash function may map all keys to the same index, causing many collisions.

One well-known family of hash functions is **SHA** (Secure Hash Algorithm). It's a cryptographic hash function that generates a fixed-size hash value from input data of any size.
- It's used in security protocols, digital signatures, and blockchain technology.
- It can also serve as the hash function for a hash table, although a cryptographic hash does more work than a hash table strictly needs.
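
As an illustration, SHA-256 from Python's standard `hashlib` module can be turned into a table index like this. `sha256_index` is a hypothetical helper for this example, and it assumes the keys are strings.

```python
import hashlib


def sha256_index(key, table_size):
    """Map a string key to an index in the range 0..table_size - 1."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()  # always 32 bytes
    return int.from_bytes(digest, "big") % table_size


print(sha256_index("apple", 10))
print(sha256_index("apple", 10) == sha256_index("apple", 10))  # consistent
```

Unlike Python's built-in `hash()` for strings, the SHA-256 digest of a key is the same on every run and every machine.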

### Section Key Takeaways
- **Hash tables** provide direct access to values with `O(1)` time complexity on average (`O(n)` in the worst case).
- They use a **hash function** to map keys to indices within a range.
- **Collision handling** techniques that resolve conflicts include separate chaining and open addressing.
- **Separate chaining** involves storing a linked list of values at each index.
- **Open addressing** involves finding an empty slot when a collision occurs.
- **Hash tables** are good for lookups and mappings, such as phone books, DNS, and caching.
    - ...filtering out duplicates
    - ...caching data to avoid recalculating it
- **Load factor** measures how full the hash table is: the number of items divided by the number of slots.
- **Resizing** the hash table involves creating a new hash table with more slots and rehashing all items.
- **Good hash functions** distribute values evenly across the hash table, reducing collisions.
- You'll almost never have to implement a hash function from scratch, as most programming languages provide built-in hash functions.
- Hash table = Hash function + Array
- Load factor greater than 0.7 means you should resize the hash table.

# Sample 