### Data Structures

Four different ways to store data:
* Arrays
* Linked Lists
* Hash tables
* Tries

Variations on these include trees and heaps, which are similar to tries, and stacks and queues, which are similar to arrays or linked lists.


What are the strengths and weaknesses of each type?

* Arrays
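    * Insertion is bad - lots of shifting to fit an element in the middle
    * Deletion is bad - lots of shifting after removing an element
    * Lookup is great - random access, constant time
    * Relatively easy to sort
    * Relatively small size-wise
    * Stuck with a fixed size, no flexibility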

* Linked lists
    * Insertion is easy
    * Deletion is easy
    * Lookup is bad - have to search linearly
    * Relatively difficult to sort
    * Relatively small size-wise (though larger than arrays)

* Hash tables
    * Insertion is a two-step process - hash, then add
    * Deletion is easy - once you find the element
    * Lookup is on average better than with linked lists
    * Not an ideal data structure for sorting
    * Can run the gamut of sizes - size varies a bunch

* Tries
    * Insertion is complex - a lot of dynamic memory allocation
    * Deletion is easy - just free a node
    * Lookup is fast - not as fast as an array, though
    * Already sorted - sorts as you build in almost all situations
    * Rapidly becomes huge, even with very little data present

### Structures

* Structures provide a way to unify several variables of different types into a single, new variable type which can be given its own type name

* We use structures (structs) to group together elements of a variety of data types that have a logical connection


* We usually define a structure in a separate .h file or at the top of our program, outside of any functions, effectively creating a new type

* We can access fields of this type (also known as members) using the dot "." operator


* Structures do not need to be created on the stack. We can dynamically allocate structures if required

* In order to access fields of our structures in that situation, we first need to dereference the pointer to the structure, and then we can access its fields

* The arrow operator (->) makes this process easier. It's an operator that does two things back-to-back:
    * First, it dereferences the pointer on the left side of the operator
    * Second, it accesses the field on the right side of the operator
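
A minimal sketch of the three access styles described above. The `car` struct, its fields, and the two helper functions are hypothetical names made up for illustration, not something these notes define:

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical struct, used only for illustration.
typedef struct car
{
    int year;
    char model[16];
    double odometer;
}
car;

// Fill in a car that lives on the stack, using the dot operator.
car make_stack_car(void)
{
    car c;
    c.year = 2011;
    strcpy(c.model, "Hatchback");
    c.odometer = 50505.5;
    return c;
}

// Dynamically allocate a car and fill it in through the pointer.
// Returns NULL if malloc fails; the caller must free the result.
car* make_heap_car(void)
{
    car* c = malloc(sizeof(car));
    if (c == NULL)
    {
        return NULL;
    }
    (*c).year = 2011;               // dereference first, then access the field
    strcpy(c->model, "Hatchback");  // the arrow operator does both steps at once
    c->odometer = 50505.5;
    return c;
}
```

`(*c).year` and `c->year` mean exactly the same thing; the arrow form is just easier to read and type.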



### Singly-Linked List 

* A linked list node is a special kind of struct with two members:
    * Data of some data type (int, char, float...)
    * A pointer to another node of the same type.

We can chain elements that we can follow from beginning to end.

```c

typedef struct sllist
{
    VALUE val;
    struct sllist* next;
}
sllnode;
```
Operations to understand:

1. Create a linked list when it doesn't already exist.
2. Search through a linked list to find an element
3. Insert a new node into the linked list
4. Delete a single element from a linked list
5. Delete entire linked list


```c

// Create a linked list
sllnode* create(VALUE val);

// steps involved
// a. Dynamically allocate space for a new sllnode
// b. Check to make sure we didn't run out of memory
// c. Initialize the node's val field.
// d. Initialize the node's next field
// e. Return a pointer to the newly created sllnode

// Linear search in linked list
bool find(sllnode* head, VALUE val);

// Steps involved
// a. Create a traversal pointer pointing to the list's head
// b. If the current node's val field is what we're looking for, report success
// c. If not, set the traversal pointer to the next pointer in the list and go back to step b.
// d. If you've reached the end of the list, report failure.

// Insert a new node into the linked list.

sllnode* insert(sllnode* head, VALUE val);

// Steps involved:
// Dynamically allocate space for a new sllnode
// Check to make sure we didn't run out of memory
// Populate and insert the node at the beginning of the linked list
// Return a pointer to the new head of the linked list

// Delete an entire linked list.

void destroy(sllnode* head);
// Steps involved:
// If you've reached a null pointer, stop
// Delete the rest of the list
// Free the current node
```
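
One way the steps above could be turned into working code. This is a sketch under assumptions: the notes never say what `VALUE` is, so it is taken to be `int` here, and returning `NULL` from `insert` on allocation failure is a choice made for this example.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef int VALUE; // assumption: the notes leave VALUE unspecified

typedef struct sllist
{
    VALUE val;
    struct sllist* next;
}
sllnode;

// Create a one-node list. Returns NULL if we ran out of memory.
sllnode* create(VALUE val)
{
    sllnode* node = malloc(sizeof(sllnode));
    if (node == NULL)
    {
        return NULL;
    }
    node->val = val;
    node->next = NULL;
    return node;
}

// Linear search: walk a traversal pointer from the head to the end.
bool find(sllnode* head, VALUE val)
{
    for (sllnode* trav = head; trav != NULL; trav = trav->next)
    {
        if (trav->val == val)
        {
            return true;
        }
    }
    return false;
}

// Insert at the front and return the new head.
// On allocation failure returns NULL and leaves the old list untouched,
// so the caller should check the result before overwriting its head pointer.
sllnode* insert(sllnode* head, VALUE val)
{
    sllnode* node = create(val);
    if (node == NULL)
    {
        return NULL;
    }
    node->next = head;
    return node;
}

// Recursively free the rest of the list, then the current node.
void destroy(sllnode* head)
{
    if (head == NULL)
    {
        return;
    }
    destroy(head->next);
    free(head);
}
```

Inserting at the front keeps `insert` constant time, because the list never has to be walked to find its end.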




### Doubly-Linked Lists

* Singly-linked lists extend our ability to collect and organize data but have a limitation
    * We can only ever move in one direction through the list

* Consider the implication that would have for trying to delete a node

* A doubly-linked list, by contrast, allows us to move forward and backward through a list, all by simply adding one extra pointer to our struct definition


```c

typedef struct dllist
{
    VALUE val;
    struct dllist* prev;
    struct dllist* next;
}
dllnode;

// In order to work with linked lists effectively, there are a number of operations that we need to understand:

// Create a linked list when it doesn't already exist
// Search through a linked list to find an element
// Insert a new node into the linked list
// Delete a single element from a linked list
// Delete an entire linked list

// Delete a node from a linked list

void delete(dllnode* target);

// Fix the pointers of the surrounding nodes to "skip over" target
// Free target

```
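
A possible implementation of the two deletion steps. As before, `VALUE` is assumed to be `int`, and `dll_insert` is a hypothetical helper added only so there is a list to delete from.

```c
#include <stdlib.h>

typedef int VALUE; // assumption: the notes leave VALUE unspecified

typedef struct dllist
{
    VALUE val;
    struct dllist* prev;
    struct dllist* next;
}
dllnode;

// Hypothetical helper: insert at the head and return the new head
// (NULL if allocation fails, leaving the old list untouched).
dllnode* dll_insert(dllnode* head, VALUE val)
{
    dllnode* node = malloc(sizeof(dllnode));
    if (node == NULL)
    {
        return NULL;
    }
    node->val = val;
    node->prev = NULL;
    node->next = head;
    if (head != NULL)
    {
        head->prev = node; // the old head now has a predecessor
    }
    return node;
}

// Unlink target from its neighbors, then free it.
void delete(dllnode* target)
{
    if (target == NULL)
    {
        return;
    }
    if (target->prev != NULL)
    {
        target->prev->next = target->next; // predecessor skips over target
    }
    if (target->next != NULL)
    {
        target->next->prev = target->prev; // successor skips over target
    }
    free(target);
}
```

Because each node knows its predecessor, no traversal is needed to find it. With this signature the function cannot update the caller's head pointer, so when deleting the first node the caller has to save `target->next` beforehand.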

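* Linked lists, both singly and doubly linked, support extremely efficient insertion and deletion of elements.
    * Inserting at the head, or unlinking a node of a doubly-linked list that we already hold a pointer to, is done in constant time

* The downside is that we lose the ability to randomly access list elements
    * Reaching a particular element takes linear time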

### Hash Tables

* Hash tables combine the random access ability of an array with the dynamism of a linked list

* This means:
    * Insertion, deletion, and lookup can tend toward O(1)

* We're gaining the advantages of both types of data structure.

* To get this performance upgrade, we create a new structure in which the data itself, at insertion time, gives us a clue about where we will later find it.

* The downside is that hash tables are not great at ordering or sorting data, but if that's not necessary, we're all good.

* A hash table is a combination of two things
    * A hash function, which returns a nonnegative integer value called a hash code
    * An array capable of storing data of the type we wish to place into the data structure

* The idea is that we run our data through the hash function, and then store the data in the element of the array represented by the returned hash code


* A good hash function should:
    * Use only the data being hashed
    * Use all of the data being hashed
    * Be deterministic
    * Uniformly distribute data
    * Generate very different hash codes for very similar data


Example of a hash function
```c

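// HASH_MAX is assumed to be the number of slots in the table, defined elsewhere.
unsigned int hash(const char* str)
{
    // Sum as unsigned so the result can never go negative,
    // even on platforms where char is signed.
    unsigned int sum = 0;
    for (int j = 0; str[j] != '\0'; j++)
    {
        sum += (unsigned char) str[j];
    }
    return sum % HASH_MAX;
}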
```

Always cite sources!!!


* A collision happens when two separate pieces of data return the same hash code

* Presumably we want to store both pieces of data in the hash table, so we shouldn't simply overwrite the data that happened to be placed in there first

* We need to find a way to get both elements into the hash table while trying to preserve quick insertion and lookup (linked lists?)

* We can resolve collisions using linear probing
    * If we have a collision, we try to place the data in the next consecutive element in the array (wrapping around to the beginning if necessary) until we find a vacancy

    * That way, if we don't find what we're looking for in the first location, at least hopefully the element is somewhere nearby

* Linear probing is subject to a problem called clustering. Once there's a miss, two adjacent cells will contain data, making it more likely in the future that the cluster will grow.

* Even if we switch to another probing technique, we're still limited. We can only store as much data as we have locations in our array. 
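
A rough sketch of linear probing over a fixed-size table of strings. `HASH_MAX` is assumed to be 10, `probe_insert` and `probe_find` are hypothetical names, and deletion is left out on purpose (it needs extra bookkeeping so that lookups do not stop early at a freed slot).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define HASH_MAX 10 // assumption: the notes do not give a table size

// Additive hash like the one shown earlier in these notes.
unsigned int hash(const char* str)
{
    unsigned int sum = 0;
    for (int j = 0; str[j] != '\0'; j++)
    {
        sum += (unsigned char) str[j];
    }
    return sum % HASH_MAX;
}

// The table stores pointers to caller-owned strings; NULL marks a vacancy.
const char* table[HASH_MAX];

// Try the hashed slot first, then each following slot (wrapping around).
// Returns the slot used, or -1 if the table is full.
int probe_insert(const char* str)
{
    unsigned int start = hash(str);
    for (unsigned int i = 0; i < HASH_MAX; i++)
    {
        unsigned int slot = (start + i) % HASH_MAX;
        if (table[slot] == NULL)
        {
            table[slot] = str;
            return (int) slot;
        }
    }
    return -1;
}

// Follow the same probe sequence; a vacancy means str was never inserted.
bool probe_find(const char* str)
{
    unsigned int start = hash(str);
    for (unsigned int i = 0; i < HASH_MAX; i++)
    {
        unsigned int slot = (start + i) % HASH_MAX;
        if (table[slot] == NULL)
        {
            return false;
        }
        if (strcmp(table[slot], str) == 0)
        {
            return true;
        }
    }
    return false;
}
```

With this hash, `"ab"` and `"ba"` both hash to 5, so inserting them in that order puts `"ab"` in slot 5 and `"ba"` in slot 6. That pair of adjacent occupied slots is the start of a cluster.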


* Resolving collisions: Chaining

* Instead of each element of the array holding just one piece of data, we can have each element hold multiple pieces of data

* If each element of the array is a pointer to the head of a linked list, then multiple pieces of data can yield the same hash code and we'll be able to store it all

* We eliminate clustering with this method

* Insertion into a linked list (and creation, if necessary) is an O(1) operation, since we can always insert at the head.

* For lookup, we need to only search through what is hopefully a small list, since we're distributing what would be one huge list across n lists. 
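
Chaining might look roughly like this. `HASH_MAX` is again assumed to be 10, and `chain_node`, `buckets`, `chain_insert`, `chain_find`, and `chain_clear` are hypothetical names chosen for the sketch.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define HASH_MAX 10 // assumption: the notes do not give a table size

typedef struct chain_node
{
    const char* val;            // caller-owned string
    struct chain_node* next;
}
chain_node;

// Each array element is the head of a (possibly empty) linked list.
chain_node* buckets[HASH_MAX];

// Additive hash like the one shown earlier in these notes.
unsigned int hash(const char* str)
{
    unsigned int sum = 0;
    for (int j = 0; str[j] != '\0'; j++)
    {
        sum += (unsigned char) str[j];
    }
    return sum % HASH_MAX;
}

// Hash, then insert at the head of that bucket's list: O(1).
bool chain_insert(const char* str)
{
    chain_node* node = malloc(sizeof(chain_node));
    if (node == NULL)
    {
        return false;
    }
    unsigned int code = hash(str);
    node->val = str;
    node->next = buckets[code];
    buckets[code] = node;
    return true;
}

// Only the one list selected by the hash code has to be searched.
bool chain_find(const char* str)
{
    for (chain_node* trav = buckets[hash(str)]; trav != NULL; trav = trav->next)
    {
        if (strcmp(trav->val, str) == 0)
        {
            return true;
        }
    }
    return false;
}

// Free every node in every bucket.
void chain_clear(void)
{
    for (int i = 0; i < HASH_MAX; i++)
    {
        while (buckets[i] != NULL)
        {
            chain_node* next = buckets[i]->next;
            free(buckets[i]);
            buckets[i] = next;
        }
    }
}
```

Here `"ab"` and `"ba"` still collide on hash code 5, but both simply end up in the list hanging off `buckets[5]` and no neighboring slot is disturbed.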



### Tries

* We have seen a few data structures that handle the mapping of key-value pairs
    * Arrays: The key is the element index, the value is the data at that location
    * Hash tables: The key is the hash code of the data, the value is a linked list of data hashing to that hash code

* What about a slightly different kind of data structure where the key is guaranteed to be unique, and the value could be as simple as a Boolean that tells you whether the data exists in the structure?

* Tries combine structures and pointers together to store data.

* The data to be searched for in the trie is now a roadmap
    * If you can follow the map from beginning to end, the data exists in the trie
    * If you can't, it doesn't

* Unlike with a hash table, there are no collisions, and no two pieces of data (unless they are identical) have the same path


A Trie is like a tree

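* As an example, suppose the keys are four-digit years and the values are the names of universities founded in those years. In a trie, the paths from a central root node to a leaf node would be labeled with the digits of the year

* Each node on the path from root to leaf could have 10 pointers emanating from it, one for each digit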

* To insert an element into the trie, simply build the correct path from the root to the leaf

```c

typedef struct _trie
{
    char university[20];
    struct _trie* paths[10];
}
trie;
```
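
A sketch of how insertion and lookup could work with this struct, treating the year as a string of digits. The function names (`trie_new`, `trie_insert`, `trie_search`, `trie_free`) are hypothetical, and using an empty `university` string to mean "nothing stored here" is an assumption of this example.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct _trie
{
    char university[20];
    struct _trie* paths[10];
}
trie;

// Allocate a node with an empty name and all ten paths set to NULL.
trie* trie_new(void)
{
    trie* node = malloc(sizeof(trie));
    if (node == NULL)
    {
        return NULL;
    }
    node->university[0] = '\0';
    for (int i = 0; i < 10; i++)
    {
        node->paths[i] = NULL;
    }
    return node;
}

// Build (or follow) the path spelled by the digits of year, then store name.
// Returns false on a non-digit key or allocation failure; nodes already
// created along the way are kept.
bool trie_insert(trie* root, const char* year, const char* name)
{
    if (root == NULL)
    {
        return false;
    }
    trie* trav = root;
    for (int i = 0; year[i] != '\0'; i++)
    {
        if (year[i] < '0' || year[i] > '9')
        {
            return false;
        }
        int digit = year[i] - '0';
        if (trav->paths[digit] == NULL)
        {
            trav->paths[digit] = trie_new();
            if (trav->paths[digit] == NULL)
            {
                return false;
            }
        }
        trav = trav->paths[digit];
    }
    // Copy at most 19 characters and always terminate the string.
    strncpy(trav->university, name, sizeof(trav->university) - 1);
    trav->university[sizeof(trav->university) - 1] = '\0';
    return true;
}

// Follow the roadmap; returns the stored name, or NULL if the path
// breaks off or ends at a node with no name.
const char* trie_search(const trie* root, const char* year)
{
    const trie* trav = root;
    for (int i = 0; trav != NULL && year[i] != '\0'; i++)
    {
        if (year[i] < '0' || year[i] > '9')
        {
            return NULL;
        }
        trav = trav->paths[year[i] - '0'];
    }
    if (trav == NULL || trav->university[0] == '\0')
    {
        return NULL;
    }
    return trav->university;
}

// Free a node and everything below it.
void trie_free(trie* node)
{
    if (node == NULL)
    {
        return;
    }
    for (int i = 0; i < 10; i++)
    {
        trie_free(node->paths[i]);
    }
    free(node);
}
```

Inserting `"1636"` and `"1701"` shares the node for the leading `1` and then branches, which is why no two different keys can ever end at the same node. It also shows where the size cost comes from: every node carries ten pointers, most of them `NULL`.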

