### Learning Objectives
* To use `<random>` to generate random numbers 
* To use `<chrono>` to measure running times
* To implement sequential and binary search of both arrays

### Instructions
Read and study the following sections, run their code examples and solve their challenges. This worksheet has the following challenges:
* [CHALLENGE 01](#ch01)
* [CHALLENGE 02](#ch02)
* [CHALLENGE 03](#ch03)
* [CHALLENGE 04](#ch04)

Run your coding challenges and fix any errors they might have before downloading and submitting your completed worksheet for grading. When done, open the menu **File >> Download as >> HTML (.html)** to download your worksheet in HTML format. **Submit the downloaded *.html* file via Canvas**.

# Hashing
In the previous worksheet, we saw two ways of searching: sequential search, which takes $O(n)$ time, and binary search, which takes $O(\log n)$ time. Both of these searches are comparison-based. While binary search is efficient, it only works on sorted arrays.

Hashing gives us a way of searching data that is not limited to sorted arrays and is, on average, more efficient than both sequential and binary search. Because comparison-based search cannot do better than $O(\log n)$, hashing must not be comparison-based if it is to do any better than binary search.

So what is hashing? Let's start with an example.

Assume we have the data `3, 5, 7, 4, 2, 10, 12, 8`. We want to store this data in a way that makes it efficient to search. To do that we use a **hash table**, which is simply an array with a fixed capacity. Looking at this data, we see that it has `8` values ranging from `2` to `12`, so an array with enough capacity to hold them can serve as our hash table. For reasons that will become clear later on, we choose a capacity that is a prime number. Let's use `17` for this example.

In [None]:
const int CAPACITY = 17;
int htable[CAPACITY]{0};

The `{0}` at the end of the last statement above instructs C++ to initialize the elements of this table to `0`.
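Picking a prime capacity by hand works for a small example. As a rough sketch, helpers like the hypothetical `isPrime` and `nextPrime` below (they are not part of this worksheet's code) could find the smallest prime that is at least as large as a requested size, using simple trial division. They are meant for small table sizes only.

```cpp
// Hypothetical helpers for choosing a prime table capacity (illustration only).

// Returns true if n is prime, using trial division up to sqrt(n).
bool isPrime(int n) {
    if (n < 2) return false;
    for (int d = 2; d <= n / d; ++d) {   // d <= n / d avoids overflow in d * d
        if (n % d == 0) return false;
    }
    return true;
}

// Returns the smallest prime p with p >= n (intended for small n).
int nextPrime(int n) {
    int p = (n < 2) ? 2 : n;
    while (!isPrime(p)) ++p;
    return p;
}
```

For instance, `nextPrime(14)` yields `17`, the capacity used in this example.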

For this array to work as a hash table, we need a way to map each value (also called a **key**) to the index of the cell where it will be stored. To do that, we use a function that takes a key as an argument and returns its index in the hash table. We call this function a **hash function**. A simple and common hash function uses the modulo operator `%` (the "mod" operator), as shown below.

In [None]:
int hash(int key){
    return key % CAPACITY;
}

Here we are using the "mod" operator `%` to find the index where the given key should be saved in the table. Now we create two functions:
- a function named `put` to add a given key to the hash table.

In [None]:
void put(int key){
    htable[hash(key)] = key;
}

- a function named `contains` to return whether a given key exists in the hash table.

In [None]:
bool contains(int key){
    return htable[hash(key)] == key;
}

Here we are searching for the value of `key`. We apply the hash function to `key` to get its corresponding index in the hash table. Then we look at the table cell at that index and check whether `key` is there. If so, we return `true`; otherwise we return `false`.

This kind of searching is not comparison-based and is very efficient.
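One detail worth knowing about the hash function above: in C++ the result of `%` takes the sign of its left operand, so a negative key would produce a negative index. The sketch below, which assumes keys may be negative, shows one common way to keep the index in range. The name `hashIndex` and its `capacity` parameter are illustrative and not part of this worksheet's code.

```cpp
#include <cassert>

// Maps any int key to an index in [0, capacity).
// Assumes capacity > 0. hashIndex is a hypothetical name used for illustration.
int hashIndex(int key, int capacity) {
    assert(capacity > 0);
    int r = key % capacity;             // may be negative when key is negative
    return (r < 0) ? r + capacity : r;  // shift negative remainders into range
}
```

With a capacity of `17`, `hashIndex(20, 17)` gives `3`, and `hashIndex(-3, 17)` gives `14` rather than `-3`.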

### <a id="ch01">CHALLENGE 01</a>
**Q1**. What is the running time in Big-O notation of the `put` function?

**Q2**. What is the running time in Big-O notation of the `contains` function?

## Displaying the hash table
Having done that, we can now enter the above data into `htable`.

In [None]:
put(3);
put(5);
put(7);
put(4);
put(2);
put(10);
put(12);
put(8);

Let us see what `htable` looks like after putting all the above keys in it. Here is a function to display `htable` in a nice format.

In [None]:
#include <iostream>
#include <iomanip>
#include <string>
void printHTable(){
    std::cout << std::setw(8) << "index: "; 
    for(int i = 0; i < CAPACITY; i++){
        std::cout << std::setw(4) << i;
    }
    std::cout << std::endl << std::setw(8) << ' ' 
              << std::string(4 * CAPACITY + 1, '=') << std::endl;
    std::cout << std::setw(8) << "data: "; 
    for(int i = 0; i < CAPACITY; i++){
        if(htable[i] == 0) {
            std::cout << "|" << std::setw(3) << ' ';
        } else {
            std::cout << "|" << std::setw(3) << htable[i];
        }
    }
    std::cout << "|" << std::endl << std::setw(8) << ' ' 
              << std::string(4 * CAPACITY + 1, '=') << std::endl;
    
}

Now let's call this function to see what `htable` looks like:

In [None]:
printHTable();

Here are a few example search calls.

In [None]:
std::cout << contains(10) << std::endl;
std::cout << contains(6) << std::endl;
std::cout << contains(7) << std::endl;
std::cout << contains(18) << std::endl;
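A caveat of this simple table: `0` doubles as the marker for an empty cell, so `contains(0)` can report `true` even if `0` was never stored. One possible way around that, sketched here with C++17's `std::optional`, is to let each cell record whether it holds a value. The `OptionalTable` type and the `kCapacity` constant are hypothetical names used for illustration, and the sketch assumes non-negative keys.

```cpp
#include <array>
#include <optional>

constexpr int kCapacity = 17;  // same prime capacity as the example above

// Hypothetical variant of the simple table: an empty cell holds no value at all,
// so the key 0 is no longer confused with "nothing stored here".
struct OptionalTable {
    std::array<std::optional<int>, kCapacity> cells{};  // all cells start empty

    static int indexOf(int key) { return key % kCapacity; }  // assumes key >= 0

    void put(int key) { cells[indexOf(key)] = key; }  // still overwrites on collision

    bool contains(int key) const {
        const auto& cell = cells[indexOf(key)];
        return cell.has_value() && *cell == key;
    }
};
```

With this variant, `contains(0)` is `false` on a fresh table and becomes `true` only after `put(0)`.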

### <a id="ch02">CHALLENGE 02</a>
**Q1**. At what cell will `put(28)` store the key value `28`?

**Q2**. At what cell will `put(17)` store the key value `17`?

# Collisions
Having a search function that runs in constant time $O(1)$ is exactly what we are looking for. But will it work beyond the above example? In other words:
- How big should the hash table be compared to the data? 
- Can we store more data than the size of the hash table?
- Can the hash function return the same index for two different keys? 
- How do we handle having more than one key mapped by the hash function to the same index?

Right now the hash table `htable` cannot store more elements than its capacity, that is, more than `17` elements. Also, the above `hash()` function maps all of `3`, `20`, and `37` to the same index. That means running the following three calls:

In [None]:
put(3);
put(20);
put(37);

results in the following table:

In [None]:
printHTable();

Notice that keys `3` and `20` are missing from the table; only `37` is there. This is because `3`, `20`, and `37` were all mapped to the cell at index `3`, and each `put` overwrote the previous key, so the cell holds only the key of the last `put` call. The keys `3` and `20` are lost.
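The loss of keys can also be checked programmatically. The following is a minimal, self-contained sketch that mirrors the simple table in a hypothetical `SimpleTable` struct (the struct and the `survivorsAfterCollisions` helper are illustrative names, not the worksheet's code) and counts how many of the three colliding keys can still be found.

```cpp
#include <array>

// A minimal stand-in for the simple table above (names are illustrative).
struct SimpleTable {
    static constexpr int capacity = 17;
    std::array<int, capacity> cells{};  // 0 marks an empty cell

    static int hash(int key) { return key % capacity; }        // assumes key >= 0
    void put(int key) { cells[hash(key)] = key; }               // overwrites silently
    bool contains(int key) const { return cells[hash(key)] == key; }
};

// Stores the three colliding keys and reports how many are still findable.
int survivorsAfterCollisions() {
    SimpleTable t;
    const int keys[] = {3, 20, 37};
    for (int k : keys) t.put(k);

    int found = 0;
    for (int k : keys) {
        if (t.contains(k)) ++found;
    }
    return found;  // only the last key stored, 37, remains
}
```

Here `survivorsAfterCollisions()` returns `1`, since each `put` replaced the key stored before it.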

We call having two or more keys mapped to the same index in a hash table a **collision**, and we need a way to handle collisions. There are multiple strategies for handling them. In this worksheet, we'll cover two: **chaining** and **linear probing**.

# Chaining (open hashing)
One way of handling collisions is to change the hash table from an array of keys to an array of linked lists. This allows for storing more than one key at the same index. We call this a **chained hash table**. Each linked list is called a **bucket** and is used to store all the keys that map to the same index.

Here is a class named `ChainedHashtable` that uses the built-in C++ doubly linked list class `std::list<>` to implement a chained hash table.

In [None]:
#include <iomanip>   // std::setw
#include <iostream>
#include <list>

template<typename T>
class ChainedHashtable {
private:
  unsigned sz = 0, capacity = 499;
  std::list<T>* htable;
  int hash(T e){ return e % capacity; }
    
public:
  ChainedHashtable(unsigned capacity):  sz(0), capacity(capacity), htable(new std::list<T>[capacity]{}){}
  ChainedHashtable(const ChainedHashtable<T>& c) = delete; 
  ChainedHashtable<T>& operator=(const ChainedHashtable<T>& c) = delete;

  friend std::ostream& operator<<(std::ostream& out, const ChainedHashtable<T>& t){
    for(int i = 0; i < t.capacity; i++){
      out << std::setw(6) << i << ": ";

      for(auto it = t.htable[i].begin(); it != t.htable[i].end(); ++it){
        out << *it << " ";
      }
      out << std::endl;
    } 
    
    return out;
  }

  void put(T e){ htable[hash(e)].push_back(e); sz++; }
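  bool remove(T e){
    auto& bucket = htable[hash(e)];
    auto before = bucket.size();
    bucket.remove(e);                          // erases every element equal to e
    auto removed = before - bucket.size();
    sz -= static_cast<unsigned>(removed);      // only count keys actually removed
    return removed > 0;                        // true if at least one key was removed
  }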
  bool empty(){ return sz == 0; }
  bool full(){ return sz == capacity; }
  int size() { return sz; }
  bool contains(T e) {
    const auto& bucket = htable[hash(e)];  // a reference avoids copying the whole list
    for(auto it = bucket.begin(); it != bucket.end(); ++it){
      if (*it == e) return true;
    }
    
    return  false;
  }

  ~ChainedHashtable(){
      delete[] htable;
  }
};

Notice the use of `= delete` to tell C++ that this class does not support copying. Let's test this class on the above example.

In [None]:
#include <iostream>
ChainedHashtable<int> cht {17};

cht.put(5);
cht.put(7);
cht.put(4);
cht.put(2);
cht.put(10);
cht.put(12);
cht.put(8);
cht.put(3);
cht.put(20);
cht.put(37);

std::cout << cht;

As you can see, there were two collisions: both `20` and `37` collided with the value at index `3` and were added to the same linked list, because the `hash` function maps all three keys to index `3`. Because this table uses linked lists, it can accommodate any number of keys, even more than the capacity of the hash table. Notice also the use of a prime number for the capacity. This helps distribute keys over the whole table instead of clustering them in a few buckets, which is important for performance.
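To make the point about prime capacities concrete, consider keys that follow a pattern, such as multiples of `4`. The sketch below counts how many distinct buckets such keys land in for a given capacity. The helpers `bucketsUsed` and `multiplesOf` are hypothetical names introduced for this illustration, and non-negative keys are assumed.

```cpp
#include <set>
#include <vector>

// Counts how many distinct buckets a set of keys lands in for a given capacity.
// bucketsUsed is a hypothetical helper; assumes keys >= 0 and capacity > 0.
int bucketsUsed(const std::vector<int>& keys, int capacity) {
    std::set<int> used;
    for (int k : keys) used.insert(k % capacity);
    return static_cast<int>(used.size());
}

// Builds the keys 0, step, 2 * step, ... (count keys in total).
std::vector<int> multiplesOf(int step, int count) {
    std::vector<int> keys;
    for (int i = 0; i < count; ++i) keys.push_back(i * step);
    return keys;
}
```

With the 17 keys `0, 4, 8, ..., 64` produced by `multiplesOf(4, 17)`, `bucketsUsed(keys, 16)` is `4` while `bucketsUsed(keys, 17)` is `17`. The prime capacity shares no factor with the step between keys, so every bucket gets used.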

### <a id="ch03">CHALLENGE 03</a>
Given the following chained hash table:

```
0: 7 21 28
1:
2: 9 23 30
3:
4:
5:
6: 13

```

Show what this table will look like after the following code:
```c++
put(44);
put(33);
put(50);
put(20);
```

# Linear probing (closed hashing)

What if we want to save the keys inside the hash table itself, without using linked lists? How do we handle collisions then? We could use **linear probing** for that. Say that you have a key $k$ whose hash is $h_k$. If the cell at index $h_k$ is empty, we store $k$ there and we are done. If, however, the cell at $h_k$ is not empty, we check the cell at index $h_k + 1$; if it is empty we store $k$ in it, else we check index $h_k + 2$, and so on. We keep doing this until we find an empty cell to store $k$ in. If we reach the end of the table before finding one, we wrap around to the beginning of the table and continue checking from index $0$.

Here is a class implementing this.

In [None]:
#include <iomanip>    // std::setw
#include <iostream>
#include <stdexcept>  // std::runtime_error

template<typename T>
struct Cell {
  T info;
  bool empty = true;
};

template<typename T>
class LinearHashtable {
private:
  unsigned sz = 0, capacity = 499;
  Cell<T>* htable;
  int hash(T e){ return e % capacity; }
public:
  LinearHashtable(int capacity):  sz(0), capacity(capacity), htable(new Cell<T>[capacity]{}){}
  LinearHashtable(const LinearHashtable<T>& c) = delete;
  LinearHashtable<T>& operator=(const LinearHashtable<T>& c) = delete;

  friend std::ostream& operator<<(std::ostream& out, const LinearHashtable<T>& t){
    for(int i = 0; i < t.capacity; i++){
      out << std::setw(6) << i << ": ";
      if(!t.htable[i].empty) out << t.htable[i].info;
      out << std::endl;
    } 

    out << std::endl;
    return out;
  }

  void put(T e){
    if(full()) throw std::runtime_error("Table is full.");

    int t = hash(e);
    if(htable[t].empty){
      htable[t] = {e, false};
    } else {
      int i = 1;
      while (!htable[(t + i) % capacity].empty) i++;
      htable[(t + i) % capacity] = {e, false};
    }

    sz++;
  }

  bool remove(T e){
    if(empty()) return false;

    int t = hash(e);
    if(!htable[t].empty && htable[t].info == e){
      htable[t].empty = true;
      sz--;
      return true;
    } else {
      int i = 1;
      while (i < capacity){
        if(!htable[(t + i) % capacity].empty &&  htable[(t + i) % capacity].info == e){
          htable[(t + i) % capacity].empty = true;
          sz--;
          return true;
        }
        
        i++;
      }
      
      return false;
    }
  }

  bool contains(T e) {
    int t = hash(e);
    if(!htable[t].empty && htable[t].info == e){
      return true;
    } else {
      int i = 1;
      while (i < capacity){
        if(!htable[(t + i) % capacity].empty && htable[(t + i) % capacity].info == e) {
          return true;
        }

        i++;
      }
      return  false;
    }
  }

  bool empty(){ return sz == 0; }
  bool full(){ return sz == capacity; }
  int size() { return sz; }

  ~LinearHashtable(){
      delete[] htable;
  }
};

Here is an example program testing this class.

In [None]:
LinearHashtable<int> lht {17};

lht.put(5);
lht.put(7);
lht.put(4);
lht.put(2);
lht.put(10);
lht.put(12);
lht.put(8);
lht.put(3);
lht.put(32);
lht.put(33);

std::cout << lht;

To see linear probing in action, let's put `20` in this table and print it. 

In [None]:
lht.put(20);
std::cout << lht;

We see that `20` was stored in the cell at index `6`. This is because the cell at index `20 % 17 = 3` was not empty, so we checked the cells at indices `4` and `5`, which were not empty either. The next available cell after that was at index `6`, which is where `20` is stored. Similarly, putting `37` will collide with the key at index `3`, and it will therefore be stored in the next available cell after that, which is at index `9`. Let's see that in action.

In [None]:
lht.put(37);
std::cout << lht;

Finally, let's see what happens when we put `50` in this table. `50 % 17` is `16`, which collides with the key currently at index `16`. So where will the key `50` be saved? Let us see.

In [None]:
lht.put(50);
std::cout << lht;

As you can see, `50` is stored in the cell at index `0`: having hit the end of the table (index `16`), linear probing starts over at the beginning of the table (index `0`), which happens to be empty.
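The probing walks seen in the last three `put` calls can be traced with a small helper. The sketch below is an illustration only: `probePath` is a hypothetical function, not a member of `LinearHashtable`, and it represents the table as a vector of flags saying which cells are taken. It assumes non-negative keys.

```cpp
#include <vector>

// Returns the indices visited when inserting `key` with linear probing,
// ending at the first empty cell. occupied[i] is true when cell i is taken.
// Returns an empty path if the table has no free cell. Assumes key >= 0.
std::vector<int> probePath(int key, const std::vector<bool>& occupied) {
    const int capacity = static_cast<int>(occupied.size());
    std::vector<int> path;
    if (capacity == 0) return path;

    const int start = key % capacity;
    for (int i = 0; i < capacity; ++i) {
        const int idx = (start + i) % capacity;  // wrap around past the last cell
        path.push_back(idx);
        if (!occupied[idx]) return path;         // found a free cell
    }
    return {};                                   // every cell is occupied
}
```

For a 17-cell table in which cell `16` is taken and cell `0` is free, `probePath(50, occupied)` visits `16` and then `0`, mirroring the wrap-around described above.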

### <a id="ch04">CHALLENGE 04</a>
Looking at the following linear hash table:

```
0: 7
1: 21
2: 9
3: 29
4: 6
5:
6: 13

```
Type the `put` function calls necessary to generate this hash table.