### Hash Function
A hash function is any function that maps data of arbitrary size to a fixed-size value. The value returned by such a function is called a *hash value*, *hash*, or *digest*. Hash values are commonly used to index a *hash table*. A good hash function has the following properties:
- it is deterministic: the same input always produces the same hash value
- equal inputs therefore have the same hash; unequal inputs should, as far as possible, have different hashes
- it is uniform: hash values are spread evenly over the output range
- its output has a fixed size
- ideally (and necessarily for cryptographic use) it is non-invertible, i.e. given a hash one cannot feasibly determine the input that produced it
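
The properties above can be observed on a real hash function. The following is a minimal sketch that uses SHA-256 from Python's standard `hashlib` module; the helper name `sha256_hex` is made up for this illustration, and SHA-256 is chosen only as a convenient example, not because hash tables normally use it.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

short_digest = sha256_hex(b"a")
long_digest = sha256_hex(b"a" * 10_000)

# Fixed-size output: both digests are 64 hex characters (256 bits),
# regardless of the input length.
print(len(short_digest), len(long_digest))

# Deterministic: hashing the same input again gives the same digest.
print(sha256_hex(b"a") == short_digest)

# These two different inputs produce different digests.
print(short_digest == long_digest)
```

Running this prints `64 64`, `True`, and `False`.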

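A sample hash function: Java computes a string's hash as a polynomial in base 31 over its characters. The Python function below reproduces that formula (Java additionally wraps the result to a 32-bit integer, which this version does not):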

In [1]:
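def hash_string(text):
    # Polynomial hash in base 31: text[0]*31**(n-1) + ... + text[n-1]*31**0
    result = 0
    for ch in text:
        result = result * 31 + ord(ch)  # Horner's rule, same value as the explicit sum
    return result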

hash_string('ABC')

64578
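
Java's `int` arithmetic wraps around at 32 bits, whereas Python integers grow without limit, so the two results can differ for long strings. The sketch below shows one way to mimic the wrapping. The name `java_style_hash` is hypothetical, and the code assumes every character fits in a single UTF-16 code unit, so that `ord` matches the value Java would use.

```python
def java_style_hash(text):
    """Base-31 polynomial hash wrapped to a signed 32-bit integer."""
    h = 0
    for ch in text:
        # Keep only the low 32 bits after each step, as 32-bit overflow would.
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed one.
    return h - 0x1_0000_0000 if h >= 0x8000_0000 else h

print(java_style_hash("ABC"))      # 64578: small enough that nothing wraps
print(java_style_hash("a" * 100))  # always stays within the signed 32-bit range
```

For short strings such as `'ABC'` the wrapped and unwrapped hashes agree; they diverge only once the polynomial exceeds the 32-bit range.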

### Hash Table
A hash table is a data structure that maps keys to values. The key is hashed, and the hash is reduced to an index into an array of buckets; the value is stored at that index. The image below represents a hash table:

![hash table](https://upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Hash_table_3_1_1_0_1_0_0_SP.svg/315px-Hash_table_3_1_1_0_1_0_0_SP.svg.png)

### Collisions
Different inputs can have the same hash. This is called a collision. For example:

In [2]:
print('Hash for Aa is ', hash_string('Aa'))
print('Hash for BB is ', hash_string('BB'))

Hash for Aa is  2112
Hash for BB is  2112


Collisions also occur when the hashes differ, because the hash is reduced to an array index. In a typical hash table the index is a function of the key and the array size,
$$index = f(key, array\_size)$$
and it is calculated in the following two steps:
$$hash = hash\_func(key)$$
$$index = hash \% array\_size$$
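
The two steps can be sketched in Python as follows. `hash_func` here is the base-31 string hash used earlier in this section, repeated so that the example is self-contained; `bucket_index` and the array size of 16 are choices made only for this illustration.

```python
def hash_func(key):
    """Base-31 polynomial hash over the characters of a string key."""
    result = 0
    for ch in key:
        result = result * 31 + ord(ch)
    return result

def bucket_index(key, array_size):
    """Step 1: hash the key. Step 2: reduce the hash modulo the array size."""
    return hash_func(key) % array_size

for key in ("ABC", "Aa", "BB", "Aq"):
    print(key, hash_func(key), bucket_index(key, 16))
```

With 16 buckets, `'Aa'` and `'BB'` land at index 0 because their hashes are equal (2112), while `'Aq'` has a different hash (2128) and still lands at index 0, since 2128 and 2112 differ by a multiple of 16. `'ABC'` (64578) lands at index 2.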

The **load_factor** of a hash table is $load\_factor = \frac{n}{k}$, where $n$ is the number of stored entries and $k$ is the total number of buckets (array_size). The higher the load factor, the more likely collisions become.
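
As a small worked example of the formula, the sketch below computes the load factor for a table with 8 buckets at three fill levels. The function name `load_factor` and the chosen numbers are illustrative only.

```python
def load_factor(entry_count, bucket_count):
    """Return n / k for a table with n stored entries and k buckets."""
    if bucket_count <= 0:
        raise ValueError("bucket_count must be positive")
    return entry_count / bucket_count

print(load_factor(6, 8))    # 0.75
print(load_factor(8, 8))    # 1.0: every bucket holds one entry on average
print(load_factor(12, 8))   # 1.5: only possible if a bucket can hold several entries
```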

The following methods are used to resolve collisions:
- **separate chaining:** each bucket contains a linked list of all entries whose keys map to that bucket. The cost of a lookup depends on the average number of keys per bucket. The worst case is when all items are stored in the same bucket, which is equivalent to searching a plain list. Other data structures, such as a self-balancing BST, can be used in place of the linked list.
![separate chaining](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/Hash_table_5_0_1_1_1_1_1_LL.svg/450px-Hash_table_5_0_1_1_1_1_1_LL.svg.png)

- **open addressing:** on a collision, we probe other buckets until an empty one is found. The drawback is that the number of entries that can be stored is at most the number of buckets. The next bucket can be found in the following ways:
    - **linear probing:** on a collision, look at the next bucket, then the one after it, and so on until a vacant bucket is found
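    - **quadratic probing:** the offset from the original index grows quadratically with the number of attempts (for example 1, 4, 9, ...) instead of advancing one bucket at a time
    - **double hashing:** the step between probes is computed by a second hash function applied to the key, so keys that collide on the first hash usually follow different probe sequences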
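
Both approaches can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: `ChainedHashTable` and `probe_sequence` are hypothetical names, the chaining table uses Python lists in place of linked lists, and the probing helper uses the common textbook offsets (`i` for linear, `i * i` for quadratic, `i * step` for double hashing, where `step` stands in for the result of a second hash function).

```python
class ChainedHashTable:
    """Separate chaining: each bucket holds a list of (key, value) pairs."""

    def __init__(self, array_size=8):
        self.buckets = [[] for _ in range(array_size)]

    def _index(self, key):
        # Python's built-in hash() plays the role of hash_func here.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # key already present: overwrite
                return
        bucket.append((key, value))       # new key: extend the chain

    def get(self, key):
        # Lookup cost is proportional to the length of this one chain.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


def probe_sequence(start, array_size, strategy, step=1, limit=5):
    """Return the first `limit` bucket indices an open-addressing probe visits."""
    indices = []
    for i in range(limit):
        if strategy == "linear":
            offset = i
        elif strategy == "quadratic":
            offset = i * i
        elif strategy == "double":
            offset = i * step  # `step` would come from a second hash of the key
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        indices.append((start + offset) % array_size)
    return indices


table = ChainedHashTable(array_size=8)
table.put(1, "one")
table.put(9, "nine")      # 9 % 8 == 1, so this collides with key 1
print(table.buckets[1])   # [(1, 'one'), (9, 'nine')]
print(table.get(9))       # nine

print(probe_sequence(3, 11, "linear"))          # [3, 4, 5, 6, 7]
print(probe_sequence(3, 11, "quadratic"))       # [3, 4, 7, 1, 8]
print(probe_sequence(3, 11, "double", step=5))  # [3, 8, 2, 7, 1]
```

Integer keys are used in the demonstration because Python hashes small integers to themselves, which makes the collision between keys 1 and 9 predictable.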