## Hash Function
A hash function is any function that maps data of arbitrary size to a fixed-size value. The value returned by such a function is called a *hash value*, *hash* or *digest*. Hash values are commonly used in conjunction with a *hash table*. A good hash function has the following properties:
- it is deterministic: it always returns the same hash value for the same input
- equal inputs therefore have the same hash, while unequal inputs should, as far as possible, have different hashes
- it is uniform: hash values are spread evenly over the output range
- its output has a fixed size
- it should be non-invertible, i.e. given a hash one cannot determine the input used to generate it

A sample hash function: In Java a string's hash is calculated in the following manner:

In [7]:
public int hash(String input) {
    int h = 0;
    for (int i = 0; i < input.length(); i++) {
        h = 31 * h + (int) input.charAt(i);
    }
    return h;
}

System.out.println(hash("ABC"));
System.out.println(hash("abc"));

64578
96354


Which is essentially $hash = s[0] \times 31^{n-1} + s[1] \times 31^{n-2} + \dots + s[n-1]$
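
As a quick sanity check, the polynomial can be evaluated term by term and compared with the loop-based version and with Java's built-in `String.hashCode()`. This is only a sketch: the class name `PolynomialHashCheck` and the method `polynomialHash` are made up for illustration.

```java
final class PolynomialHashCheck {
    // Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] term by term.
    // int arithmetic is used so that overflow wraps around exactly as it
    // does in the loop-based version.
    static int polynomialHash(String s) {
        int n = s.length();
        int sum = 0;
        for (int i = 0; i < n; i++) {
            int power = 1;
            for (int p = 0; p < n - 1 - i; p++) {
                power *= 31;
            }
            sum += s.charAt(i) * power;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(polynomialHash("ABC")); // 64578
        System.out.println("ABC".hashCode());      // 64578
        System.out.println(polynomialHash("abc") == "abc".hashCode()); // true
    }
}
```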

## Hash Table
A hash table is a data structure that maps keys to values. The key is hashed, and the hash acts as the index at which the value is stored. The image below represents a hash table:

![hash table](https://upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Hash_table_3_1_1_0_1_0_0_SP.svg/315px-Hash_table_3_1_1_0_1_0_0_SP.svg.png)

### Collisions
It is possible that different inputs may have the same hash, for example,

In [8]:
System.out.println("Hash for Aa is " + hash("Aa"));
System.out.println("Hash for BB is " + hash("BB"));

Hash for Aa is 2112
Hash for BB is 2112
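
The collision can be verified by hand with the polynomial form of the hash, using the character codes `A` = 65, `a` = 97 and `B` = 66:

$$hash(Aa) = 65 \times 31 + 97 = 2015 + 97 = 2112$$
$$hash(BB) = 66 \times 31 + 66 = 2046 + 66 = 2112$$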


There are several methods to resolve collisions. In a typical hash table the index is calculated in the following two steps:
$$index = f(key, array\_size)$$
<hr>
$$hash = hash\_func(key)$$
$$index = hash \% array\_size$$
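
One detail to watch in Java: a hash value can be negative, and `%` then yields a negative result that cannot be used as an array index. A minimal sketch of the second step, assuming `Math.floorMod` is an acceptable way to normalise the result (the class `BucketIndex` and method `indexFor` are hypothetical names):

```java
final class BucketIndex {
    // Maps any int hash, including a negative one, to a bucket in [0, arraySize).
    static int indexFor(int hash, int arraySize) {
        return Math.floorMod(hash, arraySize);
    }

    public static void main(String[] args) {
        int arraySize = 16;
        System.out.println(-7 % arraySize);                        // -7, not a valid index
        System.out.println(indexFor(-7, arraySize));               // 9
        System.out.println(indexFor("abc".hashCode(), arraySize)); // 96354 % 16 = 2
    }
}
```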

The **load_factor** of a hashtable is $load\_factor = \frac{n}{k}$, where $n$ is the number of occupied entries and $k$ is the total number of buckets (array_size).

The following methods are used to resolve collisions:
- **separate chaining:** each bucket contains a linked list of all entries that map to the same index. The cost of a lookup depends on the average number of keys per bucket. In the worst case all items are stored in the same bucket, which is equivalent to searching a plain list. Other data structures, such as a self-balancing BST, can be used instead of a linked list. Java's `HashMap` uses this technique.
![separate chaining](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/Hash_table_5_0_1_1_1_1_1_LL.svg/450px-Hash_table_5_0_1_1_1_1_1_LL.svg.png)

- **open addressing:** on a collision, we probe other buckets until an empty one is found. The drawback is that the number of entries that can be stored is at most the number of buckets. The next bucket can be chosen in the following ways:
    - **linear probing:** on a collision look at the next bucket, then the one after it, and so on until a vacant bucket is found. `IdentityHashMap` uses this technique.
    - **quadratic probing:** on a collision look at buckets at quadratic distances ($h, h+1^2, h+2^2, h+3^2, \dots$)
    - **double hashing:** a second hash function decides the step size between probes
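
To make the three probing strategies concrete, the sketch below lists the first few buckets each one would visit. It assumes a non-negative home bucket `h`, and the `step` argument stands in for the value a second hash function would produce; the class and method names are hypothetical.

```java
import java.util.Arrays;

final class ProbeSequences {
    // Linear probing: h, h+1, h+2, ... (mod size)
    static int[] linear(int h, int size, int count) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) out[i] = (h + i) % size;
        return out;
    }

    // Quadratic probing: h, h+1^2, h+2^2, ... (mod size)
    static int[] quadratic(int h, int size, int count) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) out[i] = (h + i * i) % size;
        return out;
    }

    // Double hashing: h, h+step, h+2*step, ... (mod size)
    static int[] doubleHashing(int h, int step, int size, int count) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) out[i] = (h + i * step) % size;
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(linear(3, 7, 4)));           // [3, 4, 5, 6]
        System.out.println(Arrays.toString(quadratic(3, 7, 4)));        // [3, 4, 0, 5]
        System.out.println(Arrays.toString(doubleHashing(3, 2, 7, 4))); // [3, 5, 0, 2]
    }
}
```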

### Chained Hash Table
![Chained Hash Table](images/XVb6Pnn.png)

In the above example, $n = 14$ is the current occupancy and $t = 16$ is the size of the array. The hash value of a data item $x$ is $y = hash(x)$, with $y \in \{0, 1, \dots, t-1\}$.  
To keep the lists from getting long, we maintain $n \le t$.

In [None]:
List<T>[] table;
int n;	// Number of items in the table

public boolean add(T x) {
    // Return false if the element already exists
    if(find(x))
        return false;

    // We follow the below rule so that the lists do not become too long.
    // Or we can resize based on predecided load factor.
    if(n + 1 > table.length)
        resize(); // This method resizes the table array and reinserts all values

    table[hash(x)].add(x);
    n++;

    return true;
}

public T remove(T x) {
    // Iterator is used because we are modifying list
    // during iteration
    Iterator<T> iterator = table[hash(x)].iterator();
    while(iterator.hasNext()) {
        T temp = iterator.next();
        if(temp.equals(x)) {
            iterator.remove();
            n--;
            return temp;
        }
    }
    
    return null;
}

public boolean find(T x) {
    for(T t: table[hash(x)]) {
        if(t.equals(x)) {
            return true;
        }
    }

    return false;
}
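
The snippet above leaves `hash` and `resize` undefined. One way to fill them in is sketched below as a small self-contained class. The class name `ChainedHashSet`, the initial size of 4, the doubling policy and the use of `Math.floorMod` on `hashCode()` are all assumptions, not part of the original code. A `List<List<T>>` is used instead of a generic array to avoid unchecked casts.

```java
import java.util.ArrayList;
import java.util.List;

final class ChainedHashSet<T> {
    private List<List<T>> table = newTable(4); // assumed initial bucket count
    private int n;                             // number of items in the table

    private static <E> List<List<E>> newTable(int size) {
        List<List<E>> t = new ArrayList<>(size);
        for (int i = 0; i < size; i++) t.add(new ArrayList<>());
        return t;
    }

    // Bucket index in [0, table.size()); floorMod handles negative hashCode values.
    private int hash(T x) {
        return Math.floorMod(x.hashCode(), table.size());
    }

    boolean find(T x) {
        return table.get(hash(x)).contains(x);
    }

    boolean add(T x) {
        if (find(x)) return false;
        if (n + 1 > table.size()) resize(); // keep n <= number of buckets
        table.get(hash(x)).add(x);
        n++;
        return true;
    }

    // Doubles the number of buckets and reinserts every item, because the
    // bucket index of an item depends on the table size.
    private void resize() {
        List<List<T>> old = table;
        table = newTable(2 * old.size());
        for (List<T> bucket : old) {
            for (T x : bucket) {
                table.get(hash(x)).add(x);
            }
        }
    }

    int size() { return n; }

    int buckets() { return table.size(); }
}
```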

### Linear Hash Table
In a linear hash table, an item $x$ is inserted as follows:
1. Check whether position $i = hash(x)$ is vacant; if it is, insert the value at that index
2. If that position is already occupied, try to store the value at $(i + 1)\ \%\ table.length$
3. If that index is also occupied, go to $(i + 2)\ \%\ table.length$
4. Keep incrementing until a vacant position is found

Whenever we remove an item from the hash table, we replace it with a dummy value `del`, which indicates that the index was previously occupied. In total, the table stores three types of values:
- data values: actual values in the USet that we are representing;
- null values: at array locations where no data has ever been stored; and
- del values: at array locations where data was once stored but has since been deleted.

To illustrate why the `del` value is needed, consider a hash table of length 7 with `hash = key % 7`. The values 10, 17 and 24 all hash to index 3, so they collide and are stored as `_ _ _ 10 17 24 _`. If we delete 17 by resetting its slot to `null`, we get `_ _ _ 10 _ 24 _`. A search for 24 now starts at index 3, moves past 10, reaches the `null` at index 4 and stops, so 24 is never found. With a `del` marker in that slot, the search keeps probing and reaches 24.
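
The effect can be reproduced with a tiny experiment. The sketch below assumes non-negative `int` keys, `hash = key % 7`, and two made-up sentinel values `EMPTY` and `DEL` that play the roles of `null` and `del`; it also assumes the table always has a free slot. It is an illustration only, not the implementation used in this section.

```java
final class TombstoneDemo {
    static final int EMPTY = -1; // slot has never held data (plays the role of null)
    static final int DEL = -2;   // slot held data that was deleted (plays the role of del)

    static int[] emptyTable(int size) {
        int[] table = new int[size];
        java.util.Arrays.fill(table, EMPTY);
        return table;
    }

    // Linear probing insert; assumes the table still has a free slot.
    static void insert(int[] table, int key) {
        int i = key % table.length;
        while (table[i] != EMPTY && table[i] != DEL) {
            i = (i + 1) % table.length;
        }
        table[i] = key;
    }

    // Probes until an EMPTY slot is met or every slot has been inspected.
    static boolean find(int[] table, int key) {
        int i = key % table.length;
        for (int probes = 0; probes < table.length && table[i] != EMPTY; probes++) {
            if (table[i] == key) return true;
            i = (i + 1) % table.length;
        }
        return false;
    }

    // Overwrites the slot holding key with the given marker (EMPTY or DEL).
    static void remove(int[] table, int key, int marker) {
        int i = key % table.length;
        for (int probes = 0; probes < table.length && table[i] != EMPTY; probes++) {
            if (table[i] == key) {
                table[i] = marker;
                return;
            }
            i = (i + 1) % table.length;
        }
    }

    public static void main(String[] args) {
        int[] withoutDel = emptyTable(7);
        int[] withDel = emptyTable(7);
        for (int key : new int[]{10, 17, 24}) {
            insert(withoutDel, key);
            insert(withDel, key);
        }
        remove(withoutDel, 17, EMPTY);
        remove(withDel, 17, DEL);
        System.out.println(find(withoutDel, 24)); // false: the search stops at the hole
        System.out.println(find(withDel, 24));    // true: the search skips the tombstone
    }
}
```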

In a linear hash table we maintain that $table.length \ge 2q$, where $q$ is the number of data or del values.

In [None]:
T[] table;
int n; // total number of filled spots
int q; // total number of filled or del spots
T del = (T) new Object();

public boolean find(T x) {
    int i = hash(x);
    int start = i;
    
    while (table[i] != null) {
        if (table[i] != del && table[i].equals(x))
            return true;
        i = (i + 1) % table.length;
        if (i == start) break;
    }

    return false;
}

public boolean add(T x) {
    if(table.length < 2 * ( q + 1))
        resize();

    int i = hash(x);
    int firstDel = -1;

    // Probe the table
    while (table[i] != null) {
        if (table[i] != del && table[i].equals(x)) {
            // Key already exists
            return false;
        }
        
        if (table[i] == del && firstDel == -1) {
            firstDel = i; // remember first tombstone
        }
        
        i = (i + 1) % table.length;
    }

    // Insert position: del if found, else current null
    if (firstDel != -1) {
        i = firstDel;
    } else {
        q++; // new occupied slot
    }

    table[i] = x;
    n++; // increment actual keys count
    return true;
}

public T remove(T x) {
    int i = hash(x);
    
    while (table[i] != null) {
        if (table[i] != del && table[i].equals(x)) {
            table[i] = del; // mark del tombstone
            n--;            // decrement actual keys
            return x;
        }
        i = (i + 1) % table.length;
    }

    // Key not found
    return null;
}

## Bloom Filter
A Bloom filter is a probabilistic data structure that tells us whether an element may be present in a set or is definitely not present. We add elements to the Bloom filter and later check whether an element is present. The result of this check can be:
- true indicating that the element may be present
- false indicating that the element is certainly not present

[More details](https://www.enjoyalgorithms.com/blog/bloom-filter)

**Probability calculation**: let
- $n$ be the length of the Bloom filter (number of slots in the array)
- $k$ be the number of hash functions
- $m$ be the current occupancy (number of elements added to the filter).

Assuming each hash function selects an index with uniform probability, the probability that a specific bit is not picked by any of the $k$ hash functions while inserting a single element is:
$$p = (1 - \frac{1}{n})^k$$

The probability that a specific bit is not picked by any of the $k$ hash functions during the insertion of $m$ elements is:
$$p = (1 - \frac{1}{n})^{km}$$

The probability that a specific bit has been set to 1 is
$$p = 1 - (1 - \frac{1}{n})^{km}$$

A false positive occurs when we check an element that was never added, but all $k$ bits corresponding to its hash values happen to be set to 1. The probability of this is:
$$p = (1 - (1 - \frac{1}{n})^{km})^k$$

This is called the error rate or false positive probability. As we can see, increasing $n$ reduces the error rate at the expense of space. A very simple implementation of a Bloom filter is given below:

In [None]:
public class BloomFilter {
    private final byte[] filter;
    private final int n;
    private final int k;
    private int m;

    public BloomFilter(int n, int k) {
        filter = new byte[n];
        this.n = n;
        this.k = k;
    }

    public void addElement(String input) {
        for (int i : getBloomFilterIndices(input)) {
            filter[i] = 1;
        }

        m++;
    }

    public boolean contains(String input) {
        for (int i: getBloomFilterIndices(input)) {
            if (filter[i] == 0) {
                return false;
            }
        }

        return true;
    }

    private int[] getBloomFilterIndices(String input) {
        int[] indices = new int[k];
        int hash1 = input.hashCode();
        // Simple secondary hash derived from hash1
        int hash2 = (hash1 >>> 16) | (hash1 << 16);

        for (int i = 0; i < k; i++) {
            // Use double hashing: (h1 + i * h2) % n
            int combinedHash = hash1 + i * hash2;
            indices[i] = Math.abs(combinedHash % n);
        }

        return indices;
    }
}
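
The false positive formula derived earlier can also be evaluated numerically to get a feel for how $n$ affects the error rate. The helper below is a small sketch; the class name `BloomFilterMath` and the sample values of $n$, $k$ and $m$ are chosen only for illustration.

```java
final class BloomFilterMath {
    // Evaluates (1 - (1 - 1/n)^(k*m))^k for a filter of length n,
    // k hash functions and m inserted elements.
    static double falsePositiveRate(int n, int k, int m) {
        double bitStillZero = Math.pow(1.0 - 1.0 / n, (double) k * m);
        return Math.pow(1.0 - bitStillZero, k);
    }

    public static void main(String[] args) {
        // Same k and m, two filter lengths: the longer filter has the lower error rate.
        System.out.printf("%.4f%n", falsePositiveRate(1000, 3, 100)); // about 0.017
        System.out.printf("%.4f%n", falsePositiveRate(2000, 3, 100)); // about 0.0027
    }
}
```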

## Problems
**Q 1:** Given an array find if there exists a subarray such that the sum of elements of that subarray equals zero.  
**Answer:** A naive approach is to go through all the subarrays, but there is a better one. Consider the array `2,4,-3,-1,5,-1`. Its prefix sum array is `2,6,3,2,7,6`. The sum rises from 2 and comes back to 2, which means the elements in between (`4,-3,-1`) sum to zero. In general, if a value repeats in the prefix sum array, there exists a subarray with sum zero.  
There is a corner case. Consider the array `4,-3,-1,2,7`. Its prefix sum array is `4,1,0,2,9`. No value occurs more than once, yet the subarray `4,-3,-1` sums to zero. A prefix sum equal to zero means that the subarray starting at index 0 sums to zero, so this case must be checked separately.

In [1]:
public boolean zeroSumSubArray(int[] array) {
    // Form prefixSum array
    int[] prefixSum = new int[array.length];
    prefixSum[0] = array[0];
    for (int i = 1; i < array.length; i++) {
        prefixSum[i] = prefixSum[i - 1] + array[i];
    }

    Set<Integer> seen = new HashSet<>();
    for (int i = 0; i < prefixSum.length; i++) {
        if (seen.contains(prefixSum[i]) || prefixSum[i] == 0) {
            return true;
        }

        seen.add(prefixSum[i]);
    }

    return false;
}

System.out.println(zeroSumSubArray(new int[]{2, 4, -3, -1, 5, -1}));
System.out.println(zeroSumSubArray(new int[]{4, -3, -1, 2, 7}));

true
true
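
As a variation, the corner case can be folded into the general rule by treating the empty prefix as a prefix sum of 0 that has already been seen. The sketch below does this in a single pass without building the prefix array. The class and method names are hypothetical, and `long` is used for the running sum as a precaution against overflow.

```java
import java.util.HashSet;
import java.util.Set;

final class ZeroSumOnePass {
    static boolean hasZeroSumSubArray(int[] array) {
        Set<Long> seen = new HashSet<>();
        seen.add(0L); // prefix sum of the empty prefix
        long sum = 0;
        for (int value : array) {
            sum += value;
            // Set.add returns false when the value is already present,
            // i.e. when this prefix sum has occurred before.
            if (!seen.add(sum)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasZeroSumSubArray(new int[]{2, 4, -3, -1, 5, -1})); // true
        System.out.println(hasZeroSumSubArray(new int[]{4, -3, -1, 2, 7}));     // true
        System.out.println(hasZeroSumSubArray(new int[]{1, 2, 3}));             // false
    }
}
```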


**Q 2:** This question is an extension of the previous one: return the length of the longest subarray with sum equal to zero.  
**Answer:** To solve this problem, we also store the index of the first occurrence of each prefix sum in a map:

In [5]:
public int longestZeroSumSubArray(int[] array) {
    // Form prefixSum array
    int[] prefixSum = new int[array.length];
    prefixSum[0] = array[0];
    for (int i = 1; i < array.length; i++) {
        prefixSum[i] = prefixSum[i - 1] + array[i];
    }

    Map<Integer, Integer> seen = new HashMap<>();
    int length = 0;
    for (int i = 0; i < prefixSum.length; i++) {
        if (prefixSum[i] == 0 && i + 1 > length) {
            length = i + 1;
        } else if (seen.containsKey(prefixSum[i]) && (i - seen.get(prefixSum[i])) > length) {
            length = i - seen.get(prefixSum[i]);
        }

        if (!seen.containsKey(prefixSum[i]))
            seen.put(prefixSum[i], i);
    }

    return length;
}

System.out.println(longestZeroSumSubArray(new int[]{2, 4, -3, -1, 5, -1}));
System.out.println(longestZeroSumSubArray(new int[]{0, 4, -4}));

4
3


[GFG](https://www.geeksforgeeks.org/problems/largest-subarray-with-0-sum/1)

**Q 3:** This question is a generalization of the previous one. Instead of the sum being zero, find the length of the longest subarray with sum $K$.  
**Answer:** In the previous problem we looked for indices with `prefix[j] - prefix[i] = 0`. Here the condition becomes `prefix[j] - prefix[i] = K`, or equivalently `prefix[j] - K = prefix[i]`, so for each `prefix[j]` we look up `prefix[j] - K` in the map.

In [6]:
public int longestKSumSubArray(int[] array, int k) {
    // Form prefixSum array
    int[] prefixSum = new int[array.length];
    prefixSum[0] = array[0];
    for (int i = 1; i < array.length; i++) {
        prefixSum[i] = prefixSum[i - 1] + array[i];
    }

    Map<Integer, Integer> seen = new HashMap<>();
    int length = 0;
    for (int i = 0; i < prefixSum.length; i++) {
        if (prefixSum[i] == k && i + 1 > length) {
            length = i + 1;
        } else if (seen.containsKey(prefixSum[i] - k) && (i - seen.get(prefixSum[i] - k)) > length) {
            length = i - seen.get(prefixSum[i] - k);
        }

        if (!seen.containsKey(prefixSum[i]))
            seen.put(prefixSum[i], i);
    }

    return length;
}

System.out.println(longestKSumSubArray(new int[]{1, 2, -3, 3, -1, 2, 4}, 3));
System.out.println(longestKSumSubArray(new int[]{1, 3, 15, 10, 20, 23, 3}, 48));

5
4


[GFG](https://www.geeksforgeeks.org/problems/longest-sub-array-with-sum-k0809/1)
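
A more compact variant keeps only a running sum and seeds the map with the entry `0 -> -1`, so that a subarray starting at index 0 needs no special case. This is a sketch under the same assumptions as the solution above; the names `LongestKSumOnePass`, `longestKSum` and `firstIndex` are hypothetical, and `long` is used for the running sum as a precaution against overflow.

```java
import java.util.HashMap;
import java.util.Map;

final class LongestKSumOnePass {
    static int longestKSum(int[] array, int k) {
        // Maps each prefix sum to the first index at which it occurs.
        Map<Long, Integer> firstIndex = new HashMap<>();
        firstIndex.put(0L, -1); // the empty prefix ends just before index 0
        long sum = 0;
        int best = 0;
        for (int j = 0; j < array.length; j++) {
            sum += array[j];
            // prefix[j] - k = prefix[i] means elements i+1..j sum to k
            Integer i = firstIndex.get(sum - k);
            if (i != null) {
                best = Math.max(best, j - i);
            }
            firstIndex.putIfAbsent(sum, j); // keep the earliest index only
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestKSum(new int[]{1, 2, -3, 3, -1, 2, 4}, 3));     // 5
        System.out.println(longestKSum(new int[]{1, 3, 15, 10, 20, 23, 3}, 48)); // 4
        System.out.println(longestKSum(new int[]{0, 4, -4}, 0));                 // 3
    }
}
```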

In the above problem if the array contained only non-negative numbers then we could have solved this using two pointers (since the prefix sum array would be sorted in ascending order).

In [7]:
public int longestKSumSubArray2Pointer(int[] array, int k) {
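    // Form prefixSum array with a leading 0: prefixSum[i] is the sum of the
    // first i elements. Without the leading 0, subarrays that start at
    // index 0 would never be considered.
    int[] prefixSum = new int[array.length + 1];
    for (int i = 1; i <= array.length; i++) {
        prefixSum[i] = prefixSum[i - 1] + array[i - 1];
    }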

    int length = 0;
    int i = 0, j = 0;
    while (i <= j && j < prefixSum.length) {
        if (prefixSum[j] - prefixSum[i] == k) {
            if (j - i > length) length = j - i;
            j++;
        } else if (prefixSum[j] - prefixSum[i] < k) {
            j++;
        } else {
            i++;
        }
    }

    return length;
}

System.out.println(longestKSumSubArray2Pointer(new int[]{1, 3, 15, 10, 20, 23, 3}, 48));

4


**Q 4:** Given an array, a special pair is a pair of indices `(i, j)` such that `A[i] == A[j]` and `|i - j|` is minimum. Return the indices of the special pair. For example, in the array `2,4,6,2,3,12,3` both 2 and 3 are repeated, but the distance between the 3s is smaller, so we return `(4,6)` as the answer.  
**Answer:**

In [8]:
public int[] specialPair(int[] array) {
    int pairDistance = array.length;
    Map<Integer, Integer> indexMap = new HashMap<>();
    int[] answer = new int[2];

    for (int i = 0; i < array.length; i++) {
        if (indexMap.containsKey(array[i]) && i - indexMap.get(array[i]) < pairDistance) {
            pairDistance = i - indexMap.get(array[i]);
            answer[0] = indexMap.get(array[i]);
            answer[1] = i;
        }

        indexMap.put(array[i], i);
    }

    return answer;
}

System.out.println(Arrays.toString(specialPair(new int[]{2, 4, 6, 2, 3, 12, 3})));

[4, 6]


**Q 5:** Given an array A, for example `13,4,3,1,12,11,5,6,2`, return the length of the longest run of consecutive integers that can be formed from its elements. In this example the answer is 6, corresponding to `1,2,3,4,5,6`.  
**Answer:** One way to solve it is by sorting the numbers:

In [10]:
public int longestConsecutive(int[] input) {
    Arrays.sort(input);

    int currentSize = 1;
    int maxSize = 1;

    for (int i = 1; i < input.length; i++) {
        if (input[i] == input[i-1] + 1) {
            currentSize++;
        } else if (input[i] != input[i-1]) {
            currentSize = 1;
        }

        if (currentSize > maxSize) {
            maxSize = currentSize;
        }
    }

    return maxSize;
}

System.out.println(longestConsecutive(new int[]{13, 4, 3, 1, 12, 11, 5, 6, 2}));
System.out.println(longestConsecutive(new int[]{1,0,1,2}));

6
3


In the example used above, there are two clusters of consecutive numbers: `1,2,3,4,5,6` and `11,12,13`. If we can identify the starting element of each cluster, we can solve the problem without sorting. A number A is the start of a cluster when A-1 is not in the array. Once we have such a start, we check whether A+1 is present, then A+2, and so on.

In [7]:
public int longestConsecutive2(int[] input) {
    Set<Integer> seen = new HashSet<>();
    for (int i = 0; i < input.length; i++) {
        seen.add(input[i]);
    }

    int maxSize = 0;
    for (int i=0; i<input.length; i++) {
        if (!seen.contains(input[i] - 1)) { // Found the left boundary
            int size = 1;
            int element = input[i];

            while (seen.contains(++element)) {
                size++;
            }

            if (size > maxSize) {
                maxSize = size;
            }
        }
    }

    return maxSize;
}

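It is better to iterate over the set instead of the array. The array may contain many duplicates of a cluster's starting element, and each duplicate triggers the same scan again, which makes the solution above time out on LeetCode.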

In [None]:
public int longestConsecutive3(int[] input) {
    Set<Integer> seen = new HashSet<>();
    for (int i = 0; i < input.length; i++) {
        seen.add(input[i]);
    }

    int maxSize = 0;
    for (int num: seen) {
        if (!seen.contains(num - 1)) {
            int size = 1;

            while (seen.contains(++num)) {
                size++;
            }

            if (size > maxSize) {
                maxSize = size;
            }
        }
    }

    return maxSize;
}

[LeetCode 128](https://leetcode.com/problems/longest-consecutive-sequence)

What if the problem asked for a subsequence instead? In that case the order of the elements must be preserved.

In [12]:
public int longestConsecutiveSubseq(int[] input) {
    Map<Integer, Integer> seen = new HashMap<>();

    int maxSize = 0; // an empty input has no subsequence
    for (int i = 0; i < input.length; i++) {
        int size;
        if (seen.containsKey(input[i] - 1)) {
            size = 1 + seen.get(input[i] - 1);
            seen.put(input[i], size);
        } else {
            size = 1;
            seen.put(input[i], size);
        }

        if (size > maxSize) {
            maxSize = size;
        }
    }

    return maxSize;
}

System.out.println(longestConsecutiveSubseq(new int[]{13, 4, 3, 1, 12, 11, 5, 6, 2}));
System.out.println(longestConsecutiveSubseq(new int[]{1,0,1,2}));

3
3


**Q 6:** Given an array of strings, find how many palindromes can be formed by concatenating two strings from the array at a time. For example, consider the array `['abcd', 'dcba', 'lls', 's', 'ssll']`. The palindromic pairs formed are: `abcddcba, slls, llsssll, dcbaabcd`.  
**Answer:** The naive approach is to go through all possible pairs and check whether the concatenated string is a palindrome. A better approach is to store the reverse of every string in a map, split each string into a prefix and a rest, and look up the parts in the map. The time complexity of this approach is $O(nk^2)$, where $k$ is the average length of a string.

In [11]:
public List<String> palindromicPairs(List<String> input) {
    Map<String, Integer> map = new HashMap<>();
    for (int i = 0; i < input.size(); i++) {
        map.put(new StringBuilder(input.get(i)).reverse().toString(), i);
    }

    Set<String> answer = new HashSet<>();
    for (int i = 0; i < input.size(); i++) {
        String str = input.get(i);
        for (int j = 0; j < str.length(); j++) {
            String prefix = str.substring(0, j);
            String rest = str.substring(j);

            // The reverse of prefix is another string in the list and rest is a palindrome
            // <prefix><rest><rev prefix>
            if (map.containsKey(prefix) && map.get(prefix) != i && isPalindrome(rest)) {
                answer.add(new StringBuilder(prefix).reverse().insert(0, str).toString());
            }

            // The reverse of rest is another string in the list and prefix is a palindrome
            // <rev of rest><prefix><rest>
            if (map.containsKey(rest) && map.get(rest) != i && isPalindrome(prefix)) {
                answer.add(new StringBuilder(rest).reverse().append(str).toString());
            }
        }
    }

    return new ArrayList<>(answer);
}

public boolean isPalindrome(String input) {
    int start = 0, end = input.length() - 1;
    while (start <= end) {
        if (input.charAt(start) != input.charAt(end)) {
            return false;
        }

        start++;
        end--;
    }

    return true;
}

System.out.println(palindromicPairs(List.of("abcd", "dcba", "lls", "s", "ssll")));

[abcddcba, slls, llsssll, dcbaabcd]


[LeetCode 336](https://leetcode.com/problems/palindrome-pairs)
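
For comparison, the naive approach mentioned in the answer can be sketched as below. It tries every ordered pair of distinct positions, so it takes roughly $O(n^2 k)$ time. The class name `NaivePalindromePairs` is hypothetical, and the results are returned in the order in which the pairs are visited.

```java
import java.util.ArrayList;
import java.util.List;

final class NaivePalindromePairs {
    static boolean isPalindrome(String s) {
        int start = 0, end = s.length() - 1;
        while (start < end) {
            if (s.charAt(start) != s.charAt(end)) {
                return false;
            }
            start++;
            end--;
        }
        return true;
    }

    // Checks the concatenation of every ordered pair (i, j) with i != j.
    static List<String> palindromicPairs(List<String> input) {
        List<String> answer = new ArrayList<>();
        for (int i = 0; i < input.size(); i++) {
            for (int j = 0; j < input.size(); j++) {
                if (i == j) continue;
                String joined = input.get(i) + input.get(j);
                if (isPalindrome(joined)) {
                    answer.add(joined);
                }
            }
        }
        return answer;
    }

    public static void main(String[] args) {
        // Prints [abcddcba, dcbaabcd, llsssll, slls]
        System.out.println(palindromicPairs(List.of("abcd", "dcba", "lls", "s", "ssll")));
    }
}
```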