# Hash Table
<hr>

## Definition
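A hash table is a data structure that accesses records directly by key (**map keys**). It relies on a **hash function** to turn each key into a position in the table. An array is the simplest kind of hash table: the index acts as the key.

Whenever you need to **quickly determine whether an element appears in a collection**, consider hashing.

Hashing **trades space for time**: it needs an extra array, set, or map to hold the data so that lookups can be fast.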

### Hash function
Take student names as an example:

`index = hashFunction(name)`

`hashFunction(name) = hashCode(name) % tableSize`

If two different names produce the same value after the modulo, the result is a **hash collision**.
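The two formulas above can be sketched in Python. The polynomial `hash_code` below is a hypothetical stand-in chosen so the results are reproducible; it is not Python's built-in `hash()`, which is randomized per process for strings. The names `hash_code`, `hash_function`, and `table_size` are assumptions made for this illustration.

```python
def hash_code(name: str) -> int:
    """Hypothetical polynomial hash (illustration only, not Python's hash())."""
    code = 0
    for ch in name:
        code = code * 31 + ord(ch)
    return code


def hash_function(name: str, table_size: int) -> int:
    """Map a name to a slot index in the range [0, table_size)."""
    return hash_code(name) % table_size


table_size = 10
# "a" has code 97 and "k" has code 107, so both land in slot 7: a collision.
print(hash_function("a", table_size), hash_function("k", table_size))  # 7 7
```

Two different keys mapping to the same slot is exactly the situation that collision handling has to resolve.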

### Hash collision
1. **Chaining**<br>
Store multiple values at the same index using a linked list, array, or other data structures. When a collision occurs, new values are added to the list at that index.
2. **Open Addressing**
- *Linear Probing:* Check the next index sequentially until an empty slot is found. This only works while the table has more slots than stored items (**tableSize > dataSize**).
- *Quadratic Probing:* Use a quadratic function to find the next available slot (e.g., index + 1², index + 2², ...).
- *Double Hashing:* Use a second hash function to determine the step size for resolving collisions.
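As a rough illustration of the two families, here is a minimal sketch of chaining and of linear probing. The class `ChainedHashTable` and the helper `linear_probe_insert` are hypothetical names for this example, integer keys are assumed so that slot indices are predictable, and resizing and deletion are left out.

```python
class ChainedHashTable:
    """Chaining: each slot holds a list (bucket) of (key, value) pairs."""

    def __init__(self, table_size: int = 8):
        self.buckets = [[] for _ in range(table_size)]

    def _index(self, key) -> int:
        return hash(key) % len(self.buckets)

    def put(self, key, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # new key; on collision it joins the bucket

    def get(self, key, default=None):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return default


def linear_probe_insert(slots: list, key: int) -> int:
    """Linear probing: start at key % len(slots) and step by 1 to a free slot."""
    size = len(slots)
    start = key % size
    for step in range(size):
        idx = (start + step) % size
        if slots[idx] is None or slots[idx] == key:
            slots[idx] = key
            return idx
    raise RuntimeError("table is full")  # why tableSize > dataSize is required


slots = [None] * 5
# 1, 6 and 11 all hash to slot 1, so the later keys slide to slots 2 and 3.
print([linear_probe_insert(slots, k) for k in (1, 6, 11)])  # [1, 2, 3]
```

Chaining keeps every colliding key in its home slot at the cost of extra list storage, while probing stores everything in the array itself and therefore needs spare slots.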

### Hash table structure
In most programming languages, the built-in **map (dictionary)** types are the most common hash-table-backed structures, for example **Python (`dict`), Java (`HashMap`), and JavaScript (`Map`)**.

1. **array: Used for the base structure to store values at hashed indices.**
- Most hash table implementations use an array as the underlying data structure to store values at computed indices. This allows for constant-time access ($O(1)$ on average) when resolving hash function outputs.

2. **set: Used when only key existence needs to be checked.**
- A set is typically used when a hash table only needs to store unique keys without associated values. Many programming languages implement sets using hash tables internally.

3. **map: Used for key-value storage and retrieval.**
- A map (or dictionary) is a higher-level data structure that uses a hash table internally to store key-value pairs, ensuring fast lookups, insertions, and deletions.
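The three choices can be compared side by side in Python. This is a small sketch using the word `"hello"` as sample data; the variable names `counts`, `seen`, and `freq` are only for illustration.

```python
word = "hello"

# array: a fixed-size table indexed directly by a small integer key
counts = [0] * 26
for ch in word:
    counts[ord(ch) - ord("a")] += 1

# set: only key existence matters
seen = set(word)

# map: key -> value storage
freq = {}
for ch in word:
    freq[ch] = freq.get(ch, 0) + 1

print(counts[ord("l") - ord("a")])  # 2
print("e" in seen, "z" in seen)     # True False
print(freq["l"])                    # 2
```

The array is the cheapest option when keys are small, dense integers; the set and the map handle arbitrary hashable keys at the cost of more memory per entry.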

### 242. Valid Anagram
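Given two strings, determine whether they consist of the same letters with the same counts.

The letters a to z have consecutive ASCII codes, so we can define a **26-element array** `hash[26]`.
- Traverse the first word and count each letter in the array:

`for (i = 0; i < s.size(); i++) hash[s[i] - 'a']++;`
- Traverse the second word and subtract the corresponding counts from the array:

`for (i = 0; i < t.size(); i++) hash[t[i] - 'a']--;`
- If every entry of the array is back to 0, the two words are anagrams:

`for (i = 0; i < 26; i++) if (hash[i] != 0) return false;`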

In Python:

`ord()` is a built-in function that returns the Unicode code point (ASCII value for English letters) of a given character.

```python
record[ord(i) - ord("a")] += 1
```
- ord(i) gets the ASCII value of the character i.
- ord("a") gets the ASCII value of 'a', which is 97.
- ord(i) - ord("a") calculates the index position for the character in the record array (where 'a' maps to 0, 'b' to 1, …, 'z' to 25).


```python
class Solution:
    def isAnagram(self, s: str, t: str) -> bool:
        array = [0] * 26

        for i in s:
            array[ord(i) - ord('a')] += 1

        for i in t:
            array[ord(i) - ord('a')] -= 1

        for i in array:
            if i != 0:
                return False

        return True

# test code
s = Solution()
print(s.isAnagram("anagram", "nagaram")) # True
print(s.isAnagram("rat", "car")) # False
```

Output:

```
True
False
```
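When the input is not limited to lowercase `a` to `z`, a map can replace the fixed 26-element array. The sketch below uses `collections.Counter` from the standard library; the function name `is_anagram` is made up for this example and is not part of the solution above.

```python
from collections import Counter


def is_anagram(s: str, t: str) -> bool:
    # Two strings are anagrams exactly when their character counts match.
    return Counter(s) == Counter(t)


print(is_anagram("anagram", "nagaram"))  # True
print(is_anagram("rat", "car"))          # False
```

This swaps the array for a hash map, so it uses more memory per character but works for any characters.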


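### 1002. Find Common Characters
A neat trick: count how often each letter occurs in every word, then take the minimum of each column. A letter belongs to the answer as many times as its minimum count.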

| Character  | a | b | c | d | e | ... |
|------------|---|---|---|---|---|-----|
| bella      | 1 | 1 | 0 | 0 | 1 | ... |
| label      | 1 | 1 | 0 | 0 | 1 | ... |
| roller     | 0 | 0 | 0 | 0 | 1 | ... |
| **min**    | 0 | 0 | 0 | 0 | 1 | ... |


```python
from typing import List

class Solution:
    def commonChars(self, words: List[str]) -> List[str]:
        # Initialize the minimum-frequency table
        min_freq = [float('inf')] * 26

        for word in words:
            # Count the letters of the current word
            char_freq = [0] * 26
            for char in word:
                char_freq[ord(char) - ord('a')] += 1

            # Update the minimum for every word, not only the last one
            for i in range(26):
                min_freq[i] = min(min_freq[i], char_freq[i])

        result = []
        for i in range(26):
            result.extend([chr(i + ord('a'))] * min_freq[i])

        return result

# test code
s = Solution()
print(s.commonChars(["bella", "label", "roller"])) # ["e", "l", "l"]
print(s.commonChars(["cool", "lock", "cook"])) # ["c", "o"]
```

Output:

```
['e', 'l', 'l']
['c', 'o']
```
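The column-wise minimum can also be written with `collections.Counter`, whose `&` operator keeps the minimum count for each key. This is an alternative sketch rather than the solution above; the function name `common_chars` is hypothetical, and it assumes `words` contains at least one word.

```python
from collections import Counter
from functools import reduce
from operator import and_
from typing import List


def common_chars(words: List[str]) -> List[str]:
    # Counter & Counter keeps min(count_a, count_b) for every shared key.
    common = reduce(and_, (Counter(word) for word in words))
    return sorted(common.elements())


print(common_chars(["bella", "label", "roller"]))  # ['e', 'l', 'l']
print(common_chars(["cool", "lock", "cook"]))      # ['c', 'o']
```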


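### 349. Intersection of Two Arrays
In C++, this problem can be solved with `unordered_set`, which uses a hash table as its underlying data structure. `set` and `multiset` are typically implemented as red-black trees, so they are less suitable here.

Originally the problem placed no bound on the values, so an array would waste a lot of space. Once the values were limited to at most 1000, an array became a practical choice.

With a set: traverse `nums1` and put its elements into a set, then traverse `nums2` and check whether each element is already in that set. The matches form the result.

With arrays, the approach is much like problem 1002: record the values seen in each input and compare the two tables.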

```python
from typing import List

class Solution:
    def intersection1(self, nums1: List[int], nums2: List[int]) -> List[int]:
        return list(set(nums1) & set(nums2))

    def intersection2(self, nums1: List[int], nums2: List[int]) -> List[int]:
        # Values are assumed to lie in the range 0..1000
        count1 = [0] * 1001
        count2 = [0] * 1001
        result = []

        # traverse nums1 and nums2
        for i in range(len(nums1)):
            count1[nums1[i]] = 1
        for i in range(len(nums2)):
            count2[nums2[i]] = 1

        # find the intersection
        for i in range(1001):
            if count1[i] == 1 and count2[i] == 1:
                result.append(i)

        return result

# test code
s = Solution()
print(s.intersection1([1, 2, 2, 1], [2, 2])) # [2]
print(s.intersection2([4, 9, 5], [9, 4, 9, 8, 4])) # [4, 9]
```

Output:

```
[2]
[4, 9]
```
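The set-based description (put `nums1` into a set, then scan `nums2`) can also be spelled out as an explicit loop instead of the `&` operator. This is a minimal sketch; the function name `intersection_with_set` is an assumption for this example, and the order of the returned list is not guaranteed.

```python
from typing import List


def intersection_with_set(nums1: List[int], nums2: List[int]) -> List[int]:
    seen = set(nums1)        # every value that occurs in nums1
    result = set()           # a set, so each common value is reported once
    for num in nums2:
        if num in seen:      # average O(1) membership test
            result.add(num)
    return list(result)


print(sorted(intersection_with_set([4, 9, 5], [9, 4, 9, 8, 4])))  # [4, 9]
```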
