# How a `hash table` and a `hash function` work

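#### hash function
A hash function maps an input of arbitrary length, such as a string, to a fixed-length value (the hash, or digest).
A cryptographic hash function is designed to be one-way: the digest cannot practically be turned back into the original string, and two different inputs are unlikely to produce the same digest. Hashing is not encryption (there is no key and nothing to decrypt), but these properties make hash functions common in security-related code.
After reading a number of sites and working through the idea myself, I summarized the properties of a hash function as follows:
* (1) Collisions are unlikely
* (2) Even when two strings differ only slightly, their hash outputs differ greatly, which makes hashes convenient for comparison
* (3) The output length is fixed by the choice of hash function: a very long input and a very short input both produce a digest of the same length
* (4) It is fast to compute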
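
Properties (2) and (3) are easy to check by hand. The snippet below is a minimal sketch that uses the standard library's `hashlib` instead of `Crypto.Hash` (an assumption made so it runs without extra packages); the helper name `md5_hex` is made up for illustration.

```python
import hashlib

def md5_hex(text):
    """Return the MD5 digest of text as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

short_digest = md5_hex("a")
long_digest = md5_hex("a" * 10_000)
near_digest = md5_hex("b")

# Property (3): the digest length does not depend on the input length
print(len(short_digest), len(long_digest))

# Property (2): inputs that differ by one character give unrelated digests
differing = sum(1 for p, q in zip(short_digest, near_digest) if p != q)
print(differing, "of", len(short_digest), "hex characters differ")
```

An MD5 digest is 128 bits, so the hexadecimal form always has 32 characters.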

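#### hash table
A hash table stores values according to the output of a hash function. A common layout, and the one used here, is chaining: each hash is divided by the number of storage slots (the capacity), and hashes that leave the same remainder are linked together in one Linked List. As an aside, my implementation follows the hash set idea, so each value may appear only once in its Linked List.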
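
The remainder-based grouping can be sketched in a few lines. This is only an illustration under some assumptions: it uses `hashlib` rather than `Crypto.Hash`, plain Python lists stand in for the Linked Lists, and `bucket_index` is a hypothetical helper name.

```python
import hashlib

def bucket_index(key, capacity):
    """Map a string key to a bucket number in range(capacity) via MD5."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % capacity

capacity = 5
buckets = [[] for _ in range(capacity)]  # one chain per possible remainder

for key in ["Dog", "cat", "coffee", "Dog"]:
    chain = buckets[bucket_index(key, capacity)]
    if key not in chain:  # set semantics: each key is stored at most once
        chain.append(key)

print(buckets)
```

The second `"Dog"` lands in the same chain as the first and is skipped, so three keys are stored in total.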

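# Learning log

## Goal:
### Build a Hash Table that supports `add()`, `remove()`, and `contains()`, where every string stored in it is first hashed with MD5.

#### Step 1: hash the given string with MD5 and convert the digest to an integer
##### Using Crypto.Hash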

In [300]:
from Crypto.Hash import MD5
h = MD5.new()
h.update("I Awoiejroiwjeroiweoriuweoriuweoiuroweiurowieurowieurowieurowieurowieum yin".encode("utf-8"))
print(h.hexdigest())
x = h.hexdigest()
x = int(h.hexdigest(), 16)
print(x)

6905f7c6cd2ccce846e9e4b20ca24ab3
139599926547555707976931732289954663091
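
The hex-to-integer step above can also be wrapped in a small helper. The sketch below assumes the standard library's `hashlib` as a stand-in for `Crypto.Hash`, and `md5_to_int` is a hypothetical name; it also shows that reading the raw digest bytes as a big-endian integer gives the same number.

```python
import hashlib

def md5_to_int(text):
    """Hash text with MD5 and return the digest as a Python integer."""
    digest = hashlib.md5(text.encode("utf-8"))
    return int(digest.hexdigest(), 16)

value = md5_to_int("Dog")

# The same number, read directly from the 16 raw digest bytes
same = int.from_bytes(hashlib.md5(b"Dog").digest(), "big")

print(value == same)
print(value.bit_length() <= 128)
```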


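##### The string is now hashed with MD5, so `add()` can be written

`add()` looks like the first function to finish, but because I am storing the data as a Hash Set, adding must never create duplicate elements.

Step 2: define the ListNode() type, which serves as the node that stores each element.
My first idea was to write `contains()` first and call it from `add()`, inserting a new element only when `contains()` returns False.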

In [108]:
class ListNode():
    def __init__(self, val):
        self.val = val
        self.next = None

class MyHashSet():
    def __init__(self, capacity = 5):
        self.capacity = capacity
        self.data = [] * capacity
    
    def contains(self, key):
        from Crypto.Hash import MD5
        a = MD5.new()
        key = str(key)
        a.update(key.encode("utf-8"))
        x = a.hexdigest()
        x = int(a.hexdigest(), 16)
        remainder = x % self.capacity
        remainder = ListNode(remainder)
        
        if remainder:
            if remainder in self.data:
                index = self.data.index(remainder)
                now = self.data[index]
                print(index)
                if x == now.val:
                    return True
                else:
                    now = now.next
                    self.contains(key)
                    return False
            else:
                return False
        else:
            return False
    
    def add(self, key):
        from Crypto.Hash import MD5
        a = MD5.new()
        a.update(key.encode("utf-8"))
        x = a.hexdigest()
        x = int(a.hexdigest(), 16)
        remainder = x % self.capacity
        
        remainder = ListNode(remainder)
        x = ListNode(x)
        
        b = self.contains(key)
        if b == False:
            if remainder.val in self.data:
                now = self.data[self.data.index(remainder)]
                now.next = x
                x = now
            else:
                self.data.append(remainder)
                remainder.next = x
                now = x
                
        else:
            return

In [109]:
hashSet = MyHashSet()
hashSet.add("coffee")
hashSet.add("Dog Mochi")
hashSet.add("golden")
hashSet.add("retriever")
hashSet.add("drink")
hashSet.add("0")
hashSet.add("1")
hashSet.add("2")

rel = hashSet.contains("cat")
print(rel)

# add() is not working correctly
print(hashSet.data[0].val)
print(hashSet.data[1].val)
print(hashSet.data[2].val)
print(hashSet.data[3].val)
print(hashSet.data[4].val)

False
3
4
4
4
4


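At this point `contains()` is wrong: it returns False no matter what. Looking at `add()` on its own, though, what needs to be inserted depends on the situation.
* (1) self.data stores the remainder. If the remainder has not been added to self.data yet, first add a remainder node to self.data, then set remainder.next = ListNode(x) (x is the integer value of the key's MD5 digest)
* (2) If self.data already holds a remainder node with the same value, check every node in that ListNode() chain for x.val. If it is there, x is not added; if not, x is appended

The hard part is how to actually write this.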

#### Found that add() has a problem

In [5]:
class ListNode():
    def __init__(self, val):
        self.val = val
        self.next = None

class MyHashSet():
    def __init__(self, capacity = 5):
        self.capacity = capacity
        self.data = [None] * capacity
    
    def contains(self, key):
        from Crypto.Hash import MD5
        a = MD5.new()
        key = str(key)
        a.update(key.encode("utf-8"))
        x = a.hexdigest()
        x = int(a.hexdigest(), 16)
        remainder = x % self.capacity
        remainder = ListNode(remainder)
        
        if remainder:
            if remainder in self.data:
                index = self.data.index(remainder)
                now = self.data[index]
                print(index)
                if x == now.val:
                    return True
                else:
                    now = now.next
                    self.contains(key)
                    return False
            else:
                return False
        else:
            return False
        
    def add(self, key):
        from Crypto.Hash import MD5
        a = MD5.new()
        a.update(key.encode("utf-8"))
        x = a.hexdigest()
        x = int(a.hexdigest(), 16)
        remainder = x % self.capacity
        
        remainder = ListNode(remainder)
        x = ListNode(x)
        
        b = self.contains(key)
        
        if b == False:
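            # There are three cases here. First, the remainder is not in self.data yet, so both the remainder and the key must be added.
            # Second and third, the remainder is already in self.data: either the key is already in the ListNode chain and nothing is added, or the key is missing and must be appended at the tail of the chain
            if remainder.val in self.data:
                # x is already in the ListNode chain, so nothing is replaced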
                now = self.data[remainder]
                if now.next.val == x.val:
                    return
                else:
                    now = now.next
            
            else:
                self.data[remainder] = ListNode(remainder)
                self.data[remainder].next = ListNode(x)
                return self.add(key)
            
            
        # the value already exists, so just return without adding
        else:
            return

In [6]:
hashSet = MyHashSet()
hashSet.add("cat")
hashSet.add("Dog Mochi")
hashSet.add("golden")
hashSet.add("1.5")

rel = hashSet.contains("cat")
print(rel)
print(hashSet.data[0].val)

# add() is not working correctly
print(hashSet.data[1].val)
print(hashSet.data[2].val)
print(hashSet.data[3].val)

TypeError: list indices must be integers or slices, not ListNode
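
The traceback above can be reproduced in isolation. This is a minimal sketch, separate from the notebook's class, showing that a list rejects a `ListNode` as an index while the integer stored in `.val` works.

```python
class ListNode:
    def __init__(self, val):
        self.val = val
        self.next = None

data = [None] * 5
remainder = ListNode(3)

try:
    data[remainder] = remainder  # a ListNode is not a valid list index
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

data[remainder.val] = remainder  # the integer inside the node is a valid index
print(data[3] is remainder)
```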

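The problem appears when checking whether `self.data[remainder]` equals ListNode(remainder): `remainder` is a ListNode here, not an integer, so it cannot be used as a list index.
To fix the lookup, I tried building a `search_ListNode()` function whose True/False return value tells whether the condition holds.
`search_ListNode()` walks the ListNode chain and looks for the x value derived from the key.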

In [94]:
class ListNode():
    def __init__(self, val):
        self.val = val
        self.next = None

class MyHashSet():
    def __init__(self, capacity = 5):
        self.capacity = capacity
        self.data = [None] * capacity
    
    def contains(self, key):
        from Crypto.Hash import MD5
        a = MD5.new()
        key = str(key)
        a.update(key.encode("utf-8"))
        x = a.hexdigest()
        x = int(a.hexdigest(), 16)
        remainder = x % self.capacity
        remainder = ListNode(remainder)
        
        if remainder:
            if remainder in self.data:
                index = self.data.index(remainder)
                now = self.data[index]
                print(index)
                if x == now.val:
                    return True
                else:
                    now = now.next
                    self.contains(key)
                    return False
            else:
                return False
        else:
            return False
        
    def search_ListNode(self, remainder, x):
#         from Crypto.Hash import MD5
#         a = MD5.new()
#         a.update(key.encode("utf-8"))
#         x = a.hexdigest()
#         x = int(a.hexdigest(), 16)
#         remainder = x % self.capacity
        
#         remainder = ListNode(remainder)
        first_key = remainder.next
        
        if not first_key:
            return "not in here"
        if first_key.val == x:
            return True
        
        return self.search_ListNode(first_key, x)
        
    
    def add(self, key):
        from Crypto.Hash import MD5
        a = MD5.new()
        a.update(key.encode("utf-8"))
        x = a.hexdigest()
        x = int(a.hexdigest(), 16)
        remainder = x % self.capacity
        
        remainder = ListNode(remainder)
        x = ListNode(x)
        
        b = self.contains(key)
        
        if b == False:
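            # There are three cases here. First, the remainder is not in self.data yet, so both the remainder and the key must be added.
            # Second and third, the remainder is already in self.data: either the key is already in the ListNode chain and nothing is added, or the key is missing and must be appended at the tail of the chain
            if remainder.val in self.data:
                # x is already in the ListNode chain, so nothing is replaced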
                if search_ListNode(remainder, x) == True:
                    return
                else:
                    cur = self.data[remainder.val]
                    while cur.next != None:
                        cur = cur.next
                    cur.next = x
            
        # the value already exists, so just return without adding
        else:
            return

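### Finally found the problem!
#### A ListNode stored in a List cannot be found with `.index()`!<br/> Nor can a ListNode in a List be compared for equality directly, because the class defines no `__eq__` and Python falls back to comparing object identity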

In [93]:
### A ListNode inside a List cannot be found with .index() or `in`!!!


class ListNode():
    def __init__(self, val):
        self.val = val
        self.next = None
        
o = ['cat', 234, 'dfad', 245645, ListNode(55)]
ListNode(55) in o

False

In [295]:
class ListNode():
    def __init__(self, val):
        self.val = val
        self.next = None
        
f = [ListNode(1), ListNode(2), ListNode(3), ListNode(4)]
# print(f[2].val == 3)
print(f[2] == ListNode(3))

False
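
A possible explanation for the two `False` results: a class without `__eq__` is compared by identity, so a freshly created node never equals a node already in the list, even when the values match. The sketch below contrasts that with a variant that defines `__eq__`; `PlainNode` and `ComparableNode` are hypothetical names, and this is not the approach the notebook ends up taking (it compares `.val` fields instead).

```python
class PlainNode:
    def __init__(self, val):
        self.val = val
        self.next = None

class ComparableNode(PlainNode):
    def __eq__(self, other):
        # two nodes are equal when they carry the same value
        return isinstance(other, ComparableNode) and self.val == other.val

    def __hash__(self):
        # keep hashing consistent with the value-based __eq__
        return hash(self.val)

plain = [PlainNode(1), PlainNode(2), PlainNode(3)]
found_plain = PlainNode(3) in plain  # identity comparison
print(found_plain)

comparable = [ComparableNode(1), ComparableNode(2), ComparableNode(3)]
found_comparable = ComparableNode(3) in comparable  # value comparison
position = comparable.index(ComparableNode(3))
print(found_comparable, position)
```

With `__eq__` in place, both `in` and `.index()` locate the node by value.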


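The problem was in the condition: a ListNode() stored in the list cannot be compared directly with an int!! As a result, the condition always behaves as if self.data[0] != remainder.val  <br/>
After this, I decided to rewrite the checks:

# Add works!!!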

In [190]:
from Crypto.Hash import MD5
h = MD5.new()
h.update("Dog".encode("utf-8"))
print(h.hexdigest())
x = h.hexdigest()
x = int(h.hexdigest(), 16)
print(x)
x % 5

f = MD5.new()
f.update("soda".encode("utf-8"))
y = f.hexdigest()
y = int(f.hexdigest(), 16)

print(y)

c935d187f0b998ef720390f85014ed1e
267454268680180314793225837761097166110
229404840929843000141111086926369191910


In [212]:
class ListNode():
    def __init__(self, val):
        self.val = val
        self.next = None

class MyHashSet():
    def __init__(self, capacity = 5):
        self.capacity = capacity
        self.data = [None] * capacity
    
    def contains(self, key):
        from Crypto.Hash import MD5
        a = MD5.new()
        key = str(key)
        a.update(key.encode("utf-8"))
        x = a.hexdigest()
        x = int(a.hexdigest(), 16)
        remainder = x % self.capacity
        remainder = ListNode(remainder)
        
        if remainder:
            if remainder in self.data:
                index = self.data.index(remainder)
                now = self.data[index]
                print(index)
                if x == now.val:
                    return True
                else:
                    now = now.next
                    self.contains(key)
                    return False
            else:
                return False
        else:
            return False        
    
    
    def add(self, key):
        from Crypto.Hash import MD5
        a = MD5.new()
        a.update(key.encode("utf-8"))
        x = a.hexdigest()
        x = int(a.hexdigest(), 16)
        remainder = x % self.capacity
        
        remainder = ListNode(remainder)
        x = ListNode(x)
        
        if not self.data[remainder.val]:
            self.data[remainder.val] = remainder
            self.data[remainder.val].next = x
        else:
            now = self.data[remainder.val]
            
            while now.val != x.val and now.next:
                now = now.next
            if now.val != x.val:
                now.next = x
            else:
                return

In [241]:
hashSet = MyHashSet()
hashSet.add("Dog")        # 0
hashSet.add("I Am yin")   # 1
hashSet.add("cat")        # 2
hashSet.add("Mochi")      # 4
hashSet.add("I Am yin1")  # 4
hashSet.add("coffee")     # 3
hashSet.add("cola")       # 1
hashSet.add("soda")       # 0
hashSet.add("Dog")        # 0
hashSet.add("aaba")       # 0

rel = hashSet.contains("cat")
print(rel)

# inspect the chains that add() built
print(hashSet.data[0].next.val)
print(hashSet.data[0].next.next.val)
print(hashSet.data[0].next.next.next.val)
print(hashSet.data[1].val)
print(hashSet.data[2].val)
print(hashSet.data[3].val)
print(hashSet.data[4].val)

False
267454268680180314793225837761097166110
229404840929843000141111086926369191910
229466311654740609737103861184741220735
1
2
3
4
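
Chaining `.next.next.next` by hand gets awkward as chains grow. A small helper can walk a chain instead; this is a sketch with a hypothetical name, `chain_values`, and it assumes the layout used above, where the first node of each bucket only holds the remainder.

```python
class ListNode:
    def __init__(self, val):
        self.val = val
        self.next = None

def chain_values(head):
    """Collect every stored value in a chain, skipping the head node."""
    values = []
    node = head.next  # the head only stores the bucket's remainder
    while node:
        values.append(node.val)
        node = node.next
    return values

# Build a tiny chain by hand: head(0) -> 10 -> 20
head = ListNode(0)
head.next = ListNode(10)
head.next.next = ListNode(20)

print(chain_values(head))  # [10, 20]
```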


### A small point of pride: much of the logic in `add()` turned out to be shared with `remove()` and `contains()`, so once `add()` worked, I finished `remove()` and `contains()` within an hour.

## remove()

In [242]:
class ListNode():
    def __init__(self, val):
        self.val = val
        self.next = None

class MyHashSet():
    def __init__(self, capacity = 5):
        self.capacity = capacity
        self.data = [None] * capacity
    
    def remove(self, key):
        from Crypto.Hash import MD5
        a = MD5.new()
        a.update(key.encode("utf-8"))
        x = a.hexdigest()
        x = int(a.hexdigest(), 16)
        remainder = x % self.capacity
        
        remainder = ListNode(remainder)
        x = ListNode(x)
        
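        if self.data[remainder.val]:
            # start at the head node that holds the bucket's remainder
            former = self.data[remainder.val]
            now = former.next
            # walk the chain until the hash is found or the chain ends
            while now and now.val != x.val:
                former = now
                now = now.next
            # unlink only if the key was actually found
            if now:
                former.next = now.next

        else:
            return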
    
    
    def add(self, key):
        from Crypto.Hash import MD5
        a = MD5.new()
        a.update(key.encode("utf-8"))
        x = a.hexdigest()
        x = int(a.hexdigest(), 16)
        remainder = x % self.capacity
        
        remainder = ListNode(remainder)
        x = ListNode(x)
        
        if not self.data[remainder.val]:
            self.data[remainder.val] = remainder
            self.data[remainder.val].next = x
        else:
            now = self.data[remainder.val]
            
            while now.val != x.val and now.next:
                now = now.next
            if now.val != x.val:
                now.next = x
            else:
                return

In [297]:
hashSet = MyHashSet()
hashSet.add("Dog")        # 0
hashSet.add("I Am yin")   # 1
hashSet.add("cat")        # 2
hashSet.add("Mochi")      # 4
hashSet.add("I Am yin1")  # 4
hashSet.add("coffee")     # 3
hashSet.add("cola")       # 1
hashSet.add("soda")       # 0
hashSet.add("Dog")        # 0
hashSet.add("aaba")       # 0

hashSet.remove("Dog")


# inspect the chains after remove()
print(hashSet.data[0].next.val)
print(hashSet.data[0].next.next.val)
print(hashSet.data[0].next.next.val)
print(hashSet.data[1].val)
print(hashSet.data[2].val)
print(hashSet.data[3].val)
print(hashSet.data[4].val)

229404840929843000141111086926369191910
229466311654740609737103861184741220735
229466311654740609737103861184741220735
1
2
3
4
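
One detail worth checking in any `remove()` for a singly linked chain is the case where the key is absent: the walk then ends past the last node, and nothing should be unlinked. The standalone sketch below illustrates the two-pointer idea with a hypothetical function, `remove_value`, working on plain integers rather than MD5 values.

```python
class ListNode:
    def __init__(self, val):
        self.val = val
        self.next = None

def remove_value(head, target):
    """Unlink the first node after head whose val equals target.

    Returns True when a node was removed, False when target was absent.
    """
    former, now = head, head.next
    while now and now.val != target:
        former, now = now, now.next
    if now is None:  # reached the end without a match
        return False
    former.next = now.next  # bypass the matching node
    return True

def to_list(head):
    """Return the values stored after head as a Python list."""
    out, node = [], head.next
    while node:
        out.append(node.val)
        node = node.next
    return out

head = ListNode(0)
head.next = ListNode(10)
head.next.next = ListNode(20)
head.next.next.next = ListNode(30)

removed = remove_value(head, 20)
missing = remove_value(head, 99)
print(removed, missing, to_list(head))  # True False [10, 30]
```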


## contains()

In [282]:
class ListNode():
    def __init__(self, val):
        self.val = val
        self.next = None

class MyHashSet():
    def __init__(self, capacity = 5):
        self.capacity = capacity
        self.data = [None] * capacity
    
    def add(self, key):
        from Crypto.Hash import MD5
        a = MD5.new()
        a.update(key.encode("utf-8"))
        x = a.hexdigest()
        x = int(a.hexdigest(), 16)
        remainder = x % self.capacity
        
        remainder = ListNode(remainder)
        x = ListNode(x)
        
        if not self.data[remainder.val]:
            self.data[remainder.val] = remainder
            self.data[remainder.val].next = x
        else:
            now = self.data[remainder.val]
            
            while now.val != x.val and now.next:
                now = now.next
            if now.val != x.val:
                now.next = x
            else:
                return
            
    def remove(self, key):
        from Crypto.Hash import MD5
        a = MD5.new()
        a.update(key.encode("utf-8"))
        x = a.hexdigest()
        x = int(a.hexdigest(), 16)
        remainder = x % self.capacity
        
        remainder = ListNode(remainder)
        x = ListNode(x)
        
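        if self.data[remainder.val]:
            # start at the head node that holds the bucket's remainder
            former = self.data[remainder.val]
            now = former.next
            # walk the chain until the hash is found or the chain ends
            while now and now.val != x.val:
                former = now
                now = now.next
            # unlink only if the key was actually found
            if now:
                former.next = now.next

        else:
            return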
    
    
    def contains(self, key):
        from Crypto.Hash import MD5
        a = MD5.new()
        a.update(key.encode("utf-8"))
        x = a.hexdigest()
        x = int(a.hexdigest(), 16)
        remainder = x % self.capacity
        
        remainder = ListNode(remainder)
        x = ListNode(x)
        
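        # an empty bucket is still None, so check for that before reading .val
        if self.data[remainder.val] and self.data[remainder.val].val == remainder.val: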
            now = self.data[remainder.val]
            
            while now.val != x.val and now.next:
                now = now.next
            if now.val == x.val:
                return True
            else:
                return False
                
        else:
            return False

In [292]:
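hashSet = MyHashSet(2) # the 2 overrides the default capacity of 5, so capacity = 2
# note: the bucket numbers in the trailing comments below were written for capacity = 5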
hashSet.add("Dog")        # 0
hashSet.add("I Am yin")   # 1
hashSet.add("cat")        # 2
hashSet.add("Mochi")      # 4
hashSet.add("I Am yin1")  # 4
hashSet.add("coffee")     # 3
hashSet.add("cola")       # 1
hashSet.add("soda")       # 0
hashSet.add("Dog")        # 0
hashSet.add("aaba")       # 0

hashSet.remove("Dog")
a = hashSet.contains("I Am yin")
b = hashSet.contains("Dog")
c = hashSet.contains("0")
print(a)
print(b)
print(c)

# inspect bucket 0
print(hashSet.data[0].val)
print(hashSet.data[0].next.val)
print(hashSet.data[0].next.next.next.next.next.val)
# print(hashSet.data[0].next.next.next.val)
# print(hashSet.data[1].next.next.next.val)
# print(hashSet.data[2].val)
# print(hashSet.data[3].val)
# print(hashSet.data[4].val)

True
False
False
0
277102220249073555409885156483852860632
229404840929843000141111086926369191910
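
Changing the capacity changes how long each chain gets, which is why bucket 0 holds so many nodes when capacity = 2. The sketch below counts chain lengths for the same keys at two capacities. It assumes `hashlib` in place of `Crypto.Hash`, and `chain_lengths` is a hypothetical helper.

```python
import hashlib

def chain_lengths(keys, capacity):
    """Count how many distinct keys fall into each bucket."""
    lengths = [0] * capacity
    for key in set(keys):  # a hash set stores each key once
        hashed = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
        lengths[hashed % capacity] += 1
    return lengths

keys = ["Dog", "I Am yin", "cat", "Mochi", "I Am yin1",
        "coffee", "cola", "soda", "Dog", "aaba"]

for capacity in (2, 5):
    print(capacity, chain_lengths(keys, capacity))
```

A smaller capacity means fewer buckets and longer chains, so each lookup walks more nodes.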


In [277]:
class ListNode():
    def __init__(self, val):
        self.val = val
        self.next = None
x = ListNode(3)
f = [ListNode(1), ListNode(2), ListNode(3), ListNode(4)]
# f[2].val
# x.val
f[2].val == x.val

# print(f[2] = ListNode(3))

True
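
Since `add()`, `remove()`, and `contains()` all repeat the same hashing and chain walk, the shared part could be pulled into one helper. The following is only a sketch of that idea, not the version developed in this notebook. It assumes `hashlib` instead of `Crypto.Hash`, creates the head node of every bucket up front, and uses a hypothetical helper named `_locate`.

```python
import hashlib

class ListNode:
    def __init__(self, val):
        self.val = val
        self.next = None

class MyHashSet:
    def __init__(self, capacity=5):
        self.capacity = capacity
        # every bucket starts with a head node, so no bucket is ever None
        self.data = [ListNode(i) for i in range(capacity)]

    def _locate(self, key):
        """Return (former, now, hashed); now is None when key is absent."""
        hashed = int(hashlib.md5(str(key).encode("utf-8")).hexdigest(), 16)
        former = self.data[hashed % self.capacity]
        now = former.next
        while now and now.val != hashed:
            former, now = now, now.next
        return former, now, hashed

    def add(self, key):
        former, now, hashed = self._locate(key)
        if now is None:  # not stored yet; former is the tail of the chain
            former.next = ListNode(hashed)

    def remove(self, key):
        former, now, _ = self._locate(key)
        if now is not None:  # unlink only when the key was found
            former.next = now.next

    def contains(self, key):
        return self._locate(key)[1] is not None

hash_set = MyHashSet(2)
for key in ["Dog", "I Am yin", "cat", "Mochi", "soda", "Dog"]:
    hash_set.add(key)
hash_set.remove("Dog")

print(hash_set.contains("I Am yin"))  # True
print(hash_set.contains("Dog"))       # False
print(hash_set.contains("0"))         # False
```

Keeping the walk in one place means a fix to the traversal applies to all three methods at once.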

# Flowchart
![hash table flowchart](https://github.com/agying/leetcode-practices/blob/master/Hash%20Table%E6%B5%81%E7%A8%8B%E5%9C%96.jpg?raw=true)

## References


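### Writing the code
Completed on my own without references: the `remove()` and `contains()` functions
* For `add()`, I drew on parts of the logic in the pages below for debugging and inspiration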
* https://leetcode.com/problems/design-hashset/discuss/279957/Python-Chaining
* https://leetcode.com/problems/design-hashset/discuss/379193/Python-chaining-hashset-easy-to-understand
* https://kknews.cc/code/3mb5eg8.html

I also asked on stackoverflow. I got no reply, but still worked out the problem on my own
* https://stackoverflow.com/questions/59181798/add-elements-to-linkedhashset-however-replace-instead

### Flowchart & Hash Table, Hash Function references
* https://zh.wikipedia.org/wiki/%E6%95%A3%E5%88%97%E5%87%BD%E6%95%B8
* https://hackmd.io/@EW34LLeXTra2Oikg0WEQ5Q/HJln3jU_e?type=view
* https://ithelp.ithome.com.tw/articles/10208884