# Huffman Coding (Compression Algorithm)
## Overview
- Fixed-length encoding vs variable-length encoding
- Prefix rules
- Huffman tree
- Applications

### Fixed-length encoding vs variable-length encoding
- Fixed-length encoding: every character uses the same number of bits (all code words have the same length), so encoding and decoding are straightforward.
- Variable-length encoding: the number of bits per character (the code word length) differs from character to character, so decoding is harder.
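
To illustrate why fixed-length codes are easy to decode, here is a minimal sketch that gives every character a code word of the same width and decodes by slicing the bit string into equal chunks. The helper names (`fixed_length_code`, `encode_fixed`, `decode_fixed`) are hypothetical and chosen only for this example.

```python
def fixed_length_code(alphabet):
    """Assign every symbol a code word of the same width (illustrative helper)."""
    symbols = sorted(alphabet)
    # Number of bits needed to tell len(symbols) symbols apart (at least 1).
    width = max(1, (len(symbols) - 1).bit_length())
    code = {ch: format(i, f"0{width}b") for i, ch in enumerate(symbols)}
    return code, width


def encode_fixed(text, code):
    return "".join(code[ch] for ch in text)


def decode_fixed(bits, code, width):
    # Decoding is trivial: cut the bit string into chunks of `width` bits.
    inverse = {word: ch for ch, word in code.items()}
    return "".join(inverse[bits[i:i + width]] for i in range(0, len(bits), width))


text = "aabacdab"
code, width = fixed_length_code(set(text))
bits = encode_fixed(text, code)
print(code)                              # {'a': '00', 'b': '01', 'c': '10', 'd': '11'}
print(bits, len(bits))                   # 0000010010110001 16
print(decode_fixed(bits, code, width))   # aabacdab
```

With four distinct characters, each one costs 2 bits, so the 8-character string takes 16 bits regardless of how often each character occurs.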

### Reducing the space needed to store characters: use variable-length encoding
- Problem: how do we decode the result?

e.g.) string = "aabacdab" <br/>
The character frequencies are a:4, b:2, c:1, d:1. <br/>
Assign fewer bits to the characters that occur more often. <br/>
<br/>

```
a   0
b   11
c   100
d   011
```

<br/> 

Encoding: "aabacdab" -> 00110100011011 (0|0|11|0|100|011|0|11) <br/>

Decoding: 00110100011011 -> ? <br/>

```
0|011|0|100|011|0|11    adacdab
0|0|11|0|100|0|11|011   aabacabd
0|011|0|100|0|11|0|11   adacabab
…

```
**In other words, the same bit string can be read as several different strings, so the code is ambiguous.** <br/>
Solution: **prefix rules**
<br/><br/>
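
The ambiguity above can be checked by brute force. The sketch below tries every code word at every position and collects all strings that encode to the same bits; `all_decodings` is a hypothetical helper written for this illustration, not part of any library.

```python
def all_decodings(bits, code):
    """Return every string whose encoding under `code` equals `bits`."""
    results = []

    def walk(pos, decoded):
        if pos == len(bits):
            results.append(decoded)
            return
        # Try each code word that matches the bits starting at `pos`.
        for ch, word in code.items():
            if bits.startswith(word, pos):
                walk(pos + len(word), decoded + ch)

    walk(0, "")
    return results


ambiguous_code = {"a": "0", "b": "11", "c": "100", "d": "011"}
decodings = all_decodings("00110100011011", ambiguous_code)
print("aabacdab" in decodings)   # True: the intended string is one candidate
print(len(decodings) > 1)        # True: but it is not the only one
```

The root cause is that `0` (the code for a) is a prefix of `011` (the code for d), so a decoder cannot tell where one code word ends.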
### Prefix code: a code in which no code word (bit sequence) is a prefix of any other code word

e.g.) string = "aabacdab"
```
a   0
b   10
c   110
d   111
```
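
As a rough sketch of why this code removes the ambiguity, the snippet below checks the prefix property and decodes by reading bits until they match a code word. The function names `is_prefix_free` and `decode_prefix` are assumptions made for this example.

```python
def is_prefix_free(code):
    """True if no code word is a prefix of a different code word."""
    words = list(code.values())
    return not any(
        i != j and w2.startswith(w1)
        for i, w1 in enumerate(words)
        for j, w2 in enumerate(words)
    )


def decode_prefix(bits, code):
    """Decode left to right; valid only when `code` is prefix-free."""
    inverse = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:        # a complete code word has been read
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a code word")
    return "".join(out)


prefix_code = {"a": "0", "b": "10", "c": "110", "d": "111"}
ambiguous_code = {"a": "0", "b": "11", "c": "100", "d": "011"}

bits = "".join(prefix_code[ch] for ch in "aabacdab")
print(is_prefix_free(prefix_code))       # True
print(is_prefix_free(ambiguous_code))    # False ('0' is a prefix of '011')
print(bits)                              # 00100110111010
print(decode_prefix(bits, prefix_code))  # aabacdab
```

Because no code word is a prefix of another, the first match while scanning left to right is the only possible one, so no backtracking is needed.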

### Huffman tree


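## Complexity
O(n log n), where n is the number of distinct characters: the tree is built with n - 1 merges, and each merge performs a constant number of heap operations that cost O(log n) each.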
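
The merge loop behind this bound can be sketched without building the tree at all. The hypothetical helper `huffman_cost` below works on frequencies only and returns the total encoded length together with the number of merges; it relies on the fact that each merge adds one bit to every character beneath the merged node.

```python
import heapq
from collections import Counter


def huffman_cost(text):
    """Return (total encoded bits, number of merges) for `text`."""
    heap = list(Counter(text).values())
    heapq.heapify(heap)                  # O(n) for n distinct characters
    if len(heap) == 1:
        # Assumption: a lone symbol still gets a 1-bit code word.
        return heap[0], 0
    total_bits = merges = 0
    while len(heap) > 1:
        # Two O(log n) pops and one O(log n) push per merge.
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        heapq.heappush(heap, merged)
        total_bits += merged
        merges += 1
    return total_bits, merges


print(huffman_cost("aabacdab"))   # (14, 3)
```

For "aabacdab" there are 4 distinct characters, hence 3 merges, and the 14-bit total matches the length obtained with the prefix code a=0, b=10, c=110, d=111.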

References
- https://www.techiedelight.com/huffman-coding/
- https://yesdoing.github.io/%EC%95%8C%EA%B3%A0%EB%A6%AC%EC%A6%98/2018/04/17/compression.html
- https://velog.io/@kksshh0612/%ED%97%88%ED%94%84%EB%A7%8C-%EC%9D%B8%EC%BD%94%EB%94%A9%EB%94%94%EC%BD%94%EB%94%A9
- https://home.cse.ust.hk/faculty/golin/COMP271Sp03/Notes/MyL17.pdf

```python
import heapq
from heapq import heappop, heappush


def isLeaf(root):
    return root.left is None and root.right is None


# A tree node
class Node:
    def __init__(self, ch, freq, left=None, right=None):
        self.ch = ch
        self.freq = freq
        self.left = left
        self.right = right

    # Define `__lt__()` so that `Node` objects can be ordered by `heapq`:
    # the node with the lowest frequency has the highest priority
    def __lt__(self, other):
        return self.freq < other.freq


# Traverse the Huffman tree and store the Huffman codes in a dictionary
def encode(root, s, huffman_code):

    if root is None:
        return

    # found a leaf node; a tree with a single leaf gets the one-bit code '1'
    if isLeaf(root):
        huffman_code[root.ch] = s if len(s) > 0 else '1'

    encode(root.left, s + '0', huffman_code)
    encode(root.right, s + '1', huffman_code)


# Walk down from `root`, consuming the bits of `s` that follow position `index`,
# print the decoded character, and return the index of the last bit consumed
def decode(root, index, s):

    if root is None:
        return index

    # found a leaf node
    if isLeaf(root):
        print(root.ch, end='')
        return index

    index = index + 1
    root = root.left if s[index] == '0' else root.right
    return decode(root, index, s)


# Build the Huffman tree, then encode and decode the given input text
def buildHuffmanTree(text):

    # base case: empty string
    if len(text) == 0:
        return

    # count the frequency of appearance of each character
    # and store it in a dictionary
    freq = {i: text.count(i) for i in set(text)}

    # create a priority queue (min-heap) to store the live nodes of the Huffman tree
    pq = [Node(k, v) for k, v in freq.items()]
    heapq.heapify(pq)

    # repeat until only one node (the root) is left in the queue
    while len(pq) > 1:

        # remove the two nodes of the highest priority
        # (the lowest frequency) from the queue
        left = heappop(pq)
        right = heappop(pq)

        # create a new internal node with these two nodes as children and
        # with a frequency equal to the sum of the two nodes' frequencies,
        # then add the new node to the priority queue
        total = left.freq + right.freq
        heappush(pq, Node(None, total, left, right))

    # `root` is the root of the Huffman tree
    root = pq[0]

    # traverse the Huffman tree and store the Huffman codes in a dictionary
    huffmanCode = {}
    encode(root, '', huffmanCode)

    # print the Huffman codes
    print('Huffman Codes are:', huffmanCode)
    print('The original string is:', text)

    # build and print the encoded string
    s = ''.join(huffmanCode[c] for c in text)

    print('The encoded string is:', s)
    print('The decoded string is:', end=' ')

    if isLeaf(root):
        # special case: input such as a, aa, aaa, etc.
        print(root.ch * root.freq, end='')
    else:
        # traverse the Huffman tree again and this time,
        # decode the encoded string
        index = -1
        while index < len(s) - 1:
            index = decode(root, index, s)
    print()


# Huffman coding algorithm implementation in Python
if __name__ == '__main__':

    text = 'Huffman coding is a data compression algorithm.'
    buildHuffmanTree(text)
```

Sample output (the code words may differ from run to run, since the order in which equal-frequency nodes are merged is not fixed):

```
Huffman Codes are: {'o': '000', 'l': '00100', 'H': '00101', 'u': '00110', 'p': '00111', 'a': '010', ' ': '011', 'n': '1000', 'm': '1001', 's': '1010', 'h': '10110', 'r': '10111', 't': '11000', 'd': '11001', 'f': '11010', 'g': '11011', 'c': '11100', '.': '111010', 'e': '111011', 'i': '1111'}
The original string is: Huffman coding is a data compression algorithm.
The encoded string is: 00101001101101011010100101010000111110000011001111110001101101111111010011010011110010101100001001111100000100100111101111110111010101011110001000011010001001101100010111111111000101101001111010
The decoded string is: Huffman coding is a data compression algorithm.
```

## Applications