# Chapter 13
## Algorithmic Strategies for Data Handling

### Implementing Huffman coding in Python
We start by creating a node for each character; the node holds the character and its frequency. These nodes are then added to a priority queue in which the least frequent node has the highest priority.
- A `Node` class represents each node in the Huffman tree. Each `Node` object stores a character (`None` for internal nodes), its frequency, and references to its left and right children.
- The `__lt__` method compares two `Node` objects by frequency, which is what `heapq` uses to order them. `functools.total_ordering` derives the remaining comparison operators from `__lt__` and `__eq__`.


In [1]:
import heapq
import functools

@functools.total_ordering
class Node:
    def __init__(self, char, freq):
        self.char = char
        self.freq = freq
        self.left = None
        self.right = None

    def __lt__(self, other):
        return self.freq < other.freq

    def __eq__(self, other):
        return self.freq == other.freq
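
As a quick check of how this ordering interacts with `heapq`, the following sketch repeats the `Node` class so that it runs on its own, pushes a few nodes onto a heap, and pops them back. The sample characters and frequencies are made up for illustration.

```python
import functools
import heapq

@functools.total_ordering
class Node:
    def __init__(self, char, freq):
        self.char = char
        self.freq = freq
        self.left = None
        self.right = None

    def __lt__(self, other):
        return self.freq < other.freq

    def __eq__(self, other):
        return self.freq == other.freq

# Illustrative sample values (not taken from the text)
heap = []
for char, freq in [('x', 7), ('y', 2), ('z', 5)]:
    heapq.heappush(heap, Node(char, freq))

# Popping always returns the node with the smallest frequency
popped = []
while heap:
    node = heapq.heappop(heap)
    popped.append((node.char, node.freq))

print(popped)  # [('y', 2), ('z', 5), ('x', 7)]
```

Nodes come out in ascending order of frequency, regardless of the order in which they were pushed.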

Next, we build the Huffman tree. Construction is a series of insertions and deletions on a priority queue, typically implemented as a binary heap.
- To build the Huffman tree, we create a min-heap of `Node` objects. A min-heap is a tree-based structure in which every parent has a value less than or equal to the values of its children. This property guarantees that the smallest element is always at the root, so it can be read in constant time and removed in logarithmic time.
- We repeatedly pop the two nodes with the lowest frequencies, merge them into a new node whose frequency is their sum, and push the merged node back onto the heap.
- This process continues until only one node is left, which becomes the root of the Huffman tree. The `build_tree` function implements these steps:

In [2]:
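def build_tree(frequencies):
    # One leaf node per character, arranged as a min-heap keyed on frequency
    heap = [Node(char, freq) for char, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Remove the two least frequent nodes...
        node1 = heapq.heappop(heap)
        node2 = heapq.heappop(heap)
        # ...and merge them under a new internal node (char is None)
        merged = Node(None, node1.freq + node2.freq)
        merged.left = node1
        merged.right = node2
        heapq.heappush(heap, merged)
    return heap[0]  # the root node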

#### Example usage:

In [3]:
frequencies = {'a': 5, 'b': 9, 'c': 12, 'd': 13, 'e': 16, 'f': 45}
root = build_tree(frequencies)
print(root.freq)

100
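
To see the order in which subtrees are merged, the sketch below runs the same loop on plain `(frequency, label)` tuples instead of `Node` objects and records each merge. The tuple representation and the `merge_trace` helper are assumptions made for illustration; they are not part of the implementation above. Tuples break frequency ties by label, but no ties occur with these frequencies.

```python
import heapq

def merge_trace(frequencies):
    """Return the merges as (left_label, right_label, combined_freq), in order."""
    heap = [(freq, char) for char, freq in frequencies.items()]
    heapq.heapify(heap)
    steps = []
    while len(heap) > 1:
        f1, label1 = heapq.heappop(heap)
        f2, label2 = heapq.heappop(heap)
        steps.append((label1, label2, f1 + f2))
        # The merged subtree is labelled with the characters it contains
        heapq.heappush(heap, (f1 + f2, label1 + label2))
    return steps

frequencies = {'a': 5, 'b': 9, 'c': 12, 'd': 13, 'e': 16, 'f': 45}
for left, right, total in merge_trace(frequencies):
    print(f'{left} + {right} -> {total}')
# a + b -> 14
# c + d -> 25
# ab + e -> 30
# cd + abe -> 55
# f + cdabe -> 100
```

The final merge has a combined frequency of 100, matching `root.freq`.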


### Generate the Huffman codes by traversing the tree

In [6]:
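def generate_codes(node, code='', codes=None):
    # Walk the tree depth-first: append '0' going left and '1' going right
    if codes is None:
        codes = {}
    if node is None:
        return codes
    if node.char is not None:
        # Leaf: the accumulated path is this character's code.
        # A single-leaf tree has an empty path, so fall back to '0'.
        codes[node.char] = code or '0'
        return codes
    generate_codes(node.left, code + '0', codes)
    generate_codes(node.right, code + '1', codes)
    return codes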
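
The traversal rule (0 for a left branch, 1 for a right branch) can be tried on a tree small enough to check by hand. The sketch below writes the tree as nested tuples rather than `Node` objects; that representation and the `codes_from_nested` name are assumptions made for illustration only.

```python
def codes_from_nested(tree, prefix=''):
    """Leaves are single-character strings; internal nodes are (left, right) tuples."""
    if isinstance(tree, str):
        # A tree consisting of one leaf has an empty path; use '0' in that case
        return {tree: prefix or '0'}
    left, right = tree
    table = codes_from_nested(left, prefix + '0')
    table.update(codes_from_nested(right, prefix + '1'))
    return table

# 'a' hangs directly off the root; 'b' and 'c' share an internal node
tree = ('a', ('b', 'c'))
print(codes_from_nested(tree))  # {'a': '0', 'b': '10', 'c': '11'}
```

Leaves closer to the root receive shorter codes, which is why the most frequent characters end up with the fewest bits.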

Sample data for Huffman encoding: each symbol is mapped to its probability.

In [7]:
data = {
    'L': 0.45,
    'M': 0.13,
    'N': 0.12,
    'X': 0.16,
    'Y': 0.09,
    'Z': 0.05
}

Build the Huffman tree and generate the Huffman codes

In [8]:
root = build_tree(data)
codes = generate_codes(root)

Print the root node and the Huffman codes

In [9]:
# Print the root of the Huffman tree
print(f'Root of the Huffman tree: {root}')
# Print out the Huffman codes
for char, code in codes.items():
    print(f'{char}: {code}')


Root of the Huffman tree: <__main__.Node object at 0x7b28f2a01570>
L: 0
N: 100
M: 101
Z: 1100
Y: 1101
X: 111
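
One way to sanity-check this table is to confirm that no code is a prefix of another and to compute the expected code length. The sketch below hard-codes the probabilities and codes from the run above so that it stands alone; the names `prefix_free` and `avg_len` are illustrative.

```python
from itertools import permutations

# Probabilities and codes copied from the sample run above
data = {'L': 0.45, 'M': 0.13, 'N': 0.12, 'X': 0.16, 'Y': 0.09, 'Z': 0.05}
codes = {'L': '0', 'N': '100', 'M': '101', 'Z': '1100', 'Y': '1101', 'X': '111'}

# Prefix-free: no code may start with another code
prefix_free = not any(a.startswith(b) for a, b in permutations(codes.values(), 2))

# Expected number of bits per symbol: sum of probability * code length
avg_len = sum(data[char] * len(code) for char, code in codes.items())

print(prefix_free)        # True
print(round(avg_len, 2))  # 2.24
```

A fixed-length code for six symbols needs 3 bits per symbol, so this table averages about 0.76 bits fewer per symbol on data with these probabilities.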


