Huffman Code Generation

In [None]:
import heapq

# Huffman Tree Node
class Node:
    
    def __init__(self, freq, symbol, left=None, right=None):
        self.freq = freq
        self.symbol = symbol
        self.left = left
        self.right = right
        self.huff = ''

    def __lt__(self, nxt):
        # heapq orders its elements with "<", so compare nodes by frequency
        return self.freq < nxt.freq

# Function to print Huffman codes
def printNodes(node, val=''):
    
    newVal = val + str(node.huff)

    # Traverse the tree
    if node.left:
        printNodes(node.left, newVal)
    
    if node.right:
        printNodes(node.right, newVal)

    # Leaf node condition
    if not node.left and not node.right:
        print(f"{node.symbol} -> {newVal}")

# Function to generate the Huffman Tree
def gen_huffman_tree(chars, freq):
    
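    if not chars:
        raise ValueError("at least one symbol is required")

    nodes = []

    # Create a priority queue (min-heap) keyed on the frequencies
    for symbol, f in zip(chars, freq):
        heapq.heappush(nodes, Node(f, symbol))

    # A lone symbol would otherwise get an empty code, so give it one bit
    if len(nodes) == 1:
        nodes[0].huff = '0'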

    while len(nodes) > 1:
        
        # Extract two nodes with the smallest frequency
        left = heapq.heappop(nodes)
        right = heapq.heappop(nodes)

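        # Label the edges: '0' for the left child, '1' for the right child
        # (strings, to match the '' default set in Node.__init__)
        left.huff = '0'
        right.huff = '1'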

        # Combine the nodes into a new parent node
        newNode = Node(left.freq + right.freq, left.symbol + right.symbol, left, right)
        heapq.heappush(nodes, newNode)

    # Return the root of the tree
    return nodes[0]

def main():
    
    while True:
        print("\nHuffman Tree Generator:")
        print("1. Enter char and their freq")
        print("2. Exit")
        
        ch = input("Enter your choice: ")

        if ch == '1':
            
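            # Strip stray spaces so "x, y" yields the symbols 'x' and 'y'
            chars = [c.strip() for c in input("Enter char (comma-separated): ").split(',')]
            try:
                freq = [int(f) for f in input("Enter freq (comma-separated): ").split(',')]
            except ValueError:
                print("Error: freq must be integers.")
                continue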

            if len(chars) != len(freq):
                print("Error: num of char and freq must match.")
                continue

            print("\nGenerating Huffman Tree...")
            root = gen_huffman_tree(chars, freq)
            print("\nHuffman Codes:")
            printNodes(root)

        elif ch == '2':
            break

        else:
            print("Invalid choice.")

if __name__ == "__main__":
    main()



Huffman Tree Generator:
1. Enter char and their freq
2. Exit
Enter your choice: 1
Enter char (comma-separated): x,y,z,w
Enter freq (comma-separated): 10,15,20,25

Generating Huffman Tree...

Huffman Codes:
x -> 00
y -> 01
z -> 10
w -> 11



In [None]:
# Huffman Coding Explanation

"""
1. Creating the Node Class
   Purpose: Represents one node of the Huffman tree. A leaf holds a single symbol; an internal node holds the combined weight of its subtree.
   Attributes:
   - freq: Frequency of the character (for an internal node, the sum of its children's frequencies).
   - symbol: Character associated with the frequency (for an internal node, the concatenation of its children's symbols).
   - left and right: Left and right children of the node (None for a leaf).
   - huff: Bit (0 or 1) assigned to the node during tree generation; it stays empty for the root.
   Method:
   - __lt__: Compares nodes by frequency so that heapq can order them in the priority queue (min-heap).
"""

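To illustrate why __lt__ matters, here is a minimal sketch. The Item class is a hypothetical stand-in for Node (it is not part of the notebook) and keeps only the two fields needed to show how heapq relies on "<".

```python
import heapq

class Item:
    """Hypothetical stand-in for Node: a frequency and a symbol only."""

    def __init__(self, freq, symbol):
        self.freq = freq
        self.symbol = symbol

    def __lt__(self, other):
        # heapq needs nothing more than "<" to order its elements
        return self.freq < other.freq

heap = []
for freq, symbol in [(20, "z"), (10, "x"), (25, "w"), (15, "y")]:
    heapq.heappush(heap, Item(freq, symbol))

# Popping repeatedly yields the items in ascending frequency order
order = [heapq.heappop(heap).symbol for _ in range(len(heap))]
print(order)  # ['x', 'y', 'z', 'w']
```

Without __lt__, pushing a second Item would raise a TypeError, because Python has no default ordering for instances of a user-defined class.
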
"""
2. Printing Huffman Codes (printNodes)
   Recursive Function: Traverses the tree using a depth-first approach.
   Logic:
   - Appends the binary value (0 or 1) stored in node.huff to the current code (val) while traversing the tree.
   - Prints the code once a leaf node is reached (no children).
"""

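The same depth-first idea can return the codes instead of printing them. The sketch below is an illustration only: collect_codes is a hypothetical helper, and the tree is modelled as nested (left, right) tuples rather than Node objects so that the block runs on its own.

```python
def collect_codes(tree, prefix=""):
    """Map every leaf symbol to its root-to-leaf bit string.

    `tree` is either a symbol (a leaf) or a (left, right) pair.
    """
    if not isinstance(tree, tuple):
        # Leaf: the accumulated path is the code ("0" if the tree is one leaf)
        return {tree: prefix or "0"}
    left, right = tree
    codes = collect_codes(left, prefix + "0")        # going left appends 0
    codes.update(collect_codes(right, prefix + "1"))  # going right appends 1
    return codes

# Same shape as the tree built for x, y, z, w in the run above
tree = (("x", "y"), ("z", "w"))
codes = collect_codes(tree)
print(codes)  # {'x': '00', 'y': '01', 'z': '10', 'w': '11'}

encoded = "".join(codes[s] for s in "wxyz")
print(encoded)  # 11000110
```

Returning a dictionary makes the codes reusable, for example to encode a message as in the last two lines.
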
"""
3. Generating the Huffman Tree (gen_huffman_tree)
   Inputs: Characters (chars) and their corresponding frequencies (freq).
   Steps:
   1. Initialization: Each character and its frequency are wrapped in a Node object and added to a priority queue (nodes) using heapq.
   2. Tree Construction:
      - The two nodes with the smallest frequencies are extracted (left and right).
      - huff = 0 is assigned to the left node and huff = 1 to the right node.
      - A new parent node is created:
        * Frequency = Sum of frequencies of left and right:
          f_parent = f_left + f_right
        * Symbol = Concatenation of left.symbol and right.symbol.
        * Children: left and right.
      - The parent node is added back to the heap.
   3. Completion: The process repeats until only one node remains in the heap, which becomes the root of the Huffman tree.
"""

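The construction steps can be traced on the sample input from the run above. This is a sketch under one assumption: instead of Node objects it stores (frequency, counter, symbol) tuples, where the counter is an added tie-breaker that makes the order of equal frequencies deterministic. merge_trace is a hypothetical name.

```python
import heapq
from itertools import count

def merge_trace(chars, freq):
    """Return the sequence of merges as (left, right, combined_frequency)."""
    tie = count()  # assumption: insertion order breaks frequency ties
    heap = [(f, next(tie), c) for c, f in zip(chars, freq)]
    heapq.heapify(heap)

    steps = []
    while len(heap) > 1:
        f_left, _, left = heapq.heappop(heap)    # smallest frequency
        f_right, _, right = heapq.heappop(heap)  # second smallest
        steps.append((left, right, f_left + f_right))
        # The parent carries the summed frequency and concatenated symbols
        heapq.heappush(heap, (f_left + f_right, next(tie), left + right))
    return steps

steps = merge_trace(["x", "y", "z", "w"], [10, 15, 20, 25])
for left, right, total in steps:
    print(f"merge {left} + {right} -> {total}")
# merge x + y -> 25
# merge z + w -> 45
# merge xy + zw -> 70
```

Four symbols need three merges (n - 1), and the final combined frequency, 70, is the sum of all input frequencies.
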
"""
4. Main Function
   Provides a user interface for the algorithm:
   - Option to input characters and their frequencies.
   - Generates and displays the Huffman codes.
"""



"""
Theoretical Explanation:
Huffman Coding Basics:
- Principle: Assign shorter binary codes to more frequent characters and longer codes to less frequent ones. No code is a prefix of another, so the encoded bit stream can be decoded unambiguously.

Encoding:
- The binary code for each character is determined by traversing the tree from the root to the leaf node corresponding to the character.
- Left traversal adds 0, and right traversal adds 1.

Mathematical Representation:
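- Total Encoded Length (in bits):
    L = Σ (f_i ⋅ d_i) for i = 1 to n
    where:
    * f_i: Frequency of the i-th character.
    * d_i: Depth of the i-th character's leaf in the Huffman tree, which equals the length of its code.

- Optimality:
  L is the weighted path length (WPL) of the tree. Among all prefix codes for the given frequencies, Huffman coding produces one with minimum WPL:
    WPL = Σ (f_i ⋅ d_i) for i = 1 to n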
"""

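The formula for L can be checked numerically. The sketch below is illustrative: code_lengths is a hypothetical helper, the frequencies are made-up values rather than data from the notebook, and the input is assumed to be non-empty. It computes each d_i during the merges and then evaluates Σ (f_i ⋅ d_i).

```python
import heapq
from itertools import count

def code_lengths(freqs):
    """Return {symbol: depth} for a Huffman tree built from {symbol: frequency}."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(f, next(tie), {s: 0}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Every symbol under the new parent moves one level deeper
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

freqs = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}  # illustrative
depths = code_lengths(freqs)

wpl = sum(freqs[s] * depths[s] for s in freqs)  # L = sum of f_i * d_i
fixed = sum(freqs.values()) * 3                 # 6 symbols need 3 bits each
print(depths["a"], depths["f"])  # 1 4
print(wpl, fixed)                # 224 300
```

The most frequent symbol gets a 1-bit code and the rarest a 4-bit code, which brings the total from 300 bits with a fixed-length code down to 224.
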


"""
Time and Space Complexity:
1. Time Complexity:
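   - Heap Construction:
     * Building the initial heap with n heappush calls takes O(n log n), where n is the number of characters (heapq.heapify would build it in O(n)).
   - Tree Construction:
     * The loop runs n - 1 times; each iteration performs two pops and one push, each costing O(log n).
     Total = O(n log n).
   - Overall: O(n log n), not counting the cost of concatenating the symbol strings for each parent node.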

2. Space Complexity:
   - The heap requires O(n) space.
   - The tree requires O(n) space: n leaves plus n - 1 internal nodes.
   - Overall: O(n), not counting the concatenated symbol strings stored in the internal nodes.
"""

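The two ways of building the initial heap can be compared directly. This is a small sketch with illustrative frequencies; it checks only that both approaches produce a valid min-heap, not their running times.

```python
import heapq

freqs = [25, 10, 20, 15, 5, 30]  # illustrative frequencies

# Option 1: n pushes at O(log n) each, as gen_huffman_tree does
pushed = []
for f in freqs:
    heapq.heappush(pushed, f)

# Option 2: copy the list and heapify it in place, O(n) overall
heapified = list(freqs)
heapq.heapify(heapified)

def drain(heap):
    """Pop everything; a valid min-heap yields ascending order."""
    return [heapq.heappop(heap) for _ in range(len(heap))]

from_pushes = drain(pushed)
from_heapify = drain(heapified)
print(from_pushes == from_heapify == sorted(freqs))  # True
```

Switching to heapify does not change the overall O(n log n) bound, because the n - 1 merges still dominate.
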


"""
Applications of Huffman Coding:
1. File Compression:
   - Used as the entropy-coding stage of DEFLATE (ZIP, PNG) and in JPEG to reduce file sizes.
2. Data Transmission:
   - Reduces the number of bits sent over a network when symbol frequencies are uneven.
3. Audio Encoding:
   - Used in audio codecs like MP3.
4. Text Encoding:
   - Useful for compact text representation. Morse code follows a similar idea (shorter codes for frequent letters), although it is not a prefix code.
"""



"""
Exam-Style Questions:
1. What is the purpose of the __lt__ method in the Node class?
   Answer: heapq compares elements with the < operator, so __lt__ lets Node objects be ordered by frequency when they are pushed onto the heap. This keeps the node with the smallest frequency at the top.

2. How does Huffman coding achieve compression?
   Answer: It assigns shorter binary codes to characters with higher frequencies and longer codes to less frequent characters, thereby reducing the total number of bits used.

3. Why do we use a priority queue for Huffman coding?
   Answer: A priority queue ensures that the two nodes with the smallest frequencies can be efficiently extracted to construct the tree.

4. What happens if all characters have the same frequency?
   Answer: The Huffman tree is still constructed, but it comes out as balanced as possible, so the code lengths differ by at most one bit. The result is no better than a fixed-length code, so there is little or no compression.

5. What is the time complexity of inserting an element into the heap?
   Answer: O(log n), where n is the current size of the heap.

6. Explain how the weighted path length (WPL) is minimized in Huffman coding.
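   Answer: The algorithm is greedy: it repeatedly merges the two least frequent nodes, so the rarest characters end up deepest in the tree and the most frequent ones stay nearest the root. This minimizes the sum of the products of frequencies and depths (f_i ⋅ d_i) over all prefix codes.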
"""

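Question 4 can be checked with a short experiment. In this sketch, huffman_lengths is a hypothetical helper that returns only the code lengths, assumes at least two symbols, and breaks frequency ties by insertion order.

```python
import heapq
from itertools import count

def huffman_lengths(freqs):
    """Code length for each symbol, aligned with `freqs` (needs len >= 2)."""
    tie = count()
    lengths = [0] * len(freqs)
    heap = [(f, next(tie), [i]) for i, f in enumerate(freqs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        for i in a + b:
            lengths[i] += 1  # each merge pushes these leaves one level down
        heapq.heappush(heap, (f1 + f2, next(tie), a + b))
    return lengths

four = sorted(huffman_lengths([1, 1, 1, 1]))
five = sorted(huffman_lengths([1, 1, 1, 1, 1]))
print(four)  # [2, 2, 2, 2]
print(five)  # [2, 2, 2, 3, 3]
```

With four equal frequencies every code is 2 bits, exactly like a fixed-length code; with five, the lengths differ by one bit at most.
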