huffman_encoder/decoder NU EECS214 data structures course with github.com/tov/eecs214. Command line tool that can encode a given file into huffman characters, and is accompanied by a decoding file that can reconstruct the huffman tree and the original file. This algorithm compresses the character byte values by frequency into n-bit length characters. The encoded characters will be nonsensical as interpreted by ascii or a unicode language, so to read the encoded message you must decode it!
In mac terminal open the folder containg the pertinent HUFF.PY PUFF.PY HUFFMAN_CLASSES.PY.
You must specify the input file, and an output file (such as input.huff)
ENCODE COMMAND: $ python huff.py input.txt input.txt.huff
DECODE COMMAND: $ python puff.py input.huff input.txt.puff
ENCODE METHOD:
-
Read infile, count frequency of each character.
-
Insert! each (character, frequency) into a Priority Queue. (forest of leaves.)
-
Make a new branch using the characters (trees) with the lightest frequency.
-
Replace those two leaves (char, freqs) with the new branch.
-
Repeat steps 3 and 4 until the Priority queue has one tree. This is the root of the tree.
-
Walk down the tree. That is, at every node, assign a bit value, left children get 1, right children get 0. Now each character will have a bit-length relative to its frequency in the message.
-
Write the length of the encoded message in 32 bits.
-
Serialize the tree. Perform an inorder traversal, for every branch, write a 1 to the outfile. for every leaf, write a 0 and the origin ascii character in 8 bits.
-
Read infile again, for each character write it's encoded counterpart to the outfile.
DECODE METHOD:
-
Read infile
-
first 32 bits are encoded message length.
-
Deserialize the huffman tree into full use by reverse engineering the sequence of bits.
-
Read in the rest of the encoded message using a for loop with the length from step 2. For each bit read in, traverse the tree and when a leaf is hit, print the leaf's character.