Skip to content
Skully edited this page Dec 17, 2021 · 3 revisions

Design Documentation

This design documentation covers the principles behind the core of how each function works and why certain decisions were made during the development process. The overall code structure, functionality and features are outlined in further detail.

I. Generating Huffman Tree

Frequency Class (OOP Object)

The Frequency class exists to allow a Frequency object to be used to store the frequency value and respective character, along with the getters and setters for the object.

Generating Frequency Table

The QueueReferenceBased object is a list that contains key-value associations for specific characters of the alphabet that have a specific frequency based on how likely it is to reoccur within a string.

The tree is generated by iterating through each frequency node within the tree and if there is more than one node within the tree (more than one in this instance means that the table still has other nodes to combine to make the root tree node), grab the value of the two lowest items in the table and sum up their frequency to make their parent node.

Once the parent node is instantiated, the parent of the two lowest nodes used is then set to the new parent node created, this is to maintain all child nodes within a relative association with the node above them and simplifies the process of tree traversal as the tree node object keeps track of what parent it has. At any given time, there will only be one tree node that does not have a parent, and this is the root node at the top of the binary tree.

The iteration continues until every node has been combined and given a parent node until the table only contains one more node; at this case the iteration is completed and the tree is generated.

II. Encoding

The encoding method takes a string as a parameter; the string we want to encode. A tree ‘cache’ system is then created which allows tree nodes to be stored temporarily. To start the iteration and begin building the encoded string, a while loop continues running until it has encoded the same amount of characters that are in the string that was provided via input. This counter which tracks how many characters we have completed is updated whenever a letter is completed.

In the encoding logic, we firstly check to see if the node we are iterating through is a leaf. We can determine if a node is a leaf if it does not have any children to the left or the right. If a node is a leaf, we can accurately depict that we have reached a ‘dead end’ at the end of a tree and we just retrieve the character here or go back up to its parent and turn the other way to continue traversal. We keep traversing through the tree until we reach the leaf that contains the character we want for this current index of the string. Whilst doing all of this, we are updating our tree cache to contain the trees we have visited in the last iteration so we can keep track of nodes we have visited and to ensure we don’t go to the same ‘dead end’ twice.

Once the algorithm reaches a leaf and the character does match the character it is looking for in the current iteration, it must make its way out of the tree; this is done by getting the parent of the present node and continuing up and out of the tree. For every time we go to the left parent, we append a 0 to our final encoded string that will be returned in the end. Every time it traverses to the right, a 1 is appended.

Once we reach the node that does not have a parent, we know that we have reached the top of the tree and the set of 0’s and 1’s that were compiled is the encoding for this character. This process is continued on until every character of the string is encoded.

In the end, we return the string that was obtained after we reverse it. The reason for reversing the order of the encoding is because the path generated is the path to get out of tree (bottom-up) and during decoding, the path traversing down the tree will be required.

III. Decoding

To decode, we take the string as a parameter that we want to decode and then iterate through every character of the string, this is a simple key association of the tree containing the exact route to travel when traversing.

static String decodeString(String textToDecode, TreeNode theTree) {
	TreeNode rootNode = theTree; // Variable to reference the top of the tree since original will be adjusted.

	StringBuilder decodedMessage = new StringBuilder();

	for (int i = 0; i < textToDecode.length(); i++) {
		char currentChar = textToDecode.charAt(i);
		if (currentChar == '0') {
			theTree = theTree.getLeft();
		} else {
			theTree = theTree.getRight();
		}

		if (theTree.getLeft() == null && theTree.getRight() == null) {
			Frequency theFrequency = (Frequency) theTree.getItem();
			decodedMessage.append(theFrequency.getC());
			theTree = rootNode;
		}
	}

	return new StringBuilder(decodedMessage).reverse().toString();
}

Starting at the beginning, if the current character is a 0 we get the left node and traverse to it for the next iteration. If the character is 1, we do the same but to the right node. This process is continued until we reach a leaf node and when we do, we know that this must be the leaf containing the character we need to decode this character.

The process then repeats itself again until every character of the string is iterated through and we have a fully decoded message to return. In this case, we reverse the string again since the order of characters obtained through traversal is done in a top-down manner, meaning the last character was fetched first; flipping the string in reverse order gives the original string.

IV. Program Interface

       

  • In encoding mode, the text in the input box is used as input and sent to the string encoding method, the result returned is displayed in the text box by setting the text and replacing what was previously there.
  • In decoding mode, the exact same process is conducted except the decoding method is instead invoked.

V. Test Cases

Case 1 - Encoding

  • Input: TESTME

  • Expected Result: (string) 01011100001001111101010100

  • Result: (string) 01011100001001111101010100

    Image

Case 2 - Decoding

  • Input: 01011100001001111101010100

  • Expected Result: (string) TESTME

  • Result: (string) TESTME

    Image