-
Couldn't load subscription status.
- Fork 20.7k
Closed
Labels
Description
What would you like to Propose?
I would like to propose adding two fundamental data transformation algorithms, commonly used in compression pipelines, to the compression category:
- BWT (Burrows-Wheeler Transform)
- MTF (Move-to-Front)
These implementations will serve as data transformation tools, enhancing the repository's collection of algorithms related to lossless compression.
Issue details
Algorithm 1: BWT (Burrows-Wheeler Transform)
- Problem Statement: The BWT is a reversible data transformation, not a direct compression algorithm. It rearranges the input data by sorting, which groups similar characters together. This improves the effectiveness of subsequent compression algorithms (like RLE or entropy coding). The well-known
bzip2compression format uses BWT. - Example:
- Input:
"banana$"(where$is a special end-of-string marker) - Output:
("annb$aa", 4)"annb$aa"is the transformed string (the last column of the sorted matrix).4is the 0-based index of the original string within the sorted matrix.
- Input:
- File Path:
src/main/java/com/thealgorithms/compression/BurrowsWheelerTransform.java
Algorithm 2: MTF (Move-to-Front)
- Problem Statement: MTF (Move-to-Front) is also a data transformation, often used after BWT. It converts a data sequence into a sequence of integers. It maintains a dynamic list of characters; when a character occurs, it outputs its index in the list and then moves that character to the front. This results in frequently recurring characters being encoded as small integers, which is beneficial for subsequent entropy encoding.
- Example:
- Assuming the output from the BWT step is
"annb$aa". - Assuming the initial character list (alphabetical) is
['$', 'a', 'b', 'n']. - Input:
"annb$aa" - Output:
[1, 3, 0, 3, 3, 3, 0]
- Assuming the output from the BWT step is
- File Path:
src/main/java/com/thealgorithms/compression/MoveToFront.java