Skip to content

[FEATURE REQUEST] Add LZ77 and LZ78 Algorithms to Compression Package #6909

@Microindole

Description

@Microindole

What would you like to Propose?

I would like to propose adding implementations for the following fundamental lossless compression algorithms to the compression category:

  1. LZ77 (Lempel–Ziv 77)
  2. LZ78 (Lempel–Ziv 78)

Adding these algorithms would significantly enhance the collection of classic compression techniques, complementing the existing LZW and Run-Length Encoding algorithms within the package. LZ77 and LZ78 represent key variations in dictionary-based compression and are foundational to many modern compression schemes (like DEFLATE used in gzip and PNG).

Issue details

Algorithm 1: LZ77 (Lempel–Ziv 77)

  • Problem Statement: LZ77 is a lossless data compression algorithm that achieves compression by replacing repeated occurrences of data with references to a single copy of that data existing earlier in the uncompressed data stream (using a sliding window). A match is encoded by a pair of numbers: a distance (offset back in the window) and a length.
  • Example (Conceptual):
    • Input: "ababcabc"
    • Output: A sequence representing encoded data, often as tuples like (offset, length, next_character). The exact output format can vary, but conceptually for this input, it might look something like: (0,0,'a'), (0,0,'b'), (2,2,'c'), (3,3,'<End Marker>') (representing 'a', 'b', then 'ab' found 2 back, followed by 'c', then 'abc' found 3 back).
  • Proposed File Path: src/main/java/com/thealgorithms/compression/LZ77.java

Algorithm 2: LZ78 (Lempel–Ziv 78)

  • Problem Statement: LZ78 is a dictionary-based lossless data compression algorithm. It processes input data sequentially, building a dictionary of phrases encountered so far. It outputs pairs, typically (dictionary_index, next_character), where the index refers to the longest matching phrase already in the dictionary.
  • Example (Conceptual):
    • Input: "ABAABABAABAB"
    • Output: A sequence representing encoded data, often as pairs like (index, character). For this input, a possible output sequence could be: (0,'A'), (0,'B'), (1,'A'), (2,'B'), (4,'A'), (5,'B'). (This implicitly builds a dictionary like: 1:A, 2:B, 3:AA, 4:BA, 5:ABA, 6:BAB).
  • Proposed File Path: src/main/java/com/thealgorithms/compression/LZ78.java

Additional Information

No response

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions