### Run-length Encoding
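One application of algebra and basic math is **compression**: storing data in less space than it originally takes. One of the simplest forms of compression is [run-length encoding](https://en.wikipedia.org/wiki/Run-length_encoding) (RLE), which replaces each run of repeated symbols with the symbol and the length of the run. In the variant used here, the count follows the symbol and a count of 1 is omitted, so `AAAB` becomes `A3B`.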
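
To make the idea of a "run" concrete, here is a minimal sketch that splits a string into (symbol, run length) pairs using the standard library's `itertools.groupby`. The helper name `runs` is hypothetical and not part of the exercise; an RLE encoder only has to write each pair out.

```python
from itertools import groupby


def runs(text: str) -> list[tuple[str, int]]:
    """Split `text` into (symbol, run length) pairs (hypothetical helper)."""
    # groupby yields one group per block of consecutive equal characters.
    return [(symbol, len(list(group))) for symbol, group in groupby(text)]


print(runs("AAAB"))  # [('A', 3), ('B', 1)]
```

Writing each pair as the symbol followed by its count, and dropping counts of 1, gives `A3B`.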

Write a function that encodes a given text, and another that decodes it.

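RLE is not very useful in the general case: in ordinary text most runs have length 1, so there is little to compress. It can be extremely useful when there are very few distinct symbols, because runs of the same symbol are then more likely. DNA and protein sequences are examples of this; DNA, for instance, uses only 4 characters (A, C, G and T).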
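
A quick way to see this is to compare encoded and original lengths. The sketch below uses a compact encoder named `rle` (hypothetical, built on `itertools.groupby`, following the same convention of omitting counts of 1) on ordinary English text and on a run-heavy string.

```python
from itertools import groupby


def rle(text: str) -> str:
    """Run-length encode `text`: each run becomes the symbol plus its count if > 1."""
    parts = []
    for symbol, group in groupby(text):
        n = len(list(group))
        parts.append(symbol + (str(n) if n > 1 else ""))
    return "".join(parts)


for sample in ["hello world", "AAAAAAAACCCCGGGGGGTT"]:
    encoded = rle(sample)
    print(f"{sample!r}: {len(sample)} -> {len(encoded)} chars ({encoded!r})")
# 'hello world': 11 -> 11 chars ('hel2o world')
# 'AAAAAAAACCCCGGGGGGTT': 20 -> 8 chars ('A8C4G6T2')
```

With counts of 1 omitted, the encoding is never longer than the input (a run of length n ≥ 2 costs 1 character plus the digits of n, which is at most n), but it only gets shorter when runs of 3 or more occur.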

Test your encoding and decoding functions on a DNA sequence (you can find some online). Measure how much your data is compressed, for example as the ratio of the encoded length to the original length.
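
One way to set up the measurement is sketched below. To keep the block self-contained it defines compact stand-ins for `encode` and `decode` built on `itertools.groupby` and `re` (a swapped-in implementation, not the notebook's loop-based one); the helper `compression_ratio` is hypothetical. The sequence is a made-up toy string, not real DNA data, so replace it with a sequence you downloaded.

```python
import re
from itertools import groupby


def encode(text: str) -> str:
    """Stand-in encoder: each run becomes the symbol plus its count if > 1."""
    out = []
    for symbol, group in groupby(text):
        n = len(list(group))
        out.append(symbol + (str(n) if n > 1 else ""))
    return "".join(out)


def decode(text: str) -> str:
    """Stand-in decoder: each non-digit symbol, optionally followed by a count."""
    return "".join(s * int(n or 1) for s, n in re.findall(r"(\D)(\d*)", text))


def compression_ratio(original: str) -> float:
    """Encoded length divided by original length (lower means better compression)."""
    if not original:
        raise ValueError("original must be non-empty")
    encoded = encode(original)
    # Check the round trip before trusting the number.
    if decode(encoded) != original:
        raise ValueError("round trip failed")
    return len(encoded) / len(original)


# Toy string for illustration only; NOT a real DNA sequence.
toy_sequence = "G" * 4 + "A" * 4 + "T" * 6 + "CC" + "GA" + "T" * 10 + "AC"
print(f"ratio: {compression_ratio(toy_sequence):.2f}")  # ratio: 0.50
```

Because counts are written as digits, this format cannot round-trip text that itself contains digits: encoding `A11` gives `A12`, which decodes to twelve `A`s. DNA sequences contain no digits, so this is not a problem here.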

```python
def encode(text: str) -> str:
    """
    Return the run-length encoded version of `text`.

    Each run of identical characters becomes the character followed by the
    run length; a length of 1 is omitted (e.g. "AABCCC" -> "A2BC3").
    Assumes `text` contains no digits, since digits are used for the counts.
    """
    # Store information in these variables during each iteration.
    output: str = ""
    prev_char: str = ""
    counter: int = 0

    # Iterate over each character of the input.
    for c in text:
        # CASE 1: Same character as the previous one. Increment the counter.
        if c == prev_char:
            counter += 1
        # CASE 2: Different character. Update the `output` and reset the counter.
        else:
            # Add the previous run to the `output`, with its count if it is > 1.
            # (On the first iteration `prev_char` is "" and `counter` is 0, so nothing is added.)
            output += _encode_char_segment(prev_char, counter)
            # A new run starts. Reset the counter.
            prev_char = c
            counter = 1

    # Add the final run to the `output` as well.
    output += _encode_char_segment(prev_char, counter)
    return output


def _encode_char_segment(char: str, counter: int) -> str:
    """Encode one run: the character, followed by its count if the count is > 1."""
    return char + (str(counter) if counter > 1 else "")


def decode(text: str) -> str:
    """
    Decode text produced by `encode`: each character is optionally followed
    by its run length, and a missing run length means 1.
    """
    # Store information in these variables during each iteration.
    output: str = ""
    last_char: str = ""
    # The count may have several digits (e.g. "12"), so collect it as a string.
    counter: str = ""

    # Iterate over each character of the input.
    for c in text:
        # CASE 1: A digit. Append it to the counter (the count may span several digits).
        if c.isdigit():
            counter += c
        # CASE 2: A new character. Update the `output` and reset the counter.
        else:
            # Add the previous character to the `output`, as many times as the counter says.
            output += _decode_char_segment(last_char, counter)
            # A new character starts. Reset the counter.
            last_char = c
            counter = ""

    # Add the final character to the `output` as well.
    output += _decode_char_segment(last_char, counter)
    return output


def _decode_char_segment(char: str, counter: str) -> str:
    """Expand one run: repeat `char` int(counter) times, or once if `counter` is empty."""
    count = int(counter) if counter else 1
    return char * count
```

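```python
# Tests
# Test that the functions work on their own
assert encode("AABCCCDEEEE") == "A2BC3DE4"
assert decode("A2BC3DE4") == "AABCCCDEEEE"

# Test multi-digit counts and the empty string
assert encode("A" * 12) == "A12"
assert decode("A12") == "A" * 12
assert encode("") == "" and decode("") == ""

# Test that the functions really invert each other
assert decode(encode("AABCCCDEEEE")) == "AABCCCDEEEE"
assert encode(decode("A2BC3DE4")) == "A2BC3DE4"
```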