# Explore ANS using toy examples

Given the Decimal System:
- Alphabet: $X = \{0, \dots, 9\}$, uniformly distributed

The entropy per symbol: $$H_p(X_i) = E_p[-\log_2 P(X_i)] = -\log_2 \dfrac{1}{10} = \log_2 10 \approx 3.32 \text{ bits}$$

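The expected codeword length of an optimal symbol code (the average depth of the code tree for alphabet $X$). For ten equally likely symbols, an optimal prefix code assigns 3-bit codewords to six symbols and 4-bit codewords to the other four: $$E_p[l(X_i)] = \dfrac{6 \cdot 3 + 4 \cdot 4}{10} = 3.4 \text{ bits}$$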
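The 3.4-bit figure can be checked with a small Huffman construction. This is a minimal sketch rather than part of the original notebook: the helper name `huffman_code_lengths` is hypothetical, and integer weights stand in for the uniform probabilities to avoid floating-point ties.

```python
import heapq
from math import log2

def huffman_code_lengths(weights: list[int]) -> list[int]:
    # Hypothetical helper: returns the codeword length of each symbol
    # in a Huffman code built from the given integer weights.
    # Heap entries: (subtree weight, tie breaker, symbols in the subtree).
    heap = [(w, i, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    tie_breaker = len(weights)
    while len(heap) > 1:
        w1, _, symbols1 = heapq.heappop(heap)
        w2, _, symbols2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for s in symbols1 + symbols2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, tie_breaker, symbols1 + symbols2))
        tie_breaker += 1
    return lengths

weights = [1] * 10  # ten equally likely decimal digits
lengths = huffman_code_lengths(weights)
expected_length = sum(w * l for w, l in zip(weights, lengths)) / sum(weights)
entropy = log2(len(weights))  # entropy of a uniform distribution over 10 symbols

print(f"codeword lengths: {sorted(lengths)}")
print(f"expected length: {expected_length:.2f} bits | entropy: {entropy:.2f} bits")
```

The gap between the two numbers comes from a symbol code having to spend a whole number of bits on every symbol.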

Can we achieve better compression than the optimal symbol code?


In [35]:
from typing import Iterator

def encode_better_than_symbol_coding(msg: list[int], base: int) -> int:
    # Pack the whole message into a single integer. The leading 1 is a sentinel:
    # it preserves leading zero symbols and marks where decoding must stop.
    compressed = 1
    for symb in msg:
        assert 0 <= symb < base, f"Symbol {symb} is outside the range [0, {base})"
        compressed = compressed * base + symb  # shift by one digit and append the new symbol
    return compressed

def decode_better_than_symbol_coding(compressed: int, base: int) -> Iterator[int]:
    # Symbols come out in reverse order (the last encoded symbol is yielded first).
    while compressed > 1:  # stop once only the sentinel 1 remains
        yield compressed % base  # read the last digit
        compressed = compressed // base  # remove the last digit


initial_msg, base = [3,2,7,5,6,2,4,5,5,6,7,8,6,5,3,2,9,2,3,5], 10
e = encode_better_than_symbol_coding(initial_msg, base)
d = decode_better_than_symbol_coding(e, base)

print(f"initial_msg: {initial_msg}")
print(f"encoded: {e} | binary representation: {e:b}")
print(f"decoded: {list(d)[::-1]}")


print(f"bitrate: {e.bit_length() / len(initial_msg) :.2f}")

initial_msg: [3, 2, 7, 5, 6, 2, 4, 5, 5, 6, 7, 8, 6, 5, 3, 2, 9, 2, 3, 5]
encoded: 132756245567865329235 | binary representation: 1110011001001011100111011011111100010100011001000111001011001010011
decoded: [3, 2, 7, 5, 6, 2, 4, 5, 5, 6, 7, 8, 6, 5, 3, 2, 9, 2, 3, 5]
bitrate: 3.35
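The measured 3.35 bits per symbol sits between the entropy (3.32) and the symbol-code rate (3.4). The remaining gap comes from the leading sentinel `1` and from rounding up to a whole number of bits, a cost that is paid once per message instead of once per symbol. The sketch below illustrates this on random messages of growing length. It is an illustration only: `pack_digits` is a hypothetical stand-in that repeats the encoder's arithmetic so the block runs on its own, and the seed and message lengths are arbitrary choices.

```python
import random
from math import log2

def pack_digits(msg: list[int], base: int) -> int:
    # Hypothetical stand-in for the encoder above: same arithmetic,
    # starting from the sentinel value 1.
    state = 1
    for symbol in msg:
        state = state * base + symbol
    return state

base = 10
rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
rates = {}
for n in (20, 200, 2000, 20000):
    msg = [rng.randrange(base) for _ in range(n)]
    bits = pack_digits(msg, base).bit_length()
    rates[n] = bits / n  # bits per symbol for a message of length n
    print(f"n = {n:>5}: {rates[n]:.4f} bits per symbol")

print(f"entropy: {log2(base):.4f} bits per symbol")
```

Because the packed integer lies in $[10^n, 2 \cdot 10^n)$, its bit length exceeds $n \log_2 10$ by less than 2 bits, so the bitrate approaches $\log_2 10$ as the message grows.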
