Modeling:
* Zero order
    * Static compression system -
        * The probabilities are independent of the actual
        message to be compressed.
        * Example - character-based model:
            * Alphabet of 256 single characters (8-bit extended ASCII; standard ASCII, the American Standard Code for Information Interchange, defines only 128)
            * pi = 1/256
            * H(P) = -SUM_i=1->256((1/256)*log2(1/256)) = 8.0
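Because a static model fixes pi = 1/256 before it sees any input, every character costs the same -log2(1/256) = 8 bits, whatever the message is. A minimal sketch of that cost, assuming every character of the text is one of the 256 alphabet symbols (`static_cost_bits` is a hypothetical helper name, not part of the lecture code):

```python
from math import log2

def static_cost_bits(text, alphabet_size=256):
    """Ideal number of bits for `text` under a static uniform model.

    Each symbol has probability 1/alphabet_size regardless of the text,
    so each character position costs -log2(1/alphabet_size) bits.
    """
    p = 1 / alphabet_size
    return sum(-log2(p) for _ in text)

print(static_cost_bits("Bring me my bow of burning gold!"))
```

The result is simply 8 bits times the number of characters, which is why a static model cannot exploit anything about a particular message.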

In [1]:
# Functions from lecture 1

from math import log2
from decimal import Decimal

def I(p):
    """
    return the information content of a symbol s
    with probability p. In the best encoding, the
    codeword for s should be -log2(p) bits long
    (ceil(-log2(p)) bits when whole bits are required).
    """
    return -log2(p)



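def H(P):
    """
    return the entropy of the given distribution P,
    H(P) = SUM_i(pi * I(pi))
    """
    # 0 * log2(0) is taken as 0, so zero probabilities are skipped
    return float(sum([Decimal(str(p)) * Decimal(str(I(p))) for p in P if p > 0]))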



def K(C):
    """
    return the Kraft sum of the given code C,
    i.e. SUM_c(2^-len(c)) over its codewords c
    """
    return float(sum([Decimal(str(2**(-len(c)))) for c in C]))

In [2]:
# The message on which we will demonstrate all the concepts in this lecture

Message =\
"""\
Bring me my bow of burning gold!
Bring me my arrows of desire!
Bring me my spear! O clouds unfold!
Bring me my chariot of fire!
"""

In [3]:
# Static Compression System Example
# PDF 2 slide 13

P = [1/256] * 256
print("pi = 1/256")
print("H(P) = -SUM_i=1->256((1/256)*log2(1/256)) = {0}".format(H(P)))

pi = 1/256
H(P) = -SUM_i=1->256((1/256)*log2(1/256)) = 8.0


Modeling:
* Zero order
    * Semi-Static compression system -
        * Examines the message in a preliminary pass to
        derive the symbol set, and includes a description of
        those symbols in a prelude to the compressed data.
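The two passes can be sketched as follows. This is a minimal illustration, assuming a prelude that is just the sorted symbol set and fixed-length codewords of ceil(log2 n) bits per symbol; the prelude format and the names `semi_static_encode` / `semi_static_decode` are hypothetical, a real system would also have to serialize the prelude as bits, and an empty text is not handled.

```python
from math import ceil, log2

def semi_static_encode(text):
    """Return (prelude, bits) for `text` using fixed-length codewords.

    Pass 1 derives the symbol set (the prelude); pass 2 replaces every
    symbol with its index in that set, written in ceil(log2 n) bits.
    """
    symbols = sorted(set(text))                # pass 1: the prelude
    width = max(1, ceil(log2(len(symbols))))   # at least 1 bit, even when n = 1
    code = {s: format(i, "0{}b".format(width)) for i, s in enumerate(symbols)}
    bits = "".join(code[t] for t in text)      # pass 2: the compressed data
    return symbols, bits

def semi_static_decode(symbols, bits):
    """Invert semi_static_encode using only the prelude and the bit string."""
    width = max(1, ceil(log2(len(symbols))))
    return "".join(symbols[int(bits[i:i + width], 2)]
                   for i in range(0, len(bits), width))

prelude, bits = semi_static_encode("Bring me my bow of burning gold!")
print(len(prelude), len(bits))
```

The decoder needs nothing but the prelude to rebuild the codebook, which is the point of transmitting it ahead of the data.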

In [4]:
def symboal_set(T):
    """
    Return the symbol set (python list) of
    all the symbols that appear in text T
    """
    # sorted() gives the same order on every run (set iteration order is not guaranteed)
    return sorted(set(T))

In [5]:
# Semi-Static Compression model Example
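# PDF 2 slide 14

from math import ceil

ALPHABET = symboal_set(Message)  # symbol set, described in the prelude
n        = len(ALPHABET)
P        = [1/n] * n             # uniform over the n symbols that occur
# fixed-length codewords: the binary index of each symbol, ceil(log2(n)) bits wide
C        = [bin(i)[2:].zfill(ceil(log2(n))) for i in range(n)]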

print("pi = 1/{0}".format(n))
print("H(P) = -SUM_i=1->{}((1/{})*log2(1/{})) = {:0.2f}".format(n, n, n, H(P)))
print("Codewords C: {}".format(C))
print("K(C) = {:0.2f}".format(K(C)))

pi = 1/25
H(P) = -SUM_i=1->25((1/25)*log2(1/25)) = 4.64
Codewords C: ['00000', '00001', '00010', '00011', '00100', '00101', '00110', '00111', '01000', '01001', '01010', '01011', '01100', '01101', '01110', '01111', '10000', '10001', '10010', '10011', '10100', '10101', '10110', '10111', '11000']
K(C) = 0.78
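The printed K(C) = 0.78 is exactly 25/32: all 25 codewords are 5 bits long, so each contributes 2^-5 = 1/32. A value below 1 means the code is not complete (7 of the 32 five-bit words go unused), while any prefix code must satisfy the Kraft inequality K(C) <= 1. A small sketch that redoes the sum exactly with `fractions.Fraction` and checks the prefix property by brute force (`kraft_sum` and `is_prefix_free` are hypothetical helper names, not part of the lecture code):

```python
from fractions import Fraction
from math import ceil, log2

def kraft_sum(code):
    """Exact Kraft sum: the sum of 2^-len(c) over all codewords c."""
    return sum(Fraction(1, 2 ** len(c)) for c in code)

def is_prefix_free(code):
    """True if no codeword is a prefix of another (duplicates count as a violation)."""
    return not any(i != j and b.startswith(a)
                   for i, a in enumerate(code)
                   for j, b in enumerate(code))

# The fixed-length code used above: 25 codewords of ceil(log2 25) = 5 bits
n = 25
C = [format(i, "0{}b".format(ceil(log2(n)))) for i in range(n)]
print(kraft_sum(C), is_prefix_free(C))  # 25/32 True
```

The all-pairs check is quadratic, which is fine for a 25-word code and keeps the definition of "prefix-free" visible in the code.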


Modeling:
* Zero order
    * Semi-Static Self-Probability Compression System -
        * pi = vi/m
        * where vi - number of occurrences of si
        * and m - message size
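Summing the information content -log2(pi) of every character in the message gives the ideal size of the compressed data under this model. Since symbol si occurs vi times, the sum regroups into SUM_i(vi * -log2(pi)) = m * SUM_i((vi/m) * -log2(pi)) = m·H(P). A short self-contained check of that identity; the function names are hypothetical, and the prelude that would carry the vi values is not counted:

```python
from collections import Counter
from math import isclose, log2

def self_probabilities(text):
    """Map each symbol si of `text` to pi = vi/m."""
    m = len(text)
    return {s: v / m for s, v in Counter(text).items()}

def ideal_size_bits(text):
    """Sum of -log2(pi) over every character position of `text`."""
    p = self_probabilities(text)
    return sum(-log2(p[t]) for t in text)

def entropy(probs):
    """H(P) = -SUM_i(pi * log2(pi)) over the given probabilities."""
    return -sum(p * log2(p) for p in probs)

text = "Bring me my bow of burning gold!"
P = self_probabilities(text)
print(ideal_size_bits(text), len(text) * entropy(P.values()))
```

Both printed numbers agree up to floating-point rounding, which is why H(P) can be read as the average number of bits per symbol.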

In [6]:
from collections import Counter
from fractions import Fraction

def symbols_and_probability(T):
    """
    Return list of pairs (si, pi=vi/m)
    where si - is a symbol in text T
          pi - probability of si in T (a Fraction).
    """
    # Counter keeps the symbols in order of first appearance
    counts = Counter(T)
    m = len(T)

    # Fraction(v, m) is reduced to lowest terms (4/128 becomes 1/32);
    # the private _normalize=False argument that kept it unreduced
    # is no longer accepted as of Python 3.12
    return [(s, Fraction(v, m)) for s, v in counts.items()]

In [7]:
# Semi-Static Self-Probability Compression model Example
# PDF 2 slide 16 & 17

from tabulate import tabulate

def special_char(c):
    if c == '\n':
        return '\\n'
    elif c == ' ':
        return '\' \''
    else:
        return c

si_pi = symbols_and_probability(Message)
ALPHABET = [special_char(si) for si,_ in si_pi]
P = [float(pi) for _, pi in si_pi]
# show pi unreduced as vi/m (e.g. 4/128), as on the slides
m = len(Message)
P_for_view = ["{}/{}".format(pi * m, m) for _, pi in si_pi]

print(tabulate({'si':ALPHABET, 'pi':P_for_view}, headers="keys", tablefmt="pretty"))
print("H(P)={:0.2f}".format(H(P)))

+-----+--------+
| si  |   pi   |
+-----+--------+
|  B  | 4/128  |
|  r  | 11/128 |
|  i  | 8/128  |
|  n  | 7/128  |
|  g  | 6/128  |
| ' ' | 22/128 |
|  m  | 8/128  |
|  e  | 8/128  |
|  y  | 4/128  |
|  b  | 2/128  |
|  o  | 9/128  |
|  w  | 2/128  |
|  f  | 5/128  |
|  u  | 3/128  |
|  l  | 3/128  |
|  d  | 4/128  |
|  !  | 5/128  |
| \n  | 4/128  |
|  a  | 3/128  |
|  s  | 4/128  |
|  p  | 1/128  |
|  O  | 1/128  |
|  c  | 2/128  |
|  h  | 1/128  |
|  t  | 1/128  |
+-----+--------+
H(P)=4.22
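The three zero-order models give 8.0, about 4.64 and about 4.22 bits per symbol for this message. The ordering is expected: for any distribution over n symbols H(P) <= log2(n), with equality exactly for the uniform distribution, and log2(n) <= 8 whenever n <= 256. A minimal sketch that computes the three figures for an arbitrary text, assuming every character belongs to the 256-symbol alphabet (`zero_order_rates` is a hypothetical helper name):

```python
from collections import Counter
from math import log2

def zero_order_rates(text, alphabet_size=256):
    """Bits per symbol of the three zero-order models for `text`.

    static:           uniform over the whole alphabet  -> log2(alphabet_size)
    semi-static:      uniform over the n used symbols  -> log2(n)
    self-probability: pi = vi/m                        -> H(P)
    """
    counts = Counter(text)
    m = len(text)
    n = len(counts)
    h = -sum((v / m) * log2(v / m) for v in counts.values())
    return log2(alphabet_size), log2(n), h

static, semi, self_prob = zero_order_rates("Bring me my bow of burning gold!")
print("{:.2f} >= {:.2f} >= {:.2f}".format(static, semi, self_prob))
```

Note that the semi-static fixed-length code actually spends ceil(log2(n)) whole bits per symbol (5 for this message); log2(n) is the entropy of its uniform model.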
