Information Theory Lab: Entropy
------

----
Binary encodings
----

X = {💃, ⛹, 🚴, 🕴}  
Each outcome has the same probability.

Manually encode each of the values with 2 bits. Pick an optimal encoding.

In [1]:
#💃 {1,0}
#⛹ {0,1}
#🚴 {1,1}
#🕴 {0,0}
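With four equally likely outcomes the entropy is $\log_2 4 = 2$ bits, so a fixed-length 2-bit code is already optimal: no uniquely decodable code can have a shorter average length than the entropy. A quick check in Python, assuming the code table above (the dictionary name `code` is only for illustration):

```python
from math import log2

# The 2-bit code from the cell above; the name `code` is illustrative
code = {"💃": "10", "⛹": "01", "🚴": "11", "🕴": "00"}
p = {x: 1 / 4 for x in code}  # each outcome is equally likely

entropy_bits = -sum(px * log2(px) for px in p.values())
avg_length = sum(p[x] * len(code[x]) for x in code)

print(entropy_bits, avg_length)  # 2.0 2.0
```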

------
X = {🤓, 🤔, 😱, 😡}

With the following probabilities:  
🤓 = 1/2  
🤔 = 1/4  
😱 = 1/8  
😡 = 1/8  

Manually encode each of the values into bits. Pick an encoding that reflects the probability distribution.

In [15]:
from numpy import log2

In [16]:
from math import ceil

In [199]:
#🤓 {1}
#🤔 {0,1}
#😱 {0,0,1}
#😡 {0,0,0}
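For this distribution, $-\log_2 p(x)$ gives lengths of 1, 2, 3 and 3 bits, and a prefix-free code with exactly those lengths has an average length equal to the entropy, 1.75 bits. Prefix-freeness matters for decoding: with 🤓 = 1, 🤔 = 01 and 😡 = 011, the bit string 011 could be read either as 😡 or as 🤔🤓. A small sketch that checks both properties; the codewords and the helper `is_prefix_free` are illustrative choices:

```python
from math import log2

# Probabilities from the text above
p = {"🤓": 1/2, "🤔": 1/4, "😱": 1/8, "😡": 1/8}

# One prefix-free code whose lengths match -log2(p): 1, 2, 3, 3 bits
code = {"🤓": "1", "🤔": "01", "😱": "001", "😡": "000"}

def is_prefix_free(words):
    """True if no codeword is a prefix of a different codeword."""
    words = list(words)
    return not any(a != b and b.startswith(a) for a in words for b in words)

entropy_bits = -sum(px * log2(px) for px in p.values())
avg_length = sum(p[x] * len(code[x]) for x in p)

print(is_prefix_free(code.values()), entropy_bits, avg_length)  # True 1.75 1.75
print(is_prefix_free(["1", "01", "001", "011"]))  # False: "01" is a prefix of "011"
```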

----
dit package
-----

dit is a Python package for information theory.

[RTFM](http://docs.dit.io/en/latest/)



----

Install dit package

At the command line:   
`$ pip install git+https://github.com/dit/dit/#egg=dit`

In [6]:
reset -fs

In [7]:
import dit

In [8]:
# Setup vars for dit packages
outcomes = "🐶 👹 🐯 🐲".split() # Define discrete RV
outcome_probabilities = [0.20, 0.30, 0.25, 0.25] # Create weighted outcomes
assert abs(sum(outcome_probabilities) - 1) < 1e-9 # Sanity check (tolerant of float rounding)
d = dit.Distribution(outcomes, outcome_probabilities) # Create instance

In [9]:
print(d)
print()
print(f"The probability of getting a {outcomes[0]} is: {d[outcomes[0]]}")
print(f"The probability of getting a {outcomes[0]} or {outcomes[1]} is: {d.event_probability([outcomes[0], outcomes[1]])}")

Class:          Distribution
Alphabet:       ('🐯', '🐲', '🐶', '👹') for all rvs
Base:           linear
Outcome Class:  str
Outcome Length: 1
RV Names:       None

x   p(x)
🐯   0.25
🐲   0.25
🐶   0.2
👹   0.3

The probability of getting a 🐶 is: 0.2
The probability of getting a 🐶 or 👹 is: 0.5
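`event_probability` treats the list as an event, i.e. a set of outcomes, and returns the probability that the result is any one of them, which is why 🐶 and 👹 give 0.2 + 0.3 = 0.5. A plain-Python sketch of the same calculation; `prob_of_event` is a hypothetical helper, not part of dit:

```python
# Same outcomes and probabilities as the dit.Distribution above, as a plain dict
pmf = {"🐶": 0.20, "👹": 0.30, "🐯": 0.25, "🐲": 0.25}

def prob_of_event(pmf, event):
    """P(event) for an event given as a collection of outcomes (duplicates ignored)."""
    return sum(pmf[x] for x in set(event))

p_dog_or_demon = prob_of_event(pmf, ["🐶", "👹"])
print(p_dog_or_demon)  # 0.5
```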


In [10]:
print(f"The Shannon entropy of this group is: {dit.shannon.entropy(d)}")

The Shannon entropy of this group is: 1.9854752972273344
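The value dit reports follows directly from the definition of Shannon entropy in bits. Worked by hand for the probabilities 0.25, 0.25, 0.2 and 0.3:

$$
\begin{aligned}
H(X) &= -\sum_{x} p(x)\log_2 p(x) = \sum_{x} p(x)\log_2\frac{1}{p(x)} \\
     &= 2\,(0.25\log_2 4) + 0.2\log_2 5 + 0.3\log_2\tfrac{10}{3} \\
     &\approx 1 + 0.4644 + 0.5211 = 1.9855 \text{ bits}
\end{aligned}
$$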


__TODO__: Write your own Python code to calculate the Shannon entropy

In [63]:
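from numpy import log2  # re-import: `reset -fs` above clears earlier imports

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution given as a list of probabilities."""
    # Skip p_i = 0 terms: by convention 0*log2(0) = 0, whereas numpy evaluates it to nan
    return sum(-p_i * log2(p_i) for p_i in p if p_i > 0)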

In [27]:
H_original = entropy([.25,.25,.2,.3])
H_original

1.9854752972273344

In [28]:
from math import isclose

assert isclose(dit.shannon.entropy(d), H_original)

__TODO__: Change the probabilities to decrease the entropy

In [53]:
H_d = entropy([.025,.025,.05,.9])
H_d

0.61899559358928125

In [54]:
assert(H_d < H_original)
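Concentrating probability on a single outcome, as `[.025,.025,.05,.9]` does, is what lowers the entropy. A small sketch that makes this visible by putting mass `q` on one outcome and spreading the rest evenly over the other three; `entropy_bits` is a hypothetical zero-safe helper written for this illustration:

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits; zero-probability outcomes are dropped (0 * log2(0) = 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.sum(-p * np.log2(p))) + 0.0  # + 0.0 turns a -0.0 result into 0.0

# Put probability q on one outcome and split 1 - q evenly over the other three
qs = [0.25, 0.5, 0.75, 0.9, 1.0]
entropies = [entropy_bits([q] + [(1 - q) / 3] * 3) for q in qs]
for q, h in zip(qs, entropies):
    print(f"q = {q:.2f}  H = {h:.4f} bits")
```

At `q = 0.25` the distribution is uniform (2 bits); at `q = 1.0` the outcome is certain (0 bits).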

__TODO__: Change the probabilities to increase the entropy

In [55]:
H_i = entropy([.25,.25,.26,.24])  # closer to uniform than the original [.25,.25,.2,.3]
round(H_i, 4)

1.9994

In [56]:
assert(H_i > H_original)
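One systematic way to raise the entropy is to move the distribution toward uniform. Because entropy is concave and peaks at the uniform distribution, it rises steadily along the straight path from the original probabilities `[.25,.25,.2,.3]` to uniform. A sketch of that path; the mixing weight `t` and the helper `entropy_bits` are illustrative names:

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits; zero-probability outcomes are dropped (0 * log2(0) = 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.sum(-p * np.log2(p)))

original = np.array([0.25, 0.25, 0.2, 0.3])  # the starting distribution
uniform = np.full(4, 0.25)

# Mix: (1 - t) * original + t * uniform; t = 0 is the original, t = 1 is uniform
ts = [0.0, 0.25, 0.5, 0.75, 1.0]
entropies = [entropy_bits((1 - t) * original + t * uniform) for t in ts]
for t, h in zip(ts, entropies):
    print(f"t = {t:.2f}  H = {h:.4f} bits")
```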

__TODO__: Change the probabilities to have maximum entropy

In [57]:
H_max = entropy([.25,.25,.25,.25])
H_max

2.0

In [58]:
assert(H_max > H_i)
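The uniform distribution is the maximum in general: over $n$ outcomes its entropy is $\log_2 n$, and no other distribution over those $n$ outcomes exceeds it. A sketch that checks this numerically with randomly drawn distributions; the fixed seed and the helper name `entropy_bits` are arbitrary choices for the illustration:

```python
import random
from math import log2

def entropy_bits(p):
    """Shannon entropy in bits; zero-probability terms are skipped (0 * log2(0) = 0)."""
    return sum(-x * log2(x) for x in p if x > 0)

# A uniform distribution over n outcomes has entropy log2(n)
for n in (2, 4, 8):
    print(f"n = {n}: H(uniform) = {entropy_bits([1 / n] * n)}, log2(n) = {log2(n)}")

# Randomly drawn distributions over 4 outcomes never exceed log2(4) = 2 bits
rng = random.Random(0)  # fixed seed so the run is repeatable
random_entropies = []
for _ in range(1000):
    w = [rng.random() for _ in range(4)]
    total = sum(w)
    random_entropies.append(entropy_bits([x / total for x in w]))
print(f"largest entropy among random draws: {max(random_entropies):.4f} bits")
```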

__TODO__: Change the probabilities to have minimum entropy

In [59]:
import numpy as np

In [60]:
H_min = np.nan_to_num(entropy([1.0,0,0,0]))
H_min

0.0

In [61]:
assert(H_min < H_d)
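Applying `np.nan_to_num` to the whole sum gives 0.0 here only because the true minimum entropy happens to be 0. With the direct formula, any zero probability makes the total `nan` (since `0 * log2(0)` is `0 * -inf`), and `nan_to_num` then reports 0.0 even when the entropy is not zero. A sketch contrasting the two approaches on `[0.5, 0.5, 0, 0]`, whose entropy is 1 bit; `naive_entropy` and `safe_entropy` are hypothetical names:

```python
import numpy as np

def naive_entropy(p):
    # Direct formula: 0 * log2(0) evaluates to 0 * -inf = nan in floating point
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log2(p))

def safe_entropy(p):
    # Drop zero-probability outcomes, using the convention 0 * log2(0) = 0
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.sum(-p * np.log2(p)))

with np.errstate(divide="ignore", invalid="ignore"):  # silence the log2(0) warnings
    naive = naive_entropy([0.5, 0.5, 0, 0])

print(naive, np.nan_to_num(naive), safe_entropy([0.5, 0.5, 0, 0]))  # nan 0.0 1.0
```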

In [65]:
# Huffman encoding

----

In [192]:
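class BinaryTree:
    """A node holding a value and up to two children."""
    def __init__(self, node):
        self.node = node    # value stored at this node
        self.left = None
        self.right = None
    def append(self, node):
        # Fill the left slot first, then the right slot
        if self.left is None:
            self.left = node
        elif self.right is None:
            self.right = node
        # Both slots full: the left child's value becomes this node's value,
        # the new value goes on the left, and the right slot is cleared
        else:
            self.node = self.left
            self.left = node
            self.right = None
    def show_tree(self, node=None):
        # Note: besides printing, this method overwrites self.left with self.node
        # before recursing, so calling it changes the tree (see the outputs below)
        if self.left is None or self.right is None or self.node == self.left:
            print('node =', self.node, ' left = ', self.left, 'right = ', self.right)
            return 'finished'
        print('node =', self.node, ' left = ', self.left, 'right = ', self.right)
        self.left = self.node
        return self.show_tree()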

In [193]:
btree = BinaryTree(10)

In [194]:
btree.append(2)
btree.show_tree()

node = 10  left =  2 right =  None


'finished'

In [195]:
btree.append(3)
btree.show_tree()

node = 10  left =  2 right =  3
node = 10  left =  10 right =  3


'finished'

In [196]:
btree.right

3

In [197]:
btree.append(4)
btree.show_tree()
btree.append(5)

node = 10  left =  4 right =  None


In [198]:
btree.show_tree()

node = 10  left =  4 right =  5
node = 10  left =  10 right =  5


'finished'
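The `BinaryTree` cells above do not yet produce a Huffman code. For comparison, here is a minimal sketch of the standard Huffman construction built on `heapq` instead of the tree class (a swapped-in approach, not the author's code): repeatedly merge the two least probable subtrees, prefixing the codewords of one with 0 and of the other with 1. On the 🤓/🤔/😱/😡 distribution it yields codeword lengths 1, 2, 3 and 3, so the average length matches the entropy of 1.75 bits. The function name `huffman_code` is illustrative.

```python
import heapq
from itertools import count
from math import log2

def huffman_code(probs):
    """Build a binary Huffman code for {symbol: probability}; returns {symbol: bitstring}."""
    if len(probs) == 1:  # a single symbol still needs one bit
        return {sym: "0" for sym in probs}
    tiebreak = count()  # unique counter so ties never compare the dicts
    # Each heap entry: (total probability, tiebreak, {symbol: partial codeword})
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)   # least probable subtree
        p2, _, right = heapq.heappop(heap)  # second least probable subtree
        # Merge: prefix 0 onto one subtree's codewords and 1 onto the other's
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

probs = {"🤓": 1/2, "🤔": 1/4, "😱": 1/8, "😡": 1/8}
code = huffman_code(probs)
avg_length = sum(probs[s] * len(code[s]) for s in probs)
entropy_bits = -sum(p * log2(p) for p in probs.values())
print(code)
print(avg_length, entropy_bits)  # 1.75 1.75
```

Using a heap keeps each merge step at O(log n), and the counter in each entry is only there to break ties between subtrees of equal probability.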