## Probability, entropy, and Kullback-Leibler divergence

Images are made of pixels, which are dots on a screen; an image is formed when pixels are arranged in a specific pattern. Image resolution is determined by the number of pixels the image contains.

All else being equal, an image with more pixels can represent finer detail, so the number of pixels is often used as a rough measure of image quality.

Image generation works by modelling how an image's pixels are distributed, that is, the pattern or probability distribution of pixel values in a space.

### Probability and Shannon entropy
A fair coin has two possible outcomes, heads or tails, each with a 50% probability of occurring. To model the probability of a coin system:

- we can represent the outcome of a coin toss as a discrete random variable $X$ with a probability mass distribution (PMD) $P(X) = \{0.5; 0.5\}$
- if we have an unfair coin with a PMD of $P(X) = \{0.65; 0.35\}$ and toss it $n$ times, we can calculate the probability of getting heads exactly $x$ times using a binomial distribution function.
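
The binomial calculation mentioned in the list above can be written out as a formula. Here $p$ is the probability of heads on a single toss, $n$ the number of tosses, and $x$ the number of heads, following the argument names of `binomial_distribution_function` in the code cells of this section:

$$
P(x \text{ heads in } n \text{ tosses}) = \binom{n}{x} \, p^{x} \, (1-p)^{n-x}
$$

For example, with $p = 0.45$, $n = 10$, and $x = 4$:

$$
\binom{10}{4} \, (0.45)^{4} \, (0.55)^{6} = 210 \times 0.04100625 \times 0.027680640625 \approx 0.2384
$$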

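**Shannon entropy of information** measures the amount of information contained in a random variable, or equivalently the minimum average number of bits required to encode the outcomes of a system. With a base-2 logarithm the result is in bits; with the natural logarithm, as used by the code in this section, it is in nats.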

$$
H(P) = - \sum_{x \in \mathcal{X}} P(x) \log P(x)
$$

**Kullback-Leibler divergence and cross entropy**

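$D_{KL}$ is the expected value, taken under $P$, of the log-likelihood ratio between two distributions $P$ and $Q$ of the same random variable. It measures how much $Q$ differs from $P$: it is never negative, it is zero only when the two distributions are identical, and it is not symmetric in $P$ and $Q$.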

$$
D_{\mathrm{KL}}(P \parallel Q) = \sum_{x \in \mathcal{X}} P(x) \ln \frac{P(x)}{Q(x)}
$$

**Cross entropy** is another quantity used to measure the difference between two distributions, and it is commonly used as a loss function in machine learning. It equals the entropy of $p$ plus the KL divergence of $q$ from $p$:

$$
H(p, q) = H(p) + D_{\mathrm{KL}}(p \parallel q)
$$

In [5]:
import numpy as np
import math

In [22]:
def binomial_distribution_function(p,x,n):
    # Probability of exactly x successes in n independent trials,
    # each with success probability p.
    combination = math.comb(n, x)  # number of ways to place x successes in n trials
    prob_success = p**x
    prob_fail = (1-p)**(n-x)
    probability = combination*prob_success*prob_fail
    return probability
    

In [25]:
def gaussian_distribution_function(x,mean,variance):
    # Density of a normal distribution with the given mean and variance, evaluated at x.
    ratio = 1 / math.sqrt(2*math.pi*variance)
    z_score = (x-mean)/ math.sqrt(variance)
    probability = ratio * math.exp(-0.5*z_score**2)
    return probability
    

In [38]:
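def shannon_entropy(p_list):
    # Entropy in nats (natural log). Outcomes with probability 0 contribute
    # nothing and are skipped, since math.log(0) would raise an error.
    total = 0
    for prob in p_list:
        if prob > 0:
            total += prob * math.log(prob)
    entropy = -total
    return entropy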

In [47]:
def calc_kullback_leibler_divergence(target_pmd, real_pmd):
    # D_KL(target || real) in nats: target_pmd plays the role of P, real_pmd of Q.
    if len(target_pmd) != len(real_pmd):
        raise ValueError("target and real PMDs must have the same length")
    k_l_divergence = 0
    for i in range(len(target_pmd)):
        if target_pmd[i] == 0:
            continue  # a term with P(x) = 0 contributes 0
        weighted_log_ratio = target_pmd[i] * math.log(target_pmd[i]/real_pmd[i])
        k_l_divergence += weighted_log_ratio
    return k_l_divergence

In [48]:
# P(X) = {0.45;0.55}, probability of heads = 0.45
# n = 10 tosses
# x = 4 heads

p_4_heads = binomial_distribution_function(0.45,4,10)
print(p_4_heads)

0.23836664662207044


In [51]:
entropy = shannon_entropy([0.5,0.5])
print(entropy)

0.6931471805599453
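
The value printed above equals $\ln 2$ because `math.log` is the natural logarithm, so the entropy is expressed in nats. The sketch below, which uses a hypothetical helper `entropy_in_units` that is not part of the original notebook, shows how the same quantity might be computed in bits by changing the logarithm base.

```python
import math

def entropy_in_units(p_list, base=math.e):
    """Shannon entropy of a discrete distribution.

    base=math.e gives nats, base=2 gives bits.
    Zero-probability outcomes contribute nothing and are skipped.
    """
    return -sum(p * math.log(p, base) for p in p_list if p > 0)

fair_nats = entropy_in_units([0.5, 0.5])              # natural log -> nats
fair_bits = entropy_in_units([0.5, 0.5], base=2)      # log base 2 -> bits
unfair_bits = entropy_in_units([0.65, 0.35], base=2)  # biased coin, in bits

print(fair_nats, fair_bits, unfair_bits)
```

A fair coin carries one bit of information per toss, while the biased coin carries less than one bit because its outcome is more predictable.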


In [52]:
# Example scenario
# Target distribution of coin - P(X) -> 0.5, 0.5
# Real distribution of coin after 10,000 tosses - Q(X) -> 0.65, 0.35
dkl = calc_kullback_leibler_divergence([0.5,0.5],[0.65,0.35])
print(dkl)

0.04715533973562064
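
KL divergence depends on the order of its arguments. As a minimal sketch (the helper `kl_divergence` is an illustrative name, not part of the original notebook), the block below compares the two directions for the same pair of coin distributions and checks that identical distributions give zero.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

target = [0.5, 0.5]
real = [0.65, 0.35]

forward = kl_divergence(target, real)  # D_KL(target || real)
reverse = kl_divergence(real, target)  # D_KL(real || target)
same = kl_divergence(target, target)   # identical distributions

print(forward, reverse, same)
```

The forward value reproduces the result printed above, the reverse value is different, and the divergence of a distribution from itself is zero.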


In [53]:
cross_entropy = entropy+dkl
print(cross_entropy)

0.740302520295566
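
The sum above relies on the identity $H(p, q) = H(p) + D_{\mathrm{KL}}(p \parallel q)$. Cross entropy can also be computed directly from its definition, $H(p, q) = -\sum_x p(x) \ln q(x)$. The following sketch, with a hypothetical `cross_entropy` helper that is not in the original notebook, checks that both routes agree for the coin example.

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) ln q(x), in nats."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]    # target distribution
q = [0.65, 0.35]  # observed distribution

entropy_p = -sum(pi * math.log(pi) for pi in p)              # H(p)
kl_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))  # D_KL(p || q)
direct = cross_entropy(p, q)

print(direct, entropy_p + kl_pq)
```

The two printed values should match up to floating-point rounding.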


## Conditional probabilities and joint entropies

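A joint PDF describes the combined distribution of multiple random variables, each of which also has its own (marginal) PDF.
Bayes' theorem relates the joint distribution $p(x,y)$, the conditional distribution $p(y \mid x)$, and the marginals $p(x)$ and $p(y)$. Since $p(x,y) = p(y \mid x)\,p(x) = p(x \mid y)\,p(y)$, dividing by $p(x)$ gives: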

$$
P( y\mid x) = \frac{P(x \mid y) \, P( y)}{P( x)}
$$
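
To make the relationship concrete, the sketch below builds a small joint distribution over two binary variables, derives the marginals and conditionals from it, and checks Bayes' theorem numerically. The joint probabilities are made-up illustrative values, and the helper names are assumptions rather than part of the original notebook.

```python
import math

# Hypothetical joint distribution P(x, y) for x, y in {0, 1}; values sum to 1.
joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

# Marginals: sum the joint over the other variable.
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}

def cond_y_given_x(y, x):
    """P(y | x) = P(x, y) / P(x)."""
    return joint[(x, y)] / p_x[x]

def cond_x_given_y(x, y):
    """P(x | y) = P(x, y) / P(y)."""
    return joint[(x, y)] / p_y[y]

# Check Bayes' theorem, P(y | x) = P(x | y) P(y) / P(x), for every pair.
bayes_holds = all(
    math.isclose(cond_y_given_x(y, x), cond_x_given_y(x, y) * p_y[y] / p_x[x])
    for x in (0, 1)
    for y in (0, 1)
)
print(p_x, p_y, bayes_holds)
```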