<a href="https://colab.research.google.com/github/wingated/cs473/blob/main/labs/cs473_lab_week_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a><p><b>After clicking the "Open in Colab" link, copy the notebook to your own Google Drive before getting started, or it will not save your work</b></p>

# BYU CS 473 Lab Week 4

## Introduction:
KL divergence is one of the most commonly used concepts in machine learning. Here, we'll explore some of its basic properties.

---
## Exercise #1: Symmetry    

KL divergence is a *measure*, but not a *metric*. This means that while it shares some properties with a distance metric (it is never negative, and it is zero when the two distributions are equal), it does not satisfy all of them.

For example, KL divergence is NOT symmetric: in general, kl(a,b) ≠ kl(b,a). First, implement a function that calculates the KL divergence between two discrete distributions. Then, craft an example to demonstrate that it is not symmetric.

```python
import numpy as np

def kl(a, b):
    # a is an n-dimensional discrete distribution
    # b is an n-dimensional discrete distribution
    #
    # return: KL(a||b) = sum_i a[i] * log(a[i] / b[i]), using the natural log (result in nats)

    a = np.array(a, dtype=float)
    b = np.array(b, dtype=float)

    # Normalize to ensure they are probability distributions
    a = a / np.sum(a)
    b = b / np.sum(b)

    # Terms with a[i] == 0 contribute 0 (convention: 0 * log(0/q) = 0), so drop them.
    # If b[i] == 0 where a[i] > 0, that term is +inf, and so is the result.
    mask = (a > 0)
    with np.errstate(divide="ignore"):
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

# find an example where kl(a,b) != kl(b,a)
a = [0.9, 0.1]
b = [0.5, 0.5]

print("KL(a || b) =", kl(a, b))
print("KL(b || a) =", kl(b, a))
```

```text
KL(a || b) = 0.3680642071684971
KL(b || a) = 0.5108256237659907
```
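
The example above gives two finite but different values. A sharper way to see the asymmetry is a support mismatch: if b assigns zero probability to an outcome that a can produce, KL(a||b) is infinite, while KL(b||a) can still be finite. The sketch below re-defines `kl` with the same logic so the cell runs on its own; the two distributions are illustrative choices, not part of the original exercise.

```python
import numpy as np

def kl(a, b):
    # Same computation as the kl function above: normalize, drop terms with a[i] == 0,
    # and let any term with a[i] > 0 and b[i] == 0 become +inf.
    a = np.asarray(a, dtype=float) / np.sum(a)
    b = np.asarray(b, dtype=float) / np.sum(b)
    mask = a > 0
    with np.errstate(divide="ignore"):
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

# Illustrative distributions chosen for this sketch
a = [0.5, 0.5]   # puts mass on both outcomes
b = [1.0, 0.0]   # never produces the second outcome

print("KL(a || b) =", kl(a, b))  # inf: a produces an outcome that b treats as impossible
print("KL(b || a) =", kl(b, a))  # log(2), about 0.6931
```

Swapping the arguments turns an infinite divergence into a finite one, which no symmetric distance could do.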


---
## Exercise #2: Triangle inequality

Another property that KL divergence does not satisfy is the triangle inequality, which states that, for any three distributions a, b, and c,

kl(a,c) <= kl(a,b) + kl(b,c)

Prove that KL divergence does not satisfy the triangle inequality by crafting a counterexample.

```python
# a and c sit at opposite ends of the two-outcome simplex, with b halfway between them
a = [0.9, 0.1]
b = [0.5, 0.5]
c = [0.1, 0.9]

left = kl(a, c)
right = kl(a, b) + kl(b, c)

print("KL(a||b) =", kl(a, b))
print("KL(b||c) =", kl(b, c))
print("KL(a||c) =", left)
print("KL(a||b) + KL(b||c) =", right)
print("Is kl(a,c) <= kl(a,b)+kl(b,c)?", "Yes" if left <= right else "No", sep="\t")
```

```text
KL(a||b) = 0.3680642071684971
KL(b||c) = 0.5108256237659907
KL(a||c) = 1.7577796618689758
KL(a||b) + KL(b||c) = 0.8788898309344878
Is kl(a,c) <= kl(a,b)+kl(b,c)?	No
```
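
This counterexample is not a fluke. For the family $a = (p, 1-p)$, $b = (\tfrac{1}{2}, \tfrac{1}{2})$, $c = (1-p, p)$ with $0 < p < 1$ and $p \ne \tfrac{1}{2}$, expanding the definition gives $\mathrm{KL}(a \parallel b) + \mathrm{KL}(b \parallel c) = (p - \tfrac{1}{2})\ln\frac{p}{1-p}$ and $\mathrm{KL}(a \parallel c) = (2p - 1)\ln\frac{p}{1-p}$, so the left side of the triangle inequality is exactly twice the right side. The sketch below checks this numerically; `triangle_ratio` is a hypothetical helper introduced here, and `kl` is re-defined with the same logic so the cell runs on its own.

```python
import numpy as np

def kl(a, b):
    # Same computation as the kl function from Exercise 1 (re-defined so this cell runs alone).
    a = np.asarray(a, dtype=float) / np.sum(a)
    b = np.asarray(b, dtype=float) / np.sum(b)
    mask = a > 0
    with np.errstate(divide="ignore"):
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

def triangle_ratio(p):
    # Hypothetical helper (not part of the lab): a and c are mirror images,
    # and b is the uniform distribution halfway between them.
    a, b, c = [p, 1 - p], [0.5, 0.5], [1 - p, p]
    return kl(a, c) / (kl(a, b) + kl(b, c))

for p in [0.55, 0.6, 0.75, 0.9, 0.99]:
    print(f"p = {p:<5} KL(a||c) / (KL(a||b) + KL(b||c)) = {triangle_ratio(p):.6f}")
```

Every line should report a ratio of 2.000000, matching the closed form; the cell above is the $p = 0.9$ member of this family.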


---
## Exercise #3: Proofs

Prove that:

1) kl(a,a) = 0
2) kl(a,b) >= 0

Extra credit:

3) kl(a,b) = 0 iff a==b

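1. Prove that kl(a,a) = 0

    * $\mathrm{KL}(a \parallel b) = \sum_i a_i \ln(a_i/b_i)$, where terms with $a_i = 0$ count as $0$
    * $x/x = 1$ for $x > 0$
    * $\ln(1) = 0$
    * $x \cdot 0 = 0$

    Given the above, if $a = b$ then $a_i = b_i$ for all $i$, so each term with $a_i > 0$ is $a_i \ln(a_i/a_i) = a_i \ln(1) = a_i \cdot 0 = 0$, and each term with $a_i = 0$ is $0$ by convention.

    Since $\sum_i 0 = 0$, kl(a,a) = 0.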

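2. Prove that kl(a,b) >= 0

    * $a_i \ge 0$, $b_i \ge 0$, and $\sum_i a_i = \sum_i b_i = 1$
    * $\mathrm{KL}(a \parallel b) = \sum_i a_i \ln(a_i/b_i)$
    * $\lim_{x \to 0^+} x \ln(x) = 0$, so terms with $a_i = 0$ are taken to be $0$ and can be dropped from the sum
    * $\lim_{x \to 0^+} \ln(1/x) = \infty$, so if $a_i > 0$ and $b_i = 0$ for some $i$, that term is $+\infty$ and $\mathrm{KL}(a \parallel b) = \infty \ge 0$. From here on, assume $b_i > 0$ whenever $a_i > 0$.
    * Use the inequality $\ln(x) \le x - 1$ for $x > 0$ (the key step in proving Gibbs' inequality, which is exactly the claim kl(a,b) >= 0). For every $i$ with $a_i > 0$:
        * $\ln(b_i/a_i) \le b_i/a_i - 1$
        * multiplying by $-1$: $-\ln(b_i/a_i) \ge 1 - b_i/a_i$
        * because $-\ln(b_i/a_i) = \ln(a_i/b_i)$: $\ln(a_i/b_i) \ge 1 - b_i/a_i$
        * multiplying by $a_i > 0$ and summing over all $i$ with $a_i > 0$: $\sum_{i: a_i > 0} a_i \ln(a_i/b_i) \ge \sum_{i: a_i > 0} (a_i - b_i) = 1 - \sum_{i: a_i > 0} b_i \ge 0$, where the last step uses $\sum_{i: a_i > 0} b_i \le \sum_i b_i = 1$

    Given all this, $\sum_i a_i \ln(a_i/b_i) \ge 0$, i.e. kl(a,b) >= 0.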

3. Prove that kl(a,b) = 0 iff a==b

    * If $a = b$, then kl(a,b) = kl(a,a) = 0 by proof 1.
    * Conversely, suppose kl(a,b) = 0. The sum is finite, so $b_i > 0$ whenever $a_i > 0$.
    * Looking at one term at a time is not enough: a single term $a_i \ln(a_i/b_i)$ is negative whenever $0 < a_i < b_i$, so positive and negative terms could in principle cancel.
    * Instead, use $\ln(x) \le x - 1$ with $x = b_i/a_i$, as in proof 2: for each $i$ with $a_i > 0$, $a_i \ln(a_i/b_i) \ge a_i - b_i$, with equality only when $b_i/a_i = 1$ (since $\ln(x) = x - 1$ only at $x = 1$). Summing gives $\mathrm{KL}(a \parallel b) \ge \sum_{i: a_i > 0} (a_i - b_i) = 1 - \sum_{i: a_i > 0} b_i \ge 0$.
    * If kl(a,b) = 0, both inequalities in that chain are equalities. Every termwise inequality points the same way, so equality of the sums forces equality in every term, i.e. $a_i = b_i$ for every $i$ with $a_i > 0$. The second equality forces $\sum_{i: a_i > 0} b_i = 1$, so $b_i = 0 = a_i$ for every $i$ with $a_i = 0$.

    Putting this together, $a_i = b_i$ for all $i$, so kl(a,b) = 0 iff a == b.
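
A numerical spot-check is not a proof, but it is a cheap way to catch a sign error in one. The sketch below checks properties 1 and 2 on random pairs of distributions, checks the inequality $\ln(x) \le x - 1$ on a grid, and shows that a single term $a_i \ln(a_i/b_i)$ can be negative even though the full sum cannot. It assumes a flat Dirichlet distribution purely as an arbitrary source of random test distributions, and re-defines `kl` with the same logic as Exercise 1 so the cell runs on its own.

```python
import numpy as np

def kl(a, b):
    # Same computation as the kl function from Exercise 1 (re-defined so this cell runs alone).
    a = np.asarray(a, dtype=float) / np.sum(a)
    b = np.asarray(b, dtype=float) / np.sum(b)
    mask = a > 0
    with np.errstate(divide="ignore"):
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

rng = np.random.default_rng(0)

# Assumption: a flat Dirichlet over 5 outcomes is just an arbitrary source of random test distributions.
pairs = [(rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(5))) for _ in range(1000)]

# Property 1: kl(a, a) = 0
print("max |kl(a,a)|:", max(abs(kl(a, a)) for a, _ in pairs))

# Property 2: kl(a, b) >= 0
print("min kl(a,b):  ", min(kl(a, b) for a, b in pairs))

# The inequality ln(x) <= x - 1 used in proof 2, checked on a grid of x > 0
xs = np.linspace(0.01, 5.0, 1000)
print("ln(x) <= x - 1 on the grid:", bool(np.all(np.log(xs) <= xs - 1)))

# Per-term contributions for the Exercise 1 example: one positive term, one negative term
a, b = np.array([0.9, 0.1]), np.array([0.5, 0.5])
terms = a * np.log(a / b)
print("terms:", terms, "sum:", terms.sum())
```

The last line is why proof 3 relies on the equality condition of $\ln(x) \le x - 1$ rather than on the sign of individual terms.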