Task: Implement KL Divergence Between Two Normal Distributions

Your task is to compute the Kullback-Leibler (KL) divergence between two normal distributions. KL divergence measures how one probability distribution differs from a second, reference probability distribution.

Write a function kl_divergence_normal(mu_p, sigma_p, mu_q, sigma_q) that calculates the KL divergence between two normal distributions, where \( P \sim N(\mu_P, \sigma_P^2) \) and \( Q \sim N(\mu_Q, \sigma_Q^2) \).

The function should return the KL divergence as a floating-point number.

Example:

Input:

mu_p = 0.0

sigma_p = 1.0

mu_q = 1.0

sigma_q = 1.0

print(kl_divergence_normal(mu_p, sigma_p, mu_q, sigma_q))
Output:

0.5

Reasoning:

The KL divergence between the normal distributions \( P \) and \( Q \) with parameters \( \mu_P = 0.0 \), \( \sigma_P = 1.0 \) and \( \mu_Q = 1.0 \), \( \sigma_Q = 1.0 \) is 0.5.
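For reference, the value 0.5 follows from the standard closed form for the KL divergence between two univariate normal distributions (natural logarithm, so the result is in nats). The block below states that formula and substitutes the example's parameters:

```latex
\[
D_{KL}(P \,\|\, Q)
  = \ln\frac{\sigma_Q}{\sigma_P}
  + \frac{\sigma_P^2 + (\mu_P - \mu_Q)^2}{2\sigma_Q^2}
  - \frac{1}{2}
\]
\[
D_{KL}(P \,\|\, Q)
  = \ln\frac{1}{1} + \frac{1^2 + (0 - 1)^2}{2 \cdot 1^2} - \frac{1}{2}
  = 0 + 1 - \frac{1}{2}
  = 0.5
\]
```

Because both standard deviations are equal here, the logarithmic term vanishes and only the squared difference of the means contributes.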


In [9]:
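import math
def kl_divergence_normal(mu_p, sigma_p, mu_q, sigma_q):
    # KL(P || Q) for P ~ N(mu_p, sigma_p^2) and Q ~ N(mu_q, sigma_q^2); the log term is log(sigma_q / sigma_p)
    return math.log(sigma_q / sigma_p) + (sigma_p**2 + (mu_p - mu_q)**2) / (2 * sigma_q**2) - 0.5

In [10]:
print(kl_divergence_normal(0.0, 1.0, 1.0, 1.0))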

0.5
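The example above uses equal standard deviations, so it does not exercise the logarithmic term. As a sanity check, the following sketch compares the closed form with a direct numerical integration of p(x) log(p(x)/q(x)) for unequal standard deviations. The helper names normal_log_pdf and kl_numeric, the grid size, and the integration width are illustrative choices, not part of the task.

```python
import math

def kl_divergence_normal(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(P || Q) for two univariate normals, in nats."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p**2 + (mu_p - mu_q)**2) / (2 * sigma_q**2)
            - 0.5)

def normal_log_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x (hypothetical helper)."""
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - (x - mu)**2 / (2 * sigma**2))

def kl_numeric(mu_p, sigma_p, mu_q, sigma_q, n=20001, width=10.0):
    """Trapezoid-rule estimate of the integral of p(x) * log(p(x) / q(x)).

    The grid covers mu_p +/- width * sigma_p, where nearly all of P's mass lies.
    """
    lo = mu_p - width * sigma_p
    hi = mu_p + width * sigma_p
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        log_p = normal_log_pdf(x, mu_p, sigma_p)
        log_q = normal_log_pdf(x, mu_q, sigma_q)
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end points
        total += weight * math.exp(log_p) * (log_p - log_q)
    return total * h

closed = kl_divergence_normal(0.0, 1.0, 1.0, 2.0)
numeric = kl_numeric(0.0, 1.0, 1.0, 2.0)
print(closed, numeric)  # both should be close to log(2) - 0.25
```

If the two numbers disagree, the most likely culprit is the direction of the ratio inside the logarithm, which only shows up when the standard deviations differ.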


Understanding Kullback-Leibler Divergence (KL Divergence)
KL Divergence is a key concept in probability theory and information theory, used to measure the difference between two probability distributions. It quantifies how much information is lost when one distribution is used to approximate another.

What is KL Divergence?

KL Divergence is defined as:

\[ D_{KL}(P \| Q) = \sum_x P(x) \log\left(\frac{P(x)}{Q(x)}\right) \]
Where:

1.	\( P(x) \) is the true probability distribution.

2.	\( Q(x) \) is the approximating probability distribution.

3.	The sum is taken over all possible outcomes \( x \).
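The definition translates almost directly into code. The following is a minimal sketch, assuming both distributions are given as equal-length lists of probabilities over the same outcomes; the function name kl_divergence_discrete is hypothetical. It uses the usual conventions that a term with P(x) = 0 contributes nothing and that the divergence is infinite when Q(x) = 0 while P(x) > 0.

```python
import math

def kl_divergence_discrete(p, q):
    """KL(P || Q) in nats for two discrete distributions given as lists."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")
    total = 0.0
    for p_x, q_x in zip(p, q):
        if p_x == 0.0:
            continue          # convention: 0 * log(0 / q) = 0
        if q_x == 0.0:
            return math.inf   # P puts mass where Q has none
        total += p_x * math.log(p_x / q_x)
    return total

print(kl_divergence_discrete([0.5, 0.5], [0.5, 0.5]))  # identical distributions
print(kl_divergence_discrete([1.0, 0.0], [0.5, 0.5]))  # equals log(2)
print(kl_divergence_discrete([0.5, 0.5], [1.0, 0.0]))  # infinite
```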

Intuition Behind KL Divergence

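KL Divergence measures the "extra" number of bits required to code samples from \( P(x) \) using a code optimized for \( Q(x) \) instead of one optimized for the true distribution \( P(x) \). The result is in bits when the logarithm is base 2 and in nats when it is the natural logarithm.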

•	If \( P \) and \( Q \) are identical, \( D_{KL}(P \| Q) = 0 \), meaning no extra bits are needed.

•	If \( Q \) is very different from \( P \), the divergence will be large, indicating a poor approximation.

KL Divergence is always non-negative, a result known as Gibbs' inequality.
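The "extra bits" reading can be made concrete: the KL divergence equals the cross-entropy of P relative to Q minus the entropy of P. The sketch below checks this identity in base 2 for an illustrative pair of distributions. The function names and the example values are assumptions chosen for the demonstration, and the code assumes Q(x) > 0 wherever P(x) > 0.

```python
import math

def entropy_bits(p):
    """Average code length (bits) with a code optimized for P."""
    return -sum(p_x * math.log2(p_x) for p_x in p if p_x > 0)

def cross_entropy_bits(p, q):
    """Average code length (bits) for samples from P with a code optimized for Q."""
    return -sum(p_x * math.log2(q_x) for p_x, q_x in zip(p, q) if p_x > 0)

def kl_bits(p, q):
    """KL(P || Q) in bits."""
    return sum(p_x * math.log2(p_x / q_x) for p_x, q_x in zip(p, q) if p_x > 0)

p = [0.5, 0.25, 0.25]      # illustrative true distribution
q = [1 / 3, 1 / 3, 1 / 3]  # illustrative approximating distribution

optimal = entropy_bits(p)
actual = cross_entropy_bits(p, q)
extra = kl_bits(p, q)
print(optimal, actual, extra)  # actual - optimal matches extra
```

Here the optimal code needs 1.5 bits per sample on average, while the uniform code needs log2(3) bits; the gap between the two is the divergence.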

Key Properties

1.	Asymmetry: In general, \( D_{KL}(P \| Q) \neq D_{KL}(Q \| P) \). That is, KL Divergence is not a true distance metric.

2.	Non-negativity: \( D_{KL}(P \| Q) \geq 0 \) for all probability distributions \( P \) and \( Q \), with equality when \( P \) and \( Q \) are identical.

3.	Applicability: KL Divergence is used in various fields, including machine learning, data science, and natural language processing, to compare probability distributions or models.
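The first two properties are easy to observe numerically. The sketch below uses the closed form for two univariate normals with the same mean but different standard deviations; the specific parameter values are illustrative assumptions.

```python
import math

def kl_divergence_normal(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(P || Q) for two univariate normals, in nats."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p**2 + (mu_p - mu_q)**2) / (2 * sigma_q**2)
            - 0.5)

# P = N(0, 1^2), Q = N(0, 2^2)
forward = kl_divergence_normal(0.0, 1.0, 0.0, 2.0)  # KL(P || Q)
reverse = kl_divergence_normal(0.0, 2.0, 0.0, 1.0)  # KL(Q || P)
same = kl_divergence_normal(0.0, 1.0, 0.0, 1.0)     # KL(P || P)

print(forward, reverse, same)
```

The two directions give different values (roughly 0.32 and 0.81 nats), both are non-negative, and the divergence of a distribution from itself is zero.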

Example

Consider two discrete probability distributions \( P(x) \) and \( Q(x) \):
\[ P(x) = [0.4, 0.6], \quad Q(x) = [0.5, 0.5] \]

The KL Divergence between these two distributions is calculated as:

\[ D_{KL}(P \| Q) = 0.4 \log\left(\frac{0.4}{0.5}\right) + 0.6 \log\left(\frac{0.6}{0.5}\right) \]
With the natural logarithm this evaluates to approximately 0.0201 nats, quantifying how much information is lost when using \( Q(x) \) to approximate \( P(x) \).
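The arithmetic for this example can be checked in a few lines. This is a small sketch that evaluates the sum directly, once with the natural logarithm and once in base 2; the variable names are illustrative.

```python
import math

p = [0.4, 0.6]
q = [0.5, 0.5]

# Sum of P(x) * log(P(x) / Q(x)) over both outcomes
kl_nats = sum(p_x * math.log(p_x / q_x) for p_x, q_x in zip(p, q))
kl_bits = sum(p_x * math.log2(p_x / q_x) for p_x, q_x in zip(p, q))

print(round(kl_nats, 4))  # about 0.0201 nats
print(round(kl_bits, 4))  # about 0.029 bits
```

The first term is negative (Q overestimates that outcome) and the second is positive, but the weighted sum is still positive, as non-negativity requires.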

KL Divergence plays an essential role in fields like machine learning, where it is used for tasks such as model evaluation, anomaly detection, and optimization.
