# Comparing the Divergence Between Data Distributions
The goal is to show that the data distribution shifts much more rapidly with a change in depth than with a change in saturation. If that is the case, it is unlikely that any transformation can reliably distinguish changes in saturation independently of depth. To measure the shift between distributions, I use the Kullback-Leibler divergence (KLD) for now, though another metric might also work. The formula used, assuming P and Q are the two normalized distributions:
$$
\mathrm{KLD}(P \,\|\, Q) = \sum_{P \neq 0,\ Q \neq 0} P \log\left(\frac{P}{Q}\right)
$$
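For now, terms where either P or Q is 0 are disregarded. Note that this departs from the standard KLD, which is infinite whenever Q is 0 where P is not.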
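To make the effect of disregarding zeros concrete, here is a minimal sketch on two hand-picked discrete distributions. The helper name `masked_kld` and the example values are illustrative assumptions, not part of the notebook's own code.

```python
import numpy as np

def masked_kld(p, q):
    """Sum P * log(P / Q) over entries where both P and Q are non-zero (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = (p > 0) & (q > 0)
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Two normalized distributions over three outcomes (made-up values)
p = np.array([0.5, 0.5, 0.0])
q = np.array([0.25, 0.25, 0.5])

kld_pq = masked_kld(p, q)  # 0.5*log(2) + 0.5*log(2) = log(2)
kld_qp = masked_kld(q, p)  # 2 * 0.25*log(0.5) = -0.5*log(2); the q=0.5, p=0 term is dropped
print(kld_pq, kld_qp)
```

Under these assumptions the first value is log 2 (about 0.693) and the second is -0.5 log 2 (about -0.347). The second is negative, which a true KLD never is, because the term where P is 0 but Q is not has been dropped. Values from the masked formula are therefore best read as a rough indicator rather than an exact divergence.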

## Defining the KLD function

In [148]:
# Find distribution ranges per dimension
from typing import List, Tuple
import numpy as np

def find_ranges(dist1: np.ndarray, dist2: np.ndarray) -> List[Tuple[float, float]]:
    """Return one (min, max) pair per dimension, taken over both sample sets combined."""
    assert dist1.shape[1] == dist2.shape[1], "The two distributions must have the same number of dimensions"
    ranges = []
    for i in range(dist1.shape[1]):
        min_val = min(np.min(dist1[:, i]), np.min(dist2[:, i]))
        max_val = max(np.max(dist1[:, i]), np.max(dist2[:, i]))
        ranges.append((min_val, max_val))
    return ranges

def custom_kld(dist1: np.ndarray, dist2: np.ndarray, bin_count: int = 20, verbose: bool = False) -> float:
    """Histogram-based KLD(dist1 || dist2) for (n_samples, n_dims) arrays, skipping bins that are empty in either histogram."""
    value_ranges = find_ranges(dist1, dist2)
    if verbose:
        print(f"Value Ranges [(x_min, x_max), .. ]: {value_ranges}")
    # Bin both sample sets on the same grid so the histograms are comparable
    hist1, _ = np.histogramdd(dist1, bins=bin_count, range=value_ranges)
    hist2, _ = np.histogramdd(dist2, bins=bin_count, range=value_ranges)
    # Normalize the histograms so that each sums to 1
    hist1 = hist1 / np.sum(hist1)
    hist2 = hist2 / np.sum(hist2)

    # Keep only bins populated in both histograms (avoids log(0) and division by zero)
    non_zero_mask = (hist1 > 0) & (hist2 > 0)
    if verbose:
        print("Non-Zero Mask Length (AND condition):", np.sum(non_zero_mask))
    # Compute the KLD over the shared bins only
    kld = np.sum(hist1[non_zero_mask] * np.log(hist1[non_zero_mask] / hist2[non_zero_mask]))
    return float(kld)

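# Test Case: dist1 lies in [1, e) and dist2 in [0, 1), so the two supports barely touch.
# When no bin is populated by both samples, the mask is empty and the function
# returns 0.0, which does not indicate that the distributions are similar.
dist1 = np.exp(np.random.rand(1000, 3))
dist2 = np.random.rand(1000, 3)
print(custom_kld(dist1, dist2, 40, True))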

Value Ranges [(x_min, x_max), .. ]: [(0.002708744248845596, 2.7176158915920587), (0.0006329672221848659, 2.7159320454335245), (0.00029391334555917137, 2.714085742181309)]
Non-Zero Mask Length (AND condition): 0
0.0
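
The 0.0 above does not mean the two distributions match: the mask length is 0, so the sum runs over no bins at all. One possible safeguard, sketched here as an assumption rather than as part of the original method, is to report how much of the first histogram's probability mass actually falls in shared bins. The function name `overlap_mass` and the synthetic samples are hypothetical.

```python
import numpy as np

def overlap_mass(dist1, dist2, bin_count=20):
    """Fraction of dist1's histogram mass lying in bins that dist2 also populates (hypothetical helper)."""
    # Shared per-dimension ranges, mirroring find_ranges
    lo = np.minimum(dist1.min(axis=0), dist2.min(axis=0))
    hi = np.maximum(dist1.max(axis=0), dist2.max(axis=0))
    ranges = list(zip(lo, hi))
    hist1, _ = np.histogramdd(dist1, bins=bin_count, range=ranges)
    hist2, _ = np.histogramdd(dist2, bins=bin_count, range=ranges)
    hist1 = hist1 / hist1.sum()
    mask = (hist1 > 0) & (hist2 > 0)
    return float(hist1[mask].sum())

rng = np.random.default_rng(0)
base = rng.random((1000, 3))        # samples in [0, 1)
shifted = base + 10.0               # same shape, clearly disjoint support

overlap_same = overlap_mass(base, base)         # every populated bin is shared
overlap_disjoint = overlap_mass(shifted, base)  # no bin is shared
print(overlap_same, overlap_disjoint)
```

A value near 1 suggests the masked KLD covers most of the distribution, while a value near 0 signals that the returned number says little about how far apart the two sample sets are.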
