[![Fixel Algorithms](https://fixelalgorithms.co/images/CCExt.png)](https://fixelalgorithms.gitlab.io)

# Optimization Methods

## Convex Optimization - Constrained Optimization - Balancing Classes for Machine Learning Training 

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.2.001 | 08/08/2025 | Royi Avital | Removing all zeros rows, Added the "Engineer Solution"             |
| 1.2.000 | 24/07/2025 | Royi Avital | Added the Mean Invariance formulation                              |
| 1.1.000 | 23/07/2025 | Royi Avital | Using the KL Divergence formulation                                |
| 1.0.000 | 17/07/2025 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/0002PointLine.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Optimization
import cvxpy as cp

# Miscellaneous
import math
from platform import python_version
import random
import fractions

# Visualization
import matplotlib.pyplot as plt

# Jupyter
from IPython import get_ipython

# Typing
from typing import List, Optional, Tuple, Union

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

```python
valToFill = ???
```

 - Multi Line to Fill (At least one)

 ```python
 # You need to start writing
 ?????
 ```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

?????
#===============================================================#
```

In [None]:
# Configuration
%matplotlib inline

# warnings.filterwarnings("ignore")

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #>! Apply SeaBorn theme
# sns.set_palette("tab10")

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2

In [None]:
# Course Packages


In [None]:
# Auxiliary Functions

def DisplayClassBalance(vClassBalance: np.ndarray, *, figSize: Tuple[float, float] = (6.0, 6.0), hA: Optional[plt.Axes] = None) -> plt.Axes:

    if hA is None:
        hF, hA = plt.subplots(figsize = figSize)
    
    numClasses = len(vClassBalance)
    
    hA.bar(np.arange(numClasses), vClassBalance, width = 0.85)
    hA.set_xlabel('Class Index')
    hA.set_ylabel('Number of Samples')
    hA.set_title('Class Balance')

    return hA

## Class Balancing in Machine Learning

### Balanced and Imbalanced Data Set

The _Classification_ task in _Machine Learning_ by default assumes balanced data.  
Namely, the number of samples per _Class_ is similar among classes.

<img src="https://i.imgur.com/PytqYsy.png" width="750"/>
<!-- ![](https://i.imgur.com/PytqYsy.png) -->
<!-- ![](https://i.postimg.cc/3RwJGjhv/68e7d3ca-1f1c-4ea4-a56d-ac0b0116306f.png) -->

* <font color='brown'>(**#**)</font> While balanced data is the ideal, there are methods to handle imbalanced data in Machine Learning Classification.

### Balancing the Data Set by Weighted Oversampling  

Let $\boldsymbol{a}_{i} \in \mathbb{R}^{n}$ be the vector of class appearances in the data sample $i$.  
The sample (For example an image in _Object Detection_ task) contains different numbers of examples per class.  
This case, for $m$ samples and $n$ classes can be represented by a matrix $\boldsymbol{A} \in \mathbb{R}^{m \times n}$:

 * Each row is the number of class examples per sample.
 * Each column is the number of examples of a certain class per sample.

Assuming one may _oversample_ each sample (Row) by a factor ${x}_{i}$, the resulting class counts can be formulated as:

$$ \boldsymbol{y} = \boldsymbol{A}^{T} \boldsymbol{x}, \; \boldsymbol{x} \in \mathbb{N}_{+}^{m} $$

Namely, the value ${x}_{i} \in \mathbb{N}_{+}$ is the number of copies of the sample $i$ in the balanced data set.
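
A minimal sketch of the relation above on a toy case. The values of `mAToy` and `vXToy` are made up for illustration and are not taken from the data set used in this notebook:

```python
import numpy as np

# Toy data (Hypothetical values): 3 samples (Rows), 2 classes (Columns)
mAToy = np.array([[3, 0],
                  [1, 1],
                  [0, 2]])
vXToy = np.array([1, 2, 3]) #<! Number of copies per sample

vYOrg = mAToy.T @ np.ones(3, dtype = int) #<! Class counts with a single copy per sample
vYToy = mAToy.T @ vXToy                   #<! Class counts after oversampling

print(vYOrg) #<! [4 3]
print(vYToy) #<! [5 8]
```

Each element of the result is the total number of examples of a class after replicating the samples.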

* <font color='brown'>(**#**)</font> A better approach than actually oversampling / replicating the sample is to increase its weight or increase the class weight.

### Concept Formulation of the Problem

Conceptually the problem is given by:

$$
\begin{align*}
\arg \min_{\boldsymbol{x}} \quad & R \left( \boldsymbol{y} \right) \\
\text{subject to} \quad & \begin{aligned} 
\boldsymbol{A}^{T} \boldsymbol{x} & = \boldsymbol{y} \\
\boldsymbol{x} & \geq \boldsymbol{1}
\end{aligned}
\end{align*}
$$

Where $R \left( \cdot \right)$ is a function which penalizes high variance vectors / promotes near constant vectors.

This notebook explores different formulations to solve the problem.

### Measures of Vector Unevenness / Curvature / Non Uniformity

In order to promote a nearly constant vector, one needs to define $R \left( \cdot \right)$ to measure the _Unevenness_ (_Curvature_ / _Roughness_) of the vector.  
This section introduces several ideas (Limited to _Convex_ functions):

 - Variance - $\frac{1}{n} \sum_{i = 1}^{n} {\left( {y}_{i} - \frac{1}{n} \sum_{j = 1}^{n} {y}_{j} \right)}^{2}$  
   Can be also formulated as $\frac{1}{2 {n}^{2}} \sum_{i, j = 1}^{n} {\left( {y}_{i} - {y}_{j} \right)}^{2}$.
 - Mean Absolute Deviation - $\frac{1}{n} \sum_{i = 1}^{n} \left| {y}_{i} - \frac{1}{n} \sum_{j = 1}^{n} {y}_{j} \right|$  
   Can be formulated as a _Linear Programming_ problem.
 - Maximum Absolute Deviation - ${\left\| \boldsymbol{y} - \bar{y} \boldsymbol{1} \right\|}_{\infty}$  
   Where $\bar{y} = \frac{1}{n} \sum_{j = 1}^{n} {y}_{j}$.  
   Can be formulated as a _Linear Programming_ problem.
 - Huber based Deviation (Pseudo Huber) - $\frac{1}{n} \sum_{i = 1}^{n} {\delta}^{2} \left( \sqrt{ 1 + \frac{ {\left( {y}_{i} - \bar{y} \right)}^{2} }{{\delta}^{2}} } - 1 \right)$  
   An outlier robust alternative to the Variance: behaves like the _Variance_ for small deviations and like the _Mean Absolute Deviation_ for large ones.
 - Invariance to Filtration by Mean Filter - ${\left\| \boldsymbol{C} \boldsymbol{y} - \boldsymbol{y} \right\|}_{2}^{2}$  
   Where ${C}_{i, j} = \frac{1}{n}$.  
   The norm can be replaced with the ${L}_{1}$ or ${L}_{\infty}$ norm.
 - Relative Entropy to Uniform Distribution - $\operatorname{D}_{KL} \left( \boldsymbol{y}, \boldsymbol{u} \right) = \sum_{i = 1}^{n} {y}_{i} \log \left( \frac{ {y}_{i} }{ 1 / n } \right) = \log \left( n \right) + \sum_{i = 1}^{n} {y}_{i} \log \left( {y}_{i} \right)$  
   Requires a discrete distribution. Namely only when we can enforce $\boldsymbol{y} \in \mathcal{\Delta}^{n}$.  
   In order to use it with counts (Non normalized vectors), one must "Guess" the uniform reference vector.  
   See [Better Intuition for Information Theory](https://www.blackhc.net/blog/2019/better-intuition-for-information-theory), [Less Wrong - Six Intuitions for KL Divergence](https://www.lesswrong.com/posts/no5jDTut5Byjqb4j5).

* <font color='brown'>(**#**)</font> Some formulations will require additional penalty on the sum of $\boldsymbol{y}$.
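
A minimal sketch which evaluates some of the measures above on a constant vector and on an uneven vector. The function name `UnevennessMeasures`, the parameter `paramDelta` and the test vectors are assumptions made for illustration only:

```python
import numpy as np

def UnevennessMeasures( vY: np.ndarray, paramDelta: float = 1.0 ) -> dict:
    """
    Evaluates several unevenness measures of the vector `vY`.
    `paramDelta` is the δ parameter of the Huber based deviation.
    """
    vD = vY - np.mean(vY) #<! Deviation from the mean

    return {
        'Variance'        : np.mean(np.square(vD)),
        'Mean Abs Dev'    : np.mean(np.abs(vD)),
        'Max Abs Dev'     : np.max(np.abs(vD)),
        'Huber Based Dev' : np.mean((paramDelta ** 2) * (np.sqrt(1.0 + np.square(vD / paramDelta)) - 1.0)),
    }

vYFlat   = np.array([5.0, 5.0, 5.0, 5.0]) #<! Constant vector
vYUneven = np.array([2.0, 4.0, 6.0, 8.0]) #<! Uneven vector

dFlat   = UnevennessMeasures(vYFlat)
dUneven = UnevennessMeasures(vYUneven)

print(dFlat)   #<! All measures vanish for a constant vector
print(dUneven) #<! Variance: 5.0, Mean Abs Dev: 2.0, Max Abs Dev: 3.0
```

All the measures vanish for a constant vector and grow as the vector becomes less uniform, which is the property required from $R \left( \cdot \right)$.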

In [None]:
# Parameters

# Data
dataFileUrl        = r'https://raw.githubusercontent.com/FixelAlgorithmsTeam/FixelCourses/refs/heads/master/DataSets/ClassBalancing.csv'
numSamplesDiscrete = 10_000

# Model
upperbound = 1_000
fracRes    = 100_000

## Generate Data


### Load the Data

This section loads a real world case (Credit to Michael Kaster) as a sample case for the algorithms.

In [None]:
# Load Data of Real World Case

mA = np.loadtxt(dataFileUrl, dtype = np.float64, delimiter = ',')

# Remove All Zeros rows
vM = np.any(mA, axis = 1) #<! Mask of rows with at least one non zero element
mA = mA[vM, :]            #<! Filter rows

print(f'The Data Shape            : {mA.shape}')
print(f'The Data Number of Samples: {mA.shape[0]}')
print(f'The Data Number of Classes: {mA.shape[1]}')

### Display the Class Balance

In [None]:
# Display the Class Balance

numSamples = mA.shape[0]
numClasses = mA.shape[1]

hA = DisplayClassBalance(np.sum(mA, axis = 0))

## Formulation 001 - Discrete

Since the values of ${x}_{i}$ stand for counts, one could force the problem to be a _Discrete Optimization_ problem.  
In order to make it feasible to solve, one can limit oneself to cases where the objective, defined by $R \left( \cdot \right)$, can be formulated using _Integer Linear Programming_.

### Question 001

Using the _Maximum Absolute Deviation_ the problem is given by:

$$
\begin{align*}
\arg \min_{\boldsymbol{x} \in \mathbb{N}_{+}^{m}, \mu} \quad & {\left\| \boldsymbol{A}^{T} \boldsymbol{x} - \mu \boldsymbol{1} \right\|}_{\infty} \\
\text{subject to} \quad & \begin{aligned} 
\boldsymbol{x} & \geq \boldsymbol{1} \\
\boldsymbol{x} & \leq u \boldsymbol{1} \\
\frac{1}{n} \boldsymbol{1}^{T} \boldsymbol{A}^{T} \boldsymbol{x} & = \mu
\end{aligned}
\end{align*}
$$

This formulation also adds an upper bound to the number of copies ($u$).

Formulate the problem as an _Integer Linear Programming_ problem (Linear objective, Linear constraints).
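
As a hint, the generic way to turn an ${L}_{\infty}$ norm objective into a linear one is the epigraph form: $\min_{\boldsymbol{v}} {\left\| \boldsymbol{v} \right\|}_{\infty}$ is equivalent to $\min_{\boldsymbol{v}, t} t$ subject to $-t \boldsymbol{1} \leq \boldsymbol{v} \leq t \boldsymbol{1}$. The following is a minimal sketch on a small continuous toy problem, not on the integer problem of the question. The data `mB`, `vB` and all variable names are assumptions made for illustration:

```python
import numpy as np
from scipy.optimize import linprog

# Toy problem (Hypothetical): find the scalar x which minimizes max_i |x - b_i|
mB = np.array([[1.0], [1.0], [1.0]])
vB = np.array([1.0, 2.0, 5.0])

numRows, numCols = mB.shape

# Variables: [x, t], Objective: minimize t
vC = np.r_[np.zeros(numCols), 1.0]

# Epigraph constraints:
#    B x - t 1 <=  b
#   -B x - t 1 <= -b
vOnes = np.ones((numRows, 1))
mAub  = np.block([[mB, -vOnes], [-mB, -vOnes]])
vBub  = np.r_[vB, -vB]

lBounds = [(None, None)] * numCols + [(0.0, None)] #<! x is free, t is non negative

oRes = linprog(vC, A_ub = mAub, b_ub = vBub, bounds = lBounds)

print(oRes.x[:numCols]) #<! The mid range of the data: 3
print(oRes.fun)         #<! The maximum absolute deviation: 2
```

The same idea carries over to the question above, with the additional integrality requirement on $\boldsymbol{x}$ handled by an integer capable solver.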

### Solution 001

<font color='red'>??? Fill the answer here ???</font>

---

In [None]:
# Maximum Absolute Deviation - Original Formulation
# Using the straight forward formulation, the problem can be solved using a convex optimization solver.

def SolveDiscreteInfNormOrg( mA: np.ndarray, upBound: float ) -> np.ndarray:
    """
    Solve the discrete infinity norm optimization problem.
    
    Input:
     mA      - Input matrix where each row represents a sample and each column represents a class.
     upBound - Upper bound for the number of copies of the rows.
    
    Output:
     vX      - Solution vector representing the number of copies of each sample (Row).
    
    Remarks:
    - The problem size (number of elements in `mA`) should be relatively small.
    """
    
    numSamples = np.size(mA, 0)
    numClasses = np.size(mA, 1)
    
    # Variables
    vX = cp.Variable(numSamples, integer = True) #<! Integer variable for the number of copies per sample
    μ  = cp.Variable(1) #<! Mean of the class counts

    # Problem
    cpObjFun = cp.Minimize(cp.norm(mA.T @ vX - μ, p = 'inf')) #<! Objective function
    cpConst  = [vX >= 1, vX <= upBound, cp.mean(mA.T @ vX) == μ] #<! Constraints
    oCvxPrb  = cp.Problem(cpObjFun, cpConst) #<! Create the convex problem instance
    
    # Solution
    oCvxPrb.solve(solver = cp.HIGHS)  #<! Solve the problem
    
    assert (oCvxPrb.status == 'optimal'), 'The problem is not solved.'
    
    return vX.value

In [None]:
# Maximum Absolute Deviation - Integer Linear Programming Formulation

def SolveDiscreteInfNorm( mA: np.ndarray, upBound: float ) -> np.ndarray:
    """
    Solve the discrete infinity norm optimization problem.
    
    Input:
     mA      - Input matrix where each row represents a sample and each column represents a class.
     upBound - Upper bound for the number of copies of the rows.
    
    Output:
     vX      - Solution vector representing the number of copies of each sample (Row).
    
    Remarks:
    - The problem size (number of elements in `mA`) should be relatively small.
    """
    
    numSamples = np.size(mA, 0)
    numClasses = np.size(mA, 1)

    #===========================Fill This===========================#
    # 1. Set the optimization variables.
    # 2. Define the objective.
    # 3. Define the constraints.
    # !! The solution should match the original formulation.
    
    # Variables
    vX = ??? #<! Integer variable for the number of copies per sample
    μ  = ??? #<! Mean of the class counts
    t  = ??? #<! Boundary variable for the maximum absolute deviation

    # Problem
    cpObjFun = ??? #<! Objective function
    cpConst  = ??? #<! Constraints
    oCvxPrb  = ??? #<! Create the convex problem instance
    #===============================================================#
    
    oCvxPrb.solve(solver = cp.HIGHS)  #<! Solve the problem
    
    assert (oCvxPrb.status == 'optimal'), 'The problem is not solved.'
    
    return vX.value

In [None]:
# Objective Function Value

def ObjFunMaxAbsDev( vX: np.ndarray, mA: np.ndarray ) -> float:

    vY = mA.T @ vX
    μ = np.mean(vY)
    
    return np.max(np.abs(vY - μ))

In [None]:
# Verification of the Solution

vXRef = SolveDiscreteInfNormOrg(mA, upBound = upperbound)
vX    = SolveDiscreteInfNorm(mA, upBound = upperbound)

assert np.allclose(vX, vXRef), 'The solutions do not match.'
print('The solutions match.')

print(f'Objective Function Value: {ObjFunMaxAbsDev(vX, mA):0.4f}')

In [None]:
# Plot the Class Balance

hF, vHa = plt.subplots(nrows = 1, ncols = 2, figsize = (12, 4))
hA = DisplayClassBalance(np.sum(mA, axis = 0), hA = vHa[0])
hA.set_title(f'Original Class Balance, Objective: {ObjFunMaxAbsDev(np.ones(np.size(mA, 0)), mA):0.2f}')
hA = DisplayClassBalance(mA.T @ vXRef, hA = vHa[1])
hA.set_title(f'Balanced Class Balance, Objective: {ObjFunMaxAbsDev(vXRef, mA):0.2f}')
vYlim = hA.get_ylim()
vHa[0].set_ylim(vYlim);

## Formulation 002 - Continuous

One can promote a flat / uniform continuous vector and use rounding to achieve a discrete solution.  
While not guaranteed to yield the optimal discrete solution, it might yield a good enough solution while being faster and scaling better.

### Question 002

The _Mean Filter Invariance_ property: $\boldsymbol{C} \boldsymbol{y} = \boldsymbol{y}, \; \boldsymbol{y} \in \mathbb{R}^{n}, \; \boldsymbol{C} = \frac{1}{n} \boldsymbol{1} \boldsymbol{1}^{\top}$.  
Only constant vectors satisfy it, hence it can be used to promote constant vectors.

Formulate the problem with the objective defined by **minimization** of the _Mean Filter Invariance_: $\arg \min_{\boldsymbol{y}} \frac{1}{2} {\left\| \boldsymbol{C} \boldsymbol{y} - \boldsymbol{y} \right\|}_{2}^{2}$.  

Remarks:
 - One may use other Norms to promote the property.
 - The objective is invariant to the value of the constant: any constant vector attains zero.  
   Hence add a penalty (Or a constraint) to promote the lowest constant which obeys the constraints.
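
A minimal numeric sketch of the property, assuming $\boldsymbol{C}$ is the $n \times n$ matrix with all entries equal to $\frac{1}{n}$. The test vectors and the helper `MeanInvObj` are made up for illustration:

```python
import numpy as np

numElements = 5
mC = np.ones((numElements, numElements)) / numElements #<! Mean filter matrix

vYConst = 3.0 * np.ones(numElements)            #<! Constant vector
vYRand  = np.array([1.0, 4.0, 2.0, 8.0, 5.0])   #<! Non constant vector (Hypothetical values)

def MeanInvObj( vY: np.ndarray ) -> float:
    # The objective: 0.5 * || C y - y ||_2^2
    return 0.5 * np.sum(np.square(mC @ vY - vY))

print(MeanInvObj(vYConst)) #<! Zero (Up to numerical precision)
print(MeanInvObj(vYRand))  #<! 15.0
print(0.5 * numElements * np.var(vYRand)) #<! 15.0, the objective is a scaled variance
```

The last line illustrates that, with the ${L}_{2}$ norm, the objective equals the variance of the vector scaled by $\frac{n}{2}$.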

### Solution 002

<font color='red'>??? Fill the answer here ???</font>

---

In [None]:
# Mean Invariance Formulation

def SolveContMeanInv( mA: np.ndarray, upBound: float ) -> np.ndarray:
    """
    Solve the continuous Mean Invariance optimization problem.
    Input:
     mA      - Input matrix where each row represents a sample and each column represents a class.
     upBound - Upper bound for the number of copies of the rows.
    Output:
     vX      - Solution vector representing the number of copies of each sample (Row).
    Remarks:
     - The solution of the optimization problem should be rounded to generate a discrete solution.
    """
    
    numSamples = np.size(mA, 0)
    numClasses = np.size(mA, 1)

    #===========================Fill This===========================#
    # 1. Set the optimization variables.
    # 2. Define the objective.
    # 3. Define the constraints.
    
    # Variables
    vX = cp.Variable(numSamples) #<! Optimization variable, the number of copies per sample
    vY = cp.Variable(numClasses) #<! Auxiliary variable, the class counts

    mC = np.ones((numClasses, numClasses)) / numClasses #<! Mean filter matrix
    maxY = np.max(np.sum(mA, axis = 0)) * numClasses #<! Upper bound on the sum of the class counts

    # Problem
    cpObjFun = cp.Minimize(cp.sum_squares(mC @ vY - vY)) #<! Objective function
    cpConst  = [mA.T @ vX == vY, vX >= 1, vX <= upBound, cp.sum(vY) <= maxY] #<! Constraints

    oCvxPrb  = cp.Problem(cpObjFun, cpConst) #<! Create the convex problem instance
    #===============================================================#
    
    oCvxPrb.solve(solver = cp.CLARABEL, verbose = False)  #<! Solve the problem
    
    assert (oCvxPrb.status == 'optimal'), 'The problem is not solved.'
    
    return vX.value

* <font color='red'>(**?**)</font> Can the calculation of the objective be optimized? Think of a matrix $\boldsymbol{D}$ such that the objective is given by ${\left\| \boldsymbol{D} \boldsymbol{y} \right\|}_{2}^{2}$.

In [None]:
# Solve the Continuous Problem

vX  = SolveContMeanInv(mA, upBound = upperbound)
vXR = np.round(vX)

In [None]:
# Objective Function

def ObjFunMeanInv( vX: np.ndarray, mA: np.ndarray ) -> float:
    """
    Calculate the Mean Invariance of the class distribution.
    
    Input:
     vX     - Solution vector representing the number of copies of each sample (Row).
     mA     - Input matrix where each row represents a sample and each column represents a class.
    
    Output:
     valE   - The Mean Invariance value (Squared L2 norm) of the class counts.

    Remarks:
    - The value equals the number of classes times the variance of the class counts.
    """

    numClasses = np.size(mA, 1) #<! Avoid dependency on a global variable

    mC  = np.ones((numClasses, numClasses)) / numClasses #<! Mean filter matrix
    mC -= np.eye(numClasses)  #<! Single matrix which applies `mC @ vY - vY`
    
    vY = mA.T @ vX
    valE = np.sum(np.square(mC @ vY)) 
    
    return valE

In [None]:
# Plot the Class Balance

hF, vHa = plt.subplots(nrows = 1, ncols = 2, figsize = (12, 4))
hA = DisplayClassBalance(np.sum(mA, axis = 0), hA = vHa[0])
hA.set_title(f'Original Class Balance, Objective: {ObjFunMeanInv(np.ones(np.size(mA, 0)), mA):0.2f}')
hA = DisplayClassBalance(mA.T @ vXR, hA = vHa[1])
hA.set_title(f'Balanced Class Balance, Objective: {ObjFunMeanInv(vXR, mA):0.2f}')
vYlim = hA.get_ylim()
vHa[0].set_ylim(vYlim);

## Formulation 003 - Continuous

Most continuous formulations are based on a Norm based loss.  
This section demonstrates a dissimilarity function based on probability concepts.

* <font color='brown'>(**#**)</font> The KL Divergence is neither a distance function (Metric) nor a symmetric function.

### Question 003

Formulate the problem with the objective defined by **minimization** of the _Kullback Leibler Divergence_ (KL Divergence): $\operatorname{D}_{KL} \left( \boldsymbol{y}, \boldsymbol{z} \right) = \sum_{i} {y}_{i} \log \left( \frac{{y}_{i}}{{z}_{i}} \right) - {y}_{i} + {z}_{i}$.

Remarks:
 - Pay attention that for a fixed $\boldsymbol{z}$ the KL Divergence is a _strictly_ convex function.
 - For a fixed $\boldsymbol{z}$ the optimization over $\boldsymbol{y}$ is equivalent to minimization of $\sum_{i} {y}_{i} \left( \log \left( \frac{{y}_{i}}{{z}_{i}} \right) - 1 \right)$, a negative entropy like term (Hence the relation to maximization of the Entropy).  
   Assuming the natural logarithm is used, the derivative with respect to ${y}_{i}$ is $\log \left( \frac{{y}_{i}}{{z}_{i}} \right)$, hence the above has a critical point at $\frac{{y}_{i}}{{z}_{i}} = 1$.
 - Ideally, the vector $\boldsymbol{z}$ holds the lowest value achievable for uniform counts of the classes.  
   In practice it is not known. A reasonable choice is the count of the largest class, as the constraint $\boldsymbol{x} \geq \boldsymbol{1}$ means the count of that class can not be lower.
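
A minimal sketch which verifies the generalized KL Divergence of the question against `scipy.special.kl_div()` on counts (Non distributional data). The vectors `vZ` and `vY` hold made up values for illustration:

```python
import numpy as np
from scipy.special import kl_div

vZ = np.array([10.0, 10.0, 10.0]) #<! Uniform reference counts (Hypothetical values)
vY = np.array([ 6.0, 10.0, 14.0]) #<! Class counts (Hypothetical values)

valKl    = np.sum(kl_div(vY, vZ))                  #<! SciPy: y * log(y / z) - y + z (Element wise)
valKlRef = np.sum(vY * np.log(vY / vZ) - vY + vZ)  #<! Explicit formula

print(np.isclose(valKl, valKlRef)) #<! True
print(valKl > 0.0)                 #<! True, positive for y different from z
print(np.sum(kl_div(vZ, vZ)))      #<! Vanishes when y = z
```

The terms $- {y}_{i} + {z}_{i}$ are what allows using the function on vectors which do not sum to 1.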


<font color='red'>(**!**)</font><font color='red'>(**!**)</font><font color='red'>(**!**)</font> Currently the [formulation requires the (Commercial) MOSEK solver](https://github.com/oxfordcontrol/Clarabel.rs/issues/197). Skip this formulation (Learn from it and implement another choice). <font color='red'>(**!**)</font><font color='red'>(**!**)</font><font color='red'>(**!**)</font>

### Solution 003

<font color='red'>??? Fill the answer here ???</font>

---

In [None]:
# KL Divergence Formulation

def SolveContKLDiv( mA: np.ndarray, upBound: float ) -> np.ndarray:
    """
    Solve the continuous Kullback Leibler Divergence (KL Divergence) optimization problem.
    Input:
     mA      - Input matrix where each row represents a sample and each column represents a class.
     upBound - Upper bound for the number of copies of the rows.
    Output:
     vX      - Solution vector representing the number of copies of each sample (Row).
    Remarks:
     - The problem uses the Kullback Leibler Divergence to measure the difference between the counts of balanced class and a uniform counts (Max).
     - Pay attention to the formulation of the KL Divergence in `CVXPY` and `SciPy` to support non distributional data.
     - The solution of the optimization problem should be rounded to generate a discrete solution.
    """
    
    numSamples = np.size(mA, 0)
    numClasses = np.size(mA, 1)

    #===========================Fill This===========================#
    # 1. Set the optimization variables.
    # 2. Define the objective.
    # 3. Define the constraints.
    # !! You may find `cp.kl_div()` useful.
    
    # Variables
    vX = ??? #<! Optimization variable, the number of copies per sample
    vY = ??? #<! Auxiliary variable

    # Problem
    cpObjFun = ??? #<! Objective function
    cpConst  = ??? #<! Constraints
    oCvxPrb  = ??? #<! Create the convex problem instance
    #===============================================================#
    
    oCvxPrb.solve(solver = cp.MOSEK, verbose = False)  #<! Solve the problem
    
    assert (oCvxPrb.status == 'optimal'), 'The problem is not solved.'
    
    return vX.value

In [None]:
# Solve the Continuous Problem

if 'MOSEK' in cp.installed_solvers():
    vX  = SolveContKLDiv(mA, upBound = upperbound)
else:
    vX = np.ones(np.size(mA, 0)) #<! Fallback solution if MOSEK is not available
vXR = np.round(vX)

In [None]:
# Objective Function

def ObjFunEntropy( vX: np.ndarray, mA: np.ndarray ) -> float:
    """
    Calculate the Kullback Leibler Divergence (KL Divergence) of the class distribution.
    
    Input:
     vX     - Solution vector representing the number of copies of each sample (Row).
     mA     - Input matrix where each row represents a sample and each column represents a class.
    
    Output:
     valE   - The KL Divergence value of the class counts relative to the uniform reference.

    Remarks:
    - The value is the Kullback Leibler Divergence (KL Divergence) with respect to a uniform vector.  
      In this case each element of the uniform vector is the maximum sum of the columns of `mA`.  
      The optimal result is a uniform vector, yet the lowest value which can be achieved is the count of the largest class.
    """

    numClasses = np.size(mA, 1) #<! Avoid dependency on a global variable
    vYY = np.max(np.sum(mA, axis = 0)) * np.ones(numClasses) #<! Uniform reference vector (Largest class count)
    
    vY = mA.T @ vX
    valE = np.sum(sp.special.kl_div(vY, vYY))  #<! Generalized KL Divergence (Supports non distributional data)
    
    return valE

In [None]:
# Plot the Class Balance

hF, vHa = plt.subplots(nrows = 1, ncols = 2, figsize = (12, 4))
hA = DisplayClassBalance(np.sum(mA, axis = 0), hA = vHa[0])
hA.set_title(f'Original Class Balance, Objective: {ObjFunEntropy(np.ones(np.size(mA, 0)), mA):0.2f}')
hA = DisplayClassBalance(mA.T @ vXR, hA = vHa[1])
hA.set_title(f'Balanced Class Balance, Objective: {ObjFunEntropy(vXR, mA):0.2f}')
vYlim = hA.get_ylim()
vHa[0].set_ylim(vYlim);

* <font color='green'>(**@**)</font> Implement another Continuous Formulation of the problem.

## Formulation 004 - Discrete / Continuous

$$
\begin{align*}
\arg \min_{\boldsymbol{x}, \boldsymbol{e}, c} \quad & \sum_{i = 1}^{n} \left| {e}_{i} \right| + c \\
\text{subject to} \quad & \begin{aligned} 
\boldsymbol{A}^{T} \boldsymbol{x} + \boldsymbol{e} & = c \boldsymbol{1} \\
c & \geq \min \left( \boldsymbol{A}^{T} \boldsymbol{1} \right) \\
\boldsymbol{x} & \geq \boldsymbol{1} \\
\boldsymbol{x} & \leq u \boldsymbol{1} \\
\end{aligned}
\end{align*}
$$

* <font color='red'>(**?**)</font> Explain how the formulation works.  
   Explain the roles of $c$ and $\boldsymbol{e}$. In practice, will the constraint on $c$ be active?
* <font color='blue'>(**!**)</font> Implement the formulation.

## Formulation 005 - Continuous (The Engineer Solution)

One could solve the problem with a target value of 1 per class and then use the [Least Common Multiple](https://en.wikipedia.org/wiki/Least_common_multiple) (LCM) to scale the result to integers.

$$
\begin{align*}
\arg \min_{\boldsymbol{x}} \quad & \frac{1}{2} {\left\| \boldsymbol{A}^{T} \boldsymbol{x} - \boldsymbol{1} \right\|}_{2}^{2} \\
\text{subject to} \quad & \begin{aligned} 
\boldsymbol{x} & \geq \frac{1}{r} \boldsymbol{1} \\
\boldsymbol{x} & \leq \frac{u}{r} \boldsymbol{1} \\
\end{aligned}
\end{align*}
$$

Where $r$ is the fraction resolution (Forcing an LCM).

In [None]:
# Solve the Problem using Bounded Least Squares

oRes = sp.optimize.lsq_linear(mA.T, np.ones(mA.shape[1]), bounds = (1 / fracRes, upperbound / fracRes))
vXF  = oRes.x
vX = np.round(fracRes * vXF) #<! Round by the fraction resolution

* <font color='brown'>(**#**)</font> Using Accelerated Projected Gradient Descent one can solve it faster.
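
A minimal sketch of Accelerated Projected Gradient Descent (FISTA like momentum) for the box constrained Least Squares problem above. The function `SolveBoxLsApgd`, its parameters and the toy data are assumptions made for illustration. The projection onto the box is an element wise clipping and the step size is set by the squared spectral norm of the matrix:

```python
import numpy as np
from scipy.optimize import lsq_linear

def SolveBoxLsApgd( mB: np.ndarray, vB: np.ndarray, lowBound: float, upBound: float, numIter: int = 5000 ) -> np.ndarray:
    """
    Minimizes 0.5 * || mB @ vX - vB ||_2^2 subject to lowBound <= vX <= upBound
    using Accelerated Projected Gradient Descent.
    """
    stepSize = 1.0 / (np.linalg.norm(mB, 2) ** 2) #<! 1 / L, L is the Lipschitz constant of the gradient
    vX = np.full(mB.shape[1], lowBound, dtype = np.float64) #<! Feasible starting point
    vZ = vX.copy() #<! Extrapolated point
    tK = 1.0       #<! Momentum parameter

    for _ in range(numIter):
        vG     = mB.T @ (mB @ vZ - vB)                           #<! Gradient at the extrapolated point
        vXNew  = np.clip(vZ - stepSize * vG, lowBound, upBound)  #<! Gradient step + projection onto the box
        tNew   = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * tK * tK))
        vZ     = vXNew + ((tK - 1.0) / tNew) * (vXNew - vX)      #<! Momentum step
        vX, tK = vXNew, tNew

    return vX

# Toy data (Hypothetical), not the data of the notebook
oRng = np.random.default_rng(0)
mB   = oRng.random((20, 5))
vB   = np.ones(20)

vXApgd = SolveBoxLsApgd(mB, vB, 0.1, 1.0)
vXRef  = lsq_linear(mB, vB, bounds = (0.1, 1.0)).x #<! Reference solution

objApgd = 0.5 * np.sum(np.square(mB @ vXApgd - vB))
objRef  = 0.5 * np.sum(np.square(mB @ vXRef - vB))
print(f'APGD Objective: {objApgd:0.6f}, Reference Objective: {objRef:0.6f}')
```

On the data of this notebook the call would be along the lines of `SolveBoxLsApgd(mA.T, np.ones(mA.shape[1]), 1 / fracRes, upperbound / fracRes)`. Whether it is actually faster depends on the problem size and the number of iterations required.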

In [None]:
# Plot the Class Balance

hF, vHa = plt.subplots(nrows = 1, ncols = 2, figsize = (12, 4))
hA = DisplayClassBalance(np.sum(mA, axis = 0), hA = vHa[0])
hA.set_title(f'Original Class Balance')
hA = DisplayClassBalance(mA.T @ vX, hA = vHa[1])
hA.set_title(f'Balanced Class Balance')
vYlim = hA.get_ylim()
vHa[0].set_ylim(vYlim);

## Auxiliary Visualization

In [None]:
# Equivalency of Variance and Pair Wise Sum Square
# Show empirical equivalency of the _Variance_ and _Pair Wise Sum Square_.

def PairWiseSumSquare( vX: np.ndarray ) -> float:
    """
    Computes the sum of pairwise squared differences for a given vector `vX`.
    """
    numSamples = len(vX)
    return np.sum(np.square(vX[:, None] - vX[None, :])) / (2 * numSamples * numSamples)

In [None]:
# Test the Equivalency
numSamples = 1000
vX = np.random.randn(numSamples)

print(f'Variance: {np.mean(np.square(vX - np.mean(vX))):0.5f}')
print(f'Sum of Pair Wise Square: {PairWiseSumSquare(vX):0.5f}')

In [None]:
# Maximizing Entropy Promotes Constant Vectors

vX   = cp.Variable(10)

cpObjFun = cp.Maximize( cp.sum(cp.entr(vX)) ) #<! Objective Function
cpConst  = [vX >= 1.2] #<! Constraint per each sample
oCvxPrb  = cp.Problem(cpObjFun, cpConst)   

oCvxPrb.solve(solver = cp.CLARABEL) #<! Solve the problem

assert (oCvxPrb.status == 'optimal'), 'The problem is not solved.'
print('Problem is solved.')

vX.value

In [None]:
# Plot the Entropy in 2D

vG = np.linspace(0.5, 3, 1000)

vX = np.tile(vG, len(vG))
vY = np.repeat(vG, len(vG))

mXX = np.r_[np.reshape(vX, (1, -1)), np.reshape(vY, (1, -1))]
mXX.shape

vE = sp.stats.entropy(mXX, axis = 0)
mE = np.reshape(vE, (len(vG), len(vG)))

hF, hA = plt.subplots(figsize = (8, 6))
# mE = np.log(1 + mE)  #<! Apply logarithm to the entropy values for better visualization
mE = np.power(mE, 4)
hA.imshow(mE, extent = (vG[0], vG[-1], vG[0], vG[-1]), origin = 'lower')
# Add contour lines with different colors
hA.contour(vG, vG, mE, levels = 100, cmap = 'jet', linewidths = 0.5, linestyles = 'solid')
hA.set_xlabel('X')
hA.set_ylabel('Y')
hA.set_title('Entropy Surface');

hF, hA = plt.subplots(figsize = (8, 6), subplot_kw = {'projection': '3d'})
hA.view_init(elev = 25, azim = 225, roll = 0)
hA.plot_surface(np.reshape(vX, (len(vG), len(vG))), np.reshape(vY, (len(vG), len(vG))), mE, cmap = 'viridis', edgecolor = 'none')
hA.set_xlabel('X')
hA.set_ylabel('Y')
hA.set_zlabel('Entropy')
hA.set_title('Entropy Surface');