# Universal $\mathcal{P}_1(\mathbb{R})$-Deep Neural Model (Type A)
---

---
# Training Algorithm:
---
## 1) Generate Data:
Generate the empirical measure $\frac{1}{N}\sum_{n=1}^N \delta_{X_T(\omega_n)}$ of $X_T$ conditional on $X_0=x_0\in \mathbb{R}$ *($x_0$ and $T>0$ are user-provided)*.

## 2) Get "Sample Barycenters":
Let $\{\hat{\mu}_n\}_{n=1}^N\subset\mathcal{P}_1(\mathbb{R}^d)$.  Then, the *sample barycenters* are defined recursively by:
1. $\mathcal{M}^{(0)}\triangleq \left\{\hat{\mu}_n\right\}_{n=1}^N$,
2. For $1\leq n\leq \mbox{N sample barycenters}$: 
    - $
\mu^{\star}\in \underset{\tilde{\mu}\in \mathcal{M}^{(n-1)}}{\operatorname{argmin}}\, \sum_{k=1}^N \mathcal{W}_1\left(\tilde{\mu},\hat{\mu}_k\right),
$
    - $\mathcal{M}^{(n)}\triangleq \mathcal{M}^{(n-1)} \setminus \{\mu^{\star}\},$

*i.e., each sample barycenter is the generated measure from the random sample that is closest, in total, to all other elements of the random sample.*
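
A minimal sketch of this selection rule, assuming one-dimensional empirical measures with uniform weights. The function name `sample_barycenters` and the toy data are illustrative and not part of the notebook; the total distance is taken over the whole sample at every step, and only the selected measure is removed, whereas the notebook's implementation further below also removes a cluster of nearby measures.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def sample_barycenters(measures, n_barycenters):
    """Greedily pick the measures with the smallest total W1 distance to the sample."""
    n = len(measures)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # W1 is symmetric, so each pair is evaluated once
            D[i, j] = D[j, i] = wasserstein_distance(measures[i], measures[j])
    total = D.sum(axis=1)            # sum_k W1(mu_i, mu_k) for every candidate i
    remaining = list(range(n))       # indices of M^(0)
    chosen = []
    for _ in range(n_barycenters):
        best = min(remaining, key=lambda i: total[i])
        chosen.append(best)
        remaining.remove(best)       # M^(n) = M^(n-1) without mu_star
    return chosen

rng = np.random.default_rng(0)
# Hypothetical toy sample: Gaussians with means -2, -1, 0, 1, 2
measures = [rng.normal(m, 1.0, size=500) for m in (-2.0, -1.0, 0.0, 1.0, 2.0)]
centers = sample_barycenters(measures, n_barycenters=2)
print(centers)
```

The distance matrix is computed once and reused for every selection, which is the reason no new Wasserstein barycenter problem has to be solved.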

---
**Note:** *To reduce the computational burden, the class labels are assigned directly inside the loop that builds the sample barycenters.*

## 3) Train Deep Classifier:
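$\hat{f}\in \operatorname{argmin}_{f \in \mathcal{NN}_{d:N}^{\star}} 
\sum_{x \in \mathbb{X}}
\, 
\mathbb{H}
\left(
    \operatorname{Softmax}_N\circ f(x) \,\middle|\, \left(I\left\{\mathcal{W}_1(\hat{\mu}_n,\mu_x)=\inf_{m\leq N} \mathcal{W}_1(\hat{\mu}_m,\mu_x)\right\}\right)_{n=1}^N
\right);
$
where $\mathbb{H}$ is the categorical cross-entropy and $I\{\cdot\}$ is the indicator of the event that $\hat{\mu}_n$ is the sample barycenter closest to $\mu_x$.  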
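
A minimal sketch of the targets and the loss in this step, assuming equal-size, uniformly weighted one-dimensional samples. The helpers `w1_empirical`, `nearest_center_labels` and `categorical_cross_entropy` are hypothetical names, and the logits stand in for the output of the network $f$; no network is trained here.

```python
import numpy as np

def w1_empirical(x, y):
    # W1 between two equal-size, uniformly weighted 1-D samples (sorted-sample formula)
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

def nearest_center_labels(measures, centers):
    """One-hot targets: entry (i, n) is 1 iff center n is W1-closest to measure i."""
    dist = np.array([[w1_empirical(mu, c) for c in centers] for mu in measures])
    labels = np.zeros_like(dist)
    labels[np.arange(len(measures)), dist.argmin(axis=1)] = 1.0
    return labels

def categorical_cross_entropy(logits, targets):
    """Mean cross-entropy between Softmax(logits) and one-hot targets."""
    z = logits - logits.max(axis=1, keepdims=True)      # stabilise the softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(np.sum(targets * log_probs, axis=1))

rng = np.random.default_rng(1)
# Hypothetical toy data: three centers and four measures to classify
centers = [rng.normal(m, 1.0, size=300) for m in (-3.0, 0.0, 3.0)]
measures = [rng.normal(m, 1.0, size=300) for m in (-2.9, 0.2, 3.1, -0.1)]
targets = nearest_center_labels(measures, centers)
# Loss of an untrained classifier that outputs identical logits for every class
loss_uniform = categorical_cross_entropy(np.zeros((4, 3)), targets)
```

Because the targets are computed before training, the Wasserstein distance never appears inside the loss that is differentiated.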

---
---
---
## Notes - Why is the procedure so computationally efficient?
---
 - The sample barycenters do not require us to solve for any new Wasserstein-1 barycenters, which would be much more computationally costly.
 - Our training procedure never back-propagates through $\mathcal{W}_1$, since steps 2 and 3 are fully decoupled.  Therefore, training our deep classifier is (comparatively) cheap, since it takes values in the standard $N$-simplex.

---

## Meta-Parameters

### Visualization

In [1]:
# How many random populations to visualize:
Visualization_Size = 4

### Quantization
*This hyperparameter describes the proportion of the data used as sample-barycenters.*

In [2]:
Quantization_Proportion = 0.1

### Simulation

#### Ground Truth:
*The built-in options:*
- rSDE 
- pfBM
- 2lnflow

In [3]:
groud_truth = "2lnflow"

#### Grid Hyperparameter(s)

In [4]:
## Monte-Carlo
N_Euler_Maruyama_Steps = 100
N_Monte_Carlo_Samples = 10**3
N_Monte_Carlo_Samples_Test = 10**4 # How many MC-samples to draw from test-set?

# End times for Time-Grid
T_end = 1
T_end_test = 1.1


## Grid
N_Grid_Finess = 100
Max_Grid = 1

**Note**: Setting *N_Quantizers_to_parameterize* prevents any barycenters and sub-sampling.

### Random Cover

In [None]:
# TEMP:
from operator import itemgetter 
from itertools import compress
# Set Minibatch Size
Covering_Mini_Batch_Size = 50

#### Mode: Code-Testing Parameter(s)

In [5]:
trial_run = True

### Meta-parameters

In [6]:
# Test-size Ratio
test_size_ratio = .25

## Simulation from Measure-Valued $2$-Parameter Gaussian Flow
$$
X_{t,x} \sim \mathcal{N}\left(\alpha(t,x),\beta(t,x)\right).
$$

**Note:** *$\alpha$ and $\beta$ are specified below in the SDE Example; in the code, $\beta(t,x)$ is passed to the Gaussian sampler as the standard deviation*.
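
A minimal sketch of drawing one empirical measure from this flow. The coefficients repeat the forms of `alpha` and `beta` defined later in the notebook so that the block runs on its own; `sample_gaussian_flow` is a hypothetical helper, and $\beta$ is treated as the standard deviation, as in the notebook's call to `np.random.normal`.

```python
import math
import numpy as np

def alpha(t, x):
    # Same form as the notebook's drift
    return t * np.sin(math.pi * x)

def beta(t, x):
    # Same form as the notebook's volatility
    return (1 + t) + np.cos(x)

def sample_gaussian_flow(t, x, n_samples, rng):
    """Draw n_samples from N(alpha(t, x), beta(t, x)), with beta as the standard deviation."""
    return rng.normal(alpha(t, x), beta(t, x), size=n_samples)

rng = np.random.default_rng(2021)
samples = sample_gaussian_flow(t=1.0, x=0.5, n_samples=100_000, rng=rng)
print(samples.mean(), samples.std())
```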

## Simulation from Rough SDE
Simulate via the Euler-Maruyama method from:
$$ 
X_T = x + \int_0^T \alpha(s,X_s)ds + \int_0^T((1-\eta)\beta(s,X_s)+\eta\sigma_s^H)dW_s.
$$
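
A minimal sketch of the Euler-Maruyama scheme for the special case $\eta=0$ (no fractional component), vectorised over the Monte-Carlo samples. The coefficients repeat the forms of the notebook's `alpha` and `beta`; the function `euler_maruyama_terminal` and its defaults are assumptions made for illustration, not the notebook's generator.

```python
import math
import numpy as np

def alpha(t, x):
    return t * np.sin(math.pi * x)

def beta(t, x):
    return (1 + t) + np.cos(x)

def euler_maruyama_terminal(x0, T=1.0, n_steps=100, n_paths=1000, seed=0):
    """Simulate n_paths Euler-Maruyama paths and return the terminal values X_T."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, float(x0))
    for k in range(n_steps):
        t = k * dt
        dW = rng.normal(0.0, math.sqrt(dt), size=n_paths)   # Brownian increments
        x = x + alpha(t, x) * dt + beta(t, x) * dW
    return x

X_T = euler_maruyama_terminal(x0=0.0)
weights = np.full(X_T.shape, 1.0 / X_T.size)   # uniform weights of the empirical measure
```

The pair `(X_T, weights)` is one empirical measure; repeating the call over a grid of initial states gives the training outputs.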

### Drift

In [7]:
def alpha(t,x):
    return t*np.sin(math.pi*x) #+ np.exp(-t)

### Volatility

In [8]:
def beta(t,x):
    return (1+t) + np.cos(x)

### Roughness Meta-parameters

In [9]:
Rougness = 0.9 # Hurst Parameter
Ratio_fBM_to_typical_vol = 0 # $\eta$ in equation above.

## Perturbed Fractional Brownian Motion
Simulate from:
$$
X_t^x(\omega) = f_1(x)f_2(t) + B_t^H(\omega).
$$
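
A minimal sketch of one path of this model, following the displayed formula. It assumes an exact Cholesky-based fBM sampler in place of the `fbm` package used by the notebook; `fbm_cholesky`, `f1` and `f2` are hypothetical names, with `f1` and `f2` taking the same forms as the notebook's `field_dirction_x` and `finite_variation_t`.

```python
import math
import numpy as np

def fbm_cholesky(n_steps, hurst, T=1.0, rng=None):
    """Sample one fBM path on n_steps+1 equally spaced times in [0, T] (exact, O(n^3))."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.linspace(0.0, T, n_steps + 1)[1:]              # drop t=0, where B_0 = 0
    s, u = np.meshgrid(t, t, indexing="ij")
    # fBM covariance: 0.5 * (s^{2H} + u^{2H} - |s-u|^{2H})
    cov = 0.5 * (s**(2 * hurst) + u**(2 * hurst) - np.abs(s - u)**(2 * hurst))
    L = np.linalg.cholesky(cov)
    path = L @ rng.standard_normal(n_steps)
    return np.concatenate(([0.0], path))

def f1(x):
    # Spatial factor
    return x * np.cos(x)

def f2(t):
    # Temporal (finite-variation) factor
    return t * (np.sin(math.pi * t) + np.exp(-t))

rng = np.random.default_rng(0)
t_grid = np.linspace(0.0, 1.0, 101)
x = 0.5
path = f1(x) * f2(t_grid) + fbm_cholesky(100, hurst=0.75, rng=rng)
```

The Cholesky approach is only practical for short grids; the Davies-Harte method used in the notebook scales better.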

In [10]:
def field_dirction_x(x):
    return x*np.cos(x)

def finite_variation_t(t):
    return t*(np.sin(math.pi*t) + np.exp(-t))

### Get Paths

In [11]:
# load dataset
results_path = "./outputs/models/"
results_tables_path = "./outputs/results/"
raw_data_path_folder = "./inputs/raw/"
data_path_folder = "./inputs/data/"

### Import

In [12]:
# Load Packages/Modules
exec(open('Init_Dump.py').read())
# Load Hyper-parameter Grid
exec(open('CV_Grid.py').read())
# Load Helper Function(s)
# %run ParaGAN_Backend.ipynb
exec(open('Helper_Functions.py').read())
# Import time separately
import time

Using TensorFlow backend.


Deep Feature Builder - Ready
Deep Classifier - Ready


### Set Seed

In [13]:
random.seed(2021)
np.random.seed(2021)
tf.random.set_seed(2021)

## Get Internal (Hyper)-Parameter(s)
*Initialize the hyperparameters which are fully-specified by the user-provided hyperparameter(s).*

### Initialize Grid
This is $\mathbb{X}$ and it represents the grid of initial states.

In [14]:
# Get Input Data
#----------------------------------------------------------#
## Train
x_Grid = np.arange(start=-Max_Grid,
                   stop=Max_Grid,
                   step=(2*Max_Grid/N_Grid_Finess))
t_Grid = np.linspace(0,T_end,(1+N_Euler_Maruyama_Steps))
## Get Number of Instances in Grid: Training
N_Grid_Instances_x = len(x_Grid)
N_Grid_Instances_t = len(t_Grid)
N_Grid_Instances = N_Grid_Instances_x*N_Grid_Instances_t 

#----------------------------------------------------------#
## Test
x_Grid_test = np.sort(np.random.uniform(low=-Max_Grid,
                                        high=Max_Grid,
                                        size = round(N_Grid_Instances*test_size_ratio)))
t_Grid_test = np.linspace(T_end+0.001,T_end_test,(1+round(N_Euler_Maruyama_Steps*test_size_ratio)))
# Get Number of Instances in Grid: Test
N_Grid_Instances_x_test = len(x_Grid_test)
N_Grid_Instances_t_test = len(t_Grid_test)
N_Grid_Instances_test = N_Grid_Instances_x_test*N_Grid_Instances_t_test
#----------------------------------------------------------#

# Update User
print("\u2022 Grid Instances: ", N_Grid_Instances, "and :",N_Grid_Instances_test," Testing instances.")

• Grid Instances:  10100 and : 65650  Testing instances.


### Initialize Counting Parameters
Initialize the "counting"-type parameters, which determine the length of the loops and the sizes of the objects initialized later on.  

In [15]:
# Get Internal (Counting) Parameters
N_Quantizers_to_parameterize = round(Quantization_Proportion*N_Grid_Instances)
N_Elements_Per_Cluster = int(round(N_Grid_Instances/N_Quantizers_to_parameterize))

# Update User
print("\u2022",N_Quantizers_to_parameterize," Centers will be produced; from a total datasize of: ",N_Grid_Finess,
      "!  (That's ",Quantization_Proportion,
      " percent).")
print("\u2022 Each Wasserstein-1 Ball should contain: ",
      N_Elements_Per_Cluster, 
      "elements from the training set.")

• 1010  Centers will be produced; from a total datasize of:  100 !  (That's  0.1  percent).
• Each Wasserstein-1 Ball should contain:  10 elements from the training set.


---

### Simulate from non-Markovian SDE with rough volatility:
$d X_t = \alpha(t,X_t)dt + ((1-\eta)\beta(t,X_t)+\eta\sigma_t^H)dW_t ;\qquad X_0 =x$
where $(\sigma_t^H)_t$ is an fBM with Hurst parameter $H=0.01$ and $\eta \in [0,1]$ controls the amount of long-term memory and roughness in $X_t$.

### Define Sampler - Data-Generator

Generate the empirical measure $\frac{1}{N}\sum_{n=1}^N \delta_{X_T(\omega_n)}$ of $X_T$ conditional on $X_0=x_0\in \mathbb{R}$ *($x_0$ and $T>0$ are user-provided)*.

In [16]:
def Euler_Maruyama_Generator(x_0,
                             N_Euler_Maruyama_Steps = 10,
                             N_Monte_Carlo_Samples = 100,
                             T = 1,
                             Hurst = 0.1,
                             Ratio_fBM_to_typical_vol = 0.5): 
    
    #----------------------------#    
    # DEFINE INTERNAL PARAMETERS #
    #----------------------------#
    # Initialize Empirical Measure (row t holds the samples at the t-th time step)
    X_T_Empirical = np.zeros([N_Euler_Maruyama_Steps,N_Monte_Carlo_Samples])
    # Every path starts at x_0
    X_T_Empirical[0,:] = x_0


    # Internal Initialization(s)
    ## Initialize current state
    n_sample = 0
    ## Initialize Increments
    dt = T/N_Euler_Maruyama_Steps
    sqrt_dt = np.sqrt(dt)

    #-----------------------------#    
    # Generate Monte-Carlo Sample #
    #-----------------------------#
    while n_sample < N_Monte_Carlo_Samples:
        # Reset Step Counter (start at 0 so that rows 1,...,N-1 are all filled)
        t = 0
        # Initialize Current State 
        X_current = x_0
        # Generate roughness (use the user-provided Hurst parameter)
        sigma_rough = FBM(n=N_Euler_Maruyama_Steps, hurst=Hurst, length=1, method='daviesharte').fbm()
        # Perform Euler-Maruyama Simulation
        while t<(N_Euler_Maruyama_Steps-1):
            # Update Internal Parameters
            ## Get Current Time
            t_current = t*dt

            # Update Generated Path
            drift_t = alpha(t_current,X_current)*dt
            vol_t = ((1-Ratio_fBM_to_typical_vol)*beta(t_current,X_current)+Ratio_fBM_to_typical_vol*(sigma_rough[t]))*np.random.normal(0,sqrt_dt)
            X_current = X_current + drift_t + vol_t

            # Update Counter (EM)
            t = t+1

            # Update Empirical Measure
            X_T_Empirical[t,n_sample] = X_current

        # Update Counter (MC)
        n_sample = n_sample + 1

    return X_T_Empirical

---

### Initializations

In [17]:
# Initialize List of Barycenters
Wasserstein_Barycenters = []
# Initialize Terminal-Time Empirical Measures
## Training Outputs
measures_locations_list = []
measures_weights_list = []
## Testing Outputs
measures_locations_test_list = []
measures_weights_test_list = []
# Grid (Training and Testing inputs (t,x))
X_train = []
X_test = []

# Initialize (Empirical) Weight(s)
measure_weights = np.ones(N_Monte_Carlo_Samples)/N_Monte_Carlo_Samples
measure_weights_test = np.ones(N_Monte_Carlo_Samples_Test)/N_Monte_Carlo_Samples_Test
# Initialize Quantizer
Init_Quantizer_generic = np.ones(N_Monte_Carlo_Samples)/N_Monte_Carlo_Samples

## Generate $\{\hat{\nu}^{N}_{T,x}\}_{x \in \mathbb{X}}$ and Build Wasserstein Cover

#### Get Data

### Gaussian $2$-Parameter Flow

In [18]:
if groud_truth == "2lnflow":
    print("Direct Sampling from Distribution for 2-Parameter Flow.")
    #----------------------------------------------------------------------------------------------#
    # Update User
    print("===================================")
    print("Start Simulation Step: Training Set")
    print("===================================")
    # Perform Monte-Carlo Data Generation
    for i in tqdm(range(N_Grid_Instances_x)):
        x_loop = x_Grid[i]
        # Sample the Gaussian law at each time on the grid
        for j in range(N_Grid_Instances_t):
            t_loop = t_Grid[j]
            measures_locations_loop = np.random.normal(alpha(t_loop,x_loop),
                                                          beta(t_loop,x_loop),
                                                          N_Monte_Carlo_Samples)
        
            # Update Inputs
            if (i==0 and j==0):
                X_train = np.array([t_loop,x_loop]).reshape(1,-1)
            else:
                X_train = np.append(X_train,np.array([t_loop,x_loop]).reshape(1,-1),axis=0)
        
            # Append to List
            measures_locations_list = measures_locations_list + [measures_locations_loop]
            measures_weights_list.append(measure_weights)
        
        
    
    # Update User
    print("==================================")
    print("Done Simulation Step: Training Set")
    print("==================================")


print("===============================--------------------------------------===============================")
print("===============================--------------------------------------===============================")
print("===============================--------------------------------------===============================")

if groud_truth == "2lnflow":
    print("===============================")
    print("Start Simulation Step: Test Set")
    print("===============================")
    # Perform Monte-Carlo Data Generation
    for i in tqdm(range(N_Grid_Instances_x_test)):
        x_loop = x_Grid_test[i]
        # Sample the Gaussian law at each time on the test grid
        for j in range(N_Grid_Instances_t_test):
            t_loop = t_Grid_test[j]
            measures_locations_loop = np.random.normal(alpha(t_loop,x_loop),
                                                          beta(t_loop,x_loop),
                                                          N_Monte_Carlo_Samples_Test)
        
            # Update Inputs
            if (i==0 and j==0):
                X_test = np.array([t_loop,x_loop]).reshape(1,-1)
            else:
                X_test = np.append(X_test,np.array([t_loop,x_loop]).reshape(1,-1),axis=0)
        
            # Append to List
            measures_locations_test_list = measures_locations_test_list + [measures_locations_loop]
            measures_weights_test_list.append(measure_weights_test)
    print("==============================")
    print("Done Simulation Step: Test Set")
    print("==============================")

  0%|          | 0/100 [00:00<?, ?it/s]

Direct Sampling from Distribution for 2-Parameter Flow.
Start Simulation Step: Training Set


100%|██████████| 100/100 [00:01<00:00, 77.61it/s]
  0%|          | 7/2525 [00:00<00:37, 66.34it/s]

Done Simulation Step: Training Set
Start Simulation Step: Test Set


100%|██████████| 2525/2525 [01:17<00:00, 32.70it/s]

Done Simulation Step: Test Set





### Rough SDE Simulator:

In [19]:
if groud_truth == "rSDE":
    print("Using Euler-Maruyama discretization + Monte-Carlo Sampling.")
    #----------------------------------------------------------------------------------------------#
    # Update User
    print("===================================")
    print("Start Simulation Step: Training Set")
    print("===================================")
    # Initialize fBM Generator
    fBM_Generator = FBM(n=N_Euler_Maruyama_Steps, hurst=0.75, length=1, method='daviesharte')

    # Perform Monte-Carlo Data Generation
    for i in tqdm(range(N_Grid_Instances_x)):
        # Get x
        field_loop_x = field_dirction_x(x_Grid[i])
        # Get omega and t
        # Generate finite-variation path (since it stays unchanged)
        finite_variation_path = finite_variation_t(t_Grid).reshape(-1,1) +field_loop_x
        # Simulate Paths
        paths_loop = Euler_Maruyama_Generator(x_0=x_Grid[i],
                                              N_Euler_Maruyama_Steps = len(t_Grid),
                                              N_Monte_Carlo_Samples = N_Monte_Carlo_Samples,
                                              T = T_end,
                                              Hurst=Rougness,
                                              Ratio_fBM_to_typical_vol=Ratio_fBM_to_typical_vol)
        
        # Map numpy to list
        measures_locations_loop = paths_loop.tolist()
        # Get inputs
        X_train_loop = np.append(np.repeat(x_Grid[i],(N_Euler_Maruyama_Steps+1)).reshape(-1,1),
                                 t_Grid.reshape(-1,1),
                                 axis=1)
        
        # Append to List
        measures_locations_list = measures_locations_list + measures_locations_loop
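        # One weight vector per appended measure (one measure per time step)
        measures_weights_list = measures_weights_list + [measure_weights]*len(measures_locations_loop)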
        
        # Update Inputs
        if i==0:
            X_train = X_train_loop
        else:
            X_train = np.append(X_train,X_train_loop,axis=0)
    
    # Update User
    print("==================================")
    print("Done Simulation Step: Training Set")
    print("==================================")


print("===============================--------------------------------------===============================")
print("===============================--------------------------------------===============================")
print("===============================--------------------------------------===============================")

if groud_truth == "rSDE":
    print("===============================")
    print("Start Simulation Step: Test Set")
    print("===============================")
    # Initialize fBM Generator
    fBM_Generator_test = FBM(n=(len(t_Grid_test)-1), hurst=0.75, length=1, method='daviesharte')

    # Perform Monte-Carlo Data Generation
    for i in tqdm(range(N_Grid_Instances_x_test)):
        # Get x
        field_loop_x = field_dirction_x(x_Grid_test[i])
        # Get omega and t
        # Generate finite-variation path (since it stays unchanged)
        finite_variation_path = finite_variation_t(t_Grid_test).reshape(-1,1) +field_loop_x
        paths_loop = Euler_Maruyama_Generator(x_0=x_Grid_test[i],
                                              N_Euler_Maruyama_Steps = len(t_Grid_test),
                                              N_Monte_Carlo_Samples = N_Monte_Carlo_Samples_Test,
                                              T = T_end_test,
                                              Hurst=Rougness,
                                              Ratio_fBM_to_typical_vol=Ratio_fBM_to_typical_vol)
        
        # Map numpy to list
        measures_locations_loop = paths_loop.tolist()
        # Get inputs
        X_test_loop = np.append(np.repeat(x_Grid_test[i],len(t_Grid_test)).reshape(-1,1),
                                 t_Grid_test.reshape(-1,1),
                                 axis=1)
        
        # Append to List
        measures_locations_test_list = measures_locations_test_list + measures_locations_loop
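        # One weight vector per appended measure (one measure per time step)
        measures_weights_test_list = measures_weights_test_list + [measure_weights_test]*len(measures_locations_loop)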
        
        # Update Inputs
        if i==0:
            X_test = X_test_loop
        else:
            X_test = np.append(X_test,X_test_loop,axis=0)
    print("==============================")
    print("Done Simulation Step: Test Set")
    print("==============================")



### Perturbed fBM Generator:

In [20]:
# Update User
print("Current Monte-Carlo Step:")
if groud_truth == "pfBM":
    print("===================================")
    print("Start Simulation Step: Training Set")
    print("===================================")
    # Initialize fBM Generator
    fBM_Generator = FBM(n=N_Euler_Maruyama_Steps, hurst=0.75, length=1, method='daviesharte')

    # Perform Monte-Carlo Data Generation
    for i in tqdm(range(N_Grid_Instances_x)):
        # Get x
        field_loop_x = field_dirction_x(x_Grid[i])
        # Get omega and t
        # Generate finite-variation path (since it stays unchanged)
        finite_variation_path = finite_variation_t(t_Grid).reshape(-1,1) +field_loop_x
        for n_MC in range(N_Monte_Carlo_Samples):
            fBM_variation_path_loop = fBM_Generator.fbm().reshape(-1,1)
            generated_path_loop = finite_variation_path + fBM_variation_path_loop
            if n_MC == 0:
                paths_loop = generated_path_loop
            else:
                paths_loop = np.append(paths_loop,generated_path_loop,axis=-1)
        
        # Map numpy to list
        measures_locations_loop = paths_loop.tolist()
        # Get inputs
        X_train_loop = np.append(np.repeat(x_Grid[i],(N_Euler_Maruyama_Steps+1)).reshape(-1,1),
                                 t_Grid.reshape(-1,1),
                                 axis=1)
        
        # Append to List
        measures_locations_list = measures_locations_list + measures_locations_loop
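        # One weight vector per appended measure (one measure per time step)
        measures_weights_list = measures_weights_list + [measure_weights]*len(measures_locations_loop)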
        
        # Update Inputs
        if i==0:
            X_train = X_train_loop
        else:
            X_train = np.append(X_train,X_train_loop,axis=0)
    
    # Update User
    print("==================================")
    print("Done Simulation Step: Training Set")
    print("==================================")


print("===============================--------------------------------------===============================")
print("===============================--------------------------------------===============================")
print("===============================--------------------------------------===============================")

if groud_truth == "pfBM":
    print("===============================")
    print("Start Simulation Step: Test Set")
    print("===============================")
    # Initialize fBM Generator
    fBM_Generator_test = FBM(n=(len(t_Grid_test)-1), hurst=0.75, length=1, method='daviesharte')

    # Perform Monte-Carlo Data Generation
    for i in tqdm(range(N_Grid_Instances_x_test)):
        # Get x
        field_loop_x = field_dirction_x(x_Grid_test[i])
        # Get omega and t
        # Generate finite-variation path (since it stays unchanged)
        finite_variation_path = finite_variation_t(t_Grid_test).reshape(-1,1) +field_loop_x
        for n_MC in range(N_Monte_Carlo_Samples_Test):
            fBM_variation_path_loop = fBM_Generator_test.fbm().reshape(-1,1)
            generated_path_loop = finite_variation_path + fBM_variation_path_loop
            if n_MC == 0:
                paths_loop = generated_path_loop
            else:
                paths_loop = np.append(paths_loop,generated_path_loop,axis=-1)
        
        # Map numpy to list
        measures_locations_loop = paths_loop.tolist()
        # Get inputs
        X_test_loop = np.append(np.repeat(x_Grid_test[i],len(t_Grid_test)).reshape(-1,1),
                                 t_Grid_test.reshape(-1,1),
                                 axis=1)
        
        # Append to List
        measures_locations_test_list = measures_locations_test_list + measures_locations_loop
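        # One weight vector per appended measure (one measure per time step)
        measures_weights_test_list = measures_weights_test_list + [measure_weights_test]*len(measures_locations_loop)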
        
        # Update Inputs
        if i==0:
            X_test = X_test_loop
        else:
            X_test = np.append(X_test,X_test_loop,axis=0)
    print("==============================")
    print("Done Simulation Step: Test Set")
    print("==============================")
    
print("===============================--------------------------------------===============================")
print("===============================--------------------------------------===============================")
print("===============================--------------------------------------===============================")

Current Monte-Carlo Step:


#### Start Timer (Model Type A)

In [21]:
# Start Timer
Type_A_timer_Begin = time.time()

## Get "Sample Barycenters":
Let $\{\hat{\mu}_n\}_{n=1}^N\subset\mathcal{P}_1(\mathbb{R}^d)$.  Then, the *sample barycenters* are defined recursively by:
1. $\mathcal{M}^{(0)}\triangleq \left\{\hat{\mu}_n\right\}_{n=1}^N$,
2. For $1\leq n\leq \mbox{N sample barycenters}$: 
    - $
\mu^{\star}\in \underset{\tilde{\mu}\in \mathcal{M}^{(n-1)}}{\operatorname{argmin}}\, \sum_{k=1}^N \mathcal{W}_1\left(\tilde{\mu},\hat{\mu}_k\right),
$
    - $\mathcal{M}^{(n)}\triangleq \mathcal{M}^{(n-1)} \setminus \{\mu^{\star}\},$

*i.e., each sample barycenter is the generated measure from the random sample that is closest, in total, to all other elements of the random sample.*

---
**Note:** *To reduce the computational burden, the class labels are assigned directly inside the loop that builds the sample barycenters.*

---

## Build Dissimilarity (Distance) Matrix
*In this step we build a dissimilarity matrix of the dataset on the Wasserstein-1 space.  Namely:*
$$
\operatorname{Mat}_{\# \mathbb{X},\# \mathbb{X}}\left(\mathbb{R}\right)\ni D; \text{ where}\qquad \, D_{i,j}\triangleq \mathcal{W}_1\left(f(x_i),f(x_j)\right)
;
$$
*where $f\in C\left(\mathcal{X},\mathcal{P}_1(\mathcal{Y})\right)$ is the "target" function we are learning.*

**Note**: *Computing the dissimilarity matrix is the most costly part of the entire algorithm, with a complexity of at most $\mathcal{O}\left(E_{W} (\# \mathbb{X})^2\right)$, where $E_W$ denotes the complexity of a single Wasserstein-1 evaluation between two elements of the dataset.*
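
A minimal sketch of building such a matrix, assuming every measure is a one-dimensional empirical measure with the same number of uniformly weighted atoms. In that case $\mathcal{W}_1$ reduces to the mean absolute difference of the sorted samples, so each measure is sorted once and only the upper triangle is evaluated. The function `w1_dissimilarity_matrix` is a hypothetical helper, not the notebook's implementation, which calls the POT library instead.

```python
import numpy as np

def w1_dissimilarity_matrix(samples):
    """Pairwise W1 distances between rows of `samples` (each row: one empirical measure).

    Assumes all measures have the same number of uniformly weighted atoms, so that
    W1 reduces to the mean absolute difference of the sorted samples.
    """
    S = np.sort(np.asarray(samples, dtype=float), axis=1)   # sort each measure once
    n = S.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        # Fill the upper triangle only, then mirror: W1 is symmetric
        D[i, i + 1:] = np.abs(S[i + 1:] - S[i]).mean(axis=1)
    return D + D.T

rng = np.random.default_rng(0)
base = rng.normal(size=200)
# Shifted copies of one sample: the W1 distance between two copies equals the shift
samples = np.stack([base, base + 1.0, base + 3.0])
D = w1_dissimilarity_matrix(samples)
```

Exploiting symmetry halves the number of evaluations, but the cost still grows quadratically in the number of measures, which motivates the mini-batch cover used below.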

In [22]:
# Update User
# print("\U0001F61A"," Begin Building Distance Matrix"," \U0001F61A")

# # Update User
# print("\U0001F600"," Done Building Distance Matrix","\U0001F600","!")

In [23]:
# # Initialize Disimilarity Matrix
# Dissimilarity_matrix_ot = np.zeros([N_Grid_Instances,N_Grid_Instances])


# # Update User
# print("\U0001F61A"," Begin Building Distance Matrix"," \U0001F61A")
# # Build Disimilarity Matrix
# for i in tqdm(range(N_Grid_Instances)):
#     for j in range(N_Grid_Instances):
#         Dissimilarity_matrix_ot[i,j] = ot.emd2_1d(measures_locations_list[j],
#                                                   measures_locations_list[i])
# # Update User
# print("\U0001F600"," Done Building Distance Matrix","\U0001F600","!")

In [24]:
# # Initialize Locations Matrix (Internal to Loop)
# measures_locations_list_current = copy.copy(measures_locations_list)
# Dissimilarity_matrix_ot_current = copy.copy(Dissimilarity_matrix_ot)

# # Initialize masker vector
# masker = np.ones(N_Grid_Instances)

# # Initialize Sorting Reference Vector (lets us scan the dissimilarity matrix to identify the barycenter without re-computing the dissimilarity matrix of a sub-sample at every iteration, which is the most costly part of the algorithm)
# Distances_Loop = Dissimilarity_matrix_ot_current.sum(axis=1)

# # Initialize Classes (In-Sample)
# Classifer_Wasserstein_Centers = np.zeros([N_Quantizers_to_parameterize,N_Grid_Instances])

### Get Minibatch Cover

In [26]:
#-----------------#
# Initializations #
#-----------------#
print("Data-Points per Random Sample: ", Covering_Mini_Batch_Size**2)

# Initialize index for input/output data
index_remaining = np.array(range(len(measures_locations_list)))
# Count number of remaining datums to cluster:
length_of_sample = len(index_remaining)
# Initialize Mini-batch iteration counter
mini_batch_iteration_counter = 0

#--------------------#
# Build Random-Cover #
#--------------------#
while length_of_sample >0:
    print("\u2922 Current iteration of Mini-Batch Random Covering: ",mini_batch_iteration_counter)
    # Update User
    print("Remaining points to cluster: ",length_of_sample)
    
    #---------------------------------#
    # Get Random Sample for Minibatch #
    #---------------------------------#
    ## Get positions (within index_remaining) of the sampled points
    which_to_sample = np.random.choice(index_remaining.shape[0], Covering_Mini_Batch_Size, replace=False) 
    ## Get data indices of the current sample (these index measures_locations_list directly)
    indices_to_remove_loop = np.sort(index_remaining[which_to_sample])
    clustering_indices_minibatch = indices_to_remove_loop
    ## UPDATE: Remove the sampled positions from "Remaining Data"
    index_remaining = np.delete(index_remaining,which_to_sample)
    ## UPDATE: Length of Sample
    length_of_sample = len(index_remaining)
    ## UPDATE: Mini-batch iteration counter
    mini_batch_iteration_counter = mini_batch_iteration_counter+1
    

    #----------------------------#
    # Build Dissimilarity Matrix #
    #----------------------------#
    Dissimilarity_matrix_ot_current = np.zeros([Covering_Mini_Batch_Size,Covering_Mini_Batch_Size])
    # Build Dissimilarity Matrix
    for i in tqdm(range(Covering_Mini_Batch_Size)):
        index_i = indices_to_remove_loop[i]
        for j in range(Covering_Mini_Batch_Size):
            index_j = indices_to_remove_loop[j]
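            # metric='euclidean' requests the Wasserstein-1 cost (emd2_1d defaults to the squared Euclidean metric)
            Dissimilarity_matrix_ot_current[i,j] = ot.emd2_1d(measures_locations_list[index_j],
                                                              measures_locations_list[index_i],
                                                              metric='euclidean')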


    #----------------------------#
    # Initialize Looping Indices #
    #----------------------------#
    # Initialize "Internal to loop" subset of measures
    measures_locations_list_current = [measures_locations_list[i] for i in indices_to_remove_loop]
    # Initialize masker vector
    masker = np.ones(Covering_Mini_Batch_Size)
    # Initialize Sorting Reference Vector (lets us scan the dissimilarity matrix to identify the barycenter without re-computing the dissimilarity matrix of a sub-sample at every iteration, which is the most costly part of the algorithm)
    Distances_Loop = Dissimilarity_matrix_ot_current.sum(axis=1)
    # Initialize Classes (re-initialized at the start of every mini-batch)
    Classifer_Wasserstein_Centers = np.zeros([N_Quantizers_to_parameterize,N_Grid_Instances])



    #--------------------------#
    # Build Sample Barycenters #
    #--------------------------#
    # Identify Sample Barycenters
    for i in range(N_Quantizers_to_parameterize):    
        # GET BARYCENTER #
        #----------------#
        ## Identify row with minimum total distance
        Barycenter_index = int(Distances_Loop.argsort()[:1][0])
        ## Get Barycenter
        ## Update Barycenters Array ##
        #----------------------------#
        ### Get next Barycenter
        new_barycenter_loop = np.array(measures_locations_list_current[Barycenter_index]).reshape(-1,1)
        ### Update Array of Barycenters
        if i == 0:
            # Initialize Barycenters Array
            Barycenters_Array = new_barycenter_loop
        else:
            # Populate Barycenters Array
            Barycenters_Array = np.append(Barycenters_Array,new_barycenter_loop,axis=-1)

        # GET CLUSTER #
        #-------------#
        # Identify Cluster for this barycenter (which elements are closest to it)
        Cluster_indices = (masker*Dissimilarity_matrix_ot_current[:,Barycenter_index]).argsort()[:N_Elements_Per_Cluster]
        ## UPDATES Set  M^{(n)}  ##
        #-------------------------#
        Dissimilarity_matrix_ot_current[Cluster_indices,:] = 0
        # Distance-Based Sorting
        Distances_Loop[Cluster_indices] = math.inf

        # Update Cluster
        masker[Cluster_indices] = math.inf

        # Update Classes
        Classifer_Wasserstein_Centers[i,(indices_to_remove_loop[Cluster_indices])] = 1

    # pd.DataFrame(Classifer_Wasserstein_Centers)
    # print(np.sum(Classifer_Wasserstein_Centers,axis=0))
print("----------------------------------------------------------------------------------------------")
print("Average Classes Per Sample Barycenter: ", np.mean(np.sum(Classifer_Wasserstein_Centers,axis=0)))
print("Left-Overs:",length_of_sample)
print("----------------------------------------------------------------------------------------------")

Data-Points per Random Sample:  400
⤢ Current iteration of Mini-Batch Random Covering:  0
Remaining points to cluster:  10100


100%|██████████| 20/20 [00:00<00:00, 116.46it/s]
100%|██████████| 20/20 [00:00<00:00, 131.09it/s]

[Output for iterations 1 through 121 omitted: "Remaining points to cluster" decreases by 20 per iteration, from 10080 to 7680.]

⤢ Current iteration of Mini-Batch Random Covering:  122
Remaining points to cluster:  7660





KeyboardInterrupt: 

---

### Train Deep Classifier

In this step, we train a deep (feed-forward) classifier:
$$
\hat{f}\triangleq \operatorname{Softmax}_N\circ W_J\circ \sigma \bullet \dots \circ \sigma \bullet W_1,
$$
which identifies the sample barycenter closest to each input.
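As a rough illustration of the architecture above, the following NumPy sketch evaluates a network of the form $\operatorname{Softmax}_N\circ W_J\circ \sigma \bullet \dots \circ \sigma \bullet W_1$. It assumes that each $W_j$ is an affine map and that $\sigma$ is the ReLU activation; the layer widths, the names `forward` and `softmax`, and the random initialization are hypothetical and are not taken from `Helper_Functions.py`.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability before exponentiating.
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def forward(x, weights, biases):
    # Hidden layers: affine map followed by component-wise ReLU (assumed sigma).
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)
    # Final affine layer W_J, then Softmax_N onto the N-simplex.
    return softmax(h @ weights[-1] + biases[-1])

rng = np.random.default_rng(0)
dims = [2, 16, 16, 5]  # hypothetical: input_dim=2, two hidden layers, N=5 barycenters
weights = [rng.normal(size=(m, n)) for m, n in zip(dims[:-1], dims[1:])]
biases = [np.zeros(n) for n in dims[1:]]

# Class probabilities for three inputs; each row lies in the 5-simplex.
probs = forward(rng.normal(size=(3, 2)), weights, biases)
```

Each row of `probs` is non-negative and sums to one, which is what allows the classifier output to be used directly as mixture weights over the barycenters.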

Re-load the hyperparameter grid and redefine the relevant input/output dimensions in its dictionary.

#### Train Deep Classifier

In [None]:
# Re-Load Hyper-parameter Grid
exec(open('CV_Grid.py').read())
# Re-Load Classifier Function(s)
exec(open('Helper_Functions.py').read())

In [None]:
# Redefine (Dimension-related) Elements of Grid
param_grid_Deep_Classifier['input_dim'] = [2]
param_grid_Deep_Classifier['output_dim'] = [N_Quantizers_to_parameterize]

# Train simple deep classifier
predicted_classes_train, predicted_classes_test, N_params_deep_classifier = build_simple_deep_classifier(
    n_folds=CV_folds,
    n_jobs=n_jobs,
    n_iter=n_iter,
    param_grid_in=param_grid_Deep_Classifier,
    X_train=X_train,
    y_train=Classifer_Wasserstein_Centers.T,
    X_test=X_test,
)

#### Get Predicted Quantized Distributions
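- Each *row* of "Predicted_Weights" is built from the predicted $\beta\in \Delta_N$ for one input: each $\beta_n$ is divided evenly among the `N_Monte_Carlo_Samples` atoms of the $n$-th barycenter.
- Each *column* of "Barycenters_Array" contains the points $x_1,\dots,x_N$ supporting the corresponding empirical measure.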
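The layout described above can be sketched on a toy example. The snippet below is a minimal illustration, assuming softmax outputs `class_probs` with one row per input and a small atom matrix whose columns are barycenters; the names `class_probs`, `barycenter_atoms`, `weights`, and `atoms` and all numbers are hypothetical. It uses a single vectorized `np.repeat` call in place of an explicit loop over classes.

```python
import numpy as np

M = 4  # atoms per barycenter (hypothetical; plays the role of N_Monte_Carlo_Samples)
class_probs = np.array([[0.7, 0.2, 0.1],
                        [0.0, 1.0, 0.0]])  # softmax outputs, one row per input, N = 3

# Each column holds the M atoms of one barycenter.
barycenter_atoms = np.arange(12, dtype=float).reshape(3, M).T  # shape (M, N)

# Repeat each class probability M times and split it evenly over that barycenter's atoms.
weights = np.repeat(class_probs, M, axis=1) / M  # shape (n_inputs, N * M)

# Flatten the atoms barycenter by barycenter so they line up with the weights.
atoms = barycenter_atoms.T.reshape(-1)  # shape (N * M,)

# Mean of each predicted mixture of barycenters.
predicted_means = weights @ atoms
```

Because every row of `weights` sums to one, each pair (`atoms`, `weights[i]`) describes a probability measure on the real line, namely the predicted mixture of the barycenters.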

In [None]:
# Format Weights
## Train
print("#---------------------------------------#")
print("Building Training Set (Regression): START")
print("#---------------------------------------#")
Predicted_Weights = np.array([])
for i in tqdm(range(N_Quantizers_to_parameterize)):    
    b = np.repeat(np.array(predicted_classes_train[:,i],dtype='float').reshape(-1,1),N_Monte_Carlo_Samples,axis=-1)
    b = b/N_Monte_Carlo_Samples
    if i ==0 :
        Predicted_Weights = b
    else:
        Predicted_Weights = np.append(Predicted_Weights,b,axis=1)
print("#-------------------------------------#")
print("Building Training Set (Regression): END")
print("#-------------------------------------#")

## Test
print("#-------------------------------------#")
print("Building Test Set (Predictions): START")
print("#-------------------------------------#")
Predicted_Weights_test = np.array([])
for i in tqdm(range(N_Quantizers_to_parameterize)):
    b_test = np.repeat(np.array(predicted_classes_test[:,i],dtype='float').reshape(-1,1),N_Monte_Carlo_Samples,axis=-1)
    b_test = b_test/N_Monte_Carlo_Samples
    if i ==0 :
        Predicted_Weights_test = b_test
    else:
        Predicted_Weights_test = np.append(Predicted_Weights_test,b_test,axis=1)
print("#-----------------------------------#")
print("Building Test Set (Predictions): END")
print("#-----------------------------------#")
        
# Format Points of Mass
print("#-----------------------------#")
print("Building Barycenters Set: START")
print("#-----------------------------#")
Barycenters_Array = Barycenters_Array.T.reshape(-1,)
print("#-----------------------------#")
print("Building Barycenters Set: END")
print("#-----------------------------#")

#### Stop Timer

In [None]:
# Stop Timer
Type_A_timer_end = time.time()
# Compute Elapsed Time Needed For Training
Time_Lapse_Model_A = Type_A_timer_end - Type_A_timer_Begin

## Get Moment Predictions

#### Write Predictions

### Training-Set Result(s): 

In [None]:
print("Building Training Set Performance Metrics")

# Initialize Wasserstein-1 Error Distribution
W1_errors = np.array([])
Mean_errors = np.array([])
Var_errors = np.array([])
Skewness_errors = np.array([])
Kurtosis_errors = np.array([])
#---------------------------------------------------------------------------------------------#

# Populate Error Distribution
for x_i in tqdm(range(len(measures_locations_list)-1)):    
    # Get Laws
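    # POT's emd2_1d defaults to metric='sqeuclidean'; 'euclidean' gives the Wasserstein-1 cost
    W1_loop = ot.emd2_1d(Barycenters_Array,
                         np.array(measures_locations_list[x_i]).reshape(-1,),
                         Predicted_Weights[x_i,].reshape(-1,),
                         measure_weights.reshape(-1,),
                         metric='euclidean')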
    W1_errors = np.append(W1_errors,W1_loop)
    # Get Means
    Mu_hat = np.sum((Predicted_Weights[x_i])*(Barycenters_Array))
    Mu = np.mean(np.array(measures_locations_list[x_i]))
    Mean_errors =  np.append(Mean_errors,(Mu_hat-Mu))
    # Get Var (non-centered)
    Var_hat = np.sum((Barycenters_Array**2)*(Predicted_Weights[x_i]))
    Var = np.mean(np.array(measures_locations_list[x_i])**2)
    Var_errors = np.append(Var_errors,(abs(Var_hat-Var))**.5)  # square root, matching the "(E[X'^2]-E[X^2])^.5" column
    # Get skewness (non-centered)
    Skewness_hat = np.sum((Barycenters_Array**3)*(Predicted_Weights[x_i]))
    Skewness = np.mean(np.array(measures_locations_list[x_i])**3)
    Skewness_errors = np.append(Skewness_errors,(abs(Skewness_hat-Skewness))**(1/3))
    # Get kurtosis (non-centered)
    Kurtosis_hat = np.sum((Barycenters_Array**4)*(Predicted_Weights[x_i]))
    Kurtosis = np.mean(np.array(measures_locations_list[x_i])**4)
    Kurtosis_errors = np.append(Kurtosis_errors,(abs(Kurtosis_hat-Kurtosis))**.25)
    
#---------------------------------------------------------------------------------------------#
# Compute Error Statistics/Descriptors
W1_Performance = np.array([np.min(np.abs(W1_errors)),np.mean(np.abs(W1_errors)),np.max(np.abs(W1_errors))])
Mean_prediction_Performance = np.array([np.min(np.abs(Mean_errors)),np.mean(np.abs(Mean_errors)),np.max(np.abs(Mean_errors))])
Var_prediction_Performance = np.array([np.min(np.abs(Var_errors)),np.mean(np.abs(Var_errors)),np.max(np.abs(Var_errors))])
Skewness_prediction_Performance = np.array([np.min(np.abs(Skewness_errors)),np.mean(np.abs(Skewness_errors)),np.max(np.abs(Skewness_errors))])
Kurtosis_prediction_Performance = np.array([np.min(np.abs(Kurtosis_errors)),np.mean(np.abs(Kurtosis_errors)),np.max(np.abs(Kurtosis_errors))])

Type_A_Prediction = pd.DataFrame({"W1":W1_Performance,
                                  "E[X']-E[X]":Mean_prediction_Performance,
                                  "(E[X'^2]-E[X^2])^.5":Var_prediction_Performance,
                                  "(E[X'^3]-E[X^3])^(1/3)":Skewness_prediction_Performance,
                                  "(E[X'^4]-E[X^4])^.25":Kurtosis_prediction_Performance},index=["Min","MAE","Max"])

# Write Performance
Type_A_Prediction.to_latex((results_tables_path+str("Roughness_")+str(Rougness)+str("__RatiofBM_")+str(Ratio_fBM_to_typical_vol)+
 "__TypeAPrediction_Train.tex"))


#---------------------------------------------------------------------------------------------#
# Update User
print(Type_A_Prediction)

---
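For reference, the two ingredients of the metrics table above can be sketched in plain NumPy: the Wasserstein-1 distance between two weighted empirical measures on the real line, computed as the integral of the absolute difference of their CDFs, and the Min/MAE/Max summary of an error vector. This is an illustrative sketch rather than the notebook's implementation; the function names `w1_1d` and `summarize` are hypothetical, and the weights are assumed to be non-negative and to sum to one.

```python
import numpy as np

def w1_1d(x_a, x_b, a, b):
    """Wasserstein-1 distance between two weighted empirical measures on the real line."""
    x_a, x_b = np.asarray(x_a, float), np.asarray(x_b, float)
    a, b = np.asarray(a, float), np.asarray(b, float)
    # Sort each support and carry the weights along.
    ia, ib = np.argsort(x_a), np.argsort(x_b)
    x_a, a = x_a[ia], a[ia]
    x_b, b = x_b[ib], b[ib]
    # Evaluate both CDFs on the merged grid of support points.
    grid = np.sort(np.concatenate([x_a, x_b]))
    cum_a = np.concatenate([[0.0], np.cumsum(a)])
    cum_b = np.concatenate([[0.0], np.cumsum(b)])
    cdf_a = cum_a[np.searchsorted(x_a, grid[:-1], side="right")]
    cdf_b = cum_b[np.searchsorted(x_b, grid[:-1], side="right")]
    # W1 equals the integral of |F_a - F_b| over the real line.
    return float(np.sum(np.abs(cdf_a - cdf_b) * np.diff(grid)))

def summarize(errors):
    # Min / mean / max of the absolute errors, matching the "Min", "MAE", "Max" rows.
    e = np.abs(np.asarray(errors, float))
    return np.array([e.min(), e.mean(), e.max()])

# Toy check: mass split between 0 and 2 versus a point mass at 1.
d = w1_1d([0.0, 2.0], [1.0], [0.5, 0.5], [1.0])
```

In the toy check each half of the mass travels a distance of one, so `d` equals `1.0`.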

### Test-Set Result(s): 

In [None]:
print("Building Test Set Performance Metrics")

# Initialize Wasserstein-1 Error Distribution
W1_errors_test = np.array([])
Mean_errors_test = np.array([])
Var_errors_test = np.array([])
Skewness_errors_test = np.array([])
Kurtosis_errors_test = np.array([])
#---------------------------------------------------------------------------------------------#

# Populate Error Distribution
for x_i in tqdm(range(len(measures_locations_test_list)-1)):    
    # Get Laws
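    # POT's emd2_1d defaults to metric='sqeuclidean'; 'euclidean' gives the Wasserstein-1 cost
    W1_loop_test = ot.emd2_1d(Barycenters_Array,
                         np.array(measures_locations_test_list[x_i]).reshape(-1,),
                         Predicted_Weights_test[x_i,].reshape(-1,),
                         measure_weights_test.reshape(-1,),
                         metric='euclidean')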
    W1_errors_test = np.append(W1_errors_test,W1_loop_test)
    # Get Means
    Mu_hat_test = np.sum((Predicted_Weights_test[x_i])*(Barycenters_Array))
    Mu_test = np.mean(np.array(measures_locations_test_list[x_i]))
    Mean_errors_test =  np.append(Mean_errors_test,(Mu_hat_test-Mu_test))
    # Get Var (non-centered)
    Var_hat_test = np.sum((Barycenters_Array**2)*(Predicted_Weights_test[x_i]))
    Var_test = np.mean(np.array(measures_locations_test_list[x_i])**2)
    Var_errors_test = np.append(Var_errors_test,(abs(Var_hat_test-Var_test))**.5)  # square root, matching the "(E[X'^2]-E[X^2])^.5" column
    # Get skewness (non-centered)
    Skewness_hat_test = np.sum((Barycenters_Array**3)*(Predicted_Weights_test[x_i]))
    Skewness_test = np.mean(np.array(measures_locations_test_list[x_i])**3)
    Skewness_errors_test = np.append(Skewness_errors_test,(abs(Skewness_hat_test-Skewness_test))**(1/3))
    # Get kurtosis (non-centered)
    Kurtosis_hat_test = np.sum((Barycenters_Array**4)*(Predicted_Weights_test[x_i]))
    Kurtosis_test = np.mean(np.array(measures_locations_test_list[x_i])**4)
    Kurtosis_errors_test = np.append(Kurtosis_errors_test,(abs(Kurtosis_hat_test-Kurtosis_test))**.25)
    
#---------------------------------------------------------------------------------------------#
# Compute Error Statistics/Descriptors
W1_Performance_test = np.array([np.min(np.abs(W1_errors_test)),np.mean(np.abs(W1_errors_test)),np.max(np.abs(W1_errors_test))])
Mean_prediction_Performance_test = np.array([np.min(np.abs(Mean_errors_test)),np.mean(np.abs(Mean_errors_test)),np.max(np.abs(Mean_errors_test))])
Var_prediction_Performance_test = np.array([np.min(np.abs(Var_errors_test)),np.mean(np.abs(Var_errors_test)),np.max(np.abs(Var_errors_test))])
Skewness_prediction_Performance_test = np.array([np.min(np.abs(Skewness_errors_test)),np.mean(np.abs(Skewness_errors_test)),np.max(np.abs(Skewness_errors_test))])
Kurtosis_prediction_Performance_test = np.array([np.min(np.abs(Kurtosis_errors_test)),np.mean(np.abs(Kurtosis_errors_test)),np.max(np.abs(Kurtosis_errors_test))])

Type_A_Prediction_test = pd.DataFrame({"W1":W1_Performance_test,
                                  "E[X']-E[X]":Mean_prediction_Performance_test,
                                  "(E[X'^2]-E[X^2])^.5":Var_prediction_Performance_test,
                                  "(E[X'^3]-E[X^3])^(1/3)":Skewness_prediction_Performance_test,
                                  "(E[X'^4]-E[X^4])^.25":Kurtosis_prediction_Performance_test},index=["Min","MAE","Max"])

# Write Performance
Type_A_Prediction_test.to_latex((results_tables_path+str("Roughness_")+str(Rougness)+str("__RatiofBM_")+str(Ratio_fBM_to_typical_vol)+
 "__TypeAPrediction_Test.tex"))

## Update User

### Print for Terminal Legibility

In [None]:
print("#----------------------#")
print("Training-Set Performance")
print("#----------------------#")
print(Type_A_Prediction)
print(" ")
print(" ")
print(" ")

print("#------------------#")
print("Test-Set Performance")
print("#------------------#")
print(Type_A_Prediction_test)
print(" ")
print(" ")
print(" ")

### Facts of Simulation Experiment:

In [None]:
# Update User
print("====================")
print(" Experiment's Facts ")
print("====================")
print("------------------------------------------------------")
print("=====")
print("Model")
print("=====")
print("\u2022 N Centers:",N_Quantizers_to_parameterize)
print("\u2022 Each Wasserstein-1 Ball should contain: ",
      N_Elements_Per_Cluster, 
      "elements from the training set.")
print("------------------------------------------------------")
print("========")
print("Training")
print("========")
print("\u2022 Data-size:",(len(x_Grid)*len(t_Grid)))
print("\u2022 N Points per training datum:",N_Monte_Carlo_Samples)
print("------------------------------------------------------")
print("=======")
print("Testing")
print("=======")
print("\u2022 Data-size Test:",(len(x_Grid_test)*len(t_Grid_test)))
print("\u2022 N Points per testing datum:",N_Monte_Carlo_Samples_Test)
print("------------------------------------------------------")
print("------------------------------------------------------")

### Training-Set Performance

In [None]:
Type_A_Prediction

### Test-Set Performance

In [None]:
Type_A_Prediction_test

# Visualization

In [None]:
# Get Training Predictions
Mu_hat = np.array([])
Mu = np.array([])
# Populate Error Distribution
for x_i in tqdm(range(len(measures_locations_list)-1)):    
    # Get Laws
    Mu_hat = np.append(Mu_hat,np.sum((Predicted_Weights[x_i])*(Barycenters_Array)))
    Mu = np.append(Mu,np.mean(np.array(measures_locations_list[x_i])))

# Get Testing Predictions
Mu_hat_test = np.array([])
Mu_test = np.array([])
Var_hat_test = np.array([])
# Populate Error Distribution
for x_i in tqdm(range(len(measures_locations_test_list)-1)):    
    # Get Laws
    Mu_hat_test = np.append(Mu_hat_test,np.sum((Predicted_Weights_test[x_i])*(Barycenters_Array)))
    Mu_test = np.append(Mu_test,np.mean(np.array(measures_locations_test_list[x_i])))
    ## Error Bands
    Var_hat_test = np.append(Var_hat_test,np.sqrt(np.sum((Barycenters_Array**2)*(Predicted_Weights_test[x_i]))))

In [None]:
X_train

---

---
# Fin
---

---