# Generic Conditional Laws for Random-Fields - via:

## Universal $\mathcal{P}_1(\mathbb{R})$-Deep Neural Model $\mathcal{NN}_{1_{\mathbb{R}^n},\mathcal{D}}^{\sigma:\star}$.

---

By: [Anastasis Kratsios](https://people.math.ethz.ch/~kratsioa/) - 2021.

---

In [1]:
# Software/Hardware Testing or Real-Deal?
trial_run = False

---
# Training Algorithm:
---
- Random $\delta$-bounded partition on input space,
- Train deep classifier on infered classes.
---
---
---
## Notes - Why the procedure is so computationally efficient?
---
 - The sample barycenters do not require us to solve for any new Wasserstein-1 Barycenters; which is much more computationally costly,
 - Our training procedure never back-propages through $\mathcal{W}_1$ since steps 2 and 3 are full-decoupled.  Therefore, training our deep classifier is (comparatively) cheap since it takes values in the standard $N$-simplex.

---

## Load Auxiliaries

In [2]:
# Load Packages/Modules
exec(open('Init_Dump.py').read())
# Load Hyper-parameter Grid
exec(open('CV_Grid.py').read())
# Load Helper Function(s)
exec(open('Helper_Functions.py').read())
# Import time separately
import time


# load dataset
results_path = "./outputs/models/"
results_tables_path = "./outputs/results/"
raw_data_path_folder = "./inputs/raw/"
data_path_folder = "./inputs/data/"


### Set Seed
random.seed(2021)
np.random.seed(2021)
tf.random.set_seed(2021)

Using TensorFlow backend.


Deep Feature Builder - Ready
Deep Classifier - Ready


## Meta-Parameters

### Simulation

#### Grid Hyperparameter(s)
- Ratio $\frac{\text{Testing Datasize}}{\text{Training Datasize}}$.
- Number of Training Points to Generate

In [3]:
train_test_ratio = .2
N_train_size = 10**3

Monte-Carlo Paramters

In [4]:
## Monte-Carlo
# N_Euler_Maruyama_Steps = 2
N_Monte_Carlo_Samples = 10**3

# End times for Time-Grid
T_end = 1

Initial radis of $\delta$-bounded random partition of $\mathcal{X}$!

In [5]:
# Hyper-parameters of Cover
delta = 0.001
Proportion_per_cluster = .1

**Note**: Setting *N_Quantizers_to_parameterize* prevents any barycenters and sub-sampling.

In [6]:
trial_run = True

## Problem Dimension

In [7]:
problem_dim = 5
width = 10

# Simulate from: $Y=f(X,U) = f_{\text{unknown}}(X+U)$ 
*Non-linear dependance on exhaugenous noise.*

In [20]:
# Easy
# def f_unknown(x_in):
#     return np.sum(x_in)

# Hard
W_2a = np.random.uniform(size=np.array([1,width]),low=-.5,high=.5)
W_1a = np.random.uniform(size=np.array([width,problem_dim]),low=-.5,high=.5)
def f_unknown(x):
    x_internal = x.reshape(-1,)
    x_internal = np.matmul(W_1a,x_internal)
    x_internal = np.matmul(W_2a,np.cos(x_internal))
    return x_internal

In [21]:
def Simulator(x_in):
    # Initialize Gaussian
#     cov = np.diag(np.ones(problem_dim))*0.01
#     ready = np.random.multivariate_normal(x_in, cov, N_Monte_Carlo_Samples)
#     f_x = np.apply_along_axis(f_unknown, 1, x_in)
    # Pushforward
    f_x = f_unknown(x_in)
    # Apply Noise After
    f_x_noise = f_x + np.random.normal(0,0.01,N_Monte_Carlo_Samples)
    return f_x_noise

## Initialize Data

In [23]:
N_test_size = int(np.round(N_train_size*train_test_ratio,0))

### Initialize Training Data (Inputs)

Try initial sampling-type implementation!  It worked nicely..i.e.: centers were given!

In [24]:
# Get Training Set
X_train = np.random.uniform(size=np.array([N_train_size,problem_dim]),low=.5,high=1.5)

# Get Testing Set
test_set_indices = np.random.choice(range(X_train.shape[0]),N_test_size)
X_test = X_train[test_set_indices,]
X_test = X_test + np.random.uniform(low=-(delta/np.sqrt(problem_dim)), 
                                    high = -(delta/np.sqrt(problem_dim)),
                                    size = X_test.shape)

In [25]:
from sklearn.cluster import KMeans

In [26]:
# Initialize k_means
N_Quantizers_to_parameterize = int(round(Proportion_per_cluster*X_train.shape[0]))
kmeans = KMeans(n_clusters=N_Quantizers_to_parameterize, random_state=0).fit(X_train)
# Get Classes
Train_classes = np.array(pd.get_dummies(kmeans.labels_))
# Get Center Measures
Barycenters_Array_x = kmeans.cluster_centers_

### Get Barycenters
*Here we make the assumption that we can directly resample $f(X=x,U)$ if necessary...or that it is available as part of the dataset.*

In [27]:
for i in tqdm(range(Barycenters_Array_x.shape[0])):
    # Put Datum
    Bar_x_loop = Barycenters_Array_x[i,]
    # Product Monte-Carlo Sample for Input
    Bar_y_loop = (Simulator(Bar_x_loop)).reshape(1,-1)

    # Update Dataset
    if i == 0:
        Barycenters_Array = Bar_y_loop
    else:
        Barycenters_Array = np.append(Barycenters_Array,Bar_y_loop,axis=0)

100%|██████████| 100/100 [00:00<00:00, 2942.46it/s]


### Initialize Training Data (Outputs)

#### Get Training Set

In [28]:
for i in tqdm(range(X_train.shape[0])):
    # Put Datum
    x_loop = X_train[i,]
    # Product Monte-Carlo Sample for Input
    y_loop = (Simulator(x_loop)).reshape(1,-1)

    # Update Dataset
    if i == 0:
        Y_train = y_loop
    else:
        Y_train = np.append(Y_train,y_loop,axis=0)

100%|██████████| 1000/1000 [00:01<00:00, 502.55it/s]


#### Get Test Set

In [29]:
# Start Timer
Test_Set_PredictionTime_MC = time.time()

# Generate Data
for i in tqdm(range(X_test.shape[0])):
    # Put Datum
    x_loop = X_test[i,]
    # Product Monte-Carlo Sample for Input
    y_loop = (Simulator(x_loop)).reshape(1,-1)

    # Update Dataset
    if i == 0:
        Y_test = y_loop
    else:
        Y_test = np.append(Y_test,y_loop,axis=0)
        
# End Timer
Test_Set_PredictionTime_MC = time.time() - Test_Set_PredictionTime_MC

100%|██████████| 200/200 [00:00<00:00, 4727.89it/s]


# Train Model

#### Start Timer

In [30]:
# Start Timer
Type_A_timer_Begin = time.time()

### Train Deep Classifier

In this step, we train a deep (feed-forward) classifier:
$$
\hat{f}\triangleq \operatorname{Softmax}_N\circ W_J\circ \sigma \bullet \dots \sigma \bullet W_1,
$$
to identify which barycenter we are closest to.

#### Train Deep Classifier

Re-Load Packages and CV Grid

In [31]:
# Re-Load Hyper-parameter Grid
exec(open('CV_Grid.py').read())
# Re-Load Classifier Function(s)
exec(open('Helper_Functions.py').read())

Deep Feature Builder - Ready
Deep Classifier - Ready


Train Deep Classifier

In [32]:
print("==========================================")
print("Training Classifer Portion of Type-A Model")
print("==========================================")

# Redefine (Dimension-related) Elements of Grid
param_grid_Deep_Classifier['input_dim'] = [problem_dim]
param_grid_Deep_Classifier['output_dim'] = [N_Quantizers_to_parameterize]

# Train simple deep classifier
predicted_classes_train, predicted_classes_test, N_params_deep_classifier, timer_output = build_simple_deep_classifier(n_folds = CV_folds, 
                                                                                                        n_jobs = n_jobs, 
                                                                                                        n_iter = n_iter, 
                                                                                                        param_grid_in=param_grid_Deep_Classifier, 
                                                                                                        X_train = X_train, 
                                                                                                        y_train = Train_classes,
                                                                                                        X_test = X_test)

print("===============================================")
print("Training Classifer Portion of Type Model: Done!")
print("===============================================")

Training Classifer Portion of Type-A Model
Fitting 2 folds for each of 1 candidates, totalling 2 fits


[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    7.4s remaining:    0.0s
[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    7.4s finished


Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Training Classifer Portion of Type Model: Done!


#### Get Predicted Quantized Distributions
- Each *row* of "Predicted_Weights" is the $\beta\in \Delta_N$.
- Each *Column* of "Barycenters_Array" denotes the $x_1,\dots,x_N$ making up the points of the corresponding empirical measures.

In [33]:
# Initialize Empirical Weights
empirical_weights = (np.ones(N_Monte_Carlo_Samples)/N_Monte_Carlo_Samples).reshape(-1,)

for i in range(N_Quantizers_to_parameterize):
    if i == 0:
        points_of_mass = Barycenters_Array[i,]
    else:
        points_of_mass = np.append(points_of_mass,Barycenters_Array[i,])

In [34]:
direct_facts = np.apply_along_axis(f_unknown, 1, X_train)

In [35]:
print("#--------------------#")
print(" Get Training Error(s)")
print("#--------------------#")
for i in tqdm(range((X_train.shape[0]))):
    for j in range(N_Quantizers_to_parameterize):
        b_loop = np.repeat(predicted_classes_train[i,j],N_Monte_Carlo_Samples)
        if j == 0:
            b = b_loop
        else:
            b = np.append(b,b_loop)
        b = b.reshape(-1,1)
        b = b
    b = np.array(b,dtype=float).reshape(-1,)
    b = b/N_Monte_Carlo_Samples
    
    # Compute Error(s)
    ## W1
    W1_loop = ot.emd2_1d(points_of_mass,
                         np.array(Y_train[i,]).reshape(-1,),
                         b,
                         empirical_weights)
    
    ## M1
    Mu_hat = np.sum(b*(points_of_mass))
    Mu_MC = np.mean(np.array(Y_train[i,]))
    Mu = direct_facts[i,]
    ### Error(s)
    Mean_loop = (Mu_hat-Mu)
    Mean_loop_MC = (Mu_hat-Mu_MC)
    
    ## M2
    Var_hat = np.sum(((points_of_mass-Mu_hat)**2)*b)
    Var_MC = np.mean(np.array(Y_train[i]-Mu_MC)**2)
    Var = np.mean((direct_facts[i,]-Mu)**2)
    
    ### Error(s)
    Var_loop = np.abs(Var_hat-Var)
    Var_loop_MC = np.abs(Var_MC-Var)
    
    # Update
    if i == 0:
        W1_errors = W1_loop
        Mean_errors =  Mean_loop
        Var_errors = Var_loop
        Mean_errors_MC =  Mean_loop_MC
        Var_errors_MC = Var_loop_MC
        
        
    else:
        W1_errors = np.append(W1_errors,W1_loop)
        Mean_errors =  np.append(Mean_errors,Mean_loop)
        Var_errors = np.append(Var_errors,Var_loop)
        Mean_errors_MC =  np.append(Mean_errors_MC,Mean_loop_MC)
        Var_errors_MC = np.append(Var_errors_MC,Var_loop_MC)
        
print("#-------------------------#")
print(" Get Training Error(s): END")
print("#-------------------------#")

  0%|          | 0/1000 [00:00<?, ?it/s]

#--------------------#
 Get Training Error(s)
#--------------------#


100%|██████████| 1000/1000 [00:17<00:00, 56.43it/s]

#-------------------------#
 Get Training Error(s): END
#-------------------------#





In [39]:
print("#----------------#")
print(" Get Test Error(s)")
print("#----------------#")
for i in tqdm(range((X_test.shape[0]))):
    for j in range(N_Quantizers_to_parameterize):
        b_loop = np.repeat(predicted_classes_test[i,j],N_Monte_Carlo_Samples)
        if j == 0:
            b = b_loop
        else:
            b = np.append(b,b_loop)
        b = b.reshape(-1,1)
        b = b
    b = np.array(b,dtype=float).reshape(-1,)
    b = b/N_Monte_Carlo_Samples
    
    # Compute Error(s)
    ## W1
    W1_loop = ot.emd2_1d(points_of_mass,
                         np.array(Y_test[i,]).reshape(-1,),
                         b,
                         empirical_weights)
    
    ## M1
    Mu_hat = np.sum(b*(points_of_mass))
    Mu_MC = np.mean(np.array(Y_test[i,]))
    Mu = direct_facts[i,]
    ### Error(s)
    Mean_loop = (Mu_hat-Mu)
    Mean_loop_MC = (Mu_hat-Mu_MC)
    
    ## M2
    Var_hat = np.sum(((points_of_mass-Mu_hat)**2)*b)
    Var_MC = np.mean(np.array(Y_test[i]-Mu_MC)**2)
    Var = np.mean((direct_facts[i,]-Mu)**2)
    
    ### Error(s)
    Var_loop = np.abs(Var_hat-Var)
    Var_loop_MC = np.abs(Var_MC-Var)
    
    # Update
    if i == 0:
        W1_errors_test = W1_loop
        Mean_errors_test =  Mean_loop
        Var_errors_test = Var_loop
        Mean_errors_MC_test =  Mean_loop_MC
        Var_errors_MC_test = Var_loop_MC
        
        
    else:
        W1_errors_test = np.append(W1_errors,W1_loop)
        Mean_errors_test =  np.append(Mean_errors,Mean_loop)
        Var_errors_test = np.append(Var_errors,Var_loop)
        Mean_errors_MC_test =  np.append(Mean_errors_MC,Mean_loop_MC)
        Var_errors_MC_test = np.append(Var_errors_MC,Var_loop_MC)
        
print("#-------------------------#")
print(" Get Training Error(s): END")
print("#-------------------------#")

  2%|▎         | 5/200 [00:00<00:04, 48.18it/s]

#----------------#
 Get Test Error(s)
#----------------#


100%|██████████| 200/200 [00:03<00:00, 56.23it/s]

#-------------------------#
 Get Training Error(s): END
#-------------------------#





#### Stop Timer

In [40]:
# Stop Timer
Type_A_timer_end = time.time()
# Compute Lapsed Time Needed For Training
Time_Lapse_Model_A = Type_A_timer_end - Type_A_timer_Begin

## Get Moment Predictions

#### Write Predictions

### Training-Set Result(s): 

In [81]:
#---------------------------------------------------------------------------------------------#
W1_95 = bootstrap(W1_errors, n=1000, func=np.mean)(.95)
W1_99 = bootstrap(W1_errors, n=1000, func=np.mean)(.99)
#---------------------------------------------------------------------------------------------#
Model_Complexity = pd.DataFrame({"N_Centers":N_Quantizers_to_parameterize,
                                 "N_Q":N_Monte_Carlo_Samples,
                                 "N_Params":N_params_deep_classifier,
                                 "Training Time":Time_Lapse_Model_A,
                                 "T_Test/T_Test-MC": (timer_output/Test_Set_PredictionTime_MC),
                                 "Time Test": timer_output,
                                 "Time EM-MC": Test_Set_PredictionTime_MC},index=["Model_Complexity_metrics"])

#---------------------------------------------------------------------------------------------#
# Compute Error Statistics/Descriptors
## Train
W1_Performance = np.array([np.min(np.abs(W1_errors)),np.mean(np.abs(W1_errors)),np.max(np.abs(W1_errors))])
Mean_prediction_Performance = np.array([np.min(np.abs(Mean_errors)),np.mean(np.abs(Mean_errors)),np.max(np.abs(Mean_errors))])
Var_prediction_Performance = np.array([np.min(np.abs(Var_errors)),np.mean(np.abs(Var_errors)),np.max(np.abs(Var_errors))])
Mean_prediction_Performance_MC = np.array([np.min(np.abs(Mean_errors_MC)),np.mean(np.abs(Mean_errors_MC)),np.max(np.abs(Mean_errors_MC))])
Var_prediction_Performance_MC = np.array([np.min(np.abs(Var_errors_MC)),np.mean(np.abs(Var_errors_MC)),np.max(np.abs(Var_errors_MC))])
## Test
W1_Performance_test = np.array([np.min(np.abs(W1_errors_test)),np.mean(np.abs(W1_errors_test)),np.max(np.abs(W1_errors_test))])
Mean_prediction_Performance_test = np.array([np.min(np.abs(Mean_errors_test)),np.mean(np.abs(Mean_errors_test)),np.max(np.abs(Mean_errors_test))])
Var_prediction_Performance_test = np.array([np.min(np.abs(Var_errors_test)),np.mean(np.abs(Var_errors_test)),np.max(np.abs(Var_errors_test))])
Mean_prediction_Performance_MC_test = np.array([np.min(np.abs(Mean_errors_MC_test)),np.mean(np.abs(Mean_errors_MC_test)),np.max(np.abs(Mean_errors_MC_test))])
Var_prediction_Performance_MC_test = np.array([np.min(np.abs(Var_errors_MC_test)),np.mean(np.abs(Var_errors_MC_test)),np.max(np.abs(Var_errors_MC_test))])
#---------------------------------------------------------------------------------------------#

Type_A_Prediction = pd.DataFrame({"W1":W1_Performance,
                                  "M1":Mean_prediction_Performance,
                                  "M1/M1_MC":Mean_prediction_Performance/Mean_prediction_Performance_MC,
                                  "M2":Var_prediction_Performance,
                                  "M2/M2_MC":Var_prediction_Performance/Var_prediction_Performance_MC},index=["Min","MAE","Max"])
Type_A_Prediction_test = pd.DataFrame({"W1":W1_Performance_test,
                                  "M1":Mean_prediction_Performance_test,
                                  "M1/M1_MC":Mean_prediction_Performance_test/Mean_prediction_Performance_MC_test,
                                  "M2":Var_prediction_Performance_test,
                                  "M2/M2_MC":Var_prediction_Performance_test/Var_prediction_Performance_MC_test},index=["Min","MAE","Max"])

Type_A_Predictions_and_confidence = pd.DataFrame({"W1_99_Train":W1_95,
                                                  "W1error_99_Train":W1_99},index=["CL","Mean","CU"])

Summary = pd.DataFrame({"W1":[np.array(W1_Performance[1],W1_Performance_test[1])],
                        "M1":[np.array(Mean_prediction_Performance[1],
                                       Mean_prediction_Performance_test[1])],
                        "M1/M1_MC":[np.array(Mean_prediction_Performance[1]/Mean_prediction_Performance_MC[1],
                                             Mean_prediction_Performance_test[1]/Mean_prediction_Performance_MC_test[1])],
                        "M2":[np.array(Var_prediction_Performance[1],
                                       Var_prediction_Performance_test[1])],
                        "M2/M2_MC":[np.array(Var_prediction_Performance[1]/Var_prediction_Performance_MC[1],
                                             Var_prediction_Performance_test[1]/Var_prediction_Performance_MC_test[1])],
                        "N_Centers":np.array((N_Quantizers_to_parameterize,N_Quantizers_to_parameterize)),
                        "N_Q":np.array((N_Monte_Carlo_Samples,N_Monte_Carlo_Samples)),
                        "N_Params":np.array((N_params_deep_classifier,N_params_deep_classifier)),
                        "Training Time":np.array((Time_Lapse_Model_A,Time_Lapse_Model_A)),
                        "T_Test/T_Test-MC":np.array(((timer_output/Test_Set_PredictionTime_MC),(timer_output/Test_Set_PredictionTime_MC)))
                       },index=["Train","Test"])


# Write Performance Metrics to file #
#-----------------------------------#
pd.set_option('display.float_format', '{:.4E}'.format)
Type_A_Prediction.to_latex((results_tables_path+"Latent_Width_NSDE"+str(width)+"Problemdimension"+str(problem_dim)+"__TypeAPrediction_Train.tex"))
Type_A_Prediction_test.to_latex((results_tables_path+"Latent_Width_NSDE"+str(width)+"Problemdimension"+str(problem_dim)+"__TypeAPrediction_Train.tex"))
pd.set_option('display.float_format', '{:.4E}'.format)
(Type_A_Predictions_and_confidence.T).to_latex((results_tables_path+"Latent_Width_NSDE"+str(width)+"Problemdimension"+str(problem_dim)+"__TypeAPrediction_Train_predictions_w_confidence_intervals.tex"))
pd.set_option('display.float_format', '{:.4E}'.format)
Model_Complexity.to_latex((results_tables_path+"Latent_Width_NSDE"+str(width)+"Problemdimension"+str(problem_dim)+"__ModelComplexities.tex"))
pd.set_option('display.float_format', '{:.4E}'.format)
Summary.to_latex((results_tables_path+"Latent_Width_NSDE"+str(width)+"Problemdimension"+str(problem_dim)+"__SUMMARY_METRICS.tex"))


#---------------------------------------------------------------------------------------------#
# Update User
print(Type_A_Predictions_and_confidence)
print(Summary)

      W1_99_Train  W1error_99_Train
CL     1.9665E-03        1.9375E-03
Mean   2.0507E-03        2.0507E-03
CU     2.1345E-03        2.1584E-03
              W1         M1   M1/M1_MC         M2   M2/M2_MC  N_Centers   N_Q  \
Train 2.0507E-03 2.4562E-02 1.0008E+00 1.8708E-03 1.8699E+01        100  1000   
Test  2.0507E-03 2.4562E-02 1.0008E+00 1.8708E-03 1.8699E+01        100  1000   

       N_Params  Training Time  T_Test/T_Test-MC  
Train      2220     1.3967E+02        1.4443E+00  
Test       2220     1.3967E+02        1.4443E+00  


# Update User

## Training Set Prediction Quality

In [78]:
Summary

Unnamed: 0,W1,M1,M1/M1_MC,M2,M2/M2_MC,N_Centers,N_Q,N_Params,Training Time,T_Test/T_Test-MC
Train,0.0020507,0.024562,1.0008,0.0018708,18.699,"[100, 100]","[1000, 1000]","[2220, 2220]","[139.67454409599304, 139.67454409599304]","[1.4443448572420332, 1.4443448572420332]"
Test,0.0020507,0.024562,1.0008,0.0018708,18.699,"[100, 100]","[1000, 1000]","[2220, 2220]","[139.67454409599304, 139.67454409599304]","[1.4443448572420332, 1.4443448572420332]"


---

---
# Fin
---

---