# Deep Universal Regular Conditional Expectations:

---
This notebook implements the universal deep neural model $\mathcal{NN}_{1_{\mathbb{R}^n},\mathcal{D}}^{\sigma:\star}$ of [Anastasis Kratsios](https://people.math.ethz.ch/~kratsioa/) (2021).

---

## What does this code do?
1. Learn a heteroskedastic non-linear regression problem:
     - $Y\sim f_{\text{unknown}}(x) + \epsilon$, where $f_{\text{unknown}}$ is an unknown function and $\epsilon\sim Laplace(0,\|x\|)$
2. Learn the law of a random Bayesian network:
    - $Y = W_J Y^{J-1}, \qquad Y^{j}\triangleq \sigma\bullet A^{j}Y^{j-1} + b^{j}, \qquad Y^0\triangleq x$

3. In the above example, if $A^{j} = M^{j}\odot \tilde{A}^{j}$, where $\tilde{A}^{j}$ is a deterministic matrix, $M^{j}$ is a "mask" (that is, a random matrix with binary entries), and $\odot$ is the Hadamard product, then we recover the dropout framework.
4. Learn the probability that the unique strong solution to the rough SDE with uniformly Lipschitz coefficients $\alpha$ and $\beta$, driven by a fractional Brownian motion $B^H$ with Hurst exponent $H \in [\frac1{2},1)$:
$$
X_t^x = x + \int_0^t \alpha(s,X_s^x)ds + \int_0^t \beta(s,X_s^x)dB_s^H
$$
belongs, at time $t=1$, to a ball about the initial point $x$ whose random radius is an independent exponential random variable with parameter $\lambda=2$.
5. Train a DNN to predict the returns of bitcoin with gradient descent (GD). Because the initialization is random, each prediction at a given $x$ is itself random. We learn the distribution of this random variable, conditioned on $x$ in the input space:
$$
Y_x \triangleq \hat{f}_{\theta_{T}}(x), \qquad \theta_{t+1}\triangleq \theta_{t} - \lambda \sum_{x \in \mathbb{X}} \nabla_{\theta}\|\hat{f}_{\theta_t}(x) - f(x)\|, \qquad \theta_0 \sim N_d(0,1);
$$
where $T\in \mathbb{N}$ is a fixed number of "SGD" iterations (typically identified by cross-validation on a single SGD trajectory for a single initialization), $\theta \in \mathbb{R}^{(d_{J}+1)+\sum_{j=0}^{J-1} (d_{j+1}d_j + 1)}$, and $d_j$ is the dimension of the "bias" vector $b^{(j)}$ of the $j$-th layer of the DNN, which is defined by:
$$
\hat{f}_{\theta}(x)\triangleq A^{(J)}x^{(J)} + b^{(J)},\qquad x^{(j+1)}\triangleq \sigma\bullet A^{(j)}x^{(j)} + b^{(j)},\qquad x^{(0)}\triangleq x.
$$

6. Extreme Learning Machines:
    Just like the Bayesian network, but the last layer is trained on the training set using kernel ridge regression (KRidge).
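
Items 1 and 3 of the list above can be illustrated with a minimal NumPy sketch. It is not the code in `Data_Simulator_and_Parser.py`: the function names, the stand-in `f_example`, and the convention that a weight is kept with probability `1 - dropout_rate` are all assumptions made for illustration.

```python
import numpy as np

def sample_heteroskedastic(f, x, n_samples, rng):
    """Draw n_samples of Y = f(x) + eps with eps ~ Laplace(0, ||x||)."""
    scale = np.linalg.norm(x)  # the noise scale grows with ||x||
    noise = rng.laplace(loc=0.0, scale=scale, size=n_samples)
    return f(x) + noise

def dropout_masked_weights(A_tilde, dropout_rate, rng):
    """Return M * A_tilde (Hadamard product) for a random binary mask M."""
    # Assumption: each entry is kept with probability 1 - dropout_rate.
    mask = rng.random(A_tilde.shape) >= dropout_rate
    return mask * A_tilde

def f_example(x):
    # Hypothetical stand-in for the unknown regression function.
    return float(np.sin(x).sum())

rng = np.random.default_rng(0)
x = np.ones(4) / 2.0  # chosen so that ||x|| = 1
y_samples = sample_heteroskedastic(f_example, x, n_samples=20_000, rng=rng)
A_masked = dropout_masked_weights(np.ones((3, 3)), dropout_rate=0.75, rng=rng)
```

Repeating the first call for many inputs $x$ gives one empirical conditional law per input, which is the kind of object the model is trained to predict.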

#### Mode:
Software/Hardware Testing or Real-Deal?

In [15]:
trial_run = True

### Simulation Method:

In [16]:
# Random DNN
# f_unknown_mode = "Heteroskedastic_NonLinear_Regression"

# Random DNN internal noise
## Real-world data version
#f_unknown_mode = "Extreme_Learning_Machine"
### General Parameters
# activation_function = 'thresholding'
activation_function = 'sigmoid'
### Dataset Option 1
dataset_option = 'SnP'
### Dataset Option 2
# dataset_option = 'crypto'
Depth_Bayesian_DNN = 1
N_Random_Features = 10
## Simulated Data version
# f_unknown_mode = "DNN_with_Random_Weights"
width = 10

# Random Dropout applied to trained DNN
# f_unknown_mode = "DNN_with_Bayesian_Dropout"
Dropout_rate = 0.75

# GD with Randomized Input
# f_unknown_mode = "GD_with_randomized_input"
# GD_epochs = 50

# SDE with fractional Driver
# f_unknown_mode = "Rough_SDE"
N_Euler_Steps = 10**4
Hurst_Exponent = 0.5

f_unknown_mode = "Rough_SDE_Vanilla"
## Define Process' dynamics in (2) cell(s) below.

## Problem Dimension

In [17]:
problem_dim = 10
if f_unknown_mode != 'Extreme_Learning_Machine':
    width = int(2*(problem_dim+1))

#### Vanilla fractional SDE:
If f_unknown_mode == "Rough_SDE_Vanilla" is selected, then we can specify the process's dynamics.  

In [18]:
#--------------------------#
# Define Process' Dynamics #
#--------------------------#
drift_constant = 0.1
volatility_constant = 0.01

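import numpy as np  # also loaded by Loader.py; imported here so the cell is self-contained

# Constant drift: returns the same vector for every state x
def f_unknown_drift_vanilla(x):
    return drift_constant*np.ones(problem_dim)

# Constant volatility: returns the same scaled identity matrix for every state x
def f_unknown_vol_vanilla(x):
    return volatility_constant*np.eye(problem_dim)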
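
To make the role of the drift and volatility functions concrete, here is a minimal Euler-Maruyama sketch for the special case `Hurst_Exponent = 0.5`, where the fractional Brownian motion is a standard Brownian motion. It is only an illustration, not the simulator used by this notebook: the function names are hypothetical, the coefficients are assumed time-homogeneous, and $\lambda=2$ is assumed to be the rate of the exponential radius.

```python
import numpy as np

def euler_maruyama_terminal(x0, drift, vol, n_steps, rng):
    """Approximate X_1 for dX = drift(X) dt + vol(X) dB on [0, 1]."""
    dt = 1.0 / n_steps
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        # Brownian increment over one step: N(0, dt) in each coordinate.
        dB = rng.normal(loc=0.0, scale=np.sqrt(dt), size=x.shape[0])
        x = x + drift(x) * dt + vol(x) @ dB
    return x

def estimate_ball_probability(x0, drift, vol, n_steps=100,
                              n_paths=200, rate=2.0, seed=0):
    """Monte-Carlo estimate of P(|X_1 - x0| <= R) with R ~ Exp(rate)."""
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    hits = 0
    for _ in range(n_paths):
        x1 = euler_maruyama_terminal(x0, drift, vol, n_steps, rng)
        radius = rng.exponential(scale=1.0 / rate)  # NumPy uses scale = 1/rate
        hits += int(np.linalg.norm(x1 - x0) <= radius)
    return hits / n_paths

dim = 10
p_hat = estimate_ball_probability(
    np.zeros(dim),
    drift=lambda x: 0.1 * np.ones(dim),   # mirrors drift_constant = 0.1
    vol=lambda x: 0.01 * np.eye(dim),     # mirrors volatility_constant = 0.01
)
```

For $H > \frac12$ the increments are correlated, so the independent Gaussian increments above would have to be replaced by a fractional Gaussian noise sampler.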

## Note: *Why is the procedure so computationally efficient?*
---
 - The sample barycenters do not require us to solve for any new Wasserstein-1 barycenters, which would be much more computationally costly.
 - Our training procedure never back-propagates through $\mathcal{W}_1$, since steps 2 and 3 are fully decoupled. Therefore, training our deep classifier is (comparatively) cheap, since it takes values in the standard $N$-simplex.
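
One way to picture the two points above: the classifier outputs a weight vector on the simplex over a fixed set of atoms, and $\mathcal{W}_1$ is only needed afterwards to score the resulting measure. The sketch below assumes one-dimensional outputs and uses hypothetical names (`softmax`, `wasserstein1_1d`); it is not the backend's implementation.

```python
import numpy as np

def softmax(logits):
    """Map raw classifier outputs to weights on the simplex."""
    shifted = logits - np.max(logits)  # shift for numerical stability
    expd = np.exp(shifted)
    return expd / expd.sum()

def _cdf_at(values, weights, points):
    """Evaluate the CDF of sum_i weights[i] * delta_{values[i]} at points."""
    order = np.argsort(values)
    cum = np.concatenate(([0.0], np.cumsum(weights[order])))
    idx = np.searchsorted(values[order], points, side="right")
    return cum[idx] / cum[-1]

def wasserstein1_1d(u_values, u_weights, v_values, v_weights):
    """W1 between two discrete measures on R: integral of |F_u - F_v|."""
    u_values = np.asarray(u_values, dtype=float)
    v_values = np.asarray(v_values, dtype=float)
    u_weights = np.asarray(u_weights, dtype=float)
    v_weights = np.asarray(v_weights, dtype=float)
    grid = np.sort(np.concatenate([u_values, v_values]))
    gaps = np.diff(grid)
    u_cdf = _cdf_at(u_values, u_weights, grid[:-1])
    v_cdf = _cdf_at(v_values, v_weights, grid[:-1])
    return float(np.sum(np.abs(u_cdf - v_cdf) * gaps))

atoms = np.array([0.0, 1.0, 2.0])   # fixed atoms (hypothetical values)
weights = softmax(np.zeros(3))      # classifier output on the simplex
error = wasserstein1_1d(atoms, weights, np.array([1.0]), np.ones(1))
```

Because the atoms are fixed before the classifier is trained, a standard classification loss on `weights` suffices during training, and `wasserstein1_1d` is only called at evaluation time.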

---

#### Grid Hyperparameter(s)
- Ratio $\frac{\text{Testing Datasize}}{\text{Training Datasize}}$.
- Number of Training Points to Generate

In [19]:
train_test_ratio = .1
N_train_size = 5*(10**2)

Monte-Carlo Parameters

In [20]:
## Monte-Carlo
N_Monte_Carlo_Samples = 10**2

Initial radius of the $\delta$-bounded random partition of $\mathcal{X}$.

In [21]:
# Hyper-parameters of Cover
delta = 0.1
Proportion_per_cluster = .5
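
For intuition, a $\delta$-bounded random partition can be built greedily: pick a random unassigned point as a center, assign every unassigned point within distance `delta` to it, and repeat. The sketch below is an assumption about what such a partition looks like, not the notebook's own routine; the function name is hypothetical and `Proportion_per_cluster` is not modelled.

```python
import numpy as np

def random_delta_partition(X, delta, rng):
    """Greedy random partition: every part lies in a ball of radius delta."""
    n = X.shape[0]
    labels = -np.ones(n, dtype=int)
    centers = []
    unassigned = np.arange(n)
    while unassigned.size > 0:
        center = rng.choice(unassigned)  # random center among unassigned points
        dists = np.linalg.norm(X[unassigned] - X[center], axis=1)
        labels[unassigned[dists <= delta]] = len(centers)
        centers.append(center)
        # The center itself is at distance 0, so the loop always progresses.
        unassigned = unassigned[dists > delta]
    return labels, np.array(centers)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))  # hypothetical input points
labels, centers = random_delta_partition(X_demo, delta=0.5, rng=rng)
```

Each row of `X_demo` then carries the index of its part in `labels`, and `centers[labels]` gives the index of the center it was assigned to.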

## Dependencies and Auxiliary Script(s)

In [22]:
# %run Loader.ipynb
exec(open('Loader.py').read())
# Load Packages/Modules
exec(open('Init_Dump.py').read())
import time as time # <- Not sure why, but this always seems to need its own separate import.

Deep Feature Builder - Ready
Deep Classifier - Ready
Deep Feature Builder - Ready


# Simulate or Parse Data

In [23]:
# %run Data_Simulator_and_Parser.ipynb
exec(open('Data_Simulator_and_Parser.py').read())

  0%|          | 0/5 [00:00<?, ?it/s]

---------------------------------------
Beginning Data-Parsing/Simulation Phase
---------------------------------------
Deep Feature Builder - Ready
Deep Classifier - Ready
Deep Feature Builder - Ready
Deep Feature Builder - Ready
Deep Classifier - Ready
Deep Feature Builder - Ready
Deciding on Which Simulator/Parser To Load
Setting/Defining: Internal Parameters
Deciding on Which Type of Data to Get/Simulate
Simulating Output Data for given input data


100%|██████████| 5/5 [00:02<00:00,  2.20it/s]
100%|██████████| 1/1 [00:00<00:00,  2.32it/s]

----------------------------------
Done Data-Parsing/Simulation Phase
----------------------------------





#### Scale Data
This is especially important to avoid exploding gradient problems when training the ML-models.

In [24]:
scaler = StandardScaler()
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Run Main:

In [59]:
print("------------------------------")
print("Running script for main model!")
print("------------------------------")
%run Universal_Measure_Valued_Networks_Backend.ipynb
# exec(open('Universal_Measure_Valued_Networks_Backend.py').read())

print("------------------------------------")
print("Done: Running script for main model!")
print("------------------------------------")

------------------------------
Running script for main model!
------------------------------


100%|██████████| 125/125 [00:00<00:00, 15706.18it/s]

Deep Feature Builder - Ready
Deep Classifier - Ready
Training Classifer Portion of Type-A Model
Fitting 2 folds for each of 1 candidates, totalling 2 fits



[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    3.7s remaining:    0.0s
[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    3.7s finished


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Training Classifer Portion of Type Model: Done!


  0%|          | 0/250 [00:00<?, ?it/s]

#--------------------#
 Get Training Error(s)
#--------------------#


100%|██████████| 250/250 [00:31<00:00,  7.89it/s]
  2%|▏         | 1/50 [00:00<00:05,  8.37it/s]

#-------------------------#
 Get Training Error(s): END
#-------------------------#
#----------------#
 Get Test Error(s)
#----------------#


100%|██████████| 50/50 [00:06<00:00,  8.08it/s]

#------------------------#
 Get Testing Error(s): END
#------------------------#
                                      DNM  MC-Oracle
W1-95L                           0.202849   0.000000
W1                               0.219712   0.000000
W1-95R                           0.234712   0.000000
M-95L                            0.191941   0.000000
M                                0.195918   0.000000
M-95R                            0.199908   0.000000
N_Par                          775.000000   0.000000
Train_Time                      42.649558   2.710179
Test_Time/MC-Oracle_Test_Time    0.127335   1.000000
------------------------------------
Done: Running script for main model!
------------------------------------





---
# Run: All Benchmarks

## 1) *Pointmass Benchmark(s)*
These benchmarks consist of subsets of $C(\mathbb{R}^d,\mathbb{R})$ which we lift to models in $C(\mathbb{R}^d,\cap_{1\leq q<\infty}\mathscr{P}_{q}(\mathbb{R}))$ via:
$$
\mathbb{R}^d \ni x \to f(x) \to \delta_{f(x)}\in \cap_{1\leq q<\infty}\mathcal{P}_{q}(\mathbb{R}).
$$
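
For a point-mass model the Wasserstein-1 error has a closed form: there is only one coupling between $\delta_{a}$ and an empirical measure, so $\mathcal{W}_1\big(\delta_a, \tfrac1N\sum_{i}\delta_{y_i}\big) = \tfrac1N\sum_i |y_i - a|$. A minimal sketch follows; the function name is hypothetical and this is not necessarily how `Evaluation.py` computes the `W1` rows.

```python
import numpy as np

def w1_pointmass_vs_samples(prediction, samples):
    """W1 between delta_{prediction} and the empirical law of samples."""
    samples = np.asarray(samples, dtype=float)
    # Only one coupling exists, so W1 is the mean absolute deviation
    # of the samples from the predicted point.
    return float(np.mean(np.abs(samples - prediction)))

w1_demo = w1_pointmass_vs_samples(1.0, [0.0, 2.0])
```

This shows why a point-mass benchmark cannot reach zero $\mathcal{W}_1$ error when the target law is not degenerate: the error is bounded below by the mean absolute deviation of the target around its median.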

In [60]:
exec(open('CV_Grid.py').read())
# Notebook Mode:
# %run Evaluation.ipynb
# %run Benchmarks_Model_Builder_Pointmass_Based.ipynb
# Terminal Mode (Default):
exec(open('Evaluation.py').read())
exec(open('Benchmarks_Model_Builder_Pointmass_Based.py').read())

  0%|          | 0/3 [00:00<?, ?it/s]

Deep Feature Builder - Ready
--------------
Training: ENET
--------------


100%|██████████| 3/3 [00:12<00:00,  4.03s/it]
  7%|▋         | 17/250 [00:00<00:01, 164.76it/s]

---------------------
Training: ENET - Done
---------------------
#------------#
 Get Error(s) 
#------------#


100%|██████████| 250/250 [00:01<00:00, 175.21it/s]
 34%|███▍      | 17/50 [00:00<00:00, 162.34it/s]

#-----------------#
 Get Error(s): END 
#-----------------#
#------------#
 Get Error(s) 
#------------#


100%|██████████| 50/50 [00:00<00:00, 167.02it/s]
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Batch computation too fast (0.0454s.) Setting batch_size=2.
[Parallel(n_jobs=4)]: Done   2 out of   4 | elapsed:    0.1s remaining:    0.1s


#-----------------#
 Get Error(s): END 
#-----------------#
Updated DataFrame
                                      DNM  MC-Oracle         ENET
W1-95L                           0.324933   0.000000            -
W1                               0.331775   0.000000            -
W1-95R                           0.340192   0.000000            -
M-95L                            0.627737   0.000000      306.297
M                                0.651914   0.000000       309.06
M-95R                            0.683217   0.000000      312.247
N_Par                          775.000000   0.000000         1008
Train_Time                      42.649558   2.710179  1.62076e+09
Test_Time/MC-Oracle_Test_Time    0.127335   1.000000  0.000585741
------------------------------------------------
Updated Performance Metrics Dataframe and Saved!
------------------------------------------------
-----------------
Training: K-Ridge
-----------------
Fitting 2 folds for each of 2 candidates, totalling 4 fits


[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed:    0.4s remaining:    0.0s
[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed:    0.4s finished
  5%|▌         | 13/250 [00:00<00:01, 123.86it/s]

#------------#
 Get Error(s) 
#------------#


100%|██████████| 250/250 [00:01<00:00, 142.76it/s]
 34%|███▍      | 17/50 [00:00<00:00, 160.76it/s]

#-----------------#
 Get Error(s): END 
#-----------------#
#------------#
 Get Error(s) 
#------------#


100%|██████████| 50/50 [00:00<00:00, 172.99it/s]
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Batch computation too fast (0.0328s.) Setting batch_size=2.
[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Batch computation too fast (0.0427s.) Setting batch_size=2.
[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Batch computation too fast (0.0108s.) Setting batch_size=2.
[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    0.0s finished
  0%|          | 0/250 [00:00<

#-----------------#
 Get Error(s): END 
#-----------------#
Updated DataFrame
                                      DNM  MC-Oracle         ENET      KRidge
W1-95L                           0.324933   0.000000            -           -
W1                               0.331775   0.000000            -           -
W1-95R                           0.340192   0.000000            -           -
M-95L                            0.627737   0.000000      306.297     305.098
M                                0.651914   0.000000       309.06     308.169
M-95R                            0.683217   0.000000      312.247     310.156
N_Par                          775.000000   0.000000         1008           0
Train_Time                      42.649558   2.710179  1.62076e+09     2.49863
Test_Time/MC-Oracle_Test_Time    0.127335   1.000000  0.000585741  0.00332947
------------------------------------------------
Updated Performance Metrics Dataframe and Saved!
--------------------------------------------

100%|██████████| 250/250 [00:01<00:00, 148.83it/s]
 22%|██▏       | 11/50 [00:00<00:00, 103.66it/s]

#-----------------#
 Get Error(s): END 
#-----------------#
#------------#
 Get Error(s) 
#------------#


100%|██████████| 50/50 [00:00<00:00, 119.10it/s]
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.


#-----------------#
 Get Error(s): END 
#-----------------#

[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    3.9s remaining:    0.0s
[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    3.9s finished


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


  0%|          | 0/250 [00:00<?, ?it/s]

#------------#
 Get Error(s) 
#------------#


100%|██████████| 250/250 [00:01<00:00, 192.13it/s]
 38%|███▊      | 19/50 [00:00<00:00, 183.01it/s]

#-----------------#
 Get Error(s): END 
#-----------------#
#------------#
 Get Error(s) 
#------------#


100%|██████████| 50/50 [00:00<00:00, 160.59it/s]

#-----------------#
 Get Error(s): END 
#-----------------#




# Summary of Point-Mass Regression Models

#### Training Model Facts

In [61]:
print(Summary_pred_Qual_models)
Summary_pred_Qual_models


| | DNM | MC-Oracle | ENET | KRidge | GBRF | DNN |
|---|---|---|---|---|---|---|
| W1-95L | 0.324933 | 0.0 | - | - | - | - |
| W1 | 0.331775 | 0.0 | - | - | - | - |
| W1-95R | 0.340192 | 0.0 | - | - | - | - |
| M-95L | 0.627737 | 0.0 | 306.297 | 305.098 | 305.547 | 3.01285 |
| M | 0.651914 | 0.0 | 309.06 | 308.169 | 305.769 | 3.0352 |
| M-95R | 0.683217 | 0.0 | 312.247 | 310.156 | 305.974 | 3.05998 |
| N_Par | 775.0 | 0.0 | 1008 | 0 | 9400 | 43 |
| Train_Time | 42.649558 | 2.710179 | 1.62076e+09 | 2.49863 | 2.25416 | 6.66467 |
| Test_Time/MC-Oracle_Test_Time | 0.127335 | 1.0 | 0.000585741 | 0.00332947 | 0.00127112 | 0.120127 |


#### Testing Model Facts

In [62]:
print(Summary_pred_Qual_models_test)
Summary_pred_Qual_models_test


| | DNM | MC-Oracle | ENET | KRidge | GBRF | DNN |
|---|---|---|---|---|---|---|
| W1-95L | 0.202849 | 0.0 | 105.4051 | 101.947442 | 103.187999 | 1.029138 |
| W1 | 0.219712 | 0.0 | 107.1263 | 104.233336 | 105.006781 | 1.050255 |
| W1-95R | 0.234712 | 0.0 | 108.8801 | 106.222203 | 107.028121 | 1.066775 |
| M-95L | 0.191941 | 0.0 | 306.2971 | 305.097881 | 305.547067 | 3.012848 |
| M | 0.195918 | 0.0 | 309.0599 | 308.168754 | 305.769298 | 3.035199 |
| M-95R | 0.199908 | 0.0 | 312.2472 | 310.156215 | 305.97393 | 3.059978 |
| N_Par | 775.0 | 0.0 | 1008.0 | 0.0 | 9400.0 | 43.0 |
| Train_Time | 42.649558 | 2.710179 | 1620765000.0 | 2.498629 | 2.254163 | 6.664672 |
| Test_Time/MC-Oracle_Test_Time | 0.127335 | 1.0 | 0.0005857405 | 0.003329 | 0.001271 | 0.120127 |


## 2) *Gaussian Benchmarks*

- Benchmark 1: [Gaussian Process Regressor](https://scikit-learn.org/stable/modules/gaussian_process.html)
- Benchmark 2: Deep Gaussian Networks

These benchmarks train models which assume Gaussianity.  We may view these as models in $\mathcal{P}_2(\mathbb{R})$ via:
$$
\mathbb{R}^d \ni x \to (\hat{\mu}(x),\hat{\Sigma}(x)\hat{\Sigma}^{\top})\triangleq f(x) \in \mathbb{R}\times [0,\infty) \to 
(2\pi)^{-\frac{d}{2}}\det(\hat{\Sigma}(x))^{-\frac{1}{2}} \, e^{ -\frac{1}{2}(\cdot - \hat{\mu}(x))^{{{\!\mathsf{T}}}} \hat{\Sigma}(x)^{-1}(\cdot - \hat{\mu}(x)) } \mu \in \mathcal{G}_d\subset \mathcal{P}_2(\mathbb{R});
$$
where $\mathcal{G}_1$ is the set of Gaussian measures on $\mathbb{R}$ equipped with the relative Wasserstein-1 topology.

Examples of this type of architecture are especially prevalent in uncertainty quantification; see [Deep Ensembles](https://arxiv.org/abs/1612.01474) or [NOMU: Neural Optimization-based Model Uncertainty](https://arxiv.org/abs/2102.13640).  Moreover, their universality in $C(\mathbb{R}^d,\mathcal{G}_2)$ is known, and has been shown in [Corollary 4.7](https://arxiv.org/abs/2101.05390).
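
In the one-dimensional case the lift above reduces to two numbers per input: a mean and a standard deviation. A minimal sketch is given below, assuming the Gaussian parameters are inferred from the Monte-Carlo samples at a given $x$ by moment matching. The function names are hypothetical and the actual procedure in `Benchmarks_Model_Builder_Mean_Var.py` may differ.

```python
import numpy as np

def fit_gaussian_moments(samples):
    """Moment-matching estimate (mean, standard deviation) from samples."""
    samples = np.asarray(samples, dtype=float)
    return samples.mean(), samples.std(ddof=1)

def gaussian_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (np.asarray(y, dtype=float) - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
mc_samples = rng.normal(loc=2.0, scale=3.0, size=50_000)  # hypothetical samples at one x
mu_hat, sigma_hat = fit_gaussian_moments(mc_samples)
density_at_mean = gaussian_pdf(mu_hat, mu_hat, sigma_hat)
```

A regression network trained to map each $x$ to its `(mu_hat, sigma_hat)` pair then defines a Gaussian-valued model of the kind described above.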

In [63]:
# %run Benchmarks_Model_Builder_Mean_Var.ipynb
exec(open('Benchmarks_Model_Builder_Mean_Var.py').read())

DNN Builder - Ready
Fitting 2 folds for each of 2 candidates, totalling 4 fits


[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   1 tasks      | elapsed:    0.8s
[Parallel(n_jobs=4)]: Done   2 out of   4 | elapsed:    0.8s remaining:    0.8s
[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed:    1.0s remaining:    0.0s
[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed:    1.0s finished
100%|██████████| 250/250 [00:00<00:00, 1951.59it/s]

Infering Parameters for Deep Gaussian Network to train on!
Done Getting Parameters for Deep Gaussian Network!
Training Deep Gaussian Network!
Fitting 2 folds for each of 1 candidates, totalling 2 fits



[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    3.6s remaining:    0.0s
[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    3.6s finished


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


  0%|          | 0/250 [00:00<?, ?it/s]

Training Deep Gaussian Network!: END
#---------------------------------------#
 Get Training Errors for: Gaussian Models
#---------------------------------------#


100%|██████████| 250/250 [00:02<00:00, 87.81it/s]
 22%|██▏       | 11/50 [00:00<00:00, 100.98it/s]

#-------------------------#
 Get Training Error(s): END
#-------------------------#
#--------------------------------------#
 Get Testing Errors for: Gaussian Models
#--------------------------------------#


100%|██████████| 50/50 [00:00<00:00, 99.71it/s] 


#-------------------------#
 Get Training Error(s): END
#-------------------------#
-------------------------------------------------
Updating Performance Metrics Dataframe and Saved!
-------------------------------------------------

In [None]:
print("Prediction Quality (Updated): Test")
print(Summary_pred_Qual_models_test)
Summary_pred_Qual_models_test

In [None]:
print("Prediction Quality (Updated): Train")
print(Summary_pred_Qual_models)
Summary_pred_Qual_models

## 3) *The Natural Universal Benchmark:* [Bishop's Mixture Density Network](https://publications.aston.ac.uk/id/eprint/373/1/NCRG_94_004.pdf)

The implementation is as follows:
- For every $x$ in the training dataset we fit a GMM $\hat{\nu}_x$, using the [Expectation-Maximization (EM) algorithm](https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm), with the same number of centers as the deep neural model in $\mathcal{NN}_{1_{\mathbb{R}^d},\mathcal{D}}^{\sigma:\star}$ which we are evaluating.  
- A mixture density network is then trained to predict the inferred parameters, given any $x \in \mathbb{R}^d$.
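
The output layer of a mixture density network can be sketched as follows, using the standard parameterization from Bishop's report: softmax for the mixing weights, unconstrained means, and an exponential to keep the standard deviations positive. The function names and the layout of the raw output vector are assumptions, and `Mixture_Density_Network.py` may organize this differently.

```python
import numpy as np

def mdn_head(raw_outputs, n_components):
    """Split raw network outputs into (weights, means, sigmas)."""
    raw = np.asarray(raw_outputs, dtype=float)
    k = n_components
    # Assumed layout: [logits | means | log-sigmas], each of length k.
    logits, means, log_sigmas = raw[:k], raw[k:2 * k], raw[2 * k:3 * k]
    shifted = np.exp(logits - logits.max())
    weights = shifted / shifted.sum()  # softmax: weights lie on the simplex
    sigmas = np.exp(log_sigmas)        # exp keeps standard deviations positive
    return weights, means, sigmas

def mixture_pdf(y, weights, means, sigmas):
    """Density of the one-dimensional Gaussian mixture at the scalar y."""
    z = (y - means) / sigmas
    components = np.exp(-0.5 * z**2) / (sigmas * np.sqrt(2.0 * np.pi))
    return float(np.sum(weights * components))

def mdn_negative_log_likelihood(y, raw_outputs, n_components):
    """Training loss for a single observation y."""
    return -np.log(mixture_pdf(y, *mdn_head(raw_outputs, n_components)))

weights, means, sigmas = mdn_head(np.zeros(6), n_components=2)
density = mixture_pdf(0.0, weights, means, sigmas)
```

With all raw outputs equal to zero, both components are standard Gaussians with equal weight, so the mixture coincides with $N(0,1)$.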

In [None]:
if output_dim == 1:
    # %run Mixture_Density_Network.ipynb
    exec(open('Mixture_Density_Network.py').read())

## Get Final Outputs
Now we piece together all the numerical experiments and report a nice summary.

---
# Final Results
---

## Parsing Quality Metric Results

#### Finalizing Saving
**Note:** *We do it in two steps since the grid sometimes does not want to write nicely...*

In [None]:
## Write Performance Metrics
### In case the caption breaks: write the table and its caption to separate files
Summary_pred_Qual_models.to_latex((results_tables_path+"/Final_Results/"+"Performance_metrics_Problem_Type_"+str(f_unknown_mode)+"Problemdimension"+str(problem_dim)+"__SUMMARY_METRICS.tex"),
                                 float_format="{:0.3g}".format)
# The context manager closes the caption file even if the write fails
with open((results_tables_path+"/Final_Results/"+"ZZZ_CAPTION_Performance_metrics_Problem_Type_"+str(f_unknown_mode)+"Problemdimension"+str(problem_dim)+"__SUMMARY_METRICS___CAPTION.tex"), "w") as text_file:
    text_file.write("Quality Metrics; d:"+str(problem_dim)+", D:"+str(output_dim)+", Depth:"+str(Depth_Bayesian_DNN)+", Width:"+str(width)+", Dropout rate:"+str(Dropout_rate)+".")


### In case the caption does not break: overwrite the table with a captioned version
Summary_pred_Qual_models.to_latex((results_tables_path+"/Final_Results/"+"Performance_metrics_Problem_Type_"+str(f_unknown_mode)+"Problemdimension"+str(problem_dim)+"__SUMMARY_METRICS.tex"),
                                 caption=("Quality Metrics; d:"+str(problem_dim)+", D:"+str(output_dim)+", Depth:"+str(Depth_Bayesian_DNN)+", Width:"+str(width)+", Dropout rate:"+str(Dropout_rate)+"."),
                                 float_format="{:0.3g}".format)

# For Terminal Runner(s):

In [None]:
# For Terminal Running
print("===================")
print("Predictive Quality:")
print("===================")
print(Summary_pred_Qual_models)
print("===================")
print(" ")
print(" ")
print(" ")
print("Kernel_Used_in_GPR: "+str(GPR_trash.kernel))
print("🙃🙃 Have a wonderful day! 🙃🙃")
Summary_pred_Qual_models

---
# Fin
---

---