# Deep Universal Regular Conditional Expectations

---
This notebook implements the universal deep neural model $\mathcal{NN}_{1_{\mathbb{R}^n},\mathcal{D}}^{\sigma:\star}$ ([Anastasis Kratsios](https://people.math.ethz.ch/~kratsioa/), 2021).

---

## What does this code do?
1. Learn a heteroskedastic non-linear regression problem:
     - $Y\sim f_{\text{unknown}}(x) + \epsilon$, where $f_{\text{unknown}}$ is an unknown function and $\epsilon\sim \operatorname{Laplace}(0,\|x\|)$.
2. Learn Random Bayesian Network's Law:
    - $Y = W_J Y^{J-1}, \qquad Y^{j}\triangleq \sigma\bullet A^{j}Y^{j-1} + b^{j}, \qquad Y^0\triangleq x$

3. In the above example, if $A^{j} = M^{j}\odot \tilde{A}^{j}$, where $\tilde{A}^{j}$ is a deterministic matrix, $M^{j}$ is a "mask" (that is, a random matrix with binary entries), and $\odot$ is the Hadamard product, then we recover the dropout framework.
4. Learn the probability that the unique strong solution to the rough SDE with uniformly Lipschitz coefficients, driven by a fractional Brownian motion with Hurst exponent $H \in [\frac{1}{2},1)$:
$$
X_t^x = x + \int_0^t \alpha(s,X_s^x)ds + \int_0^t \beta(s,X_s^x)dB_s^H
$$
belongs, at time $t=1$, to a ball about the initial point $x$ whose random radius is given by an independent exponential random variable with parameter $\lambda=2$.
5. Train a DNN to predict the returns of bitcoin with GD. Since the initialization is random, each prediction at a given $x$ is stochastic. We learn the distribution of this conditional random variable (conditioned on $x$ in the input space):
$$
Y_x \triangleq \hat{f}_{\theta_{T}}(x), \qquad \theta_{t+1}\triangleq \theta_{t} - \lambda \sum_{x \in \mathbb{X}} \nabla_{\theta}\|\hat{f}_{\theta_t}(x) - f(x)\|, \qquad \theta_0 \sim N_d(0,1);
$$
where $T\in \mathbb{N}$ is a fixed number of "SGD" iterations (typically identified by cross-validation on a single SGD trajectory for a single initialization), $\theta \in \mathbb{R}^{(d_{J}+1)+\sum_{j=0}^{J-1} (d_{j+1}d_j + 1)}$, and $d_j$ is the dimension of the "bias" vector $b^{j}$ of the $j$-th layer of the DNN, which is defined by:
$$
\hat{f}_{\theta}(x)\triangleq A^{(J)}x^{(J)} + b^{(J)},\qquad x^{(j+1)}\triangleq \sigma\bullet A^{j}x^{(j)} + b^{j},\qquad x^{(0)}\triangleq x
.
$$
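The first example above (heteroskedastic regression) can be sketched as follows. This is a minimal illustration, not the notebook's simulator: `f_unknown` is a hypothetical stand-in for the target function, and the Laplace scale is assumed to be $\|x\|$ as in the formula.

```python
import numpy as np

def f_unknown(x):
    # Hypothetical stand-in for the unknown regression function.
    return float(np.sin(x).sum())

def sample_heteroskedastic(x, n_samples, rng):
    """Draw n_samples of Y = f(x) + eps with eps ~ Laplace(0, ||x||)."""
    x = np.asarray(x, dtype=float)
    scale = np.linalg.norm(x)  # the noise scale grows with ||x||
    noise = rng.laplace(loc=0.0, scale=scale, size=n_samples)
    return f_unknown(x) + noise

rng = np.random.default_rng(0)
samples_near = sample_heteroskedastic([0.1], 1000, rng)  # small ||x||: narrow law
samples_far = sample_heteroskedastic([2.0], 1000, rng)   # large ||x||: wide law
```

The conditional law of $Y$ given $x$ changes its spread with $x$, which is why a point-valued regressor cannot capture it.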

#### Mode:
Software/Hardware Testing or Real-Deal?

In [10]:
trial_run = False

### Simulation Method:

In [2]:
# Heteroskedastic non-linear regression
# f_unknown_mode = "Heteroskedastic_NonLinear_Regression"

# Random DNN internal noise
# f_unknown_mode = "DNN_with_Random_Weights"
Depth_Bayesian_DNN = 1
width = 5

# Random Dropout applied to trained DNN
# f_unknown_mode = "DNN_with_Bayesian_Dropout"
Dropout_rate = 0.1

# GD with Randomized Input
# f_unknown_mode = "GD_with_randomized_input"
GD_epochs = 2

# SDE with fractional Driver
f_unknown_mode = "Rough_SDE"
N_Euler_Steps = 10**2
Hurst_Exponent = 0.75

# f_unknown_mode = "Rough_SDE_Vanilla"
## Define Process' dynamics in (2) cell(s) below.

#### Vanilla fractional SDE:
If f_unknown_mode == "Rough_SDE_Vanilla" is selected, then we can specify the process's dynamics.  

In [None]:
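#--------------------------#
# Define Process' Dynamics #
#--------------------------#
import numpy as np

# NOTE: drift_constant, volatility_constant and problem_dim are not set in this cell;
# they must be defined before these functions are called.

# Linear drift: alpha(t,x) = drift_constant * x
def f_unknown_drift_vanilla(x):
    x_internal = x.reshape(-1,)
    x_internal = drift_constant*x_internal
    return x_internal

# Constant diagonal volatility: beta(t,x) = volatility_constant * I
# (np.eye replaces diag(problem_dim), which fails for an integer argument)
def f_unknown_vol_vanilla(x):
    x_internal = volatility_constant*np.eye(problem_dim)
    return x_internal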
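To make the role of the drift and volatility functions concrete, here is a one-dimensional sketch of an Euler scheme driven by fractional Brownian motion, followed by a Monte-Carlo estimate of the ball probability from example 4. It is an illustration under stated assumptions, not the notebook's simulator: the fBM path is sampled via a Cholesky factor of its covariance, the drift and volatility constants are hypothetical, and $\lambda=2$ is read as a rate (so the NumPy scale is $1/\lambda$).

```python
import numpy as np

def fbm_increments(n_steps, hurst, rng, horizon=1.0):
    """Sample fBM increments on a uniform grid via Cholesky of the covariance."""
    t = np.linspace(horizon / n_steps, horizon, n_steps)
    s, u = np.meshgrid(t, t, indexing="ij")
    # Cov(B_s, B_u) = 0.5 * (s^{2H} + u^{2H} - |s - u|^{2H})
    cov = 0.5 * (s ** (2 * hurst) + u ** (2 * hurst) - np.abs(s - u) ** (2 * hurst))
    path = np.linalg.cholesky(cov) @ rng.standard_normal(n_steps)
    return np.diff(path, prepend=0.0)

def euler_terminal_value(x0, drift, vol, n_steps, hurst, rng):
    """Euler scheme for dX = drift(X) dt + vol(X) dB^H on [0, 1]; returns X_1."""
    dt = 1.0 / n_steps
    dB = fbm_increments(n_steps, hurst, rng)
    x = float(x0)
    for k in range(n_steps):
        x = x + drift(x) * dt + vol(x) * dB[k]
    return x

rng = np.random.default_rng(0)
drift = lambda x: 0.1 * x  # hypothetical drift_constant = 0.1
vol = lambda x: 0.2        # hypothetical volatility_constant = 0.2
x0 = 1.0

hits = []
for _ in range(200):
    x1 = euler_terminal_value(x0, drift, vol, n_steps=100, hurst=0.75, rng=rng)
    radius = rng.exponential(scale=1.0 / 2.0)  # assumption: lambda = 2 is a rate
    hits.append(abs(x1 - x0) <= radius)
prob_in_ball = float(np.mean(hits))
```

Recomputing the Cholesky factor for every path keeps the sketch short; caching it would be the obvious optimisation for larger `n_steps`.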

## Problem Dimension

In [3]:
problem_dim = 1

## Note: *Why the procedure is so computationally efficient*?
---
 - The sample barycenters do not require us to solve for any new Wasserstein-1 barycenters, which would be much more computationally costly.
 - Our training procedure never back-propagates through $\mathcal{W}_1$, since steps 2 and 3 are fully decoupled. Therefore, training our deep classifier is (comparatively) cheap, since it takes values in the standard $N$-simplex.
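The point of the second bullet can be illustrated with a small sketch. It assumes (this is an interpretation, not taken from the backend script) that the model outputs a convex combination of $N$ fixed empirical measures, with weights given by a softmax classifier. Under that assumption only the weights depend on $x$, so no Wasserstein computation appears in the training loop. The helper names are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Map logits to a point of the standard simplex."""
    z = logits - np.max(logits)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def mixture_mean(logits, atoms):
    """Mean of sum_n w_n * mu_n, where mu_n is the empirical measure of atoms[n]."""
    weights = softmax(np.asarray(logits, dtype=float))
    atom_means = np.array([np.mean(a) for a in atoms])
    return float(weights @ atom_means), weights

# Two fixed empirical measures; equal logits give equal weights.
mean, weights = mixture_mean([0.0, 0.0], [[0.0, 0.0], [2.0, 2.0]])
```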

---

#### Grid Hyperparameter(s)
- Ratio $\frac{\text{Testing Datasize}}{\text{Training Datasize}}$.
- Number of Training Points to Generate

In [4]:
train_test_ratio = .1
N_train_size = 20
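As a quick sanity check on these two hyperparameters, the implied number of test points can be computed as below. This assumes the test size is simply the ratio times the training size; the simulator script may compute it differently. The result agrees with the 20-step and 2-step progress bars printed during data simulation.

```python
train_test_ratio = 0.1
N_train_size = 20

# Assumption: test size = ratio * training size, rounded to an integer.
N_test_size = int(round(train_test_ratio * N_train_size))
print(N_train_size, N_test_size)
```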

Monte-Carlo Parameters

In [5]:
## Monte-Carlo
N_Monte_Carlo_Samples = 10**3

Initial radius of the $\delta$-bounded random partition of $\mathcal{X}$.

In [6]:
# Hyper-parameters of Cover
delta = 0.01
Proportion_per_cluster = .75
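For intuition, one simple way to build a $\delta$-bounded random partition is the greedy scheme sketched below: pick a random unassigned point as a center, group every unassigned point within distance $\delta$ of it, and repeat. This is only an illustrative construction; the notebook's own partitioning lives in the backend script, and the role of `Proportion_per_cluster` is not modelled here.

```python
import numpy as np

def random_delta_partition(X, delta, rng):
    """Greedy random partition: every part lies in a ball of radius delta."""
    X = np.asarray(X, dtype=float)
    unassigned = np.arange(len(X))
    labels = np.full(len(X), -1)
    centers = []
    while unassigned.size > 0:
        c = rng.choice(unassigned)  # random center among the remaining points
        dist = np.linalg.norm(X[unassigned] - X[c], axis=1)
        labels[unassigned[dist <= delta]] = len(centers)
        centers.append(c)
        unassigned = unassigned[dist > delta]  # the center itself is always removed
    return labels, np.array(centers)

rng = np.random.default_rng(0)
points = rng.uniform(size=(50, 2))
labels, centers = random_delta_partition(points, delta=0.2, rng=rng)
```

A smaller `delta` produces more parts, and hence more candidate measures for the classifier to weight.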

## Dependencies and Auxiliary Script(s)

In [7]:
# %run Loader.ipynb
exec(open('Loader.py').read())
# Load Packages/Modules
exec(open('Init_Dump.py').read())
import time as time #<- Not sure why, but it always seems to need its own special loading.

Using TensorFlow backend.


Deep Feature Builder - Ready
Deep Classifier - Ready
Deep Feature Builder - Ready
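The `exec(open(...).read())` pattern used throughout this notebook leaves the file handle to be closed by the garbage collector and reports errors without the script's file name. A small helper, sketched below, avoids both issues. The name `run_script` is hypothetical and not part of the repository; the demo writes a throwaway script so that the block is self-contained.

```python
import os
import tempfile

def run_script(path, namespace):
    """Execute a Python script inside `namespace`, closing the file afterwards."""
    with open(path, "r", encoding="utf-8") as handle:
        source = handle.read()
    # Compiling with the path makes tracebacks point at the script file.
    exec(compile(source, path, "exec"), namespace)

# Self-contained demonstration on a temporary script.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_script.py")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("answer = 6 * 7\n")
    demo_namespace = {}
    run_script(demo_path, demo_namespace)
```

In the notebook, a call such as `run_script('Loader.py', globals())` would play the same role as the `exec` line above.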


# Simulate or Parse Data

In [8]:
# %run Data_Simulator_and_Parser.ipynb
exec(open('Data_Simulator_and_Parser.py').read())

  0%|          | 0/20 [00:00<?, ?it/s]

---------------------------------------
Beginning Data-Parsing/Simulation Phase
---------------------------------------
Deciding on Which Simulator/Parser To Load
Setting/Defining: Internal Parameters
Deciding on Which Type of Data to Get/Simulate
Simulating Output Data for given input data


100%|██████████| 20/20 [01:27<00:00,  4.40s/it]
100%|██████████| 2/2 [00:08<00:00,  4.40s/it]

----------------------------------
Done Data-Parsing/Simulation Phase
----------------------------------





# Run Main:

In [11]:
print("------------------------------")
print("Running script for main model!")
print("------------------------------")
# %run Universal_Measure_Valued_Networks_Backend.ipynb
exec(open('Universal_Measure_Valued_Networks_Backend.py').read())

print("------------------------------------")
print("Done: Running script for main model!")
print("------------------------------------")

------------------------------
Running script for main model!
------------------------------


100%|██████████| 1500/1500 [00:00<00:00, 8540.31it/s]
[Parallel(n_jobs=40)]: Using backend LokyBackend with 40 concurrent workers.


Deep Feature Builder - Ready
Deep Classifier - Ready
Training Classifer Portion of Type-A Model
Fitting 4 folds for each of 10 candidates, totalling 40 fits


[Parallel(n_jobs=40)]: Done   1 tasks      | elapsed:  4.4min
[Parallel(n_jobs=40)]: Done   6 out of  40 | elapsed: 16.3min remaining: 92.6min
[Parallel(n_jobs=40)]: Done  11 out of  40 | elapsed: 20.4min remaining: 53.8min
[Parallel(n_jobs=40)]: Done  16 out of  40 | elapsed: 22.0min remaining: 33.1min
[Parallel(n_jobs=40)]: Done  21 out of  40 | elapsed: 37.2min remaining: 33.7min
[Parallel(n_jobs=40)]: Done  26 out of  40 | elapsed: 40.7min remaining: 21.9min
[Parallel(n_jobs=40)]: Done  31 out of  40 | elapsed: 46.6min remaining: 13.5min
[Parallel(n_jobs=40)]: Done  36 out of  40 | elapsed: 55.6min remaining:  6.2min
[Parallel(n_jobs=40)]: Done  40 out of  40 | elapsed: 60.6min finished


Epoch 1/250
...
Epoch 250/250
Training Classifer Portion of Type Model: Done!


  0%|          | 0/2000 [00:00<?, ?it/s]

#--------------------#
 Get Training Error(s)
#--------------------#


100%|██████████| 2000/2000 [23:28<00:00,  1.42it/s]
  0%|          | 0/200 [00:00<?, ?it/s]

#-------------------------#
 Get Training Error(s): END
#-------------------------#
#----------------#
 Get Test Error(s)
#----------------#


100%|██████████| 200/200 [06:33<00:00,  1.97s/it]

#------------------------#
 Get Testing Error(s): END
#------------------------#
                                         DNM  MC-Oracle
W1-95L                              0.161423   0.000000
W1                                  0.168691   0.000000
W1-95R                              0.176543   0.000000
M-95L                               0.396704   0.396386
M                                   0.404971   0.404971
M-95R                               0.413845   0.413161
N_Par                          923500.000000   0.000000
Train_Time                       5564.811233  96.727152
Test_Time/MC-Oracle_Test_Time       0.006567   1.000000
------------------------------------
Done: Running script for main model!
------------------------------------





---
# Run: All Benchmarks

## 1) *Pointmass Benchmark(s)*
These benchmarks consist of subsets of $C(\mathbb{R}^d,\mathbb{R})$, which we lift to models in $C(\mathbb{R}^d,\cap_{1\leq q<\infty}\mathcal{P}_{q}(\mathbb{R}))$ via:
$$
\mathbb{R}^d \ni x \to f(x) \to \delta_{f(x)}\in \cap_{1\leq q<\infty}\mathcal{P}_{q}(\mathbb{R}).
$$
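For a point-mass prediction, the Wasserstein-1 error against an empirical measure has a closed form: the only coupling between $\delta_{a}$ and any measure is the product coupling, so $\mathcal{W}_1(\delta_a,\frac1{N}\sum_i \delta_{y_i}) = \frac1{N}\sum_i |y_i - a|$. The sketch below uses this identity; the function name is illustrative and not taken from the evaluation script.

```python
import numpy as np

def w1_to_pointmass(samples, point):
    """W1 between the empirical measure of `samples` and the Dirac mass at `point`."""
    samples = np.asarray(samples, dtype=float)
    return float(np.mean(np.abs(samples - point)))

# Two equally weighted atoms at 0 and 2, predicted point 1: each atom moves distance 1.
example_error = w1_to_pointmass([0.0, 2.0], 1.0)
```

Even the best possible point prediction therefore pays a $\mathcal{W}_1$ cost equal to the mean absolute deviation of the target law around that point.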

In [12]:
exec(open('CV_Grid.py').read())
# Notebook Mode:
# %run Evaluation.ipynb
# %run Benchmarks_Model_Builder_Pointmass_Based.ipynb
# Terminal Mode (Default):
exec(open('Evaluation.py').read())
exec(open('Benchmarks_Model_Builder_Pointmass_Based.py').read())

Deep Feature Builder - Ready
--------------
Training: ENET
--------------


 20%|██        | 403/2000 [00:00<00:00, 4021.79it/s]

---------------------
Training: ENET - Done
---------------------
#------------#
 Get Error(s) 
#------------#


100%|██████████| 2000/2000 [00:00<00:00, 4110.28it/s]
100%|██████████| 200/200 [00:00<00:00, 3887.88it/s]

#-----------------#
 Get Error(s): END 
#-----------------#
#------------#
 Get Error(s) 
#------------#
#-----------------#
 Get Error(s): END 
#-----------------#
Updated DataFrame
                                         DNM  MC-Oracle          ENET
W1-95L                              0.161423   0.000000  1.092669e+06
W1                                  0.168691   0.000000  1.208410e+06
W1-95R                              0.176543   0.000000  1.331617e+06
M-95L                               0.396704   0.396386  9.694256e+02
M                                   0.404971   0.404971  1.022149e+03
M-95R                               0.413845   0.413161  1.078668e+03
N_Par                          923500.000000   0.000000  4.000000e+03
Train_Time                       5564.811233  96.727152  1.619966e+09
Test_Time/MC-Oracle_Test_Time       0.006567   1.000000  8.935642e-06
-----------------
Training: K-Ridge
-----------------
Fitting 4 folds for each of 10 candidates, totalling 40 fits



[Parallel(n_jobs=40)]: Using backend LokyBackend with 40 concurrent workers.
[Parallel(n_jobs=40)]: Done   1 tasks      | elapsed:    5.6s
[Parallel(n_jobs=40)]: Done   6 out of  40 | elapsed:    6.9s remaining:   39.2s
[Parallel(n_jobs=40)]: Done  11 out of  40 | elapsed:    8.5s remaining:   22.4s
[Parallel(n_jobs=40)]: Done  16 out of  40 | elapsed:   11.8s remaining:   17.8s
[Parallel(n_jobs=40)]: Done  21 out of  40 | elapsed:   12.6s remaining:   11.4s
[Parallel(n_jobs=40)]: Done  26 out of  40 | elapsed:   13.0s remaining:    7.0s
[Parallel(n_jobs=40)]: Done  31 out of  40 | elapsed:   13.4s remaining:    3.9s
[Parallel(n_jobs=40)]: Done  36 out of  40 | elapsed:   14.0s remaining:    1.6s
[Parallel(n_jobs=40)]: Done  40 out of  40 | elapsed:   15.7s finished
 10%|█         | 209/2000 [00:00<00:00, 2088.27it/s]

#------------#
 Get Error(s) 
#------------#


100%|██████████| 2000/2000 [00:00<00:00, 3271.60it/s]
100%|██████████| 200/200 [00:00<00:00, 2673.88it/s]


#-----------------#
 Get Error(s): END 
#-----------------#
#------------#
 Get Error(s) 
#------------#
#-----------------#
 Get Error(s): END 
#-----------------#
Updated DataFrame
                                         DNM  MC-Oracle          ENET  \
W1-95L                              0.161423   0.000000  1.092669e+06   
W1                                  0.168691   0.000000  1.208410e+06   
W1-95R                              0.176543   0.000000  1.331617e+06   
M-95L                               0.396704   0.396386  9.694256e+02   
M                                   0.404971   0.404971  1.022149e+03   
M-95R                               0.413845   0.413161  1.078668e+03   
N_Par                          923500.000000   0.000000  4.000000e+03   
Train_Time                       5564.811233  96.727152  1.619966e+09   
Test_Time/MC-Oracle_Test_Time       0.006567   1.000000  8.935642e-06   

                                     KRidge  
W1-95L                         1.085188e

[Parallel(n_jobs=40)]: Using backend LokyBackend with 40 concurrent workers.
[Parallel(n_jobs=40)]: Done   1 tasks      | elapsed:    6.3s
[Parallel(n_jobs=40)]: Done   6 out of  40 | elapsed:   12.3s remaining:  1.2min
[Parallel(n_jobs=40)]: Done  11 out of  40 | elapsed:   18.5s remaining:   48.8s
[Parallel(n_jobs=40)]: Done  16 out of  40 | elapsed:   26.7s remaining:   40.1s
[Parallel(n_jobs=40)]: Done  21 out of  40 | elapsed:   29.2s remaining:   26.4s
[Parallel(n_jobs=40)]: Done  26 out of  40 | elapsed:   30.1s remaining:   16.2s
[Parallel(n_jobs=40)]: Done  31 out of  40 | elapsed:   31.0s remaining:    9.0s
[Parallel(n_jobs=40)]: Done  36 out of  40 | elapsed:   31.6s remaining:    3.5s
[Parallel(n_jobs=40)]: Done  40 out of  40 | elapsed:   32.1s finished
 21%|██        | 423/2000 [00:00<00:00, 4223.73it/s]

#------------#
 Get Error(s) 
#------------#


100%|██████████| 2000/2000 [00:00<00:00, 4155.07it/s]
100%|██████████| 200/200 [00:00<00:00, 4098.32it/s]
[Parallel(n_jobs=40)]: Using backend LokyBackend with 40 concurrent workers.


#-----------------#
 Get Error(s): END 
#-----------------#
#------------#
 Get Error(s) 
#------------#
#-----------------#
 Get Error(s): END 
#-----------------#
Updated DataFrame
                                         DNM  MC-Oracle          ENET  \
W1-95L                              0.161423   0.000000  1.092669e+06   
W1                                  0.168691   0.000000  1.208410e+06   
W1-95R                              0.176543   0.000000  1.331617e+06   
M-95L                               0.396704   0.396386  9.694256e+02   
M                                   0.404971   0.404971  1.022149e+03   
M-95R                               0.413845   0.413161  1.078668e+03   
N_Par                          923500.000000   0.000000  4.000000e+03   
Train_Time                       5564.811233  96.727152  1.619966e+09   
Test_Time/MC-Oracle_Test_Time       0.006567   1.000000  8.935642e-06   

                                     KRidge          GBRF  
W1-95L                    

[Parallel(n_jobs=40)]: Done   1 tasks      | elapsed:  1.7min
[Parallel(n_jobs=40)]: Done   6 out of  40 | elapsed:  5.6min remaining: 31.6min
[Parallel(n_jobs=40)]: Done  11 out of  40 | elapsed:  9.2min remaining: 24.2min
[Parallel(n_jobs=40)]: Done  16 out of  40 | elapsed: 10.2min remaining: 15.3min
[Parallel(n_jobs=40)]: Done  21 out of  40 | elapsed: 14.7min remaining: 13.3min
[Parallel(n_jobs=40)]: Done  26 out of  40 | elapsed: 16.6min remaining:  8.9min
[Parallel(n_jobs=40)]: Done  31 out of  40 | elapsed: 17.3min remaining:  5.0min
[Parallel(n_jobs=40)]: Done  36 out of  40 | elapsed: 23.5min remaining:  2.6min
[Parallel(n_jobs=40)]: Done  40 out of  40 | elapsed: 23.9min finished


Epoch 1/250
...
Epoch 250/250


#------------#
 Get Error(s) 
#------------#


100%|██████████| 2000/2000 [00:00<00:00, 3369.79it/s]
100%|██████████| 200/200 [00:00<00:00, 3762.13it/s]


#-----------------#
 Get Error(s): END 
#-----------------#
#------------#
 Get Error(s) 
#------------#
#-----------------#
 Get Error(s): END 
#-----------------#
Updated DataFrame
                                         DNM  MC-Oracle          ENET  \
W1-95L                              0.161423   0.000000  1.092669e+06   
W1                                  0.168691   0.000000  1.208410e+06   
W1-95R                              0.176543   0.000000  1.331617e+06   
M-95L                               0.396704   0.396386  9.694256e+02   
M                                   0.404971   0.404971  1.022149e+03   
M-95R                               0.413845   0.413161  1.078668e+03   
N_Par                          923500.000000   0.000000  4.000000e+03   
Train_Time                       5564.811233  96.727152  1.619966e+09   
Test_Time/MC-Oracle_Test_Time       0.006567   1.000000  8.935642e-06   

                                     KRidge          GBRF           DNN  
W1-95L      

# Summary of Point-Mass Regression Models

#### Training Model Facts

In [13]:
print(Summary_pred_Qual_models)
Summary_pred_Qual_models

                                         DNM  MC-Oracle          ENET  \
W1-95L                              0.000128   0.000000  1.018309e+06   
W1                                  0.000185   0.000000  1.045608e+06   
W1-95R                              0.000307   0.000000  1.075119e+06   
M-95L                               0.002600   0.002595  9.517059e+02   
M                                   0.002797   0.002797  9.668141e+02   
M-95R                               0.003027   0.003022  9.814793e+02   
N_Par                          923500.000000   0.000000  4.000000e+03   
Train_Time                       5564.811233  96.727152  1.619966e+09   
Test_Time/MC-Oracle_Test_Time       0.006567   1.000000  8.935642e-06   

                                     KRidge          GBRF           DNN  
W1-95L                         1.014195e+06  9.329218e+05  1.013288e+06  
W1                             1.045607e+06  9.368544e+05  1.044346e+06  
W1-95R                         1.072116e+06  9.

| | DNM | MC-Oracle | ENET | KRidge | GBRF | DNN |
|---|---|---|---|---|---|---|
| W1-95L | 0.000128 | 0.0 | 1018309.0 | 1014195.0 | 932921.8 | 1013288.0 |
| W1 | 0.000185 | 0.0 | 1045608.0 | 1045607.0 | 936854.4 | 1044346.0 |
| W1-95R | 0.000307 | 0.0 | 1075119.0 | 1072116.0 | 940829.1 | 1074491.0 |
| M-95L | 0.0026 | 0.002595 | 951.7059 | 952.7161 | 964.6336 | 952.6977 |
| M | 0.002797 | 0.002797 | 966.8141 | 966.8135 | 966.8141 | 966.2117 |
| M-95R | 0.003027 | 0.003022 | 981.4793 | 981.6957 | 968.7405 | 981.2708 |
| N_Par | 923500.0 | 0.0 | 4000.0 | 0.0 | 1917000.0 | 322401.0 |
| Train_Time | 5564.811233 | 96.727152 | 1619966000.0 | 17.04852 | 35.74519 | 1493.983 |
| Test_Time/MC-Oracle_Test_Time | 0.006567 | 1.0 | 8.935642e-06 | 0.001655043 | 0.0009877405 | 0.005394637 |


#### Testing Model Facts

In [14]:
print(Summary_pred_Qual_models_test)
Summary_pred_Qual_models_test

                                         DNM  MC-Oracle          ENET  \
W1-95L                              0.161423   0.000000  1.092669e+06   
W1                                  0.168691   0.000000  1.208410e+06   
W1-95R                              0.176543   0.000000  1.331617e+06   
M-95L                               0.396704   0.396386  9.694256e+02   
M                                   0.404971   0.404971  1.022149e+03   
M-95R                               0.413845   0.413161  1.078668e+03   
N_Par                          923500.000000   0.000000  4.000000e+03   
Train_Time                       5564.811233  96.727152  1.619966e+09   
Test_Time/MC-Oracle_Test_Time       0.006567   1.000000  8.935642e-06   

                                     KRidge          GBRF           DNN  
W1-95L                         1.085188e+06  9.386122e+05  1.089221e+06  
W1                             1.208125e+06  9.531860e+05  1.204600e+06  
W1-95R                         1.323333e+06  9.

| | DNM | MC-Oracle | ENET | KRidge | GBRF | DNN |
|---|---|---|---|---|---|---|
| W1-95L | 0.161423 | 0.0 | 1092669.0 | 1085188.0 | 938612.2 | 1089221.0 |
| W1 | 0.168691 | 0.0 | 1208410.0 | 1208125.0 | 953186.0 | 1204600.0 |
| W1-95R | 0.176543 | 0.0 | 1331617.0 | 1323333.0 | 968845.8 | 1327178.0 |
| M-95L | 0.396704 | 0.396386 | 969.4256 | 966.0472 | 966.6866 | 963.9629 |
| M | 0.404971 | 0.404971 | 1022.149 | 1021.983 | 974.6349 | 1020.487 |
| M-95R | 0.413845 | 0.413161 | 1078.668 | 1082.549 | 982.1129 | 1080.823 |
| N_Par | 923500.0 | 0.0 | 4000.0 | 0.0 | 1917000.0 | 322401.0 |
| Train_Time | 5564.811233 | 96.727152 | 1619966000.0 | 17.04852 | 35.74519 | 1493.983 |
| Test_Time/MC-Oracle_Test_Time | 0.006567 | 1.0 | 8.935642e-06 | 0.001655043 | 0.0009877405 | 0.005394637 |


## 2) *Gaussian Benchmarks*

- Benchmark 1: [Gaussian Process Regressor](https://scikit-learn.org/stable/modules/gaussian_process.html)
- Benchmark 2: Deep Gaussian Networks

These benchmarks train models which assume Gaussianity. We may view them as models in $\mathcal{P}_2(\mathbb{R})$ via:
$$
\mathbb{R}^d \ni x \to (\hat{\mu}(x),\hat{\Sigma}(x)\hat{\Sigma}^{\top})\triangleq f(x) \in \mathbb{R}\times [0,\infty) \to 
(2\pi)^{-\frac{d}{2}}\det(\hat{\Sigma}(x))^{-\frac{1}{2}} \, e^{ -\frac{1}{2}(\cdot - \hat{\mu}(x))^{{{\!\mathsf{T}}}} \hat{\Sigma}(x)^{-1}(\cdot - \hat{\mu}(x)) } \mu \in \mathcal{G}_d\subset \mathcal{P}_2(\mathbb{R});
$$
where $\mathcal{G}_1$ is the set of Gaussian measures on $\mathbb{R}$ equipped with the relative Wasserstein-1 topology.

Examples of this type of architecture are especially prevalent in uncertainty quantification; see [Deep Ensembles](https://arxiv.org/abs/1612.01474) or [NOMU: Neural Optimization-based Model Uncertainty](https://arxiv.org/abs/2102.13640). Moreover, their universality in $C(\mathbb{R}^d,\mathcal{G}_2)$ is known; it is shown in [Corollary 4.7](https://arxiv.org/abs/2101.05390).
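To see what a Gaussian assumption costs when the target law is not Gaussian, one can compare Monte-Carlo samples against draws from the moment-matched Gaussian. The sketch below uses the fact that, in one dimension, the Wasserstein-1 distance between two empirical measures with the same number of atoms is the mean absolute difference of their sorted samples. The function names are illustrative and do not come from the benchmark script.

```python
import numpy as np

def w1_empirical(u, v):
    """W1 between two 1-D empirical measures with equally many atoms."""
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    if u.shape != v.shape:
        raise ValueError("both samples must have the same length")
    return float(np.mean(np.abs(u - v)))

def gaussian_fit_w1(samples, rng):
    """W1 between `samples` and draws from the Gaussian with matched mean and std."""
    samples = np.asarray(samples, dtype=float)
    draws = rng.normal(np.mean(samples), np.std(samples), size=len(samples))
    return w1_empirical(samples, draws)

rng = np.random.default_rng(0)
gaussian_target = rng.normal(0.0, 1.0, size=2000)
laplace_target = rng.laplace(0.0, 1.0, size=2000)
err_gaussian = gaussian_fit_w1(gaussian_target, rng)  # small: the model class is correct
err_laplace = gaussian_fit_w1(laplace_target, rng)    # includes a misspecification error
```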

In [None]:
# %run Benchmarks_Model_Builder_Mean_Var.ipynb
exec(open('Benchmarks_Model_Builder_Mean_Var.py').read())

Deep Feature Builder - Ready
Fitting 2 folds for each of 10 candidates, totalling 20 fits


[Parallel(n_jobs=40)]: Using backend LokyBackend with 40 concurrent workers.
[Parallel(n_jobs=40)]: Done   4 out of  20 | elapsed:   45.8s remaining:  3.1min
[Parallel(n_jobs=40)]: Done   7 out of  20 | elapsed:  2.1min remaining:  3.9min
[Parallel(n_jobs=40)]: Done  10 out of  20 | elapsed:  5.1min remaining:  5.1min
[Parallel(n_jobs=40)]: Done  13 out of  20 | elapsed:  5.8min remaining:  3.1min
[Parallel(n_jobs=40)]: Done  16 out of  20 | elapsed:  7.2min remaining:  1.8min
[Parallel(n_jobs=40)]: Done  20 out of  20 | elapsed:  7.7min finished
  0%|          | 2/2000 [00:00<02:07, 15.66it/s]

Infering Parameters for Deep Gaussian Network to train on!


100%|██████████| 2000/2000 [00:46<00:00, 43.23it/s]

Done Getting Parameters for Deep Gaussian Network!
Training Deep Gaussian Network!
Fitting 4 folds for each of 10 candidates, totalling 40 fits



[Parallel(n_jobs=40)]: Using backend LokyBackend with 40 concurrent workers.
[Parallel(n_jobs=40)]: Done   1 tasks      | elapsed:  2.3min


In [None]:
print("Prediction Quality (Updated): Test")
print(Summary_pred_Qual_models_test)
Summary_pred_Qual_models_test

In [None]:
print("Prediction Quality (Updated): Train")
print(Summary_pred_Qual_models)
Summary_pred_Qual_models

## 3) *The Natural Universal Benchmark:* [Bishop's Mixture Density Network](https://publications.aston.ac.uk/id/eprint/373/1/NCRG_94_004.pdf)

The implementation is as follows:
- For every $x$ in the training dataset we fit a GMM $\hat{\nu}_x$, using the [Expectation-Maximization (EM) algorithm](https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm), with the same number of centers as the deep neural model in $\mathcal{NN}_{1_{\mathbb{R}^d},\mathcal{D}}^{\sigma:\star}$ which we are evaluating.
- A mixture density network is then trained to predict the inferred parameters, given any $x \in \mathbb{R}^d$.
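The first step above (fitting a GMM to the samples at a single $x$ by EM) can be sketched in one dimension as follows. This is a minimal NumPy illustration rather than the code in `Mixture_Density_Network.py`; the quantile-based initialisation and the small variance floor are choices made for this sketch.

```python
import numpy as np

def fit_gmm_1d(y, n_components, n_iter=100):
    """Fit a 1-D Gaussian mixture to samples y with the EM algorithm."""
    y = np.asarray(y, dtype=float)
    # Initialise the means at evenly spaced quantiles of the data.
    means = np.quantile(y, (np.arange(n_components) + 0.5) / n_components)
    variances = np.full(n_components, np.var(y) + 1e-6)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each sample.
        log_pdf = -0.5 * (np.log(2 * np.pi * variances)
                          + (y[:, None] - means) ** 2 / variances)
        log_resp = np.log(weights) + log_pdf
        log_resp -= log_resp.max(axis=1, keepdims=True)
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances.
        mass = resp.sum(axis=0) + 1e-12
        weights = mass / len(y)
        means = (resp * y[:, None]).sum(axis=0) / mass
        variances = (resp * (y[:, None] - means) ** 2).sum(axis=0) / mass + 1e-6
    return weights, means, variances

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(-3.0, 0.5, 500), rng.normal(3.0, 0.5, 500)])
weights, means, variances = fit_gmm_1d(y, n_components=2)
```

The fitted `(weights, means, variances)` at each training input are the regression targets that the mixture density network is then trained to predict.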

In [None]:
if output_dim == 1:
    # %run Mixture_Density_Network.ipynb
    exec(open('Mixture_Density_Network.py').read())

## Get Final Outputs
Now we piece together all the numerical experiments and report a nice summary.

# Result(s)

## Prediction Quality

#### Training

In [None]:
print("Final Training-Set Result(s)")
Summary_pred_Qual_models

#### Test

In [None]:
print("Final Test-Set Result(s)")
Summary_pred_Qual_models_test

# For Terminal Runner(s):

In [None]:
# For Terminal Running
print("============================")
print("Training Predictive Quality:")
print("============================")
print(Summary_pred_Qual_models)
print(" ")
print(" ")
print(" ")
print("===========================")
print("Testing Predictive Quality:")
print("===========================")
print(Summary_pred_Qual_models_test)
print("================================")
print(" ")
print(" ")
print(" ")
print("Kernel_Used_in_GPR: "+str(GPR_trash.kernel))
print("🙃🙃 Have a wonderful day! 🙃🙃")

---
# Fin
---
