# Deep Universal Regular Conditional Expectations:

---
This notebook implements the universal deep neural model class $\mathcal{NN}_{1_{\mathbb{R}^n},\mathcal{D}}^{\sigma:\star}$ of [Anastasis Kratsios](https://people.math.ethz.ch/~kratsioa/), 2021.

---

## What does this code do?
1. Learn a heteroskedastic non-linear regression problem:
     - $Y\sim f_{\text{unknown}}(x) + \epsilon$ where $f_{\text{unknown}}$ is an unknown function and $\epsilon\sim Laplace(0,\|x\|)$
2. Learn the law of a random Bayesian network:
    - $Y = W_J Y^{J-1}, \qquad Y^{j}\triangleq \sigma\bullet A^{j}Y^{j-1} + b^{j}, \qquad Y^0\triangleq x$

3. In the above example, if $A^{j} = M^{j}\odot \tilde{A}^{j}$, where $\tilde{A}^{j}$ is a deterministic matrix, $M^{j}$ is a "mask" (a random matrix with binary entries), and $\odot$ is the Hadamard product, then we recover the dropout framework.
4. Learn the probability that the unique strong solution to the rough SDE with uniformly Lipschitz drivers, driven by a fractional Brownian motion with Hurst exponent $H \in [\frac1{2},1)$:
$$
X_t^x = x + \int_0^t \alpha(s,X_s^x)ds + \int_0^t \beta(s,X_s^x)dB_s^H
$$
belongs, at time $t=1$, to a ball about the initial point $x$ of random radius given by an independent exponential random variable with parameter $\lambda=2$.
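
As a concrete illustration of the first problem, the following minimal sketch draws samples of $Y = f(x) + \epsilon$ with $\epsilon\sim Laplace(0,\|x\|)$. The function `f_hypothetical` is a stand-in chosen only for illustration (the true $f$ is unknown by design), and `sample_heteroskedastic` is a hypothetical helper name, not part of the backend scripts.

```python
import numpy as np

def f_hypothetical(x):
    """Stand-in for the unknown regression function (assumption, illustration only)."""
    return float(np.sin(x).sum())

def sample_heteroskedastic(x, n_samples, rng):
    """Draw n_samples of Y = f(x) + eps, where eps ~ Laplace(0, ||x||)."""
    scale = np.linalg.norm(x)  # the noise scale grows with the norm of the input
    noise = rng.laplace(loc=0.0, scale=scale, size=n_samples)
    return f_hypothetical(x) + noise

rng = np.random.default_rng(0)
x = np.array([0.3, -0.4])  # ||x|| = 0.5
y = sample_heteroskedastic(x, 10_000, rng)
print(y.shape, np.median(y))
```

Because the noise scale depends on $x$, the spread of $Y$ changes from input to input, which is what makes the problem heteroskedastic.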

In [25]:
# Load Packages/Modules
exec(open('Init_Dump.py').read())

#### Mode:
Software/Hardware Testing or Real-Deal?

In [26]:
trial_run = True

### Simulation Method:

In [27]:
# Heteroskedastic non-linear regression
# f_unknown_mode = "Heteroskedastic_NonLinear_Regression"

# Random DNN internal noise
# f_unknown_mode = "DNN_with_Random_Weights"
Depth_Bayesian_DNN = 2
width = 2

# Random Dropout applied to trained DNN
# f_unknown_mode = "DNN_with_Bayesian_Dropout"
Dropout_rate = 0.25

# Rough SDE (time 1)
f_unknown_mode = "Rough_SDE"
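
The dropout mode corresponds to multiplying a deterministic weight matrix entrywise by a random binary mask. A minimal sketch follows, assuming the mask entries are independent Bernoulli draws that keep an entry with probability `1 - Dropout_rate`; whether the backend draws its masks exactly this way is an assumption, and `dropout_masked_weights` is a hypothetical helper name.

```python
import numpy as np

def dropout_masked_weights(A_tilde, dropout_rate, rng):
    """Return M * A_tilde (Hadamard product) for a random binary mask M."""
    # Each mask entry equals 1 (keep) with probability 1 - dropout_rate, else 0 (drop).
    mask = rng.binomial(n=1, p=1.0 - dropout_rate, size=A_tilde.shape)
    return mask * A_tilde

rng = np.random.default_rng(0)
A_tilde = np.ones((2, 2))  # deterministic weights (illustrative values)
A = dropout_masked_weights(A_tilde, dropout_rate=0.25, rng=rng)
print(A)
```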

## Problem Dimension

In [28]:
problem_dim = 2

## Note: *Why the procedure is so computationally efficient*?
---
 - The sample barycenters do not require us to solve for any new Wasserstein-1 barycenters, which would be much more computationally costly;
 - Our training procedure never back-propagates through $\mathcal{W}_1$, since steps 2 and 3 are fully decoupled.  Therefore, training our deep classifier is (comparatively) cheap, since it takes values in the standard $N$-simplex.
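
To illustrate why a simplex-valued classifier keeps things cheap, here is a minimal sketch under the assumption that a prediction is a mixture of fixed, precomputed empirical measures whose weights are the classifier's softmax output. The names `fixed_measures`, `logits`, and `sample_from_mixture` are hypothetical and do not come from the backend script.

```python
import numpy as np

def softmax(logits):
    """Map real-valued logits to a point in the probability simplex."""
    z = logits - np.max(logits)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sample_from_mixture(weights, atoms_per_component, n_samples, rng):
    """Sample from sum_n weights[n] * (empirical measure of atoms_per_component[n])."""
    idx = rng.choice(len(weights), size=n_samples, p=weights)
    return np.array([rng.choice(atoms_per_component[i]) for i in idx])

rng = np.random.default_rng(0)
# Fixed empirical measures (illustrative atoms); they are never re-optimized.
fixed_measures = [np.array([0.0, 0.1]), np.array([1.0, 1.2]), np.array([5.0])]
logits = np.array([2.0, 0.5, -1.0])  # hypothetical classifier output at some x
weights = softmax(logits)
draws = sample_from_mixture(weights, fixed_measures, 1000, rng)
print(weights, draws.mean())
```

In this reading only the logits depend on trainable parameters, so no Wasserstein distance has to be evaluated or differentiated when the classifier is fitted.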

---

#### Rough SDE Meta-Parameters

In [29]:
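# SDE with Rough Driver
N_Euler_Steps = 10**1   # number of Euler steps
Hurst_Exponent = 0.01   # Hurst exponent of the fractional Brownian driver

def alpha(t,x):
    # Drift coefficient: pulls the state x towards the current time t
    output_drift_update = t-x
    return output_drift_update

def beta(t,x):
    # Diffusion coefficient: diagonal matrix with entries (t+0.001)*cos(x_i)
    output_vol_update = (t+0.001)*np.diag(np.cos(x))
    return output_vol_update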
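
For orientation, the sketch below shows one way these coefficients could be used in an Euler scheme driven by fractional Brownian increments. It is only a sketch: the backend's actual simulation scheme may differ, the helpers `fbm_increments` and `euler_terminal_value` are hypothetical names, and the fractional Brownian motion is sampled exactly on the grid via a Cholesky factor of its covariance $\frac12\left(s^{2H}+t^{2H}-|t-s|^{2H}\right)$. The coefficients `alpha` and `beta` are repeated so that the block runs on its own, and the example uses $H=0.75$, a value inside the range $[\frac12,1)$ stated in the overview.

```python
import numpy as np

def alpha(t, x):
    return t - x  # drift, as in the cell above

def beta(t, x):
    return (t + 0.001) * np.diag(np.cos(x))  # diffusion, as in the cell above

def fbm_increments(n_steps, hurst, dim, rng, T=1.0):
    """Increments of `dim` independent fractional Brownian motions on a uniform grid."""
    times = np.linspace(T / n_steps, T, n_steps)  # grid without t = 0
    s, t = np.meshgrid(times, times, indexing="ij")
    cov = 0.5 * (s**(2 * hurst) + t**(2 * hurst) - np.abs(t - s)**(2 * hurst))
    chol = np.linalg.cholesky(cov)
    paths = chol @ rng.standard_normal((n_steps, dim))  # values at the grid times
    return np.diff(paths, axis=0, prepend=np.zeros((1, dim)))  # B_0 = 0

def euler_terminal_value(x0, n_steps, hurst, rng, T=1.0):
    """Euler approximation of X_T started at x0."""
    dt = T / n_steps
    dB = fbm_increments(n_steps, hurst, x0.shape[0], rng, T)
    x = x0.astype(float)
    for k in range(n_steps):
        t = k * dt
        x = x + alpha(t, x) * dt + beta(t, x) @ dB[k]
    return x

rng = np.random.default_rng(0)
x_T = euler_terminal_value(np.zeros(2), n_steps=10, hurst=0.75, rng=rng)
print(x_T)
```

Repeating the last call with fresh random draws gives Monte-Carlo samples of $X_1^x$, from which the probability of landing in a ball around $x$ can be estimated.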

#### Grid Hyperparameter(s)
- Ratio $\frac{\text{Testing Datasize}}{\text{Training Datasize}}$.
- Number of Training Points to Generate

In [30]:
train_test_ratio = .2
N_train_size = 10**1
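
As a quick sanity check of these two settings, the sketch below computes the implied number of test points, assuming the test size is simply the ratio times the training size, rounded to an integer (how the backend rounds is an assumption). The name `N_test_size` is hypothetical.

```python
train_test_ratio = 0.2
N_train_size = 10**1

# Assumption: test size = ratio * training size, rounded to the nearest integer.
N_test_size = int(round(train_test_ratio * N_train_size))
print(N_train_size, N_test_size)  # prints: 10 2
```

These sizes agree with the progress bars of length 10 and 2 in the output of the main run.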

Monte-Carlo Parameters

In [31]:
## Monte-Carlo
N_Monte_Carlo_Samples = 10**1

Initial radius of the $\delta$-bounded random partition of $\mathcal{X}$.

In [32]:
# Hyper-parameters of Cover
delta = 0.01
Proportion_per_cluster = .01

# Run Main:

In [33]:
print("------------------------------")
print("Running script for main model!")
print("------------------------------")
# %run Universal_Measure_Valued_Networks_Backend.ipynb
exec(open('Universal_Measure_Valued_Networks_Backend.py').read())

print("------------------------------------")
print("Done: Running script for main model!")
print("------------------------------------")

100%|██████████| 2/2 [00:00<00:00, 56.02it/s]
 70%|███████   | 7/10 [00:00<00:00, 66.90it/s]

------------------------------
Running script for main model!
------------------------------
Deep Feature Builder - Ready
Deep Classifier - Ready


100%|██████████| 10/10 [00:00<00:00, 65.12it/s]
100%|██████████| 2/2 [00:00<00:00, 59.32it/s]
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.


Deep Feature Builder - Ready
Deep Classifier - Ready
Training Classifer Portion of Type-A Model
Fitting 2 folds for each of 1 candidates, totalling 2 fits


[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    3.5s remaining:    0.0s
[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    3.5s finished


Epoch 50/50
Training Classifer Portion of Type Model: Done!


  0%|          | 0/10 [00:00<?, ?it/s]

#--------------------#
 Get Training Error(s)
#--------------------#


100%|██████████| 10/10 [00:00<00:00, 432.23it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

#-------------------------#
 Get Training Error(s): END
#-------------------------#
#----------------#
 Get Test Error(s)
#----------------#


100%|██████████| 2/2 [00:00<00:00, 585.84it/s]

#-------------------------#
 Get Training Error(s): END
#-------------------------#
------------------------------------
Done: Running script for main model!
------------------------------------





---
# Run: All Benchmarks

## 1) *Pointmass Benchmark(s)*
These benchmarks consist of subsets of $C(\mathbb{R}^d,\mathbb{R})$, which we lift to models in $C(\mathbb{R}^d,\cap_{1\leq q<\infty}\mathcal{P}_{q}(\mathbb{R}))$ via:
$$
\mathbb{R}^d \ni x \to f(x) \to \delta_{f(x)}\in \cap_{1\leq q<\infty}\mathcal{P}_{q}(\mathbb{R}).
$$
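
A point-mass prediction can be compared against Monte-Carlo samples with very little work: the only coupling between $\delta_{a}$ and any other measure is the product coupling, so $\mathcal{W}_1(\delta_a,\hat{\nu}) = \frac1{n}\sum_{i=1}^n |y_i - a|$ for an empirical measure $\hat{\nu}$ on samples $y_1,\dots,y_n$. The sketch below computes this quantity; `w1_dirac_vs_empirical` is a hypothetical helper, and whether the evaluation scripts compute their W1 metric in exactly this way is an assumption.

```python
import numpy as np

def w1_dirac_vs_empirical(point, samples):
    """Wasserstein-1 distance between delta_point and the empirical measure of samples."""
    samples = np.asarray(samples, dtype=float)
    return float(np.mean(np.abs(samples - point)))

samples = np.array([0.0, 1.0, 3.0])
dist = w1_dirac_vs_empirical(1.0, samples)
print(dist)  # (1 + 0 + 2) / 3 = 1.0
```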

In [34]:
exec(open('CV_Grid.py').read())
# Notebook Mode:
# %run Evaluation.ipynb
# %run Benchmarks_Model_Builder_Pointmass_Based.ipynb
# Terminal Mode (Default):
exec(open('Evaluation.py').read())
exec(open('Benchmarks_Model_Builder_Pointmass_Based.py').read())

Deep Feature Builder - Ready
--------------
Training: ENET
--------------


100%|██████████| 10/10 [00:00<00:00, 866.05it/s]
100%|██████████| 2/2 [00:00<00:00, 670.61it/s]
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Batch computation too fast (0.0363s.) Setting batch_size=2.
[Parallel(n_jobs=4)]: Done   2 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed:    0.1s finished
100%|██████████| 10/10 [00:00<00:00, 434.48it/s]
100%|██████████| 2/2 [00:00<00:00, 569.07it/s]

#------------#
 Get Error(s) 
#------------#
#-----------------#
 Get Error(s): END 
#-----------------#
#------------#
 Get Error(s) 
#------------#
#-----------------#
 Get Error(s): END 
#-----------------#
-----------------
Training: K-Ridge
-----------------
Fitting 2 folds for each of 2 candidates, totalling 4 fits
#------------#
 Get Error(s) 
#------------#
#-----------------#
 Get Error(s): END 
#-----------------#
#------------#
 Get Error(s) 
#------------#



[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Batch computation too fast (0.1690s.) Setting batch_size=2.


#-----------------#
 Get Error(s): END 
#-----------------#
--------------
Training: GBRF
--------------
Fitting 2 folds for each of 1 candidates, totalling 2 fits


[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    0.2s finished
100%|██████████| 10/10 [00:00<00:00, 915.33it/s]
100%|██████████| 2/2 [00:00<00:00, 673.84it/s]
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.


#------------#
 Get Error(s) 
#------------#
#-----------------#
 Get Error(s): END 
#-----------------#
#------------#
 Get Error(s) 
#------------#
#-----------------#
 Get Error(s): END 
#-----------------#
-------------
Training: DNN
-------------
Fitting 2 folds for each of 1 candidates, totalling 2 fits


[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    1.9s remaining:    0.0s
[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    1.9s finished


Epoch 50/50


100%|██████████| 10/10 [00:00<00:00, 600.14it/s]
100%|██████████| 2/2 [00:00<00:00, 466.45it/s]

#------------#
 Get Error(s) 
#------------#
#-----------------#
 Get Error(s): END 
#-----------------#
#------------#
 Get Error(s) 
#------------#
#-----------------#
 Get Error(s): END 
#-----------------#
-----------------------
Computing Error Metrics
-----------------------





# Summary of Point-Mass Regression Models

#### Training Model Facts

In [35]:
print(Summary_pred_Qual_models)
Summary_pred_Qual_models

          ENET    kRidge  GBRF      ffNN
W1         0.0  0.005669   0.0  1.090057
Mean       0.0  0.074936   0.0  1.044058
Var        0.0  0.000000   0.0  0.000000
Skewness   NaN       NaN   NaN       NaN
Ex_Kur     NaN       NaN   NaN       NaN


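| | ENET | kRidge | GBRF | ffNN |
|---|---|---|---|---|
| W1 | 0.0 | 0.005669 | 0.0 | 1.090057 |
| Mean | 0.0 | 0.074936 | 0.0 | 1.044058 |
| Var | 0.0 | 0.0 | 0.0 | 0.0 |
| Skewness | NaN | NaN | NaN | NaN |
| Ex_Kur | NaN | NaN | NaN | NaN |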


#### Testing Model Facts

In [36]:
print(Summary_pred_Qual_models_test)
Summary_pred_Qual_models_test

          ENET    kRidge  GBRF      ffNN
W1         0.0  0.004458   0.0  1.083733
Mean       0.0  0.066470   0.0  1.041023
Var        0.0  0.000000   0.0  0.000000
Skewness   NaN       NaN   NaN       NaN
Ex_Kur     NaN       NaN   NaN       NaN


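| | ENET | kRidge | GBRF | ffNN |
|---|---|---|---|---|
| W1 | 0.0 | 0.004458 | 0.0 | 1.083733 |
| Mean | 0.0 | 0.06647 | 0.0 | 1.041023 |
| Var | 0.0 | 0.0 | 0.0 | 0.0 |
| Skewness | NaN | NaN | NaN | NaN |
| Ex_Kur | NaN | NaN | NaN | NaN |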


#### Model Complexities

In [37]:
print(Summary_Complexity_models)
Summary_Complexity_models

        N_Params_Trainable  N_Params    T_Time  T_Test/T_test-MC
ENET                     4         4  8.540087          0.001699
GBRF                  1000      1000  0.372330          0.009013
kRidge                  10        10  0.165996          0.006692
ffNN                    81        81  3.580964          2.393632


| | N_Params_Trainable | N_Params | T_Time | T_Test/T_test-MC |
|---|---|---|---|---|
| ENET | 4 | 4 | 8.540087 | 0.001699 |
| GBRF | 1000 | 1000 | 0.37233 | 0.009013 |
| kRidge | 10 | 10 | 0.165996 | 0.006692 |
| ffNN | 81 | 81 | 3.580964 | 2.393632 |


## 2) *Gaussian Benchmarks*

- Benchmark 1: [Gaussian Process Regressor](https://scikit-learn.org/stable/modules/gaussian_process.html)
- Benchmark 2: Deep Gaussian Networks

These benchmarks train models which assume Gaussianity.  We may view them as models in $\mathcal{P}_2(\mathbb{R})$ via:
$$
\mathbb{R}^d \ni x \to (\hat{\mu}(x),\hat{\sigma}(x))\triangleq f(x) \in \mathbb{R}\times [0,\infty) \to \frac1{\hat{\sigma}(x)\sqrt{2\pi}}\exp\left(\frac{-(\cdot-\hat{\mu}(x))^2}{2\hat{\sigma}(x)^2}\right) \in \mathcal{G}_1\subset \mathcal{P}_2(\mathbb{R});
$$
where $\mathcal{G}_1$ is the set of Gaussian measures on $\mathbb{R}$ equipped with the relative Wasserstein-1 topology.

Examples of this type of architecture are especially prevalent in uncertainty quantification; see [Deep Ensembles](https://arxiv.org/abs/1612.01474) or [NOMU: Neural Optimization-based Model Uncertainty](https://arxiv.org/abs/2102.13640).  Moreover, their universality in $C(\mathbb{R}^d,\mathcal{G}_2)$ is known, and has been shown in [Corollary 4.7](https://arxiv.org/abs/2101.05390).
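
To make the map from predicted parameters to a measure concrete, the sketch below evaluates the Gaussian density with mean $\hat{\mu}(x)$ and standard deviation $\hat{\sigma}(x)>0$ on a grid and checks numerically that it integrates to one. The parameter values and the helper name `gaussian_density` are illustrative assumptions, not taken from the benchmark scripts.

```python
import math
import numpy as np

def gaussian_density(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y (requires sigma > 0)."""
    return np.exp(-(y - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical network output (mu_hat(x), sigma_hat(x)) at some input x.
mu_hat, sigma_hat = 0.5, 1.5

grid = np.linspace(-10.0, 10.0, 20001)
density = gaussian_density(grid, mu_hat, sigma_hat)
mass = float(np.sum(density) * (grid[1] - grid[0]))  # Riemann-sum approximation
print(mass)
```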

In [38]:
# %run Benchmarks_Model_Builder_Mean_Var.ipynb
exec(open('Benchmarks_Model_Builder_Mean_Var.py').read())

Deep Feature Builder - Ready
Deep Feature Builder - Ready
Fitting 2 folds for each of 2 candidates, totalling 4 fits


[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   1 tasks      | elapsed:    0.1s
[Parallel(n_jobs=4)]: Batch computation too fast (0.0966s.) Setting batch_size=2.
[Parallel(n_jobs=4)]: Done   2 out of   4 | elapsed:    0.1s remaining:    0.1s
[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed:    0.1s finished
  0%|          | 0/10 [00:00<?, ?it/s]

Infering Parameters for Deep Gaussian Network to train on!


100%|██████████| 10/10 [00:00<00:00, 14.29it/s]
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.


Done Getting Parameters for Deep Gaussian Network!
Training Deep Gaussian Network!
Fitting 2 folds for each of 1 candidates, totalling 2 fits


[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    1.4s remaining:    0.0s
[Parallel(n_jobs=4)]: Done   2 out of   2 | elapsed:    1.4s finished


Epoch 50/50


  0%|          | 0/10 [00:00<?, ?it/s]

Training Deep Gaussian Network!: END
#---------------------------------------#
 Get Training Errors for: Gaussian Models
#---------------------------------------#


100%|██████████| 10/10 [00:00<00:00, 1138.46it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

#-------------------------#
 Get Training Error(s): END
#-------------------------#
#---------------------------------------#
 Get Testing Errors for: Gaussian Models
#---------------------------------------#


100%|██████████| 2/2 [00:00<00:00, 770.73it/s]

#------------------------#
 Get Testing Error(s): END
#------------------------#
-------------------------------------------------
Updating Performance Metrics Dataframe and Saved!
-------------------------------------------------
------------------------------------------------
Updated Performance Metrics Dataframe and Saved!
------------------------------------------------
--------------------------------------------
Computing and Updating Complexity Metrics...
--------------------------------------------
-----------------------------------------------
Updated Complexity Metrics Dataframe and Saved!
-----------------------------------------------





In [39]:
print("Prediction Quality (Updated)")
print(Summary_pred_Qual_models_test)
Summary_pred_Qual_models_test

Prediction Quality (Updated)
          ENET    kRidge  GBRF      ffNN           GPR       DGN
W1         0.0  0.004458   0.0  1.083733  6.777927e-06  1.772334
Mean       0.0  0.066470   0.0  1.041023  1.356143e-11  0.978058
Var        0.0  0.000000   0.0  0.000000  1.356143e-11  1.022186
Skewness   NaN       NaN   NaN       NaN  0.000000e+00  0.000000
Ex_Kur     NaN       NaN   NaN       NaN           NaN       NaN


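| | ENET | kRidge | GBRF | ffNN | GPR | DGN |
|---|---|---|---|---|---|---|
| W1 | 0.0 | 0.004458 | 0.0 | 1.083733 | 6.777927e-06 | 1.772334 |
| Mean | 0.0 | 0.06647 | 0.0 | 1.041023 | 1.356143e-11 | 0.978058 |
| Var | 0.0 | 0.0 | 0.0 | 0.0 | 1.356143e-11 | 1.022186 |
| Skewness | NaN | NaN | NaN | NaN | 0.0 | 0.0 |
| Ex_Kur | NaN | NaN | NaN | NaN | NaN | NaN |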


In [40]:
print("Model Complexities Quality (Updated)")
print(Summary_Complexity_models)
Summary_Complexity_models

Model Complexities Quality (Updated)
        N_Params_Trainable  N_Params        T_Time  T_Test/T_test-MC
ENET                   4.0       4.0  8.540087e+00          0.001699
GBRF                1000.0    1000.0  3.723297e-01          0.009013
kRidge                10.0      10.0  1.659961e-01          0.006692
ffNN                  81.0      81.0  3.580964e+00          2.393632
GPR                    0.0       0.0  1.959026e-01          0.006910
DGN                   81.0      81.0  1.619552e+09          1.205454


| | N_Params_Trainable | N_Params | T_Time | T_Test/T_test-MC |
|---|---|---|---|---|
| ENET | 4.0 | 4.0 | 8.540087 | 0.001699 |
| GBRF | 1000.0 | 1000.0 | 0.3723297 | 0.009013 |
| kRidge | 10.0 | 10.0 | 0.1659961 | 0.006692 |
| ffNN | 81.0 | 81.0 | 3.580964 | 2.393632 |
| GPR | 0.0 | 0.0 | 0.1959026 | 0.00691 |
| DGN | 81.0 | 81.0 | 1619552000.0 | 1.205454 |


## 3) The natural Universal Benchmark: [Bishop's Mixture Density Network](https://publications.aston.ac.uk/id/eprint/373/1/NCRG_94_004.pdf)

This implementation is as follows:
- For every $x$ in the training dataset, we fit a GMM $\hat{\nu}_x$ using the [Expectation-Maximization (EM) algorithm](https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm), with the same number of centers as the deep neural model in $\mathcal{NN}_{1_{\mathbb{R}^d},\mathcal{D}}^{\sigma:\star}$ which we are evaluating.
- A mixture density network is then trained to predict the inferred parameters, given any $x \in \mathbb{R}^d$.
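
The first step above can be sketched as follows for one-dimensional samples. This is a from-scratch EM loop written only for illustration; the actual script may rely on a library implementation instead. The helper `fit_gmm_1d`, its quantile-based initialization, and the synthetic two-cluster data are all assumptions.

```python
import numpy as np

def fit_gmm_1d(samples, n_components, n_iter=100):
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    y = np.asarray(samples, dtype=float)
    n = y.shape[0]
    # Initialization (assumption): means at evenly spaced quantiles of the data.
    quantiles = (np.arange(n_components) + 0.5) / n_components
    means = np.quantile(y, quantiles)
    variances = np.full(n_components, np.var(y) + 1e-6)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log-space for stability.
        log_p = (-0.5 * (y[:, None] - means[None, :]) ** 2 / variances[None, :]
                 - 0.5 * np.log(2.0 * np.pi * variances[None, :])
                 + np.log(weights[None, :]))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp * y[:, None]).sum(axis=0) / nk
        variances = (resp * (y[:, None] - means[None, :]) ** 2).sum(axis=0) / nk + 1e-6
    return weights, means, variances

rng = np.random.default_rng(0)
# Synthetic Monte-Carlo samples at one input x: two well-separated clusters.
y_samples = np.concatenate([rng.normal(-5.0, 0.5, 500), rng.normal(5.0, 0.5, 500)])
weights, means, variances = fit_gmm_1d(y_samples, n_components=2)
print(weights, means, variances)
```

The fitted `(weights, means, variances)` at each training input would then serve as regression targets for the mixture density network in the second step.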

In [None]:
# %run Mixture_Density_Network.ipynb
exec(open('Mixture_Density_Network.py').read())

  0%|          | 0/10 [00:00<?, ?it/s]

Preparing Training Outputs for MDNs using EM-Algorithm


 20%|██        | 2/10 [00:00<00:00, 11.11it/s]

## Get Final Outputs
Now we piece together all the numerical experiments and report a nice summary.

In [None]:
# %run WrapUp_Summarizer.ipynb
exec(open('WrapUp_Summarizer.py').read())

# Result(s)

## Model Complexities

In [None]:
Summary_Complexity_models

## Prediction Quality

#### Training

In [None]:
PredictivePerformance_Metrics_Train

#### Test

In [None]:
PredictivePerformance_Metrics_Test

# For Terminal Runner(s):

In [None]:
# For Terminal Running
print(" ")
print(" ")
print(" ")
print("====================================")
print("Model Complexity Predictive Quality:")
print("====================================")
print(" ")
print(" ")
print(" ")
print(Summary_Complexity_models)
print(" ")
print(" ")
print(" ")
print("============================")
print("Training Predictive Quality:")
print("============================")
print(PredictivePerformance_Metrics_Train)
print(" ")
print(" ")
print(" ")
print("===========================")
print("Testing Predictive Quality:")
print("===========================")
print(PredictivePerformance_Metrics_Test)
print("================================")
print(" ")
print(" ")
print(" ")
print("Kernel_Used_in_GPR: "+str(GPR_trash.kernel))
print("🙃🙃 Have a wonderful day! 🙃🙃")

---
# Fin
---

---