# Tutorial 5 : GNP [WIP]

Last Update : 22 June 2019

**Aim**: 


In [1]:
N_THREADS = 8
# Nota Bene : notebooks don't deallocate GPU memory
IS_FORCE_CPU = False # can also be set in the trainer

## Environment

In [2]:
cd ..

/master


In [3]:
%autosave 600
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

# CENTER PLOTS
from IPython.core.display import HTML
display(HTML(""" <style> .output_png {display: table-cell; text-align: center; margin:auto; }
.prompt {display:none;}  </style>"""))

import os
if IS_FORCE_CPU:
    os.environ['CUDA_VISIBLE_DEVICES'] = ""
    
import sys
sys.path.append("notebooks")

import numpy as np
import matplotlib.pyplot as plt
import torch
torch.set_num_threads(N_THREADS)

Autosaving every 600 seconds


## Dataset

The datasets we will be using are simple functions sampled from Gaussian processes with different kernels. See [Tutorial 1 - Conditional Neural Process] for more details.

[Tutorial 1 - Conditional Neural Process]: Tutorial%201%20-%20Conditional%20Neural%20Process.ipynb
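
As a rough picture of what such a dataset contains, the sketch below draws a few functions from a zero-mean Gaussian process with an RBF kernel using NumPy. It is not the implementation behind `get_gp_datasets`: the helper names `rbf_kernel` and `sample_gp_functions`, the input range $(-2, 2)$, the length scale and the jitter term are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(x, length_scale=1.0):
    """Squared-exponential (RBF) covariance matrix for 1D inputs x."""
    diff = x[:, None] - x[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def sample_gp_functions(n_samples, n_points, length_scale=1.0, seed=0):
    """Hypothetical helper: sample functions from a zero-mean GP prior."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-2.0, 2.0, n_points)  # assumed input range
    # Small jitter on the diagonal keeps the Cholesky factorization stable.
    cov = rbf_kernel(x, length_scale) + 1e-6 * np.eye(n_points)
    chol = np.linalg.cholesky(cov)
    # If z ~ N(0, I) then chol @ z ~ N(0, cov).
    z = rng.standard_normal((n_points, n_samples))
    y = (chol @ z).T  # shape (n_samples, n_points)
    return x, y


x, y = sample_gp_functions(n_samples=5, n_points=128)
print(x.shape, y.shape)
```

Swapping `rbf_kernel` for another covariance function (periodic, Matern, ...) would give the other function families under the same assumptions.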

In [4]:
from utils.visualize import plot_posterior_samples, plot_prior_samples, plot_dataset_samples
from ntbks_helpers import get_gp_datasets # defined in first tutorial (CNP)

X_DIM = 1  # 1D spatial input
Y_DIM = 1  # 1D regression
N_POINTS = 128
N_SAMPLES = 100000 # this is a lot and can work with less
datasets = get_gp_datasets(n_samples=N_SAMPLES, n_points=N_POINTS)

## Model


In [5]:
from neuralproc import GlobalNeuralProcess, merge_flat_input, discard_ith_arg
from ntbks_helpers import CNP_KWARGS # defined in first tutorial (CNP)
from neuralproc.utils.datasplit import CntxtTrgtGetter, GetRandomIndcs
from neuralproc.predefined import CNN, MLP
from neuralproc.utils.helpers import change_param

get_cntxt_trgt = CntxtTrgtGetter(contexts_getter=GetRandomIndcs(min_n_indcs=0.05, max_n_indcs=.5),
                                 targets_getter=GetRandomIndcs(min_n_indcs=0.05, max_n_indcs=.5),
                                 is_add_cntxts_to_trgts=False)  # don't add context points to targets

gnp_kwargs = dict(r_dim=64, 
                  get_cntxt_trgt=get_cntxt_trgt,
                  #keys_to_tmp_attn="weighted_dist",
                 TmpSelfAttn=change_param(CNN,
                              Conv=torch.nn.Conv1d,
                              n_layers=5,
                              is_depth_separable=True,
                              Normalization=torch.nn.BatchNorm1d,
                              is_chan_last=True,
                              kernel_size=7),
                 #tmp_to_queries_attn="weighted_dist",
                  #XEncoder=torch.nn.Linear,
                  #XYEncoder=discard_ith_arg(MLP, i=0),
                  #x_transf_dim=None,
                 #is_use_x=False
                 )

# initialize one model for each dataset
data_models = {name: (GlobalNeuralProcess(X_DIM, Y_DIM, **gnp_kwargs), data) 
                   for name, data in datasets.items()}
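
The context/target split configured above can be pictured with the following standard-library sketch. It assumes that the float arguments `min_n_indcs=0.05` and `max_n_indcs=.5` are interpreted as fractions of the number of points, and that context and target indices are drawn independently; neither is confirmed by the code shown here, and `sample_indices` and `split_context_target` are hypothetical names, not part of `neuralproc`.

```python
import math
import random


def sample_indices(n_points, min_frac=0.05, max_frac=0.5, rng=random):
    """Hypothetical helper: pick a random subset of point indices.

    The subset size is uniform between min_frac and max_frac of n_points
    (an assumption about how fractional bounds are interpreted).
    """
    n_min = max(1, math.ceil(min_frac * n_points))
    n_max = max(n_min, math.floor(max_frac * n_points))
    n_selected = rng.randint(n_min, n_max)  # inclusive on both ends
    return sorted(rng.sample(range(n_points), n_selected))


def split_context_target(n_points, rng=random):
    """Draw context and target indices independently.

    Context points are not forced into the target set, mirroring
    is_add_cntxts_to_trgts=False (they may still overlap by chance).
    """
    context = sample_indices(n_points, rng=rng)
    target = sample_indices(n_points, rng=rng)
    return context, target


rng = random.Random(0)
context, target = split_context_target(128, rng=rng)
print(len(context), len(target))
```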

### N Param

Number of parameters (note that I did not tune this much, and it depends a lot on the representation size):

In [6]:
from utils.helpers import count_parameters

In [7]:
for k, (neural_proc, dataset) in data_models.items():
    print("N Param:", count_parameters(neural_proc))
    break

N Param: 28936


Using `"transformer"` attention increases the number of parameters, but using a deterministic path as well as a smaller representation size decreases it.
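
For intuition on how the `is_depth_separable=True` setting in the model cell affects the count, the sketch below compares the parameters of a standard 1D convolution with those of a depthwise separable one. It assumes a depthwise convolution followed by a 1x1 pointwise convolution, both with bias, and 64 input and output channels with `kernel_size=7`. The actual layers inside `CNN` (normalization, any residual connections, ...) are not shown in this tutorial, so these numbers are not meant to reproduce the total printed above.

```python
def conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Parameters of a standard 1D convolution."""
    return c_in * c_out * kernel_size + (c_out if bias else 0)


def depth_separable_conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Parameters of a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = c_in * kernel_size + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Assumed layer shape: 64 channels in and out, kernel size 7.
standard = conv1d_params(64, 64, 7)
separable = depth_separable_conv1d_params(64, 64, 7)
print(standard, separable)
```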

## Training

In [8]:
import torch 
torch.autograd.set_detect_anomaly(True)

<torch.autograd.anomaly_mode.set_detect_anomaly at 0x7fde1ce05518>

In [9]:
torch.nn.functional.softplus(torch.tensor([-10.]))

tensor([4.5399e-05])
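
The cell above evaluates the softplus function, $\log(1 + e^x)$, at $x = -10$. A minimal dependency-free sketch of the same computation, written in a numerically stable form, is given below; it is an illustration and not how PyTorch implements it internally.

```python
import math


def softplus(x):
    """Numerically stable softplus: log(1 + exp(x))."""
    # max(x, 0) + log1p(exp(-|x|)) avoids overflow for large positive x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


print(softplus(-10.0))
```

For very negative inputs softplus behaves like $e^x$, which is why the result is small but strictly positive.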

In [None]:
from ntbks_helpers import train_all_models_

train_all_models_(data_models, "results/notebooks/neural_process/gnp",
                  is_retrain=True) # if false load precomputed


--- Training rbf ---

  epoch    train_loss    cp       dur
-------  ------------  ----  --------
      1        0.4660     +  449.2805

The model converges extremely quickly ($\sim 15$ epochs) and already gives very good predictions after $\sim 5$ epochs.

# Inference

## Trained Prior

In [None]:
EXTRAP_DISTANCE = 1.5  # add 1.5 to the right for extrapolation
# `dataset` is still bound from the parameter-count loop above (first dataset)
INTERPOLATION_RANGE = dataset.min_max
EXTRAPOLATION_RANGE = (dataset.min_max[0], dataset.min_max[1] + EXTRAP_DISTANCE)

In [None]:
for k,(neural_proc, dataset) in data_models.items():
    plot_prior_samples(neural_proc, 
                       title="Trained Prior Samples : {}".format(k), 
                       test_min_max=EXTRAPOLATION_RANGE, 
                       train_min_max=INTERPOLATION_RANGE)

## Posterior

In [None]:
for k,(neural_proc, dataset) in data_models.items():
    break

In [None]:
from neuralproc.utils.helpers import rescale_range

for k,(neural_proc, dataset) in data_models.items():
    extrap_rescaled_range = tuple(rescale_range(np.array(EXTRAPOLATION_RANGE), (-2,2), (-1,1)))
    neural_proc.extend_tmp_queries(extrap_rescaled_range)
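
The rescaling step above can be pictured as a linear map from the data range $(-2, 2)$ to the range $(-1, 1)$. The sketch below assumes that `rescale_range` is such a linear map and that `dataset.min_max` is $(-2, 2)$; neither is shown in this tutorial, and `rescale_linear` is a hypothetical stand-in, not the library function.

```python
def rescale_linear(value, old_range, new_range):
    """Hypothetical stand-in: linearly map value from old_range to new_range."""
    old_min, old_max = old_range
    new_min, new_max = new_range
    fraction = (value - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)


# Assumed: min_max = (-2, 2), extended by EXTRAP_DISTANCE = 1.5 on the right.
extrapolation_range = (-2.0, 3.5)
rescaled = tuple(rescale_linear(v, (-2, 2), (-1, 1)) for v in extrapolation_range)
print(rescaled)
```

Under these assumptions the upper bound lands outside $(-1, 1)$, which is consistent with having to extend the queries explicitly before plotting in the extrapolation regime.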

In [None]:
N_CNTXT = 10
for k,(neural_proc, dataset) in data_models.items():
    plot_posterior_samples(dataset, neural_proc, 
                           n_cntxt=N_CNTXT, 
                           test_min_max=EXTRAPOLATION_RANGE, 
                           n_points=3*N_POINTS,
                           n_samples=1,
                           title="Posterior Samples Conditioned on {} Context Points : {}".format(N_CNTXT, k))

In [None]:
N_CNTXT = 2
for k,(neural_proc, dataset) in data_models.items():
    plot_posterior_samples(dataset, neural_proc, 
                           n_cntxt=N_CNTXT, 
                           test_min_max=EXTRAPOLATION_RANGE, 
                           n_points=2*N_POINTS,
                           n_samples=1,
                           title="Posterior Samples Conditioned on {} Context Points : {}".format(N_CNTXT, k))

In [None]:
N_CNTXT = 20
for k,(neural_proc, dataset) in data_models.items():
    plot_posterior_samples(dataset, neural_proc, 
                           n_cntxt=N_CNTXT, 
                           test_min_max=EXTRAPOLATION_RANGE, 
                           n_points=2*N_POINTS,
                           n_samples=1,
                           title="Posterior Samples Conditioned on {} Context Points : {}".format(N_CNTXT, k))

In [None]:
N_CNTXT = 1
for k,(neural_proc, dataset) in data_models.items():
    plot_posterior_samples(dataset, neural_proc, 
                           n_cntxt=N_CNTXT, 
                           test_min_max=EXTRAPOLATION_RANGE, 
                           n_points=2*N_POINTS,
                           n_samples=1,
                           title="Posterior Samples Conditioned on {} Context Points : {}".format(N_CNTXT, k))

We see that the predictions are much better than in [Tutorial 2 - Neural Process].

**Good**:
- often close to GP with the correct kernel
- the uncertainty decreases close to context points
- no more underfitting : the sampled functions all go through or close to the context points
- does all of this with "only" $\sim 29$k parameters, as counted above (and I did not try to go below).
- very good results after $\sim 5$ epochs

**Bad**:
- there seem to be some strange "jumps" in regions far from context points. This is probably due to the softmax in cross attention, indicating that a head starts attending to a new point. This makes the model less smooth than a GP, but could probably be solved using self attention or a larger model.
- cannot extrapolate
- still not as smooth as GP
- not good at periodicity
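
The softmax explanation for the "jumps" in the first bullet can be illustrated with a toy example. The sketch below is not the attention used in the model: it assumes a single head whose logits are the negative scaled distance between the query and each context location, with a hypothetical `sharpness` factor. When the logits are sharp, the output switches abruptly from one context value to the other as the query crosses the midpoint.

```python
import math


def softmax(logits):
    """Softmax with max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values, sharpness):
    """Toy single-head attention with distance-based logits (an assumption)."""
    logits = [-sharpness * abs(query - k) for k in keys]
    weights = softmax(logits)
    return sum(w * v for w, v in zip(weights, values))


keys = [-1.0, 1.0]   # context locations
values = [0.0, 1.0]  # context values
for q in (-0.2, -0.05, 0.05, 0.2):
    print(q, round(attend(q, keys, values, sharpness=50.0), 3))
```

With a smaller `sharpness` the transition is gradual, which matches the intuition that smoother attention weights would give smoother predictions.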

[Tutorial 2 - Neural Process]: Tutorial%202%20-%20Neural%20Process.ipynb