# This is the `Demo.ipynb` file

I will demonstrate how to use my classes and functions.

In [1]:
import random
import torch
import numpy as np
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
seed = 1
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)

# The `UCIDatasets` loader in `/utils/dataframe.py`

`reference: https://gist.github.com/martinferianc/db7615c85d5a3a71242b4916ea6a14a2`

Note that the outputs `train` and `test` are PyTorch `Dataset` objects.

In [2]:
from utils.dataframe import UCIDatasets, datalist
print(datalist)
data = UCIDatasets("housing")
train = data.get_split( load="train") #pytorch.dataset
test = data.get_split( load="test")

torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
train_loader = data.get_split_dataloader(load = "train", batch_size = 16) #pytorch.dataloader

['housing', 'concrete', 'energy', 'power', 'redwine', 'whitewine', 'yacht']


# The `imputer` in `model/imputer.py`.

I have assembled several common imputation methods; for details, see my code and the documentation of `sklearn.impute`.

I have set default parameters. Change them via the `par_setting( par_dict )` function.

The input dictionary should have the same structure as the one returned by `get_Parvalue()`.
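A minimal sketch of the kind of update `par_setting( par_dict )` is expected to perform, assuming it merges the supplied keys into the defaults and leaves the rest untouched. The helper `merge_params` and its unknown-key check are hypothetical and are not taken from `model/imputer.py`; only the default values are copied from the `get_Parvalue()` output in this notebook.

```python
# Defaults copied from the get_Parvalue() output shown in this notebook.
DEFAULT_PARAMS = {
    'missing_values': float('nan'), 'max_iter': 10, 'random_state': 0,
    'n_estimators': 100, 'n_neighbors': 3, 'metric': 'nan_euclidean',
}

def merge_params(defaults, par_dict):
    """Hypothetical helper: return a copy of `defaults` updated with `par_dict`."""
    unknown = set(par_dict) - set(defaults)
    if unknown:
        # Failing early catches typos such as 'max_iters'.
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    merged = dict(defaults)  # copy, so the defaults stay unchanged
    merged.update(par_dict)
    return merged

params = merge_params(DEFAULT_PARAMS, {'max_iter': 20, 'n_neighbors': 5})
print(params['max_iter'], params['n_neighbors'], params['n_estimators'])
```

Passing only the keys to change, as in `{ 'max_iter': 20 }`, keeps every other setting at its default.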

In [3]:
from model.imputer import imputer, method_list
import numpy as np
print(method_list)
X = np.random.rand(100*20).reshape(100,20)
X[51:, 11:] = np.nan

impObject = imputer(X, method = 'mice')
impObject.train()
imp = impObject.imp
imp.transform(X).shape 
print('\n\nSet up parameters via impObject.par_setting( par_dict ) and retrain the model, \
    the par_dict should be like the following structure:\n {} \n'.format(impObject.get_Parvalue()))
print('for instance: impObject.par_setting( { \'max_iter\': 10 } ) ')
print('\nEach par corresponding to method is as the following:')
impObject.get_Parlist()
print('where SimpleImputer includes [\'mean\', \'median\', \'most_frequent\']')


['mean', 'median', 'most_frequent', 'mice', 'missForest', 'knn']


Set up parameters via impObject.par_setting( par_dict ) and retrain the model,     the par_dict should be like the following structure:
 {'missing_values': nan, 'max_iter': 10, 'random_state': 0, 'n_estimators': 100, 'n_neighbors': 3, 'metric': 'nan_euclidean'} 

for instance: impObject.par_setting( { 'max_iter': 10 } ) 

Each par corresponding to method is as the following:
where SimpleImputer includes ['mean', 'median', 'most_frequent']


# This part is for `MIWAE` in `/model/MIWAE.py` and `trainer` in `/utils/trainer.py`.

`http://proceedings.mlr.press/v97/mattei19a/mattei19a.pdf (ICML, 2019)`.

`MIWAE` is a PyTorch model.

You can use `trainer` to train it.

Loss functions have the form `loss(self, outdic, indic)`, where `outdic` is the model output and `indic` is the model input.

For `MIWAE`,

`indic = {'x': x , 'm': m }`, where `x` is the data and `m` is the missing indicator.

`outdic = {'lpxz': lpxz , 'lqzx': lqzx , 'lpz': lpz }` holds the `l`og-probability terms for `p`(`x` | `z`), `q`(`z` | `x`), and `p`(`z`).

For the MIWAE loss `self.MIWAE_ELBO(outdic, indic = None)`, `indic` is not required.
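To make the role of the three terms concrete, here is a framework-free sketch of the importance-weighted bound from the MIWAE paper for a single data point, assuming each of `lpxz`, `lqzx`, and `lpz` holds one value per importance sample. The function name `miwae_bound` is illustrative and is not the signature of `MIWAE_ELBO`; in PyTorch the same reduction is usually written with `torch.logsumexp`.

```python
import math

def miwae_bound(lpxz, lqzx, lpz):
    """Importance-weighted bound for one data point.

    Each argument is a list with one log-density value per importance sample.
    """
    # Log importance weights: log p(x|z_k) + log p(z_k) - log q(z_k|x).
    log_w = [a + c - b for a, b, c in zip(lpxz, lqzx, lpz)]
    # Numerically stable log-mean-exp over the K samples.
    m = max(log_w)
    log_sum = m + math.log(sum(math.exp(w - m) for w in log_w))
    return log_sum - math.log(len(log_w))

# Toy values for K = 3 samples (made up for illustration).
bound = miwae_bound(lpxz=[-12.0, -11.5, -13.0],
                    lqzx=[-3.0, -2.5, -3.5],
                    lpz=[-4.0, -4.2, -3.9])
loss = -bound  # a trainer would minimise the negative bound
```

With a single sample the bound reduces to the ordinary ELBO estimate; with more samples it lies between the average and the maximum of the log weights.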


In [4]:
from model.MIWAE import MIWAE
from utils.trainer import VAE_trainer
from utils.dataframe import UCIDatasets, datalist
data = UCIDatasets("whitewine")

torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
train_loader = data.get_split_dataloader(load = "train", batch_size = 16) #pytorch.dataloader
test_loader = data.get_split_dataloader(load = "test", batch_size = 16) #pytorch.dataloader
model = MIWAE(data_dim = 20, n_samples=5, permutation_invariance=True)
trainer = VAE_trainer(model = model, train_loader = train_loader, test_loader = test_loader)
trainer.model_summary()

Es torch.Size([16, 20, 20])
Esx torch.Size([16, 20, 21])
Esxr torch.Size([320, 21])
h torch.Size([320, 20])
hr torch.Size([16, 20, 20])
hz torch.Size([16, 20, 20])
g torch.Size([16, 20])
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                   [-1, 20]             440
            Linear-2                  [-1, 100]           2,100
              Tanh-3                  [-1, 100]               0
            Linear-4                  [-1, 100]          10,100
              Tanh-5                  [-1, 100]               0
            Linear-6                   [-1, 50]           5,050
            Linear-7                   [-1, 50]           5,050
            Linear-8               [-1, 5, 100]           5,100
            Linear-9               [-1, 5, 100]          10,100
           Linear-10                [-1, 5, 20]           2,020
           Linear-11                [-1, 5, 

# A simple example

Here is a simple example that uses the `VAE_trainer` class from `/utils/trainer.py` to train a `VAE` on `MNIST`.

First, we set some hyperparameters.

Then we load the training and validation data as data loaders (the same structure as `train_loader`) and pass them to the trainer.

The trainer automatically uses `VAE_loss` for a `VAE` model.

I will use this trainer for all `VAE`-like models in this project.
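For reference, assuming `VAE_loss` is the standard negative ELBO with a Bernoulli decoder, a diagonal Gaussian encoder, and a standard normal prior (an assumption that has not been checked against `/utils/trainer.py`), the per-example loss would be:

$$
\mathcal{L}(x) = -\,\mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big] + \mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big),
\qquad
\mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big) = -\tfrac{1}{2}\sum_{j}\big(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\big)
$$

Here $\mu_j$ and $\sigma_j^2$ are the encoder mean and variance for latent dimension $j$.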

In [5]:
import os
import torch
import torch.utils.data
import random
import pathlib
import gc
import numpy as np
from torchvision import datasets, transforms
from model.VAE import VAE
from utils.trainer import VAE_trainer
from utils.experiment import fs_setup, check_training_file

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
seed = 1
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
######################################
# settings
batch_size=128
max_epochs=52
no_cuda = False
seed=1
log_interval=10
cuda = not no_cuda and torch.cuda.is_available()
torch.manual_seed(seed)
device = torch.device("cuda" if cuda else "cpu")
"""
Setup directory of Demo
"""
config = {'batch_size': batch_size }
experiment_dir = fs_setup('Demo', seed, config)
expr_file = experiment_dir / 'VAE.npz'
check_point = experiment_dir / 'VAE_ckpt.pth'
###############
# Read history loss file if exist
###############
train_file = check_training_file(expr_file)
history, start_epoch = train_file['history'], train_file['start_epoch']
if start_epoch >= max_epochs:
    print('skipping {} (seed={})   start_epoch({}), num_of_epoch({})'.format('Demo', seed, start_epoch, max_epochs))
del train_file
gc.collect()
###############
# set kwargs for loading checkpoint if exist
###############
train_kwargs = {
    'check_point': check_point, 'expr_file': expr_file, 'start_epoch': start_epoch, 'history': history,
    } 
# end settings
######################################
"""
Make data loader
"""
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=batch_size, shuffle=True, num_workers = 1, pin_memory = True )

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=False, transform=transforms.ToTensor()),
    batch_size=batch_size, shuffle=True, num_workers = 1, pin_memory = True)
"""
Train model
"""
model = VAE()
trainer = VAE_trainer(model = model, train_loader = train_loader, test_loader= test_loader, 
            **config, **train_kwargs,)
trainer.train(max_epochs = max_epochs)

====> Epoch: 51 Average loss: 101.6453
====> Test set loss: 101.7710
====> Epoch: 52 Average loss: 101.6796
====> Test set loss: 101.8767


# Reproduce the experiment in the not-MIWAE paper

I will follow the settings in the paper https://arxiv.org/pdf/2006.12871.pdf.

The experiment code is in `exp_imputation` (see `utils/experiment.py`).

Use `UCIDatasets` to process the data.

Use `imputer` to train the `mean`, `missForest`, and `mice` baselines.

Use `VAE_trainer` to train `miwae` and `notmiwae`.

Checkpoints are stored every 50 epochs and include `model.state_dict()` and `optimizer.state_dict()`.
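The evaluation idea can be illustrated without any of the project classes. The sketch below assumes a self-masking amputation in which values above the feature mean are removed from the first half of the features, and it scores an imputation by the RMSE over the removed entries only. Both choices are assumptions about the referenced setup rather than a copy of `exp_imputation`, and the helper names are hypothetical.

```python
import math

def self_mask(X):
    """Return a mask (1 = observed, 0 = missing): in the first half of the
    columns, values above the column mean are marked missing."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[0 if (j < d // 2 and row[j] > means[j]) else 1
             for j in range(d)] for row in X]

def mean_impute(X, mask):
    """Fill missing entries with the mean of the observed values per column."""
    n, d = len(X), len(X[0])
    filled = [list(row) for row in X]
    for j in range(d):
        obs = [X[i][j] for i in range(n) if mask[i][j] == 1]
        col_mean = sum(obs) / len(obs)
        for i in range(n):
            if mask[i][j] == 0:
                filled[i][j] = col_mean
    return filled

def imputation_rmse(X_true, X_imputed, mask):
    """RMSE over the missing entries only (mask == 0)."""
    sq = [(X_true[i][j] - X_imputed[i][j]) ** 2
          for i in range(len(X_true)) for j in range(len(X_true[0]))
          if mask[i][j] == 0]
    return math.sqrt(sum(sq) / len(sq))

# Toy data: 4 rows, 2 features; only the first feature is amputed.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [6.0, 40.0]]
mask = self_mask(X)
rmse = imputation_rmse(X, mean_impute(X, mask), mask)
print(mask, round(rmse, 4))
```

In this toy case the only removed value is the largest one, so the observed mean underestimates it. This is the kind of bias that a model of the missing process is meant to correct.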

In [6]:
"""
Use the MIWAE and not-MIWAE on UCI data
"""
import random
import torch
import numpy as np
import os
#import sys
from utils.dataframe import UCIDatasets
from utils.experiment import exp_imputation
#sys.path.append(os.getcwd())
"""
Follow the amputation setting and data settings in https://github.com/nbip/notMIWAE/blob/master/task01.py
"""
# ---- data settings
name = 'whitewine'
n_hidden = 128
n_samples = 20
max_iter = 100000
batch_size = 16
impute_sample = 10000

###   the missing model   ###
# mprocess = 'linear'
# mprocess = 'selfmasking'
mprocess = 'selfmasking_known'

# ---- number of runs
runs = 1
RMSE_result = dict()
methods = ['miwae','notmiwae','mean','mice','RF']
for method in methods:
    RMSE_result[method] = []
"""
load data: white wine
"""
data = UCIDatasets(name=name)
N, D = data.N, data.D
dl = D - 1
optim_kwargs = {'lr': 0.0001, 'betas': (0.9, 0.999), 'eps': 1e-08 }
MIWAE_kwargs = {
    'data_dim': D, 'z_dim': dl, 'h_dim': n_hidden, 'n_samples': n_samples
    }
notMIWAE_kwargs = {
    'data_dim': D, 'z_dim': dl, 'h_dim': n_hidden, 'n_samples': n_samples, 'missing_process': mprocess
    }
data_kwargs = {
    'batch_size': batch_size
    }
imputer_par = {
    'missing_values': np.nan, 'max_iter': 10, 'random_state': 0, 'n_estimators': 100, 'n_neighbors': 3, 'metric': 'nan_euclidean'
    }
exp_kwargs = {
    'dataset':name, 'runs':runs, 'seed': seed, 'impute_sample': impute_sample,
}
config = {
    'exp_kwargs': exp_kwargs, 'optim_kwargs': optim_kwargs,
    'MIWAE_kwargs': MIWAE_kwargs, 'notMIWAE_kwargs': notMIWAE_kwargs,
    'data_kwargs': data_kwargs, 'imputer_par': imputer_par,
    }
RMSE_result = exp_imputation( 'exp_imputation', model_list = ['miwae', 'notmiwae'], config = config, num_of_epoch = max_iter)

print("RMSE_miwae = {0:.5f} +- {1:.5f}".format(np.mean(RMSE_result['miwae']), np.std(RMSE_result['miwae'])))
print("RMSE_notmiwae = {0:.5f} +- {1:.5f}".format(np.mean(RMSE_result['notmiwae']), np.std(RMSE_result['notmiwae'])))
print("RMSE_mean = {0:.5f} +- {1:.5f}".format(np.mean(RMSE_result['mean']), np.std(RMSE_result['mean'])))
print("RMSE_mice = {0:.5f} +- {1:.5f}".format(np.mean(RMSE_result['mice']), np.std(RMSE_result['mice'])))
print("RMSE_missForest = {0:.5f} +- {1:.5f}".format(np.mean(RMSE_result['RF']), np.std(RMSE_result['RF'])))

====> Epoch: 10151 Average loss: -22.1649
====> Test set loss: 23.5346
====> Epoch: 10152 Average loss: -22.5333
====> Test set loss: 21.5107
====> Epoch: 10153 Average loss: -20.6052
====> Test set loss: 22.6985
====> Epoch: 10154 Average loss: -21.4098
====> Test set loss: 23.0525
====> Epoch: 10155 Average loss: -22.1051
====> Test set loss: 22.0633
====> Epoch: 10156 Average loss: -21.6142
====> Test set loss: 22.1638
====> Epoch: 10157 Average loss: -21.9231
====> Test set loss: 23.2571
====> Epoch: 10158 Average loss: -22.0138
====> Test set loss: 23.8517
====> Epoch: 10159 Average loss: -22.2801
====> Test set loss: 22.8445
====> Epoch: 10160 Average loss: -21.5979
====> Test set loss: 23.6451
====> Epoch: 10161 Average loss: -21.9118
====> Test set loss: 23.1715
====> Epoch: 10162 Average loss: -21.7351
====> Test set loss: 22.5823
====> Epoch: 10163 Average loss: -19.3785
====> Test set loss: 18.5539
====> Epoch: 10164 Average loss: -20.6302
====> Test set loss: 22.7602
====> 

ValueError: Expected parameter loc (Tensor of shape (16, 10)) of distribution Normal(loc: torch.Size([16, 10]), scale: torch.Size([16, 10])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]],
       grad_fn=<AddmmBackward0>)

# Some issues here

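I follow all settings from the paper; the main difference is that I use PyTorch, whereas the original implementation uses TensorFlow 1.

The encoder seems to have a problem that still needs fixing: training eventually fails with NaN values in the `loc` of a `Normal` distribution (see the `ValueError` above).

The original implementation uses a clip operation as the activation function for the std in the encoder.

Perhaps `torch.clamp` behaves slightly differently in a way I have not noticed.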
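Two properties of `torch.clamp` are worth ruling out when comparing against the TensorFlow code, and the sketch below checks them on toy tensors. It is a diagnostic under assumptions, not a fix: the clamp range of [-10, 10], the variable names, and the `check_finite` helper are illustrative and are not taken from `/model/MIWAE.py`.

```python
import torch

# 1) Outside the clamp range the gradient is zero, so a value that has
#    drifted past the bound receives no gradient through the clamp.
x = torch.tensor([20.0, 0.5], requires_grad=True)
torch.clamp(x, min=-10.0, max=10.0).sum().backward()
grad = x.grad.tolist()  # 0 for the clamped entry, 1 for the in-range entry

# 2) clamp does not remove NaN: a NaN input stays NaN.
nan_stays = torch.isnan(
    torch.clamp(torch.tensor(float('nan')), -10.0, 10.0)).item()

def check_finite(loss, epoch):
    """Hypothetical guard: stop at the first non-finite loss so the
    offending epoch can be inspected before the weights are corrupted."""
    if not torch.isfinite(loss).all():
        raise FloatingPointError(f"non-finite loss at epoch {epoch}")

check_finite(torch.tensor(-21.9), epoch=10164)  # finite, so nothing happens
```

Calling such a guard on the loss before `optimizer.step()` would show whether the NaN first appears in the loss or only later in the encoder weights.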