# VIME Tutorial

### VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain

- Paper: Jinsung Yoon, Yao Zhang, James Jordon, Mihaela van der Schaar, 
  "VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain," 
  Neural Information Processing Systems (NeurIPS), 2020.

- Paper link: TBD

- Last updated Date: October 11th 2020

- Code author: Jinsung Yoon (jsyoon0823@gmail.com)

This notebook is a user guide to self- and semi-supervised learning in the tabular domain, using the MNIST database.

### Prerequisite
Clone https://github.com/jsyoon0823/VIME.git into the current directory so that data_loader, supervised_models, and vime_utils can be imported.

### Necessary packages and functions

- data_loader: MNIST dataset loading and preprocessing
- supervised_models: supervised learning models (Logistic regression, XGBoost, and Multi-layer Perceptron)

- vime_self: Self-supervised learning part of the VIME framework (defined in a cell below)
- vime_semi: Semi-supervised learning part of the VIME framework (defined in a cell below)
- vime_utils: Utility functions for the VIME framework

In [None]:
import numpy as np
import os
import warnings
warnings.filterwarnings("ignore")
  
from data_loader import load_mnist_data
from supervised_models import logit, xgb_model, mlp

from vime_utils import perf_metric

### Set the parameters and define output

-   label_no: number of labeled samples to use
-   model_sets: supervised models to compare (logit, xgboost, and mlp)
-   p_m: corruption probability for self-supervised learning
-   alpha: hyperparameter that weights the feature-reconstruction loss relative to the mask-estimation loss
-   K: number of augmented samples per unlabeled example
-   beta: hyperparameter that weights the unsupervised loss relative to the supervised loss
-   label_data_rate: ratio of labeled data in the training set
-   metric: prediction performance metric (either acc or auc)
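For reference, `alpha` and `beta` weight the two training objectives as they are combined in the code cells below (`loss_weights` in `vime_self`, `y_loss + beta * yu_loss` in `vime_semi`). The notation is our own shorthand, not copied from the paper: d is the number of features, m the mask label, and \hat{m}, \hat{x} the mask and feature estimates computed from the corrupted input; e is the encoder, f the predictor (which outputs logits), B and B_u the labeled and unlabeled batch sizes, C the number of classes, and \tilde{x}_{i,k} the k-th corrupted copy of unlabeled sample i.

```latex
\begin{aligned}
\mathcal{L}_{\text{self}} &=
  \underbrace{-\frac{1}{d}\sum_{j=1}^{d}\Big[m_j\log\hat{m}_j + (1-m_j)\log(1-\hat{m}_j)\Big]}_{\text{mask estimation (binary cross-entropy)}}
  + \alpha\,\underbrace{\frac{1}{d}\sum_{j=1}^{d}\big(x_j-\hat{x}_j\big)^2}_{\text{feature estimation (MSE)}} \\[4pt]
\mathcal{L}_{\text{semi}} &=
  \underbrace{\frac{1}{B}\sum_{i=1}^{B}\mathrm{CE}\Big(y_i,\ \mathrm{softmax}\big(f(e(x_i))\big)\Big)}_{\text{supervised}}
  + \beta\,\underbrace{\frac{1}{B_u C}\sum_{i=1}^{B_u}\sum_{c=1}^{C}\operatorname{Var}_{k=1,\dots,K}\Big[f\big(e(\tilde{x}_{i,k})\big)_c\Big]}_{\text{consistency across the }K\text{ augmentations}}
\end{aligned}
```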

In [None]:
# Experimental parameters
label_no = 1000  
model_sets = ['logit','xgboost','mlp']
  
# Hyper-parameters
p_m = 0.3
alpha = 2.0
K = 3
beta = 1.0
label_data_rate = 0.1

# Metric
metric = 'acc'
  
# Define output
results = np.zeros([len(model_sets)+2])  

### Load data

Load the original MNIST dataset and preprocess it.
- Keep only the first label_no samples of the labeled data.
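`load_mnist_data` lives in the cloned repository and its internals are not shown here. Assuming it shuffles the training set, keeps a `label_data_rate` fraction as labeled data with one-hot labels, and treats the rest as unlabeled, a split of that kind could look like the sketch below. `split_labeled_unlabeled` is a hypothetical helper, not part of the repository, and the toy arrays stand in for flattened images.

```python
import numpy as np

def split_labeled_unlabeled(x, y, label_data_rate, num_classes, seed=0):
  """Hypothetical helper: split (x, y) into a labeled and an unlabeled part.

  Labels are one-hot encoded because the notebook slices y_train as a
  2-D array (y_train[:label_no, :]).
  """
  rng = np.random.default_rng(seed)
  idx = rng.permutation(len(x))
  n_label = int(len(idx) * label_data_rate)
  label_idx, unlab_idx = idx[:n_label], idx[n_label:]

  y_onehot = np.eye(num_classes)[y]   # integer labels -> one-hot rows
  x_label, y_label = x[label_idx], y_onehot[label_idx]
  x_unlab = x[unlab_idx]              # labels of this part are discarded
  return x_label, y_label, x_unlab

# Toy stand-in for flattened images: 100 rows, 4 features, 10 classes
data_rng = np.random.default_rng(1)
x = data_rng.random((100, 4))
y = data_rng.integers(0, 10, size=100)
x_label, y_label, x_unlab = split_labeled_unlabeled(x, y, 0.1, num_classes=10)
print(x_label.shape, y_label.shape, x_unlab.shape)  # (10, 4) (10, 10) (90, 4)
```

The notebook then takes the first `label_no` rows of the labeled part, so `label_data_rate` bounds how many labeled samples are available at all.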

In [None]:
# Load data
x_train, y_train, x_unlab, x_test, y_test = load_mnist_data(label_data_rate)
    
# Use subset of labeled data
x_train = x_train[:label_no, :]
y_train = y_train[:label_no, :]  

### Train supervised models

- Train 3 supervised learning models (Logistic regression, XGBoost, MLP)
- Save the performance of each supervised model.
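`perf_metric` is imported from `vime_utils`. As a hedged illustration of the `acc` case only, assuming labels are one-hot rows and predictions are class-probability rows of the same shape, accuracy can be computed by comparing the argmax of each row. `accuracy_from_onehot` is a hypothetical name; the repository's implementation may differ (for example by using scikit-learn), and the `auc` branch is not sketched.

```python
import numpy as np

def accuracy_from_onehot(y_true, y_prob):
  """Hypothetical sketch: fraction of rows whose predicted class matches the label."""
  true_cls = np.argmax(y_true, axis=1)   # one-hot -> class index
  pred_cls = np.argmax(y_prob, axis=1)   # probabilities -> most likely class
  return float(np.mean(true_cls == pred_cls))

y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.5, 0.3, 0.2],   # wrong: predicts class 0, label is class 2
                   [0.2, 0.6, 0.2]])
print(accuracy_from_onehot(y_true, y_prob))  # 0.75
```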

In [None]:
# Logistic regression
y_test_hat = logit(x_train, y_train, x_test)
results[0] = perf_metric(metric, y_test, y_test_hat) 

# XGBoost
y_test_hat = xgb_model(x_train, y_train, x_test)    
results[1] = perf_metric(metric, y_test, y_test_hat)   

# MLP
mlp_parameters = dict()
mlp_parameters['hidden_dim'] = 100
mlp_parameters['epochs'] = 100
mlp_parameters['activation'] = 'relu'
mlp_parameters['batch_size'] = 100
      
y_test_hat = mlp(x_train, y_train, x_test, mlp_parameters)
results[2] = perf_metric(metric, y_test, y_test_hat)

# Report performance
for m_it in range(len(model_sets)):  
    
  model_name = model_sets[m_it]  
    
  print('Supervised Performance, Model Name: ' + model_name + 
        ', Performance: ' + str(results[m_it]))

### Train & Test VIME-Self
Train only the self-supervised part of the VIME framework.
- Check its performance by training an MLP on the encoder's representations of the labeled data.
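The corrupted inputs that `vime_self` trains on come from `mask_generator` and `pretext_generator` in `vime_utils`. The NumPy sketch below illustrates the idea, assuming (as the VIME paper describes) that masked entries are replaced by values drawn from the same feature's empirical distribution, here via an independent shuffle of each column. The function names mirror the repository's, but the bodies are our own illustration, and the explicit `rng` argument is an added assumption for reproducibility.

```python
import numpy as np

def mask_generator(p_m, x, rng):
  """Sketch: Bernoulli(p_m) mask with the same shape as x (1 = corrupt this entry)."""
  return rng.binomial(1, p_m, size=x.shape)

def pretext_generator(m, x, rng):
  """Sketch: replace masked entries with values taken from the same column.

  Each column is shuffled independently, so replacement values follow that
  feature's empirical marginal distribution.
  """
  n, dim = x.shape
  x_bar = np.zeros_like(x)
  for j in range(dim):
    x_bar[:, j] = x[rng.permutation(n), j]
  x_tilde = x * (1 - m) + x_bar * m
  # A shuffled value can coincide with the original one, so the mask label
  # records which entries actually changed rather than the raw mask.
  m_new = (x != x_tilde).astype(int)
  return m_new, x_tilde

rng = np.random.default_rng(0)
x = rng.random((8, 5))
m = mask_generator(0.3, x, rng)
m_label, x_tilde = pretext_generator(m, x, rng)
print(m.mean(), m_label.mean())  # fraction of masked vs. actually changed entries
```

The encoder in `vime_self` receives `x_tilde` and is trained to predict both `m_label` (mask head) and the original `x` (feature head).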

In [None]:
"""vime_self.py
- Self-supervised learning parts of the VIME framework
- Using unlabeled data to train the encoder
"""

# Necessary packages
from keras.layers import Input, Dense
from keras.models import Model
from keras import models

from vime_utils import mask_generator, pretext_generator


def vime_self(x_unlab, p_m, alpha, parameters):
  """Self-supervised learning part in VIME.
  
  Args:
    x_unlab: unlabeled feature
    p_m: corruption probability
    alpha: hyper-parameter to control the weights of feature and mask losses
    parameters: epochs, batch_size
    
  Returns:
    encoder: Representation learning block
  """
    
  # Parameters
  _, dim = x_unlab.shape
  epochs = parameters['epochs']
  batch_size = parameters['batch_size']
  
  # Build model  
  inputs = Input(shape=(dim,))
  # Encoder  
  h = Dense(int(dim), activation='relu')(inputs)  
  # Mask estimator
  output_1 = Dense(dim, activation='sigmoid', name = 'mask')(h)  
  # Feature estimator
  output_2 = Dense(dim, activation='sigmoid', name = 'feature')(h)
  
  model = Model(inputs = inputs, outputs = [output_1, output_2])
  
  model.compile(optimizer='rmsprop',
                loss={'mask': 'binary_crossentropy', 
                      'feature': 'mean_squared_error'},
                loss_weights={'mask':1.0, 'feature':alpha})
  
  # Generate corrupted samples
  m_unlab = mask_generator(p_m, x_unlab)
  m_label, x_tilde = pretext_generator(m_unlab, x_unlab)
  
  # Fit model on unlabeled data
  model.fit(x_tilde, {'mask': m_label, 'feature': x_unlab}, 
            epochs = epochs, batch_size= batch_size)
      
  # Extract encoder part
  layer_name = model.layers[1].name
  layer_output = model.get_layer(layer_name).output
  encoder = models.Model(inputs=model.input, outputs=layer_output)
  
  return encoder


In [None]:
# Train VIME-Self
vime_self_parameters = dict()
vime_self_parameters['batch_size'] = 128
vime_self_parameters['epochs'] = 10
vime_self_encoder = vime_self(x_unlab, p_m, alpha, vime_self_parameters)
  
# Save encoder
if not os.path.exists('save_model'):
  os.makedirs('save_model')

file_name = './save_model/encoder_model.h5'
  
vime_self_encoder.save(file_name)  
        
# Test VIME-Self
x_train_hat = vime_self_encoder.predict(x_train)
x_test_hat = vime_self_encoder.predict(x_test)
      
y_test_hat = mlp(x_train_hat, y_train, x_test_hat, mlp_parameters)
results[3] = perf_metric(metric, y_test, y_test_hat)
    
print('VIME-Self Performance: ' + str(results[3]))

### Train & Test VIME

Train the semi-supervised part of the VIME framework on top of the trained self-supervised encoder.
- Check the performance of the entire VIME framework.
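The unsupervised term in `vime_semi` is a consistency loss: the predictor should produce similar logits for the K corrupted copies of the same unlabeled sample. The NumPy sketch below uses made-up logits to show why the K copies belong on their own leading axis. The variance over that axis measures disagreement between views of the same sample, whereas the variance over a flattened (K × batch) axis also penalizes legitimate differences between different samples. `consistency_loss` is a hypothetical name used only here.

```python
import numpy as np

def consistency_loss(logits_k):
  """Mean over samples and classes of the variance across K augmented views.

  logits_k: array of shape (K, batch_size, n_classes).
  np.var is the population variance, like the variance from tf.nn.moments.
  """
  return float(np.mean(np.var(logits_k, axis=0)))

K, batch, n_classes = 3, 4, 2
rng = np.random.default_rng(0)
per_sample = rng.normal(size=(batch, n_classes))  # a different logit per sample

# Perfectly consistent predictor: every view of a sample gets the same logits
same_views = np.stack([per_sample] * K, axis=0)   # (K, batch, n_classes)
print(consistency_loss(same_views))               # essentially zero

# Flattening the views into one batch axis mixes different samples together,
# so the same consistent predictor receives a non-zero penalty
flattened = same_views.reshape(K * batch, n_classes)
print(float(np.mean(np.var(flattened, axis=0))) > 0)  # True
```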

In [None]:
"""vime_semi.py
- Semi-supervised learning parts of the VIME framework
- Using both labeled and unlabeled data to train the predictor with the help of trained encoder
"""

# Necessary packages
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers, losses
from vime_utils import mask_generator, pretext_generator

def vime_semi(x_train, y_train, x_unlab, x_test, parameters, 
              p_m, K, beta, file_name):
  """Semi-supervised learning part in VIME.
  
  Args:
    - x_train, y_train: training dataset
    - x_unlab: unlabeled dataset
    - x_test: testing features
    - parameters: network parameters (hidden_dim, batch_size, iterations)
    - p_m: corruption probability
    - K: number of augmented samples
    - beta: hyperparameter to control supervised and unsupervised loss
    - file_name: path of the saved encoder model
    
  Returns:
    - y_test_hat: prediction on x_test
  """
      
  # Network parameters
  hidden_dim = parameters['hidden_dim']
  act_fn = tf.nn.relu
  batch_size = parameters['batch_size']
  iterations = parameters['iterations']

  # Basic parameters
  data_dim = x_train.shape[1]
  label_dim = y_train.shape[1]
  
  # Divide training and validation sets (9:1)
  idx = np.random.permutation(len(x_train))
  train_idx = idx[:int(len(idx)*0.9)]
  valid_idx = idx[int(len(idx)*0.9):]
  
  x_valid = x_train[valid_idx]
  y_valid = y_train[valid_idx]
  
  x_train = x_train[train_idx]
  y_train = y_train[train_idx]

  # Define predictor model
  def build_predictor():
    model = models.Sequential([
      layers.Input(shape=(data_dim,)),
      layers.Dense(hidden_dim, activation=act_fn),
      layers.Dense(hidden_dim, activation=act_fn),
      layers.Dense(label_dim, activation=None)
    ])
    return model
  
  predictor = build_predictor()

  # Compile the predictor model
  optimizer = optimizers.Adam()
  predictor.compile(optimizer=optimizer, 
                    loss=losses.CategoricalCrossentropy(from_logits=True))

  # Load encoder from self-supervised model
  encoder = models.load_model(file_name)
  
  # Encode validation and testing features (verbose=0 silences the progress bar)
  x_valid_encoded = encoder.predict(x_valid, verbose=0)
  x_test_encoded = encoder.predict(x_test, verbose=0)
  
  best_loss = float('inf')
  early_stop_counter = 0
  
  # Training iteration loop
  for it in range(iterations):

    # Select a batch of labeled data
    batch_idx = np.random.permutation(len(x_train))[:batch_size]
    x_batch = x_train[batch_idx]
    y_batch = y_train[batch_idx]    
    
    # Encode labeled data (verbose=0 avoids printing a progress bar every iteration)
    x_batch_encoded = encoder.predict(x_batch, verbose=0)
    
    # Select a batch of unlabeled data
    batch_u_idx = np.random.permutation(len(x_unlab))[:batch_size]
    xu_batch_ori = x_unlab[batch_u_idx]
    
    # Augment unlabeled data
    xu_batch = []
    
    for _ in range(K):
      # Mask vector generation
      m_batch = mask_generator(p_m, xu_batch_ori)
      # Pretext generator
      _, xu_batch_temp = pretext_generator(m_batch, xu_batch_ori)
      
      # Encode corrupted samples
      xu_batch_temp_encoded = encoder.predict(xu_batch_temp, verbose=0)
      xu_batch.append(xu_batch_temp_encoded)
    
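    # Stack the K augmented views on a leading axis: (K, batch_size, encoded_dim)
    xu_batch = np.stack(xu_batch, axis=0)
    n_aug, n_batch, enc_dim = xu_batch.shape

    with tf.GradientTape() as tape:
      # Supervised loss on the labeled batch
      y_hat_logit = predictor(x_batch_encoded, training=True)
      y_loss = tf.reduce_mean(losses.categorical_crossentropy(y_batch, y_hat_logit, from_logits=True))
      
      # The predictor takes 2-D input, so flatten the K views, predict,
      # and restore the (K, batch_size, label_dim) layout afterwards
      yv_hat_logit = predictor(xu_batch.reshape(-1, enc_dim), training=True)
      yv_hat_logit = tf.reshape(yv_hat_logit, [n_aug, n_batch, label_dim])
      # Unsupervised loss: variance of the logits across the K views of each
      # sample (axis 0), averaged over samples and classes
      yu_loss = tf.reduce_mean(tf.nn.moments(yv_hat_logit, axes=[0])[1])
      
      loss = y_loss + beta * yu_loss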

    grads = tape.gradient(loss, predictor.trainable_variables)
    optimizer.apply_gradients(zip(grads, predictor.trainable_variables))

    # Current validation loss
    yv_hat_logit_valid = predictor(x_valid_encoded, training=False)
    yv_loss = tf.reduce_mean(losses.categorical_crossentropy(y_valid, yv_hat_logit_valid, from_logits=True))

    if it % 100 == 0:
      print(f'Iteration: {it}/{iterations}, Current loss: {yv_loss.numpy()}')
      
    # Early stopping & Best model save
    if yv_loss < best_loss:
      best_loss = yv_loss
      early_stop_counter = 0
      predictor.save_weights('best_predictor.weights.h5')
    else:
      early_stop_counter += 1

    if early_stop_counter > 100:
      break

  # Load the best model
  predictor.load_weights('best_predictor.weights.h5')
  
  # Predict on x_test
  y_test_hat_logit = predictor(x_test_encoded, training=False)
  y_test_hat = tf.nn.softmax(y_test_hat_logit).numpy()
  
  return y_test_hat


In [None]:
# Train VIME-Semi
vime_semi_parameters = dict()
vime_semi_parameters['hidden_dim'] = 100
vime_semi_parameters['batch_size'] = 128
vime_semi_parameters['iterations'] = 1000
y_test_hat = vime_semi(x_train, y_train, x_unlab, x_test, 
                       vime_semi_parameters, p_m, K, beta, file_name)

# Test VIME
results[4] = perf_metric(metric, y_test, y_test_hat)
  
print('VIME Performance: '+ str(results[4]))

### Report Prediction Performances

- 3 Supervised learning models
- VIME with self-supervised part only
- Entire VIME framework

In [None]:
for m_it in range(len(model_sets)):  
    
  model_name = model_sets[m_it]  
    
  print('Supervised Performance, Model Name: ' + model_name + 
        ', Performance: ' + str(results[m_it]))
    
print('VIME-Self Performance: ' + str(results[3]))
  
print('VIME Performance: '+ str(results[4]))