## Mounting drive

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


## Structure of code

The code is structures as follows: 

* **Import module cell** : All the necessary modules are imported

* **Logger class** : A class implemented to manage logging of values of variables that need to be tracked while training. For example, training error, test accuracy, change in activation pattern (in our case), etc

* **Python generator function cell** : This cell has a class `StructuredDatasetGenerator` that initializes the parameters of the distribution from which data points need to sampled and has a generator method that samples data points from these distributions. This generator function is used in the `Model` class to build training and test dataset for the neural network

* **Model class** : The most important cell on this notebook. This has class `Model`, which when initialized builds data t inpupipeline, a neural netowrk and pseudo network on the same temsorflow computation graph. 

* **Configurations** : This cell contains the class `Configuration` that has the specifications/hyper-parameters of experiment to be carried out.


## Import modules

In [2]:
from sklearn.model_selection import train_test_split
from scipy.spatial.distance import hamming
from IPython import display

import matplotlib.animation as animation
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt
import skimage.transform as im
import tensorflow as tf
import pandas as pd
import numpy as np
import inspect
import random
import struct
import time
import cv2
import os

# tf.compat.v1.enable_eager_execution()

print(tf.__version__)


1.15.0


In [3]:
# Check if GPU is available
from tensorflow.python.client import device_lib 
print("Num GPUs Available: ", 
      len(tf.config.experimental.list_physical_devices('GPU')))
# print(device_lib.list_local_devices())

Num GPUs Available:  1


## Logger class

TLDR : This is an object to keep track of values like train error, test error, etc - the typical values one would like to track while training a neural network. In our case, we would also track activation pattern and the magnitude of weights changed

In [0]:
class Logger:
    """Logger class can be used to log values of several parameters of the 
    network during training. All the logs are stored in a dataframe and the 
    dataframe is updated when a new log comes in. By making `live_plot=True`, 
    the live plots of the values logged can also be observed.
    """
    def __init__(self, var_names, include_step=True, live_plot=True):
        """Logger class initializer
        Parameters:
            `var_names` : List of variable names to track. These names will be 
                          the header row in the internal dataframe maintained by
                          the Logger class
            `include_step` : Boolean to indicate whether to include a 'step' 
                             column in the dataframe
            `live_plot` : Boolean to indicate whether to show live plots of the 
                          variables as the logs get added
        """
        self.var_names = var_names
        self.include_step = include_step
        if include_step and ('step' not in self.var_names):
            self.var_names = ['step'] + self.var_names
        self.dataframe = pd.DataFrame(columns=self.var_names)
        self.num_vars = len(self.var_names)
        self.num_steps = 0
        self.plot_lists = {}
        self.verbose = []
        self.live_plot = live_plot
        if self.live_plot:
            print(self.var_names)
            if self.num_vars < 3:
                figsize = (7.0 * self.num_vars, 4.8)
            else:
                figsize = (21.0, 4.8 * (self.num_vars//3 + 1))
            self.fig = plt.figure(figsize=figsize)
            plt.subplots_adjust(wspace=0.3, hspace=0.4)
            self._set_plots(self.var_names[1:])

    def add_log(self, step, vals):
        """ Adds entries to the dataframe
        """
        if self.include_step:
            self.dataframe.loc[self.num_steps] = [step] + vals
        else:
            self.dataframe.loc[self.num_steps] = vals
        self.num_steps += 1
        if self.live_plot:
            self._visualize()

    def add_verbose(self, v):
        """ Adds verbose to the log. This log can be saved using `save` method 
        implemented below in a text file 'verbose.txt' 
        """
        self.verbose.append(v)

    def _visualize(self):
        for yname in self.var_names[1:]:
            plot_name = yname + '_vs_' + 'step'
            ax = self.plot_lists[plot_name]
            ax.clear()
            ax.set_title(plot_name, color='w')
            ax.set_xlabel('step', color='w')
            ax.set_ylabel(yname, color='w')
            ax.plot(list(self.dataframe['step'].values), 
                    list(self.dataframe[yname].values))
        display.clear_output(wait=True)
        display.display(self.fig)

    def _set_plots(self, ynames, xnames='step'):
        if isinstance(xnames, list):
            assert len(xnames) == len(ynames)
        else:
            assert isinstance(xnames, str)
            xnames = [xnames] * len(ynames)

        num_plots = len(ynames)
        for i, (yname, xname) in enumerate(zip(ynames, xnames)):    
            plot_name = yname + '_vs_' + xname
            # print(num_plots // 3 + 1)
            self.plot_lists[plot_name] = self.fig.add_subplot(num_plots//3 + 1, 
                                                              3, i+1)

    def save(self, folder='.'):
        self.dataframe.to_csv(os.path.join(folder, 'logs.csv'))
        np.savetxt(os.path.join(folder, 'verbose.txt'), self.verbose, 
                   delimiter='\n', fmt='%s')

    def close(self):
        display.clear_output(wait=False)


# TEST for Logger

# vars = ['a', 'b', 'c', 'd']
# ckpt = Logger(vars)   
# print(vars)
# for i in range(10):
#     vals = np.random.rand(4)
#     ckpt.add_log(i * 2, [y for x, y in zip(vars, vals)]) 
#     ckpt._visualize()
#     time.sleep(1)
# ckpt.close()


## Generator function to generate data points

This cell has a function that generates samples from a structured dataset. The parameter values inside the functions are follow the specifications of synthetic data from the original paper 


In [0]:
class StructuredDatasetGenerator:
    """ This object defines a mixture of gaussiand from which samples are to be 
    generated.
    """
    def __init__(self, n_dim=1000, n_classes=10, n_components_per_class=2, 
                 n_samples=4000, component_probability=None):
        """ The constructor creates a `StructuredDatasetGenerator` object. It 
        uses the arguments to create dataset having `n_classes` classes, 
        `n_components_per_class` distinct gaussian distributions per class. If 
        the `component_probability` is `None`, uniform probability value will be
        assigned to each component. The constructor generates `n_samples` data 
        points from these mixture of gaussians before exiting. The samples 
        generated are drawn from a mixture of 
        `n_classes` * `n_components_per_class` gaussian distributions.
        """
        self.n_samples = n_samples
        self.n_dim = n_dim
        self.n_classes = n_classes
        self.n_components_per_class = n_components_per_class
        self.component_probability = component_probability

        if component_probability is None:
            # if the component probabilites are not given, assign uniform
            # probabilities to all components
            _total_comp = n_classes * n_components_per_class
            p_ij = 1.0 / (_total_comp)
            self.component_probability = [p_ij for _ in range(_total_comp)]
    
        _sigma = 1.0    # specified in paper
        _sigma0 = 5.0   # specified in paper
    
        self.distributions = []  # stores the mean and std deviation
        for i in range(n_classes):
            distributions_per_class = []
            for j in range(n_components_per_class):
                mu = np.random.normal(loc=0.0, 
                                      scale=(_sigma0 / np.sqrt(n_dim)),
                                      size=n_dim)
                cov = (_sigma**2 / n_dim) * np.identity(n_dim)
                distributions_per_class.append((mu, cov))
            self.distributions.append(distributions_per_class)

        # Samples generated beforehand. The `self.generate` method will yeild 
        # these generated samples
        self.samples = []
        for i in range(n_samples):
            _sample = np.random.multinomial(1, self.component_probability)
            comp_chosen = np.argmax(_sample)
            curr_class = int(comp_chosen / self.n_components_per_class)
            curr_comp = int(comp_chosen % self.n_components_per_class)
    
            mu, cov = self.distributions[curr_class][curr_comp]
            curr_sample = np.random.multivariate_normal(mean=mu, cov=cov)
            curr_sample = curr_sample / np.linalg.norm(curr_sample)
            self.samples.append((curr_sample, curr_class))


    def generate(self, n_samples=-1):
        if n_samples == -1:
            n_samples = self.n_samples
        for i in range(n_samples):
            yield self.samples[i]


## Model class

The biggest part of the code. In this cell the `Model` class is implemented that loads in the dataset, generates input pipeline for neural net, builds a neural net. It has trainig and testing modules for training and testing the build network on the loaded dataset



In [0]:

class Model:

    def __init__(self, cfg, graph=None):
        """ The constructor takes in a configuration specifying the parameters 
        of dataset to be prepared and neural network to be built.
        """
        self.input = None
        self.cfg = cfg
        self.activation_fn = cfg.activation_fn
        if graph is None:
            self.comp_graph = tf.Graph()
        else:
            self.comp_graph = graph
        self.sess = tf.Session(graph=self.comp_graph)

        with self.comp_graph.as_default():
            # Building input pipeline
            self._build_dataset()
            self.is_training = tf.placeholder(dtype=tf.bool, 
                                              name='is_in_training_mode')
            self.fixed_order_inputs = tf.placeholder(dtype=tf.bool, 
                                                     name='fixed_order_dataset')
            
            # Building neural net architecture
            self._build_model(cfg.layer_sizes)
            self.sess.run(tf.global_variables_initializer())
            self._save_initial_weights()

            # Building pseudo net architecture - neural network where activation
            # patterns are made to be fixed while training
            with tf.variable_scope('PseudoNet', reuse=tf.AUTO_REUSE):
                self._build_pseudo_model(cfg.layer_sizes)
            self._initialize_pseudo_net()

            self.saver = tf.train.Saver()

        # Initializing the values of variables to track while training
        self.change_in_weights = 0.0
        self.frac_activation_changed = 0.0
        self.initial_pattern = self._get_activation_pattern()

        w, b = self.get_weight_val('DenseLayer_1')
        with tf.variable_scope('PseudoNet', reuse=tf.AUTO_REUSE):
            pw, pb = self.get_weight_val('DenseLayer_1')

        assert((w == pw).all() and (b == pb).all())

    def _build_dataset(self):
        """ Builds input dataset pipelene in the tensorflow graph
        """
        with self.comp_graph.as_default():
            # Generating dataset
            generator_args = [self.cfg.n_dim, 
                              self.cfg.n_classes, 
                              self.cfg.n_comp]
            dataset = StructuredDatasetGenerator(*generator_args)
            _types = (tf.float32, tf.float32)
            _shapes = ([self.cfg.n_dim,], [])
            self._dataset = tf.data.Dataset.from_generator(dataset.generate, 
                                                           output_types=_types,
                                                           output_shapes=_shapes
                                                           )
             
            # Splitting dataset into train and test
            _train_d = self._dataset.take(self.cfg.n_train_samples)
            _test_d = self._dataset.skip(self.cfg.n_train_samples)
            
            # Creating training dataset
            # Dataset is shuffled after every epoch and then split into batches
            self.train_dataset = _train_d.shuffle(self.cfg.n_train_samples)\
                                         .batch(self.cfg.batch_size)\
                                         .repeat()

            # No shuffling of dataset - This dataset is used to keep track of 
            # activation patterns of hidden layer
            self._ordered_batch = self.cfg.n_train_samples
            self.ordered_train_dataset = _train_d.batch(self._ordered_batch)\
                                                 .repeat()

            # Test dataset
            self.test_dataset = _test_d.take(self.cfg.n_test_samples)\
                                             .batch(self.cfg.n_test_samples)\
                                             .repeat()

            _train_iter = tf.data.make_one_shot_iterator(self.train_dataset)
            self.inputs_batch, self.gt_batch = _train_iter.get_next()
            _iter = tf.data.make_one_shot_iterator(self.ordered_train_dataset)
            self.ordered_inputs, self.ordered_gt = _iter.get_next()
            _test_iter = tf.data.make_one_shot_iterator(self.test_dataset)
            self.test_inputs, self.test_gt = _test_iter.get_next()

    def _build_model(self, layer_sizes=None):
        """ Builds neural network model in the tensorflow graph
        """
        # Preparing input and ground truth tensors according to the flags 
        # `self.is_training`, `self.fixed_order_inputs`
        _case_training = lambda: tf.cond(self.fixed_order_inputs, 
                                         lambda: self.ordered_inputs,
                                         lambda: self.inputs_batch)
        _case_testing = lambda: self.test_inputs
        outp = tf.cond(self.is_training, _case_training, _case_testing)
        outp.set_shape(tf.TensorShape([None, self.cfg.n_dim]))
        outp = tf.expand_dims(outp, axis=1)

        _case_training = lambda: tf.cond(self.fixed_order_inputs,
                                         lambda: self.ordered_gt,
                                         lambda: self.gt_batch)
        _case_testing = lambda: self.test_gt
        gt = tf.cond(self.is_training, _case_training, _case_testing)
        if self.cfg.task == 'classification':
            gt.set_shape(tf.TensorShape([None,]))
        else:
            gt.set_shape(tf.TensorShape([None, self.cfg.layer_sizes[-1]]))

        # Building network 
        self.layer_outputs = []
        self.layer_outputs.append(tf.squeeze(outp))
        prev_sz = layer_sizes[0]
        for layer_idx, sz in enumerate(layer_sizes[1:-1]):
            with tf.variable_scope('DenseLayer_{}'.format(layer_idx + 1), 
                                   reuse=tf.AUTO_REUSE):
                # creating weight matrix and bias vector, and initializing them 
                # as specified in the paper
                w = tf.get_variable(name='kernel',
                                    shape=(prev_sz, sz), dtype=tf.float32,
                                    initializer=tf.random_normal_initializer(
                                                    mean=0.0, 
                                                    stddev=1.0 / np.sqrt(sz)
                                                ),
                                    trainable=True)
                b = tf.get_variable(name='bias',
                                    shape=(sz,), dtype=tf.float32,
                                    initializer=tf.random_normal_initializer(
                                                    mean=0.0,
                                                    stddev=1.0 / np.sqrt(sz)
                                                ),
                                    trainable=True)
                outp = tf.add(tf.matmul(outp, w), b)
                outp = self.activation_fn(outp)
                self.layer_outputs.append(tf.squeeze(outp))
            prev_sz = sz

        with tf.variable_scope('OutputLayer', reuse=tf.AUTO_REUSE):
            sz = layer_sizes[-1]
            w = tf.get_variable(name='kernel',
                                shape=(prev_sz, sz), dtype=tf.float32,
                                initializer=tf.random_normal_initializer(
                                                mean=0.0,
                                                stddev=1.0
                                            ),
                                trainable=self.cfg.trainable_output_weights)
            b = tf.get_variable(name='bias',
                                shape=(sz,), dtype=tf.float32,
                                initializer=tf.random_normal_initializer(
                                                mean=0.0,
                                                stddev=1.0
                                            ),
                                trainable=self.cfg.trainable_output_weights)
            self.output = tf.add(tf.matmul(outp, w), b)

        self.output = tf.squeeze(self.output)
        self.layer_outputs.append(self.output)

        # Defining post-output operations: finding loss, back propagation, 
        # evaluating accuracy of predictions, etc
        self.loss_op = self.get_loss(labels=gt, predictions=self.output)
        self.gradients = self._create_gradient_ops(self.loss_op, layer_sizes)

        if self.cfg.task == 'classification':
            self.pred_output = tf.arg_max(self.output, dimension=-1)

        self.opt = self.cfg.optimizer(learning_rate=self.cfg.learning_rate, 
                                      name='Optimizer')
        _global_step = tf.train.get_global_step()
        self.train_op = self.opt.minimize(self.loss_op,
                                          global_step=_global_step,
                                          name='training_operations')

    def _build_pseudo_model(self, layer_sizes=None):
        """ Builds the pseudo neural network model in the tensorflow graph
        """
        # Preparing input and ground truth tensors according to the flags 
        # `self.is_training`, `self.fixed_order_inputs`
        _case_training = lambda: tf.cond(self.fixed_order_inputs, 
                                         lambda: self.ordered_inputs,
                                         lambda: self.inputs_batch)
        _case_testing = lambda: self.test_inputs
        outp = tf.cond(self.is_training, _case_training, _case_testing)
        outp.set_shape(tf.TensorShape([None, self.cfg.n_dim]))
        outp = tf.expand_dims(outp, axis=1)

        _case_training = lambda: tf.cond(self.fixed_order_inputs,
                                         lambda: self.ordered_gt,
                                         lambda: self.gt_batch)
        _case_testing = lambda: self.test_gt
        gt = tf.cond(self.is_training, _case_training, _case_testing)
        if self.cfg.task == 'classification':
            gt.set_shape(tf.TensorShape([None,]))
        else:
            gt.set_shape(tf.TensorShape([None, self.cfg.layer_sizes[-1]]))

        # Building pseudo network 
        self.p_layer_outputs = []
        self.p_layer_outputs.append(tf.squeeze(outp))
        prev_sz = layer_sizes[0]
        for layer_idx, sz in enumerate(layer_sizes[1:-1]):
            idx = layer_idx + 1
            with tf.variable_scope('DenseLayer_{}'.format(idx), 
                                   reuse=tf.AUTO_REUSE):
                setattr(self, 'p_w_init_{}'.format(idx), 
                        tf.placeholder(dtype=tf.float32, shape=[prev_sz, sz], 
                                       name='Initial_weight_{}'.format(idx)))
                setattr(self, 'p_b_init_{}'.format(idx), 
                        tf.placeholder(dtype=tf.float32, shape=[sz,], 
                                       name='Initial_bias_{}'.format(idx)))
                w = tf.get_variable(name='kernel', dtype=tf.float32,
                                    initializer=getattr(self,
                                                        'p_w_init_%d'%idx),
                                    trainable=True)
                b = tf.get_variable(name='bias', dtype=tf.float32,
                                    initializer=getattr(self,
                                                        'p_b_init_%d'%idx),
                                    trainable=True)
                _outp = tf.add(tf.matmul(outp, w), b)
                _init_dotproduct = tf.add(tf.matmul(outp, 
                                                    getattr(self,
                                                            'p_w_init_%d'%idx)),
                                          getattr(self, 'p_b_init_%d'%idx))
                _pseudo_relu_state = tf.math.sign(_init_dotproduct)
                _pseudo_relu_state = (_pseudo_relu_state + 1.0) / 2.0
                outp = _outp * _pseudo_relu_state
                self.p_layer_outputs.append(tf.squeeze(outp))
            prev_sz = sz

        with tf.variable_scope('OutputLayer', reuse=tf.AUTO_REUSE):
            sz = layer_sizes[-1]
            setattr(self, 'p_w_init_output', 
                    tf.placeholder(dtype=tf.float32, shape=[prev_sz, sz], 
                                   name='Initial_weight_output'))
            setattr(self, 'p_b_init_output', 
                    tf.placeholder(dtype=tf.float32, shape=[sz,], 
                                   name='Initial_bias_output'))
            w = tf.get_variable(name='kernel', dtype=tf.float32,
                                initializer=getattr(self, 'p_w_init_output'),
                                trainable=self.cfg.trainable_output_weights)
            b = tf.get_variable(name='bias', dtype=tf.float32,
                                initializer=getattr(self, 'p_b_init_output'),
                                trainable=self.cfg.trainable_output_weights)
            self.p_output = tf.add(tf.matmul(outp, w), b)

        self.p_output = tf.squeeze(self.p_output)
        self.p_layer_outputs.append(self.p_output)

        # Defining post-output operations: finding loss, back propagation, 
        # evaluating accuracy of predictions, etc
        self.p_loss_op = self.get_loss(labels=gt, predictions=self.p_output)
        self.p_gradients = self._create_gradient_ops(self.p_loss_op, 
                                                     layer_sizes)

        if self.cfg.task == 'classification':
            self.p_pred_output = tf.arg_max(self.p_output, dimension=-1)

        self.p_opt = self.cfg.optimizer(learning_rate=self.cfg.learning_rate, 
                                        name='Optimizer')
        _global_step = tf.train.get_global_step()
        self.p_train_op = self.p_opt.minimize(self.p_loss_op, 
                                              global_step=_global_step, 
                                              name='training_operations')
        
    def _initialize_pseudo_net(self):
        self.feed_dict = {}
        with tf.variable_scope('PseudoNet', reuse=tf.AUTO_REUSE):
            for layer_idx, sz in enumerate(self.cfg.layer_sizes[1:-1]):
                idx = layer_idx + 1
                with tf.variable_scope('DenseLayer_{}'.format(layer_idx + 1), 
                                       reuse=tf.AUTO_REUSE):
                    self.feed_dict[getattr(self, 'p_w_init_%d'%idx)] =\
                                            self.initial_weights[layer_idx][0]
                    self.feed_dict[getattr(self, 'p_b_init_%d'%idx)] =\
                                            self.initial_bias[layer_idx]
            with tf.variable_scope('OutptuLayer'):
                self.feed_dict[getattr(self, 'p_w_init_output')] =\
                                            self.initial_weights[-1][0]
                self.feed_dict[getattr(self, 'p_b_init_output')] =\
                                            self.initial_bias[-1]

        var_list = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
                                     scope='PseudoNet') 
        self.sess.run(tf.initialize_variables(var_list),
                      feed_dict=self.feed_dict)

    def _save_initial_weights(self):
        with self.comp_graph.as_default():
            self.initial_weights = []
            self.initial_bias = []
            for layer_idx, sz in enumerate(self.cfg.layer_sizes[1:-1]):
                w, b = self.get_weight_val('DenseLayer_' + str(layer_idx + 1))
                self.initial_weights.append((w, np.linalg.norm(w)))
                self.initial_bias.append(b)
            w, b = self.get_weight_val('OutputLayer')
            self.initial_weights.append((w, np.linalg.norm))
            self.initial_bias.append(b)

    def _get_activation_pattern(self):
        """ Returns a list of strings of size equal to number of hidden layers. 
        The length of each string will be equal to the number of training 
        samples in the dataset. Each string will be made up of either '1' 
        indicating activated node (positive value after ReLU operation) or '0' 
        indicating dead node (0 after ReLU operation) for that input
        """
        with self.comp_graph.as_default():
            _pattern = []
            _activation_states = []
            _num_batches = int(self.cfg.n_train_samples / self._ordered_batch)
            for data_idx in range(_num_batches):
                feed_dict = {self.is_training: True, 
                             self.fixed_order_inputs: True}
                # gets a matrix `_activations` of size (batch_size, layer_size)) 
                # stors activations for the current batchof inputs
                _activations = self.sess.run(self.layer_outputs[1], 
                                             feed_dict=feed_dict)
                _activations = np.squeeze(np.where(_activations > 0.0, 1, 0))   
                _activation_states.append(_activations)
            _activation_states = np.concatenate(_activation_states, axis=0).T  
            # new shape of `_activation_states` is (layer_size, num_samples)
            # _pattern = np.array([''.join(st) for st in _activation_states])
            return _activation_states

    def _create_gradient_ops(self, loss_op, layer_sizes):
        prev_sz = layer_sizes[0]
        gradients = []
        for layer_idx, sz in enumerate(layer_sizes[1:-1]):
            idx = layer_idx + 1
            with tf.variable_scope('DenseLayer_%d'%idx,
                                   reuse=tf.AUTO_REUSE):
                _w = tf.get_variable('kernel'.format(layer_idx + 1))
                _grad = tf.gradients(loss_op, _w, 
                                     name='Gradient_wrt_Weight_%d'%idx)
                gradients.append(_grad)
        return gradients

    def get_loss(self, labels, predictions):
        """ Loss function for the neural network
        """
        with self.comp_graph.as_default():
            if self.cfg.task == 'classification':
                loss_mean = self.cfg.loss_fn(labels=tf.cast(labels,
                                                            dtype=tf.int32),
                                             logits=predictions)
            else:
                _dtype = predictions.dtype
                loss_mean = self.cfg.loss_fn(labels=tf.cast(labels,
                                                            dtype=_dtype),
                                             predictions=predictions)
            loss_mean = tf.reduce_mean(loss_mean)
            return loss_mean

    def train(self):
        """ This method trains both the true and pseudo neural network
        """
        # Variables to track and log while training
        vars_to_log = ['train_loss', 'p_train_loss', 'frac_activation_changed',
                       'change_in_weights', 'test_accuracy', 'grad_coupling']
        ckpt = Logger(vars_to_log, live_plot=False)
        global_step = 0

        tik = time.time()
        avg_time_taken = 0.0
        for step in range(self.cfg.max_steps):
            global_step += 1
            with self.comp_graph.as_default():
                feed_dict = {self.is_training: True,
                             self.fixed_order_inputs: False}
                feed_dict.update(self.feed_dict)
                vars_to_fetch = [self.train_op, self.p_train_op, self.loss_op,
                                 self.p_loss_op]
                tik = time.time()
                _, _, self.train_loss, self.p_train_loss = self.sess.run(
                                                            vars_to_fetch, 
                                                            feed_dict=feed_dict
                                                           )
                tok = time.time()
                avg_time_taken += float(tok - tik)
            verbose = "[Step {}/{}] ".format(step + 1, self.cfg.max_steps)
            verbose += "Training loss = {:.5f} ".format(self.train_loss)

            if step % self.cfg.log_step == 0: # log values of variables to track
                time_taken = avg_time_taken / self.cfg.log_step
                verbose += "  (Time taken = {:.3f}sec)".format(time_taken)
                avg_time_taken = 0.0
                tik = time.time()

                # Computing changes activation pattern
                curr_activ_pattern = self._get_activation_pattern()
                def hamming_dist(arr1, arr2):
                    if isinstance(arr1, list) or isinstance(arr1, np.ndarray):
                        lst = [hamming(s1, s2) for s1, s2 in zip(arr1, arr2)]
                        return np.array(lst)
                    return hamming(arr1, arr2)
                # Threshold for hamming distance between two activation pattern
                thresh = 0.20
                # Two activation patterns will be considered same if their 
                # hamming distance is less than `thresh`
                h_dist = hamming_dist(curr_activ_pattern, self.initial_pattern)
                _num_changed = np.sum(np.where(h_dist <= thresh, 0, 1))

                num_units = len(curr_activ_pattern)
                self.frac_activation_changed = _num_changed/num_units

                # Computing change in weights
                _current_w, _curr_bias = self.get_weight_val('DenseLayer_1')
                _init_w, _init_norm = self.initial_weights[0]
                _diff_w = _current_w - _init_w
                self.change_in_weights = np.linalg.norm(_diff_w) / _init_norm

                feed_dict = {self.is_training: False,
                             self.fixed_order_inputs: False}
                feed_dict.update(self.feed_dict)
                vars_to_fetch = [self.pred_output, self.test_gt, 
                                 self.gradients[0], self.p_gradients[0]]
                pred, gt, grad, p_grad = self.sess.run(vars_to_fetch, 
                                                       feed_dict=feed_dict)

                # Calculating test accuracy
                num_correct = np.sum(np.where(pred == gt, 1, 0))
                self.test_accuracy = num_correct / len(pred)

                # Calculating difference between true and pseudo gradients
                grad, p_grad = grad[0], p_grad[0]
                p_grad_norm = np.linalg.norm(p_grad)
                self.grad_coupling = np.linalg.norm(grad - p_grad) / p_grad_norm

                vals_of_vars = [getattr(self, var) for var in vars_to_log]
                ckpt.add_log(step, vals_of_vars)
                tok = time.time()
                time_taken = "{:.3f}sec".format(float(tik - tok))
                verbose += " (Time taken for logging = %s)"%(time_taken)
                ckpt.add_verbose(verbose)
                print(verbose)

            if (step + 1) % self.cfg.ckpt_step == 0:   
                # evaluate on test set and save model
                if self.cfg.test:
                    self.test(1)

                if self.cfg.save_model:
                    model_dir = os.path.join(self.cfg.expt_dir, 
                                             self.cfg.expt_name)
                    if not os.path.exists(model_dir):
                        os.mkdir(model_dir)
                    if not os.path.exists(os.path.join(model_dir, 'models')):
                        os.mkdir(os.path.join(model_dir, 'models'))
                    with self.comp_graph.as_default():
                        save_path = os.path.join(model_dir, 'models/model')
                        self.saver.save(self.sess, save_path, global_step=step)
                    verbose = "Checkpoint created after %d steps"%(step + 1)
                    ckpt.add_verbose(verbose)
                    print(verbose)

                model_dir = os.path.join(self.cfg.expt_dir, self.cfg.expt_name)
                if not os.path.exists(model_dir):
                    os.mkdir(model_dir)
                ckpt.save(model_dir)

            if step == self.cfg.max_steps - 1:
                curr_w, curr_b = self.get_weight_val('DenseLayer_1')
                init_w, init_b = self.initial_weights[0]
                diff_w = curr_w - init_w
                self.diff_eigvals = np.linalg.eigvals(diff_w.dot(diff_w.T))
                self.final_eigvals = np.linalg.eigvals(curr_w.dot(curr_w.T)) 
                _combined = np.stack([self.diff_eigvals, self.final_eigvals], 
                                     axis=0).T
                np.savetxt(os.path.join(model_dir, 'eigen_vals.txt'), _combined)

    def test(self, n_batches):
        with self.comp_graph.as_default():
            print("\nTesting model on test dataset ...")
            tik = time.time()
            accuracy = 0.0
            for data_idx in range(n_batches):
                feed_dict = {self.is_training: False, 
                             self.fixed_order_inputs: False}
                feed_dict.update(self.feed_dict)
                pred, gt = self.sess.run([self.pred_output, self.test_gt],
                                         feed_dict=feed_dict)
                num_correct = np.sum(np.where(pred == gt, 1, 0))
                acc = num_correct / len(pred)
                accuracy += acc;
            accuracy = accuracy * 100.0 / n_batches
            tok = time.time()
            print("Accuracy on test data = {:.2f}%".format(accuracy))
            time_taken = float(tik - tok)
            print("Time taken for testing = {:.3f}sec\n".format(time_taken))

    def score(self, test_images, test_labels):
        with self.comp_graph.as_default():
            feed_dict = {self.input_batch: test_images, 
                         self.gt_batch: test_labels, 
                         self.is_training: False, 
                         self.fixed_order_inputs: False}
            feed_dict.update(self.feed_dict)
            pred = self.sess.run(self.pred_output, feed_dict=feed_dict)
            num_correct = np.sum(np.where(pred == test_lables, 1, 0))
            test_accuracy = num_correct / len(pred)
        return test_accuracy

    def predict(self, inputs, one_hot_output=False):
        with self.comp_graph.as_default():
            feed_dict = {self.test_inputs: inputs, self.is_training: False, 
                         self.fixed_order_inputs: False}
            feed_dict.update(self.feed_dict)
            out = self.sess.run(self.pred_output, feed_dict=feed_dict)
            return out

    def load(self, model_dir):
        """ Loads the model stored at the latest check point in the given 
        directory 
        """
        with self.comp_graph.as_default():
            latest_checkpoint = tf.train.latest_checkpoint(model_dir)
            saver = tf.train.Saver()
            self.sess.run(tf.local_variables_initializer())
            saver.restore(self.sess, latest_checkpoint)

    def get_layer_val(self, inputs, layer_idx, unit_idx=0):
        assert (layer_idx < len(self.cfg.layer_sizes))
        with self.comp_graph.as_default():
            feed_dict = {self.test_inputs: inputs, self.is_training: False,
                         self.fixed_order_inputs: False}
            feed_dict.update(self.feed_dict)
            layer_output = self.sess.run(self.layer_outputs[layer_idx], 
                                         feed_dict=feed_dict)

        if isinstance(unit_idx, int):
            return layer_output[:, unit_idx]
        elif isinstance(unit_idx, tuple) or isinstance(unit_idx, list):
            assert(len(unit_idx) == 2)
            assert (unit_idx[0] >= 0 and unit_idx[0] < unit_idx[1] and 
                    unit_idx[1] < len(layer_output))
            return layer_output[: ,unit_idx[0]: unit_idx[1]]

        raise TypeError(("Wrong type of argument `unit_idx` passed to", 
                        "`get_layer_val` function.\n Expected either an int or",
                        "a list or tuple of size 2"))
        return 

    def get_weight_val(self, layer_name):
        kernel_val, bias_val = None, None
        kernel_val = self.comp_graph\
                         .get_tensor_by_name(layer_name + '/kernel:0')\
                         .eval(session=self.sess)
        bias_val = self.comp_graph\
                       .get_tensor_by_name(layer_name + '/bias:0')\
                       .eval(session=self.sess)
        return kernel_val, bias_val


## Configurations

In [0]:

class Configuration:
    """ Class containing the parameters that design each experiment.
    """
    def __init__(self):
        self.expt_name = 'Experiment1'
        self.expt_dir = 'drive/My Drive/CS7020_MiniProject'
        self.task = 'classification'

        # Dataset properties
        self.n_train_samples = 1000
        self.n_test_samples = 1000
        self.n_classes = 10
        self.n_comp = 2
        self.n_dim = 100

        # Architecture design
        self.layer_sizes = [self.n_dim, 5000, self.n_classes]
        self.activation_fn = 'relu'
        self.loss_fn = 'sparse_softmax_cross_entropy_with_logits'

        # Training configuration
        self.trainable_output_weights = False
        self.learning_rate = 10.0 / self.layer_sizes[1]
        self.batch_size = 16
        self.max_steps = 400
        self.optimizer = 'sgd'
        self.test = False

        self.log_step = 5
        self.ckpt_step = 50
        self.save_model = True

        self.comments = "Testing with hamming distance thresh 0.20"

    def process(self):
        # activation function
        if hasattr(tf.nn, self.activation_fn):
            self.activation_fn = getattr(tf.nn, self.activation_fn)
        elif hasattr(tf.math, self.activation_fn):
            self.activation_fn = getattr(tf.math, self.activation_fn)

        # loss function
        if hasattr(tf.losses, self.loss_fn):
            self.loss_fn = getattr(tf.losses, self.loss_fn)
        elif hasattr(tf.math, self.loss_fn):
            self.loss_fn = getattr(tf.math, self.loss_fn)
        elif hasattr(tf.nn, self.loss_fn):
            self.loss_fn = getattr(tf.nn, self.loss_fn)

        # optimizer
        if self.optimizer.lower() == 'sgd':
            self.optimizer = 'GradientDescent'
        self.optimizer += 'Optimizer'
        if hasattr(tf.train, self.optimizer):
            self.optimizer = getattr(tf.train, self.optimizer)
        else:
            print("%s optimizer doesnt exist in tensorflow"%self.optimizer)

    def save(self):
        model_dir = os.path.join(self.expt_dir, self.expt_name)
        if not os.path.exists(model_dir):
            os.mkdir(model_dir)
        lst = []
        for keys, vals in Configuration().__dict__.items():
            curr_line = keys + " : {}".format(vals)
            lst.append(curr_line)
        np.savetxt(os.path.join(model_dir, 'config.txt'), lst, fmt='%s',
                   delimiter='\n')



## Training & Testing

In [8]:
config = Configuration()
config.process()
config.save()
model = Model(config)
model.train()


Instructions for updating:
Use `tf.variables_initializer` instead.
[Step 1/400] Training loss = 2.69164   (Time taken = 0.072sec) (Time taken for logging = -0.894sec)
[Step 6/400] Training loss = 1.32467   (Time taken = 0.271sec) (Time taken for logging = -0.765sec)
[Step 11/400] Training loss = 0.66234   (Time taken = 0.270sec) (Time taken for logging = -0.756sec)
[Step 16/400] Training loss = 0.52956   (Time taken = 0.265sec) (Time taken for logging = -0.714sec)
[Step 21/400] Training loss = 0.32590   (Time taken = 0.264sec) (Time taken for logging = -0.710sec)
[Step 26/400] Training loss = 0.24289   (Time taken = 0.273sec) (Time taken for logging = -0.745sec)
[Step 31/400] Training loss = 0.14602   (Time taken = 0.282sec) (Time taken for logging = -0.742sec)
[Step 36/400] Training loss = 0.13372   (Time taken = 0.270sec) (Time taken for logging = -0.736sec)
[Step 41/400] Training loss = 0.10619   (Time taken = 0.270sec) (Time taken for logging = -0.735sec)
[Step 46/400] Training los