# Online Local Adaptive Model - Notebook 2

* Prior probability shift, i.e. a change in the label distribution between training and test data, is a common problem in machine learning.
* There are approaches for dealing with this problem in a 'static' scenario, but in some situations we need a model that handles sequential input data (e.g. a server that receives input from different users, each with a different data distribution).
* In this project, we try to build a model that adapts its predictions to the local label distribution.

### About notebook 2

In this notebook we illustrate how a test distribution that differs from the training distribution affects the model's performance.
We train multiple models on a range of MNIST subsets with different label distributions. Each model is then tested on a range of test subsets that follow the distributions considered in the training phase.
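
As a rough illustration of why the test label distribution matters, the sketch below computes the accuracy a fixed classifier would obtain under different label distributions. It assumes pure prior probability shift (per-class recall stays the same and only the label proportions change), so accuracy is the recall averaged with the label weights. The recall values and the helper name `expected_accuracy` are hypothetical and are not taken from the experiments in this notebook.

```python
import numpy as np

def expected_accuracy(per_class_recall, label_distr):
    """Accuracy of a fixed classifier when the test labels follow label_distr.

    Assumes per-class recall does not change with the label distribution.
    """
    per_class_recall = np.asarray(per_class_recall, dtype=float)
    label_distr = np.asarray(label_distr, dtype=float)
    label_distr = label_distr / label_distr.sum()  # normalize to a distribution
    return float(np.dot(per_class_recall, label_distr))

# Hypothetical recall of a model that is better on high digits than on low ones
recall = np.linspace(0.80, 0.98, 10)

uniform = np.full(10, 0.1)
skewed_high = np.arange(1, 11, dtype=float)  # most test examples are high digits
skewed_low = skewed_high[::-1]               # most test examples are low digits

acc_uniform = expected_accuracy(recall, uniform)
acc_high = expected_accuracy(recall, skewed_high)
acc_low = expected_accuracy(recall, skewed_low)
print(acc_uniform, acc_high, acc_low)
```

The same model scores differently on each test set, even though nothing about the model changed.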

## Section 1 - Notebook setup and data preparation

### Notebook setup

In [None]:
from IPython.display import display, HTML, Image
display(HTML("<style>.container { width:100% !important; }</style>"))
%matplotlib inline
# %matplotlib qt
%load_ext autoreload
%autoreload 2

### Imports

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
import time
from collections import deque
import os
import pickle
from training_plotter import TrainingPlotter
from dataset import MNISTDataset
from utils import Utils
from lenet5 import Lenet5

# numpy print options
np.set_printoptions(linewidth = 150)
np.set_printoptions(edgeitems = 10)
np.set_printoptions(precision=3)
pd.set_option('display.precision', 3)

### Set seed

In [None]:
# create a random generator using a constant seed in order to reproduce results
seed = 112358
nprg = np.random.RandomState(seed)

### Import MNIST dataset

In [None]:
MNIST_TRAIN_IMAGES_FILEPATH = 'MNIST_dataset/train-images.idx3-ubyte'
MNIST_TRAIN_LABELS_FILEPATH = 'MNIST_dataset/train-labels.idx1-ubyte'
MNIST_TEST_IMAGES_FILEPATH = 'MNIST_dataset/t10k-images.idx3-ubyte'
MNIST_TEST_LABELS_FILEPATH = 'MNIST_dataset/t10k-labels.idx1-ubyte'

mnist_ds = MNISTDataset(MNIST_TRAIN_IMAGES_FILEPATH, MNIST_TRAIN_LABELS_FILEPATH, MNIST_TEST_IMAGES_FILEPATH, MNIST_TEST_LABELS_FILEPATH)


### Use a subset of the MNIST dataset

In [None]:
mnist_subset = MNISTDataset(MNIST_TRAIN_IMAGES_FILEPATH, MNIST_TRAIN_LABELS_FILEPATH, MNIST_TEST_IMAGES_FILEPATH, MNIST_TEST_LABELS_FILEPATH)
subset_size = 10000
mnist_subset.impose_distr_on_train_dataset(subset_size=subset_size, weights = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
print(np.sum(mnist_subset.train.images))
print(mnist_subset.summary)

### Data augmentation

In [None]:
mnist_ds_aug = MNISTDataset(MNIST_TRAIN_IMAGES_FILEPATH, MNIST_TRAIN_LABELS_FILEPATH, MNIST_TEST_IMAGES_FILEPATH, MNIST_TEST_LABELS_FILEPATH)
mnist_ds_aug.enhance_with_random_rotate(ratio = 2)

In [None]:
mnist_ds_aug.enhance_with_random_zoomin(ratio = 2)

In [None]:
mnist_ds_aug.enhance_with_random_zoomin_and_rotate(ratio = 2)

## Section 2 - Train and test models on subsets with different distributions

#### Using the same amount of training data for subsets with different class distributions

We will train up to 10 * num_distributions_considered models, as follows:
- consider a set of n distributions used for generating subsets of the original dataset:  


\begin{equation}
\{weights_1,weights_2,...,weights_n\}
\text{, where } 
weights_k = \{weight_{class_0},weight_{class_1},...,weight_{class_9}\}
\end{equation}    

- then, for each distribution, we'll consider another 9 distributions obtained by circularly shifting the original one:
\begin{equation}
\text{For the } k^{th} \text{ distribution, we'll also consider:}\\
\{weight_{class_1},weight_{class_2},...,weight_{class_9},weight_{class_0}\}, 
\{weight_{class_2},weight_{class_3},...,weight_{class_0},weight_{class_1}\}, 
\dots, 
\{weight_{class_9},weight_{class_0},...,weight_{class_7},weight_{class_8}\} \\
\end{equation}    
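
The circular shifts above can be sketched with `np.roll`. This is a minimal illustration, not the code used later in the notebook (which rotates a `deque` in the opposite direction; the resulting set of shifts is the same). The helper name `circular_shifts` is hypothetical. Duplicate shifts are skipped, so a uniform distribution yields a single entry.

```python
import numpy as np

def circular_shifts(weights):
    """Return the distinct left circular shifts of a weight vector.

    The first element of the result is the original vector.
    """
    weights = np.asarray(weights, dtype=float)
    shifts = []
    for k in range(len(weights)):
        shifted = np.roll(weights, -k)  # element k moves to position 0
        if any(np.allclose(shifted, s) for s in shifts):
            break  # the sequence of shifts has started repeating
        shifts.append(shifted)
    return shifts

# Example: a geometric distribution with ratio 1.4, normalized to sum to 1
base = 1.4 ** np.arange(10)
base = base / base.sum()
shifts = circular_shifts(base)
print(len(shifts))
```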

To keep the number of examples constant across all the distributions considered, we size the subsets for the 'worst' case, i.e. the one that allows the fewest examples: the largest weight (over all distributions considered) is assigned to the class with the fewest examples. Since the weights of each distribution sum to 1, this gives:

\begin{equation}
\text{Considering }\ count_{min} = \min\{count_{class_i} \ |\ i = \overline{0,9}\} \text{ and }\\
\ weight_{max} = \max\{weight_{class_i} \text{ of } weights_k \ |\ i = \overline{0,9},\ k = \overline{1,n}\} \ \ \text{(i.e. the maximum weight over all n sets of weights)} \\
num\_examples = \sum_{i=0}^{9} count_{min} \cdot weight_{class_i} \cdot \frac{1}{weight_{max}} = \frac{count_{min}}{weight_{max}} \sum_{i=0}^{9} weight_{class_i} = \frac{count_{min}}{weight_{max}} \\
\end{equation}
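
A small worked example of the rule above, as a sketch under stated assumptions: the per-digit counts are hard-coded for illustration, and the helper names `max_num_examples` and `per_class_counts` are hypothetical rather than part of the `MNISTDataset` class. The check at the end confirms that, with this subset size, no class needs more examples than are available, whichever shift of the distribution is imposed.

```python
import numpy as np

# Per-digit counts of the MNIST training set, hard-coded here for illustration
counts = np.array([5923, 6742, 5958, 6131, 5842, 5421, 5918, 6265, 5851, 5949])

def max_num_examples(counts, distributions):
    """Largest subset size that every distribution can be imposed with."""
    count_min = np.min(counts)
    weight_max = np.max(distributions)  # maximum over all sets of weights
    return int(np.floor(count_min / weight_max))

def per_class_counts(weights, num_examples):
    """Number of examples needed from each class for a given subset size."""
    return np.floor(np.asarray(weights) * num_examples).astype(int)

# A geometric distribution with ratio 1.4 and all of its circular shifts
weights = 1.4 ** np.arange(10)
weights = weights / weights.sum()
distributions = [np.roll(weights, -k) for k in range(10)]

n = max_num_examples(counts, distributions)
print('num_examples =', n)
for w in distributions:
    # no class is asked for more examples than it has
    assert np.all(per_class_counts(w, n) <= counts)
```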

### 1. Build some label distributions

In [None]:
# Generate some distributions using geometric progressions
ratios = [1,1.05,1.1,1.2,1.4,2,4]
perms_per_ratio = 5
generated_distrs = []
for idx, ratio in enumerate(ratios):
    distr = [0.1]
    for i in range(9):
        distr.append(distr[i] * ratio)
    distr = np.array(distr)
    distr /= np.sum(distr)
    print('generated_distr[{}]:'.format(idx))
    print(', '.join([str(np.round(x, decimals=3)) for x in distr]))
#     plt.bar(range(10),distr)
#     plt.show()
    generated_distrs.append(distr.tolist())
    
# Select some of the above distributions and build others by circular shifting
base_distrs = np.array([generated_distrs[0],
                       generated_distrs[2],
                      generated_distrs[4]])
num_distributions_considered = base_distrs.shape[0]
print('\nBase distributions considered:\n', base_distrs)
for distr in base_distrs:
    print(distr)
    plt.bar(range(10),distr)
    plt.show()
    
# check the maximum number of examples that can be used using the above rule
counts_per_label_training = np.bincount(np.argmax(mnist_ds.train.labels, axis=1))
counts_per_label_test = np.bincount(np.argmax(mnist_ds.test.labels, axis=1))
print('Counts for each digit (training): ', counts_per_label_training)
print('Counts for each digit (test): ', counts_per_label_test)
counts_min_training = np.min(counts_per_label_training)
counts_min_test = np.min(counts_per_label_test)
weight_max = np.max(base_distrs)
max_num_examples_training = np.floor(counts_min_training / weight_max).astype(np.int32)
max_num_examples_test = np.floor(counts_min_test / weight_max).astype(np.int32)
print('max_num_examples for training = ', max_num_examples_training)
print('max_num_examples for test = ', max_num_examples_test)
# round to hundreds
max_num_examples_training -= max_num_examples_training % 100
max_num_examples_test -= max_num_examples_test % 100
print('max_num_examples_training (rounded down to hundreds): ',max_num_examples_training)
print('max_num_examples_test (rounded down to hundreds): ',max_num_examples_test)

# build the list of training distributions: each base distribution plus its distinct circular shifts
distrs_used_for_training = []
for base_distr in base_distrs:
    distr = deque(base_distr)
    for i in range(10):
        print(distr)
        distrs_used_for_training.append(distr.copy())
        last_distr = distr.copy()
        distr.rotate(1)
        if np.sum(np.abs(np.array(last_distr) - np.array(distr))) < 1.e-5:
            break
print('#distributions used for training = {}'.format(len(distrs_used_for_training)))

In [None]:
# Build some distributions by hand
distrs_used_for_training = []

# uniform distribution
distr = np.array([1,1,1,1,1,1,1,1,1,1])
distrs_used_for_training.append(distr/np.sum(distr))
# normal distribution centered about label 4-5
r = 2
distr = [r**1,r**2,r**3,r**4,r**5,r**5,r**4,r**3,r**2,r**1]
distrs_used_for_training.append(distr/np.sum(distr))

# skewed normal distribution centered about 2
distr = [r**3,r**4,r**5,r**4.5,r**4,r**3.5,r**3,r**2.5,r**2,r**1.5]
distrs_used_for_training.append(distr/np.sum(distr))

# skewed normal distribution centered about 7
distr = [r**1.5,r**2,r**2.5,r**3,r**3.5,r**4,r**4.5,r**5,r**4,r**3]
distrs_used_for_training.append(distr/np.sum(distr))

# bimodal normal distribution
distr = [r**1,r**2,r**3,r**2,r**1,r**1,r**2,r**3,r**2,r**1]
distrs_used_for_training.append(distr/np.sum(distr))

# bimodal skewed normal distribution
distr = [r**3.5,r**4,r**3,r**2,r**1,r**1,r**2,r**3,r**4,r**3.5]
distrs_used_for_training.append(distr/np.sum(distr))


# exponential distribution (increasing)
r=1.4
distr = [r**1,r**2,r**3,r**4,r**5,r**6,r**7,r**8,r**9,r**10]
distrs_used_for_training.append(distr/np.sum(distr))

# exponential distribution (decreasing)
r=1.4
distr = [r**10,r**9,r**8,r**7,r**6,r**5,r**4,r**3,r**2,r**1]
distrs_used_for_training.append(distr/np.sum(distr))

print('#distributions used for training = {}'.format(len(distrs_used_for_training)))
for idx, distr in enumerate(distrs_used_for_training):
    print('idx = {}: distr = {}'.format(idx,distr))
    plt.bar(range(10), distr)
    plt.show()

### 2. Train LeNet5 models by imposing the considered distributions on the original MNIST dataset

In [None]:
global_max_weight = np.max(distrs_used_for_training)
mnist_ds.backup()
for k, distr in enumerate(distrs_used_for_training):
    print('\n\nk = {}: Imposed distribution: {}'.format(k, np.round(np.array(distr), decimals=3)))
    mnist_ds = MNISTDataset(MNIST_TRAIN_IMAGES_FILEPATH, MNIST_TRAIN_LABELS_FILEPATH, MNIST_TEST_IMAGES_FILEPATH, MNIST_TEST_LABELS_FILEPATH)
    mnist_ds.impose_distribution(np.array(distr), global_max_weight)
    lenet5_model = Lenet5(mnist_ds, "with_imposed_distr_{}_".format(k), epochs=40, batch_size=256, variable_mean=0, variable_stddev=0.1, learning_rate=0.001, drop_out_keep_prob=0.5)
    lenet5_model.train()

#### Use a subset of  MNIST dataset

In [None]:
subset_size = 10000
global_max_weight = np.max(distrs_used_for_training)
for k, distr in enumerate(distrs_used_for_training):
    mnist_ds = MNISTDataset(MNIST_TRAIN_IMAGES_FILEPATH, MNIST_TRAIN_LABELS_FILEPATH, MNIST_TEST_IMAGES_FILEPATH, MNIST_TEST_LABELS_FILEPATH)
#     mnist_ds.train.shuffle()
    print('\n\nk = {}: Imposed distribution: {}'.format(k, np.round(np.array(distr), decimals=3)))
    mnist_ds.impose_distribution(np.array(distr), global_max_weight, max_training_size=subset_size)
    lenet5_model = Lenet5(mnist_ds, "with_imposed_distr_{}_{}samples".format(k, subset_size), epochs=40, batch_size=256, variable_mean=0, variable_stddev=0.1, learning_rate=0.001, drop_out_keep_prob=0.5)
    lenet5_model.train()

In [None]:
SUBSET_SIZE_LIST = [150, 250, 500, 1000, 5000, 10000]
global_max_weight = np.max(distrs_used_for_training)

for subset_size in SUBSET_SIZE_LIST:
    for k, distr in enumerate(distrs_used_for_training):
        mnist_ds = MNISTDataset(MNIST_TRAIN_IMAGES_FILEPATH, MNIST_TRAIN_LABELS_FILEPATH, MNIST_TEST_IMAGES_FILEPATH, MNIST_TEST_LABELS_FILEPATH)
        print('\n\nk = {}: Imposed distribution: {}'.format(k, np.round(np.array(distr), decimals=3)))
        mnist_ds.impose_distribution(np.array(distr), global_max_weight, max_training_size=subset_size)
        lenet5_model = Lenet5(mnist_ds, "with_imposed_distr_{}_{}samples".format(k, subset_size), epochs=40, batch_size=50, variable_mean=0, variable_stddev=0.1, learning_rate=0.001, drop_out_keep_prob=0.75)
        lenet5_model.train()
        plt.show()

### 3. Test the trained models on their own test data and on data that follows the other distributions

To summarize the results, we build a matrix, similar to a confusion matrix, as follows:
- take each trained model and test it on its own test set
- then test each model, in turn, on data that follows each of the other distributions considered
- save these results into a dictionary, and then to disk for later reuse
- finally, build a matrix where an element $m_{ij}$ represents the test accuracy of model $i$ evaluated on the subset with distribution $j$
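
The last step can be sketched with a pandas pivot, which places each accuracy by its (model, distribution) indices instead of relying on the order of the rows. This is only an illustration: the accuracy values below are made up, and the notebook itself builds the matrix with a reshape.

```python
import numpy as np
import pandas as pd

# Hypothetical results for 3 models tested on 3 distributions
records = {
    'idx_model': [0, 0, 0, 1, 1, 1, 2, 2, 2],
    'idx_distr': [0, 1, 2, 0, 1, 2, 0, 1, 2],
    'test_acc':  [0.99, 0.97, 0.95, 0.96, 0.99, 0.94, 0.93, 0.95, 0.99],
}
perf_df = pd.DataFrame(records)

# m[i, j] = test accuracy of model i on the subset with distribution j
acc_df = perf_df.pivot(index='idx_model', columns='idx_distr', values='test_acc')
acc_matrix = acc_df.to_numpy()

# Accuracy lost by each model when moving away from its own distribution
own_acc = np.diag(acc_matrix)
drop = own_acc[:, None] - acc_matrix
print(acc_matrix)
print(drop)
```

The diagonal holds each model's accuracy on its own distribution, so `drop` is zero there and shows the cost of the shift everywhere else.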


In [None]:
def get_all_files_from_dir_ending_with(directory, ending, without_file_extension = False):
    file_list = []
    files = os.listdir(directory)
    files.sort(key=lambda fn: os.path.getmtime(os.path.join(directory, fn))) # sort by date
    for file in files:
        if file.endswith(ending):
            if without_file_extension:
                file_list.append(os.path.splitext(file)[0])
            else:
                file_list.append(file)
    return file_list

In [None]:
# ckpt_dir = "./results/PriorProbabilityShift_experiment_5_augmented/"
ckpt_dir = "./results/PriorProbabilityShift_experiment_5_10000samples/"
# ckpt_dir = "./results/PriorProbabilityShift_experiment_5_5000samples/"
ckpt_file_list = get_all_files_from_dir_ending_with(ckpt_dir, "ckpt.meta", without_file_extension=True)
perf_dict = {'idx_model':[], 'idx_distr':[], 'test_loss':[], 'test_acc':[], 'total_predict':[], 'total_actual':[], 'correct_predicted_distr':[], 'wrong_predicted_distr':[], 'wrong_actual_distr':[], 'train_distr':[], 'test_distr':[], 'ckpt_file':[]}

# build a list with all distributions considered in the training phase
distrs_used_for_training = []
for idx_model, ckpt_file in enumerate(ckpt_file_list):
    print('Restoring model {} from {}'.format(idx_model, ckpt_file))
    temp_model = Lenet5(mnist_dataset=mnist_ds, display_summary=False)
    temp_model.restore_session(ckpt_dir=ckpt_dir, ckpt_filename=ckpt_file)
    current_model_train_distr = temp_model.session.run(temp_model.train_distr)
    distrs_used_for_training.append(current_model_train_distr)   
    print('The restored model {} was trained using distr: {}\n'.format(idx_model, current_model_train_distr))

# test each model on each of the above distributions
print('\n\n\n--- test each model on all the above distributions ---\n')
for idx_model, ckpt_file in enumerate(ckpt_file_list):
    for idx_distr, distr in enumerate(distrs_used_for_training):
        # reload every time the original dataset in order to ensure that we build the subset starting from the same point
        mnist_ds = MNISTDataset(MNIST_TRAIN_IMAGES_FILEPATH, MNIST_TRAIN_LABELS_FILEPATH, MNIST_TEST_IMAGES_FILEPATH, MNIST_TEST_LABELS_FILEPATH)
#         mnist_ds.test.shuffle()
#         mnist_ds.test.shuffle()
#         mnist_ds.test.shuffle()
        print('Restoring model from {}'.format(ckpt_file))
        temp_model = Lenet5(mnist_ds, display_summary=False)
        temp_model.restore_session(ckpt_dir=ckpt_dir, ckpt_filename=ckpt_file)
        restored_train_distr = temp_model.session.run(temp_model.train_distr)
        print('train_distr: {}'.format(restored_train_distr))
        print('test_distr: {}'.format(distr))
        mnist_ds.impose_distribution(np.array(distr), np.max(distrs_used_for_training))
        test_loss, test_acc, total_predict, total_actual, wrong_predict_images, _= temp_model.test_data(mnist_ds.test, use_only_one_batch=True)
        perf_dict['idx_model'].append(idx_model)
        perf_dict['idx_distr'].append(idx_distr)
        perf_dict['test_loss'].append(test_loss)
        perf_dict['test_acc'].append(test_acc)
        perf_dict['total_predict'].append(total_predict)
        perf_dict['total_actual'].append(total_actual)
        correct_predict = total_predict[total_actual == total_predict]
        wrong_predict = total_predict[total_actual != total_predict]
        wrong_actual = total_actual[total_actual != total_predict]
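        # use one bin per digit; the default bins span only the observed label range
        perf_dict['correct_predicted_distr'].append(np.histogram(correct_predict, bins=np.arange(11))[0])
        perf_dict['wrong_predicted_distr'].append(np.histogram(wrong_predict, bins=np.arange(11))[0])
        perf_dict['wrong_actual_distr'].append(np.histogram(wrong_actual, bins=np.arange(11))[0])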
        perf_dict['train_distr'].append(restored_train_distr)
        perf_dict['test_distr'].append(mnist_ds.test.label_distr)
        perf_dict['ckpt_file'].append(ckpt_file)
        print('idx_model = {}, idx_distr = {}: test_loss = {:.3f}, test_acc = {:.3f} ({}/{})\n\n\n'.format(idx_model, idx_distr, test_loss, test_acc, mnist_ds.test.num_examples - len(wrong_predict_images), mnist_ds.test.num_examples))
        
# save the above results dictionary to file
filename = 'results_{}.dict.pickle'.format(Utils.now_as_str())
full_filepath = os.path.join(ckpt_dir, filename)
with open(full_filepath, 'wb') as filehandler:
    pickle.dump(perf_dict, filehandler)
print('Results dictionary was successfully saved to: {}'.format(full_filepath))


In [None]:
# restore and analyze the results from the above dictionary obtained by testing the models on all distributions
# work_dir = './results/PriorProbabilityShift_experiment_5/'
# filename = 'results_2018_03_18---20_56.dict.pickle'
# filename = 'results_2018_03_20---12_33.dict.pickle' # after 1 test dataset shuffles 
# filename = 'results_2018_03_20---12_42.dict.pickle' # after 2 test dataset shuffles
# filename = 'results_2018_03_20---12_55.dict.pickle' # after 3 test dataset shuffles


# work_dir = './results/PriorProbabilityShift_experiment_5_2/'
# filename = 'results_2018_03_20---14_28.dict.pickle'


# work_dir = './results/PriorProbabilityShift_experiment_5_3/'
# filename = 'results_2018_03_20---23_26.dict.pickle'

# work_dir = './results/PriorProbabilityShift_experiment_5_augmented/'
# filename = 'results_2018_03_21---11_54.dict.pickle'
# filename = 'results_2018_05_08---00_42.dict.pickle'

work_dir = './results/PriorProbabilityShift_experiment_5_10000samples/'
# filename = 'results_2018_05_07---11_38.dict.pickle'
filename = 'results_2018_05_08---01_03.dict.pickle'

# work_dir = './results/PriorProbabilityShift_experiment_5_5000samples/'
# filename = 'results_2018_05_07---12_17.dict.pickle'


with open(os.path.join(work_dir, filename), 'rb') as filehandler:
    restored_perf_dict = pickle.load(filehandler)
print('Results dictionary was successfully restored from: {}'.format(filename))

# build a pandas dataframe from results dictionary
perf_df = pd.DataFrame(restored_perf_dict, columns=list(restored_perf_dict.keys()))
display(perf_df.describe())
display(perf_df)

# build and plot the accuracy comparison matrix
L = np.sqrt(len(perf_df)).astype(np.int32)
acc_matrix = np.array(perf_df['test_acc']).reshape((L, L))
distrs_used_for_training = perf_df['train_distr'][perf_df['idx_model'] == perf_df['idx_distr']]
acc_matrix_plt = Utils.plot_acc_matrix(train_distributions=distrs_used_for_training, acc_matrix=acc_matrix)
acc_matrix_plt.savefig(os.path.join(work_dir,filename + '.acc_matrix.png'))

# build and plot distributions corresponding to wrong predictions
wrong_predicted_distr_matrix = np.array(perf_df['wrong_predicted_distr']).reshape((L, L))
distrs_used_for_training = perf_df['train_distr'][perf_df['idx_model'] == perf_df['idx_distr']]
acc_matrix_plt = Utils.plot_acc_matrix(train_distributions=distrs_used_for_training, acc_matrix=acc_matrix, distr_matrix=wrong_predicted_distr_matrix)
acc_matrix_plt.savefig(os.path.join(work_dir,filename + '.wrong_predictions_distr_matrix.png'))

# build and plot distributions corresponding to correct predictions
correct_predicted_distr_matrix = np.array(perf_df['correct_predicted_distr']).reshape((L, L))
distrs_used_for_training = perf_df['train_distr'][perf_df['idx_model'] == perf_df['idx_distr']]
acc_matrix_plt = Utils.plot_acc_matrix(train_distributions=distrs_used_for_training, acc_matrix=acc_matrix, distr_matrix=correct_predicted_distr_matrix)
acc_matrix_plt.savefig(os.path.join(work_dir,filename + '.correct_predictions_distr_matrix.png'))

# build and plot the distributions of the actual labels of wrongly predicted examples
wrong_actual_distr_matrix = np.array(perf_df['wrong_actual_distr']).reshape((L, L))
distrs_used_for_training = perf_df['train_distr'][perf_df['idx_model'] == perf_df['idx_distr']]
acc_matrix_plt = Utils.plot_acc_matrix(train_distributions=distrs_used_for_training, acc_matrix=acc_matrix, distr_matrix=wrong_actual_distr_matrix)
acc_matrix_plt.savefig(os.path.join(work_dir,filename + '.wrong_actual_predictions_distr_matrix.png'))


In [None]:
# Analyze average acc matrix
work_dir = './results/PriorProbabilityShift_experiment_5/'
dict_file_list = get_all_files_from_dir_ending_with(work_dir, "dict.pickle")
acc_matrices_list = [] # in order to make a comparison between results
plot = False
for dict_file in dict_file_list:
    # restore the results dictionary
    filename = os.path.join(work_dir, dict_file)
    with open(filename, 'rb') as filehandler:
        restored_perf_dict = pickle.load(filehandler)
    print('Results dictionary was successfully restored from: {}'.format(filename))
    
    # build a pandas dataframe from results dictionary
    perf_df = pd.DataFrame(restored_perf_dict, columns=list(restored_perf_dict.keys()))
    display(perf_df.describe())
    display(perf_df)
    
    # build a matrix with all test accuracies and plot it
    print(len(perf_df))
    L = np.sqrt(len(perf_df)).astype(np.int32)
    acc_matrix = np.array(perf_df['test_acc']).reshape((L, L))
    acc_matrices_list.append(acc_matrix)
    if plot:
        acc_matrix_plt = Utils.plot_acc_matrix(train_distributions=distrs_used_for_training, acc_matrix=acc_matrix)
        acc_matrix_plt.savefig('{}.acc_matrix.png'.format(filename))

acc_matrices = np.array(acc_matrices_list)
avg_acc_matrix = np.average(acc_matrices, axis = 0)
acc_matrix_plt = Utils.plot_acc_matrix(train_distributions=distrs_used_for_training, acc_matrix=avg_acc_matrix)
acc_matrix_plt.savefig(os.path.join(work_dir, 'average_acc_matrix.png'))


### 4. Test the trained models on the entire MNIST test set

In [None]:
# compare the first column of the above accuracy matrix with accuracies obtained by testing the models on the entire MNIST dataset
# work_dir = './results/PriorProbabilityShift_experiment_5/'
work_dir = './results/PriorProbabilityShift_experiment_5_10000samples/'
ckpt_file_list = get_all_files_from_dir_ending_with(work_dir, "ckpt.meta", without_file_extension=True)
perf_dict = {'idx_model':[], 'idx_distr':[], 'test_loss':[], 'test_acc':[], 'total_predict':[], 'total_actual':[], 'train_distr':[], 'test_distr':[], 'ckpt_file':[]}
for idx_model, ckpt_file in enumerate(ckpt_file_list):
    mnist_ds = MNISTDataset(MNIST_TRAIN_IMAGES_FILEPATH, MNIST_TRAIN_LABELS_FILEPATH, MNIST_TEST_IMAGES_FILEPATH, MNIST_TEST_LABELS_FILEPATH)
    temp_model = Lenet5(mnist_ds, display_summary=False)
    print('Restoring model from {}'.format(ckpt_file))
    temp_model.restore_session(ckpt_dir=work_dir, ckpt_filename=ckpt_file)
    current_model_train_distr = temp_model.session.run(temp_model.train_distr)
    print('train_distr: {}'.format(current_model_train_distr))
    test_loss, test_acc, total_predict, total_actual, wrong_predict_images, _= temp_model.test_data(mnist_ds.test, use_only_one_batch=True)
    perf_dict['idx_model'].append(idx_model)
    perf_dict['idx_distr'].append(-1)
    perf_dict['test_loss'].append(test_loss)
    perf_dict['test_acc'].append(test_acc)
    perf_dict['total_predict'].append(total_predict)
    perf_dict['total_actual'].append(total_actual)
    perf_dict['train_distr'].append(current_model_train_distr)
    perf_dict['test_distr'].append(mnist_ds.test.label_distr)
    perf_dict['ckpt_file'].append(ckpt_file)
    print('idx_model = {}, idx_distr = {}: test_loss = {:.4f}, test_acc = {:.4f} ({}/{})\n\n\n'.format(idx_model, -1, test_loss, test_acc, mnist_ds.test.num_examples - len(wrong_predict_images), mnist_ds.test.num_examples))
        
# save the above results dictionary to file
filename = 'results_from_testing_on_the_entire_testset.dict2.pickle'
with open(os.path.join(work_dir, filename), 'wb') as filehandler:
    pickle.dump(perf_dict, filehandler)
print('Results dictionary was successfully saved to: {}'.format(filename))

In [None]:
# restore and analyze the results dictionary obtained by testing the models on the entire testset
# work_dir = './results/PriorProbabilityShift_experiment_5/'
work_dir = './results/PriorProbabilityShift_experiment_5_10000samples/'
filename = 'results_from_testing_on_the_entire_testset.dict2.pickle'
with open(os.path.join(work_dir, filename), 'rb') as filehandler:
    restored_perf_dict = pickle.load(filehandler)
print('Results dictionary was successfully restored from: {}{}'.format(work_dir, filename))

# build a pandas dataframe from results dictionary
perf_df = pd.DataFrame(restored_perf_dict, columns=list(restored_perf_dict.keys()))
display(perf_df.describe())
display(perf_df)

###### Test the models on the entire MNIST test set

In [None]:
def restore_and_test_a_model_on_a_mnist_subset(mnist_subset, ckpt_dir, ckpt_filemame, plot_filename):
    print('Restoring model from {}{}'.format(ckpt_dir, ckpt_filemame))
    restored_model = Lenet5(mnist_subset,display_summary=False)
    restored_model.restore_session(ckpt_dir=ckpt_dir, ckpt_filename=ckpt_filemame)
    train_distr = restored_model.session.run(restored_model.train_distr)
    test_loss, test_acc, total_predict, total_actual, wrong_predict_images, total_softmax_output_probs = restored_model.test_data(mnist_subset.test)

    print('test_loss = {:.4f}, test_acc = {:.4f} ({}/{})'.format(test_loss, test_acc, mnist_subset.test.num_examples - len(wrong_predict_images), mnist_subset.test.num_examples))
    
    # sort wrong_predict_images by target label
    correct_predict = total_predict[total_actual == total_predict]
    wrong_predict = total_predict[total_actual != total_predict]
    wrong_predict_softmax_output_probs = total_softmax_output_probs[total_actual != total_predict]
    wrong_actual = total_actual[total_actual != total_predict]
    wrong_predict_images = np.array(wrong_predict_images)
    wrong_predict_images_sorted = wrong_predict_images[wrong_actual.argsort(), ]
    wrong_predict_images_sorted = [image for image in wrong_predict_images_sorted]

    count_figures = 6
    fig = plt.figure(figsize=(30, 3))
    fig.suptitle(y = 1.1, t = 'test_acc = {:.4f} ({}/{})'.format(test_acc, mnist_subset.test.num_examples - len(wrong_predict_images), mnist_subset.test.num_examples), fontsize=18, fontweight='bold')

    k = 1
    plt.subplot(1,count_figures, k)
    plt.bar(range(10), train_distr)
    plt.xticks(range(0, 10))
    plt.title('train label distr')
    
    k+=1
    plt.subplot(1,count_figures, k)
    plt.bar(range(10), mnist_subset.test.label_distr)
    plt.xticks(range(0, 10))
    plt.title('test label distr')

    k+=1
    plt.subplot(1,count_figures, k)
    plt.hist(correct_predict, bins=np.arange(11), rwidth=0.8)
    plt.xticks(range(0, 10))
    plt.title('correct predicted label distr')
    
    k+=1
    plt.subplot(1,count_figures, k)
    plt.hist(wrong_predict, bins=np.arange(11), rwidth=0.8)
    plt.xticks(range(0, 10))
    plt.title('wrong predicted label distr')
    
    k+=1
    plt.subplot(1,count_figures, k)
    plt.hist(wrong_actual, bins=np.arange(11), rwidth=0.8)
    plt.xticks(range(0, 10))
    plt.title('wrong actual label distr')
    
    k+=1
    plt.subplot(1,count_figures, k)
    plt.bar(range(0, 10), np.average(wrong_predict_softmax_output_probs, axis=0))
    plt.xticks(range(0, 10))
    plt.title('average softmax output for wrong predictions')

    plt.savefig(os.path.join(ckpt_dir, plot_filename))
    plt.show()


In [None]:
# ckpt_dir = './results/PriorProbabilityShift_experiment_5/'
ckpt_dir = './results/PriorProbabilityShift_experiment_5_10000samples/'
ckpt_file_list = get_all_files_from_dir_ending_with(ckpt_dir, "ckpt.meta", without_file_extension=True)
for idx, ckpt_file in enumerate(ckpt_file_list):
    mnist_ds = MNISTDataset(MNIST_TRAIN_IMAGES_FILEPATH, MNIST_TRAIN_LABELS_FILEPATH, MNIST_TEST_IMAGES_FILEPATH, MNIST_TEST_LABELS_FILEPATH)
    # test on all original data distribution, without imposing any distribution
    restore_and_test_a_model_on_a_mnist_subset(mnist_ds, ckpt_dir=ckpt_dir, ckpt_filemame=ckpt_file, plot_filename = 'tested_on_all_data_{}'.format(idx))

Obs. In many situations, the distribution of wrongly predicted labels is correlated with the training distribution (especially when the training distribution is very skewed). So the model tends to predict the labels it has seen most often during training.
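
One possible way to put a number on this observation is the Pearson correlation between the training distribution and the normalized histogram of the wrongly predicted labels. The sketch below is an assumption about how this could be measured, not something the notebook computes. The helper names are hypothetical, and the labels are synthetic, drawn only to exercise the function rather than taken from a trained model.

```python
import numpy as np

def label_histogram(labels, num_classes=10):
    """Normalized histogram of integer labels, with one bin per class."""
    labels = np.asarray(labels, dtype=int)
    counts = np.bincount(labels, minlength=num_classes)
    return counts / max(len(labels), 1)

def distr_correlation(wrong_predicted_labels, train_distr):
    """Pearson correlation between the wrong-prediction histogram and train_distr.

    Returns NaN when either input is constant (e.g. a uniform train_distr).
    """
    wrong_distr = label_histogram(wrong_predicted_labels, len(train_distr))
    return float(np.corrcoef(wrong_distr, train_distr)[0, 1])

# Synthetic example: a skewed training distribution (geometric, ratio 1.4)
train_distr = 1.4 ** np.arange(10)
train_distr = train_distr / train_distr.sum()

# Synthetic "wrong predictions" that favour the labels seen most in training
rng = np.random.RandomState(0)
wrong_predict = rng.choice(10, size=500, p=train_distr)

corr = distr_correlation(wrong_predict, train_distr)
print('correlation = {:.3f}'.format(corr))
```

On real results, `wrong_predict` would be the array of the same name computed in `restore_and_test_a_model_on_a_mnist_subset`.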