## Training overview:
There are 5 different ways you can run training: 
1. Without Domain Adaptation (noDA) - training only on the source data, but testing on source and target
2. With Domain Adaptation (DA) using Maximum Mean Discrepancy (MMD) as transfer loss.
3. With Domain Adaptation using adversarial loss from a DANN as transfer loss (ADA).
4. With MMD as transfer loss, with addition of Fisher loss and Entropy minimization loss to make classes more compact and separated (DA+Fisher)
5. With adversarial loss and Fisher and Entropy (ADA+Fisher)
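
To make the idea of MMD as a transfer loss more concrete, here is a minimal sketch of a (biased) squared-MMD estimate between source and target feature vectors. It assumes a single Gaussian kernel with a hypothetical bandwidth `sigma`; the names `gaussian_kernel` and `mmd_squared` are illustrative, and the kernel actually used in `train_MMD.py` may differ.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd_squared(source, target, sigma=1.0):
    """Biased estimate of the squared MMD between two sets of feature vectors."""
    def mean_kernel(first, second):
        total = sum(gaussian_kernel(a, b, sigma) for a in first for b in second)
        return total / (len(first) * len(second))
    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

# Toy features (made up for illustration, not taken from the galaxy data)
source_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shifted_features = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]

print(mmd_squared(source_features, source_features))   # identical distributions
print(mmd_squared(source_features, shifted_features))  # shifted distribution
```

The value is close to zero when both sets come from the same distribution and grows as the two feature distributions move apart, which is why minimizing it pulls the source and target features together.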

Additionally, MMD and ADA can be trained with transfer learning by loading the weights of a pretrained model.

## Files overview:
To run the training use the following files from GitHub (galaxy_merge_edits folder):
1. For no domain adaptation use: `no_domain_adaptation.py`
2. For MMD (with and without Fisher) use: `train_MMD.py`
3. For ADA (with and without Fisher) use: `train_ADA.py`

To evaluate any of the trained models, use `evaluation.py`.

Fisher loss and entropy minimization are turned off by adding the argument `--fisher_or_no 'no'`. If this argument is omitted, the default value `'Fisher'` is used and training runs WITH Fisher loss and entropy minimization.
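
As an illustration of what entropy minimization penalizes, the sketch below computes the mean Shannon entropy of softmax predictions for a batch of logits. It is a plain-Python approximation with hypothetical function names, not the loss code from the training scripts.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_prediction_entropy(batch_logits):
    """Average Shannon entropy (in nats) of the predicted class probabilities."""
    entropies = []
    for logits in batch_logits:
        probs = softmax(logits)
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0.0))
    return sum(entropies) / len(entropies)

# Toy logits for a two-class problem (merger vs. non-merger)
confident = [[8.0, -8.0], [-8.0, 8.0]]
uncertain = [[0.0, 0.0], [0.1, -0.1]]

print(mean_prediction_entropy(confident))
print(mean_prediction_entropy(uncertain))
```

Confident predictions have entropy near zero, while predictions close to 50/50 approach ln 2 for two classes, so minimizing this term pushes samples away from the decision boundary.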

Transfer learning is enabled by including the `--ckpt_path` argument when running the training files. If omitted, the weights are randomly initialized.
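
The default behavior of these two arguments can be pictured with a small `argparse` sketch. This is not the parser from the training scripts; it only mirrors the defaults described above (`'Fisher'` for `--fisher_or_no`, no checkpoint for `--ckpt_path`), and `build_parser` and `describe_run` are hypothetical helper names.

```python
import argparse

def build_parser():
    """Illustrative subset of the training arguments (assumed, not the real parser)."""
    parser = argparse.ArgumentParser(description="Illustrative subset of training arguments")
    parser.add_argument("--fisher_or_no", type=str, default="Fisher")
    parser.add_argument("--ckpt_path", type=str, default=None)
    return parser

def describe_run(argv):
    """Return (use_fisher, weight_initialization) for a list of command-line tokens."""
    args = build_parser().parse_args(argv)
    use_fisher = args.fisher_or_no != "no"
    if args.ckpt_path:
        init = "pretrained weights from " + args.ckpt_path
    else:
        init = "random initialization"
    return use_fisher, init

print(describe_run([]))
print(describe_run(["--fisher_or_no", "no", "--ckpt_path", "output_DeepMerge_SDSS/"]))
```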

## Data overview:
Currently you can run these files for Illustris distant merging galaxies with and without observational noise (Simulation to Simulation) and for Illustris nearby merging galaxies and SDSS observations (Simulation to Real).

## Neural Network overview:
You can currently run training using:
1. ResNet - several sizes from 18 to 152. We are using ResNet18.
2. DeepMerge

## Outputs:
There are many different output files which you get after running training and evaluation:
1. `.txt` file logging the different losses during training
2. Tensorboard outputs so you can track training or look at losses afterwards
3. `best_model.pth.tar` file with weights for best performing model during training.
4. Additional folders with optional plots like tSNE, learning rate scan plots (to assess which learning rate to use for one-cycle learning), or Grad-CAMs.
5. Running evaluation outputs `.txt` file with confusion matrices and accuracies for classification of images in source and target domain test sets, so you can compare the performance.
6. Evaluation also creates two `.csv` files - one with true labels and predictions, and one with output probabilities for both classes.

The following cells show how to run training and evaluation and all arguments you need to add in order to run a particular type of experiment.

In [4]:
# mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

In [3]:
# move to the desired folder on your drive
%cd /content/drive/My Drive/Colab Notebooks/FisherDA
!ls

In [None]:
import torch
import numpy as np
from torch.utils.data import Dataset, TensorDataset, DataLoader
import torchvision.transforms as transform
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_confusion_matrix

In [None]:
%%capture --no-display
!pip3 install tensorboardX
!pip3 install tensorboard

In [None]:
#Function for plotting nice confusion matrices using the output values from txt file we get after evaluation 
#(numbers have to be added to this function manually at the moment)

def plot_confusion_matrix(cm,
                          target_names,
                          title='Confusion matrix',
                          cmap=None,
                          normalize=True):
    """
    given a sklearn confusion matrix (cm), make a nice plot

    Arguments
    ---------
    cm:           confusion matrix from sklearn.metrics.confusion_matrix

    target_names: given classification classes such as [0, 1, 2]
                  the class names, for example: ['high', 'medium', 'low']

    title:        the text to display at the top of the matrix

    cmap:         the gradient of the values displayed from matplotlib.pyplot.cm
                  see http://matplotlib.org/examples/color/colormaps_reference.html
                  plt.get_cmap('jet') or plt.cm.Blues

    normalize:    If False, plot the raw numbers
                  If True, plot the proportions

    Usage
    -----
    plot_confusion_matrix(cm           = cm,                  # confusion matrix created by
                                                              # sklearn.metrics.confusion_matrix
                          normalize    = True,                # show proportions
                          target_names = y_labels_vals,       # list of names of the classes
                          title        = best_estimator_name) # title of graph

    Citation
    ---------
    http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html

    """
    import matplotlib.pyplot as plt
    plt.rcParams.update({'font.size': 24})
    import numpy as np
    import itertools

    accuracy = (np.trace(cm) / float(np.sum(cm)))*100
    misclass = 100 - accuracy

    if cmap is None:
        cmap = plt.get_cmap('Blues')

    plt.figure(figsize=(8, 6))
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()

    if target_names is not None:
        tick_marks = np.arange(len(target_names))
        plt.xticks(tick_marks, target_names, rotation=0)
        plt.yticks(tick_marks, target_names)

    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]


    thresh = cm.max() / 1.5 if normalize else cm.max() / 2
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        if normalize:
            plt.text(j, i, "{:0.2f}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")
        else:
            plt.text(j, i, "{:,}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")


    plt.tight_layout()
    plt.ylabel('True label')
    #plt.xlabel('Predicted label\naccuracy={:0.2f}; misclass={:0.2f}'.format(accuracy, misclass))
    plt.xlabel('\n Predicted label \n \naccuracy={:0.1f}%\n misclass={:0.1f}%'.format(accuracy,misclass))
    plt.show()

First we have to start TensorBoard so that it displays progress in real time while we are training (useful for long runs).

In [None]:
# load tensorboard to look at progress during training

%load_ext tensorboard
%tensorboard --logdir output_source_target

## Training without domain adaptation - noDA


In [None]:
# noDA
# Arguments:
#   --net                  which network to use (we also test ResNet18)
#   --dset                 name of our dataset
#   --dset_path            location where the files are
#   --output_dir           folder where outputs will be saved
#   --source_x_file, --source_y_file, --target_x_file, --target_y_file
#                          names of the source and target image and label files
#   --ly_type              distance calculation method between extracted features (euclidean or cosine, we use cosine)
#   --one_cycle            add if the learning rate should change using the one-cycle method (default is None, so one-cycle is omitted)
#   --cycle_length         cycle length in epochs
#   --lr                   initial learning rate
#   --weight_decay         optimizer parameter that sets the strength of L2 regularisation of the weights
#   --epoch                how many epochs to run
#   --early_stop_patience  how many epochs without improvement we tolerate before stopping
#   --optim_choice         optimizer (we use Adam, but there is also an SGD option)
#   --seed                 fixed random seed used to shuffle images
#   --blobs                produce tSNE plots (default is None, so no tSNE plots are produced)
#   --grad_vis             output images that visualize the gradients of the network; add only if you want them (default is no)
!python no_domain_adaptation.py --gpu_id 0 \
                              --net DeepMerge \
                              --dset 'galaxy' \
                              --dset_path 'arrays/' \
                              --output_dir 'output_DeepMerge/' \
                              --source_x_file Xdata_source.npy \
                              --source_y_file ydata_source.npy \
                              --target_x_file Xdata_target.npy \
                              --target_y_file ydata_target.npy \
                              --ly_type cosine \
                              --one_cycle 'yes' \
                              --cycle_length 5 \
                              --lr 0.001 \
                              --weight_decay 0.001 \
                              --epoch 200 \
                              --early_stop_patience 20 \
                              --optim_choice 'Adam' \
                              --seed 1 \
                              --blobs 'yes' \
                              --grad_vis 'yes'
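
The `--one_cycle` and `--cycle_length` options refer to a one-cycle learning rate policy. The sketch below shows one simple variant, assuming a linear ramp from the initial learning rate up to a peak and back down over a single cycle. The function name, the `peak_factor` parameter, and the linear shape are assumptions made for illustration; the schedule implemented in the training scripts may look different.

```python
def one_cycle_lr(step, total_steps, initial_lr=0.001, peak_factor=10.0):
    """Learning rate at `step` for one triangular cycle of `total_steps` steps.

    The rate rises linearly from `initial_lr` to `initial_lr * peak_factor`
    during the first half of the cycle and falls back during the second half.
    """
    peak_lr = initial_lr * peak_factor
    half = total_steps / 2.0
    if step <= half:
        fraction = step / half
    else:
        fraction = (total_steps - step) / half
    return initial_lr + fraction * (peak_lr - initial_lr)

# One cycle split into 10 steps, starting from the --lr value used above
for step in range(0, 11, 5):
    print(step, one_cycle_lr(step, total_steps=10))
```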

## Training with MMD

In [None]:
# Train DeepMerge with domain adaptation using MMD (with and without Fisher loss and entropy minimization)
# Additional arguments:
#   --trade_off        weight that multiplies the transfer loss (MMD) and sets its strength
#   --fisher_or_no     use Fisher loss and entropy minimization? Default is 'Fisher'; pass 'no' to remove them
#   --em_loss_coef     weight that multiplies the entropy minimization loss
#   --inter_loss_coef  weight that multiplies the between-class matrix of the Fisher loss
#   --intra_loss_coef  weight that multiplies the within-class matrix of the Fisher loss
#   --ckpt_path        add only to load weights from a pretrained model located in that folder
#                      (we used this with MMD for the Simulated to Real experiment); default is None

!python train_MMD.py --gpu_id 0 \
                              --net DeepMerge \
                              --dset 'galaxy' \
                              --dset_path 'arrays/' \
                              --output_dir 'output_DeepMerge_SDSS/' \
                              --source_x_file Xdata_source.npy \
                              --source_y_file ydata_source.npy \
                              --target_x_file Xdata_target.npy \
                              --target_y_file ydata_target.npy \
                              --ly_type cosine \
                              --loss_type mmd \
                              --one_cycle 'yes' \
                              --cycle_length 5 \
                              --lr 0.001 \
                              --epoch 200 \
                              --early_stop_patience 20 \
                              --weight_decay 0.001 \
                              --optim_choice 'Adam' \
                              --seed 1 \
                              --trade_off 1.0 \
                              --fisher_or_no 'Fisher' \
                              --em_loss_coef 0.05 \
                              --inter_loss_coef 1.0 \
                              --intra_loss_coef 0.01 \
                              --blobs 'yes' \
                              --grad_vis 'yes' \
                              --ckpt_path 'output_DeepMerge_SDSS/'
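
The `--early_stop_patience` argument used in the training cells controls how long training continues without improvement. A minimal sketch of that logic follows, assuming the monitored quantity is a validation loss that should decrease; the function name and the choice of metric are assumptions, not taken from the training scripts.

```python
def epoch_of_early_stop(val_losses, patience=20):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best value: remember it and reset the counter
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Toy loss history: the best value is reached in epoch 2
history = [1.0, 0.8, 0.9, 0.85, 0.95]
print(epoch_of_early_stop(history, patience=3))
```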

## Domain Adversarial training with DANNs - ADA



In [None]:
# Train DeepMerge with ADA (with and without Fisher loss and entropy minimization)
#   --ckpt_path  add only if you want to load weights from a pretrained model

!python train_ADA.py --gpu_id 0 \
                              --net DeepMerge \
                              --dset 'galaxy' \
                              --dset_path 'arrays/' \
                              --output_dir 'output_DeepMerge/' \
                              --source_x_file Xdata_source.npy \
                              --source_y_file ydata_source.npy \
                              --target_x_file Xdata_target.npy \
                              --target_y_file ydata_target.npy \
                              --ly_type cosine \
                              --one_cycle 'yes' \
                              --cycle_length 8 \
                              --lr 0.001 \
                              --epoch 200 \
                              --early_stop_patience 20 \
                              --weight_decay 0.0001 \
                              --seed 1 \
                              --optim_choice 'Adam' \
                              --trade_off 1.0 \
                              --fisher_or_no 'Fisher' \
                              --em_loss_coef 0.05 \
                              --inter_loss_coef 1.0 \
                              --intra_loss_coef 0.01 \
                              --blobs 'yes' \
                              --grad_vis 'yes' \
                              --ckpt_path 'output_DeepMerge_SDSS/'

## Note:

To run **with transfer learning** from a pretrained model, you have to manually add the fixed mean and std values used for normalization (those of the images the pretrained model was trained on). This is done in `import_and_normalize.py`. The example that we used is commented out in that file.
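
For orientation, normalizing with fixed statistics could look like the sketch below. It assumes images stored as an `(N, C, H, W)` NumPy array with three channels, and the mean and std values are placeholders, NOT the values used in our experiments; the function name `normalize_images` is also hypothetical.

```python
import numpy as np

def normalize_images(images, mean, std):
    """Normalize an (N, C, H, W) image array channel by channel with fixed statistics."""
    mean = np.asarray(mean, dtype=np.float64).reshape(1, -1, 1, 1)
    std = np.asarray(std, dtype=np.float64).reshape(1, -1, 1, 1)
    return (images - mean) / std

# Placeholder statistics (assumed for illustration only)
PRETRAIN_MEAN = [0.5, 0.4, 0.3]
PRETRAIN_STD = [0.2, 0.2, 0.1]

images = np.ones((2, 3, 4, 4))  # dummy batch: 2 images, 3 channels, 4x4 pixels
normalized = normalize_images(images, PRETRAIN_MEAN, PRETRAIN_STD)
print(normalized.shape)
```

Using the statistics of the pretraining images, rather than recomputing them on the new data, keeps the inputs on the scale the pretrained weights expect.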

## Evaluation

In [None]:
# Evaluation of all experiments
#   --ckpt_path  where to load the trained model from

!python evaluation.py --gpu_id 0 \
                --net DeepMerge \
                --dset 'galaxy' \
                --dset_path 'arrays' \
                --ly_type cosine \
                --ckpt_path 'output_DeepMerge' \
                --source_x_file Xdata_source.npy \
                --source_y_file ydata_source.npy \
                --target_x_file Xdata_target.npy \
                --target_y_file ydata_target.npy \
                --seed 1

Plotting nice Confusion Matrices

In [None]:
# Numbers in the matrix have to be added manually after you run evaluation

plot_confusion_matrix(cm           = np.array([[567, 52],
                                               [47, 534]]), 
                      normalize    = True,
                      target_names = ['Non-Merger', 'Merger'],
                      title        = "SOURCE noDA")

plot_confusion_matrix(cm           = np.array([[111, 479],
                                               [125, 485]]), 
                      normalize    = True,
                      target_names = ['Non-Merger', 'Merger'],
                      title        = "TARGET noDA")
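
The accuracy and row-normalized values displayed in these plots can also be computed directly, which is handy for a quick numerical comparison of the two domains. The helper below is a small sketch (the name `summarize_confusion_matrix` is hypothetical) that repeats the arithmetic used inside `plot_confusion_matrix` for the two example matrices.

```python
import numpy as np

def summarize_confusion_matrix(cm):
    """Return (accuracy in percent, row-normalized matrix) for a confusion matrix."""
    cm = np.asarray(cm, dtype=np.float64)
    accuracy = 100.0 * np.trace(cm) / cm.sum()
    row_normalized = cm / cm.sum(axis=1, keepdims=True)
    return accuracy, row_normalized

source_acc, source_norm = summarize_confusion_matrix([[567, 52], [47, 534]])
target_acc, target_norm = summarize_confusion_matrix([[111, 479], [125, 485]])

print(f"source accuracy: {source_acc:.2f}%")
print(f"target accuracy: {target_acc:.2f}%")
```

For these example numbers the source accuracy is 91.75%, while the target accuracy is about 49.67%, which is the drop that domain adaptation is meant to reduce.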