# Further Exploration of AT-TPC Data

Now you will have the opportunity to further explore the argon-46 data from the AT-TPC. This part is much more open-ended: a chance to play with the data and try new things.

Before getting started, make sure you are using a GPU-enabled runtime in Google Colab. Go to "Runtime" $\rightarrow$ "Change runtime type", then make sure "GPU" is selected for "Hardware accelerator".
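
After the runtime restarts, you can confirm that TensorFlow actually sees the GPU by listing its physical GPU devices. This is a minimal sketch that assumes TensorFlow 2.1 or newer, where `tf.config.list_physical_devices` is available; on older 1.x runtimes, `tf.test.is_gpu_available()` serves a similar purpose.

```python
import tensorflow as tf

# Ask TensorFlow which GPUs it can see (tf.config.list_physical_devices needs TF >= 2.1).
gpus = tf.config.list_physical_devices('GPU')

if gpus:
    print(f'Found {len(gpus)} GPU(s):', [gpu.name for gpu in gpus])
else:
    print('No GPU found; check "Runtime" -> "Change runtime type" in Colab.')
```

An empty list means training will silently fall back to the CPU, which is much slower for the CNNs below.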

## Setup

This is where you can import any Python libraries that you may want to use.

In [5]:
import os

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import h5py
from sklearn.metrics import confusion_matrix  # needed by plot_confusion_matrix

# This is simply an alias for convenience
layers = tf.keras.layers

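# Prevent TensorFlow from showing us deprecation warnings
# (tf.logging only exists in TF 1.x; tf.compat.v1.logging works in TF 1.13+ and TF 2.x)
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)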

We also define some utility functions that will be helpful.

In [6]:
def get_attpc_class(label):
    """Gets the class name for a given label.
    
    Arguments:
        label (int): The integer target label.
        
    Returns:
        The name of the class that corresponds to the given label.
    """
    return ['proton', 'carbon', 'junk'][label]

def load_attpc_data(path):
    """Loads in the AT-TPC data from an HDF5 file.
    
    Arguments:
        path (str): The path to the HDF5 file containing the data.
        
    Returns:
        A tuple containing the features and targets, i.e. `(features, targets)`.
    """
    with h5py.File(path, 'r') as h5:
        features = h5['features'][:]
        targets = h5['targets'][:]
    
    return features, targets

def plot_learning_curve(history):
    """Plots a learning curve from a training history.
    
    Arguments:
        history (tf.keras.callbacks.History): The history object returned by `model.fit()`.
        
    Returns:
        None.
    """
    plt.figure(figsize=(11, 6), dpi=100)
    plt.plot(history.history['loss'], 'o-', label='Training Loss')
    plt.plot(history.history['val_loss'], 'o:', color='r', label='Validation Loss')
    plt.legend(loc='best')
    plt.title('Learning Curve')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.xticks(range(0, len(history.history['loss'])), range(1, len(history.history['loss']) + 1))
    plt.show()
    
def plot_confusion_matrix(y_true,
                          y_pred,
                          classes,
                          title=None,
                          cmap=plt.cm.Blues):
    """Plots the confusion matrix.
    
    Adapted from:
    https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
    
    Arguments:
        y_true: Real class labels.
        y_pred: Predicted class labels.
        classes: List of class names.
        title: Title for the plot.
        cmap: Colormap to be used.
    
    Returns:
        None.
    """
    if not title:
        title = 'Confusion matrix'

    # Compute confusion matrix
    cm = confusion_matrix(y_true, y_pred)

    fig, ax = plt.subplots(figsize=(4, 4), dpi=100)
    im = ax.imshow(cm, interpolation='nearest', cmap=cmap)
    ax.figure.colorbar(im, ax=ax)
    # We want to show all ticks...
    ax.set(xticks=np.arange(cm.shape[1]),
           yticks=np.arange(cm.shape[0]),
           # ... and label them with the respective list entries
           xticklabels=classes, yticklabels=classes,
           title=title,
           ylabel='True label',
           xlabel='Predicted label')

    # Rotate the tick labels and set their alignment.
    plt.setp(ax.get_xticklabels(), rotation=45, ha='right', rotation_mode='anchor')

    # Loop over data dimensions and create text annotations.
    thresh = cm.max() / 2.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, format(cm[i, j], 'd'),
                    ha='center', va='center',
                    color='white' if cm[i, j] > thresh else 'black')
    fig.tight_layout()
    plt.show()
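
`plot_confusion_matrix` relies on scikit-learn's `confusion_matrix`, whose entry `cm[i, j]` counts the events with true label `i` and predicted label `j`, so correct predictions sit on the diagonal. The NumPy sketch below reproduces that counting for the three AT-TPC classes (0 = proton, 1 = carbon, 2 = junk, following `get_attpc_class`). `count_confusion` is a hypothetical helper and the labels are made up for illustration; unlike scikit-learn, which infers the label set from the data, it assumes a fixed number of classes.

```python
import numpy as np

def count_confusion(y_true, y_pred, num_classes=3):
    """Counts (true, predicted) label pairs into a num_classes x num_classes matrix."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at increments cm[t, p] once for every (t, p) pair, including repeated pairs.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Hypothetical labels: 0 = proton, 1 = carbon, 2 = junk.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
cm = count_confusion(y_true, y_pred)
print(cm)
# Accuracy is the fraction of events that land on the diagonal.
print('accuracy:', np.trace(cm) / cm.sum())
```

Reading across a row shows where the events of one true class ended up; here one proton was mistaken for carbon and one junk event for a proton.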

## Loading the AT-TPC data

We load the real and simulated AT-TPC data below. Set `REAL_DATA_PATH` and `SIM_DATA_PATH` to the locations of the respective HDF5 files in your environment (for example, on your Colab runtime or your own computer).

In [7]:
REAL_DATA_PATH = '../attpc-data/real-attpc-events.h5'
SIM_DATA_PATH = '../attpc-data/simulated-attpc-events.h5'

real_features, real_targets = load_attpc_data(REAL_DATA_PATH)
sim_features, sim_targets = load_attpc_data(SIM_DATA_PATH)
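
Before training anything, it is worth checking how the events are distributed across the three classes, since an imbalanced dataset can make plain accuracy misleading. This is a minimal sketch assuming the targets are stored as integer labels 0, 1, 2 in the order used by `get_attpc_class` (if they turn out to be one-hot encoded, apply `np.argmax(targets, axis=1)` first). `class_counts` is a hypothetical helper, and the demo array stands in for `real_targets` or `sim_targets`.

```python
import numpy as np

CLASS_NAMES = ['proton', 'carbon', 'junk']  # same order as get_attpc_class

def class_counts(targets, class_names=CLASS_NAMES):
    """Returns a {class name: number of events} dict for integer targets."""
    targets = np.asarray(targets).astype(int).ravel()
    counts = np.bincount(targets, minlength=len(class_names))
    return {name: int(n) for name, n in zip(class_names, counts)}

# Stand-in labels; in the notebook, pass real_targets or sim_targets instead.
demo_targets = np.array([0, 0, 0, 1, 2, 2])
print(class_counts(demo_targets))
```

Comparing the counts for the real and simulated sets also shows whether the two datasets have similar class proportions, which matters for the transfer learning problem.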

## How to proceed

The data is now yours to explore. Below are some suggestions for things to try; you can build on the CNN notebook from the earlier lecture.

 * Try to improve the results of the transfer learning problem from earlier.
   * Perform hyperparameter tuning.
   * Train on more of the simulated data.
   * Experiment with different network architectures (add dropout, change the hidden layers, etc.).
   * Rather than freezing the convolutional base of the VGG16 model, fine-tune the convolutional layers by training the entire network.
 * Consider a different learning task: train on real data and test on real data (this should yield very good results).
 * Build and train a CNN from scratch.
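
For the real-on-real task, the real events need to be split into training and test sets. Stratifying the split keeps the proton/carbon/junk proportions the same in both parts, which matters when one class is rare. The sketch below does this with NumPy alone, assuming integer targets and NumPy 1.17+ (for `np.random.default_rng`); `stratified_split` is a hypothetical helper, and scikit-learn's `train_test_split(features, targets, stratify=targets)` does the same job.

```python
import numpy as np

def stratified_split(features, targets, test_fraction=0.2, seed=0):
    """Splits (features, targets) so each class keeps roughly the same share in both parts."""
    targets = np.asarray(targets)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for label in np.unique(targets):
        # Shuffle the indices of this class, then carve off the test fraction.
        idx = rng.permutation(np.flatnonzero(targets == label))
        n_test = int(round(test_fraction * len(idx)))
        test_idx.append(idx[:n_test])
        train_idx.append(idx[n_test:])
    # Shuffle again so the classes are interleaved rather than grouped.
    train_idx = rng.permutation(np.concatenate(train_idx))
    test_idx = rng.permutation(np.concatenate(test_idx))
    return (features[train_idx], targets[train_idx]), (features[test_idx], targets[test_idx])

# Toy stand-in data: 10 events per class, 4 features each.
X = np.arange(120, dtype=float).reshape(30, 4)
y = np.repeat([0, 1, 2], 10)
(X_train, y_train), (X_test, y_test) = stratified_split(X, y, test_fraction=0.2)
print(np.bincount(y_train), np.bincount(y_test))
```

In the notebook you would call it as `stratified_split(real_features, real_targets)`; the fixed `seed` makes the split reproducible across runs.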
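
For hyperparameter tuning, random search is a simple starting point: sample configurations, train a model for each, and keep the one with the best validation score. The sketch below shows only the sampling and bookkeeping. `train_and_score` is a hypothetical placeholder for your own training code (for example, fitting the model and returning the best validation accuracy from its history), and the search ranges are illustrative assumptions, not tuned values.

```python
import numpy as np

def sample_config(rng):
    """Draws one hypothetical hyperparameter configuration."""
    return {
        # Sample the learning rate log-uniformly between 1e-5 and 1e-2.
        'learning_rate': float(10 ** rng.uniform(-5, -2)),
        'dropout': float(rng.uniform(0.0, 0.5)),
        'hidden_units': int(rng.choice([64, 128, 256, 512])),
        'batch_size': int(rng.choice([16, 32, 64])),
    }

def random_search(train_and_score, n_trials=10, seed=0):
    """Returns (best_score, best_config, all_results); higher scores are better."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_trials):
        config = sample_config(rng)
        results.append((train_and_score(config), config))
    best_score, best_config = max(results, key=lambda r: r[0])
    return best_score, best_config, results

# Placeholder objective so the sketch runs; replace it with real training and validation.
def toy_score(config):
    return -abs(np.log10(config['learning_rate']) + 3.5)

best_score, best_config, results = random_search(toy_score, n_trials=20)
print(best_score, best_config)
```

Sampling the learning rate on a log scale spreads the trials evenly across orders of magnitude, which usually matters more than the exact value within one decade.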
 
Model your workflow on the previous CNN notebook. Good luck!
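
If you want to build a CNN from scratch, a small stack of convolution and pooling layers followed by a dense classifier with dropout is a reasonable first model. This is a minimal sketch, not the architecture from the earlier notebook: the input shape `(128, 128, 3)` is an assumption (use whatever shape your image features actually have), the layer sizes and dropout rate are untuned starting points, and the sparse categorical cross-entropy loss assumes integer labels.

```python
import tensorflow as tf

layers = tf.keras.layers

def build_small_cnn(input_shape=(128, 128, 3), num_classes=3, dropout=0.5):
    """Builds and compiles a small CNN classifier; input_shape is an assumed default."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation='relu'),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation='relu'),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, activation='relu'),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dropout(dropout),  # randomly zeroes activations during training to curb overfitting
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

model = build_small_cnn()
model.summary()
```

You could then train it with something like `history = model.fit(X_train, y_train, validation_split=0.2, epochs=10)` and pass `history` to `plot_learning_curve` to check for overfitting.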