# Further Exploration of AT-TPC Data

Now you will have the opportunity to explore the argon-46 data from the AT-TPC further. This part is much more open-ended: it is a chance for you to play with the data and try new things.

Before getting started, make sure you are using a GPU-enabled runtime in Google Colab. Go to "Runtime" $\rightarrow$ "Change runtime type", then make sure "GPU" is selected for "Hardware accelerator".
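To confirm that the runtime change took effect, you can run a quick check like the one below. It is a minimal sketch that assumes TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available.

```python
import tensorflow as tf

# List the GPUs TensorFlow can see; an empty list means the runtime is CPU-only
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    print('GPU(s) available:', [gpu.name for gpu in gpus])
else:
    print('No GPU found; check the Colab runtime type.')
```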

## Setup

This is where you can import any Python libraries that you may want to use.

In [None]:
import os

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import h5py

# This is simply an alias for convenience
layers = tf.keras.layers

# Prevent TensorFlow from showing us deprecation warnings
# (tf.logging no longer exists in TensorFlow 2.x, so use the compat.v1 alias)
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

We also define some utility functions that will be helpful.

In [None]:
def get_attpc_class(label):
    """Gets the class name for a given label.
    
    Arguments:
        label (int): The integer target label.
        
    Returns:
        The name of the class that corresponds to the given label.
    """
    return ['proton', 'carbon', 'junk'][label]

def load_attpc_data():
    """Loads in the AT-TPC data.
        
    Returns:
        A tuple of the form ((real_features, real_targets), (simulated_features, simulated_targets))
    """
    simulated_data_origin = 'https://github.com/CompPhysics/MachineLearningMSU/raw/master/Day2_materials/data/simulated-attpc-events.h5'
    real_data_origin = 'https://github.com/CompPhysics/MachineLearningMSU/raw/master/Day2_materials/data/real-attpc-events.h5'
    
    simulated_path = tf.keras.utils.get_file('simulated-attpc-data.h5', origin=simulated_data_origin)
    real_path = tf.keras.utils.get_file('real-attpc-data.h5', origin=real_data_origin)
    
    with h5py.File(simulated_path, 'r') as h5:
        simulated_features = h5['features'][:]
        simulated_targets = h5['targets'][:]
        
    with h5py.File(real_path, 'r') as h5:
        real_features = h5['features'][:]
        real_targets = h5['targets'][:]
    
    return (real_features, real_targets), (simulated_features, simulated_targets)

## Loading the AT-TPC data

We load in the real and simulated AT-TPC data below.

In [None]:
(real_features, real_targets), (simulated_features, simulated_targets) = load_attpc_data()
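Before training anything, it can help to look at how the events are distributed across classes. The sketch below assumes the targets are integer labels (as `get_attpc_class` implies) rather than one-hot vectors; `summarize_targets` and `CLASS_NAMES` are hypothetical helpers introduced here for illustration.

```python
import numpy as np

# Same label order as get_attpc_class: 0 -> proton, 1 -> carbon, 2 -> junk
CLASS_NAMES = ['proton', 'carbon', 'junk']


def summarize_targets(targets):
    """Returns a dict mapping each class name to its number of events.

    Assumes integer labels in the range [0, len(CLASS_NAMES)).
    """
    labels = np.asarray(targets).ravel().astype(int)
    counts = np.bincount(labels, minlength=len(CLASS_NAMES))
    return {name: int(count) for name, count in zip(CLASS_NAMES, counts)}
```

For example, `print(real_features.shape, summarize_targets(real_targets))` shows the feature shape alongside the class balance, which matters when judging whether a given accuracy is actually good.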

If you are running this notebook on Google Colab, you will not be able to fit all 50,000 simulated events in RAM once they have been normalized. Run the cell below to keep only the first 10,000.

In [None]:
# Keep only the first 10,000 simulated events to save memory
simulated_features = simulated_features[:10000]
simulated_targets = simulated_targets[:10000]
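Two small helpers may be useful here. `float32_size_gb` estimates how much RAM the features occupy once converted to `float32` (a common dtype after normalization, assumed here), and `random_subset` draws a random subset instead of the first N events, which is safer if the file happens to be ordered in some way. Both names are hypothetical, introduced only for this sketch.

```python
import numpy as np


def float32_size_gb(n_events, event_shape, dtype=np.float32):
    """Approximate memory (in GB) for n_events arrays of the given shape."""
    bytes_per_event = int(np.prod(event_shape)) * np.dtype(dtype).itemsize
    return n_events * bytes_per_event / 1e9


def random_subset(features, targets, size, seed=0):
    """Returns a random subset of `size` events, sampled without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(features), size=size, replace=False)
    return features[idx], targets[idx]
```

With a hypothetical event shape of `(128, 128, 3)`, `float32_size_gb(50000, (128, 128, 3))` gives about 9.8 GB, versus about 2.0 GB for 10,000 events. To check the real numbers, pass `simulated_features.shape[1:]` instead. To use a random subset, replace the slicing cell with `simulated_features, simulated_targets = random_subset(simulated_features, simulated_targets, 10000)`.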

## How to proceed

We have provided you with the data; what you do with it is up to you. Below is a list of suggestions for things to try, and you can build on the CNN notebook from the earlier lecture.

 * Try to improve the results of the transfer learning problem from earlier.
   * Perform hyperparameter tuning.
   * Train on more of the simulated data.
   * Experiment with different network architectures (add dropout, change hidden layers, etc.).
   * Rather than freezing the convolutional base of the VGG16 model, fine-tune the convolutional layers by training the entire network.
 * Consider a different learning task: train and test on real data, holding out a separate test set (this should give very good results).
 * Build and train a CNN from scratch.
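As a starting point for the fine-tuning suggestion, here is a minimal sketch of a VGG16-based classifier whose convolutional base is left trainable. It assumes the features are image-like arrays of shape `(height, width, 3)` and integer targets for the three classes; the dense head (256 units plus dropout) and the learning rate of `1e-5` are illustrative guesses, not the settings from the earlier notebook. `build_finetune_model` is a hypothetical name.

```python
import tensorflow as tf

layers = tf.keras.layers


def build_finetune_model(input_shape, num_classes=3, weights='imagenet',
                         learning_rate=1e-5):
    """Builds a VGG16 classifier whose convolutional base is trained too.

    input_shape: (height, width, 3); an assumption about the feature layout.
    num_classes: 3 matches the proton / carbon / junk labels.
    """
    base = tf.keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=input_shape)
    # Unfreeze the convolutional base so its weights are updated as well
    base.trainable = True

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    model = tf.keras.Model(inputs, outputs)

    # A small learning rate helps avoid wiping out the pretrained features
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    return model
```

For the transfer-learning setup, you might call `build_finetune_model(simulated_features.shape[1:])`, fit on the simulated events, and evaluate on the real ones. Setting `base.trainable = False` instead recovers the frozen-base variant, which is a useful baseline to compare against.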
 
Model your workflow on the previous CNN notebook. Good luck!
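For the real-on-real task and the from-scratch CNN, the sketch below combines a stratified train/test split (so each class keeps roughly the same proportion in both sets) with a small convolutional network. The layer sizes, the 20% test fraction, and the names `stratified_split` and `build_small_cnn` are illustrative assumptions, and the input is again assumed to be image-like with integer targets.

```python
import numpy as np
import tensorflow as tf

layers = tf.keras.layers


def stratified_split(features, targets, test_fraction=0.2, seed=0):
    """Splits the data so each class is represented in both sets in proportion."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for label in np.unique(targets):
        idx = np.flatnonzero(targets == label)
        rng.shuffle(idx)
        n_test = int(round(len(idx) * test_fraction))
        test_idx.append(idx[:n_test])
        train_idx.append(idx[n_test:])
    train_idx = np.concatenate(train_idx)
    test_idx = np.concatenate(test_idx)
    # Shuffle so the classes are not grouped together during training
    rng.shuffle(train_idx)
    rng.shuffle(test_idx)
    return ((features[train_idx], targets[train_idx]),
            (features[test_idx], targets[test_idx]))


def build_small_cnn(input_shape, num_classes=3):
    """A small CNN trained from scratch; the layer sizes are only a guess."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation='relu'),
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```

A possible usage is `(x_train, y_train), (x_test, y_test) = stratified_split(real_features, real_targets)`, followed by `build_small_cnn(x_train.shape[1:])` and a `fit` on the training split, reserving the test split for the final evaluation only.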