<a href="https://colab.research.google.com/github/dcorre/otrainee/blob/Kenza/otrainee/tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# This code enables the program to run on a TPU; it needs to be added to train.py.
import tensorflow as tf
try:
    # TPU detection. 
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    print('Running on TPU ', tpu.master())
except ValueError:
    tpu = None

if tpu:
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
else:
    # Default distribution strategy in TensorFlow. Works on CPU and single GPU.
    strategy = tf.distribute.get_strategy()
print("REPLICAS: ", strategy.num_replicas_in_sync)

# Getting started:

Clone the git project. To change the name of the folder in your notebook, add the name right after the command, like so:


> !git clone https://github.com/dcorre/tbd_cnn.git name



In [None]:
!git clone -b Kenza https://github.com/dcorre/otrainee.git

Install all necessary libraries: 


*   numpy
*   matplotlib
*   pandas
*   shapely
*   h5py
*   requests
*   scikit-learn
*   scipy

In [None]:
!pip install numpy scipy matplotlib astropy pandas shapely requests h5py scikit-image

The cell below installs the remaining dependencies. Since the newest version of TensorFlow causes a few complications, version 2.3.1 is installed in a later cell, after the package setup.

In [None]:
!python3 -m pip install lacosmic hjson voevent-parse xmltodict astroML regions photutils keras keras-vis tensorflow cython opencv-python-headless
!python3 -m pip install --pre astroquery

Move to the otrainee folder (or whatever you named the clone in the first command), and set up the environment so that the executables can be used:

In [None]:
%cd otrainee

In [None]:
!python3.7 setup.py develop --user

In [None]:
!pip uninstall tensorflow --yes
!pip install tensorflow==2.3.1 

To make the datacubes available, you can either upload them manually (they'll be deleted if you disconnect or reset the notebook) or mount your Google Drive with the cell below and read them from there:

In [None]:
from google.colab import drive
drive.mount('/content/drive')
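
Once the drive is mounted, it can be useful to check that a cube is readable before launching a long training run. The sketch below assumes the cube is a standard NumPy `.npz` archive; it builds a tiny stand-in file so that it runs anywhere, and the array names `cube` and `labels` are hypothetical, not the names used by otrainee.

```python
import os
import tempfile

import numpy as np

# Build a tiny stand-in cube. The array names "cube" and "labels" are
# hypothetical; replace cube_path with the path to your own .npz file.
tmp_dir = tempfile.mkdtemp()
cube_path = os.path.join(tmp_dir, "toy_cube.npz")
np.savez(cube_path, cube=np.zeros((4, 32, 32)), labels=np.array([0, 1, 1, 0]))

# Inspect the archive: record the shape and dtype of every stored array.
with np.load(cube_path) as data:
    summary = {name: (data[name].shape, str(data[name].dtype)) for name in data.files}

for name, (shape, dtype) in summary.items():
    print(name, shape, dtype)
```

Pointing `cube_path` at a file on the mounted drive lists whatever arrays that file actually contains.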

# Launching the training:

Running the cell below will launch the training process on the datacube provided:

*   --cube : path to your cube.

You can specify:
*   the model path (--model-path) and the model name (--model-name) used to store your trained model.
*   the number of epochs (--epochs, full passes over the training set), and the threshold (all candidates whose probability is greater than this value will be considered real transients: class True).
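
To make the role of the threshold concrete, here is a minimal sketch of how a probability is turned into a class. The probabilities and the threshold value are made up for illustration and are not outputs of otrainee.

```python
import numpy as np

# Hypothetical probabilities returned by a trained model for five candidates.
probabilities = np.array([0.05, 0.38, 0.41, 0.90, 0.55])
threshold = 0.4  # illustrative value, not a recommended setting

# A candidate is labelled True (real transient) when its probability
# is strictly greater than the threshold.
predicted_real = probabilities > threshold
print(predicted_real)
```

Lowering the threshold labels more candidates as real, which typically raises recall at the cost of precision.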

It automatically prints the values of the evaluation metrics (recall, precision, F1-score, Matthews correlation coefficient) and the confusion matrix. It also stores the plots (ROC curve, precision-recall curve and probability distribution), plus the folders with the misclassified candidates, in the same path as your model.
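
As a reminder of how these metrics relate to the confusion matrix, the sketch below computes them from the four counts using the standard definitions. The function name `classification_metrics` and the counts are illustrative assumptions; this is not the code used by train.py.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return recall, precision, F1-score and MCC from confusion-matrix counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1, "mcc": mcc}


# Made-up counts for a validation set of 100 candidates.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```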

It also generates a datacube with the validation dataset, in case you want to get results without having to train the model again.



In [None]:
ls

In [None]:
!python3.7 otrainee/cli/train.py --cube ../drive/MyDrive/cube_KGuitalens1.npz --model-path cnn --model-name model --epochs 10 

# Getting the results of a pretrained model and generating cutouts of the misclassified candidates:

If you have a pre-trained model, or you want to use your trained model (from the cell above) to get results for a different threshold or a different datacube (from the same telescope), you can run the cells below.

The first cell outputs the metrics' values for the chosen threshold and generates the FN and FP folders; the second one generates all the plots mentioned above.

You should specify:


*   the model path and the path to the cube you want to test it on.
*   for the first cell, also the threshold and the path where you want to store the "misclassified" folders (FP and FN) and the "well classified" folder (30 randomly chosen TN and 30 randomly chosen TP); if not specified, the default path is the path to your model.


Note that these outputs will be deleted if the notebook is reset.

The cube used here is the validation datacube that was automatically generated by train.py, but you can apply the model to a different datacube.
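
The selection described above can be sketched with NumPy as follows. The labels and probabilities are random stand-ins, and the helper `sample_up_to` is a hypothetical name; this only illustrates the idea of splitting candidates into FP, FN, TP and TN and drawing up to 30 well-classified examples. It is not the implementation in diagnostic.py.

```python
import numpy as np

rng = np.random.default_rng(42)

# Random stand-ins for the true labels and the model probabilities.
labels = rng.integers(0, 2, size=200).astype(bool)
probabilities = rng.random(200)
threshold = 0.38
predicted = probabilities > threshold

# Indices of each outcome, obtained by comparing prediction and truth.
false_positives = np.flatnonzero(predicted & ~labels)
false_negatives = np.flatnonzero(~predicted & labels)
true_positives = np.flatnonzero(predicted & labels)
true_negatives = np.flatnonzero(~predicted & ~labels)


def sample_up_to(indices, n=30):
    """Draw at most n distinct indices at random (hypothetical helper)."""
    return rng.choice(indices, size=min(n, len(indices)), replace=False)


tp_sample = sample_up_to(true_positives)
tn_sample = sample_up_to(true_negatives)
print(len(false_positives), len(false_negatives), len(tp_sample), len(tn_sample))
```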


In [None]:
!python3.7 otrainee/cli/diagnostic.py --path-cube-validation cnn_OAJ/CNN_training/model/validation_set/datacube/cube_validation.npz --model-path cnn/CNN_training/model/model.h5 --threshold 0.38 --path-diagnostics cnn_results

In [None]:
!python3.7 otrainee/cli/plot_results.py --path-cube-validation cnn/CNN_training/model/validation_set/datacube/cube_validation.npz --model-path cnn/CNN_training/model/model.h5 --path-plots cnn_plots --threshold 0.4

The cell below will create a compressed file of the folders "Well classified" and "Misclassified" that you generated, so that you can download it and browse locally.

In [None]:
import shutil
#shutil.make_archive(outputfile, 'zip', directory)
shutil.make_archive("results", 'zip', "cnn_results/")


# Finding the minimum dataset size for your telescope:

We conducted a study on four different telescopes to determine the minimum dataset size above which the performance of the model stabilizes, with neither underfitting nor overfitting. We concluded that this value is around 6000 candidates (Real and Bogus combined), but you can test it yourself with the command below by specifying the same parameters as the training command, plus the number of sections you want to divide the dataset into (--n_sections). This launches a loop over the dataset size: beginning with one section, it adds one section at a time, reinitializes the model, and trains it on the resulting subset. The results for each size are stored, and three plots of these results versus the dataset size are generated in the same path as the model:

*   final validation loss
*   accuracy-validation accuracy
*   evaluation metrics (F1-score, MCC, final validation accuracy)
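
The growing-subset loop described above can be sketched as follows. The function name `cumulative_subsets` is hypothetical, and the shuffling before the split is an assumption made for the illustration; optimize_size.py may build its sections differently.

```python
import numpy as np


def cumulative_subsets(n_candidates, n_sections, seed=0):
    """Return index arrays for subsets made of 1, 2, ..., n_sections sections."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_candidates)  # assumption: shuffle before splitting
    sections = np.array_split(shuffled, n_sections)
    # The k-th subset is the concatenation of the first k sections.
    return [np.concatenate(sections[:k]) for k in range(1, n_sections + 1)]


subsets = cumulative_subsets(n_candidates=6000, n_sections=10)
sizes = [len(subset) for subset in subsets]
print(sizes)
```

In the actual loop, a fresh model would be trained on each subset and its metrics recorded against the subset size.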

In [None]:
!python3.7 otrainee/cli/optimize_size.py --cube ../drive/MyDrive/cube_KGuitalens1.npz --model-path cnn_size --model-name model --epochs 15 --n_sections 10


# Grad-CAM: understanding the CNN
In this section, we'll run the Grad-CAM code (Gradient-weighted Class Activation Mapping), which shows where in the image the model focused to reach a given prediction.

In [None]:
!python3.7 otrainee/cli/grad_cam.py --cube cnn_OAJ/CNN_training/model/validation_set/datacube/cube_validation.npz --model-path cnn_OAJ/CNN_training/model/model.h5 --cam_path cnn_grad_cam --threshold 0.5
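
For intuition, the core Grad-CAM computation can be sketched with NumPy alone. The sketch assumes you already have the feature maps of the last convolutional layer and the gradients of the class score with respect to them; here both are toy arrays, and `grad_cam_map` is a hypothetical name, not a function from grad_cam.py.

```python
import numpy as np


def grad_cam_map(feature_maps, gradients):
    """Combine feature maps and gradients, both of shape (H, W, K), into a heatmap."""
    # One weight per channel: the gradient averaged over the spatial dimensions.
    weights = gradients.mean(axis=(0, 1))
    # Weighted sum of the channels, giving an (H, W) map.
    cam = np.tensordot(feature_maps, weights, axes=([2], [0]))
    # Keep only the regions that push the class score up (ReLU).
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    # Normalise to [0, 1] unless the map is entirely zero.
    return cam / peak if peak > 0 else cam


# Toy inputs: 2x2 feature maps with two channels.
feature_maps = np.zeros((2, 2, 2))
feature_maps[:, :, 0] = [[1.0, 2.0], [3.0, 4.0]]
feature_maps[:, :, 1] = 5.0
gradients = np.zeros((2, 2, 2))
gradients[:, :, 0] = 2.0  # only channel 0 influences the class score

heatmap = grad_cam_map(feature_maps, gradients)
print(heatmap)
```

The resulting map is usually resized to the input cutout size and overlaid on it to show the regions that drove the prediction.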

In [None]:
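import shutil
# Archive the folder passed to --cam_path above; a separate archive name avoids overwriting results.zip.
shutil.make_archive("grad_cam_results", 'zip', "cnn_grad_cam/")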