<center>    
<h3>American Association of Physicists in Medicine</h3>    
<h3>Grand Challenge 2020</h3> 
<h3>OpenKBP</h3>
<hr>
<h1>Introduction</h1>    
<h3>February 14, 2020</h3>
</center> 

Before running this notebook, create a directory named open-kbp in the top level of your Google Drive. The
code block below will mount your Google Drive and give you access to all your files from this notebook.

In [None]:
# Mount your personal Google Drive (uncomment these lines when running in Colab)
# from google.colab import drive 
# drive.mount('/content/drive')

By default, the path to your Drive should be "/content/drive/My Drive". You may check this by clicking the file icon in
the toolbar on the left side of Colab. From there you can navigate through your file tree and copy the path of any
file in your Drive. 

Next, add the open-kbp directory to your path.

In [None]:
# Add the open-kbp directory to the Python path so that its modules (e.g., provided_code) can be imported.
# A directory 'train-pats' with all training patient data should be included in open-kbp
# import sys
# sys.path.insert(0, '/content/drive/My Drive/open-kbp')

# # Use tensorflow 2
# %tensorflow_version 2.x

Import all the packages needed for the notebook.

In [1]:
import shutil  # Standard library; used to zip the submission

# %tensorflow_version 2.x  # Colab only: selects TensorFlow 2.x
import tensorflow as tf

# Import provided classes and functions
from provided_code.data_loader import DataLoader
from provided_code.dose_evaluation_class import EvaluateDose
from provided_code.general_functions import get_paths, make_directory_and_return_path
from provided_code.network_functions import PredictionModel

Define the paths where the provided data is stored and where the results (e.g., models, predictions) should be saved. 

In [2]:
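# Define parent directories; update these paths to match where you stored the data
# (e.g., '/content/drive/My Drive/open-kbp/provided-data' if you are running in Colab)
main_data_dir = '/Users/aaronbabier/Desktop/open-kbp/provided-data'  # path where any provided data is stored
results_dir = '/Users/aaronbabier/Desktop/open-kbp/results'  # parent path where results are stored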

# Define path to training data and validation data 
training_data_dir = '{}/train-pats'.format(main_data_dir)
validation_data_dir = '{}/valid-pats'.format(main_data_dir)
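Before going further, it can help to confirm that these directories exist and actually contain patient data. The sketch below is a small check that uses only `pathlib`. `count_patient_dirs` is a hypothetical helper, not part of `provided_code`, and it assumes that each patient is stored in its own subdirectory, as the `get_paths(..., ext='')` calls later in this notebook suggest.

```python
from pathlib import Path


def count_patient_dirs(data_dir):
    """Return the number of patient subdirectories in data_dir.

    Hypothetical helper: assumes one subdirectory per patient.
    Raises FileNotFoundError if data_dir is not a directory.
    """
    path = Path(data_dir)
    if not path.is_dir():
        raise FileNotFoundError(f"Expected a directory at {path}")
    # Count only subdirectories; stray files (e.g., notes) are ignored
    return sum(1 for child in path.iterdir() if child.is_dir())
```

For example, `count_patient_dirs(training_data_dir)` and `count_patient_dirs(validation_data_dir)` should both return positive numbers if the data were copied correctly; a `FileNotFoundError` usually means one of the paths above needs updating.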

Name the model. This name will be used to label directories containing the results that the model generates. Also, 
define how many epochs the model should be trained for. It will likely take a large number of epochs (e.g., 100-200)
to get good results. 

In [None]:
prediction_name = 'baseline'
number_of_training_epochs = 2  # small value for a quick test run; increase (e.g., to 100-200) for good results

Retrieve the paths for all patient directories, and make a list of patient directories 
for a training and validation set.

In [None]:
plan_paths = get_paths(training_data_dir, ext='')  # gets the path of each plan's directory
num_train_pats = 100  # number of plans that will be used to train model
training_paths = plan_paths[:num_train_pats]  # list of training plans
hold_out_paths = plan_paths[num_train_pats:]  # list of paths used for held-out testing
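Slicing `plan_paths` keeps the first 100 plans in whatever order `get_paths` returns them. If you would rather hold out a random subset, a seeded shuffle keeps the split reproducible between runs. This is a minimal sketch; `split_paths` is a hypothetical helper, not part of `provided_code`.

```python
import random


def split_paths(paths, num_train, seed=0):
    """Split paths into (train, hold_out) lists after a seeded shuffle.

    Hypothetical helper. Sorting first makes the result depend only on the
    set of paths and the seed, not on the order the file system lists them in.
    """
    if not 0 <= num_train <= len(paths):
        raise ValueError("num_train must be between 0 and len(paths)")
    shuffled = sorted(paths)  # new list; the input is left unchanged
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    return shuffled[:num_train], shuffled[num_train:]
```

With this helper, `training_paths, hold_out_paths = split_paths(plan_paths, num_train_pats, seed=42)` would take the place of the two slicing lines above.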

Initialize a data loader that loads the data for the patients in the training set.

In [None]:
data_loader_train = DataLoader(training_paths)

Initialize the prediction model class and train the model over the specified number of epochs. 

In [None]:
# Initialize the model
dose_prediction_model_train = PredictionModel(data_loader_train, results_dir, model_name=prediction_name)

# Train the model, saving it every save_frequency epochs
dose_prediction_model_train.train_model(epochs=number_of_training_epochs, save_frequency=1, keep_model_history=1)

Now that the model is trained, we can use it to predict the dose for the held-out patients that we set aside from the
training set earlier. We start by making a new data loader for the held-out set and use it to predict (and save) a
set of out-of-sample dose distributions. Note that we set the data loader's mode to 'dose_prediction' so that it
loads only the data needed to make a prediction.

In [None]:
# Predict dose for the held out set
data_loader_hold_out = DataLoader(hold_out_paths, mode_name='dose_prediction')
dose_prediction_model_hold_out = PredictionModel(data_loader_hold_out, results_dir, model_name=prediction_name)
dose_prediction_model_hold_out.predict_dose(epoch=number_of_training_epochs)

Load each predicted dose distribution and evaluate it against the ground truth using the 
competition metrics.

In [None]:
# Evaluate dose metrics
data_loader_hold_out.set_mode('evaluation')  # Change the data loader to evaluation mode
prediction_paths = get_paths(dose_prediction_model_hold_out.prediction_dir, ext='csv')  # Get path to newly made predictions
hold_out_prediction_loader = DataLoader(prediction_paths, mode_name='predicted_dose')  # Initialize the prediction loader
dose_evaluator = EvaluateDose(data_loader_hold_out, hold_out_prediction_loader)
dvh_score, dose_score = dose_evaluator.make_metrics()
print('In this out-of-sample test:\n'
      '\tthe DVH score is {:.3f}\n'
      '\tthe dose score is {:.3f}'.format(dvh_score, dose_score))
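The scores above are computed by `EvaluateDose`. To build intuition for what they measure, here is a rough NumPy sketch of two ingredients, assuming that the dose score is a mean absolute voxel-wise difference between the predicted and reference dose over a mask of voxels, and that DVH metrics such as D99 (the minimum dose received by the hottest 99% of a structure) are read off as percentiles of the voxel doses inside a structure. Both functions are hypothetical illustrations, not the competition's scoring code.

```python
import numpy as np


def mean_absolute_dose_error(reference, predicted, mask):
    """Mean absolute voxel-wise dose difference over voxels where mask is True."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    # Boolean indexing flattens, so this works for 1D or 3D dose grids alike
    return float(np.mean(np.abs(reference[mask] - predicted[mask])))


def dose_at_volume(dose, structure_mask, volume_percent):
    """D_x: minimum dose received by the hottest volume_percent of a structure.

    Computed as the (100 - x)th percentile of the voxel doses in the structure.
    """
    voxel_doses = np.asarray(dose, dtype=float)[np.asarray(structure_mask, dtype=bool)]
    if voxel_doses.size == 0:
        raise ValueError("structure_mask selects no voxels")
    return float(np.percentile(voxel_doses, 100 - volume_percent))
```

A DVH-style error could then be formed by comparing, for example, `dose_at_volume(reference, target_mask, 99)` with the same quantity computed on the prediction; how `EvaluateDose` actually combines such metrics is defined in `provided_code`.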

If the model is performing well here, you can try predicting the dose for the OpenKBP validation set. All you need to do
is use the prediction model that was just trained to predict the dose for the patients in the validation set.

In [None]:
# Apply model to validation set
validation_data_paths = get_paths(validation_data_dir, ext='')  # gets the path of each plan's directory
validation_data_loader = DataLoader(validation_data_paths, mode_name='dose_prediction')
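# Note: this reassigns dose_prediction_model_hold_out, so the zip step at the end uses the validation-set predictions
dose_prediction_model_hold_out = PredictionModel(validation_data_loader, results_dir, model_name=prediction_name)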
dose_prediction_model_hold_out.predict_dose(epoch=number_of_training_epochs)

You may wish to check that your predictions are reasonable. Although there is no baseline to compare against, you can
still evaluate the plans on the dose metrics as a sanity check.

In [None]:
validation_eval_data_loader = DataLoader(validation_data_paths, mode_name='evaluation')  # Set data loader
dose_evaluator = EvaluateDose(validation_eval_data_loader)
dose_evaluator.make_metrics()
validation_prediction_metrics = dose_evaluator.reference_dose_metric_df.head()
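Without a reference dose, a few basic checks on each predicted dose array can still catch obvious failures such as NaNs, negative values, or an all-zero prediction. The sketch below assumes you have already loaded a prediction as a NumPy array (it does not assume any particular file format). `sanity_check_dose` is hypothetical, and its `max_reasonable_gy` threshold is an arbitrary placeholder that you should set based on your prescription doses.

```python
import numpy as np


def sanity_check_dose(dose, max_reasonable_gy=100.0):
    """Return a list of human-readable problems found in a dose array.

    An empty list means none of these basic checks failed.
    max_reasonable_gy is a hypothetical threshold, not a clinical limit.
    """
    dose = np.asarray(dose, dtype=float)
    problems = []
    if not np.all(np.isfinite(dose)):
        problems.append("contains NaN or infinite values")
    finite = dose[np.isfinite(dose)]  # remaining checks ignore non-finite voxels
    if finite.size and finite.min() < 0:
        problems.append("contains negative dose values")
    if finite.size and finite.max() > max_reasonable_gy:
        problems.append(f"maximum dose {finite.max():.1f} Gy exceeds {max_reasonable_gy} Gy")
    if finite.size and np.all(finite == 0):
        problems.append("dose is zero everywhere")
    return problems
```

Running this over every prediction and printing any non-empty result is a quick way to spot a patient whose prediction went wrong before you submit.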

Once you're happy with your dose distributions, you can zip up the predictions with the code block below. The zipped file
will contain the dose distributions for the validation set and can be uploaded directly to CodaLab.

In [None]:
# Zip the validation-set predictions for submission (creates <results_dir>/submissions/<prediction_name>.zip)
submission_dir = make_directory_and_return_path('{}/submissions/{}'.format(results_dir, prediction_name))
shutil.make_archive(submission_dir, 'zip', dose_prediction_model_hold_out.prediction_dir)
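Before uploading, you may want to confirm that the archive contains the files you expect. `shutil.make_archive` appends `.zip` to the base name it is given and returns the path of the archive it creates, so that return value can be passed straight to a check like the one below. `list_archive_members` is a hypothetical helper built on the standard-library `zipfile` module.

```python
import zipfile


def list_archive_members(zip_path):
    """Return the sorted names of the files (not directories) inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Directory entries in a zip archive end with '/'; skip them
        return sorted(name for name in archive.namelist() if not name.endswith('/'))
```

For example, `archive_path = shutil.make_archive(...)` followed by `print(len(list_archive_members(archive_path)), 'files')` shows how many prediction files were packaged; if one file is written per patient, this should match the number of patients in the validation set.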