<center>    
<h3>American Association of Physicists in Medicine</h3>    
<h3>Grand Challenge 2020</h3> 
<h3>OpenKBP</h3>
<hr>
<h1>Introduction for Google Colab</h1>
<h3>January 25, 2022</h3>
</center> 

Before running this notebook, create a directory named open-kbp in the top level of your Google Drive (i.e., directly
inside My Drive). The code block below mounts your Google Drive so that this notebook can access all of your files.

In [None]:
# Mount your personal Google Drive
from google.colab import drive 
drive.mount('/content/drive')

By default, the path to your Drive should be "/content/drive/My Drive". You may check this by clicking the file icon in
the toolbar on the left side of Colab. From there you can navigate through your file tree and copy the path of any
file in your Drive. 
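If you prefer to create or check the open-kbp directory from code rather than through the file browser, the following is a minimal sketch using `pathlib`. It assumes Drive is mounted at the default "/content/drive/My Drive"; `ensure_project_dir` is a hypothetical helper, not part of the provided code.

```python
from pathlib import Path


def ensure_project_dir(drive_root: str, name: str = "open-kbp") -> Path:
    """Create <drive_root>/<name> if it is missing and return its path.

    Hypothetical helper; drive_root is assumed to be the mounted Drive path,
    e.g. "/content/drive/My Drive".
    """
    root = Path(drive_root)
    if not root.is_dir():
        # Most likely cause on Colab: the mount cell has not been run yet
        raise FileNotFoundError(f"{root} does not exist; is Google Drive mounted?")
    project_dir = root / name
    project_dir.mkdir(exist_ok=True)  # no error if the directory already exists
    return project_dir


# Usage on Colab, after drive.mount('/content/drive'):
# project_dir = ensure_project_dir("/content/drive/My Drive")
# print(sorted(p.name for p in project_dir.iterdir()))
```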

Next, add the open-kbp directory to your path.

In [1]:
# Add the open-kbp directory to the Python path so that provided_code can be imported.
# open-kbp should contain provided_code and provided-data (with train-pats, validation-pats and test-pats)
import sys
primary_directory = '/content/drive/My Drive/open-kbp'
sys.path.insert(0, primary_directory)
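Python resolves `import provided_code...` by searching the directories in `sys.path` in order, so inserting the project directory at position 0 makes its packages importable. A quick way to confirm this worked before running the import cell is `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found. The sketch below assumes `provided_code` is a package directory inside open-kbp; `is_importable` is a hypothetical helper.

```python
import importlib.util
import sys


def is_importable(module_name: str, search_dir: str) -> bool:
    """Put search_dir first on sys.path and report whether module_name can be found.

    Hypothetical helper for checking the path setup; it does not import the module.
    """
    if search_dir not in sys.path:
        sys.path.insert(0, search_dir)
    importlib.invalidate_caches()  # let finders notice files created after startup
    return importlib.util.find_spec(module_name) is not None


# Usage on Colab:
# print(is_importable("provided_code", "/content/drive/My Drive/open-kbp"))
```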


Import all necessary packages for the notebook.

In [1]:
# Select TensorFlow 2 (Colab-only magic command)
%tensorflow_version 2.x

# Import standard-library modules and the classes and functions provided for the competition
import shutil
from pathlib import Path

from provided_code.data_loader import DataLoader
from provided_code.dose_evaluation_class import DoseEvaluator
from provided_code.network_functions import PredictionModel
from provided_code.utils import get_paths

The functions loaded from _provided\_code_ were written for this competition, and you can access them via the file
explorer on the left-hand side of the Colab window. You're welcome to change them as much as you'd like. Keep in mind,
however, that Colab only recognizes changes to files in your Google Drive (e.g., files in the _provided\_code_
directory) after the runtime is restarted via the _Restart runtime_ option in the _Runtime_ menu of the top toolbar. If
you implement a neural network, we urge you to start with the provided network architecture and network functions. The
neural network we provide is only meant to be a template; it will not be a competitive model without significant
modifications.

Before we run anything, first define the paths where the provided data is stored and where the results (e.g., models, predictions) should be saved. 

In [2]:
# Define project directories
primary_directory = Path(primary_directory)  # open-kbp directory in your Drive, where everything is stored
provided_data_dir = primary_directory / "provided-data"
training_data_dir = provided_data_dir / "train-pats"
validation_data_dir = provided_data_dir / "validation-pats"
testing_data_dir = provided_data_dir / "test-pats"
results_dir = primary_directory / "results"  # where any data generated by this code (e.g., predictions, models) are stored
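Because every later cell reads from these directories, it can save time to confirm they exist before training. A minimal sketch, assuming the three `*-pats` folders sit directly inside `provided-data` as in the paths above; `missing_dirs` is a hypothetical helper.

```python
from pathlib import Path
from typing import List, Sequence


def missing_dirs(provided_data_dir: Path,
                 expected: Sequence[str] = ("train-pats", "validation-pats", "test-pats")) -> List[str]:
    """Return the names of expected sub-directories that are not present (hypothetical helper)."""
    return [name for name in expected if not (Path(provided_data_dir) / name).is_dir()]


# Usage, after the cell above has run:
# missing = missing_dirs(provided_data_dir)
# if missing:
#     print(f"Missing from {provided_data_dir}: {missing}")
```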

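Name the model. This name will be used to label the directories containing the results that the model generates. Also,
define how many epochs the model should be trained for. It will likely take a large number of epochs (e.g., 100-200) to
get good results, far more than the value set below. Finally, `test_time` controls which hold-out set is used later:
keep it `False` to predict on the validation set, and switch it to `True` only once the model is fully tuned.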

In [None]:
test_time = False  # Only change this to True when the model has been fully tuned on the validation set
prediction_name = "baseline"  # Name of the model; used to label its results directories
num_epochs = 2  # Number of training epochs; good results likely need many more (e.g., 100-200)

Retrieve the paths of all patient directories in the training set. All of these plans are used to train the model;
the hold-out plans are taken later from the validation or testing set.

In [None]:
# Prepare the data directory 
training_plan_paths = get_paths(training_data_dir)  # gets the path of each plan's directory
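We have not reproduced `get_paths` itself. Judging only from how it is called in this notebook (on plan directories here, and with `extension="csv"` on the prediction directory later), it appears to list sub-directories by default and files with a given extension otherwise; `list_paths` below is a hypothetical stand-in built on that assumption. If you want a local hold-out split of the training plans, `split_paths` (also hypothetical) shows one reproducible way to do it with a seeded shuffle.

```python
import random
from pathlib import Path
from typing import List, Optional, Tuple


def list_paths(directory: Path, extension: Optional[str] = None) -> List[Path]:
    """Hypothetical stand-in for get_paths: sorted sub-directories by default,
    otherwise sorted files whose suffix matches extension (e.g. "csv")."""
    directory = Path(directory)
    if extension is None:
        return sorted(p for p in directory.iterdir() if p.is_dir())
    return sorted(p for p in directory.iterdir() if p.is_file() and p.suffix == f".{extension}")


def split_paths(paths: List[Path], hold_out_fraction: float = 0.2,
                seed: int = 0) -> Tuple[List[Path], List[Path]]:
    """Shuffle a copy of paths with a fixed seed and split off a hold-out fraction."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)  # same seed -> same split every run
    n_hold_out = int(round(len(shuffled) * hold_out_fraction))
    return shuffled[n_hold_out:], shuffled[:n_hold_out]


# Usage (hypothetical):
# train_paths, local_hold_out_paths = split_paths(training_plan_paths, hold_out_fraction=0.2)
```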

Initialize a data loader for the training set data, and use it to initialize a prediction model object. Call the
train_model method to train the model for the predefined number of epochs.

In [None]:
# Train a model
data_loader_train = DataLoader(training_plan_paths)
dose_prediction_model_train = PredictionModel(data_loader_train, results_dir, model_name=prediction_name, stage="train")
dose_prediction_model_train.train_model(epochs=num_epochs, save_frequency=1, keep_model_history=1)

Note that during training we only keep the models saved within the last __save_frequency * keep_model_history__
epochs; older models are deleted because each model is very large (~1 GB).
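To make the retention rule concrete, the sketch below lists which epochs' models would remain on disk. It assumes a model is saved every `save_frequency` epochs and that any model saved `save_frequency * keep_model_history` or more epochs before the current one is removed; this is our reading of the sentence above, not the provided implementation, and `kept_epochs` is a hypothetical helper.

```python
from typing import List


def kept_epochs(current_epoch: int, save_frequency: int, keep_model_history: int) -> List[int]:
    """Epochs whose saved models would still be on disk under the assumed rule.

    Assumption: models are saved at multiples of save_frequency, and a model saved
    save_frequency * keep_model_history or more epochs ago has been deleted.
    """
    window = save_frequency * keep_model_history
    saved = range(save_frequency, current_epoch + 1, save_frequency)
    return [epoch for epoch in saved if epoch > current_epoch - window]


# With the settings used above (save_frequency=1, keep_model_history=1) only the latest
# model is kept: kept_epochs(2, 1, 1) gives [2].
# With save_frequency=5 and keep_model_history=3 at epoch 50: [40, 45, 50].
```

Under this reading, `keep_model_history` is simply the number of models kept, so at roughly 1 GB per model the disk footprint is about `keep_model_history` GB.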

Now that the model is trained, we can use it to predict the dose for a set of hold-out patients from the validation
or testing set. The code block below gets the paths of all plans in the hold-out set selected by `test_time`.
 

In [None]:
# Define hold out set
hold_out_data_dir = testing_data_dir if test_time else validation_data_dir
stage_name, _ = hold_out_data_dir.stem.split("-")
hold_out_plan_paths = get_paths(hold_out_data_dir)
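The stage name comes from the directory name itself: `Path.stem` of `validation-pats` is the whole name (there is no suffix to strip), and splitting on the hyphen yields `validation`. The sketch below reproduces that logic without touching the file system, so it can be checked anywhere; `select_hold_out` is a hypothetical helper.

```python
from pathlib import Path
from typing import Tuple


def select_hold_out(provided_data_dir: Path, test_time: bool) -> Tuple[Path, str]:
    """Mirror the cell above: pick the hold-out directory and derive its stage name."""
    name = "test-pats" if test_time else "validation-pats"
    hold_out_dir = Path(provided_data_dir) / name
    # "validation-pats" -> ["validation", "pats"]; unpacking expects exactly one hyphen
    stage_name, _ = hold_out_dir.stem.split("-")
    return hold_out_dir, stage_name


# select_hold_out(Path("provided-data"), test_time=False)
# gives (Path("provided-data/validation-pats"), "validation")
```

Because of the two-name unpacking, renaming the folders to something with more than one hyphen (e.g. `my-validation-pats`) would raise a `ValueError`.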

We start by making a new data loader for the hold-out set and using it to predict (and save) a set of
out-of-sample dose distributions. For prediction, the data loader operates in 'dose_prediction' mode so that it loads
only the data needed to make a prediction.


In [None]:
# Predict dose for the held out set
data_loader_hold_out = DataLoader(hold_out_plan_paths)
dose_prediction_model_hold_out = PredictionModel(data_loader_hold_out, results_dir, model_name=prediction_name, stage=stage_name)
dose_prediction_model_hold_out.predict_dose(epoch=num_epochs)

Load each predicted dose distribution and evaluate it against the ground truth using the 
competition metrics.

In [None]:
# Evaluate dose metrics
data_loader_hold_out_eval = DataLoader(hold_out_plan_paths)
prediction_paths = get_paths(dose_prediction_model_hold_out.prediction_dir, extension="csv")
hold_out_prediction_loader = DataLoader(prediction_paths)
dose_evaluator = DoseEvaluator(data_loader_hold_out_eval, hold_out_prediction_loader)

# print out scores if data was left for a hold out set
if not data_loader_hold_out_eval.patient_paths:
    print("No patient information was given to calculate metrics")
else:
    dose_evaluator.evaluate()
    dvh_score, dose_score = dose_evaluator.get_scores()
    print(f"For this out-of-sample test on {stage_name}:\n\tthe DVH score is {dvh_score:.3f}\n\tthe dose score is {dose_score:.3f}")
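`DoseEvaluator` computes the competition's own scores, and its source in _provided\_code_ is the authoritative definition. As a rough illustration of the two kinds of quantities involved, the sketch below computes (a) a mean absolute voxel-wise dose difference restricted to a mask and (b) a DVH point D_x, the dose received by at least x% of a structure's voxels. Both functions are hypothetical, and we do not claim they match the competition's exact masks, structures or averaging.

```python
import numpy as np


def masked_mean_abs_error(reference: np.ndarray, prediction: np.ndarray,
                          mask: np.ndarray) -> float:
    """Mean absolute dose difference over the voxels where mask is True."""
    mask = mask.astype(bool)
    if not mask.any():
        raise ValueError("mask selects no voxels")
    return float(np.mean(np.abs(reference[mask] - prediction[mask])))


def dose_at_volume(dose: np.ndarray, structure_mask: np.ndarray, percent: float) -> float:
    """D_x: the dose that at least `percent`% of the structure's voxels receive.

    This is the (100 - percent)th percentile of the voxel doses, e.g. D99 is the
    1st percentile (numpy's default linear interpolation is used).
    """
    voxel_doses = dose[structure_mask.astype(bool)]
    if voxel_doses.size == 0:
        raise ValueError("structure mask selects no voxels")
    return float(np.percentile(voxel_doses, 100 - percent))


# Hypothetical usage for one target structure:
# error_d99 = abs(dose_at_volume(reference, target_mask, 99)
#                 - dose_at_volume(prediction, target_mask, 99))
```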

Once you're happy with your dose distributions, you can zip up the predictions with the code block below. The
zipped file will contain the dose distributions for the hold-out set (the validation set unless `test_time` is `True`)
and can be uploaded directly to CodaLab.

In [None]:
# Zip dose to submit
submission_dir = results_dir / "submissions"
submission_dir.mkdir(exist_ok=True)
shutil.make_archive(str(submission_dir / prediction_name), "zip", dose_prediction_model_hold_out.prediction_dir)
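Before uploading, it may be worth confirming that the archive contains the expected prediction files. `shutil.make_archive` returns the path of the archive it wrote (here `<submission_dir>/<prediction_name>.zip`), and the standard-library `zipfile` module can list its contents. The sketch assumes the predictions are CSV files, as the `extension="csv"` argument used during evaluation suggests; `archive_members` is a hypothetical helper, and the number of files CodaLab expects is not stated here.

```python
import zipfile
from pathlib import Path
from typing import List, Union


def archive_members(zip_path: Union[str, Path]) -> List[str]:
    """Return the sorted names of files stored in a zip archive (directory entries skipped)."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(name for name in archive.namelist() if not name.endswith("/"))


# Usage, after the cell above has run:
# members = archive_members(submission_dir / f"{prediction_name}.zip")
# print(len(members), "files; non-CSV entries:", [m for m in members if not m.endswith(".csv")])
```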