This notebook will show, briefly, how to build an autosegmentation model for thoracic organs at risk (OARs) using pytorch and pytorch-lightning. We will be using some open data from The Cancer Imaging Archive (TCIA), originally used for an AAPM challenge. This dataset contains 60 patients, each of whom has five OARs segmented.
To handle the data, we will use pydicom to load slices and ideas from dicom-contour to convert RTSTRUCT objects into masks.
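To give a feel for what converting an RTSTRUCT contour into a mask involves, here is a minimal sketch using only numpy. It is not the dicom-contour implementation; `contour_to_mask` and its argument names are hypothetical. It assumes an axial slice with the standard orientation, a contour given as a flat list of x, y, z values in mm (the layout of the DICOM `ContourData` attribute), and an origin and pixel spacing taken from `ImagePositionPatient` and `PixelSpacing`.

```python
import numpy as np


def contour_to_mask(contour_data, origin, pixel_spacing, shape):
    """Rasterise one planar contour onto a CT slice grid (hypothetical helper).

    contour_data : flat [x0, y0, z0, x1, y1, z1, ...] in mm
    origin       : (x, y, z) of the centre of the first pixel, in mm
    pixel_spacing: (row_spacing, col_spacing) in mm
    shape        : (rows, cols) of the slice
    Assumes an axial slice with orientation (1, 0, 0, 0, 1, 0).
    """
    pts = np.asarray(contour_data, dtype=float).reshape(-1, 3)

    # Patient coordinates (mm) -> fractional pixel indices.
    cols = (pts[:, 0] - origin[0]) / pixel_spacing[1]
    rows = (pts[:, 1] - origin[1]) / pixel_spacing[0]

    rr, cc = np.mgrid[0:shape[0], 0:shape[1]]
    mask = np.zeros(shape, dtype=bool)
    n = len(pts)

    # Even-odd rule: a pixel centre is inside the polygon if a ray cast
    # towards increasing column index crosses an odd number of edges.
    for i in range(n):
        r0, c0 = rows[i], cols[i]
        r1, c1 = rows[(i + 1) % n], cols[(i + 1) % n]
        if r0 == r1:
            continue  # an edge along a row never crosses a ray along a row
        crosses = (r0 > rr) != (r1 > rr)
        c_at_row = c0 + (rr - r0) * (c1 - c0) / (r1 - r0)
        mask ^= crosses & (cc < c_at_row)
    return mask


# Example: a square contour on an 8 x 8 slice with 2 mm pixels.
square = [-7, -17, 0, -1, -17, 0, -1, -11, 0, -7, -11, 0]
mask = contour_to_mask(square, origin=(-10, -20, 0),
                       pixel_spacing=(2.0, 2.0), shape=(8, 8))
```

A real RTSTRUCT holds many such contours per organ, one or more per slice, so in practice the masks for each slice would be combined and stacked into a 3D array.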
We will be using a suite of pre-built pytorch segmentation models in the excellent segmentation-models package. This package simplifies the building of a pretrained segmentation network in 2D. We will use a 2D approach, looping over slices in the data to segment 3D organs.
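The slice-by-slice idea can be sketched as follows. This is only an illustration: `segment_volume` and `threshold_predictor` are hypothetical names, the volume is assumed to be a numpy array ordered as (slices, rows, columns), and a simple threshold stands in for the trained 2D network.

```python
import numpy as np


def segment_volume(volume, predict_slice):
    """Apply a 2D predictor to every axial slice and restack the results in 3D."""
    masks = [predict_slice(volume[k]) for k in range(volume.shape[0])]
    return np.stack(masks, axis=0)


def threshold_predictor(ct_slice):
    """Hypothetical stand-in for a trained 2D model: label pixels above zero."""
    return (ct_slice > 0).astype(np.uint8)


# Random data standing in for a CT volume of 4 slices of 8 x 8 pixels.
volume = np.random.default_rng(0).normal(size=(4, 8, 8))
seg = segment_volume(volume, threshold_predictor)
```

The trade-off of this design is that each slice is segmented without seeing its neighbours, in exchange for being able to reuse 2D networks pretrained on ordinary images.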
Pytorch can be quite intimidating, but it is very powerful once you get to grips with it. In the interests of simplicity, we will use a wrapper around pytorch called pytorch-lightning. Lightning separates out the different parts of an ML workflow, so we can write a bit less boilerplate code and quickly and easily use best-practice methods to train our models.
This notebook works through the following steps:
- Install prerequisites and set up
- Load DICOM data containing CT and segmentation and convert to numpy arrays
- Define some preprocessing and apply it to the CT slices
- Create a segmentation model, using a library to make a pre-trained model for our segmentation task
- Train the model to reproduce the training examples
- Test the model against the testing data and produce the AAPM competition ranking score
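As an illustration of the preprocessing step in the list above, one common choice for CT is to clip the Hounsfield units to a fixed window and rescale to the range 0 to 1. The sketch below assumes this approach; the function name `window_ct` and the window limits of -1000 and 400 HU are illustrative assumptions, not necessarily the values used later in this notebook.

```python
import numpy as np


def window_ct(ct_slice, hu_min=-1000.0, hu_max=400.0):
    """Clip a CT slice to [hu_min, hu_max] and rescale it to [0, 1].

    The default window limits are illustrative assumptions.
    """
    clipped = np.clip(np.asarray(ct_slice, dtype=np.float32), hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)


# Values below and above the window saturate at 0 and 1 respectively.
example = np.array([[-2000, -1000, -300, 400, 3000]])
scaled = window_ct(example)
```

Windowing of this kind discards intensity ranges that carry little information for the organs of interest and gives the network inputs on a consistent scale.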
This notebook should run anywhere with python and jupyter installed. It will also run on google colab, if you have access to that.
To get started, you only need python + jupyter; all other dependencies will be installed inside the notebook. If you don't want this to mess with your global python environment, I would recommend using a virtual environment.
To open this notebook in colab, click this link: colab