# Predict the Survival Chances of Glioma Patients 

## Introduction

A glioma is a type of tumor that starts in the glial cells of the brain or the spine. Gliomas comprise about 30 percent of all brain and central nervous system tumors, and 80 percent of all malignant brain tumors. [[1]](#[1]) Because gliomas are so common, predicting a patient's chances of survival at the time of diagnosis is important for planning further treatment strategies. Most patients are diagnosed based on Computed Tomography (CT) or Magnetic Resonance Imaging (MRI) scans. In this project, we use a Convolutional Neural Network (CNN) model to analyze these scans and predict whether the patient will survive.

## Related Work

During our initial research, we came across a few papers of interest that employed deep learning techniques to analyze medical imaging. They are listed briefly below:
- Use of Artificial Neural Networks (ANN) to predict survival of patients who were diagnosed with Amyotrophic Lateral Sclerosis (ALS) based on MRI scans. [[2]](#[2])
- Evaluation of deep learning networks for predicting clinical outcomes through analyzing time series CT images of patients with locally advanced non–small cell lung cancer (NSCLC). [[3]](#[3])
- Brain tumor segmentation and survival prediction using multimodal MRI scans with deep learning. [[4]](#[4])

## Content

The following topics are covered in the project:
- [Data Collection](#Data-Collection)
- [Data Preprocessing](#Data-Preprocessing)
- [CNN Model](#CNN-Model)
- [Conclusions](#Conclusions)
- [Future Work](#Future-Work)
- [References](#References)

## Data Collection

## Data Preprocessing

## CNN Model

## Conclusions

In this study, we developed a convolutional neural network to predict the survivability of a glioma patient based on CT and MRI scans of the patient's brain alone. We obtained the images from The Cancer Imaging Archive (TCIA) database, which provides a convenient API for downloading these images. To prepare our labels, we parsed the clinical data files for each patient from TCIA using web scraping. Initially, we used a simple CNN with a single 2D convolutional layer and a fully connected layer. Although this gave a high accuracy of around 90% for both the train and validation datasets, it did not predict the negative class, i.e., the dead counts, correctly. The dead count for the test data was only 8%. We then switched to the LeNet architecture and performed a grid search over the number of output channels of the convolutional layers, the learning rate, and the number of epochs. The best configuration was 12 output channels for both convolutional layers, a learning rate of 0.01, and 200 epochs.
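The grid search described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `grid_search`, `toy_evaluate`, and the candidate value lists are hypothetical, and `toy_evaluate` is a placeholder that stands in for "train the LeNet model with these settings and return a validation score". It is built to peak at the reported settings only so that the example has a recognizable result.

```python
from itertools import product


def grid_search(evaluate, channels, learning_rates, epochs):
    """Score every combination and return (best_score, best_params)."""
    best_score, best_params = float("-inf"), None
    for c, lr, n in product(channels, learning_rates, epochs):
        score = evaluate(out_channels=c, learning_rate=lr, num_epochs=n)
        if score > best_score:
            best_score = score
            best_params = {"out_channels": c, "learning_rate": lr, "num_epochs": n}
    return best_score, best_params


def toy_evaluate(out_channels, learning_rate, num_epochs):
    """Hypothetical placeholder for training and validating a model.

    A real version would train the network and return a validation metric.
    This one is an arbitrary function with a single peak.
    """
    return (
        -abs(out_channels - 12)
        - 100 * abs(learning_rate - 0.01)
        - abs(num_epochs - 200) / 100
    )


# Candidate values are assumptions chosen for illustration only.
best_score, best_params = grid_search(
    toy_evaluate,
    channels=[6, 12, 16],
    learning_rates=[0.1, 0.01, 0.001],
    epochs=[50, 100, 200],
)
print(best_params)
```

With a real training function, each call is expensive, so the grid is usually kept small and the score is computed on the validation set rather than the test set.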

These parameters gave dead counts of 36.7%, 27.2%, and 27.4% for the train, validation, and test datasets respectively, with accuracy and F1 scores of around 90% in all three cases. The nearly identical dead counts for the validation and test datasets suggest that the model is stable across datasets and that overfitting to the training data is limited.
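To see how accuracy and F1 can both sit near 90% while the negative (dead) class is mostly missed, it helps to compute the recall of the negative class next to them. The sketch below uses made-up labels (90 alive, 10 dead) that are not the project's data. The function name `binary_metrics` and the encoding 1 = alive, 0 = dead are assumptions for illustration, and the negative-class recall shown here is not necessarily the same quantity as the dead counts reported above.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, F1 of the positive class, and recall of the negative class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "f1": f1,
        # Fraction of truly negative (dead) samples that were predicted negative.
        "negative_recall": tn / (tn + fp) if tn + fp else 0.0,
    }


# Illustrative toy labels only: 1 = alive, 0 = dead.
y_true = [1] * 90 + [0] * 10
# The classifier gets every alive case right but only 2 of 10 dead cases.
y_pred = [1] * 90 + [0] * 2 + [1] * 8

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case accuracy is 0.92 and F1 is about 0.96, yet only 20% of the dead cases are recovered, which is why a per-class measure is worth reporting alongside the aggregate scores.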

## Future Work

We used only one collection to train and test our model. Although this collection by itself contains a lot of data, it would be beneficial to train the neural network on more. Additional collections can be downloaded from the TCIA database. Furthermore, for each serial ID we collect at most 10 images, which gives us around 50 images per patient. Discarding the rest of the image data may lose valuable information about the patients. We expect that the model could perform better, especially in predicting dead counts, if more images of the same brain at different angles and orientations were used in the training process.

Collecting more data would help mitigate the problem we faced with the unbalanced dataset. The limited data that we collected had many more patients who survived than patients who did not, which made training the model difficult. Other than collecting more data, there are three other ways to deal with the imbalanced dataset. We could undersample the data, which means randomly deleting data from the alive class so that the ratio of dead to alive classes becomes balanced. The obvious problem with this approach is that the deleted data may well contain important information. A slightly better approach is oversampling, where we enlarge the dead class by randomly selecting and duplicating some of its data points. Although this gives us a sufficient number of samples to work with, the duplicated samples may lead to overfitting to the training data. The last and best option is synthetic sampling with SMOTE (Synthetic Minority Over-sampling Technique), where new observations of the minority class are synthetically generated to resemble existing ones using their nearest neighbors. These three methods can be implemented and compared in future work.
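The three resampling strategies can be sketched on toy feature vectors as follows. This is a simplified illustration using only the standard library. The function names and the toy points are hypothetical, and `smote_like` is a bare-bones SMOTE-style interpolation rather than a full implementation. In practice a dedicated library such as imbalanced-learn would likely be a better choice, and for images the resampling would typically be applied to flattened or extracted features.

```python
import random


def undersample(majority, minority, rng):
    """Randomly keep only as many majority samples as there are minority samples."""
    return rng.sample(majority, len(minority)), list(minority)


def oversample(majority, minority, rng):
    """Duplicate random minority samples until both classes have equal size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + extra


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic points, each lying on the segment between a
    random minority point and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy 2-D points standing in for image features (illustrative only).
rng = random.Random(0)
alive = [(float(i), float(i)) for i in range(20)]
dead = [(0.0, 10.0), (1.0, 11.0), (2.0, 12.0), (3.0, 13.0)]

under_alive, under_dead = undersample(alive, dead, rng)
over_alive, over_dead = oversample(alive, dead, rng)
synthetic_dead = smote_like(dead, n_new=len(alive) - len(dead), k=2, rng=rng)
print(len(under_alive), len(over_dead), len(dead) + len(synthetic_dead))
```

Whichever method is used, resampling should be applied to the training split only, so that the validation and test sets keep the original class ratio.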

Lastly, there is the issue of multiple images existing for each patient in the database. In this work, we treated each image as its own data point regardless of whether it belonged to the same patient as another image. This could lead to overfitting or data leakage if images of the same patient end up in more than one dataset split. To combat this, we propose two methods. One way is to collect all the images belonging to a patient and organize them as channels in the patient's image tensor. Currently, each tensor has only one channel because the images are all grayscale. We could instead stack the different images of a specific patient along the channel dimension to create one large image tensor per patient to train on. The disadvantage is that the model would need to be more complex to handle such inputs. Another method is a majority vote classification system, where the model is trained on each image as in the current work, but at test time we take a majority vote over the predictions for all images of a patient to determine the class the patient belongs to.
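The majority vote idea could look roughly like the sketch below. The function name `patient_majority_vote`, the patient identifiers, and the tie-breaking rule are all assumptions made for illustration. In particular, resolving ties to the dead class (label 0) is an arbitrary choice that would need to be decided for the real system.

```python
from collections import Counter, defaultdict


def patient_majority_vote(patient_ids, image_predictions, tie_label=0):
    """Aggregate per-image predictions into one prediction per patient.

    patient_ids[i] is the patient that image i belongs to and
    image_predictions[i] is the model's class for that image.
    Ties are resolved to tie_label (an arbitrary assumption).
    """
    votes = defaultdict(list)
    for pid, pred in zip(patient_ids, image_predictions):
        votes[pid].append(pred)

    result = {}
    for pid, preds in votes.items():
        ranked = Counter(preds).most_common()
        is_tie = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
        result[pid] = tie_label if is_tie else ranked[0][0]
    return result


# Hypothetical patients and per-image predictions (1 = alive, 0 = dead).
ids = ["A", "A", "A", "B", "B", "C", "C", "C"]
preds = [1, 1, 0, 1, 0, 0, 0, 1]

patient_predictions = patient_majority_vote(ids, preds)
print(patient_predictions)
```

Patient-level metrics can then be computed by comparing these aggregated predictions with one label per patient instead of one label per image.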

## References

<a id = '[1]'>[1]</a> https://en.wikipedia.org/wiki/Glioma

<a id = '[2]'>[2]</a> https://www.researchgate.net/publication/309182563_Deep_learning_predictions_of_survival_based_on_MRI_in_amyotrophic_lateral_sclerosis

<a id = '[3]'>[3]</a> https://clincancerres.aacrjournals.org/content/25/11/3266

<a id = '[4]'>[4]</a> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6707136/