ArmandoDomi/Tumor_Detection
Tumor_Detection

Experiments with several machine learning models for tumor classification.
Two brain MRI datasets found on Kaggle were used.

The first dataset can be found here.
The second dataset can be found here.


About the data:
The first dataset contains 155 positive and 98 negative examples, for a total of 253 images. The folder yes contains the 155 brain MRI images that are tumorous, and the folder no contains the 98 brain MRI images that are non-tumorous.

The second dataset contains 100 positive and 100 negative examples, for a total of 200 images. The dataset is split into test, train, and validation folders, and each folder has a hemmorhage_data and a non_hemmorhage_data subfolder.

Data Preprocessing

For every image, the following preprocessing steps were applied:

1. Resize to (250, 250, 3), i.e. (image_width, image_height, channels), because the images in the two datasets come in different sizes.
2. Convert the image from RGB to grayscale.
3. Use HOG (Histogram of Oriented Gradients) for feature extraction.
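
The three steps above can be sketched with scikit-image roughly as follows. This is a minimal illustration rather than the repository's actual code: the function name `extract_hog_features` and the random input image are hypothetical, the 32x32 pixels-per-cell value comes from the HOG section of this README, and every other HOG parameter is assumed to stay at the library default.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog
from skimage.transform import resize


def extract_hog_features(image, size=(250, 250), pixels_per_cell=(32, 32)):
    """Resize an RGB image, convert it to grayscale, and return its HOG vector."""
    # Step 1: resize to a common shape so every image yields the same feature length.
    resized = resize(image, (*size, 3), anti_aliasing=True)
    # Step 2: convert RGB to grayscale, giving a 2-D array.
    gray = rgb2gray(resized)
    # Step 3: compute the HOG descriptor as a flat feature vector.
    # Remaining parameters (orientations, cells_per_block) use the defaults (assumption).
    return hog(gray, pixels_per_cell=pixels_per_cell, feature_vector=True)


# Hypothetical stand-in for an MRI scan loaded from disk.
rng = np.random.default_rng(0)
fake_image = rng.random((300, 400, 3))
features = extract_hog_features(fake_image)
print(features.shape)
```

Because every image is resized first, inputs of different sizes produce feature vectors of the same length, which is what the classifiers need.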

After preprocessing, the HOG features are fed to the models. Principal component analysis (PCA) can optionally be applied for feature reduction.
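
As a rough illustration of the optional PCA step, the sketch below reduces a feature matrix with scikit-learn. The matrix is random placeholder data, and both its width and the choice of 50 components are assumptions made for the example, not values taken from the repository.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical feature matrix: one row per image, one column per HOG feature.
rng = np.random.default_rng(0)
hog_features = rng.random((253, 500))

# n_components=50 is an assumed value chosen only for illustration.
pca = PCA(n_components=50)
reduced = pca.fit_transform(hog_features)

print(reduced.shape)
print(pca.explained_variance_ratio_.sum())
```

When PCA is combined with cross-validation, it is usually fitted on the training folds only and then applied to the held-out fold, so that no information leaks from the evaluation data.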

HOG


The output after applying HOG with 32x32 pixels per cell:

[Image: HOG in no_tumor class] [Image: HOG in yes_tumor class]

Machine Learning Models

Experiments were run with SVM, Linear-SVM, Random Forest, and Logistic Regression, using 5-fold cross-validation.
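
A 5-fold comparison of these four model families could look roughly like the sketch below, written with scikit-learn. The synthetic data from `make_classification` stands in for the HOG features, and all hyperparameters (including the stratified, shuffled folds) are assumptions, not the settings used in the repository.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC, LinearSVC

# Synthetic placeholder data standing in for the HOG feature matrix and labels.
X, y = make_classification(n_samples=200, n_features=40, random_state=0)

# Hyperparameters are illustrative assumptions.
models = {
    "SVM": SVC(),
    "Linear-SVM": LinearSVC(max_iter=10000),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Stratified folds keep the class ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

mean_accuracy = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    mean_accuracy[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f}")
```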

Metrics

Accuracy, Precision, Recall, F-measure, Specificity
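
For reference, all five metrics follow from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the five metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one positive,
    one negative, and one positive prediction).
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "specificity": specificity,
    }


# Hypothetical counts with no false negatives, so recall is exactly 1.
metrics = classification_metrics(tp=24, fp=6, tn=8, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```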

The goal

The goal is to make the recall equal to 1, so the number of false negatives (FN) must be 0. That way the classifier always flags images that are tumorous.

| Actual label \ Predicted label | No | Yes |
| --- | --- | --- |
| No | TN | FP |
| Yes | FN = 0 | TP |

We search for the best decision threshold for this problem. The threshold is selected based on the accuracy and the recall of the model on the validation set.

[Image: thresholds]
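
One possible way to implement such a selection is sketched below: among the candidate thresholds that keep recall at 1 on the validation set, pick the one with the highest accuracy. This exact rule is an assumption, since the README only says that accuracy and recall are both considered; the function name `select_threshold` and the validation scores are hypothetical.

```python
def select_threshold(scores, labels, thresholds):
    """Return (threshold, accuracy) with recall 1.0 and the highest accuracy.

    Returns None if no candidate reaches recall 1.0.
    Assumes labels contains at least one positive (1) example.
    """
    n_positive = sum(labels)
    best = None
    for t in thresholds:
        # Predict "tumor" (1) whenever the score reaches the threshold.
        preds = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
        correct = sum(1 for p, y in zip(preds, labels) if p == y)
        recall = tp / n_positive
        accuracy = correct / len(labels)
        # Keep only thresholds with no false negatives, then maximize accuracy.
        if recall == 1.0 and (best is None or accuracy > best[1]):
            best = (t, accuracy)
    return best


# Hypothetical validation scores (e.g. predicted probabilities) and true labels.
val_scores = [0.95, 0.80, 0.62, 0.55, 0.45, 0.40, 0.30, 0.10]
val_labels = [1, 1, 1, 0, 1, 0, 0, 0]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

best = select_threshold(val_scores, val_labels, candidates)
print(best)  # (0.4, 0.75)
```

Lowering the threshold protects recall at the cost of more false positives, which is why specificity tends to drop when recall is forced to 1.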

Results

Results for the first dataset (253 images):

| Model | Accuracy | Precision | Recall | F-measure | Specificity |
| --- | --- | --- | --- | --- | --- |
| LR | 0.842 | 0.8 | 1.0 | 0.888 | 0.571 |
| SVM | 0.815 | 0.774 | 1.0 | 0.872 | 0.5 |
| Linear-SVM | 0.868 | 0.827 | 1.0 | 0.905 | 0.642 |
| RF | 0.815 | 0.793 | 0.958 | 0.867 | 0.571 |
| SVM-Additive Chi^2 | 0.894 | 0.857 | 1.0 | 0.923 | 0.714 |