<a href="https://colab.research.google.com/github/beryaelio/Brain_MRI_cancer_classifier/blob/main/Brain_MRI_Scan.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
from skimage.feature import hog
from skimage import io, color
import numpy as np
import os
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In this notebook, I'll build and experiment with an SVM classifier, an ML algorithm that is often reported to work well for image classification. I'll use the Brain MRI scan images from this link:
https://www.kaggle.com/datasets/volodymyrpivoshenko/brain-mri-scan-images-tumor-detection/data

The task at hand is to build a model that can detect tumors in an MRI scan.

First, I'll load the data:


In [None]:
Positive = '/content/drive/MyDrive/brain_scan_db/brain_mri_scan_images/positive'
Negative = '/content/drive/MyDrive/brain_scan_db/brain_mri_scan_images/negative'

pos = os.listdir(Positive)
neg = os.listdir(Negative)

feature_vectors = []
labels = []

# Keep only the .jpg files from the positive folder
pos_files = [file for file in pos if file.lower().endswith('.jpg')]

def extract_features(image):
    """Return the HOG feature vector of an image."""
    # Convert colour images to grayscale; 2-D images are already grayscale
    if len(image.shape) == 3:
        gray_image = color.rgb2gray(image)
    else:
        gray_image = image

    # Drop a leftover singleton axis, if any
    if len(gray_image.shape) == 3:
        gray_image = np.squeeze(gray_image)

    # `multichannel` is deprecated in scikit-image's hog and is not needed for a 2-D image
    features = hog(gray_image, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2), visualize=False)
    return features

# Positive scans get label 1
for path in pos_files:
    image = io.imread(os.path.join(Positive, path), as_gray=True)
    features = extract_features(image)
    feature_vectors.append(features)
    labels.append(1)

# Negative scans get label 0
for path in neg:
    image = io.imread(os.path.join(Negative, path), as_gray=True)
    features = extract_features(image)
    feature_vectors.append(features)
    labels.append(0)


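
If the scans come in different sizes, `hog` returns vectors of different lengths, and the feature list cannot be stacked into a regular 2-D matrix. Below is a minimal sketch of one way around this: resize every image to a common shape before computing HOG. The 128x128 target size is my assumption, not a value from this notebook, the random arrays only stand in for real scans, and `extract_fixed_length_features` and `TARGET_SHAPE` are hypothetical names.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize

TARGET_SHAPE = (128, 128)  # assumed common size, not taken from the notebook

def extract_fixed_length_features(gray_image, target_shape=TARGET_SHAPE):
    """Resize a grayscale image, then compute HOG with the notebook's settings."""
    resized = resize(gray_image, target_shape, anti_aliasing=True)
    return hog(resized, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

# Synthetic stand-ins for scans of different sizes
rng = np.random.default_rng(0)
images = [rng.random((200, 180)), rng.random((256, 256)), rng.random((150, 300))]

vectors = [extract_fixed_length_features(img) for img in images]

# Every vector now has the same length, so they stack into a 2-D matrix
X_demo = np.array(vectors)
print(X_demo.shape)
```

With a matrix like this, the whole feature vector of each image could be passed to the classifier.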


Next, I prepare the data for training and testing:

In [None]:
X = np.array(feature_vectors)
y = np.array(labels)



In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
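
Because the two classes are not equally frequent, a plain random split can give the train and test sets slightly different class ratios. The sketch below uses the `stratify` argument of `train_test_split` to keep the ratio the same in both parts. The 150/80 label counts are made up for illustration and are not the real dataset sizes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up labels with roughly two thirds positives (illustration only)
y_demo = np.array([1] * 150 + [0] * 80)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# stratify=y_demo keeps the positive/negative ratio equal in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("positive share in train:", y_tr.mean())
print("positive share in test:", y_te.mean())
```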

Building the SVM model:
I set the kernel to linear because the model is trained on only one feature per image (the first HOG value, as the code below shows). C is set to 10 as an arbitrary starting value.

In [None]:
svm_classifier = SVC(C=10, kernel='linear')

# Keep only the first HOG value of each image, shaped as a single-feature column
X_train = np.array([x[0] for x in X_train]).reshape(-1, 1)
# Make sure the labels are a flat 1-D array
y_train = np.asarray(y_train).ravel()

svm_classifier.fit(X_train, y_train)
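
The cell above keeps only `x[0]`, so the classifier sees a single number per image. For comparison, here is a hedged sketch of fitting the same kind of SVM on complete feature vectors, assuming they all have equal length. The random matrix `X_full` is a hypothetical stand-in for real HOG features, and the `StandardScaler` step is my addition, not part of this notebook.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical stand-in for equal-length HOG vectors: 100 images, 50 features
X_full = rng.normal(size=(100, 50))
y_demo = (X_full[:, :5].sum(axis=1) > 0).astype(int)

# What the notebook cell does: one value per image
X_first = X_full[:, 0].reshape(-1, 1)
print("single-feature shape:", X_first.shape)

# Alternative: scale all features and fit the SVM on the full matrix
model = make_pipeline(StandardScaler(), SVC(C=10, kernel='linear'))
model.fit(X_full, y_demo)
print("full-feature shape:", X_full.shape)
```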

Now that the model is fitted on the training set, it can be tested and evaluated.

In [None]:
# Apply the same single-feature selection to the test set
X_test = np.array([x[0] for x in X_test]).reshape(-1, 1)

predictions = svm_classifier.predict(X_test)

The evaluation metric scores:

In [None]:
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)

precision = precision_score(y_test, predictions)
print("Precision:", precision)

recall = recall_score(y_test, predictions)
print("Recall:", recall)

f1 = f1_score(y_test, predictions)
print("F1-score:", f1)

conf_matrix = confusion_matrix(y_test, predictions)
print("Confusion Matrix:")
print(conf_matrix)


Accuracy: 0.6739130434782609
Precision: 0.6739130434782609
Recall: 1.0
F1-score: 0.8051948051948052
Confusion Matrix:
[[ 0 15]
 [ 0 31]]
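
The scores above can be recomputed by hand from the confusion matrix, which makes it easier to see where they come from. This small sketch uses only the four counts printed above, read with scikit-learn's layout (rows are true classes, columns are predicted classes).

```python
# Counts from the confusion matrix: [[TN, FP], [FN, TP]]
tn, fp = 0, 15
fn, tp = 0, 31

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Number of test images predicted as negative (first column of the matrix)
predicted_negative = tn + fn

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)
print("Predicted negative:", predicted_negative)
```

Since no image is predicted negative, accuracy and precision both equal the share of positives in the test set, 31/46.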


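The SVM model has an F1-score of approximately 0.805. Note, however, that the confusion matrix shows the model predicted the positive class for all 46 test images, so the accuracy simply equals the share of positives in the test set (31/46).
I'll try to find better parameters for the model using GridSearchCV.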

In [None]:
param_grid = {'C': [0.1, 1, 10],
              'kernel': ['linear', 'rbf', 'poly']}

grid_search = GridSearchCV(SVC(), param_grid, cv=5, scoring='f1')
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
print("best params: ",best_params)
best_model = grid_search.best_estimator_
print("best model: ",best_model)


best params:  {'C': 0.1, 'kernel': 'linear'}
best model:  SVC(C=0.1, kernel='linear')
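
`best_params_` only reports the winning combination. To see how close the other candidates were, the mean scores stored in `cv_results_` can be printed as well. This is a sketch on synthetic data from `make_classification` (an assumption; the notebook's `X_train` and `y_train` would take its place), with `search` as a hypothetical name.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (illustration only)
X_demo, y_demo = make_classification(n_samples=120, n_features=10, random_state=0)

param_grid = {'C': [0.1, 1, 10],
              'kernel': ['linear', 'rbf', 'poly']}

search = GridSearchCV(SVC(), param_grid, cv=5, scoring='f1')
search.fit(X_demo, y_demo)

# One row per parameter combination, with its mean cross-validated F1
results = search.cv_results_
for params, mean_f1 in zip(results['params'], results['mean_test_score']):
    print(params, round(mean_f1, 3))
```

If several combinations show the same mean F1, the reported best parameters are just one of the tied candidates, and the choice of C would carry little information.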


The grid search reports the best parameters as a linear kernel with C = 0.1.
I'll retrain and evaluate the model with these parameters:

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

svm_classifier = SVC(C = 0.1, kernel='linear')

# Keep only the first HOG value of each image, shaped as a single-feature column
X_train = np.array([x[0] for x in X_train]).reshape(-1, 1)
# Make sure the labels are a flat 1-D array
y_train = np.asarray(y_train).ravel()

svm_classifier.fit(X_train, y_train)

X_test = [x[0] for x in X_test]
X_test = np.array(X_test).reshape(-1, 1)
predictions = svm_classifier.predict(X_test)

accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)

precision = precision_score(y_test, predictions)
print("Precision:", precision)

recall = recall_score(y_test, predictions)
print("Recall:", recall)

f1 = f1_score(y_test, predictions)
print("F1-score:", f1)

conf_matrix = confusion_matrix(y_test, predictions)
print("Confusion Matrix:")
print(conf_matrix)

Accuracy: 0.6739130434782609
Precision: 0.6739130434782609
Recall: 1.0
F1-score: 0.8051948051948052
Confusion Matrix:
[[ 0 15]
 [ 0 31]]
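
Recall and F1 describe the positive class only. To look at the negative class too, the same confusion matrix gives the specificity (the share of negative scans recognised as negative) and the balanced accuracy (the mean of recall and specificity). A short sketch using the counts printed above:

```python
# Counts from the confusion matrix: [[TN, FP], [FN, TP]]
tn, fp = 0, 15
fn, tp = 0, 31

recall = tp / (tp + fn)              # share of positives that were found
specificity = tn / (tn + fp)         # share of negatives that were found
balanced_accuracy = (recall + specificity) / 2

print("Specificity:", specificity)
print("Balanced accuracy:", balanced_accuracy)
```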


The model's performance hasn't changed. With this SVM setup, I infer this is the best outcome I can get.
The model identifies all positive images, so there are no false negatives, but that is because it labels every test image as positive: all 15 negative scans are false positives. That is a problem, but it's acceptable for me.
The project is done.