# Introduction
This notebook classifies brain tumors with two machine learning methods: k-nearest neighbors (KNN) and support vector machines (SVM). The models are trained on multiclass labels, but the same code supports both binary and multiclass classification.

The code is structured as follows:
1. Load packages
2. Preprocess data
3. Train and test KNN models
4. Train and test SVM models

Run parts 1 and 2 first, then run either part 3 or part 4 to train and test your own model.

ATTENTION: The file paths in this code use Windows separators. To run it on Google Colab or another Linux-based server, change every '\\' to '/'.
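
Rather than editing every separator by hand, the paths can be built so that they work on both systems. The following is a minimal sketch, assuming the directory layout used in this notebook (`dataset/label.csv` and `dataset/image/`); `image_path` and the file name `IMAGE_0000.jpg` are hypothetical and not part of the original code.

```python
from pathlib import Path

# Assumed layout, mirroring the notebook: dataset/label.csv and dataset/image/<file_name>
data_dir = Path('dataset')
label_file = data_dir / 'label.csv'


def image_path(file_name):
    """Return the path of one image as a string, ready to pass to cv2.imread."""
    return str(data_dir / 'image' / file_name)


# pathlib inserts the separator that matches the operating system
print(image_path('IMAGE_0000.jpg'))
```

`cv2.imread` expects a string, which is why the helper converts the `Path` object before returning it.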

## 1. Load packages

In [1]:
import cv2
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

## 2. Preprocess data
This part includes:

1. image preprocessing with OpenCV (HOG feature extraction)
2. reading the images and labels
3. preparing the training and test sets
4. a function that converts multiclass labels to binary labels

In [2]:
# HOG (histogram of oriented gradients) descriptor settings
winSize = (28, 28)
blockSize = (14, 14)
blockStride = (7, 7)
cellSize = (14, 14)
nbins = 9
derivAperture = 1
winSigma = -1
histogramNormType = 0
L2HysThreshold = 0.2
gammaCorrection = 1
nlevels = 64
signedGradients = True

hog = cv2.HOGDescriptor(winSize, blockSize, blockStride,
                        cellSize, nbins, derivAperture,
                        winSigma, histogramNormType, L2HysThreshold,
                        gammaCorrection, nlevels, signedGradients)
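
To see how these settings determine the size of the feature vector, the sketch below computes the number of HOG values produced for a single 28x28 window. `hog_descriptor_length` is a hypothetical helper written for illustration; it follows the usual HOG layout (blocks per window, cells per block, bins per cell) and does not call OpenCV.

```python
def hog_descriptor_length(win_size, block_size, block_stride, cell_size, nbins):
    """Number of HOG values produced for one detection window."""
    # Block positions that fit in the window along each axis
    blocks_x = (win_size[0] - block_size[0]) // block_stride[0] + 1
    blocks_y = (win_size[1] - block_size[1]) // block_stride[1] + 1
    # Each block is divided into cells, and each cell contributes one histogram
    cells_per_block = (block_size[0] // cell_size[0]) * (block_size[1] // cell_size[1])
    return blocks_x * blocks_y * cells_per_block * nbins


# Same values as winSize, blockSize, blockStride, cellSize and nbins above
length = hog_descriptor_length((28, 28), (14, 14), (7, 7), (14, 14), 9)
print(length)  # 81
```

With these settings there are 3 x 3 block positions, one cell per block, and 9 bins per cell, giving 81 values per window. An image larger than the window is expected to produce one such vector per window position, so the array returned by `hog.compute` would be correspondingly longer.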

In [3]:
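# Read the label file once, then compute a HOG feature vector for every image
data_path = '.\\dataset\\'
label_df = pd.read_csv(data_path + 'label.csv')
names = list(label_df['file_name'])
labels = list(label_df['label'])
imgs = []
for name in names:
    img_path = data_path + 'image\\' + name
    img = cv2.imread(img_path)
    if img is None:  # cv2.imread returns None instead of raising on an unreadable path
        raise FileNotFoundError(img_path)
    hog_features = hog.compute(img)  # HOG descriptor of the image
    imgs.append(np.squeeze(hog_features))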

In [4]:
# Split into training and test sets (80% / 20%, fixed seed for reproducibility)
train_data, test_data, train_label, test_label = train_test_split(
    imgs, labels, test_size=0.2, random_state=3)

In [5]:
# Multiclass to binary (run this if you want to do binary classification)
def multi_to_binary(labels):
    """Keep 'no_tumor' as is and map every other label to 'tumor'."""
    return ['no_tumor' if label == 'no_tumor' else 'tumor' for label in labels]
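
As a quick check of the conversion, the sketch below applies an equivalent function to a short list. Only `'no_tumor'` is confirmed by the code above; the other label names are hypothetical placeholders.

```python
def multi_to_binary(labels):
    """Keep 'no_tumor' as is and map every other label to 'tumor'."""
    return ['no_tumor' if label == 'no_tumor' else 'tumor' for label in labels]


# Hypothetical tumor-type names, used only for illustration
example = ['glioma_tumor', 'no_tumor', 'pituitary_tumor']
converted = multi_to_binary(example)
print(converted)  # ['tumor', 'no_tumor', 'tumor']
```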

## 3. Train and test KNN models
In this part you can:

1. train your KNN model with multiclass labels
2. test your KNN model on the multiclass task
3. test your KNN model on the binary task

If you only need multiclass classification, run steps 1 and 2; for the binary task, run all three steps.

In [6]:
#KNN training and prediction
neigh=KNeighborsClassifier(n_neighbors=3)
neigh.fit(train_data,train_label)
predict_label=neigh.predict(test_data)
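
The idea behind `n_neighbors=3` can be sketched in a few lines: find the three training points closest to the query and take a majority vote over their labels. This is an illustrative toy version, assuming Euclidean distance and uniform votes; `knn_predict` and the 2-D points are hypothetical, and scikit-learn's implementation differs in details such as tie handling and neighbor search.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # Pair every training point's distance to the query with its label
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and return the most common one
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D data standing in for the HOG feature vectors
points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5)]
point_labels = ['no_tumor', 'no_tumor', 'no_tumor', 'tumor', 'tumor']

print(knn_predict(points, point_labels, (0.5, 0.5)))  # no_tumor
print(knn_predict(points, point_labels, (5, 6)))      # tumor
```

For the second query, the two nearest points are labeled `tumor` and the third nearest is labeled `no_tumor`, so the vote is 2 to 1.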

In [7]:
#accuracy of multiclassification
score=metrics.accuracy_score(test_label,predict_label)
score

0.8216666666666667

In [8]:
#accuracy of binary classification
predict_binary=multi_to_binary(predict_label)
test_binary=multi_to_binary(test_label)
score_binary=metrics.accuracy_score(test_binary,predict_binary)
score_binary

0.9383333333333334
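
The binary score is higher than the multiclass score because confusing one tumor type with another counts as an error in the multiclass task but as a correct `tumor` prediction in the binary task. The toy example below illustrates this, assuming hypothetical tumor-type names and a plain-Python `accuracy` helper that computes the same fraction as `metrics.accuracy_score`.

```python
def accuracy(true, pred):
    """Fraction of positions where the true and predicted labels agree."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def to_binary(labels):
    """Same mapping as multi_to_binary: everything except 'no_tumor' becomes 'tumor'."""
    return ['no_tumor' if label == 'no_tumor' else 'tumor' for label in labels]


# Hypothetical labels: one tumor-type mix-up and one missed tumor
true = ['glioma_tumor', 'meningioma_tumor', 'no_tumor', 'pituitary_tumor']
pred = ['meningioma_tumor', 'meningioma_tumor', 'no_tumor', 'no_tumor']

multi_acc = accuracy(true, pred)                         # 2 of 4 correct
binary_acc = accuracy(to_binary(true), to_binary(pred))  # 3 of 4 correct
print(multi_acc, binary_acc)  # 0.5 0.75
```

Only the missed tumor (predicted `no_tumor`) remains an error after the labels are collapsed.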

## 4. Train and test SVM models
In this part you can:

1. train your SVM model with multiclass labels
2. test your SVM model on the multiclass task
3. test your SVM model on the binary task

If you only need multiclass classification, run steps 1 and 2; for the binary task, run all three steps.

In [9]:
#SVM training and prediction
clf = make_pipeline(StandardScaler(), SVC(gamma='auto'))
clf.fit(train_data, train_label)
predict_label = clf.predict(test_data)
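
`gamma='auto'` sets the RBF kernel coefficient to `1 / n_features`, whereas scikit-learn's default `gamma='scale'` uses `1 / (n_features * X.var())`. Because the pipeline standardizes the features first, the two settings should give almost the same value here. The sketch below checks this with NumPy on synthetic data; the array shape and distribution are assumptions, and the manual standardization stands in for `StandardScaler`.

```python
import numpy as np

# Synthetic stand-in for the feature matrix (200 samples, 81 features)
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(200, 81))

# Same transformation StandardScaler applies: zero mean and unit variance per column
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

n_features = X_std.shape[1]
gamma_auto = 1.0 / n_features
gamma_scale = 1.0 / (n_features * X_std.var())

print(bool(np.isclose(gamma_auto, gamma_scale)))  # True
```

Without the `StandardScaler` step the two settings can differ substantially, since `X.var()` then depends on the raw feature scale.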

In [10]:
#accuracy of multiclassification
score=metrics.accuracy_score(test_label,predict_label)
score

0.8783333333333333

In [11]:
#accuracy of binary classification
predict_binary=multi_to_binary(predict_label)
test_binary=multi_to_binary(test_label)
score_binary=metrics.accuracy_score(test_binary,predict_binary)
score_binary

0.9616666666666667