# Medical Image Processing

## Classifying acanthocytes using image processing and ML techniques: A comparative study


The diagnosis of several diseases can be improved by identifying acanthocytes, i.e., red blood cells with an abnormal shape. The following paper describes an approach to automatically identify such cells in blood sample images: [Classifying acanthocytes using image processing and ML techniques: A comparative study](https://www.researchgate.net/publication/345003926_Classifying_acanthocytes_using_image_processing_and_ML_techniques_A_comparative_study).

The method relies on image processing operations and conventional machine learning methods. The main motivation is that this identification is usually performed either by specialized devices or manually by humans. Specialized devices are rare and costly, while manual identification is prone to error. Our approach reaches a precision of 91%, showing the potential of the solution.

The main goal is to develop a reliable procedure for detecting and classifying acanthocytes using a reduced set of features.

Image processing techniques are first used to segment the blood cells, and conventional ML models are then used to classify them. The output is the classification of each blood cell into one of two classes: normal cell or acanthocyte. Additionally, the number of acanthocytes in the blood sample is computed.

## Describing the dataset

The code that generated the dataset is publicly available [here](https://github.com/catarinaacsilva/medical-image-processing).

The first step normalizes an input image by converting it to grayscale and applying a 9x9 median filter to smooth noise.
The grayscale image is then binarized with Otsu's thresholding method.
These operations may leave holes in the middle of the cells and produce medium-sized noise (a by-product of the binarization).
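A minimal sketch of this normalization and binarization step, written with NumPy and SciPy rather than whatever library the original repository uses (an assumption). The helper names `to_gray`, `otsu_threshold` and `binarize` are hypothetical; the sketch assumes an RGB channel order and cells that appear darker than the background, so the cell mask is the set of pixels at or below the Otsu threshold.

```python
import numpy as np
from scipy import ndimage


def to_gray(rgb):
    """Luminance (ITU-R BT.601 weights) of an H x W x 3 uint8 image in RGB order."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.clip(np.rint(rgb[..., :3] @ weights), 0, 255).astype(np.uint8)


def otsu_threshold(gray):
    """Threshold t maximising the between-class variance of {<= t} and {> t}."""
    prob = np.bincount(gray.ravel(), minlength=256) / gray.size
    omega = np.cumsum(prob)                # probability of the class <= t
    mu = np.cumsum(prob * np.arange(256))  # first moment of the class <= t
    mu_total = mu[-1]                      # global mean intensity
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 0                      # skip thresholds that leave a class empty
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))


def binarize(rgb, cells_darker=True):
    """Grayscale -> 9x9 median filter -> Otsu threshold; returns a boolean cell mask."""
    smooth = ndimage.median_filter(to_gray(rgb), size=9)
    t = otsu_threshold(smooth)
    return smooth <= t if cells_darker else smooth > t


# Usage: a dark synthetic "cell" on a bright background
yy, xx = np.mgrid[:64, :64]
gray = np.where((yy - 32) ** 2 + (xx - 32) ** 2 <= 15 ** 2, 60, 230).astype(np.uint8)
rgb = np.stack([gray] * 3, axis=-1)
mask = binarize(rgb)
print(otsu_threshold(gray), mask[32, 32], mask[0, 0])
```

On real smears, whether cells are darker or brighter than the background depends on the staining, which is why the polarity is a parameter here.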

The next steps fix this. A filling operation (imfill) applies a guided flood fill to close holes inside blobs.
Morphological reconstruction with a 9x9 elliptical kernel then removes the medium-sized noise produced during binarization. Finally, the Canny edge detector extracts the region contours.
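The cleanup steps can be sketched with SciPy's binary morphology, as an illustration rather than the repository's code. `binary_fill_holes` plays the role of `imfill`, and the noise removal is implemented as opening by reconstruction: erode with a 9x9 elliptical (here circular) element, then regrow the surviving blobs inside the original mask with `binary_propagation`. Blobs too small to survive the erosion disappear, while larger cells keep their exact outline. For the last step this sketch swaps Canny for a simpler morphological boundary (the mask minus its 3x3 erosion), which yields a one-pixel contour of a clean binary mask. `elliptical_kernel`, `clean_mask` and `boundary` are hypothetical names.

```python
import numpy as np
from scipy import ndimage


def elliptical_kernel(size=9):
    """Boolean size x size structuring element shaped like an ellipse (a disk when square)."""
    r = (size - 1) / 2
    yy, xx = np.mgrid[:size, :size]
    return (yy - r) ** 2 + (xx - r) ** 2 <= r ** 2


def clean_mask(binary, size=9):
    """Fill holes, then remove small blobs by opening by reconstruction."""
    filled = ndimage.binary_fill_holes(binary)  # analogous to imfill(..., 'holes')
    seed = ndimage.binary_erosion(filled, structure=elliptical_kernel(size))
    # Regrow every 8-connected blob that still contains a seed pixel, within `filled`
    return ndimage.binary_propagation(seed, structure=np.ones((3, 3), bool), mask=filled)


def boundary(mask):
    """One-pixel-wide inner contour (stand-in for Canny): pixels removed by a 3x3 erosion."""
    return mask & ~ndimage.binary_erosion(mask, structure=np.ones((3, 3), bool))


# Usage: a cell with a hole in the middle plus a small noise blob
yy, xx = np.mgrid[:64, :64]
cell = (yy - 32) ** 2 + (xx - 32) ** 2 <= 15 ** 2
binary = cell & ~((yy - 32) ** 2 + (xx - 32) ** 2 <= 9)
binary[5:8, 5:8] = True
cleaned = clean_mask(binary)
contour = boundary(cleaned)
```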

![](images/img00.png)

Based on the extracted region contours, we compute several features that describe them.
The first feature is the histogram of the chain code. A chain code characterizes the shape of a contour, but it is not invariant to rotation or scale. To reduce this sensitivity, we compute a histogram with the relative weight of each chain-code direction.
The remaining features are circularity, roundness, aspect ratio, and solidity.
These are shape descriptors commonly used by image processing toolboxes to classify blobs.
They are meant to improve classification by expanding the expressiveness of the histogram and by capturing characteristics that are invariant to scale and rotation.
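A sketch of both feature families, assuming each contour is available as a closed list of consecutive 8-connected `(x, y)` pixels (for example, from a contour tracer such as OpenCV's `findContours` with `CHAIN_APPROX_NONE`). The shape descriptors follow ImageJ's definitions (circularity = 4π·area/perimeter², roundness = 4·area/(π·major²), aspect ratio = major/minor, solidity = area/convex area); whether the authors used exactly these variants is an assumption. The area, perimeter, fitted-ellipse axes and convex-hull area are taken as inputs, and all function names are hypothetical.

```python
import math

import numpy as np

# Freeman 8-direction codes keyed by the step (dx, dy), with y pointing up:
# 0=E, 1=NE, 2=N, 3=NW, 4=W, 5=SW, 6=S, 7=SE
# (with image coordinates, where y points down, N and S swap; any fixed convention works)
FREEMAN = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
           (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}


def chain_code(points):
    """Chain code of a closed contour given as a list of consecutive 8-connected (x, y) points."""
    steps = zip(points, points[1:] + points[:1])  # wrap around to close the contour
    return [FREEMAN[(x1 - x0, y1 - y0)] for (x0, y0), (x1, y1) in steps]


def chain_code_histogram(points):
    """Relative weight of each of the 8 directions; the 8 values sum to 1."""
    counts = np.bincount(chain_code(points), minlength=8)
    return counts / counts.sum()


def shape_descriptors(area, perimeter, major_axis, minor_axis, convex_area):
    """ImageJ-style shape descriptors; all equal 1 for a perfect circle."""
    return {
        "circularity": 4 * math.pi * area / perimeter ** 2,
        "roundness": 4 * area / (math.pi * major_axis ** 2),
        "aspect_ratio": major_axis / minor_axis,
        "solidity": area / convex_area,
    }


# Usage: squares with sides of 1 and 2 steps give the same normalized histogram
small = [(0, 0), (1, 0), (1, 1), (0, 1)]
big = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
print(chain_code(small))          # [0, 2, 4, 6]
print(chain_code_histogram(big))  # 0.25 for E, N, W and S, 0 elsewhere
```

Dividing the counts by the contour length is what removes the dependence on the cell's size; the histogram also discards the starting point of the trace.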

```python
# Load the ARFF file (assumes the liac-arff package, imported as `arff`)
import arff
import numpy as np

with open('dataset/medical_image.arff', 'r') as f:
    dataset = arff.load(f)
data = np.array(dataset['data'])
print(data)
```

```python
from sklearn import preprocessing

# Split into input (X) and output (y) variables; the class label is the last column
X = data[:, :-1].astype(np.float64)
y = data[:, -1]
print(X)
print(y)

# LabelEncoder maps the string class labels to integers (classes sorted alphabetically)
label_encoder = preprocessing.LabelEncoder()
y = label_encoder.fit_transform(y)
print(label_encoder.classes_)  # the class at index 1 is encoded as 1
print(y)
```
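Because `LabelEncoder` sorts the class names, which class becomes 1 is decided alphabetically, not by which class is abnormal. This matters for binary precision and recall, whose positive class is label 1 by default in scikit-learn. A small illustration with hypothetical class names (the real ones are the nominal values of the last ARFF attribute):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical class names, for illustration only
labels = np.array(["normal", "acanthocyte", "normal", "acanthocyte"])

encoder = LabelEncoder()
y = encoder.fit_transform(labels)
print(encoder.classes_)  # ['acanthocyte' 'normal'] -> 'normal' is encoded as 1
print(y)                 # [1 0 1 0]

# One way to make acanthocytes the positive class explicitly
y_acanthocyte = (labels == "acanthocyte").astype(int)
print(y_acanthocyte)     # [0 1 0 1]
```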

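```python
# Feature selection: keep the k features with the highest chi-squared score
# (chi2 requires non-negative feature values)
from sklearn.feature_selection import SelectKBest, chi2

print(X.shape)

k = 3  # number of features to keep

# Note: fitting the selector on the full dataset, before cross-validation,
# lets the test folds influence which features are kept
X = SelectKBest(chi2, k=k).fit_transform(X, y)
print(X.shape)
```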
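Fitting `SelectKBest` on the whole dataset before cross-validation lets the test folds take part in choosing the features, which can make cross-validated scores optimistic. A minimal sketch of a leak-free alternative, on synthetic data rather than the real dataset: selection, scaling and the classifier are wrapped in a scikit-learn pipeline so that all three are refit inside each training fold. `X_demo`, `y_demo` and the choice of 3-NN are placeholders.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 200 samples, 10 non-negative features (chi2 rejects
# negative values); only features 0 and 1 determine the label
rng = np.random.default_rng(0)
X_demo = rng.random((200, 10))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 1).astype(int)

# Selection, scaling and classification are all refit on each training fold,
# so the held-out fold never influences which k=3 features are kept
pipe = make_pipeline(SelectKBest(chi2, k=3), StandardScaler(),
                     KNeighborsClassifier(n_neighbors=3))
scores = cross_val_score(pipe, X_demo, y_demo,
                         cv=StratifiedKFold(n_splits=5), scoring="precision")
print(scores.mean())

# For reporting only: which columns the selector keeps when fit on all data
selected = SelectKBest(chi2, k=3).fit(X_demo, y_demo).get_support(indices=True)
print(selected)
```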

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Perceptron, SGDClassifier
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score)
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

k_fold = StratifiedKFold(n_splits=5)

classifiers = [
    # loss="log" was renamed "log_loss" in scikit-learn 1.1 and later removed;
    # with this loss the SGD model is a logistic regression, not an SVM
    ('sgd (log loss)', SGDClassifier(loss="log_loss")),
    # the other original Perceptron arguments were scikit-learn defaults
    ('perceptron', Perceptron(tol=None, eta0=0.1, random_state=0)),
    ('knn1', KNeighborsClassifier(n_neighbors=1)),
    ('knn3', KNeighborsClassifier(n_neighbors=3)),
    ('knn5', KNeighborsClassifier(n_neighbors=5)),
    ('decision tree', DecisionTreeClassifier()),
    ('random forest', RandomForestClassifier()),
]

for name, model in classifiers:
    # Reset the per-fold scores for every model; otherwise each average
    # would also include the folds of all previous models
    scores = {'Accuracy': [], 'Precision': [], 'Recall': [], 'F1': [],
              'Matthews correlation coefficient': []}
    for train_indices, test_indices in k_fold.split(X, y):
        # Fit the scaler on the training fold only, then apply it to the test fold
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X[train_indices])
        X_test = scaler.transform(X[test_indices])

        model.fit(X_train, y[train_indices])
        y_true, y_pred = y[test_indices], model.predict(X_test)

        # The class encoded as 1 is the positive class (see label_encoder.classes_);
        # zero_division=0 avoids NaN when a fold has no positive predictions
        scores['Accuracy'].append(accuracy_score(y_true, y_pred))
        scores['Precision'].append(precision_score(y_true, y_pred, zero_division=0))
        scores['Recall'].append(recall_score(y_true, y_pred, zero_division=0))
        scores['F1'].append(f1_score(y_true, y_pred, zero_division=0))
        scores['Matthews correlation coefficient'].append(matthews_corrcoef(y_true, y_pred))

    print('Model: ', name)
    for metric, values in scores.items():
        print(metric, '= ', np.mean(values))
    print('*' * 80)
    print()
```
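The per-fold loop can also be expressed with scikit-learn's `cross_validate` and a dictionary of scorers, which refits the whole pipeline (scaler plus classifier) on each training fold and returns one score per fold and metric. A minimal sketch on synthetic data: `X_demo` and `y_demo` are placeholders for the selected features and encoded labels, and the decision tree stands in for any of the classifiers above. Averaging each returned array gives the same kind of per-model summary as the manual loop.

```python
import numpy as np
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholders for the selected features and the encoded labels
rng = np.random.default_rng(0)
X_demo = rng.random((200, 3))
y_demo = (X_demo[:, 0] > 0.5).astype(int)

scoring = {
    "accuracy": "accuracy",
    "precision": "precision",  # positive class = label 1
    "recall": "recall",
    "f1": "f1",
    "mcc": make_scorer(matthews_corrcoef),
}

# The scaler is refit on each training fold, as in the manual loop
model = make_pipeline(StandardScaler(), DecisionTreeClassifier(random_state=0))
results = cross_validate(model, X_demo, y_demo,
                         cv=StratifiedKFold(n_splits=5), scoring=scoring)

for metric in scoring:
    print(f"{metric} = {results['test_' + metric].mean():.3f}")
```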