<h1> Libraries used

In [None]:
import sklearn
import os
import cv2
import mahotas
import random

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier


<h1> Load and read the images

Root directory of the images

In [None]:
directory_path = r'./images_folder'
print(len(os.listdir(directory_path)))

Collect the paths of the images in each directory (Parasitized and Uninfected)

In [None]:
files=[]
for dirname, _, filenames in os.walk(directory_path):
    for filename in filenames:
        if '.db' not in filename:
            files.append(os.path.join(dirname,filename))
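The `'.db'` check above skips database-style files by substring. As a minimal sketch of a stricter alternative, the paths can be filtered by image extension instead. The helper name `list_images`, the extension set, and the demo file names are assumptions for illustration and are not part of the original notebook.

```python
import os
import tempfile

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}  # assumed set of image types

def list_images(root):
    """Return sorted paths of files under root whose extension looks like an image."""
    paths = []
    for dirname, _, filenames in os.walk(root):
        for filename in filenames:
            if os.path.splitext(filename)[1].lower() in IMAGE_EXTENSIONS:
                paths.append(os.path.join(dirname, filename))
    return sorted(paths)

# Small demo on a throwaway directory tree with made-up file names
with tempfile.TemporaryDirectory() as root:
    demo_files = [("Parasitized", "a.png"), ("Uninfected", "b.PNG"), ("Uninfected", "Thumbs.db")]
    for sub, name in demo_files:
        os.makedirs(os.path.join(root, sub), exist_ok=True)
        open(os.path.join(root, sub, name), "w").close()
    demo_names = [os.path.basename(p) for p in list_images(root)]
```

Sorting the result also makes the file order independent of how the operating system lists the directory.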

Check the number of images

In [None]:
len(files)

Set the directory for each image class

In [None]:
Parasitized_Dir='./images_folder/Parasitized'
Uninfected_Dir='./images_folder/Uninfected'

In [None]:
pd.DataFrame(files).sample(frac=1).reset_index(drop=True)

Class for building the image dataframe

In [None]:
class DetectMalaria:
    def __init__(self,para_dir,uninfect_dir):
        self.parasitized_dir=para_dir
        self.uninfected_dir=uninfect_dir
    def dataset(self,ratio,files):
        Dataset=pd.DataFrame(files,columns=['Path'])
        Dataset=Dataset.sample(frac=1).reset_index(drop=True)
        trainfiles,testfiles=train_test_split(Dataset,test_size=ratio,random_state=None)
        return(trainfiles,testfiles)

Create a DetectMalaria instance and assign it to the variable x; its dataset method builds the dataframe of image paths

In [None]:
x=DetectMalaria(Parasitized_Dir,Uninfected_Dir)

<h1> Split the dataframe / dataset

The data is split 80:20 into training and test sets

In [None]:
train_data,test_data=x.dataset(ratio=0.2,files=files)
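`ratio=0.2` holds out 20% of the paths for testing, and because the class passes `random_state=None` the split differs between runs. The following is a minimal standard-library sketch of what a seeded 80:20 shuffle split does. It is not scikit-learn's implementation; `split_paths`, the seed value, and the file names are hypothetical.

```python
import random

def split_paths(paths, test_ratio, seed):
    """Shuffle a copy of paths with a fixed seed, then cut off the test share."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

paths = [f"img_{i}.png" for i in range(10)]  # made-up file names

# The same seed gives the same split on every call
train_a, test_a = split_paths(paths, test_ratio=0.2, seed=42)
train_b, test_b = split_paths(paths, test_ratio=0.2, seed=42)
```

In the notebook itself, passing an integer `random_state` to `train_test_split` makes the split repeatable in the same way.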

In [None]:
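def label(path):
    # 0 = Uninfected, 1 = Parasitized; the class is read from the folder name in the path
    if 'Uninfected' in path:
        return 0
    else:
        return 1


# Work on copies so the new column is not assigned to a slice of the shuffled dataframe
train_data=train_data.copy()
test_data=test_data.copy()
train_data['label']=train_data['Path'].apply(label)
test_data['label']=test_data['Path'].apply(label)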
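The labelling function looks for the substring `'Uninfected'` anywhere in the path. A sketch of a slightly stricter variant reads the class from the parent folder name only. `label_from_folder` and the example file names are hypothetical; the 0/1 encoding follows the cell above.

```python
from pathlib import PurePath

def label_from_folder(path):
    """Return 0 for images inside an 'Uninfected' folder, 1 otherwise."""
    return 0 if PurePath(path).parent.name == "Uninfected" else 1

# Hypothetical file names, used only to show the mapping
examples = [
    "./images_folder/Uninfected/cell_a.png",
    "./images_folder/Parasitized/cell_b.png",
]
labels = [label_from_folder(p) for p in examples]
```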

<h1> Image visualization

In [None]:
image=cv2.imread('./images_folder/Uninfected/Uninfected_(1).png')
image_rgb=cv2.cvtColor(image,cv2.COLOR_BGR2RGB)
plt.imshow(image_rgb)

If a cell is infected with malaria, spots appear on the cell

Plot 4 randomly chosen images

In [None]:
fig,ax=plt.subplots(2,2)

for i,axes in enumerate(ax.flatten()):
    image_path=random.choice(train_data['Path'].reset_index(drop=True))
    image=cv2.imread(image_path)
    image_rgb=cv2.cvtColor(image,cv2.COLOR_BGR2RGB)
    axes.imshow(image_rgb)
    if 'Uninfected' in image_path:
        axes.set_title('Uninfected')
    else:
        axes.set_title('parasite')
plt.show()

Compute the Hu moments and Haralick features of the last image loaded above

In [None]:
image_gray= cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
feature=cv2.HuMoments(cv2.moments(image_gray)).flatten()
print(feature)
print(mahotas.features.haralick(image_gray).mean(axis=0))

<h1> Feature extraction

<h3> Feature extraction #1 (Hu Moments)

Hu Moments (more precisely, Hu moment invariants) are a set of 7 numbers computed from central moments that are invariant to image transformations. The first 6 moments are invariant to translation, scale, rotation, and reflection, whereas the sign of the 7th moment changes under reflection.

In [None]:
# Hu Moments function
def fd_hu_moments(image):
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    feature = cv2.HuMoments(cv2.moments(image)).flatten()
    return feature
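To make the invariance claim concrete, here is a NumPy-only sketch that computes the first two Hu invariants from normalized central moments and compares them for a shifted and a rotated copy of a small synthetic blob. It is an illustration under stated assumptions, not OpenCV's implementation; `hu_first_two` and the test blob are made up for this example.

```python
import numpy as np

def hu_first_two(img):
    """Return the first two Hu invariants (h1, h2) of a 2-D grayscale array."""
    img = np.asarray(img, dtype=np.float64)
    ys, xs = np.mgrid[: img.shape[0], : img.shape[1]]
    m00 = img.sum()
    x_bar = (xs * img).sum() / m00
    y_bar = (ys * img).sum() / m00

    def eta(p, q):
        # Central moment, normalized by m00 so that it is scale invariant
        mu = (((xs - x_bar) ** p) * ((ys - y_bar) ** q) * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    h1 = eta(2, 0) + eta(0, 2)
    h2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return h1, h2

# Synthetic asymmetric blob on a black background
blob = np.zeros((32, 32))
blob[5:12, 8:20] = 1.0
blob[12:15, 8:11] = 0.5

shifted = np.roll(blob, shift=(10, 7), axis=(0, 1))  # translation, no wrap-around here
rotated = np.rot90(blob)                             # 90 degree rotation

original_hu = hu_first_two(blob)
shifted_hu = hu_first_two(shifted)
rotated_hu = hu_first_two(rotated)
```

Up to floating-point error, all three tuples agree, which is the property that makes these values usable as shape features.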

<h3> Feature extraction #2 (Haralick Features)

Haralick texture features describe an image by its texture. They are computed from the Gray Level Co-occurrence Matrix (GLCM), which counts how often pairs of gray levels occur next to each other.

In [None]:
# Haralick features function
def fd_haralick(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    haralick = mahotas.features.haralick(gray).mean(axis=0)
    return haralick
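As a rough illustration of the GLCM idea, the sketch below builds a symmetric co-occurrence matrix for horizontally adjacent pixels of a tiny 4-level image and derives one texture value (contrast) from it. This is a simplified, single-direction sketch and not how mahotas computes its features; `glcm_horizontal`, `contrast`, and the toy image are assumptions for this example.

```python
import numpy as np

def glcm_horizontal(gray, levels):
    """Symmetric, normalized co-occurrence matrix for horizontal neighbours."""
    gray = np.asarray(gray, dtype=np.intp)
    glcm = np.zeros((levels, levels), dtype=np.float64)
    left, right = gray[:, :-1].ravel(), gray[:, 1:].ravel()
    np.add.at(glcm, (left, right), 1)  # count each (left, right) gray-level pair
    glcm += glcm.T                     # count every pair in both directions
    return glcm / glcm.sum()

def contrast(p):
    """Sum of (i - j)^2 * p[i, j]: large when neighbouring gray levels differ a lot."""
    i, j = np.indices(p.shape)
    return float(((i - j) ** 2 * p).sum())

# Toy image with gray levels 0..3
tiny = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
])
p = glcm_horizontal(tiny, levels=4)
tiny_contrast = contrast(p)
flat_contrast = contrast(glcm_horizontal(np.ones((4, 4), dtype=int), levels=4))
```

A perfectly uniform image has zero contrast, while the toy image has a positive value because some neighbouring pixels differ.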

<h3> Feature extraction #3 (Histogram)

In [None]:
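def fd_histogram(image, mask=None, bins=8):
    # Color histogram in HSV space; for 8-bit images OpenCV stores hue in [0, 180)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    # bins per channel kept small: the flattened vector has bins**3 entries
    hist  = cv2.calcHist([image], [0, 1, 2], mask, [bins, bins, bins], [0, 180, 0, 256, 0, 256])
    cv2.normalize(hist, hist)
    return(hist.flatten())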
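The length of a 3-D color histogram is the product of the bin counts, so 256 bins per channel would give 256 * 256 * 256 (about 16.8 million) values per image, while 8 bins per channel give 512. The NumPy-only sketch below shows the same idea on random pixel values. It is not the OpenCV call itself; `hsv_histogram` and the random data are illustrative assumptions.

```python
import numpy as np

def hsv_histogram(hsv_pixels, bins=8):
    """Flattened, L2-normalized 3-D histogram of rows of (H, S, V) values."""
    hist, _ = np.histogramdd(
        hsv_pixels,
        bins=(bins, bins, bins),
        range=[(0, 180), (0, 256), (0, 256)],  # 8-bit OpenCV hue lies in [0, 180)
    )
    norm = np.linalg.norm(hist)
    return (hist / norm).ravel() if norm > 0 else hist.ravel()

# Random stand-in pixels, one row per pixel
rng = np.random.default_rng(0)
pixels = np.column_stack([
    rng.integers(0, 180, size=500),  # hue
    rng.integers(0, 256, size=500),  # saturation
    rng.integers(0, 256, size=500),  # value
])
features = hsv_histogram(pixels)
```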

<h3> Extract features from the images

In [None]:
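feature=[]
def dataframe(df):
    # df is one dataframe row; appends [13 Haralick values, 7 Hu moments, label] to `feature`
    image=cv2.imread(df['Path'])
    if image is None:
        raise FileNotFoundError(f"Could not read image: {df['Path']}")
    print(df['Path'])
    global_feature = np.hstack([ fd_haralick(image), fd_hu_moments(image),df['label']])
    feature.append(global_feature)

train_data.apply(dataframe,axis=1)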

In [None]:
X_train=pd.DataFrame(feature).drop(columns=[20])
y_train=train_data['label']
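Each row built by `dataframe` holds 13 Haralick values, then 7 Hu moments, then the label, which is why column 20 is dropped to obtain the feature matrix. A small NumPy sketch with placeholder numbers (not real image features) shows the layout:

```python
import numpy as np

haralick = np.arange(13, dtype=float)  # stand-in for the 13 Haralick values
hu = np.arange(7, dtype=float)         # stand-in for the 7 Hu moments
label = 1                              # stand-in class label

row = np.hstack([haralick, hu, label])  # columns 0-12, 13-19, and 20
features_only = np.delete(row, 20)      # same effect as dropping column 20
```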

In [None]:
feature=[]
test_data.apply(dataframe,axis=1)

In [None]:
X_test=pd.DataFrame(feature).drop(columns=[20])
y_test=test_data['label']

<h3> Scale the data to a common scale

In [None]:
scaler=StandardScaler()
scaler.fit(X_train)
X_train=scaler.transform(X_train)
X_test=scaler.transform(X_test)
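With its default settings, `StandardScaler` standardizes each column as z = (x - mean) / std, using the mean and standard deviation learned from the training data only. The NumPy sketch below reproduces that arithmetic on made-up numbers to show why the test set is transformed with the training statistics rather than its own.

```python
import numpy as np

# Made-up values: 3 training rows and 1 test row with 2 features each
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[2.0, 40.0]])

mean = train.mean(axis=0)  # learned from the training data only
std = train.std(axis=0)    # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuses the training statistics
```

After scaling, every training column has mean 0 and standard deviation 1. The test row is expressed in the same units, so its second value lands well above the training range.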

In [None]:

svc=SVC()
svc.fit(X_train,y_train)
pred=svc.predict(X_test)

accuracy_score(y_test, pred)

<h1> Random forest

<h3> Parameter tuning

Search for the best parameters for the random forest model

In [None]:
## Random Forest Classifier with tuning ###
parameters = {  'n_estimators':[80,90,100],
                'min_samples_leaf':[1,2,3,4],
                'max_depth':[20,30,40],
                'max_features':[20,30,40],  # note: X_train has 20 feature columns (13 Haralick + 7 Hu)
                'criterion':['gini','entropy']
            }

gridSVM = GridSearchCV(RandomForestClassifier(), parameters, refit = True, verbose = 1, cv=2)
gridSVM.fit(X_train, y_train)
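Grid search fits one model per parameter combination and per cross-validation fold, so the cost grows multiplicatively with every list in the grid. The standard-library sketch below counts the work implied by the grid above. It only enumerates the combinations and does not train anything; `candidates` and `total_fits` are names chosen for this example.

```python
from itertools import product

parameters = {
    "n_estimators": [80, 90, 100],
    "min_samples_leaf": [1, 2, 3, 4],
    "max_depth": [20, 30, 40],
    "max_features": [20, 30, 40],
    "criterion": ["gini", "entropy"],
}
cv_folds = 2

# One dict per combination, in the same spirit as the grid search candidates
candidates = [dict(zip(parameters, values)) for values in product(*parameters.values())]
total_fits = len(candidates) * cv_folds
```

That is 216 candidates and 432 cross-validation fits, plus one final refit on the full training set because `refit = True`.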

Get the best parameters

In [None]:
print(gridSVM.best_params_)

The best parameters found are:
* Criterion: Entropy
* Max depth: 30
* Max features: 40
* Min samples leaf: 2
* n estimators: 90

<h3> Applying the best parameters

Build the random forest model using the best parameters

In [None]:
rf = RandomForestClassifier(n_estimators=90, criterion='entropy' ,max_depth=30, min_samples_leaf=2, max_features=40)
rf.fit(X_train, y_train)
pred = rf.predict(X_test)

Use the code below if gridSVM has been run

In [None]:
y_pred_train = gridSVM.predict(X_train)
y_pred_test = gridSVM.predict(X_test)

Use the code below if gridSVM was not run but the model above was trained with the best parameters

In [None]:
y_pred_train = rf.predict(X_train)
y_pred_test = rf.predict(X_test)

<h3> Print the accuracy scores

In [None]:
print(accuracy_score(y_train, y_pred_train))
print(accuracy_score(y_test, y_pred_test))

<h3> Training metrics

In [None]:
print("Training metrics:")
print(sklearn.metrics.classification_report(y_true= y_train, y_pred= y_pred_train))

<h3> Test metrics

In [None]:
print("Test data metrics:")
print(sklearn.metrics.classification_report(y_true= y_test, y_pred= y_pred_test))

<h2> Confusion matrix

In [None]:
cm = confusion_matrix(y_test, y_pred_test)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=['Uninfected', 'Parasitized'])

disp.plot(cmap="Blues")
plt.show()
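For reading the plot, rows are the true classes and columns are the predicted classes. The plain-Python sketch below builds the same 2x2 layout by hand from made-up labels and derives accuracy, precision, and recall for the Parasitized class from it. `confusion_counts` and the label lists are illustrative assumptions, not results from this notebook.

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for labels 0 (Uninfected) and 1 (Parasitized)."""
    matrix = [[0, 0], [0, 0]]
    for truth, guess in zip(y_true, y_pred):
        matrix[truth][guess] += 1  # row = true class, column = predicted class
    return matrix

y_true = [0, 0, 0, 1, 1, 1, 1, 0]  # made-up true labels
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]  # made-up predictions

(tn, fp), (fn, tp) = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)  # share of predicted Parasitized that are correct
recall = tp / (tp + fn)     # share of actual Parasitized that are found
```

For a screening task like this one, the false-negative cell (an infected cell predicted as Uninfected) is usually the count to watch most closely.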

<h1> Function for classifying an image

In [None]:
def classify_image(image_path):
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    # Same feature order as in training: Haralick first, then Hu moments
    global_feature = np.hstack([fd_haralick(image), fd_hu_moments(image)])
    scaled_feature = scaler.transform([global_feature])
    prediction = rf.predict(scaled_feature)[0]

    if prediction == 0:
        print(f"Image '{image_path}' is classified as Uninfected.")
    else:
        print(f"Image '{image_path}' is classified as Parasitized.")

    plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    plt.title("Original image")
    plt.axis("off")
    plt.show()