# Fetal State Classification
The aim of the project is to build a classification model that can help doctors categorize cardiotocograms (CTGs). The fetal state falls into one of three categories: normal, suspect, or pathologic. The dataset is obtained from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/cardiotocography).

The data is multivariate, with 2126 instances and 23 attributes. With a small number of attributes and a moderate number of instances, an RBF kernel is a reasonable choice: it can capture non-linear class boundaries, and the dataset is small enough for kernel SVM training to remain tractable.
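
As a rough illustration of what the RBF kernel computes, the sketch below evaluates K(x, z) = exp(-gamma * ||x - z||^2) for two pairs of points. The function name `rbf_kernel`, the two-feature vectors, and the `gamma` value are illustrative assumptions, not taken from the notebook's pipeline; scikit-learn's `SVC(kernel='rbf')` performs the equivalent computation internally.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Hypothetical two-feature samples, for illustration only.
same = rbf_kernel([120.0, 0.0], [120.0, 0.0], gamma=1e-7)  # identical points
far = rbf_kernel([120.0, 0.0], [132.0, 4.0], gamma=1e-7)   # distinct points

print(same, far)
```

Identical points give a similarity of exactly 1, and the similarity decays towards 0 as points move apart. A small `gamma` makes that decay slow, which matters when features are unscaled and squared distances are large.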


In [1]:
# Importing Libraries
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
import matplotlib.pyplot as plt


In [3]:
# Load the data from the sheet called 'Raw Data'
data = pd.read_excel('CTG.xls', sheet_name='Raw Data')

data.head()

Unnamed: 0,FileName,Date,SegFile,b,e,LBE,LB,AC,FM,UC,...,C,D,E,AD,DE,LD,FS,SUSP,CLASS,NSP
0,,NaT,,,,,,,,,...,,,,,,,,,,
1,Variab10.txt,1996-12-01,CTG0001.txt,240.0,357.0,120.0,120.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,9.0,2.0
2,Fmcs_1.txt,1996-05-03,CTG0002.txt,5.0,632.0,132.0,132.0,4.0,0.0,4.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,6.0,1.0
3,Fmcs_1.txt,1996-05-03,CTG0003.txt,177.0,779.0,133.0,133.0,2.0,0.0,5.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,6.0,1.0
4,Fmcs_1.txt,1996-05-03,CTG0004.txt,411.0,1192.0,134.0,134.0,2.0,0.0,6.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,6.0,1.0


In [5]:
from collections import Counter
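# Select the feature matrix and the target (row 0 is empty, so start at row 1).
# Columns 3:-2 run from `b` through `SUSP`; the last column, `NSP`, is the fetal state label.

X = data.iloc[1:2126, 3:-2].values

Y = data.iloc[1:2126, -1].values

# Check class proportions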
print(Counter(Y))

Counter({1.0: 1654, 2.0: 295, 3.0: 176})


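The Counter shows that the majority of the records, about 78% (1654 of 2125), belong to the normal class (NSP = 1), while the suspect and pathologic classes make up about 14% and 8%. Understanding these proportions is important because the imbalance influences both hyperparameter tuning and evaluation: accuracy alone can look high even when the minority classes are poorly recognized.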
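
The proportions can be checked directly from the counts printed above. This is a minimal sketch that re-enters those counts by hand; the `labels` mapping assumes the UCI coding of NSP (1 = normal, 2 = suspect, 3 = pathologic).

```python
from collections import Counter

# Counts copied from the Counter output above.
counts = Counter({1.0: 1654, 2.0: 295, 3.0: 176})

# Assumed mapping from NSP code to class name.
labels = {1.0: "normal", 2.0: "suspect", 3.0: "pathologic"}

total = sum(counts.values())
proportions = {labels[code]: n / total for code, n in counts.items()}

for name, share in proportions.items():
    print(f"{name}: {share:.1%}")
```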

In [6]:
# Split the test and train data

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42)

In [7]:
# Tune an RBF-kernel SVM over C and the kernel coefficient gamma

svc = SVC(kernel='rbf')
parameters = {'C':(100, 1e3, 1e4, 1e5),
            'gamma':(1e-08,1e-7,1e-6,1e-5)}

grid_search = GridSearchCV(svc, parameters, n_jobs=-1, cv=5)
grid_search.fit(X_train, Y_train)

print('Best Parameters: ', grid_search.best_params_)
print('Best Score: ', grid_search.best_score_)

Best Parameters:  {'C': 100000.0, 'gamma': 1e-07}
Best Score:  0.9448534562628523


In [8]:
# Best Model applied to the testing set

best_svc = grid_search.best_estimator_

accuracy = best_svc.score(X_test, Y_test)

print(f'Accuracy: {accuracy*100:.1f}%')

Accuracy: 95.6%


In [9]:
# Classification Report and Predictions
prediction = best_svc.predict(X_test)

report = classification_report(Y_test, prediction)

print(report)

              precision    recall  f1-score   support

         1.0       0.97      0.98      0.97       488
         2.0       0.89      0.89      0.89       104
         3.0       0.97      0.83      0.89        46

    accuracy                           0.96       638
   macro avg       0.95      0.90      0.92       638
weighted avg       0.96      0.96      0.96       638



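The classification report details the per-class performance of the model. Overall accuracy on the test set reaches 96%, but because about 76% of the test samples (488 of 638) are normal, the per-class figures are more informative: recall is 0.98 for the normal class, 0.89 for suspect, and 0.83 for pathologic, so roughly one in six pathologic cases is missed.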
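
To put the 96% figure in context, the sketch below recomputes two quantities from the numbers in the report: the macro-averaged recall and the accuracy of a trivial baseline that always predicts the majority class. The values are copied by hand from the report (already rounded to two decimals), and the variable names are illustrative.

```python
# Per-class recall and support, copied from the classification report above.
recall = {1.0: 0.98, 2.0: 0.89, 3.0: 0.83}
support = {1.0: 488, 2.0: 104, 3.0: 46}

# Macro average: unweighted mean over classes, so each class counts equally.
macro_recall = sum(recall.values()) / len(recall)

# Baseline: always predict the most frequent class in the test set.
n_test = sum(support.values())
baseline_accuracy = max(support.values()) / n_test

print(f"Macro recall:      {macro_recall:.2f}")
print(f"Majority baseline: {baseline_accuracy:.1%}")
```

The macro recall sits below the overall accuracy because it gives the smaller suspect and pathologic classes the same weight as the normal class.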