# Fetal State Classification on Cardiotocography

* This project classifies cardiotocograms (CTGs) into one of three fetal states: normal, suspect, and pathologic.
* The dataset was obtained from: https://archive.ics.uci.edu/ml/datasets/Cardiotocography
* The dataset consists of fetal heart rate and uterine contraction measurements as features, and the fetal state class code (1=normal, 2=suspect, 3=pathologic) as the label.
* There are 2,126 samples in total, with 23 features.

## Techniques used:
1. SVC classifier with an RBF kernel (chosen because the number of instances is not significantly larger than the number of features).
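
For reference, the RBF kernel scores the similarity of two samples as exp(-gamma * ||x - z||^2), so gamma controls how quickly similarity decays with distance. A minimal standard-library sketch follows; the function name `rbf_similarity` and the sample vectors are illustrative assumptions, not taken from the notebook:

```python
import math

def rbf_similarity(x, z, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Two made-up feature vectors, for illustration only
x = [120.0, 0.0, 4.0]
z = [132.0, 4.0, 0.0]

same = rbf_similarity(x, x, gamma=1e-3)  # identical points: similarity 1.0
near = rbf_similarity(x, z, gamma=1e-3)  # small gamma: similarity decays slowly
far = rbf_similarity(x, z, gamma=1e-1)   # large gamma: similarity decays quickly
print(same, near, far)
```

Larger gamma values make the decision boundary depend on closer neighbours only, which is why gamma is tuned together with C later on.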


# Step 1: Loading and analyzing the data

In [11]:
import pandas as pd
# Load the sheet named "Raw Data" from the Excel workbook
df = pd.read_excel("./datasets/CTG.xls", sheet_name="Raw Data")
df.head()

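| | FileName | Date | SegFile | b | e | LBE | LB | AC | FM | UC | ... | C | D | E | AD | DE | LD | FS | SUSP | CLASS | NSP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaT | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | Variab10.txt | 1996-12-01 | CTG0001.txt | 240.0 | 357.0 | 120.0 | 120.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 9.0 | 2.0 |
| 2 | Fmcs_1.txt | 1996-05-03 | CTG0002.txt | 5.0 | 632.0 | 132.0 | 132.0 | 4.0 | 0.0 | 4.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.0 | 1.0 |
| 3 | Fmcs_1.txt | 1996-05-03 | CTG0003.txt | 177.0 | 779.0 | 133.0 | 133.0 | 2.0 | 0.0 | 5.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.0 | 1.0 |
| 4 | Fmcs_1.txt | 1996-05-03 | CTG0004.txt | 411.0 | 1192.0 | 134.0 | 134.0 | 2.0 | 0.0 | 6.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.0 | 1.0 |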


In [12]:
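# Assign the feature set (columns D to AL) and the label set (column AN).
# Row 0 is empty, so the slice starts at row 1; the end index is exclusive,
# so 1:2126 selects rows 1 to 2125, i.e. 2,125 of the 2,126 samples.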
X = df.iloc[1:2126, 3:-2].values
Y = df.iloc[1:2126, -1].values
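
Python slices exclude the end index, which determines how many rows the cell above selects. A quick check, using a plain `range` as a stand-in for the DataFrame's row positions (the layout of one empty row followed by 2,126 samples is an assumption based on the dataset description and the `df.head()` output, not on inspecting the file):

```python
# Stand-in for the row positions: row 0 (empty) plus 2,126 samples -> 0..2126
row_positions = range(2127)

selected = row_positions[1:2126]     # same bounds as the slice in the cell above
all_samples = row_positions[1:2127]  # end index raised by one

print(len(selected))     # 2125
print(len(all_samples))  # 2126
```

Under that assumption, raising the end index to 2127 would include the final sample; the outputs shown in this notebook were produced with the 2,125-row slice.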

In [14]:
from collections import Counter
# Checking class proportions
print(Counter(Y))

Counter({1.0: 1654, 2.0: 295, 3.0: 176})


The classes are imbalanced: about 78% of the selected samples are normal, 14% suspect, and 8% pathologic.
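
To put the imbalance in perspective, the sketch below derives the class shares from the printed counts and the accuracy of a trivial model that always predicts the majority class. The variable names are illustrative; the counts are copied from the output above:

```python
from collections import Counter

# Class counts as printed above (1=normal, 2=suspect, 3=pathologic)
counts = Counter({1.0: 1654, 2.0: 295, 3.0: 176})
total = sum(counts.values())

proportions = {label: n / total for label, n in counts.items()}
for label, share in sorted(proportions.items()):
    print(f"class {label:.0f}: {share:.1%}")

# Accuracy of a trivial model that always predicts the majority class
majority_baseline = max(proportions.values())
print(f"majority-class baseline: {majority_baseline:.1%}")
```

Any accuracy figure reported later is best read against this baseline, and per-class recall is more informative than accuracy alone.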

# Step 2: Splitting into training and testing

In [15]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2,
                                                    random_state = 42)
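
With imbalanced classes, a purely random split can leave the rare classes under- or over-represented in the test set. scikit-learn's `train_test_split` accepts a `stratify` argument for this purpose. The idea can be sketched with the standard library alone; the helper `stratified_split` is hypothetical, and the labels are synthetic, built from the class counts printed earlier:

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with roughly equal class shares in both."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Synthetic labels mirroring the class counts printed earlier
labels = [1] * 1654 + [2] * 295 + [3] * 176
train_idx, test_idx = stratified_split(labels)

print(Counter(labels[i] for i in test_idx))
```

In the notebook itself, the equivalent change would be passing `stratify = Y` to `train_test_split`; the results reported below were obtained without it.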

# Step 3: Building the model
* We start with an SVC with an RBF kernel and use GridSearchCV with 5-fold cross-validation to tune C and gamma.
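
The cost of the search depends on the grid size times the number of folds. A small sketch of that arithmetic, with the grid values copied from the cell below (the variable names are illustrative):

```python
from itertools import product

# Grid values copied from the grid-search cell
C_values = (1, 10, 100, 1e3, 1e4, 1e5, 1e6)
gamma_values = (1e-08, 1e-07, 1e-06, 1e-05, 1e-04)
n_folds = 5

combinations = list(product(C_values, gamma_values))
n_fits = len(combinations) * n_folds

print(len(combinations))  # 35 parameter combinations
print(n_fits)             # 175 cross-validation fits
```

GridSearchCV also refits the best combination once on the full training set by default, so the search trains 176 models in total.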

In [24]:
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

svc = SVC(kernel = 'rbf')
parameters = {'C': (1, 10, 100, 1e3, 1e4, 1e5, 1e6),
              'gamma': (1e-08, 1e-07, 1e-06, 1e-05, 1e-04)}
grid_search = GridSearchCV(svc, parameters, n_jobs = -1, cv = 5)
grid_search.fit(X_train, Y_train)

print(grid_search.best_params_)
print(grid_search.best_score_)


{'C': 1000000.0, 'gamma': 1e-07}
0.968235294117647
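
The selected C (1e6) is the largest value in the grid, which can be a sign that the optimum lies outside the searched range. A small check for this follows, with the grid and best parameters copied from above; the helper name `params_on_grid_edge` is hypothetical:

```python
def params_on_grid_edge(best_params, grid):
    """Return names of parameters whose best value is the min or max of its grid."""
    on_edge = []
    for name, best in best_params.items():
        values = grid[name]
        if best in (min(values), max(values)):
            on_edge.append(name)
    return on_edge

grid = {'C': (1, 10, 100, 1e3, 1e4, 1e5, 1e6),
        'gamma': (1e-08, 1e-07, 1e-06, 1e-05, 1e-04)}
best_params = {'C': 1000000.0, 'gamma': 1e-07}

edge_params = params_on_grid_edge(best_params, grid)
print(edge_params)  # ['C']
```

When a parameter is reported, widening the grid in that direction may be worth trying; whether it would improve the score here is untested.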


# Step 4: Predicting using the best model

In [25]:
from sklearn.metrics import classification_report
svc_best = grid_search.best_estimator_

# Getting the accuracy
accuracy = svc_best.score(X_test, Y_test)
print(f"The accuracy is: {accuracy*100:.2f}%")
# Predicting
prediction = svc_best.predict(X_test)
# Getting the classification report
report = classification_report(Y_test, prediction)
print(report)

The accuracy is: 96.47%
              precision    recall  f1-score   support

         1.0       0.97      0.99      0.98       324
         2.0       0.95      0.91      0.93        65
         3.0       0.97      0.81      0.88        36

    accuracy                           0.96       425
   macro avg       0.96      0.90      0.93       425
weighted avg       0.96      0.96      0.96       425
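
The averages hide the weakest number in the report: recall for the pathologic class is 0.81, which corresponds to roughly 7 of the 36 pathologic test cases being missed. The sketch below recomputes the F1 scores and the averaged recalls from the printed per-class values, to show how the summary rows are derived. The inputs are already rounded to two decimals, so the results are approximate, and the names `report_rows` and `f1` are illustrative:

```python
# Per-class (precision, recall, support) as printed in the report above
report_rows = {
    1.0: (0.97, 0.99, 324),
    2.0: (0.95, 0.91, 65),
    3.0: (0.97, 0.81, 36),
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

total_support = sum(s for _, _, s in report_rows.values())

for label, (p, r, support) in report_rows.items():
    print(f"class {label:.0f}: f1 = {f1(p, r):.2f}")

# Macro average weights every class equally; weighted average uses support
macro_recall = sum(r for _, r, _ in report_rows.values()) / len(report_rows)
weighted_recall = sum(r * s for _, r, s in report_rows.values()) / total_support
print(f"macro recall = {macro_recall:.2f}, weighted recall = {weighted_recall:.2f}")
```

The macro average is lower than the weighted one because the weighted average is dominated by the large normal class, while the macro average gives the pathologic class equal weight.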

