# Fetal Health Classifier

### 1. Problem Statement

- This project examines how the health of a fetus is affected by various parameters from cardiotocogram (CTG) results.
- We classify fetal health into 3 classes: 1-Normal, 2-Suspect, and 3-Pathological.

### 2. Data Collection

- Dataset source: [Kaggle, Fetal Health Classification](https://www.kaggle.com/datasets/andrewmvd/fetal-health-classification)
- The data consists of 22 columns and 2126 rows.

### 2.1 Import Data and Required Packages

#### Importing packages

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

#### Import CSV Data as Pandas DataFrame

In [2]:
df = pd.read_csv('../data/fetal_health.csv')

#### Top 5 rows of data

In [3]:
df.head()

|  | baseline_value | accelerations | fetal_movement | uterine_contractions | light_decelerations | severe_decelerations | prolongued_decelerations | abnormal_short_term_variability | mean_value_of_short_term_variability | percentage_of_time_with_abnormal_long_term_variability | ... | histogram_min | histogram_max | histogram_number_of_peaks | histogram_number_of_zeroes | histogram_mode | histogram_mean | histogram_median | histogram_variance | histogram_tendency | fetal_health |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 120.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 73.0 | 0.5 | 43.0 | ... | 62.0 | 126.0 | 2.0 | 0.0 | 120.0 | 137.0 | 121.0 | 73.0 | 1.0 | 2.0 |
| 1 | 132.0 | 0.006 | 0.0 | 0.006 | 0.003 | 0.0 | 0.0 | 17.0 | 2.1 | 0.0 | ... | 68.0 | 198.0 | 6.0 | 1.0 | 141.0 | 136.0 | 140.0 | 12.0 | 0.0 | 1.0 |
| 2 | 133.0 | 0.003 | 0.0 | 0.008 | 0.003 | 0.0 | 0.0 | 16.0 | 2.1 | 0.0 | ... | 68.0 | 198.0 | 5.0 | 1.0 | 141.0 | 135.0 | 138.0 | 13.0 | 0.0 | 1.0 |
| 3 | 134.0 | 0.003 | 0.0 | 0.008 | 0.003 | 0.0 | 0.0 | 16.0 | 2.4 | 0.0 | ... | 53.0 | 170.0 | 11.0 | 0.0 | 137.0 | 134.0 | 137.0 | 13.0 | 1.0 | 1.0 |
| 4 | 132.0 | 0.007 | 0.0 | 0.008 | 0.0 | 0.0 | 0.0 | 16.0 | 2.4 | 0.0 | ... | 53.0 | 170.0 | 9.0 | 0.0 | 137.0 | 136.0 | 138.0 | 11.0 | 1.0 | 1.0 |


#### Shape of Dataset

In [4]:
df.shape

(2126, 22)

### 3. Data Checks to Perform

- Check missing values
- Check for duplicates
- Check data type
- Check unique values of each column
- Check statistics of data set
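The checks listed above can be bundled into one helper so they are easy to rerun after each cleaning step. This is a minimal sketch, not part of the original notebook: the function name `run_data_checks` and the two-column `toy` frame (with made-up values) are hypothetical stand-ins for the real `df`.

```python
import pandas as pd


def run_data_checks(frame: pd.DataFrame) -> dict:
    """Collect the basic data checks into a single dictionary."""
    return {
        "missing_per_column": frame.isna().sum(),          # missing values
        "duplicate_rows": int(frame.duplicated().sum()),   # fully duplicated rows
        "dtypes": frame.dtypes,                            # data type per column
        "unique_per_column": frame.nunique(),              # distinct non-null values
        "statistics": frame.describe(),                    # summary statistics
    }


# Toy frame with made-up values, standing in for the real dataset.
toy = pd.DataFrame({
    "baseline_value": [120.0, 132.0, 132.0, None],
    "fetal_health": [2.0, 1.0, 1.0, 3.0],
})

report = run_data_checks(toy)
print(report["missing_per_column"])
print("Duplicate rows:", report["duplicate_rows"])
```

On the real data, the same call would be `run_data_checks(df)`.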

### 3.1 Check Missing Values

In [5]:
df.isna().sum()

baseline_value                                            0
accelerations                                             0
fetal_movement                                            0
uterine_contractions                                      0
light_decelerations                                       0
severe_decelerations                                      0
prolongued_decelerations                                  0
abnormal_short_term_variability                           0
mean_value_of_short_term_variability                      0
percentage_of_time_with_abnormal_long_term_variability    0
mean_value_of_long_term_variability                       0
histogram_width                                           0
histogram_min                                             0
histogram_max                                             0
histogram_number_of_peaks                                 0
histogram_number_of_zeroes                                0
histogram_mode                          

There are no missing values in the dataset.

### 3.2 Check Duplicates

In [6]:
df.duplicated().sum()

np.int64(13)

There are 13 duplicate rows.

#### Removing Duplicates

In [7]:
df = df.drop_duplicates()
df.duplicated().sum()

np.int64(0)

No more duplicate rows in the dataset.
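Note that `drop_duplicates()` keeps the original row labels, so the index has gaps after the removal. If a contiguous index is preferred, one option is to reset it afterwards. The sketch below uses a hypothetical `toy` frame with made-up values; whether to reset the index on the real `df` is a design choice the notebook does not make.

```python
import pandas as pd

# Toy frame with one exact duplicate row (values are made up).
toy = pd.DataFrame({
    "a": [1.0, 2.0, 2.0, 3.0],
    "b": [0.5, 0.7, 0.7, 0.9],
})

rows_before = len(toy)

# keep="first" (the default) retains the first occurrence of each duplicate.
# reset_index(drop=True) renumbers the rows 0..n-1 and discards the old labels.
deduped = toy.drop_duplicates(keep="first").reset_index(drop=True)

rows_removed = rows_before - len(deduped)
print(f"Removed {rows_removed} duplicate row(s); {len(deduped)} remain.")
```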

### 3.3 Check Data Types

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2113 entries, 0 to 2125
Data columns (total 22 columns):
 #   Column                                                  Non-Null Count  Dtype  
---  ------                                                  --------------  -----  
 0   baseline_value                                          2113 non-null   float64
 1   accelerations                                           2113 non-null   float64
 2   fetal_movement                                          2113 non-null   float64
 3   uterine_contractions                                    2113 non-null   float64
 4   light_decelerations                                     2113 non-null   float64
 5   severe_decelerations                                    2113 non-null   float64
 6   prolongued_decelerations                                2113 non-null   float64
 7   abnormal_short_term_variability                         2113 non-null   float64
 8   mean_value_of_short_term_variability       

All columns have the float64 data type.
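Rather than reading the `df.info()` listing by eye, the same conclusion can be checked programmatically. This is a small sketch on a hypothetical `toy` frame with made-up values; `pd.api.types.is_float_dtype` is a standard pandas helper.

```python
import pandas as pd

# Toy frame standing in for the real dataset (values are made up).
toy = pd.DataFrame({
    "baseline_value": [120.0, 132.0],
    "fetal_health": [2.0, 1.0],
})

# Count how many columns have each dtype.
dtype_counts = toy.dtypes.value_counts()
print(dtype_counts)

# True only if every column is a floating-point type.
all_float = all(pd.api.types.is_float_dtype(dtype) for dtype in toy.dtypes)
print("All columns are float:", all_float)
```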

### 3.4 Check Number of Unique Values

In [9]:
df.nunique()

baseline_value                                             48
accelerations                                              20
fetal_movement                                            102
uterine_contractions                                       16
light_decelerations                                        16
severe_decelerations                                        2
prolongued_decelerations                                    6
abnormal_short_term_variability                            75
mean_value_of_short_term_variability                       57
percentage_of_time_with_abnormal_long_term_variability     87
mean_value_of_long_term_variability                       249
histogram_width                                           154
histogram_min                                             109
histogram_max                                              86
histogram_number_of_peaks                                  18
histogram_number_of_zeroes                                  9
histogra

### 3.5 Check Statistics of Dataset

In [10]:
df.describe()

|  | baseline_value | accelerations | fetal_movement | uterine_contractions | light_decelerations | severe_decelerations | prolongued_decelerations | abnormal_short_term_variability | mean_value_of_short_term_variability | percentage_of_time_with_abnormal_long_term_variability | ... | histogram_min | histogram_max | histogram_number_of_peaks | histogram_number_of_zeroes | histogram_mode | histogram_mean | histogram_median | histogram_variance | histogram_tendency | fetal_health |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2113.0 | 2113.0 | 2113.0 | 2113.0 | 2113.0 | 2113.0 | 2113.0 | 2113.0 | 2113.0 | 2113.0 | ... | 2113.0 | 2113.0 | 2113.0 | 2113.0 | 2113.0 | 2113.0 | 2113.0 | 2113.0 | 2113.0 | 2113.0 |
| mean | 133.30478 | 0.003188 | 0.009517 | 0.004387 | 0.001901 | 3e-06 | 0.000159 | 46.993848 | 1.335021 | 9.795078 | ... | 93.5646 | 164.099858 | 4.077142 | 0.325603 | 137.45433 | 134.599621 | 138.089446 | 18.907241 | 0.318504 | 1.303833 |
| std | 9.837451 | 0.003871 | 0.046804 | 0.002941 | 0.002966 | 5.7e-05 | 0.000592 | 17.177782 | 0.884368 | 18.337073 | ... | 29.562269 | 17.945175 | 2.951664 | 0.707771 | 16.402026 | 15.610422 | 14.478957 | 29.038766 | 0.611075 | 0.614279 |
| min | 106.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 12.0 | 0.2 | 0.0 | ... | 50.0 | 122.0 | 0.0 | 0.0 | 60.0 | 73.0 | 77.0 | 0.0 | -1.0 | 1.0 |
| 25% | 126.0 | 0.0 | 0.0 | 0.002 | 0.0 | 0.0 | 0.0 | 32.0 | 0.7 | 0.0 | ... | 67.0 | 152.0 | 2.0 | 0.0 | 129.0 | 125.0 | 129.0 | 2.0 | 0.0 | 1.0 |
| 50% | 133.0 | 0.002 | 0.0 | 0.005 | 0.0 | 0.0 | 0.0 | 49.0 | 1.2 | 0.0 | ... | 93.0 | 162.0 | 4.0 | 0.0 | 139.0 | 136.0 | 139.0 | 7.0 | 0.0 | 1.0 |
| 75% | 140.0 | 0.006 | 0.003 | 0.007 | 0.003 | 0.0 | 0.0 | 61.0 | 1.7 | 11.0 | ... | 120.0 | 174.0 | 6.0 | 0.0 | 148.0 | 145.0 | 148.0 | 24.0 | 1.0 | 1.0 |
| max | 160.0 | 0.019 | 0.481 | 0.015 | 0.015 | 0.001 | 0.005 | 87.0 | 7.0 | 91.0 | ... | 159.0 | 238.0 | 18.0 | 10.0 | 187.0 | 182.0 | 186.0 | 269.0 | 1.0 | 3.0 |


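**Insight:** `fetal_health` is stored as a float (values from 1.0 to 3.0), but it represents class labels, so it should be treated as a categorical column.

#### Converting `fetal_health` to categorical string labels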

In [11]:
df['fetal_health'].unique()

array([2., 1., 3.])

In [12]:
mapping = {1.0: 'Normal', 2.0: 'Suspect', 3.0: 'Pathological'}
df['fetal_health'] = df['fetal_health'].map(mapping)
df['fetal_health']

0       Suspect
1        Normal
2        Normal
3        Normal
4        Normal
         ...   
2121    Suspect
2122    Suspect
2123    Suspect
2124    Suspect
2125     Normal
Name: fetal_health, Length: 2113, dtype: object

In [13]:
df['fetal_health'].unique()

array(['Suspect', 'Normal', 'Pathological'], dtype=object)

`fetal_health` now holds string class labels (stored with the `object` dtype).
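The mapped column has the `object` dtype, which pandas treats as plain strings. If an explicit categorical dtype with a fixed class order is wanted, one possible approach is `pd.Categorical`. The sketch below is an optional variation, not the notebook's method: the `codes` series holds made-up values, and the ordering Normal < Suspect < Pathological is an assumption based on the class numbering 1 to 3.

```python
import pandas as pd

mapping = {1.0: "Normal", 2.0: "Suspect", 3.0: "Pathological"}

# Toy label column with made-up values, standing in for df['fetal_health'].
codes = pd.Series([2.0, 1.0, 3.0, 1.0], name="fetal_health")

labels = codes.map(mapping)

# Any code missing from the mapping becomes NaN, so count those first.
unmapped = int(labels.isna().sum())

# Assumed ordering, following the numeric codes 1 < 2 < 3.
order = ["Normal", "Suspect", "Pathological"]
health = pd.Series(
    pd.Categorical(labels, categories=order, ordered=True),
    name="fetal_health",
)

print(health.dtype)
print("Unmapped values:", unmapped)
```

An ordered categorical keeps plots and group-by output in the Normal, Suspect, Pathological order instead of alphabetical order.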

### 4. Exploring Data and Visualization