# **HYPOTHYROIDISM DIAGNOSIS**

## **Objective**
1. To create an algorithm that diagnoses patients with hypothyroid-like symptoms.
2. To build a classification model that achieves 99% accuracy.

## **Context**

## **Experimental Design**

## **Loading Libraries**

In [178]:
# Reading libraries
import pandas as pd
import numpy as np
from datetime import date, time
# Plotting libraries
import matplotlib.pyplot as plt
import seaborn as sns
# Statistical libraries
# Machine learning libraries
# Metrics

## **Loading and previewing the datasets**

In [179]:
hypothyroiddf= pd.read_csv('hypothyroid.csv')

In [180]:
hypothyroiddf.head()

Unnamed: 0,status,age,sex,on_thyroxine,query_on_thyroxine,on_antithyroid_medication,thyroid_surgery,query_hypothyroid,query_hyperthyroid,pregnant,...,T3_measured,T3,TT4_measured,TT4,T4U_measured,T4U,FTI_measured,FTI,TBG_measured,TBG
0,hypothyroid,72,M,f,f,f,f,f,f,f,...,y,0.6,y,15,y,1.48,y,10,n,?
1,hypothyroid,15,F,t,f,f,f,f,f,f,...,y,1.7,y,19,y,1.13,y,17,n,?
2,hypothyroid,24,M,f,f,f,f,f,f,f,...,y,0.2,y,4,y,1.0,y,0,n,?
3,hypothyroid,24,F,f,f,f,f,f,f,f,...,y,0.4,y,6,y,1.04,y,6,n,?
4,hypothyroid,77,M,f,f,f,f,f,f,f,...,y,1.2,y,57,y,1.28,y,44,n,?


In [181]:
hypothyroiddf.tail()

Unnamed: 0,status,age,sex,on_thyroxine,query_on_thyroxine,on_antithyroid_medication,thyroid_surgery,query_hypothyroid,query_hyperthyroid,pregnant,...,T3_measured,T3,TT4_measured,TT4,T4U_measured,T4U,FTI_measured,FTI,TBG_measured,TBG
3158,negative,58,F,f,f,f,f,f,f,f,...,y,1.7,y,86,y,0.91,y,95,n,?
3159,negative,29,F,f,f,f,f,f,f,f,...,y,1.8,y,99,y,1.01,y,98,n,?
3160,negative,77,M,f,f,f,f,f,f,f,...,y,0.6,y,71,y,0.68,y,104,n,?
3161,negative,74,F,f,f,f,f,f,f,f,...,y,0.1,y,65,y,0.48,y,137,n,?
3162,negative,56,F,t,f,f,f,f,f,f,...,y,1.8,y,139,y,0.97,y,143,n,?


In [182]:
hypothyroiddf.shape
# The dataset has 3163 records and 26 columns (25 features plus the `status` target)

(3163, 26)

In [183]:
hypothyroiddf.dtypes

status                       object
age                          object
sex                          object
on_thyroxine                 object
query_on_thyroxine           object
on_antithyroid_medication    object
thyroid_surgery              object
query_hypothyroid            object
query_hyperthyroid           object
pregnant                     object
sick                         object
tumor                        object
lithium                      object
goitre                       object
TSH_measured                 object
TSH                          object
T3_measured                  object
T3                           object
TT4_measured                 object
TT4                          object
T4U_measured                 object
T4U                          object
FTI_measured                 object
FTI                          object
TBG_measured                 object
TBG                          object
dtype: object

In [184]:
hypothyroiddf.columns

Index(['status', 'age', 'sex', 'on_thyroxine', 'query_on_thyroxine',
       'on_antithyroid_medication', 'thyroid_surgery', 'query_hypothyroid',
       'query_hyperthyroid', 'pregnant', 'sick', 'tumor', 'lithium', 'goitre',
       'TSH_measured', 'TSH', 'T3_measured', 'T3', 'TT4_measured', 'TT4',
       'T4U_measured', 'T4U', 'FTI_measured', 'FTI', 'TBG_measured', 'TBG'],
      dtype='object')
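
Every column is reported as `object`, most likely because missing entries are stored as the string `?`, which makes pandas parse the numeric columns as text. The following is a minimal sketch, on a small made-up sample rather than the real `hypothyroid.csv`, of how the `na_values` argument of `pd.read_csv` could handle the placeholder at load time. The sample rows and the name `sample_df` are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical three-row sample that mimics the '?' placeholder seen in the data
sample_csv = io.StringIO(
    "status,age,sex,TSH,TBG\n"
    "hypothyroid,72,M,30,?\n"
    "negative,?,F,1.2,?\n"
    "negative,58,?,0.4,28\n"
)

# Treat '?' as missing while parsing, so numeric columns get a numeric dtype
sample_df = pd.read_csv(sample_csv, na_values="?")

print(sample_df.dtypes)
print(sample_df.isna().sum())
```

With this approach the later replacement of `?` and the manual `astype('float')` conversion may become unnecessary, although that depends on whether other placeholders also occur in the file.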

## **Data Cleaning**

### Duplicates

In [185]:
# Checking for duplicated records, dropping them in place,
# and returning the number of duplicates that remain (expected: 0)
def duplicates(data):
  dup = data.duplicated().sum()
  if dup > 0:
    data.drop_duplicates(inplace=True)
  return data.duplicated().sum()

In [186]:
duplicates(hypothyroiddf)
# Duplicated records have been dealt with

0
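
The function above modifies the DataFrame in place and only reports that no duplicates remain. As a hedged alternative sketch, the variant below returns a de-duplicated copy together with the number of rows removed, which can be useful for documenting the cleaning step. The helper name `drop_duplicates_report` and the toy data are hypothetical.

```python
import pandas as pd

def drop_duplicates_report(data):
    """Return a de-duplicated copy and the number of rows that were removed."""
    n_before = len(data)
    deduped = data.drop_duplicates().reset_index(drop=True)
    return deduped, n_before - len(deduped)

# Toy frame with one exact duplicate row (illustrative values only)
toy = pd.DataFrame({
    "status": ["negative", "negative", "hypothyroid"],
    "age": [58, 58, 72],
})

clean, n_removed = drop_duplicates_report(toy)
print(n_removed)   # number of duplicate rows dropped
print(len(toy))    # the original frame is left untouched
```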

### Missing Values

In [187]:
# Replacing placeholder strings with NaN, then reporting
# the percentage of missing values in each column
def missing_values(data):
  # np.nan is used because the np.NaN alias was removed in NumPy 2.0
  data.replace(['?', '!', ' '], np.nan, inplace=True)
  miss = data.isnull().sum() / len(data) * 100
  missdf = pd.DataFrame(miss)
  return missdf.transpose()

In [188]:
missing_values(hypothyroiddf)
# The TBG column has 91.9% missing values. 
# Therefore the TBG column will be dropped.

Unnamed: 0,status,age,sex,on_thyroxine,query_on_thyroxine,on_antithyroid_medication,thyroid_surgery,query_hypothyroid,query_hyperthyroid,pregnant,...,T3_measured,T3,TT4_measured,TT4,T4U_measured,T4U,FTI_measured,FTI,TBG_measured,TBG
0,0.0,14.19313,2.365522,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,22.034997,0.0,7.777058,0.0,7.744653,0.0,7.712249,0.0,91.866494
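
The decision to drop `TBG` follows from its very high share of missing values. A minimal sketch of how such a rule could be applied programmatically is shown below. The 50% cutoff and the toy data are assumptions chosen for illustration and are not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame: 'TBG' is mostly missing, 'age' is mostly present
toy = pd.DataFrame({
    "age": [72, np.nan, 58, 29],
    "TBG": [np.nan, np.nan, np.nan, 28],
})

# Percentage of missing values per column
missing_pct = toy.isna().mean() * 100

# Assumed cutoff: drop any column with more than 50% missing values
threshold = 50
cols_to_drop = missing_pct[missing_pct > threshold].index.tolist()
reduced = toy.drop(columns=cols_to_drop)

print(missing_pct)
print(cols_to_drop)
```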


In [189]:
# Dropping the TBG column
hypothyroiddf.drop('TBG', axis=1, inplace=True)

In [190]:
# Correcting data types from object dtype to float dtype
columns= ['age','T3','TT4','T4U','FTI']
for x in columns:
  hypothyroiddf[x]=hypothyroiddf[x].astype('float')
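
`astype('float')` works here only because the `?` placeholders were already replaced with NaN; any leftover non-numeric string would raise an error. As a sketch of a more tolerant option, `pd.to_numeric` with `errors="coerce"` turns unparseable entries into NaN instead. The toy data and the name `numeric_cols` are illustrative assumptions.

```python
import pandas as pd

# Toy frame with numeric values stored as text and a stray '?' placeholder
toy = pd.DataFrame({
    "age": ["72", "?", "58"],
    "T3": ["0.6", "1.7", "?"],
})

numeric_cols = ["age", "T3"]

# Convert each column to a numeric dtype; unparseable entries become NaN
toy[numeric_cols] = toy[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(toy.dtypes)
print(toy)
```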

In [191]:
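# Finding the means of the 'age','T3','TT4','T4U','FTI'
# columns for each status (hypothyroid or negative).
# Note: `means` is overwritten on every pass of the loop,
# so only the result for the last column (FTI) is displayed.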
col= ['age','T3','TT4','T4U','FTI']
for m in col:
  means=hypothyroiddf.groupby('status')[m].mean()
means

status
hypothyroid     33.240426
negative       119.773181
Name: FTI, dtype: float64
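
Only the FTI means appear above because the loop keeps just the last result. A minimal sketch of getting the per-status means of several columns in one table is to pass a list of columns to `groupby(...)[...]`. The toy values below are made up for illustration and are not the dataset's actual figures.

```python
import pandas as pd

# Toy frame with two status groups (illustrative values only)
toy = pd.DataFrame({
    "status": ["hypothyroid", "hypothyroid", "negative", "negative"],
    "age": [72.0, 24.0, 58.0, 30.0],
    "FTI": [10.0, 6.0, 95.0, 105.0],
})

# One row per status, one column per selected feature
group_means = toy.groupby("status")[["age", "FTI"]].mean()
print(group_means)
```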

In [192]:
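# Filling missing values in the numeric columns with the column mean
columns = ['age','T3','TT4','T4U','FTI']
def fill_with_mean(data):
  for i in columns:
    # mean() must be called; assigning back avoids chained in-place fills
    data[i] = data[i].fillna(data[i].mean())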
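
Since the per-status means differ considerably (FTI is about 33 for `hypothyroid` and about 120 for `negative`), one possible variation is to fill each gap with the mean of its own status group rather than the overall mean. The sketch below shows this with `groupby(...).transform("mean")` on toy data. It is an assumption about a possible follow-up, not the notebook's method, and because it uses the target label it would only be suitable where that label is known at imputation time.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing FTI value in each status group
toy = pd.DataFrame({
    "status": ["hypothyroid", "hypothyroid", "negative", "negative"],
    "FTI": [10.0, np.nan, 100.0, np.nan],
})

# Per-row group mean, aligned with the original index
group_mean = toy.groupby("status")["FTI"].transform("mean")

# Fill each missing value with the mean of its own status group
toy["FTI"] = toy["FTI"].fillna(group_mean)
print(toy)
```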