# Chronic Kidney Disease Prediction using Machine Learning


## Problem Statement

Chronic Kidney Disease (CKD) is a serious condition that can progress to kidney failure if it is not detected early.
The objective of this project is to build a machine learning model that predicts whether a patient has CKD from clinical and laboratory features.

## Dataset Information

- Source: UCI Machine Learning Repository
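- Number of Instances: 400 (the raw CSV used here loads as 401 rows; see the `df.shape` output below)
- Number of Features: 24 (25 columns including the target)
- Target Variable: class (ckd / notckd)
- Contains missing values in every column, including the target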

In [30]:
import pandas as pd

In [31]:
df = pd.read_csv("./data/ckd_raw.csv")

df.head()

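|  | age | bp | sg | al | su | rbc | pc | pcc | ba | bgr | ... | pcv | wc | rc | htn | dm | cad | appet | pe | ane | class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 48.0 | 80.0 | 1.02 | 1.0 | 0.0 | NaN | normal | notpresent | notpresent | 121.0 | ... | 44.0 | 7800.0 | 5.2 | yes | yes | no | good | no | no | ckd |
| 1 | 7.0 | 50.0 | 1.02 | 4.0 | 0.0 | NaN | normal | notpresent | notpresent | NaN | ... | 38.0 | 6000.0 | NaN | no | no | no | good | no | no | ckd |
| 2 | 62.0 | 80.0 | 1.01 | 2.0 | 3.0 | normal | normal | notpresent | notpresent | 423.0 | ... | 31.0 | 7500.0 | NaN | no | yes | no | poor | no | yes | ckd |
| 3 | 48.0 | 70.0 | 1.005 | 4.0 | 0.0 | normal | abnormal | present | notpresent | 117.0 | ... | 32.0 | 6700.0 | 3.9 | yes | no | no | poor | yes | yes | ckd |
| 4 | 51.0 | 80.0 | 1.01 | 2.0 | 0.0 | normal | normal | notpresent | notpresent | 106.0 | ... | 35.0 | 7300.0 | 4.6 | no | no | no | good | no | no | ckd |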


In [32]:
df.shape

(401, 25)

In [33]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 401 entries, 0 to 400
Data columns (total 25 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   age     390 non-null    float64
 1   bp      387 non-null    float64
 2   sg      352 non-null    float64
 3   al      353 non-null    float64
 4   su      350 non-null    float64
 5   rbc     247 non-null    object 
 6   pc      334 non-null    object 
 7   pcc     395 non-null    object 
 8   ba      395 non-null    object 
 9   bgr     355 non-null    float64
 10  bu      380 non-null    float64
 11  sc      382 non-null    float64
 12  sod     312 non-null    float64
 13  pot     311 non-null    float64
 14  hemo    347 non-null    float64
 15  pcv     328 non-null    float64
 16  wc      293 non-null    float64
 17  rc      268 non-null    float64
 18  htn     397 non-null    object 
 19  dm      397 non-null    object 
 20  cad     397 non-null    object 
 21  appet   398 non-null    object 
 22  pe

In [34]:
df.isnull().sum()

age       11
bp        14
sg        49
al        48
su        51
rbc      154
pc        67
pcc        6
ba         6
bgr       46
bu        21
sc        19
sod       89
pot       90
hemo      54
pcv       73
wc       108
rc       133
htn        4
dm         4
cad        4
appet      3
pe         3
ane        3
class      2
dtype: int64
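
The raw counts above are easier to compare as a share of the rows; for example, `rbc` is missing in 154 of 401 rows, roughly 38%. A minimal sketch of such a summary follows. The helper name `missing_summary` and the toy frame are assumptions made for illustration and are not part of the original notebook.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, largest first.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)


# Toy frame standing in for the CKD data (values are illustrative only).
toy = pd.DataFrame({
    "age": [48.0, 7.0, None, 51.0],
    "rbc": [None, None, "normal", None],
    "class": ["ckd", "ckd", "ckd", "notckd"],
})

summary = missing_summary(toy)
print(summary)
```

On the real data the same helper could be called as `missing_summary(df)`.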

## Target Variable

The target variable for this classification problem is `class`.
It records whether the patient has CKD (`ckd`) or not (`notckd`).

In [35]:
df['class'].value_counts()

class
ckd       250
notckd    149
Name: count, dtype: int64
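
The counts above sum to 399, which matches the 2 missing `class` values reported by `df.isnull().sum()`. The classes are also imbalanced, so it helps to know the share of each label among the labelled rows. The sketch below rebuilds a label series from the reported counts rather than reading the CSV; the variable names are assumptions made for illustration.

```python
import pandas as pd

# Labels rebuilt from the counts reported above:
# 250 ckd, 149 notckd, and 2 rows with no label.
labels = pd.Series(["ckd"] * 250 + ["notckd"] * 149 + [None] * 2, name="class")

# Rows without a label cannot be used for supervised training or scoring.
labelled = labels.dropna()

# Share of each class among the labelled rows.
shares = labelled.value_counts(normalize=True).round(3)
print(shares)
```

With these counts, a model that always predicts `ckd` would already be right on about 62.7% of the labelled rows, which is a useful floor when judging accuracy later.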

## Type of Machine Learning Problem

This is a supervised learning problem, because every training example carries a known label.
It is a binary classification task, since the target variable has two categories:
- ckd
- notckd
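
Most classifiers expect a numeric target, so the two categories are usually mapped to 0 and 1 before training. The following is a minimal sketch under stated assumptions: the helper `encode_target`, the choice of `ckd` as the positive class (1), and the toy frame are all hypothetical and do not come from the original notebook.

```python
import pandas as pd

# Assumption of this sketch: "ckd" is treated as the positive class.
LABEL_MAP = {"ckd": 1, "notckd": 0}


def encode_target(frame: pd.DataFrame, target: str = "class") -> pd.DataFrame:
    """Drop unlabelled rows and map the string labels to 0/1.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    labelled = frame.dropna(subset=[target]).copy()
    # Strip stray whitespace defensively before mapping the labels.
    encoded = labelled[target].str.strip().map(LABEL_MAP)
    if encoded.isnull().any():
        raise ValueError("unexpected label found in target column")
    labelled[target] = encoded.astype(int)
    return labelled


# Toy frame standing in for the CKD data (values are illustrative only).
toy = pd.DataFrame({
    "age": [48.0, 62.0, 35.0, 51.0],
    "class": ["ckd", "notckd", None, "ckd"],
})

encoded = encode_target(toy)
print(encoded)
```

Raising on unexpected labels, rather than silently leaving them as missing, makes data-entry problems visible before any model is trained.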