# 🩺 Chronic Kidney Disease Classification using Supervised Machine Learning

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/RansiluRanasinghe/CKD-Classification-ML/blob/main/notebook.ipynb)

---

## 📌 Notebook Introduction

This Google Colab notebook demonstrates an **end-to-end supervised machine learning pipeline** for predicting **Chronic Kidney Disease (CKD)** using clinical and laboratory data.

The notebook focuses on the practical application of machine learning concepts, including:

1. ✅ **Data understanding and cleaning**
2. ✅ **Handling missing and inconsistent medical data**
3. ✅ **Feature preprocessing and scaling**
4. ✅ **Dimensionality reduction** using Principal Component Analysis (PCA)
5. ✅ **Training and evaluating** a supervised classification model
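The five steps above map naturally onto a scikit-learn `Pipeline`, which is one common way to chain them. The sketch below is a minimal illustration, not this notebook's actual model: it uses synthetic data from `make_classification` instead of the CKD table, and the choices of median imputation, `n_components=5`, and `LogisticRegression` are assumptions made for demonstration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the CKD features (NOT the real dataset).
X, y = make_classification(n_samples=400, n_features=12, n_informative=6, random_state=42)

# Blank out roughly 10% of the values to mimic missing lab results.
rng = np.random.default_rng(42)
X[rng.random(X.shape) < 0.10] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Impute -> scale -> reduce -> classify, fitted as a single estimator.
model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {test_accuracy:.3f}")
```

Because the imputer and scaler sit inside the pipeline, their medians, means, and standard deviations are learned from the training split only, so no test-set statistics leak into preprocessing.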

---

## 🎯 Key Focus: PCA with and without Scaling

Special attention is given to **how PCA behaves with and without feature scaling**, highlighting why scaling matters for medical features recorded in different units and over very different ranges.

This comparison demonstrates:
- How scaling changes the share of variance captured by each principal component
- Why standardization matters for medical data measured in heterogeneous units
- How large-magnitude features can dominate the principal components when the data is left unscaled
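To make the comparison concrete: in this dataset, specific gravity (`sg`) takes values around 1.01 to 1.02, while white blood cell count (`wc`) runs into the thousands. PCA picks directions of maximum variance, so without scaling the first component is dominated by whichever column has the largest raw variance. The sketch below illustrates this on synthetic data with hypothetical distributions (not the CKD table), computing PCA directly from the covariance matrix with NumPy and comparing raw features against z-score standardized ones. The `al` column and its correlation with `sg` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
# Hypothetical synthetic columns loosely shaped like CKD features (not real data).
sg = rng.normal(1.015, 0.005, n)    # specific-gravity-like: tiny variance
wc = rng.normal(8000, 2500, n)      # white-cell-count-like: huge variance
al = 2.0 + 300 * (1.02 - sg) + rng.normal(0, 0.5, n)  # made-up correlation with sg
X = np.column_stack([sg, wc, al])

def pca_explained_variance_ratio(data):
    """Eigen-decompose the covariance matrix; return variance ratios and loadings, largest first."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs

ratio_raw, vecs_raw = pca_explained_variance_ratio(X)

X_std = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score standardization per column
ratio_std, vecs_std = pca_explained_variance_ratio(X_std)

print("explained variance ratio, unscaled:", np.round(ratio_raw, 4))
print("explained variance ratio, scaled:  ", np.round(ratio_std, 4))
print("PC1 loadings (sg, wc, al), unscaled:", np.round(vecs_raw[:, 0], 4))
print("PC1 loadings (sg, wc, al), scaled:  ", np.round(vecs_std[:, 0], 4))
```

Unscaled, PC1 explains essentially all of the variance and points almost entirely along the `wc` axis; after standardization, the correlated `sg`/`al` pair drives PC1 instead and `wc` gets its own component.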

---

## 💡 Notebook Objectives

The objective of this notebook is not only to achieve **reliable prediction performance**, but also to demonstrate:

- 📊 A **clear, explainable** ML workflow
- 🔄 **Reproducible** methodology
- 🎓 Alignment with **academic coursework** requirements
- 🏥 Understanding of **real-world medical data** constraints


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
np.random.seed(42)
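`np.random.seed(42)` fixes NumPy's global (legacy) random state, so the sequence of subsequent global draws repeats across runs. Results can still shift if cells are re-run out of order, because each draw advances the shared state. NumPy recommends `np.random.default_rng` for new code; passing an explicit seed to each random step (as scikit-learn's `random_state` argument does) keeps every step reproducible on its own. A minimal sketch, using a hypothetical helper `split_indices`:

```python
import numpy as np

def split_indices(n_samples, test_fraction, seed):
    """Hypothetical helper: return shuffled (train, test) index arrays for a given seed."""
    rng = np.random.default_rng(seed)  # local generator; global NumPy state is untouched
    perm = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return perm[n_test:], perm[:n_test]

train_a, test_a = split_indices(400, 0.25, seed=42)
train_b, test_b = split_indices(400, 0.25, seed=42)

# Same seed gives an identical split, regardless of what ran before.
print(np.array_equal(test_a, test_b))  # True
```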

#### Loading the dataset

In [3]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("mansoordaku/ckdisease")

print("Path to dataset files:", path)

Using Colab cache for faster access to the 'ckdisease' dataset.
Path to dataset files: /kaggle/input/ckdisease


In [4]:
import os

# Reuse the directory returned by kagglehub instead of hard-coding it
dataset_path = path

os.listdir(dataset_path)

['kidney_disease.csv']

In [6]:
# Full path to the CSV file listed above
dataset = os.path.join(dataset_path, "kidney_disease.csv")

df = pd.read_csv(dataset)

display(df.head(10))

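| | id | age | bp | sg | al | su | rbc | pc | pcc | ba | ... | pcv | wc | rc | htn | dm | cad | appet | pe | ane | classification |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 48.0 | 80.0 | 1.02 | 1.0 | 0.0 | NaN | normal | notpresent | notpresent | ... | 44 | 7800.0 | 5.2 | yes | yes | no | good | no | no | ckd |
| 1 | 1 | 7.0 | 50.0 | 1.02 | 4.0 | 0.0 | NaN | normal | notpresent | notpresent | ... | 38 | 6000.0 | NaN | no | no | no | good | no | no | ckd |
| 2 | 2 | 62.0 | 80.0 | 1.01 | 2.0 | 3.0 | normal | normal | notpresent | notpresent | ... | 31 | 7500.0 | NaN | no | yes | no | poor | no | yes | ckd |
| 3 | 3 | 48.0 | 70.0 | 1.005 | 4.0 | 0.0 | normal | abnormal | present | notpresent | ... | 32 | 6700.0 | 3.9 | yes | no | no | poor | yes | yes | ckd |
| 4 | 4 | 51.0 | 80.0 | 1.01 | 2.0 | 0.0 | normal | normal | notpresent | notpresent | ... | 35 | 7300.0 | 4.6 | no | no | no | good | no | no | ckd |
| 5 | 5 | 60.0 | 90.0 | 1.015 | 3.0 | 0.0 | NaN | NaN | notpresent | notpresent | ... | 39 | 7800.0 | 4.4 | yes | yes | no | good | yes | no | ckd |
| 6 | 6 | 68.0 | 70.0 | 1.01 | 0.0 | 0.0 | NaN | normal | notpresent | notpresent | ... | 36 | NaN | NaN | no | no | no | good | no | no | ckd |
| 7 | 7 | 24.0 | NaN | 1.015 | 2.0 | 4.0 | normal | abnormal | notpresent | notpresent | ... | 44 | 6900.0 | 5.0 | no | yes | no | good | yes | no | ckd |
| 8 | 8 | 52.0 | 100.0 | 1.015 | 3.0 | 0.0 | normal | abnormal | present | notpresent | ... | 33 | 9600.0 | 4.0 | yes | yes | no | good | no | yes | ckd |
| 9 | 9 | 53.0 | 90.0 | 1.02 | 2.0 | 0.0 | abnormal | abnormal | present | notpresent | ... | 29 | 12100.0 | 3.7 | yes | yes | no | poor | no | yes | ckd |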
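In the rows above, `pcv` prints as whole numbers while the other numeric columns print with a decimal point, which hints that pandas may have read `pcv` as text; checking `df.dtypes` would confirm this. The sketch below shows one way to clean such columns, assuming (not verified here) that stray whitespace or placeholder tokens such as `?` are the cause. It runs on a small hypothetical DataFrame, not on `df`; the helper name `clean`, the column lists, and the 0/1 encodings are illustrative choices.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame mimicking inconsistencies worth checking for (not the real data).
raw = pd.DataFrame({
    "pcv": ["44", "38", "\t?", " 31"],
    "wc": [7800.0, 6000.0, np.nan, 7500.0],
    "rbc": [np.nan, "normal", "abnormal", "normal"],
    "htn": ["yes", "no", " yes", "no"],
    "classification": ["ckd", "ckd", "ckd\t", "notckd"],
})

def clean(frame, numeric_cols, binary_maps):
    """Hypothetical helper: strip text, coerce numeric columns, encode two-level categories."""
    out = frame.copy()
    # Strip surrounding whitespace from string values; numbers and NaN pass through unchanged.
    for col in out.columns:
        out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
    # Coerce numeric columns; unparseable tokens such as "?" become NaN.
    for col in numeric_cols:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Map two-level categories to 0/1; NaN and unexpected values become NaN.
    for col, mapping in binary_maps.items():
        out[col] = out[col].map(mapping)
    return out

cleaned = clean(
    raw,
    numeric_cols=["pcv", "wc"],
    binary_maps={
        "rbc": {"normal": 0, "abnormal": 1},
        "htn": {"no": 0, "yes": 1},
        "classification": {"notckd": 0, "ckd": 1},
    },
)
print(cleaned.dtypes)
print(cleaned.isna().sum())  # per-column count of missing values after cleaning
```

Running the same `isna().sum()` check on the real `df` after cleaning gives a per-column view of how much imputation each feature would need.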


#### Dataset Analysis