# **PCOS Detection and Classification using Machine Learning**

Author: **Sriraj Thiruchety**

College: **Vasavi College of Engineering, Hyderabad**

Project Title: **PCOS Detection and Classification using Machine Learning**

This project aims to develop an intelligent system for the early **detection** and classification of **Polycystic Ovary Syndrome (PCOS)** using **machine learning algorithms**. The system leverages a structured dataset containing clinical, hormonal, and lifestyle-related parameters collected from women across various age groups.

The workflow begins with data preprocessing: missing value imputation, encoding of categorical variables, feature selection using correlation analysis, and feature scaling to improve model robustness. Several supervised learning models, namely **Logistic Regression, Random Forest, Support Vector Machines (SVM), and XGBoost**, are then trained and evaluated. Model performance is assessed using **accuracy, precision, recall, F1-score, and ROC-AUC**.
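The preprocessing and evaluation steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the data is synthetic, the column names (`follicle_count`, `bmi`, `noise`, `fast_food`), the median imputation strategy, and the 0.1 correlation threshold are all assumptions chosen for the example. XGBoost is left out to keep the sketch dependent on scikit-learn only; `xgboost.XGBClassifier` could be added to the `models` dictionary in the same way.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (NOT the PCOS dataset); all column names are hypothetical.
rng = np.random.default_rng(0)
n = 300
y = rng.integers(0, 2, size=n)
demo = pd.DataFrame({
    "follicle_count": y * 6 + rng.normal(5, 2, size=n),
    "bmi": y * 2 + rng.normal(24, 3, size=n),
    "noise": rng.normal(0, 1, size=n),
    "fast_food": np.where(rng.random(n) < 0.3 + 0.4 * y, "Y", "N"),
})
# Knock out a few values so the imputation step has something to do.
demo.loc[rng.choice(n, size=15, replace=False), "bmi"] = np.nan

# Encode the Y/N categorical column as 1/0.
demo["fast_food"] = demo["fast_food"].map({"N": 0, "Y": 1})

X_train, X_test, y_train, y_test = train_test_split(
    demo, y, test_size=0.25, stratify=y, random_state=0
)

# Correlation-based feature selection, computed on the training split only.
# The 0.1 threshold is an arbitrary choice for this sketch.
target = pd.Series(y_train, index=X_train.index)
corr = X_train.corrwith(target).abs()
selected = corr[corr > 0.1].index.tolist()

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": SVC(probability=True, random_state=0),
}

results = {}
for name, model in models.items():
    # Imputation and scaling live inside the pipeline so they are fit on
    # training data only and then applied unchanged to the test data.
    pipe = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("model", model),
    ])
    pipe.fit(X_train[selected], y_train)
    pred = pipe.predict(X_test[selected])
    proba = pipe.predict_proba(X_test[selected])[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
    }

print(pd.DataFrame(results).T.round(3))
```

Keeping the imputer and scaler inside the `Pipeline` is one way to avoid leaking test-set statistics into training; the same reasoning is why the correlation filter here uses only the training split.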

Hyperparameters are tuned with **GridSearchCV** under cross-validation to improve model generalizability. The best-performing model is then validated with **confusion matrices** and interpreted with SHAP values to understand feature importance and model behavior.
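A minimal sketch of the tuning and validation step might look like the following. It assumes a Random Forest as the model being tuned, a small hypothetical parameter grid, F1 as the selection metric, and synthetic data from `make_classification`; none of these choices are taken from the project itself. SHAP is not shown because it requires the separate `shap` package; the forest's built-in impurity-based `feature_importances_` is used here as a simpler stand-in, which is not equivalent to SHAP values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic binary classification data (NOT the PCOS dataset).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid; a real search would likely cover more values.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

# Stratified 5-fold CV keeps the class ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",
    cv=cv,
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("mean CV F1 of best params:", round(search.best_score_, 3))

# Confusion matrix on the held-out test split.
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, search.predict(X_test))
print(cm)

# Stand-in for SHAP: impurity-based importances of the refit best model.
importances = search.best_estimator_.feature_importances_
print(importances.round(3))
```

The test split is held out from the grid search entirely, so the confusion matrix reflects data that played no part in choosing the hyperparameters.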

The project highlights how machine learning can play a significant role in assisting healthcare professionals with early PCOS diagnosis, potentially **reducing** long-term complications through timely intervention.

# Code and Implementation


In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [6]:
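from google.colab import files
# Upload the workbook through the Colab file picker (Colab-only API).
uploaded = files.upload()
# sheet_name=1 selects the second sheet of the workbook (sheet indices are 0-based).
df = pd.read_excel("PCOS_data_without_infertility.xlsx", sheet_name=1)
df.head(7)  # preview the first 7 rows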


Saving PCOS_data_without_infertility.xlsx to PCOS_data_without_infertility.xlsx


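| | Sl. No | Patient File No. | PCOS (Y/N) | Age (yrs) | Weight (Kg) | Height(Cm) | BMI | Blood Group | Pulse rate(bpm) | RR (breaths/min) | ... | Fast food (Y/N) | Reg.Exercise(Y/N) | BP _Systolic (mmHg) | BP _Diastolic (mmHg) | Follicle No. (L) | Follicle No. (R) | Avg. F size (L) (mm) | Avg. F size (R) (mm) | Endometrium (mm) | Unnamed: 44 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 0 | 28 | 44.6 | 152.0 | 19.3 | 15 | 78 | 22 | ... | 1.0 | 0 | 110 | 80 | 3 | 3 | 18.0 | 18.0 | 8.5 | |
| 1 | 2 | 2 | 0 | 36 | 65.0 | 161.5 | 24.921163 | 15 | 74 | 20 | ... | 0.0 | 0 | 120 | 70 | 3 | 5 | 15.0 | 14.0 | 3.7 | |
| 2 | 3 | 3 | 1 | 33 | 68.8 | 165.0 | 25.270891 | 11 | 72 | 18 | ... | 1.0 | 0 | 120 | 80 | 13 | 15 | 18.0 | 20.0 | 10.0 | |
| 3 | 4 | 4 | 0 | 37 | 65.0 | 148.0 | 29.674945 | 13 | 72 | 20 | ... | 0.0 | 0 | 120 | 70 | 2 | 2 | 15.0 | 14.0 | 7.5 | |
| 4 | 5 | 5 | 0 | 25 | 52.0 | 161.0 | 20.060954 | 11 | 72 | 18 | ... | 0.0 | 0 | 120 | 80 | 3 | 4 | 16.0 | 14.0 | 7.0 | |
| 5 | 6 | 6 | 0 | 36 | 74.1 | 165.0 | 27.217631 | 15 | 78 | 28 | ... | 0.0 | 0 | 110 | 70 | 9 | 6 | 16.0 | 20.0 | 8.0 | |
| 6 | 7 | 7 | 0 | 34 | 64.0 | 156.0 | 26.298488 | 11 | 72 | 18 | ... | 0.0 | 0 | 120 | 80 | 6 | 6 | 15.0 | 16.0 | 6.8 | |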
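The preview shows row-identifier columns (`Sl. No`, `Patient File No.`) and a trailing `Unnamed: 44` column that is empty in the rows displayed. A possible tidy-up step, sketched on a tiny frame that copies a few values from the preview, is shown below. The helper name `tidy` is hypothetical, and whether `Unnamed: 44` is empty across the full dataset is an assumption; the helper only drops columns that contain no values at all.

```python
import numpy as np
import pandas as pd

# Small frame mirroring a few columns and the first three rows of the preview.
sample = pd.DataFrame({
    "Sl. No": [1, 2, 3],
    "Patient File No.": [1, 2, 3],
    "PCOS (Y/N)": [0, 0, 1],
    "Age (yrs)": [28, 36, 33],
    "BP _Systolic (mmHg)": [110, 120, 120],
    "Unnamed: 44": [np.nan, np.nan, np.nan],
})


def tidy(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: normalise headers and drop non-informative columns."""
    out = frame.copy()
    # Strip stray leading/trailing whitespace and collapse repeated spaces.
    out.columns = out.columns.str.strip().str.replace(r"\s+", " ", regex=True)
    # Drop columns that contain no values at all.
    out = out.dropna(axis="columns", how="all")
    # Row identifiers carry no predictive signal; skip silently if absent.
    out = out.drop(columns=["Sl. No", "Patient File No."], errors="ignore")
    return out


clean = tidy(sample)
print(clean.columns.tolist())
print(clean.shape)
```

On the real data the call would be `tidy(df)`; checking `df.isna().sum()` first would show whether any column is only partly empty and needs imputation instead of removal.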
