Here is a summary of the PCOS prediction project I worked on, covering data processing and model training:
-
Data Import and Inspection:
- Imported the dataset and checked basic information such as column names, data types, and missing values.
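A minimal sketch of this step with pandas, assuming the data lives in a CSV file. The inline CSV text is a hypothetical stand-in because the summary does not give the file path, and the target column name `PCOS` is also an assumption.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real dataset; in practice this would be
# pd.read_csv(<path to the PCOS dataset>).
csv_text = """Age,BMI,Stress Levels,Lifestyle Score,PCOS
24,Normal,Low,7,No
31,Overweight,High,4,Yes
28,,Moderate,6,No
35,Obese,High,,Yes
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.columns.tolist())  # column names
print(df.dtypes)            # inferred data type of each column
print(df.isna().sum())      # number of missing values per column
```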
-
Handling Missing Data:
- Identified and handled missing values for key columns by filling NaNs with appropriate methods.
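The summary does not say which fill methods were used. One common choice, assumed in this sketch, is the median for numeric columns and the most frequent value (the mode) for categorical ones; the toy data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a gap in a numeric and a categorical column.
df = pd.DataFrame({
    "Age": [24, 31, np.nan, 35],
    "BMI": ["Normal", "Overweight", None, "Normal"],
})

# Numeric column: fill NaNs with the median, which is robust to outliers.
df["Age"] = df["Age"].fillna(df["Age"].median())

# Categorical column: fill NaNs with the most frequent value (the mode).
df["BMI"] = df["BMI"].fillna(df["BMI"].mode()[0])

print(df.isna().sum().sum())  # total missing values remaining
```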
-
Encoding Categorical Variables:
- Applied One-Hot Encoding to convert categorical variables into binary columns.
- Defined custom mappings from the string values in the BMI and Stress Levels columns to numeric values, then applied them to those two columns.
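A sketch of both encodings with pandas. The BMI and Stress Levels category labels and their integer codes are assumptions, as is the `Diet` column used to illustrate one-hot encoding.

```python
import pandas as pd

df = pd.DataFrame({
    "BMI": ["Normal", "Obese", "Overweight", "Underweight"],
    "Stress Levels": ["Low", "High", "Moderate", "Low"],
    "Diet": ["Vegetarian", "Mixed", "Mixed", "Vegan"],  # hypothetical nominal column
})

# Custom ordinal mappings (hypothetical labels and codes) that preserve
# the natural ordering of the categories.
bmi_map = {"Underweight": 0, "Normal": 1, "Overweight": 2, "Obese": 3}
stress_map = {"Low": 0, "Moderate": 1, "High": 2}
df["BMI"] = df["BMI"].map(bmi_map)
df["Stress Levels"] = df["Stress Levels"].map(stress_map)

# One-hot encode the unordered column into 0/1 indicator columns.
df = pd.get_dummies(df, columns=["Diet"], dtype=int)
print(df)
```

Ordinal mappings suit categories with a natural order (BMI bands, stress levels), while one-hot encoding avoids implying an order between nominal categories.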
-
Data Type Verification:
- Verified the data type of every column and corrected any that were inconsistent.
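One way to check and fix types, assuming (hypothetically) that a column such as `Age` was read in as text. `pd.to_numeric` with `errors="coerce"` converts what it can and turns unparseable entries into NaN instead of raising.

```python
import pandas as pd

# Hypothetical frame in which "Age" was loaded as strings.
df = pd.DataFrame({
    "Age": ["24", "31", "28"],
    "Lifestyle Score": [7.0, 4.0, 6.0],
})

print(df.dtypes)  # Age appears as object (text), not a number

# Convert to numeric; bad entries would become NaN for later inspection.
df["Age"] = pd.to_numeric(df["Age"], errors="coerce")

# List any columns that are still non-numeric.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```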
-
Label Encoding:
- Initialized and applied LabelEncoder to convert binary/categorical columns to numeric values.
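A sketch with scikit-learn's `LabelEncoder`, assuming Yes/No string columns; `Exercise` is a hypothetical feature column and `PCOS` a hypothetical target name. Each fitted encoder is kept so the codes can be mapped back to labels.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    "PCOS": ["No", "Yes", "No", "Yes"],
    "Exercise": ["Yes", "No", "No", "Yes"],  # hypothetical binary column
})

encoders = {}
for col in ["PCOS", "Exercise"]:
    le = LabelEncoder()
    # Classes are sorted alphabetically, so "No" -> 0 and "Yes" -> 1.
    df[col] = le.fit_transform(df[col])
    encoders[col] = le  # keep the fitted encoder for inverse_transform

print(df)
print(encoders["PCOS"].inverse_transform([0, 1]))
```

scikit-learn documents `LabelEncoder` as intended for target labels (`y`); for input features, `OrdinalEncoder` performs the same kind of mapping and handles several columns at once.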
-
Feature Standardization:
- Standardized numerical features to ensure they are on a comparable scale.
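A minimal sketch assuming scikit-learn's `StandardScaler` (the summary does not name the scaler). It rescales each column to z = (x - mean) / std, so every feature ends up with mean 0 and unit variance; the values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric features on different scales.
df = pd.DataFrame({
    "Age": [22.0, 28.0, 34.0, 40.0],
    "Lifestyle Score": [3.0, 5.0, 7.0, 9.0],
})
num_cols = ["Age", "Lifestyle Score"]

# StandardScaler applies z = (x - mean) / std per column,
# using the population standard deviation (ddof=0).
scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(df[num_cols]), columns=num_cols)

print(scaled.round(3))
```

Only feature columns are scaled here; the target stays as encoded class labels.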
-
Feature and Target Variable Definition:
- Defined features (independent variables) and target variable (dependent variable).
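A short pandas sketch, assuming the preprocessed data sits in one DataFrame and the target column is named `PCOS` (a hypothetical name); the values are toy data.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [24, 31, 28, 35],
    "BMI": [1, 2, 1, 3],
    "Stress Levels": [0, 2, 1, 2],
    "Lifestyle Score": [7, 4, 6, 3],
    "PCOS": [0, 1, 0, 1],  # hypothetical target column
})

target_col = "PCOS"
X = df.drop(columns=[target_col])  # features (independent variables)
y = df[target_col]                 # target (dependent variable)

print(X.shape, y.shape)
```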
-
Data Splitting:
- Split the data into training and testing sets for model evaluation.
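A sketch with scikit-learn's `train_test_split` on hypothetical random data. The 80/20 ratio, `stratify=y` (which keeps class proportions the same in both splits), and `random_state` are assumed settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))  # hypothetical feature matrix
y = np.array([0, 1] * 50)      # hypothetical balanced binary target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training split only, then reuse it on the test split.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

If standardization is applied to the full dataset before splitting, the scaler's mean and standard deviation include the test rows; fitting it on the training split only avoids that leakage.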
-
Model Training and Evaluation:
- Trained a prediction model on the training set.
- Evaluated model performance with a classification report and a confusion matrix.
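The summary does not name the model. Because the next step plots feature importance, this sketch assumes a tree ensemble (scikit-learn's `RandomForestClassifier`) and uses synthetic data from `make_classification` in place of the real features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, not the real PCOS features.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Assumed model: a random forest.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))  # precision, recall, F1 per class
cm = confusion_matrix(y_test, y_pred)
print(cm)  # rows = true class, columns = predicted class
```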
-
Model Visualization:
- Visualized the results by plotting feature importances and displaying the confusion matrix.
-
Exploratory Data Analysis (EDA):
- Visualized the distribution of Age and BMI.
- Used a pair plot to explore relationships between the numerical features (Age, BMI, Lifestyle Score, and Stress Levels).
- Generated a correlation heatmap to analyze feature correlations.
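A sketch of the three EDA plots on randomly generated, hypothetical values. The summary's "Pairplot" most likely refers to seaborn's `pairplot`; to keep dependencies to pandas and matplotlib, this version swaps in `pandas.plotting.scatter_matrix`, which draws the same grid of pairwise scatter plots, and builds the heatmap with `imshow`.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(18, 45, n),
    "BMI": rng.integers(0, 4, n),            # hypothetical ordinal codes
    "Lifestyle Score": rng.integers(1, 11, n),
    "Stress Levels": rng.integers(0, 3, n),  # hypothetical ordinal codes
})

# 1) Distributions of Age and BMI.
df[["Age", "BMI"]].hist(bins=15, figsize=(8, 3))

# 2) Pair plot: pairwise scatter plots with histograms on the diagonal.
scatter_matrix(df, figsize=(8, 8), diagonal="hist")

# 3) Correlation heatmap (Pearson correlation by default).
corr = df.corr()
fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
for i in range(len(corr)):
    for j in range(len(corr)):
        ax.text(j, i, f"{corr.iat[i, j]:.2f}", ha="center", va="center")
fig.colorbar(im, ax=ax)
fig.tight_layout()
```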