<a href="https://colab.research.google.com/github/TemiOyee/logistic_regression/blob/main/PCOS_Classification_using_Logistic_Regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Polycystic Ovary Syndrome (PCOS) Classification using Logistic Regression**

## **Overview**

This project focuses on utilizing Logistic Regression for the classification of individuals with Polycystic Ovary Syndrome (PCOS).

**Logistic Regression**, a widely used binary classification algorithm, is used to predict the likelihood of PCOS from various health-related features.

The goal is to contribute to early detection, personalized health recommendations, and clinical decision support in the context of PCOS.

#### **Objective**

The primary objective is to build a predictive model that can accurately classify individuals as either having or not having PCOS. This classification is based on features such as age, weight, hormonal levels, and other health parameters.







#### **Dataset**
Two datasets were utilized in the project:

1. PCOS_data_without_infertility.xlsx: Contains information on individuals, including whether they have PCOS (Y/N), age, weight, hormonal levels, and other health-related features.

2. PCOS_infertility.csv: Provides additional information on infertility parameters for a subset of individuals.

Link to dataset: https://www.kaggle.com/datasets/prasoonkottarathil/polycystic-ovary-syndrome-pcos

#### **Methodology**
The project follows these key steps:

1. Data Preprocessing: Handling missing values, scaling features, and preparing the data for model training.

2. Logistic Regression: Training a Logistic Regression model on the preprocessed data to predict the presence or absence of PCOS.

3. Evaluation: Assessing the model's performance using metrics such as accuracy, precision, recall, and the confusion matrix.
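The three steps above can be sketched end to end as a single scikit-learn pipeline. This is a minimal illustration under stated assumptions, not the notebook's actual code: the data is synthetic, the names `X_demo`, `y_demo`, and `pipeline` are hypothetical, and median imputation is one possible choice for handling missing values.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 200 rows, 5 numeric features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] > 0).astype(int)
X_demo[rng.random(X_demo.shape) < 0.05] = np.nan  # inject some missing values

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# Step 1 (preprocessing) and step 2 (model) chained together, so that the
# imputer and scaler are fitted on the training rows only.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_tr, y_tr)

# Step 3 (evaluation) on the held-out rows.
demo_predictions = pipeline.predict(X_te)
demo_accuracy = accuracy_score(y_te, demo_predictions)
print(f"Demo accuracy: {demo_accuracy:.3f}")
```

Wrapping the preprocessing in a pipeline keeps the test rows out of the imputation and scaling statistics, which is the main reason to prefer it over separate manual steps.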


## **Implementation**

The libraries and scikit-learn components used in this project include:

1. pandas: Used for data manipulation and analysis.

2. train_test_split: Splits the dataset into training and testing sets.

3. StandardScaler: Standardizes features by removing the mean and scaling to unit variance.

4. LogisticRegression: Implements logistic regression for classification.

5. accuracy_score, classification_report, confusion_matrix: Metrics for evaluating the model's performance.

6. SimpleImputer: Handles missing values in the dataset.

In [None]:
# Import libraries
import pandas as pd
import numpy as np

In [None]:
# Load datasets
pcos_a= pd.read_excel("PCOS_data_without_infertility.xlsx")
pcos_a.head()

Unnamed: 0,Sl. No,Patient File No.,PCOS (Y/N),Age (yrs),Weight (Kg),Height(Cm),BMI,Blood Group,Pulse rate(bpm),RR (breaths/min),...,Fast food (Y/N),Reg.Exercise(Y/N),BP _Systolic (mmHg),BP _Diastolic (mmHg),Follicle No. (L),Follicle No. (R),Avg. F size (L) (mm),Avg. F size (R) (mm),Endometrium (mm),Unnamed: 44
0,1,1,0,28,44.6,152.0,19.3,15,78,22,...,1.0,0,110,80,3,3,18.0,18.0,8.5,
1,2,2,0,36,65.0,161.5,24.921163,15,74,20,...,0.0,0,120,70,3,5,15.0,14.0,3.7,
2,3,3,1,33,68.8,165.0,25.270891,11,72,18,...,1.0,0,120,80,13,15,18.0,20.0,10.0,
3,4,4,0,37,65.0,148.0,29.674945,13,72,20,...,0.0,0,120,70,2,2,15.0,14.0,7.5,
4,5,5,0,25,52.0,161.0,20.060954,11,72,18,...,0.0,0,120,80,3,4,16.0,14.0,7.0,


In [None]:
pcos_b= pd.read_csv("PCOS_infertility.csv")
pcos_b.head()

Unnamed: 0,Sl. No,Patient File No.,PCOS (Y/N),I beta-HCG(mIU/mL),II beta-HCG(mIU/mL),AMH(ng/mL)
0,1,10001,0,1.99,1.99,2.07
1,2,10002,0,60.8,1.99,1.53
2,3,10003,1,494.08,494.08,6.63
3,4,10004,0,1.99,1.99,1.22
4,5,10005,0,801.45,801.45,2.26


In [None]:
# Combine the datasets on their shared key columns.
# Any other column present in both files receives the suffix _x or _y.
pcos = pd.merge(pcos_a, pcos_b, on=["Sl. No", "PCOS (Y/N)"], how="inner")
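The merge above keeps both copies of every non-key column that appears in the two files, which is where names such as `Patient File No._x` and `Patient File No._y` come from. A minimal sketch of one way to avoid duplicated columns follows. It uses toy frames, and the column `Extra marker` and the names `left`, `right`, and `merged` are hypothetical; whether the duplicated columns in the real files carry identical values is not verified here.

```python
import pandas as pd

# Toy frames (assumption): mimic two files that share keys and one measurement.
left = pd.DataFrame({
    "Sl. No": [1, 2, 3],
    "PCOS (Y/N)": [0, 0, 1],
    "AMH(ng/mL)": [2.07, 1.53, 6.63],
    "Age (yrs)": [28, 36, 33],
})
right = pd.DataFrame({
    "Sl. No": [1, 2, 3],
    "PCOS (Y/N)": [0, 0, 1],
    "AMH(ng/mL)": [2.07, 1.53, 6.63],
    "Extra marker": [0.1, 0.2, 0.3],  # hypothetical column
})

keys = ["Sl. No", "PCOS (Y/N)"]

# Keep the keys plus only the right-hand columns not already present on the left.
new_cols = [c for c in right.columns if c not in left.columns]

# validate="one_to_one" raises if a key combination is duplicated in either frame.
merged = pd.merge(left, right[keys + new_cols], on=keys, how="inner",
                  validate="one_to_one")
print(merged.columns.tolist())
```

With this approach no `_x` or `_y` suffixes are produced, because the two frames overlap only on the key columns.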

 Data Exploration and Preprocessing

In [None]:
pcos.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 541 entries, 0 to 540
Data columns (total 49 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Sl. No                    541 non-null    int64  
 1   Patient File No._x        541 non-null    int64  
 2   PCOS (Y/N)                541 non-null    int64  
 3    Age (yrs)                541 non-null    int64  
 4   Weight (Kg)               541 non-null    float64
 5   Height(Cm)                541 non-null    float64
 6   BMI                       541 non-null    float64
 7   Blood Group               541 non-null    int64  
 8   Pulse rate(bpm)           541 non-null    int64  
 9   RR (breaths/min)          541 non-null    int64  
 10  Hb(g/dl)                  541 non-null    float64
 11  Cycle(R/I)                541 non-null    int64  
 12  Cycle length(days)        541 non-null    int64  
 13  Marraige Status (Yrs)     540 non-null    float64
 14  Pregnant(Y

In [None]:
pcos.describe()

Unnamed: 0,Sl. No,Patient File No._x,PCOS (Y/N),Age (yrs),Weight (Kg),Height(Cm),BMI,Blood Group,Pulse rate(bpm),RR (breaths/min),...,BP _Systolic (mmHg),BP _Diastolic (mmHg),Follicle No. (L),Follicle No. (R),Avg. F size (L) (mm),Avg. F size (R) (mm),Endometrium (mm),Patient File No._y,I beta-HCG(mIU/mL)_y,II beta-HCG(mIU/mL)_y
count,541.0,541.0,541.0,541.0,541.0,541.0,541.0,541.0,541.0,541.0,...,541.0,541.0,541.0,541.0,541.0,541.0,541.0,541.0,541.0,541.0
mean,271.0,271.0,0.327172,31.430684,59.637153,156.484835,24.319353,13.802218,73.247689,19.243993,...,114.661738,76.927911,6.12939,6.641405,15.018115,15.451701,8.475915,10271.0,664.549235,238.229518
std,156.317519,156.317519,0.469615,5.411006,11.028287,6.033545,4.050819,1.840812,4.430285,1.688629,...,7.384556,5.574112,4.229294,4.436889,3.566839,3.318848,2.165381,156.317519,3348.920576,1603.826221
min,1.0,1.0,0.0,20.0,31.0,137.0,12.417882,11.0,13.0,16.0,...,12.0,8.0,0.0,0.0,0.0,0.0,0.0,10001.0,1.3,0.110417
25%,136.0,136.0,0.0,28.0,52.0,152.0,21.707923,13.0,72.0,18.0,...,110.0,70.0,3.0,3.0,13.0,13.0,7.0,10136.0,1.99,1.99
50%,271.0,271.0,0.0,31.0,59.0,156.0,24.238227,14.0,72.0,18.0,...,110.0,80.0,5.0,6.0,15.0,16.0,8.5,10271.0,20.0,1.99
75%,406.0,406.0,1.0,35.0,65.0,160.0,26.638918,15.0,74.0,20.0,...,120.0,80.0,9.0,10.0,18.0,18.0,9.8,10406.0,297.21,97.63
max,541.0,541.0,1.0,48.0,108.0,180.0,38.900714,18.0,82.0,28.0,...,140.0,100.0,22.0,20.0,24.0,24.0,18.0,10541.0,32460.97,25000.0


In [None]:
#Handle missing values
pcos = pcos.fillna(0)

In [None]:
# Drop unnecessary columns
pcos = pcos.drop(["Sl. No", "Patient File No._x"], axis=1)

In [None]:
# Check for any remaining missing values
pcos.isnull().sum()

PCOS (Y/N)                  0
 Age (yrs)                  0
Weight (Kg)                 0
Height(Cm)                  0
BMI                         0
Blood Group                 0
Pulse rate(bpm)             0
RR (breaths/min)            0
Hb(g/dl)                    0
Cycle(R/I)                  0
Cycle length(days)          0
Marraige Status (Yrs)       0
Pregnant(Y/N)               0
No. of aborptions           0
  I   beta-HCG(mIU/mL)_x    0
II    beta-HCG(mIU/mL)_x    0
FSH(mIU/mL)                 0
LH(mIU/mL)                  0
FSH/LH                      0
Hip(inch)                   0
Waist(inch)                 0
Waist:Hip Ratio             0
TSH (mIU/L)                 0
AMH(ng/mL)_x                0
PRL(ng/mL)                  0
Vit D3 (ng/mL)              0
PRG(ng/mL)                  0
RBS(mg/dl)                  0
Weight gain(Y/N)            0
hair growth(Y/N)            0
Skin darkening (Y/N)        0
Hair loss(Y/N)              0
Pimples(Y/N)                0
Fast food 

 Define Features and Target Variable

**Features (independent variables or attributes):** These are the input variables or characteristics of the data that are used to make predictions.


**Target variable (dependent variable or label)**: This is the variable we want to predict. It is the outcome or result that the model will learn to predict based on the features.




In [None]:
# Define features and target variable
X = pcos.drop("PCOS (Y/N)", axis=1)
y = pcos["PCOS (Y/N)"]


**X**

This variable holds the features. pcos.drop("PCOS (Y/N)", axis=1) removes the column labeled "PCOS (Y/N)" from the dataset (pcos), and the remaining columns, such as age, weight, and hormonal levels, become the features.


**y**

This variable holds the target variable. pcos["PCOS (Y/N)"] selects the column labeled "PCOS (Y/N)" from the dataset (pcos), which records whether each individual has PCOS (1) or not (0). This is the variable that the machine learning model will try to predict from the features.


In [None]:
X


Unnamed: 0,Age (yrs),Weight (Kg),Height(Cm),BMI,Blood Group,Pulse rate(bpm),RR (breaths/min),Hb(g/dl),Cycle(R/I),Cycle length(days),...,Follicle No. (L),Follicle No. (R),Avg. F size (L) (mm),Avg. F size (R) (mm),Endometrium (mm),Unnamed: 44,Patient File No._y,I beta-HCG(mIU/mL)_y,II beta-HCG(mIU/mL)_y,AMH(ng/mL)_y
0,28,44.6,152.000,19.300000,15,78,22,10.48,2,5,...,3,3,18.0,18.0,8.5,0,10001,1.99,1.99,2.07
1,36,65.0,161.500,24.921163,15,74,20,11.70,2,5,...,3,5,15.0,14.0,3.7,0,10002,60.80,1.99,1.53
2,33,68.8,165.000,25.270891,11,72,18,11.80,2,5,...,13,15,18.0,20.0,10.0,0,10003,494.08,494.08,6.63
3,37,65.0,148.000,29.674945,13,72,20,12.00,2,5,...,2,2,15.0,14.0,7.5,0,10004,1.99,1.99,1.22
4,25,52.0,161.000,20.060954,11,72,18,10.00,2,5,...,3,4,16.0,14.0,7.0,0,10005,801.45,801.45,2.26
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
536,35,50.0,164.592,18.456637,17,72,16,11.00,2,5,...,1,0,17.5,10.0,6.7,0,10537,1.99,1.99,1.7
537,30,63.2,158.000,25.316456,15,72,18,10.80,2,5,...,9,7,19.0,18.0,8.2,0,10538,80.13,1.99,5.6
538,36,54.0,152.000,23.372576,13,74,20,10.80,2,6,...,1,0,18.0,9.0,7.3,0,10539,1.99,1.99,3.7
539,27,50.0,150.000,22.222222,15,74,20,12.00,4,2,...,7,6,18.0,16.0,11.5,0,10540,292.92,1.99,5.2


In [None]:
y

0      0
1      0
2      1
3      0
4      0
      ..
536    0
537    0
538    0
539    0
540    1
Name: PCOS (Y/N), Length: 541, dtype: int64

Splitting the Data into Training and Testing Sets

In [None]:
from sklearn.model_selection import train_test_split

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,random_state=0)


In [None]:
print(X.shape, X_train.shape, X_test.shape)

(541, 46) (432, 46) (109, 46)


**train_test_split function:** randomly shuffles the data and splits it into two sets: one for training the model and one for testing the model's performance.

**X_train and y_train:** used to train the machine learning model.

**X_test and y_test:** used to evaluate the model's performance on data it has never seen before.

Setting **random_state=0** ensures that the random split is reproducible, meaning if you run the code again with the same dataset, you'll get the same split.

test_size=0.2 implies that 20% of the data will be reserved for testing, and the remaining 80% will be used for training.
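The reproducibility point can be checked directly, and the same call also accepts a `stratify` argument that keeps the class ratio equal in both partitions. The sketch below is illustrative only: it uses synthetic data with about one third positives, the names `X_demo`, `y_demo`, `split_a`, and `split_b` are hypothetical, and stratification is an option the notebook itself does not use.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): 300 rows, one third of them labelled positive.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 4))
y_demo = np.array([1] * 100 + [0] * 200)

# Two calls with identical arguments, including random_state.
split_a = train_test_split(X_demo, y_demo, test_size=0.2,
                           random_state=0, stratify=y_demo)
split_b = train_test_split(X_demo, y_demo, test_size=0.2,
                           random_state=0, stratify=y_demo)

# Same random_state: the test partitions are identical.
same_split = np.array_equal(split_a[1], split_b[1])

# stratify=y_demo: the positive rate is preserved in train and test.
train_rate = split_a[2].mean()
test_rate = split_a[3].mean()
print(same_split, round(train_rate, 3), round(test_rate, 3))
```

On a dataset of a few hundred rows, an unstratified split can shift the positive rate between train and test by several points, which makes stratification worth considering.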

Feature Scaling

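**Feature scaling** is essential for many machine learning algorithms, especially those that are sensitive to the scale of features (e.g., distance-based algorithms). Logistic regression with scikit-learn's default L2 regularization is also affected, because the penalty is applied equally to every coefficient regardless of the unit of its feature.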




In [None]:
from sklearn.preprocessing import StandardScaler

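# Convert non-numeric values to NaN (pd.to_numeric is applied column by column)
X_train = X_train.apply(pd.to_numeric, errors='coerce')
X_test = X_test.apply(pd.to_numeric, errors='coerce')

# Drop rows with NaN values
# Note: y_train and y_test are not filtered here, so their rows may no longer line up with X
X_train = X_train.dropna()
X_test = X_test.dropna()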

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
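The cell above drops incomplete rows from the feature frames only, so the label series can end up longer than the features. A minimal sketch of keeping the two aligned is to select the labels by the index that survives `dropna`. The toy frame and the names `X_demo`, `y_demo`, `X_clean`, and `y_clean` are hypothetical.

```python
import pandas as pd

# Toy data (assumption): one entry is a non-numeric string, as can happen in raw exports.
X_demo = pd.DataFrame(
    {"feat_a": [1.0, 2.0, 3.0, 4.0], "feat_b": ["5.1", "bad", "6.3", "7.0"]},
    index=[10, 11, 12, 13],
)
y_demo = pd.Series([0, 1, 0, 1], index=[10, 11, 12, 13], name="label")

# Coerce each column to numeric; entries that cannot be parsed become NaN.
X_num = X_demo.apply(pd.to_numeric, errors="coerce")

# Drop incomplete rows, then pick the labels that share the surviving index.
X_clean = X_num.dropna()
y_clean = y_demo.loc[X_clean.index]

print(X_clean.index.tolist(), y_clean.tolist())
```

Selecting by index removes exactly the labels of the dropped rows. Truncating the label array to the new length would instead remove labels from the end, leaving every row after the first dropped one paired with the wrong label.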


 Train the Logistic Regression Model

In [None]:
from sklearn.linear_model import LogisticRegression
import numpy as np

# Ensure y_train is a 1D array (flatten if necessary)
y_train = np.ravel(y_train)

# Check the number of samples in X_train and y_train
num_samples_X_train = X_train.shape[0]
num_samples_y_train = y_train.shape[0]

# Align the number of samples if they are inconsistent
# (truncation only equalizes the lengths; it does not match labels to the rows kept by dropna)
min_samples = min(num_samples_X_train, num_samples_y_train)
X_train = X_train[:min_samples, :]
y_train = y_train[:min_samples]

# Check the number of samples again
num_samples_X_train = X_train.shape[0]
num_samples_y_train = y_train.shape[0]
print("Number of samples in X_train:", num_samples_X_train)
print("Number of samples in y_train:", num_samples_y_train)

# Train the logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)


Number of samples in X_train: 429
Number of samples in y_train: 429


 Evaluate the Model

In [None]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Ensure y_test is a 1D array (flatten if necessary)
y_test = np.ravel(y_test)

# Check the number of samples in X_test and y_test
num_samples_X_test = X_test.shape[0]
num_samples_y_test = y_test.shape[0]

# Align the number of samples if they are inconsistent
min_samples = min(num_samples_X_test, num_samples_y_test)
X_test = X_test[:min_samples, :]
y_test = y_test[:min_samples]

# Check the number of samples again
num_samples_X_test = X_test.shape[0]
num_samples_y_test = y_test.shape[0]
print("Number of samples in X_test:", num_samples_X_test)
print("Number of samples in y_test:", num_samples_y_test)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(classification_report(y_test, y_pred))


Number of samples in X_test: 109
Number of samples in y_test: 109
Accuracy: 0.7981651376146789
Confusion Matrix:
[[64  6]
 [16 23]]
Classification Report:
              precision    recall  f1-score   support

           0       0.80      0.91      0.85        70
           1       0.79      0.59      0.68        39

    accuracy                           0.80       109
   macro avg       0.80      0.75      0.76       109
weighted avg       0.80      0.80      0.79       109



# **Result and Analysis**


## **Confusion Matrix**

[[64  6]

 [16 23]]

1. True Positive (TP): 23
2. True Negative (TN): 64
3. False Positive (FP): 6
4. False Negative (FN): 16


## **Accuracy**

This is the ratio of correctly predicted instances to the total instances.

~Accuracy = (TP + TN) / (TP + TN + FP + FN)

~Accuracy = (64 + 23) / 109 ≈ 0.7982 (or 79.82%)

## **Precision, Recall, and F1-Score**


              precision    recall  f1-score   support

        0       0.80      0.91      0.85        70
        1       0.79      0.59      0.68        39


**Precision (for class 1):** Precision is the ratio of correctly predicted positive observations to the total predicted positives.

~Precision = TP / (TP + FP) = 23 / (23 + 6) ≈ 0.7931 (or 79.31%)

**Recall (for class 1):** Recall is the ratio of correctly predicted positive observations to all observations in the actual class.

~Recall = TP / (TP + FN) = 23 / (23 + 16) ≈ 0.5897 (or 58.97%)


**F1-Score (for class 1):** The harmonic mean of Precision and Recall.

~ F1-Score = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (0.7931 * 0.5897) / (0.7931 + 0.5897) ≈ 0.6765 (or 67.65%)
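The formulas above can be re-derived from the four confusion-matrix counts. The short sketch below does the arithmetic with NumPy, using the matrix reported for the logistic regression model; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix reported above: rows = actual class, columns = predicted class.
conf = np.array([[64, 6],
                 [16, 23]])

# For a 2x2 matrix with labels [0, 1], ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = conf.ravel()

accuracy = (tp + tn) / conf.sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

Computing F1 from the unrounded precision and recall avoids the small error that appears when the rounded values are plugged into the formula by hand.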


## **Classification Report**

The classification report provides precision, recall, and F1-score for both classes (0 and 1) along with support (the number of actual occurrences of the class in the specified dataset).

## **Interpretation**

* The overall accuracy of the model is approximately 79.82%. Because about 64% of the test set (70 of 109) belongs to class 0, accuracy alone overstates how well the positive class is handled.

* For class 1 (the positive class), precision (0.79) is noticeably higher than recall (0.59): when the model predicts PCOS it is usually right, but it misses 16 of the 39 actual cases. The F1-score (0.68) reflects this gap. It is particularly informative for imbalanced datasets because it combines precision and recall in a single number.

* The macro average and weighted average in the classification report are close to each other (F1 of 0.76 and 0.79), but the class supports (70 vs. 39) show that the dataset is moderately imbalanced.

* The precision and recall for class 0 (the negative class) are high (0.80 and 0.91), suggesting good performance in identifying instances of class 0.


In summary, the Logistic Regression model performs reasonably well overall and is stronger at identifying class 0 than class 1. Its main weakness is recall for the positive class, which matters most if the model is meant to support early detection.
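Since recall for class 1 is 0.59 in the report above, one common lever is the decision threshold: `predict` uses 0.5 for binary logistic regression, and lowering it trades precision for recall. The sketch below illustrates the effect on synthetic data. It is not a result for the PCOS dataset, the threshold of 0.3 is an arbitrary example value, and the names `X_demo`, `y_demo`, `clf`, and `results` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (assumption): roughly one third positives.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, weights=[0.67, 0.33], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba_pos = clf.predict_proba(X_te)[:, 1]  # estimated probability of class 1

results = {}
for threshold in (0.5, 0.3):
    # Predict class 1 whenever the estimated probability reaches the threshold.
    y_hat = (proba_pos >= threshold).astype(int)
    results[threshold] = (
        precision_score(y_te, y_hat, zero_division=0),
        recall_score(y_te, y_hat),
    )
    print(f"threshold={threshold}: precision={results[threshold][0]:.2f}, "
          f"recall={results[threshold][1]:.2f}")
```

Lowering the threshold can only add predicted positives, so recall never decreases, while precision usually falls. The right trade-off depends on the relative cost of a missed case versus a false alarm.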







# **Impact in Business**


**Positive Impact on Healthcare Decision-Making**

If the logistic regression model is used in a healthcare setting for PCOS prediction, the accuracy of about 80% is a positive sign, although the recall of 0.59 for the positive class means a substantial share of PCOS cases would be missed. Used alongside clinical judgment, it could assist healthcare professionals in making more informed decisions about patient risk and potential early intervention.

**Potential for Early Intervention**

The model's ability to identify instances of class 1 (individuals with PCOS) with reasonable precision (0.79), once its recall (0.59) is improved, could contribute to early intervention strategies. Early detection of PCOS can lead to timely medical interventions and lifestyle changes, potentially improving health outcomes.


**Education and Awareness Programs**

Businesses could consider implementing education and awareness programs based on the insights from the model. This could involve targeted campaigns for individuals at higher risk of PCOS, emphasizing the importance of regular health check-ups, lifestyle modifications, and early intervention.


# **Testing other Models**


Random Forest Model

In [None]:
# Try a different model: Random Forest
from sklearn.ensemble import RandomForestClassifier

# Train the model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(classification_report(y_test, y_pred))


Accuracy: 0.8532110091743119
Confusion Matrix:
[[70  0]
 [16 23]]
Classification Report:
              precision    recall  f1-score   support

           0       0.81      1.00      0.90        70
           1       1.00      0.59      0.74        39

    accuracy                           0.85       109
   macro avg       0.91      0.79      0.82       109
weighted avg       0.88      0.85      0.84       109



XGBoost Model

In [None]:
# Install xgboost if not already installed
# pip install xgboost

from xgboost import XGBClassifier

# Train the XGBoost model
xgb_model = XGBClassifier(random_state=42)
xgb_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred_xgb = xgb_model.predict(X_test)

# Evaluate the XGBoost model
accuracy_xgb = accuracy_score(y_test, y_pred_xgb)
conf_matrix_xgb = confusion_matrix(y_test, y_pred_xgb)

print(f"XGBoost Accuracy: {accuracy_xgb}")
print("Confusion Matrix:")
print(conf_matrix_xgb)
print("Classification Report:")
print(classification_report(y_test, y_pred_xgb))


XGBoost Accuracy: 0.8165137614678899
Confusion Matrix:
[[64  6]
 [14 25]]
Classification Report:
              precision    recall  f1-score   support

           0       0.82      0.91      0.86        70
           1       0.81      0.64      0.71        39

    accuracy                           0.82       109
   macro avg       0.81      0.78      0.79       109
weighted avg       0.82      0.82      0.81       109



Neural Network with TensorFlow

In [None]:
# Install TensorFlow if not already installed
# pip install tensorflow

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Build a simple neural network
model_nn = Sequential([
    Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid')
])

# Compile the model
model_nn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the neural network
model_nn.fit(X_train, y_train, epochs=20, batch_size=32, validation_split=0.2)

# Evaluate the neural network
accuracy_nn = model_nn.evaluate(X_test, y_test)[1]

print(f"Neural Network Accuracy: {accuracy_nn}")


Epoch 1/20
...
Epoch 20/20
Neural Network Accuracy: 0.7981651425361633


Support Vector Machine (SVM) Model

In [None]:
# Import necessary libraries
from sklearn.svm import SVC

# Define the SVM model
svm_model = SVC(random_state=42)

# Train the SVM model
svm_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred_svm = svm_model.predict(X_test)

# Evaluate the SVM model
accuracy_svm = accuracy_score(y_test, y_pred_svm)
conf_matrix_svm = confusion_matrix(y_test, y_pred_svm)
classification_rep_svm = classification_report(y_test, y_pred_svm)

# Print the results for SVM
print("Model: Support Vector Machine")
print(f"Accuracy: {accuracy_svm}")
print("Confusion Matrix:")
print(conf_matrix_svm)
print("Classification Report:")
print(classification_rep_svm)
print("\n" + "="*40 + "\n")


Model: Support Vector Machine
Accuracy: 0.8348623853211009
Confusion Matrix:
[[68  2]
 [16 23]]
Classification Report:
              precision    recall  f1-score   support

           0       0.81      0.97      0.88        70
           1       0.92      0.59      0.72        39

    accuracy                           0.83       109
   macro avg       0.86      0.78      0.80       109
weighted avg       0.85      0.83      0.82       109





RANKING
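The ranking in this section scores each model once on a single 109-row test split, so small differences in accuracy may not be stable. A minimal sketch of ranking by cross-validated F1 is given here for comparison. It assumes synthetic data and only two candidate models, and the names `X_demo`, `y_demo`, `candidates`, and `cv_scores` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumption): roughly one third positives.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=12, weights=[0.67, 0.33], random_state=0
)

candidates = {
    # Scaling lives inside the pipeline so it is refitted on each training fold.
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

# Five stratified folds; every row is used for testing exactly once.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = {
    name: cross_val_score(est, X_demo, y_demo, cv=cv, scoring="f1").mean()
    for name, est in candidates.items()
}

ranked = sorted(cv_scores.items(), key=lambda kv: kv[1], reverse=True)
for rank, (name, score) in enumerate(ranked, 1):
    print(f"{rank}. {name}: mean F1 = {score:.3f}")
```

Scoring on F1 for the positive class rather than accuracy keeps the ranking sensitive to missed positive cases, which accuracy can hide on an imbalanced dataset.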

In [None]:
# Import necessary libraries
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Define a list of models
models = [
    ("Logistic Regression", LogisticRegression()),
    ("Decision Tree", DecisionTreeClassifier(random_state=42)),
    ("Random Forest", RandomForestClassifier(n_estimators=100, random_state=42)),
    ("Support Vector Machine", SVC(random_state=42)),
    ("XGBoost", XGBClassifier(random_state=42))
]

# Train, evaluate, and rank the models
ranked_models = []

for model_name, model in models:
    # Train the model
    model.fit(X_train, y_train)

    # Make predictions on the test set
    y_pred = model.predict(X_test)

    # Evaluate the model
    accuracy = accuracy_score(y_test, y_pred)
    conf_matrix = confusion_matrix(y_test, y_pred)
    classification_rep = classification_report(y_test, y_pred)

    # Rank the models based on accuracy
    ranked_models.append((model_name, accuracy))

    # Print the results
    print(f"Model: {model_name}")
    print(f"Accuracy: {accuracy}")
    print("Confusion Matrix:")
    print(conf_matrix)
    print("Classification Report:")
    print(classification_rep)
    print("\n" + "="*40 + "\n")

# Rank models based on accuracy
ranked_models.sort(key=lambda x: x[1], reverse=True)

# Print the ranked models
print("Ranked Models:")
for rank, (model_name, accuracy) in enumerate(ranked_models, 1):
    print(f"{rank}. {model_name}: {accuracy}")


Model: Logistic Regression
Accuracy: 0.7981651376146789
Confusion Matrix:
[[64  6]
 [16 23]]
Classification Report:
              precision    recall  f1-score   support

           0       0.80      0.91      0.85        70
           1       0.79      0.59      0.68        39

    accuracy                           0.80       109
   macro avg       0.80      0.75      0.76       109
weighted avg       0.80      0.80      0.79       109



Model: Decision Tree
Accuracy: 0.6972477064220184
Confusion Matrix:
[[57 13]
 [20 19]]
Classification Report:
              precision    recall  f1-score   support

           0       0.74      0.81      0.78        70
           1       0.59      0.49      0.54        39

    accuracy                           0.70       109
   macro avg       0.67      0.65      0.66       109
weighted avg       0.69      0.70      0.69       109



Model: Random Forest
Accuracy: 0.8532110091743119
Confusion Matrix:
[[70  0]
 [16 23]]
Classification Report:
      