<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <style>
        body {
            background-color: #121212;
            color: #E0E0E0;
            font-family: Arial, sans-serif;
        }
        h1, h2, h3 {
            color: #BB86FC;
            text-align: center;
        }
        hr {
            border: 1px solid #03DAC5;
        }
        ul {
            list-style: none;
            padding: 0;
        }
        ul li::before {
            content: "•";
            color: #BB86FC;
            font-weight: bold;
            display: inline-block;
            width: 1em;
            margin-left: -1em;
        }
        ul ul li::before {
            content: "→";
            color: #03DAC5;
        }
        p, ul {
            font-size: 1.1em;
        }
        code {
            background-color: #1F1F1F;
            color: #BB86FC;
            padding: 2px 4px;
            border-radius: 4px;
        }
    </style>
</head>
<body>
    <h1><b>Summary of Model Training Results and Quick Insights</b></h1>
    <hr>
    <h2>🔍 <b>Model Performances</b></h2>
    <h3>1. Logistic Regression</h3>
    <ul>
        <li><b>Accuracy:</b> 62%</li>
        <li><b>Precision:</b> 0.61 (macro avg)</li>
        <li><b>Recall:</b> 0.55 (macro avg)</li>
        <li><b>F1-Score:</b> 0.56 (weighted avg)</li>
        <li><b>Observations:</b>
            <ul>
                <li>Achieved high recall for the "No" class but struggled with the "Yes" class, which points to class imbalance.</li>
                <li>Trains quickly because the model is simple, but its linear decision boundary limits how well it handles complex, non-linear relationships.</li>
            </ul>
        </li>
    </ul>
    <h3>2. XGBoost</h3>
    <ul>
        <li><b>Accuracy:</b> 64%</li>
        <li><b>Precision:</b> 0.59</li>
        <li><b>Recall:</b> 0.36</li>
        <li><b>F1-Score:</b> 0.45</li>
        <li><b>Observations:</b>
            <ul>
                <li>Slightly improved accuracy compared to Logistic Regression.</li>
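                <li>Low recall (0.36) indicates that the model misses many cases in at least one class, even after hyperparameter tuning.</li>
                <li>Requires further experimentation with hyperparameters. Feature scaling is unlikely to help much here, because tree-based boosters split on feature thresholds and are largely insensitive to monotonic rescaling.</li>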
            </ul>
        </li>
    </ul>
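    <p>
        As a quick sanity check, the reported F1-score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. This is a minimal sketch; the helper name <code>f1_from_precision_recall</code> is hypothetical and not part of the project code.
    </p>
    <pre><code>
```python
def f1_from_precision_recall(precision, recall):
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    if total == 0:
        return 0.0
    return 2 * precision * recall / total


# XGBoost figures reported above: precision 0.59, recall 0.36.
xgb_f1 = f1_from_precision_recall(0.59, 0.36)
print(round(xgb_f1, 2))  # prints 0.45, matching the reported F1-score
```
    </code></pre>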
    <h3>3. Decision Tree</h3>
    <ul>
        <li><b>Accuracy:</b> 58.8%</li>
        <li><b>Precision:</b> 0.63 (macro avg)</li>
        <li><b>Recall:</b> 0.59 (macro avg)</li>
        <li><b>F1-Score:</b> 0.55 (weighted avg)</li>
        <li><b>Top Features:</b>
            <ul>
                <li>diag_2_Other</li>
                <li>gender_Male</li>
                <li>encounter_id</li>
            </ul>
        </li>
        <li><b>Observations:</b>
            <ul>
                <li>Lower accuracy than Logistic Regression and XGBoost.</li>
                <li>Highly interpretable, with clear feature importance insights.</li>
                <li>Prone to overfitting, particularly with deeper trees.</li>
            </ul>
        </li>
    </ul>
    <hr>
    <h2>🚀 <b>Next Steps to Increase Model Accuracy</b></h2>
    <ul>
        <li>📊 <b>Feature Engineering:</b>
            <ul>
                <li>Scale numerical features using StandardScaler or MinMaxScaler.</li>
                <li>Apply one-hot encoding or other transformations to categorical variables.</li>
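                <li>Perform feature selection or elimination based on correlation and importance. Identifier columns such as <code>encounter_id</code>, which appears among the Decision Tree's top features, are candidates for removal because they are unlikely to generalize.</li>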
            </ul>
        </li>
        <li>📉 <b>Class Imbalance Handling:</b>
            <ul>
                <li>SMOTE has already been tried. ADASYN or class-weight adjustments are the next options for further improving minority-class recall.</li>
            </ul>
        </li>
        <li>🛠️ <b>Data Augmentation:</b>
            <ul>
                <li>Introduce synthetic data or transformations to improve generalization.</li>
            </ul>
        </li>
        <li>🔄 <b>Cross-Validation:</b>
            <ul>
                <li>Apply stratified K-Fold cross-validation for robust performance evaluation.</li>
            </ul>
        </li>
        <li>✨ <b>Advanced Models:</b>
            <ul>
                <li>Try neural networks next, since the relationships among these features appear to be complex.</li>
            </ul>
        </li>
    </ul>
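    <p>
        The class-weight and stratified K-Fold ideas above can be sketched with the Python standard library alone. The helper names (<code>balanced_class_weights</code>, <code>stratified_folds</code>) and the toy labels are hypothetical and not taken from the project; in practice, scikit-learn's <code>StratifiedKFold</code> and <code>class_weight="balanced"</code> would likely be used instead.
    </p>
    <pre><code>
```python
import random
from collections import Counter, defaultdict


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    This mirrors the heuristic behind scikit-learn's class_weight="balanced".
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def stratified_folds(labels, n_splits=5, seed=0):
    """Assign sample indices to n_splits folds while preserving class proportions.

    Indices of each class are shuffled, then dealt round-robin across the folds,
    so every fold receives a near-equal share of every class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


# Hypothetical toy labels: 80 "No" and 20 "Yes" (not the project's real data).
labels = ["No"] * 80 + ["Yes"] * 20

weights = balanced_class_weights(labels)
folds = stratified_folds(labels, n_splits=5)

print(weights)  # the minority class receives the larger weight
for fold in folds:
    print(Counter(labels[i] for i in fold))  # class counts inside each fold
```
    </code></pre>
    <p>
        With this 80/20 toy split, each of the five folds ends up with 16 "No" and 4 "Yes" samples, and the "Yes" class receives a weight of 2.5 against 0.625 for "No".
    </p>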
    <hr>
    <h2>📋 <b>Summary of Model Performances</b></h2>
    <ul>
        <li>📉 <b>Logistic Regression:</b> Consistent performance (62% accuracy) but limited by its linear assumptions.</li>
        <li>🌟 <b>XGBoost:</b> Best accuracy (64%) but struggles with recall (0.36). Promising with further tuning.</li>
        <li>🌲 <b>Decision Tree:</b> Highly interpretable, but accuracy lags (58.8%), possibly because of overfitting.</li>
    </ul>
    <hr>
    <h2>💡 <b>Conclusion</b></h2>
    <p>
        The next phase will focus on improving <b>feature engineering</b>, balancing class distributions, and optimizing model parameters. With more advanced models, there is still some room to improve accuracy, recall, and F1-score, particularly for the minority class (readmitted patients). Iterating on these steps should refine the predictions and lead to more reliable outcomes.
    </p>
</body>
</html>
