# 📊 Final Project Summary: Credit Risk Assessment

---

## 🔎 Problem Statement
Financial institutions need reliable methods to assess loan applicants and predict default risk.  
This project trains and compares machine learning classifiers that predict whether a loan application should be approved or rejected, based on applicant and loan features.

---

## 📂 Dataset Explanation
- **Features**: Age, Gender, Education, Income, Employment Experience, Home Ownership, Loan Amount, Loan Intent, Loan Interest Rate, Loan Percent Income, Credit History Length, Credit Score, Previous Defaults, Debt-to-Income Ratio, Age Group.  
- **Target**: Loan Status (Approved = 1, Not Approved = 0).  
- **Source**: Publicly available credit risk datasets.

---

## 🛠️ Preprocessing Steps
- Missing values handled with imputation (mean/median for numeric, mode for categorical).  
- Categorical variables encoded (Label Encoding / One-Hot Encoding).  
- Numerical features scaled for consistency.  
- Train-test split applied (e.g., 80/20).  
- Class imbalance addressed with SMOTE / class weights.
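
The steps above can be sketched in plain Python to show what each one does. This is an illustrative sketch, not the project's code: a real pipeline would more likely use library transformers, all helper names are hypothetical, the stratified split is an assumption (the summary only mentions an 80/20 split), and SMOTE is not shown. Only the class-weight option appears, using the common "balanced" heuristic `n_samples / (n_classes * class_count)`.

```python
import random
import statistics
from collections import Counter


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def one_hot(values):
    """One-hot encode a categorical column; categories are sorted for a stable order."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


def standardize(values):
    """Scale a numeric column to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # avoid dividing by zero on constant columns
    return [(v - mean) / std for v in values]


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx), sampling each class separately (assumed, not from the source)."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * count) for c, count in counts.items()}
```

In practice, imputation and scaling statistics should be computed on the training split only and then applied to the test split, so that no information leaks from the held-out data.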

---

## 📈 Visualizations
- Distribution plots for income, credit score, loan amount.  
- Correlation heatmap to identify feature relationships.  
- Loan approval rates segmented by education, gender, home ownership.  
- ROC curves and confusion matrices for model evaluation.
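
For readers unfamiliar with the evaluation plots, the numbers behind a confusion matrix and a ROC curve can be computed as in the sketch below. It is a minimal standard-library illustration with hypothetical function names, not the plotting code used in the project. It assumes binary labels (1 = approved) and that both classes are present.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 is the positive class."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold step by step."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos  # both classes must be present, or this divides by zero
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        tn, fp, fn, tp = confusion_counts(y_true, y_pred)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

The list returned by `roc_points` can be passed to any plotting library as x/y coordinates to draw the curve.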

---

## 🤖 Algorithms Used
- Logistic Regression  
- Random Forest  
- Gradient Boosting  
- XGBoost  

---

## ⚖️ Performance Comparison (Final Results)

### Logistic Regression
- **Best Params**: `{'C': 10, 'solver': 'liblinear'}`  
- **Accuracy**: 0.896556  
- **Precision**: 0.7768  
- **Recall**: 0.75  
- **F1 Score**: 0.763165  

### Random Forest
- **Best Params**: `{'max_depth': None, 'min_samples_split': 2, 'n_estimators': 100}`  
- **Accuracy**: 0.929556  
- **Precision**: 0.899415  
- **Recall**: 0.769  
- **F1 Score**: 0.829111  

### Gradient Boosting
- **Best Params**: `{'learning_rate': 0.2, 'max_depth': 5, 'n_estimators': 200}`  
- **Accuracy**: 0.939333  
- **Precision**: 0.904338  
- **Recall**: 0.813  
- **F1 Score**: 0.85624  

### XGBoost
- **Best Params**: `{'learning_rate': 0.2, 'max_depth': 7, 'n_estimators': 200}`  
- **Accuracy**: 0.936778  
- **Precision**: 0.892485  
- **Recall**: 0.8135  
- **F1 Score**: 0.851164  
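
As a quick sanity check on the figures above, F1 can be recomputed from the reported precision and recall using F1 = 2PR / (P + R), and the models ranked by it. The sketch below only re-derives values already listed in this section; the dictionary layout and function name are illustrative.

```python
# Metrics copied from the results listed above.
REPORTED = {
    "Logistic Regression": {"accuracy": 0.896556, "precision": 0.7768, "recall": 0.75, "f1": 0.763165},
    "Random Forest": {"accuracy": 0.929556, "precision": 0.899415, "recall": 0.769, "f1": 0.829111},
    "Gradient Boosting": {"accuracy": 0.939333, "precision": 0.904338, "recall": 0.813, "f1": 0.85624},
    "XGBoost": {"accuracy": 0.936778, "precision": 0.892485, "recall": 0.8135, "f1": 0.851164},
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recompute F1 for each model and compare with the reported value.
recomputed = {
    name: f1_score(m["precision"], m["recall"]) for name, m in REPORTED.items()
}

# Rank models from best to worst by reported F1.
ranking = sorted(REPORTED, key=lambda name: REPORTED[name]["f1"], reverse=True)

for name in ranking:
    m = REPORTED[name]
    print(f"{name:<20} acc={m['accuracy']:.4f} f1={m['f1']:.4f} recomputed_f1={recomputed[name]:.4f}")
```

On these numbers, Gradient Boosting leads on accuracy, precision, and F1, while XGBoost has a marginally higher recall (0.8135 vs. 0.813).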

---

## 🌐 Deployment Details
- **Backend**: FastAPI hosted on Hugging Face Spaces  
  - Endpoint: `/predict`  
  - Docs: [Swagger UI](https://sujith2121-credit-risk-assessment-fastapi.hf.space/docs)  
- **Frontend**: Streamlit hosted on Streamlit Community Cloud  
  - App: [Credit Risk Assessment Streamlit](https://credit-risk-assessment-stream.streamlit.app/)  
- **Note**: Both services are on free tiers, so they may **go to sleep when idle** and take a few seconds to wake up.
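
A client call to the `/predict` endpoint might look like the sketch below. It is hedged in several ways: the base URL is inferred from the Swagger UI link, the endpoint is assumed to accept a JSON `POST`, and the payload field names in the usage example are hypothetical, so the actual request schema should be checked in the Swagger UI. The retry loop is there only to tolerate the wake-up delay mentioned in the note above.

```python
import json
import time
import urllib.error
import urllib.request

# Assumed base URL, derived from the Swagger UI link above.
BASE_URL = "https://sujith2121-credit-risk-assessment-fastapi.hf.space"


def build_predict_request(applicant, base_url=BASE_URL):
    """Build a JSON POST request for /predict without sending it."""
    body = json.dumps(applicant).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(applicant, retries=3, timeout=30.0, wait=5.0):
    """Send the request, retrying while the free-tier service wakes up."""
    last_error = None
    for _ in range(retries):
        try:
            request = build_predict_request(applicant)
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return json.loads(response.read().decode("utf-8"))
        except urllib.error.HTTPError as exc:
            if exc.code < 500:
                raise  # client errors (e.g. a schema mismatch) will not fix themselves
            last_error = exc
        except (urllib.error.URLError, TimeoutError) as exc:
            last_error = exc
        time.sleep(wait)
    raise RuntimeError("prediction service did not respond") from last_error


if __name__ == "__main__":
    # Hypothetical field names; replace them with the schema shown in the Swagger UI.
    example = {"age": 30, "income": 55000, "loan_amount": 10000}
    print(predict(example))
```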

---

## 🎓 Learning Outcomes
- Learned how to preprocess and balance real-world credit datasets.  
- Compared multiple ML algorithms and tuned hyperparameters.  
- Understood trade-offs between accuracy, precision, recall, and F1-score.  
- Built and deployed a full-stack ML solution (FastAPI backend + Streamlit frontend).  
- Gained experience with Dockerization, Hugging Face Spaces, and Streamlit Cloud deployment.

---

## ⚠️ Challenges Faced
- Choosing appropriate evaluation metrics beyond accuracy.  
- Hyperparameter tuning for ensemble methods.  
- Deployment quirks (Hugging Face requires port 7860, Streamlit apps sleep when idle).  
- Ensuring reproducibility and smooth integration between backend and frontend.

---
