## Conclusions Logistic Regression

1. **General Performance**
   - All logistic regression models achieved **very high accuracy (>97%)**, but this metric is misleading due to class imbalance.
   - The key focus was on detecting **fraud cases** (minority class), which had only 95 instances in the test set.

2. **Recall on Fraud Class**
   - All models achieved **high recall** (≈0.87) for fraud, meaning they correctly identified most frauds.
   - This came at the cost of **very low precision** (≈0.05–0.11), indicating many false positives.

3. **Balanced SMOTE Models**
   - Models like `logistic_regression_bsmote_eval.pkl` and its variations:
     - Achieved **slightly higher precision** (up to 0.11) on the fraud class.
     - F1-score was still low (max ≈0.20).
     - ROC AUC varied between **0.927–0.944**, showing decent separation ability.

4. **Cleaned-Only Models**
   - Models without SMOTE (e.g. `logistic_regression_clean_eval_*.pkl`):
     - Had **extremely consistent recall ≈0.8737** across all variants.
     - Precision remained very low (~0.0529), suggesting limited improvement.
     - Slight increase in ROC AUC (**up to 0.9690** in the best case).
     - Many of these models were effectively **identical**, suggesting the optimization had converged.

5. **Conclusion**
   - Logistic regression is not suitable for fraud detection in this case.
   - Despite its good recall, the **model is too often wrong**: with precision of ≈0.05–0.11, roughly nine in ten or more of its fraud flags are legitimate transactions.
   - An **F1-score of < 0.20** indicates low overall quality.
   - The models are not ready for actual use in the banking environment.
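
The two headline claims above (accuracy is misleading, and F1 stays near 0.20 at best) can be checked with a few lines of arithmetic. The sketch below is illustrative only: `f1_from` and `always_legit_accuracy` are hypothetical helper names, and the test-set size of 57,000 is an assumption borrowed from the "~57k samples" figure quoted for the boosted models, not a value reported for these runs.

```python
def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def always_legit_accuracy(n_fraud: int, n_total: int) -> float:
    """Accuracy of a trivial model that labels every transaction as legitimate."""
    return (n_total - n_fraud) / n_total


# Precision/recall pairs quoted in the conclusions above.
f1_clean = f1_from(precision=0.0529, recall=0.8737)  # cleaned-only models
f1_bsmote = f1_from(precision=0.11, recall=0.87)     # best SMOTE-based precision

# 95 fraud cases comes from the text; 57_000 total samples is an assumption.
baseline_acc = always_legit_accuracy(n_fraud=95, n_total=57_000)

print(f"F1 (cleaned-only):     {f1_clean:.3f}")
print(f"F1 (best SMOTE):       {f1_bsmote:.3f}")
print(f"Always-legit accuracy: {baseline_acc:.4f}")
```

Under these assumptions, a model that never flags fraud would already score about 99.8% accuracy without detecting a single case, which is why precision, recall, and F1 on the fraud class are the metrics to watch.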

## Conclusions RandomForest

1. The model showed very high overall accuracy (about 0.9999) and detected fraud well even with a modified decision threshold (0.7), which gives better control over the trade-off between false positives and false negatives.

2. The recall for the Fraud class reached 95.8%, so the model detects almost all cases of fraud.

3. The F1-score for Fraud was in the range of ~0.95–0.98, which indicates a good balance between precision and recall and a strong ability to handle the under-represented class.

4. ROC AUC was ~0.99+, which confirms the model's excellent ability to separate the classes across different probability thresholds.

5. Slight overfitting is possible, as some models had perfect metrics on the test set (e.g., 100% accuracy for the Legit class and almost 96% recall for Fraud). This may indicate that the model has partially fitted itself to the data, so it should be tested on a completely independent dataset.
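
The threshold adjustment mentioned in point 1 amounts to comparing the predicted fraud probability against 0.7 instead of the usual 0.5. Here is a minimal sketch using made-up probabilities rather than real model output. `apply_threshold` is a hypothetical helper; with scikit-learn, the probabilities would typically come from the second column of the classifier's `predict_proba` output.

```python
def apply_threshold(fraud_probs, threshold=0.5):
    """Flag a transaction as fraud (1) when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in fraud_probs]


# Hypothetical fraud probabilities for five transactions (not real model output).
probs = [0.05, 0.55, 0.65, 0.72, 0.95]

default_flags = apply_threshold(probs)                # default cut-off of 0.5
strict_flags = apply_threshold(probs, threshold=0.7)  # cut-off used in this section

print("threshold 0.5:", default_flags)
print("threshold 0.7:", strict_flags)
```

Raising the threshold can only remove fraud flags, never add them, so it trades some recall for fewer false positives.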

## Conclusions CatBoost & LightGBM Models

1. **General Performance**
   - Both CatBoost and LightGBM models achieved **very high accuracy (~99.96–99.98%)**, **excellent ROC AUC (>0.997)**, and strong fraud detection metrics.
   - These results place them among the **top-performing models**, comparable to the best XGBoost variants.

2. **CatBoost Tuned Model (`catboost_tuned_eval.pkl`)**
   - Achieved **precision = 0.94**, **recall = 0.97**, and **F1-score = 0.95** on the fraud class.
   - The model made only **3 false negatives** and **6 false positives** out of ~57k samples.
   - ROC AUC = **0.9976**, indicating very strong discrimination capability.
   - **Excellent balance** between catching fraud and minimizing false alarms.

3. **LightGBM Tuned Model (`lightgbm_tuned_eval.pkl`)**
   - Achieved **recall = 0.97**, but slightly lower **precision = 0.81** on fraud class.
   - Produced more false positives (**21**) than CatBoost, which slightly lowered its F1-score (0.88).
   - ROC AUC = **0.9980**, still among the highest.
   - **Strong recall, moderate precision**: may be suitable where catching fraud is more critical than reducing false positives.

4. **Conclusion**
   - Both models are **highly effective** and can be used in production.
   - **CatBoost** outperforms LightGBM in **precision and overall balance**, making it slightly more robust.
   - **LightGBM** reached the same recall (0.97) and may still be preferred where extra false positives are acceptable and model speed matters.
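
The reported precision, recall, and F1 values follow directly from the error counts. The sketch below reproduces them under two assumptions that the text does not state outright: that the test set contains the same 95 fraud cases mentioned for logistic regression, and that LightGBM, like CatBoost, missed 3 of them (consistent with its recall of 0.97). `fraud_metrics` is a hypothetical helper.

```python
def fraud_metrics(n_fraud: int, false_neg: int, false_pos: int) -> dict:
    """Precision, recall and F1 for the fraud class from raw error counts."""
    true_pos = n_fraud - false_neg
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / n_fraud
    f1 = 2 * true_pos / (2 * true_pos + false_pos + false_neg)
    return {"precision": precision, "recall": recall, "f1": f1}


N_FRAUD = 95  # assumption: same test split as in the logistic regression section

catboost = fraud_metrics(N_FRAUD, false_neg=3, false_pos=6)
lightgbm = fraud_metrics(N_FRAUD, false_neg=3, false_pos=21)  # false_neg is assumed

for name, metrics in (("CatBoost", catboost), ("LightGBM", lightgbm)):
    print(name, {key: round(value, 2) for key, value in metrics.items()})
```

With equal recall, the gap between the two models comes entirely from the 15 extra false positives, which lower LightGBM's precision and therefore its F1-score.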

## Conclusions Neural Network (NN)

1. **General Performance**
   - The tuned neural network model achieved **very high accuracy (~99.97%)**, close to the best tree-based models.
   - However, as always, accuracy is **misleading** due to class imbalance.

2. **Recall vs. Precision Trade-off**
   - With the standard threshold, the model achieved **recall ≈0.79** and **precision ≈0.99** for fraud detection:
     - This means the model **rarely makes false alarms** but still **misses some fraud cases**.
   - When the threshold was lowered (e.g., 0.1), recall increased to **~0.89**, but precision dropped to **~0.19**:
     - The model detected more fraud but **made many incorrect fraud predictions**.

3. **F1-score and Balance**
   - F1-score ranged from **~0.32 to ~0.88**, depending on the threshold.
   - This indicates that the model can either:
     - Be **very conservative** (few false positives, lower recall), or
     - Be **aggressive** (high recall, low precision), depending on threshold tuning.

4. **Conclusion**
   - The neural network shows **potential**, but requires **careful threshold selection** depending on business priorities.
   - While it does not outperform the best tuned XGBoost or CatBoost models, its results are **still solid**.
   - May be useful in ensemble settings or where neural networks are preferred.
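
The threshold trade-off described above can be explored by sweeping candidate thresholds and recomputing the fraud-class metrics at each one. The sketch below uses a tiny synthetic set of labels and scores, not the project's data. `precision_recall_f1` is a hypothetical helper, and picking the threshold with the best F1 is only one possible selection rule; a real deployment might instead fix a minimum recall or weigh the cost of each error type.

```python
def precision_recall_f1(labels, scores, threshold):
    """Fraud-class precision, recall and F1 when flagging scores >= threshold."""
    tp = fp = fn = 0
    for label, score in zip(labels, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Synthetic example: 1 = fraud, 0 = legitimate (illustrative values only).
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
scores = [0.02, 0.05, 0.08, 0.12, 0.15, 0.30, 0.20, 0.45, 0.60, 0.85, 0.95, 0.99]

thresholds = [0.1, 0.5, 0.9]
results = {t: precision_recall_f1(labels, scores, t) for t in thresholds}

for t, (p, r, f1) in results.items():
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}  f1={f1:.2f}")

# One possible selection rule: keep the threshold with the highest F1.
best_threshold = max(thresholds, key=lambda t: results[t][2])
print("best threshold by F1:", best_threshold)
```

On this toy data the low threshold catches every fraud case at the cost of false alarms, while the high threshold avoids false alarms but misses cases, mirroring the conservative and aggressive regimes described for the neural network.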

## Summary

- **Logistic Regression** had good recall but very low precision: it marked too many normal transactions as fraud.  
  ➤ Not suitable for real use.

- **Random Forest** worked well but might be slightly overfitted.  
  ➤ Needs more testing on new data.

- **XGBoost** (with tuning or class weights) was the best:
  - High precision and recall (≈96–100%)
  - Some models made **zero** false alerts  
  ➤ Ready for real use.

- **CatBoost and LightGBM** also showed great results.
  - **CatBoost** was more balanced.
  - **LightGBM** matched CatBoost's recall but raised more false alarms.  
  ➤ Both are strong choices.

- **Neural Network (NN)** had good accuracy but was less reliable than boosted trees.  
  ➤ Can be used together with other models.

**Conclusion**:  
**XGBoost and CatBoost are the best choices**. They are accurate, reliable, and safe to use for real fraud detection.