# **Machine Learning & MLOps**

### Objectives

Build and evaluate distress prediction models with comprehensive MLOps tracking:

**Models tested:**
- Logistic Regression (interpretable baseline)
- Random Forest (ensemble, non-linear)
- Gradient Boosting (trees added sequentially, each correcting the previous ones' errors)
- XGBoost (optimized gradient boosting)
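
As a rough illustration of how these candidates might be compared on equal footing, the sketch below fits each one on a synthetic, imbalanced dataset and reports hold-out ROC AUC. The data from `make_classification`, the hyperparameters, and the `candidates` dictionary are assumptions for illustration, not the notebook's actual data or configuration. XGBoost is included only if the package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the distress dataset (assumption): ~10% positive class.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Illustrative, untuned hyperparameters.
candidates = {
    # Scaling matters for the linear model, so it is wrapped in a pipeline.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

# XGBoost is an optional dependency in this sketch.
try:
    from xgboost import XGBClassifier
    candidates["xgboost"] = XGBClassifier(n_estimators=100, random_state=42)
except ImportError:
    pass

# Fit every candidate on the same split and score on the same hold-out set.
results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = roc_auc_score(y_test, proba)

for name, auc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} ROC AUC = {auc:.3f}")
```

ROC AUC is used here because accuracy can look deceptively high when the positive class is rare.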

**MLOps integration:**
- MLflow experiment tracking (parameters, metrics, artifacts)
- Model versioning and registry
- Performance monitoring over time
- Reproducible training pipelines

**Validation approach:**
- Time-aware train/test split (train on earlier periods, test on later ones)
- Cross-validation to check that scores are stable across folds
- Backtesting on held-out periods
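
The validation steps above can be sketched as follows, under stated assumptions. The synthetic daily data and the `date`, `x1`, `x2`, and `distress` columns are hypothetical, and scikit-learn's `TimeSeriesSplit` is used as one time-ordered form of cross-validation; the notebook itself imports `StratifiedKFold`, so its actual scheme may differ.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import TimeSeriesSplit

# Synthetic, time-indexed stand-in for the distress dataset (assumption).
rng = np.random.default_rng(42)
n = 400
df = pd.DataFrame({
    "date": pd.date_range("2015-01-31", periods=n, freq="D"),
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
})
noise = rng.normal(scale=0.5, size=n)
df["distress"] = (df["x1"] + 0.5 * df["x2"] + noise > 0.8).astype(int)
df = df.sort_values("date").reset_index(drop=True)

features = ["x1", "x2"]

# Temporal hold-out: the most recent 20% of rows is never seen during training.
cutoff = int(len(df) * 0.8)
train, test = df.iloc[:cutoff], df.iloc[cutoff:]

# Expanding-window cross-validation inside the training period:
# each fold trains on earlier rows and validates on the rows that follow.
tscv = TimeSeriesSplit(n_splits=4)
fold_aucs = []
for train_idx, val_idx in tscv.split(train):
    fold_train, fold_val = train.iloc[train_idx], train.iloc[val_idx]
    model = LogisticRegression().fit(fold_train[features], fold_train["distress"])
    proba = model.predict_proba(fold_val[features])[:, 1]
    fold_aucs.append(roc_auc_score(fold_val["distress"], proba))

# Backtest: refit on the full training period, score on the held-out period.
final_model = LogisticRegression().fit(train[features], train["distress"])
holdout_proba = final_model.predict_proba(test[features])[:, 1]
holdout_auc = roc_auc_score(test["distress"], holdout_proba)

print("Fold ROC AUCs:", [round(a, 3) for a in fold_aucs])
print(f"Hold-out ROC AUC: {holdout_auc:.3f}")
```

Splitting by position after sorting on `date` keeps future observations out of the training set, which a random shuffle would not guarantee.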

**Import Libraries**

- ML algorithms (scikit-learn and XGBoost)
- Evaluation metrics
- MLflow for experiment tracking

In [3]:
# Data manipulation
import pandas as pd
import numpy as np

# Machine Learning
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (
    classification_report, confusion_matrix, roc_auc_score, 
    roc_curve, precision_recall_curve, f1_score, accuracy_score
)

# ML Algorithms
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# MLOps
import mlflow
import mlflow.sklearn
from mlflow.models import infer_signature

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Database
from sqlalchemy import create_engine

# Utilities
import json
import warnings
from datetime import datetime

# Suppress library warnings to keep notebook output readable
warnings.filterwarnings('ignore')

# Set plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("Libraries loaded successfully")
print(f"MLflow version: {mlflow.__version__}")

Libraries loaded successfully
MLflow version: 3.6.0
