## Model Building

### Objective
The goal of this step is to build predictive models to classify which bank customers are likely to churn.  
We will train three models:

1. **Logistic Regression:** a simple, interpretable linear baseline.  
2. **Random Forest:** an ensemble of decision trees that captures non-linear relationships and feature interactions.  
3. **XGBoost:** a gradient boosting algorithm that is efficient and often among the most accurate models on tabular data.
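The practical difference between a linear baseline and a tree ensemble shows up clearly on a toy problem. The sketch below uses synthetic XOR-style data (an illustrative assumption, not the bank churn data), where the label depends on an interaction between two features that no single straight line can separate, so the Random Forest should score far higher than Logistic Regression.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic XOR-style data (NOT the churn data): the label is 1 when both
# features share the same sign, which a linear decision boundary cannot capture.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Linear baseline: its decision boundary is a single straight line in 2-D
lr_acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

# Tree ensemble: axis-aligned splits can carve out the four quadrants
rf_acc = (
    RandomForestClassifier(n_estimators=100, random_state=42)
    .fit(X_tr, y_tr)
    .score(X_te, y_te)
)

print(f"Logistic Regression accuracy: {lr_acc:.2f}")
print(f"Random Forest accuracy:       {rf_acc:.2f}")
```

On real churn data the gap may be much smaller; the point is only that a linear model cannot represent feature interactions unless they are supplied explicitly as features.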

In [9]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
import joblib

In [6]:
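# Load prepared training data
X_train = pd.read_csv('../data/X_train_prepared.csv')
# squeeze("columns") turns the single-column label file into a 1-D Series,
# the target shape that scikit-learn and XGBoost expect
y_train = pd.read_csv('../data/y_train.csv').squeeze("columns")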
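`pd.read_csv` always returns a DataFrame, so a one-column label file is loaded as an `(n, 1)` column vector, and scikit-learn estimators emit a `DataConversionWarning` when the target has that shape instead of being 1-D. The minimal sketch below uses a hypothetical in-memory CSV with a made-up `Exited` label column to show the shape before and after `squeeze("columns")`.

```python
import io

import pandas as pd

# Hypothetical stand-in for y_train.csv: a header row plus one label column
csv_text = "Exited\n0\n1\n0\n1\n"

# read_csv returns a 2-D DataFrame even when the file has a single column
y_frame = pd.read_csv(io.StringIO(csv_text))
print(y_frame.shape)   # (4, 1)

# squeeze("columns") collapses a single-column DataFrame into a 1-D Series
y_series = pd.read_csv(io.StringIO(csv_text)).squeeze("columns")
print(y_series.shape)  # (4,)
```

Calling `.values.ravel()` on the DataFrame would also produce a 1-D target, but as a NumPy array rather than a Series, so the column name would be lost.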

In [7]:

# Define models
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    # use_label_encoder is omitted: recent XGBoost versions ignore it and warn that it is unused
    "XGBoost": XGBClassifier(eval_metric='logloss', random_state=42)
}
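Every model above is given `random_state=42` so that re-running the notebook reproduces the same fitted models. As a small sketch on synthetic data from `make_classification` (illustrative only, not the churn data), two Random Forests built with the same seed draw the same bootstrap samples and feature subsets, and therefore produce identical predicted probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic dataset used only for illustration
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Same seed -> same random choices during training -> same fitted forest
rf_a = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)
rf_b = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

same_seed_match = np.array_equal(rf_a.predict_proba(X), rf_b.predict_proba(X))
print("Same seed, identical probabilities:", same_seed_match)
```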

In [8]:
# Train baseline models
print("---- Baseline Model Training ----")
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name} trained successfully.")


---- Baseline Model Training ----
Logistic Regression trained successfully.
Random Forest trained successfully.
XGBoost trained successfully.
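The training loop relies on the fact that scikit-learn-style `fit()` trains the estimator in place and returns the same object, so after the loop the `models` dictionary holds the fitted models that are saved next. A minimal sketch on synthetic data (XGBoost is left out so the example needs only scikit-learn, and the `is_fitted` helper is hypothetical) checks this with `check_is_fitted`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.utils.validation import check_is_fitted

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=20, random_state=42),
}


def is_fitted(estimator):
    """Hypothetical helper: True if check_is_fitted finds learned attributes."""
    try:
        check_is_fitted(estimator)
        return True
    except NotFittedError:
        return False


before = {name: is_fitted(m) for name, m in models.items()}
for name, model in models.items():
    returned = model.fit(X, y)  # trains in place and returns the same object
    assert returned is model
after = {name: is_fitted(m) for name, m in models.items()}

print("Fitted before loop:", before)
print("Fitted after loop: ", after)
```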






In [15]:
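# Save each trained model to ../models/, creating the folder if it does not exist yet
import os

os.makedirs("../models", exist_ok=True)
joblib.dump(models["Logistic Regression"], "../models/logistic_regression.pkl")
joblib.dump(models["Random Forest"], "../models/random_forest.pkl")
joblib.dump(models["XGBoost"], "../models/xgboost.pkl")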

print("All models saved in models/ folder")

All models saved in models/ folder
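To confirm a saved file is usable, it can be loaded back with `joblib.load` and compared with the in-memory model. The sketch below does this for a Logistic Regression trained on synthetic data and written to a temporary directory (both assumptions for illustration, not the notebook's `../models/` files).

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "logistic_regression.pkl")
    joblib.dump(model, path)      # serialize the fitted estimator to disk
    restored = joblib.load(path)  # read it back as a new, independent object

# The reloaded copy should reproduce the original predictions exactly
round_trip_ok = np.array_equal(model.predict_proba(X), restored.predict_proba(X))
print("Reloaded model gives identical probabilities:", round_trip_ok)
```

Pickled models should be loaded with the same scikit-learn (and XGBoost) versions used to save them, and only from trusted sources, because unpickling can execute arbitrary code.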
