* Objective: To establish a simple, end-to-end machine learning pipeline with basic features and a straightforward model. Its scores serve as the benchmark against which more complex models and feature engineering are later compared.

* Key Library Choices: `pandas`, `numpy`, `scikit-learn` (for preprocessing, model selection, metrics), `xgboost` or `lightgbm` (for a robust baseline classifier).

* Specific Technical Steps/Code Snippets:

**Data Preparation (Basic):** Load pre-processed data (or perform minimal joins/aggregations if not yet done). Select a subset of directly available features.
**Target Variable Engineering:** Apply the defined churn logic to create the binary target variable (`is_churn`).


In [None]:
# ----------------------------
# 1. Observation & Prediction Windows
# ----------------------------
# The snapshot sits 90 days before the latest purchase; churn is predicted for the ~3 months after it
snapshot_date = df_transactions['order_purchase_timestamp'].max() - pd.Timedelta(days=90)
observation_window_end = snapshot_date
observation_window_start = observation_window_end - pd.Timedelta(days=180)  # ~6 months of history before the snapshot
prediction_window_start = snapshot_date  # prediction window opens where the observation window closes
prediction_window_end = df_transactions['order_purchase_timestamp'].max()  # ~3 months after the snapshot

# ----------------------------
# 2. Aggregate customer activity in observation window
# ----------------------------
customer_activity = df_transactions[
    (df_transactions['order_purchase_timestamp'] >= observation_window_start) &
    (df_transactions['order_purchase_timestamp'] < observation_window_end)
].groupby('customer_unique_id').agg(
    last_purchase=('order_purchase_timestamp', 'max'),
    total_orders=('order_id', 'nunique'),
    monetary=('price', 'sum')  # total spend
).reset_index()

# ----------------------------
# 3. Check activity in prediction window
# ----------------------------
customer_future_activity = df_transactions[
    (df_transactions['order_purchase_timestamp'] >= prediction_window_start) &
    (df_transactions['order_purchase_timestamp'] <= prediction_window_end)
].groupby('customer_unique_id').agg(
    future_orders=('order_id', 'nunique')
).reset_index()

# ----------------------------
# 4. Merge & define churn
# ----------------------------
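# Left merge keeps only customers who were active in the observation window
baseline_df = pd.merge(customer_activity, customer_future_activity, on='customer_unique_id', how='left')
# No match in the prediction window means zero future orders
baseline_df['future_orders'] = baseline_df['future_orders'].fillna(0)
# total_orders > 0 always holds after the groupby above; it is kept to state the churn rule explicitly
baseline_df['is_churn'] = ((baseline_df['total_orders'] > 0) & (baseline_df['future_orders'] == 0)).astype(int)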
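
To make the window logic concrete, here is a minimal sketch on a toy transaction table. The customer IDs, dates, and prices are invented for illustration; only the column names and the window lengths follow the cell above.

```python
import pandas as pd

# Hypothetical toy transactions (values invented for illustration)
toy = pd.DataFrame({
    'customer_unique_id': ['a', 'a', 'b', 'c'],
    'order_id': ['o1', 'o2', 'o3', 'o4'],
    'order_purchase_timestamp': pd.to_datetime(
        ['2018-03-01', '2018-08-15', '2018-04-10', '2018-08-30']),
    'price': [50.0, 20.0, 35.0, 80.0],
})

ts = toy['order_purchase_timestamp']
max_ts = ts.max()
snapshot = max_ts - pd.Timedelta(days=90)       # end of observation, start of prediction
obs_start = snapshot - pd.Timedelta(days=180)   # start of observation

# Customers active in the observation window
observed = (toy[(ts >= obs_start) & (ts < snapshot)]
            .groupby('customer_unique_id')
            .agg(total_orders=('order_id', 'nunique'))
            .reset_index())

# Orders placed in the prediction window
future = (toy[(ts >= snapshot) & (ts <= max_ts)]
          .groupby('customer_unique_id')
          .agg(future_orders=('order_id', 'nunique'))
          .reset_index())

labels = observed.merge(future, on='customer_unique_id', how='left')
labels['future_orders'] = labels['future_orders'].fillna(0)
labels['is_churn'] = (labels['future_orders'] == 0).astype(int)
print(labels)
```

In this toy data, customer `a` buys again after the snapshot and is labeled as retained, customer `b` does not and is labeled as churned, and customer `c` never appears in the output because their only order falls after the snapshot.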

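**Feature Selection:** Select simple numerical features (e.g., `total_orders`, `last_purchase_recency`) and one-hot encode simple categorical features (e.g., `customer_state`). Exclude `future_orders` from the features: `is_churn` is derived directly from it, so keeping it would leak the label into the model.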
**Train/Test Split:** Split the data into training and testing sets.
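
The one-hot encoding mentioned under Feature Selection could look like the sketch below. It assumes a `customer_state` column has been joined onto the feature table, which the cells in this section do not do; the frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical feature frame; the rows are invented for illustration
features = pd.DataFrame({
    'total_orders': [1, 3, 2],
    'last_purchase_recency': [12, 95, 40],
    'customer_state': ['SP', 'RJ', 'SP'],
})

# One indicator column per state; numerical columns pass through unchanged
encoded = pd.get_dummies(features, columns=['customer_state'], dtype=int)
print(encoded.columns.tolist())
```

Encoding before the train/test split keeps the indicator columns identical in both sets. If encoding happens after the split instead, scikit-learn's `OneHotEncoder(handle_unknown='ignore')` fitted on the training set is one way to avoid mismatched columns.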

In [None]:
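# ----------------------------
# 5. Feature selection
# ----------------------------
# Recency in days, measured at the snapshot date (end of the observation window)
baseline_df['last_purchase_recency'] = (snapshot_date - baseline_df['last_purchase']).dt.days

# ----------------------------
# 6. Train/Test split
# ----------------------------
from sklearn.model_selection import train_test_split

# Drop the ID, the target, the raw datetime (not a numeric feature), and future_orders,
# which defines is_churn and would leak the label into the features
X = baseline_df.drop(['customer_unique_id', 'is_churn', 'last_purchase', 'future_orders'], axis=1)
y = baseline_df['is_churn']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)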
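
The `stratify=y` argument keeps the churn rate roughly equal in the training and test sets. A minimal sketch on synthetic labels (a 20% positive rate chosen purely for illustration, not the real churn rate) shows the effect:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with a 20% positive rate (invented for illustration)
y_demo = np.array([1] * 20 + [0] * 80)
X_demo = np.arange(100).reshape(-1, 1)

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratification, each split mirrors the overall class ratio
print(f"overall: {y_demo.mean():.2f}, train: {y_tr.mean():.2f}, test: {y_te.mean():.2f}")
```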

**Model Training:** Train a simple classifier (e.g., `LogisticRegression` or `XGBClassifier`).

In [None]:
# ----------------------------
# 7. Model: XGBoost
# ----------------------------
from xgboost import XGBClassifier

model = XGBClassifier(random_state=42, eval_metric='logloss')
model.fit(X_train, y_train)
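
For the `LogisticRegression` alternative named above, a minimal sketch might look like the following. It runs on synthetic data from `make_classification` as a stand-in for the churn features, and the scaling step and `max_iter` value are assumptions rather than settings taken from this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn features (not the real data)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Scale first: logistic regression is sensitive to feature magnitude,
# and columns such as order counts and spend live on very different scales
logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
logreg.fit(X_tr, y_tr)
print(f"Test accuracy: {logreg.score(X_te, y_te):.4f}")
```

Because the pipeline exposes the same `fit`, `predict`, and `predict_proba` methods as `XGBClassifier`, it can be swapped in for `model` without changing the evaluation code.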

**Baseline Evaluation:** Evaluate model performance using standard metrics (Accuracy, Precision, Recall, F1-Score, ROC AUC).

In [None]:
# ----------------------------
# 8. Predictions & Metrics
# ----------------------------
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]

print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(f"Precision: {precision_score(y_test, y_pred):.4f}")
print(f"Recall: {recall_score(y_test, y_pred):.4f}")
print(f"F1 Score: {f1_score(y_test, y_pred):.4f}")
print(f"ROC AUC: {roc_auc_score(y_test, y_proba):.4f}")
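
These metrics are easier to read next to a trivial reference point. The sketch below assumes, purely for illustration, labels that are 80% churn (this section does not report the actual class balance) and scores a majority-class predictor built with scikit-learn's `DummyClassifier`.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Synthetic imbalanced labels: 80% churn (1), 20% retained (0); invented for illustration
y_true = np.array([1] * 80 + [0] * 20)
X_dummy = np.zeros((100, 1))  # features are ignored by the dummy model

# Always predicts the most frequent class seen during fit
majority = DummyClassifier(strategy='most_frequent').fit(X_dummy, y_true)
y_major = majority.predict(X_dummy)

acc = accuracy_score(y_true, y_major)
rec_retained = recall_score(y_true, y_major, pos_label=0)
print(f"Accuracy: {acc:.2f}, recall on retained customers: {rec_retained:.2f}")
```

A model that never predicts the minority class still reaches high accuracy on imbalanced labels, which is why Precision, Recall, F1-Score, and ROC AUC are reported alongside Accuracy.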