### Section 1: Marketing Campaign


# --- Marketing Campaign Dataset ---
import pandas as pd
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.model_selection import RepeatedKFold, GridSearchCV
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# helpers for compatibility across scikit-learn versions
def rmse_score(y_true, y_pred):
    try:
        return mean_squared_error(y_true, y_pred, squared=False)
    except TypeError:
        return np.sqrt(mean_squared_error(y_true, y_pred))

def make_ohe():
    try:
        return OneHotEncoder(handle_unknown="ignore", sparse_output=False)
    except TypeError:
        return OneHotEncoder(handle_unknown="ignore", sparse=False)

# load data 
df_marketing = pd.read_csv("marketing_campaign_cleaned.csv")
target = "Response"

print("[Marketing] shape:", df_marketing.shape)
print("[Marketing] target distribution:")
print(df_marketing[target].value_counts())

X = df_marketing.drop(columns=[target])
y = df_marketing[target]
numeric_cols = X.select_dtypes(include=[np.number]).columns.tolist()
categorical_cols = X.select_dtypes(exclude=[np.number]).columns.tolist()

# preprocessing 
numeric_pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler())
])
categorical_pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", make_ohe())
])
preprocessor = ColumnTransformer([
    ("num", numeric_pipe, numeric_cols),
    ("cat", categorical_pipe, categorical_cols)
])

# CV + grids 
cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=42)
ridge_grid = {"model__alpha": np.logspace(-3, 3, 13)}
lasso_grid = {"model__alpha": np.logspace(-3, 1, 9)}
enet_grid  = {"model__alpha": np.logspace(-3, 1, 9),
              "model__l1_ratio": np.linspace(0.1, 0.9, 9)}

# run model helper 
def run_model(name, model, grid):
    pipe = Pipeline([("pre", preprocessor), ("model", model)])
    search = GridSearchCV(pipe, grid, cv=cv,
                          scoring="neg_mean_absolute_error", n_jobs=-1)
    search.fit(X, y)
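    # NOTE: these metrics are in-sample (best_estimator_ is refit on all of X);
    # search.best_score_ holds the mean cross-validated (negative) MAE.
    y_pred = search.best_estimator_.predict(X)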
    r2 = r2_score(y, y_pred)
    mae = mean_absolute_error(y, y_pred)
    rmse = rmse_score(y, y_pred)
    print(f"\n[{name}] Best params: {search.best_params_}")
    print(f"[{name}] R2={r2:.4f} | MAE={mae:.4f} | RMSE={rmse:.4f}")
    return search.best_estimator_

# run all three 
ridge_best = run_model("Ridge", Ridge(), ridge_grid)
lasso_best = run_model("Lasso", Lasso(max_iter=20000, random_state=42), lasso_grid)
enet_best  = run_model("ElasticNet", ElasticNet(max_iter=20000, random_state=42), enet_grid)

# Feature Importance (Top 10)
def show_top_features(estimator, X):
    pre = estimator.named_steps["pre"]
    model = estimator.named_steps["model"]
    try:
        feature_names = pre.get_feature_names_out()
    except Exception:
        # fall back to positional indices if feature names are unavailable
        feature_names = list(range(len(model.coef_)))
    coefs = model.coef_
    coef_df = pd.DataFrame({
        "feature": feature_names,
        "coefficient": coefs,
        "abs_coeff": np.abs(coefs)
    }).sort_values("abs_coeff", ascending=False).head(10)
    print(coef_df)

print("\nTop features - Ridge")
show_top_features(ridge_best, X)

print("\nTop features - Lasso")
show_top_features(lasso_best, X)

print("\nTop features - ElasticNet")
show_top_features(enet_best, X)



[Marketing] shape: (2216, 29)
[Marketing] target distribution:
Response
0    1883
1     333
Name: count, dtype: int64

[Ridge] Best params: {'model__alpha': np.float64(316.22776601683796)}
[Ridge] R2=0.3101 | MAE=0.2000 | RMSE=0.2968

[Lasso] Best params: {'model__alpha': np.float64(0.01)}
[Lasso] R2=0.2818 | MAE=0.2009 | RMSE=0.3028

[ElasticNet] Best params: {'model__alpha': np.float64(0.01), 'model__l1_ratio': np.float64(0.7000000000000001)}
[ElasticNet] R2=0.2964 | MAE=0.2002 | RMSE=0.2997

Top features - Ridge
                         feature  coefficient  abs_coeff
5                   num__Recency    -0.060892   0.060892
17             num__AcceptedCmp3     0.059695   0.059695
19             num__AcceptedCmp5     0.055934   0.055934
20             num__AcceptedCmp1     0.044966   0.044966
8           num__MntMeatProducts     0.042899   0.042899
15        num__NumStorePurchases    -0.040735   0.040735
33   cat__Marital_Status_Married    -0.036436   0.036436
34    cat__Marital_Stat

## Marketing Campaign – Observations

### Model Performance
- **Ridge (α ≈ 316.2)** performed best with an **R² of 0.3101**, slightly ahead of Elastic Net (0.2964) and Lasso (0.2818).  
- Across all models, **MAE ≈ 0.20** and **RMSE ≈ 0.30**. On a 0/1 target these errors are not small, and the models explain only about 31% of the variance. The metrics are also computed on the training data, so they are optimistic.  
- This is consistent with the earlier EDA and Random Forest analysis: **marketing response is difficult to predict linearly** due to diffuse relationships and class imbalance (responders vs. non-responders).  
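To put MAE ≈ 0.20 and RMSE ≈ 0.30 in context, the sketch below computes two constant-prediction baselines directly from the class counts in the `[Marketing]` output (1883 non-responders, 333 responders). It assumes nothing beyond those counts. The last step uses the identity RMSE = RMSE_mean · sqrt(1 − R²), which holds when R² is computed with `r2_score` on the same rows.

```python
import numpy as np

# Class counts taken from the [Marketing] target distribution output
n_neg, n_pos = 1883, 333
y = np.concatenate([np.zeros(n_neg), np.ones(n_pos)])
p = y.mean()  # response rate

# Baseline 1: always predict the mean response rate (this is the R² = 0 reference)
mae_mean = np.mean(np.abs(y - p))
rmse_mean = np.sqrt(np.mean((y - p) ** 2))

# Baseline 2: always predict 0 (the median, which minimizes MAE)
mae_zero = np.mean(np.abs(y - 0.0))

# RMSE implied by Ridge's reported R²: RMSE = RMSE_mean * sqrt(1 - R²)
rmse_implied_ridge = rmse_mean * np.sqrt(1 - 0.3101)

print(f"response rate={p:.4f}")
print(f"mean baseline: MAE={mae_mean:.4f} | RMSE={rmse_mean:.4f}")
print(f"all-zero baseline: MAE={mae_zero:.4f}")
print(f"RMSE implied by R2=0.3101: {rmse_implied_ridge:.4f}")
```

Predicting 0 for every customer gives an MAE of about 0.15, lower than the fitted models' 0.20, because MAE rewards predicting the majority class. RMSE and R² are fairer yardsticks for these squared-error models, and the implied RMSE reproduces the reported 0.2968.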

---

### Feature Importance
**Consistently strong predictors across all three methods:**
- **Campaign acceptance indicators**: `AcceptedCmp3`, `AcceptedCmp5`, `AcceptedCmp1`  
  (positive influence → past acceptance increases response likelihood).  
- **Recency** (negative coefficient → customers with lower Recency, i.e. fewer days since their last purchase, are more likely to respond).  
- **MntMeatProducts** (positive → higher spending on meat correlates with responsiveness).  
- **NumWebVisitsMonth** (positive → digital engagement correlates with campaign response).  

**Demographics (Marital_Status, Education)** showed up in Ridge and Elastic Net but not as strongly in Lasso, suggesting these features are secondary drivers once regularization penalizes weaker signals.
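A quick way to check which predictors hold up across all three methods is to put the coefficients side by side and count exact zeros, since only the L1 penalty can set a coefficient exactly to zero. The sketch below does this on hypothetical synthetic data (`make_regression` with 8 features, 3 informative) rather than the marketing file, and the `alpha` values are illustrative, not the tuned ones. With the notebook's fitted pipelines you would read `named_steps["model"].coef_` instead.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# Hypothetical stand-in data: 8 features, only 3 carry signal
X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=10.0, random_state=0)
names = [f"x{i}" for i in range(X.shape[1])]

# Illustrative penalty strengths (not the notebook's tuned values)
models = {
    "Ridge": Ridge(alpha=10.0),
    "Lasso": Lasso(alpha=5.0),
    "ElasticNet": ElasticNet(alpha=5.0, l1_ratio=0.7),
}

# One column of coefficients per model, one row per feature
coef_table = pd.DataFrame(
    {name: m.fit(X, y).coef_ for name, m in models.items()}, index=names
)
print(coef_table.round(3))

# Exact zeros per model: Ridge shrinks coefficients but essentially never zeroes them
print((coef_table == 0).sum())
```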

---

### Interpretation
- The results confirm that **past campaign interactions and spending patterns** are the most valuable predictors for marketing responsiveness.  
- **Demographics alone** (e.g., marital status, education) play a smaller role compared to behavioral indicators.  
- The relatively modest R² values highlight the **inherent difficulty of predicting rare events (response = 1)** with linear methods — the imbalance (333 responders vs. 1883 non-responders) limits accuracy.  
- **Elastic Net’s blend of Ridge and Lasso** (best `l1_ratio` = 0.7, weighted toward L1) balances feature selection against coefficient shrinkage, surfacing both behavioral and demographic features.  
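The trade-off Elastic Net makes can be read off scikit-learn's documented objective, in which `alpha` scales the whole penalty and `l1_ratio` splits it between the L1 and L2 terms. The helper below (a hypothetical name, not part of the notebook) turns the Marketing best parameters into those two multipliers.

```python
def enet_penalty_weights(alpha, l1_ratio):
    """Split scikit-learn's ElasticNet penalty into its L1 and L2 multipliers.

    scikit-learn's ElasticNet minimizes
        1/(2n) * ||y - Xw||^2
        + alpha * l1_ratio * ||w||_1
        + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2
    """
    l1_weight = alpha * l1_ratio
    l2_weight = 0.5 * alpha * (1 - l1_ratio)
    return l1_weight, l2_weight


# Marketing ElasticNet best params from the output above
l1_weight, l2_weight = enet_penalty_weights(alpha=0.01, l1_ratio=0.7)
print(f"L1 weight={l1_weight:.4f} | L2 weight={l2_weight:.4f}")
```

Note that scikit-learn's `Ridge` minimizes ||y − Xw||² + α||w||² without the 1/(2n) factor, so Ridge's α ≈ 316 is on a different scale and is not directly comparable to the α values chosen for Lasso and Elastic Net.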


### Customer Churn Dataset 

# Customer Churn Dataset
import pandas as pd
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.model_selection import RepeatedKFold, GridSearchCV
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# helpers for compatibility across scikit-learn versions
def rmse_score(y_true, y_pred):
    try:
        return mean_squared_error(y_true, y_pred, squared=False)
    except TypeError:
        return np.sqrt(mean_squared_error(y_true, y_pred))

def make_ohe():
    try:
        return OneHotEncoder(handle_unknown="ignore", sparse_output=False)
    except TypeError:
        return OneHotEncoder(handle_unknown="ignore", sparse=False)

# load data
df_churn = pd.read_csv("customer_churn_cleaned.csv")
target = "Churn"

print("[Churn] shape:", df_churn.shape)
print("[Churn] target distribution:")
print(df_churn[target].value_counts())

X = df_churn.drop(columns=[target])
y = df_churn[target]
numeric_cols = X.select_dtypes(include=[np.number]).columns.tolist()
categorical_cols = X.select_dtypes(exclude=[np.number]).columns.tolist()

# preprocessing 
numeric_pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler())
])
categorical_pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", make_ohe())
])
preprocessor = ColumnTransformer([
    ("num", numeric_pipe, numeric_cols),
    ("cat", categorical_pipe, categorical_cols)
])

# CV + grids 
cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=42)
ridge_grid = {"model__alpha": np.logspace(-3, 3, 13)}
lasso_grid = {"model__alpha": np.logspace(-3, 1, 9)}
enet_grid  = {"model__alpha": np.logspace(-3, 1, 9),
              "model__l1_ratio": np.linspace(0.1, 0.9, 9)}

# run model helper 
def run_model(name, model, grid):
    pipe = Pipeline([("pre", preprocessor), ("model", model)])
    search = GridSearchCV(pipe, grid, cv=cv,
                          scoring="neg_mean_absolute_error", n_jobs=-1)
    search.fit(X, y)
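    # NOTE: these metrics are in-sample (best_estimator_ is refit on all of X);
    # search.best_score_ holds the mean cross-validated (negative) MAE.
    y_pred = search.best_estimator_.predict(X)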
    r2 = r2_score(y, y_pred)
    mae = mean_absolute_error(y, y_pred)
    rmse = rmse_score(y, y_pred)
    print(f"\n[{name}] Best params: {search.best_params_}")
    print(f"[{name}] R2={r2:.4f} | MAE={mae:.4f} | RMSE={rmse:.4f}")
    return search.best_estimator_

# run all three 
ridge_best = run_model("Ridge", Ridge(), ridge_grid)
lasso_best = run_model("Lasso", Lasso(max_iter=20000, random_state=42), lasso_grid)
enet_best  = run_model("ElasticNet", ElasticNet(max_iter=20000, random_state=42), enet_grid)

# Feature Importance (Top 10) 
def show_top_features(estimator, X):
    pre = estimator.named_steps["pre"]
    model = estimator.named_steps["model"]
    try:
        feature_names = pre.get_feature_names_out()
    except Exception:
        # fall back to positional indices if feature names are unavailable
        feature_names = list(range(len(model.coef_)))
    coefs = model.coef_
    coef_df = pd.DataFrame({
        "feature": feature_names,
        "coefficient": coefs,
        "abs_coeff": np.abs(coefs)
    }).sort_values("abs_coeff", ascending=False).head(10)
    print(coef_df)

print("\nTop features - Ridge")
show_top_features(ridge_best, X)

print("\nTop features - Lasso")
show_top_features(lasso_best, X)

print("\nTop features - ElasticNet")
show_top_features(enet_best, X)



[Churn] shape: (440832, 12)
[Churn] target distribution:
Churn
1.0    249999
0.0    190833
Name: count, dtype: int64

[Ridge] Best params: {'model__alpha': np.float64(0.001)}
[Ridge] R2=0.7854 | MAE=0.1829 | RMSE=0.2295

[Lasso] Best params: {'model__alpha': np.float64(0.001)}
[Lasso] R2=0.7853 | MAE=0.1831 | RMSE=0.2296

[ElasticNet] Best params: {'model__alpha': np.float64(0.001), 'model__l1_ratio': np.float64(0.1)}
[ElasticNet] R2=0.7854 | MAE=0.1830 | RMSE=0.2295

Top features - Ridge
                           feature  coefficient  abs_coeff
0                  num__CustomerID    -0.302301   0.302301
14    cat__Contract Length_Monthly     0.105317   0.105317
4               num__Support Calls     0.095997   0.095997
6                 num__Total Spend    -0.062226   0.062226
13     cat__Contract Length_Annual    -0.052772   0.052772
15  cat__Contract Length_Quarterly    -0.052547   0.052547
5               num__Payment Delay     0.040715   0.040715
8               cat__Gender_Female

## Customer Churn – Observations

### Model Performance
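All three models performed very similarly, and each selected α = 0.001, the smallest value in its grid, which suggests regularization adds little on this dataset: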

- **Ridge (α ≈ 0.001)**: R² = 0.7854, MAE = 0.1829, RMSE = 0.2295  
- **Lasso (α ≈ 0.001)**: R² = 0.7853, MAE = 0.1831, RMSE = 0.2296  
- **Elastic Net (α ≈ 0.001, l1_ratio = 0.1)**: R² = 0.7854, MAE = 0.1830, RMSE = 0.2295  

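The R² of ~0.79 means the models explain nearly 80% of the variance in churn outcomes, far more than in the Marketing dataset. Two caveats apply: these metrics are computed on the same rows the models were trained on, and `CustomerID` carries the largest Ridge coefficient, so part of this fit may reflect the ID column rather than customer behavior.  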
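Because `run_model()` scores `best_estimator_` on the data it was fit on, a held-out estimate is a useful check. Inside `run_model()`, `search.best_score_` already gives the mean cross-validated negative MAE for the best parameters. The sketch below shows one way to get cross-validated R², MAE and RMSE together with `cross_validate`, using the same `RepeatedKFold` settings; the synthetic data and the bare scaler pipeline are hypothetical stand-ins for the churn `X`, `y` and `preprocessor`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RepeatedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data; swap in the notebook's X, y and preprocessor
X, y = make_regression(n_samples=300, n_features=8, noise=20.0, random_state=0)
pipe = Pipeline([("scaler", StandardScaler()), ("model", Ridge(alpha=0.001))])
cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=42)

# Score every held-out fold with three metrics at once
scores = cross_validate(
    pipe, X, y, cv=cv,
    scoring={"r2": "r2",
             "mae": "neg_mean_absolute_error",
             "rmse": "neg_root_mean_squared_error"},
)
cv_r2 = scores["test_r2"].mean()
cv_mae = -scores["test_mae"].mean()    # flip sign: sklearn scorers are "higher is better"
cv_rmse = -scores["test_rmse"].mean()

# In-sample R², which is what run_model() reports
in_sample_r2 = pipe.fit(X, y).score(X, y)

print(f"CV        R2={cv_r2:.4f} | MAE={cv_mae:.4f} | RMSE={cv_rmse:.4f}")
print(f"In-sample R2={in_sample_r2:.4f}")
```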



---

### Feature Importance
**Dominant predictors:**
- **Contract Length_Monthly** → large positive coefficient: customers with monthly contracts are more likely to churn.  
- **Support Calls** → positive influence: frequent support calls correlate with higher churn risk.  
- **Payment Delay** → positive: delays in payments increase churn probability.  
- **Total Spend** → negative: higher spending customers are less likely to churn.  
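Because `make_ohe()` keeps every category (no `drop=` argument), the three `Contract Length` dummies always sum to one in each row and are collinear with the intercept. For Ridge, the L2 penalty resolves that ambiguity by making the group's coefficients sum to roughly zero, so each dummy coefficient is a deviation from the group average and the meaningful quantity is the difference between levels. The arithmetic below uses the Ridge coefficients printed in the churn output; it does not apply as cleanly to Lasso or Elastic Net, whose L1 term behaves differently.

```python
# Ridge coefficients copied from the "Top features - Ridge" output
contract_coefs = {
    "Monthly": 0.105317,
    "Annual": -0.052772,
    "Quarterly": -0.052547,
}

# With all levels one-hot encoded plus an intercept, Ridge centres the group,
# so the coefficients should sum to approximately zero
group_sum = sum(contract_coefs.values())

# Pairwise contrasts are the quantities the model actually identifies
pairs = [("Monthly", "Annual"), ("Monthly", "Quarterly"), ("Annual", "Quarterly")]
contrasts = {f"{a} - {b}": contract_coefs[a] - contract_coefs[b] for a, b in pairs}

print(f"group sum: {group_sum:.6f}")
for label, value in contrasts.items():
    print(f"{label}: {value:+.6f}")
```

Read this way, monthly contracts carry about 0.16 higher predicted churn than either annual or quarterly contracts, while annual and quarterly are nearly indistinguishable.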

**Moderate predictors:**
- **Gender**: Female positive, Male negative (but relatively smaller coefficients).  
- **Age** and **Last Interaction** had small but consistent positive effects.  
- **Tenure** and **Usage Frequency** showed weak negative relationships (slight retention effect).  

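**CustomerID** appears with a large coefficient in all models. This is likely an artifact of the ID being treated as a numeric feature (any ordering of IDs that happens to correlate with churn is picked up as signal), and since it carries no business meaning it should be excluded from modeling.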
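One way to apply this is to drop identifier columns before splitting features into numeric and categorical lists, so `CustomerID` never reaches the scaler. The sketch uses a tiny hypothetical DataFrame in place of `customer_churn_cleaned.csv`, and `drop_id_columns` is an illustrative helper, not part of the notebook.

```python
import numpy as np
import pandas as pd


def drop_id_columns(df, id_cols=("CustomerID",)):
    """Drop identifier columns (if present) that carry no business meaning."""
    present = [c for c in id_cols if c in df.columns]
    return df.drop(columns=present)


# Tiny hypothetical frame standing in for customer_churn_cleaned.csv
df = pd.DataFrame({
    "CustomerID": [101, 102, 103, 104],
    "Support Calls": [1, 5, 0, 7],
    "Contract Length": ["Monthly", "Annual", "Quarterly", "Monthly"],
    "Churn": [1.0, 0.0, 0.0, 1.0],
})

# Drop the target, then the ID, before building the column lists
X = drop_id_columns(df.drop(columns=["Churn"]))
numeric_cols = X.select_dtypes(include=[np.number]).columns.tolist()
categorical_cols = X.select_dtypes(exclude=[np.number]).columns.tolist()

print(numeric_cols)      # ['Support Calls']
print(categorical_cols)  # ['Contract Length']
```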

---

### Interpretation
The results strongly reinforce the story from the earlier EDA and Random Forest analysis:

- **Contract structure, service interaction, and payment behavior** are the most powerful churn drivers.  
- Customers on flexible (monthly) contracts with repeated support issues and late payments are the **highest churn risk**.  
- Stable, higher-spending customers are **least likely to leave**.  
- Linear models here perform well because the **relationships are direct and structured** — unlike the marketing responsiveness case, churn behavior shows clearer patterns.  
- Removing `CustomerID` in future iterations will refine interpretability further and avoid misleading importance rankings.  
