## 1. Decision Tree Hyperparameter Tuning

In [18]:
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1. Load Data
df = pd.read_csv('../data/telco_churn_processed.csv')
X = df.drop('Churn', axis=1)
y = df['Churn']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Define the "Grid" of settings to test
param_grid = {
    'max_depth': [3, 5, 7, 10, None],
    'min_samples_split': [2, 5, 10],
    'criterion': ['gini', 'entropy']
}

# 3. Setup the Grid Search
grid_search = GridSearchCV(estimator=DecisionTreeClassifier(random_state=42),
                           param_grid=param_grid,
                           cv=5,
                           verbose=1,
                           scoring='accuracy')

# 4. Run the Search
print("Starting Hyperparameter Tuning...")
grid_search.fit(X_train, y_train)

# 5. The Results
best_params = grid_search.best_params_
best_score = grid_search.best_score_

print(f"\n✅ Best Parameters Found: {best_params}")
print(f"✅ Best Cross-Validation Accuracy: {best_score:.4f}")

# 6. Evaluate the "Optimized" Model on Test Data
best_model = grid_search.best_estimator_
test_acc = accuracy_score(y_test, best_model.predict(X_test))
print(f"Test Set Accuracy of Optimized Tree: {test_acc:.4f}")

Starting Hyperparameter Tuning...
Fitting 5 folds for each of 30 candidates, totalling 150 fits

✅ Best Parameters Found: {'criterion': 'entropy', 'max_depth': 7, 'min_samples_split': 2}
✅ Best Cross-Validation Accuracy: 0.7931
Test Set Accuracy of Optimized Tree: 0.7765
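
The best score alone hides how close the other candidates came. Below is a minimal sketch of ranking every candidate through `cv_results_`. It runs on synthetic data from `make_classification` as a stand-in (an assumption, since the churn CSV is not reproduced here), and the `demo_` names are illustrative only.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn features (assumption: NOT the real dataset)
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=42)

# Same grid as in the cell above
param_grid = {
    'max_depth': [3, 5, 7, 10, None],
    'min_samples_split': [2, 5, 10],
    'criterion': ['gini', 'entropy'],
}

demo_search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                           param_grid=param_grid, cv=5, scoring='accuracy')
demo_search.fit(X_demo, y_demo)

# cv_results_ holds one row per candidate; sort by rank to compare the leaders
results = pd.DataFrame(demo_search.cv_results_)
cols = ['param_criterion', 'param_max_depth', 'param_min_samples_split',
        'mean_test_score', 'std_test_score', 'rank_test_score']
top5 = results.sort_values('rank_test_score')[cols].head(5)
print(top5.to_string(index=False))
```

If the top candidates differ by less than their `std_test_score`, the choice between them is probably noise, and the shallower tree may be the more defensible pick.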


## 2. Neural Network Hyperparameter Tuning
Unlike Decision Trees, where `GridSearchCV` is computationally inexpensive, Neural Networks are costlier to train, so instead of an exhaustive grid we run a **Manual Architecture Search**. We compare two configurations to determine whether increasing model complexity improves performance.

### **Experimental Setup**
* **Model A (Baseline):** The architecture defined in Notebook 03 (Input $\to$ 16 Neurons $\to$ 8 Neurons $\to$ Output). Optimized with `Adam`.
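* **Model B (Experiment):** A wider architecture with Dropout (Input $\to$ 64 Neurons $\to$ Dropout 20% $\to$ 32 Neurons $\to$ Output). It keeps the same two hidden layers as Model A but is optimized with `RMSprop`, so the optimizer changes together with the architecture.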
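
The size gap between the two architectures can be made concrete by counting trainable parameters. This is a rough sketch: `n_features = 30` is a hypothetical input width (the real value is `X_train.shape[1]`), and the helper names are illustrative. Dropout adds no trainable parameters, so it does not appear in the count.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def mlp_params(n_features, hidden_units):
    """Trainable parameters of the hidden Dense layers plus a 1-unit output layer."""
    total = 0
    n_in = n_features
    for n_out in hidden_units:
        total += dense_params(n_in, n_out)
        n_in = n_out
    return total + dense_params(n_in, 1)


n_features = 30  # hypothetical; replace with X_train.shape[1]
params_a = mlp_params(n_features, [16, 8])    # Model A: 16 -> 8 -> 1
params_b = mlp_params(n_features, [64, 32])   # Model B: 64 -> 32 -> 1
print(params_a, params_b, round(params_b / params_a, 1))  # 641 4097 6.4
```

Under that assumed input width, Model B has roughly six times as many parameters to fit from the same training set.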

### **Hypothesis**
A larger model with Dropout regularization (Model B) should capture more complex non-linear patterns and generalize better than the simpler Model A.

In [20]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input, Dropout

# 1. Define the Baseline Score (Model A's accuracy, hard-coded rather than recomputed here)
accuracy_baseline = 0.7922

# 2. Define Model B
model_b = Sequential([
    Input(shape=(X_train.shape[1],)),
    Dense(64, activation='relu'),   # More neurons (was 16)
    Dropout(0.2),                   # Zero 20% of activations during training to limit overfitting
    Dense(32, activation='relu'),   # More neurons (was 8)
    Dense(1, activation='sigmoid')
])

# 3. Compile with a different optimizer
model_b.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

# 4. Train Model B
print("Training Neural Network Experiment B...")
history_b = model_b.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.1, verbose=0)

# 5. Evaluate
loss_b, acc_b = model_b.evaluate(X_test, y_test)
print(f"\nModel A (Baseline) Accuracy: {accuracy_baseline:.4f}")
print(f"Model B (Tuned) Accuracy:    {acc_b:.4f}")

# Conclusion Logic
print("-" * 30)
if acc_b > accuracy_baseline:
    print("OBSERVATION: The complex architecture (Model B) improved accuracy.")
    print("RECOMMENDATION: Adopt Model B for deployment.")
else:
    print("OBSERVATION: Increasing complexity (Model B) resulted in similar or lower accuracy.")
    print("CONCLUSION: The simpler Model A is more robust. We recommend proceeding with the Baseline model to reduce computational cost.")
print("-" * 30)

Training Neural Network Experiment B...
44/44 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.7922 - loss: 0.4290

Model A (Baseline) Accuracy: 0.7922
Model B (Tuned) Accuracy:    0.7922
------------------------------
OBSERVATION: Increasing complexity (Model B) resulted in similar or lower accuracy.
CONCLUSION: The simpler Model A is more robust. We recommend proceeding with the Baseline model to reduce computational cost.
------------------------------
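
A strict `>` comparison between two accuracies is sensitive to sampling noise in the test set. The sketch below estimates that noise with a binomial standard error. The test set size `n_test = 1400` is an assumption, inferred from the 44 evaluation batches at the Keras default batch size of 32. The normal approximation is crude, and it ignores that both models are scored on the same rows.

```python
import math


def accuracy_std_error(acc, n):
    """Binomial standard error of an accuracy measured on n samples."""
    return math.sqrt(acc * (1.0 - acc) / n)


n_test = 1400     # assumption: approximate test set size, not read from the data
acc_a = 0.7922    # Model A (baseline), from the output above
acc_b = 0.7922    # Model B, from the output above
acc_tree = 0.7765 # tuned Decision Tree, from the earlier cell

# Approximate 95% margin of error around a single accuracy estimate
margin = 1.96 * accuracy_std_error(acc_a, n_test)
print(f"95% margin of error: +/- {margin:.4f}")
print(f"Model B - Model A:   {acc_b - acc_a:+.4f}")
print(f"Model A - Tree:      {acc_a - acc_tree:+.4f}")
```

With a margin of about two percentage points, neither the tie between Models A and B nor the gap to the tuned tree is clearly distinguishable from noise on a test set of this size.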
