### Determining the Optimal Number of Hidden Layers and Neurons for an Artificial Neural Network (ANN)

Determining the optimal architecture of an ANN can be challenging and often requires experimentation. However, the following guidelines and methods can help in making an informed decision:

#### 1. Start Simple
- Begin with a small network (for example, a single hidden layer) and add layers or neurons only if validation performance shows the model is underfitting.
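
One way to make "simple" concrete is to count trainable parameters. The sketch below is a minimal illustration, not part of the original notebook: `count_dense_params` is a hypothetical helper, and the input size of 12 is an assumed feature count used only for the example.

```python
def count_dense_params(n_inputs, hidden_units, n_outputs=1):
    """Count weights and biases in a fully connected (Dense-only) network."""
    sizes = [n_inputs, *hidden_units, n_outputs]
    # Each Dense layer has fan_in * fan_out weights plus fan_out biases.
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))


n_inputs = 12  # assumed feature count, for illustration only

small = count_dense_params(n_inputs, [16])           # one hidden layer
medium = count_dense_params(n_inputs, [16, 16])      # two hidden layers
large = count_dense_params(n_inputs, [64, 64, 64])   # three wide layers

print(small, medium, large)  # 225 497 9217
```

The largest network here has roughly 40 times as many parameters as the smallest, which is why it is usually worth checking whether the small one is already good enough.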

#### 2. Grid Search / Random Search
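- Use **Grid Search** to evaluate every combination in a predefined set of architectures and training settings, or **Random Search** to sample a fixed number of combinations when the full grid is too expensive to train.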
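
The difference between the two strategies can be sketched with the standard library alone. The search space below mirrors the parameter grid used in the grid-search cell of this notebook; `grid_candidates` and `random_candidates` are hypothetical helper names. The random sampler draws each value independently, so it may repeat a combination, which is a simplification of what library implementations do.

```python
import itertools
import random

search_space = {
    "layers": [1, 2, 3],
    "units": [16, 32, 64],
    "activation": ["relu", "tanh"],
    "batch_size": [16, 32],
    "epochs": [10, 20],
}


def grid_candidates(space):
    """Enumerate every combination of values (exhaustive grid search)."""
    keys = list(space)
    value_lists = (space[key] for key in keys)
    return [dict(zip(keys, values)) for values in itertools.product(*value_lists)]


def random_candidates(space, n_iter, seed=0):
    """Sample n_iter combinations, choosing each value independently."""
    rng = random.Random(seed)
    return [{key: rng.choice(values) for key, values in space.items()}
            for _ in range(n_iter)]


all_candidates = grid_candidates(search_space)
sampled_candidates = random_candidates(search_space, n_iter=10)

print(len(all_candidates), len(sampled_candidates))  # 72 10
```

With 3-fold cross-validation, the full grid requires 72 × 3 = 216 model fits, while a 10-candidate random search requires only 30.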

#### 3. Cross-Validation
- Apply **cross-validation** to evaluate the performance of different architectures.
- This helps in selecting a model that generalizes well to unseen data.
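
As a rough illustration of how k-fold cross-validation partitions the training data, the sketch below splits sample indices into contiguous folds. `k_fold_indices` is a hypothetical helper written for this example; it does not shuffle or stratify, whereas scikit-learn's splitters can do both.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


# 8000 training rows split into 3 folds
fold_sizes = [len(validation) for _, validation in k_fold_indices(8000, 3)]
print(fold_sizes)  # [2667, 2667, 2666]
```

Each candidate architecture is trained k times, each time validated on a different fold, and the scores are averaged to give one estimate per candidate.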

#### 4. Heuristics and Rules of Thumb
Some commonly used heuristics include:
- The number of neurons in the hidden layer should be **between the size of the input layer and the size of the output layer**.
- A common practice is to start with **1–2 hidden layers** and increase only if required.
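
The first heuristic can be turned into a starting range for the search. This is a minimal sketch: `hidden_unit_range` is a hypothetical helper, and the sizes passed to it (12 inputs, 1 output) are assumed values for illustration.

```python
def hidden_unit_range(n_inputs, n_outputs):
    """Return the bounds and midpoint suggested by the 'between input and output size' rule."""
    low, high = sorted((n_inputs, n_outputs))
    return {"min": low, "midpoint": (low + high) // 2, "max": high}


suggestion = hidden_unit_range(n_inputs=12, n_outputs=1)
print(suggestion)  # {'min': 1, 'midpoint': 6, 'max': 12}
```

Treat the result as a first guess only. Wider layers are often worth including in the search as well, as in the grid used in this notebook (16 to 64 units).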


In [37]:
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder
from sklearn.pipeline import Pipeline
from scikeras.wrappers import KerasClassifier
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Input
from tensorflow.keras.callbacks import EarlyStopping
import pickle

In [38]:
# Load the dataset and drop identifier columns that carry no predictive signal
data = pd.read_csv('Churn_Modelling.csv')
data = data.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1)

# Label Encoding for Gender
label_encoder_gender = LabelEncoder()
data['Gender'] = label_encoder_gender.fit_transform(data['Gender'])

# One-Hot Encoding for Geography
onehot_encoder_geo = OneHotEncoder(handle_unknown='ignore')
geo_encoded = onehot_encoder_geo.fit_transform(data[['Geography']]).toarray()

geo_encoded_df = pd.DataFrame(
    geo_encoded,
    columns=onehot_encoder_geo.get_feature_names_out(['Geography'])
)

# Combine data
data = pd.concat([data.drop('Geography', axis=1), geo_encoded_df], axis=1)

# Split features and target
X = data.drop('Exited', axis=1)
y = data['Exited']

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Save encoders and scaler for later use
with open('label_encoder_gender2.pkl', 'wb') as file:
    pickle.dump(label_encoder_gender, file)
with open('onehot_encoder_geo2.pkl', 'wb') as file:
    pickle.dump(onehot_encoder_geo, file)
with open('scaler2.pkl', 'wb') as file:
    pickle.dump(scaler, file)

In [39]:
def create_model(layers=1, units=32, activation='relu'):
    model = Sequential()
    
    # 1. Define the input shape explicitly using an Input layer
    model.add(Input(shape=(X_train.shape[1],))) 
    
    # 2. Add the first hidden layer
    model.add(Dense(units, activation=activation))
    
    # Add subsequent layers
    for _ in range(layers - 1):
        model.add(Dense(units, activation=activation))
    
    # Output layer
    model.add(Dense(1, activation='sigmoid'))
    
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

In [40]:
# Wrap the model-building function in a scikit-learn compatible classifier
model = KerasClassifier(model=create_model, verbose=0)

In [41]:
# Define the grid search parameters with the 'model__' prefix
param_grid = {
    'model__layers': [1, 2, 3],
    'model__units': [16, 32, 64],
    'model__activation': ['relu', 'tanh'],
    'batch_size': [16, 32],
    'epochs': [10, 20]     
}

# Perform grid search 
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3,verbose=1)
grid_result = grid.fit(X_train, y_train)

print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

Fitting 3 folds for each of 72 candidates, totalling 216 fits
Best: 0.859999 using {'batch_size': 16, 'epochs': 20, 'model__activation': 'tanh', 'model__layers': 2, 'model__units': 16}


In [42]:
# 1. Predict using the best model found by Grid Search
# KerasClassifier.predict returns class labels (0 or 1), not probabilities,
# so no 0.5 threshold is needed
y_pred = grid.predict(X_test)

# 2. Print the Final Accuracy on Test Data
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

test_accuracy = accuracy_score(y_test, y_pred)
print(f"Final Test Accuracy: {test_accuracy:.4f}")

# 3. See where the model is making mistakes
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))

print("\nClassification Report:")
print(classification_report(y_test, y_pred))

Final Test Accuracy: 0.8655

Confusion Matrix:
[[1545   62]
 [ 207  186]]

Classification Report:
              precision    recall  f1-score   support

           0       0.88      0.96      0.92      1607
           1       0.75      0.47      0.58       393

    accuracy                           0.87      2000
   macro avg       0.82      0.72      0.75      2000
weighted avg       0.86      0.87      0.85      2000
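
The headline numbers in the report can be re-derived by hand from the confusion matrix printed above, which helps when interpreting them. The sketch below uses the four counts from that matrix, with rows as true classes and columns as predicted classes.

```python
# Counts taken from the confusion matrix above: [[TN, FP], [FN, TP]]
tn, fp = 1545, 62
fn, tp = 207, 186

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # of predicted churners, how many actually churned
recall = tp / (tp + fn)      # of actual churners, how many were caught
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
# accuracy=0.8655 precision=0.75 recall=0.47 f1=0.58
```

The accuracy is high mainly because the majority class (customers who stayed) is predicted well; recall for class 1 shows that fewer than half of the customers who actually churned are identified.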

