### Determining the optimal number of hidden layers and neurons for an Artificial Neural Network (ANN) 
This choice can be challenging and usually requires experimentation. Still, a few guidelines and methods can help you make an informed decision:

- Start Simple: Begin with a simple architecture and increase complexity only if validation performance calls for it.
- Grid Search/Random Search: Grid search tries every combination in a predefined set of architectures. Random search samples a fixed number of them, which scales better when the set is large.
- Cross-Validation: Evaluate each candidate architecture with k-fold cross-validation rather than a single train/validation split.
- Heuristics and Rules of Thumb: Some empirical rules provide starting points, such as:
  - The number of neurons in a hidden layer is often chosen between the size of the input layer and the size of the output layer.
  - One or two hidden layers are usually enough to start with.
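As a rough illustration, the two rules of thumb above can be turned into a starting grid. The helper `heuristic_grid` below is hypothetical (it is not part of scikit-learn or scikeras). It keeps power-of-two neuron counts between the output and input sizes and allows one or two hidden layers. The keys use the `model__` names from the grid search later in this notebook.

```python
def heuristic_grid(n_inputs: int, n_outputs: int, max_layers: int = 2) -> dict:
    """Hypothetical helper: a starting grid built from two rules of thumb.

    - neuron counts lie between the output size and the input size
    - use 1 to `max_layers` hidden layers
    """
    low, high = sorted((n_inputs, n_outputs))
    low = max(low, 2)  # a 1-neuron hidden layer is rarely useful
    # Powers of two inside [low, high]
    neurons = [2 ** k for k in range(high.bit_length()) if low <= 2 ** k <= high]
    layers = list(range(1, max_layers + 1))
    return {"model__neurons": neurons, "model__layers": layers}


# 12 encoded input features and 1 sigmoid output, as in this notebook
print(heuristic_grid(12, 1))
# {'model__neurons': [2, 4, 8], 'model__layers': [1, 2]}
```

The grid searched later (16 to 128 neurons) goes beyond this range, which is fine: rules of thumb only suggest where to start, and cross-validation makes the final call.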

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder
from sklearn.pipeline import Pipeline
from scikeras.wrappers import KerasClassifier
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping
import pickle

# Configure GPU memory growth (prevents TF from grabbing all VRAM)
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        print(f"GPU available: {gpus}")
    except RuntimeError as e:
        print(e)
else:
    print("No GPU found, using CPU")

GPU available: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


In [2]:
# Load the data and drop identifier columns that carry no predictive signal
data = pd.read_csv('Churn_Modelling.csv')
data = data.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1)

# Encode Gender as integers (binary column)
label_encoder_gender = LabelEncoder()
data['Gender'] = label_encoder_gender.fit_transform(data['Gender'])

# One-hot encode Geography (one column per category)
onehot_encoder_geo = OneHotEncoder(handle_unknown='ignore')
geo_encoded = onehot_encoder_geo.fit_transform(data[['Geography']]).toarray()
geo_encoded_df = pd.DataFrame(geo_encoded, columns=onehot_encoder_geo.get_feature_names_out(['Geography']))

data = pd.concat([data.drop('Geography', axis=1), geo_encoded_df], axis=1)

# Features and binary target
X = data.drop('Exited', axis=1)
y = data['Exited']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then reuse it on the test split
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Save the encoders and scaler for later use (uncomment to write them to disk)
# with open('label_encoder_gender.pkl', 'wb') as file:
#     pickle.dump(label_encoder_gender, file)

# with open('onehot_encoder_geo.pkl', 'wb') as file:
#     pickle.dump(onehot_encoder_geo, file)

# with open('scaler.pkl', 'wb') as file:
#     pickle.dump(scaler, file)
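`StandardScaler` is fit on the training split only and then reused on the test split, so the test data never influences the learned mean and standard deviation. The stdlib sketch below reproduces that z-score arithmetic, `z = (x - mean_train) / std_train`, on a made-up column. Like scikit-learn, it uses the population standard deviation (ddof=0). It illustrates the arithmetic only and is not scikit-learn's implementation (for example, it has no guard for zero-variance columns).

```python
from statistics import fmean, pstdev


def fit_scaler(column):
    """Learn the mean and population std (ddof=0) from training values."""
    return fmean(column), pstdev(column)


def transform(column, mean, std):
    """Apply z = (x - mean) / std using the *training* statistics."""
    return [(x - mean) / std for x in column]


train_ages = [30.0, 40.0, 50.0, 60.0]  # made-up training values
test_ages = [35.0, 70.0]               # made-up test values

mean, std = fit_scaler(train_ages)         # analogous to scaler.fit_transform(X_train)
z_train = transform(train_ages, mean, std)
z_test = transform(test_ages, mean, std)   # analogous to scaler.transform(X_test)

print(mean, round(std, 4))            # 45.0 11.1803
print([round(z, 4) for z in z_test])  # [-0.8944, 2.2361]
```

The transformed training column has mean 0 and standard deviation 1, while the test values land wherever the training statistics place them (70.0 ends up above 2).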

In [3]:
## Define a function that builds the model, so KerasClassifier can try different parameters

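def create_model(neurons=32, layers=1, meta=None, compile_kwargs=None):
    # Clear previous models from GPU memory
    tf.keras.backend.clear_session()

    # scikeras passes `meta`; fall back to the 12 encoded features of this dataset
    n_features = meta["n_features_in_"] if meta else 12

    model = Sequential()
    # Explicit Input layer instead of `input_shape=` on Dense (avoids the Keras warning)
    model.add(tf.keras.Input(shape=(n_features,)))
    model.add(Dense(neurons, activation='relu'))  # first hidden layer

    for _ in range(layers - 1):  # additional hidden layers
        model.add(Dense(neurons, activation='relu'))

    model.add(Dense(1, activation='sigmoid'))  # churn probability
    model.compile(optimizer='adam', loss="binary_crossentropy", metrics=['accuracy'])

    return model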
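To compare the capacity of the candidates in the grid, you can count the trainable parameters that `create_model` builds by hand. A `Dense` layer with `m` inputs and `n` units has `m*n + n` parameters (weights plus biases). The hypothetical helpers below apply that formula to the architecture above: 12 inputs, `layers` hidden layers of `neurons` units, and one sigmoid output. For a built model, Keras reports the total through `model.count_params()`.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus biases (n_out) of one Dense layer."""
    return n_in * n_out + n_out


def count_params(n_inputs: int, neurons: int, layers: int) -> int:
    """Hypothetical helper mirroring create_model's architecture."""
    total = dense_params(n_inputs, neurons)                  # first hidden layer
    total += (layers - 1) * dense_params(neurons, neurons)   # additional hidden layers
    total += dense_params(neurons, 1)                        # sigmoid output unit
    return total


for layers in (1, 2):
    for neurons in (16, 32, 64, 128):
        print(f"layers={layers} neurons={neurons:>3} params={count_params(12, neurons, layers)}")
```

For the best configuration reported by the search in this notebook (2 layers of 32 neurons), this gives 1,505 trainable parameters, compared with 18,305 for the largest candidate (2 layers of 128).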

In [None]:
## Create a Keras classifier
model = KerasClassifier(model=create_model, neurons=32, layers=1, verbose=0)

In [None]:
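# Define the grid search parameters: `model__`-prefixed keys are routed to
# create_model, while `epochs` is a fit parameter of KerasClassifier.
# Note: the run logged below used a smaller grid (neurons [32, 64],
# layers [1, 2], epochs [50]); this full grid has 16 candidates.
param_grid = {
    'model__neurons': [16, 32, 64, 128],
    'model__layers': [1, 2],
    'epochs': [50, 100]
}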
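Grid search fits every combination in `param_grid` once per CV fold, so the cost grows multiplicatively with each new parameter. The stdlib sketch below enumerates the grid above with `itertools.product` and then draws a random subset, which is the idea behind random search. scikit-learn's `RandomizedSearchCV` applies the same idea with an `n_iter` budget. The `n_iter` value and seed here are arbitrary choices for illustration. `GridSearchCV` also performs one final refit of the best candidate on the whole training set (default `refit=True`), which is not included in the "totalling N fits" count it prints.

```python
import itertools
import random

param_grid = {
    'model__neurons': [16, 32, 64, 128],
    'model__layers': [1, 2],
    'epochs': [50, 100],
}
cv = 3

# Every combination as a dict, which is what grid search iterates over
keys = list(param_grid)
candidates = [dict(zip(keys, values)) for values in itertools.product(*param_grid.values())]
print(len(candidates), "candidates,", len(candidates) * cv, "fits")  # 16 candidates, 48 fits

# Random search: evaluate only a sampled subset of the same space
n_iter = 6
rng = random.Random(0)
sampled = rng.sample(candidates, n_iter)
print(len(sampled) * cv, "fits with random search")  # 18 fits with random search
```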

In [None]:
# Perform grid search
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, verbose=1, error_score='raise')
grid_result = grid.fit(X_train, y_train)

# Print the best parameters
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

Fitting 3 folds for each of 4 candidates, totalling 12 fits
[CV] END ......epochs=50, model__layers=1, model__neurons=32; total time=  10.0s
[CV] END ......epochs=50, model__layers=1, model__neurons=32; total time=   9.4s
[CV] END ......epochs=50, model__layers=1, model__neurons=32; total time=   9.5s
[CV] END ......epochs=50, model__layers=1, model__neurons=64; total time=   9.7s
[CV] END ......epochs=50, model__layers=1, model__neurons=64; total time=   9.5s
[CV] END ......epochs=50, model__layers=1, model__neurons=64; total time=   9.8s
[CV] END ......epochs=50, model__layers=2, model__neurons=32; total time=  10.6s
[CV] END ......epochs=50, model__layers=2, model__neurons=32; total time=  10.1s
[CV] END ......epochs=50, model__layers=2, model__neurons=32; total time=  10.4s
[CV] END ......epochs=50, model__layers=2, model__neurons=64; total time=  11.1s
[CV] END ......epochs=50, model__layers=2, model__neurons=64; total time=  10.3s
[CV] END ......epochs=50, model__layers=2, model__neurons=64; total time=  10.9s
Best: 0.856124 using {'epochs': 50, 'model__layers': 2, 'model__neurons': 32}
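The best mean CV accuracy is a single number, and several architectures often score within noise of each other. In the spirit of "start simple", one option is to pick the smallest architecture whose mean CV score is within a tolerance of the best. The helper `simplest_within` below is a hypothetical sketch that reads the standard `cv_results_` keys (`params`, `mean_test_score`). The tolerance of 0.005 is an arbitrary assumption, not a recommendation.

```python
def simplest_within(cv_results: dict, tol: float = 0.005) -> dict:
    """Hypothetical helper: smallest architecture scoring within `tol` of the best.

    "Smallest" is ordered by (hidden layers, neurons per layer, epochs).
    """
    rows = list(zip(cv_results["params"], cv_results["mean_test_score"]))
    best = max(score for _, score in rows)
    close = [params for params, score in rows if score >= best - tol]
    return min(
        close,
        key=lambda p: (p["model__layers"], p["model__neurons"], p.get("epochs", 0)),
    )


# Usage with the fitted search from this notebook:
# print(simplest_within(grid_result.cv_results_))
```

The reported score is mean accuracy on the validation folds of the training data. To check the held-out split, you can call `grid_result.best_estimator_.score(X_test, y_test)`, which relies on the default `refit=True`.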
