In the early stages of training, we often use a relatively high learning rate so that the model makes rapid progress.

As training progresses, we gradually reduce the learning rate.

Stochastic Gradient Descent (SGD) is commonly used in practice with a learning rate schedule.

At the start of training, the learning rate ($\alpha$) is set to a relatively high value.

It is then reduced by a constant factor after every $N$ epochs, a scheme commonly called step decay.

The rationale behind this approach is that during the early phases of training, the algorithm should explore the parameter space extensively, leaping from one valley to another to locate a reasonable region.

Once we are in the vicinity of the optimal solution in the later stages, our focus shifts to fine-tuning the parameters.

Therefore, we decrease $\alpha$ to make more subtle adjustments.
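
The schedule described above (start high, then multiply $\alpha$ by a constant factor every $N$ epochs) can be sketched in plain Python. This is a minimal illustration, not part of the original notebook: the function name `step_decay` and the values `drop_factor=0.5` and `epochs_per_drop=10` are assumptions chosen only for the example.

```python
def step_decay(epoch, initial_lr=0.3, drop_factor=0.5, epochs_per_drop=10):
    """Return the learning rate for a 0-indexed epoch under step decay.

    All default values are illustrative assumptions.
    """
    # Number of times the rate has been reduced so far
    num_drops = epoch // epochs_per_drop
    return initial_lr * (drop_factor ** num_drops)


# Learning rate at the start of each 10-epoch block
for epoch in range(0, 40, 10):
    print(epoch, step_decay(epoch))
```

To try such a schedule in Keras, one option is to wrap it in a function that accepts `(epoch, lr)` and ignores the current rate, for example `tf.keras.callbacks.LearningRateScheduler(lambda epoch, lr: step_decay(epoch))`.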

In [None]:
# Import necessary libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
import tensorflow as tf

# Load the dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

# Preprocess the data
scaler = MinMaxScaler(feature_range=(-1,1))
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Define the SGD optimizer with the specified learning rate and momentum
sgd = SGD(learning_rate=0.3, momentum=0.9, nesterov=True)

# Compile the model using SGD as the optimizer
model.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['accuracy'])

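# Stop training when val_accuracy has not improved for 10 epochs,
# and restore the weights from the best epoch
early_stopping = EarlyStopping(monitor='val_accuracy', patience=10, restore_best_weights=True)

# Multiply the learning rate by 0.2 whenever val_accuracy has not improved
# for 5 epochs, but never let it fall below 0.01
reduce_lr = ReduceLROnPlateau(monitor='val_accuracy', factor=0.2, patience=5, min_lr=0.01)

# Collect both callbacks so they can be passed to model.fit
callbacks = [early_stopping, reduce_lr]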

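# Train the model with the ReduceLROnPlateau and EarlyStopping callbacks.
# Hold out 10% of the training data for validation so that the test set
# is not used to drive early stopping or learning rate changes.
model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.1, callbacks=callbacks)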

# Report loss and accuracy on the test set
model.evaluate(X_test, y_test)
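
Note that `ReduceLROnPlateau` does not reduce the rate on a fixed every-N-epochs timetable. It reduces it only when the monitored metric stops improving. The sketch below simulates that rule in plain Python to show how the rate would evolve. It is a simplified illustration, not Keras's implementation: the function name `plateau_schedule` is hypothetical, and details such as `min_delta` and `cooldown` are ignored. The defaults mirror the values used in the cell above.

```python
def plateau_schedule(val_scores, initial_lr=0.3, factor=0.2, patience=5, min_lr=0.01):
    """Simulate a reduce-on-plateau rule (simplified, illustrative only).

    val_scores: per-epoch validation scores, where higher is better.
    Returns the learning rate in effect during each epoch.
    """
    lr = initial_lr
    best = float("-inf")
    wait = 0  # epochs since the last improvement
    lrs = []
    for score in val_scores:
        lrs.append(lr)
        if score > best:
            best = score
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: shrink the rate, respecting the floor
                lr = max(lr * factor, min_lr)
                wait = 0
    return lrs


# Hypothetical validation accuracies: one improvement, then a long plateau
scores = [0.90, 0.95] + [0.95] * 16
print(plateau_schedule(scores))
```

With these made-up scores, the rate stays at 0.3 while accuracy improves, drops to roughly 0.06 and then 0.012 after successive five-epoch plateaus, and finally settles at the 0.01 floor.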