
[BUG Report]: EarlyStopping Callback Stops Training Prematurely Regardless of Performance Improvement #1239

Open
Patan77 opened this issue Mar 27, 2024 · 0 comments

Description

I am experiencing an issue with the EarlyStopping callback in TensorFlow.NET: training stops exactly after the number of epochs specified by the patience parameter, even when the monitored metric (in this case, accuracy) is still improving. This is inconsistent with the expected behavior of EarlyStopping, which should stop training only after patience epochs have passed with no improvement.

Steps to Reproduce:

Set up a model and compile it with accuracy as a metric.
Initialize the EarlyStopping callback with patience set to 10 and monitor set to "accuracy".
Train the model using model.fit() with a significant number of epochs (e.g., 1000).
Observe that training stops exactly after 10 epochs regardless of performance improvements.

Expected Behavior:
The training should continue past the patience number of epochs as long as the monitored metric (accuracy) is still improving. The training should only stop if there is no improvement in the metric for the duration specified by patience.

Actual Behavior:
The training stops exactly after the number of epochs specified by patience, even though the accuracy is still improving.
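For reference, the patience logic of a Keras-style EarlyStopping can be sketched as follows. This is an illustrative Python model of the documented semantics, not the TensorFlow.NET implementation; note that when a baseline is set, the reference behavior resets the patience counter only once the metric improves past the baseline itself, which may explain the observed stop at exactly patience epochs.

```python
def epochs_until_stop(history, patience, min_delta=0.0, baseline=None):
    """Return the 1-based epoch at which early stopping would fire, or None.

    Illustrative model of Keras-style EarlyStopping in "max" mode; not the
    TensorFlow.NET implementation.
    """
    best = float("-inf")
    wait = 0
    for epoch, value in enumerate(history, start=1):
        wait += 1
        if value - min_delta > best:  # improvement over the best seen so far
            best = value
            # With a baseline, the counter resets only when the metric also
            # beats the baseline; without one, any improvement resets it.
            if baseline is None or value - min_delta > baseline:
                wait = 0
        if wait >= patience:
            return epoch
    return None
```

Under this model, a steadily improving metric never triggers the stop when baseline is None, but with baseline: 0.999f and accuracy below that value, the counter is never reset and the stop fires at exactly patience epochs.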

Code Snippet:

int epochCount = 1000;

var callbackParams = new CallbackParams
{
    Model = model,
    Epochs = epochCount,
};

var earlyStoppingCallback = new EarlyStopping(
    parameters: callbackParams,
    monitor: "accuracy",
    min_delta: 0.0001f,
    patience: 10,
    mode: "max",
    baseline: 0.999f,
    restore_best_weights: true
);

var customCallback = new CustomCallback();

model.fit(inputs, labels, epochs: epochCount, batch_size: 100, callbacks: new List<ICallback> { customCallback, earlyStoppingCallback });

public class CustomCallback : ICallback
{
    public void on_epoch_end(int epoch, Dictionary<string, float> epoch_logs)
    {
        // Log loss and accuracy if present in the epoch logs
        if (epoch_logs.ContainsKey("loss"))
        {
            System.Diagnostics.Debug.WriteLine($"Epoch {epoch + 1}, Loss: {epoch_logs["loss"]}");
        }

        if (epoch_logs.ContainsKey("accuracy")) // Or "acc", depending on the version and setup
        {
            System.Diagnostics.Debug.WriteLine($"Epoch {epoch + 1}, Accuracy: {epoch_logs["accuracy"]}");
        }
    }

    // Remaining ICallback members omitted here for brevity.
}

Logs / Output:

Epoch 1, Loss: 0.67718434
Epoch 1, Accuracy: 0.625
Epoch 2, Loss: 0.6478515
Epoch 2, Accuracy: 0.684
Epoch 3, Loss: 0.6122737
Epoch 3, Accuracy: 0.711
Epoch 4, Loss: 0.5662995
Epoch 4, Accuracy: 0.758
Epoch 5, Loss: 0.5094736
Epoch 5, Accuracy: 0.828
Epoch 6, Loss: 0.44897592
Epoch 6, Accuracy: 0.873
Epoch 7, Loss: 0.3877059
Epoch 7, Accuracy: 0.908
Epoch 8, Loss: 0.33612937
Epoch 8, Accuracy: 0.935
Epoch 9, Loss: 0.29862627
Epoch 9, Accuracy: 0.94
Epoch 10, Loss: 0.2639172
Epoch 10, Accuracy: 0.957
Model trained complete.
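A quick sanity check on the logged run (accuracy values copied from the output above): under Keras-style baseline semantics, with baseline=0.999 the patience counter resets only when accuracy minus min_delta exceeds the baseline, which never happens in this run, so the stop would fire at exactly patience epochs. This is an illustrative check, not an inspection of the TensorFlow.NET internals.

```python
# Accuracy per epoch, taken from the log above.
accuracies = [0.625, 0.684, 0.711, 0.758, 0.828,
              0.873, 0.908, 0.935, 0.94, 0.957]
baseline, min_delta, patience = 0.999, 0.0001, 10

# Does any epoch beat the baseline by at least min_delta?
beats_baseline = [a - min_delta > baseline for a in accuracies]
print(any(beats_baseline))          # the counter would never be reset
print(len(accuracies) >= patience)  # enough epochs for the stop to fire
```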

Reproduction Steps

No response

Known Workarounds

Not a workaround as such, but setting baseline = 0.0f appears to make the patience parameter behave as expected; however, it still does not help if you also want to enforce an "accuracy" threshold.
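One way to get both behaviors (working patience plus an explicit accuracy target) is to keep baseline at 0.0 and check the threshold separately. The sketch below models this in Python under Keras-style semantics; the function name and loop are illustrative, not TensorFlow.NET API.

```python
def train_loop(accuracies, patience, min_delta, baseline, target):
    """Simulate one training run; return (stop_epoch, reason)."""
    best, wait = float("-inf"), 0
    for epoch, acc in enumerate(accuracies, start=1):
        if acc >= target:                 # explicit accuracy threshold
            return epoch, "target reached"
        wait += 1
        if acc - min_delta > best:        # improvement over best so far
            best = acc
            # With baseline=0.0, any positive accuracy improvement beats
            # the baseline, so the patience counter resets as expected.
            if baseline is None or acc - min_delta > baseline:
                wait = 0
        if wait >= patience:
            return epoch, "patience exhausted"
    return None, "ran all epochs"
```

With the logged accuracies, baseline=0.0 lets the run continue as long as accuracy keeps improving, while the separate target check handles the "stop once accuracy reaches X" goal that baseline: 0.999f was presumably meant to express.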

Configuration and Other Information

SciSharp.TensorFlow.Redist version 2.16.0
TensorFlow.NET version 0.150.0
.NET 6.0
Operating System: Windows 10
