In [None]:
import pandas as pd    

In [None]:
training_log = pd.read_csv('./output/train/spanetv2_s18_hybrid-224/summary.csv')

In [None]:
training_log.head(10)

In [None]:
training_log.tail(10)

In [None]:
best_top1 = training_log['eval_top1'].max()  # renamed to avoid shadowing the built-in max()
print(f"{best_top1:.2f}")

In [None]:
training_log['eval_top1'].argmax()

In [None]:
training_log.loc[training_log['eval_top1'].idxmax()]

In [None]:
summary = training_log[training_log['eval_top1'] == training_log['eval_top1'].max()]
summary.head()

***
# Determine Training Stability 

In [None]:
import matplotlib.pyplot as plt

In [None]:
# Plot loss and accuracy curves side by side
plt.figure(figsize=(12, 5))

# Training and validation loss plot
plt.subplot(1, 2, 1)
plt.plot(training_log['epoch'], training_log['train_loss'], label='Training Loss')
plt.plot(training_log['epoch'], training_log['eval_loss'], label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.grid()

# Validation accuracy plot
plt.subplot(1, 2, 2)
plt.plot(training_log['epoch'], training_log['eval_top1'], label='Top-1 Accuracy')
plt.plot(training_log['epoch'], training_log['eval_top5'], label='Top-5 Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy (%)')
plt.title('Validation Accuracy')
plt.legend()
plt.grid()

plt.tight_layout()  # call once, after both subplots exist
plt.show()

In [None]:
# Plot learning rate progression (skipped if the log has no 'lr' column)
if 'lr' in training_log.columns:
    plt.figure(figsize=(6, 5))
    plt.plot(training_log['epoch'], training_log['lr'])
    plt.xlabel('Epoch')
    plt.ylabel('Learning Rate')
    plt.title('Learning Rate Progression')
    plt.show()

### Interpretation of the Plots
**Loss Plots**: 
* Training and validation loss should both trend downward over epochs. Large spikes, or a sustained rise in validation loss, may indicate instability.

**Accuracy Plots**: 
* Top-1 and top-5 accuracy should improve steadily. Erratic swings or a long plateau might signal issues.

**Learning Rate Plot**: 
* The learning rate should ramp up smoothly during the warmup period and then follow the configured schedule.

### Training and Validation Loss

**Stable Training**: 
* Both training and validation loss decrease smoothly over epochs.

**Unstable Training**: 
* If you see significant fluctuations or increases in the validation loss, it could indicate instability.
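
To make "significant fluctuations" concrete, the sketch below counts the epochs where a loss column rises and reports its rolling standard deviation. It assumes the `eval_loss` column used in the plots above. The helper name `loss_stability_report`, the window size, and the `demo_log` values are illustrative assumptions, not values from the actual training log.

```python
import pandas as pd

def loss_stability_report(log: pd.DataFrame, column: str = "eval_loss", window: int = 5) -> dict:
    """Summarize how smoothly a loss column decreases (hypothetical helper)."""
    delta = log[column].diff().dropna()               # epoch-to-epoch change
    rolling_std = log[column].rolling(window).std()   # local volatility
    return {
        "epochs_with_increase": int((delta > 0).sum()),
        "fraction_increasing": float((delta > 0).mean()),
        "largest_change": float(delta.max()),         # most positive step (negative if loss never rose)
        "max_rolling_std": float(rolling_std.max()),
    }

# Synthetic stand-in for training_log, for illustration only
demo_log = pd.DataFrame({
    "epoch": range(6),
    "eval_loss": [2.0, 1.5, 1.2, 1.4, 1.0, 0.9],
})
report = loss_stability_report(demo_log, window=3)
print(report)
```

With the real log, `loss_stability_report(training_log)` could be compared across runs. What counts as "too many" increases depends on the dataset and augmentation, so treat any threshold as a judgment call.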

### Validation Accuracy

**Stable Training**: 
* The top-1 and top-5 accuracy should improve steadily.

**Unstable Training**: 
* Erratic behavior or lack of improvement in accuracy might indicate issues.
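
A simple way to check for "lack of improvement" is to measure how many epochs have passed since the best accuracy was logged. This is a minimal sketch assuming the `eval_top1` column from the log. The helper `epochs_since_best` and the `demo_log` values are hypothetical.

```python
import pandas as pd

def epochs_since_best(log: pd.DataFrame, column: str = "eval_top1") -> int:
    """Number of logged rows after the row holding the best value (hypothetical helper)."""
    best_pos = int(log[column].to_numpy().argmax())  # position of the first maximum
    return len(log) - 1 - best_pos

# Synthetic stand-in for training_log, for illustration only
demo_log = pd.DataFrame({
    "epoch": range(6),
    "eval_top1": [10.0, 35.0, 52.0, 51.0, 50.5, 51.5],
})
stale_epochs = epochs_since_best(demo_log)
print(stale_epochs)
```

A large value near the end of a run suggests a plateau. A small value suggests accuracy was still improving when training stopped.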

### Learning Rate Progression

**Stable Training**: 
* The learning rate increases smoothly during the warmup period and then follows the configured schedule.

**Unstable Training**: 
* Abrupt changes or failure to adjust correctly can indicate instability.
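
For reference when reading a learning rate curve, the sketch below generates the shape a well-behaved schedule might have. The notebook does not state which scheduler was used, so linear warmup followed by cosine decay is an assumption here, and the function name and all parameter values are hypothetical.

```python
import math

def warmup_cosine_lr(epoch, base_lr, warmup_epochs, total_epochs, warmup_lr=1e-6, min_lr=0.0):
    """Assumed schedule: linear warmup to base_lr, then cosine decay to min_lr."""
    if epoch < warmup_epochs:
        # Linear ramp from warmup_lr up to base_lr
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    # Fraction of the decay phase completed, in [0, 1)
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Illustrative values only
schedule = [
    warmup_cosine_lr(e, base_lr=1e-3, warmup_epochs=5, total_epochs=50)
    for e in range(50)
]
print(schedule[:7])
```

A logged `lr` column that jumps, stalls, or never reaches its peak relative to the expected shape would be worth investigating.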

*** 
# Actions to Take 

In [None]:
import numpy as np

# Create example data to simulate erratic training behavior
rng = np.random.default_rng(0)  # fixed seed so the simulated curves are reproducible
epochs = training_log['epoch']
train_loss_erratic = np.sin(epochs / 2) * 0.5 + training_log['train_loss']  # Adding a sinusoidal perturbation
eval_loss_erratic = np.cos(epochs / 3) * 0.5 + training_log['eval_loss']  # Adding a cosine perturbation
eval_top1_erratic = np.clip(training_log['eval_top1'] + rng.normal(0, 5, size=len(epochs)), 0, 100)  # Adding Gaussian noise (std 5)
eval_top5_erratic = np.clip(training_log['eval_top5'] + rng.normal(0, 5, size=len(epochs)), 0, 100)  # Adding Gaussian noise (std 5)

# Plotting the erratic training and validation loss
plt.figure(figsize=(12, 5))

# Training and validation loss plot (erratic)
plt.subplot(1, 2, 1)
plt.plot(epochs, train_loss_erratic, label='Training Loss (Erratic)')
plt.plot(epochs, eval_loss_erratic, label='Validation Loss (Erratic)')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Erratic Training and Validation Loss')
plt.legend()

# Validation accuracy plot (erratic)
plt.subplot(1, 2, 2)
plt.plot(epochs, eval_top1_erratic, label='Top-1 Accuracy (Erratic)')
plt.plot(epochs, eval_top5_erratic, label='Top-5 Accuracy (Erratic)')
plt.xlabel('Epoch')
plt.ylabel('Accuracy (%)')
plt.title('Erratic Validation Accuracy')
plt.legend()

plt.tight_layout()
plt.show()

If you observe such erratic behavior in your training logs:

**Learning Rate**: 
* Consider reducing the peak learning rate; a rate that is too high is a common cause of loss spikes.

**Warmup Period**: 
* Increase the number of warmup epochs so the model can stabilize before reaching the target learning rate.

**Gradient Clipping**: 
* Implement gradient clipping to avoid exploding gradients.
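
As an illustration of what gradient clipping does, the sketch below rescales a set of gradients so that their combined L2 norm does not exceed a limit. It is a framework-free NumPy version for explanation only. The function name `clip_by_global_norm` and the example gradients are hypothetical.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients by one factor so their global L2 norm is at most max_norm."""
    total_norm = float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))
    # Scale is 1.0 when the norm is already within the limit
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm

# Illustrative gradients for two parameter tensors
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = float(np.sqrt(sum(np.sum(g ** 2) for g in clipped)))
print(norm_before, norm_after)
```

In a PyTorch training loop, `torch.nn.utils.clip_grad_norm_` applies the same idea in place between the backward pass and the optimizer step.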

**Batch Size**: 
* Re-evaluate the effective batch size and adjust if necessary.
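
The effective batch size is usually the per-device batch size multiplied by the number of devices and the number of gradient accumulation steps. The sketch below computes it and applies the linear scaling heuristic for the learning rate. That heuristic is a common rule of thumb and an assumption here, not something taken from this training run. All names and numbers are hypothetical.

```python
def effective_batch_size(per_device_batch, num_devices, grad_accum_steps=1):
    """Samples contributing to each optimizer step."""
    return per_device_batch * num_devices * grad_accum_steps

def linearly_scaled_lr(reference_lr, reference_batch, actual_batch):
    """Linear scaling heuristic: learning rate proportional to batch size (assumed rule of thumb)."""
    return reference_lr * actual_batch / reference_batch

# Illustrative configuration only
batch = effective_batch_size(per_device_batch=64, num_devices=4, grad_accum_steps=2)
lr = linearly_scaled_lr(reference_lr=1e-3, reference_batch=1024, actual_batch=batch)
print(batch, lr)
```

If the effective batch size differs from the one a reference learning rate was tuned for, rescaling the rate this way gives a starting point that still needs validation.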

By addressing these factors, you can improve the stability of your training process and achieve more consistent results.