
---

###  **Checkpoint Saving – Quick Notes**

**What is it?**
Checkpointing saves your model's **weights** (or the full model) at intervals during training, so a crash, timeout, or other interruption does not cost you the progress made so far.

---

###  **Key Parameters (`ModelCheckpoint`)**

```python
from tensorflow.keras.callbacks import ModelCheckpoint

ModelCheckpoint(
    filepath='cp.weights.h5',     # must end with .weights.h5 when save_weights_only=True
    save_weights_only=True,       # True: only weights, False: full model
    save_best_only=False,         # True: save only when `monitor` improves (default: 'val_loss')
    save_freq='epoch',            # 'epoch' or an integer (every N batches)
    verbose=1                     # 0 = silent, 1 = logs when saving
)
```

---

###  **What Gets Saved?**

* If `save_weights_only=True`:
  → A `.weights.h5` file containing **only the model weights**.

* If `save_weights_only=False`:
  → A file (e.g., `.keras`) containing **the full model** (architecture + weights + optimizer state).
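
As a rough illustration of the full-model case, the sketch below trains a tiny stand-in model on random data for one epoch, lets `ModelCheckpoint` write a `.keras` file, and reloads it with `tf.keras.models.load_model`. The model, the data, and the temporary path are assumptions chosen only to keep the demo fast; they are not part of the CIFAR-10 example in these notes.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Tiny stand-in model and random data (hypothetical, only for a quick demo).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

x = np.random.rand(32, 4).astype("float32")
y = np.random.randint(0, 3, size=(32,))

# Full-model checkpoints use the .keras extension.
full_model_path = os.path.join(tempfile.mkdtemp(), "cp.keras")
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath=full_model_path,
    save_weights_only=False,  # save the full model, not just the weights
)
model.fit(x, y, epochs=1, verbose=0, callbacks=[checkpoint])

# A full-model file can be reloaded without calling the model-building code.
restored = tf.keras.models.load_model(full_model_path)
```

With a weights-only checkpoint, by contrast, the architecture has to be rebuilt in code before `load_weights` can be called.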

---

###  **How to Resume Training**

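Before calling `model.fit()`, rebuild the model with the same architecture and load the saved weights if a checkpoint file exists:

```python
import os

if os.path.exists('cp.weights.h5'):
    print("Loading checkpoint...")
    model.load_weights('cp.weights.h5')
```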
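
Reloading a single fixed file restores the weights but not the epoch count. One possible refinement, sketched here under assumptions: if `ModelCheckpoint` is given a `filepath` template such as `ckpts/cp-{epoch:02d}.weights.h5` (a hypothetical naming scheme, not used elsewhere in these notes), each epoch writes its own file with a 1-based epoch number, and a small helper can pick out the most recent one. `latest_checkpoint` and `CKPT_PATTERN` are illustrative names, and the empty files in the demo merely stand in for real checkpoints.

```python
import re
import tempfile
from pathlib import Path

# Matches names produced by the assumed template "cp-{epoch:02d}.weights.h5".
CKPT_PATTERN = re.compile(r"cp-(\d+)\.weights\.h5$")

def latest_checkpoint(ckpt_dir):
    """Return (path, epochs_completed) for the newest checkpoint, or (None, 0)."""
    best_path, best_epoch = None, 0
    for path in Path(ckpt_dir).glob("cp-*.weights.h5"):
        match = CKPT_PATTERN.search(path.name)
        if match and int(match.group(1)) > best_epoch:
            best_path, best_epoch = path, int(match.group(1))
    return best_path, best_epoch

# Demo with empty placeholder files standing in for real checkpoints.
demo_dir = Path(tempfile.mkdtemp())
for name in ["cp-01.weights.h5", "cp-02.weights.h5", "cp-10.weights.h5"]:
    (demo_dir / name).touch()

path, epochs_done = latest_checkpoint(demo_dir)
print(path.name, epochs_done)  # cp-10.weights.h5 10
```

After `model.load_weights(path)`, the count can be passed on as `model.fit(..., epochs=10, initial_epoch=epochs_done)`. With `initial_epoch` set, `epochs` is the final epoch index rather than the number of additional epochs, so training stops once the original target is reached.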

---

###  **Best Practices (Colab/Cloud)**

* Save checkpoints to **Google Drive** or a persistent location.
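* Train in **short runs** (e.g., 5–10 epochs per session) and save frequently.
* Combine with **`EarlyStopping`** to stop once validation stops improving, and reload the latest checkpoint to resume after an interruption.
* Use **`save_best_only=True`** to keep only the best model (judged on `val_loss` by default, so validation data must be passed to `fit`).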
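
To make `save_best_only=True` concrete, here is a simplified pure-Python sketch of the rule for a loss-like metric: a checkpoint is written only when the monitored value beats the best one seen so far. This is an illustration of the idea, not Keras's implementation; `epochs_saved` and the loss values are made up.

```python
def epochs_saved(val_losses):
    """Return the 1-based epochs at which a best-only checkpoint would be written."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # strictly better than every earlier epoch
            best = loss
            saved.append(epoch)
    return saved

# Validation loss improves at epochs 1, 2 and 4; epochs 3 and 5 are skipped.
print(epochs_saved([0.9, 0.7, 0.8, 0.65, 0.66]))  # [1, 2, 4]
```

Because later, worse epochs never overwrite the file, the checkpoint on disk may be older than the last epoch that ran.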


In [None]:
# --------------------------------------------
# Step 1: Import Libraries
# --------------------------------------------
import os
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical



In [None]:
# --------------------------------------------
# Step 2: Load and Preprocess the CIFAR-10 Dataset
# --------------------------------------------
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

# Normalize pixel values
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0

# One-hot encode labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)



In [None]:

# --------------------------------------------
# Step 3: Define the CNN Model
# --------------------------------------------
def create_model():
    model = Sequential([
        tf.keras.Input(shape=(32, 32, 3)),  # explicit input layer (preferred over input_shape= in Keras 3)
        Conv2D(32, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(64, activation='relu'),
        Dropout(0.5),
        Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

model = create_model()


In [None]:
# --------------------------------------------
# Step 4: Setup Checkpointing
# --------------------------------------------

# Save to Google Drive when running in Colab so checkpoints survive a
# runtime reset; otherwise fall back to a local directory.
try:
    from google.colab import drive
    drive.mount('/content/drive')
    checkpoint_dir = "/content/drive/MyDrive/0.Latest_DS_Course/CNN/Notebooks/checkpoint_saving"
except ImportError:
    checkpoint_dir = "checkpoints_cifar10"

os.makedirs(checkpoint_dir, exist_ok=True)
checkpoint_path = os.path.join(checkpoint_dir, "cp.weights.h5")

checkpoint_callback = ModelCheckpoint(
    filepath=checkpoint_path,
    save_weights_only=True,
    verbose=1
)



In [None]:
# --------------------------------------------
# Step 5: Load Checkpoint If Exists
# --------------------------------------------
if os.path.exists(checkpoint_path):
    print("Checkpoint found. Loading weights...")
    model.load_weights(checkpoint_path)
else:
    print("No checkpoint found. Training from scratch.")




In [None]:
# --------------------------------------------
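# Step 6: Train the Model with Resume Capability
# (If weights were loaded in Step 5, training continues from them for
#  another 10 epochs; the epoch counter itself restarts at 1.)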
# --------------------------------------------
history = model.fit(
    X_train, y_train,
    validation_data=(X_test, y_test),
    epochs=10,
    batch_size=64,
    callbacks=[checkpoint_callback]
)

In [None]:

# --------------------------------------------
# Step 7: Evaluate
# --------------------------------------------
loss, acc = model.evaluate(X_test, y_test)
print(f"\nTest Accuracy: {acc:.4f}")
