# Connect4 CNN2 Fine-Tuning Notebook

This notebook fine-tunes a saved Connect4 CNN model using the loss-correction dataset generated from game logs.

It supports both local VS Code and Google Colab + Drive paths.

## 1) Environment Setup and Dependency Installation

Install required packages (if needed), import libraries, and confirm runtime (GPU/CPU).

In [1]:
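# Optional for fresh Colab runtimes. The run recorded below used TensorFlow 2.19.0,
# so the 2.15.0 pin does not match it; adjust the pin before uncommenting.
# !pip -q install tensorflow==2.15.0 scikit-learn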

import tensorflow as tf
print('TensorFlow:', tf.__version__)
print('GPU available:', bool(tf.config.list_physical_devices('GPU')))
print('Devices:', tf.config.list_physical_devices())

TensorFlow: 2.19.0
GPU available: True
Devices: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


In [2]:
import os
import json
import numpy as np
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split

print('TF:', tf.__version__)

TF: 2.19.0


## 2) Project Paths and Data/Model Mounting

Define model/data/checkpoint paths. If using Colab, mount Google Drive and use absolute Drive paths.

In [15]:
# ====== CONFIG ======
RUN_IN_COLAB = True

# Drive model source (same family as your CNN training outputs)
DRIVE_BASE = '/content/drive/MyDrive/Connect4_Combined'
DRIVE_MODEL_DIR = f'{DRIVE_BASE}/models'
DRIVE_DATASET_DIR = f'{DRIVE_BASE}/datasets'

MODEL_CANDIDATES = [
    f'{DRIVE_MODEL_DIR}/connect4_cnn_v2_best.keras',
    f'{DRIVE_MODEL_DIR}/connect4_cnn_v2_final.keras',
    f'{DRIVE_MODEL_DIR}/connect4_cnn_final.keras',
]

NPZ_CANDIDATES = [
    f'{DRIVE_DATASET_DIR}/finetune_loss_cnn2_teacher_quick.npz',
    f'{DRIVE_DATASET_DIR}/finetune_loss_cnn2_teacher.npz',
    '/content/finetune_loss_cnn2_teacher_quick.npz',
    '/content/finetune_loss_cnn2_teacher.npz',
]

BASE_DATASET_PATH = f'{DRIVE_DATASET_DIR}/connect4_combined_unique.npz'
MIX_WITH_BASE_DATASET = True
BASE_MIX_RATIO = 1.0
BASE_SAMPLE_WEIGHT = 0.85

BASE_MODEL_PATH = next((p for p in MODEL_CANDIDATES if os.path.exists(p)), MODEL_CANDIDATES[0])
FINETUNE_NPZ_PATH = next((p for p in NPZ_CANDIDATES if os.path.exists(p)), NPZ_CANDIDATES[0])
OUT_MODEL_PATH = f'{DRIVE_MODEL_DIR}/connect4_cnn2_loss_finetuned.keras'

SEED = 42
VAL_SIZE = 0.15
BATCH_SIZE = 128
EPOCHS = 28
LEARNING_RATE = 2e-5
EARLY_STOP_PATIENCE = 4
FREEZE_PREFIX_FRACTION = 0.0

print('BASE_MODEL_PATH:', BASE_MODEL_PATH)
print('FINETUNE_NPZ_PATH:', FINETUNE_NPZ_PATH)
print('BASE_DATASET_PATH:', BASE_DATASET_PATH)
print('OUT_MODEL_PATH:', OUT_MODEL_PATH)

BASE_MODEL_PATH: /content/drive/MyDrive/Connect4_Combined/models/connect4_cnn_v2_best.keras
FINETUNE_NPZ_PATH: /content/drive/MyDrive/Connect4_Combined/datasets/finetune_loss_cnn2_teacher_quick.npz
BASE_DATASET_PATH: /content/drive/MyDrive/Connect4_Combined/datasets/connect4_combined_unique.npz
OUT_MODEL_PATH: /content/drive/MyDrive/Connect4_Combined/models/connect4_cnn2_loss_finetuned.keras


In [16]:
if RUN_IN_COLAB:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=False)

np.random.seed(SEED)
tf.random.set_seed(SEED)

# Re-resolve after Drive mount so detection is accurate
BASE_MODEL_PATH = next((p for p in MODEL_CANDIDATES if os.path.exists(p)), None)
FINETUNE_NPZ_PATH = next((p for p in NPZ_CANDIDATES if os.path.exists(p)), None)

print('Resolved model:', BASE_MODEL_PATH)
print('Resolved npz  :', FINETUNE_NPZ_PATH)

# Fail fast if either input file is missing.
if BASE_MODEL_PATH is None:
    print('Checked model candidates:', MODEL_CANDIDATES)
    raise FileNotFoundError('Model not found on Drive. Please verify Connect4_Combined/models contains a saved .keras model.')

if FINETUNE_NPZ_PATH is None:
    print('Checked NPZ candidates:', NPZ_CANDIDATES)
    raise FileNotFoundError('Finetune NPZ not found. Expected under Connect4_Combined/datasets/finetune_loss_cnn2_teacher_quick.npz')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Resolved model: /content/drive/MyDrive/Connect4_Combined/models/connect4_cnn_v2_best.keras
Resolved npz  : /content/drive/MyDrive/Connect4_Combined/datasets/finetune_loss_cnn2_teacher_quick.npz


## 3) Dataset Loading and Split Verification

Load NPZ tensors, print sample counts/shapes, and verify label distribution before training.

In [17]:
ft = np.load(FINETUNE_NPZ_PATH, allow_pickle=True)
X_ft = ft['X_train'].astype(np.float32)
y_ft = ft['y_train'].astype(np.int64)
w_ft = ft['sample_weight'].astype(np.float32)

print('Finetune:', X_ft.shape, y_ft.shape)
print('Finetune move dist:', np.bincount(y_ft, minlength=7).tolist())
if 'label_source' in ft.files:
    u, c = np.unique(ft['label_source'], return_counts=True)
    print('label_source:', dict(zip(u.tolist(), c.tolist())))
if 'original_chosen_col' in ft.files:
    print('changed_labels:', int(np.sum(y_ft != ft['original_chosen_col'])))

rng = np.random.default_rng(SEED)

if MIX_WITH_BASE_DATASET and os.path.exists(BASE_DATASET_PATH):
    base = np.load(BASE_DATASET_PATH, allow_pickle=True)
    X_base = base['X_train'].astype(np.float32)
    y_base = base['y_train'].astype(np.int64)

    n_pick = min(int(len(X_ft) * BASE_MIX_RATIO), len(X_base))
    idx = rng.choice(len(X_base), size=n_pick, replace=False)
    Xb = X_base[idx]
    yb = y_base[idx]
    wb = np.full((n_pick,), BASE_SAMPLE_WEIGHT, dtype=np.float32)

    X = np.concatenate([X_ft, Xb], axis=0)
    y = np.concatenate([y_ft, yb], axis=0)
    w = np.concatenate([w_ft, wb], axis=0)

    perm = rng.permutation(len(y))
    X, y, w = X[perm], y[perm], w[perm]

    print(f'Base mixed: {n_pick} samples from {BASE_DATASET_PATH}')
else:
    X, y, w = X_ft, y_ft, w_ft
    print('Base mixing skipped.')

print('Final:', X.shape, y.shape, w.shape)
print('Final weight range:', float(w.min()), float(w.max()))
print('Final move dist:', np.bincount(y, minlength=7).tolist())

Finetune: (91, 6, 7, 2) (91,)
Finetune move dist: [14, 20, 22, 9, 17, 6, 3]
label_source: {'teacher_mcts': 91}
changed_labels: 66
Base mixed: 91 samples from /content/drive/MyDrive/Connect4_Combined/datasets/connect4_combined_unique.npz
Final: (182, 6, 7, 2) (182,) (182,)
Final weight range: 0.8500000238418579 1.899999976158142
Final move dist: [29, 31, 35, 21, 32, 19, 15]
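With only 182 samples, it is worth confirming that a stratified split is feasible before calling `train_test_split(..., stratify=y)`. The sketch below is an assumption-laden pre-check, not scikit-learn's own code: it mirrors the conditions scikit-learn is understood to enforce (at least 2 samples per class, and at least as many train and validation samples as classes). `check_stratifiable` is a hypothetical helper name.

```python
import math

import numpy as np


def check_stratifiable(y, val_size, num_classes=7):
    """Return a list of problems that would break a stratified split.

    An empty list means the label distribution looks safe to stratify.
    """
    y = np.asarray(y)
    counts = np.bincount(y, minlength=num_classes)
    n_val = math.ceil(val_size * len(y))  # validation samples for a float val_size
    n_train = len(y) - n_val

    problems = []
    for cls, count in enumerate(counts):
        if count < 2:
            problems.append(f"class {cls} has {count} sample(s); need at least 2")
    if n_val < num_classes:
        problems.append(f"validation size {n_val} is smaller than {num_classes} classes")
    if n_train < num_classes:
        problems.append(f"train size {n_train} is smaller than {num_classes} classes")
    return problems


# Rebuild a label vector with the 'Final move dist' counts printed above.
final_counts = [29, 31, 35, 21, 32, 19, 15]
y_demo = np.repeat(np.arange(7), final_counts)
print(check_stratifiable(y_demo, val_size=0.15))
```

For the mixed dataset above every class has well over 2 samples, so the returned list is empty; the check matters more if base mixing is skipped and a column becomes rare.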


## 4) Model Loading (Saved CNN Checkpoint)

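Load the saved `.keras` model and run a forward-pass sanity check. The same cell also performs the train/validation split and the steps described in Sections 5 through 8.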

In [18]:
# Split first so we can validate shape assumptions early
X_train, X_val, y_train, y_val, w_train, w_val = train_test_split(
    X, y, w, test_size=VAL_SIZE, random_state=SEED, stratify=y
)
y_train_oh = keras.utils.to_categorical(y_train, num_classes=7).astype(np.float32)
y_val_oh = keras.utils.to_categorical(y_val, num_classes=7).astype(np.float32)

print('Train:', X_train.shape, y_train.shape)
print('Val:', X_val.shape, y_val.shape)

# ---- Section 4: model loading + forward pass sanity ----
model = keras.models.load_model(BASE_MODEL_PATH)
_ = model(X_train[:4], training=False)
print('Loaded model and forward pass succeeded.')

# ---- Section 5: classifier head update for new classes ----
num_classes = 7
if int(model.output_shape[-1]) != num_classes:
    x = model.layers[-2].output
    new_out = keras.layers.Dense(num_classes, activation='softmax', name='policy_head_ft')(x)
    model = keras.Model(inputs=model.input, outputs=new_out)
    print('Replaced classifier head to 7 outputs.')
else:
    print('Classifier head already matches 7 outputs.')

# ---- Section 6: fine-tuning configuration (freeze/unfreeze) ----
if FREEZE_PREFIX_FRACTION > 0:
    freeze_upto = int(len(model.layers) * FREEZE_PREFIX_FRACTION)
    for layer in model.layers[:freeze_upto]:
        layer.trainable = False
    print(f'Froze first {freeze_upto} layers out of {len(model.layers)}')

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=LEARNING_RATE),
    loss='categorical_crossentropy',
    metrics=['accuracy', keras.metrics.TopKCategoricalAccuracy(k=2, name='top2_acc')]
)

# ---- Section 7 + 8: training, validation metrics, checkpointing ----
os.makedirs(os.path.dirname(OUT_MODEL_PATH), exist_ok=True)
callbacks = [
    keras.callbacks.EarlyStopping(monitor='val_loss', patience=EARLY_STOP_PATIENCE, restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, min_lr=1e-6, verbose=1),
    keras.callbacks.ModelCheckpoint(filepath=OUT_MODEL_PATH, monitor='val_loss', save_best_only=True),
]

history = model.fit(
    X_train, y_train_oh,
    validation_data=(X_val, y_val_oh, w_val),
    sample_weight=w_train,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    callbacks=callbacks,
    verbose=1,
)

print('Saved best model:', OUT_MODEL_PATH)

Train: (154, 6, 7, 2) (154,)
Val: (28, 6, 7, 2) (28,)


  saveable.load_own_variables(weights_store.get(inner_path))
Expected: ['input_layer']
Received: inputs=Tensor(shape=(4, 6, 7, 2))


Loaded model and forward pass succeeded.
Classifier head already matches 7 outputs.
Epoch 1/28
2/2 ━━━━━━━━━━━━━━━━━━━━ 40s 15s/step - accuracy: 0.4263 - loss: 2.7188 - top2_acc: 0.6456 - val_accuracy: 0.2857 - val_loss: 3.9245 - val_top2_acc: 0.6071 - learning_rate: 2.0000e-05
Epoch 2/28
2/2 ━━━━━━━━━━━━━━━━━━━━ 16s 1s/step - accuracy: 0.4402 - loss: 2.6143 - top2_acc: 0.6638 - val_accuracy: 0.2857 - val_loss: 3.9072 - val_top2_acc: 0.6071 - learning_rate: 2.0000e-05
Epoch 3/28
2/2 ━━━━━━━━━━━━━━━━━━━━ 1s 1s/step - accuracy: 0.4541 - loss: 2.5802 - top2_acc: 0.6586 - val_accuracy: 0.2857 - val_loss: 3.8908 - val_top2_acc: 0.6071 - learning_rate: 2.0000e-05
Epoch 4/28
2/2 ━━━━━━━━━━━━━━━━━━━━ 4s 4s/step - accuracy: 0.4610 - loss: 2.5496 - top2_acc: 0.6335 - val_accuracy: 0.2857 - val_loss: 3.8762 - val_top2_acc: 0.6071 - learning_rate: 2.0000e-05
Epoch

In [20]:
# ---- Section 9: inference sanity check on sample boards ----
best_model = keras.models.load_model(OUT_MODEL_PATH)
eval_out = best_model.evaluate(X_val, y_val_oh, sample_weight=w_val, verbose=0)
print('Validation metrics:', dict(zip(best_model.metrics_names, [float(v) for v in eval_out])))

hist_path = OUT_MODEL_PATH.replace('.keras', '_history.json')
with open(hist_path, 'w') as f:
    json.dump({k: [float(vv) for vv in vals] for k, vals in history.history.items()}, f, indent=2)
print('History:', hist_path)

# Show a few predictions vs labels
sample_n = min(8, len(X_val))
pred = best_model.predict(X_val[:sample_n], verbose=0)
pred_col = pred.argmax(axis=1)
conf = pred.max(axis=1)

for i in range(sample_n):
    print(f'sample {i}: pred={int(pred_col[i])} conf={float(conf[i]):.3f} label={int(y_val[i])}')

Validation metrics: {'loss': 3.532895565032959, 'compile_metrics': 0.2857142984867096}
History: /content/drive/MyDrive/Connect4_Combined/models/connect4_cnn2_loss_finetuned_history.json
sample 0: pred=3 conf=0.415 label=3
sample 1: pred=4 conf=0.463 label=4
sample 2: pred=2 conf=0.916 label=0
sample 3: pred=1 conf=0.946 label=1
sample 4: pred=4 conf=0.541 label=4
sample 5: pred=0 conf=0.696 label=2
sample 6: pred=2 conf=0.304 label=1
sample 7: pred=4 conf=0.506 label=3
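The `compile_metrics` key in the output above suggests that `metrics_names` holds fewer names than `evaluate` returns values. In that case `dict(zip(...))` silently drops the trailing value, which here would be top-2 accuracy. The sketch below reproduces the truncation with illustrative numbers (not results from this run) and shows a hypothetical `label_metrics` helper that fails loudly instead.

```python
def label_metrics(values, names):
    """Pair metric values with names, refusing to drop any value silently."""
    if len(values) != len(names):
        raise ValueError(
            f"{len(values)} metric values but {len(names)} names: {names}"
        )
    return dict(zip(names, values))


# Illustrative values only: loss, accuracy, top-2 accuracy.
values = [1.0, 0.5, 0.75]

# Two names for three values: zip stops at the shorter sequence.
truncated = dict(zip(["loss", "compile_metrics"], values))
print(truncated)  # the third value is missing

# With one name per value, nothing is lost.
complete = label_metrics(values, ["loss", "accuracy", "top2_acc"])
print(complete)
```

In the notebook itself, passing `return_dict=True` to `evaluate` is probably the simpler route, since Keras then returns the metric names and values already paired.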


In [21]:
# Compare base model vs finetuned model on same splits
base_model = keras.models.load_model(BASE_MODEL_PATH)
ft_model = keras.models.load_model(OUT_MODEL_PATH)

base_eval = base_model.evaluate(X_val, y_val_oh, sample_weight=w_val, verbose=0)
ft_eval = ft_model.evaluate(X_val, y_val_oh, sample_weight=w_val, verbose=0)

print('Base val metrics :', dict(zip(base_model.metrics_names, [float(v) for v in base_eval])))
print('FT val metrics   :', dict(zip(ft_model.metrics_names, [float(v) for v in ft_eval])))

# Loss-focused teacher set comparison (hard set)
X_ft_eval = X_ft.astype(np.float32)
y_ft_oh = keras.utils.to_categorical(y_ft, num_classes=7).astype(np.float32)

base_ft_eval = base_model.evaluate(X_ft_eval, y_ft_oh, sample_weight=w_ft, verbose=0)
ft_ft_eval = ft_model.evaluate(X_ft_eval, y_ft_oh, sample_weight=w_ft, verbose=0)

print('Base on loss-set :', dict(zip(base_model.metrics_names, [float(v) for v in base_ft_eval])))
print('FT on loss-set   :', dict(zip(ft_model.metrics_names, [float(v) for v in ft_ft_eval])))



Base val metrics : {'loss': 4.010879039764404, 'compile_metrics': 0.3571428656578064}
FT val metrics   : {'loss': 3.532895565032959, 'compile_metrics': 0.2857142984867096}
Base on loss-set : {'loss': 4.924814701080322, 'compile_metrics': 0.2637362778186798}
FT on loss-set   : {'loss': 3.9568326473236084, 'compile_metrics': 0.34065935015678406}


## 10) Go / No-Go Gates (Regression + Hard-Loss + Professor Dataset)

This section runs deployment gates:
- No harmful regression on mixed validation
- Improvement on hard loss-teacher set
- Optional comparison on professor `mcts7500_pool.pickle` dataset (if file exists)
- Final GO / NO-GO verdict
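The first two gates and the final verdict could be expressed as a small pure-Python function. This is a sketch under stated assumptions: `run_gates`, `GateResult`, and both thresholds are hypothetical, the `accuracy` values assume that the `compile_metrics` number printed by the comparison cell is top-1 accuracy, and the optional professor-dataset gate is left out because the format of `mcts7500_pool.pickle` is not shown here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class GateResult:
    name: str
    passed: bool
    detail: str


def run_gates(
    base_val: Dict[str, float],
    ft_val: Dict[str, float],
    base_hard: Dict[str, float],
    ft_hard: Dict[str, float],
    max_val_acc_drop: float = 0.02,   # hypothetical regression tolerance
    min_hard_loss_gain: float = 0.0,  # hypothetical required improvement
) -> Tuple[str, List[GateResult]]:
    """Return ('GO' | 'NO-GO', per-gate results)."""
    results = []

    # Gate 1: no harmful regression on the mixed validation split.
    acc_drop = base_val["accuracy"] - ft_val["accuracy"]
    results.append(GateResult(
        "no_regression_mixed_val",
        acc_drop <= max_val_acc_drop,
        f"accuracy drop {acc_drop:.4f}, allowed {max_val_acc_drop}",
    ))

    # Gate 2: improvement on the hard loss-teacher set.
    loss_gain = base_hard["loss"] - ft_hard["loss"]
    results.append(GateResult(
        "hard_set_improvement",
        loss_gain > min_hard_loss_gain,
        f"loss reduction {loss_gain:.4f}, required > {min_hard_loss_gain}",
    ))

    verdict = "GO" if all(r.passed for r in results) else "NO-GO"
    return verdict, results


# Metrics rounded from the comparison cell's printed output.
base_val = {"loss": 4.0109, "accuracy": 0.3571}
ft_val = {"loss": 3.5329, "accuracy": 0.2857}
base_hard = {"loss": 4.9248, "accuracy": 0.2637}
ft_hard = {"loss": 3.9568, "accuracy": 0.3407}

verdict, results = run_gates(base_val, ft_val, base_hard, ft_hard)
for r in results:
    print(f"{r.name}: {'PASS' if r.passed else 'FAIL'} ({r.detail})")
print("Verdict:", verdict)
```

Under these assumed thresholds, the accuracy drop on mixed validation (about 0.07) exceeds the tolerance, so the sketch reports NO-GO even though the hard-set loss improves. Different thresholds, or gating on loss instead of accuracy, would change the verdict.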

## 5) Classifier Head Update for New Classes

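The training cell (`In [18]`) checks the model's output width and replaces the final head only if it does not match the 7 Connect4 action classes (one per column). In the recorded run the head already had 7 outputs, so no replacement took place.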

## 6) Fine-Tuning Configuration (Freeze/Unfreeze Layers)

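Set `FREEZE_PREFIX_FRACTION` to a value in (0, 1] to freeze that leading fraction of the model's layers for safer adaptation. The recorded run uses 0.0, so every layer stays trainable.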
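To make the fraction-to-layer-count mapping concrete, the sketch below applies the same `int(len(layers) * fraction)` rule to stand-in objects that only carry a `trainable` flag. `freeze_prefix` is a hypothetical helper; unlike the training cell, it also re-enables the layers after the prefix, so repeated calls with different fractions behave predictably.

```python
from types import SimpleNamespace


def freeze_prefix(layers, fraction):
    """Freeze the first int(len(layers) * fraction) layers; unfreeze the rest.

    Works on any objects with a writable `trainable` attribute,
    such as the entries of a Keras model's `layers` list.
    Returns the number of frozen layers.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    freeze_upto = int(len(layers) * fraction)  # truncates toward zero
    for index, layer in enumerate(layers):
        layer.trainable = index >= freeze_upto
    return freeze_upto


# Stand-in layers for illustration; a real call would pass model.layers.
layers = [SimpleNamespace(name=f"layer_{i}", trainable=True) for i in range(10)]
n_frozen = freeze_prefix(layers, 0.25)
print(n_frozen, [layer.trainable for layer in layers])
```

Because of the truncation, 10 layers at a fraction of 0.25 freeze 2 layers, not 3. With a real Keras model, the change takes effect at the next `compile`, which the training cell already calls after the freeze step.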

## 7) Training Loop and Validation Metrics

The training cell tracks training and validation loss, accuracy, and top-2 accuracy. Early stopping and learning-rate reduction are both keyed to `val_loss`.
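Top-2 accuracy counts a prediction as correct when the labelled column is among the two highest-probability outputs. The NumPy sketch below illustrates the idea on a toy 3-class example; `top_k_accuracy` is a hypothetical helper and is not claimed to reproduce Keras's tie-breaking exactly.

```python
import numpy as np


def top_k_accuracy(probs, labels, k=2):
    """Fraction of rows whose label is among the k largest probabilities."""
    probs = np.asarray(probs, dtype=np.float64)
    labels = np.asarray(labels)
    # Indices of the k highest-probability classes for each row.
    top_k = np.argsort(-probs, axis=1)[:, :k]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())


probs = np.array([
    [0.6, 0.3, 0.1],  # label 0 is the top prediction
    [0.5, 0.4, 0.1],  # label 1 is only the second choice
    [0.7, 0.2, 0.1],  # label 2 is the last choice
])
labels = np.array([0, 1, 2])

top1 = top_k_accuracy(probs, labels, k=1)
top2 = top_k_accuracy(probs, labels, k=2)
print(f"top-1: {top1:.3f}, top-2: {top2:.3f}")
```

In this toy case one of three rows is a top-1 hit and two of three are top-2 hits, which is why top-2 accuracy is always at least as high as plain accuracy.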

## 8) Checkpointing and Best-Model Saving

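`ModelCheckpoint` saves the best checkpoint by validation loss to `OUT_MODEL_PATH`. The cell after training exports the training history as JSON next to it, with a `_history.json` suffix.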
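Once the history JSON exists, the best epoch can be recovered without reloading the model. The sketch below assumes the file has the shape written by the notebook (metric name mapped to a list of per-epoch floats); `load_history` and `best_epoch` are hypothetical helper names.

```python
import json


def load_history(path):
    """Load a training history JSON of the form {metric: [per-epoch values]}."""
    with open(path) as f:
        return json.load(f)


def best_epoch(history, monitor="val_loss"):
    """Return (1-based epoch, value) for the lowest value of `monitor`."""
    values = history[monitor]
    index = min(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


# val_loss for the first four epochs shown in the training log above;
# the log is cut off after that, so later epochs are not included.
partial_history = {"val_loss": [3.9245, 3.9072, 3.8908, 3.8762]}
epoch, value = best_epoch(partial_history)
print(f"best epoch so far: {epoch} (val_loss={value})")

# With the exported file:
#   history = load_history(OUT_MODEL_PATH.replace('.keras', '_history.json'))
#   print(best_epoch(history))
```

For metrics where higher is better, such as `val_accuracy`, the same idea applies with `max` in place of `min`.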

## 9) Inference Sanity Check on Sample Inputs

Runs predictions on up to 8 validation samples and prints the predicted column, its confidence (the maximum softmax probability), and the label.