# Jet Tagging with Convolutional Neural Networks (CNN)

This notebook shows how to use a CNN to classify jets as QCD or TT using jet images.

## What is a CNN?
A Convolutional Neural Network (CNN) is a type of neural network designed to process grid-like data, such as images. It uses convolutional layers to learn spatial patterns.
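
To make the convolutional layer concrete, here is a minimal NumPy sketch of the sliding-window operation it performs (strictly a cross-correlation, with no padding and stride 1). The function name `conv2d_valid` and the toy inputs are illustrative only and are not part of this notebook's utilities.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel with the current image patch
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 4x4 "image" with one bright pixel and a 2x2 summing kernel
image = np.zeros((4, 4))
image[1, 2] = 5.0
kernel = np.ones((2, 2))

response = conv2d_valid(image, kernel)
print(response.shape)  # (3, 3)
```

Only the four output positions whose window covers the bright pixel are non-zero, which is the sense in which a filter responds to local spatial patterns.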

## Why use CNNs for Jet Tagging?
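- Jet images are a natural input: a CNN operates directly on the 2D pixel grid
- Can learn spatial patterns in energy deposits
- Approximately translation invariant: convolutions respond the same way wherever a pattern appears, and pooling adds tolerance to small shifts
- Parameter efficient: each filter's weights are shared across all image positions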
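
The parameter-efficiency point can be checked with simple arithmetic. The sketch below compares a 3x3 convolutional layer with 32 filters against a dense layer with 32 units acting on the flattened image. The 40x40 single-channel image size is an assumption for illustration; the actual size comes from `X_train.shape`.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter; independent of image size."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(n_inputs, units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return n_inputs * units + units

# Assumed image size, for illustration only
height, width, channels = 40, 40, 1

conv_count = conv2d_params(3, 3, channels, 32)
dense_count = dense_params(height * width * channels, 32)
print(conv_count, dense_count)  # 320 51232
```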

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from jet_utils import load_images, preprocess_jet_images
from jet_plotting_utils import plot_jet_image, plot_training_history, plot_confusion_matrix, plot_roc_curve

## 1. Load and Prepare Data

We'll use the jet images directly as input to our CNN. Each image represents the energy deposits in the η-φ plane.
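
For intuition, a jet image of this kind is often built by histogramming constituent transverse momentum on an η-φ grid centred on the jet axis. The sketch below shows that idea under stated assumptions: the function `make_jet_image`, the 40x40 grid, the ±1.0 window, and the three constituents are all hypothetical, and `load_images()` may construct its images differently.

```python
import numpy as np

def make_jet_image(eta, phi, pt, n_pixels=40, half_width=1.0):
    """Histogram constituent pT on an (eta, phi) grid centred on the jet axis."""
    edges = np.linspace(-half_width, half_width, n_pixels + 1)
    image, _, _ = np.histogram2d(eta, phi, bins=(edges, edges), weights=pt)
    return image

# Three made-up constituents; coordinates are relative to the jet axis
eta = np.array([0.0, 0.3, -0.5])
phi = np.array([0.0, -0.2, 0.4])
pt = np.array([50.0, 20.0, 10.0])

image = make_jet_image(eta, phi, pt)
print(image.shape)  # (40, 40)
```

Each pixel holds the summed pT of the constituents that fall into it, so the total pixel intensity equals the summed constituent pT inside the window.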

In [None]:
X_train, y_train, train_ids, X_val_, y_val, val_ids, X_test, test_ids = load_images()

In [None]:

# Visualize a sample jet image
plot_jet_image(X_train[3400, :, :], "Sample Jet Image")

## 2. Build CNN Model

We'll create a CNN with:
- Convolutional layers to learn spatial patterns
- MaxPooling layers to reduce spatial dimensions
- Dense layers for classification
- Dropout for regularization
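
It helps to trace how the spatial size shrinks through such a stack. The sketch below assumes three 3x3 convolutions without padding, with 2x2 max pooling after the first two, and a hypothetical 40x40 input; the helper names are illustrative.

```python
def conv_out(size, kernel=3):
    """Output side length of an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output side length of max pooling with stride equal to the pool size."""
    return size // pool

side = 40  # assumed input side length; check X_train.shape for the real value
stages = [("conv", 3), ("pool", 2), ("conv", 3), ("pool", 2), ("conv", 3)]

sides = [side]
for kind, k in stages:
    side = conv_out(side, k) if kind == "conv" else pool_out(side, k)
    sides.append(side)

# Number of features entering the first dense layer if the last conv has 64 filters
flattened = side * side * 64
print(sides, flattened)  # [40, 38, 19, 17, 8, 6] 2304
```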

In [None]:
optimizer = keras.optimizers.SGD(
    learning_rate=0.05,
    momentum=0.9,
    nesterov=True
)

def build_cnn_model(input_shape):
    model = keras.Sequential([
        # First convolutional block
        keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),

        keras.layers.BatchNormalization(),
        keras.layers.MaxPooling2D((2, 2)),
        
        # Second convolutional block
        keras.layers.Conv2D(64, (3, 3), activation='relu'),
        keras.layers.MaxPooling2D((2, 2)),
        
        # Third convolutional block
        keras.layers.Conv2D(64, (3, 3), activation='relu'),
        
        # Flatten and dense layers
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation='relu'),
        keras.layers.Dropout(0.25),
        keras.layers.Dense(1, activation='sigmoid')
    ])
    
    model.compile(optimizer=optimizer,
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    
    return model

# Create and compile model
model = build_cnn_model(X_train.shape[1:])
model.summary()

In [None]:
X_train.shape

## 3. Train Model

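We'll train the CNN with:
- Early stopping (patience of 10 epochs, restoring the best weights) to prevent overfitting
- A 20% validation split of the training data to monitor performance
- Learning-rate reduction on plateau: the rate is halved after 5 epochs without improvement in validation loss
- Batch size of 32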
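
The early-stopping rule can be sketched in a few lines of plain Python. This is a simplified illustration of the patience logic, not the Keras implementation, which has further options such as `min_delta`; the function name and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    stop_epoch is None if the patience is never exhausted.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember this epoch and reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

losses = [0.70, 0.60, 0.55, 0.56, 0.57, 0.58, 0.54]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)  # 5 2
```

Training stops at epoch 5, before the later improvement at epoch 6 is ever seen, which is why the patience should be longer than the typical plateau. Restoring the best weights corresponds to rolling back to `best_epoch`.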

In [None]:
lr_scheduler = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss',     # or 'val_accuracy'
    factor=0.5,
    patience=5,
    verbose=1,
    min_lr=1e-6
)
# Train model
history = model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    callbacks=[
        keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
        lr_scheduler,
    ]
)

# Plot training history
plot_training_history(history)

## 4. Evaluate Model

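Let's evaluate our model's performance on the held-out validation set (`X_val_`, `y_val`). `load_images()` returns no labels for the test set, so accuracy cannot be computed there.

In [None]:
# Evaluate on the held-out validation set
val_loss, val_accuracy = model.evaluate(X_val_, y_val)
print(f"Validation Accuracy: {val_accuracy:.4f}")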

# Make predictions (sigmoid outputs in [0, 1])
y_pred = model.predict(X_val_)
# Threshold at 0.5 to obtain discrete class labels
y_pred_discrete = (y_pred > 0.5).astype(int)

# Plot confusion matrix
plot_confusion_matrix(y_val, y_pred_discrete)

In [None]:
plot_roc_curve(y_val, y_pred)
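
The confusion matrix and ROC curve are both built from the same four counts. The sketch below shows how one threshold yields a confusion matrix and one point on the ROC curve, treating label 1 as the positive class. The helper names and toy values are illustrative; they are not the implementation inside `jet_plotting_utils`.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Return (tp, fp, tn, fn) for binary labels and scores cut at `threshold`."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score > threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def roc_point(y_true, scores, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    return fp / (fp + tn), tp / (tp + fn)

# Toy labels and classifier scores
y_true = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.6, 0.8, 0.4, 0.9, 0.3]

counts = confusion_counts(y_true, scores, threshold=0.5)
print(counts)  # (2, 1, 2, 1)
```

Sweeping the threshold from 1 down to 0 and collecting `roc_point` at each value traces out the full ROC curve; lowering the threshold raises both rates.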

## 5. Make Predictions

In [None]:
import pandas as pd

# Predict class probabilities for the test set
test_predictions = model.predict(X_test)

In [None]:
print(test_ids.shape)
print(test_predictions.shape)


In [None]:
# Ensure 1D arrays
test_ids = np.ravel(test_ids)
test_predictions = np.ravel(test_predictions)

# Create submission DataFrame
solution = pd.DataFrame({'id': test_ids, 'label': test_predictions})
solution.to_csv('submission.csv', index=False)