Path A: The Autoencoder Approach (Learned Features): The model figures out what's important by itself.
Path B: The Manual Labeling Approach (Engineered Features): You, the human, tell the model what's important by labeling it.

The best project will use the Manual Labeling approach as a baseline to prove how powerful the Autoencoder is


That's an excellent strategic question. The choice of boss will have a significant impact on the difficulty of your project and the clarity of your results. An easier boss is definitely better, but "easy" for an AI has a different meaning than "easy" for a human.

Here is a breakdown of boss recommendations, from best to worst, specifically for your imitation learning project.

---

### **Top Recommendation: Cagney Carnation**

This is, without a doubt, your best choice. Cagney provides the ideal balance of simplicity and variety for this project.

**Reasons Why Cagney is Perfect:**

1.  **Fixed Arena & Camera:** The screen does not scroll. The camera is locked in place, and the ground platforms never move. This is a **massive** advantage. It means the background is static, making it much easier for the autoencoder to learn to ignore it and focus only on the moving elements (Cagney, projectiles, you).
2.  **Clear, Telegraphed Attacks:** Cagney's attacks are visually distinct and have obvious tells.
    *   *Face Lunge:* His face stretches out.
    *   *Seed Gatling Gun:* He opens his petals and shoots seeds in clear patterns.
    *   *Flying Acorns:* Acorns fly in predictable arcs.
    *   This clarity means there is a strong, learnable correlation between "what's on screen" and "what the player should do." Your model can learn rules like "IF big face stretches, THEN dash."
3.  **Consistent Player Position:** You spend almost the entire fight on the ground level, moving left and right. There's very little vertical platforming. This simplifies the "state" of the player, making your actions (jump, dash, shoot) more consistent and easier for the model to imitate.
4.  **Manageable Final Phase:** The final phase with the thorny vines is visually distinct but still predictable. The patterns are regular, making it a solvable problem for the AI.

**In short, the Cagney Carnation fight is a highly constrained and repeatable scientific experiment, which is exactly what you want for a machine learning project.**

---

### **Good Alternative: Goopy Le Grande**

If you want a slightly different but still very manageable boss, Goopy is a strong second choice.

**Reasons Why Goopy is a Good Choice:**

1.  **Fixed Arena:** Just like Cagney, the camera is locked, and the ground is flat. This is a huge bonus.
2.  **Simple Visuals:** Goopy is just a big blue blob. His attacks (bouncing, punching) are visually simple. There are very few projectiles on screen at any given time, reducing visual clutter.
3.  **Distinct Phases:** His transformation into a giant Goopy and then a tombstone are visually dramatic and distinct. This is a great opportunity to analyze your autoencoder's latent space (the t-SNE plot) to see if it creates three separate "mega-clusters" corresponding to the three phases of the fight.

**The Downside:** The fight is almost *too* simple. The required actions are less varied than in the Cagney fight (mostly just jumping over him), which might make the results slightly less interesting, but it's a very safe and reliable choice.


# Sync check

https://www.loom.com/share/2841d59aadc14c17be7281f7e3fc98c2

In [None]:
import os
# Set Keras backend to TensorFlow
os.environ["KERAS_BACKEND"] = "tensorflow"

from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Conv2DTranspose
from keras.models import Model
from sklearn.model_selection import train_test_split

print("✅ Keras imports successful!")

: 

In [None]:
import json
import cv2
import numpy as np
import pandas as pd
from pathlib import Path
from tqdm.notebook import tqdm
import math

# --- Configuration (remains the same) ---
SESSION_NAME = "Cagney_1"
DATA_DIR = Path("../data/sessions")
VIDEO_PATH = DATA_DIR / f"{SESSION_NAME}.mp4"
LOG_PATH = DATA_DIR / f"{SESSION_NAME}.jsonl"
IMG_HEIGHT, IMG_WIDTH = 72, 128
ACTIONS = ['Key.up', 'Key.down', 'Key.left', 'Key.right', 'Key.space', 'f', 'd', 'x', 'a']
ACTION_MAP = {action: i for i, action in enumerate(ACTIONS)}
NUM_ACTIONS = len(ACTIONS)

# --- Helper functions (remain the same) ---
def get_fight_intervals(log_path):
    intervals = []
    start_time = None
    with open(log_path, 'r') as f:
        for line in f:
            data = json.loads(line)
            if data.get('event') == 'marker':
                if data['type'] == 'fight_start': start_time = data['t']
                elif data['type'] == 'fight_end' and start_time is not None:
                    intervals.append((start_time, data['t']))
                    start_time = None
    return intervals

def get_key_state_timeline(log_path):
    key_events = []
    with open(log_path, 'r') as f:
        for line in f:
            data = json.loads(line)
            if data.get('event') in ['keydown', 'keyup']:
                key_events.append((data['t'], data['key'], data['event']))
    key_events.sort(key=lambda x: x[0])
    return key_events

def get_keys_down_at_time(timeline, current_time):
    keys_down = set()
    for t, key, event in timeline:
        if t > current_time: break
        if event == 'keydown': keys_down.add(key)
        elif event == 'keyup': keys_down.discard(key)
    return keys_down

# --- NEW: The Data Generator ---
def data_generator(video_path, log_path, batch_size=32):
    fight_intervals = get_fight_intervals(log_path)
    key_timeline = get_key_state_timeline(log_path)
    
    cap = cv2.VideoCapture(str(video_path))
    fps = cap.get(cv2.CAP_PROP_FPS)
    
    while True: # Loop forever so the generator never terminates
        frames_batch = []
        labels_batch = []
        
        while len(frames_batch) < batch_size:
            ret, frame = cap.read()
            if not ret:
                # Reached end of video, reset the capture to the beginning
                cap.set(cv2.CAP_PROP_POS_FRAMES, 0)
                continue

            current_time = cap.get(cv2.CAP_PROP_POS_MSEC) / 1000.0
            is_in_fight = any(start <= current_time <= end for start, end in fight_intervals)

            if is_in_fight:
                processed_frame = cv2.resize(frame, (IMG_WIDTH, IMG_HEIGHT))
                processed_frame = cv2.cvtColor(processed_frame, cv2.COLOR_BGR_GRAY)
                processed_frame = processed_frame / 255.0
                frames_batch.append(processed_frame)

                keys_down = get_keys_down_at_time(key_timeline, current_time)
                label = np.zeros(NUM_ACTIONS, dtype=int)
                for key in keys_down:
                    if key in ACTION_MAP:
                        label[ACTION_MAP[key]] = 1
                labels_batch.append(label)

        # Convert batch to numpy arrays and yield
        X_batch = np.array(frames_batch).reshape(-1, IMG_HEIGHT, IMG_WIDTH, 1)
        y_batch = np.array(labels_batch)
        yield X_batch, y_batch
        
# --- Get Total Number of Fight Frames (for training steps calculation) ---
def count_fight_frames(video_path, log_path):
    fight_intervals = get_fight_intervals(log_path)
    cap = cv2.VideoCapture(str(video_path))
    fps = cap.get(cv2.CAP_PROP_FPS)
    count = 0
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    for i in tqdm(range(total_frames)):
        current_time = i / fps
        if any(start <= current_time <= end for start, end in fight_intervals):
            count += 1
    cap.release()
    return count

print("Calculating total number of valid fight frames...")
# This will still take a minute, but uses almost no memory
total_fight_frames = count_fight_frames(VIDEO_PATH, LOG_PATH) 
print(f"Found {total_fight_frames} frames within fight intervals.")

Calculating total number of valid fight frames...


  0%|          | 0/7222 [00:00<?, ?it/s]

Found 6118 frames within fight intervals.


In [2]:
print("--- Step 1: Defining the Encoder Architecture ---")

# --- Encoder ---
input_img = Input(shape=(IMG_HEIGHT, IMG_WIDTH, 1), name="Encoder_Input")
print(f"Input Shape: {input_img.shape}")

# First convolutional block
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
print(f"Shape after 1st Conv+Pool block: {x.shape}")

# Second convolutional block
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
print(f"Shape after 2nd Conv+Pool block: {x.shape}")

# Third convolutional block
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same', name="Latent_Vector")(x) 
print(f"Shape of final ENCODED representation (Latent Vector): {encoded.shape}")

print("\n--- Step 2: Defining the Decoder Architecture ---")

# --- Decoder ---
# First deconvolutional block
x = Conv2DTranspose(64, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
print(f"Shape after 1st Deconv+UpSample block: {x.shape}")

# Second deconvolutional block
x = Conv2DTranspose(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
print(f"Shape after 2nd Deconv+UpSample block: {x.shape}")

# Third deconvolutional block
x = Conv2DTranspose(16, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
print(f"Shape after 3rd Deconv+UpSample block: {x.shape}")

# Final output layer to reconstruct the image
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same', name="Decoder_Output")(x)
print(f"Shape of final DECODED output: {decoded.shape}")


print("\n--- Step 3: Assembling and Compiling Models ---")

# The full autoencoder model (trains both encoder and decoder)
autoencoder = Model(input_img, decoded, name="Autoencoder")
print("✅ Autoencoder model assembled.")

# A separate model that is just the encoder part
encoder = Model(input_img, encoded, name="Encoder")
print("✅ Encoder-only model assembled.")

# Compile the autoencoder for training
autoencoder.compile(optimizer='adam', loss='mse')
print("✅ Autoencoder compiled with 'adam' optimizer and 'mse' loss.")


print("\n--- Final Model Summary ---")
# Print the model summary to verify the architecture
autoencoder.summary()

--- Step 1: Defining the Encoder Architecture ---


NameError: name 'Input' is not defined

In [None]:


# In your training and visualization cell
import matplotlib.pyplot as plt

# --- Create Generators for Training and Validation ---
BATCH_SIZE = 32
# These variables should have been calculated in your first cell.
# A common split is 80/20 for train/validation.
train_frames = int(total_fight_frames * 0.8)
val_frames = total_fight_frames - train_frames

steps_per_epoch = train_frames // BATCH_SIZE
validation_steps = val_frames // BATCH_SIZE

print(f"Training on {train_frames} frames, validating on {val_frames} frames.")
print(f"Steps per epoch: {steps_per_epoch}")
print(f"Validation steps: {validation_steps}")

# Initialize the generators
train_gen = data_generator(VIDEO_PATH, LOG_PATH, batch_size=BATCH_SIZE)
val_gen = data_generator(VIDEO_PATH, LOG_PATH, batch_size=BATCH_SIZE)


# --- Train the Autoencoder using the Generator ---
# The autoencoder learns to reconstruct its own input, so the generator
# works perfectly here. The labels (y_batch) are simply ignored.
print("\nStarting autoencoder training...")
history = autoencoder.fit(
    train_gen,
    steps_per_epoch=steps_per_epoch,
    epochs=20, # 20 is a great starting point
    validation_data=val_gen,
    validation_steps=validation_steps
)
print("Autoencoder training complete.")


# --- Save the Encoder Model ---
# We only need the encoder part for the next stage.
# Using the .keras extension is the modern, recommended format.
encoder.save('cuphead_encoder.keras')
print("Encoder model saved to cuphead_encoder.keras")


# --- Visualize the Reconstructions ---
# We can't use `X_val` because it doesn't exist.
# Instead, we'll grab ONE batch from our validation generator to see how well it works.
print("\nGenerating sample reconstructions for visualization...")
X_val_sample, _ = next(val_gen) # The `_` ignores the labels

# Use the trained autoencoder to predict (reconstruct) this sample batch
decoded_imgs = autoencoder.predict(X_val_sample)

# --- Plot the results ---
n = 8  # Let's display 8 images for a good overview
plt.figure(figsize=(18, 4))
plt.suptitle("Autoencoder Reconstruction Quality", fontsize=16)
for i in range(n):
    # Display original image
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(X_val_sample[i].reshape(IMG_HEIGHT, IMG_WIDTH), cmap='gray')
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    if i == 0:
        ax.set_title("Original", loc='left', fontsize=10, y=-0.4)


    # Display reconstructed image
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(IMG_HEIGHT, IMG_WIDTH), cmap='gray')
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    if i == 0:
        ax.set_title("Reconstructed", loc='left', fontsize=10, y=-0.4)

plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()

UsageError: Line magic function `%` not found.
