## 1) Turn on GPU
Kaggle notebook → **Settings** → **Accelerator** → **GPU**.
If a session is already running, restart it after changing the accelerator.

In [None]:
import torch
print('torch:', torch.__version__)
print('cuda available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('gpu:', torch.cuda.get_device_name(0))

## 2) Add your Dataset input (ZIP upload)
Kaggle has a **1000-file limit** for individual uploads, so upload a single archive: `kaggle_upload.zip`
(which you already created locally).

On Kaggle: create a Dataset containing only `kaggle_upload.zip`, then use **Add Input** to attach that dataset to this notebook.
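If the archive ever needs to be rebuilt, a minimal local sketch could look like the following. The helper name `make_upload_zip` and its arguments are hypothetical; the only assumption taken from this notebook is that the project files (`train.py`, `infer.py`, `train/`, `test/`) sit at the top level of the archive.

```python
import shutil
from pathlib import Path


def make_upload_zip(project_dir, out_dir, name="kaggle_upload"):
    """Pack project_dir into out_dir/<name>.zip and return the archive path.

    Hypothetical helper: run it on your own machine, not on Kaggle.
    """
    project_dir = Path(project_dir).resolve()
    out_dir = Path(out_dir).resolve()
    # Writing the archive inside the folder being archived could make the
    # zip try to include itself, so require a separate output folder.
    if out_dir == project_dir or project_dir in out_dir.parents:
        raise ValueError("out_dir must be outside project_dir")
    out_dir.mkdir(parents=True, exist_ok=True)
    # root_dir makes the paths inside the zip relative to the project root.
    archive = shutil.make_archive(str(out_dir / name), "zip", root_dir=str(project_dir))
    return Path(archive)
```

Because the paths inside the zip are relative to the project root, unpacking it into a working directory puts `train.py` directly in that directory.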

In [None]:
# IMPORTANT: set this to the exact folder shown under Kaggle 'Inputs'
# Tip: you can run `!ls /kaggle/input` in a separate cell to see the names.
DATASET_DIR = '/kaggle/input/PUT_YOUR_DATASET_NAME_HERE'

import os, shutil
from pathlib import Path

WORK_DIR = '/kaggle/working/proj'
os.makedirs(WORK_DIR, exist_ok=True)

p = Path(DATASET_DIR)
if not p.exists():
    raise FileNotFoundError(f"DATASET_DIR does not exist: {DATASET_DIR}. Check /kaggle/input and copy the exact path.")

# Show what's inside the input directory (helps debugging)
items = sorted([x.name for x in p.iterdir()])
print('DATASET_DIR items (first 50):', items[:50])

# Case A: zip exists somewhere inside the input (recommended upload)
zip_paths = sorted(p.rglob('*.zip'))
if zip_paths:
    zip_path = str(zip_paths[0])
    print('Found zip:', zip_path)
    shutil.unpack_archive(zip_path, WORK_DIR)
else:
    # Case B: Kaggle auto-unzipped OR you uploaded files directly.
    # Try to find a directory that looks like the project root.
    candidates = [p] + [x for x in p.iterdir() if x.is_dir()]
    proj = None
    for c in candidates:
        if (c / 'train.py').exists() and (c / 'infer.py').exists() and (c / 'train').exists() and (c / 'test').exists():
            proj = c
            break
    if proj is None:
        matches = sorted(p.rglob('train.py'))
        if matches:
            proj = matches[0].parent
    if proj is None:
        raise FileNotFoundError(
            'No .zip found and could not locate project root (train.py). '
            'Open the Inputs panel and verify your dataset contains kaggle_upload.zip or the unzipped project files.'
        )
    print('Using project root:', str(proj))
    shutil.copytree(proj, WORK_DIR, dirs_exist_ok=True)

%cd /kaggle/working/proj
print('cwd:', os.getcwd())
print('top-level files:', sorted(os.listdir('.'))[:30])


In [None]:
!pip -q install -r requirements.txt

## 3) Train (strong settings)
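Start with these. If you get a CUDA out-of-memory (OOM) error, change to `--batch_size 16 --accum_steps 2`; assuming `--accum_steps` accumulates gradients across steps, this keeps the effective batch size at 32.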
This run also tunes per-class thresholds on validation and stores them in the checkpoint.
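
To make the idea of per-class threshold tuning concrete, here is a minimal sketch in plain Python. It is not the code in `train.py`: the function names are hypothetical, and it assumes the metric being maximized is per-class F1 over a fixed grid of candidate thresholds, which this notebook does not confirm.

```python
def f1_score(y_true, y_pred):
    """Binary F1 from two equal-length lists of 0/1 values."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_thresholds(probs, labels, grid=None):
    """Pick one threshold per class that maximizes F1 on validation data.

    probs, labels: lists of rows, one row per clip, one column per class.
    Assumption: F1 is the target metric; swap in the real metric if it differs.
    """
    if grid is None:
        grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    best = []
    for c in range(len(probs[0])):
        col_p = [row[c] for row in probs]
        col_y = [row[c] for row in labels]
        scores = [(f1_score(col_y, [int(p >= t) for p in col_p]), t) for t in grid]
        best_f1 = max(s for s, _ in scores)
        # Among equally good thresholds, prefer the one closest to 0.5.
        best.append(min((t for s, t in scores if s == best_f1), key=lambda t: abs(t - 0.5)))
    return best
```

At inference time, a prediction for class `c` would then count as positive when its probability is at least `best[c]` instead of a global 0.5.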

Upgrades used here:
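- `--eval_with_ema`: evaluates and checkpoints using the EMA (exponential moving average) weights, which usually score better on the leaderboard.
- `--train_use_val`: includes validation clips during training (often boosts the leaderboard score; it *does* leak val into train, so validation metrics and the thresholds tuned on them will look optimistic).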
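
For readers unfamiliar with EMA weights, the standard update rule is sketched below on a toy dictionary of scalar weights. This is the usual textbook formulation, not necessarily what `train.py` does; `ema_update` is a hypothetical name, and `decay` plays the role that `--ema_decay` presumably plays.

```python
def ema_update(ema_weights, model_weights, decay=0.999):
    """Return updated EMA weights: ema = decay * ema + (1 - decay) * current.

    Both arguments are dicts mapping a parameter name to a float.
    Toy sketch: real models hold tensors, but the arithmetic is the same.
    """
    return {
        name: decay * ema_weights[name] + (1.0 - decay) * model_weights[name]
        for name in ema_weights
    }


# Example: the EMA copy trails the raw weight and smooths out its jumps.
ema = {"w": 0.0}
for raw in (1.0, 1.0, 0.0):
    ema = ema_update(ema, {"w": raw}, decay=0.9)
```

With a decay close to 1 (such as 0.999), the EMA copy changes slowly, so it averages over many recent optimizer steps.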

In [None]:
# Single strong run (recommended first)
!python train.py \
  --epochs 50 \
  --batch_size 32 \
  --accum_steps 1 \
  --lr 1e-3 \
  --weight_decay 1e-2 \
  --specaug \
  --mixup_alpha 0.2 \
  --focal_gamma 1.5 \
  --tune_thresholds \
  --train_use_val \
  --ema_decay 0.999 \
  --eval_with_ema \
  --seed 42 \
  --out checkpoints/crnn_adv_s42.pt

# Optional: 3-seed ensemble (takes longer, but usually scores better on the leaderboard)
# !python train.py \
#   --epochs 50 \
#   --batch_size 32 \
#   --accum_steps 1 \
#   --lr 1e-3 \
#   --weight_decay 1e-2 \
#   --specaug \
#   --mixup_alpha 0.2 \
#   --focal_gamma 1.5 \
#   --tune_thresholds \
#   --train_use_val \
#   --ema_decay 0.999 \
#   --eval_with_ema \
#   --seed 43 \
#   --out checkpoints/crnn_adv_s43.pt
# !python train.py \
#   --epochs 50 \
#   --batch_size 32 \
#   --accum_steps 1 \
#   --lr 1e-3 \
#   --weight_decay 1e-2 \
#   --specaug \
#   --mixup_alpha 0.2 \
#   --focal_gamma 1.5 \
#   --tune_thresholds \
#   --train_use_val \
#   --ema_decay 0.999 \
#   --eval_with_ema \
#   --seed 44 \
#   --out checkpoints/crnn_adv_s44.pt

## 4) Inference → submission.csv
`infer.py` writes the required `id` column by default.

Upgrades used here:
- `--use_ema`: use EMA weights saved in the checkpoint (if available).
- `--tta_shifts 1`: test-time augmentation (TTA) that averages predictions over small time shifts of the input (often helps).
- Multiple `--ckpt ...` arguments: ensemble several checkpoints.
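
The two averaging ideas above can be sketched in plain Python as follows. Everything here is illustrative: the function names are hypothetical, the shifts are circular and range from `-max_shift` to `+max_shift`, and each model is represented by a `predict_fn` that maps a list of time frames to a list of class probabilities. How `infer.py` actually interprets `--tta_shifts` is not shown in this notebook.

```python
def shift_time(frames, k):
    """Circularly shift a list of time frames by k steps (k may be negative)."""
    if not frames:
        return list(frames)
    k %= len(frames)
    return frames[-k:] + frames[:-k] if k else list(frames)


def mean_rows(rows):
    """Element-wise mean of several equal-length prediction lists."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def predict_with_tta(predict_fn, frames, max_shift=1):
    """Average one model's predictions over shifted copies of the input."""
    preds = [predict_fn(shift_time(frames, k)) for k in range(-max_shift, max_shift + 1)]
    return mean_rows(preds)


def ensemble_predict(predict_fns, frames, max_shift=1):
    """Average the TTA predictions of several models (one per checkpoint)."""
    return mean_rows([predict_with_tta(fn, frames, max_shift) for fn in predict_fns])
```

Averaging probabilities is only one way to combine models; it is used here because it is the simplest to reason about.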

In [None]:
# Single model inference
!python infer.py --ckpt checkpoints/crnn_adv_s42.pt --use_ema --tta_shifts 1 --batch_size 32 --out submission.csv

# Optional: ensemble inference (uncomment if you trained s43/s44)
# !python infer.py \
#   --ckpt checkpoints/crnn_adv_s42.pt \
#   --ckpt checkpoints/crnn_adv_s43.pt \
#   --ckpt checkpoints/crnn_adv_s44.pt \
#   --use_ema --tta_shifts 1 \
#   --batch_size 32 \
#   --out submission.csv

!python -c "import pandas as pd; df=pd.read_csv('submission.csv'); print(df.head()); print('rows:', len(df)); print('cols:', list(df.columns))"

## 5) Submit
Download `submission.csv` from the Kaggle notebook **Output** panel and upload it to the competition submission page.