Performance Analysis & Inferences
1. The "Smoothing" Paradox

        In the Deeper CNN: Smoothing barely changes "Soft" Top-1 (60.0% vs 60.2%) but drops "Hard" Top-1 from 48.8% to 40.5% (compare Best 18 and 21). A likely reason: a CNN is a pattern matcher, and smoothing removes the high-frequency detail it uses to tell songs apart.


        In LSTMs: Smoothing acts as a Stabilizer. Look at Best 19/20: the accuracy for Soft and Hard is almost identical. This suggests the LSTM largely ignores whether the input is noisy or clean; it mostly cares about the melodic sequence.

2. CNN vs. LSTM (Precision vs. Robustness)

        The Deeper CNN (Best 18/21) is your "High-Accuracy" specialist. It hits 60% Top-1 on Soft Hum. It is excellent at recognizing clear, well-sung melodies.

        The LSTM (Best 19/20) is your "Tank." It is much more consistent. Notice that Best 19 has identical Top-1 scores (53.8%) for both Soft and Hard. It is completely unfazed by the noise and jitter of a "Hard Hum."

3. The "Hard Hum" Champions

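        Winners: Best 16 (Old + Smoothing) at 51.5% and Best 19 (LSTM + Smoothing) at 53.8% Hard Top-1.

        Loser: Best 15 (Old, No Smoothing) at 11.33%.

        Inference: For the original CNN, the smoothed run lifted Hard Top-1 from 11.33% to 51.5%. The OLD+SMOOTHING training script also uses a different augmentation recipe (more noise, an always-on and wider time warp) and 110 instead of 100 epochs, so part of that gain may come from those changes rather than from smoothing alone. Either way, without smoothing, extra depth, or sequence modeling (LSTM), the model cannot "see" through the noise of a bad hum.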


TABLE

        | Model Style            | Best Top-1 (Soft) | Best Top-1 (Hard) | Character                                        |
        |------------------------|-------------------|-------------------|--------------------------------------------------|
        | Deeper CNN (No Smooth) | 60.0%             | 48.8%             | High precision, moderate noise resistance.       |
        | Deeper CNN (Smooth)    | 60.2%             | 40.5%             | Great for "singing," bad for "humming."          |
        | LSTM (Smooth)          | 53.8%             | 53.8%             | Rock solid. Performance doesn't drop with noise. |
        | Old Model (Smooth)     | 53.2%             | 51.5%             | Surprisingly decent, but limited growth.         |
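One way to read the Character column numerically is to divide Hard Top-1 by Soft Top-1: the closer the ratio is to 1.0, the less a model loses when the query turns into a noisy hum. A small sketch using the figures from the table above (the "retention" ratio and the helper names are illustrative and were not computed in the original runs):

```python
# Soft/Hard Top-1 figures (percent) copied from the table above.
RESULTS = {
    "Deeper CNN (No Smooth)": (60.0, 48.8),
    "Deeper CNN (Smooth)": (60.2, 40.5),
    "LSTM (Smooth)": (53.8, 53.8),
    "Old Model (Smooth)": (53.2, 51.5),
}


def hard_retention(soft: float, hard: float) -> float:
    """Fraction of the Soft Top-1 accuracy that survives the Hard (humming) condition."""
    return hard / soft


def rank_by_robustness(results):
    """Return (model, retention) pairs, most noise-robust first."""
    scored = [(name, hard_retention(s, h)) for name, (s, h) in results.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, ratio in rank_by_robustness(RESULTS):
        print(f"{name:<24} keeps {ratio:.1%} of its Soft Top-1")
```

On these numbers the LSTM keeps 100%, the old smoothed model about 96.8%, and the deeper CNN about 81.3% without smoothing and 67.3% with it, which matches the "Tank" vs "specialist" reading above.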

ALL IN ONE


In [None]:
!pip install torchcodec
!pip install torchcrepe

In [None]:
!pip install yt_dlp

In [None]:
!apt-get update
!apt-get install -y aria2


DAY 2

In [None]:
import shutil
import os

# folders used by your script
folders_to_clean = [
    "/content/data_unique",
    "/content/eval",
    "/content/tmp_extract"
]

for folder in folders_to_clean:
    if os.path.exists(folder):
        print(f"🧹 Removing: {folder}")
        shutil.rmtree(folder)
    else:
        print(f"✔ Skipped (not found): {folder}")

# recreate empty directories
for folder in folders_to_clean:
    os.makedirs(folder, exist_ok=True)

print("\n✨ Clean reset complete. Run your ZIP → VALIDATE → DEDUP script now.")


🧹 Removing: /content/data_unique
🧹 Removing: /content/eval
🧹 Removing: /content/tmp_extract

✨ Clean reset complete. Run your ZIP → VALIDATE → DEDUP script now.


In [None]:
# =============================================================
# ZIP -> EXTRACT -> VALIDATE -> DEDUP (with limit) -> SAVE UNIQUE
# =============================================================
import os
import zipfile
import numpy as np
from tqdm import tqdm
import hashlib
import csv
import random
import shutil
from collections import defaultdict

# ---------- CONFIGURATION ----------
ZIP_DIR = "/content/zips"
EXTRACT_TMP = "/content/tmp_extract"
OUTPUT_DIR = "/content/data_unique"  # unique, validated npy
EVAL_DIR = "/content/eval"           # evaluation set
METADATA_CSV = os.path.join(OUTPUT_DIR, "metadata.csv")

# NEW CONFIG: Control how many duplicates of the exact same array content are kept
MAX_DUPLICATE_COPIES = 1
EVAL_N = 400

# Ensure directories exist
os.makedirs(EXTRACT_TMP, exist_ok=True)
os.makedirs(OUTPUT_DIR, exist_ok=True)
os.makedirs(EVAL_DIR, exist_ok=True)

# ---------- FIND ZIPs ----------
# Sorted so that which copy of a duplicate gets kept does not depend on listdir order
zip_files = sorted(f for f in os.listdir(ZIP_DIR) if f.endswith(".zip"))
print(f"📦 Found {len(zip_files)} ZIP files")

# ---------- VALIDATION ----------
def is_valid_pitch(arr):
    # Pitch-specific checks: numeric, 1-D, non-empty, no NaN/Inf, not all zeros
    if not isinstance(arr, np.ndarray): return False
    if not np.issubdtype(arr.dtype, np.number): return False
    if arr.ndim != 1: return False
    if arr.size == 0: return False
    if not np.isfinite(arr).all(): return False  # rejects NaN and +/-Inf
    if not np.any(arr): return False             # all-zero (fully unvoiced) contour
    return True

# ---------- HASHING ----------
def md5_of_array(arr):
    # Hash the float32 data that will actually be saved, plus the shape, so that
    # float32 and float64 copies of the same contour are treated as duplicates
    a = np.ascontiguousarray(arr, dtype=np.float32)
    m = hashlib.md5()
    m.update(a.tobytes())
    m.update(str(a.shape).encode())
    return m.hexdigest()

# Refined 'seen' tracks hash count: {md5: count}
seen_hash_count = defaultdict(int)
# Tracks hash -> first saved filename for reference
seen_first_filename = {}

duplicates_kept = 0
duplicates_skipped = 0
good_saved = 0
bad_invalid = 0
total_npy_seen = 0

# Prepare metadata CSV
meta_fields = ["src_zip", "orig_path", "saved_name", "md5", "status", "note"]
meta_rows = []

# ---------- PROCESS zips ----------
for zip_name in tqdm(zip_files, desc="Processing ZIPs"):
    zip_path = os.path.join(ZIP_DIR, zip_name)
    try:
        with zipfile.ZipFile(zip_path, 'r') as z:
            z.extractall(EXTRACT_TMP)
    except Exception as e:
        print(f"Warning: Could not extract {zip_name}. Skipping. Error: {e}")
        continue

    # Walk extracted contents
    for root, _, files in os.walk(EXTRACT_TMP):
        for f in files:
            if not f.endswith(".npy"):
                continue

            total_npy_seen += 1
            npy_path = os.path.join(root, f)
            relative_path = os.path.relpath(npy_path, EXTRACT_TMP)

            arr = None
            try:
                arr = np.load(npy_path)
            except Exception as e:
                bad_invalid += 1
                meta_rows.append([zip_name, relative_path, "", "", "bad_load", str(e)])
                continue

            if not is_valid_pitch(arr):
                bad_invalid += 1
                meta_rows.append([zip_name, relative_path, "", "", "invalid", "failed is_valid_pitch checks"])
                continue

            h = md5_of_array(arr)

            # --- DUPLICATE HANDLING ---
            # seen_hash_count only holds hashes that have been saved at least once
            is_duplicate = h in seen_hash_count
            copies_so_far = seen_hash_count.get(h, 0)

            if is_duplicate and copies_so_far >= MAX_DUPLICATE_COPIES:
                # We already have enough copies of this exact array content
                duplicates_skipped += 1
                meta_rows.append([zip_name, relative_path, seen_first_filename.get(h, ""), h,
                                  "duplicate_skipped", f"max copies ({MAX_DUPLICATE_COPIES}) reached"])
                continue

            base_name = os.path.splitext(f)[0]
            if is_duplicate:
                # Extra copy of already-saved content, e.g. song_a.npy -> song_a_copy1.npy
                save_name = f"{base_name}_copy{copies_so_far}.npy"
                duplicates_kept += 1
                status, note = "duplicate_kept", f"copy {copies_so_far} saved"
            else:
                save_name = f"{base_name}.npy"
                status, note = "saved", "unique file saved"

            # Same filename but different content (e.g. from another ZIP): add a hash suffix
            if os.path.exists(os.path.join(OUTPUT_DIR, save_name)):
                save_name = f"{base_name}_{h[:8]}.npy"

            save_path = os.path.join(OUTPUT_DIR, save_name)
            np.save(save_path, arr.astype(np.float32))

            # Update tracking
            seen_hash_count[h] += 1
            if not is_duplicate:
                seen_first_filename[h] = save_name

            good_saved += 1
            meta_rows.append([zip_name, relative_path, save_name, h, status, note])

    # clear tmp (Using shutil.rmtree is safer and faster for cleanup)
    shutil.rmtree(EXTRACT_TMP, ignore_errors=True)
    os.makedirs(EXTRACT_TMP, exist_ok=True) # Recreate empty folder

# ---------- WRITE METADATA ----------
with open(METADATA_CSV, "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(meta_fields)
    writer.writerows(meta_rows)

# ---------- CREATE EVAL SET ----------
# Candidates are all .npy files saved to OUTPUT_DIR (the good_saved files)
unique_files = sorted(f for f in os.listdir(OUTPUT_DIR) if f.endswith(".npy"))
random.shuffle(unique_files)

# Delete old eval files first
for f in os.listdir(EVAL_DIR):
    os.remove(os.path.join(EVAL_DIR, f))

# NOTE: files are COPIED, so every eval song also stays in OUTPUT_DIR, which the
# training scripts read as PITCH_DIR: eval songs are also training songs.
eval_select = unique_files[:min(EVAL_N, len(unique_files))]
for fn in eval_select:
    src = os.path.join(OUTPUT_DIR, fn)
    dst = os.path.join(EVAL_DIR, fn)
    shutil.copy(src, dst)

# ---------- SUMMARY ----------
print("✅ DONE!")
print(f"🔢 Total .npy encountered: {total_npy_seen}")
print(f"👍 Good files saved (including kept copies): {good_saved}")
print(f"   -> Max copies per unique array: {MAX_DUPLICATE_COPIES}")
print(f"   -> Duplicates Kept: {duplicates_kept}")
print(f"🔁 Duplicates Skipped: {duplicates_skipped}")
print(f"❌ Bad/invalid files: {bad_invalid}")
print(f"📁 Unique data saved to: {OUTPUT_DIR}")
print(f"📝 Metadata CSV: {METADATA_CSV}")
print(f"🎯 Eval set created: {len(eval_select)} files -> {EVAL_DIR}")

📦 Found 77 ZIP files


Processing ZIPs: 100%|██████████| 77/77 [00:12<00:00,  5.93it/s]

✅ DONE!
🔢 Total .npy encountered: 7050
👍 Good files saved (including kept copies): 4051
   -> Max copies per unique array: 1
   -> Duplicates Kept: 0
🔁 Duplicates Skipped: 1525
❌ Bad/invalid files: 1474
📁 Unique data saved to: /content/data_unique
📝 Metadata CSV: /content/data_unique/metadata.csv
🎯 Eval set created: 400 files -> /content/eval
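The cell above copies (rather than moves) the 400 eval files, so they all remain in /content/data_unique, which both training scripts below read as PITCH_DIR. If the goal is to measure recognition of songs the model never trained on, one option is a disjoint split. A minimal sketch, assuming a hypothetical helper `split_train_eval` and a fixed seed (neither is part of the original pipeline):

```python
import random


def split_train_eval(files, n_eval, seed=0):
    """Return disjoint (train, eval) lists of file names.

    Hypothetical helper: eval files are removed from the training pool,
    so no song can appear in both sets.
    """
    pool = sorted(files)          # sort first so the seed alone determines the split
    rng = random.Random(seed)     # local RNG; leaves the global random state untouched
    rng.shuffle(pool)
    n_eval = min(n_eval, len(pool))
    return pool[n_eval:], pool[:n_eval]


if __name__ == "__main__":
    files = [f"song_{i:04d}.npy" for i in range(4051)]
    train, held_out = split_train_eval(files, n_eval=400, seed=42)
    print(len(train), len(held_out), set(train).isdisjoint(held_out))  # 3651 400 True
```

The held-out names would then be moved (not copied) into /content/eval, leaving only the training names in the folder used as PITCH_DIR.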





OLD

In [None]:
import os
import random
import numpy as np
import scipy.signal as sg
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torch.optim.lr_scheduler import ReduceLROnPlateau # <--- NEW IMPORT

# -------------------------
# Hyperparams (UPDATED)
# -------------------------
TARGET_LEN = 300 #frame size
BATCH_SIZE = 32
EPOCHS = 100       # <--- REDUCED EPOCHS
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# -------------------------
# UPDATED PATHS
# -------------------------
PITCH_DIR = "/content/data_unique"
CKPT_DIR = "/content/pitch_modelV1_old" # checkpoints for the original (no-smoothing) model
os.makedirs(CKPT_DIR, exist_ok=True)

BEST = f"{CKPT_DIR}/best.pth"
LAST = f"{CKPT_DIR}/last.pth"

# -------------------------
# Augment hum (strong) - NO CHANGE
# -------------------------
def augment_hum(pitch):
    pitch = pitch.copy().astype(np.float32)
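    pitch += np.random.normal(0, 0.06, size=len(pitch))   # frame-level jitter
    semitones = np.random.uniform(-5, 5)                  # random key shift
    # 0.057 ~ ln(2)/12, i.e. one semitone if contours are natural-log Hz (assumed).
    # The jitter above is added first, so some unvoiced (0) frames also get shifted.
    pitch[pitch > 0] += semitones * 0.057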
    if random.random() < 0.7:
        rate = np.random.uniform(0.8, 1.25)
        old_idx = np.arange(len(pitch))
        new_idx = np.linspace(0, len(pitch)-1, max(2, int(len(pitch)*rate)))
        pitch = np.interp(new_idx, old_idx, pitch)
    try:
        b, a = sg.butter(3, 0.15)
        pitch = sg.filtfilt(b, a, pitch)
    except Exception:
        pass
    pitch += np.random.normal(0, 0.04, size=len(pitch))
    return pitch.astype(np.float32)

# -------------------------
# Helper: force equal length - NO CHANGE
# -------------------------
def force_length(arr, target_len=TARGET_LEN):
    if arr is None:
        return np.zeros(target_len, dtype=np.float32)
    if len(arr) < target_len:
        return np.pad(arr, (0, target_len - len(arr)), mode='constant')
    elif len(arr) > target_len:
        start = random.randint(0, len(arr) - target_len)
        return arr[start:start + target_len]
    return arr

# -------------------------
# Dataset - NO CHANGE
# -------------------------
class PitchDatasetV3(Dataset):
    def __init__(self, pitch_dir, target_len=TARGET_LEN):
        self.files = sorted([f for f in __import__("glob").glob(os.path.join(pitch_dir, "*.npy"))])
        self.target_len = target_len
        print(f"Loaded {len(self.files)} cleaned pitch contours from {pitch_dir}")

    def _random_crop(self, arr):
        if len(arr) <= self.target_len:
            return force_length(arr, self.target_len)
        start = random.randint(0, len(arr) - self.target_len)
        return arr[start:start + self.target_len]

    def __getitem__(self, idx):
        anchor_path = self.files[idx]
        anchor_full = np.load(anchor_path)

        neg_idx = random.randint(0, len(self.files) - 1)
        while neg_idx == idx:
            neg_idx = random.randint(0, len(self.files) - 1)
        neg_full = np.load(self.files[neg_idx])

        anchor = self._random_crop(anchor_full)
        positive_clean = self._random_crop(anchor_full)
        positive_hum = augment_hum(positive_clean)
        positive_hum = force_length(positive_hum, self.target_len)

        negative = self._random_crop(neg_full)
        negative = augment_hum(negative)
        negative = force_length(negative, self.target_len)

        anchor = force_length(anchor, self.target_len)
        positive_clean = force_length(positive_clean, self.target_len)

        return (
            torch.from_numpy(anchor).unsqueeze(0).float(),
            torch.from_numpy(positive_clean).unsqueeze(0).float(),
            torch.from_numpy(positive_hum).unsqueeze(0).float(),
            torch.from_numpy(negative).unsqueeze(0).float(),
        )

    def __len__(self):
        return len(self.files)

# -------------------------
# Siamese Model - NO CHANGE
# -------------------------
class PitchSiameseNet(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 32, 5, padding=2), nn.BatchNorm1d(32), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, 5, padding=2), nn.BatchNorm1d(64), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(64, 128, 3, padding=1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1)
        )
        self.fc = nn.Sequential(
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, embed_dim)
        )

    def forward_one(self, x):
        x = self.cnn(x).squeeze(-1)
        x = self.fc(x)
        return F.normalize(x, p=2, dim=1)

# -------------------------
# Training Loop (FIXED num_workers)
# -------------------------
def train_v3_fixed():
    print(f"Training on: {DEVICE}")

    dataset = PitchDatasetV3(PITCH_DIR, target_len=TARGET_LEN)
    # FIX: Set num_workers=0 to prevent the AssertionErrors in the console
    loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=0, pin_memory=True)

    model = PitchSiameseNet().to(DEVICE)
    optim = torch.optim.Adam(model.parameters(), lr=0.0001)
    loss_fn = nn.TripletMarginLoss(margin=0.8)

    # Learning Rate Scheduler Initialization
    # patience=8: LR is halved once the loss has gone more than 8 consecutive epochs without improving.
    scheduler = ReduceLROnPlateau(optim, mode='min', factor=0.5, patience=8)

    best_loss = float('inf')

    for epoch in range(EPOCHS):
        model.train()
        total_loss = 0.0

        for anchor, pos_clean, pos_hum, neg in loader:
            anchor = anchor.to(DEVICE)
            pos_clean = pos_clean.to(DEVICE)
            pos_hum = pos_hum.to(DEVICE)
            neg = neg.to(DEVICE)

            optim.zero_grad()
            a = model.forward_one(anchor)
            pc = model.forward_one(pos_clean)
            ph = model.forward_one(pos_hum)
            n = model.forward_one(neg)

            # Dual Triplet Loss
            loss = loss_fn(a, pc, n) + loss_fn(a, ph, n)
            loss.backward()
            optim.step()
            total_loss += loss.item()

        avg = total_loss / len(loader)
        print(f"Epoch {epoch+1}/{EPOCHS} | Loss: {avg:.4f} | LR: {optim.param_groups[0]['lr']:.6f}")

        # Step the scheduler
        scheduler.step(avg)

        if avg < best_loss:
            best_loss = avg
            torch.save(model.state_dict(), BEST)
            print(f" ⭐ New BEST saved: {BEST} (loss={best_loss:.4f})")

        torch.save({
            "epoch": epoch + 1,
            "model": model.state_dict(),
            "optimizer": optim.state_dict(),
            "best_loss": best_loss,
        }, LAST)

    print("Training finished.")
    print(f"Best checkpoint: {BEST}")
    print(f"Latest checkpoint: {LAST}")

# -------------------------
# Run
# -------------------------
if __name__ == "__main__":
    train_v3_fixed()

Training on: cuda
Loaded 4051 cleaned pitch contours from /content/data_unique
Epoch 1/100 | Loss: 0.6295 | LR: 0.000100
 ⭐ New BEST saved: /content/pitch_modelV1_old/best.pth (loss=0.6295)
Epoch 2/100 | Loss: 0.4605 | LR: 0.000100
 ⭐ New BEST saved: /content/pitch_modelV1_old/best.pth (loss=0.4605)
Epoch 3/100 | Loss: 0.4470 | LR: 0.000100
 ⭐ New BEST saved: /content/pitch_modelV1_old/best.pth (loss=0.4470)
Epoch 4/100 | Loss: 0.4216 | LR: 0.000100
 ⭐ New BEST saved: /content/pitch_modelV1_old/best.pth (loss=0.4216)
Epoch 5/100 | Loss: 0.4287 | LR: 0.000100
Epoch 6/100 | Loss: 0.4313 | LR: 0.000100
Epoch 7/100 | Loss: 0.4126 | LR: 0.000100
 ⭐ New BEST saved: /content/pitch_modelV1_old/best.pth (loss=0.4126)
Epoch 8/100 | Loss: 0.4093 | LR: 0.000100
 ⭐ New BEST saved: /content/pitch_modelV1_old/best.pth (loss=0.4093)
Epoch 9/100 | Loss: 0.4029 | LR: 0.000100
 ⭐ New BEST saved: /content/pitch_modelV1_old/best.pth (loss=0.4029)
Epoch 10/100 | Loss: 0.4012 | LR: 0.000100
 ⭐
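`nn.TripletMarginLoss(margin=0.8)` with its defaults (Euclidean distance, mean over the batch) penalizes a triplet whenever the anchor-positive distance is not at least `margin` smaller than the anchor-negative distance. A NumPy sketch of that formula and of the dual loss used in the training loop (function names are hypothetical; PyTorch also adds a tiny eps inside its distance computation, which is ignored here):

```python
import numpy as np


def triplet_margin_loss(a, p, n, margin=0.8):
    """Mean over rows of max(||a - p|| - ||a - n|| + margin, 0)."""
    d_ap = np.linalg.norm(a - p, axis=1)  # anchor-positive distance per row
    d_an = np.linalg.norm(a - n, axis=1)  # anchor-negative distance per row
    return float(np.mean(np.maximum(d_ap - d_an + margin, 0.0)))


def dual_triplet_loss(a, p_clean, p_hum, n, margin=0.8):
    """Clean-positive term plus hummed-positive term, as in the training loop."""
    return (triplet_margin_loss(a, p_clean, n, margin)
            + triplet_margin_loss(a, p_hum, n, margin))


if __name__ == "__main__":
    a = np.array([[1.0, 0.0]])
    far_neg = np.array([[0.0, 1.0]])             # sqrt(2) ~ 1.414 away from the anchor
    print(triplet_margin_loss(a, a, far_neg))    # 0.0: the gap already exceeds the margin
    print(triplet_margin_loss(a, a, a))          # 0.8: negative collapsed onto the anchor
```

Because `forward_one` L2-normalizes every embedding, all distances lie in [0, 2], so each term is bounded by 2 + margin, and the logged loss is the sum of two such terms.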

OLD EVAL

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import os
import random
import scipy.signal as sg
from collections import defaultdict
from tqdm import tqdm

# ======================================================
# CONFIGURATION
# ======================================================
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# Data Paths
VAL_DIR = "/content/eval"                  # UPDATED: Points to your 400 .npy files
MODEL_PATH = "/content/pitch_modelV1_old/best.pth" # Your trained model

# Windowing Params
WIN_LEN = 300      # 3 seconds (at 100Hz)
HOP_LEN = 150      # 1.5-second hop (50% overlap between windows)
TOLERANCE = 1.0    # Time bucket tolerance in seconds (for voting)
TOP_K_MATCHES = 20 # How many candidates to check per window

# Eval Params
NUM_TRIALS = 150   # How many random songs to test
SEGMENT_LEN = 1500 # 15 seconds of hum (approx)


# ======================================================
# 1. MODEL ARCHITECTURE (Must match training)
# ======================================================
class PitchSiameseNet(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 32, 5, padding=2), nn.BatchNorm1d(32), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, 5, padding=2), nn.BatchNorm1d(64), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(64, 128, 3, padding=1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1)
        )
        self.fc = nn.Sequential(
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, embed_dim)
        )

    def forward_one(self, x):
        x = self.cnn(x).squeeze(-1)
        x = self.fc(x)
        return F.normalize(x, p=2, dim=1)

# Load Model
print(f"⏳ Loading model from {MODEL_PATH}...")
model = PitchSiameseNet(embed_dim=128).to(DEVICE)
try:
    checkpoint = torch.load(MODEL_PATH, map_location=DEVICE)
    # best.pth holds a bare state_dict; last.pth nests it under "model"
    if "model" in checkpoint:
        model.load_state_dict(checkpoint["model"])
    else:
        model.load_state_dict(checkpoint)
    print("✅ Model loaded successfully.")
except Exception as e:
    print(f"❌ Error loading model: {e}")
    raise  # stop the cell with the full traceback instead of calling exit()
model.eval()


# ======================================================
# 2. AUGMENTATION UTILS (Soft & Hard)
# ======================================================

def augment_soft(pitch):
    """Simulates a good singer (slight pitch wobble)"""
    arr = pitch.copy().astype(np.float32)
    arr += np.random.normal(0, 0.02, size=len(arr)) # Light noise
    return arr

def augment_hard(pitch):
    """Simulates a hum: Key shift, Time stretch, Jitter"""
    arr = pitch.copy().astype(np.float32)

    # 1. Jitter
    arr += np.random.normal(0, 0.06, size=len(arr))

    # 2. Key shift: 0.057 ~ ln(2)/12, i.e. one semitone if the contour is natural-log Hz (assumed).
    #    The range here is +/-3 semitones vs +/-5 in training.
    semitones = np.random.uniform(-3, 3)
    arr[arr > 0] += semitones * 0.057

    # 3. Time Warp (Linear interpolation)
    if random.random() < 0.8:
        rate = np.random.uniform(0.85, 1.15)
        old_idx = np.arange(len(arr))
        new_len = int(len(arr) * rate)
        new_idx = np.linspace(0, len(arr)-1, new_len)
        arr = np.interp(new_idx, old_idx, arr)

    # Force back to original length (crop or pad)
    target = len(pitch)
    if len(arr) < target:
        arr = np.pad(arr, (0, target - len(arr)), mode='constant')
    else:
        start = (len(arr) - target) // 2
        arr = arr[start:start+target]

    return arr.astype(np.float32)


# ======================================================
# 3. EMBEDDING UTILS
# ======================================================
def process_sequence_to_embeddings(arr):
    """
    Takes a full pitch array (Song or Hum).
    Returns: Tensor of embeddings, List of timestamps (offsets in seconds)
    """
    windows = []
    offsets = []

    i = 0
    while i + WIN_LEN <= len(arr):
        crop = arr[i : i + WIN_LEN]

        # Skip if mostly silence (optional, keeps DB clean)
        if np.mean(crop > 0) < 0.1:
            i += HOP_LEN
            continue

        windows.append(crop)
        offsets.append(i / 100.0) # Assuming 100Hz sample rate = 0.01s per frame
        i += HOP_LEN

    if not windows:
        return None, None

    # Batch process for speed
    windows_np = np.stack(windows)
    windows_tensor = torch.from_numpy(windows_np).float().unsqueeze(1).to(DEVICE) # (B, 1, 300)

    with torch.no_grad():
        embeddings = model.forward_one(windows_tensor)

    return embeddings, offsets # (B, 128), List[float]


# ======================================================
# 4. BUILD DATABASE (FLATTENED)
# ======================================================
def build_flat_database():
    """
    Creates a massive tensor of all windows from all songs.
    Returns:
       all_embeds: Tensor (Total_Windows, 128)
       metadata: List of (SongName, Offset_Seconds)
    """
    # 1. Load files from the CORRECT directory
    files = sorted([f for f in os.listdir(VAL_DIR) if f.endswith(".npy")])

    # NOTE: The limit files[:250] has been REMOVED to use all files.

    all_embeds_list = []
    metadata = []

    print(f"üèóÔ∏è Building Database from {len(files)} songs in {VAL_DIR}...")

    for f_name in tqdm(files):
        path = os.path.join(VAL_DIR, f_name)
        arr = np.load(path)

        embeds, offsets = process_sequence_to_embeddings(arr)
        if embeds is None: continue

        all_embeds_list.append(embeds)
        song_id = f_name.replace(".npy", "")

        for t in offsets:
            metadata.append((song_id, t))

    # Stack into one giant tensor for matrix multiplication
    full_db_tensor = torch.cat(all_embeds_list, dim=0)

    print(f"‚úÖ DB Built: {full_db_tensor.shape[0]} total windows across {len(files)} songs.")
    return full_db_tensor, metadata


# ======================================================
# 5. GEOMETRIC SCORING (The "Magic")
# ======================================================
def query_database_geometric(query_embeds, query_offsets, db_tensor, db_metadata):
    """
    1. Compares Query Windows vs ALL DB Windows.
    2. Filters Top-K matches per window.
    3. Aligns them using Delta T (Projected Start Time).
    4. Votes for the best song.
    """

    # 1. Calculate Distance Matrix (Query_Size x DB_Size)
    # Using CDIST (Euclidean)
    dists = torch.cdist(query_embeds, db_tensor, p=2)

    # 2. Get Top K matches for each query window
    # values: (Q, K), indices: (Q, K)
    top_vals, top_inds = torch.topk(dists, k=TOP_K_MATCHES, dim=1, largest=False)

    top_vals = top_vals.cpu().numpy()
    top_inds = top_inds.cpu().numpy()

    # 3. Voting Containers
    # Key: (SongID, BucketIndex) -> Value: Score
    vote_buckets = defaultdict(float)
    epsilon = 1e-4 # Avoid div by zero

    for q_idx, q_time in enumerate(query_offsets):
        for k in range(TOP_K_MATCHES):
            match_idx = top_inds[q_idx, k]
            dist = top_vals[q_idx, k]

            # Retrieve DB Info
            match_song, match_time = db_metadata[match_idx]

            # Calculate Projected Start Time (The "Alignment")
            # If match is true: MatchTime - QueryTime should be constant (the song start)
            projected_start = match_time - q_time

            # Quantize into Buckets (Rounding to nearest tolerance)
            bucket = int(round(projected_start / TOLERANCE))

            # Score Weighting
            # Closer vectors = Higher Score
            score = 1.0 / (dist + epsilon)

            vote_buckets[(match_song, bucket)] += score

    # 4. Aggregate Scores per Song
    # We take the MAX bucket score for each song (best alignment)
    song_final_scores = defaultdict(float)

    for (song, bucket), score in vote_buckets.items():
        if score > song_final_scores[song]:
            song_final_scores[song] = score

    # 5. Sort Results
    ranked_songs = sorted(song_final_scores.items(), key=lambda x: x[1], reverse=True)
    return [x[0] for x in ranked_songs] # Return list of song IDs


# ======================================================
# 6. MAIN EVALUATION LOOP
# ======================================================
if __name__ == "__main__":

    # Build DB once
    db_tensor, db_metadata = build_flat_database()

    song_list = list(set([m[0] for m in db_metadata]))

    print(f"\nüöÄ Starting Evaluation: {NUM_TRIALS} Trials")
    print(f"   Using {TOP_K_MATCHES} neighbors per window with Geometric Scoring.")

    results = {
        "Soft": {"top1": 0, "top5": 0, "top10": 0},
        "Hard": {"top1": 0, "top5": 0, "top10": 0}
    }

    for _ in tqdm(range(NUM_TRIALS)):
        # Pick Random Target
        target_song = random.choice(song_list)

        # Load Full File
        full_arr = np.load(os.path.join(VAL_DIR, f"{target_song}.npy"))
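        if len(full_arr) < SEGMENT_LEN: continue # Skip if too short (counts as a miss: accuracy divides by NUM_TRIALS)

        # Create Random Crop (The "Truth"); +1 because randint's upper bound is exclusive
        start_idx = np.random.randint(0, len(full_arr) - SEGMENT_LEN + 1)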
        clean_clip = full_arr[start_idx : start_idx + SEGMENT_LEN]

        # -----------------------------
        # TEST 1: SOFT AUGMENTATION
        # -----------------------------
        soft_hum = augment_soft(clean_clip)
        q_emb, q_off = process_sequence_to_embeddings(soft_hum)

        if q_emb is not None:
            ranked = query_database_geometric(q_emb, q_off, db_tensor, db_metadata)

            if len(ranked) > 0:
                if ranked[0] == target_song: results["Soft"]["top1"] += 1
                if target_song in ranked[:5]: results["Soft"]["top5"] += 1
                if target_song in ranked[:10]: results["Soft"]["top10"] += 1

        # -----------------------------
        # TEST 2: HARD AUGMENTATION
        # -----------------------------
        hard_hum = augment_hard(clean_clip)
        q_emb, q_off = process_sequence_to_embeddings(hard_hum)

        if q_emb is not None:
            ranked = query_database_geometric(q_emb, q_off, db_tensor, db_metadata)

            if len(ranked) > 0:
                if ranked[0] == target_song: results["Hard"]["top1"] += 1
                if target_song in ranked[:5]: results["Hard"]["top5"] += 1
                if target_song in ranked[:10]: results["Hard"]["top10"] += 1

    # ======================================================
    # FINAL REPORT
    # ======================================================
    print("\n" + "="*40)
    print("üìä FINAL EVALUATION RESULTS")
    print("="*40)

    print(f"\nüé§ SOFT AUGMENTATION (Good Singing)")
    print(f"   Top-1 Accuracy:  {results['Soft']['top1']/NUM_TRIALS:.2%}")
    print(f"   Top-5 Accuracy:  {results['Soft']['top5']/NUM_TRIALS:.2%}")
    print(f"   Top-10 Accuracy: {results['Soft']['top10']/NUM_TRIALS:.2%}")

    print(f"\nüî• HARD AUGMENTATION (Humming/Noise)")
    print(f"   Top-1 Accuracy:  {results['Hard']['top1']/NUM_TRIALS:.2%}")
    print(f"   Top-5 Accuracy:  {results['Hard']['top5']/NUM_TRIALS:.2%}")
    print(f"   Top-10 Accuracy: {results['Hard']['top10']/NUM_TRIALS:.2%}")
    print("="*40)

⏳ Loading model from /content/pitch_modelV1_old/best.pth...
✅ Model loaded successfully.
🏗️ Building Database from 400 songs in /content/eval...


100%|██████████| 400/400 [00:01<00:00, 337.71it/s]


✅ DB Built: 63580 total windows across 400 songs.

🚀 Starting Evaluation: 150 Trials
   Using 20 neighbors per window with Geometric Scoring.


100%|██████████| 150/150 [00:00<00:00, 158.52it/s]


📊 FINAL EVALUATION RESULTS

🎤 SOFT AUGMENTATION (Good Singing)
   Top-1 Accuracy:  45.33%
   Top-5 Accuracy:  54.00%
   Top-10 Accuracy: 56.00%

🔥 HARD AUGMENTATION (Humming/Noise)
   Top-1 Accuracy:  11.33%
   Top-5 Accuracy:  18.67%
   Top-10 Accuracy: 25.33%
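`query_database_geometric` does not simply pick the song with the nearest windows; it rewards songs whose matches agree on one alignment (the projected start time). A toy, self-contained sketch of that voting step, where the function `vote_by_offset` and the hand-made match tuples are hypothetical stand-ins for the real top-K results from `torch.topk`:

```python
from collections import defaultdict


def vote_by_offset(matches, tolerance=1.0):
    """Rank songs by offset-consistent votes (toy version of query_database_geometric).

    matches: iterable of (query_time, song_id, match_time, dist) tuples,
             i.e. the top-K neighbours already retrieved for each query window.
    """
    eps = 1e-4
    buckets = defaultdict(float)
    for q_time, song, m_time, dist in matches:
        # For a true match, match_time - query_time stays constant across windows
        bucket = int(round((m_time - q_time) / tolerance))
        buckets[(song, bucket)] += 1.0 / (dist + eps)   # closer match -> bigger vote
    best = defaultdict(float)
    for (song, _), score in buckets.items():
        best[song] = max(best[song], score)             # keep each song's best-aligned bucket
    return sorted(best, key=best.get, reverse=True)


if __name__ == "__main__":
    matches = [
        # song A: moderate distances, but every window implies the same start (10 s)
        (0.0, "A", 10.0, 0.5), (1.5, "A", 11.5, 0.5), (3.0, "A", 13.0, 0.5),
        # song B: closer neighbours, but each implies a different start time
        (0.0, "B", 5.0, 0.4), (1.5, "B", 31.5, 0.4), (3.0, "B", 53.0, 0.4),
    ]
    print(vote_by_offset(matches))  # ['A', 'B']
```

Song B has the closer neighbours, but its three matches fall into three different offset buckets, while song A's three matches reinforce a single bucket and win.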







OLD+SMOOTHING

CREPE is accurate but "jittery." It often produces tiny, instantaneous spikes (octave errors or noise) that aren't actually part of the melody.

The Fix: A median filter (sg.medfilt with kernel_size=5 in the code below) smooths out these jagged edges.

The Impact: The model should stop learning "This song has a weird spike at frame 50" (an artifact) and start learning "This song goes up and then down" (the actual melody), which should make the embeddings cleaner.
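A NumPy-only sketch of what a 5-frame median filter does to a contour. The helper `median_filter` is hypothetical and zero-pads the edges, which is intended to mirror `scipy.signal.medfilt`; the contour values are made up. It removes a one-frame spike and keeps a genuine note change, but it also erases a two-frame ornament, the same trade-off described in the Smoothing Paradox section.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(x, kernel_size=5):
    """Zero-padded running median over an odd-length window (like sg.medfilt)."""
    half = kernel_size // 2
    padded = np.pad(np.asarray(x, dtype=np.float32), half, mode="constant")
    windows = sliding_window_view(padded, kernel_size)   # shape (len(x), kernel_size)
    return np.median(windows, axis=1).astype(np.float32)


if __name__ == "__main__":
    melody = np.full(20, 5.0, dtype=np.float32)  # steady note (log-pitch units)
    melody[10:] = 5.5                            # genuine step up to a new note
    noisy = melody.copy()
    noisy[4] = 6.2                               # one-frame tracking spike
    noisy[15:17] = 6.0                           # two-frame ornament
    smoothed = median_filter(noisy)
    print(np.array_equal(smoothed, melody))      # spike and ornament both gone, step kept
```

In general, a kernel of size k removes any excursion shorter than k // 2 + 1 frames, so kernel_size=5 erases anything lasting 1 or 2 frames (about 20 ms at 100 Hz).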

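      | Layer | Operation                               | Output Shape     | What it does                                          |
      |-------|-----------------------------------------|------------------|-------------------------------------------------------|
      | 1     | Conv1d(1, 32, k=5) + BatchNorm + ReLU   | (Batch, 32, 300) | Finds local slopes (is pitch rising?)                 |
      | 2     | MaxPool1d(2)                            | (Batch, 32, 150) | Downsamples (ignores minor timing errors)             |
      | 3     | Conv1d(32, 64, k=5) + BatchNorm + ReLU  | (Batch, 64, 150) | Finds patterns (vibrato, trills)                      |
      | 4     | MaxPool1d(2)                            | (Batch, 64, 75)  | Downsamples again                                     |
      | 5     | Conv1d(64, 128, k=3) + BatchNorm + ReLU | (Batch, 128, 75) | Finds longer phrases                                  |
      | 6     | AdaptiveAvgPool1d(1)                    | (Batch, 128, 1)  | The bottleneck: squashes the time dimension entirely  |
      | 7     | Linear(128, 128) + ReLU                 | (Batch, 128)     | Refines the features                                  |
      | 8     | Linear(128, 128)                        | (Batch, 128)     | Projects to the embedding (embed_dim = 128)           |
      | 9     | F.normalize (L2)                        | (Batch, 128)     | Final embedding vector, scaled to unit length         |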
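The Output Shape column follows from the standard length formulas for `Conv1d` and `MaxPool1d`. A small sketch that recomputes it for a 300-frame window (helper names are hypothetical; dilation 1 and MaxPool's default stride = kernel size are assumed, matching the layers above):

```python
def conv1d_out(length, kernel, padding=0, stride=1):
    """Output length of Conv1d with dilation 1: floor((L + 2p - k) / s) + 1."""
    return (length + 2 * padding - kernel) // stride + 1


def maxpool1d_out(length, kernel):
    """Output length of MaxPool1d(kernel) with default stride = kernel and no padding."""
    return (length - kernel) // kernel + 1


def trace_shapes(length=300):
    """(channels, length) after each stage of PitchSiameseNet.cnn."""
    steps = []
    length = conv1d_out(length, 5, padding=2)
    steps.append((32, length))
    length = maxpool1d_out(length, 2)
    steps.append((32, length))
    length = conv1d_out(length, 5, padding=2)
    steps.append((64, length))
    length = maxpool1d_out(length, 2)
    steps.append((64, length))
    length = conv1d_out(length, 3, padding=1)
    steps.append((128, length))
    steps.append((128, 1))  # AdaptiveAvgPool1d(1) always outputs length 1
    return steps


print(trace_shapes(300))
# [(32, 300), (32, 150), (64, 150), (64, 75), (128, 75), (128, 1)]
```

The "same" padding (k // 2) keeps each convolution length-preserving, so only the two pooling layers shrink time; AdaptiveAvgPool1d(1) then collapses whatever length remains, which is why the same network can embed windows of other lengths.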

In [None]:
# ==============================================================================
# PITCH MODEL V4: ORIGINAL CNN + SMOOTHING + ADAPTIVE LR
# Trains on: /content/data_unique
# Saves to:  /content/pitch_modelV1_oldplussmoothing
# ==============================================================================
import os
import random
import glob
import numpy as np
import scipy.signal as sg
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torch.optim.lr_scheduler import ReduceLROnPlateau # <--- Adaptive LR
from tqdm import tqdm

# -------------------------
# Hyperparams
# -------------------------
TARGET_LEN = 300   # 3 seconds (approx)
BATCH_SIZE = 32
EPOCHS = 110       # <--- Updated to 110
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# -------------------------
# PATHS
# -------------------------
PITCH_DIR = "/content/data_unique"
CKPT_DIR = "/content/pitch_modelV1_oldplussmoothing" # Final save location
os.makedirs(CKPT_DIR, exist_ok=True)

BEST = f"{CKPT_DIR}/best.pth"
LAST = f"{CKPT_DIR}/last.pth"

# -------------------------
# 1. SMOOTHING HELPER
# -------------------------
def smooth_pitch(pitch):
    """
    Global smoothing: Median filter to remove jagged tracking errors.
    """
    return sg.medfilt(pitch, kernel_size=5).astype(np.float32)

# -------------------------
# 2. AUGMENTATION
# -------------------------
def augment_hum(pitch):
    pitch = pitch.copy()

    # 1. More Noise (Harder to see the line)
    pitch += np.random.normal(0, 0.1, size=len(pitch))

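    # 2. Key Shift (Unchanged)
    # 0.057 is roughly ln(2)/12, i.e. one semitone if pitch is stored as natural-log Hz (assumption).
    # Noise is added first, so some silent (0) frames become slightly positive and get shifted too.
    semitones = np.random.uniform(-5, 5)
    pitch[pitch > 0] += semitones * 0.057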

    # 3. Aggressive Time Warp (ALWAYS HAPPENS)
    # Range 0.7 to 1.4 makes it stretch/squash significantly
    rate = np.random.uniform(0.7, 1.4)
    old_idx = np.arange(len(pitch))
    new_idx = np.linspace(0, len(pitch)-1, max(2, int(len(pitch)*rate)))
    pitch = np.interp(new_idx, old_idx, pitch)

    return pitch.astype(np.float32)

# -------------------------
# Helper: Pad/Crop
# -------------------------
def force_length(arr, target_len=TARGET_LEN):
    if arr is None or len(arr) == 0:
        return np.zeros(target_len, dtype=np.float32)

    if len(arr) < target_len:
        pad_amt = target_len - len(arr)
        return np.pad(arr, (0, pad_amt), mode='constant')

    elif len(arr) > target_len:
        start = random.randint(0, len(arr) - target_len)
        return arr[start:start + target_len]

    return arr

# -------------------------
# 3. DATASET
# -------------------------
class PitchDatasetV4(Dataset):
    def __init__(self, pitch_dir, target_len=TARGET_LEN):
        self.files = sorted(glob.glob(os.path.join(pitch_dir, "*.npy")))
        self.target_len = target_len
        print(f"✅ Loaded {len(self.files)} files from {pitch_dir}")

    def _random_crop(self, arr):
        if len(arr) <= self.target_len:
            return arr
        start = random.randint(0, len(arr) - self.target_len)
        return arr[start:start + self.target_len]

    def __getitem__(self, idx):
        # Load
        anchor_path = self.files[idx]
        anchor_full = np.load(anchor_path)

        neg_idx = random.randint(0, len(self.files) - 1)
        while neg_idx == idx:
            neg_idx = random.randint(0, len(self.files) - 1)
        neg_full = np.load(self.files[neg_idx])

        # 1. CROP
        anchor_raw = self._random_crop(anchor_full)
        neg_raw = self._random_crop(neg_full)

        # 2. GLOBAL SMOOTHING (Apply to everything)
        anchor_clean = smooth_pitch(anchor_raw)
        neg_clean = smooth_pitch(neg_raw)

        # 3. AUGMENT
        # Create positive from the smoothed anchor
        pos_hum = augment_hum(anchor_clean)

        # 4. FINALIZE
        anchor_out = force_length(anchor_clean, self.target_len)
        pos_hum_out = force_length(pos_hum, self.target_len)
        neg_out = force_length(neg_clean, self.target_len)

        return (
            torch.from_numpy(anchor_out).unsqueeze(0).float(),
            torch.from_numpy(pos_hum_out).unsqueeze(0).float(),
            torch.from_numpy(neg_out).unsqueeze(0).float(),
        )

    def __len__(self):
        return len(self.files)

# -------------------------
# 4. MODEL (ORIGINAL CNN)
# -------------------------
class PitchSiameseNet(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        # Original 3-layer architecture
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 32, 5, padding=2), nn.BatchNorm1d(32), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, 5, padding=2), nn.BatchNorm1d(64), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(64, 128, 3, padding=1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1)
        )
        self.fc = nn.Sequential(
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, embed_dim)
        )

    def forward_one(self, x):
        x = self.cnn(x).squeeze(-1)
        x = self.fc(x)
        return F.normalize(x, p=2, dim=1) # L2 normalization

# -------------------------
# 5. TRAINING LOOP
# -------------------------
def train_v4():
    print(f"üöÄ Training V4 (Original CNN + Smooth + AdaptiveLR) on: {DEVICE}")
    print(f"üìÇ Data: {PITCH_DIR}")
    print(f"üîß Epochs: {EPOCHS} | Margin: 0.85")

    dataset = PitchDatasetV4(PITCH_DIR, target_len=TARGET_LEN)
    loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=0, pin_memory=True)

    model = PitchSiameseNet().to(DEVICE)
    initial_lr = 0.0001
    optim = torch.optim.Adam(model.parameters(), lr=initial_lr)

    # ADAPTIVE LR SCHEDULER: halve the LR after more than 8 epochs without improvement
    scheduler = ReduceLROnPlateau(optim, mode='min', factor=0.5, patience=8)

    loss_fn = nn.TripletMarginLoss(margin=0.85, p=2)

    best_loss = float('inf')

    print(f"Starting LR: {initial_lr}")
    print("-" * 40)

    for epoch in range(EPOCHS):
        model.train()
        total_loss = 0.0

        for anchor, pos_hum, neg in tqdm(loader, desc=f"Epoch {epoch+1}/{EPOCHS}"):
            anchor = anchor.to(DEVICE)
            pos_hum = pos_hum.to(DEVICE)
            neg = neg.to(DEVICE)

            optim.zero_grad()

            a_emb = model.forward_one(anchor)
            p_emb = model.forward_one(pos_hum)
            n_emb = model.forward_one(neg)

            loss = loss_fn(a_emb, p_emb, n_emb)

            loss.backward()
            optim.step()

            total_loss += loss.item()

        avg_loss = total_loss / len(loader)
        current_lr = optim.param_groups[0]['lr']
        print(f"\nSummary | Loss: {avg_loss:.4f} | LR: {current_lr:.6f}")

        # Step the scheduler, observing the current average loss
        scheduler.step(avg_loss)

        if avg_loss < best_loss:
            best_loss = avg_loss
            torch.save(model.state_dict(), BEST)
            print(f" ⭐ New Best: {best_loss:.4f}")

        torch.save({
            "epoch": epoch + 1,
            "model": model.state_dict(),
            "optimizer": optim.state_dict(),
            "scheduler": scheduler.state_dict(),  # needed to resume the LR schedule
            "best_loss": best_loss,
        }, LAST)

    print("-" * 40)
    print("Training finished.")
    print(f"Best checkpoint: {BEST}")
    print(f"Latest checkpoint: {LAST}")

# -------------------------
# Run
# -------------------------
if __name__ == "__main__":
    train_v4()

🚀 Training V4 (Original CNN + Smooth + AdaptiveLR) on: cuda
📂 Data: /content/data_unique
🔧 Epochs: 110 | Margin: 0.85
✅ Loaded 4051 files from /content/data_unique
Starting LR: 0.0001
----------------------------------------

Epoch 1/110: 100%|██████████| 127/127 [00:08<00:00, 15.75it/s]
Summary | Loss: 0.3026 | LR: 0.000100
 ⭐ New Best: 0.3026

Epoch 2/110: 100%|██████████| 127/127 [00:06<00:00, 18.94it/s]
Summary | Loss: 0.1840 | LR: 0.000100
 ⭐ New Best: 0.1840

Epoch 3/110: 100%|██████████| 127/127 [00:04<00:00, 27.34it/s]
Summary | Loss: 0.1435 | LR: 0.000100
 ⭐ New Best: 0.1435

Epoch 4/110: 100%|██████████| 127/127 [00:03<00:00, 32.17it/s]
Summary | Loss: 0.1303 | LR: 0.000100
 ⭐ New Best: 0.1303

Epoch 5/110: 100%|██████████| 127/127 [00:03<00:00, 32.16it/s]
Summary | Loss: 0.1313 | LR: 0.000100

Epoch 6/110: 100%|██████████| 127/127 [00:04<00:00, 27.42it/s]
Summary | Loss: 0.1203 | LR: 0.000100
 ⭐ New Best: 0.1203

Epoch 7/110: 100%|██████████| 127/127 [00:04<00:00, 31.66it/s]
Summary | Loss: 0.1060 | LR: 0.000100
 ⭐ New Best: 0.1060

Epoch 8/110: 100%|██████████| 127/127 [00:04<00:00, 28.41it/s]
Summary | Loss: 0.1118 | LR: 0.000100

Epoch 9/110: 100%|██████████| 127/127 [00:04<00:00, 27.39it/s]
Summary | Loss: 0.1012 | LR: 0.000100
 ⭐ New Best: 0.1012

Epoch 10/110: 100%|██████████| 127/127 [00:04<00:00, 31.44it/s]
Summary | Loss: 0.0906 | LR: 0.000100
 ⭐ New Best: 0.0906

Epoch 11/110: 100%|██████████| 127/127 [00:03<00:00, 32.22it/s]
Summary | Loss: 0.1021 | LR: 0.000100

Epoch 12/110: 100%|██████████| 127/127 [00:04<00:00, 27.35it/s]
Summary | Loss: 0.0875 | LR: 0.000100
 ⭐ New Best: 0.0875

Epoch 13/110: 100%|██████████| 127/127 [00:03<00:00, 31.87it/s]
Summary | Loss: 0.0913 | LR: 0.000100

Epoch 14/110: 100%|██████████| 127/127 [00:03<00:00, 32.23it/s]
Summary | Loss: 0.0955 | LR: 0.000100

Epoch 15/110: 100%|██████████| 127/127 [00:04<00:00, 27.61it/s]
Summary | Loss: 0.0889 | LR: 0.000100

Epoch 16/110: 100%|██████████| 127/127 [00:03<00:00, 32.63it/s]
Summary | Loss: 0.0841 | LR: 0.000100
 ⭐ New Best: 0.0841

Epoch 17/110: 100%|██████████| 127/127 [00:03<00:00, 32.30it/s]
Summary | Loss: 0.0796 | LR: 0.000100
 ⭐ New Best: 0.0796

Epoch 18/110: 100%|██████████| 127/127 [00:04<00:00, 27.26it/s]
Summary | Loss: 0.0803 | LR: 0.000100

Epoch 19/110: 100%|██████████| 127/127 [00:03<00:00, 32.88it/s]
Summary | Loss: 0.0788 | LR: 0.000100
 ⭐ New Best: 0.0788

Epoch 20/110: 100%|██████████| 127/127 [00:03<00:00, 32.94it/s]
Summary | Loss: 0.0739 | LR: 0.000100
 ⭐ New Best: 0.0739

Epoch 21/110: 100%|██████████| 127/127 [00:04<00:00, 27.91it/s]
Summary | Loss: 0.0755 | LR: 0.000100

Epoch 22/110: 100%|██████████| 127/127 [00:03<00:00, 32.74it/s]
Summary | Loss: 0.0687 | LR: 0.000100
 ⭐ New Best: 0.0687

Epoch 23/110: 100%|██████████| 127/127 [00:03<00:00, 32.98it/s]
Summary | Loss: 0.0748 | LR: 0.000100

Epoch 24/110: 100%|██████████| 127/127 [00:04<00:00, 28.35it/s]
Summary | Loss: 0.0620 | LR: 0.000100
 ⭐ New Best: 0.0620

Epoch 25/110: 100%|██████████| 127/127 [00:03<00:00, 33.10it/s]
Summary | Loss: 0.0729 | LR: 0.000100

Epoch 26/110: 100%|██████████| 127/127 [00:03<00:00, 32.95it/s]
Summary | Loss: 0.0831 | LR: 0.000100

Epoch 27/110: 100%|██████████| 127/127 [00:04<00:00, 28.45it/s]
Summary | Loss: 0.0771 | LR: 0.000100

Epoch 28/110: 100%|██████████| 127/127 [00:03<00:00, 32.67it/s]
Summary | Loss: 0.0706 | LR: 0.000100

Epoch 29/110: 100%|██████████| 127/127 [00:03<00:00, 32.41it/s]
Summary | Loss: 0.0647 | LR: 0.000100

Epoch 30/110: 100%|██████████| 127/127 [00:04<00:00, 27.64it/s]
Summary | Loss: 0.0726 | LR: 0.000100

Epoch 31/110: 100%|██████████| 127/127 [00:03<00:00, 32.46it/s]
Summary | Loss: 0.0647 | LR: 0.000100

Epoch 32/110: 100%|██████████| 127/127 [00:03<00:00, 31.90it/s]
Summary | Loss: 0.0647 | LR: 0.000100

Epoch 33/110: 100%|██████████| 127/127 [00:04<00:00, 28.15it/s]
Summary | Loss: 0.0747 | LR: 0.000100

Epoch 34/110: 100%|██████████| 127/127 [00:03<00:00, 32.93it/s]
Summary | Loss: 0.0666 | LR: 0.000050

Epoch 35/110: 100%|██████████| 127/127 [00:03<00:00, 32.87it/s]
Summary | Loss: 0.0625 | LR: 0.000050

Epoch 36/110: 100%|██████████| 127/127 [00:04<00:00, 28.06it/s]
Summary | Loss: 0.0645 | LR: 0.000050

Epoch 37/110: 100%|██████████| 127/127 [00:03<00:00, 33.01it/s]
Summary | Loss: 0.0698 | LR: 0.000050

Epoch 38/110: 100%|██████████| 127/127 [00:03<00:00, 32.51it/s]
Summary | Loss: 0.0718 | LR: 0.000050

Epoch 39/110: 100%|██████████| 127/127 [00:04<00:00, 28.35it/s]
Summary | Loss: 0.0558 | LR: 0.000050
 ⭐ New Best: 0.0558

Epoch 40/110: 100%|██████████| 127/127 [00:03<00:00, 32.96it/s]
Summary | Loss: 0.0606 | LR: 0.000050

Epoch 41/110: 100%|██████████| 127/127 [00:03<00:00, 32.81it/s]
Summary | Loss: 0.0540 | LR: 0.000050
 ⭐ New Best: 0.0540

Epoch 42/110: 100%|██████████| 127/127 [00:04<00:00, 28.43it/s]
Summary | Loss: 0.0661 | LR: 0.000050

Epoch 43/110: 100%|██████████| 127/127 [00:03<00:00, 32.69it/s]
Summary | Loss: 0.0576 | LR: 0.000050

Epoch 44/110: 100%|██████████| 127/127 [00:04<00:00, 31.04it/s]
Summary | Loss: 0.0652 | LR: 0.000050

Epoch 45/110: 100%|██████████| 127/127 [00:04<00:00, 28.38it/s]
Summary | Loss: 0.0467 | LR: 0.000050
 ⭐ New Best: 0.0467

Epoch 46/110: 100%|██████████| 127/127 [00:03<00:00, 32.62it/s]
Summary | Loss: 0.0608 | LR: 0.000050

Epoch 47/110: 100%|██████████| 127/127 [00:04<00:00, 31.59it/s]
Summary | Loss: 0.0533 | LR: 0.000050

Epoch 48/110: 100%|██████████| 127/127 [00:04<00:00, 28.41it/s]
Summary | Loss: 0.0521 | LR: 0.000050

Epoch 49/110: 100%|██████████| 127/127 [00:03<00:00, 32.92it/s]
Summary | Loss: 0.0488 | LR: 0.000050

Epoch 50/110: 100%|██████████| 127/127 [00:03<00:00, 32.23it/s]
Summary | Loss: 0.0521 | LR: 0.000050

Epoch 51/110: 100%|██████████| 127/127 [00:04<00:00, 27.88it/s]
Summary | Loss: 0.0620 | LR: 0.000050

Epoch 52/110: 100%|██████████| 127/127 [00:03<00:00, 32.11it/s]
Summary | Loss: 0.0595 | LR: 0.000050

Epoch 53/110: 100%|██████████| 127/127 [00:03<00:00, 32.44it/s]
Summary | Loss: 0.0601 | LR: 0.000050

Epoch 54/110: 100%|██████████| 127/127 [00:04<00:00, 28.07it/s]
Summary | Loss: 0.0525 | LR: 0.000050

Epoch 55/110: 100%|██████████| 127/127 [00:04<00:00, 31.48it/s]
Summary | Loss: 0.0504 | LR: 0.000025

Epoch 56/110: 100%|██████████| 127/127 [00:03<00:00, 32.36it/s]
Summary | Loss: 0.0601 | LR: 0.000025

Epoch 57/110: 100%|██████████| 127/127 [00:04<00:00, 28.07it/s]
Summary | Loss: 0.0454 | LR: 0.000025
 ⭐ New Best: 0.0454

Epoch 58/110: 100%|██████████| 127/127 [00:03<00:00, 32.65it/s]
Summary | Loss: 0.0529 | LR: 0.000025

Epoch 59/110: 100%|██████████| 127/127 [00:03<00:00, 32.17it/s]
Summary | Loss: 0.0445 | LR: 0.000025
 ⭐ New Best: 0.0445

Epoch 60/110: 100%|██████████| 127/127 [00:04<00:00, 28.71it/s]
Summary | Loss: 0.0531 | LR: 0.000025

Epoch 61/110: 100%|██████████| 127/127 [00:03<00:00, 32.65it/s]
Summary | Loss: 0.0594 | LR: 0.000025

Epoch 62/110: 100%|██████████| 127/127 [00:03<00:00, 32.16it/s]
Summary | Loss: 0.0531 | LR: 0.000025

Epoch 63/110: 100%|██████████| 127/127 [00:04<00:00, 28.67it/s]
Summary | Loss: 0.0485 | LR: 0.000025

Epoch 64/110: 100%|██████████| 127/127 [00:03<00:00, 32.62it/s]
Summary | Loss: 0.0481 | LR: 0.000025

Epoch 65/110: 100%|██████████| 127/127 [00:03<00:00, 32.72it/s]
Summary | Loss: 0.0500 | LR: 0.000025

Epoch 66/110: 100%|██████████| 127/127 [00:04<00:00, 27.88it/s]
Summary | Loss: 0.0643 | LR: 0.000025

Epoch 67/110: 100%|██████████| 127/127 [00:03<00:00, 32.26it/s]
Summary | Loss: 0.0527 | LR: 0.000025

Epoch 68/110: 100%|██████████| 127/127 [00:03<00:00, 31.99it/s]
Summary | Loss: 0.0526 | LR: 0.000025

Epoch 69/110: 100%|██████████| 127/127 [00:04<00:00, 27.91it/s]
Summary | Loss: 0.0516 | LR: 0.000013

Epoch 70/110: 100%|██████████| 127/127 [00:03<00:00, 33.11it/s]
Summary | Loss: 0.0541 | LR: 0.000013

Epoch 71/110: 100%|██████████| 127/127 [00:03<00:00, 32.68it/s]
Summary | Loss: 0.0426 | LR: 0.000013
 ⭐ New Best: 0.0426

Epoch 72/110: 100%|██████████| 127/127 [00:04<00:00, 27.78it/s]
Summary | Loss: 0.0481 | LR: 0.000013

Epoch 73/110: 100%|██████████| 127/127 [00:03<00:00, 32.30it/s]
Summary | Loss: 0.0575 | LR: 0.000013

Epoch 74/110: 100%|██████████| 127/127 [00:04<00:00, 31.11it/s]
Summary | Loss: 0.0505 | LR: 0.000013

Epoch 75/110: 100%|██████████| 127/127 [00:05<00:00, 23.38it/s]
Summary | Loss: 0.0511 | LR: 0.000013

Epoch 76/110: 100%|██████████| 127/127 [00:04<00:00, 31.64it/s]
Summary | Loss: 0.0490 | LR: 0.000013

Epoch 77/110: 100%|██████████| 127/127 [00:03<00:00, 32.66it/s]
Summary | Loss: 0.0687 | LR: 0.000013

Epoch 78/110: 100%|██████████| 127/127 [00:04<00:00, 28.08it/s]
Summary | Loss: 0.0475 | LR: 0.000013

Epoch 79/110: 100%|██████████| 127/127 [00:04<00:00, 30.71it/s]
Summary | Loss: 0.0514 | LR: 0.000013

Epoch 80/110: 100%|██████████| 127/127 [00:03<00:00, 32.09it/s]
Summary | Loss: 0.0518 | LR: 0.000013

Epoch 81/110: 100%|██████████| 127/127 [00:04<00:00, 28.38it/s]
Summary | Loss: 0.0381 | LR: 0.000006
 ⭐ New Best: 0.0381

Epoch 82/110: 100%|██████████| 127/127 [00:03<00:00, 32.57it/s]
Summary | Loss: 0.0434 | LR: 0.000006

Epoch 83/110: 100%|██████████| 127/127 [00:03<00:00, 32.04it/s]
Summary | Loss: 0.0489 | LR: 0.000006

Epoch 84/110: 100%|██████████| 127/127 [00:04<00:00, 27.98it/s]
Summary | Loss: 0.0424 | LR: 0.000006

Epoch 85/110: 100%|██████████| 127/127 [00:03<00:00, 32.12it/s]
Summary | Loss: 0.0448 | LR: 0.000006

Epoch 86/110: 100%|██████████| 127/127 [00:04<00:00, 31.73it/s]
Summary | Loss: 0.0429 | LR: 0.000006

Epoch 87/110: 100%|██████████| 127/127 [00:04<00:00, 28.07it/s]
Summary | Loss: 0.0445 | LR: 0.000006

Epoch 88/110: 100%|██████████| 127/127 [00:04<00:00, 31.59it/s]
Summary | Loss: 0.0379 | LR: 0.000006
 ⭐ New Best: 0.0379

Epoch 89/110: 100%|██████████| 127/127 [00:03<00:00, 31.77it/s]
Summary | Loss: 0.0385 | LR: 0.000006

Epoch 90/110: 100%|██████████| 127/127 [00:04<00:00, 28.65it/s]
Summary | Loss: 0.0446 | LR: 0.000006

Epoch 91/110: 100%|██████████| 127/127 [00:03<00:00, 32.29it/s]
Summary | Loss: 0.0506 | LR: 0.000006

Epoch 92/110: 100%|██████████| 127/127 [00:04<00:00, 31.60it/s]
Summary | Loss: 0.0481 | LR: 0.000006

Epoch 93/110: 100%|██████████| 127/127 [00:04<00:00, 28.78it/s]
Summary | Loss: 0.0454 | LR: 0.000006

Epoch 94/110: 100%|██████████| 127/127 [00:03<00:00, 32.48it/s]
Summary | Loss: 0.0421 | LR: 0.000006

Epoch 95/110: 100%|██████████| 127/127 [00:03<00:00, 31.88it/s]
Summary | Loss: 0.0500 | LR: 0.000006

Epoch 96/110: 100%|██████████| 127/127 [00:04<00:00, 28.42it/s]
Summary | Loss: 0.0484 | LR: 0.000006

Epoch 97/110: 100%|██████████| 127/127 [00:03<00:00, 32.66it/s]
Summary | Loss: 0.0481 | LR: 0.000006

Epoch 98/110: 100%|██████████| 127/127 [00:03<00:00, 32.49it/s]
Summary | Loss: 0.0515 | LR: 0.000003

Epoch 99/110: 100%|██████████| 127/127 [00:04<00:00, 28.19it/s]
Summary | Loss: 0.0470 | LR: 0.000003

Epoch 100/110: 100%|██████████| 127/127 [00:03<00:00, 32.47it/s]
Summary | Loss: 0.0511 | LR: 0.000003

Epoch 101/110: 100%|██████████| 127/127 [00:03<00:00, 32.83it/s]
Summary | Loss: 0.0434 | LR: 0.000003

Epoch 102/110: 100%|██████████| 127/127 [00:04<00:00, 27.94it/s]
Summary | Loss: 0.0421 | LR: 0.000003

Epoch 103/110: 100%|██████████| 127/127 [00:03<00:00, 32.94it/s]
Summary | Loss: 0.0467 | LR: 0.000003

Epoch 104/110: 100%|██████████| 127/127 [00:03<00:00, 32.79it/s]
Summary | Loss: 0.0436 | LR: 0.000003

Epoch 105/110: 100%|██████████| 127/127 [00:04<00:00, 27.93it/s]
Summary | Loss: 0.0434 | LR: 0.000003

Epoch 106/110: 100%|██████████| 127/127 [00:03<00:00, 33.04it/s]
Summary | Loss: 0.0509 | LR: 0.000003

Epoch 107/110: 100%|██████████| 127/127 [00:03<00:00, 33.10it/s]
Summary | Loss: 0.0497 | LR: 0.000002

Epoch 108/110: 100%|██████████| 127/127 [00:04<00:00, 27.90it/s]
Summary | Loss: 0.0438 | LR: 0.000002

Epoch 109/110: 100%|██████████| 127/127 [00:03<00:00, 32.93it/s]
Summary | Loss: 0.0481 | LR: 0.000002

Epoch 110/110: 100%|██████████| 127/127 [00:03<00:00, 32.87it/s]
Summary | Loss: 0.0460 | LR: 0.000002
----------------------------------------
Training finished.
Best checkpoint: /content/pitch_modelV1_oldplussmoothing/best.pth
Latest checkpoint: /content/pitch_modelV1_oldplussmoothing/last.pth
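The LR drops in the log above come from `ReduceLROnPlateau(optim, mode='min', factor=0.5, patience=8)`, which is stepped on the average training loss. The sketch below is a simplified, hypothetical re-implementation that assumes PyTorch's defaults for the parameters the script leaves unset (relative `threshold=1e-4`, `cooldown=0`, `min_lr=0`). It replays the logged losses for epochs 1-56 to show why the first halvings appear at epochs 34 and 55. The logged losses are rounded to 4 decimals, so this is only a consistency check.

```python
import math


def plateau_schedule(losses, lr=1e-4, factor=0.5, patience=8, threshold=1e-4):
    """Simplified ReduceLROnPlateau(mode='min', threshold_mode='rel').
    Returns the learning rate in effect during each epoch."""
    best = math.inf
    num_bad = 0
    lrs = []
    for loss in losses:
        lrs.append(lr)                      # LR used (and printed) during this epoch
        if loss < best * (1 - threshold):   # relative improvement test
            best = loss
            num_bad = 0
        else:
            num_bad += 1
        if num_bad > patience:              # strictly more than `patience` bad epochs
            lr *= factor
            num_bad = 0                     # best is kept; only the counter resets
    return lrs


# Average training losses for epochs 1-56, copied from the log above.
logged = [
    0.3026, 0.1840, 0.1435, 0.1303, 0.1313, 0.1203, 0.1060, 0.1118, 0.1012, 0.0906,  # 1-10
    0.1021, 0.0875, 0.0913, 0.0955, 0.0889, 0.0841, 0.0796, 0.0803, 0.0788, 0.0739,  # 11-20
    0.0755, 0.0687, 0.0748, 0.0620, 0.0729, 0.0831, 0.0771, 0.0706, 0.0647, 0.0726,  # 21-30
    0.0647, 0.0647, 0.0747, 0.0666, 0.0625, 0.0645, 0.0698, 0.0718, 0.0558, 0.0606,  # 31-40
    0.0540, 0.0661, 0.0576, 0.0652, 0.0467, 0.0608, 0.0533, 0.0521, 0.0488, 0.0521,  # 41-50
    0.0620, 0.0595, 0.0601, 0.0525, 0.0504, 0.0601,                                  # 51-56
]

lrs = plateau_schedule(logged)
# Epochs (1-based) whose LR is lower than the previous epoch's
drops = [e for e in range(2, len(lrs) + 1) if lrs[e - 1] < lrs[e - 2]]
print(drops)  # [34, 55]
```

The best loss at epoch 24 (0.0620) is followed by nine non-improving epochs (25-33), which triggers the first halving. After the epoch-45 best (0.0467), epochs 46-54 trigger the second. Since `best.pth` is chosen on training loss as well, neither the schedule nor the "best" checkpoint reflects held-out performance.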





OLD+SMOOTHING EVAL

In [None]:
# ==============================================================================
# EVAL V4: GEOMETRIC SCORING + SMOOTHING (Corrected for /content/eval)
# ==============================================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import scipy.signal as sg
import os
import random
from collections import defaultdict
from tqdm import tqdm

# ======================================================
# CONFIGURATION (UPDATED)
# ======================================================
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# Data Paths
VAL_DIR = "/content/eval"  # 400 songs not used in training
MODEL_PATH = "/content/pitch_modelV1_oldplussmoothing/best.pth" # Points to V4 model

# Params
WIN_LEN = 300
HOP_LEN = 150
TOLERANCE = 1.0    # Width of a projected-start bucket, in seconds
TOP_K_MATCHES = 20
NUM_TRIALS = 400   # Songs are drawn with replacement, so not every song is guaranteed a query
SEGMENT_LEN = 1500 # 15 seconds

# ======================================================
# 1. MODEL (V4 Architecture)
# ======================================================
class PitchSiameseNet(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 32, 5, padding=2), nn.BatchNorm1d(32), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, 5, padding=2), nn.BatchNorm1d(64), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(64, 128, 3, padding=1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1)
        )
        self.fc = nn.Sequential(
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, embed_dim)
        )

    def forward_one(self, x):
        x = self.cnn(x).squeeze(-1)
        x = self.fc(x)
        return F.normalize(x, p=2, dim=1)

print(f"⏳ Loading model from {MODEL_PATH}...")
model = PitchSiameseNet(embed_dim=128).to(DEVICE)
try:
    checkpoint = torch.load(MODEL_PATH, map_location=DEVICE)
    # best.pth holds a bare state_dict; last.pth wraps it under the "model" key
    if "model" in checkpoint:
        model.load_state_dict(checkpoint["model"])
    else:
        model.load_state_dict(checkpoint)
    print("✅ Model loaded.")
except Exception as e:
    print(f"❌ Error loading model: {e}")
    raise  # fail the cell visibly instead of calling exit() inside a notebook
model.eval()

# ======================================================
# 2. SMOOTHING (Must match Training V4)
# ======================================================
def smooth_pitch(pitch):
    return sg.medfilt(pitch, kernel_size=5).astype(np.float32)

# ======================================================
# 3. AUGMENTATION
# ======================================================
def humify_soft(arr):
    arr = arr.copy()
    arr += np.random.normal(0, 0.02, size=len(arr))
    return arr.astype(np.float32)

def humify_hard(arr):
    arr = arr.copy()
    arr += np.random.normal(0, 0.06, size=len(arr))

    semitones = np.random.uniform(-3, 3)
    arr[arr > 0] += semitones * 0.057

    # Time warp (80% of the time), then crop/pad back to the input length
    if random.random() < 0.8:
        target = len(arr)  # original length (SEGMENT_LEN frames for eval queries)
        rate = np.random.uniform(0.85, 1.15)
        old_idx = np.arange(target)
        new_idx = np.linspace(0, target - 1, int(target * rate))
        arr = np.interp(new_idx, old_idx, arr)

        if len(arr) < target:
            arr = np.pad(arr, (0, target - len(arr)), mode='constant')
        else:
            start = (len(arr) - target) // 2  # center crop
            arr = arr[start:start + target]

    return arr.astype(np.float32)

# ======================================================
# 4. EMBEDDING + DB BUILDER (FLAT)
# ======================================================
def process_sequence_to_embeddings(arr):
    """Returns embeddings and time offsets."""
    # 1. APPLY GLOBAL SMOOTHING FIRST
    arr = smooth_pitch(arr)

    windows = []
    offsets = []

    i = 0
    while i + WIN_LEN <= len(arr):
        crop = arr[i : i + WIN_LEN]
        if np.mean(crop > 0) < 0.1: # Skip silence
            i += HOP_LEN
            continue
        windows.append(crop)
        offsets.append(i / 100.0)  # frame index -> seconds (100 frames/s, i.e. a 10 ms hop)
        i += HOP_LEN

    if not windows:
        return None, None

    windows_np = np.stack(windows)
    windows_tensor = torch.from_numpy(windows_np).float().unsqueeze(1).to(DEVICE)

    with torch.no_grad():
        embeddings = model.forward_one(windows_tensor)

    return embeddings, offsets

def build_flat_database():
    files = sorted([f for f in os.listdir(VAL_DIR) if f.endswith(".npy")])
    # FILE LIMIT REMOVED: All found files are used for the database

    all_embeds_list = []
    metadata = []

    print(f"üèóÔ∏è Building Geometric DB from {len(files)} songs in {VAL_DIR}...")

    for f_name in tqdm(files):
        path = os.path.join(VAL_DIR, f_name)
        arr = np.load(path)

        # All DB files are smoothed and windowed
        embeds, offsets = process_sequence_to_embeddings(arr)
        if embeds is None: continue

        all_embeds_list.append(embeds)
        song_id = f_name.replace(".npy", "")

        for t in offsets:
            metadata.append((song_id, t))

    full_db_tensor = torch.cat(all_embeds_list, dim=0)
    print(f"✅ DB Built: {full_db_tensor.shape[0]} windows across {len(all_embeds_list)} songs.")
    return full_db_tensor, metadata

# ======================================================
# 5. GEOMETRIC SCORING
# ======================================================
def query_geometric(query_embeds, query_offsets, db_tensor, db_metadata):
    # 1. Distance Matrix
    dists = torch.cdist(query_embeds, db_tensor, p=2)

    # 2. Top-K
    top_vals, top_inds = torch.topk(dists, k=TOP_K_MATCHES, dim=1, largest=False)
    top_vals = top_vals.cpu().numpy()
    top_inds = top_inds.cpu().numpy()

    vote_buckets = defaultdict(float)
    epsilon = 1e-4

    # Vote: each (song, projected start bucket) accumulates inverse-distance scores
    for q_idx, q_time in enumerate(query_offsets):
        for k in range(TOP_K_MATCHES):
            match_idx = top_inds[q_idx, k]
            dist = top_vals[q_idx, k]

            match_song, match_time = db_metadata[match_idx]

            # 3. Geometric Alignment (Projected Start)
            # This aligns all time windows to a single projected start time for the song
            projected_start = match_time - q_time
            bucket = int(round(projected_start / TOLERANCE))

            # Score weighted by inverse distance
            score = 1.0 / (dist + epsilon)
            vote_buckets[(match_song, bucket)] += score

    # 4. Max Score per Song (Max vote over all possible start alignments)
    song_final_scores = defaultdict(float)
    for (song, bucket), score in vote_buckets.items():
        if score > song_final_scores[song]:
            song_final_scores[song] = score

    ranked_songs = sorted(song_final_scores.items(), key=lambda x: x[1], reverse=True)
    return [x[0] for x in ranked_songs]

# ======================================================
# 6. RUN EVAL
# ======================================================
if __name__ == "__main__":
    db_tensor, db_metadata = build_flat_database()
    song_list = sorted(set(m[0] for m in db_metadata))  # sorted so random.choice is reproducible under a fixed seed

    # Hit counters for Top-1 / Top-5 / Top-10
    results = {
        "Soft": {"top1": 0, "top5": 0, "top10": 0},
        "Hard": {"top1": 0, "top5": 0, "top10": 0}
    }

    print(f"\nüöÄ Running {NUM_TRIALS} Trials with Geometric Scoring...")

    # Total successful trials counter
    effective_trials = {"Soft": 0, "Hard": 0}

    for _ in tqdm(range(NUM_TRIALS)):
        target_song = random.choice(song_list)
        full_arr = np.load(os.path.join(VAL_DIR, f"{target_song}.npy"))

        if len(full_arr) < SEGMENT_LEN: continue

        start_idx = np.random.randint(0, len(full_arr) - SEGMENT_LEN + 1)  # +1: high is exclusive
        clean_clip = full_arr[start_idx : start_idx + SEGMENT_LEN]

        # --- Test Soft ---
        soft_hum_clip = humify_soft(clean_clip)
        q_emb, q_off = process_sequence_to_embeddings(soft_hum_clip)

        if q_emb is not None:
            ranked = query_geometric(q_emb, q_off, db_tensor, db_metadata)
            effective_trials["Soft"] += 1
            if ranked:
                if ranked[0] == target_song: results["Soft"]["top1"] += 1
                if target_song in ranked[:5]: results["Soft"]["top5"] += 1
                if target_song in ranked[:10]: results["Soft"]["top10"] += 1 # <--- Added Top 10 check

        # --- Test Hard ---
        hard_hum_clip = humify_hard(clean_clip)
        q_emb, q_off = process_sequence_to_embeddings(hard_hum_clip)

        if q_emb is not None:
            ranked = query_geometric(q_emb, q_off, db_tensor, db_metadata)
            effective_trials["Hard"] += 1
            if ranked:
                if ranked[0] == target_song: results["Hard"]["top1"] += 1
                if target_song in ranked[:5]: results["Hard"]["top5"] += 1
                if target_song in ranked[:10]: results["Hard"]["top10"] += 1 # <--- Added Top 10 check

    # ======================================================
    # FINAL REPORT
    # ======================================================
    print("\n" + "="*50)
    print("üìä V4 FINAL RESULTS (Using 400 Unseen Files)")
    print("="*50)

    def calc_acc(res, key, total):
        return res[key] / total if total > 0 else 0

    print(f"Total Effective Soft Trials: {effective_trials['Soft']}")
    print(f"Total Effective Hard Trials: {effective_trials['Hard']}")
    print("-" * 50)

    print(f"üé§ Soft Hum:")
    print(f"   Top-1:  {calc_acc(results['Soft'], 'top1', effective_trials['Soft']):.1%}")
    print(f"   Top-5:  {calc_acc(results['Soft'], 'top5', effective_trials['Soft']):.1%}")
    print(f"   Top-10: {calc_acc(results['Soft'], 'top10', effective_trials['Soft']):.1%}") # <--- Added Top 10 Report

    print(f"\nüî• Hard Hum:")
    print(f"   Top-1:  {calc_acc(results['Hard'], 'top1', effective_trials['Hard']):.1%}")
    print(f"   Top-5:  {calc_acc(results['Hard'], 'top5', effective_trials['Hard']):.1%}")
    print(f"   Top-10: {calc_acc(results['Hard'], 'top10', effective_trials['Hard']):.1%}") # <--- Added Top 10 Report
    print("="*50)

⏳ Loading model from /content/pitch_modelV1_oldplussmoothing/best.pth...
✅ Model loaded.
🏗️ Building Geometric DB from 400 songs in /content/eval...


100%|██████████| 400/400 [00:01<00:00, 299.54it/s]


✅ DB Built: 63580 windows across 400 songs.

🚀 Running 400 Trials with Geometric Scoring...


100%|██████████| 400/400 [00:02<00:00, 152.28it/s]


📊 V4 FINAL RESULTS (Using 400 Unseen Files)
Total Effective Soft Trials: 400
Total Effective Hard Trials: 400
--------------------------------------------------
🎤 Soft Hum:
   Top-1:  54.2%
   Top-5:  60.0%
   Top-10: 62.5%

🔥 Hard Hum:
   Top-1:  31.0%
   Top-5:  42.8%
   Top-10: 47.5%
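The eval cells draw each trial's song with `random.choice` (with replacement) and a random 15 s segment, and they set no fixed seed. Each percentage is therefore a sample estimate from 400 trials. A minimal sketch of the normal-approximation (Wald) 95% interval for a proportion gives a rough sense of that noise. The hit counts used here are back-computed from the printed percentages (217/400 ≈ 54.2%, 124/400 = 31.0%) and were not read from the run.

```python
import math


def wald_interval(hits: int, n: int, z: float = 1.96):
    """Normal-approximation (Wald) interval for a binomial proportion.

    Returns (p_hat, half_width); the interval is p_hat +/- half_width.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = hits / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, half_width


# Soft Top-1: assumed ~217 hits out of 400 trials (printed as 54.2%)
p, hw = wald_interval(217, 400)
print(f"Soft Top-1: {p:.1%} +/- {hw:.1%}")

# Hard Top-1: assumed 124 hits out of 400 trials (printed as 31.0%)
p, hw = wald_interval(124, 400)
print(f"Hard Top-1: {p:.1%} +/- {hw:.1%}")
```

The band is roughly plus or minus 4.5 to 5 points. A gap of one or two points between 400-trial runs is well inside the noise. The 9-point Soft Top-1 gap between this model (54.2%) and the deep CNN without smoothing (63.2%) is larger than the band. Still, scoring both models on the same seeded trials would make such comparisons more reliable.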







OLD+deeper CNN

Code Change: You added a 4th Convolutional Layer and increased the channel depth to 256.

Old (V3/V4): Conv(32) -> Conv(64) -> Conv(128) -> Output

New (V3.6): Conv(32) -> Conv(64) -> Conv(128) -> **Conv(256)** -> Output

Shallow networks (3 layers) tend to learn simple shapes (lines going up or down). Deeper networks (4+ layers) can learn more complex patterns of patterns (e.g., a specific vibrato style or a short melodic motif).

SMOOTHING REMOVED IN THIS VERSION
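The depth argument above can be made concrete with the receptive field. This is the number of input pitch frames that one output position of the last conv layer can see. A minimal sketch applies the standard recurrence (rf += (k - 1) * jump; jump *= stride) to the V3.6 stack in the cell below: kernels 5 and 5, max-pool 2, then kernels 3 and 3. The 100 frames/s rate is an assumption, based on `TARGET_LEN = 300  # 3 seconds` and the eval code's `i / 100.0` offsets.

```python
def receptive_field(layers):
    """Receptive field (in input frames) of a stack of 1-D conv/pool layers.

    `layers` is a list of (kernel_size, stride) pairs in order; dilation is 1.
    Padding only affects the borders, not the receptive-field size, so it is ignored.
    """
    rf, jump = 1, 1
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump  # each layer widens the view by (k-1) input steps
        jump *= stride                  # strided layers make later steps coarser
    return rf


# V3.6 stack: Conv(k=5), Conv(k=5), MaxPool(2), Conv(k=3), Conv(k=3)
V3_6_LAYERS = [(5, 1), (5, 1), (2, 2), (3, 1), (3, 1)]
FRAMES_PER_SECOND = 100  # assumption, inferred from offsets = i / 100.0 in the eval code

rf_layer3 = receptive_field(V3_6_LAYERS[:4])  # up to and including Layer 3
rf_layer4 = receptive_field(V3_6_LAYERS)      # including the new Layer 4
print(f"Layer 3 sees {rf_layer3} frames (~{rf_layer3 / FRAMES_PER_SECOND:.2f} s)")
print(f"Layer 4 sees {rf_layer4} frames (~{rf_layer4 / FRAMES_PER_SECOND:.2f} s)")
```

The 4th layer widens each position's view from 14 to 18 frames, roughly 0.14 s to 0.18 s under the frame-rate assumption. Anything longer is combined only by `AdaptiveAvgPool1d(1)`, which averages over time and ignores order.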

In [None]:
# ==============================================================================
# V3.6: DEEPER CNN (4-LAYER) | NO SMOOTHING | ADAPTIVE LR | 100 EPOCHS
# ==============================================================================
import os
import random
import glob
import numpy as np
import scipy.signal as sg
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torch.optim.lr_scheduler import ReduceLROnPlateau
from tqdm import tqdm

# -------------------------
# Hyperparams
# -------------------------
TARGET_LEN = 300   # 3 seconds
BATCH_SIZE = 32
EPOCHS = 100       # 100 Epochs
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# -------------------------
# PATHS
# -------------------------
PITCH_DIR = "/content/data_unique"
CKPT_DIR = "/content/pitch_modelV3_6_nosmoothdeepcnn"
os.makedirs(CKPT_DIR, exist_ok=True)

BEST = f"{CKPT_DIR}/best.pth"
LAST = f"{CKPT_DIR}/last.pth"

# -------------------------
# 1. AUGMENTATION
# -------------------------
def augment_hum(pitch):
    pitch = pitch.copy().astype(np.float32)

    # 1. Noise
    pitch += np.random.normal(0, 0.06, size=len(pitch))

    # 2. Key Shift
    semitones = np.random.uniform(-5, 5)
    pitch[pitch > 0] += semitones * 0.057

    # 3. Time Warp
    if random.random() < 0.7:
        rate = np.random.uniform(0.8, 1.25)
        old_idx = np.arange(len(pitch))
        new_idx = np.linspace(0, len(pitch)-1, max(2, int(len(pitch)*rate)))
        pitch = np.interp(new_idx, old_idx, pitch)

    # 4. Breath Noise
    pitch += np.random.normal(0, 0.04, size=len(pitch))

    return pitch.astype(np.float32)

# -------------------------
# Helper: Pad/Crop
# -------------------------
def force_length(arr, target_len=TARGET_LEN):
    if arr is None or len(arr) == 0:
        return np.zeros(target_len, dtype=np.float32)
    if len(arr) < target_len:
        pad_amt = target_len - len(arr)
        return np.pad(arr, (0, pad_amt), mode='constant')
    elif len(arr) > target_len:
        start = random.randint(0, len(arr) - target_len)
        return arr[start:start + target_len]
    return arr

# -------------------------
# 2. DATASET (NO SMOOTHING)
# -------------------------
class PitchDatasetV3(Dataset):
    def __init__(self, pitch_dir, target_len=TARGET_LEN):
        self.files = sorted(glob.glob(os.path.join(pitch_dir, "*.npy")))
        self.target_len = target_len
        print(f"✅ Loaded {len(self.files)} files")

    def _random_crop(self, arr):
        if len(arr) <= self.target_len:
            return arr
        start = random.randint(0, len(arr) - self.target_len)
        return arr[start:start + self.target_len]

    def __getitem__(self, idx):
        # Load Raw Data
        anchor_path = self.files[idx]
        anchor_full = np.load(anchor_path)

        neg_idx = random.randint(0, len(self.files) - 1)
        while neg_idx == idx:
            neg_idx = random.randint(0, len(self.files) - 1)
        neg_full = np.load(self.files[neg_idx])

        # --- NO SMOOTHING APPLIED HERE ---

        # 1. CROP
        anchor_clean = self._random_crop(anchor_full)
        neg_clean = self._random_crop(neg_full)

        # 2. AUGMENT
        positive_hum = augment_hum(anchor_clean)
        negative_hum = augment_hum(neg_clean)

        # 3. PAD/TRUNCATE
        a_out = force_length(anchor_clean, self.target_len)
        ph_out = force_length(positive_hum, self.target_len)
        n_out = force_length(negative_hum, self.target_len)

        return (
            torch.from_numpy(a_out).unsqueeze(0).float(),
            torch.from_numpy(ph_out).unsqueeze(0).float(),
            torch.from_numpy(n_out).unsqueeze(0).float(),
        )

    def __len__(self):
        return len(self.files)

# -------------------------
# 3. MODEL: DEEPER 4-LAYER CNN
# -------------------------
class PitchSiameseNet(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()

        self.cnn = nn.Sequential(
            # Layer 1
            nn.Conv1d(1, 32, kernel_size=5, padding=2),
            nn.BatchNorm1d(32), nn.ReLU(),

            # Layer 2
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.MaxPool1d(2),

            # Layer 3
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128), nn.ReLU(),

            # Layer 4 (The "Deep" Part)
            nn.Conv1d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm1d(256), nn.ReLU(),

            nn.AdaptiveAvgPool1d(1)
        )

        self.fc = nn.Sequential(
            nn.Linear(256, 256), # Input 256 matches CNN output
            nn.ReLU(),
            nn.Linear(256, embed_dim)
        )

    def forward_one(self, x):
        x = self.cnn(x).squeeze(-1)   # (B, 256)
        x = self.fc(x)               # (B, 128)
        return F.normalize(x, p=2, dim=1)

# -------------------------
# 4. TRAINING LOOP
# -------------------------
def train_deeper_nosmooth():
    print(f"🚀 Training V3.6: DEEPER CNN (4-Layer) | NO SMOOTHING | {DEVICE}")
    print(f"📂 Data: {PITCH_DIR}")
    print(f"🔧 Epochs: {EPOCHS} | Adaptive LR: ON | Margin: 0.85")

    dataset = PitchDatasetV3(PITCH_DIR, target_len=TARGET_LEN)
    loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=0, pin_memory=True)

    model = PitchSiameseNet().to(DEVICE)
    optim = torch.optim.Adam(model.parameters(), lr=0.0001)

    # ADAPTIVE LR SCHEDULER
    scheduler = ReduceLROnPlateau(optim, mode='min', factor=0.5, patience=8)

    # SINGLE ROBUST TRIPLET LOSS
    loss_fn = nn.TripletMarginLoss(margin=0.85, p=2)

    best_loss = float('inf')

    for epoch in range(EPOCHS):
        model.train()
        total_loss = 0.0

        for anchor, pos_hum, neg_hum in loader:
            anchor = anchor.to(DEVICE)
            pos_hum = pos_hum.to(DEVICE)
            neg_hum = neg_hum.to(DEVICE)

            optim.zero_grad()

            a = model.forward_one(anchor)
            ph = model.forward_one(pos_hum)
            n = model.forward_one(neg_hum)

            loss = loss_fn(a, ph, n)

            loss.backward()
            optim.step()

            total_loss += loss.item()

        avg_loss = total_loss / len(loader)
        current_lr = optim.param_groups[0]['lr']
        print(f"Epoch {epoch+1}/{EPOCHS} | Avg Loss: {avg_loss:.4f} | LR: {current_lr:.6f}")

        # Step Scheduler
        scheduler.step(avg_loss)

        if avg_loss < best_loss:
            best_loss = avg_loss
            torch.save(model.state_dict(), BEST)
            print(f" ⭐ New Best: {best_loss:.4f}")

        torch.save({
            "epoch": epoch + 1,
            "model": model.state_dict(),
            "optimizer": optim.state_dict(),
            "best_loss": best_loss,
        }, LAST)

    print("✅ Training Complete.")
    print(f"Best Model Saved to: {BEST}")

if __name__ == "__main__":
    train_deeper_nosmooth()

🚀 Training V3.6: DEEPER CNN (4-Layer) | NO SMOOTHING | cuda
📂 Data: /content/data_unique
🔧 Epochs: 100 | Adaptive LR: ON | Margin: 0.85
✅ Loaded 4051 files
Epoch 1/100 | Avg Loss: 0.2489 | LR: 0.000100
 ⭐ New Best: 0.2489
Epoch 2/100 | Avg Loss: 0.1496 | LR: 0.000100
 ⭐ New Best: 0.1496
Epoch 3/100 | Avg Loss: 0.1125 | LR: 0.000100
 ⭐ New Best: 0.1125
Epoch 4/100 | Avg Loss: 0.1030 | LR: 0.000100
 ⭐ New Best: 0.1030
Epoch 5/100 | Avg Loss: 0.0904 | LR: 0.000100
 ⭐ New Best: 0.0904
Epoch 6/100 | Avg Loss: 0.1015 | LR: 0.000100
Epoch 7/100 | Avg Loss: 0.0781 | LR: 0.000100
 ⭐ New Best: 0.0781
Epoch 8/100 | Avg Loss: 0.0794 | LR: 0.000100
Epoch 9/100 | Avg Loss: 0.0793 | LR: 0.000100
Epoch 10/100 | Avg Loss: 0.0601 | LR: 0.000100
 ⭐ New Best: 0.0601
Epoch 11/100 | Avg Loss: 0.0747 | LR: 0.000100
Epoch 12/100 | Avg Loss: 0.0709 | LR: 0.000100
Epoch 13/100 | Avg Loss: 0.0586 | LR: 0.000100
 ⭐ New Best: 0.0586
Epoch 14/100 | Avg Loss: 0.0576 | LR: 0.000100
 ⭐ New
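Both training cells optimize `nn.TripletMarginLoss(margin=0.85, p=2)`. Per triplet this is max(d(a, p) - d(a, n) + margin, 0) with Euclidean d, averaged over the batch. `forward_one` L2-normalizes every embedding, so all distances fall in [0, 2]. A 0.85 margin therefore asks the negative to sit well beyond the positive. Below is a minimal pure-Python sketch using hypothetical 2-D vectors; the real embeddings are 128-D. PyTorch's implementation also adds a small eps inside the distance, which is omitted here.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def l2_dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.85):
    """max(d(a, p) - d(a, n) + margin, 0), without PyTorch's small eps term."""
    return max(l2_dist(anchor, positive) - l2_dist(anchor, negative) + margin, 0.0)


# Hypothetical embeddings, L2-normalized like forward_one()
a = l2_normalize([1.0, 0.0])
p = l2_normalize([0.9, 0.1])       # hummed version of the same clip: close to a
n_easy = l2_normalize([0.0, 1.0])  # unrelated clip, far from a
n_hard = l2_normalize([0.6, 0.8])  # unrelated clip, but closer to a

print(triplet_loss(a, p, n_easy))  # margin satisfied -> zero loss
print(triplet_loss(a, p, n_hard))  # margin violated -> positive loss
```

Satisfied triplets contribute exactly zero. A low mean such as the ~0.06 in the logs above therefore means that, on the training augmentations, the margin is violated rarely or only slightly.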

EVAL deepCNN + no SMOOTHING


In [None]:
# ==============================================================================
# EVAL V3.6: DEEP CNN (NO SMOOTHING) + GEOMETRIC SCORING
# ==============================================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import scipy.signal as sg
import os
import random
from collections import defaultdict
from tqdm import tqdm

# ======================================================
# CONFIGURATION
# ======================================================
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# Data Paths
VAL_DIR = "/content/eval"  # 400 unseen files
# Pointing to the specific checkpoint you requested
MODEL_PATH = "/content/pitch_modelV3_6_nosmoothdeepcnn/best.pth"

# Params
WIN_LEN = 300
HOP_LEN = 150
TOLERANCE = 1.0    # Time bucket tolerance
TOP_K_MATCHES = 20
NUM_TRIALS = 400   # Full coverage
SEGMENT_LEN = 1500 # 15 seconds

# ======================================================
# 1. MODEL ARCHITECTURE (V3.6 DEEP CNN)
# ======================================================
class PitchSiameseNet(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()

        self.cnn = nn.Sequential(
            # Layer 1
            nn.Conv1d(1, 32, kernel_size=5, padding=2),
            nn.BatchNorm1d(32), nn.ReLU(),

            # Layer 2
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.MaxPool1d(2),

            # Layer 3
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128), nn.ReLU(),

            # Layer 4 (The "Deep" Part)
            nn.Conv1d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm1d(256), nn.ReLU(),

            nn.AdaptiveAvgPool1d(1)
        )

        self.fc = nn.Sequential(
            nn.Linear(256, 256), # Input matches CNN output (256)
            nn.ReLU(),
            nn.Linear(256, embed_dim)
        )

    def forward_one(self, x):
        x = self.cnn(x).squeeze(-1)
        x = self.fc(x)
        return F.normalize(x, p=2, dim=1)

print(f"⏳ Loading V3.6 Model from {MODEL_PATH}...")
model = PitchSiameseNet(embed_dim=128).to(DEVICE)
try:
    checkpoint = torch.load(MODEL_PATH, map_location=DEVICE)
    if "model" in checkpoint:
        model.load_state_dict(checkpoint["model"])
    else:
        model.load_state_dict(checkpoint)
    print("✅ Model loaded successfully.")
except Exception as e:
    print(f"❌ Error loading model: {e}")
    raise  # re-raise so the cell stops with the full traceback instead of calling exit()
model.eval()

# ======================================================
# 2. AUGMENTATION (Proper Soft & Hard with Time Warp)
# ======================================================
def humify_soft(arr):
    """Soft Hum: Light Noise only."""
    arr = arr.copy()
    arr += np.random.normal(0, 0.02, size=len(arr))
    return arr.astype(np.float32)

def humify_hard(arr):
    """Hard Hum: Noise + Key Shift + Time Warping"""
    arr = arr.copy()

    # 1. Jitter
    arr += np.random.normal(0, 0.06, size=len(arr))

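    # 2. Key Shift (0.057 ≈ ln(2)/12: one semitone, assuming pitch is stored as natural-log Hz)
    # NOTE: the jitter above also pushes about half of the unvoiced 0-frames above zero,
    # so `arr > 0` shifts some of them too; a voiced mask taken before the noise would avoid this.
    semitones = np.random.uniform(-3, 3)
    arr[arr > 0] += semitones * 0.057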

    # 3. TIME WARP (Crucial for robustness test)
    target_len = SEGMENT_LEN # 1500
    if random.random() < 0.8: # 80% chance
        rate = np.random.uniform(0.85, 1.15)
        old_idx = np.arange(len(arr))
        new_len = int(len(arr) * rate)
        new_idx = np.linspace(0, len(arr)-1, new_len)
        arr = np.interp(new_idx, old_idx, arr)

        # Force back to target length
        if len(arr) < target_len:
            arr = np.pad(arr, (0, target_len - len(arr)), mode='constant')
        else:
            start = (len(arr) - target_len) // 2
            arr = arr[start:start+target_len]

    return arr.astype(np.float32)


# ======================================================
# 3. EMBEDDING + DB BUILDER (NO SMOOTHING)
# ======================================================
def process_sequence_to_embeddings(arr):
    """
    Returns embeddings and time offsets.
    NO SMOOTHING applied here (Matches training V3.6 No-Smooth).
    Uses mini-batches to prevent OOM.
    """

    windows = []
    offsets = []

    i = 0
    while i + WIN_LEN <= len(arr):
        crop = arr[i : i + WIN_LEN]
        if np.mean(crop > 0) < 0.1: # Skip silence
            i += HOP_LEN
            continue
        windows.append(crop)
        offsets.append(i / 100.0)
        i += HOP_LEN

    if not windows:
        return None, None

    windows_np = np.stack(windows)
    windows_tensor = torch.from_numpy(windows_np).float().unsqueeze(1).to(DEVICE)

    # --- BATCH PROCESSING ---
    batch_size = 64
    embeddings_list = []

    with torch.no_grad():
        for k in range(0, len(windows_tensor), batch_size):
            batch = windows_tensor[k : k + batch_size]
            emb_batch = model.forward_one(batch)
            embeddings_list.append(emb_batch)

    embeddings = torch.cat(embeddings_list, dim=0)

    return embeddings, offsets

def build_flat_database():
    files = sorted([f for f in os.listdir(VAL_DIR) if f.endswith(".npy")])

    all_embeds_list = []
    metadata = []

    print(f"🏗️ Building Geometric DB (RAW PITCH) from {len(files)} songs in {VAL_DIR}...")

    for f_name in tqdm(files):
        path = os.path.join(VAL_DIR, f_name)
        arr = np.load(path)

        # Raw pitch goes directly into embedding (No Smoothing)
        embeds, offsets = process_sequence_to_embeddings(arr)
        if embeds is None: continue

        all_embeds_list.append(embeds)
        song_id = f_name.replace(".npy", "")

        for t in offsets:
            metadata.append((song_id, t))

    full_db_tensor = torch.cat(all_embeds_list, dim=0)
    print(f"✅ DB Built: {full_db_tensor.shape[0]} windows across {len(files)} songs.")
    return full_db_tensor, metadata

# ======================================================
# 4. GEOMETRIC SCORING
# ======================================================
def query_geometric(query_embeds, query_offsets, db_tensor, db_metadata):
    # 1. Distance Matrix
    dists = torch.cdist(query_embeds, db_tensor, p=2)

    # 2. Top-K
    top_vals, top_inds = torch.topk(dists, k=TOP_K_MATCHES, dim=1, largest=False)
    top_vals = top_vals.cpu().numpy()
    top_inds = top_inds.cpu().numpy()

    vote_buckets = defaultdict(float)
    epsilon = 1e-4

    for q_idx, q_time in enumerate(query_offsets):
        for k in range(TOP_K_MATCHES):
            match_idx = top_inds[q_idx, k]
            dist = top_vals[q_idx, k]

            match_song, match_time = db_metadata[match_idx]

            # 3. Geometric Alignment
            projected_start = match_time - q_time
            bucket = int(round(projected_start / TOLERANCE))

            # Score
            score = 1.0 / (dist + epsilon)
            vote_buckets[(match_song, bucket)] += score

    # 4. Max Score per Song
    song_final_scores = defaultdict(float)
    for (song, bucket), score in vote_buckets.items():
        if score > song_final_scores[song]:
            song_final_scores[song] = score

    ranked_songs = sorted(song_final_scores.items(), key=lambda x: x[1], reverse=True)
    return [x[0] for x in ranked_songs]

# ======================================================
# 5. RUN EVAL
# ======================================================
if __name__ == "__main__":
    db_tensor, db_metadata = build_flat_database()
    song_list = list(set([m[0] for m in db_metadata]))

    results = {
        "Soft": {"top1": 0, "top5": 0, "top10": 0},
        "Hard": {"top1": 0, "top5": 0, "top10": 0}
    }

    print(f"\n🚀 Running {NUM_TRIALS} Trials with Geometric Scoring...")

    effective_trials = {"Soft": 0, "Hard": 0}

    for _ in tqdm(range(NUM_TRIALS)):
        target_song = random.choice(song_list)
        full_arr = np.load(os.path.join(VAL_DIR, f"{target_song}.npy"))

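        if len(full_arr) < SEGMENT_LEN: continue

        # high is exclusive: +1 lets a file exactly SEGMENT_LEN frames long use start 0
        # (without it, np.random.randint(0, 0) raises ValueError)
        start_idx = np.random.randint(0, len(full_arr) - SEGMENT_LEN + 1)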

        # BASE: Raw, unsmoothed clip (Matches V3.6 No-Smooth training)
        raw_clip = full_arr[start_idx : start_idx + SEGMENT_LEN]

        # --- Test Soft ---
        soft_hum_clip = humify_soft(raw_clip)
        q_emb, q_off = process_sequence_to_embeddings(soft_hum_clip)

        if q_emb is not None:
            ranked = query_geometric(q_emb, q_off, db_tensor, db_metadata)
            effective_trials["Soft"] += 1
            if ranked:
                if ranked[0] == target_song: results["Soft"]["top1"] += 1
                if target_song in ranked[:5]: results["Soft"]["top5"] += 1
                if target_song in ranked[:10]: results["Soft"]["top10"] += 1

        # --- Test Hard ---
        hard_hum_clip = humify_hard(raw_clip)
        q_emb, q_off = process_sequence_to_embeddings(hard_hum_clip)

        if q_emb is not None:
            ranked = query_geometric(q_emb, q_off, db_tensor, db_metadata)
            effective_trials["Hard"] += 1
            if ranked:
                if ranked[0] == target_song: results["Hard"]["top1"] += 1
                if target_song in ranked[:5]: results["Hard"]["top5"] += 1
                if target_song in ranked[:10]: results["Hard"]["top10"] += 1

    # ======================================================
    # FINAL REPORT
    # ======================================================
    def calc_acc(res, key, total):
        return res[key] / total if total > 0 else 0

    print("\n" + "="*50)
    print("📊 V3.6 DEEP CNN (NO SMOOTHING) RESULTS (400 Files)")
    print("="*50)
    print(f"Total Effective Soft Trials: {effective_trials['Soft']}")
    print(f"Total Effective Hard Trials: {effective_trials['Hard']}")
    print("-" * 50)

    print(f"🎤 Soft Hum:")
    print(f"   Top-1:  {calc_acc(results['Soft'], 'top1', effective_trials['Soft']):.1%}")
    print(f"   Top-5:  {calc_acc(results['Soft'], 'top5', effective_trials['Soft']):.1%}")
    print(f"   Top-10: {calc_acc(results['Soft'], 'top10', effective_trials['Soft']):.1%}")

    print(f"\n🔥 Hard Hum:")
    print(f"   Top-1:  {calc_acc(results['Hard'], 'top1', effective_trials['Hard']):.1%}")
    print(f"   Top-5:  {calc_acc(results['Hard'], 'top5', effective_trials['Hard']):.1%}")
    print(f"   Top-10: {calc_acc(results['Hard'], 'top10', effective_trials['Hard']):.1%}")
    print("="*50)

⏳ Loading V3.6 Model from /content/pitch_modelV3_6_nosmoothdeepcnn/best.pth...
✅ Model loaded successfully.
🏗️ Building Geometric DB (RAW PITCH) from 400 songs in /content/eval...


100%|██████████| 400/400 [00:02<00:00, 156.46it/s]


✅ DB Built: 63580 windows across 400 songs.

🚀 Running 400 Trials with Geometric Scoring...


100%|██████████| 400/400 [00:02<00:00, 146.93it/s]


📊 V3.6 DEEP CNN (NO SMOOTHING) RESULTS (400 Files)
Total Effective Soft Trials: 400
Total Effective Hard Trials: 400
--------------------------------------------------
🎤 Soft Hum:
   Top-1:  63.2%
   Top-5:  67.2%
   Top-10: 68.8%

🔥 Hard Hum:
   Top-1:  49.2%
   Top-5:  58.5%
   Top-10: 61.8%
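The geometric scoring in `query_geometric` can be illustrated without a model. Take a query window at offset `q_time` that matches a database window at `match_time`. It votes for the song's implied start, `match_time - q_time`. That start is rounded into `TOLERANCE`-sized buckets, and the vote is weighted by `1 / (dist + epsilon)`. A song's score is its single best bucket. The sketch below restates that logic in pure Python with hypothetical toy matches. Song "A" has weaker matches that agree on one alignment. Song "B" has closer matches at inconsistent alignments.

```python
from collections import defaultdict


def geometric_rank(query_offsets, matches, tolerance=1.0, epsilon=1e-4):
    """Rank songs by their best time-consistent vote bucket.

    query_offsets[i] is the start time (s) of query window i.
    matches[i] is a list of (song_id, match_time, dist) neighbours for window i.
    """
    vote_buckets = defaultdict(float)
    for q_time, candidates in zip(query_offsets, matches):
        for song, match_time, dist in candidates:
            bucket = int(round((match_time - q_time) / tolerance))  # implied song start
            vote_buckets[(song, bucket)] += 1.0 / (dist + epsilon)
    best = defaultdict(float)
    for (song, _bucket), score in vote_buckets.items():
        best[song] = max(best[song], score)
    return sorted(best, key=best.get, reverse=True)


# Query windows every HOP_LEN / 100 = 1.5 s, as in the eval code
offsets = [0.0, 1.5, 3.0]
toy_matches = [
    [("A", 10.0, 0.5), ("B", 2.0, 0.3)],
    [("A", 11.5, 0.5), ("B", 40.0, 0.3)],
    [("A", 13.0, 0.5), ("B", 7.0, 0.3)],
]
print(geometric_rank(offsets, toy_matches))
```

All three of A's votes land in the same bucket (implied start 10 s), so A ranks first. Summing raw scores per song while ignoring alignment would instead favor B, at about 10.0 versus 6.0.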






OLD+deeper CNN+smoothing

In [None]:
# ==============================================================================
# V3.6: DEEPER CNN + GLOBAL SMOOTHING + MARGIN 0.85 + ADAPTIVE LR
# ==============================================================================
import os
import random
import glob
import numpy as np
import scipy.signal as sg
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torch.optim.lr_scheduler import ReduceLROnPlateau
from tqdm import tqdm

# -------------------------
# Hyperparams
# -------------------------
TARGET_LEN = 300   # 3 seconds (approx)
BATCH_SIZE = 32
EPOCHS = 120       # <--- 120 Epochs
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# -------------------------
# PATHS
# -------------------------
PITCH_DIR = "/content/data_unique"
CKPT_DIR = "/content/pitch_modelV1_deepCNNplussmoothing"
os.makedirs(CKPT_DIR, exist_ok=True)

BEST = f"{CKPT_DIR}/best.pth"
LAST = f"{CKPT_DIR}/last.pth"

# -------------------------
# 1. GLOBAL SMOOTHING HELPER
# -------------------------
def smooth_pitch(pitch):
    """
    Applies Median Filter (k=5) to remove jagged tracking errors.
    """
    return sg.medfilt(pitch, kernel_size=5).astype(np.float32)

# -------------------------
# 2. AUGMENTATION
# -------------------------
def augment_hum(pitch):
    pitch = pitch.copy().astype(np.float32)

    # 1. Noise
    pitch += np.random.normal(0, 0.06, size=len(pitch))

    # 2. Key Shift
    semitones = np.random.uniform(-5, 5)
    pitch[pitch > 0] += semitones * 0.057

    # 3. Time Warp
    if random.random() < 0.7:
        rate = np.random.uniform(0.8, 1.25)
        old_idx = np.arange(len(pitch))
        new_idx = np.linspace(0, len(pitch)-1, max(2, int(len(pitch)*rate)))
        pitch = np.interp(new_idx, old_idx, pitch)

    # 4. Breath Noise
    pitch += np.random.normal(0, 0.04, size=len(pitch))

    return pitch.astype(np.float32)

# -------------------------
# Helper: Pad/Crop
# -------------------------
def force_length(arr, target_len=TARGET_LEN):
    if arr is None or len(arr) == 0:
        return np.zeros(target_len, dtype=np.float32)

    if len(arr) < target_len:
        pad_amt = target_len - len(arr)
        return np.pad(arr, (0, pad_amt), mode='constant')

    elif len(arr) > target_len:
        start = random.randint(0, len(arr) - target_len)
        return arr[start:start + target_len]

    return arr

# -------------------------
# 3. DATASET (WITH GLOBAL SMOOTHING)
# -------------------------
class PitchDatasetV3(Dataset):
    def __init__(self, pitch_dir, target_len=TARGET_LEN):
        self.files = sorted(glob.glob(os.path.join(pitch_dir, "*.npy")))
        self.target_len = target_len
        print(f"✅ Loaded {len(self.files)} files from {pitch_dir}")

    def _random_crop(self, arr):
        if len(arr) <= self.target_len:
            return arr
        start = random.randint(0, len(arr) - self.target_len)
        return arr[start:start + self.target_len]

    def __getitem__(self, idx):
        # Load Raw
        anchor_path = self.files[idx]
        anchor_full = np.load(anchor_path)

        neg_idx = random.randint(0, len(self.files) - 1)
        while neg_idx == idx:
            neg_idx = random.randint(0, len(self.files) - 1)
        neg_full = np.load(self.files[neg_idx])

        # --- APPLY GLOBAL SMOOTHING ---
        anchor_full = smooth_pitch(anchor_full)
        neg_full = smooth_pitch(neg_full)
        # ------------------------------

        # 1. CROP (Now cropping from Cleaned data)
        anchor_clean = self._random_crop(anchor_full)
        neg_clean = self._random_crop(neg_full)

        # 2. AUGMENT
        # Create positive hum from the smoothed positive crop
        positive_hum = augment_hum(anchor_clean)

        # Apply augmentation to negative as well (Harder Negative)
        negative_hum = augment_hum(neg_clean)

        # 3. FINALIZE
        anchor_out = force_length(anchor_clean, self.target_len)
        pos_hum_out = force_length(positive_hum, self.target_len)
        neg_out = force_length(negative_hum, self.target_len)

        return (
            torch.from_numpy(anchor_out).unsqueeze(0).float(),
            torch.from_numpy(pos_hum_out).unsqueeze(0).float(),
            torch.from_numpy(neg_out).unsqueeze(0).float(),
        )

    def __len__(self):
        return len(self.files)

# -------------------------
# 4. MODEL: DEEPER 4-LAYER CNN
# -------------------------
class PitchSiameseNet(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()

        self.cnn = nn.Sequential(
            # Layer 1
            nn.Conv1d(1, 32, kernel_size=5, padding=2),
            nn.BatchNorm1d(32), nn.ReLU(),

            # Layer 2
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.MaxPool1d(2),

            # Layer 3
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128), nn.ReLU(),

            # Layer 4 (The "Deeper" part)
            nn.Conv1d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm1d(256), nn.ReLU(),

            nn.AdaptiveAvgPool1d(1)
        )

        self.fc = nn.Sequential(
            nn.Linear(256, 256), # Input 256 matches CNN output
            nn.ReLU(),
            nn.Linear(256, embed_dim)
        )

    def forward_one(self, x):
        x = self.cnn(x).squeeze(-1)   # (B, 256)
        x = self.fc(x)               # (B, 128)
        return F.normalize(x, p=2, dim=1)

# -------------------------
# 5. TRAINING LOOP
# -------------------------
def train_deeper_smooth():
    print(f"🚀 Training Deeper CNN (4-Layer) on: {DEVICE}")
    print(f"📂 Data: {PITCH_DIR}")
    print(f"🔧 Margin: 0.85 | Smoothing: ON | Adaptive LR: ON")

    dataset = PitchDatasetV3(PITCH_DIR, target_len=TARGET_LEN)
    # Using num_workers=0 for safety in Colab/Notebooks
    loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=0, pin_memory=True)

    model = PitchSiameseNet().to(DEVICE)
    initial_lr = 0.0001
    optim = torch.optim.Adam(model.parameters(), lr=initial_lr)

    # ADAPTIVE LR SCHEDULER (Reduced Factor=0.5, Patience=8)
    # verbose=True removed for compatibility with newer PyTorch
    scheduler = ReduceLROnPlateau(optim, mode='min', factor=0.5, patience=8)

    # FIXED MARGIN 0.85
    loss_fn = nn.TripletMarginLoss(margin=0.85, p=2)

    best_loss = float('inf')

    print(f"Starting LR: {initial_lr}")

    for epoch in range(EPOCHS):
        model.train()
        total_loss = 0.0

        # pbar = tqdm(loader, desc=f"Epoch {epoch+1}/{EPOCHS}", leave=False)

        for anchor, pos_hum, neg_hum in loader:
            anchor = anchor.to(DEVICE)
            pos_hum = pos_hum.to(DEVICE)
            neg_hum = neg_hum.to(DEVICE)

            optim.zero_grad()

            a = model.forward_one(anchor)
            ph = model.forward_one(pos_hum)
            n = model.forward_one(neg_hum)

            # Single Loss: Anchor vs Hummed Positive vs Hummed Negative
            loss = loss_fn(a, ph, n)

            loss.backward()
            optim.step()

            total_loss += loss.item()
            # pbar.set_postfix({"loss": f"{loss.item():.4f}"})

        avg_loss = total_loss / len(loader)
        current_lr = optim.param_groups[0]['lr']
        print(f"Epoch {epoch+1}/{EPOCHS} | Avg Loss: {avg_loss:.4f} | LR: {current_lr:.6f}")

        # Step the scheduler
        scheduler.step(avg_loss)

        if avg_loss < best_loss:
            best_loss = avg_loss
            torch.save(model.state_dict(), BEST)
            print(f" ⭐ New Best: {best_loss:.4f}")

        torch.save({
            "epoch": epoch + 1,
            "model": model.state_dict(),
            "optimizer": optim.state_dict(),
            "best_loss": best_loss,
        }, LAST)

    print("✅ Training Complete.")
    print(f"Best Model Saved to: {BEST}")

if __name__ == "__main__":
    train_deeper_smooth()

🚀 Training Deeper CNN (4-Layer) on: cuda
📂 Data: /content/data_unique
🔧 Margin: 0.85 | Smoothing: ON | Adaptive LR: ON
✅ Loaded 4051 files from /content/data_unique
Starting LR: 0.0001
Epoch 1/120 | Avg Loss: 0.2640 | LR: 0.000100
 ⭐ New Best: 0.2640
Epoch 2/120 | Avg Loss: 0.1497 | LR: 0.000100
 ⭐ New Best: 0.1497
Epoch 3/120 | Avg Loss: 0.1091 | LR: 0.000100
 ⭐ New Best: 0.1091
Epoch 4/120 | Avg Loss: 0.1038 | LR: 0.000100
 ⭐ New Best: 0.1038
Epoch 5/120 | Avg Loss: 0.1022 | LR: 0.000100
 ⭐ New Best: 0.1022
Epoch 6/120 | Avg Loss: 0.0912 | LR: 0.000100
 ⭐ New Best: 0.0912
Epoch 7/120 | Avg Loss: 0.0747 | LR: 0.000100
 ⭐ New Best: 0.0747
Epoch 8/120 | Avg Loss: 0.0729 | LR: 0.000100
 ⭐ New Best: 0.0729
Epoch 9/120 | Avg Loss: 0.0822 | LR: 0.000100
Epoch 10/120 | Avg Loss: 0.0824 | LR: 0.000100
Epoch 11/120 | Avg Loss: 0.0705 | LR: 0.000100
 ⭐ New Best: 0.0705
Epoch 12/120 | Avg Loss: 0.0770 | LR: 0.000100
Epoch 13/120 | Avg Loss: 0.0667 | LR: 0.000100
 ⭐
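The smoothed variants run `sg.medfilt(pitch, kernel_size=5)` on the full pitch track before cropping. The pure-Python version below uses the same zero padding at the edges that the SciPy documentation describes. It shows both sides of the trade-off. A one-frame tracking spike is removed, but a voiced run shorter than three frames is erased as well. The example values are hypothetical.

```python
def medfilt_zero_pad(x, kernel_size=5):
    """Median filter with zero padding at both ends (odd kernel sizes only)."""
    if kernel_size % 2 != 1:
        raise ValueError("kernel_size must be odd")
    half = kernel_size // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    out = []
    for i in range(len(x)):
        window = sorted(padded[i:i + kernel_size])
        out.append(window[half])  # middle element of the sorted window
    return out


spike = [5.0, 5.0, 9.0, 5.0, 5.0]            # one-frame pitch-tracking glitch
short_note = [0.0, 0.0, 6.0, 6.0, 0.0, 0.0]  # 2-frame voiced blip between silences
print(medfilt_zero_pad(spike))       # glitch replaced by its neighbours
print(medfilt_zero_pad(short_note))  # the short note disappears entirely
```

This is one concrete mechanism behind the observation that smoothing removes fine detail. With k=5, any feature narrower than three frames is flattened.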

EVAL DEEP_CNN+SMOOTHING

It combines the Global Smoothing from V4 (to clean up the input) with the Deeper Architecture from V3.6 (to learn complex patterns).

      Layer,Input Channels,Output Channels,Intended role (hypothesis)
      1. Conv1d,1 (Pitch),32,Slopes: Is the pitch going up or down?
      2. Conv1d,32,64,Curves: Simple vibrato or note transitions.
      3. Conv1d,64,128,"Motifs: Short musical ideas (e.g., a specific riff)."
      4. Conv1d,128,256,"Longer contours: each position sees about 18 input frames, so phrase-level structure is only summarized by the global average pool."
      FC Layer,256,256,Feature Mixing: Combining these patterns.
      Output,256,128,Fingerprint: The final embedding.
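The channel counts in the table set the model size. The sketch below counts trainable parameters from the standard PyTorch shapes. Conv1d and Linear layers each have a weight and a bias, BatchNorm1d has an affine weight and bias, and pooling and ReLU have none. The layer labels are illustrative. Run against the real `PitchSiameseNet`, the total should agree with `sum(p.numel() for p in model.parameters())`.

```python
def conv1d_params(c_in, c_out, k):
    return c_in * c_out * k + c_out  # weight (c_out, c_in, k) + bias


def batchnorm1d_params(c):
    return 2 * c  # affine weight + bias; running mean/var are buffers, not parameters


def linear_params(d_in, d_out):
    return d_in * d_out + d_out  # weight (d_out, d_in) + bias


layer_params = {
    "Layer 1 (Conv 1->32, k=5) + BN": conv1d_params(1, 32, 5) + batchnorm1d_params(32),
    "Layer 2 (Conv 32->64, k=5) + BN": conv1d_params(32, 64, 5) + batchnorm1d_params(64),
    "Layer 3 (Conv 64->128, k=3) + BN": conv1d_params(64, 128, 3) + batchnorm1d_params(128),
    "Layer 4 (Conv 128->256, k=3) + BN": conv1d_params(128, 256, 3) + batchnorm1d_params(256),
    "FC (256->256)": linear_params(256, 256),
    "Output (256->128)": linear_params(256, 128),
}
total = sum(layer_params.values())
for name, n in layer_params.items():
    print(f"{name:36s} {n:>7,d}  ({n / total:.1%})")
print(f"{'Total':36s} {total:>7,d}")
```

The network has about 233k parameters. The added 128->256 conv holds roughly 42% of them, so the "deeper" change also makes the model substantially larger.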

In [None]:
# ==============================================================================
# EVAL V3.6/V4: DEEP CNN + SMOOTHING + GEOMETRIC SCORING (HARD AUG FIXED)
# ==============================================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import scipy.signal as sg
import os
import random
from collections import defaultdict
from tqdm import tqdm

# ======================================================
# CONFIGURATION
# ======================================================
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# Data Paths
VAL_DIR = "/content/eval"  # 400 unseen files
# Pointing to the Deep CNN model checkpoint (V3.6/V4)
MODEL_PATH = "/content/pitch_modelV1_deepCNNplussmoothing/best.pth"

# Params
WIN_LEN = 300
HOP_LEN = 150
TOLERANCE = 1.0    # Time bucket tolerance
TOP_K_MATCHES = 20
NUM_TRIALS = 400   # Full coverage
SEGMENT_LEN = 1500 # 15 seconds

# ======================================================
# 1. MODEL ARCHITECTURE (DEEP CNN V3.6)
# ======================================================
class PitchSiameseNet(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()

        self.cnn = nn.Sequential(
            # Layer 1
            nn.Conv1d(1, 32, kernel_size=5, padding=2),
            nn.BatchNorm1d(32), nn.ReLU(),

            # Layer 2
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.MaxPool1d(2),

            # Layer 3
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128), nn.ReLU(),

            # Layer 4 (The "Deep" Part)
            nn.Conv1d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm1d(256), nn.ReLU(),

            nn.AdaptiveAvgPool1d(1)
        )

        self.fc = nn.Sequential(
            nn.Linear(256, 256), # Input matches CNN output (256)
            nn.ReLU(),
            nn.Linear(256, embed_dim)
        )

    def forward_one(self, x):
        x = self.cnn(x).squeeze(-1)
        x = self.fc(x)
        return F.normalize(x, p=2, dim=1)

print(f"⏳ Loading Deep CNN Model from {MODEL_PATH}...")
model = PitchSiameseNet(embed_dim=128).to(DEVICE)
try:
    checkpoint = torch.load(MODEL_PATH, map_location=DEVICE)
    if "model" in checkpoint:
        model.load_state_dict(checkpoint["model"])
    else:
        model.load_state_dict(checkpoint)
    print("✅ Model loaded successfully.")
except Exception as e:
    print(f"❌ Error loading model: {e}")
    raise  # re-raise so the cell stops with the full traceback instead of calling exit()
model.eval()

# ======================================================
# 2. SMOOTHING (MATCHES TRAINING)
# ======================================================
def smooth_pitch(pitch):
    return sg.medfilt(pitch, kernel_size=5).astype(np.float32)

# ======================================================
# 3. AUGMENTATION (FIXED)
# ======================================================
def humify_soft(arr):
    arr = arr.copy()
    arr += np.random.normal(0, 0.02, size=len(arr))
    return arr.astype(np.float32)

def humify_hard(arr):
    """Hard Hum Augmentation (NOW INCLUDES TIME WARPING)"""
    arr = arr.copy()

    # 1. Jitter
    arr += np.random.normal(0, 0.06, size=len(arr))

    # 2. Key Shift
    semitones = np.random.uniform(-3, 3)
    arr[arr > 0] += semitones * 0.057

    # 3. TIME WARP (FIXED: Added back logic to match Training V4)
    target_len = SEGMENT_LEN # 1500
    if random.random() < 0.8: # 80% chance of warping
        rate = np.random.uniform(0.85, 1.15)
        old_idx = np.arange(len(arr))
        new_len = int(len(arr) * rate)
        new_idx = np.linspace(0, len(arr)-1, new_len)
        arr = np.interp(new_idx, old_idx, arr)

        # Force back to target length (Cropping or Padding)
        if len(arr) < target_len:
            arr = np.pad(arr, (0, target_len - len(arr)), mode='constant')
        else:
            start = (len(arr) - target_len) // 2
            arr = arr[start:start+target_len]

    return arr.astype(np.float32)

# ======================================================
# 4. EMBEDDING + DB BUILDER
# ======================================================
def process_sequence_to_embeddings(arr):
    """
    Returns embeddings and time offsets.
    NOTE: Processes in mini-batches to prevent CUDA OutOfMemoryError.
    """
    # 1. APPLY GLOBAL SMOOTHING FIRST
    arr = smooth_pitch(arr)

    windows = []
    offsets = []

    i = 0
    while i + WIN_LEN <= len(arr):
        crop = arr[i : i + WIN_LEN]
        if np.mean(crop > 0) < 0.1: # Skip silence
            i += HOP_LEN
            continue
        windows.append(crop)
        offsets.append(i / 100.0)
        i += HOP_LEN

    if not windows:
        return None, None

    # Stack all windows into one large tensor
    windows_np = np.stack(windows)
    windows_tensor = torch.from_numpy(windows_np).float().unsqueeze(1).to(DEVICE)

    # --- BATCH PROCESSING FIX (prevents OOM) ---
    batch_size = 64  # Process 64 windows at a time
    embeddings_list = []

    with torch.no_grad():
        for k in range(0, len(windows_tensor), batch_size):
            # Slice the batch
            batch = windows_tensor[k : k + batch_size]
            # Forward pass
            emb_batch = model.forward_one(batch)
            # Store result
            embeddings_list.append(emb_batch)

    # Concatenate all batch results back into one tensor
    embeddings = torch.cat(embeddings_list, dim=0)

    return embeddings, offsets

def build_flat_database():
    files = sorted([f for f in os.listdir(VAL_DIR) if f.endswith(".npy")])

    all_embeds_list = []
    metadata = []

    print(f"🏗️ Building Geometric DB from {len(files)} songs in {VAL_DIR}...")

    for f_name in tqdm(files):
        path = os.path.join(VAL_DIR, f_name)
        arr = np.load(path)

        # Process full song (Smoothing happens inside function)
        embeds, offsets = process_sequence_to_embeddings(arr)
        if embeds is None: continue

        all_embeds_list.append(embeds)
        song_id = f_name.replace(".npy", "")

        for t in offsets:
            metadata.append((song_id, t))

    full_db_tensor = torch.cat(all_embeds_list, dim=0)
    print(f"✅ DB Built: {full_db_tensor.shape[0]} windows across {len(files)} songs.")
    return full_db_tensor, metadata

# ======================================================
# 5. GEOMETRIC SCORING
# ======================================================
def query_geometric(query_embeds, query_offsets, db_tensor, db_metadata):
    # 1. Distance Matrix
    dists = torch.cdist(query_embeds, db_tensor, p=2)

    # 2. Top-K
    top_vals, top_inds = torch.topk(dists, k=TOP_K_MATCHES, dim=1, largest=False)
    top_vals = top_vals.cpu().numpy()
    top_inds = top_inds.cpu().numpy()

    vote_buckets = defaultdict(float)
    epsilon = 1e-4

    for q_idx, q_time in enumerate(query_offsets):
        for k in range(TOP_K_MATCHES):
            match_idx = top_inds[q_idx, k]
            dist = top_vals[q_idx, k]

            match_song, match_time = db_metadata[match_idx]

            # 3. Geometric Alignment (Projected Start)
            projected_start = match_time - q_time
            bucket = int(round(projected_start / TOLERANCE))

            # Score
            score = 1.0 / (dist + epsilon)
            vote_buckets[(match_song, bucket)] += score

    # 4. Max Score per Song
    song_final_scores = defaultdict(float)
    for (song, bucket), score in vote_buckets.items():
        if score > song_final_scores[song]:
            song_final_scores[song] = score

    ranked_songs = sorted(song_final_scores.items(), key=lambda x: x[1], reverse=True)
    return [x[0] for x in ranked_songs]

# ======================================================
# 6. RUN EVAL
# ======================================================
if __name__ == "__main__":
    db_tensor, db_metadata = build_flat_database()
    song_list = list(set([m[0] for m in db_metadata]))

    results = {
        "Soft": {"top1": 0, "top5": 0, "top10": 0},
        "Hard": {"top1": 0, "top5": 0, "top10": 0}
    }

    print(f"\n🚀 Running {NUM_TRIALS} Trials with Geometric Scoring...")

    effective_trials = {"Soft": 0, "Hard": 0}

    for _ in tqdm(range(NUM_TRIALS)):
        target_song = random.choice(song_list)
        full_arr = np.load(os.path.join(VAL_DIR, f"{target_song}.npy"))

        if len(full_arr) < SEGMENT_LEN: continue

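        # randint's upper bound is exclusive: +1 keeps the last valid start and
        # avoids a ValueError when a song is exactly SEGMENT_LEN frames long.
        start_idx = np.random.randint(0, len(full_arr) - SEGMENT_LEN + 1)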
        clean_clip = full_arr[start_idx : start_idx + SEGMENT_LEN]

        # --- Test Soft ---
        soft_hum_clip = humify_soft(clean_clip)
        q_emb, q_off = process_sequence_to_embeddings(soft_hum_clip)

        if q_emb is not None:
            ranked = query_geometric(q_emb, q_off, db_tensor, db_metadata)
            effective_trials["Soft"] += 1
            if ranked:
                if ranked[0] == target_song: results["Soft"]["top1"] += 1
                if target_song in ranked[:5]: results["Soft"]["top5"] += 1
                if target_song in ranked[:10]: results["Soft"]["top10"] += 1

        # --- Test Hard ---
        hard_hum_clip = humify_hard(clean_clip)
        q_emb, q_off = process_sequence_to_embeddings(hard_hum_clip)

        if q_emb is not None:
            ranked = query_geometric(q_emb, q_off, db_tensor, db_metadata)
            effective_trials["Hard"] += 1
            if ranked:
                if ranked[0] == target_song: results["Hard"]["top1"] += 1
                if target_song in ranked[:5]: results["Hard"]["top5"] += 1
                if target_song in ranked[:10]: results["Hard"]["top10"] += 1

    # ======================================================
    # FINAL REPORT
    # ======================================================
    def calc_acc(res, key, total):
        return res[key] / total if total > 0 else 0

    print("\n" + "="*50)
    print("📊 V3.6 DEEP CNN RESULTS (400 Files) - HARD AUG FIXED")
    print("="*50)
    print(f"Total Effective Soft Trials: {effective_trials['Soft']}")
    print(f"Total Effective Hard Trials: {effective_trials['Hard']}")
    print("-" * 50)

    print(f"🎤 Soft Hum:")
    print(f"   Top-1:  {calc_acc(results['Soft'], 'top1', effective_trials['Soft']):.1%}")
    print(f"   Top-5:  {calc_acc(results['Soft'], 'top5', effective_trials['Soft']):.1%}")
    print(f"   Top-10: {calc_acc(results['Soft'], 'top10', effective_trials['Soft']):.1%}")

    print(f"\n🔥 Hard Hum:")
    print(f"   Top-1:  {calc_acc(results['Hard'], 'top1', effective_trials['Hard']):.1%}")
    print(f"   Top-5:  {calc_acc(results['Hard'], 'top5', effective_trials['Hard']):.1%}")
    print(f"   Top-10: {calc_acc(results['Hard'], 'top10', effective_trials['Hard']):.1%}")
    print("="*50)

⏳ Loading Deep CNN Model from /content/pitch_modelV1_deepCNNplussmoothing/best.pth...
✅ Model loaded successfully.
🏗️ Building Geometric DB from 400 songs in /content/eval...


100%|██████████| 400/400 [00:02<00:00, 184.26it/s]


✅ DB Built: 63580 windows across 400 songs.

🚀 Running 400 Trials with Geometric Scoring...


100%|██████████| 400/400 [00:02<00:00, 143.50it/s]


📊 V3.6 DEEP CNN RESULTS (400 Files) - HARD AUG FIXED
Total Effective Soft Trials: 400
Total Effective Hard Trials: 400
--------------------------------------------------
🎤 Soft Hum:
   Top-1:  58.0%
   Top-5:  62.3%
   Top-10: 64.0%

🔥 Hard Hum:
   Top-1:  37.5%
   Top-5:  49.2%
   Top-10: 54.5%

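The geometric scoring in `query_geometric` can be illustrated without a model. The sketch below replaces embedding distances with hand-picked toy matches (hypothetical data; `rank_songs` is a hypothetical helper) but keeps the same rule: every match votes into a (song, projected start) bucket with weight 1/(dist + eps), and a song's score is its best single bucket. Matches whose time offsets agree pile into one bucket, so a song with consistently aligned matches can beat a song with individually closer but scattered ones.

```python
from collections import defaultdict

TOLERANCE = 1.0   # seconds per bucket, as in the eval cell
EPSILON = 1e-4

# Hypothetical top-k matches: (query_offset_s, matched_song, matched_time_s, distance)
matches = [
    # song_A: three windows all imply the clip started 40 s into the song
    (0.0, "song_A", 40.0, 0.30),
    (1.5, "song_A", 41.5, 0.32),
    (3.0, "song_A", 43.0, 0.31),
    # song_B: closer individual matches, but at inconsistent positions
    (0.0, "song_B", 10.0, 0.20),
    (1.5, "song_B", 70.0, 0.20),
    (3.0, "song_B", 25.0, 0.20),
]

def rank_songs(matches, tolerance=TOLERANCE, eps=EPSILON):
    """Vote into (song, start-bucket) cells, then keep each song's best cell."""
    buckets = defaultdict(float)
    for q_time, song, match_time, dist in matches:
        projected_start = match_time - q_time          # where the clip would begin
        bucket = int(round(projected_start / tolerance))
        buckets[(song, bucket)] += 1.0 / (dist + eps)
    best = defaultdict(float)
    for (song, _), score in buckets.items():
        best[song] = max(best[song], score)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)

ranking = rank_songs(matches)
print(ranking)
```

Because a song's score is its best single bucket rather than the sum over all buckets, scattered near-misses do not add up. A larger `TOLERANCE` would absorb more tempo drift (such as that introduced by the time warp in `humify_hard`) at the cost of merging unrelated alignments.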
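OLD+LSTM+smoothing

Bidirectional: This is the key change. The BiLSTM reads each 3-second window of melody forwards (start to end) AND backwards (end to start). The pipeline is:

      Smoothing: A median filter (kernel size 5) removes single-frame pitch-tracking jitter.

      CNN: Extracts local melodic features and downsamples the 300-frame window to 75 time steps.

      LSTM: Models how those 75 feature steps relate to one another across the window.

Why? In music, context matters. Intuitively:

      Forward: "I heard a C, then an E, so the next note is probably a G."

      Backward: "I see a G at the end, which suggests the C at the start was part of a C-major arpeggio (C-E-G)."

      Memory: LSTMs carry a "memory cell" state along the sequence. Each CNN output step only sees about 20 input frames (~0.2 s), whereas the LSTM can, in principle, carry information from earlier in the window (for example, a key change 2 seconds earlier) into its interpretation of the current note.

In short, the BiLSTM attaches musical context (phrasing and key) to each time step based on the melody's structure, before the steps are averaged into a single embedding.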
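The forward/backward intuition can be checked directly in PyTorch. With `nn.LSTM(..., batch_first=True, bidirectional=True)`, each output step concatenates the forward hidden state (first `hidden_size` features) and the backward one (last `hidden_size` features). The sketch below uses random weights, a random 8-step input, and a single layer (all hypothetical, not the trained model): changing only the last input step leaves the forward half at step 0 untouched, while the backward half at step 0, which has read the sequence from the end, changes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One bidirectional layer with PitchCRNN's sizes (256 in, 128 hidden per direction).
lstm = nn.LSTM(input_size=256, hidden_size=128, num_layers=1,
               batch_first=True, bidirectional=True)

x = torch.randn(1, 8, 256)      # hypothetical CNN features: (batch, time, features)
x_late = x.clone()
x_late[:, -1, :] += 1.0         # change only the LAST time step

with torch.no_grad():
    out, _ = lstm(x)
    out_late, _ = lstm(x_late)

print(out.shape)                # (batch, time, 2 * hidden_size)

# Forward half at t=0 has only seen x[:, 0], so it should not move.
forward_same = torch.allclose(out[:, 0, :128], out_late[:, 0, :128])
# Backward half at t=0 has read the whole window from the end, so it should move.
backward_moved = not torch.allclose(out[:, 0, 128:], out_late[:, 0, 128:])
print(forward_same, backward_moved)
```

With `num_layers=2`, as in PitchCRNN, the second layer's forward direction receives the first layer's backward output, so even its "forward" features at early steps carry information from later in the window.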

In [None]:
# ==============================================================================
# V5: CRNN (DEEP CNN + 2-LAYER BiLSTM) + GLOBAL SMOOTHING + ADAPTIVE LR
# ==============================================================================
import os
import random
import glob
import numpy as np
import scipy.signal as sg
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torch.optim.lr_scheduler import ReduceLROnPlateau
from tqdm import tqdm

# -------------------------
# Hyperparams
# -------------------------
TARGET_LEN = 300   # 3 seconds
BATCH_SIZE = 32
EPOCHS = 120       # <--- Updated to 120
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# -------------------------
# PATHS
# -------------------------
PITCH_DIR = "/content/data_unique"
CKPT_DIR = "/content/pitch_modelV1_lstmplussmoothing"
os.makedirs(CKPT_DIR, exist_ok=True)

BEST = f"{CKPT_DIR}/best.pth"
LAST = f"{CKPT_DIR}/last.pth"

# -------------------------
# 1. GLOBAL SMOOTHING HELPER
# -------------------------
def smooth_pitch(pitch):
    """
    Applies Median Filter (k=5) to remove jagged tracking errors.
    Crucial for stabilizing the LSTM input.
    """
    return sg.medfilt(pitch, kernel_size=5).astype(np.float32)

# -------------------------
# 2. AUGMENTATION
# -------------------------
def augment_hum(pitch):
    pitch = pitch.copy().astype(np.float32)

    # 1. Noise
    pitch += np.random.normal(0, 0.06, size=len(pitch))

    # 2. Key Shift
    semitones = np.random.uniform(-5, 5)
    pitch[pitch > 0] += semitones * 0.057

    # 3. Time Warp
    if random.random() < 0.7:
        rate = np.random.uniform(0.8, 1.25)
        old_idx = np.arange(len(pitch))
        new_idx = np.linspace(0, len(pitch)-1, max(2, int(len(pitch)*rate)))
        pitch = np.interp(new_idx, old_idx, pitch)

    # 4. Breath Noise (Simulating air in the mic)
    pitch += np.random.normal(0, 0.04, size=len(pitch))

    return pitch.astype(np.float32)

# -------------------------
# Helper: Pad/Crop
# -------------------------
def force_length(arr, target_len=TARGET_LEN):
    if arr is None or len(arr) == 0:
        return np.zeros(target_len, dtype=np.float32)
    if len(arr) < target_len:
        pad_amt = target_len - len(arr)
        return np.pad(arr, (0, pad_amt), mode='constant')
    elif len(arr) > target_len:
        start = random.randint(0, len(arr) - target_len)
        return arr[start:start + target_len]
    return arr

# -------------------------
# 3. DATASET
# -------------------------
class PitchDatasetV5(Dataset):
    def __init__(self, pitch_dir, target_len=TARGET_LEN):
        self.files = sorted(glob.glob(os.path.join(pitch_dir, "*.npy")))
        self.target_len = target_len
        print(f"✅ Loaded {len(self.files)} files")

    def _random_crop(self, arr):
        if len(arr) <= self.target_len:
            return arr
        start = random.randint(0, len(arr) - self.target_len)
        return arr[start:start + self.target_len]

    def __getitem__(self, idx):
        # Load Raw Data
        anchor_path = self.files[idx]
        anchor_full = np.load(anchor_path)

        neg_idx = random.randint(0, len(self.files) - 1)
        while neg_idx == idx:
            neg_idx = random.randint(0, len(self.files) - 1)
        neg_full = np.load(self.files[neg_idx])

        # --- APPLY GLOBAL SMOOTHING ---
        # This is the V4/V5 upgrade: Clean inputs before cropping
        anchor_full = smooth_pitch(anchor_full)
        neg_full = smooth_pitch(neg_full)
        # ------------------------------

        # 1. CROP
        anchor_clean = self._random_crop(anchor_full)
        neg_clean = self._random_crop(neg_full)

        # 2. AUGMENT
        # Positive is the HUMMED version of the anchor
        positive_hum = augment_hum(anchor_clean)
        # Negative is the HUMMED version of the negative (Harder Negative)
        negative_hum = augment_hum(neg_clean)

        # 3. PAD/TRUNCATE
        a_out = force_length(anchor_clean, self.target_len)
        ph_out = force_length(positive_hum, self.target_len)
        n_out = force_length(negative_hum, self.target_len)

        # Return only the 3 necessary tensors for Triplet Loss
        return (
            torch.from_numpy(a_out).unsqueeze(0).float(),
            torch.from_numpy(ph_out).unsqueeze(0).float(),
            torch.from_numpy(n_out).unsqueeze(0).float(),
        )

    def __len__(self):
        return len(self.files)

# -------------------------
# 4. MODEL: CRNN (CNN + BiLSTM)
# -------------------------
class PitchCRNN(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()

        # A. CNN Feature Extractor
        # Reduces length 300 -> 75, Increases features 1 -> 256
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.MaxPool1d(2), # 300 -> 150

            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128), nn.ReLU(),
            nn.MaxPool1d(2), # 150 -> 75

            nn.Conv1d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm1d(256), nn.ReLU(),
            # No pooling here, preserving sequence length (75) for LSTM
        )

        # B. Deep Bidirectional LSTM
        # Input: 256 features (from CNN)
        # Hidden: 128 features per direction = 256 total output
        self.lstm = nn.LSTM(
            input_size=256,
            hidden_size=128,
            num_layers=2,        # Deep LSTM (2 stacked layers)
            batch_first=True,
            bidirectional=True
        )

        # C. Projection Head
        self.fc = nn.Sequential(
            nn.Linear(256, 256), # 256 matches BiLSTM output (128*2)
            nn.ReLU(),
            nn.Linear(256, embed_dim)
        )

    def forward_one(self, x):
        # x shape: (Batch, 1, 300)

        # 1. CNN Forward
        x = self.cnn(x)  # Output: (Batch, 256, 75)

        # 2. Prepare for LSTM
        # LSTM expects (Batch, Sequence, Features)
        x = x.permute(0, 2, 1) # Output: (Batch, 75, 256)

        # 3. LSTM Forward
        # out shape: (Batch, 75, 256)
        self.lstm.flatten_parameters() # Optimize memory for CUDA
        out, _ = self.lstm(x)

        # 4. Global Average Pooling (Over time dimension)
        # We average the 75 time steps to get one vector per 3-second window
        out = torch.mean(out, dim=1) # Output: (Batch, 256)

        # 5. Projection
        out = self.fc(out) # Output: (Batch, 128)

        return F.normalize(out, p=2, dim=1)

# -------------------------
# 5. TRAINING LOOP
# -------------------------
def train_crnn_v5():
    print(f"🚀 Training V5: CRNN (CNN + BiLSTM) on: {DEVICE}")
    print(f"📂 Data: {PITCH_DIR}")
    print(f"🔧 Margin: 0.85 | Smoothing: ON | Epochs: {EPOCHS}")

    dataset = PitchDatasetV5(PITCH_DIR, target_len=TARGET_LEN)
    # num_workers=0 avoids multiprocessing errors in notebooks
    loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=0, pin_memory=True)

    model = PitchCRNN().to(DEVICE)
    optim = torch.optim.Adam(model.parameters(), lr=0.0001)

    # ADAPTIVE LR SCHEDULER
    scheduler = ReduceLROnPlateau(optim, mode='min', factor=0.5, patience=8)

    loss_fn = nn.TripletMarginLoss(margin=0.85, p=2)

    best_loss = float('inf')
    print(f"Starting LR: 0.0001")

    for epoch in range(EPOCHS):
        model.train()
        total_loss = 0.0

        # Using simple iterator to keep log clean
        # pbar = tqdm(loader, desc=f"Epoch {epoch+1}/{EPOCHS}")

        for anchor, pos_hum, neg_hum in loader:
            anchor = anchor.to(DEVICE)
            pos_hum = pos_hum.to(DEVICE)
            neg_hum = neg_hum.to(DEVICE)

            optim.zero_grad()

            a = model.forward_one(anchor)
            ph = model.forward_one(pos_hum)
            n = model.forward_one(neg_hum)

            # Single robust loss
            loss = loss_fn(a, ph, n)

            loss.backward()

            # Gradient Clipping (important for LSTM stability)
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

            optim.step()

            total_loss += loss.item()

        avg_loss = total_loss / len(loader)
        current_lr = optim.param_groups[0]['lr']
        print(f"Epoch {epoch+1}/{EPOCHS} | Avg Loss: {avg_loss:.4f} | LR: {current_lr:.6f}")

        # Step Scheduler
        scheduler.step(avg_loss)

        # NOTE: "best" is chosen on the average *training* loss (no validation split).
        if avg_loss < best_loss:
            best_loss = avg_loss
            torch.save(model.state_dict(), BEST)
            print(f" ⭐ New Best: {best_loss:.4f}")

        torch.save({
            "epoch": epoch + 1,
            "model": model.state_dict(),
            "optimizer": optim.state_dict(),
            "best_loss": best_loss,
        }, LAST)

    print("✅ Training Complete.")
    print(f"Best Model Saved to: {BEST}")

if __name__ == "__main__":
    train_crnn_v5()

🚀 Training V5: CRNN (CNN + BiLSTM) on: cuda
📂 Data: /content/data_unique
🔧 Margin: 0.85 | Smoothing: ON | Epochs: 120
✅ Loaded 4051 files
Starting LR: 0.0001
Epoch 1/120 | Avg Loss: 0.2638 | LR: 0.000100
 ⭐ New Best: 0.2638
Epoch 2/120 | Avg Loss: 0.1283 | LR: 0.000100
 ⭐ New Best: 0.1283
Epoch 3/120 | Avg Loss: 0.1201 | LR: 0.000100
 ⭐ New Best: 0.1201
Epoch 4/120 | Avg Loss: 0.0915 | LR: 0.000100
 ⭐ New Best: 0.0915
Epoch 5/120 | Avg Loss: 0.0789 | LR: 0.000100
 ⭐ New Best: 0.0789
Epoch 6/120 | Avg Loss: 0.0759 | LR: 0.000100
 ⭐ New Best: 0.0759
Epoch 7/120 | Avg Loss: 0.0713 | LR: 0.000100
 ⭐ New Best: 0.0713
Epoch 8/120 | Avg Loss: 0.0641 | LR: 0.000100
 ⭐ New Best: 0.0641
Epoch 9/120 | Avg Loss: 0.0599 | LR: 0.000100
 ⭐ New Best: 0.0599
Epoch 10/120 | Avg Loss: 0.0546 | LR: 0.000100
 ⭐ New Best: 0.0546
Epoch 11/120 | Avg Loss: 0.0622 | LR: 0.000100
Epoch 12/120 | Avg Loss: 0.0755 | LR: 0.000100
Epoch 13/120 | Avg Loss: 0.0600 | LR: 0.000100
Epoch 14/

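The training cell uses `nn.TripletMarginLoss(margin=0.85, p=2)`: per triplet, max(d(a, p) - d(a, n) + margin, 0) with Euclidean distance, averaged over the batch. Because `forward_one` L2-normalizes the embeddings, distances lie in [0, 2], so a 0.85 margin is a sizeable share of the available range. The NumPy sketch below re-implements that formula on toy 2-D unit vectors (hypothetical values; `triplet_margin_loss` is a hypothetical helper that omits the small `eps` PyTorch adds inside its distance) to show when a triplet stops contributing.

```python
import numpy as np

def l2_normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def triplet_margin_loss(a, p, n, margin=0.85):
    """Mean of max(||a - p|| - ||a - n|| + margin, 0) over the batch."""
    d_ap = np.linalg.norm(a - p, axis=-1)
    d_an = np.linalg.norm(a - n, axis=-1)
    return np.maximum(d_ap - d_an + margin, 0.0).mean()

# Toy 2-D unit embeddings (hypothetical): anchor on the x-axis.
anchor = l2_normalize(np.array([[1.0, 0.0]]))
positive = l2_normalize(np.array([[1.0, 0.2]]))   # close to the anchor
easy_neg = l2_normalize(np.array([[-1.0, 0.0]]))  # opposite side: distance 2.0
hard_neg = l2_normalize(np.array([[1.0, 0.6]]))   # almost as close as the positive

print(triplet_margin_loss(anchor, positive, easy_neg))  # margin satisfied -> 0.0
print(triplet_margin_loss(anchor, positive, hard_neg))  # margin violated -> positive
```

Read against the training log above, a small average loss means the augmented training triplets largely satisfy the margin. Since `best.pth` is selected on this training loss, it is not by itself a measure of retrieval accuracy on unseen songs.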
eval OLD+LSTM+smoothing

In [None]:
# ==============================================================================
# EVAL V5: CRNN (CNN+LSTM) + SMOOTHING + GEOMETRIC SCORING (HARD AUG FIXED)
# ==============================================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import scipy.signal as sg
import os
import random
from collections import defaultdict
from tqdm import tqdm

# ======================================================
# CONFIGURATION
# ======================================================
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# Data Paths
VAL_DIR = "/content/eval"  # 400 unseen files
# Pointing to the new CRNN model checkpoint
MODEL_PATH = "/content/pitch_modelV1_lstmplussmoothing/best.pth"

# Params
WIN_LEN = 300
HOP_LEN = 150
TOLERANCE = 1.0    # Time bucket tolerance
TOP_K_MATCHES = 20
NUM_TRIALS = 400   # Full coverage
SEGMENT_LEN = 1500 # 15 seconds

# ======================================================
# 1. MODEL (V5 Architecture)
# ======================================================
class PitchCRNN(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()

        # 1. CNN Feature Extractor
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.MaxPool1d(2), # 300 -> 150

            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128), nn.ReLU(),
            nn.MaxPool1d(2), # 150 -> 75

            nn.Conv1d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm1d(256), nn.ReLU(),
        )

        # 2. Deep Bidirectional LSTM
        self.lstm = nn.LSTM(
            input_size=256,
            hidden_size=128,
            num_layers=2,
            batch_first=True,
            bidirectional=True
        )

        # 3. Projection Head
        self.fc = nn.Sequential(
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, embed_dim)
        )

    def forward_one(self, x):
        # A. CNN Forward
        x = self.cnn(x)  # (Batch, 256, 75)

        # B. Prepare for LSTM (Permute to Batch, Seq, Feat)
        x = x.permute(0, 2, 1) # (Batch, 75, 256)

        # C. LSTM Forward
        self.lstm.flatten_parameters()
        out, _ = self.lstm(x)

        # D. Global Average Pooling (Over time dimension)
        out = torch.mean(out, dim=1) # (Batch, 256)

        # E. Projection
        out = self.fc(out) # (Batch, 128)

        return F.normalize(out, p=2, dim=1)

print(f"⏳ Loading V5 CRNN Model from {MODEL_PATH}...")
model = PitchCRNN(embed_dim=128).to(DEVICE)
try:
    checkpoint = torch.load(MODEL_PATH, map_location=DEVICE)
    if "model" in checkpoint:
        model.load_state_dict(checkpoint["model"])
    else:
        model.load_state_dict(checkpoint)
    print("✅ Model loaded successfully.")
except Exception as e:
    print(f"❌ Error loading model: {e}")
    raise  # re-raise so the cell stops with the full traceback instead of calling exit()
model.eval()

# ======================================================
# 2. SMOOTHING (MATCHES TRAINING)
# ======================================================
def smooth_pitch(pitch):
    return sg.medfilt(pitch, kernel_size=5).astype(np.float32)

# ======================================================
# 3. AUGMENTATION (FIXED)
# ======================================================
def humify_soft(arr):
    arr = arr.copy()
    arr += np.random.normal(0, 0.02, size=len(arr))
    return arr.astype(np.float32)

def humify_hard(arr):
    """Hard Hum Augmentation (NOW INCLUDES TIME WARPING)"""
    # NOTE: This function assumes the input 'arr' is already the clean_base clip (1500 frames)
    arr = arr.copy()

    # 1. Jitter
    arr += np.random.normal(0, 0.06, size=len(arr))

    # 2. Key Shift
    semitones = np.random.uniform(-3, 3)
    arr[arr > 0] += semitones * 0.057

    # 3. TIME WARP (FIXED: Added back the crucial tempo distortion)
    target_len = SEGMENT_LEN # 1500
    if random.random() < 0.8: # 80% chance of warping
        rate = np.random.uniform(0.85, 1.15)
        old_idx = np.arange(len(arr))
        new_len = int(len(arr) * rate)
        new_idx = np.linspace(0, len(arr)-1, new_len)
        arr = np.interp(new_idx, old_idx, arr)

        # Force back to target length (Cropping or Padding)
        if len(arr) < target_len:
            arr = np.pad(arr, (0, target_len - len(arr)), mode='constant')
        else:
            start = (len(arr) - target_len) // 2
            arr = arr[start:start+target_len]

    return arr.astype(np.float32)


# ======================================================
# 4. EMBEDDING + DB BUILDER
# ======================================================
def process_sequence_to_embeddings(arr):
    """
    Returns embeddings and time offsets.
    Processes windows in mini-batches to prevent OOM errors.
    """
    windows = []
    offsets = []

    i = 0
    while i + WIN_LEN <= len(arr):
        crop = arr[i : i + WIN_LEN]
        if np.mean(crop > 0) < 0.1: # Skip silence
            i += HOP_LEN
            continue
        windows.append(crop)
        offsets.append(i / 100.0)
        i += HOP_LEN

    if not windows:
        return None, None

    # Stack all windows
    windows_np = np.stack(windows)
    windows_tensor = torch.from_numpy(windows_np).float().unsqueeze(1).to(DEVICE)

    # --- BATCH PROCESSING FIX ---
    batch_size = 64  # Safe batch size for inference
    embeddings_list = []

    with torch.no_grad():
        for k in range(0, len(windows_tensor), batch_size):
            batch = windows_tensor[k : k + batch_size]
            emb_batch = model.forward_one(batch)
            embeddings_list.append(emb_batch)

    # Concatenate all batch results back into one tensor
    embeddings = torch.cat(embeddings_list, dim=0)

    return embeddings, offsets

def build_flat_database():
    files = sorted([f for f in os.listdir(VAL_DIR) if f.endswith(".npy")])

    all_embeds_list = []
    metadata = []

    print(f"🏗️ Building Geometric DB from {len(files)} songs in {VAL_DIR}...")

    for f_name in tqdm(files):
        path = os.path.join(VAL_DIR, f_name)
        arr = np.load(path)

        # V5 LOGIC: Smooth the DB tracks (Anchors)
        arr = smooth_pitch(arr)

        embeds, offsets = process_sequence_to_embeddings(arr)
        if embeds is None: continue

        all_embeds_list.append(embeds)
        song_id = f_name.replace(".npy", "")

        for t in offsets:
            metadata.append((song_id, t))

    full_db_tensor = torch.cat(all_embeds_list, dim=0)
    print(f"✅ DB Built: {full_db_tensor.shape[0]} windows across {len(files)} songs.")
    return full_db_tensor, metadata

# ======================================================
# 5. GEOMETRIC SCORING
# ======================================================
def query_geometric(query_embeds, query_offsets, db_tensor, db_metadata):
    # 1. Distance Matrix
    dists = torch.cdist(query_embeds, db_tensor, p=2)

    # 2. Top-K
    top_vals, top_inds = torch.topk(dists, k=TOP_K_MATCHES, dim=1, largest=False)
    top_vals = top_vals.cpu().numpy()
    top_inds = top_inds.cpu().numpy()

    vote_buckets = defaultdict(float)
    epsilon = 1e-4

    for q_idx, q_time in enumerate(query_offsets):
        for k in range(TOP_K_MATCHES):
            match_idx = top_inds[q_idx, k]
            dist = top_vals[q_idx, k]

            match_song, match_time = db_metadata[match_idx]

            # 3. Geometric Alignment
            projected_start = match_time - q_time
            bucket = int(round(projected_start / TOLERANCE))

            # Score
            score = 1.0 / (dist + epsilon)
            vote_buckets[(match_song, bucket)] += score

    # 4. Max Score per Song
    song_final_scores = defaultdict(float)
    for (song, bucket), score in vote_buckets.items():
        if score > song_final_scores[song]:
            song_final_scores[song] = score

    ranked_songs = sorted(song_final_scores.items(), key=lambda x: x[1], reverse=True)
    return [x[0] for x in ranked_songs]

# ======================================================
# 6. RUN EVAL
# ======================================================
if __name__ == "__main__":
    db_tensor, db_metadata = build_flat_database()
    song_list = list(set([m[0] for m in db_metadata]))

    results = {
        "Soft": {"top1": 0, "top5": 0, "top10": 0},
        "Hard": {"top1": 0, "top5": 0, "top10": 0}
    }

    print(f"\n🚀 Running {NUM_TRIALS} Trials with Geometric Scoring...")

    effective_trials = {"Soft": 0, "Hard": 0}

    for _ in tqdm(range(NUM_TRIALS)):
        target_song = random.choice(song_list)
        full_arr = np.load(os.path.join(VAL_DIR, f"{target_song}.npy"))

        if len(full_arr) < SEGMENT_LEN: continue

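        # randint's upper bound is exclusive: +1 keeps the last valid start and
        # avoids a ValueError when a song is exactly SEGMENT_LEN frames long.
        start_idx = np.random.randint(0, len(full_arr) - SEGMENT_LEN + 1)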
        raw_clip = full_arr[start_idx : start_idx + SEGMENT_LEN]

        # V5 LOGIC:
        # 1. Smooth the raw clip (Simulates the Clean Anchor base)
        clean_base = smooth_pitch(raw_clip)

        # 2. Augment the Smoothed Base (Matches Training: augment(smooth(anchor)))

        # --- Test Soft ---
        soft_hum_clip = humify_soft(clean_base)
        q_emb, q_off = process_sequence_to_embeddings(soft_hum_clip)

        if q_emb is not None:
            ranked = query_geometric(q_emb, q_off, db_tensor, db_metadata)
            effective_trials["Soft"] += 1
            if ranked:
                if ranked[0] == target_song: results["Soft"]["top1"] += 1
                if target_song in ranked[:5]: results["Soft"]["top5"] += 1
                if target_song in ranked[:10]: results["Soft"]["top10"] += 1

        # --- Test Hard ---
        hard_hum_clip = humify_hard(clean_base)
        q_emb, q_off = process_sequence_to_embeddings(hard_hum_clip)

        if q_emb is not None:
            ranked = query_geometric(q_emb, q_off, db_tensor, db_metadata)
            effective_trials["Hard"] += 1
            if ranked:
                if ranked[0] == target_song: results["Hard"]["top1"] += 1
                if target_song in ranked[:5]: results["Hard"]["top5"] += 1
                if target_song in ranked[:10]: results["Hard"]["top10"] += 1

    # ======================================================
    # FINAL REPORT
    # ======================================================
    def calc_acc(res, key, total):
        return res[key] / total if total > 0 else 0

    print("\n" + "="*50)
    print("📊 V5 CRNN (CNN+LSTM) RESULTS (400 Files)")
    print("="*50)
    print(f"Total Effective Soft Trials: {effective_trials['Soft']}")
    print(f"Total Effective Hard Trials: {effective_trials['Hard']}")
    print("-" * 50)

    print(f"🎤 Soft Hum:")
    print(f"   Top-1:  {calc_acc(results['Soft'], 'top1', effective_trials['Soft']):.1%}")
    print(f"   Top-5:  {calc_acc(results['Soft'], 'top5', effective_trials['Soft']):.1%}")
    print(f"   Top-10: {calc_acc(results['Soft'], 'top10', effective_trials['Soft']):.1%}")

    print(f"\n🔥 Hard Hum:")
    print(f"   Top-1:  {calc_acc(results['Hard'], 'top1', effective_trials['Hard']):.1%}")
    print(f"   Top-5:  {calc_acc(results['Hard'], 'top5', effective_trials['Hard']):.1%}")
    print(f"   Top-10: {calc_acc(results['Hard'], 'top10', effective_trials['Hard']):.1%}")
    print("="*50)

⏳ Loading V5 CRNN Model from /content/pitch_modelV1_lstmplussmoothing/best.pth...
✅ Model loaded successfully.
🏗️ Building Geometric DB from 400 songs in /content/eval...


100%|██████████| 400/400 [00:05<00:00, 69.59it/s]


✅ DB Built: 63580 windows across 400 songs.

🚀 Running 400 Trials with Geometric Scoring...


100%|██████████| 400/400 [00:05<00:00, 78.12it/s]


📊 V5 CRNN (CNN+LSTM) RESULTS (400 Files)
Total Effective Soft Trials: 400
Total Effective Hard Trials: 400
--------------------------------------------------
🎤 Soft Hum:
   Top-1:  54.8%
   Top-5:  60.0%
   Top-10: 60.8%

🔥 Hard Hum:
   Top-1:  54.8%
   Top-5:  60.2%
   Top-10: 61.8%
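
A note on what "400 trials" covers: the evaluation loops pick each target with `random.choice(song_list)` (as in the V5 no-smoothing eval loop), which samples with replacement. With 400 trials over a 400-song DB, only about 1 - (1 - 1/400)^400 ≈ 63% of the songs are expected to be tested, and some are tested more than once. The sketch below checks this numerically; `fraction_covered` and its seed are illustrative and not part of the pipeline.

```python
import math
import random

def fraction_covered(n_songs=400, n_trials=400, seed=0):
    """Fraction of distinct songs hit when drawing n_trials targets with replacement."""
    rng = random.Random(seed)
    songs = list(range(n_songs))
    drawn = {rng.choice(songs) for _ in range(n_trials)}
    return len(drawn) / n_songs

# Closed-form expected coverage; it approaches 1 - 1/e as the DB grows
expected = 1 - (1 - 1 / 400) ** 400

print(f"simulated: {fraction_covered():.3f}, expected: {expected:.3f}")
```

To test every song exactly once, one option is to iterate over a shuffled copy of `song_list` instead of calling `random.choice` on each trial.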

OLD+LSTM

LSTMs are good at modeling temporal patterns, including "temporal noise." The bet here is that the LSTM can learn on its own to ignore the characteristic jitter pattern of CREPE artifacts, and in doing so preserve subtle vibrato detail that the median filter may have smoothed away.
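
To see how a median filter can flatten vibrato, the sketch below filters a synthetic pitch contour (3 s at 100 frames/s, consistent with `TARGET_LEN = 300`) that carries a 5.5 Hz, half-semitone vibrato, and measures how much of the oscillation survives. The vibrato rate and depth, the natural-log pitch scale, and the two kernel sizes are assumptions for illustration; the kernel used by the smoothing variants is not shown in this section.

```python
import numpy as np
import scipy.signal as sg

FPS = 100  # frames per second, consistent with the i / 100.0 offsets and TARGET_LEN = 300 (3 s)

def vibrato_contour(seconds=3.0, base_hz=220.0, rate_hz=5.5, depth_semitones=0.5):
    """Synthetic pitch contour in natural-log Hz with sinusoidal vibrato (assumed scale)."""
    t = np.arange(int(seconds * FPS)) / FPS
    depth = depth_semitones * np.log(2) / 12  # semitones -> natural-log units
    return np.log(base_hz) + depth * np.sin(2 * np.pi * rate_hz * t)

def vibrato_retained(contour, kernel_size, trim=50):
    """Ratio of vibrato std after vs. before median filtering (interior frames only)."""
    filtered = sg.medfilt(contour, kernel_size=kernel_size)
    # medfilt zero-pads the edges, so skip `trim` frames at each end
    before = contour[trim:-trim] - contour[trim:-trim].mean()
    after = filtered[trim:-trim] - filtered[trim:-trim].mean()
    return after.std() / before.std()

contour = vibrato_contour()
for k in (5, 31):  # hypothetical kernel sizes (odd, as medfilt requires)
    print(f"kernel {k}: {vibrato_retained(contour, k):.2f} of the vibrato retained")
```

A kernel shorter than half the vibrato period (about 18 frames here) passes the oscillation almost unchanged, while one spanning more than a full period largely removes it.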

In [None]:
# ==============================================================================
# V5: CRNN (CNN + BiLSTM) | NO SMOOTHING | ADAPTIVE LR | 100 EPOCHS
# ==============================================================================
import os
import random
import glob
import numpy as np
import scipy.signal as sg
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torch.optim.lr_scheduler import ReduceLROnPlateau
from tqdm import tqdm

# -------------------------
# Hyperparams
# -------------------------
TARGET_LEN = 300   # 3 seconds
BATCH_SIZE = 32
EPOCHS = 100       # 100 Epochs
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# -------------------------
# PATHS
# -------------------------
PITCH_DIR = "/content/data_unique"
CKPT_DIR = "/content/pitch_modelV5_nosmooth"
os.makedirs(CKPT_DIR, exist_ok=True)

BEST = f"{CKPT_DIR}/best.pth"
LAST = f"{CKPT_DIR}/last.pth"

# -------------------------
# 1. AUGMENTATION
# -------------------------
def augment_hum(pitch):
    pitch = pitch.copy().astype(np.float32)

    # 1. Noise
    pitch += np.random.normal(0, 0.06, size=len(pitch))

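    # 2. Key Shift (±5 semitones) on frames with pitch > 0.
    # 0.057 ≈ ln(2)/12, i.e. one semitone if pitch is stored as natural-log Hz (assumed).
    # Noise was already added above, so some unvoiced (zero) frames also pass this test.
    semitones = np.random.uniform(-5, 5)
    pitch[pitch > 0] += semitones * 0.057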

    # 3. Time Warp
    if random.random() < 0.7:
        rate = np.random.uniform(0.8, 1.25)
        old_idx = np.arange(len(pitch))
        new_idx = np.linspace(0, len(pitch)-1, max(2, int(len(pitch)*rate)))
        pitch = np.interp(new_idx, old_idx, pitch)

    # 4. Breath Noise
    pitch += np.random.normal(0, 0.04, size=len(pitch))

    return pitch.astype(np.float32)

# -------------------------
# Helper: Pad/Crop
# -------------------------
def force_length(arr, target_len=TARGET_LEN):
    if arr is None or len(arr) == 0:
        return np.zeros(target_len, dtype=np.float32)
    if len(arr) < target_len:
        pad_amt = target_len - len(arr)
        return np.pad(arr, (0, pad_amt), mode='constant')
    elif len(arr) > target_len:
        start = random.randint(0, len(arr) - target_len)
        return arr[start:start + target_len]
    return arr

# -------------------------
# 2. DATASET (NO SMOOTHING)
# -------------------------
class PitchDatasetV5(Dataset):
    def __init__(self, pitch_dir, target_len=TARGET_LEN):
        self.files = sorted(glob.glob(os.path.join(pitch_dir, "*.npy")))
        self.target_len = target_len
        print(f"✅ Loaded {len(self.files)} files")

    def _random_crop(self, arr):
        if len(arr) <= self.target_len:
            return arr
        start = random.randint(0, len(arr) - self.target_len)
        return arr[start:start + self.target_len]

    def __getitem__(self, idx):
        # Load Raw Data
        anchor_path = self.files[idx]
        anchor_full = np.load(anchor_path)

        neg_idx = random.randint(0, len(self.files) - 1)
        while neg_idx == idx:
            neg_idx = random.randint(0, len(self.files) - 1)
        neg_full = np.load(self.files[neg_idx])

        # --- NO SMOOTHING APPLIED HERE ---

        # 1. CROP
        anchor_clean = self._random_crop(anchor_full)
        neg_clean = self._random_crop(neg_full)

        # 2. AUGMENT
        positive_hum = augment_hum(anchor_clean)
        negative_hum = augment_hum(neg_clean)

        # 3. PAD/TRUNCATE
        a_out = force_length(anchor_clean, self.target_len)
        ph_out = force_length(positive_hum, self.target_len)
        n_out = force_length(negative_hum, self.target_len)

        return (
            torch.from_numpy(a_out).unsqueeze(0).float(),
            torch.from_numpy(ph_out).unsqueeze(0).float(),
            torch.from_numpy(n_out).unsqueeze(0).float(),
        )

    def __len__(self):
        return len(self.files)

# -------------------------
# 3. MODEL: CRNN (CNN + BiLSTM)
# -------------------------
class PitchCRNN(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()

        # A. CNN Feature Extractor
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.MaxPool1d(2), # 300 -> 150

            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128), nn.ReLU(),
            nn.MaxPool1d(2), # 150 -> 75

            nn.Conv1d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm1d(256), nn.ReLU(),
        )

        # B. Deep Bidirectional LSTM
        self.lstm = nn.LSTM(
            input_size=256,
            hidden_size=128,
            num_layers=2,
            batch_first=True,
            bidirectional=True
        )

        # C. Projection Head
        self.fc = nn.Sequential(
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, embed_dim)
        )

    def forward_one(self, x):
        # 1. CNN Forward
        x = self.cnn(x)  # (Batch, 256, 75)

        # 2. Prepare for LSTM (Batch, Seq, Feat)
        x = x.permute(0, 2, 1) # (Batch, 75, 256)

        # 3. LSTM Forward
        self.lstm.flatten_parameters()
        out, _ = self.lstm(x)

        # 4. Global Average Pooling
        out = torch.mean(out, dim=1) # (Batch, 256)

        # 5. Projection
        out = self.fc(out)

        return F.normalize(out, p=2, dim=1)

# -------------------------
# 4. TRAINING LOOP
# -------------------------
def train_crnn_nosmooth():
    print(f"üöÄ Training V5: CRNN (CNN + BiLSTM) | NO SMOOTHING | {DEVICE}")
    print(f"üìÇ Data: {PITCH_DIR}")
    print(f"üîß Epochs: {EPOCHS} | Adaptive LR: ON")

    dataset = PitchDatasetV5(PITCH_DIR, target_len=TARGET_LEN)
    loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=0, pin_memory=True)

    model = PitchCRNN().to(DEVICE)
    optim = torch.optim.Adam(model.parameters(), lr=0.0001)

    # ADAPTIVE LR SCHEDULER
    scheduler = ReduceLROnPlateau(optim, mode='min', factor=0.5, patience=8)

    loss_fn = nn.TripletMarginLoss(margin=0.85, p=2)

    best_loss = float('inf')

    for epoch in range(EPOCHS):
        model.train()
        total_loss = 0.0

        for anchor, pos_hum, neg_hum in loader:
            anchor = anchor.to(DEVICE)
            pos_hum = pos_hum.to(DEVICE)
            neg_hum = neg_hum.to(DEVICE)

            optim.zero_grad()

            a = model.forward_one(anchor)
            ph = model.forward_one(pos_hum)
            n = model.forward_one(neg_hum)

            loss = loss_fn(a, ph, n)

            loss.backward()

            # Clip Gradients for LSTM stability
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

            optim.step()

            total_loss += loss.item()

        avg_loss = total_loss / len(loader)
        current_lr = optim.param_groups[0]['lr']
        print(f"Epoch {epoch+1}/{EPOCHS} | Avg Loss: {avg_loss:.4f} | LR: {current_lr:.6f}")

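        # Step scheduler on training loss. There is no validation split, so the
        # "best" checkpoint below is also selected on training loss.
        scheduler.step(avg_loss)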

        if avg_loss < best_loss:
            best_loss = avg_loss
            torch.save(model.state_dict(), BEST)
            print(f" ⭐ New Best: {best_loss:.4f}")

        torch.save({
            "epoch": epoch + 1,
            "model": model.state_dict(),
            "optimizer": optim.state_dict(),
            "best_loss": best_loss,
        }, LAST)

    print("✅ Training Complete.")
    print(f"Best Model Saved to: {BEST}")

if __name__ == "__main__":
    train_crnn_nosmooth()

🚀 Training V5: CRNN (CNN + BiLSTM) | NO SMOOTHING | cuda
📂 Data: /content/data_unique
🔧 Epochs: 100 | Adaptive LR: ON
✅ Loaded 4051 files
Epoch 1/100 | Avg Loss: 0.2556 | LR: 0.000100
 ⭐ New Best: 0.2556
Epoch 2/100 | Avg Loss: 0.1378 | LR: 0.000100
 ⭐ New Best: 0.1378
Epoch 3/100 | Avg Loss: 0.1092 | LR: 0.000100
 ⭐ New Best: 0.1092
Epoch 4/100 | Avg Loss: 0.0908 | LR: 0.000100
 ⭐ New Best: 0.0908
Epoch 5/100 | Avg Loss: 0.0907 | LR: 0.000100
 ⭐ New Best: 0.0907
Epoch 6/100 | Avg Loss: 0.0754 | LR: 0.000100
 ⭐ New Best: 0.0754
Epoch 7/100 | Avg Loss: 0.0673 | LR: 0.000100
 ⭐ New Best: 0.0673
Epoch 8/100 | Avg Loss: 0.0633 | LR: 0.000100
 ⭐ New Best: 0.0633
Epoch 9/100 | Avg Loss: 0.0596 | LR: 0.000100
 ⭐ New Best: 0.0596
Epoch 10/100 | Avg Loss: 0.0603 | LR: 0.000100
Epoch 11/100 | Avg Loss: 0.0587 | LR: 0.000100
 ⭐ New Best: 0.0587
Epoch 12/100 | Avg Loss: 0.0544 | LR: 0.000100
 ⭐ New Best: 0.0544
Epoch 13/100 | Avg Loss: 0.0636 | LR: 0.000100
Epoch 1
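
For reference, `nn.TripletMarginLoss(margin=0.85, p=2)` computes max(||a - p|| - ||a - n|| + margin, 0) for each triplet and averages over the batch. Because `forward_one` L2-normalizes its output, every distance lies in [0, 2], so a margin of 0.85 asks negatives to sit noticeably farther from the anchor than positives. The NumPy sketch below re-implements that formula on hand-made 2-D unit vectors; it omits the small `eps` PyTorch adds inside its distance, and the toy vectors are hypothetical.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length, like F.normalize(out, p=2, dim=1)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def triplet_margin_loss(anchor, positive, negative, margin=0.85):
    """Batch mean of max(||a - p|| - ||a - n|| + margin, 0) with L2 distances."""
    d_ap = np.linalg.norm(anchor - positive, axis=1)
    d_an = np.linalg.norm(anchor - negative, axis=1)
    return np.maximum(d_ap - d_an + margin, 0.0).mean()

# Two toy triplets of unit vectors
a = l2_normalize(np.array([[1.0, 0.0], [1.0, 0.0]]))
p = l2_normalize(np.array([[2.0, 0.0], [0.0, 1.0]]))  # 1st positive points the same way as its anchor
n = l2_normalize(np.array([[0.0, 3.0], [1.0, 0.0]]))  # 2nd negative coincides with its anchor

print(triplet_margin_loss(a[:1], p[:1], n[:1]))  # easy triplet: already separated by more than the margin
print(triplet_margin_loss(a[1:], p[1:], n[1:]))  # hard triplet: negative closer than positive
print(triplet_margin_loss(a, p, n))              # batch mean of the two
```

Easy triplets contribute zero loss and no gradient, which is one reason the average loss can become small while retrieval accuracy is still far from perfect.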

In [None]:
# ==============================================================================
# EVAL V5: CRNN (NO SMOOTHING) + GEOMETRIC SCORING
# ==============================================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import scipy.signal as sg
import os
import random
from collections import defaultdict
from tqdm import tqdm

# ======================================================
# CONFIGURATION
# ======================================================
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# Data Paths
VAL_DIR = "/content/eval"  # 400 unseen files
# Pointing to the NO SMOOTHING checkpoint
MODEL_PATH = "/content/pitch_modelV5_nosmooth/best.pth"

# Params
WIN_LEN = 300
HOP_LEN = 150
TOLERANCE = 1.0    # Time bucket tolerance
TOP_K_MATCHES = 20
NUM_TRIALS = 400   # random draws with replacement (not every song gets tested)
SEGMENT_LEN = 1500 # 15 seconds

# ======================================================
# 1. MODEL ARCHITECTURE (MUST MATCH V5 TRAINING)
# ======================================================
class PitchCRNN(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()

        # 1. CNN Feature Extractor
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.MaxPool1d(2), # 300 -> 150

            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128), nn.ReLU(),
            nn.MaxPool1d(2), # 150 -> 75

            nn.Conv1d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm1d(256), nn.ReLU(),
        )

        # 2. Deep Bidirectional LSTM
        self.lstm = nn.LSTM(
            input_size=256,
            hidden_size=128,
            num_layers=2,
            batch_first=True,
            bidirectional=True
        )

        # 3. Projection Head
        self.fc = nn.Sequential(
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, embed_dim)
        )

    def forward_one(self, x):
        # A. CNN Forward
        x = self.cnn(x)  # (Batch, 256, 75)

        # B. Prepare for LSTM (Permute to Batch, Seq, Feat)
        x = x.permute(0, 2, 1) # (Batch, 75, 256)

        # C. LSTM Forward
        self.lstm.flatten_parameters()
        out, _ = self.lstm(x)

        # D. Global Average Pooling (Over time dimension)
        out = torch.mean(out, dim=1) # (Batch, 256)

        # E. Projection
        out = self.fc(out) # (Batch, 128)

        return F.normalize(out, p=2, dim=1)

print(f"⏳ Loading V5 CRNN (No Smooth) Model from {MODEL_PATH}...")
model = PitchCRNN(embed_dim=128).to(DEVICE)
try:
    checkpoint = torch.load(MODEL_PATH, map_location=DEVICE)
    if "model" in checkpoint:
        model.load_state_dict(checkpoint["model"])
    else:
        model.load_state_dict(checkpoint)
    print("‚úÖ Model loaded successfully.")
except Exception as e:
    print(f"❌ Error loading model: {e}")
    raise  # re-raise so the cell stops with the original traceback instead of calling exit()
model.eval()

# ======================================================
# 2. AUGMENTATION (Proper Soft & Hard)
# ======================================================
def humify_soft(arr):
    """Soft Hum: Light Noise only."""
    arr = arr.copy()
    arr += np.random.normal(0, 0.02, size=len(arr))
    return arr.astype(np.float32)

def humify_hard(arr):
    """Hard Hum: Noise + Key Shift + Time Warping"""
    arr = arr.copy()

    # 1. Jitter
    arr += np.random.normal(0, 0.06, size=len(arr))

    # 2. Key Shift
    semitones = np.random.uniform(-3, 3)
    arr[arr > 0] += semitones * 0.057

    # 3. TIME WARP (Crucial for testing LSTM robustness)
    target_len = SEGMENT_LEN # 1500
    if random.random() < 0.8: # 80% chance
        rate = np.random.uniform(0.85, 1.15)
        old_idx = np.arange(len(arr))
        new_len = int(len(arr) * rate)
        new_idx = np.linspace(0, len(arr)-1, new_len)
        arr = np.interp(new_idx, old_idx, arr)

        # Force back to target length
        if len(arr) < target_len:
            arr = np.pad(arr, (0, target_len - len(arr)), mode='constant')
        else:
            start = (len(arr) - target_len) // 2
            arr = arr[start:start+target_len]

    return arr.astype(np.float32)


# ======================================================
# 3. EMBEDDING + DB BUILDER (No Smoothing applied here)
# ======================================================
def process_sequence_to_embeddings(arr):
    """Returns embeddings and time offsets.
    NOTE: Processes in mini-batches to prevent OOM."""

    # NO SMOOTHING CALL HERE (Matches Training V5 No-Smooth)

    windows = []
    offsets = []

    i = 0
    while i + WIN_LEN <= len(arr):
        crop = arr[i : i + WIN_LEN]
        if np.mean(crop > 0) < 0.1: # Skip silence
            i += HOP_LEN
            continue
        windows.append(crop)
        offsets.append(i / 100.0)
        i += HOP_LEN

    if not windows:
        return None, None

    windows_np = np.stack(windows)
    windows_tensor = torch.from_numpy(windows_np).float().unsqueeze(1).to(DEVICE)

    # --- BATCH PROCESSING (Prevents OOM) ---
    batch_size = 64
    embeddings_list = []

    with torch.no_grad():
        for k in range(0, len(windows_tensor), batch_size):
            batch = windows_tensor[k : k + batch_size]
            emb_batch = model.forward_one(batch)
            embeddings_list.append(emb_batch)

    embeddings = torch.cat(embeddings_list, dim=0)

    return embeddings, offsets

def build_flat_database():
    files = sorted([f for f in os.listdir(VAL_DIR) if f.endswith(".npy")])

    all_embeds_list = []
    metadata = []

    print(f"üèóÔ∏è Building Geometric DB (RAW PITCH) from {len(files)} songs in {VAL_DIR}...")

    for f_name in tqdm(files):
        path = os.path.join(VAL_DIR, f_name)
        arr = np.load(path)

        # Raw pitch goes directly into embedding
        embeds, offsets = process_sequence_to_embeddings(arr)
        if embeds is None: continue

        all_embeds_list.append(embeds)
        song_id = f_name.replace(".npy", "")

        for t in offsets:
            metadata.append((song_id, t))

    full_db_tensor = torch.cat(all_embeds_list, dim=0)
    print(f"✅ DB Built: {full_db_tensor.shape[0]} windows across {len(files)} songs.")
    return full_db_tensor, metadata

# ======================================================
# 4. GEOMETRIC SCORING
# ======================================================
def query_geometric(query_embeds, query_offsets, db_tensor, db_metadata):
    # 1. Distance Matrix
    dists = torch.cdist(query_embeds, db_tensor, p=2)

    # 2. Top-K
    top_vals, top_inds = torch.topk(dists, k=TOP_K_MATCHES, dim=1, largest=False)
    top_vals = top_vals.cpu().numpy()
    top_inds = top_inds.cpu().numpy()

    vote_buckets = defaultdict(float)
    epsilon = 1e-4

    for q_idx, q_time in enumerate(query_offsets):
        for k in range(TOP_K_MATCHES):
            match_idx = top_inds[q_idx, k]
            dist = top_vals[q_idx, k]

            match_song, match_time = db_metadata[match_idx]

            # 3. Geometric Alignment
            projected_start = match_time - q_time
            bucket = int(round(projected_start / TOLERANCE))

            # Score
            score = 1.0 / (dist + epsilon)
            vote_buckets[(match_song, bucket)] += score

    # 4. Max Score per Song
    song_final_scores = defaultdict(float)
    for (song, bucket), score in vote_buckets.items():
        if score > song_final_scores[song]:
            song_final_scores[song] = score

    ranked_songs = sorted(song_final_scores.items(), key=lambda x: x[1], reverse=True)
    return [x[0] for x in ranked_songs]

# ======================================================
# 5. RUN EVAL
# ======================================================
if __name__ == "__main__":
    db_tensor, db_metadata = build_flat_database()
    song_list = list(set([m[0] for m in db_metadata]))

    results = {
        "Soft": {"top1": 0, "top5": 0, "top10": 0},
        "Hard": {"top1": 0, "top5": 0, "top10": 0}
    }

    print(f"\nüöÄ Running {NUM_TRIALS} Trials with Geometric Scoring...")

    effective_trials = {"Soft": 0, "Hard": 0}

    for _ in tqdm(range(NUM_TRIALS)):
        target_song = random.choice(song_list)
        full_arr = np.load(os.path.join(VAL_DIR, f"{target_song}.npy"))

        if len(full_arr) < SEGMENT_LEN: continue

        # randint's upper bound is exclusive; +1 keeps a file of exactly SEGMENT_LEN frames valid
        start_idx = np.random.randint(0, len(full_arr) - SEGMENT_LEN + 1)

        # BASE: Raw, unsmoothed clip (Matches V5 No-Smooth training)
        raw_clip = full_arr[start_idx : start_idx + SEGMENT_LEN]

        # --- Test Soft ---
        soft_hum_clip = humify_soft(raw_clip)
        q_emb, q_off = process_sequence_to_embeddings(soft_hum_clip)

        if q_emb is not None:
            ranked = query_geometric(q_emb, q_off, db_tensor, db_metadata)
            effective_trials["Soft"] += 1
            if ranked:
                if ranked[0] == target_song: results["Soft"]["top1"] += 1
                if target_song in ranked[:5]: results["Soft"]["top5"] += 1
                if target_song in ranked[:10]: results["Soft"]["top10"] += 1

        # --- Test Hard ---
        hard_hum_clip = humify_hard(raw_clip)
        q_emb, q_off = process_sequence_to_embeddings(hard_hum_clip)

        if q_emb is not None:
            ranked = query_geometric(q_emb, q_off, db_tensor, db_metadata)
            effective_trials["Hard"] += 1
            if ranked:
                if ranked[0] == target_song: results["Hard"]["top1"] += 1
                if target_song in ranked[:5]: results["Hard"]["top5"] += 1
                if target_song in ranked[:10]: results["Hard"]["top10"] += 1

    # ======================================================
    # FINAL REPORT
    # ======================================================
    def calc_acc(res, key, total):
        return res[key] / total if total > 0 else 0

    print("\n" + "="*50)
    print("üìä V5 CRNN (NO SMOOTHING) RESULTS (400 Files)")
    print("="*50)
    print(f"Total Effective Soft Trials: {effective_trials['Soft']}")
    print(f"Total Effective Hard Trials: {effective_trials['Hard']}")
    print("-" * 50)

    print(f"üé§ Soft Hum:")
    print(f"   Top-1:  {calc_acc(results['Soft'], 'top1', effective_trials['Soft']):.1%}")
    print(f"   Top-5:  {calc_acc(results['Soft'], 'top5', effective_trials['Soft']):.1%}")
    print(f"   Top-10: {calc_acc(results['Soft'], 'top10', effective_trials['Soft']):.1%}")

    print(f"\nüî• Hard Hum:")
    print(f"   Top-1:  {calc_acc(results['Hard'], 'top1', effective_trials['Hard']):.1%}")
    print(f"   Top-5:  {calc_acc(results['Hard'], 'top5', effective_trials['Hard']):.1%}")
    print(f"   Top-10: {calc_acc(results['Hard'], 'top10', effective_trials['Hard']):.1%}")
    print("="*50)

⏳ Loading V5 CRNN (No Smooth) Model from /content/pitch_modelV5_nosmooth/best.pth...
✅ Model loaded successfully.
🏗️ Building Geometric DB (RAW PITCH) from 400 songs in /content/eval...
100%|██████████| 400/400 [00:06<00:00, 66.03it/s]
✅ DB Built: 63580 windows across 400 songs.

🚀 Running 400 Trials with Geometric Scoring...
100%|██████████| 400/400 [00:04<00:00, 88.95it/s]

📊 V5 CRNN (NO SMOOTHING) RESULTS (400 Files)
Total Effective Soft Trials: 400
Total Effective Hard Trials: 400
--------------------------------------------------
🎤 Soft Hum:
   Top-1:  51.2%
   Top-5:  59.8%
   Top-10: 61.0%

🔥 Hard Hum:
   Top-1:  47.2%
   Top-5:  52.2%
   Top-10: 56.0%
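
The geometric scoring in `query_geometric` rewards matches that agree on where the query starts inside a song. The NumPy re-implementation below runs the same voting logic on a hypothetical two-song DB whose songs contain the same three windows in different orders. Song "B" gets as many exact matches as song "A", but only "A" places them at a consistent offset, so taking the best bucket per song ranks "A" first (summing over all of a song's buckets would have tied them). The embeddings, song IDs, and `top_k=2` are illustrative assumptions; the real function uses `torch.cdist` and `torch.topk` on 128-D embeddings.

```python
from collections import defaultdict

import numpy as np

def vote_by_offset(query_embeds, query_offsets, db_embeds, db_meta,
                   top_k=2, tolerance=1.0, eps=1e-4):
    """Rank songs by their best (song, offset-bucket) vote, mirroring query_geometric."""
    # Pairwise L2 distances, shape (n_query, n_db)
    dists = np.linalg.norm(query_embeds[:, None, :] - db_embeds[None, :, :], axis=2)
    top_inds = np.argsort(dists, axis=1)[:, :top_k]

    buckets = defaultdict(float)
    for q_idx, q_time in enumerate(query_offsets):
        for match_idx in top_inds[q_idx]:
            song, match_time = db_meta[match_idx]
            # Where the query would start inside this song, quantized by tolerance
            bucket = int(round((match_time - q_time) / tolerance))
            buckets[(song, bucket)] += 1.0 / (dists[q_idx, match_idx] + eps)

    # Keep only each song's strongest offset bucket
    best = defaultdict(float)
    for (song, _), score in buckets.items():
        best[song] = max(best[song], score)
    return sorted(best, key=best.get, reverse=True)

e1, e2, e3 = np.eye(2)[0], np.eye(2)[1], -np.eye(2)[0]
db_embeds = np.stack([e1, e2, e3,   # song A
                      e3, e1, e2])  # song B: same windows, different order
db_meta = [("A", 0.0), ("A", 1.5), ("A", 3.0),
           ("B", 0.0), ("B", 1.5), ("B", 3.0)]

# Query = song A from 1.5 s on: windows e2, e3 at query offsets 0.0 s and 1.5 s
query = np.stack([e2, e3])
print(vote_by_offset(query, [0.0, 1.5], db_embeds, db_meta))
```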

Loading: /content/pitch_modelV6/best.pth
Building Eval DB: 100%|██████████| 199/199 [00:24<00:00,  8.05it/s]
Eval DB size: 199
Evaluating: 100%|██████████| 100/100 [00:11<00:00,  8.99it/s]

🎵 CLEAN RESULTS
Top-1: 0.28
Top-5: 0.46

🎤 SOFT HUM RESULTS
Top-1: 0.24
Top-5: 0.32

🔥 HARD HUM RESULTS
Top-1: 0.19
Top-5: 0.27

In [2]:
from google.colab import drive
drive.mount('/content/drive')

MessageError: Error: credential propagation was unsuccessful