### Match already validated files with new files and update annotations

I made a mistake when analyzing the data: I used the wrong sensitivity and threshold parameters. The sensitivity was 0.5, lower than it should have been, which produces fewer target predictions and more polarized scores: low-scoring clips score even lower and high-scoring clips score even higher. The threshold was 0.1 (i.e., 10%), which also reduced the number of detections and left some classes with no detections at all. I therefore need to reanalyze the data with the correct parameters: sensitivity 1.0 and threshold 0.01. Because I have already validated the top-scoring clips, and those are likely to be the same in the new run, I want to avoid revalidating them. That is what this code does.
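To illustrate why a steeper score curve polarizes scores and lets fewer clips pass a fixed threshold, here is a minimal sketch. It assumes the confidence score is a logistic (sigmoid) function of a raw model output and that the sensitivity setting changes the slope of that curve. The function name `scaled_sigmoid`, the slope values, and the sample logits are hypothetical and are not taken from BirdNET's internals.

```python
import math

def scaled_sigmoid(logit, slope):
    """Logistic function with an adjustable slope (hypothetical stand-in for a sensitivity setting)."""
    return 1.0 / (1.0 + math.exp(-slope * logit))

# Made-up raw model outputs, from clearly negative to clearly positive
logits = [-2.0, -0.5, 0.5, 2.0]

gentle = [scaled_sigmoid(x, slope=1.0) for x in logits]  # flatter curve
steep = [scaled_sigmoid(x, slope=1.5) for x in logits]   # steeper curve

for x, g, s in zip(logits, gentle, steep):
    print(f"logit={x:+.1f}  gentle={g:.3f}  steep={s:.3f}")

# Count how many clips would pass a fixed threshold of 0.1 under each curve
threshold = 0.1
passed_gentle = sum(g >= threshold for g in gentle)
passed_steep = sum(s >= threshold for s in steep)
print(passed_gentle, passed_steep)
```

With the steeper curve, scores below 0.5 are pushed down and scores above 0.5 are pushed up, so a clip near the threshold can drop out entirely.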

**What This Code Does**
* Finds the `_Validation.txt` file in each class folder of `it_1_s0.5_t0.1`.
* Loads the previously validated clips and their annotations (`Eval` and `Annotation` columns).
* Finds the same clips in `it_1_s1_t0.01`, where the filenames carry updated scores.
* Keeps the new filenames (with the new scores) and copies over the old `Eval` and `Annotation` values.
* Keeps new detections that weren’t validated before, so they can still be validated.
* Keeps the row order of the new .txt file (the code below does not re-sort it by score).
* Saves the updated .txt file in the `it_1_s1_t0.01` folder.
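
The steps above can be sketched on small in-memory tables. This is an illustration under assumptions, not the notebook's actual data: the filenames are made up and assume the pattern `score_offset_siteID_night_date_time.wav`, with the score as the first underscore-separated field. The helper names `core_name` and `score_from_name` are hypothetical. The sketch also shows one possible way to sort by the new score, parsed from the filename.

```python
from pathlib import Path
import pandas as pd

def core_name(filename):
    """Drop the leading score field and keep the next five fields."""
    return "_".join(Path(filename).stem.split("_")[1:6])

def score_from_name(filename):
    """Assumes the first underscore-separated field of the filename is the score."""
    return float(Path(filename).stem.split("_")[0])

# Hypothetical old run: one clip, already validated
old = pd.DataFrame({
    "Begin File": ["0.912_12.0_S01_N03_20240115_213000.wav"],
    "Eval": ["TP"],
    "Annotation": ["clear call"],
})

# Hypothetical new run: the same clip with a different score, plus a new detection
new = pd.DataFrame({
    "Begin File": [
        "0.412_45.0_S02_N01_20240110_020000.wav",
        "0.874_12.0_S01_N03_20240115_213000.wav",
    ],
})

old["core"] = old["Begin File"].map(core_name)
new["core"] = new["Begin File"].map(core_name)

# Left merge keeps every new clip and attaches old annotations where the core name matches
merged = new.merge(old[["core", "Eval", "Annotation"]], on="core", how="left")

# Sort by the score parsed from the new filename, highest first
merged["score"] = merged["Begin File"].map(score_from_name)
merged = (
    merged.sort_values("score", ascending=False)
    .drop(columns=["core", "score"])
    .reset_index(drop=True)
)
print(merged)
```

If the old file contained several rows with the same core name, a left merge would duplicate the matching new rows; dropping duplicates on `core` in the old table first would avoid that.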

In [8]:
import os
import pandas as pd
from pathlib import Path
import glob  # To find the `_Validation.txt` files dynamically

# Paths
prev_val_clips = "/mnt/d/retraining_BirdNET_2025/iterative_training/segments_validation/it_1_s0.5_t0.1/uncertainty/"
new_clips_dir = "/mnt/d/retraining_BirdNET_2025/iterative_training/segments_validation/it_1_s1_t0.01/uncertainty/"

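# Function to extract core filename (ignoring score)
def extract_core_filename(filename):
    """Return the part of a clip filename that stays the same across runs.

    Assumes the first underscore-separated field is the score and the next
    five are offset, site ID, survey night, date, and time.
    """
    parts = Path(filename).stem.split('_')  # Remove extension and split filename
    return '_'.join(parts[1:6])  # Extract offset, siteID, survey night, date, time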

# Iterate through each class folder in the previous validation clips
for class_folder in os.listdir(prev_val_clips):
    old_class_path = os.path.join(prev_val_clips, class_folder)
    new_class_path = os.path.join(new_clips_dir, class_folder)

    # Ensure the class exists in both directories
    if not os.path.isdir(old_class_path) or not os.path.isdir(new_class_path):
        continue

    # Find the validation .txt file (assuming one per class)
    old_txt_files = glob.glob(os.path.join(old_class_path, "*_Validation.txt"))
    new_txt_files = glob.glob(os.path.join(new_class_path, "*_Validation.txt"))

    if len(old_txt_files) == 0:
        print(f"⚠ No previous validation file found in {class_folder}, skipping...")
        continue
    if len(new_txt_files) == 0:
        print(f"⚠ No new validation file found in {class_folder}, skipping...")
        continue

    old_txt_file = old_txt_files[0]  # Pick the old validation file
    new_txt_file = new_txt_files[0]  # Pick the new validation file

    # Load old validation data
    df_old = pd.read_csv(old_txt_file, delimiter='\t', usecols=["Begin File", "Eval", "Annotation"])

    # Create a dictionary mapping core filenames to their Eval & Annotation
    validated_clips = {
        extract_core_filename(row["Begin File"]): (row["Eval"], row["Annotation"])
        for _, row in df_old.iterrows()
    }

    # Load new validation data
    df_new = pd.read_csv(new_txt_file, delimiter='\t')

    # Ensure required columns exist
    if "Begin File" not in df_new.columns:
        print(f"⚠ Skipping {class_folder} due to missing 'Begin File' column in new validation file.")
        continue

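    # Add Eval and Annotation columns if not present. Cast to object dtype so that
    # copied values fit even when a column was loaded as all-NaN floats.
    for col in ("Eval", "Annotation"):
        if col not in df_new.columns:
            df_new[col] = ""
        df_new[col] = df_new[col].astype(object)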

    # Update new validation file with matched Eval & Annotation values
    for index, row in df_new.iterrows():
        core_name = extract_core_filename(row["Begin File"])
        if core_name in validated_clips:
            df_new.at[index, "Eval"] = validated_clips[core_name][0]  # Copy Eval
            df_new.at[index, "Annotation"] = validated_clips[core_name][1]  # Copy Annotation

    # Save the updated validation file
    df_new.to_csv(new_txt_file, sep='\t', index=False)

    print(f"✔ Updated {new_txt_file} with preserved validation data.")

print("✅ Finished processing all class folders.")


✔ Updated /mnt/d/retraining_BirdNET_2025/iterative_training/segments_validation/it_1_s1_t0.01/uncertainty/Barn Owl/Barn Owl_Validation.txt with preserved validation data.
✔ Updated /mnt/d/retraining_BirdNET_2025/iterative_training/segments_validation/it_1_s1_t0.01/uncertainty/Black-banded Owl/Black-banded Owl_Validation.txt with preserved validation data.
✔ Updated /mnt/d/retraining_BirdNET_2025/iterative_training/segments_validation/it_1_s1_t0.01/uncertainty/Black-capped Screech-Owl/Black-capped Screech Owl_Validation.txt with preserved validation data.
✔ Updated /mnt/d/retraining_BirdNET_2025/iterative_training/segments_validation/it_1_s1_t0.01/uncertainty/Brown Tinamou/Brown Tinamou_Validation.txt with preserved validation data.
✔ Updated /mnt/d/retraining_BirdNET_2025/iterative_training/segments_validation/it_1_s1_t0.01/uncertainty/Buff-fronted Owl/Buff-fronted Owl_Validation.txt with preserved validation data.
✔ Updated /mnt/d/retraining_BirdNET_2025/iterative_training/segments_va