# Prepare (Glass+Gunshot, 22.05 kHz Mono)

Load metadata from multiple sources, deduplicate and filter it, resample all audio to 22.05 kHz mono (reusing resampled files that already exist), then assign folds and listen to a few samples.


## Imports


In [1]:
import json
import wave
from pathlib import Path
import numpy as np
import pandas as pd
import librosa
from IPython.display import Audio, display

from src.config import TARGET_LABELS, SEED, SR, PROJECT_ROOT
from src.meta_utils import (
    load_meta_files,
    map_canonical_labels,
    deduplicate_meta,
    sample_gunshot_even,
    assign_folds_if_missing,
    stratified_folds,
)
from src.data_utils import load_audio



## Config Override
Set the parameters for this run; the defaults can move into config later.


In [2]:
META_FILES = ['data/meta/esc50.csv', 'data/meta/gunshot_kaggle.csv']
LABEL_MAP = {'glass_breaking': 'glass', 'gunshot': 'gunshot'}
K_FOLDS = 5
RATIOS = {'glass': 3, 'gunshot': 3, 'background': 4}  # weights sum to 10
MAX_DURATION = 6.0  # seconds
GUNSHOT_TARGET = 60  # total gunshot clips after sampling
TARGET_SR = 22050  # force resample target
RESAMPLED_ROOT = Path('cache/data_resampled')


## Load & Dedup
Map raw labels to canonical labels, drop duplicates by md5/filepath, and summarize the counts.


In [3]:
meta_df = load_meta_files(META_FILES)
meta_df = map_canonical_labels(meta_df, label_map=LABEL_MAP, target_labels=TARGET_LABELS)
meta_df = deduplicate_meta(meta_df, subset=['md5', 'filepath'])
print('Total rows after dedup:', len(meta_df))
print('By source:', meta_df['source'].value_counts().to_dict())
print('By canonical_label:', meta_df['canonical_label'].value_counts().to_dict())


Total rows after dedup: 2851
By source: {'esc50': 2000, 'gunshot_kaggle': 851}
By canonical_label: {'gunshot': 851, 'snoring': 40, 'rain': 40, 'insects': 40, 'laughing': 40, 'hen': 40, 'engine': 40, 'breathing': 40, 'crying_baby': 40, 'hand_saw': 40, 'coughing': 40, 'glass': 40, 'toilet_flush': 40, 'helicopter': 40, 'pig': 40, 'washing_machine': 40, 'clock_tick': 40, 'sneezing': 40, 'rooster': 40, 'sea_waves': 40, 'siren': 40, 'cat': 40, 'door_wood_creaks': 40, 'crickets': 40, 'drinking_sipping': 40, 'dog': 40, 'chirping_birds': 40, 'pouring_water': 40, 'vacuum_cleaner': 40, 'thunderstorm': 40, 'door_wood_knock': 40, 'can_opening': 40, 'crow': 40, 'clapping': 40, 'fireworks': 40, 'chainsaw': 40, 'airplane': 40, 'mouse_click': 40, 'train': 40, 'car_horn': 40, 'sheep': 40, 'water_drops': 40, 'church_bells': 40, 'clock_alarm': 40, 'keyboard_typing': 40, 'wind': 40, 'footsteps': 40, 'frog': 40, 'cow': 40, 'brushing_teeth': 40, 'crackling_fire': 40}
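The internals of `deduplicate_meta` are not shown in this notebook. As a minimal sketch of one plausible reading of "drop duplicates by md5/filepath", the hypothetical helper `drop_duplicate_clips` below treats a row as a duplicate if its path, or its non-empty md5, has already been seen. The real helper may instead require both columns to match.

```python
import pandas as pd


def drop_duplicate_clips(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical dedup: same filepath OR same non-empty md5 counts as a duplicate."""
    # Step 1: keep the first row for every distinct file path.
    out = df.drop_duplicates(subset="filepath", keep="first")
    # Step 2: among rows that actually carry a hash, drop repeated hashes.
    # Rows with a missing or empty md5 are never dropped by this step.
    has_md5 = out["md5"].notna() & (out["md5"] != "")
    repeated_hash = has_md5 & out.duplicated(subset="md5", keep="first")
    return out[~repeated_hash].reset_index(drop=True)


# Toy metadata (made-up paths and hashes, for illustration only).
demo = pd.DataFrame({
    "filepath": ["a.wav", "a.wav", "b.wav", "c.wav", "d.wav"],
    "md5": ["111", "111", "111", "", ""],
})
deduped = drop_duplicate_clips(demo)
print(deduped["filepath"].tolist())
```

In the toy data, the second `a.wav` row goes because its path repeats and `b.wav` goes because its hash repeats, while `c.wav` and `d.wav` both stay because an empty md5 carries no information.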


## Filter & gunshot sampling
- Drop clips longer than MAX_DURATION.
- Sample gunshot clips evenly across weapon_id, keeping GUNSHOT_TARGET clips in total.


In [4]:
clean_df = meta_df[meta_df['duration_sec'] <= MAX_DURATION].copy()
gun_sampled = sample_gunshot_even(clean_df, target_label='gunshot', total=GUNSHOT_TARGET, seed=SEED)
non_gun = clean_df[clean_df['canonical_label'] != 'gunshot']
clean_df = pd.concat([non_gun, gun_sampled], ignore_index=True)
print('After filter & gun sampling:', len(clean_df))
print('Label counts:', clean_df['canonical_label'].value_counts().to_dict())


After filter & gun sampling: 2060
Label counts: {'gunshot': 60, 'snoring': 40, 'rain': 40, 'insects': 40, 'laughing': 40, 'hen': 40, 'engine': 40, 'breathing': 40, 'crying_baby': 40, 'hand_saw': 40, 'coughing': 40, 'glass': 40, 'toilet_flush': 40, 'helicopter': 40, 'pig': 40, 'washing_machine': 40, 'clock_tick': 40, 'sneezing': 40, 'rooster': 40, 'sea_waves': 40, 'siren': 40, 'cat': 40, 'door_wood_creaks': 40, 'crickets': 40, 'drinking_sipping': 40, 'dog': 40, 'chirping_birds': 40, 'pouring_water': 40, 'vacuum_cleaner': 40, 'thunderstorm': 40, 'door_wood_knock': 40, 'can_opening': 40, 'crow': 40, 'clapping': 40, 'fireworks': 40, 'chainsaw': 40, 'airplane': 40, 'mouse_click': 40, 'train': 40, 'car_horn': 40, 'sheep': 40, 'water_drops': 40, 'church_bells': 40, 'clock_alarm': 40, 'keyboard_typing': 40, 'wind': 40, 'footsteps': 40, 'frog': 40, 'cow': 40, 'brushing_teeth': 40, 'crackling_fire': 40}
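How `sample_gunshot_even` spreads its budget across `weapon_id` is not shown here. The sketch below is one way to do it, under the assumption that every group gets an equal share and that the unused share of a small group rolls over to the larger ones. The function name `sample_evenly` and the toy weapon names are hypothetical.

```python
import pandas as pd


def sample_evenly(df: pd.DataFrame, group_col: str, total: int, seed: int) -> pd.DataFrame:
    """Pick up to `total` rows, spread as evenly as possible across `group_col`."""
    # Visit the smallest groups first so their unused quota rolls over.
    groups = sorted(df.groupby(group_col), key=lambda kv: len(kv[1]))
    remaining = total
    picked = []
    for i, (_, group) in enumerate(groups):
        groups_left = len(groups) - i
        quota = remaining // groups_left      # fair share of what is still needed
        take = min(quota, len(group))         # a group cannot give more than it has
        picked.append(group.sample(n=take, random_state=seed))
        remaining -= take
    return pd.concat(picked).sort_index()


# Toy data: one rare weapon and two common ones (names are made up).
demo = pd.DataFrame({"weapon_id": ["ak"] * 3 + ["glock"] * 100 + ["m4"] * 100})
picked = sample_evenly(demo, "weapon_id", total=60, seed=0)
print(picked["weapon_id"].value_counts().to_dict())
```

With a target of 60, the rare group contributes all 3 of its clips and the other two split the remaining 57 as 28 and 29, so the total still reaches the target.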


## Resample to 22.05k mono (reuse if exists)
For each clip, reuse the resampled file if it already exists under RESAMPLED_ROOT; otherwise resample the original and write it as 16-bit PCM.


In [5]:
RESAMPLED_ROOT.mkdir(parents=True, exist_ok=True)
resampled_rows = []
for _, r in clean_df.iterrows():
    # Normalize to a project-relative path so the cache mirrors the raw layout.
    rel = Path(r['filepath'])
    if rel.is_absolute():
        rel = rel.relative_to(PROJECT_ROOT)
    dst_path = RESAMPLED_ROOT / rel
    raw_path = PROJECT_ROOT / rel
    if dst_path.exists():
        # Reuse the cached file; it is loaded only to count its frames.
        y_rs, sr_rs = librosa.load(dst_path.as_posix(), sr=TARGET_SR, mono=True)
    else:
        # Resample and downmix, then convert float [-1, 1] to 16-bit PCM.
        y_rs, _ = librosa.load(raw_path.as_posix(), sr=TARGET_SR, mono=True)
        y_int16 = (y_rs * 32767).clip(-32768, 32767).astype('int16')
        dst_path.parent.mkdir(parents=True, exist_ok=True)
        with wave.open(dst_path.as_posix(), 'wb') as wf:
            wf.setnchannels(1)   # mono
            wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
            wf.setframerate(TARGET_SR)
            wf.writeframes(y_int16.tobytes())
        sr_rs = TARGET_SR
    frames = len(y_rs)
    resampled_rows.append({
        'sno': r.get('sno', len(resampled_rows)+1),
        'filepath': dst_path.as_posix(),
        'label': r.get('label'),
        'canonical_label': r.get('canonical_label'),
        'source': r.get('source'),
        'fold_id': r.get('fold_id',''),
        'duration_sec': round(frames / TARGET_SR, 3),
        'duration_samples': frames,
        'sr': sr_rs,
        'channels': 1,
        'bit_depth': 16,
        'md5': '',
        'extra_meta': r.get('extra_meta',''),
    })
resampled_df = pd.DataFrame(resampled_rows)
print('Resampled rows:', len(resampled_df))
print('sr/ch:', resampled_df['sr'].unique(), resampled_df['channels'].unique())
print('Label counts:', resampled_df['canonical_label'].value_counts().to_dict())


Resampled rows: 2060
sr/ch: [22050] [1]
Label counts: {'gunshot': 60, 'snoring': 40, 'rain': 40, 'insects': 40, 'laughing': 40, 'hen': 40, 'engine': 40, 'breathing': 40, 'crying_baby': 40, 'hand_saw': 40, 'coughing': 40, 'glass': 40, 'toilet_flush': 40, 'helicopter': 40, 'pig': 40, 'washing_machine': 40, 'clock_tick': 40, 'sneezing': 40, 'rooster': 40, 'sea_waves': 40, 'siren': 40, 'cat': 40, 'door_wood_creaks': 40, 'crickets': 40, 'drinking_sipping': 40, 'dog': 40, 'chirping_birds': 40, 'pouring_water': 40, 'vacuum_cleaner': 40, 'thunderstorm': 40, 'door_wood_knock': 40, 'can_opening': 40, 'crow': 40, 'clapping': 40, 'fireworks': 40, 'chainsaw': 40, 'airplane': 40, 'mouse_click': 40, 'train': 40, 'car_horn': 40, 'sheep': 40, 'water_drops': 40, 'church_bells': 40, 'clock_alarm': 40, 'keyboard_typing': 40, 'wind': 40, 'footsteps': 40, 'frog': 40, 'cow': 40, 'brushing_teeth': 40, 'crackling_fire': 40}
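The float-to-PCM step and the "reuse if exists" check can be illustrated without librosa. The sketch below, using only `numpy` and the standard `wave` module, writes a synthetic tone as 16-bit mono and then reads back the header, which is a cheap way to confirm that a cached file really is 22.05 kHz mono. The helper names `write_pcm16_mono` and `read_wav_header` are hypothetical, and unlike the cell above this version rounds before casting instead of truncating.

```python
import tempfile
import wave
from pathlib import Path

import numpy as np


def write_pcm16_mono(path, samples, sample_rate):
    """Write a float signal (nominally in [-1, 1]) as 16-bit little-endian mono WAV."""
    pcm = np.clip(np.round(samples * 32767.0), -32768, 32767).astype("<i2")
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # bytes per sample
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())


def read_wav_header(path):
    """Return basic format fields without decoding the audio data."""
    with wave.open(str(path), "rb") as wf:
        return {
            "sr": wf.getframerate(),
            "channels": wf.getnchannels(),
            "bit_depth": 8 * wf.getsampwidth(),
            "frames": wf.getnframes(),
        }


# One second of a 440 Hz tone at half amplitude (synthetic test signal).
sr = 22050
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)

with tempfile.TemporaryDirectory() as tmp:
    wav_path = Path(tmp) / "tone.wav"
    write_pcm16_mono(wav_path, tone, sr)
    header = read_wav_header(wav_path)

print(header)
```

Checking the header before reuse would catch a stale cache written at a different sample rate, which loading with `sr=TARGET_SR` would otherwise silently resample.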


## Listen: Before vs After
Randomly pick 5 samples and play the original (if it exists) next to the resampled audio.


In [6]:
samples = resampled_df.sample(n=min(5, len(resampled_df)), random_state=SEED)
for _, r in samples.iterrows():
    y_after, sr_after = load_audio(r, sr=TARGET_SR)
    # Strip the cache prefix to recover the original path under PROJECT_ROOT.
    raw_path = PROJECT_ROOT / Path(r['filepath']).relative_to(RESAMPLED_ROOT)
    y_before, sr_before = (None, None)
    if raw_path.exists():
        y_before, sr_before = load_audio({'filepath': raw_path}, sr=None)
    print('Sample:', r.get('canonical_label',''), '|', raw_path)
    if y_before is not None:
        display(Audio(y_before, rate=sr_before))
    else:
        print('No original audio found')
    display(Audio(y_after, rate=sr_after))
    print('-'*40)


Sample: airplane | /workspace/data/esc50/audio/4-161099-A-47.wav


----------------------------------------
Sample: pig | /workspace/data/esc50/audio/2-166644-A-2.wav


----------------------------------------
Sample: rooster | /workspace/data/esc50/audio/4-164064-B-1.wav


----------------------------------------
Sample: cow | /workspace/data/esc50/audio/3-160993-A-3.wav


----------------------------------------
Sample: sheep | /workspace/data/esc50/audio/1-121951-A-8.wav


----------------------------------------


## Fold assignment
Assign folds after resampling, stratified by canonical_label with weapon_id as the sub-key; missing folds are filled in by hash.


In [7]:
# Extract weapon_id for stratification
resampled_df['weapon_id'] = resampled_df['extra_meta'].str.extract(r'weapon_id=([^,]+)', expand=False).fillna('')
folded_df = stratified_folds(resampled_df, k=K_FOLDS, seed=SEED, group_key='canonical_label', sub_key='weapon_id')
fold_counts = folded_df.groupby(['canonical_label','fold_id']).size().reset_index(name='count').pivot(index='canonical_label', columns='fold_id', values='count').fillna(0).astype(int)
fold_counts['total'] = fold_counts.sum(axis=1)
display(fold_counts)


| canonical_label | fold 1 | fold 2 | fold 3 | fold 4 | fold 5 | total |
|---|---|---|---|---|---|---|
| airplane | 8 | 8 | 8 | 8 | 8 | 40 |
| breathing | 8 | 8 | 8 | 8 | 8 | 40 |
| brushing_teeth | 8 | 8 | 8 | 8 | 8 | 40 |
| can_opening | 8 | 8 | 8 | 8 | 8 | 40 |
| car_horn | 8 | 8 | 8 | 8 | 8 | 40 |
| cat | 8 | 8 | 8 | 8 | 8 | 40 |
| chainsaw | 8 | 8 | 8 | 8 | 8 | 40 |
| chirping_birds | 8 | 8 | 8 | 8 | 8 | 40 |
| church_bells | 8 | 8 | 8 | 8 | 8 | 40 |
| clapping | 8 | 8 | 8 | 8 | 8 | 40 |
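The implementation of `stratified_folds` lives in `src.meta_utils` and is not shown here. A minimal sketch of the same idea, assuming a round-robin assignment inside each label after ordering by sub-key, could look like the following. The function name `round_robin_folds` and the toy weapon ids are hypothetical, and the sketch assumes the DataFrame index is unique.

```python
import numpy as np
import pandas as pd


def round_robin_folds(df, k, seed, group_key, sub_key):
    """Assign fold_id in 1..k so each group (and sub-group) is spread across folds."""
    rng = np.random.default_rng(seed)
    # A random tiebreaker gives a reproducible shuffle inside each sub-group.
    keyed = df.assign(_rand=rng.random(len(df)))
    ordered = keyed.sort_values([group_key, sub_key, "_rand"])
    # Position of each row inside its label group: 0, 1, 2, ...
    position = ordered.groupby(group_key).cumcount()
    # Deal the rows out like cards: position 0 -> fold 1, 1 -> fold 2, ...
    ordered = ordered.assign(fold_id=position % k + 1)
    return ordered.drop(columns="_rand").sort_index()


# Toy data: 40 glass clips without a weapon, 60 gunshots over 6 made-up weapons.
demo = pd.DataFrame({
    "canonical_label": ["glass"] * 40 + ["gunshot"] * 60,
    "weapon_id": [""] * 40 + [f"w{i % 6}" for i in range(60)],
})
folded = round_robin_folds(demo, k=5, seed=0,
                           group_key="canonical_label", sub_key="weapon_id")
print(folded.groupby(["canonical_label", "fold_id"]).size().unstack())
```

Because rows of the same sub-group sit next to each other before being dealt out, each weapon's clips land in different folds as evenly as its count allows, which matches the 8-per-fold pattern in the table above for 40-clip labels with 5 folds.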


## Next Steps
- Run an energy analysis on glass/gunshot clips to tune the window.
- Configure windowing and augmentation; run the smoke and full cache builds via build_cache_index.
- Balance the folds with ratios 3:3:4 and print per-fold statistics.
- Sample clips for QA and export them as wav; save the balanced index for training.
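To make the 3:3:4 balancing step concrete, here is a small arithmetic sketch. It assumes that every label other than glass and gunshot counts as background (2060 - 40 - 60 = 1960 clips) and that balancing means taking the largest whole multiple of the ratio that every class can supply. Both assumptions, and the helper name `balanced_counts`, are illustrative rather than taken from the project code.

```python
def balanced_counts(available, ratios):
    """Largest per-class counts that follow `ratios` exactly and fit in `available`."""
    # The scarcest class, relative to its weight, limits the whole mix.
    unit = min(available[name] // weight for name, weight in ratios.items())
    return {name: unit * weight for name, weight in ratios.items()}


ratios = {"glass": 3, "gunshot": 3, "background": 4}
# Clip counts from the outputs above; the background figure is an assumption.
available = {"glass": 40, "gunshot": 60, "background": 1960}

targets = balanced_counts(available, ratios)
print(targets)                # {'glass': 39, 'gunshot': 39, 'background': 52}
print(sum(targets.values()))  # 130
```

Under these assumptions glass is the limiting class: 40 clips allow 13 units of weight 3, so a strict 3:3:4 mix uses only 130 of the 2060 clips unless the ratio is applied after windowing or augmentation.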
