# Augmentation Playground
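A notebook for experimenting with augmentation parameters and immediately checking the resulting audio/spectrum.
1. Apply noise insertion to an individual Zeroth sample while tuning the parameters.
2. Load an already generated `augmented_meta.jsonl` and play back/analyze the audio.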

In [None]:
from pathlib import Path
import json
import random
import numpy as np
import pandas as pd
import soundfile as sf
import matplotlib.pyplot as plt
from IPython.display import Audio, display

from src.utils.config_loader import load_yaml
from src.modules.audio_processor import AudioProcessor
from src.modules.noise_selector import NoiseSelector
from src.pipeline.step_02_augment import (
    AugmentationConfig,
    collect_candidates,
    choose_gap,
    uniform_duration,
    process_record,
)

%matplotlib inline
project_root = Path('..').resolve()


In [None]:
config = load_yaml(project_root / 'configs/default_config.yaml')
paths_cfg = config['paths']
synth_cfg = config['synthesis']
print('Loaded configuration.')


In [None]:
split = 'test'  # can be changed to 'train'
alignment_path = (project_root / paths_cfg['label_dir']) / split / 'raw_alignment.jsonl'
records = []
with alignment_path.open('r', encoding='utf-8') as fh:
    for line in fh:
        if not line.strip():
            continue
        rec = json.loads(line)
        records.append(rec)

meta_path = (project_root / 'data/labels') / split / 'augmented_meta.jsonl'
meta = pd.read_json(meta_path, lines=True)
meta_ok = meta[meta['status'] == 'ok'].reset_index(drop=True)
print(f'total records: {len(meta)}, ok: {len(meta_ok)}')
meta_ok.head()
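
The blank-line-tolerant loop used above for `raw_alignment.jsonl` can be factored into a small helper. This is a minimal sketch; `read_jsonl` is a hypothetical name, not part of the project's `src` package, and it adds one thing the loop above does not do: it reports the line number of a malformed record.

```python
import json
from pathlib import Path
from typing import Iterator


def read_jsonl(path) -> Iterator[dict]:
    """Yield one parsed object per non-blank line of a JSON Lines file."""
    with Path(path).open('r', encoding='utf-8') as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines, as the loop above does
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Point at the offending line instead of a bare decode error.
                raise ValueError(f'{path}:{line_no}: invalid JSON ({exc.msg})') from exc
```

With it, the loading step reduces to `records = list(read_jsonl(alignment_path))`.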


In [None]:
sample_idx = 0  # change to the meta_ok index you want to listen to
selected = meta_ok.loc[sample_idx]
sample_id = selected['sample_id']
record_candidates = [r for r in records if r.get('sample_id') == sample_id]
if not record_candidates:
    raise RuntimeError('No alignment record matches the selected sample_id.')
record = record_candidates[0]
print(f'Selected sample_id: {sample_id}, augmented index: {sample_idx}')
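
The list comprehension above scans every alignment record for each lookup. When browsing many samples, grouping the records by `sample_id` once may be more convenient. A minimal sketch, where `index_by_sample_id` is a hypothetical helper and the only assumption is that records are dicts with an optional `'sample_id'` key, as in the cell above:

```python
from collections import defaultdict


def index_by_sample_id(records):
    """Group alignment records by their 'sample_id' field."""
    index = defaultdict(list)
    for rec in records:
        sid = rec.get('sample_id')
        if sid is not None:  # skip records without an id
            index[sid].append(rec)
    return dict(index)
```

A lookup then becomes `index_by_sample_id(records).get(sample_id, [])`, which returns an empty list when nothing matches.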


In [None]:
custom_params = {
    'min_gap_sec': 0.5,
    'insertion_min_sec': 1.5,
    'insertion_max_sec': 3.0,
    'crossfade_sec': 0.05,
    'context_window_sec': 0.75,
    'target_snr_db': 12.0,
    'loudness_target_lufs': -23.0,
    'true_peak_dbfs': -1.0,
    'insertions_per_file': 1,
    'rng_seed': 2025,
    'noise_categories': None,
}
cfg = AugmentationConfig(**custom_params)
cfg
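
To build intuition for `target_snr_db`, the sketch below scales a noise clip so that the speech-to-noise power ratio hits a requested value. It is an illustration of the standard SNR definition only: `scale_noise_to_snr` is a hypothetical function, and how `AudioProcessor` actually measures speech level (for example over `context_window_sec`) is not shown in this notebook.

```python
import numpy as np


def scale_noise_to_snr(speech, noise, target_snr_db):
    """Return `noise` scaled so that 10*log10(P_speech / P_noise) == target_snr_db."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    p_speech = np.mean(speech ** 2)  # mean power of the reference speech
    p_noise = np.mean(noise ** 2)    # mean power of the unscaled noise
    if p_speech == 0.0 or p_noise == 0.0:
        raise ValueError('speech and noise must both have non-zero power')
    # Solve P_speech / (gain^2 * P_noise) = 10^(snr/10) for gain.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    return noise * gain
```

With `target_snr_db = 12.0`, the scaled noise carries roughly 1/16 of the speech power, since 10^(12/10) ≈ 15.8.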


In [None]:
audio_processor = AudioProcessor(
    sample_rate=16000,
    crossfade_sec=cfg.crossfade_sec,
    loudness_target_lufs=cfg.loudness_target_lufs,
    true_peak_dbfs=cfg.true_peak_dbfs,
    context_window_sec=cfg.context_window_sec,
)
noise_selector = NoiseSelector(
    catalog_path=project_root / paths_cfg['noise_catalog'],
    noise_root=project_root / paths_cfg['noise_dir'],
    rng=random.Random(cfg.rng_seed),
    allow_categories=cfg.noise_categories,
)
rng = random.Random(cfg.rng_seed)
input_audio_root = (project_root / paths_cfg['input_audio_dir']).resolve()
aug_dir = (project_root / paths_cfg['output_dir']) / split
aug_dir.mkdir(parents=True, exist_ok=True)

child_seed = rng.randint(0, 2**31 - 1)
result = process_record(
    record=record,
    audio_processor=audio_processor,
    noise_selector=noise_selector,
    rng=random.Random(child_seed),
    seed=child_seed,
    config=cfg,
    input_audio_root=input_audio_root,
    augmented_audio_dir=aug_dir,
)
result
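
The cell above draws `child_seed` from a parent RNG seeded with `rng_seed`, then gives the record its own `random.Random(child_seed)`. The sketch below shows the same pattern for several records; `derive_child_seeds` is a hypothetical helper, not a function from `step_02_augment`.

```python
import random


def derive_child_seeds(parent_seed, count):
    """Draw `count` child seeds from a parent RNG, in a reproducible order."""
    parent = random.Random(parent_seed)
    # Same range as the cell above: a non-negative 31-bit integer.
    return [parent.randint(0, 2**31 - 1) for _ in range(count)]
```

Because each record gets an independent generator, rerunning with the same `rng_seed` and the same record order reproduces the same per-record seeds, regardless of how many random draws any single record consumes.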


In [None]:
if result['status'] != 'ok':
    msg = 'Augmentation skipped or failed: {} - {}'.format(result['status'], result.get('error_msg'))
    raise RuntimeError(msg)
orig_path = Path(result['original_audio_path'])
if not orig_path.is_absolute():
    orig_path = (project_root / orig_path).resolve()
aug_path = Path(result['augmented_audio_path'])
if not aug_path.is_absolute():
    aug_path = (project_root / aug_path).resolve()
orig_wave, sr = sf.read(orig_path, dtype='float32')
aug_wave, _ = sf.read(aug_path, dtype='float32')

plt.figure(figsize=(12, 3))
plt.plot(orig_wave)
plt.title('Original waveform')
plt.xlim(0, len(orig_wave))
plt.show()

plt.figure(figsize=(12, 3))
plt.plot(aug_wave, color='orange')
plt.title('Augmented waveform')
plt.xlim(0, len(aug_wave))
plt.show()

display(Audio(orig_wave, rate=sr))
display(Audio(aug_wave, rate=sr))
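
The waveform plots above show where energy was added but not its frequency content. A log-magnitude spectrogram makes the inserted noise easier to spot. The following is a minimal NumPy sketch; `log_spectrogram` and its defaults (`n_fft=512`, `hop=128`) are illustrative choices, not project settings.

```python
import numpy as np


def log_spectrogram(wave, n_fft=512, hop=128):
    """Return a (n_fft // 2 + 1, n_frames) magnitude spectrogram in dB."""
    wave = np.asarray(wave, dtype=np.float64)
    if wave.ndim > 1:
        wave = wave.mean(axis=1)  # down-mix (frames, channels) to mono
    if len(wave) < n_fft:
        wave = np.pad(wave, (0, n_fft - len(wave)))  # ensure at least one frame
    n_frames = 1 + (len(wave) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [wave[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T  # (freq bins, frames)
    return 20.0 * np.log10(magnitude + 1e-10)  # small offset avoids log(0)
```

The result can be shown with `plt.imshow(log_spectrogram(aug_wave), origin='lower', aspect='auto')`. Matplotlib's built-in `plt.specgram(aug_wave, Fs=sr)` is a shorter alternative when the raw array is not needed.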


In [None]:
orig_words = record.get('alignment', {}).get('words', [])
if orig_words:
    df_orig = pd.DataFrame(orig_words)
    display(df_orig[['w', 'start', 'end']].rename(columns={'w': 'word'}))
else:
    print('No original alignment data available.')

updated_words = result.get('updated_segments', [])
if updated_words:
    df_updated = pd.DataFrame(updated_words)
    display(df_updated[['w', 'start', 'end']].rename(columns={'w': 'word'}))
else:
    print('No updated segments available (likely skipped).')
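
When comparing the two tables above, it helps to know what a shifted alignment should look like. The sketch below assumes the simplest case: noise is inserted in a gap between words, so every word that starts at or after the insertion point moves later by the inserted duration. `shift_words`, `insert_at`, and `inserted_sec` are hypothetical names, and the rule `process_record` really applies is not shown in this notebook.

```python
def shift_words(words, insert_at, inserted_sec):
    """Return copies of `words` with timestamps shifted past an insertion point."""
    shifted = []
    for word in words:
        moved = dict(word)  # copy so the original alignment stays untouched
        if word['start'] >= insert_at:
            moved['start'] = word['start'] + inserted_sec
            moved['end'] = word['end'] + inserted_sec
        shifted.append(moved)
    return shifted
```

If the updated table differs from this expectation by more than a few milliseconds, the crossfade handling is a plausible place to look first.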


## Inspecting Previously Generated Augmentation Results

In [None]:
meta_ok.head()


In [None]:
row_index = 0  # the meta_ok index you want to listen to
row = meta_ok.loc[row_index]
aug_path = Path(row['augmented_audio_path'])
if not aug_path.is_absolute():
    aug_path = (project_root / aug_path).resolve()
wave, sr = sf.read(aug_path, dtype='float32')
plt.figure(figsize=(12, 3))
plt.plot(wave)
plt.title('Augmented sample: {}'.format(row['aug_id']))
plt.show()
display(Audio(wave, rate=sr))
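
A quick numeric check can complement listening. The sketch below computes the sample peak in dBFS, which can be compared against a ceiling such as the `true_peak_dbfs` value of -1.0 used earlier. `sample_peak_dbfs` is a hypothetical helper, and a sample peak is only a lower bound on the true peak: inter-sample peaks can be higher, so a value under the ceiling does not prove compliance.

```python
import numpy as np


def sample_peak_dbfs(wave):
    """Return the largest absolute sample value in dB relative to full scale."""
    wave = np.asarray(wave, dtype=np.float64)
    if wave.size == 0:
        return float('-inf')
    peak = float(np.max(np.abs(wave)))
    if peak == 0.0:
        return float('-inf')  # digital silence has no finite level
    return float(20.0 * np.log10(peak))
```

For example, `sample_peak_dbfs(wave) > -1.0` flags a file whose samples already exceed the ceiling.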


In [None]:
# Raw alignment for the selected sample
alignment_records = [r for r in records if r.get('sample_id') == row['sample_id']]
words = alignment_records[0].get('alignment', {}).get('words', []) if alignment_records else []
if words:
    display(pd.DataFrame(words)[['w', 'start', 'end']].rename(columns={'w': 'word'}))
else:
    # Covers both a missing record and a record with an empty word list.
    print('No alignment data for selected sample')
