Computes standard phonetic alignment quality metrics by comparing an automatically aligned Praat TextGrid against a human-annotated reference.
| Metric | Description | Key Reference |
|---|---|---|
| Boundary Displacement | Absolute time difference (ms) between each reference boundary and the nearest hypothesis boundary. Reports mean, median, std, and percentage within 10/20/25/50/100 ms thresholds. | McAuliffe et al. (2017) |
| Intersection over Union (IoU) | Temporal overlap between matched segments, computed per phone/word. | Gonzalez et al. (2020) |
| Phone Error Rate (PER) | Levenshtein edit distance between phone label sequences, normalised by reference length. | Standard ASR evaluation |
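The PER row above is a plain dynamic-programming Levenshtein distance normalised by reference length. A minimal standalone sketch of that computation (illustrative only, not the project's `metrics.py` implementation):

```python
def phone_error_rate(ref, hyp):
    """Levenshtein edit distance between two phone-label sequences,
    normalised by the reference length (standard ASR-style formulation)."""
    m, n = len(ref), len(hyp)
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # deleting all of ref[:i]
    for j in range(n + 1):
        dp[0][j] = j  # inserting all of hyp[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[m][n] / max(m, 1)
```

One substitution against a three-phone reference, e.g. `["k", "ae", "t"]` vs. `["k", "ah", "t"]`, yields a PER of 1/3.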
```
alignment-eval-project/
├── .vscode/
│   └── launch.json        # VS Code debug configurations
├── data/
│   └── examples/          # Demo TextGrid files (auto-generated)
├── outputs/
│   ├── logs/              # Timestamped log files
│   └── reports/           # Evaluation reports (txt + csv)
├── src/
│   ├── __init__.py        # Package metadata
│   ├── __main__.py        # python -m src entry point
│   ├── main.py            # CLI and evaluation pipeline
│   ├── loader.py          # TextGrid loading utilities
│   ├── metrics.py         # Metric computations
│   ├── reporting.py       # Report formatting and file output
│   ├── log_config.py      # Logging setup
│   └── demo.py            # Demo TextGrid generator
├── tests/                 # (placeholder for unit tests)
├── .gitignore
├── requirements.txt
└── README.md
```
```bash
# Clone the repository
git clone <repo-url>
cd alignment-eval-project

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate    # Linux/macOS
# or: venv\Scripts\activate # Windows

# Install dependencies
pip install -r requirements.txt
```

```bash
python -m src --demo
```

This generates synthetic reference and hypothesis TextGrid files in `data/examples/`, runs the evaluation, prints results to the console, and saves reports to `outputs/reports/` and logs to `outputs/logs/`.
```bash
# Basic evaluation (phone tier)
python -m src --ref data/my_reference.TextGrid --hyp data/my_hypothesis.TextGrid --tier phones

# Word-level evaluation
python -m src --ref data/my_reference.TextGrid --hyp data/my_hypothesis.TextGrid --tier words

# Save report and CSV to outputs/reports/
python -m src --ref data/my_reference.TextGrid --hyp data/my_hypothesis.TextGrid --tier phones --save

# Exclude silence-adjacent boundaries
python -m src --ref data/my_reference.TextGrid --hyp data/my_hypothesis.TextGrid --tier phones --exclude-silence --save
```

Open the project in VS Code and use the debug configurations in `.vscode/launch.json`:
- Run Demo — runs with built-in example data
- Evaluate Phones — evaluates the example phone tier
- Evaluate Words — evaluates the example word tier
- Evaluate (Pick Files) — prompts you for file paths and tier name
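The exact contents of `.vscode/launch.json` are project-specific, but a minimal configuration along these lines (the `debugpy` type and argument values here are assumptions, not copied from the repository) would reproduce the Run Demo entry:

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Run Demo",
      "type": "debugpy",
      "request": "launch",
      "module": "src",
      "args": ["--demo"],
      "cwd": "${workspaceFolder}"
    }
  ]
}
```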
```python
from src.main import evaluate_alignment

results = evaluate_alignment(
    ref_path="data/reference.TextGrid",
    hyp_path="data/hypothesis.TextGrid",
    tier_name="phones",
)

# Access individual metrics
bd = results["boundary_displacement"]
print(f"Median displacement: {bd['median_ms']:.1f} ms")
print(f"Within 25 ms: {bd['pct_within_25ms']:.0f}%")

iou = results["iou"]
print(f"Mean IoU: {iou['mean_iou']:.3f}")

per = results["phone_error_rate"]
print(f"PER: {per['per']:.1%}")
```

- Boundary displacement: MFA typically achieves ~12–17 ms median on English benchmarks (Buckeye, TIMIT). Human inter-annotator agreement is ~10–13 ms. If your aligner is within this range, it is performing at near-human level.
- IoU: 1.0 = perfect overlap. Values above 0.8 are generally good for phone-level alignment.
- PER: 0.0 = perfect label match. Analogous to Word Error Rate (WER) in ASR evaluation.
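For a single matched segment pair, temporal IoU reduces to overlap length divided by union length. A standalone sketch (illustrative, not the project's `metrics.py` code), with intervals given as `(start, end)` tuples in seconds:

```python
def interval_iou(a, b):
    """Temporal intersection-over-union of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))  # overlap length
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter        # combined extent
    return inter / union if union > 0 else 0.0
```

Two half-overlapping one-second segments, e.g. `(0.0, 1.0)` and `(0.5, 1.5)`, share 0.5 s out of a 1.5 s union, giving an IoU of 1/3.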
- McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., & Sonderegger, M. (2017). Montreal Forced Aligner: Trainable text-speech alignment using Kaldi. Interspeech 2017, 498–502.
- Gonzalez, S., Grama, J., & Travis, C. E. (2020). Comparing the performance of forced aligners used in sociophonetic research. Linguistics Vanguard, 6(1).
- Kelley, M. C., et al. (2023). MAPS: A Mason-Alberta Phonetic Segmenter. Interspeech 2023.
- Rousso, T., et al. (2024). Evaluating forced alignment tools. Speech Communication.
MIT