Keeping normalizer up-to-date with Whisper-normalizer for ASR by nhhoang96 · Pull Request #27 · ServiceNow/AU-Harness

nhhoang96 · 2026-01-13T19:54:41Z

📌 Description

Fixes the post-processing applied before WER computation. The post-processing in EnglishNormalizer deviated slightly from the standard whisper-normalizer. This PR brings it in line with the current standard normalizer.
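To illustrate why the normalizer matters for the metric, here is a minimal, self-contained sketch of WER computed with and without text normalization. `toy_normalize` is a hypothetical stand-in written for this example, not the Whisper normalizer or the project's EnglishNormalizer, and `wer` is a plain word-level edit distance, not necessarily how the harness computes it.

```python
import re


def toy_normalize(text: str) -> str:
    """Hypothetical stand-in normalizer (NOT the Whisper normalizer):
    lowercase, replace punctuation other than apostrophes with spaces,
    and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)
    return " ".join(text.split())


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


reference = "Hello, world! It is fine."
hypothesis = "hello world it is fine"

raw = wer(reference, hypothesis)
normalized = wer(toy_normalize(reference), toy_normalize(hypothesis))
print(raw, normalized)
```

Without normalization, casing and punctuation differences count as substitutions, so any deviation in the normalizer changes the reported WER even when the transcript is otherwise identical.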

🔗 Related Issue(s)

🛠️ Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality including new tasks)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Refactor / Code cleanup
Maintenance / Chore / Task
Other (please describe):

✅ How Has This Been Tested?

Specific unit testing on cases where bugs were reported.
Integration testing by re-running LibriSpeech test-clean
Unit tests
Integration tests
Manual testing

Test Results / Screenshots (if applicable):

📸 Screenshots / Demos

📋 Checklist

Code follows project style guidelines
Tests have been added/updated (if applicable)
Documentation has been updated (if applicable)
Linked relevant issue(s)
Self-reviewed my code

🙌 Additional Notes

akshaykalkunte

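The problem was that the backreference in the replacement string was not escaped:

r"(\w+)'m\b": "\1 am" is incorrect, because in a non-raw Python string "\1" is an octal escape for the character \x01 rather than a backreference to group 1. r"(\w+)'m\b": "\\1 am" is the correct usage.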

Using the normalizers from Whisper without any changes is a good idea. So the changes look good to me.
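The difference between the two replacement strings can be reproduced with a short sketch. The sample text and variable names are made up for this illustration; only the pattern and the two replacement strings come from the comment above.

```python
import re

text = "i'm sure i'm right"      # made-up sample input
pattern = r"(\w+)'m\b"

# In a non-raw string, "\1" is the octal escape for chr(1), so re.sub
# receives a control character instead of a backreference.
broken = re.sub(pattern, "\1 am", text)

# "\\1" (or the raw string r"\1") passes a backslash followed by 1,
# which re.sub interprets as "insert capture group 1".
fixed = re.sub(pattern, "\\1 am", text)
fixed_raw = re.sub(pattern, r"\1 am", text)

print(repr(broken))
print(repr(fixed))
```

With the unescaped form the captured word is dropped and replaced by an invisible control character, which then counts as an error in the WER comparison.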

* add gpqa diamond
* Update constants.py (#18)
* updating turn handling for multi-turn evals
* feat: Add Gemini support (#15)
* add spokenwoz speech and text (#24)
* add vllm configs and readme (#21)
* added phonetics, speech_disorder, and speech_enhancement tasks - stil… (#22)
* added phonetics, speech_disorder, and speech_enhancement tasks - still in need of full model scoring. Fixed small inconsistency bug in config by changing judge_properties to judge_settings.
* Update the correct HF path for noise_detection task
* updated scores

---------

Co-authored-by: hoang <huuhoang.nguyen@servicenow.com>

* voxtral and phi4 guidance (#25)
* Keeping normalizer up-to-date with Whisper-normalizer for ASR (#27)
* add gpqa diamond

---------

Co-authored-by: oluwanifemibamgbose <oluwanifemi.bamgbose@servicenow.com>
Co-authored-by: khyatimahajan <khyati.mahajan@servicenow.com>
Co-authored-by: Khyati Mahajan <mahajan.khyati@gmail.com>
Co-authored-by: Akshay Kalkunte <akshay.kalkunte@servicenow.com>
Co-authored-by: Jash Mehta <jash.mehta@servicenow.com>
Co-authored-by: Sidharth Surapaneni <40740959+pcsid@users.noreply.github.com>
Co-authored-by: hoang <huuhoang.nguyen@servicenow.com>
Co-authored-by: hoang <hnguy7@uic.edu>

Keeping normalizer up-to-date with Whisper-normalizer for ASR

d6b887a

nhhoang96 requested review from akshaykalkunte and pcsid January 13, 2026 19:58

akshaykalkunte approved these changes Jan 13, 2026

akshaykalkunte merged commit daa0616 into main Jan 13, 2026

akshaykalkunte deleted the bug/wer-normalizer branch January 13, 2026 21:34

akshaykalkunte merged 1 commit into main from bug/wer-normalizer
