
[CHORE] Voxtral and Phi4 ASR guidance#25

Merged
akshaykalkunte merged 1 commit into main from scratch/voxtral_phi4_guide
Jan 9, 2026

Conversation

Collaborator

@akshaykalkunte akshaykalkunte commented Jan 9, 2026

📌 Description

  • Adds Kubernetes configs to launch vLLM endpoints for Phi4-multimodal and Voxtral.
  • Adds run configs for the Phi4-multimodal and Voxtral models on the LibriSpeech ASR task.
  • Updates the README with instructions for launching the vLLM server.

🔗 Related Issue(s)

NA

🛠️ Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality including new tasks)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactor / Code cleanup
  • Maintenance / Chore / Task
  • Other (please describe):

✅ How Has This Been Tested?

  • Unit tests
  • Integration tests
  • Manual testing

Test Results / Screenshots (if applicable):

Phi4-multimodal score for LibriSpeech Test Clean

| Run | WER |
| --- | --- |
| Open ASR Leaderboard | 1.69 |
| Ours | 1.74 |

Voxtral-mini-3b score for LibriSpeech Test Clean

| Run | WER |
| --- | --- |
| Voxtral Paper | 1.86 |
| Ours | 2.0 |
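
For readers comparing the WER figures above, here is a minimal sketch of how a corpus-level word error rate is commonly computed: total word-level edit distance (substitutions, deletions, insertions) divided by the total number of reference words, expressed as a percentage. The function names `word_edit_distance` and `corpus_wer` are hypothetical and not taken from this repository. The sketch also assumes both transcripts are already normalized; the actual evaluation may apply a text normalizer first, which can account for small differences between reported scores.

```python
from typing import Iterable, List, Tuple


def word_edit_distance(ref_words: List[str], hyp_words: List[str]) -> int:
    """Levenshtein distance between two word sequences."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER in percent over (reference, hypothesis) pairs.

    Errors and reference lengths are summed over all utterances before
    dividing, rather than averaging per-utterance WER values.
    """
    total_errors = 0
    total_ref_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        hyp_words = hypothesis.split()
        total_errors += word_edit_distance(ref_words, hyp_words)
        total_ref_words += len(ref_words)
    if total_ref_words == 0:
        raise ValueError("references must contain at least one word")
    return 100.0 * total_errors / total_ref_words


if __name__ == "__main__":
    # Toy example only; these are not LibriSpeech transcripts.
    examples = [
        ("the cat sat on the mat", "the cat sat on mat"),
        ("hello world", "hello world"),
    ]
    print(f"WER: {corpus_wer(examples):.2f}")
```

Summing errors across the corpus weights long utterances more heavily than short ones, which is the usual convention for ASR benchmarks; averaging per-utterance WER would give a different number on the same outputs.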

📸 Screenshots / Demos

📋 Checklist

  • Code follows project style guidelines
  • Tests have been added/updated (if applicable)
  • Documentation has been updated (if applicable)
  • Linked relevant issue(s)
  • Self-reviewed my code

🙌 Additional Notes

Collaborator

@nhhoang96 nhhoang96 left a comment


LGTM

@akshaykalkunte akshaykalkunte merged commit 99ac7bc into main Jan 9, 2026
@akshaykalkunte akshaykalkunte deleted the scratch/voxtral_phi4_guide branch January 9, 2026 03:06
nhhoang96 added a commit that referenced this pull request Apr 18, 2026
* add gpqa diamond

* Update constants.py (#18)

* updating turn handling for multi-turn evals

* feat: Add Gemini support (#15)

* add spokenwoz speech and text (#24)

* add vllm configs and readme (#21)

* added phonetics, speech_disorder, and speech_enhancement tasks - stil… (#22)

* added phonetics, speech_disorder, and speech_enhancement tasks - still in need of full model scoring. Fixed small inconsistency bug in config by changing judge_properties to judge_settings.

* Update the correct HF path for noise_detection task

* updated scores

---------

Co-authored-by: hoang <huuhoang.nguyen@servicenow.com>

* voxtral and phi4 guidance (#25)

* Keeping normalizer up-to-date with Whisper-normalizer for ASR (#27)

* add gpqa diamond

---------

Co-authored-by: oluwanifemibamgbose <oluwanifemi.bamgbose@servicenow.com>
Co-authored-by: khyatimahajan <khyati.mahajan@servicenow.com>
Co-authored-by: Khyati Mahajan <mahajan.khyati@gmail.com>
Co-authored-by: Akshay Kalkunte <akshay.kalkunte@servicenow.com>
Co-authored-by: Jash Mehta <jash.mehta@servicenow.com>
Co-authored-by: Sidharth Surapaneni <40740959+pcsid@users.noreply.github.com>
Co-authored-by: hoang <huuhoang.nguyen@servicenow.com>
Co-authored-by: hoang <hnguy7@uic.edu>

2 participants