Skip to content

Fix streaming transcribe#576

Merged
HenryNdubuaku merged 2 commits into
mainfrom
fix-streaming-transcribe
Apr 14, 2026
Merged

Fix streaming transcribe#576
HenryNdubuaku merged 2 commits into
mainfrom
fix-streaming-transcribe

Conversation

@jakmro
Copy link
Copy Markdown
Collaborator

@jakmro jakmro commented Apr 13, 2026

No description provided.

@jakmro jakmro marked this pull request as ready for review April 13, 2026 21:53
Copilot AI review requested due to automatic review settings April 13, 2026 21:53
jakmro added 2 commits April 13, 2026 23:53
Signed-off-by: jakmro <kubamroz124@gmail.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds coverage and fixes for streaming speech-to-text behavior, particularly around Parakeet streaming semantics and token piece decoding, to address issues with “streaming transcribe”.

Changes:

  • Add a new stream_transcription test that feeds PCM audio in chunks and consumes confirmed/pending streaming results.
  • Adjust Parakeet model-type handling in cactus_stream_transcribe_process/stop (apply certain Parakeet-specific streaming heuristics only to PARAKEET_TDT).
  • Update SPTokenizer::decode() to better support single-token “piece” decoding (used by streaming/token callbacks).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
tests/test_stt.cpp Adds a new streaming transcription test and wires it into the test runner.
cactus/ffi/cactus_stream.cpp Refines Parakeet streaming branches and stop/flush behavior to fix streaming transcription output.
cactus/engine/engine_sp.cpp Tweaks SentencePiece-style decoding for single-token decode calls used in streaming output.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_stt.cpp
size_t word_count = 0;
bool in_word = false;
for (char c : full_transcription) {
if (std::isspace(c)) {
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::isspace(c) is undefined behavior when char is signed and c is negative. Cast to unsigned char (or use a locale-aware approach) before calling std::isspace to avoid UB on non-ASCII bytes.

Suggested change
if (std::isspace(c)) {
if (std::isspace(static_cast<unsigned char>(c))) {

Copilot uses AI. Check for mistakes.
Comment thread tests/test_stt.cpp
Comment on lines +695 to +705
std::cout << "\n[Results]\n"
<< " \"success\": true,\n"
<< " \"total_time_ms\": " << std::fixed << std::setprecision(2) << elapsed << ",\n"
<< " \"audio_chunks\": " << ((pcm_samples.size() + chunk_size - 1) / chunk_size) << ",\n"
<< " \"pcm_samples\": " << pcm_samples.size() << ",\n"
<< " \"duration_sec\": " << std::setprecision(2) << (pcm_samples.size() / 16000.0) << ",\n"
<< " \"words_transcribed\": " << word_count << "\n"
<< "├─ Full transcription: \"" << full_transcription << "\"" << std::endl;

cactus_destroy(model);
return true;
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test currently returns true even if no transcript is produced (e.g., empty full_transcription / word_count == 0 / JSON success false). To actually catch regressions in streaming, add at least one assertion that some confirmed text/words were produced (or that the response JSON indicates success and non-empty output).

Copilot uses AI. Check for mistakes.
Comment thread tests/test_stt.cpp
return false;
}

std::string audio_path = std::string(g_assets_path) + "/test_long.wav";
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

g_assets_path is used to build audio_path without checking it is set. If CACTUS_TEST_ASSETS is unset this will construct std::string(nullptr) and crash; please add a skip/early-fail similar to other tests in this file that require assets.

Copilot uses AI. Check for mistakes.
Comment thread tests/test_stt.cpp
Comment on lines +619 to +624
fseek(wav_file, 44, SEEK_SET);
std::vector<int16_t> pcm_samples;
int16_t sample;
while (fread(&sample, sizeof(int16_t), 1, wav_file) == 1) {
pcm_samples.push_back(sample);
}
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The WAV parsing assumes a fixed 44-byte header and raw int16 PCM data. This will misread files with extra chunks (e.g., LIST/fact) or non-16bit/mono formats; consider using the existing WAV loader/resampler utilities used in other tests so the input format is validated and converted to 16kHz PCM as needed.

Copilot uses AI. Check for mistakes.
@HenryNdubuaku HenryNdubuaku merged commit 130a60d into main Apr 14, 2026
7 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants