Fix streaming transcribe by jakmro · Pull Request #576 · cactus-compute/cactus

jakmro · 2026-04-13T21:32:35Z

No description provided.

Signed-off-by: jakmro <kubamroz124@gmail.com>

Copilot

Pull request overview

Adds coverage and fixes for streaming speech-to-text behavior, particularly around Parakeet streaming semantics and token piece decoding, to address issues with “streaming transcribe”.

Changes:

Add a new stream_transcription test that feeds PCM audio in chunks and consumes confirmed/pending streaming results.
Adjust Parakeet model-type handling in cactus_stream_transcribe_process/stop (apply certain Parakeet-specific streaming heuristics only to PARAKEET_TDT).
Update SPTokenizer::decode() to better support single-token “piece” decoding (used by streaming/token callbacks).
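
To illustrate why single-token decoding matters for streaming, here is a minimal sketch of SentencePiece-style piece decoding. It is not the code in cactus/engine/engine_sp.cpp: the function name decode_pieces and the rule of stripping the leading space only for multi-piece input are assumptions made for illustration. The only facts taken from this PR are that single-token decode should preserve the leading space, and that the U+2581 marker stands for a word boundary in SentencePiece vocabularies.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not the project's SPTokenizer::decode().
// Joins pieces and maps the SentencePiece word-boundary marker (U+2581)
// to an ASCII space. A single piece keeps its leading space so that
// pieces emitted one at a time can be concatenated by the caller.
std::string decode_pieces(const std::vector<std::string>& pieces) {
    static const std::string kMarker = "\xE2\x96\x81";  // UTF-8 for U+2581

    std::string text;
    for (const std::string& piece : pieces) {
        text += piece;
    }

    std::size_t pos = 0;
    while ((pos = text.find(kMarker, pos)) != std::string::npos) {
        text.replace(pos, kMarker.size(), " ");
        ++pos;  // continue after the inserted space
    }

    // Assumption: only a full-sequence decode drops the leading space.
    if (pieces.size() > 1 && !text.empty() && text.front() == ' ') {
        text.erase(0, 1);
    }
    return text;
}
```

With this rule, a streaming consumer that appends each decoded piece to a running string keeps the spaces between words, while a one-shot decode of the whole sequence still starts without a leading space.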

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File	Description
tests/test_stt.cpp	Adds a new streaming transcription test and wires it into the test runner.
cactus/ffi/cactus_stream.cpp	Refines Parakeet streaming branches and stop/flush behavior to fix streaming transcription output.
cactus/engine/engine_sp.cpp	Tweaks SentencePiece-style decoding for single-token decode calls used in streaming output.
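
The chunked feeding that the new test performs can be sketched as follows. This is a minimal illustration, assuming a callback in place of the real streaming call: feed_in_chunks and on_chunk are hypothetical names, and the actual signature of cactus_stream_transcribe_process is not shown in this PR page, so it is not reproduced here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper: hands the PCM buffer to on_chunk in slices of at
// most chunk_size samples and returns how many slices were delivered.
// on_chunk stands in for whatever call pushes audio into the stream.
std::size_t feed_in_chunks(
    const std::vector<int16_t>& samples,
    std::size_t chunk_size,
    const std::function<void(const int16_t*, std::size_t)>& on_chunk) {
    if (chunk_size == 0) {
        return 0;  // nothing sensible to do with an empty chunk size
    }
    std::size_t chunks = 0;
    for (std::size_t offset = 0; offset < samples.size(); offset += chunk_size) {
        // The final slice may be shorter than chunk_size.
        const std::size_t count = std::min(chunk_size, samples.size() - offset);
        on_chunk(samples.data() + offset, count);
        ++chunks;
    }
    return chunks;
}
```

The returned count equals the rounded-up division (samples.size() + chunk_size - 1) / chunk_size that the test prints as "audio_chunks".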


Copilot · 2026-04-13T21:58:24Z

+    size_t word_count = 0;
+    bool in_word = false;
+    for (char c : full_transcription) {
+        if (std::isspace(c)) {


std::isspace(c) is undefined behavior when char is signed and c is negative. Cast to unsigned char (or use a locale-aware approach) before calling std::isspace to avoid UB on non-ASCII bytes.

Suggested change

-        if (std::isspace(c)) {
+        if (std::isspace(static_cast<unsigned char>(c))) {
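
For context, the whole word-counting loop with the cast applied might look like the sketch below. Only the first four lines of the loop appear in the diff; the function name count_words and the else branch are assumptions about how the rest of the loop is written.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Counts whitespace-separated words in a byte string.
// Each char is converted to unsigned char before the std::isspace call,
// so bytes >= 0x80 (for example UTF-8 continuation bytes) are never
// passed to it as negative values.
std::size_t count_words(const std::string& text) {
    std::size_t word_count = 0;
    bool in_word = false;
    for (char c : text) {
        if (std::isspace(static_cast<unsigned char>(c))) {
            in_word = false;
        } else if (!in_word) {
            in_word = true;  // first byte of a new word
            ++word_count;
        }
    }
    return word_count;
}
```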

Copilot · 2026-04-13T21:58:24Z

+    std::cout << "\n[Results]\n"
+              << "  \"success\": true,\n"
+              << "  \"total_time_ms\": " << std::fixed << std::setprecision(2) << elapsed << ",\n"
+              << "  \"audio_chunks\": " << ((pcm_samples.size() + chunk_size - 1) / chunk_size) << ",\n"
+              << "  \"pcm_samples\": " << pcm_samples.size() << ",\n"
+              << "  \"duration_sec\": " << std::setprecision(2) << (pcm_samples.size() / 16000.0) << ",\n"
+              << "  \"words_transcribed\": " << word_count << "\n"
+              << "├─ Full transcription: \"" << full_transcription << "\"" << std::endl;
+
+    cactus_destroy(model);
+    return true;


This test currently returns true even if no transcript is produced (e.g., empty full_transcription / word_count == 0 / JSON success false). To actually catch regressions in streaming, add at least one assertion that some confirmed text/words were produced (or that the response JSON indicates success and non-empty output).
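
One way to add such an assertion is sketched below. The helper name stream_result_ok is hypothetical, and the sketch assumes the test already holds full_transcription and word_count as shown in the diff above.

```cpp
#include <cstddef>
#include <iostream>
#include <string>

// Hypothetical check: reports failure when streaming produced no
// confirmed text, so an empty transcript cannot pass silently.
bool stream_result_ok(const std::string& full_transcription,
                      std::size_t word_count) {
    if (full_transcription.empty()) {
        std::cerr << "stream_transcription: empty transcript\n";
        return false;
    }
    if (word_count == 0) {
        std::cerr << "stream_transcription: no words transcribed\n";
        return false;
    }
    return true;
}
```

The test could then end with `return stream_result_ok(full_transcription, word_count);` after `cactus_destroy(model)` instead of an unconditional `return true;`.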

Copilot · 2026-04-13T21:58:25Z

+        return false;
+    }
+
+    std::string audio_path = std::string(g_assets_path) + "/test_long.wav";


g_assets_path is used to build audio_path without checking that it is set. If CACTUS_TEST_ASSETS is unset, this constructs a std::string from a null pointer, which is undefined behavior and typically crashes. Please add a skip or early failure similar to the other tests in this file that require assets.
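
A guard of this kind could look like the following sketch. The helper build_asset_path is a hypothetical name, and the sketch assumes g_assets_path is a const char* that is null when the environment variable is unset; how the other tests in the file skip is not shown on this page.

```cpp
#include <string>

// Hypothetical helper: joins an assets directory and a file name.
// Returns false, leaving out_path untouched, when the directory pointer
// is null or empty, so no std::string is ever built from nullptr.
bool build_asset_path(const char* assets_dir,
                      const std::string& file_name,
                      std::string& out_path) {
    if (assets_dir == nullptr || *assets_dir == '\0') {
        return false;
    }
    out_path = std::string(assets_dir) + "/" + file_name;
    return true;
}
```

In the test, a false return from `build_asset_path(g_assets_path, "test_long.wav", audio_path)` would trigger the skip or early failure before any file is opened.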

Copilot · 2026-04-13T21:58:25Z

+    fseek(wav_file, 44, SEEK_SET);
+    std::vector<int16_t> pcm_samples;
+    int16_t sample;
+    while (fread(&sample, sizeof(int16_t), 1, wav_file) == 1) {
+        pcm_samples.push_back(sample);
+    }


The WAV parsing assumes a fixed 44-byte header followed by raw int16 PCM data. This will misread files with extra chunks (e.g., LIST or fact) and files that are not 16-bit mono. Consider using the existing WAV loader/resampler utilities used in other tests so the input format is validated and converted to 16 kHz PCM as needed.
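
If reusing the existing loader is not practical, the header can be walked chunk by chunk instead of skipping a fixed 44 bytes. The sketch below is an illustration of the RIFF/WAVE layout, not the project's loader: WavInfo and parse_wav_header are hypothetical names, and it works on an in-memory byte buffer rather than a FILE*.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct WavInfo {
    uint16_t audio_format = 0;     // 1 means integer PCM
    uint16_t channels = 0;
    uint32_t sample_rate = 0;
    uint16_t bits_per_sample = 0;
    std::size_t data_offset = 0;   // byte offset of the first sample
    std::size_t data_size = 0;     // payload size in bytes
};

static uint16_t read_u16(const std::vector<uint8_t>& b, std::size_t at) {
    return static_cast<uint16_t>(b[at] | (b[at + 1] << 8));
}

static uint32_t read_u32(const std::vector<uint8_t>& b, std::size_t at) {
    return static_cast<uint32_t>(b[at]) |
           (static_cast<uint32_t>(b[at + 1]) << 8) |
           (static_cast<uint32_t>(b[at + 2]) << 16) |
           (static_cast<uint32_t>(b[at + 3]) << 24);
}

// Walks RIFF chunks until "data" is found. Returns true only when a
// "fmt " chunk was seen before the "data" chunk.
bool parse_wav_header(const std::vector<uint8_t>& bytes, WavInfo& info) {
    if (bytes.size() < 12 ||
        std::memcmp(bytes.data(), "RIFF", 4) != 0 ||
        std::memcmp(bytes.data() + 8, "WAVE", 4) != 0) {
        return false;
    }
    bool have_fmt = false;
    std::size_t pos = 12;
    while (pos + 8 <= bytes.size()) {
        const uint8_t* id = bytes.data() + pos;
        const std::size_t size = read_u32(bytes, pos + 4);
        const std::size_t body = pos + 8;
        const std::size_t remaining = bytes.size() - body;
        if (std::memcmp(id, "data", 4) == 0) {
            if (!have_fmt) return false;
            info.data_offset = body;
            info.data_size = std::min(size, remaining);  // clamp bad sizes
            return true;
        }
        if (size > remaining) return false;  // truncated chunk
        if (std::memcmp(id, "fmt ", 4) == 0) {
            if (size < 16) return false;
            info.audio_format = read_u16(bytes, body);
            info.channels = read_u16(bytes, body + 2);
            info.sample_rate = read_u32(bytes, body + 4);
            info.bits_per_sample = read_u16(bytes, body + 14);
            have_fmt = true;
        }
        pos = body + size + (size & 1);  // chunks are padded to even length
    }
    return false;
}
```

A caller would then reject or convert the file unless audio_format is 1, channels is 1, bits_per_sample is 16 and sample_rate is 16000, and read samples starting at data_offset.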

jakmro marked this pull request as ready for review April 13, 2026 21:53

Copilot AI review requested due to automatic review settings April 13, 2026 21:53

jakmro added 2 commits April 13, 2026 23:53

fix parakeet ctc stream implementation and add stream transcription test

9d09d3c

Signed-off-by: jakmro <kubamroz124@gmail.com>

preserve leading space in single-token SentencePiece decode

c32926b

Signed-off-by: jakmro <kubamroz124@gmail.com>

jakmro force-pushed the fix-streaming-transcribe branch from d2f3aa3 to c32926b April 13, 2026 21:53

Copilot started reviewing on behalf of jakmro April 13, 2026 21:53

Copilot AI reviewed Apr 13, 2026

HenryNdubuaku merged commit 130a60d into main Apr 14, 2026
7 of 10 checks passed

3 participants