-
Notifications
You must be signed in to change notification settings - Fork 160
fix(minicpmo): fix minicpmo tokenization logic & streaming generation #549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds MiniCPM4 model support and example, new MiniCPMO audio/Whisper processing and refactors its examples, implements 1D AvgPool across nn/aops/backend with CPU implementation and factory registration, updates various build targets and minor fixes. Changes
Sequence Diagram(s)sequenceDiagram
participant User as User/CLI
participant App as MiniCPM4 App
participant Tokenizer as MiniCPM4Tokenizer
participant Model as MiniCPM4ForCausalLM
participant Backend as CPU Backend
User->>App: Run with model/tokenizer/config paths
App->>Tokenizer: Init tokenizer
App->>Model: Init model & load weights
loop chat
App->>Tokenizer: encode(prompt)
Tokenizer-->>App: token IDs
App->>Model: forward(input_ids)
Model->>Backend: compute attention/MLP (RoPE)
Backend-->>Model: logits
Model-->>App: token id (stream)
App->>Tokenizer: decode(id)
Tokenizer-->>App: text
end
App->>App: perf & memory report
sequenceDiagram
participant User as User/CLI
participant App as MiniCPMO Audio Infer
participant AudioProc as Audio Processor
participant Tokenizer as MiniCPMOTokenizer
participant Model as MiniCPMOForCausalLM
User->>App: Run with audio_path + model
App->>AudioProc: processAudio(file) -> mel
AudioProc-->>App: mel spectrogram
App->>Tokenizer: build message with audio placeholder
Tokenizer->>Tokenizer: replace placeholder -> audio token sequence
Tokenizer-->>App: input tensors + audio_bounds
App->>Model: forward(inputs, audio_features, audio_bounds)
Model->>Model: encode_audio -> audio embeddings
Model->>Model: merge_audio_text_embeddings
Model-->>App: generated tokens (stream)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 10
🧹 Nitpick comments (22)
mllm/core/aops/AvgPool1dOp.cpp (2)
15-43: Validate pooling hyperparameters and guard against invalid output lengths
reshapeassumeskernel_size > 0,stride > 0, and yields a positivepooled_length. With malformed options (e.g.,stride == 0,kernel_size <= 0, or extremekernel_sizevs(length + 2 * padding)inceil_mode), this can produce division-by-zero or non‑positive output sizes and eventually invalid tensor shapes.Consider validating the options and failing fast before computing
pooled_length, and also checking that the computedpooled_lengthis positive (or at least non‑negative) before allocating the output:const int kernel_size = options_.kernel_size; const int stride = options_.stride; const int padding = options_.padding; + + if (kernel_size <= 0 || stride <= 0) { + MLLM_ERROR_EXIT(ExitCode::kCoreError, + "AvgPool1dOp expects kernel_size > 0 and stride > 0, got kernel_size={}, stride={}", + kernel_size, stride); + } + int pooled_length; if (options_.ceil_mode) { pooled_length = static_cast<int>(std::ceil(static_cast<float>(length + 2 * padding - kernel_size) / stride)) + 1; } else { pooled_length = ((length + 2 * padding - kernel_size) / stride) + 1; } + + if (pooled_length <= 0) { + MLLM_ERROR_EXIT(ExitCode::kCoreError, + "AvgPool1dOp computed non-positive output length: {}", pooled_length); + }This keeps failures explicit and avoids undefined behavior from bad configs.
49-51: Make the baseforwardfail fast instead of silently doing nothingRight now
AvgPool1dOp::forwardis a no‑op. If a caller accidentally instantiates or dispatches through this base op instead of a backend-specific subclass (likeCPUAvgPool1dOp), the call will silently succeed while producing no output.Consider turning this into an explicit error so mis-wired backends are caught immediately:
void AvgPool1dOp::forward(const std::vector<Tensor>& inputs, std::vector<Tensor>& outputs) { - // Forward will be dispatched to backend implementation + MLLM_ERROR_EXIT(ExitCode::kCoreError, + "AvgPool1dOp::forward must be implemented by a backend-specific subclass"); }Alternatively, you could make
forwardpure virtual in the header if that fits the existing op hierarchy.mllm/backends/cpu/ops/AvgPool1dOp.cpp (1)
12-36: Align dtype handling with pointer access and add light sanity checksThe forward path looks correct overall, but a couple of small robustness tweaks would make it safer:
input.ptr<float>()/output.ptr<float>()are called before theswitch (output.dtype()). Ifptr<T>()validates dtype internally, this may fail before reaching theNYIpath for unsupported dtypes.- There’s no explicit assertion that
input.dtype() == output.dtype() == kFloat32, which is what the implementation assumes.You can tighten this up by moving pointer acquisition inside the
kFloat32case and asserting dtype consistency:- auto input_ptr = input.ptr<float>(); - auto output_ptr = output.ptr<float>(); - - std::fill_n(output_ptr, output.numel(), 0.0f); - - int total_iterations = batch * channels * pooled_length; - - switch (output.dtype()) { - case kFloat32: - MLLM_CONDITIONAL_PARALLEL_FOR(options_.getThreads() > 1, 4, idx, 0, total_iterations, 1, { + int total_iterations = batch * channels * pooled_length; + + switch (output.dtype()) { + case kFloat32: { + MLLM_ASSERT(input.dtype() == output.dtype(), + "AvgPool1d: input/output dtype mismatch: {} vs {}", input.dtype(), output.dtype()); + auto input_ptr = input.ptr<float>(); + auto output_ptr = output.ptr<float>(); + std::fill_n(output_ptr, output.numel(), 0.0f); + + MLLM_CONDITIONAL_PARALLEL_FOR(options_.getThreads() > 1, 4, idx, 0, total_iterations, 1, { // Decode 1D index to 3D indices int b = idx / (channels * pooled_length); int c = (idx % (channels * pooled_length)) / pooled_length; @@ if (count > 0) { output_ptr[output_idx] = sum / static_cast<float>(count); } - }); - break; + }); + break; + }This keeps unsupported dtypes routed cleanly to the
NYIcase and ensures the float-specific path only runs under the expected conditions.examples/minicpm4/main.cpp (1)
55-59: Designated initializer requires C++20; consider more portable initialization
convertMessage({.prompt = prompt_text});uses a C++20 designated initializer syntax forMiniCPM4Message. If the project is still built with C++17 (or older), this will fail to compile.If you need C++17 compatibility, you can instead write:
- auto inputs = minicpm4_tokenizer.convertMessage({.prompt = prompt_text}); + MiniCPM4Message msg; + msg.prompt = prompt_text; + auto inputs = minicpm4_tokenizer.convertMessage(msg);Please confirm the project’s C++ standard setting and adjust accordingly.
mllm/models/minicpm4/tokenization_minicpm4.hpp (3)
13-16: Makemessage_templateconst and guard the placeholder replacement
MiniCPM4Message::message_templateis a mutablestatic inline std::string, andconvertMessageassumes the{{{prompt}}}placeholder is always present:auto applied_string = MiniCPM4Message::message_template; size_t pos = applied_string.find("{{{prompt}}}"); applied_string.replace(pos, 12, message.prompt);Two suggestions:
- Declare the template as const to avoid accidental runtime mutation:
- static inline std::string message_template = "..."; + static inline const std::string message_template = "...";
- Add a simple guard in case the placeholder is ever edited or localized:
auto applied_string = MiniCPM4Message::message_template; const std::string placeholder = "{{{prompt}}}"; auto pos = applied_string.find(placeholder); if (pos == std::string::npos) { // Fallback: append prompt or throw/log. applied_string += message.prompt; } else { applied_string.replace(pos, placeholder.size(), message.prompt); }This makes the tokenizer more robust to future template changes.
Also applies to: 112-121
52-58: Avoid duplicating encode logic inconvertMessageBoth
encodeandconvertMessageperform the same pattern oftokenize(...)followed bybpe_._lookup_vocab:// encode auto tokens = tokenize(str); for (const auto& token : tokens) { ids.push_back(bpe_._lookup_vocab(token)); } // convertMessage auto tokens = tokenize(applied_string); for (const auto& token : tokens) { ids.push_back(bpe_._lookup_vocab(token)); }To keep behavior consistent and easier to maintain, consider reusing
encodeinsideconvertMessage:- auto tokens = tokenize(applied_string); - - std::vector<int64_t> ids; - ids.reserve(tokens.size()); - for (const auto& token : tokens) { ids.push_back(bpe_._lookup_vocab(token)); } + std::vector<int64_t> ids = encode(applied_string);This way, any future changes to encoding/tokenization only need to be made in one place.
Also applies to: 119-127
34-50: Normalize/decode logic is coherent; consider small cleanup and commentsThe handling of the SentencePiece underscore (
▁) innormalize,decode,decode_token, anddecode_stringis consistent and should round‑trip text correctly. To make future maintenance easier:
- Add a brief comment near the hard‑coded
"\xE2\x96\x81"explaining it’s the UTF‑8 encoding of▁used by SentencePiece.- Optionally factor the repeated “strip leading
▁and replace remaining with spaces” logic into a small helper used by bothdecodeanddecode_string.These are non‑blocking polish items; the current implementation is functionally sound.
Also applies to: 60-80, 135-170
mllm/models/minicpm4/modeling_minicpm4.hpp (3)
37-84: RoPE embedding computation looks correct but is O(B * S * D); consider reuse if perf becomes an issueThe rotary embedding construction (freqs + duplicated sin/cos) is structurally correct for per‑head RoPE and matches expected shapes. It does, however, recompute the full
[B, S, dim]embeddings every forward.If you later profile MiniCPM4 and find RoPE dominating CPU time for long sequences, it may be worth:
- Caching
inv_freq/freqsper sequence length, or- Vectorizing the loops more aggressively.
Fine for now; just something to be aware of.
268-274: Layer index wiring for KV cache is correct but brittle to external constructionAssigning:
for (auto [idx, b] : enumerate(decode_blocks_.list())) { b.self_attn_.layer_idx_ = idx; }correctly sets the per‑layer index used by
StaticCache::updateKVCache. However, becauselayer_idx_is public and only set here, creatingMiniCPM4Attentioninstances elsewhere without doing this will break the cache logic.You may want to:
- Make
layer_idx_private with a setter, or- Pass
layer_idxintoMiniCPM4Attention’s constructor and initialize it there.Not a blocker, but it would reduce footguns.
331-349: Position id handling assumes seq_len == 1 for decode and shared lengths across batchIn the decode path:
if (input.count("position_ids")) { position_ids = input.at("position_ids"); if (seq_len == 1) { auto last_pos = *position_ids.offsettedPtr<int64_t>({0, position_ids.shape()[1] - 1}); position_ids = Tensor::empty({batch_size, 1}, kInt64, kCPU).alloc(); *position_ids.offsettedPtr<int64_t>({0, 0}) = last_pos + 1; } }this:
- Only updates a single scalar position for batch 0, and
- Assumes decode is always called with
seq_len == 1.If your generation loop ever supports:
- Multiple tokens per decode step, or
- Different sequence lengths per batch element,
this logic will not generalize correctly.
If the intended contract is “decode is always called with a single new token per batch and synchronized lengths”, it would help to either assert this or extend the logic to compute per‑batch positions (e.g., reading last_pos per batch row).
examples/minicpm_o/main.cpp (1)
131-132: Perf report filename is misleadingThe perf report is saved as
"minicpm4.perf"even though this executable ismain_minicpmo. Renaming the report (e.g., to"minicpmo_omni.perf") would avoid confusion when inspecting traces from multiple examples.- mllm::perf::saveReport("minicpm4.perf"); + mllm::perf::saveReport("minicpmo_omni.perf");examples/minicpm_o/audio_infer.cpp (1)
46-51: Fail fast on unsupportedmodel_versionvaluesRight now, any unexpected
--model_versionsilently falls back tokV1. It’s safer to validate and exit with an error when the flag is wrong so users immediately know they passed an invalid value.- mllm::ModelFileVersion file_version = mllm::ModelFileVersion::kV1; - if (model_version.get() == "v1") { - file_version = mllm::ModelFileVersion::kV1; - } else if (model_version.get() == "v2") { - file_version = mllm::ModelFileVersion::kV2; - } + mllm::ModelFileVersion file_version; + if (model_version.get() == "v1") { + file_version = mllm::ModelFileVersion::kV1; + } else if (model_version.get() == "v2") { + file_version = mllm::ModelFileVersion::kV2; + } else { + fmt::print("Unsupported model_version '{}'; expected 'v1' or 'v2'.\n", model_version.get()); + mllm::shutdownContext(); + return 1; + }examples/minicpm_o/image_infer.cpp (1)
28-39: Align perf tracing start/stop and avoid starting it on--helpWith
MLLM_PERFETTO_ENABLE,mllm::perf::start()is called before checking the--helpflag, butperf::stop()/saveReport()only run in the execution block. If a user invokes--help, perf is started but never stopped or reported. It’s cleaner to start perf only when we’re actually running the model:- Argparse::parse(argc, argv); - -#ifdef MLLM_PERFETTO_ENABLE - mllm::perf::start(); -#endif - - mllm::ModelFileVersion file_version = mllm::ModelFileVersion::kV1; + Argparse::parse(argc, argv); + + mllm::ModelFileVersion file_version = mllm::ModelFileVersion::kV1; @@ - if (help.isSet()) { + if (help.isSet()) { Argparse::printHelp(); mllm::shutdownContext(); return 0; } - { + { +#ifdef MLLM_PERFETTO_ENABLE + mllm::perf::start(); +#endif @@ -#ifdef MLLM_PERFETTO_ENABLE - mllm::perf::stop(); - mllm::perf::saveReport("minicpmo.perf"); -#endif +#ifdef MLLM_PERFETTO_ENABLE + mllm::perf::stop(); + mllm::perf::saveReport("minicpmo.perf"); +#endifexamples/minicpm_o/CMakeLists.txt (1)
3-15: CMake wiring for the three MiniCPMO examples looks consistentThe three executables (
main_image,main_audio,main_minicpmo) are all linked againstMllmRTandMllmCPUBackendwith the shared include directory, which matches how the corresponding example sources are structured. If this list grows further, you could consider a small helper function to DRY up the repeatedtarget_link_libraries/target_include_directoriescalls, but it’s not necessary right now.mllm/models/minicpm_o2_6/audio_preprocessor_minicpmo.hpp (3)
167-208: Verify STFT → mel matmul and dynamic range steps for multi‑dimensional inputs
mel_specgram = nn::functional::matmul(specgram.T(), melscale_fbanks_).T();assumes a particular STFT output layout (effectively 2D[freq_bins, frames]). Ifnn::STFTreturns a 3D tensor like[B, freq_bins, frames](which is typical), the semantics of.T()andmatmulbroadcasting need to be double‑checked to ensure you end up with[B, n_mels, n_frames]as intended.Similarly,
auto max_val = mel_specgram.max();followed by a globalthresholdapplies dynamic range compression over the entire tensor rather than per‑sample. IfB > 1, this differs from the usual per‑example normalization.I recommend:
- Verifying the actual shapes from
stft_at runtime (or docs) and, if needed, making the mel transform explicit, e.g.mel_specgram = nn::functional::matmul(melscale_fbanks_.T(), specgram)or a batched[B, T, F]→[B, T, M]transform.- Clarifying whether Whisper here is expected to use a per‑example max (e.g., over
[freq, time]for each batch element) and adjustingmax()reduction accordingly.Right now this is shape‑ and reduction‑sensitive logic that could silently misbehave if the STFT layout differs from what you assumed.
228-265: Placeholder length may not match encoded audio length due to fixed padding inprocessAudioData
getAudioPlaceholdercomputesoutput_lensfromaudio_lengthvia hop length and two downsampling stages. However,processAudioDataalways pads or truncates ton_samples_ = chunk_length * sample_rate_before STFT, so the encoder input length is effectively fixed (30s @ 16 kHz by default), independent ofaudio_length.This can lead to:
output_lens< actualaudio_seq_lenfromencode_audio, truncating embeddings when merging into text; or- A mismatch with how placeholders were generated during training if the model expects a fixed number of
<unk>tokens per 30s chunk.Consider either:
- Basing
output_lensonn_samples_instead ofaudio_length(ormin(audio_length, n_samples_)), and/or- Returning the actual encoded length from the audio pipeline and using that to derive
unkcount.Also, the chunked path (
chunk_input == true) uses a hardcodedfbank_feat_in_chunk = chunk_length * 100, which assumes 100 frames/sec; this should stay in sync withhop_length_and your actual STFT config.
268-325: Minor: factor shared file/data processing and include missing headers explicitly
processAudioandprocessAudioDataduplicate padding/truncation and mel extraction logic. Not a bug, but you could reduce maintenance overhead by having the file‑based path reuse the raw‑data path.Also, since this header uses
std::numeric_limits<float>::max()andstd::memcpy, it would be safer to explicitly include<limits>and<cstring>here instead of relying on transitive includes.mllm/models/minicpm_o2_6/modeling_minicpmo.hpp (2)
242-266: Remove or gate noisy debug prints inencode_audio
encode_audiocurrently callsmllm::printmultiple times to dump tensor shapes (“finish apm!”, “before audio avg pooler!”, etc.). These are useful while debugging but will be noisy in production and can slow down inference.Consider either removing them or guarding behind a debug flag / log level:
- mllm::print("finish apm!, tensor shape:", audio_states.shape()); + // mllm::print("finish apm!, tensor shape:", audio_states.shape()); ... - mllm::print("final audio embeds!, tensor shape:", audio_embeds.shape()); + // mllm::print("final audio embeds!, tensor shape:", audio_embeds.shape());or similar conditional logging.
301-325: Batching assumptions inmerge_audio_text_embeddings
merge_audio_text_embeddingsassumes:
audio_embeddings.shape()[0]equals the number of audio segments (num_bounds), and- Each row
audio_embeddings[bound_idx, ...]aligns one‑to‑one withaudio_bounds[bound_idx].This matches the current tokenizer path (single audio segment →
audio_featureswith batch size 1 → one bounds pair), but will need care if you later support multiple audio clips per request (e.g.,MiniCPMOInput::audio_paths).Not an immediate bug, but consider documenting this assumption or asserting
num_bounds == audio_embeddings.shape()[0]to catch future mismatches early.mllm/models/minicpm_o2_6/tokenization_minicpmo.hpp (3)
163-186:buildChatMessagenow emits a consistent assistant preambleThe updated
buildChatMessagealways appends:result += "<|im_start|>assistant\n"; if (gen_audio) { result += "<|spk_bos|><|spk|><|spk_eos|><|tts_bos|>"; }after the user segment, which is aligned with your audio‑generation flow and the special tokens registered in the trie.
Note that the
has_imgandhas_audioparameters are currently unused inside this method; they’re redundant given you readimg_file_path/audio_file_pathdirectly. Optional: either wire them into the logic (e.g., to support multi‑image/audio variants) or drop them from the signature to avoid confusion.
318-419: Image+audio are now processed in two clean phases; only single audio placeholder is supportedThe refactored
convertMessage:
- Processes image/audio first (Phase 1),
- Replaces image placeholders, then audio placeholders, in the same
applied_string(Phases 2–3),- Tokenizes once and computes image/audio bounds on the final sequence, and
- Passes everything to
convertToTensors.This fixes the previous limitation where image and audio could not be encoded together; the current ordering preserves the audio placeholder across the image‑replacement step, so both modalities co‑exist correctly.
Two minor limitations to be aware of:
Only the first occurrence of
"(<audio>./</audio>)"is replaced:size_t audio_placeholder_pos = applied_string.find("(<audio>./</audio>)"); if (audio_placeholder_pos != std::string::npos) { applied_string.replace(audio_placeholder_pos, 19, audio_placeholder); }If you ever support multiple audio segments per prompt, this should loop over all matches.
The hard‑coded length
19is brittle; better to derive it from the placeholder literal:const std::string kAudioTag = "(./)";
- constexpr char kAudioTag[] = "(./)";
...
- size_t audio_placeholder_pos = applied_string.find("(./)");
- if (audio_placeholder_pos != std::string::npos) { applied_string.replace(audio_placeholder_pos, 19, audio_placeholder); }
- size_t audio_placeholder_pos = applied_string.find(kAudioTag);
- if (audio_placeholder_pos != std::string::npos) {
- applied_string.replace(audio_placeholder_pos, std::strlen(kAudioTag), audio_placeholder);
- }
Functionally this addresses the image+audio coexistence issue; the above are robustness improvements. --- `524-526`: **Audio preprocessor member is correctly introduced; consider future configurability** Adding: ```cpp // For audio only. MiniCPMOAudioProcessor audio_preprocessor_;to the tokenizer is a good encapsulation of audio‑specific logic. Right now it’s constructed with fixed Whisper defaults (16kHz, 80 mel bins, hop 160); if you foresee running variants of MiniCPMO with different audio front‑ends, you may eventually want to pass these from config or the constructor instead of hard‑coding.
For the current MiniCPMO‑2.6 setup, this is fine.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (27)
examples/CMakeLists.txt(1 hunks)examples/minicpm4/CMakeLists.txt(1 hunks)examples/minicpm4/config_minicpm4.json(1 hunks)examples/minicpm4/main.cpp(1 hunks)examples/minicpm_o/CMakeLists.txt(1 hunks)examples/minicpm_o/audio_infer.cpp(1 hunks)examples/minicpm_o/image_infer.cpp(1 hunks)examples/minicpm_o/main.cpp(1 hunks)mllm/backends/cpu/CPUBackend.cpp(2 hunks)mllm/backends/cpu/ops/AvgPool1dOp.cpp(1 hunks)mllm/backends/cpu/ops/AvgPool1dOp.hpp(1 hunks)mllm/core/OpTypes.hpp(2 hunks)mllm/core/aops/AvgPool1dOp.cpp(1 hunks)mllm/core/aops/AvgPool1dOp.hpp(1 hunks)mllm/models/ARGeneration.cpp(2 hunks)mllm/models/minicpm4/configuration_minicpm4.hpp(1 hunks)mllm/models/minicpm4/modeling_minicpm4.hpp(1 hunks)mllm/models/minicpm4/tokenization_minicpm4.hpp(1 hunks)mllm/models/minicpm_o2_6/audio_preprocessor_minicpmo.hpp(1 hunks)mllm/models/minicpm_o2_6/configuration_minicpm_o2_6.hpp(0 hunks)mllm/models/minicpm_o2_6/modeling_minicpmo.hpp(8 hunks)mllm/models/minicpm_o2_6/modeling_whisper_encoder.hpp(1 hunks)mllm/models/minicpm_o2_6/tokenization_minicpmo.hpp(10 hunks)mllm/models/vocos/modeling_vocos.hpp(1 hunks)mllm/nn/Nn.hpp(1 hunks)mllm/nn/layers/AvgPool1d.cpp(1 hunks)mllm/nn/layers/AvgPool1d.hpp(1 hunks)
💤 Files with no reviewable changes (1)
- mllm/models/minicpm_o2_6/configuration_minicpm_o2_6.hpp
🧰 Additional context used
🧬 Code graph analysis (9)
mllm/models/ARGeneration.cpp (2)
mllm/core/Tensor.cpp (4)
device(351-351)device(351-351)empty(70-74)empty(70-70)mllm/core/Tensor.hpp (1)
device(395-395)
mllm/nn/layers/AvgPool1d.hpp (1)
mllm/backends/cpu/ops/AvgPool1dOp.hpp (2)
options(20-22)options(20-20)
mllm/nn/layers/AvgPool1d.cpp (2)
mllm/nn/layers/AvgPool1d.hpp (3)
AvgPool1d(11-11)AvgPool1d(13-14)AvgPool1d(16-16)mllm/backends/cpu/ops/AvgPool1dOp.hpp (2)
options(20-22)options(20-20)
mllm/backends/cpu/ops/AvgPool1dOp.cpp (3)
mllm/backends/cpu/ops/AvgPool1dOp.hpp (4)
CPUAvgPool1dOp(13-13)options(20-22)options(20-20)inputs(15-15)mllm/core/aops/AvgPool1dOp.cpp (2)
forward(49-51)forward(49-49)mllm/core/aops/AvgPool1dOp.hpp (3)
inputs(24-24)inputs(26-26)inputs(28-28)
mllm/backends/cpu/ops/AvgPool1dOp.hpp (1)
mllm/core/aops/AvgPool1dOp.hpp (3)
inputs(24-24)inputs(26-26)inputs(28-28)
mllm/core/aops/AvgPool1dOp.hpp (1)
mllm/backends/cpu/ops/AvgPool1dOp.hpp (3)
options(20-22)options(20-20)inputs(15-15)
examples/minicpm_o/main.cpp (7)
mllm/models/minicpm_o2_6/modeling_minicpmo.hpp (10)
config(140-154)config(140-140)inputs(33-39)inputs(33-33)inputs(56-58)inputs(56-56)inputs(75-81)inputs(75-75)inputs(161-240)inputs(161-161)mllm/models/minicpm_o2_6/configuration_minicpmo.hpp (3)
MiniCPMOConfig(11-11)MiniCPMOConfig(13-54)MiniCPMOConfig(13-13)mllm/models/minicpm_o2_6/tokenization_minicpmo.hpp (2)
MiniCPMOTokenizer(199-261)MiniCPMOTokenizer(199-199)mllm/models/minicpm_o2_6/image_preprocessor_minicpmo.hpp (2)
image_path(253-283)image_path(253-254)mllm/models/minicpm_o2_6/modeling_siglip.hpp (4)
inputs(40-154)inputs(40-40)inputs(178-186)inputs(178-178)mllm/models/minicpm_o2_6/modeling_resampler.hpp (4)
inputs(70-168)inputs(70-70)inputs(211-351)inputs(211-211)mllm/nn/Functional.cpp (2)
concat(45-47)concat(45-45)
mllm/models/minicpm_o2_6/modeling_minicpmo.hpp (2)
mllm/models/minicpm_o2_6/modeling_siglip.hpp (12)
inputs(40-154)inputs(40-40)inputs(178-186)inputs(178-178)inputs(216-253)inputs(216-216)inputs(277-298)inputs(277-277)inputs(315-322)inputs(315-315)inputs(346-433)inputs(346-346)mllm/models/minicpm_o2_6/modeling_resampler.hpp (4)
inputs(70-168)inputs(70-70)inputs(211-351)inputs(211-211)
mllm/models/minicpm_o2_6/audio_preprocessor_minicpmo.hpp (4)
mllm/models/minicpm_o2_6/modeling_minicpmo.hpp (8)
inputs(33-39)inputs(33-33)inputs(56-58)inputs(56-56)inputs(75-81)inputs(75-75)inputs(161-240)inputs(161-161)mllm/models/minicpm_o2_6/modeling_whisper_encoder.hpp (6)
inputs(39-73)inputs(39-39)inputs(108-130)inputs(108-108)inputs(177-207)inputs(177-177)mllm/models/minicpm_o2_6/tokenization_minicpmo.hpp (1)
nodiscard(163-186)mllm/models/minicpm_o2_6/streaming_generation.hpp (1)
mel_spec(203-203)
🪛 Clang (14.0.6)
mllm/nn/layers/AvgPool1d.hpp
[error] 3-3: 'cstdint' file not found
(clang-diagnostic-error)
mllm/nn/layers/AvgPool1d.cpp
[error] 1-1: 'mllm/nn/layers/AvgPool1d.hpp' file not found
(clang-diagnostic-error)
mllm/backends/cpu/ops/AvgPool1dOp.cpp
[error] 4-4: 'mllm/backends/cpu/ops/AvgPool1dOp.hpp' file not found
(clang-diagnostic-error)
mllm/backends/cpu/ops/AvgPool1dOp.hpp
[error] 6-6: 'mllm/core/BaseOp.hpp' file not found
(clang-diagnostic-error)
mllm/core/aops/AvgPool1dOp.hpp
[error] 6-6: 'cstdint' file not found
(clang-diagnostic-error)
[error] 12-12: constructor does not initialize these fields: kernel_size, stride
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 33-33: member variable 'options_' has protected visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
mllm/models/minicpm4/configuration_minicpm4.hpp
[error] 5-5: 'mllm/core/aops/LinearOp.hpp' file not found
(clang-diagnostic-error)
[error] 11-11: constructor does not initialize these fields: short_factor, long_factor
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 14-14: constructor does not initialize these fields: short_factor, long_factor
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 16-16: 'attention_bias' should be initialized in a member initializer of the constructor
(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)
[error] 23-23: 'rms_norm_eps' should be initialized in a member initializer of the constructor
(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)
[error] 28-28: if with identical then and else branches
(bugprone-branch-clone,-warnings-as-errors)
[error] 71-71: 1e-05 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 77-77: 10000.0 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 83-83: 12.0 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 84-84: 256.0 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 85-85: 1.4 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
examples/minicpm_o/main.cpp
[error] 4-4: 'fmt/core.h' file not found
(clang-diagnostic-error)
[error] 18-18: variable 'MLLM_MAIN' is non-const and globally accessible, consider making it const
(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)
[error] 18-18: invalid case style for variable 'MLLM_MAIN'
(readability-identifier-naming,-warnings-as-errors)
mllm/models/minicpm_o2_6/modeling_minicpmo.hpp
[error] 48-48: constructor does not initialize these fields: avg_pool_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 52-52: constructor does not initialize these fields: avg_pool_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 52-52: 3 adjacent parameters of 'AudioAvgPooler' of similar type are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 56-56: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 113-113: variable 'audio_output_dim' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 128-128: member variable 'apm_' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 129-129: member variable 'audio_projection_layer_' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 130-130: member variable 'audio_avg_pooler_' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 192-192: variable 'audio_bounds' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 301-301: 3 adjacent parameters of 'merge_audio_text_embeddings' of similar type ('int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
examples/minicpm4/main.cpp
[error] 1-1: 'iostream' file not found
(clang-diagnostic-error)
[error] 11-11: variable 'MLLM_MAIN' is non-const and globally accessible, consider making it const
(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)
[error] 11-11: invalid case style for variable 'MLLM_MAIN'
(readability-identifier-naming,-warnings-as-errors)
examples/minicpm_o/image_infer.cpp
[error] 1-1: 'iostream' file not found
(clang-diagnostic-error)
[error] 11-11: variable 'MLLM_MAIN' is non-const and globally accessible, consider making it const
(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)
[error] 11-11: invalid case style for variable 'MLLM_MAIN'
(readability-identifier-naming,-warnings-as-errors)
mllm/models/minicpm4/tokenization_minicpm4.hpp
[error] 4-4: 'vector' file not found
(clang-diagnostic-error)
[error] 13-13: constructor does not initialize these fields: prompt
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 15-15: variable 'message_template' is non-const and globally accessible, consider making it const
(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)
[error] 20-20: constructor does not initialize these fields: bpe_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 35-35: variable 'normalized' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 61-61: variable 'result' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 68-68: variable 'decoded' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 107-107: variable 'result' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 114-114: variable 'pos' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 123-123: variable 'sequence' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 136-136: variable 'decoded' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 154-154: variable 'result' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 158-158: variable 'decoded' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
mllm/models/minicpm4/modeling_minicpm4.hpp
[error] 4-4: 'mllm/mllm.hpp' file not found
(clang-diagnostic-error)
[error] 16-16: 4 adjacent parameters of 'makeRoPEInvFreqWithLongRoPE' of convertible types are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 32-32: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
[error] 33-33: variable 'scale' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 37-37: 2 adjacent parameters of 'makeRotaryPosEmbedding' of convertible types are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 86-86: constructor does not initialize these fields: gate_proj_, up_proj_, down_proj_, silu_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 94-94: constructor does not initialize these fields: gate_proj_, up_proj_, down_proj_, silu_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 94-94: 2 adjacent parameters of 'MiniCPM4MLP' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 101-101: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 103-103: variable name 'x' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 105-105: variable name 'y' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 112-112: constructor does not initialize these fields: q_proj_, k_proj_, v_proj_, o_proj_, q_rope_, k_rope_, mask_, softmax_, hidden_size_, head_dim_, num_attention_heads_, num_key_value_heads_, num_key_value_groups_, layer_idx_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 131-131: constructor does not initialize these fields: q_proj_, k_proj_, v_proj_, o_proj_, q_rope_, k_rope_, mask_, softmax_, hidden_size_, num_attention_heads_, num_key_value_heads_, layer_idx_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 131-131: 2 adjacent parameters of 'MiniCPM4Attention' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 136-136: 'num_key_value_groups_' should be initialized in a member initializer of the constructor
(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)
[error] 152-152: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 153-153: variable name 'x' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 163-163: variable 'B' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 163-163: variable name 'B' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 163-163: invalid case style for variable 'B'
(readability-identifier-naming,-warnings-as-errors)
[error] 164-164: variable 'S' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 164-164: variable name 'S' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 164-164: invalid case style for variable 'S'
(readability-identifier-naming,-warnings-as-errors)
[error] 186-186: variable 'attn' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 187-187: repeated branch in conditional chain
(bugprone-branch-clone,-warnings-as-errors)
[error] 209-209: member variable 'layer_idx_' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 212-212: constructor does not initialize these fields: input_layer_norm_, post_attention_layer_norm_, scale_depth_, num_hidden_layers_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 224-224: constructor does not initialize these fields: input_layer_norm_, post_attention_layer_norm_, scale_depth_, num_hidden_layers_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 224-224: 2 adjacent parameters of 'MiniCPM4Decoder' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 234-234: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 240-240: variable 'residual_scale' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 244-244: variable name 'x' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 259-259: constructor does not initialize these fields: decode_blocks_, norm_, embedding_, scale_emb_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 268-268: constructor does not initialize these fields: decode_blocks_, norm_, embedding_, scale_emb_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 268-268: 2 adjacent parameters of 'MiniCPM4Text' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 276-276: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 280-280: variable name 'x' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 282-282: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
[error] 298-298: constructor does not initialize these fields: lm_head_, tie_word_embeddings_, kv_cache_, longrope_scaling_factor_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 322-322: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 331-331: variable 'position_ids' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 358-358: variable name 'S' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 358-358: invalid case style for variable 'S'
(readability-identifier-naming,-warnings-as-errors)
[error] 365-365: variable 'scale_factor' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
mllm/core/aops/AvgPool1dOp.cpp
[error] 15-15: method 'reshape' can be made static
(readability-convert-member-functions-to-static,-warnings-as-errors)
[error] 16-16: variable name 'i' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 25-25: variable 'batch' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 26-26: variable 'channels' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 27-27: variable 'length' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 29-29: variable 'kernel_size' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 30-30: variable 'stride' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 31-31: variable 'padding' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 33-33: variable 'pooled_length' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
mllm/models/minicpm_o2_6/audio_preprocessor_minicpmo.hpp
[error] 5-5: 'string' file not found
(clang-diagnostic-error)
[error] 19-19: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
[error] 20-20: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
[error] 21-21: variable 'logstep' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 24-24: 3.0f is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 24-24: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
[error] 24-24: 200.0f is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 24-24: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
[error] 25-25: do not use 'else' after 'return'
(readability-else-after-return,-warnings-as-errors)
[error] 32-32: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
[error] 33-33: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
[error] 34-34: variable 'logstep' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 37-37: 200.0f is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 37-37: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
[error] 37-37: 3.0f is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 37-37: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
[error] 38-38: do not use 'else' after 'return'
(readability-else-after-return,-warnings-as-errors)
[error] 45-45: variable 'length' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 47-47: variable 'window_ptr' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 61-61: 3 adjacent parameters of 'create_mel_filterbank' of convertible types are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 62-62: 2 adjacent parameters of 'create_mel_filterbank' of convertible types are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 85-85: variable 'filters_ptr' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 128-128: constructor does not initialize these fields: n_fft_, hop_length_, win_length_, n_mels_, padding_, power_, stft_, window_, melscale_fbanks_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 143-143: constructor does not initialize these fields: n_fft_, hop_length_, win_length_, n_mels_, padding_, stft_, window_, melscale_fbanks_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 143-143: 7 adjacent parameters of 'MelSpectrogramFeatures' of similar type are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 143-143: 24000 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 143-143: 1024 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 144-144: 256 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 144-144: 100 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 167-167: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 201-201: variable 'threshold' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 220-220: constructor does not initialize these fields: sample_rate_, hop_length_, chunk_length_, n_samples_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 220-220: 4 adjacent parameters of 'MiniCPMOAudioProcessor' of similar type ('int') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 220-220: 16000 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 220-220: 80 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 220-220: 160 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 221-221: 30 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 230-230: variable 'pool_step' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 233-233: variable 'feature_lens' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 239-239: variable 'output_lens' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 241-241: variable 'audio_placeholder' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 245-245: variable 'fbank_feat_in_chunk' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 246-246: variable 'cnn_feat_in_chunk' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 247-247: variable 'audio_embeds_in_chunk' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 248-248: variable 'num_audio_chunks' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 250-250: variable 'total_unk_len' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 276-276: variable 'audio_length' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 277-277: variable 'padded_length' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 280-280: variable 'audio_ptr' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 282-282: if with identical then and else branches
(bugprone-branch-clone,-warnings-as-errors)
[error] 307-307: variable 'padded_length' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 310-310: variable 'audio_ptr' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 312-312: if with identical then and else branches
(bugprone-branch-clone,-warnings-as-errors)
mllm/models/minicpm_o2_6/tokenization_minicpmo.hpp
[error] 333-333: variable 'audio_features' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 374-374: variable 'final_text' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 386-386: variable 'audio_placeholder' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 387-387: variable 'audio_placeholder_pos' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 411-411: variable 'audio_start_id' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 412-412: variable 'audio_end_id' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 422-422: 2 adjacent parameters of 'convertToTensors' of similar type ('int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 423-423: all parameters should be named in a function
(readability-named-parameter,-warnings-as-errors)
[error] 426-426: variable 'result' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 439-439: variable 'channels' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 440-440: variable 'patch_size' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 441-441: variable 'HW_patch_size' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 441-441: invalid case style for variable 'HW_patch_size'
(readability-identifier-naming,-warnings-as-errors)
[error] 505-505: variable 'audio_bounds_tensor' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 520-520: redundant access specifier has the same accessibility as the previous access specifier
(readability-redundant-access-specifiers,-warnings-as-errors)
mllm/models/minicpm_o2_6/modeling_whisper_encoder.hpp
[error] 5-5: 'mllm/mllm.hpp' file not found
(clang-diagnostic-error)
[error] 15-15: constructor does not initialize these fields: embed_dim_, num_heads_, head_dim_, dropout_, k_proj_, v_proj_, q_proj_, out_proj_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 29-29: constructor does not initialize these fields: embed_dim_, num_heads_, head_dim_, k_proj_, v_proj_, q_proj_, out_proj_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 29-29: 4 adjacent parameters of 'WhisperEncoderAttention' of convertible types are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 29-29: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
[error] 39-39: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 58-58: variable 'scale' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 77-77: constructor does not initialize these fields: embed_dim_, ffn_dim_, dropout_, activation_dropout_, self_attn_layer_norm_, fc1_, fc2_, final_layer_norm_, activation_fn_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 93-93: constructor does not initialize these fields: embed_dim_, ffn_dim_, self_attn_layer_norm_, fc1_, fc2_, final_layer_norm_, activation_fn_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 93-93: 2 adjacent parameters of 'WhisperEncoderLayer' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 97-97: 'dropout_' should be initialized in a member initializer of the constructor
(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)
[error] 97-97: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
[error] 98-98: 'activation_dropout_' should be initialized in a member initializer of the constructor
(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)
[error] 98-98: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
[error] 108-108: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 134-134: constructor does not initialize these fields: embed_dim_, num_mel_bins_, max_source_positions_, num_layers_, conv1_, gelu1_, conv2_, gelu2_, embed_positions_, layers_, layer_norm_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 151-151: constructor does not initialize these fields: embed_dim_, num_mel_bins_, max_source_positions_, num_layers_, conv1_, gelu1_, conv2_, gelu2_, embed_positions_, layers_, layer_norm_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 151-151: 2 adjacent parameters of 'WhisperEncoder' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 177-177: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
examples/minicpm_o/audio_infer.cpp
[error] 4-4: 'iostream' file not found
(clang-diagnostic-error)
[error] 16-16: variable 'MLLM_MAIN' is non-const and globally accessible, consider making it const
(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)
[error] 16-16: invalid case style for variable 'MLLM_MAIN'
(readability-identifier-naming,-warnings-as-errors)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-android
- GitHub Check: build-macos
| models::minicpmo::StreamingGenerationConfig stream_config; | ||
| stream_config.generate_audio = true; | ||
| stream_config.output_chunk_size = 25; | ||
| stream_config.max_new_tokens = 100; | ||
| stream_config.force_no_stop = false; | ||
| stream_config.top_p = 0.7f; | ||
| stream_config.top_k = 20; | ||
| stream_config.sampling = false; | ||
| stream_config.tts_temperature = {0.1f, 0.3f, 0.1f, 0.3f}; | ||
|
|
||
| auto streaming_gen = models::minicpmo::StreamingGenerator(gen_ids, Tensor::nil(), model, minicpmo_tokenizer, | ||
| &chattts_tokenizer, stream_config, chattts_config); | ||
| int chunk_count = 0; | ||
| std::vector<Tensor> audio_chunks; | ||
| for (auto& output : streaming_gen) { | ||
| fmt::print("\n--- Chunk {} ---\n", chunk_count); | ||
| ++chunk_count; | ||
| fmt::print("Text: {}\n", output.text); | ||
|
|
||
| if (output.audio_wav && !output.audio_wav.value().isNil()) { | ||
| auto& audio = output.audio_wav.value(); | ||
| fmt::print("Audio: [{} samples @ {}Hz]\n", audio.shape()[1], output.sampling_rate); | ||
| audio_chunks.emplace_back(audio); | ||
| } | ||
|
|
||
| mllm::ModelFileVersion file_version = mllm::ModelFileVersion::kV1; | ||
| if (model_version.get() == "v1") { | ||
| file_version = mllm::ModelFileVersion::kV1; | ||
| } else if (model_version.get() == "v2") { | ||
| file_version = mllm::ModelFileVersion::kV2; | ||
| if (output.finished) { | ||
| fmt::print("\n=== Generation Finished ===\n"); | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| if (help.isSet()) { | ||
| Argparse::printHelp(); | ||
| mllm::shutdownContext(); | ||
| return 0; | ||
| } | ||
| { | ||
| auto minicpmo_cfg = mllm::models::minicpmo::MiniCPMOConfig(config_path.get()); | ||
| auto minicpmo_tokenizer = mllm::models::minicpmo::MiniCPMOTokenizer(tokenizer_path.get()); | ||
| auto minicpmo = mllm::models::minicpmo::MiniCPMOForCausalLM(minicpmo_cfg); | ||
|
|
||
| auto param = mllm::load(model_path.get(), file_version); | ||
| minicpmo.llm_.llm.load(param); | ||
| minicpmo.vpm_.load(param); | ||
| minicpmo.resampler_.load(param); | ||
| // minicpmo.audio_proj_.load(param); | ||
| // minicpmo.tts_proj_.load(param); | ||
|
|
||
| fmt::print("\n{:*^60}\n", " MiniCPM-o Interactive CLI "); | ||
| fmt::print("Enter 'exit' or 'quit' to end the session\n"); | ||
|
|
||
| std::string image_path = "/Users/luis/Desktop/plane.png"; | ||
| std::string prompt_text = "描述图片中物体"; | ||
| mllm::models::minicpmo::MiniCPMOMessage message; | ||
| message.prompt = prompt_text; | ||
| message.img_file_path = image_path; | ||
|
|
||
| fmt::print("Processing...\n"); | ||
| auto inputs = minicpmo_tokenizer.convertMessage(message); | ||
|
|
||
| fmt::print("\nResponse: "); | ||
|
|
||
| int token_count = 0; | ||
| for (auto& step : minicpmo.chat(inputs)) { | ||
| auto token_str = minicpmo_tokenizer.detokenize(step.cur_token_id); | ||
| std::wcout << token_str << std::flush; | ||
|
|
||
| token_count++; | ||
| if (token_count >= 50) break; // Limit output for debugging | ||
| } | ||
| Tensor audio_output = nn::functional::concat(audio_chunks, -1); | ||
|
|
||
| print("Final audio shape:", audio_output.shape(), audio_output); | ||
| audio_output = audio_output * 32767; | ||
|
|
||
| wenet::WavWriter wav_writer2(audio_output.ptr<float>(), audio_output.shape().back(), 1, 24000, 16); | ||
| wav_writer2.Write("./omni.wav"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Guard against empty audio, use runtime sampling rate, and verify manual scaling
There are a few issues/opportunities in the streaming/audio output section:
- If no
audio_wavchunks are returned,concat(audio_chunks, -1)will be called with an empty vector, which is likely to assert or crash. - The WAV writer is hard‑coded to
24000Hz even thoughoutput.sampling_rateis available. audio_outputis manually scaled by32767before being passed asfloat*toWavWriter; depending onWavWriter’s contract this may be redundant or incorrect.
You can harden this block and align it with the runtime metadata like this:
- int chunk_count = 0;
- std::vector<Tensor> audio_chunks;
+ int chunk_count = 0;
+ std::vector<Tensor> audio_chunks;
+ int audio_sampling_rate = 24000;
@@
- if (output.audio_wav && !output.audio_wav.value().isNil()) {
- auto& audio = output.audio_wav.value();
- fmt::print("Audio: [{} samples @ {}Hz]\n", audio.shape()[1], output.sampling_rate);
- audio_chunks.emplace_back(audio);
- }
+ if (output.audio_wav && !output.audio_wav.value().isNil()) {
+ auto& audio = output.audio_wav.value();
+ fmt::print("Audio: [{} samples @ {}Hz]\n", audio.shape()[1], output.sampling_rate);
+ audio_chunks.emplace_back(audio);
+ audio_sampling_rate = output.sampling_rate;
+ }
@@
- Tensor audio_output = nn::functional::concat(audio_chunks, -1);
-
- print("Final audio shape:", audio_output.shape(), audio_output);
- audio_output = audio_output * 32767;
-
- wenet::WavWriter wav_writer2(audio_output.ptr<float>(), audio_output.shape().back(), 1, 24000, 16);
- wav_writer2.Write("./omni.wav");
+ if (audio_chunks.empty()) {
+ fmt::print("No audio chunks were generated; skipping WAV write.\n");
+ } else {
+ Tensor audio_output = nn::functional::concat(audio_chunks, -1);
+
+ print("Final audio shape:", audio_output.shape(), audio_output);
+ // Verify whether this manual scaling matches WavWriter's expected input range.
+ audio_output = audio_output * 32767;
+
+ wenet::WavWriter wav_writer2(
+ audio_output.ptr<float>(), audio_output.shape().back(), 1, audio_sampling_rate, 16);
+ wav_writer2.Write("./omni.wav");
+ }This prevents a crash when no audio is produced and keeps the sample rate consistent with what the model actually outputs. Please double‑check WavWriter’s expected input format to confirm whether the explicit * 32767 scaling is correct or should be removed/adjusted.
🏁 Script executed:
#!/bin/bash
# Find WavWriter class to understand its contract
rg -n "class WavWriter" --type=cpp --type=h -A 15Length of output: 1344
🏁 Script executed:
#!/bin/bash
# Find nn::functional::concat implementation
rg -n "concat" --type=cpp --type=h -A 5 | grep -E "(concat|functional)" -A 5 -B 2 | head -100Length of output: 7620
🏁 Script executed:
#!/bin/bash
# Verify the actual code in main.cpp at the specified lines
cat examples/minicpm_o/main.cpp | sed -n '79,116p'Length of output: 1469
🏁 Script executed:
#!/bin/bash
# Check if there are any guards or checks before concat elsewhere in the codebase
rg -n "audio_chunks" --type=cpp examples/Length of output: 531
🏁 Script executed:
#!/bin/bash
# Find nn::functional::concat implementation
find . -name "*.h" -o -name "*.cpp" | xargs grep -l "namespace functional" | head -5Length of output: 0
🏁 Script executed:
#!/bin/bash
# Search for concat function implementation
rg -n "functional::concat|fn::concat" --type=cpp --type=h -A 20 | head -100Length of output: 9243
🏁 Script executed:
#!/bin/bash
# Check if there's any documentation or comments about tensor concat with empty vectors
rg -n "def concat|void concat|Tensor concat" --type=cpp --type=h -B 3 -A 10 | head -150Length of output: 8337
🏁 Script executed:
#!/bin/bash
# Look for any existing empty checks in similar audio processing code
rg -n "empty()" --type=cpp examples/ -B 2 -A 2Length of output: 2313
🏁 Script executed:
#!/bin/bash
# Search for OpTypes::kConcat operation implementation
rg -n "kConcat|concat.*Op" --type=cpp --type=h mllm/nn/ -B 2 -A 15 | head -200Length of output: 1116
🏁 Script executed:
#!/bin/bash
# Look at WavWriter Write implementation more carefully
cat third_party/wenet_audio/wav.h | sed -n '157,250p'Length of output: 2195
🏁 Script executed:
#!/bin/bash
# Check for any tests or examples that show concat with various inputs
rg -n "functional::concat" --type=cpp -B 5 -A 5 | grep -E "(empty|vector|list|\[\])" -B 3 -A 3 | head -80Length of output: 6572
🏁 Script executed:
#!/bin/bash
# Look for how audio_wav is populated - could it be nil/empty for all chunks?
rg -n "audio_wav\|output.audio_wav" --type=cpp --type=h -B 2 -A 2 | head -60Length of output: 0
🏁 Script executed:
#!/bin/bash
# Find StreamingGenerator output structure definition
rg -n "struct.*output|class.*Output" --type=cpp --type=h mllm/models/minicpm_o* -A 10 | grep -E "(audio_wav|sampling_rate)" -B 5 -A 5 | head -80Length of output: 0
🏁 Script executed:
#!/bin/bash
# Search for buildOpAndSubmitTask to understand how it handles empty vectors
rg -n "buildOpAndSubmitTask" --type=cpp --type=h -B 3 -A 20 | head -150Length of output: 11003
🏁 Script executed:
#!/bin/bash
# Look for ConcatOp or concat operation actual implementation
rg -n "class.*ConcatOp|ConcatOpOptions" --type=cpp --type=h -A 20 | head -150Length of output: 10101
🏁 Script executed:
#!/bin/bash
# Check if there are integration tests showing concat behavior
find . -name "*test*.cpp" -o -name "*test*.h" | xargs grep -l "concat" | head -3Length of output: 0
Guard against empty audio chunks, use runtime sampling rate
Verification confirms all issues:
-
Empty audio crash:
ConcatOp::reshape()exits with error if inputs are empty (seemllm/core/aops/ConcatOp.cpp:29-31). If noaudio_wavchunks are generated, line 110 will crash. -
Hard-coded sampling rate: Line 115 uses
24000instead ofoutput.sampling_rateavailable from the loop (line 102). -
Manual scaling: The
* 32767is correct—WavWriter::Write()casts float directly toint16_tfor 16-bit PCM, so the scaling converts normalized float[-1, 1]to PCM range appropriately.
Apply the suggested fixes at examples/minicpm_o/main.cpp:92–116 and examples/minicpm_o/demo_minicpmo_streaming.cpp:114–137 to add empty vector guard and capture runtime sampling rate.
🤖 Prompt for AI Agents
In examples/minicpm_o/main.cpp around lines 92–116 and
examples/minicpm_o/demo_minicpmo_streaming.cpp around lines 114–137, guard
against calling concat and writing a WAV when audio_chunks is empty and use the
runtime sampling rate collected from the streaming outputs instead of the
hard-coded 24000; specifically, track the last non-zero output.sampling_rate
inside the loop, after the loop check if audio_chunks.empty() and skip WAV
creation (or log/return) to avoid ConcatOp crashes, otherwise call concat, keep
the existing float-to-PCM scaling, and pass the captured sampling_rate to
WavWriter instead of 24000.
| mllm::ModelFileVersion file_version = mllm::ModelFileVersion::kV1; | ||
| if (model_version.get() == "v1") { | ||
| file_version = mllm::ModelFileVersion::kV1; | ||
| } else if (model_version.get() == "v2") { | ||
| file_version = mllm::ModelFileVersion::kV2; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate model_version instead of silently defaulting to v1
If -mv|--model_version receives anything other than "v1" or "v2", file_version remains kV1. That silent fallback can hide typos and cause confusing behavior (loading the wrong layout for a v2 model file).
Consider explicitly rejecting unknown versions, e.g., by printing an error and exiting, or at least logging a warning when model_version isn’t recognized.
| fmt::print("\n{:*^60}\n", " MiniCPM4 Interactive CLI "); | ||
| fmt::print("Enter 'exit' or 'quit' to end the session\n\n"); | ||
|
|
||
| std::string prompt_text = "你是谁?"; | ||
|
|
||
| try { | ||
| auto inputs = minicpm4_tokenizer.convertMessage({.prompt = prompt_text}); | ||
|
|
||
| fmt::print("\nResponse: "); | ||
|
|
||
| mllm::models::ARGenerationArgs args; | ||
| args["temperature"] = mllm::AnyValue(0.8f); | ||
| args["top_p"] = mllm::AnyValue(0.9f); | ||
| args["do_sample"] = mllm::AnyValue(true); | ||
|
|
||
| for (auto& step : minicpm4.chat(inputs, args)) { | ||
| auto token_str = minicpm4_tokenizer.detokenize(step.cur_token_id); | ||
| if (token_str == "<|im_end|>") { break; } | ||
| std::cout << token_str << std::flush; | ||
| } | ||
|
|
||
| fmt::print("\n{}\n", std::string(60, '-')); | ||
| } catch (const std::exception& e) { fmt::print("\nError: {}\n{}\n", e.what(), std::string(60, '-')); } | ||
|
|
||
| minicpm4.perfSummary(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CLI is not actually interactive despite prompts and naming
The banner and message:
" MiniCPM4 Interactive CLI ""Enter 'exit' or 'quit' to end the session"
imply a read–eval–print loop, but the code only ever runs a single, hard‑coded prompt ("你是谁?") and then exits—there is no loop or std::getline over user input.
Either:
- Implement a proper REPL that reads user input until
"exit"/"quit"and re‑invokesconvertMessage/chat, or - Downgrade the messaging to indicate this is a single‑shot demo and remove the
exit/quitnotice.
As it stands, the UX is misleading for anyone expecting an interactive session.
🤖 Prompt for AI Agents
In examples/minicpm4/main.cpp around lines 52 to 76, the code displays an
"Interactive CLI" banner and tells the user to "Enter 'exit' or 'quit' to end
the session" but only runs a single hard-coded prompt ("你是谁?") and exits; fix by
either implementing a REPL loop that repeatedly reads input with std::getline,
checks for "exit"/"quit", calls convertMessage and minicpm4.chat for each input,
prints responses and continues until quit, or change the banner and help text to
indicate this is a single-shot demo and remove the exit/quit instructions so the
messaging matches behavior.
| struct AvgPool1dOpOptions : public BaseOpOptions<AvgPool1dOpOptions> { | ||
| int32_t kernel_size; | ||
| int32_t stride; | ||
| int32_t padding = 0; | ||
| bool ceil_mode = false; | ||
| bool count_include_pad = true; | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initialize required fields in AvgPool1dOpOptions.
The fields kernel_size and stride are not initialized with default values. This can lead to undefined behavior when the struct is default-constructed (as seen in mllm/nn/layers/AvgPool1d.cpp line 7, where AvgPool1dOpOptions{} is used).
Apply this diff to add default values:
struct AvgPool1dOpOptions : public BaseOpOptions<AvgPool1dOpOptions> {
- int32_t kernel_size;
- int32_t stride;
+ int32_t kernel_size = 1;
+ int32_t stride = 1;
int32_t padding = 0;
bool ceil_mode = false;
bool count_include_pad = true;
};📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| struct AvgPool1dOpOptions : public BaseOpOptions<AvgPool1dOpOptions> { | |
| int32_t kernel_size; | |
| int32_t stride; | |
| int32_t padding = 0; | |
| bool ceil_mode = false; | |
| bool count_include_pad = true; | |
| }; | |
| struct AvgPool1dOpOptions : public BaseOpOptions<AvgPool1dOpOptions> { | |
| int32_t kernel_size = 1; | |
| int32_t stride = 1; | |
| int32_t padding = 0; | |
| bool ceil_mode = false; | |
| bool count_include_pad = true; | |
| }; |
🧰 Tools
🪛 Clang (14.0.6)
[error] 12-12: constructor does not initialize these fields: kernel_size, stride
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
🤖 Prompt for AI Agents
In mllm/core/aops/AvgPool1dOp.hpp around lines 12 to 18, the AvgPool1dOpOptions
struct leaves kernel_size and stride uninitialized which leads to undefined
behavior when default-constructed; set sensible default values (e.g.,
kernel_size = 1 and stride = 1) in the struct definition so AvgPool1dOpOptions{}
produces valid, initialized members while keeping existing defaults for padding,
ceil_mode, and count_include_pad.
| // Create a Hann window | ||
| inline Tensor create_hann_window(int32_t window_length, bool periodic = true) { | ||
| int32_t length = periodic ? window_length + 1 : window_length; | ||
| auto window = Tensor::empty({1, window_length}, kFloat32, kCPU).alloc(); | ||
| float* window_ptr = window.ptr<float>(); | ||
|
|
||
| for (int32_t i = 0; i < window_length; ++i) { | ||
| // Hann window formula: 0.5 - 0.5 * cos(2π * n / (N-1)) | ||
| // For periodic: use N instead of N-1 | ||
| float n = static_cast<float>(i); | ||
| float denominator = periodic ? static_cast<float>(length) : static_cast<float>(length - 1); | ||
| window_ptr[i] = 0.5f - 0.5f * std::cos(2.0f * M_PI * n / denominator); | ||
| } | ||
|
|
||
| return window; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Periodic Hann window formula is likely incorrect vs standard/Whisper behavior
create_hann_window uses length = window_length + 1 and denominator = length when periodic == true, but still iterates i only up to window_length - 1. This yields 0.5 - 0.5 * cos(2π n / (N+1)) for n ∈ [0, N-1], which does not match the usual periodic Hann window (2π n / N) or the standard non‑periodic form (2π n / (N-1)).
Given this window is fed directly into nn::STFT and you explicitly aim to match Whisper/NumPy behavior, this mismatch will change the STFT magnitudes and thus the mel features.
Consider simplifying to an explicit, standard formula, e.g.:
-inline Tensor create_hann_window(int32_t window_length, bool periodic = true) {
- int32_t length = periodic ? window_length + 1 : window_length;
- auto window = Tensor::empty({1, window_length}, kFloat32, kCPU).alloc();
- float* window_ptr = window.ptr<float>();
-
- for (int32_t i = 0; i < window_length; ++i) {
- // Hann window formula: 0.5 - 0.5 * cos(2π * n / (N-1))
- // For periodic: use N instead of N-1
- float n = static_cast<float>(i);
- float denominator = periodic ? static_cast<float>(length) : static_cast<float>(length - 1);
- window_ptr[i] = 0.5f - 0.5f * std::cos(2.0f * M_PI * n / denominator);
- }
+inline Tensor create_hann_window(int32_t window_length, bool periodic = true) {
+ auto window = Tensor::empty({1, window_length}, kFloat32, kCPU).alloc();
+ float* window_ptr = window.ptr<float>();
+
+ const float denom = periodic ? static_cast<float>(window_length)
+ : static_cast<float>(window_length - 1);
+ for (int32_t i = 0; i < window_length; ++i) {
+ float n = static_cast<float>(i);
+ window_ptr[i] = 0.5f - 0.5f * std::cos(2.0f * M_PI * n / denom);
+ }
return window;
}This aligns periodic and non‑periodic forms with common definitions and avoids the off‑by‑one in the periodic branch.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Create a Hann window | |
| inline Tensor create_hann_window(int32_t window_length, bool periodic = true) { | |
| int32_t length = periodic ? window_length + 1 : window_length; | |
| auto window = Tensor::empty({1, window_length}, kFloat32, kCPU).alloc(); | |
| float* window_ptr = window.ptr<float>(); | |
| for (int32_t i = 0; i < window_length; ++i) { | |
| // Hann window formula: 0.5 - 0.5 * cos(2π * n / (N-1)) | |
| // For periodic: use N instead of N-1 | |
| float n = static_cast<float>(i); | |
| float denominator = periodic ? static_cast<float>(length) : static_cast<float>(length - 1); | |
| window_ptr[i] = 0.5f - 0.5f * std::cos(2.0f * M_PI * n / denominator); | |
| } | |
| return window; | |
| } | |
| // Create a Hann window | |
| inline Tensor create_hann_window(int32_t window_length, bool periodic = true) { | |
| auto window = Tensor::empty({1, window_length}, kFloat32, kCPU).alloc(); | |
| float* window_ptr = window.ptr<float>(); | |
| const float denom = periodic ? static_cast<float>(window_length) | |
| : static_cast<float>(window_length - 1); | |
| for (int32_t i = 0; i < window_length; ++i) { | |
| float n = static_cast<float>(i); | |
| window_ptr[i] = 0.5f - 0.5f * std::cos(2.0f * M_PI * n / denom); | |
| } | |
| return window; | |
| } |
🧰 Tools
🪛 Clang (14.0.6)
[error] 45-45: variable 'length' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 47-47: variable 'window_ptr' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
🤖 Prompt for AI Agents
In mllm/models/minicpm_o2_6/audio_preprocessor_minicpmo.hpp around lines 43-58,
the periodic Hann branch uses length = window_length + 1 and then divides by
length while iterating i in [0, window_length-1], producing an off-by-one
denominator; change to use the standard formulas: set denominator = periodic ?
static_cast<float>(window_length) : static_cast<float>(window_length - 1) (and
keep the loop range 0..window_length-1), so the periodic case uses 2π * n / N
and the non-periodic case uses 2π * n / (N-1), matching NumPy/Whisper behavior.
Ensure the Tensor allocation and indices remain based on window_length.
| if (data()["eos_token_id"].is_array()) { | ||
| eos_token_ids = data()["eos_token_id"].get<std::vector<int64_t>>(); | ||
| eos_token_id = eos_token_ids[0]; | ||
| } else { | ||
| eos_token_id = data()["eos_token_id"]; | ||
| eos_token_ids.push_back(eos_token_id); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guard against empty eos_token_id arrays before indexing
When eos_token_id is an array, eos_token_ids[0] is accessed without checking that the array is non‑empty. A malformed or partially edited config would cause undefined behavior here.
Consider validating and failing fast, for example:
- if (data()["eos_token_id"].is_array()) {
- eos_token_ids = data()["eos_token_id"].get<std::vector<int64_t>>();
- eos_token_id = eos_token_ids[0];
- } else {
+ if (data()["eos_token_id"].is_array()) {
+ eos_token_ids = data()["eos_token_id"].get<std::vector<int64_t>>();
+ if (eos_token_ids.empty()) {
+ throw std::runtime_error("MiniCPM4Config: `eos_token_id` array must not be empty");
+ }
+ eos_token_id = eos_token_ids[0];
+ } else {
eos_token_id = data()["eos_token_id"];
eos_token_ids.push_back(eos_token_id);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (data()["eos_token_id"].is_array()) { | |
| eos_token_ids = data()["eos_token_id"].get<std::vector<int64_t>>(); | |
| eos_token_id = eos_token_ids[0]; | |
| } else { | |
| eos_token_id = data()["eos_token_id"]; | |
| eos_token_ids.push_back(eos_token_id); | |
| } | |
| if (data()["eos_token_id"].is_array()) { | |
| eos_token_ids = data()["eos_token_id"].get<std::vector<int64_t>>(); | |
| if (eos_token_ids.empty()) { | |
| throw std::runtime_error("MiniCPM4Config: `eos_token_id` array must not be empty"); | |
| } | |
| eos_token_id = eos_token_ids[0]; | |
| } else { | |
| eos_token_id = data()["eos_token_id"]; | |
| eos_token_ids.push_back(eos_token_id); | |
| } |
🧰 Tools
🪛 Clang (14.0.6)
[error] 28-28: if with identical then and else branches
(bugprone-branch-clone,-warnings-as-errors)
🤖 Prompt for AI Agents
In mllm/models/minicpm4/configuration_minicpm4.hpp around lines 28-34, the code
assumes data()["eos_token_id"] array contains at least one element and directly
accesses eos_token_ids[0]; add a guard to validate the array is non-empty before
indexing and fail fast if it is empty (e.g., log/throw an informative exception
or return an error), and ensure eos_token_id and eos_token_ids are set
consistently afterwards; also validate the non-array path (presence/type) and
handle malformed or missing values with the same fail-fast behavior.
| // RoPE scaling configuration | ||
| if (data().contains("rope_scaling") && !data()["rope_scaling"].is_null()) { | ||
| auto rope_scaling = data()["rope_scaling"]; | ||
| rope_scaling_type = rope_scaling["rope_type"]; | ||
| rope_scaling_factor = rope_scaling.value("factor", 1.0); | ||
|
|
||
| if (rope_scaling_type == "longrope") { | ||
| original_max_position_embeddings = rope_scaling["original_max_position_embeddings"]; | ||
|
|
||
| // Load short_factor and long_factor arrays | ||
| if (rope_scaling.contains("short_factor")) { short_factor = rope_scaling["short_factor"].get<std::vector<float>>(); } | ||
| if (rope_scaling.contains("long_factor")) { long_factor = rope_scaling["long_factor"].get<std::vector<float>>(); } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate RoPE scaling factors for longrope configs
rope_scaling_type, short_factor, and long_factor are loaded here, but there’s no validation that:
long_factoris populated whenshort_factoris non‑empty, and- both factor vectors have length at least
head_dim / 2(the length used later inmakeRoPEInvFreqWithLongRoPE).
Given the modeling code unconditionally indexes ext_factors[i] when !cfg.short_factor.empty(), a config with only short_factor (or mismatched lengths) will cause out‑of‑bounds reads.
Suggest adding basic shape/consistency checks after loading the vectors (and either throwing with a clear message or falling back to unscaled RoPE if invalid).
🤖 Prompt for AI Agents
mllm/models/minicpm4/configuration_minicpm4.hpp around lines 46 to 59: after
loading rope_scaling_type, original_max_position_embeddings, short_factor and
long_factor, add validation that if short_factor is non-empty then long_factor
is also non-empty, and verify both short_factor.size() and long_factor.size()
are >= required_len (compute required_len as head_dim/2 or pass head_dim in
scope); if checks fail either throw a clear std::runtime_error with a
descriptive message including which vector/mismatch failed, or reset
rope_scaling_type to disable longrope (fallback to unscaled RoPE) and log/warn
the issue; ensure validations occur before any later code that indexes
ext_factors to prevent out‑of‑bounds access.
| inline auto makeRoPEInvFreqWithLongRoPE(int output_dim, float rope_theta, const MiniCPM4Config& cfg, int seq_len) -> Tensor { | ||
| auto inv_freq = Tensor::empty({output_dim / 2}, kFloat32, kCPU).alloc(); | ||
| auto inv_freq_ptr = inv_freq.ptr<float>(); | ||
|
|
||
| for (int i = 0; i < output_dim / 2; i++) { inv_freq_ptr[i] = 1.0 / std::pow(rope_theta, 2.0 * i / output_dim); } | ||
|
|
||
| if (!cfg.short_factor.empty()) { | ||
| const auto& ext_factors = (seq_len > cfg.original_max_position_embeddings) ? cfg.long_factor : cfg.short_factor; | ||
|
|
||
| for (int i = 0; i < output_dim / 2; i++) { inv_freq_ptr[i] = inv_freq_ptr[i] / ext_factors[i]; } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid out‑of‑bounds access when applying long/short RoPE factors
makeRoPEInvFreqWithLongRoPE assumes that whenever !cfg.short_factor.empty():
cfg.long_factoris also populated for the long‑sequence case, and- the chosen
ext_factorsvector has length ≥output_dim / 2.
If a config ships with only short_factor (or with shorter vectors than expected), ext_factors[i] will read out of bounds.
It’s safer to gate this scaling on both vectors being well‑formed, and to assert or fall back when sizes don’t match:
- if (!cfg.short_factor.empty()) {
- const auto& ext_factors = (seq_len > cfg.original_max_position_embeddings) ? cfg.long_factor : cfg.short_factor;
-
- for (int i = 0; i < output_dim / 2; i++) { inv_freq_ptr[i] = inv_freq_ptr[i] / ext_factors[i]; }
- }
+ const int dim_half = output_dim / 2;
+ if (!cfg.short_factor.empty()) {
+ const auto& ext_factors =
+ (seq_len > cfg.original_max_position_embeddings) ? cfg.long_factor : cfg.short_factor;
+ if ((int)ext_factors.size() < dim_half) {
+ throw std::runtime_error("MiniCPM4: RoPE scaling factors length mismatch");
+ }
+ for (int i = 0; i < dim_half; ++i) {
+ inv_freq_ptr[i] /= ext_factors[i];
+ }
+ }🧰 Tools
🪛 Clang (14.0.6)
[error] 16-16: 4 adjacent parameters of 'makeRoPEInvFreqWithLongRoPE' of convertible types are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
🤖 Prompt for AI Agents
In mllm/models/minicpm4/modeling_minicpm4.hpp around lines 16–26, the code
applies ext_factors[i] without ensuring ext_factors exists and is at least
output_dim/2 long, risking out‑of‑bounds access; update the logic to (1) choose
ext_factors only if the intended vector (long_factor for long sequences or
short_factor) is nonempty and its size() >= output_dim/2, (2) otherwise fall
back to no scaling (or to the other factor if it is valid), and (3) optionally
add an assert/log when sizes mismatch; implement the scaling loop using the
validated size boundary (or min(size, output_dim/2)) so you never index past the
vector bounds.
| explicit MiniCPM4ForCausalLM(const MiniCPM4Config& cfg) : cfg(cfg) { | ||
| kv_cache_ = nn::StaticCache(cfg.max_cache_length, cfg.num_hidden_layers, | ||
| cfg.num_attention_heads, // q_heads | ||
| cfg.num_key_value_heads, // kv_heads | ||
| cfg.hidden_size / cfg.num_attention_heads, // kv_dim | ||
| kFloat32, // k_dtype | ||
| kFloat32, // v_dtype | ||
| kCPU, // device_type | ||
| false // use_fa2 | ||
| ); | ||
| eos_token_id_ = cfg.eos_token_id; | ||
| max_length_ = cfg.max_cache_length; | ||
| tie_word_embeddings_ = cfg.tie_word_embeddings; | ||
|
|
||
| llm = reg<MiniCPM4Text>("model", cfg); | ||
|
|
||
| // Only create lm_head if NOT using tied embeddings | ||
| if (!cfg.tie_word_embeddings) { | ||
| lm_head_ = reg<nn::Linear>("lm_head", cfg.hidden_size, cfg.vocab_size, false, cfg.linear_impl_type); | ||
| } | ||
|
|
||
| longrope_scaling_factor_ = computeLongRoPEScalingFactor(cfg); | ||
| } | ||
|
|
||
| ARGenerationOutputPast forward(const ARGenerationOutputPast& input, const ARGenerationArgs& args) override { | ||
| auto sequence = input.at("sequence"); | ||
|
|
||
| // Generate position_ids for the current sequence | ||
| auto batch_size = sequence.shape()[0]; | ||
| auto seq_len = sequence.shape()[1]; | ||
|
|
||
| auto inv_freq = makeRoPEInvFreqWithLongRoPE(cfg.hidden_size / cfg.num_attention_heads, cfg.rope_theta, cfg, seq_len); | ||
|
|
||
| Tensor position_ids = Tensor::nil(); | ||
| if (input.count("position_ids")) { | ||
| // Use existing position_ids for decode phase | ||
| position_ids = input.at("position_ids"); | ||
|
|
||
| // For decode phase, increment the last position | ||
| if (seq_len == 1) { | ||
| auto last_pos = *position_ids.offsettedPtr<int64_t>({0, position_ids.shape()[1] - 1}); | ||
| position_ids = Tensor::empty({batch_size, 1}, kInt64, kCPU).alloc(); | ||
| *position_ids.offsettedPtr<int64_t>({0, 0}) = last_pos + 1; | ||
| } | ||
| } else { | ||
| // Generate position_ids for prefill phase | ||
| position_ids = Tensor::empty({batch_size, seq_len}, kInt64, kCPU).alloc(); | ||
| auto position_ids_ptr = position_ids.ptr<int64_t>(); | ||
| for (int b = 0; b < batch_size; ++b) { | ||
| for (int s = 0; s < seq_len; ++s) { position_ids_ptr[b * seq_len + s] = s; } | ||
| } | ||
| } | ||
|
|
||
| // Generate RoPE embeddings using the inv_freq buffer | ||
| auto [llm_embedding_sin, llm_embedding_cos] = makeRotaryPosEmbedding(position_ids, inv_freq, longrope_scaling_factor_); | ||
|
|
||
| sequence = llm(sequence, llm_embedding_sin, llm_embedding_cos, AnyValue(&kv_cache_))[0]; | ||
|
|
||
| // clip x to one seq length | ||
| { | ||
| auto S = sequence.shape()[1]; | ||
| sequence = sequence[{kAll, {S - 1}, kAll}]; | ||
| } | ||
|
|
||
| // Apply MiniCPM specific scaling before output | ||
| // logits = lm_head(hidden_states / (hidden_size / dim_model_base)) | ||
| if (cfg.dim_model_base != 1.0f) { | ||
| float scale_factor = cfg.hidden_size / cfg.dim_model_base; | ||
| sequence = sequence / scale_factor; | ||
| } | ||
|
|
||
| if (tie_word_embeddings_) { | ||
| auto embedding_weight = llm.embedding_.weight(); | ||
| sequence = nn::functional::matmul(sequence, embedding_weight, false, true); | ||
| } else { | ||
| sequence = lm_head_(sequence); | ||
| } | ||
|
|
||
| return { | ||
| {"sequence", sequence}, | ||
| {"position_ids", position_ids}, | ||
| }; | ||
| } | ||
|
|
||
| inline nn::StaticCache& kvCache() { return kv_cache_; } | ||
|
|
||
| private: | ||
| const MiniCPM4Config& cfg; | ||
| MiniCPM4Text llm; | ||
| nn::Linear lm_head_; | ||
| bool tie_word_embeddings_; | ||
| nn::StaticCache kv_cache_; | ||
| float longrope_scaling_factor_; | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid storing MiniCPM4Config by reference; this can dangle and is used in forward
MiniCPM4ForCausalLM stores:
private:
const MiniCPM4Config& cfg;and uses cfg in both the constructor and forward (e.g., cfg.hidden_size, cfg.dim_model_base). If the caller ever constructs the model from a temporary config:
MiniCPM4ForCausalLM model(MiniCPM4Config("config_minicpm4.json"));cfg will dangle as soon as the temporary is destroyed, leading to undefined behavior on the first forward.
To make this robust, prefer storing the config by value:
-class MiniCPM4ForCausalLM : public ARGeneration, public nn::Module {
+class MiniCPM4ForCausalLM : public ARGeneration, public nn::Module {
...
-private:
- const MiniCPM4Config& cfg;
+private:
+ MiniCPM4Config cfg;and update the constructor to take MiniCPM4Config cfg_ by value or const ref and copy.
🧰 Tools
🪛 Clang (14.0.6)
[error] 298-298: constructor does not initialize these fields: lm_head_, tie_word_embeddings_, kv_cache_, longrope_scaling_factor_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 322-322: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 331-331: variable 'position_ids' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 358-358: variable name 'S' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 358-358: invalid case style for variable 'S'
(readability-identifier-naming,-warnings-as-errors)
[error] 365-365: variable 'scale_factor' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
|
|
||
| namespace mllm::nn { | ||
|
|
||
| AvgPool1d::AvgPool1d() : Layer(OpTypes::kAvgPool1d, aops::AvgPool1dOpOptions{}) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default constructor uses uninitialized AvgPool1dOpOptions.
The default constructor initializes with aops::AvgPool1dOpOptions{}, which leaves kernel_size and stride uninitialized (as noted in the review of mllm/core/aops/AvgPool1dOp.hpp). This can lead to undefined behavior.
Consider one of these approaches:
Option 1: Remove the default constructor if it's not needed:
-AvgPool1d::AvgPool1d() : Layer(OpTypes::kAvgPool1d, aops::AvgPool1dOpOptions{}) {}Option 2: Provide sensible defaults in the constructor:
-AvgPool1d::AvgPool1d() : Layer(OpTypes::kAvgPool1d, aops::AvgPool1dOpOptions{}) {}
+AvgPool1d::AvgPool1d() : Layer(OpTypes::kAvgPool1d, aops::AvgPool1dOpOptions{
+ .kernel_size = 1,
+ .stride = 1,
+ .padding = 0,
+ .ceil_mode = false,
+ .count_include_pad = true}) {}Note: This issue will be resolved if the fields are initialized in AvgPool1dOpOptions as suggested in the review of mllm/core/aops/AvgPool1dOp.hpp.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| AvgPool1d::AvgPool1d() : Layer(OpTypes::kAvgPool1d, aops::AvgPool1dOpOptions{}) {} |
🤖 Prompt for AI Agents
In mllm/nn/layers/AvgPool1d.cpp around line 7, the default constructor uses
aops::AvgPool1dOpOptions{} which leaves kernel_size and stride uninitialized;
either remove this default constructor if not needed, or change it to construct
with explicit safe defaults (e.g. set kernel_size and stride to 1 or other
sensible values) so AvgPool1dOpOptions fields are initialized before passing to
Layer; alternatively, ensure AvgPool1dOpOptions itself initializes its members
and adjust this constructor accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
mllm/models/minicpm_o2_6/streaming_generation.cpp (1)
207-217: Consider extracting num_spk_embs to avoid duplication.The mask construction logic is correct and aligns with the intended streaming TTS behavior. However,
num_spk_embs = 1is hardcoded here and also inprepare_tts_text(line 196), creating duplication.Consider extracting
num_spk_embsas a class member constant to maintain consistency and simplify future adjustments:// In the class header static constexpr int32_t kNumSpeakerEmbeddings = 1;Then reference it in both methods:
- int32_t num_spk_embs = 1; + int32_t num_spk_embs = kNumSpeakerEmbeddings;Additionally, the inline comment
// = 303on line 208 is only valid whenstreaming_text_reserved_len_equals 300, which may not be obvious. Consider either removing it or making it conditional on the actual value.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
mllm/models/minicpm_o2_6/streaming_generation.cpp(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
mllm/models/minicpm_o2_6/streaming_generation.cpp (1)
mllm/models/minicpm_o2_6/streaming_generation.hpp (1)
tts_token_lens(196-196)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-android
- GitHub Check: build-macos
🔇 Additional comments (1)
mllm/models/minicpm_o2_6/streaming_generation.cpp (1)
116-116: Verify that lowering min_new_token to 0 doesn't compromise TTS quality.The minimum token generation requirement has been lowered from 10 to 0, which allows the TTS model to finish generation immediately without producing any new tokens. This could potentially affect streaming audio quality or cause premature termination in edge cases.
Please confirm this change has been tested and doesn't degrade the streaming TTS experience.
Fixed the MiniCPM-O tokenization logic - the
convertMessagefunction previously could not encode images and audio simultaneouslyRefactored the code under
examples/minicpm_o/, splitting it into:main.cpp(omni end-to-end inference)audio_infer.cpp(audio + text inference)image_infer.cpp(image + text inference)Summary by CodeRabbit
New Features
Examples
✏️ Tip: You can customize this high-level summary in your review settings.