
[Bug]: Inference fails on older transformers versions due to DynamicLayer import and incorrect past_key_values initialization #339

@falcon-xu

Description


Is there an existing issue?

  • I have searched, and there is no existing issue.

Describe the bug

🐛 Bug Description

When running inference on MiniCPM4 with transformers==4.49.0, the model raises an ImportError followed by a ValueError. These errors prevent the forward pass from completing in environments that have not upgraded to the latest transformers release.

🔍 Root Cause Analysis

We identified two distinct but related issues during the model initialization and the first forward pass:

1. DynamicLayer Import Error
DynamicLayer was introduced in transformers version 4.54.1. In older versions (e.g., 4.49.0), importing it directly from transformers.cache_utils causes a fatal crash:

ImportError: cannot import name 'DynamicLayer' from 'transformers.cache_utils' 
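A version-tolerant import guard along these lines would avoid the crash. This is a sketch: the fallback body below is a placeholder, not the real DynamicLayer implementation that the fix would embed.

```python
try:
    # Available from transformers 4.54.1 onward.
    from transformers.cache_utils import DynamicLayer
except (ImportError, ModuleNotFoundError):
    # Older transformers (or none installed): fall back to a locally
    # embedded definition. Placeholder body for illustration only.
    class DynamicLayer:
        def __init__(self):
            self.keys = None
            self.values = None
```

On newer versions the `try` branch succeeds and the fallback class is never defined, so behavior is unchanged.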

2. past_key_values Initialization Logic Flaw

During the first forward pass, past_key_values is naturally None. However, the current check in MiniCPMModel.forward (around line 1940) rejects None as if it were a legacy tuple cache, because isinstance(None, Cache) evaluates to False:

ValueError: You must use the new past_key_values format, such as the Cache class, instead of the old tuple format

🛠️ Proposed Solution

To maximize backward compatibility without forcing users to upgrade their transformers package (which might break other dependencies), we propose the following minimal-impact fixes:

  1. Self-Contained Cache Classes:
    • Embed the CacheLayerMixin and DynamicLayer definitions directly in modeling_minicpm.py as a fallback. (On newer transformers versions, the embedded fallback is simply unused.)
  2. Refined Cache Check:
    • Update the validation logic in the forward method to explicitly allow past_key_values to be None on the first pass and to correctly initialize an InfLLMv2Cache or DynamicCache.

PR: openbmb/MiniCPM4-8B · Fix: resolve transformers version compatibility for DynamicLayer and cache initialization

To Reproduce

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
    "openbmb/MiniCPM4-8B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM4-8B", trust_remote_code=True)
prompt = "GitHub community standards dictate clear code reproduction."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    use_cache=True,
)

Expected behavior

Inference completes normally.

Screenshots

No response

Environment

- **Model:** MiniCPM4-8B / MiniCPM4-0.5B
- **Transformers Version:** <= 4.54.0 (e.g., 4.49.0)
- **PyTorch Version:** (Add your version here, e.g., 2.2.0)

Additional context

No response

Metadata

Labels: bug (Something isn't working), triage
