
[Bug]: Inference fails on older transformers versions due to DynamicLayer import and incorrect past_key_values initialization #339

@falcon-xu

Description


Is there an existing issue?

  • I have searched, and there is no existing issue.

Describe the bug

🐛 Bug Description

When running inference on MiniCPM4 with transformers==4.49.0, the model raises an ImportError followed by a ValueError. These errors prevent the forward pass from completing in environments that have not upgraded to the latest transformers release.

🔍 Root Cause Analysis

We identified two distinct but related issues during the model initialization and the first forward pass:

1. DynamicLayer Import Error
DynamicLayer was introduced in transformers version 4.54.1. In older versions (e.g., 4.49.0), importing it directly from transformers.cache_utils causes a fatal crash:

ImportError: cannot import name 'DynamicLayer' from 'transformers.cache_utils' 
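A version-tolerant import guard along these lines would avoid the crash. This is a sketch: the fallback body below is a placeholder, not the real DynamicLayer implementation that the fix would embed.

```python
try:
    # Available from transformers 4.54.1 onward.
    from transformers.cache_utils import DynamicLayer
except (ImportError, ModuleNotFoundError):
    # Older transformers (or none installed): fall back to a locally
    # embedded definition. Placeholder body for illustration only.
    class DynamicLayer:
        def __init__(self):
            self.keys = None
            self.values = None
```

On newer versions the `try` branch succeeds and the fallback class is never defined, so behavior is unchanged.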

2. past_key_values Initialization Logic Flaw

During the first forward pass, past_key_values is naturally None. However, the current check in MiniCPMModel.forward (around line 1940) rejects None as if it were a legacy tuple cache, because isinstance(None, Cache) evaluates to False:

ValueError: You must use the new past_key_values format, such as the Cache class, instead of the old tuple format

🛠️ Proposed Solution

To maximize backward compatibility without forcing users to upgrade their transformers package (which might break other dependencies), we propose the following minimal-impact fixes:

  1. Self-Contained Cache Classes:
    • Embed the CacheLayerMixin and DynamicLayer definitions directly in modeling_minicpm.py as a fallback. (On newer transformers versions, the embedded fallback is simply unused.)
  2. Refined Cache Check:
    • Update the validation logic in the forward method to explicitly allow past_key_values to be None on the first pass and to correctly initialize an InfLLMv2Cache or DynamicCache.

PR: openbmb/MiniCPM4-8B · Fix: resolve transformers version compatibility for DynamicLayer and cache initialization

To Reproduce

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
    "openbmb/MiniCPM4-8B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM4-8B", trust_remote_code=True)
prompt = "GitHub community standards dictate clear code reproduction."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    use_cache=True,
)

Expected behavior

Inference completes normally.

Screenshots

No response

Environment

- **Model:** MiniCPM4-8B / MiniCPM4-0.5B
- **Transformers Version:** <= 4.54.0 (e.g., 4.49.0)
- **PyTorch Version:** (Add your version here, e.g., 2.2.0)

Additional context

No response

Metadata

Labels: bug (Something isn't working), triage
