Closed as not planned

Labels: bug (Something isn't working), stale (Over 90 days of inactivity)
Description
Your current environment
There is a memory leak when I set max_tokens=1.

I have tested two versions:

pip install vllm==0.6.6

or

pip install vllm==0.7.2

If I call generate with SamplingParams(max_tokens=1), memory usage grows without bound.
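To rule out environment mixups, a quick sanity check of which of the two tested versions is actually loaded (this check is an addition, not part of the original report; `vllm.__version__` is the package's version attribute):

```python
import vllm

# Confirm which of the two tested versions is active.
print(vllm.__version__)
```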
🐛 Describe the bug
```python
from vllm import LLM, SamplingParams

llm = LLM(model='qwen/Qwen2.5-0.5B-Instruct', trust_remote_code=True)
sampling_params = SamplingParams(max_tokens=1)

# Repeatedly generating a single token; memory grows on every call.
while True:
    result = llm.generate("Hello world, ", sampling_params)
```
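For anyone trying to reproduce this, a minimal sketch of how the growth can be observed, assuming psutil is installed (psutil, the bounded loop, and the RSS logging are additions to the original repro):

```python
import psutil
from vllm import LLM, SamplingParams

llm = LLM(model='qwen/Qwen2.5-0.5B-Instruct', trust_remote_code=True)
sampling_params = SamplingParams(max_tokens=1)
proc = psutil.Process()  # current process

for i in range(1000):
    llm.generate("Hello world, ", sampling_params)
    # With no leak, RSS should plateau after warm-up; the report
    # observes it growing steadily instead.
    print(f"iter {i}: rss = {proc.memory_info().rss / 1024**2:.1f} MiB")
```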