
fix: engine cannot give response to the second user request with streaming #701


Open · lihaiyin88 wants to merge 1 commit into base: main

Conversation

lihaiyin88

No description provided.

@CharlieFRuan (Contributor)

Thanks a lot for the contribution. Would it be possible for you to provide a script for reproducing the issue, or to elaborate on it? Thank you!

CharlieFRuan requested a review from Copilot on June 24, 2025 at 14:55.
Copilot AI left a comment:

Pull Request Overview

This PR restructures the streaming generation method in MLCEngine to ensure the model lock is always released once, fixing an issue where the engine could not handle a second streaming request.

  • Flattened multiple nested try/catch blocks into a single try + finally around the main logic.
  • Converted inner helper functions (_countTrailingReplacementChar, _getChunk) to arrow function expressions.
  • Removed redundant lock.release() calls spread across catches; the lock is now released exactly once in the finally block (a sketch of the pattern follows).
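
For context, here is a minimal, self-contained sketch of the pattern this PR adopts; it is not the engine's actual code. A single try/finally around the generator body guarantees the per-model lock is released exactly once. The `AsyncLock` class below is a hypothetical stand-in for the engine's internal per-model lock.

```typescript
// Hypothetical stand-in for the engine's per-model lock: a simple
// FIFO async mutex with acquire()/release().
class AsyncLock {
  private locked = false;
  private waiters: Array<() => void> = [];

  async acquire(): Promise<void> {
    if (!this.locked) {
      this.locked = true;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // hand the lock directly to the next waiter
    } else {
      this.locked = false;
    }
  }
}

// The pattern the PR adopts: one try/finally around the whole generator
// body. The finally block runs on normal completion, on a thrown error,
// and when the consumer abandons the stream early (which triggers the
// generator's return path), so the lock is released exactly once.
async function* streamWithLock(
  lock: AsyncLock,
  chunks: string[],
): AsyncGenerator<string, void, void> {
  await lock.acquire();
  try {
    for (const chunk of chunks) {
      yield chunk;
    }
  } finally {
    lock.release();
  }
}

// Two consecutive streams over the same lock. With the old nested
// try/catch structure, any path that skipped release() would leave the
// lock held and make the second for-await hang.
async function demo(): Promise<void> {
  const lock = new AsyncLock();
  for await (const c of streamWithLock(lock, ["a", "b"])) console.log(c);
  for await (const c of streamWithLock(lock, ["c", "d"])) console.log(c);
}
```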
Comments suppressed due to low confidence (2)

src/engine.ts:706

  • This TODO is still open—either implement the usage support for non-chat completions or file a tracking issue to ensure it isn't overlooked.
          // TODO(Charlie): support usage for completion

src/engine.ts:483

  • Add a test case that performs two consecutive streaming requests (with and without errors) to verify the lock is always released and reacquired correctly.
    genConfig: GenerationConfig,
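
Such a regression check is not part of this PR, but a hedged sketch might look like the following. It assumes web-llm's OpenAI-style streaming API (`CreateMLCEngine` and `engine.chat.completions.create` with `stream: true`); the model id is illustrative and should be replaced with one available in your build.

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Issue two streaming requests back to back. Before this fix, the model
// lock was not always released after the first stream, so the second
// request could hang waiting to acquire it.
async function twoConsecutiveStreams(): Promise<void> {
  const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

  for (const prompt of ["First request", "Second request"]) {
    const stream = await engine.chat.completions.create({
      stream: true,
      messages: [{ role: "user", content: prompt }],
    });
    let reply = "";
    for await (const chunk of stream) {
      reply += chunk.choices[0]?.delta?.content ?? "";
    }
    console.log(`${prompt}: received ${reply.length} characters`);
  }
}
```

An error-path variant could abort the first stream partway (for example, by breaking out of the for-await loop) before issuing the second request, verifying that the lock is released on early termination as well.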

```diff
@@ -483,18 +483,15 @@ export class MLCEngine implements MLCEngineInterface {
     genConfig: GenerationConfig,
     timeReceived: number,
   ): AsyncGenerator<ChatCompletionChunk | Completion, void, void> {
-    // Since it is an async generator, we need to do fine-grained try-catch to ensure lock is
-    // released only when errors occur. Then release at the very end when no error occurs.
-    // TODO: This makes code less readable, is there a better way to do this?
     const lock = this.loadedModelIdToLock.get(model)!;
```
Copilot AI commented on the `const lock` line, Jun 24, 2025:
Consider adding a brief comment above this try/finally to clarify its scope is for lock acquisition and release, improving future readability.

Suggested change:

```diff
-    const lock = this.loadedModelIdToLock.get(model)!;
+    const lock = this.loadedModelIdToLock.get(model)!;
+    // Acquire the lock and ensure its release in the `finally` block.
```
