Revert "[lora][tinker] Add pause and resume for multi-tenant lora " #1678

Merged
erictang000 merged 1 commit into main from revert-1657-multi-lora-pause on May 16, 2026

Conversation

@erictang000
Collaborator

Reverts #1657, since we can skip pause/resume for multi-tenant lora after #1677

Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request removes the transient per-LoRA targeted pause and resume functionality, including the lora_name parameter from generation control methods and the sample_with_retry logic. Associated server endpoints and test suites have also been deleted. Feedback identifies a potential issue in the sample method where removing dictionary guards for prompt and sampling_params could lead to AttributeError if those fields are explicitly provided as null.

Comment on lines 608 to +610
        prompt = body.get("prompt", {})

        # Render prompt: flatten text tokens and, if images are present,
        # call the render endpoint to get placeholder tokens + features.
        token_ids, mm_features = await self._render_for_sample(prompt, session_id, model=model)

        return await self._sample_with_rendered_tokens(
            token_ids=token_ids,
            mm_features=mm_features,
            body=body,
            session_id=session_id,
            model=model,
        )

    async def _sample_with_rendered_tokens(
        self,
        token_ids: List[int],
        mm_features: Optional[MultiModalFeatures],
        body: Dict[str, Any],
        session_id: Optional[str],
        model: str,
    ) -> SampleResponse:
        """Dispatch a single rendered sample request to /inference/v1/generate.

        Shared between ``sample`` (one shot) and ``sample_with_retry`` (loops
        on this method with accumulating ``token_ids`` and decremented
        ``max_tokens``). No retry, no LoRA gating; those live in the caller.
        """
        num_samples = body.get("num_samples", 1)
        tinker_params = body.get("sampling_params", {}) or {}
        tinker_params = body.get("sampling_params", {})
Contributor

medium

Removing the `or {}` guard for `prompt` and `sampling_params` makes the `sample` method less robust. If the input JSON explicitly sets either field to `null`, `body.get(key, {})` returns `None`, because the default applies only when the key is absent. Any later `.get()` or other dictionary method call on that value then raises an `AttributeError` (e.g., at line 635).

Suggested change
        prompt = body.get("prompt") or {}
        num_samples = body.get("num_samples", 1)
        tinker_params = body.get("sampling_params") or {}
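
The difference between the two forms contrasted in the review comment can be checked in isolation. The following is a minimal sketch, not code from the PR: the request body and the `max_tokens` key are hypothetical, chosen only to show how `dict.get` treats an explicit `null`.

```python
import json

# Hypothetical request body in which both fields are explicitly null.
body = json.loads('{"prompt": null, "sampling_params": null}')

# The default passed to dict.get applies only when the key is absent,
# so an explicit null (None) passes straight through.
unguarded = body.get("sampling_params", {})

# `or {}` replaces None (and any other falsy value) with an empty dict.
guarded = body.get("sampling_params") or {}

error = None
try:
    # Calling a dict method on None fails.
    unguarded.get("max_tokens")
except AttributeError as exc:
    error = str(exc)

# Safe: guarded is always a dict, so a missing key simply yields None.
max_tokens = guarded.get("max_tokens")
```

Note that `or {}` also replaces other falsy values such as an empty string or `0`, which is acceptable here only on the assumption that these fields are meant to be dictionaries.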

@erictang000 erictang000 merged commit ddc68ee into main May 16, 2026
5 checks passed
@erictang000 erictang000 deleted the revert-1657-multi-lora-pause branch May 16, 2026 01:51