[WS3] End-to-end logprob cross-benchmark tool #131

@Flink-ddd

Description

Part of #83 · Deferred to next phase — not in this month's sprint. Tracks #106.

The headline deliverable: a single command that runs the same fixed model, prompt set, and seed through real vLLM and real Megatron, dumps both logprob streams, and computes the drift between them. Tolerances reuse the #108 threshold table.
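To make "computes drift" concrete, here is a minimal sketch of the comparison step only. It assumes both engines have already dumped per-token logprobs for the same tokens in the same order. The function names, the choice of max/mean absolute difference as the drift metric, and the `max_abs_tol` parameter are illustrative assumptions; the actual metric and thresholds are whatever the #108 table defines.

```python
from typing import Dict, Sequence


def logprob_drift(ref: Sequence[float], test: Sequence[float]) -> Dict[str, float]:
    """Summarise per-token drift between two aligned logprob streams.

    Hypothetical helper: `ref` and `test` stand for the streams dumped by
    the two engines (e.g. Megatron and vLLM) for identical token positions.
    """
    if len(ref) != len(test):
        raise ValueError("logprob streams must have the same length")
    if not ref:
        raise ValueError("logprob streams must be non-empty")
    # Absolute difference at each token position.
    diffs = [abs(a - b) for a, b in zip(ref, test)]
    return {"max_abs": max(diffs), "mean_abs": sum(diffs) / len(diffs)}


def within_tolerance(stats: Dict[str, float], max_abs_tol: float) -> bool:
    """Check drift against a tolerance (assumed to come from the #108 table)."""
    return stats["max_abs"] <= max_abs_tol


stats = logprob_drift([-1.0, -2.0], [-1.0, -2.5])
print(stats)                          # {'max_abs': 0.5, 'mean_abs': 0.25}
print(within_tolerance(stats, 0.1))   # False
```

Reporting both the max and the mean is a design choice in this sketch: the max catches a single badly divergent token, while the mean shows whether drift is systematic across the stream.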

Planned PRs:

Metadata

Labels

component: distributed (Tasks involving Ray actor management, cross-node scheduling, and communication synchronization.)
component: testing (Add test cases and benchmark-related tasks)
feature
next-phase
