Tracking issue for PagedAttention
General overview
- `_memory_efficient_attention` or equiv.
- `llama`, `mistral` models
- `Sampler`
- `ModulePipeline` refactoring
ModulePipeline refactoring
`ModulePipeline` needs to be redesigned to integrate with the `LLMEngine` design. This means that a request output will be returned, and logprobs must be calculated.
- Calculate logprobs
- Generate `best_of` responses instead of the current `n` to check from (sorted by the logprobs)
- Use `Sampler`
- Do not return a `ChatChoices`; return something like a `RawResponse` that contains the logprobs and results
- Streaming poses a problem. This will need to be handled by the `LLMEngine`.
- Overall, convert `ModulePipeline` to a pure sequence-generating platform
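To make the `best_of` idea concrete, here is a minimal Python sketch of ranking candidate sequences by cumulative logprob and keeping the top `n`. It is only an illustration under stated assumptions: the fields of `RawResponse` and the helpers `select_best` and `log_softmax` are hypothetical and are not taken from the actual codebase.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class RawResponse:
    """Hypothetical container: generated token ids plus per-token logprobs."""
    tokens: List[int]
    logprobs: List[float]

    @property
    def cumulative_logprob(self) -> float:
        # Sum of per-token logprobs = log-probability of the whole sequence.
        return sum(self.logprobs)


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax, turning raw logits into logprobs."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def select_best(candidates: List[RawResponse], n: int) -> List[RawResponse]:
    """Keep the n candidates (out of best_of) with the highest cumulative logprob."""
    if n > len(candidates):
        raise ValueError("n must not exceed the number of candidates (best_of)")
    ranked = sorted(candidates, key=lambda c: c.cumulative_logprob, reverse=True)
    return ranked[:n]
```

In this sketch the pipeline would produce `best_of` candidates and the caller would receive only the `n` highest-scoring ones. Length normalization is deliberately left out; whether it is wanted is a separate design decision.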
LLM Engine
Manages calling the model pipeline and linking this with the KV cache.
Tasks
- `.generate`: batch the input seqs, calling `.add_request`
  - `.add_request`
  - `.generate`
- `.run_engine`: `.step` through each unfinished request, recording output.
  - `.has_unfinished_requests`
  - `.run_engine`
- `.step`: 1) call Scheduler to manage the seqs to swap for this decoding phase with `._schedule`, then 2) execute the model.
  - `._schedule`
  - `.step`
- `.execute_model`: Follow the cache ops from `._schedule`, finally run the model pipeline with the cache.
  - `.execute_model`
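The control flow described by these tasks can be sketched as a toy loop. This is an illustrative Python sketch, not the real engine: `ToyEngine`, `Request`, the `max_batch` parameter, and the placeholder "model" are all assumptions, and no real KV cache or cache ops are involved.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class Request:
    request_id: int
    prompt: List[int]
    max_new_tokens: int
    output: List[int] = field(default_factory=list)

    def finished(self) -> bool:
        return len(self.output) >= self.max_new_tokens


class ToyEngine:
    """Hypothetical stand-in for LLMEngine; no real model or KV cache."""

    def __init__(self, max_batch: int = 2) -> None:
        self.max_batch = max_batch
        self.unfinished: Deque[Request] = deque()
        self.next_id = 0

    def add_request(self, prompt: List[int], max_new_tokens: int) -> Request:
        req = Request(self.next_id, list(prompt), max_new_tokens)
        self.next_id += 1
        self.unfinished.append(req)
        return req

    def has_unfinished_requests(self) -> bool:
        return bool(self.unfinished)

    def _schedule(self) -> List[Request]:
        # A real scheduler would also emit cache ops (swap in/out, copy).
        count = min(self.max_batch, len(self.unfinished))
        return [self.unfinished[i] for i in range(count)]

    def execute_model(self, batch: List[Request]) -> None:
        # Placeholder "model": next token = prompt length + tokens so far.
        for req in batch:
            req.output.append(len(req.prompt) + len(req.output))

    def step(self) -> List[Request]:
        batch = self._schedule()       # 1) decide what runs this phase
        self.execute_model(batch)      # 2) run one decoding step
        done = [r for r in batch if r.finished()]
        for r in done:
            self.unfinished.remove(r)
        return done

    def run_engine(self) -> List[Request]:
        finished: List[Request] = []
        while self.has_unfinished_requests():
            finished.extend(self.step())
        return sorted(finished, key=lambda r: r.request_id)

    def generate(self, prompts: List[List[int]], max_new_tokens: int) -> List[Request]:
        for prompt in prompts:
            self.add_request(prompt, max_new_tokens)
        return self.run_engine()
```

Calling `ToyEngine(max_batch=2).generate(prompts, max_new_tokens=2)` queues every prompt through `add_request` and then steps until nothing is left unfinished, which is the same division of labour the task list describes.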
Completed tasks
SequenceGroup:
Sequences generated from the same prompt.
Tasks
Cache:
CacheEngine manages the KV cache.
Tasks
BlockSpaceManager
Manages blocks and their allocation.
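As a rough illustration of block management, the following Python sketch keeps a free list of fixed-size blocks and a per-sequence block table. The class name `ToyBlockManager`, its methods, and the `block_size` parameter are assumptions made for this example, not the planned API.

```python
from typing import Dict, List


class ToyBlockManager:
    """Hypothetical free-list allocator for fixed-size KV cache blocks."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks: List[int] = list(range(num_blocks))
        self.block_tables: Dict[int, List[int]] = {}  # seq id -> block ids

    def blocks_needed(self, num_tokens: int) -> int:
        # Ceiling division: a partially filled block still occupies a block.
        return -(-num_tokens // self.block_size)

    def can_allocate(self, num_tokens: int) -> bool:
        return self.blocks_needed(num_tokens) <= len(self.free_blocks)

    def allocate(self, seq_id: int, num_tokens: int) -> List[int]:
        if seq_id in self.block_tables:
            raise ValueError(f"sequence {seq_id} already has blocks")
        if not self.can_allocate(num_tokens):
            raise MemoryError("not enough free KV cache blocks")
        count = self.blocks_needed(num_tokens)
        table = [self.free_blocks.pop() for _ in range(count)]
        self.block_tables[seq_id] = table
        return table

    def free(self, seq_id: int) -> None:
        # Return every block of the sequence to the free list.
        self.free_blocks.extend(self.block_tables.pop(seq_id))
```

With 4 blocks of 16 tokens each, a 33-token sequence takes 3 blocks, which leaves room for one more sequence of at most 16 tokens until the first is freed.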
Deps
Tasks
Scheduler
The Scheduler decides which blocks to swap in, swap out, and copy.
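One possible shape for the scheduler's output is a set of block mappings that the cache layer then applies. The Python sketch below is an assumption-laden illustration: `CacheOps`, `apply_cache_ops`, and the dict-based "caches" are hypothetical stand-ins, not the actual data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CacheOps:
    """Hypothetical scheduler output describing block movements."""
    swap_in: Dict[int, int] = field(default_factory=dict)     # CPU block -> GPU block
    swap_out: Dict[int, int] = field(default_factory=dict)    # GPU block -> CPU block
    copy: Dict[int, List[int]] = field(default_factory=dict)  # GPU src -> GPU dsts


def apply_cache_ops(ops: CacheOps, gpu: Dict[int, list], cpu: Dict[int, list]) -> None:
    """Apply the ops to toy caches, modelled as dicts of block id -> contents."""
    for src, dst in ops.swap_in.items():
        gpu[dst] = cpu.pop(src)       # move a block from CPU to GPU
    for src, dst in ops.swap_out.items():
        cpu[dst] = gpu.pop(src)       # move a block from GPU to CPU
    for src, dsts in ops.copy.items():
        for dst in dsts:
            gpu[dst] = list(gpu[src])  # duplicate contents into a new block
```

Keeping the ops as plain data means the scheduler stays free of device code: it only decides what should move, and the cache layer performs the moves before the model runs.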
Deps
Tasks