Add new CLI options #68
Conversation
Found a case where the default
Will fix on Monday.
@XkunW The options should be working now.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #68 +/- ##
===========================================
- Coverage 79.50% 79.07% -0.43%
===========================================
Files 4 4
Lines 522 540 +18
===========================================
+ Hits 415 427 +12
- Misses 107 113 +6
)
@click.option(
    "--compilation-config",
    type=click.Choice(["0", "3"]),
    help="torch.compile optimization level, accepts '0' or '3', default to '0', which means no optimization is applied",
    type=click.Choice(["0", "1", "2", "3"]),
1 and 2 are meant for internal use for vLLM developers.

--compilation-config, -O
torch.compile configuration for the model. When it is a number (0, 1, 2, 3), it will be interpreted as the optimization level. NOTE: level 0 is the default level without any optimization. level 1 and 2 are for internal testing only. level 3 is the recommended level for production. To specify the full compilation config, use a JSON string. Following the convention of traditional compilers, using -O without space is also supported. -O3 is equivalent to -O 3.
Good catch, missed that bit in the description, will get rid of them
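For reference, here is a minimal sketch of what the option could look like once levels 1 and 2 are dropped, as agreed above. This is not the merged code: the `launch` command is a placeholder and the help text is paraphrased, assuming only the externally supported torch.compile levels (0 and 3) should be exposed.

```python
import click


@click.command()
@click.option(
    "--compilation-config",
    type=click.Choice(["0", "3"]),
    default="0",
    show_default=True,
    help=(
        "torch.compile optimization level: 0 applies no optimization, "
        "3 is the level recommended for production; levels 1 and 2 are "
        "reserved for internal vLLM testing and are not exposed here."
    ),
)
def launch(compilation_config: str) -> None:
    """Placeholder command used only to illustrate the option."""
    click.echo(f"compilation config: {compilation_config}")


if __name__ == "__main__":
    launch()
```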
vec_inf/cli/_utils.py
Outdated
if "Prefix cache hit rate" in line: | ||
# Parse the metric values from the line | ||
metrics_str = line.split("] ")[1].strip() | ||
prefix, metrics_str = metrics_str.split(": ", 1) | ||
metrics_list = metrics_str.split(", ") | ||
for metric in metrics_list: | ||
key, value = metric.split(": ") | ||
latest_metric[f"{key} {prefix}"] = value |
Prefix cache hit rate is not captured in the updated metrics command.
I don't think this metric is relevant if prefix caching isn't enabled. I can add it in if you think it's useful; otherwise it just complicates the metrics command logic.
Yeah, it's not needed if prefix caching is not enabled. It's not necessary to add it. I can still view it from the log file.
I don't think the log file captures the production metrics anymore? IIUC everything goes to the metrics API endpoint. I'm working on another PR right now, will add this change in there.
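Since the discussion above lands on reading production metrics from the server's metrics endpoint rather than the log file, here is a rough sketch of how the prefix cache hit rate could be computed from a vLLM-style Prometheus /metrics endpoint. The metric names (`vllm:gpu_prefix_cache_hits`, `vllm:gpu_prefix_cache_queries`), the default URL, and the `get_prefix_cache_hit_rate` helper are assumptions for illustration and vary by vLLM version; this is not code from this PR.

```python
import requests


def get_prefix_cache_hit_rate(base_url: str = "http://localhost:8000") -> float | None:
    """Illustrative only: compute prefix cache hit rate from Prometheus-style metrics."""
    text = requests.get(f"{base_url}/metrics", timeout=5).text
    hits = queries = None
    for line in text.splitlines():
        if line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        # Metric names below are assumed; adjust to the vLLM version in use.
        if line.startswith("vllm:gpu_prefix_cache_hits"):
            hits = float(line.rsplit(" ", 1)[-1])
        elif line.startswith("vllm:gpu_prefix_cache_queries"):
            queries = float(line.rsplit(" ", 1)[-1])
    if hits is None or queries is None or queries == 0:
        return None  # prefix caching disabled or metrics not exposed
    return hits / queries


if __name__ == "__main__":
    print(get_prefix_cache_hit_rate())
```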
PR Type
Feature
Short Description
Add new CLI options that can help with performance tuning.
--enable-chunked-prefill: enables chunked prefill to prioritize decode requests over prefill requests.
--max-num-batch-tokens: specifies the token budget for chunked prefill.
--enable-prefix-caching: enables automatic prefix caching, which reuses the KV cache of existing requests that match new requests.
--compilation-config: level of optimization for torch.compile.
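To make the intent of these options concrete, here is a small hedged sketch of how they might be forwarded to the underlying vLLM server. The `build_vllm_args` helper is hypothetical, and the vLLM-side flag names (`--enable-chunked-prefill`, `--max-num-batched-tokens`, `--enable-prefix-caching`, `-O`) are assumptions based on vLLM's documented engine arguments, not code from this PR.

```python
from typing import Optional


def build_vllm_args(
    enable_chunked_prefill: bool = False,
    max_num_batch_tokens: Optional[int] = None,
    enable_prefix_caching: bool = False,
    compilation_config: str = "0",
) -> list[str]:
    """Illustrative only: map the new CLI options onto assumed vLLM server flags."""
    args: list[str] = []
    if enable_chunked_prefill:
        args.append("--enable-chunked-prefill")
    if max_num_batch_tokens is not None:
        args += ["--max-num-batched-tokens", str(max_num_batch_tokens)]
    if enable_prefix_caching:
        args.append("--enable-prefix-caching")
    if compilation_config != "0":
        args += ["-O", compilation_config]
    return args


# Example: all tuning options enabled.
print(build_vllm_args(enable_chunked_prefill=True, max_num_batch_tokens=2048,
                      enable_prefix_caching=True, compilation_config="3"))
# ['--enable-chunked-prefill', '--max-num-batched-tokens', '2048',
#  '--enable-prefix-caching', '-O', '3']
```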
Tests Added
...