Labels: needs-kind, needs-priority, needs-triage
Description
What would you like to be added:
Here's an example from Triton TRT-LLM with LWS (https://github.com/triton-inference-server/tutorials/blob/main/Deployment/Kubernetes/EKS_Multinode_Triton_TRTLLM/multinode_helm_chart/chart/templates/deployment.yaml); it needs to set a bunch of parameters dynamically:
```yaml
- python3
- ./server.py
- leader
- --triton_model_repo_dir={{ $.Values.triton.triton_model_repo_path }}
- --namespace={{ $.Release.Namespace }}
- --pp={{ $.Values.tensorrtLLM.parallelism.pipeline }}
- --tp={{ $.Values.tensorrtLLM.parallelism.tensor }}
- --gpu_per_node={{ $.Values.gpuPerNode }}
- --stateful_set_group_key=$(GROUP_KEY)
```

We should support this. Basically, we can set the parameters in `model.spec.inferenceFlavors[x].params` with a prefix like `Params_GPU_PER_NODE`; when rendering, we'll strip the `Params_` prefix.
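The render step proposed above could look roughly like this. A minimal sketch, assuming the flavor's `params` arrive as a string map and that stripped keys are lowercased into `--flag=value` args (the function name and the lowercasing convention are illustrative, not the project's actual API):

```python
# Illustrative sketch of the proposed rendering step: collect entries from a
# flavor's params map that carry the "Params_" prefix, strip the prefix, and
# render the remainder as container args. Names here are assumptions.

PARAMS_PREFIX = "Params_"

def render_args(flavor_params: dict[str, str]) -> list[str]:
    """Turn prefixed flavor params into --flag=value container args."""
    args = []
    for key, value in sorted(flavor_params.items()):
        if not key.startswith(PARAMS_PREFIX):
            continue  # non-prefixed entries are left untouched
        # e.g. Params_GPU_PER_NODE -> gpu_per_node
        flag = key[len(PARAMS_PREFIX):].lower()
        args.append(f"--{flag}={value}")
    return args

print(render_args({"Params_GPU_PER_NODE": "8", "Params_TP": "4", "other": "x"}))
# → ['--gpu_per_node=8', '--tp=4']
```

Whether the rendered flags should be lowercased (as above) or kept verbatim is an open design question for the doc/API change below.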
Why is this needed:
Completion requirements:
This enhancement requires the following artifacts:
- Design doc
- API change
- Docs update
The artifacts should be linked in subsequent comments.