Able to set the toRender parameters dynamically #239

@kerthcet

Description

What would you like to be added:

Here's an example from Triton TRT-LLM with lws, https://github.com/triton-inference-server/tutorials/blob/main/Deployment/Kubernetes/EKS_Multinode_Triton_TRTLLM/multinode_helm_chart/chart/templates/deployment.yaml.
It needs to set a number of parameters dynamically, see:

          - python3
          - ./server.py
          - leader
          - --triton_model_repo_dir={{ $.Values.triton.triton_model_repo_path }}
          - --namespace={{ $.Release.Namespace }}
          - --pp={{ $.Values.tensorrtLLM.parallelism.pipeline }}
          - --tp={{ $.Values.tensorrtLLM.parallelism.tensor }}
          - --gpu_per_node={{ $.Values.gpuPerNode }}
          - --stateful_set_group_key=$(GROUP_KEY)

We should support this. Basically, we can set the params in model.spec.inferenceFlavors[x].params with a prefix, like Params_GPU_PER_NODE; when rendering, we'll strip the Params_ prefix.
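A minimal sketch of what the prefix-stripping step could look like, assuming flavor params are surfaced as a string map (the `Params_` prefix and `inferenceFlavors[x].params` field come from this proposal; the function and flag names below are hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// paramPrefix marks which entries in inferenceFlavors[x].params should be
// exposed to rendering, per this proposal.
const paramPrefix = "Params_"

// renderParams filters flavor params to those carrying the prefix and strips
// it, so "Params_GPU_PER_NODE" becomes "GPU_PER_NODE" for substitution into
// the container args. Keys without the prefix are left out of rendering.
func renderParams(flavorParams map[string]string) map[string]string {
	rendered := make(map[string]string, len(flavorParams))
	for k, v := range flavorParams {
		if strings.HasPrefix(k, paramPrefix) {
			rendered[strings.TrimPrefix(k, paramPrefix)] = v
		}
	}
	return rendered
}

func main() {
	// Hypothetical flavor params mirroring the Triton TRT-LLM example above.
	params := map[string]string{
		"Params_GPU_PER_NODE": "8",
		"Params_TP":           "4",
		"other":               "not rendered",
	}
	for k, v := range renderParams(params) {
		fmt.Printf("%s=%s\n", k, v)
	}
}
```

The point of the prefix is to distinguish params meant for template rendering from any other entries in the same map, without needing a separate API field.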

Why is this needed:

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

Metadata

Assignees

No one assigned

    Labels

    needs-kind — Indicates a PR lacks a label and requires one.
    needs-priority — Indicates a PR lacks a label and requires one.
    needs-triage — Indicates an issue or PR lacks a label and requires one.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
