Is your improvement request related to a problem? Please describe.
1. Auto-provision CPU-only nodes: load the model with the minimum quantization option. The node type can be detected with a command, and the corresponding environment variables set automatically.
2. Match models to the different GPU types (GPU memory): map them in a config file (model-config.json) and parse it with a script to get the information as needed, e.g. {name:"baichuan-13b", memory:"20GB", quantization:"ON/OFF", quantization_bits:"4|8", ...}
3. This way the end user only cares about the model name and whether the node runs in CPU or GPU mode, while on our side we only need to maintain one CI/CD pipeline to support all kinds of LLMs.
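The flow above could be sketched roughly as follows. This is only a minimal illustration of the idea, not a proposed implementation: the model-config.json schema, the field names, and the env variable names (MODEL_NAME, DEVICE, QUANT_BITS) are all assumptions taken from the example entry in point 2; GPU detection here is simply "is nvidia-smi on the PATH".

```python
import json
import os
import shutil

# Hypothetical model-config.json content as sketched in point 2 above;
# the field names are assumptions, not an existing schema.
MODEL_CONFIG_JSON = """
[
  {"name": "baichuan-13b", "memory": "20GB",
   "quantization": "ON", "quantization_bits": "4|8"}
]
"""

def node_has_gpu():
    # Crude detection: assume a GPU node has nvidia-smi on the PATH.
    return shutil.which("nvidia-smi") is not None

def select_env(model_name, config_json=MODEL_CONFIG_JSON):
    """Build env variables for a model, picking the minimum
    quantization option on CPU-only nodes (point 1)."""
    config = json.loads(config_json)
    entry = next((m for m in config if m["name"] == model_name), None)
    if entry is None:
        raise KeyError(f"unknown model: {model_name}")
    env = {
        "MODEL_NAME": entry["name"],
        "DEVICE": "gpu" if node_has_gpu() else "cpu",
    }
    if env["DEVICE"] == "cpu" and entry["quantization"] == "ON":
        # CPU-only node: choose the smallest available bit width,
        # i.e. the most aggressive quantization.
        env["QUANT_BITS"] = min(entry["quantization_bits"].split("|"), key=int)
    return env

env = select_env("baichuan-13b")
os.environ.update(env)  # downstream CD scripts would read these
print(env)
```

A provisioning step in the CD pipeline could call something like this once per node, so the user-facing interface stays just the model name.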