
[Improvement] LLM auto provision for CPU/GPU nodes #97

@nayutah

Description


Is your improvement request related to a problem? Please describe.
1. Auto-provision CPU-only nodes: load the model with the minimum quantization option. The node type can be detected with a command, and the environment variables set accordingly.
2. Match models to the different GPU types (by GPU memory): map them in a config file (model-config.json) and parse it with a script to retrieve the information as needed, e.g. {name: "baichuan-13b", memory: "20GB", quantization: "ON/OFF", quantization_bits: "4|8", ...}.
3. The end user then only cares about the model name and whether the node runs in CPU or GPU mode. On our side, we need to maintain only one CD/CV to support all kinds of LLMs.
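The flow above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the config shape follows the example entry in point 2, while the helper names (`detect_gpu_memory_gb`, `provision_env`) and the exact env variable names are hypothetical.

```python
import json
import shutil
import subprocess

# Hypothetical model-config.json content, following the entry shape from point 2.
SAMPLE_CONFIG = """
[
  {"name": "baichuan-13b", "memory": "20GB",
   "quantization": "ON", "quantization_bits": "4|8"}
]
"""

def detect_gpu_memory_gb():
    """Return total GPU memory in GB, or 0 on CPU-only nodes.

    Uses nvidia-smi when present; assumes NVIDIA GPUs only.
    """
    if shutil.which("nvidia-smi") is None:
        return 0
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.total",
         "--format=csv,noheader,nounits"], text=True)
    return sum(int(line) for line in out.splitlines() if line.strip()) // 1024

def provision_env(model_name, configs, gpu_mem_gb):
    """Derive env variables for a model given the node's GPU memory."""
    cfg = next(c for c in configs if c["name"] == model_name)
    needed_gb = int(cfg["memory"].rstrip("GB"))
    env = {"MODEL_NAME": model_name}
    if gpu_mem_gb >= needed_gb:
        # Enough GPU memory: run unquantized on GPU.
        env["DEVICE"] = "gpu"
        env["QUANTIZATION"] = "OFF"
    else:
        # CPU-only (or insufficient GPU memory): minimum quantization option.
        env["DEVICE"] = "cpu" if gpu_mem_gb == 0 else "gpu"
        env["QUANTIZATION"] = cfg["quantization"]
        # Pick the smallest supported bit width, e.g. "4|8" -> "4".
        env["QUANTIZATION_BITS"] = min(cfg["quantization_bits"].split("|"), key=int)
    return env

configs = json.loads(SAMPLE_CONFIG)
print(provision_env("baichuan-13b", configs, gpu_mem_gb=detect_gpu_memory_gb()))
```

On a CPU-only node this would select 4-bit quantization for baichuan-13b; on a node with, say, 24 GB of GPU memory it would run the model unquantized on GPU.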
