Showing 33 changed files with 1,895 additions and 495 deletions.
```gitattributes
nightly-requirements.txt linguist-generated=true
* text=auto eol=lf
```
#### Serving LLMs with fine-tuned LoRA and QLoRA adapter layers

The fine-tuned adapter weights can then be served with the model via `openllm start`:

```bash
openllm start opt --model-id facebook/opt-6.7b --adapter-id /path/to/adapters
```

If you just wish to try a pretrained adapter checkpoint, you can pass its ID to `--adapter-id`:

```bash
openllm start opt --model-id facebook/opt-6.7b --adapter-id aarnphm/opt-6.7b-lora
```

To use multiple adapters, use the following format:

```bash
openllm start opt --model-id facebook/opt-6.7b --adapter-id aarnphm/opt-6.7b-lora --adapter-id aarnphm/opt-6.7b-lora:french_lora
```
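The second `--adapter-id` above uses an `adapter_id:adapter_name` convention. As a minimal sketch of how such a spec could be split (the fallback name `"default"` is an assumption for illustration, not OpenLLM's documented behavior):

```python
def parse_adapter_id(spec: str) -> tuple[str, str]:
    """Split an "<adapter_id>[:<adapter_name>]" spec into its two parts.

    The "default" fallback name is an assumption used here for illustration.
    """
    adapter_id, _, name = spec.partition(":")
    return adapter_id, name or "default"
```

For example, `parse_adapter_id("aarnphm/opt-6.7b-lora:french_lora")` yields the pair `("aarnphm/opt-6.7b-lora", "french_lora")`, while a spec without a colon keeps the fallback name.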

By default, the first `adapter-id` will be the default LoRA layer, but users can optionally change which LoRA layer to use for inference via `/v1/adapters`:

```bash
curl -X POST http://localhost:3000/v1/adapters --json '{"adapter_name": "vn_lora"}'
```
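The same switch can be scripted from a client. Below is a minimal sketch using only the Python standard library; the `/v1/adapters` route and its payload come from the `curl` example above, while the helper names and the server address are assumptions for illustration:

```python
import json
from urllib import request

BASE = "http://localhost:3000"  # assumed server address, as in the curl example

def build_request(path: str, payload: dict) -> request.Request:
    # Encode the payload as JSON, mirroring curl's --json flag.
    return request.Request(
        BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def switch_adapter(adapter_name: str) -> None:
    # POST /v1/adapters selects which LoRA layer serves subsequent inference.
    with request.urlopen(build_request("/v1/adapters", {"adapter_name": adapter_name})) as resp:
        resp.read()

if __name__ == "__main__":
    switch_adapter("french_lora")
```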

> Note that when multiple `adapter-name` and `adapter-id` pairs are loaded, it is recommended to switch to the desired adapter before sending inference requests, to avoid any performance degradation.

To include the adapters in the Bento, one can also provide `--adapter-id` to `openllm build`:

```bash
openllm build opt --model-id facebook/opt-6.7b --adapter-id ...
```

### Rework

Separate out the configuration builder, to make it more flexible for future configuration generation.
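The reworked builder itself is not shown in this excerpt. As a purely hypothetical illustration of the pattern named here, a configuration builder separated from the configuration object might look like this (all names and fields below are invented for the sketch, not OpenLLM's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class ModelConfig:
    """Illustrative configuration object; fields are placeholders."""
    model_id: str
    adapters: list = field(default_factory=list)
    extras: dict = field(default_factory=dict)

class ConfigBuilder:
    """Accumulates options step by step, then emits a config in one shot.

    Keeping the builder separate from the config object lets future code
    generate configurations from other sources without touching the schema.
    """
    def __init__(self, model_id: str):
        self._model_id = model_id
        self._adapters: list = []
        self._extras: dict = {}

    def adapter(self, adapter_id: str) -> "ConfigBuilder":
        self._adapters.append(adapter_id)
        return self  # fluent chaining

    def option(self, key: str, value) -> "ConfigBuilder":
        self._extras[key] = value
        return self

    def build(self) -> ModelConfig:
        # Copy internal state so the builder can be reused safely.
        return ModelConfig(self._model_id, list(self._adapters), dict(self._extras))
```

Usage would then read as a chain, e.g. `ConfigBuilder("facebook/opt-6.7b").adapter("aarnphm/opt-6.7b-lora").build()`.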