adding llama fairscale #2604
base: master
Conversation
Codecov Report
```
@@           Coverage Diff           @@
##           master    #2604   +/-   ##
=======================================
  Coverage   72.44%   72.44%
=======================================
  Files          85       85
  Lines        3963     3963
  Branches       58       58
=======================================
  Hits         2871     2871
  Misses       1088     1088
  Partials        4        4
```
Thanks @HamidShojanazeri for this PR. Please see comments inline. It would be good to match the readme and config files for a consistent example -- e.g. use the 13b model as the base and explain everything for that.
```yaml
model_path: "PATH/TO/MODEL_CHECKPOINTS"
tokenizer_path: "PATH/TO/MODEL_CHECKPOINTS/tokenizer.model"
```
Could you change the dir the same way as this PR: https://github.com/pytorch/serve/pull/2623/files#diff-8dff1fb7c93d43b560e8ef09c2e2c6f93b55309399d807e231131ea962303dae?
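For illustration, following the layout in that PR, the config would hold paths relative to the model artifacts directory, with the handler prefixing `model_dir` at load time. The directory names below are assumptions, not taken from this PR:

```yaml
# Hypothetical sketch: paths relative to the extracted model directory,
# resolved in the handler by prefixing model_dir (see the handler comment below)
handler:
    model_path: "model-weights/llama-2-13b"
    tokenizer_path: "model-weights/tokenizer.model"
```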
### Step 3: Generate MAR file

```bash
torch-model-archiver --model-name llama --version 1.0 --handler llama-handler.py --config-file model-config.yaml --archive-format tgz -r requirements.txt
```
change "--archive-format tgz" to "--archive-format no-archive"
```python
model_path = ctx.model_yaml_config["handler"]["model_path"]
tokenizer_path = ctx.model_yaml_config["handler"]["tokenizer_path"]
```
Could you change this to match https://github.com/pytorch/serve/blob/master/examples/large_models/tp_llama/llama-handler.py#L68C1-L68C1?
i.e. `model_path = f'{model_dir}/{ctx.model_yaml_config["handler"]["model_path"]}'`
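Applied to this handler, the suggestion would read as follows; extending the same prefix to `tokenizer_path` is an assumption on my part, mirroring the linked example:

```python
# model_dir is provided by TorchServe in the request context
model_dir = ctx.system_properties.get("model_dir")
# Prefix the configured relative paths with model_dir, per the linked example;
# applying the same pattern to tokenizer_path is assumed here.
model_path = f'{model_dir}/{ctx.model_yaml_config["handler"]["model_path"]}'
tokenizer_path = f'{model_dir}/{ctx.model_yaml_config["handler"]["tokenizer_path"]}'
```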
```python
torch.manual_seed(seed)

logger.info("Instantiating Llama model")
self.model = Llama.build(
```
qq, should we provide an option to defer init for llama2-70b?
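One way to defer init is PyTorch's meta device; the sketch below is only an illustration of the idea, not code from this PR, and the `Transformer`/`ModelArgs` names assume the Llama reference implementation rather than the `Llama.build` wrapper above:

```python
# Hypothetical sketch of deferred initialization (not part of this PR).
import torch
from llama.model import ModelArgs, Transformer  # assumed reference-impl API

def build_deferred(params: ModelArgs) -> Transformer:
    # Construct on the meta device: parameter shapes are tracked, but no
    # real memory is allocated, which matters for llama2-70b sized models.
    with torch.device("meta"):
        model = Transformer(params)
    # Materialize uninitialized storage on the target device; loading the
    # checkpoint shards into it (omitted here) would follow.
    model = model.to_empty(device="cuda")
    return model
```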
Description
Adding Fairscale Llama to TorchServe
Fixes #(issue)
Type of change
- New feature (non-breaking change which adds functionality)
Feature/Issue validation/testing
Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
Test A
Logs
Test B
Logs for Test B
Checklist: