
update doc for llama3 #1462

Merged: 2 commits merged into InternLM:main on Apr 19, 2024
Conversation

@zhyncs (Contributor) commented Apr 19, 2024

Motivation

Update the documentation for Llama 3 (ref #1459).

Modification

As titled.

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  3. If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

@lvhan028 lvhan028 requested a review from AllentDan April 19, 2024 05:25
@AllentDan (Collaborator) left a comment:

LGTM

@lvhan028 lvhan028 added the documentation Improvements or additions to documentation label Apr 19, 2024
@lvhan028 lvhan028 merged commit a02ed41 into InternLM:main Apr 19, 2024
4 checks passed
@zhyncs zhyncs deleted the patch-4 branch April 19, 2024 05:49
@zhyncs (Contributor, Author) commented Apr 19, 2024

Compatibility with the main branch was verified; all of the following configurations work.

# TurboMind
python3 -m lmdeploy serve api_server /workdir/Meta-Llama-3-8B

# PyTorch
python3 -m lmdeploy serve api_server /workdir/Meta-Llama-3-8B --backend pytorch

# KV Cache Int8
python3 -m lmdeploy serve api_server /workdir/Meta-Llama-3-8B --quant-policy 8

# KV Cache Int4
python3 -m lmdeploy serve api_server /workdir/Meta-Llama-3-8B --quant-policy 4

# AWQ
python3 -m lmdeploy lite auto_awq /workdir/Meta-Llama-3-8B --calib-dataset 'ptb' --calib-samples 128 --calib-seqlen 2048 --w-bits 4 --w-group-size 128 --work-dir /workdir/Meta-Llama-3-8B-AWQ
python3 -m lmdeploy serve api_server /workdir/Meta-Llama-3-8B-AWQ

# Stress test (RESTful API benchmark)
python3 benchmark/profile_restful_api.py --server_addr 127.0.0.1:23333 --tokenizer_path /workdir/Meta-Llama-3-8B --dataset /workdir/ShareGPT_V3_unfiltered_cleaned_split.json --concurrency 128 --num_prompts 1000
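Once the api_server is up, it can be smoke-tested with a request to its OpenAI-compatible endpoint. The sketch below only builds and locally validates a request payload; the actual curl call is left commented out because it needs a running server. The port 23333 and the /v1/chat/completions route are the lmdeploy defaults, and the payload fields are illustrative, not taken from this PR.

```shell
# Hypothetical smoke test for an api_server started as above.
# Build a chat-completions request payload.
cat > /tmp/llama3_req.json <<'EOF'
{
  "model": "/workdir/Meta-Llama-3-8B",
  "messages": [{"role": "user", "content": "Hello"}],
  "max_tokens": 32
}
EOF

# Sanity-check that the payload is valid JSON before sending it.
python3 -m json.tool /tmp/llama3_req.json

# With the server running on the default port, send the request:
# curl http://127.0.0.1:23333/v1/chat/completions \
#   -H 'Content-Type: application/json' \
#   -d @/tmp/llama3_req.json
```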
