add more models into benchmark and evaluate workflow #1565

Merged: 14 commits merged into InternLM:main from update_benchmark_evaluate on May 13, 2024

Conversation

zhulinJulia24 (Collaborator):

  1. Make the evaluation dataset an input of the workflow; add more models to the evaluation regression.
  2. Refactor the evaluation config to make it easier to maintain. Add llama3, qwen1.5 moe and kvint4 configs for the evaluation regression.
  3. Add more models to the benchmark and adjust the max batch size in the benchmark.
  4. Add more supported models and remove some old models from the regression test.

Benchmark test record: https://github.com/zhulinJulia24/lmdeploy/actions/runs/8998731504
Evaluation test record: https://github.com/zhulinJulia24/lmdeploy/actions/runs/9012663244

zhulinJulia24 (Collaborator, Author) commented May 9, 2024:

https://github.com/zhulinJulia24/lmdeploy/actions/runs/9015990252 contains all models except qwen moe.

Code context from the evaluation config:

    #     WSC_datasets  # noqa: F401, E501
    # from .datasets.triviaqa.triviaqa_gen_2121ce import \
    #     triviaqa_datasets  # noqa: F401, E501
    from .datasets.race.race_gen_69ee4f import \
        race_datasets  # noqa: F401, E501
Collaborator:
The more datasets are involved, the more time the evaluation costs.
I suggest keeping eval, gsm8k and mmlu, and removing the rest.

zhulinJulia24 (Collaborator, Author) replied:

> The more datasets are involved, the more time the evaluation costs. I suggest keeping eval, gsm8k and mmlu, and removing the rest.

Yes. The default datasets are gsm8k and mmlu, controlled by a workflow input. If I want to evaluate more datasets, I can use the same workflow with a different input value. The value is defined here: https://github.com/zhulinJulia24/lmdeploy/blob/497783134da77790096a2a25f10244aa11cef134/.github/workflows/evaluate.yml#L25
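
For context, a minimal sketch of what such a workflow_dispatch trigger with a dataset input could look like is shown below. The input name `datasets`, its description and the exact default string are assumptions made for illustration; the authoritative definition is the evaluate.yml line linked above.

    # Sketch of a workflow_dispatch trigger with a dataset input (fragment only).
    # The input name and default value are assumptions for illustration;
    # see .github/workflows/evaluate.yml#L25 (linked above) for the real definition.
    on:
      workflow_dispatch:
        inputs:
          datasets:
            description: 'datasets to run in the evaluation regression'
            required: true
            type: string
            default: 'gsm8k mmlu'   # default datasets mentioned in the reply above

Triggering the workflow manually with a longer, space-separated dataset list would then run the extra evaluations without changing any config file, which matches the reply above.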

RunningLeon (Collaborator) left a comment:

LGTM

lvhan028 merged commit ca4de27 into InternLM:main on May 13, 2024. 5 checks passed.
zhulinJulia24 deleted the update_benchmark_evaluate branch on August 26, 2024 at 09:32.
3 participants