add more model into benchmark and evaluate workflow #1565

zhulinJulia24 · 2024-05-09T06:17:44Z

Make the evaluation dataset a workflow input, and add more models to the evaluation regression.
Refactor the evaluation config to make it easier to maintain. Add llama3, qwen1.5 moe, and kvint4 configs for the evaluation regression.
Add more models to the benchmark and modify the max batch size in the benchmark.
Add more supported models and remove some old models from the regression test.

benchmark test record: https://github.com/zhulinJulia24/lmdeploy/actions/runs/8998731504
evaluation test record: https://github.com/zhulinJulia24/lmdeploy/actions/runs/9012663244

.github/workflows/evaluate.yml

.github/scripts/eval_opencompass_config.py

zhulinJulia24 · 2024-05-09T14:24:43Z

https://github.com/zhulinJulia24/lmdeploy/actions/runs/9015990252 This run covers everything except qwen moe.

.github/scripts/set_benchmark_param.sh

lvhan028 · 2024-05-10T03:14:20Z

.github/scripts/eval_opencompass_config.py

-    #     WSC_datasets  # noqa: F401, E501
-    # from .datasets.triviaqa.triviaqa_gen_2121ce import \
-    #     triviaqa_datasets  # noqa: F401, E501
+    from .datasets.race.race_gen_69ee4f import \


The more datasets are involved, the longer the evaluation takes.
I suggest keeping eval, gsm8k and mmlu, and removing the rest.

Yes. The default datasets are gsm8k and mmlu, controlled by a workflow input. If I want to evaluate more datasets, I can use the same workflow with a different input. The value is here: https://github.com/zhulinJulia24/lmdeploy/blob/497783134da77790096a2a25f10244aa11cef134/.github/workflows/evaluate.yml#L25
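
As a rough illustration of the idea in this reply (not the actual code in this PR), the sketch below shows one way a script could turn a dataset-selection input into a list of dataset names, falling back to gsm8k and mmlu when the input is empty. The function name `parse_datasets`, the `KNOWN_DATASETS` set, and the comma- or space-separated input format are all assumptions made for the example; the real workflow input may use a different format.

```python
# Hypothetical sketch: names and input format are assumptions, not taken from the PR.

# Defaults mentioned in the thread.
DEFAULT_DATASETS = ("gsm8k", "mmlu")

# Illustrative set of selectable datasets; only these names appear in the thread.
KNOWN_DATASETS = {"gsm8k", "mmlu", "race"}


def parse_datasets(raw, known=KNOWN_DATASETS):
    """Convert an input string such as "gsm8k, mmlu" into a list of dataset names.

    An empty input falls back to DEFAULT_DATASETS. Unknown names raise
    ValueError so that a typo fails fast instead of silently skipping a dataset.
    """
    # Accept both commas and whitespace as separators.
    names = raw.replace(",", " ").split()
    if not names:
        names = list(DEFAULT_DATASETS)
    unknown = [name for name in names if name not in known]
    if unknown:
        raise ValueError(f"unknown datasets: {unknown}")
    # Drop duplicates while preserving the order given by the caller.
    return list(dict.fromkeys(names))
```

With this shape, a default run evaluates only the two cheaper datasets, and a fuller run just passes a longer list, which matches the trade-off between coverage and evaluation time raised in the review comment.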

.github/scripts/eval_opencompass_config.py

.github/scripts/action_tools.py

autotest/config.yaml

RunningLeon

LGTM

zhulin1 added 8 commits May 8, 2024 19:28

update

640a2d5

update

3be54e0

update

6859dd2

update

a70ab39

update

7471d3f

update

6b62cc7

update

6696166

update

371bda0

zhulinJulia24 requested review from lvhan028 and RunningLeon May 9, 2024 06:17

RunningLeon reviewed May 9, 2024

.github/workflows/evaluate.yml

RunningLeon reviewed May 9, 2024

.github/scripts/eval_opencompass_config.py

lvhan028 reviewed May 10, 2024

.github/scripts/set_benchmark_param.sh

lvhan028 reviewed May 10, 2024

.github/scripts/eval_opencompass_config.py

zhulin1 added 2 commits May 10, 2024 13:38

update

22deccf

update

4977831

lvhan028 reviewed May 10, 2024

.github/scripts/action_tools.py (outdated)

zhulin1 added 2 commits May 10, 2024 14:19

update

a8eaeb8

update

172123e

lvhan028 reviewed May 13, 2024

autotest/config.yaml

lvhan028 reviewed May 13, 2024

autotest/config.yaml

lvhan028 approved these changes May 13, 2024

zhulin1 added 2 commits May 13, 2024 11:50

update

3438463

update

bfe6856

RunningLeon approved these changes May 13, 2024

lvhan028 merged commit ca4de27 into InternLM:main May 13, 2024
5 checks passed

zhulinJulia24 deleted the update_benchmark_evaluate branch August 26, 2024 09:32
