
Add llm as judge mt-bench dataset and metrics #791

Merged: 164 commits, May 20, 2024


OfirArviv (Collaborator)

No description provided.

OfirArviv marked this pull request as ready for review May 5, 2024 12:26
elronbandel merged commit d854109 into main May 20, 2024
7 checks passed
elronbandel deleted the users/ofir/add_llm_as_judge_dataset_and_metrics branch May 20, 2024 09:10
bnayahu pushed a commit that referenced this pull request May 21, 2024
* add mt_bench_single_turn_gpt4_judge dataset

* added typings to model_response_assessment task field

* fixed output_format in mt_bench template

* fixed output_format in mt_bench template

* add llama3 format

* temporary changes to the inference engines

* add llama3_bam_mt_bench_prompt llm-as-judge metric

* add assert to openai model recipe

* update genai and openai inference apis

* add model_response_assessment_chat task

* add ChatTemplate

* add model_response_assessment.json

* fix model_response_assessment.json

* add template and task of chat llm as judge

* mt bench templates

* mt bench templates

* model assessment tasks

* add InterleaveListsToDialogOperator operator

* update dialog template

* update mt bench template

* update mt bench template update

* update chat template

* add mt bench datasets

* small fixes

* update metrics

* update metrics

* delete old files

* update test requirements file

* update test requirements file

* update llama3 metric with correct format

* add model assessment tasks with reference

* update tasks

* clear catalog

* add tasks

* update task

* update templates

* update

* update

* update

* add mt bench pairwise processor

* remove old file

* update

* add model assessment pairwise comparison tasks

* add pairwise templates

* fix pairwise templates

* fix mt bench pairwise processor

* fix template

* add mt-bench pairwise dataset

* llm as judge metric cards

* add llama3 metrics

* update

* update

* update prepare test python version

* clean catalog

* update templates

* update tasks

* update tasks

* update templates

* update cards

* update cards

* update templates

* add cards

* add cards for llm as judge metric

* add cards for llm as judge metric

* add metrics

* merge

* add mt bench generation datasets

* fix

* fix

* fix

* fix

* update python to 3.9 for catalog testing

* remove old catalog items

* update llm as a judge

* update readme

* update tests

* update dynamic cards for llm as judge

* update llm as judge metric

* update tests

* add the ability to strip_system_prompt_and_format_from_inputs

* update tests

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* add phi3 format

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update cards with LiteralEval

* update cards with LiteralEval

* make llm judge dynamic fields

* add json

* update

* update metric

* update

* fix

* update readme

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* Update llm_as_judge.rst

* update

* update

* update

* update

* update

* Update llm_as_judge.rst (#847)

* update

* update

* update

* update

* update

* update

* update

* update

* small fix

* small fix

---------

Co-authored-by: Yoav Katz <68273864+yoavkatz@users.noreply.github.com>

3 participants