Benjams/add rag task card and metric #1044

Merged
benjaminsznajder merged 47 commits into main from benjams/add_rag_task_card_and_metric on Jul 29, 2024

Conversation

benjaminsznajder
Collaborator

No description provided.

.gitignore (outdated, resolved)
Member

matanor left a comment

@benjaminsznajder looks good IMO, left a few small questions and comments

prepare/tasks/rag/rag_end_to_end.py (outdated, resolved)
prepare/cards/rag/end_to_end/clapnq.py (outdated, resolved)
prepare/cards/rag/end_to_end/clapnq.py (outdated, resolved)
prepare/cards/rag/end_to_end/clapnq.py (outdated, resolved)
prepare/metrics/rag.py (resolved)
prepare/metrics/rag.py (outdated, resolved)
prepare/metrics/rag.py (resolved)
prepare/templates/rag/end_to_end.py (resolved)
Member

elronbandel left a comment

A style comment (no need to adopt, this is purely personal preference): when a variable name is longer or more complicated than the string it holds, it should be replaced with the string itself.
For example:

ClapNqBenchmark.ID -> "id"

Variables can break the linear flow of reading a file: the reader cannot tell exactly what is happening without going back and forth to the variable definitions.
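
To make the style point concrete, here is a small sketch in Python. It is not the code under review: the `ClapNqBenchmark` class below is a hypothetical stand-in (only the `ID` name appears in the comment above; `QUESTION` and both helper functions are invented for illustration).

```python
# Hypothetical constants class, modeled on the `ClapNqBenchmark.ID` name
# mentioned in the review comment. Not taken from the actual card code.
class ClapNqBenchmark:
    ID = "id"
    QUESTION = "question"


def rename_with_constants(row: dict) -> dict:
    # The reader has to look up ClapNqBenchmark to learn which keys are read.
    return {
        "question_id": row[ClapNqBenchmark.ID],
        "question": row[ClapNqBenchmark.QUESTION],
    }


def rename_with_literals(row: dict) -> dict:
    # The keys are visible at the point of use, so the code reads top to bottom.
    return {
        "question_id": row["id"],
        "question": row["question"],
    }
```

Both functions behave identically; the second simply avoids the extra lookup when the constant's name says no more than the string it holds.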

Member

elronbandel left a comment

LGTM. It may need some tests, but that can be done in a separate PR.

elronbandel and others added 13 commits July 29, 2024 23:26
eladven and others added 25 commits July 29, 2024 23:26
benjaminsznajder force-pushed the benjams/add_rag_task_card_and_metric branch from 6755b6e to 810b803 on July 29, 2024 20:27
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>
benjaminsznajder merged commit 29d2d93 into main on Jul 29, 2024
8 checks passed
benjaminsznajder deleted the benjams/add_rag_task_card_and_metric branch on July 29, 2024 21:20
csrajmohan pushed a commit that referenced this pull request Aug 29, 2024
* Fix bug in data classes and add support for field overriding in fields containing types or functions (#1027)

Fix data classes not supporting field overriding in fields containing types or functions

Signed-off-by: elronbandel <elron.bandel@ibm.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* Added seed to LLM as judges for consistent results (#1029)

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* replace type and __type__ in type error (#1035)

Signed-off-by: Yotam Perlitz <yotam.perlitz@ibm.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* add rag_end_to_end metrics

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* add rag_end_to_end metrics

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* Add task rag_end_to_end

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* add card for clapnq end_to_end

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* add sandbox_benjams

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* add subset

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* add a reduction of clap_nq

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* add a reduction of clap_nq

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* remove constants

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* rename sandbox_benjams to sandbox

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* remove sandbox

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* Add string to context id in rag (#1036)

* allow strings (hash) as context id

Signed-off-by: Yotam Perlitz <yotam.perlitz@ibm.com>

* save to catalog

Signed-off-by: Yotam Perlitz <yotam.perlitz@ibm.com>

---------

Signed-off-by: Yotam Perlitz <yotam.perlitz@ibm.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* Fixed issues with fresh install (#1037)

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* add validation to tldr, remove shuffle from billsum (#1038)

* add validation to tldr, remove shuffle from billsum
(shuffled by the SplitRandomMix)

Signed-off-by: ALON HALFON <ALONHAL@il.ibm.com>

* fix formatting

Signed-off-by: ALON HALFON <ALONHAL@il.ibm.com>

---------

Signed-off-by: ALON HALFON <ALONHAL@il.ibm.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* Refactor Rouge and Meteor to InstanceMetric for faster score computation (#1011)

* Remove confidence interval calculation for meteor metric by default

added a new metric with interval calculations

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Added error message when metrics are not a list

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Added error message

when post processors are not a list

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Changed Rouge to be HuggingfaceBulkMetric

to avoid recalculation of metric on every resample

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* added meteor as an HuggingFaceInstanceMetric

Signed-off-by: dafnapension <dafnashein@yahoo.com>

* removed meteor_with_confidence_intervals.json

Signed-off-by: dafnapension <dafnashein@yahoo.com>

* fixed test_metric_utils.py by better concentrating on rougeL only

Signed-off-by: dafnapension <dafnashein@yahoo.com>

* comment about rounded floats in tested scores

Signed-off-by: dafnapension <dafnashein@yahoo.com>

* while generating metric meteor, compare against HF implementation

Signed-off-by: dafnapension <dafnashein@yahoo.com>

* added a test comparing new Rouge with HF Rouge, and, per arielge's good advice, changed bootstrap method to percentile in case of 100 or more instances

Signed-off-by: dafnapension <dafnashein@yahoo.com>

* implemented Meteor and Rouge with inhouse code

Signed-off-by: dafnapension <dafnashein@yahoo.com>

* download quietly, and import in prepare

Signed-off-by: dafnapension <dafnashein@yahoo.com>

* trying to avoid .secrets.baseline

Signed-off-by: dafnapension <dafnashein@yahoo.com>

* secret.baseline how do I get rid of it?

Signed-off-by: dafnapension <dafnashein@yahoo.com>

---------

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: dafnapension <dafnashein@yahoo.com>
Co-authored-by: dafnapension <dafnashein@yahoo.com>
Co-authored-by: Elron Bandel <elronbandel@gmail.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* Add CloseTextSampler and FixedIndicesSampler (#1034)

* Add CloseTextSampler

That returns demos that are textually close to the current instance.

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Make sampler call pass the current instance

Added end 2 end test of sampler that depends on output

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Added FixedIndicesSampler(Sampler):

Selects a fixed set of samples based on a list of indices from the demo pool

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Made splitter currently use random_generators

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Changed all Sample randomization

To use common code to create randomizer per instance

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Updated demos in test

After a non backward compatible change

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Updated demos in test

After a non backward compatible change

Signed-off-by: Yoav Katz <katz@il.ibm.com>

---------

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* changed input and output of templates to "input_fields" and "reference_fields" - Non backward compatible (#1030)

* changed input and output of templates

to "input_fields" and "reference_fields".

This is to continue the work done on tasks.

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Fixed type hint

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Documentation update

Signed-off-by: Yoav Katz <katz@il.ibm.com>

---------

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* FinQA - filter problematic examples (#1039)

filter problematic examples

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* Arena hard elad2 (#1026)

* bug fixes in PairwiseChoiceTemplate

* add arena hard regex parser operator

* update mt bench card common

* update mt bench card common

* add reward bench

* update metric to pairwise comparison task

* arena hard tasks and cards

* update mt bench template

* add duplicate stream operator

* add PairwiseComparativeRatingTemplate

* add card

* add card

* add template

* add winrate metrics

* add comparative rating task

* add ExtractArenaHardNumericalJudgment

* add arena hard cards

* add arena hard template

* add weighted winrate metrics

* delete file

* update PairwiseComparativeRatingTemplate

* add metric

* add metric

* update

* update

* update

* fix template bug

* update

* llama 3 update

* update

* update

* update jsons

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* fix

* fix

* fix

* update

* update

* update

* bluebench related changes

* fix type issue

Signed-off-by: Yotam Perlitz <yotam.perlitz@ibm.com>

* update

* update

* update

* prometheus1

* update

* fix

* fix

* merge with arena_branch

Signed-off-by: Yotam Perlitz <yotam.perlitz@ibm.com>

* rebuild catalog

Signed-off-by: Yotam Perlitz <yotam.perlitz@ibm.com>

* add debugging to clapnq

* Reproduce all artifacts

* Add missing artifacts to catalog

* Add secrets baseline

Signed-off-by: Elad Venezian <eladv@il.ibm.com>

* Fix bugs with catalog creation

* Remove arena hard examples from tests, since they don't pass

* Add missing metadata to test mock

* Add data_classification_policy and recipe_metadata to the streams tests

* Fix test failures

* Update multi_turn_gpt4_judgement.py

* Update multi_turn_with_reference_gpt4_judgement.py

* Update docs/docs/examples.rst

Co-authored-by: Yoav Katz <68273864+yoavkatz@users.noreply.github.com>

* revert catalog consistency and preparation yml files

* revert catalog consistency and preparation yml files

* revert catalog consistency and preparation yml files

* revert catalog consistency and preparation yml files

* Update docs/docs/examples.rst

Co-authored-by: Yoav Katz <68273864+yoavkatz@users.noreply.github.com>

* bug fix in LoadFromHFSpace

* revert

* revert

* update examples

* add comment to explain change

* update to new params usage

* pr fixes

* pr fixes

* update

* update

* update

* update

* update

* update

* Update prepare/templates/rag/response_generation.py

Co-authored-by: Yotam Perlitz <perlitz@gmail.com>

* Update prepare/templates/rag/response_generation.py

Co-authored-by: Yotam Perlitz <perlitz@gmail.com>

* update

* cr fixes

* llmaj format fix

* llmaj format fix

---------

Signed-off-by: Yotam Perlitz <yotam.perlitz@ibm.com>
Signed-off-by: Elad Venezian <eladv@il.ibm.com>
Co-authored-by: ofirarviv <ofir.arviv@ibm.com>
Co-authored-by: Yotam Perlitz <yotam.perlitz@ibm.com>
Co-authored-by: michal <shmueli@il.ibm.com>
Co-authored-by: Yoav Katz <68273864+yoavkatz@users.noreply.github.com>
Co-authored-by: Yotam Perlitz <perlitz@gmail.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* demo's target prefix is now taken from demo instance (#1031)

* demo's target prefix is now taken from demo instance

Signed-off-by: dafnapension <dafnashein@yahoo.com>

* do not pop fields out of demo instances.
Traditionally done for main instance, but not allowed for demo instance that should serve also other main instances in the stream

Signed-off-by: dafnapension <dafnashein@yahoo.com>

* simplified test-case per @yoavkatz idea. Still eagering samples different demos than non-eagering

Signed-off-by: dafnapension <dafnashein@yahoo.com>

---------

Signed-off-by: dafnapension <dafnashein@yahoo.com>
Co-authored-by: Yoav Katz <68273864+yoavkatz@users.noreply.github.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* remove the reduced clap_nq

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* define an empty template for rag end_to_end

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* Implement metrics ensemble (#1047)

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* add load_json_predictions as processor in the template

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* add the processors/load_json_predictions.json generated to the catalog

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* Add flores101 (#1053)

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* Added example for selection of demos (#1052)

* Added example for selection of demos

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Added example doc

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Update docs/docs/examples.rst

* Update docs/docs/examples.rst

---------

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* fix - the build test is failing because opendatasets depends on kaggle without pinning a version, and kaggle 1.6.15 currently fails. Pin kaggle to 1.6.14 as a workaround

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* add overwrite

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* Update introduction.rst - copy edits (grammar, consistency, clarity) (#1063)

Signed-off-by: welisheva22 <welisheva22@gmail.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* Fix typo in japanese_llama system prompt (issue #964) (#1056)

Signed-off-by: Jonathan Bnayahu <bnayahu@il.ibm.com>
Co-authored-by: Elron Bandel <elronbandel@gmail.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* Allow assigning None in overwrites when fetching artifacts with modifications (#1062)

allow =None in overwrites for fetch

Signed-off-by: dafnapension <dafnashein@yahoo.com>
Co-authored-by: Elron Bandel <elronbandel@gmail.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* Make sure preparation times printed fully and nicely (#1046)

Signed-off-by: elronbandel <elron.bandel@ibm.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* numeric nlg - template changes (#1041)

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* add judge input to the metric (#1064)

* add judge input to the metric

* add judge input to the metric

* fix

* fix test

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* Unitxt capitalization adding_dataset.rst (#1057)

making Unitxt capitalization consistent in text

Co-authored-by: Yoav Katz <68273864+yoavkatz@users.noreply.github.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* fixed the score_ci inconsistency issue (#1065)

* suggested fix for score_ci inconsistency issue

Signed-off-by: dafnapension <dafnashein@yahoo.com>

* unify with the update, and thus simplified the check

Signed-off-by: dafnapension <dafnashein@yahoo.com>
---------

Signed-off-by: dafnapension <dafnashein@yahoo.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* Use of conventional python types in input definition of tasks and metrics (#1045)

* Fix data classes not supporting field overriding in fields containing types or functions

Signed-off-by: elronbandel <elron.bandel@ibm.com>

* Make tasks types python types

Signed-off-by: elronbandel <elron.bandel@ibm.com>

* Fix errors

Signed-off-by: elronbandel <elron.bandel@ibm.com>

* Some fixes

Signed-off-by: elronbandel <elron.bandel@ibm.com>

* More fixes

Signed-off-by: elronbandel <elron.bandel@ibm.com>

* Update catalog

Signed-off-by: elronbandel <elron.bandel@ibm.com>

* Fix cards

Signed-off-by: elronbandel <elron.bandel@ibm.com>

* Revert change

Signed-off-by: elronbandel <elron.bandel@ibm.com>

* Fix typing in docs with new convention

Signed-off-by: elronbandel <elron.bandel@ibm.com>

* refactor of new asset to new convention

Signed-off-by: elronbandel <elron.bandel@ibm.com>

* Update secrets baseline

Signed-off-by: elronbandel <elron.bandel@ibm.com>

---------

Signed-off-by: elronbandel <elron.bandel@ibm.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* Added prediction type to llm as judge to avoid warning (#1072)

* Added prediction type to llm as judge to avoid warning

Clarified the standalone llm as judge example

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Removed accidentally added file

Signed-off-by: Yoav Katz <katz@il.ibm.com>

---------

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* Fixed clapnq to check with reasonable error values

Also updated rag tasks to use new typing (instead of string types)

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* fix the type hint

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* update catalog

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* Add metric "metrics.rag.retrieval_at_k" to catalog (#1074)

* add metric "metrics.rag.retrieval_at_k" to catalog
this is a wrapper around the retrieval_at_k for the ragas scheme

* add corresponding json file for the new metric

---------

Co-authored-by: Elron Bandel <elronbandel@gmail.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

* merge - resolve conflict

Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>

---------

Signed-off-by: elronbandel <elron.bandel@ibm.com>
Signed-off-by: Benjamin Sznajder <benjams@il.ibm.com>
Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: Yotam Perlitz <yotam.perlitz@ibm.com>
Signed-off-by: ALON HALFON <ALONHAL@il.ibm.com>
Signed-off-by: dafnapension <dafnashein@yahoo.com>
Signed-off-by: Elad Venezian <eladv@il.ibm.com>
Signed-off-by: welisheva22 <welisheva22@gmail.com>
Signed-off-by: Jonathan Bnayahu <bnayahu@il.ibm.com>
Co-authored-by: Elron Bandel <elronbandel@gmail.com>
Co-authored-by: Yoav Katz <68273864+yoavkatz@users.noreply.github.com>
Co-authored-by: Yotam Perlitz <perlitz@gmail.com>
Co-authored-by: Benjamin Sznajder <benjams@il.ibm.com>
Co-authored-by: Alon H <alonh@users.noreply.github.com>
Co-authored-by: dafnapension <dafnashein@yahoo.com>
Co-authored-by: ShirApp <58909189+ShirApp@users.noreply.github.com>
Co-authored-by: Elad <eladv@il.ibm.com>
Co-authored-by: ofirarviv <ofir.arviv@ibm.com>
Co-authored-by: Yotam Perlitz <yotam.perlitz@ibm.com>
Co-authored-by: michal <shmueli@il.ibm.com>
Co-authored-by: dafnapension <46454972+dafnapension@users.noreply.github.com>
Co-authored-by: welisheva22 <welisheva22@gmail.com>
Co-authored-by: Jonathan Bnayahu <bnayahu@il.ibm.com>
Co-authored-by: hanansinger <95229126+hanansinger@users.noreply.github.com>
Co-authored-by: Yoav Katz <katz@il.ibm.com>
Co-authored-by: matanor <55045955+matanor@users.noreply.github.com>
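
The commit message above adds a "metrics.rag.retrieval_at_k" entry to the catalog and allows string (hash) context ids. As a rough illustration of what a retrieval-at-k style score measures, here is a minimal Python sketch. It is not the Unitxt implementation: the function name, its signature, and the score names are assumptions made for this example only.

```python
from typing import Dict, Sequence


def retrieval_at_k(
    retrieved_ids: Sequence[str],
    relevant_ids: Sequence[str],
    ks: Sequence[int] = (1, 3, 5),
) -> Dict[str, float]:
    """Illustrative per-query retrieval scores (hypothetical helper).

    retrieved_ids: context ids returned by the retriever, best first.
    relevant_ids: gold context ids for the query.
    """
    relevant = set(relevant_ids)
    scores: Dict[str, float] = {}
    for k in ks:
        # Deduplicate the top-k ids so a repeated id is not counted twice.
        top_k = set(list(retrieved_ids)[:k])
        hits = len(top_k & relevant)
        # 1.0 if at least one gold context appears in the top k.
        scores[f"match_at_{k}"] = 1.0 if hits > 0 else 0.0
        # Fraction of the k slots filled by gold contexts.
        scores[f"precision_at_{k}"] = hits / k
        # Fraction of gold contexts found in the top k.
        scores[f"recall_at_{k}"] = hits / len(relevant) if relevant else 0.0
    return scores


example_scores = retrieval_at_k(["d3", "d7", "d1", "d9"], ["d1", "d2"], ks=(1, 3))
```

In this example the only gold context retrieved is "d1" at rank 3, so the scores at k=1 are all zero, while at k=3 the match is 1.0, precision is 1/3, and recall is 0.5.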
Development

Successfully merging this pull request may close these issues.

Add RAG end-to-end task and metrics to unitxt + clapNQ dataset for RAG end-to-end task