@oktie
Member

@oktie oktie commented Jun 24, 2025

This PR introduces improvements to the Text2SQL evaluation framework, focusing on flexibility, accuracy, and robustness of execution-based metrics.

Changes:

  • Refactored the Text2SQL metrics implementation to simplify the addition of new metrics.
  • Added a new metric that replaces the SELECT clause of the prediction with that of the ground truth, allowing comparison of execution logic independent of selected columns.
  • Fixed edge cases in execution metric functions to ensure more reliable evaluation.
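
A minimal sketch of the SELECT-replacement idea, paired with an order-insensitive execution comparison, could look like the following. It is not the unitxt implementation. `replace_select_clause`, `execution_match`, and the `emp` table are hypothetical. The regex only handles a single top-level `SELECT ... FROM` with no `FROM` inside the select list. SQLite stands in for whatever database the metric actually runs against.

```python
import re
import sqlite3
from collections import Counter

# Assumption: simple queries with one top-level "SELECT <list> FROM ..." and no
# "FROM" keyword inside the select list (e.g. no EXTRACT(... FROM ...)).
_SELECT_LIST = re.compile(r"^\s*select\s+(.*?)\s+from\s", re.IGNORECASE | re.DOTALL)


def replace_select_clause(predicted_sql: str, gold_sql: str) -> str:
    """Hypothetical helper: swap the prediction's SELECT list for the gold one."""
    pred = _SELECT_LIST.match(predicted_sql)
    gold = _SELECT_LIST.match(gold_sql)
    if pred is None or gold is None:
        # Leave the prediction untouched if either query does not fit the pattern.
        return predicted_sql
    return predicted_sql[: pred.start(1)] + gold.group(1) + predicted_sql[pred.end(1):]


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Hypothetical metric: compare result rows as multisets (row order ignored)."""
    try:
        pred_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A prediction that fails to execute scores as a mismatch.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()  # gold SQL is assumed to be valid
    return Counter(pred_rows) == Counter(gold_rows)


# Tiny in-memory example database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("a", "x", 10), ("b", "y", 20), ("c", "x", 30)],
)

gold = "SELECT name FROM emp WHERE dept = 'x'"
pred = "SELECT name, salary FROM emp WHERE dept = 'x'"  # correct filter, extra column

print(execution_match(conn, pred, gold))  # prints False: extra column changes the rows
print(execution_match(conn, replace_select_clause(pred, gold), gold))  # prints True
```

Comparing results as multisets (via `Counter`) ignores row order while still counting duplicate rows. A more robust version would likely rewrite the query with a proper SQL parser rather than a regex.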

oktie and others added 18 commits June 3, 2025 13:35
Signed-off-by: Oktie Hassanzadeh <hassanzadeh@us.ibm.com>
* Add Multi Turn Metrics Support

Signed-off-by: elronbandel <elronbandel@gmail.com>

* Add multi-turn metrics and templates support

Signed-off-by: elronbandel <elronbandel@gmail.com>

* Refactor MultiTurnMetric into GroupMetric for improved grouping and item identification

Signed-off-by: elronbandel <elronbandel@gmail.com>

* Remove duplicate import of dict_get and update line number in secrets baseline

Signed-off-by: elronbandel <elronbandel@gmail.com>

* Implement sequential success accuracy metric and refactor group reduction logic

Signed-off-by: elronbandel <elronbandel@gmail.com>

* Format

Signed-off-by: elronbandel <elronbandel@gmail.com>

* Some fixes

Signed-off-by: elronbandel <elronbandel@gmail.com>

* Rename

Signed-off-by: elronbandel <elronbandel@gmail.com>

---------

Signed-off-by: elronbandel <elronbandel@gmail.com>
Signed-off-by: Oktie Hassanzadeh <hassanzadeh@us.ibm.com>
* Increase token limit so the judge can get to the actual answer.

Signed-off-by: Jonathan Bnayahu <bnayahu@il.ibm.com>

* now with the json file

Signed-off-by: Jonathan Bnayahu <bnayahu@il.ibm.com>

---------

Signed-off-by: Jonathan Bnayahu <bnayahu@il.ibm.com>
…enabled. (#1834)

* Added example of running inference with log probability

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Initial changes to support generated_text in meta data

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Added missing generated_text in llava models

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Fixed WMLInferenceEngineChat and improved tests

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Added print header to example

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Added "text" and "logprob" to OpenAiInferenceEngine

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Reverted test question change

* Updated tests

Signed-off-by: Yoav Katz <katz@il.ibm.com>

---------

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Co-authored-by: Elron Bandel <elronbandel@gmail.com>
Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>
Signed-off-by: Oktie Hassanzadeh <hassanzadeh@us.ibm.com>
@oktie oktie requested review from elronbandel and perlitz June 24, 2025 22:12
Member

@elronbandel elronbandel left a comment

The patches in the tests still point to the old module paths, for example at line 108 of the tests:

@patch(
    "unitxt.sql_utils.LocalSQLiteConnector.get_db_file_path",

Other than that it looks great and better organized.
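
Why a stale patch target breaks the tests can be illustrated with a small, self-contained sketch. The module names `metrics_utils_new` and `metrics_utils_old` and the `LocalConnector` class are hypothetical stand-ins, not the real unitxt paths. `mock.patch` resolves its dotted string by importing the named module, so a target that still names a pre-refactor module fails, whereas `mock.patch.object` takes the class object directly.

```python
import sys
import types
from unittest import mock

# Hypothetical post-refactor module, registered in sys.modules so that
# mock.patch can import it by name. Neither module name is a real unitxt path.
new_module = types.ModuleType("metrics_utils_new")


class LocalConnector:
    def get_db_file_path(self) -> str:
        return "/real/path/to.db"


new_module.LocalConnector = LocalConnector
sys.modules["metrics_utils_new"] = new_module


def patched_path_via_string(target: str) -> str:
    """Patch by dotted path; the path must name the module where the class now lives."""
    with mock.patch(target, return_value="/tmp/test.db"):
        return LocalConnector().get_db_file_path()


def patched_path_via_object() -> str:
    """patch.object receives the class itself, so it does not depend on module paths."""
    with mock.patch.object(LocalConnector, "get_db_file_path", return_value="/tmp/test.db"):
        return LocalConnector().get_db_file_path()


print(patched_path_via_string("metrics_utils_new.LocalConnector.get_db_file_path"))
print(patched_path_via_object())
try:
    # Stale target: the old module no longer exists, so entering the patch fails.
    patched_path_via_string("metrics_utils_old.LocalConnector.get_db_file_path")
except ImportError as err:
    print(f"stale patch target: {err}")
```

Using `patch.object` on the imported class is one way to keep tests independent of module layout, at the cost of importing the class into the test file.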

oktie added 3 commits June 25, 2025 07:48
Signed-off-by: Oktie Hassanzadeh <hassanzadeh@us.ibm.com>
@oktie oktie requested a review from elronbandel June 25, 2025 14:47
@oktie
Member Author

oktie commented Jun 25, 2025

Thanks @elronbandel - I believe I've fixed the tests. I also added some more docs to the text2sql util functions.

Member

@elronbandel elronbandel left a comment

Thanks. LGTM.

@elronbandel elronbandel merged commit 816ca2b into main Jun 25, 2025
14 of 16 checks passed
@elronbandel elronbandel deleted the text2sql-metrics-update branch June 25, 2025 16:00

6 participants