Metrics for benchmarks #424

Closed
jmatejcz opened this issue Feb 17, 2025 · 8 comments


Come up with metrics that measure how accurately a task was completed.
After a couple of specific implementations, identify common instructions/tools that might be useful when creating a metric.

jmatejcz added the enhancement (New feature or request) label Feb 17, 2025
jmatejcz self-assigned this Feb 17, 2025

jmatejcz commented Feb 20, 2025

  • add verification to tasks: the verification ensures that the scene is suitable for the task. When it is not suitable (e.g. "move carrot" when there is no carrot in the scene), it should not be possible to create such a scenario
  • add a way to create scenarios automatically from a given list of tasks and scenes
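
A minimal sketch of how the two points above could fit together, assuming a scene can be summarized as a set of object names. `Scene`, `Task`, `Scenario`, `validate_scene` and `build_scenarios` are hypothetical names chosen for illustration, not the project's actual API.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Scene:
    name: str
    objects: frozenset  # object types present in the scene (assumed representation)


@dataclass(frozen=True)
class Task:
    name: str
    required_objects: frozenset  # object types the task needs

    def validate_scene(self, scene: Scene) -> bool:
        """Return True if every required object type is present in the scene."""
        return self.required_objects <= scene.objects


@dataclass(frozen=True)
class Scenario:
    task: Task
    scene: Scene

    def __post_init__(self):
        # Refuse to create a scenario whose scene cannot support the task.
        if not self.task.validate_scene(self.scene):
            raise ValueError(
                f"Scene {self.scene.name!r} is not suitable for task {self.task.name!r}"
            )


def build_scenarios(tasks, scenes):
    """Pair every task with every scene, keeping only the valid combinations."""
    return [Scenario(t, s) for t, s in product(tasks, scenes) if t.validate_scene(s)]
```

With a "move carrot" task and two scenes, one containing a carrot and one empty, `build_scenarios` would keep only the pairing with the carrot scene, while constructing the other pairing directly raises `ValueError`.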


jmatejcz commented Feb 20, 2025

  • define tasks more clearly
  • ensure proper naming and calculations


jmatejcz commented Feb 20, 2025

  • number of tool calls?
  • calculating score also at the beginning
  • storing results
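
One possible shape for these ideas, as a sketch only: score the scene before the agent acts, count tool calls while it runs, and keep everything in a single result record. `ScenarioResult`, `run_scenario`, `calculate_score` and `agent_steps` are hypothetical names, and modelling each tool call as a zero-argument callable is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    scenario_name: str
    initial_score: float  # score before the agent does anything
    final_score: float    # score after the agent has finished
    tool_calls: int       # number of tool calls the agent made

    @property
    def improvement(self) -> float:
        """How much the agent changed the score relative to the starting state."""
        return self.final_score - self.initial_score


def run_scenario(name, calculate_score, agent_steps):
    """Run one scenario and record scores and tool-call count.

    calculate_score: zero-argument callable returning the current score.
    agent_steps: iterable of zero-argument callables, one per tool call.
    """
    initial = calculate_score()
    tool_calls = 0
    for step in agent_steps:
        step()
        tool_calls += 1
    return ScenarioResult(name, initial, calculate_score(), tool_calls)
```

Recording the initial score lets a benchmark distinguish a scene that already started close to the goal from one where the agent did the work.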


jmatejcz commented Feb 24, 2025

  • ensure all required services are running before starting a scenario
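
A generic sketch of such a pre-flight check, assuming each required service can be probed through a zero-argument callable that returns `True` once it is ready. `wait_for_services` and the shape of `checks` are hypothetical; how a real service is actually probed is left out on purpose.

```python
import time


def wait_for_services(checks, timeout_s=10.0, poll_s=0.5):
    """Block until every check passes, or raise TimeoutError naming the missing ones.

    checks: dict mapping a service name to a zero-argument callable that
    returns True when that service is available (assumed interface).
    """
    deadline = time.monotonic() + timeout_s
    pending = dict(checks)
    while True:
        # Keep only the services that are still unavailable.
        pending = {name: check for name, check in pending.items() if not check()}
        if not pending:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Services not available: {sorted(pending)}")
        time.sleep(poll_s)
```

Calling this before each scenario would make a missing service fail fast with a clear message instead of surfacing later as a bad score.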


jmatejcz commented Feb 25, 2025

  • proper log after finishing all scenarios
  • log how many scenarios are left
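
A small sketch of both log messages using the standard `logging` module. `run_all` and `run_one` are hypothetical names, and it is assumed here that running a scenario returns a numeric score.

```python
import logging

logger = logging.getLogger("benchmark")


def run_all(scenarios, run_one):
    """Run every scenario, logging progress and a final summary.

    run_one: callable taking a scenario and returning its score (assumed numeric).
    """
    total = len(scenarios)
    scores = []
    for done, scenario in enumerate(scenarios, start=1):
        scores.append(run_one(scenario))
        logger.info("Scenario %d/%d finished, %d left", done, total, total - done)
    # Guard against an empty scenario list when computing the mean.
    mean = sum(scores) / total if total else 0.0
    logger.info("All %d scenarios finished, mean score %.2f", total, mean)
    return scores
```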


jmatejcz commented Feb 25, 2025

  • dump results into a file
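
For illustration, a minimal way to do this with the standard library, assuming results are plain dicts and JSON is an acceptable format. `dump_results` and the field names in the usage note are hypothetical.

```python
import json
from pathlib import Path


def dump_results(results, path):
    """Write a list of JSON-serializable result dicts to `path` and return the path."""
    path = Path(path)
    # Create the output directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump(results, f, indent=2)
    return path
```

For example, `dump_results([{"scenario": "move_carrot", "score": 0.75}], "results/run.json")` would create the `results` directory if needed and write the list as indented JSON.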

jmatejcz reopened this Feb 28, 2025

jmatejcz commented Feb 28, 2025

Introduced in #436, then extended in #444 and #452.
