Laminar evaluations #175
Force-pushed: fac7001 to 6c8797a
simonrosenberg left a comment:
Can you solve the 2 mutables?
There is still an issue, because the eval output format is really suboptimal, but this can be solved in the standardization ticket!
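For context, the "2 mutables" in the comment above presumably refer to mutable default arguments, judging by the later commit "remove mutable defaults"; the PR's actual code is not shown here. A minimal sketch of that pitfall and the usual fix follows. The function names and the score dictionary are hypothetical and used for illustration only.

```python
from __future__ import annotations

from typing import Optional


def add_score_shared(name: str, scores: dict[str, float] = {}) -> dict[str, float]:
    # Pitfall: the default dict is created once, when the function is defined,
    # and is then shared by every call that omits `scores`.
    scores[name] = 1.0
    return scores


def add_score_safe(name: str, scores: Optional[dict[str, float]] = None) -> dict[str, float]:
    # Fix: use None as a sentinel and build a fresh dict on each call.
    if scores is None:
        scores = {}
    scores[name] = 1.0
    return scores
```

Calling `add_score_shared` twice without a second argument returns the same dictionary object both times, so entries from the first call leak into the second. `add_score_safe` returns an independent dictionary per call.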
Actually, can we hold off on merging this until I standardize the evaluation output format? Sorry! It should be done by tomorrow.
Issue here: #177
@simonrosenberg addressed the comments! Sure, let me know once you standardise the format and I'll adjust the Laminar service as needed. Thank you!
@OpenHands please resolve the conflicts
I'm on it! simonrosenberg can track my progress at all-hands.dev
Resolve conflicts:
- benchmarks/commit0/run_infer.py: Keep remote workspace support from main, add forward_env parameter to DockerDevWorkspace
- pyproject.toml: Combine dependencies from both branches, use lmnr>=0.7.24
- uv.lock: Regenerate lock file

Co-authored-by: openhands <openhands@all-hands.dev>
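The commit message above mentions a `forward_env` parameter but does not show its signature. As a rough illustration of the general idea, forwarding selected environment variables from the host into a workspace, here is a minimal sketch. The helper `collect_forwarded_env` is hypothetical and not part of the OpenHands SDK; it only shows how a list of variable names might be resolved to values before being handed to a workspace.

```python
from __future__ import annotations

import os
from typing import Iterable, Mapping, Optional


def collect_forwarded_env(
    names: Iterable[str],
    environ: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Return the variables listed in `names` that are set in `environ`.

    Hypothetical helper for illustration: names that are not set are
    skipped rather than forwarded as empty strings.
    """
    # Fall back to the real process environment when none is supplied.
    source = os.environ if environ is None else environ
    return {name: source[name] for name in names if name in source}
```

For example, passing `["LMNR_SPAN_CONTEXT"]` would yield a one-entry dictionary when that variable is set on the host, and an empty dictionary otherwise. Skipping unset names is a design choice made in this sketch, not something the PR confirms.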
I've resolved the merge conflicts in PR #175 by merging
Conflicts Resolved
Important Note
The branch has been pushed to
The PR author (@Rainhunter13) will need to either:
The resolved branch is available at: https://github.com/OpenHands/benchmarks/tree/laminar_evaluations
@Rainhunter13 approved, thanks! And sorry for the delayed process.
@simonrosenberg thanks! Just fixed the tests. Do you think it will be possible to merge this soon?
* laminar evaluations
* revert uv.lock changes
* revert uv.lock changes
* remove mutable defaults and move score update to laminar service
* chore
* Update software-agent-sdk submodule to 52604a7fa48f40487c70338a96488fc6287dac9c
* use generic report file format
* Update software-agent-sdk submodule to e8f73e71b979984242245efeb701ee4b18480f9e
* Update software-agent-sdk submodule to 73769d5e9d5f75333054d098a2e86c02555fa8d6
* forward env vars for RemoteWorkspace
* fix tests

---------

Co-authored-by: openhands <openhands@all-hands.dev>
This PR integrates Laminar evaluations into OpenHands Benchmarks, enabling dynamic creation of Laminar Evaluations and Datapoints for every run.
Core changes:
LMNR_SPAN_CONTEXT
Usage: