
Return the test details with runtests? #139

Closed · findmyway opened this issue Jan 16, 2024 · 5 comments
Labels: speculative (a feature idea that we are undecided about)

Comments

@findmyway

I'd like to extract the test results and do some calculations on them later. My current approach is to set report=true and then parse the resulting XML file. Is there a better approach?

More specifically, can we return testitems instead of nothing here?

return nothing
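
(For reference, the workflow described above looks roughly like this; the package path below is hypothetical, and report=true is the keyword that writes the XML report.)

using ReTestItems

# Sketch of the current workaround: `report=true` writes a JUnit-style XML
# report, which can then be parsed separately. The path is hypothetical.
runtests("path/to/MyPackage"; report=true)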

@nickrobinson251 (Collaborator) commented Jan 16, 2024

thanks for the issue! Can you say more about what the use-case is?

Returning testitems there is not something we'd want to do by default (since in the REPL it'd then print out that object), but it could perhaps be made an option... although that's a ReTestItems.TestItems object, which is currently a private internal structure that I'd be reluctant to make public as is.

Also, the whole runtests() function will throw (specifically, the Test.finish call will throw) if there were any non-passing tests, so returning anything from runtests would only be possible in cases where all tests succeed... unless runtests() is itself called from within another testset (which is basically what we do in this package's own tests: we call runtests inside an EncasedTestSet and inspect the results).

So what's possible/desirable to do here depends on what exactly you're trying to achieve. Can you share an example?

Also, for context: I'm a little wary of committing to any kind of support for interacting with test results without a plan for what would be in or out of scope, and without taking a look at prior art. For example, a natural next step would be to use the returned test results when re-running tests, e.g. to only run the failures/errors, or to run those first -- but maybe we can add something that helps you without that, if I can understand the use-case better.

nickrobinson251 added the question (Further information is requested) and speculative (a feature idea that we are undecided about) labels on Jan 16, 2024
@findmyway (Author) commented Jan 16, 2024

First, thanks for your prompt reply!

I understand your concern and your explanation makes sense to me.

> Can you say more about what the use-case is?

Sure! I'm working on a project similar to openai/human-eval and evalplus, but in Julia. Generally, it asks different LLMs to generate code based on a given prompt, then executes the code to calculate pass@k (a metric based on the test-case pass rate, used to measure the performance of code LLMs). This package already provides many useful features, like parallel execution, timeouts, and many important metrics. All of these combined help me a lot when analyzing the performance of existing LLMs.
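
(For reference, a rough sketch of the usual unbiased pass@k estimator from the HumanEval paper, assuming n generated samples of which c pass the tests; the function name is just illustrative.)

# Unbiased pass@k estimator (as in the HumanEval/Codex paper):
# n = number of generated samples, c = number of samples that pass all tests.
function pass_at_k(n::Integer, c::Integer, k::Integer)
    n - c < k && return 1.0          # every size-k subset contains a passing sample
    return 1.0 - prod(1.0 .- k ./ ((n - c + 1):n))
end

pass_at_k(20, 5, 1)  # estimated probability that at least one of k=1 samples passes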

I think everything I need is already in the report.xml file. It would be great if we could somehow make it easier to extract these details (instead of parsing the raw XML file).

FYI: you can view the test cases here: https://github.com/oolong-dev/OolongEval.jl/tree/add_human_eval/benchmarks/HumanEval/src/tasks

@nickrobinson251 (Collaborator)

I see. Yeah, I'm not sure I see an easy way to add this to ReTestItems due to how Test.jl works (at least not one I'd be comfortable committing to 😅).

Maybe right now using the report.xml is actually the easiest way (daft as it might sound). The report.xml is a somewhat standard format ("JUnit XML"). I don't think there are any Julia parsers for it (just generic XML parsers), but Python has junitparser (if you're open to using Python) and there are probably others; alternatively, writing your own parser in Julia on top of a generic XML parser shouldn't be too arduous.
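
For example, a rough sketch with EzXML.jl (a generic XML parser): it assumes the usual JUnit XML layout where a failing/erroring <testcase> has a <failure>/<error> child, and the report path and function name are just illustrative.

using EzXML  # generic XML parser, not JUnit-specific

# Rough sketch: count passing <testcase> elements in a JUnit-style report.
# A testcase is treated as passing if it has no <failure> or <error> child.
function count_junit(path)
    doc = readxml(path)
    total = passed = 0
    for tc in findall("//testcase", root(doc))
        total += 1
        children = nodename.(elements(tc))
        if !("failure" in children || "error" in children)
            passed += 1
        end
    end
    return passed, total
end

passed, total = count_junit("report.xml")  # adjust the path to wherever the report is written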

But an alternative to parsing the report.xml could be to use MetaTesting? It's designed for "testing your test functions", i.e. testing functions that themselves run tests... but that's pretty close to inspecting what happened when tests ran, so it could be adapted to this use-case, I think... something like:

using MetaTesting: EncasedTestSet
using Test
using ReTestItems

# Wrap in a `MetaTesting.EncasedTestSet` so we can capture results rather than throwing if any fail/error
ts = @testset EncasedTestSet "runtests" begin
    runtests(...)
end

# Get the `Test.DefaultTestSet` produced by `runtests(...)`:
results = only(ts.results)

# Do stuff with `results`, e.g. print them to check they match what `runtests(...)` would show:
Test.print_test_results(results)

This gives you a Test.DefaultTestSet, which only stores the failures/errors and the number of passes (no other data about the passes), but maybe that's enough for your use-case?
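
If you need explicit counts from that, here's a rough sketch that tallies them by walking the testset; note it relies on internal DefaultTestSet fields (n_passed, results), which may change between Julia versions.

using Test

# Rough sketch: recursively tally passes, failures, and errors from a
# `Test.DefaultTestSet` (passes are stored only as a count, `n_passed`).
function count_results(ts::Test.DefaultTestSet)
    passes, fails, errs = ts.n_passed, 0, 0
    for r in ts.results
        if r isa Test.DefaultTestSet
            p, f, e = count_results(r)
            passes += p; fails += f; errs += e
        elseif r isa Test.Fail
            fails += 1
        elseif r isa Test.Error
            errs += 1
        end
    end
    return passes, fails, errs
end

passes, fails, errs = count_results(results)  # `results` from the snippet above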

@findmyway (Author)

Great, thanks!

I wasn't aware of MetaTesting before. I'll try both approaches and see which one works better.

nickrobinson251 removed the question (Further information is requested) label on Jan 17, 2024
@findmyway (Author)

I used results.xml in the end.

FYI: https://github.com/01-ai/HumanEval.jl
