Thanks for your work!
I found that in the KodCode-R1-SFT-V1 dataset, for samples where r1-correctness is true, the function names in the model's solution do not always match those in test_info. In such cases, how is the code validated with pytest?