Dear Authors,
Thanks for the great work! I'm planning to do some work based on this dataset, and I have run into a small problem:
Following the README, I ran the code_repair inference for several open-source LLMs, and the results are saved to CodeScope/code_repair/inference/result/code_repair_eval_{model_name}.jsonl by default.
I'm confused about the next step to take.
- The jsonl file looks like this

If I'm not mistaken, source_code is the code for the LLM to debug, and code_repairing_0 is the output of the LLM, containing code and some explanations.
- However, the next step suggests that the Evaluator needs an input jsonl in a different format.

The question is how to convert the inference output jsonl (the first item above) into the input format the Evaluator requires (the second item above). Specifically:
- Should the "source_code" in the Evaluator input be replaced by the code extracted from the model's output? Is there any code that performs this conversion?
- For "lang_cluster", "lang", etc., should I replace the values with the "{model_name}" shown in the README? Or is that just a placeholder, so that I should keep the original values from code_repair_eval_{model_name}.jsonl without changing anything?
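
To make the first question concrete, here is a minimal sketch of the conversion I have in mind. It rests on two assumptions of mine that I have not confirmed: that the model wraps the repaired code in a Markdown code fence inside code_repairing_0, and that the Evaluator expects the repaired code under source_code. The function names (extract_code, convert_record, convert_file) are hypothetical and not part of CodeScope.

```python
import json
import re

# A Markdown code fence with an optional language tag on the opening line.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(model_output: str) -> str:
    """Return the body of the first fenced block, or the raw text if there is none."""
    match = FENCE_RE.search(model_output)
    return match.group(1).strip() if match else model_output.strip()


def convert_record(record: dict, output_key: str = "code_repairing_0") -> dict:
    """Copy a record and overwrite source_code with the extracted repair.

    Assumption: the Evaluator reads the repaired code from source_code.
    All other fields (lang_cluster, lang, ...) are kept unchanged.
    """
    converted = dict(record)
    converted["source_code"] = extract_code(record[output_key])
    return converted


def convert_file(src_path: str, dst_path: str) -> None:
    """Convert an inference result jsonl line by line, skipping blank lines."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():
                record = convert_record(json.loads(line))
                dst.write(json.dumps(record) + "\n")
```

If this is roughly the intended conversion, I would call convert_file on each code_repair_eval_{model_name}.jsonl and pass the resulting file to the Evaluator. If the Evaluator expects a different field layout, please let me know.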
Thanks for your help!