forked from inclusionAI/SWE-CARE
- Run LLM inference: run inference through various LLM APIs to generate code review predictions, then evaluate them.
# Set up environment variables
export OPENAI_API_KEY="your-openai-api-key"
export LLM_EVALUATOR_OPENAI_API_KEY="your-o3-evaluation-api-key"
# Run the complete pipeline (uses default Hugging Face dataset)
python scripts/run_eval_pipeline.py \
--output-dir results/pipeline_output \
--model gpt-4o \
--model-provider openai \
--file-source oracle
# Run with a local dataset file instead
python scripts/run_eval_pipeline.py \
--dataset-name-or-path results/dataset/code_review_task_instances.jsonl \
--output-dir results/pipeline_output \
--model gpt-4o \
--model-provider openai \
--file-source oracle
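The pipeline reads both `OPENAI_API_KEY` and `LLM_EVALUATOR_OPENAI_API_KEY`. The snippet below is a minimal sketch of a fail-fast check you could run before launching `scripts/run_eval_pipeline.py`. The function name `check_required_env` is hypothetical and not part of SWE-CARE. It only confirms that each named variable is set and non-empty in the current shell; the variables still need `export` for the Python scripts to see them.

```shell
# Hypothetical helper (not part of SWE-CARE): report every required
# variable that is unset or empty, and return non-zero if any is missing.
check_required_env() {
  missing=0
  for name in "$@"; do
    # Reject anything that is not a valid shell variable name before eval.
    case "$name" in
      ''|[0-9]*|*[!A-Za-z0-9_]*)
        echo "invalid variable name: $name" >&2
        missing=1
        continue
        ;;
    esac
    # Read the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_required_env OPENAI_API_KEY LLM_EVALUATOR_OPENAI_API_KEY && python scripts/run_eval_pipeline.py ...` runs the pipeline only when both keys are present.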
- Generate the evaluation report
# Generate evaluation report from pipeline results
python scripts/eval_report.py \
--dataset-name-or-path results/dataset/code_review_task_instances.jsonl \
--eval-output-dir results/pipeline_output/evaluation \
--report-output-file results/evaluation_report.json
# Or use default Hugging Face dataset
python scripts/eval_report.py \
--eval-output-dir results/pipeline_output/evaluation \
--report-output-file results/evaluation_report.json
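The two report commands differ only in whether `--dataset-name-or-path` is passed; leaving it out falls back to the default Hugging Face dataset. As a sketch, a small wrapper can add that flag only when a local file is given. The function name `run_eval_report` and the `DRY_RUN` variable are hypothetical additions for illustration: with `DRY_RUN` set to a non-empty value, the command is printed instead of executed.

```shell
# Hypothetical wrapper (not part of SWE-CARE) around scripts/eval_report.py.
# Arguments: <eval-output-dir> <report-output-file> [local-dataset-file]
run_eval_report() {
  eval_dir=$1
  report_file=$2
  dataset=${3:-}
  # Rebuild the positional parameters as the flag list for eval_report.py.
  set -- --eval-output-dir "$eval_dir" --report-output-file "$report_file"
  # Only pass a dataset when a local file is given; otherwise the script
  # uses its default Hugging Face dataset.
  if [ -n "$dataset" ]; then
    set -- --dataset-name-or-path "$dataset" "$@"
  fi
  # When DRY_RUN is non-empty this expands to "echo", printing the command.
  ${DRY_RUN:+echo} python scripts/eval_report.py "$@"
}
```

Building the flags with `set --` keeps each path as a single argument (even with spaces) without relying on bash-only arrays.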