
[Codefuse Open-Source Light Training Camp] Use an LLM to generate code review results, then run the evaluation script to assess them and produce an evaluation report #4

@Henrykwokkk

Description

  1. Run LLM inference: inference can be run against various LLM APIs to generate code review predictions, which are then evaluated.
# Set up environment variables
export OPENAI_API_KEY="your-openai-api-key"
export LLM_EVALUATOR_OPENAI_API_KEY="your-o3-evaluation-api-key"
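
Before launching the pipeline, it can help to confirm that both keys are actually exported. The loop below is a small bash convenience for that check, not part of the repository's scripts.

# Sanity check (bash-only, uses indirect expansion): warn if either key is missing
for var in OPENAI_API_KEY LLM_EVALUATOR_OPENAI_API_KEY; do
    [ -n "${!var}" ] || echo "warning: $var is not set"
done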

# Run the complete pipeline (uses default Hugging Face dataset)
python scripts/run_eval_pipeline.py \
    --output-dir results/pipeline_output \
    --model gpt-4o \
    --model-provider openai \
    --file-source oracle

# Run with local dataset file
python scripts/run_eval_pipeline.py \
    --dataset-name-or-path results/dataset/code_review_task_instances.jsonl \
    --output-dir results/pipeline_output \
    --model gpt-4o \
    --model-provider openai \
    --file-source oracle
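
Once the pipeline finishes, its artifacts are written under the directory given by --output-dir (results/pipeline_output above), including the evaluation subdirectory consumed in the next step. The exact file layout is not documented in this issue, so list it rather than assume specific filenames:

# Peek at the pipeline output tree (layout is an assumption; inspect, don't hard-code)
ls -R results/pipeline_output | head -n 20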
  2. Generate the evaluation report
# Generate evaluation report from pipeline results
python scripts/eval_report.py \
    --dataset-name-or-path results/dataset/code_review_task_instances.jsonl \
    --eval-output-dir results/pipeline_output/evaluation \
    --report-output-file results/evaluation_report.json

# Or use default Hugging Face dataset
python scripts/eval_report.py \
    --eval-output-dir results/pipeline_output/evaluation \
    --report-output-file results/evaluation_report.json
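
The report is plain JSON, so it can be pretty-printed with the Python standard library; the key names inside are not documented in this issue, so inspect the output rather than hard-coding them.

# Pretty-print the aggregated report (json.tool ships with Python)
python -m json.tool results/evaluation_report.json | head -n 40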
