This is the official code of SCAN.
```shell
conda create -n SCAN python=3.12
conda activate SCAN
pip install -r requirements.txt
```

You can choose to use the taxonomy we provide. See: `visualization_and_analysis/cata_tree.json`.
(Optional) You can also build your own custom taxonomy, since SCAN is highly extensible. Guidelines for building your customized taxonomy tree can be found in: Build Customized Taxonomy Tree.
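The exact schema of `cata_tree.json` is defined by the repository; as a purely hypothetical illustration, a taxonomy tree is commonly represented as nested categories, which you can traverse to enumerate leaf categories:

```python
# Hypothetical illustration only: the real schema of cata_tree.json is defined
# by this repository. Category names below are made up for the example.
taxonomy = {
    "name": "root",
    "children": [
        {"name": "math", "children": [{"name": "algebra", "children": []}]},
        {"name": "coding", "children": []},
    ],
}

def leaf_categories(node):
    """Collect leaf category names by depth-first traversal."""
    if not node["children"]:
        return [node["name"]]
    leaves = []
    for child in node["children"]:
        leaves += leaf_categories(child)
    return leaves

print(leaf_categories(taxonomy))  # ['algebra', 'coding']
```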
You can choose to use the evaluation dataset we provide. See: `evaluation/outputs/evaluation_dataset.jsonl`.
If you want to use the criteria and baseline model we provide, you can directly use the criteria we generated.
(Optional) You can also create your own custom evaluation dataset using our RealMix. Guidelines for building your customized evaluation dataset can be found in: Generate New Queries.
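If you are working with the provided dataset, note that JSONL files hold one JSON record per line. A minimal sketch of a loader (the field name `query` below is a placeholder; the actual fields depend on the dataset file):

```python
import io
import json

def load_jsonl(f):
    """Parse a JSONL stream: one JSON record per non-empty line."""
    return [json.loads(line) for line in f if line.strip()]

# Demo with an in-memory stream; real usage would open the dataset file, e.g.
# open("evaluation/outputs/evaluation_dataset.jsonl", encoding="utf-8").
sample = io.StringIO('{"query": "q1"}\n{"query": "q2"}\n')
records = load_jsonl(sample)
print(len(records))  # 2
```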
If you simply want to test our Visualization and Analysis Toolkit, you can directly use the generation and evaluation results we provide in `evaluation/outputs`.
When you reach this step, we first recommend preparing several models:
- Model to be evaluated: The model you want to evaluate.
- (Optional) Model for pre-comparison: Our evaluation method requires several models to generate their responses to assist in extracting more effective evaluation criteria. This can be any model. We adopt gpt-4o, deepseek-v3, and doubao-1-5-pro in our paper. (If you're using the criteria we generated, you do not need to prepare this model.)
- (Optional) Baseline model: The model that serves as the baseline in the evaluation. Our evaluation results are relative to its performance. We adopt gpt-4o in our paper. (If you're using the baseline model we use, you do not need to prepare this model.)
- Evaluation model: This model is used to generate criteria and evaluate other models. We recommend using more advanced models, especially reasoning models. We adopt DeepSeek-R1 in our paper.
Note that your models need to be served in an OpenAI-compatible format. Evaluation requires three things for each model: the model name, the base URL, and an API key.
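The three required items can be sketched as a minimal config check. All names and values below are placeholders; they map directly onto the arguments of an OpenAI-compatible client (e.g. `OpenAI(base_url=..., api_key=...)` in the official `openai` Python package):

```python
# Placeholder config for one model; replace each value with your own service.
model_config = {
    "model": "your-model-name",
    "base_url": "https://your-provider.example.com/v1",
    "api_key": "YOUR_API_KEY",
}

def is_complete(cfg):
    """Return True if all three required fields are present and non-empty."""
    return all(cfg.get(key) for key in ("model", "base_url", "api_key"))

print(is_complete(model_config))  # True
```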
After you have prepared these services, you can follow the guidance in Evaluate Models to perform the evaluation.
- Place the evaluation results obtained from the previous step into the `visualization_and_analysis/evaluation_source_data` directory.
- Enter the directory:

  ```shell
  cd ./visualization_and_analysis
  ```

- Run the following command to process the data obtained above:

  ```shell
  python source_result_processing.py
  ```

- Run the following command to get the analysis results:

  ```shell
  python auto_analysing.py
  ```

- Then, you can run the visualization and analysis tools locally:

  ```shell
  python -m http.server 8103
  ```

For more details, refer to: Visualization and Analysis.