
SalesforceAIResearch/FoFo

This repo is for research only.

Setup Environment

conda create --prefix ./envs/ python=3.10
conda init
conda activate ./envs
cd scripts
pip install -r requirements.txt
cd ..
cd alpaca_eval
pip install -e ".[all]"
pip install openai==0.27.0

Setup OpenAI Account

export OPENAI_API_KEY=<your_api_key>
export OPENAI_ORGANIZATION_IDS=<your_organization_id> # Optional; if not set, your account's default organization is used.
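
Before running the evaluation, you can confirm that these credentials are picked up. The snippet below is a minimal sanity-check sketch (not part of the repo); it assumes the legacy openai==0.27.0 client installed above, and gpt-3.5-turbo is only an example model.

```python
# sanity_check_openai.py -- quick check that the OpenAI credentials are usable.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]
org_id = os.environ.get("OPENAI_ORGANIZATION_IDS")
if org_id:
    openai.organization = org_id

# One cheap request to confirm authentication works; "gpt-3.5-turbo" is just an example model.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=5,
)
print(response["choices"][0]["message"]["content"])
```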

Model Evaluation

data_path='data/fofo_test_prompts.json'
output_path='results'

1. Get model outputs

CUDA_VISIBLE_DEVICES='0' python scripts/inference_anymodel_anydata.py --input_file_path $data_path --output_file_path $output_path/wizardlm-13b-v1.2/model_outputs.json --model_name_or_path WizardLM/WizardLM-13B-V1.2 --prompt_style wizardlm --max_seq_length 5120
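
To spot-check the generated file before moving on to evaluation, load it and print a sample. This is only an illustrative sketch; the field names (instruction, output) follow the alpaca_eval convention and are assumptions about what inference_anymodel_anydata.py writes, so adjust them to the actual keys in your file.

```python
import json

# Load the generated outputs and inspect one example.
with open("results/wizardlm-13b-v1.2/model_outputs.json") as f:
    outputs = json.load(f)

print(f"{len(outputs)} examples")
sample = outputs[0]
# Field names here are assumptions; check sample.keys() if they differ.
print(sample.get("instruction", "")[:200])
print(sample.get("output", "")[:200])
```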

2. Evaluate models' performance based on the outputs

alpaca_eval --annotators_config gpt4_format_correctness --model_outputs $output_path/wizardlm-13b-v1.2/model_outputs.json --output_path $output_path/wizardlm-13b-v1.2/
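
If you want a quick aggregate number after the run, a short script like the sketch below can average the per-example judgments. The file name annotations.json and the preference field are assumptions about what alpaca_eval writes under --output_path; check the files actually produced in your results directory and adapt accordingly.

```python
import json

# Aggregate per-example judgments into a single format-correctness score.
# NOTE: "annotations.json" and the "preference" key are assumptions; adjust
# them to whatever alpaca_eval actually writes under --output_path.
with open("results/wizardlm-13b-v1.2/annotations.json") as f:
    annotations = json.load(f)

scores = [a["preference"] for a in annotations if a.get("preference") is not None]
print(f"judged examples: {len(scores)}")
print(f"mean score: {sum(scores) / len(scores):.3f}")
```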

3. Domain/Format analysis

If you want to draw radar figures of your model's performance and conduct domain/format analysis, please refer to

scripts/draw_analysis.py
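
For reference, the snippet below sketches the kind of radar chart such an analysis produces. It is not scripts/draw_analysis.py itself, and the domain names and scores are placeholder values.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder per-domain scores; replace with your model's numbers.
domains = ["Finance", "Law", "Healthcare", "Education", "Technology"]
scores = [0.72, 0.65, 0.80, 0.58, 0.77]

# Close the polygon by repeating the first point.
angles = np.linspace(0, 2 * np.pi, len(domains), endpoint=False).tolist()
angles += angles[:1]
values = scores + scores[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles, values, linewidth=2)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(domains)
ax.set_ylim(0, 1)
ax.set_title("Per-domain format correctness (placeholder values)")
plt.savefig("radar.png", dpi=200, bbox_inches="tight")
```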

Citation

Please consider using the following citation when using our code:

@article{xia2024fofo,
  title={FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability},
  author={Xia, Congying and Xing, Chen and Du, Jiangshu and Yang, Xinyi and Feng, Yihao and Xu, Ran and Yin, Wenpeng and Xiong, Caiming},
  journal={arXiv preprint arXiv:2402.18667},
  year={2024}
}