Code Evaluator

A multi-language code evaluation tool.

Why create this repo?

  1. Environment integration: Evaluating code requires a variety of pre-installed environments, such as a JDK for Java, Node.js for JavaScript, and multiple versions of numpy and torch for DS1000. This project pre-installs all of these environments in a Docker image.

  2. Easy to evaluate: With this repo, the service can be started in a few simple steps, and results can then be submitted to it directly. There is no need to enter the Docker container and run scripts by hand, which can be quite cumbersome.

📖 Supported Datasets & Languages

HumanEval-X

HumanEval-X is a benchmark for evaluating the multilingual ability of code generative models. It consists of 820 high-quality human-crafted data samples (each with test cases) in Python, C++, Java, JavaScript, and Go, and can be used for various tasks, such as code generation and translation.

Paper | GitHub repo | Hugging Face

🛠️ Evaluation Environments

Evaluating the generated code requires compiling and executing it in multiple languages. The versions of the programming languages we depend on, along with the packages used, are as follows:

Dependency   Version
----------   --------
Python       3.8.12
JDK          18.0.2.1
Node.js      16.14.0
js-md5       0.7.3
C++          11
g++          7.5.0
Boost        1.71.0
OpenSSL      3.0.0
Go           1.18.4
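
To double-check that a running container matches these versions, a quick script like the sketch below can help. It shells out to each toolchain with standard version flags; the exact output format varies by tool, so this is a convenience check rather than part of the evaluation pipeline.

# A minimal sketch: print the first line of each toolchain's version
# output for comparison against the table above. Run it inside the
# evaluation container.
import subprocess

CHECKS = [
    ["python", "--version"],
    ["java", "-version"],
    ["node", "--version"],
    ["g++", "--version"],
    ["go", "version"],
    ["openssl", "version"],
]

for cmd in CHECKS:
    result = subprocess.run(cmd, capture_output=True, text=True)
    # Some tools (notably java) print their version banner to stderr.
    banner = (result.stdout or result.stderr).strip().splitlines()[0]
    print(f"{cmd[0]}: {banner}")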

👨‍🏫 How to use

1. Launch a service

Make sure you have installed Docker, then build an image and start a container running the service.

Build the Docker image:

Choose your dataset: humanevalx or ds1000

git clone https://github.com/open-compass/code-evaluator.git
sudo docker build -t code-eval-{your-dataset}:latest -f docker/{your-dataset}/Dockerfile .

After getting the image, use the following command to create the container:

# Run in the foreground with logs printed to the terminal
sudo docker run -it -p 5000:5000 code-eval-{your-dataset}:latest python server.py

# Run in the background
# sudo docker run -itd -p 5000:5000 code-eval-{your-dataset}:latest python server.py

# Use a different port
# sudo docker run -itd -p 5001:5001 code-eval-{your-dataset}:latest python server.py --port 5001

Make sure you can reach the service with the following commands (if the service runs on localhost, skip this):

ping your_service_ip_address
telnet your_service_ip_address your_service_port

2. Prepare the result files to submit

humanevalx

We give sample formats for different datasets in the examples folder.

Let's take humanevalx as an example, which submits results in the following format:

{"task_id": "../..", "generation: "..."}
{"task_id": "../..", "generation: "..."}
...
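
As a sketch, a submission file in this format can be produced with a few lines of Python. The predictions and the "Python/0"-style task_id values below are hypothetical placeholders; real task_id values must match the dataset's own.

# A minimal sketch that writes a submission file in the JSON-lines
# format shown above. The two predictions are hypothetical placeholders.
import json

predictions = [
    {"task_id": "Python/0", "generation": "    return a + b\n"},
    {"task_id": "Python/1", "generation": "    return sorted(lst)\n"},
]

with open("python.json", "w", encoding="utf-8") as f:
    for pred in predictions:
        f.write(json.dumps(pred) + "\n")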

ds1000

Skip this step and use the predictions produced by OpenCompass directly.

3. Submit service request

Use curl to submit your request:

curl -X POST -F 'file=@{result_absolute_path}' -F 'dataset={dataset/language}' {your_service_ip_address}:{your_service_port}/evaluate

For example, to evaluate 'humanevalx/python' on 'localhost:5000':

curl -X POST -F 'file=@./examples/humanevalx/python.json' -F 'dataset=humanevalx/python' localhost:5000/evaluate

You will get the following result:

"{\"pass@1\": 37.19512195121951}"% 

For example, to evaluate 'ds1000_Numpy' on 'localhost:5000':

curl -X POST -F 'file=@./internlm-chat-7b-hf-v11/ds1000_Numpy.json' localhost:5000/evaluate

You will get the following result:

"{\"accuracy\": xx}"%

🤝 Acknowledgements

Some code in this project is adapted from CodeGeeX2. Thanks to the THUDM team.
