This project evaluates OpenAI's GPT-3.5 model on a sample from the HumanEval dataset to assess its code-generation capabilities. The implementation is designed so that new models and datasets can be integrated easily, and parameters such as the sample size and the pass@k evaluation metric are configurable.
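For reference, pass@k is usually computed with the unbiased estimator from the HumanEval paper (Chen et al., 2021): generate `n` samples per task, count the `c` that pass the unit tests, and estimate the probability that at least one of `k` drawn samples passes. A minimal sketch (this repository's actual implementation may differ):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total samples generated per task
    c: number of those samples that passed the tests
    k: the k in pass@k
    """
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with `--num_samples_per_task 2` and one passing sample, `pass_at_k(2, 1, 1)` gives 0.5.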
- Clone the repo and install the required packages:

  ```shell
  git clone https://github.com/yourusername/yourrepositoryname.git
  pip install -r requirements.txt
  ```
- Create a `.env` file in the root directory and add your OpenAI API key:

  ```shell
  OPENAI_API_KEY='your-openai-api-key'
  ```
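The `.env` file is typically read at startup with a package such as `python-dotenv`; for illustration, a stdlib-only loader that does the same job might look like this (a hypothetical sketch, not necessarily how this repository loads the key):

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: export KEY=value lines into os.environ.

    python-dotenv's load_dotenv() is the usual choice; this sketch
    only illustrates what that call does.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and malformed lines.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Don't overwrite variables already set in the environment.
            os.environ.setdefault(key.strip(), value.strip().strip("'\""))
```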
- Run the evaluation script for GPT-3.5 with a specific sample size and pass@k evaluation metric:

  ```shell
  python evaluation_pipeline.py --model gpt-3.5-turbo --dataset humaneval --k 1 --num_eval_problems 3 --num_samples_per_task 2
  ```
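The command line above suggests an `argparse`-style interface roughly like the following (a hypothetical reconstruction; the flags and defaults in the actual `evaluation_pipeline.py` may differ):

```python
import argparse

def parse_args(argv=None):
    # Hypothetical sketch of the CLI shown in the usage example above.
    p = argparse.ArgumentParser(description="Run a pass@k code-generation evaluation.")
    p.add_argument("--model", default="gpt-3.5-turbo",
                   help="Model to evaluate.")
    p.add_argument("--dataset", default="humaneval",
                   help="Benchmark dataset to draw problems from.")
    p.add_argument("--k", type=int, default=1,
                   help="k for the pass@k metric.")
    p.add_argument("--num_eval_problems", type=int, default=3,
                   help="Number of problems sampled from the dataset.")
    p.add_argument("--num_samples_per_task", type=int, default=2,
                   help="Completions generated per problem.")
    return p.parse_args(argv)
```

Note that `--num_samples_per_task` must be at least `--k`, since pass@k draws `k` samples from those generated per task.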