This project evaluates OpenAI's GPT-3.5 model on a sample from the HumanEval dataset to assess its code-generation capabilities. The implementation is designed so that new models and datasets can be integrated easily, and parameters such as the sample size and the pass@k evaluation metric are configurable.
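For reference, pass@k is usually computed with the unbiased estimator from the HumanEval paper (Chen et al., 2021): generate `n` samples per task, count the `c` that pass the unit tests, and estimate the probability that at least one of `k` drawn samples passes. A minimal sketch (this repository's actual implementation may differ):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total samples generated per task
    c: number of those samples that passed the tests
    k: the k in pass@k
    """
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with `--num_samples_per_task 2` and one passing sample, `pass_at_k(2, 1, 1)` gives 0.5.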
- Clone the repo and install the required packages:

  ```shell
  git clone https://github.com/yourusername/yourrepositoryname.git
  pip install -r requirements.txt
  ```
- Create a `.env` file in the root directory and add your OpenAI API key:

  ```shell
  OPENAI_API_KEY='your-openai-api-key'
  ```
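The `.env` file is typically read at startup with a package such as `python-dotenv`; for illustration, a stdlib-only loader that does the same job might look like this (a hypothetical sketch, not necessarily how this repository loads the key):

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: export KEY=value lines into os.environ.

    python-dotenv's load_dotenv() is the usual choice; this sketch
    only illustrates what that call does.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and malformed lines.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Don't overwrite variables already set in the environment.
            os.environ.setdefault(key.strip(), value.strip().strip("'\""))
```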
- Run the evaluation script for GPT-3.5 with a specific sample size and pass@k evaluation metric:

  ```shell
  python evaluation_pipeline.py --model gpt-3.5-turbo --dataset humaneval --k 1 --num_eval_problems 3 --num_samples_per_task 2
  ```
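The command line above suggests an `argparse`-style interface roughly like the following (a hypothetical reconstruction; the flags and defaults in the actual `evaluation_pipeline.py` may differ):

```python
import argparse

def parse_args(argv=None):
    # Hypothetical sketch of the CLI shown in the usage example above.
    p = argparse.ArgumentParser(description="Run a pass@k code-generation evaluation.")
    p.add_argument("--model", default="gpt-3.5-turbo",
                   help="Model to evaluate.")
    p.add_argument("--dataset", default="humaneval",
                   help="Benchmark dataset to draw problems from.")
    p.add_argument("--k", type=int, default=1,
                   help="k for the pass@k metric.")
    p.add_argument("--num_eval_problems", type=int, default=3,
                   help="Number of problems sampled from the dataset.")
    p.add_argument("--num_samples_per_task", type=int, default=2,
                   help="Completions generated per problem.")
    return p.parse_args(argv)
```

Note that `--num_samples_per_task` must be at least `--k`, since pass@k draws `k` samples from those generated per task.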