JetBrains-Research/agents-eval

# Agents and Planning Models Evaluation 🤖⛓

Toolkit for collecting datasets for Agents and Planning models and running evaluation pipelines.

## Setup

```shell
pip install -r requirements.txt
```

## Evaluation Pipeline Configuration

We use the Hydra library to configure the evaluation pipeline. Each configuration is specified in an `eval.yaml` file:

```yaml
# @package _global_
hydra:
  job:
    name: ${agent.name}_${agent.model_name}_[YOUR_ADDITIONAL_TOKEN_OR_NOTHING]
  run:
    dir: [YOUR_PATH_TO_OUTPUT_DIR]/${hydra:job.name}
  job_logging:
    root:
      handlers: [console, file]
defaults:
  - _self_
  - data_source: hf
  - env: code_engine
  - agent: planning
```
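As a rough illustration (not Hydra's actual resolver), the `${...}` placeholders in `hydra.job.name` interpolate values from the composed config. A minimal sketch of that resolution, assuming `agent.name` and `agent.model_name` are set by the selected sub-configs:

```python
# Sketch of ${dotted.path} interpolation against a nested config
# (illustrative only; Hydra/OmegaConf perform the real resolution).
import re

def resolve(template: str, cfg: dict) -> str:
    """Resolve ${dotted.path} placeholders against a nested config dict."""
    def lookup(match):
        node = cfg
        for key in match.group(1).split("."):
            node = node[key]
        return str(node)
    return re.sub(r"\$\{([^}]+)\}", lookup, template)

cfg = {"agent": {"name": "planning", "model_name": "gpt-4-1106-preview"}}
print(resolve("${agent.name}_${agent.model_name}", cfg))
# planning_gpt-4-1106-preview
```

This is why each run directory under `hydra.run.dir` ends up named after the agent and model being evaluated.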

Here you can define the `data_source`, `env`, and `agent` you want to evaluate. We provide several implementations for each, defined in sub-YAML files:

| field       | options |
|-------------|---------|
| data_source | hf.yaml |
| env         | code_engine.yaml, http.yaml, few_shot.yaml |
| agent       | few_shot.yaml, planning.yaml, vanilla.yaml, reflexion.yaml, tree_of_thoughts.yaml, adapt.yaml |
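For example, a sub-config selected via the `defaults` list, such as `agent/planning.yaml`, could look like the following. This is a hypothetical sketch: the `name` and `model_name` keys are implied by the job-name interpolation above, while the remaining field is an assumption, not the repository's actual file.

```yaml
# Hypothetical contents of agent/planning.yaml (illustrative only)
name: planning
model_name: gpt-4-1106-preview
temperature: 0.0  # assumed field, shown for illustration
```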

## Project Template Generation Evaluation

The challenge is to generate a project template: a small compilable project that can be described in 1–5 sentences and contains small examples of all mentioned libraries/technologies/functionality.

## Dataset

A dataset of template-related repositories collected from GitHub is published on HuggingFace 🤗. Details about the dataset collection, along with the source code, are in the `template_generation` directory.

## Agent Models

To run the evaluation pipeline, execute the following command in your console:

```shell
python3 -m src.template_generation.run_eval --multirun agent=planning agent.model_name=gpt-3.5-turbo-1106,gpt-4-1106-preview
```
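As a rough sketch (not Hydra's internals), `--multirun` with comma-separated override values launches one job per combination of values, so the command above runs the planning agent once per model:

```python
# Sketch of how --multirun expands comma-separated overrides into jobs
# (illustrative only; Hydra's real sweeper handles this).
from itertools import product

sweeps = {
    "agent": ["planning"],
    "agent.model_name": ["gpt-3.5-turbo-1106", "gpt-4-1106-preview"],
}
jobs = [dict(zip(sweeps, combo)) for combo in product(*sweeps.values())]
for job in jobs:
    print(job)  # one evaluation run per override combination
```

Each expanded job then gets its own output directory via the `hydra.job.name` template in `eval.yaml`.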
## Model Metrics

⚠️ Coming soon ⚠️
