Feature/benchmark gpt pilot #5184

nalbion · 2023-09-09T12:06:46Z

Background

GPT-Pilot benchmark config

Changes 🏗️

Because the Auto-GPT-Benchmarks repo is deprecated, I had to make some additional code changes here to update the paths.

PR Quality Scorecard ✨

Have you used the PR description template? +2 pts
Is your pull request atomic, focusing on a single change? +5 pts
Have you linked the GitHub issue(s) that this PR addresses? +5 pts
Have you documented your changes clearly and comprehensively? +5 pts
Have you changed or added a feature? -4 pts
- Have you added/updated corresponding documentation? +4 pts
- Have you added/updated corresponding integration tests? +5 pts
Have you changed the behavior of Auto-GPT? -5 pts
- Have you also run agbenchmark to verify that these changes do not regress performance? +10 pts

netlify · 2023-09-09T12:07:25Z

✅ Deploy Preview for auto-gpt-docs ready!

Name	Link
🔨 Latest commit	`17da97c`
🔍 Latest deploy log	https://app.netlify.com/sites/auto-gpt-docs/deploys/6508fb92dfae2f0008f8114e
😎 Deploy Preview	https://deploy-preview-5184--auto-gpt-docs.netlify.app


nalbion · 2023-09-09T12:11:54Z

This is still a work-in-progress and I may need help.

Locally I'm working on the config files here - https://github.com/Significant-Gravitas/Auto-GPT/tree/bd42e25916f3c4bba9b86de227428482482b507b/benchmark/agent/GPT-Pilot/agbenchmark

...But it seems that they should ultimately be in the gpt-pilot repo:
https://github.com/nalbion/gpt-pilot/tree/feature/agbenchmark/agbenchmark

Currently the test is failing:

benchmark run path /mnt/c/Users/nalbi/src/com/github/Significant-Gravitas/Auto-GPT/benchmark/agent/GPT-Pilot/agbenchmark/config.json /mnt/c/Users/nalbi/src/com/github/Significant-Gravitas/Auto-GPT/benchmark/agent/GPT-Pilot
Current configuration:
workspace: workspace/agbenchmark
entry_path: agbenchmark.benchmarks
Running specific test: TestWriteFile
Generating tests...
Generated test for TestWriteFile.
['--mock', '--api_mode', '--host', '--category', '--nc', '--cutoff', '--improve', '--maintain', '--explore', '--test', '--no_dep', '--suite', '-k', '-m', '--markers', '-x', '--exitfirst', '--fixtures', '--funcargs', '--fixtures-per-test', '--pdb', '--pdbcls', '--trace', '--capture', '-s', '--runxfail', '--lf', '--last-failed', '--ff', '--failed-first', '--nf', '--new-first', '--cache-show', '--cache-clear', '--lfnf', '--last-failed-no-failures', '--sw', '--stepwise', '--sw-skip', '--stepwise-skip', '--durations', '--durations-min', '-v', '--verbose', '--no-header', '--no-summary', '-q', '--quiet', '--verbosity', '-r', '--disable-warnings', '--disable-pytest-warnings', '-l', '--showlocals', '--no-showlocals', '--tb', '--show-capture', '--fulltrace', '--full-trace', '--color', '--code-highlight', '--pastebin', '--junitxml', '--junit-xml', '--junitprefix', '--junit-prefix', '-W', '--pythonwarnings', '--maxfail', '--strict-config', '--strict-markers', '--strict', '-c', '--config-file', '--continue-on-collection-errors', '--rootdir', '--collectonly', '--collect-only', '--co', '--pyargs', '--ignore', '--ignore-glob', '--deselect', '--confcutdir', '--noconftest', '--keepduplicates', '--keep-duplicates', '--collect-in-virtualenv', '--import-mode', '--doctest-modules', '--doctest-report', '--doctest-glob', '--doctest-ignore-import-errors', '--doctest-continue-on-failure', '--basetemp', '-V', '--version', '-h', '--help', '-p', '--traceconfig', '--trace-config', '--debug', '-o', '--override-ini', '--assert', '--setuponly', '--setup-only', '--setupshow', '--setup-show', '--setupplan', '--setup-plan', '--log-level', '--log-format', '--log-date-format', '--log-cli-level', '--log-cli-format', '--log-cli-date-format', '--log-file', '--log-file-level', '--log-file-format', '--log-file-date-format', '--log-auto-indent', '--log-disable', '--asyncio-mode']
============================= test session starts ==============================
platform linux -- Python 3.10.10, pytest-7.4.1, pluggy-1.3.0
rootdir: /mnt/c/Users/nalbi/src/com/github/Significant-Gravitas/Auto-GPT/benchmark
configfile: pyproject.toml
plugins: anyio-3.7.1, asyncio-0.21.1
asyncio: mode=auto
Warning: When  cdn_resources is 'local' jupyter notebook has issues displaying graphics on chrome/safari. Use cdn_resources='in_line' or cdn_resources='remote' if you have issues viewing graphics in a notebook.
collected 1 item

agbenchmark/generate_test.py Config file: /mnt/c/Users/nalbi/src/com/github/Significant-Gravitas/Auto-GPT/benchmark/agent/GPT-Pilot/agbenchmark/config.json
============Starting TestWriteFile challenge============
Task: Write the word 'Washington' to a .txt file
Running 'agbenchmark.benchmarks' with timeout 60
/mnt/c/Users/nalbi/.cache/pypoetry/virtualenvs/agbenchmark-_mQxNyZt-py3.10/bin/python: No module named agbenchmark.benchmarks

The Python function has finished running.
The agent timed out
FTerminating agent


=================================== FAILURES ===================================
__________________ TestWriteFile.test_method[challenge_data0] __________________
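The run fails because the configured `entry_path` (`agbenchmark.benchmarks`) cannot be imported by the interpreter the benchmark launches, and the agent then sits until the 60-second timeout. One plausible cause, not confirmed in this thread, is that `agbenchmark` resolves to the benchmark's own installed package rather than the agent's `agbenchmark` folder, so the submodule lookup fails. A minimal pre-flight check could surface this before the timeout is spent. It assumes the config is a JSON object with the `workspace` and `entry_path` keys printed in the log; `check_config` and `check_config_file` are hypothetical helpers, not part of agbenchmark.

```python
import importlib.util
import json
from pathlib import Path


def check_config(config: dict) -> list[str]:
    """Hypothetical pre-flight check for an agent's benchmark config.

    Assumes the `workspace` and `entry_path` keys shown in the log above;
    this is an illustration, not agbenchmark's own validation.
    """
    problems = []
    for key in ("workspace", "entry_path"):
        if not config.get(key):
            problems.append(f"missing '{key}'")

    entry_path = config.get("entry_path")
    if entry_path:
        try:
            # find_spec imports the parent package (e.g. `agbenchmark`) to
            # search its __path__, but does not import the submodule itself.
            spec = importlib.util.find_spec(entry_path)
        except ImportError:
            # Raised when a parent package is missing or is not a package.
            spec = None
        if spec is None:
            problems.append(f"No module named {entry_path}")
    return problems


def check_config_file(path: str) -> list[str]:
    """Load a config.json from disk and run check_config on it."""
    return check_config(json.loads(Path(path).read_text()))
```

The result depends on `sys.path`, so the check is only meaningful when run with the same interpreter and working directory the benchmark uses (here, the agbenchmark Poetry virtualenv from the log).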

nalbion · 2023-09-09T15:46:39Z

I'm making progress on this, though the documentation on adding new agents could be clearer.

ntindle · 2023-09-11T02:40:00Z

@merwanehamadi anything we can do to help out here?

nalbion · 2023-09-11T04:42:09Z

@merwanehamadi anything we can do to help out here?

GPT-Pilot is very new (only 25 days old). I think it needs some architectural changes before the benchmarks can be run against it.

Pythagora-io/gpt-pilot#73

Swiftyos · 2023-09-11T15:45:44Z

Hi @nalbion, thank you for doing this. The team is currently working on fixing the benchmarking system to work smoothly with the monorepo; see #5194.

Also, I noticed you have your agent in benchmark/agent; we are going to keep the agents in autogpts/{agent_name}.
However, getting your agent into the main repo requires winning the "best agent" competition: https://lablab.ai/event/autogpt-arena-hacks
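For the config that lives under `benchmark/agent/GPT-Pilot/` in this PR, moving to the `autogpts/{agent_name}` layout mostly means re-rooting paths. This is a minimal sketch, assuming everything below the agent folder (such as `agbenchmark/config.json`) keeps the same relative layout in the new location; `migrate_agent_path` is a hypothetical helper, and the maintainers' final layout may differ in detail.

```python
from pathlib import PurePosixPath

# Old layout used by this PR and the new top-level layout described above.
OLD_AGENTS_DIR = PurePosixPath("benchmark/agent")
NEW_AGENTS_DIR = PurePosixPath("autogpts")


def migrate_agent_path(old_path: str) -> str:
    """Re-root a repo-relative path from benchmark/agent/<name>/... to autogpts/<name>/...

    Assumes files under the agent folder keep the same relative layout.
    """
    path = PurePosixPath(old_path)
    try:
        relative = path.relative_to(OLD_AGENTS_DIR)
    except ValueError:
        raise ValueError(f"{old_path!r} is not under {OLD_AGENTS_DIR}") from None
    if not relative.parts:
        raise ValueError("path must point inside an agent folder")
    return str(NEW_AGENTS_DIR / relative)


print(migrate_agent_path("benchmark/agent/GPT-Pilot/agbenchmark/config.json"))
# prints: autogpts/GPT-Pilot/agbenchmark/config.json
```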

github-actions · 2023-09-12T00:45:22Z

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

nalbion · 2023-09-12T13:14:45Z

@Swiftyos, the rules for the competition seem contradictory:

❌ The creation of new agents should strictly be facilitated through the AutoGPT repo

✅ Participants may begin with an existing agent or start fresh.

✅ Please note that you are free to utilize any AI technology,
❌ ...as long as you begin with the AutoGPT repo as your foundation.

...as long as GPT-Pilot implements the Agent Protocol, it should be eligible, right?
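To make "implements the Agent Protocol" concrete: the benchmark talks to an agent over HTTP rather than by importing its code, so eligibility hinges on exposing the protocol's endpoints. The sketch below shows only the create-task round trip. The route and field names follow my reading of the Agent Protocol v1 spec (`POST /ap/v1/agent/tasks` with an `input` field, returning a task with a `task_id`); treat them as assumptions and check them against the spec version agbenchmark targets. `StubAgentHandler`, `create_task` and `start_stub_server` are hypothetical, not GPT-Pilot or agbenchmark code.

```python
import json
import threading
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

# Assumed Agent Protocol v1 route for creating a task; verify against the spec.
TASKS_ROUTE = "/ap/v1/agent/tasks"


class StubAgentHandler(BaseHTTPRequestHandler):
    """Hypothetical stub that answers only the create-task route."""

    def do_POST(self):
        if self.path != TASKS_ROUTE:
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        # A real agent would hand the task to GPT-Pilot here; the stub echoes it.
        task = {"task_id": str(uuid.uuid4()), "input": body.get("input"), "artifacts": []}
        payload = json.dumps(task).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def create_task(base_url: str, task_input: str) -> dict:
    """Client side: roughly what a benchmark harness sends to start a task."""
    request = Request(
        base_url + TASKS_ROUTE,
        data=json.dumps({"input": task_input}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read())


def start_stub_server() -> HTTPServer:
    """Serve the stub on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), StubAgentHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Usage: start the stub with `server = start_stub_server()`, call `create_task(f"http://{host}:{port}", "Write the word 'Washington' to a .txt file")` using `server.server_address`, then `server.shutdown()`. A real integration would also need the step-execution and artifact endpoints.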

github-actions · 2023-09-12T19:16:09Z

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

github-actions · 2023-09-13T10:26:27Z

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

nalbion added 3 commits September 9, 2023 21:58

agent config for GPT-Pilot

a699f89

frontend does not run without .env file

7a0a8c4

Auto-GPT-Benchmarks repo is deprecated, updated path

bd42e25

github-actions bot added the size/m label Sep 9, 2023

nalbion mentioned this pull request Sep 9, 2023

Benchmarking for GPT-Pilot #5185

Closed

1 task

run on my feature branch

e2b7a41

github-actions bot added size/l and removed size/m labels Sep 9, 2023

nalbion added 7 commits September 10, 2023 00:38

run on my feature branch

aac62c6

comment out token, use default

5762b09

Merge branch 'master' into feature/benchmark_gpt-pilot

11c2e8b

ignore "directory exists" error

ec373b2

gpt-pilot

1ebc4e1

renamed to gpt-pilot_local_test for testing CI

fab1c89

fixed "Unknown agent name" error

1a308a0

github-actions bot added the conflicts Automatically applied to PRs with merge conflicts label Sep 12, 2023

github-actions bot removed the conflicts Automatically applied to PRs with merge conflicts label Sep 12, 2023

nalbion mentioned this pull request Sep 13, 2023

Benchmarking using Auto-GPT/agbenchmark Pythagora-io/gpt-pilot#73

Closed

3 tasks

github-actions bot added the conflicts Automatically applied to PRs with merge conflicts label Sep 13, 2023

setup env

17da97c
