Feature/benchmark gpt pilot #5184

Draft: nalbion wants to merge 12 commits into master

@nalbion nalbion commented Sep 9, 2023

Background

GPT-Pilot benchmark config

Changes 🏗️

Because Auto-GPT-Benchmarks is deprecated, I had to make some additional code changes here to update the paths.

PR Quality Scorecard ✨

  • Have you used the PR description template?   +2 pts
  • Is your pull request atomic, focusing on a single change?   +5 pts
  • Have you linked the GitHub issue(s) that this PR addresses?   +5 pts
  • Have you documented your changes clearly and comprehensively?   +5 pts
  • Have you changed or added a feature?   -4 pts
    • Have you added/updated corresponding documentation?   +4 pts
    • Have you added/updated corresponding integration tests?   +5 pts
  • Have you changed the behavior of Auto-GPT?   -5 pts
    • Have you also run agbenchmark to verify that these changes do not regress performance?   +10 pts

@github-actions github-actions bot added the size/m label Sep 9, 2023
@netlify

netlify bot commented Sep 9, 2023

✅ Deploy Preview for auto-gpt-docs ready!

Name Link
🔨 Latest commit 17da97c
🔍 Latest deploy log https://app.netlify.com/sites/auto-gpt-docs/deploys/6508fb92dfae2f0008f8114e
😎 Deploy Preview https://deploy-preview-5184--auto-gpt-docs.netlify.app

@nalbion
Author

nalbion commented Sep 9, 2023

This is still a work-in-progress and I may need help.

Locally I'm working on the config files here - https://github.com/Significant-Gravitas/Auto-GPT/tree/bd42e25916f3c4bba9b86de227428482482b507b/benchmark/agent/GPT-Pilot/agbenchmark

...But it seems that they should ultimately be in the gpt-pilot repo:
https://github.com/nalbion/gpt-pilot/tree/feature/agbenchmark/agbenchmark

Currently the test is failing:

benchmark run path /mnt/c/Users/nalbi/src/com/github/Significant-Gravitas/Auto-GPT/benchmark/agent/GPT-Pilot/agbenchmark/config.json /mnt/c/Users/nalbi/src/com/github/Significant-Gravitas/Auto-GPT/benchmark/agent/GPT-Pilot
Current configuration:
workspace: workspace/agbenchmark
entry_path: agbenchmark.benchmarks
Running specific test: TestWriteFile
Generating tests...
Generated test for TestWriteFile.
['--mock', '--api_mode', '--host', '--category', '--nc', '--cutoff', '--improve', '--maintain', '--explore', '--test', '--no_dep', '--suite', '-k', '-m', '--markers', '-x', '--exitfirst', '--fixtures', '--funcargs', '--fixtures-per-test', '--pdb', '--pdbcls', '--trace', '--capture', '-s', '--runxfail', '--lf', '--last-failed', '--ff', '--failed-first', '--nf', '--new-first', '--cache-show', '--cache-clear', '--lfnf', '--last-failed-no-failures', '--sw', '--stepwise', '--sw-skip', '--stepwise-skip', '--durations', '--durations-min', '-v', '--verbose', '--no-header', '--no-summary', '-q', '--quiet', '--verbosity', '-r', '--disable-warnings', '--disable-pytest-warnings', '-l', '--showlocals', '--no-showlocals', '--tb', '--show-capture', '--fulltrace', '--full-trace', '--color', '--code-highlight', '--pastebin', '--junitxml', '--junit-xml', '--junitprefix', '--junit-prefix', '-W', '--pythonwarnings', '--maxfail', '--strict-config', '--strict-markers', '--strict', '-c', '--config-file', '--continue-on-collection-errors', '--rootdir', '--collectonly', '--collect-only', '--co', '--pyargs', '--ignore', '--ignore-glob', '--deselect', '--confcutdir', '--noconftest', '--keepduplicates', '--keep-duplicates', '--collect-in-virtualenv', '--import-mode', '--doctest-modules', '--doctest-report', '--doctest-glob', '--doctest-ignore-import-errors', '--doctest-continue-on-failure', '--basetemp', '-V', '--version', '-h', '--help', '-p', '--traceconfig', '--trace-config', '--debug', '-o', '--override-ini', '--assert', '--setuponly', '--setup-only', '--setupshow', '--setup-show', '--setupplan', '--setup-plan', '--log-level', '--log-format', '--log-date-format', '--log-cli-level', '--log-cli-format', '--log-cli-date-format', '--log-file', '--log-file-level', '--log-file-format', '--log-file-date-format', '--log-auto-indent', '--log-disable', '--asyncio-mode']
============================= test session starts ==============================
platform linux -- Python 3.10.10, pytest-7.4.1, pluggy-1.3.0
rootdir: /mnt/c/Users/nalbi/src/com/github/Significant-Gravitas/Auto-GPT/benchmark
configfile: pyproject.toml
plugins: anyio-3.7.1, asyncio-0.21.1
asyncio: mode=auto
Warning: When  cdn_resources is 'local' jupyter notebook has issues displaying graphics on chrome/safari. Use cdn_resources='in_line' or cdn_resources='remote' if you have issues viewing graphics in a notebook.
collected 1 item

agbenchmark/generate_test.py Config file: /mnt/c/Users/nalbi/src/com/github/Significant-Gravitas/Auto-GPT/benchmark/agent/GPT-Pilot/agbenchmark/config.json
============Starting TestWriteFile challenge============
Task: Write the word 'Washington' to a .txt file
Running 'agbenchmark.benchmarks' with timeout 60
/mnt/c/Users/nalbi/.cache/pypoetry/virtualenvs/agbenchmark-_mQxNyZt-py3.10/bin/python: No module named agbenchmark.benchmarks

The Python function has finished running.
The agent timed out
FTerminating agent


=================================== FAILURES ===================================
__________________ TestWriteFile.test_method[challenge_data0] __________________
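
The `No module named agbenchmark.benchmarks` line in the log has the shape of the message the interpreter prints when `python -m <module>` cannot find the module, so the configured `entry_path` is probably not importable from the directory the agent is launched in. Below is a minimal sketch for checking that, assuming the benchmark starts the agent with `python -m <entry_path>`. The helper name `entry_path_resolves` and its `agent_dir` argument are hypothetical and not part of agbenchmark.

```python
import importlib
import importlib.util
import sys
from pathlib import Path


def entry_path_resolves(entry_path: str, agent_dir: Path) -> bool:
    """Return True if `entry_path` can be located with `agent_dir` on sys.path.

    Hypothetical diagnostic helper; it only locates the module and does not
    run it. Locating a dotted name imports its parent package as a side effect.
    """
    original_path = list(sys.path)
    # Mimic `python -m`, which puts the current directory first on sys.path.
    sys.path.insert(0, str(agent_dir))
    try:
        importlib.invalidate_caches()
        return importlib.util.find_spec(entry_path) is not None
    except ModuleNotFoundError:
        # Raised when the parent package (e.g. `agbenchmark`) cannot be imported.
        return False
    finally:
        # Restore sys.path so the check leaves no lasting path changes.
        sys.path[:] = original_path
```

Calling it with `"agbenchmark.benchmarks"` and the `benchmark/agent/GPT-Pilot` directory from the log would show whether that directory exposes the module named in `entry_path`. It is best run in a fresh interpreter, because a package that has already been imported is looked up from `sys.modules` rather than from the newly added path entry.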

@nalbion nalbion mentioned this pull request Sep 9, 2023
1 task
@github-actions github-actions bot added size/l and removed size/m labels Sep 9, 2023
@nalbion
Author

nalbion commented Sep 9, 2023

I'm making progress on this. The documentation on adding new agents could be clearer, though.

@ntindle
Member

ntindle commented Sep 11, 2023

@merwanehamadi anything we can do to help out here?

@nalbion
Author

nalbion commented Sep 11, 2023

> @merwanehamadi anything we can do to help out here?

GPT-Pilot is very new, only 25 days old. I think it needs some architectural changes before the benchmarks can be run against it.

Pythagora-io/gpt-pilot#73

@Swiftyos
Contributor

Hi @nalbion, thank you for doing this. The team is currently working on fixing the benchmarking system to work smoothly with the monorepo; see #5194.

Also, I noticed you have your agent in benchmark/agents. We are going to be keeping the agents in autogpts/{agent_name}.
However, getting your agent into the main repo requires winning the "best agent" competition: https://lablab.ai/event/autogpt-arena-hacks

@github-actions github-actions bot added the conflicts Automatically applied to PRs with merge conflicts label Sep 12, 2023
@github-actions

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

@nalbion
Author

nalbion commented Sep 12, 2023

@Swiftyos the rules for the competition seem to be contradictory.

❌ The creation of new agents should strictly be facilitated through the AutoGPT repo

✅ Participants may begin with an existing agent or start fresh.

✅ Please note that you are free to utilize any AI technology,
❌ ...as long as you begin with the AutoGPT repo as your foundation.

...as long as GPT-Pilot implements the Agent Protocol, it should be eligible, right?

@github-actions

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

@github-actions github-actions bot removed the conflicts Automatically applied to PRs with merge conflicts label Sep 12, 2023
@github-actions github-actions bot added the conflicts Automatically applied to PRs with merge conflicts label Sep 13, 2023
@github-actions

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

Labels
conflicts Automatically applied to PRs with merge conflicts size/l
Projects
Status: 🆕 Needs initial review

3 participants