Releases: Significant-Gravitas/Auto-GPT-Benchmarks

v0.0.9

17 Aug 00:13
3ce10df

What's Changed

  • Remove skill tree sync by @merwanehamadi in #308
  • Enhanced Test Report Directory Naming and Handling by @Swiftyos in #312
  • Fixing paths that were preventing artifacts from being copied to workspace by @lc0rp in #311
  • Add endpoints to power dev tool by @merwanehamadi in #310
  • Remove submodule by @merwanehamadi in #314
  • Fix linters and chrome selenium integration by @merwanehamadi in #313
  • Remove colons in timestamp by @merwanehamadi in #315
  • Remove build a nuke challenge by @merwanehamadi in #316
  • Only push to gdrive correct timestamps by @merwanehamadi in #318
  • Fix linter 2 by @merwanehamadi in #319
  • Update pyproject.toml by @merwanehamadi in #320

Full Changelog: v0.0.8...v0.0.9

v0.0.8

15 Aug 14:23
8bc3710

What's Changed

  • Fix all tests skipped by @merwanehamadi in #296
  • Increase timeout by @merwanehamadi in #297
  • Update .env.example by @westonwillingham in #298
  • 0.0.8 by @merwanehamadi in #299
  • Add safety challenge by @merwanehamadi in #300
  • Fix agent protocol test by @merwanehamadi in #301
  • Fix linter by @merwanehamadi in #302
  • chore: polygpt update to include gpt4 by @rihp in #303
  • Fix eval by @merwanehamadi in #304
  • fix eval by @merwanehamadi in #305
  • new frontend connections by @SilenNaihin in #306
  • init backend, fix frontend module by @SilenNaihin in #307

Full Changelog: v0.0.7...v0.0.8

v0.0.7

12 Aug 17:49
0a73e39

What's Changed

  • Update beebot by @erik-megarad in #288
  • Sync skill tree to a versioned website by @merwanehamadi in #289
  • If regression tests empty continue by @merwanehamadi in #290
  • Remember goal loss by @merwanehamadi in #291
  • No need to push skill tree twice by @merwanehamadi in #292
  • Use index.html instead of dependencies.html by @merwanehamadi in #293
  • Fix all tests skipped by @merwanehamadi in #294
  • Release 0.0.7 by @merwanehamadi in #295

Full Changelog: v0.0.6...v0.0.7

v0.0.6

11 Aug 13:01

What's Changed

  • Removed accidentally added reports by @nerfZael in #283
  • Implement the 'explore' mode by @merwanehamadi in #284
  • Add more fields to gdrive by @merwanehamadi in #285
  • Cleanup skill tree by @merwanehamadi in #287
  • Use agent protocol by @jakubno in #278

Full Changelog: v0.0.5...v0.0.6

v0.0.5

09 Aug 18:26
6afd962

What's Changed

  • PolyGPT Benchmarks and Submodule Update by @rihp in #273
  • Update beebot by @erik-megarad in #281
  • Remove baserun because api key issue by @merwanehamadi in #282

Full Changelog: v0.0.4...v0.0.5

v0.0.4

09 Aug 17:06
e3f1e21

What's Changed

  • Fix "attempted" metric being incorrect by @merwanehamadi in #251
  • Fix more attempted metrics not working by @merwanehamadi in #252
  • Add more coding challenge by @merwanehamadi in #254
  • Add polygpt by @merwanehamadi in #255
  • Add polygpt to ci by @merwanehamadi in #256
  • Add agent protocol by @merwanehamadi in #258
  • Add agent protocol interface test by @merwanehamadi in #259
  • Add all agent protocol tests by @merwanehamadi in #260
  • Remove space challenges by @merwanehamadi in #262
  • Helicone Lock Manager fix by @merwanehamadi in #263
  • Remove graphql logs by @merwanehamadi in #264
  • remove pytest-depends, rerouting functions by @SilenNaihin in #250
  • Fix test write file by @merwanehamadi in #266
  • Add product advisor tests by @merwanehamadi in #267
  • Kill all subprocesses by @erik-megarad in #265
  • Feat: --cutoff and "keep_workspace_files" options by @lc0rp in #261
  • Update pr template by @merwanehamadi in #268
  • AUTO-25: Add the ability to run multiple categories and to skip categories by @Swiftyos in #270
  • Add web app creation challenge by @merwanehamadi in #272
  • Integrate with baserun by @merwanehamadi in #274
  • Integrate baserun by @merwanehamadi in #275
  • Put back mini agi to original state by @merwanehamadi in #276
  • Fix send to gdrive by @merwanehamadi in #277
  • See the task when clicking in the skill tree by @merwanehamadi in #279
  • Release 0.0.4 by @merwanehamadi in #280

Full Changelog: v0.0.3...v0.0.4

v0.0.3

03 Aug 23:48
02dd294

What's Changed

  • safety challenges, adaptability challenges, suite same_task by @SilenNaihin in #177
  • Beat more challenges in Auto-GPT by @merwanehamadi in #187
  • Uninstall agbenchmark then reinstall by @merwanehamadi in #188
  • Fix helicone MITM by @merwanehamadi in #189
  • Add api keys by @merwanehamadi in #190
  • hotfix reports by @SilenNaihin in #191
  • Update Scores Benchmark by @merwanehamadi in #192
  • fix suite dependencies by @SilenNaihin in #194
  • Add safety suite by @merwanehamadi in #196
  • report # bug, adding submodule challenges by @SilenNaihin in #193
  • Add llm eval by @merwanehamadi in #197
  • ci update by @SilenNaihin in #198
  • Add helicone dynamic headers by @merwanehamadi in #199
  • Add dynamic headers using environment variables by @merwanehamadi in #200
  • added new script to fix dynamic headers by @chitalian in #202
  • Delete reports by @merwanehamadi in #201
  • Use beebot autopackai by @merwanehamadi in #203
  • Benchmark all test by @merwanehamadi in #204
  • Fix tests not being run by @merwanehamadi in #207
  • Retry push until successful by @merwanehamadi in #208
  • Advanced LLM Evaluation Implementation by @SilenNaihin in #205
  • returning scores by @SilenNaihin in #210
  • Update submodules by @merwanehamadi in #212
  • Use Auto-GPT master by @merwanehamadi in #213
  • Fix export to gdrive by @merwanehamadi in #214
  • Add timeout to agbenchmark by @merwanehamadi in #215
  • Add timeout that allows teardown by @merwanehamadi in #216
  • Delete incorrect report by @merwanehamadi in #217
  • Feature: Visualize Test Results by @SilenNaihin in #211
  • Fix timeout not working by @merwanehamadi in #218
  • Update submodule by @merwanehamadi in #219
  • Get helicone costs by @merwanehamadi in #220
  • working bar and radar charts by @SilenNaihin in #221
  • Fix f-string get_data_from_helicone.py by @chitalian in #223
  • Fix BeeBot link by @MrBrain295 in #224
  • Fix send to gdrive and tracking the wrong challenge name by @merwanehamadi in #225
  • Refactoring for TDD by @SilenNaihin in #222
  • Fix costs helicone by @merwanehamadi in #226
  • Fix reports by @merwanehamadi in #227
  • Return none as fallback Helicone by @merwanehamadi in #228
  • Only run mini-agi on push and PR by @merwanehamadi in #230
  • Reverse skip based on agent by @merwanehamadi in #231
  • Only run mini-agi on tests by @merwanehamadi in #232
  • Fix reports and add commit sha by @merwanehamadi in #233
  • Send commit sha and cost to gdrive by @merwanehamadi in #234
  • Remove high costs by @merwanehamadi in #235
  • Remove mock reports by @merwanehamadi in #236
  • Remove mock reports by @merwanehamadi in #237
  • Update beebot and Auto-GPT by @merwanehamadi in #238
  • Update autogpt back to where it was by @merwanehamadi in #239
  • Update python-dotenv by @erik-megarad in #240
  • Update Auto-GPT and allow 1 specific agent to be run by @merwanehamadi in #241
  • Add attempted metrics by @merwanehamadi in #244
  • Correct agent and benchmark commit sha by @merwanehamadi in #245
  • fix-linter by @merwanehamadi in #246
  • Fix typing by @merwanehamadi in #247
  • Add Test Suite to gdrive by @merwanehamadi in #248
  • Release 0.0.3 by @merwanehamadi in #249

Full Changelog: v0.0.2...v0.0.3

v0.0.2

24 Jul 12:13

What's Changed

  • Always send to google drive by @merwanehamadi in #185
  • Release 0.0.2 by @merwanehamadi in #186

Full Changelog: v0.0.1...v0.0.2

v0.0.1

23 Jul 19:57

What's Changed

  • First commit for AutoGPT Benchmarks by @dschonholtz in #1
  • Typo in README.md by @ambujpawar in #2
  • Remove the submodule, reference OpenAI directly rather than running it on the command line, fix logging by @dschonholtz in #16
  • Update README.md by @dschonholtz in #17
  • Graphs for evals by @rihp in #20
  • windows docs make workspace if not there by @dschonholtz in #25
  • EvalNames with dates for the eval run filename and compatibility with 0.3.0 by @dschonholtz in #26
  • init first challenge template by @ScarletPan in #34
  • start fixtures, types, challenge creation, mock run (stable) by @SilenNaihin in #37
  • Add automatic regression markers by @SilenNaihin in #38
  • MockManager, mock_func in data.json by @SilenNaihin in #39
  • addition of basic challenges, easier challenge creation, --mock flag, adding mini-agi by @SilenNaihin in #40
  • Update README.md by @SilenNaihin in #41
  • adding hook to integrate agnostically by @SilenNaihin in #42
  • Integrate one challenge to auto gpt by @merwanehamadi in #44
  • Add static linters ci by @merwanehamadi in #45
  • Run regression tests on push to master and stable by @merwanehamadi in #46
  • Integrate with gpt engineer by @merwanehamadi in #47
  • Integrate smol developer with agbenchmark by @merwanehamadi in #48
  • Explain how to benchmark new agents by @merwanehamadi in #49
  • local runs, home_path config, submodule miniagi by @SilenNaihin in #50
  • Add retrieval challenge test + run tests on CI pipeline by @merwanehamadi in #51
  • Add pr template by @merwanehamadi in #52
  • Add information retrieval 3 by @merwanehamadi in #54
  • Change test dependencies by @merwanehamadi in #55
  • dynamic workspace path by @SilenNaihin in #56
  • Add basic memory challenge by @merwanehamadi in #57
  • Rename '--reg' flag to '--maintain' by @merwanehamadi in #58
  • Add 'Remember multiple ids' memory challenge by @merwanehamadi in #59
  • added caching based on file key by @SilenNaihin in #62
  • Add 'remember ids with noise' challenge by @merwanehamadi in #61
  • Add 'remember phrases with noise' challenge by @merwanehamadi in #63
  • fix home_path, local mini-agi run works by @SilenNaihin in #64
  • Add 'Debug simple typo with guidance' challenge by @merwanehamadi in #65
  • Add "Debug code without guidance" challenge by @merwanehamadi in #66
  • Get rid of get file path by using the data.json convention to store the challenge information by @merwanehamadi in #67
  • Print out all of stdout on each process poll. by @erik-megarad in #69
  • Add .txt to memory challenges by @merwanehamadi in #70
  • Fix memory challenge 2 by @merwanehamadi in #71
  • Use artifacts out instead of python code by @merwanehamadi in #72
  • i/o workspace, adding superagi by @SilenNaihin in #60
  • fixing the incorrect addition of superagi by @SilenNaihin in #73
  • quality of life improvements & fixes by @SilenNaihin in #75
  • Fix debug code challenge by @merwanehamadi in #76
  • Add gpt engineer to ci by @merwanehamadi in #78
  • just json, no test files by @SilenNaihin in #77
  • Combine all agents into one ci.yml by @merwanehamadi in #79
  • adding search interface challenge and cleaning repo by @SilenNaihin in #80
  • Add Helicone by @merwanehamadi in #81
  • Add "Simple web server" challenge by @merwanehamadi in #74
  • added --test, consolidate files, reports working by @SilenNaihin in #83
  • Fix tests ci by @merwanehamadi in #82
  • All Agents log to helicone automatically by @merwanehamadi in #85
  • Fix Auto-GPT integration by adding python module as entrypoint by @merwanehamadi in #86
  • Fix Auto-GPT looping forever by @merwanehamadi in #87
  • Add custom properties to Helicone by @merwanehamadi in #91
  • Enable cache again by @merwanehamadi in #92
  • fixing backslashes, adding basic metrics by @SilenNaihin in #89
  • Fix Smol developer and gpt engineer by @merwanehamadi in #93
  • Remove dependencies cache by @merwanehamadi in #94
  • Remove dependencies if a specific test is asked by the user by @merwanehamadi in #95
  • Update submodules and upload artifacts by @merwanehamadi in #97
  • Add basic code generation challenge by @merwanehamadi in #98
  • Replace hidden files with custom python by @merwanehamadi in #99
  • Start showing benchmark results by @merwanehamadi in #100
  • Show Auto-GPT results by @merwanehamadi in #102
  • Display smol-developer-results by @merwanehamadi in #103
  • Display results per category by @merwanehamadi in #104
  • Update auto gpt to current version of master by @merwanehamadi in #105
  • Update Auto-GPT score by @merwanehamadi in #106
  • Clean up workspace between each test by @erik-megarad in #109
  • Add three sum challenge by @merwanehamadi in #108
  • Fix ci by @merwanehamadi in #110
  • Remove cache true on pr by @merwanehamadi in #111
  • Dynamic cutoff and other quality of life by @SilenNaihin in #101
  • Allow change location of reports by @merwanehamadi in #115
  • Fix cutoff errors by @merwanehamadi in #116
  • Fix pipes issue by @merwanehamadi in #117
  • Update reports when pushing to master by @merwanehamadi in https://github.com/Significant-Gravita...