Releases · Significant-Gravitas/Auto-GPT-Benchmarks

17 Aug 00:13

github-actions

v0.0.9

3ce10df

v0.0.9 Latest

What's Changed

Remove skill tree sync by @merwanehamadi in #308
Enhanced Test Report Directory Naming and Handling by @Swiftyos in #312
Fixing paths that were preventing artifacts from being copied to workspace by @lc0rp in #311
Add endpoints to power dev tool by @merwanehamadi in #310
Remove submodule by @merwanehamadi in #314
Fix linters and chrome selenium integration by @merwanehamadi in #313
Remove colons in timestamp by @merwanehamadi in #315
Remove build a nuke challenge by @merwanehamadi in #316
Only push to gdrive correct timestamps by @merwanehamadi in #318
Fix linter 2 by @merwanehamadi in #319
Update pyproject.toml by @merwanehamadi in #320

Full Changelog: v0.0.8...v0.0.9

Contributors

lc0rp, waynehamadi, and Swiftyos

15 Aug 14:23

github-actions

v0.0.8

8bc3710

v0.0.8

What's Changed

Fix all tests skipped by @merwanehamadi in #296
Increase timeout by @merwanehamadi in #297
Update .env.example by @westonwillingham in #298
0.0.8 by @merwanehamadi in #299
Add safety challenge by @merwanehamadi in #300
Fix agent protocol test by @merwanehamadi in #301
Fix linter by @merwanehamadi in #302
chore: polygpt update to include gpt4 by @rihp in #303
Fix eval by @merwanehamadi in #304
fix eval by @merwanehamadi in #305
new frontend connections by @SilenNaihin in #306
init backend, fix frontend module by @SilenNaihin in #307

New Contributors

@westonwillingham made their first contribution in #298

Full Changelog: v0.0.7...v0.0.8

Contributors

waynehamadi, rihp, and 2 other contributors

12 Aug 17:49

github-actions

v0.0.7

0a73e39

v0.0.7

What's Changed

Update beebot by @erik-megarad in #288
Sync skill tree to a versioned website by @merwanehamadi in #289
If regression tests empty continue by @merwanehamadi in #290
Remember goal loss by @merwanehamadi in #291
No need to push skill tree twice by @merwanehamadi in #292
Use index.html instead of dependencies.html by @merwanehamadi in #293
Fix all tests skipped by @merwanehamadi in #294
Release 0.0.7 by @merwanehamadi in #295

Full Changelog: v0.0.6...v0.0.7

Contributors

erik-megarad and waynehamadi

11 Aug 13:01

github-actions

v0.0.6

a513b44

v0.0.6

What's Changed

Removed accidentally added reports by @nerfZael in #283
Implement the 'explore' mode by @merwanehamadi in #284
Add more fields to gdrive by @merwanehamadi in #285
Cleanup skill tree by @merwanehamadi in #287
Use agent protocol by @jakubno in #278

New Contributors

@nerfZael made their first contribution in #283
@jakubno made their first contribution in #278

Full Changelog: v0.0.5...v0.0.6

Contributors

waynehamadi, nerfZael, and jakubno

09 Aug 18:26

github-actions

v0.0.5

6afd962

v0.0.5

What's Changed

PolyGPT Benchmarks and Submodule Update by @rihp in #273
Update beebot by @erik-megarad in #281
Remove baserun because api key issue by @merwanehamadi in #282

Full Changelog: v0.0.4...v0.0.5

Contributors

erik-megarad, waynehamadi, and rihp

09 Aug 17:06

github-actions

v0.0.4

e3f1e21

v0.0.4

What's Changed

Fix "attempted" metric being incorrect by @merwanehamadi in #251
Fix more attempted metrics not working by @merwanehamadi in #252
Add more coding challenge by @merwanehamadi in #254
Add polygpt by @merwanehamadi in #255
Add polygpt to ci by @merwanehamadi in #256
Add agent protocol by @merwanehamadi in #258
Add agent protocol interface test by @merwanehamadi in #259
Add all agent protocol tests by @merwanehamadi in #260
Remove space challenges by @merwanehamadi in #262
Helicone Lock Manager fix by @merwanehamadi in #263
Remove graphql logs by @merwanehamadi in #264
remove pytest-depends, rerouting functions by @SilenNaihin in #250
Fix test write file by @merwanehamadi in #266
Add product advisor tests by @merwanehamadi in #267
Kill all subprocesses by @erik-megarad in #265
Feat: --cutoff and "keep_workspace_files" options by @lc0rp in #261
Update pr template by @merwanehamadi in #268
AUTO-25: Add the ability to run multiple categories and to skip categories by @Swiftyos in #270
Add web app creation challenge by @merwanehamadi in #272
Integrate with baserun by @merwanehamadi in #274
Integrate baserun by @merwanehamadi in #275
Put back mini agi to original state by @merwanehamadi in #276
Fix send to gdrive by @merwanehamadi in #277
See the task when clicking in the skill tree by @merwanehamadi in #279
Release 0.0.4 by @merwanehamadi in #280

New Contributors

@lc0rp made their first contribution in #261
@Swiftyos made their first contribution in #270

Full Changelog: v0.0.3...v0.0.4

Contributors

erik-megarad, lc0rp, and 3 other contributors

03 Aug 23:48

github-actions

v0.0.3

02dd294

v0.0.3

What's Changed

safety challenges, adaptability challenges, suite same_task by @SilenNaihin in #177
Beat more challenges in Auto-GPT by @merwanehamadi in #187
Uninstall agbenchmark then reinstall by @merwanehamadi in #188
Fix helicone MITM by @merwanehamadi in #189
Add api keys by @merwanehamadi in #190
hotfix reports by @SilenNaihin in #191
Update Scores Benchmark by @merwanehamadi in #192
fix suite dependencies by @SilenNaihin in #194
Add safety suite by @merwanehamadi in #196
report # bug, adding submodule challenges by @SilenNaihin in #193
Add llm eval by @merwanehamadi in #197
ci update by @SilenNaihin in #198
Add helicone dynamic headers by @merwanehamadi in #199
Add dynamic headers using environment variables by @merwanehamadi in #200
added new script to fix dynamic headers by @chitalian in #202
Delete reports by @merwanehamadi in #201
Use beebot autopackai by @merwanehamadi in #203
Benchmark all test by @merwanehamadi in #204
Fix tests not being run by @merwanehamadi in #207
Retry push until successful by @merwanehamadi in #208
Advanced LLM Evaluation Implementation by @SilenNaihin in #205
returning scores by @SilenNaihin in #210
Update submodules by @merwanehamadi in #212
Use Auto-GPT master by @merwanehamadi in #213
Fix export to gdrive by @merwanehamadi in #214
Add timeout to agbenchmark by @merwanehamadi in #215
Add timeout that allows teardown by @merwanehamadi in #216
Delete incorrect report by @merwanehamadi in #217
Feature: Visualize Test Results by @SilenNaihin in #211
Fix timeout not working by @merwanehamadi in #218
Update submodule by @merwanehamadi in #219
Get helicone costs by @merwanehamadi in #220
working bar and radar charts by @SilenNaihin in #221
Fix f-string get_data_from_helicone.py by @chitalian in #223
Fix BeeBot link by @MrBrain295 in #224
Fix send to gdrive and tracking the wrong challenge name by @merwanehamadi in #225
Refactoring for TDD by @SilenNaihin in #222
Fix costs helicone by @merwanehamadi in #226
Fix reports by @merwanehamadi in #227
Return none as fallback Helicone by @merwanehamadi in #228
Only run mini-agi on push and PR by @merwanehamadi in #230
Reverse skip based on agent by @merwanehamadi in #231
Only run mini-agi on tests by @merwanehamadi in #232
Fix reports and add commit sha by @merwanehamadi in #233
Send commit sha and cost to gdrive by @merwanehamadi in #234
Remove high costs by @merwanehamadi in #235
Remove mock reports by @merwanehamadi in #236
Remove mock reports by @merwanehamadi in #237
Update beebot and Auto-GPT by @merwanehamadi in #238
Update autogpt back to where it was by @merwanehamadi in #239
Update python-dotenv by @erik-megarad in #240
Update Auto-GPT and allow 1 specific agent to be run by @merwanehamadi in #241
Add attempted metrics by @merwanehamadi in #244
Correct agent and benchmark commit sha by @merwanehamadi in #245
fix-linter by @merwanehamadi in #246
Fix typing by @merwanehamadi in #247
Add Test Suite to gdrive by @merwanehamadi in #248
Release 0.0.3 by @merwanehamadi in #249

New Contributors

@chitalian made their first contribution in #202
@MrBrain295 made their first contribution in #224

Full Changelog: v0.0.2...v0.0.3

Contributors

erik-megarad, waynehamadi, and 3 other contributors

24 Jul 12:13

github-actions

v0.0.2

c4aebda

v0.0.2

What's Changed

Always send to google drive by @merwanehamadi in #185
Release 0.0.2 by @merwanehamadi in #186

Full Changelog: v0.0.1...v0.0.2

Contributors

waynehamadi

23 Jul 19:57

github-actions

v0.0.1

8b59af3

v0.0.1

What's Changed

First commit for AutoGPT Benchmarks by @dschonholtz in #1
Typo in README.md by @ambujpawar in #2
Remove the submodule, reference OpenAI directly rather than running it on the command line, fix logging by @dschonholtz in #16
Update README.md by @dschonholtz in #17
Graphs for evals by @rihp in #20
windows docs make workspace if not there by @dschonholtz in #25
EvalNames with dates for the eval run filename and compatibility with 0.3.0 by @dschonholtz in #26
init first challenge template by @ScarletPan in #34
start fixtures, types, challenge creation, mock run (stable by @SilenNaihin in #37
Add automatic regression markers by @SilenNaihin in #38
MockManager, mock_func in data.json by @SilenNaihin in #39
addition of basic challenges, easier challenge creation, --mock flag, adding mini-agi by @SilenNaihin in #40
Update README.md by @SilenNaihin in #41
adding hook to integrate agnostically by @SilenNaihin in #42
Integrate one challenge to auto gpt by @merwanehamadi in #44
Add static linters ci by @merwanehamadi in #45
Run regression tests on push to master and stable by @merwanehamadi in #46
Integrate with gpt engineer by @merwanehamadi in #47
Integrate smol developer with agbenchmark by @merwanehamadi in #48
Explain how to benchmark new agents by @merwanehamadi in #49
local runs, home_path config, submodule miniagi by @SilenNaihin in #50
Add retrieval challenge test + run tests on CI pipeline by @merwanehamadi in #51
Add pr template by @merwanehamadi in #52
Add information retrieval 3 by @merwanehamadi in #54
Change test dependencies by @merwanehamadi in #55
dynamic workspace path by @SilenNaihin in #56
Add basic memory challenge by @merwanehamadi in #57
Rename '--reg' flag to '--maintain' by @merwanehamadi in #58
Add 'Remember multiple ids' memory challenge by @merwanehamadi in #59
added caching based on file key by @SilenNaihin in #62
Add 'remember ids with noise' challenge by @merwanehamadi in #61
Add 'remember phrases with noise' challenge by @merwanehamadi in #63
fix home_path, local mini-agi run works by @SilenNaihin in #64
Add 'Debug simple typo with guidance' challenge by @merwanehamadi in #65
Add "Debug code without guidance" challenge by @merwanehamadi in #66
Get rid of get file path by using the data.json convention to store the challenge information by @merwanehamadi in #67
Print out all of stdout on each process poll. by @erik-megarad in #69
Add .txt to memory challenges by @merwanehamadi in #70
Fix memory challenge 2 by @merwanehamadi in #71
Use artifacts out instead of python code by @merwanehamadi in #72
i/o workspace, adding superagi by @SilenNaihin in #60
fixing the incorrect addition of superagi by @SilenNaihin in #73
quality of life improvements & fixes by @SilenNaihin in #75
Fix debug code challenge by @merwanehamadi in #76
Add gpt engineer to ci by @merwanehamadi in #78
just json, no test files by @SilenNaihin in #77
Combine all agents into one ci.yml by @merwanehamadi in #79
adding search interface challenge and cleaning repo by @SilenNaihin in #80
Add Helicone by @merwanehamadi in #81
Add "Simple web server" challenge by @merwanehamadi in #74
added --test, consolidate files, reports working by @SilenNaihin in #83
Fix tests ci by @merwanehamadi in #82
All Agents log to helicone automatically by @merwanehamadi in #85
Fix Auto-GPT integration by adding python module as entrypoint by @merwanehamadi in #86
Fix Auto-GPT looping forever by @merwanehamadi in #87
Add custom properties to Helicone by @merwanehamadi in #91
Enable cache again by @merwanehamadi in #92
fixing backslashes, adding basic metrics by @SilenNaihin in #89
Fix Smol developer and gpt engineer by @merwanehamadi in #93
Remove dependencies cache by @merwanehamadi in #94
Remove dependencies if a specific test is asked by the user by @merwanehamadi in #95
Update submodules and upload artifacts by @merwanehamadi in #97
Add basic code generation challenge by @merwanehamadi in #98
Replace hidden files with custom python by @merwanehamadi in #99
Start showing benchmark results by @merwanehamadi in #100
Show Auto-GPT results by @merwanehamadi in #102
Display smol-developer-results by @merwanehamadi in #103
Display results per category by @merwanehamadi in #104
Update auto gpt to current version of master by @merwanehamadi in #105
Update Auto-GPT score by @merwanehamadi in #106
Clean up workspace between each test by @erik-megarad in #109
Add three sum challenge by @merwanehamadi in #108
Fix ci by @merwanehamadi in #110
Remove cache true on pr by @merwanehamadi in #111
Dynamic cutoff and other quality of life by @SilenNaihin in #101
Allow change location of reports by @merwanehamadi in #115
Fix cutoff errors by @merwanehamadi in #116
Fix pipes issue by @merwanehamadi in #117
Update reports when pushing to master by @merwanehamadi in https://github.com/Significant-Gravita...

Contributors

erik-megarad, waynehamadi, and 5 other contributors
