
Risk-avoiding continuous mode #789

Closed
1 task done
jnt0rrente opened this issue Apr 11, 2023 · 9 comments

@jnt0rrente commented Apr 11, 2023

Duplicates

  • I have searched the existing issues

Summary 💡

When in risk-avoiding mode, one more GPT call should be made before running each command, asking the model to moderate the would-be next command. Ideally, the call would return a value in a specific range that we can compare against a user-defined risk threshold. If the risk exceeds the threshold, pause execution and await human feedback.

Examples 🌈

The intermediate call can be prompted as something along the lines of:

S: You are a moderation agent tasked with reviewing potential actions of an autonomous agent. You will be given a command to review, and you are to evaluate said command on a risk scale from 0 to 100, where 0 is <no risk at all> and 100 is <maximum risk>.
U: COMMAND: <command to be reviewed>
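A minimal sketch of what the gating step could look like, assuming the 2023-era openai Python client; RISK_THRESHOLD, the exact prompt wording, and the fail-closed fallback are all illustrative choices, not a final design:

```python
import re
import openai  # assumes the 2023-era ChatCompletion API

RISK_THRESHOLD = 50  # hypothetical user-defined setting

MODERATOR_PROMPT = (
    "You are a moderation agent tasked with reviewing potential actions of an "
    "autonomous agent. You will be given a command to review, and you are to "
    "evaluate said command on a risk scale from 0 to 100, where 0 is no risk "
    "at all and 100 is maximum risk. Reply with the number only."
)

def assess_risk(command: str) -> int:
    """Score the would-be next command on the 0-100 risk scale."""
    response = openai.ChatCompletion.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system", "content": MODERATOR_PROMPT},
            {"role": "user", "content": f"COMMAND: {command}"},
        ],
    )
    text = response["choices"][0]["message"]["content"]
    match = re.search(r"\d+", text)
    # Fail closed: an unparseable reply is treated as maximum risk.
    return int(match.group()) if match else 100

def requires_human_feedback(command: str) -> bool:
    """Pause execution when the moderated risk exceeds the user threshold."""
    return assess_risk(command) > RISK_THRESHOLD
```

Failing closed on an unparseable reply errs on the side of pausing, which matches the goal of the mode.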

Motivation 🔦

There is currently no way to properly use AutoGPT without either babysitting it or taking the risk of giving it free, unmonitored agency.

This feature would allow users to place more trust in AutoGPT and "let it loose" while relying on its self-moderation capabilities.

@jnt0rrente (Author)

It would obviously be necessary to test the responses' accuracy when evaluating risk and to tweak the prompt to something less improvised. Nevertheless, I believe this would be a good enough safeguard until a more complex system can be implemented.

@richbeales (Contributor)

It would also potentially double the cost of running AutoGPT.

@jnt0rrente (Author)

Yes, it would increase costs, but it would be far from doubling them, since embeddings are not used here. Still, the idea is to have this as a third, optional mode.
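As a rough back-of-envelope (the numbers below are made up for illustration, not measured): the moderation call only carries a short system prompt plus a single command, not the full agent context, so the per-step overhead would stay well under 2x:

```python
# Illustrative token counts only -- not measurements.
main_call_tokens = 4000   # full agent prompt, goals, and history per step
moderation_tokens = 300   # short system prompt plus one command
extra = moderation_tokens / main_call_tokens
print(f"~{extra:.0%} additional prompt tokens per step")  # ~8%, far from double
```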

@jnt0rrente (Author)

I am almost done implementing this. GPT-4 does a very good job analyzing commands; GPT-3.5, however, is lacking at best. Will post feedback in the pull request.

@dboitnot

An alternative might be to allow the user to whitelist certain commands like do_nothing, list_agents, and google.
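A sketch of that idea, with a hypothetical user-configurable whitelist; commands on the list would skip the moderation call entirely:

```python
# Hypothetical user-configurable set of known-safe commands.
COMMAND_WHITELIST = {"do_nothing", "list_agents", "google"}

def needs_moderation(command_name: str) -> bool:
    """Whitelisted commands run immediately; everything else is reviewed."""
    return command_name not in COMMAND_WHITELIST
```

This composes naturally with the risk-score approach: check the whitelist first and only spend the extra call on unlisted commands.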

@jnt0rrente linked a pull request Apr 12, 2023 that will close this issue
@Boostrix (Contributor)

Related: #2701 (safeguards), but also #2987 (comment) (preparing shell commands prior to executing them by adding relevant context such as 1) availability of tools, 2) location/path, 3) version number).
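For the command-preparation idea referenced from #2987, a minimal sketch of gathering that context before a shell command is composed (the function name and return shape are assumptions, not the linked proposal's API):

```python
import os
import shutil
import subprocess

def shell_context(tool: str) -> dict:
    """Collect 1) availability, 2) location/path, 3) version for a tool."""
    path = shutil.which(tool)
    version = None
    if path:
        try:
            result = subprocess.run(
                [tool, "--version"], capture_output=True, text=True, timeout=5
            )
            # Some tools print their version to stderr, hence the fallback.
            version = result.stdout.strip() or result.stderr.strip()
        except (OSError, subprocess.SubprocessError):
            pass
    return {
        "available": path is not None,
        "path": path,
        "cwd": os.getcwd(),
        "version": version,
    }
```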

@Boostrix (Contributor) commented May 8, 2023

@anonhostpi

There are also quite a few other discussions playing with the idea of self-moderation:

https://gist.github.com/anonhostpi/97d4bb3e9535c92b8173fae704b76264#observerregulatory-agents-and-restrictions-proposals

Some of the ideas include using a separate agent for observing other agents and assessing whether or not they violated some sort of compliance policy.
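To illustrate the observer pattern in its simplest form (everything here is hypothetical; in the linked proposals the judgment would itself be an LLM call, with a crude keyword screen standing in below), a second pass reviews what already ran instead of gating each command:

```python
def review_transcript(executed_commands: list[str]) -> list[str]:
    """Hypothetical observer pass: flag executed commands for compliance review.

    The linked proposals would delegate this judgment to a separate agent;
    a simple substring screen stands in for that call here.
    """
    suspicious_markers = ("rm -rf", "sudo ", "curl | sh")
    return [
        cmd for cmd in executed_commands
        if any(marker in cmd for marker in suspicious_markers)
    ]
```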

@lc0rp added the enhancement (New feature or request) and Security 🛡️ labels Jun 12, 2023
@github-actions bot added the Stale label Sep 6, 2023
@github-actions (Contributor)

This issue was closed automatically because it has been stale for 10 days with no activity.

@github-actions bot closed this as not planned Sep 17, 2023