
Risk-avoiding continuous mode #789

Closed
1 task done
jnt0rrente opened this issue Apr 11, 2023 · 9 comments

@jnt0rrente commented Apr 11, 2023

Duplicates

  • I have searched the existing issues

Summary 💡

When in risk-avoiding mode, one more GPT call should be made before running each command, asking the model to moderate the would-be next command. Ideally, the call would return a value in a specific range that we can compare against a user-defined risk threshold. If the risk exceeds the threshold, pause execution and await human feedback.

Examples 🌈

The intermediate call can be prompted as something along the lines of:

S: You are a moderation agent tasked with reviewing potential actions of an autonomous agent. You will be given a command to review, and you are to evaluate said command on a risk scale from 0 to 100, where 0 is <no risk at all> and 100 is <maximum risk>.
U: COMMAND: <command to be reviewed>
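A minimal sketch of what the gating step could look like, assuming the 2023-era openai Python client; RISK_THRESHOLD, the exact prompt wording, and the fail-closed fallback are all illustrative choices, not a final design:

```python
import re
import openai  # assumes the 2023-era ChatCompletion API

RISK_THRESHOLD = 50  # hypothetical user-defined setting

MODERATOR_PROMPT = (
    "You are a moderation agent tasked with reviewing potential actions of an "
    "autonomous agent. You will be given a command to review, and you are to "
    "evaluate said command on a risk scale from 0 to 100, where 0 is no risk "
    "at all and 100 is maximum risk. Reply with the number only."
)

def assess_risk(command: str) -> int:
    """Score the would-be next command on the 0-100 risk scale."""
    response = openai.ChatCompletion.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system", "content": MODERATOR_PROMPT},
            {"role": "user", "content": f"COMMAND: {command}"},
        ],
    )
    text = response["choices"][0]["message"]["content"]
    match = re.search(r"\d+", text)
    # Fail closed: an unparseable reply is treated as maximum risk.
    return int(match.group()) if match else 100

def requires_human_feedback(command: str) -> bool:
    """Pause execution when the moderated risk exceeds the user threshold."""
    return assess_risk(command) > RISK_THRESHOLD
```

Failing closed on an unparseable reply errs on the side of pausing, which matches the goal of the mode.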

Motivation 🔦

There is currently no way to properly use AutoGPT without either babysitting it or taking the risk of giving it free, unmonitored agency.

This feature would allow users to place more trust in AutoGPT and "let it loose" while relying on its self-moderation capabilities.

@jnt0rrente (Author)

It would obviously be necessary to test the responses' accuracy when evaluating risk and to tweak the prompt to something less improvised. Nevertheless, I believe this would be a good enough safeguard until a more complex system can be implemented.

@richbeales (Contributor)

It would also potentially double the cost of running AutoGPT.

@jnt0rrente (Author)

Yes, it would increase costs, but it would be far from doubling them, since embeddings are not used here. Still, the idea is to have this as a third, optional mode.
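As a rough back-of-envelope (the numbers below are made up for illustration, not measured): the moderation call only carries a short system prompt plus a single command, not the full agent context, so the per-step overhead would stay well under 2x:

```python
# Illustrative token counts only -- not measurements.
main_call_tokens = 4000   # full agent prompt, goals, and history per step
moderation_tokens = 300   # short system prompt plus one command
extra = moderation_tokens / main_call_tokens
print(f"~{extra:.0%} additional prompt tokens per step")  # ~8%, far from double
```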

@jnt0rrente (Author)

I am almost done implementing this. GPT-4 does a very good job analyzing commands; GPT-3.5, however, is lacking at best. Will post feedback in the pull request.

@dboitnot

An alternative might be to allow the user to whitelist certain commands like do_nothing, list_agents, and google.
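A sketch of that idea, with a hypothetical user-configurable whitelist; commands on the list would skip the moderation call entirely:

```python
# Hypothetical user-configurable set of known-safe commands.
COMMAND_WHITELIST = {"do_nothing", "list_agents", "google"}

def needs_moderation(command_name: str) -> bool:
    """Whitelisted commands run immediately; everything else is reviewed."""
    return command_name not in COMMAND_WHITELIST
```

This composes naturally with the risk-score approach: check the whitelist first and only spend the extra call on unlisted commands.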

@jnt0rrente linked a pull request Apr 12, 2023 that will close this issue
@Boostrix (Contributor)

Related: #2701 (safeguards), but also #2987 (comment) (preparing shell commands prior to executing them by adding relevant context such as 1) availability of tools, 2) location/path, 3) version number).
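For the command-preparation idea referenced from #2987, a minimal sketch of gathering that context before a shell command is composed (the function name and return shape are assumptions, not the linked proposal's API):

```python
import os
import shutil
import subprocess

def shell_context(tool: str) -> dict:
    """Collect 1) availability, 2) location/path, 3) version for a tool."""
    path = shutil.which(tool)
    version = None
    if path:
        try:
            result = subprocess.run(
                [tool, "--version"], capture_output=True, text=True, timeout=5
            )
            # Some tools print their version to stderr, hence the fallback.
            version = result.stdout.strip() or result.stderr.strip()
        except (OSError, subprocess.SubprocessError):
            pass
    return {
        "available": path is not None,
        "path": path,
        "cwd": os.getcwd(),
        "version": version,
    }
```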

@Boostrix (Contributor) commented May 8, 2023

@anonhostpi

There are also quite a few other discussions playing with the idea of self-moderation:

https://gist.github.com/anonhostpi/97d4bb3e9535c92b8173fae704b76264#observerregulatory-agents-and-restrictions-proposals

Some of the ideas include using a separate agent for observing other agents and assessing whether or not they violated some sort of compliance policy.
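To illustrate the observer pattern in its simplest form (everything here is hypothetical; in the linked proposals the judgment would itself be an LLM call, with a crude keyword screen standing in below), a second pass reviews what already ran instead of gating each command:

```python
def review_transcript(executed_commands: list[str]) -> list[str]:
    """Hypothetical observer pass: flag executed commands for compliance review.

    The linked proposals would delegate this judgment to a separate agent;
    a simple substring screen stands in for that call here.
    """
    suspicious_markers = ("rm -rf", "sudo ", "curl | sh")
    return [
        cmd for cmd in executed_commands
        if any(marker in cmd for marker in suspicious_markers)
    ]
```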

@lc0rp added the enhancement (New feature or request) and Security 🛡️ labels Jun 12, 2023
@github-actions bot added the Stale label Sep 6, 2023
@github-actions (Contributor)

This issue was closed automatically because it has been stale for 10 days with no activity.

@github-actions bot closed this as not planned Sep 17, 2023