Implement Content Moderation #1

Open
ricardobalk opened this issue Jul 27, 2023 · 0 comments
ricardobalk commented Jul 27, 2023

OpenAI's Moderation endpoint checks text against OpenAI's content policy and reports which categories, if any, it violates. To let HackGPT users take advantage of it, we propose adding a content moderation option. With this option, users can check requests and responses against the Moderation endpoint and choose what happens based on the results.

Proposed Feature

1. Content Moderation Integration

We will add an option to HackGPT that enables content moderation checks for both incoming requests and outgoing responses. Users can enable this feature by specifying a configuration flag.
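As a rough sketch of what a single check could look like, the snippet below builds a request for the Moderation endpoint and extracts the first result from a response body. The helper names `build_moderation_request` and `parse_moderation_response` are hypothetical and not part of HackGPT; the sketch only assumes the endpoint accepts a JSON body with an `input` field and returns a `results` list.

```python
import json
import urllib.request

MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_moderation_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Moderation endpoint."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_moderation_response(raw: bytes) -> dict:
    """Return the first result object (flagged, categories, category_scores)."""
    return json.loads(raw)["results"][0]
```

The same two helpers could be used for both directions: once on the user's request before it is forwarded, and once on the model's response before it is shown.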

2. Simple Mode vs. Advanced Mode

The content moderation feature will offer two modes of operation: Simple Mode and Advanced Mode.

Simple Mode:

In this mode, users can enable content moderation with a single on/off switch. When enabled, HackGPT will perform a standard content moderation check for all categories available in the moderation endpoint.

Advanced Mode:

Advanced Mode provides users with more flexibility. Users can select specific categories from the moderation endpoint for which they want to perform content moderation. This allows users to tailor the moderation process according to their specific needs.
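One possible way to model the two modes is a small settings object, sketched below. `ModerationConfig`, `active_categories`, and the field names are assumptions made for illustration, and `KNOWN_CATEGORIES` lists only a subset of the endpoint's categories.

```python
from dataclasses import dataclass

# Illustrative subset of the Moderation endpoint's categories (assumption:
# the real list would be taken from the endpoint's documentation).
KNOWN_CATEGORIES = ("sexual", "hate", "harassment", "self-harm", "violence")


@dataclass
class ModerationConfig:
    enabled: bool = False                          # the single on/off switch
    advanced: bool = False                         # False means Simple Mode
    selected_categories: frozenset = frozenset()   # only used in Advanced Mode


def active_categories(config: ModerationConfig) -> tuple:
    """Return the categories that should be checked under this config."""
    if not config.enabled:
        return ()
    if not config.advanced:
        # Simple Mode: check every available category.
        return KNOWN_CATEGORIES
    # Advanced Mode: only the categories the user picked, in a stable order.
    return tuple(c for c in KNOWN_CATEGORIES if c in config.selected_categories)
```

Keeping Simple Mode as "Advanced Mode with everything selected" means the rest of the pipeline only ever needs the list of active categories.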

3. Threshold Configuration for Advanced Mode

When Advanced Mode is active, users will have the option to set custom thresholds for the selected categories. This involves using sliders to adjust the threshold values.

For example, a user might set a lower threshold for the "sexual" category if they wish to be more cautious about potential sexual content, whereas they could set a higher threshold for "harassment" if they want to allow a certain degree of leeway in that category.
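The threshold example above could be evaluated roughly as follows. This is a sketch that assumes thresholds are compared against the per-category scores the endpoint returns (values between 0 and 1); the function name `categories_over_threshold` and the numbers used are hypothetical.

```python
def categories_over_threshold(category_scores: dict, thresholds: dict) -> list:
    """Return the selected categories whose score meets or exceeds its threshold.

    Only categories present in `thresholds` are considered, which mirrors
    Advanced Mode's per-category selection.
    """
    return sorted(
        category
        for category, limit in thresholds.items()
        if category_scores.get(category, 0.0) >= limit
    )


# Cautious about sexual content, more lenient about harassment.
thresholds = {"sexual": 0.2, "harassment": 0.8}
scores = {"sexual": 0.3, "harassment": 0.6, "violence": 0.9}
violations = categories_over_threshold(scores, thresholds)
```

With these illustrative values only "sexual" is reported: its score passes the low threshold, "harassment" stays under its higher one, and "violence" is ignored because it was not selected.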

4. Handling of Moderation Results

The results of the moderation checks determine whether incoming requests and outgoing responses are treated as compliant with OpenAI's Content Policy. Users choose one of two ways to handle non-compliant content.

A) Blocking

In Option A, if the moderation endpoint returns "flagged: true" (that is, at least one category is flagged), HackGPT will automatically block the content: a flagged request is not forwarded, and a flagged response is not shown to the user. HackGPT will also notify the user that the content has been blocked by content moderation.

B) Flagging

In Option B, HackGPT still performs the moderation check, but instead of blocking, it lets the content through and marks it as potentially violating OpenAI's Content Policy. HackGPT will notify the user that the content has been flagged.
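The difference between the two options can be sketched as a single decision function. `Handling`, `Outcome`, `apply_handling`, and the notification texts are hypothetical names chosen for this illustration.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Handling(Enum):
    BLOCK = "block"  # Option A
    FLAG = "flag"    # Option B


class Outcome(NamedTuple):
    content: Optional[str]  # None when the content is withheld
    notice: Optional[str]   # message for the user, if any


def apply_handling(content: str, flagged: bool, handling: Handling) -> Outcome:
    """Decide what the user sees for one moderated request or response."""
    if not flagged:
        return Outcome(content, None)
    if handling is Handling.BLOCK:
        return Outcome(None, "This content was blocked by content moderation.")
    return Outcome(content, "This content was flagged by content moderation.")
```

Returning the notice alongside the content keeps the UI code free of moderation logic: it only has to render whatever the outcome contains.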


Conclusion

With the content moderation feature, HackGPT users can check that conversations comply with OpenAI's Content Policy. Simple Mode covers the common case with a single switch, while Advanced Mode lets users tune categories and thresholds to their own requirements.

@ricardobalk ricardobalk added this to the 1.0.0-alpha.2 milestone Jul 27, 2023
@ricardobalk ricardobalk self-assigned this Jul 27, 2023
@ricardobalk ricardobalk added the enhancement New feature or request label Jul 27, 2023