Implement Content Moderation #1

ricardobalk · 2023-07-27T07:41:50Z

OpenAI's Moderation endpoint is a valuable tool for detecting requests and responses that violate the content policy. To let HackGPT users make use of it, we propose adding a content moderation option. This feature will allow users to check requests and responses against the Moderation endpoint and take appropriate action based on the results.

Proposed Feature

1. Content Moderation Integration

We will add an option to HackGPT that enables content moderation checks for both incoming requests and outgoing responses. Users can enable this feature by specifying a configuration flag.
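As a rough illustration of the integration point, the sketch below gates checks for both directions of an exchange behind a single flag. The names `ModerationSettings`, `moderation_enabled`, `check_exchange`, and `fake_is_flagged` are hypothetical and not part of HackGPT; `is_flagged` stands in for a real call to the Moderation endpoint so the example runs without network access.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ModerationSettings:
    # Hypothetical flag name; the proposal only says "a configuration flag".
    moderation_enabled: bool = False


def check_exchange(
    request_text: str,
    response_text: str,
    settings: ModerationSettings,
    is_flagged: Callable[[str], bool],
) -> Dict[str, bool]:
    """Check both directions of one exchange when moderation is enabled.

    `is_flagged` stands in for a call to the Moderation endpoint and
    returns True when the given text is flagged.
    """
    if not settings.moderation_enabled:
        # Feature switched off: nothing is checked, nothing is flagged.
        return {"request": False, "response": False}
    return {
        "request": is_flagged(request_text),
        "response": is_flagged(response_text),
    }


# Stub used only to keep the example runnable offline.
def fake_is_flagged(text: str) -> bool:
    return "forbidden" in text.lower()


result = check_exchange(
    "hello", "a FORBIDDEN reply", ModerationSettings(True), fake_is_flagged
)
print(result)
```

Passing the checker in as a callable keeps the flag logic independent of how the endpoint is actually called.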

2. Simple Mode vs. Advanced Mode

The content moderation feature will offer two modes of operation: Simple Mode and Advanced Mode.

Simple Mode:

In this mode, users can enable content moderation with a single on/off switch. When enabled, HackGPT will perform a standard content moderation check for all categories available in the moderation endpoint.

Advanced Mode:

Advanced Mode provides users with more flexibility. Users can select specific categories from the moderation endpoint for which they want to perform content moderation. This allows users to tailor the moderation process according to their specific needs.
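One possible way to express the two modes, assuming the per-category booleans that the Moderation endpoint returns in its `categories` object: Simple Mode considers every category, while Advanced Mode restricts the check to a user-selected subset. The function name `flagged_categories` and the sample data are illustrative only.

```python
from typing import Dict, Iterable, List, Optional


def flagged_categories(
    categories: Dict[str, bool],
    selected: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return the flagged category names, optionally limited to `selected`.

    selected=None models Simple Mode (every category counts);
    a collection of names models Advanced Mode.
    """
    wanted = None if selected is None else set(selected)
    return sorted(
        name
        for name, flagged in categories.items()
        if flagged and (wanted is None or name in wanted)
    )


# Illustrative sample, shaped like the endpoint's `categories` object.
sample_categories = {
    "sexual": False,
    "hate": True,
    "harassment": True,
    "violence": False,
}

simple = flagged_categories(sample_categories)
advanced = flagged_categories(sample_categories, selected=["harassment", "violence"])
print(simple, advanced)
```

In this sketch, an empty result means the content passes the check for the active mode.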

3. Threshold Configuration for Advanced Mode

When Advanced Mode is active, users will have the option to set a custom threshold for each selected category, using a slider to adjust the value.

For example, a user might set a lower threshold for the "sexual" category if they wish to be more cautious about potential sexual content, whereas they could set a higher threshold for "harassment" if they want to allow a certain degree of leeway in that category.
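A minimal sketch of the threshold comparison, assuming the per-category scores between 0 and 1 that the Moderation endpoint returns in `category_scores`. The function name, the sample scores, and the threshold values are made up for illustration; whether a score equal to the threshold should count as exceeding it is a design choice, and this sketch treats it as exceeding.

```python
from typing import Dict, List


def categories_over_threshold(
    category_scores: Dict[str, float],
    thresholds: Dict[str, float],
) -> List[str]:
    """Return the selected categories whose score meets or exceeds its threshold.

    Only categories present in `thresholds` are considered, which mirrors
    Advanced Mode's per-category selection.
    """
    return sorted(
        name
        for name, limit in thresholds.items()
        if category_scores.get(name, 0.0) >= limit
    )


# Illustrative scores, shaped like the endpoint's `category_scores` object.
sample_scores = {"sexual": 0.25, "harassment": 0.5, "violence": 0.01}

# Lower threshold = more cautious; higher threshold = more leeway.
user_thresholds = {"sexual": 0.2, "harassment": 0.8}

over = categories_over_threshold(sample_scores, user_thresholds)
print(over)
```

Here "sexual" is reported because 0.25 is above the cautious 0.2 threshold, while "harassment" at 0.5 stays under the more lenient 0.8.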

4. Handling of Moderation Results

The results from the content moderation checks will be used to determine whether the incoming requests and outgoing responses comply with OpenAI's Content Policy.

A) Blocking

With Option A, if the moderation endpoint returns "flagged: true", meaning at least one category was flagged, HackGPT will automatically prevent the response from being sent to the user. It will also notify the user that the content has been blocked by content moderation.

B) Flagging

With Option B, HackGPT will still perform content moderation, but instead of automatically blocking the response it will flag content identified as potentially violating OpenAI's Content Policy. A notification will be sent to the user, indicating that the content has been flagged.
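The two options could be modelled as a small decision function like the one below. This is only a sketch: `Action`, `Outcome`, `handle_result`, and the notification texts are hypothetical names and wording, not an existing HackGPT API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    BLOCK = "block"  # Option A
    FLAG = "flag"    # Option B


@dataclass
class Outcome:
    content: Optional[str]  # None when the content is withheld
    notice: Optional[str]   # Notification shown to the user, if any


def handle_result(content: str, flagged: bool, action: Action) -> Outcome:
    """Decide what reaches the user after a moderation check."""
    if not flagged:
        # Nothing was flagged: pass the content through unchanged.
        return Outcome(content=content, notice=None)
    if action is Action.BLOCK:
        # Option A: withhold the content and tell the user why.
        return Outcome(content=None, notice="Content blocked by content moderation.")
    # Option B: deliver the content, but attach a notification.
    return Outcome(content=content, notice="Content flagged by content moderation.")


blocked = handle_result("some reply", flagged=True, action=Action.BLOCK)
marked = handle_result("some reply", flagged=True, action=Action.FLAG)
print(blocked, marked)
```

The `flagged` argument can come either from the endpoint's own result or from a custom threshold comparison, so the same handler would serve both Simple Mode and Advanced Mode.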

Conclusion

With the content moderation feature in place, HackGPT users can help keep conversations in compliance with OpenAI's Content Policy. Simple Mode and Advanced Mode give users the flexibility to adjust the content moderation settings to their specific requirements and preferences.

ricardobalk added this to the 1.0.0-alpha.2 milestone Jul 27, 2023

ricardobalk self-assigned this Jul 27, 2023

ricardobalk added the enhancement label Jul 27, 2023
