
Responsible AI improvement with Content Safety/Content Filtering (multiple items) #129

Open
4 of 6 tasks
gbecerra1982 opened this issue Oct 24, 2023 · 2 comments
gbecerra1982 commented Oct 24, 2023

List of tasks: (see this item description below)

Item description

The user should be able to define which functions from the Responsible AI plugin to use as guardrails, and their thresholds, both when receiving the ask from the user and before sending the response back to the user.

List of functions:

  • Unfairness
  • Harm detection / Text moderation (harm categories)
  • Prompt attacks / Prompt Shield
  • Protected material
  • Groundedness check (migrate to Content Safety)
  • Blocked words (migrate to Content Safety)

Notes:

  • Users can configure which functions they want to use, and their thresholds, in the gpt-rag configuration.
  • Items that can be met using native Azure OpenAI content filtering should use it, so we save API calls.
  • Orchestrator responses should contain metadata about the guardrail results so that a future APIM policy or Security Function can check and enforce them.
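The configuration and response-metadata notes above can be sketched as follows. This is a hypothetical shape, not the real gpt-rag configuration schema: the key names, function names, `attach_guardrail_metadata` helper, and the 0-7 severity scale (as used by Azure AI Content Safety) are assumptions for illustration only.

```python
# Hypothetical guardrail section of the gpt-rag configuration
# (key names are illustrative, not the actual schema).
GUARDRAIL_CONFIG = {
    "enabled_functions": ["harm_detection", "prompt_shield", "protected_material"],
    "thresholds": {  # Azure AI Content Safety severities range from 0 to 7
        "hate": 2,
        "sexual": 2,
        "violence": 2,
        "self_harm": 2,
    },
}

def attach_guardrail_metadata(response: dict, results: dict) -> dict:
    """Attach guardrail results to the orchestrator response so that a
    future APIM policy or Security Function can inspect and enforce them."""
    response = dict(response)  # do not mutate the caller's object
    response["guardrails"] = {
        "functions_evaluated": GUARDRAIL_CONFIG["enabled_functions"],
        "results": results,
    }
    return response
```

Attaching the results as a distinct `guardrails` key keeps the enforcement decision out of the orchestrator itself, which matches the note above: downstream components read the metadata and decide.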

Out-of-scope items to be handled in separate work items:

  1. IaC (Bicep) update to create and configure the Content Safety service

  2. Architecture redesign:
    Create a new Azure Function, "Custom Security Policy", that receives the text from the Orchestrator and validates that the content does not contain violence, sexual content, etc.
    This function is the beginning of a Security Function that adds security controls to the platform; additional security controls will be introduced later.
    (architecture diagram image)

We need to prepare this function so that the Security Team can add additional controls (e.g. Microsoft Purview).
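A minimal sketch of the decision step such a "Custom Security Policy" function could perform, assuming the text has already been analyzed (for example by Azure AI Content Safety, which reports a per-category severity from 0 to 7). The function name, default threshold, and result shape are hypothetical, not part of the actual design.

```python
def evaluate_security_policy(severities: dict, thresholds: dict) -> dict:
    """Compare per-category severities (0-7) against configured thresholds
    and return a verdict the Orchestrator can act on. Future controls
    (e.g. Microsoft Purview checks) could append their own entries."""
    violations = {
        category: severity
        for category, severity in severities.items()
        if severity >= thresholds.get(category, 2)  # assumed default threshold
    }
    return {"allowed": not violations, "violations": violations}
```

Keeping this step as a pure function makes it easy to host in the new Azure Function and to extend with additional controls later without touching the Orchestrator.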

References:
https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-ai-announces-prompt-shields-for-jailbreak-and-indirect/ba-p/4099140

https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/detect-and-mitigate-ungrounded-model-outputs/ba-p/4099261

@placerda placerda transferred this issue from Azure/GPT-RAG Nov 20, 2023
@placerda placerda changed the title Add Content Safety Integrated Add Content Safety Integrated (Orchestration) Apr 2, 2024
@gbecerra1982 gbecerra1982 transferred this issue from Azure/gpt-rag-orchestrator Apr 8, 2024
@placerda placerda changed the title Add Content Safety Integrated (Orchestration) Responsible AI improvement with Content Safety Apr 9, 2024
@placerda placerda changed the title Responsible AI improvement with Content Safety Responsible AI improvement with Content Safety (multiple items) Apr 9, 2024
@placerda placerda changed the title Responsible AI improvement with Content Safety (multiple items) Responsible AI improvement with Content Safety/Content Filtering (multiple items) May 7, 2024
@vladborys

@placerda

I'll pull this to my fork and then create the pull request after doing the bash/sh adjustments

@placerda placerda self-assigned this May 21, 2024
Labels: none yet
Projects: none yet
Development: no branches or pull requests
3 participants