Conversation

letmerecall
Contributor

Description

This PR enhances Private AI integration by adding support for PII masking capabilities. Previously, Private AI integration allowed for PII detection, preventing texts containing PII from being sent to the LLM or end user. With the new masking feature, texts can now be sanitized and shared with the LLM or user in a masked format.

For example, given the user input:

My email id is user@email.com

The PII masking configuration can transform the text into:

My email id is [EMAIL]

This enables safe data handling while maintaining the contextual integrity of the input.
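Conceptually, the masking step behaves like the following toy sketch. This is only a regex-based stand-in illustrating the input/output contract shown above; the actual integration sends the text to the Private AI API, where entity detection is performed server-side.

```python
import re

# Toy stand-in for the masking behaviour described above. The real
# integration calls the Private AI API rather than matching regexes.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def mask_pii(text: str) -> str:
    """Replace detected email addresses with an [EMAIL] placeholder."""
    return EMAIL_RE.sub("[EMAIL]", text)

print(mask_pii("My email id is user@email.com"))  # -> My email id is [EMAIL]
```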

@Pouyanpi

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.

@letmerecall
Contributor Author

@Pouyanpi hope you had a wonderful holiday. PTAL whenever you find some time.

@Pouyanpi Pouyanpi self-assigned this Jan 8, 2025
@Pouyanpi Pouyanpi self-requested a review January 8, 2025 11:46
@Pouyanpi Pouyanpi added enhancement New feature or request status: in review labels Jan 8, 2025
    -    return await resp.json()
    +    result = await resp.json()
    +    return any(res["entities_present"] for res in result)
Collaborator

@Pouyanpi Pouyanpi Jan 9, 2025

improve error handling for response parsing

As a suggestion, but please decide for yourself; I have not given it much thought.

    try:
        response_json = await resp.json()
    except aiohttp.ContentTypeError:
        raise ValueError(
            f"Failed to parse response as JSON. Status: {resp.status}, "
            f"Content: {await resp.text()}"
        )

Contributor Author

This sounds just right. Other issues should be captured by the status check above.

        server_endpoint,
        pai_api_key,
    )
    return private_ai_response[0]["processed_text"]
Collaborator

  • Ensure that private_ai_response[0]["processed_text"] is always valid to avoid potential index errors.
  • Improve error handling
  • Update docstring with Raises (see ref)
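One way to address the first point could look like the sketch below. It is hypothetical: the function and variable names are illustrative, not the PR's actual code, and the exact response shape of the Private AI API is an assumption here.

```python
def extract_processed_text(private_ai_response):
    """Safely pull the masked text out of a Private AI-style response.

    Raises:
        ValueError: If the response is empty or missing "processed_text".
    """
    if not private_ai_response:
        raise ValueError("Private AI returned an empty response")
    first = private_ai_response[0]
    if "processed_text" not in first:
        raise ValueError(f"Unexpected response shape: {first!r}")
    return first["processed_text"]

# A well-formed response yields the masked text.
print(extract_processed_text([{"processed_text": "My email id is [EMAIL]"}]))
```

The guard clauses turn a bare `IndexError`/`KeyError` into a `ValueError` with a message that can go straight into the docstring's Raises section.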

Collaborator

@Pouyanpi Pouyanpi left a comment

Thank you @letmerecall for the PR, please review my comments.

Current tests for PII detection and masking use mocked responses. While this is great for unit testing, it doesn't really exercise the actual integration with the Private AI API. I think we should consider adding some integration tests using temporary credentials to make sure everything works end-to-end. What do you think, @letmerecall? I am not quite sure whether the mock requests and responses accurately mimic the Private AI API. You can have a look at the sensitive content detection tests or their refinement in #845.

Also, let's make sure our tests cover error scenarios, like what happens if the API key is missing or invalid. This will help us catch any potential issues early.
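The error-scenario tests being asked for could be sketched like this. `detect_pii` is a hypothetical stand-in for the actual action, and the specific exceptions are assumptions; the real tests in the PR would exercise the mocked transport or the live API instead.

```python
# Hypothetical error-scenario checks for the Private AI integration.
def detect_pii(text, api_key=None):
    """Stand-in for the real action; validates the key before any API call."""
    if not api_key:
        raise ValueError("PRIVATE_AI_API_KEY is missing")
    if api_key == "invalid":
        raise PermissionError("Private AI rejected the API key (401)")
    return []  # pretend: no entities found

def test_missing_key_raises():
    try:
        detect_pii("My email id is user@email.com", api_key=None)
    except ValueError as exc:
        assert "missing" in str(exc)
    else:
        raise AssertionError("expected ValueError for missing key")

def test_invalid_key_raises():
    try:
        detect_pii("hello", api_key="invalid")
    except PermissionError:
        pass
    else:
        raise AssertionError("expected PermissionError for invalid key")

test_missing_key_raises()
test_invalid_key_raises()
print("error-scenario tests passed")
```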

@Pouyanpi Pouyanpi assigned letmerecall and unassigned Pouyanpi Jan 9, 2025
@Pouyanpi Pouyanpi mentioned this pull request Jan 9, 2025
@letmerecall
Contributor Author

> Thank you @letmerecall for the PR, please review my comments.
>
> Current tests for PII detection and masking use mocked responses. While this is great for unit testing, it doesn't really exercise the actual integration with the Private AI API. I think we should consider adding some integration tests using temporary credentials to make sure everything works end-to-end. What do you think, @letmerecall? I am not quite sure whether the mock requests and responses accurately mimic the Private AI API. You can have a look at the sensitive content detection tests or their refinement in #845.
>
> Also, let's make sure our tests cover error scenarios, like what happens if the API key is missing or invalid. This will help us catch any potential issues early.

Having a temp Private AI API key to test is a good idea. Let me get back to you on that.

Thanks for the review and the references. I'll address them soon :)

letmerecall and others added 3 commits January 20, 2025 11:42
Co-authored-by: Pouyan <13303554+Pouyanpi@users.noreply.github.com>
Signed-off-by: Girish Sharma <girishsharma001@gmail.com>
@letmerecall
Contributor Author

Hey @Pouyanpi, added error handling and live API testing.

Will it be possible for you to generate an API key using our portal (https://portal.private-ai.com)? We can then bake it in as a secret for GitHub Actions. If you do, let me know the email id you used to create the key, and we can increase the quota accordingly :)

@Pouyanpi Pouyanpi self-requested a review January 21, 2025 07:54
Collaborator

@Pouyanpi Pouyanpi left a comment

Thank you @letmerecall 👍🏻

@Pouyanpi Pouyanpi merged commit 35cc703 into NVIDIA-NeMo:develop Jan 21, 2025
10 checks passed