Conversation

letmerecall
Contributor

Description

This PR enhances Private AI integration by adding support for PII masking capabilities. Previously, Private AI integration allowed for PII detection, preventing texts containing PII from being sent to the LLM or end user. With the new masking feature, texts can now be sanitized and shared with the LLM or user in a masked format.

For example, given the user input:

My email id is user@email.com

The PII masking configuration can transform the text into:

My email id is [EMAIL]

This enables safe data handling while maintaining the contextual integrity of the input.
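Conceptually, the masking step behaves like the following toy sketch. This is only a regex-based stand-in illustrating the input/output contract shown above; the actual integration sends the text to the Private AI API, where entity detection is performed server-side.

```python
import re

# Toy stand-in for the masking behaviour described above. The real
# integration calls the Private AI API rather than matching regexes.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def mask_pii(text: str) -> str:
    """Replace detected email addresses with an [EMAIL] placeholder."""
    return EMAIL_RE.sub("[EMAIL]", text)

print(mask_pii("My email id is user@email.com"))  # -> My email id is [EMAIL]
```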

@Pouyanpi

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.

@letmerecall
Contributor Author

@Pouyanpi hope you had a wonderful holiday. PTAL whenever you find some time.

@Pouyanpi Pouyanpi self-assigned this Jan 8, 2025
@Pouyanpi Pouyanpi self-requested a review January 8, 2025 11:46
@Pouyanpi Pouyanpi added enhancement New feature or request status: in review labels Jan 8, 2025
    -    return await resp.json()
    +    result = await resp.json()
    +    return any(res["entities_present"] for res in result)
Collaborator

@Pouyanpi Pouyanpi Jan 9, 2025

improve error handling for response parsing

As a suggestion, but please decide for yourself; I have not given it much thought.

    try:
        response_json = await resp.json()
    except aiohttp.ContentTypeError:
        raise ValueError(
            f"Failed to parse response as JSON. Status: {resp.status}, "
            f"Content: {await resp.text()}"
        )

Contributor Author

This sounds just right. Other issues should be captured by the status check above.

        server_endpoint,
        pai_api_key,
    )
    return private_ai_response[0]["processed_text"]
Collaborator

  • Ensure that private_ai_response[0]["processed_text"] is always valid to avoid potential index errors.
  • Improve error handling
  • Update docstring with Raises (see ref)
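One way to address the first point could look like the sketch below. It is hypothetical: the function and variable names are illustrative, not the PR's actual code, and the exact response shape of the Private AI API is an assumption here.

```python
def extract_processed_text(private_ai_response):
    """Safely pull the masked text out of a Private AI-style response.

    Raises:
        ValueError: If the response is empty or missing "processed_text".
    """
    if not private_ai_response:
        raise ValueError("Private AI returned an empty response")
    first = private_ai_response[0]
    if "processed_text" not in first:
        raise ValueError(f"Unexpected response shape: {first!r}")
    return first["processed_text"]

# A well-formed response yields the masked text.
print(extract_processed_text([{"processed_text": "My email id is [EMAIL]"}]))
```

The guard clauses turn a bare `IndexError`/`KeyError` into a `ValueError` with a message that can go straight into the docstring's Raises section.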

Collaborator

@Pouyanpi Pouyanpi left a comment

Thank you @letmerecall for the PR, please review my comments.

Current tests for PII detection and masking use mocked responses. While this is great for unit testing, it doesn't really exercise the actual integration with the Private AI API. I think we should consider adding some integration tests using temporary credentials to make sure everything works end-to-end. What do you think, @letmerecall? I am not quite sure whether the mock requests and responses accurately mimic the Private AI API. You can have a look at the sensitive content detection tests or their refinement in #845.

Also, let's make sure our tests cover error scenarios, like what happens if the API key is missing or invalid. This will help us catch any potential issues early.
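The error-scenario tests being asked for could be sketched like this. `detect_pii` is a hypothetical stand-in for the actual action, and the specific exceptions are assumptions; the real tests in the PR would exercise the mocked transport or the live API instead.

```python
# Hypothetical error-scenario checks for the Private AI integration.
def detect_pii(text, api_key=None):
    """Stand-in for the real action; validates the key before any API call."""
    if not api_key:
        raise ValueError("PRIVATE_AI_API_KEY is missing")
    if api_key == "invalid":
        raise PermissionError("Private AI rejected the API key (401)")
    return []  # pretend: no entities found

def test_missing_key_raises():
    try:
        detect_pii("My email id is user@email.com", api_key=None)
    except ValueError as exc:
        assert "missing" in str(exc)
    else:
        raise AssertionError("expected ValueError for missing key")

def test_invalid_key_raises():
    try:
        detect_pii("hello", api_key="invalid")
    except PermissionError:
        pass
    else:
        raise AssertionError("expected PermissionError for invalid key")

test_missing_key_raises()
test_invalid_key_raises()
print("error-scenario tests passed")
```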

@Pouyanpi Pouyanpi assigned letmerecall and unassigned Pouyanpi Jan 9, 2025
@Pouyanpi Pouyanpi mentioned this pull request Jan 9, 2025
@letmerecall
Contributor Author

> Thank you @letmerecall for the PR, please review my comments.
>
> Current tests for PII detection and masking use mocked responses. While this is great for unit testing, it doesn't really exercise the actual integration with the Private AI API. I think we should consider adding some integration tests using temporary credentials to make sure everything works end-to-end. What do you think, @letmerecall? I am not quite sure whether the mock requests and responses accurately mimic the Private AI API. You can have a look at the sensitive content detection tests or their refinement in #845.
>
> Also, let's make sure our tests cover error scenarios, like what happens if the API key is missing or invalid. This will help us catch any potential issues early.

Having a temp Private AI API key to test is a good idea. Let me get back to you on that.

Thanks for the review and the references. I'll address them soon :)

letmerecall and others added 3 commits January 20, 2025 11:42
Co-authored-by: Pouyan <13303554+Pouyanpi@users.noreply.github.com>
Signed-off-by: Girish Sharma <girishsharma001@gmail.com>
@letmerecall
Contributor Author

Hey @Pouyanpi, added error handling and live API testing.

Will it be possible for you to generate an API key using our portal (https://portal.private-ai.com)? We can then bake it in as a secret for GitHub Actions. If you do, let me know the email id you used to create the key, and we can increase the quota accordingly :)

@Pouyanpi Pouyanpi self-requested a review January 21, 2025 07:54
Collaborator

@Pouyanpi Pouyanpi left a comment

Thank you @letmerecall 👍🏻

@Pouyanpi Pouyanpi merged commit 35cc703 into NVIDIA-NeMo:develop Jan 21, 2025
10 checks passed