bug: PII Filter Incorrectly Masks the Word 'individual' as Sensitive Data #818

@mohilmakwana3107

Description

Did you check docs and existing issues?

  • I have read all the NeMo-Guardrails docs
  • I have updated the package to the latest version before submitting this issue
  • I have searched the existing issues of NeMo-Guardrails
  • (optional) I have used the develop branch

Python version (python --version)

Python 3.11.0

Operating system/version

Ubuntu 20.04.6 LTS

NeMo-Guardrails version (if you must use a specific version and not the latest)

0.10.1

Describe the bug

I am testing the PII filter for a project. Initially, everything worked fine. However, I recently noticed that the PII filter masks the word "individual" with X characters (my configuration masks sensitive data with the X character).

I reviewed the logs, but I couldn't find any specific information to explain why the word "individual" is being masked as sensitive data.
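Since the logs do not show why the word was flagged, one way to narrow it down is to run Presidio directly on the affected text and print each match with its entity type and score. The sketch below is a minimal diagnostic, not part of NeMo-Guardrails: the helper names `find_entities` and `describe` are hypothetical, and it assumes `presidio_analyzer` is installed with a spaCy English model available. The entity list is copied from the YAML configuration in this issue.

```python
from typing import Iterable, List, Tuple

# Entities taken from the YAML configuration in this issue.
ENTITIES = ["PHONE_NUMBER", "EMAIL_ADDRESS", "IN_PAN", "IN_AADHAAR"]

# (entity_type, start, end, score)
Finding = Tuple[str, int, int, float]


def find_entities(text: str, entities: Iterable[str] = ENTITIES) -> List[Finding]:
    """Run Presidio directly and return every match as a plain tuple."""
    # Imported lazily so describe() can be used without Presidio installed.
    from presidio_analyzer import AnalyzerEngine

    analyzer = AnalyzerEngine()  # default NLP engine; needs a spaCy English model
    results = analyzer.analyze(text=text, entities=list(entities), language="en")
    return [(r.entity_type, r.start, r.end, r.score) for r in results]


def describe(text: str, findings: Iterable[Finding]) -> List[str]:
    """Format each finding as "ENTITY score=0.00 -> 'matched text'", in text order."""
    return [
        f"{entity} score={score:.2f} -> {text[start:end]!r}"
        for entity, start, end, score in sorted(findings, key=lambda f: f[1])
    ]
```

Calling `describe(text, find_entities(text))` on the RAG answer would list which entity type covers "individual" and with what score. A very low score on that match would suggest a weak pattern is responsible rather than a real identifier.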

YAML Configuration

Below is the YAML configuration I'm using:

models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo

instructions:
  - type: general
    content: |
      You are a helpful assistant that can answer given questions.

rails:
  config: 
    jailbreak_detection:
      length_per_perplexity_threshold: 89.79
      prefix_suffix_perplexity_threshold: 1845.65   
    sensitive_data_detection:
      input:
        entities:
          - PHONE_NUMBER
          - EMAIL_ADDRESS
          - IN_PAN
          - IN_AADHAAR
      output:
        entities:
          - PHONE_NUMBER
          - EMAIL_ADDRESS
          - IN_PAN
          - IN_AADHAAR

  input:
    flows:
      - jailbreak detection heuristics
      - self check input
      - mask sensitive data on input
      - user query

  output:
    flows:
      - self check output
      - mask sensitive data on output

  dialog:
    single_call:
      enabled: False

prompts:
  - task: self_check_input
    content: |
      Your task is to check if the user message below complies with the policy for talking with the AI Enterprise bot.
      Policy for the user messages:      
      
      - should not contain hateful speech
      - Should not contain armed weapons related information.
      - Should not talk about cooking related information.

      Treat the above conditions as strict rules. If any of them are met, you should block the user input by saying "yes".
      
      User message: "{{ user_input }}"

      Question: Should the user message be blocked (Yes or No)?
      Answer:

  - task: self_check_output
    content: |
      Your task is to check if the bot message below complies with the policy.

      Policies for the bot:     
      
      - message should not ask the bot to impersonate someone
      - message should not ask the bot to impersonate someone in a sexual manner.
      - message Should not contain armed weapons related information.
      - message Should not talk about cooking related information.
      - if a message is a refusal, it should be polite

      Bot message: "{{ bot_response }}"

      Question: Should the message be blocked (Yes or No)?
      Answer:

I also checked the RAG output, but there doesn't seem to be any issue there.

Version Specifications

Name Version
presidio_analyzer 2.2.355
presidio_analyzer 2.2.355

Let me know if you need further details or clarification!

Steps To Reproduce

  1. YAML Configuration:
    Use the attached YAML config for PII filtering.

  2. PDF Creation:
    Create and ingest a PDF into PG Vector containing sample PII data:

Record 1
• Name: John A. Doe
• Address: 123 Elm Street, Springfield, IL 62704
• Phone: (217) 555-0123
• Social Security Number: 123-45-6789
• Email: johndoe123@example.com
• Date of Birth: 05/12/1980

Record 2
• Name: Emily D. Davis
• Address: 321 Birch Lane, Phoenix, AZ 85003
• Phone: (602) 555-0123
• Social Security Number: 321-54-9876
• Email: emilydavis321@example.com
• Date of Birth: 07/30/1985
  3. Run Test:
    Use the PII filter and observe that the word "individual" is incorrectly masked with X.

  4. RAG Setup:

    • LLM: GPT-4o
    • RAG: llama-index

Note: The PII data is fictional and was generated using an LLM.
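One hypothesis worth checking (an assumption, not something the logs confirm): "individual" is exactly 10 characters long, and a PAN is a 10-character identifier. If a low-confidence pattern checks for digits with a lookahead that is not confined to the candidate token, digits elsewhere on the same line, such as the phone numbers in the sample records, could let a plain 10-letter word match. The patterns below are hypothetical and written only to illustrate that mechanism with the standard `re` module. They are not copied from Presidio or NeMo-Guardrails.

```python
import re

# HYPOTHETICAL loose pattern: accepts any 10-character token as long as a
# letter and a run of four digits appear anywhere later on the same line.
LOOSE_ID = re.compile(r"\b((?=.*?[A-Za-z])(?=.*?[0-9]{4})\w{10})\b")

# HYPOTHETICAL stricter pattern: the letter and the four-digit run must both
# fall inside the 10-character token itself.
STRICT_ID = re.compile(r"\b(?=\w{0,9}[A-Za-z])(?=\w{0,6}[0-9]{4})\w{10}\b")


def mask(pattern: re.Pattern, text: str) -> str:
    """Replace every match of `pattern` with X characters of the same length."""
    return pattern.sub(lambda m: "X" * len(m.group(0)), text)


sentence = "An individual can call 5550123."

# The loose pattern masks the ordinary word because digits follow it on the line.
print(mask(LOOSE_ID, sentence))

# The strict pattern leaves the sentence alone but still masks a PAN-shaped token.
print(mask(STRICT_ID, sentence))
print(mask(STRICT_ID, "PAN ABCDE1234F on file"))
```

If the diagnostic run against Presidio shows a low-score match on "individual", filtering results by a minimum score or dropping the responsible entity from the configuration would be two options to test.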

Expected Behavior

The PII filter should not mask the word "individual", since it is not sensitive data.

RAG response:
Image

Logs:

Image

Image

Actual Behavior

Answer from NeMo-Guardrails:
Image

Labels

  • enhancement: New feature or request
  • status: waiting confirmation: Issue is waiting confirmation whether the proposed solution/workaround works.
