# Data Masking

The data masking module anonymizes or pseudonymizes personally identifiable information (PII) before it is processed by the LLM module. Anonymization replaces all identifying information with generic placeholders (e.g., MASKED_ENTITY); the original data cannot be recovered, so no trace of it is retained. Pseudonymization, in contrast, substitutes each identified item with a unique placeholder (e.g., MASKED_ENTITY_ID), so the original information can be restored if needed. In both cases, the masking module identifies sensitive data and replaces it with the appropriate placeholders before further processing.
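To make the difference between the two methods concrete, here is a minimal, purely local sketch. It is not how the masking service is implemented: the regular expression, the placeholder names (`MASKED_EMAIL`, `MASKED_EMAIL_1`), and the helper functions `anonymize`, `pseudonymize`, and `restore` are all assumptions made up for illustration, and only e-mail addresses are handled.

```python
import re

# Toy pattern for illustration only; real PII detection is far more involved.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Matches the numbered placeholders produced by pseudonymize() below.
PLACEHOLDER_PATTERN = re.compile(r"MASKED_EMAIL_\d+")


def anonymize(text: str) -> str:
    """Replace every e-mail address with the same generic placeholder.

    No mapping is kept, so the originals cannot be recovered.
    """
    return EMAIL_PATTERN.sub("MASKED_EMAIL", text)


def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct e-mail address with a numbered placeholder.

    Returns the masked text and a placeholder-to-original mapping.
    """
    mapping: dict[str, str] = {}  # placeholder -> original value
    seen: dict[str, str] = {}     # original value -> placeholder

    def replace(match: re.Match) -> str:
        original = match.group(0)
        if original not in seen:
            placeholder = f"MASKED_EMAIL_{len(seen) + 1}"
            seen[original] = placeholder
            mapping[placeholder] = original
        return seen[original]

    return EMAIL_PATTERN.sub(replace, text), mapping


def restore(masked_text: str, mapping: dict[str, str]) -> str:
    """Put the original values back using the stored mapping."""
    return PLACEHOLDER_PATTERN.sub(
        lambda m: mapping.get(m.group(0), m.group(0)), masked_text
    )


if __name__ == "__main__":
    sample = "Contact max.mustermann@sap.com or erika@example.org."
    print(anonymize(sample))
    masked, mapping = pseudonymize(sample)
    print(masked)
    print(restore(masked, mapping))
```

The key design point is the mapping: anonymization discards it, which makes the replacement irreversible, while pseudonymization keeps it so that placeholders can later be swapped back for the original values.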

First, import and configure the `DataMasking` module. Set a masking method (either `ANONYMIZATION` or `PSEUDONYMIZATION`) and list the entities to mask (`EMAIL`, `PHONE`, `PERSON`, `ORG`, `LOCATION`).

In [None]:
from gen_ai_hub.orchestration.models.data_masking import DataMasking
from gen_ai_hub.orchestration.models.sap_data_privacy_integration import SAPDataPrivacyIntegration, MaskingMethod, ProfileEntity

data_masking = DataMasking(
    providers=[
        SAPDataPrivacyIntegration(
            method=MaskingMethod.ANONYMIZATION,  # or MaskingMethod.PSEUDONYMIZATION
            entities=[
                ProfileEntity.EMAIL,
                ProfileEntity.PHONE,
                ProfileEntity.PERSON,
                ProfileEntity.ORG,
                ProfileEntity.LOCATION
            ]
        )
    ]
)

To complete the configuration, pass the configured `DataMasking` module to the `OrchestrationConfig`, just as we did for the other modules.

In [None]:
import os

from gen_ai_hub.orchestration.models.config import OrchestrationConfig
from gen_ai_hub.orchestration.models.template import Template, TemplateValue
from gen_ai_hub.orchestration.models.llm import LLM
from gen_ai_hub.orchestration.models.message import SystemMessage, UserMessage
from gen_ai_hub.orchestration.service import OrchestrationService

config = OrchestrationConfig(
    template=Template(
        messages=[
            SystemMessage("You are a helpful AI assistant."),
            UserMessage("Please repeat the following input: {{?pii}}"),
        ]
    ),
    llm=LLM(
        name="gemini-1.5-flash",
    ),
    data_masking=data_masking
)

orchestration_service = OrchestrationService(
    api_url=os.environ["AICORE_ORCHESTRATION_DEPLOYMENT_URL"],
    config=config,
)

result = orchestration_service.run(
    config=config,
    template_values=[
        TemplateValue(
            name="pii",
            value="My name is Max Mustermann. You can contact me via max.mustermann@sap.com. I live in Dietmar-Hopp-Allee 16, Walldorf Germany.",
        )
    ]
)

print(result.orchestration_result.choices[0].message.content)

Try anonymizing or pseudonymizing different entities, or change the user message, and compare the responses.
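As one possible variation, the configuration below switches to pseudonymization and masks only e-mail addresses and person names. It uses only the classes and enum members already shown in this exercise; the variable name `pseudonymizing_masking` is just an illustrative choice.

```python
from gen_ai_hub.orchestration.models.data_masking import DataMasking
from gen_ai_hub.orchestration.models.sap_data_privacy_integration import (
    SAPDataPrivacyIntegration,
    MaskingMethod,
    ProfileEntity,
)

# Variation: pseudonymize only e-mail addresses and person names.
# Other entities (phone numbers, organizations, locations) are left untouched.
pseudonymizing_masking = DataMasking(
    providers=[
        SAPDataPrivacyIntegration(
            method=MaskingMethod.PSEUDONYMIZATION,
            entities=[
                ProfileEntity.EMAIL,
                ProfileEntity.PERSON,
            ],
        )
    ]
)
```

Pass this object as `data_masking` to `OrchestrationConfig` in place of the earlier one and rerun the request to see how the response changes.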

# Summary

In this exercise, you learned how data can be masked using orchestration. Now let's combine these capabilities into a more complex scenario. Continue to [Exercise 4 - Orchestration Chatbot](./ex4.ipynb).