# PIIDetectionService - MSFT Presidio
The following example shows how to use `PIIDetectionService` with the MSFT Presidio analyzer.

> Note: This notebook requires all dependencies to be installed. For more information, review the README.


```python
from pii_codex.services.detection_service import PIIDetectionService
from pii_codex.models.common import PIIType
from pii_codex.models.microsoft_presidio_pii import MSFTPresidioPIIType

# Map pii-codex PII types to the entity names Presidio expects
presidio_entities = [MSFTPresidioPIIType[PIIType.PHONE_NUMBER.name].value,
                     MSFTPresidioPIIType[PIIType.EMAIL_ADDRESS.name].value]

# Perform PII detection with MSFT Presidio
detection_results = PIIDetectionService().analyze_with_msft_presidio(
    text="Here is my contact information: Phone number 555-555-5555 and my email is example123@email.com",
    entities=presidio_entities,
    language_code="en"
)

detection_results
```

```
DetectionResultList(detection_results=[DetectionResult(entity_type='EMAIL_ADDRESS', score=1.0, start=74, end=94), DetectionResult(entity_type='PHONE_NUMBER', score=0.75, start=45, end=57)])
```
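The entity list passed to Presidio is built by looking up an `MSFTPresidioPIIType` member by the *name* of a `PIIType` member and taking its `.value`. The sketch below reproduces that lookup pattern with two hypothetical stand-in enums; their members and string values are assumptions for illustration, not copied from pii-codex.

```python
from enum import Enum

# Hypothetical stand-ins for pii_codex's PIIType and MSFTPresidioPIIType.
# Member names and values here are assumptions, not the library's definitions.
class PIIType(Enum):
    PHONE_NUMBER = "Phone Number"
    EMAIL_ADDRESS = "Email Address"


class MSFTPresidioPIIType(Enum):
    PHONE_NUMBER = "PHONE_NUMBER"
    EMAIL_ADDRESS = "EMAIL_ADDRESS"


def to_presidio_entities(pii_types):
    """Translate generic PII types into provider entity names via member-name lookup."""
    # Enum[...] looks a member up by its name; an unknown name raises KeyError.
    return [MSFTPresidioPIIType[pii_type.name].value for pii_type in pii_types]


presidio_entities = to_presidio_entities([PIIType.PHONE_NUMBER, PIIType.EMAIL_ADDRESS])
print(presidio_entities)  # ['PHONE_NUMBER', 'EMAIL_ADDRESS']
```

Matching on member names lets each enum keep its own values, as long as the member names line up across the two.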

```python
# Loop through all detected PII
for detected_pii in detection_results.detection_results:
    print(detected_pii)
```

```
DetectionResult(entity_type='EMAIL_ADDRESS', score=1.0, start=74, end=94)
DetectionResult(entity_type='PHONE_NUMBER', score=0.75, start=45, end=57)
```
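Each detection carries character offsets into the analyzed text, with `end` exclusive: `start=45, end=57` covers the 12 characters of `555-555-5555`. The sketch below uses a hypothetical `Detection` dataclass that mirrors the printed fields (it is not the pii-codex class) to recover each matched substring and to keep only detections at or above a confidence threshold. The 0.8 threshold is an arbitrary assumption.

```python
from dataclasses import dataclass


# Hypothetical stand-in mirroring the fields printed for pii_codex's DetectionResult.
@dataclass
class Detection:
    entity_type: str
    score: float
    start: int
    end: int  # exclusive, so text[start:end] is the matched span


text = ("Here is my contact information: Phone number 555-555-5555 "
        "and my email is example123@email.com")

# Values copied from the output above.
detections = [
    Detection("EMAIL_ADDRESS", 1.0, 74, 94),
    Detection("PHONE_NUMBER", 0.75, 45, 57),
]

MIN_SCORE = 0.8  # assumed threshold; tune for your use case

# Recover the matched text for every detection, in order of appearance.
matches = [(d.entity_type, text[d.start:d.end], d.score)
           for d in sorted(detections, key=lambda d: d.start)]
for entity_type, matched, score in matches:
    print(f"{entity_type:<14} {score:.2f} {matched!r}")

# Keep only high-confidence detections.
confident = [d for d in detections if d.score >= MIN_SCORE]
print([d.entity_type for d in confident])  # ['EMAIL_ADDRESS']
```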


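Alternatively, you can call the analysis service (`PIIAnalysisService`), which returns both the detections and a risk assessment (risk level plus HIPAA, DHS, and NIST categories) for each detected entity. No `entities` list is passed in this call, and the results include an additional `URL` detection covering the email domain (`email.com`, characters 85 to 94).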

```python
from pii_codex.models.common import AnalysisProviderType, AnalysisEncoder
from pii_codex.services.analysis_service import PIIAnalysisService
from pii_codex.services.ranking_service import PIIRanker
import pandas as pd

# Run detection and risk assessment in a single call
analysis_results = PIIAnalysisService().run_analysis(
    analysis_provider=AnalysisProviderType.MICROSOFT_PRESIDIO_ANALYSIS.name,
    text="Here is my contact information: Phone number 555-555-5555 and my email is example123@email.com",
    language_code="en"
)

# Average the risk assessment scores across all detections
mean_risk_score = PIIRanker().calculate_risk_assessment_score_average(analysis_results.analysis_results)

# Encoded form of the results (not used further in this example)
results = AnalysisEncoder().encode(analysis_results.analysis_results)

# Tabulate the results
df = pd.DataFrame(analysis_results.to_dict())
print("Mean Score for Assessment: ", mean_risk_score)
print("Data from Assessment:")
df
```

```
Mean Score for Assessment:  2.6666666666666665
Data from Assessment:
```

| | pii_type_detected | risk_level | risk_level_definition | cluster_membership_type | hipaa_category | dhs_category | nist_category | entity_type | score | start | end |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | EMAIL_ADDRESS | 3 | Identifiable | Personal Preferences | Protected Health Information | Stand Alone PII | Directly PII | EMAIL_ADDRESS | 1.0 | 74 | 94 |
| 1 | PHONE_NUMBER | 3 | Identifiable | Contact Information | Protected Health Information | Stand Alone PII | Directly PII | PHONE_NUMBER | 0.75 | 45 | 57 |
| 2 | URL | 2 | Semi-Identifiable | Community Interaction | Not Protected Health Information | Linkable | Linkable | URL | 0.5 | 85 | 94 |
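The reported mean score equals the arithmetic mean of the `risk_level` column: (3 + 3 + 2) / 3 = 8/3 ≈ 2.667. The table also shows that the `URL` detection (characters 85 to 94) lies entirely inside the `EMAIL_ADDRESS` span (74 to 94), so those characters contribute to the mean through two detections. The sketch below recomputes the mean and flags nested detections. It works on plain tuples copied from the table rather than on pii-codex objects, and the assumption that `calculate_risk_assessment_score_average` computes exactly this mean is based only on the matching value.

```python
from statistics import mean

# Rows copied from the table above: (entity_type, risk_level, start, end).
rows = [
    ("EMAIL_ADDRESS", 3, 74, 94),
    ("PHONE_NUMBER", 3, 45, 57),
    ("URL", 2, 85, 94),
]

# Arithmetic mean of the risk levels (8/3).
mean_risk = mean(level for _, level, _, _ in rows)
print("Mean risk level:", mean_risk)


def nested_pairs(spans):
    """Return (inner, outer) entity pairs where one span lies entirely within another."""
    pairs = []
    for inner_type, _, inner_start, inner_end in spans:
        for outer_type, _, outer_start, outer_end in spans:
            same_span = (inner_start, inner_end) == (outer_start, outer_end)
            if not same_span and outer_start <= inner_start and inner_end <= outer_end:
                pairs.append((inner_type, outer_type))
    return pairs


print(nested_pairs(rows))  # [('URL', 'EMAIL_ADDRESS')]
```

Flagging nested spans this way can help decide whether a sub-match such as the email domain should be dropped before averaging.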
