# **OCR for Sensitive Data Protection in Images**
                                                    
This notebook shows how Azure AI OCR can help detect and protect sensitive data embedded in images. Azure AI OCR extracts text from an image of a Social Security card, and the extracted text is then passed to the Azure PII detection API, which detects and censors (redacts) the sensitive text found in the image.

## **Prerequisites**
•	Azure subscription - [Create one for free](https://azure.microsoft.com/en-us/free/ai-services/).  
•	Python 3.8 or later (the `azure-ai-vision-imageanalysis` package requires it)  
•	Once you have your Azure subscription, create an [Azure AI Services Resource](https://ms.portal.azure.com/#view/Microsoft_Azure_Marketplace/GalleryItemDetailsBladeNopdl/id/Microsoft.CognitiveServicesAllInOne/selectionMode~/false/resourceGroupId//resourceGroupLocation//dontDiscardJourney~/false/selectedMenuId/home/launchingContext~/%7B%22galleryItemId%22%3A%22Microsoft.CognitiveServicesAllInOne%22%2C%22source%22%3A%5B%22GalleryFeaturedMenuItemPart%22%2C%22VirtualizedTileDetails%22%5D%2C%22menuItemId%22%3A%22home%22%2C%22subMenuItemId%22%3A%22Search%20results%22%2C%22telemetryId%22%3A%2283634c2f-d125-43ab-97bc-7a640bbe21b8%22%7D/searchTelemetryId/371171fe-5873-4a59-b146-a99c27091437) in the Azure portal. After it deploys, select **Go to resource** and copy the key and endpoint. You'll paste them into the code below to connect your application to the API.  


## Architectural Diagram

![Architectural Diagram](diagram.jpg)

## Example Image

![Example Image](example.png)

## Set up

In [None]:
pip install azure-ai-vision-imageanalysis

In [None]:
pip install azure-ai-textanalytics==5.2.0
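
Before moving on, it can help to confirm that both packages were installed into the kernel the notebook is actually using. The following is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`); the helper name `installed_version` is hypothetical and not part of either Azure SDK.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the version of each package this notebook depends on
for package in ("azure-ai-vision-imageanalysis", "azure-ai-textanalytics"):
    print(f"{package}: {installed_version(package) or 'not installed'}")
```

If either line reports `not installed`, rerun the corresponding install cell and restart the kernel.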

## Example Code

In [None]:
import os
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.exceptions import HttpResponseError

# Read the AI services endpoint and key from the ENDPOINT and KEY environment
# variables, or paste the values in place of the empty-string defaults below.
endpoint = os.environ.get("ENDPOINT", "")  # Your AI services endpoint
key = os.environ.get("KEY", "")  # Your AI services resource key
if not endpoint or not key:
    print("Missing 'ENDPOINT' or 'KEY'")
    print("Set them before running this sample.")
    raise SystemExit(1)

# Create an Image Analysis client
image_analysis_client = ImageAnalysisClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key)
)

# Create an Azure Text Analytics client
text_analytics_client = TextAnalyticsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key)
)


# Example method for detecting sensitive information (PII) from text in images 
def pii_recognition_example(client):

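    # Get text from the image using Image Analysis OCR
    ocr_result = image_analysis_client.analyze_from_url(
        image_url="https://resources.ssnsimple.com/wp-content/uploads/2019/11/social-security-number.jpg",
        visual_features=[VisualFeatures.READ],
    )

    # Stop early if OCR found no text in the image
    if ocr_result.read is None or not ocr_result.read.blocks:
        print("No text was detected in the image.")
        return

    # Join every recognized line into a single document for PII detection
    documents = [" ".join(line.text for block in ocr_result.read.blocks for line in block.lines)]

    print(documents)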

    # Detect sensitive information in the OCR output, using the client passed in
    response = client.recognize_pii_entities(documents, language="en")
    result = [doc for doc in response if not doc.is_error]

    # Print the redacted text, then the details of each detected entity
    for doc in result:
        print("Redacted Text: {}".format(doc.redacted_text))
        for entity in doc.entities:
            print("Entity: {}".format(entity.text))
            print("\tCategory: {}".format(entity.category))
            print("\tConfidence Score: {}".format(entity.confidence_score))
            print("\tOffset: {}".format(entity.offset))
            print("\tLength: {}".format(entity.length))

pii_recognition_example(text_analytics_client)
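
The `Offset` and `Length` values printed for each entity identify where the sensitive text sits in the OCR output, so they can also drive custom masking instead of relying on `redacted_text`. The sketch below illustrates the idea in plain Python. The helper `mask_spans`, the sample text, and the spans are all hypothetical and are not returned by the service. It assumes offsets count Unicode code points, which is how Python indexes strings.

```python
from typing import Iterable, Tuple


def mask_spans(text: str, spans: Iterable[Tuple[int, int]], mask_char: str = "*") -> str:
    """Replace every (offset, length) span in text with mask characters."""
    chars = list(text)
    for offset, length in spans:
        # Clamp the span so out-of-range values cannot raise an IndexError
        start = max(offset, 0)
        end = min(offset + length, len(chars))
        for i in range(start, end):
            chars[i] = mask_char
    return "".join(chars)


# Hypothetical OCR text and entity spans, for illustration only
sample_text = "JOHN DOE 000-00-0000"
sample_spans = [(0, 8), (9, 11)]  # (offset, length) pairs, as printed per entity
masked = mask_spans(sample_text, sample_spans)
print(masked)
```

With real results, the spans could be built as `[(entity.offset, entity.length) for entity in doc.entities]`, optionally filtered by `entity.category` or `entity.confidence_score`.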
