# API Tutorial - Deidentification

<a target="_blank" href="https://colab.research.google.com/github/ai-amplified/models/blob/main/tutorials/Deidentification.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## Short Description:
The Deidentification AI model is designed to extract, conceal, and anonymize Protected Health Information (PHI) in English biomedical texts in support of HIPAA compliance. Developed by Aimped by fine-tuning the [DeBERTa-v3-small](https://github.com/microsoft/DeBERTa) transformer model on a proprietary dataset, it combines deep learning NER with rule-based regex patterns. The model returns structured JSON output containing the extracted PHI entities together with masked and faked (pseudonymized) versions of the text, making it suitable for healthcare and research professionals. It achieves an F1 score of approximately **0.96** and accepts up to 128K characters per deidentification request in both the UI and the API.

## Tutorial
This tutorial walks you through the Deidentification API. By following the steps below, you'll be able to extract, conceal, and anonymize PHI in biomedical texts. The main steps are:

1. Creating an access token
2. Installing the aimped library
3. Running the API with your credentials and payload

## Step 1: Create Access Token

To use the API, you need an access token. Follow these steps to create one:

1. Go to the [API Access Token Creation Page](https://aimped.ai/a3m/#/tokens). You will land here:
![Token Creation Page](images/token_11.png)

2. Select the scopes you need and click "Create Token".
3. A pop-up appears from which you can copy the User Key and User Secret.

![Token Creation Page2](images/token_22.png)

4. Copy the generated User Key and User Secret and keep them safe. You'll need them in the next steps.

## Step 2: Install aimped Library
To interact with the API, you need the aimped Python library. Run the following command in a notebook cell (in a terminal or command prompt, omit the leading `!`):

!pip install aimped==0.2.44 pandas

This installs the aimped library, pinned to version 0.2.44, along with pandas.
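To confirm which aimped version actually ended up in your environment (the command above pins 0.2.44), here is a minimal sketch using the standard library's `importlib.metadata`. The helper name `installed_version` is our own illustration, not part of aimped.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# The tutorial pins aimped==0.2.44; report a mismatch if something else is installed.
found = installed_version("aimped")
if found != "0.2.44":
    print(f"Expected aimped 0.2.44, found {found}")
```

If the check reports a different version, re-run the pinned install before continuing, since later calls in this tutorial were written against 0.2.44.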

## Step 3: Run the API
Now that you have your credentials and the library installed, you can call the API to deidentify text. Follow these steps:

### Set up your credentials:

user_key = "YOUR_USER_KEY"
user_secret = "YOUR_USER_SECRET"
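Hard-coding secrets in a notebook makes them easy to leak when the notebook is shared. A minimal alternative is to read them from environment variables; the variable names `AIMPED_USER_KEY` and `AIMPED_USER_SECRET` and the helper `load_credentials` are hypothetical, so adapt them to your own setup.

```python
import os


def load_credentials(key_var: str = "AIMPED_USER_KEY",
                     secret_var: str = "AIMPED_USER_SECRET") -> tuple[str, str]:
    """Read the User Key and User Secret from (hypothetically named) env vars."""
    missing = [name for name in (key_var, secret_var) if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Set these environment variables first: {', '.join(missing)}")
    return os.environ[key_var], os.environ[secret_var]


# user_key, user_secret = load_credentials()
```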

### Import the AimpedAPI class and set the base URL and model ID:
To deidentify text in other languages, change the **Model ID**; you can find it under "API Information" in the "API Details" tab on each model card.

from aimped.services.api import AimpedAPI

BASE_URL = 'https://aimped.ai'
model_id = "25" # the Model ID can be found under "API Information" in the "API Details" tab on each model card.

### Initialize the API service:

api_service = AimpedAPI(user_key, user_secret, {"base_url": BASE_URL})

### Define your payload:
Define the payload according to your input type: inline text (`data_json`) or an uploaded file (`data_txt`).

#### For Text input

payload = {
    "data_type": "data_json",
    "data_json": {
        # One or more input texts to deidentify
        "text": [
            "Mrs. Jane Smith, born on January 10, 1975, with Social Security Number 123-45-6789, has been undergoing treatment for diabetes mellitus type 2 at our clinic since March 2018, where she receives regular insulin injections and takes metformin 1000mg daily to manage her blood sugar levels, as prescribed by Dr. Johnson, her primary care physician.",
        ],
        "masked": True,  # include the masked text in the output
        "faked": True,   # include fake replacement values (faked_chunk)
        # PHI entity types to detect (each listed once)
        "entity": [
            "DATE",
            "DOCTOR",
            "AGE",
            "PATIENT",
            "MEDICALRECORD",
            "IDNUM",
            "ORGANIZATION",
            "CITY",
            "STREET",
            "COUNTRY",
            "ZIP",
            "ACCOUNT",
            "PLATE",
            "LICENSE",
            "DEVICE",
            "HOSPITAL",
            "LOCATION",
            "PHONE",
            "PROFESSION",
            "STATE",
            "USERNAME",
            "URL",
            "EMAIL",
            "FAX",
            "IP",
            "VIN",
            "SSN",
            "DLN"
        ]
    }
}
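The text payload has a fixed shape: `data_type` names the input key, and the options sit alongside `text` inside `data_json`. Because `text` is a list, a small helper can build the payload for several texts at once. The sketch below uses the field names from the payload above; `build_text_payload` is our own hypothetical helper, not an aimped function, and it drops duplicate entity labels while preserving their order.

```python
from typing import Iterable


def build_text_payload(texts: Iterable[str], entities: Iterable[str],
                       masked: bool = True, faked: bool = True) -> dict:
    """Build a data_json payload with the same fields as the example above."""
    # dict.fromkeys keeps the first occurrence of each label, in order.
    unique_entities = list(dict.fromkeys(entities))
    return {
        "data_type": "data_json",
        "data_json": {
            "text": list(texts),
            "masked": masked,
            "faked": faked,
            "entity": unique_entities,
        },
    }


example_payload = build_text_payload(
    ["Mrs. Jane Smith was seen by Dr. Johnson in March 2018."],
    ["PATIENT", "DOCTOR", "DATE", "PATIENT"],
)
```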


#### For File Input

path_uri_obj = api_service.file_upload(
    model_id,
    '/Users/John/Downloads/sample.txt'  # sample file path to upload
    )
path_uri = path_uri_obj['url']

payload = {
    "data_type": "data_txt",
    "extra_fields": {
        "masked": True,  # include the masked text in the output
        "faked": True,   # include fake replacement values (faked_chunk)
        # PHI entity types to detect (each listed once)
        "entity": [
            "DATE",
            "DOCTOR",
            "AGE",
            "PATIENT",
            "MEDICALRECORD",
            "IDNUM",
            "ORGANIZATION",
            "CITY",
            "STREET",
            "COUNTRY",
            "ZIP",
            "ACCOUNT",
            "PLATE",
            "LICENSE",
            "DEVICE",
            "HOSPITAL",
            "LOCATION",
            "PHONE",
            "PROFESSION",
            "STATE",
            "USERNAME",
            "URL",
            "EMAIL",
            "FAX",
            "IP",
            "VIN",
            "SSN",
            "DLN"
        ]
    },
    # URI(s) returned by file_upload above
    "data_txt": [
        path_uri
    ]
}
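The model description states a limit of 128K characters per request, so it can save a round trip to check a file locally before uploading it. This is a minimal sketch: reading "128K" as 128,000 characters is our assumption, the file is assumed to be UTF-8 text, and `check_text_file` is a hypothetical helper.

```python
from pathlib import Path

MAX_CHARS = 128_000  # assumption: "128K characters" read as 128,000


def check_text_file(path: str, max_chars: int = MAX_CHARS) -> int:
    """Return the character count of a UTF-8 text file, raising if it is too long."""
    n_chars = len(Path(path).read_text(encoding="utf-8"))
    if n_chars > max_chars:
        raise ValueError(f"{path} has {n_chars} characters; limit is {max_chars}")
    return n_chars
```

For example, call `check_text_file('/Users/John/Downloads/sample.txt')` just before `api_service.file_upload(...)`; if it raises, split the document into smaller files first.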

### Run the model:

result = api_service.run_model(model_id, payload)

If you're running this model for the first time, or after a long period of inactivity, the instance has to start up first and you may see the following message:

print(result)

{'message': 'We will notify you via email when the instance is ready.'}


Wait for the email notification indicating that the instance is ready. You will also be notified on [Aimped](https://aimped.ai/):
![Notification Page](images/notif_1.png)

Once the instance is ready, you will see this notification:
![Notification Page2](images/deidentify_notif.png)
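If you'd rather not re-run the cell by hand, you could wrap the call in a small retry loop. This is a hypothetical sketch, not an aimped feature: it treats any response without an `output` key (such as the `{'message': ...}` response shown above) as "instance still starting", which may not cover every response the service can return. Each retry sends a new request, so keep the delay generous.

```python
import time
from typing import Callable


def run_until_ready(run: Callable[[], dict], attempts: int = 10,
                    delay_s: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call `run` until it returns a response that contains an 'output' key."""
    for attempt in range(1, attempts + 1):
        response = run()
        if "output" in response:
            return response
        # Assumption: a response without 'output' means the instance is still warming up.
        print(f"Attempt {attempt}: {response.get('message', response)}")
        if attempt < attempts:
            sleep(delay_s)
    raise TimeoutError(f"Model not ready after {attempts} attempts")


# result = run_until_ready(lambda: api_service.run_model(model_id, payload))
```

The `sleep` parameter exists only so the loop can be tested without real waiting; in normal use, leave it at its default.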

Once you receive the email or the notification on Aimped, run the model again:

result = api_service.run_model(model_id, payload)

result

{'used_credits': 4.22625,
 'status': True,
 'data_type': ['data_json'],
 'output': {'data_json': {'result': [{'entities': [{'entity': 'PATIENT',
       'confidence': 0.9999309778213501,
       'chunk': 'Jane Smith',
       'begin': 5,
       'end': 15,
       'faked_chunk': 'Yettie Dicte'},
      {'entity': 'DATE',
       'confidence': 0.9999922513961792,
       'chunk': 'January 10, 1975',
       'begin': 25,
       'end': 41,
       'faked_chunk': '12/25/2008'},
      {'chunk': '123-45-6789',
       'confidence': 1,
       'begin': 71,
       'end': 82,
       'entity': 'SSN',
       'faked_chunk': '193-09-5107'},
      {'entity': 'DATE',
       'confidence': 0.999982476234436,
       'chunk': 'March 2018',
       'begin': 163,
       'end': 173,
       'faked_chunk': '1999'},
      {'entity': 'DOCTOR',
       'confidence': 0.99998939037323,
       'chunk': 'Johnson',
       'begin': 309,
       'end': 316,
       'faked_chunk': 'Dunn, Amanda'}],
     'masked_text': 'Mrs. <<PATIENT>>
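Each item in `entities` carries the entity type, the detected `chunk`, its character offsets `begin`/`end`, a `confidence` score, and a `faked_chunk` replacement. The stdlib-only sketch below lists the entities in text order and counts them by type; the helper name `summarize_entities` and the 0.9 confidence threshold are illustrative choices, not aimped defaults.

```python
from collections import Counter


def summarize_entities(entities: list[dict], min_confidence: float = 0.9) -> Counter:
    """Print entities sorted by position and return counts per entity type."""
    kept = [e for e in entities if e["confidence"] >= min_confidence]
    for e in sorted(kept, key=lambda e: e["begin"]):
        print(f"{e['begin']:>4}-{e['end']:<4} {e['entity']:<8} "
              f"{e['chunk']!r} -> {e['faked_chunk']!r} ({e['confidence']:.3f})")
    return Counter(e["entity"] for e in kept)


# counts = summarize_entities(result["output"]["data_json"]["result"][0]["entities"])
```

If you prefer a table, `pandas.DataFrame(entities)` (pandas was installed in Step 2) gives one row per entity with the same fields as columns.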

### Visualizing the results

!pip install -qqq seqeval

from aimped.nlp.deid import DeidentificationVisualizer
visualizer = DeidentificationVisualizer()

# This is the input text we've used in the payload
text = "Mrs. Jane Smith, born on January 10, 1975, with Social Security Number 123-45-6789, has been undergoing treatment for diabetes mellitus type 2 at our clinic since March 2018, where she receives regular insulin injections and takes metformin 1000mg daily to manage her blood sugar levels, as prescribed by Dr. Johnson, her primary care physician."

data = result["output"]["data_json"]["result"][0]["entities"]

# Display PHI Entities
visualizer.display_visualization(text, data, mode='phi_entities')

# Display Anonymized
visualizer.display_visualization(text, data, mode='anonymized')

# Display Pseudonymized
visualizer.display_visualization(text, data, mode='pseudonymized')
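The anonymized and pseudonymized views can be understood in terms of the `begin`/`end` offsets: each detected span is replaced either by a `<<ENTITY>>` tag (the format visible at the start of `masked_text` in the output above) or by its `faked_chunk`. The sketch below reproduces that idea locally. It is our approximation, not the visualizer's or the API's actual implementation, and it assumes entity spans do not overlap.

```python
def replace_spans(text: str, entities: list[dict], mode: str = "masked") -> str:
    """Replace each entity span with <<ENTITY>> (mode='masked') or its faked_chunk (mode='faked')."""
    if mode not in ("masked", "faked"):
        raise ValueError("mode must be 'masked' or 'faked'")
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for e in sorted(entities, key=lambda e: e["begin"], reverse=True):
        new = f"<<{e['entity']}>>" if mode == "masked" else e["faked_chunk"]
        out = out[:e["begin"]] + new + out[e["end"]:]
    return out


# print(replace_spans(text, data, mode="masked"))
# print(replace_spans(text, data, mode="faked"))
```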