### PII Detection with a pre-trained BOLT NER model

In this notebook, we show how to use ThirdAI's pre-trained PII detection model on your own dataset. The model was trained on a proprietary synthetic dataset generated with GPT-4. It detects the following types of PII:

'PHONEIMEI', 'JOBAREA', 'FIRSTNAME', 'VEHICLEVIN', 'AGE', 'GENDER', 'HEIGHT', 'BUILDINGNUMBER', 'MASKEDNUMBER', 'PASSWORD', 'DOB', 'IPV6', 'NEARBYGPSCOORDINATE', 'USERAGENT', 'TIME', 'JOBTITLE', 'COUNTY', 'EMAIL', 'ACCOUNTNUMBER', 'PIN', 'EYECOLOR', 'LASTNAME', 'IPV4', 'DATE', 'STREET', 'CITY', 'PREFIX', 'MIDDLENAME', 'CREDITCARDISSUER', 'CREDITCARDNUMBER', 'STATE', 'VEHICLEVRM', 'ORDINALDIRECTION', 'SEX', 'JOBTYPE', 'CURRENCYCODE', 'CURRENCYSYMBOL', 'AMOUNT', 'ACCOUNTNAME', 'BITCOINADDRESS', 'LITECOINADDRESS', 'PHONENUMBER', 'MAC', 'CURRENCY', 'IBAN', 'COMPANYNAME', 'CURRENCYNAME', 'ZIPCODE', 'SSN', 'URL', 'IP', 'SECONDARYADDRESS', 'USERNAME', 'ETHEREUMADDRESS', 'CREDITCARDCVV', 'BIC'



If you want to train a BOLT NER model on your own dataset, please refer to the other notebook in this folder.

In [None]:
!pip3 install thirdai --upgrade

### Activate your ThirdAI License Key

You can apply for a trial license [here](https://www.thirdai.com/try-bolt/).

In [None]:
import os
from thirdai import bolt, licensing

if "THIRDAI_KEY" in os.environ:
    licensing.activate(os.environ["THIRDAI_KEY"])
else:
    licensing.activate("")  # Enter your ThirdAI key here

### Download the Model

In [None]:
import os

# Create the target directory if it does not exist yet
os.makedirs("./models/", exist_ok=True)

# Download the model only if it is not already present locally
if not os.path.exists("./models/pretrained_multilingual.model"):
    os.system("wget -nv -O ./models/pretrained_multilingual.model 'https://www.dropbox.com/scl/fi/xx8dnigcd2p5vh8n5kkby/model_unig.bolt?rlkey=tb1jpzzn4p2mj3mrqvrnw565m&st=ik609vvq&dl=0'")

### Load the Model

In [4]:
pii_model = bolt.NER.load("./models/pretrained_multilingual.model")

### Make predictions on your data

In [23]:
sample_sentence = "I'm Robert. I work at ThirdAI Corp. in Houston. I want to apply for a credit card. My email is robbie@gmail.com."

tokens = sample_sentence.split()

predicted_tags = pii_model.get_ner_tags([tokens], top_k=1)

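# predicted_tags[0] holds the predictions for the first (and only) sentence.
# For each token, index 0 is the top-1 prediction and its first element is the tag.
for token, token_predictions in zip(tokens, predicted_tags[0]):
    top_tag = token_predictions[0][0]
    if top_tag != 'O':
        print(token + ' : ' + top_tag)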




Robert. : FIRSTNAME
Houston. : CITY
robbie@gmail.com. : EMAIL
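
Because the sentence is split on whitespace, trailing punctuation stays attached to the tokens in the output above (`Robert.`, `Houston.`). The sketch below shows one way to wrap the filtering loop in a reusable helper that strips trailing punctuation for display only. The helper name `tagged_tokens` is hypothetical, and the mock predictions are hand-written stand-ins for `predicted_tags[0]`. The indexing in the cell above only shows that the tag is the first element of each top-k entry; the assumption here is that each entry is a `(tag, score)` pair, and the scores are made up.

```python
import string


def tagged_tokens(tokens, token_predictions, outside_tag="O"):
    """Return (token, tag) pairs for tokens whose top-1 tag is not `outside_tag`.

    `token_predictions` is the per-sentence slice of the model output
    (e.g. predicted_tags[0]). Each entry is assumed to be a list of
    (tag, score) pairs ordered best-first.
    """
    results = []
    for token, candidates in zip(tokens, token_predictions):
        tag = candidates[0][0]  # top-1 tag for this token
        if tag != outside_tag:
            # Strip trailing punctuation for display; the model still saw the raw token
            results.append((token.rstrip(string.punctuation), tag))
    return results


# Hand-written stand-in for predicted_tags[0]; tags and scores are illustrative only
tokens = ["I'm", "Robert.", "My", "email", "is", "robbie@gmail.com."]
mock_predictions = [
    [("O", 0.99)], [("FIRSTNAME", 0.97)], [("O", 0.99)],
    [("O", 0.99)], [("O", 0.99)], [("EMAIL", 0.98)],
]

for token, tag in tagged_tokens(tokens, mock_predictions):
    print(f"{token} : {tag}")
```

With the real model, the call would be `tagged_tokens(tokens, predicted_tags[0])`, provided the output structure matches the assumption above.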


In [27]:
sample_sentence = "I'm Siddharth. I work at Google in Mountain View. I want to cancel my credit card with the number 4147202361663155."

tokens = sample_sentence.split()

predicted_tags = pii_model.get_ner_tags([tokens], top_k=1)

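# predicted_tags[0] holds the predictions for the first (and only) sentence.
# For each token, index 0 is the top-1 prediction and its first element is the tag.
for token, token_predictions in zip(tokens, predicted_tags[0]):
    top_tag = token_predictions[0][0]
    if top_tag != 'O':
        print(token + ' : ' + top_tag)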

Siddharth. : FIRSTNAME
Mountain : CITY
View. : CITY
4147202361663155. : CREDITCARDNUMBER
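
In the output above, `Mountain` and `View.` are tagged separately even though they form one city name. A minimal post-processing sketch, assuming you already have one top-1 tag per token, is to merge consecutive tokens that share a tag and optionally replace them with a placeholder. The helpers `merge_entities` and `redact` are hypothetical and not part of the `thirdai` API, and the tags below are hand-written for illustration.

```python
def merge_entities(tokens, tags, outside_tag="O"):
    """Group consecutive tokens that share the same non-outside tag."""
    spans = []
    previous_tag = outside_tag
    for token, tag in zip(tokens, tags):
        if tag != outside_tag and tag == previous_tag:
            # Same tag as the previous token: extend the current span
            text, _ = spans[-1]
            spans[-1] = (text + " " + token, tag)
        elif tag != outside_tag:
            # A new entity starts here
            spans.append((token, tag))
        previous_tag = tag
    return spans


def redact(tokens, tags, outside_tag="O"):
    """Replace each run of same-tagged tokens with a single [TAG] placeholder."""
    pieces = []
    previous_tag = outside_tag
    for token, tag in zip(tokens, tags):
        if tag == outside_tag:
            pieces.append(token)
        elif tag != previous_tag:
            pieces.append(f"[{tag}]")
        previous_tag = tag
    return " ".join(pieces)


# Hand-written tags for illustration; with the model you could build them as
# tags = [candidates[0][0] for candidates in predicted_tags[0]]
tokens = "I work at Google in Mountain View.".split()
tags = ["O", "O", "O", "O", "O", "CITY", "CITY"]

print(merge_entities(tokens, tags))
print(redact(tokens, tags))
```

This simple rule has two limitations: two distinct adjacent entities with the same tag are merged into one span, and punctuation attached to a redacted token is dropped along with it.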
