# --------------------------- spacy-stanza -----------------------------

Spacy-Stanza represents an advanced synergy of two prominent natural language processing (NLP) libraries, SpaCy and Stanza. SpaCy is known for its unparalleled speed, ease of use, and efficiency in natural language processing. In parallel, Stanza, formerly known as "StanfordNLP", is an NLP library developed by the renowned Stanford University, which provides sophisticated functions for the syntactic and semantic analysis of texts.

The convergent implementation of spaCy and Stanza allows developers and researchers to exploit the full potential of both libraries. SpaCy provides an easy-to-use API for text processing, while Stanza offers advanced models and techniques for tokenization, part-of-speech (PoS) tagging, lemmatization, and named entity recognition (NER).

A major advantage of SpaCy-Stanza is its language diversity, making it a highly versatile solution for NLP applications. The use of pre-trained models allows users to perform complex text analysis without having to train their own models from scratch. This makes it much easier to get started with NLP development and speeds up the development process.

In addition to the basic functionalities, SpaCy-Stanza also offers extension options and customization features that allow for flexible configuration according to individual requirements. As a result, the integration of Stanza into spaCy has created a comprehensive NLP library that appeals to both novice and experienced developers and covers a wide range of natural language processing applications.

In the evaluation of 5000 selected data sets, SpaCy-Stanza achieved mixed results in terms of recognizing personally identifiable information (PII). Using Named Entity Recognition (NER), 20 different PII categories were recognized. However, there were problems with the automatic assignment of the different categories of SpaCy-Stanza to those present in the data set. A manual assignment was made for the evaluation in order to be able to evaluate the quality of the results in comparison with other models. Accuracy was the decisive factor in selecting the right model for the PII Detector application. With a maximum accuracy of 0.41 in the NAME category, the model is significantly behind the performance of other models, which are described below.

To summarize, SpaCy-Stanza is a comprehensive and powerful NLP library that offers a harmonious blend of user-friendly interface and advanced linguistic analysis features. However, it is not suitable for the use case of the PII-Detector.

**Sources**
- https://github.com/explosion/spacy-stanza
- https://www.kaggle.com/code/curiousprogrammer/entity-extraction-and-classification-using-spacy
- https://stanfordnlp.github.io/stanza/
- https://spacy.io/universe/project/spacy-stanza
- https://spacy.io/
- https://github.com/explosion/spaCy

### 1. Setup

In [2]:
import stanza
import spacy_stanza
from spacy import displacy
from spacy import Language
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, fbeta_score

### 2. Download stanza model

In [3]:
# Download the stanza model if necessary
stanza.download("en")

# Initialize the pipeline
nlp = spacy_stanza.load_pipeline("en")

Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/main/resources_1.6.0.json:   0%|   …

2024-01-17 14:31:08 INFO: Downloading default packages for language: en (English) ...


2024-01-17 14:31:11 INFO: File exists: C:\Users\Franziska\stanza_resources\en\default.zip
2024-01-17 14:31:18 INFO: Finished downloading models and saved to C:\Users\Franziska\stanza_resources.
2024-01-17 14:31:18 INFO: Checking for updates to resources.json in case models have been updated.  Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCES


Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/main/resources_1.6.0.json:   0%|   …

2024-01-17 14:31:21 INFO: Loading these models for language: en (English):
| Processor    | Package             |
--------------------------------------
| tokenize     | combined            |
| pos          | combined_charlm     |
| lemma        | combined_nocharlm   |
| constituency | ptb3-revised_charlm |
| depparse     | combined_charlm     |
| sentiment    | sstplus             |
| ner          | ontonotes_charlm    |

2024-01-17 14:31:21 INFO: Using device: cpu
2024-01-17 14:31:21 INFO: Loading: tokenize
2024-01-17 14:31:22 INFO: Loading: pos
2024-01-17 14:31:23 INFO: Loading: lemma
2024-01-17 14:31:23 INFO: Loading: constituency
2024-01-17 14:31:24 INFO: Loading: depparse
2024-01-17 14:31:24 INFO: Loading: sentiment
2024-01-17 14:31:25 INFO: Loading: ner
2024-01-17 14:31:26 INFO: Done loading processors!


### 3. Qickstart stanza

In [4]:
doc = nlp("Mr. Barack Obama was born in Hawaii. He was elected president in 2008. He is 1.8 meters.")

for token in doc:
    #print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_, token.tag_)
    print(token.text, token.pos_, token.ent_type_, token.tag_)
print(f"-------------------------------------------------------")
print(doc.ents)

Mr. PROPN  NNP
Barack PROPN PERSON NNP
Obama PROPN PERSON NNP
was AUX  VBD
born VERB  VBN
in ADP  IN
Hawaii PROPN GPE NNP
. PUNCT  .
He PRON  PRP
was AUX  VBD
elected VERB  VBN
president NOUN  NN
in ADP  IN
2008 NUM DATE CD
. PUNCT  .
He PRON  PRP
is AUX  VBZ
1.8 NUM QUANTITY CD
meters NOUN QUANTITY NNS
. PUNCT  .
-------------------------------------------------------
(Barack Obama, Hawaii, 2008, 1.8 meters)


In [5]:
# Access spaCy's lexical attributes
print([token.is_stop for token in doc])
print([token.like_num for token in doc])

# Visualize dependencies
displacy.render(doc) 

# Process texts with nlp.pipe
for doc in nlp.pipe(["Lots of texts", "Even more texts", "..."]):
    print(doc.text)

# Combine with your own custom pipeline components
@Language.component("custom_component")
def custom_component(doc):
    # Do something to the doc here
    print(f"Custom component called: {doc.text}")
    return doc

nlp.add_pipe("custom_component")
doc = nlp("Some text")

# Serialize attributes to a numpy array
np_array = doc.to_array(['ORTH', 'LEMMA', 'POS'])
np_array

[False, False, False, True, False, True, False, False, True, True, False, False, True, False, False, True, True, False, False, False]
[False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, True, False, False]


Lots of texts
Even more texts
...
Custom component called: Some text


array([[14298532990736973729,  7000492816108906599,                   90],
       [15099781594404091470, 15099781594404091470,                   92]],
      dtype=uint64)

### 4. Load data

In [8]:
df = pd.read_json("../../data/dataset_english.json")
df

Unnamed: 0,masked_text,unmasked_text,privacy_mask,span_labels,bio_labels,tokenised_text
0,A students assessment was found on device bear...,A students assessment was found on device bear...,"{'[PHONEIMEI_1]': '06-184755-866851-3', '[JOBA...","[[0, 57, O], [57, 75, PHONEIMEI_1], [75, 138, ...","[O, O, O, O, O, O, O, O, O, O, O, O, O, B-PHON...","[a, student, s, assessment, was, found, on, de..."
1,"Dear [FIRSTNAME_1], as per our records, your l...","Dear Omer, as per our records, your license 78...","{'[FIRSTNAME_1]': 'Omer', '[VEHICLEVIN_1]': '7...","[[0, 5, O], [5, 9, FIRSTNAME_1], [9, 44, O], [...","[O, B-FIRSTNAME, I-FIRSTNAME, O, O, O, O, O, O...","[dear, om, ##er, ,, as, per, our, records, ,, ..."
2,[FIRSTNAME_1] could you please share your reco...,Kattie could you please share your recomndatio...,"{'[FIRSTNAME_1]': 'Kattie', '[AGE_1]': '72', '...","[[0, 6, FIRSTNAME_1], [6, 75, O], [75, 77, AGE...","[B-FIRSTNAME, I-FIRSTNAME, O, O, O, O, O, O, O...","[kat, ##tie, could, you, please, share, your, ..."
3,Emergency supplies in [BUILDINGNUMBER_1] need ...,Emergency supplies in 16356 need a refill. Use...,"{'[BUILDINGNUMBER_1]': '16356', '[MASKEDNUMBER...","[[0, 22, O], [22, 27, BUILDINGNUMBER_1], [27, ...","[O, O, O, B-BUILDINGNUMBER, I-BUILDINGNUMBER, ...","[emergency, supplies, in, 1635, ##6, need, a, ..."
4,"The [AGE_1] old child at [BUILDINGNUMBER_1], h...","The 88 old child at 5862, has showcased an unu...","{'[AGE_1]': '88', '[BUILDINGNUMBER_1]': '5862'...","[[0, 4, O], [4, 6, AGE_1], [6, 20, O], [20, 24...","[O, B-AGE, O, O, O, B-BUILDINGNUMBER, I-BUILDI...","[the, 88, old, child, at, 58, ##6, ##2, ,, has..."
...,...,...,...,...,...,...
43496,"Hello [FIRSTNAME_1], your cognitive therapy ap...","Hello Nellie, your cognitive therapy appointme...","{'[FIRSTNAME_1]': 'Nellie', '[DATE_1]': '8/21'...","[[0, 6, O], [6, 12, FIRSTNAME_1], [12, 66, O],...","[O, B-FIRSTNAME, O, O, O, O, O, O, O, O, B-DAT...","[hello, nellie, ,, your, cognitive, therapy, a..."
43497,"Dear [FIRSTNAME_1], we appreciate your active ...","Dear Jalon, we appreciate your active involvem...","{'[FIRSTNAME_1]': 'Jalon', '[CREDITCARDNUMBER_...","[[0, 5, O], [5, 10, FIRSTNAME_1], [10, 159, O]...","[O, B-FIRSTNAME, I-FIRSTNAME, O, O, O, O, O, O...","[dear, ja, ##lon, ,, we, appreciate, your, act..."
43498,"Dear [SEX_1] at [ZIPCODE_1], we are raising fu...","Dear Female at 32363-2779, we are raising fund...","{'[SEX_1]': 'Female', '[ZIPCODE_1]': '32363-27...","[[0, 5, O], [5, 11, SEX_1], [11, 15, O], [15, ...","[O, B-SEX, O, B-ZIPCODE, I-ZIPCODE, I-ZIPCODE,...","[dear, female, at, 323, ##6, ##3, -, 277, ##9,..."
43499,"Hello [FIRSTNAME_1], we encourage you to pay t...","Hello Tito, we encourage you to pay the fees o...","{'[FIRSTNAME_1]': 'Tito', '[ETHEREUMADDRESS_1]...","[[0, 6, O], [6, 10, FIRSTNAME_1], [10, 137, O]...","[O, B-FIRSTNAME, O, O, O, O, O, O, O, O, O, O,...","[hello, tito, ,, we, encourage, you, to, pay, ..."


In [9]:
pd.set_option('display.max_colwidth', None)

In [10]:
df_small = df.head(5000)


### 5. Checking stanza by using datset

In [11]:
# function to check if specific tags are present in a row.
def check_data_tags(row, tags):
    return any(tag in str(row) for tag in tags)

# flag will be set to True if the respective tags are found in the 'span_labels' column
df_small['PERSON_flag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, ['FIRSTNAME_', 'LASTNAME_', 'MIDDLENAME_', 'CREDITCARDISSUER_']))
df_small['NORP_flag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, ['STREET_', 'STATE_', 'CITY_', 'COUNTRY_']))
df_small['FAC_flag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, ['STREET_', 'SECONDARYADRESS_', 'BUILDINGNUMBER_']))
df_small['ORG_flag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, ['IBAN_', 'BIC_']))
df_small['GPE_flag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, ['CITY_', 'STATE_', 'JOBAREA_', 'COUNTY_', 'ZIPCODE_', 'NEARBYGPSCOORDINATE_']))
df_small['LOC_flag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, []))
df_small['PRODUCT_flag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, ['ACCOUNTNAME_', 'VEHICLEVRM_', 'PASSWORD_']))
df_small['EVENT_flag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, []))
df_small['WORK_OF_ART_flag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, []))
df_small['LAW_flag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, []))
df_small['LANGUAGE_flag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, []))
df_small['DATE_flag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, ['DATE_', 'AGE_', 'DOB_']))
df_small['TIME_flag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, ['TIME_']))
df_small['PERCENT_flag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, []))
df_small['MONEY_flag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, ['CURRENCYSYMBOL_', 'AMOUNT_', 'CURRENCY_', 'CURRENCYNAME_', 'CURRENCYCODE_']))
df_small['QUANTITY_flag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, ['HEIGHT_']))
df_small['ORDINAL_flag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, []))
df_small['CARDINAL_flag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, ['PHONEIMEI_', 'ACCOUNTNUMBER_', 'CREDITCARDNUMBER_', 'CREDITCARDCVV_', 'PHONENUMBER_', 'IP_', 'PIN_', 'IP4_', 'IP6_', 'MAC_', 'VEHICLEVIN_', 'SSN_' ]))




df_small[['span_labels', 'PERSON_flag', 'NORP_flag', 'FAC_flag', 'ORG_flag', 'GPE_flag', 'LOC_flag', 'PRODUCT_flag','EVENT_flag','WORK_OF_ART_flag', 'LAW_flag', 'LANGUAGE_flag', 'DATE_flag', 'TIME_flag', 'PERCENT_flag', 'MONEY_flag', 'QUANTITY_flag', 'ORDINAL_flag', 'CARDINAL_flag']]


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_small['PERSON_flag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, ['FIRSTNAME_', 'LASTNAME_', 'MIDDLENAME_', 'CREDITCARDISSUER_']))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_small['NORP_flag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, ['STREET_', 'STATE_', 'CITY_', 'COUNTRY_']))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata

Unnamed: 0,span_labels,PERSON_flag,NORP_flag,FAC_flag,ORG_flag,GPE_flag,LOC_flag,PRODUCT_flag,EVENT_flag,WORK_OF_ART_flag,LAW_flag,LANGUAGE_flag,DATE_flag,TIME_flag,PERCENT_flag,MONEY_flag,QUANTITY_flag,ORDINAL_flag,CARDINAL_flag
0,"[[0, 57, O], [57, 75, PHONEIMEI_1], [75, 138, O], [138, 150, JOBAREA_1], [150, 189, O]]",False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True
1,"[[0, 5, O], [5, 9, FIRSTNAME_1], [9, 44, O], [44, 61, VEHICLEVIN_1], [61, 170, O]]",True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True
2,"[[0, 6, FIRSTNAME_1], [6, 75, O], [75, 77, AGE_1], [77, 82, O], [82, 97, GENDER_1], [97, 103, O], [103, 117, HEIGHT_1], [117, 118, O]]",True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,False,False
3,"[[0, 22, O], [22, 27, BUILDINGNUMBER_1], [27, 47, O], [47, 63, MASKEDNUMBER_1], [63, 80, O]]",False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
4,"[[0, 4, O], [4, 6, AGE_1], [6, 20, O], [20, 24, BUILDINGNUMBER_1], [24, 98, O], [98, 110, PASSWORD_1], [110, 131, O]]",False,False,True,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4995,"[[0, 18, O], [18, 35, GENDER_1], [35, 44, O], [44, 52, LASTNAME_1], [52, 68, O], [68, 78, DATE_1], [78, 117, O], [117, 122, EYECOLOR_1], [122, 136, O], [136, 152, PHONENUMBER_1], [152, 154, O], [154, 161, LASTNAME_2], [161, 188, O], [188, 207, ACCOUNTNAME_1], [207, 227, O], [227, 228, CURRENCYSYMBOL_1], [228, 238, AMOUNT_1]]",True,False,False,False,False,False,True,False,False,False,False,True,False,False,True,False,False,True
4996,"[[0, 5, O], [5, 11, FIRSTNAME_1], [11, 77, O], [77, 106, EMAIL_1], [106, 141, O], [141, 145, PIN_1], [145, 150, O], [150, 162, PASSWORD_1], [162, 173, O]]",True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True
4997,"[[0, 120, O], [120, 135, IPV4_1], [135, 170, O], [170, 196, COMPANYNAME_1], [196, 211, O]]",False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
4998,"[[0, 33, O], [33, 45, COUNTY_1], [45, 72, O], [72, 80, SECONDARYADDRESS_1], [80, 158, O]]",False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False


In [12]:
def tag_text(doc, nlp):
    sentence = nlp(doc)
    return [(token.text, token.ent_type_) for token in sentence]

In [13]:
df_small['tagged_entities'] = df_small['unmasked_text'].apply(lambda doc: tag_text(doc, nlp))
df_small

Custom component called: A students assessment was found on device bearing IMEI: 06-184755-866851-3. The document falls under the various topics discussed in our Optimization curriculum. Can you please collect it?
Custom component called: Dear Omer, as per our records, your license 78B5R2MVFAHJ48500 is still registered in our records for access to the educational tools. Please feedback on its operability.
Custom component called: Kattie could you please share your recomndations about vegetarian diet for 72 old Intersex person with 158centimeters?
Custom component called: Emergency supplies in 16356 need a refill. Use 5890724654311332 to pay for them.
Custom component called: The 88 old child at 5862, has showcased an unusual ability to remember and recite passwords, with Y2rWliOhf8Ir being most repeated.
Custom component called: Your recent hospital data recorded on 29/12/1957 regarding chronic disease management has been encrypted with IPv6 edaf:fd8f:e1e8:cfec:8bab:1afd:6aad:550c for 

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_small['tagged_entities'] = df_small['unmasked_text'].apply(lambda doc: tag_text(doc, nlp))


Unnamed: 0,masked_text,unmasked_text,privacy_mask,span_labels,bio_labels,tokenised_text,PERSON_flag,NORP_flag,FAC_flag,ORG_flag,...,LAW_flag,LANGUAGE_flag,DATE_flag,TIME_flag,PERCENT_flag,MONEY_flag,QUANTITY_flag,ORDINAL_flag,CARDINAL_flag,tagged_entities
0,A students assessment was found on device bearing IMEI: [PHONEIMEI_1]. The document falls under the various topics discussed in our [JOBAREA_1] curriculum. Can you please collect it?,A students assessment was found on device bearing IMEI: 06-184755-866851-3. The document falls under the various topics discussed in our Optimization curriculum. Can you please collect it?,"{'[PHONEIMEI_1]': '06-184755-866851-3', '[JOBAREA_1]': 'Optimization'}","[[0, 57, O], [57, 75, PHONEIMEI_1], [75, 138, O], [138, 150, JOBAREA_1], [150, 189, O]]","[O, O, O, O, O, O, O, O, O, O, O, O, O, B-PHONEIMEI, I-PHONEIMEI, I-PHONEIMEI, I-PHONEIMEI, I-PHONEIMEI, I-PHONEIMEI, I-PHONEIMEI, I-PHONEIMEI, I-PHONEIMEI, I-PHONEIMEI, I-PHONEIMEI, O, O, O, O, O, O, O, O, O, O, O, B-JOBAREA, O, O, O, O, O, O, O, O]","[a, student, s, assessment, was, found, on, device, bearing, im, ##ei, :, 06, -, 1847, ##55, -, 86, ##6, ##85, ##1, -, 3, ., the, document, falls, under, the, various, topics, discussed, in, our, optimization, curriculum, ., can, you, please, collect, it, ?]",False,False,False,False,...,False,False,False,False,False,False,False,False,True,"[(A, ), (students, ), (assessment, ), (was, ), (found, ), (on, ), (device, ), (bearing, ), (IMEI, ), (:, ), (06-184755-866851-3, ), (., ), (The, ), (document, ), (falls, ), (under, ), (the, ), (various, ), (topics, ), (discussed, ), (in, ), (our, ), (Optimization, ), (curriculum, ), (., ), (Can, ), (you, ), (please, ), (collect, ), (it, ), (?, )]"
1,"Dear [FIRSTNAME_1], as per our records, your license [VEHICLEVIN_1] is still registered in our records for access to the educational tools. Please feedback on its operability.","Dear Omer, as per our records, your license 78B5R2MVFAHJ48500 is still registered in our records for access to the educational tools. Please feedback on its operability.","{'[FIRSTNAME_1]': 'Omer', '[VEHICLEVIN_1]': '78B5R2MVFAHJ48500'}","[[0, 5, O], [5, 9, FIRSTNAME_1], [9, 44, O], [44, 61, VEHICLEVIN_1], [61, 170, O]]","[O, B-FIRSTNAME, I-FIRSTNAME, O, O, O, O, O, O, O, O, B-VEHICLEVIN, I-VEHICLEVIN, I-VEHICLEVIN, I-VEHICLEVIN, I-VEHICLEVIN, I-VEHICLEVIN, I-VEHICLEVIN, I-VEHICLEVIN, I-VEHICLEVIN, I-VEHICLEVIN, I-VEHICLEVIN, I-VEHICLEVIN, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O]","[dear, om, ##er, ,, as, per, our, records, ,, your, license, 78, ##b, ##5, ##r, ##2, ##m, ##v, ##fa, ##h, ##j, ##48, ##500, is, still, registered, in, our, records, for, access, to, the, educational, tools, ., please, feedback, on, it, s, opera, ##bility, .]",True,False,False,False,...,False,False,False,False,False,False,False,False,True,"[(Dear, ), (Omer, PERSON), (,, ), (as, ), (per, ), (our, ), (records, ), (,, ), (your, ), (license, ), (78B5R2, ), (MVFAHJ48500, ), (is, ), (still, ), (registered, ), (in, ), (our, ), (records, ), (for, ), (access, ), (to, ), (the, ), (educational, ), (tools, ), (., ), (Please, ), (feedback, ), (on, ), (its, ), (operability, ), (., )]"
2,[FIRSTNAME_1] could you please share your recomndations about vegetarian diet for [AGE_1] old [GENDER_1] with [HEIGHT_1]?,Kattie could you please share your recomndations about vegetarian diet for 72 old Intersex person with 158centimeters?,"{'[FIRSTNAME_1]': 'Kattie', '[AGE_1]': '72', '[GENDER_1]': 'Intersex person', '[HEIGHT_1]': '158centimeters'}","[[0, 6, FIRSTNAME_1], [6, 75, O], [75, 77, AGE_1], [77, 82, O], [82, 97, GENDER_1], [97, 103, O], [103, 117, HEIGHT_1], [117, 118, O]]","[B-FIRSTNAME, I-FIRSTNAME, O, O, O, O, O, O, O, O, O, O, O, O, O, B-AGE, O, B-GENDER, I-GENDER, I-GENDER, I-GENDER, O, B-HEIGHT, I-HEIGHT, I-HEIGHT, O]","[kat, ##tie, could, you, please, share, your, rec, ##om, ##nda, ##tions, about, vegetarian, diet, for, 72, old, inter, ##se, ##x, person, with, 158, ##cent, ##imeters, ?]",True,False,False,False,...,False,False,True,False,False,False,True,False,False,"[(Kattie, PERSON), (could, ), (you, ), (please, ), (share, ), (your, ), (recomndations, ), (about, ), (vegetarian, ), (diet, ), (for, ), (72, DATE), (old, DATE), (Intersex, ), (person, ), (with, ), (158, QUANTITY), (centimeters, QUANTITY), (?, )]"
3,Emergency supplies in [BUILDINGNUMBER_1] need a refill. Use [MASKEDNUMBER_1] to pay for them.,Emergency supplies in 16356 need a refill. Use 5890724654311332 to pay for them.,"{'[BUILDINGNUMBER_1]': '16356', '[MASKEDNUMBER_1]': '5890724654311332'}","[[0, 22, O], [22, 27, BUILDINGNUMBER_1], [27, 47, O], [47, 63, MASKEDNUMBER_1], [63, 80, O]]","[O, O, O, B-BUILDINGNUMBER, I-BUILDINGNUMBER, O, O, O, O, O, O, B-MASKEDNUMBER, I-MASKEDNUMBER, I-MASKEDNUMBER, I-MASKEDNUMBER, I-MASKEDNUMBER, I-MASKEDNUMBER, I-MASKEDNUMBER, I-MASKEDNUMBER, I-MASKEDNUMBER, O, O, O, O, O]","[emergency, supplies, in, 1635, ##6, need, a, ref, ##ill, ., use, 58, ##90, ##7, ##24, ##65, ##43, ##11, ##33, ##2, to, pay, for, them, .]",False,False,True,False,...,False,False,False,False,False,False,False,False,False,"[(Emergency, ), (supplies, ), (in, ), (16356, DATE), (need, ), (a, ), (refill, ), (., ), (Use, ), (5890724654311332, ), (to, ), (pay, ), (for, ), (them, ), (., )]"
4,"The [AGE_1] old child at [BUILDINGNUMBER_1], has showcased an unusual ability to remember and recite passwords, with [PASSWORD_1] being most repeated.","The 88 old child at 5862, has showcased an unusual ability to remember and recite passwords, with Y2rWliOhf8Ir being most repeated.","{'[AGE_1]': '88', '[BUILDINGNUMBER_1]': '5862', '[PASSWORD_1]': 'Y2rWliOhf8Ir'}","[[0, 4, O], [4, 6, AGE_1], [6, 20, O], [20, 24, BUILDINGNUMBER_1], [24, 98, O], [98, 110, PASSWORD_1], [110, 131, O]]","[O, B-AGE, O, O, O, B-BUILDINGNUMBER, I-BUILDINGNUMBER, I-BUILDINGNUMBER, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, B-PASSWORD, I-PASSWORD, I-PASSWORD, I-PASSWORD, I-PASSWORD, I-PASSWORD, I-PASSWORD, I-PASSWORD, I-PASSWORD, O, O, O, O]","[the, 88, old, child, at, 58, ##6, ##2, ,, has, showcased, an, unusual, ability, to, remember, and, rec, ##ite, password, ##s, ,, with, y, ##2, ##r, ##wl, ##io, ##h, ##f, ##8, ##ir, being, most, repeated, .]",False,False,True,False,...,False,False,True,False,False,False,False,False,False,"[(The, ), (88, CARDINAL), (old, ), (child, ), (at, ), (5862, DATE), (,, ), (has, ), (showcased, ), (an, ), (unusual, ), (ability, ), (to, ), (remember, ), (and, ), (recite, ), (passwords, ), (,, ), (with, ), (Y2rWliOhf8, PERSON), (Ir, PERSON), (being, ), (most, ), (repeated, ), (., )]"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4995,"We have scheduled [GENDER_1] patient [LASTNAME_1]s operation on [DATE_1]. To discuss medical issues related to [EYECOLOR_1], please call [PHONENUMBER_1]. [LASTNAME_2] is required to settle her [ACCOUNTNAME_1] account balance of [CURRENCYSYMBOL_1][AMOUNT_1]","We have scheduled Two-spirit person patient Mitchells operation on 19/01/1905. To discuss medical issues related to Brown, please call +76.984.787.5187. Trantow is required to settle her Credit Card Account account balance of R741,612.65","{'[GENDER_1]': 'Two-spirit person', '[LASTNAME_1]': 'Mitchell', '[DATE_1]': '19/01/1905', '[EYECOLOR_1]': 'Brown', '[PHONENUMBER_1]': '+76.984.787.5187', '[LASTNAME_2]': 'Trantow', '[ACCOUNTNAME_1]': 'Credit Card Account', '[CURRENCYSYMBOL_1]': 'R', '[AMOUNT_1]': '741,612.65'}","[[0, 18, O], [18, 35, GENDER_1], [35, 44, O], [44, 52, LASTNAME_1], [52, 68, O], [68, 78, DATE_1], [78, 117, O], [117, 122, EYECOLOR_1], [122, 136, O], [136, 152, PHONENUMBER_1], [152, 154, O], [154, 161, LASTNAME_2], [161, 188, O], [188, 207, ACCOUNTNAME_1], [207, 227, O], [227, 228, CURRENCYSYMBOL_1], [228, 238, AMOUNT_1]]","[O, O, O, B-GENDER, I-GENDER, I-GENDER, I-GENDER, O, B-LASTNAME, O, O, O, O, B-DATE, I-DATE, I-DATE, I-DATE, I-DATE, O, O, O, O, O, O, O, B-EYECOLOR, O, O, O, B-PHONENUMBER, I-PHONENUMBER, I-PHONENUMBER, I-PHONENUMBER, I-PHONENUMBER, I-PHONENUMBER, I-PHONENUMBER, I-PHONENUMBER, I-PHONENUMBER, I-PHONENUMBER, I-PHONENUMBER, I-PHONENUMBER, O, B-LASTNAME, I-LASTNAME, O, O, O, O, O, B-ACCOUNTNAME, I-ACCOUNTNAME, I-ACCOUNTNAME, O, O, O, B-CURRENCYSYMBOL, O, O, I-AMOUNT, I-AMOUNT, I-AMOUNT, I-AMOUNT, I-AMOUNT]","[we, have, scheduled, two, -, spirit, person, patient, mitchell, s, operation, on, 19, /, 01, /, 1905, ., to, discuss, medical, issues, related, to, brown, ,, please, call, +, 76, ., 98, ##4, ., 78, ##7, ., 51, ##8, ##7, ., tran, ##tow, is, required, to, settle, her, credit, card, account, account, balance, of, r, ##7, ##41, ,, 61, ##2, ., 65]",True,False,False,False,...,False,False,True,False,False,True,False,False,True,"[(We, ), (have, ), (scheduled, ), (Two, CARDINAL), (-, ), (spirit, ), (person, ), (patient, ), (Mitchells, PERSON), (operation, ), (on, ), (19/01/1905, CARDINAL), (., ), (To, ), (discuss, ), (medical, ), (issues, ), (related, ), (to, ), (Brown, PERSON), (,, ), (please, ), (call, ), (+76.984.787.5187, CARDINAL), (., ), (Trantow, ORG), (is, ), (required, ), (to, ), (settle, ), (her, ), (Credit, ORG), (Card, ORG), (Account, ORG), (account, ), (balance, ), (of, ), (R741,612.65, CARDINAL)]"
4996,"Dear [FIRSTNAME_1], to finalize your registration process, please verify your email [EMAIL_1]. Dont disclose your verification [PIN_1] and [PASSWORD_1] to anyone.","Dear Kurtis, to finalize your registration process, please verify your email Katlynn_Oberbrunner@yahoo.com. Dont disclose your verification 4166 and CMIzDIzFDN4z to anyone.","{'[FIRSTNAME_1]': 'Kurtis', '[EMAIL_1]': 'Katlynn_Oberbrunner@yahoo.com', '[PIN_1]': '4166', '[PASSWORD_1]': 'CMIzDIzFDN4z'}","[[0, 5, O], [5, 11, FIRSTNAME_1], [11, 77, O], [77, 106, EMAIL_1], [106, 141, O], [141, 145, PIN_1], [145, 150, O], [150, 162, PASSWORD_1], [162, 173, O]]","[O, B-FIRSTNAME, I-FIRSTNAME, O, O, O, O, O, O, O, O, O, O, O, O, B-EMAIL, I-EMAIL, I-EMAIL, I-EMAIL, I-EMAIL, I-EMAIL, I-EMAIL, I-EMAIL, I-EMAIL, I-EMAIL, O, O, O, O, O, O, O, B-PIN, I-PIN, O, B-PASSWORD, I-PASSWORD, I-PASSWORD, I-PASSWORD, I-PASSWORD, I-PASSWORD, I-PASSWORD, I-PASSWORD, I-PASSWORD, O, O, O]","[dear, kurt, ##is, ,, to, final, ##ize, your, registration, process, ,, please, verify, your, email, kat, ##lynn, _, obe, ##rb, ##runner, @, yahoo, ., com, ., don, t, disclose, your, verification, 41, ##66, and, cm, ##iz, ##di, ##z, ##f, ##d, ##n, ##4, ##z, to, anyone, .]",True,False,False,False,...,False,False,False,False,False,False,False,False,True,"[(Dear, ), (Kurtis, PERSON), (,, ), (to, ), (finalize, ), (your, ), (registration, ), (process, ), (,, ), (please, ), (verify, ), (your, ), (email, ), (Katlynn_Oberbrunner@yahoo.com, ), (., ), (Dont, ), (disclose, ), (your, ), (verification, ), (4166, ), (and, ), (CMIzDIzFDN4, ), (z, ), (to, ), (anyone, ), (., )]"
4997,"To ensure an efficient work-from-home setup and reduced energy consumption, employees are encouraged to disconnect from [IPV4_1] outside of office hours. Regards, [COMPANYNAME_1] IT department.","To ensure an efficient work-from-home setup and reduced energy consumption, employees are encouraged to disconnect from 250.116.137.156 outside of office hours. Regards, Wilkinson, Terry and Johns IT department.","{'[IPV4_1]': '250.116.137.156', '[COMPANYNAME_1]': 'Wilkinson, Terry and Johns'}","[[0, 120, O], [120, 135, IPV4_1], [135, 170, O], [170, 196, COMPANYNAME_1], [196, 211, O]]","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, B-IPV4, I-IPV4, I-IPV4, I-IPV4, I-IPV4, I-IPV4, I-IPV4, O, O, O, O, O, O, O, B-COMPANYNAME, I-COMPANYNAME, I-COMPANYNAME, I-COMPANYNAME, I-COMPANYNAME, O, O, O]","[to, ensure, an, efficient, work, -, from, -, home, setup, and, reduced, energy, consumption, ,, employees, are, encouraged, to, disco, ##nne, ##ct, from, 250, ., 116, ., 137, ., 156, outside, of, office, hours, ., regards, ,, wilkinson, ,, terry, and, johns, it, department, .]",False,False,False,False,...,False,False,False,False,False,False,False,False,False,"[(To, ), (ensure, ), (an, ), (efficient, ), (work, ), (-, ), (from, ), (-, ), (home, ), (setup, ), (and, ), (reduced, ), (energy, ), (consumption, ), (,, ), (employees, ), (are, ), (encouraged, ), (to, ), (disconnect, ), (from, ), (250.116.137.156, TIME), (outside, TIME), (of, TIME), (office, TIME), (hours, TIME), (., ), (Regards, ), (,, ), (Wilkinson, ORG), (,, ), (Terry, PERSON), (and, ), (Johns, ORG), (IT, ORG), (department, ), (., )]"
4998,Ive arranged for a meeting with [COUNTY_1] health officials in Suite [SECONDARYADDRESS_1]. Lets prepare a presentation showcasing our infrastructure and patient care.,Ive arranged for a meeting with Grant County health officials in Suite Apt. 119. Lets prepare a presentation showcasing our infrastructure and patient care.,"{'[COUNTY_1]': 'Grant County', '[SECONDARYADDRESS_1]': 'Apt. 119'}","[[0, 33, O], [33, 45, COUNTY_1], [45, 72, O], [72, 80, SECONDARYADDRESS_1], [80, 158, O]]","[O, O, O, O, O, O, O, O, B-COUNTY, I-COUNTY, O, O, O, O, B-SECONDARYADDRESS, I-SECONDARYADDRESS, I-SECONDARYADDRESS, O, O, O, O, O, O, O, O, O, O, O, O, O, O]","[i, ve, arranged, for, a, meeting, with, grant, county, health, officials, in, suite, apt, ., 119, ., let, s, prepare, a, presentation, showcasing, our, infrastructure, and, patient, care, .]",False,False,False,False,...,False,False,False,False,False,False,False,False,False,"[(Ive, PERSON), (arranged, ), (for, ), (a, ), (meeting, ), (with, ), (Grant, GPE), (County, GPE), (health, ), (officials, ), (in, ), (Suite, ORG), (Apt., ORG), (119, CARDINAL), (., ), (Lets, ), (prepare, ), (a, ), (presentation, ), (showcasing, ), (our, ), (infrastructure, ), (and, ), (patient, ), (care, ), (., )]"


In [14]:
df_small['PERSON_tag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, 'PERSON'))
df_small['NORP_tag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, 'NORP'))
df_small['FAC_tag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, 'FAC'))
df_small['ORG_tag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, 'ORG'))
df_small['GPE_tag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, 'GPE'))
df_small['LOC_tag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, 'LOC'))
df_small['PRODUCT_tag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, 'PRODUCT'))
df_small['EVENT_tag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, 'EVENT'))
df_small['WORK_OF_ART_tag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, 'WORK_OF_ART'))
df_small['LAW_tag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, 'LAW'))
df_small['LANGUAGE_tag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, 'LANGUAGE'))
df_small['DATE_tag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, 'DATE'))
df_small['TIME_tag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, 'TIME'))
df_small['PERCENT_tag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, 'PERCENT'))
df_small['MONEY_tag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, 'MONEY'))
df_small['QUANTITY_tag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, 'QUANTITY'))
df_small['ORDINAL_tag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, 'ORDINAL'))
df_small['CARDINAL_tag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, 'CARDINAL'))

df_small[['span_labels', 'PERSON_flag', 'NORP_flag', 'FAC_flag', 'ORG_flag', 'GPE_flag', 'LOC_flag',\
           'PRODUCT_flag','EVENT_flag','WORK_OF_ART_flag', 'LAW_flag', 'LANGUAGE_flag', 'DATE_flag', 'TIME_flag', \
            'PERCENT_flag', 'MONEY_flag', 'QUANTITY_flag', 'ORDINAL_flag', 'CARDINAL_flag', #'tagged_entities',
            'PERSON_tag', 'NORP_tag', 'FAC_tag', 'ORG_tag', 'GPE_tag', 'LOC_tag', 'PRODUCT_tag','EVENT_tag',
            'WORK_OF_ART_tag', 'LAW_tag', 'LANGUAGE_tag', 'DATE_tag', 'TIME_tag', 'PERCENT_tag', 'MONEY_tag', 
            'QUANTITY_tag', 'ORDINAL_tag', 'CARDINAL_tag']]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_small['PERSON_tag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, 'PERSON'))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_small['NORP_tag'] = df_small['span_labels'].apply(lambda row: check_data_tags(row, 'NORP'))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_small['F

Unnamed: 0,span_labels,PERSON_flag,NORP_flag,FAC_flag,ORG_flag,GPE_flag,LOC_flag,PRODUCT_flag,EVENT_flag,WORK_OF_ART_flag,...,WORK_OF_ART_tag,LAW_tag,LANGUAGE_tag,DATE_tag,TIME_tag,PERCENT_tag,MONEY_tag,QUANTITY_tag,ORDINAL_tag,CARDINAL_tag
0,"[[0, 57, O], [57, 75, PHONEIMEI_1], [75, 138, O], [138, 150, JOBAREA_1], [150, 189, O]]",False,False,False,False,True,False,False,False,False,...,True,True,True,True,True,True,True,True,True,True
1,"[[0, 5, O], [5, 9, FIRSTNAME_1], [9, 44, O], [44, 61, VEHICLEVIN_1], [61, 170, O]]",True,False,False,False,False,False,False,False,False,...,True,True,True,True,True,True,True,True,True,True
2,"[[0, 6, FIRSTNAME_1], [6, 75, O], [75, 77, AGE_1], [77, 82, O], [82, 97, GENDER_1], [97, 103, O], [103, 117, HEIGHT_1], [117, 118, O]]",True,False,False,False,False,False,False,False,False,...,True,True,True,True,True,True,True,True,True,True
3,"[[0, 22, O], [22, 27, BUILDINGNUMBER_1], [27, 47, O], [47, 63, MASKEDNUMBER_1], [63, 80, O]]",False,False,True,False,False,False,False,False,False,...,True,True,True,True,True,True,True,True,True,True
4,"[[0, 4, O], [4, 6, AGE_1], [6, 20, O], [20, 24, BUILDINGNUMBER_1], [24, 98, O], [98, 110, PASSWORD_1], [110, 131, O]]",False,False,True,False,False,False,True,False,False,...,True,True,True,True,True,True,True,True,True,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4995,"[[0, 18, O], [18, 35, GENDER_1], [35, 44, O], [44, 52, LASTNAME_1], [52, 68, O], [68, 78, DATE_1], [78, 117, O], [117, 122, EYECOLOR_1], [122, 136, O], [136, 152, PHONENUMBER_1], [152, 154, O], [154, 161, LASTNAME_2], [161, 188, O], [188, 207, ACCOUNTNAME_1], [207, 227, O], [227, 228, CURRENCYSYMBOL_1], [228, 238, AMOUNT_1]]",True,False,False,False,False,False,True,False,False,...,True,True,True,True,True,True,True,True,True,True
4996,"[[0, 5, O], [5, 11, FIRSTNAME_1], [11, 77, O], [77, 106, EMAIL_1], [106, 141, O], [141, 145, PIN_1], [145, 150, O], [150, 162, PASSWORD_1], [162, 173, O]]",True,False,False,False,False,False,True,False,False,...,True,True,True,True,True,True,True,True,True,True
4997,"[[0, 120, O], [120, 135, IPV4_1], [135, 170, O], [170, 196, COMPANYNAME_1], [196, 211, O]]",False,False,False,False,False,False,False,False,False,...,True,True,True,True,True,True,True,True,True,True
4998,"[[0, 33, O], [33, 45, COUNTY_1], [45, 72, O], [72, 80, SECONDARYADDRESS_1], [80, 158, O]]",False,False,False,False,True,False,False,False,False,...,True,True,True,True,True,True,True,True,True,True


### 6. Evaluation

In [15]:
def calculate_and_print_metrics(df, entity_type):
    y_true = df[entity_type + '_flag']
    y_pred = df[entity_type + '_tag']

    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred, zero_division=0)
    recall = recall_score(y_true, y_pred, zero_division=0)
    f1 = f1_score(y_true, y_pred, zero_division=0)
    f2 = (5 * precision * recall) / ((4 * precision) + recall) if (precision + recall) != 0 else 0


    print(f"Metrics for {entity_type}:")
    print(f"  Accuracy:  {accuracy:.2f}")
    print(f"  Precision: {precision:.2f}")
    print(f"  Recall:    {recall:.2f}")
    print(f"  F1 Score:  {f1:.2f}")
    print(f"  F2 Score:  {f2:.2f}")
    print("------------------------------------------------")

calculate_and_print_metrics(df_small, 'PERSON')
calculate_and_print_metrics(df_small, 'NORP')
calculate_and_print_metrics(df_small, 'FAC')
calculate_and_print_metrics(df_small, 'ORG')
calculate_and_print_metrics(df_small, 'GPE')
calculate_and_print_metrics(df_small, 'LOC')
calculate_and_print_metrics(df_small, 'PRODUCT')
calculate_and_print_metrics(df_small, 'EVENT')
calculate_and_print_metrics(df_small, 'WORK_OF_ART')
calculate_and_print_metrics(df_small, 'LAW')
calculate_and_print_metrics(df_small, 'LANGUAGE')
calculate_and_print_metrics(df_small, 'DATE')
calculate_and_print_metrics(df_small, 'TIME')
calculate_and_print_metrics(df_small, 'PERCENT')
calculate_and_print_metrics(df_small, 'MONEY')
calculate_and_print_metrics(df_small, 'QUANTITY')
calculate_and_print_metrics(df_small, 'ORDINAL')
calculate_and_print_metrics(df_small, 'CARDINAL')

Metrics for PERSON:
  Accuracy:  0.43
  Precision: 0.43
  Recall:    1.00
  F1 Score:  0.60
  F2 Score:  0.79
------------------------------------------------
Metrics for NORP:
  Accuracy:  0.16
  Precision: 0.16
  Recall:    1.00
  F1 Score:  0.27
  F2 Score:  0.48
------------------------------------------------
Metrics for FAC:
  Accuracy:  0.11
  Precision: 0.08
  Recall:    0.87
  F1 Score:  0.14
  F2 Score:  0.29
------------------------------------------------
Metrics for ORG:
  Accuracy:  0.05
  Precision: 0.05
  Recall:    1.00
  F1 Score:  0.10
  F2 Score:  0.21
------------------------------------------------
Metrics for GPE:
  Accuracy:  0.30
  Precision: 0.29
  Recall:    0.99
  F1 Score:  0.45
  F2 Score:  0.67
------------------------------------------------
Metrics for LOC:
  Accuracy:  0.00
  Precision: 0.00
  Recall:    0.00
  F1 Score:  0.00
  F2 Score:  0.00
------------------------------------------------
Metrics for PRODUCT:
  Accuracy:  0.15
  Precision: 0.15
  R