spaCy Version 3.0.6

**NER Notebook**

Created by: Tan Poh Keam, Republic Polytechnic

Acknowledgement: ...

This note demonstrates how token patterns can be used in the Entity Ruler.

The tested environment is based on Python 3.8 and spaCy 3.


In [3]:
import spacy
print(spacy.__version__)

3.0.6


**Preparing**

Let us load the standard English model.
Its pipeline components include the statistical NER model.

In [4]:
nlp = spacy.load("en_core_web_sm")
print("Standard Pipeline Components: \n", nlp.pipe_names)

Standard Pipeline Components: 
 ['tok2vec', 'tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer']


**Now, add the entity ruler into the pipeline**

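The entity ruler is added as the last component in the pipeline, after the statistical `ner` component. Setting `overwrite_ents` to `True` lets its matches replace overlapping entities predicted by the model. At this point, no patterns have been provided.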

In [5]:
ruler = nlp.add_pipe("entity_ruler" , config={"overwrite_ents": True})
print("Custom Pipeline Components: \n", nlp.pipe_names)

Custom Pipeline Components: 
 ['tok2vec', 'tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer', 'entity_ruler']


Let's see how the statistical NER model performs on a text with local context.
You should see that the standard NER has identified the entities, but assigned the wrong label to one of them.

In [6]:
text = "Poh Heng Jewellery Store is having a promotion this month. \
Tiffany is also having a sales at the same time. "

doc = nlp(text)
for ent in doc.ents:
    print(ent.label_ , ent.text) 

PERSON Poh Heng Jewellery Store
DATE this month
ORG Tiffany


We need the entity ruler to relabel the phrase "Poh Heng Jewellery Store" as an ORG instead. A simple phrase pattern (an exact string match) is enough. Once the pattern is added, the pipeline labels the entity correctly for the local context.

In [7]:
patterns = [{"label": "ORG", "pattern": "Poh Heng Jewellery Store"}]
ruler.add_patterns(patterns)

doc = nlp(text)
for ent in doc.ents:
    print(ent.label_ , ent.text) 
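
The effect of `overwrite_ents` can also be checked in isolation. The following is a minimal sketch and not part of the original notebook: it assumes a blank English pipeline (so no trained model is needed) and sets a `PERSON` entity by hand to stand in for the statistical model's prediction. The names `blank_nlp`, `demo_ruler` and `demo_doc` are illustrative only.

```python
import spacy
from spacy.tokens import Span

# Blank pipeline: tokenizer only, no statistical NER (an assumption of this sketch).
blank_nlp = spacy.blank("en")
demo_ruler = blank_nlp.add_pipe("entity_ruler", config={"overwrite_ents": True})
demo_ruler.add_patterns([{"label": "ORG", "pattern": "Poh Heng Jewellery Store"}])

# Tokenize only, then hand-set a PERSON entity to imitate the model's mistake.
demo_doc = blank_nlp.make_doc("Poh Heng Jewellery Store is having a promotion.")
demo_doc.ents = [Span(demo_doc, 0, 4, label="PERSON")]
before = [(ent.text, ent.label_) for ent in demo_doc.ents]

# Apply the ruler: with overwrite_ents=True its match replaces the overlapping entity.
demo_doc = demo_ruler(demo_doc)
after = [(ent.text, ent.label_) for ent in demo_doc.ents]

print(before)
print(after)
```

With `overwrite_ents` left at its default of `False`, the ruler would skip the match because the tokens already carry an entity label, and the `PERSON` label would remain.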

In [9]:
another_text = "Lot One Shoppers' Mall will be closed for 2 weeks for disinfecting."
doc = nlp(another_text)
print([(ent.text, ent.label_) for ent in doc.ents])

[("Lot One Shoppers'", 'ORG'), ('2 weeks', 'DATE')]
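
In this output the statistical model cuts the entity short at "Lot One Shoppers'". A phrase pattern for the full name is one way to handle this. The sketch below is an illustration rather than part of the original notebook. It assumes a blank pipeline, so only the ruler produces entities and the `DATE` entity found by the trained model does not appear. `mall_nlp`, `mall_ruler` and `mall_ents` are hypothetical names.

```python
import spacy

# Blank pipeline so the example runs without downloading a trained model.
mall_nlp = spacy.blank("en")
mall_ruler = mall_nlp.add_pipe("entity_ruler")

# A phrase pattern is a plain string; the ruler tokenizes it with the same tokenizer.
mall_ruler.add_patterns([{"label": "ORG", "pattern": "Lot One Shoppers' Mall"}])

mall_doc = mall_nlp("Lot One Shoppers' Mall will be closed for 2 weeks for disinfecting.")
mall_ents = [(ent.text, ent.label_) for ent in mall_doc.ents]
print(mall_ents)
```

Adding the same pattern to the `ruler` created earlier (with `overwrite_ents` set to `True`) should replace the truncated entity with the full name, since the pattern match overlaps the model's span.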


**Exercise**

Complete the code below to perform NER on the given text.
The NER should look for phrases such as 'Cloud computing', 'Cloud', 'cloud services' and assign them the label IT_PRODUCT.

"Cloud computing is the delivery of computing services—including servers, storage, databases, networking, software, analytics, and intelligence—over the Internet (“the cloud”) to offer faster innovation, flexible resources, and economies of scale. You typically pay only for cloud services you use, helping you lower your operating costs, run your infrastructure more efficiently, and scale as your business needs change."

***Outline***

Step 1: Load in the English model

Step 2: Pass the text through the pipeline to check whether the statistical NER model can identify the entities

Step 3: Add a ruler to the pipeline

Step 4: Define the patterns (using phrases)

Step 5: Add the patterns to the ruler

Step 6: Pass the text through the pipeline again to check whether NER is performed correctly


In [10]:
my_nlp = spacy.load("en_core_web_sm")
my_text = "Cloud computing is the delivery of computing services—including servers, \
storage, databases, networking, software, analytics, and intelligence—over the \
Internet (“the cloud”) to offer faster innovation, flexible resources, and \
economies of scale. You typically pay only for cloud services you use, \
helping you lower your operating costs, run your infrastructure more efficiently, \
and scale as your business needs change."

mydoc=my_nlp(my_text)
for ent in mydoc.ents:
    print(ent.label_ , ent.text) 

In [11]:
my_ruler = my_nlp.add_pipe("entity_ruler" , config={"overwrite_ents": True})

In [12]:
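# Phrase patterns match the exact token text, so casing matters.
my_patterns = [{"label": "IT_PRODUCT", "pattern": "Cloud computing"},
               {"label": "IT_PRODUCT", "pattern": "cloud services"}]

my_ruler.add_patterns(my_patterns)

# Re-run the pipeline so the ruler is applied to the text.
mydoc = my_nlp(my_text)
for ent in mydoc.ents:
    print(ent.label_, ent.text)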

IT_PRODUCT Cloud computing
IT_PRODUCT cloud services
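
The two phrase patterns do not pick up the standalone "cloud" in "the cloud". One possible extension is a token pattern, which can ignore casing and make the second word optional. This is a hedged sketch, not the notebook's own solution: it assumes a blank pipeline and a shortened version of the exercise text with straight quotes, and the names `cloud_nlp`, `cloud_ruler`, `cloud_patterns`, `sample` and `cloud_ents` are illustrative.

```python
import spacy

# Blank pipeline so the example runs without a trained model.
cloud_nlp = spacy.blank("en")
cloud_ruler = cloud_nlp.add_pipe("entity_ruler")

# One token pattern: "cloud" in any casing, optionally followed by
# "computing" or "services" (also matched on the lowercase form).
cloud_patterns = [
    {
        "label": "IT_PRODUCT",
        "pattern": [
            {"LOWER": "cloud"},
            {"LOWER": {"IN": ["computing", "services"]}, "OP": "?"},
        ],
    }
]
cloud_ruler.add_patterns(cloud_patterns)

# Shortened exercise text (an assumption made to keep the example compact).
sample = (
    "Cloud computing is the delivery of computing services over the Internet "
    '("the cloud"). You typically pay only for cloud services you use.'
)
cloud_ents = [(ent.text, ent.label_) for ent in cloud_nlp(sample).ents]
print(cloud_ents)
```

When matches overlap, the entity ruler keeps the longest one, so "Cloud computing" and "cloud services" are kept whole and the single token "cloud" is labelled only where it stands alone.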
