spaCy Version 3.0.6

**NER Notebook**

Created by: Tan Poh Keam, Republic Polytechnic

Acknowledgement: n/a

This notebook demonstrates how token patterns can be used in the Entity Ruler.
The tested environment is based on Python 3.8 and spaCy 3.


In [1]:
import spacy
print(spacy.__version__)
from spacy import displacy

3.0.6


**Preparing**

Let us load the standard English pipeline.
Its components include the statistical NER model.

In [2]:
nlp = spacy.load("en_core_web_sm")
print("Standard Pipeline Components: \n", nlp.pipe_names)

Standard Pipeline Components: 
 ['tok2vec', 'tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer']


In [4]:
my_text = "Cloud computing is the delivery of computing services—including servers, \
storage, databases, networking, software, analytics, and intelligence—over the Internet  \
to offer faster innovation, flexible resources, and economies of scale. You typically pay only for \
cloud services you use, helping you lower your operating costs, run your infrastructure more efficiently, \
and scale as your business needs change. The current market leaders are Microsoft, Amazon and Google. \
The Google Docs is an example of an application services that is offered on the cloud."

doc = nlp(my_text)
# Only ORG entities should be identified at this point
for ent in doc.ents:
    print(ent.label_, ent.text)

ORG Microsoft
ORG Amazon
ORG Google


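We want the pipeline to also recognise the words cloud, cloud services, and cloud computing as entities. Let's define some token patterns. Each dictionary in a pattern describes one token, and `LOWER` is compared against the token's lowercase form, so both Cloud and cloud match. The patterns below cover cloud and cloud services.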
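
To see what a token pattern is compared against, here is a minimal sketch that prints the tokens of a short phrase together with their lowercase forms. It assumes a blank English pipeline (tokenizer only) is sufficient for this purpose; the names `blank_nlp` and `sample` and the sample phrase are made up for illustration.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) is enough here,
# so no trained model has to be loaded.
blank_nlp = spacy.blank("en")
sample = blank_nlp("Cloud services and CLOUD computing")

# One pattern dictionary is compared against one token.
token_texts = [token.text for token in sample]
# {"LOWER": "cloud"} is compared against token.lower_, not token.text.
token_lowers = [token.lower_ for token in sample]

print(token_texts)
print(token_lowers)
```

Because the comparison uses the lowercase form, a single `{"LOWER": "cloud"}` entry should cover Cloud, cloud, and CLOUD, and a two-token phrase needs two dictionaries in the pattern list.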


In [5]:
patterns = [
    {"label": "IT PROD", "pattern": [{"LOWER": "cloud"}]},
    {"label": "IT PROD", "pattern": [{"LOWER": "cloud"}, {"LOWER": "services"}]},
]

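Add the entity ruler to the pipeline, then add the patterns to it. By default `add_pipe` appends the component to the end of the pipeline, so the ruler runs after the statistical `ner` component; setting `overwrite_ents` to `True` lets its matches replace any overlapping entities that `ner` has already set.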

In [6]:
ruler = nlp.add_pipe("entity_ruler", config={"overwrite_ents": True})

In [7]:
ruler.add_patterns(patterns)

In [8]:
doc = nlp(my_text)
for ent in doc.ents:
    print(ent.label_, ent.text)

IT PROD Cloud
IT PROD cloud services
ORG Microsoft
ORG Amazon
ORG Google
IT PROD cloud
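
Notice that cloud services is reported as one entity, even though the single-token cloud pattern also matches its first word. This is consistent with the entity ruler's documented behaviour of giving priority to the match that covers the most tokens when matches overlap. The sketch below reproduces this on a blank English pipeline so that the statistical model is not involved; the names `demo_nlp`, `demo_ruler`, and the sample sentence are assumptions made for illustration.

```python
import spacy

# Assumption: a blank pipeline isolates the ruler from the statistical NER.
demo_nlp = spacy.blank("en")
demo_ruler = demo_nlp.add_pipe("entity_ruler")
demo_ruler.add_patterns([
    {"label": "IT PROD", "pattern": [{"LOWER": "cloud"}]},
    {"label": "IT PROD", "pattern": [{"LOWER": "cloud"}, {"LOWER": "services"}]},
])

# Hypothetical sample sentence with one two-token and one single-token match.
demo_doc = demo_nlp("You pay only for cloud services offered on the cloud.")
demo_ents = [(ent.label_, ent.text) for ent in demo_doc.ents]
print(demo_ents)
```

Both patterns match at the word cloud in cloud services, but only the longer span is kept there, while the standalone cloud at the end is still picked up by the single-token pattern.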


We can visualise the NER results using the displacy utility.

In [9]:
# displacy.render is a utility that helps us visualise the NER results.
displacy.render(doc, style="ent")
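
A custom label such as IT PROD has no predefined colour in displacy, so it may help to set one. The sketch below is one possible way to do this through the `options` argument, using the `ents` and `colors` keys; the colour value, the sample sentence, and the names `viz_nlp`, `viz_ruler`, and `viz_doc` are assumptions chosen for illustration.

```python
import spacy
from spacy import displacy

# Assumption: a blank pipeline with only the ruler keeps the example small.
viz_nlp = spacy.blank("en")
viz_ruler = viz_nlp.add_pipe("entity_ruler")
viz_ruler.add_patterns([{"label": "IT PROD", "pattern": [{"LOWER": "cloud"}]}])
viz_doc = viz_nlp("Google Docs is offered on the cloud.")

# "ents" limits which labels are drawn; "colors" maps a label to a colour.
# The colour value is an arbitrary choice.
options = {"ents": ["IT PROD"], "colors": {"IT PROD": "#aa9cfc"}}

# jupyter=False makes render return the HTML markup as a string.
html = displacy.render(viz_doc, style="ent", options=options, jupyter=False)
```

Inside a notebook, leaving out `jupyter=False` should display the highlighted text directly, as in the cell above.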

**Exercise**

The example above is incomplete. The first two tokens (Cloud computing) should be treated as one entity.
Adjust the given pattern so that it is more accurate.


In [None]:
## your code ##