<h1 align='center'>NER: Named Entity Recognition</h1>

Data Scientist: Dr. Eddy Giusepe Chirinos Isidro

Study link:

* [A simple example of using NER](https://www.youtube.com/watch?v=2XUhKpH0p4M&list=PLeo1K3hjS3uuvuAXhYjV2lMEShq2UYSwX&index=12)

# <font color="red">NER Example 1:</font>

In [40]:
import spacy

In [41]:
# Pretrained model:
nlp = spacy.load("en_core_web_sm")

In [42]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']
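The component list above belongs to the pretrained `en_core_web_sm` pipeline. As a rough contrast, and assuming only that the spaCy package itself is installed, the sketch below shows that a blank pipeline starts with no components at all, so `ner` is only present when a model provides it or a component is added explicitly. The variable names (`blank_nlp`, `names_before`, `names_after`) are illustrative and not part of the original notebook.

```python
import spacy

# A blank English pipeline has a tokenizer but no pipeline components
blank_nlp = spacy.blank("en")
names_before = list(blank_nlp.pipe_names)

# Components are added by their registered factory name
blank_nlp.add_pipe("sentencizer")
names_after = list(blank_nlp.pipe_names)

print(names_before)
print(names_after)
print(blank_nlp.has_pipe("ner"))
```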

In [43]:
doc = nlp("Tesla Inc is going to acquire Twitter Inc for $45 billion.")

In [44]:
doc.ents

(Tesla Inc, Twitter Inc, $45 billion)

In [45]:
for ent in doc.ents:
    print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))


Tesla Inc | ORG | Companies, agencies, institutions, etc.
Twitter Inc | ORG | Companies, agencies, institutions, etc.
$45 billion | MONEY | Monetary values, including unit


In [46]:
from spacy import displacy

displacy.render(doc, style="ent", jupyter=True)

spaCy's `NER` is not perfect. For example, given the sentence:

```
Tesla Inc is going to acquire twitter for $45 billion.
```
it recognizes only `Tesla Inc` and `$45 billion`; the lowercase `twitter` is missed.
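One possible way to patch a miss like this is a hand-written rule. The sketch below is only an illustration under stated assumptions: it uses a blank English pipeline (so there is no statistical `ner` component and only the rule fires) together with an `entity_ruler` pattern on the `LOWER` token attribute, which compares the lowercased token text. The names `rule_nlp`, `rule_ruler` and `results` are hypothetical.

```python
import spacy

# Blank pipeline: no pretrained NER, only the rule defined here
rule_nlp = spacy.blank("en")
rule_ruler = rule_nlp.add_pipe("entity_ruler")

# LOWER matches on the lowercased token, so "twitter" and "Twitter" both hit
rule_ruler.add_patterns([{"label": "ORG", "pattern": [{"LOWER": "twitter"}]}])

sentences = [
    "Tesla Inc is going to acquire twitter for $45 billion.",
    "Tesla Inc is going to acquire Twitter for $45 billion.",
]

results = []
for sentence in sentences:
    rule_doc = rule_nlp(sentence)
    results.append([(ent.text, ent.label_) for ent in rule_doc.ents])

print(results)
```

In a pipeline that already contains `ner`, the position of the ruler relative to `ner` matters; `add_pipe` accepts `before=` and `after=` arguments for that purpose.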

In [47]:
nlp.pipe_labels['ner']

['CARDINAL',
 'DATE',
 'EVENT',
 'FAC',
 'GPE',
 'LANGUAGE',
 'LAW',
 'LOC',
 'MONEY',
 'NORP',
 'ORDINAL',
 'ORG',
 'PERCENT',
 'PERSON',
 'PRODUCT',
 'QUANTITY',
 'TIME',
 'WORK_OF_ART']
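The label names alone are fairly cryptic. A minimal sketch, assuming the label list is copied by hand from the output above, uses `spacy.explain` (the same helper used earlier in this notebook) to print a short description for each one. The names `ner_labels` and `descriptions` are illustrative.

```python
import spacy

# Labels copied from the output of nlp.pipe_labels['ner'] above
ner_labels = [
    "CARDINAL", "DATE", "EVENT", "FAC", "GPE", "LANGUAGE",
    "LAW", "LOC", "MONEY", "NORP", "ORDINAL", "ORG",
    "PERCENT", "PERSON", "PRODUCT", "QUANTITY", "TIME", "WORK_OF_ART",
]

# spacy.explain looks each label up in spaCy's built-in glossary
descriptions = {label: spacy.explain(label) for label in ner_labels}

for label, description in descriptions.items():
    print(f"{label:<12} {description}")
```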

`Span` is a class in `spaCy` that represents a slice of a `Doc`. The following cells show some examples.

In [75]:
doc = nlp("Tesla Inc is going to acquire Twitter for $45 billion.")

In [76]:
doc

Tesla Inc is going to acquire Twitter for $45 billion.

In [77]:
doc[0]

Tesla

In [78]:
doc[2:5]    # type(doc[2:5]) --> spacy.tokens.span.Span

is going to
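Because a `Span` is addressed by token indices, it helps to see how the sentence is tokenized. The sketch below assumes a blank English pipeline (tokenizer only), which is enough for inspecting indices; `tok_nlp`, `tok_doc` and `span` are illustrative names.

```python
import spacy
from spacy.tokens import Span

# Only the tokenizer is needed to inspect token indices
tok_nlp = spacy.blank("en")
tok_doc = tok_nlp("Tesla Inc is going to acquire Twitter for $45 billion.")

# Print each token with its index in the Doc
for token in tok_doc:
    print(token.i, token.text)

# Slicing a Doc yields a Span; the end index is exclusive
span = tok_doc[2:5]
print(isinstance(span, Span))
print(span.start, span.end, span.text)
```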

In [79]:
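from spacy.tokens import Span

# Token indices: "Tesla Inc" is doc[0:2] and "Twitter" is doc[6:7] (end index is exclusive)
s1 = Span(doc, 0, 2, label="ORG")
s2 = Span(doc, 6, 7, label="ORG")

# default="unmodified" keeps the entities the model already found (e.g. "$45 billion")
doc.set_ents([s1, s2], default="unmodified")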

In [82]:
for ent in doc.ents:
    print(ent.text, "|", ent.label_, "|")


Tesla Inc | ORG |
Twitter | ORG |
$45 billion | MONEY |
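Counting token indices by hand is easy to get wrong. As an alternative sketch, `Doc.char_span` builds a `Span` from character offsets, which can be computed from the text itself. This example assumes a blank English pipeline, so `doc.ents` starts empty and only the manually added entity appears; `char_nlp`, `char_doc`, `sentence` and `twitter` are hypothetical names.

```python
import spacy

char_nlp = spacy.blank("en")
sentence = "Tesla Inc is going to acquire Twitter for $45 billion."
char_doc = char_nlp(sentence)

# Locate the entity by character offsets instead of token indices
start = sentence.index("Twitter")
end = start + len("Twitter")

# char_span returns None if the offsets do not align with token boundaries
twitter = char_doc.char_span(start, end, label="ORG")

if twitter is not None:
    char_doc.set_ents([twitter])

print([(ent.text, ent.label_) for ent in char_doc.ents])
```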


# <font color="red">NER Example 2:</font>

In [83]:
# Import the required library
import spacy

# Sample text
text = "This is a sample number (555) 555-5555."

# Start from a blank English pipeline (tokenizer only, no pretrained components)
nlp = spacy.blank("en")

# Create the entity ruler and add it to the pipeline
ruler = nlp.add_pipe("entity_ruler")

# List of entities and patterns (source: https://spacy.io/usage/rule-based-matching)
patterns = [
    {
        "label": "PHONE_NUMBER",
        "pattern": [
            {"ORTH": "("},
            {"SHAPE": "ddd"},
            {"ORTH": ")"},
            {"SHAPE": "ddd"},
            {"ORTH": "-", "OP": "?"},
            {"SHAPE": "dddd"},
        ],
    }
]

# Add the patterns to the ruler
ruler.add_patterns(patterns)

# Create the doc
doc = nlp(text)

# Extract the entities
for ent in doc.ents:
    print(ent.text, ent.label_)

(555) 555-5555 PHONE_NUMBER
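In the pattern above, `"OP": "?"` makes the hyphen token optional, while the parenthesized area code is required. A small sketch, assuming the same pattern on a fresh blank pipeline, checks three variants of the sample text; `phone_nlp`, `phone_ruler`, `samples` and `found` are illustrative names.

```python
import spacy

phone_nlp = spacy.blank("en")
phone_ruler = phone_nlp.add_pipe("entity_ruler")
phone_ruler.add_patterns([
    {
        "label": "PHONE_NUMBER",
        "pattern": [
            {"ORTH": "("},
            {"SHAPE": "ddd"},
            {"ORTH": ")"},
            {"SHAPE": "ddd"},
            {"ORTH": "-", "OP": "?"},  # zero or one hyphen token
            {"SHAPE": "dddd"},
        ],
    }
])

samples = [
    "This is a sample number (555) 555-5555.",  # with hyphen
    "This is a sample number (555) 555 5555.",  # hyphen omitted
    "This is a sample number 555-5555.",        # no area code in parentheses
]

# Collect the matched entity texts for each sample
found = [[ent.text for ent in phone_nlp(sample).ents] for sample in samples]
print(found)
```

The first two samples match because the hyphen is optional; the third does not, since the pattern requires the opening parenthesis and area code.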
