<a href="https://colab.research.google.com/github/DarkLord-13/Machine-Learning-01/blob/main/NER_(Named_Entity_Recognition).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

***spacy***: spaCy is a popular open-source Python library for natural language processing (NLP).

***load('en_core_web_sm')***: spacy.load loads a trained spaCy pipeline. Here it loads 'en_core_web_sm', a pre-trained English pipeline. The pipeline is distributed as a separate package rather than bundled with spaCy, so it must be installed first (for example by running python -m spacy download en_core_web_sm). The 'sm' in the name stands for "small": it is smaller and faster than the larger English pipelines, at some cost in accuracy, and is adequate for most common NLP tasks.
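
If the pipeline package is not installed, spacy.load raises an OSError. The sketch below shows one way to handle that case. The helper name load_or_blank is hypothetical (it is not part of spaCy), and falling back to a blank English pipeline is an assumption made for illustration: a blank pipeline only tokenizes, so doc.ents would be empty.

```python
import spacy


def load_or_blank(name: str = "en_core_web_sm"):
    """Hypothetical helper: load a packaged pipeline, or fall back to a blank one."""
    try:
        return spacy.load(name)
    except OSError:
        # spaCy raises OSError when the named pipeline package cannot be found.
        print(f"Pipeline {name!r} not found; install it with: "
              f"python -m spacy download {name}")
        # Assumption for this sketch: a tokenizer-only pipeline is an acceptable fallback.
        return spacy.blank("en")


nlp = load_or_blank("en_core_web_sm")
print(nlp.pipe_names)  # empty list if the fallback was used
```

In a notebook that depends on named entities, it is probably better to stop and install the package than to continue with the fallback.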

In [1]:
import spacy

nlp = spacy.load('en_core_web_sm')

In [2]:
# lists the names of the pipeline components, in the order they are applied to the text
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [3]:
doc = nlp('Tesla is going to acquire Twitter Inc for $45 billion')

In [4]:
for ent in doc.ents:
  print(ent.text, '|', ent.label_)

Tesla | ORG
Twitter Inc | ORG
$45 billion | MONEY


In [5]:
from spacy import displacy

displacy.render(doc, style='ent', jupyter=True)

In [6]:
nlp.pipe_labels['ner']

['CARDINAL',
 'DATE',
 'EVENT',
 'FAC',
 'GPE',
 'LANGUAGE',
 'LAW',
 'LOC',
 'MONEY',
 'NORP',
 'ORDINAL',
 'ORG',
 'PERCENT',
 'PERSON',
 'PRODUCT',
 'QUANTITY',
 'TIME',
 'WORK_OF_ART']

In [7]:
doc = nlp('Michael Bloomberg founded Bloomberg Inc in 1982')

for ent in doc.ents:
  print(ent.text, '|', ent.label_, spacy.explain(ent.label_))

Michael Bloomberg | PERSON People, including fictional
Bloomberg Inc | ORG Companies, agencies, institutions, etc.
1982 | DATE Absolute or relative dates or periods
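
spacy.explain looks labels up in spaCy's built-in glossary, so it can also be called without a loaded pipeline. As a small sketch, the snippet below builds a glossary for a hand-picked subset of the NER labels listed earlier; the subset and the made-up label NOT_A_LABEL are chosen only for illustration.

```python
import spacy

# A hand-picked subset of the labels returned by nlp.pipe_labels['ner'].
labels = ["ORG", "PERSON", "DATE", "MONEY", "GPE", "NORP"]

# spacy.explain returns a short description for each label it knows.
glossary = {label: spacy.explain(label) for label in labels}
for label, description in glossary.items():
    print(f"{label:<8}{description}")

# Labels that are not in the glossary yield None instead of raising an error.
unknown = spacy.explain("NOT_A_LABEL")
print(unknown)
```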


A Doc can be sliced much like a string, but it is a sequence of tokens rather than characters: the slice indices are token positions, and the result is a Span object (a view into the Doc), not a substring.

In [8]:
type(doc)

spacy.tokens.doc.Doc

In [9]:
doc[0:3]

Michael Bloomberg founded
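
To make the difference between token indices and character indices concrete, here is a minimal sketch. It uses spacy.blank('en'), a tokenizer-only pipeline, as an assumption so that it runs without downloading a model; slicing behaves the same way on a Doc produced by 'en_core_web_sm'.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")  # tokenizer only; enough to demonstrate slicing
doc = nlp("Michael Bloomberg founded Bloomberg Inc in 1982")

span = doc[0:3]  # the first three tokens, not the first three characters
print(type(span).__name__, "|", span.text)

# A Span records both its token offsets and its character offsets in the Doc.
print(span.start, span.end)
print(span.start_char, span.end_char)

# Slicing the underlying string works on characters instead.
print(doc.text[0:3])

# Indexing with a single integer returns a Token rather than a Span.
print(type(doc[0]).__name__)
```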

In [10]:
doc = nlp('Tesla is going to acquire Twitter for $45 billion')
for ent in doc.ents:
  print(ent.text, '|', ent.label_, spacy.explain(ent.label_))

Tesla | ORG Companies, agencies, institutions, etc.
Twitter | PRODUCT Objects, vehicles, foods, etc. (not services)
$45 billion | MONEY Monetary values, including unit


In [11]:
from spacy.tokens import Span

In [12]:
s1 = Span(doc, 0, 1, label='ORG')
s2 = Span(doc, 5, 6, label='ORG')

doc.set_ents([s1, s2], default='unmodified') # sets the new entity spans; default='unmodified' keeps the existing annotations on all other tokens

In [13]:
# now when we run the same code, it identifies Tesla and Twitter as ORG
for ent in doc.ents:
  print(ent.text, '|', ent.label_, spacy.explain(ent.label_))

Tesla | ORG Companies, agencies, institutions, etc.
Twitter | ORG Companies, agencies, institutions, etc.
$45 billion | MONEY Monetary values, including unit
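
Hard-coded token positions such as Span(doc, 5, 6) break as soon as the sentence changes. As a hedged alternative, the sketch below builds the spans from character offsets with doc.char_span and contrasts default='unmodified' with set_ents' own default, 'outside'. The helper span_for is hypothetical, and a blank pipeline is assumed so that the example runs without a downloaded model (the MONEY entity is therefore set by hand instead of being predicted).

```python
import spacy

nlp = spacy.blank("en")  # assumption: tokenizer-only pipeline, no predicted entities
text = "Tesla is going to acquire Twitter for $45 billion"


def span_for(doc, phrase, label):
    """Hypothetical helper: label the first occurrence of phrase in doc."""
    start = doc.text.find(phrase)
    if start == -1:
        return None
    # char_span returns None if the offsets do not align with token boundaries.
    return doc.char_span(start, start + len(phrase), label=label)


doc = nlp(text)
doc.set_ents([span_for(doc, "$45 billion", "MONEY")])
# default='unmodified' keeps the MONEY entity that is already set.
doc.set_ents(
    [span_for(doc, "Tesla", "ORG"), span_for(doc, "Twitter", "ORG")],
    default="unmodified",
)
kept = [(ent.text, ent.label_) for ent in doc.ents]
print(kept)

other = nlp(text)
other.set_ents([span_for(other, "$45 billion", "MONEY")])
# With the default ('outside'), every token not in the new spans is reset.
other.set_ents([span_for(other, "Tesla", "ORG")])
reset = [(ent.text, ent.label_) for ent in other.ents]
print(reset)
```

The second half shows why the notebook passes default='unmodified': without it, the existing MONEY annotation is dropped.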
