# Basic NLP Course

## Named Entity Recognition (NER)

Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP) that identifies named entities in text and classifies them into predefined categories such as persons, organizations, locations, and dates.

- **Definition**: NER extracts structured information from unstructured text by locating entity mentions and assigning each one a type.
- **Example**: In the sentence "Google was founded by Larry Page and Sergey Brin in California," NER identifies:
    - "Google" → Organization
    - "Larry Page" → Person
    - "Sergey Brin" → Person
    - "California" → Location
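To make the "entity plus type" idea concrete, here is a minimal sketch that reproduces the example above with a hand-written lookup table. The `GAZETTEER` dictionary and the `find_entities` helper are hypothetical names introduced only for illustration. A statistical model such as the spaCy pipeline used below predicts entities from context rather than looking them up, so treat this as a toy, not as how NER is actually implemented.

```python
# Hypothetical hand-written lookup table (a "gazetteer") for the example sentence.
GAZETTEER = {
    "Google": "Organization",
    "Larry Page": "Person",
    "Sergey Brin": "Person",
    "California": "Location",
}


def find_entities(sentence, gazetteer):
    """Return (text, label, start_char) for the first occurrence of each
    gazetteer entry found in the sentence, sorted by position."""
    found = []
    for name, label in gazetteer.items():
        start = sentence.find(name)  # -1 when the name is absent
        if start != -1:
            found.append((name, label, start))
    return sorted(found, key=lambda item: item[2])


sentence = "Google was founded by Larry Page and Sergey Brin in California."
results = find_entities(sentence, GAZETTEER)

for name, label, start in results:
    print(f"{name!r} -> {label} (starts at character {start})")
```

A lookup like this fails on any name that is missing from the table and cannot tell apart different senses of the same string, which is the gap a trained NER model is meant to close.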


In [10]:
import spacy
from spacy import displacy
from spacy.tokens import Span

nlp = spacy.load('en_core_web_sm')

In [3]:
# show the pipeline components
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [4]:
text = "The Haber-Bosch process, developed by Fritz Haber and Carl Bosch in the early 20th century, is a key method for synthesizing ammonia. Ammonia is widely used in fertilizers and is produced in facilities like BASF in Germany. The process operates under high pressure and temperature, utilizing catalysts such as iron."

In [6]:
doc = nlp(text)

for ent in doc.ents:
    print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

Fritz Haber | PERSON | People, including fictional
Carl Bosch | PERSON | People, including fictional
the early 20th century | DATE | Absolute or relative dates or periods
Ammonia | GPE | Countries, cities, states
BASF | ORG | Companies, agencies, institutions, etc.
Germany | GPE | Countries, cities, states
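Note that the model labels "Ammonia" as `GPE`, which is a misclassification: ammonia is a chemical compound, not a geopolitical entity. Grouping the predictions by label makes slips like this easier to spot. The sketch below is one possible way to do that. It works on `(text, label)` pairs copied by hand from the output above so that it runs without a model; in the notebook you would build the same list with `[(ent.text, ent.label_) for ent in doc.ents]`.

```python
from collections import defaultdict

# (text, label) pairs copied from the printed output above.
entities = [
    ("Fritz Haber", "PERSON"),
    ("Carl Bosch", "PERSON"),
    ("the early 20th century", "DATE"),
    ("Ammonia", "GPE"),
    ("BASF", "ORG"),
    ("Germany", "GPE"),
]

# Collect entity texts under their predicted label, preserving text order.
by_label = defaultdict(list)
for ent_text, label in entities:
    by_label[label].append(ent_text)

for label, texts in by_label.items():
    print(label, "->", texts)
```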


In [None]:
# highlight the entities inline (renders directly in a Jupyter notebook)
displacy.render(doc, style='ent')

In [9]:
# see which entity categories the NER component supports
nlp.pipe_labels['ner']

['CARDINAL',
 'DATE',
 'EVENT',
 'FAC',
 'GPE',
 'LANGUAGE',
 'LAW',
 'LOC',
 'MONEY',
 'NORP',
 'ORDINAL',
 'ORG',
 'PERCENT',
 'PERSON',
 'PRODUCT',
 'QUANTITY',
 'TIME',
 'WORK_OF_ART']

In [16]:
doc[1:4]

Haber-Bosch

In [18]:
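# wrap tokens 1-3 ("Haber-Bosch") in a Span with a custom label
s1 = Span(doc, 1, 4, label='PROCESS_METHOD')

# add the span to doc.ents; default='unmodified' keeps the entities the model already found
doc.set_ents([s1], default='unmodified')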

In [19]:
for ent in doc.ents:
    print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

Haber-Bosch | PROCESS_METHOD | None
Fritz Haber | PERSON | People, including fictional
Carl Bosch | PERSON | People, including fictional
the early 20th century | DATE | Absolute or relative dates or periods
Ammonia | GPE | Countries, cities, states
BASF | ORG | Companies, agencies, institutions, etc.
Germany | GPE | Countries, cities, states
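The custom label prints `None` as its description because `spacy.explain` only knows spaCy's built-in labels. Also, hard-coding token indices as in `Span(doc, 1, 4, ...)` is brittle if the text changes. One alternative, sketched below, is to locate the phrase by character offsets with `doc.char_span`. The sketch assumes spaCy v3 and uses `spacy.blank("en")` (tokenizer only, no trained model) on a shortened version of the text, so `doc.ents` starts out empty here, unlike in the cells above.

```python
import spacy

# Tokenizer-only English pipeline: no model download, no predicted entities.
nlp = spacy.blank("en")

text = "The Haber-Bosch process is a key method for synthesizing ammonia."
doc = nlp(text)

# Locate the phrase by character offsets instead of hard-coded token indices.
phrase = "Haber-Bosch"
start = text.index(phrase)
span = doc.char_span(start, start + len(phrase), label="PROCESS_METHOD")

# char_span returns None when the offsets do not line up with token boundaries.
if span is None:
    raise ValueError(f"{phrase!r} does not align with token boundaries")

# Add the span to doc.ents, leaving any existing entities untouched.
doc.set_ents([span], default="unmodified")

entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```

Setting entities this way annotates only this one `Doc`. It does not teach the model the new label, so other texts would need the same manual step or a rule-based or retrained component.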


