# Performing named entity recognition using spaCy

Named Entity Recognition (NER) is a Natural Language Processing (NLP) task that identifies and classifies key pieces of information in unstructured text. These "named entities" include proper nouns such as people, organizations and locations, as well as other meaningful categories such as dates, monetary values and products. Tagging these entities turns raw text into structured data that can be analyzed, indexed or fed into downstream applications.

INSTALL SPACY AND THE PRE-TRAINED MODEL

In [1]:
!pip install spacy



In [2]:
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


IMPORT SPACY

In [3]:
import spacy

INITIALIZE THE SPACY ENGINE

In [4]:
nlp = spacy.load("en_core_web_sm")
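
The loaded pipeline reports entity types as short label codes such as ORG or GPE. As a small aside, spaCy's `spacy.explain` helper returns a brief description for a label. The sketch below assumes only that spaCy is installed; the label list is copied from the labels that appear in this notebook's output and is not the model's full label set.

```python
import spacy

# Labels copied from this notebook's NER output; not the model's full label set.
labels = ["CARDINAL", "ORG", "ORDINAL", "GPE", "DATE", "PERSON", "PERCENT", "NORP"]

# spacy.explain returns a short description string, or None for an unknown label.
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(f"{label:<9} {description}")
```

This lookup does not touch the model itself, so it can be run before or after processing any text.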

INITIALIZE THE ARTICLE TEXT

In [6]:
article = """iPhone 12: Apple makes jump to 5G
Apple has confirmed its iPhone 12 handsets will be its
first to work on faster 5G networks.
The company has also extended the range to include a new
"Mini" model that has a smaller 5.4in screen.
The US firm bucked a wider industry downturn by
increasing its handset sales over the past year.
But some experts say the new features give Apple its best
opportunity for growth since 2014, when it revamped its
line-up with the iPhone 6.
"Networks are going to have to offer eye-wateringly
attractive deals, and the way they're going to do that is
on great tariffs and attractive trade-in deals,"
predicted Ben Wood from the consultancy CCS Insight.
Apple typically unveils its new iPhones in September, but
opted for a later date this year.
It has not said why, but it was widely speculated to be
related to disruption caused by the coronavirus pandemic.
The firm's shares ended the day 2.7% lower.
This has been linked to reports that several Chinese
internet platforms opted not to carry the livestream,
although it was still widely viewed and commented on via
the social media network Sina Weibo."""

CREATE THE SPACY DOC OBJECT

In [7]:
doc = nlp(article)
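
Each entity exposed by `doc.ents` is a span over the `Doc`, addressed by character offsets into the original string. The sketch below illustrates that relationship on the article's headline alone. It assumes a blank English pipeline (`spacy.blank("en")`, tokenizer only) and a hand-placed annotation rather than a model prediction, so it runs without `en_core_web_sm`. The names `blank_nlp`, `headline` and `headline_doc` are illustrative and chosen so they do not overwrite `nlp` or `doc`.

```python
import spacy

# Tokenizer-only pipeline: no trained components, so nothing is predicted here.
blank_nlp = spacy.blank("en")
headline = "iPhone 12: Apple makes jump to 5G"
headline_doc = blank_nlp(headline)

# Build a span from character offsets 11..16 (end exclusive) and label it by hand.
span = headline_doc.char_span(11, 16, label="ORG")
headline_doc.set_ents([span])

for ent in headline_doc.ents:
    # start_char/end_char slice the original string to recover the entity text.
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
    assert headline[ent.start_char:ent.end_char] == ent.text
```

The same slicing holds for entities predicted by the trained model: `article[ent.start_char:ent.end_char]` returns `ent.text`.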

LOOP THROUGH THE ENTITIES AND PRINT THEIR INFORMATION

In [9]:
# Print each entity's text, character offsets and predicted label
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

12 7 9 CARDINAL
Apple 11 16 ORG
5 31 32 CARDINAL
Apple 34 39 ORG
12 65 67 CARDINAL
first 89 94 ORDINAL
5 113 114 CARDINAL
5.4 215 218 CARDINAL
US 233 235 GPE
the past year 311 324 DATE
Apple 369 374 ORG
2014 413 417 DATE
6 464 465 CARDINAL
Ben Wood 636 644 PERSON
CCS Insight 666 677 ORG
Apple 679 684 ORG
iPhones 711 718 ORG
September 722 731 DATE
a later date this year 747 769 DATE
2.7% 917 921 PERCENT
Chinese 974 981 NORP
Sina 1118 1122 ORG
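
To show how this output becomes structured data, here is a minimal sketch that groups entities by label using only the standard library. The rows are a subset copied from the printed output above, and `group_by_label` is a hypothetical helper name, not part of spaCy.

```python
from collections import defaultdict

# (text, start_char, end_char, label) rows copied from the printed output above.
rows = [
    ("12", 7, 9, "CARDINAL"),
    ("Apple", 11, 16, "ORG"),
    ("5", 31, 32, "CARDINAL"),
    ("Apple", 34, 39, "ORG"),
    ("US", 233, 235, "GPE"),
    ("2014", 413, 417, "DATE"),
    ("Ben Wood", 636, 644, "PERSON"),
    ("CCS Insight", 666, 677, "ORG"),
    ("iPhones", 711, 718, "ORG"),
    ("2.7%", 917, 921, "PERCENT"),
    ("Sina", 1118, 1122, "ORG"),
]


def group_by_label(rows):
    """Map each label to the distinct entity texts seen, in first-seen order."""
    grouped = defaultdict(list)
    for text, _start, _end, label in rows:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


grouped = group_by_label(rows)
for label, texts in grouped.items():
    print(f"{label}: {', '.join(texts)}")
```

With a live `doc`, the rows could instead be built as `[(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]`. Grouping also makes predictions worth reviewing easier to spot, such as "iPhones" tagged as ORG, or "Sina" extracted without "Weibo".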
