
# Named Entity Recognition

Named entity recognition (NER) identifies the key elements in a text, such as the names of people, places, and brands, as well as monetary values.

Extracting the main entities from a text helps sort unstructured data and surface the important information, which is crucial when you have to deal with large datasets.

Let’s look at some interesting use cases of named entity recognition:

## Customer Support

<img src="https://user-images.githubusercontent.com/32620288/146425311-b20e1295-d11d-481d-a799-0238bcd85de0.png" width="560" height="560">

Consider customer support, where teams deal with a rising number of tickets. Named entity recognition techniques can help handle those customer requests faster.

From a business perspective, automating repetitive customer service tasks, such as categorizing customers’ issues and queries, saves valuable time. As a result, it helps improve resolution rates and boost customer satisfaction.

Entity extraction can also pull out the relevant pieces of information, such as product names or serial numbers, which makes it easier to route each ticket to the most suitable agent or team.
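The routing idea can be sketched in a few lines. This is a minimal, rule-based stand-in rather than a trained NER model: the product catalog, the team names, and the serial-number format are all assumptions made up for illustration.

```python
import re

# Hypothetical catalog: product name (lowercase) -> support team.
PRODUCT_TEAMS = {
    "smartwatch x2": "wearables",
    "homehub": "smart-home",
}

# Assumed serial-number format: two capital letters, a hyphen, six digits.
SERIAL_RE = re.compile(r"\b[A-Z]{2}-\d{6}\b")


def extract_entities(ticket):
    """Pull known product names and serial numbers out of a ticket's text."""
    lowered = ticket.lower()
    products = [name for name in PRODUCT_TEAMS if name in lowered]
    serials = SERIAL_RE.findall(ticket)
    return {"products": products, "serials": serials}


def route_ticket(ticket, default_team="general"):
    """Route to the team of the first recognized product, else the default."""
    products = extract_entities(ticket)["products"]
    return PRODUCT_TEAMS[products[0]] if products else default_team


ticket = "My HomeHub (serial HH-204518) keeps dropping its Wi-Fi connection."
print(extract_entities(ticket))
print(route_ticket(ticket))
```

In practice, the dictionary lookup would likely be replaced by a statistical entity extractor, but the routing step that consumes its output can stay this simple.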

In [8]:
article = '''
Asian shares skidded on Tuesday after a rout in tech stocks put Wall Street to the sword, while a 
sharp drop in oil prices and political risks in Europe pushed the dollar to 16-month highs as investors dumped 
riskier assets. MSCI’s broadest index of Asia-Pacific shares outside Japan dropped 1.7 percent to a 1-1/2 
week trough, with Australian shares sinking 1.6 percent. Japan’s Nikkei dived 3.1 percent led by losses in 
electric machinery makers and suppliers of Apple’s iphone parts. Sterling fell to $1.286 after three straight 
sessions of losses took it to the lowest since Nov.1 as there were still considerable unresolved issues with the
European Union over Brexit, British Prime Minister Theresa May said on Monday.'''

In [9]:
import nltk
from nltk import word_tokenize, pos_tag, ne_chunk
from nltk.chunk import conlltags2tree, tree2conlltags
from pprint import pprint

In [10]:
nltk.download('words')
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')
nltk.download('maxent_ne_chunker')

[nltk_data] Downloading package words to
[nltk_data]     C:\Users\divak\AppData\Roaming\nltk_data...
[nltk_data]   Package words is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\divak\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\divak\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping tokenizers\punkt.zip.
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     C:\Users\divak\AppData\Roaming\nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!


True

In [11]:
print('NLTK version: %s' % (nltk.__version__))

NLTK version: 3.6.5


In [12]:
def fn_preprocess(art):
    """Tokenize the text, then tag each token with its part of speech."""
    tokens = nltk.word_tokenize(art)
    return nltk.pos_tag(tokens)

art_processed = fn_preprocess(article)
print(art_processed)

[('Asian', 'JJ'), ('shares', 'NNS'), ('skidded', 'VBN'), ('on', 'IN'), ('Tuesday', 'NNP'), ('after', 'IN'), ('a', 'DT'), ('rout', 'NN'), ('in', 'IN'), ('tech', 'JJ'), ('stocks', 'NNS'), ('put', 'VBD'), ('Wall', 'NNP'), ('Street', 'NNP'), ('to', 'TO'), ('the', 'DT'), ('sword', 'NN'), (',', ','), ('while', 'IN'), ('a', 'DT'), ('sharp', 'JJ'), ('drop', 'NN'), ('in', 'IN'), ('oil', 'NN'), ('prices', 'NNS'), ('and', 'CC'), ('political', 'JJ'), ('risks', 'NNS'), ('in', 'IN'), ('Europe', 'NNP'), ('pushed', 'VBD'), ('the', 'DT'), ('dollar', 'NN'), ('to', 'TO'), ('16-month', 'JJ'), ('highs', 'NNS'), ('as', 'IN'), ('investors', 'NNS'), ('dumped', 'VBD'), ('riskier', 'JJR'), ('assets', 'NNS'), ('.', '.'), ('MSCI', 'NNP'), ('’', 'NNP'), ('s', 'VBD'), ('broadest', 'JJS'), ('index', 'NN'), ('of', 'IN'), ('Asia-Pacific', 'NNP'), ('shares', 'NNS'), ('outside', 'IN'), ('Japan', 'NNP'), ('dropped', 'VBD'), ('1.7', 'CD'), ('percent', 'NN'), ('to', 'TO'), ('a', 'DT'), ('1-1/2', 'JJ'), ('week', 'NN'), ('tr
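The tags in this output are Penn Treebank part-of-speech tags. As a reading aid, the sketch below maps a handful of them to plain descriptions. The glossary is deliberately partial, and `describe` is a hypothetical helper, not part of NLTK.

```python
# Partial glossary of Penn Treebank tags that appear in the tagged output.
PENN_TAGS = {
    "JJ": "adjective",
    "JJR": "adjective, comparative",
    "JJS": "adjective, superlative",
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "NNP": "proper noun, singular",
    "VBD": "verb, past tense",
    "VBN": "verb, past participle",
    "IN": "preposition or subordinating conjunction",
    "DT": "determiner",
    "CD": "cardinal number",
    "CC": "coordinating conjunction",
}


def describe(tagged_tokens):
    """Return (word, description) pairs; unknown tags are passed through."""
    return [(word, PENN_TAGS.get(tag, tag)) for word, tag in tagged_tokens]


sample = [("Asian", "JJ"), ("shares", "NNS"), ("skidded", "VBN"),
          ("on", "IN"), ("Tuesday", "NNP")]
for word, meaning in describe(sample):
    print(f"{word:10} {meaning}")
```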

In [15]:
results = ne_chunk(art_processed)
print(results)

(S
  (GPE Asian/JJ)
  shares/NNS
  skidded/VBN
  on/IN
  Tuesday/NNP
  after/IN
  a/DT
  rout/NN
  in/IN
  tech/JJ
  stocks/NNS
  put/VBD
  (FACILITY Wall/NNP Street/NNP)
  to/TO
  the/DT
  sword/NN
  ,/,
  while/IN
  a/DT
  sharp/JJ
  drop/NN
  in/IN
  oil/NN
  prices/NNS
  and/CC
  political/JJ
  risks/NNS
  in/IN
  (GPE Europe/NNP)
  pushed/VBD
  the/DT
  dollar/NN
  to/TO
  16-month/JJ
  highs/NNS
  as/IN
  investors/NNS
  dumped/VBD
  riskier/JJR
  assets/NNS
  ./.
  (ORGANIZATION MSCI/NNP)
  ’/NNP
  s/VBD
  broadest/JJS
  index/NN
  of/IN
  Asia-Pacific/NNP
  shares/NNS
  outside/IN
  (GPE Japan/NNP)
  dropped/VBD
  1.7/CD
  percent/NN
  to/TO
  a/DT
  1-1/2/JJ
  week/NN
  trough/NN
  ,/,
  with/IN
  (GPE Australian/JJ)
  shares/NNS
  sinking/VBG
  1.6/CD
  percent/NN
  ./.
  (PERSON Japan/NNP)
  ’/NNP
  s/VBD
  (PERSON Nikkei/NNP)
  dived/VBD
  3.1/CD
  percent/NN
  led/VBN
  by/IN
  losses/NNS
  in/IN
  electric/JJ
  machinery/NN
  makers/NNS
  and/CC
  suppliers/NNS
  of/IN
  

In [16]:
for x in str(results).split('\n'):
    if '/NN' in x:  # substring match: catches NN, NNS, NNP and NNPS
        print(x)

  shares/NNS
  Tuesday/NNP
  rout/NN
  stocks/NNS
  (FACILITY Wall/NNP Street/NNP)
  sword/NN
  drop/NN
  oil/NN
  prices/NNS
  risks/NNS
  (GPE Europe/NNP)
  dollar/NN
  highs/NNS
  investors/NNS
  assets/NNS
  (ORGANIZATION MSCI/NNP)
  ’/NNP
  index/NN
  Asia-Pacific/NNP
  shares/NNS
  (GPE Japan/NNP)
  percent/NN
  week/NN
  trough/NN
  shares/NNS
  percent/NN
  (PERSON Japan/NNP)
  ’/NNP
  (PERSON Nikkei/NNP)
  percent/NN
  losses/NNS
  machinery/NN
  makers/NNS
  suppliers/NNS
  (PERSON Apple/NNP)
  ’/NNP
  iphone/NN
  parts/NNS
  (PERSON Sterling/NN)
  sessions/NNS
  losses/NNS
  Nov.1/NNP
  issues/NNS
  (ORGANIZATION European/NNP Union/NNP)
  (GPE Brexit/NNP)
  (GPE British/NNP)
  Prime/NNP
  Minister/NNP
  (PERSON Theresa/NNP May/NNP)
  Monday/NNP
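The filter above keeps every noun, not only the named entities. To keep just the labeled chunks, one option is to parse the bracketed lines of the same printed output, as in the sketch below. `entities_from_lines` is a hypothetical helper, and the sample lines are copied from the output above. Walking the `Tree` object returned by `ne_chunk` (for example with its `subtrees()` method) would avoid string parsing altogether.

```python
import re

# A few lines copied from the filtered output.
tree_lines = """  shares/NNS
  (FACILITY Wall/NNP Street/NNP)
  (GPE Europe/NNP)
  dollar/NN
  (ORGANIZATION European/NNP Union/NNP)
  (PERSON Theresa/NNP May/NNP)
  Monday/NNP"""

# An entity line looks like "(LABEL word/TAG word/TAG ...)".
ENTITY_RE = re.compile(r"^\((\w+) (.+?)\)+$")


def entities_from_lines(text):
    """Return (label, phrase) pairs for every bracketed chunk line."""
    found = []
    for line in text.split("\n"):
        match = ENTITY_RE.match(line.strip())
        if match:
            label, tokens = match.groups()
            # Drop the "/TAG" suffix from each "word/TAG" token.
            words = [tok.rsplit("/", 1)[0] for tok in tokens.split()]
            found.append((label, " ".join(words)))
    return found


for label, phrase in entities_from_lines(tree_lines):
    print(label, "->", phrase)
```

The labels themselves still need review: in the output above the chunker tags `Nikkei`, `Apple`, and `Sterling` as `PERSON` and `Brexit` as `GPE`.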


In [17]:
pattern = 'NP: {<DT>?<JJ>*<NN>}'  # optional determiner, any adjectives, then a singular (NN) noun
cp = nltk.RegexpParser(pattern)
cs = cp.parse(art_processed)
print(cs)

(S
  Asian/JJ
  shares/NNS
  skidded/VBN
  on/IN
  Tuesday/NNP
  after/IN
  (NP a/DT rout/NN)
  in/IN
  tech/JJ
  stocks/NNS
  put/VBD
  Wall/NNP
  Street/NNP
  to/TO
  (NP the/DT sword/NN)
  ,/,
  while/IN
  (NP a/DT sharp/JJ drop/NN)
  in/IN
  (NP oil/NN)
  prices/NNS
  and/CC
  political/JJ
  risks/NNS
  in/IN
  Europe/NNP
  pushed/VBD
  (NP the/DT dollar/NN)
  to/TO
  16-month/JJ
  highs/NNS
  as/IN
  investors/NNS
  dumped/VBD
  riskier/JJR
  assets/NNS
  ./.
  MSCI/NNP
  ’/NNP
  s/VBD
  broadest/JJS
  (NP index/NN)
  of/IN
  Asia-Pacific/NNP
  shares/NNS
  outside/IN
  Japan/NNP
  dropped/VBD
  1.7/CD
  (NP percent/NN)
  to/TO
  (NP a/DT 1-1/2/JJ week/NN)
  (NP trough/NN)
  ,/,
  with/IN
  Australian/JJ
  shares/NNS
  sinking/VBG
  1.6/CD
  (NP percent/NN)
  ./.
  Japan/NNP
  ’/NNP
  s/VBD
  Nikkei/NNP
  dived/VBD
  3.1/CD
  (NP percent/NN)
  led/VBN
  by/IN
  losses/NNS
  in/IN
  (NP electric/JJ machinery/NN)
  makers/NNS
  and/CC
  suppliers/NNS
  of/IN
  Apple/NNP
  ’/NNP
  s/
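The grammar `NP: {<DT>?<JJ>*<NN>}` reads as: an optional determiner, any number of adjectives, then one `NN` noun. The sketch below imitates that rule with a plain regular expression over the tag sequence. It only illustrates how the pattern can be read and is not NLTK's actual implementation; `chunk_noun_phrases` is a hypothetical helper name.

```python
import re

# Tagged tokens copied from the POS-tagged article.
tagged = [
    ("while", "IN"), ("a", "DT"), ("sharp", "JJ"), ("drop", "NN"),
    ("in", "IN"), ("oil", "NN"), ("prices", "NNS"),
]

# Same shape as the chunk grammar: optional <DT>, any number of <JJ>, one <NN>.
NP_RE = re.compile(r"(?:<DT>)?(?:<JJ>)*<NN>")


def chunk_noun_phrases(tagged_tokens):
    """Return the phrases whose tag sequence matches NP_RE."""
    tag_string = "".join(f"<{tag}>" for _, tag in tagged_tokens)
    phrases = []
    for match in NP_RE.finditer(tag_string):
        # Each "<" before the match start corresponds to one earlier token.
        start = tag_string.count("<", 0, match.start())
        length = match.group().count("<")
        words = [word for word, _ in tagged_tokens[start:start + length]]
        phrases.append(" ".join(words))
    return phrases


print(chunk_noun_phrases(tagged))
```

Because `<NN>` must match the whole tag, plural nouns such as `prices/NNS` stay outside the chunk, which agrees with the parser output above.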

In [18]:
iob_tagged = tree2conlltags(cs)
pprint(iob_tagged)

[('Asian', 'JJ', 'O'),
 ('shares', 'NNS', 'O'),
 ('skidded', 'VBN', 'O'),
 ('on', 'IN', 'O'),
 ('Tuesday', 'NNP', 'O'),
 ('after', 'IN', 'O'),
 ('a', 'DT', 'B-NP'),
 ('rout', 'NN', 'I-NP'),
 ('in', 'IN', 'O'),
 ('tech', 'JJ', 'O'),
 ('stocks', 'NNS', 'O'),
 ('put', 'VBD', 'O'),
 ('Wall', 'NNP', 'O'),
 ('Street', 'NNP', 'O'),
 ('to', 'TO', 'O'),
 ('the', 'DT', 'B-NP'),
 ('sword', 'NN', 'I-NP'),
 (',', ',', 'O'),
 ('while', 'IN', 'O'),
 ('a', 'DT', 'B-NP'),
 ('sharp', 'JJ', 'I-NP'),
 ('drop', 'NN', 'I-NP'),
 ('in', 'IN', 'O'),
 ('oil', 'NN', 'B-NP'),
 ('prices', 'NNS', 'O'),
 ('and', 'CC', 'O'),
 ('political', 'JJ', 'O'),
 ('risks', 'NNS', 'O'),
 ('in', 'IN', 'O'),
 ('Europe', 'NNP', 'O'),
 ('pushed', 'VBD', 'O'),
 ('the', 'DT', 'B-NP'),
 ('dollar', 'NN', 'I-NP'),
 ('to', 'TO', 'O'),
 ('16-month', 'JJ', 'O'),
 ('highs', 'NNS', 'O'),
 ('as', 'IN', 'O'),
 ('investors', 'NNS', 'O'),
 ('dumped', 'VBD', 'O'),
 ('riskier', 'JJR', 'O'),
 ('assets', 'NNS', 'O'),
 ('.', '.', 'O'),
 ('MSCI', 'NNP'