# Named Entity Recognition

In any text document, certain terms represent specific entities that are more informative and have a unique context. These are known as named entities: terms that refer to real-world objects such as people, places, and organizations, and that are usually denoted by proper names.

__Named entity recognition (NER)__, also known as entity chunking or entity extraction, is a popular information extraction technique that identifies and segments named entities and classifies them into predefined classes.

Out-of-the-box NER taggers are available through popular libraries like __`nltk`__ and __`spacy`__. Each library takes a different approach to the problem.

# NER with SpaCy

In [1]:
text = """Three more countries have joined an “international grand committee” of parliaments, adding to calls for 
Facebook’s boss, Mark Zuckerberg, to give evidence on misinformation to the coalition. Brazil, Latvia and Singapore 
bring the total to eight different parliaments across the world, with plans to send representatives to London on 27 
November with the intention of hearing from Zuckerberg. Since the Cambridge Analytica scandal broke, the Facebook chief 
has only appeared in front of two legislatures: the American Senate and House of Representatives, and the European parliament. 
Facebook has consistently rebuffed attempts from others, including the UK and Canadian parliaments, to hear from Zuckerberg. 
He added that an article in the New York Times on Thursday, in which the paper alleged a pattern of behaviour from Facebook 
to “delay, deny and deflect” negative news stories, “raises further questions about how recent data breaches were allegedly 
dealt with within Facebook.”
"""
print(text)

Three more countries have joined an “international grand committee” of parliaments, adding to calls for 
Facebook’s boss, Mark Zuckerberg, to give evidence on misinformation to the coalition. Brazil, Latvia and Singapore 
bring the total to eight different parliaments across the world, with plans to send representatives to London on 27 
November with the intention of hearing from Zuckerberg. Since the Cambridge Analytica scandal broke, the Facebook chief 
has only appeared in front of two legislatures: the American Senate and House of Representatives, and the European parliament. 
Facebook has consistently rebuffed attempts from others, including the UK and Canadian parliaments, to hear from Zuckerberg. 
He added that an article in the New York Times on Thursday, in which the paper alleged a pattern of behaviour from Facebook 
to “delay, deny and deflect” negative news stories, “raises further questions about how recent data breaches were allegedly 
dealt with within Facebook.”



In [2]:
import re

text = re.sub(r'\n', '', text)
text

'Three more countries have joined an “international grand committee” of parliaments, adding to calls for Facebook’s boss, Mark Zuckerberg, to give evidence on misinformation to the coalition. Brazil, Latvia and Singapore bring the total to eight different parliaments across the world, with plans to send representatives to London on 27 November with the intention of hearing from Zuckerberg. Since the Cambridge Analytica scandal broke, the Facebook chief has only appeared in front of two legislatures: the American Senate and House of Representatives, and the European parliament. Facebook has consistently rebuffed attempts from others, including the UK and Canadian parliaments, to hear from Zuckerberg. He added that an article in the New York Times on Thursday, in which the paper alleged a pattern of behaviour from Facebook to “delay, deny and deflect” negative news stories, “raises further questions about how recent data breaches were allegedly dealt with within Facebook.”'

In [3]:
import spacy

# the 'en' shortcut was removed in spaCy v3; load the model by its full package name
nlp = spacy.load('en_core_web_sm')
text_nlp = nlp(text)

In [4]:
# print named entities in article
ner_tagged = [(word.text, word.ent_type_) for word in text_nlp]
print(ner_tagged)

[('Three', 'CARDINAL'), ('more', ''), ('countries', ''), ('have', ''), ('joined', ''), ('an', ''), ('“', ''), ('international', ''), ('grand', ''), ('committee', ''), ('”', ''), ('of', ''), ('parliaments', ''), (',', ''), ('adding', ''), ('to', ''), ('calls', ''), ('for', ''), ('Facebook', ''), ('’s', ''), ('boss', ''), (',', ''), ('Mark', 'PERSON'), ('Zuckerberg', 'PERSON'), (',', ''), ('to', ''), ('give', ''), ('evidence', ''), ('on', ''), ('misinformation', ''), ('to', ''), ('the', ''), ('coalition', ''), ('.', ''), ('Brazil', 'GPE'), (',', ''), ('Latvia', 'GPE'), ('and', ''), ('Singapore', 'GPE'), ('bring', ''), ('the', ''), ('total', ''), ('to', ''), ('eight', 'CARDINAL'), ('different', ''), ('parliaments', ''), ('across', ''), ('the', ''), ('world', ''), (',', ''), ('with', ''), ('plans', ''), ('to', ''), ('send', ''), ('representatives', ''), ('to', ''), ('London', 'GPE'), ('on', ''), ('27', 'DATE'), ('November', 'DATE'), ('with', ''), ('the', ''), ('intention', ''), ('of', ''),

In [5]:
from spacy import displacy

# visualize named entities
displacy.render(text_nlp, style='ent', jupyter=True)

spaCy offers a fast NER tagger based on a number of techniques. The exact algorithm is not described in much detail; the documentation says only: "The exact algorithm is a pastiche of well-known methods, and is not currently described in any single publication."

The entities identified by the spaCy NER tagger are shown in the following table (details here: [spacy_documentation](https://spacy.io/api/annotation#named-entities)):

![](https://github.com/duybluemind1988/Data-science/blob/master/NLP/Text_analytic_Apress/Ch08%20-%20Semantic%20Analysis/spacy_ner.png?raw=1)

In [6]:
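named_entities = []
temp_entity_name = ''
temp_named_entity = None
for term, tag in ner_tagged:
    if tag:
        # token is part of an entity: extend the current entity span
        temp_entity_name = ' '.join([temp_entity_name, term]).strip()
        temp_named_entity = (temp_entity_name, tag)
    else:
        # untagged token: close the entity span collected so far, if any
        if temp_named_entity:
            named_entities.append(temp_named_entity)
            temp_entity_name = ''
            temp_named_entity = None
# flush an entity that runs to the very end of the text
if temp_named_entity:
    named_entities.append(temp_named_entity)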

In [7]:
print(named_entities)

[('Three', 'CARDINAL'), ('Mark Zuckerberg', 'PERSON'), ('Brazil', 'GPE'), ('Latvia', 'GPE'), ('Singapore', 'GPE'), ('eight', 'CARDINAL'), ('London', 'GPE'), ('27 November', 'DATE'), ('Zuckerberg', 'GPE'), ('Facebook', 'ORG'), ('two', 'CARDINAL'), ('the American Senate', 'ORG'), ('House of Representatives', 'ORG'), ('European', 'NORP'), ('UK', 'GPE'), ('Canadian', 'NORP'), ('Zuckerberg', 'GPE'), ('the New York Times', 'ORG'), ('Thursday', 'DATE'), ('Facebook', 'ORG'), ('Facebook', 'ORG')]
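
The hand-written loop joins any run of tagged tokens into one entity, even if the tag changes partway through the run. A minimal alternative sketch, using only the standard library, groups consecutive tokens by tag with `itertools.groupby`. The helper name `merge_entities` and the sample tokens are illustrative assumptions, not part of spaCy or of the original notebook.

```python
from itertools import groupby

def merge_entities(tagged_tokens, outside_tags=("", "O")):
    """Merge consecutive (token, tag) pairs that share a tag into (text, tag) spans."""
    entities = []
    for tag, group in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag in outside_tags:
            continue  # skip runs of tokens that are not part of any entity
        entities.append((" ".join(term for term, _ in group), tag))
    return entities

# a few token-level tags in the same shape as ner_tagged
sample = [('Mark', 'PERSON'), ('Zuckerberg', 'PERSON'), (',', ''),
          ('Brazil', 'GPE'), ('on', ''), ('27', 'DATE'), ('November', 'DATE')]
print(merge_entities(sample))
```

Because the tags carry no begin/inside markers, two distinct entities of the same type that sit directly next to each other would still be merged into one span.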


In [8]:
from collections import Counter
c = Counter([item[1] for item in named_entities])
c.most_common()

[('GPE', 7),
 ('ORG', 6),
 ('CARDINAL', 3),
 ('DATE', 2),
 ('NORP', 2),
 ('PERSON', 1)]

Text example from DNN

In [14]:
text = """It is an enduring political question amid a pandemic recession, double-digit unemployment and a recovery that appears to be slowing: Why does President Trump continue to get higher marks on economic issues in polls than his predecessors Barack Obama, George W. Bush and George H.W. Bush enjoyed when they stood for re-election?

Mr. Trump’s relative strength on the economy, and whether Joseph R. Biden Jr. can cut into it over the next 10 weeks, are among the crucial dynamics in battleground states in the Midwest and the Sun Belt that are expected to decide the election. Many of these states have struggled this summer with rising coronavirus infection and death rates as well as job losses and vanishing wages and savings — hard times that, history suggests, will pose a threat to an incumbent president seeking re-election.

Yet polling data and interviews with voters and political analysts suggest that a confluence of factors are raising Mr. Trump’s standing on the economy issue, which remains a centerpiece of his pitch for a second term and is expected to be a major theme of the Republican National Convention this week.

The president has built an enduring brand with conservative voters, in particular, who continue to see him as a successful businessman and tough negotiator. Many of those voters praise his economic stewardship before the pandemic hit, and they do not blame him for the damage it has caused. In interviews, some of those voters cited record stock market gains — although only about half of Americans own any stock at all — as evidence of a rebound under the president.

ADVERTISEMENT

Continue reading the main story

“He’s had failures — so have I — in business,” said Dale Georgeff, 58, of Cedarburg, Wis., a Trump supporter who owns parts of a brewery and a vehicle paint shop and also sells insurance. “But I think the biggest thing is that — and I think this is how it rubs certain people the wrong way — he’s treating this like a business, and he’s running it like a business.”

David Winton, a Republican strategist and pollster, said that Mr. Trump’s ratings had been bolstered by the economy’s adding nine million jobs in May, June and July, after it lost more than 20 million in March and April. Mr. Trump’s approval on the economy “has still generally remained positive, and better than his overall job approval,” he said. “This has certainly been helped by the last three good monthly jobs reports that occurred despite the continuing restrictions on many businesses to operate.”

Thanks for reading The Times.
Subscribe to The Times
Polling suggests that Americans who form Mr. Trump’s voter base are less likely to have lost a job or income than Democratic or independent voters. That divergence is partially driven by race — the coronavirus crisis has disproportionately harmed Black and Latino workers, who lean heavily Democratic — but may also reflect regional divides. Small business owners in small, more rural states that backed Mr. Trump in the 2016 election report less economic damage from the crisis than those in larger blue states, according to an analysis of census survey data by the Economic Innovation Group in Washington.

Perhaps most notably, Mr. Trump is reaping the benefits of extreme polarization of the American electorate, a divide so intense that it has overpowered long-running connections between economic performance and presidential approval ratings. For many Republican voters and conservatives, optimism about the economy and approval of the president have become deeply entwined — and for Democrats, disfavor for Mr. Trump brought deep pessimism over the economy even in the years of growth and low unemployment before the crisis.


ImageSupporters of Mr. Trump at Mariotti Building Products in Old Forge, Pa., on Thursday. Even Republicans hit hard by the coronavirus crisis continue to give Mr. Trump and his economy high marks.
Supporters of Mr. Trump at Mariotti Building Products in Old Forge, Pa., on Thursday. Even Republicans hit hard by the coronavirus crisis continue to give Mr. Trump and his economy high marks.Credit...Doug Mills/The New York Times
Polls conducted in June, July and August for The New York Times by the online research firm SurveyMonkey underscore the degree to which even Republicans hit hard by the crisis continue to give Mr. Trump and his economy high marks. Eight in 10 Republican respondents who lost a job in the recession and have yet to return to work approve of Mr. Trump’s handling of the pandemic. Nearly three in 10 Republicans who lost jobs say they are better off economically than they were a year ago, a sentiment that is shared by barely one in 10 Democrats who have kept their jobs throughout the crisis.

ADVERTISEMENT

Continue reading the main story
“For so many of these voters, opinions of Trump are basically baked in,” said Amy Walter, national editor for the Cook Political Report in Washington, who has written extensively on the economy and Mr. Trump’s electoral fortunes. “And what the actual economic situation is in November is less important to them than it would be in a different time with different candidates.”

Mr. Trump’s overall approval ratings have never cracked a majority throughout his presidency. Voters have given him higher approval ratings on his handling of the economy — he topped 60 percent in one survey this year before the pandemic hit — even as some of his signature economic initiatives, like the 2017 tax cut package he signed into law, remain relatively unpopular.

But the plunge in economic activity since the coronavirus began to spread rapidly in the United States late this past winter has hurt Mr. Trump’s standing on economic issues as well as his overall approval. Most polls now find Americans are evenly split on whether they approve of his handling of the issue.

Gallup, for example, found Mr. Trump enjoyed a 48 percent approval rating on the economy this month, down from 63 percent in January. The decline was particularly acute among moderates, independents and voters who attended at least some college.

In a recent ABC News/Washington Post poll, two-thirds of Americans said the economy was in bad shape — the most since 2014, and a 20-percentage-point increase in negative ratings of the economy since Mr. Trump took office.

The decline in sentiment is hurting Mr. Trump in his campaign against Mr. Biden, the Democratic nominee. Among registered voters who said they thought the economy was doing badly, 70 percent planned to support Mr. Biden and his running mate, Senator Kamala Harris of California, in November, according to the ABC/Post poll.

ADVERTISEMENT

Continue reading the main story
But Mr. Biden, the former vice president, is far from commanding on the issue: Voters were split almost evenly into thirds on the question of whether the economy would be in better, worse or about the same shape now, if he were president. And while some polls this summer showed the candidates deadlocked on the question of who would best handle the economy, Mr. Trump led Mr. Biden on handling the economy in an NBC News/Wall Street Journal poll released this week. A Reuters poll had the men tied.

Election 2020 ›
Live Updates
Aug. 24, 2020, 9:29 p.m. ET6m ago
6m ago
Vernon Jones, a Democratic state legislator, crosses party lines to back Trump.
Aug. 24, 2020, 9:25 p.m. ET10m ago
10m ago
We’re fact-checking the Republican National Convention.
Aug. 24, 2020, 9:24 p.m. ET11m ago
11m ago
Herschel Walker says he takes claim that Trump is racist as a ‘personal insult.’
Mr. Biden emphasized his plans to create jobs and to bring the virus under control in his acceptance speech at the Democratic National Convention last week, and he criticized Mr. Trump’s handling of the pandemic. “I understand something this president doesn’t,” Mr. Biden said. “We will never get our economy back on track, we will never get our kids safely back to school, we will never have our lives back — until we deal with this virus.”

The Biden campaign has sought to link Mr. Trump to the recession in television advertisements, including one that proclaims that “Trump’s botched handling of the coronavirus pandemic cost jobs.” Campaign officials say Mr. Biden and his surrogates will increase those attacks in the weeks to come.

Mr. Trump “still has no plan to bring the pandemic under control or end the recession he catastrophically and needlessly worsened,” Andrew Bates, a Biden spokesman, said on Saturday.

The president continues to express confidence that economic issues favor him in the race, even as he overstates his mixed position in polls. “We’re building up the economy,” Mr. Trump said on Friday in Arlington, Va. “And we’re way ahead, by every poll — even the fake polls — we’re way ahead on the economy, which is very important.”


Image
Joseph R. Biden Jr. emphasized his plans to create jobs and to bring the coronavirus under control in his acceptance speech at the Democratic National Convention last week.
Joseph R. Biden Jr. emphasized his plans to create jobs and to bring the coronavirus under control in his acceptance speech at the Democratic National Convention last week.Credit...Erin Schaff/The New York Times
Partisan politics — and divergent experiences with the virus — factor heavily into the remaining divide. The SurveyMonkey polling shows Republicans are less likely to have lost a job in the crisis than Democrats or independents, though the gap shrinks when comparing only white voters. In the recovery from the depths of recession, the unemployment rate has remained higher for Black and Latino workers than for whites.

ADVERTISEMENT

Continue reading the main story
“Republicans are putting more importance on the economic issues of the pandemic,” said Laura Wronski, a research scientist for SurveyMonkey, “and Democrats are putting more importance on the health issues.”

Fewer than one in five conservative Republicans worries about losing a job in the crisis, far less than any other ideological group, the SurveyMonkey polling shows. (In perhaps a troubling sign for Mr. Trump, the group that worries most about job loss is independent voters.) Nearly two in five conservative Republicans say that by late October “the virus will be under control, and the economy will be strong or steadily improving,” which is more than double the rate of Americans overall. Only 3 percent of Democrats agree with that statement.

“I’ve seen a steady growth since he’s been in office,” said Rick Slowicki, president of Nonstop Couriers, a delivery service in Philadelphia that employs 11 people, runs 14 vehicles and expects revenue of $1.3 million this year. “I just bought three new vehicles with the confidence that we’re going to grow, even during Covid. I’m doubling down.”

Others praise Mr. Trump’s populist trade policies, including tariffs on imports from China that Mr. Trump claims have returned manufacturing jobs to America. “He is the only individual who has actually brought jobs back to the U.S.A. and put the country first,” said Dale Palmer, 63, a Republican who supports Mr. Trump and owns a boiler service business in Byron Center, Mich.


Image
People shop at the Galleria mall in Houston last month. The plunge in economic activity has hurt Mr. Trump’s standing on economic issues.
People shop at the Galleria mall in Houston last month. The plunge in economic activity has hurt Mr. Trump’s standing on economic issues.Credit...Erin Schaff/The New York Times
Democrats predict that if the recovery stalls in the fall and economic damage mounts anew, Mr. Trump’s economic ratings will plunge.

“Trump is a master at convincing people of his alternative reality,” said Jared Bernstein, an economist at the Center on Budget and Policy Priorities who is an outside adviser to Mr. Biden. “But he will be unable to do so as people face evictions, job losses, falling incomes and tremendous difficulties meeting their basic needs. At some point, reality TV collides with reality.”

Reporting was contributed by Ben Casselman, Kathleen Grey, Jon Hurdle, Tom Kertscher, Alan Rappeport and Giovanni Russonello.

Our 2020 Election Guide
Updated  Aug. 24, 2020
R.N.C. Updates
The Republican National Convention is underway, and President Trump plans to speak during each of its four nights this week. Follow our live updates.

Watch Live
The Times is livestreaming the Republican National Convention tonight, with real-time analysis and fact-checking from our reporters. Watch with us.

Meet the Candidates
Learn more about the presidential contenders.


Joe Biden
Democrat

Donald Trump
Republican
Keep Up With Our Coverage
Get an email recapping the day’s news

Download our mobile app on iOS and Android and turn on Breaking News and Politics alerts

ADVERTISEMENT

Continue reading the main story
Subscribe for $0.25 a week. Ends soon.

VIEW OFFER
Site Index
Site Information Navigation
© 2020 The New York Times Company
NYTCoContact UsWork with usAdvertiseT Brand StudioYour Ad ChoicesPrivacyTerms of ServiceTerms of SaleSite MapHelpSubscriptions

"""

In [15]:
print(text)

It is an enduring political question amid a pandemic recession, double-digit unemployment and a recovery that appears to be slowing: Why does President Trump continue to get higher marks on economic issues in polls than his predecessors Barack Obama, George W. Bush and George H.W. Bush enjoyed when they stood for re-election?

Mr. Trump’s relative strength on the economy, and whether Joseph R. Biden Jr. can cut into it over the next 10 weeks, are among the crucial dynamics in battleground states in the Midwest and the Sun Belt that are expected to decide the election. Many of these states have struggled this summer with rising coronavirus infection and death rates as well as job losses and vanishing wages and savings — hard times that, history suggests, will pose a threat to an incumbent president seeking re-election.

Yet polling data and interviews with voters and political analysts suggest that a confluence of factors are raising Mr. Trump’s standing on the economy issue, which rema

In [16]:
import re
import spacy
from spacy import displacy

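# replace newlines with spaces so that paragraphs do not run together
text = re.sub(r'\s*\n\s*', ' ', text).strip()
nlp = spacy.load('en_core_web_sm')  # the 'en' shortcut was removed in spaCy v3
text_nlp = nlp(text)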
# visualize named entities
displacy.render(text_nlp, style='ent', jupyter=True)

# NER with Stanford NLP

Stanford’s Named Entity Recognizer is based on an implementation of linear chain __Conditional Random Field (CRF)__ sequence models. 
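
As a rough illustration of what a linear-chain model adds over tagging each token independently, the sketch below runs Viterbi decoding on toy scores. The labels, the emission and transition scores, and the `viterbi` helper are all made up for illustration. A real CRF learns its weights from annotated data, and this is not Stanford's implementation.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions: one dict per token mapping label -> score
    transitions: dict mapping (previous_label, label) -> score (missing pairs score 0)
    """
    best = [dict(emissions[0])]   # best[i][label]: best path score ending in label at token i
    back = []                     # back[i][label]: previous label on that best path
    for scores in emissions[1:]:
        prev, cur, ptr = best[-1], {}, {}
        for lab in labels:
            p = max(labels, key=lambda q: prev[q] + transitions.get((q, lab), 0.0))
            cur[lab] = prev[p] + transitions.get((p, lab), 0.0) + scores[lab]
            ptr[lab] = p
        best.append(cur)
        back.append(ptr)
    # follow the back-pointers from the best final label
    path = [max(labels, key=lambda lab: best[-1][lab])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

labels = ['O', 'PERSON']
tokens = ['boss', 'Mark', 'Zuckerberg']
# toy per-token scores: taken alone, 'Zuckerberg' slightly prefers 'O'
emissions = [{'O': 2.0, 'PERSON': 0.0},
             {'O': 0.5, 'PERSON': 1.5},
             {'O': 1.0, 'PERSON': 0.8}]
# toy transition scores: staying inside a PERSON span is rewarded
transitions = {('PERSON', 'PERSON'): 1.0, ('PERSON', 'O'): -0.5}

print(list(zip(tokens, viterbi(emissions, transitions, labels))))
```

With these toy numbers the transition score pulls the last token into the PERSON span, which a per-token argmax over the emission scores alone would not do.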

__Prerequisites:__ Download the official Stanford NER Tagger from [here](http://nlp.stanford.edu/software/stanford-ner-2014-08-27.zip), which seems to work quite well. You can also try a later version from [this website](https://nlp.stanford.edu/software/CRF-NER.shtml#Download).

This model is trained only on instances of the _PERSON_, _ORGANIZATION_, and _LOCATION_ types. It is exposed through `nltk` wrappers.

In [None]:
import os
from nltk.tag import StanfordNERTagger

JAVA_PATH = r'C:\Program Files\Java\jre1.8.0_192\bin\java.exe'
os.environ['JAVAHOME'] = JAVA_PATH

STANFORD_CLASSIFIER_PATH = 'E:/stanford/stanford-ner-2014-08-27/classifiers/english.all.3class.distsim.crf.ser.gz'
STANFORD_NER_JAR_PATH = 'E:/stanford/stanford-ner-2014-08-27/stanford-ner.jar'

sn = StanfordNERTagger(STANFORD_CLASSIFIER_PATH,
                       path_to_jar=STANFORD_NER_JAR_PATH)
sn

<nltk.tag.stanford.StanfordNERTagger at 0x205a9ded0b8>

In [None]:
text_enc = text.encode('ascii', errors='ignore').decode('utf-8')
ner_tagged = sn.tag(text_enc.split())
print(ner_tagged)

[('Three', 'O'), ('more', 'O'), ('countries', 'O'), ('have', 'O'), ('joined', 'O'), ('an', 'O'), ('international', 'O'), ('grand', 'O'), ('committee', 'O'), ('of', 'O'), ('parliaments,', 'O'), ('adding', 'O'), ('to', 'O'), ('calls', 'O'), ('for', 'O'), ('Facebooks', 'ORGANIZATION'), ('boss,', 'O'), ('Mark', 'O'), ('Zuckerberg,', 'O'), ('to', 'O'), ('give', 'O'), ('evidence', 'O'), ('on', 'O'), ('misinformation', 'O'), ('to', 'O'), ('the', 'O'), ('coalition.', 'O'), ('Brazil,', 'O'), ('Latvia', 'LOCATION'), ('and', 'O'), ('Singapore', 'LOCATION'), ('bring', 'O'), ('the', 'O'), ('total', 'O'), ('to', 'O'), ('eight', 'O'), ('different', 'O'), ('parliaments', 'O'), ('across', 'O'), ('the', 'O'), ('world,', 'O'), ('with', 'O'), ('plans', 'O'), ('to', 'O'), ('send', 'O'), ('representatives', 'O'), ('to', 'O'), ('London', 'LOCATION'), ('on', 'O'), ('27', 'O'), ('November', 'O'), ('with', 'O'), ('the', 'O'), ('intention', 'O'), ('of', 'O'), ('hearing', 'O'), ('from', 'O'), ('Zuckerberg.', 'O')
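
In the output above, tokens such as `'Zuckerberg,'` and `'boss,'` keep their trailing punctuation because the text was split on whitespace only, which may be one reason some names go untagged. A minimal sketch of a tokenizer that separates punctuation from words is shown below. The `simple_tokenize` helper and its regular expression are illustrative assumptions; a library tokenizer such as `nltk.word_tokenize` could be used instead.

```python
import re

def simple_tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

sample = "calls for Facebooks boss, Mark Zuckerberg, to give evidence."
print(simple_tokenize(sample))
```

The resulting list could be passed to `sn.tag(...)` in place of `text_enc.split()`. Whether this changes the tags depends on the classifier.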

In [None]:
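named_entities = []
temp_entity_name = ''
temp_named_entity = None
for term, tag in ner_tagged:
    if tag != 'O':
        # token is part of an entity: extend the current entity span
        temp_entity_name = ' '.join([temp_entity_name, term]).strip()
        temp_named_entity = (temp_entity_name, tag)
    else:
        # 'O' (outside) token: close the entity span collected so far, if any
        if temp_named_entity:
            named_entities.append(temp_named_entity)
            temp_entity_name = ''
            temp_named_entity = None
# flush an entity that runs to the very end of the text
if temp_named_entity:
    named_entities.append(temp_named_entity)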

In [None]:
print(named_entities)

[('Facebooks', 'ORGANIZATION'), ('Latvia', 'LOCATION'), ('Singapore', 'LOCATION'), ('London', 'LOCATION'), ('Cambridge Analytica', 'ORGANIZATION'), ('Facebook', 'ORGANIZATION'), ('Senate', 'ORGANIZATION'), ('Facebook', 'ORGANIZATION'), ('UK', 'LOCATION'), ('New York Times', 'ORGANIZATION'), ('Facebook', 'ORGANIZATION')]


In [None]:
c = Counter([item[1] for item in named_entities])
c.most_common()

[('ORGANIZATION', 7), ('LOCATION', 4)]

# NER with Stanford CoreNLP

NLTK is gradually deprecating the old Stanford parser wrappers in favor of the more active Stanford CoreNLP project. The old wrappers might even be removed after `nltk` version `3.4`, so it is best to stay updated.

Details: https://github.com/nltk/nltk/issues/1839

Step by Step Tutorial here: https://github.com/nltk/nltk/wiki/Stanford-CoreNLP-API-in-NLTK

Unfortunately, a lot has changed in the process, so some extra effort is needed to make it work.

Get CoreNLP from [here](https://stanfordnlp.github.io/CoreNLP/).

After downloading, go to the extracted folder, open a terminal, and start the CoreNLP server locally:

```
E:\> java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -preload tokenize,ssplit,pos,lemma,ner,parse,depparse -status_port 9000 -port 9000 -timeout 15000
```

If it starts successfully, you should see the following messages in the terminal:

```
E:\stanford\stanford-corenlp-full-2018-02-27>java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -preload tokenize,ssplit,pos,lemma,ner,parse,depparse -status_port 9000 -port 9000 -timeout 15000
[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called ---
[main] INFO CoreNLP - setting default constituency parser
[main] INFO CoreNLP - warning: cannot find edu/stanford/nlp/models/srparser/englishSR.ser.gz
[main] INFO CoreNLP - using: edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz instead
[main] INFO CoreNLP - to use shift reduce parser download English models jar from:
[main] INFO CoreNLP - http://stanfordnlp.github.io/CoreNLP/download.html
[main] INFO CoreNLP -     Threads: 4
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer.
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[main] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [1.4 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.9 sec].
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [2.0 sec].
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.8 sec].
[main] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
[main] INFO edu.stanford.nlp.time.TimeExpressionExtractorImpl - Using following SUTime rules: edu/stanford/nlp/models/sutime/defs.sutime.txt,edu/stanford/nlp/models/sutime/english.sutime.txt,edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - TokensRegexNERAnnotator ner.fine.regexner: Read 580641 unique entries out of 581790 from edu/stanford/nlp/models/kbp/regexner_caseless.tab, 0 TokensRegex patterns.
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - TokensRegexNERAnnotator ner.fine.regexner: Read 4857 unique entries out of 4868 from edu/stanford/nlp/models/kbp/regexner_cased.tab, 0 TokensRegex patterns.
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - TokensRegexNERAnnotator ner.fine.regexner: Read 585498 unique entries from 2 files
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [4.6 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
[main] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 22.43 (s)
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [24.4 sec].
[main] INFO CoreNLP - Starting server...
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0:0:0:0:0:0:0:0:9000
```
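
Before pointing `nltk` at the server, it can help to confirm that something is listening on the port. The sketch below is a plain TCP check using only the standard library. The `server_is_up` helper is a hypothetical name, and the host and port simply mirror the `-port 9000` option in the command above. It does not verify that the listener is actually CoreNLP.

```python
import socket

def server_is_up(host="localhost", port=9000, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused, timed out, or host not reachable
        return False

print(server_is_up())
```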

![](https://github.com/duybluemind1988/Data-science/blob/master/NLP/Text_analytic_Apress/Ch08%20-%20Semantic%20Analysis/corenlp_ner.png?raw=1)

In [None]:
from nltk.parse import CoreNLPParser

ner_tagger = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
ner_tagger

<nltk.parse.corenlp.CoreNLPParser at 0x2059cbfb4a8>

In [None]:
import nltk

tags = list(ner_tagger.raw_tag_sents(nltk.sent_tokenize(text)))
tags = [sublist[0] for sublist in tags]
tags = [word_tag for sublist in tags for word_tag in sublist]
print(tags)

[('Three', 'NUMBER'), ('more', 'O'), ('countries', 'O'), ('have', 'O'), ('joined', 'O'), ('an', 'O'), ('``', 'O'), ('international', 'O'), ('grand', 'O'), ('committee', 'O'), ("''", 'O'), ('of', 'O'), ('parliaments', 'O'), (',', 'O'), ('adding', 'O'), ('to', 'O'), ('calls', 'O'), ('for', 'O'), ('Facebook', 'ORGANIZATION'), ("'s", 'O'), ('boss', 'TITLE'), (',', 'O'), ('Mark', 'PERSON'), ('Zuckerberg', 'PERSON'), (',', 'O'), ('to', 'O'), ('give', 'O'), ('evidence', 'O'), ('on', 'O'), ('misinformation', 'O'), ('to', 'O'), ('the', 'O'), ('coalition', 'O'), ('.', 'O'), ('Brazil', 'COUNTRY'), (',', 'O'), ('Latvia', 'COUNTRY'), ('and', 'O'), ('Singapore', 'COUNTRY'), ('bring', 'O'), ('the', 'O'), ('total', 'O'), ('to', 'O'), ('eight', 'NUMBER'), ('different', 'O'), ('parliaments', 'O'), ('across', 'O'), ('the', 'O'), ('world', 'O'), (',', 'O'), ('with', 'O'), ('plans', 'O'), ('to', 'O'), ('send', 'O'), ('representatives', 'O'), ('to', 'O'), ('London', 'CITY'), ('on', 'O'), ('27', 'DATE'), ('N

In [None]:
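named_entities = []
temp_entity_name = ''
temp_named_entity = None
for term, tag in tags:
    if tag != 'O':
        # token is part of an entity: extend the current entity span
        temp_entity_name = ' '.join([temp_entity_name, term]).strip()
        temp_named_entity = (temp_entity_name, tag)
    else:
        # 'O' (outside) token: close the entity span collected so far, if any
        if temp_named_entity:
            named_entities.append(temp_named_entity)
            temp_entity_name = ''
            temp_named_entity = None
# flush an entity that runs to the very end of the text
if temp_named_entity:
    named_entities.append(temp_named_entity)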

print(named_entities)

[('Three', 'NUMBER'), ('Facebook', 'ORGANIZATION'), ('boss', 'TITLE'), ('Mark Zuckerberg', 'PERSON'), ('Brazil', 'COUNTRY'), ('Latvia', 'COUNTRY'), ('Singapore', 'COUNTRY'), ('eight', 'NUMBER'), ('London', 'CITY'), ('27 November', 'DATE'), ('Zuckerberg', 'PERSON'), ('Cambridge Analytica', 'ORGANIZATION'), ('Facebook', 'ORGANIZATION'), ('two', 'NUMBER'), ('American Senate', 'ORGANIZATION'), ('House of Representatives', 'ORGANIZATION'), ('European', 'NATIONALITY'), ('Facebook', 'ORGANIZATION'), ('UK', 'COUNTRY'), ('Canadian', 'NATIONALITY'), ('Zuckerberg', 'PERSON'), ('New York Times', 'ORGANIZATION'), ('Thursday', 'DATE'), ('Facebook', 'ORGANIZATION'), ('Facebook', 'ORGANIZATION')]


In [None]:
c = Counter([item[1] for item in named_entities])
c.most_common()

[('ORGANIZATION', 9),
 ('COUNTRY', 4),
 ('NUMBER', 3),
 ('PERSON', 3),
 ('DATE', 2),
 ('NATIONALITY', 2),
 ('TITLE', 1),
 ('CITY', 1)]
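
The three taggers use different label sets, so their counts are not directly comparable. One rough way to line them up is to map each label onto a small shared set, as sketched below. The counts are copied from the outputs above. The `COARSE` mapping and the `coarse_counts` helper are assumptions made for illustration, not an official correspondence between the tag sets.

```python
from collections import Counter

# entity-type counts copied from the three outputs above
spacy_counts = Counter({'GPE': 7, 'ORG': 6, 'CARDINAL': 3, 'DATE': 2, 'NORP': 2, 'PERSON': 1})
stanford_counts = Counter({'ORGANIZATION': 7, 'LOCATION': 4})
corenlp_counts = Counter({'ORGANIZATION': 9, 'COUNTRY': 4, 'NUMBER': 3, 'PERSON': 3,
                          'DATE': 2, 'NATIONALITY': 2, 'TITLE': 1, 'CITY': 1})

# hypothetical mapping onto three coarse classes; everything else becomes 'OTHER'
COARSE = {
    'GPE': 'LOCATION', 'COUNTRY': 'LOCATION', 'CITY': 'LOCATION', 'LOCATION': 'LOCATION',
    'ORG': 'ORGANIZATION', 'ORGANIZATION': 'ORGANIZATION',
    'PERSON': 'PERSON',
}

def coarse_counts(counts):
    """Re-aggregate a Counter of fine-grained labels under the coarse labels."""
    out = Counter()
    for label, n in counts.items():
        out[COARSE.get(label, 'OTHER')] += n
    return out

for name, counts in [('spaCy', spacy_counts),
                     ('Stanford NER', stanford_counts),
                     ('CoreNLP', corenlp_counts)]:
    print(name, dict(coarse_counts(counts)))
```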