Taken from https://medium.com/@cetinsamet/part-of-speech-pos-tagging-8af646a3d5bb and the associated GitHub repo.

![Named Entity Recognition](https://cdn-images-1.medium.com/max/800/0*6qNBX5v1XFr1pMvr.jpg)
Source: https://hackernoon.com/named-entity-recognition-applications-and-use-cases-c2ef0904e9fe

In [1]:
# Root of the local Stanford NER download; adjust to your machine.
ner_dir = '/Users/marck/stanford-ner/'

In [2]:
# Copied from https://en.wikipedia.org/wiki/Stanford_University

article = "The university was founded in 1885 by Leland and Jane Stanford in memory of \
their only child, Leland Stanford Jr., who had died of typhoid fever at age 15 the previous \
year. Stanford was a former Governor of California and U.S. Senator; he made his fortune as a railroad tycoon. \
The school admitted its first students on October 1, 1891,[2][3] as a coeducational and non-denominational institution."
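
The article text still carries Wikipedia footnote markers (`[2][3]`), and the NLTK output further down shows a stray `]/NNP` token that comes from them. A minimal sketch of stripping such markers before tagging is shown here; `strip_citation_markers` is a hypothetical helper, not part of the original notebook, and it assumes the markers are always plain digits in square brackets.

```python
import re

# Hypothetical helper (not in the original notebook).
def strip_citation_markers(text):
    """Remove Wikipedia-style footnote markers such as [2] or [13]."""
    return re.sub(r"\[\d+\]", "", text)

sample = ("The school admitted its first students on October 1, 1891,[2][3] "
          "as a coeducational and non-denominational institution.")
cleaned = strip_citation_markers(sample)
print(cleaned)
```

The regex only matches bracketed digits, so other bracketed text in an article would be left untouched.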

In [3]:
article2 = 'New York, New York , NY N.Y. new york'

# Stanford NER

In [4]:
import nltk
print('NLTK Version: %s' % nltk.__version__)

from nltk.tag import StanfordNERTagger

# Pretrained 3-class model (PERSON, ORGANIZATION, LOCATION) and the tagger JAR.
stanford_ner_tagger = StanfordNERTagger(
    ner_dir + 'classifiers/english.all.3class.distsim.crf.ser.gz',
    ner_dir + 'stanford-ner.jar'
)

NLTK Version: 3.3


In [5]:
results = stanford_ner_tagger.tag(article.split())
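
Splitting on whitespace keeps punctuation attached to the preceding word, which is why a value such as `Jr.,` (with the comma) appears in the tagger output. The sketch below contrasts `str.split()` with a small, assumed alternative that peels trailing punctuation off each token; `split_trailing_punct` is a hypothetical helper for illustration only, and the NLTK section of this notebook uses `word_tokenize` for the same purpose.

```python
import re

sentence = "their only child, Leland Stanford Jr., who had died"

# Whitespace splitting keeps punctuation glued to the preceding word.
whitespace_tokens = sentence.split()

# Hypothetical helper: peel trailing , ; : ! ? off each whitespace token.
# Periods are left alone so abbreviations such as "Jr." and "U.S." survive.
def split_trailing_punct(text):
    tokens = []
    for raw in text.split():
        match = re.match(r"^(.*?)([,;:!?]*)$", raw)
        word, trailing = match.group(1), match.group(2)
        if word:
            tokens.append(word)
        tokens.extend(trailing)  # one token per punctuation character
    return tokens

print(whitespace_tokens)
print(split_trailing_punct(sentence))
```

This is deliberately naive: it does not handle quotes, brackets, or sentence-final periods.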

In [6]:
print('Original Sentence: %s' % (article))
print()
# Each result is a (token, tag) pair; 'O' marks tokens outside any entity.
for tag_value, tag_type in results:
    if tag_type != 'O':
        print('Type: %s, Value: %s' % (tag_type, tag_value))

Original Sentence: The university was founded in 1885 by Leland and Jane Stanford in memory of their only child, Leland Stanford Jr., who had died of typhoid fever at age 15 the previous year. Stanford was a former Governor of California and U.S. Senator; he made his fortune as a railroad tycoon. The school admitted its first students on October 1, 1891,[2][3] as a coeducational and non-denominational institution.

Type: PERSON, Value: Leland
Type: PERSON, Value: Jane
Type: PERSON, Value: Stanford
Type: PERSON, Value: Leland
Type: PERSON, Value: Stanford
Type: PERSON, Value: Jr.,
Type: ORGANIZATION, Value: Stanford
Type: LOCATION, Value: California
Type: LOCATION, Value: U.S.
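
The tagger labels one token at a time, so a multi-word name such as "Jane Stanford" shows up as two separate PERSON lines. One possible way to merge them is to group consecutive tokens that share a tag. The sketch below does this with `itertools.groupby`; `group_entities` is a hypothetical helper, and the sample pairs are copied by hand from the output above (tokens that were not printed are assumed to carry the `O` tag).

```python
from itertools import groupby

# Hand-copied (token, tag) pairs in the shape returned by tag().
tagged = [
    ("by", "O"), ("Leland", "PERSON"), ("and", "O"),
    ("Jane", "PERSON"), ("Stanford", "PERSON"), ("in", "O"), ("memory", "O"),
]

# Hypothetical helper: merge runs of adjacent tokens that share a non-O tag.
def group_entities(tagged_tokens, outside="O"):
    entities = []
    for tag, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag != outside:
            entities.append((tag, " ".join(token for token, _ in run)))
    return entities

print(group_entities(tagged))
# [('PERSON', 'Leland'), ('PERSON', 'Jane Stanford')]
```

A limitation of this approach: two distinct entities of the same type that sit directly next to each other would be merged into one.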


In [7]:
results = stanford_ner_tagger.tag(article2.split())

In [8]:
print('Original Sentence: %s' % (article2))
print()
# Each result is a (token, tag) pair; 'O' marks tokens outside any entity.
for tag_value, tag_type in results:
    if tag_type != 'O':
        print('Type: %s, Value: %s' % (tag_type, tag_value))

Original Sentence: New York, New York , NY N.Y. new york

Type: LOCATION, Value: New
Type: LOCATION, Value: York
Type: LOCATION, Value: NY
Type: LOCATION, Value: N.Y.


# NLTK NE

In [9]:
import nltk

print('NLTK version: %s' % (nltk.__version__))

from nltk import word_tokenize, pos_tag, ne_chunk

# Resources needed by the tokenizer, the POS tagger, and the NE chunker.
nltk.download('words')
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')
nltk.download('maxent_ne_chunker')

NLTK version: 3.3
[nltk_data] Downloading package words to /Users/marck/nltk_data...
[nltk_data]   Unzipping corpora/words.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/marck/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to /Users/marck/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /Users/marck/nltk_data...
[nltk_data]   Unzipping chunkers/maxent_ne_chunker.zip.


True

In [10]:
results = ne_chunk(pos_tag(word_tokenize(article)))

In [11]:
print('Original Sentence: %s' % (article))
print()
for x in str(results).split('\n'):
    if '/NNP' in x:
        print(x)

Original Sentence: The university was founded in 1885 by Leland and Jane Stanford in memory of their only child, Leland Stanford Jr., who had died of typhoid fever at age 15 the previous year. Stanford was a former Governor of California and U.S. Senator; he made his fortune as a railroad tycoon. The school admitted its first students on October 1, 1891,[2][3] as a coeducational and non-denominational institution.

  (GPE Leland/NNP)
  (PERSON Jane/NNP Stanford/NNP)
  (GPE Leland/NNP)
  Stanford/NNP
  Jr./NNP
  (PERSON Stanford/NNP)
  Governor/NNP
  (GPE California/NNP)
  (GPE U.S/NNP)
  Senator/NNP
  October/NNP
  ]/NNP
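
The `/NNP` filter above prints both recognized chunks, such as `(GPE California/NNP)`, and plain proper nouns that were not chunked, such as `Governor/NNP`. To keep only the labeled chunks, the printed lines can be parsed. The sketch below works on a few lines copied from the output above; `parse_chunk_line` is a hypothetical helper, and it assumes each chunk fits on a single printed line.

```python
import re

# A few lines copied from the printed tree above.
chunk_lines = [
    "  (GPE Leland/NNP)",
    "  (PERSON Jane/NNP Stanford/NNP)",
    "  Stanford/NNP",
    "  (GPE California/NNP)",
]

# "(LABEL word/POS word/POS ...)" on one line.
CHUNK = re.compile(r"^\s*\((\w+) (.+)\)\s*$")

# Hypothetical helper: return (label, text) for a chunk line, else None.
def parse_chunk_line(line):
    match = CHUNK.match(line)
    if match is None:
        return None  # plain token/POS line, not a named-entity chunk
    label, body = match.groups()
    words = [item.rsplit("/", 1)[0] for item in body.split()]
    return label, " ".join(words)

entities = [e for e in map(parse_chunk_line, chunk_lines) if e is not None]
print(entities)
# [('GPE', 'Leland'), ('PERSON', 'Jane Stanford'), ('GPE', 'California')]
```

Parsing the printed form is fragile; since `ne_chunk` returns a tree object, walking its labeled subtrees directly is likely the more robust route.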


In [13]:
results = ne_chunk(pos_tag(word_tokenize(article2)))

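In [12]:
# Caution: this cell was executed (In [12]) before the cell above (In [13]), so
# `results` still held the chunks for `article`. The output below is therefore
# stale and does not describe article2; re-run both cells in order to refresh it.
print('Original Sentence: %s' % (article2))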
print()
for x in str(results).split('\n'):
    if '/NNP' in x:
        print(x)

Original Sentence: New York, New York , NY N.Y. new york

  (GPE Leland/NNP)
  (PERSON Jane/NNP Stanford/NNP)
  (GPE Leland/NNP)
  Stanford/NNP
  Jr./NNP
  (PERSON Stanford/NNP)
  Governor/NNP
  (GPE California/NNP)
  (GPE U.S/NNP)
  Senator/NNP
  October/NNP
  ]/NNP


# spaCy

In [14]:
import spacy

print('spaCy: %s' % (spacy.__version__))

spaCy: 2.2.2


In [15]:
# 'en' is a spaCy 2.x shortcut link to an installed English model.
spacy_nlp = spacy.load('en')

In [17]:
document = spacy_nlp(article)

print('Original Sentence: %s' % (article))
print()
for element in document.ents:
    print('Type: %s, Value: %s' % (element.label_, element))

ValueError: [E167] Unknown morphological feature: 'ConjType' (9141427322507498425). This can happen if the tagger was trained with a different set of morphological features. If you're using a pretrained model, make sure that your models are up to date:
python -m spacy validate

In [18]:
document = spacy_nlp(article2)

print('Original Sentence: %s' % (article2))
print()
for element in document.ents:
    print('Type: %s, Value: %s' % (element.label_, element))

Original Sentence: New York, New York , NY N.Y. new york

Type: GPE, Value: New York
Type: GPE, Value: New York
Type: GPE, Value: NY N.Y.
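
The tools in this notebook use different label names for similar things: the Stanford tagger prints `LOCATION`, while NLTK and spaCy print `GPE`. To compare results side by side, the labels can be mapped onto a shared set first. The sketch below is one rough way to do that; `LABEL_MAP` and `count_labels` are hypothetical, the sample entities are copied from the `article2` outputs above, and treating `GPE` and `LOCATION` as equivalent is a simplifying assumption.

```python
from collections import Counter

# Assumed mapping onto a shared label set (a simplification).
LABEL_MAP = {
    "LOCATION": "LOC",
    "GPE": "LOC",
    "PERSON": "PER",
    "ORGANIZATION": "ORG",
    "ORG": "ORG",
}

# Hypothetical helper: count entities per normalized label.
def count_labels(entities):
    return Counter(LABEL_MAP.get(label, label) for label, _ in entities)

# (label, value) pairs copied from the article2 outputs above.
stanford_ents = [("LOCATION", "New"), ("LOCATION", "York"),
                 ("LOCATION", "NY"), ("LOCATION", "N.Y.")]
spacy_ents = [("GPE", "New York"), ("GPE", "New York"), ("GPE", "NY N.Y.")]

print(count_labels(stanford_ents))
print(count_labels(spacy_ents))
```

The counts differ partly because the Stanford output is per token while spaCy reports multi-word spans, so raw counts are not directly comparable without grouping tokens first.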
