Tagging Part of Speech
--
Part-of-speech (POS) tagging is another crucial part of natural language
processing. It labels each word with its part of speech, such as noun,
verb, or adjective. POS tagging is the basis for Named Entity Recognition,
Sentiment Analysis, Question Answering, and Word Sense Disambiguation.

Problem
--
You want to tag the parts of speech in a sentence.

Solution
--
There are two ways a tagger can be built.

• Rule based - Manually written rules assign a POS tag to each word.

• Stochastic - These algorithms model the sequence of words and assign the most probable sequence of tags, for example using hidden Markov models.
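
The two approaches can be sketched in plain Python. This is a toy illustration under stated assumptions, not how NLTK's tagger works: the regular-expression rules, the three-tag set, and every probability in the HMM tables are made up for the example, and the names rule_based_tag and viterbi are hypothetical.

```python
import re

# Rule based: hand-written patterns, first match wins (illustrative rules only).
RULES = [
    (re.compile(r"^\d+$"), "CD"),                             # digits -> number
    (re.compile(r"^(I|you|he|she|we|they)$", re.I), "PRP"),   # pronouns
    (re.compile(r".*ing$"), "VBG"),                           # gerunds
    (re.compile(r".*ly$"), "RB"),                             # adverbs
    (re.compile(r"^[A-Z].*$"), "NNP"),                        # capitalized -> proper noun
]

def rule_based_tag(tokens, default="NN"):
    """Tag each token with the first matching rule, else the default tag."""
    return [
        (tok, next((tag for pat, tag in RULES if pat.match(tok)), default))
        for tok in tokens
    ]

# Stochastic: a tiny hidden Markov model with made-up probabilities.
STATES = ["PRP", "VBP", "NN"]
START = {"PRP": 0.6, "VBP": 0.1, "NN": 0.3}
TRANS = {
    "PRP": {"PRP": 0.1, "VBP": 0.8, "NN": 0.1},
    "VBP": {"PRP": 0.2, "VBP": 0.1, "NN": 0.7},
    "NN":  {"PRP": 0.3, "VBP": 0.3, "NN": 0.4},
}
EMIT = {
    "PRP": {"i": 0.9},
    "VBP": {"love": 0.6, "learn": 0.4},
    "NN":  {"love": 0.2, "nlp": 0.8},
}

def viterbi(tokens, smooth=1e-6):
    """Return the most probable tag sequence under the toy HMM above."""
    words = [t.lower() for t in tokens]
    # best[i][s] = (probability of the best path ending in s at i, previous state)
    best = [{s: (START[s] * EMIT[s].get(words[0], smooth), None) for s in STATES}]
    for w in words[1:]:
        prev, col = best[-1], {}
        for s in STATES:
            p, back = max((prev[r][0] * TRANS[r][s], r) for r in STATES)
            col[s] = (p * EMIT[s].get(w, smooth), back)
        best.append(col)
    # Backtrack from the most probable final state.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for col in reversed(best[1:]):
        state = col[state][1]
        path.append(state)
    return list(zip(tokens, reversed(path)))

print(rule_based_tag(["I", "love", "NLP", "2"]))
print(viterbi(["I", "love", "NLP"]))
```

The rule-based tagger falls back to NN for "love" because no rule covers it, while the HMM picks VBP because a verb is likely after a pronoun. That use of context is the main practical difference between the two approaches.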

NLTK provides a ready-made POS tagger. nltk.pos_tag(tokens) takes a list of
tokens and returns a list of (word, tag) pairs. Loop over the sentences in
the document, tokenize each one, and tag its words.

In [1]:
text = "I love NLP and I will learn NLP in 2 month"

# NLTK for POS
# Import the necessary packages and the stop word list
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
stop_words = set(stopwords.words('english'))

# Split the text into sentences
sentences = sent_tokenize(text)

# Generate tags for the tokens of each sentence
for sentence in sentences:
    words = word_tokenize(sentence)
    # Drop stop words (removing them before tagging takes away context)
    words = [w for w in words if w not in stop_words]

    # POS tagger: takes a list of tokens, returns (word, tag) pairs
    tags = nltk.pos_tag(words)
    print(tags)

[('I', 'PRP'), ('love', 'VBP'), ('NLP', 'NNP'), ('I', 'PRP'), ('learn', 'VBP'), ('NLP', 'RB'), ('2', 'CD'), ('month', 'NN')]
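
In the output above, the second 'NLP' is tagged RB (adverb). A likely cause is that stop words were removed before tagging, so the tagger no longer saw the surrounding context words. A minimal sketch of the alternative order follows: tag the full sentence first, then filter. The helper name tag_then_filter is hypothetical, and tagger is assumed to be any callable with the same shape as nltk.pos_tag (list of tokens in, list of (word, tag) pairs out).

```python
def tag_then_filter(tokens, stop_words, tagger):
    """Tag the full token list first, then drop stop words from the result."""
    # The tagger sees every token, stop words included, so it keeps its context.
    tagged = tagger(tokens)
    # NLTK's English stop word list is lowercase, so compare in lowercase.
    return [(word, tag) for word, tag in tagged if word.lower() not in stop_words]
```

With the cell above, this could be called as tag_then_filter(word_tokenize(text), stop_words, nltk.pos_tag). Note that the lowercase comparison also drops 'I', which the case-sensitive filter in the original cell keeps.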


Extract Entities from Text
--
In this coding example, we discuss how to identify and extract entities from text, a task called Named Entity Recognition (NER).

Several libraries can perform this task, such as the NLTK chunker, StanfordNER, spaCy, OpenNLP, and NeuroNER. There are also many APIs, such as WatsonNLU, AlchemyAPI, NERD, and the Google Cloud NLP API.

Problem
--
You want to identify and extract entities from the text.

Solution
--
The simplest way to do this is to use ne_chunk from NLTK or the NER component of spaCy.

In [2]:
sent = "John is studying at Stanford University in California"

# Import libraries
import nltk
from nltk import ne_chunk
from nltk import word_tokenize

# NER: tokenize, POS-tag, then chunk the named entities
ne_chunk(nltk.pos_tag(word_tokenize(sent)), binary=False)

# The binary parameter controls how entities are labeled.
# binary=False (the default) classifies each entity by type, such as
# PERSON, ORGANIZATION or GPE; binary=True labels every entity just NE.

Tree('S', [Tree('PERSON', [('John', 'NNP')]), ('is', 'VBZ'), ('studying', 'VBG'), ('at', 'IN'), Tree('ORGANIZATION', [('Stanford', 'NNP'), ('University', 'NNP')]), ('in', 'IN'), Tree('GPE', [('California', 'NNP')])])
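
The nested tree is awkward to consume directly, so it is often flattened into (entity text, label) pairs. A minimal sketch, assuming the input behaves like an nltk.Tree: named-entity chunks are subtrees exposing label() and leaves(), and ordinary tokens are plain (word, tag) tuples. The helper name extract_entities is hypothetical.

```python
def extract_entities(tree):
    """Collect (entity_text, entity_label) pairs from a chunked sentence tree."""
    entities = []
    for node in tree:
        # Named-entity chunks are subtrees; ordinary tokens are (word, tag) tuples.
        if hasattr(node, "label"):
            text = " ".join(word for word, _tag in node.leaves())
            entities.append((text, node.label()))
    return entities
```

Applied to the result of the ne_chunk call above, this would be expected to return the PERSON, ORGANIZATION, and GPE entries that appear in the tree.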

Use Cases of NER Models
--
Named Entity Recognition has a wide range of applications in the field of Natural Language Processing and Information Retrieval. 

A few examples are listed below:

1> Automatically Summarizing Resumes: A key challenge for HR departments is evaluating a large pile of resumes to shortlist candidates. Resumes are often packed with detail, most of which is irrelevant to what the evaluator is looking for. An NER model can extract the relevant entities so that each resume can be evaluated at a glance, reducing the effort needed to shortlist candidates.

2> Optimizing Search Engine Algorithms: Instead of searching for a query across millions of online articles and websites, a more efficient approach is to run an NER model over the articles once and store the entities associated with each one. The key tags in the search query can then be compared with the stored tags for a quick and efficient search.

3> Powering Recommender Systems: NER can be used in recommender systems that automatically filter content we might be interested in and guide us to related, unvisited content based on our previous behaviour. This can be done by extracting the entities associated with the content in our history and comparing them with the labels assigned to unseen content.

4> Simplifying Customer Support: NER can recognize relevant entities in customer complaints and feedback, such as product specifications or department and company branch details, so that the feedback is classified and forwarded to the department responsible for the identified product.
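
The search use case (run NER once, store the entities, match query tags against them) can be sketched as a small inverted index. This is an illustration only: the document ids and entity lists are hypothetical stand-ins for what an NER model might have produced offline, and the function names are made up for the example.

```python
from collections import defaultdict

def build_entity_index(doc_entities):
    """Map each entity (lowercased) to the set of document ids that mention it."""
    index = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        for entity in entities:
            index[entity.lower()].add(doc_id)
    return index

def search(index, query_entities):
    """Rank documents by how many query entities they share."""
    hits = defaultdict(int)
    for entity in query_entities:
        for doc_id in index.get(entity.lower(), ()):
            hits[doc_id] += 1
    # Most shared entities first; ties broken by document id.
    return sorted(hits, key=lambda d: (-hits[d], d))

# Hypothetical entities, as an NER model might have extracted them offline.
docs = {
    "a1": ["Stanford University", "California"],
    "a2": ["Apple", "New York"],
    "a3": ["Apple", "California"],
}
index = build_entity_index(docs)
print(search(index, ["Apple", "California"]))  # ['a3', 'a1', 'a2']
```

The same entity-overlap idea carries over to the recommender use case: compare the entities from a user's history with those of unseen items and rank by overlap.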

Discussion Case Study (20-30 mins)
--
Automatic Resume Summarization using NER

> Each team of 2 participants is expected to read the article at this link:
https://towardsdatascience.com/a-review-of-named-entity-recognition-ner-using-automatic-summarization-of-resumes-5248a75de175
> and be ready for a Q&A round with the trainer.

Recommended Reading - Home Assignment
--
https://medium.freecodecamp.org/an-introduction-to-part-of-speech-tagging-and-the-hidden-markov-model-953d45338f24

Doing NER Using SpaCy
--
Prerequisite: Microsoft Visual Studio C++ build tools

In [3]:
# NER using spaCy
# Install spaCy and download the small English model.
# spacy.load('en') relies on a shortcut link and raises OSError [E050]
# when that link is missing; loading the model by package name avoids this.
!pip install spacy
!python -m spacy download en_core_web_sm
import spacy
nlp = spacy.load('en_core_web_sm')

# Read/create a sentence
doc = nlp('Apple is ready to launch new phone worth $10000 in New york time square ')
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

# On some Windows machines, installing spaCy fails with a Microsoft Visual
# Studio C++ build tools error caused by missing DLLs, and adding the tools
# to the PATH may not resolve it.

# Output from a Google Colab run of this example
# (results can vary with the model version):
# Apple 0 5 ORG
# 10000 42 47 MONEY
# New york 51 59 GPE


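In the run recorded here (spaCy 2.2.4), spacy.load('en') failed because no model was available under that name:

OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.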

Optional Extra Reading:
https://towardsdatascience.com/a-simple-word-sense-disambiguation-application-3ca645c56357
