# Implementing Part of Speech (POS) Tagging in NLP
* Notebook by Adam Lang
* Date: 3/26/2024
* This notebook demonstrates how to implement part-of-speech (POS) tagging in Python with spaCy and walks through a few use cases.


In [1]:
# import spacy library
import spacy

In [2]:
# define a string to parse
text = "With great power comes great responsibility."

In [3]:
# load a spacy language model
nlp = spacy.load('en_core_web_sm')

In [4]:
# create spacy doc object
doc = nlp(text)

In [6]:
# parse for POS tags
for token in doc:
  print(token.text, '=>', token.pos_, '=>', token.tag_)

With => ADP => IN
great => ADJ => JJ
power => NOUN => NN
comes => VERB => VBZ
great => ADJ => JJ
responsibility => NOUN => NN
. => PUNCT => .


Summary of the output:
* spaCy exposes two token attributes for POS analysis:

1. `.pos_` - returns the coarse-grained 'universal POS tag' (e.g. `VERB`)
2. `.tag_` - returns the fine-grained 'detailed POS tag' (e.g. `VBZ`), a more granular version of the POS
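
The tag names can be hard to read at a glance. As a small sketch, `spacy.explain` looks a label up in spaCy's built-in glossary and returns a short description. The `labels` list below is copied from the output above, and the `descriptions` name is only illustrative.

```python
import spacy

# Labels copied from the tagging output above. spacy.explain reads spaCy's
# built-in glossary, so no language model needs to be loaded for this lookup.
labels = ["ADP", "IN", "ADJ", "JJ", "NOUN", "NN", "VERB", "VBZ", "PUNCT"]

# Map each label to its glossary description.
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(f"{label:<6} {description}")
```

The exact wording of each description comes from the installed spaCy version, so it may differ slightly between releases.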

In [7]:
## define another string to test
text = "The teacher reads a book to her students then asks them questions about the story."

In [8]:
# create spacy doc object
doc=nlp(text)

In [9]:
# parse text for POS tags
for token in doc:
  print(token.text,'=>',token.pos_,'=>',token.tag_,'=>')

The => DET => DT =>
teacher => NOUN => NN =>
reads => VERB => VBZ =>
a => DET => DT =>
book => NOUN => NN =>
to => ADP => IN =>
her => PRON => PRP$ =>
students => NOUN => NNS =>
then => ADV => RB =>
asks => VERB => VBZ =>
them => PRON => PRP =>
questions => NOUN => NNS =>
about => ADP => IN =>
the => DET => DT =>
story => NOUN => NN =>
. => PUNCT => . =>


Summary of the output:
* Again, each token in `text` is printed with its universal POS tag and its detailed tag.
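
For longer texts, a table is often easier to scan than printed lines. The sketch below collects the same three values per token into a pandas DataFrame. `pos_table` is a hypothetical helper name, and it assumes it receives a spaCy `Doc` (or any iterable of tokens with `text`, `pos_`, and `tag_` attributes).

```python
import pandas as pd

def pos_table(doc):
    """Return one row per token: its text, universal POS tag, and detailed tag."""
    rows = [(token.text, token.pos_, token.tag_) for token in doc]
    return pd.DataFrame(rows, columns=["text", "pos", "tag"])
```

In the notebook, `pos_table(doc)` would display the tokens of the current `doc` as a table, and the usual DataFrame operations (filtering, grouping) then apply.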

## Parsing for words based on POS tags
* Beyond labeling every token, we can also filter the text by predicted POS tag.

In [10]:
# extract verbs from a sentence
[token.text for token in doc if token.pos_ == 'VERB']

['reads', 'asks']

In [11]:
# extract nouns from sentence
[token.text for token in doc if token.pos_ == 'NOUN']

['teacher', 'book', 'students', 'questions', 'story']
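
The two comprehensions differ only in the label they test, so they can be folded into one helper. In the universal scheme, proper nouns carry the separate label `PROPN` and auxiliary verbs carry `AUX`, so filtering on `NOUN` or `VERB` alone skips them. The sketch below accepts several labels at once; `extract_by_pos` is a hypothetical name, not a spaCy function.

```python
def extract_by_pos(doc, labels):
    """Return the text of every token whose universal POS tag is in `labels`."""
    # Accept a single label given as a plain string as well as a collection.
    if isinstance(labels, str):
        labels = [labels]
    wanted = set(labels)
    return [token.text for token in doc if token.pos_ in wanted]
```

For example, `extract_by_pos(doc, {"NOUN", "PROPN"})` would return common and proper nouns together, while `extract_by_pos(doc, "VERB")` matches the verb comprehension above.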

# Exercise using POS tagging
* In this exercise we will use a text file 'moon.txt' to perform POS tagging.
* We will do the following:

1. Read file into notebook.
2. Annotate POS tags.
3. Count number of nouns and verbs seen in the file.
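
The three steps can be sketched as a single helper before working through them cell by cell. This is a minimal outline under stated assumptions: `count_pos_in_file` is a hypothetical name, and `nlp` is assumed to be a loaded spaCy pipeline such as the one created earlier with `spacy.load('en_core_web_sm')`.

```python
from collections import Counter
from pathlib import Path

def count_pos_in_file(path, nlp, labels=("NOUN", "VERB")):
    """Read a text file, tag it with `nlp`, and count tokens for each label."""
    # Step 1: read the whole file into a single string.
    text = Path(path).read_text(encoding="utf-8")
    # Step 2: annotate POS tags by running the pipeline.
    doc = nlp(text)
    # Step 3: tally every universal POS tag, then keep the requested labels.
    counts = Counter(token.pos_ for token in doc)
    return {label: counts[label] for label in labels}
```

Passing the pipeline in as an argument keeps the helper independent of any global `nlp` variable, which makes it easier to reuse with a different model.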

In [17]:
# open the file; the `with` block closes it automatically
with open('/content/drive/MyDrive/Colab Notebooks/Classical NLP/moon.txt','r',encoding='utf-8') as file:
  # read the entire file into one string (readline() would return only the first line)
  text = file.read()



In [18]:
# view the text that was read (print(file) would show only the file object, not its contents)
print(text)


In [16]:
# import spacy
import spacy

In [19]:
# load language model
nlp = spacy.load('en_core_web_sm')

In [20]:
# create doc object
doc = nlp(text)

## Parse for POS tags

In [21]:
# parse for POS tags:
for token in doc:
  print(token.text,'=>',token.pos_,'=>',token.tag_,'=>')

The => DET => DT =>
moon => NOUN => NN =>
is => AUX => VBZ =>
the => DET => DT =>
satellite => NOUN => NN =>
of => ADP => IN =>
the => DET => DT =>
earth => NOUN => NN =>
. => PUNCT => . =>
It => PRON => PRP =>
moves => VERB => VBZ =>
round => ADP => IN =>
the => DET => DT =>
earth => NOUN => NN =>
. => PUNCT => . =>
It => PRON => PRP =>
shines => VERB => VBZ =>
at => ADP => IN =>
night => NOUN => NN =>
by => ADP => IN =>
light => NOUN => NN =>
reflected => VERB => VBN =>
from => ADP => IN =>
the => DET => DT =>
Sun => PROPN => NNP =>
. => PUNCT => . =>
It => PRON => PRP =>
looks => VERB => VBZ =>
beautiful => ADJ => JJ =>
. => PUNCT => . =>
The => DET => DT =>
bright => ADJ => JJ =>
Moonlight => PROPN => NNP =>
is => AUX => VBZ =>
very => ADV => RB =>
soothing => ADJ => JJ =>
. => PUNCT => . =>
The => DET => DT =>
earthly => ADJ => JJ =>
objects => NOUN => NNS =>
shine => VERB => VBP =>
like => ADP => IN =>
silver => NOUN => NN =>
in => ADP => IN =>
the => DET => DT =>
moonlight => NOUN => NN =>
[remaining output truncated]

## Parse for verbs

In [22]:
# create list comprehension for verb parsing
[token.text for token in doc if token.pos_ == 'VERB']

['moves',
 'shines',
 'reflected',
 'looks',
 'shine',
 'fascinated',
 'looks',
 'seems',
 'shines',
 'found',
 'got',
 'looks',
 'has',
 'forbidding',
 'look',
 'see',
 'walk',
 'has',
 'fascinated',
 'looked',
 'composed',
 'tried',
 'reveal',
 'wanted',
 'send',
 'made',
 'place',
 'reached',
 'walked',
 'collected',
 'returned',
 'sent',
 'conquered',
 'thrilling',
 'make',
 'have',
 'go']
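
The verb list mixes several inflections: in the tagged output above, 'shines' is `VBZ`, 'shine' is `VBP`, and 'reflected' is `VBN`. As a sketch, the detailed tag can be used to group the verbs by form. `group_by_tag` is a hypothetical helper name.

```python
from collections import defaultdict

def group_by_tag(doc, pos="VERB"):
    """Map each detailed tag to the texts of tokens whose universal tag is `pos`."""
    groups = defaultdict(list)
    for token in doc:
        if token.pos_ == pos:
            groups[token.tag_].append(token.text)
    # Return a plain dict so missing keys raise instead of silently appearing.
    return dict(groups)
```

Calling `group_by_tag(doc)` on the `doc` above would return one list of verbs per detailed tag, which makes it easy to pull out, say, only the past participles.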

## Parse for nouns

In [23]:
# extract nouns using list comprehension
[token.text for token in doc if token.pos_ == 'NOUN']

['moon',
 'satellite',
 'earth',
 'earth',
 'night',
 'light',
 'objects',
 'silver',
 'moonlight',
 'beauty',
 'moon',
 'sky',
 'night',
 'matter',
 'fact',
 'plants',
 'animals',
 'moon',
 'place',
 'plants',
 'animals',
 'form',
 'life',
 'moon',
 'earth',
 'moon',
 'atmosphere',
 'days',
 'nights',
 'moon',
 'earth',
 'fact',
 'appearance',
 'rocks',
 'craters',
 'moon',
 'night',
 'spots',
 'spots',
 'rocks',
 'craters',
 'pull',
 'moon',
 'earth',
 'surface',
 'moon',
 'man',
 'beginning',
 'life',
 'earth',
 'wonder',
 'poets',
 'poems',
 'moon',
 'Scientists',
 'mystery',
 'moon',
 'human',
 'moon',
 'attempts',
 'man',
 'moon',
 'moon',
 'surface',
 'moon',
 'rocks',
 'earth',
 'scientists',
 'men',
 'moon',
 'times',
 'moon',
 'man',
 'object',
 'journey',
 'moon',
 'life',
 'earth',
 'life',
 'earth',
 'moon']

## What are the most frequent Verbs and Nouns in the text?


In [30]:
# need to import pandas
import pandas as pd

## write a function
def word_freq(tokens):
  # count verbs: build a Series from the tokens passed in (not the global doc)
  freq_verb = pd.Series([token.text for token in tokens if token.pos_ == 'VERB']).value_counts()

  # count nouns the same way
  freq_nouns = pd.Series([token.text for token in tokens if token.pos_ == 'NOUN']).value_counts()

  # print out results
  print(f"Top 20 most frequent verbs in moon.txt file are:\n {freq_verb[:20]}")
  print("\n")
  print(f"Top 20 most frequent nouns in moon.txt file are:\n {freq_nouns[:20]}")

  #return freq_verb, freq_nouns


In [31]:
# get the POS word frequencies for the text file
word_freq(doc)

Top 20 most frequent verbs in moon.txt file are:
 looks         3
has           2
shines        2
fascinated    2
returned      1
made          1
place         1
reached       1
walked        1
collected     1
moves         1
wanted        1
sent          1
conquered     1
thrilling     1
make          1
have          1
send          1
tried         1
reveal        1
dtype: int64


Top 20 most frequent nouns in moon.txt file are:
 moon          19
earth          9
life           4
night          3
rocks          3
man            3
fact           2
craters        2
animals        2
surface        2
plants         2
spots          2
mystery        1
human          1
attempts       1
Scientists     1
poems          1
scientists     1
poets          1
wonder         1
dtype: int64
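
These counts are based on the surface form of each word, so 'Scientists' and 'scientists' are counted separately, and in the verb list 'looks', 'look', and 'looked' are distinct entries. One possible refinement, sketched below, is to count lowercased lemmas instead. `lemma_freq` is a hypothetical helper, and it assumes the loaded pipeline includes a lemmatizer so that `token.lemma_` is populated.

```python
from collections import Counter

def lemma_freq(doc, pos, top_n=20):
    """Count lowercased lemmas of tokens tagged `pos`, most frequent first."""
    lemmas = [token.lemma_.lower() for token in doc if token.pos_ == pos]
    return Counter(lemmas).most_common(top_n)
```

In the notebook, `lemma_freq(doc, 'VERB')` and `lemma_freq(doc, 'NOUN')` would return lists of `(lemma, count)` pairs. The exact lemmas depend on the model's lemmatizer, so the merged counts should be checked against the text.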
