
In [1]:
# Import spaCy
import spacy



In [2]:
# Load a pre-trained spaCy pipeline for basic NLP tasks such as PoS tagging, dependency parsing and lemmatization.
model = spacy.load("en_core_web_sm")
# The pipeline name breaks down as follows:
# ‘en’ is the language code: the pipeline is for English.
# ‘core’ is the pipeline type: a general-purpose pipeline covering core NLP tasks such as PoS tagging, parsing, lemmatization and named entity recognition.
# ‘web’ is the genre of the training text: written web content such as blogs, news and comments.
# ‘sm’ is the size: small pipelines are faster and lighter but comparatively less accurate. The alternatives ‘md’ and ‘lg’ are larger and generally more accurate than ‘sm’.

In [3]:
# Run the pipeline on the sentence; the result is a Doc, a sequence of Token objects
tokens = model("She wished she could desert him in the desert.")

In [5]:
# Print each token with its coarse-grained PoS tag (pos_) and fine-grained tag (tag_)
for token in tokens:
    print(token.text, "--", token.pos_, "--", token.tag_)

She -- PRON -- PRP
wished -- VERB -- VBD
she -- PRON -- PRP
could -- AUX -- MD
desert -- VERB -- VB
him -- PRON -- PRP
in -- ADP -- IN
the -- DET -- DT
desert -- NOUN -- NN
. -- PUNCT -- .


Note that in the above example the two instances of *desert* receive different PoS tags (VERB and NOUN). A text-to-speech system can use this information to generate the correct pronunciation for each instance.
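As a rough illustration of how a text-to-speech front end might use these tags, the sketch below looks up a pronunciation by (word, PoS tag). The `PRONUNCIATIONS` table, the `pronounce` helper and the IPA strings are hypothetical and only for illustration; the tagged pairs are copied from the output above rather than recomputed.

```python
# Hypothetical pronunciation lexicon keyed by (lowercased word, coarse PoS tag).
# The IPA strings are illustrative and not taken from any real TTS system.
PRONUNCIATIONS = {
    ("desert", "VERB"): "dɪˈzɜːrt",
    ("desert", "NOUN"): "ˈdezərt",
}

# (text, pos_) pairs copied from the tagger output above.
tagged = [
    ("She", "PRON"), ("wished", "VERB"), ("she", "PRON"), ("could", "AUX"),
    ("desert", "VERB"), ("him", "PRON"), ("in", "ADP"), ("the", "DET"),
    ("desert", "NOUN"), (".", "PUNCT"),
]


def pronounce(text, pos):
    """Return the tag-specific pronunciation, or the word itself if none is listed."""
    return PRONUNCIATIONS.get((text.lower(), pos), text)


spoken = [pronounce(text, pos) for text, pos in tagged]
print(" ".join(spoken))
```

Keying the lookup on the pair rather than on the word alone is what lets the two occurrences of *desert* resolve to different entries.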

The above task is a specific instance of a larger NLP problem called Word Sense Disambiguation (WSD): given a word that has more than one meaning, identify the meaning intended in the context in which the word is used.



Note that this technique does not work when the different meanings of a word share the same PoS tag, because the tag then carries no information that separates them.

In [6]:
# Let's take a new example.
tokens = model("The bass swam around the bass drum on the ocean floor")
for token in tokens:
    print(token.text, "--", token.pos_, "--", token.tag_)

The -- DET -- DT
bass -- NOUN -- NN
swam -- NOUN -- NN
around -- ADP -- IN
the -- DET -- DT
bass -- NOUN -- NN
drum -- NOUN -- NN
on -- ADP -- IN
the -- DET -- DT
ocean -- NOUN -- NN
floor -- NOUN -- NN
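
The limitation can be checked directly against the two outputs above. The sketch below hard-codes the (text, pos_) pairs as printed and tests whether the occurrences of a word received more than one distinct tag; the helper names `tags_by_word` and `pos_separates` are made up for this illustration.

```python
from collections import defaultdict

# (text, pos_) pairs copied from the two tagger outputs above.
desert_sentence = [
    ("She", "PRON"), ("wished", "VERB"), ("she", "PRON"), ("could", "AUX"),
    ("desert", "VERB"), ("him", "PRON"), ("in", "ADP"), ("the", "DET"),
    ("desert", "NOUN"), (".", "PUNCT"),
]
bass_sentence = [
    ("The", "DET"), ("bass", "NOUN"), ("swam", "NOUN"), ("around", "ADP"),
    ("the", "DET"), ("bass", "NOUN"), ("drum", "NOUN"), ("on", "ADP"),
    ("the", "DET"), ("ocean", "NOUN"), ("floor", "NOUN"),
]


def tags_by_word(pairs):
    """Map each lowercased word to the set of coarse PoS tags it received."""
    tags = defaultdict(set)
    for text, pos in pairs:
        tags[text.lower()].add(pos)
    return dict(tags)


def pos_separates(word, pairs):
    """Return True if the occurrences of `word` received more than one distinct tag."""
    return len(tags_by_word(pairs).get(word, set())) > 1


print(pos_separates("desert", desert_sentence))  # True: VERB vs NOUN
print(pos_separates("bass", bass_sentence))      # False: NOUN both times
```

Both occurrences of *bass* are tagged NOUN, so the tags alone cannot tell the fish from the instrument.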
