## Wordnet Lemmatizer
Lemmatization is similar to stemming, but its output, called a ‘lemma’, is a root word rather than a root stem, which is what stemming produces. After lemmatization we get a valid dictionary word that carries the same meaning as the original.

NLTK provides the WordNetLemmatizer class, a thin wrapper around the WordNet corpus. It uses the morphy() function of the WordNet CorpusReader class to find a lemma.
- Lemmatization is used in Q&A systems, chatbots, and text summarization.
- Lemmatization looks words up in a dictionary of root words, so it returns valid words. This usually makes it more accurate than stemming, at the cost of extra lookups.
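
To make the difference from stemming concrete, here is a toy sketch of the idea behind a dictionary-checked lookup: try an exception list, then try suffix rules, and accept a candidate only if it is a known word. The vocabulary, exception list, rule list, and function names (`TOY_VERBS`, `TOY_EXCEPTIONS`, `TOY_VERB_RULES`, `toy_stem`, `toy_lemmatize`) are all hypothetical and only loosely modeled on what morphy() does; this is not NLTK's implementation.

```python
# Toy vocabulary standing in for a verb dictionary (hypothetical, illustration only).
TOY_VERBS = {"eat", "write", "finalize", "go"}

# Irregular forms mapped directly to their lemma (hypothetical sample).
TOY_EXCEPTIONS = {"eaten": "eat", "went": "go", "written": "write"}

# (suffix, replacement) pairs tried in order.
TOY_VERB_RULES = [
    ("ies", "y"), ("es", "e"), ("es", ""), ("s", ""),
    ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", ""),
]


def toy_stem(word):
    """Strip a suffix with no dictionary check, so the result may not be a word."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Return a candidate only if it appears in the toy vocabulary."""
    if word in TOY_EXCEPTIONS:
        return TOY_EXCEPTIONS[word]
    if word in TOY_VERBS:
        return word
    for suffix, replacement in TOY_VERB_RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in TOY_VERBS:
                return candidate
    # No valid candidate found: leave the word unchanged.
    return word


for w in ["writing", "finalized", "eaten", "history"]:
    print(w, "| stem:", toy_stem(w), "| lemma:", toy_lemmatize(w))
```

In this sketch the stemmer can return fragments such as "writ" or "finaliz", while the dictionary check only ever returns a known word or the input itself.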

In [5]:
from nltk.stem import WordNetLemmatizer
# nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()

# POS values accepted by lemmatize():
#   noun      - 'n' (the default)
#   verb      - 'v'
#   adjective - 'a'
#   adverb    - 'r'

lemmatizer.lemmatize("going", pos='v')

'go'

In [6]:
words = ["eating", "eats", "eaten", "writing", "writes", "programming", "programs", "history", "finally", "finalized"]

In [7]:
for word in words:
    print(word + " --> " + lemmatizer.lemmatize(word, pos='v'))

eating --> eat
eats --> eat
eaten --> eat
writing --> write
writes --> write
programming --> program
programs --> program
history --> history
finally --> finally
finalized --> finalize


In [8]:
lemmatizer.lemmatize("goes", pos='v')

'go'

In [9]:
# Words whose stems were not found properly by stemming. The lemmatizer returns
# both unchanged here, so at least the output is still a valid word.
lemmatizer.lemmatize("fairly", pos='v'), lemmatizer.lemmatize("sportingly")

('fairly', 'sportingly')

In [None]:
'''When working with a large body of text, setting the pos parameter by hand for each word is time-consuming and impractical. A few strategies help:

- Use a POS tagger: before lemmatizing, run a POS tagger (such as NLTK's pos_tag() function) to assign a tag to each word, then derive the pos parameter from that tag programmatically.

- Use a library with automated POS detection: some libraries, like spaCy, detect the POS tag for each word when they process the text. You can use that information to set the pos parameter.

- Use a default POS: if you don't need precise POS tagging, apply one default pos value (e.g., 'n' for noun, which is also WordNetLemmatizer's default) to all words. This is less accurate, but it saves time.

- Pre-process and store POS tags: if you're working with a large, static dataset, tag the text once, store the tags alongside the words, and reuse them when lemmatizing.

Any of these strategies lets you set the pos parameter for every word in a large text dataset without manual work.'''

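
The "pre-process and store" strategy could be sketched as a small file cache: tag the tokens once, write the (token, tag) pairs to disk, and read them back on later runs. The function name `load_or_tag`, its `tagger` and `cache_path` parameters, and the JSON file layout are assumptions made for this illustration, not part of NLTK. The sketch also assumes the dataset is static, since it does not check that the cached tags still match the tokens passed in.

```python
import json
from pathlib import Path


def load_or_tag(tokens, tagger, cache_path):
    """Return (token, tag) pairs, calling the tagger only if no cache file exists.

    `tagger` is any callable that maps a list of tokens to (token, tag) pairs.
    """
    path = Path(cache_path)
    if path.exists():
        # JSON stores pairs as lists, so convert them back to tuples.
        stored = json.loads(path.read_text(encoding="utf-8"))
        return [tuple(pair) for pair in stored]

    tagged = [tuple(pair) for pair in tagger(tokens)]
    path.write_text(json.dumps(tagged), encoding="utf-8")
    return tagged
```

With NLTK installed, passing `nltk.pos_tag` as `tagger` should work, since it takes a list of tokens and returns (token, tag) pairs.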

In [20]:
from nltk.corpus import wordnet
from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk.stem import WordNetLemmatizer

# Sample sentence
sentence = "The quick brown fox jumps over the lazy dog."

# Tokenize the sentence
tokens = word_tokenize(sentence)

# Perform POS tagging (returns a list of (token, Penn Treebank tag) pairs)
tagged_tokens = pos_tag(tokens)

# Initialize the WordNet Lemmatizer
lemmatizer = WordNetLemmatizer()

# Lemmatize tagged tokens
lemmatized_tokens = []
# The loop variable is named `tag` so it does not shadow the imported pos_tag function
for token, tag in tagged_tokens:
    # Map POS tags to WordNet POS tags: verb tags (VB, VBZ, ...) become VERB,
    # everything else falls back to NOUN
    wn_pos_tag = wordnet.VERB if tag.startswith('V') else wordnet.NOUN
    lemmatized_token = lemmatizer.lemmatize(token, pos=wn_pos_tag)
    lemmatized_tokens.append(lemmatized_token)

# Print original tokens, POS tags, and lemmatized tokens
print("Original Tokens:", tokens)
print("POS Tags:", tagged_tokens)
print("Lemmatized Tokens:", lemmatized_tokens)

Original Tokens: ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', '.']
POS Tags: [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN'), ('.', '.')]
Lemmatized Tokens: ['The', 'quick', 'brown', 'fox', 'jump', 'over', 'the', 'lazy', 'dog', '.']
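
The loop above distinguishes only verbs from everything else, so adjectives and adverbs are lemmatized as nouns. A slightly fuller mapping from Penn Treebank tags (the kind pos_tag() returns) to WordNet POS codes might look like the sketch below. The function name `penn_to_wordnet` and its `default` parameter are hypothetical. It returns the single-letter codes 'n', 'v', 'a', and 'r' listed earlier, so it needs no NLTK import.

```python
def penn_to_wordnet(tag, default="n"):
    """Map a Penn Treebank tag to a WordNet POS code, falling back to `default`."""
    if tag.startswith("JJ"):   # JJ, JJR, JJS: adjectives
        return "a"
    if tag.startswith("VB"):   # VB, VBD, VBG, VBN, VBP, VBZ: verbs
        return "v"
    if tag.startswith("NN"):   # NN, NNS, NNP, NNPS: nouns
        return "n"
    if tag.startswith("RB"):   # RB, RBR, RBS: adverbs
        return "r"
    # Determiners, prepositions, punctuation, etc. have no WordNet POS.
    return default


for tag in ["DT", "JJ", "NN", "VBZ", "IN", "RB"]:
    print(tag, "->", penn_to_wordnet(tag))
```

Inside the loop, the result can be passed straight through, as in lemmatizer.lemmatize(token, pos=penn_to_wordnet(tag)).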
