# **Lemmatization**

---

### **What is Lemmatization?**

Lemmatization, like stemming, reduces a word to a base form, but it also takes the word's context (such as its part of speech) into account.
The output of lemmatization is called a "lemma", which is a root word, whereas the output of stemming is a root stem. A lemma is always a valid word that carries the same meaning as the original.
Lemmatization is often preferred over stemming because it performs a morphological analysis of each word instead of simply stripping suffixes.
One major difference from stemming is that NLTK's `lemmatize` method takes a part-of-speech parameter, `pos`. If it is not supplied, the default is noun (`"n"`).
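
To make the role of `pos` concrete, here is a minimal sketch of a lookup-based lemmatizer. The `TOY_LEMMAS` table and the `toy_lemmatize` function are hypothetical stand-ins written for this illustration and are not part of NLTK; they only mimic the noun default described above.

```python
# Toy lemma table keyed by (word, part of speech). Hypothetical data for
# illustration only; a real lemmatizer consults a full lexicon such as WordNet.
TOY_LEMMAS = {
    ("better", "a"): "good",   # adjective: comparative of "good"
    ("running", "v"): "run",   # verb: present participle of "run"
    ("feet", "n"): "foot",     # noun: irregular plural of "foot"
}


def toy_lemmatize(word, pos="n"):
    """Return the lemma of `word` for the given part of speech.

    Words missing from the table are returned unchanged.
    """
    return TOY_LEMMAS.get((word.lower(), pos), word)


print(toy_lemmatize("better"))             # looked up as a noun: better
print(toy_lemmatize("better", pos="a"))    # looked up as an adjective: good
print(toy_lemmatize("running"))            # looked up as a noun: running
print(toy_lemmatize("running", pos="v"))   # looked up as a verb: run
```

The same word yields different lemmas depending on the tag, which is why a missing or wrong `pos` often leaves a word unchanged.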


### **Need for Lemmatization**

---


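1. It is used in comprehensive retrieval systems such as search engines, where a query for one form of a word should also match its other forms.
2. It is used in compact indexing, where all inflected forms of a word can share a single index entry.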
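
The indexing point can be sketched with a small inverted index keyed by lemma. The `INDEX_LEMMAS` table and the `build_index` function are hypothetical names invented for this example, and the lookup table stands in for a real lemmatizer.

```python
from collections import defaultdict

# Hypothetical lemma lookup for illustration; a real system would call a
# lemmatizer instead of using a hand-written table.
INDEX_LEMMAS = {
    "solutions": "solution",
    "developers": "developer",
    "products": "product",
}


def build_index(documents):
    """Map each lemma to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for word in text.lower().split():
            lemma = INDEX_LEMMAS.get(word, word)
            index[lemma].add(doc_id)
    return {lemma: sorted(ids) for lemma, ids in index.items()}


docs = ["open-source solutions", "one solution for developers", "a developer tool"]
index = build_index(docs)

print(index["solution"])     # [0, 1]
print(index["developer"])    # [1, 2]
print("solutions" in index)  # False: the plural shares the singular's entry
```

Because "solutions" and "solution" collapse into one key, the index stays smaller and a search for either form finds both documents.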


### **Various approaches to Lemmatization**

---

<hr>

### **1. WordNet Lemmatizer (with POS tag)**

<hr/>

We pass a part-of-speech tag along with each word to indicate its type (verb, noun, adjective, etc.).

In [5]:
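import nltk
from nltk.stem import WordNetLemmatizer

# Fetch the WordNet corpus that the lemmatizer looks words up in.
nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()

# pos="a" marks "better" as an adjective.
print("better :", lemmatizer.lemmatize("better", pos="a"))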

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
better : good
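
In practice the tag usually comes from a part-of-speech tagger rather than being typed by hand. Taggers trained on the Penn Treebank emit tags such as `NNS`, `VBG`, or `JJR`, whereas `lemmatize` expects one of the WordNet codes `"n"`, `"v"`, `"a"`, or `"r"`. The helper below is a sketch of one common way to bridge the two. The name `penn_to_wordnet` is hypothetical, and falling back to noun for every other tag is an assumption that mirrors the lemmatizer's default.

```python
def penn_to_wordnet(tag):
    """Translate a Penn Treebank tag into a WordNet part-of-speech code."""
    if tag.startswith("JJ"):
        return "a"  # adjective
    if tag.startswith("VB"):
        return "v"  # verb
    if tag.startswith("RB"):
        return "r"  # adverb
    return "n"      # assumption: treat everything else as a noun


# Hand-written (word, Penn tag) pairs standing in for a tagger's output.
tagged = [("feet", "NNS"), ("running", "VBG"), ("better", "JJR"), ("quickly", "RB")]

converted = [(word, penn_to_wordnet(tag)) for word, tag in tagged]
print(converted)
# [('feet', 'n'), ('running', 'v'), ('better', 'a'), ('quickly', 'r')]
```

Each converted pair can then be passed on as `lemmatizer.lemmatize(word, pos=code)`.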


<hr>

### **2.  TextBlob**

<hr/>

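TextBlob is a Python library for processing textual data. It provides a simple API to access its methods and perform basic NLP tasks. Its `Word.lemmatize()` method treats a word as a noun unless a part of speech is passed, so in the example below verbs such as "empowers" and adjectives such as "better" are left unchanged.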

In [9]:
import nltk
from textblob import TextBlob

# TextBlob's tokenizer relies on NLTK's 'punkt' models.
nltk.download('punkt')

sentence = 'We develop open-source solutions for developers which empowers them so that they can make better products for the world. We educate people about Artificial Intelligence, its scope and impact via resources and tutorials.'

s = TextBlob(sentence)
# Lemmatize every word of the sentence and join the results back together.
lemmatized_sentence = " ".join([w.lemmatize() for w in s.words])

print(lemmatized_sentence)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
We develop open-source solution for developer which empowers them so that they can make better product for the world We educate people about Artificial Intelligence it scope and impact via resource and tutorial


<hr>

### **3. spaCy**

<hr/>

spaCy is an open-source Python library that parses and “understands” large volumes of text. Separate models are available for specific languages (English, French, German, etc.).

In [10]:
import spacy

# Load the small English pipeline (install it first with
# `python -m spacy download en_core_web_sm`).
nlp = spacy.load('en_core_web_sm')

doc = nlp('We develop open-source solutions for developers which empowers them so that they can make better products for the world. We educate people about Artificial Intelligence, its scope and impact via resources and tutorials.')

# Create list of tokens
tokens = list(doc)

print(tokens)

# token.lemma_ holds the lemma of each token as a string
lemmatized_sentence = " ".join([token.lemma_ for token in doc])

print(lemmatized_sentence)

[We, develop, open, -, source, solutions, for, developers, which, empowers, them, so, that, they, can, make, better, products, for, the, world, ., We, educate, people, about, Artificial, Intelligence, ,, its, scope, and, impact, via, resources, and, tutorials, .]
-PRON- develop open - source solution for developer which empower -PRON- so that -PRON- can make well product for the world . -PRON- educate people about Artificial Intelligence , -PRON- scope and impact via resource and tutorial .

Note that the `-PRON-` placeholder for pronouns is spaCy 2.x behavior; spaCy 3.x returns the pronoun itself as the lemma.


<hr>

### **Difference Between Stemming and Lemmatization**

<hr/>


| S.No. | Stemming | Lemmatization |
| ----------- | ----------- | ----------- |
| 1 | Faster, because it chops off word endings without considering the context of the word in the sentence. | Slower than stemming, because it takes the context of the word into account before reducing it. |
| 2 | It is a rule-based approach. | It is a dictionary-based approach. |
| 3 | Less accurate. | More accurate than stemming. |
| 4 | The root form it produces may not be a real word. | The root form it produces (the lemma) is a valid dictionary word. |
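
The contrast in the table can be seen with a toy example. The suffix rules in `toy_stem` and the entries in `LEMMA_TABLE` are hypothetical and far simpler than a real stemmer such as Porter's or a real lexicon such as WordNet; they are only meant to show the rule-based versus dictionary-based behavior.

```python
def toy_stem(word):
    """Strip the first matching suffix, with no dictionary check."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Hypothetical dictionary mapping inflected forms to their lemmas.
LEMMA_TABLE = {"studies": "study", "caring": "care", "feet": "foot"}


def lookup_lemma(word):
    """Return the dictionary lemma, or the word itself if it is not listed."""
    return LEMMA_TABLE.get(word, word)


for word in ("studies", "caring", "feet"):
    print(word, "->", toy_stem(word), "|", lookup_lemma(word))
# studies -> studi | study
# caring -> car | care
# feet -> feet | foot
```

The stemmer produces a non-word ("studi"), a different word ("car"), and misses the irregular plural, while the dictionary lookup returns a valid lemma in each case.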
