### Code Demos for Multiple Lemmatizers

#### spaCy lemmatizer

The spaCy lemmatizer combines rule-based and statistical methods to find the base forms of words, and it ships as part of the spaCy NLP library. Because it takes context into account, such as each token's part of speech, it tends to reduce words to accurate base forms while remaining fast. It is commonly used in NLP applications such as named entity recognition (NER), dependency parsing, advanced text analytics, and large-scale data processing. Below is its code demo.

In [None]:
# Install spacy if you have not done it already.
!pip install spacy

In [None]:
# Install the spaCy English model.
!python -m spacy download en_core_web_sm

In [8]:
import spacy

# Load the SpaCy English model.
nlp = spacy.load('en_core_web_sm')

# Create sample text.
text = """The SpaCy lemmatizer is designed using a combination of rule-based and statistical methods 
to find the base forms of words. It comes integrated with the SpaCy NLP library.
"""

# Process the text using the spaCy English model.
doc = nlp(text)

# Get lemmatized forms.
lemmatized_tokens = [(token.text, token.lemma_) for token in doc]

# Print the output.
print("\n")
print("OUTPUT")
print("\n")
print("Original Text: ", text)
print("Lemmatized Tokens: ", lemmatized_tokens)



OUTPUT


Original Text:  The SpaCy lemmatizer is designed using a combination of rule-based and statistical methods 
to find the base forms of words. It comes integrated with the SpaCy NLP library.

Lemmatized Tokens:  [('The', 'the'), ('SpaCy', 'SpaCy'), ('lemmatizer', 'lemmatizer'), ('is', 'be'), ('designed', 'design'), ('using', 'use'), ('a', 'a'), ('combination', 'combination'), ('of', 'of'), ('rule', 'rule'), ('-', '-'), ('based', 'base'), ('and', 'and'), ('statistical', 'statistical'), ('methods', 'method'), ('\n', '\n'), ('to', 'to'), ('find', 'find'), ('the', 'the'), ('base', 'base'), ('forms', 'form'), ('of', 'of'), ('words', 'word'), ('.', '.'), ('It', 'it'), ('comes', 'come'), ('integrated', 'integrate'), ('with', 'with'), ('the', 'the'), ('SpaCy', 'SpaCy'), ('NLP', 'NLP'), ('library', 'library'), ('.', '.'), ('\n', '\n')]


- Note that, in this example, the base forms extracted by spaCy are accurate and remain meaningful words (for instance, 'designed' becomes 'design' and 'is' becomes 'be').
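
To make the "rule-based" part of the description above more concrete, here is a minimal sketch of how a rule-driven lemmatizer can work: look the word up in an exception table first, then try part-of-speech-specific suffix rules and accept a candidate only if it is a known lemma. This is a toy illustration only. The names `EXCEPTIONS`, `SUFFIX_RULES`, `KNOWN_LEMMAS`, and `toy_lemmatize` are hypothetical, the tables are tiny hand-written assumptions, and none of this is spaCy's actual code or data.

```python
# Toy rule-based lemmatizer (illustrative assumption, not spaCy's implementation).

# Irregular forms are resolved by direct lookup on (word, coarse POS tag).
EXCEPTIONS = {
    ("is", "AUX"): "be",
    ("are", "AUX"): "be",
    ("went", "VERB"): "go",
}

# (suffix, replacement) pairs, tried in order for each coarse POS tag.
SUFFIX_RULES = {
    "NOUN": [("ies", "y"), ("ses", "s"), ("s", "")],
    "VERB": [("ies", "y"), ("ing", "e"), ("ing", ""),
             ("ed", "e"), ("ed", ""), ("es", "e"), ("s", "")],
}

# A tiny vocabulary used to accept or reject candidate lemmas.
KNOWN_LEMMAS = {"design", "use", "come", "integrate", "base",
                "method", "form", "word", "library"}


def toy_lemmatize(word, pos):
    """Return a lemma for `word` given its coarse POS tag."""
    lower = word.lower()
    # 1. Exceptions take priority over any rule.
    if (lower, pos) in EXCEPTIONS:
        return EXCEPTIONS[(lower, pos)]
    # 2. Try each suffix rule; keep the first candidate that is a known lemma.
    for suffix, replacement in SUFFIX_RULES.get(pos, []):
        if lower.endswith(suffix):
            candidate = lower[: len(lower) - len(suffix)] + replacement
            if candidate in KNOWN_LEMMAS:
                return candidate
    # 3. Fall back to the lowercased word (a simplification).
    return lower


tagged = [("It", "PRON"), ("comes", "VERB"), ("integrated", "VERB"),
          ("with", "ADP"), ("the", "DET"), ("library", "NOUN")]
print([toy_lemmatize(word, pos) for word, pos in tagged])
# ['it', 'come', 'integrate', 'with', 'the', 'library']
```

The part-of-speech tag matters here: the same surface form can map to different lemmas depending on whether it is used as a noun or a verb, which is why a tagger usually runs before a rule-based lemmatizer.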

#### spaCy lemmatizer: speed and rule-based mode

The spaCy lemmatizer is known for its speed and efficiency, and it is optimized to process large amounts of text quickly. NLTK also performs solidly, but it tends to be slower than spaCy when processing large volumes of text. Below is its code demo, which also prints the mode of the pipeline's lemmatizer component.

In [3]:
import spacy

# These English pipelines have an inbuilt rule-based lemmatizer.
nlp = spacy.load("en_core_web_sm")
lemmatizer = nlp.get_pipe("lemmatizer")
print(lemmatizer.mode)  # 'rule'

sample_doc = nlp(""" The SpaCy lemmatizer is designed using a combination of rule-based and statistical methods 
                     to find the base forms of words. It comes integrated with the SpaCy NLP library.""")
print([token.lemma_ for token in sample_doc])

rule
[' ', 'the', 'SpaCy', 'lemmatizer', 'be', 'design', 'use', 'a', 'combination', 'of', 'rule', '-', 'base', 'and', 'statistical', 'method', '\n                     ', 'to', 'find', 'the', 'base', 'form', 'of', 'word', '.', 'it', 'come', 'integrate', 'with', 'the', 'SpaCy', 'NLP', 'library', '.']
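
Speed claims like the one above are easiest to trust when measured on your own data. The following is a minimal timing sketch, assuming a hypothetical helper `time_lemmatizer` that accepts any callable mapping a text to a list of lemmas. The `placeholder_lemmatize` function is a stand-in used only so the snippet runs without extra dependencies; it is not a real lemmatizer.

```python
import time


def time_lemmatizer(lemmatize, texts, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` passes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for text in texts:
            lemmatize(text)
        best = min(best, time.perf_counter() - start)
    return best


def placeholder_lemmatize(text):
    # Stand-in (assumption): lowercase and split on whitespace.
    return text.lower().split()


texts = ["It comes integrated with the SpaCy NLP library."] * 1000
seconds = time_lemmatizer(placeholder_lemmatize, texts)
# Guard against a zero reading on very coarse clocks.
print(f"{len(texts) / max(seconds, 1e-9):,.0f} texts per second")
```

To compare libraries, you could pass a wrapper around each one instead of the placeholder, for example `lambda t: [tok.lemma_ for tok in nlp(t)]` for the spaCy pipeline loaded above. For large batches, spaCy's `nlp.pipe(texts)` is generally the more efficient way to process many texts, so a fair benchmark would likely time that path as well.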


#### TextBlob lemmatizer

TextBlob is built on top of NLTK, so its lemmatizer relies on the same WordNet data, but it wraps that functionality in a simpler API. It can be deployed with few computing resources, and it bundles several convenience functions that NLTK does not offer directly. Below is its code demo.

In [None]:
# First install TextBlob.
!pip install textblob

In [8]:
from textblob import TextBlob, Word

# Create sample text.
text = """ TextBlob performs faster when compared to nltk. It can be easily deployed with lesser computing resources. 
          TextBlob is simpler to use and it supports many functions that are not available in nltk. 

       """

# Create a TextBlob object.
blob = TextBlob(text)

# Tokenize the sample text.
words = blob.words

# Lemmatize.
lemmatized_words = [Word(word).lemmatize() for word in words]

# Print the output.
print("Original sample words:", words)
print("Lemmatized words:", lemmatized_words)

Original sample words: ['TextBlob', 'performs', 'faster', 'when', 'compared', 'to', 'nltk', 'It', 'can', 'be', 'easily', 'deployed', 'with', 'lesser', 'computing', 'resources', 'TextBlob', 'is', 'simpler', 'to', 'use', 'and', 'it', 'supports', 'many', 'functions', 'that', 'are', 'not', 'available', 'in', 'nltk']
Lemmatized words: ['TextBlob', 'performs', 'faster', 'when', 'compared', 'to', 'nltk', 'It', 'can', 'be', 'easily', 'deployed', 'with', 'lesser', 'computing', 'resource', 'TextBlob', 'is', 'simpler', 'to', 'use', 'and', 'it', 'support', 'many', 'function', 'that', 'are', 'not', 'available', 'in', 'nltk']


Code Snippet 5.3
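
In the TextBlob output above, verbs such as 'performs' and 'compared' are left unchanged. That is because `Word.lemmatize()` treats every word as a noun unless a part of speech is passed in. One way to address this is to map each word's Penn Treebank tag to a WordNet part-of-speech letter first. The helper below is a minimal sketch of that mapping; the name `penn_to_wordnet` and the hand-tagged sample list are assumptions made for illustration.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter, defaulting to noun."""
    if tag.startswith("J"):   # JJ, JJR, JJS: adjectives
        return "a"
    if tag.startswith("V"):   # VB, VBD, VBG, VBN, VBP, VBZ: verbs
        return "v"
    if tag.startswith("RB"):  # RB, RBR, RBS: adverbs
        return "r"
    return "n"                # everything else is treated as a noun


# Hand-tagged sample (assumption); a real tagger would supply these tags.
tagged = [("performs", "VBZ"), ("compared", "VBN"), ("resources", "NNS"),
          ("easily", "RB"), ("simpler", "JJR")]
for word, tag in tagged:
    print(word, tag, penn_to_wordnet(tag))
```

With TextBlob, the same idea could be applied as `[Word(w).lemmatize(penn_to_wordnet(t)) for w, t in blob.tags]`, assuming the tagger corpora have been installed (for example with `python -m textblob.download_corpora`). With that change, verbs should be reduced to their base forms instead of being returned unchanged.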