# Text Input

In [14]:
paragraph="""My dear young friends, dream, dream, dream. Dreams transform into thoughts and thoughts result in action. You have to dream before your dreams can come true. You should have a goal and a constant quest to acquire knowledge. Hard work and perseverance are essential. Use technology for the benefit of humankind and not for its destruction. The ignited mind of the youth is the most powerful resource on the earth, above the earth, and under the earth. When the student is ready, the teacher will appear. Aim high, dream big, and work hard to achieve those dreams. The future belongs to the young who have the courage to dream and the determination to realize those dreams. Remember, small aim is a crime; have great aim and pursue it with all your heart."""

# Lemmatization
Lemmatization is a fundamental text pre-processing technique widely used in natural language processing (NLP) and machine learning. Like stemming, it reduces words to a base form; unlike stemming, the result is meant to be a valid dictionary word, called the “lemma.”
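
To make the difference from stemming concrete, here is a toy sketch in plain Python. The suffix list and the `LEMMA_TABLE` dictionary are hand-written assumptions for illustration only; the lemmatizers used later in this notebook rely on WordNet or trained models instead.

```python
# Toy contrast between stemming and lemmatization (illustration only).
# SUFFIXES and LEMMA_TABLE are hypothetical, hand-written examples.
SUFFIXES = ("ies", "es", "s", "ing", "ed")

LEMMA_TABLE = {
    "studies": "study",
    "dreams": "dream",
    "belongs": "belong",
    "better": "good",
}


def crude_stem(word):
    """Strip the first matching suffix; the result may not be a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemma(word):
    """Look the word up in a table of known forms; fall back to the word itself."""
    return LEMMA_TABLE.get(word, word)


for w in ["studies", "dreams", "better"]:
    print(w, "->", crude_stem(w), "(stem) |", toy_lemma(w), "(lemma)")
```

The stemmer turns "studies" into the non-word "stud", while the table lookup returns "study"; an irregular form such as "better" is only resolved by the lookup.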

## Lemmatization Techniques:

### Using NLTK's WordNetLemmatizer:

NLTK (Natural Language Toolkit) provides a simple interface to WordNet, a large lexical database of English.

In [19]:
!pip install nltk
import nltk



In [20]:

nltk.download('stopwords')
nltk.download('popular',quiet=True)
nltk.download('punkt')
nltk.download('wordnet')


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [21]:
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

Tokenization is required first: split the paragraph into sentences.

In [22]:
sentences= nltk.sent_tokenize(paragraph)

In [23]:
# Initialize the WordNet Lemmatizer
lemmatizer = WordNetLemmatizer()


In [25]:
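# Lemmatization
# Build the stop-word set once instead of once per token
stop_words = set(stopwords.words('english'))

# Process each sentence: tokenize, drop stop words, lemmatize the rest
for i in range(len(sentences)):
    words = word_tokenize(sentences[i])
    words = [lemmatizer.lemmatize(word) for word in words if word.lower() not in stop_words]
    sentences[i] = ' '.join(words)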

# Reconstruct the paragraph from processed sentences
lemmatized_paragraph = ' '.join(sentences)

# Print the lemmatized paragraph
print(lemmatized_paragraph)


dear young friend , dream , dream , dream . Dreams transform thought thought result action . dream dream come true . goal constant quest acquire knowledge . Hard work perseverance essential . Use technology benefit humankind destruction . ignited mind youth powerful resource earth , earth , earth . student ready , teacher appear . Aim high , dream big , work hard achieve dream . future belongs young courage dream determination realize dream . Remember , small aim crime ; great aim pursue heart .
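
The filtering step in the loop above can be looked at in isolation. The sketch below uses a small hand-written stop-word set (an assumption standing in for NLTK's much longer English list) and already-split tokens, so it runs without any downloads. It also shows why the comparison lowercases each token: "The" and "the" should both be dropped.

```python
# Hypothetical miniature stop-word list; NLTK's English list is much longer.
STOP_WORDS = frozenset({"the", "is", "of", "on", "and", "to", "a"})


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep tokens whose lowercase form is not a stop word."""
    return [tok for tok in tokens if tok.lower() not in stop_words]


tokens = ["The", "ignited", "mind", "of", "the", "youth", "is", "powerful"]
filtered = remove_stop_words(tokens)
print(filtered)
```

Membership tests on a set take constant time on average, so building the set once keeps the filtering cost proportional to the number of tokens.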


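### Using NLTK's WordNetLemmatizer with POS tags:

By default, WordNetLemmatizer treats every word as a noun. Passing each word's part-of-speech (POS) tag lets it reduce verbs, adjectives, and adverbs as well, so the cell below tags the words with NLTK first. spaCy, an open-source library that bundles POS tagging and lemmatization in one pipeline, is used in the last section.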


In [26]:
# Download the NLTK POS tagger model
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

In [30]:
import nltk
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Download necessary NLTK data files
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')

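# Map a word's Penn Treebank POS tag (from NLTK) to a WordNet POS constant.
# Note: each word is tagged on its own, so the tagger sees no sentence context.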
def get_wordnet_pos(word):
    tag = nltk.pos_tag([word])[0][1][0].upper()
    tag_dict = {"J": wordnet.ADJ, "N": wordnet.NOUN, "V": wordnet.VERB, "R": wordnet.ADV}
    return tag_dict.get(tag, wordnet.NOUN)

lemmatizer = WordNetLemmatizer()
# Tokenize the paragraph into words
words = word_tokenize(paragraph)

# Lemmatize each word with POS tag consideration
lemmatized_words = [lemmatizer.lemmatize(word, get_wordnet_pos(word)) for word in words]

print("Original paragraph:")
print(paragraph)
print("\nLemmatized Sentence:")
print(' '.join(lemmatized_words))



Original paragraph:
My dear young friends, dream, dream, dream. Dreams transform into thoughts and thoughts result in action. You have to dream before your dreams can come true. You should have a goal and a constant quest to acquire knowledge. Hard work and perseverance are essential. Use technology for the benefit of humankind and not for its destruction. The ignited mind of the youth is the most powerful resource on the earth, above the earth, and under the earth. When the student is ready, the teacher will appear. Aim high, dream big, and work hard to achieve those dreams. The future belongs to the young who have the courage to dream and the determination to realize those dreams. Remember, small aim is a crime; have great aim and pursue it with all your heart.

Lemmatized Sentence:
My dear young friend , dream , dream , dream . Dreams transform into thought and thought result in action . You have to dream before your dream can come true . You should have a goal and a constant quest 
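
The helper above tags each word separately. A variation is to tag the whole token list once, for example with `nltk.pos_tag(words)`, and convert each tag afterwards. The sketch below shows only the conversion step in plain Python: the `(word, tag)` pairs are hand-written stand-ins for tagger output, and the single-letter values are the strings behind `wordnet.ADJ`, `wordnet.NOUN`, `wordnet.VERB`, and `wordnet.ADV`.

```python
# Single-character POS codes used by WordNet (the values of wordnet.ADJ, etc.).
PENN_PREFIX_TO_WORDNET = {"J": "a", "N": "n", "V": "v", "R": "r"}


def penn_to_wordnet(tag, default="n"):
    """Map a Penn Treebank tag such as 'VBZ' to a WordNet POS code."""
    return PENN_PREFIX_TO_WORDNET.get(tag[:1].upper(), default)


# Hand-written (word, tag) pairs standing in for the output of a POS tagger.
tagged = [("Dreams", "NNS"), ("transform", "VBP"), ("into", "IN"),
          ("powerful", "JJ"), ("thoughts", "NNS")]

converted = [(word, penn_to_wordnet(tag)) for word, tag in tagged]
print(converted)
```

Each resulting pair could then be passed to `lemmatizer.lemmatize(word, pos)`. Tags with no WordNet counterpart, such as the preposition tag `IN`, fall back to the noun code here, mirroring the default in the helper above.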



### Using TextBlob:

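TextBlob is a simpler library built on top of NLTK that provides easy-to-use APIs for common NLP tasks, including lemmatization. Its `words` property drops punctuation, and `Word.lemmatize()` treats each word as a noun unless a POS is passed.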

In [31]:
from textblob import TextBlob

# Create a TextBlob object
blob = TextBlob(paragraph)

# Lemmatize each word
lemmatized_words = [word.lemmatize() for word in blob.words]

print("Original paragraph:")
print(paragraph)
print("\nLemmatized Sentence:")
print(' '.join(lemmatized_words))


Original paragraph:
My dear young friends, dream, dream, dream. Dreams transform into thoughts and thoughts result in action. You have to dream before your dreams can come true. You should have a goal and a constant quest to acquire knowledge. Hard work and perseverance are essential. Use technology for the benefit of humankind and not for its destruction. The ignited mind of the youth is the most powerful resource on the earth, above the earth, and under the earth. When the student is ready, the teacher will appear. Aim high, dream big, and work hard to achieve those dreams. The future belongs to the young who have the courage to dream and the determination to realize those dreams. Remember, small aim is a crime; have great aim and pursue it with all your heart.

Lemmatized Sentence:
My dear young friend dream dream dream Dreams transform into thought and thought result in action You have to dream before your dream can come true You should have a goal and a constant quest to acquire k

### Machine learning (ML) based lemmatization:
ML-based lemmatization uses trained models to take a word's context into account when choosing its lemma. NLTK's WordNetLemmatizer looks each word up on its own and needs the POS tag supplied by the caller. spaCy, an open-source NLP library for Python, instead runs a trained pipeline that predicts the POS tag of every token in context and uses that tag to pick the lemma, which tends to help with verbs and ambiguous words.

In [32]:
import spacy

# Load the spaCy model
nlp = spacy.load('en_core_web_sm')
# Process the paragraph with spaCy
doc = nlp(paragraph)

# Lemmatize each word
lemmatized_words = [token.lemma_ for token in doc]

print("Original Paragraph:")
print(paragraph)
print("\nLemmatized Paragraph:")
print(' '.join(lemmatized_words))


Original Paragraph:
My dear young friends, dream, dream, dream. Dreams transform into thoughts and thoughts result in action. You have to dream before your dreams can come true. You should have a goal and a constant quest to acquire knowledge. Hard work and perseverance are essential. Use technology for the benefit of humankind and not for its destruction. The ignited mind of the youth is the most powerful resource on the earth, above the earth, and under the earth. When the student is ready, the teacher will appear. Aim high, dream big, and work hard to achieve those dreams. The future belongs to the young who have the courage to dream and the determination to realize those dreams. Remember, small aim is a crime; have great aim and pursue it with all your heart.

Lemmatized Paragraph:
my dear young friend , dream , dream , dream . dream transform into thought and thought result in action . you have to dream before your dream can come true . you should have a goal and a constant quest 
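
The NLTK and spaCy outputs above are rebuilt with `' '.join(...)`, which leaves a space before every punctuation mark (`friend , dream`). A small helper can glue punctuation back onto the preceding word. This is a minimal sketch; `join_tokens` and its punctuation set are assumptions for illustration, not part of NLTK, TextBlob, or spaCy.

```python
# Punctuation marks that should attach to the preceding token (assumed set).
CLOSING_PUNCT = {",", ".", ";", ":", "!", "?"}


def join_tokens(tokens):
    """Join tokens with spaces, but put no space before closing punctuation."""
    pieces = []
    for tok in tokens:
        if tok in CLOSING_PUNCT and pieces:
            pieces[-1] += tok
        else:
            pieces.append(tok)
    return " ".join(pieces)


lemmas = ["my", "dear", "young", "friend", ",", "dream", ",", "dream", "."]
print(join_tokens(lemmas))
```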