**Text Input**

In [7]:
paragraph="""My dear young friends, dream, dream, dream. Dreams transform into thoughts and thoughts result in action. You have to dream before your dreams can come true. You should have a goal and a constant quest to acquire knowledge. Hard work and perseverance are essential. Use technology for the benefit of humankind and not for its destruction. The ignited mind of the youth is the most powerful resource on the earth, above the earth, and under the earth. When the student is ready, the teacher will appear. Aim high, dream big, and work hard to achieve those dreams. The future belongs to the young who have the courage to dream and the determination to realize those dreams. Remember, small aim is a crime; have great aim and pursue it with all your heart."""

# Stemming
Stemming is a text normalization technique used in Natural Language Processing (NLP) to reduce words to their base or root form. The goal of stemming is to group together different forms of the same word so they can be analyzed as a single item. This helps in reducing the complexity of the data without losing significant meaning.
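
As a minimal sketch of the grouping described above, assuming NLTK is already installed, the snippet below stems a few inflected forms of one word with NLTK's `PorterStemmer`. The word list `forms` is illustrative and not part of the notebook's paragraph.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Several inflected forms of the same word (illustrative list)
forms = ["dream", "dreams", "dreaming", "dreamed"]
stems = [stemmer.stem(form) for form in forms]

print(stems)       # every form maps to 'dream'
print(set(stems))  # so the four tokens can be counted as one item
```

Because all four forms share one stem, a word count over the stems treats them as a single term.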



## Stemming Techniques

**Common Stemming Algorithms**

* **Porter Stemmer:** One of the most widely used stemming algorithms, developed by Martin Porter in 1980. It uses a series of rules to iteratively strip suffixes from words.
* **Snowball Stemmer:** A revision of the Porter Stemmer (often called Porter2), also developed by Martin Porter. It supports multiple languages and fixes several inconsistencies in the original rules.
* **Lancaster Stemmer:** Another alternative, known for being more aggressive than the Porter and Snowball stemmers.
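
The three algorithms listed above can be tried side by side. The sketch below assumes NLTK is installed and uses only the stemmer classes from `nltk.stem`; the `sample_words` list and the `stemmers` and `results` names are illustrative choices, not part of the notebook.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

stemmers = {
    "porter": PorterStemmer(),
    "snowball": SnowballStemmer("english"),
    "lancaster": LancasterStemmer(),
}

# Illustrative sample words, not taken from the notebook's paragraph
sample_words = ["fairly", "generously", "running"]

# Stem every sample word with every stemmer
results = {
    name: [stemmer.stem(word) for word in sample_words]
    for name, stemmer in stemmers.items()
}

for name, stems in results.items():
    print(f"{name:10s} {stems}")
```

One visible difference: Porter turns "fairly" into "fairli", while Snowball produces "fair".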

In [1]:
!pip install nltk



## Porter Stemmer

In [15]:
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

# The Punkt models are required by sent_tokenize and word_tokenize
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [16]:
sentences = nltk.sent_tokenize(paragraph)
stemmer = PorterStemmer()


In [4]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [17]:
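# Stemming: drop stop words, then stem the remaining tokens of each sentence.
# The stop-word list is lowercase and the comparison is case-sensitive,
# so capitalized tokens such as "My" and "You" are kept.
stop_words = set(stopwords.words('english'))  # build the set once, not per word
for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])
    words = [stemmer.stem(word) for word in words if word not in stop_words]
    sentences[i] = ' '.join(words)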


In [18]:
stemmed_paragraph = ' '.join(sentences)

print("Original Paragraph:")
print(paragraph)
print("\nStemmed Paragraph:")
print(stemmed_paragraph)

Original Paragraph:
My dear young friends, dream, dream, dream. Dreams transform into thoughts and thoughts result in action. You have to dream before your dreams can come true. You should have a goal and a constant quest to acquire knowledge. Hard work and perseverance are essential. Use technology for the benefit of humankind and not for its destruction. The ignited mind of the youth is the most powerful resource on the earth, above the earth, and under the earth. When the student is ready, the teacher will appear. Aim high, dream big, and work hard to achieve those dreams. The future belongs to the young who have the courage to dream and the determination to realize those dreams. Remember, small aim is a crime; have great aim and pursue it with all your heart.

Stemmed Paragraph:
my dear young friend , dream , dream , dream . dream transform thought thought result action . you dream dream come true . you goal constant quest acquir knowledg . hard work persever essenti . use technolo

## Snowball Stemmer

In [19]:
from nltk.stem import SnowballStemmer

# Initialize the stemmer
stemmer = SnowballStemmer("english")

# Tokens to stem: the whole paragraph, with no stop-word removal
words = nltk.word_tokenize(paragraph)

# Apply stemming
stemmed_words = [stemmer.stem(word) for word in words]

print(stemmed_words)

['my', 'dear', 'young', 'friend', ',', 'dream', ',', 'dream', ',', 'dream', '.', 'dream', 'transform', 'into', 'thought', 'and', 'thought', 'result', 'in', 'action', '.', 'you', 'have', 'to', 'dream', 'befor', 'your', 'dream', 'can', 'come', 'true', '.', 'you', 'should', 'have', 'a', 'goal', 'and', 'a', 'constant', 'quest', 'to', 'acquir', 'knowledg', '.', 'hard', 'work', 'and', 'persever', 'are', 'essenti', '.', 'use', 'technolog', 'for', 'the', 'benefit', 'of', 'humankind', 'and', 'not', 'for', 'it', 'destruct', '.', 'the', 'ignit', 'mind', 'of', 'the', 'youth', 'is', 'the', 'most', 'power', 'resourc', 'on', 'the', 'earth', ',', 'abov', 'the', 'earth', ',', 'and', 'under', 'the', 'earth', '.', 'when', 'the', 'student', 'is', 'readi', ',', 'the', 'teacher', 'will', 'appear', '.', 'aim', 'high', ',', 'dream', 'big', ',', 'and', 'work', 'hard', 'to', 'achiev', 'those', 'dream', '.', 'the', 'futur', 'belong', 'to', 'the', 'young', 'who', 'have', 'the', 'courag', 'to', 'dream', 'and', 'th

## Lancaster Stemmer

In [20]:

import nltk
from nltk.corpus import stopwords
from nltk.stem import LancasterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize

# Download necessary NLTK data files
nltk.download('punkt')
nltk.download('stopwords')

# Initialize the Lancaster Stemmer
stemmer = LancasterStemmer()
# Tokenize the paragraph into sentences
sentences = sent_tokenize(paragraph)

# Process each sentence: drop stop words (case-insensitively), then stem
stop_words = set(stopwords.words('english'))  # build the set once, not per word
for i in range(len(sentences)):
    words = word_tokenize(sentences[i])
    words = [stemmer.stem(word) for word in words if word.lower() not in stop_words]
    sentences[i] = ' '.join(words)

# Reconstruct the paragraph from processed sentences
stemmed_paragraph = ' '.join(sentences)

print("Original Paragraph:")
print(paragraph)
print("\nStemmed Paragraph:")
print(stemmed_paragraph.split(" "))

Original Paragraph:
My dear young friends, dream, dream, dream. Dreams transform into thoughts and thoughts result in action. You have to dream before your dreams can come true. You should have a goal and a constant quest to acquire knowledge. Hard work and perseverance are essential. Use technology for the benefit of humankind and not for its destruction. The ignited mind of the youth is the most powerful resource on the earth, above the earth, and under the earth. When the student is ready, the teacher will appear. Aim high, dream big, and work hard to achieve those dreams. The future belongs to the young who have the courage to dream and the determination to realize those dreams. Remember, small aim is a crime; have great aim and pursue it with all your heart.

Stemmed Paragraph:
['dear', 'young', 'friend', ',', 'dream', ',', 'dream', ',', 'dream', '.', 'dream', 'transform', 'thought', 'thought', 'result', 'act', '.', 'dream', 'dream', 'com', 'tru', '.', 'goal', 'const', 'quest', 'a

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
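
The Lancaster output above is visibly shorter than the Porter output for the same paragraph. A small sketch, assuming NLTK is installed, puts the two stemmers next to each other on four words taken from the paragraph; the names `porter`, `lancaster`, and `comparison` are illustrative.

```python
from nltk.stem import LancasterStemmer, PorterStemmer

porter = PorterStemmer()
lancaster = LancasterStemmer()

# Words from the paragraph whose stems differ between the two outputs
words = ["action", "come", "true", "constant"]

# Map each word to its (Porter stem, Lancaster stem) pair
comparison = {w: (porter.stem(w), lancaster.stem(w)) for w in words}

for word, (p_stem, l_stem) in comparison.items():
    print(f"{word:10s} porter={p_stem:10s} lancaster={l_stem}")
```

Porter leaves all four words unchanged, while Lancaster cuts them down to "act", "com", "tru", and "const", which is what "more aggressive" means in practice.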


## Regex-based Stemmer
A simpler approach uses custom regular expressions to strip common suffixes from words. It can be useful for narrow, specific applications, but the rules fire on any word that ends in a matching pattern, so it is less flexible and less reliable than the other stemmers.

In [21]:
import re
from nltk.tokenize import word_tokenize, sent_tokenize

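def regex_stemmer(word):
    """Strip common English suffixes using a fixed list of regex rules."""
    # Rules are applied in order, each one to the result of the previous rule.
    patterns = [
        (r'ing$', ''),
        (r'ed$', ''),
        (r'ly$', ''),
        (r'es$', ''),
        (r's$', ''),
    ]
    for pattern, replacement in patterns:
        word = re.sub(pattern, replacement, word)
    return word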

# Tokenize the paragraph into sentences
sentences = sent_tokenize(paragraph)

# Process each sentence
for i in range(len(sentences)):
    words = word_tokenize(sentences[i])
    words = [regex_stemmer(word) for word in words]
    sentences[i] = ' '.join(words)

# Reconstruct the paragraph from processed sentences
stemmed_paragraph = ' '.join(sentences)

print("Original Paragraph:")
print(paragraph)
print("\nStemmed Paragraph:")
print(stemmed_paragraph)


Original Paragraph:
My dear young friends, dream, dream, dream. Dreams transform into thoughts and thoughts result in action. You have to dream before your dreams can come true. You should have a goal and a constant quest to acquire knowledge. Hard work and perseverance are essential. Use technology for the benefit of humankind and not for its destruction. The ignited mind of the youth is the most powerful resource on the earth, above the earth, and under the earth. When the student is ready, the teacher will appear. Aim high, dream big, and work hard to achieve those dreams. The future belongs to the young who have the courage to dream and the determination to realize those dreams. Remember, small aim is a crime; have great aim and pursue it with all your heart.

Stemmed Paragraph:
My dear young friend , dream , dream , dream . Dream transform into thought and thought result in action . You have to dream before your dream can come true . You should have a goal and a constant quest to 
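
NLTK also ships a ready-made `RegexpStemmer` in `nltk.stem`, which removes whatever a single regular expression matches and skips words shorter than `min` characters. The sketch below assumes NLTK is installed; the suffix pattern mirrors the hand-written rules above, and `min=4` is an illustrative threshold rather than a recommended setting.

```python
from nltk.stem import RegexpStemmer

# One alternation instead of a list of rules; words shorter than 4
# characters are returned unchanged (min=4 is an illustrative choice)
regexp_stemmer = RegexpStemmer(r"ing$|ed$|ly$|es$|s$", min=4)

sample = ["dreams", "thoughts", "ignited", "is", "its"]
regexp_stems = {word: regexp_stemmer.stem(word) for word in sample}

for word, stem in regexp_stems.items():
    print(f"{word} -> {stem}")
```

The length threshold protects short words such as "is" and "its", which a bare `s$` rule would otherwise truncate.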

## Lovins Stemmer
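The Lovins Stemmer, published by Julie Beth Lovins in 1968, is one of the oldest stemming algorithms. It removes the longest matching suffix from a fixed table of endings, subject to a context condition attached to each ending, and then applies recoding rules to tidy up the remaining stem.

The Lovins Stemmer is not available in popular NLP libraries such as NLTK or spaCy, but it can be implemented with custom logic if needed. The cell below is only a toy approximation: a handful of regex rules tried in order, with only the first match applied. It does not implement the full Lovins ending table, conditions, or recoding rules.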

In [22]:
import re
from nltk.tokenize import word_tokenize, sent_tokenize

def lovins_stemmer(word):
    patterns = [
        (r'ations$', 'ate'),
        (r'ingly$', 'ingli'),
        (r'ed$', ''),
        (r'ing$', ''),
        (r'es$', ''),
        (r's$', ''),
    ]
    for pattern, replacement in patterns:
        if re.search(pattern, word):
            return re.sub(pattern, replacement, word)
    return word



# Tokenize the paragraph into sentences
sentences = sent_tokenize(paragraph)

# Process each sentence
for i in range(len(sentences)):
    words = word_tokenize(sentences[i])
    words = [lovins_stemmer(word) for word in words]
    sentences[i] = ' '.join(words)

# Reconstruct the paragraph from processed sentences
stemmed_paragraph = ' '.join(sentences)

print("Original Paragraph:")
print(paragraph)
print("\nStemmed Paragraph:")
print(stemmed_paragraph)


Original Paragraph:
My dear young friends, dream, dream, dream. Dreams transform into thoughts and thoughts result in action. You have to dream before your dreams can come true. You should have a goal and a constant quest to acquire knowledge. Hard work and perseverance are essential. Use technology for the benefit of humankind and not for its destruction. The ignited mind of the youth is the most powerful resource on the earth, above the earth, and under the earth. When the student is ready, the teacher will appear. Aim high, dream big, and work hard to achieve those dreams. The future belongs to the young who have the courage to dream and the determination to realize those dreams. Remember, small aim is a crime; have great aim and pursue it with all your heart.

Stemmed Paragraph:
My dear young friend , dream , dream , dream . Dream transform into thought and thought result in action . You have to dream before your dream can come true . You should have a goal and a constant quest to 