<a href="https://colab.research.google.com/github/SURESHBEEKHANI/Natural-language-processing-/blob/main/Stemming_And_Lemmatization_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# **Stemming and lemmatization**

Stemming and lemmatization are two common techniques used in natural language processing (NLP) and text mining to preprocess text data. Their primary goal is to reduce words to their base or root form, which helps in standardizing vocabulary and improving the efficiency of text-based operations like indexing, searching, and analysis.

## Stemming

**Definition:** Stemming reduces a word to its stem by chopping off affixes, most commonly suffixes, using heuristic rules. The resulting stem may not be a valid word in the language.

**Purpose:** Stemming is typically used to normalize words for indexing and retrieval. It maps variants of a word to a common base form, even if the stem itself is not a dictionary word.

**Example:**

- Original: walking, walked, walks
- Stem: walk
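
To make the "chopping off" idea concrete, here is a minimal sketch of a suffix-stripping stemmer in plain Python. It is not the Porter algorithm or any NLTK stemmer; the function name `toy_stem`, the suffix list, and the minimum stem length are assumptions chosen only for illustration.

```python
# Toy suffix-stripping stemmer: an illustrative sketch, not a real stemming algorithm.
# The suffix list and min_stem_len are assumptions made for this example.
SUFFIXES = ("ing", "ed", "es", "s")  # tried in this order, longest first

def toy_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word

for word in ["walking", "walked", "walks", "studies", "running"]:
    print(f"Original: {word}, Stem: {toy_stem(word)}")
```

The three forms of "walk" collapse to the same stem, while "studies" becomes "studi" and "running" becomes "runn", which shows how purely rule-based stripping can produce stems that are not real words.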

## Lemmatization

**Definition:** Lemmatization also reduces words to a base form, but it ensures that the result is a valid word in the language. It uses a vocabulary and morphological analysis to derive the lemma, which is the canonical (dictionary) form of a set of inflected words.

**Purpose:** Lemmatization is more sophisticated than stemming because it takes a word's part of speech, and therefore its role in context, into account. It aims to transform words to their dictionary form, which is linguistically correct and meaningful.

**Example:**

- Original: went, going, gone
- Lemma: go
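
The dictionary-lookup idea can be sketched in a few lines of plain Python. This is only a toy: the tables `IRREGULAR`, `KNOWN_LEMMAS`, and `SUFFIX_RULES` and the function `toy_lemmatize` are hypothetical and tiny, whereas a real lemmatizer relies on a full lexicon such as WordNet. The point is that a candidate is accepted only if it is a known word for the given part of speech.

```python
# Toy lemmatizer: a sketch of vocabulary-backed lookup, not a real lemmatizer.
# All three tables below are tiny hypothetical stand-ins for a full lexicon.
IRREGULAR = {("went", "v"): "go", ("gone", "v"): "go", ("better", "a"): "good"}
KNOWN_LEMMAS = {("go", "v"), ("walk", "v"), ("good", "a")}
SUFFIX_RULES = {"v": ("ing", "ed", "es", "s")}

def toy_lemmatize(word, pos="v"):
    """Return the lemma of word for the given part of speech, or word unchanged."""
    # 1. Irregular forms are resolved by direct lookup.
    if (word, pos) in IRREGULAR:
        return IRREGULAR[(word, pos)]
    # 2. Regular forms: strip a suffix, but accept the result only if it is a known lemma.
    for suffix in SUFFIX_RULES.get(pos, ()):
        if word.endswith(suffix):
            candidate = word[: -len(suffix)]
            if (candidate, pos) in KNOWN_LEMMAS:
                return candidate
    # 3. Otherwise leave the word as it is rather than invent a non-word.
    return word

for word in ["went", "going", "gone", "walked"]:
    print(f"Original: {word}, Lemma: {toy_lemmatize(word, pos='v')}")

print(toy_lemmatize("better", pos="a"))  # looked up as an adjective
print(toy_lemmatize("better", pos="v"))  # no verb entry in the toy tables, so unchanged
```

Unlike a stemmer, this sketch never returns a string that is missing from its vocabulary, and the answer for "better" depends on the part of speech supplied.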

## Key differences

- **Output:** Stemming may produce strings that are not actual words, whereas lemmatization always results in actual words.
- **Accuracy:** Lemmatization is generally more accurate than stemming, but also more computationally expensive.
- **Use cases:** Stemming is often used in information retrieval systems and search engines where speed is crucial, while lemmatization is preferred in applications that require precision and an understanding of the text's context.

In practice, the choice between stemming and lemmatization depends on the specific requirements of the NLP task and the trade-off between speed and accuracy.
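
As a rough illustration of the retrieval use case, the sketch below builds a small inverted index keyed by Porter stems, so that a query matches documents containing any variant of the word. The corpus and the helper names `build_index` and `search` are invented for this example; only `PorterStemmer` and its `stem` method come from NLTK.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer

# Hypothetical mini-corpus, invented for this example
documents = {
    1: "she walks to work",
    2: "he walked home yesterday",
    3: "they enjoy walking",
    4: "the cat sleeps",
}

stemmer = PorterStemmer()

def build_index(docs):
    """Map each stem to the set of document ids containing a word with that stem."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[stemmer.stem(token)].add(doc_id)
    return index

def search(index, query):
    """Return the sorted ids of documents that match the stem of a one-word query."""
    return sorted(index.get(stemmer.stem(query.lower()), set()))

index = build_index(documents)
print(search(index, "walking"))  # matches documents 1, 2, and 3
```

Because the query and the documents are normalized the same way, "walking" retrieves the documents that contain "walks" and "walked" as well.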

In [None]:
import nltk
nltk.download('punkt')  # This downloads the necessary resources for tokenization
nltk.download('averaged_perceptron_tagger')  # For part-of-speech tagging
nltk.download('wordnet')  # For lemmatization
nltk.download('stopwords')  # For stopwords


In [None]:
from nltk.stem import PorterStemmer

# Initialize the Porter Stemmer
stemmer = PorterStemmer()

# Example words to stem
words = ["walking", "walked", "walks"]

# Stem each word and print the results
for word in words:
    stemmed_word = stemmer.stem(word)
    print(f"Original: {word}, Stemmed: {stemmed_word}")


In [None]:
from nltk.stem import LancasterStemmer

# Initialize the Lancaster Stemmer
stemmer = LancasterStemmer()

# Example words to stem
words = ["walking", "walked", "walks"]

# Stem each word and print the results
for word in words:
    stemmed_word = stemmer.stem(word)
    print(f"Original: {word}, Stemmed: {stemmed_word}")


In [None]:
from nltk.stem import RegexpStemmer

# Strip a trailing "ing", "ed", "es", or "s"; words shorter than `min` characters are left unchanged
stemmer = RegexpStemmer(r'ing$|ed$|es$|s$', min=2)

words = ["running", "played", "walks"]

for word in words:
    stemmed_word = stemmer.stem(word)
    print(f"Original: {word}, Stemmed: {stemmed_word}")


In [None]:
import nltk
from nltk.stem import SnowballStemmer

# Initialize the Snowball Stemmer for English
stemmer = SnowballStemmer("english")

# Example words to stem
words = ["running", "jumps", "easily", "fairly"]

# Stem each word and print the results
for word in words:
    stemmed_word = stemmer.stem(word)
    print(f"Original: {word}, Stemmed: {stemmed_word}")
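
Since the cells above run each stemmer separately, it can help to see them side by side. The following sketch applies the Porter, Lancaster, and Snowball stemmers to the same words; the `stemmers` and `comparison` dictionaries are just names chosen for this example. Lancaster is generally regarded as the most aggressive of the three, so its stems may be shorter.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# One instance of each stemmer, keyed by a display name
stemmers = {
    "Porter": PorterStemmer(),
    "Lancaster": LancasterStemmer(),
    "Snowball": SnowballStemmer("english"),
}

words = ["running", "jumps", "easily", "fairly"]

# For every word, record the stem produced by each stemmer
comparison = {
    word: {name: s.stem(word) for name, s in stemmers.items()}
    for word in words
}

for word, stems in comparison.items():
    row = ", ".join(f"{name}: {stem}" for name, stem in stems.items())
    print(f"Original: {word} -> {row}")
```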


# Lemmatization

In [None]:
import nltk
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('punkt')  # Optional: for tokenizing sentences


In [None]:
from nltk.stem import WordNetLemmatizer

# Initialize the WordNet Lemmatizer
lemmatizer = WordNetLemmatizer()


In [None]:
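words = ["running", "jumps", "easily", "fairly", "better"]

# pos="v" treats every word as a verb, so "running" and "jumps" are reduced to
# verb lemmas while the adverbs and "better" come back unchanged;
# with pos="a", "better" would map to "good"
for word in words:
    lemma = lemmatizer.lemmatize(word, pos="v")
    print(f"Original: {word}, Lemma: {lemma}")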
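
Passing one fixed `pos` for every word is a simplification. A common approach is to tag each word first and translate the tag into the single-letter codes that `WordNetLemmatizer.lemmatize` accepts (`"n"`, `"v"`, `"a"`, `"r"`). The helper below is a minimal sketch of that translation for Penn Treebank tags, such as those produced by `nltk.pos_tag`; the name `treebank_to_wordnet_pos` and the noun fallback are assumptions made for this example.

```python
def treebank_to_wordnet_pos(tag, default="n"):
    """Translate a Penn Treebank tag into a WordNet part-of-speech code."""
    if tag.startswith("J"):   # JJ, JJR, JJS: adjectives
        return "a"
    if tag.startswith("V"):   # VB, VBD, VBG, ...: verbs
        return "v"
    if tag.startswith("R"):   # RB, RBR, RBS: adverbs
        return "r"
    if tag.startswith("N"):   # NN, NNS, NNP, ...: nouns
        return "n"
    return default            # assumption: treat everything else as a noun

for tag in ["VBG", "NNS", "RB", "JJR", "DT"]:
    print(f"{tag} -> {treebank_to_wordnet_pos(tag)}")
```

The returned code can then be supplied as the `pos` argument, for example `lemmatizer.lemmatize(word, pos=treebank_to_wordnet_pos(tag))`, so that each word is lemmatized according to its own tag.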
