# Stemming and lemmatization
- Stemming and lemmatization are text normalization techniques used in natural language processing (NLP) to reduce words to their base or root form.
- Here’s a brief overview of both, with example implementations using NLTK.

## Stemming
- Stemming is the process of reducing a word to its base or root form by removing suffixes. The result may not be an actual word but a truncated version of the original.

In [1]:
from nltk.stem import PorterStemmer

# Create a PorterStemmer object
stemmer = PorterStemmer()

# Example words
words = ["running", "ran", "runs", "easily", "fairly"]

# Stem each word
stems = [stemmer.stem(word) for word in words]

print(stems)  # Output: ['run', 'ran', 'run', 'easili', 'fairli']


['run', 'ran', 'run', 'easili', 'fairli']
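
- To make the suffix-stripping idea concrete, here is a minimal pure-Python sketch. The rule list and the `toy_stem` helper are hypothetical and written only for illustration. This is not the Porter algorithm, which applies several ordered phases with extra conditions, but on the five example words it happens to give the same stems.

```python
# Hypothetical suffix rules, checked in order: (suffix, replacement).
# These are illustrative only and much cruder than Porter's rules.
SUFFIX_RULES = [
    ("ing", ""),
    ("ed", ""),
    ("y", "i"),
    ("s", ""),
]


def toy_stem(word: str) -> str:
    """Apply the first matching suffix rule; return the word unchanged otherwise."""
    for suffix, replacement in SUFFIX_RULES:
        # Keep at least three characters of stem so short words are left alone
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: len(word) - len(suffix)] + replacement
            # Undo consonant doubling, e.g. "runn" -> "run"
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeioulsz":
                stem = stem[:-1]
            return stem
    return word


words = ["running", "ran", "runs", "easily", "fairly"]
print([toy_stem(w) for w in words])  # ['run', 'ran', 'run', 'easili', 'fairli']
```

- No rule matches "ran", so it is left as it is: a rule-based stemmer has no way to connect an irregular form to "run".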


## Lemmatization
- Lemmatization reduces a word to its base or dictionary form, known as the lemma.
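- This process relies on a dictionary and morphological analysis, and it needs the word's part of speech to pick the right lemma, so the result is a valid word.
- In the example below, `pos='v'` tells the lemmatizer to treat every word as a verb, which is why "better" and "fairly" come back unchanged.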

In [2]:
from nltk.stem import WordNetLemmatizer
import nltk

# Download WordNet data
nltk.download('wordnet')
nltk.download('omw-1.4')

# Create a WordNetLemmatizer object
lemmatizer = WordNetLemmatizer()

# Example words
words = ["running", "ran", "runs", "better", "fairly"]

# Lemmatize each word
lemmas = [lemmatizer.lemmatize(word, pos='v') for word in words]  # 'v' indicates verb

print(lemmas)  # Output: ['run', 'run', 'run', 'better', 'fairly']


[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\USER\AppData\Roaming\nltk_data...
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\USER\AppData\Roaming\nltk_data...


['run', 'run', 'run', 'better', 'fairly']


## Differences between stemming and lemmatization

#### Accuracy:

- Stemming can produce non-existent words because it strips or rewrites word endings by rule. For example, the Porter stemmer turns “easily” into “easili” and “fairly” into “fairli”.

- Lemmatization results in actual words by considering the morphological analysis of the word.

#### Context:

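- Stemming does not consider context or part of speech; it applies the same suffix-stripping rules to every word.

- Lemmatization uses the word's part of speech, which the caller or a POS tagger must supply, to choose the right lemma. NLTK's `WordNetLemmatizer` assumes a noun when no `pos` is given.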
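
- The role of part of speech can be sketched with a toy lookup table. The `LEMMA_TABLE` entries and the `toy_lemmatize` helper are hypothetical stand-ins for a real dictionary such as WordNet. The default of `"n"` (noun) is an assumption chosen to mirror how NLTK's `WordNetLemmatizer` behaves when `pos` is omitted.

```python
# Hypothetical mini-dictionary: (word, part of speech) -> lemma.
# A real lemmatizer looks these up in a full lexical database instead.
LEMMA_TABLE = {
    ("running", "v"): "run",
    ("ran", "v"): "run",
    ("runs", "v"): "run",
    ("better", "a"): "good",  # adjective: comparative of "good"
    ("saw", "v"): "see",      # verb: past tense of "see"
    ("saw", "n"): "saw",      # noun: the cutting tool
}


def toy_lemmatize(word: str, pos: str = "n") -> str:
    """Return the lemma for (word, pos); fall back to the word itself."""
    return LEMMA_TABLE.get((word.lower(), pos), word)


# The same surface form maps to different lemmas depending on the tag
print(toy_lemmatize("saw", pos="v"))     # see
print(toy_lemmatize("saw", pos="n"))     # saw
print(toy_lemmatize("better", pos="a"))  # good
print(toy_lemmatize("better", pos="v"))  # better (no entry, returned unchanged)
```

- A stemmer has no equivalent of the `pos` argument, so it cannot tell the two readings of "saw" apart.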

#### Complexity:

- Stemming is a simpler and faster process.

- Lemmatization is more complex and slower because it requires a dictionary lookup and the word's part of speech.