## Lemmatization
Lemmatization reduces a word to its base or dictionary form (its lemma). Unlike stemming, which strips suffixes by rule and can produce non-words, lemmatization looks the word up in a vocabulary and uses its part of speech to return a valid dictionary form.

In [1]:
word_list = [
    "running", "ran", "runs",
    "better", "good", "best",
    "flying", "flew", "flies",
    "studies", "studying", "studied",
    "wolves", "feet", "children"
]

### WordNetLemmatizer
WordNetLemmatizer looks up lemmas in the WordNet database. It treats every word as a noun unless a part of speech (POS) is passed, so supplying the correct POS is needed for accurate results.

In [2]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\HashanEranga\AppData\Roaming\nltk_data...


True

In [3]:
from nltk.stem import WordNetLemmatizer

In [4]:
lemmatizer = WordNetLemmatizer()

#### Default Lemmatization (assumes noun)

In [5]:
print("Default (noun):")
for word in word_list:
    print(word, "->", lemmatizer.lemmatize(word))

Default (noun):
running -> running
ran -> ran
runs -> run
better -> better
good -> good
best -> best
flying -> flying
flew -> flew
flies -> fly
studies -> study
studying -> studying
studied -> studied
wolves -> wolf
feet -> foot
children -> child


#### Lemmatization with POS tags
- 'n' = noun
- 'v' = verb
- 'a' = adjective
- 'r' = adverb
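
The single-letter codes above are WordNet's own, while NLTK's `pos_tag` returns Penn Treebank tags such as `VBD` or `JJR` for English. A minimal sketch of bridging the two is shown below. The helper names `penn_to_wordnet` and `lemmatize_tokens` are hypothetical, not part of NLTK, and `lemmatize_tokens` assumes the tagger model has already been downloaded (the resource name depends on the NLTK version).

```python
def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag (e.g. 'VBD') to a WordNet POS letter."""
    if tag.startswith("J"):
        return "a"  # adjectives: JJ, JJR, JJS
    if tag.startswith("V"):
        return "v"  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith("RB"):
        return "r"  # adverbs: RB, RBR, RBS
    return "n"      # nouns and everything else use the lemmatizer's default


def lemmatize_tokens(tokens):
    """Tag tokens, then lemmatize each one with its mapped POS.

    NLTK is imported here so the mapping above can be used on its own.
    Assumes the WordNet corpus and a POS tagger model are installed.
    """
    import nltk
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    return [
        lemmatizer.lemmatize(token, pos=penn_to_wordnet(tag))
        for token, tag in nltk.pos_tag(tokens)
    ]
```

Falling back to `'n'` for unmapped tags mirrors the lemmatizer's own default, so words the tagger labels as determiners or prepositions pass through the noun lookup and are usually returned unchanged.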

In [6]:
print("As verbs:")
for word in word_list:
    print(word, "->", lemmatizer.lemmatize(word, pos='v'))

As verbs:
running -> run
ran -> run
runs -> run
better -> better
good -> good
best -> best
flying -> fly
flew -> fly
flies -> fly
studies -> study
studying -> study
studied -> study
wolves -> wolves
feet -> feet
children -> children


In [7]:
print("As adjectives:")
for word in word_list:
    print(word, "->", lemmatizer.lemmatize(word, pos='a'))

As adjectives:
running -> running
ran -> ran
runs -> runs
better -> good
good -> good
best -> best
flying -> flying
flew -> flew
flies -> flies
studies -> studies
studying -> studying
studied -> studied
wolves -> wolves
feet -> feet
children -> children
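
The three runs above show that no single POS setting handles every word: the noun pass fixes `feet` but not `ran`, the verb pass fixes `ran` but not `feet`, and only the adjective pass maps `better` to `good`. When POS tags are not available, one rough heuristic is to try each POS in turn and keep the first lemma that differs from the input. The helper below is a hypothetical sketch (`lemmatize_any_pos` and `pos_order` are not part of NLTK); it accepts any object with a `lemmatize(word, pos=...)` method, such as the `lemmatizer` created earlier.

```python
def lemmatize_any_pos(word, lemmatizer, pos_order=("n", "v", "a", "r")):
    """Return the first lemma that differs from `word`, trying each POS in order.

    This is a context-free heuristic, not a replacement for POS tagging.
    """
    for pos in pos_order:
        lemma = lemmatizer.lemmatize(word, pos=pos)
        if lemma != word:
            return lemma  # first POS that changes the word wins
    return word  # no POS produced a different form
```

With the notebook's `lemmatizer`, `lemmatize_any_pos("ran", lemmatizer)` would fall through the noun lookup and return the verb lemma. Because the heuristic ignores context, a word that is valid in several parts of speech resolves to whichever POS comes first in `pos_order`.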


### Comparison: Stemming vs Lemmatization

In [8]:
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

print(f"{'Word':<12} {'Stemmed':<12} {'Lemmatized (v)'}")
print("-" * 40)
for word in word_list:
    print(f"{word:<12} {stemmer.stem(word):<12} {lemmatizer.lemmatize(word, pos='v')}")

Word         Stemmed      Lemmatized (v)
----------------------------------------
running      run          run
ran          ran          run
runs         run          run
better       better       better
good         good         good
best         best         best
flying       fli          fly
flew         flew         fly
flies        fli          fly
studies      studi        study
studying     studi        study
studied      studi        study
wolves       wolv         wolves
feet         feet         feet
children     children     children
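
In the table, the stemmer leaves `ran` and `flew` apart from their other forms and produces non-words such as `fli`, `studi`, and `wolv`, while the verb lemmatizer merges each verb family but leaves the nouns untouched. One way to make that difference easier to see is to group the words by their normalized form. The sketch below uses a hypothetical helper, `group_by_normal_form`, that takes any one-argument normalizing function.

```python
from collections import defaultdict


def group_by_normal_form(words, normalize):
    """Group words that share the same output of `normalize`.

    `normalize` is any callable taking a word and returning a string,
    for example a stemmer's `stem` method.
    """
    groups = defaultdict(list)
    for word in words:
        groups[normalize(word)].append(word)
    return dict(groups)  # plain dict, keys in first-seen order
```

In this notebook it could be called as `group_by_normal_form(word_list, stemmer.stem)` and `group_by_normal_form(word_list, lambda w: lemmatizer.lemmatize(w, pos='v'))`; fewer, larger groups mean more inflected forms were merged.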
