# NLP Class Activity 3: Stemming vs. Lemmatization

### **Objective**
This notebook demonstrates and compares two fundamental text normalization techniques in Natural Language Processing (NLP): **Stemming** and **Lemmatization**.

The goal of both is to reduce a word to its root form, but they do so in very different ways.
* **Stemming** is a crude, rule-based process that chops suffixes off words. It is fast, but the result is not always a real word. (Think of it as a quick, messy haircut.)
* **Lemmatization** is a more sophisticated, dictionary-based process that returns the actual base form of a word, known as the lemma. (Think of it as a smart dictionary lookup.)
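
To make the contrast concrete, here is a minimal pure-Python sketch. It is a toy only: `toy_stem`, `toy_lemmatize`, `SUFFIXES`, and `LEMMA_TABLE` are hypothetical names invented for illustration, and the logic is neither the Porter algorithm nor WordNet.

```python
# Toy illustration only: NOT the Porter algorithm and NOT WordNet.
SUFFIXES = ("ing", "ly", "ed", "s")  # hypothetical suffix rules, checked in order

# Hypothetical mini-dictionary mapping inflected forms to their lemmas.
LEMMA_TABLE = {"running": "run", "children": "child", "ate": "eat"}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str) -> str:
    """Look the word up; fall back to the word itself if it is unknown."""
    return LEMMA_TABLE.get(word, word)


for w in ["running", "children", "ate"]:
    print(f"{w:<10} stem={toy_stem(w):<10} lemma={toy_lemmatize(w)}")
```

The rule-based function turns `running` into the non-word `runn` and cannot do anything with irregular forms such as `children` or `ate`, while the lookup handles them but only knows the words in its table.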


### **Step 1: Setup and Initialization**
* The code begins by importing the necessary tools from the **NLTK** (Natural Language Toolkit) library:
    * `PorterStemmer`: an implementation of the Porter stemming algorithm.
    * `WordNetLemmatizer`: a tool that uses the WordNet dictionary to find word lemmas.

### **Step 2: The Main Loop and Processing**
* The code iterates through a predefined list of words named `words`.
* For each `word` in the list, it performs two actions:
    1.  **Stemming**: It calls `stemmer.stem(word)` to get the stemmed form.
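    2.  **Lemmatization**: It calls `lemmatizer.lemmatize(word)`, passing `pos='v'` for words that end in `ing` and for `ate` so that they are treated as verbs. Without a part-of-speech hint, the lemmatizer treats every word as a noun.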
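
The code cell later in this notebook chooses the part of speech with a simple rule on the word's spelling. The sketch below isolates that rule in a helper; `guess_pos` is a hypothetical name, and the rule is a crude heuristic rather than real part-of-speech tagging.

```python
def guess_pos(word: str) -> str:
    """Return 'v' for words this notebook treats as verbs, else 'n' (noun)."""
    # Assumption: the same crude spelling rule the notebook uses.
    if word.endswith("ing") or word == "ate":
        return "v"
    return "n"  # 'n' is also the default pos of WordNetLemmatizer.lemmatize


for word in ["running", "children", "ate", "good"]:
    print(f"{word:<10} -> pos='{guess_pos(word)}'")
```

With such a helper, the lemmatization call could be written as `lemmatizer.lemmatize(word, pos=guess_pos(word))`. A spelling rule like this will mislabel some words, so a part-of-speech tagger would be the more robust choice outside a classroom demo.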

### **Step 3: Displaying Results**
* As each word is processed, the original word, its stem, and its lemma are printed as one row of a left-aligned table.
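
The table layout relies on Python's f-string alignment: `:<15` left-aligns a value and pads it with spaces to 15 characters. A minimal sketch follows, where `format_row` is a hypothetical helper that uses the same column widths as the notebook's print statement.

```python
def format_row(original: str, stemmed: str, lemma: str) -> str:
    """Build one table row with left-aligned, fixed-width columns."""
    return f"{original:<15} | {stemmed:<15} | {lemma:<20}"


row = format_row("running", "run", "run")
print(row)
print(len(row))  # 15 + 3 + 15 + 3 + 20 characters
```

Because every row has the same width, the `|` separators line up no matter how long each word is, as long as no word exceeds its column width.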

In [None]:
import nltk
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer

In [None]:
nltk.download('punkt')    # Punkt tokenizer models; not used by the cells below
nltk.download('wordnet')  # WordNet data required by WordNetLemmatizer

In [None]:
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

In [None]:
words = ['running', 'painting', 'walking', 'dressing', 'likely', 'children', 'whom', 'good', 'ate', 'fishing']

In [None]:
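# Header row; the first two column widths match the data rows printed in the loop.
print(f"{'Original Word':<15} | {'Stemmed Word':<15} | {'Lemmatized Word (Lemma)':<20}")
print("-" * 60)

for word in words:
    stemmed_word = stemmer.stem(word)

    # lemmatize() assumes a noun unless told otherwise, so a crude heuristic
    # flags the likely verbs: words ending in 'ing' and the word 'ate'.
    if word.endswith('ing') or word == 'ate':
        lemmatized_word = lemmatizer.lemmatize(word, pos='v')
    else:
        lemmatized_word = lemmatizer.lemmatize(word)
    print(f"{word:<15} | {stemmed_word:<15} | {lemmatized_word:<20}")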