### Stemming
- In Natural Language Processing (NLP), stemming reduces a word to its base form, known as the stem, by stripping suffixes (and sometimes prefixes). Related words then map to the same stem and can be treated as a single term during text processing. The stem does not have to be a dictionary word.

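#### For example:
- "running" and "runs" → run
- "studies", "studying", and "studied" → studi

Irregular or derived forms such as "ran" and "runner" are typically left unchanged by rule-based stemmers.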

### Why Stemming is Important
- Normalization: Reduces words to a common form so that variations of the same word are treated identically.
- Efficiency: Reduces vocabulary size, making text processing faster and less resource-intensive.
- Search & Retrieval: Improves recall in search engines by matching word variations (e.g., a query for "run" also matches "running"), sometimes at the cost of precision.
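The normalization and retrieval benefits above can be sketched with a small inverted index. This is a minimal illustration, not a production search engine: `toy_stem` is a hypothetical, deliberately crude suffix stripper (not the Porter algorithm), and the `docs` corpus is made up for the example.

```python
from collections import defaultdict

def toy_stem(word):
    """Hypothetical crude stemmer: strip one common suffix (not Porter)."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words like "is" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word

# Made-up corpus for illustration.
docs = {
    1: "she jumps over the fence",
    2: "he jumped twice",
    3: "jumping is fun",
    4: "they walk home",
}

# Inverted index keyed by stem instead of by surface form.
index = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.lower().split():
        index[toy_stem(token)].add(doc_id)

def search(query):
    """Return the ids of documents containing any variant of the query word."""
    return sorted(index.get(toy_stem(query.lower()), set()))

tokens = [t for text in docs.values() for t in text.lower().split()]
print(len(set(tokens)), "->", len(index))  # 14 -> 12 (vocabulary shrinks)
print(search("jump"))      # [1, 2, 3]
print(search("jumping"))   # [1, 2, 3]
```

Because the query and the documents pass through the same stemmer, any variant of "jump" retrieves all three documents, and the indexed vocabulary is smaller than the raw one.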

### Algorithms for Stemming
- Porter Stemmer:
  - One of the most widely used stemming algorithms for English.
  - Applies a series of rules to remove common suffixes like -ing, -ed, -s.
  - Example: "running" → "run".

- Lancaster Stemmer:
  - A more aggressive stemmer than the Porter Stemmer.
  - Often results in very short stems.
  - Example: "running" → "run", "playing" → "play", "maximum" → "maxim".

- Snowball Stemmer:
  - A refinement of the Porter Stemmer (its English version is also called Porter2), with stemmers available for multiple languages.
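The idea of "applying a series of rules to remove suffixes" can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the actual Porter algorithm: the `RULES` table and the `min_stem` threshold are hypothetical, and the real algorithm runs several rule phases with extra conditions (for example, it also undoubles consonants, which this sketch does not).

```python
# Hypothetical ordered (suffix, replacement) rules; longer suffixes come first.
RULES = [
    ("sses", "ss"),
    ("ies", "i"),
    ("ing", ""),
    ("ed", ""),
    ("ss", "ss"),  # protect a final "ss" from the plain "s" rule below
    ("s", ""),
]

def strip_suffix(word, min_stem=3):
    """Apply the first rule that matches and leaves a long enough stem."""
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)] + replacement
            if len(stem) >= min_stem:
                return stem
    return word

for word in ["studies", "walking", "caresses", "glass", "sing", "running"]:
    print(f"{word} -> {strip_suffix(word)}")
```

Note that "running" comes out as "runn" here, whereas the Porter Stemmer returns "run"; the gap shows why real stemmers need more than a flat suffix table.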

### Stemming vs Lemmatization
- Stemming:
  - Cuts off prefixes or suffixes using heuristic rules.
  - May produce non-dictionary words (e.g., "studies" → "studi").

- Lemmatization:
  - Maps words to their dictionary form (lemma) using vocabulary and grammar.
  - Produces valid words (e.g., "studies" → "study").

In summary: Stemming reduces words to a stem by truncating suffixes with heuristic rules. It is fast and computationally inexpensive, but it can produce stems that are not real words ("studi") and it misses irregular forms ("went" stays "went").
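The contrast can be made concrete with a toy comparison. Both functions below are hypothetical stand-ins: `LEMMAS` is a tiny hand-written table playing the role of the vocabulary a real lemmatizer would consult, and `crude_stem` is a simplified suffix stripper rather than any published algorithm.

```python
# Hypothetical hand-written lemma table; a real lemmatizer uses a full
# vocabulary plus part-of-speech information.
LEMMAS = {
    "studies": "study",
    "studied": "study",
    "studying": "study",
    "went": "go",
    "ran": "run",
}

def toy_lemmatize(word):
    """Look the word up; fall back to the word itself if it is unknown."""
    return LEMMAS.get(word, word)

def crude_stem(word):
    """Heuristic truncation: no vocabulary, so irregular forms are missed."""
    for suffix, replacement in (("ies", "i"), ("ied", "i"), ("ing", ""), ("s", "")):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word

for word in ["studies", "studied", "went", "ran"]:
    print(f"{word}: stem={crude_stem(word)}, lemma={toy_lemmatize(word)}")
```

The stemmer maps "studies" and "studied" to the non-word "studi" and leaves "went" and "ran" untouched, while the lookup returns "study", "go", and "run" at the cost of needing a dictionary.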

In [15]:
words = [
  "run", "running", "runs", "runner",
  "write", "writing", "writes", "wrote", "written",
  "jump", "jumping", "jumps", "jumped",
  "swim", "swimming", "swims", "swam", "swum",
  "go", "going", "goes", "went", "gone",
  "read", "reading", "reads",
  "love", "loving", "loves", "loved",
  "walk", "walking", "walks", "walked",
  "try", "trying", "tries", "tried",
  "study", "studying", "studies", "studied",
  "fly", "flying", "flies", "flew", "flown",
  "buy", "buying", "buys", "bought",
  "speak", "speaking", "speaks", "spoke", "spoken",
  "see", "seeing", "sees", "saw", "seen",
]


### Porter Stemmer

In [16]:
from nltk.stem import PorterStemmer
stemming = PorterStemmer()

In [22]:
# Stem each word once, then split by whether the stem differs from the word
stems = {word: stemming.stem(word) for word in words}
changed = {word: stem for word, stem in stems.items() if word != stem}
unchanged = {word: stem for word, stem in stems.items() if word == stem}

# Display the results
print("List of changed words:")
for word, stem in changed.items():
    print(f"{word} -> {stem}")

print("\nList of unchanged words:")
for word, stem in unchanged.items():
    print(f"{word} -> {stem}")

List of changed words:
running -> run
runs -> run
writing -> write
writes -> write
jumping -> jump
jumps -> jump
jumped -> jump
swimming -> swim
swims -> swim
going -> go
goes -> goe
reading -> read
reads -> read
loving -> love
loves -> love
loved -> love
walking -> walk
walks -> walk
walked -> walk
try -> tri
trying -> tri
tries -> tri
tried -> tri
study -> studi
studying -> studi
studies -> studi
studied -> studi
fly -> fli
flying -> fli
flies -> fli
buying -> buy
buys -> buy
speaking -> speak
speaks -> speak
seeing -> see
sees -> see

List of unchanged words:
run -> run
runner -> runner
write -> write
wrote -> wrote
written -> written
jump -> jump
swim -> swim
swam -> swam
swum -> swum
go -> go
went -> went
gone -> gone
read -> read
love -> love
walk -> walk
flew -> flew
flown -> flown
buy -> buy
bought -> bought
speak -> speak
spoke -> spoke
spoken -> spoken
see -> see
saw -> saw
seen -> seen


### Snowball Stemmer

In [27]:
from nltk.stem import SnowballStemmer
snow_stemmer = SnowballStemmer('english')

In [28]:
# Stem each word once, then split by whether the stem differs from the word
stems = {word: snow_stemmer.stem(word) for word in words}
changed = {word: stem for word, stem in stems.items() if word != stem}
unchanged = {word: stem for word, stem in stems.items() if word == stem}

# Display the results
print("List of changed words:")
for word, stem in changed.items():
    print(f"{word} -> {stem}")

print("\nList of unchanged words:")
for word, stem in unchanged.items():
    print(f"{word} -> {stem}")

List of changed words:
running -> run
runs -> run
writing -> write
writes -> write
jumping -> jump
jumps -> jump
jumped -> jump
swimming -> swim
swims -> swim
going -> go
goes -> goe
reading -> read
reads -> read
loving -> love
loves -> love
loved -> love
walking -> walk
walks -> walk
walked -> walk
try -> tri
trying -> tri
tries -> tri
tried -> tri
study -> studi
studying -> studi
studies -> studi
studied -> studi
fly -> fli
flying -> fli
flies -> fli
buying -> buy
buys -> buy
speaking -> speak
speaks -> speak
seeing -> see
sees -> see

List of unchanged words:
run -> run
runner -> runner
write -> write
wrote -> wrote
written -> written
jump -> jump
swim -> swim
swam -> swam
swum -> swum
go -> go
went -> went
gone -> gone
read -> read
love -> love
walk -> walk
flew -> flew
flown -> flown
buy -> buy
bought -> bought
speak -> speak
spoke -> spoke
spoken -> spoken
see -> see
saw -> saw
seen -> seen


In [29]:
stemming.stem('fairly'), stemming.stem('sportingly')

('fairli', 'sportingli')

In [30]:
snow_stemmer.stem('fairly'), snow_stemmer.stem('sportingly')

('fair', 'sport')