### Stemming vs. Lemmatization in NLP
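Stemming and lemmatization are both text normalization techniques used in Natural Language Processing (NLP) to reduce inflected words to a common base form. They differ in approach and accuracy: stemming applies fast, rule-based suffix stripping, while lemmatization uses vocabulary and grammar to return a real dictionary word.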

1. Stemming
Definition:

Stemming reduces a word to a base form by stripping suffixes with fixed heuristic rules.
The result is not necessarily a real word.

| Word | Stem |
|---|---|
| Running | run |
| Studies | studi |
| Happily | happi |
| Organization | organiz |
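
To make the idea of rule-based suffix stripping concrete, here is a minimal sketch. It is not the Porter algorithm or any library's implementation; the rule list `SUFFIX_RULES` and the function `toy_stem` are hypothetical names chosen only to reproduce the four examples above.

```python
# Toy suffix-stripping stemmer. This is an illustration only, NOT the Porter
# algorithm: the rules below are hypothetical and cover just a few suffixes.
SUFFIX_RULES = [
    ("ization", "iz"),  # organization -> organiz
    ("ies", "i"),       # studies -> studi
    ("ily", "i"),       # happily -> happi
    ("ing", ""),        # running -> runn (then de-doubled to run)
    ("s", ""),          # eats -> eat
]


def toy_stem(word: str) -> str:
    """Apply the first matching suffix rule; leave the word alone otherwise."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        # Require at least two characters to remain after stripping.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            stem = word[: -len(suffix)] + replacement
            # Collapse a doubled final consonant left behind by stripping,
            # e.g. "runn" -> "run" (l and s are left doubled, vowels ignored).
            if len(stem) > 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
                stem = stem[:-1]
            return stem
    return word


for w in ["Running", "Studies", "Happily", "Organization"]:
    print(w, "|", toy_stem(w))
```

Because the rules only look at the characters at the end of the word, nothing checks that the result is in a dictionary, which is why outputs such as `studi` and `happi` appear.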

2. Lemmatization
Definition:

Lemmatization reduces a word to its dictionary base form (lemma) using vocabulary and morphological analysis, usually informed by the word's part of speech.
The output is a valid, meaningful word.

| Word | Lemma |
|---|---|
| Running | run |
| Studies | study |
| Happily | happy |
| Better | good |
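
As a contrast with suffix stripping, the following sketch treats lemmatization as a dictionary lookup keyed on the word and its part of speech. The table `LEMMA_TABLE` and the function `toy_lemmatize` are hypothetical and cover only a handful of entries; real lemmatizers use far larger vocabularies and morphological rules.

```python
from typing import Dict, Tuple

# Hypothetical lookup table: (lowercased word, part-of-speech tag) -> lemma.
# Only a few entries are included, purely for illustration.
LEMMA_TABLE: Dict[Tuple[str, str], str] = {
    ("running", "VERB"): "run",
    ("studies", "NOUN"): "study",
    ("studies", "VERB"): "study",
    ("ate", "VERB"): "eat",
    ("better", "ADJ"): "good",
    ("better", "ADV"): "well",
}


def toy_lemmatize(word: str, pos: str) -> str:
    """Return the dictionary form if known, else the lowercased word unchanged."""
    key = (word.lower(), pos)
    return LEMMA_TABLE.get(key, word.lower())


print(toy_lemmatize("Studies", "NOUN"))  # study
print(toy_lemmatize("ate", "VERB"))      # eat
print(toy_lemmatize("better", "ADJ"))    # good
print(toy_lemmatize("better", "ADV"))    # well
```

The two entries for "better" show why the part of speech matters: the same surface form can map to different lemmas depending on how the word is used.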

In [4]:
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()

In [6]:
words = ["eating", "eats", "eat", "ate", "adjustable", "rafting", "ability", "meeting"]

for word in words:
    print(word, "|", stemmer.stem(word))

eating | eat
eats | eat
eat | eat
ate | ate
adjustable | adjust
rafting | raft
ability | abil
meeting | meet


### Lemmatization in spaCy

In [9]:
import spacy

In [13]:
nlp = spacy.load("en_core_web_sm")
# Same words as in the stemming example, plus "better"
doc = nlp("eating eats eat ate adjustable rafting ability meeting better")

for token in doc:
    print(token, "|", token.lemma_)

eating | eat
eats | eat
eat | eat
ate | eat
adjustable | adjustable
rafting | raft
ability | ability
meeting | meet
better | well

Note that spaCy returns "well" for "better" here, whereas the table above lists "good". spaCy's lemma depends on the part-of-speech tag assigned to the token: "better" tagged as an adverb lemmatizes to "well", while tagged as an adjective it lemmatizes to "good". In a bare word list with no sentence context, the tagger most likely chose the adverb reading.


### Customizing the lemmatizer

In [16]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [18]:
ar = nlp.get_pipe('attribute_ruler')

# Map the tokens "Bro" and "Brah" (exact, case-sensitive text match) to the lemma "Brother"
ar.add([[{"TEXT": "Bro"}], [{"TEXT": "Brah"}]], {"LEMMA": "Brother"})

doc = nlp("Bro, you wanna go? Brah, don't say no! I am exhausted")
for token in doc:
    print(token.text, "|", token.lemma_)

Bro | Brother
, | ,
you | you
wanna | wanna
go | go
? | ?
Brah | Brother
, | ,
do | do
n't | not
say | say
no | no
! | !
I | I
am | be
exhausted | exhaust


In [20]:
doc[6]

Brah

In [22]:
doc[6].lemma_

'Brother'
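
Conceptually, the rule added above acts like an override table that is consulted before the regular lemmatizer. The sketch below illustrates that idea in plain Python; it does not use spaCy, and `LEMMA_OVERRIDES`, `lemmatize_with_overrides`, and the `fallback` callable are hypothetical names, not spaCy internals.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical override table mirroring the "Bro"/"Brah" rule above.
LEMMA_OVERRIDES: Dict[str, str] = {"Bro": "Brother", "Brah": "Brother"}


def lemmatize_with_overrides(
    tokens: List[str],
    fallback: Callable[[str], str],
    overrides: Dict[str, str] = LEMMA_OVERRIDES,
) -> List[Tuple[str, str]]:
    """Pair each token with its override lemma if present, else the fallback lemma."""
    return [(tok, overrides.get(tok, fallback(tok))) for tok in tokens]


# str.lower stands in for a real lemmatizer here (an assumption for the demo).
for text, lemma in lemmatize_with_overrides(["Bro", "bro", "Brah"], str.lower):
    print(text, "|", lemma)
```

Because the lookup is on the exact token text, the lowercase "bro" falls through to the fallback. The same holds for the `TEXT` patterns above, which match case-sensitively; matching on spaCy's `LOWER` token attribute instead would cover both spellings.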