## Stemming Implementation

Stemming is a text preprocessing technique that reduces words to a root or base form, known as the stem. <br>
It strips affixes (in practice, mostly suffixes) so that related word forms can be treated as the same token. The resulting stem is not always a valid dictionary word.

In [11]:
words = [
    "eating", "eats", "eaten",
    "writing", "writes", "written", "writter",
    "running", "runs", "runner",
    "history", "historian", "historical",
]

### Porter Stemmer

In [7]:
from nltk.stem import PorterStemmer

In [9]:
portstem = PorterStemmer()

In [13]:
for word in words:
    print(word + " ----> " + portstem.stem(word))

eating ----> eat
eats ----> eat
eaten ----> eaten
writing ----> write
writes ----> write
written ----> written
writter ----> writter
running ----> run
runs ----> run
runner ----> runner
history ----> histori
historian ----> historian
historical ----> histor


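Stemming applies suffix rules without consulting a dictionary, so some stems are not real words (`histori`, `histor`), and related words can end up with different stems: `history`, `historian`, and `historical` all differ, while `eaten` and `written` are left untouched. This is one of the major disadvantages of stemming.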
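
To make the problem concrete, the sketch below groups a few of the words by the stems printed in the Porter output above. The dictionary is copied by hand from that output, and the names `porter_output` and `groups` are only illustrative.

```python
from collections import defaultdict

# Stems copied from the Porter output printed above (not recomputed here).
porter_output = {
    "eating": "eat",
    "eats": "eat",
    "eaten": "eaten",
    "history": "histori",
    "historian": "historian",
    "historical": "histor",
}

# Collect the words that share a stem.
groups = defaultdict(list)
for word, stem in porter_output.items():
    groups[stem].append(word)

for stem, members in groups.items():
    print(f"{stem!r}: {members}")
```

Only `eating` and `eats` end up in the same group; every other word sits alone under its own stem, even though a reader would relate `eaten` to the first two and the three `histor-` words to each other.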

### Regexp Stemmer

The Regexp Stemmer, or Regular Expression Stemmer, is a stemming algorithm that utilizes regular expressions to identify and remove suffixes from words. <br>
It allows users to define custom rules for stemming by specifying patterns to match and remove.
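
The idea can be sketched with the standard `re` module alone. This is a simplified illustration of the behaviour described above, not NLTK's source; the function name `regexp_stem` and the parameter `min_length` are made up for this example.

```python
import re


def regexp_stem(word, pattern=r"ing$|s$|able$|e$", min_length=4):
    """Remove whatever `pattern` matches, unless the word is shorter than `min_length`."""
    if len(word) < min_length:
        return word  # too short: leave the word unchanged
    return re.sub(pattern, "", word)


for word in ["eating", "writing", "writes", "running", "bus"]:
    print(word, "---->", regexp_stem(word))
```

Because the pattern is applied blindly, `writing` becomes `writ` and `running` becomes `runn`: there is no rule here to restore a dropped `e` or undo a doubled consonant. `bus` is left alone only because it is shorter than the minimum length.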

In [19]:
from nltk.stem import RegexpStemmer

In [23]:
# Strip a trailing "ing", "s", "able", or "e"; words shorter than 4 characters are left unchanged
regexpstem = RegexpStemmer('ing$|s$|able$|e$', min=4)

In [31]:
regexpstem.stem('eating')

'eat'

In [33]:
for word in words:
    print(word + " ----> " + regexpstem.stem(word))

eating ----> eat
eats ----> eat
eaten ----> eaten
writing ----> writ
writes ----> write
written ----> written
writter ----> writter
running ----> runn
runs ----> run
runner ----> runner
history ----> history
historian ----> historian
historical ----> historical


### Snowball Stemmer

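Unlike the Porter Stemmer, which targets English only, the Snowball Stemmer is multilingual: NLTK's `SnowballStemmer` takes the language as its first argument. <br>
Its English variant is also known as the Porter2 stemmer, a revision of the original Porter algorithm that handles some cases better, such as the adverbs in the comparison below.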
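
As a quick illustration of the multilingual support, the sketch below lists the languages NLTK's `SnowballStemmer` knows about and stems one German word. The variable names are illustrative, and the exact language list depends on the installed NLTK version.

```python
from nltk.stem import SnowballStemmer

# Tuple of language names accepted by the constructor.
print(SnowballStemmer.languages)

# Pick a non-English language and stem a word with it.
german_stemmer = SnowballStemmer("german")
german_stem = german_stemmer.stem("Autobahnen")
print(german_stem)
```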

In [37]:
from nltk.stem import SnowballStemmer

In [41]:
snowballstem = SnowballStemmer('english')

In [43]:
portstem.stem('fairly'), portstem.stem('sportingly')

('fairli', 'sportingli')

In [45]:
snowballstem.stem('fairly'), snowballstem.stem('sportingly')

('fair', 'sport')

## Lemmatization Implementation

Lemmatization in Natural Language Processing (NLP) reduces words to their base or dictionary form, known as the lemma. Unlike stemming, which strips suffixes by rule, lemmatization looks the word up in a vocabulary and takes its part of speech into account, so the result is a valid word. NLTK's `WordNetLemmatizer` does not inspect the surrounding sentence itself: it relies on the `pos` argument you pass to select the correct dictionary form.

### WordNet Lemmatizer

In [54]:
from nltk.stem import WordNetLemmatizer

In [56]:
wordnetlem = WordNetLemmatizer()

In [74]:
# Part-of-speech values accepted by `pos` (the default is 'n'):
#   n - noun
#   v - verb
#   a - adjective
#   r - adverb

wordnetlem.lemmatize("going", pos='v')

'go'

In [80]:
for word in words:
    print(word + " ----> " + wordnetlem.lemmatize(word, pos='v'))

eating ----> eat
eats ----> eat
eaten ----> eat
writing ----> write
writes ----> write
written ----> write
writter ----> writter
running ----> run
runs ----> run
runner ----> runner
history ----> history
historian ----> historian
historical ----> historical
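
The loop above treats every word as a verb, which is rarely appropriate for mixed text. One common approach, sketched here as an assumption rather than something this notebook does, is to derive the `pos` letter from a Penn Treebank tag of the kind a part-of-speech tagger produces. The helper name `to_wordnet_pos` is hypothetical.

```python
def to_wordnet_pos(treebank_tag):
    """Map a Penn Treebank tag (e.g. 'VBG') to a WordNet POS letter."""
    if treebank_tag.startswith("J"):
        return "a"  # adjective (JJ, JJR, JJS)
    if treebank_tag.startswith("V"):
        return "v"  # verb (VB, VBD, VBG, ...)
    if treebank_tag.startswith("R"):
        return "r"  # adverb (RB, RBR, RBS)
    return "n"      # everything else falls back to noun, the lemmatizer's default


for tag in ["VBG", "JJ", "RB", "NNS"]:
    print(tag, "---->", to_wordnet_pos(tag))
```

With a tagged token in hand, the call would look like `wordnetlem.lemmatize(word, pos=to_wordnet_pos(tag))`.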
