## Stemming

<u>**Stemming**</u> reduces a word to its root or stem form by stripping affixes (mostly suffixes) according to a fixed set of rules.
It is a common normalization step in natural language processing (NLP) and natural language understanding (NLU), because it maps inflected variants such as "eating" and "eats" to the same token.


**Note:**
**Stemming** is different from **lemmatization**. **Lemmatization** also reduces a word to its base form, but it takes <u>the context of the word</u> into account and <u>produces a valid word</u>, whereas **stemming** may produce a <u>non-word as the root form</u>.
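
The contrast can be sketched with toy code. Everything below is hypothetical and for illustration only: `toy_stem`, `toy_lemmatize`, the suffix list, and the tiny lookup table are assumptions, not NLTK's API. A real lemmatizer relies on a full dictionary and part-of-speech information.

```python
# Toy illustration only: these names and tables are hypothetical, not part of NLTK.
SUFFIXES = ("ing", "ies", "s")

# A lemmatizer looks words up, using context (here, a part-of-speech tag).
TOY_LEMMAS = {
    ("eaten", "v"): "eat",
    ("histories", "n"): "history",
    ("meeting", "v"): "meet",     # "we are meeting today"
    ("meeting", "n"): "meeting",  # "the meeting ran late"
}

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def toy_lemmatize(word, pos):
    """Return the dictionary form for (word, pos), or the word itself if unknown."""
    return TOY_LEMMAS.get((word, pos), word)

for word, pos in [("histories", "n"), ("eaten", "v"), ("meeting", "n"), ("meeting", "v")]:
    print(word, pos, toy_stem(word), toy_lemmatize(word, pos))
```

The toy stemmer returns `histor`, which is not a word, and gives the same result for both uses of "meeting". The lookup returns `history` and distinguishes the noun from the verb.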

In [1]:
words = ["eating", "eats", "eaten", "writing", "writes", "programming", "programs", "history", "finally", "finalized"]

### PorterStemmer

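`PorterStemmer` is NLTK's implementation of the Porter stemming algorithm, which strips common English suffixes by applying a fixed sequence of rewrite rules.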

In [2]:
from nltk.stem import PorterStemmer

In [3]:
porter_stemmer = PorterStemmer()

In [4]:
for word in words:
    print(word + "-->" + porter_stemmer.stem(word))

eating-->eat
eats-->eat
eaten-->eaten
writing-->write
writes-->write
programming-->program
programs-->program
history-->histori
finally-->final
finalized-->final


In [5]:
porter_stemmer.stem("congratulations")

'congratul'

In [6]:
porter_stemmer.stem("sitting")

'sit'
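
Outputs such as `histori` follow from the rule-based design of the Porter algorithm, which applies suffix rules in a fixed order. As a partial illustration, the sketch below implements only steps 1a and 1c of the original Porter algorithm as published. The function names are hypothetical, step 1b and all later steps are omitted, and NLTK's `PorterStemmer` may differ in details.

```python
def is_consonant(word, i):
    """Porter's definition: 'y' counts as a vowel when it follows a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True

def step_1a(word):
    """Plural rules: sses -> ss, ies -> i, ss -> ss, s -> (nothing)."""
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word

def step_1c(word):
    """Turn a final 'y' into 'i' when the rest of the word contains a vowel."""
    stem = word[:-1]
    if word.endswith("y") and any(not is_consonant(stem, i) for i in range(len(stem))):
        return stem + "i"
    return word

# Step 1b is skipped here, so only plural and final-y handling is shown.
for word in ["eats", "programs", "history", "sky"]:
    print(word + "-->" + step_1c(step_1a(word)))
```

Under these two rules alone, "history" already becomes `histori`, while "sky" is left unchanged because "sk" contains no vowel.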

___

### RegexpStemmer class
NLTK's `RegexpStemmer` class implements a stemmer driven by a regular expression. It takes a single regular expression and removes every substring that matches it, so anchors such as `^` and `$` are needed to restrict the match to prefixes or suffixes. Words shorter than the `min` argument are returned unchanged.
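
This behaviour can be approximated with the standard `re` module. The sketch below is an assumption about how the class behaves, not NLTK's source; `regexp_stem` is a hypothetical helper, and its `min_length` parameter mirrors the `min=4` argument used in this section on the assumption that shorter words are returned unchanged.

```python
import re

def regexp_stem(word, pattern, min_length=0):
    """Remove every match of `pattern` from `word`, unless the word is too short."""
    if len(word) < min_length:
        return word
    return re.sub(pattern, "", word)

# Same pattern as the RegexpStemmer example in this section.
PATTERN = r"ing$|s$|e$|able$"

for word in ["eating", "writing", "programming", "history", "is"]:
    print(word, "-->", regexp_stem(word, PATTERN, min_length=4))
```

Because every alternative in the pattern ends with `$`, only suffixes are removed. An unanchored pattern such as `ing` would also delete matches in the middle of a word.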

In [7]:
from nltk.stem import RegexpStemmer

In [8]:
# reg_exp_stemmer = RegexpStemmer()

TypeError: RegexpStemmer.__init__() missing 1 required positional argument: 'regexp'

In [9]:
reg_exp_stemmer = RegexpStemmer('ing$|s$|e$|able$', min=4)

In [10]:
reg_exp_stemmer.stem('eating')

'eat'

In [11]:
reg_exp_stemmer.stem('ingesting')

'ingest'

In [17]:
for word in words:
    print(word + " --> " + reg_exp_stemmer.stem(word))

eating --> eat
eats --> eat
eaten --> eaten
writing --> writ
writes --> write
programming --> programm
programs --> program
history --> history
finally --> finally
finalized --> finalized


___

### Snowball Stemmer
The Snowball stemmer for English is also known as the Porter2 stemming algorithm. It is a revised version of the Porter stemmer that fixes some of its issues. NLTK's `SnowballStemmer` supports several languages, so the language name must be passed explicitly and in lowercase.

In [12]:
from nltk.stem import SnowballStemmer

In [13]:
# snowball_stemmer = SnowballStemmer()

TypeError: SnowballStemmer.__init__() missing 1 required positional argument: 'language'

In [14]:
# snowball_stemmer = SnowballStemmer("English")

ValueError: The language 'English' is not supported.

In [15]:
snowball_stemmer = SnowballStemmer('english')

In [16]:
for word in words:
    print(word + " --> " + snowball_stemmer.stem(word))

eating --> eat
eats --> eat
eaten --> eaten
writing --> write
writes --> write
programming --> program
programs --> program
history --> histori
finally --> final
finalized --> final


In [20]:
porter_stemmer.stem("fairly"), porter_stemmer.stem("sportingly")

('fairli', 'sportingli')

In [22]:
reg_exp_stemmer.stem("fairly"), reg_exp_stemmer.stem("sportingly")

('fairly', 'sportingly')

In [21]:
snowball_stemmer.stem("fairly"), snowball_stemmer.stem("sportingly")

('fair', 'sport')

___