### Before moving to the application part, let's try to understand what 'Stemming' is and why we need it 🤔🤔

So, one important thing in NLP is that all the input data is text, and we cannot pass raw text straight to our model because the model only works with numbers. So we pre-process the text and convert it into a numerical representation, which we call vectors. 'Stemming' is one of these pre-processing steps: it reduces an inflected word to its word stem.

Secondly, we need 'Stemming' because in many NLP problems, such as Sentiment Analysis, Spam Classification, or understanding Movie or Restaurant Reviews and providing ratings based on them, the result depends mostly on the word stems. So, just by finding a particular word stem, we can often tell whether the word is positive or negative.
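To make the idea of a word stem concrete, here is a minimal sketch using NLTK's `PorterStemmer`. The word family below is an illustrative choice and does not come from the paragraph used later in this tutorial.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Illustrative word family (an assumption for this example, not tutorial data)
word_family = ["connect", "connected", "connecting", "connection", "connections"]

# Map every inflected form to its stem
stems = [stemmer.stem(word) for word in word_family]

for word, stem in zip(word_family, stems):
    print(f"{word} -> {stem}")
```

All five forms collapse to the same stem, `connect`, so a model can treat them as a single feature instead of five unrelated words.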

So, you've understood the main idea behind Stemming and why we use it so often. Now, let's dig into the application part of this tutorial. Yay...😄

In [1]:
# Importing the libraries
import nltk

# Helps to implement Stemming
from nltk.stem import PorterStemmer

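# Used to remove common words such as 'of', 'the', 'a', etc., which are not relevant to our model
from nltk.corpus import stopwords

# If the tokenizer or stop-word data is missing, download it once, e.g.:
# nltk.download('punkt'); nltk.download('stopwords')
# (newer NLTK releases may ask for 'punkt_tab' instead of 'punkt')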

In [3]:
paragraph = """Real Madrid simply show no sign of letting up. The LaLiga table-toppers saw off Alavés at 
               the Di Stéfano to make it eight wins on the bounce and retain the four-point buffer at the 
               summit with three games to go. The Madrid goals came from Karim Benzema, who converted 
               from the spot, whilst Marco Asensio was also on the mark for the hosts, who recorded a fifth 
               successive shutout. Ferland Mendy started at left wing-back, with Lucas Vázquez occupying 
               the right wing-back berth and inside the first minute, the pair were involved in the madridistas' 
               first forward foray, which culminated in Luka Modric sending his effort wide of the target. 
               The Alavés response wasn't long in coming and Joselu's headed effort struck the crossbar, 
               whilst Raphaël Varane cleared a Lucas Pérez's follow-up off the line. It looked as if we were 
               in store for a high-tempo affair and just after the 10-minute mark, Mendy once again showed what a 
               threat he is down the left. Ximo Navarro upended the Frenchman in the area and Benzema stepped 
               up to make it 1-0. With 12 minutes gone, Toni Kroos’ did his best to find the top corner, before a 
               fierce Mendy cross nearly forced Camarasa to turn into his own net on 17’. The Blanquiazules refused
               to roll over though, with Oliver Burke proving a constant nuisance for the defence and testing
               Thibaut Courtois, despite the final chances before the break falling to Rodrygo and Benzema. 
               After the restart, referee Gil Manzano retired injured and by the time the 50th minute came around, 
               Madrid had added to their advantage. Benzema and Asensio raced through on goal, up against Roberto, 
               and the Balearic Island-born forward stroked home with ease, though his goal was originally ruled 
               out for offside before being correctly awarded by VAR."""

In [13]:
# Converting the whole paragraph into sentences
sentences = nltk.sent_tokenize(paragraph)
sentences

['Real Madrid simply show no sign of letting up.',
 'The LaLiga table-toppers saw off Alavés at \n               the Di Stéfano to make it eight wins on the bounce and retain the four-point buffer at the \n               summit with three games to go.',
 'The Madrid goals came from Karim Benzema, who converted \n               from the spot, whilst Marco Asensio was also on the mark for the hosts, who recorded a fifth \n               successive shutout.',
 "Ferland Mendy started at left wing-back, with Lucas Vázquez occupying \n               the right wing-back berth and inside the first minute, the pair were involved in the madridistas' \n               first forward foray, which culminated in Luka Modric sending his effort wide of the target.",
 "The Alavés response wasn't long in coming and Joselu's headed effort struck the crossbar, \n               whilst Raphaël Varane cleared a Lucas Pérez's follow-up off the line.",
 'It looked as if we were \n               in store for a hi

In [7]:
# List of words that we don't want. We can change the language parameter to any language NLTK provides stop words for
stopwords.words('english')

['i',
 'me',
 'my',
 'myself',
 'we',
 'our',
 'ours',
 'ourselves',
 'you',
 "you're",
 "you've",
 "you'll",
 "you'd",
 'your',
 'yours',
 'yourself',
 'yourselves',
 'he',
 'him',
 'his',
 'himself',
 'she',
 "she's",
 'her',
 'hers',
 'herself',
 'it',
 "it's",
 'its',
 'itself',
 'they',
 'them',
 'their',
 'theirs',
 'themselves',
 'what',
 'which',
 'who',
 'whom',
 'this',
 'that',
 "that'll",
 'these',
 'those',
 'am',
 'is',
 'are',
 'was',
 'were',
 'be',
 'been',
 'being',
 'have',
 'has',
 'had',
 'having',
 'do',
 'does',
 'did',
 'doing',
 'a',
 'an',
 'the',
 'and',
 'but',
 'if',
 'or',
 'because',
 'as',
 'until',
 'while',
 'of',
 'at',
 'by',
 'for',
 'with',
 'about',
 'against',
 'between',
 'into',
 'through',
 'during',
 'before',
 'after',
 'above',
 'below',
 'to',
 'from',
 'up',
 'down',
 'in',
 'out',
 'on',
 'off',
 'over',
 'under',
 'again',
 'further',
 'then',
 'once',
 'here',
 'there',
 'when',
 'where',
 'why',
 'how',
 'all',
 'any',
 'both',
 'each

In [8]:
# The equivalent stop-word list for Spanish!
stopwords.words('spanish')

['de',
 'la',
 'que',
 'el',
 'en',
 'y',
 'a',
 'los',
 'del',
 'se',
 'las',
 'por',
 'un',
 'para',
 'con',
 'no',
 'una',
 'su',
 'al',
 'lo',
 'como',
 'más',
 'pero',
 'sus',
 'le',
 'ya',
 'o',
 'este',
 'sí',
 'porque',
 'esta',
 'entre',
 'cuando',
 'muy',
 'sin',
 'sobre',
 'también',
 'me',
 'hasta',
 'hay',
 'donde',
 'quien',
 'desde',
 'todo',
 'nos',
 'durante',
 'todos',
 'uno',
 'les',
 'ni',
 'contra',
 'otros',
 'ese',
 'eso',
 'ante',
 'ellos',
 'e',
 'esto',
 'mí',
 'antes',
 'algunos',
 'qué',
 'unos',
 'yo',
 'otro',
 'otras',
 'otra',
 'él',
 'tanto',
 'esa',
 'estos',
 'mucho',
 'quienes',
 'nada',
 'muchos',
 'cual',
 'poco',
 'ella',
 'estar',
 'estas',
 'algunas',
 'algo',
 'nosotros',
 'mi',
 'mis',
 'tú',
 'te',
 'ti',
 'tu',
 'tus',
 'ellas',
 'nosotras',
 'vosotros',
 'vosotras',
 'os',
 'mío',
 'mía',
 'míos',
 'mías',
 'tuyo',
 'tuya',
 'tuyos',
 'tuyas',
 'suyo',
 'suya',
 'suyos',
 'suyas',
 'nuestro',
 'nuestra',
 'nuestros',
 'nuestras',
 'vuestro'

So now, first we need to remove the stopwords from every sentence in our data. After removing the stopwords, we apply stemming to each remaining word in each sentence of the paragraph. So, let's see how we can do that.

In [9]:
stemmer = PorterStemmer()

# Build the stop-word set once instead of rebuilding it for every word
stop_words = set(stopwords.words('english'))

# Stemming
# If a word is a stop word, we drop it; otherwise, we stem it
# (the check is case-sensitive, so capitalised words such as 'The' are kept).
# Finally, we join the stemmed words back into a sentence
for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])
    words = [stemmer.stem(word) for word in words if word not in stop_words]
    sentences[i] = ' '.join(words)
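The loop above compares each token with the lowercase stop-word list exactly as written, so a capitalised stop word at the start of a sentence slips through. Here is a rough sketch of a case-insensitive variant. The function name `stem_sentence`, the tiny hand-picked `STOP_WORDS` set, and the plain whitespace split are all assumptions made to keep the example self-contained; the notebook itself uses `stopwords.words('english')` and `nltk.word_tokenize`.

```python
from nltk.stem import PorterStemmer

# Hypothetical, hand-picked stop-word set used only for this illustration;
# the notebook itself uses stopwords.words('english').
STOP_WORDS = {"the", "of", "at", "to", "it", "on", "and", "with", "from"}


def stem_sentence(sentence, stop_words=STOP_WORDS):
    """Drop stop words (ignoring case) and stem the remaining tokens."""
    stemmer = PorterStemmer()
    tokens = sentence.split()  # simple whitespace split, not nltk.word_tokenize
    kept = [tok for tok in tokens if tok.lower() not in stop_words]
    return " ".join(stemmer.stem(tok) for tok in kept)


result = stem_sentence("The goals came from Benzema")
print(result)
```

Lowercasing only for the membership check keeps the original tokens intact until the stemmer sees them, so 'The' is removed just like 'the'.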

In [11]:
# Stemmed words of the last sentence (words holds the result of the final loop iteration)
words

['benzema',
 'asensio',
 'race',
 'goal',
 ',',
 'roberto',
 ',',
 'balear',
 'island-born',
 'forward',
 'stroke',
 'home',
 'eas',
 ',',
 'though',
 'goal',
 'origin',
 'rule',
 'offsid',
 'correctli',
 'award',
 'var',
 '.']

In [12]:
# The last stemmed sentence (i still holds the final loop index)
sentences[i]

'benzema asensio race goal , roberto , balear island-born forward stroke home eas , though goal origin rule offsid correctli award var .'

##### The main problem with 'Stemming' is that it produces an intermediate representation of the word, i.e. some of the stems in the corpus (such as 'eas' and 'offsid' above) have no meaning in human language. For this reason, we use Lemmatization in many cases: instead of just chopping off suffixes, it reduces each word to its dictionary form (the lemma), so the result is a meaningful word.
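The non-word stems in the output above are easy to reproduce in isolation. This small sketch feeds four words from the last sentence of the paragraph to `PorterStemmer`; the variable names are just illustrative.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Words from the last sentence of the paragraph
examples = ["ease", "offside", "correctly", "originally"]
stems = {word: stemmer.stem(word) for word in examples}

for word, stem in stems.items():
    print(f"{word} -> {stem}")
# ease -> eas
# offside -> offsid
# correctly -> correctli
# originally -> origin
```

Only 'origin' is a real English word here; the other three stems are the kind of intermediate representation that lemmatization avoids.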

### Now, it's your turn to try this out by yourself. Till then, PEACE...✌️ 