#What is Stemming in NLP?

In Natural Language Processing (NLP), stemming is a text preprocessing technique that reduces words to a root form (stem) by stripping affixes, mostly suffixes, according to fixed rules. The stem does not have to be a dictionary word. Stemming normalizes text so that variants of the same word are treated alike, which can help tasks such as information retrieval and text analysis.
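To make the idea concrete, here is a minimal sketch of suffix stripping in plain Python. The function `naive_stem`, its suffix list, and the `min_stem_len` parameter are hypothetical names chosen for illustration; real stemmers such as Porter's apply many ordered rules with extra conditions, so treat this as a toy and not as what NLTK does.

```python
# Toy suffix-stripping stemmer (illustrative only, not NLTK's algorithm).
SUFFIXES = ("ing", "ed", "es", "s")  # assumed suffix list

def naive_stem(word, suffixes=SUFFIXES, min_stem_len=3):
    """Strip the longest matching suffix, keeping at least `min_stem_len` characters."""
    word = word.lower()
    # Try longer suffixes first so that "es" is tested before "s".
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word

for w in ["eating", "eats", "played", "history", "bus"]:
    print(w, "--->", naive_stem(w))
```

The length guard is the only protection against cutting short words such as "bus"; everything else is blind string matching, which is why rule-based stems are sometimes not real words.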


#Porter stemmer

In [None]:
words=["eating","eats","eaten","writing","writes",'written','history','programs','programming','finally','finalized']


In [None]:
from nltk.stem import PorterStemmer


In [None]:
porter_stemming = PorterStemmer()
for word in words:
  print(word+" --->",porter_stemming.stem(word))

eating ---> eat
eats ---> eat
eaten ---> eaten
writing ---> write
writes ---> write
written ---> written
history ---> histori
programs ---> program
programming ---> program
finally ---> final
finalized ---> final


Disadvantage of stemming: the result is not always a real word

In [None]:
porter_stemming.stem("congratulations")

'congratul'

In [None]:
porter_stemming.stem("sitiing")

'siti'

#RegexpStemming

In [None]:
from nltk.stem import RegexpStemmer
# Each alternative is a suffix to strip; stray spaces and the empty trailing alternative are removed
reg_stemmer = RegexpStemmer('ing$|s$|e$|able$')

In [None]:
reg_stemmer.stem("eating")

'eat'
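A regex stemmer amounts to deleting whatever the pattern matches. The sketch below reproduces that idea with the standard `re` module so the effect of each alternative is easy to see. The helper `regex_stem` and its `min_len` parameter are hypothetical names for this illustration, not NLTK's API (NLTK's `RegexpStemmer` accepts a similar `min` argument).

```python
import re

def regex_stem(word, pattern, min_len=0):
    """Delete every match of `pattern` from `word`, unless the word is shorter than `min_len`."""
    if len(word) < min_len:
        return word
    return re.sub(pattern, "", word)

pattern = r"ing$|s$|e$|able$"
print(regex_stem("eating", pattern))            # the suffix "ing" is removed
print(regex_stem("mass", pattern))              # only the final "s" is dropped
print(regex_stem("able", pattern))              # the whole word matches "able$"
print(regex_stem("able", pattern, min_len=5))   # the length guard leaves short words alone
```

Because the pattern knows nothing about the word, a minimum length is a cheap way to avoid emptying short words entirely.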

#Snowball stemmer

In [None]:
from nltk.stem import SnowballStemmer
snow_stemmer = SnowballStemmer('english')
snow_stemmer.stem("eating")

'eat'

Comparing the Porter and Snowball stemmers

In [None]:
porter_stemming.stem('fairly'),porter_stemming.stem('sportingly')

('fairli', 'sportingli')

In [None]:
snow_stemmer.stem('fairly'),snow_stemmer.stem('sportingly')

('fair', 'sport')

Stemming won't work for every word: both stemmers reduce "goes" to "goe"

In [None]:
porter_stemming.stem('goes')

'goe'

In [None]:
snow_stemmer.stem("goes")

'goe'

#Lemmatization

Lemmatization is the process of reducing a word to its base or dictionary form (lemma) using a vocabulary and the word's part of speech, unlike stemming, which strips affixes by rule. NLTK's WordNetLemmatizer treats every word as a noun unless a `pos` argument is passed, so verbs need `pos='v'` to be reduced.
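The role of the part of speech can be illustrated with a toy lookup table. Everything here is an assumption made for the example: `LEMMA_TABLE` is hand-written and `toy_lemmatize` is a hypothetical helper, whereas a real lemmatizer consults a full lexical database such as WordNet.

```python
# Toy lemma lookup keyed by (word, part of speech); entries are hand-written for illustration.
LEMMA_TABLE = {
    ("going", "v"): "go",
    ("goes", "v"): "go",
    ("went", "v"): "go",
    ("eaten", "v"): "eat",
    ("meeting", "v"): "meet",
    ("meeting", "n"): "meeting",
    ("mice", "n"): "mouse",
}

def toy_lemmatize(word, pos="n"):
    """Return the lemma for (word, pos), or the word itself when the table has no entry."""
    return LEMMA_TABLE.get((word.lower(), pos), word)

print(toy_lemmatize("going"))           # no noun entry, so the word comes back unchanged
print(toy_lemmatize("going", pos="v"))  # looked up as a verb
print(toy_lemmatize("went", pos="v"))   # irregular form that suffix stripping cannot reach
```

The same surface form can map to different lemmas depending on its part of speech, which is why the default noun lookup often returns the input unchanged.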

In [None]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to /root/nltk_data...


True

In [None]:
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

In [None]:
lemmatizer.lemmatize("history")

'history'

In [None]:
lemmatizer.lemmatize("going")

'going'

In [None]:
for i in words:
  print(i+"-->"+lemmatizer.lemmatize(i,pos='v'))

eating-->eat
eats-->eat
eaten-->eat
writing-->write
writes-->write
written-->write
history-->history
programs-->program
programming-->program
finally-->finally
finalized-->finalize


#Stop words

In [None]:
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords


In [None]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [None]:
stopwords.words('english')

['a',
 'about',
 'above',
 'after',
 'again',
 'against',
 'ain',
 'all',
 'am',
 'an',
 'and',
 'any',
 'are',
 'aren',
 "aren't",
 'as',
 'at',
 'be',
 'because',
 'been',
 'before',
 'being',
 'below',
 'between',
 'both',
 'but',
 'by',
 'can',
 'couldn',
 "couldn't",
 'd',
 'did',
 'didn',
 "didn't",
 'do',
 'does',
 'doesn',
 "doesn't",
 'doing',
 'don',
 "don't",
 'down',
 'during',
 'each',
 'few',
 'for',
 'from',
 'further',
 'had',
 'hadn',
 "hadn't",
 'has',
 'hasn',
 "hasn't",
 'have',
 'haven',
 "haven't",
 'having',
 'he',
 "he'd",
 "he'll",
 'her',
 'here',
 'hers',
 'herself',
 "he's",
 'him',
 'himself',
 'his',
 'how',
 'i',
 "i'd",
 'if',
 "i'll",
 "i'm",
 'in',
 'into',
 'is',
 'isn',
 "isn't",
 'it',
 "it'd",
 "it'll",
 "it's",
 'its',
 'itself',
 "i've",
 'just',
 'll',
 'm',
 'ma',
 'me',
 'mightn',
 "mightn't",
 'more',
 'most',
 'mustn',
 "mustn't",
 'my',
 'myself',
 'needn',
 "needn't",
 'no',
 'nor',
 'not',
 'now',
 'o',
 'of',
 'off',
 'on',
 'once',
 'on

In [None]:
import nltk
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


True

In [None]:
paragraph="""I am trying to get a friend into gaming, I have been playing games since I was 5 years old while my friend has never touched a game unless it was a browser game and even that very little. Basically has zero game sense and does not know how to walk even, I am a bad teacher so which games are good for beginners so they get used to games as a whole?

I started my friend with Skyrim, is this a bad pick?

I think Minecraft is a good pick."""

In [None]:
sentence = nltk.sent_tokenize(paragraph)

In [None]:
type(sentence)

list

In [None]:
stemmer = PorterStemmer()

In [None]:
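## Remove stop words, then stem the remaining tokens
stop_words = set(stopwords.words('english'))  # build the set once instead of once per token
for i in range(len(sentence)):
  tokens = nltk.word_tokenize(sentence[i])
  # The check is case-sensitive, so the capitalized "I" is kept and then lowercased by the stemmer
  words = [stemmer.stem(token) for token in tokens if token not in stop_words]
  sentence[i]=' '.join(words)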

In [None]:
sentence

['i tri get friend game , i play game sinc i 5 year old friend never touch game unless browser game even littl .',
 'basic zero game sens know walk even , i bad teacher game good beginn get use game whole ?',
 'i start friend skyrim , bad pick ?',
 'i think minecraft good pick .']
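The output above still contains `i` because the stop list is lowercase while the text has a capital "I". A minimal sketch of a case-insensitive filter follows. The stop list here is a small hand-written set and `remove_stop_words` is a hypothetical helper; the regex tokenizer is a simple stand-in for `nltk.word_tokenize`, not an equivalent.

```python
import re

# Small hand-written stop list for illustration; NLTK's English list is much longer.
STOP_WORDS = {"i", "am", "to", "a", "is", "this", "my", "with"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Split text into word and punctuation tokens and drop stop words, ignoring case."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return [t for t in tokens if t.lower() not in stop_words]

print(remove_stop_words("I started my friend with Skyrim, is this a bad pick?"))
```

Lowercasing only for the membership test keeps the original casing of the surviving tokens, which matters if a later step depends on capitalization.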

In [None]:
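## Remove stop words, then lemmatize the remaining tokens
## Note: `sentence` already holds the stemmed text from the loop above, so this lemmatizes stems.
## Re-run nltk.sent_tokenize(paragraph) first to lemmatize the original text instead.
stop_words = set(stopwords.words('english'))
for i in range(len(sentence)):
  tokens = nltk.word_tokenize(sentence[i])
  words = [lemmatizer.lemmatize(token,pos='v') for token in tokens if token not in stop_words]
  sentence[i]=' '.join(words)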

In [None]:
sentence

['tri get friend game , play game sinc 5 year old friend never touch game unless browser game even littl .',
 'basic zero game sen know walk even , bad teacher game good beginn get use game whole ?',
 'start friend skyrim , bad pick ?',
 'think minecraft good pick .']
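In the cells above, the lemmatization loop reads the list that the stemming loop had already overwritten, so the result still shows stems such as `tri` and `sinc`. One way to avoid this is a function that returns a new list and leaves its input alone. This is a sketch under assumptions: `preprocess` is a hypothetical helper, the regex tokenizer stands in for `nltk.word_tokenize`, and `str.lower` and `str` stand in for a stemmer and a lemmatizer.

```python
import re

def preprocess(sentences, stop_words, normalize):
    """Return a new list of cleaned sentences; the input list is left unchanged."""
    cleaned = []
    for text in sentences:
        tokens = re.findall(r"\w+|[^\w\s]", text)
        kept = [normalize(t) for t in tokens if t.lower() not in stop_words]
        cleaned.append(" ".join(kept))
    return cleaned

raw = ["I think Minecraft is a good pick."]
stop_words = {"i", "is", "a"}                      # tiny hand-written stop list
lowered = preprocess(raw, stop_words, str.lower)   # stand-in for a stemmer
unchanged = preprocess(raw, stop_words, str)       # stand-in for a lemmatizer
print(lowered)
print(unchanged)
print(raw)  # the original sentences are still available
```

In the notebook, the `normalize` argument could be `stemmer.stem` for one call and a small wrapper around `lemmatizer.lemmatize` for the other, both applied to the same untouched sentence list.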