Stemming and lemmatization are two text preprocessing techniques that reduce words to a base or root form. Stemming strips affixes with heuristic rules, so the result (the stem) is not always a real word; lemmatization maps a word to its dictionary form (the lemma). Both reduce the number of unique words in a text, which makes it easier to analyze.

They are widely used in search engines and tagging systems. Search engines use stemming when indexing words: instead of storing every form of a word, the index may store only its stem. This reduces the size of the index and lets a query match documents that contain a different form of the same word, which improves retrieval.
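To make the indexing idea concrete, here is a minimal sketch of an index keyed on stems. Everything in it is an assumption made for illustration: `toy_stem` is a deliberately crude suffix stripper standing in for a real stemmer, and `build_index`, `search`, and the sample `docs` are hypothetical names and data, not part of any library.

```python
from collections import defaultdict

def toy_stem(word):
    """Crude suffix stripper for illustration only; not a real stemming algorithm."""
    word = word.lower()
    for suffix in ('ing', 'ed', 's'):
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(docs, stem=toy_stem):
    """Map each stem to the ids of the documents that contain any form of it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[stem(token)].add(doc_id)
    return index

def search(index, query, stem=toy_stem):
    """Look up a single-word query by its stem."""
    return index.get(stem(query), set())

# Hypothetical sample documents.
docs = {
    1: 'indexing web pages',
    2: 'the crawler indexed a page',
    3: 'search index',
}
index = build_index(docs)
raw_vocabulary = {token.lower() for text in docs.values() for token in text.split()}

print(len(raw_vocabulary), len(index))  # 10 7
print(sorted(search(index, 'index')))   # [1, 2, 3]
```

The index holds 7 keys instead of 10 distinct words, and a query for `index` also finds the documents that only contain `indexing` or `indexed`. A real stemmer can be swapped in through the `stem` parameter.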

### Stemming

from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

def stem_words(text):
    # Split on whitespace; punctuation stays attached to the tokens.
    word_tokens = text.split()
    # The Porter stemmer lowercases each token and strips suffixes by rule.
    stems = [stemmer.stem(word) for word in word_tokens]
    return stems

text = 'text preprocessing techniques for natural language processing by Aysel Aydin'
stem_words(text)

['text',
 'preprocess',
 'techniqu',
 'for',
 'natur',
 'languag',
 'process',
 'by',
 'aysel',
 'aydin']
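
The sentence above contains no two forms of the same word, so it does not show stemming merging anything. As a small additional sketch, assuming only that NLTK is installed, the following runs the same `PorterStemmer` on several forms of one word; the word list is chosen for illustration.

```python
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

# Several inflected and derived forms of the same word.
forms = ['connect', 'connected', 'connecting', 'connection', 'connections']

# Collect the distinct stems; all five forms collapse to a single one.
stems = {stemmer.stem(form) for form in forms}
print(stems)  # {'connect'}
```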

### Lemmatization

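import nltk
from nltk.stem import WordNetLemmatizer

# The lemmatizer needs the WordNet corpus; download it once if it is missing.
nltk.download('wordnet', quiet=True)

lemmatizer = WordNetLemmatizer()

def lemmatize_word(text):
    # Split on whitespace; punctuation stays attached to the tokens.
    word_tokens = text.split()
    # pos='v' treats every token as a verb, so nouns such as 'techniques' are returned unchanged.
    lemmas = [lemmatizer.lemmatize(word, pos='v') for word in word_tokens]
    return lemmas

text = 'text preprocessing techniques for natural language processing by Aysel Aydin'
lemmatize_word(text)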

['text',
 'preprocessing',
 'techniques',
 'for',
 'natural',
 'language',
 'process',
 'by',
 'Aysel',
 'Aydin']