# Stemming

Stemming reduces words to a base form (the stem), which need not be a dictionary word. This example compares three stemming algorithms: the __Porter__, __Lancaster__ and __Snowball__ stemmers.

The __Porter__ stemmer is the least strict of the three and __Lancaster__ is the strictest. Because the __Lancaster__ stemmer trims words so aggressively, its stems can be hard to recognize, but it is also very fast. A good rule of thumb is to use the __Snowball__ stemmer, which offers a good trade-off between speed and strictness.

In [10]:
from nltk.stem.porter import PorterStemmer
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem.snowball import SnowballStemmer

In [11]:
input_words = ['writing', 'calves', 'be', 'branded', 'horse', 'randomize', 'possibly', 'provision', 'hospital', 'kept', 'scratchy', 'code']

In [12]:
# Create the stemmer objects
porter = PorterStemmer()
lancaster = LancasterStemmer()
snowball = SnowballStemmer('english')

In [13]:
# Create a list of stemmer names for display
stemmer_names = ['PORTER', 'LANCASTER', 'SNOWBALL']
formatted_text = '{:>16}' * (len(stemmer_names) + 1)
print('\n', formatted_text.format('INPUT WORD', *stemmer_names), '\n', '='*68)

# Stem each word and display the output
for word in input_words:
    output = [word, 
              porter.stem(word), 
              lancaster.stem(word), 
              snowball.stem(word)]
    print(formatted_text.format(*output))


       INPUT WORD          PORTER       LANCASTER        SNOWBALL 
 ====================================================================
         writing           write            writ           write
          calves            calv            calv            calv
              be              be              be              be
         branded           brand           brand           brand
           horse            hors            hors            hors
       randomize          random          random          random
        possibly         possibl            poss         possibl
       provision          provis          provid          provis
        hospital          hospit          hospit          hospit
            kept            kept            kept            kept
        scratchy        scratchi        scratchy        scratchi
            code            code             cod            code
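
The strictness ordering described at the top can be checked roughly against the table above. The following is a minimal sketch, not part of the original notebook: the `table` dictionary is copied by hand from the printed output (so it does not call NLTK), and the helper name `chars_removed` is hypothetical. It totals how much shorter each stemmer's output is than the input and lists the words on which the stemmers disagree.

```python
# Stems copied by hand from the output table above:
# word -> (porter, lancaster, snowball)
table = {
    'writing':   ('write',    'writ',     'write'),
    'calves':    ('calv',     'calv',     'calv'),
    'be':        ('be',       'be',       'be'),
    'branded':   ('brand',    'brand',    'brand'),
    'horse':     ('hors',     'hors',     'hors'),
    'randomize': ('random',   'random',   'random'),
    'possibly':  ('possibl',  'poss',     'possibl'),
    'provision': ('provis',   'provid',   'provis'),
    'hospital':  ('hospit',   'hospit',   'hospit'),
    'kept':      ('kept',     'kept',     'kept'),
    'scratchy':  ('scratchi', 'scratchy', 'scratchi'),
    'code':      ('code',     'cod',      'code'),
}
names = ('PORTER', 'LANCASTER', 'SNOWBALL')


def chars_removed(table, column):
    """Total length reduction (input length minus stem length) for one stemmer column."""
    return sum(len(word) - len(stems[column]) for word, stems in table.items())


# A larger total means the stemmer shortened this sample more aggressively
removed = {name: chars_removed(table, i) for i, name in enumerate(names)}

# Words for which the three stemmers did not all return the same stem
disagreements = [word for word, stems in table.items() if len(set(stems)) > 1]

print(removed)
print(disagreements)
```

On this sample the Lancaster column is shortened the most, while the Porter and Snowball columns are identical, so twelve words are too few to separate those two stemmers.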
