### Text Preprocessing 

In [1]:
corpus = """
        Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal. That founding principle, rooted in the very fabric of our independence, was not simply a declaration for that time alone, but a promise meant to endure for generations. Today, as we stand upon this sacred ground, we are called to remember those who gave their last full measure of devotion, so that the nation might continue to live true to its original vision.
        Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. This war is not merely a clash of armies but a contest of ideals, determining whether liberty and equality shall remain cornerstones of human society or fall beneath the weight of division and oppression. We stand here on the battlefield of Gettysburg, a place where thousands of brave souls gave their lives. They fought not only for victory in battle, but for the survival of a nation committed to freedom.
        It is altogether fitting and proper that we should do this. Yet, in a larger sense, we cannot dedicate, we cannot consecrate, we cannot hallow this ground. The brave men, living and dead, who struggled here, have consecrated it far above our poor power to add or detract. Their sacrifice speaks louder than any words we can utter. It reminds us that the cost of liberty is high, but it is always worth paying.
        The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us, the living, rather, to be dedicated to the unfinished work which they who fought here have thus far so nobly advanced. The duty is ours now — to take increased devotion to that cause for which they gave the last full measure of devotion.
        That from these honored dead we take increased devotion to the great task remaining before us — that this nation, under God, shall have a new birth of freedom — and that government of the people, by the people, for the people, shall not perish from the earth

"""

In [2]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\himan\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\stopwords.zip.


True

In [15]:
from nltk.stem import SnowballStemmer
from nltk.corpus import stopwords

In [16]:
stopwords.words('english')

['a',
 'about',
 'above',
 'after',
 'again',
 'against',
 'ain',
 'all',
 'am',
 'an',
 'and',
 'any',
 'are',
 'aren',
 "aren't",
 'as',
 'at',
 'be',
 'because',
 'been',
 'before',
 'being',
 'below',
 'between',
 'both',
 'but',
 'by',
 'can',
 'couldn',
 "couldn't",
 'd',
 'did',
 'didn',
 "didn't",
 'do',
 'does',
 'doesn',
 "doesn't",
 'doing',
 'don',
 "don't",
 'down',
 'during',
 'each',
 'few',
 'for',
 'from',
 'further',
 'had',
 'hadn',
 "hadn't",
 'has',
 'hasn',
 "hasn't",
 'have',
 'haven',
 "haven't",
 'having',
 'he',
 "he'd",
 "he'll",
 'her',
 'here',
 'hers',
 'herself',
 "he's",
 'him',
 'himself',
 'his',
 'how',
 'i',
 "i'd",
 'if',
 "i'll",
 "i'm",
 'in',
 'into',
 'is',
 'isn',
 "isn't",
 'it',
 "it'd",
 "it'll",
 "it's",
 'its',
 'itself',
 "i've",
 'just',
 'll',
 'm',
 'ma',
 'me',
 'mightn',
 "mightn't",
 'more',
 'most',
 'mustn',
 "mustn't",
 'my',
 'myself',
 'needn',
 "needn't",
 'no',
 'nor',
 'not',
 'now',
 'o',
 'of',
 'off',
 'on',
 'once',
 'on

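The default list is a reasonable starting point, but it is generally better to build your own stopword list for the task and remove those words from the paragraph instead. For example, the NLTK list above contains negations such as 'no', 'nor', and 'not', which can carry meaning you may want to keep.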
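
One way to build a custom list is to start from a base list, take out the words you want to keep, and add task-specific ones. The sketch below is only illustrative: it uses a tiny hand-written base set in place of `stopwords.words('english')` so that it runs without any NLTK data, and `domain_words` and `remove_stopwords` are hypothetical names, not part of NLTK.

```python
# Tiny stand-in for stopwords.words('english'); assumption for illustration only.
base_stopwords = {"a", "the", "and", "of", "not", "no", "nor", "we", "can"}

# Negations we choose to keep in the text.
negations = {"not", "no", "nor"}

# Hypothetical task-specific words we choose to drop.
domain_words = {"nation"}

# Custom list = base list minus negations, plus domain words.
custom_stopwords = (base_stopwords - negations) | domain_words


def remove_stopwords(tokens, stop_set):
    """Return the tokens whose lowercase form is not in stop_set."""
    return [tok for tok in tokens if tok.lower() not in stop_set]


tokens = "we can not dedicate the nation".split()
filtered = remove_stopwords(tokens, custom_stopwords)
print(filtered)  # ['not', 'dedicate']
```

In the notebook itself, the same set arithmetic could be applied to `set(stopwords.words('english'))`.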

In [18]:
stemmer = SnowballStemmer(language='english')

In [19]:
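# sent_tokenize needs the Punkt tokenizer data: nltk.download('punkt') ('punkt_tab' on newer NLTK releases)
sentences = nltk.sent_tokenize(corpus)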

Next, find the stopwords in each sentence of the corpus, filter them out, and apply stemming to the remaining words.

In [20]:
stop_set = set(stopwords.words('english'))  # build the lookup set once instead of once per word
for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])  # split the sentence into a list of tokens
    filtered_words = [word for word in words if word not in stop_set]  # tokens left after removing stopwords (case-sensitive match)
    stemmed_words = [stemmer.stem(word) for word in filtered_words]  # stem each remaining token
    sentences[i] = " ".join(stemmed_words)  # rebuild the sentence from the stems
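
The loop above compares each token with the all-lowercase stopword list exactly as written, so a capitalized stopword at the start of a sentence (for example "This" or "We") is not removed, and punctuation tokens such as "," and "." are kept. A minimal sketch of a stricter filter is shown below. It is an assumption about what cleanup may be wanted, not part of the original pipeline; `filter_tokens` is a hypothetical helper, and the small `stop_set` stands in for the NLTK list so the example runs on its own.

```python
import string


def filter_tokens(tokens, stop_set):
    """Drop stopwords (case-insensitively) and punctuation-only tokens."""
    kept = []
    for tok in tokens:
        lowered = tok.lower()
        if lowered in stop_set:
            continue  # stopword, regardless of capitalization
        if all(ch in string.punctuation for ch in tok):
            continue  # token made up only of punctuation characters
        kept.append(lowered)
    return kept


# Tiny illustrative stopword set (assumption); the notebook would use the NLTK list.
stop_set = {"this", "is", "not", "a", "but", "of"}
tokens = ["This", "war", "is", "not", "merely", "a", "clash", "of", "armies", "."]
print(filter_tokens(tokens, stop_set))  # ['war', 'merely', 'clash', 'armies']
```

The kept tokens could then be passed to `stemmer.stem` in the same way as in the loop above.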
    

In [21]:
sentences

['four score seven year ago father brought forth contin new nation , conceiv liberti , dedic proposit men creat equal .',
 'that found principl , root fabric independ , simpli declar time alon , promis meant endur generat .',
 'today , stand upon sacr ground , call rememb gave last full measur devot , nation might continu live true origin vision .',
 'now engag great civil war , test whether nation , nation conceiv dedic , long endur .',
 'this war mere clash armi contest ideal , determin whether liberti equal shall remain cornerston human societi fall beneath weight divis oppress .',
 'we stand battlefield gettysburg , place thousand brave soul gave live .',
 'they fought victori battl , surviv nation commit freedom .',
 'it altogeth fit proper .',
 'yet , larger sens , dedic , consecr , hallow ground .',
 'the brave men , live dead , struggl , consecr far poor power add detract .',
 'their sacrific speak louder word utter .',
 'it remind us cost liberti high , alway worth pay .',
 't