# Stop Words

Stop words are words that occur so frequently in a language that they carry very little useful information on their own. English examples include “a,” “the,” “is,” and “are.” In Text Mining and Natural Language Processing (NLP), stop words are commonly removed from text before further processing.
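
As a minimal sketch of the idea, the snippet below filters a sentence against a tiny, hand-picked stop-word set. The set and the helper name `remove_stop_words` are illustrative assumptions; real lists, such as NLTK's English list, are much longer.

```python
# A tiny, hand-picked stop-word set for illustration only.
STOP_WORDS = {"a", "the", "is", "are"}

def remove_stop_words(tokens):
    """Return the tokens whose lowercase form is not a stop word."""
    return [tok for tok in tokens if tok.lower() not in STOP_WORDS]

tokens = "The cat is on a mat".split()
filtered = remove_stop_words(tokens)
print(filtered)  # ['cat', 'on', 'mat']
```

Comparing the lowercase form means a capitalised "The" is removed as well.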

# Aim and Objective

- Select a paragraph
- Remove the stop words from it
- Apply stemming and lemmatization techniques:
  - Stemming: PorterStemmer, SnowballStemmer
  - Lemmatization: WordNetLemmatizer
- Count the words that remain after stemming and lemmatization

In [56]:
# here is the selected paragraph
paragraph = """We know the scheme of English subjects for matric classes.
In this way, the subject of English has two parts, English A and English B.
In this way, the second part is more important and has long questions with more than five marks.
However, the Essays and stores cover about 10 marks for both parts.
But in some academies, the preference for Essays is also included for the 9th class.
Therefore, we are going to provide an Essay on My Country Pakistan for class 9 in this post.
So, you can read this post to check the Essay on My Country.
Basically, the section of the essay will be included in the next class.
But it is also useful to remember for the next class.
So, we are going to mention to all students that you can read and learn for school tests and exams.
The Essay on My Country consists of more than 200 words in this class.
However, the section of the paragraph is also included in the matrix in which the topic My country can come in the final exam.
In this way, you can learn this essay to get good marks in your final exam.
Moreover, the students can also get other essays from this place for their final exam preparation.
In this way, we are covering all important essays that normally come in the annual exam.
So, you can explore this website to get all the important Essay of Matric for your English B preparation."""

In [57]:
import nltk
# for converting the paragraph into sentences and words
# (needs the NLTK data package 'punkt'; newer NLTK releases may ask for 'punkt_tab')
from nltk.tokenize import sent_tokenize, word_tokenize
# for removing stop words (needs the NLTK data package 'stopwords')
from nltk.corpus import stopwords
# for applying stemming and lemmatization
from nltk.stem import PorterStemmer, SnowballStemmer, WordNetLemmatizer

Convert the paragraph (corpus) into sentences (documents):

In [68]:
sentences = sent_tokenize(paragraph)

## 1. PorterStemmer

In [59]:
porter_stemmer = PorterStemmer()

In [60]:
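# build the stop-word set once instead of once per word
# note: the check is case-sensitive, so capitalised words such as "We" or "In" are kept
stop_words = set(stopwords.words("english"))
for i in range(len(sentences)):
  words = word_tokenize(sentences[i])
  words = [porter_stemmer.stem(word) for word in words if word not in stop_words]
  sentences[i] = ' '.join(words)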

In [61]:
sentences

['we know scheme english subject matric class .',
 'in way , subject english two part , english a english b .',
 'in way , second part import long question five mark .',
 'howev , essay store cover 10 mark part .',
 'but academi , prefer essay also includ 9th class .',
 'therefor , go provid essay my countri pakistan class 9 post .',
 'so , read post check essay my countri .',
 'basic , section essay includ next class .',
 'but also use rememb next class .',
 'so , go mention student read learn school test exam .',
 'the essay my countri consist 200 word class .',
 'howev , section paragraph also includ matrix topic my countri come final exam .',
 'in way , learn essay get good mark final exam .',
 'moreov , student also get essay place final exam prepar .',
 'in way , cover import essay normal come annual exam .',
 'so , explor websit get import essay matric english b prepar .']

In [62]:
# count the remaining tokens (punctuation marks are counted too)
count = 0
for i in range(len(sentences)):
  count += len(sentences[i].split(" "))
print(count)

166
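
The count above splits on spaces, so punctuation tokens such as "." and "," are included. A minimal sketch of counting only word tokens, using the first two stemmed sentences from the output above; `count_tokens` is a hypothetical helper, not part of NLTK:

```python
def count_tokens(sentences, words_only=False):
    """Count space-separated tokens; optionally skip pure punctuation."""
    total = 0
    for sentence in sentences:
        for token in sentence.split():
            # a token counts as a word if it has at least one letter or digit
            if not words_only or any(ch.isalnum() for ch in token):
                total += 1
    return total

sample = [
    "we know scheme english subject matric class .",
    "in way , subject english two part , english a english b .",
]
all_tokens = count_tokens(sample)
word_tokens = count_tokens(sample, words_only=True)
print(all_tokens, word_tokens)  # 21 17
```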


## 2. SnowballStemmer

In [64]:
snowball_stemmer = SnowballStemmer("english")

In [65]:
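# start again from the raw sentences; the previous loop overwrote `sentences`
sentences = sent_tokenize(paragraph)
# build the stop-word set once (the check is case-sensitive)
stop_words = set(stopwords.words("english"))
for i in range(len(sentences)):
  words = word_tokenize(sentences[i])
  words = [snowball_stemmer.stem(word) for word in words if word not in stop_words]
  sentences[i] = ' '.join(words)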

In [66]:
sentences

['we know scheme english subject matric class .',
 'in way , subject english two part , english a english b .',
 'in way , second part import long question five mark .',
 'howev , essay store cover 10 mark part .',
 'but academi , prefer essay also includ 9th class .',
 'therefor , go provid essay my countri pakistan class 9 post .',
 'so , read post check essay my countri .',
 'basic , section essay includ next class .',
 'but also use rememb next class .',
 'so , go mention student read learn school test exam .',
 'the essay my countri consist 200 word class .',
 'howev , section paragraph also includ matrix topic my countri come final exam .',
 'in way , learn essay get good mark final exam .',
 'moreov , student also get essay place final exam prepar .',
 'in way , cover import essay normal come annual exam .',
 'so , explor websit get import essay matric english b prepar .']

In [67]:
# count the remaining tokens (punctuation marks are counted too)
count = 0
for i in range(len(sentences)):
  count += len(sentences[i].split(" "))
print(count)

166
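
On this paragraph the two stemmers return identical results. They do not always agree, though. A minimal sketch of one word where they differ, using only the stemmer classes imported earlier (neither needs extra NLTK data):

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")

word = "generously"
porter_stem = porter.stem(word)      # Porter strips the "ous" suffix as well
snowball_stem = snowball.stem(word)  # Snowball stops at "generous"
print(porter_stem, snowball_stem)    # gener generous
```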


## 3. WordNetLemmatizer

In [69]:
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [70]:
lemmatizer = WordNetLemmatizer()

In [71]:
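# start again from the raw sentences; the stemming loops overwrote `sentences`
sentences = sent_tokenize(paragraph)
# build the stop-word set once (the check is case-sensitive)
stop_words = set(stopwords.words("english"))
for i in range(len(sentences)):
  words = word_tokenize(sentences[i])
  words = [lemmatizer.lemmatize(word) for word in words if word not in stop_words]
  sentences[i] = ' '.join(words)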

In [72]:
sentences

['We know scheme English subject matric class .',
 'In way , subject English two part , English A English B .',
 'In way , second part important long question five mark .',
 'However , Essays store cover 10 mark part .',
 'But academy , preference Essays also included 9th class .',
 'Therefore , going provide Essay My Country Pakistan class 9 post .',
 'So , read post check Essay My Country .',
 'Basically , section essay included next class .',
 'But also useful remember next class .',
 'So , going mention student read learn school test exam .',
 'The Essay My Country consists 200 word class .',
 'However , section paragraph also included matrix topic My country come final exam .',
 'In way , learn essay get good mark final exam .',
 'Moreover , student also get essay place final exam preparation .',
 'In way , covering important essay normally come annual exam .',
 'So , explore website get important Essay Matric English B preparation .']

In [73]:
# count the remaining tokens (punctuation marks are counted too)
count = 0
for i in range(len(sentences)):
  count += len(sentences[i].split(" "))
print(count)

166


End of Code!