### Stopwords
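- Stopword removal filters out very common words such as "the", "of" and "his"
- These words occur with high frequency but carry little meaning on their own
- Dropping them reduces noise before later NLP steps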
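
As a minimal sketch of the idea in the list above, the snippet below filters a token list against a small hand-written stopword set. `TOY_STOPWORDS` and `remove_stopwords` are illustrative names, not part of NLTK; the real list comes from `stopwords.words('english')` and is much longer.

```python
# Toy stopword set for illustration only (assumption: a tiny subset of what NLTK ships).
TOY_STOPWORDS = {"the", "of", "a", "as", "his", "through"}

def remove_stopwords(tokens, stop_words=TOY_STOPWORDS):
    """Return the tokens that are not in stop_words, preserving their order."""
    return [token for token in tokens if token not in stop_words]

tokens = ["the", "infamous", "leader", "of", "the", "cartel"]
print(remove_stopwords(tokens))
```

Using a `set` keeps each membership test cheap, which matters once the filter runs over every token in a corpus.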

In [None]:
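## Flowchart

# corpus ----> sentence tokenization ----> sentences
# for each sentence (for loop using range, so the list can be updated by index):
#   sentence ----> word tokenization ----> words
#   words ----> stopword removal
#   remaining words ----> stemming/lemmatization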

In [1]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\User\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\stopwords.zip.


True

In [None]:
# sent_tokenize/word_tokenize also need the Punkt data: nltk.download('punkt') ('punkt_tab' on newer NLTK releases)
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

In [96]:
corpus="""
Pablo Escobar, the infamous leader of the Medellín Cartel, dominated the global cocaine trade through a brutal philosophy known as "plata o plomo" (silver or lead). While his immense wealth once ranked him as one of the wealthiest individuals in the world, his legacy remains deeply polarized between those who remember his violent acts of narco-terrorism and the poor of Medellín who viewed him as a Robin Hood figure for his local charity work. His reign finally ended during a rooftop shootout with Colombian authorities and U.S.-backed forces."""

In [115]:
sentences=sent_tokenize(corpus)
for sentence in sentences:
    print(sentence)



Pablo Escobar, the infamous leader of the Medellín Cartel, dominated the global cocaine trade through a brutal philosophy known as "plata o plomo" (silver or lead).
While his immense wealth once ranked him as one of the wealthiest individuals in the world, his legacy remains deeply polarized between those who remember his violent acts of narco-terrorism and the poor of Medellín who viewed him as a Robin Hood figure for his local charity work.
His reign finally ended during a rooftop shootout with Colombian authorities and U.S.-backed forces.


In [82]:
## Trying with PorterStemmer
from nltk.stem import PorterStemmer
stemmer=PorterStemmer()

In [102]:
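sentences=sent_tokenize(corpus)  # start from the raw sentences; the loop below overwrites them
stop_words=set(stopwords.words('english'))  # build the set once instead of once per word
for i in range(len(sentences)):
    words=word_tokenize(sentences[i])
    # the stopword list is lowercase, so capitalised 'While' and 'His' are not filtered here
    words=[stemmer.stem(word) for word in words if word not in stop_words]
    sentences[i]= ' '.join(words)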



In [103]:
for sentence in sentences:
    print(sentence)

pablo escobar , infam leader medellín cartel , domin global cocain trade brutal philosophi known `` plata plomo '' ( silver lead ) .
while immens wealth rank one wealthiest individu world , legaci remain deepli polar rememb violent act narco-terror poor medellín view robin hood figur local chariti work .
hi reign final end rooftop shootout colombian author u.s.-back forc .
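
The output above still contains `while` and `hi` (the stem of `His`). The likely cause is that NLTK's English stopword list is lowercase while the membership test is case-sensitive. A minimal sketch of the difference follows; `STOP_WORDS` is a small hand-written stand-in for `stopwords.words('english')`, used so the snippet runs without any downloads.

```python
# Small stand-in for NLTK's list; entries are lowercase (assumption for illustration).
STOP_WORDS = {"while", "his", "the", "of", "as"}

tokens = ["While", "his", "immense", "wealth", "His", "reign"]

# Case-sensitive test: capitalised tokens slip through the filter.
case_sensitive = [t for t in tokens if t not in STOP_WORDS]

# Lowercase each token before the test so "While" and "His" are caught as well.
case_insensitive = [t for t in tokens if t.lower() not in STOP_WORDS]

print(case_sensitive)
print(case_insensitive)
```

Whether to lowercase before filtering is a design choice: it removes sentence-initial stopwords, but it also treats a capitalised proper noun the same as a common word with the same spelling.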


In [70]:
## Trying with SnowballStemmer
from nltk.stem import SnowballStemmer
snowstemmer=SnowballStemmer('english')

In [105]:
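sentences=sent_tokenize(corpus)  # start from the raw sentences; the loop below overwrites them
stop_words=set(stopwords.words('english'))  # build the set once instead of once per word
for i in range(len(sentences)):
    words=word_tokenize(sentences[i])
    # the stopword test is case-sensitive, so 'While' and 'His' reach the stemmer
    words=[snowstemmer.stem(word) for word in words if word not in stop_words]
    sentences[i]= ' '.join(words)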



In [106]:
for sentence in sentences:
    print(sentence)

pablo escobar , infam leader medellín cartel , domin global cocain trade brutal philosophi known `` plata plomo '' ( silver lead ) .
while immens wealth rank one wealthiest individu world , legaci remain deepli polar rememb violent act narco-terror poor medellín view robin hood figur local chariti work .
his reign final end rooftop shootout colombian author u.s.-back forc .
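
The Porter and Snowball outputs above differ only in the last sentence (`hi` versus `his`). The sketch below runs both stemmers on a few words from the corpus so the comparison is easier to see. It assumes NLTK is installed and should need no extra data downloads; `sample_words`, `porter` and `snowball` are illustrative names.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer('english')

# A handful of words taken from the corpus above.
sample_words = ["His", "infamous", "deeply", "authorities"]

# Print the two stems side by side for each word.
for word in sample_words:
    print(f"{word:12} porter={porter.stem(word):10} snowball={snowball.stem(word)}")
```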


In [98]:
## Trying with RegexpStemmer
from nltk.stem import RegexpStemmer
regstemmer=RegexpStemmer(r's$|es$|ies$|ing$|ed$|able$')  # dropped the stray trailing '|' (an empty alternative)

In [108]:
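sentences=sent_tokenize(corpus)  # start from the raw sentences; the loop below overwrites them
stop_words=set(stopwords.words('english'))  # build the set once instead of once per word
for i in range(len(sentences)):
    words=word_tokenize(sentences[i])
    # RegexpStemmer only strips matching suffixes; it does not lowercase the words
    words=[regstemmer.stem(word) for word in words if word not in stop_words]
    sentences[i]= ' '.join(words)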



In [109]:
for sentence in sentences:
    print(sentence)

Pablo Escobar , infamou leader Medellín Cartel , dominat global cocaine trade brutal philosophy known `` plata plomo '' ( silver lead ) .
While immense wealth rank one wealthiest individual world , legacy remain deeply polariz remember violent act narco-terrorism poor Medellín view Robin Hood figure local charity work .
Hi reign finally end rooftop shootout Colombian authorit U.S.-back forc .
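
`RegexpStemmer` deletes whatever the pattern matches, which explains stems such as `infamou` and `Hi` above. A rough equivalent with the standard `re` module is sketched below; `strip_suffix` is a hypothetical helper written for illustration, not NLTK's implementation.

```python
import re

# Same suffix alternatives as the pattern passed to RegexpStemmer above.
SUFFIX_PATTERN = re.compile(r's$|es$|ies$|ing$|ed$|able$')

def strip_suffix(word):
    """Delete the first suffix match; no lowercasing and no dictionary check."""
    return SUFFIX_PATTERN.sub('', word, count=1)

for word in ["infamous", "dominated", "authorities", "His"]:
    print(word, '->', strip_suffix(word))
```

Because the rule is purely textual, it cannot tell a plural ending from a word that simply ends in `s`, so short words like `His` get truncated too.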


In [110]:
## Trying with WordNetLemmatizer
# needs the WordNet data: nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer
lemmatizer=WordNetLemmatizer()

In [116]:
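sentences=sent_tokenize(corpus)  # start from the raw sentences; the loop below overwrites them
stop_words=set(stopwords.words('english'))  # build the set once instead of once per word
for i in range(len(sentences)):
    words=word_tokenize(sentences[i])
    # pos='v' treats every token as a verb, so nouns such as 'individuals' keep their plural form
    words=[lemmatizer.lemmatize(word.lower(), pos='v') for word in words if word not in stop_words]
    sentences[i]= ' '.join(words)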



In [117]:
for sentence in sentences:
    print(sentence)

pablo escobar , infamous leader medellín cartel , dominate global cocaine trade brutal philosophy know `` plata plomo '' ( silver lead ) .
while immense wealth rank one wealthiest individuals world , legacy remain deeply polarize remember violent act narco-terrorism poor medellín view robin hood figure local charity work .
his reign finally end rooftop shootout colombian authorities u.s.-backed force .
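
Each loop above overwrites `sentences`, so the raw sentences have to be rebuilt before trying the next stemmer. One way around that is a helper that returns a new list, sketched here under the assumption that any callable can stand in for the tokenizer and the stemmer or lemmatizer. `preprocess` and its parameters are hypothetical names. Unlike the cells above, it lowercases each token before the stopword test.

```python
def preprocess(sentences, tokenize, normalise, stop_words):
    """Return new sentences with stopwords removed and each remaining token normalised."""
    cleaned = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        # Case-insensitive stopword test, then stem/lemmatize whatever is kept.
        kept = [normalise(tok) for tok in tokens if tok.lower() not in stop_words]
        cleaned.append(' '.join(kept))
    return cleaned

# Stand-ins so the sketch runs without NLTK data: whitespace split and lowercasing.
raw = ["His reign finally ended", "the poor of Medellín"]
result = preprocess(raw, str.split, str.lower, {"his", "the", "of"})
print(result)
print(raw)  # the input list is left untouched
```

In the notebook, one would pass `word_tokenize`, a function such as `stemmer.stem`, and `set(stopwords.words('english'))` in place of the stand-ins.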
