---

## **Notes on the Code**

### **Purpose of the Code**
This script demonstrates the application of:
1. **Stopword Removal**: Filters out commonly used words (e.g., "is," "and," "the") that do not add significant meaning to text analysis.
2. **Stemming**: Reduces words to their root forms using the **PorterStemmer** and **SnowballStemmer**.
3. **Lemmatization**: Normalizes words to their dictionary forms using the **WordNetLemmatizer**.

The script tokenizes a sample paragraph into sentences and then into words, removes stopwords, and applies either stemming or lemmatization to the remaining words.

---

### **Step-by-Step Explanation**

#### **1. Setup**
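```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

nltk.download('stopwords')  # stopword lists
nltk.download('punkt_tab')  # tokenizer data used by sent_tokenize / word_tokenize
nltk.download('wordnet')    # lexical database required by WordNetLemmatizer

stopwords.words('english')  # in a notebook, this displays the English stopword list
```
- **Imports Required Libraries**: Includes tools for stemming, lemmatization, tokenization, and stopword removal.
- **Downloads Required NLTK Data**:
  - `stopwords`: Lists of common words to filter out.
  - `punkt_tab`: Punkt tokenizer data used by `sent_tokenize` and `word_tokenize`.
  - `wordnet`: The WordNet lexical database that `WordNetLemmatizer` looks words up in.
- **`stopwords.words('english')`**: Returns the English stopword list as lowercase strings; in a notebook, the list is displayed as the cell output.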
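The setup cell calls `nltk.download` for each package on every run. A minimal sketch of an alternative, assuming you only want to fetch packages that are missing: it checks each resource with `nltk.data.find` (which raises `LookupError` when a resource is absent) and downloads only those. The helper name `ensure_nltk_data` and the resource-path/package pairs are illustrative choices, not part of the original script.

```python
import nltk

# Hypothetical list of (resource path inside nltk_data, package name to download)
REQUIRED_RESOURCES = [
    ("corpora/stopwords", "stopwords"),
    ("tokenizers/punkt_tab", "punkt_tab"),
    ("corpora/wordnet", "wordnet"),
]


def ensure_nltk_data(resources=REQUIRED_RESOURCES):
    """Download only the NLTK packages that are not already installed.

    Returns the names of the packages that had to be downloaded.
    """
    downloaded = []
    for resource_path, package in resources:
        try:
            nltk.data.find(resource_path)  # raises LookupError if missing
        except LookupError:
            nltk.download(package, quiet=True)
            downloaded.append(package)
    return downloaded
```

Calling `ensure_nltk_data()` once at the top of the script would replace the individual `nltk.download` calls; on a machine that already has the data it downloads nothing.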

---

#### **2. Tokenize Paragraph into Sentences**
```python
sentences = nltk.sent_tokenize(paragraph)
type(sentences)
```
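- **`nltk.sent_tokenize`**: Splits the paragraph into a list of sentence strings using the Punkt sentence tokenizer (the `punkt_tab` data downloaded during setup).
- **`type(sentences)`**: Confirms the result is a `list`, so individual sentences can be replaced by index in the loops that follow.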

---

#### **3. Stopword Removal and PorterStemmer Stemming**
```python
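stemmer = PorterStemmer()
stop_words = set(stopwords.words('english'))  # build the stopword set once, not per word

sentences = nltk.sent_tokenize(paragraph)  # start from the original sentences
for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])
    # Keep non-stopword tokens and reduce each one to its Porter stem
    words = [stemmer.stem(word) for word in words if word not in stop_words]
    sentences[i] = ' '.join(words)  # rejoin the stems into a single string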
```
- **PorterStemmer**:
  - Reduces words to their root forms (e.g., "running" → "run").
  - It applies a set of rules to strip suffixes like "-ing," "-ed," or "-s."
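- **Stopword Filtering**:
  - Excludes words that do not contribute significant meaning, such as "the," "is," and "and."
  - The check is case-sensitive and NLTK's list is lowercase, so sentence-initial words such as "The" and "While" are kept; the stemmer then lowercases them, which is why the stemmed sentences below begin with "the" and "while". Punctuation tokens such as "," and "." are kept as well.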
- **Tokenization**:
  - Splits each sentence into words for processing using `nltk.word_tokenize`.
- **Rejoin Words**:
  - After processing, the words are reassembled into sentences using `' '.join()`.
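The filter-and-normalize step can also be packaged as a reusable function. A minimal sketch with hypothetical names: `normalize_sentences` takes the stopword set as an argument (so it is built once), lowercases each token only for the membership test, and returns a new list instead of overwriting `sentences`. The demo uses `str.split` as the tokenizer, `str.lower` in place of a stemmer, and a few words taken from the NLTK stopword list shown in the notebook output, so it runs without any NLTK data.

```python
from typing import Callable, Iterable, List, Set


def normalize_sentences(
    sentences: Iterable[str],
    normalize: Callable[[str], str],
    stop_words: Set[str],
    tokenize: Callable[[str], List[str]] = str.split,
) -> List[str]:
    """Return new sentence strings with stopwords removed and tokens normalized.

    `stop_words` is assumed to be a lowercase set built once by the caller;
    each token is lowercased only for the membership test. The input is not modified.
    """
    result = []
    for sentence in sentences:
        kept = [normalize(tok) for tok in tokenize(sentence) if tok.lower() not in stop_words]
        result.append(" ".join(kept))
    return result


# Demo that needs no NLTK data: a few words from NLTK's English list,
# str.split as the tokenizer, and str.lower standing in for a stemmer.
demo_stop_words = {"the", "on", "has", "of", "and", "to"}
original = ["The increasing reliance on technology has transformed nearly every aspect of modern life"]
cleaned = normalize_sentences(original, str.lower, demo_stop_words)
print(cleaned)
```

In the script itself this could be called as `normalize_sentences(nltk.sent_tokenize(paragraph), stemmer.stem, set(stopwords.words('english')), nltk.word_tokenize)`. Because the check lowercases tokens, sentence-initial stopwords such as "The" and "While" would also be removed, so the result would differ slightly from the notebook outputs below.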

---

#### **4. Stopword Removal and SnowballStemmer Stemming**
```python
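from nltk.stem import SnowballStemmer

snow_stemmer = SnowballStemmer("english")
stop_words = set(stopwords.words('english'))

# Re-tokenize: the Porter loop above overwrote `sentences` with stemmed text
sentences = nltk.sent_tokenize(paragraph)
for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])
    words = [snow_stemmer.stem(word) for word in words if word not in stop_words]
    sentences[i] = ' '.join(words)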
```
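- **SnowballStemmer**:
  - NLTK's interface to the Snowball family of stemmers, which covers many languages; the language is passed to the constructor (here `"english"`).
  - The English Snowball stemmer ("Porter2") is a revision of the original Porter algorithm, so the two agree on most words but not all. In the outputs below, "nearly" becomes "nearli" with Porter but "near" with Snowball, while "communicate" becomes "commun" with Porter but "communic" with Snowball.
- **Processing Logic**:
  - Identical to the PorterStemmer loop except for the stemmer. Because each loop overwrites `sentences` in place, re-run `nltk.sent_tokenize(paragraph)` before this loop; otherwise Snowball is applied to text that Porter has already stemmed.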
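To see where the two stemmers diverge without running the whole pipeline, you can stem individual words with both. This sketch uses only `PorterStemmer` and `SnowballStemmer`, which need no downloaded NLTK data; the words come from the sample paragraph, and the stems it produces match those in the notebook outputs below.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")

# Words from the sample paragraph: "running" stems the same way in both,
# the others differ between the two algorithms.
words = ["running", "nearly", "communicate", "focus"]

# Map each word to its (Porter stem, Snowball stem) pair
stems = {word: (porter.stem(word), snowball.stem(word)) for word in words}

for word, (p, s) in stems.items():
    print(f"{word:12} porter={p:10} snowball={s}")
```

Neither result is "correct" in a dictionary sense: Porter's "focu" and Snowball's "communic" are both truncated forms, which is expected from rule-based suffix stripping.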

---

#### **5. Stopword Removal and WordNetLemmatizer**
```python
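lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))

sentences = nltk.sent_tokenize(paragraph)  # start again from the original text
for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])
    # pos='v' makes WordNet treat every token as a verb
    words = [lemmatizer.lemmatize(word, pos='v') for word in words if word not in stop_words]
    sentences[i] = ' '.join(words)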
```
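- **WordNetLemmatizer**:
  - Converts words to their base dictionary forms (lemmas) by looking them up in WordNet for a given part of speech (POS).
  - **POS Tag**:
    - `pos='v'` tells the lemmatizer to treat every token as a verb. Verb forms are reduced ("brought" → "bring", "led" → "lead" in the output below), but plural nouns such as "devices" and "notifications" are left unchanged because they are never looked up as nouns.
  - Examples:
    - `"running"` → `"run"`
    - `"written"` → `"write"`
- **Process**:
  - The loop is the same as for stemming, but each output is a dictionary word rather than a truncated stem (e.g. "technology" here versus "technolog" from both stemmers).
  - Unlike the stemmers, the lemmatizer does not lowercase its input, so capitalized words such as "The" and "Smartphones" keep their case.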
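Because `pos='v'` is applied to every token, nouns such as "devices" keep their plural form. One common refinement, sketched here under the assumption that NLTK's default part-of-speech tagger data is installed, is to tag the tokens with `nltk.pos_tag` and map each Penn Treebank tag to the WordNet POS letter the lemmatizer expects. The mapping (J → adjective, V → verb, R → adverb, everything else → noun) is a simplification chosen for illustration, and the helper names are hypothetical. Depending on the NLTK version, the tagger data package is `averaged_perceptron_tagger_eng` (recent releases) or `averaged_perceptron_tagger` (older ones).

```python
import nltk
from nltk.stem import WordNetLemmatizer


def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag (e.g. 'VBD', 'NNS') to a WordNet POS letter."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # default: noun (also WordNetLemmatizer's own default)


def lemmatize_with_pos(tokens, lemmatizer=None):
    """Lemmatize each token using the POS predicted by nltk.pos_tag.

    Requires the NLTK tagger data and the 'wordnet' corpus to be installed.
    """
    lemmatizer = lemmatizer or WordNetLemmatizer()
    return [
        lemmatizer.lemmatize(token, pos=penn_to_wordnet(tag))
        for token, tag in nltk.pos_tag(tokens)
    ]
```

Inside the loop, `lemmatize_with_pos(words)` would replace the list comprehension that passes `pos='v'`; the tags come from a statistical tagger, so occasional mis-tagged words will still be lemmatized with the wrong POS.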

---

### **Key Takeaways**

1. **Stopwords Removal**:
   - Filters out common words that provide minimal information.
   - Reduces noise in text data.

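2. **PorterStemmer vs. SnowballStemmer**:
   - Both strip suffixes with hand-written rules; neither consults a dictionary.
   - **PorterStemmer**: Martin Porter's original (1980) suffix-stripping algorithm for English.
   - **SnowballStemmer**: Porter's revised English algorithm ("Porter2"), plus stemmers for many other languages. The two agree on most words but differ on some (e.g. "nearly" → "nearli" vs. "near").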

3. **Lemmatization**:
   - Produces valid dictionary words, unlike stemming, which may produce truncated results.
   - POS-dependent: the result depends on the `pos` argument, so accuracy relies on supplying the right POS for each word (this script passes `'v'` for all of them).

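4. **Comparison Example**:
   - **Word**: "running"
     - **PorterStemmer**: "run"
     - **SnowballStemmer**: "run"
     - **Lemmatizer**: "run" (with `pos='v'`)
   - **Word**: "nearly"
     - **PorterStemmer**: "nearli"
     - **SnowballStemmer**: "near"
     - **Lemmatizer**: "nearly" (no verb lemma is found, so the word is returned unchanged)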
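The multi-language support mentioned in takeaway 2 comes from the fact that NLTK's `SnowballStemmer` takes the language as a constructor argument and lists the accepted names in its `languages` class attribute. A minimal sketch; the exact set of languages depends on the installed NLTK version, and the German and Spanish words are arbitrary illustrations whose stems are printed rather than asserted.

```python
from nltk.stem import SnowballStemmer

# Names accepted by the constructor, e.g. "english", "german", "spanish"
print(SnowballStemmer.languages)

# One stemmer per language; each applies that language's own suffix rules
stemmers = {lang: SnowballStemmer(lang) for lang in ("english", "german", "spanish")}

samples = {"english": "connections", "german": "Verbindungen", "spanish": "conexiones"}
for lang, word in samples.items():
    print(lang, word, "->", stemmers[lang].stem(word))
```

None of these stemmers need downloaded NLTK data; only the optional `ignore_stopwords=True` constructor argument relies on the `stopwords` corpus.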

This script showcases the process of refining raw text for downstream tasks like text analysis, classification, or information retrieval.

In [1]:
paragraph = """
The increasing reliance on technology has transformed nearly every aspect of modern life, from how we communicate to how we work and learn.
 Smartphones, laptops, and other digital devices have become indispensable tools, enabling people to stay connected and access vast amounts of information at their fingertips. Despite the undeniable benefits, this technological shift has brought its own set of challenges.
  People often find themselves overwhelmed by the constant flow of notifications, emails, and messages, making it difficult to focus on meaningful tasks.
   Furthermore, the rise of social media has led to debates about its impact on mental health, as individuals are exposed to unrealistic standards and excessive comparisons.
    While technology offers immense opportunities for innovation and progress, it is equally important to address the societal and personal implications it carries to ensure a balanced and sustainable future.
"""

In [15]:
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
nltk.download('stopwords')
nltk.download('punkt_tab')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [11]:
stopwords.words('english')

['i',
 'me',
 'my',
 'myself',
 'we',
 'our',
 'ours',
 'ourselves',
 'you',
 "you're",
 "you've",
 "you'll",
 "you'd",
 'your',
 'yours',
 'yourself',
 'yourselves',
 'he',
 'him',
 'his',
 'himself',
 'she',
 "she's",
 'her',
 'hers',
 'herself',
 'it',
 "it's",
 'its',
 'itself',
 'they',
 'them',
 'their',
 'theirs',
 'themselves',
 'what',
 'which',
 'who',
 'whom',
 'this',
 'that',
 "that'll",
 'these',
 'those',
 'am',
 'is',
 'are',
 'was',
 'were',
 'be',
 'been',
 'being',
 'have',
 'has',
 'had',
 'having',
 'do',
 'does',
 'did',
 'doing',
 'a',
 'an',
 'the',
 'and',
 'but',
 'if',
 'or',
 'because',
 'as',
 'until',
 'while',
 'of',
 'at',
 'by',
 'for',
 'with',
 'about',
 'against',
 'between',
 'into',
 'through',
 'during',
 'before',
 'after',
 'above',
 'below',
 'to',
 'from',
 'up',
 'down',
 'in',
 'out',
 'on',
 'off',
 'over',
 'under',
 'again',
 'further',
 'then',
 'once',
 'here',
 'there',
 'when',
 'where',
 'why',
 'how',
 'all',
 'any',
 'both',
 'each

In [12]:
stemmer = PorterStemmer()

In [41]:
sentences = nltk.sent_tokenize(paragraph)

In [18]:
type(sentences)

list

In [19]:
## Apply stopword filtering, then Porter stemming

for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])
    words = [stemmer.stem(word) for word in words if word not in set(stopwords.words('english'))]
    sentences[i] = ' '.join(words)  # join the remaining words back into a sentence string


In [20]:
sentences

['the increas relianc technolog transform nearli everi aspect modern life , commun work learn .',
 'smartphon , laptop , digit devic becom indispens tool , enabl peopl stay connect access vast amount inform fingertip .',
 'despit undeni benefit , technolog shift brought set challeng .',
 'peopl often find overwhelm constant flow notif , email , messag , make difficult focu meaning task .',
 'furthermor , rise social media led debat impact mental health , individu expos unrealist standard excess comparison .',
 'while technolog offer immens opportun innov progress , equal import address societ person implic carri ensur balanc sustain futur .']

In [29]:
from nltk.stem import SnowballStemmer
Snowstemmer = SnowballStemmer("english")

In [28]:
## Apply stopword filtering, then Snowball stemming

for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])
    words = [Snowstemmer.stem(word) for word in words if word not in set(stopwords.words('english'))]
    sentences[i] = ' '.join(words)  # join the remaining words back into a sentence string


In [30]:
sentences

['the increas relianc technolog transform near everi aspect modern life , communic work learn .',
 'smartphon , laptop , digit devic becom indispens tool , enabl peopl stay connect access vast amount inform fingertip .',
 'despit undeni benefit , technolog shift brought set challeng .',
 'peopl often find overwhelm constant flow notif , email , messag , make difficult focus meaning task .',
 'furthermor , rise social media led debat impact mental health , individu expos unrealist standard excess comparison .',
 'while technolog offer immens opportun innov progress , equal import address societ person implic carri ensur balanc sustain futur .']

In [31]:
Lemma = WordNetLemmatizer()

In [42]:
## Apply stopword filtering, then lemmatization

for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])
    words = [Lemma.lemmatize(word, pos='v') for word in words if word not in set(stopwords.words('english'))]
    sentences[i] = ' '.join(words)  # join the remaining words back into a sentence string


In [43]:
sentences

['The increase reliance technology transform nearly every aspect modern life , communicate work learn .',
 'Smartphones , laptops , digital devices become indispensable tool , enable people stay connect access vast amount information fingertips .',
 'Despite undeniable benefit , technological shift bring set challenge .',
 'People often find overwhelm constant flow notifications , email , message , make difficult focus meaningful task .',
 'Furthermore , rise social media lead debate impact mental health , individuals expose unrealistic standards excessive comparisons .',
 'While technology offer immense opportunities innovation progress , equally important address societal personal implications carry ensure balance sustainable future .']