Question 1 - NLP Preprocessing Pipeline

In [1]:
# Install and import required libraries
!pip install nltk

import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

# Download necessary data
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('punkt_tab')  # tokenizer tables that word_tokenize needs in recent NLTK versions

# Define the preprocessing function
def preprocess_nlp(sentence):
    print("Sentence:", sentence)

    # Tokenize
    tokens = word_tokenize(sentence)
    print("\nOriginal Tokens:")
    print(tokens)

    # Remove stopwords
    stop_words = set(stopwords.words('english'))
    filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
    print("\nTokens Without Stopwords:")
    print(filtered_tokens)

    # Stemming
    ps = PorterStemmer()
    stemmed_words = [ps.stem(word) for word in filtered_tokens]
    print("\nStemmed Words:")
    print(stemmed_words)

# Run on sample sentence
sentence = "NLP techniques are used in virtual assistants like Alexa and Siri."
preprocess_nlp(sentence)



[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


Sentence: NLP techniques are used in virtual assistants like Alexa and Siri.

Original Tokens:
['NLP', 'techniques', 'are', 'used', 'in', 'virtual', 'assistants', 'like', 'Alexa', 'and', 'Siri', '.']

Tokens Without Stopwords:
['NLP', 'techniques', 'used', 'virtual', 'assistants', 'like', 'Alexa', 'Siri', '.']

Stemmed Words:
['nlp', 'techniqu', 'use', 'virtual', 'assist', 'like', 'alexa', 'siri', '.']


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


### **Q1: What is the difference between stemming and lemmatization? Provide examples with the word “running.”**

**✅ Answer:**  
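Stemming strips affixes (mostly suffixes) from a word using heuristic rules, without checking a dictionary, so the result is often not a real word. For example, the Porter stemmer reduces *"running"* to *"run"*, but it also turns *"techniques"* into *"techniqu"* (as in the output above).  
Lemmatization, on the other hand, uses a vocabulary and the word's part of speech to return its dictionary base form (the lemma). *"Running"* used as a verb also becomes *"run"*, but a lemmatizer can additionally map irregular forms such as *"ran"* to *"run"*, which a stemmer cannot.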
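The contrast can be sketched in plain Python. This is a toy illustration only: `toy_stem` is a hypothetical, crude suffix stripper (loosely inspired by Porter's "remove suffix, then undo a doubled consonant" step) and `LEMMAS` is a tiny hand-written vocabulary standing in for a real lexicon such as WordNet. In NLTK the real counterpart is `WordNetLemmatizer().lemmatize("running", pos="v")`, which returns `"run"`; without `pos="v"` it treats the word as a noun and returns `"running"` unchanged.

```python
# Toy illustration only: neither function is NLTK's algorithm.
LEMMAS = {  # hypothetical vocabulary: (word, part of speech) -> lemma
    ("running", "v"): "run",
    ("ran", "v"): "run",
    ("runs", "v"): "run",
    ("techniques", "n"): "technique",
    ("studies", "n"): "study",
}

def toy_stem(word):
    """Strip the first matching suffix by rule; no dictionary is consulted."""
    w = word.lower()
    for suffix in ("ing", "es", "ed", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            # Undo consonant doubling ("runn" -> "run"), except for l, s, z.
            if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in "aeioulsz":
                w = w[:-1]
            break
    return w

def toy_lemmatize(word, pos):
    """Look up the (word, POS) pair in the vocabulary; fall back to the word itself."""
    return LEMMAS.get((word.lower(), pos), word.lower())

for word, pos in [("running", "v"), ("ran", "v"), ("runs", "v"),
                  ("techniques", "n"), ("studies", "n")]:
    print(f"{word:<11} stem={toy_stem(word):<10} lemma={toy_lemmatize(word, pos)}")
```

The stemmer produces non-words like `techniqu` and `studi` and leaves `ran` untouched, while the lookup-based lemmatizer returns dictionary forms but only for words (and parts of speech) its vocabulary knows.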

---

### **Q2: Why might removing stop words be useful in some NLP tasks, and when might it actually be harmful?**

**✅ Answer:**  
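Removing stop words is useful in tasks like document classification, topic modeling, or keyword-based retrieval, where very frequent function words (*"the"*, *"of"*, *"and"*) add little signal; dropping them reduces noise and vocabulary size.  
However, in tasks like sentiment analysis or question answering, stop words can carry essential meaning: *"not"* flips the polarity of *"The movie was not good"*, and question words such as *"who"* or *"where"* determine what is being asked. Standard lists, including NLTK's English list used above, contain these words, so removing them blindly can hurt performance.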
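A minimal sketch of the negation problem, assuming a small hand-picked stopword set (a hypothetical subset; NLTK's full English list also contains `"not"`, `"no"` and `"nor"`). Two sentences with opposite sentiment collapse to the same tokens, and keeping negations in the token stream restores the difference:

```python
# Hypothetical, hand-picked subset of common English stopwords.
STOP_WORDS = {"the", "a", "an", "is", "was", "are", "this", "not", "no", "nor"}
NEGATIONS = {"not", "no", "nor"}

def remove_stopwords(tokens, stop_words):
    """Drop tokens whose lowercase form is in stop_words."""
    return [t for t in tokens if t.lower() not in stop_words]

negative = "The movie was not good".split()
positive = "The movie was good".split()

print(remove_stopwords(negative, STOP_WORDS))              # ['movie', 'good']
print(remove_stopwords(positive, STOP_WORDS))              # ['movie', 'good']
print(remove_stopwords(negative, STOP_WORDS - NEGATIONS))  # ['movie', 'not', 'good']
```

Subtracting a task-specific "keep" set from the stopword list, rather than dropping stopword removal entirely, is a common compromise.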

Question 2 - Named Entity Recognition with spaCy

In [2]:
# Install and import spaCy
!pip install -U spacy

import spacy

# Download English model
!python -m spacy download en_core_web_sm

# Load the model
nlp = spacy.load("en_core_web_sm")

# Define input sentence
sentence = "Barack Obama served as the 44th President of the United States and won the Nobel Peace Prize in 2009."

# Process the sentence
doc = nlp(sentence)

# Print named entities
print("Named Entities:\n")
for ent in doc.ents:
    print(f"Text: {ent.text}, Label: {ent.label_}, Start: {ent.start_char}, End: {ent.end_char}")


Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 106.9 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.
Named Entities:

Text: Barack Obama, Label: PERSON, Start: 0, End: 12
Text: 44th, Label: ORDINAL, Start: 27, End: 31
Text: the United States, Label: GPE, Start: 45, End: 62
Text: the Nobel Peace Prize, Label: WORK_OF_ART, Start: 71, End: 92
Text: 2009, Label: DATE, Start: 96, End: 100


### **Q1: How does NER differ from POS tagging in NLP?**

**✅ Answer:**  
NER (Named Entity Recognition) finds spans of text that refer to real-world entities and classifies them (e.g., PERSON, DATE, or GPE for countries and cities), while POS (Part-of-Speech) tagging assigns every individual token a grammatical category (noun, verb, adjective, etc.).  
For example, in the sentence *"Barack Obama was president"*, POS tagging labels *"Barack"* and *"Obama"* each as a proper noun, while NER groups the two tokens into a single *"Barack Obama"* PERSON entity.
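The structural difference shows up in how the two annotations are stored: POS gives exactly one tag per token, while NER marks multi-token spans, which are often encoded per token with BIO tags. The sketch below uses hand-written annotations (Universal POS tags and one PERSON span), not model output; in spaCy the same two views are exposed as `token.pos_` and `doc.ents` (or per token as `token.ent_iob_` and `token.ent_type_`).

```python
# Hand-written annotations for illustration (not spaCy model output).
tokens = ["Barack", "Obama", "was", "president"]
pos_tags = ["PROPN", "PROPN", "AUX", "NOUN"]   # POS: exactly one tag per token
entities = [(0, 2, "PERSON")]                  # NER: (start, end_exclusive, label) spans

def spans_to_bio(n_tokens, spans):
    """Encode entity spans as per-token BIO tags (B = begin, I = inside, O = outside)."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags

bio = spans_to_bio(len(tokens), entities)
for token, pos, ner in zip(tokens, pos_tags, bio):
    print(f"{token:<10} {pos:<6} {ner}")
```

The `B-`/`I-` prefixes are what let a per-token sequence represent "Barack Obama" as one entity rather than two.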

---

### **Q2: Describe two applications that use NER in the real world.**

**✅ Answer:**  
1. **Financial News Analysis**: NER extracts company names, stock tickers, and economic events to support trading algorithms.  
2. **Search Engines**: NER helps understand queries like *"Hotels in New York near Central Park"* by identifying location entities to provide relevant results.


Question 3 - Scaled Dot-Product Attention

In [3]:
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Step 1: Compute dot product QKᵀ
    scores = np.dot(Q, K.T)

    # Step 2: Scale by sqrt(d)
    d = K.shape[1]
    scaled_scores = scores / np.sqrt(d)

    # Step 3: Apply softmax
    exp_scores = np.exp(scaled_scores - np.max(scaled_scores, axis=1, keepdims=True))
    attention_weights = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

    # Step 4: Multiply with V
    output = np.dot(attention_weights, V)

    # Print results
    print("Attention Weights:\n", attention_weights)
    print("\nOutput:\n", output)

# Test inputs
Q = np.array([[1, 0, 1, 0], [0, 1, 0, 1]])
K = np.array([[1, 0, 1, 0], [0, 1, 0, 1]])
V = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Run function
scaled_dot_product_attention(Q, K, V)


Attention Weights:
 [[0.73105858 0.26894142]
 [0.26894142 0.73105858]]

Output:
 [[2.07576569 3.07576569 4.07576569 5.07576569]
 [3.92423431 4.92423431 5.92423431 6.92423431]]
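The function implements $\text{Attention}(Q, K, V) = \text{softmax}\left(QK^\top / \sqrt{d}\right)V$. For the test inputs above, with $d = 4$ (the number of columns of `K`), the first row of the result can be checked by hand:

$$
QK^\top = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}, \qquad
\frac{QK^\top}{\sqrt{4}} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
$$

$$
\text{softmax}(1, 0) = \left(\frac{e}{e+1}, \frac{1}{e+1}\right) \approx (0.7311,\ 0.2689)
$$

$$
\text{output}_1 = 0.7311\,(1, 2, 3, 4) + 0.2689\,(5, 6, 7, 8) \approx (2.0758,\ 3.0758,\ 4.0758,\ 5.0758)
$$

The second row is symmetric, with the two weights swapped.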


### **Q1: Why do we divide the attention score by √d in the scaled dot-product attention formula?**

**✅ Answer:**  
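We divide by √d (where d is the dimension of the key vectors) because raw dot products grow with d: if the components of q and k are independent with mean 0 and variance 1, then q·k is a sum of d terms each with variance 1, so it has variance d and standard deviation √d. Large scores push the softmax toward a one-hot distribution, where its gradients are very small.  
Dividing by √d brings the scores back to unit variance, which keeps the softmax out of that saturated regime and improves training stability.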
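The following sketch checks numerically how dot-product scores grow with d and what that does to the softmax. It assumes query and key components are independent standard-normal draws, which is a simplification; learned vectors need not behave this way.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

def dot_product_variance(d, n_samples=10_000):
    """Empirical variance of q.k and of q.k / sqrt(d) for random standard-normal q, k."""
    q = rng.standard_normal((n_samples, d))
    k = rng.standard_normal((n_samples, d))
    dots = np.sum(q * k, axis=1)  # n_samples independent dot products
    return dots.var(), (dots / np.sqrt(d)).var()

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

results = {d: dot_product_variance(d) for d in (4, 64, 512)}
for d, (raw_var, scaled_var) in results.items():
    print(f"d={d:3d}  var(q.k)={raw_var:7.1f}  var(q.k/sqrt(d))={scaled_var:.2f}")

# Effect on softmax: one query scored against 8 random keys with d = 512.
d = 512
query = rng.standard_normal(d)
keys = rng.standard_normal((8, d))
raw_scores = keys @ query
max_w_unscaled = softmax(raw_scores).max()
max_w_scaled = softmax(raw_scores / np.sqrt(d)).max()
print(f"largest attention weight: unscaled={max_w_unscaled:.3f}, scaled={max_w_scaled:.3f}")
```

The raw variance tracks d while the scaled variance stays near 1. The unscaled softmax is always at least as peaked as the scaled one, since dividing the scores by √d acts like raising the softmax temperature.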

---

### **Q2: How does self-attention help the model understand relationships between words in a sentence?**

**✅ Answer:**  
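Self-attention lets each word build its new representation as a weighted combination of every word in the sentence, with weights based on query-key similarity, so related words can influence each other regardless of how far apart they are.  
This helps the model capture contextual dependencies. For example, in *"The animal didn't cross the street because it was too tired"*, a well-trained model can place high attention weight from *"it"* onto *"animal"*, helping resolve what *"it"* refers to.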
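A minimal single-head self-attention sketch, in which Q, K and V are all projections of the same token matrix X. The embeddings and projection matrices `W_q`, `W_k`, `W_v` are random stand-ins (assumptions, not trained weights), so the printed weights carry no linguistic meaning. The final check illustrates that self-attention on its own has no notion of word order: permuting the input tokens just permutes the outputs, which is why Transformers add positional encodings.

```python
import numpy as np

rng = np.random.default_rng(42)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention: queries, keys and values all come from X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    weights = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=1)  # (n_tokens, n_tokens)
    return weights @ V, weights

n_tokens, d_model, d_k = 5, 8, 4
X = rng.standard_normal((n_tokens, d_model))  # stand-in token embeddings (random)
W_q, W_k, W_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))  # untrained

out, weights = self_attention(X, W_q, W_k, W_v)
print(weights.round(2))  # row i: how much token i attends to each token j

# Without positional information, shuffling the tokens only shuffles the output rows.
perm = rng.permutation(n_tokens)
out_perm, _ = self_attention(X[perm], W_q, W_k, W_v)
print(np.allclose(out_perm, out[perm]))  # True
```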

Question 4 - Sentiment Analysis using Hugging Face Transformers

In [4]:
# Install HuggingFace transformers
!pip install transformers

from transformers import pipeline

# Load sentiment analysis pipeline
classifier = pipeline("sentiment-analysis")

# Define input sentence
sentence = "Despite the high price, the performance of the new MacBook is outstanding."

# Run sentiment analysis
result = classifier(sentence)[0]

# Print results
print(f"Sentiment: {result['label']}")
print(f"Confidence Score: {result['score']:.4f}")




No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`



Device set to use cuda:0


Sentiment: POSITIVE
Confidence Score: 0.9998


### **Q1: What is the main architectural difference between BERT and GPT? Which uses an encoder and which uses a decoder?**

**✅ Answer:**  
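BERT uses a **transformer encoder** stack: its self-attention is bidirectional, so every token sees context on both its left and right. It is pre-trained with masked language modeling and is mainly used for understanding tasks such as classification and NER.  
GPT uses a **decoder-only transformer**: its self-attention is causally masked, so each token attends only to itself and earlier tokens (left-to-right). It is pre-trained with next-token prediction, which makes it well suited to text generation.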
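In practice, the difference between the two attention patterns comes down to a mask. This sketch uses random stand-in token vectors and, for brevity, uses them directly as both queries and keys (no learned projections, an assumption for illustration). A causal mask sets every "future" score to negative infinity before the softmax, so those weights become exactly zero.

```python
import numpy as np

def attention_weights(Q, K, causal=False):
    """Row-wise softmax of scaled scores, optionally with a causal (GPT-style) mask."""
    scores = Q @ K.T / np.sqrt(K.shape[1])
    if causal:
        # Positions j > i are in the future for token i: hide them.
        n = scores.shape[0]
        future = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=1, keepdims=True)  # the diagonal is never masked
    e = np.exp(scores)  # exp(-inf) == 0 for masked positions
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))  # 4 stand-in token vectors

bidirectional = attention_weights(X, X)        # BERT-style: every token sees every token
causal = attention_weights(X, X, causal=True)  # GPT-style: token i sees tokens 0..i
print(bidirectional.round(2))
print(causal.round(2))  # upper triangle is exactly 0
```

Note that the first token under the causal mask can attend only to itself, so its weight row is `[1, 0, 0, 0]`.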

---

### **Q2: Why is using pre-trained models (like BERT or GPT) beneficial for NLP applications instead of training from scratch?**

**✅ Answer:**  
Pre-trained models are trained on massive datasets and capture rich language representations, saving time, data, and compute resources.  
They enable faster development and improved accuracy for downstream tasks like sentiment analysis, even with limited labeled data.
