9 - Write a Python script to apply Non-negative Matrix Factorization (NMF) for topic
modeling and document clustering.

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Sample data - more varied topics
documents = [
    "The person is in the house.",
    "The house is beautiful and big.",
    "Someone is in the house.",
    "The night sky is clear and full of stars.",
    "Stars are shining in the clear night.",
    "The night is quiet and calm.",
    "Today is a bright and sunny day.",
    "Someone likes sunny days in the house.",
]

# Vectorize text with TF-IDF, removing English stop words
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(documents)

# Apply NMF with 2 topics
nmf = NMF(n_components=2, random_state=42)
nmf.fit(X)

# Display topics
feature_names = vectorizer.get_feature_names_out()
print("Topics identified using NMF:")
for index, topic in enumerate(nmf.components_):
    print(f'\nTopic {index + 1}:')
    # Top 5 words per topic, listed from lowest to highest weight
    top_words = [feature_names[i] for i in topic.argsort()[-5:]]
    print(", ".join(top_words))


Topics identified using NMF:

Topic 1:
sunny, big, beautiful, person, house

Topic 2:
shining, sky, clear, stars, night
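
The exercise also asks for document clustering, which the cell above does not show. One common approach, sketched here under the same settings as above, is to take the document-topic matrix returned by `fit_transform` (called `W` here; the name is an assumption) and assign each document to the topic with the largest weight. This is a minimal illustration, not the only way to cluster with NMF.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Same sample documents as in the cell above
documents = [
    "The person is in the house.",
    "The house is beautiful and big.",
    "Someone is in the house.",
    "The night sky is clear and full of stars.",
    "Stars are shining in the clear night.",
    "The night is quiet and calm.",
    "Today is a bright and sunny day.",
    "Someone likes sunny days in the house.",
]

vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(documents)

# fit_transform returns the document-topic matrix: one row per document,
# one non-negative weight per topic
nmf = NMF(n_components=2, random_state=42)
W = nmf.fit_transform(X)

# Assign each document to the topic with the largest weight
clusters = W.argmax(axis=1)

for doc, cluster in zip(documents, clusters):
    print(f"Topic {cluster + 1}: {doc}")
```

Documents whose two weights are close belong only weakly to either cluster, so inspecting the rows of `W` directly can be more informative than the hard assignment.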


10 - Create a Python program to implement and compare various NLP algorithms for
tasks such as classification, clustering, and sentiment analysis.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Sample data
texts = ["I love cars.", "Lewis Hamilton is GOAT!", "I like dumplings."]
labels = [1, 1, 0]

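# Split data (with only 3 samples the test set holds a single example, so each
# accuracy below can only be 0.0 or 1.0; the split is not seeded, so results
# vary between runs)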
X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.2)

# Vectorize text
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Naive Bayes classifier
nb_classifier = MultinomialNB()
nb_classifier.fit(X_train_vec, y_train)
nb_predictions = nb_classifier.predict(X_test_vec)
nb_accuracy = accuracy_score(y_test, nb_predictions)

# SVM classifier
svm_classifier = SVC()
svm_classifier.fit(X_train_vec, y_train)
svm_predictions = svm_classifier.predict(X_test_vec)
svm_accuracy = accuracy_score(y_test, svm_predictions)

# Compare results
print(f'Naive Bayes Accuracy: {nb_accuracy}')
print(f'SVM Accuracy: {svm_accuracy}')


Naive Bayes Accuracy: 0.0
SVM Accuracy: 1.0
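
Because a single test example cannot separate the two classifiers, a somewhat more stable comparison can be sketched with stratified cross-validation. The texts and labels below are hypothetical, invented only for illustration, and the fold count and seed are arbitrary choices; the classifiers are the same two as above with default settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical data: 1 = motorsport, 0 = food
texts = [
    "I love fast cars.",
    "The race car won the grand prix.",
    "The driver took pole position.",
    "That pit stop was quick.",
    "The engine failed on the last lap.",
    "The car had great race pace.",
    "I like dumplings.",
    "The soup was hot and tasty.",
    "We cooked noodles for dinner.",
    "Fresh bread with butter is delicious.",
    "The dumplings had a tasty filling.",
    "Dinner was rice and vegetables.",
]
labels = [1] * 6 + [0] * 6

# Stratified folds keep both classes present in every training split
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

models = {"Naive Bayes": MultinomialNB(), "SVM": SVC()}
results = {}
for name, model in models.items():
    # Vectorizing inside the pipeline refits TF-IDF on each training fold only
    pipe = make_pipeline(TfidfVectorizer(), model)
    scores = cross_val_score(pipe, texts, labels, cv=cv, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name} mean accuracy: {results[name]:.2f}")
```

Even with twelve samples the scores remain noisy, so they should be read as a demonstration of the procedure rather than a ranking of the models.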


11 - Develop a Python script to perform sentiment analysis on text data using
lexicon-based methods and machine learning models.

In [None]:
from textblob import TextBlob

# Sample F1-related texts
f1_texts = [
    "That was an incredible race! The strategy was perfect and the overtakes were amazing.",
    "I’m disappointed with the team’s performance today; the car setup just wasn’t right.",
    "What a historic win for the driver! Absolutely deserved and well-earned."
]

print("F1 Sentiment Analysis Results:\n")

for text in f1_texts:
    # Create TextBlob object and analyze sentiment
    blob = TextBlob(text)
    polarity = blob.sentiment.polarity
    subjectivity = blob.sentiment.subjectivity

    # Classify sentiment based on polarity score
    if polarity > 0:
        sentiment_label = "Positive"
    elif polarity < 0:
        sentiment_label = "Negative"
    else:
        sentiment_label = "Neutral"

    # Display results
    print(f"Text: {text}")
    print(f"Polarity: {polarity}, Subjectivity: {subjectivity}")
    print(f"Sentiment: {sentiment_label}\n")


F1 Sentiment Analysis Results:

Text: That was an incredible race! The strategy was perfect and the overtakes were amazing.
Polarity: 0.8666666666666667, Subjectivity: 0.9333333333333332
Sentiment: Positive

Text: I’m disappointed with the team’s performance today; the car setup just wasn’t right.
Polarity: -0.23214285714285715, Subjectivity: 0.6428571428571428
Sentiment: Negative

Text: What a historic win for the driver! Absolutely deserved and well-earned.
Polarity: 0.39999999999999997, Subjectivity: 0.43333333333333335
Sentiment: Positive
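
The cell above covers the lexicon-based half of the exercise. For the machine learning half, one possible sketch is a TF-IDF plus logistic regression pipeline from scikit-learn. The training sentences and their labels are hypothetical, written only for illustration; a real model would need far more labeled data, and its predictions on so small a set should not be trusted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled examples, invented for illustration only
train_texts = [
    "What a fantastic race, the team was brilliant.",
    "Amazing overtake and a well-deserved victory.",
    "Great pit stop and a perfect strategy.",
    "I loved every lap of that grand prix.",
    "Terrible strategy, the race was ruined.",
    "The car was slow and the result was awful.",
    "A disappointing weekend with a poor setup.",
    "I hated that boring, frustrating race.",
]
train_labels = ["Positive"] * 4 + ["Negative"] * 4

# The pipeline learns the TF-IDF vocabulary and the classifier together
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# Same sample texts as in the TextBlob cell
f1_texts = [
    "That was an incredible race! The strategy was perfect and the overtakes were amazing.",
    "I’m disappointed with the team’s performance today; the car setup just wasn’t right.",
    "What a historic win for the driver! Absolutely deserved and well-earned.",
]

predictions = model.predict(f1_texts)
for text, label in zip(f1_texts, predictions):
    print(f"Text: {text}")
    print(f"Sentiment: {label}\n")
```

Unlike the lexicon approach, this model only knows words it saw during training, so its labels depend entirely on the examples it was given.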



12 - Write a Python program to apply deep learning models such as RNNs, LSTMs, or
Transformers for NLP tasks, including experimenting with pre-trained models like
BERT or GPT.

In [None]:
from transformers import pipeline

# Load a pre-trained model and tokenizer for sentiment analysis
# (no model name is given, so the pipeline falls back to its default)
classifier = pipeline('sentiment-analysis')

# Sample F1-related texts for sentiment analysis
f1_texts = [
    "The race was absolutely thrilling! Great strategy by the team.",
    "I'm disappointed with the driver's performance today.",
    "What an incredible win! He really deserved that podium finish."
]

# Perform sentiment analysis on each F1-related text
print("F1 Sentiment Analysis Results:\n")
for text in f1_texts:
    result = classifier(text)
    label = result[0]['label']
    score = result[0]['score']

    # Display each text and its corresponding sentiment
    print(f"Text: {text}")
    print(f"Sentiment: {label} (Confidence: {score:.2f})\n")


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


F1 Sentiment Analysis Results:

Text: The race was absolutely thrilling! Great strategy by the team.
Sentiment: POSITIVE (Confidence: 1.00)

Text: I'm disappointed with the driver's performance today.
Sentiment: NEGATIVE (Confidence: 1.00)

Text: What an incredible win! He really deserved that podium finish.
Sentiment: POSITIVE (Confidence: 1.00)
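
A confidence of 1.00 does not mean the model is perfectly certain. For a two-class model like this one, the pipeline's score is the softmax probability of the predicted label, and `:.2f` rounds values such as 0.9998 up to 1.00. The sketch below reproduces that arithmetic in plain Python; the logits are hypothetical values chosen for illustration, not outputs of the actual model.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["NEGATIVE", "POSITIVE"]
logits = [-4.2, 4.5]  # hypothetical raw scores for one text

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)

print(f"{labels[best]} (Confidence: {probs[best]:.2f})")  # POSITIVE (Confidence: 1.00)
print(f"{labels[best]} (Confidence: {probs[best]:.4f})")  # POSITIVE (Confidence: 0.9998)
```

Printing the score with more decimal places, as in the second line, makes the differences between texts visible.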

