Assignment 3: Apply NLP techniques like Part-of-Speech tagging, Named Entity Recognition (NER), and dependency parsing for text understanding. Perform text classification, sentiment analysis, and topic modeling to extract insights from unstructured text.

Part-of-Speech Tagging, NER, Dependency Parsing

In [None]:
import spacy
nlp = spacy.load("en_core_web_sm")

doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# POS Tagging
print([(token.text, token.pos_) for token in doc])

# Named Entity Recognition
print([(ent.text, ent.label_) for ent in doc.ents])

# Dependency Parsing
print([(token.text, token.dep_, token.head.text) for token in doc])

[('Apple', 'PROPN'), ('is', 'AUX'), ('looking', 'VERB'), ('at', 'ADP'), ('buying', 'VERB'), ('U.K.', 'PROPN'), ('startup', 'VERB'), ('for', 'ADP'), ('$', 'SYM'), ('1', 'NUM'), ('billion', 'NUM')]
[('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]
[('Apple', 'nsubj', 'looking'), ('is', 'aux', 'looking'), ('looking', 'ROOT', 'looking'), ('at', 'prep', 'looking'), ('buying', 'pcomp', 'at'), ('U.K.', 'nsubj', 'startup'), ('startup', 'ccomp', 'buying'), ('for', 'prep', 'startup'), ('$', 'quantmod', 'billion'), ('1', 'compound', 'billion'), ('billion', 'pobj', 'for')]
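
The dependency triples printed above are easier to read as a tree. The following is a minimal sketch, using only the standard library, that copies those triples by hand and groups each token under its head. Keying the tree by token text is an assumption that holds only because every token in this sentence is unique; the `children` map and `show` helper are illustrative names, not part of spaCy.

```python
from collections import defaultdict

# (token, dependency label, head) triples copied from the output above
triples = [
    ("Apple", "nsubj", "looking"), ("is", "aux", "looking"),
    ("looking", "ROOT", "looking"), ("at", "prep", "looking"),
    ("buying", "pcomp", "at"), ("U.K.", "nsubj", "startup"),
    ("startup", "ccomp", "buying"), ("for", "prep", "startup"),
    ("$", "quantmod", "billion"), ("1", "compound", "billion"),
    ("billion", "pobj", "for"),
]

# The root is the token labelled ROOT (in the output above it is its own head)
root = next(tok for tok, dep, head in triples if dep == "ROOT")

# Map each head to its direct dependents, skipping the root's self-loop
children = defaultdict(list)
for tok, dep, head in triples:
    if dep != "ROOT":
        children[head].append((tok, dep))

def show(token, depth=0):
    """Print the subtree under `token`, indenting one level per arc."""
    for child, dep in children.get(token, []):
        print("  " * depth + f"{child} ({dep})")
        show(child, depth + 1)

print(root)
show(root, 1)
```

With a live `doc`, the same walk could start from the token whose `dep_` is `ROOT` and follow `token.children` instead of a hand-built map.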


Text Classification & Sentiment Analysis

In [None]:
from textblob import TextBlob

sample = "I absolutely loved this movie, it was fantastic!"
blob = TextBlob(sample)
# Polarity ranges from -1.0 (most negative) to 1.0 (most positive)
print("Sentiment polarity:", blob.sentiment.polarity)

Sentiment polarity: 0.6
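
To use a polarity score for classification, it has to be mapped to a label. The sketch below shows one possible mapping. The `label_from_polarity` helper and its `neutral_band` threshold of 0.1 are assumptions chosen for illustration, not TextBlob features; a suitable threshold depends on the data.

```python
def label_from_polarity(polarity, neutral_band=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse sentiment label.

    Scores within +/- neutral_band of zero are treated as neutral
    (the band width is an illustrative assumption).
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"

print(label_from_polarity(0.6))    # the score printed above
print(label_from_polarity(-0.05))  # a score close to zero
```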


In [None]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Toy training set; the labels presumably mean 1 = spam, 0 = not spam
texts = ["Win money now", "Hello friend"]
labels = [1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB()
model.fit(X, labels)

print(model.predict(vectorizer.transform(["Win a prize"])))


[1]
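
The prediction above can be checked by hand. The following standard-library sketch recomputes the multinomial Naive Bayes scores with add-one (Laplace) smoothing, which matches the default `alpha=1.0` of `MultinomialNB`. The `tokenize` helper is a rough stand-in for `CountVectorizer`'s default tokenizer (lowercasing, tokens of two or more word characters), and `log_posterior` is an illustrative name rather than a scikit-learn function.

```python
import math
import re
from collections import Counter

texts = ["Win money now", "Hello friend"]
labels = [1, 0]

def tokenize(text):
    # Approximates CountVectorizer's defaults: lowercase, tokens of 2+ word characters
    return re.findall(r"\b\w\w+\b", text.lower())

# Vocabulary and per-class word counts from the training texts
vocab = sorted({tok for t in texts for tok in tokenize(t)})
counts = {c: Counter() for c in sorted(set(labels))}
for t, c in zip(texts, labels):
    counts[c].update(tokenize(t))

def log_posterior(text, c, alpha=1.0):
    """Unnormalized log P(c) + sum of log P(word | c), with additive smoothing."""
    score = math.log(labels.count(c) / len(labels))
    total = sum(counts[c].values()) + alpha * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:  # out-of-vocabulary words ("prize") are ignored
            score += math.log((counts[c][tok] + alpha) / total)
    return score

scores = {c: log_posterior("Win a prize", c) for c in sorted(set(labels))}
predicted = max(scores, key=scores.get)
print(scores, predicted)
```

Only "win" survives tokenization and the vocabulary filter, so the decision reduces to comparing P(win | 1) = 2/8 against P(win | 0) = 1/7 under equal priors, which favors class 1.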


Topic Modeling (LDA)

In [None]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Define a list of sample texts for topic modeling
texts = [
    "The quick brown fox jumps over the lazy dog.",
    "Never jump over the fence in the dark.",
    "Dogs and cats are common pets. Cats are independent animals.",
    "Topic modeling helps in understanding the main themes in a collection of documents.",
    "Natural language processing is a field of artificial intelligence."
]

vectorizer = CountVectorizer(stop_words='english')
X = vectorizer.fit_transform(texts)

lda = LatentDirichletAllocation(n_components=2, random_state=42)
lda.fit(X)

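# Look up the vocabulary once instead of on every list-comprehension step
feature_names = vectorizer.get_feature_names_out()

for idx, topic in enumerate(lda.components_):
    # argsort is ascending, so the last five indices are the five highest-weight words
    top_words = [feature_names[i] for i in topic.argsort()[-5:]]
    print(f"Topic {idx}:", top_words)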

Topic 0: ['main', 'modeling', 'themes', 'topic', 'understanding']
Topic 1: ['pets', 'independent', 'common', 'dogs', 'cats']
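
Each row of `lda.components_` holds unnormalized topic-word weights; dividing a row by its sum gives a word distribution for that topic. The sketch below shows that normalization and a best-first ranking in plain Python. The vocabulary and weights are made-up values for illustration only, not the fitted values from the cell above, and `top_words` is a hypothetical helper.

```python
# Hypothetical vocabulary and topic-word weights, for illustration only;
# in the cell above these would come from get_feature_names_out() and lda.components_
feature_names = ["cats", "dogs", "pets", "topic", "themes"]
components = [
    [0.5, 0.5, 0.5, 2.5, 1.5],  # made-up weights for topic 0
    [3.0, 2.0, 1.5, 0.5, 0.5],  # made-up weights for topic 1
]

def top_words(weights, names, k=3):
    """Return the k highest-weight words with normalized probabilities, best first."""
    total = sum(weights)
    ranked = sorted(zip(names, weights), key=lambda pair: pair[1], reverse=True)
    return [(name, weight / total) for name, weight in ranked[:k]]

for idx, weights in enumerate(components):
    print(f"Topic {idx}:", top_words(weights, feature_names))
```

Listing words best first, with probabilities attached, makes it easier to judge how strongly a word is tied to a topic than a bare list in ascending order.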
