# Useful NLP Libraries & Networks | Assignment


## Question 1: Compare and contrast NLTK and spaCy in terms of features, ease of use, and performance.


NLTK is a comprehensive, academically oriented toolkit that ships with extensive corpora and lexical resources and exposes many interchangeable algorithms for each task, which gives granular control at the cost of speed. spaCy is production-oriented: it offers pretrained pipelines (with word vectors in its medium and large models), a streamlined object-based API, and one well-tuned implementation per task. NLTK needs more manual setup and suits teaching and research, while spaCy suits real-world applications that process large volumes of text. spaCy is generally much faster because its core is written in Cython, whereas NLTK prioritizes flexibility and educational value over speed.


## Question 2: What is TextBlob and how does it simplify common NLP tasks like sentiment analysis and translation?


TextBlob is a Python library built on top of NLTK and Pattern that provides a simple API for common NLP tasks. It simplifies sentiment analysis by exposing polarity and subjectivity scores through a single `sentiment` property, so no separate preprocessing or model training is needed. For translation, older TextBlob releases offered a `translate()` method that called Google Translate with minimal code; that method was deprecated and has been removed in recent releases, so current code should call a translation service directly. Its intuitive interface hides the underlying complexity, making NLP accessible to beginners while remaining functional enough for many practical applications.
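
To make the idea of polarity and subjectivity scores concrete, here is a minimal sketch of lexicon-based scoring in plain Python. TextBlob's default analyzer is also lexicon-based, but the tiny `LEXICON` dictionary and the `score` function below are hypothetical illustrations, not TextBlob's actual data or algorithm (which also handles things like negation and intensifiers).

```python
import re

# Hypothetical toy lexicon: word -> (polarity in [-1, 1], subjectivity in [0, 1]).
LEXICON = {
    "great": (0.8, 0.75),
    "intuitive": (0.5, 0.5),
    "quick": (0.3, 0.5),
    "frustrating": (-0.4, 0.9),
}

def score(text):
    """Average the lexicon scores of the words found in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    if not hits:
        return 0.0, 0.0  # no known words: treat as neutral and objective
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return polarity, subjectivity

polarity, subjectivity = score(
    "The app is great and intuitive, but the crash was frustrating."
)
print(f"Polarity: {polarity:.2f}, Subjectivity: {subjectivity:.2f}")
```

Mixed text ends up with a moderate score because positive and negative words partly cancel out, which is the same effect to expect from a paragraph that contains both praise and complaints.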


## Question 3: Explain the role of Stanford NLP in academic and industry NLP Projects.


Stanford NLP refers to the tools and models developed by the Stanford NLP Group, chiefly CoreNLP (Java) and Stanza (Python), and it serves as a bridge between academic research and industry applications. In academia, it offers robust implementations of core NLP tasks like parsing, named entity recognition, and coreference resolution, enabling reproducible research. In industry, these tools are used to build production systems, particularly for tasks requiring high accuracy such as information extraction and question answering. Its Java and Python APIs make it accessible for both research and commercial projects, and its models often serve as baselines in the field.


## Question 4: Describe the architecture and functioning of a Recurrent Neural Network (RNN).


A Recurrent Neural Network (RNN) is designed to process sequential data by maintaining hidden states that capture information from previous time steps. The architecture consists of input, hidden, and output layers, where the hidden layer has connections that loop back, allowing the network to retain memory of previous inputs. At each time step, the RNN processes the current input along with the previous hidden state, updating its internal representation. This enables RNNs to handle variable-length sequences and capture temporal dependencies, making them suitable for tasks like language modeling, machine translation, and sentiment analysis where context from earlier words matters.
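
The recurrence described above can be sketched in a few lines of NumPy. This is a minimal forward pass of a vanilla RNN, assuming a `tanh` activation and randomly initialized weights; the names `rnn_forward`, `W_xh`, `W_hh`, and `b_h` and all sizes are illustrative choices, and no training or output layer is included.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence and return every hidden state."""
    h = np.zeros(W_hh.shape[0])  # initial hidden state
    states = []
    for x_t in inputs:  # one iteration per time step
        # The new state mixes the current input with the previous state.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 3, 5  # illustrative sizes
W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

sequence = rng.normal(size=(seq_len, input_dim))
hidden_states = rnn_forward(sequence, W_xh, W_hh, b_h)
print(hidden_states.shape)  # (5, 3): one hidden vector per time step
```

Because the same weights are reused at every step, the function works for a sequence of any length, which is what lets RNNs handle variable-length input.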


## Question 5: What is the key difference between LSTM and GRU networks in NLP applications?


LSTM (Long Short-Term Memory) networks use three gates (forget, input, and output) along with a separate cell state to control information flow, providing more complex memory management. GRU (Gated Recurrent Unit) networks simplify this architecture to two gates: an update gate that combines the roles of the forget and input gates, and a reset gate that controls how much of the previous state feeds the new candidate state. GRUs also merge the cell state into the hidden state. With fewer parameters, GRUs are computationally cheaper and faster to train, while LSTMs offer more fine-grained control over memory retention. In practice, GRUs often perform comparably to LSTMs with less computational overhead, making them popular for many NLP tasks, though LSTMs may do better in scenarios requiring longer-term dependencies.
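
The efficiency difference can be made concrete by counting parameters. The sketch below assumes the textbook formulation, in which each gate or candidate block has input weights, recurrent weights, and one bias vector; `gated_rnn_params` is a hypothetical helper, and framework implementations may differ slightly (for example, Keras's default GRU keeps an extra recurrent bias).

```python
def gated_rnn_params(input_dim, hidden_dim, n_blocks):
    """Count parameters for `n_blocks` weight sets, each with input weights,
    recurrent weights, and one bias vector (textbook formulation)."""
    per_block = hidden_dim * (input_dim + hidden_dim) + hidden_dim
    return n_blocks * per_block

input_dim, hidden_dim = 32, 64  # illustrative sizes

# LSTM: forget, input, and output gates plus the candidate cell update.
lstm_params = gated_rnn_params(input_dim, hidden_dim, n_blocks=4)
# GRU: update and reset gates plus the candidate hidden state.
gru_params = gated_rnn_params(input_dim, hidden_dim, n_blocks=3)

print("LSTM parameters:", lstm_params)
print("GRU parameters:", gru_params)
print("GRU / LSTM ratio:", gru_params / lstm_params)
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same size, regardless of the dimensions chosen.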


## Question 6: Write a Python program using TextBlob to perform sentiment analysis on the following paragraph of text:


```python
from textblob import TextBlob

text = "I had a great experience using the new mobile banking app. The interface is intuitive, and customer support was quick to resolve my issue. However, the app did crash once during a transaction, which was frustrating."

blob = TextBlob(text)

# Polarity ranges from -1 (negative) to 1 (positive).
print("Polarity:", blob.sentiment.polarity)
# Subjectivity ranges from 0 (objective) to 1 (subjective).
print("Subjectivity:", blob.sentiment.subjectivity)
```


## Question 7: Given the sample paragraph below, perform string tokenization and frequency distribution using Python and NLTK:


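```python
import nltk
from nltk import FreqDist

# word_tokenize needs the Punkt tokenizer data ("punkt_tab" on newer NLTK releases).
nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)

text = "Natural Language Processing (NLP) is a fascinating field that combines linguistics, computer science, and artificial intelligence. It enables machines to understand, interpret, and generate human language. Applications of NLP include chatbots, sentiment analysis, and machine translation. As technology advances, the role of NLP in modern solutions is becoming increasingly critical."

tokens = nltk.word_tokenize(text)
# Keep alphabetic tokens only and lowercase them so punctuation is not counted as words.
words = [t.lower() for t in tokens if t.isalpha()]
freq_dist = FreqDist(words)

for word, freq in freq_dist.most_common(10):
    print(f"{word}: {freq}")
```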


## Question 8: Implement a basic LSTM model in Keras for a text classification task using the following dummy dataset.


```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

texts = [
    "I love this project",
    "This is an amazing experience",
    "I hate waiting in line",
    "This is the worst service",
    "Absolutely fantastic!"
]

# 1 = positive, 0 = negative
labels = np.array([1, 1, 0, 0, 1])

# Map each word to an integer id, then pad every sequence to the same length.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
padded_sequences = pad_sequences(sequences, maxlen=10)

vocab_size = len(tokenizer.word_index) + 1  # +1 because index 0 is reserved for padding

model = Sequential()
# input_length is omitted: it is deprecated in recent Keras releases and not needed here.
model.add(Embedding(vocab_size, 32))
model.add(LSTM(64))
model.add(Dense(1, activation='sigmoid'))  # single probability for the positive class
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

model.fit(padded_sequences, labels, epochs=10, batch_size=1, verbose=0)
print("Model trained successfully")
```


## Question 9: Using spaCy, build a simple NLP pipeline that includes tokenization, lemmatization, and entity recognition.


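```python
import spacy

# Requires the model to be installed first: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')
text = "Homi Jehangir Bhabha was an Indian nuclear physicist who played a key role in the development of India's atomic energy program. He was the founding director of the Tata Institute of Fundamental Research (TIFR) and was instrumental in establishing the Atomic Energy Commission of India."

doc = nlp(text)

# Token-level view: text, lemma, and entity type (empty if the token is not part of an entity).
for token in doc:
    print(token.text, token.lemma_, token.ent_type_)

# Entity-level view: full entity spans with their labels.
for ent in doc.ents:
    print(ent.text, ent.label_)
```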


## Question 10: You are working on a chatbot for a mental health platform. Explain how you would leverage LSTM or GRU networks along with libraries like spaCy or Stanford NLP to understand and respond to user input effectively.


For a mental health chatbot, I would use spaCy for preprocessing (tokenization, lemmatization, entity recognition) to extract key information and normalize user input. A GRU-based sequence-to-sequence model would process the preprocessed text, capturing contextual understanding of user emotions and intents. The architecture would include an encoder GRU to understand input, a decoder GRU to generate responses, and attention mechanisms to focus on critical parts of the conversation. Ethical considerations include ensuring user privacy, providing disclaimers about not replacing professional help, implementing safety protocols for crisis situations, and maintaining transparency about AI limitations. The system would route severe cases to human professionals while providing supportive responses for general queries.
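
The routing of severe cases to human professionals can be sketched as a check that runs before any model is consulted. This is a minimal illustration in plain Python: `CRISIS_PHRASES`, `route_message`, and `dummy_classifier` are hypothetical names, and simple keyword matching is an assumption made for brevity, not a recommendation for a real deployment.

```python
# Hypothetical crisis phrases; a real system would need clinically reviewed
# criteria and a trained classifier, not just keyword matching.
CRISIS_PHRASES = ("hurt myself", "end my life", "suicide")

def route_message(message, classify_intent):
    """Return ("human", None) for possible crises, else ("bot", intent)."""
    lowered = message.lower()
    if any(phrase in lowered for phrase in CRISIS_PHRASES):
        return "human", None  # escalate before any model runs
    return "bot", classify_intent(message)

def dummy_classifier(message):
    """Stand-in for the GRU model's predicted intent (hypothetical)."""
    if "resources" in message.lower():
        return "seek_resources"
    return "general_support"

print(route_message("Can you help me find resources?", dummy_classifier))
print(route_message("I want to hurt myself", dummy_classifier))
```

Placing the safety check ahead of the model means an escalation never depends on the model's prediction being correct.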


```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense, Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

# Simplified stand-in for the full encoder-decoder: a 3-class intent classifier
# trained on one example per class.
data = ["I'm feeling sad today.", "I want to talk about my depression.", "Can you help me find resources for mental health support?"]

# Map each word to an integer id, then pad every sequence to the same length.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(data)
sequences = tokenizer.texts_to_sequences(data)
padded_sequences = pad_sequences(sequences, maxlen=50)

model = Sequential()
# input_length is omitted: it is deprecated in recent Keras releases and not needed here.
model.add(Embedding(len(tokenizer.word_index) + 1, 64))
model.add(GRU(128, return_sequences=True))  # pass the full sequence to the next GRU
model.add(GRU(128))
model.add(Dense(3, activation='softmax'))  # one probability per intent class
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# One-hot labels: one class per training sentence.
labels = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
model.fit(padded_sequences, labels, epochs=10, batch_size=1, verbose=0)
print("Model trained successfully")
```
