
## Question 1: Compare and contrast NLTK and spaCy in terms of features, ease of use, and performance.

NLTK and spaCy are two popular Python libraries for Natural Language Processing, but they are designed for different purposes.

| Aspect            | NLTK                  | spaCy                         |
| ----------------- | --------------------- | ----------------------------- |
| Primary Focus     | Education & research  | Industrial & production use   |
| Ease of Use       | Requires more coding  | Simple and user-friendly      |
| Speed             | Slower (mostly pure Python) | Faster (Cython-optimized core) |
| Pretrained Models | Limited               | Advanced pretrained pipelines |
| Use Case          | Learning NLP concepts | Real-world applications       |

NLTK is ideal for beginners and academic learning, whereas spaCy is preferred for high-performance NLP applications in industry.

## Question 2: What is TextBlob and how does it simplify common NLP tasks like sentiment analysis and translation?

TextBlob is a high-level Python library built on top of NLTK and Pattern that simplifies common NLP tasks. It provides an easy-to-use API for sentiment analysis, part-of-speech tagging, and noun phrase extraction. Older releases also offered translation and language detection by calling the Google Translate API, but those methods were deprecated and have been removed from recent versions, so translation now needs a separate library.
For example, sentiment analysis takes a single line of code and returns a polarity score in [-1, 1] and a subjectivity score in [0, 1], with no model training required. This simplicity makes TextBlob well suited to quick prototyping and small-scale NLP tasks.

## Question 3: Explain the role of Stanford NLP in academic and industry NLP projects.
The Stanford NLP Group develops widely used NLP toolkits, notably the Java-based CoreNLP and the Python library Stanza. They are used in both academia and industry because of their accuracy and linguistic depth.
In academia, Stanford NLP tools are used for research in syntactic parsing, named entity recognition, and semantic analysis. In industry, they support applications such as information extraction, chatbots, and document classification. The CoreNLP toolkit supports multiple languages and produces deep linguistic annotations such as part-of-speech tags, dependency parses, and coreference chains.

## Question 4: Describe the architecture and functioning of a Recurrent Neural Network (RNN).

A Recurrent Neural Network (RNN) is a neural network designed to process sequential data such as text or time series. Unlike feedforward networks, RNNs have loops that allow information to persist across time steps.
At each time step, the network takes the current input and the previous hidden state to produce a new hidden state. This enables RNNs to capture contextual information in sequences, making them suitable for NLP tasks like language modeling and text generation.
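However, standard RNNs suffer from vanishing gradients on long sequences: the error signal shrinks as it is propagated back through many time steps, so long-range dependencies are hard to learn.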
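The recurrence described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the common tanh formulation `h_t = tanh(W_xh x_t + W_hh h_prev + b_h)`; the function names, toy weights, and dimensions are made up for the example and are not taken from any library.

```python
import math


def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One time step: combine the current input with the previous hidden state."""
    h_t = []
    for i in range(len(b_h)):
        total = b_h[i]
        total += sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
        total += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(total))
    return h_t


def run_rnn(inputs, W_xh, W_hh, b_h):
    """Apply the same weights at every step, carrying the hidden state forward."""
    h = [0.0] * len(b_h)  # initial hidden state
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states


# Toy setup (hypothetical values): input size 2, hidden size 3
W_xh = [[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2]]
W_hh = [[0.1, 0.0, -0.2], [0.3, 0.4, 0.0], [0.0, -0.5, 0.2]]
b_h = [0.0, 0.1, -0.1]
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

states = run_rnn(sequence, W_xh, W_hh, b_h)
for t, h in enumerate(states, start=1):
    print(t, [round(v, 3) for v in h])
```

Because the hidden state is fed back in at each step, changing an early input changes the states that follow it, which is how the network carries context along the sequence.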

## Question 5: What is the key difference between LSTM and GRU networks in NLP applications?

LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) are advanced RNN architectures designed to overcome vanishing gradient problems.
LSTM uses three gates (input, forget, output) and a memory cell, making it powerful but computationally heavy.
GRU uses two gates (reset and update), making it simpler and faster.
Both perform well in NLP tasks, but GRUs are often preferred when computational efficiency matters, because they have fewer parameters per unit and train faster.
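The cost difference can be made concrete by counting trainable parameters. The sketch below assumes the textbook formulation, in which each gate or candidate block has its own input weights, recurrent weights, and one bias vector; the helper names are hypothetical, and some framework implementations add extra bias terms, so reported counts may differ slightly.

```python
def gated_rnn_params(input_dim, hidden_dim, num_blocks):
    """Parameters for num_blocks weight blocks (gates plus candidate state)."""
    per_block = hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim
    return num_blocks * per_block


def lstm_params(input_dim, hidden_dim):
    # Three gates (input, forget, output) plus the candidate cell state
    return gated_rnn_params(input_dim, hidden_dim, num_blocks=4)


def gru_params(input_dim, hidden_dim):
    # Two gates (reset, update) plus the candidate hidden state
    return gated_rnn_params(input_dim, hidden_dim, num_blocks=3)


# Example sizes: 16-dimensional inputs, 32 hidden units
print(lstm_params(16, 32))  # 6272
print(gru_params(16, 32))   # 4704
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same size.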

## Question 6: Sentiment analysis using TextBlob

In [1]:
from textblob import TextBlob

text = """I had a great experience using the new mobile banking app.
The interface is intuitive, and customer support was quick to resolve my issue.
However, the app did crash once during a transaction, which was frustrating"""

blob = TextBlob(text)

print("Polarity:", blob.sentiment.polarity)
print("Subjectivity:", blob.sentiment.subjectivity)


Polarity: 0.21742424242424244
Subjectivity: 0.6511363636363636
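Polarity ranges from -1 (most negative) to 1 (most positive), so the score above indicates mildly positive text overall. One way to turn the number into a label is a simple threshold; the helper below is a hypothetical sketch, and the 0.1 cutoff is an arbitrary assumption rather than anything defined by TextBlob.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to a coarse sentiment label."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


print(label_polarity(0.21742424242424244))  # positive
```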


## Question 7: Tokenization and Frequency Distribution using NLTK

In [3]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist

nltk.download('punkt')
nltk.download('punkt_tab')  # tokenizer data that word_tokenize looks up in newer NLTK releases

text = """Natural Language Processing (NLP) is a fascinating field that combines
linguistics, computer science, and artificial intelligence."""

tokens = word_tokenize(text.lower())
freq_dist = FreqDist(tokens)

print(freq_dist)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


<FreqDist with 20 samples and 21 outcomes>
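Printing a `FreqDist` only shows a summary: "samples" are the distinct tokens and "outcomes" are the total tokens counted. `FreqDist` subclasses `collections.Counter`, so the same counts can be sketched with the standard library alone. The regex tokenizer here is a rough stand-in for `word_tokenize`, used only so the example runs without NLTK data; it happens to split this sentence the same way but is not equivalent in general.

```python
import re
from collections import Counter

text = """Natural Language Processing (NLP) is a fascinating field that combines
linguistics, computer science, and artificial intelligence."""

# Words, or single punctuation characters, as separate tokens
tokens = re.findall(r"\w+|[^\w\s]", text.lower())
freq = Counter(tokens)

print(len(freq), sum(freq.values()))  # 20 21
print(freq.most_common(3))
```

The only token that occurs twice is the comma, which is why there are 20 samples for 21 outcomes. With NLTK, `freq_dist.most_common(n)` gives the same kind of listing.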


## Question 8: Basic LSTM model using Keras

In [6]:
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
import numpy as np

texts = [
    "I love this project",
    "This is an amazing experience",
    "I hate waiting in line",
    "This is the worst service",
    "Absolutely fantastic"
]

labels = [1, 1, 0, 0, 1]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
padded = pad_sequences(sequences)

labels = np.array(labels) # Convert labels to a NumPy array

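# Vocabulary size: one slot per known word plus index 0, which is used for padding
vocab_size = len(tokenizer.word_index) + 1

model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=16))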
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(padded, labels, epochs=10)


Epoch 1/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 2s/step - accuracy: 0.2000 - loss: 0.6949
Epoch 2/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 47ms/step - accuracy: 0.4000 - loss: 0.6930
Epoch 3/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 47ms/step - accuracy: 0.6000 - loss: 0.6910
Epoch 4/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 49ms/step - accuracy: 0.6000 - loss: 0.6891
Epoch 5/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 58ms/step - accuracy: 0.6000 - loss: 0.6871
Epoch 6/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 52ms/step - accuracy: 0.6000 - loss: 0.6851
Epoch 7/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 49ms/step - accuracy: 0.6000 - loss: 0.6830
Epoch 8/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 50ms/step - accuracy: 0.6000 - loss: 0.6809
Epoch 9/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m

<keras.src.callbacks.history.History at 0x78d826596750>
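The logged loss starts near 0.69, which is roughly what binary cross-entropy gives when a model predicts 0.5 for every example, i.e. ln 2. The sketch below recomputes that baseline by hand for the five labels used above; the function is a hypothetical helper written for illustration, not the Keras implementation.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy, with predictions clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 1, 0, 0, 1]
chance_loss = binary_cross_entropy(labels, [0.5] * len(labels))
print(round(chance_loss, 4))  # 0.6931
```

A loss that moves only slightly below this value suggests the model has barely moved away from guessing, which is expected with five training sentences and a handful of epochs.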

## Question 9: spaCy NLP pipeline

In [5]:
import spacy

nlp = spacy.load("en_core_web_sm")

text = """Homi Jehangir Bhaba was an Indian nuclear physicist who played a key role
in the development of India’s atomic energy program."""

doc = nlp(text)

for token in doc:
    # Skip whitespace tokens such as the line break inside the string
    if token.is_space:
        continue
    print(token.text, token.lemma_)

print("Entities:")
for ent in doc.ents:
    print(ent.text, ent.label_)


Homi Homi
Jehangir Jehangir
Bhaba Bhaba
was be
an an
Indian indian
nuclear nuclear
physicist physicist
who who
played play
a a
key key
role role
in in
the the
development development
of of
India India
’s ’s
atomic atomic
energy energy
program program
. .
Entities:
Homi Jehangir Bhaba FAC
Indian NORP
India GPE


## Question 10: Chatbot for mental health platform

Architecture:

- Input layer → Embedding layer → LSTM/GRU → Dense layer
- spaCy or Stanford NLP for entity recognition and intent detection

Data Preprocessing:

- Text cleaning and normalization
- Tokenization and lemmatization
- Padding sequences for uniform input

Ethical Considerations:

- User privacy and data security
- Avoiding harmful or biased responses
- Clear disclaimers (not a replacement for professional help)
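The preprocessing steps listed above can be sketched with the standard library. This is a minimal illustration under stated assumptions: the function names, the `<pad>`/`<unk>` ids, and the sample messages are all hypothetical, and lemmatization is left out because it needs a library such as spaCy.

```python
import re

PAD_ID = 0  # reserved id for padding
UNK_ID = 1  # reserved id for words not seen when building the vocabulary


def clean_text(text):
    """Lowercase, drop punctuation except apostrophes, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s']", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def build_vocab(texts):
    """Assign an integer id to each distinct token, in order of first appearance."""
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for text in texts:
        for token in clean_text(text).split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab, max_len):
    """Convert text to ids, truncate to max_len, then pad at the end."""
    ids = [vocab.get(tok, UNK_ID) for tok in clean_text(text).split()]
    ids = ids[:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


messages = ["I feel anxious today!", "Can't sleep...  again"]
vocab = build_vocab(messages)
batch = [encode(m, vocab, max_len=6) for m in messages]
print(batch)  # [[2, 3, 4, 5, 0, 0], [6, 7, 8, 0, 0, 0]]
```

Padding is added at the end here for readability; Keras `pad_sequences` pads at the front by default, so the two are not interchangeable without setting `padding='post'`.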