# Perceptron

Using a perceptron to perform binary sentiment classification of restaurant reviews.
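A perceptron keeps a weight vector `w` and a bias `b`. It predicts 1 when `w·x + b > 0` and, on each mistake, moves the weights toward the misclassified example. The pure-Python sketch below shows that textbook update rule on a toy, linearly separable dataset (logical AND). The function names and the `epochs`/`lr` defaults are illustrative assumptions. scikit-learn's `Perceptron`, used in the first cell, applies the same mistake-driven idea through its SGD machinery, so treat this as an explanation of the algorithm rather than its exact implementation.

```python
def train_perceptron(X, y, epochs=10, lr=1.0):
    """Textbook perceptron learning rule for 0/1 labels."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b
            y_hat = 1 if activation > 0 else 0
            update = lr * (yi - y_hat)  # 0 when correct, +lr or -lr on a mistake
            if update != 0:
                w = [wj + update * xj for wj, xj in zip(w, xi)]
                b += update
                errors += 1
        if errors == 0:  # every training example is classified correctly
            break
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


# Toy, linearly separable data: logical AND
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1]
```

On the review data the inputs are TF-IDF vectors instead of 0/1 pairs, but the update is the same: only misclassified reviews change the weights.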

Before running the notebook, install the dependencies:

 **pip install scikit-learn nltk**

 **pip install transformers torch**


In [1]:
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score

# Expanded sample labeled reviews
reviews = [
    "The pasta was cooked perfectly, but the service was extremely slow.",
    "Absolutely loved the sushi! Fresh and delicious. Will visit again.",
    "The burger was overpriced and tasted bland. Not worth it.",
    "Fantastic ambiance with great music, but the food was mediocre.",
    "The waiter was rude and inattentive, but the steak was amazing.",
    "Horrible experience! Cold food, bad service, never coming back.",
    "Best pizza in town! Crispy crust and the perfect amount of cheese.",
    "The dessert was heavenly, but the portions were too small.",
    "Waited 40 minutes for food, and when it arrived, it was cold.",
    "Great value for money! Generous portions and excellent quality."
]

# Corresponding labels: 1 = Positive, 0 = Negative
labels = [0, 1, 0, 0, 0, 0, 1, 0, 0, 1]
    
# Convert text to numerical vectors
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(reviews).toarray()
y = labels

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Perceptron
model = Perceptron()
model.fit(X_train, y_train)

# Evaluate model performance
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")

# Predict sentiment for new reviews
new_reviews = [
    "The food was exceptional, and the staff was super friendly!",
    "Terrible experience. Food was late and tasted awful.",
    "Amazing atmosphere but the drinks were overpriced."
]

new_X = vectorizer.transform(new_reviews).toarray()
predictions = model.predict(new_X)

# Print results
for review, prediction in zip(new_reviews, predictions):
    sentiment = "Positive" if prediction == 1 else "Negative"
    print(f"Review: {review}\nSentiment: {sentiment}\n")


Model Accuracy: 0.67
Review: The food was exceptional, and the staff was super friendly!
Sentiment: Negative

Review: Terrible experience. Food was late and tasted awful.
Sentiment: Negative

Review: Amazing atmosphere but the drinks were overpriced.
Sentiment: Negative



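Perceptron: the results are not reliable. With only ten labeled reviews, test_size=0.3 leaves three test examples, so the 0.67 accuracy means two out of three correct. The model also labeled all three new reviews Negative, including the clearly positive first one: words such as "exceptional", "staff", and "friendly" never appear in the ten labeled reviews, so vectorizer.transform ignores them and the prediction rests on common words like "the", "food", and "was".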
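With so few examples, a single train/test split is noisy. A steadier estimate comes from k-fold cross-validation, sketched below with scikit-learn's `cross_val_score` and `StratifiedKFold`. The choice of three folds and the `random_state` values are assumptions picked for this tiny dataset. Wrapping `TfidfVectorizer` and `Perceptron` in a pipeline also means the vocabulary and IDF weights are learned from each training fold only; the first cell fits the vectorizer on all ten reviews before splitting, so the test reviews influence the features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Perceptron
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Same ten labeled reviews as the first cell (1 = Positive, 0 = Negative)
reviews = [
    "The pasta was cooked perfectly, but the service was extremely slow.",
    "Absolutely loved the sushi! Fresh and delicious. Will visit again.",
    "The burger was overpriced and tasted bland. Not worth it.",
    "Fantastic ambiance with great music, but the food was mediocre.",
    "The waiter was rude and inattentive, but the steak was amazing.",
    "Horrible experience! Cold food, bad service, never coming back.",
    "Best pizza in town! Crispy crust and the perfect amount of cheese.",
    "The dessert was heavenly, but the portions were too small.",
    "Waited 40 minutes for food, and when it arrived, it was cold.",
    "Great value for money! Generous portions and excellent quality.",
]
labels = [0, 1, 0, 0, 0, 0, 1, 0, 0, 1]

# The vectorizer is refit inside every training fold, so held-out reviews never shape the features.
pipeline = make_pipeline(TfidfVectorizer(), Perceptron(random_state=42))

# Stratified folds keep the positive/negative ratio similar in each fold (3 positives -> 1 per fold).
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, reviews, labels, cv=cv)

print("Fold accuracies:", scores.round(2))
print(f"Mean accuracy: {scores.mean():.2f} (+/- {scores.std():.2f})")
```

Every review is scored exactly once across the folds, so the mean uses all ten examples instead of three; with data this small it is still only a rough indication.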

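## VADER
VADER (Valence Aware Dictionary and sEntiment Reasoner) gives better results here because it scores text with a pre-built sentiment lexicon and rules instead of learning from a handful of labeled examples: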
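VADER sums the valences of the lexicon words it finds in a sentence (after adjusting for rules such as negation, intensifiers, "but", and punctuation) and squashes that sum into the `compound` score in [-1, 1] with `x / sqrt(x² + alpha)`, where `alpha = 15` in the reference implementation. The sketch below applies only that final normalization step. The valences used for "terrible" (-2.1) and "awful" (-2.0) are assumptions about the lexicon, not values shown in this notebook; with them, the formula reproduces the -0.7269 compound score printed for "Terrible experience. Food was late and tasted awful."

```python
import math


def vader_normalize(score, alpha=15):
    """Map an unbounded valence sum into [-1, 1], as VADER does for 'compound'."""
    norm_score = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm_score))


# Assumed lexicon valences (not taken from this notebook)
valences = {"terrible": -2.1, "awful": -2.0}
raw_sum = sum(valences.values())

print(round(vader_normalize(raw_sum), 4))  # -0.7269
```

Because the curve flattens as the sum grows, adding more negative words pushes the compound score toward -1 with diminishing effect.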

In [2]:
from nltk.sentiment import SentimentIntensityAnalyzer
import nltk

# Download the required package
nltk.download('vader_lexicon')

# Initialize Sentiment Analyzer
sia = SentimentIntensityAnalyzer()

# Reviews to analyze
reviews = [
    "Carve Cafe food is very bad in taste and greasy.",
    "Terrible experience. Food was late and tasted awful.",
    "Morrison serves the best chicken sandwich in town!",
    "The food was exceptional, and the staff was super friendly!",
    "The pasta was cooked perfectly, but the service was extremely slow.",  # VADER misses the complaint: "slow" adds no negative score (neg = 0.0), so this review is labeled Positive
    "The burger was greasy and the fries were soggy.",  # No sentiment words found (compound = 0.0); labeled Negative only because 0 falls into the else branch below
] 

# Analyze Sentiment
for review in reviews:
    sentiment_score = sia.polarity_scores(review)
    sentiment = "Positive" if sentiment_score['compound'] > 0 else "Negative"
    print(f"Review: {review}\nSentiment: {sentiment} (Score: {sentiment_score})\n")


Review: Carve Cafe food is very bad in taste and greasy.
Sentiment: Negative (Score: {'neg': 0.296, 'neu': 0.704, 'pos': 0.0, 'compound': -0.5849})

Review: Terrible experience. Food was late and tasted awful.
Sentiment: Negative (Score: {'neg': 0.504, 'neu': 0.496, 'pos': 0.0, 'compound': -0.7269})

Review: Morrison serves the best chicken sandwich in town!
Sentiment: Positive (Score: {'neg': 0.0, 'neu': 0.609, 'pos': 0.391, 'compound': 0.6696})

Review: The food was exceptional, and the staff was super friendly!
Sentiment: Positive (Score: {'neg': 0.0, 'neu': 0.52, 'pos': 0.48, 'compound': 0.8122})

Review: The pasta was cooked perfectly, but the service was extremely slow.
Sentiment: Positive (Score: {'neg': 0.0, 'neu': 0.794, 'pos': 0.206, 'compound': 0.3818})

Review: The burger was greasy and the fries were soggy.
Sentiment: Negative (Score: {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0})



[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/arsh/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
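The cell above treats a compound score of exactly 0 as Negative, which is why the burger review, with no sentiment words at all, is labeled Negative. A common alternative, the convention suggested in the VADER project's documentation, adds a Neutral band: compound >= 0.05 is Positive, compound <= -0.05 is Negative, and anything in between is Neutral. The sketch below applies that rule to the compound scores printed above; the 0.05 threshold is a convention, not a tuned value, and `label_from_compound` is a hypothetical helper name.

```python
def label_from_compound(compound, threshold=0.05):
    """Three-way label from a VADER compound score using a symmetric neutral band."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


# Compound scores copied from the output of the VADER cell
compound_scores = {
    "Carve Cafe food is very bad in taste and greasy.": -0.5849,
    "Terrible experience. Food was late and tasted awful.": -0.7269,
    "Morrison serves the best chicken sandwich in town!": 0.6696,
    "The food was exceptional, and the staff was super friendly!": 0.8122,
    "The pasta was cooked perfectly, but the service was extremely slow.": 0.3818,
    "The burger was greasy and the fries were soggy.": 0.0,
}

for review, compound in compound_scores.items():
    print(f"{label_from_compound(compound):8} {compound:+.4f}  {review}")
```

With this rule the burger review becomes Neutral, while the pasta review stays Positive: a threshold cannot recover a complaint the lexicon never scored.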


# K-Means Clustering and TF-IDF

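TF-IDF Vectorization:
Reviews are transformed into numerical features that weight each word and two-word phrase (bigram, from ngram_range=(1,2)) by how often it appears in a review and how rare it is across all reviews, while common English stop words are dropped (stop_words="english").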
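To make the weighting concrete, the sketch below recomputes TF-IDF in plain Python following scikit-learn's default `TfidfVectorizer` settings (`smooth_idf=True`, `norm="l2"`): raw term counts multiplied by a smoothed IDF, `idf(t) = ln((1 + n) / (1 + df(t))) + 1`, then each row scaled to unit length. It is simplified on purpose: unigrams only and no stop-word list, unlike the settings in the cell below, and the three toy documents are made up for illustration.

```python
import math
import re
from collections import Counter

# Same pattern as TfidfVectorizer's default token_pattern: words of 2+ characters
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tfidf(docs):
    """Return (vocabulary, idf per term, L2-normalized TF-IDF rows)."""
    tokenized = [TOKEN_RE.findall(doc.lower()) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm if norm else 0.0 for v in row])
    return vocab, idf, rows


docs = ["greasy burger", "greasy pizza", "crispy chicken"]
vocab, idf, vectors = tfidf(docs)
for term in vocab:
    print(f"{term:8} idf={idf[term]:.4f}")
```

"greasy" appears in two of the three documents, so its IDF is lower than that of "burger", and within the first document "burger" receives the larger weight.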

K-Means Clustering:
The reviews are grouped into clusters. You can adjust num_clusters based on how many distinct topics/products you expect (for example, clusters might naturally emerge for "chicken", "greasy food", "pasta", etc.).
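K-Means alternates two steps until the assignments stop changing: assign each point to its nearest centroid, then move each centroid to the mean of the points assigned to it. The plain-Python sketch below shows that loop (Lloyd's algorithm) on made-up 2-D points. For simplicity it initializes the centroids with the first `k` points; scikit-learn's `KMeans` instead defaults to k-means++ initialization and works directly on the sparse TF-IDF matrix, so this illustrates the algorithm rather than replacing it.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with a deterministic first-k initialization (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = []
        for p in points:
            distances = [squared_distance(p, c) for c in centroids]
            new_labels.append(distances.index(min(distances)))
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


points = [[0, 0], [10, 10], [0, 1], [1, 0], [10, 11], [11, 10]]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

On TF-IDF rows the points simply have one coordinate per term, so reviews that share heavily weighted terms such as "greasy" or "chicken" end up near the same centroid.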

Top Terms Extraction:
For each cluster, we sort the cluster center’s feature weights in descending order and print the ten highest-weighted terms. These terms indicate what the cluster is mainly discussing.
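The ranking step is just a descending sort of one centroid's weights. The helper below (a hypothetical name, not part of scikit-learn) does the same job as `kmeans.cluster_centers_[i].argsort()[::-1][:10]` in the cell below, except that terms with tied weights may come out in a different order. The terms and centroid values are invented for illustration.

```python
def top_terms(centroid, terms, n=10):
    """Return the n terms with the largest centroid weights, highest first."""
    ranked = sorted(range(len(centroid)), key=lambda i: centroid[i], reverse=True)
    return [terms[i] for i in ranked[:n]]


terms = ["bland", "chicken", "crispy", "pasta"]
centroid = [0.10, 0.50, 0.30, 0.0]  # made-up weights for one cluster
print(top_terms(centroid, terms, n=3))  # ['chicken', 'crispy', 'bland']
```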

In [4]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Sample restaurant reviews focusing on product mentions
reviews = [
    "I love the crispy chicken wings and the spicy sauce.",
    "The burger was greasy and the fries were soggy.",
    "Amazing grilled chicken salad with fresh veggies.",
    "The pizza was too greasy, but the crust was perfect.",
    "Chicken tenders here are delicious and crispy.",
    "I found the pasta a bit bland and the sauce too oily.",
    "The fried chicken is superb, perfectly seasoned.",
    "Greasy food isn't my thing, especially when it's overcooked.",
    "Loved the chicken wrap; it was light and full of flavor.",
    "The deep-fried chicken was overly greasy and unappetizing."
]

# Step 1: Convert text to TF-IDF features
vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1,2))
X = vectorizer.fit_transform(reviews)

# Step 2: Apply K-Means clustering
num_clusters = 3  # Adjust as needed for your dataset
kmeans = KMeans(n_clusters=num_clusters, random_state=42)
clusters = kmeans.fit_predict(X)

# Step 3: Output the reviews grouped by cluster and list top terms per cluster
terms = vectorizer.get_feature_names_out()

for i in range(num_clusters):
    print(f"--- Cluster {i} ---")
    cluster_reviews = [reviews[j] for j in range(len(reviews)) if clusters[j] == i]
    for review in cluster_reviews:
        print(review)
    
    # Identify top terms in the cluster
    # Note: We use the cluster centers to rank terms. They are in the TF-IDF space.
    order_centroids = kmeans.cluster_centers_[i].argsort()[::-1]
    top_terms = [terms[ind] for ind in order_centroids[:10]]
    print("\nTop terms in this cluster:", top_terms)
    print("\n")


--- Cluster 0 ---
The burger was greasy and the fries were soggy.
The pizza was too greasy, but the crust was perfect.
Greasy food isn't my thing, especially when it's overcooked.

Top terms in this cluster: ['greasy', 'crust', 'soggy', 'greasy crust', 'greasy fries', 'crust perfect', 'perfect', 'fries', 'pizza', 'pizza greasy']


--- Cluster 1 ---
Amazing grilled chicken salad with fresh veggies.
The fried chicken is superb, perfectly seasoned.
Loved the chicken wrap; it was light and full of flavor.
The deep-fried chicken was overly greasy and unappetizing.

Top terms in this cluster: ['chicken', 'fried chicken', 'fried', 'superb perfectly', 'seasoned', 'perfectly seasoned', 'superb', 'perfectly', 'chicken superb', 'chicken wrap']


--- Cluster 2 ---
I love the crispy chicken wings and the spicy sauce.
Chicken tenders here are delicious and crispy.
I found the pasta a bit bland and the sauce too oily.

Top terms in this cluster: ['crispy', 'sauce', 'tenders delicious', 'chicken tende