<a href="https://colab.research.google.com/github/Susruth-23BCE5060/genAI-proj/blob/main/proj.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:


# STEP 2: Import and Upload Dataset
import pandas as pd
import spacy
from google.colab import files
from transformers import pipeline

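# Opens Colab's file picker; the chosen file is saved to the working directory
uploaded = files.upload()
# Assumes the uploaded file kept this exact name
df = pd.read_csv("hotel_reviews_aspect_dataset_3001.csv")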
print("Loaded dataset:")
print(df.head())

# STEP 3: Load Models
nlp = spacy.load("en_core_web_sm")

# Zero-shot classification scores arbitrary candidate labels, so one model
# covers positive, negative, and neutral without any fine-tuning
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Define common aspects
aspects = ['room', 'food', 'service', 'location', 'price']

# STEP 4: Define helper functions
def find_aspects(text):
    found = []
    for aspect in aspects:
        if aspect in text.lower():
            found.append(aspect)
    return found

# Classify a sentence as positive/negative/neutral and return the top-scoring label
def get_sentiment_zero_shot(text):
    candidate_labels = ["positive", "negative", "neutral"]
    result = classifier(text, candidate_labels)
    return result['labels'][0]  # highest confidence label

# STEP 5: Process each sentence for aspect & sentiment
results = []

for review in df['review']:
    doc = nlp(review)
    for sent in doc.sents:
        sentence = sent.text.strip()
        matched_aspects = find_aspects(sentence)
        if matched_aspects:
            try:
                sentiment = get_sentiment_zero_shot(sentence)
                for aspect in matched_aspects:
                    results.append({
                        # Note: this stores the full review text, not only the matched sentence
                        "Sentence": review,
                        "Aspect": aspect,
                        "Sentiment": sentiment
                    })
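            except Exception as exc:
                # Skip sentences the classifier fails on, but report them instead of failing silently
                print(f"Skipping sentence due to error: {exc}")
                continue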

# STEP 6: Save Results
result_df = pd.DataFrame(results)
print(result_df.head(10))

result_df.to_csv("aspect_sentiment_transformer_output.csv", index=False)
files.download("aspect_sentiment_transformer_output.csv")

Saving hotel_reviews_aspect_dataset_3001.csv to hotel_reviews_aspect_dataset_3001.csv
Loaded dataset:
                                              review
0                 Food was cold but Affordable rates
1           Staff was rude but I loved the breakfast
2          The food was delicious but Staff was rude
3  Too far from attractions but Overpriced stay b...
4                                  Difficult to find


Device set to use cpu


                                            Sentence Aspect Sentiment
0                 Food was cold but Affordable rates   food  positive
1          The food was delicious but Staff was rude   food  negative
2  Too far from attractions but Overpriced stay b...  price  negative
3      The food was delicious but The room was noisy   room  negative
4      The food was delicious but The room was noisy   food  negative
5            Great value for money but Food was cold   food  negative
6  Overpriced stay but The food was delicious but...   room  negative
7  Overpriced stay but The food was delicious but...   food  negative
8  Overpriced stay but The food was delicious but...  price  negative
9  Room smelled bad but Great location near downtown   room  negative


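The review "Staff was rude but I loved the breakfast" appears in the loaded data but not in the results, since neither "staff" nor "breakfast" is one of the five aspect strings. A minimal sketch of a keyword-based matcher follows; the keyword lists are illustrative assumptions drawn from the sample rows shown above, not a vetted lexicon, and `find_aspects_with_keywords` is a hypothetical name.

```python
# Hypothetical keyword lists: each aspect maps to substrings that signal it
ASPECT_KEYWORDS = {
    "room": ["room"],
    "food": ["food", "breakfast"],
    "service": ["service", "staff"],
    "location": ["location", "attractions", "downtown"],
    "price": ["price", "rates", "value for money"],
}

def find_aspects_with_keywords(text, keyword_map=ASPECT_KEYWORDS):
    """Return every aspect whose keyword list has at least one substring match."""
    lowered = text.lower()
    return [
        aspect
        for aspect, keywords in keyword_map.items()
        if any(keyword in lowered for keyword in keywords)
    ]

found = find_aspects_with_keywords("Staff was rude but I loved the breakfast")
print(found)
# ['food', 'service']
```

Like the original `find_aspects`, this uses plain substring matching, so "price" still matches "Overpriced".
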
In [4]:
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report

# Upload files via Colab file upload widget
from google.colab import files
uploaded = files.upload()

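# Load both datasets.
# Note: if a file with the same name already exists, Colab saves the new upload
# under a suffixed name such as "name (2).csv", so these fixed paths read the earlier copy.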
original_df = pd.read_csv("aspect_sentiment_transformer_output-2.csv")
corrected_df = pd.read_csv("aspect_sentiment_transformer_output_corrected.csv")

# Sanity check: both files must have the same number of rows
# (this does not verify that the rows appear in the same order)
assert len(original_df) == len(corrected_df), "Datasets have different lengths!"

# Compute accuracy
accuracy = accuracy_score(corrected_df['Sentiment'], original_df['Sentiment'])
print(f"Model Accuracy: {accuracy:.2%}")

# Detailed classification report
print("\nClassification Report:")
print(classification_report(corrected_df['Sentiment'], original_df['Sentiment']))

Saving aspect_sentiment_transformer_output_corrected.csv to aspect_sentiment_transformer_output_corrected (2).csv
Saving aspect_sentiment_transformer_output-2.csv to aspect_sentiment_transformer_output-2 (2).csv
Model Accuracy: 67.22%

Classification Report:
              precision    recall  f1-score   support

    negative       0.49      0.88      0.63       306
    positive       0.91      0.58      0.71       658

    accuracy                           0.67       964
   macro avg       0.70      0.73      0.67       964
weighted avg       0.78      0.67      0.68       964
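
The report's figures can be cross-checked by hand: accuracy equals the support-weighted average of per-class recall, and the macro average is the unweighted mean. The sketch below uses only the rounded values printed above, so it should agree with the reported 67.22% only to within rounding error.

```python
# Recall and support copied from the classification report above
report = {
    "negative": {"recall": 0.88, "support": 306},
    "positive": {"recall": 0.58, "support": 658},
}

total = sum(row["support"] for row in report.values())

# Support-weighted recall: each class contributes recall * support correct rows
weighted_recall = sum(row["recall"] * row["support"] for row in report.values()) / total

# Macro recall: plain mean over classes, ignoring class sizes
macro_recall = sum(row["recall"] for row in report.values()) / len(report)

print(f"total rows:      {total}")
print(f"weighted recall: {weighted_recall:.3f}")
print(f"macro recall:    {macro_recall:.2f}")
```

The weighted value comes out near 0.675, consistent with the reported accuracy given two-decimal rounding of the recalls, and the macro value matches the 0.73 in the `macro avg` row.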

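The low precision for `negative` (0.49) suggests that many rows labeled positive in the corrected file were predicted negative, but the report does not show the counts directly. A minimal sketch for tallying (true, predicted) pairs is given below. The label lists are toy data for illustration only, and `confusion_counts` is a hypothetical helper; `sklearn.metrics.confusion_matrix` provides the same information as an array.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return Counter(zip(y_true, y_pred))

# Toy labels, not taken from the dataset
y_true = ["positive", "positive", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "negative"]

counts = confusion_counts(y_true, y_pred)
for (true_label, pred_label), n in sorted(counts.items()):
    print(f"true={true_label:<9} pred={pred_label:<9} count={n}")
```

With the notebook's data, the call would presumably be `confusion_counts(corrected_df['Sentiment'], original_df['Sentiment'])`, keeping the corrected labels as the ground truth.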
