# Sentiment Analysis on Yelp Reviews

This notebook demonstrates how to perform sentiment analysis on Yelp reviews using `nlptown/bert-base-multilingual-uncased-sentiment`, a pre-trained BERT model that predicts a rating from 1 to 5 stars.


In [None]:
# Check that the required packages are importable before running the rest of the notebook
import importlib.util

required_packages = ['transformers', 'torch', 'selenium', 'bs4', 'pandas']

# Collect every package whose import name cannot be resolved
missing_packages = [
    package for package in required_packages
    if importlib.util.find_spec(package) is None
]

if missing_packages:
    print("The following required packages are missing:")
    for package in missing_packages:
        print(f"- {package}")
    print("\nPlease install the missing packages using the instructions in the README.md file.")
    print("You can run the following command in your terminal:")
    print("pip install -r requirements.txt")
    # Raise instead of calling sys.exit(), which is meant for scripts rather than notebook kernels
    raise ImportError(f"Missing required packages: {', '.join(missing_packages)}")
else:
    print("All required packages are installed. You're good to go!")
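
The names checked above are import names, which do not always match the name used with pip. A minimal sketch of how the check could also report install names follows; the `PIP_NAMES` mapping and the `find_missing` helper are illustrative assumptions, not part of the notebook's README or requirements file.

```python
import importlib.util

# Hypothetical mapping from import name to pip distribution name.
# Only names that differ need an entry; 'bs4' is installed as 'beautifulsoup4'.
PIP_NAMES = {'bs4': 'beautifulsoup4'}

def find_missing(import_names):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        PIP_NAMES.get(name, name)
        for name in import_names
        if importlib.util.find_spec(name) is None
    ]

missing = find_missing(['transformers', 'torch', 'selenium', 'bs4', 'pandas'])
if missing:
    print("pip install " + " ".join(missing))
```

`find_spec` only locates a top-level package without importing it, so the check stays fast even for large libraries such as `torch`.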


## Data Loading


In [None]:
import pandas as pd

# Sample DataFrame
data = {
    'review': [
        "I had to get Chili Crab while I was in Singapore. So glad I did!",
        "You must try chili crab in Singapore if you're a fan of spicy seafood!",
        "My gf found this place on yelp and saw the amazing ratings and decided to give it a try.",
        "We went there on February the 5th, at about 1pm. The place was packed, but we got a table quickly.",
        "One of the best places to try chili crab. They have a variety of dishes and the ambiance is great."
    ]
}
df = pd.DataFrame(data)
df.head()


## Model Loading and Scoring


In [None]:
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

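# Load tokenizer and model
MODEL_NAME = 'nlptown/bert-base-multilingual-uncased-sentiment'
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

def sentiment_score(review):
    """Return the predicted star rating (1-5) for a single review string."""
    # BERT accepts at most 512 tokens, so cap the input length explicitly
    tokens = tokenizer.encode(review, return_tensors='pt', truncation=True, max_length=512)
    # Inference only, so skip gradient tracking
    with torch.no_grad():
        result = model(tokens)
    # Logit indices 0-4 correspond to 1-5 stars
    return int(torch.argmax(result.logits)) + 1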
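
To make the `argmax` plus one step concrete: the classifier outputs five logits, one per star rating, and `sentiment_score` returns the position of the largest one shifted onto a 1-5 scale. The sketch below reproduces that mapping in plain Python on made-up logits (the numbers are illustrative, not real model output). It also shows how a softmax could turn the logits into a confidence value, which the notebook itself does not compute.

```python
import math

def stars_from_logits(logits):
    """Map five class logits to a 1-5 star rating plus a softmax confidence."""
    # Subtract the maximum before exponentiating for numerical stability
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index 0 is 1 star, index 4 is 5 stars
    best = max(range(len(probs)), key=probs.__getitem__)
    return best + 1, probs[best]

# Made-up logits for illustration only
rating, confidence = stars_from_logits([-2.1, -1.3, 0.2, 1.9, 2.4])
print(rating, round(confidence, 2))
```

A low confidence on the winning class suggests the review sits between two ratings, which can be worth inspecting by hand.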


## Web Scraping


In [None]:
from selenium import webdriver
from bs4 import BeautifulSoup
import time
import re

# Setup Selenium WebDriver
driver = webdriver.Chrome()  # Make sure to have the appropriate driver installed

url = 'https://www.yelp.com/biz/holycrab-singapore?osq=Restaurants'

try:
    # Open the webpage
    driver.get(url)

    # Wait for the JavaScript to load (adjust as needed)
    time.sleep(5)

    # Get the page source and parse it with BeautifulSoup
    soup = BeautifulSoup(driver.page_source, 'html.parser')
finally:
    # Close the browser even if loading or parsing fails
    driver.quit()

# Find all <p> tags that have a class starting with 'comment__'
reviews = soup.find_all('p', class_=re.compile('^comment__'))

# Extract text from each review
reviews_text = [review.text for review in reviews]

# Convert to DataFrame
scraped_df = pd.DataFrame({'review': reviews_text})
scraped_df.head()
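
Scraped text often carries stray whitespace, empty strings, or repeated entries. A minimal cleanup sketch that could be applied to `reviews_text` before building the DataFrame is shown below. The `clean_reviews` helper and the sample strings are hypothetical additions, not part of the original notebook.

```python
import re

def clean_reviews(texts):
    """Collapse whitespace, drop empty strings, and remove exact duplicates (order kept)."""
    seen = set()
    cleaned = []
    for text in texts:
        # \s matches Unicode whitespace, including non-breaking spaces
        normalized = re.sub(r'\s+', ' ', text).strip()
        if normalized and normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned

# Made-up strings for illustration only
sample = ["  Great chili crab!\n", "", "Great chili crab!", "Long\xa0wait,  but worth it."]
print(clean_reviews(sample))
```

In the notebook this would slot in as `pd.DataFrame({'review': clean_reviews(reviews_text)})`. An empty result is also a useful signal that the page layout or class names have changed.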


## Sentiment Analysis


In [None]:
# Apply the sentiment_score function to each review
scraped_df['sentiment'] = scraped_df['review'].apply(sentiment_score)
scraped_df


## Results

### Sentiment Analysis Results

The `sentiment` column of the DataFrame above holds the predicted score for each scraped review, from 1 (very negative) to 5 (very positive).
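
A per-review table is hard to read at a glance, so a short summary can help. The sketch below computes the mean score and counts reviews per coarse label. The grouping (1-2 negative, 3 neutral, 4-5 positive) and the `summarize_scores` helper are assumptions made for illustration, and the scores are made up rather than taken from a real run.

```python
from collections import Counter
from statistics import mean

def summarize_scores(scores):
    """Return the mean score and a count of reviews per coarse label."""
    def label(score):
        # Assumed grouping: 1-2 negative, 3 neutral, 4-5 positive
        if score <= 2:
            return 'negative'
        if score == 3:
            return 'neutral'
        return 'positive'

    counts = Counter(label(s) for s in scores)
    return mean(scores), dict(counts)

# Made-up scores for illustration; in the notebook, pass scraped_df['sentiment'].tolist()
average, counts = summarize_scores([5, 4, 4, 2, 3, 5])
print(round(average, 2), counts)
```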


## Conclusion

In this notebook, we demonstrated how to perform sentiment analysis on Yelp reviews using a pre-trained BERT model. The process involved building a sample DataFrame, loading the tokenizer and model, scraping reviews from Yelp with Selenium and BeautifulSoup, and scoring each scraped review on a scale from 1 to 5.
