# Yelp Sentiment Analysis with BERT

## Project Overview:

In this project, I ran sentiment analysis on Yelp reviews using a pre-trained BERT model. The goal was to score the sentiment of user reviews for several dessert businesses on Yelp and gain insight into customer opinions.

## Key Components:

### 1. Web Scraping:

- Used web scraping to extract reviews from the Yelp pages of different dessert businesses.
- Employed the BeautifulSoup library to parse HTML content and retrieve relevant information.

### 2. Sentiment Analysis with BERT:

- Implemented sentiment analysis using BERT, a powerful transformer-based model for natural language processing.
- Utilized the Hugging Face Transformers library to access a pre-trained BERT model for sequence classification.
- Tokenized the reviews, passed them through the BERT model, and took the highest-scoring class as a sentiment score from 1 (most negative) to 5 (most positive).

### 3. Data Storage and Manipulation:

- Employed the pandas library to store and manipulate data in a structured format.
- Created a DataFrame to store reviews and their corresponding sentiment scores.
- Iterated over multiple Yelp URLs, collected reviews, and appended them to the DataFrame.

## Technologies Used:

- Python
- BeautifulSoup
- Hugging Face Transformers
- Pandas

## Conclusion:

This Yelp Sentiment Analysis project not only showcases technical skills in web scraping and natural language processing but also highlights the ability to extract meaningful insights from unstructured data. The sentiment scores can be leveraged by businesses to understand customer sentiments and improve overall customer satisfaction.

In [1]:
# Import the necessary libraries
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import requests
from bs4 import BeautifulSoup
import re
import numpy as np
import pandas as pd

### Loading BERT

In [2]:
# Load the pre-trained tokenizer for sentiment analysis
tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

# Load the pre-trained model for sentiment analysis
model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

### Testing sentiment analysis on one line

In [3]:
# Tokenize the input text using the pre-trained tokenizer
text = 'this movie was the greatest i have ever seen in my life'
tokens = tokenizer.encode(text, return_tensors='pt')

In [4]:
# Pass the encoded tokens through the pre-trained sentiment analysis model
result = model(tokens)

In [5]:
# Access the raw output logits from the model's result
logits = result.logits

In [6]:
# Find the index of the maximum logit score and add 1 to get the predicted sentiment class (1 to 5)
predicted_sentiment_class = int(torch.argmax(logits)) + 1
print(f'The predicted sentiment for: "{text}" is {predicted_sentiment_class}')

The predicted sentiment for: "this movie was the greatest i have ever seen in my life" is 5
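
Taking the argmax discards how confident the model is in its prediction. A minimal sketch of turning five logits into probabilities with a softmax follows. The logit values are made up for illustration and are not the model's actual output; with the real model, `torch.softmax(result.logits, dim=-1)` performs the same computation.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    # Subtract the max for numerical stability before exponentiating
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the five sentiment classes (1 to 5)
example_logits = [-2.0, -1.0, 0.0, 1.5, 3.0]

probabilities = softmax(example_logits)

# Index of the largest probability, plus 1, gives the sentiment class
predicted_stars = probabilities.index(max(probabilities)) + 1

print(f'Predicted class: {predicted_stars}')
print('Probabilities:', [round(p, 3) for p in probabilities])
```

A probability spread across neighbouring classes suggests a mixed review, which a single class label hides.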


### Web scraping

In [7]:
def get_reviews_from_yelp(url):
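    # Make a GET request to the Yelp page for a specific business
    r = requests.get(url, timeout=10)
    # Raise an error on a failed request instead of silently returning no reviews
    r.raise_for_status()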

    # Parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(r.text, 'html.parser')

    # Define a regular expression for matching class names containing 'comment'
    regex = re.compile('.*comment.*')

    # Find all 'p' (paragraph) elements with class names matching the regex pattern
    results = soup.find_all('p', {'class': regex})

    # Extract the text content from each matching element and store it in the 'reviews' list
    reviews = [result.text for result in results]

    return reviews
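
To see what the class regex selects without requesting a live page, the same lookup can be run on a small inline HTML string. The markup and class names below are invented for illustration; Yelp's real class names differ and may change over time.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a Yelp business page
sample_html = """
<div>
  <p class="comment__abc123 css-1">Great cookies, served warm.</p>
  <p class="css-2">Business hours: 9am to 9pm</p>
  <p class="comment__abc123 css-1">Too sweet for my taste.</p>
  <p>Footer text without a class</p>
</div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

# Same pattern as get_reviews_from_yelp: any class name containing 'comment'
regex = re.compile('.*comment.*')
results = soup.find_all('p', {'class': regex})

# Keep only the text of the matching paragraphs
sample_reviews = [result.text for result in results]
print(sample_reviews)
```

Paragraphs whose class does not contain `comment`, or that have no class at all, are skipped. Any other element that happens to use a matching class would be collected as well, so the scraped rows are worth checking by eye.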

In [8]:
# List the URLs of the businesses we want to scrape reviews from
yelp_urls = [
    'https://www.yelp.com/biz/crumbl-cookies-salt-lake-city-salt-lake-city-3?osq=crumbl+cookies',
    'https://www.yelp.com/biz/twisted-sugar-salt-lake-city-2?osq=twisted+sugar',
    'https://www.yelp.com/biz/swig-salt-lake-city-4?osq=Swig+Soda'
]

In [9]:
# Create an empty pandas DataFrame with a 'review' column to collect the reviews
df = pd.DataFrame(columns=['review'])

In [10]:
# Loop through the urls and add data to a dataframe
for url in yelp_urls:
    business_reviews = get_reviews_from_yelp(url)
    df = pd.concat([df, pd.DataFrame({'review': business_reviews})], ignore_index=True)
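
Calling `pd.concat` inside the loop copies the frame on every iteration. One possible alternative, sketched below, collects rows in a plain list and builds the DataFrame once, while also recording which URL each review came from. The `fetch_reviews` stub and its example URLs are hypothetical stand-ins for `get_reviews_from_yelp`, so the example runs offline.

```python
import pandas as pd

def fetch_reviews(url):
    # Hypothetical stand-in for get_reviews_from_yelp; returns canned data
    canned = {
        'https://example.com/biz/a': ['Loved it.', 'Would come back.'],
        'https://example.com/biz/b': ['Not great.'],
    }
    return canned.get(url, [])

urls = ['https://example.com/biz/a', 'https://example.com/biz/b']

# Collect one dict per review, then build the DataFrame in a single step
rows = []
for url in urls:
    for review in fetch_reviews(url):
        rows.append({'url': url, 'review': review})

reviews_df = pd.DataFrame(rows, columns=['url', 'review'])
print(len(reviews_df.index))
```

Keeping the source URL makes it possible to compare sentiment between businesses later, which a single `review` column does not allow.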

In [11]:
# Check the first 5 rows of the dataframe
df.head()

Unnamed: 0,review
0,I love Crumbl so much. We usually bike here an...
1,Lots of parking. Never a line. Cookies are mad...
2,Crumbl cookies always satisfies my cookie crav...
3,I came here to get a free birthday cookie beca...
4,What I love about crumbl- different flavors ev...


In [12]:
# Check how many entries there are in the dataframe
len(df.index)

36

### Generate sentiment scores on our dataframe

In [13]:
def sentiment_score(review):
    # Tokenize the input review using the pre-trained tokenizer
    tokens = tokenizer.encode(review, return_tensors='pt')
    # Disable gradient tracking, which is not needed for inference
    with torch.no_grad():
        # Pass the encoded tokens through the pre-trained sentiment analysis model
        result = model(tokens)
    # Find the index of the maximum logit score and add 1 to get the predicted sentiment class
    return int(torch.argmax(result.logits)) + 1

In [14]:
# Add a new column 'sentiment' with a score for each review; each review is cut to
# its first 500 characters to stay within the model's 512-token input limit
df['sentiment'] = df['review'].apply(lambda x: sentiment_score(x[:500]))

In [15]:
# Overview of the dataframe
df

Unnamed: 0,review,sentiment
0,I love Crumbl so much. We usually bike here an...,5
1,Lots of parking. Never a line. Cookies are mad...,4
2,Crumbl cookies always satisfies my cookie crav...,5
3,I came here to get a free birthday cookie beca...,3
4,What I love about crumbl- different flavors ev...,3
5,I finally tried Crumbl after hearing so much p...,2
6,Worst customer service I have ever had. Refuse...,1
7,Tasty cookies and great customer service at th...,5
8,Hi John. We appreciate the great review! So g...,5
9,Seems like they have a different recipe than o...,3
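
A quick way to summarize the scores is to count how often each rating occurs and take the mean. The sketch below uses only the ten sentiment values visible in the output above, not the full set of 36 rows, so the numbers describe that sample only.

```python
import pandas as pd

# Sentiment values copied from the ten rows displayed above
sample = pd.DataFrame({'sentiment': [5, 4, 5, 3, 3, 2, 1, 5, 5, 3]})

# Number of reviews per rating, ordered from 1 to 5
counts = sample['sentiment'].value_counts().sort_index()
print(counts)

# Average predicted rating across the sample
mean_score = sample['sentiment'].mean()
print(f'Mean sentiment: {mean_score:.1f}')
```

On the real data, the same two lines can be applied to `df['sentiment']` directly.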


### View results

In [16]:
# Look at a review that received the highest score (5)
df['review'].iloc[0]

"I love Crumbl so much. We usually bike here and share a cookie. It's a nice treat and the variety of cookies keeps us coming back. This time we had the Pumpkin Spice Cookie. It was amazing and served warm."

In [17]:
# Look at one of the lowest-scoring reviews
df['review'].iloc[16]

"Went to this location today and ended up with this piece of silicone or rubber in my drink. I called to let them know and best they had for me was sorry. I've been there 3 times since I found about them a week ago. Glad to say they have lost a repeat customer. They should work on a better compensation than sorry."
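
Hard-coded positions such as `iloc[0]` and `iloc[16]` depend on the order in which reviews were scraped. A sketch of looking reviews up by score instead, using `idxmax` and `idxmin`, is shown below; the three-row frame is invented for illustration.

```python
import pandas as pd

# Hypothetical reviews with made-up sentiment scores
demo = pd.DataFrame({
    'review': ['Amazing cookies.', 'It was fine.', 'Found rubber in my drink.'],
    'sentiment': [5, 3, 1],
})

# idxmax / idxmin return the index label of the first highest / lowest score
best_review = demo.loc[demo['sentiment'].idxmax(), 'review']
worst_review = demo.loc[demo['sentiment'].idxmin(), 'review']

print('Highest:', best_review)
print('Lowest:', worst_review)
```

When several reviews share the same score, only the first one is returned; filtering with `demo[demo['sentiment'] == 1]` lists them all.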