# Sentiment Analysis of Yelp Reviews

**Author**: Logan Ash  
**Date**: 2025-04-29

## Introduction

**Research Question**: What is the overall sentiment of Yelp reviews for coffee shops in New York, NY?

In this notebook, we will:
1. Collect up to 60 reviews using the Yelp Fusion API (20 businesses, at most 3 reviews each).  
2. Clean the review text.  
3. Perform sentiment analysis using:
   - TextBlob's default sentiment analyzer  
   - TextBlob's NaiveBayesAnalyzer  
4. Visualize the sentiment distributions with donut charts.  
5. Remove stop words and generate a WordCloud of the top 20 words.  

This approach will help us compare analyzers and understand common themes in customer feedback.


In [None]:
# Uncomment to install the packages imported below if they are not already installed
# !pip install requests pandas textblob nltk wordcloud matplotlib

import os
import re
import requests
import pandas as pd
import matplotlib.pyplot as plt
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer
import nltk
from nltk.corpus import stopwords
from wordcloud import WordCloud


In [None]:
# Set your Yelp Fusion API Key here
API_KEY = os.getenv('YELP_API_KEY') or 'YOUR_YELP_API_KEY'
HEADERS = {'Authorization': f'Bearer {API_KEY}'}

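Because the key above falls back to a placeholder string, a missing environment variable only shows up later as an empty result set. A minimal sketch of a fail-fast check follows; `build_auth_headers` is a hypothetical helper name, not part of the Yelp API or of this notebook.

```python
def build_auth_headers(api_key, placeholder="YOUR_YELP_API_KEY"):
    """Return Bearer-token headers, or raise if the key is missing or still the placeholder."""
    if not api_key or api_key == placeholder:
        raise ValueError("Set the YELP_API_KEY environment variable before running the notebook.")
    return {"Authorization": f"Bearer {api_key}"}

# Demonstration with a made-up key (not a real credential)
print(build_auth_headers("abc123"))
```

In the notebook this could be called as `build_auth_headers(os.getenv('YELP_API_KEY'))`, so a missing key stops execution at this cell.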

In [None]:
# Fetch business IDs for 'coffee' in New York, NY
search_url = 'https://api.yelp.com/v3/businesses/search'
params = {'term': 'coffee', 'location': 'New York, NY', 'limit': 20}
search_resp = requests.get(search_url, headers=HEADERS, params=params, timeout=10)
search_resp.raise_for_status()  # fail loudly on a bad API key or a rate limit
business_ids = [biz['id'] for biz in search_resp.json().get('businesses', [])]

# Collect up to 60 reviews (the reviews endpoint returns at most 3 per business)
reviews = []
review_url = 'https://api.yelp.com/v3/businesses/{}/reviews'
for biz_id in business_ids:
    resp = requests.get(review_url.format(biz_id), headers=HEADERS, timeout=10)
    if not resp.ok:
        continue  # skip businesses whose reviews cannot be fetched
    for r in resp.json().get('reviews', []):
        reviews.append(r['text'])

print(f"Collected {len(reviews)} reviews")

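With 20 businesses and at most 3 reviews each, 60 is a ceiling rather than a guarantee. One way to reach a larger target is to request more businesses in pages. The sketch below only plans the request parameters and makes no network calls. It assumes the search endpoint accepts an `offset` parameter alongside `limit`; `plan_search_pages` is a hypothetical helper.

```python
import math

def plan_search_pages(target_reviews, reviews_per_business=3, page_size=20):
    """Return a list of {'limit', 'offset'} dicts covering enough businesses for the target."""
    businesses_needed = math.ceil(target_reviews / reviews_per_business)
    return [
        {"limit": min(page_size, businesses_needed - offset), "offset": offset}
        for offset in range(0, businesses_needed, page_size)
    ]

# 90 reviews at 3 per business means 30 businesses, fetched as two pages
print(plan_search_pages(90))
```

Each dict can be merged into the search `params` before the request. Since some businesses return fewer than 3 reviews, it may be worth planning for a somewhat higher target than strictly needed.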

In [None]:
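# Clean text: replace runs of non-alphanumeric characters with a space, lowercase, strip
def clean_text(text):
    # Substituting a space (not '') keeps words apart across punctuation and line breaks
    text = re.sub(r'[^A-Za-z0-9]+', ' ', text)
    return text.lower().strip()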

cleaned_reviews = [clean_text(r) for r in reviews]

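One detail worth checking in a cleaning step like this is whether punctuation and line breaks are deleted or replaced with a space, because deletion can fuse neighbouring words. The sketch below compares the two strategies on a made-up review; both function names are hypothetical.

```python
import re

def delete_symbols(text):
    """Delete every character that is not a letter, digit, or space."""
    return re.sub(r"[^A-Za-z0-9 ]+", "", text).lower().strip()

def space_symbols(text):
    """Replace each run of other characters with one space, so word boundaries survive."""
    return re.sub(r"[^A-Za-z0-9]+", " ", text).lower().strip()

sample = "Great coffee!\nFriendly staff."
print(delete_symbols(sample))  # great coffeefriendly staff
print(space_symbols(sample))   # great coffee friendly staff
```

The fused token `coffeefriendly` would be invisible to a sentiment lexicon and would appear as a single word in a word count.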

In [None]:
# Analyze sentiment with TextBlob default analyzer
counts_default = {'positive': 0, 'negative': 0, 'neutral': 0}
for txt in cleaned_reviews:
    pol = TextBlob(txt).sentiment.polarity
    if pol > 0:
        counts_default['positive'] += 1
    elif pol < 0:
        counts_default['negative'] += 1
    else:
        counts_default['neutral'] += 1

# Donut chart
labels = list(counts_default.keys())
sizes = list(counts_default.values())
fig, ax = plt.subplots()
ax.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90, wedgeprops={'width':0.3})
ax.set_title('Sentiment Distribution (TextBlob Default)')
plt.show()

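The cell above treats only a polarity of exactly 0 as neutral, so a review scoring 0.01 counts as positive. A possible variation is a small neutral band around zero. The sketch below uses hard-coded scores in place of TextBlob output, and the band width of 0.05 is an arbitrary assumption, not a recommended value.

```python
from collections import Counter

def label_polarity(polarity, neutral_band=0.05):
    """Map a polarity score in [-1, 1] to a label, treating small magnitudes as neutral."""
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"

# Hypothetical polarity scores standing in for TextBlob(txt).sentiment.polarity
sample_scores = [0.6, 0.02, -0.4, 0.0, -0.03]
counts = Counter(label_polarity(p) for p in sample_scores)
print(counts)
```

Setting `neutral_band=0` reproduces the strict rule used in the cell above.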

In [None]:
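# Analyze sentiment with NaiveBayesAnalyzer
# Needs NLTK's 'movie_reviews' and 'punkt' data (newer NLTK releases may need 'punkt_tab' too)
nltk.download('movie_reviews')
nltk.download('punkt')
# Create the analyzer once: it trains on first use, so one instance per review would retrain each time
nb_analyzer = NaiveBayesAnalyzer()
counts_nb = {'positive': 0, 'negative': 0}
for txt in cleaned_reviews:
    blob = TextBlob(txt, analyzer=nb_analyzer)
    # The analyzer returns 'pos' or 'neg', so map it onto the dictionary keys
    label = 'positive' if blob.sentiment.classification == 'pos' else 'negative'
    counts_nb[label] += 1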

# Donut chart for NB
labels = list(counts_nb.keys())
sizes = list(counts_nb.values())
fig, ax = plt.subplots()
ax.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90, wedgeprops={'width':0.3})
ax.set_title('Sentiment Distribution (NaiveBayesAnalyzer)')
plt.show()

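Besides the `classification` field, the NaiveBayesAnalyzer result also carries `p_pos` and `p_neg` probabilities. These can be used to flag reviews the classifier is unsure about. The sketch below works on hard-coded probabilities in place of analyzer output; the `margin` of 0.1 and the label `uncertain` are illustrative assumptions.

```python
def label_from_probability(p_pos, margin=0.1):
    """Label by positive-class probability, marking values near 0.5 as uncertain."""
    if p_pos >= 0.5 + margin:
        return "positive"
    if p_pos <= 0.5 - margin:
        return "negative"
    return "uncertain"

# Hypothetical p_pos values standing in for blob.sentiment.p_pos
for p in (0.9, 0.55, 0.2):
    print(p, label_from_probability(p))
```

Counting the `uncertain` reviews gives a rough sense of how many binary labels in the donut chart rest on a near coin-flip.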

In [None]:
# Download NLTK stopwords and filter
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

all_words = ' '.join(cleaned_reviews)
filtered = ' '.join(w for w in all_words.split() if w not in stop_words)

# Generate WordCloud (collocations=False counts single words only, not two-word phrases)
wc = WordCloud(width=800, height=400, max_words=20, collocations=False).generate(filtered)
plt.figure(figsize=(10, 5))
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.title('Top 20 Words WordCloud')
plt.show()

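A word cloud shows relative frequency only through font size, so it can help to print the underlying counts as well. The sketch below uses `collections.Counter` on made-up reviews with a tiny stand-in stop word set. `top_words` is a hypothetical helper; in the notebook it would take `cleaned_reviews` and `stop_words`.

```python
from collections import Counter

def top_words(texts, stop_words, n=20):
    """Return the n most frequent non-stop-words across all texts as (word, count) pairs."""
    words = (w for text in texts for w in text.split() if w not in stop_words)
    return Counter(words).most_common(n)

# Made-up, already-cleaned reviews and a minimal stop word set for illustration
sample_reviews = ["great espresso and friendly service", "the espresso was great"]
sample_stop_words = {"and", "the", "was"}
print(top_words(sample_reviews, sample_stop_words, n=3))
```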

## Conclusion

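- The default TextBlob analyzer assigns each review a polarity score between -1 and 1, which this notebook buckets into positive, negative, and neutral.
- The NaiveBayesAnalyzer is a binary classifier trained on NLTK's movie review corpus. It labels every review as positive or negative, so it never produces a neutral result.
- The WordCloud highlights frequent terms (e.g., espresso, service, ambiance). It shows what reviewers talk about, but not whether a given mention is positive or negative.

The two analyzers use different label sets and different underlying models, so their distributions complement each other rather than match one to one. The WordCloud adds the common topics behind the customer feedback.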

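One way to put a number on the comparison between the two analyzers is to measure how often they agree on reviews that both label as positive or negative. The sketch below uses made-up label lists. `agreement_rate` is a hypothetical helper and assumes both lists are in the same review order.

```python
def agreement_rate(labels_a, labels_b, ignore=("neutral",)):
    """Share of reviews where both label lists agree, skipping any pair with an ignored label."""
    pairs = [(a, b) for a, b in zip(labels_a, labels_b)
             if a not in ignore and b not in ignore]
    if not pairs:
        return None  # nothing comparable
    return sum(a == b for a, b in pairs) / len(pairs)

# Made-up per-review labels standing in for the two analyzers' outputs
default_labels = ["positive", "neutral", "negative", "positive"]
nb_labels = ["positive", "positive", "positive", "negative"]
print(agreement_rate(default_labels, nb_labels))
```

Using this in the notebook would require storing a label per review in each sentiment cell, in addition to the aggregate counts.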