# Yelp Analysis: Finding Underrated Restaurants in New Orleans

This notebook brings together filtered Yelp business and review data to uncover "hidden gem" restaurants in New Orleans. These are restaurants with high customer satisfaction but low visibility. I will apply sentiment analysis, estimate lost revenue potential, analyze business attributes, and export cleaned datasets for Tableau visualization.

In [18]:
# Imports
import pandas as pd
import os
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

## Load Filtered Business and Review Data
Load the cleaned datasets created in the previous notebooks.

In [21]:
nola_restaurants = pd.read_csv('nola_restaurants_clean.csv')
df_nola_reviews = pd.read_csv('nola_reviews_filtered.csv')

print(f"Restaurants loaded: {len(nola_restaurants)}")
print(f"Reviews loaded: {len(df_nola_reviews)}")

Restaurants loaded: 2262
Reviews loaded: 476572


## Merge Review Data with Business Metadata
Enrich the review data by joining it with business details: restaurant name, average star rating, review count, and categories.

In [23]:
df_reviews_joined = df_nola_reviews.merge(
    nola_restaurants[['business_id', 'name', 'stars', 'review_count', 'categories']],
    on='business_id',
    how='left'
)

print(f"Merged dataset shape: {df_reviews_joined.shape}")
df_reviews_joined[['name', 'text']].head()

Merged dataset shape: (476572, 13)


|  | name | text |
|---|---|---|
| 0 | Melt | Cute interior and owner (?) gave us tour of up... |
| 1 | Creole House Restaurant & Oyster Bar | Service was crappy, and food was mediocre. I ... |
| 2 | Mr. B's Bistro | In a word... "OVERRATED!". The food took fore... |
| 3 | Cafe Beignet on Bourbon Street | The cafe was extremely cute. We came at 8am an... |
| 4 | Pho Cam Ly | After 3 weeks of working in the area I finally... |
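
Both tables carry a `stars` column, so pandas disambiguates them in the merged frame with its default suffixes: `stars_x` for the left (review) table and `stars_y` for the right (business) table. The following is a minimal sketch of how explicit `suffixes` and a `validate` check could make the merge easier to read and safer. The toy rows and the suffix names `_review` and `_business` are assumptions for illustration, not part of the notebook.

```python
import pandas as pd

# Toy stand-ins for the real tables (hypothetical rows, for illustration only)
reviews = pd.DataFrame({
    "business_id": ["a", "a", "b"],
    "stars": [5, 3, 4],              # per-review rating
    "text": ["great", "ok", "good"],
})
restaurants = pd.DataFrame({
    "business_id": ["a", "b"],
    "name": ["Cafe A", "Bistro B"],
    "stars": [4.0, 4.5],             # business-level average rating
})

joined = reviews.merge(
    restaurants,
    on="business_id",
    how="left",
    suffixes=("_review", "_business"),  # replaces the default _x / _y
    validate="many_to_one",             # raises if a business_id repeats in restaurants
)

print(joined.columns.tolist())
```

With named suffixes, later cells can refer to `stars_business` or `stars_review` directly instead of remembering which side `_x` and `_y` came from.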


## Define “Underrated” Restaurants
Restaurants with a 4.5+ star rating and fewer than 30 reviews are considered underrated since they have high satisfaction but low visibility.

In [25]:
underrated = nola_restaurants[
    (nola_restaurants['stars'] >= 4.5) &
    (nola_restaurants['review_count'] < 30)
].copy()

print(f"Underrated restaurants found: {len(underrated)}")
underrated[['name', 'stars', 'review_count']].head()

Underrated restaurants found: 237


|  | name | stars | review_count |
|---|---|---|---|
| 3 | Eat Mah Taco @ Pal's Lounge | 4.5 | 8 |
| 15 | Stokehold Restaurant | 4.5 | 13 |
| 36 | The Wacked Out Weiner New Orleans | 4.5 | 21 |
| 42 | Petite Clouet Cafe | 4.5 | 18 |
| 50 | The Munch Factory | 4.5 | 21 |
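
Since the same filter is applied again later in the notebook, it could be wrapped in a small helper. This is only a sketch: the function name `find_underrated` and its parameter names are hypothetical, and the toy rows are made up to show the boundary behavior (a restaurant with exactly 30 reviews is excluded).

```python
import pandas as pd

def find_underrated(df: pd.DataFrame, min_stars: float = 4.5, max_reviews: int = 30) -> pd.DataFrame:
    """Return rows rated at least min_stars with fewer than max_reviews reviews."""
    mask = (df["stars"] >= min_stars) & (df["review_count"] < max_reviews)
    return df[mask].copy()

# Hypothetical rows: only "A" meets both conditions
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "stars": [4.5, 5.0, 4.0],
    "review_count": [8, 30, 12],
})
print(find_underrated(toy)["name"].tolist())
```

Keeping the thresholds as parameters also makes it easy to test how sensitive the "underrated" set is to the 4.5-star and 30-review cutoffs.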


## Sentiment Analysis on 5K Sample Using VADER
Next, I run VADER, a fast rule-based sentiment analyzer, on a random sample of 5,000 reviews. Each review receives a compound sentiment score from -1 (very negative) to +1 (very positive).

In [27]:
analyzer = SentimentIntensityAnalyzer()

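# After the merge, stars_y is the business-level average rating from nola_restaurants;
# the per-review rating from the reviews table is stars_x
df_sample_reviews = df_reviews_joined[['name', 'text', 'stars_y']].sample(5000, random_state=42).copy()
df_sample_reviews.rename(columns={'stars_y': 'stars'}, inplace=True)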

# Apply sentiment
df_sample_reviews['sentiment'] = df_sample_reviews['text'].apply(
    lambda x: analyzer.polarity_scores(x)['compound']
)

print("Sample sentiment scores:")
df_sample_reviews[['name', 'stars', 'sentiment']].head()

Sample sentiment scores:


|  | name | stars | sentiment |
|---|---|---|---|
| 351944 | Daisy Dukes Express | 4.0 | 0.966 |
| 242146 | Majoria's Commerce Restaurant | 4.5 | 0.9661 |
| 48154 | Creole House Restaurant & Oyster Bar | 4.0 | 0.8947 |
| 235291 | The Market Cafe | 3.5 | 0.5719 |
| 255296 | Daisy Dukes - French Quarter | 3.5 | 0.9863 |
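
For a Tableau filter, the compound score could be bucketed into coarse labels. The sketch below uses the thresholds suggested in the VADER documentation (at or above 0.05 is positive, at or below -0.05 is negative, anything between is neutral). The function name `label_sentiment` is hypothetical, and the last two test scores are made up; the first two come from the sample output above.

```python
def label_sentiment(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# First two scores appear in the sample output; the last two are invented
for score in (0.966, 0.5719, 0.0, -0.4):
    print(score, label_sentiment(score))
```

Applied to the sample, something like `df_sample_reviews['sentiment'].apply(label_sentiment)` would yield a label column alongside the numeric score.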


## Estimate Revenue Potential
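Next, I use review count as a rough proxy for customer volume and assume an average spend of $50 per review to estimate revenue. I then compare each underrated restaurant's estimated revenue with the median estimated revenue across all restaurants, which stands in for what it could earn with median visibility.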

In [29]:
avg_spend = 50
nola_restaurants['estimated_revenue'] = nola_restaurants['review_count'] * avg_spend
median_revenue = nola_restaurants['estimated_revenue'].median()

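# Re-apply the underrated filter so the subset includes the new estimated_revenue column
underrated = nola_restaurants[
    (nola_restaurants['stars'] >= 4.5) &
    (nola_restaurants['review_count'] < 30)
].copy()
# Every underrated restaurant gets the same benchmark: the median estimated revenue
underrated['potential_revenue_if_exposed'] = median_revenue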

underrated[['name', 'estimated_revenue', 'potential_revenue_if_exposed']].head()

|  | name | estimated_revenue | potential_revenue_if_exposed |
|---|---|---|---|
| 3 | Eat Mah Taco @ Pal's Lounge | 400 | 2850.0 |
| 15 | Stokehold Restaurant | 650 | 2850.0 |
| 36 | The Wacked Out Weiner New Orleans | 1050 | 2850.0 |
| 42 | Petite Clouet Cafe | 900 | 2850.0 |
| 50 | The Munch Factory | 1050 | 2850.0 |
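
The gap between the two columns is the implied uplift. As a worked example, the median of 2850.0 corresponds to 57 reviews at the assumed spend of 50 per review, so a restaurant with 8 reviews (estimated revenue 400) has an implied uplift of 2450. The sketch below reproduces that arithmetic for the first three rows of the output; the column name `revenue_uplift` is hypothetical, and the median is hard-coded from the output rather than recomputed.

```python
import pandas as pd

avg_spend = 50  # same assumed dollars per review as in the notebook

# Review counts for the first three rows of the output above
toy = pd.DataFrame({
    "name": [
        "Eat Mah Taco @ Pal's Lounge",
        "Stokehold Restaurant",
        "The Wacked Out Weiner New Orleans",
    ],
    "review_count": [8, 13, 21],
})
toy["estimated_revenue"] = toy["review_count"] * avg_spend

median_revenue = 2850.0  # taken from the output above (57 reviews * 50)

# Uplift = benchmark revenue at median visibility minus the current estimate
toy["revenue_uplift"] = median_revenue - toy["estimated_revenue"]

print(toy[["name", "estimated_revenue", "revenue_uplift"]])
```

Because the estimate is linear in review count, the uplift ranking simply mirrors the review-count ranking; it is a visibility gap expressed in dollars, not a revenue forecast.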


## Analyze Common Business Features
Next I explore which business categories (like cuisine or service type) appear most often in high-rated restaurants. This can help uncover success patterns.

In [31]:
top_rated = nola_restaurants[nola_restaurants['stars'] >= 4.5]
category_counts = top_rated['categories'].str.get_dummies(sep=', ').sum().sort_values(ascending=False)

print("Top business categories in highly-rated restaurants:")
category_counts.head(10)

Top business categories in highly-rated restaurants:


Restaurants           595
Food                  256
Nightlife             134
Bars                  123
Breakfast & Brunch    105
American (New)         94
Sandwiches             88
Seafood                80
Cajun/Creole           77
Coffee & Tea           76
dtype: int64
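
The top two entries, Restaurants and Food, are umbrella labels that most businesses in this dataset would carry, so they say little about what distinguishes highly rated places. One possible refinement, sketched here on made-up rows, drops such labels and reports each category as a share of restaurants rather than a raw count. The `umbrella` list and the toy data are assumptions.

```python
import pandas as pd

# Toy stand-in for top_rated (hypothetical rows)
top_rated_toy = pd.DataFrame({
    "categories": [
        "Restaurants, Seafood, Cajun/Creole",
        "Restaurants, Food, Coffee & Tea",
        "Seafood, Restaurants",
        "Restaurants, Sandwiches",
    ]
})

umbrella = ["Restaurants", "Food"]  # assumed too generic to be informative

# One 0/1 indicator column per category, as in the notebook cell above
dummies = top_rated_toy["categories"].str.get_dummies(sep=", ")

share = (
    dummies.drop(columns=umbrella, errors="ignore")
    .mean()                          # fraction of restaurants carrying each category
    .sort_values(ascending=False)
)
print(share)
```

Shares are easier to compare across groups of different sizes, for example highly rated restaurants versus all restaurants.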

## Export Clean Datasets for Tableau
Finally, export all relevant outputs to the `exports/` folder for use in Tableau dashboards.

In [43]:
os.makedirs('exports', exist_ok=True)

df_sample_reviews.to_csv('exports/nola_reviews_sample_sentiment1.csv', index=False)
nola_restaurants.to_csv('exports/nola_restaurants_with_revenue1.csv', index=False)
underrated.to_csv('exports/nola_underrated_restaurants1.csv', index=False)

print("Export complete. Files saved to /exports/")
print(os.listdir('exports'))

Export complete. Files saved to /exports/
['nola_restaurants_with_revenue.csv', 'nola_restaurants_with_revenue.xlsx', 'nola_restaurants_with_revenue1.csv', 'nola_reviews_sample_sentiment.csv', 'nola_reviews_sample_sentiment.xlsx', 'nola_reviews_sample_sentiment1.csv', 'nola_reviews_sample_sentiment2.csv', 'nola_underrated_restaurants.csv', 'nola_underrated_restaurants.xlsx', 'nola_underrated_restaurants1.csv']


### Summary
- Merged business and review data for New Orleans
- Identified "hidden gem" restaurants with strong ratings and low visibility
- Applied fast sentiment analysis using VADER
- Estimated potential revenue uplift
- Analyzed top business features in high-performing restaurants
- Exported cleaned datasets for Tableau visualization