# Yelp Open Dataset: Filter Reviews for New Orleans Restaurants

This notebook loads the Yelp business and review datasets and keeps only the reviews for New Orleans restaurants. I'll use the filtered reviews to analyze customer sentiment and behavior in later steps.

In [2]:
# Import libraries
import pandas as pd
import json

## Load and Filter the Review Dataset

The Yelp review dataset is very large, so I'll read it line by line and keep only the reviews that match one of the New Orleans restaurants. The smaller business file is loaded first to build the set of restaurant IDs to match against.
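
The line-by-line idea can be sketched in isolation as a small generator. This is a minimal illustration, not the notebook's actual cell: `iter_matching_records` and the sample lines are hypothetical, and the only assumption about the data is that each line is one JSON object with a `business_id` key.

```python
import json
from typing import Iterable, Iterator


def iter_matching_records(lines: Iterable[str], wanted_ids: set) -> Iterator[dict]:
    """Yield parsed JSON records whose 'business_id' is in wanted_ids."""
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines instead of failing in json.loads
            continue
        record = json.loads(line)
        if record.get("business_id") in wanted_ids:
            yield record


# Hypothetical sample lines standing in for the review file.
sample_lines = [
    '{"business_id": "a1", "stars": 5.0, "text": "Great gumbo"}\n',
    '{"business_id": "zz", "stars": 2.0, "text": "Not in the set"}\n',
    "\n",
    '{"business_id": "b2", "stars": 4.0, "text": "Solid po-boy"}\n',
]
matches = list(iter_matching_records(sample_lines, {"a1", "b2"}))
print(len(matches))
```

Because the function accepts any iterable of strings, an open file handle could be passed in place of `sample_lines`, so only one line needs to be held in memory at a time.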

In [4]:
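# --------------------------
# Step 1: Load Businesses and Filter for NOLA Restaurants
# --------------------------
print("Loading business data...")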
businesses = []
with open('yelp_academic_dataset_business.json', 'r', encoding='utf-8') as f:
    for line in f:
        businesses.append(json.loads(line))

df_business = pd.DataFrame(businesses)
df_business['categories'] = df_business['categories'].fillna('')

# Filter for restaurants in New Orleans
is_restaurant = df_business['categories'].str.contains('Restaurant', case=False)
is_new_orleans = df_business['city'].str.lower() == 'new orleans'

nola_restaurants = df_business[is_restaurant & is_new_orleans]

print(f"Total New Orleans restaurants found: {len(nola_restaurants)}")
print(nola_restaurants[['name', 'stars', 'review_count', 'categories']].head())

# --------------------------
# Step 2: Filter Reviews for NOLA Restaurants
# --------------------------
print("Filtering reviews... this may take a few minutes.")

nola_business_ids = set(nola_restaurants['business_id'])
nola_reviews = []

with open('yelp_academic_dataset_review.json', 'r', encoding='utf-8') as f:
    for line in f:
        review = json.loads(line)
        if review['business_id'] in nola_business_ids:
            nola_reviews.append(review)

df_nola_reviews = pd.DataFrame(nola_reviews)

print(f"Total NOLA reviews loaded: {len(df_nola_reviews)}")
print(df_nola_reviews[['business_id', 'stars', 'text']].head())

Loading business data...
Total New Orleans restaurants found: 2262
                            name  stars  review_count  \
87                   Copper Vine    4.5           350   
103   Mahony's Po-Boys & Seafood    4.0           382   
131                     Altamura    3.5            27   
231  Eat Mah Taco @ Pal's Lounge    4.5             8   
253              Mellow Mushroom    3.5           149   

                                            categories  
87   Nightlife, Pubs, Event Planning & Services, Wi...  
103                 Restaurants, Seafood, Cajun/Creole  
131  Cocktail Bars, Italian, Nightlife, Seafood, Ba...  
231  American (New), Food, Bars, Nightlife, Lounges...  
253    Pizza, Restaurants, Bars, Nightlife, Sandwiches  
Filtering reviews... this may take a few minutes.
Total NOLA reviews loaded: 476572
              business_id  stars  \
0  e4Vwtrqf-wpJfwesgvdgxQ    4.0   
1  S2Ho8yLxhKAa26pBAm6rxA    3.0   
2  W4ZEKkva9HpAdZG88juwyQ    3.0   
3  I6L0Zxi5Ww0zEWSAV
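
The restaurant filter above uses a case-insensitive substring match on `'Restaurant'`, which would also accept any category name that merely contains that word. A stricter variant could compare whole comma-separated category tokens instead. The sketch below assumes `categories` is a comma-separated string, as in the printed output; `has_category` is a hypothetical helper, and whether the substring match actually over-includes any businesses in this dataset has not been checked here.

```python
def has_category(categories, wanted):
    """Return True if `wanted` is one of the comma-separated category tokens."""
    if not categories:  # handles None and empty strings
        return False
    tokens = [token.strip().lower() for token in categories.split(",")]
    return wanted.lower() in tokens


# A category string taken from the output above.
print(has_category("Restaurants, Seafood, Cajun/Creole", "Restaurants"))  # True
# A made-up category string that a substring match on 'Restaurant' would accept.
print(has_category("Hypothetical Restaurant Supplies, Shopping", "Restaurants"))  # False
print(has_category(None, "Restaurants"))  # False
```

If this stricter rule were wanted, it could be applied with something like `df_business['categories'].map(lambda c: has_category(c, 'Restaurants'))`. The resulting count may differ from the 2262 reported above.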

## Preview the Filtered Data
Confirm that the filtered data looks correct by re-printing the New Orleans restaurant subset that the reviews were matched against.

In [6]:
print(f"Total New Orleans restaurants found: {len(nola_restaurants)}")
print(nola_restaurants[['name', 'stars', 'review_count', 'categories']].head())

Total New Orleans restaurants found: 2262
                            name  stars  review_count  \
87                   Copper Vine    4.5           350   
103   Mahony's Po-Boys & Seafood    4.0           382   
131                     Altamura    3.5            27   
231  Eat Mah Taco @ Pal's Lounge    4.5             8   
253              Mellow Mushroom    3.5           149   

                                            categories  
87   Nightlife, Pubs, Event Planning & Services, Wi...  
103                 Restaurants, Seafood, Cajun/Creole  
131  Cocktail Bars, Italian, Nightlife, Seafood, Ba...  
231  American (New), Food, Bars, Nightlife, Lounges...  
253    Pizza, Restaurants, Bars, Nightlife, Sandwiches  
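
A few basic checks on the review table itself could complement this preview. The sketch below is one possible set of checks, run on toy data: `summarize_reviews` and `toy_reviews` are hypothetical names, and the columns used (`business_id`, `stars`, `text`) are the ones printed earlier in the notebook.

```python
import pandas as pd


def summarize_reviews(df_reviews, expected_ids):
    """Collect a few basic checks on a filtered review DataFrame."""
    unexpected = ~df_reviews["business_id"].isin(expected_ids)
    return {
        "rows": len(df_reviews),
        "businesses_with_reviews": int(df_reviews["business_id"].nunique()),
        "unexpected_business_ids": int(unexpected.sum()),
        "missing_text": int(df_reviews["text"].isna().sum()),
        "stars_min": float(df_reviews["stars"].min()),
        "stars_max": float(df_reviews["stars"].max()),
    }


# Hypothetical toy data standing in for df_nola_reviews and nola_business_ids.
toy_reviews = pd.DataFrame(
    {
        "business_id": ["a1", "a1", "b2"],
        "stars": [5.0, 3.0, 4.0],
        "text": ["Great gumbo", "Slow service", None],
    }
)
summary = summarize_reviews(toy_reviews, {"a1", "b2"})
print(summary)
```

On the real data the call would be `summarize_reviews(df_nola_reviews, nola_business_ids)`. A non-zero `unexpected_business_ids` would indicate that the filter let through reviews for businesses outside the restaurant set.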


## Save the Filtered Review Data

Save the filtered reviews as a CSV file for use in the next notebook, where I'll join them with business info and run sentiment analysis.

In [8]:
df_nola_reviews.to_csv('nola_reviews_filtered.csv', index=False)
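
Review text often contains commas, quotes, and line breaks, so it can be worth confirming that a CSV round trip preserves it. The sketch below uses toy rows and an in-memory buffer rather than the real file. The `date` column and its string format are assumptions about the review schema, and `toy_reviews` is a hypothetical stand-in for `df_nola_reviews`.

```python
import io

import pandas as pd

# Hypothetical toy rows; the second review contains a comma, quotes, and a newline.
toy_reviews = pd.DataFrame(
    {
        "business_id": ["a1", "b2"],
        "stars": [5.0, 3.0],
        "text": ["Great gumbo", 'Long wait, but "worth it".\nWould return.'],
        "date": ["2019-03-01 18:30:00", "2020-07-15 12:05:00"],
    }
)

buffer = io.StringIO()
toy_reviews.to_csv(buffer, index=False)  # same call shape as the cell above
buffer.seek(0)

# parse_dates asks pandas to convert the 'date' strings to datetimes on load.
reloaded = pd.read_csv(buffer, parse_dates=["date"])
print(reloaded.dtypes)
print(reloaded["text"].tolist() == toy_reviews["text"].tolist())
```

In the next notebook, the same `pd.read_csv(..., parse_dates=[...])` pattern could be pointed at `nola_reviews_filtered.csv`, provided the date column is present under that name.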

### Summary
- Loaded the Yelp business dataset and kept the 2,262 restaurants in New Orleans
- Read the Yelp review dataset line by line, keeping the 476,572 reviews for those restaurants
- Saved the filtered reviews to `nola_reviews_filtered.csv` for use in downstream analysis