# Yelp Open Dataset: Load and Filter the Yelp Dataset to New Orleans Restaurants
This notebook loads the business file from the Yelp Academic Dataset and filters it down to restaurants located in New Orleans, Louisiana. The resulting business list feeds downstream analysis such as review filtering, sentiment analysis, and business insights.

In [10]:
# Import libraries
import pandas as pd
import json

## Load the Yelp Business Dataset (JSON)

Yelp's business data is stored as line-delimited JSON: each line of the file is a separate JSON object describing one business. I load it line by line into a list and then convert the list to a pandas DataFrame.

In [13]:
businesses = []
with open('yelp_academic_dataset_business.json', 'r', encoding='utf-8') as f:
    for line in f:
        businesses.append(json.loads(line))

df_business = pd.DataFrame(businesses)
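
As an alternative to the manual loop, pandas can read line-delimited JSON directly. The sketch below is one possible approach, not the method this notebook uses. The helper name `load_business_json` is hypothetical, and the `postal_code` column is an assumption about the file's fields. The `dtype` override is there because `pd.read_json` may convert numeric-looking strings to numbers, which would drop leading zeros.

```python
import pandas as pd


def load_business_json(path):
    """Read a line-delimited JSON file into a DataFrame (hypothetical helper)."""
    return pd.read_json(
        path,
        lines=True,                    # one JSON object per line
        encoding="utf-8",
        dtype={"postal_code": str},    # assumed column; keep postal codes as text
    )


# Usage, assuming the same file name as in the cell above:
# df_business = load_business_json('yelp_academic_dataset_business.json')
```

Both approaches hold the whole file in memory, so the choice is mostly about brevity versus explicit control over parsing.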

## Clean and Prepare the Data

Replace null category values with empty strings so the string match returns a clean True/False for every row. Then select businesses that:
- Are located in New Orleans
- Have "Restaurant" listed in their categories

In [15]:
# Replace missing categories with '' so .str.contains returns only True/False
df_business['categories'] = df_business['categories'].fillna('')

# Apply filters
is_restaurant = df_business['categories'].str.contains('Restaurant', case=False)
is_new_orleans = df_business['city'].str.lower() == 'new orleans'

# Combine filters
nola_restaurants = df_business[is_restaurant & is_new_orleans]
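
The substring match above keeps any business whose category string contains "restaurant" anywhere, and the city match depends on exact spelling after lowercasing. A stricter variant is sketched below as an optional alternative, not as the filter used to produce the results in this notebook. It matches the exact category token `Restaurants`, strips whitespace from the city, and can also check a `state` column. The function name `filter_city_restaurants`, the use of `state`, and every row in the demo frame are assumptions made up for illustration.

```python
import pandas as pd


def filter_city_restaurants(df, city="new orleans", state=None):
    """Keep rows whose categories include the exact token 'Restaurants'
    and whose city (and optionally state) matches. Hypothetical helper."""
    categories = df["categories"].fillna("")
    # Split "A, B, C" into tokens and compare each token exactly
    is_restaurant = categories.str.split(",").apply(
        lambda tokens: any(t.strip().lower() == "restaurants" for t in tokens)
    )
    is_city = df["city"].fillna("").str.strip().str.lower() == city
    mask = is_restaurant & is_city
    if state is not None:
        # Assumes a 'state' column holding two-letter codes
        mask &= df["state"].fillna("").str.strip().str.upper() == state.upper()
    return df[mask].copy()


# Made-up rows for illustration only
demo = pd.DataFrame({
    "name": ["Demo Po-Boys", "Demo Supply Co", "Demo Diner", "Demo Unknown"],
    "city": [" New Orleans ", "New Orleans", "Nashville", "New Orleans"],
    "state": ["LA", "LA", "TN", "LA"],
    "categories": [
        "Restaurants, Seafood",
        "Wholesalers, Restaurant Supplies",
        "Restaurants, Diners",
        None,
    ],
})

result = filter_city_restaurants(demo, state="LA")
print(result["name"].tolist())
```

Because the stricter filter can drop rows that the substring match keeps, the restaurant count it produces on the real data may differ from the one reported in this notebook.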


## Preview the Filtered New Orleans Restaurants

There is now a list of all restaurant businesses in New Orleans. Preview a few rows to confirm the filter worked as intended.

In [17]:
print(f"Total New Orleans restaurants found: {len(nola_restaurants)}")
nola_restaurants[['name', 'stars', 'review_count', 'categories']].head()

Total New Orleans restaurants found: 2262


| | name | stars | review_count | categories |
|---|---|---|---|---|
| 87 | Copper Vine | 4.5 | 350 | Nightlife, Pubs, Event Planning & Services, Wi... |
| 103 | Mahony's Po-Boys & Seafood | 4.0 | 382 | Restaurants, Seafood, Cajun/Creole |
| 131 | Altamura | 3.5 | 27 | Cocktail Bars, Italian, Nightlife, Seafood, Ba... |
| 231 | Eat Mah Taco @ Pal's Lounge | 4.5 | 8 | American (New), Food, Bars, Nightlife, Lounges... |
| 253 | Mellow Mushroom | 3.5 | 149 | Pizza, Restaurants, Bars, Nightlife, Sandwiches |


## Save the Filtered Data for Use in Other Notebooks

Export this DataFrame so I can reuse it in the next step when I filter reviews.

In [19]:
nola_restaurants.to_csv('nola_restaurants_clean.csv', index=False)
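
One caveat when saving to CSV: any column that holds nested objects is written as its string representation, so it comes back as plain text when the file is read again. The sketch below shows the difference on a made-up one-row frame. It assumes the business records carry a dictionary-valued column such as `attributes`, which this notebook never inspects, and it writes to a temporary directory rather than touching the real output file.

```python
import os
import tempfile

import pandas as pd

# Made-up row with a nested, dictionary-valued column
demo = pd.DataFrame({
    "business_id": ["a1"],
    "name": ["Demo Cafe"],
    "attributes": [{"WiFi": "free"}],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "demo.csv")
    jsonl_path = os.path.join(tmp, "demo.json")

    # CSV round trip: the dict is stored as text
    demo.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

    # Line-delimited JSON round trip: the nested structure is preserved
    demo.to_json(jsonl_path, orient="records", lines=True)
    from_jsonl = pd.read_json(jsonl_path, lines=True)

csv_type = type(from_csv.loc[0, "attributes"]).__name__
jsonl_type = type(from_jsonl.loc[0, "attributes"]).__name__
print(csv_type, jsonl_type)
```

If later notebooks only need flat columns such as `business_id`, `name`, and `stars`, the CSV export is sufficient. Line-delimited JSON is worth considering only when the nested fields matter downstream.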

### Summary
- Loaded the full Yelp business dataset (line-by-line JSON)
- Filtered it to include only restaurants based in New Orleans
- Saved the cleaned business data for the next stage