## Data Preprocessing

**Name**: Diane Lu

**Contact**: dianengalu@gmail.com

**Date**: 07/31/2023

## Table of Contents 

1. [Introduction](#intro)
2. [Merged Dataset](#merge)
3. [Final Dataset](#final)
4. [Model Dataset](#model)
5. [Vancouver Data](#van)

### Introduction <a class="anchor" id="intro"></a>

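Data preprocessing is the process of turning raw data into a form suitable for analysis and modeling: rectifying inconsistencies, handling missing values, and reshaping the data as needed. In this notebook, we merge the Yelp business, review, and user datasets, check the result for missing values, keep only restaurants and users with at least 100 reviews each, and label-encode the ID columns to produce a modeling dataset and a Vancouver-only subset.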

#### Downloading Yelp Dataset from Kaggle 

You can download the Yelp Dataset from [Kaggle](https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset/versions/3?datasetId=10100&sortBy=voteCount&select=Dataset_User_Agreement.pdf). This analysis uses Version 3 of the Yelp Dataset on Kaggle, specifically the business, review, and user datasets.

#### Importing Python Libraries 

Import the libraries needed for data preprocessing.

In [1]:
# Import necessary libraries
import numpy as np 
import pandas as pd 

# Ignore all warnings to avoid cluttering the output
import warnings
warnings.filterwarnings("ignore")

### Merging Datasets for Modeling <a class="anchor" id="merge"></a>

In [73]:
# Load the pickled business_data DataFrame
business_data = pd.read_pickle('/Users/diane/Desktop/BrainStation/Brainstation_Capstone/yelp_data/business_data.pkl')

# Load the pickled review_data DataFrame
review_data = pd.read_pickle('/Users/diane/Desktop/BrainStation/Brainstation_Capstone/yelp_data/review_data.pkl')

# Load the pickled user_data DataFrame
user_data = pd.read_pickle('/Users/diane/Desktop/BrainStation/Brainstation_Capstone/yelp_data/user_data.pkl')

In [74]:
# Merge the 'review_data' DataFrame and 'business_data' DataFrame based on the common column 'business_id'
business_reviews = review_data.merge(business_data, on='business_id')

# Display the first few rows of the 'business_reviews' DataFrame
business_reviews.head()

| | review_id | user_id | business_id | stars | text | restaurant_name | address | city | state | postal_code | latitude | longitude | restaurant_rating | restaurant_review_count | is_open | categories |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | lWC-xP3rd6obsecCYsGZRg | ak0TdVmGKo4pwqdJSTLwWw | buF9druCkbuXLX526sGELQ | 4.0 | Apparently Prides Osteria had a rough summer a... | Prides Osteria | 240 Rantoul St | Beverly | MA | 01915 | 42.549609 | -70.884046 | 3.5 | 83 | 0 | [Wine Bars, Nightlife, Farmers Market, Bars, I... |
| 1 | hpcZLEzqD4_gPi6eSVi_Bg | Y-j2svl0M_5-jF1ehYuNPQ | buF9druCkbuXLX526sGELQ | 2.0 | I was really disappointed to say the least. I ... | Prides Osteria | 240 Rantoul St | Beverly | MA | 01915 | 42.549609 | -70.884046 | 3.5 | 83 | 0 | [Wine Bars, Nightlife, Farmers Market, Bars, I... |
| 2 | 3FvY1Se8y2WXqTbaANOqMw | xUCX4GhBpeWxZB0l2lmt_w | buF9druCkbuXLX526sGELQ | 5.0 | This is as close to dining in Italy as you'll ... | Prides Osteria | 240 Rantoul St | Beverly | MA | 01915 | 42.549609 | -70.884046 | 3.5 | 83 | 0 | [Wine Bars, Nightlife, Farmers Market, Bars, I... |
| 3 | C1uQNP2ehBktS43ZRMEvkg | 2M6KFsWIUXElqcQRz4A0Qg | buF9druCkbuXLX526sGELQ | 5.0 | Great food and service! Again. 4 out of the la... | Prides Osteria | 240 Rantoul St | Beverly | MA | 01915 | 42.549609 | -70.884046 | 3.5 | 83 | 0 | [Wine Bars, Nightlife, Farmers Market, Bars, I... |
| 4 | Cja8_35_kQDnF9g4voikzw | t5SRIRU6INiAyVkiMJhRPA | buF9druCkbuXLX526sGELQ | 1.0 | We ordered the roasted chicken and homemade pa... | Prides Osteria | 240 Rantoul St | Beverly | MA | 01915 | 42.549609 | -70.884046 | 3.5 | 83 | 0 | [Wine Bars, Nightlife, Farmers Market, Bars, I... |
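
`DataFrame.merge` performs an inner join by default, so any review whose `business_id` has no match in `business_data` is silently dropped. The toy sketch below uses made-up IDs (`r1`, `b1`, and so on are hypothetical) to show that behaviour, how `validate="many_to_one"` guards against duplicate business rows, and how a left join with `indicator=True` can list the reviews an inner join would discard.

```python
import pandas as pd

# Toy stand-ins for review_data and business_data (hypothetical IDs and values)
toy_reviews = pd.DataFrame({
    "review_id": ["r1", "r2", "r3"],
    "business_id": ["b1", "b1", "b9"],  # "b9" has no matching business row
    "stars": [4.0, 2.0, 5.0],
})
toy_businesses = pd.DataFrame({
    "business_id": ["b1", "b2"],
    "restaurant_name": ["Restaurant A", "Restaurant B"],
})

# Default how="inner": only reviews whose business_id appears in both frames survive.
# validate="many_to_one" raises MergeError if business_id is not unique in toy_businesses.
merged = toy_reviews.merge(toy_businesses, on="business_id", validate="many_to_one")

# A left join with indicator=True flags the reviews that found no business row
audit = toy_reviews.merge(toy_businesses, on="business_id", how="left", indicator=True)
dropped = audit.loc[audit["_merge"] == "left_only", "review_id"].tolist()

print(len(merged))  # 2
print(dropped)      # ['r3']
```

Running the same `indicator=True` audit on the real frames is one way to confirm how many reviews the inner join removes.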


In [75]:
# Merge the 'business_reviews' DataFrame and 'user_data' DataFrame based on the common column 'user_id'
final_business_reviews = business_reviews.merge(user_data, on='user_id')

# Display the first few rows of the 'final_business_reviews' DataFrame
final_business_reviews.head()
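
The second merge assumes `user_id` is unique in `user_data`; if it were not, every review by a duplicated user would be repeated once per matching user row. A minimal sketch on hypothetical toy frames: passing `validate="many_to_one"` makes pandas raise `pandas.errors.MergeError` instead of silently multiplying rows.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical toy frames: two reviews written by user "u1"
toy_business_reviews = pd.DataFrame({"review_id": ["r1", "r2"], "user_id": ["u1", "u1"]})

# A user table that accidentally contains "u1" twice
bad_users = pd.DataFrame({"user_id": ["u1", "u1"], "name": ["Mel", "Mel"]})
good_users = bad_users.drop_duplicates(subset="user_id")

# Without validation the duplicate silently doubles the reviews: 2 x 2 = 4 rows
unchecked = toy_business_reviews.merge(bad_users, on="user_id")

# With validation, the same mistake raises an error instead
try:
    toy_business_reviews.merge(bad_users, on="user_id", validate="many_to_one")
    raised = False
except MergeError:
    raised = True

checked = toy_business_reviews.merge(good_users, on="user_id", validate="many_to_one")
print(len(unchecked), raised, len(checked))  # 4 True 2
```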

In [None]:
# Pickle the DataFrame
final_business_reviews.to_pickle('/Users/diane/Desktop/BrainStation/Brainstation_Capstone/yelp_data/final_reviews.pkl')
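
Pickle files store the DataFrame with its dtypes intact, so an `object` column such as `postal_code` keeps leading zeros, whereas a CSV round trip re-infers types. The sketch below, which writes a hypothetical toy frame to a temporary directory rather than the notebook's paths, checks both cases. Only load pickles from trusted sources, since unpickling can execute arbitrary code.

```python
import os
import tempfile

import pandas as pd

# Toy frame with an object-dtype postal code that has a leading zero
toy = pd.DataFrame({"postal_code": ["01915", "02139"], "stars": [4.0, 5.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_reviews.pkl")
    toy.to_pickle(path)
    restored = pd.read_pickle(path)

    csv_path = os.path.join(tmp, "toy_reviews.csv")
    toy.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

# Raises AssertionError if values, dtypes, or index differ after the pickle round trip
pd.testing.assert_frame_equal(toy, restored)

print(restored["postal_code"].tolist())  # ['01915', '02139']
print(from_csv["postal_code"].tolist())  # [1915, 2139]
```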

### Final Dataset for EDA <a class="anchor" id="final"></a>

**Data Dictionary:**
* `review_id`: unique review ID
* `user_id`: unique user ID
* `business_id`: unique business ID
* `stars`: the star rating given in the review
* `text`: the review text
* `restaurant_name`: the restaurant's name
* `address`: the restaurant's street address
* `city`: the restaurant's city
* `state`: two-character state or province code (e.g. `MA`, `BC`)
* `postal_code`: the postal code (stored as text)
* `latitude`: latitude of the restaurant
* `longitude`: longitude of the restaurant
* `restaurant_rating`: the restaurant's overall star rating
* `restaurant_review_count`: the number of reviews the restaurant has received
* `is_open`: 1 if the restaurant is open, 0 if it is closed
* `categories`: list of the restaurant's business categories
* `name`: the user's first name
* `user_review_count`: the number of reviews the user has written
* `average_stars`: the user's average star rating across all of their reviews
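
One way to keep a data dictionary and the DataFrame from drifting apart is to list the expected column names and compare them with `df.columns`. A minimal sketch: `EXPECTED_COLUMNS` and `check_schema` are hypothetical helpers, not part of the original notebook, and the column list is taken from the `final_data.info()` output below.

```python
import pandas as pd

# Column names in the order reported by final_data.info()
EXPECTED_COLUMNS = [
    "review_id", "user_id", "business_id", "stars", "text",
    "restaurant_name", "address", "city", "state", "postal_code",
    "latitude", "longitude", "restaurant_rating", "restaurant_review_count",
    "is_open", "categories", "name", "user_review_count", "average_stars",
]

def check_schema(df: pd.DataFrame, expected=EXPECTED_COLUMNS) -> dict:
    """Return expected columns missing from df and columns of df not in the list."""
    actual = set(df.columns)
    expected_set = set(expected)
    return {
        "missing": [c for c in expected if c not in actual],
        "undocumented": [c for c in df.columns if c not in expected_set],
    }

# Toy empty frame that uses 'user_name' where the data has 'name'
toy = pd.DataFrame(columns=["user_name" if c == "name" else c for c in EXPECTED_COLUMNS])
print(check_schema(toy))  # {'missing': ['name'], 'undocumented': ['user_name']}
```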

In [None]:
# Load the merged DataFrame pickled in the previous section
final_data = pd.read_pickle('/Users/diane/Desktop/BrainStation/Brainstation_Capstone/yelp_data/final_reviews.pkl')

In [None]:
# Display concise information about the 'final_data' DataFrame
final_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5574714 entries, 0 to 5574713
Data columns (total 19 columns):
 #   Column                   Dtype  
---  ------                   -----  
 0   review_id                object 
 1   user_id                  object 
 2   business_id              object 
 3   stars                    float64
 4   text                     object 
 5   restaurant_name          object 
 6   address                  object 
 7   city                     object 
 8   state                    object 
 9   postal_code              object 
 10  latitude                 float64
 11  longitude                float64
 12  restaurant_rating        float64
 13  restaurant_review_count  int64  
 14  is_open                  int64  
 15  categories               object 
 16  name                     object 
 17  user_review_count        int64  
 18  average_stars            float64
dtypes: float64(5), int64(3), object(11)
memory usage: 808.1+ MB


In [None]:
# Display the first few rows of the 'final_data' DataFrame
final_data.head()

| | review_id | user_id | business_id | stars | text | restaurant_name | address | city | state | postal_code | latitude | longitude | restaurant_rating | restaurant_review_count | is_open | categories | name | user_review_count | average_stars |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | lWC-xP3rd6obsecCYsGZRg | ak0TdVmGKo4pwqdJSTLwWw | buF9druCkbuXLX526sGELQ | 4.0 | Apparently Prides Osteria had a rough summer a... | Prides Osteria | 240 Rantoul St | Beverly | MA | 01915 | 42.549609 | -70.884046 | 3.5 | 83 | 0 | [Wine Bars, Nightlife, Farmers Market, Bars, I... | Mel | 63 | 4.3 |
| 1 | fLlML7BjkR4_fJnND_hEJw | ak0TdVmGKo4pwqdJSTLwWw | bNZ3-0rse12NKdSVqQ30xw | 4.0 | Came with friends, split the funghi pizza and ... | Sulmona | 608 Main St | Cambridge | MA | 02139 | 42.362867 | -71.093846 | 4.0 | 220 | 1 | [Pizza, Italian, Nightlife, Bars] | Mel | 63 | 4.3 |
| 2 | pRtbswupEVIG1Ykj9xkL7Q | ak0TdVmGKo4pwqdJSTLwWw | BVsIaKL-8QXVjt0Z9WoFWw | 4.0 | Went for late lunch had the combination seafoo... | Village Roast Beef & Seafood | 10 Bessom St | Marblehead | MA | 01945 | 42.500243 | -70.859237 | 4.5 | 53 | 1 | [Seafood, American (Traditional)] | Mel | 63 | 4.3 |
| 3 | fUYl6bnZy4bSGnbPAizXug | ak0TdVmGKo4pwqdJSTLwWw | 4MClvr12OXBNvGu8h1yGpA | 5.0 | We were super excited to try Sarma, having bee... | Sarma | 249 Pearl St | Somerville | MA | 02145 | 42.38818 | -71.095545 | 4.5 | 883 | 1 | [Turkish, Middle Eastern, Moroccan, Tapas/Smal... | Mel | 63 | 4.3 |
| 4 | jHh2LIXNsnJCMUiyI9pt5w | ak0TdVmGKo4pwqdJSTLwWw | 2vH58mhkEl8GdcDug1OwWg | 5.0 | So glad we made the trip to Woburn for Gene's ... | Gene's Chinese Flatbread Cafe | 466 Main St | Woburn | MA | 01801 | 42.481598 | -71.150877 | 4.0 | 233 | 1 | [Cafes, Noodles, Chinese] | Mel | 63 | 4.3 |


In [None]:
# Count the missing values in each column
final_data.isnull().sum()

review_id                  0
user_id                    0
business_id                0
stars                      0
text                       0
restaurant_name            0
address                    0
city                       0
state                      0
postal_code                0
latitude                   0
longitude                  0
restaurant_rating          0
restaurant_review_count    0
is_open                    0
categories                 0
name                       0
user_review_count          0
average_stars              0
dtype: int64

### Modeling Dataset <a class="anchor" id="model"></a>

Keep only restaurants with at least 100 reviews (`restaurant_review_count >= 100`).

In [None]:
# Filter 'final_data' DataFrame to include only rows where the 'restaurant_review_count' is greater than or equal to 100
final_data = final_data[final_data['restaurant_review_count'] >= 100]

# Print a sanity check message showing the minimum number of restaurant reviews in the filtered DataFrame
print(f"Sanity Check: The minimum amount of restaurant reviews is {final_data['restaurant_review_count'].min()}.")  

Sanity Check: The minimum amount of restaurant reviews is 100.
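
`restaurant_review_count` appears to come from `business_data`, so the `>= 100` filter above uses each restaurant's review count as recorded in the business table, not the number of its reviews that remain in `final_data`. If the threshold should instead count reviews present in the DataFrame, one alternative (not what this notebook does) is to map each ID to its `value_counts()`. Toy sketch with a threshold of 2 and hypothetical IDs:

```python
import pandas as pd

# Hypothetical toy data: "b1" reports 150 reviews overall but has only 1 row here
toy = pd.DataFrame({
    "business_id": ["b1", "b2", "b2", "b2"],
    "restaurant_review_count": [150, 3, 3, 3],
})
threshold = 2  # stands in for the notebook's 100

# Filter on the precomputed count (what the cell above does)
by_yelp_count = toy[toy["restaurant_review_count"] >= threshold]

# Alternative: filter on the number of rows each business has in this DataFrame
in_data_count = toy["business_id"].map(toy["business_id"].value_counts())
by_data_count = toy[in_data_count >= threshold]

print(sorted(by_yelp_count["business_id"].unique()))  # ['b1', 'b2']
print(sorted(by_data_count["business_id"].unique()))  # ['b2']
```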


Keep only users who have written at least 100 reviews (`user_review_count >= 100`).

In [None]:
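# Filter 'final_data' DataFrame to include only rows where the 'user_review_count' is greater than or equal to 100
# (.copy() makes final_data an independent DataFrame, since the label-encoding cells below modify it in place)
final_data = final_data[final_data['user_review_count'] >= 100].copy()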

# Print a sanity check message showing the minimum number of user reviews in the filtered DataFrame
print(f"Sanity Check: The minimum amount of user reviews is {final_data['user_review_count'].min()}.")

Sanity Check: The minimum amount of user reviews is 100.
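
With precomputed counts, applying the two filters once is enough, because the counts do not change when rows are removed. If both thresholds were instead applied to counts within the DataFrame, removing restaurants could push some users back below their threshold (and vice versa). A common remedy, often called k-core filtering, repeats both filters until nothing changes. The helper `iterative_filter` below is a hypothetical sketch of that alternative, shown with toy thresholds of 2:

```python
import pandas as pd

def iterative_filter(df, user_col="user_id", item_col="business_id",
                     min_user=2, min_item=2):
    """Repeat user and item count filters (counted within df) until stable."""
    while True:
        user_counts = df[user_col].map(df[user_col].value_counts())
        item_counts = df[item_col].map(df[item_col].value_counts())
        keep = (user_counts >= min_user) & (item_counts >= min_item)
        if keep.all():
            return df
        # Every pass removes at least one row, so the loop terminates
        df = df[keep]

# Hypothetical toy ratings: u3's second review is of b3, which has only one review
toy = pd.DataFrame({
    "user_id":     ["u1", "u1", "u2", "u2", "u3", "u3"],
    "business_id": ["b1", "b2", "b1", "b2", "b3", "b1"],
})
result = iterative_filter(toy)

# Pass 1 drops (u3, b3); pass 2 then drops (u3, b1) because u3 has one review left
print(sorted(zip(result["user_id"], result["business_id"])))
# [('u1', 'b1'), ('u1', 'b2'), ('u2', 'b1'), ('u2', 'b2')]
```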


In [None]:
from sklearn import preprocessing

def label_encode_column(df, column_name, encoder_name):
    """Label-encode df[column_name] in place and return df.

    The fitted LabelEncoder is also written to a new column, encoder_name,
    so the same encoder object is repeated in every row.
    """
    # Create an instance of the LabelEncoder
    encoder = preprocessing.LabelEncoder()

    # Replace the column's values with integer codes
    df[column_name] = encoder.fit_transform(df[column_name])

    # Store the fitted encoder in a new column for later use if needed
    df[encoder_name] = encoder

    return df
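
`LabelEncoder` assigns each distinct value an integer equal to its position in the sorted array of unique values (`encoder.classes_`), and `inverse_transform` maps integers back to the original values. A toy sketch (requires scikit-learn; the IDs are hypothetical):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical Yelp-style string IDs
ids = pd.Series(["xUCX", "ak0T", "Y-j2", "ak0T"])

encoder = LabelEncoder()
codes = encoder.fit_transform(ids)

# classes_ holds the sorted unique IDs; each code is an index into this array
print(list(encoder.classes_))                   # ['Y-j2', 'ak0T', 'xUCX']
print(codes.tolist())                           # [2, 1, 0, 1]
print(list(encoder.inverse_transform([0, 2])))  # ['Y-j2', 'xUCX']
```

Note that uppercase letters sort before lowercase ones, which is why `'Y-j2'` receives code 0.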

In [None]:
# Label encode the 'review_id' column in the 'final_data' DataFrame
label_encode_column(final_data, 'review_id', 'review_id_encoder')

| | review_id | user_id | business_id | stars | text | restaurant_name | address | city | state | postal_code | latitude | longitude | restaurant_rating | restaurant_review_count | is_open | categories | name | user_review_count | average_stars | review_id_encoder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 40 | 150339 | djp57omz9cccV1wI0_sqqA | SH4c_oijp86ooTJmLvx6SQ | 4.0 | Cute little spot. Three of us stopped in for l... | Thaitation | 129 Jersey St | Boston | MA | 02215 | 42.342315 | -71.097004 | 4.0 | 315 | 1 | [Thai] | Stephanie | 351 | 3.65 | LabelEncoder() |
| 41 | 594094 | djp57omz9cccV1wI0_sqqA | H61GvE_VbTXtQbEgTPLchg | 2.0 | As much as we like the waitstaff here, the foo... | Howling Wolf Taqueria | 76 Lafayette St | Salem | MA | 01970 | 42.519292 | -70.893730 | 4.0 | 964 | 1 | [Bars, Arts & Entertainment, Nightlife, Music ... | Stephanie | 351 | 3.65 | LabelEncoder() |
| 42 | 224067 | djp57omz9cccV1wI0_sqqA | rbGjumv1jiCJ6FN3rPpciA | 3.0 | This is a nothing fancy kinda joint, with kick... | Santarpio's Pizza | 71 Newbury St | Peabody | MA | 01960 | 42.527896 | -70.993317 | 3.0 | 359 | 1 | [Pizza, American (Traditional), Italian] | Stephanie | 351 | 3.65 | LabelEncoder() |
| 43 | 1019629 | djp57omz9cccV1wI0_sqqA | 5HMXgD_gui5n0Tc_hadesg | 2.0 | I had a 5 minute honeymoon with The Gallows wh... | The Gallows | 1395 Washington St | Boston | MA | 02118 | 42.341343 | -71.070251 | 4.0 | 859 | 1 | [Seafood, Bars, American (New), American (Trad... | Stephanie | 351 | 3.65 | LabelEncoder() |
| 44 | 71707 | djp57omz9cccV1wI0_sqqA | EHBDrF-AIIyA_6fV4aNFpg | 3.0 | Antique Table is ok, but it's going to have to... | Antique Table | 2 Essex St | Lynn | MA | 01902 | 42.475805 | -70.926544 | 4.0 | 201 | 1 | [Italian] | Stephanie | 351 | 3.65 | LabelEncoder() |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5572066 | 240578 | Mc4C7fVY0sEcD-U5eOA2Og | HpiE8X3s8xRRRihjp7_IMA | 4.0 | Quality Italian food and good service in an at... | Mia's Italian Kitchen | 8717 International Dr | Orlando | FL | 32819 | 28.439125 | -81.470086 | 4.5 | 532 | 1 | [Pizza, Breakfast & Brunch, Italian] | Steven | 471 | 3.80 | LabelEncoder() |
| 5572085 | 941863 | huXqrSaGyNO1aZKiM55EUg | HpiE8X3s8xRRRihjp7_IMA | 5.0 | Ahhhhmazing Chicken Cacciatore!!! And Sunday G... | Mia's Italian Kitchen | 8717 International Dr | Orlando | FL | 32819 | 28.439125 | -81.470086 | 4.5 | 532 | 1 | [Pizza, Breakfast & Brunch, Italian] | K | 176 | 4.31 | LabelEncoder() |
| 5572508 | 123007 | KEF5A094wOUdBG7SsS7qKg | dkiANu_mPavLe1vQNsFmjQ | 4.0 | If you're looking for a burger and wings you c... | Jimmy’s Burgers and Wings | 5256 W Irlo Bronson Memorial Hwy | Kissimmee | FL | 34746 | 28.332246 | -81.493330 | 4.0 | 256 | 1 | [Chicken Wings, Desserts, Ice Cream & Frozen Y... | Alberto | 268 | 3.82 | LabelEncoder() |
| 5572754 | 734565 | zt9FNJMJNVt65Dl1GMuJqA | 071cMCITomxpMOYDMgUWeQ | 5.0 | Visiting from out of town and this came up in ... | Volcano Hot Pot & BBQ | 5877 W Irlo Bronson Memorial Hwy | Kissimmee | FL | 34746 | 28.333532 | -81.518989 | 4.5 | 280 | 1 | [Hot Pot, Barbeque, Korean] | Crystal | 147 | 4.38 | LabelEncoder() |


In [None]:
# Label encode the 'business_id' column in the 'final_data' DataFrame
label_encode_column(final_data, 'business_id', 'business_id_encoder')

| | review_id | user_id | business_id | stars | text | restaurant_name | address | city | state | postal_code | ... | longitude | restaurant_rating | restaurant_review_count | is_open | categories | name | user_review_count | average_stars | review_id_encoder | business_id_encoder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 40 | 150339 | djp57omz9cccV1wI0_sqqA | 6620 | 4.0 | Cute little spot. Three of us stopped in for l... | Thaitation | 129 Jersey St | Boston | MA | 02215 | ... | -71.097004 | 4.0 | 315 | 1 | [Thai] | Stephanie | 351 | 3.65 | LabelEncoder() | LabelEncoder() |
| 41 | 594094 | djp57omz9cccV1wI0_sqqA | 4147 | 2.0 | As much as we like the waitstaff here, the foo... | Howling Wolf Taqueria | 76 Lafayette St | Salem | MA | 01970 | ... | -70.893730 | 4.0 | 964 | 1 | [Bars, Arts & Entertainment, Nightlife, Music ... | Stephanie | 351 | 3.65 | LabelEncoder() | LabelEncoder() |
| 42 | 224067 | djp57omz9cccV1wI0_sqqA | 12401 | 3.0 | This is a nothing fancy kinda joint, with kick... | Santarpio's Pizza | 71 Newbury St | Peabody | MA | 01960 | ... | -70.993317 | 3.0 | 359 | 1 | [Pizza, American (Traditional), Italian] | Stephanie | 351 | 3.65 | LabelEncoder() | LabelEncoder() |
| 43 | 1019629 | djp57omz9cccV1wI0_sqqA | 1357 | 2.0 | I had a 5 minute honeymoon with The Gallows wh... | The Gallows | 1395 Washington St | Boston | MA | 02118 | ... | -71.070251 | 4.0 | 859 | 1 | [Seafood, Bars, American (New), American (Trad... | Stephanie | 351 | 3.65 | LabelEncoder() | LabelEncoder() |
| 44 | 71707 | djp57omz9cccV1wI0_sqqA | 3498 | 3.0 | Antique Table is ok, but it's going to have to... | Antique Table | 2 Essex St | Lynn | MA | 01902 | ... | -70.926544 | 4.0 | 201 | 1 | [Italian] | Stephanie | 351 | 3.65 | LabelEncoder() | LabelEncoder() |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5572066 | 240578 | Mc4C7fVY0sEcD-U5eOA2Og | 4317 | 4.0 | Quality Italian food and good service in an at... | Mia's Italian Kitchen | 8717 International Dr | Orlando | FL | 32819 | ... | -81.470086 | 4.5 | 532 | 1 | [Pizza, Breakfast & Brunch, Italian] | Steven | 471 | 3.80 | LabelEncoder() | LabelEncoder() |
| 5572085 | 941863 | huXqrSaGyNO1aZKiM55EUg | 4317 | 5.0 | Ahhhhmazing Chicken Cacciatore!!! And Sunday G... | Mia's Italian Kitchen | 8717 International Dr | Orlando | FL | 32819 | ... | -81.470086 | 4.5 | 532 | 1 | [Pizza, Breakfast & Brunch, Italian] | K | 176 | 4.31 | LabelEncoder() | LabelEncoder() |
| 5572508 | 123007 | KEF5A094wOUdBG7SsS7qKg | 9377 | 4.0 | If you're looking for a burger and wings you c... | Jimmy’s Burgers and Wings | 5256 W Irlo Bronson Memorial Hwy | Kissimmee | FL | 34746 | ... | -81.493330 | 4.0 | 256 | 1 | [Chicken Wings, Desserts, Ice Cream & Frozen Y... | Alberto | 268 | 3.82 | LabelEncoder() | LabelEncoder() |
| 5572754 | 734565 | zt9FNJMJNVt65Dl1GMuJqA | 237 | 5.0 | Visiting from out of town and this came up in ... | Volcano Hot Pot & BBQ | 5877 W Irlo Bronson Memorial Hwy | Kissimmee | FL | 34746 | ... | -81.518989 | 4.5 | 280 | 1 | [Hot Pot, Barbeque, Korean] | Crystal | 147 | 4.38 | LabelEncoder() | LabelEncoder() |


In [None]:
# Label encode the 'user_id' column in the 'final_data' DataFrame
label_encode_column(final_data, 'user_id', 'user_id_encoder')

| | review_id | user_id | business_id | stars | text | restaurant_name | address | city | state | postal_code | ... | restaurant_rating | restaurant_review_count | is_open | categories | name | user_review_count | average_stars | review_id_encoder | business_id_encoder | user_id_encoder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 40 | 150339 | 53031 | 6620 | 4.0 | Cute little spot. Three of us stopped in for l... | Thaitation | 129 Jersey St | Boston | MA | 02215 | ... | 4.0 | 315 | 1 | [Thai] | Stephanie | 351 | 3.65 | LabelEncoder() | LabelEncoder() | LabelEncoder() |
| 41 | 594094 | 53031 | 4147 | 2.0 | As much as we like the waitstaff here, the foo... | Howling Wolf Taqueria | 76 Lafayette St | Salem | MA | 01970 | ... | 4.0 | 964 | 1 | [Bars, Arts & Entertainment, Nightlife, Music ... | Stephanie | 351 | 3.65 | LabelEncoder() | LabelEncoder() | LabelEncoder() |
| 42 | 224067 | 53031 | 12401 | 3.0 | This is a nothing fancy kinda joint, with kick... | Santarpio's Pizza | 71 Newbury St | Peabody | MA | 01960 | ... | 3.0 | 359 | 1 | [Pizza, American (Traditional), Italian] | Stephanie | 351 | 3.65 | LabelEncoder() | LabelEncoder() | LabelEncoder() |
| 43 | 1019629 | 53031 | 1357 | 2.0 | I had a 5 minute honeymoon with The Gallows wh... | The Gallows | 1395 Washington St | Boston | MA | 02118 | ... | 4.0 | 859 | 1 | [Seafood, Bars, American (New), American (Trad... | Stephanie | 351 | 3.65 | LabelEncoder() | LabelEncoder() | LabelEncoder() |
| 44 | 71707 | 53031 | 3498 | 3.0 | Antique Table is ok, but it's going to have to... | Antique Table | 2 Essex St | Lynn | MA | 01902 | ... | 4.0 | 201 | 1 | [Italian] | Stephanie | 351 | 3.65 | LabelEncoder() | LabelEncoder() | LabelEncoder() |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5572066 | 240578 | 30023 | 4317 | 4.0 | Quality Italian food and good service in an at... | Mia's Italian Kitchen | 8717 International Dr | Orlando | FL | 32819 | ... | 4.5 | 532 | 1 | [Pizza, Breakfast & Brunch, Italian] | Steven | 471 | 3.80 | LabelEncoder() | LabelEncoder() | LabelEncoder() |
| 5572085 | 941863 | 58343 | 4317 | 5.0 | Ahhhhmazing Chicken Cacciatore!!! And Sunday G... | Mia's Italian Kitchen | 8717 International Dr | Orlando | FL | 32819 | ... | 4.5 | 532 | 1 | [Pizza, Breakfast & Brunch, Italian] | K | 176 | 4.31 | LabelEncoder() | LabelEncoder() | LabelEncoder() |
| 5572508 | 123007 | 26963 | 9377 | 4.0 | If you're looking for a burger and wings you c... | Jimmy’s Burgers and Wings | 5256 W Irlo Bronson Memorial Hwy | Kissimmee | FL | 34746 | ... | 4.0 | 256 | 1 | [Chicken Wings, Desserts, Ice Cream & Frozen Y... | Alberto | 268 | 3.82 | LabelEncoder() | LabelEncoder() | LabelEncoder() |
| 5572754 | 734565 | 81015 | 237 | 5.0 | Visiting from out of town and this came up in ... | Volcano Hot Pot & BBQ | 5877 W Irlo Bronson Memorial Hwy | Kissimmee | FL | 34746 | ... | 4.5 | 280 | 1 | [Hot Pot, Barbeque, Korean] | Crystal | 147 | 4.38 | LabelEncoder() | LabelEncoder() | LabelEncoder() |


In [None]:
# Drop the 'review_id_encoder', 'business_id_encoder', and 'user_id_encoder' columns from the 'final_data' DataFrame
final_data = final_data.drop(['review_id_encoder', 'business_id_encoder', 'user_id_encoder'], axis=1)
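
Dropping the `*_encoder` columns also discards the only references to the fitted encoders, so the integer IDs can no longer be mapped back to Yelp IDs. One alternative is to keep the code-to-ID lookups in a plain dictionary. The sketch below swaps in `pandas.factorize(..., sort=True)`, which should yield the same codes as `LabelEncoder` for string columns without missing values; `encode_ids` and `mappings` are hypothetical names.

```python
import pandas as pd

def encode_ids(df, columns):
    """Replace each ID column with sorted integer codes; return (df, lookups)."""
    df = df.copy()
    lookups = {}
    for col in columns:
        codes, uniques = pd.factorize(df[col], sort=True)
        df[col] = codes
        lookups[col] = pd.Index(uniques)  # position i holds the original ID for code i
    return df, lookups

# Hypothetical toy IDs
toy = pd.DataFrame({
    "user_id": ["ak0T", "xUCX", "ak0T"],
    "business_id": ["buF9", "bNZ3", "bNZ3"],
})
encoded, mappings = encode_ids(toy, ["user_id", "business_id"])

print(encoded["user_id"].tolist())      # [0, 1, 0]
print(encoded["business_id"].tolist())  # [1, 0, 0]

# Decode integer codes back to the original IDs
decoded = mappings["business_id"].take(encoded["business_id"].to_numpy()).tolist()
print(decoded)  # ['buF9', 'bNZ3', 'bNZ3']
```

Keeping the lookups outside the DataFrame also avoids repeating an encoder object in every one of the millions of rows.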

In [None]:
# Display the first few rows of the 'final_data' DataFrame
final_data.head()

| | review_id | user_id | business_id | stars | text | restaurant_name | address | city | state | postal_code | latitude | longitude | restaurant_rating | restaurant_review_count | is_open | categories | name | user_review_count | average_stars |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 40 | 150339 | 53031 | 6620 | 4.0 | Cute little spot. Three of us stopped in for l... | Thaitation | 129 Jersey St | Boston | MA | 02215 | 42.342315 | -71.097004 | 4.0 | 315 | 1 | [Thai] | Stephanie | 351 | 3.65 |
| 41 | 594094 | 53031 | 4147 | 2.0 | As much as we like the waitstaff here, the foo... | Howling Wolf Taqueria | 76 Lafayette St | Salem | MA | 01970 | 42.519292 | -70.89373 | 4.0 | 964 | 1 | [Bars, Arts & Entertainment, Nightlife, Music ... | Stephanie | 351 | 3.65 |
| 42 | 224067 | 53031 | 12401 | 3.0 | This is a nothing fancy kinda joint, with kick... | Santarpio's Pizza | 71 Newbury St | Peabody | MA | 01960 | 42.527896 | -70.993317 | 3.0 | 359 | 1 | [Pizza, American (Traditional), Italian] | Stephanie | 351 | 3.65 |
| 43 | 1019629 | 53031 | 1357 | 2.0 | I had a 5 minute honeymoon with The Gallows wh... | The Gallows | 1395 Washington St | Boston | MA | 02118 | 42.341343 | -71.070251 | 4.0 | 859 | 1 | [Seafood, Bars, American (New), American (Trad... | Stephanie | 351 | 3.65 |
| 44 | 71707 | 53031 | 3498 | 3.0 | Antique Table is ok, but it's going to have to... | Antique Table | 2 Essex St | Lynn | MA | 01902 | 42.475805 | -70.926544 | 4.0 | 201 | 1 | [Italian] | Stephanie | 351 | 3.65 |


In [None]:
# Extract specific columns 'user_id', 'business_id', 'stars', 'restaurant_name', and 'categories' from final_data
# (.copy() makes model_data an independent DataFrame before it is modified in place)
model_data = final_data[['user_id', 'business_id', 'stars', 'restaurant_name', 'categories']].copy()

# Rename the 'stars' column to 'rating' in the 'model_data' DataFrame
model_data.rename(columns={'stars': 'rating'}, inplace=True)

In [None]:
# Display the first few rows of the resulting DataFrame
model_data.head()

| | user_id | business_id | rating | restaurant_name | categories |
|---|---|---|---|---|---|
| 40 | 53031 | 6620 | 4.0 | Thaitation | [Thai] |
| 41 | 53031 | 4147 | 2.0 | Howling Wolf Taqueria | [Bars, Arts & Entertainment, Nightlife, Music ... |
| 42 | 53031 | 12401 | 3.0 | Santarpio's Pizza | [Pizza, American (Traditional), Italian] |
| 43 | 53031 | 1357 | 2.0 | The Gallows | [Seafood, Bars, American (New), American (Trad... |
| 44 | 53031 | 3498 | 3.0 | Antique Table | [Italian] |
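
A user can review the same restaurant more than once, so `model_data` may hold several ratings for one `(user_id, business_id)` pair. Whether that matters depends on the model used later, which is not shown here; for example, `DataFrame.pivot` into a user-by-restaurant matrix raises an error on duplicate pairs. A hedged sketch on toy data that counts such pairs and, as one possible choice, averages them:

```python
import pandas as pd

# Hypothetical toy ratings with one repeated (user_id, business_id) pair
toy = pd.DataFrame({
    "user_id": [53031, 53031, 53031],
    "business_id": [6620, 6620, 4147],
    "rating": [5.0, 3.0, 2.0],
})

# Count rows that repeat an earlier (user_id, business_id) pair
n_repeats = toy.duplicated(subset=["user_id", "business_id"]).sum()

# One option: collapse repeated pairs into their mean rating (groups are sorted by key)
deduped = toy.groupby(["user_id", "business_id"], as_index=False)["rating"].mean()

print(n_repeats)                   # 1
print(deduped["rating"].tolist())  # [2.0, 4.0]
```

Keeping only the most recent review per pair is another reasonable choice if review dates are available.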


In [None]:
# Pickle the DataFrame
model_data.to_pickle('/Users/diane/Desktop/BrainStation/Brainstation_Capstone/yelp_data/model_data.pkl')

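### Vancouver Data <a class="anchor" id="van"></a>

Subset the filtered, label-encoded data to restaurants in Vancouver, British Columbia, keeping the same columns as the modeling dataset.

In [None]:
# Filter data for restaurants in Vancouver, BC
vancouver_data = final_data[(final_data['city'] == 'Vancouver') & (final_data['state'] == 'BC')]

# Keep the 'user_id', 'business_id', 'stars', 'restaurant_name', and 'categories' columns
# (.copy() makes vancouver_data an independent DataFrame before it is modified in place)
vancouver_data = vancouver_data[['user_id', 'business_id', 'stars', 'restaurant_name', 'categories']].copy()

# Rename the 'stars' column to 'rating'
vancouver_data.rename(columns={'stars': 'rating'}, inplace=True)

# Display the first few rows
vancouver_data.head()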

| | user_id | business_id | rating | restaurant_name | categories |
|---|---|---|---|---|---|
| 1101 | 70315 | 1407 | 4.0 | Meat & Bread | [Fast Food, Bakeries, Sandwiches, Salad, Soup,... |
| 1105 | 70315 | 1356 | 3.0 | Edible Canada At the Market | [Seafood, Canadian (New), American (New), Spec... |
| 1109 | 70315 | 7370 | 4.0 | The Lamplighter Public House | [Nightlife, Gastropubs, Bars, Pubs] |
| 1144 | 70315 | 1143 | 5.0 | Miku | [Japanese, Sushi Bars] |
| 1151 | 70315 | 13469 | 4.0 | Lupo | [Italian] |
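
The Vancouver filter combines two boolean masks with `&`. Each comparison needs its own parentheses because `&` binds more tightly than `==` in Python, and checking `state` as well as `city` keeps out any same-named city in another state, such as Vancouver, Washington, should one appear. A toy sketch with hypothetical rows:

```python
import pandas as pd

# Hypothetical toy rows, including a same-named city in another state
toy = pd.DataFrame({
    "city": ["Vancouver", "Vancouver", "Portland"],
    "state": ["BC", "WA", "OR"],
    "rating": [4.0, 3.0, 5.0],
})

# Parentheses are required: & binds tighter than ==
mask = (toy["city"] == "Vancouver") & (toy["state"] == "BC")
vancouver = toy.loc[mask, ["city", "state", "rating"]].copy()

print(len(vancouver))               # 1
print(vancouver["state"].tolist())  # ['BC']
```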


In [None]:
# Check the number of rows and columns in the Vancouver subset
vancouver_data.shape

(64660, 5)

In [None]:
# Pickle the DataFrame
vancouver_data.to_pickle('/Users/diane/Desktop/BrainStation/Brainstation_Capstone/yelp_data/vancouver_data.pkl')
