# Yelp Yodelers
### Understanding and Predicting Business Success Using Yelp Data

## Problem Statement

The goal is to use Yelp data to identify the factors that contribute to a restaurant's success, and to predict whether a new or existing restaurant will succeed based on its features and attributes.

## Dataset Overview

* **Data source:** https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset
* **Original source:** https://www.yelp.com/dataset

The dataset contains five JSON files, described below.

1. `business.json` - Contains business data including location data, attributes, and categories.
1. `review.json` - Contains the full review text, along with the `user_id` of the author and the `business_id` of the business being reviewed.
1. `user.json` - User data including the user's friend mapping and all the metadata associated with the user.
1. `checkin.json` - Checkins on a business.
1. `tip.json` - Tips written by a user on a business. Tips are shorter than reviews and tend to convey quick suggestions.
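
Each file stores one JSON object per line (the loading code in this notebook relies on that through `lines=True`), so the larger files can be streamed instead of being read whole. The sketch below is a dependency-free illustration using only the standard library. The function name `iter_json_lines` and its `limit` parameter are made up for this example and are not part of the project code.

```python
import json
from itertools import islice


def iter_json_lines(path, limit=None):
    """Yield one dict per non-empty line of a JSON Lines file.

    `limit` (hypothetical parameter) caps the number of records,
    similar in spirit to `nrows` in `pd.read_json`.
    """
    with open(path, encoding="utf-8") as fh:
        # Parse lazily so only one line is held in memory at a time.
        records = (json.loads(line) for line in fh if line.strip())
        yield from islice(records, limit)
```

For example, `iter_json_lines('../data/raw_data/yelp_academic_dataset_review.json', limit=10)` would yield the first ten reviews without loading the rest of the file.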

**Our primary task, predicting whether a business opened in a particular location will be successful, will be trained mainly on the `business` data.**
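
The business data has no ready-made "success" column, so a prediction target has to be derived from the available fields. One possible labeling rule, given here purely as an assumption for illustration, is to call a business successful when it is still open, is highly rated, and has enough reviews for that rating to be meaningful. The function name `is_successful` and the thresholds `min_stars` and `min_reviews` are hypothetical and are not the project's actual definition.

```python
def is_successful(business, min_stars=4.0, min_reviews=10):
    """Return True if a business record meets the assumed success rule.

    ASSUMPTION: success = still open AND stars >= min_stars
    AND review_count >= min_reviews. The thresholds are illustrative only.
    """
    return (
        business.get("is_open") == 1
        and business.get("stars", 0) >= min_stars
        and business.get("review_count", 0) >= min_reviews
    )


# Two records shaped like rows of the business data (other fields omitted).
closed_clinic = {"stars": 5.0, "review_count": 7, "is_open": 0}
open_store = {"stars": 3.0, "review_count": 15, "is_open": 1}

print(is_successful(closed_clinic))  # False: the business is closed
print(is_successful(open_store))     # False: rating is below the assumed threshold
```

Any real definition of success would need to be validated against the data; this rule only shows how the `is_open`, `stars`, and `review_count` fields could combine into a binary label.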

#### Quick dataset overview

In [1]:
# import libraries
import pandas as pd

In [4]:
# Load the Yelp dataset: the business file in full, the other files as 10-row previews
yelp_df_business = pd.read_json('../data/raw_data/yelp_academic_dataset_business.json', lines=True)
yelp_df_checkin = pd.read_json('../data/raw_data/yelp_academic_dataset_checkin.json', lines=True, nrows=10)
yelp_df_review = pd.read_json('../data/raw_data/yelp_academic_dataset_review.json', lines=True, nrows=10)
yelp_df_tip = pd.read_json('../data/raw_data/yelp_academic_dataset_tip.json', lines=True, nrows=10)
yelp_df_user = pd.read_json('../data/raw_data/yelp_academic_dataset_user.json', lines=True, nrows=10)

### Business data

* `business_id`: Unique identifier for each business in the dataset.
* `name`: Name of the business.
* `address`: Street address of the business location.
* `city`: City where the business is located.
* `state`: State where the business is located.
* `postal_code`: Postal code (ZIP code) of the business location.
* `latitude`: Geographic latitude coordinate of the business.
* `longitude`: Geographic longitude coordinate of the business.
* `stars`: Average star rating of the business, rounded to half-stars.
* `review_count`: Number of reviews the business has received.
* `is_open`: Indicator of whether the business is currently open (binary: 1 for open, 0 for closed).
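* `attributes`: Additional attributes of the business, stored as a dictionary of attribute names to string values (e.g., `{'ByAppointmentOnly': 'True'}`).
* `categories`: Categories or types of services/products offered by the business, stored as a single comma-separated string.
* `hours`: Operating hours of the business, stored as a dictionary from day name to an `open-close` string (e.g., `'Tuesday': '8:0-18:30'`).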
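
Several of these fields need light parsing before they can serve as model features. Judging from the sample rows shown in this notebook, `categories` is a comma-separated string, `attributes` maps names to strings such as `'True'`, and `hours` maps day names to `open-close` strings. The sketch below handles those three cases with the standard library only. The helper names are invented for this example, and it assumes that missing values arrive as `None`.

```python
def split_categories(categories):
    """Split 'Doctors, Notaries' into ['Doctors', 'Notaries']; None -> []."""
    if not categories:
        return []
    return [c.strip() for c in categories.split(",") if c.strip()]


def attribute_is_true(attributes, name):
    """Treat the string 'True' as a set flag; anything else counts as unset."""
    return bool(attributes) and attributes.get(name) == "True"


def parse_hours(span):
    """Convert '8:0-18:30' into ((8, 0), (18, 30)) as (hour, minute) pairs."""
    def to_pair(clock):
        hour, minute = clock.split(":")
        return int(hour), int(minute)

    opens, closes = span.split("-")
    return to_pair(opens), to_pair(closes)


# A record shaped like the second sample row (values abridged).
row = {
    "attributes": {"BusinessAcceptsCreditCards": "True"},
    "categories": "Shipping Centers, Local Services",
    "hours": {"Tuesday": "8:0-18:30"},
}
print(split_categories(row["categories"]))  # ['Shipping Centers', 'Local Services']
print(attribute_is_true(row["attributes"], "BusinessAcceptsCreditCards"))  # True
print(parse_hours(row["hours"]["Tuesday"]))  # ((8, 0), (18, 30))
```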

In [7]:
# Display the first two rows of the business data
yelp_df_business.head(2)


|   | business_id | name | address | city | state | postal_code | latitude | longitude | stars | review_count | is_open | attributes | categories | hours |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | `Pns2l4eNsfO8kk83dixA6A` | Abby Rappoport, LAC, CMQ | 1616 Chapala St, Ste 2 | Santa Barbara | CA | 93101 | 34.426679 | -119.711197 | 5.0 | 7 | 0 | `{'ByAppointmentOnly': 'True'}` | Doctors, Traditional Chinese Medicine, Naturop... |  |
| 1 | `mpf3x-BjTdTEA3yCZrAYPw` | The UPS Store | 87 Grasso Plaza Shopping Center | Affton | MO | 63123 | 38.551126 | -90.335695 | 3.0 | 15 | 1 | `{'BusinessAcceptsCreditCards': 'True'}` | Shipping Centers, Local Services, Notaries, Ma... | `{'Monday': '0:0-0:0', 'Tuesday': '8:0-18:30', ...` |


In [8]:
yelp_df_checkin.head(2)

|   | business_id | date |
|---|---|---|
| 0 | `---kPU91CF4Lq2-WlRu9Lw` | 2020-03-13 21:10:56, 2020-06-02 22:18:06, 2020... |
| 1 | `--0iUa4sNDFiZFrAdIWhZQ` | 2010-09-13 21:43:09, 2011-05-04 23:08:15, 2011... |
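
In the check-in data, the `date` column packs every check-in timestamp for a business into one comma-separated string, as the rows above show. The following is a minimal sketch of unpacking it, assuming the `YYYY-MM-DD HH:MM:SS` format seen in the sample holds throughout the file. `parse_checkins` is a name chosen for this example.

```python
from collections import Counter
from datetime import datetime


def parse_checkins(date_field):
    """Turn 'ts1, ts2, ...' into a list of datetime objects."""
    return [
        datetime.strptime(stamp.strip(), "%Y-%m-%d %H:%M:%S")
        for stamp in date_field.split(",")
        if stamp.strip()
    ]


# The first two timestamps of the first sample row (the rest is truncated in the output).
stamps = parse_checkins("2020-03-13 21:10:56, 2020-06-02 22:18:06")
print(len(stamps))                      # 2
print(Counter(s.year for s in stamps))  # Counter({2020: 2})
```

Counts per business, per year, or per weekday derived this way could be joined back to the business data on `business_id`.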


In [9]:
yelp_df_review.head(2)

|   | review_id | user_id | business_id | stars | useful | funny | cool | text | date |
|---|---|---|---|---|---|---|---|---|---|
| 0 | `KU_O5udG6zpxOg-VcAEodg` | `mh_-eMZ6K5RLWhZyISBhwA` | `XQfwVwDr-v0ZS3_CbbE5Xw` | 3 | 0 | 0 | 0 | If you decide to eat here, just be aware it is... | 2018-07-07 22:09:11 |
| 1 | `BiTunyQ73aT9WBnpR9DZGw` | `OyoGAe7OKpv6SyGZT5g77Q` | `7ATYjTIgM3jUlt4UM3IypQ` | 5 | 1 | 0 | 1 | I've taken a lot of spin classes over the year... | 2012-01-03 15:28:18 |


In [10]:
yelp_df_tip.head(2)

|   | user_id | business_id | text | date | compliment_count |
|---|---|---|---|---|---|
| 0 | `AGNUgVwnZUey3gcPCJ76iw` | `3uLgwr0qeCNMjKenHJwPGQ` | Avengers time with the ladies. | 2012-05-18 02:17:21 | 0 |
| 1 | `NBN4MgHP9D3cw--SnauTkA` | `QoezRbYQncpRqyrLH6Iqjg` | They have lots of good deserts and tasty cuban... | 2013-02-05 18:35:10 | 0 |


In [12]:
yelp_df_user.head(2)

|   | user_id | name | review_count | yelping_since | useful | funny | cool | elite | friends | fans | ... | compliment_more | compliment_profile | compliment_cute | compliment_list | compliment_note | compliment_plain | compliment_cool | compliment_funny | compliment_writer | compliment_photos |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | `qVc8ODYU5SZjKXVBgXdI7w` | Walker | 585 | 2007-01-25 16:47:26 | 7217 | 1259 | 5994 | 2007 | NSCy54eWehBJyZdG2iE84w, pe42u7DcCH2QmI81NX-8qA... | 267 | ... | 65 | 55 | 56 | 18 | 232 | 844 | 467 | 467 | 239 | 180 |
| 1 | `j14WgRoU_-2ZE1aw1dXrJg` | Daniel | 4333 | 2009-01-25 04:35:42 | 43091 | 13066 | 27281 | 2009,2010,2011,2012,2013,2014,2015,2016,2017,2... | ueRPE0CX75ePGMqOFVj6IQ, 52oH4DrRvzzl8wh5UXyU0A... | 3138 | ... | 264 | 184 | 157 | 251 | 1847 | 7054 | 3131 | 3131 | 1521 | 1946 |
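
In the user data, the `elite` column appears in the sample either as a single year (`2007`) or as a comma-separated list of years. A small sketch of normalizing it into a list of integers follows. It assumes the value may arrive as a string, a bare integer, or an empty value, and `elite_years` is a hypothetical helper name.

```python
def elite_years(elite):
    """Return the elite years as a sorted list of ints.

    Accepts a comma-separated string ('2009,2010'), a bare int (2007),
    or an empty/None value (returns an empty list).
    """
    if elite is None or elite == "":
        return []
    return sorted(int(year) for year in str(elite).split(",") if year.strip())


print(elite_years(2007))              # [2007]
print(elite_years("2009,2010,2011"))  # [2009, 2010, 2011]
print(len(elite_years("")))           # 0
```

The length of the returned list gives a simple "number of elite years" feature per user.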
