# Instructions for Yelp Preprocessed Features

Running the notebook "Yelp Feature Processing" produces one pickle file (written in binary mode):

- yelp_businesses_cleaned.pickle

In [1]:
import pandas as pd
import numpy as np
import sklearn
import sklearn.preprocessing
import re
import pickle

## Load the Saved Pickle Objects

In [2]:
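# Make sure that the pickle file is in the current working directory.
# Note 'rb': pickle files must be opened for reading in binary mode.
# The `with` block closes the file handle once loading is done.
with open("yelp_businesses_cleaned.pickle", "rb") as f:
    businesses = pickle.load(f)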

## Data Description

The 0th column is the id (string) of the business.

Columns 1-6 (0-indexing) store the following features (in this order):
- name (string)
- url (string)
- price (string, e.g. "\$", "\$\$", etc.)
- rating (float)
- coordinates.latitude (float)
- coordinates.longitude (float)

Starting from column 7, each column is a 0/1 indicator of whether the business in that row belongs to a category. The category encoding is described in more detail below.
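
The layout described above can be sketched on a small stand-in frame. This is only an illustration: `toy`, its values, and the constant `N_META` are made up for the example and are not part of the pickle file. The same `iloc` slices should apply to `businesses` if its columns follow the order listed above.

```python
import pandas as pd

# Toy frame with the same column order as `businesses` (all values are made up).
toy = pd.DataFrame({
    "id": ["a1", "b2"],
    "name": ["Cafe A", "Bar B"],
    "url": ["https://example.com/a", "https://example.com/b"],
    "price": ["$", "$$"],
    "rating": [4.0, 3.5],
    "coordinates.latitude": [40.73, 40.74],
    "coordinates.longitude": [-73.99, -73.98],
    "Pubs": [0, 1],
    "Wine Bars": [1, 1],
})

N_META = 7  # hypothetical name: id plus the six feature columns

meta = toy.iloc[:, :N_META]        # columns 0-6: id and descriptive features
categories = toy.iloc[:, N_META:]  # columns 7 onwards: one 0/1 column per category

print(list(meta.columns))
print(list(categories.columns))
```

Splitting by position keeps the example independent of the exact category names, which differ between the toy frame and the real data.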

In [3]:
businesses.head()

| | id | name | url | price | rating | coordinates.latitude | coordinates.longitude | Pubs | Indian | Taiwanese | ... | Steakhouses | Tex-Mex | Peruvian | Gluten-Free | Diners | Portuguese | American (New) | Ramen | Szechuan | Shanghainese |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ETgJqJHV7BW6pIr9Ox74sA | Amélie | https://www.yelp.com/biz/am%C3%A9lie-new-york?... | \$\$ | 4.5 | 40.7327 | -73.99766 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | UA2M9QFZghe-9th2KwLoWQ | Burger & Lobster | https://www.yelp.com/biz/burger-and-lobster-ne... | \$\$ | 4.0 | 40.74007 | -73.99344 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 | 8Oo2AtQEPDfxIOnA8wfXoQ | 886 | https://www.yelp.com/biz/886-new-york?adjust_c... | \$\$ | 4.0 | 40.72877 | -73.98873 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | jjJc_CrkB2HodEinB6cWww | LoveMama | https://www.yelp.com/biz/lovemama-new-york?adj... | \$\$ | 4.0 | 40.730386 | -73.986061 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | CwOAKJdX8AMz5iAoA-ZEuA | Uglyduckling | https://www.yelp.com/biz/uglyduckling-brooklyn... | \$\$ | 4.0 | 40.686023 | -73.991302 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
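
Because `price` is stored as a string of dollar signs, a numeric price level may be more convenient for modelling. A minimal sketch, assuming the level is simply the number of `$` characters. The series `prices` and the name `price_level` are hypothetical, and whether the real data contains missing prices is not stated here; the `None` entry only shows how a missing value would propagate.

```python
import pandas as pd

# Made-up price strings in the same format as the `price` column.
prices = pd.Series(["$", "$$", "$$$$", None], name="price")

# String length equals the number of dollar signs; missing values stay missing.
price_level = prices.str.len()

print(price_level)
```

On the real frame the equivalent call would be `businesses["price"].str.len()`.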


### Business Categories

The possible business categories are the column names from column 7 (0-indexed) to the last column:

In [4]:
businesses.columns[7:]

Index(['Pubs', 'Indian', 'Taiwanese', 'Tacos', 'Meat Shops', 'Italian',
       'Speakeasies', 'Pasta Shops', 'Wine Bars', 'Barbeque',
       ...
       'Steakhouses', 'Tex-Mex', 'Peruvian', 'Gluten-Free', 'Diners',
       'Portuguese', 'American (New)', 'Ramen', 'Szechuan', 'Shanghainese'],
      dtype='object', length=115)

**If a business/row belongs to a category, the value in that category's column is 1; otherwise it is 0.**
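
With this 0/1 encoding, selecting businesses by category reduces to a boolean mask. The sketch below uses a made-up frame `toy` with two category columns; the same expressions should work on `businesses` with any of its category column names.

```python
import pandas as pd

# Made-up data: three businesses and two 0/1 category columns.
toy = pd.DataFrame({
    "name": ["Cafe A", "Bar B", "Bistro C"],
    "Wine Bars": [1, 1, 0],
    "French": [1, 0, 1],
})

# All businesses in a single category.
wine_bars = toy[toy["Wine Bars"] == 1]

# Businesses that belong to both categories (element-wise AND of two masks).
wine_and_french = toy[(toy["Wine Bars"] == 1) & (toy["French"] == 1)]

# Number of businesses per category: the column sums of the 0/1 indicators.
per_category = toy[["Wine Bars", "French"]].sum()

print(wine_bars["name"].tolist())
print(wine_and_french["name"].tolist())
print(per_category)
```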

A business/restaurant can belong to several categories. For example, the first business, with `id=ETgJqJHV7BW6pIr9Ox74sA` and `name=Amélie`, has the following categories:

In [5]:
Amelie = businesses[businesses['id'] == 'ETgJqJHV7BW6pIr9Ox74sA'].iloc[0, :]  # pd series object
# only 1 business with this id, hence indexing the 0th row/business

for category in Amelie.index[7:]:
    if Amelie[category] == 1:
        print(category)

Wine Bars
French
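
The same lookup can also be written without an explicit loop. This is one possible alternative, shown on a made-up frame `toy` that has only one non-category column, so the slice starts at position 1; for `businesses` the slice would start at position 7.

```python
import pandas as pd

# Made-up data: an id column followed by three 0/1 category columns.
toy = pd.DataFrame({
    "id": ["a1", "b2"],
    "Pubs": [0, 1],
    "Wine Bars": [1, 0],
    "French": [1, 0],
})

row = toy.loc[toy["id"] == "a1"].iloc[0]  # the single matching row as a Series
flags = row.iloc[1:]                      # category indicators only
active = flags[flags == 1].index.tolist() # names of the columns set to 1

print(active)  # ['Wine Bars', 'French']
```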
