We are working at _**"Board out of your mind"**_, the travellers' magazine of a large airline. They need an article about the top ten cities for dining in North America. We will use the well-maintained _Yelp_ dataset.

1. Gather the information from this [link](https://www.yelp.com/dataset) and create a dataframe;
2. Explore the data and learn more about the dataset;
3. Clean the dataset so it only has relevant data;
4. Find which cities have the most restaurants;
5. Find which cities have the best restaurants;
6. List the restaurant categories available;
7. Create a "persona" that you believe would relate to the audience of the magazine;
8. Pick the most suitable city;
9. Provide some extra information about the city.

The data are here: https://drive.google.com/file/d/11v6WVP3729ZK33CNyVQVYuqxlpvrqImg/view?usp=sharing


In [47]:
# Import packages
#import numpy
import pandas as pd
#import matplotlib.pyplot as plt
#import seaborn as sns
#import statsmodels
#import sklearn 

In [48]:
business_df = pd.read_csv("business.csv")

In [49]:
business_df.shape

(192609, 15)

In [50]:
business_df.head()

Unnamed: 0.1,Unnamed: 0,address,attributes,business_id,categories,city,hours,is_open,latitude,longitude,name,postal_code,review_count,stars,state
0,0,2818 E Camino Acequia Drive,{'GoodForKids': 'False'},1SWheh84yJXfytovILXOAQ,"Golf, Active Life",Phoenix,,0,33.522143,-112.018481,Arizona Biltmore Golf Club,85016,5,3.0,AZ
1,1,30 Eglinton Avenue W,"{'RestaurantsReservations': 'True', 'GoodForMe...",QXAEGFB4oINsVuTFxEYKFQ,"Specialty Food, Restaurants, Dim Sum, Imported...",Mississauga,"{'Monday': '9:0-0:0', 'Tuesday': '9:0-0:0', 'W...",1,43.605499,-79.652289,Emerald Chinese Restaurant,L5R 3E7,128,2.5,ON
2,2,"10110 Johnston Rd, Ste 15","{'GoodForKids': 'True', 'NoiseLevel': ""u'avera...",gnKjwL_1w79qoiV3IC_xQQ,"Sushi Bars, Restaurants, Japanese",Charlotte,"{'Monday': '17:30-21:30', 'Wednesday': '17:30-...",1,35.092564,-80.859132,Musashi Japanese Restaurant,28210,170,4.0,NC
3,3,"15655 W Roosevelt St, Ste 237",,xvX2CttrVhyG2z1dFg_0xw,"Insurance, Financial Services",Goodyear,"{'Monday': '8:0-17:0', 'Tuesday': '8:0-17:0', ...",1,33.455613,-112.395596,Farmers Insurance - Paul Lorenz,85338,3,5.0,AZ
4,4,"4209 Stuart Andrew Blvd, Ste F","{'BusinessAcceptsBitcoin': 'False', 'ByAppoint...",HhyxOkGAM07SRYtlQ4wMFQ,"Plumbing, Shopping, Local Services, Home Servi...",Charlotte,"{'Monday': '7:0-23:0', 'Tuesday': '7:0-23:0', ...",1,35.190012,-80.887223,Queen City Plumbing,28217,4,4.0,NC


In [51]:
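# Note: null_counts was deprecated in pandas 1.2 and removed in 2.0; newer versions use show_counts=True
business_df.info(verbose=True, null_counts=True)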

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 192609 entries, 0 to 192608
Data columns (total 15 columns):
Unnamed: 0      192609 non-null int64
address         184927 non-null object
attributes      163773 non-null object
business_id     192609 non-null object
categories      192127 non-null object
city            192608 non-null object
hours           147779 non-null object
is_open         192609 non-null int64
latitude        192609 non-null float64
longitude       192609 non-null float64
name            192609 non-null object
postal_code     191950 non-null object
review_count    192609 non-null int64
stars           192609 non-null float64
state           192609 non-null object
dtypes: float64(3), int64(3), object(9)
memory usage: 22.0+ MB


In [52]:
business_df[["stars","review_count"]].describe()

Unnamed: 0,stars,review_count
count,192609.0,192609.0
mean,3.585627,33.538962
std,1.018458,110.135224
min,1.0,3.0
25%,3.0,4.0
50%,3.5,9.0
75%,4.5,25.0
max,5.0,8348.0


In [53]:
# Select multiple columns (returns a DataFrame)
business_df[['business_id', 'name']].head()

Unnamed: 0,business_id,name
0,1SWheh84yJXfytovILXOAQ,Arizona Biltmore Golf Club
1,QXAEGFB4oINsVuTFxEYKFQ,Emerald Chinese Restaurant
2,gnKjwL_1w79qoiV3IC_xQQ,Musashi Japanese Restaurant
3,xvX2CttrVhyG2z1dFg_0xw,Farmers Insurance - Paul Lorenz
4,HhyxOkGAM07SRYtlQ4wMFQ,Queen City Plumbing


In [54]:
# Delete unwanted columns
business_df.drop(["Unnamed: 0"], axis=1, inplace=True)
# Change NaN values in columns
#business_df['attributes'].fillna('{}', inplace=True)

In [55]:
# Cleaning NaN rows
business_df = business_df.dropna(subset=["categories"])

In [56]:
restaurant_df = business_df[business_df['categories'].str.contains("Restaurants")]

In [57]:
restaurant_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 59371 entries, 1 to 192603
Data columns (total 14 columns):
address         58955 non-null object
attributes      57161 non-null object
business_id     59371 non-null object
categories      59371 non-null object
city            59371 non-null object
hours           45376 non-null object
is_open         59371 non-null int64
latitude        59371 non-null float64
longitude       59371 non-null float64
name            59371 non-null object
postal_code     59269 non-null object
review_count    59371 non-null int64
stars           59371 non-null float64
state           59371 non-null object
dtypes: float64(3), int64(2), object(9)
memory usage: 6.8+ MB


In [58]:
restaurant_df = restaurant_df[restaurant_df.is_open != 0]

In [59]:
restaurant_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 42237 entries, 1 to 192603
Data columns (total 14 columns):
address         41913 non-null object
attributes      40477 non-null object
business_id     42237 non-null object
categories      42237 non-null object
city            42237 non-null object
hours           34568 non-null object
is_open         42237 non-null int64
latitude        42237 non-null float64
longitude       42237 non-null float64
name            42237 non-null object
postal_code     42147 non-null object
review_count    42237 non-null int64
stars           42237 non-null float64
state           42237 non-null object
dtypes: float64(3), int64(2), object(9)
memory usage: 4.8+ MB


In [60]:
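# Reassign instead of dropping in place: restaurant_df is a filtered subset, and in-place edits on it can raise SettingWithCopyWarning
restaurant_df = restaurant_df.drop(columns=["is_open"])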

In [61]:
restaurant_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 42237 entries, 1 to 192603
Data columns (total 13 columns):
address         41913 non-null object
attributes      40477 non-null object
business_id     42237 non-null object
categories      42237 non-null object
city            42237 non-null object
hours           34568 non-null object
latitude        42237 non-null float64
longitude       42237 non-null float64
name            42237 non-null object
postal_code     42147 non-null object
review_count    42237 non-null int64
stars           42237 non-null float64
state           42237 non-null object
dtypes: float64(3), int64(1), object(9)
memory usage: 4.5+ MB


In [62]:
# My part of the analysis starts here

In [63]:
restaurant_df.head()

Unnamed: 0,address,attributes,business_id,categories,city,hours,latitude,longitude,name,postal_code,review_count,stars,state
1,30 Eglinton Avenue W,"{'RestaurantsReservations': 'True', 'GoodForMe...",QXAEGFB4oINsVuTFxEYKFQ,"Specialty Food, Restaurants, Dim Sum, Imported...",Mississauga,"{'Monday': '9:0-0:0', 'Tuesday': '9:0-0:0', 'W...",43.605499,-79.652289,Emerald Chinese Restaurant,L5R 3E7,128,2.5,ON
2,"10110 Johnston Rd, Ste 15","{'GoodForKids': 'True', 'NoiseLevel': ""u'avera...",gnKjwL_1w79qoiV3IC_xQQ,"Sushi Bars, Restaurants, Japanese",Charlotte,"{'Monday': '17:30-21:30', 'Wednesday': '17:30-...",35.092564,-80.859132,Musashi Japanese Restaurant,28210,170,4.0,NC
11,2450 E Indian School Rd,"{'RestaurantsTakeOut': 'True', 'BusinessParkin...",1Dfx3zM-rW4n-31KeC8sJg,"Restaurants, Breakfast & Brunch, Mexican, Taco...",Phoenix,"{'Monday': '7:0-0:0', 'Tuesday': '7:0-0:0', 'W...",33.495194,-112.028588,Taco Bell,85016,18,3.0,AZ
13,5981 Andrews Rd,"{'RestaurantsPriceRange2': '2', 'BusinessAccep...",fweCYi8FmbJXHCqLnwuk8w,"Italian, Restaurants, Pizza, Chicken Wings",Mentor-on-the-Lake,"{'Monday': '10:0-0:0', 'Tuesday': '10:0-0:0', ...",41.70852,-81.359556,Marco's Pizza,44060,16,4.0,OH
23,"Center Core - Food Court, Fl 3, Pittsburgh Int...","{'RestaurantsTakeOut': 'True', 'BusinessParkin...",1RHY4K3BD22FK7Cfftn8Mg,"Sandwiches, Salad, Restaurants, Burgers, Comfo...",Pittsburgh,,40.496177,-80.246011,Marathon Diner,15231,35,4.0,PA


In [64]:
restr_city = restaurant_df.sort_values(["city"])
restr_city.head()

Unnamed: 0,address,attributes,business_id,categories,city,hours,latitude,longitude,name,postal_code,review_count,stars,state
133942,2936 Finch Avenue E,"{'HasTV': 'True', 'RestaurantsGoodForGroups': ...",LixFCMGKdptI8WRsjAl5cQ,"Fast Food, Restaurants, Burgers",AGINCOURT,"{'Monday': '0:0-0:0', 'Tuesday': '0:0-0:0', 'W...",43.794305,-79.329995,McDonald's,M1W 2T4,7,2.0,ON
49301,3850 Sheppard Avenue E,"{'GoodForMeal': ""{'dessert': False, 'latenight...",U_ihDw5JhfmSKBUUkpEQqw,"Burgers, Restaurants, Fast Food",Agincourt,"{'Monday': '7:0-11:0', 'Tuesday': '7:0-11:0', ...",43.784517,-79.291325,McDonald's,M1T 3L4,5,2.5,ON
139099,4625 E Ray Rd,"{'RestaurantsPriceRange2': '1', 'RestaurantsRe...",0Rni7ocMC_Lg2UH0lDeKMQ,"Italian, Pizza, Sandwiches, Restaurants, Food",Ahwatukee,"{'Monday': '10:30-22:0', 'Tuesday': '10:30-22:...",33.318059,-111.983528,Barro's Pizza,85044,99,3.5,AZ
96160,"3646 E Ray Rd, Ste 20","{'RestaurantsGoodForGroups': 'True', 'Restaura...",eVv6cnwhabK8ig5Di2hXQQ,"Sandwiches, Italian, Pizza, Restaurants",Ahwatukee,"{'Monday': '16:0-21:0', 'Tuesday': '16:0-21:0'...",33.316764,-112.004009,Florencia Pizza Bistro,85044,194,4.0,AZ
148434,4232 E Chandler Blvd,"{'GoodForKids': 'True', 'RestaurantsGoodForGro...",qJf61TR4Jq9Xph5RiXPS9A,"Desserts, Food, Coffee & Tea, Creperies, Resta...",Ahwatukee,"{'Monday': '7:0-15:0', 'Tuesday': '7:0-15:0', ...",33.30586,-111.991327,Cupz N' Crepes,85048,300,4.0,AZ


In [65]:
restaurant_df["city"].value_counts()

Toronto             5253
Las Vegas           4221
Montréal            2815
Phoenix             2719
Calgary             2212
                    ... 
Squirrel Hill          1
Mercier                1
Mcfarland              1
springdale             1
Ville Mont-Royal       1
Name: city, Length: 732, dtype: int64
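
The sorted view earlier shows "AGINCOURT" and "Agincourt" as separate values, and the tail of these counts contains a lowercase "springdale", so differences in casing appear to split some cities across several rows. A minimal sketch of normalizing the names before counting, on a toy frame (`toy_df` and `city_clean` are illustrative names, not part of the notebook):

```python
import pandas as pd

# Toy stand-in for restaurant_df; the values are illustrative only.
toy_df = pd.DataFrame(
    {"city": ["AGINCOURT", "Agincourt", " Toronto", "Toronto", "springdale"]}
)

# Strip surrounding whitespace and use title case so spelling variants merge.
toy_df["city_clean"] = toy_df["city"].str.strip().str.title()

city_counts = toy_df["city_clean"].value_counts()
print(city_counts)
```

Title-casing also rewrites names such as "Mentor-on-the-Lake", so grouping on a lower-cased key and keeping the most frequent original spelling may be a better fit for the real data.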

In [66]:
# cities with at least 500 restaurants
vc = restaurant_df["city"].value_counts()
vc[vc >= 500]

Toronto        5253
Las Vegas      4221
Montréal       2815
Phoenix        2719
Calgary        2212
Charlotte      1961
Pittsburgh     1708
Mississauga    1142
Cleveland      1087
Scottsdale      931
Mesa            825
Madison         742
Tempe           649
Markham         567
Henderson       566
Chandler        546
Name: city, dtype: int64

In [67]:
# the largest count; .iloc makes the positional lookup explicit on a label-indexed Series
restaurant_df["city"].value_counts().iloc[0]

5253

In [68]:
type(restaurant_df["city"].value_counts())


pandas.core.series.Series

In [69]:
# the city with the most restaurants (5253)
restaurant_df["city"].value_counts().index.values[0]

'Toronto'

In [70]:
with_biggest_restrs_qt = restaurant_df["city"].value_counts().index.values[0]

In [71]:
with_biggest_restrs_qt

'Toronto'

In [72]:
best_restaurants = restaurant_df[restaurant_df["stars"] >=5.0]["city"].value_counts()

In [73]:
# number of 5.0-star restaurants per city (top 25 cities)
best_restaurants.head(25)

Las Vegas          138
Montréal           124
Toronto            115
Phoenix            105
Calgary             66
Charlotte           45
Pittsburgh          39
Cleveland           39
Scottsdale          37
Mississauga         22
Tempe               17
Mesa                14
Laval               14
Madison             14
Glendale            12
Chandler            11
Henderson           10
Gilbert              9
North Las Vegas      8
Mentor               7
Champaign            7
Vaughan              7
Markham              6
Lakewood             6
Newmarket            6
Name: city, dtype: int64
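
Counting 5-star restaurants favours large cities and treats a restaurant with 3 reviews the same as one with hundreds. One possible alternative, sketched here on toy data, is to keep only restaurants with a minimum number of reviews and then compare the mean rating and the share of highly rated places per city. The threshold `MIN_REVIEWS`, the 4.5 cut-off, and the column names of `city_quality` are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for restaurant_df; the values are illustrative only.
toy_df = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "B"],
    "stars": [5.0, 4.5, 3.0, 5.0, 2.0, 4.0],
    "review_count": [3, 120, 40, 60, 80, 10],
})

MIN_REVIEWS = 10  # assumed threshold, to be tuned on the real data

# Keep only restaurants with enough reviews for the rating to be meaningful.
reliable = toy_df[toy_df["review_count"] >= MIN_REVIEWS]

# Per city: number of restaurants, mean rating, share rated 4.5 or higher.
city_quality = reliable.groupby("city").agg(
    n_restaurants=("stars", "size"),
    mean_stars=("stars", "mean"),
    share_top=("stars", lambda s: (s >= 4.5).mean()),
)
print(city_quality.sort_values("mean_stars", ascending=False))
```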

In [74]:
# categories of restaurants available (counted per distinct combined category string)
restaurant_df["categories"].value_counts()

Pizza, Restaurants                                                                764
Restaurants, Pizza                                                                761
Chinese, Restaurants                                                              585
Restaurants, Chinese                                                              544
Mexican, Restaurants                                                              538
                                                                                 ... 
Halal, Mediterranean, Afghan, Restaurants                                           1
Sandwiches, Event Planning & Services, Caterers, Delis, Restaurants, Fast Food      1
Indian, Restaurants, Pizza, Vegan, Vegetarian                                       1
Italian, Restaurants, Butcher, Food                                                 1
Breakfast & Brunch, Restaurants, Food, Street Vendors                               1
Name: categories, Length: 23636, dtype: int64
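
The counts above treat each full category string as one value, so "Pizza, Restaurants" and "Restaurants, Pizza" are counted separately. A minimal sketch, on toy data, of counting individual category labels instead (it assumes pandas 0.25 or newer for `Series.explode`; `toy_df` and `category_counts` are illustrative names):

```python
import pandas as pd

# Toy stand-in for restaurant_df["categories"]; the values are illustrative only.
toy_df = pd.DataFrame({"categories": [
    "Pizza, Restaurants",
    "Restaurants, Pizza",
    "Chinese, Restaurants",
    "Restaurants, Italian, Pizza",
]})

category_counts = (
    toy_df["categories"]
    .str.split(",")        # one list of labels per restaurant
    .explode()             # one row per (restaurant, label) pair
    .str.strip()           # remove the space left after each comma
    .value_counts()
    .drop("Restaurants")   # every row has this label, so it carries no information
)
print(category_counts)
```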

In [75]:
# Persona: a young Chinese family with 2 children that likes sports and healthy food


# Drop rows with missing "attributes" first, then apply literal_eval
# so that each value in the "attributes" column becomes a dictionary


import ast

restaurant_df = restaurant_df.dropna(subset=["attributes"])
restaurant_df["attributes"] = restaurant_df["attributes"].apply(lambda x: ast.literal_eval(str(x)))
restaurant_df.head()

Unnamed: 0,address,attributes,business_id,categories,city,hours,latitude,longitude,name,postal_code,review_count,stars,state
1,30 Eglinton Avenue W,"{'RestaurantsReservations': 'True', 'GoodForMe...",QXAEGFB4oINsVuTFxEYKFQ,"Specialty Food, Restaurants, Dim Sum, Imported...",Mississauga,"{'Monday': '9:0-0:0', 'Tuesday': '9:0-0:0', 'W...",43.605499,-79.652289,Emerald Chinese Restaurant,L5R 3E7,128,2.5,ON
2,"10110 Johnston Rd, Ste 15","{'GoodForKids': 'True', 'NoiseLevel': 'u'avera...",gnKjwL_1w79qoiV3IC_xQQ,"Sushi Bars, Restaurants, Japanese",Charlotte,"{'Monday': '17:30-21:30', 'Wednesday': '17:30-...",35.092564,-80.859132,Musashi Japanese Restaurant,28210,170,4.0,NC
11,2450 E Indian School Rd,"{'RestaurantsTakeOut': 'True', 'BusinessParkin...",1Dfx3zM-rW4n-31KeC8sJg,"Restaurants, Breakfast & Brunch, Mexican, Taco...",Phoenix,"{'Monday': '7:0-0:0', 'Tuesday': '7:0-0:0', 'W...",33.495194,-112.028588,Taco Bell,85016,18,3.0,AZ
13,5981 Andrews Rd,"{'RestaurantsPriceRange2': '2', 'BusinessAccep...",fweCYi8FmbJXHCqLnwuk8w,"Italian, Restaurants, Pizza, Chicken Wings",Mentor-on-the-Lake,"{'Monday': '10:0-0:0', 'Tuesday': '10:0-0:0', ...",41.70852,-81.359556,Marco's Pizza,44060,16,4.0,OH
23,"Center Core - Food Court, Fl 3, Pittsburgh Int...","{'RestaurantsTakeOut': 'True', 'BusinessParkin...",1RHY4K3BD22FK7Cfftn8Mg,"Sandwiches, Salad, Restaurants, Burgers, Comfo...",Pittsburgh,,40.496177,-80.246011,Marathon Diner,15231,35,4.0,PA


In [76]:
# check that the values in the "attributes" column are dictionaries now

type(restaurant_df["attributes"].iloc[2])
# type(restaurant_df["hours"].iloc[2]) is string

dict

In [77]:
restaurant_df["attributes"].head()

1     {'RestaurantsReservations': 'True', 'GoodForMe...
2     {'GoodForKids': 'True', 'NoiseLevel': 'u'avera...
11    {'RestaurantsTakeOut': 'True', 'BusinessParkin...
13    {'RestaurantsPriceRange2': '2', 'BusinessAccep...
23    {'RestaurantsTakeOut': 'True', 'BusinessParkin...
Name: attributes, dtype: object
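
After `literal_eval`, the values inside each dictionary are still strings: `'True'`, `"u'average'"`, and (as the `GoodForMeal` entry in an earlier output suggests) nested dictionaries stored as text. A minimal sketch of parsing those inner values as well; `parse_nested` is a hypothetical helper, not part of the notebook, and the sample dictionary is made up for illustration.

```python
import ast

def parse_nested(value):
    """Return the parsed Python literal if `value` is a string holding one,
    otherwise return `value` unchanged."""
    if not isinstance(value, str):
        return value
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Plain words such as "average" are not Python literals; keep them as text.
        return value

# Illustrative attributes dictionary in the same style as the dataset.
raw = {
    "GoodForKids": "True",
    "GoodForMeal": "{'dessert': False, 'latenight': True}",
    "WiFi": "u'no'",
    "NoiseLevel": "average",
}
parsed = {key: parse_nested(value) for key, value in raw.items()}
print(parsed)
```

If this extra parsing were applied to the notebook's data, any later filter would have to compare against the boolean `True` rather than the string `"True"`.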

In [78]:
# the attribute keys present in the first row; we can build the filtering logic on them

restaurant_df["attributes"].iloc[0].keys()

dict_keys(['RestaurantsReservations', 'GoodForMeal', 'BusinessParking', 'Caters', 'NoiseLevel', 'RestaurantsTableService', 'RestaurantsTakeOut', 'RestaurantsPriceRange2', 'OutdoorSeating', 'BikeParking', 'Ambience', 'HasTV', 'WiFi', 'GoodForKids', 'Alcohol', 'RestaurantsAttire', 'RestaurantsGoodForGroups', 'RestaurantsDelivery'])
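
The keys above come from the first row only, and other restaurants may carry different attributes. A minimal sketch of collecting the keys across all rows and counting how often each one appears, using a toy list in place of the real column (`attribute_dicts` and `key_counts` are illustrative names):

```python
from collections import Counter

# Toy stand-in for restaurant_df["attributes"] after literal_eval.
attribute_dicts = [
    {"GoodForKids": "True", "HasTV": "False"},
    {"GoodForKids": "True", "WiFi": "u'no'"},
    {"OutdoorSeating": "True"},
]

# Count in how many dictionaries each key occurs.
key_counts = Counter(key for d in attribute_dicts for key in d)
print(key_counts.most_common())
```

On the notebook's frame, the same expression should work with `restaurant_df["attributes"]` in place of the list, since iterating over a Series yields its values.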

In [79]:
# we begin to construct the DataFrame we need
rest_family_df = restaurant_df.copy()
rest_family_df

Unnamed: 0,address,attributes,business_id,categories,city,hours,latitude,longitude,name,postal_code,review_count,stars,state
1,30 Eglinton Avenue W,"{'RestaurantsReservations': 'True', 'GoodForMe...",QXAEGFB4oINsVuTFxEYKFQ,"Specialty Food, Restaurants, Dim Sum, Imported...",Mississauga,"{'Monday': '9:0-0:0', 'Tuesday': '9:0-0:0', 'W...",43.605499,-79.652289,Emerald Chinese Restaurant,L5R 3E7,128,2.5,ON
2,"10110 Johnston Rd, Ste 15","{'GoodForKids': 'True', 'NoiseLevel': 'u'avera...",gnKjwL_1w79qoiV3IC_xQQ,"Sushi Bars, Restaurants, Japanese",Charlotte,"{'Monday': '17:30-21:30', 'Wednesday': '17:30-...",35.092564,-80.859132,Musashi Japanese Restaurant,28210,170,4.0,NC
11,2450 E Indian School Rd,"{'RestaurantsTakeOut': 'True', 'BusinessParkin...",1Dfx3zM-rW4n-31KeC8sJg,"Restaurants, Breakfast & Brunch, Mexican, Taco...",Phoenix,"{'Monday': '7:0-0:0', 'Tuesday': '7:0-0:0', 'W...",33.495194,-112.028588,Taco Bell,85016,18,3.0,AZ
13,5981 Andrews Rd,"{'RestaurantsPriceRange2': '2', 'BusinessAccep...",fweCYi8FmbJXHCqLnwuk8w,"Italian, Restaurants, Pizza, Chicken Wings",Mentor-on-the-Lake,"{'Monday': '10:0-0:0', 'Tuesday': '10:0-0:0', ...",41.708520,-81.359556,Marco's Pizza,44060,16,4.0,OH
23,"Center Core - Food Court, Fl 3, Pittsburgh Int...","{'RestaurantsTakeOut': 'True', 'BusinessParkin...",1RHY4K3BD22FK7Cfftn8Mg,"Sandwiches, Salad, Restaurants, Burgers, Comfo...",Pittsburgh,,40.496177,-80.246011,Marathon Diner,15231,35,4.0,PA
...,...,...,...,...,...,...,...,...,...,...,...,...,...
192587,578 Yonge Street,"{'RestaurantsPriceRange2': '2', 'RestaurantsGo...",oS0CnUbyv0GUoD3L8_3UPQ,"Restaurants, Thai",Toronto,"{'Monday': '0:0-0:0', 'Tuesday': '11:0-23:0', ...",43.665120,-79.384809,Thai Fantasy,M4Y 1Z3,113,4.0,ON
192589,3863 Medina Rd,"{'RestaurantsPriceRange2': '2', 'HasTV': 'Fals...",ghovD5ZTGDQ5Q2U4ERddWw,"Burgers, Restaurants, Fast Food, American (New)",Fairlawn,"{'Monday': '11:0-22:0', 'Tuesday': '11:0-22:0'...",41.136561,-81.639712,Five Guys,44333,22,4.0,OH
192595,1450 Kingston Rd,"{'BikeParking': 'True', 'RestaurantsAttire': '...",h3QErqS3OZgLJ5Tb6-sLyQ,"Restaurants, Soup, Chinese, Caribbean",Pickering,"{'Monday': '11:0-21:30', 'Tuesday': '11:0-21:3...",43.841844,-79.083881,Asia Hut,L1V 1C1,4,4.5,ON
192596,948 Boulevard de Maisonneuve Est,"{'GoodForKids': 'True', 'WiFi': 'u'no'', 'Rest...",KnafX7T6qSAmSrLhd709vA,"Vietnamese, Soup, Restaurants",Montréal,"{'Monday': '12:0-21:0', 'Tuesday': '12:0-21:0'...",45.517430,-73.558873,Pho Maisonneuve,H2L 1Z1,25,4.0,QC


In [80]:
type(rest_family_df)

pandas.core.frame.DataFrame

In [81]:
# preview the frame without rows that have a null "address" (the result is not assigned back)
rest_family_df.dropna(subset = ['address']).info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 40185 entries, 1 to 192603
Data columns (total 13 columns):
address         40185 non-null object
attributes      40185 non-null object
business_id     40185 non-null object
categories      40185 non-null object
city            40185 non-null object
hours           33237 non-null object
latitude        40185 non-null float64
longitude       40185 non-null float64
name            40185 non-null object
postal_code     40151 non-null object
review_count    40185 non-null int64
stars           40185 non-null float64
state           40185 non-null object
dtypes: float64(3), int64(1), object(9)
memory usage: 4.3+ MB


In [82]:
# Persona: a young Chinese family with 2 children that likes sports and healthy food
# List of requirements (based on the "attributes" column): all of them must be "True"

kwargs = {'attributes': ["OutdoorSeating", "HasTV", "BikeParking",
                         "GoodForKids", "RestaurantsReservations", "RestaurantsTakeOut"]}

In [83]:
def condition(x):
    # x is the attributes dictionary of one restaurant
    def label_true(label):
        # attribute values are stored as strings, so compare with "True"
        return x.get(label) == "True"
    cond1 = label_true("OutdoorSeating") and label_true("HasTV") and label_true("BikeParking")
    cond2 = label_true("GoodForKids") and label_true("RestaurantsReservations") and label_true("RestaurantsTakeOut")
    return cond1 and cond2

# keep only the restaurants that satisfy every requirement
rest_family_df = rest_family_df[rest_family_df["attributes"].map(condition)]
rest_family_df.head()

Unnamed: 0,address,attributes,business_id,categories,city,hours,latitude,longitude,name,postal_code,review_count,stars,state
116,165 East Liberty Street,"{'RestaurantsTakeOut': 'True', 'RestaurantsTab...",gyFYZV4b_9TxG1ulQNi0Ig,"Middle Eastern, Restaurants, Salad, Breakfast ...",Toronto,"{'Monday': '10:0-22:0', 'Tuesday': '10:0-22:0'...",43.638442,-79.417237,Paramount Fine Foods,M6K 3K4,40,2.0,ON
141,"7712 Sossamon Ln NW, Unit 140","{'NoiseLevel': ''quiet'', 'RestaurantsReservat...",L0aSDVHNXCl6sY4cfZQ-5Q,"Restaurants, Thai",Concord,"{'Monday': '11:0-21:0', 'Tuesday': '11:0-21:0'...",35.364974,-80.708763,Mai Thai II,28027,108,4.0,NC
157,1514 East Blvd,"{'RestaurantsReservations': 'True', 'BusinessA...",U3kygJOTlTQFlfaZS7sQjA,"Caterers, Hot Dogs, Restaurants, Vegetarian, A...",Charlotte,"{'Monday': '11:0-21:0', 'Tuesday': '11:0-21:0'...",35.199798,-80.842295,JJ's Red Hots - Dilworth,28203,380,4.0,NC
229,"380 Tower Hill Road, Units 23-24-25","{'Alcohol': 'u'full_bar'', 'NoiseLevel': ''ave...",VA232TY-ThR8L5wjuuLZVg,"Mediterranean, Restaurants, Middle Eastern, Tu...",Richmond Hill,"{'Monday': '11:0-22:0', 'Tuesday': '11:0-22:0'...",43.913723,-79.473839,Sofra Mediterranean Cuisine,L4E 0T8,33,3.0,ON
260,1012 17 Avenue SW,"{'HasTV': 'True', 'RestaurantsTakeOut': 'True'...",tWjfgVtTD5n01Cq9dFWGsA,"Italian, Canadian (New), Food, American (New),...",Calgary,"{'Monday': '0:0-0:0', 'Tuesday': '11:0-22:0', ...",51.038087,-114.084968,Cibo,T2T 0A5,185,3.5,AB


In [84]:
#city_family_df = rest_family_df.loc[0:, ["city", "state"]]
rest_family_df["city"].value_counts().head(10)


Toronto       214
Phoenix       150
Las Vegas     137
Scottsdale    134
Charlotte     133
Montréal       89
Calgary        74
Mesa           62
Pittsburgh     61
Chandler       59
Name: city, dtype: int64

In [85]:
type(rest_family_df['categories'].iloc[0])

str

In [86]:
rest_family_df['categories'].iloc[0]

'Middle Eastern, Restaurants, Salad, Breakfast & Brunch'

In [87]:
rest_family_df1 = rest_family_df.copy() 

In [88]:
rest_family_df1.loc[:,['city', 'state', 'address']].head()


Unnamed: 0,city,state,address
116,Toronto,ON,165 East Liberty Street
141,Concord,NC,"7712 Sossamon Ln NW, Unit 140"
157,Charlotte,NC,1514 East Blvd
229,Richmond Hill,ON,"380 Tower Hill Road, Units 23-24-25"
260,Calgary,AB,1012 17 Avenue SW


In [89]:

rest_family_df1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2118 entries, 116 to 192487
Data columns (total 13 columns):
address         2114 non-null object
attributes      2118 non-null object
business_id     2118 non-null object
categories      2118 non-null object
city            2118 non-null object
hours           2017 non-null object
latitude        2118 non-null float64
longitude       2118 non-null float64
name            2118 non-null object
postal_code     2118 non-null object
review_count    2118 non-null int64
stars           2118 non-null float64
state           2118 non-null object
dtypes: float64(3), int64(1), object(9)
memory usage: 231.7+ KB


In [90]:
# The most suitable city is Toronto (when counting by the number of suitable restaurants)


In [91]:
# Task not done yet: choose the most suitable city based on the "density" of suitable restaurants,
# i.e. suitable restaurants / city population; this would pick the most suitable "environment"
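
The density idea noted above could be sketched as follows. The Yelp data contains no population figures, so the `population` Series here holds placeholder numbers for made-up cities; real values would have to come from an external source. The counts in `suitable` are also toy numbers, not results from the notebook.

```python
import pandas as pd

# Number of suitable restaurants per city (toy numbers, not from the dataset).
suitable = pd.Series({"City A": 200, "City B": 150, "City C": 60})

# Hypothetical populations; real figures are not part of the Yelp dataset.
population = pd.Series({"City A": 2_000_000, "City B": 500_000, "City C": 100_000})

# Suitable restaurants per 100,000 inhabitants, highest density first.
density = (suitable / population * 100_000).sort_values(ascending=False)
print(density)
```

The two Series are aligned on the city name, so a city missing from either one would produce `NaN` rather than a wrong value.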



In [92]:
# Several cities can share the same name (for example, there is more than one Phoenix in the US),
# although only one of them seems to appear in this list. The analysis is therefore incomplete and will be improved in the next notebooks
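
The caveat above about same-named cities can be checked by grouping on both `city` and `state`. A minimal sketch on toy data (the rows are illustrative, and `pair_counts` and `ambiguous` are names introduced here):

```python
import pandas as pd

# Toy stand-in for restaurant_df; the rows are illustrative only.
toy_df = pd.DataFrame({
    "city": ["Springfield", "Springfield", "Springfield", "Toronto"],
    "state": ["IL", "IL", "MO", "ON"],
})

# Count restaurants per (city, state) pair so same-named cities stay separate.
pair_counts = toy_df.groupby(["city", "state"]).size().sort_values(ascending=False)

# City names that occur in more than one state.
states_per_city = toy_df.groupby("city")["state"].nunique()
ambiguous = states_per_city[states_per_city > 1].index.tolist()

print(pair_counts)
print(ambiguous)
```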
