## Opening an Italian Restaurant in Atlanta

As a highly populated city, Atlanta is an excellent place to start a new business, especially a restaurant. In this project we want to find the best locations in Atlanta to open an Italian restaurant in order to increase the investors' profits. We focus on areas that are not saturated with restaurants but are accessible and have a population that likes this kind of place.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if geopy is not installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

### Collecting the Data

For the data collection we search within a radius of 15 km from the center of Atlanta. We look for all Italian places, not only restaurants, using the word Italian as the search query.

* We use the Foursquare API to obtain the data related to the Italian places in Atlanta, including the coordinates of each venue.
* We use the Nominatim API (through geopy) to obtain the coordinates of the center of Atlanta, which anchors the search.
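
The search request described above can be sketched as follows. This is a minimal illustration that only builds the URL and sends nothing. `build_search_url` is a hypothetical helper, the credentials and coordinates are placeholders, and the parameter names mirror the query string used in the search cell of this notebook.

```python
from urllib.parse import urlencode

# Hypothetical helper: assembles the venue search URL from named parameters.
def build_search_url(client_id, client_secret, lat, lng, query, radius_m, limit,
                     version="20180604"):
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",  # search center as "lat,lng"
        "v": version,
        "query": query,
        "radius": radius_m,    # metres, so 15 km is 15000
        "limit": limit,
    }
    # urlencode percent-encodes the values (the comma in "ll" becomes %2C)
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)

# Placeholder credentials and approximate coordinates, for illustration only.
url = build_search_url("YOUR_ID", "YOUR_SECRET", 33.749, -84.388, "Italian", 15000, 100)
print(url)
```

Passing the parameters as a dictionary keeps each value next to its name and leaves the escaping to the standard library, which is easier to check than a long positional `format` call.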


In [2]:
CLIENT_ID = 'IVUX2ATVRIVIAT3MYAOI3BNB0N5X2BCPEGK3W0FCX5RNN1HN' # your Foursquare ID
CLIENT_SECRET = '1UKWCBE54WZK50IHDYJXAK3GJCXPGQELW5QE30LXCGRA4MG2' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 100

In [3]:
geolocator = Nominatim(user_agent="foursquare_agent") 
address='Atlanta, Georgia, GA'
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

In [4]:
search_query = 'Italian'
radius = 15000
LIMIT = 1000
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
results=requests.get(url).json()
venues=results['response']['venues']     
# transform venues into a dataframe
dataframe = json_normalize(venues)    

In [5]:
dataframe.head()

Unnamed: 0,categories,delivery.id,delivery.provider.icon.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.name,delivery.url,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d110941735', 'name': 'I...",,,,,,,False,57cdcdac498eb8c81a58b1d4,Concourse C,US,Atlanta,United States,ATL Airport,12685,"[Concourse C (ATL Airport), Atlanta, GA 30337,...","[{'label': 'display', 'lat': 33.64081508392147...",33.640815,-84.43287,30337.0,GA,Carrabba's Italian Grill,v-1580867581,
1,"[{'id': '4bf58dd8d48988d104951735', 'name': 'B...",,,,,,,False,4c07d42f221620a14b65f775,213 Mitchell St SW,US,Atlanta,United States,Forsyth,516,"[213 Mitchell St SW (Forsyth), Atlanta, GA 303...","[{'label': 'display', 'lat': 33.75168442726135...",33.751684,-84.394816,30303.0,GA,Meockie Mens Italian Suits & Accessories,v-1580867581,
2,"[{'id': '4bf58dd8d48988d1c9941735', 'name': 'I...",,,,,,,False,5054dc77e4b0ff6b61ea3d4d,,US,Atlanta,United States,,1184,"[Atlanta, GA 30303, United States]","[{'label': 'display', 'lat': 33.759367, 'lng':...",33.759367,-84.393521,30303.0,GA,Italian Ice,v-1580867581,
3,"[{'id': '4bf58dd8d48988d1cb941735', 'name': 'F...",,,,,,,False,5403c63b498ea3425961c6f7,,US,Atlanta,United States,,760,"[Atlanta, GA, United States]","[{'label': 'display', 'lat': 33.74513244628906...",33.745132,-84.383499,,GA,PICCOLO ITALIAN FOOD TRUCK,v-1580867581,
4,"[{'id': '4bf58dd8d48988d110941735', 'name': 'I...",1209944.0,/delivery_provider_grubhub_20180129.png,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",grubhub,https://www.grubhub.com/restaurant/carrabbas-i...,False,4afb504ef964a520d61c22e3,2999 Cumberland Blvd SE,US,Atlanta,United States,,16621,"[2999 Cumberland Blvd SE, Atlanta, GA 30339, U...","[{'label': 'display', 'lat': 33.88115033545552...",33.88115,-84.47407,30339.0,GA,Carrabba's Italian Grill,v-1580867581,


Now we filter the dataframe to keep only the columns we need: the venue name, its category, and the location fields.
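
The dotted column names such as `location.lat` come from flattening the nested JSON returned by the API. The sketch below shows the idea in plain Python. It is an illustration only, not how pandas implements `json_normalize`; `flatten` is a hypothetical helper and the venue is a trimmed-down record modeled on the rows shown above.

```python
# Illustrative flattening of one nested record; not pandas' actual implementation.
def flatten(record, prefix=""):
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # nested dictionaries become "parent.child" columns
            flat.update(flatten(value, prefix=name + "."))
        else:
            # lists and scalars are kept as they are
            flat[name] = value
    return flat

# Trimmed-down venue record, for illustration only.
venue = {
    "name": "Italian Ice",
    "categories": [{"name": "Ice Cream Shop"}],
    "location": {"lat": 33.759367, "lng": -84.393521, "city": "Atlanta"},
}

flat = flatten(venue)
# same selection rule as in the notebook: name, categories, and every location field
kept = ["name", "categories"] + [col for col in flat if col.startswith("location.")]
print(kept)
```

Note that `categories` stays a list of dictionaries after flattening, which is why the notebook extracts the first category name in a separate step.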

In [6]:
filtered_columns=['name','categories']+[col for col in dataframe.columns if col.startswith('location.')]

In [7]:
# work on an explicit copy so the assignment below does not trigger SettingWithCopyWarning
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()

# keep only the name of the first category of each venue
dataframe_filtered['categories'] = dataframe_filtered['categories'].apply(
    lambda categ_list: categ_list[0]['name'] if len(categ_list) > 0 else 'no category'
)


In [8]:
dataframe_filtered.columns = [col.split('.')[-1] for col in dataframe_filtered.columns]
# drop the location columns that are not needed for the analysis
dataframe_filtered.drop(
    ['formattedAddress', 'labeledLatLngs', 'crossStreet', 'state', 'address', 'cc', 'country'],
    axis=1, inplace=True)

In [9]:
dataframe_filtered=dataframe_filtered[dataframe_filtered['city']=='Atlanta']  

In [10]:
dataframe_filtered.drop(['city'], axis=1,inplace=True)
dataframe_filtered.head()

Unnamed: 0,name,categories,distance,lat,lng,postalCode
0,Carrabba's Italian Grill,Italian Restaurant,12685,33.640815,-84.43287,30337.0
1,Meockie Mens Italian Suits & Accessories,Boutique,516,33.751684,-84.394816,30303.0
2,Italian Ice,Ice Cream Shop,1184,33.759367,-84.393521,30303.0
3,PICCOLO ITALIAN FOOD TRUCK,Food Truck,760,33.745132,-84.383499,
4,Carrabba's Italian Grill,Italian Restaurant,16621,33.88115,-84.47407,30339.0


In [11]:
dataframe_filtered=dataframe_filtered[dataframe_filtered['categories'].str.contains('|'.join(['Italian Restaurant','Pizza Place']))]

In [12]:
dataframe_filtered['postalCode'].unique()

array(['30337', '30339', '30318', nan, '30308', '30309', '30324', '30303',
       '30305', '30326', '30342', '30320', '30341'], dtype=object)

In [13]:
dataframe_filtered.replace(np.nan, 30308, inplace=True)
dataframe_filtered.reset_index()

Unnamed: 0,index,name,categories,distance,lat,lng,postalCode
0,0,Carrabba's Italian Grill,Italian Restaurant,12685,33.640815,-84.43287,30337
1,4,Carrabba's Italian Grill,Italian Restaurant,16621,33.88115,-84.47407,30339
2,5,Artuzzi's Italian Kitchen,Italian Restaurant,6502,33.804112,-84.413797,30318
3,6,Italian Bistro,Pizza Place,3326,33.778747,-84.385658,30308
4,9,Ginas Italian Restaurant and Pizzeria,Pizza Place,2805,33.773899,-84.384804,30308
5,13,Pasta Mia Pizzeria and Italian Restaurant,Pizza Place,3866,33.7834,-84.3836,30309
6,14,Nino's Italian Restaurant,Italian Restaurant,7329,33.80988,-84.35973,30324
7,16,Baraonda Cafe Italiano,Italian Restaurant,2791,33.773785,-84.384856,30308
8,18,Baroni Casual Italian,Italian Restaurant,5820,33.801335,-84.392922,30309
9,19,Papino's,Italian Restaurant,1553,33.762595,-84.385932,30303


In [14]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centered on Atlanta

# add a red circle marker to represent the center of Atlanta
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Center of Atlanta',
    fill = True,
    fill_color = 'red',
   # fill_opacity = 0.6
).add_to(venues_map)

# add the Italian restaurants as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)


# display map
venues_map

In [16]:
df=pd.get_dummies(dataframe_filtered)
kmeans=KMeans(n_clusters=3, random_state=0).fit(df)

In [17]:
kmeans.labels_

array([2, 2, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 2, 2, 2, 2, 2],
      dtype=int32)

In [18]:
dataframe_filtered['Cluster Labels']=kmeans.labels_
dataframe_filtered.reset_index()


Unnamed: 0,index,name,categories,distance,lat,lng,postalCode,Cluster Labels
0,0,Carrabba's Italian Grill,Italian Restaurant,12685,33.640815,-84.43287,30337,2
1,4,Carrabba's Italian Grill,Italian Restaurant,16621,33.88115,-84.47407,30339,2
2,5,Artuzzi's Italian Kitchen,Italian Restaurant,6502,33.804112,-84.413797,30318,0
3,6,Italian Bistro,Pizza Place,3326,33.778747,-84.385658,30308,1
4,9,Ginas Italian Restaurant and Pizzeria,Pizza Place,2805,33.773899,-84.384804,30308,1
5,13,Pasta Mia Pizzeria and Italian Restaurant,Pizza Place,3866,33.7834,-84.3836,30309,1
6,14,Nino's Italian Restaurant,Italian Restaurant,7329,33.80988,-84.35973,30324,0
7,16,Baraonda Cafe Italiano,Italian Restaurant,2791,33.773785,-84.384856,30308,1
8,18,Baroni Casual Italian,Italian Restaurant,5820,33.801335,-84.392922,30309,0
9,19,Papino's,Italian Restaurant,1553,33.762595,-84.385932,30303,1


In [20]:
cluster_map=folium.Map(location=[latitude,longitude],zoom_start=13)

# set color scheme for the clusters
x = np.arange(3)
ys = [i + x + (i*x)**2 for i in range(3)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Center of Atlanta',
    fill = True,
    fill_color = 'red',
   # fill_opacity = 0.6
).add_to(cluster_map)



# add each venue as a circle marker colored by its cluster label
for lat, lng, cat, cluster in zip(dataframe_filtered['lat'], dataframe_filtered['lng'],
                                  dataframe_filtered['categories'], dataframe_filtered['Cluster Labels']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=str(cat),
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7
    ).add_to(cluster_map)

In [21]:
cluster_map

In [22]:
select_cluster=dataframe_filtered[dataframe_filtered['Cluster Labels']==1]

In [23]:
select_cluster

Unnamed: 0,name,categories,distance,lat,lng,postalCode,Cluster Labels
6,Italian Bistro,Pizza Place,3326,33.778747,-84.385658,30308,1
9,Ginas Italian Restaurant and Pizzeria,Pizza Place,2805,33.773899,-84.384804,30308,1
13,Pasta Mia Pizzeria and Italian Restaurant,Pizza Place,3866,33.7834,-84.3836,30309,1
16,Baraonda Cafe Italiano,Italian Restaurant,2791,33.773785,-84.384856,30308,1
19,Papino's,Italian Restaurant,1553,33.762595,-84.385932,30303,1
23,Sono Italiano,Italian Restaurant,2350,33.770114,-84.387711,30308,1
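
Cluster 1 groups the venues closest to the center. The sketch below, in plain Python with the distances copied from the table above, shows how a radius that just covers this cluster could be derived. The one-metre margin is an assumption, added so that the farthest venue would not be dropped by a strict comparison.

```python
# (name, distance from the center in metres) for the venues in cluster 1
cluster_1 = [
    ("Italian Bistro", 3326),
    ("Ginas Italian Restaurant and Pizzeria", 2805),
    ("Pasta Mia Pizzeria and Italian Restaurant", 3866),
    ("Baraonda Cafe Italiano", 2791),
    ("Papino's", 1553),
    ("Sono Italiano", 2350),
]

# the venue farthest from the center determines the radius
farthest_name, farthest_distance = max(cluster_1, key=lambda venue: venue[1])
new_radius = farthest_distance + 1  # assumed one-metre margin

print(farthest_name, farthest_distance, new_radius)
```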


In [61]:
# the farthest venue in cluster 1 is 3866 m from the center, so we repeat the search with a radius just above that

geolocator = Nominatim(user_agent="foursquare_agent") 
address='Atlanta, Georgia, GA'
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

In [62]:
search_query = 'Italian'
radius = 3867
LIMIT = 1000
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
results=requests.get(url).json()
venues=results['response']['venues']     
# transform venues into a dataframe
data_rest = json_normalize(venues)    

In [63]:
filtered_columns=['name','categories']+[col for col in data_rest.columns if col.startswith('location.') ]
# work on an explicit copy so the assignment below does not trigger SettingWithCopyWarning
dataframe_filtered = data_rest.loc[:, filtered_columns].copy()

# keep only the name of the first category of each venue
dataframe_filtered['categories'] = dataframe_filtered['categories'].apply(
    lambda categ_list: categ_list[0]['name'] if len(categ_list) > 0 else 'no category'
)

In [64]:
dataframe_filtered.columns=[col.split('.')[-1] for col in dataframe_filtered.columns]

In [65]:
# drop the location columns that are not needed for the analysis
dataframe_filtered.drop(
    ['formattedAddress', 'labeledLatLngs', 'crossStreet', 'state', 'address', 'cc', 'country'],
    axis=1, inplace=True)

In [66]:
dataframe_filtered.drop(['city'], axis=1,inplace=True)
#dataframe_filtered.drop(['neighborhood'], axis=1,inplace=True)

In [67]:
dataframe_filtered

Unnamed: 0,name,categories,distance,lat,lng,postalCode
0,Meockie Mens Italian Suits & Accessories,Boutique,516,33.751684,-84.394816,30303.0
1,Italian Ice,Ice Cream Shop,1184,33.759367,-84.393521,30303.0
2,PICCOLO ITALIAN FOOD TRUCK,Food Truck,760,33.745132,-84.383499,
3,Italian Bistro,Pizza Place,3326,33.778747,-84.385658,
4,Italian House of Tamanza,Sorority House,3528,33.778762,-84.376745,
5,Voga Italian Gelato,Ice Cream Shop,3315,33.762442,-84.358162,30307.0
6,Noni's Bar & Deli,Bar,1435,33.754262,-84.375967,30312.0
7,Ginas Italian Restaurant and Pizzeria,Pizza Place,2805,33.773899,-84.384804,30308.0
8,Giovanni's Italian Ice,Ice Cream Shop,3701,33.781941,-84.39642,
9,Pasta Mia Pizzeria and Italian Restaurant,Pizza Place,3866,33.7834,-84.3836,30309.0


In [68]:
dataframe_filtered.loc[2,'postalCode']=30303

In [69]:
dataframe_filtered.replace(np.nan, 30307, inplace=True)

In [70]:
dataframe_filtered

Unnamed: 0,name,categories,distance,lat,lng,postalCode
0,Meockie Mens Italian Suits & Accessories,Boutique,516,33.751684,-84.394816,30303
1,Italian Ice,Ice Cream Shop,1184,33.759367,-84.393521,30303
2,PICCOLO ITALIAN FOOD TRUCK,Food Truck,760,33.745132,-84.383499,30303
3,Italian Bistro,Pizza Place,3326,33.778747,-84.385658,30307
4,Italian House of Tamanza,Sorority House,3528,33.778762,-84.376745,30307
5,Voga Italian Gelato,Ice Cream Shop,3315,33.762442,-84.358162,30307
6,Noni's Bar & Deli,Bar,1435,33.754262,-84.375967,30312
7,Ginas Italian Restaurant and Pizzeria,Pizza Place,2805,33.773899,-84.384804,30308
8,Giovanni's Italian Ice,Ice Cream Shop,3701,33.781941,-84.39642,30307
9,Pasta Mia Pizzeria and Italian Restaurant,Pizza Place,3866,33.7834,-84.3836,30309
