# Capstone Project - Restaurant Chain Expansion

#### Applied Data Science Capstone via IBM/Coursera

##### By: Bryce Goodsite

## Introduction: Business Problem

Expanding a family-run, brick-and-mortar business is a major decision, and one of the biggest parts of it is choosing the location of the second venue. Much thought needs to go into selecting a new location, including proximity to customers, rent, and the space available. In recent years, several trends have emerged that help signal that an area is “up-and-coming”.

I have been approached by the owner of a well-known, up-and-coming restaurant about 30 minutes outside of Atlanta, out in the suburbs. The restaurant is a brewpub that features house beers and high-quality food, yet it is not so upscale that it scares off the average customer looking for a casual lunch or dinner. The restaurant has been tremendously successful locally, and the owner wants to open a second location in Atlanta proper, as he believes there is a great opportunity for his business to do well in the metro area.

## Data 

The data for the neighborhoods of Atlanta was pulled from a list of the neighborhoods on a Wikipedia page, while the rest of the information was ascertained using the Foursquare API.

First, we import the libraries we will be working with:

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Getting data for Atlanta neighborhoods:




In [2]:
# Use this URL
# https://en.wikipedia.org/wiki/Table_of_Atlanta_neighborhoods_by_population
# https://en.wikipedia.org/wiki/Neighborhoods_in_Atlanta

df = pd.read_html('https://en.wikipedia.org/wiki/Table_of_Atlanta_neighborhoods_by_population')

In [3]:
# read_html returns a list of tables; keep the first one
df = df[0]

In [4]:
print(df.shape)

# df = df.drop(['Population (2010)','NPU'], axis =1)
df.head()

(161, 3)


Unnamed: 0,Neighborhood,Population (2010),NPU
0,Adair Park,1331,V
1,Adams Park,1763,R
2,Adamsville,2403,H
3,Almond Park,1020,G
4,Ansley Park,2277,E



### Neighborhood Coordinates

Let's create latitude & longitude coordinates for our candidate neighborhoods by geocoding each neighborhood name.

Let's first find the latitude & longitude of the Atlanta city center, using a well-known address.

----------------------------------------------------------------------------------------

Geographic Center of Atlanta:

In [6]:
address = 'Downtown Atlanta, Atlanta, GA'

atl_geolocator = Nominatim(user_agent="ny_explorer")
atl_location = atl_geolocator.geocode(address)
atl_lat = atl_location.latitude
atl_lon = atl_location.longitude
atl_center = (atl_lat,atl_lon)
print('The geographical coordinates of Atlanta are {}, {}.'.format(atl_lat, atl_lon))

The geographical coordinates of Atlanta are 33.7509748, -84.3930464.


Next, we'll build out the rest of the dataframe so that we can obtain coordinates for each neighborhood:

In [7]:
# Loop used to obtain locations of each neighborhood:
end_str = ", Atlanta, GA"
full_name = []
location = []
lat = []
lon =[]

for i in range(df.shape[0]):
    neighborhood_full_name = (df.loc[i,'Neighborhood']+end_str)
    full_name.append(neighborhood_full_name)
    
    geolocator = Nominatim(user_agent="ny_explorer")
    locat = geolocator.geocode(neighborhood_full_name)
    
    if locat is None:
        locat = 'NO ADDRESS'
        location.append(locat)
        lat1 = 0
        lon1 = 0
        lat.append(lat1)
        lon.append(lon1)
        
    else:
        location.append(locat)
        lat1 = locat.latitude
        lat.append(lat1)
    
        lon1 = locat.longitude
        lon.append(lon1)
    
df["Full Name"] = full_name
df["Coords"] = location
df["Latitude"] = lat
df["Longitude"] = lon

df.head(15) 


Unnamed: 0,Neighborhood,Population (2010),NPU,Full Name,Coords,Latitude,Longitude
0,Adair Park,1331,V,"Adair Park, Atlanta, GA","(Adair Park, Pittsburgh, Vine City, Atlanta, F...",33.724685,-84.411146
1,Adams Park,1763,R,"Adams Park, Atlanta, GA","(Adams Park, Atlanta, Fulton County, Georgia, ...",33.712052,-84.456873
2,Adamsville,2403,H,"Adamsville, Atlanta, GA","(Adamsville, Boulder Park, Atlanta, Fulton Cou...",33.759274,-84.505209
3,Almond Park,1020,G,"Almond Park, Atlanta, GA",NO ADDRESS,0.0,0.0
4,Ansley Park,2277,E,"Ansley Park, Atlanta, GA","(Ansley Park, Mayfair, Buckhead, Atlanta, Fult...",33.79455,-84.376315
5,Ardmore,756,E,"Ardmore, Atlanta, GA","(Ardmore Park, A, Hills Park, Atlanta, Fulton ...",33.806282,-84.400028
6,Argonne Forest,590,C,"Argonne Forest, Atlanta, GA",NO ADDRESS,0.0,0.0
7,Arlington Estates,776,P,"Arlington Estates, Atlanta, GA",NO ADDRESS,0.0,0.0
8,Ashview Heights,1292,T,"Ashview Heights, Atlanta, GA","(G, South Airport Road Northwest, Carroll Heig...",33.777046,-84.524666
9,Atlanta University Center,5703,T,"Atlanta University Center, Atlanta, GA",(Robert W. Woodruff Library (Atlanta Universit...,33.751543,-84.413597
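
The loop above builds a new `Nominatim` client on every iteration and stores `0, 0` for neighborhoods that cannot be found. Below is a minimal sketch of an alternative, assuming a hypothetical helper `geocode_neighborhoods` that accepts any geocoding callable (for example `Nominatim(...).geocode`, optionally wrapped in geopy's `RateLimiter`) and records misses as `None` so they cannot be mistaken for real coordinates. The `fake_geocode` function is a stand-in used only so the example runs offline.

```python
from types import SimpleNamespace


def geocode_neighborhoods(names, geocode, suffix=", Atlanta, GA"):
    """Geocode each neighborhood name with the supplied callable.

    `geocode` is any function that takes an address string and returns an
    object with `.latitude` and `.longitude`, or None when nothing is found.
    """
    rows = []
    for name in names:
        full_name = name + suffix
        hit = geocode(full_name)
        rows.append({
            "Neighborhood": name,
            "Full Name": full_name,
            # None marks a failed lookup instead of a fake (0, 0) coordinate
            "Latitude": hit.latitude if hit is not None else None,
            "Longitude": hit.longitude if hit is not None else None,
        })
    return rows


# Stand-in geocoder for illustration only; swap in a real geocoder in practice.
def fake_geocode(address):
    known = {
        "Adair Park, Atlanta, GA": SimpleNamespace(latitude=33.724685, longitude=-84.411146),
    }
    return known.get(address)


rows = geocode_neighborhoods(["Adair Park", "Almond Park"], fake_geocode)
found = [r["Neighborhood"] for r in rows if r["Latitude"] is not None]
print(found)
```

With a real geocoder, the rows could be passed to `pd.DataFrame(rows)` and the failed lookups dropped with `dropna(subset=['Latitude'])`.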


In [8]:
address_test = 'Bankhead, Atlanta, GA'
# Adair Park
# Almond Park
# Old Fourth Ward
geolocator = Nominatim(user_agent="ny_explorer")
locat_test = geolocator.geocode(address_test)
if locat_test is None:
    print('NO ADDRESS found for {}'.format(address_test))
else:
    lat_test = locat_test.latitude
    lon_test = locat_test.longitude
    print('The geographical coordinates of this neighborhood are {}, {}.'.format(lat_test, lon_test))
#print(locat_test)

The geographical coordinates of this neighborhood are 33.7722351, -84.4288824.


In [9]:
df.head()

Unnamed: 0,Neighborhood,Population (2010),NPU,Full Name,Coords,Latitude,Longitude
0,Adair Park,1331,V,"Adair Park, Atlanta, GA","(Adair Park, Pittsburgh, Vine City, Atlanta, F...",33.724685,-84.411146
1,Adams Park,1763,R,"Adams Park, Atlanta, GA","(Adams Park, Atlanta, Fulton County, Georgia, ...",33.712052,-84.456873
2,Adamsville,2403,H,"Adamsville, Atlanta, GA","(Adamsville, Boulder Park, Atlanta, Fulton Cou...",33.759274,-84.505209
3,Almond Park,1020,G,"Almond Park, Atlanta, GA",NO ADDRESS,0.0,0.0
4,Ansley Park,2277,E,"Ansley Park, Atlanta, GA","(Ansley Park, Mayfair, Buckhead, Atlanta, Fult...",33.79455,-84.376315


In [10]:
# Drop the rows that could not be geocoded ('NO ADDRESS').
# This cuts out over 60 of the 161 neighborhoods.
df = df[df.Coords != 'NO ADDRESS']
dfTrim = df.reset_index(drop=True)
dfTrim.head(20)

Unnamed: 0,Neighborhood,Population (2010),NPU,Full Name,Coords,Latitude,Longitude
0,Adair Park,1331,V,"Adair Park, Atlanta, GA","(Adair Park, Pittsburgh, Vine City, Atlanta, F...",33.724685,-84.411146
1,Adams Park,1763,R,"Adams Park, Atlanta, GA","(Adams Park, Atlanta, Fulton County, Georgia, ...",33.712052,-84.456873
2,Adamsville,2403,H,"Adamsville, Atlanta, GA","(Adamsville, Boulder Park, Atlanta, Fulton Cou...",33.759274,-84.505209
3,Ansley Park,2277,E,"Ansley Park, Atlanta, GA","(Ansley Park, Mayfair, Buckhead, Atlanta, Fult...",33.79455,-84.376315
4,Ardmore,756,E,"Ardmore, Atlanta, GA","(Ardmore Park, A, Hills Park, Atlanta, Fulton ...",33.806282,-84.400028
5,Ashview Heights,1292,T,"Ashview Heights, Atlanta, GA","(G, South Airport Road Northwest, Carroll Heig...",33.777046,-84.524666
6,Atlanta University Center,5703,T,"Atlanta University Center, Atlanta, GA",(Robert W. Woodruff Library (Atlanta Universit...,33.751543,-84.413597
7,Atlantic Station,1888,E,"Atlantic Station, Atlanta, GA","(Atlantic Station, Vine City, Atlanta, Fulton ...",33.790755,-84.398445
8,Audubon Forest,813,I,"Audubon Forest, Atlanta, GA","(Audubon Forest, Douglas County, Georgia, 3013...",33.715942,-84.776325
9,Bankhead,1929,K,"Bankhead, Atlanta, GA","(Bankhead, 639, Gary Avenue Northwest, Grove P...",33.772235,-84.428882


In [11]:
dfTrim.shape

(98, 7)
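
Some of the geocoding results above look suspicious: Audubon Forest, for example, was resolved to Douglas County at longitude -84.776, well west of the city. The following is a minimal sketch of a sanity check, assuming a hypothetical helper `haversine_km` and an arbitrary 25 km cutoff from the Downtown coordinates found earlier. It is not part of the original analysis.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Downtown Atlanta coordinates printed earlier in the notebook
ATL_CENTER = (33.7509748, -84.3930464)
MAX_KM = 25  # assumed cutoff; tune to the area of interest

# Two rows copied from the geocoded table above
geocoded = {
    "Adair Park": (33.724685, -84.411146),
    "Audubon Forest": (33.715942, -84.776325),
}

suspicious = [
    name for name, (lat, lon) in geocoded.items()
    if haversine_km(ATL_CENTER[0], ATL_CENTER[1], lat, lon) > MAX_KM
]
print(suspicious)
```

Rows flagged this way could be reviewed by hand or geocoded again with a more specific query.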

Superimposing the neighborhoods onto the map:

In [12]:
# Create map of Atlanta using latitude and longitude values
map_atlanta1 = folium.Map(location=[atl_lat, atl_lon], zoom_start=12)


In [13]:
for latit, lng, neighborhood in zip(dfTrim['Latitude'], dfTrim['Longitude'], dfTrim['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latit, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_atlanta1)  
    
map_atlanta1

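The restaurant owner lives in the suburbs to the north of the city, and with Atlanta's traffic being notoriously bad, the decision was made to limit the search to neighborhoods that are roughly north of Downtown (Central Atlanta). In practice, the filter below keeps neighborhoods at or north of East Atlanta's latitude and at or east of Bankhead's longitude.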

In [14]:
# Keep only neighborhoods at or north of East Atlanta's latitude
# and at or east of Bankhead's longitude.
# Reference coordinates:
#   East Atlanta:     33.7401064, -84.3449251
#   Downtown Atlanta: 33.7509748, -84.3930464
#   Bankhead:         33.772235,  -84.428882

dfTrim = dfTrim[dfTrim.Latitude >= 33.7401064]
dfTrim = dfTrim[dfTrim.Longitude >= -84.428882]
dfNorth = dfTrim.reset_index(drop=True)
dfNorth.shape

(41, 7)

Because the restaurant will need high foot traffic when it opens, the owner also wants it to be located fairly close to large neighborhoods. We will drop neighborhoods that have fewer than 1,200 inhabitants.

In [15]:
dfNorth = dfNorth[dfNorth['Population (2010)'] >= 1200]
dfNew = dfNorth.reset_index(drop=True)
dfNew.shape

(33, 7)
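
The latitude, longitude, and population filters can also be written as a single mask with named thresholds. This is a minimal sketch on a small sample of rows copied from the tables above; the constant names (`SOUTH_LAT`, `WEST_LON`, `MIN_POPULATION`) are assumptions introduced for readability.

```python
import pandas as pd

SOUTH_LAT = 33.7401064    # East Atlanta's latitude, used as the southern limit
WEST_LON = -84.428882     # Bankhead's longitude, used as the western limit
MIN_POPULATION = 1200     # population cutoff used in the code above

sample = pd.DataFrame({
    "Neighborhood": ["Adair Park", "Adamsville", "Ansley Park", "Ardmore", "Atlantic Station"],
    "Population (2010)": [1331, 2403, 2277, 756, 1888],
    "Latitude": [33.724685, 33.759274, 33.79455, 33.806282, 33.790755],
    "Longitude": [-84.411146, -84.505209, -84.376315, -84.400028, -84.398445],
})

# Combine the three conditions into one boolean mask
keep = (
    (sample["Latitude"] >= SOUTH_LAT)
    & (sample["Longitude"] >= WEST_LON)
    & (sample["Population (2010)"] >= MIN_POPULATION)
)
candidates = sample[keep].reset_index(drop=True)
print(candidates["Neighborhood"].tolist())
```

Keeping the thresholds in named constants makes it easier to see which reference neighborhood each boundary came from.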

Let's see what this gives us...

In [16]:
#Re-center at Ansley Park
address = 'Ansley Park, Atlanta, GA'

atl_geolocator = Nominatim(user_agent="ny_explorer")
atl_location2 = atl_geolocator.geocode(address)
atl_lat2 = atl_location2.latitude
atl_lon2 = atl_location2.longitude
atl_center2 = (atl_lat2,atl_lon2)

In [17]:
map_atlanta2 = folium.Map(location=[atl_lat2, atl_lon2], zoom_start=12)

for latit, lng, neighborhood in zip(dfNew['Latitude'], dfNew['Longitude'], dfNew['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latit, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_atlanta2)  
    
map_atlanta2

Now with the neighborhoods narrowed down, other factors need to be considered. The first is the “Starbucks Effect”, a term for the phenomenon in which home and property values rise after a Starbucks store opens nearby. The data show that between 1997 and 2014, properties close to a Starbucks increased in value by 96%, compared to 65% for all U.S. residential properties. In short, Starbucks locations can serve as a proverbial “canary in the coal mine” for an up-and-coming area. These locations were identified and plotted; a further analysis is discussed in later sections.

For scouting wealthy areas, it’s like heading into an exam with the answer key. Assuming that significant market research has gone into every Starbucks store opening, we can piggyback on those positive conclusions.

Begin using the Foursquare API to segment the neighborhoods:

In [18]:
CLIENT_ID = 'WETZZ3MPWLQ0NPFHAWISSG40DY41EROJPGOPE5GRIMYJK0Z3' # your Foursquare ID
CLIENT_SECRET = 'EYWPQL2M5SMQGQFTBFXUYMEH5ZXKI0J4A0LDP0D4VGPXIUDD' # your Foursquare Secret
VERSION = '20200806' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: WETZZ3MPWLQ0NPFHAWISSG40DY41EROJPGOPE5GRIMYJK0Z3
CLIENT_SECRET:EYWPQL2M5SMQGQFTBFXUYMEH5ZXKI0J4A0LDP0D4VGPXIUDD
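
Hard-coding and printing API credentials makes them easy to leak when a notebook is shared. Here is a minimal sketch of loading them from environment variables instead. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are assumptions, not part of the Foursquare API.

```python
import os


def load_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None


# Demonstration with a stand-in mapping; in the notebook, call load_credentials()
demo_env = {"FOURSQUARE_CLIENT_ID": "my-id", "FOURSQUARE_CLIENT_SECRET": "my-secret"}
client_id, client_secret = load_credentials(demo_env)
print("CLIENT_ID loaded:", bool(client_id))
```

Confirming only that a value was loaded, rather than printing the secret itself, keeps it out of saved notebook output.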


In [19]:
neighbourhood_latitude = dfNew.loc[32, 'Latitude'] # neighborhood latitude value
neighbourhood_longitude = dfNew.loc[32, 'Longitude'] # neighborhood longitude value

neighbourhood_name = dfNew.loc[32, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name, 
                                                               neighbourhood_latitude, 
                                                               neighbourhood_longitude))

Latitude and longitude values of Virginia-Highland are 33.7826557, -84.3536915.


Since the restaurant is slightly more upscale, the owner wants to be located in a neighborhood that has higher property values, or one that may be up-and-coming. While we could sort through mounds of data on home values, Foursquare can show us the "canary in the coal mine" indicator introduced above: Starbucks locations.

The following link is a good read for this:
https://realestblog.com/2018/09/15/starbucks-effect/


Similarly, chains such as Whole Foods and Trader Joe's are also intentionally placed in central areas of higher net wealth, with a large enough customer base willing to pay for luxury grocery products.

First, let's identify neighborhoods with the highest concentration of Starbucks locations:

In [20]:
LIMIT = 30 # limit of number of venues returned by Foursquare API

radius = 1000 # define radius
venue_type = 'Starbucks'


# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&query={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighbourhood_latitude, 
    neighbourhood_longitude, 
    radius, 
    LIMIT,
    venue_type)
url # display URL


'https://api.foursquare.com/v2/venues/explore?&client_id=WETZZ3MPWLQ0NPFHAWISSG40DY41EROJPGOPE5GRIMYJK0Z3&client_secret=EYWPQL2M5SMQGQFTBFXUYMEH5ZXKI0J4A0LDP0D4VGPXIUDD&v=20200806&ll=33.7826557,-84.3536915&radius=1000&limit=30&query=Starbucks'
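
Building the query string by hand works, but it leaves a stray `&` after the `?` and does not escape special characters. A minimal sketch using the standard library's `urlencode` follows, assuming a hypothetical helper `build_explore_url` and placeholder credentials. The endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit, query):
    """Assemble the explore request URL from named parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
        "query": query,
    }
    # safe="," keeps the comma in the ll parameter readable
    return EXPLORE_ENDPOINT + "?" + urlencode(params, safe=",")


url = build_explore_url("ID", "SECRET", "20200806", 33.7826557, -84.3536915, 1000, 30, "Starbucks")
print(url)
```

The same dictionary of parameters could instead be passed to `requests.get(EXPLORE_ENDPOINT, params=params)`, which performs the encoding itself.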

In [21]:
# ['response']['groups'][0]['items']

results = requests.get(url).json()
# results

In [22]:
# function that extracts the category of the venue
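def get_category_type(row):
    # the column is named 'categories' or 'venue.categories' depending on how the response was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']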

In [23]:
# Testing how to see what's in JSON files:

#venues = results['response']#['groups']#[0]['items']
#nearby_venues = json_normalize(venues) # flatten JSON
#nearby_venues

In [24]:
venues = results['response']['groups'][0]['items']
#print(venues) 
nearby_venues = json_normalize(venues) # flatten JSON
#print(nearby_venues)
# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(20)


Unnamed: 0,name,categories,lat,lng
0,Starbucks,Coffee Shop,33.768107,-84.349411
1,Starbucks,Coffee Shop,33.77368,-84.365994
2,Starbucks,Coffee Shop,33.780327,-84.36887
3,Starbucks,Coffee Shop,33.771846,-84.364052
4,Starbucks,Coffee Shop,33.798065,-84.370793
5,Starbucks Reserve,Coffee Shop,33.778309,-84.38407
6,Starbucks Coffee,Coffee Shop,33.776321,-84.352844
7,Starbucks,Coffee Shop,33.757095,-84.347855
8,Starbucks,Coffee Shop,33.786474,-84.387145
9,Starbucks,Coffee Shop,33.787068,-84.382901


In [25]:
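# Note: the count cannot exceed LIMIT (30), so a result of 30 likely means the cap was hit
print('{} Starbucks were returned for a 1 km search around the neighborhood\'s center, according to Foursquare.'.format(nearby_venues.shape[0]))

30 Starbucks were returned for a 1 km search around the neighborhood's center, according to Foursquare.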


In [26]:
#All Neighborhoods
dfNew.head()

Unnamed: 0,Neighborhood,Population (2010),NPU,Full Name,Coords,Latitude,Longitude
0,Ansley Park,2277,E,"Ansley Park, Atlanta, GA","(Ansley Park, Mayfair, Buckhead, Atlanta, Fult...",33.79455,-84.376315
1,Atlanta University Center,5703,T,"Atlanta University Center, Atlanta, GA",(Robert W. Woodruff Library (Atlanta Universit...,33.751543,-84.413597
2,Atlantic Station,1888,E,"Atlantic Station, Atlanta, GA","(Atlantic Station, Vine City, Atlanta, Fulton ...",33.790755,-84.398445
3,Brookwood,1834,E,"Brookwood, Atlanta, GA","(Brookwood, Buckhead, Atlanta, Fulton County, ...",33.802605,-84.392983
4,Brookwood Hills,2103,E,"Brookwood Hills, Atlanta, GA","(Brookwood Hills, Buckhead, Atlanta, Fulton Co...",33.807883,-84.390205


## Finding All Cafes:

Along with Starbucks, independently owned cafes are also a sign of a neighborhood’s affluence and local economic health. This is due to a rise in city café culture, which is partly attributed, ironically, to the success of Starbucks. The retail chain got customers used to overpriced coffee, but the local café gave a sense of community and “trendiness” to that overpriced coffee. These also tend to have smaller margins than a Starbucks, so typically they concentrate in more affluent areas. 

In [44]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    venue_type = 'Coffee'
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&query={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            10000,
            venue_type) #number of values
            
        # make the GET request
        results = requests.get(url).json()["response"]["groups"][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [45]:
atl_venues = getNearbyVenues(names=dfNew['Neighborhood'],
                                   latitudes=dfNew['Latitude'],
                                   longitudes=dfNew['Longitude']
                                  )

#Errors with radius of 1000, but does fine with radius of 100???

In [46]:
print(atl_venues.shape)
atl_venues.head()

(435, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Ansley Park,33.79455,-84.376315,Starbucks,33.798065,-84.370793,Coffee Shop
1,Ansley Park,33.79455,-84.376315,Sean's Harvest Market,33.788564,-84.369489,Café
2,Ansley Park,33.79455,-84.376315,Panera Bread,33.797759,-84.370515,Bakery
3,Ansley Park,33.79455,-84.376315,Huge x BRASH Coffee,33.792483,-84.385579,Coffee Shop
4,Ansley Park,33.79455,-84.376315,Midtown Plaza Cafe,33.7915,-84.386323,Café


We want to weed out some of the big chains, donut shops (Dunkin Donuts), and dessert shops that also appear in our results:

In [60]:
# Exclude Starbucks, then drop donut shops, bakeries, and dessert shops by category
atl_cafe1 = atl_venues[atl_venues['Venue'] != 'Starbucks']
atl_cafe2 = atl_cafe1[atl_cafe1['Venue Category'] != 'Donut Shop']
atl_cafe3 = atl_cafe2[atl_cafe2['Venue Category'] != 'Bakery']
atl_cafe = atl_cafe3[atl_cafe3['Venue Category'] != 'Dessert Shop']

atl_cafe = atl_cafe.reset_index(drop=True)
print(atl_cafe.shape)
atl_cafe.head()

(339, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Ansley Park,33.79455,-84.376315,Sean's Harvest Market,33.788564,-84.369489,Café
1,Ansley Park,33.79455,-84.376315,Huge x BRASH Coffee,33.792483,-84.385579,Coffee Shop
2,Ansley Park,33.79455,-84.376315,Midtown Plaza Cafe,33.7915,-84.386323,Café
3,Ansley Park,33.79455,-84.376315,Linton's Petit Café,33.789463,-84.37448,Café
4,Ansley Park,33.79455,-84.376315,Ovation Coffee & Libations,33.789569,-84.385025,Coffee Shop
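
The chain of four filters above can be expressed as a single mask. Note also that `!= 'Starbucks'` only removes exact matches, so names such as "Starbucks Reserve" or "Starbucks Coffee" (both of which appear in the earlier query results) would survive. Here is a minimal sketch on a small sample; the "Dunkin'" row and the `EXCLUDED_CATEGORIES` name are hypothetical.

```python
import pandas as pd

EXCLUDED_CATEGORIES = ["Donut Shop", "Bakery", "Dessert Shop"]

sample = pd.DataFrame({
    "Venue": ["Starbucks", "Starbucks Reserve", "Panera Bread",
              "Huge x BRASH Coffee", "Dunkin'"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Bakery",
                       "Coffee Shop", "Donut Shop"],
})

# Match any venue whose name contains "Starbucks", not only exact matches
is_starbucks = sample["Venue"].str.contains("Starbucks", case=False, regex=False)
is_excluded = sample["Venue Category"].isin(EXCLUDED_CATEGORIES)

independent = sample[~is_starbucks & ~is_excluded].reset_index(drop=True)
print(independent["Venue"].tolist())
```

Applying the broader name match to the real data would likely change the row count reported above.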


## Finding Starbucks:

In [31]:
def getNearbySBUX(names, latitudes, longitudes, radius=100):
    venue_type = 'Starbucks'
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&query={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            10000,
            venue_type) #number of values
            
        # make the GET request
        results = requests.get(url).json()["response"]["groups"][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [32]:
atl_SBUX = getNearbySBUX(names=dfNew['Neighborhood'],
                                   latitudes=dfNew['Latitude'],
                                   longitudes=dfNew['Longitude']
                                  )

In [33]:
#Returns table of all Starbucks in Atlanta
print(atl_SBUX.shape)
atl_SBUX.head(1)

(1645, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Ansley Park,33.79455,-84.376315,Starbucks,33.798065,-84.370793,Coffee Shop


How many Starbucks locations are there in the Atlanta area?

In [34]:
#Make sure the locations are actually in Atl:
atl_SBUX = atl_SBUX[(atl_SBUX['Venue Latitude'] >= 33.7401064) & (atl_SBUX['Venue Latitude'] <= 33.845393)] #top and bottom border (bottom/top)
atl_SBUX = atl_SBUX[(atl_SBUX['Venue Longitude'] >= -84.428882) & (atl_SBUX['Venue Longitude'] <= -84.323258)] #side border (right/left)

# Eliminating the duplicate values:
atl_SBUX2 = atl_SBUX.drop_duplicates('Venue Latitude', keep='first')

print('There are {} unique Starbucks locations inside the bounding box above, according to Foursquare.'.format(atl_SBUX2.shape[0]))
#atl_SBUX2.head(10)

There are 44 unique Starbucks locations inside the bounding box above, according to Foursquare.
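
De-duplicating on `Venue Latitude` alone would also merge two different stores that happen to share a latitude. The following is a minimal sketch, on hypothetical sample rows, that applies the same bounding box with `Series.between` and de-duplicates on both coordinates.

```python
import pandas as pd

# Bounding box taken from the filter above
LAT_MIN, LAT_MAX = 33.7401064, 33.845393
LON_MIN, LON_MAX = -84.428882, -84.323258

# Hypothetical rows: the same store returned for two neighborhoods,
# a second store on the same latitude, and one store outside the box
sample = pd.DataFrame({
    "Neighborhood": ["Ansley Park", "Midtown", "Midtown", "Downtown"],
    "Venue Latitude": [33.798065, 33.798065, 33.798065, 33.70],
    "Venue Longitude": [-84.370793, -84.370793, -84.39, -84.39],
})

in_box = (
    sample["Venue Latitude"].between(LAT_MIN, LAT_MAX)
    & sample["Venue Longitude"].between(LON_MIN, LON_MAX)
)
unique_stores = sample[in_box].drop_duplicates(
    subset=["Venue Latitude", "Venue Longitude"], keep="first"
)
print(len(unique_stores))
```

On this sample, de-duplicating on latitude alone would keep one row instead of two.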


Let's visualize where all of these locations are!

In [35]:
# Reset the index of the de-duplicated Starbucks table before mapping

dfSBUX_Map_Loc = atl_SBUX2.reset_index(drop=True)
dfSBUX_Map_Loc.shape

(44, 7)

In [36]:
map_SBUX = folium.Map(location=[atl_lat2, atl_lon2], zoom_start=12)

for latit, lng, neighborhood in zip(dfSBUX_Map_Loc['Venue Latitude'], dfSBUX_Map_Loc['Venue Longitude'], dfSBUX_Map_Loc['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latit, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_SBUX)  
    
map_SBUX

In [37]:
atl_SBUX.head(1)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Ansley Park,33.79455,-84.376315,Starbucks,33.798065,-84.370793,Coffee Shop


Dropping the duplicates was useful for mapping the locations. However, the duplicates do show how many Starbucks Foursquare returned near each neighborhood's center, which is more important than determining how many Starbucks exist strictly within each neighborhood's borders.

Now let's find out which neighborhoods have the most Starbucks locations **near** them, not necessarily inside their borders.

In [38]:
SBUX = atl_SBUX.groupby('Neighborhood').count()
SBUX['SBXU Count'] = SBUX['Venue Category']
SBUX.head(1)

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,SBXU Count
Ansley Park,38,38,38,38,38,38,38


In [39]:
SBUX.columns

Index(['Neighborhood Latitude', 'Neighborhood Longitude', 'Venue',
       'Venue Latitude', 'Venue Longitude', 'Venue Category', 'SBXU Count'],
      dtype='object')

In [40]:
SBUX = SBUX.drop(['Neighborhood Latitude', 'Neighborhood Longitude', 'Venue',
       'Venue Latitude', 'Venue Longitude', 'Venue Category'], axis =1)

In [41]:
SBUX = SBUX.sort_values(by=['SBXU Count'], ascending=False)
SBUX



Neighborhood,SBXU Count
Inman Park,41
Sweet Auburn,40
Reynoldstown,40
Midtown,40
Margaret Mitchell,40
Cabbagetown,40
Downtown,39
Southwest,39
Georgia Tech,39
Edgewood,39


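This shows how many Starbucks Foursquare returned for each neighborhood center. Note that `getNearbySBUX` was called with its default `radius=100`, not 1000 meters, and the nearly identical counts (39 to 41) suggest the results were not tightly limited to that radius.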

The above method works, but it has a weakness: the smaller neighborhoods are packed closely together, so more Starbucks fall near their centers. This makes sense, as the "Buckhead" neighborhoods cover a much larger land area, even if not a larger population.

Now let's combine the maps of the neighborhood centers and the Starbucks locations to see visually where Starbucks are most concentrated.

The Folium map now displays the neighborhood locations (blue), the Starbucks locations (green), and the independently owned cafes (red).

In [42]:
map_atlanta_combo = folium.Map(location=[atl_lat2, atl_lon2], zoom_start=12)

In [66]:
for latit, lng, neighborhood in zip(dfNew['Latitude'], dfNew['Longitude'], dfNew['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latit, lng],
        radius=7,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_atlanta_combo)  


for latit, lng, neighborhood in zip(dfSBUX_Map_Loc['Venue Latitude'], dfSBUX_Map_Loc['Venue Longitude'], dfSBUX_Map_Loc['Neighborhood']):
    label = 'Starbucks'
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latit, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_atlanta_combo)
    
for latit, lng, name in zip(atl_cafe['Venue Latitude'], atl_cafe['Venue Longitude'], atl_cafe['Venue']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latit, lng],
        radius=2,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_atlanta_combo)    

map_atlanta_combo

From here, we can assess the information qualitatively and determine which neighborhoods are optimal based on the "Starbucks Effect" and the similar signal provided by independently owned cafes.

These neighborhoods are: 

1. Downtown
2. Midtown
3. Buckhead