# Introduction

Salt Lake City is a growing metropolitan area. With a booming tech industry and major companies like Adobe, Microsoft, Facebook, and Vivint Solar building sites in northern Utah, the population of the Beehive State is increasing rapidly. This growth creates business opportunities for food services, such as dessert venues, in the major population center of Salt Lake City. Opening a restaurant of any sort is risky, since many do not survive the first six months. Knowing which locations are most promising helps an owner make an informed decision that may contribute to success. Specifically, we will examine areas of Salt Lake City to determine where someone interested in opening an ice cream parlor is most likely to succeed.

# Data

Several factors help determine which neighborhoods are good places to establish an ice cream venue. The number and type of restaurants in an area indicate which neighborhoods are already known for their food services. A neighborhood that already has several ice cream parlors is likely a saturated market that cannot sustain another. The number of complementary businesses in a neighborhood, such as stadiums, theatres, business districts, malls, and transportation hubs, tends to correlate positively with the number of people likely to visit a dessert venue, since it is popular to get dessert after visiting these types of places. Restaurants that serve meals need to be considered separately, since they are both competitors (they often serve desserts as well as main courses) and complementary businesses (people may visit an ice cream parlor for dessert after a meal).

Population is another statistic that can be taken into account. A larger population means that the ice cream parlor would be closer to homes (and thus easier to visit), but it often also means that the parlor is located further from complementary businesses and working areas. A population of zero in a zip code suggests that the area is dominated by businesses, with few or no residential areas nearby. Both sides will be taken into account when comparing zip codes and pointing out which ones are likely good areas for an ice cream parlor.

Foursquare venue data can be used to examine which neighborhoods have ice cream parlors, restaurants, and complementary businesses. http://www.heartandcoeur.com/heart_travel/area/utah_801.php has a table of zip codes for the Salt Lake City, Utah area with neighborhoods listed. https://www.zip-codes.com/city/ut-salt-lake-city.asp has information about the population of these zip codes. If attempts to pull latitude and longitude from geocoder do not work, http://saltlakecity.areaconnect.com/zip2.htm?city=Salt has that information, which can be used to compile a .csv file.

A table will be compiled that identifies the number of various businesses in each zip code. Businesses will be grouped into complementary businesses (like those listed above), restaurants, and competitors (various dessert venues), and the population of each zip code will be added to the same table. This table will be used to cluster the zip codes and to identify which cluster has characteristics that are desirable for starting an ice cream parlor (a higher number of complementary businesses and a lower number of competitors).
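
The grouping described above can be sketched roughly as follows. This is a minimal illustration, not the final method: the mapping `CATEGORY_GROUPS`, the helper `count_groups`, the column name `Zip Code`, and the sample rows are all assumptions made for the example, and the real category lists would need to be curated by hand.

```python
import pandas as pd

# Hypothetical mapping from venue categories to the three business groups;
# the categories chosen here are examples only, not a complete list.
CATEGORY_GROUPS = {
    "Ice Cream Shop": "Competitor",
    "Dessert Shop": "Competitor",
    "Cupcake Shop": "Competitor",
    "Theater": "Complementary",
    "Indie Movie Theater": "Complementary",
    "Light Rail Station": "Complementary",
    "Mexican Restaurant": "Restaurant",
    "Pizza Place": "Restaurant",
}
GROUP_ORDER = ["Competitor", "Complementary", "Restaurant"]


def count_groups(venues: pd.DataFrame) -> pd.DataFrame:
    """Count venues per zip code in each business group."""
    # Label every venue with its group; unmapped categories become NaN.
    labeled = venues.assign(Group=venues["Venue Category"].map(CATEGORY_GROUPS))
    labeled = labeled.dropna(subset=["Group"])
    # One row per zip code, one column per group.
    counts = pd.crosstab(labeled["Zip Code"], labeled["Group"])
    # Make sure every group has a column, even if no venue fell into it.
    return counts.reindex(columns=GROUP_ORDER, fill_value=0)


# Made-up sample rows, used only to show the shape of the result.
sample = pd.DataFrame({
    "Zip Code": [84101, 84101, 84101, 84102, 84102],
    "Venue Category": ["Dessert Shop", "Theater", "Pizza Place",
                       "Pizza Place", "Camera Store"],
})
group_counts = count_groups(sample)
print(group_counts)
```

The population column could then be joined onto this table by zip code before clustering.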

## Collect Neighborhood Data

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

import requests # library to handle requests

from bs4 import BeautifulSoup # library to parse HTML

#!conda install -c conda-forge geopy --yes # uncomment this line if geopy is not installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# transform JSON file into a pandas dataframe
try:
    from pandas import json_normalize # pandas 1.0 and later
except ImportError:
    from pandas.io.json import json_normalize # older pandas versions

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [2]:
SLC_data = requests.get("http://www.heartandcoeur.com/heart_travel/area/utah_801.php")
SLC_data

<Response [200]>

In [3]:
SLC_data.status_code

200

In [4]:
SLC_soup = BeautifulSoup(SLC_data.content, 'html.parser')

In [5]:
print(SLC_soup.prettify())

<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />
<html>
 <head>
  <title>
   AREA CODE - CITY - ZIP CODE - STATE - AREA CODE UTAH 	801
  </title>
  <meta content="Travel with your heart is usefull for your heart in all countries;" lang="fr" name="description">
   <meta content="area code,dialing code,zip code,city,state,heart travel,travel with your heart,hospital,hopital,clinique,private clinic,centre de soin, center of care,heart, travel, usefull, capitals, countries, world, country code, currency, asia, africa, europe, america, oceania, middle east,area code, continent, language, ambulance, electricity, plug,region, government,health heart,country,countries,congenital, heart, heart defects, heart disease, birth defects, hypoplastic left heart syndrome, nursing, pediatrics, cardiology, tetralogy of fallot, transposition of the great arteries,

In [5]:
data_rows = SLC_soup.find_all('tr')[2:]  # skip the first two table rows

postal_data = []  # create an empty list to hold all the data

for row in data_rows[:-5]:  # for each table row except the last five
    postal_row = []  # create an empty list for each postal code

    # for each table data element from each table row
    for td in row.find_all('td'):
        # get the text content and append to the postal_row
        postal_row.append(td.get_text())

    # then append each postal code row to the postal_data matrix
    postal_data.append(postal_row)

In [6]:
postal_data

[['Alpine - (Utah)', '84004', 'Utah', '801'],
 ['American Fork - (Utah)', '84003', 'Utah', '801'],
 ['Bingham Canyon - (Salt Lake)', '84006', 'Utah', '801'],
 ['Bountiful - (Davis)', '84010', 'Utah', '801'],
 ['Bountiful - (Davis)', '84011', 'Utah', '801'],
 ['Cedar Valley - (Utah)', '84013', 'Utah', '801'],
 ['Centerville - (Davis)', '84014', 'Utah', '801'],
 ['Clearfield - (Davis)', '84015', 'Utah', '801'],
 ['Clearfield - (Davis)', '84016', 'Utah', '801'],
 ['Clearfield - (Davis)', '84089', 'Utah', '801'],
 ['Croydon - (Morgan)', '84018', 'Utah', '801'],
 ['Draper - (Salt Lake)', '84020', 'Utah', '801'],
 ['Eden - (Weber)', '84310', 'Utah', '801'],
 ['Elberta - (Utah)', '84626', 'Utah', '801'],
 ['Farmington - (Davis)', '84025', 'Utah', '801'],
 ['Goshen - (Utah)', '84633', 'Utah', '801'],
 ['Hill Afb - (Davis)', '84056', 'Utah', '801'],
 ['Hooper - (Weber)', '84315', 'Utah', '801'],
 ['Huntsville - (Weber)', '84317', 'Utah', '801'],
 ['Kaysville - (Davis)', '84037', 'Utah', '801'],

In [7]:
# define the dataframe columns
column_names = ['Neighborhood', 'Zip Code', 'State', 'Area Code'] 

# instantiate the dataframe
df_SLC_neighborhoods = pd.DataFrame(postal_data, columns=column_names)

df_SLC_neighborhoods.head()

Unnamed: 0,Neighborhood,Zip Code,State,Area Code
0,Alpine - (Utah),84004,Utah,801
1,American Fork - (Utah),84003,Utah,801
2,Bingham Canyon - (Salt Lake),84006,Utah,801
3,Bountiful - (Davis),84010,Utah,801
4,Bountiful - (Davis),84011,Utah,801


In [8]:
df_SLC2 = df_SLC_neighborhoods.iloc[:,[0,1]]
df_SLC2.head()

Unnamed: 0,Neighborhood,Zip Code
0,Alpine - (Utah),84004
1,American Fork - (Utah),84003
2,Bingham Canyon - (Salt Lake),84006
3,Bountiful - (Davis),84010
4,Bountiful - (Davis),84011


In [9]:
#Selecting for only Zip codes in Salt Lake City
df_SLC3 = df_SLC2[df_SLC2['Neighborhood'].str.contains('Salt Lake City')].reset_index(drop=True)
df_SLC3

Unnamed: 0,Neighborhood,Zip Code
0,Salt Lake City - (Salt Lake),84101
1,Salt Lake City - (Salt Lake),84102
2,Salt Lake City - (Salt Lake),84103
3,Salt Lake City - (Salt Lake),84104
4,Salt Lake City - (Salt Lake),84105
5,Salt Lake City - (Salt Lake),84106
6,Salt Lake City - (Salt Lake),84107
7,Salt Lake City - (Salt Lake),84108
8,Salt Lake City - (Salt Lake),84109
9,Salt Lake City - (Salt Lake),84110


In [None]:
!conda install -c conda-forge geocoder --yes
import geocoder # import geocoder
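postal_codes = df_SLC3['Zip Code'].tolist()
MAX_ATTEMPTS = 5  # assumed retry limit so a failing geocoder cannot loop forever

for postal_code in postal_codes:
    # initialize your variable to None
    lat_lng_coords = None

    # retry a limited number of times until coordinates are returned
    for attempt in range(MAX_ATTEMPTS):
        g = geocoder.google('{}, Salt Lake City, Utah'.format(postal_code))
        lat_lng_coords = g.latlng
        if lat_lng_coords:
            break

    if not lat_lng_coords:
        continue  # no coordinates found; leave this zip code empty

    latitude, longitude = lat_lng_coords

    # select rows by the 'Zip Code' column (the dataframe has no 'Postal Code' column)
    df_SLC3.loc[df_SLC3['Zip Code'] == postal_code, 'Latitude'] = latitude
    df_SLC3.loc[df_SLC3['Zip Code'] == postal_code, 'Longitude'] = longitude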

In [10]:
SLCNH = pd.read_csv('Salt Lake City Zip Codes.csv')
SLCNH.head(5)

Unnamed: 0,Zip codes,Latitude,Longitude,Population
0,84101,40.756,-111.899,5277
1,84102,40.759,-111.865,17421
2,84103,40.784,-111.876,21084
3,84104,40.75,-111.935,24869
4,84105,40.734,-111.856,22140


I was unable to access geocoder to get the latitude and longitude, so I assembled a .csv file using data from the websites listed in the Data section and loaded that file (shown above) for use in the analysis.
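
The fallback described here can be expressed as a small pattern: try the geocoder a limited number of times, then look the zip code up in a local table. The sketch below is only an illustration under stated assumptions: `lookup_coordinates`, `unavailable_geocoder`, and `fallback_table` are hypothetical names, the geocoder is passed in as a plain function so no network access is needed, and the two coordinates are copied from the .csv output shown above.

```python
from typing import Callable, Dict, Optional, Tuple

LatLng = Tuple[float, float]


def lookup_coordinates(
    zip_code: str,
    geocode: Callable[[str], Optional[LatLng]],
    fallback: Dict[str, LatLng],
    max_attempts: int = 3,
) -> Optional[LatLng]:
    """Try the geocoder a few times, then fall back to a local table."""
    for _ in range(max_attempts):
        coords = geocode(zip_code)
        if coords is not None:
            return coords
    # The geocoder never answered; use the hand-compiled value if there is one.
    return fallback.get(zip_code)


# Two rows copied from the .csv file loaded above.
fallback_table = {
    "84101": (40.756, -111.899),
    "84102": (40.759, -111.865),
}


def unavailable_geocoder(query: str) -> Optional[LatLng]:
    """Stand-in for a geocoding service that cannot be reached."""
    return None


coords = lookup_coordinates("84101", unavailable_geocoder, fallback_table)
print(coords)  # (40.756, -111.899)
```

Bounding the number of attempts avoids a loop that never ends when the service is unreachable.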

In [11]:
SLCNH = SLCNH.rename(columns={"Zip codes": "Zip Code"})
SLCNH.head()

Unnamed: 0,Zip Code,Latitude,Longitude,Population
0,84101,40.756,-111.899,5277
1,84102,40.759,-111.865,17421
2,84103,40.784,-111.876,21084
3,84104,40.75,-111.935,24869
4,84105,40.734,-111.856,22140


In [12]:
SLCNH.dtypes

Zip Code        int64
Latitude      float64
Longitude     float64
Population      int64
dtype: object

In [13]:
df_SLC3['Zip Code'] = pd.to_numeric(df_SLC3['Zip Code'])  # convert zip codes from text to integers
df_SLC3.dtypes

Neighborhood    object
Zip Code         int64
dtype: object

In [14]:
df_SLC4 = SLCNH.set_index('Zip Code').join(df_SLC3.set_index('Zip Code'))
df_SLC4.reset_index(inplace=True)
df_SLC4

Unnamed: 0,Zip Code,Latitude,Longitude,Population,Neighborhood
0,84101,40.756,-111.899,5277,Salt Lake City - (Salt Lake)
1,84102,40.759,-111.865,17421,Salt Lake City - (Salt Lake)
2,84103,40.784,-111.876,21084,Salt Lake City - (Salt Lake)
3,84104,40.75,-111.935,24869,Salt Lake City - (Salt Lake)
4,84105,40.734,-111.856,22140,Salt Lake City - (Salt Lake)
5,84106,40.703,-111.856,33384,Salt Lake City - (Salt Lake)
6,84107,40.659,-111.882,30863,Salt Lake City - (Salt Lake)
7,84108,40.738,-111.812,20863,Salt Lake City - (Salt Lake)
8,84109,40.701,-111.816,23858,Salt Lake City - (Salt Lake)
9,84110,40.756,-111.897,0,Salt Lake City - (Salt Lake)


In [15]:
#Importing packages for clustering and analysis (these repeat the imports from the first cell)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

#!conda install -c conda-forge geopy --yes # uncomment this line if geopy is not installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests

# transform JSON file into a pandas dataframe
try:
    from pandas import json_normalize # pandas 1.0 and later
except ImportError:
    from pandas.io.json import json_normalize # older pandas versions

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [16]:
address = 'Salt Lake City, Utah'

geolocator = Nominatim(user_agent="slc_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Salt Lake City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Salt Lake City are 40.7670126, -111.8904308.


In [17]:
# create map of Salt Lake City using latitude and longitude values
map_SLC = folium.Map(location=[latitude, longitude], zoom_start=10)

# add one marker per zip code to the map
for lat, lng, zip_code in zip(df_SLC4['Latitude'], df_SLC4['Longitude'], df_SLC4['Zip Code']):
    label = '{}'.format(zip_code)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_SLC)
    
map_SLC

### Foursquare Data

In [18]:
CLIENT_ID = '4KR0V5UPOSTFI3MTCTRF4VWXNNECKFPWDAOFIPRUBA4TM3MY' # your Foursquare ID
CLIENT_SECRET = 'PQYVUI1NKA4RGF13SATRN33H5NUSD5PPF5HH3CFL1GROKHNT' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 4KR0V5UPOSTFI3MTCTRF4VWXNNECKFPWDAOFIPRUBA4TM3MY
CLIENT_SECRET: PQYVUI1NKA4RGF13SATRN33H5NUSD5PPF5HH3CFL1GROKHNT


In [19]:
neighborhood_latitude = df_SLC4.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_SLC4.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_SLC4.loc[0, 'Zip Code'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of 84101 are 40.756, -111.899.


In [20]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=4KR0V5UPOSTFI3MTCTRF4VWXNNECKFPWDAOFIPRUBA4TM3MY&client_secret=PQYVUI1NKA4RGF13SATRN33H5NUSD5PPF5HH3CFL1GROKHNT&v=20180605&ll=40.756,-111.899&radius=500&limit=100'

In [21]:
results = requests.get(url).json()
#results

In [22]:
# function that extracts the category of the venue
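def get_category_type(row):
    # the column is named 'categories' or 'venue.categories' depending on the dataframe
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # a venue may have no category at all
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']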

In [25]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,R&R BBQ,BBQ Joint,40.755812,-111.900033
1,Starbucks,Coffee Shop,40.756728,-111.897669
2,Salt Lake Nails,Cosmetics Shop,40.756837,-111.897163
3,pictureline,Camera Store,40.753769,-111.900162
4,Brewvies Cinema Pub,Indie Movie Theater,40.754428,-111.895968


In [26]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

34 venues were returned by Foursquare.


### Explore Salt Lake City in Greater Depth

In [28]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
        
    # flatten the per-zip-code lists into one dataframe with one row per venue
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
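
The row-building step inside `getNearbyVenues` can be tried without calling the API. The sketch below is an illustration under stated assumptions: `extract_venue_rows` is a hypothetical helper, `mock_items` is made-up data shaped like the `items` list the function reads, and the second venue is invented to show a case with no category. Unlike the function above, it returns `None` instead of raising an error when a venue has an empty category list.

```python
def extract_venue_rows(zip_code, items):
    """Turn a list of Foursquare-style items into flat tuples."""
    rows = []
    for item in items:
        venue = item["venue"]
        # Guard against venues that come back without any category.
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            zip_code,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Mock data in the shape of response['groups'][0]['items'].
mock_items = [
    {"venue": {"name": "R&R BBQ",
               "location": {"lat": 40.755812, "lng": -111.900033},
               "categories": [{"name": "BBQ Joint"}]}},
    {"venue": {"name": "Venue Without Category",  # invented example
               "location": {"lat": 40.756, "lng": -111.899},
               "categories": []}},
]

rows = extract_venue_rows(84101, mock_items)
for row in rows:
    print(row)
```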

In [29]:
#Use new function to find venues

SLC_venues = getNearbyVenues(names=df_SLC4['Zip Code'],
                                   latitudes=df_SLC4['Latitude'],
                                   longitudes=df_SLC4['Longitude']
                                  )


84101
84102
84103
84104
84105
84106
84107
84108
84109
84110
84111
84112
84113
84114
84115
84116
84117
84118
84119
84120
84121
84122
84123
84124
84125
84126
84127
84128
84130
84131
84132
84133
84134
84136
84138
84139
84141
84143
84144
84145
84147
84148
84150
84151
84152
84153
84157
84158
84165
84170
84171
84180
84184
84189
84190
84199


In [30]:
print(SLC_venues.shape)
SLC_venues.head()

(1049, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,84101,40.756,-111.899,R&R BBQ,40.755812,-111.900033,BBQ Joint
1,84101,40.756,-111.899,Starbucks,40.756728,-111.897669,Coffee Shop
2,84101,40.756,-111.899,Salt Lake Nails,40.756837,-111.897163,Cosmetics Shop
3,84101,40.756,-111.899,pictureline,40.753769,-111.900162,Camera Store
4,84101,40.756,-111.899,Brewvies Cinema Pub,40.754428,-111.895968,Indie Movie Theater


In [31]:
SLC_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
84101,34,34,34,34,34,34
84102,25,25,25,25,25,25
84103,1,1,1,1,1,1
84104,3,3,3,3,3,3
84105,12,12,12,12,12,12
84106,37,37,37,37,37,37
84107,19,19,19,19,19,19
84108,2,2,2,2,2,2
84109,3,3,3,3,3,3
84110,33,33,33,33,33,33
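
Because `count()` repeats the same number in every column, a single count per zip code may be easier to read, and it makes sparse results easy to spot (84103, for example, returned only one venue). The sketch below is a minimal illustration: the sample rows are made up, and the threshold `MIN_VENUES` is an assumed value rather than one taken from the analysis.

```python
import pandas as pd

# Made-up sample in the same shape as SLC_venues (only two columns shown).
venues = pd.DataFrame({
    "Neighborhood": [84101, 84101, 84101, 84103, 84108, 84108],
    "Venue": ["A", "B", "C", "D", "E", "F"],
})

MIN_VENUES = 3  # assumed cut-off for "too few venues to compare"

# size() returns one count per zip code instead of one per column.
venue_counts = venues.groupby("Neighborhood").size().rename("Venue Count")

# Zip codes whose search radius returned very few venues.
sparse_zip_codes = venue_counts[venue_counts < MIN_VENUES].index.tolist()
print(venue_counts)
print(sparse_zip_codes)
```

Zip codes flagged this way may need a larger search radius before they can be compared fairly with the others.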


In [32]:
print('There are {} unique categories.'.format(len(SLC_venues['Venue Category'].unique())))

There are 196 unique categories.


In [33]:
print(SLC_venues['Venue Category'].unique())

['BBQ Joint' 'Coffee Shop' 'Cosmetics Shop' 'Camera Store'
 'Indie Movie Theater' 'Mexican Restaurant' 'Clothing Store' 'Bakery'
 'Sporting Goods Shop' 'Food Truck' 'Hotel' 'Brewery' 'Thai Restaurant'
 'Hotel Bar' 'Sandwich Place' 'Nightclub' 'Indian Restaurant' 'Sports Bar'
 'Rental Car Location' 'Video Store' 'Restaurant' 'Music Venue' 'Diner'
 'Antique Shop' 'Sculpture Garden' 'Pizza Place' 'Steakhouse'
 'Massage Studio' 'Park' 'Convenience Store' 'Buffet'
 'Gym / Fitness Center' 'Fast Food Restaurant'
 'Paper / Office Supplies Store' 'Laundromat' 'Dive Bar'
 'Light Rail Station' 'Hobby Shop' 'Dessert Shop' 'Music Store'
 'Event Service' 'Scandinavian Restaurant' 'Theater' 'Juice Bar'
 'Flower Shop' 'Pharmacy' 'Cupcake Shop' 'Grocery Store' 'Burger Joint'
 'Yoga Studio' 'Sushi Restaurant' 'Bagel Shop' 'Department Store'
 'Rock Club' 'Mobile Phone Shop' 'Thrift / Vintage Store'
 'Furniture / Home Store' "Women's Store" 'Salon / Barbershop' 'Bank'
 'Chinese Restaurant' 'Discount Store