# Find Best Location to start an Ice Cream Business in Boston Area

# 1. Introduction
Selecting a location for a creating a new business is a key decision in running a existing business. Business decision makers need to be considered different factors to find right location for the business, such as financial , market and the factors
that will affect their demand and increase thier revenue. This project can help those who are planning to open a new business such as an icecream shop and need to find the best location for the shop in a Boston city.

### Business Problem
Starting a new business in city such as Boston can be challenging. Boston has 23 neighborhoods with various
places that attract tourists and people to visit. Neighborhoods of Boston are different in terms of different factors that can
directly or indirectly affect the success chance of business. It is important to evaluate those 23 neighborhoods based on the
factors that affect running a successful business such as the number of competitors, and the potential demand in each of the
neighborhoods. Budget limitations in renting or buying a place also play an important role in this decision making especially for
small businesses.\
In this project, like to help to find the best neighborhood in Boston to open an Ice Cream shop. An ice cream shop is
good to be near a place where many people visit such as near a cinema, park, garden, playground, etc. Also, a neighborhood that has fewer competitors is less risky for starting a new
business.

# 2. Data
1. Extract the information about the neighborhoods of Boston
need to extract the 23 neighborhoods of Boston along with the Recent home sales in Boston (as an estimate of the
renting price of the shop) of each neighborhood. For this purpose, we will use the following website and apply BeautifulSoup
website scraping library to extract the data. https://bostonpads.com/2019-boston-apartment-rental-market-report/
2. Transform the data into pandas dataframe
3. Use GeoPy Python package to get the latitude and the longitude coordinates of all the neighborhoods of Boston
4. Map the neighborhoods using Folium Python library
5. Use Foursquare API to get information about some venues around these neighborhoods
A. Looking for a group of venues in walking distance (500 meters) of each of these neighborhood.
We considered Movie Theater, Playground, Park, Garden, Water Park, General Entertainment, Stadium, Amphitheater,
Aquarium, Street Art, Beach, Recreation Center, Pedestrian Plaza venues. These venues are places that many people
usually visit them for entertainment and hence we will have good demand for ice cream around them. The list of all venue
categories of FourSquare can be found in the following link:
https://developer.foursquare.com/docs/resources/categories

# 3. Methodology

## Mapping Data - Finding the Neighborhoods and Venues

In [3]:
import bs4 as bs
import urllib.request

source = urllib.request.urlopen('https://bostonpads.com/2019-boston-apartment-rental-market-report/').read()
soup = bs.BeautifulSoup(source,'lxml')

In [4]:
table = soup.find_all('table')[6]
table_rows = table.find_all('tr')

In [5]:
for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    print(row)

[]
['Fenway', '3,129', '3,108', '257', '8.27%']
['Symphony', '1,921', '1,906', '122', '6.40%']
['Roxbury', '722', '722', '45', '6.23%']
['Mission Hill', '2,126', '2,105', '122', '5.80%']
['East Boston', '1,631', '1,599', '76', '4.75%']
['All Areas', '150,057', '70,053', '2,537', '3.62%']
['City Of Boston', '121,304', '41,304', '1,730', '4.19%']
['Outside Boston', '29,874', '28,749', '807', '2.81%']


###  Boston Apartments Data

In [6]:
import pandas as pd

dfs = pd.read_html('https://bostonpads.com/2019-boston-apartment-rental-market-report/',header=0)
for df in dfs:
    print(df)

   Total Apartments Non Luxury  Total Available Apartments Non Luxury  \
0                        70053                                   2537   

  Real Time Availability Rate  Total Vacant Apartments  \
0                       3.62%                     1338   

   Total Vacant Non-Luxury  
0                      506  
Empty DataFrame
Columns: [Total Galleries, Total Pictures, Total Videos]
Index: []
                    Unnamed: 0 July 2017 July 2019 % Change
0  Average 1 Bedroom Apartment    $1,938    $2,068    6.29%
1  Average 2 Bedroom Apartment    $2,439    $2,594    5.98%
   REAL TIME VACANCY RATE  = Total Apartments Currently Vacant
0  REAL TIME VACANCY RATE  =      Total Apartments in Database
   REAL TIME AVAILABILITY RATE  =  \
0  REAL TIME AVAILABILITY RATE  =   

  (Total Apartments Currently Vacant + Apartments Set to Become Available on a Later Day)  
0                       Total Apartments in Database                                       
     Neighborhood  Total Apart

## Segmenting and Clustering Neighborhoods in Boston

### Import Boston data

In [7]:
import json # library to handle JSON files

#conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

url="https://cocl.us/Geospatial_data"
address = 'boston, MA'

geolocator = Nominatim(user_agent="boston")
#geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Boston are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Boston are 42.3602534, -71.0582912.


### Get the URL using FourSquare API

In [8]:
CLIENT_ID = 'PVLO1XLOLFHLK50CEARO2QPNTY1U0AP4HNYPJIIJODXAGWSS' # your Foursquare ID
CLIENT_SECRET = 'KK42LKUP0OSHVW1GB5JEWUDIOE5T3RSARPDNFIJJ3OLLSHHG' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 100
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)
radius = 500
#print(search_query + ' .... OK!')

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

Your credentails:
CLIENT_ID: PVLO1XLOLFHLK50CEARO2QPNTY1U0AP4HNYPJIIJODXAGWSS
CLIENT_SECRET:KK42LKUP0OSHVW1GB5JEWUDIOE5T3RSARPDNFIJJ3OLLSHHG


'https://api.foursquare.com/v2/venues/search?client_id=PVLO1XLOLFHLK50CEARO2QPNTY1U0AP4HNYPJIIJODXAGWSS&client_secret=KK42LKUP0OSHVW1GB5JEWUDIOE5T3RSARPDNFIJJ3OLLSHHG&ll=42.3602534,-71.0582912&v=20180604&radius=500&limit=100'

In [9]:
results = requests.get(url).json()

### Get the Venues and Categories

In [10]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [11]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# tranform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.shape
#dataframe.head()

(100, 24)

In [12]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered.head()

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Recreo Coffee Roasterie,Coffee Shop,1 City Hall Ave,US,Boston,United States,,27,"[1 City Hall Ave, Boston, MA 02108, United Sta...","[{'label': 'display', 'lat': 42.360016, 'lng':...",42.360016,-71.0582,,02108,MA,5b2924603ba767002cdf864f
1,Boston City Hall,City Hall,1 City Hall Sq.,US,Boston,United States,at Congress St.,26,"[1 City Hall Sq. (at Congress St.), Boston, MA...","[{'label': 'display', 'lat': 42.36036733073828...",42.360367,-71.058004,,02201,MA,4a942695f964a5208a2020e3
2,City Hall Plaza,Plaza,1 City Hall Sq,US,Boston,United States,at Cambridge St,118,"[1 City Hall Sq (at Cambridge St), Boston, MA ...","[{'label': 'display', 'lat': 42.35965223045038...",42.359652,-71.059477,,02201-1001,MA,4a74b8c9f964a5205edf1fe3
3,The Freedom Trail,Historic Site,The Freedom Trail,US,Boston,United States,btwn Tremont St & Essex St,397,[The Freedom Trail (btwn Tremont St & Essex St...,"[{'label': 'display', 'lat': 42.35731415901699...",42.357314,-71.061038,,02111,MA,4b41657df964a5203dc625e3
4,City of Boston Credit Union,Credit Union,1 City Hall Sq,US,Boston,United States,,20,"[1 City Hall Sq, Boston, MA 02111, United States]","[{'label': 'display', 'lat': 42.36034576801684...",42.360346,-71.058079,,02111,MA,4cb1e179db32f04d0fa3cb4d


In [13]:
#Create function to know how many venues there are in Boston
import pandas as pd
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### Get the Neighborhoods in Boston by FourSquare API

In [14]:
#List of Neighborhood that have venues in Boston
boston_venues = getNearbyVenues(names=dataframe_filtered['name'],
                                   latitudes=dataframe_filtered['lat'],
                                   longitudes=dataframe_filtered['lng']
                                  )

boston_venues.head()

Recreo Coffee Roasterie
Boston City Hall
City Hall Plaza
The Freedom Trail
City of Boston Credit Union
Steaming Kettle
Boston Transportation Department
Old State House
Real Staffing Solutions
Union Square Donuts
Cocobeet
SUBWAY
Bill Russell Statue
Dunkin'
Robert Scibilia Square
Boston Redevelopment Authority Office
Jantzen and Associates, P.C.
McDermott Will & Emery LLP
Triangle Coffee
Boston City Hall Farmers' Market
Policy Room
Hub On Wheels
James Michael Curley Park
M&N Great Taste
Teri-Yummy food truck at City Hall.
Food Trucks at City Hall
City Hall Deli (8th Floor)
FBI Boston Headquarters
Boston Night Market
One Medical
Richard's Barber Shop
Two Center Plaza
State Auditor's Office
One Washington Mall
Revelry Food Truck
Dreamland Wax Museum
Carmen Park
Arnold "Red" Auerbach Statue
Thermopylae
Mayor Kevin White Statue
Trixie's Palace
Pearle Vision
Sears Crescent & Sears Block
The Patios
Boston Seasons
Big Apple Circus
Trolley Dogs
Best Of Boston
Swarovski
Sa Pa Food Truck
Boston De

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Recreo Coffee Roasterie,42.360016,-71.0582,Tatte Bakery & Cafe,42.358451,-71.057981,Bakery
1,Recreo Coffee Roasterie,42.360016,-71.0582,Faneuil Hall Marketplace,42.359978,-71.05641,Historic Site
2,Recreo Coffee Roasterie,42.360016,-71.0582,Saus Restaurant,42.361076,-71.057054,Belgian Restaurant
3,Recreo Coffee Roasterie,42.360016,-71.0582,Boston Public Market,42.36195,-71.057466,Market
4,Recreo Coffee Roasterie,42.360016,-71.0582,Zo,42.359303,-71.060352,Greek Restaurant


In [15]:
print('{} venues were returned by Foursquare.'.format(boston_venues.shape[0]))
boston_venues.head(20)

9911 venues were returned by Foursquare.


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Recreo Coffee Roasterie,42.360016,-71.0582,Tatte Bakery & Cafe,42.358451,-71.057981,Bakery
1,Recreo Coffee Roasterie,42.360016,-71.0582,Faneuil Hall Marketplace,42.359978,-71.05641,Historic Site
2,Recreo Coffee Roasterie,42.360016,-71.0582,Saus Restaurant,42.361076,-71.057054,Belgian Restaurant
3,Recreo Coffee Roasterie,42.360016,-71.0582,Boston Public Market,42.36195,-71.057466,Market
4,Recreo Coffee Roasterie,42.360016,-71.0582,Zo,42.359303,-71.060352,Greek Restaurant
5,Recreo Coffee Roasterie,42.360016,-71.0582,Boston Massacre Monument,42.358955,-71.056971,Monument / Landmark
6,Recreo Coffee Roasterie,42.360016,-71.0582,Red's Best,42.36196,-71.057587,Seafood Restaurant
7,Recreo Coffee Roasterie,42.360016,-71.0582,Ruth's Chris Steak House,42.358125,-71.059337,Steakhouse
8,Recreo Coffee Roasterie,42.360016,-71.0582,The Oceanaire Seafood Room,42.359071,-71.059173,Seafood Restaurant
9,Recreo Coffee Roasterie,42.360016,-71.0582,Old State House,42.358865,-71.057462,Historic Site


#### Group the Venues

In [16]:
#Number of venues per neighborhood
boston_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
100 Court St Boston Mass,100,100,100,100,100,100
33 Bowker St,100,100,100,100,100,100
AES Language Institute,100,100,100,100,100,100
Aris Eat Barbecue,100,100,100,100,100,100
"Arnold ""Red"" Auerbach Statue",100,100,100,100,100,100
Best Of Boston,100,100,100,100,100,100
Big Apple Circus,100,100,100,100,100,100
Bill Russell Statue,100,100,100,100,100,100
Birth Certificate Window,100,100,100,100,100,100
Blue Hills Bank Pavillon,100,100,100,100,100,100


In [17]:
print('There are {} uniques categories.'.format(len(boston_venues['Venue Category'].unique())))

There are 113 uniques categories.


In [18]:
# one hot encoding
boston_onehot = pd.get_dummies(boston_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
boston_onehot['Neighborhood'] = boston_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [boston_onehot.columns[-1]] + list(boston_onehot.columns[:-1])
manhattan_onehot = boston_onehot[fixed_columns]

boston_onehot.head()

Unnamed: 0,American Restaurant,Asian Restaurant,Athletics & Sports,Bagel Shop,Bakery,Bar,Beer Bar,Beer Garden,Belgian Restaurant,Bookstore,...,Tea Room,Thai Restaurant,Tourist Information Center,Track Stadium,Trail,Tunnel,Vegetarian / Vegan Restaurant,Video Game Store,Wine Shop,Yoga Studio
0,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [19]:
boston_grouped = boston_onehot.groupby('Neighborhood').mean().reset_index()
boston_grouped.head()

Unnamed: 0,Neighborhood,American Restaurant,Asian Restaurant,Athletics & Sports,Bagel Shop,Bakery,Bar,Beer Bar,Beer Garden,Belgian Restaurant,...,Tea Room,Thai Restaurant,Tourist Information Center,Track Stadium,Trail,Tunnel,Vegetarian / Vegan Restaurant,Video Game Store,Wine Shop,Yoga Studio
0,100 Court St Boston Mass,0.04,0.0,0.0,0.01,0.03,0.03,0.01,0.01,0.01,...,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01
1,33 Bowker St,0.01,0.0,0.0,0.01,0.06,0.03,0.01,0.01,0.01,...,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01
2,AES Language Institute,0.03,0.0,0.0,0.01,0.03,0.03,0.01,0.01,0.01,...,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0
3,Aris Eat Barbecue,0.01,0.0,0.0,0.01,0.05,0.03,0.01,0.01,0.01,...,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0
4,"Arnold ""Red"" Auerbach Statue",0.01,0.01,0.0,0.01,0.03,0.03,0.01,0.01,0.01,...,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0


#### Get Top 5 Venues

In [20]:
num_top_venues = 5

for hood in boston_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = boston_grouped[boston_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----100 Court St Boston Mass----
                 venue  freq
0          Coffee Shop  0.08
1        Historic Site  0.07
2   Seafood Restaurant  0.07
3  American Restaurant  0.04
4   Italian Restaurant  0.04


----33 Bowker St----
                venue  freq
0  Italian Restaurant  0.12
1  Seafood Restaurant  0.08
2         Coffee Shop  0.07
3       Historic Site  0.07
4              Bakery  0.06


----AES Language Institute----
                venue  freq
0         Coffee Shop  0.08
1  Seafood Restaurant  0.08
2       Historic Site  0.07
3  Italian Restaurant  0.04
4               Hotel  0.04


----Aris Eat Barbecue----
                venue  freq
0         Coffee Shop  0.09
1  Seafood Restaurant  0.08
2       Historic Site  0.07
3  Italian Restaurant  0.06
4              Bakery  0.05


----Arnold "Red" Auerbach Statue----
                venue  freq
0         Coffee Shop  0.08
1  Seafood Restaurant  0.08
2       Historic Site  0.07
3                Park  0.04
4  Italian Restaurant  0.0

In [21]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [22]:
import numpy as np # library to handle data in a vectorized manner
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = boston_grouped['Neighborhood']

for ind in np.arange(boston_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(boston_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,100 Court St Boston Mass,Coffee Shop,Seafood Restaurant,Historic Site,Italian Restaurant,Sandwich Place
1,33 Bowker St,Italian Restaurant,Seafood Restaurant,Historic Site,Coffee Shop,Bakery
2,AES Language Institute,Seafood Restaurant,Coffee Shop,Historic Site,Italian Restaurant,Hotel
3,Aris Eat Barbecue,Coffee Shop,Seafood Restaurant,Historic Site,Italian Restaurant,Bakery
4,"Arnold ""Red"" Auerbach Statue",Seafood Restaurant,Coffee Shop,Historic Site,Italian Restaurant,Park


#### Clustering the Neighborhoods

In [23]:
# import k-means from clustering stage
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 2

boston_grouped_clustering = boston_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(boston_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

In [24]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'ClusterLabels', kmeans.labels_)

boston_merged = dataframe_filtered

# merge boston_grouped with boston_data to add latitude/longitude for each neighborhood
boston_merged = boston_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='name')

boston_merged = boston_merged.dropna()


boston_merged # check the last columns!

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,...,neighborhood,postalCode,state,id,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
13,Dunkin',Donut Shop,2 City Hall Sq,US,Boston,United States,City Hall Plaza,189,"[2 City Hall Sq (City Hall Plaza), Boston, MA ...","[{'label': 'display', 'lat': 42.35899783188332...",...,Downtown Boston,2201,MA,4b9a5171f964a520f3ab35e3,0,Historic Site,Coffee Shop,American Restaurant,Seafood Restaurant,Italian Restaurant
59,Spicy Salaa,Food Truck,City Hall Plaza,US,Boston,United States,Fisher Park,98,"[City Hall Plaza (Fisher Park), Boston, MA 022...","[{'label': 'display', 'lat': 42.35957166573581...",...,Downtown Boston,2203,MA,5925b4dd6fd62668ebf46ba9,0,Coffee Shop,Seafood Restaurant,Historic Site,Italian Restaurant,Sandwich Place


In [25]:
# create map
!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium # map rendering library
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(boston_merged['lat'], boston_merged['lng'], boston_merged['name'], boston_merged['ClusterLabels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='red',
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    folium-0.5.0               |             py_0          45 KB  conda-forge
    altair-4.0.1               |             py_0         575 KB  conda-forge
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    openssl-1.1.1e             |       h516909a_0         2.1 MB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    ca-certificates-2019.11.28 |       hecc5488_0         145 KB  conda-forge
    branca-0.4.0               |             py_0          26 KB  conda-forge
    certifi-2019.11.28         |   py36h9f0ad1d_1         149 KB  conda-forge
    ------------------------------------------------------------
                       

#### List the venues by Cluster labels

In [27]:
boston_merged.loc[boston_merged['ClusterLabels'] ==0, boston_merged.columns[[1] + list(range(5, boston_merged.shape[1]))]]

Unnamed: 0,categories,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
13,Donut Shop,United States,City Hall Plaza,189,"[2 City Hall Sq (City Hall Plaza), Boston, MA ...","[{'label': 'display', 'lat': 42.35899783188332...",42.358998,-71.05985,Downtown Boston,2201,MA,4b9a5171f964a520f3ab35e3,0,Historic Site,Coffee Shop,American Restaurant,Seafood Restaurant,Italian Restaurant
59,Food Truck,United States,Fisher Park,98,"[City Hall Plaza (Fisher Park), Boston, MA 022...","[{'label': 'display', 'lat': 42.35957166573581...",42.359572,-71.057533,Downtown Boston,2203,MA,5925b4dd6fd62668ebf46ba9,0,Coffee Shop,Seafood Restaurant,Historic Site,Italian Restaurant,Sandwich Place


# 4. Results & Discussion

According to the result of our analysis, downtown Boston (in cluster 1) is the best neighborhood to open an ice cream shop in Boston. It has the greatest number of total venues and is expected to attract many people to visit. This neighborhood with Park, city plazza hall and no ice cream shop. However, the final decision is dependent on the budget of the client to rent the place. According to the budget limitations of the client, we can find other neighborhoods and select the one with the greatest number of venues and a smaller number of ice cream shops around. This clustering can also help our client compare different neighborhoods to make a better decision 

# 5. Conclusion

Used the information about the neighborhoods of Boston to help my client to select the best location to open an ice cream shop in Boston. Considered number of competitors, demand for ice cream, and budget limitations to find the best location options. This project has some limitations. First, we have limited the location options to the neighborhoods of Boston, however considering more specific locations such as apartmental Areas can improve the accuracy of our decision. Second, we used the median of one bedroom rent price in each neighborhood as an estimate of the rent price. Third, we limited our investigation for the venues within 500 meter of the neighborhoods and changing it can affect the clusters. 
Over all Good learning and understood better on a activites needs to be perfromed by a Data scientist.
#### Limitations:
The current limitation for number of clustering. the results may vary based on the cluster selection and distance range maentioned in neighborhood.