# Manhattan Juice Bar
## IBM Data Science Capstone Project - Aditya Maddali

## Table of contents
* [Introduction](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction <a name="introduction"></a>

The health and wellness industry continues to expand: the global wellness economy is currently valued at $4.5 trillion, of which the healthy eating and nutrition segment accounts for $702 billion (Source: Global Wellness Institute).

This focus on healthy eating is seeing a major shift, with the average consumer more conscious than ever about the food they consume. Trends such as transparency and naturalness dominate our daily decisions starting with breakfast, to mindful eating through the day. Companies flaunt the origin, quality and purity of the ingredients as a measure towards transparency while promising minimally processed food to maximize the nutritional benefits.  

A 2019 article published in the International Journal of Obesity reports a correlation between regular exercise and dietary preference. The lead researcher said that compliance with an exercise regimen is strongly correlated with a move towards healthy eating. The article also noted that exercise can change brain function, and that this rewiring may be behind the urge to eat more healthfully. The ongoing pandemic has also shifted consumer priorities and has the populace looking for health-focused products that can be accessed easily and safely.

The aim of the project is to help players in the health and wellness segment by identifying highly desirable physical locations for their businesses in Manhattan, New York. The project focuses on fresh juice stands, juice bars, healthy smoothie stations, juice cleanses and similar businesses. For an independent self-starter hoping to cash in on the recent fitness boom, this project identifies locations where a new health-food business can leverage existing synergistic businesses and start up with very little capital.
The correlation between exercise and healthy eating described above is the basis for the analysis: existing fitness centers are analyzed in order to recommend prime locations for juice bars.


## Data <a name="data"></a>

The following data was analyzed to determine the optimal locations to set up juice bars in the Manhattan borough of New York City. 

1.	Locations of Fitness Centers
2.	Locations of Existing Juice Bars
3.	Locations of Existing Health and Wellness Markets
 
The goal of the project is to use these three groups of data to find locations that are within walking distance of fitness centers but away from existing juice bars, so that revenue potential is not reduced by competition. Health and wellness markets were included to refine the results further: partnering with such a market offers a synergistic relationship and lowers the start-up cost compared with establishing a standalone juice bar.

Data sources 
1.	**NYU Spatial Data Repository** - *2014 New York City data*
2.	**Foursquare API** – *Business/ Landmark location*


In [1]:
import numpy as np

import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json

!pip install geopy
from geopy.geocoders import Nominatim

import requests
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated
from pandas import DataFrame

import matplotlib.cm as cm
import matplotlib.colors as colors


from sklearn.cluster import KMeans
from sklearn.cluster import DBSCAN 
from sklearn.preprocessing import StandardScaler 

#! pip install folium==0.5.0
import folium

print('Libraries imported')

Libraries imported


In [2]:
with open(r"C:\Users\adoka\Downloads\newyork_data.json") as json_data:
    newyork_data = json.load(json_data)
neighborhoods_data = newyork_data['features']
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 


# collect one record per neighborhood, then build the DataFrame in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']
    neighborhood_latlon = data['geometry']['coordinates']  # stored as [longitude, latitude]
    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_latlon[1],
                 'Longitude': neighborhood_latlon[0]})

neighborhoods = pd.DataFrame(rows, columns=column_names)
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


In [3]:
# ------------------SENSITIVE CELLS
import os

# Read the Foursquare credentials from environment variables so they are never stored
# or printed in the notebook (the variable names below are an assumption)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']

VERSION = '20180605'
LIMIT = 100


In [4]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
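
The list comprehension in `getNearbyVenues` assumes that every venue has at least one category; a venue with an empty `categories` list would raise an `IndexError`. A minimal sketch of a more defensive row builder is shown below. The helper name `venue_rows`, the `'Unknown'` placeholder and the second sample venue are assumptions made for illustration and are not part of the original notebook.

```python
def venue_rows(name, items, default_category='Unknown'):
    """Flatten Foursquare 'explore' items into (name, venue, lat, lng, category) tuples."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # fall back to a placeholder when the venue has no category (assumed convention)
        category = categories[0]['name'] if categories else default_category
        rows.append((name,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# hand-made items that mimic the response structure used in the cell above;
# the second venue is hypothetical and exists only to exercise the fallback
sample_items = [
    {'venue': {'name': 'Bikram Yoga',
               'location': {'lat': 40.876844, 'lng': -73.906204},
               'categories': [{'name': 'Yoga Studio'}]}},
    {'venue': {'name': 'Uncategorized Venue',
               'location': {'lat': 40.0, 'lng': -73.0},
               'categories': []}},
]
sample_rows = venue_rows('Marble Hill', sample_items)
```

Inside `getNearbyVenues`, the same helper could replace the list comprehension with `venues_list.append(venue_rows(name, results))`.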

In [5]:
# -------------------SENSITIVE CODE------------------
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude'])

In [7]:
print(manhattan_venues.shape)
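manhattan_venues.to_csv('manhattan_venues.csv')  # to_csv() without a path only returns a string; the file name is an assumption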
manhattan_venues.head()

(3204, 5)


Unnamed: 0,Neighborhood,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,Arturo's,40.874412,-73.910271,Pizza Place
1,Marble Hill,Bikram Yoga,40.876844,-73.906204,Yoga Studio
2,Marble Hill,Tibbett Diner,40.880404,-73.908937,Diner
3,Marble Hill,Dunkin',40.877136,-73.906666,Donut Shop
4,Marble Hill,Starbucks,40.877531,-73.905582,Coffee Shop


### Extracting Fitness Center data

We are interested in the location of fitness centers in Manhattan. First, we will filter the Manhattan locations for various types of fitness centers.


In [8]:
# named fitness_categories so the built-in list is not shadowed
fitness_categories = ['Yoga Studio','Gym','Tennis Stadium','Massage Studio','Gym / Fitness Center','Climbing Gym','Cycle Studio','Gymnastics Gym','Pilates Studio','Boxing Gym','Gym Pool']
fitness_loc = manhattan_venues[manhattan_venues['Venue Category'].isin(fitness_categories)].reset_index(drop=True)
print("Number of fitness centers in Manhattan:",fitness_loc.shape[0])
fitness_loc.head()

Number of fitness centers in Manhattan: 194


Unnamed: 0,Neighborhood,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,Bikram Yoga,40.876844,-73.906204,Yoga Studio
1,Marble Hill,Astral Fitness & Wellness Center,40.876705,-73.906372,Gym
2,Marble Hill,Blink Fitness,40.877271,-73.905595,Gym
3,Marble Hill,TCR The Club of Riverdale,40.878628,-73.914568,Tennis Stadium
4,Chinatown,Grand Nature,40.717561,-73.992011,Massage Studio


Let's visualize how these **194 Fitness Centers** are distributed over Manhattan via a heat map

In [9]:
#Getting lat lon for fitness locations to generate heat map
latlons = [[fitness_loc['Venue Latitude'][i], fitness_loc['Venue Longitude'][i]] for i in range(0,fitness_loc.shape[0])]

# create heatmap of fitness centers in Manhattan using latitude and longitude values
address = 'Midtown, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

from folium.plugins import HeatMap

fitness_heatmap = folium.Map(location=[latitude, longitude], zoom_start=12)
folium.TileLayer('cartodbpositron').add_to(fitness_heatmap)
HeatMap(latlons).add_to(fitness_heatmap)

fitness_heatmap

### Extracting Juice Bar data

We will now explore what is around the **194 fitness centers** in Manhattan, using a search radius of 500 meters. We are particularly interested in finding, and then avoiding, juice bars that are within 500 m of a fitness center.

In [10]:
# ------------------------SENSITIVE CODE---------------------------------
fitness_nearby = getNearbyVenues(names=fitness_loc['Venue'],
                                   latitudes=fitness_loc['Venue Latitude'],
                                   longitudes=fitness_loc['Venue Longitude'])

In [13]:
print('No. of locations near fitness centers in Manhattan', fitness_nearby.shape[0])
fitness_nearby.head()

No. of locations near fitness centers in Manhattan 17676


Unnamed: 0,Neighborhood,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bikram Yoga,Bikram Yoga,40.876844,-73.906204,Yoga Studio
1,Bikram Yoga,Sam's Pizza,40.879435,-73.905859,Pizza Place
2,Bikram Yoga,Starbucks,40.877531,-73.905582,Coffee Shop
3,Bikram Yoga,The Bronx Public,40.878377,-73.903481,Pub
4,Bikram Yoga,ALDI,40.877836,-73.904656,Supermarket


In [34]:
juicebar_loc = fitness_nearby[fitness_nearby['Venue Category'].isin(['Juice Bar'])].reset_index(drop=True)
juicebar_loc.rename(columns={"Neighborhood":"Fitness Center"},inplace=True)
juicebar_loc.head()

Unnamed: 0,Fitness Center,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Grand Nature,Hawa Smoothies,40.7142,-73.98939,Juice Bar
1,Bread and Yoga,disFruta,40.864613,-73.919199,Juice Bar
2,Bikram Yoga Harlem,Branson Got Juice!,40.825525,-73.943681,Juice Bar
3,Brahman Yoga Studio,Branson Got Juice!,40.825525,-73.943681,Juice Bar
4,Steep Rock West,Oasis Juice Bar,40.815017,-73.958879,Juice Bar


In [35]:
#modifying fitness_loc for merge
fitness_loc.drop(columns=['Venue Category'],inplace=True,errors='ignore')  # positional axis argument was removed in pandas 2.0
fitness_loc.rename(columns={"Venue":"Fitness Center","Venue Latitude":"FC Latitude","Venue Longitude":"FC Longitude"},inplace=True)
fitness_loc.head()

Unnamed: 0,Neighborhood,Fitness Center,FC Latitude,FC Longitude
0,Marble Hill,Bikram Yoga,40.876844,-73.906204
1,Marble Hill,Astral Fitness & Wellness Center,40.876705,-73.906372
2,Marble Hill,Blink Fitness,40.877271,-73.905595
3,Marble Hill,TCR The Club of Riverdale,40.878628,-73.914568
4,Chinatown,Grand Nature,40.717561,-73.992011


In [36]:
fitness_w_juicebar = pd.merge(fitness_loc,juicebar_loc,on ='Fitness Center', how ='left')
fitness_w_juicebar.drop_duplicates(subset=['Fitness Center','FC Latitude','FC Longitude'],inplace=True)
print(fitness_w_juicebar.shape)
fitness_w_juicebar.head()

(181, 8)


Unnamed: 0,Neighborhood,Fitness Center,FC Latitude,FC Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,Bikram Yoga,40.876844,-73.906204,,,,
1,Marble Hill,Astral Fitness & Wellness Center,40.876705,-73.906372,,,,
2,Marble Hill,Blink Fitness,40.877271,-73.905595,Juice Generation,40.751412,-73.976537,Juice Bar
8,Marble Hill,TCR The Club of Riverdale,40.878628,-73.914568,,,,
9,Chinatown,Grand Nature,40.717561,-73.992011,Hawa Smoothies,40.7142,-73.98939,Juice Bar
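
Merging on the fitness center name alone can pair one branch of a chain with a juice bar that is near a different branch: in the table above, Blink Fitness in Marble Hill (latitude 40.877) is matched with a Juice Generation at latitude 40.751. One way to avoid this, sketched below, is to merge on the name plus the center's coordinates. This assumes the center's coordinates are carried through the nearby-venue search, which `getNearbyVenues` as written does not do. The toy data frames are illustrative, and the second Blink Fitness branch is hypothetical.

```python
import pandas as pd

centers = pd.DataFrame({
    'Fitness Center': ['Blink Fitness', 'Blink Fitness'],
    'FC Latitude': [40.877271, 40.752000],     # second branch is hypothetical
    'FC Longitude': [-73.905595, -73.977000],
})

# juice bars found near each center, keyed by that center's own coordinates
juice = pd.DataFrame({
    'Fitness Center': ['Blink Fitness'],
    'FC Latitude': [40.752000],
    'FC Longitude': [-73.977000],
    'Venue': ['Juice Generation'],
})

# name-only merge: both branches pick up the juice bar
by_name = centers.merge(juice[['Fitness Center', 'Venue']], on='Fitness Center', how='left')

# name + coordinates merge: only the branch that is actually nearby is flagged
keys = ['Fitness Center', 'FC Latitude', 'FC Longitude']
merged = centers.merge(juice, on=keys, how='left')
merged['Juice Bar'] = merged['Venue'].notna().astype(int)
```

With the composite key, the Marble Hill branch keeps a `Juice Bar` value of 0 and would remain among the underserved fitness centers.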


In [37]:
# one hot encoding on Venue Category to add Juice Bar column to fitness_w_juicebar
onehot = pd.get_dummies(fitness_w_juicebar[['Venue Category']], prefix="", prefix_sep="")
fitness_w_juicebar['Juice Bar'] = onehot['Juice Bar']

fitness_w_juicebar.head()

Unnamed: 0,Neighborhood,Fitness Center,FC Latitude,FC Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Juice Bar
0,Marble Hill,Bikram Yoga,40.876844,-73.906204,,,,,0
1,Marble Hill,Astral Fitness & Wellness Center,40.876705,-73.906372,,,,,0
2,Marble Hill,Blink Fitness,40.877271,-73.905595,Juice Generation,40.751412,-73.976537,Juice Bar,1
8,Marble Hill,TCR The Club of Riverdale,40.878628,-73.914568,,,,,0
9,Chinatown,Grand Nature,40.717561,-73.992011,Hawa Smoothies,40.7142,-73.98939,Juice Bar,1


### Fitness Centers of Interest

Removing the fitness centers that are already served by a juice bar lets us focus on the fitness centers we are interested in serving.

In [56]:
# .copy() gives an independent DataFrame, so the in-place drop does not act on a slice
potential_loc = fitness_w_juicebar[fitness_w_juicebar['Juice Bar']==0].copy()
potential_loc.drop(columns=['Venue','Venue Latitude', 'Venue Longitude','Venue Category'],inplace=True,errors='ignore')
potential_loc=potential_loc.reset_index(drop=True)
print("Number of fitness centers in Manhattan not being serviced by a juice bar:", potential_loc.shape[0])
potential_loc.head()

Number of fitness centers in Manhattan not being serviced by a juice bar: 80


Unnamed: 0,Neighborhood,Fitness Center,FC Latitude,FC Longitude,Juice Bar
0,Marble Hill,Bikram Yoga,40.876844,-73.906204,0
1,Marble Hill,Astral Fitness & Wellness Center,40.876705,-73.906372,0
2,Marble Hill,TCR The Club of Riverdale,40.878628,-73.914568,0
3,Washington Heights,Planet Fitness,40.847536,-73.937937,0
4,Washington Heights,Lucille Roberts,40.848487,-73.934636,0


We will now focus on these **80 fitness centers** that do not have a juice bar within 500m of them

## Methodology <a name="methodology"></a>

The data was analyzed in 4 steps:
1.	The location data of the fitness centers in Manhattan was extracted and visualized using a heat map. The red zones broadly indicate the potential locations of the juice bars. 
2.	The location data of existing juice bars was extracted. This data was collated against nearby fitness centers to determine which fitness centers should be excluded from the analysis.
3.	The **k-means clustering** algorithm was used to cluster the remaining fitness centers and refine the potential locations.
4.	The location data of **synergistic businesses** was then extracted. This data was then analyzed to see if they are close to the potential locations as determined from the clustering step above.


## Analysis <a name="analysis"></a>

It was determined that **7 clusters** is optimal to achieve our objectives. We will then visualize the clusters.
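
The notebook does not show how the value of 7 was chosen. One common heuristic is the elbow method: fit k-means for a range of `k` and look for the point where the inertia (within-cluster sum of squared distances) stops dropping sharply. The sketch below is not the author's procedure; it uses synthetic coordinates so that it is self-contained. With the real data, `coords` would be `potential_loc[['FC Latitude', 'FC Longitude']]`.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# three synthetic blobs of (latitude, longitude) points; purely illustrative
blob_centers = np.array([[40.72, -74.00], [40.78, -73.95], [40.85, -73.93]])
coords = np.vstack([c + rng.normal(scale=0.003, size=(30, 2)) for c in blob_centers])

# fit k-means for k = 1..7 and record the inertia of each fit
inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(coords)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(k, round(inertia, 5))
```

On data with three well-separated groups, the inertia falls steeply up to `k = 3` and only marginally afterwards; the `k` at that bend is the usual candidate.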

In [57]:
#clustering all the fitness centers
fitclusters = 7

fitness_cluster = potential_loc.drop(columns=['Juice Bar'])

# run k-means clustering
kmeans = KMeans(n_clusters=fitclusters, random_state=0).fit(fitness_cluster[['FC Latitude','FC Longitude']])
cluster_centers = kmeans.cluster_centers_

In [58]:
# add clustering labels-------- SENSITIVE
#fitness_cluster.drop('Cluster Labels',1,inplace=True,errors='ignore')
fitness_cluster.insert(0, 'Cluster Labels', kmeans.labels_)

#fitness_cluster=fitness_cluster.reset_index(drop=True)
fitness_cluster.head()

Unnamed: 0,Cluster Labels,Neighborhood,Fitness Center,FC Latitude,FC Longitude
0,2,Marble Hill,Bikram Yoga,40.876844,-73.906204
1,2,Marble Hill,Astral Fitness & Wellness Center,40.876705,-73.906372
2,2,Marble Hill,TCR The Club of Riverdale,40.878628,-73.914568
3,5,Washington Heights,Planet Fitness,40.847536,-73.937937
4,5,Washington Heights,Lucille Roberts,40.848487,-73.934636


Let's visualize these **7 clusters** of fitness centers along with the existing juice bar locations

In [71]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
folium.TileLayer('cartodbpositron').add_to(map_clusters)


# set color scheme for the clusters
x = np.arange(fitclusters)
ys = [i + x + (i*x)**2 for i in range(fitclusters+2)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(fitness_cluster['FC Latitude'],fitness_cluster['FC Longitude'],fitness_cluster['Fitness Center'],fitness_cluster['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=1).add_to(map_clusters)
    
for lat, lng, label in zip(juicebar_loc['Venue Latitude'], juicebar_loc['Venue Longitude'], juicebar_loc['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,  # attach the popup so the juice bar name is shown on click
        color=rainbow[fitclusters],
        fill=True,
        fill_color=rainbow[fitclusters],
        fill_opacity=1).add_to(map_clusters)
    
      
map_clusters

### Synergistic Businesses

The 7 clusters shown on the map only indicate a general area in which the juice bars can be located. To improve the chances of success and to reduce capital expenditure, the data was explored further to see whether there are businesses with which a synergistic relationship can be formed. These businesses have a clientele similar to the target customers but are not in direct competition; they are typically labeled as Organic Groceries, Farmers' Markets or Health Food Stores.


Synergistic businesses will have these characteristics:

*	Located close to the cluster center (within 500m) so capital expenditure can be reduced via co-location.
*	Have an established reputation so their customer base and foot traffic can be leveraged.


The quick analysis below shows that there are **15 such locations**. We will then drill down to see which of these 15 markets satisfy the two criteria above.

In [60]:
market_loc = fitness_nearby[fitness_nearby['Venue Category'].isin(['Organic Grocery','Farmers Market','Health Food Store'])].reset_index(drop=True)
market_loc.drop_duplicates(subset=['Venue','Venue Latitude','Venue Longitude'],inplace=True)
market_loc = market_loc.reset_index(drop=True)
market_loc.rename(columns={"Neighborhood":"Fitness Center",
                           "Venue":"Market",
                           "Venue Latitude":"Latitude",
                           "Venue Longitude":"Longitude",
                           "Venue Category":"Category"},inplace=True)
print("Number of market locations:",market_loc.shape[0])
market_loc.head()

Number of market locations: 15


Unnamed: 0,Fitness Center,Market,Latitude,Longitude,Category
0,Grand Nature,Dimes Deli,40.714592,-73.990817,Organic Grocery
1,Bread and Yoga,Inwood Farmers Market,40.869062,-73.92056,Farmers Market
2,EVF Performance,St. Stephen's Greenmarket,40.773982,-73.950735,Farmers Market
3,CYC Fitness,A Matter of Health,40.768028,-73.955933,Health Food Store
4,Manhattan Park Gym,Roosevelt Island Farmer's Market,40.76421,-73.947606,Farmers Market


We need to understand our cluster information better to analyze these 15 markets and determine whether they satisfy our criteria. Let's tabulate the information for our 7 clusters:

In [25]:
clusterc_ll = pd.DataFrame(fitness_cluster.groupby(['Cluster Labels'])['Neighborhood'].apply(lambda x: x.value_counts().index[0]))
clusterc_ll['Cluster Latitude']  = cluster_centers[:, 0]
clusterc_ll['Cluster Longitude'] = cluster_centers[:, 1]
clusterc_ll = clusterc_ll.reset_index(drop=False)
clusterc_ll

Unnamed: 0,Cluster Labels,Neighborhood,Cluster Latitude,Cluster Longitude
0,0,Lenox Hill,40.765268,-73.958165
1,1,Civic Center,40.719003,-74.006307
2,2,Marble Hill,40.877392,-73.909048
3,3,Clinton,40.759594,-73.998032
4,4,Midtown South,40.746458,-73.985634
5,5,Washington Heights,40.848011,-73.936286
6,6,Carnegie Hill,40.782181,-73.94838


In [69]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
folium.TileLayer('cartodbpositron').add_to(map_clusters)


# set color scheme for the clusters
x = np.arange(fitclusters)
ys = [i + x + (i*x)**2 for i in range(fitclusters+2)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(fitness_cluster['FC Latitude'],fitness_cluster['FC Longitude'],fitness_cluster['Fitness Center'],fitness_cluster['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker([lat, lon],radius=5,popup=label,color='blue',fill=True,fill_color='blue',fill_opacity=1).add_to(map_clusters)
    
for lat, lng, label in zip(market_loc['Latitude'], market_loc['Longitude'], market_loc['Market']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='green',
        fill_opacity=1).add_to(map_clusters)
    
for lat, lng, label in zip(juicebar_loc['Venue Latitude'], juicebar_loc['Venue Longitude'], juicebar_loc['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,  # attach the popup so the juice bar name is shown on click
        color=rainbow[fitclusters],
        fill=True,
        fill_color=rainbow[fitclusters],
        fill_opacity=1).add_to(map_clusters)
    
for lat, lon, cluster in zip(clusterc_ll['Cluster Latitude'],clusterc_ll['Cluster Longitude'],clusterc_ll['Cluster Labels']):
    label = folium.Popup(' Cluster ' + str(cluster), parse_html=True)
    folium.Circle([lat, lon], radius=500, popup= label, color='green', fill=True, fill_opacity=0.1).add_to(map_clusters)
    
     
map_clusters

We can see on the map that some of these markets fall within the **500 m** circles drawn around our cluster centers. Let's calculate the distance from every market to every cluster center to determine which markets meet the **500 m** criterion.

In [62]:
# Function to calculate distance between two points on Earth given latitude and longitude
from math import radians, cos, sin, asin, sqrt

def lldist(lat1, lon1, lat2, lon2):
    # converting lat, lon to radians
    lon1 = radians(lon1)
    lon2 = radians(lon2)
    lat1 = radians(lat1)
    lat2 = radians(lat2)

    # Haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * asin(sqrt(a))

    # Radius of earth in meters
    r = 6371000
    return c * r

In [63]:
distances = []
for cc in range(0,clusterc_ll.shape[0]):

    for i in range(0,market_loc.shape[0]):
        dist2_cc = lldist(clusterc_ll['Cluster Latitude'][cc],clusterc_ll['Cluster Longitude'][cc],
                          market_loc['Latitude'][i],market_loc['Longitude'][i])
    
        if dist2_cc <500:
            distances.append([clusterc_ll['Neighborhood'][cc],
                              market_loc['Market'][i], 
                              market_loc['Latitude'][i],
                              market_loc['Longitude'][i],
                              clusterc_ll['Cluster Labels'][cc],
                              dist2_cc
                             ])

Locations = DataFrame (distances,columns=["Neighborhood", 'Market Name','Market Latitude','Market Longitude', 
                                          "Cluster Label", "Distance to Cluster Center"])
                                             
Locations

Unnamed: 0,Neighborhood,Market Name,Market Latitude,Market Longitude,Cluster Label,Distance to Cluster Center
0,Lenox Hill,A Matter of Health,40.768028,-73.955933,0,359.929642
1,Civic Center,Tribeca Greenmarket,40.716802,-74.01088,1,456.535864
2,Clinton,Sunac Natural Food,40.760725,-73.998425,3,130.053516
3,Clinton,Terra Market,40.756864,-73.993737,3,472.221335
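
As a cross-check on the nested loops above, the same haversine calculation can be vectorized with NumPy broadcasting so that every center-to-market distance is computed in one call. This is a minimal sketch rather than the notebook's own code; the function name `haversine_m` is an assumption, and the coordinates are copied from the cluster and market tables above for clusters 0 and 3.

```python
import numpy as np

def haversine_m(lat1, lon1, lat2, lon2, radius=6371000):
    """Great-circle distance in meters; inputs in degrees, NumPy broadcasting supported."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * np.arcsin(np.sqrt(a))

# cluster centers 0 (Lenox Hill) and 3 (Clinton)
center_lat = np.array([40.765268, 40.759594])
center_lon = np.array([-73.958165, -73.998032])

# markets: A Matter of Health, Sunac Natural Food
market_lat = np.array([40.768028, 40.760725])
market_lon = np.array([-73.955933, -73.998425])

# rows are cluster centers, columns are markets
dist = haversine_m(center_lat[:, None], center_lon[:, None],
                   market_lat[None, :], market_lon[None, :])
within_500m = dist < 500
```

The diagonal of `dist` should agree, up to rounding of the displayed coordinates, with the 359.93 m and 130.05 m reported in the table, while the off-diagonal pairs are several kilometers apart.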


## Results <a name="results"></a>

Let's take up the clusters individually and discuss the results we obtained.

**1. Lenox Hill, Civic Center, Clinton - Clusters 0, 1, 3**

These three clusters have clear targets per the criteria. These are prime candidates to set up juice bars with minimal cost and greatest chance of success.

In [64]:
Locations.set_index(['Cluster Label'],drop=True)

Cluster Label,Neighborhood,Market Name,Market Latitude,Market Longitude,Distance to Cluster Center
0,Lenox Hill,A Matter of Health,40.768028,-73.955933,359.929642
1,Civic Center,Tribeca Greenmarket,40.716802,-74.01088,456.535864
3,Clinton,Sunac Natural Food,40.760725,-73.998425,130.053516
3,Clinton,Terra Market,40.756864,-73.993737,472.221335


**2. Carnegie Hill - Cluster 6**

Carnegie Hill has a good concentration of fitness centers, and there are no competing juice bars near the cluster center. However, setting up a physical location near the cluster center may need significant investment because there are no existing synergistic businesses to leverage. Further cost-benefit analysis is needed before proceeding.

**3. Marble Hill, Washington Heights - Clusters 2, 5**

The concentration of fitness centers in these neighborhoods is low and may not justify the capital required to set up a juice bar. It may be possible to serve both clusters from a central location alongside a synergistic business: **Inwood Farmers Market, Inwood, NY**. However, there could be competition from a nearby juice bar in Inwood.

**4. Midtown South - Cluster 4**

Midtown South has a good number of fitness centers; however, there is existing competition and there are no compatible businesses within 500 m of the cluster center. This should be the last target.

## Discussion

As with any data analysis, the recommendations are only as good as the data used. Stakeholders need to monitor and update this data regularly to ensure that the analysis continues to point in the right direction.


The assumptions and constraints, such as the 500 m radius and the choice of 7 clusters, also play a crucial role in the business case and the resulting recommendations. They need to be reviewed regularly and updated when the business environment changes.



## Conclusion <a name="conclusion"></a>

The purpose of this project was to analyze the locations of fitness centers in Manhattan and suggest optimal locations to set up a fresh juice bar that attracts customers from underserved fitness centers while leveraging the benefits of co-location with a synergistic business. The results for each cluster are discussed above.

Stakeholders can use the distribution of fitness centers in Manhattan presented here to locate their juice bars strategically, depending on their risk appetite and capital availability.



**A final map showing the cluster centers and synergistic businesses** (if applicable)

In [70]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
folium.TileLayer('cartodbpositron').add_to(map_clusters)
  
for lat, lng, label in zip(Locations['Market Latitude'], Locations['Market Longitude'], Locations['Market Name']):
    label = folium.Popup(str(label), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='green',
        fill_opacity=1).add_to(map_clusters)
       
for lat, lon, cluster,poi in zip(clusterc_ll['Cluster Latitude'],clusterc_ll['Cluster Longitude'],clusterc_ll['Cluster Labels'],clusterc_ll['Neighborhood']):
    label = folium.Popup(str(poi)+' Cluster ' + str(cluster), parse_html=True)
    folium.Circle([lat, lon], radius=500, popup= label, color='green', fill=True, fill_opacity=0.1).add_to(map_clusters)
    
     
map_clusters