<a href="https://cognitiveclass.ai"><img src = "https://ibm.box.com/shared/static/9gegpsmnsoo25ikkbl4qzlvlyjbgxs5x.png" width = 400> </a>

<h1 align=center><font size = 6>Finding the best place to open Yoga Studio in Manhattan, New York City</font></h1>

## Introduction/Business Problem

The client, a leading company in the fitness business, wants to expand to New York City and would like to focus on opening new fitness centers in Manhattan. The client needs data to support the decision about the most suitable location and has also asked for data about the main competitors. Unfortunately, the client does not have any usable data of its own, so we have decided to use open data to solve this problem.

## Data
To solve this problem we use the following data:

2014 New York City Neighborhood Names, to get the names and coordinates of all neighborhoods in New York City. https://geo.nyu.edu/catalog/nyu_2451_34572

Foursquare Places API, to explore the neighborhoods and the current competitors. https://developer.foursquare.com/docs/api/venues/search

## 0. Python Libraries

First we install the geopy and folium packages.

In [164]:
!pip install geopy
!pip install folium



It is necessary to import all libraries which we will use.

In [5]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, available since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


<a id='item1'></a>

## 1. Download and Explore Dataset

New York City has a total of 5 boroughs and 306 neighborhoods. In order to segment the neighborhoods and explore them, we need a dataset that contains the 5 boroughs and the neighborhoods in each borough, as well as the latitude and longitude coordinates of each neighborhood.

Luckily, this dataset exists for free on the web. Feel free to try to find this dataset on your own, but here is the link to the dataset: https://geo.nyu.edu/catalog/nyu_2451_34572

We download the data from the server.

In [6]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


#### Load and explore the data

Next, let's load the data and create an array.

In [165]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
    
neighborhoods_data = newyork_data['features']

#### Transform the data into a *pandas* dataframe

The next task is essentially transforming this data of nested Python dictionaries into a *pandas* dataframe. So let's start by creating an empty dataframe.

In [166]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


Then let's loop through the data, collect one row per neighborhood, and fill the dataframe.

In [167]:
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build the dataframe from the collected rows
neighborhoods = pd.DataFrame(rows, columns=column_names)

# keep only the Manhattan neighborhoods
neighborhoods = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688
5,Manhattan,Manhattanville,40.816934,-73.957385
6,Manhattan,Central Harlem,40.815976,-73.943211
7,Manhattan,East Harlem,40.792249,-73.944182
8,Manhattan,Upper East Side,40.775639,-73.960508
9,Manhattan,Yorkville,40.77593,-73.947118
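
The row-building loop above relies on the GeoJSON convention that a point's coordinates are stored as [longitude, latitude], which is why index 1 is read as the latitude. A minimal sketch with one hand-written feature follows; the dictionary is illustrative and only mirrors the keys the notebook reads, with values copied from the Marble Hill row above, and the helper name feature_to_row is hypothetical.

```python
# One hypothetical feature with the same keys the notebook reads.
feature = {
    "properties": {"borough": "Manhattan", "name": "Marble Hill"},
    "geometry": {"coordinates": [-73.91066, 40.876551]},  # [longitude, latitude]
}

def feature_to_row(feature):
    """Flatten one GeoJSON-style feature into a flat dict (one dataframe row)."""
    lon, lat = feature["geometry"]["coordinates"][:2]
    return {
        "Borough": feature["properties"]["borough"],
        "Neighborhood": feature["properties"]["name"],
        "Latitude": lat,
        "Longitude": lon,
    }

row = feature_to_row(feature)
print(row)
```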


#### Use geopy library to get the latitude and longitude values of Manhattan, New York City.

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>ny_explorer</em>, as shown below.

In [168]:
address = 'Manhattan, New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan, New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan, New York City are 40.7896239, -73.9598939.
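
geopy's geocode call returns None when the service finds no match, so reading the latitude directly can fail for a misspelled address. Below is a minimal sketch of a guard, assuming a hypothetical helper name first_coordinates. The geocoder is passed in as a callable so the helper can be tried with a stub instead of a live Nominatim request.

```python
from types import SimpleNamespace

def first_coordinates(geocode, addresses):
    """Return (latitude, longitude) for the first address that resolves, else None."""
    for address in addresses:
        location = geocode(address)  # None when nothing was found
        if location is not None:
            return location.latitude, location.longitude
    return None

# Stub standing in for geolocator.geocode; it only knows one address.
def fake_geocode(address):
    known = {
        "Manhattan, New York City, NY": SimpleNamespace(
            latitude=40.7896239, longitude=-73.9598939
        )
    }
    return known.get(address)

coords = first_coordinates(fake_geocode, ["Manhatan, NYC", "Manhattan, New York City, NY"])
print(coords)
```

With the notebook's geocoder this would be called as first_coordinates(geolocator.geocode, [address]).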


#### Create a map of New York with neighborhoods superimposed on top.

In [169]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
        [lat, lng],
        radius=500,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

### K Means Clustering
There are too many neighborhoods for our analysis and for calling the Foursquare API, so we use K Means clustering on the neighborhood coordinates to reduce them to 20 clusters.

In [147]:
# set number of clusters
kclusters = 20

# keep only the coordinate columns for clustering
manhattan_grouped_clustering = neighborhoods.drop(columns=['Neighborhood', 'Borough'])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=500).fit(manhattan_grouped_clustering)


manhattan_grouped_clustering.insert(0, 'Cluster', kmeans.labels_)

manhattan_grouped_clustering = manhattan_grouped_clustering.groupby('Cluster').mean().reset_index()
manhattan_grouped_clustering

Unnamed: 0,Cluster,Latitude,Longitude
0,0,40.744091,-73.98983
1,1,40.851903,-73.9369
2,2,40.792249,-73.944182
3,3,40.718375,-74.008049
4,4,40.756161,-73.965632
5,5,40.818838,-73.950095
6,6,40.787658,-73.977059
7,7,40.872117,-73.915935
8,8,40.722971,-73.98385
9,9,40.757879,-73.998115
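
K Means alternates between assigning each point to its nearest center and moving each center to the mean of its points; the groupby('Cluster').mean() call above is that averaging step applied to the final labels. Below is a minimal pure-Python sketch of one such iteration on made-up points. The function name kmeans_step and the sample data are illustrative and not part of scikit-learn. Like the notebook, it treats the two coordinates as plain Euclidean values, which is an approximation for latitude and longitude.

```python
from math import dist

def kmeans_step(points, centroids):
    """One k-means iteration: assign points to the nearest centroid, then average."""
    groups = {i: [] for i in range(len(centroids))}
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        groups[nearest].append(p)

    new_centroids = []
    for i, members in groups.items():
        if members:
            # mean of each coordinate over the members of the group
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            # an empty group keeps its previous centroid
            new_centroids.append(centroids[i])
    return new_centroids, groups

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
new_centroids, groups = kmeans_step(points, [(0.0, 1.5), (9.0, 9.0)])
print(new_centroids)
```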


## Display the clusters on a map

In [170]:
# create map of New York using latitude and longitude values
map_newyork2 = folium.Map(location=[latitude, longitude], zoom_start=11)

# add one circle per cluster center to the map
for lat, lng, clabels in zip(manhattan_grouped_clustering['Latitude'], manhattan_grouped_clustering['Longitude'], manhattan_grouped_clustering['Cluster']):
    label = 'Cluster {}'.format(clabels)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
        [lat, lng],
        radius=1000,
        popup=label,
        color='black',
        opacity=0.2,
        fill=True,
        fill_color='red',
        fill_opacity=0.2).add_to(map_newyork2)

map_newyork2
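
Because each circle has a radius of 1000 m, two cluster centers that are less than 2000 m apart produce overlapping circles. Below is a minimal sketch for checking this with the haversine formula, assuming a spherical Earth with a radius of 6,371 km. The helper names are illustrative, and the two sample centers are the rows for clusters 4 and 9 of the cluster table above.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def circles_overlap(center_a, center_b, radius_m=1000):
    """True if two circles of equal radius around the two centers intersect."""
    return haversine_m(*center_a, *center_b) < 2 * radius_m

cluster_4 = (40.756161, -73.965632)
cluster_9 = (40.757879, -73.998115)
print(round(haversine_m(*cluster_4, *cluster_9)), circles_overlap(cluster_4, cluster_9))
```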

## Foursquare API
We connect to the Foursquare API with our credentials and load all the data we need.

In [64]:
import os

# read the credentials from environment variables instead of hard-coding and printing them
# (the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are our own choice)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Credentials loaded.')

Credentials loaded.


We save the data in two lists:
venues_list - name, cluster and ID of each venue
all_venues_list - the same fields plus the latitude and longitude of each venue

We load the data from Foursquare into these lists.

In [119]:
venues_list = []      # [name, cluster, id]
all_venues_list = []  # [name, cluster, id, latitude, longitude]

def getNearbyVenues(clusters, latitudes, longitudes, radius=1000, limit=100):
    """Query Foursquare around each cluster center and fill the two lists above."""
    category = "4bf58dd8d48988d102941735" # Yoga Studio
    #category = "4bf58dd8d48988d175941735" # Gym
    #category = "4d4b7105d754a06374d81259" # Food

    for cluster, lat, lng in zip(clusters, latitudes, longitudes):
        print(cluster)

        url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            limit,
            category
            )

        # make the GET request once and reuse the parsed response
        venues = requests.get(url).json()["response"]['venues']

        for v in venues:
            venues_list.append([v['name'], cluster, v['id']])
            all_venues_list.append([v['name'], cluster, v['id'], v['location']['lat'], v['location']['lng']])

# the function fills the two lists in place and returns nothing
getNearbyVenues(clusters=manhattan_grouped_clustering['Cluster'],
                latitudes=manhattan_grouped_clustering['Latitude'],
                longitudes=manhattan_grouped_clustering['Longitude']
               )

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
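
The request URL above is assembled with str.format, which leaves the escaping of special characters to the caller. Below is a minimal sketch of the same query built with the standard library's urlencode. The parameter names are the ones used in the URL above, while the helper name build_search_url and the placeholder credentials are illustrative.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://api.foursquare.com/v2/venues/search"

def build_search_url(client_id, client_secret, version, lat, lng,
                     radius=1000, limit=100,
                     category_id="4bf58dd8d48988d102941735"):
    """Build the venue search URL with escaped query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
        "categoryId": category_id,
    }
    return BASE_URL + "?" + urlencode(params)

# placeholder credentials, not real ones
url = build_search_url("MY_ID", "MY_SECRET", "20180605", 40.744091, -73.98983)
query = parse_qs(urlsplit(url).query)
print(query["ll"], query["radius"])
```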


We create a new dataframe with the Name, Cluster and ID of each Yoga studio.

In [120]:

p = pd.DataFrame(venues_list, columns=['Name','Cluster','ID'])
p.head()

Unnamed: 0,Name,Cluster,ID
0,Kinespirit @ Studio Riverside,0,4c73d2221b11199c38ef6113
1,Ny Loves Yoga,0,4df0360cd4c04d0392c519a8
2,Momentum Fitness,0,50112087e4b06b8dcac7a0ae
3,NY Loves Yoga,0,4b55d36ff964a520b3f127e3
4,Baby Yoga Center,0,4b9bef50f964a520b23736e3


In [121]:
pa = pd.DataFrame(venues_list)
pa.head()

Unnamed: 0,0,1,2
0,Kinespirit @ Studio Riverside,0,4c73d2221b11199c38ef6113
1,Ny Loves Yoga,0,4df0360cd4c04d0392c519a8
2,Momentum Fitness,0,50112087e4b06b8dcac7a0ae
3,NY Loves Yoga,0,4b55d36ff964a520b3f127e3
4,Baby Yoga Center,0,4b9bef50f964a520b23736e3


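We prepare the studio data which we will show on a map. Because keep = False is used, every copy of an ID that was returned for more than one cluster is dropped, so a studio found from two cluster centers is left out entirely.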

In [122]:
pa = pd.DataFrame(all_venues_list, columns = ['Name','Cluster','ID','Latitude','Longitude'])
pa.drop_duplicates(subset ="ID", keep = False, inplace = True)
pa.head()

Unnamed: 0,Name,Cluster,ID,Latitude,Longitude
0,Kinespirit @ Studio Riverside,0,4c73d2221b11199c38ef6113,40.787487,-73.976727
3,NY Loves Yoga,0,4b55d36ff964a520b3f127e3,40.785205,-73.97517
5,Yoga for Humanity,0,4bd8bbd7f645c9b6cbf9a8e0,40.785017,-73.977632
7,Upper West Side Yoga and Wellness,0,516827db72da218d58fdf80b,40.785682,-73.97231
8,Boom Boom Room,0,50fa0bf9e4b07dd90e2104dc,40.783482,-73.971806
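
The keep argument of drop_duplicates decides what happens to an ID that appears more than once: keep=False removes every copy, whereas keep='first' retains one. Below is a minimal sketch on made-up rows (the names and IDs are illustrative) that shows the difference.

```python
import pandas as pd

# Studio "b" was returned for two clusters, so its ID appears twice.
venues = pd.DataFrame(
    [["Studio A", 0, "a"], ["Studio B", 0, "b"], ["Studio B", 1, "b"], ["Studio C", 1, "c"]],
    columns=["Name", "Cluster", "ID"],
)

dropped_all = venues.drop_duplicates(subset="ID", keep=False)   # removes both copies of "b"
kept_first = venues.drop_duplicates(subset="ID", keep="first")  # keeps the first copy of "b"

print(list(dropped_all["ID"]), list(kept_first["ID"]))
```

Which variant is appropriate depends on whether a studio reachable from two cluster centers should be counted once or not at all.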


We count the Yoga studios in each cluster and add the counts to the cluster centers.

In [138]:
p = pd.DataFrame(venues_list, columns=['Name','Cluster','ID'])
p = p.groupby('Cluster').count().reset_index()

ndf = manhattan_grouped_clustering.copy()

# join on the cluster label so the counts stay aligned even if a cluster returned no venues
ndf = ndf.merge(p[['Cluster', 'ID']], on='Cluster', how='left')
ndf['ID'] = ndf['ID'].fillna(0).astype(int)
ndf.rename(columns={"ID": "Num of Yoga studios"}, inplace=True)

ndf

Unnamed: 0,Cluster,Latitude,Longitude,Num of Yoga studios
0,0,40.787658,-73.977059,21
1,1,40.722971,-73.98385,46
2,2,40.872117,-73.915935,4
3,3,40.818838,-73.950095,9
4,4,40.75308,-73.967495,17
5,5,40.779306,-73.950187,18
6,6,40.757879,-73.998115,19
7,7,40.70952,-74.013767,25
8,8,40.851903,-73.9369,2
9,9,40.734105,-73.977714,21


We create a map showing the clusters and all Yoga studios. The preferred clusters for opening a new business are marked in red.

In [163]:
# create map of New York using latitude and longitude values
from folium.plugins import MarkerCluster
map_newyork3 = folium.Map(location=[latitude, longitude], zoom_start=11)

# clusters recommended for a new studio are outlined in red
preferred_clusters = [18, 5, 17]

# add one circle per cluster; a darker fill means more Yoga studios
for lat, lng, clabels, a in zip(ndf['Latitude'], ndf['Longitude'], ndf['Cluster'], ndf['Num of Yoga studios']):
    label = 'Cluster {} has {} Yoga Studios nearby.'.format(clabels, a)
    label = folium.Popup(label, parse_html=True)
    c = 'red' if clabels in preferred_clusters else 'black'
    folium.Circle(
        [lat, lng],
        radius=1100,
        popup=label,
        color=c,
        opacity=0.8,
        fill=True,
        fill_color='grey',
        fill_opacity=a/50/2).add_to(map_newyork3)

# add one marker per Yoga studio, grouped in a marker cluster
mc = MarkerCluster()

for lat, lng, name, cluster in zip(pa['Latitude'], pa['Longitude'], pa['Name'], pa['Cluster']):
    # label each marker with its own name and cluster
    label = '{} (cluster {})'.format(name, cluster)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        [lat, lng],
        icon=folium.Icon(icon='star', color='orange'),
        popup=label).add_to(mc)
mc.add_to(map_newyork3)
map_newyork3
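
The fill opacity of each circle is the studio count divided by 100 (a/50/2), which stays within the valid range of 0 to 1 only while no cluster has more than 100 studios. Below is a minimal sketch of a clamped version; the helper name count_to_opacity and the max_count parameter are illustrative.

```python
def count_to_opacity(count, max_count=100):
    """Map a studio count to a fill opacity in [0, 1], clamping out-of-range values."""
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    return min(max(count / max_count, 0.0), 1.0)

print(count_to_opacity(46), count_to_opacity(2), count_to_opacity(250))
```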

### Report of findings
The map shows all Yoga studios in Manhattan, New York. According to the analysis, the best places to open a new Yoga studio are clusters 18, 5 and 17, which have a high population density but very few Yoga studios. Another advantage is the proximity of Central Park, because some lessons could be held outside. A further analysis of the population in these clusters would be needed to confirm this.
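
As a first screening step, clusters can be ranked by their studio count alone. Below is a minimal sketch using only the ten rows of the count table displayed earlier. The remaining clusters, including 17 and 18, are not shown there, so this does not reproduce the recommendation, which also weighs population density and location. The helper name least_served is illustrative.

```python
# (cluster, number of Yoga studios) for the ten rows shown in the count table
studio_counts = [(0, 21), (1, 46), (2, 4), (3, 9), (4, 17),
                 (5, 18), (6, 19), (7, 25), (8, 2), (9, 21)]

def least_served(counts, n=3):
    """Return the n clusters with the fewest studios, fewest first."""
    return sorted(counts, key=lambda pair: pair[1])[:n]

print(least_served(studio_counts))
```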


## Analysis of competitors

In [156]:
competitors = pa.copy()
competitors = competitors.sort_values(by=['Name']).reset_index(drop=True)
competitors

Unnamed: 0,Name,Cluster,ID,Latitude,Longitude
0,Akasha Yoga Studio,1,4c8a6cf43dc2a1cdf4cdac32,40.726703,-73.992164
1,Alphabet City Yoga,9,560d4608498e11feafe798d3,40.725714,-73.98056
2,Anomaste (Anomaly Yoga by Jenny),13,51353441e4b0711004938d4a,40.723385,-73.998403
3,Asali Yoga,3,4f6ded7be4b03b3f39c68401,40.829228,-73.949108
4,Ashtanga Sadhana,1,56ba0790498e8546c93ebb20,40.729289,-73.989232
5,Ashtanga Yoga NY,13,4beab3b2415e20a15b4ee5bb,40.72145,-73.99904
6,Ashtanga Yoga Shala,1,4dbaaa505da389d2c2403e50,40.723799,-73.980079
7,Ashtanga Yoga UWS,16,4d6a6c0dde28224b49764fbe,40.779234,-73.983009
8,Athleta Mind Over Madness Yoga,15,51c45588498edcb7ddc7b8a4,40.757761,-73.985508
9,Atmananda Yoga,4,57a010e3498e22f44c5529ef,40.757642,-73.967881


### Summary of competitors
There are 177 Yoga studios in total.

There are a few Yoga studio chains, led by Y7:
Y7 (5 studios)
Exhale (2 studios)
Land (2 studios)
Nalini Method (2 studios)
Pure Barre (2 studios)
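
A summary like this can be derived by counting how many studio names start with the same chain name. Below is a minimal sketch, assuming a hand-picked list of chain names and made-up venue names; neither is taken from the Foursquare results, and the helper name count_chains is illustrative.

```python
from collections import Counter

CHAINS = ["Y7", "Exhale", "Pure Barre"]  # hypothetical list of chain names to look for

def count_chains(names, chains=CHAINS):
    """Count venues whose name starts with one of the chain names (case-insensitive)."""
    counts = Counter()
    for name in names:
        for chain in chains:
            if name.lower().startswith(chain.lower()):
                counts[chain] += 1
                break  # a venue belongs to at most one chain
    return counts

# made-up venue names for illustration
sample_names = ["Y7 Studio A", "y7 studio B", "Exhale Spa", "Pure Barre X", "Corner Yoga"]
chain_counts = count_chains(sample_names)
print(chain_counts.most_common())
```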
