# Capstone Project - The Battle of Neighborhoods
### Applied Data Science Capstone by Alfonso Lopez

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

The **West Stationery Company** (WSC) has been expanding its business to the eastern USA during the last three years, but the expansion has not been as successful as expected. WSC has therefore decided to open **3 new WSC Stationery Stores** in New York at the same time, with the main goal of reversing this situation by gaining enough presence in one of the most relevant cities in the world.

To support this decision and to select the best neighborhoods for the new stores, WSC has hired us as data science experts.

WSC has named this project ***"The Knowledge Triangle"*** (TKT is its internal code name) and many of the company's resources will focus on it.

The solution will not be easy. The company has given us the following notes, which do not make the problem any simpler:

* WSC has no experience with, or information about, its market in NY City. Inside the company, there is no relevant information that could be used. So, the project can be considered an empty bottle to be filled.
* The rental price of commercial premises in New York is probably the highest in the USA. So, the investment must be made with special care.
* There is little time to finish the work, as the opening is expected by **September 1st, 2019**.
* As there will be only three stores at the beginning, WSC wants not only the best places, but also stores that are more or less geographically distributed across NY City.
***
With these bare facts, our data science team must identify:


#### <center>The best 3 neighborhoods for opening the new WSC Stationery Stores</center>

## Data <a name="data"></a>

Based on the definition of our problem, these are the main factors that will influence our decision (which neighborhoods are better for the new stores):

##### Negative influence:
* Number of stationery shops and bookstores in the neighborhood.

##### Positive influence:
* Number of schools or universities.
* Number of other education institutions.
* Population.

##### Additional considerations:
* Distance among selected neighborhoods.

### Data Sources
We will use the following data sources for the project:
> 1. **Geographic Information:** With the ***geopy.geocoders*** library we'll obtain the geographical coordinates of any address. This lets us locate New York City (to represent it on a map). The addresses of venues (Foursquare) will also help us select the best position for the new WSC stores.<br>
<br>
> 2. **Neighborhoods:** From https://geo.nyu.edu/download/file/nyu-2451-34572-geojson.json we'll take all neighborhood coordinates. At the beginning, we won't exclude any value. These coordinates partition the city into regions small enough to allow a valid study. We cannot select the exact premises to rent, but we can point the WSC infrastructure department to the right neighborhood.<br>
<br>
> 3. **Venues:** From the https://www.foursquare.com API, we'll select all related educational centers for each neighborhood. We will also detect existing stationery stores and bookstores, which will affect sales at the new WSC stores.

### Reading the Neighborhood data and representing it in a NY City map:
From https://geo.nyu.edu we will obtain a detailed dataset of the NY neighborhoods, including names and geographical coordinates. We will read a JSON file and load the information into a new DataFrame.

With these first lines of code, we import several basic libraries.

In [1]:
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import requests # library to handle requests
import json # library to handle JSON files

from pandas.io.json import json_normalize # transform JSON data into a pandas dataframe

Reading the JSON file with the neighborhood geographic information:

In [2]:
# Reading JSON file and creating Neighborhood DataFrame:

url = 'https://geo.nyu.edu/download/file/nyu-2451-34572-geojson.json'
ny_data = json.loads(requests.get(url).text)

Creating the DataFrame

In [3]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

neighborhoods_data = ny_data['features']

In [4]:
# collect one row per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)

After several meetings, WSC has asked us to focus only on the Manhattan and Bronx boroughs, because they are the most representative of NY City.

In [5]:
neighborhoods.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


In [6]:
print(neighborhoods.shape)
print('Bronx', neighborhoods[neighborhoods['Borough'] == 'Bronx'].shape)
print('Queens', neighborhoods[neighborhoods['Borough'] == 'Queens'].shape)
print('Manhattan', neighborhoods[neighborhoods['Borough'] == 'Manhattan'].shape)
print('Staten Island', neighborhoods[neighborhoods['Borough'] == 'Staten Island'].shape)
print('Brooklyn', neighborhoods[neighborhoods['Borough'] == 'Brooklyn'].shape)

# keep only the two boroughs selected by WSC (drops Queens, Brooklyn and Staten Island)
neighborhoods = neighborhoods[neighborhoods['Borough'].isin(['Manhattan', 'Bronx'])]
print(neighborhoods.shape)

(306, 4)
Bronx (52, 4)
Queens (81, 4)
Manhattan (40, 4)
Staten Island (63, 4)
Brooklyn (70, 4)
(92, 4)


#### Map representation of New York City (with folium):

In [7]:
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7308619, -73.9871558.


With these coordinates, we'll draw a map of NY City centered near this point:

In [8]:
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude+0.08, longitude+0.08], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

### Foursquare
Having represented all the NY City neighborhoods, let's use the Foursquare API to get information on stationery stores, bookstores and educational centers around each of these neighborhoods.

By analyzing the available venues at Foursquare, we have selected the following positive ones (with the related Foursquare code):
> * School, HighSchool: 4bf58dd8d48988d13b941735
> * University: 4d4b7105d754a06372d81259

In a negative way, these are the selected venues:
> * Bookstore: 4bf58dd8d48988d114951735
> * Stationery: 52f2ab2ebcbc57f1066b8b21

Not all university-related buildings or educational centers have been selected, but all stationery stores and bookstores will be searched for in the Foursquare database.

#### Define Foursquare Credentials and Version

In [9]:
CLIENT_ID = 'CXOBLEEBQ52ILMYSLGVWSMNKMMWX3JM3P20T2WGWMLVNBX4M' # your Foursquare ID
CLIENT_SECRET = 'F1FGRC3HVDOJOVVIKMPBJV1X5ZSXAI2JUYI1DS1GC2CFOEPL' # your Foursquare Secret

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: CXOBLEEBQ52ILMYSLGVWSMNKMMWX3JM3P20T2WGWMLVNBX4M
CLIENT_SECRET:F1FGRC3HVDOJOVVIKMPBJV1X5ZSXAI2JUYI1DS1GC2CFOEPL


#### Let's explore the neighborhoods in our dataframe.
We will ask Foursquare for venues in the categories listed above. Several helper functions search for these venues given a category and a location (latitude, longitude).

In [10]:
# Venue codes directly related with educational institutions (universities, schools):
pos_codes = ['4bf58dd8d48988d13b941735','4d4b7105d754a06372d81259']

# Venue codes directly competing with WSC stores (bookstores and stationery stores):
neg_codes = ['4bf58dd8d48988d114951735','52f2ab2ebcbc57f1066b8b21']

In [11]:
# Function to select the category
def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

# Function to retrieve venues of one category near a location
def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)

    try:
        results = requests.get(url).json()['response']['venues']
        venues = [(item['id'],
                   item['name'],
                   get_categories(item['categories']),
                   (item['location']['lat'], item['location']['lng']),
                   item['location']['distance']) for item in results]   
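    except (requests.RequestException, KeyError, ValueError) as err:
        # network failure, malformed JSON, or an unexpected response layout
        print('Error...', err)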
        venues = []
    return venues
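
The request URL above is assembled by hand with `str.format`. As a minimal sketch of an alternative, the query string can be built with the standard library's `urlencode`, which escapes special characters. The helper name `build_search_url` and the placeholder credentials are hypothetical and not part of the original notebook; the endpoint and parameter names are the ones used in `get_venues_near_location`.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(lat, lon, category, client_id, client_secret,
                     radius=500, limit=100, version='20180724'):
    """Return the venue-search URL with an encoded query string (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lon),
        'categoryId': category,
        'radius': radius,
        'limit': limit,
    }
    return SEARCH_ENDPOINT + '?' + urlencode(params)

# Placeholder credentials; the coordinates are Wakefield's from the neighborhood table.
url = build_search_url(40.894705, -73.847201, '4bf58dd8d48988d13b941735',
                       'MY_ID', 'MY_SECRET')

# Parse the query string back to check what would be sent.
query = parse_qs(urlsplit(url).query)
print(query['ll'], query['categoryId'], query['radius'])
```

No request is sent here; the sketch only shows how the parameters end up in the URL.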


In [12]:
# Function to get all the venue data for positive or negative venues
def get_venues(positive = True):
    dic_venues = {}

    print('Obtaining venues around candidate locations:', end='')
    if (positive):
        search_categories = pos_codes
    else:
        search_categories = neg_codes
    
    for search_category in search_categories:
        print(search_category)
        for idx, row in neighborhoods.iterrows():
            lat = row['Latitude']
            lon = row['Longitude']
            venues = get_venues_near_location(lat, lon, search_category, CLIENT_ID, 
                                              CLIENT_SECRET, radius=500, limit=100)
            for venue in venues:
                venue_id = venue[0]
                venue_name = venue[1]
                venue_categories = venue[2]
                venue_latlon = venue[3]
                venue_distance = venue[4]
                venue_neighbor = row['Neighborhood']
                venue_group = search_category
                venue_positive = positive
                venue_to_add = (venue_name, venue_categories, venue_latlon[0], venue_latlon[1],
                                venue_distance, venue_neighbor, venue_group, venue_positive)
                dic_venues[venue_id] = venue_to_add
            print(' .', end='')
        print(' done.')

    return dic_venues


Now, we search for the venues and save them to disk (to avoid repeated searches with the Foursquare API):

In [13]:
import pickle

# Try to load from local file system in case we did this before
pos_venues = {}
neg_venues = {}

loaded = False
try:
    with open('positive_venues.pkl', 'rb') as f:
        pos_venues = pickle.load(f)
    with open('negative_venues.pkl', 'rb') as f:
        neg_venues = pickle.load(f)
    print('Venues data loaded.')
    loaded = True
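except (OSError, pickle.UnpicklingError, EOFError):
    # no usable cache on disk; fall back to the Foursquare API below
    pass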

# If load failed use the Foursquare API to get the data
if not loaded:
    pos_venues = get_venues(True)
    neg_venues = get_venues(False)
    
    # Persist the results in the local file system
    with open('positive_venues.pkl', 'wb') as f:
        pickle.dump(pos_venues, f)
    with open('negative_venues.pkl', 'wb') as f:
        pickle.dump(neg_venues, f)


Obtaining venues around candidate locations:4bf58dd8d48988d13b941735
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.
4d4b7105d754a06372d81259
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.
Obtaining venues around candidate locations:4bf58dd8d48988d114951735
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.
52f2ab2ebcbc57f1066b8b21
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.


## Methodology <a name="methodology"></a>

In this project we will focus on selecting at least **3 New York City neighborhoods where WSC should expand** its stationery store network. There are 5 boroughs in NY, but we will reduce the study to only two of them: The Bronx and Manhattan (Queens, Brooklyn and Staten Island have been discarded). The methodology can be extended to new neighborhoods simply by adding the other boroughs to our datasets. So, the study is reduced to 92 neighborhoods instead of the 306 available in NY City.

Every student or office employee is a potential customer of a stationery store, but office employees usually don't manage the purchase of stationery material. Students, however, are especially active buyers in the WSC network. In fact, WSC shops are among the best rated in the sector. So, the analysis will focus on how to cover the **greatest number of schools and universities** with 3 stores. The number of **existing stationery stores and bookstores** (which compete directly with WSC) will also be studied, to avoid selecting neighborhoods with a large number of competitor stores.

In a first step, we have collected the required venues (schools, universities, stationery stores and bookstores) around each neighborhood by using the Foursquare API. This information contains the geographical coordinates.

In a second step, we will create 3 clusters (using **k-means clustering**) that include only the positive venues (schools and universities). With this distribution, it will be possible to find a centroid point from which WSC could reach a greater number of potential customers (students).

In a third and final step, we will calculate the density of negative venues (stationery stores and bookstores) for every neighborhood. Inside WSC, bookstores are considered half as competitive as stationery stores. So, for this density approach, we will use 1.0 and 0.5 as multipliers for stationery stores and bookstores respectively.
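
The weighting in the third step could be computed along these lines. This is only a sketch under stated assumptions: the `sample` frame holds made-up rows that mimic the `Neighborhood` and `VenueType` columns of the competitor data, the function name `competition_score` is hypothetical, and the result is a weighted count per neighborhood rather than a true density (which would also need each neighborhood's area).

```python
import pandas as pd

# Made-up rows shaped like the competitor venue data (not Foursquare results).
sample = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B', 'B', 'C'],
    'VenueType': ['Stationery', 'Bookstore', 'Bookstore',
                  'Stationery', 'Stationery', 'Bookstore'],
})

# Multipliers stated in the methodology: a bookstore counts as half a competitor.
weights = {'Stationery': 1.0, 'Bookstore': 0.5}

def competition_score(venues, weights):
    """Sum the per-venue weights for each neighborhood, highest score first."""
    weighted = venues['VenueType'].map(weights)
    return (weighted.groupby(venues['Neighborhood'])
                    .sum()
                    .sort_values(ascending=False))

scores = competition_score(sample, weights)
print(scores)
```

A lower score means fewer weighted competitors in the neighborhood.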

## Analysis <a name="analysis"></a>

First, let's create a new DataFrame with the positive venues and count the number of schools and universities for every neighborhood:

In [14]:
columns = ['Name', 'Categories', 'Latitude', 'Longitude', 'Distance', 'Neighborhood', 'VenueType', 'Positive']
df_education = pd.DataFrame.from_dict(pos_venues, orient = 'index', columns = columns)
# map Foursquare category codes to readable labels
# (assignment avoids the chained in-place replace, which recent pandas versions warn about)
df_education['VenueType'] = df_education['VenueType'].replace({
    '4bf58dd8d48988d13b941735': 'School',
    '4d4b7105d754a06372d81259': 'University'})

print('Total number of Schools and Universities:', df_education.shape[0])

Total number of Schools and Universities: 3141


Then, let's do the same with the negative venues (stationeries and bookstores):

In [15]:
df_stationery = pd.DataFrame.from_dict(neg_venues, orient = 'index', columns = columns)
# map Foursquare category codes to readable labels
df_stationery['VenueType'] = df_stationery['VenueType'].replace({
    '52f2ab2ebcbc57f1066b8b21': 'Stationery',
    '4bf58dd8d48988d114951735': 'Bookstore'})

print('Total number of Stationeries and Bookstores:', df_stationery.shape[0])

Total number of Stationeries and Bookstores: 232


### Clustering schools and universities

#### Distribution of Schools and Universities by Neighborhood
Let's group the results by neighborhood and list the top 10:

In [16]:
df_educ_by_neigh = df_education.groupby('Neighborhood')['Neighborhood'].count().reset_index(name="Count")
df_educ_by_neigh.sort_values(by='Count', ascending=False, axis=0).head(10)

| | Neighborhood | Count |
|---|---|---|
| 12 | Civic Center | 96 |
| 39 | Lincoln Square | 94 |
| 48 | Midtown South | 90 |
| 27 | Flatiron | 88 |
| 69 | Soho | 87 |
| 57 | Noho | 87 |
| 26 | Financial District | 84 |
| 38 | Lenox Hill | 81 |
| 44 | Manhattanville | 79 |
| 9 | Chelsea | 79 |


With the following DataFrame (venue coordinates only), we will obtain 3 groups covering the positive venues.

#### Running k-means with k = 3

In [17]:
# import k-means from clustering stage
from sklearn.cluster import KMeans
#import numpy
import numpy as np

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [18]:
df_clustering = df_education[['Latitude', 'Longitude']]
df_clustering.head()

| | Latitude | Longitude |
|---|---|---|
| 4e0b731ad164e3547c310f11 | 40.895331 | -73.845918 |
| 4f8eed1ce4b019b497ca3d2a | 40.891334 | -73.845453 |
| 4b966a94f964a5200fcb34e3 | 40.899266 | -73.842237 |
| 4e661c6a483bd9a975de445f | 40.87498 | -73.831202 |
| 4bc470c7b492d13a5cfea960 | 40.874512 | -73.833307 |


In [19]:
kclusters = 3
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df_clustering)

In [20]:
column_values = pd.Series(kmeans.labels_)

In [21]:
kmeans.labels_

array([1, 1, 1, ..., 0, 0, 0])
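
To make the clustering step easier to inspect, here is a small self-contained sketch on made-up coordinates (three obvious groups, not the Foursquare data). It fits `KMeans` with the same `n_clusters` and `random_state` as above and counts the points per cluster. One assumption worth stating: fitting on raw latitude and longitude treats degrees as plain Euclidean units, which is a simplification because a degree of longitude is shorter than a degree of latitude at New York's latitude.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up (latitude, longitude) pairs forming three well-separated groups.
points = np.array([
    [40.73, -73.99], [40.74, -73.99], [40.73, -74.00], [40.74, -74.00],
    [40.85, -73.89], [40.86, -73.89], [40.85, -73.88], [40.86, -73.88],
    [40.79, -73.96], [40.80, -73.96], [40.79, -73.95], [40.80, -73.95],
])

model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)

# Number of points assigned to each cluster label.
sizes = np.bincount(model.labels_, minlength=3)

for label, (size, center) in enumerate(zip(sizes, model.cluster_centers_)):
    print(label, size, center.round(3))
```

The cluster labels themselves are arbitrary, so only the sizes and centers are meaningful when comparing runs.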

In [22]:
df_education.insert(loc=8, column='Cluster', value=kmeans.labels_)

In [23]:
df_education

| | Name | Categories | Latitude | Longitude | Distance | Neighborhood | VenueType | Positive | Cluster |
|---|---|---|---|---|---|---|---|---|---|
| 4e0b731ad164e3547c310f11 | Public School 87 | [(School, 4bf58dd8d48988d13b941735)] | 40.895331 | -73.845918 | 128 | Wakefield | School | True | 1 |
| 4f8eed1ce4b019b497ca3d2a | Little stars School | [(Nursery School, 4f4533814b9074f6e4fb0107)] | 40.891334 | -73.845453 | 403 | Wakefield | School | True | 1 |
| 4b966a94f964a5200fcb34e3 | Mount Saint Michael Academy | [(High School, 4bf58dd8d48988d13d941735)] | 40.899266 | -73.842237 | 657 | Wakefield | School | True | 1 |
| 4e661c6a483bd9a975de445f | MS 181 | [(School, 4bf58dd8d48988d13b941735)] | 40.874980 | -73.831202 | 130 | Co-op City | School | True | 1 |
| 4bc470c7b492d13a5cfea960 | Harry S Truman High School | [(High School, 4bf58dd8d48988d13d941735)] | 40.874512 | -73.833307 | 284 | Co-op City | School | True | 1 |
| 5252ad2b11d27af63b81bc7b | P.S. 176X @ Truman HS | [(High School, 4bf58dd8d48988d13d941735)] | 40.874320 | -73.833133 | 268 | Co-op City | School | True | 1 |
| 4c12657fa5eb76b02411beb7 | PS 178 Dr Selman Waksman School | [(School, 4bf58dd8d48988d13b941735)] | 40.875471 | -73.833380 | 317 | Co-op City | School | True | 1 |
| 4d5d67d53f92236a6387e91d | Middle school 180 | [(Middle School, 4f4533814b9074f6e4fb0106)] | 40.873407 | -73.832346 | 225 | Co-op City | School | True | 1 |
| 4e7910e5e4cdb158f1b8fb75 | I.S. 180 Dr. Daniel Hale Williams School | [(School, 4bf58dd8d48988d13b941735)] | 40.872442 | -73.831715 | 254 | Co-op City | School | True | 1 |
| 4c9a0407eaa5a143c45cc8e4 | PS 176x | [(School, 4bf58dd8d48988d13b941735)] | 40.875268 | -73.832776 | 262 | Co-op City | School | True | 1 |


Folium doesn't display more than 3,000 markers, so we'll split the results by cluster:

In [24]:
df_cluster_0 = df_education.loc[df_education['Cluster'] == 0]
df_cluster_1 = df_education.loc[df_education['Cluster'] == 1]
df_cluster_2 = df_education.loc[df_education['Cluster'] == 2]

We will add the centroid of each cluster.

In [25]:
kmeans.cluster_centers_

array([[ 40.7326154 , -73.9934078 ],
       [ 40.85509272, -73.88831772],
       [ 40.78897246, -73.95933385]])
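
Since WSC wants the stores to be geographically distributed, it may help to check how far apart the three centroids are. The sketch below is an addition to the original analysis: it applies the haversine great-circle formula to the centroid coordinates printed above, using the mean Earth radius of 6371 km. The function name `haversine_km` is illustrative.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

# Centroids copied from kmeans.cluster_centers_ above, keyed by cluster label.
centroids = {
    0: (40.7326154, -73.9934078),
    1: (40.85509272, -73.88831772),
    2: (40.78897246, -73.95933385),
}

# Distance for every pair of clusters.
distances = {(i, j): haversine_km(centroids[i], centroids[j])
             for i, j in combinations(centroids, 2)}

for pair, km in distances.items():
    print(pair, round(km, 1))
```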

We can now represent each cluster with its centroid point:

In [26]:
map_cluster_0 = folium.Map(location=[latitude, longitude], zoom_start=13)    
# add markers to map
for lat, lon in zip(df_cluster_0['Latitude'], df_cluster_0['Longitude']):
    folium.Circle([lat, lon], radius=2, color='red', fill=True, fill_color='red', fill_opacity=0.7).add_to(map_cluster_0) 
    
folium.Circle(kmeans.cluster_centers_[0], radius=50, color='black', fill=True, fill_color='black', fill_opacity=1.0).add_to(map_cluster_0)
map_cluster_0

In [27]:
map_cluster_1 = folium.Map(location=[latitude+0.12, longitude+0.12], zoom_start=13)    
# add markers to map
for lat, lon in zip(df_cluster_1['Latitude'], df_cluster_1['Longitude']):
    folium.Circle([lat, lon], radius=2, color='green', fill=True, fill_color='green', fill_opacity=0.7).add_to(map_cluster_1)  

folium.Circle(kmeans.cluster_centers_[1], radius=50, color='black', fill=True, fill_color='black', fill_opacity=1.0).add_to(map_cluster_1)
map_cluster_1

In [28]:
map_cluster_2 = folium.Map(location=[latitude+0.06, longitude+0.06], zoom_start=13)    
# add markers to map
for lat, lon in zip(df_cluster_2['Latitude'], df_cluster_2['Longitude']):
    folium.Circle([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=0.7).add_to(map_cluster_2)  

folium.Circle(kmeans.cluster_centers_[2], radius=50, color='black', fill=True, fill_color='black', fill_opacity=1.0).add_to(map_cluster_2)
map_cluster_2

### Selecting the clusters
The number of negative venues is too small to help in selecting the best neighborhood. So, the best option is to select, for each cluster, among the neighborhoods with the largest number of schools and universities. Each cluster is already available in its own df_cluster_x dataframe.


In [29]:
# For Cluster 0:
df_cluster_0.groupby(['Neighborhood']).count().sort_values(by='Name', ascending=False).head(3)

| Neighborhood | Name | Categories | Latitude | Longitude | Distance | VenueType | Positive | Cluster |
|---|---|---|---|---|---|---|---|---|
| Civic Center | 96 | 96 | 96 | 96 | 96 | 96 | 96 | 96 |
| Midtown South | 90 | 90 | 90 | 90 | 90 | 90 | 90 | 90 |
| Flatiron | 88 | 88 | 88 | 88 | 88 | 88 | 88 | 88 |


In [30]:
# For Cluster 1:
df_cluster_1.groupby(['Neighborhood']).count().sort_values(by='Name', ascending=False).head(3)

| Neighborhood | Name | Categories | Latitude | Longitude | Distance | VenueType | Positive | Cluster |
|---|---|---|---|---|---|---|---|---|
| Belmont | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 |
| North Riverdale | 52 | 52 | 52 | 52 | 52 | 52 | 52 | 52 |
| University Heights | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 |


In [31]:
# For Cluster 2:
df_cluster_2.groupby(['Neighborhood']).count().sort_values(by='Name', ascending=False).head(3)

| Neighborhood | Name | Categories | Latitude | Longitude | Distance | VenueType | Positive | Cluster |
|---|---|---|---|---|---|---|---|---|
| Lincoln Square | 94 | 94 | 94 | 94 | 94 | 94 | 94 | 94 |
| Lenox Hill | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 |
| Manhattanville | 79 | 79 | 79 | 79 | 79 | 79 | 79 | 79 |
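
The three per-cluster rankings could also be produced in a single pass. The following is a sketch on made-up rows (the `sample` frame only mimics the `Neighborhood` and `Cluster` columns of `df_education`); it counts venues per cluster and neighborhood, then keeps the top neighborhood of each cluster.

```python
import pandas as pd

# Made-up venue rows: one row per venue, with its neighborhood and cluster label.
sample = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B', 'C', 'C', 'D'],
    'Cluster':      [0,   0,   0,   0,   1,   1,   1],
})

# Venue count for every (cluster, neighborhood) pair, largest count first within each cluster.
counts = (sample.groupby(['Cluster', 'Neighborhood']).size()
                .reset_index(name='Count')
                .sort_values(['Cluster', 'Count'], ascending=[True, False]))

# First row of each cluster group is that cluster's best-covered neighborhood.
top_per_cluster = counts.groupby('Cluster').head(1).reset_index(drop=True)
print(top_per_cluster)
```

Using `head(3)` instead of `head(1)` would list the runner-up neighborhoods as well.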


### Neighborhood selection.
Based on these results, our recommendation to WSC is to look for rental premises in the following neighborhoods:
* Civic Center
* Belmont
* Lincoln Square<br>
<br>

The following map shows where these neighborhoods are located in NY City:

In [32]:
# DataFrame.get_values() was removed in pandas 1.0; to_numpy() returns the same 1x2 array
[[lat0, lon0]] = neighborhoods.loc[neighborhoods['Neighborhood'] == 'Civic Center'][['Latitude', 'Longitude']].to_numpy()
[[lat1, lon1]] = neighborhoods.loc[neighborhoods['Neighborhood'] == 'Belmont'][['Latitude', 'Longitude']].to_numpy()
[[lat2, lon2]] = neighborhoods.loc[neighborhoods['Neighborhood'] == 'Lincoln Square'][['Latitude', 'Longitude']].to_numpy()

In [33]:
map_final = folium.Map(location=[latitude+0.055, longitude+0.05], zoom_start=12)    
# add markers to map
folium.Circle([lat0, lon0], radius=125, color='red', fill=True, fill_color='red', fill_opacity=0.7).add_to(map_final)  
folium.Circle([lat1, lon1], radius=125, color='green', fill=True, fill_color='green', fill_opacity=0.7).add_to(map_final)  
folium.Circle([lat2, lon2], radius=125, color='blue', fill=True, fill_color='blue', fill_opacity=0.7).add_to(map_final)  
map_final

## Results and Discussion <a name="results"></a>

Our analysis shows that there is a large number of educational centers in New York City (more than 3,000 in the Manhattan and Bronx boroughs). They are spread across the whole city, so the density or distribution of these venues alone does not single out a location.

After checking the number of stationery stores and bookstores, we found that they are too few to be relevant to the study (we detected only 232 in both boroughs).

So, we have distributed all the educational centers into three main clusters that help us in the neighborhood selection. Within each cluster, we have selected the neighborhood with the most educational centers as the best choice. There are also alternatives to these choices, and they are equally valid because the differences in counts are small.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify the New York City neighborhoods in the Manhattan and Bronx boroughs that would be the best choice for opening the first three new WSC stationery stores. By searching for educational centers inside these boroughs and placing them on a map, we could see the high density of this kind of center in the city. These locations were then clustered to create three major zones of interest for final exploration by stakeholders.

The final decision should be based on these results, together with the rental options and prices in each neighborhood.