<h1 align="center">New York Neighborhood Analysis for Office Relocation</h1>
    

<h2>Introduction: Business Problem</h2>

<p>Pied Piper is an up-and-coming software development company located in the Bellaire neighborhood of Queens, New York. The company is expanding and would like to open a new office in Brooklyn, New York. The current plan is to relocate half of its staff to the new office, where they will be responsible for recruiting new staff members and inducting them into the company’s core values and business operations. The company wants the transition to be smooth, so that the relocated staff get the same feel working in the new office as they had in the old one.</p>
<p>To assist Pied Piper in the relocation, we will analyse the neighborhoods in Brooklyn to determine which of them most closely resemble the current office location with regard to key amenities such as coffee shops, restaurants, shopping centers and schools. We will use data science to perform the analysis and provide the company with a list of suitable neighborhoods in which to set up the new office.</p>

<h2>Data</h2>

<p>From the definition of the problem, we first need the neighborhood data for New York, specifically for the borough of Brooklyn and the Bellaire neighborhood in Queens. With the location data we can then use the Foursquare API to explore each neighborhood and determine which social amenities are present, based on the venue category data provided by Foursquare.</p>
<p>We will then apply a clustering method to group the Brooklyn neighborhoods. From the clusters we can determine which locations most closely match Bellaire by looking at the cluster that contains the Bellaire neighborhood. We will use Folium maps to visualize the clusters and reach a conclusion on an appropriate neighborhood in which to set up the new office.</p>

<h2>Methodology</h2>

<p>In our process we will first obtain our data, which is in JSON format, and clean it up a bit to keep the relevant information, which we will store in a data frame. From the data frame we can then get the location data for Brooklyn and add the location data for the company's current office. Once we have the final filtered list of neighborhood locations, we can use the Foursquare explore API to get the venues in those neighborhoods.</p>
<p>With the Foursquare data we can then perform one hot encoding to make the data ready for K-Means clustering. Using K-Means with an arbitrarily chosen cluster count of 10, we will generate clusters that group similar neighborhoods in Brooklyn. Because we included the current office location in the dataset, we can easily identify the neighborhoods similar to it, since they will be in the same cluster.</p>
<p>We will rely heavily on Folium to visualize the neighborhoods and the clusters in order to reach a well-founded conclusion on the appropriate location for the new office.</p>
<p>Our process is broken into six steps, as highlighted below:</p>

<h3>1. Download and Explore New York Data</h3>

<p>New York location data is publicly accessible at https://cocl.us/new_york_dataset, so we can download the JSON data using wget and analyse it. From the data we can create a pandas dataframe, which we can then filter to drop the neighborhoods that are not in Brooklyn.</p>

<p>Let's start by importing all the relevant packages that we will need for our analysis.</p>

In [10]:
import pandas as pd
import requests

import numpy as np

# !conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from sklearn.cluster import KMeans

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
print("Imports Completed")

Imports Completed


<p>Using wget we first download the data from the URL and save it to a JSON file. The relevant location data is under the <b>features</b> key, so we will extract it to the <b>neighborhoods_data</b> variable.</p>

In [12]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
    
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
neighborhoods_data = newyork_data['features']

<p>From the <b>neighborhoods_data</b> variable we need to keep just the information relevant for our analysis. In our case we only need the <b>Borough, Neighborhood, Latitude</b> and <b>Longitude</b> values.</p>

In [14]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


In [15]:
# collect one record per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON coordinates are ordered [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)

<p>After the filtering we are able to create a <b>neighborhoods</b> data frame with just the four relevant fields as highlighted above. We can further inspect the data frame using the groupby data frame function to see that New York has 5 boroughs each with a number of neighborhoods as can be seen below.</p>

In [16]:
neighborhoods.groupby('Borough').count()

Unnamed: 0_level_0,Neighborhood,Latitude,Longitude
Borough,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Bronx,52,52,52
Brooklyn,70,70,70
Manhattan,40,40,40
Queens,81,81,81
Staten Island,63,63,63


<p>Our problem statement calls for an analysis of Brooklyn, so we need to filter out the rest and keep just the neighborhoods in the <b>Brooklyn</b> borough. From the groupby analysis above we should expect to be left with <b>70 neighborhoods</b> in our dataset. Because we want to find the Brooklyn neighborhoods that are similar to the company's current office location in <b>Bellaire, Queens</b>, we will also extract the location data for Bellaire and include it in our dataset. This should bring our total dataset to 71 neighborhoods. When we then cluster the neighborhoods, we will select the ones that fall in the cluster containing Bellaire. This is based on our understanding that neighborhoods in the same cluster have similar social amenities, as measured by the venue category data obtained from Foursquare.</p>

In [18]:
brooklyn_data = neighborhoods[neighborhoods['Borough'].isin(['Brooklyn'])].reset_index(drop=True)
bellaire_data = neighborhoods[neighborhoods['Neighborhood'].isin(['Bellaire'])].reset_index(drop=True)
brooklyn_data = pd.concat([brooklyn_data, bellaire_data]).reset_index(drop=True)
print("The size of the dataframe is ", brooklyn_data.shape)
brooklyn_data.tail()

The size of the dataframe is  (71, 4)


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
66,Brooklyn,Homecrest,40.598525,-73.959185
67,Brooklyn,Highland Park,40.681999,-73.890346
68,Brooklyn,Madison,40.609378,-73.948415
69,Brooklyn,Erasmus,40.646926,-73.948177
70,Queens,Bellaire,40.733014,-73.738892


In [19]:
address = 'Brooklyn, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Brooklyn are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Brooklyn are 40.6501038, -73.9495823.


<p>Using the brooklyn_data dataset created above, we will first create a folium map to visualise our current office location in Bellaire and the candidate office locations in Brooklyn. For distinction, we will draw the current office location with a blue marker and the Brooklyn locations with red markers.</p>

In [96]:
# create map of Brooklyn using latitude and longitude values
map_brooklyn = folium.Map(location=[latitude, longitude], zoom_start=11)

def get_color(label):
    if label == 'Bellaire':
        return 'blue'
    else:
        return 'red'

# add markers to map
for lat, lng, label in zip(brooklyn_data['Latitude'], brooklyn_data['Longitude'], brooklyn_data['Neighborhood']):
    folium_label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=folium_label,
        color= get_color(label),
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_brooklyn)  
    
map_brooklyn

<h3>2. Explore the Neighborhoods using Foursquare</h3>

In [30]:
# The code was removed by Watson Studio for sharing.

<p>We will need to call the Foursquare explore API to get information on up to 100 venues located within each neighborhood. This can be done with a simple for loop in which we pass the location data for each neighborhood and build a list of the venues returned for it.</p>

In [31]:
# Define function to get the nearby venues for the neighborhoods
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):     
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
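<p>The list comprehension in <b>getNearbyVenues</b> reads <code>v['venue']['categories'][0]</code> directly, which would raise an IndexError for any venue whose category list is empty. Whether the API actually returns such venues is an assumption here. The sketch below shows one defensive way to flatten the items; the helper name <code>extract_venue_rows</code>, the <code>'Uncategorized'</code> placeholder and the second sample record are all made up for illustration, and no network call is made.</p>

```python
def extract_venue_rows(name, lat, lng, items):
    """Turn the 'items' list of an explore response into flat tuples."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to a placeholder label instead of raising IndexError.
        category = categories[0]['name'] if categories else 'Uncategorized'
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hand-written items that mimic the fields read by getNearbyVenues.
sample_items = [
    {'venue': {'name': 'Bagel Boy',
               'location': {'lat': 40.627896, 'lng': -74.029335},
               'categories': [{'name': 'Bagel Shop'}]}},
    {'venue': {'name': 'Venue Without Category',  # hypothetical record
               'location': {'lat': 40.625, 'lng': -74.03},
               'categories': []}},
]

rows = extract_venue_rows('Bay Ridge', 40.625801, -74.030621, sample_items)
for row in rows:
    print(row)
```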

In [33]:
# Get the nearby venues for the Brooklyn neighborhoods
brookly_venues = getNearbyVenues(names=brooklyn_data['Neighborhood'],
                                   latitudes=brooklyn_data['Latitude'],
                                   longitudes=brooklyn_data['Longitude']
                                  )

<p>The results are stored in the <b>brookly_venues</b> dataframe, which we can inspect using the shape and head methods to see how much data was obtained. We can also group it per neighborhood and count the unique venue categories returned.</p>

In [35]:
print("The total locations obtained are ",  brookly_venues.shape)
brookly_venues.head()

The total locations obtained are  (2753, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bay Ridge,40.625801,-74.030621,Pilo Arts Day Spa and Salon,40.624748,-74.030591,Spa
1,Bay Ridge,40.625801,-74.030621,Bagel Boy,40.627896,-74.029335,Bagel Shop
2,Bay Ridge,40.625801,-74.030621,Leo's Casa Calamari,40.6242,-74.030931,Pizza Place
3,Bay Ridge,40.625801,-74.030621,Pegasus Cafe,40.623168,-74.031186,Breakfast Spot
4,Bay Ridge,40.625801,-74.030621,The Bookmark Shoppe,40.624577,-74.030562,Bookstore
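
<p>As a sanity check on the <b>radius=500</b> argument, which we assume is interpreted in metres, the distance between a neighborhood's centre point and one of its venues can be computed with the haversine formula. This is a minimal sketch using the first row of the table above; the helper name <code>haversine_m</code> is our own, and the Earth radius used is the usual mean value.</p>

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    earth_radius_m = 6371000  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


# Bay Ridge centre point vs. Pilo Arts Day Spa and Salon (first row above).
distance = haversine_m(40.625801, -74.030621, 40.624748, -74.030591)
print(round(distance), "m from the Bay Ridge centre point")
```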


In [38]:
brookly_venues.groupby('Neighborhood').count().head()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bath Beach,50,50,50,50,50,50
Bay Ridge,83,83,83,83,83,83
Bedford Stuyvesant,29,29,29,29,29,29
Bellaire,12,12,12,12,12,12
Bensonhurst,30,30,30,30,30,30


In [37]:
print('There are {} unique categories.'.format(len(brookly_venues['Venue Category'].unique())))

There are 282 unique categories.


<h3>3. Perform One Hot Encoding for the Dataframe</h3>

<p>Next we perform one hot encoding on the data frame to prepare it for K-Means clustering. We then group the one hot encoded rows by neighborhood and take the mean, which gives the relative frequency of each venue category in each neighborhood.</p>

In [40]:
# one hot encoding
brooklyn_one_hot = pd.get_dummies(brookly_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
brooklyn_one_hot['Neighborhood'] = brookly_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [brooklyn_one_hot.columns[-1]] + list(brooklyn_one_hot.columns[:-1])
brooklyn_one_hot = brooklyn_one_hot[fixed_columns]

brooklyn_one_hot.head()

Unnamed: 0,Yoga Studio,Accessories Store,American Restaurant,Animal Shelter,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Arts & Entertainment,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [41]:
brooklyn_grouped = brooklyn_one_hot.groupby('Neighborhood').mean().reset_index()
print("Shape of group data is " + str(brooklyn_grouped.shape))
brooklyn_grouped.head()

Shape of group data is (71, 282)


Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,American Restaurant,Animal Shelter,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Bath Beach,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bay Ridge,0.0,0.0,0.036145,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.012048,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0
2,Bedford Stuyvesant,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0
3,Bellaire,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bensonhurst,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
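
<p>Taking the mean of one hot encoded rows per neighborhood is the same as dividing each category's venue count by the neighborhood's total venue count. The plain-Python sketch below illustrates that arithmetic on toy data; the function name <code>category_frequencies</code> and the neighborhoods "A" and "B" are invented for the example.</p>

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """Map each neighborhood to {category: share of its venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    return {
        neighborhood: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for neighborhood, cats in counts.items()
    }


# Toy (neighborhood, category) pairs, not real Foursquare data.
toy_venues = [
    ('A', 'Pizza Place'), ('A', 'Pizza Place'), ('A', 'Bar'), ('A', 'Spa'),
    ('B', 'Coffee Shop'), ('B', 'Bar'),
]
freqs = category_frequencies(toy_venues)
print(freqs['A'])

# Same arithmetic as the grouped table: Bay Ridge returned 83 venues,
# so one venue in a category gives 1/83 and three venues give 3/83.
one_of_83 = round(1 / 83, 6)
three_of_83 = round(3 / 83, 6)
print(one_of_83, three_of_83)
```

<p>The last two values match the 0.012048 and 0.036145 entries in the Bay Ridge row above.</p>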


<h3>4. Get the most common venues for the neighborhoods</h3>

<p>We will also further analyse the venue data to obtain the ten most frequent venue categories per neighborhood, as seen below.</p>

In [43]:
# Create a function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [75]:
# Create data frame with top 10 most common venues using the above created method
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = brooklyn_grouped['Neighborhood']

for ind in np.arange(brooklyn_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(brooklyn_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bath Beach,Pharmacy,Chinese Restaurant,Cantonese Restaurant,Gas Station,Pizza Place,Bubble Tea Shop,Italian Restaurant,Fast Food Restaurant,Halal Restaurant,Mobile Phone Shop
1,Bay Ridge,Spa,Italian Restaurant,Pizza Place,Bar,American Restaurant,Greek Restaurant,Pharmacy,Bagel Shop,Sushi Restaurant,Thai Restaurant
2,Bedford Stuyvesant,Deli / Bodega,Pizza Place,Café,Coffee Shop,Bar,Juice Bar,Bagel Shop,New American Restaurant,Boutique,Gift Shop
3,Bellaire,Greek Restaurant,Italian Restaurant,Bus Station,Breakfast Spot,Chinese Restaurant,Gym,Halal Restaurant,Coffee Shop,Convenience Store,Moving Target
4,Bensonhurst,Italian Restaurant,Dessert Shop,Chinese Restaurant,Sushi Restaurant,Ice Cream Shop,Donut Shop,Bakery,Supermarket,Noodle House,Cha Chaan Teng
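
<p>The column labels above rely on a three-element suffix list plus an exception fallback, which is fine for ten columns but would mislabel ranks such as 21 or 22. As a hedged alternative, the sketch below pairs a general ordinal helper with a plain-Python ranking of category shares. Both function names and the toy frequencies are illustrative only.</p>

```python
def ordinal(n):
    """Return 1 -> '1st', 2 -> '2nd', 11 -> '11th', 22 -> '22nd', and so on."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


def top_categories(frequencies, n):
    """Names of the n most frequent categories, highest share first."""
    ranked = sorted(frequencies.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]


# Toy category shares for a single neighborhood, not real data.
toy = {'Pizza Place': 0.5, 'Bar': 0.25, 'Spa': 0.15, 'Gym': 0.10}
top_three = top_categories(toy, 3)
for rank, name in enumerate(top_three, start=1):
    print('{} Most Common Venue: {}'.format(ordinal(rank), name))
```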


<h3>5. Cluster the Neighborhoods</h3>

<p>Now that we have grouped the one hot encoded venue data by neighborhood, we are ready to do the clustering. We will use the K-Means clustering method with 10 clusters. Once the clustering is done, we will view the neighborhoods and their clusters on a folium map.</p>

In [76]:
# Let's work with 10 clusters
kclusters = 10

# drop the label column; the positional axis argument was removed in pandas 2.0
brooklyn_grouped_clustering = brooklyn_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(brooklyn_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([7, 0, 0, 0, 0, 2, 0, 7, 7, 7], dtype=int32)
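
<p>K-Means alternates between assigning every point to its nearest centroid and recomputing each centroid as the mean of its points. The labels printed above are the result of the final assignment step, and the numbers themselves are arbitrary identifiers. The sketch below illustrates only that assignment step in plain Python on toy frequency vectors; it is not scikit-learn's implementation, and the function names and data are invented for the example.</p>

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_clusters(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)),
            key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


# Toy frequency vectors over three venue categories, not real data.
points = [[0.5, 0.5, 0.0], [0.4, 0.6, 0.0], [0.0, 0.1, 0.9]]
centroids = [[0.45, 0.55, 0.0], [0.0, 0.0, 1.0]]
labels = assign_clusters(points, centroids)
print(labels)
```

<p>Two neighborhoods end up with the same label when their category frequency vectors lie closest to the same centroid, which is the sense in which clustered neighborhoods are "similar" in this analysis.</p>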

In [77]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

brooklyn_merged = brooklyn_data

# join the sorted venues (with cluster labels) onto brooklyn_data to add latitude/longitude for each neighborhood
brooklyn_merged = brooklyn_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

brooklyn_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Brooklyn,Bay Ridge,40.625801,-74.030621,0,Spa,Italian Restaurant,Pizza Place,Bar,American Restaurant,Greek Restaurant,Pharmacy,Bagel Shop,Sushi Restaurant,Thai Restaurant
1,Brooklyn,Bensonhurst,40.611009,-73.99518,0,Italian Restaurant,Dessert Shop,Chinese Restaurant,Sushi Restaurant,Ice Cream Shop,Donut Shop,Bakery,Supermarket,Noodle House,Cha Chaan Teng
2,Brooklyn,Sunset Park,40.645103,-74.010316,7,Pizza Place,Latin American Restaurant,Mexican Restaurant,Bank,Bakery,Gym,Mobile Phone Shop,Pharmacy,Fried Chicken Joint,Women's Store
3,Brooklyn,Greenpoint,40.730201,-73.954241,0,Bar,Pizza Place,Coffee Shop,Cocktail Bar,Yoga Studio,Café,Sushi Restaurant,Mexican Restaurant,French Restaurant,Tea Room
4,Brooklyn,Gravesend,40.59526,-73.973471,0,Bakery,Lounge,Pizza Place,Deli / Bodega,Bus Station,Italian Restaurant,Pharmacy,Record Shop,Donut Shop,Music Venue


<p>We generate a folium map with our cluster data to visualize the clusters and better understand their distribution.</p>

In [87]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
rainbow[6] = 'blue'

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(brooklyn_merged['Latitude'], brooklyn_merged['Longitude'], brooklyn_merged['Neighborhood'], brooklyn_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<h3>6. Examine the Clusters</h3>

<p>The folium map above provides good insight into where the similar neighborhoods are located. Our cluster of interest is depicted by the red markers, since that is the color assigned to the cluster containing the company's current office location in Bellaire. We can filter the dataset by the Bellaire neighborhood to obtain its cluster label, and then use that label to obtain the full list of Brooklyn neighborhoods in the same cluster.</p>

In [88]:
bellaire_cluster_data = brooklyn_merged[brooklyn_merged['Neighborhood'] == 'Bellaire'].reset_index(drop=True)
bellaire_cluster_data

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Queens,Bellaire,40.733014,-73.738892,0,Greek Restaurant,Italian Restaurant,Bus Station,Breakfast Spot,Chinese Restaurant,Gym,Halal Restaurant,Coffee Shop,Convenience Store,Moving Target


<p>From the above filtering we can see that the Bellaire neighborhood is in Cluster 0. With this we can now get the other neighborhoods in the same cluster. The cluster contains 33 neighborhoods; because Bellaire itself is one of them, 32 Brooklyn neighborhoods are similar to our current location.</p>

In [94]:
brooklyn_similar_neighborhoods = brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 0, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]
print("The number of similar neighborhoods in Brooklyn are ", brooklyn_similar_neighborhoods.shape)
brooklyn_similar_neighborhoods

The number of similar neighborhoods in Brooklyn are  (33, 11)


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bay Ridge,Spa,Italian Restaurant,Pizza Place,Bar,American Restaurant,Greek Restaurant,Pharmacy,Bagel Shop,Sushi Restaurant,Thai Restaurant
1,Bensonhurst,Italian Restaurant,Dessert Shop,Chinese Restaurant,Sushi Restaurant,Ice Cream Shop,Donut Shop,Bakery,Supermarket,Noodle House,Cha Chaan Teng
3,Greenpoint,Bar,Pizza Place,Coffee Shop,Cocktail Bar,Yoga Studio,Café,Sushi Restaurant,Mexican Restaurant,French Restaurant,Tea Room
4,Gravesend,Bakery,Lounge,Pizza Place,Deli / Bodega,Bus Station,Italian Restaurant,Pharmacy,Record Shop,Donut Shop,Music Venue
9,Crown Heights,Pizza Place,Café,Bagel Shop,Museum,Bus Station,Supermarket,Bookstore,Coffee Shop,Bakery,Salon / Barbershop
11,Kensington,Thai Restaurant,Grocery Store,Pizza Place,Sandwich Place,Ice Cream Shop,Playground,Spa,Japanese Restaurant,Gas Station,Mobile Phone Shop
12,Windsor Terrace,Diner,Plaza,Grocery Store,Park,Café,Beer Store,Bakery,Chinese Restaurant,Sushi Restaurant,Coffee Shop
13,Prospect Heights,Bar,Mexican Restaurant,Thai Restaurant,Cocktail Bar,Café,Gourmet Shop,Wine Shop,Bakery,Pizza Place,Ice Cream Shop
15,Williamsburg,Coffee Shop,Bar,Bagel Shop,Yoga Studio,Burger Joint,Korean Restaurant,Latin American Restaurant,Steakhouse,Liquor Store,Lounge
16,Bushwick,Bar,Mexican Restaurant,Coffee Shop,Deli / Bodega,Pizza Place,Thrift / Vintage Store,Discount Store,Bakery,Sandwich Place,Video Game Store
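
<p>The filter above hard-codes cluster label 0 and keeps the Bellaire row in the result. A minimal sketch of the same lookup that derives the label from the target neighborhood and excludes the target itself is shown below. The function name <code>neighborhoods_like</code> is our own, and the names and labels are copied from the first rows of <b>brooklyn_merged</b> plus the Bellaire row.</p>

```python
def neighborhoods_like(target, names, labels):
    """Other neighborhoods that share the target's cluster label."""
    label_of = dict(zip(names, labels))
    target_label = label_of[target]
    return [n for n in names if n != target and label_of[n] == target_label]


# Names and cluster labels taken from the tables above.
names = ['Bay Ridge', 'Bensonhurst', 'Sunset Park',
         'Greenpoint', 'Gravesend', 'Bellaire']
labels = [0, 0, 7, 0, 0, 0]

similar = neighborhoods_like('Bellaire', names, labels)
print(similar)
```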


<h2>Results and Discussion</h2>

<p>On the folium map above, the neighborhoods marked in red are the ones with amenities similar to those of our current office location. The clustered dataset shows 33 neighborhoods in this cluster, a count that includes Bellaire itself, which leaves 32 candidate neighborhoods in Brooklyn. The map also shows that these neighborhoods are concentrated in the upper part of Brooklyn. This geographic grouping gives us some confidence in the methodology, so we can concentrate on that side when choosing the office location and set aside the few outliers seen in other parts of Brooklyn.</p>

<h2>Conclusion</h2>

<p>The purpose of our analysis was to help Pied Piper identify an appropriate location for its new office in Brooklyn. Because the company is relocating part of its staff from the current office in Bellaire, Queens, it wanted to minimise disruption to its employees' lives by ensuring that they can access similar social amenities around the new office.</p>
<p>From our analysis, we identified the upper side of Brooklyn, as seen on our Folium map, as an appropriate area for the new office, since it offers amenities similar to those around the old office. We can thus conclude that the analysis succeeded in pointing the company in the right direction for its office relocation.</p>