<h1>USE CASE: Finding a new HQ location for ACME & CO.</h1>
<h3>Challenge</h3>
<p>ACME & CO., a corporation with a fast-growing number of employees, needs to move to a newer and larger office building.<br>
The business is currently located in Chelsea, a neighborhood in the New York City borough of Manhattan.<br>
The company's Board of Directors has agreed to move to a larger and newer office building that is still in Manhattan.<br>
The ACME & CO. executive team asked the data science / business analysis department (DS&BA dept.) to identify Manhattan neighborhoods that are similar to Chelsea in terms of facilities, infrastructure, recreational areas and businesses.</p>
<h3>Solution</h3> 
<p>After a segmentation and clustering analysis of Manhattan's neighborhoods, based on the mix of Foursquare venue categories found within 500 m of each neighborhood's center, the company's DS&BA dept. identified that Chelsea belongs to neighborhood cluster number 7. This cluster also includes the following similar neighborhoods: Greenwich Village, Soho, West Village, Little Italy, and Chinatown.
The company's procurement department can now start engaging with commercial real estate agencies to find the best office buildings located in the identified neighborhoods.</p>

----

<h2>NOTEBOOK SECTIONS</h2>
<a href="#ENVIRONMENT SETUP PHASE">ENVIRONMENT SETUP PHASE</a><br>
<a href="#PROCESSING PHASE">PROCESSING PHASE</a><br>
<a href="#REPORT HEIGHBOURHOOD CLUSTER MAP">REPORT NEIGHBORHOOD CLUSTER MAP</a><br>
<a href="#REPORT HEIGHBOURHOOD CLUSTER LIST">REPORT NEIGHBORHOOD CLUSTER LIST</a>

---

<a name="ENVIRONMENT SETUP PHASE"><h2>ENVIRONMENT SETUP PHASE</h2></a>

In [2]:
# Library to handle data in a vectorized manner
import numpy as np 
# Library for data analysis
import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
# Library to handle JSON files
import json 
# IPython helpers to render dataframes and HTML headings in the notebook
from IPython.display import display, HTML
print('Libraries imported!')

Libraries imported!


In [3]:
# Install geopy (geocoding library)
!conda install -c conda-forge geopy --yes 
# This will help convert an address into latitude and longitude values
from geopy.geocoders import Nominatim 
print('geopy library installed and imported!')


In [9]:
# Library to handle HTTP requests
import requests 
# Transform JSON into a pandas dataframe
try:
    from pandas import json_normalize  # pandas >= 1.0
except ImportError:
    from pandas.io.json import json_normalize  # older pandas versions
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# Import k-means for the clustering stage
from sklearn.cluster import KMeans
print('Libraries imported!')

Libraries imported!
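`json_normalize` is what later turns Foursquare's nested `venue` objects into flat, dotted column names. A minimal sketch of that flattening and of the column clean-up done in the processing phase, assuming pandas 1.0 or later (where the function is exposed as `pd.json_normalize`) and using two hypothetical items shaped like the `items` list the notebook reads from the explore response:

```python
import pandas as pd

# Two hypothetical items shaped like response['groups'][0]['items']
items = [
    {'venue': {'name': 'Cafe A',
               'categories': [{'name': 'Café'}],
               'location': {'lat': 40.74, 'lng': -74.00}}},
    {'venue': {'name': 'Shop B',
               'categories': [],
               'location': {'lat': 40.75, 'lng': -73.99}}},
]

# Nested dicts become dotted column names such as 'venue.location.lat';
# lists (like 'venue.categories') are kept as-is in a single cell.
flat = pd.json_normalize(items)
flat = flat.loc[:, ['venue.name', 'venue.categories',
                    'venue.location.lat', 'venue.location.lng']]

# Keep only the first category name, or None when the list is empty
flat['venue.categories'] = flat['venue.categories'].apply(
    lambda cats: cats[0]['name'] if cats else None)

# Strip the dotted prefixes: 'venue.location.lat' -> 'lat'
flat.columns = [col.split('.')[-1] for col in flat.columns]
print(flat)
```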


In [5]:
# Install folium 0.5.0 (map rendering library)
!conda install -c conda-forge folium=0.5.0 --yes 
# Import map rendering library
import folium 
print('Folium library installed and imported!')


<a id='item1'></a>

In [6]:
# Download the New York City neighborhoods dataset (GeoJSON)
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!
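The downloaded file is read later as a GeoJSON-style `FeatureCollection`: each entry of `features` carries `properties` (borough and name) and a `geometry` whose `coordinates` list longitude first, then latitude. A small sketch of that extraction on a hypothetical two-feature sample (the names and coordinates are illustrative, not taken from the dataset):

```python
import json
import pandas as pd

# Hypothetical sample with the same shape as newyork_data.json
sample = json.loads('''
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature",
    "properties": {"borough": "Manhattan", "name": "Hood A"},
    "geometry": {"type": "Point", "coordinates": [-74.00, 40.74]}},
   {"type": "Feature",
    "properties": {"borough": "Bronx", "name": "Hood B"},
    "geometry": {"type": "Point", "coordinates": [-73.90, 40.85]}}
 ]}
''')

rows = []
for feature in sample['features']:
    lon, lat = feature['geometry']['coordinates'][:2]  # GeoJSON order: [lon, lat]
    rows.append({'Borough': feature['properties']['borough'],
                 'Neighborhood': feature['properties']['name'],
                 'Latitude': lat,
                 'Longitude': lon})

neighborhoods = pd.DataFrame(rows, columns=['Borough', 'Neighborhood', 'Latitude', 'Longitude'])
# Keep only one borough and renumber the rows from 0
manhattan = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
print(manhattan)
```

Swapping the two coordinates by mistake would place every marker in the wrong hemisphere, which is why the unpacking order is spelled out.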


In [7]:
# Define Foursquare credentials and API version.
# The credentials are read from environment variables (hypothetical names) instead of being hard-coded.
import os
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET') # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

# Key parameters
address = 'Manhattan, NY' # Address of the area to analyze
LIMIT = 100 # Maximum number of venues returned by the Foursquare API per request
radius = 500 # Search radius around each neighborhood, in meters
kclusters = 10 # Number of k-means clusters to group the neighborhoods into
def_map_zoom_size = 12 # Default map zoom level
print('Environment defined!')

Environment defined!
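`radius` and `LIMIT` end up as query-string parameters of the Foursquare `venues/explore` request built in the processing phase. A sketch of assembling a URL with the same parameters through the standard library's `urllib.parse.urlencode`, which also percent-encodes the values; `build_explore_url` is a hypothetical helper, the credentials and coordinates are placeholders, and nothing is sent over the network:

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Hypothetical helper: a venues/explore URL with the notebook's parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # "latitude,longitude"
        'radius': radius,               # search radius in meters
        'limit': limit,                 # maximum number of venues returned
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605',
                        40.74, -74.0, radius=500, limit=100)
print(url)
```

Letting `urlencode` (or the `params=` argument of `requests.get`) build the query string avoids the manual `?&` concatenation and handles any characters that need escaping.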


<a name="PROCESSING PHASE"><h2>PROCESSING PHASE</h2></a>

In [11]:
# Load the data.
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
    
# Keep the list of neighborhood features (one GeoJSON feature per neighborhood).
neighborhoods_data = newyork_data['features']
# Transform the data into a pandas dataframe: define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# Loop through the features and collect one row per neighborhood.
# GeoJSON stores coordinates as [longitude, latitude].
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']
    neighborhood_lon, neighborhood_lat = data['geometry']['coordinates'][:2]
    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# Build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
neighborhoods = pd.DataFrame(rows, columns=column_names)

# Slice the original dataframe and create a new dataframe of the Manhattan data.
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)

# Get the geographical coordinates of Manhattan.
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# Explore a single neighborhood first: take the first row of the Manhattan data.
neighborhood_latitude = manhattan_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = manhattan_data.loc[0, 'Longitude'] # neighborhood longitude value
neighborhood_name = manhattan_data.loc[0, 'Neighborhood'] # neighborhood name

# Create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

# Send the GET request
results = requests.get(url).json()

# Function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
    
# Clean the json and structure it into a pandas dataframe.
venues = results['response']['groups'][0]['items']
# Flatten JSON
nearby_venues = json_normalize(venues) 
# Filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]
# Filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
# Clean column names
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]  


# Create a function to repeat the same process for all the neighborhoods in Manhattan
def getNearbyVenues(names, latitudes, longitudes, radius=500):

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):

        # Create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)

        # Make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']

        # Keep only the relevant information for each nearby venue.
        # Venues without any category get None instead of raising an IndexError.
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'], 
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None)
            for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                             'Neighborhood Latitude', 
                             'Neighborhood Longitude', 
                             'Venue', 
                             'Venue Latitude', 
                             'Venue Longitude', 
                             'Venue Category']

    return nearby_venues

# Run getNearbyVenues for every Manhattan neighborhood, using the radius defined in the environment setup
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude'],
                                   radius=radius)

# One hot encoding (generate dummies)
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")
# Add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 
# Move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]
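# Average the indicator columns per neighborhood: each value is the frequency of that venue category
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()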

# Number of most common venue categories to report for each neighborhood
num_top_venues = 5

# Return the names of the num_top_venues most frequent venue categories in a row (sorted in descending order).
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


# Create the new dataframe listing the top num_top_venues venues for each neighborhood.
indicators = ['st', 'nd', 'rd']

# Create one column per top venue, with ordinal suffixes (1st, 2nd, 3rd, 4th, ...)
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    suffix = indicators[ind] if ind < len(indicators) else 'th'
    columns.append('{}{} Most Common Venue'.format(ind + 1, suffix))

# Create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

# Prepare the clustering input: drop the non-numeric Neighborhood column
manhattan_grouped_clustering = manhattan_grouped.drop(columns='Neighborhood')
# Run k-means to group the neighborhoods into kclusters clusters
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

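# Create a new dataframe that includes the cluster label as well as the top venues for each neighborhood.
# Add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
# Start from a copy of manhattan_data so the original dataframe is left untouched
manhattan_merged = manhattan_data.copy()
# Add cluster labels and top venues to each neighborhood's coordinates (left join on the name)
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
# Neighborhoods that returned no venues have no cluster label: drop them and keep labels as integers
manhattan_merged = manhattan_merged.dropna(subset=['Cluster Labels']).astype({'Cluster Labels': int})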


# Create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=def_map_zoom_size)
# Set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add one marker per neighborhood, colored by its cluster label
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'],
                                  manhattan_merged['Longitude'],
                                  manhattan_merged['Neighborhood'],
                                  manhattan_merged['Cluster Labels']):
    # Clusters are numbered from 1 in the popup to match the cluster list report
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster + 1), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

# VISUALISE MAP RESULTING CLUSTERS
display(HTML('<a name="REPORT HEIGHBOURHOOD CLUSTER MAP"><h2>REPORT NEIGHBORHOOD CLUSTER MAP</h2></a>'))
map_clusters    

<a id='item2'></a>
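The one-hot encoding followed by `groupby(...).mean()` turns the venue list into one row per neighborhood whose values are the share of each venue category among that neighborhood's venues. A toy sketch with hypothetical venues (not Foursquare data); it uses `insert` to put the `Neighborhood` column first, which is equivalent to the column reordering done in the notebook:

```python
import pandas as pd

# Hypothetical venues: three in "Hood A", two in "Hood B"
venues = pd.DataFrame({
    'Neighborhood': ['Hood A', 'Hood A', 'Hood A', 'Hood B', 'Hood B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Park', 'Bar'],
})

# One column per category, 1.0 where the venue belongs to it, else 0.0
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of the indicator columns is each category's frequency per neighborhood
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

For "Hood A" this gives Café 2/3 and Park 1/3, so every row sums to 1 regardless of how many venues a neighborhood has; that normalization keeps busy and quiet neighborhoods comparable.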

<a id='item3'></a>
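`return_most_common_venues` sorts one frequency row in descending order and keeps the first `num_top_venues` category names, and the column headers get ordinal suffixes. The same two ideas on a single hypothetical row without ties; `ordinal` and `top_categories` are hypothetical helper names, not part of the notebook:

```python
import pandas as pd

def ordinal(n):
    """Hypothetical helper: 1 -> '1st', 2 -> '2nd', 3 -> '3rd', 4 -> '4th' (correct for 1 to 20)."""
    return '{}{}'.format(n, {1: 'st', 2: 'nd', 3: 'rd'}.get(n, 'th'))

def top_categories(freq_row, n):
    """Hypothetical helper: the n category names with the highest frequency in freq_row."""
    return list(freq_row.sort_values(ascending=False).index[:n])

# Hypothetical category frequencies for one neighborhood (no ties)
row = pd.Series({'Bar': 0.10, 'Café': 0.40, 'Park': 0.25, 'Gym': 0.05, 'Bakery': 0.20})

num_top_venues = 3
columns = ['{} Most Common Venue'.format(ordinal(i + 1)) for i in range(num_top_venues)]
print(dict(zip(columns, top_categories(row, num_top_venues))))
```

When two categories share the same frequency, the order between them depends on the sort, so ranks among tied categories should not be over-interpreted.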

<a id='item4'></a>
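`KMeans(n_clusters=kclusters, random_state=0)` groups neighborhoods whose category-frequency vectors are close in Euclidean distance. To show what that means, here is a bare-bones NumPy version of Lloyd's algorithm (the classic k-means iteration) on four hypothetical frequency vectors. It is an illustration only, not scikit-learn's implementation, which among other things uses k-means++ initialization by default; `lloyd_kmeans` is a hypothetical name:

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100):
    """Hypothetical minimal k-means: returns (labels, centroids), initialized with the first k rows."""
    X = np.asarray(X, dtype=float)
    centroids = X[:k].copy()  # naive deterministic initialization
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean distance)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical frequency vectors over the categories [Café, Park, Bar]
X = np.array([
    [0.70, 0.20, 0.10],  # café-heavy
    [0.10, 0.80, 0.10],  # park-heavy
    [0.65, 0.25, 0.10],  # café-heavy
    [0.15, 0.75, 0.10],  # park-heavy
])
labels, _ = lloyd_kmeans(X, k=2)
print(labels)
```

The label numbers themselves are arbitrary: only "same label" versus "different label" carries meaning, which is why the notebook's cluster numbers have no ranking.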

<a id='item5'></a>
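The cluster labels are attached to the neighborhood coordinates with `DataFrame.join(..., on='Neighborhood')`, which is a left join: every coordinate row is kept, and a neighborhood that returned no venues (and was therefore never clustered) gets `NaN` labels, which also turns the label column into floats. A sketch on hypothetical data showing the join and one way to drop such rows before using the labels as list indices:

```python
import numpy as np
import pandas as pd

# Hypothetical coordinates for three neighborhoods
coords = pd.DataFrame({'Neighborhood': ['Hood A', 'Hood B', 'Hood C'],
                       'Latitude': [40.74, 40.75, 40.76],
                       'Longitude': [-74.00, -73.99, -73.98]})

# Hypothetical clustering output: "Hood C" had no venues, so it was never clustered
clusters = pd.DataFrame({'Neighborhood': ['Hood A', 'Hood B'],
                         'Cluster Labels': np.array([0, 1])})

merged = coords.join(clusters.set_index('Neighborhood'), on='Neighborhood')
print(merged['Cluster Labels'].dtype)  # float64, because of the NaN for "Hood C"

# Drop unclustered neighborhoods and restore integer labels
merged = merged.dropna(subset=['Cluster Labels']).astype({'Cluster Labels': int})
print(merged)
```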

<a name="REPORT HEIGHBOURHOOD CLUSTER LIST"><h2>REPORT NEIGHBORHOOD CLUSTER LIST</h2></a>

In [12]:
# Display, for each cluster, its neighborhoods and their most common venue categories
venue_columns = [col for col in manhattan_merged.columns if col.endswith('Most Common Venue')]
for n in range(kclusters):
    manhattan_merged_view = manhattan_merged.loc[manhattan_merged['Cluster Labels'] == n,
                                                 ['Neighborhood'] + venue_columns]
    display(HTML('<h3>NEIGHBORHOOD CLUSTER ' + str(n + 1) + '</h3>'))
    display(manhattan_merged_view)

<h3>NEIGHBORHOOD CLUSTER 1</h3>

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 2 | Washington Heights | Café | Mobile Phone Shop | Bakery | Spanish Restaurant | Deli / Bodega |
| 3 | Inwood | Café | Mexican Restaurant | Lounge | Pizza Place | Bakery |
| 4 | Hamilton Heights | Deli / Bodega | Pizza Place | Mexican Restaurant | Café | Chinese Restaurant |
| 5 | Manhattanville | Deli / Bodega | Mexican Restaurant | Park | Coffee Shop | Seafood Restaurant |
| 7 | East Harlem | Mexican Restaurant | Bakery | Deli / Bodega | Thai Restaurant | Latin American Restaurant |
| 36 | Tudor City | Park | Mexican Restaurant | Greek Restaurant | Café | Asian Restaurant |


<h3>NEIGHBORHOOD CLUSTER 2</h3>

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 8 | Upper East Side | Exhibit | Italian Restaurant | Coffee Shop | Juice Bar | Bakery |
| 31 | Noho | Italian Restaurant | Cocktail Bar | French Restaurant | Mexican Restaurant | Pizza Place |
| 32 | Civic Center | Italian Restaurant | Gym / Fitness Center | Coffee Shop | Sandwich Place | French Restaurant |


<h3>NEIGHBORHOOD CLUSTER 3</h3>

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 11 | Roosevelt Island | Park | Sandwich Place | Deli / Bodega | Coffee Shop | School |
| 15 | Midtown | Hotel | Coffee Shop | Cocktail Bar | Clothing Store | Theater |
| 16 | Murray Hill | Hotel | Sandwich Place | Japanese Restaurant | Coffee Shop | Gym |
| 28 | Battery Park City | Park | Coffee Shop | Hotel | Memorial Site | Wine Shop |
| 29 | Financial District | Coffee Shop | Steakhouse | Wine Shop | Gym | Hotel |
| 35 | Turtle Bay | Italian Restaurant | Hotel | Steakhouse | Sushi Restaurant | Coffee Shop |


<h3>NEIGHBORHOOD CLUSTER 4</h3>

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 37 | Stuyvesant Town | Park | Bar | Playground | Farmers Market | Baseball Field |


<h3>NEIGHBORHOOD CLUSTER 5</h3>

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 6 | Central Harlem | African Restaurant | Public Art | French Restaurant | Chinese Restaurant | Seafood Restaurant |
| 9 | Yorkville | Italian Restaurant | Gym | Bar | Coffee Shop | Pizza Place |
| 10 | Lenox Hill | Coffee Shop | Sushi Restaurant | Italian Restaurant | Pizza Place | Sporting Goods Shop |
| 12 | Upper West Side | Italian Restaurant | Wine Bar | Bar | Coffee Shop | Mediterranean Restaurant |
| 19 | East Village | Bar | Wine Bar | Chinese Restaurant | Cocktail Bar | Ice Cream Shop |
| 20 | Lower East Side | Café | Coffee Shop | Ramen Restaurant | Pizza Place | Cocktail Bar |
| 25 | Manhattan Valley | Coffee Shop | Indian Restaurant | Pizza Place | Yoga Studio | Mexican Restaurant |
| 27 | Gramercy | Cocktail Bar | American Restaurant | Italian Restaurant | Bagel Shop | Bar |
| 30 | Carnegie Hill | Pizza Place | Coffee Shop | Café | Yoga Studio | Grocery Store |
| 34 | Sutton Place | Italian Restaurant | Gym / Fitness Center | Indian Restaurant | Furniture / Home Store | American Restaurant |


<h3>NEIGHBORHOOD CLUSTER 6</h3>

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 0 | Marble Hill | Discount Store | Sandwich Place | Coffee Shop | Yoga Studio | Pizza Place |


<h3>NEIGHBORHOOD CLUSTER 7</h3>

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 1 | Chinatown | Chinese Restaurant | American Restaurant | Cocktail Bar | Spa | Bubble Tea Shop |
| 17 | Chelsea | Coffee Shop | Italian Restaurant | Ice Cream Shop | American Restaurant | Bakery |
| 18 | Greenwich Village | Italian Restaurant | Clothing Store | Sushi Restaurant | French Restaurant | Café |
| 22 | Little Italy | Bakery | Seafood Restaurant | Café | Sandwich Place | Salon / Barbershop |
| 23 | Soho | Clothing Store | Boutique | Shoe Store | Women's Store | Italian Restaurant |
| 24 | West Village | Italian Restaurant | New American Restaurant | Cosmetics Shop | Park | Jazz Club |


<h3>NEIGHBORHOOD CLUSTER 8</h3>

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 26 | Morningside Heights | Coffee Shop | Park | Bookstore | American Restaurant | Tennis Court |


<h3>NEIGHBORHOOD CLUSTER 9</h3>

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 33 | Midtown South | Korean Restaurant | Hotel | Hotel Bar | Coffee Shop | Japanese Restaurant |


<h3>NEIGHBORHOOD CLUSTER 10</h3>

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 13 | Lincoln Square | Gym / Fitness Center | Theater | Café | Plaza | Italian Restaurant |
| 14 | Clinton | Theater | Gym / Fitness Center | Hotel | American Restaurant | Italian Restaurant |
| 21 | Tribeca | Spa | Café | Italian Restaurant | American Restaurant | Boutique |
| 39 | Hudson Yards | American Restaurant | Italian Restaurant | Café | Coffee Shop | Gym / Fitness Center |
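The cluster list can also be queried directly for the neighborhoods that share Chelsea's cluster. A sketch, assuming the `manhattan_merged` layout produced above (a `Neighborhood` column plus 0-based `Cluster Labels`); to stay runnable on its own it uses a small hand-built frame with a few neighborhoods and their cluster numbers taken from the report, converted back to 0-based labels. `cluster_peers` is a hypothetical helper name:

```python
import pandas as pd

def cluster_peers(merged, neighborhood):
    """Hypothetical helper: the other neighborhoods sharing `neighborhood`'s cluster label."""
    label = merged.loc[merged['Neighborhood'] == neighborhood, 'Cluster Labels'].iloc[0]
    same = merged[(merged['Cluster Labels'] == label) & (merged['Neighborhood'] != neighborhood)]
    return label, sorted(same['Neighborhood'])

# Subset of the report above; labels are 0-based (report cluster 7 -> label 6)
merged = pd.DataFrame({
    'Neighborhood': ['Chelsea', 'Soho', 'West Village', 'Midtown', 'Noho'],
    'Cluster Labels': [6, 6, 6, 2, 1],
})

label, peers = cluster_peers(merged, 'Chelsea')
print('Cluster', label + 1, peers)
```

Run against the full `manhattan_merged`, the same call should return the cluster 7 neighborhoods listed in the Solution.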


<hr>

Copyright &copy; 2019