# Introduction

New York has always been a vibrant city. It is often ranked among the best cities for college students who seek professional success, because it is full of job opportunities. 
In this capstone project, I therefore want to find the best places in New York for college students to live. 
Specifically, I will start with the boroughs (Manhattan and Brooklyn) near New York University, which has more than 50,000 students. 
The analysis starts from the assumption that college students like to live near sports facilities such as gyms and yoga studios.  

# Data to Use

I will use the "newyork_data.json" dataset from the practice lab in the earlier module, together with Foursquare data, to find clusters of sports facilities in New York. The clusters will serve as recommendations of places to live for young people moving to New York.

# Methodology

I will mainly use k-means clustering to group the neighborhoods near New York University into five clusters according to the sports facilities they offer. 
The results should help NYU students moving into town settle in a neighborhood that matches their preferences. 
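
To make the method concrete, here is a minimal, library-free sketch of the k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its members). It is not the scikit-learn implementation used later in this notebook: the seeding from the first `k` points is a simplification, and the `toy` data are made-up venue shares, not results.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical (gym share, yoga studio share) pairs for six made-up neighborhoods.
toy = [(0.05, 0.00), (0.00, 0.04), (0.06, 0.01),
       (0.01, 0.05), (0.05, 0.01), (0.00, 0.06)]
labels, centroids = kmeans(toy, k=2)
print(labels)
```

Neighborhoods with mostly gyms end up in one group and those with mostly yoga studios in the other, which is the kind of grouping the real analysis looks for with five clusters.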

In [140]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # install geopy (skip this line if it is already installed)
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # install folium (skip this line if it is already installed)
!pip install bs4
import folium # map rendering library
from bs4 import BeautifulSoup

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


In [141]:
# get newyork_data.json
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

# transform the data

neighborhoods_data = newyork_data['features']

# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# collect one row per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []

for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    neighborhood_latlon = data['geometry']['coordinates']  # [longitude, latitude]
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)
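
The extraction above can be checked in isolation with a small sketch. The single feature below is hand-written to mirror only the keys the loop reads; the Marble Hill coordinates are the ones that appear in the merged table later in this notebook, and `features_to_rows` is a hypothetical helper name.

```python
# One feature shaped like the entries of newyork_data['features'];
# only the keys read by the loop are included.
sample_features = [
    {
        "properties": {"borough": "Manhattan", "name": "Marble Hill"},
        "geometry": {"coordinates": [-73.91066, 40.876551]},  # [lon, lat]
    }
]

def features_to_rows(features):
    """Turn GeoJSON-style features into one dict per neighborhood."""
    rows = []
    for feature in features:
        lon, lat = feature["geometry"]["coordinates"][:2]
        rows.append({
            "Borough": feature["properties"]["borough"],
            "Neighborhood": feature["properties"]["name"],
            "Latitude": lat,
            "Longitude": lon,
        })
    return rows

rows = features_to_rows(sample_features)
print(rows)
```

The point to watch is the coordinate order: the file stores longitude first, so latitude is the second element.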

In [142]:
# Create the map of NY
address = '70 Washington Square South, New York, NY'

geolocator = Nominatim(user_agent="nyu_explorer")
location = geolocator.geocode(address)
nyu_latitude = location.latitude
nyu_longitude = location.longitude
print('The geographical coordinates of NYU are {}, {}.'.format(nyu_latitude, nyu_longitude))

# create map of New York, centered on NYU, using latitude and longitude values
map_nyu = folium.Map(location=[nyu_latitude, nyu_longitude], zoom_start=10)


# add markers to map
for lat, lng, label in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_nyu)  
    

map_nyu

The geographical coordinates of NYU are 40.72942865, -73.99721780456252.
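
Since the project is about living near NYU, it can help to know how far a neighborhood center is from the campus coordinates printed above. The sketch below uses the haversine great-circle formula; it is an illustration that the rest of the notebook does not rely on, and `haversine_km` is a hypothetical helper name.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

nyu = (40.72942865, -73.99721780456252)  # geocoder output printed above
marble_hill = (40.876551, -73.91066)     # coordinates from newyork_data.json
distance_km = haversine_km(*nyu, *marble_hill)
print(round(distance_km, 1))
```

A distance like this could later be used to rank neighborhoods within a cluster by how close they are to campus.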


In [143]:
# Settings
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
ACCESS_TOKEN = 'YOUR_ACCESS_TOKEN' # your Foursquare Access Token
VERSION = '20180604'
LIMIT = 100

# Get nearby venues: query Foursquare for up to LIMIT venues within `radius` meters of each neighborhood
def getNearbyVenues(names, latitudes, longitudes, radius=5000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
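
The two halves of `getNearbyVenues` (building the request and flattening the response) can be tried without network access. The sketch below is an alternative formulation, not the function above: it builds the query string with `urllib.parse.urlencode`, and it parses a hand-written payload that contains only the keys the function indexes. The helper names, the venue "Example Gym" and the sample coordinates are all hypothetical.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL from a parameter dict instead of a format string."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,  # meters
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

def items_to_rows(name, lat, lng, payload):
    """Flatten one response into the seven-field tuples that getNearbyVenues builds."""
    items = payload["response"]["groups"][0]["items"]
    return [
        (name, lat, lng,
         v["venue"]["name"],
         v["venue"]["location"]["lat"],
         v["venue"]["location"]["lng"],
         v["venue"]["categories"][0]["name"])
        for v in items
    ]

# Hypothetical payload with only the keys that the parsing code indexes.
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Example Gym",
                   "location": {"lat": 40.73, "lng": -73.99},
                   "categories": [{"name": "Gym"}]}}
    ]}]}
}

url = build_explore_url("ID", "SECRET", "20180604", 40.7294, -73.9972, 5000, 100)
rows = items_to_rows("Sample Neighborhood", 40.7294, -73.9972, sample_payload)
print(url)
print(rows)
```

`urlencode` percent-encodes the comma in `ll`, which is still a valid query string. Keeping the parsing in its own function also makes it easier to handle a response that lacks the expected keys.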

In [144]:
# As most NYU buildings are located in Manhattan and Brooklyn, we'll start with those two boroughs
nyu_data1 = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
nyu_data2 = neighborhoods[neighborhoods['Borough'] == 'Brooklyn'].reset_index(drop=True)
# stack the two boroughs into one dataframe (Manhattan rows first, then Brooklyn)
nyu_data = pd.concat([nyu_data1, nyu_data2], ignore_index=True)

nyu_venues = getNearbyVenues(names=nyu_data['Neighborhood'],latitudes=nyu_data['Latitude'],
                                  longitudes=nyu_data['Longitude'])

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker Heights
Gerritsen Beach
Marine

In [145]:
nyu_venues.groupby('Neighborhood').count()
print('There are {} unique categories.'.format(len(nyu_venues['Venue Category'].unique())))
# one hot encoding
nyu_onehot = pd.get_dummies(nyu_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
nyu_onehot['Neighborhood'] = nyu_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [nyu_onehot.columns[-1]] + list(nyu_onehot.columns[:-1])
nyu_onehot = nyu_onehot[fixed_columns]

# the mean of each one-hot column is the share of a neighborhood's venues in that category
nyu_grouped = nyu_onehot.groupby('Neighborhood').mean().reset_index()
nyu_grouped.head()

There are 259 unique categories.


Unnamed: 0,Neighborhood,Yoga Studio,Adult Boutique,African Restaurant,American Restaurant,Animal Shelter,Aquarium,Arepa Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Beach,Beer Bar,Beer Garden,Beer Store,Bike Shop,Bike Trail,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Building,Burger Joint,Burmese Restaurant,Burrito Place,Bus Station,Butcher,Café,Cajun / Creole Restaurant,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Castle,Cemetery,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Arts Building,Comedy Club,Comic Shop,Community Center,Concert Hall,Convenience Store,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Dog Run,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Field,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Health & Beauty Service,Health Food Store,Herbs & Spices Store,High School,Himalayan Restaurant,Historic Site,History Museum,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indoor Play Area,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kofte Place,Korean Restaurant,Latin American Restaurant,Lebanese 
Restaurant,Library,Lighthouse,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Market,Martial Arts School,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music Store,Music Venue,Nail Salon,National Park,New American Restaurant,Nightclub,Non-Profit,Opera House,Optical Shop,Other Great Outdoors,Other Nightlife,Other Repair Shop,Outdoors & Recreation,Paper / Office Supplies Store,Park,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Picnic Shelter,Pie Shop,Pier,Pilates Studio,Pizza Place,Planetarium,Playground,Plaza,Pool,Pub,Ramen Restaurant,Record Shop,Recreation Center,Reservoir,Restaurant,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Repair,Shopping Mall,Shopping Plaza,Skating Rink,Smoke Shop,Snack Place,Soba Restaurant,Soccer Field,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Souvenir Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Stables,State / Provincial Park,Stationery Store,Steakhouse,Street Art,Supermarket,Surf Spot,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Tiki Bar,Tourist Information Center,Toy / Game Store,Track,Trail,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Yemeni Restaurant
0,Bath Beach,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.07,0.0,0.02,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.03,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.03,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0
1,Battery Park City,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.01,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.02,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.03,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0
2,Bay Ridge,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.04,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.03,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.11,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01
3,Bedford Stuyvesant,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.08,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.07,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.09,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.03,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0
4,Bensonhurst,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.06,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.16,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.02,0.0,0.0
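
The numbers in this table are category shares: averaging a one-hot column over a neighborhood's venues gives the fraction of its venues in that category. The library-free sketch below reproduces that calculation on a few made-up (neighborhood, category) pairs; neighborhoods "A" and "B" are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical venue list: (neighborhood, category) pairs.
venues = [
    ("A", "Gym"), ("A", "Gym"), ("A", "Bakery"), ("A", "Yoga Studio"),
    ("B", "Bakery"), ("B", "Gym"),
]

# Count categories per neighborhood (the sum of each one-hot column).
counts = defaultdict(Counter)
for hood, category in venues:
    counts[hood][category] += 1

# Divide by the number of venues in the neighborhood (the mean of each one-hot column).
frequencies = {
    hood: {category: n / sum(c.values()) for category, n in c.items()}
    for hood, c in counts.items()
}
print(frequencies)
```

Because each neighborhood returns at most 100 venues here, a value of 0.01 in the table typically corresponds to a single venue.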


In [146]:
sports_facilities = ["Neighborhood","Yoga Studio","Athletics & Sports","Baseball Field","Baseball Stadium","Basketball Court",
            "Bike Trail","Boxing Gym","Climbing Gym","Cycle Studio","Field","Golf Course","Gym","Gym / Fitness Center",
           "Pilates Studio","Pool","Soccer Field","Surf Spot","Tennis Court","Weight Loss Center"]
nyu_sports = nyu_grouped[sports_facilities]
nyu_sports

Unnamed: 0,Neighborhood,Yoga Studio,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court,Bike Trail,Boxing Gym,Climbing Gym,Cycle Studio,Field,Golf Course,Gym,Gym / Fitness Center,Pilates Studio,Pool,Soccer Field,Surf Spot,Tennis Court,Weight Loss Center
0,Bath Beach,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0
1,Battery Park City,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0
2,Bay Ridge,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0
3,Bedford Stuyvesant,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0
4,Bensonhurst,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0
5,Bergen Beach,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0
6,Boerum Hill,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.0
7,Borough Park,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0
8,Brighton Beach,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0
9,Broadway Junction,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [147]:
num_top_venues = 5

for hood in nyu_sports['Neighborhood']:
    print("----"+hood+"----")
    temp = nyu_sports[nyu_sports['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bath Beach----
                  venue  freq
0             Surf Spot  0.01
1  Gym / Fitness Center  0.01
2                   Gym  0.01
3           Yoga Studio  0.00
4           Golf Course  0.00


----Battery Park City----
                  venue  freq
0          Cycle Studio  0.02
1                   Gym  0.01
2          Soccer Field  0.01
3            Boxing Gym  0.01
4  Gym / Fitness Center  0.01


----Bay Ridge----
                  venue  freq
0  Gym / Fitness Center  0.02
1           Yoga Studio  0.01
2           Golf Course  0.00
3          Tennis Court  0.00
4             Surf Spot  0.00


----Bedford Stuyvesant----
              venue  freq
0       Yoga Studio  0.01
1    Pilates Studio  0.01
2             Field  0.01
3  Baseball Stadium  0.00
4  Basketball Court  0.00


----Bensonhurst----
                  venue  freq
0  Gym / Fitness Center  0.02
1             Surf Spot  0.01
2           Yoga Studio  0.00
3           Golf Course  0.00
4          Tennis Court  0.00


----
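
Note that the printed rankings include categories with a frequency of 0.00, and that ties are ordered arbitrarily. A possible refinement, sketched below with a subset of the Bath Beach values from the table above, keeps only categories that are actually present and breaks ties by name. `top_present` is a hypothetical helper, not part of the notebook.

```python
# A subset of the Bath Beach row from the sports-facility table above.
bath_beach = {
    "Yoga Studio": 0.0,
    "Golf Course": 0.0,
    "Gym": 0.01,
    "Gym / Fitness Center": 0.01,
    "Surf Spot": 0.01,
}

def top_present(freqs, n):
    """Return up to n (category, frequency) pairs, skipping absent categories."""
    present = [(category, f) for category, f in freqs.items() if f > 0]
    # Sort by frequency (descending), then by name so ties are deterministic.
    present.sort(key=lambda item: (-item[1], item[0]))
    return present[:n]

ranking = top_present(bath_beach, 5)
print(ranking)
```

With this filter a neighborhood that has only three kinds of sports venue gets a three-item ranking rather than a list padded with absent categories.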

In [148]:
# Create the dataframe displaying the top 10 sports facilities for each neighborhood
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
sports_facilities_sorted = pd.DataFrame(columns=columns)
sports_facilities_sorted['Neighborhood'] = nyu_sports['Neighborhood']

for ind in np.arange(nyu_sports.shape[0]):
    sports_facilities_sorted.iloc[ind, 1:] = return_most_common_venues(nyu_sports.iloc[ind, :], num_top_venues)

sports_facilities_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bath Beach,Surf Spot,Gym / Fitness Center,Gym,Weight Loss Center,Climbing Gym,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court,Bike Trail
1,Battery Park City,Cycle Studio,Boxing Gym,Soccer Field,Gym / Fitness Center,Gym,Weight Loss Center,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court
2,Bay Ridge,Gym / Fitness Center,Yoga Studio,Cycle Studio,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court,Bike Trail,Boxing Gym,Climbing Gym
3,Bedford Stuyvesant,Field,Pilates Studio,Yoga Studio,Soccer Field,Pool,Surf Spot,Gym / Fitness Center,Gym,Golf Course,Tennis Court
4,Bensonhurst,Gym / Fitness Center,Surf Spot,Weight Loss Center,Climbing Gym,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court,Bike Trail,Boxing Gym
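
The column labels above rely on a three-item suffix list with a fallback to "th", which is enough for ten columns but would produce "21th" for a longer ranking. A general ordinal helper could look like the sketch below; `ordinal` is a hypothetical name and is not used elsewhere in the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns)
```

For the ten columns used here, the result matches the headers in the table above.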


In [149]:
# K means clustering

# more libraries to import
import matplotlib.pyplot as plt 
%matplotlib inline

# set number of clusters
kclusters = 5

# keep only the numeric frequency columns for clustering
nyu_sports_clustering = nyu_sports.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(nyu_sports_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 0, 0, 0, 0, 4, 3, 0, 4, 4], dtype=int32)
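
A quick sanity check after clustering is to count how many neighborhoods fall into each cluster, since very small clusters are often outliers. The sketch below does this for the ten labels printed above only; the full `kmeans.labels_` array would be counted the same way.

```python
from collections import Counter

# The first ten labels, copied from the output above.
first_ten_labels = [4, 0, 0, 0, 0, 4, 3, 0, 4, 4]

sizes = Counter(first_ten_labels)
for label in sorted(sizes):
    print("cluster label {}: {} neighborhoods".format(label, sizes[label]))
```

Labels that never occur in the sample (1 and 2 here) simply do not appear in the counter.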

In [150]:
# Create the new dataframe including the cluster and the top 10 sports facilities for each neighborhood
# add clustering labels
sports_facilities_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

nyu_merged = nyu_data

# join sports_facilities_sorted onto nyu_data so each neighborhood keeps its latitude/longitude
nyu_merged = nyu_merged.join(sports_facilities_sorted.set_index('Neighborhood'), on='Neighborhood')

nyu_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,0,Yoga Studio,Athletics & Sports,Basketball Court,Cycle Studio,Baseball Field,Baseball Stadium,Bike Trail,Boxing Gym,Climbing Gym,Weight Loss Center
1,Manhattan,Chinatown,40.715618,-73.994279,0,Soccer Field,Gym / Fitness Center,Weight Loss Center,Climbing Gym,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court,Bike Trail,Boxing Gym
2,Manhattan,Washington Heights,40.851903,-73.9369,0,Baseball Stadium,Yoga Studio,Athletics & Sports,Soccer Field,Baseball Field,Basketball Court,Cycle Studio,Bike Trail,Boxing Gym,Climbing Gym
3,Manhattan,Inwood,40.867684,-73.92121,0,Yoga Studio,Athletics & Sports,Pool,Basketball Court,Cycle Studio,Baseball Field,Baseball Stadium,Bike Trail,Boxing Gym,Climbing Gym
4,Manhattan,Hamilton Heights,40.823604,-73.949688,2,Yoga Studio,Baseball Stadium,Athletics & Sports,Soccer Field,Baseball Field,Basketball Court,Gym,Bike Trail,Boxing Gym,Cycle Studio


In [151]:
# Visualize the resulting clusters
# create map
map_clusters = folium.Map(location=[nyu_latitude, nyu_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(nyu_merged['Latitude'], nyu_merged['Longitude'], nyu_merged['Neighborhood'], nyu_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
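
The map needs one distinct color per cluster. As an alternative to the matplotlib colormap used above, the sketch below builds a palette of evenly spaced hues with the standard-library `colorsys` module and indexes it directly by cluster label. This is a swapped-in technique for illustration; `cluster_palette` is a hypothetical helper.

```python
import colorsys

def cluster_palette(k):
    """Return k hex colors with evenly spaced hues."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(5)
# Look up the color for a cluster label directly by its index.
color_for_label_0 = palette[0]
print(palette)
```

Indexing by the label itself keeps the legend simple: label 0 always maps to the first color in the list.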

# Examine Clusters

In [152]:
# Cluster 1
cluster1 = nyu_merged.loc[nyu_merged['Cluster Labels'] == 0, nyu_merged.columns[[1] + list(range(5, nyu_merged.shape[1]))]]
cluster1

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,Yoga Studio,Athletics & Sports,Basketball Court,Cycle Studio,Baseball Field,Baseball Stadium,Bike Trail,Boxing Gym,Climbing Gym,Weight Loss Center
1,Chinatown,Soccer Field,Gym / Fitness Center,Weight Loss Center,Climbing Gym,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court,Bike Trail,Boxing Gym
2,Washington Heights,Baseball Stadium,Yoga Studio,Athletics & Sports,Soccer Field,Baseball Field,Basketball Court,Cycle Studio,Bike Trail,Boxing Gym,Climbing Gym
3,Inwood,Yoga Studio,Athletics & Sports,Pool,Basketball Court,Cycle Studio,Baseball Field,Baseball Stadium,Bike Trail,Boxing Gym,Climbing Gym
18,Greenwich Village,Yoga Studio,Gym / Fitness Center,Gym,Boxing Gym,Climbing Gym,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court,Bike Trail
20,Lower East Side,Yoga Studio,Gym / Fitness Center,Cycle Studio,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court,Bike Trail,Boxing Gym,Climbing Gym
21,Tribeca,Yoga Studio,Gym / Fitness Center,Cycle Studio,Boxing Gym,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court,Bike Trail,Climbing Gym
22,Little Italy,Yoga Studio,Gym / Fitness Center,Cycle Studio,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court,Bike Trail,Boxing Gym,Climbing Gym
23,Soho,Yoga Studio,Gym / Fitness Center,Gym,Climbing Gym,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court,Bike Trail,Boxing Gym
28,Battery Park City,Cycle Studio,Boxing Gym,Soccer Field,Gym / Fitness Center,Gym,Weight Loss Center,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court


In [153]:
# Cluster 2
cluster2 = nyu_merged.loc[nyu_merged['Cluster Labels'] == 1, nyu_merged.columns[[1] + list(range(5, nyu_merged.shape[1]))]]
cluster2

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Lenox Hill,Gym,Field,Gym / Fitness Center,Boxing Gym,Climbing Gym,Cycle Studio,Yoga Studio,Pilates Studio,Golf Course,Tennis Court
11,Roosevelt Island,Gym,Field,Gym / Fitness Center,Boxing Gym,Climbing Gym,Cycle Studio,Yoga Studio,Pilates Studio,Golf Course,Tennis Court
13,Lincoln Square,Gym,Yoga Studio,Gym / Fitness Center,Boxing Gym,Cycle Studio,Field,Pilates Studio,Pool,Golf Course,Tennis Court
14,Clinton,Gym,Yoga Studio,Gym / Fitness Center,Climbing Gym,Boxing Gym,Cycle Studio,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court
15,Midtown,Gym,Yoga Studio,Gym / Fitness Center,Climbing Gym,Boxing Gym,Cycle Studio,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court
16,Murray Hill,Gym,Gym / Fitness Center,Yoga Studio,Climbing Gym,Boxing Gym,Cycle Studio,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court
33,Midtown South,Gym,Yoga Studio,Gym / Fitness Center,Boxing Gym,Climbing Gym,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court,Bike Trail
34,Sutton Place,Gym,Gym / Fitness Center,Yoga Studio,Climbing Gym,Boxing Gym,Cycle Studio,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court
35,Turtle Bay,Gym,Gym / Fitness Center,Yoga Studio,Climbing Gym,Boxing Gym,Cycle Studio,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court
36,Tudor City,Gym,Gym / Fitness Center,Yoga Studio,Climbing Gym,Boxing Gym,Cycle Studio,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court


In [154]:
# Cluster 3
cluster3 = nyu_merged.loc[nyu_merged['Cluster Labels'] == 2, nyu_merged.columns[[1] + list(range(5, nyu_merged.shape[1]))]]
cluster3

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Hamilton Heights,Yoga Studio,Baseball Stadium,Athletics & Sports,Soccer Field,Baseball Field,Basketball Court,Gym,Bike Trail,Boxing Gym,Cycle Studio
5,Manhattanville,Yoga Studio,Baseball Stadium,Cycle Studio,Athletics & Sports,Baseball Field,Bike Trail,Boxing Gym,Field,Gym,Soccer Field
6,Central Harlem,Yoga Studio,Baseball Stadium,Gym,Cycle Studio,Athletics & Sports,Baseball Field,Basketball Court,Bike Trail,Boxing Gym,Field
7,East Harlem,Yoga Studio,Gym,Boxing Gym,Field,Pool,Pilates Studio,Gym / Fitness Center,Soccer Field,Golf Course,Tennis Court
8,Upper East Side,Gym,Yoga Studio,Gym / Fitness Center,Cycle Studio,Field,Pilates Studio,Pool,Soccer Field,Golf Course,Tennis Court
9,Yorkville,Gym,Yoga Studio,Gym / Fitness Center,Cycle Studio,Field,Pilates Studio,Pool,Soccer Field,Golf Course,Tennis Court
12,Upper West Side,Yoga Studio,Gym,Field,Soccer Field,Pool,Pilates Studio,Gym / Fitness Center,Surf Spot,Golf Course,Tennis Court
17,Chelsea,Gym,Yoga Studio,Gym / Fitness Center,Boxing Gym,Cycle Studio,Climbing Gym,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court
19,East Village,Yoga Studio,Gym / Fitness Center,Gym,Climbing Gym,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court,Bike Trail,Boxing Gym
24,West Village,Gym / Fitness Center,Yoga Studio,Gym,Boxing Gym,Cycle Studio,Climbing Gym,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court


In [155]:
# Cluster 4
cluster4 = nyu_merged.loc[nyu_merged['Cluster Labels'] == 3, nyu_merged.columns[[1] + list(range(5, nyu_merged.shape[1]))]]
cluster4

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
48,Flatbush,Field,Baseball Field,Gym / Fitness Center,Gym,Tennis Court,Climbing Gym,Athletics & Sports,Baseball Stadium,Basketball Court,Bike Trail
49,Crown Heights,Field,Gym,Climbing Gym,Yoga Studio,Pool,Pilates Studio,Gym / Fitness Center,Soccer Field,Golf Course,Tennis Court
50,East Flatbush,Field,Yoga Studio,Baseball Field,Gym / Fitness Center,Gym,Tennis Court,Surf Spot,Boxing Gym,Athletics & Sports,Baseball Stadium
51,Kensington,Field,Baseball Field,Gym,Tennis Court,Climbing Gym,Athletics & Sports,Baseball Stadium,Basketball Court,Bike Trail,Boxing Gym
52,Windsor Terrace,Field,Gym,Climbing Gym,Tennis Court,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court,Bike Trail,Boxing Gym
53,Prospect Heights,Field,Gym / Fitness Center,Athletics & Sports,Climbing Gym,Gym,Yoga Studio,Pilates Studio,Pool,Golf Course,Tennis Court
59,Cobble Hill,Climbing Gym,Gym,Field,Gym / Fitness Center,Athletics & Sports,Yoga Studio,Pilates Studio,Pool,Golf Course,Tennis Court
60,Carroll Gardens,Climbing Gym,Gym,Field,Athletics & Sports,Gym / Fitness Center,Baseball Field,Baseball Stadium,Basketball Court,Bike Trail,Boxing Gym
62,Gowanus,Field,Yoga Studio,Athletics & Sports,Gym / Fitness Center,Gym,Climbing Gym,Surf Spot,Boxing Gym,Baseball Field,Baseball Stadium
63,Fort Greene,Field,Climbing Gym,Gym,Gym / Fitness Center,Athletics & Sports,Boxing Gym,Yoga Studio,Pilates Studio,Golf Course,Tennis Court


In [156]:
# Cluster 5 (k-means label 4): neighborhood name plus the ranked venue columns
cluster5 = nyu_merged.loc[nyu_merged['Cluster Labels'] == 4, nyu_merged.columns[[1] + list(range(5, nyu_merged.shape[1]))]]
cluster5

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
42,Sunset Park,Gym,Field,Baseball Field,Gym / Fitness Center,Tennis Court,Climbing Gym,Athletics & Sports,Baseball Stadium,Basketball Court,Bike Trail
44,Gravesend,Surf Spot,Gym,Weight Loss Center,Climbing Gym,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court,Bike Trail,Boxing Gym
45,Brighton Beach,Gym,Surf Spot,Athletics & Sports,Baseball Stadium,Weight Loss Center,Climbing Gym,Baseball Field,Basketball Court,Bike Trail,Boxing Gym
46,Sheepshead Bay,Gym,Weight Loss Center,Cycle Studio,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court,Bike Trail,Boxing Gym,Climbing Gym
65,Cypress Hills,Gym,Gym / Fitness Center,Weight Loss Center,Baseball Field,Climbing Gym,Athletics & Sports,Baseball Stadium,Basketball Court,Bike Trail,Boxing Gym
66,East New York,Gym,Baseball Field,Gym / Fitness Center,Weight Loss Center,Climbing Gym,Athletics & Sports,Baseball Stadium,Basketball Court,Bike Trail,Boxing Gym
67,Starrett City,Gym,Gym / Fitness Center,Weight Loss Center,Climbing Gym,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court,Bike Trail,Boxing Gym
68,Canarsie,Gym / Fitness Center,Baseball Field,Gym,Weight Loss Center,Climbing Gym,Athletics & Sports,Baseball Stadium,Basketball Court,Bike Trail,Boxing Gym
69,Flatlands,Gym / Fitness Center,Gym,Weight Loss Center,Climbing Gym,Athletics & Sports,Baseball Field,Baseball Stadium,Basketball Court,Bike Trail,Boxing Gym
71,Manhattan Beach,Gym,Baseball Stadium,Weight Loss Center,Cycle Studio,Athletics & Sports,Baseball Field,Basketball Court,Bike Trail,Boxing Gym,Climbing Gym
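
The cluster cells above pick columns by position (`[1] + list(range(5, ...))`), which works only while the column order of `nyu_merged` stays fixed. The sketch below reproduces that selection on a small stand-in frame and compares it with a name-based alternative. The frame `toy`, its column order, and its values are assumptions made for illustration; they are not the notebook's actual data.

```python
import pandas as pd

# Toy stand-in for nyu_merged. Column order and values are illustrative assumptions.
toy = pd.DataFrame({
    "Borough": ["Manhattan", "Brooklyn"],
    "Neighborhood": ["Chelsea", "Flatbush"],
    "Latitude": [40.74, 40.64],
    "Longitude": [-74.00, -73.96],
    "Cluster Labels": [2, 3],
    "1st Most Common Venue": ["Gym", "Field"],
    "2nd Most Common Venue": ["Yoga Studio", "Baseball Field"],
})

# Positional selection, as in the notebook: column 1 plus columns 5 onward.
positional = toy.columns[[1] + list(range(5, toy.shape[1]))]

# Name-based alternative that does not depend on column order.
by_name = ["Neighborhood"] + [
    c for c in toy.columns if c.endswith("Most Common Venue")
]

# Labels are zero-based, so "Cluster 4" corresponds to label 3.
cluster4_toy = toy.loc[toy["Cluster Labels"] == 3, by_name]
print(cluster4_toy)
```

Both selections return the same columns here; the name-based version would keep working if a column were later added to or removed from the front of the frame.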


# Results

The most common sports facilities in cluster 1 neighborhoods are yoga studios and gym / fitness centers.

In [157]:
cluster1['Neighborhood']

0             Marble Hill
1               Chinatown
2      Washington Heights
3                  Inwood
18      Greenwich Village
20        Lower East Side
21                Tribeca
22           Little Italy
23                   Soho
28      Battery Park City
29     Financial District
32           Civic Center
40              Bay Ridge
41            Bensonhurst
47      Manhattan Terrace
55           Williamsburg
56               Bushwick
57     Bedford Stuyvesant
58       Brooklyn Heights
61               Red Hook
70            Mill Island
74           Borough Park
75          Dyker Heights
89      East Williamsburg
90             North Side
91             South Side
92          Ocean Parkway
93          Fort Hamilton
100            Mill Basin
101          Fulton Ferry
102          Vinegar Hill
105                 Dumbo
108               Madison
Name: Neighborhood, dtype: object

The most common sports facilities in cluster 2 neighborhoods are gyms.

In [158]:
cluster2['Neighborhood']

10          Lenox Hill
11    Roosevelt Island
13      Lincoln Square
14             Clinton
15             Midtown
16         Murray Hill
33       Midtown South
34        Sutton Place
35          Turtle Bay
36          Tudor City
39        Hudson Yards
Name: Neighborhood, dtype: object

The most common sports facilities in cluster 3 neighborhoods are yoga studios and gyms.

In [159]:
cluster3['Neighborhood']

4        Hamilton Heights
5          Manhattanville
6          Central Harlem
7             East Harlem
8         Upper East Side
9               Yorkville
12        Upper West Side
17                Chelsea
19           East Village
24           West Village
25       Manhattan Valley
26    Morningside Heights
27               Gramercy
30          Carnegie Hill
31                   Noho
37        Stuyvesant Town
38               Flatiron
43             Greenpoint
54            Brownsville
83             Ocean Hill
97         Remsen Village
Name: Neighborhood, dtype: object

The most common sports facilities in cluster 4 neighborhoods are fields and climbing gyms.

In [160]:
cluster4['Neighborhood']

48                      Flatbush
49                 Crown Heights
50                 East Flatbush
51                    Kensington
52               Windsor Terrace
53              Prospect Heights
59                   Cobble Hill
60               Carroll Gardens
62                       Gowanus
63                   Fort Greene
64                    Park Slope
78                  Clinton Hill
80                      Downtown
81                   Boerum Hill
82     Prospect Lefferts Gardens
87           Prospect Park South
94                   Ditmas Park
95                       Wingate
96                         Rugby
103                   Weeksville
109                      Erasmus
Name: Neighborhood, dtype: object

The most common sports facilities in cluster 5 neighborhoods are gyms and gym / fitness centers.

In [161]:
cluster5['Neighborhood']

42           Sunset Park
44             Gravesend
45        Brighton Beach
46        Sheepshead Bay
65         Cypress Hills
66         East New York
67         Starrett City
68              Canarsie
69             Flatlands
71       Manhattan Beach
72          Coney Island
73            Bath Beach
76       Gerritsen Beach
77           Marine Park
79              Sea Gate
84             City Line
85          Bergen Beach
86               Midwood
88            Georgetown
98              New Lots
99       Paerdegat Basin
104    Broadway Junction
106            Homecrest
107        Highland Park
Name: Neighborhood, dtype: object
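
The per-cluster summaries above can be checked by tallying the first-ranked venue category in each cluster. The sketch below does this for the rows visible in the cluster 3, 4, and 5 tables only (ten rows each), so it is a partial check rather than a full recomputation. The names `rows`, `top_venues`, and `top_categories` are hypothetical helpers, not part of the notebook.

```python
import pandas as pd

# (cluster number, neighborhood, 1st most common venue), copied from the visible table rows.
rows = [
    (3, "Hamilton Heights", "Yoga Studio"), (3, "Manhattanville", "Yoga Studio"),
    (3, "Central Harlem", "Yoga Studio"), (3, "East Harlem", "Yoga Studio"),
    (3, "Upper East Side", "Gym"), (3, "Yorkville", "Gym"),
    (3, "Upper West Side", "Yoga Studio"), (3, "Chelsea", "Gym"),
    (3, "East Village", "Yoga Studio"), (3, "West Village", "Gym / Fitness Center"),
    (4, "Flatbush", "Field"), (4, "Crown Heights", "Field"),
    (4, "East Flatbush", "Field"), (4, "Kensington", "Field"),
    (4, "Windsor Terrace", "Field"), (4, "Prospect Heights", "Field"),
    (4, "Cobble Hill", "Climbing Gym"), (4, "Carroll Gardens", "Climbing Gym"),
    (4, "Gowanus", "Field"), (4, "Fort Greene", "Field"),
    (5, "Sunset Park", "Gym"), (5, "Gravesend", "Surf Spot"),
    (5, "Brighton Beach", "Gym"), (5, "Sheepshead Bay", "Gym"),
    (5, "Cypress Hills", "Gym"), (5, "East New York", "Gym"),
    (5, "Starrett City", "Gym"), (5, "Canarsie", "Gym / Fitness Center"),
    (5, "Flatlands", "Gym / Fitness Center"), (5, "Manhattan Beach", "Gym"),
]
top_venues = pd.DataFrame(
    rows, columns=["Cluster", "Neighborhood", "1st Most Common Venue"]
)

def top_categories(df, n=2):
    """Return the n most frequent first-ranked venue categories for each cluster."""
    return {
        cluster: group["1st Most Common Venue"].value_counts().head(n).index.tolist()
        for cluster, group in df.groupby("Cluster")
    }

summary = top_categories(top_venues)
for cluster, categories in summary.items():
    print(f"Cluster {cluster}: {', '.join(categories)}")
```

On these rows the tally agrees with the statements for clusters 3, 4, and 5. Running the same function on the full `cluster1` to `cluster5` frames would cover the neighborhoods not shown in the truncated tables.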

# Discussion

Although the clusters vary, gyms, fitness centers, and yoga studios are the most common sports facilities that college students can find in New York City, especially near New York University buildings.
This is consistent with the kinds of exercise college students tend to prefer: weight training, yoga, and cardio workouts.
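
As a rough way to quantify this observation, the sketch below computes the share of neighborhoods whose first-ranked venue is a gym, a gym / fitness center, or a yoga studio. The counts come only from the thirty rows visible in the cluster 3, 4, and 5 tables, so the result is indicative and does not describe all neighborhoods in the analysis. The variable names are hypothetical.

```python
from collections import Counter

# First-ranked venue counts over the 30 visible rows of the cluster 3-5 tables.
first_ranked = Counter({
    "Gym": 10,
    "Field": 8,
    "Yoga Studio": 6,
    "Gym / Fitness Center": 3,
    "Climbing Gym": 2,
    "Surf Spot": 1,
})

# Categories named in the discussion.
target = {"Gym", "Gym / Fitness Center", "Yoga Studio"}

total = sum(first_ranked.values())
share = sum(n for category, n in first_ranked.items() if category in target) / total
print(f"{share:.0%} of {total} visible neighborhoods are led by a gym, fitness center, or yoga studio")
```

Cluster 4, where fields and climbing gyms lead, accounts for most of the remaining rows.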

# Conclusion

This capstone project suggests that the location of sports facilities reflects the needs of residents. Because young people spend much of their time around New York University, the surrounding area is filled with the sports facilities they like.