# Data Science Capstone Project Code Submission

### This captsone project will conduct analysis of various restaurant information for Suffolk Country, MA. The purpose of this analysis will be to dirve decision making for a prospective restauranteer as to what type, and what location would be best served for a new venue. I will use the geo.nyu site for puspose of pulling location data, which will be subsequently be married in a data frame with their respective geographic coordinates. Subsequently, I shall use the Foursquare API for purposes of exploring each neighborhood and their venue assortments. This will be run through K-means and cluster analysis, with finishing statistical modeling to determine the optimum location and venue style for the greater Suffolk area. 

## Download various libraries and functionalities previously shown in this module

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2020.12.5          |   py36h5fab9bb_1         143 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-2.1.0                |     pyhd3deb0d_0          64 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         240 KB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::geographiclib-1.50-py_0
  geopy              conda-forge/noarch::geopy-2.1.0-pyhd3deb0d_0

The following packages will be UPDATED:

  certifi                          2020.12.5-py36h5fab9bb_0 --> 202

## Step 1: Dowload and conduct initial data cleaning

### The data was derived from the same site used for Week 3 exploration, but contains Suffolk County info for this exercise, it can be found here: https://geo.nyu.edu/download/file/harvard-mgisgeonamx2-geojson.json


In [28]:
import json

In [30]:
with open('harvard-mgisgeonamx2-geojson.json') as json_data:
  massachusetts_data = json.load(json_data)

In [31]:
massachusetts_data

{'type': 'FeatureCollection',
 'totalFeatures': 1835,
 'features': [{'type': 'Feature',
   'id': 'MGISGEONAMX2.1',
   'geometry': {'type': 'Point', 'coordinates': [-70.86436054, 42.84482233]},
   'geometry_name': 'the_geom',
   'properties': {'PLACES_': 3,
    'PLACES_ID': 1,
    'X': 251961.859,
    'Y': 955105.25,
    'OFFSETX': 0,
    'OFFSETY': 0,
    'HEIGHT': 100,
    'SYMBOL': 1,
    'LEVEL_': 1,
    'TEXT': 'S A L I S B U R Y',
    'NAME': 'SALISBURY',
    'FEATURE': 'PPL',
    'COUNTY': 25009,
    'COORD': '',
    'DATE_': 1978,
    'ELEVATION': 25,
    'SOURCE': 'USGS',
    'TILE_NAME': '146'}},
  {'type': 'Feature',
   'id': 'MGISGEONAMX2.2',
   'geometry': {'type': 'Point', 'coordinates': [-70.81461765, 42.84174158]},
   'geometry_name': 'the_geom',
   'properties': {'PLACES_': 2,
    'PLACES_ID': 2,
    'X': 256030.875,
    'Y': 954794.5,
    'OFFSETX': 0,
    'OFFSETY': 0,
    'HEIGHT': 76.2,
    'SYMBOL': 1,
    'LEVEL_': 1,
    'TEXT': 'SALISBURY BEACH',
    'NAME': 'SA

### So we continue data cleaning, by removing all data except feature key, which breaks down information into neighborhoods

In [32]:
neighborhoods_data = massachusetts_data['features']

In [33]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'MGISGEONAMX2.1',
 'geometry': {'type': 'Point', 'coordinates': [-70.86436054, 42.84482233]},
 'geometry_name': 'the_geom',
 'properties': {'PLACES_': 3,
  'PLACES_ID': 1,
  'X': 251961.859,
  'Y': 955105.25,
  'OFFSETX': 0,
  'OFFSETY': 0,
  'HEIGHT': 100,
  'SYMBOL': 1,
  'LEVEL_': 1,
  'TEXT': 'S A L I S B U R Y',
  'NAME': 'SALISBURY',
  'FEATURE': 'PPL',
  'COUNTY': 25009,
  'COORD': '',
  'DATE_': 1978,
  'ELEVATION': 25,
  'SOURCE': 'USGS',
  'TILE_NAME': '146'}}

### Continuation of creating data we can manipulate for decision making, we transfer it to a pandas DF 

In [34]:
column_names = ['COUNTY', 'Neighborhood', 'Latitude', 'Longitude'] 

neighborhoods = pd.DataFrame(columns=column_names)

In [35]:
for data in neighborhoods_data:
    COUNTY = neighborhood_name = data['properties']['COUNTY'] 
    neighborhood_name = data['properties']['NAME']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'COUNTY': COUNTY,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

In [36]:
neighborhoods.head()

Unnamed: 0,COUNTY,Neighborhood,Latitude,Longitude
0,25009,SALISBURY,42.844822,-70.864361
1,25009,SALISBURY BEACH,42.841742,-70.814618
2,25009,BROWNS POINT,42.838659,-70.83387
3,25009,RINGS ISLAND,42.816168,-70.867222
4,25009,PLUM ISLAND,42.813622,-70.808103


In [37]:
print('The dataframe has {} COUNTY unique results and {} neighborhoods.'.format(
        len(neighborhoods['COUNTY'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 15 COUNTY unique results and 1835 neighborhoods.


### Employ geopy to gain lats and longs

In [38]:
address = 'Boston, MA'

geolocator = Nominatim(user_agent="MA_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Boston are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Boston are 42.3602534, -71.0582912.


### Employ folium to create map for future cluster analysis

In [39]:
map_boston = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, COUNTY, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['COUNTY'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, COUNTY)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_boston)  
    
map_boston

### This map is relatively useless at this point, so lets parse down to Suffolk county.

In [40]:
BostonSuffolkCounty_data = neighborhoods[neighborhoods['COUNTY'] == 25025].reset_index(drop=True)
BostonSuffolkCounty_data.head()

Unnamed: 0,COUNTY,Neighborhood,Latitude,Longitude
0,25025,POINT OF PINES,42.437468,-70.965568
1,25025,BEACHMONT,42.395601,-70.990215
2,25025,REVERE,42.411107,-71.018667
3,25025,CHELSEA,42.39143,-71.03514
4,25025,ORIENT HEIGHTS,42.387261,-71.009795


In [41]:
# create map of Boston using latitude and longitude values
map_Boston2 = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(BostonSuffolkCounty_data['Latitude'], BostonSuffolkCounty_data['Longitude'], BostonSuffolkCounty_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=12,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Boston2)  
    
map_Boston2

### The next step is to employ FourSquare API to further integrate data and set conditions for future segmetnation of neighborhoods

In [42]:
CLIENT_ID = 'IEZQZK0MKCEMGP2NUPJDKEQHJTVYNHQZE1ARYCIHPDP3SFQI' # your Foursquare ID
CLIENT_SECRET = 'VFXGYAC2YXUTZWI2MCR2NPXUWWQKGXEYKQXHIEAJCBWXFBR3' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [43]:
BostonSuffolkCounty_data.loc[0, 'Neighborhood']

'POINT OF PINES'

In [44]:
neighborhood_latitude = BostonSuffolkCounty_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = BostonSuffolkCounty_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = BostonSuffolkCounty_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of POINT OF PINES are 42.43746753, -70.96556756.


### From here we use the API to determine the top restaurants near this coordinate

In [45]:
LIMIT = 200 # limit of number of venues returned by Foursquare API

radius = 1000 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)


In [46]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '600954e349257e3ad571343a'},
 'response': {'headerLocation': 'Revere',
  'headerFullLocation': 'Revere',
  'headerLocationGranularity': 'city',
  'totalResults': 6,
  'suggestedBounds': {'ne': {'lat': 42.44646753900001,
    'lng': -70.9533954303484},
   'sw': {'lat': 42.42846752099999, 'lng': -70.9777396896516}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4fee7c47e4b01127cba03ba8',
       'name': 'Point of Pines Private Beach',
       'location': {'address': 'Rice Ave',
        'crossStreet': 'Fowler Ave',
        'lat': 42.437731268209006,
        'lng': -70.96879679484049,
        'labeledLatLngs': [{'label': 'display',
          'lat': 42.437731268209006,
          'lng': -70.96879679484049}],
        'distance': 266,
        '

In [47]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [48]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,name,categories,lat,lng
0,Point of Pines Private Beach,Beach,42.437731,-70.968797
1,Revere Beach-North,Beach,42.434256,-70.971749
2,Pest Arrest Of New England,Business Service,42.439712,-70.96596
3,Pine River Rock Beach,River,42.437419,-70.96916
4,Mirage,Restaurant,42.441175,-70.967157


In [49]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

6 venues were returned by Foursquare.


### Establish function to repeat this process before using it on each element of the data set

In [50]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [51]:
BostonSuffolkCounty_venues = getNearbyVenues(names=BostonSuffolkCounty_data['Neighborhood'],
                                   latitudes=BostonSuffolkCounty_data['Latitude'],
                                   longitudes=BostonSuffolkCounty_data['Longitude']
                                  )

POINT OF PINES
BEACHMONT
REVERE
CHELSEA
ORIENT HEIGHTS
CHARLESTOWN
WINTHROP
FORT WARREN
BOSTON
FORT INDEPENDENCE
ROXBURY
NEWSTEAD MONTEGRADE
FOREST HILLS
DORCHESTER
ROSLINDALE
NEPONSET
ASHMONT
MATTAPAN
FAIRMOUNT
ALLSTON
FANEUIL
BRIGHTON
ABERDEEN
BELLEVUE
HIGHLAND
GERMANTOWN
READVILLE


In [52]:
print(BostonSuffolkCounty_venues.shape)
BostonSuffolkCounty_venues.head()

(1117, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,POINT OF PINES,42.437468,-70.965568,Point of Pines Private Beach,42.437731,-70.968797,Beach
1,POINT OF PINES,42.437468,-70.965568,Revere Beach-North,42.434256,-70.971749,Beach
2,POINT OF PINES,42.437468,-70.965568,Pest Arrest Of New England,42.439712,-70.96596,Business Service
3,POINT OF PINES,42.437468,-70.965568,Pine River Rock Beach,42.437419,-70.96916,River
4,POINT OF PINES,42.437468,-70.965568,Mirage,42.441175,-70.967157,Restaurant


In [53]:
BostonSuffolkCounty_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
ABERDEEN,94,94,94,94,94,94
ALLSTON,100,100,100,100,100,100
ASHMONT,28,28,28,28,28,28
BEACHMONT,26,26,26,26,26,26
BELLEVUE,40,40,40,40,40,40
BOSTON,100,100,100,100,100,100
BRIGHTON,79,79,79,79,79,79
CHARLESTOWN,74,74,74,74,74,74
CHELSEA,50,50,50,50,50,50
DORCHESTER,18,18,18,18,18,18


### Finally, determine total number of venue types seen in the data set

In [54]:
print('There are {} uniques categories.'.format(len(BostonSuffolkCounty_venues['Venue Category'].unique())))

There are 218 uniques categories.


### The last portion of analysis will involve determining most popular venue by neighborhood, and conducting cluster analysis

In [55]:
# one hot encoding
BostonSuffolkCounty_onehot = pd.get_dummies(BostonSuffolkCounty_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
BostonSuffolkCounty_onehot['Neighborhood'] = BostonSuffolkCounty_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [BostonSuffolkCounty_onehot.columns[-1]] + list(BostonSuffolkCounty_onehot.columns[:-1])
BostonSuffolkCounty_onehot = BostonSuffolkCounty_onehot[fixed_columns]

BostonSuffolkCounty_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Bath House,Beach,Beer Bar,Beer Garden,Belgian Restaurant,Big Box Store,Board Shop,Boat or Ferry,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Café,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Circus,Clothing Store,Coffee Shop,College Hockey Rink,College Stadium,Comedy Club,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cuban Restaurant,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Dog Run,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Electronics Store,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Garden,Gas Station,Gastropub,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Harbor / Marina,Hardware Store,Health & Beauty Service,Hill,Historic Site,History Museum,Home Service,Hookah Bar,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Insurance Office,Intersection,Irish Pub,Island,Italian Restaurant,Japanese Restaurant,Jazz Club,Kids Store,Korean Restaurant,Lake,Latin American Restaurant,Lawyer,Library,Light Rail Station,Lighthouse,Liquor Store,Locksmith,Lounge,Market,Martial Arts School,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music Store,Music Venue,Nail Salon,National Park,New American Restaurant,Noodle House,Opera House,Optical Shop,Other Repair Shop,Outdoor Sculpture,Outdoors & Recreation,Outlet Store,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pier,Pizza Place,Platform,Playground,Plaza,Pool,Pool Hall,Post Office,Pub,Racetrack,Record Shop,Rental Car Location,Restaurant,River,Road,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Ski Chalet,Smoke Shop,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Steakhouse,Storage Facility,Supermarket,Supplement Shop,Surf Spot,Sushi Restaurant,Taco Place,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Thrift / Vintage Store,Toll Plaza,Tourist Information Center,Track,Trail,Train Station,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Weight Loss Center,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
0,POINT OF PINES,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,POINT OF PINES,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,POINT OF PINES,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,POINT OF PINES,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,POINT OF PINES,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [56]:
BostonSuffolkCounty_onehot.shape

(1117, 219)

In [57]:
BostonSuffolkCounty_grouped = BostonSuffolkCounty_onehot.groupby('Neighborhood').mean().reset_index()
BostonSuffolkCounty_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Bath House,Beach,Beer Bar,Beer Garden,Belgian Restaurant,Big Box Store,Board Shop,Boat or Ferry,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Café,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Circus,Clothing Store,Coffee Shop,College Hockey Rink,College Stadium,Comedy Club,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cuban Restaurant,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Dog Run,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Electronics Store,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Garden,Gas Station,Gastropub,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Harbor / Marina,Hardware Store,Health & Beauty Service,Hill,Historic Site,History Museum,Home Service,Hookah Bar,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Insurance Office,Intersection,Irish Pub,Island,Italian Restaurant,Japanese Restaurant,Jazz Club,Kids Store,Korean Restaurant,Lake,Latin American Restaurant,Lawyer,Library,Light Rail Station,Lighthouse,Liquor Store,Locksmith,Lounge,Market,Martial Arts School,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music Store,Music Venue,Nail Salon,National Park,New American Restaurant,Noodle House,Opera House,Optical Shop,Other Repair Shop,Outdoor Sculpture,Outdoors & Recreation,Outlet Store,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pier,Pizza Place,Platform,Playground,Plaza,Pool,Pool Hall,Post Office,Pub,Racetrack,Record Shop,Rental Car Location,Restaurant,River,Road,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Ski Chalet,Smoke Shop,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Steakhouse,Storage Facility,Supermarket,Supplement Shop,Surf Spot,Sushi Restaurant,Taco Place,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Thrift / Vintage Store,Toll Plaza,Tourist Information Center,Track,Trail,Train Station,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Weight Loss Center,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
0,ABERDEEN,0.0,0.0,0.010638,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.042553,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.010638,0.0,0.010638,0.0,0.021277,0.0,0.0,0.06383,0.0,0.0,0.0,0.021277,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.031915,0.0,0.0,0.010638,0.010638,0.0,0.010638,0.0,0.0,0.0,0.0,0.031915,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.021277,0.031915,0.010638,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.010638,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.010638,0.0,0.010638,0.0,0.021277,0.031915,0.010638,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.021277,0.0,0.06383,0.010638,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.010638,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.010638,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.031915,0.0,0.010638,0.010638,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0
1,ALLSTON,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.04,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.06,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.01,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0
2,ASHMONT,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.035714,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.035714,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.035714,0.0,0.035714,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0
3,BEACHMONT,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0,0.038462,0.0,0.038462,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.115385,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,BELLEVUE,0.0,0.0,0.075,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.025,0.0,0.0,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.025,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.025,0.0,0.05,0.0,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.025,0.0,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,BOSTON,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.05,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.02,0.0,0.04,0.01,0.04,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0
6,BRIGHTON,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037975,0.025316,0.025316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.012658,0.0,0.0,0.050633,0.0,0.0,0.0,0.050633,0.0,0.012658,0.025316,0.0,0.0,0.0,0.0,0.0,0.050633,0.0,0.0,0.0,0.012658,0.0,0.012658,0.0,0.0,0.0,0.012658,0.025316,0.0,0.012658,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025316,0.037975,0.012658,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.012658,0.012658,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.012658,0.025316,0.0,0.101266,0.0,0.012658,0.0,0.0,0.0,0.0,0.037975,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025316,0.0,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.012658,0.0,0.012658,0.0,0.0,0.0,0.012658,0.0,0.012658,0.0,0.025316,0.0,0.012658,0.0,0.0,0.0,0.0,0.025316,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.012658,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.012658,0.0,0.0,0.0
7,CHARLESTOWN,0.0,0.0,0.027027,0.013514,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.013514,0.040541,0.013514,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.040541,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.040541,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040541,0.0,0.0,0.0,0.027027,0.013514,0.013514,0.013514,0.0,0.013514,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.013514,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.013514,0.0,0.013514,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.013514,0.013514,0.0,0.0,0.0,0.013514,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.013514,0.013514,0.0,0.040541,0.0,0.027027,0.013514,0.013514,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.027027,0.013514,0.013514,0.0,0.0,0.013514,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.013514,0.013514,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0
8,CHELSEA,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.04,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,DORCHESTER,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.055556,0.0,0.166667,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [58]:
BostonSuffolkCounty_grouped.shape

(27, 219)

In [59]:
# determine top eight venues per location
num_top_venues = 8

for hood in BostonSuffolkCounty_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = BostonSuffolkCounty_grouped[BostonSuffolkCounty_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----ABERDEEN----
               venue  freq
0               Café  0.06
1        Pizza Place  0.06
2        Coffee Shop  0.04
3             Bakery  0.04
4               Bank  0.04
5   Sushi Restaurant  0.03
6      Grocery Store  0.03
7  Convenience Store  0.03


----ALLSTON----
                venue  freq
0         Coffee Shop  0.06
1   Korean Restaurant  0.05
2     Thai Restaurant  0.04
3              Bakery  0.04
4     Bubble Tea Shop  0.03
5  Chinese Restaurant  0.03
6         Pizza Place  0.03
7    Sushi Restaurant  0.03


----ASHMONT----
               venue  freq
0               Park  0.07
1         Donut Shop  0.07
2      Grocery Store  0.07
3     Hardware Store  0.04
4     Breakfast Spot  0.04
5           Pharmacy  0.04
6  Convenience Store  0.04
7        Pizza Place  0.04


----BEACHMONT----
                venue  freq
0        Liquor Store  0.12
1                Park  0.08
2          Food Truck  0.08
3       Metro Station  0.08
4      Sandwich Place  0.08
5      Discount Store

In [60]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [61]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = BostonSuffolkCounty_grouped['Neighborhood']

for ind in np.arange(BostonSuffolkCounty_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(BostonSuffolkCounty_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ABERDEEN,Pizza Place,Café,Coffee Shop,Bank,Bakery,Donut Shop,Mexican Restaurant,Grocery Store,Sushi Restaurant,Convenience Store
1,ALLSTON,Coffee Shop,Korean Restaurant,Bakery,Thai Restaurant,Sushi Restaurant,Pizza Place,Bubble Tea Shop,Chinese Restaurant,Mexican Restaurant,Liquor Store
2,ASHMONT,Grocery Store,Park,Donut Shop,Italian Restaurant,Sandwich Place,Breakfast Spot,Metro Station,Pharmacy,Speakeasy,Caribbean Restaurant
3,BEACHMONT,Liquor Store,Park,Sandwich Place,Food Truck,Metro Station,Italian Restaurant,Beach,Donut Shop,Racetrack,Cosmetics Shop
4,BELLEVUE,American Restaurant,Thai Restaurant,Pizza Place,Park,Italian Restaurant,Diner,Café,Candy Store,Sandwich Place,Coffee Shop
5,BOSTON,Coffee Shop,Historic Site,Park,Italian Restaurant,Bakery,Seafood Restaurant,Sandwich Place,Restaurant,Hotel,Salad Place
6,BRIGHTON,Pizza Place,Chinese Restaurant,Café,Convenience Store,Pub,Bakery,Grocery Store,Greek Restaurant,Bank,Sandwich Place
7,CHARLESTOWN,Park,Pizza Place,Bar,Café,Gastropub,Donut Shop,Pub,Playground,Sandwich Place,National Park
8,CHELSEA,Hotel,Pizza Place,Donut Shop,Mexican Restaurant,Train Station,Bank,American Restaurant,Harbor / Marina,Spanish Restaurant,Discount Store
9,DORCHESTER,Pizza Place,Vegetarian / Vegan Restaurant,Fried Chicken Joint,Southern / Soul Food Restaurant,Platform,Golf Course,Fast Food Restaurant,Market,Chinese Restaurant,Liquor Store


### This shows most common occurance of each venue type per location...pay attention to the number of pizza places, asian themed food and cafes

### Conduct k means clustering

In [62]:
# set number of clusters
kclusters = 8

BostonSuffolkCounty_grouped_clustering = BostonSuffolkCounty_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(BostonSuffolkCounty_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 7, 1, 1, 1, 1, 1, 6], dtype=int32)

In [63]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

BostonSuffolkCounty_merged = BostonSuffolkCounty_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
BostonSuffolkCounty_merged = BostonSuffolkCounty_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

BostonSuffolkCounty_merged # check the last columns

Unnamed: 0,COUNTY,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,25025,POINT OF PINES,42.437468,-70.965568,2,River,Beach,Restaurant,Business Service,Zoo Exhibit,Flower Shop,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant
1,25025,BEACHMONT,42.395601,-70.990215,7,Liquor Store,Park,Sandwich Place,Food Truck,Metro Station,Italian Restaurant,Beach,Donut Shop,Racetrack,Cosmetics Shop
2,25025,REVERE,42.411107,-71.018667,1,Pharmacy,Bank,Pizza Place,Skating Rink,Café,Sandwich Place,Chinese Restaurant,Construction & Landscaping,Convenience Store,Plaza
3,25025,CHELSEA,42.39143,-71.03514,1,Hotel,Pizza Place,Donut Shop,Mexican Restaurant,Train Station,Bank,American Restaurant,Harbor / Marina,Spanish Restaurant,Discount Store
4,25025,ORIENT HEIGHTS,42.387261,-71.009795,1,Sandwich Place,Pizza Place,Harbor / Marina,Pool Hall,Baseball Field,Café,Skating Rink,Circus,Mexican Restaurant,Coffee Shop
5,25025,CHARLESTOWN,42.377601,-71.065068,1,Park,Pizza Place,Bar,Café,Gastropub,Donut Shop,Pub,Playground,Sandwich Place,National Park
6,25025,WINTHROP,42.373326,-70.98869,1,Pharmacy,Park,Dance Studio,Bank,Deli / Bodega,Construction & Landscaping,Restaurant,Chinese Restaurant,Pizza Place,Gift Shop
7,25025,FORT WARREN,42.321138,-70.930108,3,Island,Park,Historic Site,Seafood Restaurant,Discount Store,Dive Bar,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Football Stadium
8,25025,BOSTON,42.358096,-71.061861,1,Coffee Shop,Historic Site,Park,Italian Restaurant,Bakery,Seafood Restaurant,Sandwich Place,Restaurant,Hotel,Salad Place
9,25025,FORT INDEPENDENCE,42.337719,-71.009582,0,Harbor / Marina,Pier,Park,Boat or Ferry,Lighthouse,Playground,Hot Dog Joint,Trail,Historic Site,Zoo Exhibit


In [64]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(BostonSuffolkCounty_merged['Latitude'], BostonSuffolkCounty_merged['Longitude'], BostonSuffolkCounty_merged['Neighborhood'], BostonSuffolkCounty_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=25,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Cluster break down

#### Cluster 1

In [65]:
BostonSuffolkCounty_merged.loc[BostonSuffolkCounty_merged['Cluster Labels'] == 0, BostonSuffolkCounty_merged.columns[[1] + list(range(5, BostonSuffolkCounty_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,FORT INDEPENDENCE,Harbor / Marina,Pier,Park,Boat or Ferry,Lighthouse,Playground,Hot Dog Joint,Trail,Historic Site,Zoo Exhibit


#### Cluster 2

In [66]:
BostonSuffolkCounty_merged.loc[BostonSuffolkCounty_merged['Cluster Labels'] == 1, BostonSuffolkCounty_merged.columns[[1] + list(range(5, BostonSuffolkCounty_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,REVERE,Pharmacy,Bank,Pizza Place,Skating Rink,Café,Sandwich Place,Chinese Restaurant,Construction & Landscaping,Convenience Store,Plaza
3,CHELSEA,Hotel,Pizza Place,Donut Shop,Mexican Restaurant,Train Station,Bank,American Restaurant,Harbor / Marina,Spanish Restaurant,Discount Store
4,ORIENT HEIGHTS,Sandwich Place,Pizza Place,Harbor / Marina,Pool Hall,Baseball Field,Café,Skating Rink,Circus,Mexican Restaurant,Coffee Shop
5,CHARLESTOWN,Park,Pizza Place,Bar,Café,Gastropub,Donut Shop,Pub,Playground,Sandwich Place,National Park
6,WINTHROP,Pharmacy,Park,Dance Studio,Bank,Deli / Bodega,Construction & Landscaping,Restaurant,Chinese Restaurant,Pizza Place,Gift Shop
8,BOSTON,Coffee Shop,Historic Site,Park,Italian Restaurant,Bakery,Seafood Restaurant,Sandwich Place,Restaurant,Hotel,Salad Place
10,ROXBURY,Pizza Place,Park,Donut Shop,Italian Restaurant,Liquor Store,Sandwich Place,Fast Food Restaurant,Supermarket,Mobile Phone Shop,Café
11,NEWSTEAD MONTEGRADE,Park,Coffee Shop,Brewery,Pizza Place,Gym,Chinese Restaurant,Mexican Restaurant,Pharmacy,Football Stadium,Museum
12,FOREST HILLS,Pizza Place,Indian Restaurant,Rental Car Location,Park,American Restaurant,Bar,Bus Station,Pet Store,Scenic Lookout,Sandwich Place
14,ROSLINDALE,Pizza Place,American Restaurant,Italian Restaurant,Discount Store,Donut Shop,Flower Shop,Bar,Bakery,Park,Grocery Store


#### Cluster 3

In [67]:
BostonSuffolkCounty_merged.loc[BostonSuffolkCounty_merged['Cluster Labels'] == 2, BostonSuffolkCounty_merged.columns[[1] + list(range(5, BostonSuffolkCounty_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,POINT OF PINES,River,Beach,Restaurant,Business Service,Zoo Exhibit,Flower Shop,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant


#### Cluster 4

In [68]:
BostonSuffolkCounty_merged.loc[BostonSuffolkCounty_merged['Cluster Labels'] == 3, BostonSuffolkCounty_merged.columns[[1] + list(range(5, BostonSuffolkCounty_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,FORT WARREN,Island,Park,Historic Site,Seafood Restaurant,Discount Store,Dive Bar,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Football Stadium


#### Cluster 5

In [69]:
BostonSuffolkCounty_merged.loc[BostonSuffolkCounty_merged['Cluster Labels'] == 4, BostonSuffolkCounty_merged.columns[[1] + list(range(5, BostonSuffolkCounty_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
25,GERMANTOWN,Drugstore,Donut Shop,Food & Drink Shop,Grocery Store,Fried Chicken Joint,Latin American Restaurant,Chinese Restaurant,Breakfast Spot,Pool,Frozen Yogurt Shop


#### Cluster 6

In [70]:
BostonSuffolkCounty_merged.loc[BostonSuffolkCounty_merged['Cluster Labels'] == 5, BostonSuffolkCounty_merged.columns[[1] + list(range(5, BostonSuffolkCounty_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
26,READVILLE,Bakery,Pizza Place,Construction & Landscaping,Gym,Italian Restaurant,Deli / Bodega,Dive Bar,Rental Car Location,Donut Shop,Clothing Store


#### Cluster 7

In [71]:
BostonSuffolkCounty_merged.loc[BostonSuffolkCounty_merged['Cluster Labels'] == 6, BostonSuffolkCounty_merged.columns[[1] + list(range(5, BostonSuffolkCounty_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,DORCHESTER,Pizza Place,Vegetarian / Vegan Restaurant,Fried Chicken Joint,Southern / Soul Food Restaurant,Platform,Golf Course,Fast Food Restaurant,Market,Chinese Restaurant,Liquor Store


#### Cluster 8

In [72]:
BostonSuffolkCounty_merged.loc[BostonSuffolkCounty_merged['Cluster Labels'] == 7, BostonSuffolkCounty_merged.columns[[1] + list(range(5, BostonSuffolkCounty_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,BEACHMONT,Liquor Store,Park,Sandwich Place,Food Truck,Metro Station,Italian Restaurant,Beach,Donut Shop,Racetrack,Cosmetics Shop


# Conclusion:

### Initial insights seem to show Pizza Places, Coffee Shops and Asian Themed Restaurants as the most common
### Cluster 2 appears to be the most venue saturated during this dtg of data gain, with the remaining clusters having limited density
### Recommendations to a future restaurantor include placement of a pizza place in a non cluster 2 location to avoid saturation and for best longevity.
### Additionally, geographic analysis seems to include lone dining establishments being seen on the outskirts of the Boston area. Setting up an italian, or asian inspired restaurant of nicer statture may be lucrative given the lack of selection in the outer clusters