# This is the capstone project for the IBM Data Science Professional Certificate by Kayla Morris

## Step 1: Introduction to the business problem and who would be interested in this project

A local land developer in the Providence, RI area is looking for a site for a luxury apartment complex. As a local data scientist, I have been tasked with finding the best location in the city to build, based on nearby amenities. The city is situated at the mouth of the Providence River at the head of Narragansett Bay. Providence was one of the first cities in the country to industrialize and became noted for its textile manufacturing and subsequent machine tool, jewelry, and silverware industries. It is well known for its diverse food, culture, art, and events.


This information will be of use not only to land developers in the area but also to people looking for the best places to live in Providence.

## Step 2: Laying out the data being used and how it will solve the problem

https://data.providenceri.gov/Reference/Neighborhood-Index/nigu-z8tj

Providence, RI has an open data site with a Neighborhood Index that lists the names of all the neighborhoods in the city. Based on this data, I will find and cluster the areas that offer the best range of amenities for the people who would live in the new apartment complex. I will do this by running each neighborhood through the Foursquare API and finding out which clusters have the amenities these tenants may be seeking.

## Step 3: Scrape, Clean, Wrangle data

In [1]:
import pandas as pd

df = pd.read_csv('Neighborhood_Index.csv')
# drop the rows labeled 14 and 34
df = df.drop(labels=[14, 34], axis=0)
# the original row labels are kept (the index is not reset), so later outputs skip 14 and 34
df.head(36)

Unnamed: 0,Neighborhood Code,Neighborhood Description
0,1010,Federal Hill
1,1020,Armory
2,1021,Armory (Historic)
3,1200,S. Elmwood
4,1210,Washington Park
5,1220,Resev Triangle
6,1230,Elmwood
7,1240,Lower South Providence
8,1250,Hospital South
9,1260,West End
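
As a side note on the cleaning step: `DataFrame.drop` keeps the original row labels, and `reset_index` only takes effect when it is called and its result is assigned. The sketch below uses a small made-up frame (the `toy` data is an assumption, not the Providence file) to show the difference.

```python
import pandas as pd

# Toy stand-in for the neighborhood index (made-up rows for illustration)
toy = pd.DataFrame({"Neighborhood Description": ["A", "B", "C", "D"]})

# Dropping by label leaves a gap in the index
dropped = toy.drop(labels=[1], axis=0)
print(list(dropped.index))

# reset_index returns a new frame; drop=True discards the old labels
renumbered = dropped.reset_index(drop=True)
print(list(renumbered.index))
```

This notebook keeps the gapped labels, which is why 14 and 34 are missing from the row labels in later outputs.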


## Step 4: Analyze data

We will start by finding the latitude and longitude of each neighborhood for later clustering. We will use ArcGIS to do this.

In [2]:
#import all of our necessary libraries

import numpy as np

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes
import folium

! pip install arcgis
from arcgis.geocoding import geocode
from arcgis.gis import GIS
gis = GIS()

print('Libraries imported.')

Libraries imported.


In [3]:
#define the function to get the coordinates

def get_x_y_prov(address1):
   lat_coords = 0
   lng_coords = 0
   g = geocode(address='{}, Providence, Rhode Island, USA'.format(address1))[0]
   lng_coords = g['location']['x']
   lat_coords = g['location']['y']
   return str(lat_coords) +","+ str(lng_coords)

In [4]:
#separate out just the column we are using to get the coordinates
prov_coord = df['Neighborhood Description']    
prov_coord

0               Federal Hill
1                     Armory
2          Armory (Historic)
3                 S. Elmwood
4            Washington Park
5             Resev Triangle
6                    Elmwood
7     Lower South Providence
8             Hospital South
9                   West End
10    Manton, Mount Pleasant
11             East Pleasant
12    Olneyville Residential
13     Olneyville Industrial
15          Hart Silver Lake
16         Mount Pleasant NE
17                Smith Hill
18                  Elmhurst
19               PC Elmhurst
20                W Wanskuck
21                  Wanskuck
22                 N Charles
23              W Blackstone
24                  Woodward
25                 S Charles
26                Mount Hope
27                 Fox Point
28            Fox Point West
29            College Hill E
30                   Wayland
31                Blackstone
32            College Hill 1
33               Hope Street
35            College Hill 2
Name: Neighbor

In [5]:
#get the coordinates
prov_latlong = prov_coord.apply(lambda x: get_x_y_prov(x))

In [6]:
#split up to just the latitudes to append to the table

lat_prov = prov_latlong.apply(lambda x: x.split(',')[0])

In [7]:
#split up to just the longitudes
lon_prov = prov_latlong.apply(lambda x: x.split(',')[1])

In [8]:
#append to the table
result = pd.concat([df,lat_prov.astype(float), lon_prov.astype(float)], axis=1)
result.columns= ['Neighborhood_Code','Neighborhood_Description','Latitude','Longitude']
result.head()

Unnamed: 0,Neighborhood_Code,Neighborhood_Description,Latitude,Longitude
0,1010,Federal Hill,41.82121,-71.4297
1,1020,Armory,41.854455,-71.41474
2,1021,Armory (Historic),41.82387,-71.41199
3,1200,S. Elmwood,41.78348,-71.41892
4,1210,Washington Park,41.79146,-71.39428
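
The two `split` cells above can also be written as one step. This is only a sketch of an alternative, assuming the same `"lat,lng"` string format that `get_x_y_prov` returns; the two sample strings are taken from the table above.

```python
import pandas as pd

# Sample "lat,lng" strings in the format returned by get_x_y_prov
latlong = pd.Series(["41.82121,-71.4297", "41.79146,-71.39428"])

# expand=True returns one column per piece; astype(float) converts both at once
coords = latlong.str.split(",", expand=True).astype(float)
coords.columns = ["Latitude", "Longitude"]
print(coords)
```

Because `coords` keeps the index of `latlong`, it could be passed to `pd.concat(..., axis=1)` in the same way as the separate latitude and longitude series.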


In [9]:
providence = geocode(address='Providence, Rhode Island, USA')[0]
prov_long_coord = providence['location']['x']
prov_lat_coord = providence['location']['y']


# create map 
map_prov = folium.Map(location=[prov_lat_coord, prov_long_coord], zoom_start=11)

# add markers to map
for lat, lng, label in zip(result['Latitude'], result['Longitude'], result['Neighborhood_Description']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_prov)  
    
map_prov

Next, we will take a look at the area using the Foursquare API

In [10]:
CLIENT_ID = '2LKOCWCCPK4CIJLINCQNRGK4X004YNGNWTCS2UAEVWQ0QJHG' # your Foursquare ID
CLIENT_SECRET = 'Y1G2SHIXDO3MP1V3XYJFBTPMFJNITXXSB2YBTOJQUMA02H5W' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [11]:
LIMIT=100

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,
            LIMIT
            )
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return(nearby_venues)
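
The request URL above is assembled with `str.format`. As a hedged alternative, the same query string can be built from a dictionary so that escaping is handled by the standard library. The helper name `build_explore_url` and the placeholder credentials are assumptions for illustration; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Assemble the explore URL used above, letting urlencode handle escaping."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials, for illustration only
url = build_explore_url("MY_ID", "MY_SECRET", "20180605", 41.82121, -71.4297)
print(url)
```

Keeping the parameters in a dictionary also makes it easier to load the credentials from somewhere other than the notebook itself.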

In [12]:
#create dataframe of the venues
import requests

prov_venues = getNearbyVenues(result['Neighborhood_Description'], result['Latitude'], result['Longitude'])

Federal Hill
Armory
Armory (Historic)
S. Elmwood
Washington Park
Resev Triangle
Elmwood
Lower South Providence
Hospital South
West End
Manton, Mount Pleasant
East Pleasant
Olneyville Residential
Olneyville Industrial
Hart Silver Lake
Mount Pleasant NE
Smith Hill
Elmhurst
PC Elmhurst
W Wanskuck
Wanskuck
N Charles
W Blackstone
Woodward
S Charles
Mount Hope
Fox Point
Fox Point West
College Hill E
Wayland
Blackstone
College Hill 1
Hope Street
College Hill 2


In [13]:
#check the dataframe
prov_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Category
0,Federal Hill,41.82121,-71.4297,Julians,New American Restaurant
1,Federal Hill,41.82121,-71.4297,Seven Stars Bakery,Bakery
2,Federal Hill,41.82121,-71.4297,Schastea,Creperie
3,Federal Hill,41.82121,-71.4297,Courtland Club,Cocktail Bar
4,Federal Hill,41.82121,-71.4297,Columbus Theatre,Movie Theater
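
Each row above comes from the nested fields that `getNearbyVenues` reads (`response`, `groups`, `items`, `venue`, `categories`). The sketch below flattens one such payload while tolerating a missing group or an empty category list. The function name `extract_venue_rows` and the hand-written `sample` payload (including the "Unnamed Spot" entry) are assumptions for illustration, not a real API response.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Flatten one explore-style response into (neighborhood, lat, lng, venue, category) rows."""
    rows = []
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Fall back to None when a venue carries no category
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"], category))
    return rows

# Hand-written payload shaped like the fields the notebook reads
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Seven Stars Bakery", "categories": [{"name": "Bakery"}]}},
    {"venue": {"name": "Unnamed Spot", "categories": []}},
]}]}}

rows = extract_venue_rows("Federal Hill", 41.82121, -71.4297, sample)
print(rows)
```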


In [14]:
#group the venues by Venue Category (max() returns the largest value of each column within a category)
prov_venues.groupby('Venue Category').max()

Unnamed: 0_level_0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Accessories Store,Fox Point West,41.824759,-71.420286,Michael Kors
American Restaurant,S. Elmwood,41.854455,-71.400680,wilderness cafe
Art Gallery,Resev Triangle,41.824759,-71.409878,The Spot Underground
Asian Restaurant,Resev Triangle,41.828320,-71.394110,Wong's Kitchen
Automotive Shop,Woodward,41.864229,-71.438272,Touch Of Class Auto Salon
...,...,...,...,...
Vegetarian / Vegan Restaurant,Fox Point West,41.828320,-71.400680,by CHLOE. Providence
Video Store,Hope Street,41.819452,-71.395209,Redbox
Watch Shop,Woodward,41.864229,-71.438272,Watchworks
Wine Bar,Resev Triangle,41.823870,-71.411990,Fortnight


In [15]:
#make new table with the Venue Category column
prov_venue_cat = pd.get_dummies(prov_venues[['Venue Category']], prefix="", prefix_sep="")
prov_venue_cat

Unnamed: 0,Accessories Store,American Restaurant,Art Gallery,Asian Restaurant,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,...,Thai Restaurant,Theater,Trail,Tram Station,University,Vegetarian / Vegan Restaurant,Video Store,Watch Shop,Wine Bar,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
701,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
702,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
703,0,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
704,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [16]:
prov_venue_cat['Neighborhood'] = prov_venues['Neighborhood'] 

# moving neighborhood column to the first column
fixed_columns = [prov_venue_cat.columns[-1]] + list(prov_venue_cat.columns[:-1])
prov_venue_cat = prov_venue_cat[fixed_columns]

prov_venue_cat.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Art Gallery,Asian Restaurant,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,...,Thai Restaurant,Theater,Trail,Tram Station,University,Vegetarian / Vegan Restaurant,Video Store,Watch Shop,Wine Bar,Yoga Studio
0,Federal Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Federal Hill,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
2,Federal Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Federal Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Federal Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [17]:
#group by neighborhood and take the mean of each one-hot column, giving the share of venues in each category
prov_grouped = prov_venue_cat.groupby('Neighborhood').mean().reset_index()
prov_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Art Gallery,Asian Restaurant,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,...,Thai Restaurant,Theater,Trail,Tram Station,University,Vegetarian / Vegan Restaurant,Video Store,Watch Shop,Wine Bar,Yoga Studio
0,Armory,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0
1,Armory (Historic),0.0,0.057692,0.019231,0.038462,0.0,0.0,0.0,0.0,0.0,...,0.0,0.038462,0.0,0.0,0.019231,0.0,0.0,0.0,0.019231,0.0
2,Blackstone,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,College Hill 1,0.0,0.02,0.0,0.02,0.0,0.02,0.02,0.0,0.0,...,0.02,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02
4,College Hill 2,0.0,0.02,0.0,0.02,0.0,0.02,0.02,0.0,0.0,...,0.02,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02
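
To see why the mean of the one-hot columns works as a normalization, here is a small sketch with made-up venues (the neighborhoods "A" and "B" are assumptions for illustration). The mean of a 0/1 column is the fraction of that neighborhood's venues in the category, so each row sums to 1.

```python
import pandas as pd

# Made-up venues for two neighborhoods
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Venue Category": ["Bakery", "Bar", "Bar", "Bakery"],
})

# One column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of a 0/1 column is the share of venues in that category
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Neighborhood "A" has two bars out of three venues, so its `Bar` value is 2/3. A neighborhood with a single venue gets 1.0 for that category, which is worth keeping in mind when comparing areas with very different venue counts.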


In [18]:
#define function for the most common venues

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [19]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
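
The column names above rely on an index lookup that falls back to "th". A small helper can produce the same names without exception handling. The function name `ordinal` is an assumption for illustration; it also handles numbers such as 11 or 22, which the notebook does not need with ten columns.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

columns = ["Neighborhood"] + ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)]
print(columns[:4])
```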

In [20]:
# create a new dataframe
prov_venues_sorted = pd.DataFrame(columns=columns)
prov_venues_sorted['Neighborhood'] = prov_grouped['Neighborhood']

for ind in np.arange(prov_grouped.shape[0]):
    prov_venues_sorted.iloc[ind, 1:] = return_most_common_venues(prov_grouped.iloc[ind, :], num_top_venues)

prov_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Armory,Rental Car Location,American Restaurant,Chinese Restaurant,Donut Shop,Tram Station,Convenience Store,Fishing Store,Frozen Yogurt Shop,Fried Chicken Joint,Food Truck
1,Armory (Historic),Café,Bar,American Restaurant,Hotel,Gay Bar,Asian Restaurant,Sandwich Place,Greek Restaurant,Clothing Store,Theater
2,Blackstone,Playground,Gym / Fitness Center,Gastropub,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,Food Truck,Food,Flower Shop,Fishing Store
3,College Hill 1,Coffee Shop,Pizza Place,Mexican Restaurant,Food Truck,Korean Restaurant,Salon / Barbershop,Jewelry Store,Juice Bar,Clothing Store,Ramen Restaurant
4,College Hill 2,Coffee Shop,Pizza Place,Mexican Restaurant,Food Truck,Korean Restaurant,Salon / Barbershop,Jewelry Store,Juice Bar,Clothing Store,Ramen Restaurant
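
Rows such as Blackstone end in a run of categories in reverse alphabetical order. That pattern suggests (an inference from the output, not something the notebook states) that those slots are filled by categories with a frequency of zero. A hedged sketch of one way to keep only categories that actually occur is shown below; `top_nonzero_categories` and the sample row are assumptions for illustration.

```python
import pandas as pd

def top_nonzero_categories(freq_row, n):
    """Return up to n category names with a frequency above zero, highest first."""
    nonzero = freq_row[freq_row > 0]
    return list(nonzero.sort_values(ascending=False).index[:n])

# Made-up frequency row with only two categories present
row = pd.Series({"Playground": 0.6, "Gym": 0.4, "Food Truck": 0.0, "Flower Shop": 0.0})
top = top_nonzero_categories(row, 10)
print(top)
```

The result can be shorter than `n`, so a table built from it would need padding (for example with empty strings) to keep ten columns.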


Next, we will cluster the neighborhoods to see which parts of the city offer the most amenities

In [21]:
# set number of clusters
k_num_clusters = 5

prov_grouped_clustering = prov_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans_prov = KMeans(n_clusters=k_num_clusters, random_state=0).fit(prov_grouped_clustering)
kmeans_prov

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
    n_clusters=5, n_init=10, n_jobs=None, precompute_distances='auto',
    random_state=0, tol=0.0001, verbose=0)
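
The notebook fixes the number of clusters at 5. One common way to sanity-check such a choice, which is not what was done here, is to compare the inertia (within-cluster sum of squares) across several values of k and look for the point where it stops dropping sharply. The sketch below does this on a made-up frequency matrix `X`; the data and the range of k are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up frequency rows: three pairs of similar "neighborhoods"
X = np.array([
    [0.9, 0.1, 0.0], [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0], [0.0, 1.0, 0.0],
    [0.0, 0.1, 0.9], [0.1, 0.0, 0.9],
])

# Fit k-means for several k and record the inertia of each fit
inertias = {}
for k in range(1, 5):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 4))
```

On this toy data the inertia falls steeply up to k=3 and only slightly after that, which matches the three pairs built into `X`.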

In [22]:
kmeans_prov.labels_

array([0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 3, 1, 0, 0,
       4, 0, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0], dtype=int32)
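
Before mapping the clusters, it helps to count how many neighborhoods fall into each one. The sketch below copies the label array printed above and tallies it with the standard library.

```python
from collections import Counter

# Labels copied from the kmeans_prov.labels_ output above
labels = [0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 3, 1, 0, 0,
          4, 0, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0]

# Shift by one to match the 1-based 'Cluster Labels' column
sizes = Counter(label + 1 for label in labels)
for cluster in sorted(sizes):
    print("Cluster {}: {} neighborhoods".format(cluster, sizes[cluster]))
```

Most neighborhoods land in Cluster 1, while the other four clusters hold one or two neighborhoods each.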

In [23]:
#add cluster labels
prov_venues_sorted.insert(0, 'Cluster Labels', kmeans_prov.labels_ +1)

In [24]:
#join the original dataframe and the new dataframe together to see the whole picture
prov_final = result

prov_final = prov_final.join(prov_venues_sorted.set_index('Neighborhood'), on='Neighborhood_Description')

prov_final.head()

Unnamed: 0,Neighborhood_Code,Neighborhood_Description,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1010,Federal Hill,41.82121,-71.4297,1,Italian Restaurant,Bar,American Restaurant,Bakery,Pizza Place,Gourmet Shop,Chinese Restaurant,Hookah Bar,Flower Shop,Mexican Restaurant
1,1020,Armory,41.854455,-71.41474,1,Rental Car Location,American Restaurant,Chinese Restaurant,Donut Shop,Tram Station,Convenience Store,Fishing Store,Frozen Yogurt Shop,Fried Chicken Joint,Food Truck
2,1021,Armory (Historic),41.82387,-71.41199,1,Café,Bar,American Restaurant,Hotel,Gay Bar,Asian Restaurant,Sandwich Place,Greek Restaurant,Clothing Store,Theater
3,1200,S. Elmwood,41.78348,-71.41892,1,Food Truck,Playground,Exhibit,American Restaurant,Park,Pub,Sculpture Garden,Lake,Restaurant,Fast Food Restaurant
4,1210,Washington Park,41.79146,-71.39428,1,Pub,Bus Station,Pizza Place,Liquor Store,Convenience Store,Chinese Restaurant,Harbor / Marina,Hookah Bar,Donut Shop,Dog Run


In [25]:
#drop any NaN values to prevent skew
prov_final = prov_final.dropna(subset=['Cluster Labels'])
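
The `dropna` step matters because `join` performs a left join by default: a neighborhood with no row in the venue table is kept, with `NaN` in the joined columns. The sketch below shows this on made-up frames (the neighborhoods "A", "B", "C" are assumptions for illustration).

```python
import pandas as pd

# Made-up stand-ins: three neighborhoods, cluster results for only two of them
neighborhoods = pd.DataFrame({"Neighborhood_Description": ["A", "B", "C"]})
clustered = pd.DataFrame({"Neighborhood": ["A", "C"], "Cluster Labels": [1, 2]})

# join() is a left join by default, so "B" is kept with NaN in 'Cluster Labels'
merged = neighborhoods.join(clustered.set_index("Neighborhood"), on="Neighborhood_Description")
print(merged)

# Dropping those rows leaves only neighborhoods that received a cluster
kept = merged.dropna(subset=["Cluster Labels"])
print(list(kept["Neighborhood_Description"]))
```

A column that contains `NaN` is stored as float, which is why the map cell converts each label with `int(cluster)`.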

In [26]:
map_clusters_prov = folium.Map(location=[prov_lat_coord, prov_long_coord], zoom_start=12)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, k_num_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(prov_final['Latitude'], prov_final['Longitude'], prov_final['Neighborhood_Description'], prov_final['Cluster Labels']):
    # 'Cluster Labels' is already 1-based, so no further offset is needed in the popup text
    label = folium.Popup('Cluster ' + str(int(cluster)) + '\n' + str(poi) , parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)]
        ).add_to(map_clusters_prov)
        
map_clusters_prov

### Let's take a look at the clusters and see how the amenities of each neighborhood stack up

Cluster 1

In [27]:
prov_final.loc[prov_final['Cluster Labels'] == 1, prov_final.columns[[1] + list(range(5, prov_final.shape[1]))]]

Unnamed: 0,Neighborhood_Description,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Federal Hill,Italian Restaurant,Bar,American Restaurant,Bakery,Pizza Place,Gourmet Shop,Chinese Restaurant,Hookah Bar,Flower Shop,Mexican Restaurant
1,Armory,Rental Car Location,American Restaurant,Chinese Restaurant,Donut Shop,Tram Station,Convenience Store,Fishing Store,Frozen Yogurt Shop,Fried Chicken Joint,Food Truck
2,Armory (Historic),Café,Bar,American Restaurant,Hotel,Gay Bar,Asian Restaurant,Sandwich Place,Greek Restaurant,Clothing Store,Theater
3,S. Elmwood,Food Truck,Playground,Exhibit,American Restaurant,Park,Pub,Sculpture Garden,Lake,Restaurant,Fast Food Restaurant
4,Washington Park,Pub,Bus Station,Pizza Place,Liquor Store,Convenience Store,Chinese Restaurant,Harbor / Marina,Hookah Bar,Donut Shop,Dog Run
5,Resev Triangle,Café,Bar,American Restaurant,Hotel,Gay Bar,Asian Restaurant,Sandwich Place,Greek Restaurant,Clothing Store,Theater
6,Elmwood,Pizza Place,Lawyer,Pharmacy,Fried Chicken Joint,Asian Restaurant,Gym,Sandwich Place,Food Truck,Food,Flower Shop
7,Lower South Providence,Bar,Gas Station,Donut Shop,Lounge,Yoga Studio,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,Food Truck,Food
8,Hospital South,Bar,Nightclub,Sandwich Place,Yoga Studio,Shipping Store,Mexican Restaurant,Coffee Shop,Gas Station,Gay Bar,Museum
9,West End,Liquor Store,Plaza,Donut Shop,Deli / Bodega,Park,Breakfast Spot,Farmers Market,Supermarket,Frozen Yogurt Shop,Fried Chicken Joint


Cluster 2

In [28]:
prov_final.loc[prov_final['Cluster Labels'] == 2, prov_final.columns[[1] + list(range(5, prov_final.shape[1]))]]

Unnamed: 0,Neighborhood_Description,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,N Charles,Pizza Place,Fishing Store,Chinese Restaurant,Cosmetics Shop,Rental Car Location,Grocery Store,Flower Shop,Fried Chicken Joint,Food Truck,Food
25,S Charles,Pizza Place,Fishing Store,Chinese Restaurant,Cosmetics Shop,Rental Car Location,Grocery Store,Flower Shop,Fried Chicken Joint,Food Truck,Food


Cluster 3

In [29]:
prov_final.loc[prov_final['Cluster Labels'] == 3, prov_final.columns[[1] + list(range(5, prov_final.shape[1]))]]

Unnamed: 0,Neighborhood_Description,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,W Blackstone,Playground,Gym / Fitness Center,Gastropub,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,Food Truck,Food,Flower Shop,Fishing Store
31,Blackstone,Playground,Gym / Fitness Center,Gastropub,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,Food Truck,Food,Flower Shop,Fishing Store


Cluster 4

In [30]:
prov_final.loc[prov_final['Cluster Labels'] == 4, prov_final.columns[[1] + list(range(5, prov_final.shape[1]))]]

Unnamed: 0,Neighborhood_Description,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,"Manton, Mount Pleasant",Bakery,Italian Restaurant,Massage Studio,Insurance Office,Hotel,Gastropub,Dog Run,Donut Shop,Electronics Store,Exhibit
16,Mount Pleasant NE,Bakery,Italian Restaurant,Massage Studio,Insurance Office,Hotel,Gastropub,Dog Run,Donut Shop,Electronics Store,Exhibit


Cluster 5

In [31]:
prov_final.loc[prov_final['Cluster Labels'] == 5, prov_final.columns[[1] + list(range(5, prov_final.shape[1]))]]

Unnamed: 0,Neighborhood_Description,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,PC Elmhurst,Ice Cream Shop,Donut Shop,Student Center,Theater,Yoga Studio,Flower Shop,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,Food Truck


# Methodology, Discussion, & Conclusion can be found in the PowerPoint presentation