## Introduction

In this lab, you will learn how to convert addresses into their equivalent latitude and longitude values. You will also use the Foursquare API to explore neighborhoods in Salt Lake City. You will use the **explore** function to get the most common venue categories in each neighborhood, and then use these features to group the neighborhoods into clusters with the *k*-means clustering algorithm. Finally, you will use the Folium library to visualize the neighborhoods in Salt Lake City and their emerging clusters.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Download and Explore Dataset</a>

2. <a href="#item2">Explore Neighborhoods in Salt Lake City</a>

3. <a href="#item3">Analyze Each Neighborhood</a>

4. <a href="#item4">Cluster Neighborhoods</a>

5. <a href="#item5">Examine Clusters</a>    
</font>
</div>

Download all the dependencies that we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


<a id='item1'></a>

## 1. Explore Salt Lake City Dataset

The GeoJSON file with the neighborhood data is available at: https://github.com/blackmad/neighborhoods/blob/master/gn-salt-lake-city.geojson?short_path=fe4b44d

Thank you to blackmad of the GitHub community for acquiring this data.

In [2]:
!wget -q -O 'slc_data.geojson' https://raw.githubusercontent.com/blackmad/neighborhoods/master/gn-salt-lake-city.geojson
print('Data downloaded!')

Data downloaded!


#### Load and explore the data

In [3]:
with open('slc_data.geojson') as geojson_data:
    slc_data = json.load(geojson_data)

Notice how all the relevant data is in the *features* key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [5]:
neighborhoods_data = slc_data['features']
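The layout described above can be checked on a small scale. The following is a minimal sketch that uses a hand-written, hypothetical miniature of a GeoJSON FeatureCollection (the names and coordinates in `sample_text` are made up, and the real file carries many more properties); it only illustrates that each entry under `features` keeps its label and coordinates under `properties`.

```python
import json

# Hypothetical miniature of a GeoJSON FeatureCollection; the real file has
# more properties per feature and full polygon coordinates.
sample_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"name": "Example Hood A", "lat": 40.72, "lng": -111.85},
     "geometry": {"type": "MultiPolygon", "coordinates": []}},
    {"type": "Feature",
     "properties": {"name": "Example Hood B", "lat": 40.74, "lng": -111.83},
     "geometry": {"type": "MultiPolygon", "coordinates": []}}
  ]
}
"""

sample = json.loads(sample_text)

# The top-level keys show where the per-neighborhood records live.
top_level_keys = sorted(sample.keys())

# Each feature carries its name and coordinates under 'properties'.
summary = [
    (f["properties"]["name"], f["properties"]["lat"], f["properties"]["lng"])
    for f in sample["features"]
]

print(top_level_keys)
print(summary)
```

Listing the keys first is a cheap way to confirm the structure before writing any extraction code against the full file.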

Let's take a look at the second item in `neighborhoods_data`. The first is the longest, so the second serves as a more compact reference.

In [6]:
neighborhoods_data[1]

{'type': 'Feature',
 'properties': {'fclass': 'P',
  'name': 'Sugar House',
  'countryCode': 'US',
  'geonameid': '5782227',
  'created_at': '2013-03-13T17:10:45.249Z',
  'cartodb_id': 1,
  'updated_at': '2013-03-13T17:10:45.526Z',
  'fcode': 'PPLX',
  'lat': 40.720018707036814,
  'parents': '5780993',
  'adminCode4': '',
  'lng': -111.84583584422354,
  'adminCode1': 'UT',
  'adminCode2': '035',
  'adminCode3': ''},
 'geometry': {'type': 'MultiPolygon',
  'coordinates': [[[[-111.85389, 40.733506],
     [-111.850887, 40.733506],
     [-111.847969, 40.733507],
     [-111.845107, 40.733505],
     [-111.842266, 40.733504],
     [-111.842266, 40.733589],
     [-111.836512, 40.733585],
     [-111.833618, 40.733584],
     [-111.830758, 40.733583],
     [-111.829603, 40.733582],
     [-111.827896, 40.733581],
     [-111.82504, 40.73358],
     [-111.823912, 40.733579],
     [-111.821639, 40.733578],
     [-111.820591, 40.733578],
     [-111.82032, 40.733577],
     [-111.819595, 40.732736],
    

#### Transform the data into a *pandas* dataframe

The next task is essentially transforming this data of nested Python dictionaries into a *pandas* dataframe. So let's start by creating an empty dataframe.

In [7]:
# define the dataframe columns
column_names = ['Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Take a look at the empty dataframe to confirm that the columns are as intended.

In [8]:
neighborhoods

Unnamed: 0,Neighborhood,Latitude,Longitude


Then let's loop through the data, collect one row per neighborhood, and build the dataframe from those rows.

In [9]:
rows = []
for data in neighborhoods_data:
    properties = data['properties']

    # a single latitude/longitude pair is stored under 'properties';
    # 'geometry' holds the polygon outline instead
    rows.append({'Neighborhood': properties['name'],
                 'Latitude': properties['lat'],
                 'Longitude': properties['lng']})

# DataFrame.append was removed in pandas 2.0, so build the dataframe in one step
neighborhoods = pd.DataFrame(rows, columns=column_names)

Quickly examine the resulting dataframe.

In [10]:
neighborhoods.head()

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Sugar House | 40.720019 | -111.845836 |
| 1 | Bonneville Hills | 40.737244 | -111.829994 |
| 2 | Wasatch Hollow | 40.737587 | -111.845218 |
| 3 | Liberty-Wells | 40.733535 | -111.879644 |
| 4 | Glendale | 40.736957 | -111.98453 |


#### Use geopy library to get the latitude and longitude values of Salt Lake City.

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>ut_explorer</em>, as shown below.

In [11]:
address = 'Salt Lake City, UT'

geolocator = Nominatim(user_agent="ut_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Salt Lake City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Salt Lake City are 40.7670126, -111.8904308.


#### Create a map of Salt Lake City with neighborhoods superimposed on top.

In [12]:
# create map of Salt Lake City using latitude and longitude values
map_slc = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_slc)  
    
map_slc

**Folium** is a great visualization library. Feel free to zoom into the above map, and click on each circle marker to reveal the name of the neighborhood.

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [13]:
CLIENT_ID = 'ADTOKVZFZPYAUL0TGHFPBJEWTMQOMZJGQLH0HFYHPKEDSLLZ' # your Foursquare ID
CLIENT_SECRET = '0OKRJQNQ3ONOPIXDSJOXEFMC5JTRLQCNGYGHMAIESUTNHXBF' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: ADTOKVZFZPYAUL0TGHFPBJEWTMQOMZJGQLH0HFYHPKEDSLLZ
CLIENT_SECRET:0OKRJQNQ3ONOPIXDSJOXEFMC5JTRLQCNGYGHMAIESUTNHXBF


#### Let's explore the first neighborhood in our dataframe.

Get the neighborhood's name.

In [14]:
neighborhoods.loc[0, 'Neighborhood']

'Sugar House'

Get the neighborhood's latitude and longitude values.

In [15]:
neighborhood_latitude = neighborhoods.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = neighborhoods.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = neighborhoods.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Sugar House are 40.720018707036814, -111.84583584422354.


#### Now, let's get the top 100 venues that are in Sugar House within a radius of 2000 meters.

First, let's create the GET request URL. Name your URL **url**.

In [16]:
# type your answer here
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 2000 # define radius
#query = '4d4b7105d754a06374d81259' # limit search to food providers

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius,
#    query,
    LIMIT)
url # display URL




'https://api.foursquare.com/v2/venues/explore?&client_id=ADTOKVZFZPYAUL0TGHFPBJEWTMQOMZJGQLH0HFYHPKEDSLLZ&client_secret=0OKRJQNQ3ONOPIXDSJOXEFMC5JTRLQCNGYGHMAIESUTNHXBF&v=20180605&ll=40.720018707036814,-111.84583584422354&radius=2000&limit=100'
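The long format string works, but a query string is easier to keep correct when it is built from named parameters. The sketch below assembles the same kind of URL with the standard library's `urlencode`; the helper name `build_explore_url` and the placeholder credentials are hypothetical and only serve the illustration, while the parameter names mirror the ones in the URL above.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble an explore URL from named query parameters (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the 'll' value readable instead of %2C
    return BASE_URL + '?' + urlencode(params, safe=',')

# Placeholder credentials, not real ones.
example_url = build_explore_url('YOUR_ID', 'YOUR_SECRET', '20180605',
                                40.72, -111.85, 2000, 100)
print(example_url)
```

An alternative with the same effect is to pass the dictionary to `requests.get(BASE_URL, params=params)` and let the library do the encoding.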

Send the GET request and examine the results.

In [17]:
results = requests.get(url).json()
#results

{'meta': {'code': 200, 'requestId': '5d74474b8afbe0003ac17cc3'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Sugar House',
  'headerFullLocation': 'Sugar House, Salt Lake City',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 130,
  'suggestedBounds': {'ne': {'lat': 40.73801872503683,
    'lng': -111.82213053090204},
   'sw': {'lat': 40.7020186890368, 'lng': -111.86954115754504}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c27a4d85c5ca5932e2548fe',
       'name': 'Sugar House Park',
       'location': {'address': '1350 E 2100 S',
        'lat': 40.72342102794725,
        'lng': -111.84970378875732,
        'labeledL

From the Foursquare lab in the previous module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [18]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
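To see what the function returns in each case, here is a small self-contained check. The three rows are hypothetical and use plain dictionaries in place of dataframe rows; the function body follows the one above, with the bare `except` narrowed to `KeyError` as an assumption about which failure it is meant to catch.

```python
def get_category_type(row):
    """Return the name of the first category, or None when there is none."""
    try:
        categories_list = row['categories']
    except KeyError:
        # flattened responses store the list under a dotted column name
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Hypothetical rows: plain dicts stand in for dataframe rows here.
flat_row = {'categories': [{'name': 'Park'}, {'name': 'Playground'}]}
nested_row = {'venue.categories': [{'name': 'Belgian Restaurant'}]}
empty_row = {'venue.categories': []}

labels = [get_category_type(r) for r in (flat_row, nested_row, empty_row)]
print(labels)
```

Only the first category is kept, so a venue tagged with several categories is reduced to its primary one.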

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [19]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Sugar House Park | Park | 40.723421 | -111.849704 |
| 1 | Bruges Waffles & Frites | Belgian Restaurant | 40.720746 | -111.85817 |
| 2 | The Dodo Restaurant | American Restaurant | 40.726118 | -111.852102 |
| 3 | Spitz | Mediterranean Restaurant | 40.723462 | -111.856943 |
| 4 | Red Lobster | Seafood Restaurant | 40.720633 | -111.854114 |
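The dotted column names in the cleaning step come from the flattening itself. As a minimal sketch, assuming pandas 1.0 or later (where `json_normalize` is available at the top level as `pd.json_normalize`), the following flattens two hypothetical items shaped like the entries under `items`; the venue names and coordinates are made up.

```python
import pandas as pd

# Hypothetical items, shaped like the entries under 'items' in the response.
items = [
    {'venue': {'name': 'Example Park',
               'categories': [{'name': 'Park'}],
               'location': {'lat': 40.72, 'lng': -111.85}}},
    {'venue': {'name': 'Example Cafe',
               'categories': [],
               'location': {'lat': 40.73, 'lng': -111.86}}},
]

# Nested dictionaries become dotted column names; lists are kept as lists.
flat = pd.json_normalize(items)
flat_columns = sorted(flat.columns)
print(flat_columns)
```

Because lists are left untouched, the `venue.categories` column still needs a function such as `get_category_type` to pull out a single label.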


And how many venues were returned by Foursquare?

In [20]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


<a id='item2'></a>

## 2. Explore Neighborhoods in Salt Lake City

#### Let's create a function to repeat the same process for all the neighborhoods in Salt Lake City

In [21]:
def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
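One fragile spot in this kind of loop is the `[0]` lookup on `categories`, which raises an `IndexError` for a venue with an empty category list. The sketch below isolates the row-building step in a hypothetical helper, `venue_rows`, that falls back to `None` in that case; the sample items are made up, and no request is sent.

```python
def venue_rows(name, lat, lng, items):
    """Turn one neighborhood's 'items' list into flat tuples (hypothetical helper)."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # Fall back to None instead of raising IndexError on an empty list.
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hypothetical items: the second venue has no category at all.
sample_items = [
    {'venue': {'name': 'Example Park',
               'location': {'lat': 40.721, 'lng': -111.851},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Unlabeled Spot',
               'location': {'lat': 40.722, 'lng': -111.852},
               'categories': []}},
]

rows = venue_rows('Example Hood', 40.72, -111.85, sample_items)
print(rows)
```

Keeping the parsing separate from the HTTP call also makes it possible to test the parsing without credentials or network access.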

#### Now write the code to run `getNearbyVenues` on each neighborhood and create a new dataframe called *slc_venues*.

In [22]:
# type your answer here

slc_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                             latitudes=neighborhoods['Latitude'],
                             longitudes=neighborhoods['Longitude'])



Sugar House
Bonneville Hills
Wasatch Hollow
Liberty-Wells
Glendale
East Central/East Liberty Park
Central City/Liberty-Wells
Foothill/Sunnyside
Sunnyside East Association
Yalecrest
Ball Park
East Central
Central City
Downtown
Poplar Grove
Rose Park
Greater Avenues
Jordan Meadows
Fairpark
Capitol Hill
East Bench
Westpointe


#### Let's check the size of the resulting dataframe

In [23]:
print(slc_venues.shape)
slc_venues.head()

(1487, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Sugar House | 40.720019 | -111.845836 | Sugar House Park | 40.723421 | -111.849704 | Park |
| 1 | Sugar House | 40.720019 | -111.845836 | Bruges Waffles & Frites | 40.720746 | -111.85817 | Belgian Restaurant |
| 2 | Sugar House | 40.720019 | -111.845836 | The Dodo Restaurant | 40.726118 | -111.852102 | American Restaurant |
| 3 | Sugar House | 40.720019 | -111.845836 | Spitz | 40.723462 | -111.856943 | Mediterranean Restaurant |
| 4 | Sugar House | 40.720019 | -111.845836 | Red Lobster | 40.720633 | -111.854114 | Seafood Restaurant |


Let's check how many venues were returned for each neighborhood.

In [24]:
slc_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Ball Park | 100 | 100 | 100 | 100 | 100 | 100 |
| Bonneville Hills | 78 | 78 | 78 | 78 | 78 | 78 |
| Capitol Hill | 5 | 5 | 5 | 5 | 5 | 5 |
| Central City | 100 | 100 | 100 | 100 | 100 | 100 |
| Central City/Liberty-Wells | 100 | 100 | 100 | 100 | 100 | 100 |
| Downtown | 100 | 100 | 100 | 100 | 100 | 100 |
| East Bench | 71 | 71 | 71 | 71 | 71 | 71 |
| East Central | 100 | 100 | 100 | 100 | 100 | 100 |
| East Central/East Liberty Park | 100 | 100 | 100 | 100 | 100 | 100 |
| Fairpark | 100 | 100 | 100 | 100 | 100 | 100 |


#### Let's find out how many unique categories can be curated from all the returned venues

In [25]:
print('There are {} unique categories.'.format(len(slc_venues['Venue Category'].unique())))

There are 195 unique categories.


<a id='item3'></a>

## 3. Analyze Each Neighborhood

In [26]:
# one hot encoding
slc_onehot = pd.get_dummies(slc_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
slc_onehot['Neighborhood'] = slc_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [slc_onehot.columns[-1]] + list(slc_onehot.columns[:-1])
slc_onehot = slc_onehot[fixed_columns]

slc_onehot.head()

Unnamed: 0,Zoo Exhibit,ATM,Accessories Store,Airport,Airport Service,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beer Bar,Belgian Restaurant,Big Box Store,Bike Trail,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Café,Camera Store,Candy Store,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Library,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dive Bar,Dog Run,Donut Shop,Dry Cleaner,Ethiopian Restaurant,Exhibit,Fast Food Restaurant,Festival,Food,Food Court,Food Stand,Food Truck,Fountain,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Garden Center,Gas Station,Gastropub,Gay Bar,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hawaiian Restaurant,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Insurance Office,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Kitchen Supply Store,Light Rail Station,Lingerie Store,Liquor Store,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motel,Movie Theater,Moving Target,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Optical Shop,Organic Grocery,Paper / Office Supplies Store,Park,Peruvian Restaurant,Pet Store,Pharmacy,Piercing Parlor,Pilates Studio,Pizza Place,Planetarium,Plaza,Pool,Print Shop,Pub,RV Park,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Restaurant,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Street Food Gathering,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Transportation Service,Tree,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wings Joint,Women's Store,Yoga Studio,Zoo
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Sugar House,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Sugar House,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Sugar House,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Sugar House,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Sugar House,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [27]:
slc_onehot.shape

(1487, 195)
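The column count equals the number of unique categories, and the header above lists `Neighborhood` between `Nail Salon` and `New American Restaurant`, with `Zoo Exhibit` moved to the front. That pattern suggests a venue category literally named "Neighborhood" exists in the data, so assigning the labels to `slc_onehot['Neighborhood']` appears to overwrite a category column instead of adding a new one. The following sketch reproduces the situation on hypothetical data and shows one possible way around it, using a distinct label column name (`Neighborhood Name`, an assumption made for this example) inserted at position 0.

```python
import pandas as pd

# Hypothetical venues; note one category is literally named 'Neighborhood'.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Park', 'Neighborhood', 'Zoo'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")
category_columns = list(onehot.columns)  # the category 'Neighborhood' is one of them

# Assigning labels to onehot['Neighborhood'] would replace that category column
# in place, so a distinct label name is inserted as the first column instead.
onehot.insert(0, 'Neighborhood Name', venues['Neighborhood'])
final_columns = list(onehot.columns)

print(category_columns)
print(final_columns)
```

With this layout the label column is guaranteed to come first, and the "Neighborhood" venue category keeps its own indicator column.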

#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [28]:
slc_grouped = slc_onehot.groupby('Neighborhood').mean().reset_index()
slc_grouped

Unnamed: 0,Neighborhood,Zoo Exhibit,ATM,Accessories Store,Airport,Airport Service,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beer Bar,Belgian Restaurant,Big Box Store,Bike Trail,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Café,Camera Store,Candy Store,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Library,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dive Bar,Dog Run,Donut Shop,Dry Cleaner,Ethiopian Restaurant,Exhibit,Fast Food Restaurant,Festival,Food,Food Court,Food Stand,Food Truck,Fountain,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Garden Center,Gas Station,Gastropub,Gay Bar,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hawaiian Restaurant,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Insurance Office,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Kitchen Supply Store,Light Rail Station,Lingerie Store,Liquor Store,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motel,Movie Theater,Moving Target,Multiplex,Museum,Music Store,Music Venue,Nail Salon,New American Restaurant,Nightclub,Noodle House,Office,Optical Shop,Organic Grocery,Paper / Office Supplies Store,Park,Peruvian Restaurant,Pet Store,Pharmacy,Piercing Parlor,Pilates Studio,Pizza Place,Planetarium,Plaza,Pool,Print Shop,Pub,RV Park,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Restaurant,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Street Food Gathering,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Transportation Service,Tree,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Ball Park,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.02,0.0,0.06,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.06,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.03,0.01,0.01,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.04,0.0,0.0,0.02,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.02,0.0,0.0,0.01,0.05,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.01
1,Bonneville Hills,0.089744,0.012821,0.0,0.0,0.0,0.012821,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.025641,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.012821,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.012821,0.012821,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.012821,0.012821,0.0,0.012821,0.025641,0.0,0.012821,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.012821,0.0,0.0,0.038462,0.0,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.012821,0.0,0.012821,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.012821,0.0,0.012821,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.012821,0.0,0.012821,0.051282,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.012821,0.0,0.0,0.0,0.0,0.012821,0.012821,0.0,0.012821,0.0,0.012821,0.0,0.0,0.0,0.012821,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.0,0.012821,0.012821,0.0,0.0,0.0,0.0,0.038462,0.012821
2,Capitol Hill,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Central City,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.03,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.04,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.02,0.01,0.0,0.03,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.03,0.0
4,Central City/Liberty-Wells,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.05,0.0,0.05,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.03,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.01,0.02,0.0,0.02,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.05,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01
5,Downtown,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.05,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.01,0.01,0.01,0.03,0.0,0.01,0.01,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.1,0.01,0.01,0.0,0.02,0.0,0.04,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0
6,East Bench,0.098592,0.014085,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.042254,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028169,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.014085,0.0,0.0,0.042254,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.028169,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.014085,0.0,0.0,0.014085,0.0,0.028169,0.0,0.0,0.0,0.0,0.0,0.0,0.042254,0.0,0.0,0.014085,0.0,0.014085,0.0,0.014085,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.028169,0.0,0.0,0.014085,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.070423,0.0,0.0,0.014085,0.0,0.0,0.056338,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042254,0.014085,0.0,0.0,0.0,0.0,0.014085,0.014085,0.0,0.014085,0.0,0.014085,0.0,0.014085,0.0,0.014085,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.014085,0.014085,0.0,0.0,0.0,0.0,0.014085,0.014085
7,East Central,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.01,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.06,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.04,0.03,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.04,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01
8,East Central/East Liberty Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.03,0.02,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.03,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.02,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.01
9,Fairpark,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.09,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.1,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.07,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's confirm the new size

In [29]:
slc_grouped.shape

(21, 195)

#### Let's print each neighborhood along with the top 5 most common venues

In [30]:
num_top_venues = 5

for hood in slc_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = slc_grouped[slc_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Ball Park----
             venue  freq
0      Coffee Shop  0.07
1          Brewery  0.06
2              Bar  0.06
3  Thai Restaurant  0.05
4            Hotel  0.04


----Bonneville Hills----
                venue  freq
0         Zoo Exhibit  0.09
1                Park  0.08
2         Pizza Place  0.05
3  Salon / Barbershop  0.04
4         Yoga Studio  0.04


----Capitol Hill----
                 venue  freq
0                Trail   0.8
1       Scenic Lookout   0.2
2          Zoo Exhibit   0.0
3  Peruvian Restaurant   0.0
4           Nail Salon   0.0


----Central City----
                           venue  freq
0                    Pizza Place  0.05
1                 Sandwich Place  0.04
2  Vegetarian / Vegan Restaurant  0.04
3                  Grocery Store  0.04
4                            Bar  0.04


----Central City/Liberty-Wells----
             venue  freq
0      Coffee Shop  0.07
1  Thai Restaurant  0.05
2              Bar  0.05
3           Bakery  0.05
4          Brewery  0

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [31]:
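def return_most_common_venues(row, num_top_venues):
    # drop the first entry (the neighborhood name) and keep only the frequencies
    row_categories = row.iloc[1:]
    # sort the venue categories from most to least frequent
    row_categories_sorted = row_categories.sort_values(ascending=False)

    # return the names of the top categories
    return row_categories_sorted.index.values[0:num_top_venues]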
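
To see what this function returns, here is a minimal sketch on a made-up one-row dataframe. The neighborhood name and the frequencies are hypothetical; only the function body comes from the cell above.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name), keep the frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical one-row frame shaped like slc_grouped
toy = pd.DataFrame({
    'Neighborhood': ['Demo Hood'],
    'Bar': [0.10],
    'Coffee Shop': [0.50],
    'Park': [0.30],
    'Zoo': [0.05],
})

top3 = return_most_common_venues(toy.iloc[0, :], 3)
print(list(top3))
```

Note that the function always returns `num_top_venues` names. When a neighborhood has fewer distinct categories than that, as Capitol Hill does above (only Trail and Scenic Lookout are non-zero), the remaining slots are filled with zero-frequency categories.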

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [32]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks past 3rd have no entry in indicators
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = slc_grouped['Neighborhood']

for ind in np.arange(slc_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(slc_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ball Park,Coffee Shop,Bar,Brewery,Thai Restaurant,Hotel,BBQ Joint,Chinese Restaurant,Sandwich Place,Indian Restaurant,Greek Restaurant
1,Bonneville Hills,Zoo Exhibit,Park,Pizza Place,Bank,Yoga Studio,Grocery Store,Salon / Barbershop,Bakery,Coffee Shop,Mexican Restaurant
2,Capitol Hill,Trail,Scenic Lookout,Zoo,Food Stand,Food Court,Food,Festival,Fast Food Restaurant,Exhibit,Ethiopian Restaurant
3,Central City,Pizza Place,Sandwich Place,Bar,Coffee Shop,Vegetarian / Vegan Restaurant,Grocery Store,Burger Joint,Yoga Studio,Greek Restaurant,Italian Restaurant
4,Central City/Liberty-Wells,Coffee Shop,Bakery,Bar,Thai Restaurant,Grocery Store,Asian Restaurant,Vegetarian / Vegan Restaurant,Brewery,Gift Shop,Hotel


<a id='item4'></a>

## 4. Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [33]:
# set number of clusters
kclusters = 5

slc_grouped_clustering = slc_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(slc_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 0, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
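
Nine of the first ten labels above fall in the same cluster, so it can help to count how many neighborhoods land in each cluster before interpreting them. The sketch below does this on a small synthetic matrix; the data and the name `kmeans_demo` are illustrative assumptions, not values from this lab.

```python
import numpy as np
from sklearn.cluster import KMeans

# synthetic frequency rows with two obvious groups (hypothetical data)
X = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.0, 0.2, 0.8],
    [0.1, 0.0, 0.9],
])

# n_init is set explicitly so the result does not depend on library defaults
kmeans_demo = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)

# number of rows assigned to each cluster label
sizes = np.bincount(kmeans_demo.labels_, minlength=2)
print(sizes)
```

Applied to this lab, the same idea would be `np.bincount(kmeans.labels_, minlength=kclusters)`. A result with one large cluster and several single-neighborhood clusters suggests the clustering is mostly separating out outliers.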

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [34]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

slc_merged = neighborhoods

# join neighborhoods with neighborhoods_venues_sorted to add the cluster label and top venues to each neighborhood's latitude/longitude
slc_merged = slc_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

slc_merged

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Sugar House,40.720019,-111.845836,1.0,Pizza Place,Mexican Restaurant,Burger Joint,Coffee Shop,Gym / Fitness Center,Park,Deli / Bodega,Soup Place,Bar,Sushi Restaurant
1,Bonneville Hills,40.737244,-111.829994,1.0,Zoo Exhibit,Park,Pizza Place,Bank,Yoga Studio,Grocery Store,Salon / Barbershop,Bakery,Coffee Shop,Mexican Restaurant
2,Wasatch Hollow,40.737587,-111.845218,1.0,Coffee Shop,Burger Joint,Grocery Store,Mexican Restaurant,Park,Pizza Place,Yoga Studio,Soup Place,Steakhouse,Bank
3,Liberty-Wells,40.733535,-111.879644,1.0,Coffee Shop,Bar,Chinese Restaurant,Pizza Place,Thai Restaurant,Sandwich Place,Tea Room,Breakfast Spot,Burger Joint,American Restaurant
4,Glendale,40.736957,-111.98453,1.0,Sandwich Place,Business Service,Fast Food Restaurant,Paper / Office Supplies Store,Deli / Bodega,Asian Restaurant,Furniture / Home Store,Gas Station,Bar,Office
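
The `Cluster Labels` column above is displayed as `1.0` rather than `1`. One likely explanation (an assumption, since the full table is not shown) is that at least one neighborhood had no venues and therefore no match in the join, which leaves a `NaN` and forces the column to float. The sketch below reproduces this on hypothetical stand-in frames and shows one way to clean it up.

```python
import pandas as pd

# hypothetical stand-ins for `neighborhoods` and `neighborhoods_venues_sorted`
hoods = pd.DataFrame({'Neighborhood': ['A', 'B', 'C'],
                      'Latitude': [40.70, 40.71, 40.72]})
labels = pd.DataFrame({'Neighborhood': ['A', 'B'],
                       'Cluster Labels': [1, 0]})

# left join: 'C' has no match, so its label becomes NaN
merged = hoods.join(labels.set_index('Neighborhood'), on='Neighborhood')
print(merged['Cluster Labels'].dtype)

# keep only labeled rows and restore integer labels
cleaned = merged.dropna(subset=['Cluster Labels']).copy()
cleaned['Cluster Labels'] = cleaned['Cluster Labels'].astype(int)
print(cleaned)
```

Integer labels matter later if they are used as list indices, for example when picking a marker color per cluster on a map.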


<a id='item5'></a>

## 5. Examine Clusters

Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster. I will leave this exercise to you.
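
One possible way to find the defining categories is to average the venue frequencies within each cluster and look at the largest values. This is a minimal sketch on made-up data; the frame `freq` and the list `cluster_labels` are hypothetical stand-ins for `slc_grouped` and `kmeans.labels_`.

```python
import pandas as pd

# hypothetical frequency table shaped like slc_grouped
freq = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Coffee Shop': [0.5, 0.4, 0.0, 0.1],
    'Trail':       [0.0, 0.1, 0.8, 0.7],
    'Bar':         [0.5, 0.5, 0.2, 0.2],
})
cluster_labels = [0, 0, 1, 1]  # hypothetical cluster assignment per row

# average venue-category frequency within each cluster
profile = (freq.drop(columns='Neighborhood')
               .assign(cluster=cluster_labels)
               .groupby('cluster')
               .mean())

# the category with the highest mean frequency in each cluster
top_category = profile.idxmax(axis=1)
print(top_category.to_dict())
```

A category that is high in one cluster and low in the others is a good candidate for naming that cluster.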

#### Cluster 1

In [35]:
slc_merged.loc[slc_merged['Cluster Labels'] == 0, slc_merged.columns[[1] + list(range(5, slc_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,40.807994,Scenic Lookout,Zoo,Food Stand,Food Court,Food,Festival,Fast Food Restaurant,Exhibit,Ethiopian Restaurant
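
In the output above, the positional selection `[1] + list(range(5, ...))` returns `Latitude` and starts at the 2nd most common venue, so neither the neighborhood name nor the 1st most common venue is displayed. If the intent is to show both, selecting columns by name may be more robust. The sketch below uses a hypothetical frame `demo` with the same leading columns as `slc_merged`.

```python
import pandas as pd

# hypothetical frame with the same leading columns as slc_merged
demo = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Latitude': [40.80, 40.72],
    'Longitude': [-111.90, -111.85],
    'Cluster Labels': [0.0, 1.0],
    '1st Most Common Venue': ['Trail', 'Pizza Place'],
    '2nd Most Common Venue': ['Scenic Lookout', 'Mexican Restaurant'],
})

# pick columns by name rather than by position
venue_cols = [c for c in demo.columns if c.endswith('Most Common Venue')]
cluster_0 = demo.loc[demo['Cluster Labels'] == 0, ['Neighborhood'] + venue_cols]
print(cluster_0)
```

The same pattern applies to the other clusters by changing the label in the comparison.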


#### Cluster 2

In [36]:
slc_merged.loc[slc_merged['Cluster Labels'] == 1, slc_merged.columns[[1] + list(range(5, slc_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,40.720019,Mexican Restaurant,Burger Joint,Coffee Shop,Gym / Fitness Center,Park,Deli / Bodega,Soup Place,Bar,Sushi Restaurant
1,40.737244,Park,Pizza Place,Bank,Yoga Studio,Grocery Store,Salon / Barbershop,Bakery,Coffee Shop,Mexican Restaurant
2,40.737587,Burger Joint,Grocery Store,Mexican Restaurant,Park,Pizza Place,Yoga Studio,Soup Place,Steakhouse,Bank
3,40.733535,Bar,Chinese Restaurant,Pizza Place,Thai Restaurant,Sandwich Place,Tea Room,Breakfast Spot,Burger Joint,American Restaurant
4,40.736957,Business Service,Fast Food Restaurant,Paper / Office Supplies Store,Deli / Bodega,Asian Restaurant,Furniture / Home Store,Gas Station,Bar,Office
5,40.745717,Yoga Studio,New American Restaurant,Breakfast Spot,Bakery,Grocery Store,Pizza Place,Thai Restaurant,Massage Studio,Mexican Restaurant
6,40.745703,Bakery,Bar,Thai Restaurant,Grocery Store,Asian Restaurant,Vegetarian / Vegan Restaurant,Brewery,Gift Shop,Hotel
7,40.745827,Zoo Exhibit,Hotel,Coffee Shop,Grocery Store,American Restaurant,Burger Joint,Gym / Fitness Center,Sandwich Place,Mexican Restaurant
8,40.747254,Zoo Exhibit,American Restaurant,Hotel,Trail,Pizza Place,Mexican Restaurant,Gym / Fitness Center,Grocery Store,Bakery
9,40.746267,Pizza Place,Grocery Store,Bakery,Park,Hotel,New American Restaurant,Gift Shop,Salon / Barbershop,Yoga Studio


#### Cluster 3

In [37]:
slc_merged.loc[slc_merged['Cluster Labels'] == 2, slc_merged.columns[[1] + list(range(5, slc_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,40.76034,American Restaurant,Gym / Fitness Center,Rental Car Location,Transportation Service,Bus Stop,Distribution Center,Gym,Business Service,Airport Service
17,40.777788,American Restaurant,Sandwich Place,Athletics & Sports,Food Court,Motel,Gas Station,Business Service,Bank,Bakery


#### Cluster 4

In [38]:
slc_merged.loc[slc_merged['Cluster Labels'] == 3, slc_merged.columns[[1] + list(range(5, slc_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,40.804781,Airport Service,Zoo,Dry Cleaner,Fountain,Food Truck,Food Stand,Food Court,Food,Festival


#### Cluster 5

In [39]:
slc_merged.loc[slc_merged['Cluster Labels'] == 4, slc_merged.columns[[1] + list(range(5, slc_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,40.801514,Park,Convenience Store,Discount Store,Grocery Store,Pet Store,Golf Course,Bank,Bakery,Food Court


## Conclusion

Based on this limited analysis, Sugar House would be the best neighborhood in Salt Lake City to locate a new technology company, followed by Liberty-Wells.