# Coursera Capstone Week Five / Final Project #
In this notebook I will use a dataset of QLD Mining towns to determine the types of venues that exist in the towns and then use clustering techniques to determine how towns are alike / different. This will be useful to Mining companies in helping them shape their employee value proposition to better attract and retain employees in remote towns. 

Completed by Bill Lovell Oct 2019

## Step one - get data and clean it ##

In [47]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

Solving environment: done


  current version: 4.5.11
  latest version: 4.7.12

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.



In [59]:

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
!conda install -c conda-forge geocoder --yes
import geocoder # import geocoder

!conda install -c conda-forge lxml --yes
!conda install -c conda-forge BeautifulSoup4 --yes

print('Libraries imported.')

Solving environment: done


  current version: 4.5.11
  latest version: 4.7.12

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.

Solving environment: done


  current version: 4.5.11
  latest version: 4.7.12

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.

Solving environment: done


  current version: 4.5.11
  latest version: 4.7.12

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.

Solving environment: done


  current version: 4.5.11
  latest version: 4.7.12

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.

Libraries imported.


In [72]:
import folium

pip install lxml
pip install BeautifulSoup4

In [147]:
#load the data and check its shape
data = pd.read_csv('mining towns.csv',header=0,encoding = 'unicode_escape')
data.shape

(28, 5)

In [148]:
data.head()

Unnamed: 0,Town,Latitude,Longitude,Population,Unnamed: 4
0,Toowoomba,-27.56056,151.95386,96567,
1,Chinchilla,-26.7383,150.6217,4780,
2,Miles,-26.6583,150.1872,1169,
3,Roma,-26.5694,148.7838,6905,
4,Dalby,-27.1944,151.266,10861,


In [149]:
df = data

In [150]:
#column Unamed:4 is wrong - so will drop that
df.drop(df.columns[df.columns.str.contains('unnamed',case = False)],axis = 1, inplace = True)

In [151]:
#checck the shape is still right, but two new columns#
df.shape

(28, 4)

In [152]:
df.head()

Unnamed: 0,Town,Latitude,Longitude,Population
0,Toowoomba,-27.56056,151.95386,96567
1,Chinchilla,-26.7383,150.6217,4780
2,Miles,-26.6583,150.1872,1169
3,Roma,-26.5694,148.7838,6905
4,Dalby,-27.1944,151.266,10861


## Step Two Get the venue data ##

In [153]:
print('The dataframe has {} Towns'.format(
        len(df['Town'].unique()),
        df.shape[0]
    )
)

The dataframe has 27 Towns


In [154]:
df.dtypes

Town           object
Latitude       object
Longitude     float64
Population      int64
dtype: object

In [155]:
df.Latitude = df.Latitude.astype(np.float64)
df.Population = df.Population.astype(np.float64)

In [156]:
df.dtypes

Town           object
Latitude      float64
Longitude     float64
Population    float64
dtype: object

In [157]:
df1 = df
address = 'Queensland, Australia'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Queensland are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Queensland are -21.9182856, 144.4588889.


In [158]:
# create map of QLD using latitude and longitude values
map_QLD = folium.Map(location=[latitude, longitude], zoom_start=10000)

# add markers to map
for lat, lng, Population, df1 in zip(df1['Latitude'], df1['Longitude'], df1['Population'], df1['Town']):
    label = '{}, {}'.format(df1, Population)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_QLD)  
    
map_QLD

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

Define Foursquare Credentials and Version

In [159]:

CLIENT_ID = 'FNS1LWQRG1YX14TDHYFXMI1R0R10STHTJ1YKF1E355BQREKO' # your Foursquare ID
CLIENT_SECRET = 'OLCSB5XKAN2OTHQV3GMFEOQNXKEUUFFQPII4VSCQ4LX30PLP' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: FNS1LWQRG1YX14TDHYFXMI1R0R10STHTJ1YKF1E355BQREKO
CLIENT_SECRET:OLCSB5XKAN2OTHQV3GMFEOQNXKEUUFFQPII4VSCQ4LX30PLP


In [160]:
df.head()

Unnamed: 0,Town,Latitude,Longitude,Population
0,Toowoomba,-27.56056,151.95386,96567.0
1,Chinchilla,-26.7383,150.6217,4780.0
2,Miles,-26.6583,150.1872,1169.0
3,Roma,-26.5694,148.7838,6905.0
4,Dalby,-27.1944,151.266,10861.0


In [161]:
df.dtypes

Town           object
Latitude      float64
Longitude     float64
Population    float64
dtype: object

Let's explore the first neighborhood in our dataframe.
Get the neighborhood's name.

In [162]:
df.loc[0, 'Town']

'Toowoomba'

In [163]:
#Get the neighborhood's latitude and longitude values.

neighborhood_latitude = df.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df.loc[0, 'Town'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Toowoomba are -27.56056, 151.95386000000002.


Now, let's get the top 100 venues that are in Town 1 within a radius of 5000 meters.
First, let's create the GET request URL. Name your URL url2.

In [164]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 5000 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL


'https://api.foursquare.com/v2/venues/explore?&client_id=FNS1LWQRG1YX14TDHYFXMI1R0R10STHTJ1YKF1E355BQREKO&client_secret=OLCSB5XKAN2OTHQV3GMFEOQNXKEUUFFQPII4VSCQ4LX30PLP&v=20180605&ll=-27.56056,151.95386000000002&radius=5000&limit=100'

In [165]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d982964492822002e047db6'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Toowoomba',
  'headerFullLocation': 'Toowoomba',
  'headerLocationGranularity': 'city',
  'totalResults': 84,
  'suggestedBounds': {'ne': {'lat': -27.515559954999954,
    'lng': 152.00452542109844},
   'sw': {'lat': -27.605560045000043, 'lng': 151.9031945789016}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '511d9721e4b0aceff5a5344c',
       'name': 'Ground Up Espresso Bar',
       'location': {'address': 'Searles Walk',
        'lat': -27.56281835696577,
        'lng': 151.95267104967976,
        'labeledLatLngs': [{'label': 'display',
          'lat': -27.5628183569

In [166]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a pandas dataframe.

In [167]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Ground Up Espresso Bar,Coffee Shop,-27.562818,151.952671
1,Phat Burgers,Burger Joint,-27.56315,151.953051
2,Engine Room Cafe,Breakfast Spot,-27.55697,151.95089
3,Empire Theatre,Theater,-27.562869,151.955711
4,The Spotted Cow,Steakhouse,-27.55569,151.95444


In [168]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

84 venues were returned by Foursquare.


Now do it for all of QLD

In [169]:
def getNearbyVenues(names, latitudes, longitudes, radius=5000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [170]:

QLD_venues = getNearbyVenues(names=df['Town'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Toowoomba
Chinchilla
Miles
Roma
Dalby
Gladstone
Rockhampton
Mackay
Moranbah
Dysart
Emerald
Blackwater
Clermont
Emerald
Barcladine
Blackall
Weipa
MT ISA
Winton
Clonclurry
Julia Creek
Prosipine
Quilpie
Middlemount
Mount Morgan
Collinsville
Charters Towers
Longreach


In [171]:
print(QLD_venues.shape)
QLD_venues.head()

(310, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Toowoomba,-27.56056,151.95386,Ground Up Espresso Bar,-27.562818,151.952671,Coffee Shop
1,Toowoomba,-27.56056,151.95386,Phat Burgers,-27.56315,151.953051,Burger Joint
2,Toowoomba,-27.56056,151.95386,Engine Room Cafe,-27.55697,151.95089,Breakfast Spot
3,Toowoomba,-27.56056,151.95386,Empire Theatre,-27.562869,151.955711,Theater
4,Toowoomba,-27.56056,151.95386,The Spotted Cow,-27.55569,151.95444,Steakhouse


In [174]:
#Let's check how many venues were returned for each neighborhood

QLD_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Barcladine,3,3,3,3,3,3
Blackall,3,3,3,3,3,3
Blackwater,6,6,6,6,6,6
Charters Towers,4,4,4,4,4,4
Chinchilla,5,5,5,5,5,5
Clermont,7,7,7,7,7,7
Clonclurry,4,4,4,4,4,4
Collinsville,3,3,3,3,3,3
Dalby,7,7,7,7,7,7
Dysart,3,3,3,3,3,3


In [175]:
#Let's find out how many unique categories can be curated from all the returned venues
print('There are {} uniques categories.'.format(len(QLD_venues['Venue Category'].unique())))

There are 84 uniques categories.


##Analyse each neighborhood##

In [176]:
# one hot encoding
tr_onehot = pd.get_dummies(QLD_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
tr_onehot['Neighborhood'] = QLD_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [tr_onehot.columns[-1]] + list(tr_onehot.columns[:-1])
tr_onehot = tr_onehot[fixed_columns]

tr_onehot.head()

Unnamed: 0,Neighborhood,Airport,Airport Lounge,Airport Terminal,American Restaurant,Art Gallery,Arts & Crafts Store,Athletics & Sports,Australian Restaurant,Bakery,Bar,Beach,Big Box Store,Bistro,Botanical Garden,Bowling Alley,Breakfast Spot,Brewery,Burger Joint,Business Service,Café,Campground,Chinese Restaurant,Coffee Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground,Department Store,Discount Store,Eastern European Restaurant,Electronics Store,Fast Food Restaurant,Fish & Chips Shop,Food & Drink Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,History Museum,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Italian Restaurant,Juice Bar,Liquor Store,Lounge,Mexican Restaurant,Motel,Mountain,Movie Theater,Multiplex,Museum,Paper / Office Supplies Store,Park,Pizza Place,Portuguese Restaurant,Pub,RV Park,Rental Car Location,Rental Service,Resort,Restaurant,River,Sandwich Place,Scenic Lookout,Shopping Mall,Sporting Goods Shop,Stadium,Steakhouse,Supermarket,Tapas Restaurant,Thai Restaurant,Theater,Train Station,Video Game Store,Water Park,Women's Store
0,Toowoomba,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Toowoomba,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Toowoomba,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Toowoomba,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
4,Toowoomba,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0


In [177]:
#the columns came out in a weird order - so create a new data frame with Neighborhood as the first column
cols = list(tr_onehot.columns.values)
cols

['Neighborhood',
 'Airport',
 'Airport Lounge',
 'Airport Terminal',
 'American Restaurant',
 'Art Gallery',
 'Arts & Crafts Store',
 'Athletics & Sports',
 'Australian Restaurant',
 'Bakery',
 'Bar',
 'Beach',
 'Big Box Store',
 'Bistro',
 'Botanical Garden',
 'Bowling Alley',
 'Breakfast Spot',
 'Brewery',
 'Burger Joint',
 'Business Service',
 'Café',
 'Campground',
 'Chinese Restaurant',
 'Coffee Shop',
 'Concert Hall',
 'Construction & Landscaping',
 'Convenience Store',
 'Cricket Ground',
 'Department Store',
 'Discount Store',
 'Eastern European Restaurant',
 'Electronics Store',
 'Fast Food Restaurant',
 'Fish & Chips Shop',
 'Food & Drink Shop',
 'Furniture / Home Store',
 'Garden',
 'Garden Center',
 'Gas Station',
 'Golf Course',
 'Grocery Store',
 'Gym',
 'Gym / Fitness Center',
 'Harbor / Marina',
 'Health & Beauty Service',
 'History Museum',
 'Hostel',
 'Hotel',
 'Hotel Bar',
 'Ice Cream Shop',
 'Italian Restaurant',
 'Juice Bar',
 'Liquor Store',
 'Lounge',
 'Mexican Re

In [178]:
tr_onehot.head()

Unnamed: 0,Neighborhood,Airport,Airport Lounge,Airport Terminal,American Restaurant,Art Gallery,Arts & Crafts Store,Athletics & Sports,Australian Restaurant,Bakery,Bar,Beach,Big Box Store,Bistro,Botanical Garden,Bowling Alley,Breakfast Spot,Brewery,Burger Joint,Business Service,Café,Campground,Chinese Restaurant,Coffee Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground,Department Store,Discount Store,Eastern European Restaurant,Electronics Store,Fast Food Restaurant,Fish & Chips Shop,Food & Drink Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,History Museum,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Italian Restaurant,Juice Bar,Liquor Store,Lounge,Mexican Restaurant,Motel,Mountain,Movie Theater,Multiplex,Museum,Paper / Office Supplies Store,Park,Pizza Place,Portuguese Restaurant,Pub,RV Park,Rental Car Location,Rental Service,Resort,Restaurant,River,Sandwich Place,Scenic Lookout,Shopping Mall,Sporting Goods Shop,Stadium,Steakhouse,Supermarket,Tapas Restaurant,Thai Restaurant,Theater,Train Station,Video Game Store,Water Park,Women's Store
0,Toowoomba,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Toowoomba,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Toowoomba,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Toowoomba,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
4,Toowoomba,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0


In [179]:
tr_onehot.shape

(310, 85)

In [181]:
#Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
tr_grouped = tr_onehot.groupby('Neighborhood').mean().reset_index()
tr_grouped

Unnamed: 0,Neighborhood,Airport,Airport Lounge,Airport Terminal,American Restaurant,Art Gallery,Arts & Crafts Store,Athletics & Sports,Australian Restaurant,Bakery,Bar,Beach,Big Box Store,Bistro,Botanical Garden,Bowling Alley,Breakfast Spot,Brewery,Burger Joint,Business Service,Café,Campground,Chinese Restaurant,Coffee Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground,Department Store,Discount Store,Eastern European Restaurant,Electronics Store,Fast Food Restaurant,Fish & Chips Shop,Food & Drink Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,History Museum,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Italian Restaurant,Juice Bar,Liquor Store,Lounge,Mexican Restaurant,Motel,Mountain,Movie Theater,Multiplex,Museum,Paper / Office Supplies Store,Park,Pizza Place,Portuguese Restaurant,Pub,RV Park,Rental Car Location,Rental Service,Resort,Restaurant,River,Sandwich Place,Scenic Lookout,Shopping Mall,Sporting Goods Shop,Stadium,Steakhouse,Supermarket,Tapas Restaurant,Thai Restaurant,Theater,Train Station,Video Game Store,Water Park,Women's Store
0,Barcladine,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Blackall,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Blackwater,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0
3,Charters Towers,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Chinchilla,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Clermont,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Clonclurry,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Collinsville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Dalby,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Dysart,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [182]:
#Let's print each neighborhood along with the top 5 most common venues

num_top_venues = 5

for hood in tr_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = tr_grouped[tr_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Barcladine----
               venue  freq
0            Airport  0.33
1         Campground  0.33
2  Food & Drink Shop  0.33
3              Motel  0.00
4        Pizza Place  0.00


----Blackall----
         venue  freq
0      Airport  0.33
1   Campground  0.33
2       Bistro  0.33
3          Pub  0.00
4  Pizza Place  0.00


----Blackwater----
                venue  freq
0               Motel  0.17
1                 Gym  0.17
2                Café  0.17
3  Athletics & Sports  0.17
4     Thai Restaurant  0.17


----Charters Towers----
                  venue  freq
0  Fast Food Restaurant  0.25
1                  Café  0.25
2                Resort  0.25
3           Gas Station  0.25
4               Airport  0.00


----Chinchilla----
                  venue  freq
0               Airport   0.2
1                 Motel   0.2
2         Grocery Store   0.2
3                Bakery   0.2
4  Fast Food Restaurant   0.2


----Clermont----
           venue  freq
0         Hostel  0.14
1           P

In [183]:
#put all that in a DF - sort by popularity

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [184]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = tr_grouped['Neighborhood']

for ind in np.arange(tr_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(tr_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(15)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barcladine,Airport,Food & Drink Shop,Campground,Airport Terminal,Fast Food Restaurant,Construction & Landscaping,Convenience Store,Cricket Ground,Department Store,Discount Store
1,Blackall,Airport,Bistro,Campground,Airport Terminal,Fast Food Restaurant,Construction & Landscaping,Convenience Store,Cricket Ground,Department Store,Discount Store
2,Blackwater,Café,Thai Restaurant,Gym,Motel,Athletics & Sports,Fast Food Restaurant,Garden,Department Store,Golf Course,Concert Hall
3,Charters Towers,Café,Gas Station,Resort,Fast Food Restaurant,Discount Store,Coffee Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground
4,Chinchilla,Airport,Bakery,Fast Food Restaurant,Motel,Grocery Store,Art Gallery,Concert Hall,Convenience Store,Cricket Ground,Department Store
5,Clermont,Grocery Store,Golf Course,Gas Station,Café,Hostel,Brewery,Park,Discount Store,Concert Hall,Construction & Landscaping
6,Clonclurry,Airport Terminal,Supermarket,Motel,Bakery,Women's Store,Electronics Store,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground
7,Collinsville,Grocery Store,Pub,Motel,Campground,Coffee Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground,Department Store
8,Dalby,Fast Food Restaurant,Gym / Fitness Center,Rental Car Location,Motel,Construction & Landscaping,Café,Arts & Crafts Store,Concert Hall,Convenience Store,Cricket Ground
9,Dysart,Pizza Place,Rental Car Location,Motel,Women's Store,Chinese Restaurant,Coffee Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground


## Cluster the neighborhoods ##

In [198]:
# set number of clusters
kclusters = 4

tr_grouped_clustering1 = tr_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(tr_grouped_clustering1)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 1, 1, 2, 3, 2, 2, 1, 2], dtype=int32)

In [200]:
#Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels3', kmeans.labels_)

tr_merged = df

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
tr_merged = tr_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Town')

tr_merged.head(15) # check the last columns!


Unnamed: 0,Town,Latitude,Longitude,Population,Cluster Labels3,Cluster Labels2,Cluster Labels1,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Toowoomba,-27.56056,151.95386,96567.0,1.0,0.0,2.0,Fast Food Restaurant,Café,Shopping Mall,Grocery Store,Sandwich Place,Supermarket,Department Store,Pizza Place,Gas Station,Pub
1,Chinchilla,-26.7383,150.6217,4780.0,2.0,0.0,0.0,Airport,Bakery,Fast Food Restaurant,Motel,Grocery Store,Art Gallery,Concert Hall,Convenience Store,Cricket Ground,Department Store
2,Miles,-26.6583,150.1872,1169.0,3.0,0.0,2.0,Grocery Store,Café,Hotel,Bar,Park,Eastern European Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground
3,Roma,-26.5694,148.7838,6905.0,1.0,0.0,2.0,Airport,Supermarket,Gas Station,Fast Food Restaurant,Bakery,Grocery Store,Arts & Crafts Store,Art Gallery,Construction & Landscaping,Convenience Store
4,Dalby,-27.1944,151.266,10861.0,1.0,0.0,2.0,Fast Food Restaurant,Gym / Fitness Center,Rental Car Location,Motel,Construction & Landscaping,Café,Arts & Crafts Store,Concert Hall,Convenience Store,Cricket Ground
5,Gladstone,-23.8416,151.2498,32073.0,1.0,0.0,2.0,Harbor / Marina,Grocery Store,Fast Food Restaurant,Sandwich Place,Furniture / Home Store,Multiplex,Hotel,Pizza Place,Pub,Rental Car Location
6,Rockhampton,-23.3791,150.51,61724.0,1.0,0.0,2.0,Fast Food Restaurant,Café,Supermarket,Coffee Shop,Pizza Place,Airport,Restaurant,Juice Bar,Multiplex,Hotel
7,Mackay,-21.1425,149.1821,74219.0,1.0,0.0,2.0,Fast Food Restaurant,Grocery Store,Coffee Shop,Shopping Mall,Multiplex,Sandwich Place,Gym / Fitness Center,Café,Mexican Restaurant,Concert Hall
8,Moranbah,-22.0028,148.0579,8626.0,1.0,0.0,2.0,Shopping Mall,Sandwich Place,Supermarket,Fast Food Restaurant,Chinese Restaurant,Coffee Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground
9,Dysart,-22.5881,148.3486,3003.0,2.0,1.0,4.0,Pizza Place,Rental Car Location,Motel,Women's Store,Chinese Restaurant,Coffee Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground


In [201]:
#there are Nan in the Clusters - drop them using Cluster Labels as a proxy
tr_merged = tr_merged.dropna(subset=['Cluster Labels3'])

In [202]:
tr_merged.head()

Unnamed: 0,Town,Latitude,Longitude,Population,Cluster Labels3,Cluster Labels2,Cluster Labels1,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Toowoomba,-27.56056,151.95386,96567.0,1.0,0.0,2.0,Fast Food Restaurant,Café,Shopping Mall,Grocery Store,Sandwich Place,Supermarket,Department Store,Pizza Place,Gas Station,Pub
1,Chinchilla,-26.7383,150.6217,4780.0,2.0,0.0,0.0,Airport,Bakery,Fast Food Restaurant,Motel,Grocery Store,Art Gallery,Concert Hall,Convenience Store,Cricket Ground,Department Store
2,Miles,-26.6583,150.1872,1169.0,3.0,0.0,2.0,Grocery Store,Café,Hotel,Bar,Park,Eastern European Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground
3,Roma,-26.5694,148.7838,6905.0,1.0,0.0,2.0,Airport,Supermarket,Gas Station,Fast Food Restaurant,Bakery,Grocery Store,Arts & Crafts Store,Art Gallery,Construction & Landscaping,Convenience Store
4,Dalby,-27.1944,151.266,10861.0,1.0,0.0,2.0,Fast Food Restaurant,Gym / Fitness Center,Rental Car Location,Motel,Construction & Landscaping,Café,Arts & Crafts Store,Concert Hall,Convenience Store,Cricket Ground


In [203]:
#Explore the clusters using a map
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(tr_merged['Latitude'], tr_merged['Longitude'], tr_merged['Town'], tr_merged['Cluster Labels3']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=7,
        popup= label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Examine the clusters

Cluster 1

In [204]:
tr_merged.loc[tr_merged['Cluster Labels3'] == 0, tr_merged.columns[[1] + list(range(5, tr_merged.shape[1]))]]

Unnamed: 0,Latitude,Cluster Labels2,Cluster Labels1,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,-23.5563,2.0,3.0,Airport,Food & Drink Shop,Campground,Airport Terminal,Fast Food Restaurant,Construction & Landscaping,Convenience Store,Cricket Ground,Department Store,Discount Store
15,-24.4167,2.0,3.0,Airport,Bistro,Campground,Airport Terminal,Fast Food Restaurant,Construction & Landscaping,Convenience Store,Cricket Ground,Department Store,Discount Store
22,-26.6167,2.0,3.0,Airport,Pub,Food & Drink Shop,River,Discount Store,Coffee Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground


Cluster 2

In [205]:
tr_merged.loc[tr_merged['Cluster Labels3'] == 1, tr_merged.columns[[1] + list(range(5, tr_merged.shape[1]))]]

Unnamed: 0,Latitude,Cluster Labels2,Cluster Labels1,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,-27.56056,0.0,2.0,Fast Food Restaurant,Café,Shopping Mall,Grocery Store,Sandwich Place,Supermarket,Department Store,Pizza Place,Gas Station,Pub
3,-26.5694,0.0,2.0,Airport,Supermarket,Gas Station,Fast Food Restaurant,Bakery,Grocery Store,Arts & Crafts Store,Art Gallery,Construction & Landscaping,Convenience Store
4,-27.1944,0.0,2.0,Fast Food Restaurant,Gym / Fitness Center,Rental Car Location,Motel,Construction & Landscaping,Café,Arts & Crafts Store,Concert Hall,Convenience Store,Cricket Ground
5,-23.8416,0.0,2.0,Harbor / Marina,Grocery Store,Fast Food Restaurant,Sandwich Place,Furniture / Home Store,Multiplex,Hotel,Pizza Place,Pub,Rental Car Location
6,-23.3791,0.0,2.0,Fast Food Restaurant,Café,Supermarket,Coffee Shop,Pizza Place,Airport,Restaurant,Juice Bar,Multiplex,Hotel
7,-21.1425,0.0,2.0,Fast Food Restaurant,Grocery Store,Coffee Shop,Shopping Mall,Multiplex,Sandwich Place,Gym / Fitness Center,Café,Mexican Restaurant,Concert Hall
8,-22.0028,0.0,2.0,Shopping Mall,Sandwich Place,Supermarket,Fast Food Restaurant,Chinese Restaurant,Coffee Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground
10,-23.5273,0.0,2.0,Fast Food Restaurant,Airport,Supermarket,Furniture / Home Store,Italian Restaurant,Discount Store,Shopping Mall,Gas Station,Video Game Store,Train Station
11,-23.5792,0.0,2.0,Café,Thai Restaurant,Gym,Motel,Athletics & Sports,Fast Food Restaurant,Garden,Department Store,Golf Course,Concert Hall
13,-23.5273,0.0,2.0,Fast Food Restaurant,Airport,Supermarket,Furniture / Home Store,Italian Restaurant,Discount Store,Shopping Mall,Gas Station,Video Game Store,Train Station


Cluster 3

In [206]:
tr_merged.loc[tr_merged['Cluster Labels3'] == 2, tr_merged.columns[[1] + list(range(5, tr_merged.shape[1]))]]

Unnamed: 0,Latitude,Cluster Labels2,Cluster Labels1,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,-26.7383,0.0,0.0,Airport,Bakery,Fast Food Restaurant,Motel,Grocery Store,Art Gallery,Concert Hall,Convenience Store,Cricket Ground,Department Store
9,-22.5881,1.0,4.0,Pizza Place,Rental Car Location,Motel,Women's Store,Chinese Restaurant,Coffee Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground
16,-12.6493,0.0,0.0,Campground,Motel,Australian Restaurant,Bakery,Fish & Chips Shop,Eastern European Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground
18,-22.3913,0.0,0.0,Motel,Campground,Art Gallery,Hotel,Hotel Bar,Bakery,Electronics Store,Construction & Landscaping,Convenience Store,Cricket Ground
19,-20.7069,0.0,0.0,Airport Terminal,Supermarket,Motel,Bakery,Women's Store,Electronics Store,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground
23,-22.8197,1.0,1.0,Motel,Scenic Lookout,Women's Store,Campground,Coffee Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground,Department Store
24,-23.6436,0.0,0.0,Grocery Store,Hotel Bar,RV Park,Bakery,Campground,Garden,Furniture / Home Store,Coffee Shop,Concert Hall,Golf Course
25,-20.55,0.0,2.0,Grocery Store,Pub,Motel,Campground,Coffee Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground,Department Store
27,-23.4405,0.0,0.0,Airport,Bakery,Grocery Store,Museum,RV Park,Campground,History Museum,Furniture / Home Store,Cricket Ground,Gas Station


Cluster 4

In [207]:
tr_merged.loc[tr_merged['Cluster Labels3'] == 3, tr_merged.columns[[1] + list(range(5, tr_merged.shape[1]))]]

Unnamed: 0,Latitude,Cluster Labels2,Cluster Labels1,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,-26.6583,0.0,2.0,Grocery Store,Café,Hotel,Bar,Park,Eastern European Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground
12,-22.8319,0.0,2.0,Grocery Store,Golf Course,Gas Station,Café,Hostel,Brewery,Park,Discount Store,Concert Hall,Construction & Landscaping
21,-20.405,0.0,2.0,Women's Store,Train Station,Coffee Shop,Grocery Store,Airport Terminal,Electronics Store,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground


Thanky you