# Location-Based Market Analysis to Identify the Optimal Business Location in Colombo

The notebook will include the following processes:

1. Scrape data related to Colombo suburbs and perform data pre-processing
2. Obtain location coordinates of the suburbs in Colombo using GeoPy
3. Obtain venue and other location related data using Foursquare API
4. Analyze and cluster neighborhoods in Colombo based on target market

## 1. Scrape data related to Colombo suburbs and perform data pre-processing

Import the required libraries

In [106]:
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim
import folium
import sweetviz as sv
from IPython.display import IFrame
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

Obtain the data from the HTML webpage

In [107]:
webpage = 'https://en.wikipedia.org/wiki/Postal_codes_in_Sri_Lanka'
data = requests.get(webpage).text
soup = BeautifulSoup(data, 'html.parser')

In [108]:
column_names = ['Province', 'District', 'City', 'Postal Code']
rows = []

# collect one record per table row, then build the dataframe once
# (DataFrame.append was removed in pandas 2.0)
for a in soup.find('table',{"class":"wikitable sortable"}).find_all('tr'):
    td = a.find_all('td')
    if len(td) > 0:
        province = td[0].text.replace('\n', '')
        district = td[1].text.replace('\n', '')
        city = td[2].text.replace('\n', '')
        postal_code = td[3].text.replace('\n', '')

        rows.append({'Province': province, 'District': district, 'City': city, 'Postal Code': postal_code})

df_colombo = pd.DataFrame(rows, columns=column_names)
df_colombo.head()

Unnamed: 0,Province,District,City,Postal Code
0,Eastern,Ampara,Ampara,32000
1,Eastern,Kattankudy,Kattankudy,30100
2,Eastern,Batticaloa,Batticaloa,30000
3,North Central,Anuradhapura,Anuradhapura,50000
4,Uva,Badulla,Badulla,90001


### Perform data pre-processing

Keep only the rows for the Colombo district and drop the Province column

In [109]:
df_colombo = df_colombo[df_colombo.District == 'Colombo'].reset_index(drop = True)
df_colombo.drop('Province', axis = 1, inplace = True)
df_colombo.head()

Unnamed: 0,District,City,Postal Code
0,Colombo,Battaramulla,10120
1,Colombo,Bambalapitiya,400
2,Colombo,Wellawatte,600
3,Colombo,Colpetty,300
4,Colombo,Narahenpita,500


Rename cities whose names have recently changed

In [110]:
# assign the result back instead of using inplace=True on a column selection
df_colombo['City'] = df_colombo['City'].replace({"Colpetty": "Kollupitiya", 'Hultsdorf': 'Aluthkade East'})

## 2. Obtain location coordinates of the suburbs in Colombo using GeoPy

Obtain the geographical coordinates of the suburbs using the GeoPy package and merge with the existing dataframe

In [111]:
# create the geocoder once and reuse it for every suburb
geolocator = Nominatim(user_agent="tor_explorer")
coordinate_rows = []
for city, district in zip(df_colombo['City'], df_colombo['District']):
    address = city + ', ' + district
    location = geolocator.geocode(address)
    if location is None:
        # geocode() returns None when no match is found; skip that suburb
        continue
    coordinate_rows.append({'District': district, 'City': city,
                            'Latitude': location.latitude, 'Longitude': location.longitude})

# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
df_coordinates = pd.DataFrame(coordinate_rows, columns=['District', 'City', 'Latitude', 'Longitude'])
df_coordinates.head()

Unnamed: 0,District,City,Latitude,Longitude
0,Colombo,Battaramulla,6.902181,79.919578
1,Colombo,Bambalapitiya,6.902486,79.854597
2,Colombo,Wellawatte,6.874384,79.859118
3,Colombo,Kollupitiya,6.913526,79.850813
4,Colombo,Narahenpita,6.905727,79.88213
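
The geocoding loop above depends on an external service, so it helps to plan for lookups that fail and to pace the requests. The following is a minimal sketch, not the notebook's own code: `geocode_all` and `fake_geocode` are hypothetical names, and the stand-in geocoder exists only so the example runs offline. It assumes the geocoder behaves like GeoPy's `geocode`, returning an object with `latitude` and `longitude`, or `None` when nothing matches.

```python
import time
from types import SimpleNamespace


def geocode_all(addresses, geocode, delay_seconds=1.0):
    """Geocode each address, pausing between calls and recording misses.

    `geocode` is any callable that returns an object with `latitude` and
    `longitude` attributes, or None when the address cannot be resolved.
    """
    found, missing = {}, []
    for i, address in enumerate(addresses):
        if i > 0:
            time.sleep(delay_seconds)  # pause between consecutive requests
        location = geocode(address)
        if location is None:
            missing.append(address)
        else:
            found[address] = (location.latitude, location.longitude)
    return found, missing


# Hypothetical stand-in for a real geocoder, so the sketch runs without network access.
def fake_geocode(address):
    known = {
        "Battaramulla, Colombo": SimpleNamespace(latitude=6.902181, longitude=79.919578),
    }
    return known.get(address)


found, missing = geocode_all(
    ["Battaramulla, Colombo", "Nowhere, Colombo"], fake_geocode, delay_seconds=0
)
print(found)
print(missing)
```

With the real service, `geolocator.geocode` could be passed in place of `fake_geocode`. The one-second default reflects the public Nominatim usage policy of at most one request per second. The `missing` list makes it easy to see which suburbs need a corrected name.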


Create a map of Colombo with neighborhoods superimposed on top

In [112]:
address = 'Colombo, Srilanka'
geolocator = Nominatim(user_agent="tor_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Colombo are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Colombo are 6.9349969, 79.8538463.


In [113]:
map_colombo = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, district, city in zip(df_coordinates['Latitude'], df_coordinates['Longitude'], df_coordinates['District'], df_coordinates['City']):
    label = '{}, {}'.format(city, district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_colombo)
map_colombo

## 3. Obtain venue and other location related data using Foursquare API

Initialize the Foursquare API credentials and obtain the venue data

In [114]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: [redacted]
CLIENT_SECRET: [redacted]


In [115]:
LIMIT = 100
radius = 500

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
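
`getNearbyVenues` indexes straight into the JSON response, so a neighborhood with no results, or a venue without a category, would raise an exception partway through the loop. A more defensive way to flatten one response could look like the sketch below. `extract_venue_rows` is a hypothetical helper, and `sample_payload` is a hand-built dictionary that only mirrors the nesting the function above relies on (`response` → `groups[0]` → `items` → `venue`); it is not a recorded API response.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Flatten one explore-style payload into tuples, tolerating missing fields."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        # Fall back to an empty category when the list is missing or empty.
        categories = venue.get("categories") or [{}]
        rows.append((
            name,
            lat,
            lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0].get("name"),
        ))
    return rows


# Hand-built payload reusing one venue from the notebook output for illustration.
sample_payload = {
    "response": {
        "groups": [
            {
                "items": [
                    {
                        "venue": {
                            "name": "Klassy",
                            "location": {"lat": 6.900579, "lng": 79.920607},
                            "categories": [{"name": "Bakery"}],
                        }
                    },
                    {"venue": {"name": "Uncategorized place", "location": {}}},
                ]
            }
        ]
    }
}

rows = extract_venue_rows("Battaramulla", 6.902181, 79.919578, sample_payload)
empty_rows = extract_venue_rows("Battaramulla", 6.902181, 79.919578, {})
print(rows)
print(empty_rows)
```

Missing values come back as `None`, which pandas treats as missing data, so incomplete venues can be filtered out after the dataframe is built instead of interrupting the download.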

In [116]:
colombo_venues = getNearbyVenues(names=df_coordinates['City'],
                                   latitudes=df_coordinates['Latitude'],
                                   longitudes=df_coordinates['Longitude']
                                  )

In [117]:
print('Venue dataset shape: ', colombo_venues.shape)
colombo_venues.head()

Venue dataset shape:  (388, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Battaramulla,6.902181,79.919578,Klassy,6.900579,79.920607,Bakery
1,Battaramulla,6.902181,79.919578,Arpico Super Center,6.902215,79.917139,Department Store
2,Battaramulla,6.902181,79.919578,Il Gelato Pelawatte,6.899955,79.921299,Food
3,Battaramulla,6.902181,79.919578,Dinemore,6.899238,79.922145,Fast Food Restaurant
4,Battaramulla,6.902181,79.919578,Pillawoos,6.902067,79.918643,Asian Restaurant


## 4. Analyze and cluster neighborhoods in Colombo based on target market

Perform exploratory data analysis of the obtained dataset

Check the number of venues for each neighborhood and the number of unique categories from all the returned venues

In [118]:
colombo_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Aluthkade East,6,6,6,6,6,6
Bambalapitiya,40,40,40,40,40,40
Battaramulla,11,11,11,11,11,11
Borella,23,23,23,23,23,23
Cinnamon Gardens,49,49,49,49,49,49
Dehiwala,17,17,17,17,17,17
Dematagoda,6,6,6,6,6,6
Fort,18,18,18,18,18,18
Grandpass,4,4,4,4,4,4
Kollupitiya,43,43,43,43,43,43


Visualize the dataset using Sweetviz and analyze the frequency of venues in each neighborhood

In [119]:
venue_report = sv.analyze(colombo_venues)
venue_report.show_html('Venue.html')
IFrame(src='Venue.html', width=1800, height=600)



Here we can see the distribution of venues across the neighborhoods in Colombo: Cinnamon Gardens has the most venues, followed by Kollupitiya and Bambalapitiya.

We can also see which venue types dominate the dataset: bakeries, clothing stores, cafés and Asian restaurants are the most common types in most neighborhoods.

In [120]:
print('There are {} unique categories.'.format(len(colombo_venues['Venue Category'].unique())))

There are 107 unique categories.


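Analyze each neighborhood: one-hot encode the venue categories, then average per neighborhood to get the relative frequency of each category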

In [121]:
colombo_onehot = pd.get_dummies(colombo_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
colombo_onehot['Neighborhood'] = colombo_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [colombo_onehot.columns[-1]] + list(colombo_onehot.columns[:-1])
colombo_onehot = colombo_onehot[fixed_columns]

print(colombo_onehot.shape)
colombo_onehot.head()

(388, 108)


Unnamed: 0,Neighborhood,Arcade,Art Gallery,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Badminton Court,Bakery,Bank,Bar,...,Sri Lankan Restaurant,Supermarket,Sushi Restaurant,Taco Place,Tea Room,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant,Women's Store
0,Battaramulla,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Battaramulla,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Battaramulla,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Battaramulla,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Battaramulla,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [122]:
colombo_group = colombo_onehot.groupby('Neighborhood').mean().reset_index()
colombo_group

Unnamed: 0,Neighborhood,Arcade,Art Gallery,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Badminton Court,Bakery,Bank,Bar,...,Sri Lankan Restaurant,Supermarket,Sushi Restaurant,Taco Place,Tea Room,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant,Women's Store
0,Aluthkade East,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0
1,Bambalapitiya,0.0,0.0,0.0,0.0,0.0,0.0,0.075,0.0,0.025,...,0.025,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0
2,Battaramulla,0.0,0.0,0.0,0.090909,0.0,0.0,0.454545,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Borella,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.043478,...,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.043478,0.043478
4,Cinnamon Gardens,0.0,0.040816,0.0,0.0,0.020408,0.020408,0.0,0.0,0.040816,...,0.020408,0.0,0.0,0.020408,0.0,0.020408,0.040816,0.0,0.0,0.020408
5,Dehiwala,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.058824
6,Dematagoda,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Fort,0.055556,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.055556,0.055556
8,Grandpass,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
9,Kollupitiya,0.0,0.0,0.023256,0.023256,0.0,0.0,0.0,0.0,0.0,...,0.069767,0.0,0.0,0.0,0.023256,0.0,0.0,0.023256,0.0,0.023256
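
To see why averaging the one-hot columns gives category frequencies, here is a small worked example on made-up data (the neighborhoods `A` and `B` and their venues are hypothetical). Each venue contributes a single 1 in its category column, so the per-neighborhood mean is the share of venues in that category and every row sums to 1.

```python
import pandas as pd

# Hypothetical toy data: four venues in neighborhood A, two in neighborhood B.
venues = pd.DataFrame(
    {
        "Neighborhood": ["A", "A", "A", "A", "B", "B"],
        "Venue Category": ["Bakery", "Bakery", "Café", "Hotel", "Hotel", "Café"],
    }
)

# One column per category, one row per venue.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of the indicator columns = share of venues per category.
freq = onehot.groupby("Neighborhood").mean()
print(freq)

# The same table can be obtained directly with a normalized cross-tabulation.
freq_crosstab = pd.crosstab(
    venues["Neighborhood"], venues["Venue Category"], normalize="index"
)
```

Because these values are shares, a neighborhood with very few venues (such as one with only four or six results) can show large frequencies from a single venue, which is worth keeping in mind when reading the clusters.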


Obtain the top 5 most common venues in each neighborhood

In [123]:
num_top_venues = 5

for hood in colombo_group['Neighborhood']:
    print("----"+hood+"----")
    temp = colombo_group[colombo_group['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Aluthkade East----
                           venue  freq
0  Vegetarian / Vegan Restaurant  0.17
1               Asian Restaurant  0.17
2                  Hot Dog Joint  0.17
3              Indian Restaurant  0.17
4         Furniture / Home Store  0.17


----Bambalapitiya----
                venue  freq
0               Hotel  0.10
1        Dessert Shop  0.08
2  Chinese Restaurant  0.08
3              Bakery  0.08
4         Coffee Shop  0.05


----Battaramulla----
               venue  freq
0             Bakery  0.45
1   Department Store  0.09
2   Asian Restaurant  0.09
3     Clothing Store  0.09
4  Electronics Store  0.09


----Borella----
            venue  freq
0  Clothing Store  0.13
1  Cosmetics Shop  0.09
2      Restaurant  0.09
3             Spa  0.04
4           Hotel  0.04


----Cinnamon Gardens----
         venue  freq
0         Café  0.12
1  Coffee Shop  0.08
2  Art Gallery  0.04
3      Theater  0.04
4          Gym  0.04


----Dehiwala----
            venue  freq
0  Cosme

### Create a dataframe to display the top 10 venues for each neighborhood

In [124]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [125]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
colombo_venues_new = pd.DataFrame(columns=columns)
colombo_venues_new['Neighborhood'] = colombo_group['Neighborhood']

for ind in np.arange(colombo_group.shape[0]):
    colombo_venues_new.iloc[ind, 1:] = return_most_common_venues(colombo_group.iloc[ind, :], num_top_venues)

colombo_venues_new.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aluthkade East,Fast Food Restaurant,Vegetarian / Vegan Restaurant,Asian Restaurant,Hot Dog Joint,Indian Restaurant,Furniture / Home Store,Women's Store,Cosmetics Shop,Cricket Ground,Department Store
1,Bambalapitiya,Hotel,Bakery,Chinese Restaurant,Dessert Shop,Coffee Shop,Restaurant,Thai Restaurant,Lingerie Store,Jewelry Store,Clothing Store
2,Battaramulla,Bakery,Fast Food Restaurant,Clothing Store,Asian Restaurant,Department Store,Food,Electronics Store,Women's Store,Cosmetics Shop,Cricket Ground
3,Borella,Clothing Store,Cosmetics Shop,Restaurant,Women's Store,Hotel,Convenience Store,Office,Mediterranean Restaurant,Pizza Place,Donut Shop
4,Cinnamon Gardens,Café,Coffee Shop,Art Gallery,Theater,Pub,Gym,Nightclub,Bar,Women's Store,Cocktail Bar


### Cluster and analyze the Colombo neighborhoods using K-means

The neighborhoods will be segmented into 3 different clusters

In [126]:
clustersize = 3

colombo_cluster = colombo_group.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=clustersize, random_state=0).fit(colombo_cluster)

kmeans.labels_[0:10]

array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])
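
The notebook fixes the number of clusters at 3 without stating how that value was chosen. One common heuristic for comparing candidate values is the silhouette score. The sketch below illustrates it on synthetic data with three well-separated groups. The data, the candidate range and the variable names are all assumptions made for the example, and nothing here was run on the Colombo dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight, well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in centers])

# Score each candidate k; a higher silhouette means tighter, better-separated clusters.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

On the real data, `X` would be `colombo_cluster`. With only a couple of dozen neighborhoods and more than a hundred sparse features, the scores may be noisy, so they are better treated as a sanity check than as a definitive answer.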

In [127]:
colombo_venues_new.insert(0, 'Cluster Labels', kmeans.labels_)

colombo_cluster_final = df_coordinates

colombo_cluster_final = colombo_cluster_final.join(colombo_venues_new.set_index('Neighborhood'), on='City')

colombo_cluster_final.head()

Unnamed: 0,District,City,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Colombo,Battaramulla,6.902181,79.919578,1.0,Bakery,Fast Food Restaurant,Clothing Store,Asian Restaurant,Department Store,Food,Electronics Store,Women's Store,Cosmetics Shop,Cricket Ground
1,Colombo,Bambalapitiya,6.902486,79.854597,0.0,Hotel,Bakery,Chinese Restaurant,Dessert Shop,Coffee Shop,Restaurant,Thai Restaurant,Lingerie Store,Jewelry Store,Clothing Store
2,Colombo,Wellawatte,6.874384,79.859118,0.0,Hotel,Fast Food Restaurant,Clothing Store,Seafood Restaurant,Asian Restaurant,Indian Restaurant,Café,Women's Store,Vegetarian / Vegan Restaurant,Ice Cream Shop
3,Colombo,Kollupitiya,6.913526,79.850813,0.0,Sri Lankan Restaurant,Coffee Shop,Pub,Fast Food Restaurant,Hotel,Food Court,Shopping Mall,Clothing Store,Movie Theater,Pizza Place
4,Colombo,Narahenpita,6.905727,79.88213,2.0,Golf Course,IT Services,Gift Shop,Women's Store,Fast Food Restaurant,Convenience Store,Cosmetics Shop,Cricket Ground,Department Store,Dessert Shop


### Visualize the resulting clusters generated by K-means

In [129]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(clustersize)
ys = [i + x + (i*x)**2 for i in range(clustersize)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
# neighborhoods with no venue data have a NaN cluster label after the join;
# draw them in gray instead of silently coloring them as cluster 0
for lat, lon, poi, cluster in zip(colombo_cluster_final['Latitude'], colombo_cluster_final['Longitude'], colombo_cluster_final['City'], colombo_cluster_final['Cluster Labels']):
    has_label = not pd.isna(cluster)
    cluster_text = str(int(cluster)) if has_label else 'n/a'
    marker_color = rainbow[int(cluster)] if has_label else 'gray'
    label = folium.Popup(str(poi) + ' Cluster ' + cluster_text, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=marker_color,
        fill=True,
        fill_color=marker_color,
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examine the generated clusters

The generated clusters can be analyzed and segmented based on their most common venue types

### Cluster 1 (cluster label 0)

In [130]:
colombo_cluster_final.loc[colombo_cluster_final['Cluster Labels'] == 0, colombo_cluster_final.columns[[1] + list(range(5, colombo_cluster_final.shape[1]))]]

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Bambalapitiya,Hotel,Bakery,Chinese Restaurant,Dessert Shop,Coffee Shop,Restaurant,Thai Restaurant,Lingerie Store,Jewelry Store,Clothing Store
2,Wellawatte,Hotel,Fast Food Restaurant,Clothing Store,Seafood Restaurant,Asian Restaurant,Indian Restaurant,Café,Women's Store,Vegetarian / Vegan Restaurant,Ice Cream Shop
3,Kollupitiya,Sri Lankan Restaurant,Coffee Shop,Pub,Fast Food Restaurant,Hotel,Food Court,Shopping Mall,Clothing Store,Movie Theater,Pizza Place
5,Borella,Clothing Store,Cosmetics Shop,Restaurant,Women's Store,Hotel,Convenience Store,Office,Mediterranean Restaurant,Pizza Place,Donut Shop
6,Cinnamon Gardens,Café,Coffee Shop,Art Gallery,Theater,Pub,Gym,Nightclub,Bar,Women's Store,Cocktail Bar
7,Dematagoda,Café,Department Store,Pool Hall,Supermarket,Bus Stop,Women's Store,Eye Doctor,Convenience Store,Cosmetics Shop,Cricket Ground
8,Fort,Platform,Market,Women's Store,Seafood Restaurant,Asian Restaurant,Bookstore,Casino,Department Store,Vegetarian / Vegan Restaurant,History Museum
9,Aluthkade East,Fast Food Restaurant,Vegetarian / Vegan Restaurant,Asian Restaurant,Hot Dog Joint,Indian Restaurant,Furniture / Home Store,Women's Store,Cosmetics Shop,Cricket Ground,Department Store
11,Maradana,Bus Station,Restaurant,Convenience Store,Asian Restaurant,Bookstore,Pool,Juice Bar,Diner,Movie Theater,Soccer Field
12,Grandpass,Gym / Fitness Center,Cricket Ground,Tea Room,Athletics & Sports,Flea Market,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop,Diner


### Cluster 2 (cluster label 1)

In [131]:
colombo_cluster_final.loc[colombo_cluster_final['Cluster Labels'] == 1, colombo_cluster_final.columns[[1] + list(range(5, colombo_cluster_final.shape[1]))]]

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battaramulla,Bakery,Fast Food Restaurant,Clothing Store,Asian Restaurant,Department Store,Food,Electronics Store,Women's Store,Cosmetics Shop,Cricket Ground
10,Kotahena,Supermarket,Asian Restaurant,Pizza Place,Multiplex,Bakery,Indian Restaurant,IT Services,Dessert Shop,Donut Shop,Dive Spot
17,Rajagiriya,Asian Restaurant,Pizza Place,Shopping Mall,Chinese Restaurant,Bakery,Italian Restaurant,Pharmacy,Café,Bus Station,Seafood Restaurant
20,Maharagama,Bus Station,Bakery,Supermarket,Women's Store,Movie Theater,Bookstore,Boutique,Burger Joint,Chinese Restaurant,Electronics Store
22,Mount Lavinia,Bakery,Coffee Shop,Indian Restaurant,Gym,Eye Doctor,Pakistani Restaurant,Fast Food Restaurant,Pizza Place,Supermarket,Mediterranean Restaurant


### Cluster 3 (cluster label 2)

In [132]:
colombo_cluster_final.loc[colombo_cluster_final['Cluster Labels'] == 2, colombo_cluster_final.columns[[1] + list(range(5, colombo_cluster_final.shape[1]))]]

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Narahenpita,Golf Course,IT Services,Gift Shop,Women's Store,Fast Food Restaurant,Convenience Store,Cosmetics Shop,Cricket Ground,Department Store,Dessert Shop


### Cluster Segmentation
The following observations were made upon examining the clusters:

1. The first cluster represents the consumer target market: almost all businesses are aimed at residential consumers, and the most common venues are restaurants, coffee shops, supermarkets, consumer stores and other recreational venues. This cluster is best suited to businesses that sell their end products or services to everyday consumers.

2. The second cluster combines the consumer market and the business market: restaurants and supermarkets sit alongside IT companies, electronics stores and other service companies, so some businesses cater to consumers and others to business users and industries. These neighborhoods are optimal if the business aims to sell its products or services to both target markets.

3. The third cluster represents the business and industrial target market: most businesses cater to other businesses or industries, and there are relatively few consumer-focused businesses compared with the other clusters.