# Strategic Location Recommendations for Opening a Coffee Shop Business in South Jakarta
Bryan Purba <br>
October 28, 2020

# Capstone Project - The Battle of Neighborhoods (Week 2)
## Applied Data Science Capstone by IBM / Coursera

## Table of contents
* [Introduction: Business Problem](###introduction)
* [Data](###data)
* [Methodology](###methodology)
* [Analysis](###analysis)
* [Results and Discussion](###results)
* [Conclusion](###conclusion)




## Introduction - Business Problem
<p>
South Jakarta is one of the municipalities of Jakarta. It is the wealthiest of them, with a large stock of housing for middle- and upper-class residents and a major business center. South Jakarta has 10 districts and a population of around 2,296,977 [1]. These characteristics make it a popular city for starting a business. But which kind of business is the best choice in 2020? There are many options, and a coffee shop may be the best one.

According to the [nowjakarta.co.id](http://nowjakarta.co.id) article "The Emerging Business of Coffee Shops in Indonesia" [2], Indonesia's coffee shop business has good prospects in 2020, with the market value of coffee shops in Indonesia estimated to reach IDR 4.8 trillion per year. The momentum started in 2016, when the coffee shop market grew very significantly, with Kopi Kenangan, Janji Jiwa, Fore, and Tuku considered the pioneering brands. Opening a coffee shop in 2020 still looks promising: an online survey of the young generation (Generations Y and Z) showed, among other results, that coffee-to-go shops offering quality ready-to-drink (RTD) coffee at affordable prices are in high demand among this generation, which currently dominates the population [2].

Therefore, as a resident of Indonesia who sees the potential of the coffee shop business in 2020, I decided to analyze one of the six key success factors: strategic location. In this project, we recommend the best candidate districts based on neighboring businesses, which can affect a coffee shop both negatively and positively. This recommendation can help stakeholders with further analysis before opening their coffee shop business.
</p>

In [61]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## Data Description
Here is how I acquired my dataset:

- First, I found the list of districts of South Jakarta on Wikipedia and manually created the dataset from that list [1].
- Next, I added the latitude and longitude of each district, obtained with geopy [4].
- After that, I added the nearby venues for each district, obtained with the Foursquare API [5].
- Next, to remove a side effect of the large search radius, I looked for duplicate venues based on the columns ['Venue', 'Venue Latitude', 'Venue Longitude'] and removed the duplicated row.
- Finally, I obtained a dataset that contains 750 venues across all districts of South Jakarta.
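
The de-duplication step in the list above can be sketched on a toy table. This is only an illustration under stated assumptions: the venue rows are made up, and `drop_duplicates` is used here as a general alternative to removing a row by its index.

```python
import pandas as pd

# Toy venue table (made-up rows): the same venue is returned for two
# neighbouring districts because their search radii overlap.
venues = pd.DataFrame({
    'Districs': ['A', 'A', 'B'],
    'Venue': ['Kopi X', 'Bakery Y', 'Kopi X'],
    'Venue Latitude': [-6.25, -6.26, -6.25],
    'Venue Longitude': [106.80, 106.81, 106.80],
})

key = ['Venue', 'Venue Latitude', 'Venue Longitude']

# Keep the first occurrence of each (name, latitude, longitude) triple.
deduplicated = venues.drop_duplicates(subset=key).reset_index(drop=True)
print(deduplicated)
```

With this approach the venue stays attached to whichever district lists it first, so the order of the districts decides which one keeps a shared venue.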

In [62]:
from google.colab import drive
drive.mount('/content/drive/')

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


In [63]:
list_of_subdistricts = "https://raw.githubusercontent.com/bryanpurba/Coursera_Capstone/master/districs_of_south%20jakarta.csv"
df_south_jakarta = pd.read_csv(list_of_subdistricts)
df_south_jakarta

Unnamed: 0,Districs
0,Cilandak
1,Jagakarsa
2,Kebayoran Lama
3,Kebayoran Baru
4,Mampang Prapatan
5,Pancoran
6,Pasar Minggu
7,Pesanggrahan
8,Setiabudi
9,Tebet


In [64]:
# create the geocoder once and reuse it for every district
geolocator = Nominatim(user_agent="ny_explorer")

latitude = []
longitude = []
for sd in df_south_jakarta['Districs']:
  address = '{}, South Jakarta'.format(sd)
  location = geolocator.geocode(address)
  latitude.append(location.latitude)
  longitude.append(location.longitude)
df_south_jakarta['Latitude'] = latitude
df_south_jakarta['Longitude'] = longitude

df_south_jakarta.head()

Unnamed: 0,Districs,Latitude,Longitude
0,Cilandak,-6.283818,106.804863
1,Jagakarsa,-6.330101,106.822237
2,Kebayoran Lama,-6.249128,106.777782
3,Kebayoran Baru,-6.243164,106.79985
4,Mampang Prapatan,-6.250878,106.823021
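
The geocoding loop assumes every address is found. geopy's `geocode` returns `None` when there is no match, so a more defensive variant might collect the misses instead of failing. The sketch below is an assumption-laden illustration: `geocode_districts` is a hypothetical helper, and a small stand-in lookup replaces the real geocoder so the example runs offline.

```python
from collections import namedtuple


def geocode_districts(names, geocode, suffix='South Jakarta'):
    """Return (found, missing): coordinates per district and names with no match.

    `geocode` is any callable that takes an address string and returns an
    object with `.latitude` and `.longitude`, or None when nothing is found.
    """
    found, missing = {}, []
    for name in names:
        location = geocode('{}, {}'.format(name, suffix))
        if location is None:
            missing.append(name)
        else:
            found[name] = (location.latitude, location.longitude)
    return found, missing


# Stand-in geocoder for illustration only; in the notebook this role would be
# played by the `geocode` method of the Nominatim geolocator.
Location = namedtuple('Location', ['latitude', 'longitude'])
fake_index = {'Cilandak, South Jakarta': Location(-6.283818, 106.804863)}

found, missing = geocode_districts(['Cilandak', 'Nowhere'], fake_index.get)
print(found)
print(missing)
```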


In [65]:
address = 'South Jakarta, Indonesia'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
map_south_jakarta = folium.Map(location=[latitude, longitude], zoom_start=12)

In [66]:

# add one marker per district to the map
for lat, lng, distric in zip(df_south_jakarta['Latitude'], df_south_jakarta['Longitude'], df_south_jakarta['Districs']):
    # zipping over the string "South Jakarta" would yield single characters, so use it as a constant
    label = '{}, {}'.format(distric, 'South Jakarta')
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=25,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_south_jakarta)  
    
map_south_jakarta

In [67]:
CLIENT_ID = 'your-client-ID' # your Foursquare ID (placeholder; keep real credentials out of the notebook)

CLIENT_SECRET = 'your-client-secret' # your Foursquare Secret (placeholder)
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: your-client-ID
CLIENT_SECRET:your-client-secret


In [68]:
def getNearbyVenues(names, latitudes, longitudes, radius=1200, limit=300):
    """Return a dataframe of the venues Foursquare lists within `radius` meters of each location."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            limit)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-district lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Districs',
                  'Latitude',
                  'Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
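
The parsing part of this function can be separated from the network call so that it can be checked offline. The sketch below is a hedged illustration, not the notebook's method: `extract_venues` and the `'Unknown'` fallback are hypothetical, the payload is hand-written to mimic only the fields the function reads, and whether real responses ever contain a venue without categories is an assumption.

```python
def extract_venues(district, payload):
    """Flatten one explore-style response into (district, name, lat, lng, category) rows."""
    rows = []
    for item in payload['response']['groups'][0]['items']:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to a placeholder when a venue carries no category.
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((
            district,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows


# Hand-written payload that mimics the fields accessed above (not real API output).
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Kopi X', 'location': {'lat': -6.28, 'lng': 106.80},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'No Category', 'location': {'lat': -6.29, 'lng': 106.81},
               'categories': []}},
]}]}}

rows = extract_venues('Cilandak', sample)
for row in rows:
    print(row)
```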

In [69]:
south_jakarta_venues = getNearbyVenues(names=df_south_jakarta['Districs'],
                                   latitudes=df_south_jakarta['Latitude'],
                                   longitudes=df_south_jakarta['Longitude']
                                  )
south_jakarta_venues.to_csv(r'south_jakarta_venues.csv')

Cilandak
Jagakarsa
Kebayoran Lama
Kebayoran Baru
Mampang Prapatan
Pancoran
Pasar Minggu
Pesanggrahan
Setiabudi
Tebet


In [70]:
south_jakarta_venues.head()

Unnamed: 0,Districs,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Cilandak,-6.283818,106.804863,Zap Permanent Hair Removal,-6.283846,106.807332,Health & Beauty Service
1,Cilandak,-6.283818,106.804863,Twin House Noodles & Beyond,-6.278725,106.804894,Noodle House
2,Cilandak,-6.283818,106.804863,Bulaf Cafe,-6.287187,106.801288,Café
3,Cilandak,-6.283818,106.804863,Sophie Authentique,-6.277665,106.801904,French Restaurant
4,Cilandak,-6.283818,106.804863,Apotek Aji Waras,-6.278053,106.806364,Pharmacy


In [71]:
df_duplicated = south_jakarta_venues[south_jakarta_venues.duplicated(subset=['Venue', 'Venue Latitude', 'Venue Longitude'], keep=False)]

In [72]:
# drop the duplicated row (index 532) identified via df_duplicated
venues_south_jakarta = south_jakarta_venues.drop(south_jakarta_venues.index[[532]]).reset_index(drop=True)
venues_south_jakarta.to_csv(r'venues_south_jakarta.csv')

## Methodology
In this project, we analyze the venues around each district center and cluster the districts by their venues. We focus on identifying neighboring venues and categorizing them as positive or negative. For simplicity, we assume that negative venues are other coffee shops and positive venues are all other venues.

In the first step, we collected data on venues within about 1200 meters of each district center. To understand the data better, we use descriptive statistics to count the venues and venue categories in each district.

In the second step, we split the venues into negative venues (coffee shops) and positive venues (all other venues). Plotting them on a map shows which districts are good candidates for opening a coffee shop, judged by the number of negative venues.

In the third step, we focus on the positive venues and cluster the districts by their most common venue categories. We then examine the characteristics of each cluster, which help describe the character of each district.

Finally, we combine all of this information (positive venues, negative venues, and the characteristics of each district), present it on a map, and give a list of recommended districts.
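
The negative/positive split described in the second step can be sketched on a toy table. This is a minimal illustration with made-up venues; only the column names and the 'Coffee Shop' category label follow the notebook.

```python
import pandas as pd

# Made-up venues for two districts, only to illustrate the split.
venues = pd.DataFrame({
    'Districs': ['A', 'A', 'A', 'B', 'B'],
    'Venue': ['Kopi X', 'Kopi Y', 'Bakery Z', 'Kopi W', 'Noodle V'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Bakery',
                       'Coffee Shop', 'Noodle House'],
})

is_coffee = venues['Venue Category'] == 'Coffee Shop'

# Count negative (coffee shop) and positive (everything else) venues per district.
summary = pd.DataFrame({
    'negative': venues[is_coffee].groupby('Districs')['Venue'].nunique(),
    'positive': venues[~is_coffee].groupby('Districs')['Venue'].nunique(),
}).fillna(0).astype(int)
print(summary)
```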

In [73]:
venues_south_jakarta.groupby('Districs').nunique()[['Venue', 'Venue Category']]

Districs,Venue,Venue Category
Cilandak,95,51
Jagakarsa,23,18
Kebayoran Baru,96,47
Kebayoran Lama,99,57
Mampang Prapatan,84,46
Pancoran,62,36
Pasar Minggu,38,25
Pesanggrahan,31,21
Setiabudi,92,52
Tebet,99,46


In [74]:
coffee_shops = venues_south_jakarta[venues_south_jakarta['Venue Category'] == 'Coffee Shop']
coffee_shops.groupby('Districs').nunique()[['Venue', 'Venue Category']]

Districs,Venue,Venue Category
Cilandak,13,1
Jagakarsa,1,1
Kebayoran Baru,10,1
Kebayoran Lama,6,1
Mampang Prapatan,6,1
Pancoran,7,1
Pasar Minggu,2,1
Pesanggrahan,1,1
Setiabudi,10,1
Tebet,7,1


In [75]:
positive_venues = venues_south_jakarta[venues_south_jakarta['Venue Category'] != 'Coffee Shop']
positive_venues.reset_index(drop=True).groupby('Districs').nunique()[['Venue', 'Venue Category']]

Districs,Venue,Venue Category
Cilandak,82,50
Jagakarsa,22,17
Kebayoran Baru,86,46
Kebayoran Lama,93,56
Mampang Prapatan,78,45
Pancoran,55,35
Pasar Minggu,36,24
Pesanggrahan,30,20
Setiabudi,82,51
Tebet,92,45


### Visualize negative venues for each district on the South Jakarta map

In [76]:
# mark each coffee shop (negative venue) with a yellow circle
for venue, lat, lng in zip(coffee_shops['Venue'], coffee_shops['Venue Latitude'], coffee_shops['Venue Longitude']):
  label = '{}'.format(venue)
  label = folium.Popup(label, parse_html=True)
  folium.CircleMarker(
      [lat, lng],
      radius=5,
      popup=label,
      color='yellow',
      fill=True,
      fill_color='#3186cc',
      fill_opacity=0.7,
      parse_html=False).add_to(map_south_jakarta)
map_south_jakarta

From this visualization and the counts above, Jagakarsa (1), Pesanggrahan (1), and Pasar Minggu (2) have fewer negative venues than the other districts.
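
Because the districts differ a lot in how many venues were returned, it can also help to look at coffee shops as a share of all venues. The sketch below reuses the per-district counts reported in the tables earlier in this section; treating that share as a rough density measure and using it to break ties are assumptions of this example, not part of the original analysis.

```python
# Unique venue counts and coffee shop counts per district, copied from the
# tables reported earlier in this section.
total_venues = {
    'Cilandak': 95, 'Jagakarsa': 23, 'Kebayoran Baru': 96, 'Kebayoran Lama': 99,
    'Mampang Prapatan': 84, 'Pancoran': 62, 'Pasar Minggu': 38,
    'Pesanggrahan': 31, 'Setiabudi': 92, 'Tebet': 99,
}
coffee_shops = {
    'Cilandak': 13, 'Jagakarsa': 1, 'Kebayoran Baru': 10, 'Kebayoran Lama': 6,
    'Mampang Prapatan': 6, 'Pancoran': 7, 'Pasar Minggu': 2,
    'Pesanggrahan': 1, 'Setiabudi': 10, 'Tebet': 7,
}

# Share of venues that are coffee shops, as a rough density measure.
coffee_share = {d: coffee_shops[d] / total_venues[d] for d in total_venues}

# Districts ordered from fewest to most coffee shops (ties broken by share).
ranking = sorted(total_venues, key=lambda d: (coffee_shops[d], coffee_share[d]))
for district in ranking:
    print('{:<18}{:>3}{:>8.1%}'.format(district, coffee_shops[district], coffee_share[district]))
```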

In [77]:
# mark each positive venue (everything except coffee shops) with a green circle
for venue, lat, lng in zip(positive_venues['Venue'], positive_venues['Venue Latitude'], positive_venues['Venue Longitude']):
  label = '{}'.format(venue)
  label = folium.Popup(label, parse_html=True)
  folium.CircleMarker(
      [lat, lng],
      radius=5,
      popup=label,
      color='green',
      fill=True,
      fill_color='#3186cc',
      fill_opacity=0.7,
      parse_html=False).add_to(map_south_jakarta)
map_south_jakarta

### Finding the most common venues for each district

In [78]:
# one hot encoding
positive_onehot = pd.get_dummies(positive_venues[['Venue Category']], prefix="", prefix_sep="")

# add district column back to dataframe
positive_onehot['Districs'] = positive_venues['Districs'] 

# move district column to the first column
fixed_columns = [positive_onehot.columns[-1]] + list(positive_onehot.columns[:-1])
positive_onehot = positive_onehot[fixed_columns]

positive_onehot.head()

Unnamed: 0,Districs,Acehnese Restaurant,Airport,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Automotive Shop,BBQ Joint,Bakery,Balinese Restaurant,Bar,Basketball Court,Basketball Stadium,Bistro,Bookstore,Boutique,Breakfast Spot,Bridal Shop,Bubble Tea Shop,Buffet,Burger Joint,Bus Line,Café,Camera Store,Campground,Capitol Building,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Comfort Food Restaurant,Concert Hall,Convenience Store,Cupcake Shop,Dance Studio,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Donut Shop,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gas Station,General Entertainment,German Restaurant,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Health & Beauty Service,High School,Hobby Shop,Hookah Bar,Hotel,Hotel Bar,Housing Development,Ice Cream Shop,Indian Restaurant,Indonesian Meatball Place,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Javanese Restaurant,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Lounge,Malay Restaurant,Manadonese Restaurant,Market,Massage Studio,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Modern European Restaurant,Motorcycle Shop,Movie Theater,Multiplex,Music Store,Music Venue,Nightclub,Noodle House,Other Great Outdoors,Padangnese Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Service,Pet Store,Pharmacy,Photography Studio,Pie Shop,Pizza Place,Pool,Pool Hall,Pub,Radio Station,Ramen Restaurant,Record Shop,Recording Studio,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Satay Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Shoe Store,Shopping Mall,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South American Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Steakhouse,Street Food Gathering,Sundanese Restaurant,Supermarket,Sushi Restaurant,Tailor Shop,Tech Startup,Thai Restaurant,Toy / Game Store,Train Station,Turkish Restaurant,Udon Restaurant,University,Vape Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Women's Store
0,Cilandak,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Cilandak,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Cilandak,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Cilandak,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Cilandak,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [79]:
positive_grouped = positive_onehot.groupby('Districs').mean().reset_index()
positive_grouped

Unnamed: 0,Districs,Acehnese Restaurant,Airport,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Automotive Shop,BBQ Joint,Bakery,Balinese Restaurant,Bar,Basketball Court,Basketball Stadium,Bistro,Bookstore,Boutique,Breakfast Spot,Bridal Shop,Bubble Tea Shop,Buffet,Burger Joint,Bus Line,Café,Camera Store,Campground,Capitol Building,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Comfort Food Restaurant,Concert Hall,Convenience Store,Cupcake Shop,Dance Studio,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Donut Shop,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gas Station,General Entertainment,German Restaurant,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Health & Beauty Service,High School,Hobby Shop,Hookah Bar,Hotel,Hotel Bar,Housing Development,Ice Cream Shop,Indian Restaurant,Indonesian Meatball Place,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Javanese Restaurant,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Lounge,Malay Restaurant,Manadonese Restaurant,Market,Massage Studio,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Modern European Restaurant,Motorcycle Shop,Movie Theater,Multiplex,Music Store,Music Venue,Nightclub,Noodle House,Other Great Outdoors,Padangnese Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Service,Pet Store,Pharmacy,Photography Studio,Pie Shop,Pizza Place,Pool,Pool Hall,Pub,Radio Station,Ramen Restaurant,Record Shop,Recording Studio,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Satay Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Shoe Store,Shopping Mall,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South American Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Steakhouse,Street Food Gathering,Sundanese Restaurant,Supermarket,Sushi Restaurant,Tailor Shop,Tech Startup,Thai Restaurant,Toy / Game Store,Train Station,Turkish Restaurant,Udon Restaurant,University,Vape Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Women's Store
0,Cilandak,0.0,0.0,0.0,0.0,0.0,0.0,0.048193,0.0,0.0,0.024096,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.024096,0.0,0.0,0.0,0.012048,0.0,0.072289,0.0,0.0,0.0,0.0,0.024096,0.012048,0.0,0.0,0.024096,0.0,0.0,0.0,0.0,0.0,0.024096,0.0,0.012048,0.036145,0.012048,0.0,0.0,0.012048,0.0,0.036145,0.048193,0.0,0.0,0.012048,0.0,0.0,0.0,0.024096,0.0,0.012048,0.024096,0.012048,0.012048,0.012048,0.0,0.012048,0.012048,0.0,0.0,0.0,0.024096,0.012048,0.0,0.048193,0.0,0.024096,0.012048,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.012048,0.012048,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.012048,0.024096,0.0,0.036145,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.024096,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.012048,0.0,0.0,0.012048,0.0,0.0,0.012048,0.0,0.036145,0.0,0.012048,0.0,0.012048,0.0,0.0,0.012048,0.012048,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.012048,0.0
1,Jagakarsa,0.041667,0.0,0.0,0.041667,0.0,0.0,0.125,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.041667,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Kebayoran Baru,0.0,0.0,0.0,0.0,0.0,0.0,0.022989,0.0,0.034483,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.011494,0.0,0.011494,0.022989,0.0,0.034483,0.011494,0.0,0.0,0.0,0.011494,0.011494,0.0,0.0,0.011494,0.011494,0.011494,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.068966,0.011494,0.011494,0.0,0.011494,0.0,0.0,0.011494,0.0,0.022989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.08046,0.022989,0.103448,0.011494,0.011494,0.0,0.045977,0.0,0.011494,0.011494,0.011494,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045977,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022989,0.0,0.022989,0.0,0.0,0.022989,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.011494,0.011494,0.011494,0.022989,0.0,0.0,0.011494,0.034483,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.011494
3,Kebayoran Lama,0.0,0.0,0.010753,0.010753,0.010753,0.0,0.032258,0.0,0.021505,0.064516,0.010753,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.010753,0.021505,0.010753,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.043011,0.021505,0.0,0.0,0.0,0.0,0.0,0.010753,0.043011,0.0,0.0,0.010753,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.010753,0.010753,0.010753,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.021505,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.032258,0.0,0.0,0.021505,0.010753,0.053763,0.010753,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.010753,0.021505,0.021505,0.0,0.0,0.021505,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.032258,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.010753,0.010753,0.021505,0.010753,0.010753,0.010753,0.010753,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.043011,0.0,0.0,0.0,0.010753,0.010753,0.010753,0.010753,0.0,0.0,0.0,0.010753,0.0,0.0,0.010753,0.010753,0.0,0.0
4,Mampang Prapatan,0.012658,0.012658,0.012658,0.0,0.012658,0.0,0.063291,0.0,0.0,0.037975,0.0,0.025316,0.0,0.0,0.025316,0.012658,0.0,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.012658,0.0,0.0,0.0,0.025316,0.0,0.012658,0.0,0.012658,0.0,0.012658,0.0,0.0,0.025316,0.025316,0.025316,0.0,0.0,0.0,0.0,0.0,0.050633,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025316,0.012658,0.025316,0.0,0.0,0.0,0.0,0.012658,0.037975,0.0,0.025316,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.025316,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.012658,0.037975,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.012658,0.012658,0.0,0.012658,0.0,0.101266,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037975,0.0,0.0,0.0,0.012658,0.0,0.025316,0.0,0.025316,0.0,0.0,0.0,0.012658,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.0,0.0
5,Pancoran,0.016949,0.0,0.0,0.016949,0.0,0.016949,0.050847,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.033898,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,0.0,0.101695,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.033898,0.0,0.0,0.0,0.033898,0.016949,0.033898,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.084746,0.0,0.0,0.016949,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,0.033898,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.033898,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.016949,0.0,0.0,0.0,0.0,0.016949,0.0,0.033898,0.033898,0.0,0.033898,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0
6,Pasar Minggu,0.0,0.0,0.025641,0.0,0.0,0.0,0.102564,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.025641,0.025641,0.051282,0.0,0.025641,0.0,0.025641,0.025641,0.0,0.0,0.0,0.153846,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.025641,0.025641,0.0,0.051282,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.102564,0.0,0.051282,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.051282,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Pesanggrahan,0.0,0.0,0.0,0.032258,0.032258,0.0,0.032258,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.032258,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.193548,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.096774,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Setiabudi,0.0,0.0,0.0,0.0,0.0,0.0,0.024096,0.012048,0.024096,0.0,0.0,0.024096,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.036145,0.0,0.0,0.036145,0.0,0.0,0.0,0.0,0.012048,0.012048,0.0,0.012048,0.012048,0.0,0.0,0.012048,0.0,0.024096,0.0,0.0,0.0,0.024096,0.0,0.0,0.0,0.024096,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.012048,0.0,0.0,0.0,0.0,0.0,0.096386,0.012048,0.0,0.012048,0.012048,0.012048,0.024096,0.036145,0.048193,0.024096,0.0,0.012048,0.012048,0.0,0.024096,0.0,0.0,0.0,0.012048,0.012048,0.0,0.012048,0.012048,0.0,0.0,0.024096,0.0,0.012048,0.012048,0.012048,0.0,0.0,0.0,0.0,0.024096,0.0,0.0,0.0,0.0,0.0,0.012048,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.036145,0.012048,0.0,0.0,0.0,0.0,0.0,0.024096,0.012048,0.024096,0.0,0.0,0.012048,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.024096,0.0
9,Tebet,0.0,0.0,0.0,0.010753,0.032258,0.0,0.086022,0.010753,0.010753,0.064516,0.0,0.010753,0.010753,0.010753,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.010753,0.0,0.053763,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.021505,0.0,0.0,0.0,0.021505,0.0,0.010753,0.021505,0.010753,0.021505,0.010753,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021505,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.021505,0.16129,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.021505,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.010753,0.010753,0.0,0.0,0.0,0.010753,0.010753,0.0,0.0,0.021505,0.0,0.010753,0.0,0.0,0.0,0.0,0.010753,0.021505,0.0,0.010753,0.0,0.010753,0.032258,0.0,0.0,0.0,0.010753,0.0,0.0,0.021505,0.0,0.0,0.0,0.0,0.021505,0.0,0.021505,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
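
Each value in the table above is the share of a district's positive venues that fall in a given category, because averaging 0/1 indicator columns yields a frequency. A minimal sketch on made-up data shows the mechanics; the two districts and their venues are invented for the example.

```python
import pandas as pd

# Made-up venues: district A has two bakeries and one pharmacy, B has one pharmacy.
venues = pd.DataFrame({
    'Districs': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Bakery', 'Bakery', 'Pharmacy', 'Pharmacy'],
})

# One indicator column per category, cast to int so the mean is numeric.
onehot = pd.get_dummies(venues['Venue Category']).astype(int)
onehot.insert(0, 'Districs', venues['Districs'])

# The mean of 0/1 indicators is the share of each category in the district.
grouped = onehot.groupby('Districs').mean()
print(grouped)
```

Since every venue has exactly one category in this encoding, the shares in each row add up to 1.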


In [80]:
num_top_venues = 20

for hood in positive_grouped['Districs']:
    print("----"+hood+"----")
    temp = positive_grouped[positive_grouped['Districs'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')


----Cilandak----
                    venue  freq
0                    Café  0.07
1       French Restaurant  0.05
2   Indonesian Restaurant  0.05
3        Asian Restaurant  0.05
4    Fast Food Restaurant  0.04
5              Steakhouse  0.04
6              Food Truck  0.04
7   Padangnese Restaurant  0.04
8                   Diner  0.02
9          Breakfast Spot  0.02
10             Restaurant  0.02
11         Ice Cream Shop  0.02
12     Chinese Restaurant  0.02
13                    Gym  0.02
14    Japanese Restaurant  0.02
15                 Bakery  0.02
16      Convenience Store  0.02
17           Noodle House  0.02
18      German Restaurant  0.02
19          Grocery Store  0.01


----Jagakarsa----
                    venue  freq
0   Indonesian Restaurant  0.12
1        Asian Restaurant  0.12
2                    Café  0.08
3              Food Court  0.08
4          Soccer Stadium  0.08
5     Acehnese Restaurant  0.04
6                   Hotel  0.04
7            Noodle House  0.04
8  

In [81]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
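
A short usage example may make the helper's contract clearer. The row below is made up, but it follows the layout of a row of `positive_grouped`: the district name first, then one frequency per category.

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the district name) and rank the rest by frequency.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# Made-up frequency row in the same layout as a row of `positive_grouped`.
row = pd.Series(
    ['Example District', 0.10, 0.50, 0.40],
    index=['Districs', 'Bakery', 'Indonesian Restaurant', 'Noodle House'],
)

top_two = list(return_most_common_venues(row, 2))
print(top_two)
```

When several categories share the same frequency, their relative order is decided by the sort and should not be read as a ranking.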

In [82]:
num_top_venues = 20

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Districs']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
positive_venues_sorted = pd.DataFrame(columns=columns)
positive_venues_sorted['Districs'] = positive_grouped['Districs']

for ind in np.arange(positive_grouped.shape[0]):
    positive_venues_sorted.iloc[ind, 1:] = return_most_common_venues(positive_grouped.iloc[ind, :], num_top_venues)

positive_venues_sorted.head()

Unnamed: 0,Districs,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,Cilandak,Café,Asian Restaurant,Indonesian Restaurant,French Restaurant,Food Truck,Steakhouse,Fast Food Restaurant,Padangnese Restaurant,Diner,Restaurant,Chinese Restaurant,Japanese Restaurant,Noodle House,Ice Cream Shop,German Restaurant,Gym,Breakfast Spot,Convenience Store,Bakery,Thai Restaurant
1,Jagakarsa,Indonesian Restaurant,Asian Restaurant,Soccer Stadium,Food Court,Café,Hotel,Lake,Noodle House,Other Great Outdoors,Park,Flea Market,Pharmacy,Department Store,Convenience Store,Acehnese Restaurant,BBQ Joint,Arcade,German Restaurant,Automotive Shop,Arts & Crafts Store
2,Kebayoran Baru,Japanese Restaurant,Indonesian Restaurant,Food Truck,Noodle House,Korean Restaurant,Sushi Restaurant,Hotel,BBQ Joint,Café,Steakhouse,Italian Restaurant,Seafood Restaurant,Salon / Barbershop,Burger Joint,Restaurant,Asian Restaurant,Gourmet Shop,Juice Bar,Javanese Restaurant,General Entertainment
3,Kebayoran Lama,Bakery,Japanese Restaurant,Dessert Shop,Chinese Restaurant,Steakhouse,Asian Restaurant,Ice Cream Shop,Café,Pizza Place,Korean Restaurant,Bubble Tea Shop,Seafood Restaurant,Indonesian Restaurant,Clothing Store,BBQ Joint,Multiplex,Noodle House,Hardware Store,Music Store,French Restaurant
4,Mampang Prapatan,Restaurant,Asian Restaurant,Food Truck,Noodle House,Bakery,Indonesian Restaurant,Snack Place,Japanese Restaurant,Steakhouse,Middle Eastern Restaurant,Hotel,Hobby Shop,Fast Food Restaurant,Farmers Market,Donut Shop,Spa,Bar,Comfort Food Restaurant,Bistro,Dessert Shop
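
The column labels above rely on an index lookup that falls back to 'th' once the list of suffixes runs out, which works for 20 columns but would label a 21st column "21th". A hypothetical helper, `ordinal`, is one way to make the labels general; it is a sketch and not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    # Numbers ending in 11, 12 or 13 take 'th' despite ending in 1, 2 or 3.
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


columns = ['Districs'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 21)]
print(columns[:5])
```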


Now that the dataframes are ready, we create clusters that help define the characteristics of each district.

In [83]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 3

positive_grouped_clustering = positive_grouped.drop(columns='Districs')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(positive_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 1, 0, 0, 0, 1, 1, 2, 0, 2], dtype=int32)
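
The value `kclusters = 3` above is set by hand. One rough way to sanity-check that choice is to compare silhouette scores for a few values of k. The sketch below is only an illustration: `demo` is a hypothetical random stand-in for `positive_grouped_clustering` (10 districts, 6 made-up venue categories), so the scores it produces say nothing about the real data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical stand-in for positive_grouped_clustering:
# 10 districts x 6 venue categories, each row normalized to sum to 1.
rng = np.random.default_rng(0)
demo = rng.random((10, 6))
demo = demo / demo.sum(axis=1, keepdims=True)

# Fit k-means for several candidate k values and record the silhouette score.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(demo)
    scores[k] = silhouette_score(demo, labels)

# The k with the highest silhouette score is one reasonable candidate.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

To apply this to the notebook, replace `demo` with `positive_grouped_clustering`. With only 10 districts the scores will be noisy, so they are best treated as a hint rather than a decision rule.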

In [84]:
# add clustering labels
positive_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

south_jakarta_merged = df_south_jakarta


# merge positive_venues_sorted with df_south_jakarta to add latitude/longitude for each district
south_jakarta_merged = south_jakarta_merged.join(positive_venues_sorted.set_index('Districs'), on='Districs')

south_jakarta_merged # check the last columns!

Unnamed: 0,Districs,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,Cilandak,-6.283818,106.804863,0,Café,Asian Restaurant,Indonesian Restaurant,French Restaurant,Food Truck,Steakhouse,Fast Food Restaurant,Padangnese Restaurant,Diner,Restaurant,Chinese Restaurant,Japanese Restaurant,Noodle House,Ice Cream Shop,German Restaurant,Gym,Breakfast Spot,Convenience Store,Bakery,Thai Restaurant
1,Jagakarsa,-6.330101,106.822237,1,Indonesian Restaurant,Asian Restaurant,Soccer Stadium,Food Court,Café,Hotel,Lake,Noodle House,Other Great Outdoors,Park,Flea Market,Pharmacy,Department Store,Convenience Store,Acehnese Restaurant,BBQ Joint,Arcade,German Restaurant,Automotive Shop,Arts & Crafts Store
2,Kebayoran Lama,-6.249128,106.777782,0,Bakery,Japanese Restaurant,Dessert Shop,Chinese Restaurant,Steakhouse,Asian Restaurant,Ice Cream Shop,Café,Pizza Place,Korean Restaurant,Bubble Tea Shop,Seafood Restaurant,Indonesian Restaurant,Clothing Store,BBQ Joint,Multiplex,Noodle House,Hardware Store,Music Store,French Restaurant
3,Kebayoran Baru,-6.243164,106.79985,0,Japanese Restaurant,Indonesian Restaurant,Food Truck,Noodle House,Korean Restaurant,Sushi Restaurant,Hotel,BBQ Joint,Café,Steakhouse,Italian Restaurant,Seafood Restaurant,Salon / Barbershop,Burger Joint,Restaurant,Asian Restaurant,Gourmet Shop,Juice Bar,Javanese Restaurant,General Entertainment
4,Mampang Prapatan,-6.250878,106.823021,0,Restaurant,Asian Restaurant,Food Truck,Noodle House,Bakery,Indonesian Restaurant,Snack Place,Japanese Restaurant,Steakhouse,Middle Eastern Restaurant,Hotel,Hobby Shop,Fast Food Restaurant,Farmers Market,Donut Shop,Spa,Bar,Comfort Food Restaurant,Bistro,Dessert Shop
5,Pancoran,-6.258085,106.842733,1,Convenience Store,Indonesian Restaurant,Asian Restaurant,Pizza Place,Fast Food Restaurant,Clothing Store,Bookstore,Steakhouse,Street Food Gathering,Supermarket,Music Venue,Noodle House,Food Court,Middle Eastern Restaurant,Food Truck,Café,Bubble Tea Shop,Padangnese Restaurant,Food Stand,Dim Sum Restaurant
6,Pasar Minggu,-6.29195,106.827835,1,Convenience Store,Indonesian Restaurant,Asian Restaurant,Noodle House,Food Truck,Café,Japanese Restaurant,Food Court,Breakfast Spot,Chinese Restaurant,Caribbean Restaurant,Restaurant,Campground,Food & Drink Shop,Bus Line,Burger Joint,Soup Place,Music Store,Gas Station,High School
7,Pesanggrahan,-6.255458,106.763112,2,Indonesian Restaurant,Pizza Place,Food Truck,Bakery,Café,Burger Joint,Soup Place,Gym,Convenience Store,Noodle House,Restaurant,Music Venue,Fried Chicken Joint,Shabu-Shabu Restaurant,Fruit & Vegetable Store,Steakhouse,Diner,Arcade,Art Gallery,Asian Restaurant
8,Setiabudi,-6.221706,106.826308,0,Hotel,Japanese Restaurant,Café,Buffet,Shopping Mall,Italian Restaurant,Dim Sum Restaurant,Bar,Wine Bar,Indonesian Restaurant,Lounge,Performing Arts Venue,Food Court,Spa,Steakhouse,Javanese Restaurant,Asian Restaurant,Multiplex,BBQ Joint,Fast Food Restaurant
9,Tebet,-6.226016,106.858396,2,Indonesian Restaurant,Asian Restaurant,Bakery,Café,Art Gallery,Karaoke Bar,Seafood Restaurant,Pizza Place,Steakhouse,Donut Shop,Dessert Shop,Middle Eastern Restaurant,Convenience Store,Restaurant,Soup Place,Indonesian Meatball Place,Fast Food Restaurant,Grocery Store,Sundanese Restaurant,Basketball Stadium


In [85]:
import matplotlib.cm as cm
import matplotlib.colors as colors

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(south_jakarta_merged['Latitude'], south_jakarta_merged['Longitude'], south_jakarta_merged['Districs'], south_jakarta_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=40,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(map_clusters)

# mark each positive venue with a green outline
for venue, lat, lng in zip(positive_venues['Venue'], positive_venues['Venue Latitude'], positive_venues['Venue Longitude']):
    label = folium.Popup(str(venue), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_clusters)

# mark each existing coffee shop with a yellow outline
for venue, lat, lng in zip(coffee_shops['Venue'], coffee_shops['Venue Latitude'], coffee_shops['Venue Longitude']):
    label = folium.Popup(str(venue), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='yellow',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

In [89]:
rainbow

['#8000ff', '#80ffb4', '#ff0000']
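
Because the map code indexes the palette with `rainbow[int(cluster-1)]`, cluster 0 picks up the last color in the list rather than the first. The short sketch below reproduces that lookup with the palette printed above, so the district circles on the map can be matched to cluster labels. The dictionary name `color_of_cluster` is introduced here for illustration only.

```python
# Palette copied from the notebook output above.
rainbow = ['#8000ff', '#80ffb4', '#ff0000']

# Same lookup as the map code: index cluster - 1, so cluster 0 wraps to the last entry.
color_of_cluster = {cluster: rainbow[cluster - 1] for cluster in range(len(rainbow))}

for cluster, color in color_of_cluster.items():
    print(f"Cluster {cluster}: {color}")
```

Under this indexing, cluster 0 is drawn in `#ff0000`, cluster 1 in `#8000ff`, and cluster 2 in `#80ffb4`.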

### Examine the Clusters

In [86]:
south_jakarta_merged.loc[south_jakarta_merged['Cluster Labels'] == 0, south_jakarta_merged.columns[[0] + list(range(4, south_jakarta_merged.shape[1]))]]

Unnamed: 0,Districs,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,Cilandak,Café,Asian Restaurant,Indonesian Restaurant,French Restaurant,Food Truck,Steakhouse,Fast Food Restaurant,Padangnese Restaurant,Diner,Restaurant,Chinese Restaurant,Japanese Restaurant,Noodle House,Ice Cream Shop,German Restaurant,Gym,Breakfast Spot,Convenience Store,Bakery,Thai Restaurant
2,Kebayoran Lama,Bakery,Japanese Restaurant,Dessert Shop,Chinese Restaurant,Steakhouse,Asian Restaurant,Ice Cream Shop,Café,Pizza Place,Korean Restaurant,Bubble Tea Shop,Seafood Restaurant,Indonesian Restaurant,Clothing Store,BBQ Joint,Multiplex,Noodle House,Hardware Store,Music Store,French Restaurant
3,Kebayoran Baru,Japanese Restaurant,Indonesian Restaurant,Food Truck,Noodle House,Korean Restaurant,Sushi Restaurant,Hotel,BBQ Joint,Café,Steakhouse,Italian Restaurant,Seafood Restaurant,Salon / Barbershop,Burger Joint,Restaurant,Asian Restaurant,Gourmet Shop,Juice Bar,Javanese Restaurant,General Entertainment
4,Mampang Prapatan,Restaurant,Asian Restaurant,Food Truck,Noodle House,Bakery,Indonesian Restaurant,Snack Place,Japanese Restaurant,Steakhouse,Middle Eastern Restaurant,Hotel,Hobby Shop,Fast Food Restaurant,Farmers Market,Donut Shop,Spa,Bar,Comfort Food Restaurant,Bistro,Dessert Shop
8,Setiabudi,Hotel,Japanese Restaurant,Café,Buffet,Shopping Mall,Italian Restaurant,Dim Sum Restaurant,Bar,Wine Bar,Indonesian Restaurant,Lounge,Performing Arts Venue,Food Court,Spa,Steakhouse,Javanese Restaurant,Asian Restaurant,Multiplex,BBQ Joint,Fast Food Restaurant


In [87]:
south_jakarta_merged.loc[south_jakarta_merged['Cluster Labels'] == 1, south_jakarta_merged.columns[[0] + list(range(4, south_jakarta_merged.shape[1]))]]


Unnamed: 0,Districs,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
1,Jagakarsa,Indonesian Restaurant,Asian Restaurant,Soccer Stadium,Food Court,Café,Hotel,Lake,Noodle House,Other Great Outdoors,Park,Flea Market,Pharmacy,Department Store,Convenience Store,Acehnese Restaurant,BBQ Joint,Arcade,German Restaurant,Automotive Shop,Arts & Crafts Store
5,Pancoran,Convenience Store,Indonesian Restaurant,Asian Restaurant,Pizza Place,Fast Food Restaurant,Clothing Store,Bookstore,Steakhouse,Street Food Gathering,Supermarket,Music Venue,Noodle House,Food Court,Middle Eastern Restaurant,Food Truck,Café,Bubble Tea Shop,Padangnese Restaurant,Food Stand,Dim Sum Restaurant
6,Pasar Minggu,Convenience Store,Indonesian Restaurant,Asian Restaurant,Noodle House,Food Truck,Café,Japanese Restaurant,Food Court,Breakfast Spot,Chinese Restaurant,Caribbean Restaurant,Restaurant,Campground,Food & Drink Shop,Bus Line,Burger Joint,Soup Place,Music Store,Gas Station,High School


In [88]:
south_jakarta_merged.loc[south_jakarta_merged['Cluster Labels'] == 2, south_jakarta_merged.columns[[0] + list(range(4, south_jakarta_merged.shape[1]))]]

Unnamed: 0,Districs,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
7,Pesanggrahan,Indonesian Restaurant,Pizza Place,Food Truck,Bakery,Café,Burger Joint,Soup Place,Gym,Convenience Store,Noodle House,Restaurant,Music Venue,Fried Chicken Joint,Shabu-Shabu Restaurant,Fruit & Vegetable Store,Steakhouse,Diner,Arcade,Art Gallery,Asian Restaurant
9,Tebet,Indonesian Restaurant,Asian Restaurant,Bakery,Café,Art Gallery,Karaoke Bar,Seafood Restaurant,Pizza Place,Steakhouse,Donut Shop,Dessert Shop,Middle Eastern Restaurant,Convenience Store,Restaurant,Soup Place,Indonesian Meatball Place,Fast Food Restaurant,Grocery Store,Sundanese Restaurant,Basketball Stadium


## Results and Discussion

In this project, we want to know which districts are better suited for opening a coffee shop business in terms of strategic location. The question is what characteristics a district needs in order to be considered a top candidate for further analysis by stakeholders. For this question, **the hypothesis is that a good district has nearby venues that attract large crowds of potential coffee buyers**.

From the negative venues, we know that the good candidates for opening a coffee shop business are Pesanggrahan, Pasar Minggu, and Jagakarsa. To find the best candidates among them, we need to analyze each cluster. Across all clusters, the dataset tells us that the most common venues in South Jakarta are restaurants.

First cluster (label 0): although restaurants are the dominant venues, this cluster has more venue variety than the other clusters. Does it have good characteristics for opening a coffee shop business? Yes, the best of the three: it contains venues that attract lots of people who are potential coffee buyers, for example Shopping Mall, Hobby Shop, Salon / Barbershop, Spa, and Gym.

Second cluster (label 1): restaurants are again the dominant venues, but there is less variety than in the first cluster. Its characteristics are still good for opening a coffee shop business, because it includes Soccer Stadium, Hotel, Bookstore, Clothing Store, Supermarket, Campground, High School, etc.

Third cluster (label 2): its characteristics are fairly good for opening a coffee shop. However, Pesanggrahan is weaker because its top 10 most common venues are mostly food businesses, and only Gym has a positive impact. Tebet is slightly better, with Art Gallery and Karaoke Bar giving more positive impact.
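
The comparison between Pesanggrahan and Tebet can be made more concrete by filtering out food-related categories from each district's top 10 venues. The sketch below is a rough illustration, not part of the original analysis: the venue lists are copied from the cluster table above, while the keyword list `EXCLUDED_KEYWORDS` and the helper `crowd_venues` are assumptions about what counts as a food or everyday-retail venue.

```python
# Top 10 most common venue categories, copied from the cluster table above.
top10 = {
    "Pesanggrahan": ["Indonesian Restaurant", "Pizza Place", "Food Truck", "Bakery", "Café",
                     "Burger Joint", "Soup Place", "Gym", "Convenience Store", "Noodle House"],
    "Tebet": ["Indonesian Restaurant", "Asian Restaurant", "Bakery", "Café", "Art Gallery",
              "Karaoke Bar", "Seafood Restaurant", "Pizza Place", "Steakhouse", "Donut Shop"],
}

# Assumption: categories containing any of these keywords are treated as
# food or everyday-retail venues and are not counted as crowd attractors.
EXCLUDED_KEYWORDS = ("Restaurant", "Pizza", "Food", "Bakery", "Café", "Burger",
                     "Soup", "Noodle", "Steakhouse", "Donut", "Convenience")

def crowd_venues(categories):
    """Return the categories that match none of the excluded keywords."""
    return [c for c in categories if not any(k in c for k in EXCLUDED_KEYWORDS)]

crowd_by_district = {district: crowd_venues(cats) for district, cats in top10.items()}
for district, venues in crowd_by_district.items():
    print(district, venues)
```

With this keyword list, Pesanggrahan keeps only Gym while Tebet keeps Art Gallery and Karaoke Bar, which matches the reading of the table given above. A different keyword list would change the counts.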

Based on this, we can rank the clusters from best to worst in terms of characteristics for opening a coffee shop business:

1. First Cluster
2. Second Cluster
3. Third Cluster

The good candidates obtained by filtering the districts on negative venues (coffee shop venues) are Pesanggrahan, Pasar Minggu, and Jagakarsa. None of them falls in the first cluster: Pasar Minggu and Jagakarsa belong to the second cluster, while Pesanggrahan belongs to the third. Therefore, based on the cluster analysis, we recommend Pasar Minggu and Jagakarsa as the best district candidates for further analysis.
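
The final selection step can be sketched in a few lines. This is a minimal illustration: the cluster labels are copied from the merged table, the ranks follow the ordered list above, and the names `cluster_of`, `cluster_rank`, and `recommended` are introduced here for illustration only.

```python
# Cluster label of each good candidate, copied from the merged table.
cluster_of = {"Pesanggrahan": 2, "Pasar Minggu": 1, "Jagakarsa": 1}

# Rank of each cluster label from the discussion (1 = best characteristics).
cluster_rank = {0: 1, 1: 2, 2: 3}

# Keep the candidates whose cluster has the best rank among the candidates.
best_rank = min(cluster_rank[label] for label in cluster_of.values())
recommended = sorted(d for d, label in cluster_of.items() if cluster_rank[label] == best_rank)
print(recommended)
```

No candidate sits in the top-ranked cluster, so the best available rank is the second cluster, which leaves Jagakarsa and Pasar Minggu.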

## Conclusion
In this project, we set out to recommend the best district candidates for the strategic location problem, based on neighboring businesses, to help stakeholders with further analysis before opening a coffee shop business. Using Wikipedia, geopy, and Foursquare, we successfully created a dataset of nearby venues within about 1,200 meters of each district. From the analysis, we recommend Pasar Minggu and Jagakarsa as the best district candidates for further analysis.

For the final decision, stakeholders need to consider the other key success factors:
1. Innovative product
2. Value for money while remaining competitive
3. Cosy and clean venue
4. Good marketing and innovative promotions
5. Innovation in selling and distribution