# Introduction
Bangladesh is a country of over 150 million people with a GDP of 250 billion US dollars. Since its independence from Pakistan in 1971, Bangladesh has moved from being an under-developed country to a developing one within a span of 30 years. As the economy has grown, and is still growing at a fast pace (a 600% increase in GDP within 60 years), the buying power of the general public is also increasing rapidly. One indicator of this is the rapid growth of restaurants around the country, especially in Dhaka, the capital city of Bangladesh. While it was hard to find a decent coffee place in the capital in the 1980s and 1990s, nowadays big chains such as Starbucks and Gloria Jean's are found at every corner of the city.

A statistical report by the Bangladesh Bureau of Statistics projects that the restaurant market in Bangladesh will reach 56 million USD, contributing around 2.1% of the total GDP of Bangladesh by the year 2021. Several social changes can be identified as reasons for this upward trend. As the economy grows, most people in the city are working and are looking for affordable options for their daily meals. At the same time, people under 25, who make up 50% of the total population, prefer the ever-growing fast food market. Another interesting reason can be attributed to the dating habits of the younger generation. Because Dhaka is the most densely populated city in the world, it is hard to find a calm and quiet place to spend time with loved ones. Restaurants provide an alternative where people can spend some quality time with their partners.

Dhaka, being the most densely populated city in the world, is a place where jobs are scarce compared to the number of people. That is why a great number of people are looking for entrepreneurship opportunities to make a living. The restaurant business, being one of the fastest-growing markets in Dhaka, is where people are trying to invest the most. In this report, I plan to identify people's preferences for different types of restaurants across the neighborhoods of Dhaka city. I believe this report will help young and upcoming entrepreneurs who are looking to invest in the restaurant business in their own neighborhood to decide on the type of restaurant they should invest in.


# Data Source 
The datasets I will use for this exercise are the following:
1.	List of postal codes in Bangladesh
The list of postal codes in Bangladesh is available on the following Wikipedia page. I will only use the postal codes for Dhaka district, as I plan to identify people's restaurant preferences around Dhaka district.
https://en.wikipedia.org/wiki/List_of_postal_codes_in_Bangladesh
2.	Foursquare data
I will use Foursquare data to find relevant information about the restaurants around Dhaka district, i.e., location, restaurant type, etc.

### Read table from Wikipedia using 'BeautifulSoup' and create a pandas DataFrame
In this section, I gather the data regarding postal codes in Dhaka district from the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_in_Bangladesh. Here are the steps:

#### 1. Import the relevant libraries

In [1]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed yet
import folium # map rendering library

from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

#### 2. Read the table from the wikipedia page
Using BeautifulSoup, I read the table containing the postal codes of Dhaka district from the Wikipedia page.

In [2]:
page = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_in_Bangladesh").text
soup = BeautifulSoup(page, 'lxml')

In [3]:
MyTable = soup.find('table',{'class':'wikitable'})

In [4]:
ths = MyTable.find_all('th')
headings = [th.text.strip() for th in ths]
headings

['District', 'Thana', 'SubOffice', 'Post Code']

In [5]:
rows = MyTable.find_all('tr')
data = []
for row in rows:
    data.append([t.text.strip() for t in row.find_all('td')])
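
The loop above appends one list per `<tr>`, so a row that contains only `<th>` cells (the header row) contributes an empty list. As a minimal sketch, using made-up rows rather than the live page, such rows can be dropped before the headings are paired with the cells. The helper name `rows_to_records` is hypothetical and not part of the original notebook.

```python
def rows_to_records(headings, data):
    """Pair each complete row with the headings, skipping malformed rows."""
    records = []
    for row in data:
        # Header rows contain only <th> cells, so their <td> list is empty.
        if len(row) != len(headings):
            continue
        records.append(dict(zip(headings, row)))
    return records


headings = ['District', 'Thana', 'SubOffice', 'Post Code']
# Illustrative rows only; the first (empty) entry mimics the header row.
data = [
    [],
    ['Dhaka', 'Dhanmondi', 'Jigatala TSO', '1209'],
    ['Dhaka', 'Gulshan', 'Banani TSO', '1213'],
]
records = rows_to_records(headings, data)
print(records[0]['Thana'])
```

A list of dictionaries like `records` can be passed directly to `pd.DataFrame`.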

#### 3. Transform the table into a pandas dataframe
In this section, I transform the wiki table into a dataframe. I also create a feature "ADDRESS" from the columns of the dataframe. This feature will be used to extract the latitude and longitude values of the locations.

In [6]:
df = pd.DataFrame(data, columns = headings)
dhaka = df[df['District'] == 'Dhaka'].reset_index(drop = True)
dhaka = dhaka.iloc[1:].reset_index(drop=True)
dhaka['ADDRESS'] = dhaka['Post Code'].astype(str) + ', ' + dhaka['Thana'] + ', ' + dhaka['District'] + ', Bangladesh'
dhaka.head(5)

Unnamed: 0,District,Thana,SubOffice,Post Code,ADDRESS
0,Dhaka,Dhamrai,Kalampur,1350,"01350, Dhamrai, Dhaka, Bangladesh"
1,Dhaka,Dhanmondi,Jigatala TSO,1209,"1209, Dhanmondi, Dhaka, Bangladesh"
2,Dhaka,Gulshan,Banani TSO,1213,"1213, Gulshan, Dhaka, Bangladesh"
3,Dhaka,Gulshan,Badda,1212,"1212, Gulshan, Dhaka, Bangladesh"
4,Dhaka,Gulshan,Gulshan Model Town,1212,"1212, Gulshan, Dhaka, Bangladesh"


### Collect Latitude and Longitude information for the locations
In this section, I collect the latitude and longitude of the locations saved as the feature "ADDRESS" in the dataframe "dhaka". I use the library "GeoPy" for this step. After collecting the information, I append it as new features to the original dataframe "dhaka".

To learn more about the process, see the following link: https://towardsdatascience.com/geocode-with-python-161ec1e62b89.

In [7]:
from geopy.extra.rate_limiter import RateLimiter
from geopy import Nominatim

locator = Nominatim(user_agent='myGeocoder')

# 1 - convenience function to add a delay between geocoding calls
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)

# 2 - create location column
dhaka['location'] = dhaka['ADDRESS'].apply(geocode)

# 3 - create a (latitude, longitude, altitude) tuple from the location column
dhaka['point'] = dhaka['location'].apply(lambda loc: tuple(loc.point) if loc else None)

# 4 - split point column into latitude, longitude and altitude columns
dhaka[['latitude', 'longitude', 'altitude']] = pd.DataFrame(dhaka['point'].tolist(), index=dhaka.index)


In [8]:
dhaka = dhaka.dropna(how='any',axis=0)
dhaka.head(100)

Unnamed: 0,District,Thana,SubOffice,Post Code,ADDRESS,location,point,latitude,longitude,altitude
0,Dhaka,Dhamrai,Kalampur,1350,"01350, Dhamrai, Dhaka, Bangladesh","(ধামরাই, ধামরাই উপজেলা, ঢাকা জেলা, ঢাকা বিভাগ,...","(23.920162, 90.2108702, 0.0)",23.920162,90.21087,0.0
1,Dhaka,Dhanmondi,Jigatala TSO,1209,"1209, Dhanmondi, Dhaka, Bangladesh","(ধানমন্ডি আ/এ, ঢাকা, ঢাকা জেলা, ঢাকা বিভাগ, 12...","(23.7535496, 90.37312384681817, 0.0)",23.75355,90.373124,0.0
2,Dhaka,Gulshan,Banani TSO,1213,"1213, Gulshan, Dhaka, Bangladesh","(গুলশান, ঢাকা, ঢাকা জেলা, ঢাকা বিভাগ, 1213, Ba...","(23.7833829, 90.4119635, 0.0)",23.783383,90.411963,0.0
3,Dhaka,Gulshan,Badda,1212,"1212, Gulshan, Dhaka, Bangladesh","(গুলশান, ঢাকা, ঢাকা জেলা, ঢাকা বিভাগ, 1212, Ba...","(23.7930078, 90.4106613, 0.0)",23.793008,90.410661,0.0
4,Dhaka,Gulshan,Gulshan Model Town,1212,"1212, Gulshan, Dhaka, Bangladesh","(গুলশান, ঢাকা, ঢাকা জেলা, ঢাকা বিভাগ, 1212, Ba...","(23.7930078, 90.4106613, 0.0)",23.793008,90.410661,0.0
5,Dhaka,Jatrabari,Dhania TSO,1236,"1236, Jatrabari, Dhaka, Bangladesh","(যাত্রাবাড়ী, ঢাকা, ঢাকা জেলা, ঢাকা বিভাগ, 123...","(23.7104228, 90.4344666, 0.0)",23.710423,90.434467,0.0
6,Dhaka,Joypara,Joypara,1331,"1331, Joypara, Dhaka, Bangladesh","(joypara, দোহার, দোহার উপজেলা, ঢাকা জেলা, ঢাকা...","(23.6075985, 90.1249616, 0.0)",23.607599,90.124962,0.0
7,Dhaka,Joypara,Narisha,1332,"1332, Joypara, Dhaka, Bangladesh","(joypara, দোহার, দোহার উপজেলা, ঢাকা জেলা, ঢাকা...","(23.6075985, 90.1249616, 0.0)",23.607599,90.124962,0.0
8,Dhaka,Joypara,Palamganj,1331,"1331, Joypara, Dhaka, Bangladesh","(joypara, দোহার, দোহার উপজেলা, ঢাকা জেলা, ঢাকা...","(23.6075985, 90.1249616, 0.0)",23.607599,90.124962,0.0
9,Dhaka,Keraniganj,Ati,1312,"1312, Keraniganj, Dhaka, Bangladesh","(ভূঁইয়া এস্টেট, কেরাণীগঞ্জ উপজেলা, ঢাকা জেলা,...","(23.72495955, 90.34308827838592, 0.0)",23.72496,90.343088,0.0
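
Several sub-offices share the same "ADDRESS" string (for example, "1212, Gulshan, Dhaka, Bangladesh" appears for both Badda and Gulshan Model Town), so the same query is geocoded more than once. The sketch below shows one possible way to look up each distinct address only once. The function `fake_geocode` is a hypothetical stand-in for the rate-limited `geocode` call, with coordinates copied from the table above, and `geocode_unique` is an illustrative helper rather than part of GeoPy.

```python
def geocode_unique(addresses, geocode_fn):
    """Call geocode_fn once per distinct address and reuse the result."""
    cache = {}
    results = []
    for address in addresses:
        if address not in cache:
            cache[address] = geocode_fn(address)
        results.append(cache[address])
    return results


calls = []


def fake_geocode(address):
    # Hypothetical stand-in for the real geocoder: records each lookup.
    calls.append(address)
    known = {
        '1212, Gulshan, Dhaka, Bangladesh': (23.7930078, 90.4106613),
        '1213, Gulshan, Dhaka, Bangladesh': (23.7833829, 90.4119635),
    }
    return known.get(address)  # None models a failed lookup


addresses = [
    '1213, Gulshan, Dhaka, Bangladesh',
    '1212, Gulshan, Dhaka, Bangladesh',
    '1212, Gulshan, Dhaka, Bangladesh',
]
points = geocode_unique(addresses, fake_geocode)
print(len(calls), 'lookups for', len(addresses), 'rows')
```

Keeping one result per distinct address also makes it easier to notice that later venue queries for repeated addresses return the same venues more than once.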


# Data Visualization
### Create initial map of Dhaka district
Now that I have all the relevant data for this exercise, it is time to create some initial visualizations of the spatial data in the dataframe. This section creates a folium map where the addresses are shown as blue markers. The map gives a general idea of how the locations are distributed in space.

It can easily be seen that most of the locations are concentrated around the Dhaka metropolitan area. As we move away from the metro area, the locations become sparser. This observation is intuitive.

In [9]:
# Get the latitude and longitude of Dhaka
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
address = 'Dhaka, Bangladesh'

geolocator = Nominatim(user_agent="dhaka_explorer")
location = geolocator.geocode(address)
latitude_dh = location.latitude
longitude_dh = location.longitude
print('The geographical coordinates of Dhaka are {}, {}.'.format(latitude_dh, longitude_dh))

The geographical coordinates of Dhaka are 23.7593572, 90.3788136.


In [10]:
# create map of Dhaka using latitude and longitude values
map_dhaka = folium.Map(location=[latitude_dh, longitude_dh], zoom_start=10)

# add markers to map
for lat, lng, addresses in zip(dhaka['latitude'], dhaka['longitude'], dhaka['ADDRESS']):
    label = '{}'.format(addresses)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_dhaka)  
    
map_dhaka
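
The remark that locations thin out away from the metro area can also be checked numerically. The following is a minimal sketch, assuming the haversine formula with a mean Earth radius of 6371 km, that measures how far two of the geocoded locations lie from the city centre. The helper `haversine_km` is illustrative and not part of the notebook; the coordinates are copied from the outputs above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# City centre as printed by the geocoding cell above.
centre = (23.7593572, 90.3788136)
# Two locations copied from the geocoded table.
locations = {
    'Dhanmondi': (23.7535496, 90.37312384681817),
    'Dhamrai': (23.920162, 90.2108702),
}
distances = {
    name: haversine_km(*centre, *point) for name, point in locations.items()
}
for name, km in sorted(distances.items(), key=lambda item: item[1]):
    print(f'{name}: {km:.1f} km from the centre')
```

The same distance could be compared against the `radius` used in the venue search to judge how much of each neighborhood a single query covers.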

### Collect venues data using Foursquare
Now, I collect the venue data around the locations in the dataframe 'dhaka' using the "Foursquare" API.

In [11]:
# Foursquare credentials (redacted here; never publish real keys)
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET:YOUR_FOURSQUARE_CLIENT_SECRET


In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
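
The function above indexes straight into the JSON response, so a response without `groups`, or a venue with an empty `categories` list, would raise a `KeyError` or `IndexError`. The sketch below shows one possible defensive version of the extraction step, run on hand-written payloads shaped like the fields the function reads. The helper `extract_venues`, the `'Uncategorized'` placeholder and the second sample venue are assumptions made for illustration.

```python
def extract_venues(payload, name, lat, lng):
    """Turn one explore-style response into flat venue rows.

    Missing keys yield fewer rows, and an empty category list yields a
    placeholder category, instead of raising KeyError or IndexError.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        # 'Uncategorized' is an assumed placeholder, not a Foursquare value.
        category = categories[0]['name'] if categories else 'Uncategorized'
        rows.append((name, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows


# Hand-written payload; the first venue is copied from the output table,
# the second is invented to exercise the empty-category branch.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': "Nando's",
               'location': {'lat': 23.753045, 'lng': 90.369766},
               'categories': [{'name': 'Portuguese Restaurant'}]}},
    {'venue': {'name': 'Unnamed Stall',
               'location': {'lat': 23.7530, 'lng': 90.3700},
               'categories': []}},
]}]}}

rows = extract_venues(sample, '1209, Dhanmondi, Dhaka, Bangladesh',
                      23.753550, 90.373124)
empty = extract_venues({'response': {}}, 'anywhere', 0.0, 0.0)
print(len(rows), len(empty))
```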

In [13]:
# collect venues for every address in 'dhaka' (at most LIMIT venues per address)
LIMIT = 100
dhaka_venues = getNearbyVenues(names=dhaka['ADDRESS'],
                                   latitudes=dhaka['latitude'],
                                   longitudes=dhaka['longitude']
                                  )

01350, Dhamrai, Dhaka, Bangladesh
1209, Dhanmondi, Dhaka, Bangladesh
1213, Gulshan, Dhaka, Bangladesh
1212, Gulshan, Dhaka, Bangladesh
1212, Gulshan, Dhaka, Bangladesh
1236, Jatrabari, Dhaka, Bangladesh
1331, Joypara, Dhaka, Bangladesh
1332, Joypara, Dhaka, Bangladesh
1331, Joypara, Dhaka, Bangladesh
1312, Keraniganj, Dhaka, Bangladesh
1311, Keraniganj, Dhaka, Bangladesh
1313, Keraniganj, Dhaka, Bangladesh
1310, Keraniganj, Dhaka, Bangladesh
1219, Khilgaon, Dhaka, Bangladesh
1229, Khilkhet, Dhaka, Bangladesh
1216, Mirpur, Dhaka, Bangladesh
1207, Mohammadpur, Dhaka, Bangladesh
1225, Mohammadpur, Dhaka, Bangladesh
1222, Motijheel, Dhaka, Bangladesh
1223, Motijheel, Dhaka, Bangladesh
1323, Nawabganj, Dhaka, Bangladesh
1325, Nawabganj, Dhaka, Bangladesh
1322, Nawabganj, Dhaka, Bangladesh
1321, Nawabganj, Dhaka, Bangladesh
1324, Nawabganj, Dhaka, Bangladesh
1320, Nawabganj, Dhaka, Bangladesh
1205, New market, Dhaka, Bangladesh
1217, Ramna, Dhaka, Bangladesh
1348, Savar, Dhaka, Bangladesh
13

In [14]:
print(dhaka_venues.shape)
dhaka_venues.head(100)

(251, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"01350, Dhamrai, Dhaka, Bangladesh",23.920162,90.210870,Dhamrai Bazar,23.919938,90.211445,Market
1,"1209, Dhanmondi, Dhaka, Bangladesh",23.753550,90.373124,Sausly's,23.755445,90.375762,Sandwich Place
2,"1209, Dhanmondi, Dhaka, Bangladesh",23.753550,90.373124,Drik Gallery,23.752380,90.369972,Art Gallery
3,"1209, Dhanmondi, Dhaka, Bangladesh",23.753550,90.373124,Nando's,23.753045,90.369766,Portuguese Restaurant
4,"1209, Dhanmondi, Dhaka, Bangladesh",23.753550,90.373124,BFC,23.755495,90.375534,Fried Chicken Joint
5,"1209, Dhanmondi, Dhaka, Bangladesh",23.753550,90.373124,Dhanmondi Rd 27,23.753419,90.372098,Scenic Lookout
6,"1209, Dhanmondi, Dhaka, Bangladesh",23.753550,90.373124,Tehari Ghar,23.753635,90.376701,Asian Restaurant
7,"1209, Dhanmondi, Dhaka, Bangladesh",23.753550,90.373124,Coffee World,23.753002,90.369519,Café
8,"1209, Dhanmondi, Dhaka, Bangladesh",23.753550,90.373124,Crimson Cup Coffee,23.753006,90.369566,Coffee Shop
9,"1209, Dhanmondi, Dhaka, Bangladesh",23.753550,90.373124,Meena Bazar,23.753046,90.369543,Department Store


#### Observation
We can see that only 251 venues are found for the 46 locations in the dataframe 'dhaka'. This is expected: Foursquare is not widely used in Bangladesh or around Dhaka city, so the data collected from it is quite small. This is one of the limitations of this exercise, as 251 is far too small a number for the total number of establishments situated in Dhaka.

In [15]:
dhaka_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"01350, Dhamrai, Dhaka, Bangladesh",1,1,1,1,1,1
"1204, Sutrapur, Dhaka, Bangladesh",1,1,1,1,1,1
"1205, New market, Dhaka, Bangladesh",4,4,4,4,4,4
"1207, Mohammadpur, Dhaka, Bangladesh",7,7,7,7,7,7
"1209, Dhanmondi, Dhaka, Bangladesh",18,18,18,18,18,18
"1212, Gulshan, Dhaka, Bangladesh",108,108,108,108,108,108
"1213, Gulshan, Dhaka, Bangladesh",9,9,9,9,9,9
"1215, Tejgaon, Dhaka, Bangladesh",3,3,3,3,3,3
"1216, Mirpur, Dhaka, Bangladesh",1,1,1,1,1,1
"1217, Ramna, Dhaka, Bangladesh",6,6,6,6,6,6


In [16]:
print('There are {} unique categories.'.format(len(dhaka_venues['Venue Category'].unique())))

There are 56 unique categories.


# Methodology
### One-hot encoding for different venue types
In this section, I apply one-hot encoding to generate a binary matrix of the venue types present at each address in the dataframe "dhaka". This matrix will later be used to cluster the locations in terms of their venues.

In [17]:
# one hot encoding
dh_onehot = pd.get_dummies(dhaka_venues[['Venue Category']], prefix="", prefix_sep="")

#dh_onehot.drop(['Neighborhood'],axis = 1)

# add neighborhood column back to dataframe
dh_onehot['Neighborhood'] = dhaka_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [dh_onehot.columns[-1]] + list(dh_onehot.columns[:-1])
dh_onehot = dh_onehot[fixed_columns]

dh_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Arts & Entertainment,Asian Restaurant,Bakery,Bar,Bookstore,Breakfast Spot,Burger Joint,...,Restaurant,Sandwich Place,Scenic Lookout,Shopping Mall,Social Club,Spa,Steakhouse,Supermarket,Sushi Restaurant,Turkish Restaurant
0,"01350, Dhamrai, Dhaka, Bangladesh",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"1209, Dhanmondi, Dhaka, Bangladesh",0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
2,"1209, Dhanmondi, Dhaka, Bangladesh",0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"1209, Dhanmondi, Dhaka, Bangladesh",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"1209, Dhanmondi, Dhaka, Bangladesh",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [18]:
dh_onehot.shape

(251, 57)

In [19]:
dh_grouped = dh_onehot.groupby('Neighborhood').mean().reset_index()
dh_grouped.head()

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Arts & Entertainment,Asian Restaurant,Bakery,Bar,Bookstore,Breakfast Spot,Burger Joint,...,Restaurant,Sandwich Place,Scenic Lookout,Shopping Mall,Social Club,Spa,Steakhouse,Supermarket,Sushi Restaurant,Turkish Restaurant
0,"01350, Dhamrai, Dhaka, Bangladesh",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"1204, Sutrapur, Dhaka, Bangladesh",0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"1205, New market, Dhaka, Bangladesh",0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"1207, Mohammadpur, Dhaka, Bangladesh",0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0
4,"1209, Dhanmondi, Dhaka, Bangladesh",0.0,0.111111,0.0,0.111111,0.0,0.0,0.0,0.0,0.055556,...,0.0,0.055556,0.055556,0.111111,0.0,0.055556,0.0,0.0,0.0,0.0


In [20]:
dh_grouped.shape

(27, 57)
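
Grouping the one-hot rows by neighborhood and taking the mean gives, for each category, the share of that neighborhood's venues that fall into it. The sketch below reproduces that arithmetic in plain Python so the numbers are easy to follow. The venue list is illustrative (chosen to be consistent with the four venues counted for New market), and `category_frequencies` is a hypothetical helper, not part of pandas.

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """Map each neighborhood to {category: share of its venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for hood, counter in counts.items()
    }


# Illustrative (neighborhood, category) pairs.
venues = [
    ('1205, New market, Dhaka, Bangladesh', 'Bus Station'),
    ('1205, New market, Dhaka, Bangladesh', 'Bus Station'),
    ('1205, New market, Dhaka, Bangladesh', 'Bookstore'),
    ('1205, New market, Dhaka, Bangladesh', 'Market'),
    ('01350, Dhamrai, Dhaka, Bangladesh', 'Market'),
]
freqs = category_frequencies(venues)
print(freqs['1205, New market, Dhaka, Bangladesh'])
```

Categories that never occur in a neighborhood are simply absent here, whereas the one-hot matrix stores them as explicit zeros.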

### List top-5 venues
In this section, I generate the top 5 venue types for each location.

In [21]:
num_top_venues = 5

for hood in dh_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = dh_grouped[dh_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----01350, Dhamrai, Dhaka, Bangladesh----
                 venue  freq
0               Market   1.0
1  American Restaurant   0.0
2          Art Gallery   0.0
3       Ice Cream Shop   0.0
4    Indian Restaurant   0.0


----1204, Sutrapur, Dhaka, Bangladesh----
                 venue  freq
0            Bookstore   1.0
1  American Restaurant   0.0
2            Nightclub   0.0
3    Indian Restaurant   0.0
4   Italian Restaurant   0.0


----1205, New market, Dhaka, Bangladesh----
                 venue  freq
0          Bus Station  0.50
1            Bookstore  0.25
2               Market  0.25
3  American Restaurant  0.00
4            Nightclub  0.00


----1207, Mohammadpur, Dhaka, Bangladesh----
              venue  freq
0              Park  0.14
1  Asian Restaurant  0.14
2        Steakhouse  0.14
3  Halal Restaurant  0.14
4       Coffee Shop  0.14


----1209, Dhanmondi, Dhaka, Bangladesh----
               venue  freq
0  Indian Restaurant  0.11
1        Art Gallery  0.11
2   Asian Restaur

### List top-10 venues
In this section, I provide the top-10 venue types for each location.

In [22]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

In [23]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = dh_grouped['Neighborhood']

for ind in np.arange(dh_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dh_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"01350, Dhamrai, Dhaka, Bangladesh",Market,Turkish Restaurant,Halal Restaurant,Fried Chicken Joint,Food Truck,Fast Food Restaurant,Farmers Market,Electronics Store,Donut Shop,Diner
1,"1204, Sutrapur, Dhaka, Bangladesh",Bookstore,Turkish Restaurant,Coffee Shop,Grocery Store,Fried Chicken Joint,Food Truck,Fast Food Restaurant,Farmers Market,Electronics Store,Donut Shop
2,"1205, New market, Dhaka, Bangladesh",Bus Station,Bookstore,Market,Turkish Restaurant,Coffee Shop,Fried Chicken Joint,Food Truck,Fast Food Restaurant,Farmers Market,Electronics Store
3,"1207, Mohammadpur, Dhaka, Bangladesh",Coffee Shop,Clothing Store,Steakhouse,Asian Restaurant,Halal Restaurant,Bus Station,Park,Diner,Department Store,Dessert Shop
4,"1209, Dhanmondi, Dhaka, Bangladesh",Art Gallery,Asian Restaurant,Shopping Mall,Indian Restaurant,Fried Chicken Joint,Coffee Shop,Lake,Department Store,Café,Burger Joint
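
In the table above, Dhamrai has a single venue (a market), yet nine further "most common" categories are listed for it. Those categories all have frequency 0 and appear only because of how ties are ordered by the sort. As a minimal sketch of one alternative, the hypothetical helper below keeps only categories that actually occur and breaks ties alphabetically. The frequency dictionaries are small illustrative excerpts, not the full 56-category rows.

```python
def top_categories(freqs, n):
    """Return up to n category names with non-zero frequency, highest first."""
    present = [(cat, f) for cat, f in freqs.items() if f > 0]
    # Sort by descending frequency, then alphabetically so ties are stable.
    present.sort(key=lambda pair: (-pair[1], pair[0]))
    return [cat for cat, _ in present[:n]]


# Illustrative excerpts of two rows of the grouped frequency table.
dhamrai = {'Market': 1.0, 'Turkish Restaurant': 0.0, 'Halal Restaurant': 0.0}
new_market = {'Bus Station': 0.5, 'Bookstore': 0.25, 'Market': 0.25, 'Diner': 0.0}

print(top_categories(dhamrai, 10))     # ['Market']
print(top_categories(new_market, 10))  # ['Bus Station', 'Bookstore', 'Market']
```

With this variant the lists can be shorter than `n`, so a fixed-width table would need padding for neighborhoods with few venues.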


### Clustering
In this section, I apply a clustering method to separate the locations in terms of their most common venues. First, I create the cluster labels for the locations using K-Means clustering with the number of clusters set to 5. Then I attach the cluster labels to each location in the dataframe "dhaka", along with the top-10 most common venues for each location.

In [24]:
# set number of clusters
kclusters = 5

dh_grouped_clustering = dh_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dh_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:100] 

array([1, 3, 1, 2, 2, 2, 2, 2, 4, 2, 2, 0, 0, 2, 2, 2, 2, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1])
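
The number of clusters is fixed at 5 above. One common way to sanity-check such a choice is to compare the within-cluster sum of squares (which scikit-learn exposes as `inertia_`) for several values of k. The sketch below is a plain-Python version of Lloyd's algorithm on made-up two-dimensional points; it is a stand-in for illustration, not the scikit-learn implementation, and the names `kmeans` and `squared_distance` are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct starting points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
        converged = new_labels == labels
        labels = new_labels
        if converged:
            break
    inertia = sum(
        squared_distance(p, centroids[lab]) for p, lab in zip(points, labels)
    )
    return labels, inertia


# Made-up frequency vectors for six locations (two categories each).
points = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (0.5, 0.5), (0.5, 0.4)]
inertias = {k: kmeans(points, k)[1] for k in range(1, len(points) + 1)}
for k, value in inertias.items():
    print(k, round(value, 4))
```

A k beyond which the inertia stops dropping noticeably is a reasonable candidate; with only 27 neighborhoods, the curve should be read with caution.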

In [25]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

dh_merged = dhaka

# join neighborhoods_venues_sorted onto dhaka to add the cluster label and top venues for each address
dh_merged = dh_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='ADDRESS')

dh_merged.head() # check the last columns!

Unnamed: 0,District,Thana,SubOffice,Post Code,ADDRESS,location,point,latitude,longitude,altitude,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Dhaka,Dhamrai,Kalampur,1350,"01350, Dhamrai, Dhaka, Bangladesh","(ধামরাই, ধামরাই উপজেলা, ঢাকা জেলা, ঢাকা বিভাগ,...","(23.920162, 90.2108702, 0.0)",23.920162,90.21087,0.0,...,Market,Turkish Restaurant,Halal Restaurant,Fried Chicken Joint,Food Truck,Fast Food Restaurant,Farmers Market,Electronics Store,Donut Shop,Diner
1,Dhaka,Dhanmondi,Jigatala TSO,1209,"1209, Dhanmondi, Dhaka, Bangladesh","(ধানমন্ডি আ/এ, ঢাকা, ঢাকা জেলা, ঢাকা বিভাগ, 12...","(23.7535496, 90.37312384681817, 0.0)",23.75355,90.373124,0.0,...,Art Gallery,Asian Restaurant,Shopping Mall,Indian Restaurant,Fried Chicken Joint,Coffee Shop,Lake,Department Store,Café,Burger Joint
2,Dhaka,Gulshan,Banani TSO,1213,"1213, Gulshan, Dhaka, Bangladesh","(গুলশান, ঢাকা, ঢাকা জেলা, ঢাকা বিভাগ, 1213, Ba...","(23.7833829, 90.4119635, 0.0)",23.783383,90.411963,0.0,...,Hotel,Café,Food Truck,Indian Restaurant,Office,Turkish Restaurant,Coffee Shop,Fast Food Restaurant,Farmers Market,Electronics Store
3,Dhaka,Gulshan,Badda,1212,"1212, Gulshan, Dhaka, Bangladesh","(গুলশান, ঢাকা, ঢাকা জেলা, ঢাকা বিভাগ, 1212, Ba...","(23.7930078, 90.4106613, 0.0)",23.793008,90.410661,0.0,...,Indian Restaurant,Café,Hotel,Italian Restaurant,Korean Restaurant,Asian Restaurant,Restaurant,Coffee Shop,Donut Shop,Fast Food Restaurant
4,Dhaka,Gulshan,Gulshan Model Town,1212,"1212, Gulshan, Dhaka, Bangladesh","(গুলশান, ঢাকা, ঢাকা জেলা, ঢাকা বিভাগ, 1212, Ba...","(23.7930078, 90.4106613, 0.0)",23.793008,90.410661,0.0,...,Indian Restaurant,Café,Hotel,Italian Restaurant,Korean Restaurant,Asian Restaurant,Restaurant,Coffee Shop,Donut Shop,Fast Food Restaurant


In [26]:
dh_merged.head(100)

Unnamed: 0,District,Thana,SubOffice,Post Code,ADDRESS,location,point,latitude,longitude,altitude,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Dhaka,Dhamrai,Kalampur,1350,"01350, Dhamrai, Dhaka, Bangladesh","(ধামরাই, ধামরাই উপজেলা, ঢাকা জেলা, ঢাকা বিভাগ,...","(23.920162, 90.2108702, 0.0)",23.920162,90.21087,0.0,...,Market,Turkish Restaurant,Halal Restaurant,Fried Chicken Joint,Food Truck,Fast Food Restaurant,Farmers Market,Electronics Store,Donut Shop,Diner
1,Dhaka,Dhanmondi,Jigatala TSO,1209,"1209, Dhanmondi, Dhaka, Bangladesh","(ধানমন্ডি আ/এ, ঢাকা, ঢাকা জেলা, ঢাকা বিভাগ, 12...","(23.7535496, 90.37312384681817, 0.0)",23.75355,90.373124,0.0,...,Art Gallery,Asian Restaurant,Shopping Mall,Indian Restaurant,Fried Chicken Joint,Coffee Shop,Lake,Department Store,Café,Burger Joint
2,Dhaka,Gulshan,Banani TSO,1213,"1213, Gulshan, Dhaka, Bangladesh","(গুলশান, ঢাকা, ঢাকা জেলা, ঢাকা বিভাগ, 1213, Ba...","(23.7833829, 90.4119635, 0.0)",23.783383,90.411963,0.0,...,Hotel,Café,Food Truck,Indian Restaurant,Office,Turkish Restaurant,Coffee Shop,Fast Food Restaurant,Farmers Market,Electronics Store
3,Dhaka,Gulshan,Badda,1212,"1212, Gulshan, Dhaka, Bangladesh","(গুলশান, ঢাকা, ঢাকা জেলা, ঢাকা বিভাগ, 1212, Ba...","(23.7930078, 90.4106613, 0.0)",23.793008,90.410661,0.0,...,Indian Restaurant,Café,Hotel,Italian Restaurant,Korean Restaurant,Asian Restaurant,Restaurant,Coffee Shop,Donut Shop,Fast Food Restaurant
4,Dhaka,Gulshan,Gulshan Model Town,1212,"1212, Gulshan, Dhaka, Bangladesh","(গুলশান, ঢাকা, ঢাকা জেলা, ঢাকা বিভাগ, 1212, Ba...","(23.7930078, 90.4106613, 0.0)",23.793008,90.410661,0.0,...,Indian Restaurant,Café,Hotel,Italian Restaurant,Korean Restaurant,Asian Restaurant,Restaurant,Coffee Shop,Donut Shop,Fast Food Restaurant
5,Dhaka,Jatrabari,Dhania TSO,1236,"1236, Jatrabari, Dhaka, Bangladesh","(যাত্রাবাড়ী, ঢাকা, ঢাকা জেলা, ঢাকা বিভাগ, 123...","(23.7104228, 90.4344666, 0.0)",23.710423,90.434467,0.0,...,Electronics Store,Restaurant,Bus Station,Turkish Restaurant,Clothing Store,Fried Chicken Joint,Food Truck,Fast Food Restaurant,Farmers Market,Donut Shop
6,Dhaka,Joypara,Joypara,1331,"1331, Joypara, Dhaka, Bangladesh","(joypara, দোহার, দোহার উপজেলা, ঢাকা জেলা, ঢাকা...","(23.6075985, 90.1249616, 0.0)",23.607599,90.124962,0.0,...,,,,,,,,,,
7,Dhaka,Joypara,Narisha,1332,"1332, Joypara, Dhaka, Bangladesh","(joypara, দোহার, দোহার উপজেলা, ঢাকা জেলা, ঢাকা...","(23.6075985, 90.1249616, 0.0)",23.607599,90.124962,0.0,...,,,,,,,,,,
8,Dhaka,Joypara,Palamganj,1331,"1331, Joypara, Dhaka, Bangladesh","(joypara, দোহার, দোহার উপজেলা, ঢাকা জেলা, ঢাকা...","(23.6075985, 90.1249616, 0.0)",23.607599,90.124962,0.0,...,,,,,,,,,,
9,Dhaka,Keraniganj,Ati,1312,"1312, Keraniganj, Dhaka, Bangladesh","(ভূঁইয়া এস্টেট, কেরাণীগঞ্জ উপজেলা, ঢাকা জেলা,...","(23.72495955, 90.34308827838592, 0.0)",23.72496,90.343088,0.0,...,,,,,,,,,,


In [27]:
dh_merged = dh_merged.dropna(subset=['Cluster Labels'])
print(dh_merged[dh_merged['Cluster Labels'].isnull()])

Empty DataFrame
Columns: [District, Thana, SubOffice, Post Code, ADDRESS, location, point, latitude, longitude, altitude, Cluster Labels, 1st Most Common Venue, 2nd Most Common Venue, 3rd Most Common Venue, 4th Most Common Venue, 5th Most Common Venue, 6th Most Common Venue, 7th Most Common Venue, 8th Most Common Venue, 9th Most Common Venue, 10th Most Common Venue]
Index: []

[0 rows x 21 columns]


# Results
The clustering obtained from the analysis above is shown in a folium map in this section. The map shows different clusters with different colors.

In [28]:
# create map
map_clusters = folium.Map(location=[latitude_dh, longitude_dh], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dh_merged['latitude'], dh_merged['longitude'], dh_merged['ADDRESS'], dh_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
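
In the cell above the marker colour is looked up with `rainbow[int(cluster-1)]`, so cluster label 0 maps to index -1, that is, the last colour in the list. Each cluster still gets its own colour, but the order is shifted by one. The sketch below shows one possible direct mapping from label to colour using only the standard library. `cluster_palette` is a hypothetical helper and the saturation and brightness values are arbitrary choices.

```python
import colorsys


def cluster_palette(k):
    """Return k evenly spaced hues as '#rrggbb' strings."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(5)
labels = [1, 3, 1, 2, 2]  # first five labels from the k-means output above
# Index the palette by the label itself, so label 0 gets the first colour.
marker_colours = [palette[int(label)] for label in labels]
print(marker_colours)
```

The resulting strings can be passed to the `color` and `fill_color` arguments of `folium.CircleMarker` in the same way as the entries of `rainbow`.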

# Conclusion
This exercise aimed to cluster the different restaurant venues situated across the district of Dhaka, the capital of Bangladesh. First, the postal codes of Dhaka district were collected from Wikipedia using "BeautifulSoup". Then the latitude and longitude data were collected using the Python library "GeoPy". Once the dataset had complete information about the latitude and longitude of each location, the venues were collected using the "Foursquare" API. Then K-Means clustering was applied to create 5 clusters in the dataset. From the map above, we can see that cluster 2 is the most spread out of the clusters.

The clustering of the locations by venue type can help young entrepreneurs around Dhaka city decide on the type of restaurant to invest in within their neighbourhood. One possible direction for future work is to use a data source for venues that has more data than the Foursquare platform. Dhaka district most certainly has far more restaurants than the 251 venues found here. The more data we can collect about the restaurants around Dhaka district, the more meaningful the separation produced by the clustering process will be.