In [1]:
import pandas as pd
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; } .CodeMirror pre {font-size: 8pt;} div.output_text{line-height:1; font-size:.9em !important;}</style>"))
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 500)
pd.options.display.float_format = '{:,.2f}'.format

# The Battle Of Neighborhoods

### Introduction

The United Arab Emirates (UAE; Arabic: دولة الإمارات العربية المتحدة‎ Dawlat al-ʾImārāt al-ʿArabīyyah al-Muttaḥidah), sometimes simply called the Emirates (Arabic: الإمارات‎ al-ʾImārāt), is the most modern and advance countries across the Middle East region often referred to as the Business Hub of Middle East. The UAE is a federation of seven emirates consisting of Abu Dhabi, Ajman, Dubai, Fujairah, Ras al-Khaimah, Sharjah and Umm al-Quwain. 

Dubai (/duːˈbaɪ/ doo-BY; Arabic: دبي‎ Dubay, Gulf Arabic: Arabic pronunciation: [dʊˈbɑj]) is the largest and most populous city in the United Arab Emirates (hereinafter referred to as UAE). The UAE is the most modern and advanced countries across the Middle East region and is also the central business hub location for local, regional and global businesses. Dubai is one of the 7 states in the UAE and welcomes over 30000 passengers each day through its state of the art world class airports and hosts over 60% of the entire expatriate population across the country. 

### Objective

Being a very busy city, one of the most challenging aspects is to find the best neighborhood in Dubai. A neighborhood that is in close proximity of the work location, within the budget, offers quality schooling and other required facilities. 

Our project is envisaged to analyze and classify areas of Dubai by leveraging Foursquare APIs and Machine Learning techniques such as Segmentation and Clustering. The final objective of this project is to Segment/Classify areas of Dubai based on Common Places captured from Foursquare APIs.

- Similarities and Dissimilarities between the Sectors (Sector 2 and Sector 3) of Dubai
- Classification of Areas by attractions for Residents and Tourists

### Data

The Data has been acquired through Wikipedia and maintained in a CSV format at:
https://github.com/ecopk/Coursera_Capstone/blob/master/DubaiSectorsData.csv

In [2]:
import numpy as np
import pandas as pd
dfdxb = pd.read_csv('DubaiSectorsData.csv')

In [3]:
dfdxb.head()

Unnamed: 0,CommunityCode,Sectors,Population,Area,AreaArabic
0,101,Sector 1,2,NAKHLAT DEIRA,نخلة ديرة
1,111,Sector 1,1550,AL CORNICHE,الكورنيش
2,112,Sector 1,6621,AL RASS,الرأس
3,113,Sector 1,14963,AL DHAGAYA,الضغاية
4,114,Sector 1,2563,AL BUTEEN,البطين


In [4]:
print('The Dataframe has {} Sectors and {} Communities.'.format(len(dfdxb['Sectors'].unique()), dfdxb.shape[0]))
dfdxb.groupby('Sectors').count()

The Dataframe has 9 Sectors and 225 Communities.


Unnamed: 0_level_0,CommunityCode,Population,Area,AreaArabic
Sectors,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Sector 1,23,23,23,23
Sector 2,35,35,35,35
Sector 3,57,57,57,57
Sector 4,10,10,10,10
Sector 5,18,18,18,18
Sector 6,31,31,31,31
Sector 7,7,7,7,7
Sector 8,16,16,16,16
Sector 9,28,28,28,28


#### Integrating Latitude and Longitude

In [5]:
import geocoder
SSK_API_KEY='AIzaSyCGMWm7FNWp7zamQX-LJOjckjjvfb5MYxk'
def get_latlng(area_name):
    lat_lng_coords = None
    while(lat_lng_coords is None):
        g = geocoder.google('{}, United Arab Emirates'.format(area_name), key=SSK_API_KEY)
        lat_lng_coords = g.latlng
    return lat_lng_coords

In [6]:
areas = dfdxb['Area'] + ' - DUBAI -'
coords = [ get_latlng(area_name) for area_name in areas.tolist() ]

In [7]:
dfdxb_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
dfdxb['Latitude'] = dfdxb_coords['Latitude']
dfdxb['Longitude'] = dfdxb_coords['Longitude']

In [8]:
dfdxb.head()

Unnamed: 0,CommunityCode,Sectors,Population,Area,AreaArabic,Latitude,Longitude
0,101,Sector 1,2,NAKHLAT DEIRA,نخلة ديرة,25.36,55.39
1,111,Sector 1,1550,AL CORNICHE,الكورنيش,25.28,55.33
2,112,Sector 1,6621,AL RASS,الرأس,25.27,55.3
3,113,Sector 1,14963,AL DHAGAYA,الضغاية,25.27,55.3
4,114,Sector 1,2563,AL BUTEEN,البطين,25.26,55.32


#### Integrating GeoPostCode Data

In [9]:
from geopy.geocoders import Nominatim
import folium
address = 'Dubai, United Arab Emirates'
geolocator = Nominatim(user_agent='my-application')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
map_dxb = folium.Map(location=[latitude, longitude], zoom_start=11)

In [10]:
for lat, lng, borough, neighborhood in zip(dfdxb['Latitude'], dfdxb['Longitude'], dfdxb['Sectors'], dfdxb['Area']):
    label = '{}, {}'.format('Dubai',neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#ff5400',
        fill_opacity=0.7).add_to(map_dxb)
map_dxb

### Methodology

Upon Integrating Communities with GeoPostCodes, we obtained the Latitude and Longitudes we will now use Foursquare APIs to explore the neighborhoods in Dubai. Later, we will use the Explore function to generate the Most Common Venues Category for each of the neighborhoods to build Clusters by leveraging K-Means Clustering Algorithm.

For the above use-case evaluation, we will consider Sector 3 of Dubai which has the highest population of 1,080,338

In [11]:
dfdxb.groupby(['Sectors'], as_index=False).agg({'CommunityCode':'count','Population':'sum'}).sort_values(by=['Population'], ascending=False).head()

Unnamed: 0,Sectors,CommunityCode,Population
2,Sector 3,57,1080338
1,Sector 2,35,609204
0,Sector 1,23,447137
4,Sector 5,18,420902
5,Sector 6,31,294585


#### Comparison of Sector 2 and Sector 3

In [12]:
#DataFrame of Sector 2
sec2 = dfdxb[dfdxb['Sectors'] == 'Sector 2'].reset_index(drop=True)

In [13]:
#DataFrame of Sector 3
sec3 = dfdxb[dfdxb['Sectors'] == 'Sector 3'].reset_index(drop=True)

#### Sector 2 Map

In [19]:
address = 'Dubai, Dubai'
geolocator = Nominatim(user_agent='my-application')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
map_sec2 = folium.Map(location=[latitude, longitude], zoom_start=12)
for lat, lng, label in zip(sec2['Latitude'], sec2['Longitude'], sec2['Area']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='#ff003b',
        fill=True,
        fill_color='#ff003b',
        fill_opacity=0.7).add_to(map_sec2)  
map_sec2

#### Sector 3 Map

In [17]:
address = 'Dubai, Dubai'
geolocator = Nominatim(user_agent='my-application')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
map_sec3 = folium.Map(location=[latitude, longitude], zoom_start=12)
for lat, lng, label in zip(sec3['Latitude'], sec3['Longitude'], sec3['Area']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#ffbb00',
        fill_opacity=0.7).add_to(map_sec3)  
map_sec3

### Foursquare API Configuration

In [28]:
import requests
CLIENT_ID = 'MBCZHQ2AG4FKYCQKZ2LRQO44LVONPLOQ3SKOXBYT5AJ0YFOL' # your Foursquare ID
CLIENT_SECRET = 'K4M01ZE3GRUMF0ROMUWTHM2E0EUQGWRB0JZOBMDXFZ52HHE4' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100

In [29]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        results = requests.get(url).json()["response"]['groups'][0]['items']
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Area', 
                  'Area Latitude', 
                  'Area Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    return(nearby_venues)

#### Venues in Sector 2

In [30]:
sec2_venues = getNearbyVenues(names=sec2['Area'],
                                   latitudes=sec2['Latitude'],
                                   longitudes=sec2['Longitude']
                                  )
print(sec2_venues.shape)
sec2_venues.head()

NAD SHAMMA
AL GARHOUD
UM RAMOOL
AL RASHIDIYA
DUBAI  AIRPORT
AL TWAR FIRST
AL TWAR SECOND
AL TWAR THIRD
AL NAHDA FIRST
AL QUSAIS FIRST
AL QUSAIS SECOND
AL QUSAIS THIRD
AL NAHDA SECOND
AL QUSAIS IND. FIRST
AL QUSAIS IND. SECOND
MUHAISANAH THIRD
MUHAISANAH FOURTH
AL QUSAIS IND. THIRD
AL QUSAIS IND. FOURTH
AL QUSAIS IND FIFTH
MURDAF
MUSHRAIF
MUHAISANAH FIRST
AL MEZHAR FIRST
AL MEZHAR SECOND
MUHAISANAH SECOND 
OUD AL MUTEEN FIRST
OUD AL MUTEEN SECOND 
MUHAISANAH FIFTH
OUD AL MUTEEN THIRD 
WADI ALAMRADI
AL KHAWANEEJ ONE
AL KHAWANEEJ TWO
AL AYAS
AL TTAY
(246, 7)


Unnamed: 0,Area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,NAD SHAMMA,25.22,55.38,Subway,25.21,55.39,Sandwich Place
1,NAD SHAMMA,25.22,55.38,Al Maaref Sports Hall,25.22,55.39,Badminton Court
2,NAD SHAMMA,25.22,55.38,كافتيريا كوخ الدجاج,25.22,55.39,Burger Joint
3,NAD SHAMMA,25.22,55.38,"Jumeriah Zabeel Saray (Dubai, United Arab Emir...",25.21,55.38,Hotel
4,NAD SHAMMA,25.22,55.38,Shaikh's Airport (مطار الشيوخ),25.21,55.39,Airport


In [31]:
print('There are {} uniques categories in Sector 2.'.format(len(sec2_venues['Venue Category'].unique())))
sec2_venues.groupby('Area').count()

There are 77 uniques categories in Sector 2.


Unnamed: 0_level_0,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Area,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
AL AYAS,40,40,40,40,40,40
AL GARHOUD,4,4,4,4,4,4
AL KHAWANEEJ ONE,2,2,2,2,2,2
AL MEZHAR FIRST,5,5,5,5,5,5
AL MEZHAR SECOND,1,1,1,1,1,1
AL NAHDA FIRST,18,18,18,18,18,18
AL NAHDA SECOND,5,5,5,5,5,5
AL QUSAIS FIRST,14,14,14,14,14,14
AL QUSAIS IND FIFTH,1,1,1,1,1,1
AL QUSAIS IND. FIRST,11,11,11,11,11,11


### Venues in Sector 3

In [32]:
sec3_venues = getNearbyVenues(names=sec3['Area'],
                                   latitudes=sec3['Latitude'],
                                   longitudes=sec3['Longitude']
                                  )
print(sec3_venues.shape)
sec3_venues.head()

JUMEIRA BAY
WORLD ISLANDS
JUMEIRA ISLAND 2
AL SHANDAGA
AL SUQ AL KABEER
AL HAMRIYA
UM HURAIR FIRST
UM HURAIR SECOND
AL RAFFA
AL MANKHOOL
AL KARAMA
OUD METHA
MADINAT DUBAI AL MELAHEYAH (AL MINA)
AL HUDAIBA
AL JAFLIYA
AL KIFAF
ZAABEEL FIRST
AL JADAF
JUMEIRA FIRST
AL BADA
AL SATWA
TRADE CENTER FIRST
TRADE CENTER SECOND
ZAABEEL SECOND
JUMEIRA SECOND
AL WASL
BURJ KHALIFA
AL KALIJ AL TEJARI
AL MERKADH
JUMEIRA THIRD
AL SAFFA FIRST
AL GOZE FIRST
AL GOZE SECOND
UM SUQAIM FIRST
AL SAFFA SECOND
AL GOZE THIRD
AL GOZE FOURTH
UM SUQAIM SECOND
AL MANARA
AL GOZE IND. FIRST 
AL GOZE IND. SECOND 
UM SUQAIM THIRD
UM AL SHEIF
AL GOZE IND. THIRD 
AL GOZE IND. FOURTH 
AL SAFOUH FIRST
AL BARSHAA FIRST
AL BARSHAA THIRD
AL BAESHAA SECOND
NAKHLAT JUMEIRA 
AL SOFOUH SECOND
AL THANYAH FIRST (V. RABIE SAHRA'A)
AL THANYAH SECOND (JEBEL ALI HORSE RACING)
AL THANYAH THIRD (EMIRATE HILLS SECOND)
MARSA DUBAI (AL MINA AL SEYAHI) 
AL THANYAH FIFTH (EMIRATE HILLS FIRST) 
AL THANYAH FOURTH (EMIRATE HILLS THIRD) 
(1201, 7)


Unnamed: 0,Area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,JUMEIRA BAY,25.08,55.15,Public Cafe,25.08,55.15,Café
1,JUMEIRA BAY,25.08,55.15,Pullman Dubai Jumeirah Lakes Towers-Hotel and ...,25.08,55.15,Hotel
2,JUMEIRA BAY,25.08,55.15,Le Petit Belge,25.08,55.15,Belgian Restaurant
3,JUMEIRA BAY,25.08,55.15,Friends Avenue Cafe JLT,25.08,55.15,Café
4,JUMEIRA BAY,25.08,55.15,The White Room Spa,25.08,55.15,Spa


In [33]:
print('There are {} uniques categories in Sector 3.'.format(len(sec3_venues['Venue Category'].unique())))
sec3_venues.groupby('Area').count()

There are 199 uniques categories in Sector 3.


Unnamed: 0_level_0,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Area,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
AL BADA,6,6,6,6,6,6
AL BAESHAA SECOND,11,11,11,11,11,11
AL BARSHAA FIRST,12,12,12,12,12,12
AL BARSHAA THIRD,4,4,4,4,4,4
AL GOZE FIRST,1,1,1,1,1,1
AL GOZE FOURTH,5,5,5,5,5,5
AL GOZE IND. FIRST,5,5,5,5,5,5
AL GOZE IND. FOURTH,5,5,5,5,5,5
AL GOZE IND. SECOND,5,5,5,5,5,5
AL GOZE IND. THIRD,3,3,3,3,3,3


# Analysing Sector 2 and Sector 3

### Analyse Sector 2

In [36]:
sec2_onehot = pd.get_dummies(sec2_venues[['Venue Category']], prefix="", prefix_sep="")
sec2_onehot['Area'] = sec2_venues['Area']
fixed_columns = [sec2_onehot.columns[-1]] + list(sec2_onehot.columns[:-1])
sec2_onehot = sec2_onehot[fixed_columns]
print('{} rows were returned after one hot encoding.'.format(sec2_onehot.shape[0]))
sec2_grouped = sec2_onehot.groupby('Area').mean().reset_index()
print('{} rows were returned after grouping.'.format(sec2_grouped.shape[0]))

246 rows were returned after one hot encoding.
33 rows were returned after grouping.


In [38]:
num_top_venues = 5
for hood in sec2_grouped['Area']:
    print("----"+hood+"----")
    temp = sec2_grouped[sec2_grouped['Area'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----AL AYAS----
                       venue  freq
0  Middle Eastern Restaurant  0.12
1                       Café  0.08
2                      Hotel  0.05
3       Gym / Fitness Center  0.05
4          Convenience Store  0.05


----AL GARHOUD----
                venue  freq
0        Dessert Shop  0.25
1    Airport Terminal  0.25
2               Hotel  0.25
3   Convenience Store  0.25
4  Italian Restaurant  0.00


----AL KHAWANEEJ ONE----
                     venue  freq
0                   Lounge  0.50
1  Comfort Food Restaurant  0.50
2        Accessories Store  0.00
3       Italian Restaurant  0.00
4            Metro Station  0.00


----AL MEZHAR FIRST----
                 venue  freq
0               Bakery  0.40
1        Grocery Store  0.20
2       Sandwich Place  0.20
3            Cafeteria  0.20
4  Japanese Restaurant  0.00


----AL MEZHAR SECOND----
                       venue  freq
0              Grocery Store  1.00
1          Accessories Store  0.00
2         Italian Restaurant

In [40]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 8
indicators = ['st', 'nd', 'rd']
columns = ['Area']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))
sec2_areas_venues_sorted = pd.DataFrame(columns=columns)
sec2_areas_venues_sorted['Area'] = sec2_grouped['Area']
for ind in np.arange(sec2_grouped.shape[0]):
    sec2_areas_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sec2_grouped.iloc[ind, :], num_top_venues)
sec2_areas_venues_sorted.head()

Unnamed: 0,Area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,AL AYAS,Middle Eastern Restaurant,Café,Bar,Restaurant,Burger Joint,Italian Restaurant,Gym / Fitness Center,Hotel
1,AL GARHOUD,Convenience Store,Airport Terminal,Dessert Shop,Hotel,Vietnamese Restaurant,Cafeteria,Café,Candy Store
2,AL KHAWANEEJ ONE,Lounge,Comfort Food Restaurant,Vietnamese Restaurant,Convenience Store,Cafeteria,Café,Candy Store,Clothing Store
3,AL MEZHAR FIRST,Bakery,Cafeteria,Grocery Store,Sandwich Place,Vietnamese Restaurant,Convenience Store,Café,Candy Store
4,AL MEZHAR SECOND,Grocery Store,Vietnamese Restaurant,Cosmetics Shop,Cafeteria,Café,Candy Store,Clothing Store,Coffee Shop


### Analyse Sector 3

In [37]:
sec3_onehot = pd.get_dummies(sec3_venues[['Venue Category']], prefix="", prefix_sep="")
sec3_onehot['Area'] = sec3_venues['Area']
fixed_columns = [sec3_onehot.columns[-1]] + list(sec3_onehot.columns[:-1])
sec3_onehot = sec3_onehot[fixed_columns]
print('{} rows were returned after one hot encoding.'.format(sec3_onehot.shape[0]))
sec3_grouped = sec3_onehot.groupby('Area').mean().reset_index()
print('{} rows were returned after grouping.'.format(sec3_grouped.shape[0]))

1201 rows were returned after one hot encoding.
56 rows were returned after grouping.


In [39]:
num_top_venues = 5
for hood in sec3_grouped['Area']:
    print("----"+hood+"----")
    temp = sec3_grouped[sec3_grouped['Area'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----AL BADA----
                       venue  freq
0                       Park  0.33
1                       Café  0.17
2                       Pool  0.17
3          Indian Restaurant  0.17
4  Middle Eastern Restaurant  0.17


----AL BAESHAA SECOND----
                venue  freq
0                 Gym  0.09
1       Auto Workshop  0.09
2  Photography Studio  0.09
3     Photography Lab  0.09
4         Coffee Shop  0.09


----AL BARSHAA FIRST----
                     venue  freq
0   Furniture / Home Store  0.17
1                     Café  0.08
2  Health & Beauty Service  0.08
3           Ice Cream Shop  0.08
4                Cafeteria  0.08


----AL BARSHAA THIRD----
                  venue  freq
0             Cafeteria  0.25
1  Fast Food Restaurant  0.25
2                  Café  0.25
3  Herbs & Spices Store  0.25
4     Afghan Restaurant  0.00


----AL GOZE FIRST----
               venue  freq
0      Shopping Mall  1.00
1  Afghan Restaurant  0.00
2           Pharmacy  0.00
3        Music

                       venue  freq
0                Coffee Shop  0.13
1                       Café  0.10
2  Middle Eastern Restaurant  0.07
3                 Restaurant  0.05
4                      Hotel  0.05


----TRADE CENTER SECOND----
        venue  freq
0       Hotel  0.11
1      Lounge  0.09
2  Restaurant  0.07
3   Nightclub  0.07
4         Bar  0.07


----UM AL SHEIF----
               venue  freq
0  French Restaurant  1.00
1  Afghan Restaurant  0.00
2          Pet Store  0.00
3        Music Venue  0.00
4         Nail Salon  0.00


----UM HURAIR FIRST----
                  venue  freq
0                  Café  0.14
1                 Hotel  0.06
2           Coffee Shop  0.06
3    Italian Restaurant  0.05
4  Gym / Fitness Center  0.03


----UM HURAIR SECOND----
                       venue  freq
0          Indian Restaurant  0.12
1                       Café  0.09
2     Furniture / Home Store  0.09
3  Middle Eastern Restaurant  0.06
4                  Pool Hall  0.06


----UM SUQA

In [41]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 8
indicators = ['st', 'nd', 'rd']
columns = ['Area']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))
sec3_areas_venues_sorted = pd.DataFrame(columns=columns)
sec3_areas_venues_sorted['Area'] = sec3_grouped['Area']
for ind in np.arange(sec3_grouped.shape[0]):
    sec3_areas_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sec3_grouped.iloc[ind, :], num_top_venues)
sec3_areas_venues_sorted.head()

Unnamed: 0,Area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,AL BADA,Park,Pool,Indian Restaurant,Café,Middle Eastern Restaurant,Zoo,Frozen Yogurt Shop,Fried Chicken Joint
1,AL BAESHAA SECOND,Motorcycle Shop,Photography Lab,Coffee Shop,Gym,BBQ Joint,Auto Workshop,Arts & Crafts Store,Gym / Fitness Center
2,AL BARSHAA FIRST,Furniture / Home Store,Football Stadium,Cafeteria,Café,Fast Food Restaurant,Art Gallery,Arts & Crafts Store,Ice Cream Shop
3,AL BARSHAA THIRD,Cafeteria,Café,Fast Food Restaurant,Herbs & Spices Store,Zoo,Food Stand,Furniture / Home Store,Frozen Yogurt Shop
4,AL GOZE FIRST,Shopping Mall,Food & Drink Shop,Garden,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Fountain


## Machine Learning

### K-mean Cluster Sector 2

In [46]:
from sklearn.cluster import KMeans
kclusters = 5
sec2_grouped_clustering = sec2_grouped.drop('Area', 1)
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sec2_grouped_clustering)
kmeans.labels_[0:10] 
sec2_merged = sec2
kmeans.labels_=np.append(kmeans.labels_, [1,0])
sec2_merged['Cluster Labels'] = kmeans.labels_
sec2_merged = sec2_merged.join(sec2_areas_venues_sorted.set_index('Area'), on='Area')
sec2_merged.head()

Unnamed: 0,CommunityCode,Sectors,Population,Area,AreaArabic,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,213,Sector 2,2790,NAD SHAMMA,ند شما,25.22,55.38,1,Accessories Store,Sandwich Place,Badminton Court,Burger Joint,Hotel,Airport,Electronics Store,Dessert Shop
1,214,Sector 2,17254,AL GARHOUD,القرهود,25.24,55.35,1,Convenience Store,Airport Terminal,Dessert Shop,Hotel,Vietnamese Restaurant,Cafeteria,Café,Candy Store
2,215,Sector 2,2829,UM RAMOOL,ام رمول,25.23,55.37,1,Auto Garage,Vietnamese Restaurant,Cosmetics Shop,Cafeteria,Café,Candy Store,Clothing Store,Coffee Shop
3,216,Sector 2,33719,AL RASHIDIYA,الراشدية,25.22,55.39,3,Fast Food Restaurant,Motorcycle Shop,Cafeteria,Café,Vietnamese Restaurant,Comfort Food Restaurant,Bus Station,Candy Store
4,221,Sector 2,22,DUBAI AIRPORT,مطار دبي الدولي,25.25,55.37,3,Airport Service,Cosmetics Shop,Pharmacy,Convenience Store,Cafeteria,Café,Candy Store,Clothing Store


### K-mean Cluster Sector 3

In [56]:
from sklearn.cluster import KMeans
kclusters = 5
sec3_grouped_clustering = sec3_grouped.drop('Area', 1)
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sec3_grouped_clustering)
kmeans.labels_[0:10] 
sec3_merged = sec3
kmeans.labels_=np.append(kmeans.labels_, [1])
sec3_merged['Cluster Labels'] = kmeans.labels_
sec3_merged = sec3_merged.join(sec3_areas_venues_sorted.set_index('Area'), on='Area')
sec3_merged.head()

Unnamed: 0,CommunityCode,Sectors,Population,Area,AreaArabic,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,302,Sector 3,20,JUMEIRA BAY,شاطئ جميرا,25.08,55.15,0,Café,Mediterranean Restaurant,Chinese Restaurant,Hotel,Supermarket,Convenience Store,Light Rail Station,Spa
1,303,Sector 3,4,WORLD ISLANDS,جزر العالم,25.23,55.17,0,Beach,Zoo,Food Court,Garden,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant
2,304,Sector 3,3,JUMEIRA ISLAND 2,جزيرة جميرا 2,25.06,55.15,0,Restaurant,Pool,Zoo,Food & Drink Shop,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant
3,311,Sector 3,2,AL SHANDAGA,الشندغة,25.27,55.3,0,Market,Middle Eastern Restaurant,Art Gallery,Tunnel,Flower Shop,Beach,Food Stand,Garden
4,312,Sector 3,44783,AL SUQ AL KABEER,السوق الكبير,25.26,55.29,0,Indian Restaurant,Hotel,Chinese Restaurant,Pub,Nightclub,Electronics Store,Asian Restaurant,Middle Eastern Restaurant


In [58]:
import matplotlib.cm as cm
import matplotlib.colors as colors
dxb_lat = 24.2087614
dxb_lng = 56.1253848
# Create The MAP Between 3.1343385, 101.6863371
sec2_clusters = folium.Map(location=[dxb_lat, dxb_lng], zoom_start=12)
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
markers_colors = []
for lat, lon, poi, cluster in zip(sec2_merged['Latitude'], sec2_merged['Longitude'], sec2_merged['Area'], sec2_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.5).add_to(sec2_clusters)
       
sec2_clusters

In [59]:
import matplotlib.cm as cm
import matplotlib.colors as colors
dxb_lat = 24.2087614
dxb_lng = 56.1253848
# Create The MAP Between 3.1343385, 101.6863371
sec3_clusters = folium.Map(location=[dxb_lat, dxb_lng], zoom_start=12)
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
markers_colors = []
for lat, lon, poi, cluster in zip(sec3_merged['Latitude'], sec3_merged['Longitude'], sec3_merged['Area'], sec3_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.5).add_to(sec3_clusters)
       
sec3_clusters

### Results - Sector 2

In [60]:
sec2_merged.loc[sec2_merged['Cluster Labels'] == 0, sec2_merged.columns[[2] + list(range(5, sec2_merged.shape[1]))]]

Unnamed: 0,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
31,11169,25.25,55.49,0,Lounge,Comfort Food Restaurant,Vietnamese Restaurant,Convenience Store,Cafeteria,Café,Candy Store,Clothing Store
34,805,25.2,55.27,0,Middle Eastern Restaurant,Café,Bar,Restaurant,Burger Joint,Italian Restaurant,Gym / Fitness Center,Hotel


In [61]:
sec2_merged.loc[sec2_merged['Cluster Labels'] == 1, sec2_merged.columns[[2] + list(range(5, sec2_merged.shape[1]))]]

Unnamed: 0,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,2790,25.22,55.38,1,Accessories Store,Sandwich Place,Badminton Court,Burger Joint,Hotel,Airport,Electronics Store,Dessert Shop
1,17254,25.24,55.35,1,Convenience Store,Airport Terminal,Dessert Shop,Hotel,Vietnamese Restaurant,Cafeteria,Café,Candy Store
2,2829,25.23,55.37,1,Auto Garage,Vietnamese Restaurant,Cosmetics Shop,Cafeteria,Café,Candy Store,Clothing Store,Coffee Shop
5,10250,25.27,55.37,1,Fast Food Restaurant,Bus Station,Supermarket,Mediterranean Restaurant,Cafeteria,American Restaurant,Department Store,Restaurant
6,4378,25.26,55.38,1,Coffee Shop,Mobile Phone Shop,Juice Bar,Shopping Plaza,Basketball Court,Restaurant,Vietnamese Restaurant,Bus Station
7,9851,25.25,55.39,1,Airport,Pakistani Restaurant,Vietnamese Restaurant,Convenience Store,Cafeteria,Café,Candy Store,Clothing Store
8,23760,25.27,55.37,1,Cafeteria,Indian Restaurant,Convenience Store,Café,Basketball Court,Ice Cream Shop,Hotel,Pakistani Restaurant
9,41225,25.27,55.37,1,Fast Food Restaurant,Bus Station,Supermarket,Mediterranean Restaurant,Cafeteria,American Restaurant,Department Store,Restaurant
10,11144,25.28,55.4,1,Fast Food Restaurant,Sandwich Place,Filipino Restaurant,Falafel Restaurant,Gas Station,Shopping Mall,Grocery Store,Comfort Food Restaurant
12,54678,25.29,55.38,1,Frozen Yogurt Shop,Fast Food Restaurant,Sporting Goods Shop,Asian Restaurant,Coffee Shop,Convenience Store,Cafeteria,Café


In [62]:
sec2_merged.loc[sec2_merged['Cluster Labels'] == 2, sec2_merged.columns[[2] + list(range(5, sec2_merged.shape[1]))]]

Unnamed: 0,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
28,2,25.27,55.4,2,Market,Cafeteria,Park,Vietnamese Restaurant,Breakfast Spot,Bus Station,Café,Candy Store
29,6110,25.27,55.45,2,Café,Vietnamese Restaurant,Convenience Store,Bus Station,Cafeteria,Candy Store,Clothing Store,Coffee Shop
30,3120,25.2,55.48,2,Farm,Vietnamese Restaurant,Convenience Store,Bus Station,Cafeteria,Café,Candy Store,Clothing Store


In [63]:
sec2_merged.loc[sec2_merged['Cluster Labels'] == 3, sec2_merged.columns[[2] + list(range(5, sec2_merged.shape[1]))]]

Unnamed: 0,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
3,33719,25.22,55.39,3,Fast Food Restaurant,Motorcycle Shop,Cafeteria,Café,Vietnamese Restaurant,Comfort Food Restaurant,Bus Station,Candy Store
4,22,25.25,55.37,3,Airport Service,Cosmetics Shop,Pharmacy,Convenience Store,Cafeteria,Café,Candy Store,Clothing Store
11,6480,25.26,55.39,3,Soccer Field,Gas Station,Candy Store,Restaurant,Bagel Shop,Vietnamese Restaurant,Convenience Store,Cafeteria
13,8181,25.29,55.39,3,Fast Food Restaurant,Coffee Shop,Flower Shop,Sporting Goods Shop,Mobile Phone Shop,Café,Restaurant,Bakery
24,10463,25.24,55.46,3,Grocery Store,Vietnamese Restaurant,Cosmetics Shop,Cafeteria,Café,Candy Store,Clothing Store,Coffee Shop


In [64]:
sec2_merged.loc[sec2_merged['Cluster Labels'] == 4, sec2_merged.columns[[2] + list(range(5, sec2_merged.shape[1]))]]

Unnamed: 0,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
32,2786,25.24,55.52,4,,,,,,,,


### Results - Sector 3

In [66]:
sec3_merged.loc[sec3_merged['Cluster Labels'] == 0, sec3_merged.columns[[2] + list(range(5, sec3_merged.shape[1]))]]

Unnamed: 0,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,20,25.08,55.15,0,Café,Mediterranean Restaurant,Chinese Restaurant,Hotel,Supermarket,Convenience Store,Light Rail Station,Spa
1,4,25.23,55.17,0,Beach,Zoo,Food Court,Garden,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant
2,3,25.06,55.15,0,Restaurant,Pool,Zoo,Food & Drink Shop,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant
3,2,25.27,55.3,0,Market,Middle Eastern Restaurant,Art Gallery,Tunnel,Flower Shop,Beach,Food Stand,Garden
4,44783,25.26,55.29,0,Indian Restaurant,Hotel,Chinese Restaurant,Pub,Nightclub,Electronics Store,Asian Restaurant,Middle Eastern Restaurant
5,31957,25.26,55.3,0,Restaurant,Café,Convenience Store,Plaza,Beach,Shopping Mall,Burger Joint,Steakhouse
6,5118,25.23,55.32,0,Café,Hotel,Coffee Shop,Italian Restaurant,Middle Eastern Restaurant,Restaurant,Sushi Restaurant,Bar
7,4579,25.25,55.31,0,Indian Restaurant,Café,Furniture / Home Store,BBQ Joint,Pool Hall,North Indian Restaurant,Middle Eastern Restaurant,Vegetarian / Vegan Restaurant
8,41266,25.26,55.29,0,Hotel,Indian Restaurant,Chinese Restaurant,Pizza Place,Hookah Bar,North Indian Restaurant,Bakery,Nightclub
9,36808,25.25,55.29,0,Filipino Restaurant,Hotel,Asian Restaurant,Seafood Restaurant,Thai Restaurant,Fast Food Restaurant,Campground,Pakistani Restaurant


In [67]:
sec3_merged.loc[sec3_merged['Cluster Labels'] == 1, sec3_merged.columns[[2] + list(range(5, sec3_merged.shape[1]))]]

Unnamed: 0,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
47,11680,25.1,55.2,1,Cafeteria,Café,Fast Food Restaurant,Herbs & Spices Store,Zoo,Food Stand,Furniture / Home Store,Frozen Yogurt Shop
56,17260,25.06,55.17,1,Gym,Coffee Shop,Dessert Shop,Pharmacy,Café,Shopping Mall,Supermarket,Gym / Fitness Center


In [68]:
sec3_merged.loc[sec3_merged['Cluster Labels'] == 2, sec3_merged.columns[[2] + list(range(5, sec3_merged.shape[1]))]]

Unnamed: 0,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
32,4463,25.14,55.25,2,Convenience Store,Park,Shopping Mall,Bus Station,Café,Pharmacy,Gym,Market


In [69]:
sec3_merged.loc[sec3_merged['Cluster Labels'] == 3, sec3_merged.columns[[2] + list(range(5, sec3_merged.shape[1]))]]

Unnamed: 0,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
53,12025,25.14,55.21,3,Café,Spa,Zoo,Asian Restaurant,Theater,Coffee Shop,Shopping Mall,Supermarket


In [70]:
sec3_merged.loc[sec3_merged['Cluster Labels'] == 4, sec3_merged.columns[[2] + list(range(5, sec3_merged.shape[1]))]]

Unnamed: 0,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
19,51467,25.22,55.27,4,Park,Pool,Indian Restaurant,Café,Middle Eastern Restaurant,Zoo,Frozen Yogurt Shop,Fried Chicken Joint


### Discussions

As illustrated above in the Clusters of Sector 2 and Sector 3, we learn that a more pristine Classification can be achieved through the grouping of most commonly available Venue Categories. 
<b>- Sector 2 Cluster Division</b>
    - Cluster 1 Restaurants
    - Cluster 2 Fast Food Restaurants 
    - Cluster 3 Farm
    
<b>- Sector 3 Cluster Division</b>
    - Cluster 1 Hotels
    - Cluster 2 Café
    - Cluster 3 Convenience Store

<i>To extend this, we can further classify the Cluster Category Taxonomy so that we have a more concise view of Area Classification alongside deep Correlation Analysis. Further analysis, however is still pending due to the lack of Data Availability. The more rich dataset we could obtain the better the probability of delivering new insights is.</i>

In [75]:
sec2_merged['1st Most Common Venue'].unique()

array(['Accessories Store', 'Convenience Store', 'Auto Garage',
       'Fast Food Restaurant', 'Airport Service', 'Coffee Shop',
       'Airport', 'Cafeteria', 'Soccer Field', 'Frozen Yogurt Shop',
       'Garden', 'Middle Eastern Restaurant', 'Beach Bar', 'Restaurant',
       'Asian Restaurant', nan, 'Hotel', 'Bakery', 'Grocery Store',
       'Café', 'Market', 'Farm', 'Lounge'], dtype=object)

### Conclusion

Foursquare API is a powerful utility that enables us to capture detailed data about Common Places based on Geo Spatial Information of Dubai. Foursquare API has been proven valid in case of our Business Problem as defined above. 
- Similarities and Dissimilarities between the Sectors of Dubai
- Classification of Areas by proximity, Residential, Tourism and Other

<i>In the end, Dubai is a heavily populated city and it offers a lot of attraction throughout its Sectors. However, it is proven that Sector 3 offers more attraction towards Tourism whilst Sector 2 has a more subtle influence of Residents.</i>