# Coursera - Capstone Project for IBM Data Science Certificate
### "The battle of the neighborhoods"  **by Junaid Akram**

**Description**

This is the final assignment for the Coursera Capstone Project:
> https://www.coursera.org/learn/applied-data-science-capstone/home/info

**Objective:** discover the best neighborhood in Dubai for opening a new restaurant.


**Data Sources:** (csv file of Dubai Neighborhood coordinates via Google Drive)

> https://github.com/Junaid-Akram/Coursera_Capstone

**GitHub repository**

> https://github.com/Junaid-Akram/Coursera_Capstone

**Table of contents:**

*   **System & Data Setup**
*   **Part 1** - Create initial table with 103 postal codes ('Postcode', 'Borough','Neighborhood')
*   **Part 2** - Set up the Dubai neighborhood map using folium and the 'Latitude' and 'Longitude' columns from the csv
*   **Part 3** - Venue clustering by neighborhood and analysis of 'best' fit for new location

## System & Data Setup

In [1]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

#mapping tools
!pip install geopy 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

!pip install folium
import folium # map rendering library

import warnings
warnings.filterwarnings('ignore')  # silence library warnings in the notebook output



**Be sure to load 'Dubai_neighborhoods.csv' into your working directory before running the next cells.**

In [2]:
import os
cwd = os.getcwd()
cwd

'C:\\Users\\JUNAID'

In [3]:
# read csv file once loaded into working directory listed above
Geospacial_Coordinates = pd.read_csv('Dubai_neighborhoods.csv', sep = ',') 
# examine the shape of original input data
print(Geospacial_Coordinates.shape)

(24, 9)


## Part 2 - Setup Dubai Neighborhood Map

In [4]:
import json
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import, pandas >= 1.0)

In [5]:
Geo = pd.DataFrame(Geospacial_Coordinates)
Geo = Geo.drop([23])  # drop the row at index 23 (the last of the 24 rows)
Geo

Unnamed: 0,Neighborhood,Avg Rent Per Unit,Z-Score,Distance from Palm,Distance from Zabeel,Distance from Jumeirah,Latitude,Longitude,Unnamed: 8
0,Discovery Gardens,44672,-1.53,8.18,26.15,20.73,25.039,55.1445,
1,Dubai Silicon Oasis,54417,-1.3,24.96,13.31,16.39,25.1279,55.3863,
2,Jumeirah Village Circle,60068,-1.17,9.16,20.56,16.13,25.0602,55.2094,
3,Dubai Sports City,62753,-1.1,11.36,22.32,18.28,25.0391,55.2176,
4,Remraam,67284,-0.99,16.71,25.27,22.27,25.0014,55.2508,
5,Al Furjan,73648,-0.84,9.7,27.28,22.02,25.0252,55.1459,
6,Jumeirah Village Triangle,82014,-0.64,8.87,22.78,18.04,25.0473,55.19,
7,Motor City,83876,-0.6,12.61,20.9,17.42,25.045,55.2397,
8,Damac Hills,94630,-0.34,16.4,22.41,19.37,25.0275,55.2524,
9,Al Sufouh,95804,-0.31,0.7,17.88,12.02,25.1134,55.1762,


In [6]:
Geo.dtypes

Neighborhood               object
Avg Rent Per Unit          object
Z-Score                   float64
Distance from Palm        float64
Distance from Zabeel      float64
Distance from Jumeirah    float64
Latitude                  float64
Longitude                 float64
Unnamed: 8                float64
dtype: object

In [7]:
address = 'Dubai, UAE'

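# recent geopy releases require a user_agent; the string here is a placeholder
geolocator = Nominatim(user_agent='dubai_explorer')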
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Dubai, UAE are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Dubai, UAE are 25.0750095, 55.1887608818332.


In [8]:
# create map of Dubai using latitude and longitude values
map_dubai = folium.Map(location=[25.0750095, 55.1887608818332], zoom_start=12)

# add markers to map
for lat, lng, neighborhood in zip(Geo['Latitude'], Geo['Longitude'], Geo['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_dubai)  
    
map_dubai

### Part 2a - initial neighborhood comparison using Foursquare API

In [9]:
CLIENT_ID = 'XLXNHNRNJVCFXNWROSSADYIB2LJBDAJCURHAUSLV5W00VHEE' # my Foursquare ID
CLIENT_SECRET = 'VMK3ZE2NME3ZPS2GN5NLXWBP4XIOIQ1Q5C1500YIOFSPMOGB' # your Foursquare Secret
VERSION = '20190503' # Foursquare API version
radius = 500
LIMIT = 250

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: XLXNHNRNJVCFXNWROSSADYIB2LJBDAJCURHAUSLV5W00VHEE
CLIENT_SECRET:VMK3ZE2NME3ZPS2GN5NLXWBP4XIOIQ1Q5C1500YIOFSPMOGB
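
Hard-coding and printing API keys in a shared notebook exposes them to anyone who can read the repository. One possible alternative is sketched below, assuming the keys are stored in environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are hypothetical, not part of the original notebook or of Foursquare's API.

```python
import os

def load_credentials(env=os.environ):
    """Read Foursquare credentials from environment variables.

    The variable names are illustrative assumptions, not names
    required by Foursquare.
    """
    try:
        client_id = env["FOURSQUARE_CLIENT_ID"]
        client_secret = env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None
    return client_id, client_secret

def mask(secret, visible=4):
    """Keep the first few characters and star out the rest for safe printing."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)
```

With this in place, the cell could call `CLIENT_ID, CLIENT_SECRET = load_credentials()` and print `mask(CLIENT_SECRET)` instead of the raw value.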


**Let's explore 'Dubai Marina'; that sounds like a cool spot.**

In [10]:
#define objects for 'Dubai Marina' index [15] in Geo
neighborhood_latitude = Geo.loc[15, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = Geo.loc[15, 'Longitude'] # neighborhood longitude value
neighborhood_name = Geo.loc[15, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Dubai Marina are 25.0805, 55.1403.


**Now, let's get the top venues (up to `LIMIT`) that are in Dubai Marina within a radius of 500 meters.**

In [11]:
#step 1 - create the correct GET request URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display GET request URL

'https://api.foursquare.com/v2/venues/explore?&client_id=XLXNHNRNJVCFXNWROSSADYIB2LJBDAJCURHAUSLV5W00VHEE&client_secret=VMK3ZE2NME3ZPS2GN5NLXWBP4XIOIQ1Q5C1500YIOFSPMOGB&v=20190503&ll=25.0805,55.1403&radius=500&limit=250'
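
The request URL is the explore endpoint followed by a query string of credentials, API version, coordinates, radius, and limit. The sketch below shows one way to assemble it from named parameters with the standard library rather than a positional format string. The helper name `build_explore_url` is hypothetical; the endpoint and parameter names are the ones that appear in the URL above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the GET request URL for the venues/explore endpoint."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude pair
        "radius": radius,                # search radius in meters
        "limit": limit,                  # maximum number of venues requested
    }
    # urlencode percent-encodes values, so the comma in 'll' becomes %2C
    return EXPLORE_ENDPOINT + "?" + urlencode(params)
```

Naming each parameter makes it harder to pass arguments in the wrong order, and `urlencode` escapes any characters that are not URL-safe.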

In [37]:
results = requests.get(url).json()
results # display the raw JSON response

{'meta': {'code': 200, 'requestId': '5ccd1f29dd5797242d91e237'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Current map view',
  'headerFullLocation': 'Current map view',
  'headerLocationGranularity': 'unknown',
  'totalResults': 62,
  'suggestedBounds': {'ne': {'lat': 25.101500004500004,
    'lng': 55.18255985763224},
   'sw': {'lat': 25.092499995499992, 'lng': 55.172640142367754}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '567b3200498e729e9b749bf4',
       'name': 'AURIS INN Al MUHANNA Hotel',
       'location': {'address': 'Barsha Heights',
        'lat': 25.094750026564046,
        'lng': 55.17705810666746,
        'labeledLatLngs': [{'label': 'display',
          '

**Clean the JSON and structure it into a *pandas* dataframe.**

In [13]:
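# function that extracts the category of the venue
def get_category_type(row):
    # the column is named 'categories' or 'venue.categories' depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']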
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [15]:
venues = results['response']['groups'][0]['items']
    
df_Marina = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
df_Marina = df_Marina.loc[:, filtered_columns]

# filter the category for each row
df_Marina['venue.categories'] = df_Marina.apply(get_category_type, axis=1)

# clean columns

df_Marina.columns = [col.split(".")[-1] for col in df_Marina.columns]
df_Marina.insert(0, 'neighborhood', 'Dubai Marina')

print('{} venues were returned by Foursquare.'.format(df_Marina.shape[0]))
df_Marina.head()

93 venues were returned by Foursquare.


Unnamed: 0,neighborhood,name,categories,lat,lng
0,Dubai Marina,Park Island برج پارك آيلاند,Residential Building (Apartment / Condo),25.082267,55.142127
1,Dubai Marina,Café Bateel,Café,25.081826,55.138066
2,Dubai Marina,Zaatar w Zeit,Middle Eastern Restaurant,25.080036,55.142305
3,Dubai Marina,Club Stretch,Yoga Studio,25.079337,55.142253
4,Dubai Marina,Caribou Coffee,Coffee Shop,25.081849,55.139867


**Create a map of the Marina district and highlight nearby venues**

In [16]:
map_marina = folium.Map(location=[neighborhood_latitude, neighborhood_longitude], zoom_start=15)

# add markers to map
for lat, lng, name, categories in zip(df_Marina['lat'], df_Marina['lng'], df_Marina['name'], df_Marina['categories']):
  label = '{},{}'.format(categories,name)
  label = folium.Popup(label, parse_html=True)
  folium.CircleMarker(
      [lat, lng],
      radius=5,
      popup=label,
      color='blue',
      fill=True,
      fill_color='#3186cc',
      fill_opacity=0.7).add_to(map_marina) 
    
map_marina

### Let's create a similar dataframe for each neighborhood: 

**index # 9 - Al Sufouh**

In [18]:
#define objects for 'Al Sufouh' index [9] in Geo
neighborhood_latitude = Geo.loc[9, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = Geo.loc[9, 'Longitude'] # neighborhood longitude value
neighborhood_name = Geo.loc[9, 'Neighborhood'] # neighborhood name

#step 1 - create the correct GET request URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

results = requests.get(url).json()

venues = results['response']['groups'][0]['items']
    
df_ASufouh = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
df_ASufouh = df_ASufouh.loc[:, filtered_columns]

# filter the category for each row
df_ASufouh['venue.categories'] = df_ASufouh.apply(get_category_type, axis=1)

# clean columns

df_ASufouh.columns = [col.split(".")[-1] for col in df_ASufouh.columns]
df_ASufouh.insert(0, 'neighborhood', 'Al Sufouh')

print('{} venues were returned by Foursquare.'.format(df_ASufouh.shape[0]))
df_ASufouh.head()

4 venues were returned by Foursquare.


Unnamed: 0,neighborhood,name,categories,lat,lng
0,Al Sufouh,Emirates Co-Op,Grocery Store,25.112435,55.173827
1,Al Sufouh,Shaikhath Al Arab Cafeteria شيخة العرب كافتيريا,Cafeteria,25.112604,55.173924
2,Al Sufouh,Marina Pharmacy,Pharmacy,25.11253,55.173813
3,Al Sufouh,Al Sufouh Park,Playground,25.113393,55.17166


**index # 10 DIFC**

In [19]:
#define objects for 'DIFC' index [10] in Geo
neighborhood_latitude = Geo.loc[10, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = Geo.loc[10, 'Longitude'] # neighborhood longitude value
neighborhood_name = Geo.loc[10, 'Neighborhood'] # neighborhood name

#step 1 - create the correct GET request URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

results = requests.get(url).json()

venues = results['response']['groups'][0]['items']
    
df_DIFC = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
df_DIFC = df_DIFC.loc[:, filtered_columns]

# filter the category for each row
df_DIFC['venue.categories'] = df_DIFC.apply(get_category_type, axis=1)

# clean columns

df_DIFC.columns = [col.split(".")[-1] for col in df_DIFC.columns]
df_DIFC.insert(0, 'neighborhood', 'DIFC')

print('{} venues were returned by Foursquare.'.format(df_DIFC.shape[0]))
df_DIFC.head()

76 venues were returned by Foursquare.


Unnamed: 0,neighborhood,name,categories,lat,lng
0,DIFC,The Sunken Garden,Hookah Bar,25.212093,55.280039
1,DIFC,Café Belge,Belgian Restaurant,25.212062,55.279914
2,DIFC,The Ritz-Carlton,Hotel,25.213067,55.279473
3,DIFC,Burger & Lobster,Seafood Restaurant,25.211287,55.281787
4,DIFC,Carnival by Trèsind,Indian Restaurant,25.21101,55.282113


**index # 11 Business Bay**

In [20]:
#define objects for 'Business_Bay' index [11] in Geo
neighborhood_latitude = Geo.loc[11, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = Geo.loc[11, 'Longitude'] # neighborhood longitude value
neighborhood_name = Geo.loc[11, 'Neighborhood'] # neighborhood name

#step 1 - create the correct GET request URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

results = requests.get(url).json()

venues = results['response']['groups'][0]['items']
    
df_Business_Bay = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
df_Business_Bay = df_Business_Bay.loc[:, filtered_columns]

# filter the category for each row
df_Business_Bay['venue.categories'] = df_Business_Bay.apply(get_category_type, axis=1)

# clean columns

df_Business_Bay.columns = [col.split(".")[-1] for col in df_Business_Bay.columns]
df_Business_Bay.insert(0, 'neighborhood', 'Business Bay')

print('{} venues were returned by Foursquare.'.format(df_Business_Bay.shape[0]))
df_Business_Bay.head()

25 venues were returned by Foursquare.


Unnamed: 0,neighborhood,name,categories,lat,lng
0,Business Bay,Gulf Court Hotel Business Bay,Hotel,25.182244,55.274908
1,Business Bay,"Renaissance Downtown Hotel, Dubai",Hotel,25.185675,55.27365
2,Business Bay,Basta!,Italian Restaurant,25.18575,55.273635
3,Business Bay,BHAR,Middle Eastern Restaurant,25.185598,55.273534
4,Business Bay,Six Senses Spa Dubai,Spa,25.185825,55.273563


**index # 12 Jumeirah Lakes Towers**

In [21]:
#define objects for 'JLT' index [12] in Geo
neighborhood_latitude = Geo.loc[12, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = Geo.loc[12, 'Longitude'] # neighborhood longitude value
neighborhood_name = Geo.loc[12, 'Neighborhood'] # neighborhood name

#step 1 - create the correct GET request URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

results = requests.get(url).json()

venues = results['response']['groups'][0]['items']
    
df_JLT = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
df_JLT = df_JLT.loc[:, filtered_columns]

# filter the category for each row
df_JLT['venue.categories'] = df_JLT.apply(get_category_type, axis=1)

# clean columns

df_JLT.columns = [col.split(".")[-1] for col in df_JLT.columns]
df_JLT.insert(0, 'neighborhood', 'Jumeirah Lakes Towers')

print('{} venues were returned by Foursquare.'.format(df_JLT.shape[0]))
df_JLT.head()

35 venues were returned by Foursquare.


Unnamed: 0,neighborhood,name,categories,lat,lng
0,Jumeirah Lakes Towers,Bait Maryam,Theme Restaurant,25.070765,55.141889
1,Jumeirah Lakes Towers,Betawi,Indonesian Restaurant,25.069993,55.141876
2,Jumeirah Lakes Towers,Fidelity Fitness Club,Gym,25.068784,55.14166
3,Jumeirah Lakes Towers,Wokyo Noodle Bar,Noodle House,25.068151,55.140931
4,Jumeirah Lakes Towers,Golositalia,Italian Restaurant,25.069462,55.140268


**index # 13 Barsha Heights**

In [22]:
#define objects for 'Barsha Heights' index [13] in Geo
neighborhood_latitude = Geo.loc[13, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = Geo.loc[13, 'Longitude'] # neighborhood longitude value
neighborhood_name = Geo.loc[13, 'Neighborhood'] # neighborhood name

#step 1 - create the correct GET request URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

results = requests.get(url).json()

venues = results['response']['groups'][0]['items']
    
df_Barsha = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
df_Barsha = df_Barsha.loc[:, filtered_columns]

# filter the category for each row
df_Barsha['venue.categories'] = df_Barsha.apply(get_category_type, axis=1)

# clean columns

df_Barsha.columns = [col.split(".")[-1] for col in df_Barsha.columns]
df_Barsha.insert(0, 'neighborhood', 'Barsha Heights')

print('{} venues were returned by Foursquare.'.format(df_Barsha.shape[0]))
df_Barsha.head()

62 venues were returned by Foursquare.


Unnamed: 0,neighborhood,name,categories,lat,lng
0,Barsha Heights,AURIS INN Al MUHANNA Hotel,Hotel,25.09475,55.177058
1,Barsha Heights,TRYP by Wyndham Dubai,Hotel,25.097234,55.174834
2,Barsha Heights,Beef King,Chinese Restaurant,25.096673,55.175715
3,Barsha Heights,Fuchsia,Thai Restaurant,25.095363,55.178584
4,Barsha Heights,MMA Fitness Center,Gym / Fitness Center,25.096647,55.175727
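
The cells above repeat the same steps for every neighborhood: read the venue list from the response, keep the name, first category, and coordinates, and tag each row with the neighborhood. A minimal sketch of a reusable helper follows, assuming the response has the shape shown in the JSON output earlier (`response` → `groups[0]` → `items` → `venue`). The function `extract_venues` is hypothetical and uses plain Python; its result can be passed to `pd.DataFrame(...)`.

```python
def extract_venues(results, neighborhood):
    """Flatten a Foursquare 'explore' response into one dict per venue."""
    rows = []
    groups = results.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        rows.append({
            "neighborhood": neighborhood,
            "name": venue["name"],
            # keep only the first category name, as get_category_type does
            "categories": categories[0]["name"] if categories else None,
            "lat": venue["location"]["lat"],
            "lng": venue["location"]["lng"],
        })
    return rows
```

A loop over the rows of `Geo` could then call this once per neighborhood and concatenate the results, instead of copying the cell six times.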


**analysis of venue distribution**

In [23]:
df_venues = pd.concat([df_Barsha, df_JLT, df_Business_Bay, df_DIFC, df_Marina, df_ASufouh])
df_venues['count'] = 1
df_venues.shape

(295, 6)

In [24]:
total_venues = pd.pivot_table(df_venues, index=["neighborhood"], values=["count"], aggfunc="sum")
total_venues

neighborhood,count
Al Sufouh,4
Barsha Heights,62
Business Bay,25
DIFC,76
Dubai Marina,93
Jumeirah Lakes Towers,35


In [25]:
# keep venues whose category mentions 'Restaurant' or 'Hotel'; na=False skips venues with no category
df_venues_rest = df_venues[df_venues['categories'].str.contains('Restaurant', na=False)].reset_index(drop=True)
df_venues_rest['Venue Type'] = 'Restaurant'
df_venues_hotel = df_venues[df_venues['categories'].str.contains('Hotel', na=False)].reset_index(drop=True)
df_venues_hotel['Venue Type'] = 'Hotel'
df_venues_final = pd.concat([df_venues_rest, df_venues_hotel]).reset_index(drop=True)
df_venues_final.shape

(136, 7)
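
The filter above matches on substrings, so any category containing the word 'Hotel' is labeled as a hotel. The Barsha Heights summary later in this notebook lists 'Hotel Bar' among its common venues, which suggests such categories are included. The sketch below shows a stricter alternative, assuming only the exact category 'Hotel' should count. The function `venue_type` is hypothetical, and using it would change the counts reported in this notebook.

```python
def venue_type(category):
    """Map a Foursquare category name to 'Restaurant', 'Hotel', or None."""
    if not category:           # None or empty string: no category was returned
        return None
    if "Restaurant" in category:
        return "Restaurant"
    if category == "Hotel":    # exact match, so 'Hotel Bar' is not counted as a hotel
        return "Hotel"
    return None
```

Whether 'Hotel Bar' belongs with hotels or should be left out depends on the question being asked, so the choice is worth stating explicitly in the analysis.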

In [26]:
pivot = pd.pivot_table(df_venues_final, index=["neighborhood", "Venue Type"], values=["count"], aggfunc="sum")
pivot

neighborhood,Venue Type,count
Barsha Heights,Hotel,14
Barsha Heights,Restaurant,17
Business Bay,Hotel,5
Business Bay,Restaurant,12
DIFC,Hotel,5
DIFC,Restaurant,27
Dubai Marina,Hotel,11
Dubai Marina,Restaurant,29
Jumeirah Lakes Towers,Hotel,1
Jumeirah Lakes Towers,Restaurant,15


In [27]:
df_venues_final.groupby('neighborhood')['Venue Type']\
    .value_counts()\
    .unstack(level=1)\
    .plot.bar(stacked=True)

<matplotlib.axes._subplots.AxesSubplot at 0x9c98350>

**Create a 'one hot' dataframe with dummy values by venue category**

In [28]:
# one hot encoding
dubai_onehot = pd.get_dummies(df_venues_final[['categories']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dubai_onehot['neighborhood'] = df_venues_final['neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [dubai_onehot.columns[-1]] + list(dubai_onehot.columns[:-1])
dubai_onehot = dubai_onehot[fixed_columns]

dubai_onehot.head()

Unnamed: 0,neighborhood,African Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Caribbean Restaurant,Chinese Restaurant,Comfort Food Restaurant,English Restaurant,Fast Food Restaurant,...,Peruvian Restaurant,Restaurant,Russian Restaurant,Seafood Restaurant,Sushi Restaurant,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Turkish Restaurant,Vietnamese Restaurant
0,Barsha Heights,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Barsha Heights,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
2,Barsha Heights,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Barsha Heights,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Barsha Heights,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0


**Next, let's group rows by neighborhood and take the mean of each one-hot column, which gives the frequency of each category within the neighborhood**

In [29]:
dubai_grouped = dubai_onehot.groupby('neighborhood').mean().reset_index()
dubai_grouped

Unnamed: 0,neighborhood,African Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Caribbean Restaurant,Chinese Restaurant,Comfort Food Restaurant,English Restaurant,Fast Food Restaurant,...,Peruvian Restaurant,Restaurant,Russian Restaurant,Seafood Restaurant,Sushi Restaurant,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Turkish Restaurant,Vietnamese Restaurant
0,Barsha Heights,0.0,0.0,0.0,0.0,0.0,0.032258,0.032258,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.032258,0.0
1,Business Bay,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,...,0.0,0.176471,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0
2,DIFC,0.03125,0.03125,0.0625,0.03125,0.0,0.03125,0.0,0.03125,0.03125,...,0.03125,0.125,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0
3,Dubai Marina,0.0,0.025,0.1,0.0,0.025,0.025,0.0,0.025,0.025,...,0.0,0.05,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Jumeirah Lakes Towers,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0625,0.0625,0.0,0.125


In [30]:
dubai_grouped.shape

(5, 37)
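
Taking the mean of a one-hot column within a neighborhood is the same as dividing the number of venues in that category by the neighborhood's total venue count. The plain-Python sketch below illustrates that arithmetic on made-up rows. The function `category_frequencies` and the sample data are hypothetical and are not taken from the Foursquare results.

```python
from collections import Counter, defaultdict

def category_frequencies(venues):
    """venues: iterable of (neighborhood, category) pairs.

    Returns {neighborhood: {category: share of that neighborhood's venues}}.
    Categories absent from a neighborhood are simply missing here, whereas
    the one-hot dataframe stores them as 0.
    """
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for hood, cats in counts.items()
    }

# Made-up rows for illustration only.
sample = [("A", "Hotel"), ("A", "Hotel"), ("A", "Cafe"), ("A", "Gym"), ("B", "Hotel")]
freqs = category_frequencies(sample)
```

In the sample, neighborhood "A" has four venues, two of which are hotels, so its 'Hotel' frequency is 0.5.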

**Let's print each neighborhood along with the top 5 most common venues**

In [31]:
num_top_venues = 5

for hood in dubai_grouped['neighborhood']:
    print("----"+hood+"----")
    temp = dubai_grouped[dubai_grouped['neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Barsha Heights----
                       venue  freq
0                      Hotel  0.42
1  Middle Eastern Restaurant  0.23
2         Italian Restaurant  0.06
3          Indian Restaurant  0.03
4         Turkish Restaurant  0.03


----Business Bay----
                       venue  freq
0                      Hotel  0.24
1                 Restaurant  0.18
2  Middle Eastern Restaurant  0.12
3         Italian Restaurant  0.12
4        Japanese Restaurant  0.06


----DIFC----
                venue  freq
0               Hotel  0.16
1          Restaurant  0.12
2  Italian Restaurant  0.12
3   Indian Restaurant  0.09
4    Asian Restaurant  0.06


----Dubai Marina----
                       venue  freq
0                      Hotel  0.25
1         Italian Restaurant  0.15
2  Middle Eastern Restaurant  0.12
3           Asian Restaurant  0.10
4                 Restaurant  0.05


----Jumeirah Lakes Towers----
                        venue  freq
0       Vietnamese Restaurant  0.12
1          Ita

**First, let's write a function to sort the venues in descending order.**

In [32]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [33]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions after the 3rd fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['neighborhood'] = dubai_grouped['neighborhood']

for ind in np.arange(dubai_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dubai_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Barsha Heights,Hotel,Middle Eastern Restaurant,Italian Restaurant,French Restaurant,Hotel Bar
1,Business Bay,Hotel,Restaurant,Italian Restaurant,Middle Eastern Restaurant,Japanese Restaurant
2,DIFC,Hotel,Italian Restaurant,Restaurant,Indian Restaurant,Asian Restaurant
3,Dubai Marina,Hotel,Italian Restaurant,Middle Eastern Restaurant,Asian Restaurant,Restaurant
4,Jumeirah Lakes Towers,Vietnamese Restaurant,Indian Restaurant,Italian Restaurant,Thai Restaurant,Molecular Gastronomy Restaurant
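
Ranking the categories amounts to sorting them by frequency in descending order and keeping the first few. When two categories have the same frequency their relative order is not fixed: the two Business Bay listings above show 'Italian Restaurant' and 'Middle Eastern Restaurant' in different orders. The sketch below breaks ties alphabetically so the result is reproducible. The function `top_categories` is hypothetical, and the sample values are the rounded Business Bay frequencies printed earlier.

```python
def top_categories(freqs, n=5):
    """Return the n category names with the highest frequency.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:n]]

# Rounded Business Bay frequencies from the printed output above.
business_bay = {
    "Hotel": 0.24,
    "Restaurant": 0.18,
    "Middle Eastern Restaurant": 0.12,
    "Italian Restaurant": 0.12,
    "Japanese Restaurant": 0.06,
}
top3 = top_categories(business_bay, 3)
```

Here `top3` is `['Hotel', 'Restaurant', 'Italian Restaurant']`, because 'Italian Restaurant' sorts before 'Middle Eastern Restaurant' when the frequencies are equal.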


In [35]:
#define objects for 'DIFC' index [10] in Geo
neighborhood_latitude = Geo.loc[10, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = Geo.loc[10, 'Longitude'] # neighborhood longitude value
neighborhood_name = Geo.loc[10, 'Neighborhood'] # neighborhood name

map_DIFC = folium.Map(location=[neighborhood_latitude, neighborhood_longitude], zoom_start=15)

# add markers to map
for lat, lng, name, categories in zip(df_DIFC['lat'], df_DIFC['lng'], df_DIFC['name'], df_DIFC['categories']):
  label = '{},{}'.format(categories,name)
  label = folium.Popup(label, parse_html=True)
  folium.CircleMarker(
      [lat, lng],
      radius=5,
      popup=label,
      color='blue',
      fill=True,
      fill_color='#3181cc',
      fill_opacity=0.7).add_to(map_DIFC) 
    
map_DIFC