# Week 5 Final Report

## Best suburbs for opening a new Shopping Mall in Belgrade, Serbia

1. Build a dataframe of neighborhoods in Belgrade, Serbia by scraping the data from a Wikipedia page
2. Get the geographical coordinates of the neighborhoods
3. Obtain the venue data for the neighborhoods from the Foursquare API
4. Explore and cluster the neighborhoods
5. Select the best cluster to open a new shopping mall

**Import libraries**

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium --yes
import folium # map rendering library

print("Libraries imported.")

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Libraries imported.


**Scrape data from Wikipedia page into a DataFrame**

In [2]:
# download the category page and parse the HTML
data = requests.get("https://en.wikipedia.org/wiki/Category:Suburbs_of_Belgrade").text
soup = BeautifulSoup(data, 'html.parser')

# collect the text of every list item in the category listing
neighborhoodList = []
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)

# drop the first and last list entries
neighborhoodList = neighborhoodList[1:-1]
bgd_df = pd.DataFrame({"Neighborhood": neighborhoodList})

print(bgd_df.size)
bgd_df.head()

133


Unnamed: 0,Neighborhood
0,Amerić
1,"Arapovac, Serbia"
2,Arnajevo
3,Babe (Sopot)
4,Baćevac


**Getting the geographical coordinates**

In [3]:
# define a function to get coordinates
def get_latlng(neighborhood, max_attempts=5):
    # try the geocoder a limited number of times so a failed lookup cannot loop forever
    for _ in range(max_attempts):
        g = geocoder.arcgis('{}, Belgrade, Serbia'.format(neighborhood))
        if g.latlng is not None:
            return g.latlng
    # no result: return missing values so the row can be inspected later
    return [None, None]

coords = [get_latlng(neighborhood) for neighborhood in bgd_df["Neighborhood"].tolist()]
print(coords[:10])

[[44.81310000000008, 20.46329000000003], [44.81310000000008, 20.46329000000003], [44.81310000000008, 20.46329000000003], [44.53606000000008, 20.535560000000032], [44.81310000000008, 20.46329000000003], [44.81310000000008, 20.46329000000003], [44.570340000000044, 20.42318000000006], [44.71777000000003, 20.428480000000036], [44.398610000000076, 20.36889000000002], [44.29989000000006, 20.27323000000007]]


In [4]:
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [5]:
bgd_df['Latitude'] = df_coords['Latitude']
bgd_df['Longitude'] = df_coords['Longitude']

In [6]:
print(bgd_df.shape)
bgd_df.head(10)

(133, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Amerić,44.8131,20.46329
1,"Arapovac, Serbia",44.8131,20.46329
2,Arnajevo,44.8131,20.46329
3,Babe (Sopot),44.53606,20.53556
4,Baćevac,44.8131,20.46329
5,Baljevac,44.8131,20.46329
6,Barajevo,44.57034,20.42318
7,Barič,44.71777,20.42848
8,Baroševac,44.39861,20.36889
9,Barzilovica,44.29989,20.27323
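Five of the ten rows above share the coordinates (44.8131, 20.46329). Identical coordinates for different place names may mean the geocoder returned a city-level match instead of the suburb itself. That is an assumption worth checking, not something the notebook confirms. The sketch below flags such rows using only the ten rows displayed above; `shared_coordinates` and `min_count` are illustrative names.

```python
from collections import Counter

# The ten (name, latitude, longitude) rows shown in the table above.
rows = [
    ("Amerić", 44.8131, 20.46329),
    ("Arapovac, Serbia", 44.8131, 20.46329),
    ("Arnajevo", 44.8131, 20.46329),
    ("Babe (Sopot)", 44.53606, 20.53556),
    ("Baćevac", 44.8131, 20.46329),
    ("Baljevac", 44.8131, 20.46329),
    ("Barajevo", 44.57034, 20.42318),
    ("Barič", 44.71777, 20.42848),
    ("Baroševac", 44.39861, 20.36889),
    ("Barzilovica", 44.29989, 20.27323),
]


def shared_coordinates(rows, min_count=2):
    """Map each coordinate pair used by at least min_count rows to its names."""
    counts = Counter((lat, lng) for _, lat, lng in rows)
    shared = {}
    for name, lat, lng in rows:
        if counts[(lat, lng)] >= min_count:
            shared.setdefault((lat, lng), []).append(name)
    return shared


for coords, names in shared_coordinates(rows).items():
    print(coords, len(names), names)
```

On the full dataframe, `bgd_df.duplicated(subset=['Latitude', 'Longitude'], keep=False)` gives a comparable boolean mask. Neighborhoods flagged this way will receive identical venue lists in the Foursquare step, so they are worth reviewing before clustering.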


In [7]:
bgd_df.to_csv("bgd_df.csv", index=False)

**Create a map of Belgrade with neighborhoods superimposed on top**

In [8]:
# get the coordinates of Belgrade, Republic of Serbia/Europe
address = 'Belgrade, Serbia'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Belgrade, Serbia are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Belgrade, Serbia are 44.8178131, 20.4568974.


In [9]:
# create map of Belgrade suburbs using latitude and longitude values
map_bgd = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(bgd_df['Latitude'], bgd_df['Longitude'], bgd_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_bgd)  
    
map_bgd

In [10]:
# save the map as HTML file
map_bgd.save('map_bgd.html')

**Use the Foursquare API to explore the neighborhoods**

In [11]:
# define Foursquare credentials and version
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20191215' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET
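Credentials are usually better kept out of the notebook itself, since anyone who can read the file can read them. A minimal sketch of one option is to load them from environment variables and mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, not part of the original notebook.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from a mapping (the environment by default).

    The variable names are hypothetical; fail early if either is missing.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None


def mask(value, visible=4):
    """Show only the first few characters of a secret when logging."""
    return value[:visible] + "*" * max(len(value) - visible, 0)


# Example with a stand-in mapping instead of the real environment.
demo_env = {"FOURSQUARE_CLIENT_ID": "abcd1234", "FOURSQUARE_CLIENT_SECRET": "wxyz5678"}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
```

In the notebook, calling `load_foursquare_credentials()` with no argument would read the real environment, so the values never appear in the saved output.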


In [12]:
radius = 2500
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(bgd_df['Latitude'], bgd_df['Longitude'], bgd_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
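The request loop above assumes every response contains at least one group and every venue has at least one category. If either is missing, it raises a `KeyError` or `IndexError`. The sketch below shows one defensive variant, under the assumption that the response has the same nested shape the cell above indexes into. `build_explore_url` and `extract_venues` are hypothetical helper names, and the second item in `sample_payload` is invented to exercise the skip path.

```python
from urllib.parse import urlencode

# Endpoint used in the cell above.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore request URL with URL-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_URL + "?" + urlencode(params)


def extract_venues(payload, neighborhood, lat, lng):
    """Flatten one decoded JSON response into tuples, skipping incomplete items."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        if not categories or "lat" not in location or "lng" not in location:
            continue  # the original loop would raise on these items
        rows.append((neighborhood, lat, lng, venue.get("name"),
                     location["lat"], location["lng"], categories[0]["name"]))
    return rows


# First item mirrors a row from the venues table; the second is a made-up
# venue without categories.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Pozorište na Terazijama",
               "location": {"lat": 44.81275, "lng": 20.461343},
               "categories": [{"name": "Theater"}]}},
    {"venue": {"name": "Uncategorized place",
               "location": {"lat": 44.81, "lng": 20.46},
               "categories": []}},
]}]}}

print(extract_venues(sample_payload, "Amerić", 44.8131, 20.46329))
```

Passing the parameters through `urlencode` also keeps the URL valid if a value ever contains characters that need escaping.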

In [13]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(7583, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Amerić,44.8131,20.46329,Pozorište na Terazijama,44.81275,20.461343,Theater
1,Amerić,44.8131,20.46329,Azbuka Gastro Progres,44.810942,20.461822,Jazz Club
2,Amerić,44.8131,20.46329,Café Moskva,44.812868,20.460984,Café
3,Amerić,44.8131,20.46329,Club svetskih putnika | The Club of Globe Trot...,44.816656,20.464364,Lounge
4,Amerić,44.8131,20.46329,Trg Nikole Pašića,44.812492,20.463187,Plaza


In [14]:
venues_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Amerić,100,100,100,100,100,100
"Arapovac, Serbia",100,100,100,100,100,100
Arnajevo,100,100,100,100,100,100
Babe (Sopot),4,4,4,4,4,4
Baljevac,100,100,100,100,100,100
Barajevo,7,7,7,7,7,7
Barič,35,35,35,35,35,35
Baroševac,1,1,1,1,1,1
Barzilovica,3,3,3,3,3,3
Bastav,100,100,100,100,100,100


In [15]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 168 unique categories.


In [16]:
# print out the first 50 categories
venues_df['VenueCategory'].unique()[:50]

array(['Theater', 'Jazz Club', 'Café', 'Lounge', 'Plaza', 'Hotel',
       'Coffee Shop', 'Boarding House', 'Cosmetics Shop', 'Buffet',
       'Park', 'Hostel', 'Event Space', 'Beer Bar', 'Art Gallery',
       'Restaurant', 'Meze Restaurant', 'Gym', 'Pedestrian Plaza',
       'BBQ Joint', 'Bar', 'Gastropub', 'Ice Cream Shop', 'Jewelry Store',
       'Dessert Shop', 'Creperie', 'Sandwich Place', 'Pizza Place',
       'Candy Store', 'Eastern European Restaurant', 'Italian Restaurant',
       'Salad Place', 'Bookstore', 'Arcade', 'Nightclub',
       'Vegetarian / Vegan Restaurant', 'Cocktail Bar',
       'Sushi Restaurant', 'Burrito Place', 'Museum', 'Bed & Breakfast',
       'Cultural Center', 'Wine Bar', 'Clothing Store', 'Pie Shop',
       'Track', 'Indoor Play Area', 'Botanical Garden',
       'Falafel Restaurant', 'Stables'], dtype=object)

In [17]:
# check if the results contain "Shopping Mall"
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

**Analyze Each Neighborhood**

In [18]:
# one hot encoding
bgd_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
bgd_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [bgd_onehot.columns[-1]] + list(bgd_onehot.columns[:-1])
bgd_onehot = bgd_onehot[fixed_columns]

print(bgd_onehot.shape)
bgd_onehot.head()

(7583, 169)


Unnamed: 0,Neighborhoods,Airport,American Restaurant,Arcade,Art Gallery,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Bakery,Bar,Basketball Court,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bistro,Boarding House,Bookstore,Botanical Garden,Boutique,Bowling Alley,Brewery,Buffet,Building,Burger Joint,Burrito Place,Bus Station,Bus Stop,Butcher,Café,Campanian Restaurant,Campground,Candy Store,Car Wash,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Cosmetics Shop,Creperie,Cultural Center,Dessert Shop,Diner,Doner Restaurant,Drugstore,Eastern European Restaurant,Electronics Store,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Flea Market,Food,Food Court,Forest,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,German Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Home Service,Hostel,Hotel,IT Services,Ice Cream Shop,Indoor Play Area,Irish Pub,Italian Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kids Store,Lake,Lounge,Market,Massage Studio,Mediterranean Restaurant,Mexican Restaurant,Meze Restaurant,Mobile Phone Shop,Modern European Restaurant,Mountain,Movie Theater,Moving Target,Multiplex,Museum,Nature Preserve,Neighborhood,Nightclub,Noodle House,Opera House,Outlet Store,Palace,Park,Pedestrian Plaza,Pet Store,Pie Shop,Pizza Place,Playground,Plaza,Pool,Pool Hall,Print Shop,Pub,Racecourse,Rafting,Recreation Center,Resort,Restaurant,River,Rock Climbing Spot,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,Sausage Shop,Scenic Lookout,Seafood Restaurant,Shopping Mall,Shopping Plaza,Snack Place,Soccer Field,Soccer Stadium,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Stables,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Tennis Court,Tennis Stadium,Theater,Theme Park,Toll Booth,Toy / Game Store,Track,Trail,Train Station,Tunnel,Vegetarian / Vegan Restaurant,Vineyard,Wine Bar
0,Amerić,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
1,Amerić,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Amerić,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Amerić,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Amerić,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [19]:
bgd_grouped = bgd_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(bgd_grouped.shape)
bgd_grouped

(117, 169)


Unnamed: 0,Neighborhoods,Airport,American Restaurant,Arcade,Art Gallery,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Bakery,Bar,Basketball Court,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bistro,Boarding House,Bookstore,Botanical Garden,Boutique,Bowling Alley,Brewery,Buffet,Building,Burger Joint,Burrito Place,Bus Station,Bus Stop,Butcher,Café,Campanian Restaurant,Campground,Candy Store,Car Wash,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Cosmetics Shop,Creperie,Cultural Center,Dessert Shop,Diner,Doner Restaurant,Drugstore,Eastern European Restaurant,Electronics Store,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Flea Market,Food,Food Court,Forest,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,German Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Home Service,Hostel,Hotel,IT Services,Ice Cream Shop,Indoor Play Area,Irish Pub,Italian Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kids Store,Lake,Lounge,Market,Massage Studio,Mediterranean Restaurant,Mexican Restaurant,Meze Restaurant,Mobile Phone Shop,Modern European Restaurant,Mountain,Movie Theater,Moving Target,Multiplex,Museum,Nature Preserve,Neighborhood,Nightclub,Noodle House,Opera House,Outlet Store,Palace,Park,Pedestrian Plaza,Pet Store,Pie Shop,Pizza Place,Playground,Plaza,Pool,Pool Hall,Print Shop,Pub,Racecourse,Rafting,Recreation Center,Resort,Restaurant,River,Rock Climbing Spot,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,Sausage Shop,Scenic Lookout,Seafood Restaurant,Shopping Mall,Shopping Plaza,Snack Place,Soccer Field,Soccer Stadium,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Stables,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Tennis Court,Tennis Stadium,Theater,Theme Park,Toll Booth,Toy / Game Store,Track,Trail,Train Station,Tunnel,Vegetarian / Vegan Restaurant,Vineyard,Wine Bar
0,Amerić,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.07,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.01,0.03,0.02,0.0,0.01,0.01,0.01,0.03,0.0,0.0,0.0,0.04,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.08,0.0,0.04,0.01,0.0,0.03,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.02,0.0,0.01,0.03,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01
1,"Arapovac, Serbia",0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.07,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.01,0.03,0.02,0.0,0.01,0.01,0.01,0.03,0.0,0.0,0.0,0.04,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.08,0.0,0.04,0.01,0.0,0.03,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.02,0.0,0.01,0.03,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01
2,Arnajevo,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.07,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.01,0.03,0.02,0.0,0.01,0.01,0.01,0.03,0.0,0.0,0.0,0.04,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.08,0.0,0.04,0.01,0.0,0.03,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.02,0.0,0.01,0.03,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01
3,Babe (Sopot),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Baljevac,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.07,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.01,0.03,0.02,0.0,0.01,0.01,0.01,0.03,0.0,0.0,0.0,0.04,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.08,0.0,0.04,0.01,0.0,0.03,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.02,0.0,0.01,0.03,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01
5,Barajevo,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.571429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Barič,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.028571,0.057143,0.0,0.0,0.028571,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.028571,0.0,0.028571,0.0,0.0,0.228571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.085714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.171429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.028571,0.0,0.0,0.0
7,Baroševac,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Barzilovica,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Bastav,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.07,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.01,0.03,0.02,0.0,0.01,0.01,0.01,0.03,0.0,0.0,0.0,0.04,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.08,0.0,0.04,0.01,0.0,0.03,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.02,0.0,0.01,0.03,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01
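Each value in the grouped table is the mean of a 0/1 column, which is the share of a neighborhood's venues that fall in that category. The plain-Python sketch below computes the same shares without pandas to make that explicit. The venue list and the `category_shares` helper are hypothetical.

```python
from collections import Counter, defaultdict


def category_shares(venues):
    """Return {neighborhood: {category: share of that neighborhood's venues}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    shares = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        shares[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return shares


# Hypothetical venue list: 7 venues in one neighborhood, 4 of them cafés.
venues = [("A", "Café")] * 4 + [("A", "Bus Station"), ("A", "Restaurant"), ("A", "Park")]
shares = category_shares(venues)
print(round(shares["A"]["Café"], 6))  # 4 / 7 -> 0.571429
```

The result, 0.571429, matches the largest share in the Barajevo row above, a neighborhood with 7 venues. It also shows why neighborhoods with very few venues produce large, unstable shares: a single venue is 100% of the total for Baroševac.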


In [20]:
len(bgd_grouped[bgd_grouped["Shopping Mall"] > 0])

6

**Create a new DataFrame for Shopping Mall data only**

In [21]:
bgd_mall = bgd_grouped[["Neighborhoods","Shopping Mall"]]
bgd_mall

Unnamed: 0,Neighborhoods,Shopping Mall
0,Amerić,0.0
1,"Arapovac, Serbia",0.0
2,Arnajevo,0.0
3,Babe (Sopot),0.0
4,Baljevac,0.0
5,Barajevo,0.0
6,Barič,0.0
7,Baroševac,0.0
8,Barzilovica,0.0
9,Bastav,0.0


**Cluster Neighborhoods**

In [22]:
# set number of clusters
kclusters = 3

bgd_clustering = bgd_mall.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bgd_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
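Because the clustering uses a single feature, k-means here amounts to splitting the Shopping Mall frequencies into three ranges. The sketch below is a plain-Python version of Lloyd's algorithm in one dimension, intended only to illustrate the idea. It is not scikit-learn's implementation, which uses k-means++ initialization by default, and the evenly spaced starting centroids are an illustrative assumption. The input values are frequencies that appear in this notebook's output tables.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on a list of floats; returns (labels, centroids).

    Centroids start evenly spaced between the minimum and maximum value,
    which is an illustrative choice rather than scikit-learn's default.
    """
    lo, hi = min(values), max(values)
    if k == 1:
        centroids = [(lo + hi) / 2]
    else:
        centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # assignment step: nearest centroid for every value
        new_labels = [
            min(range(k), key=lambda j: abs(v - centroids[j])) for v in values
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = sum(members) / len(members)
    return labels, centroids


# Shopping Mall frequencies that appear in this notebook's output tables.
freqs = [0.0, 0.0, 0.0, 0.047619, 0.047619, 0.052632, 0.076923, 0.133333]
labels, centroids = kmeans_1d(freqs, k=3)
print(labels)  # [0, 0, 0, 1, 1, 1, 1, 2]
```

Cluster numbers carry no meaning in themselves. With scikit-learn the numbering can differ between runs or settings, so cluster labels should be interpreted by comparing the values inside each cluster.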

In [23]:
# create a new dataframe that includes the cluster label alongside the shopping mall frequency for each neighborhood
bgd_merged = bgd_mall.copy()

# add clustering labels
bgd_merged["Cluster Labels"] = kmeans.labels_

bgd_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
bgd_merged

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels
0,Amerić,0.0,0
1,"Arapovac, Serbia",0.0,0
2,Arnajevo,0.0,0
3,Babe (Sopot),0.0,0
4,Baljevac,0.0,0
5,Barajevo,0.0,0
6,Barič,0.0,0
7,Baroševac,0.0,0
8,Barzilovica,0.0,0
9,Bastav,0.0,0


In [24]:
# join bgd_merged with bgd_df to add latitude/longitude for each neighborhood
bgd_merged = bgd_merged.join(bgd_df.set_index("Neighborhood"), on="Neighborhood")

print(bgd_merged.shape)
bgd_merged.head() # check the last columns!

(117, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Amerić,0.0,0,44.8131,20.46329
1,"Arapovac, Serbia",0.0,0,44.8131,20.46329
2,Arnajevo,0.0,0,44.8131,20.46329
3,Babe (Sopot),0.0,0,44.53606,20.53556
4,Baljevac,0.0,0,44.8131,20.46329


In [25]:
# sort the results by Cluster Labels
print(bgd_merged.shape)
bgd_merged.sort_values(["Cluster Labels"], inplace=True)
bgd_merged

(117, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Amerić,0.0,0,44.8131,20.46329
83,Rogača (Sopot),0.0,0,44.45852,20.52115
82,Ritopek,0.0,0,44.8131,20.46329
81,Ripanj,0.0,0,44.8131,20.46329
80,Ralja (Sopot),0.0,0,44.57458,20.55994
78,Radmilovac,0.0,0,44.8131,20.46329
77,Radiofar,0.0,0,44.8131,20.46329
76,Rabrovac,0.0,0,44.8131,20.46329
75,Pudarci,0.0,0,44.8131,20.46329
74,Progar,0.0,0,44.8131,20.46329


**Visualizing the resulting clusters**

In [63]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(bgd_merged['Latitude'], bgd_merged['Longitude'], bgd_merged['Neighborhood'], bgd_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    # cluster labels start at 0, so index the color list directly
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

**Examine the neighborhoods in each cluster**

In [27]:
bgd_merged.loc[bgd_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Amerić,0.0,0,44.8131,20.46329
83,Rogača (Sopot),0.0,0,44.45852,20.52115
82,Ritopek,0.0,0,44.8131,20.46329
81,Ripanj,0.0,0,44.8131,20.46329
80,Ralja (Sopot),0.0,0,44.57458,20.55994
78,Radmilovac,0.0,0,44.8131,20.46329
77,Radiofar,0.0,0,44.8131,20.46329
76,Rabrovac,0.0,0,44.8131,20.46329
75,Pudarci,0.0,0,44.8131,20.46329
74,Progar,0.0,0,44.8131,20.46329


In [28]:
bgd_merged.loc[bgd_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
97,Surčin,0.076923,1,44.79306,20.28028
65,Obrenovac,0.047619,1,44.65373,20.20248
89,Rvati (Obrenovac),0.047619,1,44.65668,20.18601
53,Lazarevac,0.052632,1,44.38497,20.2572


In [29]:
bgd_merged.loc[bgd_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
79,Rajkovac (Mladenovac),0.133333,2,44.463036,20.686049
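Before interpreting the clusters, it helps to rank them by their average Shopping Mall frequency, since the label numbers themselves are arbitrary. The sketch below does this with the values copied from the three tables above. Cluster 0 is represented only by the ten rows that are displayed, and `mean_by_cluster` is a hypothetical helper.

```python
from statistics import mean

# (cluster label, Shopping Mall frequency) pairs copied from the tables above;
# cluster 0 is represented by the ten rows that are displayed.
rows = [(0, 0.0)] * 10 + [
    (1, 0.076923), (1, 0.047619), (1, 0.047619), (1, 0.052632),
    (2, 0.133333),
]


def mean_by_cluster(rows):
    """Average the frequency values for each cluster label."""
    grouped = {}
    for label, freq in rows:
        grouped.setdefault(label, []).append(freq)
    return {label: mean(freqs) for label, freqs in grouped.items()}


means = mean_by_cluster(rows)
ranking = sorted(means, key=means.get, reverse=True)
print(ranking)  # [2, 1, 0]
```

On the full dataframe, `bgd_merged.groupby('Cluster Labels')['Shopping Mall'].mean()` gives the corresponding per-cluster averages. With the values shown, cluster 2 has the highest mall share, cluster 1 a moderate share, and cluster 0 the lowest.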


**Observations**

In the venue data collected for the suburbs of Belgrade, the capital of the Republic of Serbia, shopping malls make up the largest share of venues in cluster 2 (Rajkovac (Mladenovac), about 13%) and a moderate share in cluster 1 (Surčin, Obrenovac, Rvati (Obrenovac) and Lazarevac, roughly 5% to 8%), as the tables above show. Cluster 0, which contains all remaining neighborhoods, has very few to no shopping malls among its venues.

Cluster 0 therefore represents the greatest opportunity for new shopping malls, because there is little to no competition from existing malls. Shopping malls in cluster 2, by contrast, are the most likely to face intense competition, since malls already account for a comparatively high share of venues there.

This project recommends that property developers capitalize on these findings by opening new shopping malls in cluster 0 neighborhoods. Developers with a unique selling proposition that stands out from the competition could also consider cluster 1 neighborhoods, where competition is moderate. Developers are advised to avoid cluster 2, which already has the highest concentration of shopping malls.