# IBM Applied Data Science Capstone
### Week 5 Assignment
**Opening a New Shopping Mall in Bogota, Colombia**
- Build a dataframe of neighborhoods in Bogota, Colombia by scraping the data from a Wikipedia page
- Get the geographical coordinates of the Districts ("Localidades")
- Obtain the venue data for the Districts from Foursquare API
- Explore and cluster the Districts
- Select the best cluster to open a new shopping mall
<img alt="Startups based in Bogotá have raised capital of more than ..." src="https://es.investinbogota.org/sites/default/files/node/news/field_news_imagen/Emprendimientos%20en%20Bogota%CC%81.jpg" style="width: 433px; height: 225.415px; margin: 32.6426px 0px;">

<img alt="Location of the study site in the city of Bogotá, Colombia ..." src="https://www.researchgate.net/publication/319919188/figure/fig1/AS:540497013899264@1505875938341/Figura-1-Ubicacion-del-sitio-de-estudio-en-la-ciudad-de-Bogota-Colombia-Fuente-Mapa.png" style="width: 433px; height: 206.312px; margin: 42.1941px 0px;">

### 1. Import libraries

In [1]:
#pip install geocoder

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import; pandas.io.json.json_normalize is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported")

Libraries imported


### 2. Scrape data from Wikipedia page into a DataFrame

In [3]:
# send the GET request
data = requests.get("https://es.wikipedia.org/wiki/Anexo:Barrios_de_Bogot%C3%A1").text

In [4]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [5]:
# create a list to store neighborhood data
neighborhoodList = []

In [6]:
# append the data into the list 
for row in soup.find_all("div", class_="mw-parser-output")[0].find_all("td"):
    neighborhoodList.append(row.text)

In [7]:
# create a new DataFrame from the list
BG_df = pd.DataFrame({"Neighborhood": neighborhoodList})

In [8]:
# Clean the data: keep only the district names, stored here in the "Neighborhood" column (19 districts remain after filtering)
BG_df = BG_df[ (BG_df.Neighborhood.str[0] == "0")|(BG_df.Neighborhood.str[0] == "1")]
BG_df = BG_df[ BG_df.Neighborhood.str.len()>5 ]
BG_df = BG_df[ BG_df.Neighborhood.str.len()<20 ]
BG_df["Neighborhood"] = BG_df["Neighborhood"].str[:-1]
BG_df["Neighborhood"] = BG_df["Neighborhood"].str[3:]
BG_df= BG_df[BG_df.Neighborhood != "de Octubre"]
BG_df.drop_duplicates(inplace=True)
BG_df.reset_index(inplace=True)
BG_df.drop("index", axis=1,inplace=True)
BG_df

Unnamed: 0,Neighborhood
0,Usaquén
1,Chapinero
2,Santa Fe
3,San Cristóbal
4,Usme
5,Tunjuelito
6,Bosa
7,Kennedy
8,Fontibón
9,Engativá


In [9]:
# print the number of rows of the dataframe
BG_df.shape

(19, 1)

### 3. Get the geographical coordinates

In [10]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis(f"{neighborhood}, Bogota, Colombia")
        lat_lng_coords = g.latlng
    return lat_lng_coords
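
The loop above retries forever if the geocoding service never returns coordinates for a district. A minimal sketch of a bounded retry is shown below. The function name `get_latlng_bounded`, the `geocode` callable, and the `max_attempts` and `delay_seconds` parameters are illustrative assumptions, not part of the notebook; in practice `geocode` could wrap the same `geocoder.arcgis(...).latlng` call used above.

```python
import time
from typing import Callable, Optional, Sequence


def get_latlng_bounded(
    neighborhood: str,
    geocode: Callable[[str], Optional[Sequence[float]]],
    max_attempts: int = 3,
    delay_seconds: float = 1.0,
) -> Optional[Sequence[float]]:
    """Try to geocode a district at most `max_attempts` times.

    `geocode` is a hypothetical callable that takes a query string and
    returns [lat, lng] or None, e.g. lambda q: geocoder.arcgis(q).latlng.
    """
    query = f"{neighborhood}, Bogota, Colombia"
    for attempt in range(1, max_attempts + 1):
        coords = geocode(query)
        if coords is not None:
            return coords
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # brief pause before the next attempt
    return None  # the caller decides how to handle an unresolved district


# Demonstration with a fake geocoder that fails once, then succeeds.
calls = []


def flaky_geocode(query):
    calls.append(query)
    return None if len(calls) < 2 else [4.69259, -74.03009]


result = get_latlng_bounded("Usaquén", flaky_geocode, delay_seconds=0)
print(result, "after", len(calls), "attempts")
```

Returning `None` instead of looping lets the notebook report which districts failed rather than hanging.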

In [11]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in BG_df["Neighborhood"].tolist() ]

In [12]:
coords

[[4.692590000000052, -74.03008999999997],
 [4.638480000000072, -74.06020999999998],
 [4.594590000000039, -74.06404999999995],
 [4.576430000000073, -74.09313999999995],
 [4.4982800000000225, -74.10744999999997],
 [4.561820000000068, -74.12733999999995],
 [4.609740000000045, -74.18279999999999],
 [4.627480000000048, -74.17021999999997],
 [4.686370000000068, -74.15099999999995],
 [4.701270000000022, -74.11268999999999],
 [4.734380000000044, -74.08562999999998],
 [4.669710000000066, -74.07784999999996],
 [4.623290000000054, -74.07224999999994],
 [4.617910000000052, -74.07846999999998],
 [4.596580000000074, -74.11201999999997],
 [4.633340000000032, -74.10627999999997],
 [4.594370000000026, -74.07688999999993],
 [4.576500000000067, -74.11516999999998],
 [4.553670000000068, -74.14647999999994]]

In [13]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [14]:
# merge the coordinates into the original dataframe
BG_df['Latitude'] = df_coords['Latitude']
BG_df['Longitude'] = df_coords['Longitude']

In [15]:
# check the neighborhoods and the coordinates
print(BG_df.shape)
BG_df

(19, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Usaquén,4.69259,-74.03009
1,Chapinero,4.63848,-74.06021
2,Santa Fe,4.59459,-74.06405
3,San Cristóbal,4.57643,-74.09314
4,Usme,4.49828,-74.10745
5,Tunjuelito,4.56182,-74.12734
6,Bosa,4.60974,-74.1828
7,Kennedy,4.62748,-74.17022
8,Fontibón,4.68637,-74.151
9,Engativá,4.70127,-74.11269


In [16]:
# save the DataFrame as CSV file
BG_df.to_csv("BG_df.csv", index=False)

### 4. Create a map of Bogotá with neighborhoods superimposed on top

In [17]:
# get the coordinates of Bogota
address = "Bogota, Colombia"

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(f"The geographical coordinates of Bogotá, Colombia are {latitude}, {longitude}.")

The geographical coordinates of Bogotá, Colombia are 4.59808, -74.0760439.


In [18]:
# create map of Bogota using latitude and longitude values
map_BG = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(BG_df['Latitude'], BG_df['Longitude'], BG_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_BG)  
    
map_BG

In [19]:
# save the map as HTML file
map_BG.save('map_BG.html')

### 5. Use the Foursquare API to explore the neighborhoods

In [20]:
# define Foursquare Credentials and Version
CLIENT_ID = '3KFLCR4BJ2CY5V5CTKNPQZK0MZAW2RZSYSAJBY11FETIYKRW'
CLIENT_SECRET = 'K1QIW1EHNACESGU5ALSLQTDTYZPV02GWRKQUHK4EAOFOIT0T'
VERSION = '20180605' # Foursquare API version

print('My credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

My credentials:
CLIENT_ID: 3KFLCR4BJ2CY5V5CTKNPQZK0MZAW2RZSYSAJBY11FETIYKRW
CLIENT_SECRET: K1QIW1EHNACESGU5ALSLQTDTYZPV02GWRKQUHK4EAOFOIT0T


**Now, let's get the top 100 venues that are within a radius of 5000 meters (Districts in Bogotá are larger than Neighborhoods).**
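
With a 5000 m radius, the search areas of districts whose reference points lie close together will overlap, so the same venue can be returned for more than one district. The sketch below estimates the distance between the Santa Fe and La Candelaria coordinates obtained in section 3, using the standard haversine formula with an assumed mean Earth radius of 6,371 km. The function name `haversine_m` is illustrative and not part of the notebook.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    earth_radius_m = 6371000.0  # assumed mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


# Coordinates taken from the dataframe built in section 3.
santa_fe = (4.59459, -74.06405)
la_candelaria = (4.59437, -74.07689)

distance_m = haversine_m(*santa_fe, *la_candelaria)
search_radius_m = 5000

# Two circular search areas overlap when their centers are closer than 2 * radius.
print(f"distance: {distance_m:.0f} m, overlap: {distance_m < 2 * search_radius_m}")
```

The two reference points are roughly 1.4 km apart, so their search areas overlap almost entirely and the venue lists for these two districts are likely to be very similar.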

In [60]:
radius = 5000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(BG_df['Latitude'], BG_df['Longitude'], BG_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
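
The loop above indexes `venue['venue']['categories'][0]` directly, which raises an `IndexError` if a venue comes back without a category. A minimal sketch of a more defensive extraction step follows. It assumes the same response layout that the loop reads (`response` → `groups[0]` → `items`); the function name `extract_venues` and the second sample venue are hypothetical and only serve the illustration.

```python
from typing import Any, Dict, List, Tuple


def extract_venues(payload: Dict[str, Any], neighborhood: str,
                   lat: float, lng: float) -> List[Tuple]:
    """Flatten one explore-style response into rows, tolerating missing fields."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # Use None instead of failing when a venue has no category.
        category = categories[0]["name"] if categories else None
        rows.append((neighborhood, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# Hand-written sample payload mirroring the fields the notebook reads.
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Parque Usaquén",
                   "location": {"lat": 4.695163, "lng": -74.030927},
                   "categories": [{"name": "Park"}]}},
        # Hypothetical venue without a category, to exercise the fallback.
        {"venue": {"name": "Example venue without category",
                   "location": {"lat": 4.69, "lng": -74.03},
                   "categories": []}},
    ]}]}
}

rows = extract_venues(sample_payload, "Usaquén", 4.69259, -74.03009)
for row in rows:
    print(row)
```

Rows with a `None` category can then be dropped or labeled explicitly before one-hot encoding.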

In [61]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(1689, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Usaquén,4.69259,-74.03009,WeWork Usaquén,4.694304,-74.032745,Coworking Space
1,Usaquén,4.69259,-74.03009,Catación Pública,4.695898,-74.028142,Coffee Shop
2,Usaquén,4.69259,-74.03009,Parque Usaquén,4.695163,-74.030927,Park
3,Usaquén,4.69259,-74.03009,W Bogotá Hotel,4.693273,-74.034641,Hotel
4,Usaquén,4.69259,-74.03009,La Puerta De Alcalá,4.694399,-74.029996,Spanish Restaurant


**Let's check how many venues were returned for each neighborhood**

In [62]:
venues_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Antonio Nariño,100,100,100,100,100,100
Barrios Unidos,100,100,100,100,100,100
Bosa,55,55,55,55,55,55
Chapinero,100,100,100,100,100,100
Ciudad Bolívar,63,63,63,63,63,63
Engativá,100,100,100,100,100,100
Fontibón,100,100,100,100,100,100
Kennedy,62,62,62,62,62,62
La Candelaria,96,96,96,96,96,96
Mártires,100,100,100,100,100,100


**Let's find out how many unique categories can be curated from all the returned venues**

In [63]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 170 unique categories.


In [64]:
# print out the list of categories
venues_df['VenueCategory'].unique()

array(['Coworking Space', 'Coffee Shop', 'Park', 'Hotel',
       'Spanish Restaurant', 'Bar', 'Dessert Shop', 'Café', 'Dog Run',
       'Bed & Breakfast', 'Tea Room', 'French Restaurant', 'Gourmet Shop',
       'Indie Movie Theater', 'Gastropub', 'Supermarket', 'Spa',
       'Burger Joint', 'Gymnastics Gym', 'Hawaiian Restaurant',
       'Restaurant', 'Argentinian Restaurant', 'Whisky Bar',
       'Shopping Mall', 'Playground', 'Market', 'Gym / Fitness Center',
       'Pharmacy', 'Bookstore', 'Butcher', 'Asian Restaurant',
       'Golf Course', 'Mountain', 'Chinese Restaurant', 'Diner',
       'Seafood Restaurant', 'Pizza Place', 'Lounge', 'Creperie',
       'Scenic Lookout', 'Health Food Store', 'Ice Cream Shop', 'Brewery',
       'Theater', 'Garden Center', 'Buffet', 'Bakery', 'Gym',
       'Italian Restaurant', 'Snack Place', 'Record Shop',
       'Latin American Restaurant', 'Wings Joint', 'Gift Shop',
       'Breakfast Spot', 'Taco Place', 'Vegetarian / Vegan Restaurant',
       '

In [65]:
# check if the results contain "Shopping Mall"
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

### 6. Analyze Each Neighborhood

In [66]:
# one hot encoding
BG_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
BG_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [BG_onehot.columns[-1]] + list(BG_onehot.columns[:-1])
BG_onehot = BG_onehot[fixed_columns]

print(BG_onehot.shape)
BG_onehot.head()

(1689, 171)


Unnamed: 0,Neighborhoods,Airport,Airport Lounge,Airport Service,American Restaurant,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Bed & Breakfast,Big Box Store,Bike Rental / Bike Share,Bookstore,Botanical Garden,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Buffet,Building,Burger Joint,Burrito Place,Bus Station,Butcher,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Chinese Restaurant,Circus,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cultural Center,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dog Run,Donut Shop,Drugstore,Electronics Store,Empanada Restaurant,Event Space,Exhibit,Farmers Market,Fast Food Restaurant,Fish Market,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Garden Center,Gas Station,Gastropub,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Hardware Store,Hawaiian Restaurant,Health Food Store,Historic Site,History Museum,Hostel,Hot Dog Joint,Hotel,IT Services,Ice Cream Shop,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Juice Bar,Latin American Restaurant,Library,Lounge,Market,Martial Arts Dojo,Men's Store,Mexican Restaurant,Miscellaneous Shop,Mobile Phone Shop,Mountain,Movie Theater,Multiplex,Music Venue,Neighborhood,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Pool,Pub,Record Shop,Recreation Center,Restaurant,Roof Deck,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skating Rink,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Student Center,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Theater,Theme Park,Theme Restaurant,Toy / Game Store,Turkish Restaurant,University,Vegetarian / Vegan Restaurant,Veterinarian,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint
0,Usaquén,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Usaquén,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Usaquén,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Usaquén,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Usaquén,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


**Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category**

In [67]:
BG_grouped = BG_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(BG_grouped.shape)
BG_grouped

(19, 171)


Unnamed: 0,Neighborhoods,Airport,Airport Lounge,Airport Service,American Restaurant,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Bed & Breakfast,Big Box Store,Bike Rental / Bike Share,Bookstore,Botanical Garden,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Buffet,Building,Burger Joint,Burrito Place,Bus Station,Butcher,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Chinese Restaurant,Circus,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cultural Center,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dog Run,Donut Shop,Drugstore,Electronics Store,Empanada Restaurant,Event Space,Exhibit,Farmers Market,Fast Food Restaurant,Fish Market,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Garden Center,Gas Station,Gastropub,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Hardware Store,Hawaiian Restaurant,Health Food Store,Historic Site,History Museum,Hostel,Hot Dog Joint,Hotel,IT Services,Ice Cream Shop,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Juice Bar,Latin American Restaurant,Library,Lounge,Market,Martial Arts Dojo,Men's Store,Mexican Restaurant,Miscellaneous Shop,Mobile Phone Shop,Mountain,Movie Theater,Multiplex,Music Venue,Neighborhood,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Pool,Pub,Record Shop,Recreation Center,Restaurant,Roof Deck,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skating Rink,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Student Center,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Theater,Theme Park,Theme Restaurant,Toy / Game Store,Turkish Restaurant,University,Vegetarian / Vegan Restaurant,Veterinarian,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint
0,Antonio Nariño,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.02,0.04,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.05,0.0,0.0,0.0,0.06,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.03,0.02,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.04,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.01,0.01,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.05,0.0,0.01,0.03,0.0,0.03,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01
1,Barrios Unidos,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.09,0.0,0.0,0.02,0.0,0.04,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.04,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.03,0.0,0.02,0.01,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.09,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.03,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0
2,Bosa,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.036364,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.127273,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.054545,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.072727,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.018182,0.036364,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.018182,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.036364,0.018182,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.018182,0.0,0.109091,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.036364,0.0,0.0,0.0,0.0,0.0,0.0,0.036364,0.0,0.0,0.0,0.0,0.0,0.0,0.054545,0.0,0.0,0.0,0.0,0.0,0.054545,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Chapinero,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.03,0.01,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.02,0.0,0.01,0.05,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.04,0.0,0.02,0.0,0.04,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.01,0.02,0.01,0.01,0.0,0.02,0.0,0.01,0.0,0.01,0.01,0.0,0.08,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.01
4,Ciudad Bolívar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.015873,0.047619,0.0,0.015873,0.0,0.015873,0.0,0.0,0.0,0.0,0.031746,0.0,0.0,0.0,0.063492,0.0,0.015873,0.0,0.0,0.031746,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.063492,0.0,0.0,0.0,0.0,0.0,0.0,0.031746,0.0,0.0,0.0,0.0,0.063492,0.0,0.0,0.015873,0.015873,0.0,0.031746,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.031746,0.0,0.0,0.0,0.015873,0.047619,0.0,0.015873,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.015873,0.0,0.047619,0.015873,0.0,0.0,0.0,0.0,0.0,0.031746,0.0,0.0,0.031746,0.0,0.0,0.047619,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.031746,0.0,0.0,0.0,0.0,0.031746,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.015873
5,Engativá,0.02,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.05,0.02,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.03,0.0,0.0,0.01,0.01,0.0,0.0,0.05,0.0,0.01,0.0,0.0,0.01,0.02,0.01,0.05,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.03,0.0,0.02,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.05,0.0,0.0,0.01,0.01,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01
6,Fontibón,0.03,0.06,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.08,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.09,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.04,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.04,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01
7,Kennedy,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.064516,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.145161,0.0,0.0,0.0,0.016129,0.0,0.0,0.016129,0.0,0.0,0.0,0.016129,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.016129,0.016129,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.032258,0.0,0.0,0.016129,0.0,0.0,0.016129,0.0,0.0,0.016129,0.0,0.0,0.016129,0.0,0.0,0.016129,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.080645,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.048387,0.0,0.0,0.0,0.016129,0.0,0.0,0.048387,0.0,0.0,0.032258,0.0,0.016129,0.032258,0.016129,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,La Candelaria,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.020833,0.0,0.010417,0.0,0.010417,0.041667,0.020833,0.010417,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.010417,0.0,0.0,0.010417,0.020833,0.0,0.0,0.0,0.052083,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.052083,0.010417,0.0,0.0,0.0,0.0,0.0,0.010417,0.0,0.0,0.010417,0.010417,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.020833,0.010417,0.0,0.010417,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.020833,0.0,0.0,0.03125,0.010417,0.0,0.0,0.010417,0.0,0.010417,0.020833,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.010417,0.0,0.010417,0.0,0.0,0.010417,0.0,0.0,0.052083,0.0,0.010417,0.020833,0.0,0.0,0.010417,0.020833,0.0,0.0,0.0,0.010417,0.0,0.0,0.072917,0.0,0.010417,0.010417,0.010417,0.010417,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.010417,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.010417,0.0,0.0,0.0,0.010417,0.010417,0.041667,0.0,0.010417,0.0,0.0,0.010417,0.0,0.0,0.0,0.0,0.010417,0.0
9,Mártires,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.01,0.0,0.01,0.03,0.01,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.07,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.05,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.1,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.04,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.0
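
Taking the mean of one-hot columns within a group gives each category's share of that district's venues. The sketch below reproduces that idea with the standard library only, as a way to read the table: Bosa returned 55 venues and its Shopping Mall value is 0.054545, which is consistent with 3 malls out of 55. The function name `category_frequencies` and the placeholder category `"Other"` are illustrative assumptions, not part of the notebook.

```python
from collections import Counter, defaultdict


def category_frequencies(rows):
    """Return {neighborhood: {category: share of that neighborhood's venues}}.

    `rows` is an iterable of (neighborhood, category) pairs, one per venue.
    """
    counts = defaultdict(Counter)
    for neighborhood, category in rows:
        counts[neighborhood][category] += 1
    frequencies = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        frequencies[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return frequencies


# Toy data: 55 venues for Bosa, 3 of them shopping malls (assumed split).
toy_rows = [("Bosa", "Shopping Mall")] * 3 + [("Bosa", "Other")] * 52

freqs = category_frequencies(toy_rows)
print(round(freqs["Bosa"]["Shopping Mall"], 6))
```

Because each value is a share, districts with fewer returned venues (such as Bosa with 55) move more per mall than districts that hit the 100-venue limit.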


In [68]:
len(BG_grouped[BG_grouped["Shopping Mall"] > 0])

12

**Create a new DataFrame for Shopping Mall data only**

In [69]:
BG_mall = BG_grouped[["Neighborhoods","Shopping Mall"]]

In [70]:
BG_mall

Unnamed: 0,Neighborhoods,Shopping Mall
0,Antonio Nariño,0.02
1,Barrios Unidos,0.01
2,Bosa,0.054545
3,Chapinero,0.0
4,Ciudad Bolívar,0.047619
5,Engativá,0.0
6,Fontibón,0.02
7,Kennedy,0.032258
8,La Candelaria,0.0
9,Mártires,0.0


### 7. Cluster Neighborhoods
Run k-means to cluster the neighborhoods in Bogotá into 3 clusters.

In [71]:
# set number of clusters
kclusters = 3

BG_clustering = BG_mall.drop(columns=["Neighborhoods"])  # keyword form; the positional axis argument is not accepted by recent pandas

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(BG_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 1, 0, 1, 0, 1, 2, 2, 1, 1])
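
Since the clustering uses a single feature (the Shopping Mall frequency), k-means here amounts to splitting 19 numbers into three ranges. The sketch below is a plain-Python version of Lloyd's iteration on one dimension, not scikit-learn's implementation: it uses a deterministic start (minimum, median, and maximum of the data) instead of k-means++ and is meant only to illustrate the mechanics. The frequencies are copied from the cluster tables in section 8; the function name `kmeans_1d` is an assumption.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int, max_iter: int = 100) -> Tuple[List[int], List[float]]:
    """Cluster scalars with Lloyd's algorithm (requires k >= 2)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: evenly spaced order statistics (min, ..., max).
    centers = [ordered[round(i * (n - 1) / (k - 1))] for i in range(k)]
    labels = [-1] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = sum(members) / len(members)
    return labels, centers


mall_freq = {
    "Usme": 0.076923, "Bosa": 0.054545, "Ciudad Bolívar": 0.047619,
    "Teusaquillo": 0.0, "Suba": 0.01, "Santa Fe": 0.0, "San Cristóbal": 0.0,
    "Usaquén": 0.01, "Mártires": 0.0, "Engativá": 0.0, "Chapinero": 0.0,
    "Barrios Unidos": 0.01, "La Candelaria": 0.0,
    "Kennedy": 0.032258, "Puente Aranda": 0.04, "Rafael Uribe": 0.02,
    "Fontibón": 0.02, "Tunjuelito": 0.03, "Antonio Nariño": 0.02,
}

labels, centers = kmeans_1d(list(mall_freq.values()), k=3)
groups = {}
for name, label in zip(mall_freq, labels):
    groups.setdefault(label, []).append(name)
for label in sorted(groups):
    print(label, round(centers[label], 6), groups[label])
```

The label numbers produced by this sketch are ordered from low to high frequency and need not match the arbitrary label numbers assigned by scikit-learn; only the grouping of districts is comparable.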

In [72]:
# create a new dataframe that includes the cluster label as well as the Shopping Mall frequency for each neighborhood
BG_merged = BG_mall.copy()

# add clustering labels
BG_merged["Cluster Labels"] = kmeans.labels_

In [73]:
BG_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
BG_merged

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels
0,Antonio Nariño,0.02,2
1,Barrios Unidos,0.01,1
2,Bosa,0.054545,0
3,Chapinero,0.0,1
4,Ciudad Bolívar,0.047619,0
5,Engativá,0.0,1
6,Fontibón,0.02,2
7,Kennedy,0.032258,2
8,La Candelaria,0.0,1
9,Mártires,0.0,1


In [74]:
# join BG_merged with BG_df to add latitude/longitude for each neighborhood
BG_merged = BG_merged.join(BG_df.set_index("Neighborhood"), on="Neighborhood")

print(BG_merged.shape)
BG_merged.head() # check the last columns!

(19, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Antonio Nariño,0.02,2,4.59658,-74.11202
1,Barrios Unidos,0.01,1,4.66971,-74.07785
2,Bosa,0.054545,0,4.60974,-74.1828
3,Chapinero,0.0,1,4.63848,-74.06021
4,Ciudad Bolívar,0.047619,0,4.55367,-74.14648


In [75]:
# sort the results by Cluster Labels
print(BG_merged.shape)
BG_merged.sort_values(["Cluster Labels"], inplace=True)
BG_merged

(19, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
18,Usme,0.076923,0,4.49828,-74.10745
2,Bosa,0.054545,0,4.60974,-74.1828
4,Ciudad Bolívar,0.047619,0,4.55367,-74.14648
15,Teusaquillo,0.0,1,4.62329,-74.07225
14,Suba,0.01,1,4.73438,-74.08563
13,Santa Fe,0.0,1,4.59459,-74.06405
12,San Cristóbal,0.0,1,4.57643,-74.09314
17,Usaquén,0.01,1,4.69259,-74.03009
9,Mártires,0.0,1,4.61791,-74.07847
5,Engativá,0.0,1,4.70127,-74.11269


**Finally, let's visualize the resulting clusters**

In [81]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.hot_r(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(BG_merged['Latitude'], BG_merged['Longitude'], BG_merged['Neighborhood'], BG_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=1).add_to(map_clusters)
       
map_clusters

In [77]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

### 8. Examine Clusters

#### Cluster 0

In [78]:
BG_merged.loc[BG_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
18,Usme,0.076923,0,4.49828,-74.10745
2,Bosa,0.054545,0,4.60974,-74.1828
4,Ciudad Bolívar,0.047619,0,4.55367,-74.14648


#### Cluster 1

In [79]:
BG_merged.loc[BG_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
15,Teusaquillo,0.0,1,4.62329,-74.07225
14,Suba,0.01,1,4.73438,-74.08563
13,Santa Fe,0.0,1,4.59459,-74.06405
12,San Cristóbal,0.0,1,4.57643,-74.09314
17,Usaquén,0.01,1,4.69259,-74.03009
9,Mártires,0.0,1,4.61791,-74.07847
5,Engativá,0.0,1,4.70127,-74.11269
3,Chapinero,0.0,1,4.63848,-74.06021
1,Barrios Unidos,0.01,1,4.66971,-74.07785
8,La Candelaria,0.0,1,4.59437,-74.07689


#### Cluster 2

In [80]:
BG_merged.loc[BG_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
7,Kennedy,0.032258,2,4.62748,-74.17022
10,Puente Aranda,0.04,2,4.63334,-74.10628
11,Rafael Uribe,0.02,2,4.5765,-74.11517
6,Fontibón,0.02,2,4.68637,-74.151
16,Tunjuelito,0.03,2,4.56182,-74.12734
0,Antonio Nariño,0.02,2,4.59658,-74.11202


#### Observations:
The cluster map shows that most shopping malls are concentrated in the central southern area of Bogotá: cluster 0 has the highest concentration and cluster 2 a moderate one, while cluster 1 has very few shopping malls or none at all. Here, concentration means the share of shopping malls among the venues returned for each district.

Cluster 1 therefore offers the greatest opportunity for opening new shopping malls, because there is little to no competition from existing malls. Shopping malls in cluster 0, by contrast, are likely to face intense competition due to oversupply. The results also show that the oversupply is mostly in the southern area of the city, while the northern area still has very few shopping malls.

This project therefore recommends that property developers open new shopping malls in the neighborhoods in cluster 1, where there is little to no competition. Developers with unique selling propositions that stand out from the competition can also consider neighborhoods in cluster 2, where competition is moderate. Lastly, developers are advised to avoid neighborhoods in cluster 0, which already have a high concentration of shopping malls and face intense competition.
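
The ranking of the clusters can be checked directly from the tables in section 8. The sketch below averages the Shopping Mall frequency per cluster using the values and labels printed there; the variable names are illustrative and the calculation is only a sanity check, not part of the original analysis.

```python
from statistics import mean

# (district, Shopping Mall frequency, cluster label) copied from section 8.
cluster_table = [
    ("Usme", 0.076923, 0), ("Bosa", 0.054545, 0), ("Ciudad Bolívar", 0.047619, 0),
    ("Teusaquillo", 0.0, 1), ("Suba", 0.01, 1), ("Santa Fe", 0.0, 1),
    ("San Cristóbal", 0.0, 1), ("Usaquén", 0.01, 1), ("Mártires", 0.0, 1),
    ("Engativá", 0.0, 1), ("Chapinero", 0.0, 1), ("Barrios Unidos", 0.01, 1),
    ("La Candelaria", 0.0, 1),
    ("Kennedy", 0.032258, 2), ("Puente Aranda", 0.04, 2), ("Rafael Uribe", 0.02, 2),
    ("Fontibón", 0.02, 2), ("Tunjuelito", 0.03, 2), ("Antonio Nariño", 0.02, 2),
]

# Collect the frequencies belonging to each cluster label.
by_cluster = {}
for _name, freq, label in cluster_table:
    by_cluster.setdefault(label, []).append(freq)

cluster_means = {label: mean(freqs) for label, freqs in by_cluster.items()}

# Order clusters from the lowest to the highest mean mall frequency.
ranked = sorted(cluster_means, key=cluster_means.get)
for label in ranked:
    print(f"cluster {label}: {len(by_cluster[label])} districts, "
          f"mean frequency {cluster_means[label]:.4f}")
```

Cluster 1 has the lowest mean frequency, cluster 2 sits in the middle, and cluster 0 has the highest, which matches the ordering used in the recommendation.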