# Introduction / Business Problem


My client, Robert, wants to open a BBQ joint in Bogotá and has asked me to recommend the best locality in the city for it. To answer this, I analyse the BBQ joints already operating in each locality, along with nearby universities and offices as sources of customers. The ideal locality has plenty of potential customers and few competing BBQ joints.

# Data

First, I obtain the list of Bogotá's localities from the Spanish Wikipedia page "Anexo:Localidades de Bogotá" and the geographical coordinates (latitude and longitude) of each locality from OpenStreetMap's Nominatim geocoder. Then I use the Foursquare API to find the BBQ joints, universities and offices within 1 km of each locality's coordinates.

In [4]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

import numpy as np # library to handle data in a vectorized manner

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Libraries imported.


In [31]:
website_url = requests.get("https://es.wikipedia.org/wiki/Anexo:Localidades_de_Bogot%C3%A1").text


In [32]:
soup = BeautifulSoup(website_url,'lxml')
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="es">
 <head>
  <meta charset="utf-8"/>
  <title>
   Anexo:Localidades de Bogotá - Wikipedia, la enciclopedia libre
  </title>
  <script>
   document.documentElement.className=document.documentElement.className.replace(/(^|\s)client-nojs(\s|$)/,"$1client-js$2");RLCONF={"wgCanonicalNamespace":"Anexo","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":104,"wgPageName":"Anexo:Localidades_de_Bogotá","wgTitle":"Localidades de Bogotá","wgCurRevisionId":114573108,"wgRevisionId":114573108,"wgArticleId":6319494,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Localidades de Bogotá","Anexos:Bogotá"],"wgBreakFrames":!1,"wgPageContentLanguage":"es","wgPageContentModel":"wikitext","wgSeparatorTransformTable":[",\t."," \t,"],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","enero","febrero","marzo","abril","mayo","junio","julio","agosto","septiembre","oc

In [39]:
My_table = soup.find('table',{'class':'sortable wikitable'})
My_table

<table border="1" class="sortable wikitable">
<tbody><tr bgcolor="silver">
<th width="23"><center>Nº
</center></th>
<th width="110"><center> Localidad </center>
</th>
<th width="90"><center> Códigos Postales </center>
</th>
<th><center> Superficie km²<sup class="reference separada" id="cite_ref-2"><a href="#cite_note-2"><span class="corchete-llamada">[</span>2<span class="corchete-llamada">]</span></a></sup>​</center>
</th>
<th><center> Población<sup class="reference separada" id="cite_ref-3"><a href="#cite_note-3"><span class="corchete-llamada">[</span>3<span class="corchete-llamada">]</span></a></sup>​</center>
</th>
<th><center> Densidad hab/km²</center>
</th></tr>
<tr>
<td><b>01</b>
</td>
<td><a href="/wiki/Usaqu%C3%A9n" title="Usaquén">Usaquén</a>
</td>
<td>11<b>01</b>11-11<b>01</b>51
</td>
<td>65.31
</td>
<td>501 999
</td>
<td>7 686.4
</td></tr>
<tr>
<td><b>02</b>
</td>
<td><a href="/wiki/Chapinero" title="Chapinero">Chapinero</a>
</td>
<td>11<b>02</b>11-11<b>02</b>31
</td>
<td>3

In [40]:
links = My_table.find_all('a')
links

[<a href="#cite_note-2"><span class="corchete-llamada">[</span>2<span class="corchete-llamada">]</span></a>,
 <a href="#cite_note-3"><span class="corchete-llamada">[</span>3<span class="corchete-llamada">]</span></a>,
 <a href="/wiki/Usaqu%C3%A9n" title="Usaquén">Usaquén</a>,
 <a href="/wiki/Chapinero" title="Chapinero">Chapinero</a>,
 <a href="/wiki/Santa_Fe_(Bogot%C3%A1)" title="Santa Fe (Bogotá)">Santa Fe</a>,
 <a href="/wiki/San_Crist%C3%B3bal_(Bogot%C3%A1)" title="San Cristóbal (Bogotá)">San Cristóbal</a>,
 <a href="/wiki/Usme" title="Usme">Usme</a>,
 <a href="/wiki/Tunjuelito" title="Tunjuelito">Tunjuelito</a>,
 <a href="/wiki/Bosa_(Bogot%C3%A1)" title="Bosa (Bogotá)">Bosa</a>,
 <a href="/wiki/Kennedy_(Bogot%C3%A1)" title="Kennedy (Bogotá)">Kennedy</a>,
 <a href="/wiki/Fontib%C3%B3n" title="Fontibón">Fontibón</a>,
 <a href="/wiki/Engativ%C3%A1" title="Engativá">Engativá</a>,
 <a href="/wiki/Suba" title="Suba">Suba</a>,
 <a href="/wiki/Barrios_Unidos_(Bogot%C3%A1)" title="Barrios 

In [45]:
Localidades = []
for link in links:
    Localidades.append(link.get('title'))

print(Localidades)

# the first two links are the citation footnotes [2] and [3], which have no title
del Localidades[0:2]

Localidades

[None, None, 'Usaquén', 'Chapinero', 'Santa Fe (Bogotá)', 'San Cristóbal (Bogotá)', 'Usme', 'Tunjuelito', 'Bosa (Bogotá)', 'Kennedy (Bogotá)', 'Fontibón', 'Engativá', 'Suba', 'Barrios Unidos (Bogotá)', 'Teusaquillo', 'Los Mártires', 'Antonio Nariño (Bogotá)', 'Puente Aranda', 'La Candelaria', 'Rafael Uribe Uribe (Bogotá)', 'Ciudad Bolívar (Bogotá)', 'Sumapaz (Bogotá)']


['Usaquén',
 'Chapinero',
 'Santa Fe (Bogotá)',
 'San Cristóbal (Bogotá)',
 'Usme',
 'Tunjuelito',
 'Bosa (Bogotá)',
 'Kennedy (Bogotá)',
 'Fontibón',
 'Engativá',
 'Suba',
 'Barrios Unidos (Bogotá)',
 'Teusaquillo',
 'Los Mártires',
 'Antonio Nariño (Bogotá)',
 'Puente Aranda',
 'La Candelaria',
 'Rafael Uribe Uribe (Bogotá)',
 'Ciudad Bolívar (Bogotá)',
 'Sumapaz (Bogotá)']
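The `del Localidades[0:2]` step relies on the two footnote links always coming first in the table. A minimal sketch of a position-independent alternative keeps only links that carry a `title` attribute. It uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra packages; `LocalityLinkParser` is a hypothetical helper, and the HTML is a simplified excerpt of the table printed above.

```python
from html.parser import HTMLParser


class LocalityLinkParser(HTMLParser):
    """Collect the `title` attribute of every <a> tag, skipping links without one."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        title = dict(attrs).get("title")
        # footnote links such as <a href="#cite_note-2"> carry no title
        if title is not None:
            self.titles.append(title)


# simplified excerpt of the Wikipedia table shown above
snippet = """
<table class="sortable wikitable">
<tr><th>Superficie km²<sup><a href="#cite_note-2">[2]</a></sup></th></tr>
<tr><td><a href="/wiki/Usaqu%C3%A9n" title="Usaquén">Usaquén</a></td></tr>
<tr><td><a href="/wiki/Chapinero" title="Chapinero">Chapinero</a></td></tr>
</table>
"""

parser = LocalityLinkParser()
parser.feed(snippet)
print(parser.titles)  # ['Usaquén', 'Chapinero']
```

With BeautifulSoup, the same filter fits in one line: `[a.get('title') for a in My_table.find_all('a') if a.get('title')]`.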

In [46]:
df = pd.DataFrame()
df['Localidades'] = Localidades
df

| | Localidades |
|---|---|
| 0 | Usaquén |
| 1 | Chapinero |
| 2 | Santa Fe (Bogotá) |
| 3 | San Cristóbal (Bogotá) |
| 4 | Usme |
| 5 | Tunjuelito |
| 6 | Bosa (Bogotá) |
| 7 | Kennedy (Bogotá) |
| 8 | Fontibón |
| 9 | Engativá |


In [47]:
def get_coords_local(localidad, output_as='center'):
    """
    Get the coordinates of a Bogotá locality in WGS84 given its name.

    Parameters
    ----------
    localidad : str
        name of the locality, e.g. 'Usaquén'
    output_as : str
        choose from 'boundingbox' or 'center':
         - 'boundingbox' for [latmin, latmax, lonmin, lonmax]
         - 'center' for [latcenter, loncenter]

    Returns
    -------
    output : list
        list with coordinates as float
    """
    # query Nominatim; requests URL-encodes the accented names and spaces,
    # and the User-Agent identifies the app, as Nominatim's usage policy asks
    response = requests.get(
        'https://nominatim.openstreetmap.org/search',
        params={'q': localidad + ', Bogota, Bogota Capital District',
                'format': 'json'},
        headers={'User-Agent': 'capstoneProject'},
    )
    results = response.json()
    if not results:
        raise ValueError('No Nominatim result for {}'.format(localidad))
    response = results[0]

    # parse response to a list of floats
    if output_as == 'boundingbox':
        lst = response[output_as]
    elif output_as == 'center':
        lst = [response.get(key) for key in ['lat', 'lon']]
    else:
        raise ValueError("output_as must be 'boundingbox' or 'center'")
    return [float(i) for i in lst]

In [48]:
df2 = df.copy()

latitudeCln = []
longitudeCln = []
for index, row in df2.iterrows():
    # access by column name; positional row[0] is deprecated in recent pandas
    print(row['Localidades'])
    lat, long = get_coords_local(localidad=row['Localidades'], output_as='center')
    latitudeCln.append(lat)
    longitudeCln.append(long)

df2['Latitude'] = latitudeCln
df2['Longitude'] = longitudeCln

df2.shape

Usaquén
Chapinero
Santa Fe (Bogotá)
San Cristóbal (Bogotá)
Usme
Tunjuelito
Bosa (Bogotá)
Kennedy (Bogotá)
Fontibón
Engativá
Suba
Barrios Unidos (Bogotá)
Teusaquillo
Los Mártires
Antonio Nariño (Bogotá)
Puente Aranda
La Candelaria
Rafael Uribe Uribe (Bogotá)
Ciudad Bolívar (Bogotá)
Sumapaz (Bogotá)


(20, 3)
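The loop above sends its 20 geocoding requests back to back, while Nominatim's public usage policy asks for at most one request per second. A minimal sketch of a throttled, cached loop, assuming any single-name geocoding function is passed in; `geocode_all` and `fake_geocode` are hypothetical helpers, and the stand-in geocoder lets the sketch run offline.

```python
import time


def geocode_all(names, geocode, min_interval=1.0):
    """Geocode names one at a time, waiting at least `min_interval` seconds
    between consecutive calls and reusing results for repeated names."""
    cache = {}
    results = []
    last_call = None
    for name in names:
        if name not in cache:
            if last_call is not None:
                wait = min_interval - (time.monotonic() - last_call)
                if wait > 0:
                    time.sleep(wait)
            last_call = time.monotonic()
            cache[name] = geocode(name)
        results.append(cache[name])
    return results


# stand-in geocoder (coordinates from the table above); in the notebook this
# could be lambda n: get_coords_local(localidad=n, output_as='center')
fake_coords = {"Usaquén": [4.694969, -74.031093], "Chapinero": [4.645377, -74.061943]}
calls = []


def fake_geocode(name):
    calls.append(name)
    return fake_coords[name]


coords = geocode_all(["Usaquén", "Chapinero", "Usaquén"], fake_geocode, min_interval=0.01)
print(coords)
print(calls)  # ['Usaquén', 'Chapinero']
```

The cache matters only when names repeat; the throttle is what keeps a full run within the policy.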

In [49]:
df2

| | Localidades | Latitude | Longitude |
|---|---|---|---|
| 0 | Usaquén | 4.694969 | -74.031093 |
| 1 | Chapinero | 4.645377 | -74.061943 |
| 2 | Santa Fe (Bogotá) | 4.602204 | -74.078837 |
| 3 | San Cristóbal (Bogotá) | 4.548658 | -74.047473 |
| 4 | Usme | 4.411136 | -74.129108 |
| 5 | Tunjuelito | 4.561049 | -74.127523 |
| 6 | Bosa (Bogotá) | 4.625492 | -74.20028 |
| 7 | Kennedy (Bogotá) | 4.629682 | -74.149935 |
| 8 | Fontibón | 4.673327 | -74.144732 |
| 9 | Engativá | 4.708695 | -74.109643 |


In [50]:
address = 'Bogotá, Colombia'

geolocator = Nominatim(user_agent="capstoneProject")
location = geolocator.geocode(address, timeout=60, exactly_one=True)
latitude = location.latitude
longitude = location.longitude
print('The decimal coordinates of Bogotá are {}, {}.'.format(latitude, longitude))

The decimal coordinates of Bogotá are 4.59808, -74.0760439.


In [51]:
df3 = df2.copy()
df3

| | Localidades | Latitude | Longitude |
|---|---|---|---|
| 0 | Usaquén | 4.694969 | -74.031093 |
| 1 | Chapinero | 4.645377 | -74.061943 |
| 2 | Santa Fe (Bogotá) | 4.602204 | -74.078837 |
| 3 | San Cristóbal (Bogotá) | 4.548658 | -74.047473 |
| 4 | Usme | 4.411136 | -74.129108 |
| 5 | Tunjuelito | 4.561049 | -74.127523 |
| 6 | Bosa (Bogotá) | 4.625492 | -74.20028 |
| 7 | Kennedy (Bogotá) | 4.629682 | -74.149935 |
| 8 | Fontibón | 4.673327 | -74.144732 |
| 9 | Engativá | 4.708695 | -74.109643 |


In [52]:
map_bogota = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, local in zip(df3['Latitude'], df3['Longitude'], df3['Localidades']):
    label = '{}'.format(local)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_bogota)  
    
map_bogota

In [53]:
def getNearbyVenues(names, latitudes, longitudes, radius=5000, categoryIds='', LIMIT=25):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
                               
        if categoryIds != '':
            url = url + '&categoryId={}'.format(categoryIds)
    
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Localidad', 
                  'Localidad Latitude', 
                  'Localidad Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
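`getNearbyVenues` builds its query string by hand and assumes every venue has at least one category (`v['venue']['categories'][0]`). A minimal sketch of the same two steps that encodes the parameters with `urlencode` and tolerates a venue whose `categories` list comes back empty; `build_explore_url` and `flatten_items` are hypothetical helpers. The first sample item reuses a venue from the BBQ results below, and the second item is made up to show the empty-category case.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=5000, limit=25, category_id=""):
    """Build a Foursquare v2 venues/explore URL; categoryId is added only when given."""
    params = {"client_id": client_id, "client_secret": client_secret,
              "v": version, "ll": f"{lat},{lng}", "radius": radius, "limit": limit}
    if category_id:
        params["categoryId"] = category_id
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


def flatten_items(name, lat, lng, items):
    """Turn explore 'items' into rows; a venue without categories gets None."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     categories[0]["name"] if categories else None))
    return rows


# item shapes follow the fields getNearbyVenues reads from the response
items = [
    {"venue": {"name": "Santa Costilla",
               "location": {"lat": 4.696975, "lng": -74.030405},
               "categories": [{"name": "BBQ Joint"}]}},
    {"venue": {"name": "Hypothetical Venue",  # made-up, uncategorized venue
               "location": {"lat": 4.695, "lng": -74.031},
               "categories": []}},
]
url = build_explore_url("ID", "SECRET", "20200716", 4.694969, -74.031093,
                        radius=1000, category_id="4bf58dd8d48988d1df931735")
rows = flatten_items("Usaquén", 4.694969, -74.031093, items)
print(url)
print(rows)
```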

In [54]:
LIMIT = 500 # intended cap on venues returned by the Foursquare API; not passed to getNearbyVenues below, which uses its default LIMIT=25
radius = 5000 # default search radius; the calls below pass radius=1000 explicitly
CLIENT_ID = 'YOUR_CLIENT_ID' # replace with your own Foursquare credentials
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20200716'
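Hard-coding API credentials in a notebook publishes them along with it. A minimal sketch of reading them from environment variables instead; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions, not anything Foursquare prescribes.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read Foursquare credentials from environment variables
    (hypothetical names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET)."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"set the {missing.args[0]} environment variable") from None
```

In the notebook this would be used as `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`; passing a plain dict as `env` makes the helper easy to test.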

In [55]:
# Use category id 4bf58dd8d48988d1df931735 to get only BBQ joints
bogota_venues_BBQjoint = getNearbyVenues(names=df3['Localidades'], latitudes=df3['Latitude'], longitudes=df3['Longitude'], radius=1000, categoryIds='4bf58dd8d48988d1df931735')
bogota_venues_BBQjoint.head()

| | Localidad | Localidad Latitude | Localidad Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Usaquén | 4.694969 | -74.031093 | Santa Costilla | 4.696975 | -74.030405 | BBQ Joint |
| 1 | Usaquén | 4.694969 | -74.031093 | La Biferia | 4.694728 | -74.031525 | BBQ Joint |
| 2 | Usaquén | 4.694969 | -74.031093 | Tienda de Café | 4.695467 | -74.030644 | BBQ Joint |
| 3 | Usaquén | 4.694969 | -74.031093 | Mister Ribs | 4.695403 | -74.0306 | American Restaurant |
| 4 | Usaquén | 4.694969 | -74.031093 | Blancos y Negros | 4.699625 | -74.029149 | BBQ Joint |


In [56]:
bogota_venues_BBQjoint.shape


(116, 7)

In [57]:
#function to add markers for given venues to map
def addToMap(df, color, existingMap):
    for lat, lng, local, venue, venueCat in zip(df['Venue Latitude'], df['Venue Longitude'], df['Localidad'], df['Venue'], df['Venue Category']):
        label = '{} ({}) - {}'.format(venue, venueCat, local)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color=color,
            fill=True,
            fill_color=color,
            fill_opacity=0.7).add_to(existingMap)

In [58]:
map_bogota_BBQjoint = folium.Map(location=[latitude, longitude], zoom_start=12)
addToMap(bogota_venues_BBQjoint, 'red', map_bogota_BBQjoint)
map_bogota_BBQjoint

In [59]:
# Use category id 4bf58dd8d48988d1ae941735 to get only universities
bogota_venues_university = getNearbyVenues(names=df3['Localidades'], latitudes=df3['Latitude'], longitudes=df3['Longitude'], radius=1000, categoryIds='4bf58dd8d48988d1ae941735')
bogota_venues_university.head()


| | Localidad | Localidad Latitude | Localidad Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Usaquén | 4.694969 | -74.031093 | Tecnologico de Monterry | 4.6921 | -74.034916 | University |
| 1 | Usaquén | 4.694969 | -74.031093 | Centrum Católica | 4.690426 | -74.038657 | University |
| 2 | Chapinero | 4.645377 | -74.061943 | CIDCA | 4.647201 | -74.063729 | University |
| 3 | Chapinero | 4.645377 | -74.061943 | Universidad de La Salle | 4.644576 | -74.059379 | University |
| 4 | Chapinero | 4.645377 | -74.061943 | Universidad Konrad Lorenz | 4.648584 | -74.061586 | University |


In [60]:
bogota_venues_university.shape


(57, 7)

In [61]:
map_bogota_university = folium.Map(location=[latitude, longitude], zoom_start=12)
addToMap(bogota_venues_university, 'gold', map_bogota_university)
map_bogota_university

In [62]:
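# Category id 4d4b7105d754a06375d81259 is "Professional & Other Places", so besides offices it also returns hotels, parking lots, etc.
bogota_venues_office = getNearbyVenues(names=df3['Localidades'], latitudes=df3['Latitude'], longitudes=df3['Longitude'], radius=1000, categoryIds='4d4b7105d754a06375d81259')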
bogota_venues_office.head()

| | Localidad | Localidad Latitude | Localidad Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Usaquén | 4.694969 | -74.031093 | Teleport Torre B | 4.692007 | -74.035008 | Professional & Other Places |
| 1 | Usaquén | 4.694969 | -74.031093 | Hotel NH Collection Bogotá Teleport Royal | 4.692047 | -74.035487 | Hotel |
| 2 | Usaquén | 4.694969 | -74.031093 | Lavafante Calle 127 | 4.702286 | -74.028229 | Parking |
| 3 | Usaquén | 4.694969 | -74.031093 | Medtronic Colombia | 4.693505 | -74.033762 | Office |
| 4 | Usaquén | 4.694969 | -74.031093 | Axentria Consulting Group | 4.698639 | -74.030199 | Office |


In [63]:
bogota_venues_office.shape


(287, 7)

In [64]:
map_bogota_office = folium.Map(location=[latitude, longitude], zoom_start=12)
addToMap(bogota_venues_office, 'fuchsia', map_bogota_office)
map_bogota_office

In [65]:
def addColumn(startDf, columnTitle, dataDf):
    # number of venues found per locality
    grouped = dataDf.groupby('Localidad').count()

    for n in startDf['Localidad']:
        try:
            startDf.loc[startDf['Localidad'] == n, columnTitle] = grouped.loc[n, 'Venue']
        except KeyError:
            # no venues of this kind were found for the locality
            startDf.loc[startDf['Localidad'] == n, columnTitle] = 0
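`addColumn` counts venues per locality and falls back to 0 when a locality has none. A minimal plain-Python sketch of the same counting with `collections.Counter`, which needs no exception handling because missing keys simply default to 0; `count_per_locality` is a hypothetical helper and the input is a toy list.

```python
from collections import Counter


def count_per_locality(localities, venue_localities):
    """Count venues per locality; localities with no venues get 0."""
    counts = Counter(venue_localities)
    return {loc: counts.get(loc, 0) for loc in localities}


# toy input standing in for the 'Localidad' column of a venues table
venues = ["Usaquén", "Usaquén", "Chapinero"]
result = count_per_locality(["Usaquén", "Chapinero", "Usme"], venues)
print(result)  # {'Usaquén': 2, 'Chapinero': 1, 'Usme': 0}
```

In pandas the equivalent is roughly `startDf['Localidad'].map(dataDf['Localidad'].value_counts()).fillna(0)`.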

In [66]:
df_data = df3.copy()
df_data.rename(columns={'Localidades':'Localidad'}, inplace=True)
addColumn(df_data, 'BBQjoint', bogota_venues_BBQjoint)
addColumn(df_data, 'Universities', bogota_venues_university)
addColumn(df_data, 'Offices', bogota_venues_office)
df_data

| | Localidad | Latitude | Longitude | BBQjoint | Universities | Offices |
|---|---|---|---|---|---|---|
| 0 | Usaquén | 4.694969 | -74.031093 | 25.0 | 2.0 | 25.0 |
| 1 | Chapinero | 4.645377 | -74.061943 | 25.0 | 17.0 | 25.0 |
| 2 | Santa Fe (Bogotá) | 4.602204 | -74.078837 | 20.0 | 8.0 | 25.0 |
| 3 | San Cristóbal (Bogotá) | 4.548658 | -74.047473 | 0.0 | 0.0 | 0.0 |
| 4 | Usme | 4.411136 | -74.129108 | 0.0 | 0.0 | 0.0 |
| 5 | Tunjuelito | 4.561049 | -74.127523 | 0.0 | 0.0 | 3.0 |
| 6 | Bosa (Bogotá) | 4.625492 | -74.20028 | 0.0 | 0.0 | 10.0 |
| 7 | Kennedy (Bogotá) | 4.629682 | -74.149935 | 1.0 | 1.0 | 14.0 |
| 8 | Fontibón | 4.673327 | -74.144732 | 3.0 | 2.0 | 15.0 |
| 9 | Engativá | 4.708695 | -74.109643 | 2.0 | 2.0 | 14.0 |
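Several counts above are exactly 25, which matches the default `LIMIT=25` of `getNearbyVenues`; the calls do not pass a `LIMIT` argument, so those counts may be truncated and are best read as lower bounds. A minimal sketch that flags such values, using the first five rows of the table above; `capped_columns` and `REQUEST_LIMIT` are hypothetical names.

```python
REQUEST_LIMIT = 25  # default LIMIT of getNearbyVenues, not overridden by the calls above

# counts copied from the first five rows of the df_data table above
counts = {
    "Usaquén": {"BBQjoint": 25, "Universities": 2, "Offices": 25},
    "Chapinero": {"BBQjoint": 25, "Universities": 17, "Offices": 25},
    "Santa Fe (Bogotá)": {"BBQjoint": 20, "Universities": 8, "Offices": 25},
    "San Cristóbal (Bogotá)": {"BBQjoint": 0, "Universities": 0, "Offices": 0},
    "Usme": {"BBQjoint": 0, "Universities": 0, "Offices": 0},
}


def capped_columns(counts, limit=REQUEST_LIMIT):
    """Return (locality, column) pairs whose count reached the request limit,
    i.e. counts that may have been cut off by the API."""
    return [(loc, col) for loc, row in counts.items()
            for col, n in row.items() if n >= limit]


capped = capped_columns(counts)
print(capped)
```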


In [67]:
# negative weight, because Robert wants to open a BBQ joint and thus wants to avoid competition as much as possible
weight_BBQjoint = -1

# positive weight, because university students are good customers
weight_university = 1.5

# positive weight, because office employees are even better customers
weight_offices = 2
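Written out, these weights combine the three counts into a linear score per locality. Here $B_i$, $U_i$ and $O_i$ (our notation) are the numbers of BBQ joints, universities and offices found for locality $i$:

$$
\text{Score}_i = -1 \cdot B_i + 1.5 \cdot U_i + 2 \cdot O_i
$$

For Chapinero ($B = 25$, $U = 17$, $O = 25$ in the table above), this gives $-25 + 25.5 + 50 = 50.5$.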

In [68]:
df_weighted = df_data[['Localidad']].copy()


In [69]:
df_weighted['Score'] = df_data['BBQjoint'] * weight_BBQjoint + df_data['Universities'] * weight_university + df_data['Offices'] * weight_offices
df_weighted = df_weighted.sort_values(by=['Score'], ascending=False)
df_weighted

| | Localidad | Score |
|---|---|---|
| 16 | La Candelaria | 56.5 |
| 1 | Chapinero | 50.5 |
| 15 | Puente Aranda | 49.0 |
| 14 | Antonio Nariño (Bogotá) | 47.5 |
| 12 | Teusaquillo | 47.5 |
| 11 | Barrios Unidos (Bogotá) | 43.5 |
| 2 | Santa Fe (Bogotá) | 42.0 |
| 13 | Los Mártires | 39.5 |
| 8 | Fontibón | 30.0 |
| 9 | Engativá | 29.0 |
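The next cell hard-codes `'La Candelaria'` as the winner. A minimal plain-Python sketch recomputes the weighted score for the localities whose counts appear in the df_data table above and takes the winner from the sorted ranking instead of typing its name. `WEIGHTS`, `counts` and `score` are illustrative names. Because only a subset of localities is included, the top entry here is Chapinero rather than the overall winner.

```python
# weights from the notebook
WEIGHTS = {"BBQjoint": -1, "Universities": 1.5, "Offices": 2}

# counts copied from the df_data table above (subset of localities)
counts = {
    "Chapinero": {"BBQjoint": 25, "Universities": 17, "Offices": 25},
    "Santa Fe (Bogotá)": {"BBQjoint": 20, "Universities": 8, "Offices": 25},
    "Fontibón": {"BBQjoint": 3, "Universities": 2, "Offices": 15},
    "Engativá": {"BBQjoint": 2, "Universities": 2, "Offices": 14},
    "Usaquén": {"BBQjoint": 25, "Universities": 2, "Offices": 25},
}


def score(c):
    """Weighted sum: competing BBQ joints count against, customers count for."""
    return sum(WEIGHTS[k] * c[k] for k in WEIGHTS)


ranking = sorted(counts, key=lambda loc: score(counts[loc]), reverse=True)
for loc in ranking:
    print(f"{loc}: {score(counts[loc])}")
best = ranking[0]
```

In the notebook itself, `df_weighted` is already sorted by descending score, so `df_weighted.iloc[0]['Localidad']` gives the same kind of programmatic pick.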


In [70]:
map_bog_result = folium.Map(location=[latitude, longitude], zoom_start=15)

bog_win = df3[df3['Localidades'] == 'La Candelaria']

for lat, lng, local in zip(bog_win['Latitude'], bog_win['Longitude'], bog_win['Localidades']):
    label = '{}'.format(local)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7).add_to(map_bog_result) 

addToMap(bogota_venues_BBQjoint[bogota_venues_BBQjoint['Localidad'] == 'La Candelaria'], 'red', map_bog_result)
addToMap(bogota_venues_university[bogota_venues_university['Localidad'] == 'La Candelaria'], 'gold', map_bog_result)
addToMap(bogota_venues_office[bogota_venues_office['Localidad'] == 'La Candelaria'], 'fuchsia', map_bog_result)

map_bog_result

### La Candelaria scores highest under these weights, making it the best locality for Robert to open his BBQ joint
