# Introduction / Business Problem


My client wants to open a shopping mall in Hyderabad and has asked me to recommend the best location within the city. To do so, I will analyse the existing shopping malls and their ratings. A good location should have enough people living in the neighborhood and not too many existing shopping malls nearby.

# Data

First, I need to obtain the geographical coordinates (latitude and longitude) of Hyderabad; Foursquare provides this data. I will then use the Foursquare API again to get the necessary information (including the number and the ratings of the shopping malls). In the final report, I may also share some sample user comments to give a better understanding of the situation.
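As a rough illustration of the Foursquare step described above, the sketch below only builds a request URL and does not call the API. It assumes the legacy v2 `venues/explore` endpoint and its usual query parameters; `build_explore_url`, the placeholder credentials, the search radius, and the approximate Hyderabad coordinates are all hypothetical choices for this example.

```python
from urllib.parse import urlencode

# Assumption: the legacy Foursquare v2 "explore" endpoint; newer API versions differ.
FOURSQUARE_EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      query="shopping mall", radius=2000, limit=50,
                      version="20180605"):
    """Build a request URL for venues matching `query` around a point."""
    params = {
        "client_id": client_id,          # placeholder credential
        "client_secret": client_secret,  # placeholder credential
        "v": version,                    # API version date (YYYYMMDD)
        "ll": "{},{}".format(lat, lng),  # latitude,longitude of the search center
        "query": query,
        "radius": radius,                # search radius in meters (hypothetical value)
        "limit": limit,
    }
    return FOURSQUARE_EXPLORE_URL + "?" + urlencode(params)


# Approximate coordinates of Hyderabad, used only as an example.
url = build_explore_url(17.385, 78.4867, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

The resulting URL could then be passed to `requests.get(url).json()`; the exact shape of the response depends on the API version in use.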

In [4]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

import numpy as np # library to handle data in a vectorized manner

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment out this line if folium is already installed
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Libraries imported.


In [31]:
website_url = requests.get("https://es.wikipedia.org/wiki/Anexo:Localidades_de_Bogot%C3%A1").text


In [32]:
soup = BeautifulSoup(website_url,'lxml')
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="es">
 <head>
  <meta charset="utf-8"/>
  <title>
   Anexo:Localidades de Bogotá - Wikipedia, la enciclopedia libre
  </title>
  <script>
   document.documentElement.className=document.documentElement.className.replace(/(^|\s)client-nojs(\s|$)/,"$1client-js$2");RLCONF={"wgCanonicalNamespace":"Anexo","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":104,"wgPageName":"Anexo:Localidades_de_Bogotá","wgTitle":"Localidades de Bogotá","wgCurRevisionId":114573108,"wgRevisionId":114573108,"wgArticleId":6319494,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Localidades de Bogotá","Anexos:Bogotá"],"wgBreakFrames":!1,"wgPageContentLanguage":"es","wgPageContentModel":"wikitext","wgSeparatorTransformTable":[",\t."," \t,"],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","enero","febrero","marzo","abril","mayo","junio","julio","agosto","septiembre","oc

In [39]:
My_table = soup.find('table',{'class':'sortable wikitable'})
My_table

<table border="1" class="sortable wikitable">
<tbody><tr bgcolor="silver">
<th width="23"><center>Nº
</center></th>
<th width="110"><center> Localidad </center>
</th>
<th width="90"><center> Códigos Postales </center>
</th>
<th><center> Superficie km²<sup class="reference separada" id="cite_ref-2"><a href="#cite_note-2"><span class="corchete-llamada">[</span>2<span class="corchete-llamada">]</span></a></sup>​</center>
</th>
<th><center> Población<sup class="reference separada" id="cite_ref-3"><a href="#cite_note-3"><span class="corchete-llamada">[</span>3<span class="corchete-llamada">]</span></a></sup>​</center>
</th>
<th><center> Densidad hab/km²</center>
</th></tr>
<tr>
<td><b>01</b>
</td>
<td><a href="/wiki/Usaqu%C3%A9n" title="Usaquén">Usaquén</a>
</td>
<td>11<b>01</b>11-11<b>01</b>51
</td>
<td>65.31
</td>
<td>501 999
</td>
<td>7 686.4
</td></tr>
<tr>
<td><b>02</b>
</td>
<td><a href="/wiki/Chapinero" title="Chapinero">Chapinero</a>
</td>
<td>11<b>02</b>11-11<b>02</b>31
</td>
<td>3

In [40]:
links = My_table.find_all('a')  # find_all is the current name; findAll is a legacy alias
links

[<a href="#cite_note-2"><span class="corchete-llamada">[</span>2<span class="corchete-llamada">]</span></a>,
 <a href="#cite_note-3"><span class="corchete-llamada">[</span>3<span class="corchete-llamada">]</span></a>,
 <a href="/wiki/Usaqu%C3%A9n" title="Usaquén">Usaquén</a>,
 <a href="/wiki/Chapinero" title="Chapinero">Chapinero</a>,
 <a href="/wiki/Santa_Fe_(Bogot%C3%A1)" title="Santa Fe (Bogotá)">Santa Fe</a>,
 <a href="/wiki/San_Crist%C3%B3bal_(Bogot%C3%A1)" title="San Cristóbal (Bogotá)">San Cristóbal</a>,
 <a href="/wiki/Usme" title="Usme">Usme</a>,
 <a href="/wiki/Tunjuelito" title="Tunjuelito">Tunjuelito</a>,
 <a href="/wiki/Bosa_(Bogot%C3%A1)" title="Bosa (Bogotá)">Bosa</a>,
 <a href="/wiki/Kennedy_(Bogot%C3%A1)" title="Kennedy (Bogotá)">Kennedy</a>,
 <a href="/wiki/Fontib%C3%B3n" title="Fontibón">Fontibón</a>,
 <a href="/wiki/Engativ%C3%A1" title="Engativá">Engativá</a>,
 <a href="/wiki/Suba" title="Suba">Suba</a>,
 <a href="/wiki/Barrios_Unidos_(Bogot%C3%A1)" title="Barrios 

In [45]:
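Localidades = []
for link in links:
    # the title attribute holds the Wikipedia page name of each locality
    Localidades.append(link.get('title'))

print(Localidades)

# the first two links are footnote references without a title (None), so drop them
del Localidades[0:2]

Localidades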

[None, None, 'Usaquén', 'Chapinero', 'Santa Fe (Bogotá)', 'San Cristóbal (Bogotá)', 'Usme', 'Tunjuelito', 'Bosa (Bogotá)', 'Kennedy (Bogotá)', 'Fontibón', 'Engativá', 'Suba', 'Barrios Unidos (Bogotá)', 'Teusaquillo', 'Los Mártires', 'Antonio Nariño (Bogotá)', 'Puente Aranda', 'La Candelaria', 'Rafael Uribe Uribe (Bogotá)', 'Ciudad Bolívar (Bogotá)', 'Sumapaz (Bogotá)']


['Usaquén',
 'Chapinero',
 'Santa Fe (Bogotá)',
 'San Cristóbal (Bogotá)',
 'Usme',
 'Tunjuelito',
 'Bosa (Bogotá)',
 'Kennedy (Bogotá)',
 'Fontibón',
 'Engativá',
 'Suba',
 'Barrios Unidos (Bogotá)',
 'Teusaquillo',
 'Los Mártires',
 'Antonio Nariño (Bogotá)',
 'Puente Aranda',
 'La Candelaria',
 'Rafael Uribe Uribe (Bogotá)',
 'Ciudad Bolívar (Bogotá)',
 'Sumapaz (Bogotá)']
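Several of the titles above carry a Wikipedia disambiguation suffix such as `(Bogotá)`, while the link text itself (for example "Santa Fe") does not. If plain names are preferred, one possible cleanup is sketched below; `strip_disambiguation` is a hypothetical helper, not part of the original notebook, and the sample list is a subset of the titles shown above.

```python
import re


def strip_disambiguation(title):
    """Remove a trailing parenthetical such as ' (Bogotá)' from a page title."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", title)


# A few of the titles scraped above
sample = ['Usaquén', 'Santa Fe (Bogotá)', 'Rafael Uribe Uribe (Bogotá)']
cleaned = [strip_disambiguation(name) for name in sample]
print(cleaned)
```

Whether the suffix helps or hurts the later geocoding step is not established here, so it may be worth comparing results with and without it.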

In [46]:
df = pd.DataFrame()
df['Localidades'] = Localidades
df

| | Localidades |
|---|---|
| 0 | Usaquén |
| 1 | Chapinero |
| 2 | Santa Fe (Bogotá) |
| 3 | San Cristóbal (Bogotá) |
| 4 | Usme |
| 5 | Tunjuelito |
| 6 | Bosa (Bogotá) |
| 7 | Kennedy (Bogotá) |
| 8 | Fontibón |
| 9 | Engativá |


In [47]:
def get_coords_local(localidad, output_as='center'):
    """
    Get the center or bounding box of a locality in WGS84 given its name.

    Parameters
    ----------
    localidad : str
        name of the locality as listed in the table, e.g. 'Chapinero'
    output_as : str
        choose from 'boundingbox' or 'center'.
         - 'boundingbox' for [latmin, latmax, lonmin, lonmax]
         - 'center' for [latcenter, loncenter]

    Returns
    -------
    output : list
        list with coordinates as float
    """
    # create url
    url = '{0}{1}{2}'.format('http://nominatim.openstreetmap.org/search.php?q=',
                             localidad+', Bogota, Bogota Capital District',
                             '&format=json&polygon=0')
    # Nominatim's usage policy asks for an identifying User-Agent;
    # take the first (best) match, which raises IndexError if nothing is found
    response = requests.get(url, headers={'User-Agent': 'capstoneProject'}).json()[0]

    # parse response to list
    if output_as == 'boundingbox':
        lst = response[output_as]
        output = [float(i) for i in lst]
    elif output_as == 'center':
        lst = [response.get(key) for key in ['lat','lon']]
        output = [float(i) for i in lst]
    else:
        raise ValueError("output_as must be 'boundingbox' or 'center'")
    return output
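The function above assumes that the search always returns at least one match. A minimal sketch of the parsing step on its own, which returns `None` for an empty result list instead of raising, might look like this. `parse_nominatim_results` is a hypothetical helper, and the sample response is made up for illustration (the `lat`/`lon` values echo the Chapinero row of this notebook; the bounding box values are invented).

```python
def parse_nominatim_results(results, output_as="center"):
    """Return coordinates from a decoded Nominatim JSON result list, or None if it is empty."""
    if not results:
        return None  # the search found nothing
    first = results[0]  # Nominatim lists the best match first
    if output_as == "boundingbox":
        # Nominatim returns the bounding box as four strings
        return [float(v) for v in first["boundingbox"]]
    if output_as == "center":
        return [float(first["lat"]), float(first["lon"])]
    raise ValueError("output_as must be 'boundingbox' or 'center'")


# Hypothetical decoded response, shaped like Nominatim's JSON output
sample = [{
    "lat": "4.645377",
    "lon": "-74.061943",
    "boundingbox": ["4.60", "4.70", "-74.10", "-74.00"],  # invented values
}]

center = parse_nominatim_results(sample)
bbox = parse_nominatim_results(sample, output_as="boundingbox")
missing = parse_nominatim_results([])
print(center, bbox, missing)
```

Separating parsing from the HTTP request also makes the logic easy to check without network access.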

In [48]:
df2 = df.copy()

latitudeCln = []
longitudeCln = []
for index, row in df2.iterrows():
    localidad = row['Localidades']  # access by label; positional row[0] is deprecated in recent pandas
    print(localidad)
    lat, lon = get_coords_local(localidad=localidad, output_as='center')
    latitudeCln.append(lat)
    longitudeCln.append(lon)

df2['Latitude'] = latitudeCln
df2['Longitude'] = longitudeCln

df2.shape

Usaquén
Chapinero
Santa Fe (Bogotá)
San Cristóbal (Bogotá)
Usme
Tunjuelito
Bosa (Bogotá)
Kennedy (Bogotá)
Fontibón
Engativá
Suba
Barrios Unidos (Bogotá)
Teusaquillo
Los Mártires
Antonio Nariño (Bogotá)
Puente Aranda
La Candelaria
Rafael Uribe Uribe (Bogotá)
Ciudad Bolívar (Bogotá)
Sumapaz (Bogotá)


(20, 3)
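The loop above sends twenty requests back to back. The public Nominatim service asks clients to stay at or below one request per second, so a throttled variant may be safer. The sketch below is one way to do that; `geocode_all` is a hypothetical helper, and a small lookup table stands in for the real geocoder so the example runs offline.

```python
import time


def geocode_all(names, geocode, min_interval=1.0, sleep=time.sleep):
    """Call `geocode(name)` for each name, pausing `min_interval` seconds between calls."""
    coords = {}
    for i, name in enumerate(names):
        if i > 0:
            sleep(min_interval)  # wait before every request except the first
        try:
            coords[name] = geocode(name)
        except (IndexError, KeyError, ValueError):
            coords[name] = None  # no usable result for this name
    return coords


# Stand-in for a real geocoder: coordinates copied from this notebook's output.
fake_table = {
    "Usaquén": [4.694969, -74.031093],
    "Chapinero": [4.645377, -74.061943],
}

# The pause is disabled here only so the demo finishes instantly.
result = geocode_all(["Usaquén", "Chapinero", "Nowhere"],
                     fake_table.__getitem__,
                     sleep=lambda seconds: None)
print(result)
```

With the real function, the call would be `geocode_all(Localidades, get_coords_local)`, keeping the default one-second pause.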

In [49]:
df2

| | Localidades | Latitude | Longitude |
|---|---|---|---|
| 0 | Usaquén | 4.694969 | -74.031093 |
| 1 | Chapinero | 4.645377 | -74.061943 |
| 2 | Santa Fe (Bogotá) | 4.602204 | -74.078837 |
| 3 | San Cristóbal (Bogotá) | 4.548658 | -74.047473 |
| 4 | Usme | 4.411136 | -74.129108 |
| 5 | Tunjuelito | 4.561049 | -74.127523 |
| 6 | Bosa (Bogotá) | 4.625492 | -74.20028 |
| 7 | Kennedy (Bogotá) | 4.629682 | -74.149935 |
| 8 | Fontibón | 4.673327 | -74.144732 |
| 9 | Engativá | 4.708695 | -74.109643 |


In [50]:
address = 'Bogotá, Colombia'

geolocator = Nominatim(user_agent="capstoneProject")
location = geolocator.geocode(address, timeout=60, exactly_one=True)
latitude = location.latitude
longitude = location.longitude
print('The decimal coordinates of Bogotá are {}, {}.'.format(latitude, longitude))

The decimal coordinates of Bogotá are 4.59808, -74.0760439.
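With the city center known, one quick sanity check on the geocoded localities is their great-circle distance from that center: a locality that lands hundreds of kilometers away was probably matched to the wrong place. The sketch below uses the haversine formula with a mean Earth radius of 6371 km; `haversine_km` is a hypothetical helper, and the coordinates are the ones printed in this notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


center = (4.59808, -74.0760439)   # Bogotá, from the geocoder output above
usaquen = (4.694969, -74.031093)  # Usaquén, from the coordinates table

distance = haversine_km(*center, *usaquen)
print(round(distance, 1), "km")
```

Any threshold for flagging outliers would be a judgment call and is not part of the original analysis.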


In [51]:
df3 = df2.copy()
df3

| | Localidades | Latitude | Longitude |
|---|---|---|---|
| 0 | Usaquén | 4.694969 | -74.031093 |
| 1 | Chapinero | 4.645377 | -74.061943 |
| 2 | Santa Fe (Bogotá) | 4.602204 | -74.078837 |
| 3 | San Cristóbal (Bogotá) | 4.548658 | -74.047473 |
| 4 | Usme | 4.411136 | -74.129108 |
| 5 | Tunjuelito | 4.561049 | -74.127523 |
| 6 | Bosa (Bogotá) | 4.625492 | -74.20028 |
| 7 | Kennedy (Bogotá) | 4.629682 | -74.149935 |
| 8 | Fontibón | 4.673327 | -74.144732 |
| 9 | Engativá | 4.708695 | -74.109643 |
