# Finding Retail Store Optimal Locations in Tegucigalpa
### Data Science Capstone Project
Andres Dominguez

---

## **Table of Contents**
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem

In this project, we will analyze venue data to find an optimal location for a new retail store in the capital of my country (Tegucigalpa, Honduras), based on the goods the store will offer. The target audience is stakeholders interested in opening a retail store in Tegucigalpa.

As an example, we will try to find an optimal location for a **retail store that sells and repairs electronics-related products**. Keep in mind that the same analysis can be done for any goods a retail store offers.

## Data

When looking for a good location to open a retail store, there are some aspects that must be considered. These aspects are:

+ Similarity or dissimilarity of neighboring stores - positioning a retail store next to similar stores tends to draw the same demographic of customers, which can improve results.
+ Visibility - in many cases, the more visible your retail store is, the less advertising it needs.
+ Transportation accessibility.
+ Parking accessibility.
+ Personal factors – work life balance issues in case you plan to work in your store.


The goal is to locate the retail store where many shoppers meet the definition of the target market.

As such, we will be using Foursquare's location data as our source to obtain the venues of three of the aspects mentioned above and cluster them to find optimal locations to open our new retail store. These aspects are the similarity of neighbor stores, transportation, and parking.

## Methodology

### 1. Exploratory analysis using Foursquare’s location data

As mentioned before, we will use Foursquare’s API to obtain data about transportation, parking, and similar electronics store venues in Tegucigalpa.

After retrieving the venues, we will plot and explore them using a geospatial visualization tool.

### 1.1 Defining Foursquare API credentials

Every time we access the API, we need to hand over our credentials. To facilitate this process, we will store our credentials in two variables.

Foursquare credentials are defined in the hidden cell below.

In [1]:
# The code was removed by Watson Studio for sharing.
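
Since the credentials cell is hidden, here is a minimal sketch of what such a cell might contain. The names `CLIENT_ID`, `CLIENT_SECRET`, and `VERSION` are the ones the request URL in section 1.3 expects. The environment variable names and the placeholder version date are assumptions made for illustration, not the values used in the original run.

```python
import os

# Hypothetical environment variable names; reading secrets from the environment
# avoids pasting them into a notebook that will be shared.
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "your-client-id")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "your-client-secret")

# The v2 API takes a version date in YYYYMMDD form.
# This date is a placeholder, not the value from the hidden cell.
VERSION = "20200620"
```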

### 1.2 Getting Tegucigalpa coordinates

Since our project focus is on the city of Tegucigalpa, we will need to define Tegucigalpa's coordinates for our search queries. 

For this task, we will be using geopy, a library to locate the coordinates of addresses, cities, countries, and landmarks across the globe using third-party geocoders and other data sources.

Let’s start by installing geopy and importing the required modules.

In [2]:
# We install geopy and import geocoders library.
!pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values



We will now retrieve the coordinates as follows:

In [3]:
# Obtaining Tegucigalpa coordinates:
address = 'Tegucigalpa, Honduras'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print("Tegucigalpa's coordinates are: ", latitude, longitude)

Tegucigalpa's coordinates are:  14.1056861 -87.204676


### 1.3 Searching for venues

Now that we have our coordinates, we will search for the venues in Tegucigalpa. Remember, we want to search for parking, similar stores, and transportation.

Foursquare lets us search by query or category ID. To facilitate our search criteria, we will use the category IDs.

Another thing to consider is the radius of our search. Since Tegucigalpa is a small city, we will search for venues within a radius of 22 km, covering the entire city.

First, we will start by defining our venues by their category ID.

In [4]:
# Defining venues category ID:
parking = '4c38df4de52ce0d596b336e1'
electronicsStore = '4bf58dd8d48988d122951735,4f04afc02fb6e1c99f3db0bc,4bf58dd8d48988d10b951735,52f2ab2ebcbc57f1066b8b36' # All related electronic stores (electronics, phones, video games, IT services)
transportation = '52f2ab2ebcbc57f1066b8b4f,4bf58dd8d48988d1fe931735,4bf58dd8d48988d130951735,53fca564498e1a175f32528b,4bf58dd8d48988d1eb931735'# All related transportation services (taxi, bus stops, airport)


# Defining radius and limit for our search:
limit = 50 # We limit our search to 50 venue entries. Foursquare only returns a maximum of 50 venues per call.
radius = 22000 # We use a radius of 22KM, which covers the entire city.

To make a call to Foursquare’s location data, we need to do it in the form of a special url that will include all the parameters from above.

Below, we will define the url to obtain our venues.

In [5]:
# Defining the corresponding URL:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={},{},{}&radius={}&limit={}&intent=browse'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, parking, electronicsStore, transportation, radius, limit)
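
Formatting a long query string by hand works, but it is easy to misplace a parameter. As an alternative sketch, the same URL can be assembled from a dictionary with the standard library's `urlencode`. The credential values and the shortened category IDs below are placeholders (assumptions), not the notebook's hidden values or full category lists.

```python
from urllib.parse import urlencode

# Placeholder values standing in for the notebook's variables (assumptions).
CLIENT_ID, CLIENT_SECRET, VERSION = "your-client-id", "your-client-secret", "20200620"
latitude, longitude = 14.1056861, -87.204676
parking = "4c38df4de52ce0d596b336e1"
electronicsStore = "4bf58dd8d48988d122951735"
transportation = "4bf58dd8d48988d1fe931735"

params = {
    "client_id": CLIENT_ID,
    "client_secret": CLIENT_SECRET,
    "v": VERSION,
    "ll": f"{latitude},{longitude}",
    # Several category IDs are passed as one comma-separated string.
    "categoryId": ",".join([parking, electronicsStore, transportation]),
    "radius": 22000,
    "limit": 50,
    "intent": "browse",
}

# safe="," keeps the commas readable instead of percent-encoding them.
url = "https://api.foursquare.com/v2/venues/search?" + urlencode(params, safe=",")
print(url)
```

The `requests` library also accepts such a dictionary directly through the `params` argument of `requests.get`, which avoids building the string at all.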

Then, we obtain the results of our search as follows:

In [17]:
# Loading required libraries:
import requests # library to handle requests

# Importing results:
results = requests.get(url).json()

You will notice that the results are in json format. For us to manipulate the data and visualize it, we need to transform it to a pandas dataframe.
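
`json_normalize` turns nested JSON objects into columns with dotted names such as `location.lat`. To make that naming concrete, here is a simplified, pure-Python illustration of the flattening step. The `flatten` helper is hypothetical and only mimics the dotted naming; it is not how pandas implements `json_normalize`. The sample record is an abbreviated, rounded version of one venue from the results.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into one dict with dot-separated keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted prefix.
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            # Scalars and lists are kept as-is, like the `categories` column.
            flat[name] = value
    return flat

venue = {
    "name": "Jetstereo",
    "categories": [{"id": "4bf58dd8d48988d122951735", "name": "Electronics Store"}],
    "location": {"address": "City Mall", "lat": 14.061937, "lng": -87.219922},
}

row = flatten(venue)
print(sorted(row))
# ['categories', 'location.address', 'location.lat', 'location.lng', 'name']
```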

Let’s transform the venues into a dataframe.

In [7]:
# Importing required libraries
import pandas as pd

# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
# (pandas >= 1.0; on older versions use `from pandas.io.json import json_normalize`)
dataframe = pd.json_normalize(venues)
dataframe.head()

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.postalCode,location.state,name,referralId
0,"[{'id': '4f04afc02fb6e1c99f3db0bc', 'name': 'M...",False,5d0febc82e173e0023af692a,Calle principal aldea las casitas,HN,Tegucigalpa,Honduras,,8913,"[Calle principal aldea las casitas, Tegucigalp...","[{'label': 'display', 'lat': 14.048346, 'lng':...",14.048346,-87.2623,,Francisco Morazán,Tigo Aldea Las Casitas,v-1592690796
1,"[{'id': '4c38df4de52ce0d596b336e1', 'name': 'P...",False,5ce203e12db4a9002c769db6,Calle principal residencial la arboleda,HN,Tegucigalpa,Honduras,,9121,"[Calle principal residencial la arboleda, Tegu...","[{'label': 'display', 'lat': 14.04619, 'lng': ...",14.04619,-87.262764,,Francisco Morazán,Parqueo Residencial La Arboleda,v-1592690796
2,"[{'id': '4bf58dd8d48988d122951735', 'name': 'E...",False,5079e11de4b0da238583eae7,City Mall,HN,Tegucigalpa,Honduras,,5140,"[City Mall, Tegucigalpa, Francisco Morazán, Ho...","[{'label': 'display', 'lat': 14.06193659309128...",14.061937,-87.219922,,Francisco Morazán,Jetstereo,v-1592690796
3,"[{'id': '4bf58dd8d48988d114951735', 'name': 'B...",False,5ac08343b040560f6d88c3da,Aldea las casitas,HN,Tegucigalpa,Honduras,Frente campo fútbol,8025,"[Aldea las casitas (Frente campo fútbol), Tegu...","[{'label': 'display', 'lat': 14.053133, 'lng':...",14.053133,-87.255554,,Francisco Morazán,Papelería Aldea Casitas,v-1592690796
4,"[{'id': '4bf58dd8d48988d103951735', 'name': 'C...",False,5cbbedfc250cab002c910066,Aldea las casitas frente campo fútbol,HN,Tegucigalpa,Honduras,,8840,"[Aldea las casitas frente campo fútbol, Teguci...","[{'label': 'display', 'lat': 14.048828, 'lng':...",14.048828,-87.26184,,Francisco Morazán,Venta Ropa Flores,v-1592690796


Now… that looks quite messy. Let’s clean our table by removing unnecessary columns.

In [8]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
# take a copy so that later column assignments do not write to a view of `dataframe`
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,postalCode,state,id
0,Tigo Aldea Las Casitas,Mobile Phone Shop,Calle principal aldea las casitas,HN,Tegucigalpa,Honduras,,8913,"[Calle principal aldea las casitas, Tegucigalp...","[{'label': 'display', 'lat': 14.048346, 'lng':...",14.048346,-87.2623,,Francisco Morazán,5d0febc82e173e0023af692a
1,Parqueo Residencial La Arboleda,Parking,Calle principal residencial la arboleda,HN,Tegucigalpa,Honduras,,9121,"[Calle principal residencial la arboleda, Tegu...","[{'label': 'display', 'lat': 14.04619, 'lng': ...",14.04619,-87.262764,,Francisco Morazán,5ce203e12db4a9002c769db6
2,Jetstereo,Electronics Store,City Mall,HN,Tegucigalpa,Honduras,,5140,"[City Mall, Tegucigalpa, Francisco Morazán, Ho...","[{'label': 'display', 'lat': 14.06193659309128...",14.061937,-87.219922,,Francisco Morazán,5079e11de4b0da238583eae7
3,Papelería Aldea Casitas,Bookstore,Aldea las casitas,HN,Tegucigalpa,Honduras,Frente campo fútbol,8025,"[Aldea las casitas (Frente campo fútbol), Tegu...","[{'label': 'display', 'lat': 14.053133, 'lng':...",14.053133,-87.255554,,Francisco Morazán,5ac08343b040560f6d88c3da
4,Venta Ropa Flores,Clothing Store,Aldea las casitas frente campo fútbol,HN,Tegucigalpa,Honduras,,8840,"[Aldea las casitas frente campo fútbol, Teguci...","[{'label': 'display', 'lat': 14.048828, 'lng':...",14.048828,-87.26184,,Francisco Morazán,5cbbedfc250cab002c910066
5,Avianca Toncontin,Airport Terminal,,HN,Tegucigalpa,Honduras,,5295,"[Tegucigalpa, Francisco Morazán, Honduras]","[{'label': 'display', 'lat': 14.06020942670940...",14.060209,-87.219079,,Francisco Morazán,527d008611d263ba9f1ca190
6,Grupo raf honduras,Electronics Store,"Barrio guadalupe, av. San Martin de Porres, 1 ...",HN,Tegucigalpa,Honduras,,655,"[Barrio guadalupe, av. San Martin de Porres, 1...","[{'label': 'display', 'lat': 14.10058562294454...",14.100586,-87.201641,11101.0,Francisco Morazán,5dd94417e1d6e600089dbafd
7,Jetstereo,Electronics Store,Boulevard Morazan,HN,,Honduras,,2397,"[Boulevard Morazan, Francisco Morazán, Honduras]","[{'label': 'display', 'lat': 14.10054810504269...",14.100548,-87.183106,,Francisco Morazán,4d4021e51bd2a14322c3ef7c
8,Buses Aldea La Casitas,Bus Station,Frente residencial santa cruz,HN,Tegucigalpa,Honduras,,8968,"[Frente residencial santa cruz, Tegucigalpa, F...","[{'label': 'display', 'lat': 14.05337, 'lng': ...",14.05337,-87.267845,,Francisco Morazán,5adbde724c9be6665e3a8e0a
9,Tecnoplanet,Mobile Phone Shop,Mall El Dorado,HN,,Honduras,,2798,"[Mall El Dorado, Honduras]","[{'label': 'display', 'lat': 14.09427, 'lng': ...",14.09427,-87.181584,,,5078a2dee4b0459b75c03e38


Looking good! Finally, let’s check how many venues we got from our search.

In [9]:
tableSize = dataframe.shape
print('There is a total of ', tableSize[0], ' venues.')

There is a total of  50  venues.
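
The `distance` column gives each venue's distance in meters from the search coordinates, which lets us confirm that results fall inside the 22 km radius. As a rough cross-check, the sketch below recomputes that distance for the first venue with the haversine formula. The `haversine_m` helper and the mean Earth radius of 6,371 km are assumptions for illustration; Foursquare may compute distances differently, so small discrepancies are expected.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; an approximation

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

center = (14.1056861, -87.204676)  # Tegucigalpa coordinates from section 1.2
venue = (14.048346, -87.2623)      # "Tigo Aldea Las Casitas", reported distance 8913

d = haversine_m(*center, *venue)
print(round(d), d <= 22000)
```

The computed value should land close to the reported 8913 m and well inside the search radius.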


### 1.4 Exploring the venues

Now that we have our venues, we will plot and explore them using Folium. Folium is a library for geospatial visualization.

Let’s start by installing the respective library.

In [10]:
!pip install folium
import folium # plotting library



We now use folium to generate an interactive map and plot the venues.

In [11]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13)

# add the venues as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

Feel free to explore the venues using the interactive map. You can pan the map using your mouse and get more information about each venue by clicking on it.

### 2. Finding retail store optimal locations

We will now define optimal locations for our electronics retail store by clustering our venues using DBSCAN.

DBSCAN is a density-based clustering algorithm. It groups points that lie in dense areas and labels isolated points that are far from any dense area as noise. It works with two parameters: a radius (R) and a minimum number of points (M). R is the radius of the neighborhood around each point; M is the minimum number of points that must fall within that radius for the neighborhood to count as a dense area and form part of a cluster.

DBSCAN is ideal for our analysis because we want to locate our store near a high density of parking, similar store, and transportation venues to maximize shoppers’ traffic in our store.
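
To make the roles of the radius and the minimum number of points concrete, here is a small pure-Python sketch of the DBSCAN idea on toy 2D points. It is a simplified illustration, not scikit-learn's implementation; the function names and the toy data are hypothetical. As in scikit-learn, a point counts toward its own neighborhood.

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_samples):
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but does not expand
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels

points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (10, 10)]
labels = dbscan(points, eps=0.2, min_samples=2)
print(labels)  # [0, 0, 0, 1, 1, -1]
```

The two tight groups become clusters 0 and 1, while the isolated point at (10, 10) is labeled -1, the same noise label that appears in the results later on.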

We will start by importing the required libraries.

In [12]:
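import numpy as np 
from sklearn.cluster import DBSCAN
import sklearn.utils
# make_blobs was imported here from sklearn.datasets.samples_generator, a module
# removed in newer scikit-learn releases; it is unused in this analysis, so the import is dropped.
from sklearn.preprocessing import StandardScaler 
import matplotlib.pyplot as plt 
%matplotlib inline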

Since we have very few data points, for this example we will set R=0.15 and M=2, which scikit-learn calls `eps` and `min_samples`. The resulting clusters are our candidate locations for the retail store. You can tune both parameters to the size of your data set to get more accurate results.
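
Because the coordinates are standardized before clustering, the radius of 0.15 is measured in standard deviations of latitude and longitude, not in degrees or kilometers. The sketch below reproduces by hand what `StandardScaler` does to one column. It uses only the ten latitudes shown in the table above, so the resulting numbers are illustrative rather than those of the full 50-venue data set.

```python
from statistics import fmean, pstdev

# Latitudes of the ten venues displayed earlier (a subset of the 50 results).
lats = [14.048346, 14.04619, 14.061937, 14.053133, 14.048828,
        14.060209, 14.100586, 14.100548, 14.05337, 14.09427]

mean = fmean(lats)
std = pstdev(lats)  # population standard deviation, as StandardScaler uses

# Standardize: subtract the mean, then divide by the standard deviation.
scaled = [(x - mean) / std for x in lats]

# A radius of 0.15 in scaled units corresponds to this many degrees of latitude.
eps = 0.15
eps_in_degrees = eps * std
print(f"std = {std:.5f} deg, eps = {eps_in_degrees:.5f} deg of latitude")
```

Latitude and longitude are scaled separately, so the same radius can correspond to a different ground distance along each axis.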

We will now apply the DBSCAN algorithm to cluster our data and obtain our optimal locations. 

In [20]:
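# DBSCAN is deterministic for a fixed input order, so no random seed is needed
# (the earlier check_random_state call discarded its result and had no effect).
Clus_dataSet = dataframe_filtered[['lat','lng']]
Clus_dataSet = np.nan_to_num(Clus_dataSet)  # replace any missing coordinates with 0
Clus_dataSet = StandardScaler().fit_transform(Clus_dataSet)  # zero mean, unit variance per column

# Compute DBSCAN
db = DBSCAN(eps=0.15, min_samples=2).fit(Clus_dataSet)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
dataframe_filtered["Clus_Db"]=labels

# number of clusters, excluding the noise label (-1) if present
realClusterNum=len(set(labels)) - (1 if -1 in labels else 0)
# number of distinct labels, including noise
clusterNum = len(set(labels)) 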

# A sample of clusters
dataframe_filtered.head(10)

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,postalCode,state,id,Clus_Db
0,Tigo Aldea Las Casitas,Mobile Phone Shop,Calle principal aldea las casitas,HN,Tegucigalpa,Honduras,,8913,"[Calle principal aldea las casitas, Tegucigalp...","[{'label': 'display', 'lat': 14.048346, 'lng':...",14.048346,-87.2623,,Francisco Morazán,5d0febc82e173e0023af692a,0
1,Parqueo Residencial La Arboleda,Parking,Calle principal residencial la arboleda,HN,Tegucigalpa,Honduras,,9121,"[Calle principal residencial la arboleda, Tegu...","[{'label': 'display', 'lat': 14.04619, 'lng': ...",14.04619,-87.262764,,Francisco Morazán,5ce203e12db4a9002c769db6,0
2,Jetstereo,Electronics Store,City Mall,HN,Tegucigalpa,Honduras,,5140,"[City Mall, Tegucigalpa, Francisco Morazán, Ho...","[{'label': 'display', 'lat': 14.06193659309128...",14.061937,-87.219922,,Francisco Morazán,5079e11de4b0da238583eae7,1
3,Papelería Aldea Casitas,Bookstore,Aldea las casitas,HN,Tegucigalpa,Honduras,Frente campo fútbol,8025,"[Aldea las casitas (Frente campo fútbol), Tegu...","[{'label': 'display', 'lat': 14.053133, 'lng':...",14.053133,-87.255554,,Francisco Morazán,5ac08343b040560f6d88c3da,-1
4,Venta Ropa Flores,Clothing Store,Aldea las casitas frente campo fútbol,HN,Tegucigalpa,Honduras,,8840,"[Aldea las casitas frente campo fútbol, Teguci...","[{'label': 'display', 'lat': 14.048828, 'lng':...",14.048828,-87.26184,,Francisco Morazán,5cbbedfc250cab002c910066,0
5,Avianca Toncontin,Airport Terminal,,HN,Tegucigalpa,Honduras,,5295,"[Tegucigalpa, Francisco Morazán, Honduras]","[{'label': 'display', 'lat': 14.06020942670940...",14.060209,-87.219079,,Francisco Morazán,527d008611d263ba9f1ca190,1
6,Grupo raf honduras,Electronics Store,"Barrio guadalupe, av. San Martin de Porres, 1 ...",HN,Tegucigalpa,Honduras,,655,"[Barrio guadalupe, av. San Martin de Porres, 1...","[{'label': 'display', 'lat': 14.10058562294454...",14.100586,-87.201641,11101.0,Francisco Morazán,5dd94417e1d6e600089dbafd,-1
7,Jetstereo,Electronics Store,Boulevard Morazan,HN,,Honduras,,2397,"[Boulevard Morazan, Francisco Morazán, Honduras]","[{'label': 'display', 'lat': 14.10054810504269...",14.100548,-87.183106,,Francisco Morazán,4d4021e51bd2a14322c3ef7c,-1
8,Buses Aldea La Casitas,Bus Station,Frente residencial santa cruz,HN,Tegucigalpa,Honduras,,8968,"[Frente residencial santa cruz, Tegucigalpa, F...","[{'label': 'display', 'lat': 14.05337, 'lng': ...",14.05337,-87.267845,,Francisco Morazán,5adbde724c9be6665e3a8e0a,-1
9,Tecnoplanet,Mobile Phone Shop,Mall El Dorado,HN,,Honduras,,2798,"[Mall El Dorado, Honduras]","[{'label': 'display', 'lat': 14.09427, 'lng': ...",14.09427,-87.181584,,,5078a2dee4b0459b75c03e38,2


We summarize our generated optimal locations (clusters). 

In [14]:
set(labels)

{-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

Great! We obtained eleven optimal locations (labels from 0 to 10). Venues labeled -1 are noise: they do not belong to any dense area, so we do not consider them optimal locations for our electronics retail store.
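
To turn cluster labels into candidate sites, one option is to summarize each cluster by its size and the average position of its venues. The sketch below does this for the ten sample rows displayed above. The `summarize` helper is hypothetical, and a centroid is only one possible way to represent a cluster; the counts cover just the sample, not all 50 venues.

```python
from collections import defaultdict

# (lat, lng, Clus_Db) for the ten sample rows shown above.
rows = [
    (14.048346, -87.2623, 0), (14.04619, -87.262764, 0),
    (14.061937, -87.219922, 1), (14.053133, -87.255554, -1),
    (14.048828, -87.26184, 0), (14.060209, -87.219079, 1),
    (14.100586, -87.201641, -1), (14.100548, -87.183106, -1),
    (14.05337, -87.267845, -1), (14.09427, -87.181584, 2),
]

def summarize(rows):
    """Return {label: (venue_count, mean_lat, mean_lng)}, skipping noise (-1)."""
    groups = defaultdict(list)
    for lat, lng, label in rows:
        if label != -1:
            groups[label].append((lat, lng))
    summary = {}
    for label, pts in groups.items():
        n = len(pts)
        mean_lat = sum(p[0] for p in pts) / n
        mean_lng = sum(p[1] for p in pts) / n
        summary[label] = (n, mean_lat, mean_lng)
    return summary

summary = summarize(rows)
for label in sorted(summary):
    n, lat, lng = summary[label]
    print(f"cluster {label}: {n} venue(s), centroid ({lat:.6f}, {lng:.6f})")
```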

We will now plot and color code the optimal locations in our folium map. We will gray out the locations that are not optimal.

In [15]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [16]:
clustered_map = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters:

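# ten evenly spaced colors from the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, 10))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# Note: the loop below indexes with rainbow[clusters-1], so with eleven clusters
# labels 0 and 10 both map to the last color in the list.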

# add the venues as different color per cluster
for lat, lng, label, clusters in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories, dataframe_filtered.Clus_Db):
    if (clusters == -1):
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            color='gray',
            popup=label,
            fill = True,
            fill_color='gray',
            fill_opacity=0.6
        ).add_to(clustered_map)
    else:
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            color=rainbow[clusters-1],
            popup=label,
            fill = True,
            fill_color=rainbow[clusters-1],
            fill_opacity=0.6
        ).add_to(clustered_map)

# display map
clustered_map

Feel free to navigate through the map to view the optimal locations for our electronics retail store. This concludes our analysis.

## Results and Discussion

Although Foursquare limits our search to at most 50 venues, we were able to obtain eleven optimal locations to open our new electronics retail store.

The highest concentrations were found near Tegucigalpa’s airport, Toncontin (shown in purple), between the El Hogar and San Ignacio neighborhoods (shown in orange), and in the historic downtown of Tegucigalpa (shown in lime). We also found several locations near the center and one location almost at the southwest border of the city.

Ideally, we would pick a location close to the center for higher visibility, and such a location should generally be considered the best option. However, personal factors may be a reason to choose one farther from the center.

## Conclusion

We successfully identified optimal locations in Tegucigalpa for our new retail store, helping stakeholders narrow down their search.

In the end, stakeholders will make the final decision based on personal factors. If they will be working in their retail store, they might consider work-life balance issues such as the distance from the shop to home.