# Capstone Project - A Burger Place in Belo Horizonte
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a restaurant. Specifically, this report will be targeted to stakeholders interested in opening a **Burger Restaurant** in **Belo Horizonte**, Brazil.

Given local customer behavior in Belo Horizonte, we will try to detect **locations that are known for having many restaurants**, and we are particularly interested in **those areas with no Burger restaurants in the vicinity**. We would also prefer locations **as close to Liberty Square as possible**, provided that the first two conditions are met.

We will use our data science powers to generate a few of the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly expressed so that the best possible final location can be chosen by stakeholders.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* number of existing restaurants in the neighborhood (any type of restaurant)
* number of and distance to Burger restaurants in the neighborhood, if any
* distance of neighborhood from city center

We decided to use a regularly spaced grid of locations, centered around Liberty Square, to define our neighborhoods.

The following data sources will be needed to extract/generate the required information:
* centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained using **Nominatim (OpenStreetMap) reverse geocoding** through geopy
* number of restaurants and their type and location in every neighborhood will be obtained using the **Foursquare API**
* coordinates of the Belo Horizonte area of interest will be obtained using **Nominatim geocoding** of a well-known Belo Horizonte location (Liberty Square)

### Neighborhood Candidates

Let's create latitude & longitude coordinates for the centroids of our candidate neighborhoods. We will create a grid of cells covering our area of interest, which is approx. 3x3 kilometers centered around the Belo Horizonte downtown area.

Let's first find the latitude & longitude of Liberty Square in Belo Horizonte, using a specific, well-known address and a geocoding API.

In [1]:
from geopy.geocoders import Nominatim
import requests

In [2]:
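def get_coordinates(address):
    """Geocode an address with Nominatim (OpenStreetMap) and return (lat, lng)."""
    geolocator = Nominatim(user_agent="foursquare_agent")
    result = geolocator.geocode(address)
    if result is None:
        raise ValueError('Address not found: {}'.format(address))
    return result.latitude, result.longitude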

In [3]:
address = 'Praca da Liberdade, Belo Horizonte'
location = get_coordinates(address)
print('Coordinate of Liberty Square, Belo Horizonte: {}'.format(location))

Coordinate of Liberty Square, Belo Horizonte: (-19.9318074, -43.937935254113015)


Now let's create a grid of equally spaced candidate areas, centered around this location and within 1.5 km of Liberty Square. Our neighborhoods will be defined as circular areas with a radius of 75 meters, so our neighborhood centers will be 150 meters apart.

To calculate distances accurately, we need to create our grid of locations in a Cartesian 2D coordinate system, which lets us work in meters rather than in latitude/longitude degrees. We then project those coordinates back to latitude/longitude degrees to show them on a Folium map. So let's create functions to convert between the WGS84 spherical coordinate system (latitude/longitude degrees) and the UTM Cartesian coordinate system (X/Y coordinates in meters).

In [4]:
import math
from pyproj import Transformer

# UTM zone 23 (covers Belo Horizonte), WGS84 datum, metres
UTM_ZONE_23 = '+proj=utm +zone=23 +datum=WGS84 +units=m +no_defs'

# Build the transformers once; always_xy=True keeps (lon, lat) / (x, y) order
_to_xy = Transformer.from_crs('EPSG:4326', UTM_ZONE_23, always_xy=True)
_to_lonlat = Transformer.from_crs(UTM_ZONE_23, 'EPSG:4326', always_xy=True)

def lonlat_to_xy(lng, lat):
    """WGS84 longitude/latitude (degrees) -> UTM zone 23 X/Y (metres)."""
    return _to_xy.transform(lng, lat)

def xy_to_lonlat(x, y):
    """UTM zone 23 X/Y (metres) -> WGS84 longitude/latitude (degrees)."""
    return _to_lonlat.transform(x, y)

def calc_xy_distance(x1, y1, x2, y2):
    """Euclidean distance in metres between two points in the same UTM zone."""
    return math.hypot(x2 - x1, y2 - y1)

print('Coordinate transformation check')
print('-------------------------------')
print('BH Liberty Square longitude={}, latitude={}'.format(location[1], location[0]))
x, y = lonlat_to_xy(location[1], location[0])
print('BH Liberty Square UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('BH Liberty Square longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
BH Liberty Square longitude=-43.937935254113015, latitude=-19.9318074
BH Liberty Square UTM X=611150.2070851794, Y=-2204286.357586489
BH Liberty Square longitude=-43.937935254113015, latitude=-19.931807399999993
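The projection above is fixed to UTM zone 23. As a small sketch (the helper names are our own, not part of pyproj), the standard 6-degree zone number can be derived from the longitude, which confirms zone 23 for Liberty Square. Because no `south` flag is used, the projection follows the northern-hemisphere convention (no 10,000,000 m false northing), which is why the printed Y is negative; distances between points in the same zone are unaffected.

```python
import math

def utm_zone(lon_deg):
    """Standard 6-degree UTM zone number (1..60) for a longitude in degrees.

    Ignores the special zones around Norway and Svalbard, irrelevant here.
    """
    return int(math.floor((lon_deg + 180.0) / 6.0)) % 60 + 1

def utm_hemisphere(lat_deg):
    """'south' for negative latitudes, 'north' otherwise."""
    return 'south' if lat_deg < 0 else 'north'

# Liberty Square coordinates printed above
print(utm_zone(-43.937935254113015), utm_hemisphere(-19.9318074))  # 23 south
```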

Let's create a **grid of cells**: we offset every other row and adjust the vertical row spacing so that **every cell center is equally distant from all its neighbors**.
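The spacing rule can be checked with a little arithmetic. In this sketch (constant names are ours), the horizontal step is 150 m, alternate rows are shifted by 75 m, and rows are 150·√3/2 ≈ 129.9 m apart, which is the `k` factor used in the next cell. The same numbers give the covering radius of this triangular lattice, step/√3 ≈ 86.6 m, i.e. the farthest any point can be from its nearest center; since that exceeds 75 m, the 75 m circles leave small uncovered gaps between them.

```python
import math

X_STEP = 150.0                       # distance between centres in the same row (m)
X_OFFSET = X_STEP / 2                # horizontal shift of every other row (m)
Y_STEP = X_STEP * math.sqrt(3) / 2   # vertical distance between rows (m)

# distance to a neighbour in the same row, and to one in the adjacent row
same_row = X_STEP
adjacent_row = math.hypot(X_OFFSET, Y_STEP)

# largest distance from any point to its nearest centre (hexagon circumradius)
covering_radius = X_STEP / math.sqrt(3)

print(round(Y_STEP, 1), round(same_row, 1), round(adjacent_row, 1), round(covering_radius, 1))
# 129.9 150.0 150.0 86.6
```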

In [5]:
location_x, location_y = lonlat_to_xy(location[1], location[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = location_x - 1500
x_step = 150
y_min = location_y - 1500 - (int(21/k)*k*150 - 3000)/2
y_step = 150 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 75 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(location_x, location_y, x, y)
        if (distance_from_center <= 1501):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

364 candidate neighborhood centers generated.
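The loop above builds the grid one point at a time. A vectorized NumPy alternative is sketched below with a hypothetical `hex_grid` helper; it places a lattice point exactly on the center and keeps every point within the radius. It is not the notebook's method, and its point count need not match the 364 centers above exactly.

```python
import numpy as np

def hex_grid(center_x, center_y, radius=1500.0, step=150.0):
    """Hexagonal (triangular-lattice) grid of points within `radius` metres of a centre.

    Rows are step*sqrt(3)/2 apart and every odd row is shifted by step/2, so each
    point is `step` metres from its six neighbours. Returns (xs, ys, distances).
    """
    dy = step * np.sqrt(3) / 2
    n_rows = int(np.ceil(radius / dy))
    n_cols = int(np.ceil(radius / step)) + 1
    ii, jj = np.meshgrid(np.arange(-n_rows, n_rows + 1),
                         np.arange(-n_cols, n_cols + 1), indexing='ij')
    # odd rows (including negative ones, since -1 % 2 == 1) get the half-step shift
    xs = center_x + jj * step + np.where(ii % 2 == 0, 0.0, step / 2)
    ys = center_y + ii * dy
    d = np.hypot(xs - center_x, ys - center_y)
    keep = d <= radius
    return xs[keep], ys[keep], d[keep]
```

Usage, under the same assumptions: `xs, ys, d = hex_grid(location_x, location_y)`, then convert each point with `xy_to_lonlat`.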

Let's visualize the data we have so far: city center location and candidate neighborhood centers:

In [6]:
import folium

map_bh = folium.Map(location=location, zoom_start=15)
folium.Marker(location, popup='Liberty Square').add_to(map_bh)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=75, color='red', fill=False).add_to(map_bh)

map_bh

OK, we now have the coordinates of the centers of the neighborhoods/areas to be evaluated, equally spaced (the distance from every point to its neighbors is exactly the same) and within ~1.5 km of Liberty Square.

Let's now reverse geocode to get approximate addresses of those locations.

In [7]:
from geopy.extra.rate_limiter import RateLimiter

locator = Nominatim(user_agent="foursquare_agent")
# Nominatim's usage policy allows at most one request per second
reverse_geocode = RateLimiter(locator.reverse, min_delay_seconds=1)

# Address parts to try, from most to least specific
ADDRESS_FORMATS = [('road', 'house_number', 'suburb', 'city'),
                   ('road', 'house_number', 'city'),
                   ('road', 'suburb', 'city'),
                   ('suburb', 'city')]

def get_address(latitude, longitude, verbose=False):
    """Reverse geocode a point into a short 'road, number, suburb, city' string.

    Returns None when Nominatim finds nothing or none of the formats fit.
    """
    location = reverse_geocode((latitude, longitude))
    if location is None:
        return None
    addr = location.raw.get('address', {})
    for keys in ADDRESS_FORMATS:
        if all(key in addr for key in keys):
            return ', '.join(addr[key] for key in keys)
    return None

In [8]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    addresses.append(address)
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.


In [9]:
addresses[150:155]

['Avenida Getúlio Vargas, 351, Funcionários, Belo Horizonte',
 'Rua Gonçalves Dias, 92, Funcionários, Belo Horizonte',
 'Rua Desembargador Drumond, Serra, Belo Horizonte',
 'Avenida Álvares Cabral, 1690, Santo Agostinho, Belo Horizonte',
 'Rua Santos Barreto, Santo Agostinho, Belo Horizonte']

Looking good. Let's now place all this into a Pandas dataframe.

In [10]:
import pandas as pd

df_locations = pd.DataFrame({'Address': addresses,
                             'Lat': latitudes,
                             'Lon': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

print(df_locations.shape)
df_locations.head(10)

(364, 6)


| | Address | Lat | Lon | X | Y | Distance from center |
|---|---|---|---|---|---|---|
| 0 | Rua Carangola, 571, Santo Antônio, Belo Horizonte | -19.944744 | -43.942149 | 610700.207085 | -2205715.0 | 1498.123827 |
| 1 | Rua São Romão, São Pedro, Belo Horizonte | -19.944735 | -43.940715 | 610850.207085 | -2205715.0 | 1460.094175 |
| 2 | Rua São Domingos do Prata, Santo Antônio, Belo... | -19.944727 | -43.939282 | 611000.207085 | -2205715.0 | 1436.793305 |
| 3 | Rua São Domingos do Prata, São Pedro, Belo Hor... | -19.944718 | -43.937849 | 611150.207085 | -2205715.0 | 1428.941916 |
| 4 | Rua Padre Severino, São Pedro, Belo Horizonte | -19.94471 | -43.936416 | 611300.207085 | -2205715.0 | 1436.793305 |
| 5 | Avenida Nossa Senhora do Carmo, 500, São Pedro... | -19.944701 | -43.934983 | 611450.207085 | -2205715.0 | 1460.094175 |
| 6 | Rua Passa Tempo, 600, Carmo, Belo Horizonte | -19.944692 | -43.933549 | 611600.207085 | -2205715.0 | 1498.123827 |
| 7 | Rua Mar de Espanha, 525, Santo Antônio, Belo H... | -19.943583 | -43.944306 | 610475.207085 | -2205585.0 | 1463.941597 |
| 8 | Rua Mar de Espanha, 525, Santo Antônio, Belo H... | -19.943574 | -43.942873 | 610625.207085 | -2205585.0 | 1401.115627 |
| 9 | Rua Carangola, 433, Santo Antônio, Belo Horizonte | -19.943566 | -43.94144 | 610775.207085 | -2205585.0 | 1352.081728 |


### Foursquare
Now that we have our location candidates, let's use the Foursquare API to get info on restaurants in each neighborhood.

We're interested in venues in the 'food' category, but only those that are proper restaurants: coffee shops, bakeries, etc. are not direct competitors, so we don't care about those.

Foursquare credentials are defined in the cell below.

In [11]:
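import os

# Credentials are read from environment variables instead of being hard-coded;
# the variable names are our own choice
client_id = os.environ['FOURSQUARE_CLIENT_ID']
client_secret = os.environ['FOURSQUARE_CLIENT_SECRET']
version = '20200505'
radius = '75'   # metres, matching the 75 m neighborhood radius
limit = '150'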

In [12]:
food_category = '4d4b7105d754a06374d81259'

excluded_categories = ['Cafeteria',     #'4bf58dd8d48988d128941735', 
                       'Coffee Shop',   #'4bf58dd8d48988d1e0931735', 
                       'Dessert Shop',  #'4bf58dd8d48988d1d0941735',
                       'Donut Shop',    #'4bf58dd8d48988d148941735',
                       'Fast Food Restaurant',     #'4bf58dd8d48988d16e941735',
                       'Food Stand',    #'56aa371be4b08b9a8d57350b',
                       'Food Truck',    #'4bf58dd8d48988d1cb941735',
                       'Juice Bar',     #'4bf58dd8d48988d112941735'
                       'Pet Cafe',      #'56aa371be4b08b9a8d573508',
                       'Bakery']        #'4bf58dd8d48988d16a941735'

burger_joint_category = ['burger joint']  #'4bf58dd8d48988d16c941735'

In [13]:
venues = []

for lat, lon in zip(df_locations['Lat'], df_locations['Lon']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?categoryId={}&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        food_category,
        client_id,
        client_secret,
        version,
        lat,
        lon,
        radius, 
        limit)
    
    # make the GET request
    results = requests.get(url).json()['response']['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            lat, 
            lon,
            venue['venue']['id'],
            venue['venue']['name'],
            venue['venue']['categories'][0]['name'],
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],
            venue['venue']['location']['distance']))

In [15]:
venues[100:105]

[(-19.940006180784277,
  -43.935013990704775,
  '53a0c950498ec38a9e4bfaa4',
  'Hay Salsa Pizza Y Otros',
  'Pizza Place',
  -19.9395494210987,
  -43.934599528201176,
  66),
 (-19.940006180784277,
  -43.935013990704775,
  '553afc8e498e839c574f766e',
  'Água na Boca',
  'Snack Place',
  -19.93969030640515,
  -43.934691482212244,
  48),
 (-19.940006180784277,
  -43.935013990704775,
  '4c0aa8f2a1b32d7f7cbc99f0',
  'Falafel - Árabe & Vegetariano',
  'Falafel Restaurant',
  -19.93964912163674,
  -43.934577607880534,
  60),
 (-19.940006180784277,
  -43.935013990704775,
  '52d939f811d22ef28c0d9b2f',
  "Ambrosio's",
  'Steakhouse',
  -19.939918518066406,
  -43.93538284301758,
  39),
 (-19.939997582884008,
  -43.933580819516955,
  '4b78847ef964a520a5d32ee3',
  'Outback Steakhouse',
  'Steakhouse',
  -19.940318444182477,
  -43.93390429680717,
  49)]

In [16]:
restaurants = {}           # venue ID -> record, for every non-excluded food venue
burgers = {}               # venue ID -> record, for Burger Joints only
location_restaurants = []  # one entry per (grid centre, restaurant) hit

def classify():
    """Split the raw Foursquare hits in `venues` into restaurants and burger joints."""
    for (center_lat, center_lon, venue_id, name, category,
         venue_lat, venue_lng, distance) in venues:
        if category in excluded_categories:
            continue  # cafés, bakeries etc. are not direct competitors
        x, y = lonlat_to_xy(venue_lng, venue_lat)
        record = {
            'Distance': distance,
            'Name': name,
            'Category': category,
            'Latitude': venue_lat,
            'Longitude': venue_lng,
            'X': x,
            'Y': y,
            'Lat': center_lat,   # grid centre whose query returned this venue
            'Lon': center_lon,
        }
        # keyed by venue ID, so a venue returned for several centres is kept once
        restaurants[venue_id] = record
        if category == 'Burger Joint':
            burgers[venue_id] = record
        location_restaurants.append(category)

In [17]:
classify()

In [20]:
burger_places = {k:v for k,v in burgers.items() if v != {}}

In [22]:
print(len(location_restaurants))

756


In [24]:
import pandas as pd

# One row per venue ID; incomplete rows are dropped and the ID index discarded
venues_df = pd.DataFrame(restaurants).T.dropna().reset_index(drop=True)
print(venues_df.shape)
venues_df.head()

(756, 9)


| | Distance | Name | Category | Latitude | Longitude | X | Y | Lat | Lon |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | Cafeteria Jamile | Café | -19.9445 | -43.9401 | 610910 | -2205690.0 | -19.9447 | -43.9407 |
| 1 | 62 | Restaurante Katucha | Restaurant | -19.945 | -43.9402 | 610904 | -2205750.0 | -19.9447 | -43.9407 |
| 2 | 2 | Casa da Lasanha | Italian Restaurant | -19.9447 | -43.9364 | 611301 | -2205710.0 | -19.9447 | -43.9364 |
| 3 | 15 | Sapore D'Itália | Italian Restaurant | -19.9446 | -43.9363 | 611314 | -2205710.0 | -19.9447 | -43.9364 |
| 4 | 12 | Sanduiche São Pedro | Burger Joint | -19.9446 | -43.9364 | 611307 | -2205710.0 | -19.9447 | -43.9364 |


In [25]:
restaurants_df = venues_df.merge(df_locations[['Address','Lat','Lon']], how = 'inner', on = ['Lat','Lon'])
print(restaurants_df.shape)
restaurants_df.head()

(756, 10)


| | Distance | Name | Category | Latitude | Longitude | X | Y | Lat | Lon | Address |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | Cafeteria Jamile | Café | -19.9445 | -43.9401 | 610910 | -2205690.0 | -19.9447 | -43.9407 | Rua São Romão, São Pedro, Belo Horizonte |
| 1 | 62 | Restaurante Katucha | Restaurant | -19.945 | -43.9402 | 610904 | -2205750.0 | -19.9447 | -43.9407 | Rua São Romão, São Pedro, Belo Horizonte |
| 2 | 2 | Casa da Lasanha | Italian Restaurant | -19.9447 | -43.9364 | 611301 | -2205710.0 | -19.9447 | -43.9364 | Rua Padre Severino, São Pedro, Belo Horizonte |
| 3 | 15 | Sapore D'Itália | Italian Restaurant | -19.9446 | -43.9363 | 611314 | -2205710.0 | -19.9447 | -43.9364 | Rua Padre Severino, São Pedro, Belo Horizonte |
| 4 | 12 | Sanduiche São Pedro | Burger Joint | -19.9446 | -43.9364 | 611307 | -2205710.0 | -19.9447 | -43.9364 | Rua Padre Severino, São Pedro, Belo Horizonte |


In [26]:
burgers_df = pd.DataFrame(burger_places).T.dropna().reset_index(drop=True)
burgers_df = burgers_df.merge(df_locations[['Address','Lat','Lon']], how='inner', on=['Lat','Lon'])
print(burgers_df.shape)
burgers_df.head()

(28, 10)


| | Distance | Name | Category | Latitude | Longitude | X | Y | Lat | Lon | Address |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12 | Sanduiche São Pedro | Burger Joint | -19.9446 | -43.9364 | 611307 | -2205710.0 | -19.9447 | -43.9364 | Rua Padre Severino, São Pedro, Belo Horizonte |
| 1 | 48 | Eddie Fine Burgers | Burger Joint | -19.9408 | -43.934 | 611557 | -2205290.0 | -19.9412 | -43.9343 | Avenida do Contorno, 6061, São Pedro, Belo Hor... |
| 2 | 57 | X-Tudo | Burger Joint | -19.9401 | -43.9359 | 611357 | -2205210.0 | -19.94 | -43.9364 | Avenida do Contorno, 6162, Savassi, Belo Horiz... |
| 3 | 57 | BRONX | Burger Joint | -19.9404 | -43.9361 | 611332 | -2205240.0 | -19.94 | -43.9364 | Avenida do Contorno, 6162, Savassi, Belo Horiz... |
| 4 | 37 | Slow Burger | Burger Joint | -19.9397 | -43.9306 | 611911 | -2205160.0 | -19.94 | -43.9307 | Avenida do Contorno, 5731, Carmo, Belo Horizonte |


In [27]:
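# .copy() makes others_df an independent frame, so later column assignments
# do not trigger SettingWithCopyWarning
others_df = restaurants_df[restaurants_df['Category'] != 'Burger Joint'].copy()
others_df.reset_index(drop=True, inplace=True)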

print(others_df.shape)
others_df.head()

(728, 10)


| | Distance | Name | Category | Latitude | Longitude | X | Y | Lat | Lon | Address |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | Cafeteria Jamile | Café | -19.9445 | -43.9401 | 610910 | -2205690.0 | -19.9447 | -43.9407 | Rua São Romão, São Pedro, Belo Horizonte |
| 1 | 62 | Restaurante Katucha | Restaurant | -19.945 | -43.9402 | 610904 | -2205750.0 | -19.9447 | -43.9407 | Rua São Romão, São Pedro, Belo Horizonte |
| 2 | 2 | Casa da Lasanha | Italian Restaurant | -19.9447 | -43.9364 | 611301 | -2205710.0 | -19.9447 | -43.9364 | Rua Padre Severino, São Pedro, Belo Horizonte |
| 3 | 15 | Sapore D'Itália | Italian Restaurant | -19.9446 | -43.9363 | 611314 | -2205710.0 | -19.9447 | -43.9364 | Rua Padre Severino, São Pedro, Belo Horizonte |
| 4 | 16 | Ser Saudável | Salad Place | -19.9446 | -43.9365 | 611295 | -2205700.0 | -19.9447 | -43.9364 | Rua Padre Severino, São Pedro, Belo Horizonte |


Let's have a look at the basic overall numbers:

In [28]:
import numpy as np

n_res = len(location_restaurants)
n_loc = len(df_locations)
n_bgr = len(burgers_df)

print('Number of Restaurants: ', n_res)
print('Number of Burger Restaurants: ', n_bgr)
print('Number of locations: ', n_loc)

print('Average number of restaurants per area with radius=75m: ', n_res / n_loc)
print('Average number of Burger restaurants per area with radius=75m: ', n_bgr / n_loc)
print('Average number of Restaurants per Burger Restaurant: ', n_res / n_bgr)
print('Percentage of Burger restaurants: {:.2f}%'.format(len(burgers_df) / len(venues_df) * 100))

Number of Restaurants:  756
Number of Burger Restaurants:  28
Number of locations:  364
Average number of restaurants per area with radius=75m:  2.076923076923077
Average number of Burger restaurants per area with radius=75m:  0.07692307692307693
Average number of Restaurants per Burger Restaurant:  27.0
Percentage of Burger restaurants: 3.70%


Looking good. So now we have all the restaurants within about 1.5 km of Liberty Square, and we know which ones are Burger Joints! We also know exactly which restaurants are in the vicinity of every candidate neighborhood center.

This concludes the data gathering phase - we're now ready to use this data for analysis to produce the report on optimal locations for a new Burger restaurant!

## Methodology <a name="methodology"></a>

In this project we will focus our efforts on detecting areas of Belo Horizonte that have a high restaurant density, particularly those with a low number of Burger restaurants. We will limit our analysis to an area of ~1.5 km around Liberty Square.

In the first step we collected the required **data: location and type (category) of every restaurant within ~1.5 km of Liberty Square** (Praca da Liberdade). We also **identified Burger restaurants** (according to Foursquare categorization).

The second step in our analysis will be the calculation and exploration of '**restaurant density**' across different areas of Belo Horizonte: we will use **heatmaps** to identify a few promising areas close to the center with a high number of restaurants in general (*but* with few or no Burger restaurants in the vicinity) and focus our attention on those areas.

In the third and final step we will focus on the most promising areas and, within those, create **clusters of locations that meet some basic requirements** established in discussion with stakeholders: we will consider locations with **more than 8 restaurants within a radius of 75 meters**, and we want locations **with no more than 1 Burger restaurant within a radius of 75 meters**. We will present a map of all such locations, but also create clusters (using **k-means clustering**) of those locations to identify general zones / neighborhoods / addresses, which should be the starting point for the final 'street level' exploration and search for an optimal venue location by stakeholders.
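Because every venue and candidate center already has UTM X/Y coordinates in meters, the "restaurants within a radius" counts behind these criteria can also be computed directly from distances, independently of how the Foursquare query radius behaves. A minimal NumPy sketch with a hypothetical `count_within` helper (brute-force pairwise distances, which is fine for a few hundred centers and venues):

```python
import numpy as np

def count_within(center_xy, venue_xy, radius):
    """For each centre, count the venues lying within `radius` metres of it.

    Both inputs are (n, 2) arrays of planar X/Y coordinates in metres (e.g. UTM).
    Brute force: builds the full n_centres x n_venues distance matrix.
    """
    c = np.asarray(center_xy, dtype=float)[:, None, :]   # (n_centres, 1, 2)
    v = np.asarray(venue_xy, dtype=float)[None, :, :]    # (1, n_venues, 2)
    dist = np.sqrt(((c - v) ** 2).sum(axis=2))           # (n_centres, n_venues)
    return (dist <= radius).sum(axis=1)

centres = [(0.0, 0.0), (1000.0, 0.0)]
venues_xy = [(10.0, 0.0), (0.0, 50.0), (200.0, 0.0), (1000.0, 74.0)]
print(count_within(centres, venues_xy, radius=75))  # [2 1]
```

With the notebook's frames this could be called as `count_within(df_locations[['X', 'Y']].values, restaurants_df[['X', 'Y']].values, radius=75)`; that call is an untested assumption about the data, not a step the analysis below uses.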

## Analysis <a name="analysis"></a>

Let's create a map showing the **heatmap / density of restaurants** and try to extract some meaningful info from it.

In [29]:
from folium import plugins
from folium.plugins import HeatMap
from folium.plugins import FastMarkerCluster

map_bh2 = folium.Map(location=location, zoom_start = 15,) 

others_df['Latitude'] = others_df['Latitude'].astype(float)
others_df['Longitude'] = others_df['Longitude'].astype(float)

heat_df = others_df[['Latitude', 'Longitude']]
heat_data = [[row['Latitude'],row['Longitude']] for index, row in heat_df.iterrows()]
HeatMap(heat_data).add_to(map_bh2)

FastMarkerCluster(data=list(zip(burgers_df['Latitude'].values, burgers_df['Longitude'].values))).add_to(map_bh2)
folium.LayerControl().add_to(map_bh2)

# Display the map
map_bh2



Now let's invert the visualization to see if it brings us any extra insight:

In [30]:
map_bh3 = folium.Map(location=location, zoom_start = 15) 

burgers_df['Latitude'] = burgers_df['Latitude'].astype(float)
burgers_df['Longitude'] = burgers_df['Longitude'].astype(float)

heat_df2 = burgers_df[['Latitude', 'Longitude']]
heat_data2 = [[row['Latitude'],row['Longitude']] for index, row in heat_df2.iterrows()]
HeatMap(heat_data2).add_to(map_bh3)

FastMarkerCluster(data=list(zip(others_df['Latitude'].values, others_df['Longitude'].values))).add_to(map_bh3)
folium.LayerControl().add_to(map_bh3)

map_bh3

This map is not so 'hot' (Burger restaurants make up only ~3.7% of all restaurants in this area), but it does indicate a higher density of existing Burger Joints on the western side of Liberty Square, with the closest pockets of **low Burger restaurant density positioned east, north and center-west of Liberty Square**.

Now let's count the **number of restaurants in every candidate area** and then filter the ones that match our parameters: our "Areas Of Interest" (AOI).

In [31]:
resume_restaurants = restaurants_df.groupby('Address').size().sort_values(ascending=False) \
  .reset_index(name='Sum of Restaurants')
resume_burgers = burgers_df.groupby('Address').size().sort_values(ascending=False) \
  .reset_index(name='Sum of Burger Places')
resume = resume_restaurants.merge(resume_burgers, how = 'outer', on = ['Address'])
resume = resume.merge(df_locations[['Address','Lat','Lon','X','Y']], how='inner', on='Address')
print(resume.shape)
resume = resume.sort_values('Sum of Restaurants', ascending=False)
resume['Sum of Restaurants'] = resume['Sum of Restaurants'].fillna(0).astype(int)
resume['Sum of Burger Places'] = resume['Sum of Burger Places'].fillna(0).astype(int)

(292, 7)


In [32]:
AOI = resume[(resume['Sum of Restaurants']>8) & (resume['Sum of Burger Places']<2)]
print(AOI.shape)
AOI.head()

(20, 7)


| | Address | Sum of Restaurants | Sum of Burger Places | Lat | Lon | X | Y |
|---|---|---|---|---|---|---|---|
| 0 | Rua dos Tupis, 337, Centro, Belo Horizonte | 20 | 1 | -19.922431 | -43.940148 | 610925.207085 | -2203247.0 |
| 1 | Avenida do Contorno, 6061, São Pedro, Belo Hor... | 13 | 1 | -19.941176 | -43.93429 | 611525.207085 | -2205326.0 |
| 2 | Avenida Augusto de Lima, Centro, Belo Horizonte | 12 | 0 | -19.924761 | -43.937266 | 611225.207085 | -2203507.0 |
| 3 | Avenida Augusto de Lima, Centro, Belo Horizonte | 12 | 0 | -19.923617 | -43.942289 | 610700.207085 | -2203377.0 |
| 4 | Avenida Augusto de Lima, Centro, Belo Horizonte | 12 | 0 | -19.923609 | -43.940856 | 610850.207085 | -2203377.0 |
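In the table above, three candidate centers with different `Lat`/`Lon` share the address "Avenida Augusto de Lima, Centro, Belo Horizonte" and all show 12 restaurants, which is what grouping by `Address` produces: cells that share an address string are counted together. If the intent is one count per 75 m area, the counts could instead be keyed on the center coordinates. A sketch with a hypothetical `count_per_cell` helper, not run on the notebook's data:

```python
import pandas as pd

def count_per_cell(restaurants_df, burgers_df, locations_df, keys=('Lat', 'Lon')):
    """Count restaurants and burger joints per grid cell.

    Counts are keyed on the cell-centre coordinates (`keys`), so two cells that
    share an address string are still counted separately. Cells with no venues
    get zero counts.
    """
    keys = list(keys)
    n_rest = restaurants_df.groupby(keys).size().reset_index(name='Sum of Restaurants')
    n_burg = burgers_df.groupby(keys).size().reset_index(name='Sum of Burger Places')
    cells = (locations_df.merge(n_rest, how='left', on=keys)
                         .merge(n_burg, how='left', on=keys))
    cols = ['Sum of Restaurants', 'Sum of Burger Places']
    cells[cols] = cells[cols].fillna(0).astype(int)
    return cells

# Tiny illustrative grid: the first two cells share the address 'Rua A'
locs = pd.DataFrame({'Address': ['Rua A', 'Rua A', 'Rua B'],
                     'Lat': [1.0, 1.0, 2.0], 'Lon': [1.0, 2.0, 2.0]})
rests = pd.DataFrame({'Lat': [1.0, 1.0, 1.0, 2.0], 'Lon': [1.0, 1.0, 2.0, 2.0]})
burgs = pd.DataFrame({'Lat': [1.0], 'Lon': [1.0]})
print(count_per_cell(rests, burgs, locs)['Sum of Restaurants'].tolist())  # [2, 1, 1]
```

On this toy data, grouping by address would credit both 'Rua A' cells with 3 restaurants. Applied to the notebook (e.g. `count_per_cell(restaurants_df, burgers_df, df_locations)` followed by the same `> 8` and `< 2` filters), the resulting set of areas may differ from the 20 above.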




Ok, but what does that mean? In **one sentence**, it means that:

In [33]:
AOIr = pd.merge(restaurants_df, AOI,  how='inner', on=['Lat','Lon'])
AOIb = pd.merge(burgers_df, AOI,  how='inner', on=['Lat','Lon'])

print('There are {} restaurants, from which {} are Burger Places, distributed over {} different 75m radius areas of interest.'.format(AOIr.shape[0], AOIb.shape[0], AOI.shape[0]))

There are 118 restaurants, from which 4 are Burger Places, distributed over 20 different 75m radius areas of interest.


**Let's see how this sentence looks on a map.**

In [34]:
map_bh4 = folium.Map(location=location, zoom_start=15)

heat_df3 = AOIr[['Latitude', 'Longitude']]
heat_data3 = [[row['Latitude'],row['Longitude']] for index, row in heat_df3.iterrows()]
HeatMap(heat_data3).add_to(map_bh4)

for lat, lon in list(zip(AOI['Lat'].values, AOI['Lon'].values)):
    folium.Circle([lat, lon], radius=75, color='blue', fill=False).add_to(map_bh4)

for latitude, longitude in list(zip(AOIb['Latitude'].values, AOIb['Longitude'].values)):
    folium.Marker([latitude, longitude], popup='Burger Place').add_to(map_bh4)

map_bh4

That's already very insightful! We are now close to obtaining our results!

We now have some locations fairly close to Liberty Square (Praca da Liberdade), and we know that each of those locations has no more than 1 Burger Place and more than 8 restaurants within a radius of 75 m, the latter as an indication of "trendy" locations. Any of those locations is already a potential candidate for a new Burger restaurant, at least based on nearby competition.

Looking good. What we have now is a clear indication of zones with many restaurants in the vicinity and *no or only 1* Burger restaurant nearby.

Let us now **cluster** those locations to create **centers of zones containing good locations**. Those zones, their centers and addresses will be the final result of our analysis. 

In [35]:
AOIr.head(5)

| | Distance | Name | Category | Latitude | Longitude | X_x | Y_x | Lat | Lon | Address_x | Address_y | Sum of Restaurants | Sum of Burger Places | X_y | Y_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 56 | Parrilla del Pátio | BBQ Joint | -19.9407 | -43.934 | 611555 | -2205280.0 | -19.9412 | -43.9343 | Avenida do Contorno, 6061, São Pedro, Belo Hor... | Avenida do Contorno, 6061, São Pedro, Belo Hor... | 13 | 1 | 611525.207085 | -2205326.0 |
| 1 | 48 | Eddie Fine Burgers | Burger Joint | -19.9408 | -43.934 | 611557 | -2205290.0 | -19.9412 | -43.9343 | Avenida do Contorno, 6061, São Pedro, Belo Hor... | Avenida do Contorno, 6061, São Pedro, Belo Hor... | 13 | 1 | 611525.207085 | -2205326.0 |
| 2 | 50 | Marietta | Salad Place | -19.9408 | -43.934 | 611556 | -2205290.0 | -19.9412 | -43.9343 | Avenida do Contorno, 6061, São Pedro, Belo Hor... | Avenida do Contorno, 6061, São Pedro, Belo Hor... | 13 | 1 | 611525.207085 | -2205326.0 |
| 3 | 6 | Ah! Bon | Café | -19.9411 | -43.9343 | 611528 | -2205320.0 | -19.9412 | -43.9343 | Avenida do Contorno, 6061, São Pedro, Belo Hor... | Avenida do Contorno, 6061, São Pedro, Belo Hor... | 13 | 1 | 611525.207085 | -2205326.0 |
| 4 | 62 | Cappuccini | Café | -19.9406 | -43.9341 | 611548 | -2205270.0 | -19.9412 | -43.9343 | Avenida do Contorno, 6061, São Pedro, Belo Hor... | Avenida do Contorno, 6061, São Pedro, Belo Hor... | 13 | 1 | 611525.207085 | -2205326.0 |


In [36]:
from sklearn.cluster import KMeans

number_of_clusters = 9

good_xys = AOIr[['X_x', 'Y_x']].values  # projected x/y of the restaurants in the areas of interest
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]  # back to (lon, lat)

map_bh5 = folium.Map(location=location, zoom_start=15)
HeatMap(heat_data3).add_to(map_bh5)
for latitude, longitude in zip(AOIb['Latitude'].values, AOIb['Longitude'].values):  # existing Burger places
    folium.Marker([latitude, longitude], popup='Burger Place').add_to(map_bh5)
for lon, lat in cluster_centers:  # cluster zones, 150m radius
    folium.Circle([lat, lon], radius=150, color='green', fill=True, fill_opacity=0.25).add_to(map_bh5)
for lat, lon in zip(AOI['Lat'].values, AOI['Lon'].values):  # areas of interest, 75m radius
    folium.Circle([lat, lon], radius=75, color='blue', fill=False, fill_color='blue', fill_opacity=1).add_to(map_bh5)
map_bh5



**NOTE: The *Heat Map* shows the density of restaurants within the previously selected *Blue* areas (Areas of Interest), while the *Green* areas represent the clusters obtained by running k-means on the locations of those restaurants.**

__________


Not bad: our clusters group most of the candidate locations, and the cluster centers are placed nicely in the middle of the zones 'rich' in location candidates.
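k-means assigns every point to its nearest center, and `kmeans.labels_` in the clustering cell already stores that assignment for the fitted restaurants. To see how many points each zone actually holds, the same nearest-center rule can be written out with NumPy. A minimal sketch: `assign_to_nearest` is a hypothetical helper, and the toy coordinates are made up for illustration.

```python
import numpy as np


def assign_to_nearest(points, centers):
    """Return the index of the nearest center for each point, plus cluster sizes."""
    # Distance from every point to every center, shape (n_points, n_centers)
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # minlength keeps empty clusters in the count
    sizes = np.bincount(labels, minlength=len(centers))
    return labels, sizes


points = np.array([[0.0, 0.0], [10.0, 0.0], [480.0, 20.0]])
centers = np.array([[0.0, 0.0], [500.0, 0.0], [1000.0, 0.0]])
labels, sizes = assign_to_nearest(points, centers)
print(labels, sizes)
```

With the notebook's objects, `assign_to_nearest(good_xys, kmeans.cluster_centers_)` should normally agree with `kmeans.labels_`, and `sizes` shows whether any zone is built from only a handful of restaurants.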

Addresses of those cluster centers will be a good starting point for exploring the neighborhoods to find the best possible location based on neighborhood specifics.

Let's see those zones on a city map without the heat map, using shaded areas to indicate our clusters:

In [37]:
map_bh6 = folium.Map(location=location, zoom_start=15)
folium.Marker(location).add_to(map_bh6)  # Liberty Square
for lat, lon in zip(AOI['Lat'].values, AOI['Lon'].values):  # shaded 75m areas of interest, transparent outline
    folium.Circle([lat, lon], radius=75, color='#00000000', fill=True, fill_color='#0066ff', fill_opacity=0.2).add_to(map_bh6)
for lat, lon in zip(AOI['Lat'].values, AOI['Lon'].values):  # small dot at each area's center
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_bh6)
for lon, lat in cluster_centers:  # cluster zones, 150m radius
    folium.Circle([lat, lon], radius=150, color='green', fill=True, fill_color='green', fill_opacity=0.2).add_to(map_bh6)
map_bh6

Finally, let's **reverse geocode those candidate area centers to get the addresses**, which can be presented to stakeholders.
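The address cell also reports each center's distance from Liberty Square using the notebook's projected coordinates (`lonlat_to_xy` and `calc_xy_distance`, defined earlier). As an independent cross-check that needs no projection, the great-circle (haversine) distance can be computed directly from latitude and longitude. This is a swapped-in technique, not the notebook's method; `haversine_m` is a hypothetical helper, and the spherical Earth radius of 6,371 km is an approximation.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of latitude along a meridian is roughly 111.2 km
print(round(haversine_m(0.0, 0.0, 1.0, 0.0) / 1000, 1))  # → 111.2
```

Inside the address loop, `haversine_m(lat, lon, *location)` would give a comparable figure, assuming `location` is the `[lat, lon]` pair used to center the maps.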

In [38]:
candidate_area_addresses = []
print('==============================================================')
print('Addresses of centers of areas recommended for further analysis')
print('==============================================================\n')
for lon, lat in cluster_centers:
    addr = get_address(lat, lon)  # reverse geocoding helper defined earlier in the notebook
    candidate_area_addresses.append(addr)
    x, y = lonlat_to_xy(lon, lat)
    d = calc_xy_distance(x, y, location_x, location_y)  # distance to Liberty Square in projected coordinates
    print('{}{} => {:.1f}km from Liberty Square'.format(addr, ' '*(50-len(addr)), d/1000))  # pad addresses to 50 characters

Addresses of centers of areas recommended for further analysis

Rua dos Tupis, 337, Centro, Belo Horizonte         => 1.1km from Liberty Square

Avenida do Contorno, 6087, São Pedro, Belo Horizonte => 1.0km from Liberty Square

Rua Alvarenga Peixoto, Lourdes, Belo Horizonte     => 0.7km from Liberty Square

Rua Goiás, 300, Centro, Belo Horizonte             => 0.7km from Liberty Square

Rua Padre Rolim, Santa Efigênia, Belo Horizonte    => 1.2km from Liberty Square

Avenida Augusto de Lima, 744, Centro, Belo Horizonte => 1.2km from Liberty Square

Rua Gonçalves Dias, Santo Agostinho, Belo Horizonte => 1.0km from Liberty Square

Rua Alvarenga Peixoto, Lourdes, Belo Horizonte     => 0.3km from Liberty Square

Rua da Bahia, Centro, Belo Horizonte               => 0.8km from Liberty Square

This concludes our analysis. We have produced 9 addresses representing the centers of zones that contain locations with a high number of restaurants and no more than one Burger restaurant nearby, all of them quite close to Liberty Square (Praça da Liberdade). Although the zones are shown on the map with a radius of ~150 meters (green circles), their shape is actually irregular, and their centers/addresses should be considered only as starting points for exploring these areas in search of potential restaurant locations. Most of the zones are located in areas that are popular with tourists, very close to the city center and well connected by public transport.
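The caveat that the zones are irregular rather than 150m circles can be quantified. A minimal sketch, assuming the projected points, `kmeans.cluster_centers_` and `kmeans.labels_` from the clustering cell; `cluster_extent` is a hypothetical helper that reports, per cluster, the number of members, the farthest member's distance from the center and the share of members lying outside the nominal 150m circle.

```python
import numpy as np


def cluster_extent(points, centers, labels, nominal_radius=150.0):
    """Per cluster: (index, member count, max distance to center, share outside radius)."""
    summary = []
    for k, center in enumerate(centers):
        members = points[labels == k]
        if len(members) == 0:
            # Empty cluster: nothing to measure
            summary.append((k, 0, 0.0, 0.0))
            continue
        d = np.linalg.norm(members - center, axis=1)
        summary.append((k, len(members), float(d.max()), float((d > nominal_radius).mean())))
    return summary


points = np.array([[0.0, 0.0], [100.0, 0.0], [300.0, 0.0], [1000.0, 0.0]])
centers = np.array([[0.0, 0.0], [1000.0, 0.0]])
labels = np.array([0, 0, 0, 1])
for row in cluster_extent(points, centers, labels):
    print(row)
```

Called as `cluster_extent(good_xys, kmeans.cluster_centers_, kmeans.labels_)`, a large maximum distance or a high outside share would flag a zone that its green circle under-represents.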

## Results and Discussion <a name="results"></a>

Our analysis shows that, although there is a great number of restaurants in Belo Horizonte (~800 in our initial area of interest alone, a 3x3km area around Liberty Square), there are pockets of low restaurant density fairly close to the city center. The highest concentration of restaurants was detected north and south-southeast of Liberty Square.

We first created a dense grid of location candidates (spaced 150m apart); those locations were then filtered, removing those with fewer than 8 restaurants or more than 1 Burger restaurant within a 75m radius. High restaurant density was chosen as a criterion because these pockets reflect popularity (and, consequently, general demand) among inhabitants and tourists.
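A regularly spaced candidate grid like the one described above can be generated in a few lines. This sketch uses a plain square grid in projected coordinates (assumed to be meters), covering 3x3km around a center point with 150m spacing; the notebook's own grid generator is not shown in this section and may differ (for example, a staggered layout), and `square_grid` is a hypothetical name.

```python
import numpy as np


def square_grid(center_x, center_y, half_width=1500.0, spacing=150.0):
    """Return an (N, 2) array of grid points centered on (center_x, center_y)."""
    # Offsets from -half_width to +half_width inclusive (spacing / 2 guards against float round-off)
    offsets = np.arange(-half_width, half_width + spacing / 2, spacing)
    gx, gy = np.meshgrid(center_x + offsets, center_y + offsets)
    return np.column_stack([gx.ravel(), gy.ravel()])


grid = square_grid(0.0, 0.0)
print(grid.shape)  # → (441, 2), i.e. 21 x 21 points
```

Assuming `location_x`/`location_y` are Liberty Square's projected coordinates, as their use in the address cell suggests, `square_grid(location_x, location_y)` would center the grid on the square.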

Those location candidates were then clustered to create zones of interest containing the greatest number of location candidates. The addresses of the zone centers were generated using reverse geocoding, to be used as markers/starting points for a more detailed local analysis based on other factors, such as the availability of commercial properties.

The result is a set of 9 zones containing the largest number of potential new restaurant locations, based on the number of and distance to existing venues: both restaurants in general and Burger restaurants specifically. This, of course, does not imply that those zones are actually optimal locations for a new restaurant, as other factors need to be taken into account. The purpose of this analysis was only to provide starting points for a more detailed analysis, which could eventually identify a location that not only has no nearby competition of the same kind but also meets all other relevant conditions.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify areas in Belo Horizonte close to downtown with high demand for restaurants but low or no presence of Burger restaurants, in order to help stakeholders narrow down the search for an optimal location for a new Burger restaurant. By calculating the restaurant density distribution from Foursquare data, we first identified general areas of interest and then generated an extensive collection of locations satisfying some basic requirements regarding existing nearby restaurants. These locations were then clustered to create major zones of interest (containing the greatest number of potential locations), and the addresses of the zone centers were obtained to serve as starting points for final exploration by stakeholders.

The final decision on the optimal restaurant location will be made by stakeholders based on the specific characteristics of the neighborhoods and locations in each recommended zone, taking into consideration additional factors such as the attractiveness of each location, the type of restaurant concentration (shopping malls, squares, or other), real estate availability, prices, and the social and economic dynamics of each neighborhood.