# Japanese Restaurant in Madrid. Where and why?

## Table of contents
* [Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a restaurant. Specifically, this report will be targeted to stakeholders interested in opening an **Japanese restaurant** in **Madrid**, Spain.

Since there are lots of restaurants in Spain we will try to detect **locations that are not already crowded with restaurants**. We are also particularly interested in **areas with no Japanese restaurants in vicinity**. We would also prefer locations **as close to city center as possible**, assuming that first two conditions are met.

We will use our data science powers to generate a few most promissing neighborhoods based on this criteria. Advantages of each area will then be clearly expressed so that best possible final location can be chosen by stakeholders.

## Data <a name="data"></a>

Based on definition of our problem, factors that will influence our decission are:
* number of existing restaurants in the neighborhood (any type of restaurant)
* number of and distance to Italian restaurants in the neighborhood, if any
* distance of neighborhood from city center

We decided to use regularly spaced grid of locations, centered around city center, to define our neighborhoods.

Following data sources will be needed to extract/generate the required information:
* centers of candidate areas will be generated algorithmically and approximate addresses of centers of those areas will be obtained using **Google Maps API reverse geocoding**
* number of restaurants and their type and location in every neighborhood will be obtained using **Foursquare API**
* coordinate of Madrid center will be obtained using **Google Maps API geocoding** of well known Madrid location (Puerta del Sol)

### Cells generation
Let's create latitude & longitude coordinates for centroids of our candidate neighborhoods. We will create a grid of cells covering our area of interest which is aprox. 18x18 killometers centered around Madrid city center. We will get approximiately a full city area coverage.

Let's first find the latitude & longitude of Madrrid city center, using specific, well known address and Google Maps geocoding API.

In [18]:
#api keys
google_api_key = 'AIzaSyDS37LsmR21qrc6QAXlziR0Tkfq2uhb3C4'
client_id = 'EDZVIEP5YCZYBWJNFQHOOOKG4GLNAEMNFHRRU4ASTVJJLYCW' # your Foursquare ID
client_secret = '2DOW12Y13LJ03ZA4GVP4YQLWZQTFMW3GWF3SGA3Z5ESZ50KB' # your Foursquare Secret
version = '20180604'

In [10]:
import requests

def get_coordinates(api_key, address, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&address={}'.format(api_key, address)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        geographical_data = results[0]['geometry']['location'] # get geographical coordinates
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except:
        return [None, None]
    
address = 'Madrid, Spain'
madrid_center = get_coordinates(google_api_key, address, True)
print('Coordinate of {}: {}'.format(address, madrid_center))

Google Maps API JSON result => {'results': [{'address_components': [{'long_name': 'Madrid', 'short_name': 'Madrid', 'types': ['locality', 'political']}, {'long_name': 'Madrid', 'short_name': 'M', 'types': ['administrative_area_level_2', 'political']}, {'long_name': 'Community of Madrid', 'short_name': 'Community of Madrid', 'types': ['administrative_area_level_1', 'political']}, {'long_name': 'Spain', 'short_name': 'ES', 'types': ['country', 'political']}], 'formatted_address': 'Madrid, Spain', 'geometry': {'bounds': {'northeast': {'lat': 40.5638447, 'lng': -3.5249115}, 'southwest': {'lat': 40.3120639, 'lng': -3.8341618}}, 'location': {'lat': 40.4167754, 'lng': -3.7037902}, 'location_type': 'APPROXIMATE', 'viewport': {'northeast': {'lat': 40.5638447, 'lng': -3.5249115}, 'southwest': {'lat': 40.3120639, 'lng': -3.8341618}}}, 'place_id': 'ChIJgTwKgJcpQg0RaSKMYcHeNsQ', 'types': ['locality', 'political']}], 'status': 'OK'}
Coordinate of Madrid, Spain: [40.4167754, -3.7037902]


In [11]:
#!pip install shapely
import shapely.geometry

#!pip install pyproj
import pyproj

import math

z = 30
_projections = {}

#this is a functions to convert latlon values to Cartesian and reverse.

def project(lo, la): #latlon to xy
    z = 30
    if z not in _projections:
        _projections[z] = pyproj.Proj(proj='utm', zone=z, ellps='WGS84')
    x, y = _projections[z](lo, la)
    if y < 0:
        y += 10000000
    return x, y

def unproject(x, y): #xy to latlon
    z = 30
    if z not in _projections:
        _projections[z] = pyproj.Proj(proj='utm', zone=z, ellps='WGS84')
    lng, lat = _projections[z](x, y, inverse=True)
    return (lng, lat)

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Madrid center longitude={}, latitude={}'.format(madrid_center[1], madrid_center[0]))
x, y = project(madrid_center[1], madrid_center[0])
print('Madrid center UTM X={}, Y={}'.format(x, y))
lo, la = unproject(x, y)
print('Madrid center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Madrid center longitude=-3.7037902, latitude=40.4167754
Madrid center UTM X=440291.2677340498, Y=4474254.644794532
Madrid center longitude=-3.7037902, latitude=40.416775400000006


#### Grid generation and visualisation

In [8]:
madrid_center_x, madrid_center_y = project(madrid_center[1], madrid_center[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = madrid_center_x - 9000
x_step = 600
y_min = madrid_center_y - 9000 - (int(31/k)*k*600 - 18000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(31/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 31):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(madrid_center_x, madrid_center_y, x, y)
        if (distance_from_center <= 9001):
            lon, lat = unproject(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')


813 candidate neighborhood centers generated.


In [5]:
import folium

In [12]:
map_madrid = folium.Map(location=madrid_center, zoom_start=13)
folium.Marker(madrid_center, popup='Puerta del Sol').add_to(map_madrid)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=300, color='green', fill=False).add_to(map_madrid)
map_madrid

In [13]:
def get_address(api_key, latitude, longitude, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except:
        return None

addr = get_address(google_api_key, madrid_center[0], madrid_center[1])
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(madrid_center[0], madrid_center[1], addr))

Reverse geocoding check
-----------------------
Address of [40.4167754, -3.7037902] is: Puerta del Sol, 11, 28013 Madrid, Spain


Let's now use Google Maps API to get approximate addresses of cells centres

In [14]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(google_api_key, lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    address = address.replace(', Spain', '') # We don't need country part of address
    addresses.append(address)
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

In [15]:
addresses[150:170] #example list of centres addreses

['Paseo de Alabarderos, 49, 28024 Madrid',
 'Carr. de Extremadura, Madrid',
 'Calle Mirabel, 4, 28044 Madrid',
 'Calle de Soledad Cazorla, 14, 28044 Madrid',
 'Calle de Gando, 7, 28044 Madrid',
 'Unnamed Road, 28044 Madrid',
 'Av. de los Poblados, 78, 28044 Madrid',
 'Calle Federico Grases, 36, 28025 Madrid',
 'Calle de Belzunegui, 3, 28025 Madrid',
 'Plaza Rendición de Breda, 7, 28025 Madrid',
 'Parque Emperatriz María de Austria, Via Lusitana, 3, 28025 Madrid',
 'Calle Arenaria, 7, 28026 Madrid',
 'Unnamed Road, 28026 Madrid',
 'Unnamed Road, 28026 Madrid',
 'Hospital 12 Octubre, 28041 Madrid',
 'Cmo. de Perales, 90, 28041 Madrid',
 'Embajadores - Viveros Raga, 28053 Madrid',
 'Av. Sta. Catalina, 12, 28053 Madrid',
 'Calle de la Serena, 31, 28053 Madrid',
 'Av. de Entrevías, 128, 28053 Madrid']

In [16]:
import pandas as pd

df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head(10)

Unnamed: 0,Address,Latitude,Longitude,X,Y,Distance from center
0,"Av. gran bretaña, S/N, 28916 Leganés, Madrid",40.33914,-3.73125,437890.32259,4465656.0,8903.229751
1,"A-42, 28021 Madrid",40.339184,-3.724187,438490.32259,4465656.0,8760.56505
2,"Calle San Mames, 48, 28021 Madrid",40.339228,-3.717123,439090.32259,4465656.0,8657.222418
3,"Av. Real de Pinto, 106, 28021 Madrid",40.339272,-3.710059,439690.32259,4465656.0,8594.62041
4,"Calle San Norberto, 21, 28021 Madrid",40.339315,-3.702996,440290.32259,4465656.0,8573.651497
5,"Av. de Andalucía, 38, 28021 Madrid",40.339357,-3.695932,440890.32259,4465656.0,8594.62041
6,"Calle de Godella, 205, 28021 Madrid",40.3394,-3.688868,441490.32259,4465656.0,8657.222418
7,"Calle Arroyo de la Bulera, 4, 28021 Madrid",40.339442,-3.681805,442090.32259,4465656.0,8760.56505
8,"Calle de Berrocal, 78, 28021 Madrid",40.339483,-3.674741,442690.32259,4465656.0,8903.229751
9,"Calle Clara Janés, 7, 28919 Leganés, Madrid",40.343708,-3.748961,436390.32259,4466176.0,8948.603243


In [17]:
df_locations.to_pickle('./madrid_locations.pkl') #save to pickle

### Foursquare
Now that we have our location candidates, let's use Foursquare API to get info on restaurants in each cell

We're only interested in restaraunts, so we filter it from all other venues like bakeries, etc.

In [19]:
# Category IDs corresponding to Japanese restaurants were taken from Foursquare web site

food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

japanese_restaurant_categories = ['4bf58dd8d48988d111941735', '55a59bace4b013909087cb0c', '55a59bace4b013909087cb30',
                                '55a59bace4b013909087cb21', '55a59bace4b013909087cb06', '55a59bace4b013909087cb1b', 
                                '55a59bace4b013909087cb1e', '55a59bace4b013909087cb18', '55a59bace4b013909087cb24',
                                '55a59bace4b013909087cb15', '55a59bace4b013909087cb27', '55a59bace4b013909087cb12',
                                '4bf58dd8d48988d1d2941735', '55a59bace4b013909087cb2d', '55a59a31e4b013909087cb00',
                                '55a59af1e4b013909087cb03', '55a59bace4b013909087cb2a', '55a59bace4b013909087cb0f',
                                '55a59bace4b013909087cb33', '55a59bace4b013909087cb09', '55a59bace4b013909087cb36'] #special Japanese categories

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Spain', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except:
        venues = []
    return venues

In [22]:
# Let's now go over our neighborhood locations and get nearby restaurants; we'll also maintain a dictionary of all found restaurants and all found japanese restaurants

import pickle

def get_restaurants(lats, lons):
    restaurants = {}
    japanese_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=350 to make sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, client_id, client_secret, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_japanese = is_restaurant(venue_categories, specific_filter=japanese_restaurant_categories)
            if is_res:
                x, y = project(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_japanese, x, y)
                if venue_distance<=300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_japanese:
                    japanese_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, japanese_restaurants, location_restaurants

# Try to load from local file system in case we did this before
restaurants = {}
japanese_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_350.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('japanese_restaurants_350.pkl', 'rb') as f:
        japanese_restaurants = pickle.load(f)
    with open('location_restaurants_350.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
except:
    pass

# If load failed use the Foursquare API to get the data
if not loaded:
    restaurants, japanese_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)
    
    # Let's persists this in local file system
    with open('restaurants_350.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('japanese_restaurants_350.pkl', 'wb') as f:
        pickle.dump(japanese_restaurants, f)
    with open('location_restaurants_350.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)
        

Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

In [23]:
import numpy as np

print('Total number of restaurants:', len(restaurants))
print('Total number of Italian restaurants:', len(japanese_restaurants))
print('Percentage of Italian restaurants: {:.2f}%'.format(len(japanese_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 3311
Total number of Italian restaurants: 135
Percentage of Italian restaurants: 4.08%
Average number of restaurants in neighborhood: 3.5043050430504303


In [24]:
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(restaurants))

List of all restaurants
-----------------------
('4fdcf2f8e4b09d4fcc55df81', 'La Tagliatella', 40.33892070896567, -3.73189638758068, 'C.C. Parquesur (Av. Gran Bretaña, s/n), 28916 Leganés Madrid, España', 60, False, 437835.2337973132, 4465632.225239797)
('57894d4c498e877018cc45f6', 'La Martínez', 40.33923, -3.733078, 'cc. parquesur, 28916 Leganés Madrid, España', 155, False, 437735.1551427487, 4465667.3865855)
('5367b732498e2b4b283b9d25', 'Tommy Mel´s', 40.33909675923206, -3.7336032612302232, 'C.C. Parquesur, 28916 Leganés Madrid, España', 199, False, 437690.4184776894, 4465652.966746954)
('56f31421498e0d5be517deb4', 'Sushita Café', 40.338961, -3.732143, 'Centro comercial Parquesur, Leganés Madrid, España', 78, True, 437814.32419206924, 4465636.870713104)
('59d0d94641868645c691450f', 'wagamama', 40.339737, -3.732563, 'C.C. Parquesur (Av. Gran Bretaña, s/n), 28916 Leganés Madrid, España', 129, False, 437779.36345246225, 4465723.3005184475)
('4c0408c5187ec9288a7bb67b', 'Tortillería Libra

In [25]:
print('List of Japanese restaurants')
print('---------------------------')
for r in list(japanese_restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(japanese_restaurants))

List of Japanese restaurants
---------------------------
('56f31421498e0d5be517deb4', 'Sushita Café', 40.338961, -3.732143, 'Centro comercial Parquesur, Leganés Madrid, España', 78, True, 437814.32419206924, 4465636.870713104)
('55f840e5498ea00d22f3a986', 'UDON', 40.33854663534535, -3.733334541320801, 'C.C. Parquesur, 28916 Leganés Madrid, España', 188, True, 437712.7368406997, 4465591.714826652)
('55f09cd7498e0fbe29d04ca3', 'Udon LEGANES', 40.338665245265524, -3.733500002262125, 'Avenida De Gibraltar, 28915 Leganés Madrid, España', 198, True, 437698.792057944, 4465604.99677165)
('4fe7a282e4b0c7d6bbb4d9a8', 'Nureta Neko', 40.377824, -3.797892, 'España', 343, True, 432268.6798893281, 4469998.88947353)
('4beb181761aca593b2998400', 'Japan Pearl', 40.3921000724847, -3.6924150587421485, 'C. Zinc (C. Bolívar, 8), 28045 Madrid Madrid, España', 143, True, 441234.8732545466, 4471508.0765432445)
('5717a252498e2dc51cd33d16', 'Sakura', 40.396072, -3.712003, 'Calle Antonio López, 35. (Pasaje Montse

In [27]:
print('Restaurants around location')
print('---------------------------')
for i in range(160, 170):
    rs = location_restaurants[i][:8]
    names = ', '.join([r[1] for r in rs])
    print('Restaurants around location {}: {}'.format(i+1, names))

Restaurants around location
---------------------------
Restaurants around location 161: 
Restaurants around location 162: 
Restaurants around location 163: La terraza del campo deRugby
Restaurants around location 164: 
Restaurants around location 165: Cafetería Vértice, Cafeteria Autoservicio 12 de Octubre (ext.), Cafeteria Scat
Restaurants around location 166: Kristin Kebap
Restaurants around location 167: Korynto
Restaurants around location 168: 
Restaurants around location 169: 
Restaurants around location 170: 


Let's plot all restaraunts on Madrid map as blue circles, and Japanese restaraunts as red circles

In [31]:
map_madrid = folium.Map(location=madrid_center, zoom_start=12)
folium.Marker(madrid_center, popup='Puerta del Sol').add_to(map_madrid)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_japanese = res[6]
    color = 'red' if is_japanese else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_madrid)
map_madrid

## Methodology <a name="methodology"></a>

In first step we have collected the required **data: location and type (category) of every restaurant within 9km from Madrid center** (Puerta del Sol). We have also **identified Japanese restaurants** (according to Foursquare categorization).

Second step in our analysis will be calculation and exploration of '**restaurant density**' across different areas of Madrid - we will use **heatmaps** to identify a few promising areas close to center with low number of restaurants in general (*and* no Japanese restaurants in vicinity) and focus our attention on those areas.

In third and final step we will focus on most promising areas and within those create **clusters of locations that meet some basic requirements** established in discussion with stakeholders: we will take into consideration locations with **no more than two restaurants in radius of 250 meters**, and we want locations **without Japanese restaurants in radius of 400 meters**. We will present map of all such locations but also create clusters (using **k-means clustering**) of those locations to identify general zones / neighborhoods / addresses which should be a starting point for final 'street level' exploration and search for optimal venue location by stakeholders.

## Analysis <a name="analysis"></a>

Let's perform some basic explanatory data analysis and derive some additional info from our raw data. First let's count the **number of restaurants in every area candidate**:

In [32]:
location_restaurants_count = [len(res) for res in location_restaurants]

df_locations['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=300m:', np.array(location_restaurants_count).mean())

df_locations.head(10)

Average number of restaurants in every area with radius=300m: 3.5043050430504303


Unnamed: 0,Address,Latitude,Longitude,X,Y,Distance from center,Restaurants in area
0,"Av. gran bretaña, S/N, 28916 Leganés, Madrid",40.33914,-3.73125,437890.32259,4465656.0,8903.229751,21
1,"A-42, 28021 Madrid",40.339184,-3.724187,438490.32259,4465656.0,8760.56505,0
2,"Calle San Mames, 48, 28021 Madrid",40.339228,-3.717123,439090.32259,4465656.0,8657.222418,2
3,"Av. Real de Pinto, 106, 28021 Madrid",40.339272,-3.710059,439690.32259,4465656.0,8594.62041,1
4,"Calle San Norberto, 21, 28021 Madrid",40.339315,-3.702996,440290.32259,4465656.0,8573.651497,1
5,"Av. de Andalucía, 38, 28021 Madrid",40.339357,-3.695932,440890.32259,4465656.0,8594.62041,0
6,"Calle de Godella, 205, 28021 Madrid",40.3394,-3.688868,441490.32259,4465656.0,8657.222418,1
7,"Calle Arroyo de la Bulera, 4, 28021 Madrid",40.339442,-3.681805,442090.32259,4465656.0,8760.56505,0
8,"Calle de Berrocal, 78, 28021 Madrid",40.339483,-3.674741,442690.32259,4465656.0,8903.229751,0
9,"Calle Clara Janés, 7, 28919 Leganés, Madrid",40.343708,-3.748961,436390.32259,4466176.0,8948.603243,3


OK, now let's calculate the **distance to nearest Japanese restaurant from every area candidate center** (not only those within 300m - we want distance to closest one, regardless of how distant it is).

In [33]:
distances_to_japanese_restaurant = []

for area_x, area_y in zip(xs, ys):
    min_distance = 10000
    for res in japanese_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_japanese_restaurant.append(min_distance)

df_locations['Distance to Japanese restaurant'] = distances_to_japanese_restaurant

In [34]:
df_locations.head(10)

Unnamed: 0,Address,Latitude,Longitude,X,Y,Distance from center,Restaurants in area,Distance to Japanese restaurant
0,"Av. gran bretaña, S/N, 28916 Leganés, Madrid",40.33914,-3.73125,437890.32259,4465656.0,8903.229751,21,78.385717
1,"A-42, 28021 Madrid",40.339184,-3.724187,438490.32259,4465656.0,8760.56505,0,676.270951
2,"Calle San Mames, 48, 28021 Madrid",40.339228,-3.717123,439090.32259,4465656.0,8657.222418,2,1276.142812
3,"Av. Real de Pinto, 106, 28021 Madrid",40.339272,-3.710059,439690.32259,4465656.0,8594.62041,1,1876.096627
4,"Calle San Norberto, 21, 28021 Madrid",40.339315,-3.702996,440290.32259,4465656.0,8573.651497,1,2476.072825
5,"Av. de Andalucía, 38, 28021 Madrid",40.339357,-3.695932,440890.32259,4465656.0,8594.62041,0,3076.058307
6,"Calle de Godella, 205, 28021 Madrid",40.3394,-3.688868,441490.32259,4465656.0,8657.222418,1,3676.048529
7,"Calle Arroyo de la Bulera, 4, 28021 Madrid",40.339442,-3.681805,442090.32259,4465656.0,8760.56505,0,4276.041495
8,"Calle de Berrocal, 78, 28021 Madrid",40.339483,-3.674741,442690.32259,4465656.0,8903.229751,0,4876.036192
9,"Calle Clara Janés, 7, 28919 Leganés, Madrid",40.343708,-3.748961,436390.32259,4466176.0,8948.603243,3,1427.507


In [35]:
print('Average distance to closest Japanese restaurant from each area center:', df_locations['Distance to Japanese restaurant'].mean())

Average distance to closest Japanese restaurant from each area center: 1745.7611141827158


OK, so **on average Japanese restaurant can be found within ~2km** from every area center candidate. That's very good result!

In [36]:
#this is a Madrid borough borders map

madrid_boroughs_url = 'https://raw.githubusercontent.com/codeforamerica/click_that_hood/master/public/data/madrid-districts.geojson'
madrid_boroughs = requests.get(madrid_boroughs_url).json()

def boroughs_style(feature):
    return { 'color': 'blue', 'fill': False }

In [37]:
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]

japanese_latlons = [[res[2], res[3]] for res in japanese_restaurants.values()]

In [39]:
from folium import plugins
from folium.plugins import HeatMap

map_madrid = folium.Map(location=madrid_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_madrid) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_madrid)
folium.Marker(madrid_center).add_to(map_madrid)
folium.Circle(madrid_center, radius=1000, fill=False, color='white').add_to(map_madrid)
folium.Circle(madrid_center, radius=2000, fill=False, color='white').add_to(map_madrid)
folium.Circle(madrid_center, radius=3000, fill=False, color='white').add_to(map_madrid)
folium.GeoJson(madrid_boroughs, style_function=boroughs_style, name='geojson').add_to(map_madrid)
map_madrid

Looks like a completely dense map (park zones at west and east are not appropriate candidates for restaraunt location)

Let's create another heatmap map showing **heatmap/density of Japanese restaurants** only.

In [40]:
map_madrid = folium.Map(location=madrid_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_madrid) #cartodbpositron cartodbdark_matter
HeatMap(japanese_latlons).add_to(map_madrid)
folium.Marker(madrid_center).add_to(map_madrid)
folium.Circle(madrid_center, radius=1000, fill=False, color='white').add_to(map_madrid)
folium.Circle(madrid_center, radius=2000, fill=False, color='white').add_to(map_madrid)
folium.Circle(madrid_center, radius=3000, fill=False, color='white').add_to(map_madrid)
folium.GeoJson(madrid_boroughs, style_function=boroughs_style, name='geojson').add_to(map_madrid)
map_madrid

#malasana, x -500, y +1500 
#arganzuela x -500, y -1500

This map is not so 'hot' (Japanese restaurants represent a subset of ~5% of all restaurants in Madrid) but it also indicates higher density of existing Japanese restaurants directly north and west from Puerta del Sol, and absence of the such type of restaraunts at south.

Based on this we will now focus our analysis on areas *south-west, south from Madrid center* - we will move the center of our area of interest and reduce it's size to have a radius of **1.5km**. This places our location candidates mostly in boroughs **Arganzuela, Carabanchel and Usera** 

### Arganzuela, Carabanchel and Usera

These boroughs are located not so far from city center.
Some words about these districts:

**Arganzuela:** 'Located just south of the centre on the banks of Madrid’s Manzanares River, Arganzuela is the best of both worlds: an easy walk into central Madrid and far enough away to have a local atmosphere and none of the crowds. It is also home to some great attractions including Madrid Rio park, a huge renovation of the river banks that was completed in 2011. It includes play parks, kiosks and terraces, football pitches and lots of space to walk, cycle or rollerblade. Another key sight is the Matadero, Madrid’s former slaughterhouse that is now a thriving cultural space with regular exhibitions, markets and its own cinema.'

**Usera:** 'Known as Madrid’s Chinatown, Usera is home to much of the city’s Chinese community and is – unsurprisingly – where the best and most authentic Chinese restaurants can be found. It is also becoming one of Madrid’s most trendy areas for its green spaces and reasonable rents. Located just south of the River Manzanares, a new riverside shopping centre, Plaza Río 2 has also helped attract more attention to the area. In 2017, Airbnb named the area one of the “17 neighbourhoods to watch in 2017”. Usera is also home to the Manzanares Linear Park, a riverside park with a manmade hill topped by the impressive sculpture La Dama del Manzanares, by Valencian artist Manolo Valdés.'

**Carabanchel:** 'Another neighbourhood just to the south of the River Manzanares, Carabanchel was first mentioned in historical documents in the 12th century. The area is known for its green spaces, especially San Isidro Park, the epicentre of the week-long Festival of San Isidro in May, dedicated to Madrid’s patron saint.'

Material from site https://theculturetrip.com/europe/spain/articles/a-guide-to-madrids-most-up-and-coming-neighbourhoods/

In [53]:
roi_x_min = madrid_center_x - 2500
roi_y_max = madrid_center_y + 500
roi_width = 4000
roi_height = 4000
roi_center_x = roi_x_min + 2500
roi_center_y = roi_y_max - 2500
roi_center_lon, roi_center_lat = unproject(roi_center_x, roi_center_y)
roi_center = [roi_center_lat, roi_center_lon]

map_madrid = folium.Map(location=roi_center, zoom_start=14)
HeatMap(restaurant_latlons).add_to(map_madrid)
folium.Marker(madrid_center).add_to(map_madrid)
folium.Circle(roi_center, radius=1500, color='white', fill=True, fill_opacity=0.4).add_to(map_madrid)
folium.GeoJson(madrid_boroughs, style_function=boroughs_style, name='geojson').add_to(map_madrid)
map_madrid

Let's also create new, more dense grid of location candidates restricted to our new region of interest (let's make our location candidates 100m appart).

In [62]:
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 100
y_step = 100 * k 
roi_y_min = roi_center_y - 1500

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(31/k)):
    y = roi_y_min + i * y_step
    x_offset = 50 if i%2==0 else 0
    for j in range(0, 31):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)
        if (d <= 1501):
            lon, lat = unproject(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')

605 candidate neighborhood centers generated.


Now let's calculate two most important things for each location candidate: **number of restaurants in vicinity** (we'll use radius of **250 meters**) and **distance to closest Japanese restaurant**.

In [63]:
def count_restaurants_nearby(x, y, restaurants, radius=250):    
    count = 0
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=radius:
            count += 1
    return count

def find_nearest_restaurant(x, y, restaurants):
    d_min = 100000
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=d_min:
            d_min = d
    return d_min

roi_restaurant_counts = []
roi_japanese_distances = []

print('Generating data on location candidates... ', end='')
for x, y in zip(roi_xs, roi_ys):
    count = count_restaurants_nearby(x, y, restaurants, radius=250)
    roi_restaurant_counts.append(count)
    distance = find_nearest_restaurant(x, y, japanese_restaurants)
    roi_japanese_distances.append(distance)
print('done.')


Generating data on location candidates... done.


In [64]:
df_roi_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Restaurants nearby':roi_restaurant_counts,
                                 'Distance to Japanese restaurant':roi_japanese_distances})



Unnamed: 0,Latitude,Longitude,X,Y,Restaurants nearby,Distance to Japanese restaurant
0,40.385017,-3.70406,440240.32259,4470730.0,7,1262.921046
1,40.385025,-3.702882,440340.32259,4470730.0,5,1185.773771
2,40.385765,-3.709369,439790.32259,4470816.0,1,1165.712485
3,40.385772,-3.708191,439890.32259,4470816.0,2,1188.172128
4,40.38578,-3.707013,439990.32259,4470816.0,3,1218.450005
5,40.385787,-3.705835,440090.32259,4470816.0,3,1255.980821
6,40.385794,-3.704657,440190.32259,4470816.0,4,1252.840519
7,40.385801,-3.703479,440290.32259,4470816.0,3,1170.768651
8,40.385808,-3.702301,440390.32259,4470816.0,4,1091.690937
9,40.385816,-3.701123,440490.32259,4470816.0,4,1016.306533


In [67]:
df_roi_locations.describe()

Unnamed: 0,Latitude,Longitude,X,Y,Restaurants nearby,Distance to Japanese restaurant
count,605.0,605.0,605.0,605.0,605.0,605.0
mean,40.398486,-3.707641,439948.339119,4472227.0,5.905785,554.623805
std,0.007068,0.006473,549.259917,784.5898,5.443513,260.978565
min,40.385017,-3.721283,438790.32259,4470730.0,0.0,20.23202
25%,40.392769,-3.712953,439490.32259,4471596.0,2.0,352.470681
50%,40.398291,-3.707225,439990.32259,4472202.0,4.0,545.724059
75%,40.40446,-3.702301,440390.32259,4472895.0,8.0,741.179847
max,40.411572,-3.697007,440840.32259,4473674.0,38.0,1262.921046


In [71]:
good_res_count = np.array((df_roi_locations['Restaurants nearby']<=2))
print('Locations with no more than two restaurants nearby:', good_res_count.sum())

good_ita_distance = np.array(df_roi_locations['Distance to Japanese restaurant']>=400)
print('Locations with no Japanese restaurants within 400m:', good_ita_distance.sum())

good_locations = np.logical_and(good_res_count, good_ita_distance)
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = df_roi_locations[good_locations]


Locations with no more than two restaurants nearby: 187
Locations with no Japanese restaurants within 400m: 419
Locations with both conditions met: 171


In [72]:
good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_madrid = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_madrid)
HeatMap(restaurant_latlons).add_to(map_madrid)
folium.Circle(roi_center, radius=1500, color='white', fill=True, fill_opacity=0.6).add_to(map_madrid)
folium.Marker(madrid_center).add_to(map_madrid)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_madrid) 
folium.GeoJson(madrid_boroughs, style_function=boroughs_style, name='geojson').add_to(map_madrid)
map_madrid

We now have a bunch of locations fairly close to Puerta del Sol, and we know that each of those locations has no more than two restaurants in radius of 250m, and no Japanese restaurant closer than 400m.
Let's now show those good locations in a form of heatmap:

In [73]:
map_madrid = folium.Map(location=roi_center, zoom_start=14)
HeatMap(good_locations, radius=25).add_to(map_madrid)
folium.Marker(madrid_center).add_to(map_madrid)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_madrid)
folium.GeoJson(madrid_boroughs, style_function=boroughs_style, name='geojson').add_to(map_madrid)
map_madrid

Let us now **cluster** those locations to create **centers of zones containing good locations**. Those zones, their centers and addresses will be the final result of our analysis. 

In [75]:
from sklearn.cluster import KMeans

number_of_clusters = 15

good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [unproject(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

map_madrid = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_madrid)
HeatMap(restaurant_latlons).add_to(map_madrid)
folium.Circle(roi_center, radius=1500, color='white', fill=True, fill_opacity=0.4).add_to(map_madrid)
folium.Marker(madrid_center).add_to(map_madrid)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=300, color='green', fill=True, fill_opacity=0.25).add_to(map_madrid) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_madrid)
folium.GeoJson(madrid_boroughs, style_function=boroughs_style, name='geojson').add_to(map_madrid)
map_madrid

In [77]:
map_madrid = folium.Map(location=roi_center, zoom_start=14)
folium.Marker(madrid_center).add_to(map_madrid)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#00000000', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_madrid)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_madrid)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=300, color='green', fill=False).add_to(map_madrid) 
folium.GeoJson(madrid_boroughs, style_function=boroughs_style, name='geojson').add_to(map_madrid)
map_madrid

Finally, let's **reverse geocode those candidate area centers to get the addresses** which can be presented to stakeholders.

In [79]:
candidate_area_addresses = []
print('==============================================================')
print('Addresses of centers of areas recommended for further analysis')
print('==============================================================\n')
for lon, lat in cluster_centers:
    addr = get_address(google_api_key, lat, lon).replace(', Spain', '')
    candidate_area_addresses.append(addr)    
    x, y = project(lon, lat)
    d = calc_xy_distance(x, y, madrid_center_x, madrid_center_y)
    print('{}{} => {:.1f}km from Puerta del Sol'.format(addr, ' '*(50-len(addr)), d/1000))
    

Addresses of centers of areas recommended for further analysis

Calle de la Verdad, 20, 28019 Madrid               => 2.9km from Puerta del Sol
Calle Torero, 6, 28026 Madrid                      => 2.7km from Puerta del Sol
Paseo de la Esperanza, 21, 28005 Madrid            => 1.8km from Puerta del Sol
Calle del Dr. Carmena Ruiz, 1, 28026 Madrid        => 2.9km from Puerta del Sol
Autopista de Circunvalación M-30, 28019 Madrid     => 2.1km from Puerta del Sol
Calle de Gil Imón, 1, 28005 Madrid                 => 1.4km from Puerta del Sol
Pasarela de la Princesa, Unnamed Road, 28026 Madrid => 3.2km from Puerta del Sol
Calle San Nicomedes, 4, 28026 Madrid               => 3.2km from Puerta del Sol
Paseo del Quince de Mayo, 28, 28019 Madrid         => 2.4km from Puerta del Sol
Paseo de la Chopera, 2, 28045 Madrid               => 2.4km from Puerta del Sol
Calle Mirasierra, 32, 28026 Madrid                 => 3.1km from Puerta del Sol
Calle de Jacinto Verdaguer, 36, 28019 Madrid       => 2

This concludes our analysis. We have created 15 addresses representing centers of zones containing locations with low number of restaurants and no Japanese restaurants nearby, all zones being fairly close to city center (all less than ~3km from Puerta del Sol, and about half of those less than 2km from Alexanderplatz). Most of zones are located near borders of three boroughs (Arganzuela, Carabanchel and Usera).

## Results and Discussion <a name="results"></a>

Our analysis shows that although there is a great number of restaurants in Madrid (~3000 in our initial area of interest which was 18x18km around Puerta del Sol), there are aboslutely no pockets of low restaurant density fairly close to city center.  Our attention was brought by near absence of Japanese restaraunt south to the center.

After directing our attention to this more narrow area of interest (covering approx. 3x3km south from Puerta del Sol) we first created a dense grid of location candidates (spaced 100m appart); those locations were then filtered so that those with more than two restaurants in radius of 250m and those with an Japanese restaurant closer than 400m were removed.

Those location candidates were then clustered to create zones of interest which contain greatest number of location candidates. Addresses of centers of those zones were also generated using reverse geocoding to be used as markers/starting points for more detailed local analysis based on other factors.

Result of all this is 15 zones containing largest number of potential new restaurant locations based on number of and distance to existing venues - both restaurants in general and Japanese restaurants particularly. This, of course, does not imply that those zones are actually optimal locations for a new restaurant! Some of addresses are near the highways - I'm sure this is not the best place for a restaurant.

## Conclusion <a name="conclusion"></a>

Purpose of this project was to identify Madrid areas close to center with low number of restaurants (particularly Japanese restaurants) in order to aid stakeholders in narrowing down the search for optimal location for a new Japanese restaurant. By calculating restaurant density distribution from Foursquare data we have first identified general boroughs that justify further analysis (Arganzuela, Usera and Carabanchel), and then generated extensive collection of locations which satisfy some basic requirements regarding existing nearby restaurants. Clustering of those locations was then performed in order to create major zones of interest (containing greatest number of potential locations) and addresses of those zone centers were created to be used as starting points for final exploration by stakeholders.

Final decission on optimal restaurant location will be made by stakeholders based on specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional factors like attractiveness of each location (proximity to park or water), levels of noise / proximity to major roads, real estate availability, prices, social and economic dynamics of every neighborhood etc.

## Acknowledgments

I really appreciate the Coursera example notebook! Many thanks to you!
