# Capstone Project - The Battle of Neighborhoods
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

A big US company for fitness food wants to open up a flagshipstore in Germany.<br>
They have already choosen Frankfurt (am Main) as the target city, because Frankfurt is one of the biggest city in central Germany and the one hot spot for fitness influencer and bodybuilding.
In this project we will try to find an optimal location for the flagship store. Specifically, this report will be targeted to stakeholders of the US company.
<br>
From their experience in other countries the company identified the following location based criteria in order to garantee the success of the flagshipstore.
* There should be many fitness centers or sports clubs in the city, which indicates that there are a lot of customers living in the city.
* There should not be many competitors in the city which also offer health menues.


There are a lot of fitness centers and sport clubs in the city so we will try to identify locations that have a very high amount of restaurants. In addtion we are particularly interested in areas with no asian restaurants in vicinity. We would also prefer locations as close to city center as possible, assuming that first two conditions are met.
<br>
In order to be able to make a reasonable recomandation more detailed information about the city environments are needed. 
In order to fullfill customer criteria location data derived from foursquare will be used to give an educacted suggestion in which location the company should open up their flagshipstore. Advantages of each area will then be clearly expressed so that best possible final location can be chosen by stakeholders.
<br>

## Data <a name="data"></a>

Based on definition of our problem, factors that will influence our decission are:
* number of existing asian restaurants in the neighborhood
* number of and distance to fitness and sports clubs in the neighborhood, if any
* distance of neighborhood from city center

We decided to use regularly spaced grid of locations, centered around city center, to define our neighborhoods.

Following data sources will be needed to extract/generate the required information:
* centers of candidate areas will be generated algorithmically and approximate addresses of centers of those areas will be obtained using **Nominatim API reverse geocoding**
* number of restaurants and their type and location in every neighborhood will be obtained using **Foursquare API**
* coordinate of Berlin center will be obtained using **Nominatim API geocoding** of well known Berlin location (Alexanderplatz)

### Neighborhood Candidates

Let's create latitude & longitude coordinates for centroids of our candidate neighborhoods. We will create a grid of cells covering our area of interest which is aprox. 12x12 killometers centered around Frankfurt city center.

Let's first find the latitude & longitude of the Frankfurt city center, using specific, well known address and Nominatim API geocoding.
The specific address we are looking for is a street called "Freßgass" (Hochstraße 43, 60313 Frankfurt).

In [3]:
!pip install geopy
from geopy.geocoders import Nominatim

In [4]:
address = 'Hochstraße 43, Frankfurt, Germany'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
center = [location.latitude,location.longitude]

print('Coordinates of {}: {},{}'.format(address,center[0],center[1]))

Coordinates of Hochstraße 43, Frankfurt, Germany: 50.1156658,8.67436999298753


Now let's create a grid of area candidates, equaly spaced, centered around city center and within ~6km from "Freßgass". Our neighborhoods will be defined as circular areas with a radius of 300 meters, so our neighborhood centers will be 600 meters apart.

To accurately calculate distances we need to create our grid of locations in Cartesian 2D coordinate system which allows us to calculate distances in meters (not in latitude/longitude degrees). Then we'll project those coordinates back to latitude/longitude degrees to be shown on Folium map. So let's create functions to convert between WGS84 spherical coordinate system (latitude/longitude degrees) and UTM Cartesian coordinate system (X/Y coordinates in  meters).

In [6]:
#!pip install shapely
import shapely.geometry

In [7]:
#!pip install pyproj
import pyproj

In [8]:
import math

In [9]:
def lonlat_to_xy(lon,lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm",zone=33,datum='WGS84')
    xy = pyproj.transform(proj_latlon,proj_xy,lon,lat)
    return xy[0],xy[1]

def xy_to_lonlat(x,y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm",zone=33,datum='WGS84')
    lonlat = pyproj.transform(proj_xy,proj_latlon,x,y)
    return lonlat[0],lonlat[1]

def calc_xy_distance(x1,y1,x2,y2):
    dx = x2-x1
    dy = y2-y1
    return math.sqrt(dx*dx+dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('center longitude={}, latitude={}'.format(center[1],center[0]))
x, y = lonlat_to_xy(center[1],center[0])
print('center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
center longitude=8.67436999298753, latitude=50.1156658
center UTM X=47913.1032108044, Y=5570676.413829651
center longitude=8.67436999298753, latitude=50.1156658


Let's create a **hexagonal grid of cells**: we offset every other row, and adjust vertical row spacing so that **every cell center is equally distant from all it's neighbors**.

In [10]:
center_x,center_y = lonlat_to_xy(center[1],center[0]) # City center in Cartesian coordinates

k = math.sqrt(3)/2 # Vertical offset for hexagonal grid cells
x_min = center_x-6000
x_step = 600
y_min = center_y-6000-(int(21/k)*k*600-12000)/2
y_step = 600*k 

coordinates = []
distances_from_center = []
xs = []
ys = []
for i in range(0,int(21/k)):
    y = y_min+i*y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min+j*x_step+x_offset
        distance_from_center = calc_xy_distance(center_x,center_y,x,y)
        if (distance_from_center<=6001):
            lon,lat = xy_to_lonlat(x,y)
            coordinates.append((lat,lon))
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(coordinates), 'candidate neighborhood centers generated.')

364 candidate neighborhood centers generated.


Let's visualize the data we have so far: city center location and candidate neighborhood centers:

In [11]:
#!pip install folium
import folium

In [12]:
map_city = folium.Map(location=center,zoom_start=13)
folium.Marker(center,popup='Hochstraße 43').add_to(map_city)
for lat,lon in coordinates:
    folium.Circle([lat,lon],radius=300,color='blue',fill=False).add_to(map_city)
map_city

OK, we now have the coordinates of centers of neighborhoods/areas to be evaluated, equally spaced (distance from every point to it's neighbors is exactly the same) and within ~6km from "Freßgass".

Let's now use Nominatim API to get approximate addresses of those locations.
We will only use locations where information about road, postcode and city are present.

In [13]:
addresses = []
idx_delete = []
print('Obtaining location addresses: ',end='')
for idx,coordinate in enumerate(coordinates):
    address = geolocator.reverse(coordinate,language="en").raw['address']
    if 'road' in address and 'postcode' in address and 'postcode' in address:
        road = address['road']
        postcode = address['postcode']
        city = address['city']
        postcode_city = ' '.join([postcode,city])
        road_postcode_city = ', '.join([road,postcode_city])
        addresses.append(road_postcode_city)
    else:
        idx_delete.append(idx)
        
    print('.', end='')
print('done.')

Obtaining location addresses: ............................................................................................................................................................................................................................................................................................................................................................................done.


Now lets place all this into a Pandas dataframe. In order to do so we have also to remove the locations with missing information before the dataframe creation.

In [14]:
import pandas as pd

In [15]:
for idx in idx_delete:
    del coordinates[idx]
    del xs[idx]
    del ys[idx]
    del distances_from_center[idx]

In [16]:
latitudes = [x[0] for x in coordinates]
longitude = [x[1] for x in coordinates]

df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitude,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

print(df_locations.shape)
print(df_locations.head(10))

(359, 6)
                                 Address   Latitude  Longitude             X  \
0       Habichtschneise, 60528 Frankfurt  50.063203   8.656123  46113.103211   
1       Bussardschneise, 60528 Frankfurt  50.063660   8.664455  46713.103211   
2         Milanschneise, 60528 Frankfurt  50.064116   8.672787  47313.103211   
3                  F 10, 60598 Frankfurt  50.064572   8.681119  47913.103211   
4          Welscher Weg, 60598 Frankfurt  50.065027   8.689452  48513.103211   
5             Beckerweg, 60599 Frankfurt  50.065481   8.697784  49113.103211   
6      Schillerschneise, 60599 Frankfurt  50.065935   8.706117  49713.103211   
7   Otto-Fleck-Schneise, 60528 Frankfurt  50.067161   8.643010  45213.103211   
8  Mörfelder Landstraße, 60528 Frankfurt  50.067619   8.651342  45813.103211   
9   Otto-Fleck-Schneise, 60528 Frankfurt  50.068076   8.659675  46413.103211   

              Y  Distance from center  
0  5.564961e+06           5992.495307  
1  5.564961e+06           5840

Now that we have our location candidates, let's use Foursquare API to get info on restaurants in each neighborhood.

We're interested in venues in 'food' category, but only those that are asian restaurants - coffe shops, pizza places, bakeries etc. are not direct competitors so we don't care about those. So we will include in our list only venues that have 'restaurant' in category name, and we'll make sure to detect and include all the subcategories of specific 'Asian restaurant' category, as we need info on Asian restaurants in the neighborhood.

*Foursquare credentials are defined in hidden cell bellow.*

In [17]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
ACCESS_TOKEN = '' # your FourSquare Access Token
VERSION = '20180604'
LIMIT = 100
RADIUS = 500

In [18]:
# Category IDs were taken from Foursquare web site (https://developer.foursquare.com/docs/resources/categories):
asian_restaurant_categories = ['4bf58dd8d48988d142941735','4bf58dd8d48988d1f9941735','4bf58dd8d48988d118951735',
                               '50aa9e744b90af0d42d5de0e','52f2ab2ebcbc57f1066b8b45','52f2ab2ebcbc57f1066b8b46',
                               '52f2ab2ebcbc57f1066b8b1c','52f2ab2ebcbc57f1066b8b41','4bf58dd8d48988d14a941735',
                               '5744ccdfe4b0c0459246b4cd','52af39fb3cf9994f4e043be9','56aa371be4b08b9a8d573568',
                               '4bf58dd8d48988d145941735','52af3a5e3cf9994f4e043bea','52af3a723cf9994f4e043bec',
                               '52af3a7c3cf9994f4e043bed','58daa1558bbb0b01f18ec1d3','52af3a673cf9994f4e043beb',
                               '52af3a903cf9994f4e043bee','4bf58dd8d48988d1f5931735','52af3a9f3cf9994f4e043bef',
                               '52af3aaa3cf9994f4e043bf0','52af3ab53cf9994f4e043bf1','52af3abe3cf9994f4e043bf2',
                               '52af3ac83cf9994f4e043bf3','52af3ad23cf9994f4e043bf4','52af3add3cf9994f4e043bf5',
                               '52af3af23cf9994f4e043bf7','52af3ae63cf9994f4e043bf6','52af3afc3cf9994f4e043bf8',
                               '52af3b053cf9994f4e043bf9','52af3b213cf9994f4e043bfa','52af3b293cf9994f4e043bfb',
                               '52af3b343cf9994f4e043bfc','52af3b3b3cf9994f4e043bfd','52af3b463cf9994f4e043bfe',
                               '52af3b633cf9994f4e043c01','52af3b513cf9994f4e043bff','52af3b593cf9994f4e043c00',
                               '52af3b6e3cf9994f4e043c02','52af3b773cf9994f4e043c03','52af3b813cf9994f4e043c04',
                               '52af3b893cf9994f4e043c05','52af3b913cf9994f4e043c06','52af3b9a3cf9994f4e043c07',
                               '52af3ba23cf9994f4e043c08','4eb1bd1c3b7b55596b4a748f','52e81612bcbc57f1066b79fb',
                               '4bf58dd8d48988d111941735','55a59bace4b013909087cb0c','55a59bace4b013909087cb30',
                               '55a59bace4b013909087cb21','55a59bace4b013909087cb06','55a59bace4b013909087cb1b',
                               '55a59bace4b013909087cb1e','55a59bace4b013909087cb18','55a59bace4b013909087cb24',
                               '55a59bace4b013909087cb15','55a59bace4b013909087cb27','55a59bace4b013909087cb12',
                               '4bf58dd8d48988d1d2941735','55a59bace4b013909087cb2d','55a59a31e4b013909087cb00',
                               '55a59af1e4b013909087cb03','55a59bace4b013909087cb2a','55a59bace4b013909087cb0f',
                               '55a59bace4b013909087cb33','55a59bace4b013909087cb09','55a59bace4b013909087cb36',
                               '4bf58dd8d48988d113941735','56aa371be4b08b9a8d5734e4','56aa371be4b08b9a8d5734f0',
                               '56aa371be4b08b9a8d5734e7','56aa371be4b08b9a8d5734ed','56aa371be4b08b9a8d5734ea',
                               '56aa371be4b08b9a8d57350e','4bf58dd8d48988d149941735','56aa371be4b08b9a8d573502']

athletics_sports_category = ['4f4528bc4b90abdf24c9de85','4bf58dd8d48988d175941735','52f2ab2ebcbc57f1066b8b47',
                             '503289d391d4c4b30a586d6a','52f2ab2ebcbc57f1066b8b49','4bf58dd8d48988d105941735',
                             '52f2ab2ebcbc57f1066b8b48','4bf58dd8d48988d176941735','4bf58dd8d48988d101941735',
                             '58daa1558bbb0b01f18ec203','5744ccdfe4b0c0459246b4b2','4bf58dd8d48988d106941735',
                             '590a0744340a5803fd8508c3','4bf58dd8d48988d102941735','52e81612bcbc57f1066b7a2e',
                             '52e81612bcbc57f1066b7a28','4bf58dd8d48988d1e2941735','56aa371be4b08b9a8d57355e',
                             '4bf58dd8d48988d15e941735','52e81612bcbc57f1066b7a26','50328a4b91d4c4b30a586d6b',
                             '58daa1558bbb0b01f18ec1d0','5744ccdfe4b0c0459246b4af','56aa371be4b08b9a8d57351d',
                             '52e81612bcbc57f1066b7a44','52f2ab2ebcbc57f1066b8b3c','58daa1558bbb0b01f18ec1ae',
                             '4bf58dd8d48988d1ed941735','4d1cf8421a97d635ce361c31','4bf58dd8d48988d193941735',
                             '4bf58dd8d48988d1b2941735','4bf58dd8d48988d1f2941735']

Now we go over our neighborhood locations and get nearby asian restaurants and nearby sport clubs. We save the number of category values in the original dataframe but also save the found locations in the new dataframe df_venues.

In [19]:
import requests

In [20]:
df_venues = pd.DataFrame(columns=['Name','Latitude','Longitude','Asian_Restaurant','Sports_Club'])

num_AR = [0]*df_locations.shape[0]
num_SC = [0]*df_locations.shape[0]
idx = 0

print('Obtaining location venues: ',end='')
for row in df_locations.iterrows():
    latitude = row[1]['Latitude']
    longitude = row[1]['Longitude']
    
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID,CLIENT_SECRET,latitude,longitude,ACCESS_TOKEN,VERSION,'',RADIUS,LIMIT)
    result = requests.get(url).json()
    venues = result['response']['venues']
    
    num_asian_restaurants = 0
    num_sports_clubs = 0
    for venue in venues:
        name = venue['name']
        latitude = venue['location']['lat']
        longitude = venue['location']['lng']
        if venue['categories'] != []:
            if venue['categories'][0]['id'] in asian_restaurant_categories:
                df_venues = df_venues.append({'Name':name,'Latitude':latitude,'Longitude':longitude,'Asian_Restaurant':1,'Sports_Club':0},ignore_index=True)
                num_asian_restaurants = num_asian_restaurants+1
            elif venue['categories'][0]['id'] in athletics_sports_category:
                df_venues = df_venues.append({'Name':name,'Latitude':latitude,'Longitude':longitude,'Asian_Restaurant':0,'Sports_Club':1},ignore_index=True)
                num_sports_clubs = num_sports_clubs+1
    
    num_AR[idx] = num_asian_restaurants
    num_SC[idx] = num_sports_clubs
    idx = idx+1
    print('.',end='')

df_locations['Number_Asian_Restaurants'] = num_AR
df_locations['Number_Sports_Clubs'] = num_SC
print('done.')

Obtaining location venues: .......................................................................................................................................................................................................................................................................................................................................................................done.


Now we print the dataframe heads to see if they where filled correctly.

In [21]:
print(df_locations.head())
print(df_venues.head())

                            Address   Latitude  Longitude             X  \
0  Habichtschneise, 60528 Frankfurt  50.063203   8.656123  46113.103211   
1  Bussardschneise, 60528 Frankfurt  50.063660   8.664455  46713.103211   
2    Milanschneise, 60528 Frankfurt  50.064116   8.672787  47313.103211   
3             F 10, 60598 Frankfurt  50.064572   8.681119  47913.103211   
4     Welscher Weg, 60598 Frankfurt  50.065027   8.689452  48513.103211   

              Y  Distance from center  Number_Asian_Restaurants  \
0  5.564961e+06           5992.495307                         0   
1  5.564961e+06           5840.376700                         0   
2  5.564961e+06           5747.173218                         3   
3  5.564961e+06           5715.767665                         3   
4  5.564961e+06           5747.173218                         2   

   Number_Sports_Clubs  
0                    9  
1                    6  
2                    5  
3                    2  
4                    

Now we mark the found locations of our map. **Asian restaurants will be marked in Red** and **Sports Clubs will be marked in Green**

In [22]:
map_city = folium.Map(location=center,zoom_start=13)
folium.Marker(center,popup='Hochstraße 43').add_to(map_city)
for row in df_venues.iterrows():
    lat = row[1]['Latitude']
    lon = row[1]['Longitude']
    if row[1]['Asian_Restaurant']==1:
        color = 'red'
    elif row[1]['Sports_Club']==1:
         color = 'green'
    folium.CircleMarker([lat,lon],radius=3,color=color,fill=True,fill_color=color,fill_opacity=1).add_to(map_city)
map_city

So now that we have the asian restaurants and sports clubs in the areas, we can start to analyse this data and use it for our location recommendation.

## Methodology <a name="methodology"></a>

In this project we will direct our efforts on detecting areas in Frankfurt that have a low amount of asian restaurant and a high amount of sports clubs. We will limit our analysis to  an area ~6km around the city center.

In the first step we have collected the required data: the location every asian restaurant and every sports club within 6km from Frankfurt center ("Freßgass"). THe locations have been identified according to the Foursquare categorization.

The Second step in our analysis will be the calculation and exploration of the 'restaurant density' and the 'sports club density' across different areas of Frankfurt - we will use heatmaps to identify a few promising areas close to the center with a low number of restaurants and a high number of sports clubs.

In third and final step we will focus on most the promising areas and within those create clusters of locations that meet the basic requirements, which were  established in discussions with the stakeholders. We will take into consideration locations with:
* no more than one restaurant in radius of 250 meters and 
* at least 2 sports clubs in radius of 400 meters.

We will present a map of those locations but also create clusters (using k-means clustering) to identify general zones / neighborhoods / addresses which should be a starting point for final selection process.
In the final selection process the stakeholders will visit the recommended locations and perform some further analysis based on customer flows, which will not be part of this analyis report.

## Analysis <a name="analysis"></a>

First we calculate the distance to the nearest restaurant and the nearest sport club from every area candidate center (not only those within 300m - we want distance to closest one, regardless of how distant it is).

In [23]:
distances_restaurant = []
distances_sport = []

for row1 in df_locations.iterrows():
    area_x = row1[1]['X']
    area_y = row1[1]['Y']
    min_distance_rest = 10000
    min_distance_sport = 10000
    for row2 in df_venues.iterrows():
        lat = row2[1]['Latitude']
        lon = row2[1]['Longitude']
        res_x,res_y = lonlat_to_xy(lon,lat)
        
        d = calc_xy_distance(area_x,area_y,res_x,res_y)
        if row2[1]['Asian_Restaurant']==1:
            if d<min_distance_rest:
                min_distance_rest = d
        elif row2[1]['Sports_Club']==1:
            if d<min_distance_sport:
                min_distance_sport = d
            
    distances_restaurant.append(min_distance_rest)
    distances_sport.append(min_distance_sport)
    
df_locations['Distance to restaurant'] = distances_restaurant
df_locations['Distance to sports club'] = distances_sport

In [24]:
print('Average distance to restaurant from each area center:', df_locations['Distance to restaurant'].mean())
print('Average distance to sprots club from each area center:', df_locations['Distance to sports club'].mean())

Average distance to restaurant from each area center: 452.8798331754038
Average distance to sprots club from each area center: 431.31103520814094


In the next step we create a map showing heatmap/density of restaurants. Also we will on our map and circles indicating the distance of 1km, 2km and 3km from the "Freßgass".

In [25]:
from folium import plugins
from folium.plugins import HeatMap

In [26]:
restaurant_latlons =  []
sport_latlons =  []
for row in df_venues.iterrows():
    lat = row[1]['Latitude']
    lon = row[1]['Longitude']
       
    if row[1]['Asian_Restaurant']==1:
        restaurant_latlons.append([lat,lon])
    elif row[1]['Sports_Club']==1:
        sport_latlons.append([lat,lon])        

In [27]:
map_city = folium.Map(location=center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_city) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_city)
folium.Marker(center).add_to(map_city)
folium.Circle(center,radius=1000,fill=False,color='white').add_to(map_city)
folium.Circle(center,radius=2000,fill=False,color='white').add_to(map_city)
folium.Circle(center,radius=3000,fill=False,color='white').add_to(map_city)
map_city

In the next step we create another map showing heatmap/density of sport clubs. Also we will on our map and circles indicating the distance of 1km, 2km and 3km from the "Freßgass".

In [28]:
map_city = folium.Map(location=center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_city) #cartodbpositron cartodbdark_matter
HeatMap(sport_latlons).add_to(map_city)
folium.Marker(center).add_to(map_city)
folium.Circle(center,radius=1000,fill=False,color='white').add_to(map_city)
folium.Circle(center,radius=2000,fill=False,color='white').add_to(map_city)
folium.Circle(center,radius=3000,fill=False,color='white').add_to(map_city)
map_city

Based on the maps we will now focus our analysis on the area north-east the center - we will move the center of our area of interest and reduce it's size to have a radius of 2.5km. The center will be moved to "Zeil 64, 60313 Frankfurt am Main".

In [39]:
address = 'Zeil 64, Frankfurt, Germany'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
roi_center = [location.latitude,location.longitude]
roi_x,roi_y = lonlat_to_xy(roi_center[1],roi_center[0])

print('Coordinates of {}: {},{}'.format(address,roi_center[0],roi_center[1]))   
    
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 100
y_step = 100*k 
roi_y_min = roi_y-2500

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min+i*y_step
    x_offset = 50 if i%2==0 else 0
    for j in range(0, 51):
        x = roi_x+j*x_step+x_offset
        d = calc_xy_distance(roi_x,roi_y,x,y)
        if (d <= 2501):
            lon,lat = xy_to_lonlat(x,y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')

idx_delete = []
idx = 0
for row in df_venues.iterrows():
    lat = row[1]['Latitude']
    lon = row[1]['Longitude']
    res_x,res_y = lonlat_to_xy(lon,lat)
    d = calc_xy_distance(roi_x,roi_y,res_x,res_y)
    if d>2500:
        idx_delete.append(idx)
    idx = idx+1
    
df_venues_roi = df_venues.copy()
df_venues_roi = df_venues_roi.drop(df_venues_roi.index[idx_delete])
print('Total number of venues: ',df_venues.shape[0])
print('Number of venues within 2,5km: ',df_venues_roi.shape[0])

Coordinates of Zeil 64, Frankfurt, Germany: 50.1148522,8.6868859
1145 candidate neighborhood centers generated.
Total number of venues:  2919
Number of venues within 2,5km:  463


In [40]:
map_city = folium.Map(location=roi_center, zoom_start=14)
HeatMap(sport_latlons).add_to(map_city)
folium.Marker(center).add_to(map_city)
folium.Circle(roi_center,radius=2500,color='white',fill=True,fill_opacity=0.4).add_to(map_city)
for lat,lon in zip(roi_latitudes,roi_longitudes):
    folium.Circle([lat,lon],radius=100,color='blue',fill=False).add_to(map_city)
map_city

Now we calculate two most important things for each location candidate: number of restaurants in vicinity (we'll use radius of 250 meters) and number of sports clubs in vicinity (we'll use radius of 400 meters).

In [41]:
addresses = []
idx_delete = []
print('Obtaining location addresses: ',end='')
for idx,coordinate in enumerate(zip(roi_latitudes,roi_longitudes)):
    address = geolocator.reverse(coordinate,language="en").raw['address']
    if 'road' in address and 'postcode' in address and 'postcode' in address:
        road = address['road']
        postcode = address['postcode']
        city = address['city']
        postcode_city = ' '.join([postcode,city])
        road_postcode_city = ', '.join([road,postcode_city])
        addresses.append(road_postcode_city)
    else:
        idx_delete.append(idx)
        
    print('.', end='')
print('done.')

Obtaining location addresses: ..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

In [42]:
for idx in idx_delete:
    del roi_latitudes[idx]
    del roi_longitudes[idx]
    del roi_xs[idx]
    del roi_ys[idx]
    
df_roi_locations = pd.DataFrame({'Address': addresses,
                                 'Latitude': roi_latitudes,
                                 'Longitude': roi_longitudes,
                                 'X': roi_xs,
                                 'Y': roi_ys})

print(df_roi_locations.shape)

(1136, 5)


In [48]:
radius_rest = 250
radius_sport = 400

num_restaurant = [0]*df_roi_locations.shape[0]
num_sport = [0]*df_roi_locations.shape[0]

idx = 0
print('Analyssing locations: ',end='')
for row1 in df_roi_locations.iterrows():
    area_x = row1[1]['X']
    area_y = row1[1]['Y']
    
    count_rest = 0
    count_sports = 0
    for row2 in df_venues_roi.iterrows():
        lat = row2[1]['Latitude']
        lon = row2[1]['Longitude']
        res_x,res_y = lonlat_to_xy(lon,lat)
        
        d = calc_xy_distance(area_x,area_y,res_x,res_y)
        if row2[1]['Asian_Restaurant']==1:
            if d<=radius_rest:
                count_rest = count_rest+1
        elif row2[1]['Sports_Club']==1:
            if d<=radius_sport:
                count_sports = count_sports+1
            
    num_restaurant[idx] = count_rest
    num_sport[idx] = count_sports
    idx = idx+1
    print('.', end='')
    
print('done.')
    
df_roi_locations['Number_Asian_Restaurants'] = num_restaurant
df_roi_locations['Number_Sports_Clubs'] = num_sport
df_roi_locations.head(10)

Analyssing locations: ;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.;.

In [49]:
df_roi_locations.head(10)

Unnamed: 0,Address,Latitude,Longitude,X,Y,Number_Asian_Restaurants,Number_Sports_Clubs
0,"Darmstädter Landstraße, 60598 Frankfurt",50.092542,8.690529,48849.278355,5568010.0,0,0
1,"Grethenweg, 60598 Frankfurt",50.093278,8.689732,48799.278355,5568097.0,0,6
2,"Darmstädter Landstraße, 60598 Frankfurt",50.093354,8.691122,48899.278355,5568097.0,0,6
3,"Unterster Zwerchweg, 60599 Frankfurt",50.09343,8.692512,48999.278355,5568097.0,0,6
4,"Unterster Zwerchweg, 60599 Frankfurt",50.093506,8.693901,49099.278355,5568097.0,0,4
5,"Schweinfurter Weg, 60599 Frankfurt",50.093581,8.695291,49199.278355,5568097.0,0,0
6,"Hainer Weg, 60599 Frankfurt",50.093657,8.69668,49299.278355,5568097.0,0,0
7,"Schweinfurter Weg, 60599 Frankfurt",50.093733,8.69807,49399.278355,5568097.0,0,0
8,"Darmstädter Landstraße, 60598 Frankfurt",50.09409,8.690325,48849.278355,5568183.0,0,7
9,"Darmstädter Landstraße, 60598 Frankfurt",50.094166,8.691715,48949.278355,5568183.0,0,10


Now we filter those locations. We're interested only in locations with:
* no asian restaurant in radius of 250 meters and 
* at least 5 sports clubs in radius of 400 meters.

In [51]:
import numpy as np

In [57]:
good_res_count = np.array((df_roi_locations['Number_Asian_Restaurants']<=0))
print('Locations with no more than one restaurant nearby:', good_res_count.sum())

good_sport_count = np.array((df_roi_locations['Number_Sports_Clubs']>=5))
print('Locations with more than two sports clubs nearby:', good_sport_count.sum())

good_locations = np.logical_and(good_res_count,good_sport_count)
print('Locations with both conditions met:',good_locations.sum())

df_good_locations = df_roi_locations[good_locations]

Locations with no more than one restaurant nearby: 425
Locations with more than two sports clubs nearby: 653
Locations with both conditions met: 226


We now have a list of locations which meet our requirements. Any of those locations is a potential candidate for the new flagshipstore.
In the next step we cluster those locations to create centers of zones containing good locations. Those zones, their centers and addresses will be the final result of our analysis.

In [58]:
from sklearn.cluster import KMeans

In [59]:
good_xys = df_good_locations[['X','Y']].values
good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values

number_of_clusters = 15
kmeans = KMeans(n_clusters=number_of_clusters,random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(cc[0],cc[1]) for cc in kmeans.cluster_centers_]

map_city = folium.Map(location=center,zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_city)
HeatMap(restaurant_latlons).add_to(map_city)
folium.Marker(center).add_to(map_city)
for lon,lat in cluster_centers:
    folium.Circle([lat,lon],radius=500,color='green',fill=True,fill_opacity=0.25).add_to(map_city) 
for lat,lon in zip(good_latitudes,good_longitudes):
    folium.CircleMarker([lat,lon],radius=2,color='blue',fill=True,fill_color='blue',fill_opacity=1).add_to(map_city)
map_city

Let's see those zones on a city map without heatmap, using shaded areas to indicate our clusters:

In [60]:
map_city = folium.Map(location=center,zoom_start=13)
folium.Marker(center).add_to(map_city)
for lat,lon in zip(good_latitudes,good_longitudes):
    folium.Circle([lat,lon],radius=250,color='#00000000',fill=True,fill_color='#0066ff',fill_opacity=0.07).add_to(map_city)
for lat,lon in zip(good_latitudes,good_longitudes):
    folium.CircleMarker([lat,lon],radius=2,color='blue',fill=True,fill_color='blue',fill_opacity=1).add_to(map_city)
for lon,lat in cluster_centers:
    folium.Circle([lat,lon],radius=500,color='green',fill=False).add_to(map_city) 
map_city

Finally, let's reverse geocode those candidate area centers to get the addresses which can be presented to stakeholders.

In [64]:
candidate_area_addresses = []
candidate_distances = []
print('==============================================================')
print('Addresses of centers of areas recommended for further analysis')
print('==============================================================\n')
for lon,lat in cluster_centers:
    coordinate = (lat,lon)
    address = geolocator.reverse(coordinate,language="en").raw['address']
    if 'road' in address and 'postcode' in address and 'postcode' in address:
        road = address['road']
        postcode = address['postcode']
        city = address['city']
        postcode_city = ' '.join([postcode,city])
        road_postcode_city = ', '.join([road,postcode_city])
        candidate_area_addresses.append(road_postcode_city)
        
        x, y = lonlat_to_xy(lon,lat)
        d = calc_xy_distance(x,y,center_x,center_y)
        candidate_distances.append(d)
        
df_candidates = pd.DataFrame()
df_candidates['Address'] = candidate_area_addresses
df_candidates['Distance_Center'] = candidate_distances
df_candidates.sort_values(by=['Distance_Center'],inplace=True,ascending=True)

for row in df_candidates.iterrows():
    print('{}{} => {:.1f}km from Freßgass'.format(row[1]['Address'], ' '*(50-len(row[1]['Address'])),row[1]['Distance_Center']/1000))

Addresses of centers of areas recommended for further analysis

Friedberger Landstraße, 60316 Frankfurt            => 1.2km from Freßgass
Lange Straße, 60311 Frankfurt                      => 1.6km from Freßgass
Nibelungenallee, 60318 Frankfurt                   => 1.9km from Freßgass
Günthersburgallee, 60389 Frankfurt                 => 2.2km from Freßgass
Weseler Werft, 60314 Frankfurt                     => 2.2km from Freßgass
Länderweg, 60599 Frankfurt                         => 2.4km from Freßgass
Röderbergweg, 60314 Frankfurt                      => 2.4km from Freßgass
Mayfarthstraße, 60314 Frankfurt                    => 2.6km from Freßgass
Darmstädter Landstraße, 60598 Frankfurt            => 2.7km from Freßgass
Gagernstraße, 60385 Frankfurt                      => 2.9km from Freßgass
Röderbergweg, 60385 Frankfurt                      => 2.9km from Freßgass
Strahlenberger Weg, 60599 Frankfurt                => 3.0km from Freßgass
Im Bärengarten, 60599 Frankfurt                 

This concludes our analysis. We have created 15 addresses representing centers of zones containing locations with low number of asian restaurants and a high number of sports clubs nearby, all zones being fairly close to city center. Although zones are shown on map with a radius of ~500 meters (green circles), their shape is actually very irregular and their centers/addresses should be considered only as a starting point for exploring area neighborhoods in search for potential locations.