# Capstone Project - The Battle of the Neighborhoods
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results](#results)
* [Discussion](#discussion)
* [Conclusion](#conclusion)

## Introduction <a name="introduction"></a>

The Eastern European restaurant chain "EEC", based in St. Petersburg, Russia, plans to expand abroad. It was decided to open the first foreign restaurant in Tallinn, Estonia, the foreign capital nearest to St. Petersburg.

Preliminary analysis has shown that the most popular area of Tallinn among both tourists and locals is the Old Town, or Vanalinn (https://www.visittallinn.ee/eng/visitor/ideas-tips/tips-and-guides/top-must-see-sights).

The Old Town is the oldest part of Tallinn. It has wholly preserved its medieval, Hanseatic-era structure, including an exceptionally intact 13th-century city plan. Since 1997 it has been on the UNESCO World Heritage List. The Old Town is bordered by the Walls of Tallinn and covers 113 ha. Most of its buildings date from the 13th–16th centuries (https://en.wikipedia.org/wiki/Tallinn_Old_Town).

The search area was therefore restricted to the Old Town and the adjacent territories within walking distance (up to 1.5 km) of Town Hall Square, the cultural and historical center of the Old Town.

The aim of the project is to identify the most promising locations for the restaurant, taking into account how many restaurants of all types, and Eastern European restaurants in particular, already operate in the area.

Choosing a specific site for the restaurant is beyond the scope of this project; it requires additional analysis, including on-site inspection.

## Data <a name="data"></a>

Based on the business problem, the main factors influencing the decision are:

* coverage area for analysis (1.5 km from Town Hall Square);
* total number of restaurants of all types in the area under consideration (density);
* number of Eastern European restaurants in the area under consideration (density);
* distance between neighboring restaurants of Eastern European cuisine; 
* distance from the center of the considered area (Town Hall Square) to each restaurant of Eastern European cuisine found in it.
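The last two factors are plain Euclidean distances once coordinates are expressed in meters, which is why the notebook later works with UTM X/Y values. A minimal brute-force sketch of the nearest-neighbor distance is shown below; `nearest_neighbor_distances` is a hypothetical helper and the sample points are made up, not real restaurant positions.

```python
import math


def nearest_neighbor_distances(points):
    """For each (x, y) point in meters, return the distance to its closest other point.

    Brute force, O(n^2): adequate for a few dozen restaurants.
    """
    distances = []
    for i, (x1, y1) in enumerate(points):
        best = math.inf
        for j, (x2, y2) in enumerate(points):
            if i != j:
                best = min(best, math.hypot(x2 - x1, y2 - y1))
        distances.append(best)
    return distances


# hypothetical planar coordinates in meters (e.g. UTM X/Y), not real restaurant positions
sample_points = [(0, 0), (300, 400), (1000, 0)]
print(nearest_neighbor_distances(sample_points))
```

With only about a dozen Eastern European restaurants in the area, the quadratic loop is fast enough; a spatial index would only matter for much larger sets.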

We will use a uniformly distributed grid of locations centered around Town Hall Square to break the area into separate locations.

The sources of the necessary data will be:

* coordinates of the center of the considered area (Town Hall Square) will be obtained using the Nominatim geocoding API;
* centers of the individual locations will be generated algorithmically on a grid in Cartesian (UTM) coordinates and converted to latitude/longitude with pyproj;
* number of restaurants, their type and location will be obtained using the Foursquare API.

### Neighborhood Candidates

Let's determine the latitude and longitude of the center of the area under consideration using a specific, well-known place (Town Hall Square) and the Nominatim geocoding API.

In [1]:
from geopy.geocoders import Nominatim

address = 'Town Hall Square, Tallinn, Estonia'

geolocator = Nominatim(user_agent='ny_explorer')
location = geolocator.geocode(address)
lat = location.latitude
lon = location.longitude

tallinn_center = lat, lon
print('Coordinate of {}: {}'.format(address, tallinn_center))

Coordinate of Town Hall Square, Tallinn, Estonia: (59.43735425, 24.74521104002489)


Next, we create a grid of candidate locations, evenly spaced and centered on Town Hall Square, covering a radius of 1.5 km.

Each location is a circle with a radius of 300 m, so neighboring centers are 600 m apart.

To measure distances in meters rather than in degrees of latitude/longitude, we build the grid in a Cartesian 2D coordinate system and then project the grid points back to latitude/longitude for display on the Folium map.
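Why degrees are awkward for distances: at Tallinn's latitude a degree of longitude is only about half as long as a degree of latitude. The sketch below assumes a spherical Earth (mean radius 6,371 km, not the WGS 84 ellipsoid) and computes both lengths plus a haversine distance; `haversine_m` and `meters_per_degree` are illustrative names, not part of the notebook.

```python
import math

# Mean Earth radius in meters: a spherical approximation, not the WGS 84 ellipsoid
EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points (spherical Earth)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def meters_per_degree(lat):
    """Approximate ground length in meters of one degree of latitude and of longitude at `lat`."""
    per_deg_lat = math.pi * EARTH_RADIUS_M / 180
    per_deg_lon = per_deg_lat * math.cos(math.radians(lat))
    return per_deg_lat, per_deg_lon


lat_m, lon_m = meters_per_degree(59.437)  # approximate latitude of Town Hall Square
print(f'1 degree of latitude  ~ {lat_m:,.0f} m')
print(f'1 degree of longitude ~ {lon_m:,.0f} m')
```

So a 600 m step corresponds to roughly 0.0054° of latitude but about 0.0106° of longitude, which is why the grid is easier to build in meters.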

Let's define functions for converting between geographic WGS 84 coordinates (degrees of latitude/longitude) and the Cartesian UTM coordinate system (X/Y in meters).

In [2]:
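#!pip install pyproj
import math
from pyproj import Transformer

# WGS 84 lon/lat <-> UTM zone 33N (EPSG:32633); always_xy=True keeps (lon, lat) / (x, y) order.
# Tallinn (about 24.7°E) actually lies in UTM zone 35 (EPSG:32635); zone 33 is kept so the
# coordinates match the outputs shown below. The resulting distance scale error is roughly 0.3%.
# Transformer replaces the deprecated pyproj.transform() and is created once, not per call.
_to_xy = Transformer.from_crs('EPSG:4326', 'EPSG:32633', always_xy=True)
_to_lonlat = Transformer.from_crs('EPSG:32633', 'EPSG:4326', always_xy=True)

def lonlat_to_xy(lon, lat):
    """Convert WGS 84 longitude/latitude (degrees) to UTM X/Y (meters)."""
    return _to_xy.transform(lon, lat)

def xy_to_lonlat(x, y):
    """Convert UTM X/Y (meters) back to WGS 84 longitude/latitude (degrees)."""
    return _to_lonlat.transform(x, y)

def calc_xy_distance(x1, y1, x2, y2):
    """Euclidean distance in meters between two points in the same UTM zone."""
    return math.hypot(x2 - x1, y2 - y1)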

print('Coordinate transformation check')
print('-------------------------------')
print('Tallinn center longitude={}, latitude={}'.format(tallinn_center[1], tallinn_center[0]))
x, y = lonlat_to_xy(tallinn_center[1], tallinn_center[0])
print('Tallinn center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Tallinn center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Tallinn center longitude=24.74521104002489, latitude=59.43735425
Tallinn center UTM X=1051472.6299566994, Y=6629285.178051489
Tallinn center longitude=24.74521104002489, latitude=59.437354249999984




Let's build a hexagonal grid: every second row is shifted horizontally by half a step (300 m), and rows are spaced 600·√3/2 ≈ 520 m apart, so that each cell center is exactly 600 m from all six of its neighbors.

In [3]:
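# city center in Cartesian (UTM) coordinates
tallinn_center_x, tallinn_center_y = lonlat_to_xy(tallinn_center[1], tallinn_center[0])

# row spacing of a hexagonal grid, as a fraction of the 600 m step between centers
k = math.sqrt(3) / 2

# Scan a rectangle of 21 columns by int(21 / k) = 24 rows that contains the whole 1.5 km
# circle (and much more); the vertical offset places row 12 exactly on the center.
# Points farther than 1.5 km from the center are discarded below.
n_cols = 21
n_rows = int(n_cols / k)
x_min = tallinn_center_x - 1500
x_step = 600
y_min = tallinn_center_y - 1500 - (n_rows * k * 600 - 3000) / 2
y_step = 600 * k

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, n_rows):
    y = y_min + i * y_step
    # shift every second row by half a step to get the hexagonal layout
    x_offset = 300 if i % 2 == 0 else 0
    for j in range(0, n_cols):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(tallinn_center_x, tallinn_center_y, x, y)
        # 1 m tolerance as a safety margin against floating-point error
        if distance_from_center <= 1501:
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')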


19 candidate neighborhood centers generated.
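The grid logic can be cross-checked without pyproj. The sketch below works in local meter offsets from the center rather than in UTM coordinates; `hex_grid_offsets` is a hypothetical helper. It builds the same kind of hexagonal layout (center row unshifted, neighboring rows shifted by 300 m) and, for a 1.5 km radius and 600 m spacing, should also give 19 points.

```python
import math


def hex_grid_offsets(radius_m=1500.0, spacing_m=600.0, tol_m=1.0):
    """Return (dx, dy) offsets in meters of hexagonal-grid points centered on (0, 0).

    Rows are spacing * sqrt(3) / 2 apart and every other row is shifted by spacing / 2,
    so each point is exactly `spacing_m` away from its six neighbors.
    """
    row_step = spacing_m * math.sqrt(3) / 2
    n_rows = int(radius_m // row_step) + 1
    n_cols = int(radius_m // spacing_m) + 1
    points = []
    for i in range(-n_rows, n_rows + 1):
        dy = i * row_step
        # rows at an odd distance from the center row are shifted by half a step
        shift = spacing_m / 2 if i % 2 else 0.0
        for j in range(-n_cols - 1, n_cols + 2):
            dx = j * spacing_m + shift
            # keep only points inside the circle (with a small tolerance)
            if math.hypot(dx, dy) <= radius_m + tol_m:
                points.append((dx, dy))
    return points


offsets = hex_grid_offsets()
print(len(offsets), 'grid points')
```

The count follows from the lattice geometry: the center, 6 points at 600 m, and 12 points at about 1039 m or 1200 m; the next ring starts at about 1587 m, outside the circle.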




Let's visualize what we have: the center of the area (Town Hall Square) and the boundaries of the candidate locations within it.

In [4]:
#!pip install folium
import folium

map_tallinn = folium.Map(location=tallinn_center, zoom_start=13)
folium.Marker(tallinn_center, popup='Town Hall Square').add_to(map_tallinn)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_tallinn)

map_tallinn

We now have the coordinates of the locations to be evaluated: their centers are equally spaced and lie within 1.5 km of Town Hall Square.

Let's put them into a Pandas DataFrame.

In [5]:
import pandas as pd

df_locations = pd.DataFrame({'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head()

|   | Latitude | Longitude | X | Y | Distance from center |
|---|---|---|---|---|---|
| 0 | 59.428941 | 24.732119 | 1050873.0 | 6628246.0 | 1200.0 |
| 1 | 59.428157 | 24.742541 | 1051473.0 | 6628246.0 | 1039.230485 |
| 2 | 59.427371 | 24.752964 | 1052073.0 | 6628246.0 | 1200.0 |
| 3 | 59.433932 | 24.72824 | 1050573.0 | 6628766.0 | 1039.230485 |
| 4 | 59.433148 | 24.738664 | 1051173.0 | 6628766.0 | 600.0 |


Finally, we save this data to a local file.

In [6]:
df_locations.to_pickle('./locations.pkl')    

### Foursquare
Now that we have the candidate locations, we use the Foursquare API to find the restaurants in each of them, both overall and those serving Eastern European cuisine.

In [7]:
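import os
import requests

# Foursquare API credentials, read from environment variables (hypothetical names)
# instead of being hard-coded in the notebook
foursquare_client_id = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID')
foursquare_client_secret = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')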

The category IDs for restaurants were taken from the Foursquare category list: https://developer.foursquare.com/docs/resources/categories.

Foursquare places Russian and Ukrainian restaurants, which belong to Eastern European cuisine, in separate categories ("Russian Restaurant" and "Ukrainian Restaurant"), so we add their IDs to the list of Eastern European categories.

In [8]:
# category for all food-related venues
food_category = '4d4b7105d754a06374d81259'

# categories for Eastern European, Russian and Ukrainian restaurants
easteur_restaurant_categories = ['4bf58dd8d48988d109941735','52e928d0bcbc57f1066b7e97',
                                 '58daa1558bbb0b01f18ec1ee','56aa371be4b08b9a8d5734f3',
                                 '52960bac3cf9994f4e043ac4','52e928d0bcbc57f1066b7e98',
                                 '5293a7563cf9994f4e043a44','52e928d0bcbc57f1066b7e9d',
                                 '52e928d0bcbc57f1066b7e9c','52e928d0bcbc57f1066b7e96',
                                 '52e928d0bcbc57f1066b7e9a','52e928d0bcbc57f1066b7e9b']

We are interested in food venues, but only those that are actual restaurants.

Coffee shops, pizzerias, bakeries, etc. are not our direct competitors, so we keep only venues whose category name contains a word that indicates a full-service establishment such as a restaurant. Venues in the Eastern European categories above are always counted as restaurants.

In [9]:
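def is_restaurant(categories, specific_filter=None):
    """Classify a venue from its list of (category_name, category_id) pairs.

    Returns (restaurant, specific): `restaurant` is True for full-service restaurants,
    `specific` is True if any category ID is in `specific_filter`.
    """
    # plain ASCII keywords only: a look-alike Cyrillic letter would silently prevent a match
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse', 'house']
    restaurant = False
    specific = False
    for category_name, category_id in categories:
        category_name = category_name.lower()
        if any(word in category_name for word in restaurant_words):
            restaurant = True
        # fast food places are not considered competitors
        if 'fast food' in category_name:
            restaurant = False
        # venues in the specific (Eastern European) categories always count as restaurants
        if specific_filter is not None and category_id in specific_filter:
            specific = True
            restaurant = True
    return restaurant, specific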


def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]
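A keyword containing a look-alike letter, for example `'нouse'` typed with the Cyrillic `н` (U+043D) instead of the Latin `h`, never matches an English category name, and nothing fails loudly. A small check such as the one below flags such keywords before they are used; `non_ascii_chars` is a hypothetical helper built only on the standard library.

```python
import unicodedata


def non_ascii_chars(words):
    """Map each keyword that contains non-ASCII characters to the Unicode names of those characters."""
    return {w: [unicodedata.name(ch) for ch in w if not ch.isascii()]
            for w in words if not w.isascii()}


# the last keyword deliberately uses the Cyrillic letter 'н'
keywords = ['restaurant', 'diner', 'taverna', 'steakhouse', 'нouse']
print(non_ascii_chars(keywords))
print('нouse' in 'steakhouse')  # substring test fails because of the Cyrillic letter
```

Running such a check on hand-typed keyword lists is cheap and catches a class of bugs that code review easily misses.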

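Next, we define two helpers: `format_address` removes the city and country from the addresses returned by Foursquare, and `get_venues_near_location` queries the Foursquare explore endpoint for venues of a given category around a point.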

In [10]:
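def format_address(location):
    """Join Foursquare's formattedAddress lines and drop the trailing city and country."""
    address = ', '.join(location.get('formattedAddress', []))
    # strip the longer suffix first; otherwise 'Endla 23A, Tallinn, Eesti' would become 'Endla 23A,'
    address = address.replace(', Tallinn, Eesti', '')
    address = address.replace(' Tallinn, Eesti', '')
    return address


def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500,
                             limit=100):
    """Return (id, name, categories, (lat, lng), address, distance) tuples for nearby venues."""
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore'
    params = {'client_id': client_id, 'client_secret': client_secret, 'v': version,
              'll': '{},{}'.format(lat, lon), 'categoryId': category,
              'radius': radius, 'limit': limit}
    try:
        results = requests.get(url, params=params, timeout=30).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]
    except (requests.RequestException, ValueError, KeyError, IndexError):
        # network error, invalid JSON or unexpected response structure
        venues = []
    return venues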
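To make the nested structure that `get_venues_near_location` walks through explicit, here is an offline sketch against a hand-written, hypothetical payload that contains only the fields the notebook reads; it is not a real Foursquare response. `parse_explore` and `strip_city` are illustrative names. The sketch also shows why the longer suffix `', Tallinn, Eesti'` has to be removed before `' Tallinn, Eesti'`.

```python
def strip_city(address):
    """Drop the trailing city and country; the longer suffix must be removed first."""
    for suffix in (', Tallinn, Eesti', ' Tallinn, Eesti'):
        address = address.replace(suffix, '')
    return address


def parse_explore(payload):
    """Flatten an explore-style payload into (id, name, categories, (lat, lng), address, distance)."""
    venues = []
    for item in payload['response']['groups'][0]['items']:
        venue = item['venue']
        loc = venue['location']
        categories = [(c['name'], c['id']) for c in venue['categories']]
        address = strip_city(', '.join(loc.get('formattedAddress', [])))
        venues.append((venue['id'], venue['name'], categories,
                       (loc['lat'], loc['lng']), address, loc['distance']))
    return venues


# Hand-written, hypothetical payload containing only the fields read above
sample = {'response': {'groups': [{'items': [
    {'venue': {
        'id': 'venue-1',
        'name': 'Example Venue',
        'categories': [{'id': 'cat-1', 'name': 'Eastern European Restaurant'}],
        'location': {'lat': 59.437, 'lng': 24.745, 'distance': 42,
                     'formattedAddress': ['Example 1', 'Tallinn', 'Eesti']},
    }},
]}]}}

print(parse_explore(sample))
```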

Now let's go through our locations and find nearby restaurants.

We also keep dictionaries, keyed by venue ID, of all restaurants found and of the Eastern European ones in particular.

In [11]:
def get_restaurants(lats, lons):
    restaurants = {}
    easteur_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):             
        # we use a radius of 350 m to ensure overlap and not to miss restaurants 
        # we use dictionaries to remove duplicates that occur as a result of overlap
        venues = get_venues_near_location(lat, lon, food_category, foursquare_client_id, 
                                          foursquare_client_secret, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_easteur = is_restaurant(venue_categories, 
                                               specific_filter=easteur_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], 
                              venue_address, venue_distance, is_easteur, x, y)
                if venue_distance<=300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_easteur:
                    easteur_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, easteur_restaurants, location_restaurants
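The comment about the 350 m search radius can be checked with a little geometry: on a hexagonal grid with 600 m spacing, every point lies within spacing/√3 ≈ 346 m of the nearest center. The plain-arithmetic sketch below (no API calls) suggests that 350 m query circles leave no gaps, whereas the 300 m circles used for per-location counts only touch one another and miss small areas near the cell corners.

```python
import math

spacing_m = 600        # distance between neighboring location centers
query_radius_m = 350   # radius used in the Foursquare query
count_radius_m = 300   # radius used when counting restaurants per location

# Each center on a hexagonal grid "owns" a hexagon whose corners are spacing / sqrt(3) away.
hex_circumradius = spacing_m / math.sqrt(3)
print(f'hexagon circumradius: {hex_circumradius:.1f} m')
print('query circles cover every cell:', query_radius_m >= hex_circumradius)
print('counting circles cover every cell:', count_radius_m >= hex_circumradius)
```

The gaps left by the 300 m counting circles are small, but they mean a restaurant near a cell corner may appear in no location's count even though the 350 m query found it.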

If the restaurant data were retrieved earlier, we load them from local pickle files.

Otherwise, we query the Foursquare API and save the results to local files.

In [12]:
import pickle

restaurants = {}
easteur_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('easteur_restaurants.pkl', 'rb') as f:
        easteur_restaurants = pickle.load(f)
    with open('location_restaurants.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
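except (OSError, EOFError, pickle.UnpicklingError):
    # no cached data yet (or unreadable files): fall back to the Foursquare API
    pass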

if not loaded:
    restaurants, easteur_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)
    
    with open('restaurants.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('easteur_restaurants.pkl', 'wb') as f:
        pickle.dump(easteur_restaurants, f)
    with open('location_restaurants.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)        
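The load-or-fetch logic above can be factored into a reusable helper. This is a minimal sketch, assuming the cached object is picklable; `load_or_compute` is a hypothetical name, not part of any library.

```python
import pickle
from pathlib import Path


def load_or_compute(path, compute):
    """Return the object pickled at `path`; if it cannot be read, call `compute()` and cache its result."""
    path = Path(path)
    try:
        with path.open('rb') as f:
            return pickle.load(f)
    except (OSError, EOFError, pickle.UnpicklingError):
        # missing or unreadable cache: compute the value and store it for next time
        result = compute()
        with path.open('wb') as f:
            pickle.dump(result, f)
        return result
```

For example, `load_or_compute('restaurants_all.pkl', lambda: get_restaurants(latitudes, longitudes))` would store the three collections as one tuple in a single (hypothetically named) file instead of three separate files.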

Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . done.


Let's summarize the search results.

In [13]:
import numpy as np

print('Total number of restaurants:', len(restaurants))
print('Total number of Eastern European restaurants:', len(easteur_restaurants))
print('Percentage of Eastern European: {:.2f}%'.format(len(easteur_restaurants) / len(restaurants) * 100))

Total number of restaurants: 199
Total number of Eastern European restaurants: 15
Percentage of Eastern European: 7.54%


Let's look at a sample of all the restaurants found...

In [14]:
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:5]:
    print(r)
print('...')
print('Total:', len(restaurants))

List of all restaurants
-----------------------
('4b558a5bf964a520e2e627e3', 'Mandarin', 59.42875506774824, 24.731546189319406, 'Endla 23A,', 38, False, 1050843.412517936, 6628220.600809776)
('4e27eb61e4cd6c6cb3414e12', 'Toidugalerii ReStart', 59.429388039761726, 24.731159052425166, 'Endla 16,', 73, False, 1050811.2679422386, 6628287.370300681)
('599b23d058002c4cce3bab2c', 'Restoran 100', 59.429427, 24.735012, '15 Endla, 10122', 172, False, 1051027.6895597982, 6628323.735024275)
('51ae16d9498e32ebec67578c', 'Nipernaadi', 59.428926696112214, 24.73230591832704, 'Endla 23, 10122', 10, True, 1050883.4106637198, 6628245.8986522565)
('5a8172341fa7634a4cda427b', 'Poke Bowl', 59.429523, 24.743278, '13 Hariduse, 10119', 157, False, 1051491.7852637274, 6628403.166540368)
...
Total: 199


... and only restaurants of Eastern European cuisine.

In [15]:
print('List of Eastern European restaurants')
print('---------------------------')
for r in list(easteur_restaurants.values())[:5]:
    print(r)
print('...')
print('Total:', len(easteur_restaurants))

List of Eastern European restaurants
---------------------------
('51ae16d9498e32ebec67578c', 'Nipernaadi', 59.428926696112214, 24.73230591832704, 'Endla 23, 10122', 10, True, 1050883.4106637198, 6628245.8986522565)
('4bb31e7d715eef3b41f285bb', 'Caravan Restaurant', 59.43359262525735, 24.76025859645835, 'Narva mnt. 7c, 10117', 228, True, 1052381.7203281638, 6628994.691632097)
('548b4dda498e949de7f1a2ad', 'Kivi Paber Käärid', 59.43867894572112, 24.728327883915476, 'Telliskivi 60a C4 (Telliskivi Loomelinnak),', 226, True, 1050500.1159197856, 6629291.154158094)
('5122765fe4b078ac9b2ef4a8', 'Rataskaevu 16', 59.436808, 24.742454, 'Rataskaevu 16, 10123', 167, True, 1051326.2798022463, 6629201.8177785855)
('4b968814f964a52090d234e3', 'Tchaikovsky', 59.43773828726152, 24.74705786922795, 'Vene 9, 10123', 112, True, 1051570.3642979015, 6629343.025351866)
...
Total: 15


Let's map all the restaurants found in the area under consideration, marking Eastern European restaurants in red and all others in blue.

In [16]:
map_tallinn = folium.Map(location=tallinn_center, zoom_start=13)
folium.Marker(tallinn_center, popup='Town Hall Square').add_to(map_tallinn)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_easteur = res[6]
    color = 'red' if is_easteur else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, 
                        fill_opacity=1).add_to(map_tallinn)

map_tallinn

We now have all the restaurants within 1.5 km of Town Hall Square, and we know which of them serve Eastern European cuisine.

The data collection stage is completed.

Now we will use this data for analysis to choose the optimal locations for our restaurant.

## Methodology <a name="methodology"></a>

In this project, we identify locations in the historical part of Tallinn and the surrounding areas that have a low density of restaurants in general and of Eastern European restaurants in particular.

We will limit our search (analysis) to an area with a radius of 1.5 km from the "heart" of old Tallinn - Town Hall Square.

At the first stage, we collected the necessary data - the location of all restaurants within 1.5 km from Town Hall Square, as well as restaurants of Eastern European cuisine (in accordance with the Foursquare categorization).

In the second stage, we calculate and study the "density" of restaurants across the area in order to identify several promising locations near Town Hall Square with few restaurants in general and no Eastern European restaurants nearby. To do this, we will count the restaurants around each location, measure the distance from each location to the nearest Eastern European restaurant, and visualize the density with heat maps.

Thus, the results of this project will be the starting point for a detailed "street-level" study which, together with additional factors beyond the scope of this project, will determine the optimal location for the first foreign restaurant of the company "EEC".

## Analysis <a name="analysis"></a>

We will conduct a basic analysis of our data and get additional information for the purposes of our research.

First, let's count the restaurants within 300 m of the center of each location in the area under consideration and calculate the average over all locations.

In [17]:
location_restaurants_count = [len(res) for res in location_restaurants]
df_locations['Restaurants in area'] = location_restaurants_count
print('Average number of restaurants in every area with radius=300m:', 
      np.array(location_restaurants_count).mean())

df_locations.head()

Average number of restaurants in every area with radius=300m: 9.368421052631579


| | Latitude | Longitude | X | Y | Distance from center | Restaurants in area |
|---|---|---|---|---|---|---|
| 0 | 59.428941 | 24.732119 | 1050873.0 | 6628246.0 | 1200.0 | 4 |
| 1 | 59.428157 | 24.742541 | 1051473.0 | 6628246.0 | 1039.230485 | 4 |
| 2 | 59.427371 | 24.752964 | 1052073.0 | 6628246.0 | 1200.0 | 7 |
| 3 | 59.433932 | 24.72824 | 1050573.0 | 6628766.0 | 1039.230485 | 8 |
| 4 | 59.433148 | 24.738664 | 1051173.0 | 6628766.0 | 600.0 | 4 |


Now we will calculate, for each location in the area under consideration, the distance to the nearest Eastern European restaurant (not only within 300 meters of the location center)...

In [18]:
distances_to_easteur_restaurant = []

# For each location center, find the nearest Eastern European restaurant.
# The search starts from 1500 m, so larger distances are capped at 1500 m.
for area_x, area_y in zip(xs, ys):
    min_distance = 1500
    for res in easteur_restaurants.values():
        res_x, res_y = res[7], res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        min_distance = min(min_distance, d)
    distances_to_easteur_restaurant.append(min_distance)

df_locations['Distance to Eastern European restaurant'] = distances_to_easteur_restaurant
df_locations.head()

| | Latitude | Longitude | X | Y | Distance from center | Restaurants in area | Distance to Eastern European restaurant |
|---|---|---|---|---|---|---|---|
| 0 | 59.428941 | 24.732119 | 1050873.0 | 6628246.0 | 1200.0 | 4 | 10.780818 |
| 1 | 59.428157 | 24.742541 | 1051473.0 | 6628246.0 | 1039.230485 | 4 | 589.219295 |
| 2 | 59.427371 | 24.752964 | 1052073.0 | 6628246.0 | 1200.0 | 7 | 810.033662 |
| 3 | 59.433932 | 24.72824 | 1050573.0 | 6628766.0 | 1039.230485 | 8 | 530.570025 |
| 4 | 59.433148 | 24.738664 | 1051173.0 | 6628766.0 | 600.0 | 4 | 383.845799 |
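The nested loop above can also be written with NumPy broadcasting. This is a minimal sketch that assumes `calc_xy_distance` is the plain Euclidean distance in projected metres (its definition is not shown in this section). It keeps the same 1,500 m cap that the loop uses as its starting value. `nearest_distances` is a hypothetical helper:

```python
import numpy as np


def nearest_distances(area_xy, target_xy, cap=1500.0):
    """Distance from each area center to its nearest target point, capped at `cap` metres."""
    areas = np.asarray(area_xy, dtype=float)
    targets = np.asarray(target_xy, dtype=float)
    if targets.size == 0:
        return np.full(len(areas), cap)  # no targets at all: every area gets the cap
    diff = areas[:, None, :] - targets[None, :, :]  # shape (A, T, 2)
    dist = np.hypot(diff[..., 0], diff[..., 1])     # Euclidean distances, shape (A, T)
    return np.minimum(dist.min(axis=1), cap)


areas = [(0.0, 0.0), (3000.0, 0.0)]
eastern_european = [(300.0, 400.0)]
result = nearest_distances(areas, eastern_european)
print(result)
```

The first area is exactly 500 m from the restaurant. The second is about 2.7 km away and is therefore recorded as 1,500 m. Because of this cap, the average computed in the next cell can only underestimate the true mean distance for locations far from any Eastern European restaurant.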


...and then print the average value.

In [19]:
print('Average distance to closest Eastern European restaurant from each area center:', 
      df_locations['Distance to Eastern European restaurant'].mean())

Average distance to closest Eastern European restaurant from each area center: 461.58586770522226


Thus, on average, the nearest Eastern European restaurant is less than 500 meters (about 460 m) from a location center. This is quite close, so promising locations need to be examined carefully.

To do this, we will create a heat map of restaurant density and try to extract from it the information relevant to this project.

For easier visual orientation, we will draw circles on the map at distances of 0.5 km, 1 km and 1.5 km from Town Hall Square.

In [20]:
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]
easteur_latlons = [[res[2], res[3]] for res in easteur_restaurants.values()]

In [21]:
from folium.plugins import HeatMap

map_tallinn = folium.Map(location=tallinn_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_tallinn)
HeatMap(restaurant_latlons).add_to(map_tallinn)
folium.Marker(tallinn_center).add_to(map_tallinn)
folium.Circle(tallinn_center, radius=500, fill=False, color='white').add_to(map_tallinn)
folium.Circle(tallinn_center, radius=1000, fill=False, color='white').add_to(map_tallinn)
folium.Circle(tallinn_center, radius=1500, fill=False, color='white').add_to(map_tallinn)

map_tallinn

We see that in the western and northern parts of the area under consideration there are locations with a relatively low density of restaurants.
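The heat map gives a visual impression only. One way to put a number on "low density" is a Gaussian kernel density estimate evaluated at each location center. This is an assumption for illustration and not necessarily what folium's `HeatMap` computes internally. The 200 m bandwidth is an arbitrary choice, and `kernel_density` is a hypothetical helper working in projected X/Y metres:

```python
import numpy as np


def kernel_density(query_xy, points_xy, bandwidth=200.0):
    """Gaussian kernel density (points per square metre) at each query location.

    Coordinates are assumed to be projected X/Y in metres.
    """
    q = np.asarray(query_xy, dtype=float)[:, None, :]
    p = np.asarray(points_xy, dtype=float)[None, :, :]
    sq_dist = ((q - p) ** 2).sum(axis=2)
    weights = np.exp(-sq_dist / (2 * bandwidth ** 2))
    # 2-D Gaussian normalisation, so each point contributes a total mass of 1
    return weights.sum(axis=1) / (2 * np.pi * bandwidth ** 2)


query = [(0.0, 0.0), (1000.0, 0.0)]
points = [(0.0, 0.0), (50.0, 0.0), (900.0, 0.0)]
density = kernel_density(query, points)
print(density * 1e6)  # approximate restaurants per square kilometre around each query point
```

Applied to the location centers (`xs`, `ys`) and the restaurant X/Y coordinates, this would give a density value per location that can be sorted or filtered like the other columns of `df_locations`.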

Now let's create a heat map of Eastern European restaurants only.

In [22]:
map_tallinn = folium.Map(location=tallinn_center, zoom_start=13)
folium.Marker(tallinn_center, popup='Town Hall Square').add_to(map_tallinn)
folium.TileLayer('cartodbpositron').add_to(map_tallinn)
HeatMap(easteur_latlons).add_to(map_tallinn)
folium.Circle(tallinn_center, radius=500, fill=False, color='white').add_to(map_tallinn)
folium.Circle(tallinn_center, radius=1000, fill=False, color='white').add_to(map_tallinn)
folium.Circle(tallinn_center, radius=1500, fill=False, color='white').add_to(map_tallinn)

map_tallinn

This map shows many locations with no Eastern European restaurants nearby.

Combining this with the density map of restaurants of all categories, we can identify the highest-priority locations for our restaurant.
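The two columns computed earlier can turn this visual assessment into a shortlist. The following is a minimal sketch. The thresholds (at most 5 restaurants within 300 m, and the nearest Eastern European restaurant at least 400 m away) are illustrative assumptions, not values used in this analysis. The hypothetical `shortlist` helper expects a frame with the same column names as `df_locations`:

```python
import pandas as pd


def shortlist(df, max_restaurants=5, min_easteur_distance=400.0):
    """Locations with few restaurants nearby and no close Eastern European competitor.

    The default thresholds are illustrative assumptions, not values from this analysis.
    """
    mask = (df['Restaurants in area'] <= max_restaurants) & (
        df['Distance to Eastern European restaurant'] >= min_easteur_distance
    )
    return df[mask].sort_values('Distance to Eastern European restaurant', ascending=False)


# The five rows printed above, used here only to illustrate the filter
sample = pd.DataFrame({
    'Restaurants in area': [4, 4, 7, 8, 4],
    'Distance to Eastern European restaurant': [10.780818, 589.219295, 810.033662,
                                                530.570025, 383.845799],
})
print(shortlist(sample))
```

On these five sample rows, only location 1 passes both conditions. On the full `df_locations`, the thresholds would likely need tuning against what the heat maps show.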

## Results <a name="results"></a>

Our research shows that there is a relatively large number of restaurants of different categories in the historical part of Tallinn and the surrounding area (~200 in our area of interest, a circle with a radius of 1.5 km around Town Hall Square).

The largest concentration of restaurants was found in the center and to the east of Town Hall Square, while there are locations with a relatively low density of restaurants in the west and north.

As the analysis shows, the density of Eastern European restaurants is quite low throughout the area under consideration.

## Discussion <a name="discussion"></a>

On the one hand, the locations to the west and north of Town Hall Square can be prioritized for further, more detailed research, including fieldwork.

However, this does not mean that these locations are the optimal places for our restaurant. There may be good reasons for the low density of restaurants in these locations, and those reasons could rule out opening our restaurant there.

The recommended locations should be considered only as a starting point for further, more detailed analysis.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to determine the most promising locations for placing an Eastern European cuisine restaurant in the historical part of Tallinn, taking into account the presence of restaurants in general and with Eastern European cuisine in particular in the area under consideration.

Having calculated the density of restaurants from Foursquare data, we have identified the locations (to the west and north of Town Hall Square) that will serve as starting points for a detailed study.

The final decision on the location of the restaurant will be made by the interested parties, based on additional information about the locations and the competitive environment gathered during on-site study.