# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data and Methodology](#data)
* [Analysis](#analysis)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

This project addresses the problem of finding a good location for a coworking office, taking nearby restaurants into account.
The idea is to find an area with many restaurants and few or no existing coworking offices.

## Data and Methodology <a name="data"></a>

Any city can be used in this study; I have chosen Saint Petersburg, one of the most beautiful cities in Russia!
Using Foursquare, we will find restaurants and coworking offices around the city center and then group the restaurants with k-means clustering.
Afterwards, we will plot these groups and analyze them in terms of the number of coworking offices and restaurants.
Let's proceed!

## Analysis <a name="analysis"></a>

In [1]:
!pip install geopy



In [120]:
import requests
from geopy.geocoders import Nominatim

def get_coordinates(address):
    try:
        geolocator = Nominatim(user_agent="Capstone")
        location = geolocator.geocode(address)
        return [location.latitude, location.longitude]
    except Exception:  # geocoding failed or returned no match (location is None)
        return [None, None]
    
address = 'London, UK'
city_center = get_coordinates(address)
print('Coordinate of {}: {}'.format(address, city_center))

Coordinate of London, UK: [51.5073219, -0.1276474]


In [121]:
!pip install shapely
import shapely.geometry

!pip install pyproj
import pyproj

import math

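def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    # NOTE: UTM zone 33 covers 12°E-18°E; use the zone that contains your city (here and in xy_to_lonlat)
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')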
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('City center longitude={}, latitude={}'.format(city_center[1], city_center[0]))
x, y = lonlat_to_xy(city_center[1], city_center[0])
print('City center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('City center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
City center longitude=-0.1276474, latitude=51.5073219
City center UTM X=-547012.468459844, Y=5815556.876143048
City center longitude=-0.12764739999999858, latitude=51.5073219


Let's create a **hexagonal grid of cells**: we offset every other row and adjust the vertical row spacing so that **every cell center is equally distant from all its neighbors**.

In [122]:
city_center_x, city_center_y = lonlat_to_xy(city_center[1], city_center[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = city_center_x - 6000
x_step = 600
y_min = city_center_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(20/k)):
    y = y_min + i * y_step
    x_offset = 600 if i%2==0 else 0  # NOTE: this equals x_step, so rows stay aligned; x_step/2 would stagger them hexagonally
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(city_center_x, city_center_y, x, y)
        if (distance_from_center <= 3001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

89 candidate neighborhood centers generated.
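
The equal-distance property holds when alternate rows are shifted by half the horizontal step and rows are spaced `step * sqrt(3) / 2` apart; shifting by a full step would leave every row on the same columns. The sketch below checks this on an idealized grid with its origin at (0, 0). `hex_center` and `neighbor_distances` are hypothetical helpers written for this illustration, not part of the notebook.

```python
import math

step = 600.0                           # horizontal distance between neighboring centers (meters)
row_height = step * math.sqrt(3) / 2   # vertical distance between rows

def hex_center(row, col):
    """Return (x, y) of a cell center; odd rows are shifted by half a step."""
    x_offset = step / 2 if row % 2 else 0.0
    return col * step + x_offset, row * row_height

def neighbor_distances(row, col):
    """Distances from one center to its six surrounding centers."""
    shift = 0 if row % 2 else -1  # column index of diagonal neighbors depends on row parity
    neighbors = [(row, col - 1), (row, col + 1),
                 (row - 1, col + shift), (row - 1, col + shift + 1),
                 (row + 1, col + shift), (row + 1, col + shift + 1)]
    x0, y0 = hex_center(row, col)
    return [math.hypot(x - x0, y - y0)
            for x, y in (hex_center(r, c) for r, c in neighbors)]

print([round(d, 6) for d in neighbor_distances(4, 4)])
```

All six distances should equal the horizontal step, which is what makes circles of radius `step / 2` around the centers touch without overlapping.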


Let's visualize the data we have so far: city center location and candidate neighborhood centers:

In [123]:
!pip install folium
import folium



In [124]:
map_city = folium.Map(location=city_center, zoom_start=13)
folium.Marker(city_center, popup='City center').add_to(map_city)
for lat, lon in zip(latitudes, longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_city) 
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_city)
    folium.Marker([lat, lon]).add_to(map_city)
map_city

In [125]:
import pandas as pd
df_locations = pd.DataFrame({'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})
df_locations.head(10)

|   | Latitude | Longitude | X | Y | Distance from center |
|---|---|---|---|---|---|
| 0 | 51.482567 | -0.136688 | -548212.46846 | 5812959.0 | 2861.817604 |
| 1 | 51.483669 | -0.128347 | -547612.46846 | 5812959.0 | 2666.458325 |
| 2 | 51.484771 | -0.120006 | -547012.46846 | 5812959.0 | 2598.076211 |
| 3 | 51.485871 | -0.111664 | -546412.46846 | 5812959.0 | 2666.458325 |
| 4 | 51.486972 | -0.103321 | -545812.46846 | 5812959.0 | 2861.817604 |
| 5 | 51.485974 | -0.146558 | -548812.46846 | 5813478.0 | 2749.545417 |
| 6 | 51.487077 | -0.138217 | -548212.46846 | 5813478.0 | 2400.0 |
| 7 | 51.488179 | -0.129875 | -547612.46846 | 5813478.0 | 2163.330765 |
| 8 | 51.489281 | -0.121533 | -547012.46846 | 5813478.0 | 2078.460969 |
| 9 | 51.490382 | -0.113191 | -546412.46846 | 5813478.0 | 2163.330765 |


### Foursquare
Now we will find all restaurants and coworking offices around each candidate point. We query point by point because a single API call returns at most 50 venues, so the API has to be called several times.

In [129]:
food_category     = '4d4b7105d754a06374d81259' 
coworking_office  = '4bf58dd8d48988d174941735'

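def get_categories(categories):
    # Helper assumed here: reduce Foursquare category objects to (name, id) pairs
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    # Helper assumed here: join the 'formattedAddress' lines, if present, into one string
    return ', '.join(location.get('formattedAddress', []))

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=50):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]
    except (requests.RequestException, KeyError, IndexError, TypeError, ValueError):
        # The request failed or the response did not have the expected shape
        venues = []
    return venues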

In [132]:
import pickle
foursquare_client_id = 'YOUR_FOURSQUARE_CLIENT_ID'          # Put your own Foursquare client ID here
foursquare_client_secret = 'YOUR_FOURSQUARE_CLIENT_SECRET'  # Put your own client secret here; do not publish real credentials
def get_objects(lats, lons, objects_category):
    objects = {}
    location_set = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        venues = get_venues_near_location(lat, lon, objects_category, foursquare_client_id, foursquare_client_secret, radius=350, limit=50)
        area_objects = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
            object_ = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, x, y)
            if venue_distance<=300:
                area_objects.append(object_)
            objects[venue_id] = object_
        location_set.append(area_objects)
        print(' .', end='')
    print(' done. Found: {}'.format(len(objects)))
    return objects, location_set


restaurants, location_restaurants = get_objects(latitudes, longitudes, food_category)     
offices, location_offices = get_objects(latitudes, longitudes, coworking_office)  

Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done. Found: 0
Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done. Found: 0
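
Because the 350 m search radius exceeds half of the 600 m grid spacing, neighboring queries overlap and the same venue can be returned by several calls. `get_objects` handles this by keying its results on the venue ID. Below is a minimal sketch of that deduplication step in isolation. `merge_venues` is a hypothetical helper, and the venue IDs and names are made up for illustration.

```python
def merge_venues(batches):
    """Merge per-location venue lists into one dict keyed by venue ID."""
    merged = {}
    for batch in batches:
        for venue in batch:
            venue_id = venue[0]       # first tuple element is the venue ID, as in get_objects
            merged[venue_id] = venue  # a repeated ID overwrites the earlier copy
    return merged

# Hypothetical results of two overlapping searches
batch_a = [('v1', 'Cafe A'), ('v2', 'Bistro B')]
batch_b = [('v2', 'Bistro B'), ('v3', 'Diner C')]

unique = merge_venues([batch_a, batch_b])
print(len(unique), sorted(unique))
```

The venue that appears in both batches is stored once, so the final count reflects distinct venues rather than the number of API hits.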


In [83]:
map_city = folium.Map(location=city_center, zoom_start=14)

rest_latitudes = []
rest_longitudes = []
rest_x = []
rest_y = []

for res in restaurants.values():
    lat = res[2]; lon = res[3]
    rest_latitudes.append(lat)
    rest_longitudes.append(lon)
    (x, y) = lonlat_to_xy(lon, lat)
    rest_x.append(x)
    rest_y.append(y)
    folium.CircleMarker([lat, lon], radius=3, color='red', fill=True, fill_color='red', fill_opacity=1).add_to(map_city)

off_latitudes = []
off_longitudes = []
off_x = []
off_y = []
for res in offices.values():
    lat = res[2]; lon = res[3]
    off_latitudes.append(lat)
    off_longitudes.append(lon)    
    (x, y) = lonlat_to_xy(lon, lat)
    off_x.append(x)
    off_y.append(y)
    folium.CircleMarker([lat, lon], radius=3, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_city)
map_city

In [40]:
df_restaurants_locations = pd.DataFrame({'Longitude':rest_longitudes,
                                        'Latitude':rest_latitudes,
                                        'X':rest_x,
                                        'Y':rest_y})

df_offices_locations = pd.DataFrame({'Longitude':off_longitudes,
                                     'Latitude':off_latitudes,
                                     'X':off_x,
                                     'Y':off_y})

In [41]:
def count_objects_nearby(x, y, objectsXY, radius=500):    
    count = 0
    for res in objectsXY:
        d = calc_xy_distance(x, y, res[0], res[1])
        if d<=radius:
            count += 1
    return count

In [42]:
from folium import plugins
from folium.plugins import HeatMap
from sklearn.cluster import KMeans
from random import randrange

number_of_clusters = 57
kmeans = KMeans(n_clusters=number_of_clusters).fit(df_restaurants_locations[['Longitude', 'Latitude']].values)
cluster_centers = kmeans.cluster_centers_
cluster_labels = kmeans.labels_
cluster_colors = []

cluster_centersXY = []
for c in cluster_centers:
    cluster_centersXY.append(lonlat_to_xy(*c))

In [43]:
cluster_off_count = []
cluster_rest_count = []

off_locs = df_offices_locations[['X', 'Y']].values
rest_locs = df_restaurants_locations[['X', 'Y']].values

for c in range(number_of_clusters):
    x,y = lonlat_to_xy(*cluster_centers[c])
    count = count_objects_nearby(x, y, off_locs)
    cluster_off_count.append(count)
    
    count = count_objects_nearby(x, y, rest_locs)
    cluster_rest_count.append(count)  
    
    cluster_colors.append((randrange(0, 255, 1) ,randrange(0, 255, 1), randrange(0, 255, 1)))

In [44]:
df_clusters = pd.DataFrame({'Latitude':cluster_centers[:,1],   # KMeans was fit on [Longitude, Latitude], so column 1 is latitude
                            'Longitude':cluster_centers[:,0],  # and column 0 is longitude
                            'Restaurants nearby':cluster_rest_count,
                            'Coworkings count':cluster_off_count})

In [45]:
def rgbToHex(r,g,b):
    return '#%02x%02x%02x' % (r, g, b)

In [46]:
map_city = folium.Map(location=city_center, zoom_start=13)

for ind, (lat,lon) in enumerate(zip(rest_latitudes, rest_longitudes)):
    colorRGB = rgbToHex(*cluster_colors[cluster_labels[ind]])
    folium.CircleMarker([lat, lon], radius=5, color=colorRGB, fill=True, fill_color='blue', fill_opacity=1).add_to(map_city)

for ind, (lon,lat) in enumerate(cluster_centers):
    colorRGB = rgbToHex(*cluster_colors[ind])
    folium.CircleMarker([lat, lon], radius=7, color=colorRGB, fill=True, fill_color='red', fill_opacity=0.8).add_child(folium.Popup('Coworking offices within 500 meters: {}'.format(cluster_off_count[ind]))).add_to(map_city)

for res in offices.values():
    lat = res[2]; lon = res[3]
    folium.CircleMarker([lat, lon], radius=3, color='black', fill=True, fill_color='white', fill_opacity=1).add_to(map_city)

map_city

In [51]:
fr = df_clusters.sort_values(by=['Coworkings count', 'Restaurants nearby'], ascending=True).head(5)
fr

|   | Latitude | Longitude | Restaurants nearby | Coworkings count |
|---|---|---|---|---|
| 38 | 52.520643 | 13.432199 | 13 | 0 |
| 35 | 52.522674 | 13.367875 | 5 | 1 |
| 22 | 52.541859 | 13.439424 | 10 | 1 |
| 33 | 52.518708 | 13.440529 | 15 | 1 |
| 48 | 52.536077 | 13.376766 | 15 | 2 |


In [52]:
map_city = folium.Map(location=city_center, zoom_start=13)

for ind, (lat,lon) in enumerate(zip(rest_latitudes, rest_longitudes)):
    colorRGB = rgbToHex(*cluster_colors[cluster_labels[ind]])
    folium.CircleMarker([lat, lon], radius=5, color=colorRGB, fill=True, fill_color='blue', fill_opacity=1).add_to(map_city)

for ind, (lon,lat) in enumerate(cluster_centers):
    colorRGB = rgbToHex(*cluster_colors[ind])
    folium.CircleMarker([lat, lon], radius=7, color=colorRGB, fill=True, fill_color='red', fill_opacity=0.8).add_child(folium.Popup('Coworking offices within 500 meters: {}'.format(cluster_off_count[ind]))).add_to(map_city)

for res in offices.values():
    lat = res[2]; lon = res[3]
    folium.CircleMarker([lat, lon], radius=3, color='black', fill=True, fill_color='white', fill_opacity=1).add_to(map_city)

colors = ['red', 'blue', 'green', 'purple', 'orange']
for ind, item in enumerate(fr.values):
    lat = item[0]; lon = item[1]  # columns 0 and 1 are Latitude and Longitude
    folium.Marker(location=[lat, lon], popup='Candidate location {}'.format(ind + 1), icon=folium.Icon(color=colors[ind])).add_to(map_city)

map_city
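
The ranking in cell `In [51]` sorts both columns in ascending order, so among clusters with the same number of coworking offices it lists those with the fewest restaurants first. Since the stated goal is an area with many restaurants and few coworking offices, one might instead sort the restaurant count in descending order. Below is a minimal sketch in plain Python that uses only the five rows shown in the table above, not the full `df_clusters`. `rank_candidates` is a hypothetical helper.

```python
# Rows copied from the ranking table: (cluster index, restaurants nearby, coworkings count)
clusters = [(38, 13, 0), (35, 5, 1), (22, 10, 1), (33, 15, 1), (48, 15, 2)]

def rank_candidates(rows):
    """Sort by fewest coworking offices first, then by most restaurants."""
    return sorted(rows, key=lambda row: (row[2], -row[1]))

ranked = rank_candidates(clusters)
for cluster_id, restaurants_nearby, coworkings in ranked:
    print(cluster_id, restaurants_nearby, coworkings)
```

With pandas, the same ordering could be requested by passing `ascending=[True, False]` to `sort_values`.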

## Conclusion <a name="conclusion"></a> 
Although the results show that the city center is crowded with coworking offices, we can still choose among the recommended locations.

To choose the best location for a coworking office, various parameters such as nearby restaurants can be combined with ML algorithms such as k-means clustering. This is a powerful approach that is simple to set up and quick to yield results.