# Capstone Project - The Battle of the Neighborhoods
### Introduction
St. Petersburg is one of the most beautiful cities, with many parks and green areas.
I bought an apartment on the outskirts of St. Petersburg 10 years ago. There were a lot of green areas nearby, but over the past 10 years the area has become heavily built up, leaving fewer and fewer places to relax. My area is not the only one in this situation.
Environmental issues should come first. In this project I try to show where the problem areas are. Perhaps the city government could use a similar approach when planning construction.
### Data
This notebook is heavily inspired by an example notebook (https://www.kaggle.com/aquadrox/week-4-capstone-the-battle-of-the-neighborhoods). I'll keep the idea of partitioning the city into areas and then building a heatmap to find the best area.

I will change some of the inputs:

Country / city: Russia / St. Petersburg
Objective: find areas without access to parks
So, I will be cross-referencing park locations with the grid of candidate areas.

I will be using the following APIs and libraries:

Foursquare API: finding parks
geopy: geocoding (converting an address into coordinates)

#### Neighborhood Candidates
Let's create latitude & longitude coordinates for the centroids of our candidate neighborhoods. We will create a grid of cells covering our area of interest, which extends approx. 14 km around the St. Petersburg city center.

Let's first find the latitude & longitude of the St. Petersburg center using a geocoding API.

##### Imports

In [1]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation


# !pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# JSON responses are flattened into dataframes with pd.json_normalize (pandas >= 1.0)


# ! pip install folium==0.5.0
import folium # plotting library

#!pip install pyproj
import pyproj
import math
import warnings
warnings.simplefilter("ignore")

##### City center coordinates

In [2]:
address = 'Saint-Petersburg, Russia'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
spb_center_latitude = location.latitude
spb_center_longitude = location.longitude
print(spb_center_latitude, spb_center_longitude)

59.938732 30.316229


##### Basic functions

In [3]:
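# Projection definitions shared by both conversion functions.
# NOTE: UTM zone 33 spans 12°E to 18°E, while St. Petersburg (about 30.3°E) lies in zone 36,
# so a grid built in this projection is rotated and slightly scaled relative to the ground.
proj_latlon = pyproj.Proj(proj='latlong', datum='WGS84')
proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
# Transformer replaces pyproj.transform, which is deprecated since pyproj 2.6.1
to_xy = pyproj.Transformer.from_proj(proj_latlon, proj_xy, always_xy=True)
to_lonlat = pyproj.Transformer.from_proj(proj_xy, proj_latlon, always_xy=True)

def lonlat_to_xy(lon, lat):
    # Longitude/latitude in degrees -> UTM X/Y in meters
    x, y = to_xy.transform(lon, lat)
    return x, y

def xy_to_lonlat(x, y):
    # UTM X/Y in meters -> longitude/latitude in degrees
    lon, lat = to_lonlat.transform(x, y)
    return lon, lat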

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Saint-Petersburg center longitude={}, latitude={}'.format(spb_center_longitude, spb_center_latitude))
x, y = lonlat_to_xy(spb_center_longitude, spb_center_latitude)
print('Saint-Petersburg center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Saint-Petersburg center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Saint-Petersburg center longitude=30.316229, latitude=59.938732
Saint-Petersburg center UTM X=1350717.6399045559, Y=6743888.192438476
Saint-Petersburg center longitude=30.316229000000003, latitude=59.938731999999995


In [4]:
def get_categories(categories):
    # Return (name, id) pairs for a venue's category list
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    # Join the address lines returned by Foursquare into a single string
    address = ', '.join(location['formattedAddress'])
    return address

def get_venues_near_location(lat, lon):
    # Query Foursquare for 'Park' venues within 500 m of (lat, lon)
    client_id = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
    client_secret = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
    version = '20180724'
    category = '4bf58dd8d48988d163941735' # Foursquare category ID for 'Park'
    radius = 500
    limit = 100
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   item['venue']['location']['lat'],
                   item['venue']['location']['lng'],
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]
    except (requests.RequestException, KeyError, IndexError, ValueError):
        # Network errors and malformed or error responses are treated as "no parks found",
        # so API failures cannot be told apart from genuinely empty cells
        venues = []
    return venues
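
A minimal sketch, not part of the original notebook, of keeping the Foursquare credentials out of the source by reading them from environment variables and letting `urlencode` assemble the query string. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `build_explore_url` are hypothetical; the endpoint and parameters are the ones used in `get_venues_near_location`.

```python
import os
from urllib.parse import urlencode

# Hypothetical environment variable names; set them before running the notebook
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")

PARK_CATEGORY_ID = "4bf58dd8d48988d163941735"  # 'Park' category ID used in this notebook


def build_explore_url(lat, lon, radius=500, limit=100, version="20180724"):
    """Return the venues/explore URL for parks around (lat, lon)."""
    params = {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "v": version,
        "ll": "{},{}".format(lat, lon),
        "categoryId": PARK_CATEGORY_ID,
        "radius": radius,
        "limit": limit,
    }
    # urlencode escapes the values (for example the comma in "ll")
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


url = build_explore_url(59.938732, 30.316229)
print(url)
```

The resulting URL could be passed to `requests.get` exactly as in the function above.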

In [28]:
# Foursquare credentials and API version (same values as in get_venues_near_location)
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180724'
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 25000 # search radius in meters
category_id = '4bf58dd8d48988d163941735' # Foursquare category ID for 'Park'

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, spb_center_latitude, spb_center_longitude, VERSION, category_id, radius, LIMIT)
results = requests.get(url).json()
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = pd.json_normalize(venues)
dataframe.head()

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,location.crossStreet,location.labeledLatLngs,location.neighborhood,venuePage.id
0,4c0c6b85d64c0f471a47255d,Alexander Garden (Александровский сад),"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",v-1622712071,False,Адмиралтейский просп.,59.936598,30.309554,441,190000.0,RU,Санкт-Петербург,Санкт-Петербург,Россия,"[Адмиралтейский просп., 190000, Санкт-Петербур...",,,,
1,4c162f2d82a3c9b666dbfff8,Mikhailovsky Garden (Михайловский сад),"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",v-1622712071,False,Садовая ул.,59.939925,30.332693,927,191186.0,RU,Санкт-Петербург,Санкт-Петербург,Россия,"[Садовая ул. (наб. реки Мойки), 191186, Санкт-...",наб. реки Мойки,,,
2,55053a5c498e843f63882aea,Ближний парк,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",v-1622712071,False,,60.07557,29.967792,24657,,RU,,,Россия,[Россия],,"[{'label': 'display', 'lat': 60.07556977755985...",,
3,4c8b47d73dc2a1cd5cefb432,Alexander Park (Александровский парк),"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",v-1622712071,False,Кронверкский просп.,59.955977,30.319669,1929,190000.0,RU,Санкт-Петербург,Санкт-Петербург,Россия,"[Кронверкский просп. (Кронверская наб.), 19000...",Кронверская наб.,,,
4,4beff6b3c8d920a177439430,Saint Petersburg's 300th Anniversary Park (Пар...,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",v-1622712071,False,"Приморский просп., 74",59.982995,30.199348,8167,,RU,Санкт-Петербург,Санкт-Петербург,Россия,"[Приморский просп., 74, Санкт-Петербург, Россия]",,"[{'label': 'display', 'lat': 59.982995, 'lng':...",,


In [69]:
dataframe.shape

(50, 19)

In [67]:
venues_map = folium.Map(location=[spb_center_latitude, spb_center_longitude], zoom_start=11) 

# add the parks as blue circle markers
for lat, lng in zip(dataframe['location.lat'], dataframe['location.lng']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup= 'park',
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

##### City partitioning
Now let's create a grid of candidate areas, equally spaced, centered on the city center and within ~14 km of it. Our neighborhoods will be defined as circular areas with a radius of 500 meters, so our neighborhood centers will be 1000 meters apart.

To calculate distances accurately, we need to create our grid of locations in a 2D Cartesian coordinate system, which lets us work in meters rather than in latitude/longitude degrees. Then we project those coordinates back to latitude/longitude degrees to show them on a Folium map. The functions lonlat_to_xy and xy_to_lonlat defined above convert between the WGS84 geographic coordinate system (latitude/longitude degrees) and the UTM Cartesian coordinate system (X/Y coordinates in meters).
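
As a side note, the UTM zone that contains a given longitude follows from the standard layout of 6°-wide zones. A minimal sketch, where `utm_zone` is a hypothetical helper that is not part of the notebook and that ignores the special zones around Norway and Svalbard:

```python
import math


def utm_zone(lon):
    """Return the standard UTM zone number (1-60) for a longitude in degrees."""
    # Zone 1 starts at 180°W and each zone is 6 degrees wide
    return int(math.floor((lon + 180.0) / 6.0)) % 60 + 1


# Longitude of the St. Petersburg center found by the geocoder
print(utm_zone(30.316229))
```

For the city center's longitude this gives zone 36, whereas the conversion functions in this notebook use zone 33.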

Let's create a hexagonal grid of cells: we offset every other row and adjust the vertical row spacing so that every cell center is equally distant from all its neighbors.
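
The hexagonal layout can be sketched on its own, without any projection or API calls. This is a simplified illustration under stated assumptions, not the notebook's exact loop: `hex_grid_centers` is a hypothetical helper, and the grid here is built symmetrically around the center.

```python
import math


def hex_grid_centers(center_x, center_y, cell_radius, max_distance):
    """Return (x, y) centers of a hexagonal grid within max_distance of the center."""
    k = math.sqrt(3) / 2          # row spacing factor for a hexagonal layout
    x_step = 2 * cell_radius      # distance between centers within a row
    y_step = 2 * cell_radius * k  # distance between adjacent rows
    n = int(math.ceil(max_distance / y_step)) + 1
    centers = []
    for i in range(-n, n + 1):
        y = center_y + i * y_step
        x_offset = cell_radius if i % 2 else 0.0  # shift every other row by half a step
        for j in range(-n, n + 1):
            x = center_x + j * x_step + x_offset
            if math.hypot(x - center_x, y - center_y) <= max_distance:
                centers.append((x, y))
    return centers


# 500 m cells within 14 km of an origin placed at (0, 0)
centers = hex_grid_centers(0.0, 0.0, 500.0, 14000.0)
print(len(centers), "cell centers")
```

With a row spacing of `2 * cell_radius * sqrt(3) / 2`, each interior center has six neighbors, all at a distance of `2 * cell_radius`.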

Let's visualize the data we have so far, the candidate neighborhood centers:

In [5]:
spb_center_x, spb_center_y = lonlat_to_xy(spb_center_longitude, spb_center_latitude) # City center in Cartesian coordinates
nb_k = 1000 # Upper bound on cells per row; the distance check below trims the grid
radius = 500 # Neighborhood radius in meters
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
# NOTE: the grid starts 15 * radius = 7.5 km from the center on the low-X side, so the 14 km circle is truncated there
x_min = spb_center_x - radius * 15
x_step = radius*2
y_min = spb_center_y - radius * 2 - (int(nb_k/k)*k*radius*2 - radius*20)/2
y_step = radius*2 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
rows = [] # One dict per (cell, park) pair, or a single dict for a cell without parks
n = 0

for i in range(0, int(nb_k/k)):
    y = y_min + i * y_step
    x_offset = radius if i%2==0 else 0
    for j in range(0, nb_k):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(spb_center_x, spb_center_y, x, y)
        if (distance_from_center <= 14001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)
            n = n + 1
            
            v = get_venues_near_location(lat, lon)
            for item in v:
                rows.append({'N': n, 'lat': lat, 'lng': lon, 'id': item[0], 'name': item[1], 'vlat': item[2], 'vlng': item[3], 'address': item[4], 'distance': item[5]})
            if not v:
                rows.append({'N': n, 'lat': lat, 'lng': lon, 'id': None, 'name': None, 'vlat': None, 'vlng': None, 'address': None, 'distance': None})

# DataFrame.append was removed in pandas 2.0, so the dataframe is built once from the collected rows
df = pd.DataFrame(rows, columns=['N', 'lat', 'lng', 'id', 'name', 'vlat', 'vlng', 'address', 'distance'])

map_spb = folium.Map(location=[spb_center_latitude,spb_center_longitude], zoom_start=10)

for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=radius, color='blue', fill=False).add_to(map_spb)
map_spb

### Methodology 
#### Foursquare
Now that we have our location candidates, let's use the Foursquare API to get info on parks in each neighborhood.

We're interested in venues in the 'Park' category.

For each neighborhood, we request data from the Foursquare API to find nearby parks. We save the coordinates of the neighborhood center and of the parks found to the dataset.
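
Since the objective is to find areas without access to parks, the saved records can be reduced to the list of cells for which no park was returned. A minimal sketch in plain Python, assuming records shaped like the rows of the dataset (a cell number `N` and a park `id` that is `None` when nothing was found); `cells_without_parks` and the sample values are hypothetical.

```python
from collections import defaultdict


def cells_without_parks(records):
    """Return the sorted cell numbers whose records contain no park id."""
    parks_per_cell = defaultdict(int)
    for rec in records:
        # A record with id None is the placeholder row of an empty cell
        parks_per_cell[rec["N"]] += 0 if rec["id"] is None else 1
    return sorted(n for n, count in parks_per_cell.items() if count == 0)


# Hypothetical records (values are made up for illustration)
sample = [
    {"N": 1, "id": None},
    {"N": 2, "id": "park-a"},
    {"N": 2, "id": "park-b"},
    {"N": 3, "id": None},
]
print(cells_without_parks(sample))  # [1, 3]
```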

In [7]:
df.to_excel('parks.xlsx')

In [6]:
df

Unnamed: 0,N,lat,lng,id,name,vlat,vlng,address,distance
0,1.0,59.830495,30.201723,,,,,,
1,2.0,59.828458,30.218934,,,,,,
2,3.0,59.826419,30.236143,,,,,,
3,4.0,59.824378,30.253350,,,,,,
4,5.0,59.822335,30.270554,,,,,,
...,...,...,...,...,...,...,...,...,...
810,587.0,60.057918,30.354479,,,,,,
811,588.0,60.055857,30.371796,,,,,,
812,589.0,60.053793,30.389111,,,,,,
813,590.0,60.051727,30.406423,,,,,,
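
The first two rows above are neighboring cell centers, so their great-circle distance should be close to the 1000 m grid step. A minimal sketch of that sanity check with the haversine formula, which is a spherical approximation; the mean Earth radius of 6,371 km and the helper name `haversine_m` are assumptions, not part of the notebook.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, earth_radius_m=6371000.0):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


# Cell centers N=1 and N=2 from the table above
d = haversine_m(59.830495, 30.201723, 59.828458, 30.218934)
print(round(d), "m between the first two cell centers")
```

A result that deviates noticeably from the grid step would point to distortion in the projection used to build the grid.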
