# The Battle of Neighborhoods
## Finding similar neighborhoods in Kyiv

## Introduction
The rental market is highly dynamic, and choosing where to live has become especially relevant during the pandemic. People may want to:
* move closer to work to avoid public transport;
* on the contrary, move further from the center after a change of job;
* find a more comfortable, larger apartment because they now spend more time at home;
* move from an apartment to a house.

In all of these cases, it helps to be able to compare neighborhoods and find one whose infrastructure is similar to that of the current neighborhood, but in a different part of the city.

## Data
We use Kyiv, Ukraine, as the example city. The analysis needs the following data:
* geodata for dividing the city into areas (opencagedata.com);
* venues in each area (Foursquare);
* locations of metro stations (Wikipedia);
* a real estate price index (data from real estate rental portals).

In [1]:
import pandas as pd
import numpy as np
import math
import requests
import json
import geopy.distance

In [2]:
import matplotlib.pyplot as plt 
import matplotlib.cm as cm
import matplotlib.colors as colors

# k-means, used later in the clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

### Get areas

We will use the coordinates of Saint Sophia Cathedral in Kyiv as the city center.

In [3]:
latitude = 50.452780
longitude = 30.514440

We consider a square area extending about 10 km from the center of Kyiv in each direction, divide it into cells of 1 km × 1 km, and calculate the coordinates of the center of each cell.

In [4]:
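# one kilometer expressed in degrees (40075 km is the Earth's equatorial circumference)
delta_lat = 360/40075                                      # degrees of latitude per km
delta_lon = 360/(math.cos(math.radians(latitude))*40075)   # degrees of longitude per km at this latitude

latitudes=[]
longitudes=[]

# 21 x 21 grid of cell centers, from -10 km to +10 km in each direction
for i in range(-10, 11):
  for j in range(-10, 11):
    latitudes.append(latitude + delta_lat*i)
    longitudes.append(longitude + delta_lon*j)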

In [5]:
Kyiv_coords = pd.DataFrame(list(zip(latitudes, longitudes)), columns =['latitude', 'longitude'])

Kyiv_coords.shape

(441, 2)
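The same grid can be built without explicit loops. The sketch below, a hypothetical helper `km_grid` (name and parameters are assumptions, not part of the notebook), wraps the kilometer-to-degree conversion and uses NumPy's `meshgrid` with `indexing='ij'`, so latitude varies in the outer loop exactly as in the cell above.

```python
import math
import numpy as np

EARTH_CIRCUMFERENCE_KM = 40075  # equatorial circumference, as used in the cell above

def km_grid(center_lat, center_lon, half_size_km=10, step_km=1.0):
    """Return (lats, lons) of cell centers on a square grid around a center point.

    One kilometer north-south is taken as 360 / 40075 degrees of latitude.
    East-west, the same distance spans more degrees of longitude, scaled by
    1 / cos(latitude), because meridians converge toward the poles.
    """
    d_lat = step_km * 360 / EARTH_CIRCUMFERENCE_KM
    d_lon = step_km * 360 / (math.cos(math.radians(center_lat)) * EARTH_CIRCUMFERENCE_KM)
    n = int(round(half_size_km / step_km))
    offsets = np.arange(-n, n + 1)
    # axis 0 varies the latitude offset (outer loop), axis 1 the longitude offset (inner loop)
    i, j = np.meshgrid(offsets, offsets, indexing='ij')
    lats = center_lat + d_lat * i.ravel()
    lons = center_lon + d_lon * j.ravel()
    return lats, lons

lats, lons = km_grid(50.452780, 30.514440)  # Saint Sophia Cathedral
print(len(lats))
```

The first point's latitude, about 50.362948, is the southern edge of the grid and matches the `Address Latitude` of the first venue listed at the end of this section.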

This gives 441 areas, which we now need to identify. For each area we determine its address and district using the OpenCage Geocoding API (https://opencagedata.com/). Some areas lie outside the city; they are removed from the data.

In [6]:
Token = 'YOUR_OPENCAGE_API_KEY'  # replace with your own OpenCage API key

In [7]:
def getAdr(latitudes, longitudes):
    """Reverse-geocode each point with OpenCage; keep only points inside a city."""
    addresses_list = []
    for lat, lng in zip(latitudes, longitudes):

        # build the request; requests URL-encodes the query parameters
        params = {'q': '{},{}'.format(lat, lng), 'language': 'en', 'key': Token}

        # make the GET request
        response = requests.get('https://api.opencagedata.com/geocode/v1/json', params=params)
        response.raise_for_status()
        results = response.json()['results']
        if not results:
            continue  # nothing found for this point

        components = results[0]['components']

        # if we are outside of Kyiv, there will be no 'city' parameter
        if 'city' not in components:
            continue

        # prefer "street, house number"; otherwise use the formatted address
        if 'road' in components:
            address = components['road']
            if 'house_number' in components:
                address = address + ', ' + components['house_number']
        else:
            address = results[0]['formatted']

        addresses_list.append([
            address,
            components.get('borough'),  # district; None if OpenCage omits it
            components['city'],
            lat,
            lng])

    addresses = pd.DataFrame(addresses_list, columns=['Address', 'Neighborhood', 'City', 'Latitude', 'Longitude'])

    return addresses
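Because `getAdr` needs network access and an API key, its address-selection rule is hard to check in isolation. The sketch below pulls that rule into a hypothetical pure function, `parse_opencage_result` (not part of OpenCage or the notebook), that takes one element of the `results` list: it returns `None` when there is no `city` component, otherwise "road, house number" with a fallback to the formatted address. The sample dictionaries are hand-made and only shaped like OpenCage results.

```python
def parse_opencage_result(result):
    """Hypothetical helper mirroring getAdr's selection logic for one OpenCage result.

    Returns (address, neighborhood, city), or None if the point is outside a city.
    """
    components = result.get('components', {})
    if 'city' not in components:
        return None  # outside the city: getAdr drops such points
    if 'road' in components:
        address = components['road']
        if 'house_number' in components:
            address += ', ' + components['house_number']
    else:
        address = result.get('formatted', '')
    # 'borough' holds the district; it may be missing
    return address, components.get('borough'), components['city']

# hand-made sample shaped like an OpenCage result (illustrative, not real API output)
sample = {'components': {'road': 'Akademika Kashchenka Street', 'house_number': '33',
                         'borough': 'Holosiivskyi District', 'city': 'Kyiv'}}
print(parse_opencage_result(sample))
```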

In [8]:
Kyiv_addresses = getAdr(latitudes=Kyiv_coords['latitude'], longitudes=Kyiv_coords['longitude'])

In [11]:
Kyiv_addresses[20:30]

|  | Address | Neighborhood | City | Latitude | Longitude |
|---|---|---|---|---|---|
| 20 | Спуск Ельдорадо | Holosiivskyi District | Kyiv | 50.371932 | 30.500331 |
| 21 | Myshalivka, Kyiv, 03041, Ukraine | Holosiivskyi District | Kyiv | 50.371932 | 30.51444 |
| 22 | Akademika Kashchenka Street, 33 | Holosiivskyi District | Kyiv | 50.371932 | 30.528549 |
| 23 | Nauky Avenue, 88а | Holosiivskyi District | Kyiv | 50.371932 | 30.542657 |
| 24 | Halerna Street | Holosiivskyi District | Kyiv | 50.371932 | 30.556766 |
| 25 | Halerna Street | Holosiivskyi District | Kyiv | 50.371932 | 30.570874 |
| 26 | 75th Sadova Street | Darnytskyi district | Kyiv | 50.371932 | 30.584983 |
| 27 | Kyiv, 02132, Ukraine | Darnytskyi district | Kyiv | 50.371932 | 30.599092 |
| 28 | 71st Sadova Street | Darnytskyi district | Kyiv | 50.371932 | 30.6132 |
| 29 | 6th Pivnichno-Ozernyi Lane | Darnytskyi district | Kyiv | 50.371932 | 30.627309 |


In [12]:
Kyiv_addresses.to_csv('Kyiv_addresses.csv')

### Distance to city center
One of the important criteria when choosing a place to live is the distance to the city center, so we calculate it for each area.
The latitude and longitude of the city center (`latitude`, `longitude`) are defined above.

In [14]:
dist_list = []
for i, address in Kyiv_addresses.iterrows():
  lat = address['Latitude']
  lon = address['Longitude']
  # geodesic distance on the WGS-84 ellipsoid, in km (vincenty was removed in geopy 2.0)
  dist_list.append(geopy.distance.geodesic((lat, lon), (latitude, longitude)).km)

In [None]:
Kyiv_addresses['Distance to center'] = dist_list

In [18]:
Kyiv_addresses[140:145]

|  | Address | Neighborhood | City | Latitude | Longitude | Distance to center |
|---|---|---|---|---|---|---|
| 140 | Pavla Tychyny Avenue, 14б | Dniprovskyi district | Kyiv | 50.425831 | 30.599092 | 6.719463 |
| 141 | Berezniakivska Street | Dniprovskyi district | Kyiv | 50.425831 | 30.6132 | 7.629584 |
| 142 | Darnytske Road | Darnytskyi district | Kyiv | 50.425831 | 30.627309 | 8.560319 |
| 143 | Kharkivske Road, 21/3 | Darnytskyi district | Kyiv | 50.425831 | 30.641418 | 9.505614 |
| 144 | Novodarnytska Street, 15 | Darnytskyi district | Kyiv | 50.425831 | 30.655526 | 10.461523 |
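As a dependency-free cross-check, the great-circle (haversine) distance on a sphere of radius 6371 km can be computed with the standard library alone. This is a swapped-in approximation, not the method used in the notebook: geopy measures distances on an ellipsoid, so the two values differ slightly (typically well under 1%). The function name `haversine_km` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

center = (50.452780, 30.514440)  # Saint Sophia Cathedral
print(round(haversine_km(*center, 50.425831, 30.599092), 2))  # area 140 from the table above
```

For area 140 (6.719 km in the table), the spherical estimate comes out roughly 0.02 km lower, which is negligible for ranking areas by distance.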


### Distance to the nearest metro station
Using a list of metro stations and their coordinates, we determine the nearest station to each area.

In [19]:
Kyiv_metro = pd.DataFrame([['Akademmistechko', 50.464861, 30.355083],
                          ['Zhytomyrska', 50.456175, 30.365628],
                          ['Sviatoshyn', 50.457903, 30.390614],
                          ['Nyvky', 50.458653, 30.404042],
                          ['Beresteiska', 50.458333, 30.420833],
                          ['Shuliavska', 50.455, 30.445278], 
                          ['Politekhnichnyi Instytut', 50.450833, 30.466111], 
                          ['Vokzalna', 50.441667, 30.4875], 
                          ['Universytet', 50.444167, 30.505833], 
                          ['Teatralna', 50.444722, 30.518056], 
                          ['Khreshchatyk', 50.447222, 30.523333], 
                          ['Arsenalna', 50.443889, 30.545556], 
                          ['Hidropark', 50.445556, 30.576944], 
                          ['Livoberezhna', 50.451389, 30.598889], 
                          ['Darnytsia', 50.455556, 30.613056], 
                          ['Chernihivska', 50.459722, 30.630833], 
                          ['Lisova', 50.464639, 30.645556], 
                          ['Heroiv Dnipra', 50.5225, 30.498611], 
                          ['Minska', 50.512222, 30.498333], 
                          ['Obolon', 50.501111, 30.498056], 
                          ['Pochaina', 50.486667, 30.497778], 
                          ['Tarasa Shevchenka', 50.473056, 30.505], 
                          ['Kontraktova Ploshcha', 50.465, 30.516667], 
                          ['Poshtova Ploshcha', 50.459167, 30.525556], 
                          ['Maidan Nezalezhnosti', 50.45, 30.524167], 
                          ['Ploshcha Lva Tolstoho', 50.439444, 30.516944],
                          ['Olimpiiska', 50.431944, 30.516389],
                          ['Palats Ukrayina', 50.420833, 30.520833], 
                          ['Lybidska', 50.420833, 30.520833], 
                          ['Demiivska', 50.404792, 30.516833], 
                          ['Holosiivska', 50.3975, 30.508333], 
                          ['Vasylkivska', 50.393333, 30.488056], 
                          ['Vystavkovyi Tsentr', 50.382581, 30.477536], 
                          ['Ipodrom', 50.376556, 30.469117], 
                          ['Teremky', 50.367044, 30.454203], 
                          ['Syrets', 50.476111, 30.430556], 
                          ['Dorohozhychi', 50.473611, 30.449444], 
                          ['Lukianivska', 50.462222, 30.481667], 
                          ['Zoloti Vorota', 50.448333, 30.513333], 
                          ['Palats Sportu', 50.438056, 30.520833], 
                          ['Klovska', 50.436944, 30.531667], 
                          ['Pecherska', 50.4275, 30.538889], 
                          ['Druzhby Narodiv', 50.418056, 30.545], 
                          ['Vydubychi', 50.402222, 30.560833], 
                          ['Slavutych', 50.394167, 30.604167], 
                          ['Osokorky', 50.395278, 30.616111], 
                          ['Pozniaky', 50.398056, 30.633333], 
                          ['Kharkivska', 50.400833, 30.652222], 
                          ['Vyrlytsia', 50.403333, 30.666111], 
                          ['Boryspilska', 50.403333, 30.682778], 
                          ['Chervony Khutir', 50.408889, 30.694444]], columns=['Metro Station', 'Latitude', 'Longitude'])

In [20]:
station_list = []
min_dist = []
for i, address in Kyiv_addresses.iterrows():
  lat = address['Latitude']
  lon = address['Longitude']
  # start with the first station, then look for a closer one
  mind = geopy.distance.geodesic((lat, lon), (Kyiv_metro.at[0, 'Latitude'], Kyiv_metro.at[0, 'Longitude'])).km
  minst = Kyiv_metro.at[0, 'Metro Station']
  for j, station in Kyiv_metro.iterrows():
    lat_st = station['Latitude']
    lon_st = station['Longitude']

    # distance in kilometers (geodesic replaces vincenty, removed in geopy 2.0)
    dist = geopy.distance.geodesic((lat, lon), (lat_st, lon_st)).km
    if dist < mind:
      mind = dist
      minst = station['Metro Station']
  min_dist.append(mind)
  station_list.append(minst)

Add the resulting lists to the address table as new columns.

In [21]:
Kyiv_addresses['Metro Station'] = station_list
Kyiv_addresses['Metro_dist'] = min_dist

In [22]:
Kyiv_addresses[170:175]

|  | Address | Neighborhood | City | Latitude | Longitude | Distance to center | Metro Station | Metro_dist |
|---|---|---|---|---|---|---|---|---|
| 170 | Аттили Могильного вулиця | Solomianskyi district | Kyiv | 50.443797 | 30.429788 | 6.09502 | Shuliavska | 1.662349 |
| 171 | Industrialnyi Lane, 2 | Solomianskyi district | Kyiv | 50.443797 | 30.443897 | 5.10913 | Shuliavska | 1.250068 |
| 172 | Dashavska Street | Solomianskyi district | Kyiv | 50.443797 | 30.458006 | 4.131044 | Politekhnichnyi Instytut | 0.971622 |
| 173 | Umanska Street, 6Ф | Solomianskyi district | Kyiv | 50.443797 | 30.472114 | 3.167999 | Politekhnichnyi Instytut | 0.891295 |
| 174 | Starovokzalna Street, 25 | Shevchenkivskyi district | Kyiv | 50.443797 | 30.486223 | 2.239482 | Vokzalna | 0.253697 |
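The double loop above computes one ellipsoidal distance per area–station pair, which is slow in pure Python. A vectorized alternative, sketched here under the assumption that a spherical (haversine) approximation is acceptable, builds the whole area × station distance matrix with NumPy broadcasting and takes the row-wise minimum. The function `nearest_stations` is hypothetical; the sample coordinates are taken from the metro list and the table above.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # spherical approximation

def nearest_stations(area_lat, area_lon, st_lat, st_lon):
    """For each area, return (index of the nearest station, distance in km).

    Areas become a column vector and stations a row vector, so broadcasting
    yields an (n_areas, n_stations) matrix of haversine distances.
    """
    a_lat = np.radians(np.asarray(area_lat, dtype=float))[:, None]
    a_lon = np.radians(np.asarray(area_lon, dtype=float))[:, None]
    s_lat = np.radians(np.asarray(st_lat, dtype=float))[None, :]
    s_lon = np.radians(np.asarray(st_lon, dtype=float))[None, :]
    h = (np.sin((s_lat - a_lat) / 2) ** 2
         + np.cos(a_lat) * np.cos(s_lat) * np.sin((s_lon - a_lon) / 2) ** 2)
    dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(h))
    idx = dist.argmin(axis=1)                    # nearest station per area
    return idx, dist[np.arange(len(idx)), idx]   # and its distance

# Shuliavska, Politekhnichnyi Instytut, Vokzalna
st_lat = [50.455, 50.450833, 50.441667]
st_lon = [30.445278, 30.466111, 30.4875]
# areas 171 and 174 from the table above
idx, km = nearest_stations([50.443797, 50.443797], [30.443897, 30.486223], st_lat, st_lon)
print(idx, km.round(3))
```

With these three stations the sketch picks Shuliavska for area 171 and Vokzalna for area 174, as in the table, at nearly the same distances.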


### Price index

We take the cost of one square meter of housing in Kyiv near each metro station (October 2020 data), standardize it to zero mean and unit variance, and add it to the address table as a price index.
Cost per square meter: https://stroyobzor.ua/news/ceny-na-kvartiry-v-stolice-v-zavisimosti-ot-stancii-metro-oktyabr-2020.html

In [26]:
Kyiv_cost_per_m = pd.DataFrame([['Akademmistechko', 20921],
                          ['Zhytomyrska', 29428],
                          ['Sviatoshyn', 27222],
                          ['Nyvky', 31530],
                          ['Beresteiska', 26754],
                          ['Shuliavska', 23226], 
                          ['Politekhnichnyi Instytut', 45390], 
                          ['Vokzalna', 37846], 
                          ['Universytet', 57417], 
                          ['Teatralna', 57417], 
                          ['Khreshchatyk', 148188], 
                          ['Arsenalna', 74167], 
                          ['Hidropark', 49612], 
                          ['Livoberezhna', 25729], 
                          ['Darnytsia', 25331], 
                          ['Chernihivska', 23980], 
                          ['Lisova', 25787], 
                          ['Heroiv Dnipra', 25793], 
                          ['Minska', 51025], 
                          ['Obolon', 34048], 
                          ['Pochaina', 24439], 
                          ['Tarasa Shevchenka', 44025], 
                          ['Kontraktova Ploshcha', 39633], 
                          ['Poshtova Ploshcha', 32690], 
                          ['Maidan Nezalezhnosti', 148188], 
                          ['Ploshcha Lva Tolstoho', 70000],
                          ['Olimpiiska', 65312],
                          ['Palats Ukrayina', 51590], 
                          ['Lybidska', 48538], 
                          ['Demiivska', 26252], 
                          ['Holosiivska', 35823], 
                          ['Vasylkivska', 25151], 
                          ['Vystavkovyi Tsentr', 32000], 
                          ['Ipodrom', 27931], 
                          ['Teremky', 24028], 
                          ['Syrets', 23848], 
                          ['Dorohozhychi', 31403], 
                          ['Lukianivska', 36888], 
                          ['Zoloti Vorota', 57417], 
                          ['Palats Sportu', 70000], 
                          ['Klovska', 105330], 
                          ['Pecherska', 63312], 
                          ['Druzhby Narodiv', 56718], 
                          ['Vydubychi', 23540], 
                          ['Slavutych', 26057], 
                          ['Osokorky', 28865], 
                          ['Pozniaky', 22667], 
                          ['Kharkivska', 24467], 
                          ['Vyrlytsia', 23957], 
                          ['Boryspilska', 22474], 
                          ['Chervony Khutir', 24745]], columns=['Metro Station', 'Cost per square meter'])

In [27]:
from sklearn.preprocessing import StandardScaler

# standardize prices: subtract the mean and divide by the standard deviation
sc = StandardScaler()
Cost = Kyiv_cost_per_m['Cost per square meter'].values.reshape(-1, 1)  # StandardScaler expects a 2-D array
Kyiv_cost_per_m['Cost per square meter'] = sc.fit_transform(Cost).ravel()

Kyiv_cost_per_m.head()

|  | Metro Station | Cost per square meter |
|---|---|---|
| 0 | Akademmistechko | -0.789293 |
| 1 | Zhytomyrska | -0.481102 |
| 2 | Sviatoshyn | -0.561021 |
| 3 | Nyvky | -0.40495 |
| 4 | Beresteiska | -0.577975 |
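`StandardScaler` computes a z-score per column, z = (x − mean) / std, using the population standard deviation (ddof=0). The NumPy sketch below, a hypothetical helper `standardize`, shows the same computation on three prices from the cost table; the values differ from the output above because the notebook fits the scaler on all stations, not just these three.

```python
import numpy as np

def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation.

    Mirrors StandardScaler on a single column. Assumes the values are not all
    equal (otherwise the standard deviation is zero).
    """
    x = np.asarray(values, dtype=float)
    return (x - x.mean()) / x.std()  # np.std defaults to ddof=0

z = standardize([20921, 29428, 27222])  # Akademmistechko, Zhytomyrska, Sviatoshyn
print(z.round(3))
```

Because the index is relative to the mean over all stations, outliers such as Khreshchatyk and Maidan Nezalezhnosti (148188 each) enlarge the standard deviation and compress the index values of the remaining stations.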


In [38]:
Kyiv_addresses = Kyiv_addresses.merge(Kyiv_cost_per_m, on='Metro Station')  # attach each area's price via its nearest station

In [40]:
Kyiv_addresses.rename(columns={'Cost per square meter': 'Price index'}, inplace=True)
Kyiv_addresses[125:130]

|  | Address | Neighborhood | City | Latitude | Longitude | Distance to center | Metro Station | Metro_dist | Price index |
|---|---|---|---|---|---|---|---|---|---|
| 125 | Druzhby Narodiv Boulevard | Pecherskyi district | Kyiv | 50.416847 | 30.542657 | 4.471629 | Druzhby Narodiv | 0.214009 | 0.507561 |
| 126 | Zvirynets, Kyiv, 01014, Ukraine | Pecherskyi district | Kyiv | 50.416847 | 30.556766 | 5.001925 | Druzhby Narodiv | 0.846974 | 0.507561 |
| 127 | Staronavodnytska Street | Pecherskyi district | Kyiv | 50.425831 | 30.556766 | 4.245923 | Druzhby Narodiv | 1.202941 | 0.507561 |
| 128 | Naberezhne Road | Pecherskyi district | Kyiv | 50.425831 | 30.570874 | 5.00598 | Druzhby Narodiv | 2.032021 | 0.507561 |
| 129 | Kachalova Street | Sviatoshynskyi district | Kyiv | 50.416847 | 30.41568 | 8.075243 | Beresteiska | 4.629257 | -0.577975 |


In [41]:
# create map of Kyiv using latitude and longitude values
map_Kyiv = folium.Map(location=[latitude, longitude], zoom_start=12)

# draw a 500 m circle around the center of each area
for lat, lon in zip(Kyiv_addresses['Latitude'], Kyiv_addresses['Longitude']):
    folium.Circle([lat, lon], radius=500, color='yellow', fill=False).add_to(map_Kyiv)   
map_Kyiv

### Venues in each area (from Foursquare)

In [42]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'          # replace with your own credentials
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'  # keep real secrets out of published notebooks
VERSION = '20180604'                             # Foursquare API version date
LIMIT = 100


To avoid missing the zones between the circles, we multiply the search radius by √2, so that each circle covers its entire 1 km cell, corners included. The circles then overlap, so duplicate venues are removed after the venue list has been built.
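For a square cell of side $s$, the farthest points from its center are the corners, at half the diagonal. With $s = 1000$ m this gives the radius used below:

$$
r = \frac{\sqrt{2}}{2}\,s = \frac{\sqrt{2}}{2}\cdot 1000\ \text{m} = 500\sqrt{2}\ \text{m} \approx 707\ \text{m}
$$

Toward the middle of each cell edge, a circle of this radius reaches about $707 - 500 \approx 207$ m into the neighboring cell, which is where the duplicate venues come from.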

In [43]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
RADIUS = 500*math.sqrt(2) # define radius

In [44]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [46]:
def getNearbyVenues(names, latitudes, longitudes):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            RADIUS, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['id'], 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            get_category_type(v['venue'])) for v in results])  # None if the venue has no category

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Address', 
                  'Address Latitude', 
                  'Address Longitude', 
                  'Venue ID',
                  'Venue',
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [47]:
Kyiv_venues = getNearbyVenues(names=Kyiv_addresses['Address'],
                                   latitudes=Kyiv_addresses['Latitude'],
                                   longitudes=Kyiv_addresses['Longitude']
                                  )

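Remove duplicate venues by Venue ID. `drop_duplicates` keeps the first occurrence, so each venue stays attached to the first area in which it was found.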

In [49]:
Kyiv_venues.drop_duplicates(subset=['Venue ID'], inplace=True)
Kyiv_venues.head()

|  | Address | Address Latitude | Address Longitude | Venue ID | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|
| 0 | Akademika Hlushkova Avenue | 50.362948 | 30.443897 | 4fa0315ae4b0dc03c72a1e1d | Урочище «Теремки» (ліс Шмальгаузена) | 50.363313 | 30.451183 | Forest |
| 1 | Akademika Hlushkova Avenue | 50.362948 | 30.443897 | 51f297c7498ecbcd7ce958ab | Гриль-паб "КуСай" | 50.361583 | 30.444024 | BBQ Joint |
| 2 | Akademika Hlushkova Avenue | 50.362948 | 30.443897 | 5058081de4b0db00fc0871f0 | Зупинка «Кібернетичний центр» | 50.361706 | 30.444877 | Bus Stop |
| 3 | Akademika Hlushkova Avenue | 50.362948 | 30.443897 | 520de49311d2c41539dd317a | Restorio | 50.361631 | 30.446598 | Restaurant |
| 4 | Akademika Hlushkova Avenue | 50.362948 | 30.443897 | 4f11ab8fe4b0a6ade07e42be | Полуничка | 50.360584 | 30.444413 | Café |


In [50]:
Kyiv_venues.to_csv('Kyiv_venues.csv')