# Segmenting And Clustering Neighbourhoods Project

_This notebook is used for the project within the Coursera course 'Applied Data Science Capstone'._

## The Project Goals

We are going to develop an application that helps families with kids plan their day off.<br/>
In big cities such as Moscow, there are thousands of venues that could be interesting for kids, but many parents do not know about them. In practice, only a very limited list of places affordable for families with kids, mostly located in the city center, is well known to everybody. These places are nice but always overcrowded, and it can take more than an hour to get there through traffic jams. Parents run into more problems when they plan several activities for the same day: for example, where to eat with the kids, or what to do if their plans, the weather or the mood suddenly change.<br/>
It would be nice to have an application that shows clustered venues affordable for families with kids. Those clusters could include venues for education and entertainment and highlight those that are highly ranked by Foursquare users. And of course, it is very useful to see the clusters that are close to where you live, ideally in your own neighbourhood.

## How To Determine Neighbourhoods in Moscow

There are several ways to define neighbourhoods.<br/>
The first one is by municipalities. Moscow has 12 administrative districts that include 146 municipalities. Their boundaries can be downloaded at http://gis-lab.info/qa/moscow-atd.html in ESRI Shape, GeoJSON, CSV+VRT or KML formats.<br/>
The second way is by post office locations. Moscow has 13 postal regions that include 524 post offices. The geolocations of the post offices can be downloaded at http://hubofdata.ru/dataset/ruspost-msk in JSON format.

The first approach (by municipalities) seems more effective since it gives a broader view of neighbourhoods.<br/>
At the same time, using post office locations as neighbourhoods is easier since most venues have postcodes (except for venues such as parks and playgrounds). For venues without a postcode, it can be inferred from the closest venues.
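
Inferring a missing postcode from the closest venue can be sketched as a nearest-neighbour lookup over great-circle distances. This is only an illustration: the function names, the record layout and the sample postcodes and coordinates below are assumptions, not part of the datasets mentioned above.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_postcode(lat, lon, references):
    """Return the postcode of the reference venue closest to (lat, lon)."""
    closest = min(references, key=lambda ref: haversine_km(lat, lon, ref["lat"], ref["lon"]))
    return closest["postcode"]

# Hypothetical venues that already have a postcode (made-up values)
references = [
    {"postcode": "AAA001", "lat": 55.7600, "lon": 37.6250},
    {"postcode": "AAA002", "lat": 55.7300, "lon": 37.6000},
]

# A venue without a postcode, e.g. a park
print(nearest_postcode(55.7527, 37.6137, references))
```

A linear scan like this should be enough for a few hundred reference points; a spatial index would only matter for much larger sets.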

Russian Post (Почта России) is not a government body but a state-owned enterprise that is often criticized for the quality of its service. It does have data on its branches, and in particular publishes it on several of its websites, the main one being its own site.

We extracted the data on its Moscow branches, including coordinates, addresses, postcodes, opening hours and so on. This data could not be packed into CSV in any simple way, so it is available as a single JSON file at http://hubofdata.ru/dataset/ruspost-msk

http://gis-lab.info/qa/moscow-atd.html

http://geointellect.ru/ru-ru/Геоданные/Список-геоданных-по-Москве

Let's install and import the libraries we'll need.

In [2]:
!conda install -c conda-forge folium=0.5.0 --yes # run this line the first time you execute the cell or if you get the error "name '...' is not defined"; comment it out afterwards
import folium # map rendering library

!conda install -c conda-forge geopy --yes # run this line the first time you execute the cell or if you get the error "name '...' is not defined"; comment it out afterwards
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

import pandas as pd
import numpy as np

Let's download the data for Moscow municipalities. The link is http://gis-lab.info/data/mos-adm/mo-csv.zip<br/>
Since the data is stored in a zip file, we need to unzip it first.

In [3]:
# importing required modules 
from zipfile import ZipFile 

# downloading zip file
!wget -q -O 'mun_coordinates.zip' http://gis-lab.info/data/mos-adm/mo-csv.zip

# specifying the zip file name 
file_name = "mun_coordinates.zip"
  
# opening the zip file in READ mode 
# (named `zf` so that the built-in zip() used later in the notebook is not shadowed)
with ZipFile(file_name, 'r') as zf: 
    # printing all the contents of the zip file 
    zf.printdir() 
  
    # extracting all the files 
    print('Extracting all the files now...') 
    zf.extractall() 
    print('Done!') 
    
print('Data downloaded!')
mun_coord = pd.read_csv('mo.csv')

File Name                                             Modified             Size
mo.csv                                         2014-06-14 15:29:14       892686
mo.csvt                                        2014-06-14 15:29:14           85
mo.prj                                         2014-06-13 20:27:22          143
mo.qml                                         2014-03-23 21:21:22        10368
mo.vrt                                         2014-03-29 23:06:40          295
Extracting all the files now...
Done!
Data downloaded!


Let's have a look at the data downloaded

In [4]:
mun_coord.head()

Unnamed: 0,WKT,NAME,OKATO,OKTMO,NAME_AO,OKATO_AO,ABBREV_AO,TYPE_MO
0,"MULTIPOLYGON (((36.8031012 55.4408329,36.80319...",Киевский,45298555,45945000,Троицкий,45298000,Троицкий,Поселение
1,"POLYGON ((37.4276499 55.7482092,37.4284863 55....",Филёвский Парк,45268595,45328000,Западный,45268000,ЗАО,Муниципальный округ
2,"POLYGON ((36.8035692 55.4516224,36.8045117 55....",Новофёдоровское,45298567,45954000,Троицкий,45298000,Троицкий,Поселение
3,"POLYGON ((36.9372397 55.2413907,36.9372604 55....",Роговское,45298575,45956000,Троицкий,45298000,Троицкий,Поселение
4,"POLYGON ((37.4395575 55.6273129,37.4401803 55....","""Мосрентген""",45297568,45953000,Новомосковский,45297000,Новомосковский,Поселение


It turns out that some columns contain Cyrillic text. To make the data easier to review within the Coursera project, we transliterate the Cyrillic characters into Latin ones using rules that give a similar pronunciation.<br/>
The cell below transliterates the whole 'mo.csv' file by replacing Cyrillic characters.

In [5]:
import os
import fileinput

def latinizator(letter, dic):
    for i, j in dic.items():
        letter = letter.replace(i, j)
    return letter

legend = {
'а':'a',
'б':'b',
'в':'v',
'г':'g',
'д':'d',
'е':'e',
'ё':'yo',
'ж':'zh',
'з':'z',
'и':'i',
'й':'y',
'к':'k',
'л':'l',
'м':'m',
'н':'n',
'о':'o',
'п':'p',
'р':'r',
'с':'s',
'т':'t',
'у':'u',
'ф':'f',
'х':'h',
'ц':'ts',
'ч':'ch',
'ш':'sh',
'щ':'shch',
'ъ':'y',
'ы':'y',
'ь':"'",
'э':'e',
'ю':'yu',
'я':'ya',

'А':'A',
'Б':'B',
'В':'V',
'Г':'G',
'Д':'D',
'Е':'E',
'Ё':'Yo',
'Ж':'Zh',
'З':'Z',
'И':'I',
'Й':'Y',
'К':'K',
'Л':'L',
'М':'M',
'Н':'N',
'О':'O',
'П':'P',
'Р':'R',
'С':'S',
'Т':'T',
'У':'U',
'Ф':'F',
'Х':'H',
'Ц':'Ts',
'Ч':'Ch',
'Ш':'Sh',
'Щ':'Shch',
'Ъ':'Y',
'Ы':'Y',
'Ь':"'",
'Э':'E',
'Ю':'Yu',
'Я':'Ya',
}

with fileinput.FileInput('mo.csv', inplace=True, backup='.bak') as f:
    for line in f:
        print(latinizator(line, legend), end='')
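
As an alternative to rewriting the file with repeated `replace` calls, the same character mapping can be applied in a single pass with `str.maketrans` and `str.translate`. The sketch below uses a shortened mapping and a hypothetical helper name `latinize`; the full `legend` dictionary could be passed to `str.maketrans` in the same way, since all of its keys are single characters.

```python
# A shortened mapping for illustration; the full `legend` dictionary works the same way
sample_legend = {
    'К': 'K', 'и': 'i', 'е': 'e', 'в': 'v', 'с': 's',
    'к': 'k', 'й': 'y', 'ё': 'yo', 'Ф': 'F', 'л': 'l',
}

# Build the translation table once; values may be longer than one character
table = str.maketrans(sample_legend)

def latinize(text, table=table):
    """Transliterate every mapped character; unmapped characters are left unchanged."""
    return text.translate(table)

print(latinize('Киевский'))
```

With pandas this could be applied to a single column, for example `mun_coord['NAME'].map(latinize)`, which avoids modifying `mo.csv` on disk.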

Let's check the result.

In [6]:
mun_coord = pd.read_csv('mo.csv')
mun_coord.head()

Unnamed: 0,WKT,NAME,OKATO,OKTMO,NAME_AO,OKATO_AO,ABBREV_AO,TYPE_MO
0,"MULTIPOLYGON (((36.8031012 55.4408329,36.80319...",Kievskiy,45298555,45945000,Troitskiy,45298000,Troitskiy,Poselenie
1,"POLYGON ((37.4276499 55.7482092,37.4284863 55....",Filyovskiy Park,45268595,45328000,Zapadnyy,45268000,ZAO,Munitsipal'nyy okrug
2,"POLYGON ((36.8035692 55.4516224,36.8045117 55....",Novofyodorovskoe,45298567,45954000,Troitskiy,45298000,Troitskiy,Poselenie
3,"POLYGON ((36.9372397 55.2413907,36.9372604 55....",Rogovskoe,45298575,45956000,Troitskiy,45298000,Troitskiy,Poselenie
4,"POLYGON ((37.4395575 55.6273129,37.4401803 55....","""Mosrentgen""",45297568,45953000,Novomoskovskiy,45297000,Novomoskovskiy,Poselenie


In [38]:
mun_coord.iat[0,0]

'MULTIPOLYGON (((36.8031012 55.4408329,36.8031903 55.4416007,36.8035692 55.4516224,36.812528 55.4513994,36.8274471 55.4513398,36.8333688 55.4513764,36.8338034 55.4516439,36.8345763 55.4512558,36.8348594 55.4514247,36.8349932 55.4514931,36.8358013 55.4511173,36.8360591 55.4511632,36.8461554 55.4510412,36.8602864 55.4508946,36.8649423 55.4506415,36.8608407 55.4492656,36.8582649 55.4478456,36.8582898 55.447659,36.8600008 55.4466656,36.8611076 55.4473042,36.8622805 55.4467128,36.8638768 55.4472018,36.8694408 55.4489425,36.8724625 55.4502245,36.8749845 55.4513839,36.8773319 55.453135,36.8804877 55.4548177,36.8822676 55.455771,36.8833225 55.4551478,36.8837761 55.4554817,36.8846681 55.4551548,36.8855977 55.4548773,36.8863281 55.4552539,36.8952465 55.450465,36.8875942 55.4460139,36.8818703 55.4426547,36.8912361 55.437452,36.8915818 55.4377769,36.893413 55.4368788,36.8948019 55.4377928,36.8963369 55.4389852,36.8968637 55.4392912,36.8968237 55.4387451,36.8964101 55.4380589,36.8959156 55.4369562,

Now we are going to go through every polygon to find the minimum and maximum longitude and latitude. Since the data comes from a .csv file, each polygon is stored as a string.

In [47]:
# Python code to convert string to list with ',' as separator
def Convert(string): 
    return string.split(",")  # str.split already returns a list
  
# Driver code     
poly = Convert(mun_coord.iat[0,0])
for i in range(1,len(poly)):
    poly[i]
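
One possible way to get the minimum and maximum longitude and latitude from such a string is to pull out every "longitude latitude" pair with a regular expression. This is a minimal sketch, assuming the coordinates always appear as two space-separated decimal numbers as in the output above; `wkt_bounds` is a hypothetical helper name and the sample string is a shortened version of the first polygon.

```python
import re

# Matches one "longitude latitude" pair such as "36.8031012 55.4408329"
PAIR = re.compile(r'(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)')

def wkt_bounds(wkt):
    """Return (min_lon, min_lat, max_lon, max_lat) for a POLYGON or MULTIPOLYGON string."""
    pairs = [(float(lon), float(lat)) for lon, lat in PAIR.findall(wkt)]
    lons = [p[0] for p in pairs]
    lats = [p[1] for p in pairs]
    return min(lons), min(lats), max(lons), max(lats)

# Shortened sample taken from the first rows of the polygon shown above
sample = 'MULTIPOLYGON (((36.8031012 55.4408329,36.8031903 55.4416007,36.8035692 55.4516224)))'
print(wkt_bounds(sample))
```

A dedicated geometry library such as shapely can also parse WKT and report bounds, which is probably the more robust choice if it is available in the environment.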
    

In [27]:
address = 'Moscow'

geolocator = Nominatim(user_agent="ny_explorer")
tor_location = geolocator.geocode(address)

tor_latitude = tor_location.latitude
tor_longitude = tor_location.longitude

print('The geographical coordinates of Moscow are {}, {}.'.format(tor_latitude, tor_longitude))

The geographical coordinates of Moscow are 55.7504461, 37.6174943.


In [28]:
# create map of Moscow using latitude and longitude values
map_moscow = folium.Map(location=[tor_latitude, tor_longitude], zoom_start=11)

# add markers to map
# NOTE: `data2` is not defined anywhere in this notebook, which causes the NameError below;
# it is assumed to be a DataFrame with 'Latitude', 'Longitude', 'Borough' and 'Neighbourhood' columns
for lat, lng, borough, neighbourhood in zip(data2['Latitude'], data2['Longitude'], data2['Borough'], data2['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_moscow)  
    
map_moscow

NameError: name 'data2' is not defined

In [31]:
CLIENT_ID = 'VC5VAI2CNSBEQ1BIWEREK0RDJX2VK4WVRYEXQTSLB4Q4XXFB' #  Foursquare ID
CLIENT_SECRET = 'HRGWOCZXY555UGMEUD4APX5LYXYTREVQNOLEQZJT1YFRW3U3' #  Foursquare Secret
VERSION = '20180605' # Foursquare API version, passed as the `v` parameter in YYYYMMDD form

print('The credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

The credentials:
CLIENT_ID: VC5VAI2CNSBEQ1BIWEREK0RDJX2VK4WVRYEXQTSLB4Q4XXFB
CLIENT_SECRET:HRGWOCZXY555UGMEUD4APX5LYXYTREVQNOLEQZJT1YFRW3U3
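
Hard-coding and printing API credentials makes them easy to leak when a notebook is shared. One common alternative is to read them from environment variables and mask them when printing. The sketch below is only an illustration: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions, not something Foursquare prescribes.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare credentials from environment variables.

    The variable names are an assumption made for this sketch.
    """
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first')
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing."""
    return secret[:visible] + '*' * (len(secret) - visible)

# Demonstration with a plain dict standing in for the real environment
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id', 'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print('CLIENT_SECRET:', mask(demo_secret))
```

In the notebook itself, calling `load_foursquare_credentials()` without arguments would read from the real environment.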


Now let us clarify which venue categories we are interested in. To do so, we can use https://developer.foursquare.com/docs/resources/categories, which lists the category IDs.<br/>
Specifically, we are looking for:

Venue type | Category ID
---|---
Parks | 4bf58dd8d48988d163941735
Entertainment centers | 4bf58dd8d48988d1e1931735
Amusement parks | 4bf58dd8d48988d182941735
Playgrounds | 4bf58dd8d48988d1e7941735
Museums | 4bf58dd8d48988d181941735
Cinema | 4bf58dd8d48988d17f941735
Kids cafe | 4bf58dd8d48988d1d0941735

In [34]:
neighbourhood_latitude = tor_latitude
neighbourhood_longitude = tor_longitude
cat_id = '4bf58dd8d48988d163941735,4bf58dd8d48988d1e1931735,4bf58dd8d48988d182941735,4bf58dd8d48988d1e7941735,4bf58dd8d48988d181941735,4bf58dd8d48988d17f941735,4bf58dd8d48988d1d0941735' # comma-separated IDs of all the categories listed above
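
The long comma-separated string can be hard to read and maintain. A possible alternative, sketched here with a hypothetical `CATEGORY_IDS` dictionary, keeps the names and IDs from the table above together and joins them when needed.

```python
# Category IDs copied from the table above; the dictionary name is illustrative
CATEGORY_IDS = {
    'Parks': '4bf58dd8d48988d163941735',
    'Entertainment centers': '4bf58dd8d48988d1e1931735',
    'Amusement parks': '4bf58dd8d48988d182941735',
    'Playgrounds': '4bf58dd8d48988d1e7941735',
    'Museums': '4bf58dd8d48988d181941735',
    'Cinema': '4bf58dd8d48988d17f941735',
    'Kids cafe': '4bf58dd8d48988d1d0941735',
}

# Join the IDs in insertion order into the comma-separated form used in the request
cat_id = ','.join(CATEGORY_IDS.values())
print(cat_id)
```

This also makes it simple to query a subset, for example only `CATEGORY_IDS['Parks']` and `CATEGORY_IDS['Playgrounds']`.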

In [35]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighbourhood_latitude, 
    neighbourhood_longitude, 
    cat_id, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=VC5VAI2CNSBEQ1BIWEREK0RDJX2VK4WVRYEXQTSLB4Q4XXFB&client_secret=HRGWOCZXY555UGMEUD4APX5LYXYTREVQNOLEQZJT1YFRW3U3&v=20180605&ll=55.7504461,37.6174943&categoryId=4bf58dd8d48988d163941735,4bf58dd8d48988d1e1931735,4bf58dd8d48988d182941735,4bf58dd8d48988d1e7941735,4bf58dd8d48988d181941735,4bf58dd8d48988d17f941735,4bf58dd8d48988d1d0941735&radius=1000&limit=100'
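
Instead of formatting the query string by hand, the parameters can be collected in a dictionary and encoded with the standard library. This is a minimal sketch with a hypothetical helper `build_explore_url` and placeholder credentials; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, category_ids, radius, limit):
    """Build the venues/explore URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'categoryId': category_ids,
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the commas in `ll` and `categoryId` unescaped, as in the URL above
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')

# Placeholder credentials; a single category ID is used to keep the output short
url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605',
                        55.7504461, 37.6174943, '4bf58dd8d48988d163941735', 1000, 100)
print(url)
```

The `requests` library can do the same encoding itself if the dictionary is passed as `requests.get(base_url, params=params)`.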

In [36]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d891013bf7dde0038e175c9'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Khamovniki',
  'headerFullLocation': 'Khamovniki, Moscow',
  'headerLocationGranularity': 'neighborhood',
  'query': 'park',
  'totalResults': 59,
  'suggestedBounds': {'ne': {'lat': 55.75944610900001,
    'lng': 37.63345597204126},
   'sw': {'lat': 55.741446090999986, 'lng': 37.60153262795873}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bf7fa734a67c928e27624cf',
       'name': 'Aleksandrovskiy Garden (Александровский сад)',
       'location': {'address': 'Манежная ул.',
        'lat': 55.75270677200052,
        'lng': 37.61373281478882,
        'labeledLatLngs': 

In [100]:
#res = requests.get('https://api.foursquare.com/v2/venues/54a05990498e03eea5fcc08c').json()
#res


In [92]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # the column is still named 'venue.categories' before the columns are cleaned
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [93]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# keep only the columns we need
filtered_columns = ['venue.id','venue.name', 'venue.categories', 'reasons.items','venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# replace the list of categories in each row with the name of the first category
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.iat[1,3]

[{'summary': 'This spot is popular',
  'type': 'general',
  'reasonName': 'globalInteractionReason'}]
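
To see where dotted column names such as `venue.location.lat` come from, the flattening step can be approximated in plain Python. The sketch below is not how `json_normalize` is implemented; `flatten` is a hypothetical helper and the record is a made-up item with the same nesting as the API response.

```python
def flatten(record, parent_key='', sep='.'):
    """Flatten nested dicts into one dict with dotted keys; lists are kept as they are."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

# Made-up item shaped like one entry of results['response']['groups'][0]['items']
item = {
    'reasons': {'count': 0, 'items': [{'summary': 'This spot is popular'}]},
    'venue': {
        'id': 'example-id',
        'name': 'Example Park',
        'location': {'lat': 55.75, 'lng': 37.61},
        'categories': [{'name': 'Park'}],
    },
}

flat = flatten(item)
print(sorted(flat))
print(flat['venue.categories'][0]['name'])
```

Because lists are not expanded, `venue.categories` and `reasons.items` stay as lists of dictionaries, which is why a helper like `get_category_type` is needed to pull out the category name.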

In [94]:
nearby_venues.shape

(83, 6)

In [95]:
nearby_venues

Unnamed: 0,id,name,categories,items,lat,lng
0,53fb052c498eacc7b13ad60d,КидБург,Arcade,"[{'summary': 'This spot is popular', 'type': '...",55.760178,37.624539
1,551bec5f498e82cfbfe296f6,Музей Детства в ЦДМ,Museum,"[{'summary': 'This spot is popular', 'type': '...",55.759787,37.624449
2,5496fd26498e9c88a826633e,Клаустрофобия,Arcade,"[{'summary': 'This spot is popular', 'type': '...",55.764167,37.628912
3,5623d6e5498efb95635afbf3,Формула кино,Multiplex,"[{'summary': 'This spot is popular', 'type': '...",55.760263,37.624862
4,551320ca498e5fecfbb1c3d7,Шоу динозавров,Theme Park,"[{'summary': 'This spot is popular', 'type': '...",55.760277,37.625998
5,4f76d997e4b0d693688e74cd,Музей экслибриса и миниатюрной книги,Museum,"[{'summary': 'This spot is popular', 'type': '...",55.760761,37.623263
6,53747d0d498e505d3e52b1fc,Café Vendome & Chocolate Boutique,Dessert Shop,"[{'summary': 'This spot is popular', 'type': '...",55.758951,37.625089
7,54a05990498e03eea5fcc08c,Клаустрофобия,Arcade,"[{'summary': 'This spot is popular', 'type': '...",55.758790,37.636062
8,57a75275498eae176c62c14f,Нереальное место,Arcade,"[{'summary': 'This spot is popular', 'type': '...",55.757385,37.633633
9,54a80c08498e18f46e788e28,Loft Cinema,Movie Theater,"[{'summary': 'This spot is popular', 'type': '...",55.764758,37.631597
