# Krasnodar neighbourhoods

## Introduction

I live in Krasnodar, Russia, and I want to explore the city's neighborhoods and compare how comfortable each one is to live in.

I will use my new data science skills to identify a few of the most promising neighborhoods based on this criterion.

## Data

Based on the definition of the problem, the factors that will influence my decision are:
* number of neighborhoods
* coordinates of each neighborhood
* distance of each neighborhood from the city center
* number of venues
* characteristics of venues

I decided to use a regularly spaced grid of locations, centered on the city center, to define the neighborhoods.

The following data sources will be needed to extract or generate the required information:
* the centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained using **Yandex Maps**
* the type, location, and other characteristics of venues in every neighborhood will be obtained using the **Foursquare API**
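One of the factors listed above is the distance of each neighborhood from the city center. A minimal sketch of how that distance could be computed with the haversine formula follows. The helper name `haversine_km` and the city-center coordinates are illustrative assumptions, not part of the original notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Assumed approximate city-center coordinates (illustrative only).
center_lat, center_lon = 45.0447, 38.9758

# Coordinates of one neighborhood row from neigh_kras.csv.
d = haversine_km(center_lat, center_lon, 45.057959, 39.000072)
print(round(d, 2), 'km from the assumed city center')
```

Applied to every row of the neighborhoods table, this would give one extra numeric column that could be used alongside the venue data.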

In [1]:
#import libraries
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

import requests
from pandas import json_normalize  # top-level import; requires pandas >= 1.0

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


In [2]:
#define Foursquare Credentials and Version
CLIENT_ID = 'LZWBCVIRMS4C1ABBDINNNWHN3SXL2NN3EE3IMAJ2ZUF2STZV' # my Foursquare ID
CLIENT_SECRET = 'PYTWNPWBEIT1MDHDOYVCCB10WCCSM1W2SCHLFSLG15CCU2BX' # my Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('My credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

My credentials:
CLIENT_ID: LZWBCVIRMS4C1ABBDINNNWHN3SXL2NN3EE3IMAJ2ZUF2STZV
CLIENT_SECRET:PYTWNPWBEIT1MDHDOYVCCB10WCCSM1W2SCHLFSLG15CCU2BX


In [3]:
#make DataFrame
neighborhoods_data = pd.read_csv('neigh_kras.csv')
neighborhoods_data.head(10)

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Краснодар,Завод Измерительных Приборов,45.057959,39.000072
1,Краснодар,Табачная Фабрика,45.044634,39.016405
2,Краснодар,Московский,45.068462,39.01276
3,Краснодар,40 лет Победы,45.050779,38.994438
4,Краснодар,Покровка,45.036582,39.0046
5,Краснодар,9-й километр,45.085224,38.982375
6,Краснодар,Фестивальный,45.061754,38.956619
7,Краснодар,Кожзавод,45.040994,38.943036
8,Краснодар,Хлопчато-бумажный Комбинат,45.028847,39.051673
9,Краснодар,РМЗ,45.030461,39.030954


## Analysis
### Using geopy library to get the latitude and longitude values of Krasnodar City

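The geopy library can look up a city's coordinates by name. To create a geocoder instance we would need to define a user_agent, for example kras_explorer.

Here the coordinates of the city center are set directly, in decimal degrees.

In [4]:
# the geographical coordinates of Krasnodar (about 45°02′41″ N, 38°58′33″ E), in decimal degrees
latitude = 45.0447
longitude = 38.9758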
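Map libraries such as folium expect decimal degrees, while coordinates are often quoted in degrees, minutes, and seconds. A minimal sketch of the conversion follows. The helper `dms_to_decimal` is hypothetical, and the example assumes the city center lies at about 45°02′41″ N, 38°58′33″ E.

```python
def dms_to_decimal(degrees, minutes, seconds):
    """Convert degrees, minutes, and seconds to decimal degrees."""
    return degrees + minutes / 60 + seconds / 3600


# Assumed city-center position: 45°02′41″ N, 38°58′33″ E.
center_lat = dms_to_decimal(45, 2, 41)
center_lon = dms_to_decimal(38, 58, 33)
print(round(center_lat, 4), round(center_lon, 4))  # 45.0447 38.9758
```

A quick sanity check is that the result falls inside the range of the neighborhood coordinates in the table (latitudes near 45.03 to 45.09, longitudes near 38.94 to 39.05).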

### Creating a map of Krasnodar with neighborhoods superimposed on top

In [5]:
import folium
# create map using latitude and longitude values
map_kras = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for neighborhood, lat, lng in zip(neighborhoods_data['Neighborhood'], neighborhoods_data['Latitude'], neighborhoods_data['Longitude']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_kras)  
    
map_kras

We get the 'Завод Измерительных Приборов' neighborhood's latitude and longitude values.

In [6]:
neighborhood_name = neighborhoods_data.loc[0, 'Neighborhood'] # neighborhood name
neighborhood_latitude = neighborhoods_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = neighborhoods_data.loc[0, 'Longitude'] # neighborhood longitude value

print("Latitude and longitude of the '{}' neighborhood: {}, {}.".format(neighborhood_name, neighborhood_latitude, neighborhood_longitude))

Latitude and longitude of the 'Завод Измерительных Приборов' neighborhood: 45.057959000000004, 39.000071999999996.


Now, let's get the top 100 venues that are in 'Завод Измерительных Приборов' within a radius of 500 meters.
First, let's create the GET request URL.

In [7]:
radius = 500 # define radius in meters
LIMIT = 100
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius,
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=LZWBCVIRMS4C1ABBDINNNWHN3SXL2NN3EE3IMAJ2ZUF2STZV&client_secret=PYTWNPWBEIT1MDHDOYVCCB10WCCSM1W2SCHLFSLG15CCU2BX&v=20180604&ll=45.057959000000004,39.000071999999996&radius=500&limit=100'
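The same request URL could also be assembled from a parameter dictionary, which avoids hand-formatting the query string and keeps credentials out of the notebook. The sketch below only prepares the request and does not send it. The environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions made for illustration.

```python
import os

import requests

params = {
    # Hypothetical environment variables; placeholders are used when they are unset.
    'client_id': os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID'),
    'client_secret': os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET'),
    'v': '20180604',
    'll': '45.057959,39.000072',
    'radius': 500,
    'limit': 100,
}

# Build the request without sending it, so the final URL can be inspected offline.
prepared = requests.Request(
    'GET', 'https://api.foursquare.com/v2/venues/explore', params=params
).prepare()
print(prepared.url.split('?')[0])
```

Passing the same `params` dictionary to `requests.get` would send the request, with `requests` handling the URL encoding.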

Send the GET request and examine the results.

In [8]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e1c7ec102a1720028efd1a8'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Current map view',
  'headerFullLocation': 'Current map view',
  'headerLocationGranularity': 'unknown',
  'totalResults': 14,
  'suggestedBounds': {'ne': {'lat': 45.062459004500006,
    'lng': 39.006430520514314},
   'sw': {'lat': 45.0534589955, 'lng': 38.99371347948568}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5141da26e4b00feff76cd5f4',
       'name': 'Парк 40 Лет Победы',
       'location': {'lat': 45.05505976539333,
        'lng': 39.001201778250696,
        'labeledLatLngs': [{'label': 'display',
          'lat': 45.05505976539333,
          'lng': 39.00120

From the Foursquare lab, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [9]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
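To see what the function does, it can be tried on a few hand-made rows. This is a small sketch with made-up sample data. Plain dictionaries stand in for DataFrame rows, and the function is repeated here only so the example runs on its own.

```python
def get_category_type(row):
    """Return the name of the first category of a venue row, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# Made-up rows covering the three cases the function handles.
flat_row = {'categories': [{'name': 'Park'}, {'name': 'Garden'}]}
nested_row = {'venue.categories': [{'name': 'Bakery'}]}
empty_row = {'venue.categories': []}

print(get_category_type(flat_row))    # Park
print(get_category_type(nested_row))  # Bakery
print(get_category_type(empty_row))   # None
```

Only the first category of each venue is kept, so a venue tagged with several categories is counted under its primary one.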

Now we are ready to clean the json and structure it into a pandas dataframe.

In [10]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Парк 40 Лет Победы,Park,45.05506,39.001202
1,Книжный Рынок,Bookstore,45.05547,38.995139
2,"Диско Бар ""Винегрет""",Bar,45.054994,39.001358
3,Пирогово,Bakery,45.054675,38.995959
4,Sushi City,Sushi Restaurant,45.057416,39.002344


In [11]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

14 venues were returned by Foursquare.


### Explore Neighborhoods
Let's create a function that repeats the same process for all the neighborhoods in Krasnodar.

In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
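The function above indexes `categories[0]` directly, which would raise an `IndexError` for a venue that has no category. A hedged sketch of a more defensive extraction step follows. The helper `extract_venue_rows` is hypothetical and the sample items are made up to mimic the shape of the response shown earlier.

```python
def extract_venue_rows(name, lat, lng, items):
    """Turn a list of Foursquare 'items' into flat tuples, tolerating missing fields."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((
            name,
            lat,
            lng,
            venue.get('name'),
            location.get('lat'),
            location.get('lng'),
            category,
        ))
    return rows


# Made-up items: one complete venue and one without any category.
sample_items = [
    {'venue': {'name': 'Park', 'location': {'lat': 45.05, 'lng': 39.0},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Unlabeled', 'location': {'lat': 45.06, 'lng': 39.01},
               'categories': []}},
]

rows = extract_venue_rows('Toy', 45.0, 39.0, sample_items)
for row in rows:
    print(row)
```

Rows with a `None` category could then be dropped or kept under a placeholder label before the one-hot encoding step.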

Now we write the code to run the above function on each neighborhood and create a new dataframe called nearby_venues.

In [13]:
nearby_venues = getNearbyVenues(names=neighborhoods_data['Neighborhood'],
                                   latitudes=neighborhoods_data['Latitude'],
                                   longitudes=neighborhoods_data['Longitude']
                                  )

Завод Измерительных Приборов
Табачная Фабрика
Московский
40 лет Победы
Покровка
9-й километр
Фестивальный
Кожзавод
Хлопчато-бумажный Комбинат
РМЗ
Дубинка
Черёмушки
Школьный
Центральный
Юбилейный
аул Новая Адыгея
Музыкальный
Пашковский
Слявянский
Микрохирургии глаза
Комсомольский
Энка (п. Жукова)
Гидрострой
Репино
Восточно-Кругликовский
Знаменский
Новознаменский
Аэропорт
Рубероидный
Сельхозинститут
ТЭЦ
Горхутор
Молодежный
Краевая Больница
Ул. Российская
Витаминкомбинат
Ростовское шоссе
Баскет-Холл
Поселок Северный
Горгаз
Панорама
Красная Площадь
Ипподром
Аврора
КСК
9-я Тихая


In [14]:
print(nearby_venues.shape)
nearby_venues.head()

(364, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Завод Измерительных Приборов,45.057959,39.000072,Парк 40 Лет Победы,45.05506,39.001202,Park
1,Завод Измерительных Приборов,45.057959,39.000072,Книжный Рынок,45.05547,38.995139,Bookstore
2,Завод Измерительных Приборов,45.057959,39.000072,"Диско Бар ""Винегрет""",45.054994,39.001358,Bar
3,Завод Измерительных Приборов,45.057959,39.000072,Пирогово,45.054675,38.995959,Bakery
4,Завод Измерительных Приборов,45.057959,39.000072,Sushi City,45.057416,39.002344,Sushi Restaurant


Let's check how many venues were returned for each neighborhood.

In [15]:
nearby_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
40 лет Победы,4,4,4,4,4,4
9-й километр,4,4,4,4,4,4
9-я Тихая,5,5,5,5,5,5
Аврора,22,22,22,22,22,22
Аэропорт,10,10,10,10,10,10
Баскет-Холл,2,2,2,2,2,2
Витаминкомбинат,2,2,2,2,2,2
Восточно-Кругликовский,2,2,2,2,2,2
Гидрострой,3,3,3,3,3,3
Горгаз,9,9,9,9,9,9


Let's find out how many unique categories can be curated from all the returned venues.

In [16]:
print('There are {} unique categories.'.format(len(nearby_venues['Venue Category'].unique())))

There are 132 unique categories.


### Analyze Each Neighbourhood

In [17]:
# one hot encoding
nearby_onehot = pd.get_dummies(nearby_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
nearby_onehot['Neighborhood'] = nearby_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [nearby_onehot.columns[-1]] + list(nearby_onehot.columns[:-1])
nearby_onehot = nearby_onehot[fixed_columns]

nearby_onehot.head()

Unnamed: 0,Neighborhood,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Aquarium,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Bakery,Bar,Beer Store,Big Box Store,Bistro,Blini House,Boat or Ferry,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Burger Joint,Bus Station,Bus Stop,Cafeteria,Café,Cheese Shop,Circus,Clothing Store,Coffee Shop,Concert Hall,Convenience Store,Cultural Center,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Eastern European Restaurant,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gaming Cafe,Gastropub,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Health & Beauty Service,Hookah Bar,Hostel,Hotel,Ice Cream Shop,Internet Cafe,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Karaoke Bar,Kids Store,Light Rail Station,Liquor Store,Market,Middle Eastern Restaurant,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater,Moving Target,Music Venue,National Park,Nightclub,Noodle House,Office,Other Great Outdoors,Outdoor Sculpture,Park,Pastry Shop,Pedestrian Plaza,Pelmeni House,Pet Store,Pharmacy,Photography Lab,Pizza Place,Plaza,Pool,Print Shop,Pub,Record Shop,Rental Car Location,Rest Area,Restaurant,Rock Club,Russian Restaurant,Salad Place,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shopping Mall,Skating Rink,Snack Place,Soccer Field,Soccer Stadium,Spa,Sporting Goods Shop,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Theater,Theme Park,Toy / Game Store,Trail,Train Station,Tram Station,Tree,Varenyky restaurant,Vegetarian / Vegan Restaurant,Warehouse Store
0,Завод Измерительных Приборов,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Завод Измерительных Приборов,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Завод Измерительных Приборов,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Завод Измерительных Приборов,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Завод Измерительных Приборов,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0


In [18]:
nearby_onehot.shape

(364, 133)

Next, let's group the rows by neighborhood and take the mean of each one-hot column, which gives the frequency of occurrence of each category.

In [19]:
nearby_grouped = nearby_onehot.groupby('Neighborhood').mean().reset_index()
nearby_grouped.head()

Unnamed: 0,Neighborhood,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Aquarium,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Bakery,Bar,Beer Store,Big Box Store,Bistro,Blini House,Boat or Ferry,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Burger Joint,Bus Station,Bus Stop,Cafeteria,Café,Cheese Shop,Circus,Clothing Store,Coffee Shop,Concert Hall,Convenience Store,Cultural Center,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Eastern European Restaurant,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gaming Cafe,Gastropub,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Health & Beauty Service,Hookah Bar,Hostel,Hotel,Ice Cream Shop,Internet Cafe,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Karaoke Bar,Kids Store,Light Rail Station,Liquor Store,Market,Middle Eastern Restaurant,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater,Moving Target,Music Venue,National Park,Nightclub,Noodle House,Office,Other Great Outdoors,Outdoor Sculpture,Park,Pastry Shop,Pedestrian Plaza,Pelmeni House,Pet Store,Pharmacy,Photography Lab,Pizza Place,Plaza,Pool,Print Shop,Pub,Record Shop,Rental Car Location,Rest Area,Restaurant,Rock Club,Russian Restaurant,Salad Place,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shopping Mall,Skating Rink,Snack Place,Soccer Field,Soccer Stadium,Spa,Sporting Goods Shop,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Theater,Theme Park,Toy / Game Store,Trail,Train Station,Tram Station,Tree,Varenyky restaurant,Vegetarian / Vegan Restaurant,Warehouse Store
0,40 лет Победы,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,9-й километр,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,9-я Тихая,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Аврора,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Аэропорт,0.1,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
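The reason the mean of one-hot columns equals a relative frequency is easier to see on a tiny example. The sketch below uses made-up neighborhoods `A` and `B` rather than the real data.

```python
import pandas as pd

# Made-up venues: four in neighborhood A, two in neighborhood B.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Bar', 'Hotel', 'Park'],
})

# One row per venue, one 0/1 column per category.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of a 0/1 column within a group is the share of venues in that category.
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

Each row of `freq` sums to 1, so neighborhoods with very few venues get large values such as 0.5 or 0.25, as seen in the table above.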


Let's print each neighborhood along with the top 5 most common venues.

In [20]:
num_top_venues = 5

for hood in nearby_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = nearby_grouped[nearby_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----40 лет Победы----
                venue  freq
0  Light Rail Station  0.50
1              Bakery  0.25
2             Brewery  0.25
3         Pizza Place  0.00
4     Photography Lab  0.00


----9-й километр----
               venue  freq
0     Soccer Stadium  0.25
1          Gastropub  0.25
2  Convenience Store  0.25
3           Tea Room  0.25
4               Park  0.00


----9-я Тихая----
               venue  freq
0          Nightclub   0.2
1  Electronics Store   0.2
2      Shopping Mall   0.2
3                Bar   0.2
4            Brewery   0.2


----Аврора----
                venue  freq
0                Café  0.09
1        Dessert Shop  0.05
2         Supermarket  0.05
3         Pizza Place  0.05
4  Light Rail Station  0.05


----Аэропорт----
             venue  freq
0      Coffee Shop   0.2
1             Café   0.2
2   Airport Lounge   0.1
3           Bistro   0.1
4  Airport Service   0.1


----Баскет-Холл----
            venue  freq
0   Big Box Store   0.5
1            Café  

Let's put that into a pandas dataframe.

First, let's write a function to sort the venues in descending order.

In [21]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
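A short check of the function on a made-up row shows what it returns. The row below is illustrative only. Its first entry plays the role of the Neighborhood column, which the function skips.

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    """Return the category names with the highest frequencies in one grouped row."""
    row_categories = row.iloc[1:]  # skip the Neighborhood entry
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# Made-up frequencies with no ties.
row = pd.Series({'Neighborhood': 'Toy', 'Bar': 0.1, 'Café': 0.5, 'Hotel': 0.0, 'Park': 0.4})
top3 = list(return_most_common_venues(row, 3))
print(top3)  # ['Café', 'Park', 'Bar']
```

When several categories share the same frequency, their relative order depends on the sort algorithm, so tied categories may appear in a different order from one listing to another.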

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [32]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = nearby_grouped['Neighborhood']

for ind in np.arange(nearby_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(nearby_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,40 лет Победы,Light Rail Station,Bakery,Brewery,Warehouse Store,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market
1,9-й километр,Convenience Store,Soccer Stadium,Gastropub,Tea Room,Fountain,Food & Drink Shop,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market
2,9-я Тихая,Shopping Mall,Brewery,Nightclub,Electronics Store,Bar,Warehouse Store,Flea Market,Food & Drink Shop,Flower Shop,Fast Food Restaurant
3,Аврора,Café,Mobile Phone Shop,Light Rail Station,Cupcake Shop,Dessert Shop,Pizza Place,Plaza,Pool,Coffee Shop,Restaurant
4,Аэропорт,Coffee Shop,Café,Airport Lounge,Bistro,Restaurant,Rental Car Location,Airport Terminal,Airport Service,Eastern European Restaurant,Electronics Store
5,Баскет-Холл,Café,Big Box Store,Warehouse Store,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market,French Restaurant
6,Витаминкомбинат,Italian Restaurant,Rest Area,Warehouse Store,Farm,Food & Drink Shop,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market,Eastern European Restaurant
7,Восточно-Кругликовский,Supermarket,Moving Target,Warehouse Store,Farmers Market,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fast Food Restaurant,Farm
8,Гидрострой,Market,Gastropub,Bakery,French Restaurant,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fast Food Restaurant,Warehouse Store
9,Горгаз,Hotel,Moving Target,Gourmet Shop,Bus Station,Café,Market,Shopping Mall,Light Rail Station,Farm,Electronics Store


### Cluster Neighborhoods

Run k-means to group the neighborhoods into 5 clusters.

The KMeans class has many parameters that can be used, but we will be using these three:
<ul>
    <li> <b>init</b>: Initialization method of the centroids. </li>
    <ul>
        <li> Value will be: "k-means++" </li>
        <li> k-means++: Selects initial cluster centers for k-means clustering in a smart way to speed up convergence.</li>
    </ul>
    <li> <b>n_clusters</b>: The number of clusters to form as well as the number of centroids to generate. </li>
    <ul> <li> Value will be: 5</li> </ul>
    <li> <b>n_init</b>: Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia. </li>
    <ul> <li> Value will be: 10 </li> </ul>
</ul>

Initialize KMeans with these parameters; the resulting estimator object is called <b>kmeans</b>.

In [33]:
# set number of clusters
kclusters = 5
nearby_grouped_clustering = nearby_grouped.drop(columns='Neighborhood')
# run k-means clustering
kmeans = KMeans(init = "k-means++", n_clusters = kclusters,  n_init = 10)
kmeans.fit(nearby_grouped_clustering)
labels = kmeans.labels_
print(labels)

[4 3 3 3 3 3 3 3 3 0 3 3 3 1 3 3 0 3 3 3 3 3 3 0 3 3 0 0 3 0 0 3 2 3 3 4 0
 0 3 3 3 3 3 3 3 3]
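Because k-means starts from randomly chosen centroids, the label numbers printed above can differ between runs. A small sketch on made-up data shows how fixing `random_state` makes the run repeatable. Setting `random_state` is an assumption added here and is not used in the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up frequency vectors: three well-separated pairs of points.
X = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 1.0],
    [0.1, 0.0, 0.9],
])

# random_state fixes the centroid seeds so repeated runs give the same labels.
kmeans_toy = KMeans(init='k-means++', n_clusters=3, n_init=10, random_state=0).fit(X)
toy_labels = kmeans_toy.labels_

# Count how many points ended up in each cluster.
print(toy_labels, np.bincount(toy_labels))
```

Counting labels in this way on the real data would show how unevenly the neighborhoods are spread across the five clusters.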


Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [34]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
nearby_merged = neighborhoods_data

# merge grouped with data to add latitude/longitude for each neighborhood
nearby_merged = nearby_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
nearby_merged # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Краснодар,Завод Измерительных Приборов,45.057959,39.000072,3,Bookstore,Light Rail Station,Motel,Department Store,Mobile Phone Shop,Fast Food Restaurant,Bar,Bakery,Sushi Restaurant,Coffee Shop
1,Краснодар,Табачная Фабрика,45.044634,39.016405,0,Hotel,Pet Store,Auto Workshop,Grocery Store,Brewery,Print Shop,Dessert Shop,Diner,Department Store,Food Court
2,Краснодар,Московский,45.068462,39.01276,3,Shopping Mall,Café,Pedestrian Plaza,Gym,Grocery Store,Big Box Store,Food & Drink Shop,Flower Shop,Flea Market,Fast Food Restaurant
3,Краснодар,40 лет Победы,45.050779,38.994438,4,Light Rail Station,Bakery,Brewery,Warehouse Store,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market
4,Краснодар,Покровка,45.036582,39.0046,0,Hotel,Flower Shop,Office,Pet Store,American Restaurant,Bus Station,Bus Stop,Grocery Store,Nightclub,Middle Eastern Restaurant
5,Краснодар,9-й километр,45.085224,38.982375,3,Convenience Store,Soccer Stadium,Gastropub,Tea Room,Fountain,Food & Drink Shop,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market
6,Краснодар,Фестивальный,45.061754,38.956619,3,Café,Food & Drink Shop,Park,Mobile Phone Shop,Bookstore,Department Store,Dessert Shop,Dance Studio,Pharmacy,Gym / Fitness Center
7,Краснодар,Кожзавод,45.040994,38.943036,0,Hotel,Bar,Asian Restaurant,Food & Drink Shop,Spa,Cheese Shop,Diner,Eastern European Restaurant,Electronics Store,Farm
8,Краснодар,Хлопчато-бумажный Комбинат,45.028847,39.051673,3,Electronics Store,Fast Food Restaurant,Hotel,Noodle House,Shopping Mall,Burger Joint,Mobile Phone Shop,Blini House,Sandwich Place,Sushi Restaurant
9,Краснодар,РМЗ,45.030461,39.030954,0,Hotel,Liquor Store,Bus Stop,Pizza Place,Gym / Fitness Center,Restaurant,Moving Target,Department Store,Pharmacy,Café


Finally, let's visualize the resulting clusters.

In [35]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

In [36]:
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(nearby_merged['Latitude'], nearby_merged['Longitude'],
                                  nearby_merged['Neighborhood'], nearby_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
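The marker loop uses the cluster label as a list index, so it relies on every neighborhood having an integer label. If a neighborhood returned no venues, the join would leave its label as NaN and turn the column into floats. A hedged sketch of one way to guard against that follows, using made-up data and hypothetical variable names.

```python
import pandas as pd

# Made-up data: neighborhood B has no clustered venues.
neighborhoods = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Latitude': [45.0, 45.1, 45.2],
    'Longitude': [39.0, 39.1, 39.2],
})
clustered = pd.DataFrame({'Neighborhood': ['A', 'C'], 'Cluster Labels': [0, 1]})

merged = neighborhoods.join(clustered.set_index('Neighborhood'), on='Neighborhood')

# Keep only labelled neighborhoods and restore integer labels before indexing colors.
plottable = merged.dropna(subset=['Cluster Labels']).copy()
plottable['Cluster Labels'] = plottable['Cluster Labels'].astype(int)
print(plottable)
```

Unlabelled neighborhoods could alternatively be drawn in a neutral color instead of being dropped.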

### Examine Clusters

Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster.

In [37]:
#cluster 1
nearby_merged.loc[nearby_merged['Cluster Labels'] == 0, nearby_merged.columns[[1] + list(range(5, nearby_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Табачная Фабрика,Hotel,Pet Store,Auto Workshop,Grocery Store,Brewery,Print Shop,Dessert Shop,Diner,Department Store,Food Court
4,Покровка,Hotel,Flower Shop,Office,Pet Store,American Restaurant,Bus Station,Bus Stop,Grocery Store,Nightclub,Middle Eastern Restaurant
7,Кожзавод,Hotel,Bar,Asian Restaurant,Food & Drink Shop,Spa,Cheese Shop,Diner,Eastern European Restaurant,Electronics Store,Farm
9,РМЗ,Hotel,Liquor Store,Bus Stop,Pizza Place,Gym / Fitness Center,Restaurant,Moving Target,Department Store,Pharmacy,Café
16,Музыкальный,Pizza Place,Bus Stop,Café,Hotel,Health & Beauty Service,Gym / Fitness Center,Deli / Bodega,Department Store,Dessert Shop,Diner
17,Пашковский,Hotel,Auto Workshop,Gym / Fitness Center,Gym,Restaurant,Bakery,Pool,Athletics & Sports,Asian Restaurant,Dessert Shop
23,Репино,Hotel,Bakery,Irish Pub,Diner,German Restaurant,Cafeteria,Food & Drink Shop,Flower Shop,Flea Market,Fast Food Restaurant
34,Ул. Российская,Spa,Warehouse Store,Farmers Market,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fast Food Restaurant,Farm,French Restaurant
39,Горгаз,Hotel,Moving Target,Gourmet Shop,Bus Station,Café,Market,Shopping Mall,Light Rail Station,Farm,Electronics Store


In [38]:
#cluster 2
nearby_merged.loc[nearby_merged['Cluster Labels'] == 1, nearby_merged.columns[[1] + list(range(5, nearby_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
25,Знаменский,Farm,Warehouse Store,French Restaurant,Deli / Bodega,Department Store,Dessert Shop,Diner,Eastern European Restaurant,Electronics Store,Farmers Market


In [39]:
#cluster 3
nearby_merged.loc[nearby_merged['Cluster Labels'] == 2, nearby_merged.columns[[1] + list(range(5, nearby_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
28,Рубероидный,Flower Shop,Warehouse Store,French Restaurant,Deli / Bodega,Department Store,Dessert Shop,Diner,Eastern European Restaurant,Electronics Store,Farm


In [40]:
#cluster 4
nearby_merged.loc[nearby_merged['Cluster Labels'] == 3, nearby_merged.columns[[1] + list(range(5, nearby_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Завод Измерительных Приборов,Bookstore,Light Rail Station,Motel,Department Store,Mobile Phone Shop,Fast Food Restaurant,Bar,Bakery,Sushi Restaurant,Coffee Shop
2,Московский,Shopping Mall,Café,Pedestrian Plaza,Gym,Grocery Store,Big Box Store,Food & Drink Shop,Flower Shop,Flea Market,Fast Food Restaurant
5,9-й километр,Convenience Store,Soccer Stadium,Gastropub,Tea Room,Fountain,Food & Drink Shop,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market
6,Фестивальный,Café,Food & Drink Shop,Park,Mobile Phone Shop,Bookstore,Department Store,Dessert Shop,Dance Studio,Pharmacy,Gym / Fitness Center
8,Хлопчато-бумажный Комбинат,Electronics Store,Fast Food Restaurant,Hotel,Noodle House,Shopping Mall,Burger Joint,Mobile Phone Shop,Blini House,Sandwich Place,Sushi Restaurant
10,Дубинка,Boat or Ferry,German Restaurant,Rock Club,Moving Target,Warehouse Store,Fast Food Restaurant,Food Court,Food & Drink Shop,Flower Shop,Flea Market
11,Черёмушки,Coffee Shop,Eastern European Restaurant,Intersection,Dessert Shop,Japanese Restaurant,Movie Theater,Grocery Store,Farmers Market,Food & Drink Shop,Flower Shop
12,Школьный,Shopping Mall,Flower Shop,Burger Joint,Coffee Shop,Flea Market,Park,Café,Warehouse Store,Farmers Market,Fast Food Restaurant
13,Центральный,Coffee Shop,Restaurant,Theater,Burger Joint,Pizza Place,Gastropub,Pelmeni House,Circus,Concert Hall,Deli / Bodega
14,Юбилейный,Park,Restaurant,Fast Food Restaurant,Light Rail Station,Outdoor Sculpture,Pharmacy,Gym / Fitness Center,Pizza Place,Department Store,Pub


In [41]:
#cluster 5
nearby_merged.loc[nearby_merged['Cluster Labels'] == 4, nearby_merged.columns[[1] + list(range(5, nearby_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,40 лет Победы,Light Rail Station,Bakery,Brewery,Warehouse Store,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market
30,ТЭЦ,Light Rail Station,Warehouse Store,Farm,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market,Electronics Store
