# Final Project - The Battle of the Neighborhoods!

## Introduction

### Problem description
In this project I examine the social texture of the city of Jerusalem. The analysis concludes with a clustering of the city's neighborhoods based on the venues found in each of them. Because Jerusalem has a unique socio-demographic makeup, the results should interest anyone who cares about how the city's communities differ.

### Data description
I use data from Wikipedia (downloaded and parsed), coordinates from a geocoder and the Google Geocoding API, and venue lists from the Foursquare API.

### Methodology
1. Scrape neighborhood data from Wikipedia using BeautifulSoup, then parse, wrangle, and clean it.
2. Extract longitude and latitude coordinates using geocoder.
3. Validate the base data (manually and by visualization).
4. Extract the Jerusalem latitude/longitude for the map using the Google Geocoding API.
5. Fetch the venue list for each neighborhood.
6. Prepare the data for clustering (one-hot encoding per unique venue category, etc.).
7. Run the clustering algorithm.
8. Visualize the results.
9. Summarize and discuss the results.

In [2]:
from pprint import pprint
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from pandas.io.json import json_normalize
import pyproj
#import math
from keys import google_api, foursquare_api
import geocoder 
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans

### grabbing, parsing and wrangling data from wiki

In [3]:
jerusalem_raw = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_of_Jerusalem").text

In [4]:
soup = BeautifulSoup(jerusalem_raw, 'html.parser')
neighborhoodList = []
for row in soup.find_all("div", class_="mw-category")[1].findAll("li"):
    neighborhoodList.append(row.text)

jerusalem_df = pd.DataFrame({"Neighborhood": neighborhoodList})

jerusalem_df = jerusalem_df[1:]
jerusalem_df.index  = range(len(jerusalem_df))

print(jerusalem_df)     #present initial neighborhood list

Neighborhood
0                       Abu Tor
1    American Colony, Jerusalem
2              Armenian Quarter
3            Armon (given name)
4                        Arnona
..                          ...
146                 Yemin Moshe
147               Zikhron Moshe
148               Zikhron Tuvya
149               Zikhron Yosef
150                  Mount Zion

[151 rows x 1 columns]
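
Several scraped titles carry Wikipedia disambiguation suffixes (for example "American Colony, Jerusalem" or "Armon (given name)"), and these are later sent to the geocoder as part of the query. Below is a minimal sketch of stripping such suffixes. `clean_name` is a hypothetical helper that is not part of the original notebook, and the regular expressions are an assumption about which suffixes matter.

```python
import re

def clean_name(title):
    """Strip a trailing parenthetical qualifier and a trailing ', Jerusalem' (hypothetical helper)."""
    title = re.sub(r"\s*\([^)]*\)\s*$", "", title)   # drop "(neighborhood)", "(Jerusalem)", ...
    title = re.sub(r",\s*Jerusalem\s*$", "", title)  # drop ", Jerusalem"
    return title.strip()

# Titles taken from the scraped list and from the outlier filter used later on
examples = ["American Colony, Jerusalem", "Rassco (neighborhood)", "E1 (Jerusalem)", "Abu Tor"]
cleaned = [clean_name(t) for t in examples]
print(cleaned)
```

If something like this is used, storing the cleaned names in a separate column keeps the later name-based filters working. Note that it does not remove entries that are not neighborhoods at all, such as "Armon (given name)".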


In [5]:
# helper that geocodes a neighborhood name into coordinates
def get_latlng(neighborhood, max_tries=5):
    """
    grabs latitude and longitude based on address
    """
    lat_lng_coords = None   # initialize your variable to None
    tries = 0
    while lat_lng_coords is None and tries < max_tries:  # retry a bounded number of times
        g = geocoder.arcgis('{}, Jerusalem, Israel'.format(neighborhood))
        lat_lng_coords = g.latlng
        tries += 1
    # fall back to empty coordinates instead of looping forever
    return lat_lng_coords if lat_lng_coords is not None else [None, None]

In [6]:
coord_list = [get_latlng(neighborhood) for neighborhood in jerusalem_df["Neighborhood"].tolist()]

In [7]:
coord_df = pd.DataFrame(coord_list, columns=['Latitude', 'Longitude'])
df = jerusalem_df.copy()
df['Latitude'] = coord_df['Latitude']
df['Longitude'] = coord_df['Longitude']

### display full neighborhood list + coordinates for validation (can see one outlier immediately)

In [8]:
with pd.option_context('display.max_rows', None, 'display.max_columns', None):  # more options can be specified also
    print(df)

Neighborhood   Latitude   Longitude
0                             Abu Tor  31.863790   35.177161
1          American Colony, Jerusalem  31.789720   35.229160
2                    Armenian Quarter  31.774950   35.229840
3                  Armon (given name)  31.780030   35.218730
4                              Arnona  31.744220   35.220620
5                        Arzei HaBira  31.793129   35.224803
6                              Atarot  31.860690   35.219800
7                        Bab a-Zahara  31.786011   35.232731
8                            Bab Huta  31.781844   35.235252
9                     Baka, Jerusalem  31.756850   35.220750
10                      Batei Munkacs  31.780030   35.218730
11                      Batei Saidoff  31.786620   35.210190
12                      Batei Ungarin  31.780030   35.218730
13                       Batei Warsaw  31.780030   35.218730
14                        Bayit VeGan  31.767604   35.184851
15                         Beit David  31.793680 

In [9]:
# removing outliers (based on coordinates/ visualization issues)
df.drop(df[(df['Neighborhood']=='Rassco (neighborhood)') | (df['Neighborhood']=='Kiryat HaLeom') | (df['Neighborhood']=='E1 (Jerusalem)')].index, inplace=True)
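
The outliers above were picked by eye. A rough programmatic check could flag points that lie far from the city center, plus groups of neighborhoods that share identical coordinates (several rows in the printed table do). This is a sketch under assumptions: the 15 km cutoff and the helper name `haversine_km` are mine, and the last row of the toy frame is invented.

```python
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

# Toy frame: three rows from the printed output plus one invented far-away point
sample = pd.DataFrame({
    "Neighborhood": ["Armenian Quarter", "Batei Munkacs", "Batei Ungarin", "Made-up outlier"],
    "Latitude": [31.774950, 31.780030, 31.780030, 32.080000],
    "Longitude": [35.229840, 35.218730, 35.218730, 34.780000],
})

CENTER = (31.768319, 35.21371)   # Jerusalem center as printed in the notebook
MAX_KM = 15                      # assumed cutoff, tune by eye

sample["km_from_center"] = haversine_km(sample["Latitude"], sample["Longitude"], *CENTER)
sample["too_far"] = sample["km_from_center"] > MAX_KM
# identical coordinates under different names may mean the geocoder fell back to a default point
sample["shared_coords"] = sample.duplicated(["Latitude", "Longitude"], keep=False)
print(sample)
```

Flagged rows still deserve a manual look before dropping, since some suburbs legitimately sit far from the center.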

### create map and visualize the above neighborhoods

In [10]:
def get_coordinates(api_key, address, verbose=False):
    """
    grabbing lat/lng from the Google Geocoding API based on address
    """
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json'
        # let requests URL-encode the key and the address
        response = requests.get(url, params={'key': api_key, 'address': address}).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        geographical_data = results[0]['geometry']['location']
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except Exception:  # network error, bad JSON, or no results
        return [None, None]

google_api_key = google_api().key   #get google API key from file
address = 'jerusalem'
jerusalem_center = get_coordinates(google_api_key, address)

# manual backup
if jerusalem_center == [None, None]:
    jerusalem_center = [31.768319, 35.21371]  #extracted with key

# always define these, since the map cells reuse them
latitude, longitude = jerusalem_center

print(jerusalem_center)

[31.768319, 35.21371]


In [11]:
# generate map
map_jer = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_jer)  
    
map_jer

In [23]:
map_jer.save('map_jer1.html')   # saving the map

### utilize Foursquare - grab tokens, and get venues per neighborhood

In [12]:
x = foursquare_api()    # load api credentials from file

CLIENT_ID = x.client_id
CLIENT_SECRET = x.cilent_secret
VERSION = '20180605'

In [13]:
# define limit = 100 & radius = 500(meters)
LIMIT = 100
radius = 500

venues = []

# iterate over the neighborhoods from the above list and grab venues using the foursquare API
for neighborhood, lat, long in zip(df.Neighborhood, df.Latitude, df.Longitude):
    output = None   # so the except block can print it even if the request itself failed
    try:
        url = f'https://api.foursquare.com/v2/venues/explore?client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&v={VERSION}&ll={lat},{long}&radius={radius}&limit={LIMIT}'
        output = requests.get(url).json()["response"]
        results = output['groups'][0]['items']

        # stores only the relevant information for every venue in the set radius
        for venue in results:
            venues.append((
                neighborhood,
                lat, 
                long, 
                venue['venue']['name'], 
                venue['venue']['location']['lat'], 
                venue['venue']['location']['lng'],  
                venue['venue']['categories'][0]['name']))
    except Exception:
        print(f'failed on neighborhood={neighborhood}')
        pprint(output)
        break
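
The loop above mixes the HTTP call with the parsing, so one venue without a category aborts the whole run. As a sketch, the parsing can be pulled into a small function and tried on a hand-written dictionary without touching the network. `extract_venues` is a hypothetical helper, and `fake` only mimics the fields that the loop reads; it is not a real API response.

```python
def extract_venues(neighborhood, lat, lng, response):
    """Flatten one explore-style response into row tuples (hypothetical helper)."""
    rows = []
    groups = response.get("groups") or []
    items = groups[0].get("items", []) if groups else []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None  # tolerate a missing category
        rows.append((neighborhood, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Hand-written stand-in for the JSON found under "response"
fake = {"groups": [{"items": [
    {"venue": {"name": "Olive Tree Hotel",
               "location": {"lat": 31.789357, "lng": 35.228211},
               "categories": [{"name": "Hotel"}]}},
    {"venue": {"name": "Unnamed spot",
               "location": {"lat": 31.79, "lng": 35.23},
               "categories": []}},
]}]}

rows = extract_venues("American Colony, Jerusalem", 31.78972, 35.22916, fake)
empty = extract_venues("Nowhere", 0.0, 0.0, {"groups": []})
print(rows)
print(empty)
```

Each tuple has the same seven fields, in the same order, as the ones appended to `venues` in the loop.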

In [14]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues, columns=['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory'])

print(venues_df.shape)
venues_df

(3038, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Abu Tor,31.863790,35.177161,מנפיס - גבעת זאב,31.862652,35.173940,Hot Dog Joint
1,"American Colony, Jerusalem",31.789720,35.229160,American Colony Hotel (אמריקן קולוני),31.789819,35.229448,Hotel
2,"American Colony, Jerusalem",31.789720,35.229160,Leonardo Hotel,31.788548,35.227600,Hotel
3,"American Colony, Jerusalem",31.789720,35.229160,Olive Tree Hotel,31.789357,35.228211,Hotel
4,"American Colony, Jerusalem",31.789720,35.229160,Cellar Bar at American Colony Hotel,31.789886,35.229359,Restaurant
...,...,...,...,...,...,...,...
3033,Mount Zion,31.772253,35.228600,Montefiore / מונטיפיורי,31.771307,35.224519,Italian Restaurant
3034,Mount Zion,31.772253,35.228600,Skizza,31.769041,35.225845,Art Gallery
3035,Mount Zion,31.772253,35.228600,Judean Desert,31.771872,35.223881,Outdoors & Recreation
3036,Mount Zion,31.772253,35.228600,מלון הר ציון,31.768594,35.226533,Hotel


We can see a low number of results: with a limit of 100 venues per neighborhood, roughly 14k rows were possible, but we received only about 3k. This suggests that Foursquare does not hold much data for Jerusalem.

In [15]:
# validation - see number of "rows"(venues) per neighborhood(results)
with pd.option_context('display.max_rows', None, 'display.max_columns', None): 
    display(venues_df.groupby(["Neighborhood"]).count())

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Abu Tor,1,1,1,1,1,1
Al-Ram,46,46,46,46,46,46
Al-Walaja,3,3,3,3,3,3
"American Colony, Jerusalem",26,26,26,26,26,26
Armenian Quarter,41,41,41,41,41,41
Armon (given name),46,46,46,46,46,46
Arnona,2,2,2,2,2,2
Arzei HaBira,6,6,6,6,6,6
At-Tur (Mount of Olives),5,5,5,5,5,5
Atarot,1,1,1,1,1,1


As explained before, the entries are incomplete for some of the neighborhoods (i.e. several neighborhoods return only a handful of venues). The breakdown itself is interesting, since Jerusalem has a unique social texture (religious Jews, Arabs and non-religious Jews); however, it will not be explored in this session.
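
Neighborhoods with only one or two venues get frequency vectors dominated by a single category, which can distort the clustering. The notebook keeps every neighborhood; an optional alternative would be to drop the sparse ones first. The sketch below assumes a threshold of 5 venues (my choice, not the notebook's) and uses a toy frame whose counts follow the table above but whose categories are placeholders.

```python
import pandas as pd

# Toy stand-in for venues_df: counts match the table above, categories are placeholders
toy = pd.DataFrame({
    "Neighborhood": ["Abu Tor"] * 1 + ["Arnona"] * 2 + ["Arzei HaBira"] * 6,
    "VenueCategory": ["Café"] * 3 + ["Hotel"] * 6,
})

MIN_VENUES = 5  # assumed threshold

counts = toy.groupby("Neighborhood").size()
keep = counts[counts >= MIN_VENUES].index
filtered = toy[toy["Neighborhood"].isin(keep)]

print(counts.to_dict())
print(sorted(filtered["Neighborhood"].unique()))
```

The trade-off is coverage: a stricter threshold gives more reliable profiles but removes exactly the neighborhoods for which Foursquare data is thin.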

In [16]:
venues_df['VenueCategory'].nunique() # number of unique categories of venues

131

In [17]:
jer_mat = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")    #get dummies per venue category
jer_mat['Neighborhoods'] = venues_df['Neighborhood']
print(jer_mat.shape)
jer_mat

(3038, 132)


Unnamed: 0,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Auto Garage,BBQ Joint,Bagel Shop,Bakery,...,Toy / Game Store,Trail,Train Station,Tunnel,Vegetarian / Vegan Restaurant,Vineyard,Waterfall,Wine Bar,Wings Joint,Neighborhoods
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Abu Tor
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"American Colony, Jerusalem"
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"American Colony, Jerusalem"
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"American Colony, Jerusalem"
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"American Colony, Jerusalem"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3033,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Mount Zion
3034,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Mount Zion
3035,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Mount Zion
3036,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Mount Zion


In [18]:
jer_df = jer_mat.groupby(["Neighborhoods"]).mean().reset_index() #group by neighborhoods and grab frequency per venue category
# validation
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(jer_df)

Unnamed: 0,Neighborhoods,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Auto Garage,BBQ Joint,Bagel Shop,Bakery,Bar,Basketball Stadium,Bed & Breakfast,Beer Bar,Bike Rental / Bike Share,Bistro,Boarding House,Bookstore,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Burger Joint,Burrito Place,Bus Station,Bus Stop,Cafeteria,Café,Caucasian Restaurant,Cheese Shop,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Eastern European Restaurant,Electronics Store,Exhibit,Falafel Restaurant,Fast Food Restaurant,Fish & Chips Shop,Food Service,French Restaurant,Fruit & Vegetable Store,Garden,Gay Bar,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym / Fitness Center,Historic Site,History Museum,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Ice Cream Shop,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jewelry Store,Jewish Restaurant,Juice Bar,Kosher Restaurant,Lake,Lebanese Restaurant,Library,Light Rail Station,Lounge,Market,Mediterranean Restaurant,Memorial Site,Middle Eastern Restaurant,Monument / Landmark,Moroccan Restaurant,Mountain,Movie Theater,Moving Target,Museum,Nature Preserve,Neighborhood,New American Restaurant,Noodle House,Other Great Outdoors,Outdoors & Recreation,Park,Pedestrian Plaza,Pharmacy,Pizza Place,Playground,Plaza,Pool,Pub,Rental Car Location,Restaurant,River,Road,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shopping Mall,Soccer Stadium,Soup Place,South American Restaurant,Sports Bar,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Synagogue,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Tunnel,Vegetarian / Vegan Restaurant,Vineyard,Waterfall,Wine Bar,Wings Joint
0,Abu Tor,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Al-Ram,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.065217,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.23913,0.0,0.0,0.0,0.0,0.0,0.021739,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.130435,0.0,0.0,0.021739,0.0,0.065217,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.021739,0.0,0.0,0.0,0.021739,0.0,0.021739,0.0,0.065217,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Al-Walaja,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"American Colony, Jerusalem",0.0,0.0,0.0,0.038462,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.538462,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.115385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Armenian Quarter,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.073171,0.02439,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.121951,0.04878,0.02439,0.0,0.04878,0.0,0.0,0.0,0.0,0.04878,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.0,0.121951,0.0,0.0,0.04878,0.0,0.0,0.02439,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.04878,0.0,0.0,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.02439,0.02439,0.0,0.0,0.0,0.0
5,Armon (given name),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.065217,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.23913,0.0,0.0,0.0,0.0,0.0,0.021739,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.130435,0.0,0.0,0.021739,0.0,0.065217,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.021739,0.0,0.0,0.0,0.021739,0.0,0.021739,0.0,0.065217,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Arnona,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Arzei HaBira,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,At-Tur (Mount of Olives),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Atarot,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
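
A table with 131 frequency columns is hard to read directly. One way to make it digestible is to list the most frequent categories per neighborhood. This is a minimal sketch on an invented frame shaped like `jer_df`; the names `freq` and `top_categories` and all values are illustrative assumptions.

```python
import pandas as pd

# Invented frequency table with the same layout as jer_df (one row per neighborhood)
freq = pd.DataFrame({
    "Neighborhoods": ["A", "B"],
    "Café": [0.6, 0.1],
    "Hotel": [0.3, 0.2],
    "Park": [0.1, 0.7],
})

def top_categories(row, n=2):
    """Return the n category names with the highest frequency in one row."""
    return row.nlargest(n).index.tolist()

wide = freq.set_index("Neighborhoods")
top = {name: top_categories(row) for name, row in wide.iterrows()}
print(top)
```

With the real data, `jer_df` would take the place of `freq`.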


### Clustering

In [19]:
kclusters = 5
jer_clustering = jer_df.drop(["Neighborhoods"], axis=1)
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(jer_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([3, 3, 1, 0, 3, 3, 1, 0, 0, 4])
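
The value `kclusters = 5` is set by hand. One common way to sanity-check such a choice is to compare silhouette scores over a range of k. The sketch below runs on synthetic blobs that I made up for illustration; with the real data, `jer_clustering` would take the place of `X`, and the scores may well be much less clear-cut.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight, well-separated groups of 30 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # higher means tighter, better separated clusters

best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

The silhouette score is only a heuristic, so a k that yields interpretable clusters can still be a reasonable pick even if it is not the top scorer.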

In [20]:
jer_df.insert(1, 'label', kmeans.labels_)
jer_df

Unnamed: 0,Neighborhoods,label,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Auto Garage,BBQ Joint,...,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Tunnel,Vegetarian / Vegan Restaurant,Vineyard,Waterfall,Wine Bar,Wings Joint
0,Abu Tor,3,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,0.000000,...,0.0,0.0,0.000,0.0,0.00000,0.000000,0.0,0.0,0.0,0.0
1,Al-Ram,3,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,0.021739,...,0.0,0.0,0.000,0.0,0.00000,0.000000,0.0,0.0,0.0,0.0
2,Al-Walaja,1,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,0.000000,...,0.0,0.0,0.000,0.0,0.00000,0.000000,0.0,0.0,0.0,0.0
3,"American Colony, Jerusalem",0,0.0,0.0,0.000000,0.038462,0.0,0.038462,0.0,0.000000,...,0.0,0.0,0.000,0.0,0.00000,0.000000,0.0,0.0,0.0,0.0
4,Armenian Quarter,3,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,0.000000,...,0.0,0.0,0.000,0.0,0.02439,0.024390,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
135,Yefeh Nof,3,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,0.000000,...,0.0,0.0,0.000,0.0,0.00000,0.000000,0.0,0.0,0.0,0.0
136,Yemin Moshe,3,0.0,0.0,0.015385,0.000000,0.0,0.000000,0.0,0.000000,...,0.0,0.0,0.000,0.0,0.00000,0.015385,0.0,0.0,0.0,0.0
137,Zikhron Moshe,1,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,0.000000,...,0.0,0.0,0.125,0.0,0.00000,0.000000,0.0,0.0,0.0,0.0
138,Zikhron Tuvya,3,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,0.017857,...,0.0,0.0,0.000,0.0,0.00000,0.035714,0.0,0.0,0.0,0.0
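
To interpret the labels, it can help to look at how many neighborhoods fall in each cluster and which categories dominate it. A sketch on an invented frame shaped like the labelled `jer_df`; all values here are made up for illustration.

```python
import pandas as pd

# Invented stand-in for jer_df after the 'label' column has been inserted
labelled = pd.DataFrame({
    "Neighborhoods": ["A", "B", "C", "D"],
    "label": [0, 0, 1, 1],
    "Café": [0.5, 0.3, 0.0, 0.1],
    "Hotel": [0.1, 0.1, 0.8, 0.6],
    "Park": [0.4, 0.6, 0.2, 0.3],
})

sizes = labelled["label"].value_counts().sort_index()                  # neighborhoods per cluster
profile = labelled.drop(columns="Neighborhoods").groupby("label").mean()  # mean frequency per cluster
dominant = profile.idxmax(axis=1)                                      # strongest category per cluster

print(sizes.to_dict())
print(profile)
print(dominant.to_dict())
```

A cluster whose profile is dominated by one category and which contains mostly sparse neighborhoods is likely an artifact of low venue counts rather than a real pattern.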


In [21]:
jer_merged = pd.merge(df, jer_df, left_on='Neighborhood', right_on='Neighborhoods', how='inner')
jer_merged.drop(labels=["Neighborhood_x"], axis=1, inplace=True)
print(jer_merged.shape)
jer_merged.head()

(140, 135)


Unnamed: 0,Latitude,Longitude,Neighborhoods,label,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Tunnel,Vegetarian / Vegan Restaurant,Vineyard,Waterfall,Wine Bar,Wings Joint
0,31.86379,35.177161,Abu Tor,3,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,31.78972,35.22916,"American Colony, Jerusalem",0,0.0,0.0,0.0,0.038462,0.0,0.038462,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,31.77495,35.22984,Armenian Quarter,3,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.02439,0.02439,0.0,0.0,0.0,0.0
3,31.78003,35.21873,Armon (given name),3,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,31.74422,35.22062,Arnona,1,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### displaying clusters on map

In [22]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(jer_merged['Latitude'], jer_merged['Longitude'], jer_merged['Neighborhoods'], jer_merged['label']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [24]:
map_clusters.save('map_clusters1.html')   # saving the map

### Executive Summary and conclusions
I started by grabbing the neighborhood list from the Jerusalem neighborhoods Wikipedia category page. I then fetched the latitude and longitude of each neighborhood, validated the data, removed outliers, and displayed the result on a map of the Jerusalem area (including suburbs). The distribution makes sense and fits the real state of the world.
I then built the relevant datasets, using the Google Geocoding and Foursquare APIs to grab venues for each of the neighborhoods (around 135 neighborhoods in total). An initial observation was that we received a relatively low number of venues. As I was curious about the reason, I broke the counts down by neighborhood, which showed that the number of venues varies drastically between neighborhoods.
An important aspect of Israel, and of Jerusalem in particular, is the social texture. The city is roughly split into thirds: religious Jews, non-religious Jews, and Arabs. It appears that we have less information overall for the two thirds made up of Arab and religious Jewish neighborhoods, which aligns with our expectations (these parts of the population are somewhat slower to adopt this kind of technology). I decided to move forward and analyze the entire set of neighborhoods, as I figured a comparison between populations is more interesting than a within-population analysis (though that may be a good future task).
The clustering results make some sense: they appear to capture the difference between relatively mid-income non-religious Jewish neighborhoods (light green) and low-income neighborhoods (purple), while the Arab neighborhoods fall either in the purple (low-income) cluster or in the red one (East Jerusalem). When interpreting this analysis, we need to keep in mind the low number of observations for some of the neighborhoods, which affects the results drastically.