# Capstone Project - The battle of neighborhoods


**Details**
Name:	Joost van Kempen


**Content**
1. Research/business problem	
2. Data description	
3. Methodological framework	
4. Results	
5. Discussion	
6. Conclusion	



## Week 1

**1. Research/business problem**
	
In recent years the Dutch housing market has been overcrowded, which makes it very difficult for first-time buyers to purchase a home. Viewings are almost always fully booked, and even if you secure a slot the bidding process is fierce. It usually comes down to who is willing to pay the most above the asking price and who acts fastest. Buyers start to ignore their wish list for an ideal house just to get hold of something. At the moment there is simply not enough time for potential buyers to look into the details of a house: where it is located, whether the area is kid-friendly, what the restaurants are like, and so on.

Having a quick overview of what a neighborhood looks like and whether it matches your needs is key in this market. When a house becomes available, you would know within hours, maybe even minutes, whether it is the house you are looking for. Buyers would depend less on advisors to tell them whether a house fulfills their needs and would be able to compare houses in far more depth than before.

This research focuses on the Amsterdam area in the Netherlands. Amsterdam is one of the big student cities in the Netherlands, so many graduates look for a home there when they start their working life. As in the rest of the Netherlands, the housing market in Amsterdam is overcrowded. For every house that comes onto the market, there are at least 50 potential buyers who want to schedule a viewing. Students tend to search for a house in the neighborhood where they rented during their student life. However, Amsterdam has a lot more to offer than that one neighborhood. The problem is that students do not know enough about the other neighborhoods and do not have the luxury of time to investigate when a house becomes available.

Many factors can be considered when comparing neighborhoods. To narrow the scope, this research is limited to the different types of restaurants in an area (Chinese, Indian, Italian, native Dutch, etc.) and how they are rated, on the assumption that this is an important reason for students to live in an area. The same approach could be applied to other neighborhood characteristics, such as good schools or the number of sports associations nearby. This leads to the following research question:

*In which neighborhood is it best to buy a house based on the preference for a type of restaurant in that area?*


***

**2. Data description**

Location data for neighborhoods in the Amsterdam area are taken from the Foursquare API. From this source, the research looks in particular at the neighborhood names, the restaurants in each area, the restaurant type, and the reviews, in order to identify which neighborhoods have the best restaurants for a specific dish or type of food. If the results raise relevant sub-questions for a specific neighborhood, additional data could be added to the data frame to explore them.

The starting point is a type of restaurant, after which the code searches for the neighborhood that scores best for that type of food. For example, the starting point could be to find the best place to buy a house if you are a big fan of Italian food. The code then ranks the neighborhoods in Amsterdam by how well they match a preference for Italian food. The same sequence is performed for other types of food, such as Chinese or Indian.
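
The ranking described above could be sketched as follows. This is a minimal illustration on made-up numbers, not the project's actual data; the frame `freq` and the helper `rank_by_category` are hypothetical names, and each value is assumed to be the share of venues in a postal code that belong to the category.

```python
import pandas as pd

# Hypothetical venue shares per postal code (illustrative values only).
freq = pd.DataFrame(
    {
        "Postal Code": ["1000", "1026", "1081", "1109"],
        "Italian Restaurant": [0.04, 0.00, 0.02, 0.00],
        "Sushi Restaurant": [0.00, 0.00, 0.10, 0.00],
    }
)


def rank_by_category(frame, category):
    """Return postal codes ordered from highest to lowest share of `category`."""
    ranked = frame.sort_values(category, ascending=False, kind="stable")
    return ranked[["Postal Code", category]].reset_index(drop=True)


italian_ranking = rank_by_category(freq, "Italian Restaurant")
print(italian_ranking)
```

Calling `rank_by_category(freq, "Sushi Restaurant")` would produce the corresponding ranking for sushi.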


-----------------

## Week 2

**Methodology**


The Methodology section describes the main components of our analysis and prediction system. It comprises four stages:

1. Collect Data
2. Explore and Understand Data
3. Data preparation and preprocessing 
4. Modeling

**1. Collect Data**

For the first step in collecting the data we need the necessary libraries:

In [110]:
import os # Operating System
import numpy as np
import pandas as pd
import datetime as dt # Datetime
import json # library to handle JSON files

!pip install geopy 

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

!pip install folium
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Now the data can be loaded into a dataframe as shown in the code below. Before we can start using the data, we'll have to explore and understand it.

In [125]:
url = 'https://raw.githubusercontent.com/JoostKempen/Coursera_Capstone/main/nl_postal_codes.csv'
df_netherlands = pd.read_csv(url,encoding='latin-1')

In [126]:
df_netherlands.head()

Unnamed: 0,Postal Code,Place Name,State,County,Latitude,Longitude
0,9400,Assen,Provincie Drenthe,Gemeente Assen,52.9967,6.5625
1,9401,Assen,Provincie Drenthe,Gemeente Assen,52.9967,6.5625
2,9402,Assen,Provincie Drenthe,Gemeente Assen,52.9967,6.5625
3,9403,Assen,Provincie Drenthe,Gemeente Assen,52.9967,6.5625
4,9404,Assen,Provincie Drenthe,Gemeente Assen,52.9967,6.5625


In [127]:
df_netherlands.shape

(5314, 6)

Our data set consists of 5,314 rows and 6 columns. We'll now prepare and preprocess the data accordingly.

**3. Data preparation and preprocessing**

At this stage we prepare the dataset for the modeling process. We perform the following steps:

* Drop the State column
* Keep only the rows for Gemeente Amsterdam
* Drop the remaining unnecessary columns (County and Place Name)


In [128]:
df_netherlands.drop('State',axis=1,inplace=True)
df_netherlands.head()

Unnamed: 0,Postal Code,Place Name,County,Latitude,Longitude
0,9400,Assen,Gemeente Assen,52.9967,6.5625
1,9401,Assen,Gemeente Assen,52.9967,6.5625
2,9402,Assen,Gemeente Assen,52.9967,6.5625
3,9403,Assen,Gemeente Assen,52.9967,6.5625
4,9404,Assen,Gemeente Assen,52.9967,6.5625


In [129]:
df_amsterdam = df_netherlands.loc[df_netherlands['County'] == 'Gemeente Amsterdam']
df_amsterdam.head()

Unnamed: 0,Postal Code,Place Name,County,Latitude,Longitude
2923,1000,Amsterdam,Gemeente Amsterdam,52.374,4.8897
2924,1001,Amsterdam,Gemeente Amsterdam,52.374,4.8897
2925,1002,Amsterdam,Gemeente Amsterdam,52.374,4.8897
2926,1003,Amsterdam,Gemeente Amsterdam,52.374,4.8897
2927,1004,Amsterdam,Gemeente Amsterdam,52.374,4.8897


In [143]:
df_amsterdam.shape

(21, 5)

In [210]:
# reassign instead of dropping in place, which avoids pandas' SettingWithCopyWarning on the filtered frame
df_amsterdam = df_amsterdam.drop(columns='County')
df_amsterdam.head()


Unnamed: 0,Postal Code,Place Name,Latitude,Longitude
2923,1000,Amsterdam,52.374,4.8897
2924,1001,Amsterdam,52.374,4.8897
2925,1002,Amsterdam,52.374,4.8897
2926,1003,Amsterdam,52.374,4.8897
2927,1004,Amsterdam,52.374,4.8897


In [212]:
# reassign instead of dropping in place, which avoids pandas' SettingWithCopyWarning on the filtered frame
df_amsterdam = df_amsterdam.drop(columns='Place Name')
df_amsterdam.head()


Unnamed: 0,Postal Code,Latitude,Longitude
2923,1000,52.374,4.8897
2924,1001,52.374,4.8897
2925,1002,52.374,4.8897
2926,1003,52.374,4.8897
2927,1004,52.374,4.8897
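
The preview shows several postal codes with identical coordinates. A small check like the one below, sketched on a hypothetical four-row frame with the same column names, counts how many distinct locations the postal codes map to. Postal codes that share one coordinate pair will receive identical results from any location-based venue query.

```python
import pandas as pd

# Toy frame with the same columns as df_amsterdam (four sample rows only).
toy = pd.DataFrame(
    {
        "Postal Code": [1000, 1001, 1002, 1109],
        "Latitude": [52.374, 52.374, 52.374, 52.3058],
        "Longitude": [4.8897, 4.8897, 4.8897, 5.0167],
    }
)

# Count postal codes per distinct (Latitude, Longitude) pair.
codes_per_location = (
    toy.groupby(["Latitude", "Longitude"])["Postal Code"]
    .count()
    .reset_index(name="n_postal_codes")
)
n_locations = len(codes_per_location)
print(codes_per_location)
print("distinct locations:", n_locations)
```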


In [131]:
address = 'Amsterdam, Netherlands'

geolocator = Nominatim(user_agent='amsterdam-application')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Amsterdam are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Amsterdam are 52.3727598, 4.8936041.


---

**With these latitude and longitude numbers we'll create a map of Amsterdam city**

In [142]:
# create map of Amsterdam using latitude and longitude values
map_amsterdam = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
df_amsterdam.apply(lambda row:folium.CircleMarker(location=[row["Latitude"], row["Longitude"]], 
                                              radius=10, popup=row['Postal Code'])
                                             .add_to(map_amsterdam), axis=1)

map_amsterdam

In [144]:
# Define Foursquare Credentials and Version

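# Credentials are read from environment variables so they stay out of the notebook;
# the variable names below are placeholders, set them to your own values beforehand.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # my Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # my Foursquare Secret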
VERSION = '20180605' # Foursquare API version


We can now proceed to the modeling phase, in which we analyze neighborhoods in order to recommend areas where home buyers may want to purchase a house.

**4. Modeling**

After exploring the dataset and gaining insights into it, we are ready to use the clustering methodology to analyze neighborhoods. We will use the k-means clustering technique as it is fast and efficient in terms of computational cost.

In [153]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal Code', 
                  'Postal Code Latitude', 
                  'Postal Code Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
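
The list comprehension in getNearbyVenues assumes that every venue has at least one category and that the response always contains a groups entry. The sketch below shows one way to make that parsing step tolerant of missing fields. `parse_explore_items` and the sample payload are hypothetical, modelled only on the keys the function above already reads.

```python
def parse_explore_items(payload):
    """Extract (name, lat, lng, category) tuples from an explore-style payload.

    The payload layout mirrors the keys read in getNearbyVenues; it is an
    assumption about the response shape, not a documented schema.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # Fall back to None when a venue carries no category.
        category = categories[0]["name"] if categories else None
        rows.append((venue.get("name"), location.get("lat"),
                     location.get("lng"), category))
    return rows


# Hand-written sample payload for illustration only.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "De Bierkoning",
               "location": {"lat": 52.372404, "lng": 4.889795},
               "categories": [{"name": "Beer Store"}]}},
    {"venue": {"name": "Unnamed spot",
               "location": {"lat": 52.0, "lng": 4.0},
               "categories": []}},
]}]}}

rows = parse_explore_items(sample)
print(rows)
```

An empty or unexpected payload yields an empty list instead of raising a KeyError or IndexError.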

In [154]:
location_venues = getNearbyVenues(names=df_amsterdam['Postal Code'],
                                   latitudes=df_amsterdam['Latitude'],
                                   longitudes=df_amsterdam['Longitude']
                                  )

1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1026
1027
1080
1081
1082
1083
1084
1085
1088
1089
1109


In [155]:
location_venues

Unnamed: 0,Postal Code,Postal Code Latitude,Postal Code Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,1000,52.3740,4.8897,De Bierkoning,52.372404,4.889795,Beer Store
1,1000,52.3740,4.8897,Proeflokaal De Drie Fleschjes,52.374203,4.892239,Bar
2,1000,52.3740,4.8897,Grey Area Coffeeshop,52.374641,4.888839,Marijuana Dispensary
3,1000,52.3740,4.8897,Caffè Il Momento,52.374682,4.889018,Café
4,1000,52.3740,4.8897,Kaagman & Kortekaas,52.374878,4.892455,French Restaurant
...,...,...,...,...,...,...,...
859,1089,52.3740,4.8897,Manneken Pis,52.375708,4.896237,Fast Food Restaurant
860,1109,52.3058,5.0167,Rezk & De Vries Snackbar,52.304692,5.017882,Snack Place
861,1109,52.3058,5.0167,Bushalte Geinbrug,52.304607,5.012811,Bus Stop
862,1109,52.3058,5.0167,SV Geinburgia,52.308529,5.015150,Athletics & Sports


Let's see how many venues there are in each neighborhood

In [156]:
location_venues.groupby('Postal Code').count()

Postal Code,Postal Code Latitude,Postal Code Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
1000,52,52,52,52,52,52
1001,52,52,52,52,52,52
1002,52,52,52,52,52,52
1003,52,52,52,52,52,52
1004,52,52,52,52,52,52
1005,52,52,52,52,52,52
1006,52,52,52,52,52,52
1007,52,52,52,52,52,52
1008,52,52,52,52,52,52
1009,52,52,52,52,52,52


In [157]:
# get the List of Unique Categories
print('There are {} unique categories.'.format(len(location_venues['Venue Category'].unique())))

There are 62 unique categories.


In [158]:
location_venues.shape

(864, 7)

In the section below, one-hot encoding is used to give an overview of the venues in all areas.

In [159]:
# one hot encoding
venues_onehot = pd.get_dummies(location_venues[['Venue Category']], prefix="", prefix_sep="")

# add the Postal Code column back to the dataframe
venues_onehot['Postal Code'] = location_venues['Postal Code']

# move the Postal Code column to the first position
fixed_columns = [venues_onehot.columns[-1]] + list(venues_onehot.columns[:-1])

#fixed_columns
venues_onehot = venues_onehot[fixed_columns]

venues_onehot.head()

Unnamed: 0,Postal Code,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Bakery,Bar,Beer Bar,Beer Store,Bistro,...,Shopping Mall,Snack Place,Soccer Field,South American Restaurant,Steakhouse,Supermarket,Sushi Restaurant,Toy / Game Store,Tram Station,Vietnamese Restaurant
0,1000,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
1,1000,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1000,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,1000,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,1000,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


The next step is grouping the venues in each area together in one row

In [160]:
amsterdam_grouped = venues_onehot.groupby('Postal Code').mean().reset_index()
amsterdam_grouped

Unnamed: 0,Postal Code,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Bakery,Bar,Beer Bar,Beer Store,Bistro,...,Shopping Mall,Snack Place,Soccer Field,South American Restaurant,Steakhouse,Supermarket,Sushi Restaurant,Toy / Game Store,Tram Station,Vietnamese Restaurant
0,1000,0.038462,0.019231,0.0,0.0,0.0,0.057692,0.019231,0.019231,0.0,...,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0
1,1001,0.038462,0.019231,0.0,0.0,0.0,0.057692,0.019231,0.019231,0.0,...,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0
2,1002,0.038462,0.019231,0.0,0.0,0.0,0.057692,0.019231,0.019231,0.0,...,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0
3,1003,0.038462,0.019231,0.0,0.0,0.0,0.057692,0.019231,0.019231,0.0,...,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0
4,1004,0.038462,0.019231,0.0,0.0,0.0,0.057692,0.019231,0.019231,0.0,...,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0
5,1005,0.038462,0.019231,0.0,0.0,0.0,0.057692,0.019231,0.019231,0.0,...,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0
6,1006,0.038462,0.019231,0.0,0.0,0.0,0.057692,0.019231,0.019231,0.0,...,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0
7,1007,0.038462,0.019231,0.0,0.0,0.0,0.057692,0.019231,0.019231,0.0,...,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0
8,1008,0.038462,0.019231,0.0,0.0,0.0,0.057692,0.019231,0.019231,0.0,...,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0
9,1009,0.038462,0.019231,0.0,0.0,0.0,0.057692,0.019231,0.019231,0.0,...,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0
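
The grouping step relies on a property of one-hot columns: each column holds 1 for venues of that category and 0 otherwise, so the mean per postal code is the share of venues in that category. A minimal sketch on a made-up three-venue frame (the rows are illustrative, not API results):

```python
import pandas as pd

# Three illustrative venues across two postal codes.
venues = pd.DataFrame(
    {
        "Postal Code": [1000, 1000, 1109],
        "Venue Category": ["Bar", "Hotel", "Bus Stop"],
    }
)

# One-hot encode the category; dtype=float keeps the output numeric
# regardless of the default dummy dtype of the installed pandas version.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Postal Code", venues["Postal Code"])

# Mean of 0/1 indicators per postal code = relative frequency per category.
grouped = onehot.groupby("Postal Code").mean().reset_index()
print(grouped)
```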


In [168]:
amsterdam_grouped['Postal Code'] = amsterdam_grouped['Postal Code'].astype(str)
print(amsterdam_grouped.dtypes)

Postal Code               object
Art Gallery              float64
Art Museum               float64
Asian Restaurant         float64
Athletics & Sports       float64
                          ...   
Supermarket              float64
Sushi Restaurant         float64
Toy / Game Store         float64
Tram Station             float64
Vietnamese Restaurant    float64
Length: 63, dtype: object


In [169]:
# What are the top 5 venues/facilities in each neighborhood?

num_top_venues = 5

for hood in amsterdam_grouped['Postal Code']:
    print("----"+hood+"----")
    temp = amsterdam_grouped[amsterdam_grouped['Postal Code'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----1000----
                venue  freq
0               Hotel  0.13
1      Clothing Store  0.06
2                 Bar  0.06
3                Café  0.04
4  Italian Restaurant  0.04


----1001----
                venue  freq
0               Hotel  0.13
1      Clothing Store  0.06
2                 Bar  0.06
3                Café  0.04
4  Italian Restaurant  0.04


----1002----
                venue  freq
0               Hotel  0.13
1      Clothing Store  0.06
2                 Bar  0.06
3                Café  0.04
4  Italian Restaurant  0.04


----1003----
                venue  freq
0               Hotel  0.13
1      Clothing Store  0.06
2                 Bar  0.06
3                Café  0.04
4  Italian Restaurant  0.04


----1004----
                venue  freq
0               Hotel  0.13
1      Clothing Store  0.06
2                 Bar  0.06
3                Café  0.04
4  Italian Restaurant  0.04


----1005----
                venue  freq
0               Hotel  0.13
1      Clothing 

Now we can start creating the overview that will show us which venues are most common in the area

In [170]:
# Define a function to return the most common venues/facilities nearby real estate investments

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
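
To see what return_most_common_venues returns, the sketch below applies the same logic to a single hand-made row. The row values are invented for illustration; the first element is the postal code, which the function skips.

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the postal code) and sort the frequencies.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# Illustrative row: postal code followed by category frequencies.
row = pd.Series(
    {"Postal Code": "1000", "Bar": 0.06, "Hotel": 0.13, "Café": 0.04}
)
top_two = list(return_most_common_venues(row, 2))
print(top_two)
```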

In [171]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Postal Code']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks 4 through 10 have no entry in indicators and take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))
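
The indicators lookup covers the ten columns needed here. If more columns were ever required, a small helper could generate the suffix for any positive rank. `ordinal` below is a hypothetical helper, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 take 'th' even though they end in 1, 2 and 3.
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Postal Code"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```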

In [173]:
# create a new dataframe
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Postal Code'] = amsterdam_grouped['Postal Code']

for ind in np.arange(amsterdam_grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(amsterdam_grouped.iloc[ind, :], num_top_venues)

In [177]:
venues_sorted.head(21)

Unnamed: 0,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1000,Hotel,Bar,Clothing Store,Art Gallery,Dessert Shop,Italian Restaurant,Café,Hotel Bar,Historic Site,Gift Shop
1,1001,Hotel,Bar,Clothing Store,Art Gallery,Dessert Shop,Italian Restaurant,Café,Hotel Bar,Historic Site,Gift Shop
2,1002,Hotel,Bar,Clothing Store,Art Gallery,Dessert Shop,Italian Restaurant,Café,Hotel Bar,Historic Site,Gift Shop
3,1003,Hotel,Bar,Clothing Store,Art Gallery,Dessert Shop,Italian Restaurant,Café,Hotel Bar,Historic Site,Gift Shop
4,1004,Hotel,Bar,Clothing Store,Art Gallery,Dessert Shop,Italian Restaurant,Café,Hotel Bar,Historic Site,Gift Shop
5,1005,Hotel,Bar,Clothing Store,Art Gallery,Dessert Shop,Italian Restaurant,Café,Hotel Bar,Historic Site,Gift Shop
6,1006,Hotel,Bar,Clothing Store,Art Gallery,Dessert Shop,Italian Restaurant,Café,Hotel Bar,Historic Site,Gift Shop
7,1007,Hotel,Bar,Clothing Store,Art Gallery,Dessert Shop,Italian Restaurant,Café,Hotel Bar,Historic Site,Gift Shop
8,1008,Hotel,Bar,Clothing Store,Art Gallery,Dessert Shop,Italian Restaurant,Café,Hotel Bar,Historic Site,Gift Shop
9,1009,Hotel,Bar,Clothing Store,Art Gallery,Dessert Shop,Italian Restaurant,Café,Hotel Bar,Historic Site,Gift Shop


In [197]:
amsterdam_grouped.shape

(21, 64)

In [198]:
amsterdam_grouped.head()

Unnamed: 0,Postal Code,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Bakery,Bar,Beer Bar,Beer Store,Bistro,...,Snack Place,Soccer Field,South American Restaurant,Steakhouse,Supermarket,Sushi Restaurant,Toy / Game Store,Tram Station,Vietnamese Restaurant,Cluster Labels
0,1000,0.038462,0.019231,0.0,0.0,0.0,0.057692,0.019231,0.019231,0.0,...,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0,0
1,1001,0.038462,0.019231,0.0,0.0,0.0,0.057692,0.019231,0.019231,0.0,...,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0,0
2,1002,0.038462,0.019231,0.0,0.0,0.0,0.057692,0.019231,0.019231,0.0,...,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0,0
3,1003,0.038462,0.019231,0.0,0.0,0.0,0.057692,0.019231,0.019231,0.0,...,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0,0
4,1004,0.038462,0.019231,0.0,0.0,0.0,0.057692,0.019231,0.019231,0.0,...,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0,0


In [207]:
amsterdam_grouped = df_amsterdam

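As we see overlap between some of the areas, let's cluster the neighborhoods in Amsterdam. Note that the previous cell reassigns amsterdam_grouped to df_amsterdam, so the features passed to k-means below are the latitude and longitude of each postal code.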

In [213]:
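# Distribute the postal codes over 5 clusters
from sklearn.cluster import KMeans # needed for the k-means call below

# set number of clusters
kclusters = 5

# keep only the numeric feature columns
amsterdam_grouped_clustering = amsterdam_grouped.drop(columns='Postal Code')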

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(amsterdam_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:50]

array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 4, 0, 0, 0, 0, 2, 2, 2, 2, 3],
      dtype=int32)
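
A hedged sketch of the same k-means call applied to venue-frequency columns, on a made-up four-row frame. The column values, the name `toy_grouped`, and the choice of two clusters are assumptions for illustration only; `n_init` is set explicitly because its default has changed between scikit-learn releases.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Illustrative venue shares; rows 0-1 and rows 2-3 are identical pairs.
toy_grouped = pd.DataFrame(
    {
        "Postal Code": ["1000", "1001", "1081", "1082"],
        "Hotel": [0.13, 0.13, 0.02, 0.02],
        "Sushi Restaurant": [0.00, 0.00, 0.10, 0.10],
    }
)

# Cluster on the numeric feature columns only.
features = toy_grouped.drop(columns="Postal Code")
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(features)

toy_grouped["Cluster Labels"] = kmeans.labels_
print(toy_grouped)
```

Rows with identical venue shares end up in the same cluster, which is the behavior seen for the postal codes that returned the same venue list.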

In [214]:
#Dataframe to include Clusters
amsterdam_grouped_clustering=df_amsterdam
amsterdam_grouped_clustering.head()

Unnamed: 0,Postal Code,Latitude,Longitude
2923,1000,52.374,4.8897
2924,1001,52.374,4.8897
2925,1002,52.374,4.8897
2926,1003,52.374,4.8897
2927,1004,52.374,4.8897


In [215]:
amsterdam_grouped_cluster.dtypes

Postal Code                object
Art Gallery               float64
Art Museum                float64
Asian Restaurant          float64
Athletics & Sports        float64
                           ...   
6th Most Common Venue      object
7th Most Common Venue      object
8th Most Common Venue      object
9th Most Common Venue      object
10th Most Common Venue     object
Length: 74, dtype: object

In [None]:
# add clustering labels
amsterdam_grouped_clustering['Cluster Labels'] = kmeans.labels_

# join with venues_sorted to add the most common venues for each postal code
amsterdam_grouped_clustering = amsterdam_grouped_clustering.join(venues_sorted.set_index('Postal Code'), on='Postal Code')
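
A join on Postal Code only matches rows when the key has the same type on both sides. In this notebook the postal codes appear to be integers in df_amsterdam and were cast to strings before venues_sorted was built. The sketch below, on hypothetical two-row frames with invented venue values, casts the key to `str` on the left side before joining.

```python
import pandas as pd

# Left frame: integer postal codes, as read from a CSV file.
left = pd.DataFrame({"Postal Code": [1000, 1109], "Cluster Labels": [2, 3]})

# Right frame: postal codes stored as strings (venue values are invented).
right = pd.DataFrame(
    {
        "Postal Code": ["1000", "1109"],
        "1st Most Common Venue": ["Hotel", "Snack Place"],
    }
)

# Align the key type before joining on it.
left["Postal Code"] = left["Postal Code"].astype(str)
merged = left.join(right.set_index("Postal Code"), on="Postal Code")
print(merged)
```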

In [220]:
amsterdam_grouped_clustering.head(30) 

Unnamed: 0,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2923,1000,52.374,4.8897,2,Hotel,Bar,Clothing Store,Art Gallery,Dessert Shop,Italian Restaurant,Café,Hotel Bar,Historic Site,Gift Shop
2924,1001,52.374,4.8897,2,Hotel,Bar,Clothing Store,Art Gallery,Dessert Shop,Italian Restaurant,Café,Hotel Bar,Historic Site,Gift Shop
2925,1002,52.374,4.8897,2,Hotel,Bar,Clothing Store,Art Gallery,Dessert Shop,Italian Restaurant,Café,Hotel Bar,Historic Site,Gift Shop
2926,1003,52.374,4.8897,2,Hotel,Bar,Clothing Store,Art Gallery,Dessert Shop,Italian Restaurant,Café,Hotel Bar,Historic Site,Gift Shop
2927,1004,52.374,4.8897,2,Hotel,Bar,Clothing Store,Art Gallery,Dessert Shop,Italian Restaurant,Café,Hotel Bar,Historic Site,Gift Shop
2928,1005,52.374,4.8897,2,Hotel,Bar,Clothing Store,Art Gallery,Dessert Shop,Italian Restaurant,Café,Hotel Bar,Historic Site,Gift Shop
2929,1006,52.374,4.8897,2,Hotel,Bar,Clothing Store,Art Gallery,Dessert Shop,Italian Restaurant,Café,Hotel Bar,Historic Site,Gift Shop
2930,1007,52.374,4.8897,2,Hotel,Bar,Clothing Store,Art Gallery,Dessert Shop,Italian Restaurant,Café,Hotel Bar,Historic Site,Gift Shop
2931,1008,52.374,4.8897,2,Hotel,Bar,Clothing Store,Art Gallery,Dessert Shop,Italian Restaurant,Café,Hotel Bar,Historic Site,Gift Shop
2932,1009,52.374,4.8897,2,Hotel,Bar,Clothing Store,Art Gallery,Dessert Shop,Italian Restaurant,Café,Hotel Bar,Historic Site,Gift Shop


In [219]:
# Create Map

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(amsterdam_grouped_clustering['Latitude'], amsterdam_grouped_clustering['Longitude'], amsterdam_grouped_clustering['Postal Code'], amsterdam_grouped_clustering['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Results 

Clustering the Amsterdam postal-code areas resulted in 5 clusters, which are characterized below by their most common venues:

* Cluster 0: Amsterdam-Zuid (Red)
* Cluster 1: Amsterdam-Noordoost (Purple)
* Cluster 2: City centre (Blue)
* Cluster 3: Amsterdam-Zuidoost (Green)
* Cluster 4: Amsterdam-Noord (Orange)

The data shows at a glance which venues are most common in each area. We see, for example, that sushi restaurants are very common in Amsterdam-Zuid. A student with a preference for sushi would therefore do well to look for a house in this area. For a big fan of quick snacks, Amsterdam-Zuidoost stands out, as snack places are the 4th most common venue there. And for those who crave Italian food all the time, the city centre is the place to settle.

Initially, the research question only considered the preferred type of restaurant to determine in which neighborhood a student might want to buy a house. The data, however, contains many more venue types that can be taken into account as well.

For example, if you have many friends who live overseas and visit fairly often, it might be interesting to buy a house in a neighborhood with many hotels. In the city centre hotels are the most common venue, which makes it a great option. If the city centre is too expensive for a student buying their first house, the data also shows that Amsterdam-Zuid still has quite a few hotels (6th most common venue). As Amsterdam-Zuid is most likely cheaper than the city centre, this could still be a great option.


------

## Discussion

Depending on the student's preferences, this overview can help them find the area that fits their needs. A student who likes Italian food, regularly goes to a bar, and often goes to the gym can look at this overview to see in which areas those facilities are best represented. The possibilities are endless!

This helps students find the right place without having to spend a lot of time searching the internet for all the different venues in each area. One quick glance at the overview provides the information they need.
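
One way to combine several preferences, sketched here under the assumption that a frame of venue shares per area is available, is to sum the shares of the preferred categories, optionally with weights. The frame `shares`, the weights, and `preference_score` are hypothetical, and the numbers are invented.

```python
import pandas as pd

# Invented venue shares per area (illustrative values only).
shares = pd.DataFrame(
    {
        "Italian Restaurant": [0.04, 0.02, 0.00],
        "Bar": [0.06, 0.03, 0.01],
        "Gym": [0.00, 0.05, 0.02],
    },
    index=["City centre", "Amsterdam-Zuid", "Amsterdam-Zuidoost"],
)


def preference_score(frame, weights):
    """Weighted sum of the venue shares for the categories in `weights`."""
    cols = list(weights)
    return (frame[cols] * pd.Series(weights)).sum(axis=1)


# A student who cares most about Italian food, then bars, then gyms.
weights = {"Italian Restaurant": 3.0, "Bar": 2.0, "Gym": 1.0}
scores = preference_score(shares, weights).sort_values(ascending=False)
best_area = scores.index[0]
print(scores)
```

The weights express how much each venue type matters to the student; changing them changes the ranking without touching the underlying data.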

The overview below shows some of the recommendations for each cluster based on the table:

**Amsterdam-Zuid**\
If you are really into sushi, this is the place you want to live! There is also a good number of supermarkets, restaurants, and bakeries.

**Amsterdam-Noordoost**\
If you like to go fishing or like to eat fish at a restaurant, Amsterdam-Noordoost is most likely the right place for you. In addition, there are multiple soccer fields around if you are a sports person.

**City Centre**\
Having friends over who cannot sleep at your place? There are plenty of opportunities for them to book a room in a hotel close by. Especially after a night of drinking in one of the many bars in the area, this might be very convenient.

**Amsterdam-Zuidoost**\
No money for a car yet because you just left university? Not a problem in Amsterdam-Zuidoost, as there are many bus stops in the area. This area is also the best place to live for people who like to work out, as there are many athletics, sports, and soccer facilities.

**Amsterdam-Noord**\
If you like to buy your groceries at the market, this is definitely the neighborhood where you want to live, with plenty of markets in the area.

----

## Conclusion

In recent years the housing market has been overcrowded, and it has been very difficult for students to get hold of their first house. The time available to decide which house to buy is very limited, which led to the need for a quick and sound overview of what neighborhoods have to offer. The scope has been on venues in the area such as restaurants, hotels, bars, gyms, and many more.

To solve this problem, we clustered Amsterdam neighborhood data in order to make recommendations for each cluster. The postal code data was gathered from the AggData site, a free-to-use data platform available to everyone (https://www.aggdata.com/free/netherlands-postal-codes). To explore and recommend locations according to the presence of amenities and essential facilities, we accessed venue data through the Foursquare API and arranged it in a data frame for visualization. By merging the Amsterdam neighborhood data with the Foursquare venue data, we were able to recommend suitable neighborhoods.

The Methodology section comprised four stages: 1. Collect Data; 2. Explore and Understand Data; 3. Data preparation and preprocessing; 4. Modeling. In the modeling stage, we used the k-means clustering technique, as it is fast and efficient in terms of computational cost.

Finally, we conclude that each neighborhood has its own benefits. Whether you like Italian food, play soccer, or go to the gym, there will always be a neighborhood that has these facilities. It all comes down to students weighing their preferences to see what they look for most in a neighborhood.