# Capstone Project - The Battle of Neighborhoods to get optimal Real-Estate properties

## Business Problem section

### Background

New York City’s housing market has largely recovered from the financial crisis of 2008, but that doesn’t necessarily mean that buying a home here is, in the long run, a good investment. That’s the conclusion from a new report by StreetEasy, which looks at how home values in the city have changed in the 10 years since the Great Recession.
Additionally, home values have overall gone up since the post-crisis low of November 2011. StreetEasy found that those have risen by a whopping 30 percent in the past seven years, at an average of nearly four percent per year.

### Business Problem

The goal is to help homebuyers choose suitable real estate in New York by using machine learning algorithms.

As a result, the business problem we are currently posing is:

**How can we help homebuyers choose suitable real estate in a New York neighborhood in this depreciating economy?**

To solve this business problem, we will cluster New York neighborhoods and report the nearby venues and the current average real estate price for each, so that homebuyers can decide where to invest. We will also recommend profitable venues, i.e. pharmacies, restaurants, hospitals and grocery stores.

## Data Section

The Department of Finance (DOF) maintains records for all property sales in New York City, including sales of family homes in each borough (https://data.cityofnewyork.us/api/views/948r-3ads/rows.csv?accessType=DOWNLOAD).

This list includes all sales of 1-, 2-, and 3-family homes from January 1, 2009 to December 31, 2009, with a sale price of $150,000 or more. The Building Class Category for Sales is based on the Building Class at the time of the sale.

To explore and target recommended locations according to the presence of amenities and essential facilities, we will access venue data through the Foursquare API and arrange it in a dataframe for visualization. By merging the Department of Finance sales data for New York properties with Foursquare data on the amenities and essential facilities surrounding them, we will be able to recommend profitable real estate investments.

## Methodology

1. Collect Inspection Data
2. Explore and Understand Data
3. Data preparation and preprocessing 
4. Modeling

# Implementation

In [1]:
# BeautifulSoup library helps in scraping data from web pages
from bs4 import BeautifulSoup
# lxml library is the parser used to parse content from different HTML tags
import lxml
# Requests library helps in getting the content of a web page
import requests as req
# library to handle data in a vectorized manner
import numpy as np
# library for data analysis
import pandas as pd
# library to handle JSON files
import json
# convert an address into latitude and longitude values
from geopy.geocoders import Nominatim
# library to handle requests
import requests
# transform JSON file into a pandas dataframe (top-level import, available in pandas 1.0 and later)
from pandas import json_normalize
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
# map rendering library
import folium
# library to find median of a list
from numpy import median
print('Libraries imported.')

Libraries imported.


## 1. Collect Inspection data

In [2]:
# Download the Neighbourhood of NewYork with price dataset
!wget -O ny_neighbourhood.csv  https://data.cityofnewyork.us/api/views/948r-3ads/rows.csv?accessType=DOWNLOAD

--2019-03-16 17:29:28--  https://data.cityofnewyork.us/api/views/948r-3ads/rows.csv?accessType=DOWNLOAD
Resolving data.cityofnewyork.us (data.cityofnewyork.us)... 52.206.140.199, 52.206.68.26, 52.206.140.205
Connecting to data.cityofnewyork.us (data.cityofnewyork.us)|52.206.140.199|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
Saving to: ‘ny_neighbourhood.csv’

ny_neighbourhood.cs     [ <=>                  ]  18.18K  --.-KB/s   in 0s     

2019-03-16 17:29:29 (96.3 MB/s) - ‘ny_neighbourhood.csv’ saved [18612]



## 2. Explore Data

In [3]:
#Reading the dataset value to DataFrame
original_data=pd.read_csv('ny_neighbourhood.csv')
original_data.head()

Unnamed: 0,NEIGHBORHOOD,TYPE OF HOME,TOTAL NO. OF PROPERTIES,NUMBER OF SALES,LOWEST SALE PRICE,AVERAGE SALE PRICE,MEDIAN SALE PRICE,HIGHEST SALE PRICE
0,AIRPORT LA GUARDIA,01 ONE FAMILY HOMES,84,1,485000.0,485000.0,485000.0,485000.0
1,AIRPORT LA GUARDIA,02 TWO FAMILY HOMES,14,1,480000.0,480000.0,480000.0,480000.0
2,ARVERNE,01 ONE FAMILY HOMES,696,32,161000.0,297194.0,310276.0,390291.0
3,ARVERNE,02 TWO FAMILY HOMES,1528,112,160000.0,505043.0,427868.0,1170987.0
4,ARVERNE,03 THREE FAMILY HOMES,137,6,165000.0,414658.0,506796.0,582320.0


## 3. Preprocessing

In [6]:
#Label Encoding for Type of Homes
from sklearn.preprocessing import LabelEncoder
labelencoder = LabelEncoder()
original_data['TYPE OF HOME'] = labelencoder.fit_transform(original_data['TYPE OF HOME'])
original_data.head()

Unnamed: 0,NEIGHBORHOOD,TYPE OF HOME,TOTAL NO. OF PROPERTIES,NUMBER OF SALES,LOWEST SALE PRICE,AVERAGE SALE PRICE,MEDIAN SALE PRICE,HIGHEST SALE PRICE
0,AIRPORT LA GUARDIA,0,84,1,485000.0,485000.0,485000.0,485000.0
1,AIRPORT LA GUARDIA,1,14,1,480000.0,480000.0,480000.0,480000.0
2,ARVERNE,0,696,32,161000.0,297194.0,310276.0,390291.0
3,ARVERNE,1,1528,112,160000.0,505043.0,427868.0,1170987.0
4,ARVERNE,2,137,6,165000.0,414658.0,506796.0,582320.0


The label-encoded values map to the original home types as follows:

**0  = 01 ONE FAMILY HOMES**  
**1  = 02 TWO FAMILY HOMES**  
**2  = 03 THREE FAMILY HOMES**
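
The mapping above follows from how `LabelEncoder` works: it assigns integer codes in sorted order of the distinct labels, so the `01`, `02`, `03` prefixes end up as 0, 1, 2. A minimal plain-Python sketch of that behaviour is shown below. `encode_labels` is a hypothetical helper written for illustration, not part of scikit-learn.

```python
def encode_labels(values):
    """Map each distinct label to an integer, in sorted label order."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[v] for v in values], mapping


# Home types in the order of the first five rows of the dataset
home_types = [
    "01 ONE FAMILY HOMES",
    "02 TWO FAMILY HOMES",
    "01 ONE FAMILY HOMES",
    "02 TWO FAMILY HOMES",
    "03 THREE FAMILY HOMES",
]
codes, mapping = encode_labels(home_types)
print(codes)
print(mapping)
```

With the fitted encoder from the cell above, `labelencoder.classes_` should list the original labels in code order, and `labelencoder.inverse_transform` converts codes back to labels.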

In [7]:
# Geocode each neighborhood; rows the geocoder cannot resolve get fixed fallback coordinates
lat=[]
lon=[]
# create the geocoder once instead of once per row
geolocator = Nominatim(user_agent="ny_explorer")
for i in original_data['NEIGHBORHOOD']:
    address = i+' , New York, USA'
    location = geolocator.geocode(address)
    if location is None:
        # fallback coordinates used when no result is returned
        lat.append(40.7136)
        lon.append(-73.7965)
        continue
    lat.append(location.latitude)
    lon.append(location.longitude)

#Outlier Reduction and Treatment: replace out-of-range coordinates with the median
lat_med=median(lat)
lon_med=median(lon)
for i in range(len(lat)):
    if lat[i]>41:
        lat[i]=lat_med
for i in range(len(lon)):
    if lon[i]<-74:
        lon[i]=lon_med

original_data['LONGITUDE']=lon
original_data['LATITUDE']=lat
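
The geocoding loop above calls the geocoder once per row, although each neighborhood appears up to three times (once per home type). A possible refinement, sketched here under the assumption that repeated lookups of the same name return the same result, is to cache one lookup per distinct name. `geocode_unique` and `fake_geocoder` are hypothetical names; the fake geocoder only stands in for a real service so the sketch runs offline.

```python
def geocode_unique(names, geocode_fn, fallback):
    """Return one (lat, lon) pair per input name, geocoding each distinct name once.

    geocode_fn is any callable that takes an address string and returns a
    (lat, lon) tuple or None; it stands in for a real geocoder here.
    """
    cache = {}
    coords = []
    for raw in names:
        name = raw.strip()  # neighborhood names may carry trailing spaces
        if name not in cache:
            result = geocode_fn(name + ', New York, USA')
            cache[name] = result if result is not None else fallback
        coords.append(cache[name])
    return coords


# Fake geocoder used only for demonstration; it records every call it receives
calls = []

def fake_geocoder(address):
    calls.append(address)
    known = {'ARVERNE, New York, USA': (40.593417, -73.789546)}
    return known.get(address)


names = ['ARVERNE   ', 'ARVERNE   ', 'ARVERNE   ', 'NOWHERE']
coords = geocode_unique(names, fake_geocoder, fallback=(40.7136, -73.7965))
print(coords)
print(len(calls))
```

To use this with geopy, `geocode_fn` could wrap `geolocator.geocode` and return `(location.latitude, location.longitude)` or `None`. Public geocoding services are usually rate limited, so fewer calls also lowers the chance of rejected requests.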



In [26]:
#Getting the Data 
original_data.head()


Unnamed: 0,NEIGHBORHOOD,TYPE OF HOME,TOTAL NO. OF PROPERTIES,NUMBER OF SALES,LOWEST SALE PRICE,AVERAGE SALE PRICE,MEDIAN SALE PRICE,HIGHEST SALE PRICE,LONGITUDE,LATITUDE
0,AIRPORT LA GUARDIA,0,84,1,485000.0,485000.0,485000.0,485000.0,-73.873364,40.775714
1,AIRPORT LA GUARDIA,1,14,1,480000.0,480000.0,480000.0,480000.0,-73.873364,40.775714
2,ARVERNE,0,696,32,161000.0,297194.0,310276.0,390291.0,-73.789546,40.593417
3,ARVERNE,1,1528,112,160000.0,505043.0,427868.0,1170987.0,-73.789546,40.593417
4,ARVERNE,2,137,6,165000.0,414658.0,506796.0,582320.0,-73.789546,40.593417
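
The outlier treatment in the preprocessing cell is one-sided: it replaces latitudes above 41 and longitudes below -74 with the median of all values, outliers included. A slightly stricter variant, sketched below, checks both sides of a range and takes the median of the in-range values only. The bounds `40.4` and `41.0` and the helper name `replace_outside` are assumptions chosen for illustration, not values from the notebook.

```python
from statistics import median


def replace_outside(values, low, high):
    """Replace values outside [low, high] with the median of the in-range values."""
    inside = [v for v in values if low <= v <= high]
    fill = median(inside)  # raises StatisticsError if no value is in range
    return [v if low <= v <= high else fill for v in values]


# Three latitudes from the table above plus one made-up outlier (44.9)
lats = [40.775714, 40.593417, 44.9, 40.772014]
cleaned = replace_outside(lats, low=40.4, high=41.0)
print(cleaned)
```

Computing the median from in-range values keeps a badly geocoded row from shifting the replacement value.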


In [8]:
# Get the latitude and longitude of New York, USA
address = 'NEW YORK'
geolocator = Nominatim(user_agent="tr_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7308619, -73.9871558.


## Plotting the Neighborhoods of New York present in Dataset

In [10]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, price, neighborhood in zip(original_data['LATITUDE'], original_data['LONGITUDE'], original_data['AVERAGE SALE PRICE'], original_data['NEIGHBORHOOD']):
    label = '{}, {}'.format(neighborhood, price)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_newyork)
    
map_newyork

In [11]:
import os
# Read the Foursquare credentials from the environment instead of hard-coding them.
# The environment variable names are an assumption; use whatever names you export.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # Foursquare Secret
VERSION = '20190226' # Foursquare API version
LIMIT=100
# Do not print the credentials; notebook output is often shared
print('Foursquare credentials loaded.')

Foursquare credentials loaded.


In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=2500, LIMIT=100):
    """Return a dataframe with the venues Foursquare reports around each location."""
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request and keep only the list of venue items
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-location lists into one dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Street', 
                  'Street Latitude', 
                  'Street Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
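
The function above indexes straight into the response, so a reply without `groups`, or a venue without categories, raises an exception and stops the whole loop. A more defensive parsing step might look like the sketch below. `parse_explore_items` is a hypothetical helper, and the sample payload is hand-written to imitate only the keys that `getNearbyVenues` reads; it is not a recorded API response.

```python
def parse_explore_items(payload, name, lat, lng):
    """Flatten one explore-style response into (name, lat, lng, venue, ...) rows."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # a venue without categories gets None instead of raising IndexError
        category = categories[0]['name'] if categories else None
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows


# Hand-written payload with the same nesting that getNearbyVenues accesses
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Five Guys',
               'location': {'lat': 40.774219, 'lng': -73.873859},
               'categories': [{'name': 'Burger Joint'}]}},
    {'venue': {'name': 'Unlabelled venue',
               'location': {'lat': 40.77, 'lng': -73.87},
               'categories': []}},
]}]}}

rows = parse_explore_items(sample, 'AIRPORT LA GUARDIA', 40.775714, -73.873364)
empty = parse_explore_items({}, 'ARVERNE', 40.593417, -73.789546)
print(rows)
print(empty)
```

Returning an empty list for an unusable response lets the caller skip that location and continue with the rest.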

## Getting Nearby Venues of New York

In [13]:
newyork_venues = getNearbyVenues(names=original_data['NEIGHBORHOOD'],
                                   latitudes=original_data['LATITUDE'],
                                   longitudes=original_data['LONGITUDE']
                                  )
newyork_venues

AIRPORT LA GUARDIA       
AIRPORT LA GUARDIA       
ARVERNE                  
ARVERNE                  
ARVERNE                  
ASTORIA                  
ASTORIA                  
ASTORIA                  
BAYSIDE                  
BAYSIDE                  
BAYSIDE                  
BEECHHURST               
BEECHHURST               
BELLE HARBOR             
BELLE HARBOR             
BELLEROSE                
BELLEROSE                
BRIARWOOD                
BRIARWOOD                
BRIARWOOD                
BROAD CHANNEL            
BROAD CHANNEL            
CAMBRIA HEIGHTS          
CAMBRIA HEIGHTS          
COLLEGE POINT            
COLLEGE POINT            
COLLEGE POINT            
CORONA                   
CORONA                   
CORONA                   
DOUGLASTON               
DOUGLASTON               
EAST ELMHURST            
EAST ELMHURST            
EAST ELMHURST            
ELMHURST                 
ELMHURST                 
ELMHURST                 
FAR ROCKAWAY

Unnamed: 0,Street,Street Latitude,Street Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,AIRPORT LA GUARDIA,40.775714,-73.873364,The Centurion Lounge LaGuardia,40.774511,-73.871962,Airport Lounge
1,AIRPORT LA GUARDIA,40.775714,-73.873364,Shoe Shine AA,40.775239,-73.874322,Shoe Repair
2,AIRPORT LA GUARDIA,40.775714,-73.873364,Five Guys,40.774219,-73.873859,Burger Joint
3,AIRPORT LA GUARDIA,40.775714,-73.873364,7-Eleven,40.763868,-73.881667,Convenience Store
4,AIRPORT LA GUARDIA,40.775714,-73.873364,Delta Sky Club,40.769101,-73.862337,Airport Lounge
5,AIRPORT LA GUARDIA,40.775714,-73.873364,Shake Shack,40.773936,-73.869490,Burger Joint
6,AIRPORT LA GUARDIA,40.775714,-73.873364,"Airways Pizza, Gyro & Restaurant",40.763781,-73.878553,Pizza Place
7,AIRPORT LA GUARDIA,40.775714,-73.873364,Starbucks,40.775290,-73.874180,Coffee Shop
8,AIRPORT LA GUARDIA,40.775714,-73.873364,La Guardia Café,40.768226,-73.873524,Café
9,AIRPORT LA GUARDIA,40.775714,-73.873364,Delta Sky Club,40.771701,-73.865391,Airport Lounge


In [14]:
# Grouping the venues according to Street
newyork_venues.groupby('Street').count()

Street,Street Latitude,Street Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
AIRPORT LA GUARDIA,200,200,200,200,200,200
ARVERNE,195,195,195,195,195,195
ASTORIA,300,300,300,300,300,300
BAYSIDE,300,300,300,300,300,300
BEECHHURST,156,156,156,156,156,156
BELLE HARBOR,130,130,130,130,130,130
BELLEROSE,200,200,200,200,200,200
BRIARWOOD,300,300,300,300,300,300
BROAD CHANNEL,120,120,120,120,120,120
CAMBRIA HEIGHTS,200,200,200,200,200,200


In [15]:
# get the number of unique categories
print('There are {} unique categories.'.format(len(newyork_venues['Venue Category'].unique())))

There are 293 unique categories.


In [16]:
newyork_venues.shape

(14859, 7)

In [17]:
# one hot encoding
venues_onehot = pd.get_dummies(newyork_venues[['Venue Category']], prefix="", prefix_sep="")

# add street column back to dataframe
venues_onehot['Street'] = newyork_venues['Street'] 

# move street column to the first column
fixed_columns = [venues_onehot.columns[-1]] + list(venues_onehot.columns[:-1])

#fixed_columns
venues_onehot = venues_onehot[fixed_columns]

venues_onehot.head()

Unnamed: 0,Street,Accessories Store,Afghan Restaurant,Airport Lounge,Airport Service,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,...,Video Store,Vietnamese Restaurant,Warehouse Store,Weight Loss Center,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,AIRPORT LA GUARDIA,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,AIRPORT LA GUARDIA,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,AIRPORT LA GUARDIA,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,AIRPORT LA GUARDIA,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,AIRPORT LA GUARDIA,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [18]:
newyork_grouped = venues_onehot.groupby('Street').mean().reset_index()
newyork_grouped.head()

Unnamed: 0,Street,Accessories Store,Afghan Restaurant,Airport Lounge,Airport Service,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,...,Video Store,Vietnamese Restaurant,Warehouse Store,Weight Loss Center,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,AIRPORT LA GUARDIA,0.01,0.0,0.04,0.02,0.01,0.0,0.01,0.0,0.0,...,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,ARVERNE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.030769,0.0,0.0,0.0,0.0,0.0
2,ASTORIA,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0
3,BAYSIDE,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0
4,BEECHHURST,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.012821,0.012821,0.0,0.0,0.0,0.0,0.0,0.012821,0.0


In [19]:
newyork_grouped.shape

(58, 294)
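
Taking the mean of one-hot rows per neighborhood, as done above, gives the share of that neighborhood's venues that fall in each category. The same quantity can be computed by counting, which the plain-Python sketch below illustrates on a few toy rows. `category_frequencies` is a hypothetical helper and the rows are illustrative, not the full dataset.

```python
from collections import Counter, defaultdict


def category_frequencies(rows):
    """rows: iterable of (neighborhood, category). Returns {neighborhood: {category: share}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in rows:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(c.values()) for cat, n in c.items()}
        for hood, c in counts.items()
    }


# Toy rows: four venues for one neighborhood, one venue for another
rows = [
    ('AIRPORT LA GUARDIA', 'Airport Lounge'),
    ('AIRPORT LA GUARDIA', 'Shoe Repair'),
    ('AIRPORT LA GUARDIA', 'Burger Joint'),
    ('AIRPORT LA GUARDIA', 'Airport Lounge'),
    ('ARVERNE', 'Beach'),
]
freqs = category_frequencies(rows)
print(freqs['AIRPORT LA GUARDIA']['Airport Lounge'])
```

Because the values are shares, each neighborhood's frequencies sum to 1, and fetching the same venues more than once for a neighborhood leaves them unchanged.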

## Top Five Venues

In [21]:
num_top_venues = 5

for hood in newyork_grouped['Street']:
    print("----"+hood+"----")
    temp = newyork_grouped[newyork_grouped['Street'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----AIRPORT LA GUARDIA       ----
                       venue  freq
0                Pizza Place  0.07
1                 Donut Shop  0.05
2                     Bakery  0.05
3  Latin American Restaurant  0.04
4        Rental Car Location  0.04


----ARVERNE                  ----
         venue  freq
0        Beach  0.14
1   Donut Shop  0.06
2    Surf Spot  0.06
3  Pizza Place  0.06
4  Supermarket  0.05


----ASTORIA                  ----
                venue  freq
0    Greek Restaurant  0.09
1  Italian Restaurant  0.05
2                 Bar  0.05
3                Park  0.04
4          Bagel Shop  0.04


----BAYSIDE                  ----
                venue  freq
0              Bakery  0.05
1   Korean Restaurant  0.05
2    Greek Restaurant  0.04
3  Italian Restaurant  0.04
4      Cosmetics Shop  0.04


----BEECHHURST               ----
                venue  freq
0                Park  0.10
1         Pizza Place  0.09
2  Italian Restaurant  0.08
3       Deli / Bodega  0.06
4  Chinese

In [22]:
# Define a function to return the most common venues/facilities near real estate investments

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [23]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Street']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))
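
The loop above relies on an exception to fall back to the `th` suffix, which is correct for the ten columns used here but would label position 21 as `21th` if `num_top_venues` were raised. A small helper that handles the general case is sketched below. `ordinal` is a hypothetical name introduced for illustration.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


columns = ['Street'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns[:4])
```

For `num_top_venues = 10` this produces the same column names as the loop above.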

In [24]:
# create a new dataframe
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Street'] = newyork_grouped['Street']

for ind in np.arange(newyork_grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(newyork_grouped.iloc[ind, :], num_top_venues)

In [25]:
venues_sorted.head()


Unnamed: 0,Street,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,AIRPORT LA GUARDIA,Pizza Place,Donut Shop,Bakery,Rental Car Location,Ice Cream Shop,Latin American Restaurant,Airport Lounge,Burger Joint,Pharmacy,Coffee Shop
1,ARVERNE,Beach,Pizza Place,Donut Shop,Surf Spot,Supermarket,Board Shop,Bus Stop,Grocery Store,Bar,Metro Station
2,ASTORIA,Greek Restaurant,Italian Restaurant,Bar,Bagel Shop,Park,Sushi Restaurant,Bakery,Sandwich Place,Food Truck,Coffee Shop
3,BAYSIDE,Bakery,Korean Restaurant,Cosmetics Shop,Italian Restaurant,Greek Restaurant,Gym / Fitness Center,Pizza Place,Bagel Shop,Bar,Ice Cream Shop
4,BEECHHURST,Park,Pizza Place,Italian Restaurant,Deli / Bodega,Ice Cream Shop,Chinese Restaurant,Bagel Shop,History Museum,Donut Shop,Beach


## 4. Modelling

In [33]:
# set number of clusters
kclusters = 7

newyork_grouped_clustering = newyork_grouped.drop(columns='Street')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(newyork_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 2, 0, 0, 5, 2, 4, 6, 4, 1], dtype=int32)
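
To make the clustering step less of a black box, the sketch below implements the basic k-means iteration (Lloyd's algorithm) in plain Python on toy two-dimensional frequency vectors. It is a simplified illustration, not scikit-learn's implementation: the starting centroids are passed in by hand, whereas `KMeans` chooses them itself. All function names and the toy data are assumptions for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points]


def update(points, labels, centroids):
    """Mean of the points assigned to each centroid; empty clusters keep their centroid."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the labels stop changing."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Toy frequency vectors: (share of beaches, share of restaurants)
points = [(0.14, 0.02), (0.12, 0.03), (0.00, 0.20), (0.01, 0.22)]
labels, centroids = kmeans(points, centroids=[points[0], points[2]])
print(labels)
print(centroids)
```

In the notebook each neighborhood is a vector of venue-category frequencies, so neighborhoods with similar venue mixes end up in the same cluster.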

In [39]:
# Drop any 'Cluster Labels' column left over from a previous run so the insert in the next cell does not fail
venues_sorted.drop(columns='Cluster Labels', errors='ignore', inplace=True)


In [40]:
# add clustering labels
venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

newyork_merged = original_data

# join venues_sorted onto original_data to add the cluster label and top venues for each neighborhood
newyork_merged = newyork_merged.join(venues_sorted.set_index('Street'), on='NEIGHBORHOOD')

newyork_merged.head() # check the last columns!


Unnamed: 0,NEIGHBORHOOD,TYPE OF HOME,TOTAL NO. OF PROPERTIES,NUMBER OF SALES,LOWEST SALE PRICE,AVERAGE SALE PRICE,MEDIAN SALE PRICE,HIGHEST SALE PRICE,LONGITUDE,LATITUDE,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,AIRPORT LA GUARDIA,0,84,1,485000.0,485000.0,485000.0,485000.0,-73.873364,40.775714,...,Pizza Place,Donut Shop,Bakery,Rental Car Location,Ice Cream Shop,Latin American Restaurant,Airport Lounge,Burger Joint,Pharmacy,Coffee Shop
1,AIRPORT LA GUARDIA,1,14,1,480000.0,480000.0,480000.0,480000.0,-73.873364,40.775714,...,Pizza Place,Donut Shop,Bakery,Rental Car Location,Ice Cream Shop,Latin American Restaurant,Airport Lounge,Burger Joint,Pharmacy,Coffee Shop
2,ARVERNE,0,696,32,161000.0,297194.0,310276.0,390291.0,-73.789546,40.593417,...,Beach,Pizza Place,Donut Shop,Surf Spot,Supermarket,Board Shop,Bus Stop,Grocery Store,Bar,Metro Station
3,ARVERNE,1,1528,112,160000.0,505043.0,427868.0,1170987.0,-73.789546,40.593417,...,Beach,Pizza Place,Donut Shop,Surf Spot,Supermarket,Board Shop,Bus Stop,Grocery Store,Bar,Metro Station
4,ARVERNE,2,137,6,165000.0,414658.0,506796.0,582320.0,-73.789546,40.593417,...,Beach,Pizza Place,Donut Shop,Surf Spot,Supermarket,Board Shop,Bus Stop,Grocery Store,Bar,Metro Station
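
Once each row carries a cluster label, prices can be summarised per cluster to compare them for homebuyers. The plain-Python sketch below does this for the five rows shown above, using the labels from the `kmeans.labels_` output (AIRPORT LA GUARDIA is in cluster 4, ARVERNE in cluster 2). `mean_price_by_cluster` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def mean_price_by_cluster(rows):
    """rows: iterable of (cluster_label, average_sale_price)."""
    grouped = defaultdict(list)
    for cluster, price in rows:
        grouped[cluster].append(price)
    return {cluster: mean(prices) for cluster, prices in sorted(grouped.items())}


# (cluster label, AVERAGE SALE PRICE) for the first five rows of the merged table
rows = [(4, 485000.0), (4, 480000.0), (2, 297194.0), (2, 505043.0), (2, 414658.0)]
summary = mean_price_by_cluster(rows)
print(summary)
```

On the full dataframe, something like `newyork_merged.groupby('Cluster Labels')['AVERAGE SALE PRICE'].mean()` should give the equivalent summary. Note that this averages the per-row averages and does not weight by the number of sales.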


In [52]:
import matplotlib.cm as cm
import matplotlib.colors as colors
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(newyork_merged['LATITUDE'], newyork_merged['LONGITUDE'], newyork_merged['NEIGHBORHOOD'], newyork_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
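
The map cell indexes the palette with `rainbow[cluster-1]`, so label 0 wraps around to the last color. That still gives every cluster its own color, but it is easy to misread. A simpler option, sketched below with only the standard library, is to build one color per cluster and index it directly by label. `cluster_palette` and the saturation and brightness values are assumptions for illustration.

```python
import colorsys


def cluster_palette(n_clusters):
    """Return n_clusters hex colors spaced evenly around the hue wheel."""
    palette = []
    for k in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(k / n_clusters, 0.85, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(7)
print(palette)
```

A marker for a row with label `cluster` would then use `palette[int(cluster)]` for both `color` and `fill_color`.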

## Data in Cluster 1 after KMeans

In [43]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 0, newyork_merged.columns[[1] + list(range(5, newyork_merged.shape[1]))]]

Unnamed: 0,TYPE OF HOME,AVERAGE SALE PRICE,MEDIAN SALE PRICE,HIGHEST SALE PRICE,LONGITUDE,LATITUDE,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,0,552234.0,556000.0,1207099.0,-73.930267,40.772014,0,Greek Restaurant,Italian Restaurant,Bar,Bagel Shop,Park,Sushi Restaurant,Bakery,Sandwich Place,Food Truck,Coffee Shop
6,1,689831.0,700000.0,1600000.0,-73.930267,40.772014,0,Greek Restaurant,Italian Restaurant,Bar,Bagel Shop,Park,Sushi Restaurant,Bakery,Sandwich Place,Food Truck,Coffee Shop
7,2,653181.0,715000.0,1070000.0,-73.930267,40.772014,0,Greek Restaurant,Italian Restaurant,Bar,Bagel Shop,Park,Sushi Restaurant,Bakery,Sandwich Place,Food Truck,Coffee Shop
8,0,606888.0,600000.0,2225000.0,-73.777077,40.768435,0,Bakery,Korean Restaurant,Cosmetics Shop,Italian Restaurant,Greek Restaurant,Gym / Fitness Center,Pizza Place,Bagel Shop,Bar,Ice Cream Shop
9,1,751116.0,777500.0,1110000.0,-73.777077,40.768435,0,Bakery,Korean Restaurant,Cosmetics Shop,Italian Restaurant,Greek Restaurant,Gym / Fitness Center,Pizza Place,Bagel Shop,Bar,Ice Cream Shop
10,2,814500.0,872500.0,1075000.0,-73.777077,40.768435,0,Bakery,Korean Restaurant,Cosmetics Shop,Italian Restaurant,Greek Restaurant,Gym / Fitness Center,Pizza Place,Bagel Shop,Bar,Ice Cream Shop
30,0,934106.0,900000.0,2700000.0,-73.747077,40.768713,0,Italian Restaurant,Bakery,Pizza Place,Coffee Shop,Korean Restaurant,Chinese Restaurant,Yoga Studio,Gym,Grocery Store,Greek Restaurant
31,1,922269.0,784000.0,1900000.0,-73.747077,40.768713,0,Italian Restaurant,Bakery,Pizza Place,Coffee Shop,Korean Restaurant,Chinese Restaurant,Yoga Studio,Gym,Grocery Store,Greek Restaurant
47,0,535279.0,499495.0,1600000.0,-73.802412,40.754693,0,Korean Restaurant,Chinese Restaurant,Pizza Place,Coffee Shop,Park,Ice Cream Shop,Convenience Store,Indian Restaurant,Café,Greek Restaurant
48,1,574314.0,537500.0,987702.0,-73.802412,40.754693,0,Korean Restaurant,Chinese Restaurant,Pizza Place,Coffee Shop,Park,Ice Cream Shop,Convenience Store,Indian Restaurant,Café,Greek Restaurant


## Data in Cluster 2 after KMeans


In [44]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 1, newyork_merged.columns[[1] + list(range(5, newyork_merged.shape[1]))]]

Unnamed: 0,TYPE OF HOME,AVERAGE SALE PRICE,MEDIAN SALE PRICE,HIGHEST SALE PRICE,LONGITUDE,LATITUDE,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,0,339606.0,356200.0,479824.0,-73.738465,40.694547,1,Caribbean Restaurant,Pharmacy,Pizza Place,Donut Shop,Discount Store,Convenience Store,Supermarket,Sandwich Place,Ice Cream Shop,Deli / Bodega
23,1,359528.0,349750.0,624000.0,-73.738465,40.694547,1,Caribbean Restaurant,Pharmacy,Pizza Place,Donut Shop,Discount Store,Convenience Store,Supermarket,Sandwich Place,Ice Cream Shop,Deli / Bodega
78,0,343356.0,320000.0,548642.0,-73.805677,40.691485,1,Caribbean Restaurant,Indian Restaurant,Latin American Restaurant,Park,Coffee Shop,Pharmacy,Diner,Southern / Soul Food Restaurant,BBQ Joint,Lounge
79,1,455949.0,450000.0,770000.0,-73.805677,40.691485,1,Caribbean Restaurant,Indian Restaurant,Latin American Restaurant,Park,Coffee Shop,Pharmacy,Diner,Southern / Soul Food Restaurant,BBQ Joint,Lounge
80,2,576935.0,630000.0,811952.0,-73.805677,40.691485,1,Caribbean Restaurant,Indian Restaurant,Latin American Restaurant,Park,Coffee Shop,Pharmacy,Diner,Southern / Soul Food Restaurant,BBQ Joint,Lounge
92,0,328841.0,339000.0,570000.0,-73.751521,40.66677,1,Caribbean Restaurant,Pizza Place,Donut Shop,Sandwich Place,Fast Food Restaurant,Discount Store,Convenience Store,Shipping Store,Park,Pharmacy
93,1,389841.0,373000.0,620000.0,-73.751521,40.66677,1,Caribbean Restaurant,Pizza Place,Donut Shop,Sandwich Place,Fast Food Restaurant,Discount Store,Convenience Store,Shipping Store,Park,Pharmacy
94,2,370000.0,370000.0,370000.0,-73.751521,40.66677,1,Caribbean Restaurant,Pizza Place,Donut Shop,Sandwich Place,Fast Food Restaurant,Discount Store,Convenience Store,Shipping Store,Park,Pharmacy
129,0,349033.0,350000.0,645000.0,-73.73541,40.662048,1,Caribbean Restaurant,Clothing Store,Park,Convenience Store,Donut Shop,Sandwich Place,Pizza Place,Furniture / Home Store,Cosmetics Shop,Department Store
130,1,473370.0,478723.0,3500000.0,-73.73541,40.662048,1,Caribbean Restaurant,Clothing Store,Park,Convenience Store,Donut Shop,Sandwich Place,Pizza Place,Furniture / Home Store,Cosmetics Shop,Department Store


## Data in Cluster 3 after KMeans


In [45]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 2, newyork_merged.columns[[1] + list(range(5, newyork_merged.shape[1]))]]

Unnamed: 0,TYPE OF HOME,AVERAGE SALE PRICE,MEDIAN SALE PRICE,HIGHEST SALE PRICE,LONGITUDE,LATITUDE,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,0,297194.0,310276.0,390291.0,-73.789546,40.593417,2,Beach,Pizza Place,Donut Shop,Surf Spot,Supermarket,Board Shop,Bus Stop,Grocery Store,Bar,Metro Station
3,1,505043.0,427868.0,1170987.0,-73.789546,40.593417,2,Beach,Pizza Place,Donut Shop,Surf Spot,Supermarket,Board Shop,Bus Stop,Grocery Store,Bar,Metro Station
4,2,414658.0,506796.0,582320.0,-73.789546,40.593417,2,Beach,Pizza Place,Donut Shop,Surf Spot,Supermarket,Board Shop,Bus Stop,Grocery Store,Bar,Metro Station
13,0,720756.0,700000.0,1225000.0,-73.848577,40.577552,2,Beach,Deli / Bodega,Pizza Place,Ice Cream Shop,Bagel Shop,Pharmacy,Bank,Park,Supermarket,Bus Stop
14,1,737708.0,700000.0,1275000.0,-73.848577,40.577552,2,Beach,Deli / Bodega,Pizza Place,Ice Cream Shop,Bagel Shop,Pharmacy,Bank,Park,Supermarket,Bus Stop
38,0,462934.0,433111.0,900000.0,-73.755133,40.605382,2,Beach,Pizza Place,Supermarket,Sandwich Place,Donut Shop,Discount Store,Fast Food Restaurant,Grocery Store,Golf Course,Bank
39,1,346504.0,289495.0,750000.0,-73.755133,40.605382,2,Beach,Pizza Place,Supermarket,Sandwich Place,Donut Shop,Discount Store,Fast Food Restaurant,Grocery Store,Golf Course,Bank
40,2,429958.0,481000.0,647200.0,-73.755133,40.605382,2,Beach,Pizza Place,Supermarket,Sandwich Place,Donut Shop,Discount Store,Fast Food Restaurant,Grocery Store,Golf Course,Bank
61,0,273274.0,315000.0,325000.0,-73.811151,40.588822,2,Beach,Surf Spot,Bagel Shop,Beach Bar,Wine Bar,Hotel,Frozen Yogurt Shop,Board Shop,Taco Place,Steakhouse
62,1,373778.0,406660.0,506908.0,-73.811151,40.588822,2,Beach,Surf Spot,Bagel Shop,Beach Bar,Wine Bar,Hotel,Frozen Yogurt Shop,Board Shop,Taco Place,Steakhouse


## Data in Cluster 4 after KMeans


In [46]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 3, newyork_merged.columns[[1] + list(range(5, newyork_merged.shape[1]))]]

Unnamed: 0,TYPE OF HOME,AVERAGE SALE PRICE,MEDIAN SALE PRICE,HIGHEST SALE PRICE,LONGITUDE,LATITUDE,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
27,0,400471.0,350000.0,790000.0,-73.860146,40.746959,3,Tennis Stadium,South American Restaurant,Latin American Restaurant,Pizza Place,Bakery,Italian Restaurant,Thai Restaurant,Lingerie Store,Empanada Restaurant,Burger Joint
28,1,490055.0,470000.0,1236300.0,-73.860146,40.746959,3,Tennis Stadium,South American Restaurant,Latin American Restaurant,Pizza Place,Bakery,Italian Restaurant,Thai Restaurant,Lingerie Store,Empanada Restaurant,Burger Joint
29,2,729560.0,766000.0,1042934.0,-73.860146,40.746959,3,Tennis Stadium,South American Restaurant,Latin American Restaurant,Pizza Place,Bakery,Italian Restaurant,Thai Restaurant,Lingerie Store,Empanada Restaurant,Burger Joint
32,0,446447.0,420000.0,799000.0,-73.865136,40.761212,3,Tennis Stadium,Bakery,South American Restaurant,Latin American Restaurant,Pizza Place,Peruvian Restaurant,Mexican Restaurant,Airport Lounge,Italian Restaurant,Argentinian Restaurant
33,1,497409.0,479134.0,859182.0,-73.865136,40.761212,3,Tennis Stadium,Bakery,South American Restaurant,Latin American Restaurant,Pizza Place,Peruvian Restaurant,Mexican Restaurant,Airport Lounge,Italian Restaurant,Argentinian Restaurant
34,2,535799.0,535000.0,957622.0,-73.865136,40.761212,3,Tennis Stadium,Bakery,South American Restaurant,Latin American Restaurant,Pizza Place,Peruvian Restaurant,Mexican Restaurant,Airport Lounge,Italian Restaurant,Argentinian Restaurant
35,0,486767.0,499000.0,770000.0,-73.878393,40.73658,3,Thai Restaurant,Bakery,Argentinian Restaurant,Vietnamese Restaurant,Clothing Store,Asian Restaurant,Indonesian Restaurant,Chinese Restaurant,South American Restaurant,Mexican Restaurant
36,1,637754.0,645500.0,1085000.0,-73.878393,40.73658,3,Thai Restaurant,Bakery,Argentinian Restaurant,Vietnamese Restaurant,Clothing Store,Asian Restaurant,Indonesian Restaurant,Chinese Restaurant,South American Restaurant,Mexican Restaurant
37,2,788719.0,795000.0,1200000.0,-73.878393,40.73658,3,Thai Restaurant,Bakery,Argentinian Restaurant,Vietnamese Restaurant,Clothing Store,Asian Restaurant,Indonesian Restaurant,Chinese Restaurant,South American Restaurant,Mexican Restaurant
44,0,587891.0,580000.0,1370000.0,-73.842804,40.740463,3,Tennis Stadium,Chinese Restaurant,Bakery,Dumpling Restaurant,Park,Cocktail Bar,Pizza Place,Italian Restaurant,Bagel Shop,Liquor Store


## Data in Cluster 4 after KMeans


In [47]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 4, newyork_merged.columns[[1] + list(range(5, newyork_merged.shape[1]))]]

Unnamed: 0,TYPE OF HOME,AVERAGE SALE PRICE,MEDIAN SALE PRICE,HIGHEST SALE PRICE,LONGITUDE,LATITUDE,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,485000.0,485000.0,485000.0,-73.873364,40.775714,4,Pizza Place,Donut Shop,Bakery,Rental Car Location,Ice Cream Shop,Latin American Restaurant,Airport Lounge,Burger Joint,Pharmacy,Coffee Shop
1,1,480000.0,480000.0,480000.0,-73.873364,40.775714,4,Pizza Place,Donut Shop,Bakery,Rental Car Location,Ice Cream Shop,Latin American Restaurant,Airport Lounge,Burger Joint,Pharmacy,Coffee Shop
15,0,445081.0,440000.0,780000.0,-73.715131,40.724269,4,Pizza Place,Indian Restaurant,Mobile Phone Shop,Bar,Deli / Bodega,Italian Restaurant,Coffee Shop,Donut Shop,Pharmacy,Sandwich Place
16,1,525935.0,517000.0,1450000.0,-73.715131,40.724269,4,Pizza Place,Indian Restaurant,Mobile Phone Shop,Bar,Deli / Bodega,Italian Restaurant,Coffee Shop,Donut Shop,Pharmacy,Sandwich Place
20,0,323210.0,330000.0,529881.0,-73.819019,40.606401,4,Beach,Pizza Place,Chinese Restaurant,Beach Bar,Bagel Shop,Bar,Supermarket,Surf Spot,Brazilian Restaurant,Ice Cream Shop
21,1,450000.0,450000.0,450000.0,-73.819019,40.606401,4,Beach,Pizza Place,Chinese Restaurant,Beach Bar,Bagel Shop,Bar,Supermarket,Surf Spot,Brazilian Restaurant,Ice Cream Shop
24,0,452241.0,435000.0,977520.0,-73.845968,40.787601,4,Pizza Place,Donut Shop,Park,Ice Cream Shop,Fast Food Restaurant,Coffee Shop,Supermarket,Sandwich Place,Bakery,Gym
25,1,629599.0,599468.0,1160805.0,-73.845968,40.787601,4,Pizza Place,Donut Shop,Park,Ice Cream Shop,Fast Food Restaurant,Coffee Shop,Supermarket,Sandwich Place,Bakery,Gym
26,2,611132.0,550000.0,894023.0,-73.845968,40.787601,4,Pizza Place,Donut Shop,Park,Ice Cream Shop,Fast Food Restaurant,Coffee Shop,Supermarket,Sandwich Place,Bakery,Gym
41,0,481548.0,469000.0,755000.0,-73.704802,40.7247,4,Pizza Place,Diner,Deli / Bodega,Bar,Bakery,Indian Restaurant,Coffee Shop,Pharmacy,Italian Restaurant,Asian Restaurant


## Data in Cluster 5 after KMeans


In [48]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 5, newyork_merged.columns[[1] + list(range(5, newyork_merged.shape[1]))]]

Unnamed: 0,TYPE OF HOME,AVERAGE SALE PRICE,MEDIAN SALE PRICE,HIGHEST SALE PRICE,LONGITUDE,LATITUDE,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,0,679765.0,650000.0,1125000.0,-73.804578,40.79149,5,Park,Pizza Place,Italian Restaurant,Deli / Bodega,Ice Cream Shop,Chinese Restaurant,Bagel Shop,History Museum,Donut Shop,Beach
12,1,1235000.0,1235000.0,1560000.0,-73.804578,40.79149,5,Park,Pizza Place,Italian Restaurant,Deli / Bodega,Ice Cream Shop,Chinese Restaurant,Bagel Shop,History Museum,Donut Shop,Beach
149,0,645986.0,600000.0,2000000.0,-73.818467,40.794546,5,Deli / Bodega,Pizza Place,Italian Restaurant,Park,Pet Store,Donut Shop,Golf Course,Ice Cream Shop,Japanese Restaurant,Dessert Shop
150,1,688051.0,690000.0,1200000.0,-73.818467,40.794546,5,Deli / Bodega,Pizza Place,Italian Restaurant,Park,Pet Store,Donut Shop,Golf Course,Ice Cream Shop,Japanese Restaurant,Dessert Shop


## Data in Cluster 6 after KMeans


In [49]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 6, newyork_merged.columns[[1] + list(range(5, newyork_merged.shape[1]))]]

Unnamed: 0,TYPE OF HOME,AVERAGE SALE PRICE,MEDIAN SALE PRICE,HIGHEST SALE PRICE,LONGITUDE,LATITUDE,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,0,428607.0,430000.0,650000.0,-73.818783,40.726769,6,Pizza Place,Bakery,Park,Sushi Restaurant,Chinese Restaurant,Bar,Bagel Shop,Italian Restaurant,Vegetarian / Vegan Restaurant,Juice Bar
18,1,500948.0,492000.0,615500.0,-73.818783,40.726769,6,Pizza Place,Bakery,Park,Sushi Restaurant,Chinese Restaurant,Bar,Bagel Shop,Italian Restaurant,Vegetarian / Vegan Restaurant,Juice Bar
19,2,569985.0,730000.0,760000.0,-73.818783,40.726769,6,Pizza Place,Bakery,Park,Sushi Restaurant,Chinese Restaurant,Bar,Bagel Shop,Italian Restaurant,Vegetarian / Vegan Restaurant,Juice Bar
50,0,832035.0,700000.0,3775000.0,-73.819019,40.726769,6,Bakery,Pizza Place,Park,Chinese Restaurant,Sushi Restaurant,Bar,Bagel Shop,Vegetarian / Vegan Restaurant,Asian Restaurant,Ice Cream Shop
51,1,702563.0,671000.0,960000.0,-73.819019,40.726769,6,Bakery,Pizza Place,Park,Chinese Restaurant,Sushi Restaurant,Bar,Bagel Shop,Vegetarian / Vegan Restaurant,Asian Restaurant,Ice Cream Shop
52,2,1065000.0,1065000.0,1065000.0,-73.819019,40.726769,6,Bakery,Pizza Place,Park,Chinese Restaurant,Sushi Restaurant,Bar,Bagel Shop,Vegetarian / Vegan Restaurant,Asian Restaurant,Ice Cream Shop
58,0,459588.0,457500.0,613000.0,-73.845135,40.726769,6,Bakery,Pizza Place,Asian Restaurant,Italian Restaurant,Sushi Restaurant,Thai Restaurant,Seafood Restaurant,Café,Bar,Bagel Shop
59,1,474756.0,482500.0,765000.0,-73.845135,40.726769,6,Bakery,Pizza Place,Asian Restaurant,Italian Restaurant,Sushi Restaurant,Thai Restaurant,Seafood Restaurant,Café,Bar,Bagel Shop
60,2,643641.0,672266.0,865410.0,-73.845135,40.726769,6,Bakery,Pizza Place,Asian Restaurant,Italian Restaurant,Sushi Restaurant,Thai Restaurant,Seafood Restaurant,Café,Bar,Bagel Shop
64,0,471843.0,480000.0,890000.0,-73.819019,40.726769,6,Bakery,Pizza Place,Park,Chinese Restaurant,Sushi Restaurant,Bar,Bagel Shop,Vegetarian / Vegan Restaurant,Asian Restaurant,Ice Cream Shop


## Results 


First of all, even though the New York housing market may be in a rut, it remains an "evergreen" area for business.

Key observations:


First, we examine the clusters by the New York neighborhoods they contain.

Cluster 0:

1. The average and median prices of Cluster 0 neighborhoods are 718522.835 and 715406.6471 respectively.  
2. The cluster contains the following places: ASTORIA, BAYSIDE, DOUGLASTON, FLUSHING-SOUTH, LITTLE NECK and OAKLAND GARDENS.  
3. The most common venues nearby are Greek, Korean and Italian restaurants, bars, bagel shops and parks. The number of sales is low relative to the available properties.  
4. The likely reason is the high mean, median and highest prices of the properties.  
5. The area is well served by food and restaurants, but other amenities such as hospitals and schools appear less frequently.

Cluster 1:

1. The average and median prices of Cluster 1 neighborhoods are 406660.4368 and 406972.1304 respectively.  
2. The cluster contains the following places: CAMBRIA HEIGHTS, JAMAICA, LAURELTON, ROSEDALE, SOUTH JAMAICA, SOUTH OZONE PARK, SPRINGFIELD GARDENS and ST. ALBANS.  
3. The most common venues nearby are pharmacies, supermarkets, restaurants, bars, parks and discount stores. These properties are the best to buy: the average and median prices are very reasonable, and everyday essentials are close by.  

Cluster 2:

1. The average and median prices of Cluster 2 neighborhoods are 474991.3333 and 458104.6 respectively.  
2. The cluster contains the following places: ARVERNE, BELLE HARBOR, FAR ROCKAWAY, HAMMELS, NEPONSIT and ROCKAWAY PARK.  
3. The most common venues nearby are beaches, pizza places, banks, bus stops and all kinds of food corners.  
4. Given their average and median prices, these should be the second most preferred properties after those in Cluster 1.

Cluster 3:

1. The average and median prices of Cluster 3 neighborhoods are 587375.4 and 587155.0667 respectively.  
2. The cluster contains the following places: CORONA, EAST ELMHURST, ELMHURST, FLUSHING-NORTH and JACKSON HEIGHTS.  
3. The most common venues nearby are a tennis stadium, restaurants and grocery stores.  
4. The area will particularly suit tennis players. Its mean and median prices are also reasonable, though somewhat higher than those of Clusters 1 and 2.

Cluster 4:

1. The average and median prices of Cluster 4 neighborhoods are 502320.1765 and 502016.9216 respectively.  
2. The cluster contains the following places: AIRPORT LA GUARDIA, BELLEROSE, BROAD CHANNEL, COLLEGE POINT, FLORAL PARK, FRESH MEADOWS, GLEN OAKS, HOLLIS, HOLLIS HILLS, HOLLISWOOD, HOWARD BEACH, JAMAICA BAY, JAMAICA ESTATES, JAMAICA HILLS, LONG ISLAND CITY, OZONE PARK, QUEENS VILLAGE, SO. JAMAICA-BAISLEY PARK, SUNNYSIDE and WOODHAVEN.  
3. The most common venues nearby are banks, parks, food corners, grocery stores, and gyms and fitness centers.  
4. This cluster has a large number of properties with all basic amenities nearby. Its lowest average price is also low, so good deals can be found in this cluster.
5. The average and median prices are a further reason to select a property from this cluster.


Cluster 5:

1. The average and median prices of Cluster 5 neighborhoods are 812200.5 and 793750 respectively.  
2. The cluster contains the following places: BEECHHURST and WHITESTONE.  
3. The most common venues nearby are parks, food corners, a golf course and donut shops.
4. The cluster is more expensive than any other cluster, and it also has fewer basic amenities than the others.
5. Very few properties are available to buy, and the cluster's average price is above that of every other cluster.

Cluster 6:

1. The average and median prices of Cluster 6 neighborhoods are 576087.5313 and 568167.5313 respectively.  
2. The cluster contains the following places: BRIARWOOD, FOREST HILLS, GLENDALE, HILLCREST, KEW GARDENS, MASPETH, MIDDLE VILLAGE, REGO PARK, RICHMOND HILL, RIDGEWOOD and WOODSIDE.  
3. The most common venues nearby are parks, boxing gyms, grocery stores, bagel shops, cosmetics shops, bars, vegetarian / vegan restaurants, yoga studios and ice cream shops.
4. This cluster is also a good choice, as it covers all the basic items required for daily needs.
5. The average and median prices are slightly above what we would call reasonable, but the cluster has more properties with appropriate amenities.
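The per-cluster figures above appear to be the mean of the `AVERAGE SALE PRICE` and `MEDIAN SALE PRICE` columns within each cluster. The following is a minimal sketch of that calculation, assuming a frame with the same column names as `newyork_merged`. The `sample` frame is an illustration only and holds just the four rows with `Cluster Labels == 5` shown in the outputs above.

```python
import pandas as pd

# The four rows with Cluster Labels == 5, copied from the notebook output.
# `sample` is an illustrative stand-in for the full newyork_merged frame.
sample = pd.DataFrame({
    "Cluster Labels": [5, 5, 5, 5],
    "AVERAGE SALE PRICE": [679765.0, 1235000.0, 645986.0, 688051.0],
    "MEDIAN SALE PRICE": [650000.0, 1235000.0, 600000.0, 690000.0],
})

# Mean of each price column per cluster label.
cluster_summary = (
    sample.groupby("Cluster Labels")[["AVERAGE SALE PRICE", "MEDIAN SALE PRICE"]]
    .mean()
)
print(cluster_summary)
```

For these four rows the result is 812200.5 and 793750.0, which matches the figures quoted for Cluster 5. Applying the same group-by to the full merged frame should give one summary row per cluster.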

We may analyze our results according to the seven clusters (0 to 6) we have produced. All clusters offer a good range of facilities and amenities, but they differ in price.

Clusters 3, 4 and 6 have average and median prices close to one another, and their common venues are also similar.

Cluster 1: the average and median prices are lower than in the other clusters.

Clusters 0 and 5: the average and median prices are higher than in the other clusters.
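Sorting the quoted figures makes the price ordering of the clusters explicit. The sketch below uses only the average and median values listed per cluster in the Results. The `cluster_prices` dictionary is a hypothetical helper, not part of the original notebook.

```python
# (average, median) sale price per cluster, as quoted in the Results section.
cluster_prices = {
    0: (718522.835, 715406.6471),
    1: (406660.4368, 406972.1304),
    2: (474991.3333, 458104.6),
    3: (587375.4, 587155.0667),
    4: (502320.1765, 502016.9216),
    5: (812200.5, 793750.0),
    6: (576087.5313, 568167.5313),
}

# Rank clusters from cheapest to most expensive by average price.
ranked = sorted(cluster_prices, key=lambda c: cluster_prices[c][0])
for cluster in ranked:
    avg, med = cluster_prices[cluster]
    print(f"Cluster {cluster}: average {avg:,.0f}, median {med:,.0f}")
```

With these figures the order from cheapest to most expensive is 1, 2, 4, 6, 3, 0, 5.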


## Conclusion

To conclude, we restate the problem scenario.

The problem scenario is to suggest suitable real estate in New York to homebuyer clients using machine learning algorithms.

As a result, the business problem we posed is:

How could we provide suggestions to homebuyer clients for purchasing suitable real estate in New York in this depreciating economy?

To solve this business problem, we clustered New York neighborhoods in order to recommend venues and the current average price of real estate where homebuyers can make a real estate investment. We also recommend profitable venues, i.e. pharmacies, restaurants, hospitals and grocery stores.

First, we gathered data from the Department of Finance (DOF), which maintains records for all property sales in New York City, including sales of family homes in each borough (https://data.cityofnewyork.us/api/views/948r-3ads/rows.csv?accessType=DOWNLOAD).

This list includes all sales of 1-, 2-, and 3-family homes from January 1, 2009 to December 31, 2009 whose sale price is equal to or more than $150,000. The Building Class Category for Sales is based on the Building Class at the time of the sale.

To explore and target recommended locations according to the presence of amenities and essential facilities, we accessed venue data through the FourSquare API and arranged it as a dataframe for visualization. By merging the DOF sale-price data on New York properties with FourSquare data on the amenities and essential facilities surrounding them, we are able to recommend profitable real estate investments.
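The merge described above can be sketched as a many-to-one join. This is a minimal sketch under stated assumptions, not the notebook's actual code: the `NEIGHBORHOOD` key, the frame names `prices` and `venues`, and all values are hypothetical and chosen only to mirror the column layout of the outputs.

```python
import pandas as pd

# Hypothetical price table: one row per neighborhood and home type.
prices = pd.DataFrame({
    "NEIGHBORHOOD": ["EXAMPLE A", "EXAMPLE A", "EXAMPLE B"],
    "TYPE OF HOME": [0, 1, 0],
    "AVERAGE SALE PRICE": [450000.0, 520000.0, 610000.0],
})

# Hypothetical venue table: one row per neighborhood, with its cluster label.
venues = pd.DataFrame({
    "NEIGHBORHOOD": ["EXAMPLE A", "EXAMPLE B"],
    "Cluster Labels": [4, 6],
    "1st Most Common Venue": ["Pizza Place", "Bakery"],
})

# Attach venue and cluster information to every price row.
# validate raises an error if a neighborhood appears twice in `venues`.
merged = prices.merge(venues, on="NEIGHBORHOOD", how="left", validate="many_to_one")
print(merged)
```

A left join keeps every price row, so each neighborhood's venue columns repeat across its home types, as they do in the cluster outputs shown earlier.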


Finally, we may analyze our results according to the seven clusters (0 to 6) we have produced. All clusters offer a good range of facilities and amenities, but they differ in price.

Clusters 3, 4 and 6 have average and median prices close to one another, and their common venues are also similar.

Cluster 1: the average and median prices are lower than in the other clusters.

Clusters 0 and 5: the average and median prices are higher than in the other clusters.