# BATTLE OF NEIGHBORHOOD -- Report (Week 1 and 2)

## Business Problem 

#### Background

Finding a suitable property in a big city is a difficult job. Buyers rarely have enough time to search for property, and real estate agents sometimes withhold information from clients, which later becomes a problem for them.
Brooklyn's housing market also has problems such as hidden price drops, record-low sales, a homebuilder exodus, and tax hikes aimed at overseas buyers.

Anyone looking to buy a new property for investment purposes can benefit from this project.

#### Business Problem

How can we help homebuyers purchase suitable real estate in Brooklyn while keeping their basic needs in mind?

To solve this problem, we will cluster the properties based on the basic amenities and essential facilities around them (e.g. elementary schools, high schools, supermarkets and restaurants), together with each property's sale price and age.

## Data Collection

Data on Brooklyn property sales and prices was extracted from the New York City government site (http://www1.nyc.gov). The file's attributes include Borough, Neighborhood, Building Class Category, Address, Sale Price and Year Built.

Link: https://www1.nyc.gov/assets/finance/downloads/pdf/rolling_sales/rollingsales_brooklyn.xls

## Methodology

The Methodology section comprises four stages:

1. Explore data
2. Data preparation and preprocessing 
3. Modeling
4. Analyzing Clusters

### 1. Explore Data

In [4]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
import locale
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas.io.json.json_normalize is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans
import datetime as dt

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


#### Data Loading and Cleaning 

In this section, the property sales data for Brooklyn is loaded and cleaned. The data contains attributes such as sale price, year built (used to derive the building's age) and building class category.

In [5]:
proj_data=pd.read_excel('rollingsales_brooklyn.xls')
proj_data.head()

| | BOROUGH | NEIGHBORHOOD | BUILDING CLASS CATEGORY | BLOCK | LOT | ADDRESS | ZIP CODE | RESIDENTIAL UNITS | COMMERCIAL UNITS | TOTAL UNITS | LAND SQUARE FEET | GROSS SQUARE FEET | YEAR BUILT | SALE PRICE | SALE DATE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3.0 | BATH BEACH | 01 ONE FAMILY DWELLINGS | 6360.0 | 23.0 | 8645 15TH AVENUE | 11228.0 | 1.0 | 0.0 | 1.0 | 1547 | 1428.0 | 1930.0 | 750000.0 | 2018-05-18 |
| 1 | 3.0 | BATH BEACH | 01 ONE FAMILY DWELLINGS | 6363.0 | 48.0 | 12 BAY 13TH STREET | 11214.0 | 1.0 | 0.0 | 1.0 | 3142 | 3200.0 | 1999.0 | 0.0 | 2019-02-11 |
| 2 | 3.0 | BATH BEACH | 01 ONE FAMILY DWELLINGS | 6363.0 | 48.0 | 8658 BAY 16TH STREET | 11214.0 | 1.0 | 0.0 | 1.0 | 3142 | 3200.0 | 1999.0 | 0.0 | 2019-02-27 |
| 3 | 3.0 | BATH BEACH | 01 ONE FAMILY DWELLINGS | 6366.0 | 69.0 | 8664 BAY 16TH STREET | 11214.0 | 1.0 | 0.0 | 1.0 | 4833 | 1724.0 | 1930.0 | 0.0 | 2018-10-25 |
| 4 | 3.0 | BATH BEACH | 01 ONE FAMILY DWELLINGS | 6366.0 | 72.0 | 1728 86TH STREET | 11214.0 | 1.0 | 0.0 | 1.0 | 4833 | 2300.0 | 1925.0 | 1720000.0 | 2018-12-12 |


### 2. Data preparation and preprocessing

In [6]:
proj_data.dropna(subset=['ADDRESS','SALE DATE'],inplace=True,axis=0)
proj_data['Total Years']=proj_data['SALE DATE'].dt.year.astype('int')-proj_data['YEAR BUILT']
proj_data.drop(inplace=True,columns=['SALE DATE','YEAR BUILT','GROSS SQUARE FEET'])
proj_data=proj_data[proj_data['SALE PRICE']!=0.0].copy() # .copy() avoids SettingWithCopyWarning on the next assignment
proj_data['BOROUGH']='Brooklyn'
proj_data.reset_index(drop=True).head(10)

| | BOROUGH | NEIGHBORHOOD | BUILDING CLASS CATEGORY | BLOCK | LOT | ADDRESS | ZIP CODE | RESIDENTIAL UNITS | COMMERCIAL UNITS | TOTAL UNITS | LAND SQUARE FEET | SALE PRICE | Total Years |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Brooklyn | BATH BEACH | 01 ONE FAMILY DWELLINGS | 6360.0 | 23.0 | 8645 15TH AVENUE | 11228.0 | 1.0 | 0.0 | 1.0 | 1547 | 750000.0 | 88.0 |
| 1 | Brooklyn | BATH BEACH | 01 ONE FAMILY DWELLINGS | 6366.0 | 72.0 | 1728 86TH STREET | 11214.0 | 1.0 | 0.0 | 1.0 | 4833 | 1720000.0 | 93.0 |
| 2 | Brooklyn | BATH BEACH | 01 ONE FAMILY DWELLINGS | 6367.0 | 41.0 | 1730 86TH STREET | 11214.0 | 1.0 | 1.0 | 2.0 | 1342 | 1380000.0 | 87.0 |
| 3 | Brooklyn | BATH BEACH | 01 ONE FAMILY DWELLINGS | 6367.0 | 42.0 | 1732 86TH STREET | 11214.0 | 1.0 | 1.0 | 2.0 | 1342 | 1630000.0 | 93.0 |
| 4 | Brooklyn | BATH BEACH | 01 ONE FAMILY DWELLINGS | 6367.0 | 43.0 | 8672 BAY PARKWAY | 11214.0 | 1.0 | 1.0 | 2.0 | 1342 | 1630000.0 | 93.0 |
| 5 | Brooklyn | BATH BEACH | 01 ONE FAMILY DWELLINGS | 6380.0 | 73.0 | 132 BAY 13 STREET | 11214.0 | 1.0 | 1.0 | 2.0 | 1740 | 968000.0 | 58.0 |
| 6 | Brooklyn | BATH BEACH | 01 ONE FAMILY DWELLINGS | 6397.0 | 63.0 | 1725 BATH AVENUE | 11214.0 | 1.0 | 0.0 | 1.0 | 2708 | 1150000.0 | 109.0 |
| 7 | Brooklyn | BATH BEACH | 01 ONE FAMILY DWELLINGS | 6401.0 | 10.0 | 1763 BATH AVENUE | 11214.0 | 1.0 | 1.0 | 2.0 | 1160 | 980000.0 | 99.0 |
| 8 | Brooklyn | BATH BEACH | 01 ONE FAMILY DWELLINGS | 6402.0 | 4.0 | 1906 BENSON AVENUE | 11214.0 | 1.0 | 1.0 | 2.0 | 980 | 10.0 | 91.0 |
| 9 | Brooklyn | BATH BEACH | 01 ONE FAMILY DWELLINGS | 6460.0 | 59.0 | 286 BAY 10TH STREET | 11228.0 | 1.0 | 0.0 | 1.0 | 3867 | 1660000.0 | 93.0 |
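The cell above applies three cleaning rules: drop rows with no ADDRESS or SALE DATE, set Total Years to the sale year minus YEAR BUILT, and drop zero-price sales. They can be checked on a tiny frame. This is a sketch on hypothetical rows: the addresses A to D and all values are made up.

```python
import pandas as pd

# Hypothetical rows using the same column names as the rolling-sales file
raw = pd.DataFrame({
    'ADDRESS': ['A', 'B', None, 'D'],
    'YEAR BUILT': [1930.0, 1999.0, 1950.0, 1927.0],
    'SALE PRICE': [750000.0, 0.0, 500000.0, 10.0],
    'SALE DATE': pd.to_datetime(['2018-05-18', '2019-02-11', '2018-07-01', '2018-09-30']),
})

# rule 1: drop rows without an address or sale date
clean = raw.dropna(subset=['ADDRESS', 'SALE DATE'])
# rule 2: building age at the time of sale
clean = clean.assign(**{'Total Years': clean['SALE DATE'].dt.year - clean['YEAR BUILT']})
# rule 3: drop $0 transfers
clean = clean[clean['SALE PRICE'] != 0.0]

print(clean[['ADDRESS', 'SALE PRICE', 'Total Years']])
```

The filter removes row B ($0) and the row without an address. Row D's nominal 10.0 price is kept, because the filter only excludes prices of exactly zero. The same happens in the real data at index 8 of the table above.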


The following table shows the number of sales records in each Brooklyn neighborhood.

In [7]:
proj_data.groupby(by=['NEIGHBORHOOD']).count()

| NEIGHBORHOOD | BOROUGH | BUILDING CLASS CATEGORY | BLOCK | LOT | ADDRESS | ZIP CODE | RESIDENTIAL UNITS | COMMERCIAL UNITS | TOTAL UNITS | LAND SQUARE FEET | SALE PRICE | Total Years |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BATH BEACH | 208 | 208 | 208 | 208 | 208 | 208 | 208 | 208 | 208 | 208 | 208 | 208 |
| BAY RIDGE | 565 | 565 | 565 | 565 | 565 | 565 | 565 | 565 | 565 | 565 | 565 | 565 |
| BEDFORD STUYVESANT | 853 | 853 | 853 | 853 | 853 | 853 | 853 | 853 | 853 | 853 | 853 | 849 |
| BENSONHURST | 284 | 284 | 284 | 284 | 284 | 284 | 284 | 284 | 284 | 284 | 284 | 284 |
| BERGEN BEACH | 125 | 125 | 125 | 125 | 125 | 125 | 125 | 125 | 125 | 125 | 125 | 125 |
| BOERUM HILL | 152 | 152 | 152 | 152 | 152 | 152 | 152 | 152 | 152 | 152 | 152 | 152 |
| BOROUGH PARK | 545 | 545 | 545 | 545 | 545 | 544 | 543 | 543 | 543 | 543 | 545 | 540 |
| BRIGHTON BEACH | 232 | 232 | 232 | 232 | 232 | 232 | 230 | 230 | 230 | 230 | 232 | 232 |
| BROOKLYN HEIGHTS | 291 | 291 | 291 | 291 | 291 | 291 | 291 | 291 | 291 | 291 | 291 | 291 |
| BROWNSVILLE | 126 | 126 | 126 | 126 | 126 | 126 | 126 | 126 | 126 | 126 | 126 | 126 |
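`count()` reports the number of non-missing values in each column, not the number of rows. That is why BEDFORD STUYVESANT shows 853 in most columns but 849 under Total Years (most likely four sales whose YEAR BUILT is missing). When only the number of sales per neighborhood is needed, `size()` is the more direct call. A minimal sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical sales; one row in neighborhood X has no building age
sales = pd.DataFrame({
    'NEIGHBORHOOD': ['X', 'X', 'X', 'Y'],
    'SALE PRICE': [1.0e6, 2.0e6, 3.0e6, 4.0e6],
    'Total Years': [88.0, np.nan, 93.0, 50.0],
})

# non-missing values per column and group: columns can disagree
per_column = sales.groupby('NEIGHBORHOOD').count()
# rows per group, regardless of missing values
per_group = sales.groupby('NEIGHBORHOOD').size()

print(per_column)
print(per_group)
```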


#### We use Brooklyn Heights as the neighborhood for this project.

In [8]:
brook_height=proj_data[proj_data['NEIGHBORHOOD']=='BROOKLYN HEIGHTS']
brook_height.head()

| | BOROUGH | NEIGHBORHOOD | BUILDING CLASS CATEGORY | BLOCK | LOT | ADDRESS | ZIP CODE | RESIDENTIAL UNITS | COMMERCIAL UNITS | TOTAL UNITS | LAND SQUARE FEET | SALE PRICE | Total Years |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4927 | Brooklyn | BROOKLYN HEIGHTS | 01 ONE FAMILY DWELLINGS | 208.0 | 316.0 | 1121 MADISON STREET | 11201.0 | 1.0 | 0.0 | 1.0 | 2492 | 11750000.0 | 162.0 |
| 4929 | Brooklyn | BROOKLYN HEIGHTS | 01 ONE FAMILY DWELLINGS | 215.0 | 19.0 | 1095 MADISON STREET | 11201.0 | 1.0 | 0.0 | 1.0 | 1358 | 3600000.0 | 119.0 |
| 4931 | Brooklyn | BROOKLYN HEIGHTS | 01 ONE FAMILY DWELLINGS | 220.0 | 5.0 | 1249 MADISON STREET | 11201.0 | 1.0 | 0.0 | 1.0 | 1263 | 2530000.0 | 120.0 |
| 4932 | Brooklyn | BROOKLYN HEIGHTS | 01 ONE FAMILY DWELLINGS | 225.0 | 10.0 | 460 CENTRAL AVENUE | 11201.0 | 1.0 | 0.0 | 1.0 | 1017 | 4350000.0 | 165.0 |
| 4933 | Brooklyn | BROOKLYN HEIGHTS | 01 ONE FAMILY DWELLINGS | 252.0 | 42.0 | 1245 PUTNAM AVENUE | 11201.0 | 1.0 | 0.0 | 1.0 | 1449 | 5000000.0 | 108.0 |


#### The dataset does not include the buildings' latitude and longitude, so the following code fetches each building's coordinates with the geopy Nominatim geocoder.
Limitation: when a large set of addresses is geocoded, Nominatim starts rejecting requests after a few minutes with the error '429 TOO MANY REQUESTS'.
This is the main reason for choosing only one neighborhood.
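Nominatim's usage policy allows at most one request per second, so spacing the calls out is the usual way to avoid the 429 errors. geopy ships a helper for this, `geopy.extra.rate_limiter.RateLimiter` (e.g. `RateLimiter(geolocator.geocode, min_delay_seconds=1)`). The idea behind it can be sketched in plain Python. `throttled` is a hypothetical helper, not part of geopy. It returns None after the last failed attempt, so the caller can fall back to NaN as the geocoding loop does.

```python
import time

def throttled(func, min_delay=1.0, max_retries=2, retry_on=(Exception,)):
    """Wrap func so that consecutive calls start at least min_delay seconds
    apart, and failed calls (exceptions in retry_on) are retried up to
    max_retries times. Returns None if every attempt fails."""
    last_start = None  # start time of the previous attempt

    def wrapper(*args, **kwargs):
        nonlocal last_start
        for attempt in range(max_retries + 1):
            # wait until min_delay seconds have passed since the last attempt
            if last_start is not None:
                wait = min_delay - (time.monotonic() - last_start)
                if wait > 0:
                    time.sleep(wait)
            last_start = time.monotonic()
            try:
                return func(*args, **kwargs)
            except retry_on:
                if attempt == max_retries:
                    return None
        return None

    return wrapper
```

In the notebook it could wrap the geocoder as `geocode = throttled(geolocator.geocode, min_delay=1.0, retry_on=(GeopyError,))`, with `GeopyError` imported from `geopy.exc`. At one request per second, the 291 Brooklyn Heights sales would take roughly five minutes.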

In [9]:
from geopy.exc import GeopyError

# one geocoder instance is enough; there is no need to create it on every iteration
geolocator = Nominatim(user_agent="ny_explorer", timeout=3)

latitude=[]
longitude=[]
for i, (addr, neigh) in enumerate(zip(brook_height['ADDRESS'], brook_height['NEIGHBORHOOD']), start=1):
    address = '{}, {}, Brooklyn, NY'.format(addr, neigh)
    try:
        location = geolocator.geocode(address)
    except GeopyError:
        # timeouts, '429 TOO MANY REQUESTS' and other service errors
        location = None
    if location is None:
        # the address could not be resolved
        latitude.append(np.nan)
        longitude.append(np.nan)
    else:
        latitude.append(location.latitude)
        longitude.append(location.longitude)
    if i == 50:
        print(latitude) # progress check after the first 50 addresses

# assign() returns a new DataFrame, so no SettingWithCopyWarning is raised
brook_height = brook_height.assign(latitude=latitude, longitude=longitude)
brook_height.head()

[40.6989204, 40.6989204, 40.6989204, nan, 40.682587, 40.682587, 40.6989204, nan, 40.682587, 40.6978252, 40.682587, 40.682587, 40.6978252, 40.69584275, 40.6842279, 40.6846267, 40.6842279, 40.6846267, nan, nan, 40.68992665, 40.68967915, 40.6913148, 40.6949636, 40.6857874285714, 40.6886903, 40.68384, 40.68384, 40.692549, 40.68384, 40.6955489591837, 40.695540122449, 40.692549, 40.692549, 40.6945351, nan, 40.6954738469388, 40.6955091938776, 40.6955180306122, 40.6955622142857, 40.6945351, 40.6945351, nan, nan, 40.6945351, 40.6945351, 40.6945351, nan, nan, 40.6945351]


| | BOROUGH | NEIGHBORHOOD | BUILDING CLASS CATEGORY | BLOCK | LOT | ADDRESS | ZIP CODE | RESIDENTIAL UNITS | COMMERCIAL UNITS | TOTAL UNITS | LAND SQUARE FEET | SALE PRICE | Total Years | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4927 | Brooklyn | BROOKLYN HEIGHTS | 01 ONE FAMILY DWELLINGS | 208.0 | 316.0 | 1121 MADISON STREET | 11201.0 | 1.0 | 0.0 | 1.0 | 2492 | 11750000.0 | 162.0 | 40.69892 | -73.909293 |
| 4929 | Brooklyn | BROOKLYN HEIGHTS | 01 ONE FAMILY DWELLINGS | 215.0 | 19.0 | 1095 MADISON STREET | 11201.0 | 1.0 | 0.0 | 1.0 | 1358 | 3600000.0 | 119.0 | 40.69892 | -73.909293 |
| 4931 | Brooklyn | BROOKLYN HEIGHTS | 01 ONE FAMILY DWELLINGS | 220.0 | 5.0 | 1249 MADISON STREET | 11201.0 | 1.0 | 0.0 | 1.0 | 1263 | 2530000.0 | 120.0 | 40.69892 | -73.909293 |
| 4932 | Brooklyn | BROOKLYN HEIGHTS | 01 ONE FAMILY DWELLINGS | 225.0 | 10.0 | 460 CENTRAL AVENUE | 11201.0 | 1.0 | 0.0 | 1.0 | 1017 | 4350000.0 | 165.0 | NaN | NaN |
| 4933 | Brooklyn | BROOKLYN HEIGHTS | 01 ONE FAMILY DWELLINGS | 252.0 | 42.0 | 1245 PUTNAM AVENUE | 11201.0 | 1.0 | 0.0 | 1.0 | 1449 | 5000000.0 | 108.0 | 40.682587 | -73.962611 |


In [10]:
brook_prop=brook_height[['BUILDING CLASS CATEGORY','ADDRESS','SALE PRICE','Total Years','latitude','longitude']].copy()
brook_prop.head()

| | BUILDING CLASS CATEGORY | ADDRESS | SALE PRICE | Total Years | latitude | longitude |
|---|---|---|---|---|---|---|
| 4927 | 01 ONE FAMILY DWELLINGS | 1121 MADISON STREET | 11750000.0 | 162.0 | 40.69892 | -73.909293 |
| 4929 | 01 ONE FAMILY DWELLINGS | 1095 MADISON STREET | 3600000.0 | 119.0 | 40.69892 | -73.909293 |
| 4931 | 01 ONE FAMILY DWELLINGS | 1249 MADISON STREET | 2530000.0 | 120.0 | 40.69892 | -73.909293 |
| 4932 | 01 ONE FAMILY DWELLINGS | 460 CENTRAL AVENUE | 4350000.0 | 165.0 | NaN | NaN |
| 4933 | 01 ONE FAMILY DWELLINGS | 1245 PUTNAM AVENUE | 5000000.0 | 108.0 | 40.682587 | -73.962611 |


In [11]:
brook_prop.dropna(subset=['latitude'],inplace=True,axis=0)

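### After retrieving the coordinates of the buildings, I plotted them on the map.
Note: several buildings share the same coordinates even though their addresses differ, most likely because the geocoder matched only the street rather than the exact address.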
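The shared coordinates can be listed by grouping on latitude and longitude and counting distinct addresses. The sketch below runs on a hypothetical frame with the same column names as brook_prop. The addresses A to D and the coordinates are made up.

```python
import pandas as pd

# Hypothetical geocoded buildings: A and B resolved to the same point
buildings = pd.DataFrame({
    'ADDRESS': ['A', 'B', 'C', 'D'],
    'latitude': [40.70, 40.70, 40.68, 40.69],
    'longitude': [-73.91, -73.91, -73.96, -73.93],
})

# distinct addresses at each coordinate pair; keep only the shared points
per_point = buildings.groupby(['latitude', 'longitude'])['ADDRESS'].nunique()
shared_points = per_point[per_point > 1]

# every building whose coordinates also belong to another building
stacked = buildings[buildings.duplicated(subset=['latitude', 'longitude'], keep=False)]
print(shared_points)
print(stacked['ADDRESS'].tolist())
```

Replacing `buildings` with `brook_prop` gives the counts for the real data. Markers at such points are drawn on top of each other on the map.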

In [12]:
address = 'Brooklyn, New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitudes = location.latitude
longitudes = location.longitude

print('The geographical coordinates of Brooklyn, New York City are {}, {}.'.format(latitudes, longitudes))

The geographical coordinates of Brooklyn, New York City are 40.6501038, -73.9495823.


In [84]:
map_newyork = folium.Map(location=[latitudes, longitudes], zoom_start=10)
for lat, lng, BC_cat, address,price,total_yr in zip(brook_prop['latitude'], brook_prop['longitude'], 
                                                   brook_prop['BUILDING CLASS CATEGORY'], brook_prop['ADDRESS'],
                                                   brook_prop['SALE PRICE'],brook_prop['Total Years']):
    label = 'Building Cat: {},<br>Address: {},<br>Price: {},<br>Total Year: {}'.format(BC_cat,address, price ,total_yr)
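    # parse_html is a Popup option (False, the default, renders the <br> tags as HTML), not a CircleMarker one
    label = folium.Popup(label, parse_html=False)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='black',
        fill=True,
        fill_color='black',
        fill_opacity=0.7).add_to(map_newyork)
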
# Save to html
map_newyork.save('Plot_map.html')

#### Analyzing Surroundings using FOURSQUARE

In [14]:
import os
# read the credentials from environment variables instead of hard-coding them in the notebook
# (FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are assumed variable names; set them before running)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

# do not print the secret itself
print('Foursquare credentials loaded.')


#### Defining a function that gets all venues within a 1 km radius of each building

In [15]:
def getNearbyVenues(names, address, latitudes, longitudes, radius=1000):
    LIMIT=100 # the v2 explore endpoint returns at most 100 venues per request
    url = 'https://api.foursquare.com/v2/venues/explore'
    venues_list=[]
    for name, addr, lat, lng in zip(names, address, latitudes, longitudes):
        # query parameters for the API request (requests URL-encodes them)
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT}

        # make the GET request and fail loudly on HTTP errors (e.g. quota exceeded)
        response = requests.get(url, params=params)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            addr,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['distance'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Building Class Name','Address', 
                  'Building Class Latitude', 
                  'Building Class Longitude', 
                  'Venue',
                  'Distance',
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
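Each request is a plain HTTP GET with six query parameters: client_id, client_secret, v, ll, radius and limit. The exact URL can be inspected offline, without calling the API, by building it with the standard library. `explore_url` is a hypothetical helper and the credentials are placeholders. `radius` is in meters, so 1000 is the 1 km radius. `limit` defaults to 100, the documented maximum for the v2 explore endpoint, which is consistent with the 100 venues per address counted further down.

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def explore_url(client_id, client_secret, version, lat, lng, radius=1000, limit=100):
    """Build the explore request URL for one building (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # "latitude,longitude" of the search centre
        'radius': radius,                # search radius in meters
        'limit': limit,                  # maximum number of venues to return
    }
    return EXPLORE_URL + '?' + urlencode(params)

# placeholder credentials; the coordinates are those of 1121 MADISON STREET
print(explore_url('MY_ID', 'MY_SECRET', '20180605', 40.69892, -73.909293))
```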

In [16]:
brooklyn_venues = getNearbyVenues(names=brook_prop['BUILDING CLASS CATEGORY'],
                                   address=brook_prop['ADDRESS'],
                                   latitudes=brook_prop['latitude'],
                                   longitudes=brook_prop['longitude']
                                  )
brooklyn_venues.shape

(17636, 9)

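As you can see, the query returned 17,636 venue records in total. There is one row per (building, venue) pair, so a venue near several buildings appears once for each of them.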
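Because every building is queried separately, the same venue can appear in many of those 17,636 rows. One way to count distinct venues is to drop duplicates on name and coordinates. Here it is sketched on a hypothetical four-row frame with the same column names:

```python
import pandas as pd

# Hypothetical result: "Bar 1" is within 1 km of both buildings A and B
venues = pd.DataFrame({
    'Address': ['A', 'A', 'B', 'B'],
    'Venue': ['Cafe 1', 'Bar 1', 'Bar 1', 'Park 1'],
    'Venue Latitude': [40.700, 40.701, 40.701, 40.705],
    'Venue Longitude': [-73.906, -73.908, -73.908, -73.910],
})

# a venue is identified by its name plus its coordinates
unique_venues = venues.drop_duplicates(subset=['Venue', 'Venue Latitude', 'Venue Longitude'])
print(len(venues), len(unique_venues))
```

Applied to `brooklyn_venues`, the same call gives the number of distinct venues around the neighborhood.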

In [17]:
brooklyn_venues.head()

| | Building Class Name | Address | Building Class Latitude | Building Class Longitude | Venue | Distance | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 01 ONE FAMILY DWELLINGS | 1121 MADISON STREET | 40.69892 | -73.909293 | Ltauha | 258 | 40.70014 | -73.906691 | American Restaurant |
| 1 | 01 ONE FAMILY DWELLINGS | 1121 MADISON STREET | 40.69892 | -73.909293 | Sabor Mexicano Food Cart | 217 | 40.699794 | -73.911598 | Food Truck |
| 2 | 01 ONE FAMILY DWELLINGS | 1121 MADISON STREET | 40.69892 | -73.909293 | The Bad Old Days | 273 | 40.701195 | -73.908074 | Cocktail Bar |
| 3 | 01 ONE FAMILY DWELLINGS | 1121 MADISON STREET | 40.69892 | -73.909293 | L'imprimerie | 301 | 40.69927 | -73.912834 | Bakery |
| 4 | 01 ONE FAMILY DWELLINGS | 1121 MADISON STREET | 40.69892 | -73.909293 | Irving Gourmet Deli | 299 | 40.697935 | -73.912598 | Sandwich Place |
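Foursquare reports Distance in meters from the query point (the building) to the venue. Those values, and the 1 km radius, can be sanity-checked by recomputing the distance from the two coordinate pairs with the haversine formula. `haversine_m` is an illustrative helper that assumes a spherical Earth with a mean radius of 6,371 km. For the first two rows above it gives about 258 m and 217 m, matching the reported distances.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# building at 1121 MADISON STREET and the first two venues in the table above
building = (40.69892, -73.909293)
d_ltauha = haversine_m(*building, 40.70014, -73.906691)
d_cart = haversine_m(*building, 40.699794, -73.911598)
print(round(d_ltauha), round(d_cart))
```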


In [18]:
brooklyn_venues.groupby(by=['Address']).count().head()

| Address | Building Class Name | Building Class Latitude | Building Class Longitude | Venue | Distance | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|
| 1002 HALSEY STREET | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| 101 CORNELIA STREET | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| 101 SUYDAM STREET | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| 1012 DECATUR STREET | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| 1014 DECATUR STREET | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |


### Analyzing Venues

Now we convert the categorical venue data to numerical data using one-hot encoding.

In [19]:
# dtype=int keeps 0/1 values (pandas >= 2.0 would otherwise return True/False)
brooklyn_onehot = pd.get_dummies(brooklyn_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add the address column back to the dataframe
brooklyn_onehot['Address'] = brooklyn_venues['Address']

# move the address column to the first position
fixed_columns = ['Address'] + [col for col in brooklyn_onehot.columns if col != 'Address']
brooklyn_onehot = brooklyn_onehot[fixed_columns]

brooklyn_onehot.head()

Unnamed: 0,Address,African Restaurant,American Restaurant,Arcade,Arepa Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Bed & Breakfast,Beer Bar,Beer Garden,Bistro,Bookstore,Botanical Garden,Boutique,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Bus Station,Bus Stop,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Comfort Food Restaurant,Convenience Store,Cosmetics Shop,Coworking Space,Cuban Restaurant,Cupcake Shop,Cycle Studio,Deli / Bodega,Diner,Discount Store,Dive Bar,Dog Run,Donut Shop,Eastern European Restaurant,Egyptian Restaurant,Ethiopian Restaurant,Event Space,Farmers Market,Fast Food Restaurant,Field,Film Studio,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Health & Beauty Service,Himalayan Restaurant,Historic Site,History Museum,Hot Dog Joint,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Latin American Restaurant,Laundromat,Light Rail Station,Liquor Store,Lounge,Mac & Cheese Joint,Market,Massage Studio,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Monument / Landmark,Museum,Music Venue,Nail Salon,New American Restaurant,Nightclub,Noodle House,Optical Shop,Other Nightlife,Pakistani Restaurant,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Polish Restaurant,Pool,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Residential Building (Apartment / 
Condo),Restaurant,Roof Deck,Salad Place,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shoe Store,Smoke Shop,Smoothie Shop,Soccer Field,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sports Bar,Street Art,Summer Camp,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Theater,Thrift / Vintage Store,Tourist Information Center,Track,Used Bookstore,Vape Store,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Weight Loss Center,Wine Bar,Wine Shop,Yoga Studio
0,1121 MADISON STREET,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1121 MADISON STREET,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,1121 MADISON STREET,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,1121 MADISON STREET,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,1121 MADISON STREET,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


### Note: SALE PRICE and Total Years are also added to the following table because the K-Means algorithm uses them as features.

In [20]:
brooklyn_onehot_group = brooklyn_onehot.groupby('Address').mean().reset_index()
brooklyn_onehot_group=brooklyn_onehot_group.merge(brook_prop[['SALE PRICE','ADDRESS','Total Years']],left_on='Address',right_on='ADDRESS')
brooklyn_onehot_group.drop(columns=['ADDRESS'],inplace=True)
brooklyn_onehot_group.head()


Unnamed: 0,Address,African Restaurant,American Restaurant,Arcade,Arepa Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Bed & Breakfast,Beer Bar,Beer Garden,Bistro,Bookstore,Botanical Garden,Boutique,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Bus Station,Bus Stop,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Comfort Food Restaurant,Convenience Store,Cosmetics Shop,Coworking Space,Cuban Restaurant,Cupcake Shop,Cycle Studio,Deli / Bodega,Diner,Discount Store,Dive Bar,Dog Run,Donut Shop,Eastern European Restaurant,Egyptian Restaurant,Ethiopian Restaurant,Event Space,Farmers Market,Fast Food Restaurant,Field,Film Studio,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Health & Beauty Service,Himalayan Restaurant,Historic Site,History Museum,Hot Dog Joint,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Latin American Restaurant,Laundromat,Light Rail Station,Liquor Store,Lounge,Mac & Cheese Joint,Market,Massage Studio,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Monument / Landmark,Museum,Music Venue,Nail Salon,New American Restaurant,Nightclub,Noodle House,Optical Shop,Other Nightlife,Pakistani Restaurant,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Polish Restaurant,Pool,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Residential Building (Apartment / 
Condo),Restaurant,Roof Deck,Salad Place,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shoe Store,Smoke Shop,Smoothie Shop,Soccer Field,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sports Bar,Street Art,Summer Camp,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Theater,Thrift / Vintage Store,Tourist Information Center,Track,Used Bookstore,Vape Store,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Weight Loss Center,Wine Bar,Wine Shop,Yoga Studio,SALE PRICE,Total Years
0,1002 HALSEY STREET,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.03,0.0,0.07,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.06,0.03,0.0,0.0,0.01,0.01,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.05,0.01,0.04,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.07,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,14920000.0,119.0
1,101 CORNELIA STREET,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.12,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.02,0.05,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.04,0.02,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.05,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.07,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,970000.0,62.0
2,101 SUYDAM STREET,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.17,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.03,0.05,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.03,985031.0,81.0
3,1012 DECATUR STREET,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.05,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.05,0.02,0.0,0.0,0.01,0.02,0.05,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.04,0.01,0.05,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,618000.0,118.0
4,1014 DECATUR STREET,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.05,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.05,0.02,0.0,0.0,0.01,0.02,0.05,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.04,0.01,0.05,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,2400000.0,118.0
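Each value in the grouped table is the share of an address's venues that fall in a category. For example, 0.07 under Bar for 1002 HALSEY STREET means 7 of its 100 venues are bars. The encoding and averaging can be reproduced on a hypothetical venue list:

```python
import pandas as pd

# Hypothetical venues: address A has four, address B has two
venues = pd.DataFrame({
    'Address': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Bar', 'Bar', 'Coffee Shop', 'Park', 'Bar', 'Park'],
})

# one column per category holding 0/1, named without a prefix as in the notebook
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)
onehot['Address'] = venues['Address']

# the mean of a 0/1 column is the fraction of that address's venues in the category
freq = onehot.groupby('Address').mean()
print(freq)
```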


In [21]:
def most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

### Getting most common venues

In [22]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Address']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # positions after 3rd get the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
brooklyn_venues_sorted = pd.DataFrame(columns=columns)
brooklyn_venues_sorted['Address'] = brooklyn_onehot_group['Address']

for ind in np.arange(brooklyn_onehot_group.shape[0]):
    # iloc[ind, :-2] drops SALE PRICE and Total Years so only venue frequencies are ranked
    brooklyn_venues_sorted.iloc[ind, 1:num_top_venues + 1] = most_common_venues(brooklyn_onehot_group.iloc[ind, :-2], num_top_venues)

brooklyn_venues_sorted.head()

| | Address | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1002 HALSEY STREET | Bar | Pizza Place | Café | Caribbean Restaurant | Southern / Soul Food Restaurant | Deli / Bodega | Discount Store | Sandwich Place | Bakery | Chinese Restaurant |
| 1 | 101 CORNELIA STREET | Bar | Pizza Place | Mexican Restaurant | Bakery | Coffee Shop | Latin American Restaurant | Deli / Bodega | Chinese Restaurant | Nightclub | Brewery |
| 2 | 101 SUYDAM STREET | Bar | Mexican Restaurant | Pizza Place | Coffee Shop | Bakery | Yoga Studio | Cocktail Bar | Gourmet Shop | Italian Restaurant | Chinese Restaurant |
| 3 | 1012 DECATUR STREET | Café | Coffee Shop | Discount Store | Bar | Caribbean Restaurant | Southern / Soul Food Restaurant | Deli / Bodega | Pizza Place | Park | Mexican Restaurant |
| 4 | 1014 DECATUR STREET | Café | Coffee Shop | Discount Store | Bar | Caribbean Restaurant | Southern / Soul Food Restaurant | Deli / Bodega | Pizza Place | Park | Mexican Restaurant |
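The `indicators` list supplies 'st', 'nd' and 'rd' for positions 1 to 3 and falls back to 'th' for everything else. That is correct up to 20, but raising `num_top_venues` past 20 would produce labels such as '21th Most Common Venue'. A small general helper avoids this; `ordinal` is a hypothetical name, not part of the notebook:

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th', 21 -> '21st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 10th to 20th, including the irregular 11th, 12th, 13th
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Address'] + ['{} Most Common Venue'.format(ordinal(k)) for k in range(1, num_top_venues + 1)]
print(columns)
```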


## 3. Modeling

#### Model training 

In [23]:
kclusters = 3

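brooklyn_grouped_clustering = brooklyn_onehot_group.drop(columns='Address')

from sklearn import preprocessing
X=np.asarray(brooklyn_grouped_clustering)
# standardize each feature (zero mean, unit variance) so SALE PRICE does not dominate the distances
X=preprocessing.StandardScaler().fit_transform(X)
# run k-means clustering (n_init=10 keeps the pre-1.4 scikit-learn default)
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(X)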

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 2, 0, 1, 1, 1, 1, 1, 1, 2], dtype=int32)
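`StandardScaler` rescales every feature column to zero mean and unit (population) standard deviation before K-Means computes Euclidean distances. The sketch below reproduces that rule by hand on a three-column slice of the grouped table above: the Bar frequency, SALE PRICE and Total Years of its five addresses. It then measures how much of the squared distance between the first two addresses comes from SALE PRICE. `standardize` and `price_share` are illustrative helpers, not part of the notebook.

```python
import numpy as np

def standardize(X):
    """Z-score each column: (x - column mean) / column std, with ddof=0 like StandardScaler."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns centred but unscaled
    return (X - mean) / std

# Bar frequency, SALE PRICE, Total Years for the five addresses in the grouped table
X = np.array([
    [0.07, 14920000.0, 119.0],  # 1002 HALSEY STREET
    [0.12,   970000.0,  62.0],  # 101 CORNELIA STREET
    [0.17,   985031.0,  81.0],  # 101 SUYDAM STREET
    [0.05,   618000.0, 118.0],  # 1012 DECATUR STREET
    [0.05,  2400000.0, 118.0],  # 1014 DECATUR STREET
])
Z = standardize(X)

def price_share(M):
    """Fraction of the squared distance between rows 0 and 1 contributed by SALE PRICE."""
    diff_sq = (M[0] - M[1]) ** 2
    return diff_sq[1] / diff_sq.sum()

print(price_share(X), price_share(Z))
```

On the raw values SALE PRICE accounts for essentially all of the distance. After scaling it contributes roughly half, and Bar frequency and Total Years share the rest.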

In [24]:
brooklyn_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

brooklyn_merged = brook_prop

# join the cluster labels and top venues onto brook_prop, which holds latitude/longitude for each address
brooklyn_merged = brooklyn_merged.join(brooklyn_venues_sorted.set_index('Address'), on='ADDRESS')

brooklyn_merged.head() # check the last columns!

| | BUILDING CLASS CATEGORY | ADDRESS | SALE PRICE | Total Years | latitude | longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4927 | 01 ONE FAMILY DWELLINGS | 1121 MADISON STREET | 11750000.0 | 162.0 | 40.69892 | -73.909293 | 2 | Bar | Mexican Restaurant | Coffee Shop | Latin American Restaurant | Pizza Place | Bakery | Gym | Brewery | Nightclub | Deli / Bodega |
| 4929 | 01 ONE FAMILY DWELLINGS | 1095 MADISON STREET | 3600000.0 | 119.0 | 40.69892 | -73.909293 | 2 | Bar | Mexican Restaurant | Coffee Shop | Latin American Restaurant | Pizza Place | Bakery | Gym | Brewery | Nightclub | Deli / Bodega |
| 4931 | 01 ONE FAMILY DWELLINGS | 1249 MADISON STREET | 2530000.0 | 120.0 | 40.69892 | -73.909293 | 2 | Bar | Mexican Restaurant | Coffee Shop | Latin American Restaurant | Pizza Place | Bakery | Gym | Brewery | Nightclub | Deli / Bodega |
| 4933 | 01 ONE FAMILY DWELLINGS | 1245 PUTNAM AVENUE | 5000000.0 | 108.0 | 40.682587 | -73.962611 | 0 | Bar | Cocktail Bar | Café | Pizza Place | New American Restaurant | Bakery | Italian Restaurant | Wine Shop | Coffee Shop | Yoga Studio |
| 4934 | 01 ONE FAMILY DWELLINGS | 1241 PUTNAM AVENUE | 8325000.0 | 98.0 | 40.682587 | -73.962611 | 0 | Bar | Cocktail Bar | Café | Pizza Place | New American Restaurant | Bakery | Italian Restaurant | Wine Shop | Coffee Shop | Yoga Studio |


## Plotting the Cluster Map

In [83]:
import folium

# center the map (latitudes/longitudes are assumed to be set earlier in the notebook)
map_clusters = folium.Map(location=[latitudes, longitudes], zoom_start=12)

# one color per cluster label: 0 -> blue, 1 -> black, 2 -> red
cluster_colors = ['blue', 'black', 'red']

for lat, lon, category, price, cluster in zip(brooklyn_merged['latitude'], brooklyn_merged['longitude'],
                                              brooklyn_merged['BUILDING CLASS CATEGORY'],
                                              brooklyn_merged['SALE PRICE'], brooklyn_merged['Cluster Labels']):
    # popup shows the building class, sale price and cluster of each property
    label = folium.Popup(str(category) + '<br>Sale Price : ' + '${:,.2f}'.format(price) + '<br>Cluster : ' + str(cluster))
    folium.CircleMarker(
        [lat, lon],
        radius=4,
        popup=label,
        color=cluster_colors[int(cluster)],
        fill=True,
        fill_color=cluster_colors[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters.save('Cluster.html')

## Analyzing Clusters

#### Analyzing each cluster

### Cluster 0

In [26]:

cluster0 = brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 0]
cluster0.head(10)

Unnamed: 0,BUILDING CLASS CATEGORY,ADDRESS,SALE PRICE,Total Years,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4933,01 ONE FAMILY DWELLINGS,1245 PUTNAM AVENUE,5000000.0,108.0,40.682587,-73.962611,0,Bar,Cocktail Bar,Café,Pizza Place,New American Restaurant,Bakery,Italian Restaurant,Wine Shop,Coffee Shop,Yoga Studio
4934,01 ONE FAMILY DWELLINGS,1241 PUTNAM AVENUE,8325000.0,98.0,40.682587,-73.962611,0,Bar,Cocktail Bar,Café,Pizza Place,New American Restaurant,Bakery,Italian Restaurant,Wine Shop,Coffee Shop,Yoga Studio
4939,01 ONE FAMILY DWELLINGS,1361 PUTNAM AVENUE,5500000.0,117.0,40.682587,-73.962611,0,Bar,Cocktail Bar,Café,Pizza Place,New American Restaurant,Bakery,Italian Restaurant,Wine Shop,Coffee Shop,Yoga Studio
4942,02 TWO FAMILY DWELLINGS,1294 PUTNAM AVE,7400000.0,119.0,40.682587,-73.962611,0,Bar,Cocktail Bar,Café,Pizza Place,New American Restaurant,Bakery,Italian Restaurant,Wine Shop,Coffee Shop,Yoga Studio
4943,02 TWO FAMILY DWELLINGS,1308 PUTNAM AVENUE,5500000.0,119.0,40.682587,-73.962611,0,Bar,Cocktail Bar,Café,Pizza Place,New American Restaurant,Bakery,Italian Restaurant,Wine Shop,Coffee Shop,Yoga Studio
5057,10 COOPS - ELEVATOR APARTMENTS,2048 EASTERN PARKWAY,435000.0,80.0,40.67257,-73.96608,0,Bar,Garden,Cocktail Bar,Sushi Restaurant,Wine Shop,Pizza Place,Plaza,Bakery,Coffee Shop,American Restaurant
5062,10 COOPS - ELEVATOR APARTMENTS,101 SUYDAM STREET,985031.0,81.0,40.707332,-73.918532,0,Bar,Mexican Restaurant,Pizza Place,Coffee Shop,Bakery,Yoga Studio,Cocktail Bar,Gourmet Shop,Italian Restaurant,Chinese Restaurant
5064,10 COOPS - ELEVATOR APARTMENTS,541 HART STREET,985000.0,80.0,40.706947,-73.91774,0,Bar,Mexican Restaurant,Pizza Place,Coffee Shop,Bakery,Cocktail Bar,Liquor Store,Chinese Restaurant,Music Venue,Deli / Bodega
5068,10 COOPS - ELEVATOR APARTMENTS,1146 DEKALB AVENUE,945000.0,98.0,40.706546,-73.916948,0,Bar,Mexican Restaurant,Pizza Place,Coffee Shop,Bakery,Cocktail Bar,Liquor Store,Music Venue,Theater,Deli / Bodega
5070,10 COOPS - ELEVATOR APARTMENTS,1376 DEKALB AVENUE,1120000.0,95.0,40.706546,-73.916948,0,Bar,Mexican Restaurant,Pizza Place,Coffee Shop,Bakery,Cocktail Bar,Liquor Store,Music Venue,Theater,Deli / Bodega


### Cluster 1

In [27]:
cluster1 = brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 1]
cluster1.head(10)

Unnamed: 0,BUILDING CLASS CATEGORY,ADDRESS,SALE PRICE,Total Years,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4951,02 TWO FAMILY DWELLINGS,997 HANCOCK STREET,6900000.0,117.0,40.684228,-73.935301,1,Bar,Café,Pizza Place,Caribbean Restaurant,Coffee Shop,Deli / Bodega,Chinese Restaurant,Discount Store,Wine Shop,Sandwich Place
4952,02 TWO FAMILY DWELLINGS,1118 JEFFERSON AVENUE,8100000.0,117.0,40.684627,-73.938366,1,Bar,Coffee Shop,Café,Pizza Place,Wine Shop,Discount Store,Caribbean Restaurant,Deli / Bodega,Sandwich Place,Park
4954,02 TWO FAMILY DWELLINGS,1055 HANCOCK STREET,6500000.0,117.0,40.684228,-73.935301,1,Bar,Café,Pizza Place,Caribbean Restaurant,Coffee Shop,Deli / Bodega,Chinese Restaurant,Discount Store,Wine Shop,Sandwich Place
4957,03 THREE FAMILY DWELLINGS,1328 JEFFERSON AVENUE,6000000.0,179.0,40.684627,-73.938366,1,Bar,Coffee Shop,Café,Pizza Place,Wine Shop,Discount Store,Caribbean Restaurant,Deli / Bodega,Sandwich Place,Park
4985,07 RENTALS - WALKUP APARTMENTS,1183 HALSEY STREET,2300000.0,169.0,40.68384,-73.932231,1,Bar,Pizza Place,Café,Caribbean Restaurant,Southern / Soul Food Restaurant,Deli / Bodega,Discount Store,Sandwich Place,Bakery,Chinese Restaurant
4987,08 RENTALS - ELEVATOR APARTMENTS,1002 HALSEY STREET,14920000.0,119.0,40.68384,-73.932231,1,Bar,Pizza Place,Café,Caribbean Restaurant,Southern / Soul Food Restaurant,Deli / Bodega,Discount Store,Sandwich Place,Bakery,Chinese Restaurant
4989,09 COOPS - WALKUP APARTMENTS,1238 HALSEY STREET,3875000.0,157.0,40.68384,-73.932231,1,Bar,Pizza Place,Café,Caribbean Restaurant,Southern / Soul Food Restaurant,Deli / Bodega,Discount Store,Sandwich Place,Bakery,Chinese Restaurant
5015,09 COOPS - WALKUP APARTMENTS,1012 DECATUR STREET,618000.0,118.0,40.68197,-73.928867,1,Café,Coffee Shop,Discount Store,Bar,Caribbean Restaurant,Southern / Soul Food Restaurant,Deli / Bodega,Pizza Place,Park,Mexican Restaurant
5016,09 COOPS - WALKUP APARTMENTS,1016 DECATUR STREET,583000.0,119.0,40.68197,-73.928867,1,Café,Coffee Shop,Discount Store,Bar,Caribbean Restaurant,Southern / Soul Food Restaurant,Deli / Bodega,Pizza Place,Park,Mexican Restaurant
5017,09 COOPS - WALKUP APARTMENTS,1014 DECATUR STREET,2400000.0,118.0,40.68197,-73.928867,1,Café,Coffee Shop,Discount Store,Bar,Caribbean Restaurant,Southern / Soul Food Restaurant,Deli / Bodega,Pizza Place,Park,Mexican Restaurant


### Cluster 2

In [28]:
cluster2 = brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 2]
cluster2.head(10)

Unnamed: 0,BUILDING CLASS CATEGORY,ADDRESS,SALE PRICE,Total Years,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4927,01 ONE FAMILY DWELLINGS,1121 MADISON STREET,11750000.0,162.0,40.69892,-73.909293,2,Bar,Mexican Restaurant,Coffee Shop,Latin American Restaurant,Pizza Place,Bakery,Gym,Brewery,Nightclub,Deli / Bodega
4929,01 ONE FAMILY DWELLINGS,1095 MADISON STREET,3600000.0,119.0,40.69892,-73.909293,2,Bar,Mexican Restaurant,Coffee Shop,Latin American Restaurant,Pizza Place,Bakery,Gym,Brewery,Nightclub,Deli / Bodega
4931,01 ONE FAMILY DWELLINGS,1249 MADISON STREET,2530000.0,120.0,40.69892,-73.909293,2,Bar,Mexican Restaurant,Coffee Shop,Latin American Restaurant,Pizza Place,Bakery,Gym,Brewery,Nightclub,Deli / Bodega
4935,01 ONE FAMILY DWELLINGS,1168 MADISON STREET,5350000.0,119.0,40.69892,-73.909293,2,Bar,Mexican Restaurant,Coffee Shop,Latin American Restaurant,Pizza Place,Bakery,Gym,Brewery,Nightclub,Deli / Bodega
4940,02 TWO FAMILY DWELLINGS,77 CORNELIA STREET,12000000.0,119.0,40.697825,-73.908026,2,Bar,Pizza Place,Mexican Restaurant,Bakery,Coffee Shop,Latin American Restaurant,Deli / Bodega,Chinese Restaurant,Nightclub,Brewery
4944,02 TWO FAMILY DWELLINGS,207 CORNELIA,4300000.0,178.0,40.697825,-73.908026,2,Bar,Pizza Place,Mexican Restaurant,Bakery,Coffee Shop,Latin American Restaurant,Deli / Bodega,Chinese Restaurant,Nightclub,Brewery
4945,02 TWO FAMILY DWELLINGS,339 CORNELIA STREET,5700000.0,104.0,40.695843,-73.91025,2,Bar,Pizza Place,Coffee Shop,Deli / Bodega,Mexican Restaurant,Bakery,Latin American Restaurant,Gym,Chinese Restaurant,Grocery Store
4966,06 TAX CLASS 1 - OTHER,149 WEIRFIELD STREET,2500000.0,32.0,40.689927,-73.91255,2,Café,Pizza Place,Deli / Bodega,Coffee Shop,Sandwich Place,Grocery Store,Restaurant,Brewery,Chinese Restaurant,Fast Food Restaurant
4967,07 RENTALS - WALKUP APARTMENTS,137 WEIRFIELD STREET,15500000.0,178.0,40.689679,-73.912819,2,Pizza Place,Café,Sandwich Place,Deli / Bodega,Chinese Restaurant,Fried Chicken Joint,Indian Restaurant,Grocery Store,Supermarket,Metro Station
4971,07 RENTALS - WALKUP APARTMENTS,205 WEIRFIELD STREET,3200000.0,177.0,40.691315,-73.911186,2,Coffee Shop,Pizza Place,Café,Chinese Restaurant,Bar,Discount Store,Nightclub,Brewery,Fried Chicken Joint,Mexican Restaurant


## Results and Discussion

The K-Means model was run with k = 3, so the properties are split into three clusters. Most of the properties fall into cluster 0 and cluster 2, so the discussion below focuses on these two clusters.
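The claim that most properties fall into clusters 0 and 2 can be checked by counting labels. Below is a minimal sketch on a hypothetical label column. On the real data, the same two `value_counts` calls would run on `brooklyn_merged['Cluster Labels']`.

```python
import pandas as pd

# Hypothetical stand-in for brooklyn_merged, keeping only the label column.
merged = pd.DataFrame({"Cluster Labels": [0, 2, 0, 1, 2, 0, 2, 2]})

# Number of properties per cluster, and each cluster's share of all properties.
counts = merged["Cluster Labels"].value_counts().sort_index()
shares = merged["Cluster Labels"].value_counts(normalize=True).sort_index()
print(pd.DataFrame({"count": counts, "share": shares.round(3)}))
```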

Cluster 0: The most common venues in this cluster are bars, cocktail bars, cafés, coffee shops, wine shops and yoga studios, along with a few restaurants such as Italian restaurants. Buyers who enjoy nightlife and wine bars are therefore likely to find these properties appealing.

Cluster 2: Most of the common venues in this cluster are restaurants, such as pizza places, sandwich places, fried chicken joints and Chinese, Mexican and Latin American restaurants, with a few bars and nightclubs. The cluster also includes supermarkets and grocery stores. Properties here are likely to be attractive because everyday needs, from groceries to dining, are available in the area.
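The venue profiles above are read off the cluster tables by eye. A more systematic check counts how often each category appears across the ranked venue columns within each cluster. Below is a minimal sketch on a hypothetical three-column slice of `brooklyn_merged`. The real table has ten ranked columns, and the column filter picks them all up.

```python
import pandas as pd

# Hypothetical slice of brooklyn_merged: cluster label plus the top-3 venue columns.
merged = pd.DataFrame(
    {
        "Cluster Labels": [0, 0, 2, 2],
        "1st Most Common Venue": ["Bar", "Bar", "Pizza Place", "Café"],
        "2nd Most Common Venue": ["Cocktail Bar", "Wine Shop", "Café", "Pizza Place"],
        "3rd Most Common Venue": ["Café", "Yoga Studio", "Supermarket", "Grocery Store"],
    }
)

venue_cols = [c for c in merged.columns if c.endswith("Most Common Venue")]

# Stack the ranked venue columns into one long column, then count categories per cluster.
long = merged.melt(id_vars="Cluster Labels", value_vars=venue_cols, value_name="Venue")
venue_counts = long.groupby("Cluster Labels")["Venue"].value_counts()
print(venue_counts.loc[2].head(5))
```

The counts ignore each venue's rank. Weighting the 1st-ranked column more heavily would be one possible refinement.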

The cell below prints the maximum sale price in clusters 0, 1 and 2; cluster 2 contains the most expensive property.

In [47]:
print('Maximum price of Property in Cluster 0 is : {} million'.format(cluster0['SALE PRICE'].max()/1000000))
print('Maximum price of Property in Cluster 1 is : {} million'.format(cluster1['SALE PRICE'].max()/1000000))
print('Maximum price of Property in Cluster 2 is : {} million'.format(cluster2['SALE PRICE'].max()/1000000))

Maximum price of Property in Cluster 0 is : 8.325 million
Maximum price of Property in Cluster 1 is : 14.92 million
Maximum price of Property in Cluster 2 is : 21.9 million
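The three print statements can be replaced by a single `groupby`. The same call also yields the minimum prices and the property counts used in the conclusion. Below is a minimal sketch on a hypothetical subset of `brooklyn_merged`, with prices converted to millions of dollars.

```python
import pandas as pd

# Hypothetical stand-in for brooklyn_merged with the two columns used here.
merged = pd.DataFrame(
    {
        "Cluster Labels": [0, 0, 1, 1, 2, 2],
        "SALE PRICE": [435000.0, 8325000.0, 583000.0, 14920000.0, 2500000.0, 15500000.0],
    }
)

# Min, max and count of sale prices per cluster.
summary = merged.groupby("Cluster Labels")["SALE PRICE"].agg(["min", "max", "count"])

# Express the price columns in millions of dollars.
summary[["min", "max"]] = summary[["min", "max"]] / 1_000_000
print(summary)
```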


## Conclusion

In conclusion, most of the properties fall into clusters 0 and 2. The most favourable properties are likely to be in cluster 2, because its surroundings offer the basic amenities and essential facilities (supermarkets, grocery stores and a wide choice of restaurants). On that basis, cluster 2 compares favourably with cluster 0.

In terms of price, cluster 0 ranges from $0.151 million to $8.325 million, whereas cluster 2 ranges from $0.32 million to $21.9 million.