# Movement Suggestion Tool Proposal
## Introduction
When people decide to move to another city, many of them, including me, share the same concern: is the new place I choose suitable for me? Will I feel comfortable after I move there? For example, around my old apartment there are several markets to choose from, such as Whole Foods, Target, and Trader Joe's. There is a park near my house, so I can take a walk through it, and on the way I can buy a coffee or a bubble tea. What if my new place doesn't have a park or a drink shop nearby, or I need to drive at least 30 minutes to the market just to get some mushrooms? This is where my tool can help. It takes the name of the neighborhood where you live right now and the city you want to move to, then recommends neighborhoods that you are more likely to be satisfied with.

## Clarify Problems
1. Get the coordinates of your old house and the city you want to move to.
2. Identify the attributes of your old house.
3. Group the neighborhoods of the future city into several clusters.
4. Match your old house to one of those clusters.
5. Recommend all the neighborhoods in the cluster found in step 4.

## Solution
1. Use Geopy to get the coordinates of the places.
2. For now, use only the venue attributes that can be obtained from Foursquare.
3. Use the K-means clustering method.
4. Use KNN or another algorithm, such as logistic regression or a decision tree, to classify your house into one cluster.
5. Make recommendation.
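
The five steps can be sketched end to end on toy data. This is only an illustration under stated assumptions: the neighborhood names, the three-feature vectors, and the cluster labels are made up, and step 4 uses a simple nearest-centroid rule as a stand-in for the KNN classifier used later in this notebook.

```python
import math
from collections import defaultdict

# Hypothetical venue-frequency vectors: [park, coffee shop, grocery store]
neighborhood_features = {
    "Neighborhood A": [0.6, 0.3, 0.1],
    "Neighborhood B": [0.5, 0.4, 0.1],
    "Neighborhood C": [0.0, 0.2, 0.8],
    "Neighborhood D": [0.1, 0.1, 0.8],
}
# Cluster labels as step 3 might produce them (assumed here, not computed)
cluster_of = {"Neighborhood A": 0, "Neighborhood B": 0,
              "Neighborhood C": 1, "Neighborhood D": 1}


def centroids(features, labels):
    """Average the feature vectors of each cluster."""
    grouped = defaultdict(list)
    for name, vec in features.items():
        grouped[labels[name]].append(vec)
    return {c: [sum(col) / len(vecs) for col in zip(*vecs)]
            for c, vecs in grouped.items()}


def recommend_for(house_vec, features, labels):
    """Step 4: pick the nearest cluster. Step 5: return its neighborhoods."""
    cents = centroids(features, labels)
    best = min(cents, key=lambda c: math.dist(house_vec, cents[c]))
    return best, sorted(n for n, c in labels.items() if c == best)


cluster, picks = recommend_for([0.55, 0.35, 0.10],
                               neighborhood_features, cluster_of)
print(cluster, picks)
```

A house whose surroundings are dominated by parks and coffee shops lands in the cluster that contains the two park-heavy neighborhoods, and those two become the recommendation.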

## Further Scope
For now, this project only takes into consideration the attributes I have at hand. In future exploration, we can add weather conditions, traffic, crime rate, and house prices to the attributes if the data are available.

## Data Sets
### 1. New York City Neighborhood Dataset
We set our destination to New York City as a sample. This data set has 4 columns (borough, neighborhood, latitude, longitude) covering 306 neighborhoods in New York City.
#### Data source: https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
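
As a rough illustration of how one record in this file maps to the four columns, the sketch below parses a single hand-written feature. The dictionary only mimics the keys that the notebook code reads (`properties.borough`, `properties.name`, `geometry.coordinates`); the values are copied from the first row of the table shown later, and `feature_to_row` is a hypothetical helper name.

```python
# A hand-written feature with the same keys the notebook reads.
sample_feature = {
    "properties": {"borough": "Bronx", "name": "Wakefield"},
    # GeoJSON orders coordinates as [longitude, latitude]
    "geometry": {"coordinates": [-73.847201, 40.894705]},
}


def feature_to_row(feature):
    """Turn one feature into a row with the four dataset columns."""
    lon, lat = feature["geometry"]["coordinates"][:2]
    return {
        "Borough": feature["properties"]["borough"],
        "Neighborhood": feature["properties"]["name"],
        "Latitude": lat,
        "Longitude": lon,
    }


row = feature_to_row(sample_feature)
print(row)
```

The point to watch is the coordinate order: longitude comes first in the file, while most mapping calls in this notebook expect latitude first.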

### 2. Venue info
We use the Foursquare API to retrieve the venue info of each neighborhood.

### 1. Use Geopy to get the coordinates of places and load the NYC data

In [9]:
from geopy.geocoders import Nominatim
import urllib,json
import requests

In [26]:
def find_ll(address,move2 = 'New York City,NY'):
    geolocator = Nominatim(user_agent="explorer")
    location = geolocator.geocode(address)
    location_de = geolocator.geocode(move2)
    latitude = location.latitude
    longitude = location.longitude
    latitude_de = location_de.latitude
    longitude_de = location_de.longitude
    return {'address':[latitude,longitude],'move2':[latitude_de,longitude_de]}
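
One thing `find_ll` does not handle: geopy's `geocode` returns `None` when it finds no match, so reading `.latitude` would fail with an `AttributeError`. A minimal sketch of a guard follows. The helper name `coords_or_error` and the offline `fake_geocode` stand-in are assumptions made so the example runs without network access; in the notebook one would pass `geolocator.geocode` instead.

```python
from types import SimpleNamespace


def coords_or_error(geocode, address):
    """Return [latitude, longitude], or raise if nothing was found."""
    location = geocode(address)
    if location is None:
        raise ValueError(f"Could not geocode {address!r}")
    return [location.latitude, location.longitude]


# Offline stand-in for geolocator.geocode (hypothetical lookup table).
def fake_geocode(address):
    known = {
        "Toronto": SimpleNamespace(latitude=43.6534817, longitude=-79.3839347),
    }
    return known.get(address)


print(coords_or_error(fake_geocode, "Toronto"))
```

Raising a clear error here makes a misspelled neighborhood name easier to diagnose than a failure deeper in the pipeline.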

In [10]:
url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json'
response = requests.get(url)
json_data = response.json()

In [13]:
newyork_data = json_data

In [14]:
neighborhoods_data = newyork_data['features']

In [16]:
import pandas as pd
import numpy as np

In [17]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [18]:
# collect one row per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON coordinates are ordered [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)

In [19]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [37]:
nyc = neighborhoods.copy()

In [22]:
import folium

In [27]:
find_ll('Toronto')

{'address': [43.6534817, -79.3839347], 'move2': [40.7127281, -74.0060152]}

In [32]:
map_newyork = folium.Map(location=[40.7127281, -74.0060152], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

### 2. Get venue info by foursquare

#### 2.1 Gather data

In [33]:
CLIENT_ID = 'YOUR_CLIENT_ID' # put your Foursquare client ID here
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # put your Foursquare client secret here
VERSION = '20180604'
LIMIT = 30
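
These four constants end up as query parameters of the Foursquare explore request. As a sketch of how the URL is put together, the helper below builds it with `urllib.parse.urlencode` rather than string formatting. `build_explore_url` is a hypothetical name, the credentials are placeholders, and whether this endpoint still accepts requests depends on the API versions Foursquare currently supports.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng,
                      version="20180604", radius=500, limit=30):
    """Assemble the explore URL; urlencode escapes every value."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",   # latitude first, then longitude
        "radius": radius,       # search radius in meters
        "limit": limit,         # maximum number of venues returned
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET",
                        40.894705, -73.847201)
print(url)
```

With `requests`, the same dictionary can be passed directly as `requests.get(EXPLORE_ENDPOINT, params=params)`, which avoids building the string by hand.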

In [34]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [35]:
# top-level import; the pandas.io.json path has been deprecated since pandas 1.0
from pandas import json_normalize

In [36]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
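
`getNearbyVenues` indexes straight into `["response"]["groups"][0]["items"]` and `categories[0]`, so an empty result or a venue without a category would raise an exception partway through the loop. The sketch below shows one defensive way to flatten such a payload. The `sample` dictionary is hypothetical and only mirrors the nesting that the function above relies on, and `venue_rows` is an illustrative helper, not part of the notebook.

```python
def venue_rows(name, lat, lng, payload):
    """Flatten an explore-style payload into tuples, tolerating gaps."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Venues without a category get None instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows


# Hypothetical payload with the same nesting the notebook indexes into
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "SUBWAY",
               "location": {"lat": 40.618939, "lng": -74.082881},
               "categories": [{"name": "Sandwich Place"}]}},
    {"venue": {"name": "Unnamed Spot",
               "location": {"lat": 40.6, "lng": -74.0},
               "categories": []}},
]}]}}

rows = venue_rows("Fox Hills", 40.617311, -74.08174, sample)
print(rows)
```

Rows with a `None` category can then be dropped or kept before the one-hot encoding step, depending on how they should count.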

In [38]:
nyc_venues = getNearbyVenues(names=nyc['Neighborhood'],
                                   latitudes=nyc['Latitude'],
                                   longitudes=nyc['Longitude']
                                  )

Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Marble Hill
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker

In [107]:
nyc_venues.tail(10)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
6142,Queensbridge,40.756091,-73.945631,Queensbridge Basketball Courts,40.75506,-73.949103,Basketball Court
6143,Queensbridge,40.756091,-73.945631,The Ravel Hotel Gym,40.753787,-73.948815,Athletics & Sports
6144,Queensbridge,40.756091,-73.945631,Profundo Pool Club,40.753719,-73.948878,Hotel Pool
6145,Queensbridge,40.756091,-73.945631,Estate Garden And Grill,40.7537,-73.948841,Beer Garden
6146,Queensbridge,40.756091,-73.945631,Track 114,40.753008,-73.947833,Platform
6147,Fox Hills,40.617311,-74.08174,SUBWAY,40.618939,-74.082881,Sandwich Place
6148,Fox Hills,40.617311,-74.08174,MTA Bus - Vanderbilt Av & Osgood Av (S76),40.617809,-74.081111,Bus Stop
6149,Fox Hills,40.617311,-74.08174,Targee Milk & Things,40.61441,-74.084455,Grocery Store
6150,Fox Hills,40.617311,-74.08174,China Garden,40.61441,-74.084455,Chinese Restaurant
6151,Fox Hills,40.617311,-74.08174,MTA Bus - Tompkins Av & Vanderbilt Av (S52/S76...,40.620052,-74.07718,Bus Stop


#### 2.2 Analyze Each Neighborhood

In [39]:
nyc_onehot = pd.get_dummies(nyc_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
nyc_onehot['Neighborhood'] = nyc_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [nyc_onehot.columns[-1]] + list(nyc_onehot.columns[:-1])
nyc_onehot = nyc_onehot[fixed_columns]
nyc_grouped = nyc_onehot.groupby('Neighborhood').mean().reset_index()
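
Taking the mean of one-hot columns per neighborhood gives the share of that neighborhood's venues falling in each category. The pure-Python sketch below reproduces that idea on a handful of made-up (neighborhood, category) pairs; `category_frequencies` is a hypothetical helper, not part of the notebook.

```python
from collections import Counter, defaultdict

# Hypothetical (neighborhood, venue category) pairs
venues = [
    ("Fox Hills", "Bus Stop"),
    ("Fox Hills", "Bus Stop"),
    ("Fox Hills", "Grocery Store"),
    ("Fox Hills", "Sandwich Place"),
    ("Queensbridge", "Hotel Pool"),
    ("Queensbridge", "Beer Garden"),
]


def category_frequencies(pairs):
    """Share of each category among a neighborhood's venues."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        n: {cat: k / sum(c.values()) for cat, k in c.items()}
        for n, c in counts.items()
    }


freq = category_frequencies(venues)
print(freq["Fox Hills"]["Bus Stop"])  # 0.5
```

Because each request returns at most `LIMIT` venues, these shares are estimates from a small sample, so a single extra venue can shift them noticeably.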

In [41]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [42]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
nyc_venues_sorted = pd.DataFrame(columns=columns)
nyc_venues_sorted['Neighborhood'] = nyc_grouped['Neighborhood']

for ind in np.arange(nyc_grouped.shape[0]):
    nyc_venues_sorted.iloc[ind, 1:] = return_most_common_venues(nyc_grouped.iloc[ind, :], num_top_venues)
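
The column names above depend on an exception to choose between 'st', 'nd', 'rd' and 'th'. A small helper can do the same job explicitly and also handles values such as 11 or 21 if `num_top_venues` ever grows. This is an illustrative sketch; `ordinal` is a hypothetical function that the notebook does not define.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"          # 11th, 12th, 13th, ...
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, 11)
]
print(columns[1], columns[-1])
```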

In [138]:
nyc_venues_sorted.head()

Unnamed: 0,Cluster Label,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,3,Allerton,Supermarket,Pizza Place,Discount Store,Deli / Bodega,Cosmetics Shop,Martial Arts School,Grocery Store,Spa,Chinese Restaurant,Mexican Restaurant
1,3,Annadale,American Restaurant,Cosmetics Shop,Train Station,Diner,Pizza Place,Restaurant,Pharmacy,Deli / Bodega,Bar,Liquor Store
2,4,Arden Heights,Deli / Bodega,Lawyer,Bus Stop,Home Service,Playground,Pharmacy,Pizza Place,Coffee Shop,Women's Store,Farm
3,4,Arlington,Bus Stop,Grocery Store,Deli / Bodega,Construction & Landscaping,American Restaurant,Boat or Ferry,Women's Store,Field,Event Space,Eye Doctor
4,0,Arrochar,Deli / Bodega,Italian Restaurant,Bus Stop,Pizza Place,Athletics & Sports,Supermarket,Liquor Store,Sandwich Place,Mediterranean Restaurant,Bagel Shop


### 3. K-means clustering

In [44]:
from sklearn.cluster import KMeans

In [45]:
nyc_grouped_clustering = nyc_grouped.drop('Neighborhood', axis=1)

In [105]:
nyc_grouped_clustering

Unnamed: 0,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Terminal,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,...,Vietnamese Restaurant,Warehouse Store,Waste Facility,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.000000,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.000000,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.000000,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
296,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.038462,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
297,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.000000,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
298,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
299,0.0,0.0,0.0,0.0,0.0,0.0,0.100000,0.0,0.0,0.033333,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0


In [49]:
# set number of clusters
kclusters = 10

nyc_grouped_clustering = nyc_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(nyc_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 3, 4, 4, 0, 7, 7, 7, 0, 3])
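
To make clear what the clustering call does internally, here is a minimal pure-Python sketch of Lloyd's algorithm, the iteration behind K-means. It is a simplification under stated assumptions: the first `k` points seed the centroids and the loop runs for a fixed number of iterations, whereas scikit-learn's `KMeans` uses a smarter initialization (k-means++) and a convergence check. The two-feature points are made up.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


# Hypothetical [coffee shop, bus stop] frequencies for four neighborhoods
points = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1]
```

The label numbers themselves carry no meaning: cluster 3 is not "closer" to cluster 4 than to cluster 7, they are just names for groups.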

In [50]:
kmeans.labels_

array([3, 3, 4, 4, 0, 7, 7, 7, 0, 3, 7, 0, 7, 3, 7, 9, 3, 7, 3, 4, 0, 3,
       0, 3, 7, 4, 4, 7, 3, 6, 4, 7, 3, 7, 3, 7, 5, 4, 3, 7, 2, 4, 4, 7,
       7, 3, 3, 7, 7, 7, 7, 7, 3, 7, 4, 4, 3, 7, 7, 4, 7, 3, 7, 3, 3, 4,
       3, 3, 3, 3, 7, 3, 3, 7, 7, 4, 7, 4, 7, 4, 3, 7, 7, 4, 4, 4, 0, 0,
       0, 7, 3, 0, 3, 3, 4, 7, 4, 7, 4, 3, 7, 7, 7, 3, 7, 7, 4, 4, 7, 3,
       7, 3, 3, 7, 7, 7, 7, 4, 3, 3, 7, 3, 0, 0, 7, 4, 3, 3, 3, 7, 7, 3,
       3, 3, 7, 3, 7, 7, 7, 7, 3, 4, 3, 7, 3, 3, 3, 3, 4, 7, 7, 0, 7, 3,
       7, 7, 7, 3, 7, 0, 7, 7, 3, 7, 7, 3, 3, 4, 0, 3, 3, 3, 4, 7, 7, 3,
       3, 2, 7, 3, 3, 3, 3, 3, 3, 7, 6, 4, 0, 0, 3, 3, 7, 4, 3, 7, 3, 3,
       7, 4, 3, 0, 4, 3, 7, 4, 7, 3, 3, 3, 3, 3, 7, 8, 7, 4, 3, 7, 7, 3,
       3, 3, 7, 4, 7, 7, 7, 4, 3, 0, 3, 7, 4, 3, 4, 3, 7, 3, 4, 3, 4, 3,
       4, 3, 7, 7, 0, 7, 7, 1, 3, 4, 4, 4, 7, 4, 4, 4, 7, 3, 4, 7, 7, 7,
       7, 3, 7, 0, 1, 4, 0, 4, 7, 7, 7, 3, 3, 7, 7, 4, 3, 7, 3, 7, 4, 7,
       4, 0, 3, 7, 4, 7, 7, 4, 7, 4, 3, 3, 3, 7, 0]
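
A quick look at the label array suggests that a few clusters (3, 4 and 7) hold most neighborhoods while others appear only once or twice. Counting members per cluster makes this easy to check. The sketch below counts only the first ten labels printed earlier; in the notebook one would presumably pass `kmeans.labels_.tolist()` instead.

```python
from collections import Counter

# First ten labels from the output above
first_ten = [3, 3, 4, 4, 0, 7, 7, 7, 0, 3]

sizes = Counter(first_ten)
for cluster, size in sorted(sizes.items()):
    print(f"cluster {cluster}: {size} neighborhoods")
```

Very small clusters matter for the next step, because a classifier that votes among many neighbors may rarely, if ever, predict a cluster with only one or two members.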

In [53]:
# add clustering labels (skip if this cell has already been run)
if 'Cluster Label' not in nyc_venues_sorted.columns:
    nyc_venues_sorted.insert(0, 'Cluster Label', kmeans.labels_)
nyc_merged = neighborhoods

# merge nyc_venues_sorted with the neighborhood data to add latitude/longitude for each neighborhood
nyc_merged = nyc_merged.join(nyc_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

nyc_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bronx,Wakefield,40.894705,-73.847201,3.0,Pharmacy,Ice Cream Shop,Food,Deli / Bodega,Donut Shop,Dessert Shop,Sandwich Place,Laundromat,Women's Store,Falafel Restaurant
1,Bronx,Co-op City,40.874294,-73.829939,4.0,Bus Station,Fast Food Restaurant,Restaurant,Baseball Field,Jazz Club,Donut Shop,Bagel Shop,Discount Store,Pizza Place,Pharmacy
2,Bronx,Eastchester,40.887556,-73.827806,4.0,Caribbean Restaurant,Bus Station,Deli / Bodega,Diner,Food & Drink Shop,Juice Bar,Seafood Restaurant,Donut Shop,Fast Food Restaurant,Pizza Place
3,Bronx,Fieldston,40.895437,-73.905643,4.0,Cosmetics Shop,Music Venue,Bus Station,Plaza,River,Women's Store,Farmers Market,Entertainment Service,Ethiopian Restaurant,Event Service
4,Bronx,Riverdale,40.890834,-73.912585,4.0,Bus Station,Park,Bank,Plaza,Gym,Baseball Field,Playground,Food Truck,Dessert Shop,Field


In [55]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [56]:
# create map
map_clusters = folium.Map(location=[40.7127281, -74.0060152], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
# neighborhoods without venue data have no cluster label; fall back to cluster 4 so they can still be drawn
nyc_merged['Cluster Label'] = nyc_merged['Cluster Label'].fillna(4)
for lat, lon, poi, cluster in zip(nyc_merged['Latitude'], nyc_merged['Longitude'], nyc_merged['Neighborhood'], nyc_merged['Cluster Label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
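
The map only needs one distinct color per cluster. As an alternative sketch that avoids the matplotlib colormap, the helper below spaces hues evenly with the standard-library `colorsys` module. `cluster_palette` is a hypothetical name, and the saturation and brightness values are arbitrary choices.

```python
import colorsys


def cluster_palette(n):
    """Return n evenly spaced hex colors, one per cluster label 0..n-1."""
    palette = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(10)
print(palette[0], palette[9])
```

Indexing with `palette[int(cluster)]` maps label 0 to the first color. The notebook's `rainbow[int(cluster)-1]` also works, because index -1 wraps around to the last color, but the mapping is less obvious.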

### 4. Classify your house into NYC clusters

#### 4.1 Transform the data into a DataFrame

In [58]:
ll = find_ll('Westwood')

In [60]:
ll['address'][0]

34.0561207

In [93]:
def get_venues(address):
    # geocode the address; ll['address'] holds [latitude, longitude]
    ll = find_ll(address)
    lat, lng = ll['address']
    radius = 500
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        lng,
        radius,
        LIMIT)

    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']

    # keep only the relevant information for each nearby venue
    venues_list=[]
    venues_list.append([(
                address,
                lat,
                lng,
                v['venue']['name'],
                v['venue']['location']['lat'],
                v['venue']['location']['lng'],
                v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
          'Neighborhood Latitude',
          'Neighborhood Longitude',
          'Venue',
          'Venue Latitude',
          'Venue Longitude',
          'Venue Category']
    return nearby_venues

In [95]:
house_venues = get_venues('Westwood')

In [97]:
all_venues = pd.concat([nyc_venues,house_venues])

In [110]:
def recommend(address,house_venues):
    h_onehot = pd.get_dummies(house_venues[['Venue Category']], prefix="", prefix_sep="")

    # add neighborhood column back to dataframe
    h_onehot['Neighborhood'] = house_venues['Neighborhood'] 

    # move neighborhood column to the first column
    fixed_columns = [h_onehot.columns[-1]] + list(h_onehot.columns[:-1])
    h_onehot = h_onehot[fixed_columns]
    h_grouped = h_onehot.groupby('Neighborhood').mean().reset_index()
    num_top_venues = 10

    indicators = ['st', 'nd', 'rd']

    # create columns according to number of top venues
    columns = ['Neighborhood']
    for ind in np.arange(num_top_venues):
        try:
            columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
        except IndexError:
            columns.append('{}th Most Common Venue'.format(ind+1))

    # create a new dataframe
    h_venues_sorted = pd.DataFrame(columns=columns)
    h_venues_sorted['Neighborhood'] = h_grouped['Neighborhood']

    for ind in np.arange(h_grouped.shape[0]):
        h_venues_sorted.iloc[ind, 1:] = return_most_common_venues(h_grouped.iloc[ind, :], num_top_venues)

    return h_grouped, h_venues_sorted

In [111]:
h_grouped,re_df = recommend('Westwood',all_venues)
h_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Terminal,American Restaurant,Antique Shop,Arcade,...,Vietnamese Restaurant,Warehouse Store,Waste Facility,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Allerton,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
1,Annadale,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
2,Arden Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
3,Arlington,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
4,Arrochar,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
297,Woodhaven,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
298,Woodlawn,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
299,Woodrow,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
300,Woodside,0.0,0.0,0.0,0.0,0.0,0.0,0.100000,0.0,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0


In [113]:
house_group_row = h_grouped[h_grouped['Neighborhood'] == 'Westwood']

In [114]:
house_group_row

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Terminal,American Restaurant,Antique Shop,Arcade,...,Vietnamese Restaurant,Warehouse Store,Waste Facility,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
290,Westwood,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
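
The house row was built together with the NYC venues, so it can contain categories that never occur in NYC, while the classifier expects exactly the columns it was trained on, in the same order. A sketch of the alignment step in plain Python follows. The category names and frequencies are made up, and `align_features` is a hypothetical helper.

```python
def align_features(row, training_columns):
    """Order a {category: frequency} dict like the training columns.

    Categories the model never saw are dropped; missing ones become 0.0.
    """
    return [row.get(col, 0.0) for col in training_columns]


training_columns = ["Bar", "Coffee Shop", "Park"]                   # hypothetical
house_row = {"Coffee Shop": 0.5, "Park": 0.25, "Surf Spot": 0.25}   # hypothetical

vec = align_features(house_row, training_columns)
print(vec)  # [0.0, 0.5, 0.25]
```

With pandas, `DataFrame.reindex(columns=..., fill_value=0)` gives the same result on a whole frame.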


#### 4.2 KNN classification

In [102]:
from sklearn.neighbors import KNeighborsClassifier

In [115]:
data = nyc_grouped_clustering.copy()

In [130]:
train_y = kmeans.labels_
train_x = data

In [129]:
data.shape

(301, 386)

In [128]:
# drop a leftover 'label' column if an earlier run added one
data.drop(columns='label', errors='ignore', inplace=True)

In [122]:
k = 10

In [131]:
kNN_model = KNeighborsClassifier(n_neighbors=k).fit(train_x,train_y)
kNN_model

KNeighborsClassifier(n_neighbors=10)

In [132]:
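# align the house row with the training columns: categories unseen in NYC are dropped, missing ones become 0
house_features = house_group_row.drop('Neighborhood',axis = 1).reindex(columns=train_x.columns, fill_value=0)
yhat = kNN_model.predict(house_features)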

In [133]:
yhat

array([0])

In [134]:
yhat[0]

0
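
For readers unfamiliar with KNN, the prediction above amounts to a majority vote among the training rows closest to the house. The sketch below shows that vote in pure Python with Euclidean distance. The training vectors and labels are made up, and `knn_predict` is an illustrative helper rather than scikit-learn's implementation.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training rows closest to query."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature neighborhoods and their cluster labels
train_x = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8]]
train_y = [0, 0, 0, 1, 1]

print(knn_predict(train_x, train_y, [0.75, 0.25]))  # 0
print(knn_predict(train_x, train_y, [0.15, 0.85]))  # 1
```

With `k = 10` as used here, a cluster with only one or two members can easily be outvoted, so a smaller `k` may be worth trying.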

In [135]:
nyc_remmendate = nyc_merged[nyc_merged['Cluster Label'] == yhat[0]]

In [136]:
nyc_remmendate

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
28,Bronx,Throgs Neck,40.815109,-73.81635,0.0,Bar,American Restaurant,Liquor Store,Baseball Field,Coffee Shop,Asian Restaurant,Pizza Place,Sports Bar,Italian Restaurant,Deli / Bodega
34,Bronx,Belmont,40.857277,-73.888452,0.0,Italian Restaurant,Pizza Place,Deli / Bodega,Dessert Shop,Bakery,Food & Drink Shop,Mexican Restaurant,Fish Market,Bar,Market
39,Bronx,Edgewater Park,40.821986,-73.813885,0.0,Italian Restaurant,Coffee Shop,Pizza Place,Deli / Bodega,Chinese Restaurant,Bar,Asian Restaurant,Park,Japanese Restaurant,American Restaurant
46,Brooklyn,Bay Ridge,40.625801,-74.030621,0.0,Spa,Pizza Place,Italian Restaurant,Grocery Store,Greek Restaurant,Bagel Shop,Café,Breakfast Spot,Middle Eastern Restaurant,Bookstore
108,Manhattan,Yorkville,40.77593,-73.947118,0.0,Italian Restaurant,Wine Shop,Sushi Restaurant,Park,Deli / Bodega,Coffee Shop,Café,Liquor Store,Thai Restaurant,Gym
117,Manhattan,Greenwich Village,40.726933,-73.999914,0.0,Italian Restaurant,Dessert Shop,Clothing Store,Sushi Restaurant,Cosmetics Shop,French Restaurant,New American Restaurant,Optical Shop,Pilates Studio,Bagel Shop
123,Manhattan,West Village,40.734434,-74.00618,0.0,Italian Restaurant,Cocktail Bar,Coffee Shop,Chinese Restaurant,Cosmetics Shop,Austrian Restaurant,Park,Sandwich Place,Gourmet Shop,Board Shop
152,Queens,Auburndale,40.76173,-73.791762,0.0,Italian Restaurant,Bar,Comic Shop,Mattress Store,Noodle House,Toy / Game Store,Fast Food Restaurant,Pet Store,Train Station,Pharmacy
190,Queens,Belle Harbor,40.576156,-73.854018,0.0,Beach,Pub,Boutique,Spa,Deli / Bodega,Bakery,Bagel Shop,Donut Shop,Mexican Restaurant,Restaurant
202,Staten Island,Grymes Hill,40.624185,-74.087248,0.0,Dog Run,Women's Store,Empanada Restaurant,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor,Factory,Falafel Restaurant,Farm


### 5. Visualize the recommendations

In [137]:
# create map
map_clusters = folium.Map(location=[40.7127281, -74.0060152], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map (only the recommended neighborhoods are drawn)
for lat, lon, poi, cluster in zip(nyc_remmendate['Latitude'], nyc_remmendate['Longitude'], nyc_remmendate['Neighborhood'], nyc_remmendate['Cluster Label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Done! Thank you for taking the time to go through my work!