<h1 align=center><font size = 5>Comparing Neighborhoods of Newton, MA</font></h1>

# 1. Purpose & Introduction

This notebook compares the neighborhoods (also known as villages) of the city of Newton, MA. The comparison is meant to help families who want to move to Newton choose the neighborhood best suited to their needs. The data of interest for a family includes nearby amenities, which can be obtained from Foursquare, and housing prices, which are obtained from Redfin.

In [1]:
# Before we get the data and start exploring it, let's import all the dependencies that we will need.
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, available in pandas 1.0 and later)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

#!conda install -c conda-forge shapely --yes
#!conda install -c conda-forge geopandas --yes
#!conda install -c conda-forge geojsonio --yes
print('Libraries imported.')

Libraries imported.


<a id='item1'></a>

# 2. Data Source & Exploration

### Getting Information About the Different Neighborhoods

Download the GeoJSON file of ZIP codes from the Newton, MA GIS repository on GitHub

In [2]:
import requests

url = 'https://raw.githubusercontent.com/NewtonMAGIS/GISData/master/Zip%20Codes/ZipCodes.geojson'
results = requests.get(url)
newtonData = results.json()


newtonData['features'][1]['properties']

{'Village_PO': 'CHESTNUT HILL', 'ZIPCODE': '02467'}
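
For readers unfamiliar with GeoJSON, the sketch below shows the shape of data that the cell above relies on: a `FeatureCollection` whose `features` each carry a `properties` dictionary. The sample dictionary is a hand-written stand-in (not the downloaded file) that reuses two property records shown in this notebook, and `extract_properties` is a hypothetical helper name.

```python
# Hand-written stand-in for the downloaded GeoJSON (geometry omitted for brevity).
sample_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"Village_PO": "BRIGHTON", "ZIPCODE": "02135"},
         "geometry": None},
        {"type": "Feature",
         "properties": {"Village_PO": "CHESTNUT HILL", "ZIPCODE": "02467"},
         "geometry": None},
    ],
}

def extract_properties(geojson):
    """Return (village, zip code) pairs from a GeoJSON FeatureCollection."""
    return [
        (feature["properties"]["Village_PO"], feature["properties"]["ZIPCODE"])
        for feature in geojson["features"]
    ]

pairs = extract_properties(sample_geojson)
print(pairs[1])
```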

Reading the file to extract the different villages

In [3]:
nbdataDict = {"Village":[], 'ZipCode':[]}
for i in newtonData['features']:
    print(i['properties'])
    nbdataDict["Village"].append(i['properties']['Village_PO'])
    nbdataDict['ZipCode'].append(i['properties']['ZIPCODE'])  
nbDf = pd.DataFrame.from_dict(nbdataDict)
nbDf

{'Village_PO': 'BRIGHTON', 'ZIPCODE': '02135'}
{'Village_PO': 'CHESTNUT HILL', 'ZIPCODE': '02467'}
{'Village_PO': 'WABAN', 'ZIPCODE': '02468'}
{'Village_PO': 'WABAN', 'ZIPCODE': '02468'}
{'Village_PO': 'AUBURNDALE', 'ZIPCODE': '02466'}
{'Village_PO': 'CHESTNUT HILL', 'ZIPCODE': '02467'}
{'Village_PO': 'NEWTON', 'ZIPCODE': '02458'}
{'Village_PO': 'NEWTON UPPER FALLS', 'ZIPCODE': '02464'}
{'Village_PO': 'NEWTON LOWER FALLS', 'ZIPCODE': '02462'}
{'Village_PO': 'NEWTONVILLE', 'ZIPCODE': '02460'}
{'Village_PO': 'WEST NEWTON', 'ZIPCODE': '02465'}
{'Village_PO': 'NEWTON CENTER', 'ZIPCODE': '02459'}
{'Village_PO': 'NEWTON HIGHLANDS', 'ZIPCODE': '02461'}


Unnamed: 0,Village,ZipCode
0,BRIGHTON,2135
1,CHESTNUT HILL,2467
2,WABAN,2468
3,WABAN,2468
4,AUBURNDALE,2466
5,CHESTNUT HILL,2467
6,NEWTON,2458
7,NEWTON UPPER FALLS,2464
8,NEWTON LOWER FALLS,2462
9,NEWTONVILLE,2460


Removing duplicate rows so that each village and zip code pair appears only once

In [4]:
cleanNbDF = nbDf.drop_duplicates().reset_index(drop=True)
cleanNbDF

Unnamed: 0,Village,ZipCode
0,BRIGHTON,2135
1,CHESTNUT HILL,2467
2,WABAN,2468
3,AUBURNDALE,2466
4,NEWTON,2458
5,NEWTON UPPER FALLS,2464
6,NEWTON LOWER FALLS,2462
7,NEWTONVILLE,2460
8,WEST NEWTON,2465
9,NEWTON CENTER,2459


### Getting the latitude and longitude of each neighborhood

In [5]:
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
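from geopy.geocoders import Nominatim
# recent geopy releases require an explicit user_agent; the value below is an arbitrary placeholder
geolocator = Nominatim(user_agent="newton_explorer")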



In [6]:
coordinate = {"ZipCode": [], "Latitude":[], 'Longitude':[]}
for zipcode in (cleanNbDF['ZipCode']):
    location = geolocator.geocode(zipcode)
    coordinate['Latitude'].append(location.latitude)
    coordinate['Longitude'].append(location.longitude)
    coordinate['ZipCode'].append(zipcode)
coorDF = pd.DataFrame.from_dict(coordinate)
coorDF.head(11)

Unnamed: 0,ZipCode,Latitude,Longitude
0,2135,42.358197,-71.144008
1,2467,42.320017,-71.158139
2,2468,42.32954,-71.21778
3,2466,42.344515,-71.245211
4,2458,42.356482,-71.192272
5,2464,42.313152,-71.221146
6,2462,42.331935,-71.252881
7,2460,49.105796,6.226733
8,2465,42.347353,-71.229726
9,2459,42.319482,-71.190392
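
Row 7 (zip code 2460, NEWTONVILLE) comes back with a latitude of about 49.1 and a longitude of about 6.2, which lies in Europe rather than Massachusetts, so the bare zip code query appears to be ambiguous. One possible remedy, sketched below, is to pass a structured query and restrict results to the United States. geopy's `Nominatim.geocode` accepts a dictionary query and a `country_codes` argument; `build_zip_query`, `geocode_us_zip` and the stub geocoder are hypothetical names, used so the example runs without network access.

```python
def build_zip_query(zipcode):
    """Build a structured Nominatim query for a Massachusetts zip code."""
    return {
        "postalcode": str(zipcode).zfill(5),  # keep the leading zero
        "state": "Massachusetts",
        "country": "United States",
    }

def geocode_us_zip(geocode, zipcode):
    """Call a geopy-style geocode function, limited to US results."""
    return geocode(build_zip_query(zipcode), country_codes="us")

# Stub standing in for geolocator.geocode so the sketch runs offline.
def fake_geocode(query, country_codes=None):
    return (query, country_codes)

query, codes = geocode_us_zip(fake_geocode, 2460)
print(query["postalcode"], codes)
```

With a real geolocator the call would be `geocode_us_zip(geolocator.geocode, zipcode)`. Whether Nominatim then returns the expected point for every zip code is not verified here.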


In [7]:
cleanNbDFwithCoor = pd.merge(cleanNbDF, coorDF, on = 'ZipCode', how = 'left')
cleanNbDFwithCoor

Unnamed: 0,Village,ZipCode,Latitude,Longitude
0,BRIGHTON,2135,42.358197,-71.144008
1,CHESTNUT HILL,2467,42.320017,-71.158139
2,WABAN,2468,42.32954,-71.21778
3,AUBURNDALE,2466,42.344515,-71.245211
4,NEWTON,2458,42.356482,-71.192272
5,NEWTON UPPER FALLS,2464,42.313152,-71.221146
6,NEWTON LOWER FALLS,2462,42.331935,-71.252881
7,NEWTONVILLE,2460,49.105796,6.226733
8,WEST NEWTON,2465,42.347353,-71.229726
9,NEWTON CENTER,2459,42.319482,-71.190392


### Create a map of Newton with the different villages using Folium

In [8]:
latitude = cleanNbDFwithCoor['Latitude'][4]
longitude = cleanNbDFwithCoor['Longitude'][4]
# create map of Newton, centered on the NEWTON village row (index 4)
map_Newton = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, village, zipcode in zip(cleanNbDFwithCoor['Latitude'], cleanNbDFwithCoor['Longitude'], cleanNbDFwithCoor['Village'],
                                     cleanNbDFwithCoor['ZipCode']):
    label = '{}, {}'.format(village, zipcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Newton)  
    
map_Newton
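
The map above is centered on row 4 (NEWTON). As an alternative, and only as a sketch, the center could be taken as the median of all village coordinates, which is not thrown off by a single stray geocoding result such as the NEWTONVILLE row. The coordinates are copied from the merged table above.

```python
from statistics import median

# Coordinates copied from the merged village table.
latitudes = [42.358197, 42.320017, 42.32954, 42.344515, 42.356482,
             42.313152, 42.331935, 49.105796, 42.347353, 42.319482]
longitudes = [-71.144008, -71.158139, -71.21778, -71.245211, -71.192272,
              -71.221146, -71.252881, 6.226733, -71.229726, -71.190392]

# The median ignores the one outlying pair, unlike the mean.
center = [median(latitudes), median(longitudes)]
print(center)
```

The result could then be passed as `folium.Map(location=center, zoom_start=12)`.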

## Define Foursquare Credentials and Version

In [9]:
import os

LIMIT = 300 # maximum number of venues to request per village
radius = 500 # search radius in meters
# read the credentials from environment variables instead of hardcoding them in the notebook
# (the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are an assumed convention)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Foursquare credentials loaded.')

Foursquare credentials loaded.


### Exploring Newton

Define a function that retrieves the nearby venues for each village

In [10]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Village', 
                  'Village Latitude', 
                  'Village Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
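
The list comprehension inside `getNearbyVenues` assumes that every returned item has at least one category. The sketch below isolates that parsing step and tolerates an empty `categories` list. `parse_venue_items` is a hypothetical helper, and the sample items are hand-written to mimic only the fields the function above reads (the first uses values from this notebook's Brighton results, the second is made up).

```python
def parse_venue_items(village, lat, lng, items):
    """Flatten explore-style items into one tuple per venue."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when a venue has no category.
        category = categories[0]["name"] if categories else None
        rows.append((village, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows

sample_items = [
    {"venue": {"name": "The Flatbread Company",
               "location": {"lat": 42.356927, "lng": -71.144099},
               "categories": [{"name": "Pizza Place"}]}},
    {"venue": {"name": "Uncategorized venue",  # made-up entry
               "location": {"lat": 42.357, "lng": -71.144},
               "categories": []}},
]

rows = parse_venue_items("BRIGHTON", 42.358197, -71.144008, sample_items)
print(rows[0][-1], rows[1][-1])
```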

In [11]:
newton_venues = getNearbyVenues(names=cleanNbDFwithCoor['Village'],
                                   latitudes=cleanNbDFwithCoor['Latitude'],
                                   longitudes=cleanNbDFwithCoor['Longitude']
                                  )

BRIGHTON
CHESTNUT HILL
WABAN
AUBURNDALE
NEWTON
NEWTON UPPER FALLS
NEWTON LOWER FALLS
NEWTONVILLE
WEST NEWTON
NEWTON CENTER
NEWTON HIGHLANDS


In [12]:
print(newton_venues.shape)

(160, 7)


Counting the number of venues returned for each village

In [13]:
newton_venues.groupby('Village').count()

Village,Village Latitude,Village Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
AUBURNDALE,22,22,22,22,22,22
BRIGHTON,34,34,34,34,34,34
CHESTNUT HILL,2,2,2,2,2,2
NEWTON,15,15,15,15,15,15
NEWTON CENTER,4,4,4,4,4,4
NEWTON HIGHLANDS,22,22,22,22,22,22
NEWTON LOWER FALLS,11,11,11,11,11,11
NEWTON UPPER FALLS,10,10,10,10,10,10
NEWTONVILLE,13,13,13,13,13,13
WABAN,4,4,4,4,4,4
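
Since `count()` reports the same number in every column, a more compact view is possible with `size()`, which returns a single count per group. A minimal sketch on a toy frame (the rows are made up for illustration):

```python
import pandas as pd

toy_venues = pd.DataFrame({
    "Village": ["WABAN", "WABAN", "NEWTON"],
    "Venue": ["A", "B", "C"],
})

# One number per village, sorted so the best-served village comes first.
venue_counts = toy_venues.groupby("Village").size().sort_values(ascending=False)
print(venue_counts)
```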


In [14]:
newton_venues.head()

Unnamed: 0,Village,Village Latitude,Village Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,BRIGHTON,42.358197,-71.144008,The Flatbread Company,42.356927,-71.144099,Pizza Place
1,BRIGHTON,42.358197,-71.144008,NB Fitness Club,42.357121,-71.146161,Gym
2,BRIGHTON,42.358197,-71.144008,Kohi Coffee,42.356692,-71.142516,Café
3,BRIGHTON,42.358197,-71.144008,Warrior Ice Arena,42.357094,-71.143708,Hockey Rink
4,BRIGHTON,42.358197,-71.144008,Lincoln Bar & Grill,42.358851,-71.146949,Bar


## Analyze Each Neighborhood

In [15]:
# one hot encoding
newton_onehot = pd.get_dummies(newton_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
newton_onehot['Village'] = newton_venues['Village'] 

# move neighborhood column to the first column
fixed_columns = [newton_onehot.columns[-1]] + list(newton_onehot.columns[:-1])
newton_onehot= newton_onehot[fixed_columns]

newton_onehot.head()

Unnamed: 0,Village,ATM,American Restaurant,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Breakfast Spot,Building,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Cultural Center,Dance Studio,Diner,Discount Store,Donut Shop,Dry Cleaner,Entertainment Service,Farmers Market,Fast Food Restaurant,Flower Shop,Food Truck,French Restaurant,Furniture / Home Store,Gift Shop,Greek Restaurant,Grocery Store,Gym,Gymnastics Gym,Hockey Rink,Hotel Pool,Ice Cream Shop,Indian Restaurant,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Lawyer,Liquor Store,Massage Studio,Metro Station,Mobile Phone Shop,Multiplex,Music Store,Music Venue,Office,Park,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Pub,Rest Area,Restaurant,Salon / Barbershop,Sandwich Place,Shipping Store,Shoe Store,Shopping Mall,Soccer Field,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Tennis Court,Thai Restaurant,Theater,Train Station
0,BRIGHTON,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,BRIGHTON,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,BRIGHTON,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,BRIGHTON,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,BRIGHTON,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [16]:
newton_onehot.shape

(160, 88)

Group rows by village and take the mean of each one-hot column, which gives the frequency of each venue category within that village

In [17]:
newton_grouped = newton_onehot.groupby('Village').mean().reset_index()
newton_grouped

Unnamed: 0,Village,ATM,American Restaurant,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Breakfast Spot,Building,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Cultural Center,Dance Studio,Diner,Discount Store,Donut Shop,Dry Cleaner,Entertainment Service,Farmers Market,Fast Food Restaurant,Flower Shop,Food Truck,French Restaurant,Furniture / Home Store,Gift Shop,Greek Restaurant,Grocery Store,Gym,Gymnastics Gym,Hockey Rink,Hotel Pool,Ice Cream Shop,Indian Restaurant,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Lawyer,Liquor Store,Massage Studio,Metro Station,Mobile Phone Shop,Multiplex,Music Store,Music Venue,Office,Park,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Pub,Rest Area,Restaurant,Salon / Barbershop,Sandwich Place,Shipping Store,Shoe Store,Shopping Mall,Soccer Field,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Tennis Court,Thai Restaurant,Theater,Train Station
0,AUBURNDALE,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.090909,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.045455,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455
1,BRIGHTON,0.029412,0.029412,0.0,0.058824,0.0,0.0,0.0,0.0,0.029412,0.058824,0.029412,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.029412,0.029412,0.0,0.0,0.029412,0.029412,0.0,0.029412,0.0,0.0,0.029412,0.058824,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.058824,0.0,0.0,0.0,0.029412,0.0,0.029412,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.029412,0.029412,0.0,0.0,0.0,0.029412,0.029412,0.0,0.0,0.0,0.0,0.0,0.029412,0.029412,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.029412,0.029412,0.0,0.0,0.0,0.0,0.0,0.0
2,CHESTNUT HILL,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,NEWTON,0.0,0.133333,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.066667,0.0,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,NEWTON CENTER,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0
5,NEWTON HIGHLANDS,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.136364,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.090909,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455
6,NEWTON LOWER FALLS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.090909,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,NEWTON UPPER FALLS,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,NEWTONVILLE,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.153846,0.076923,0.0,0.0,0.076923,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.076923,0.0,0.0,0.0,0.0,0.0,0.0
9,WABAN,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.75,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [18]:
newton_grouped.shape

(11, 88)
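
To make the encoding and grouping steps concrete, here is a small sketch of one-hot encoding followed by a grouped mean. The toy rows are made up. With two venues in one village, each category that appears once ends up with a frequency of 0.5.

```python
import pandas as pd

toy = pd.DataFrame({
    "Village": ["CHESTNUT HILL", "CHESTNUT HILL", "WABAN"],
    "Venue Category": ["Playground", "Soccer Field", "Park"],
})

# One indicator column per category, cast to float so the mean is a frequency.
onehot = pd.get_dummies(toy["Venue Category"]).astype(float)
onehot.insert(0, "Village", toy["Village"])

# Mean of the indicators per village = share of venues in each category.
grouped = onehot.groupby("Village").mean().reset_index()
print(grouped)
```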

Print each neighborhood along with its top 5 most common venues

In [19]:
num_top_venues = 5

for hood in newton_grouped['Village']:
    print("----"+hood+"----")
    temp = newton_grouped[newton_grouped['Village'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----AUBURNDALE----
            venue  freq
0    Dance Studio  0.09
1     Pizza Place  0.09
2  Ice Cream Shop  0.05
3           Diner  0.05
4      Donut Shop  0.05


----BRIGHTON----
                    venue  freq
0      Athletics & Sports  0.06
1              Donut Shop  0.06
2              Shoe Store  0.06
3  Furniture / Home Store  0.06
4                     Bar  0.06


----CHESTNUT HILL----
          venue  freq
0    Playground   0.5
1  Soccer Field   0.5
2         Plaza   0.0
3   Pizza Place   0.0
4      Pharmacy   0.0


----NEWTON----
                 venue  freq
0  American Restaurant  0.13
1                  Gym  0.13
2                  Spa  0.07
3           Sports Bar  0.07
4           Donut Shop  0.07


----NEWTON CENTER----
                 venue  freq
0                  Spa  0.25
1         Tennis Court  0.25
2     Sushi Restaurant  0.25
3  Japanese Restaurant  0.25
4        Metro Station  0.00


----NEWTON HIGHLANDS----
              venue  freq
0        Donut Shop  0.14
1 

### Putting the above information into a dataframe

In [20]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [21]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Village']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Village'] = newton_grouped['Village']

for ind in np.arange(newton_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(newton_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Village,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,AUBURNDALE,Dance Studio,Pizza Place,Train Station,Grocery Store,Liquor Store,Italian Restaurant,Ice Cream Shop,Theater,Gym,Flower Shop
1,BRIGHTON,Shoe Store,Athletics & Sports,Bar,Furniture / Home Store,Donut Shop,Hockey Rink,Music Venue,Liquor Store,Indian Restaurant,Gym
2,CHESTNUT HILL,Playground,Soccer Field,Greek Restaurant,Gift Shop,Concert Hall,Convenience Store,Cosmetics Shop,Cultural Center,Dance Studio,Diner
3,NEWTON,American Restaurant,Gym,Pizza Place,Spa,Donut Shop,Breakfast Spot,Liquor Store,Shoe Store,Restaurant,Sports Bar
4,NEWTON CENTER,Japanese Restaurant,Tennis Court,Sushi Restaurant,Spa,Dry Cleaner,Concert Hall,Convenience Store,Cosmetics Shop,Cultural Center,Dance Studio
5,NEWTON HIGHLANDS,Donut Shop,Metro Station,Train Station,Shipping Store,Gymnastics Gym,Dry Cleaner,Liquor Store,Mobile Phone Shop,Chinese Restaurant,Furniture / Home Store
6,NEWTON LOWER FALLS,Rest Area,Metro Station,Gym,Hotel Pool,Steakhouse,Pool,Intersection,Fast Food Restaurant,Bus Station,Donut Shop
7,NEWTON UPPER FALLS,Pet Store,Coffee Shop,Asian Restaurant,Music Store,Irish Pub,Spa,Entertainment Service,Baseball Field,Lawyer,Restaurant
8,NEWTONVILLE,Fast Food Restaurant,Chinese Restaurant,Restaurant,Furniture / Home Store,Flower Shop,Farmers Market,Mobile Phone Shop,Clothing Store,Gym,Auto Workshop
9,WABAN,Farmers Market,Park,Train Station,Dry Cleaner,Convenience Store,Cosmetics Shop,Cultural Center,Dance Studio,Diner,Discount Store
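
In the table above, villages with few venues (for example CHESTNUT HILL, which has only two) are padded with categories whose frequency is 0, so entries such as "Greek Restaurant" are not actually present there. One way to avoid this, sketched below as an alternative to `return_most_common_venues` rather than as the notebook's method, is to drop zero-frequency categories before ranking. `top_present_venues` is a hypothetical name, and the sample row uses the two CHESTNUT HILL frequencies shown earlier.

```python
import pandas as pd

def top_present_venues(row, num_top_venues):
    """Rank only the categories that actually occur in the village."""
    freqs = row.iloc[1:].astype(float)  # skip the 'Village' entry
    present = freqs[freqs > 0].sort_values(ascending=False)
    names = list(present.index[:num_top_venues])
    # Pad with None so every row has the same length.
    return names + [None] * (num_top_venues - len(names))

row = pd.Series({"Village": "CHESTNUT HILL", "Park": 0.0,
                 "Playground": 0.5, "Soccer Field": 0.5})
result = top_present_venues(row, 4)
print(result)
```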


## Comparing Housing Prices

The data was downloaded from Redfin and covers single-family houses and townhouses sold in Newton over the last 3 years. The houses of interest have at least 3 bedrooms and at least 2 bathrooms, with a price between \$400K and \$1 million. The data was saved as a .csv file.

Read in the csv file and create a data frame

In [22]:
houseDF = pd.read_csv('redfinhousenewton.csv')
houseDF.head()

Unnamed: 0,SALE TYPE,SOLD DATE,PROPERTY TYPE,ADDRESS,CITY,STATE,ZIP,PRICE,BEDS,BATHS,LOCATION,SQUARE FEET,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,STATUS,NEXT OPEN HOUSE START TIME,NEXT OPEN HOUSE END TIME,URL (SEE http://www.redfin.com/buy-a-home/comparative-market-analysis FOR INFO ON PRICING),SOURCE,MLS#,FAVORITE,INTERESTED,LATITUDE,LONGITUDE
0,PAST SALE,April-18-2018,Single Family Residential,12 Carter St,Newton,MA,2460,835000,3,2.5,Newtonville,1464,6750.0,1910,188.0,570,,Sold,,,http://www.redfin.com/MA/Newton/12-Carter-St-0...,MLS PIN,72284627.0,N,Y,42.351717,-71.19892
1,PAST SALE,August-22-2018,Single Family Residential,195 Waltham St,Newton,MA,2465,925000,4,2.5,Newton,1900,8220.0,1928,62.0,487,,Sold,,,http://www.redfin.com/MA/West-Newton/195-Walth...,MLS PIN,72350560.0,N,Y,42.360198,-71.222988
2,PAST SALE,November-20-2017,Single Family Residential,305 Woodcliff Rd,Newton,MA,2461,920000,3,2.0,Newton,1696,7704.0,1955,337.0,542,,Sold,,,http://www.redfin.com/MA/Newton-Highlands/305-...,MLS PIN,72235320.0,N,Y,42.312755,-71.2015
3,PAST SALE,March-24-2017,Single Family Residential,73 Canterbury Rd,Newton,MA,2461,867000,3,2.5,Newton Highlands,1872,5270.0,1940,578.0,463,,Sold,,,http://www.redfin.com/MA/Newton/73-Canterbury-...,MLS PIN,72124686.0,N,Y,42.319825,-71.220204
4,PAST SALE,August-25-2016,Single Family Residential,39 Rowena Rd,Newton,MA,2459,837500,4,2.5,Newton,2252,12864.0,1955,789.0,372,,Sold,,,http://www.redfin.com/MA/Newton-Centre/39-Rowe...,MLS PIN,72027576.0,N,Y,42.323054,-71.197962


Clean up the data by keeping only the columns of interest: zip code, sale price, number of bedrooms, number of bathrooms, and price per square foot

In [23]:
cleanHouseDF = houseDF[['ZIP', 'PRICE', 'BEDS', 'BATHS', '$/SQUARE FEET']]

In [24]:
cleanHouseDF.head()

Unnamed: 0,ZIP,PRICE,BEDS,BATHS,$/SQUARE FEET
0,2460,835000,3,2.5,570
1,2465,925000,4,2.5,487
2,2461,920000,3,2.0,542
3,2461,867000,3,2.5,463
4,2459,837500,4,2.5,372


In [78]:
houseZipCode = cleanHouseDF.groupby('ZIP')[['PRICE', 'BEDS', 'BATHS', '$/SQUARE FEET']].mean()

In [79]:
houseZipCode.head()

ZIP,PRICE,BEDS,BATHS,$/SQUARE FEET
2132,661633.333333,3.0,2.5,269.0
2453,618777.777778,3.444444,2.722222,326.444444
2458,782705.744681,3.340426,2.56383,412.319149
2459,821802.692308,3.358974,2.455128,453.794872
2460,818204.166667,3.375,2.583333,431.083333


In [88]:
housePrice = houseZipCode.reset_index()
# ZIP was read as an integer, so restore the leading zero as a 5-character string
housePrice['ZIP'] = housePrice['ZIP'].astype(str).str.zfill(5)
housePrice

Unnamed: 0,ZIP,PRICE,BEDS,BATHS,$/SQUARE FEET
0,2132,661633.333333,3.0,2.5,269.0
1,2453,618777.777778,3.444444,2.722222,326.444444
2,2458,782705.744681,3.340426,2.56383,412.319149
3,2459,821802.692308,3.358974,2.455128,453.794872
4,2460,818204.166667,3.375,2.583333,431.083333
5,2461,816682.051282,3.307692,2.384615,468.076923
6,2462,836500.0,3.166667,2.25,455.5
7,2464,726630.769231,3.205128,2.538462,401.564103
8,2465,781904.081633,3.489796,2.316327,451.285714
9,2466,823431.818182,3.5,2.386364,432.818182
