<h1 align=center><font size = 5>Comparing Neighborhoods of Newton, MA</font></h1>

# 1. Purpose & Introduction

This notebook compares the neighborhoods (also known as villages) of the city of Newton, MA. The comparison is meant to help families who want to move to Newton choose the neighborhood best suited to their needs. Generally, a family will want to look at what each neighborhood has to offer. For example, they may want to be close to public transit, or to have playgrounds nearby if they have young children. A couple without kids may want access to many restaurants. A family that wants to buy a house will also consider housing prices.

# 2. Data Source and Description

The data of interest for a family include nearby amenities/venues, which can be obtained from Foursquare, and housing prices, which will be obtained from Redfin, a website with extensive data on sold house prices. Crime data for the city will be obtained as well.

The venue data from Foursquare will be used to determine the top venues in each neighborhood, which helps a family see whether the amenities there meet their needs. The house price data will help a family determine whether they can afford a house in a given neighborhood. The crime data helps a family judge how safe a neighborhood is.

## Obtaining Data from Different Sources

In [1]:
#Before we get the data and start exploring it, let's download all the dependencies that we will need.
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

#!conda install -c conda-forge shapely --yes
#!conda install -c conda-forge geopandas --yes
#!conda install -c conda-forge geojsonio --yes
print('Libraries imported.')

Libraries imported.


<a id='item1'></a>

# 3. Data Source & Exploration

### Getting Information About the Different Neighborhoods

Download the GeoJSON file from the Newton, MA GIS repository on GitHub

In [2]:
import requests

url = 'https://raw.githubusercontent.com/NewtonMAGIS/GISData/master/Zip%20Codes/ZipCodes.geojson'
results = requests.get(url)
newtonData = results.json()


newtonData['features'][1]['properties']

{'Village_PO': 'CHESTNUT HILL', 'ZIPCODE': '02467'}

Reading the file to extract the villages and their zip codes

In [3]:
nbdataDict = {"Village":[], 'ZipCode':[]}
for i in newtonData['features']:
    print(i['properties'])
    nbdataDict["Village"].append(i['properties']['Village_PO'])
    nbdataDict['ZipCode'].append(i['properties']['ZIPCODE'])  
nbDf = pd.DataFrame.from_dict(nbdataDict)
nbDf

{'Village_PO': 'BRIGHTON', 'ZIPCODE': '02135'}
{'Village_PO': 'CHESTNUT HILL', 'ZIPCODE': '02467'}
{'Village_PO': 'WABAN', 'ZIPCODE': '02468'}
{'Village_PO': 'WABAN', 'ZIPCODE': '02468'}
{'Village_PO': 'AUBURNDALE', 'ZIPCODE': '02466'}
{'Village_PO': 'CHESTNUT HILL', 'ZIPCODE': '02467'}
{'Village_PO': 'NEWTON', 'ZIPCODE': '02458'}
{'Village_PO': 'NEWTON UPPER FALLS', 'ZIPCODE': '02464'}
{'Village_PO': 'NEWTON LOWER FALLS', 'ZIPCODE': '02462'}
{'Village_PO': 'NEWTONVILLE', 'ZIPCODE': '02460'}
{'Village_PO': 'WEST NEWTON', 'ZIPCODE': '02465'}
{'Village_PO': 'NEWTON CENTER', 'ZIPCODE': '02459'}
{'Village_PO': 'NEWTON HIGHLANDS', 'ZIPCODE': '02461'}


Unnamed: 0,Village,ZipCode
0,BRIGHTON,2135
1,CHESTNUT HILL,2467
2,WABAN,2468
3,WABAN,2468
4,AUBURNDALE,2466
5,CHESTNUT HILL,2467
6,NEWTON,2458
7,NEWTON UPPER FALLS,2464
8,NEWTON LOWER FALLS,2462
9,NEWTONVILLE,2460


Removing duplicate rows to get one entry per village and zip code

In [4]:
# drop duplicate rows and renumber the index without keeping the old one
cleanNbDF = nbDf.drop_duplicates().reset_index(drop=True)
cleanNbDF

Unnamed: 0,Village,ZipCode
0,BRIGHTON,2135
1,CHESTNUT HILL,2467
2,WABAN,2468
3,AUBURNDALE,2466
4,NEWTON,2458
5,NEWTON UPPER FALLS,2464
6,NEWTON LOWER FALLS,2462
7,NEWTONVILLE,2460
8,WEST NEWTON,2465
9,NEWTON CENTER,2459
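
The de-duplication step above can also be sketched in plain Python. The sample records are copied from the GeoJSON properties printed earlier; the helper name `unique_villages` is hypothetical. ZIP codes such as 02135 start with a zero, so this sketch keeps them as strings throughout.

```python
# Sample properties copied from the GeoJSON output printed earlier.
features = [
    {"properties": {"Village_PO": "BRIGHTON", "ZIPCODE": "02135"}},
    {"properties": {"Village_PO": "WABAN", "ZIPCODE": "02468"}},
    {"properties": {"Village_PO": "WABAN", "ZIPCODE": "02468"}},
    {"properties": {"Village_PO": "CHESTNUT HILL", "ZIPCODE": "02467"}},
]


def unique_villages(features):
    """Return (village, zip) pairs in first-seen order, without duplicates."""
    seen = set()
    pairs = []
    for feature in features:
        props = feature["properties"]
        pair = (props["Village_PO"], props["ZIPCODE"])
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


villages = unique_villages(features)
print(villages)
```

Keeping the ZIP code as a string makes it safe to use as a join key later, since `"02135"` and `2135` would not match.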


### Getting the latitude and longitude of each neighborhood

In [5]:
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
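from geopy.geocoders import Nominatim
# recent geopy versions require a user_agent; the name below is an arbitrary placeholder
geolocator = Nominatim(user_agent="newton_neighborhoods")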



In [13]:
coordinate = {"ZipCode": [], "Latitude":[], 'Longitude':[]}
for zipcode, v in zip(cleanNbDF['ZipCode'], cleanNbDF['Village']):
    address = v + ', Newton, MA, ' + str(zipcode)
    location = geolocator.geocode(address)
    # geocode() returns None when the address cannot be resolved
    coordinate['Latitude'].append(location.latitude if location else None)
    coordinate['Longitude'].append(location.longitude if location else None)
    coordinate['ZipCode'].append(zipcode)
coorDF = pd.DataFrame.from_dict(coordinate)
coorDF.head(11)

Unnamed: 0,ZipCode,Latitude,Longitude
0,2135,42.350097,-71.156442
1,2467,42.330653,-71.162276
2,2468,42.327348,-71.229276
3,2466,42.346446,-71.248693
4,2458,42.31,-71.214
5,2464,42.313986,-71.219499
6,2462,42.329172,-71.258548
7,2460,42.350097,-71.203666
8,2465,42.350097,-71.232833
9,2459,42.330653,-71.199499


In [14]:
cleanNbDFwithCoor = pd.merge(cleanNbDF, coorDF, on = 'ZipCode', how = 'left')
cleanNbDFwithCoor

Unnamed: 0,Village,ZipCode,Latitude,Longitude
0,BRIGHTON,2135,42.350097,-71.156442
1,CHESTNUT HILL,2467,42.330653,-71.162276
2,WABAN,2468,42.327348,-71.229276
3,AUBURNDALE,2466,42.346446,-71.248693
4,NEWTON,2458,42.31,-71.214
5,NEWTON UPPER FALLS,2464,42.313986,-71.219499
6,NEWTON LOWER FALLS,2462,42.329172,-71.258548
7,NEWTONVILLE,2460,42.350097,-71.203666
8,WEST NEWTON,2465,42.350097,-71.232833
9,NEWTON CENTER,2459,42.330653,-71.199499
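
Because a geocoder can return `None` for an address it cannot resolve, the lookup loop can be written so that the geocoding call is passed in as a parameter. The sketch below is one possible arrangement, not the notebook's own code: `geocode_villages` and `fake_geocode` are hypothetical names, and the stub stands in for `geolocator.geocode` so the example runs without network access.

```python
from collections import namedtuple

# Minimal stand-in for the object a geocoder returns.
Location = namedtuple("Location", ["latitude", "longitude"])


def geocode_villages(rows, geocode):
    """Geocode (village, zip) pairs.

    `geocode` is any callable that takes an address string and returns an
    object with .latitude and .longitude, or None when nothing is found.
    """
    results = []
    for village, zipcode in rows:
        address = "{}, Newton, MA, {}".format(village, zipcode)
        location = geocode(address)
        if location is None:
            # Keep the row so the output stays aligned with the input.
            results.append((village, zipcode, None, None))
        else:
            results.append((village, zipcode, location.latitude, location.longitude))
    return results


def fake_geocode(address):
    """Hypothetical stub; the coordinates are taken from the table above."""
    known = {"WABAN, Newton, MA, 02468": Location(42.327348, -71.229276)}
    return known.get(address)


rows = [("WABAN", "02468"), ("NOWHERE", "00000")]
coords = geocode_villages(rows, fake_geocode)
print(coords)
```

With geopy installed, `geocode_villages(rows, geolocator.geocode)` would use the real service instead of the stub.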


### Create a map of Newton with the different villages using Folium

In [15]:
# center the map on the NEWTON row (index 4)
latitude = cleanNbDFwithCoor['Latitude'][4]
longitude = cleanNbDFwithCoor['Longitude'][4]
# create map of Newton using latitude and longitude values
map_Newton = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, village, zipcode in zip(cleanNbDFwithCoor['Latitude'], cleanNbDFwithCoor['Longitude'], cleanNbDFwithCoor['Village'],
                                     cleanNbDFwithCoor['ZipCode']):
    label = '{}, {}'.format(village, zipcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Newton)  
    
map_Newton
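
The map above is centered on one specific row of the table. An alternative, sketched here under the assumption that a rough visual center is enough, is to average the village coordinates. The helper name `map_center` is hypothetical, and the three sample points are copied from the coordinate table.

```python
def map_center(points):
    """Return the mean (latitude, longitude) of a list of (lat, lng) pairs."""
    lats = [lat for lat, _ in points]
    lngs = [lng for _, lng in points]
    return sum(lats) / len(lats), sum(lngs) / len(lngs)


# Brighton, Chestnut Hill and Waban, from the coordinate table above.
points = [
    (42.350097, -71.156442),
    (42.330653, -71.162276),
    (42.327348, -71.229276),
]
center = map_center(points)
print(center)
```

The resulting pair could be passed as `location=[center[0], center[1]]` when building the map.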

## Define Foursquare Credentials and Version

In [20]:
LIMIT = 300
radius = 5000
CLIENT_ID = 'ERVR3FIDFM1HN22OBNBPE4O1X3TBMR4IXTC5LRM51RLHHJ0G' # your Foursquare ID
CLIENT_SECRET = 'URA5FQSSJQ2W0TNUGAEH1ZOBYSSIAYZEMQOXJA1LKOKPEJ4D' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: ERVR3FIDFM1HN22OBNBPE4O1X3TBMR4IXTC5LRM51RLHHJ0G
CLIENT_SECRET:URA5FQSSJQ2W0TNUGAEH1ZOBYSSIAYZEMQOXJA1LKOKPEJ4D
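
A common alternative to writing credentials directly in a notebook cell is to read them from environment variables and to mask them when printing. The following is a minimal sketch of that approach; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for illustration, and the demo values are placeholders.

```python
import os


def load_foursquare_credentials(environ=os.environ):
    """Read the client ID and secret from a mapping of environment variables.

    The variable names used here are assumptions, not a Foursquare convention.
    """
    try:
        return environ["FOURSQUARE_CLIENT_ID"], environ["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + "*" * (len(secret) - visible)


# Demo with placeholder values instead of the real environment.
demo_env = {"FOURSQUARE_CLIENT_ID": "abcd1234", "FOURSQUARE_CLIENT_SECRET": "wxyz5678"}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID: " + mask(client_id))
print("CLIENT_SECRET: " + mask(client_secret))
```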


### Exploring Newton

Create a function that gets the venues near each village

In [21]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Village', 
                  'Village Latitude', 
                  'Village Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
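
The parsing part of `getNearbyVenues` can be separated from the HTTP request so that it can be checked without calling the API. The sketch below assumes the same response layout the function above indexes into (`response` → `groups[0]` → `items` → `venue`); `extract_venues` is a hypothetical helper, and the second venue in the sample payload is made up to show a venue without categories.

```python
def extract_venues(village, lat, lng, payload):
    """Flatten one explore-style response into rows of
    (village, village lat, village lng, venue, venue lat, venue lng, category)."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # A venue may have no category; record None instead of failing.
        category = categories[0]["name"] if categories else None
        rows.append((village, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Sample payload: the first venue matches a row shown later in the notebook,
# the second one is invented for the example.
payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Jim's Deli",
               "location": {"lat": 42.349267, "lng": -71.154088},
               "categories": [{"name": "Deli / Bodega"}]}},
    {"venue": {"name": "Unnamed Venue",
               "location": {"lat": 42.35, "lng": -71.15},
               "categories": []}},
]}]}}

rows = extract_venues("BRIGHTON", 42.350097, -71.156442, payload)
print(rows)
```

Returning an empty list for an empty or unexpected payload keeps one failed request from interrupting the loop over all villages.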

In [22]:
newton_venues = getNearbyVenues(names=cleanNbDFwithCoor['Village'],
                                   latitudes=cleanNbDFwithCoor['Latitude'],
                                   longitudes=cleanNbDFwithCoor['Longitude']
                                  )

BRIGHTON
CHESTNUT HILL
WABAN
AUBURNDALE
NEWTON
NEWTON UPPER FALLS
NEWTON LOWER FALLS
NEWTONVILLE
WEST NEWTON
NEWTON CENTER
NEWTON HIGHLANDS


In [23]:
print(newton_venues.shape)

(249, 7)


Counting number of venues for different neighborhoods

In [24]:
newton_venues.groupby('Village').count()

Village,Village Latitude,Village Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
AUBURNDALE,22,22,22,22,22,22
BRIGHTON,47,47,47,47,47,47
CHESTNUT HILL,1,1,1,1,1,1
NEWTON,40,40,40,40,40,40
NEWTON CENTER,22,22,22,22,22,22
NEWTON HIGHLANDS,24,24,24,24,24,24
NEWTON LOWER FALLS,21,21,21,21,21,21
NEWTON UPPER FALLS,10,10,10,10,10,10
NEWTONVILLE,35,35,35,35,35,35
WABAN,12,12,12,12,12,12


In [25]:
newton_venues.head()

Unnamed: 0,Village,Village Latitude,Village Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,BRIGHTON,42.350097,-71.156442,Jim's Deli,42.349267,-71.154088,Deli / Bodega
1,BRIGHTON,42.350097,-71.156442,Cafenation,42.349177,-71.154091,Coffee Shop
2,BRIGHTON,42.350097,-71.156442,Johnny D's Fruit & Produce,42.349239,-71.154376,Grocery Store
3,BRIGHTON,42.350097,-71.156442,Esperia Grill,42.349016,-71.152825,Greek Restaurant
4,BRIGHTON,42.350097,-71.156442,Little Pizza King,42.349211,-71.15474,Pizza Place


## Analyze Each Neighborhood

In [26]:
# one hot encoding
newton_onehot = pd.get_dummies(newton_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
newton_onehot['Village'] = newton_venues['Village'] 

# move neighborhood column to the first column
fixed_columns = [newton_onehot.columns[-1]] + list(newton_onehot.columns[:-1])
newton_onehot= newton_onehot[fixed_columns]

newton_onehot.head()

Unnamed: 0,Village,ATM,American Restaurant,Art Gallery,Arts & Crafts Store,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Bookstore,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Café,Candy Store,Chinese Restaurant,Clothing Store,Coffee Shop,Community Center,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Dry Cleaner,Entertainment Service,Farmers Market,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Furniture / Home Store,Gastropub,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Hardware Store,Health & Beauty Service,Ice Cream Shop,Irish Pub,Italian Restaurant,Japanese Restaurant,Lake,Lawyer,Liquor Store,Marijuana Dispensary,Martial Arts Dojo,Massage Studio,Mattress Store,Men's Store,Metro Station,Mexican Restaurant,Mobile Phone Shop,Music Store,Neighborhood,Optical Shop,Organic Grocery,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pub,Ramen Restaurant,Rental Service,Rest Area,Restaurant,Salon / Barbershop,Sandwich Place,Shipping Store,Shoe Store,Shopping Mall,Smoke Shop,South American Restaurant,Spa,Sporting Goods Shop,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tanning Salon,Tennis Court,Thai Restaurant,Theater,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Wine Shop,Yoga Studio
0,BRIGHTON,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,BRIGHTON,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,BRIGHTON,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,BRIGHTON,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,BRIGHTON,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [27]:
newton_onehot.shape

(249, 103)

Group rows by neighborhood and calculate the mean frequency of occurrence of each venue category

In [28]:
newton_grouped =newton_onehot.groupby('Village').mean().reset_index()
newton_grouped

Unnamed: 0,Village,ATM,American Restaurant,Art Gallery,Arts & Crafts Store,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Bookstore,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Café,Candy Store,Chinese Restaurant,Clothing Store,Coffee Shop,Community Center,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Dry Cleaner,Entertainment Service,Farmers Market,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Furniture / Home Store,Gastropub,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Hardware Store,Health & Beauty Service,Ice Cream Shop,Irish Pub,Italian Restaurant,Japanese Restaurant,Lake,Lawyer,Liquor Store,Marijuana Dispensary,Martial Arts Dojo,Massage Studio,Mattress Store,Men's Store,Metro Station,Mexican Restaurant,Mobile Phone Shop,Music Store,Neighborhood,Optical Shop,Organic Grocery,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pub,Ramen Restaurant,Rental Service,Rest Area,Restaurant,Salon / Barbershop,Sandwich Place,Shipping Store,Shoe Store,Shopping Mall,Smoke Shop,South American Restaurant,Spa,Sporting Goods Shop,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tanning Salon,Tennis Court,Thai Restaurant,Theater,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Wine Shop,Yoga Studio
0,AUBURNDALE,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.045455,0.045455,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0
1,BRIGHTON,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.06383,0.042553,0.021277,0.0,0.0,0.0,0.021277,0.042553,0.0,0.0,0.021277,0.0,0.06383,0.0,0.06383,0.0,0.021277,0.042553,0.0,0.021277,0.0,0.021277,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.042553,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.021277,0.06383,0.0,0.0,0.06383,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.0,0.021277,0.021277,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.021277,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.021277,0.0
2,CHESTNUT HILL,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,NEWTON,0.0,0.0,0.0,0.05,0.0,0.025,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.025,0.025,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.075,0.0,0.0,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.05,0.025,0.0,0.0,0.0,0.0,0.05,0.0,0.025,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.075,0.0,0.0,0.0,0.025,0.025,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.025,0.075,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.025,0.0,0.0,0.0,0.0
4,NEWTON CENTER,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.090909,0.045455,0.0,0.0,0.0,0.045455,0.090909,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,NEWTON HIGHLANDS,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.041667,0.083333,0.041667,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.041667,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0
6,NEWTON LOWER FALLS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.047619,0.0,0.0,0.047619,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.047619,0.0,0.047619,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.047619,0.142857,0.0,0.047619,0.0,0.0,0.047619,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0
7,NEWTON UPPER FALLS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0
8,NEWTONVILLE,0.0,0.028571,0.057143,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.057143,0.0,0.028571,0.0,0.057143,0.0,0.0,0.0,0.028571,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.085714,0.0,0.0,0.0,0.0,0.0,0.085714,0.028571,0.0,0.085714,0.0,0.0,0.028571,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.028571,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.028571
9,WABAN,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.083333,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [29]:
newton_grouped.shape

(11, 103)
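
Taking the mean of one-hot columns per village is the same as computing, for each village, the share of its venues that fall into each category. The plain-Python sketch below illustrates that equivalence on a tiny made-up sample; `category_frequencies` is a hypothetical helper and the venue rows are not real data.

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """Map each village to {category: share of that village's venues}."""
    counts = defaultdict(Counter)
    for village, category in venues:
        counts[village][category] += 1
    return {
        village: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for village, counter in counts.items()
    }


# Made-up (village, category) rows for illustration only.
venues = [
    ("WABAN", "Deli / Bodega"),
    ("WABAN", "Deli / Bodega"),
    ("WABAN", "Coffee Shop"),
    ("WABAN", "Metro Station"),
    ("CHESTNUT HILL", "Metro Station"),
]
freqs = category_frequencies(venues)
print(freqs)
```

A village with a single venue, like the second one here, ends up with a frequency of 1.0 for that one category, which is why small venue counts should be read with care.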

Print each neighborhood along with its top 5 most common venues

In [30]:
num_top_venues = 5

for hood in newton_grouped['Village']:
    print("----"+hood+"----")
    temp = newton_grouped[newton_grouped['Village'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----AUBURNDALE----
                venue  freq
0      Baseball Field  0.09
1                 ATM  0.05
2         Coffee Shop  0.05
3        Liquor Store  0.05
4  Italian Restaurant  0.05


----BRIGHTON----
                venue  freq
0         Coffee Shop  0.06
1              Bakery  0.06
2  Chinese Restaurant  0.06
3         Pizza Place  0.06
4                 Pub  0.06


----CHESTNUT HILL----
           venue  freq
0  Metro Station   1.0
1            ATM   0.0
2   Optical Shop   0.0
3            Pub   0.0
4          Plaza   0.0


----NEWTON----
                    venue  freq
0     Sporting Goods Shop  0.08
1            Liquor Store  0.08
2        Department Store  0.08
3                     Gym  0.05
4  Furniture / Home Store  0.05


----NEWTON CENTER----
                 venue  freq
0       Sandwich Place  0.09
1                  Spa  0.09
2                 Lake  0.05
3  Sporting Goods Shop  0.05
4   Mexican Restaurant  0.05


----NEWTON HIGHLANDS----
              venue  freq
0   

### Putting the above information into a dataframe

In [31]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
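
When a village has very few venues, most categories have a frequency of 0, and sorting alone still fills every "top" slot with them in arbitrary order. The following sketch shows one way to keep only categories that actually occur, with ties broken alphabetically. `top_categories` is a hypothetical helper, and the sample frequencies mirror the Chestnut Hill output printed above.

```python
def top_categories(frequencies, n):
    """Return up to n category names with non-zero frequency,
    highest first; ties are broken alphabetically."""
    nonzero = [(cat, f) for cat, f in frequencies.items() if f > 0]
    nonzero.sort(key=lambda pair: (-pair[1], pair[0]))
    return [cat for cat, _ in nonzero[:n]]


chestnut_hill = {"Metro Station": 1.0, "ATM": 0.0, "Optical Shop": 0.0, "Pub": 0.0}
auburndale = {"Baseball Field": 0.09, "ATM": 0.05, "Coffee Shop": 0.05}

print(top_categories(chestnut_hill, 5))
print(top_categories(auburndale, 2))
```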

In [32]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Village']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Village'] = newton_grouped['Village']

for ind in np.arange(newton_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(newton_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Village,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,AUBURNDALE,Baseball Field,ATM,Shopping Mall,Italian Restaurant,Gym / Fitness Center,Gym,Grocery Store,Liquor Store,Gift Shop,Flower Shop
1,BRIGHTON,Chinese Restaurant,Pizza Place,Pub,Coffee Shop,Bakery,Grocery Store,Deli / Bodega,Bus Station,Dry Cleaner,Bank
2,CHESTNUT HILL,Metro Station,Yoga Studio,Gastropub,Dessert Shop,Diner,Donut Shop,Dry Cleaner,Entertainment Service,Farmers Market,Fast Food Restaurant
3,NEWTON,Liquor Store,Sporting Goods Shop,Department Store,Arts & Crafts Store,Pizza Place,Gym,Furniture / Home Store,Coffee Shop,Pet Store,Business Service
4,NEWTON CENTER,Spa,Sandwich Place,Chinese Restaurant,Lake,Mexican Restaurant,Diner,Coffee Shop,Pharmacy,Pizza Place,Playground
5,NEWTON HIGHLANDS,Coffee Shop,Japanese Restaurant,Community Center,Restaurant,Candy Store,Pub,Clothing Store,Pizza Place,Paper / Office Supplies Store,Shoe Store
6,NEWTON LOWER FALLS,Pizza Place,Rest Area,Donut Shop,Japanese Restaurant,Furniture / Home Store,Deli / Bodega,Plaza,Performing Arts Venue,Rental Service,Coffee Shop
7,NEWTON UPPER FALLS,Music Store,Lawyer,Spa,Baseball Field,Gym,Irish Pub,Trail,Italian Restaurant,Entertainment Service,Donut Shop
8,NEWTONVILLE,Massage Studio,Ice Cream Shop,Liquor Store,Art Gallery,Pizza Place,Dance Studio,Chinese Restaurant,Café,Diner,Pet Store
9,WABAN,Deli / Bodega,Metro Station,Tennis Court,Martial Arts Dojo,Bus Stop,Organic Grocery,Neighborhood,Coffee Shop,Ice Cream Shop,Hardware Store
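
The column labels above are built by indexing a short list of suffixes and falling back to "th" when the index runs out. An alternative, sketched below, is a small function that returns the ordinal for any positive integer. The name `ordinal` is hypothetical; for 1 through 10 it produces the same labels as the cell above.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        # 10th through 20th (and 110th through 120th, ...) always take "th".
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


columns = ["Village"] + ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)]
print(columns)
```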


## Getting Housing Price Data

The data was downloaded from Redfin for single-family houses and townhouses sold in Newton over the last 3 years. The houses of interest have at least 3 bedrooms and at least 2 bathrooms, with a price between \$400K and \$1 million. The data was saved as a .csv file.

Read in the csv file and create a data frame

In [33]:
houseDF = pd.read_csv('redfinhousenewton.csv')
houseDF.head()

Unnamed: 0,SALE TYPE,SOLD DATE,PROPERTY TYPE,ADDRESS,CITY,STATE,ZIP,PRICE,BEDS,BATHS,LOCATION,SQUARE FEET,LOT SIZE,YEAR BUILT,DAYS ON MARKET,$/SQUARE FEET,HOA/MONTH,STATUS,NEXT OPEN HOUSE START TIME,NEXT OPEN HOUSE END TIME,URL (SEE http://www.redfin.com/buy-a-home/comparative-market-analysis FOR INFO ON PRICING),SOURCE,MLS#,FAVORITE,INTERESTED,LATITUDE,LONGITUDE
0,PAST SALE,April-18-2018,Single Family Residential,12 Carter St,Newton,MA,2460,835000,3,2.5,Newtonville,1464,6750.0,1910,188.0,570,,Sold,,,http://www.redfin.com/MA/Newton/12-Carter-St-0...,MLS PIN,72284627.0,N,Y,42.351717,-71.19892
1,PAST SALE,August-22-2018,Single Family Residential,195 Waltham St,Newton,MA,2465,925000,4,2.5,Newton,1900,8220.0,1928,62.0,487,,Sold,,,http://www.redfin.com/MA/West-Newton/195-Walth...,MLS PIN,72350560.0,N,Y,42.360198,-71.222988
2,PAST SALE,November-20-2017,Single Family Residential,305 Woodcliff Rd,Newton,MA,2461,920000,3,2.0,Newton,1696,7704.0,1955,337.0,542,,Sold,,,http://www.redfin.com/MA/Newton-Highlands/305-...,MLS PIN,72235320.0,N,Y,42.312755,-71.2015
3,PAST SALE,March-24-2017,Single Family Residential,73 Canterbury Rd,Newton,MA,2461,867000,3,2.5,Newton Highlands,1872,5270.0,1940,578.0,463,,Sold,,,http://www.redfin.com/MA/Newton/73-Canterbury-...,MLS PIN,72124686.0,N,Y,42.319825,-71.220204
4,PAST SALE,August-25-2016,Single Family Residential,39 Rowena Rd,Newton,MA,2459,837500,4,2.5,Newton,2252,12864.0,1955,789.0,372,,Sold,,,http://www.redfin.com/MA/Newton-Centre/39-Rowe...,MLS PIN,72027576.0,N,Y,42.323054,-71.197962


Clean up the data by keeping only the zip code, sale price, number of bedrooms and bathrooms, and price per square foot

In [34]:
cleanHouseDF = houseDF[['ZIP', 'PRICE', 'BEDS', 'BATHS', '$/SQUARE FEET']]

In [35]:
cleanHouseDF.head()

Unnamed: 0,ZIP,PRICE,BEDS,BATHS,$/SQUARE FEET
0,2460,835000,3,2.5,570
1,2465,925000,4,2.5,487
2,2461,920000,3,2.0,542
3,2461,867000,3,2.5,463
4,2459,837500,4,2.5,372


Calculate the average price and average price per square foot for each zip code

In [36]:
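# select several columns with a list (double brackets); a bare tuple is rejected by recent pandas
houseZipCode = cleanHouseDF.groupby('ZIP')[['PRICE', 'BEDS', 'BATHS', '$/SQUARE FEET']].mean()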

In [37]:
houseZipCode.head()

ZIP,PRICE,BEDS,BATHS,$/SQUARE FEET
2132,661633.333333,3.0,2.5,269.0
2453,618777.777778,3.444444,2.722222,326.444444
2458,782705.744681,3.340426,2.56383,412.319149
2459,821802.692308,3.358974,2.455128,453.794872
2460,818204.166667,3.375,2.583333,431.083333


In [38]:
housePrice = houseZipCode.reset_index()
# store ZIP as a 5-character string so the leading zero is kept
housePrice['ZIP'] = housePrice['ZIP'].astype(str).str.zfill(5)
housePrice

Unnamed: 0,ZIP,PRICE,BEDS,BATHS,$/SQUARE FEET
0,2132,661633.333333,3.0,2.5,269.0
1,2453,618777.777778,3.444444,2.722222,326.444444
2,2458,782705.744681,3.340426,2.56383,412.319149
3,2459,821802.692308,3.358974,2.455128,453.794872
4,2460,818204.166667,3.375,2.583333,431.083333
5,2461,816682.051282,3.307692,2.384615,468.076923
6,2462,836500.0,3.166667,2.25,455.5
7,2464,726630.769231,3.205128,2.538462,401.564103
8,2465,781904.081633,3.489796,2.316327,451.285714
9,2466,823431.818182,3.5,2.386364,432.818182
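
ZIP codes appear in several forms across the data sources in this notebook: as strings with a leading zero in the GeoJSON file, as integers in the Redfin file, and occasionally as ZIP+4 values such as `02135-3202` in the geocoder output for the crime data. A small normalizing function makes them comparable before any merge. The sketch below is one possible version; `normalize_zip` is a hypothetical name, and it only handles integers and strings.

```python
def normalize_zip(value):
    """Return a 5-character ZIP string: pad leading zeros, drop any ZIP+4 suffix."""
    text = str(value).strip().split("-")[0]
    return text.zfill(5)


print(normalize_zip(2458))
print(normalize_zip("02135-3202"))
print(normalize_zip("02459"))
```

Applying the same function to the ZIP column of every table gives a consistent join key.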


## Getting and Processing Crime Data

The Newton Police log data can be obtained from the department's police log website. Each day has a text file containing all log entries, stored at a URL such as "http://www.newtonpolice.com/POLICE_LOG/CURRENT/20180101.txt", where the file name encodes the date.
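
Since each day's log sits at a URL that differs only in the date, the list of URLs for a date range can be generated rather than typed. The sketch below assumes the file name follows a `YYYYMMDD` pattern and that every day in the range has a file under the same path; neither is confirmed by the source beyond the example URL. `daily_log_urls` and `BASE_URL` are hypothetical names.

```python
from datetime import date, timedelta

# URL pattern taken from the example above; {} is replaced by the date.
BASE_URL = "http://www.newtonpolice.com/POLICE_LOG/CURRENT/{}.txt"


def daily_log_urls(start, end):
    """Yield one log URL per day from start to end, inclusive."""
    day = start
    while day <= end:
        yield BASE_URL.format(day.strftime("%Y%m%d"))
        day += timedelta(days=1)


urls = list(daily_log_urls(date(2018, 1, 1), date(2018, 12, 31)))
print(len(urls))
print(urls[0])
```

Each URL could then be passed to the extraction function defined below, ideally with a short pause between requests.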

In [39]:
from bs4 import BeautifulSoup as BS
import requests

In [40]:
#function to extract data from url
url = 'http://www.newtonpolice.com/POLICE_LOG/CURRENT/20180102.txt'
def extractCrimeDatafromUrl(url):
    """Extract crime data from newton police data base"""
    crimeDataLocation ={'Date': [], 'Address': [], 'Latitude': [], 'Longitude': [], 'ZipCode': []}
    results = requests.get(url)
    c = results.content
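    # name the parser explicitly: lxml wraps the plain-text log in <html><body><p>,
    # which the two lines below rely on
    soup = BS(c, 'lxml')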
    a = soup.body.find_all('p')
    b = a[0].text.split('\r\n') 
      
    for incident in b: 
        if len(incident.split('   ')) > 5:
            c = incident.split('   ')
            crimeDataLocation['Date'].append(c[1])
            if len(c[3])>0:
                address = c[3]
            else:
                address = c[4]
            if ' / ' in address:
                address = address.split(' / ')[1]
            if '/ ' in address:
                address = address.split('/ ')[1]
            address = address + ', Newton, MA'
            # look up the zip code, latitude and longitude of the address
            zipcode, lat, long = getZipCodeLongLat(address)
            crimeDataLocation['Address'].append(address)
            crimeDataLocation['ZipCode'].append(zipcode)
            crimeDataLocation['Longitude'].append(long)
            crimeDataLocation['Latitude'].append(lat)
    crimeDf = pd.DataFrame.from_dict(crimeDataLocation)
    #drop any row with Nan
    a = crimeDf.dropna(axis = 0)
    return a

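from geopy.geocoders import Nominatim
# recent geopy versions require a user_agent; the name below is an arbitrary placeholder
geolocator = Nominatim(user_agent="newton_neighborhoods")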

def getZipCodeLongLat(address):
    """get Zipcode, longitude and latitude from an address"""
    location = geolocator.geocode(address)
    if location is None:
        return None, None, None
    else:
        zipcode = location.address.split(', ')[-2]
        lat = location.latitude
        long = location.longitude
        return zipcode, lat, long
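
The address clean-up inside `extractCrimeDatafromUrl` can be pulled out into its own function so it can be checked on a few strings without downloading a log. The sketch below is a simplified variant, not the notebook's exact logic: it keeps everything after the first `/` regardless of surrounding spaces. `clean_address` is a hypothetical name, and the `PARKING LOT / ` prefix in the sample is invented.

```python
def clean_address(raw, city="Newton, MA"):
    """Keep the part after a '/' separator (if any) and append the city."""
    address = raw.strip()
    if "/" in address:
        # Everything after the first slash is treated as the street address.
        address = address.split("/", 1)[1].strip()
    return "{}, {}".format(address, city)


print(clean_address("PARKING LOT / 70 UNION ST"))
print(clean_address("42 OAK AVE"))
```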



In [41]:
crimeDf = extractCrimeDatafromUrl(url)

Unnamed: 0,index,Date,Address,Latitude,Longitude,ZipCode
0,0,1/1/2018,"1946 WASHINGTON ST #333, Newton, MA",42.357369,-71.184771,02458
1,2,1/1/2018,"320 WASHINGTON ST, Newton, MA",42.356282,-71.186672,02458
2,3,1/1/2018,"985 BEACON ST, Newton, MA",42.33066,-71.202552,02459
3,4,1/1/2018,"42 OAK AVE, Newton, MA",42.351386,-71.232145,02465
4,5,1/1/2018,"70 UNION ST, Newton, MA",42.3296,-71.1925,02459
5,6,1/1/2018,"PARSONS ST & WASHINGTON ST, Newton, MA",42.349541,-71.155762,02135-3202
6,7,1/1/2018,"197 WALNUT ST, Newton, MA",42.353278,-71.208107,02460
7,8,1/1/2018,"66 AUSTIN ST, Newton, MA",42.350275,-71.20976,02460
8,9,1/1/2018,"111 PROSPECT ST, Newton, MA",42.34403,-71.231656,02465
9,10,1/1/2018,"TREMONT ST, Newton, MA",42.353622,-71.178102,02458


In [45]:
# renumber the rows after dropna without keeping the old index
a = crimeDf.reset_index(drop=True)

In [46]:
a.head()

Unnamed: 0,Date,Address,Latitude,Longitude,ZipCode
0,1/1/2018,"1946 WASHINGTON ST #333, Newton, MA",42.357369,-71.184771,2458
1,1/1/2018,"320 WASHINGTON ST, Newton, MA",42.356282,-71.186672,2458
2,1/1/2018,"985 BEACON ST, Newton, MA",42.33066,-71.202552,2459
3,1/1/2018,"42 OAK AVE, Newton, MA",42.351386,-71.232145,2465
4,1/1/2018,"70 UNION ST, Newton, MA",42.3296,-71.1925,2459


The data above covers a single day. Logs for every day of 2018 are available and can be obtained the same way.