**Applied Data Science Capstone Week 4**

The Problem: People are looking to move to Palmyra VA to save money from living in Charlottesville and want to find the area that most closely resembles downtown Charlottesville. Things that most folks like about the downtown center include how close it is, how many restaurants and bars are there, and the average high quality of those places.

The Solution: I am going to first enter the lat/long of a popular residential zone in Palmyra and then use ArcGIS API covering a 20mi radius to identify all of the commercial business zoned areas as they are most likely to contain a high volume of restaurants/bars and reduces the amount of information that needs to be processed. Next, I will break the commercial business zones into 2sqmi sections and query the area in foursquare for the number and rating of venues. I will then average the rating for each 2sqmi area and calculate the distance from the original lat/long. Lastly, I will rank each 2sqmi area based on the total value of the following equation: (# of restaurants and bars/distance) * Avg Rating. The highest-end value should be the location that most resembles downtown Charlottesville.

In [4]:
# importing and installing needed libraries and APIs
import numpy as np # library for data editing

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# for webscraping import Beautiful Soup 
from bs4 import BeautifulSoup

# import ability to read common online file type
import xml

!pip install geocoder # required for arcgis queries and fuctions
import geocoder # import geocoder

!pip install folium
import folium as fol # import folium

!pip install geopy
import geopy # library for geographic operations
from geopy.geocoders import Nominatim # import Nominatim

print('Libraries installed and imported.')

ERROR: Invalid requirement: '#'


Libraries installed and imported.


In [5]:
CLIENT_ID = '0FKU2JA2QKU2H3VZ3FCL1I4T445LGCCCZIEI4BDWEU0BPVVP' # Foursquare ID
CLIENT_SECRET = 'XK53IJ24QLOVU4NWL5IPBZODH2FL51VV2RIIUE4DRBQ3EUSX' # Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 0FKU2JA2QKU2H3VZ3FCL1I4T445LGCCCZIEI4BDWEU0BPVVP
CLIENT_SECRET:XK53IJ24QLOVU4NWL5IPBZODH2FL51VV2RIIUE4DRBQ3EUSX


In [20]:
# using Nominatim to aquire lat long of Palmyra VA for map creation later
city = 'Lake Monticello, Virginia'

geolocator = Nominatim(user_agent="geoapiExercises")
location = geolocator.geocode(city)
city_lat = location.latitude
city_long = location.longitude
# create map of Palmyra VA using latitude and longitude values
map_pal = fol.Map(location=[city_lat, city_long], zoom_start=11)
    
map_pal

In [21]:
# names and plots location on the map.
start_lat = '37.916426'
start_long = '-78.329203'

fol.CircleMarker(
        [start_lat, start_long],
        radius=3,
        color='green',
        fill=True,
        fill_color= 'green',
        fill_opacity=0.3,
        parse_html=False).add_to(map_pal)

map_pal

In [22]:
# number of meters in 20 miles
radius = 32187

# creates client for API calls to Foursquare
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    start_lat, 
    start_long, 
    radius)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=0FKU2JA2QKU2H3VZ3FCL1I4T445LGCCCZIEI4BDWEU0BPVVP&client_secret=XK53IJ24QLOVU4NWL5IPBZODH2FL51VV2RIIUE4DRBQ3EUSX&v=20180605&ll=37.916426,-78.329203&radius=32187'

In [23]:
# gets business information.
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f44475830e7ef00419a28c3'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Current map view',
  'headerFullLocation': 'Current map view',
  'headerLocationGranularity': 'unknown',
  'totalResults': 203,
  'suggestedBounds': {'ne': {'lat': 38.20610928968329,
    'lng': -77.96269329515329},
   'sw': {'lat': 37.62674271031671, 'lng': -78.69571270484673}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4ad4f29bf964a5202c0021e3',
       'name': 'Monticello',
       'location': {'address': '931 Thomas Jefferson Pkwy',
        'lat': 38.006029477067266,
        'lng': -78.45213988513116,
        'labeledLatLngs

In [24]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [25]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues

  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,name,categories,lat,lng
0,Monticello,Historic Site,38.006029,-78.45214
1,Carter Mountain Orchard,Farm,37.991308,-78.472063
2,Monticello Visitors Center,History Museum,38.005834,-78.452029
3,Glenmore Country Club,Golf Course,37.987337,-78.386514
4,il Castello,Italian Restaurant,37.978475,-78.210685
5,Saunders Monticello Trail,Trail,38.002852,-78.477184
6,Three Notch'd Craft Kitchen & Brewery,Brewery,38.026075,-78.482022
7,Lampo,Pizza Place,38.027356,-78.476907
8,Blenheim Vineyards,Winery,37.931656,-78.498872
9,Chick-fil-A,Fast Food Restaurant,38.029109,-78.440447


In [26]:
# plots locations in table to the map in red
for lat, lng in zip(nearby_venues['lat'], nearby_venues['lng']):
    fol.CircleMarker(
        [lat, lng],
        radius=3,
        color='red',
        fill=True,
        fill_color= 'red',
        fill_opacity=0.3,
        parse_html=False).add_to(map_pal)

map_pal

In [17]:
from sklearn.cluster import DBSCAN
import sklearn.utils
from sklearn import metrics
from sklearn.preprocessing import StandardScaler
venue_cluster = nearby_venues[['lat', 'lng']]
venue_cluster = StandardScaler().fit_transform(venue_cluster)

db = DBSCAN(eps=0.3, min_samples=4).fit(venue_cluster)
core = np.zeros_like(db.labels_, dtype=bool)
core[db.core_sample_indices_] = True
labels = db.labels_
print (labels[500:500])
nearby_venues["DB_Cluster"] = labels

cluster_count = len(set(labels)) - (1 if -1 in labels else 0)
noise_count = list(labels).count(-1)

print('Estimated number of clusters: %d' % cluster_count)
print('Estimated number of noise points: %d' % noise_count)
print("Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, labels))
print("Completeness: %0.3f" % metrics.completeness_score(labels_true, labels))
print("V-measure: %0.3f" % metrics.v_measure_score(labels_true, labels))
print("Adjusted Rand Index: %0.3f"
      % metrics.adjusted_rand_score(labels_true, labels))
print("Adjusted Mutual Information: %0.3f"
      % metrics.adjusted_mutual_info_score(labels_true, labels))
print("Silhouette Coefficient: %0.3f"
      % metrics.silhouette_score(venue_cluster, labels))

[]
Estimated number of clusters: 1
Estimated number of noise points: 12


NameError: name 'labels_true' is not defined