# Capstone Project: Where to Live in Denver, CO, USA
## The Problem
### How can we help a client select a neighborhood to live in?


## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project we imagine a client who needs help finding a neighborhood in a city (Denver) that fits their individual needs.  <br>
In Denver, as in many cities, different neighborhoods have different qualities, and finding a place to live that fits a client's individual desires can be difficult. In this project the client will be able to select the types of venues they want to live near, choose a neighborhood name, view a dataframe of that neighborhood's ten most common venue categories, and view the location of that neighborhood on a city map.

## The Data <a name = "data"></a>


The data is primarily Foursquare venue data, queried at neighborhood centroids computed from the Denver Open Data Catalog's Statistical Neighborhoods KML. <br>
The neighborhood boundary data can be found here: https://www.denvergov.org/opendata/dataset/city-and-county-of-denver-statistical-neighborhoods<br>
I converted the file to GeoJSON using https://mygeodata.cloud/converter/kml-to-geojson and layered it on top of an ipyleaflet map.
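The KML-to-GeoJSON conversion can also be done locally instead of with the online converter. Below is a minimal sketch that uses only the Python standard library. It assumes each neighborhood is a KML `Placemark` with a `<name>` and a single `<Polygon>` whose outer ring sits in `outerBoundaryIs/LinearRing/coordinates`. Holes and `MultiGeometry` are ignored, and if the real file stores the neighborhood name somewhere other than `<name>`, you will need to adjust the lookup. The function name `kml_to_geojson` and the sample KML are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

KML_URI = 'http://www.opengis.net/kml/2.2'
KML_NS = {'kml': KML_URI}

def kml_to_geojson(kml_text):
    """Convert KML Placemarks with simple polygons into a GeoJSON FeatureCollection dict.

    Assumes each Placemark has a <name> and one <Polygon>; only the outer ring is kept.
    """
    root = ET.fromstring(kml_text)
    features = []
    for placemark in root.iter('{%s}Placemark' % KML_URI):
        name = placemark.findtext('kml:name', default='', namespaces=KML_NS)
        coords_text = placemark.findtext(
            './/kml:Polygon/kml:outerBoundaryIs/kml:LinearRing/kml:coordinates',
            namespaces=KML_NS)
        if coords_text is None:
            continue  # skip placemarks that are not simple polygons
        # KML stores whitespace-separated "lon,lat[,alt]" tuples; GeoJSON wants [lon, lat]
        ring = []
        for token in coords_text.split():
            lon, lat = token.split(',')[:2]
            ring.append([float(lon), float(lat)])
        features.append({
            'type': 'Feature',
            'properties': {'name': name},
            'geometry': {'type': 'Polygon', 'coordinates': [ring]},
        })
    return {'type': 'FeatureCollection', 'features': features}

# Usage with a tiny hand-written KML document (a square, not real neighborhood data)
sample_kml = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
  <Placemark><name>Example</name>
    <Polygon><outerBoundaryIs><LinearRing><coordinates>
      -105.0,39.7 -104.9,39.7 -104.9,39.8 -105.0,39.8 -105.0,39.7
    </coordinates></LinearRing></outerBoundaryIs></Polygon>
  </Placemark>
</Document></kml>"""
geojson = kml_to_geojson(sample_kml)
geojson_text = json.dumps(geojson)  # ready to write to a .geojson file
```

Note that the axis order differs between the two formats. KML and GeoJSON both list longitude before latitude, but Foursquare's `ll` parameter expects latitude first.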

### Import Libraries

In [1]:

#Set Jupyter to widescreen
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

from bs4 import BeautifulSoup
from sklearn.cluster import KMeans

import geocoder
#Matplotlib and associated plotting modules
from matplotlib.pyplot import figure
import matplotlib.cm as cm
import matplotlib.colors as colors
#Notebook display helpers (display, clear_output, ...)
from IPython.display import *

#Map rendering libraries
import folium
from fastkml import kml
import ipyleaflet
from ipyleaflet import *

from shapely.geometry import Point, LineString


### Initialize Foursquare Credentials

In [12]:
#Initialize Foursquare credentials
#Read them from environment variables instead of hard-coding secrets in the notebook
#(the environment variable names below are placeholders; use whatever names you set)
import os
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET') # your Foursquare Secret
ACCESS_TOKEN = os.environ.get('FOURSQUARE_ACCESS_TOKEN', 'YOUR_ACCESS_TOKEN') # your Foursquare Access Token
VERSION = '20180604'
LIMIT = 100 # A default Foursquare API limit value

#print('Your credentials:')
#print('CLIENT_ID: ' + CLIENT_ID)
#print('CLIENT_SECRET:' + CLIENT_SECRET)

### Set Pandas Options

In [13]:
import pandas as pd
pd.set_option('display.max_rows', 100)

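### Import Neighborhood Lat/Lon

We use requests to download the statistical neighborhoods shapefile from the Denver Open Data Catalog. We then read the GeoJSON copy of the neighborhood boundaries and put the neighborhood names and polygon geometries into a GeoDataFrame, gdf. Next we create a centroid for each neighborhood with the geopandas.GeoSeries.centroid property. A centroid's x coordinate is its longitude and its y coordinate is its latitude. GeoPandas warns that these centroids may be inaccurate because they are computed in a geographic CRS (EPSG:4326, degrees of latitude and longitude) rather than a projected one. At the scale of a single neighborhood the error should be small enough for choosing a Foursquare search point.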

In [14]:
import geopandas as gpd
from shapely.geometry import Polygon, LineString, Point
import matplotlib.pyplot as plt
import requests
import pyproj 

#Download the statistical neighborhoods shapefile (ZIP) from the Denver Open Data Catalog;
#the steps below read the local GeoJSON copy of the boundaries
file = requests.get("https://www.denvergov.org/media/gis/DataCatalog/statistical_neighborhoods/shape/statistical_neighborhoods.zip")

#Read the neighborhood polygons (forward slashes work on Windows, macOS and Linux)
gdf = gpd.read_file('./Resources/statistical_neighborhoods.geojson')[['NBHD_NAME', 'geometry']]

#rename column NBHD_NAME to Neighborhood
gdf.columns = ['Neighborhood', 'geometry']
#make sure the polygons are in WGS84 latitude/longitude (EPSG:4326);
#this is a geographic CRS, which is why GeoPandas still warns when computing centroids
gdf = gdf.to_crs("EPSG:4326")

#create centroid lat/lon for the Foursquare search: point.y is latitude, point.x is longitude
centroid_points = gdf.centroid
centroids = pd.DataFrame({'Neighborhood': gdf['Neighborhood'],
                          'Latitude': centroid_points.y,
                          'Longitude': centroid_points.x})

centroids


| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Auraria | 39.745821 | -105.008267 |
| 1 | Cory - Merrill | 39.690462 | -104.949822 |
| 2 | Belcaro | 39.705044 | -104.950477 |
| 3 | Washington Park | 39.701238 | -104.966267 |
| 4 | Washington Park West | 39.702575 | -104.979904 |
| 5 | Speer | 39.719245 | -104.980294 |
| 6 | Cherry Creek | 39.7194 | -104.949281 |
| 7 | Country Club | 39.722304 | -104.966005 |
| 8 | Congress Park | 39.732851 | -104.950307 |
| 9 | City Park | 39.745623 | -104.95017 |
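For intuition about what the centroid step computes: for a simple polygon, the centroid is the area-weighted average of the vertices given by the shoelace formula. The sketch below uses a hypothetical `polygon_centroid` helper. It treats longitude/latitude as planar x/y, just as a centroid computed in EPSG:4326 does, and it makes the coordinate order explicit: x is longitude and y is latitude. That order is why the Latitude column has to come from `.y`.

```python
def polygon_centroid(ring):
    """Return the (x, y) centroid of a simple polygon given as (x, y) vertices.

    The ring may be open or closed (first vertex repeated at the end).
    Coordinates are treated as planar, as GeoPandas does in a geographic CRS.
    """
    if ring[0] == ring[-1]:
        ring = ring[:-1]  # drop the closing vertex
    # shift to the first vertex so large coordinates (about -105, 39) don't cancel badly
    x_ref, y_ref = ring[0]
    pts = [(x - x_ref, y - y_ref) for x, y in ring]
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(pts)
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # centroid = sum / (6 * area) = sum / (3 * area2), then undo the shift
    return x_ref + cx / (3.0 * area2), y_ref + cy / (3.0 * area2)

# Usage: a made-up square neighborhood, vertices given as (longitude, latitude)
lon, lat = polygon_centroid([(-105.0, 39.7), (-104.9, 39.7), (-104.9, 39.8),
                             (-105.0, 39.8), (-105.0, 39.7)])
```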


Function to pull venue data from Foursquare using neighborhood lat/lng

In [16]:
#Function to pull venue information for each neighborhood centroid
def getNearbyVenues(names, latitudes, longitudes, radius=750):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # build the API request; ll must be "latitude,longitude"
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {'client_id': CLIENT_ID,
                  'client_secret': CLIENT_SECRET,
                  'v': VERSION,
                  'll': '{},{}'.format(lat, lng),
                  'radius': radius,
                  'limit': LIMIT}
            
        # make the GET request; an error response (bad credentials, exhausted quota,
        # invalid coordinates) has no 'groups' key, so skip it instead of crashing
        response = requests.get(url, params=params).json().get('response', {})
        if not response.get('groups'):
            print('  no venues returned for', name)
            continue
        results = response['groups'][0]['items']
        
        # return only relevant information for each nearby venue (skip uncategorized venues)
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results if v['venue']['categories']])
    #flatten the list of lists so that each venue is one row
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list],
                                 columns=['Neighborhood', 
                                          'Neighborhood Latitude', 
                                          'Neighborhood Longitude', 
                                          'Venue', 
                                          'Venue Latitude', 
                                          'Venue Longitude', 
                                          'Venue Category'])
    
    return nearby_venues
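A Foursquare error response has no `'groups'` key, so indexing it directly raises `KeyError: 'groups'`. Passing the coordinates in the wrong order is one easy way to trigger this, because `ll` expects latitude first and a "latitude" of -105 is out of range. A cheap pre-flight check can catch swapped coordinates before any API calls are spent. The sketch below is a minimal version that uses only the standard library. `build_explore_url` is a hypothetical helper, and it reuses the endpoint, parameter names and defaults from the function above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(lat, lng, client_id, client_secret, version='20180604',
                      radius=750, limit=100):
    """Build a v2 venues/explore URL, rejecting coordinates that look swapped."""
    if not -90 <= lat <= 90:
        raise ValueError('latitude {} is out of range; are lat and lng swapped?'.format(lat))
    if not -180 <= lng <= 180:
        raise ValueError('longitude {} is out of range'.format(lng))
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude first
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# Usage with the Auraria centroid and placeholder credentials
url = build_explore_url(39.745821, -105.008267, 'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET')

swapped_error = None
try:
    build_explore_url(-105.008267, 39.745821, 'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET')
except ValueError as err:
    swapped_error = str(err)  # caught before any request is sent
```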

To save time and API calls, we first check whether a cached copy of this data exists; if it does not, we use the getNearbyVenues function to create it.

In [19]:
import os

if os.path.isfile('./Resources/denver_venues.csv'):
    #read the cached venue data
    denver_venues = pd.read_csv('./Resources/denver_venues.csv', index_col=[0])
    print("denver_venues.csv exists and was read")
else:
    #query Foursquare for every neighborhood centroid and cache the result
    denver_venues = getNearbyVenues(names = centroids['Neighborhood'],
                                    latitudes = centroids['Latitude'],
                                    longitudes = centroids['Longitude'])
    denver_venues.to_csv('./Resources/denver_venues.csv')

denver_venues.head()
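The same load-or-build pattern can be wrapped in a reusable function. The minimal sketch below caches to JSON with the standard library, whereas the notebook uses CSV through pandas. `load_or_build` and the fake fetch function are hypothetical stand-ins for the Foursquare query.

```python
import json
import os
import tempfile

def load_or_build(path, build):
    """Return cached JSON data from path, or call build(), save the result to path, and return it."""
    if os.path.isfile(path):
        with open(path) as f:
            return json.load(f)
    data = build()
    with open(path, 'w') as f:
        json.dump(data, f)
    return data

# Usage: a fake fetch that records how often it is called
calls = []
def fake_fetch():
    calls.append(1)
    return [{'Neighborhood': 'Auraria', 'Venue': 'Example Venue'}]

cache_path = os.path.join(tempfile.mkdtemp(), 'venues_cache.json')
first = load_or_build(cache_path, fake_fetch)   # calls fake_fetch and writes the cache
second = load_or_build(cache_path, fake_fetch)  # served from the cache file
```

One caveat applies to this design: if the fetch only partly succeeds (for example, some neighborhoods return no venues), the partial result is cached too. Delete the cache file to force a fresh query.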


# Exploring the Venue Data

Here we begin to look at the venue data and consider how to organize it so that it can be filtered by the types of venues the client wants to live near.

In [None]:
#How many venues in each neighborhood
denver_venues.groupby('Neighborhood').count()


In [None]:
print('There are {} unique categories.'.format(len(denver_venues['Venue Category'].unique())))

There are 228 unique venue categories, so merging many of them simplifies interpretation and makes the density and desirability of a neighborhood clearer. <br>
First we one-hot encode the venue categories, which gives one column per category, for easier processing.<br>

Next we merge the venue categories into broader groups that I think are relatable as types of places to want to live near.
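The encode-then-merge idea can be illustrated on a toy example. The minimal sketch below uses made-up venues (not Foursquare results) and a hypothetical three-group subset of the groupings defined later. Instead of one sum-and-drop step per group, it inverts the groupings into a category-to-group mapping and sums the matching columns with a single `groupby`. Categories that belong to no group keep their own name.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['Auraria', 'Auraria', 'Auraria', 'Speer', 'Speer'],
    'Venue Category': ['Food Truck', 'Fast Food Restaurant', 'Coffee Shop', 'Café', 'Pizza Place'],
})

# Hypothetical subset of the notebook's groupings
groups = {
    'Fast Food': ['Fast Food Restaurant', 'Food Truck'],
    'Coffee & Tea': ['Café', 'Coffee Shop', 'Tea Room'],
    'Restaurants': ['Pizza Place'],
}
# Invert to category -> group; unmapped categories keep their own name
col_to_group = {col: group for group, cols in groups.items() for col in cols}

onehot = pd.get_dummies(venues['Venue Category'], dtype=int)        # one 0/1 column per category
merged = onehot.T.groupby(lambda c: col_to_group.get(c, c)).sum().T  # sum columns within each group
merged.insert(0, 'Neighborhood', venues['Neighborhood'])

# Venue counts per neighborhood and merged category
counts = merged.groupby('Neighborhood').sum()
```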

In [None]:

# one hot encoding: one 0/1 column per venue category
denver_onehot = pd.get_dummies(denver_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)
#add the Neighborhood column (this overwrites a 'Neighborhood' venue category column if Foursquare returned one)
denver_onehot['Neighborhood'] = denver_venues['Neighborhood'] 
#move Neighborhood to the first column
denver_onehot = denver_onehot[['Neighborhood'] + [c for c in denver_onehot.columns if c != 'Neighborhood']]

denver_onehot.to_csv('./Resources/denver_onehot.csv')
denver_onehot.shape

In [None]:


# one hot encoding (repeated so that this cell can be re-run on its own)
denver_onehot = pd.get_dummies(denver_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)
#add the Neighborhood column (this overwrites a 'Neighborhood' venue category column if Foursquare returned one)
denver_onehot['Neighborhood'] = denver_venues['Neighborhood'] 
#move Neighborhood to the first column
denver_onehot = denver_onehot[['Neighborhood'] + [c for c in denver_onehot.columns if c != 'Neighborhood']]

denver_onehot.to_csv('./Resources/denver_onehot.csv')
denver_onehot.shape


import numpy as np

#Helper: sum a list of one-hot category columns into one new column and drop the originals.
#Categories missing from the data are skipped, so the cell does not fail with a KeyError
#when Foursquare returns a slightly different set of categories.
def merge_categories(df, new_name, cols):
    present = [c for c in cols if c in df.columns]
    df[new_name] = df[present].sum(axis=1)
    return df.drop(columns=present)

#Renaming 'Fast Food Restaurant' and adding other fast food type venues 
ff = ['Fast Food Restaurant','Food','Food & Drink Shop','Food Court','Food Truck'] 
denver_onehot = merge_categories(denver_onehot, 'Fast Food', ff)

#Concatenating all Stadiums
sport = ['Baseball Field','Basketball Stadium','Football Stadium','Stadium','Hockey Arena']
denver_onehot = merge_categories(denver_onehot, 'Sports Stadium', sport)

#Locating venues with 'Restaurant' in their category name
r = [c for c in denver_onehot.columns if 'Restaurant' in c]
#Churrascaria (Brazilian steakhouse) and Brasserie are sit-down restaurants too
r.extend(['BBQ Joint','Noodle House','Pizza Place','Steakhouse','Churrascaria','Brasserie'])

#Concatenating sit down Restaurants
denver_onehot = merge_categories(denver_onehot, 'Restaurants', r)

#Concatenating Joint or Boutique Restaurants
joints = ['Bagel Shop','Bakery','Bistro','Breakfast Spot','Burger Joint',
          'Burrito Place','Creperie','Deli / Bodega','Donut Shop',
          'Fried Chicken Joint','Gastropub','Hot Dog Joint','Sandwich Place',
          'Soup Place','Taco Place','Wings Joint','Diner','Juice Bar','Salad Place']
denver_onehot = merge_categories(denver_onehot, 'Boutique Restaurants', joints)

#Concatenating Bars and pubs
bars = ['Bar','Beer Bar','Beer Garden','Cocktail Bar','Dive Bar','Lounge',
        'Piano Bar','Pub','Speakeasy','Sports Bar','Whisky Bar','Wine Bar',
        'Wine Shop', 'Roof Deck','Karaoke Bar']
denver_onehot = merge_categories(denver_onehot, 'Bars and Pubs', bars)

#Concatenating Brewery and Distillery
brews = ['Brewery','Distillery']
denver_onehot = merge_categories(denver_onehot, 'Breweries & Distilleries', brews)

#Create list of clubs from the current columns
clubs = [c for c in denver_onehot.columns if 'Club' in c]
clubs.append('Nightclub')
#Concatenating Clubs
denver_onehot = merge_categories(denver_onehot, 'Clubs', clubs)

#Dessert Shops
dessert = ['Candy Store','Cupcake Shop','Dessert Shop','Ice Cream Shop']
denver_onehot = merge_categories(denver_onehot, 'Dessert', dessert)

#Coffee, Cafe, Tea
coffee_tea = ['Café','Coffee Shop','Tea Room']
denver_onehot = merge_categories(denver_onehot, 'Coffee & Tea', coffee_tea)

#Theaters and Music Venues
t = [c for c in denver_onehot.columns if 'Theater' in c]
t.extend(['Opera House','Music Venue'])
denver_onehot = merge_categories(denver_onehot, 'Theaters & Music', t)

#Fitness
f = ['Athletics & Sports','Dance Studio','Cycle Studio','Gym','Gym / Fitness Center',
     'Gym Pool','Pool','Recreation Center','Skating Rink','Soccer Field',
     'Tennis Court','Trail','Yoga Studio','Golf Course','Bike Rental / Bike Share' ]
denver_onehot = merge_categories(denver_onehot, 'Fitness', f)

#Self Care and Health (named selfcare rather than self, which conventionally means a method's instance)
selfcare = ['Health & Beauty Service','Massage Studio','Optical Shop','Spa',
            'Chiropractor','Pharmacy','Alternative Healer','Salon / Barbershop','Tanning Salon']
denver_onehot = merge_categories(denver_onehot, 'Selfcare & Health', selfcare)

#Arts and History
arts = ['Art Gallery','Art Museum', 'Arts & Entertainment','Botanical Garden',
        'Event Space','Exhibit','General Entertainment','Historic Site',
        'History Museum','Museum','Outdoor Sculpture','Garden','Theme Park','Zoo Exhibit']
denver_onehot = merge_categories(denver_onehot, 'Arts, History, & Entertainment', arts)

#Hobbies and Games
g = ['Arcade','Bowling Alley','Escape Room','Gaming Cafe','Hobby Shop',
     'Music Store','Record Shop','Sporting Goods Shop',
     'Toy / Game Store','Video Store','Arts & Crafts Store']
denver_onehot = merge_categories(denver_onehot, 'Hobbies & Games', g)

#Shops and Boutiques
shops = ['Antique Shop','Baby Store','Big Box Store','Bookstore','Boutique',
         'Bridal Shop','Clothing Store','Cosmetics Shop','Department Store',
         'Discount Store','Electronics Store','Flower Shop','Furniture / Home Store',
         'Gift Shop','Jewelry Store','Kitchen Supply Store','Lingerie Store',
         'Market',"Men's Store",'Mobile Phone Shop','Outdoor Supply Store',
         'Paper / Office Supplies Store','Pet Service','Pet Store','Shop & Service',
         'Shopping Mall','Smoke Shop','Souvenir Shop',"Women's Store"]
denver_onehot = merge_categories(denver_onehot, 'Shopping', shops)

#Groceries
groc = ['Butcher','Cheese Shop','Farm','Farmers Market','Fish Market',
        'Grocery Store','Herbs & Spices Store','Organic Grocery']
denver_onehot = merge_categories(denver_onehot, 'Groceries', groc)

categories = [ 'Fast Food', 'Restaurants', 'Boutique Restaurants', 'Bars and Pubs',
              'Breweries & Distilleries','Clubs','Dessert','Coffee & Tea','Theaters & Music',
              'Fitness','Selfcare & Health','Arts, History, & Entertainment','Hobbies & Games','Shopping','Groceries'] 
denver_onehot[categories]

In [7]:
#Pets

In [8]:
#Professional Services

In [9]:
#Vehicle Care

In [10]:
denver_onehot
denver_onehot.to_csv( './Resources/denver_onehot_refined.csv')


Now all pertinent venues are sorted into easily relatable categories. This concludes the data-gathering phase; next we use this data to analyze the density of venues within the neighborhoods of Denver.



## Methodology <a name="methodology"></a>

In this project we are creating tools that let the client specify the types of venues they want to live near, and that filter the Denver neighborhoods for a quick and easy assessment of whether a neighborhood would suit the client's desired lifestyle. <br>
<br>
In the first step we collected the necessary **data**: the location and type of every venue within 750 m of each neighborhood's centroid.

In [11]:

#Find the mean occurrence of each category per neighborhood, i.e. the share of its venues in that category
denver_grouped = denver_onehot.groupby(['Neighborhood']).mean()
denver_grouped.reset_index(inplace = True)
#copy to denver_cluster before denver_grouped is altered (pd.DataFrame(df) may share the same data)
denver_cluster = denver_grouped.copy()

denver_cluster[categories]
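Each venue row has exactly one category, so the mean of a one-hot column within a neighborhood is the share of that neighborhood's venues that fall in the category. The minimal standard-library sketch below computes the same shares from plain `(neighborhood, category)` pairs. `category_shares` and the sample rows are hypothetical. Unlike the DataFrame, it omits categories with a share of zero.

```python
from collections import Counter, defaultdict

def category_shares(rows):
    """rows: iterable of (neighborhood, category) pairs, one per venue.

    Returns {neighborhood: {category: share of that neighborhood's venues}}.
    """
    counts = defaultdict(Counter)
    for neighborhood, category in rows:
        counts[neighborhood][category] += 1
    shares = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        shares[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return shares

# Usage with made-up venues
rows = [('Auraria', 'Fast Food'), ('Auraria', 'Fast Food'),
        ('Auraria', 'Coffee & Tea'), ('Speer', 'Restaurants')]
shares = category_shares(rows)
```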


Function that returns a neighborhood's most common venue categories, sorted by frequency in descending order

In [None]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [None]:
#Function to create a sorted DataFrame with each neighborhood's 10 most common venue categories
def top_ten_venues(df) : 
    num_top_venues = 10
    indicators = ['st', 'nd', 'rd']
    
    # create columns according to number of top venues (1st, 2nd, 3rd, then 4th ... 10th)
    columns = ['Neighborhood']
    for ind in np.arange(num_top_venues):
        try:
            columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
        except IndexError:
            columns.append('{}th Most Common Venue'.format(ind+1))
    
    neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
    neighborhoods_venues_sorted['Neighborhood'] = df['Neighborhood'] 
    
    # loop over the rows of df (not the global denver_grouped) so the function works on any input
    for ind in np.arange(df.shape[0]):
        neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(df.iloc[ind, :], num_top_venues)
    return neighborhoods_venues_sorted
top_ten_venues(denver_grouped)
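The `try`/`except` around `indicators[ind]` works for ten columns. If `num_top_venues` were raised above 20, however, it would produce labels like "21th". A general ordinal helper avoids that. The sketch below also ranks categories from a plain dict with a deterministic tie-break, which matters because the default `sort_values` sort is not stable, so tied categories can come out in any order. `ordinal`, `top_categories` and the sample shares are hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix: 1st, 2nd, 3rd, 4th, ..., 11th, 12th, 13th, 21st."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

def top_categories(shares, k=10):
    """Return the k categories with the largest share, ties broken alphabetically."""
    ranked = sorted(shares.items(), key=lambda item: (-item[1], item[0]))
    return [category for category, _ in ranked[:k]]

# Usage: column labels and a top-3 ranking for one made-up neighborhood
columns = ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 4)]
top = top_categories({'Fast Food': 0.5, 'Coffee & Tea': 0.25,
                      'Bars and Pubs': 0.25, 'Groceries': 0.0}, k=3)
```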


In [None]:
#creating a variable to hold the df for later access
neighborhoods_venues_sorted = top_ten_venues(denver_grouped)
neighborhoods_venues_sorted = neighborhoods_venues_sorted.set_index(['Neighborhood'])

## Checkboxes to select important neighborhood attributes

In [None]:
#Checkable boxes that will filter output Dataframe
import ipywidgets as widgets

data = categories

names = []
checkbox_objects = []
for key in data:
    checkbox_objects.append(widgets.Checkbox(value=False, description=key))
    names.append(key)

arg_dict = {names[i]: checkbox for i, checkbox in enumerate(checkbox_objects)}

ui = widgets.VBox(children=checkbox_objects)

selected_data = []
def select_data(**kwargs):
    selected_data.clear()

    for key in kwargs:
        if kwargs[key] is True:
            selected_data.append(key)

    print(selected_data)

out = widgets.interactive_output(select_data, arg_dict)
display(ui, out)

In [None]:
#Display the filtered neighborhoods (run this after the filter cells below)
filtered

In [None]:
#Neighborhoods with at least one of the checkbox choices among their top 10 venues
mask = neighborhoods_venues_sorted.isin(selected_data)
filtered_index = list(neighborhoods_venues_sorted.index[mask.any(axis=1)])
print(filtered_index)




In [None]:
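#Filters for neighborhoods that have at least 3 of the chosen categories among their 5 most common venues.
filtered = neighborhoods_venues_sorted.iloc[:,0:5]
#keep only cells whose category was ticked; all other cells become NaN
filtered = filtered[filtered.isin(selected_data)]
#drop neighborhoods with fewer than 3 non-NaN (matching) cells
filtered = filtered.dropna(thresh = 3)
filtered.index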
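The mask-then-`dropna(thresh=3)` logic keeps a neighborhood when at least three of its five most common categories are among the ticked boxes. The sketch below writes the same rule as a plain function, with the hard-coded 5 and 3 turned into parameters. The function name and the sample top-venue lists are hypothetical.

```python
def neighborhoods_matching(top_venues, selected, top_n=5, min_matches=3):
    """Return neighborhoods whose top_n venue categories include at least min_matches selected ones.

    top_venues: {neighborhood: [most common category, 2nd most common, ...]}
    selected: collection of category names ticked in the checkboxes
    """
    selected = set(selected)
    return [name for name, cats in top_venues.items()
            if len(selected.intersection(cats[:top_n])) >= min_matches]

# Usage with made-up rankings
top_venues = {
    'Auraria': ['Coffee & Tea', 'Fast Food', 'Bars and Pubs', 'Shopping', 'Fitness'],
    'Speer': ['Restaurants', 'Fitness', 'Groceries', 'Dessert', 'Clubs'],
}
matches = neighborhoods_matching(top_venues, ['Coffee & Tea', 'Bars and Pubs', 'Shopping'])
```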


# Selection list that displays the chosen neighborhoods' top venues and stores their map geometry in the variable geometry

In [None]:
#Multi-select list of the filtered neighborhoods
#filtered_df and geometry are globals that are updated whenever the selection changes
import ipywidgets as widgets
from ipywidgets import *
filtered_df = None
geometry = None

#Selection list pulling from the filtered dataframe of neighborhoods
dropdown = widgets.SelectMultiple(
                        options=filtered.index,
                        description='Neighborhood',
                        disabled=False,
                        layout={'height':'100px', 'width':'20%'})

#Function to filter the dataframe based on the selection and store it in the global variable filtered_df
def filter_dataframe(change):
    global filtered_df, geometry
    selection = list(change['new'])
    filtered_df = neighborhoods_venues_sorted.loc[selection]

    with out:
        clear_output()
        display(filtered_df)

    #row labels in gdf of the selected neighborhoods, used to highlight them on the map
    geometry = gdf.index[gdf['Neighborhood'].isin(selection)]

out = widgets.Output()
dropdown.observe(filter_dataframe, names='value')
display(dropdown)
display(out)




# Map that highlights the neighborhoods selected in the dropdown

In [None]:
#Row labels in gdf of every neighborhood currently selected in the dropdown
#(filtered_df is indexed by neighborhood name)
geometry = gdf.index[gdf['Neighborhood'].isin(filtered_df.index)]

#If the Stamen basemap no longer loads (Stamen tile hosting moved to Stadia Maps),
#basemaps.CartoDB.Positron is an alternative
m = Map(center = (39.73515, -104.97865), zoom = 12.2, basemap = basemaps.Stamen.Toner,
       layout = Layout(width = '100%', height = '720px'))

#All neighborhood boundaries
geo_data = GeoData(geo_dataframe = gdf, name = 'Neighborhoods')
#Selected neighborhoods, filled in blue and shown in red on hover
neighborhood_data = GeoData(geo_dataframe = gdf.loc[geometry], 
                           style={'color': 'black', 'fillColor': '#3366cc', 'opacity':0.05, 'weight':1.9, 'dashArray':'2', 'fillOpacity':0.6},
                   hover_style={'fillColor': 'red' , 'fillOpacity': 0.2},
                   name = 'Selected Neighborhoods')

m.add_layer(geo_data)
m.add_layer(neighborhood_data)
m
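The highlight layer is the subset of neighborhood polygons whose names were selected. If you work with the raw GeoJSON rather than a GeoDataFrame (for example with ipyleaflet's `GeoJSON` layer, which takes a GeoJSON dict as `data`), the same selection is a filter over the features. The minimal standard-library sketch below assumes the name is stored in a `NBHD_NAME` property, matching the column read from the GeoJSON file above. `select_features` and the sample collection (with null geometries as placeholders) are hypothetical.

```python
def select_features(feature_collection, names, name_property='NBHD_NAME'):
    """Return a new FeatureCollection containing only features whose name is in names."""
    wanted = set(names)
    return {
        'type': 'FeatureCollection',
        'features': [feat for feat in feature_collection['features']
                     if (feat.get('properties') or {}).get(name_property) in wanted],
    }

# Usage with placeholder features (real ones would carry Polygon geometries)
collection = {'type': 'FeatureCollection', 'features': [
    {'type': 'Feature', 'properties': {'NBHD_NAME': 'Auraria'}, 'geometry': None},
    {'type': 'Feature', 'properties': {'NBHD_NAME': 'Speer'}, 'geometry': None},
]}
subset = select_features(collection, ['Speer'])
```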