# Hipster Finder - Density Based Clustering

### Capstone Project for IBM's Data Science Certification

Stephen Ewing
06/03/2019

## Introduction

While nobody wants to be called a hipster, recent history has shown that property values tend to explode in the places hipsters flock to. From Williamsburg, Brooklyn to East Atlanta, hipster hotspots have attracted the eye of real estate investors across the country. Knowing where these hotspots are forming could be of use to real estate investors, small-business owners, and various tastemakers.

On the other hand, hipsters are often unwashed beardos and their presence is anathema to many. If your neighborhood is becoming a hipster haven, you might want to move and start renting out your house for exorbitant amounts of money.

This project shows a method for identifying such places: density-based clustering of the venues that Foursquare returns for the search term 'hipster'.

## Data

The data for this project comes from the Foursquare API. I query the API with a city and the search term 'hipster', and from the results I collect each venue's name, category, latitude, and longitude.

In [27]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import numpy as np
#!conda install -c conda-forge folium=0.5.0 --yes 
import folium
from geopy.geocoders import Nominatim
import json
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors

In [69]:
address = 'Atlanta, GA'

geolocator = Nominatim(user_agent="explorer")
location = geolocator.geocode(address)
lat = location.latitude
lon = location.longitude
print('The geographical coordinates of {} are {}, {}.'.format(address, lat, lon))

The geographical coordinates of Atlanta, GA are 33.7490987, -84.3901849.


In [70]:
# The code was removed by Watson Studio for sharing.
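
The removed cell presumably defined the `CLIENT_ID`, `CLIENT_SECRET`, and `VERSION` values used in the request that follows. Its contents are not known, so the following is only a sketch of one way to supply them without putting secrets in the notebook. The environment variable names and the choice of today's date for `VERSION` are assumptions, not the original values.

```python
import os
from datetime import date

# Hypothetical environment variable names; set them in your shell beforehand.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')

# Foursquare's v parameter is a date in YYYYMMDD form. The value used in the
# original run is not known, so today's date stands in as an assumption.
VERSION = date.today().strftime('%Y%m%d')

if not CLIENT_ID or not CLIENT_SECRET:
    print('Foursquare credentials are not set; API requests will fail.')
```

Reading the credentials from the environment keeps them out of any shared copy of the notebook.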

In [71]:
LIMIT = 1000  # requested maximum; the API may return fewer venues
query = 'hipster'
url = 'https://api.foursquare.com/v2/venues/explore'
params = {
    'client_id': CLIENT_ID,
    'client_secret': CLIENT_SECRET,
    'v': VERSION,
    'near': address,
    'query': query,
    'limit': LIMIT,
}

# make the GET request; passing params lets requests URL-encode 'Atlanta, GA'
response = requests.get(url, params=params)
response.raise_for_status()
results = response.json()['response']['groups'][0]['items']

# build one row per venue: name, coordinates, and first listed category
rows = [(
    v['venue']['name'],
    v['venue']['location']['lat'],
    v['venue']['location']['lng'],
    v['venue']['categories'][0]['name']) for v in results]

nearby_venues = pd.DataFrame(rows, columns=['Venue', 'Latitude', 'Longitude', 'Venue Category'])
nearby_venues.head()

| | Venue | Latitude | Longitude | Venue Category |
|---|---|---|---|---|
| 0 | Octane Coffee | 33.779402 | -84.410225 | Coffee Shop |
| 1 | The Earl | 33.740963 | -84.346027 | Bar |
| 2 | MJQ Concourse | 33.774218 | -84.363277 | Nightclub |
| 3 | Sun in My Belly | 33.764165 | -84.316531 | Breakfast Spot |
| 4 | Octane Coffee + Little Tart Bakeshop | 33.746074 | -84.372786 | Coffee Shop |
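
To get a sense of the geographic spread of these venues, it can help to convert a pair of coordinates into a real-world distance. The sketch below is not part of the original notebook. It uses the standard haversine formula with an approximate mean Earth radius, and the helper name `haversine_km` is made up for illustration.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # approximate mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Coordinates taken from the first two rows of the table above
octane = (33.779402, -84.410225)
the_earl = (33.740963, -84.346027)
distance_km = haversine_km(*octane, *the_earl)
print('Octane Coffee to The Earl: {:.1f} km'.format(distance_km))
```

A check like this gives a rough scale for how far apart venues in the same city can be.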


In [75]:
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# DBSCAN has no random_state parameter, so no seeding call is needed
Clus_dataSet = nearby_venues[['Latitude', 'Longitude']]
Clus_dataSet = np.nan_to_num(Clus_dataSet)
# standardize the coordinates, so eps below is in standardized units, not degrees
Clus_dataSet = StandardScaler().fit_transform(Clus_dataSet)

# Compute DBSCAN
db = DBSCAN(eps=0.15, min_samples=4).fit(Clus_dataSet)
labels = db.labels_
nearby_venues["Clus_Db"] = labels

# number of clusters, not counting the noise label -1
realClusterNum = len(set(labels)) - (1 if -1 in labels else 0)
# number of distinct labels, including noise
clusterNum = len(set(labels))

# A sample of clusters
nearby_venues[["Venue", "Clus_Db"]].head(5)

| | Venue | Clus_Db |
|---|---|---|
| 0 | Octane Coffee | -1 |
| 1 | The Earl | 0 |
| 2 | MJQ Concourse | 1 |
| 3 | Sun in My Belly | -1 |
| 4 | Octane Coffee + Little Tart Bakeshop | -1 |
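
In the `Clus_Db` column, scikit-learn's DBSCAN uses the label -1 for noise, meaning venues that were not assigned to any cluster. Non-negative labels identify the clusters. As a minimal sketch, not part of the original notebook, the cluster sizes and the amount of noise can be tallied like this. The helper name `summarize_labels` is made up for illustration.

```python
from collections import Counter


def summarize_labels(labels):
    """Return (cluster_sizes, noise_count) for DBSCAN-style labels."""
    counts = Counter(labels)
    noise_count = counts.pop(-1, 0)  # -1 is the noise label
    return dict(counts), noise_count


# Labels of the five sample rows shown above
sample_labels = [-1, 0, 1, -1, -1]
cluster_sizes, noise_count = summarize_labels(sample_labels)
print(cluster_sizes, noise_count)  # prints {0: 1, 1: 1} 3
```

On the full data, the same helper could be called as `summarize_labels(nearby_venues['Clus_Db'].tolist())`.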


In [76]:
map_hipsters = folium.Map(location=[lat, lon], zoom_start=13)

# keep only venues assigned to a cluster (label -1 is noise)
hip_clusters = nearby_venues[nearby_venues['Clus_Db'] >= 0]

# one color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, len(set(hip_clusters['Clus_Db']))))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to map; the loop variables are named venue_lat and venue_lon
# so they do not overwrite the city coordinates lat and lon
for venue_lat, venue_lon, Venue, Category, Cluster in zip(
        hip_clusters['Latitude'], hip_clusters['Longitude'], hip_clusters['Venue'],
        hip_clusters['Venue Category'], hip_clusters['Clus_Db']):
    label = 'Venue: {}\n Category: {}'.format(Venue, Category)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [venue_lat, venue_lon],
        radius=5,
        popup=label,
        color=rainbow[Cluster],  # cluster labels start at 0, so index directly
        fill=True,
        fill_color=rainbow[Cluster],
        fill_opacity=0.7,
        parse_html=False).add_to(map_hipsters)

folium.TileLayer('cartodbdark_matter').add_to(map_hipsters)

map_hipsters