# Coursera Capstone

### Introduction and Business Problem

Question:  Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem.

Answer: 

Two historic and very popular jazz clubs have recently closed in the District of Columbia.  This leaves a underserved market.  In this assignment I seek to determine what neighborhood is the best to open a new jazz club.

### The Data

Question 2:  Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.

Answer:  The Foursquare API provides the category of "jazz clubs."  The idea is to cluster the neighborhoods and determine where the hotbeds of jazz are.  

#  Start of Analysis

## Import necessary libraries for the project

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## Data Collection and Wrangling

The data required to complete this analysis is the following:

	* Neighborhoods of Washington, DC
	* Location of current jazz clubs
	* Neighborhoods with a strong nightlife
	* The ratings of the jazz clubs in the area


In [238]:
# import the neighborhoods in DC into a dataframe

neighborhoods = pd.read_csv('Neighborhood_Labels.csv')
neighborhoods.rename(columns={'X': 'location.lng', 'Y': 'location.lat', 'LABEL_NAME':'Neighborhood'}, inplace=True) # rename the columns to match Foursquare convention
neighborhoods = neighborhoods[['location.lng','location.lat','Neighborhood']] # select specific columns in the data
neighborhoods.head()

Unnamed: 0,location.lng,location.lat,Neighborhood
0,-76.980348,38.855658,Fort Stanton
1,-76.99795,38.841077,Congress Heights
2,-76.995636,38.830237,Washington Highlands
3,-77.009271,38.826952,Bellevue
4,-76.96766,38.853688,Knox Hill/Buena Vista


In [195]:
# Provide criteria for the Foursquare API

CLIENT_ID = 'OCWMDYHM2G1414ACMQSU1M4JUE2RGENSCHFRUDUYSIAKH4EC' # your Foursquare ID
CLIENT_SECRET = 'T3ZHW0FVTOXETI3WBQN1WJVWZ4IS1K02KKCUOPVEW5EJTQ0Z' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 400
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: OCWMDYHM2G1414ACMQSU1M4JUE2RGENSCHFRUDUYSIAKH4EC
CLIENT_SECRET:T3ZHW0FVTOXETI3WBQN1WJVWZ4IS1K02KKCUOPVEW5EJTQ0Z


In [196]:
# Locate address in Washington, DC to get a lat long

address = 'Adams Morgan, DC'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

38.9215002 -77.0421992


In [222]:
# Search for a specific category

search_query = 'jazz'
radius = 4000
print(search_query + ' .... OK!')

jazz .... OK!


In [223]:
# Add search query to url 

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=OCWMDYHM2G1414ACMQSU1M4JUE2RGENSCHFRUDUYSIAKH4EC&client_secret=T3ZHW0FVTOXETI3WBQN1WJVWZ4IS1K02KKCUOPVEW5EJTQ0Z&ll=38.9215002,-77.0421992&v=20180604&query=jazz&radius=4000&limit=400'

In [224]:
# get the json from Foursquare

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e365236923935001cdb40be'},
 'response': {'venues': [{'id': '4ad66635f964a520040721e3',
    'name': 'Twins Jazz',
    'location': {'address': '1344 U St NW',
     'crossStreet': 'btwn 13th & 14th St NW',
     'lat': 38.91688730237296,
     'lng': -77.03138484898851,
     'labeledLatLngs': [{'label': 'display',
       'lat': 38.91688730237296,
       'lng': -77.03138484898851}],
     'distance': 1068,
     'postalCode': '20009',
     'cc': 'US',
     'city': 'Washington',
     'state': 'D.C.',
     'country': 'United States',
     'formattedAddress': ['1344 U St NW (btwn 13th & 14th St NW)',
      'Washington, D.C. 20009',
      'United States']},
    'categories': [{'id': '4bf58dd8d48988d1e7931735',
      'name': 'Jazz Club',
      'pluralName': 'Jazz Clubs',
      'shortName': 'Jazz Club',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/arts_entertainment/musicvenue_jazzclub_',
       'suffix': '.png'},
      'primary': True}],
   

In [225]:
# Get relevant parts of the JSON

# assign relevant part of JSON to venues
venues = results['response']['venues']

# tranform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.crossStreet,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,delivery.id,delivery.url,delivery.provider.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.icon.name,venuePage.id
0,4ad66635f964a520040721e3,Twins Jazz,"[{'id': '4bf58dd8d48988d1e7931735', 'name': 'J...",v-1580618369,False,1344 U St NW,btwn 13th & 14th St NW,38.916887,-77.031385,"[{'label': 'display', 'lat': 38.91688730237296...",1068,20009.0,US,Washington,D.C.,United States,"[1344 U St NW (btwn 13th & 14th St NW), Washin...",288013.0,https://www.grubhub.com/restaurant/twins-jazz-...,grubhub,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_grubhub_20180129.png,
1,50f097a9e4b0b237b6448e09,DC Jazz Festival Production Office,"[{'id': '50328a8e91d4c4b30a586d6c', 'name': 'N...",v-1580618369,False,Mozart Pl,,38.924954,-77.037823,"[{'label': 'display', 'lat': 38.92495432291002...",539,20009.0,US,Washington,D.C.,United States,"[Mozart Pl, Washington, D.C. 20009, United Sta...",,,,,,,
2,4c1580c5a1010f4795354e18,Center For Preservation Of Jazz And Blues,"[{'id': '4bf58dd8d48988d1e7931735', 'name': 'J...",v-1580618369,False,,,38.91157,-77.032109,"[{'label': 'display', 'lat': 38.91157, 'lng': ...",1409,,US,Washington,D.C.,United States,"[Washington, D.C., United States]",,,,,,,
3,4dc74448fa76d685ce0ec090,DC Jazz Loft,"[{'id': '4bf58dd8d48988d1e7931735', 'name': 'J...",v-1580618369,False,1402 Meridian Pl NW,,38.933366,-77.033001,"[{'label': 'display', 'lat': 38.933366, 'lng':...",1542,,US,Washington,D.C.,United States,"[1402 Meridian Pl NW, Washington, D.C., United...",,,,,,,
4,56bcb3cc498e583f3373a1b9,The Jazz Flat,"[{'id': '4d954b06a243a5684965b473', 'name': 'R...",v-1580618369,False,,,38.91632,-77.013747,"[{'label': 'display', 'lat': 38.91632, 'lng': ...",2530,,US,Washington,D.C.,United States,"[Washington, D.C., United States]",,,,,,,


In [226]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,id
0,Twins Jazz,Jazz Club,1344 U St NW,btwn 13th & 14th St NW,38.916887,-77.031385,"[{'label': 'display', 'lat': 38.91688730237296...",1068,20009.0,US,Washington,D.C.,United States,"[1344 U St NW (btwn 13th & 14th St NW), Washin...",4ad66635f964a520040721e3
1,DC Jazz Festival Production Office,Non-Profit,Mozart Pl,,38.924954,-77.037823,"[{'label': 'display', 'lat': 38.92495432291002...",539,20009.0,US,Washington,D.C.,United States,"[Mozart Pl, Washington, D.C. 20009, United Sta...",50f097a9e4b0b237b6448e09
2,Center For Preservation Of Jazz And Blues,Jazz Club,,,38.91157,-77.032109,"[{'label': 'display', 'lat': 38.91157, 'lng': ...",1409,,US,Washington,D.C.,United States,"[Washington, D.C., United States]",4c1580c5a1010f4795354e18
3,DC Jazz Loft,Jazz Club,1402 Meridian Pl NW,,38.933366,-77.033001,"[{'label': 'display', 'lat': 38.933366, 'lng':...",1542,,US,Washington,D.C.,United States,"[1402 Meridian Pl NW, Washington, D.C., United...",4dc74448fa76d685ce0ec090
4,The Jazz Flat,Residential Building (Apartment / Condo),,,38.91632,-77.013747,"[{'label': 'display', 'lat': 38.91632, 'lng': ...",2530,,US,Washington,D.C.,United States,"[Washington, D.C., United States]",56bcb3cc498e583f3373a1b9
5,LeRoy Neiman Jazz Cafe,Café,"14th St and Constitution Ave, NW",National Museum of American History,38.891989,-77.030122,"[{'label': 'display', 'lat': 38.89198937986075...",3447,,US,Washington,D.C.,United States,"[14th St and Constitution Ave, NW (National Mu...",55e9faeb498ee151a596c94f
6,Kennedy Center Jazz Club,Jazz Club,2700 F St NW,,38.895805,-77.055673,"[{'label': 'display', 'lat': 38.89580499999999...",3089,20566.0,US,Washington,D.C.,United States,"[2700 F St NW, Washington, D.C. 20566, United ...",4cd06c2ede0f6dcb50a76a63
7,The Salted Peanut Jazz Lounge,Lounge,,,38.89783,-77.048084,"[{'label': 'display', 'lat': 38.89782973195035...",2683,20037.0,US,Washington,D.C.,United States,"[Washington, D.C. 20037, United States]",50bc3af2e4b05be7dcfbbd30
8,Alice's Jazz and Cultural Society,Jazz Club,2813 12th St NE,at Franklin St NE,38.925583,-76.990305,"[{'label': 'display', 'lat': 38.925583, 'lng':...",4517,20017.0,US,Washington,D.C.,United States,"[2813 12th St NE (at Franklin St NE), Washingt...",58853ae0375c4a342e3d206a
9,Felix E. Grant Jazz Archives,General Entertainment,4200 Connecticut Ave NW,Connecticut Ave. & Van Ness St.,38.943623,-77.06392,"[{'label': 'display', 'lat': 38.94362273517887...",3098,20008.0,US,Washington,D.C.,United States,[4200 Connecticut Ave NW (Connecticut Ave. & V...,4d5bf3e9f57ca09056f97be0


In [227]:
# filter 

filtered_categories = set(dataframe_filtered.categories)
filtered_categories

{'Café',
 'General Entertainment',
 'Jazz Club',
 'Lounge',
 'Miscellaneous Shop',
 'Music Venue',
 'New American Restaurant',
 'Non-Profit',
 'Residential Building (Apartment / Condo)'}

In [228]:
dataframe_filtered.describe()

Unnamed: 0,lat,lng,distance
count,15.0,15.0,15.0
mean,38.91801,-77.028447,2595.6
std,0.018409,0.020323,1325.762681
min,38.889677,-77.06392,539.0
25%,38.9047,-77.035412,1475.5
50%,38.920665,-77.031385,2683.0
75%,38.925715,-77.020792,3595.5
max,38.953053,-76.990206,4528.0
