# _Battle of the Neighborhoods - Concert Venues_

## 1. Introduction / Business Problem

I am a musician and a drum corps alum and fanatic, so I like to plan my travel and vacations around the density and location of concert venues.  I am saving money for a big vacation once cities and venues reopen after COVID-19.  The following cities are on my bucket list:

- New York, NY
- San Francisco, CA
- Seattle, WA
- Boston, MA
- Chicago, IL

I will make my selection based on a combination of the density and location of concert venues.  While I understand that each of the above cities hosts events on a regular basis, I do not know how densely its venues are clustered or where I should make a hotel reservation.  The problem I want to solve is therefore to analyze concert venue locations in these cities and find a convenient area in which to book a hotel.



## 2. Data

I will use the Foursquare application programming interface (API) to collect concert venue information for the cities listed above.  The relevant references are:
- Foursquare category documentation: https://developer.foursquare.com/docs/build-with-foursquare/categories
- Concert Hall category ID: 5032792091d4c4b30a586d5c

Using the Foursquare data, I will map each concert hall in the above cities and analyze the results to decide where to go on my vacation.

A preview of the data is below.  However, before we can view the data, we need to install and import various libraries:


In [110]:
# Install geopy

!pip install geopy



In [111]:
# Install folium

!pip install folium



In [112]:
# Import other libraries

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0; older versions import it from pandas.io.json)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium

print('Libraries imported.')

Libraries imported.


In [113]:
# Establish Foursquare credentials

CLIENT_ID = 'E35M35R0TDYOGJIMFO3B5BPCFVHEZTOLIBB143AYE3MTOUVE' # your Foursquare ID
CLIENT_SECRET = 'BABTZXPQMIBOY12Y2UL15GFTOS4GWI5OCAX4VONCKFY4LKI3' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)

Your credentials:
CLIENT_ID: E35M35R0TDYOGJIMFO3B5BPCFVHEZTOLIBB143AYE3MTOUVE


In [114]:
# Obtain the data from Foursquare

LIMIT = 500 # requested results per call; Foursquare caps this (maximum is 100)
cities = ["New York, NY", 'Chicago, IL', 'San Francisco, CA', 'Seattle, WA', 'Boston, MA']
results = {}
for city in cities:
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&near={}&limit={}&categoryId={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        city,
        LIMIT,
        "5032792091d4c4b30a586d5c") # Concert Hall place category ID
    results[city] = requests.get(url).json()
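
The request above formats the city name straight into the URL. As a minimal sketch of an alternative, the query string can be built with the standard library so that spaces and commas in names such as "New York, NY" are percent-encoded. The helper name `build_explore_url` and the placeholder credentials are assumptions for illustration only; the endpoint and category ID are the ones already used in this notebook.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"  # same endpoint as above
CONCERT_HALL_CATEGORY = "5032792091d4c4b30a586d5c"  # Concert Hall category ID from the Data section


def build_explore_url(city, client_id, client_secret, version, limit=100):
    """Hypothetical helper: return an explore URL with every parameter percent-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "near": city,
        "limit": limit,
        "categoryId": CONCERT_HALL_CATEGORY,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials, for illustration only
example_url = build_explore_url("New York, NY", "MY_ID", "MY_SECRET", "20180605")
print(example_url)
```

The resulting string can be passed to `requests.get` exactly as in the loop above.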

In [115]:
# View raw data to make sure the data is correctly imported

results

{'New York, NY': {'meta': {'code': 200,
   'requestId': '5ee1373c1c213b7598830254'},
  'response': {'suggestedFilters': {'header': 'Tap to show:',
    'filters': [{'name': '$-$$$$', 'key': 'price'},
     {'name': 'Open now', 'key': 'openNow'}]},
   'geocode': {'what': '',
    'where': 'new york ny',
    'center': {'lat': 40.742185, 'lng': -73.992602},
    'displayString': 'New York, NY, United States',
    'cc': 'US',
    'geometry': {'bounds': {'ne': {'lat': 40.882214, 'lng': -73.907},
      'sw': {'lat': 40.679548, 'lng': -74.047285}}},
    'slug': 'new-york-city-new-york',
    'longId': '72057594043056517'},
   'headerLocation': 'New York',
   'headerFullLocation': 'New York',
   'headerLocationGranularity': 'city',
   'query': 'concert hall',
   'totalResults': 80,
   'suggestedBounds': {'ne': {'lat': 40.85926316355734,
     'lng': -73.47752102706853},
    'sw': {'lat': 40.58178665371454, 'lng': -74.03809494012417}},
   'groups': [{'type': 'Recommended Places',
     'name': 'recomm

In [116]:
# Create a master dataframe to consolidate data for easy viewing and analysis.

df_master = pd.DataFrame(columns = ['City', 'Num Venues', 'MDMC'])

# City = the name of the respective city
# Num Venues = the number of concert venues in the respective city
# MDMC = mean distance from mean coordinates for the respective city cluster

df_master['City'] = cities
df_master

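|   | City | Num Venues | MDMC |
|---|------|------------|------|
| 0 | New York, NY | NaN | NaN |
| 1 | Chicago, IL | NaN | NaN |
| 2 | San Francisco, CA | NaN | NaN |
| 3 | Seattle, WA | NaN | NaN |
| 4 | Boston, MA | NaN | NaN |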


## 3. Methodology

The methodology is to examine the Foursquare "Concert Hall" data described above to do three things: (1) count the concert venues in each city, (2) calculate the centroid (mean coordinate) of those venues in each city, and (3) use this information to make a more informed decision about where to vacation post COVID-19.  To limit the influence of outliers (e.g., distant concert halls), the most distant venues in each city are excluded from the mean-distance calculation.
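
The three steps can be sketched on made-up coordinates before applying them to the Foursquare data. This is only an illustration using the standard library: the function names, the `n_drop` parameter, and the toy points are assumptions, and the distance is a plain straight-line distance in coordinate units (degrees for latitude/longitude), not kilometers.

```python
import math


def mean_coordinate(points):
    """Mean latitude and longitude of a list of (lat, lng) pairs."""
    lats = [p[0] for p in points]
    lngs = [p[1] for p in points]
    return (sum(lats) / len(lats), sum(lngs) / len(lngs))


def trimmed_mdmc(points, n_drop=3):
    """Mean distance to the mean coordinate, ignoring the n_drop farthest points.

    The mean coordinate is computed from all points; only the distances are trimmed.
    """
    if not 0 <= n_drop < len(points):
        raise ValueError("n_drop must be between 0 and len(points) - 1")
    center = mean_coordinate(points)
    dists = sorted(math.dist(p, center) for p in points)
    kept = dists[:len(dists) - n_drop]
    return sum(kept) / len(kept)


# Made-up coordinates: four venues close together and one far away
toy_venues = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0), (11.0, 11.0)]

print(len(toy_venues))                      # step 1: number of venues
print(mean_coordinate(toy_venues))          # step 2: mean coordinate
print(trimmed_mdmc(toy_venues, n_drop=1))   # step 3: trimmed mean distance
```

In the toy data the single distant point pulls the untrimmed mean distance up noticeably, which is the effect the trimming is meant to dampen.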

In [117]:
# Create a dataframe of the concert venues for each city.

df_venues={}
for city in cities:
    venues = json_normalize(results[city]['response']['groups'][0]['items'])
    df_venues[city] = venues[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']]
    df_venues[city].columns = ['Name', 'Address', 'Lat', 'Lng']
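
For readers who want to see what the normalization step extracts, here is a rough standard-library sketch of the same flattening. The sample records are invented and only mimic the nesting of `results[city]['response']['groups'][0]['items']`; `flatten_items` is a hypothetical helper, not part of pandas or Foursquare.

```python
def flatten_items(items):
    """Turn Foursquare-style 'items' into flat rows; a missing field becomes None."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        rows.append({
            "Name": venue.get("name"),
            "Address": location.get("address"),
            "Lat": location.get("lat"),
            "Lng": location.get("lng"),
        })
    return rows


# Invented records, for illustration only
sample_items = [
    {"venue": {"name": "Hall A", "location": {"address": "1 Main St", "lat": 40.0, "lng": -73.0}}},
    {"venue": {"name": "Hall B", "location": {"lat": 41.0, "lng": -74.0}}},  # no address
]

rows = flatten_items(sample_items)
print(rows)
```

The second record shows why the address can be missing for some venues while the coordinates are still usable for the distance calculations.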
    


In [118]:
# Map the concert venues in each city and record the number of concert venues reported for it.

list_numvenues = []
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)

    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])  

    print(f"Total number of concert halls in {city} = ", results[city]['response']['totalResults'])
    print("Showing Top 100")
    list_numvenues.append(results[city]['response']['totalResults'])
list_numvenues

Total number of concert halls in New York, NY =  80
Showing Top 100
Total number of concert halls in Chicago, IL =  45
Showing Top 100
Total number of concert halls in San Francisco, CA =  26
Showing Top 100
Total number of concert halls in Seattle, WA =  17
Showing Top 100
Total number of concert halls in Boston, MA =  25
Showing Top 100


[80, 45, 26, 17, 25]

In [119]:
# The following maps depict the location of concert halls in each city.
maps[cities[0]]

In [120]:
maps[cities[1]]

In [121]:
maps[cities[2]]

In [122]:
maps[cities[3]]

In [123]:
maps[cities[4]]

In [124]:
# Calculate the mean distance of each concert hall from the mean coordinate (centroid) of its city's venues.
# Distances are Euclidean norms of latitude/longitude differences, so they are expressed in degrees.
# Note the line below that excludes the three (3) most distant venues from the mean.

list_MDMC = []

maps = {}

for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)
    venues_mean_coor = [df_venues[city]['Lat'].mean(), df_venues[city]['Lng'].mean()] 
    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])
        folium.PolyLine([venues_mean_coor, [lat, lng]], color="green", weight=1.5, opacity=0.5).add_to(maps[city])
    
    label = folium.Popup("Mean Co-ordinate", parse_html=True)
    folium.CircleMarker(
        venues_mean_coor,
        radius=10,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(maps[city])

    # venues_mean_coor was computed above and is reused here
    print(city)
    print("Mean Distance from Mean coordinates")
    dists = np.apply_along_axis(lambda x: np.linalg.norm(x - venues_mean_coor),1,df_venues[city][['Lat','Lng']].values)
    dists.sort()
    value = np.mean(dists[:-3]) # Ignore the three most distant concert halls for each city in the MDMC calculation
    print(value)
    list_MDMC.append(value)
list_MDMC


New York, NY
Mean Distance from Mean coordinates
0.037085463733032335
Chicago, IL
Mean Distance from Mean coordinates
0.04859572181663778
San Francisco, CA
Mean Distance from Mean coordinates
0.011450706000091671
Seattle, WA
Mean Distance from Mean coordinates
0.0345251108598156
Boston, MA
Mean Distance from Mean coordinates
0.013596709090995179


[0.037085463733032335,
 0.04859572181663778,
 0.011450706000091671,
 0.0345251108598156,
 0.013596709090995179]
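
The `dists` values above are Euclidean norms of raw latitude/longitude differences, so the printed MDMC figures are in degrees rather than a physical distance. One possible way to get kilometers is the haversine formula, sketched below with the standard library. The function names, the Earth-radius constant, and the sample coordinates are assumptions for illustration; this is not the calculation that produced the numbers above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mdmc_km(points, n_drop=3):
    """Mean haversine distance (km) to the mean coordinate, ignoring the n_drop farthest points."""
    if not 0 <= n_drop < len(points):
        raise ValueError("n_drop must be between 0 and len(points) - 1")
    center_lat = sum(p[0] for p in points) / len(points)
    center_lng = sum(p[1] for p in points) / len(points)
    dists = sorted(haversine_km(lat, lng, center_lat, center_lng) for lat, lng in points)
    kept = dists[:len(dists) - n_drop]
    return sum(kept) / len(kept)


# Invented (lat, lng) pairs: four clustered venues and one distant one
sample_points = [(40.70, -74.00), (40.72, -74.00), (40.74, -74.00), (40.76, -74.00), (41.50, -73.00)]
print(mdmc_km(sample_points, n_drop=1))
```

With real data, `points` could be built from the `Lat` and `Lng` columns of `df_venues[city]`.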

In [125]:
maps[cities[0]]

In [126]:
maps[cities[1]]

In [127]:
maps[cities[2]]

In [128]:
maps[cities[3]]

In [129]:
maps[cities[4]]

## 4. Results

This section consolidates the results of the above analysis.  The table below lists, for each city, the total number of concert halls and the mean distance from each venue to the mean coordinate (centroid) of that city's venues (MDMC).  On inspection, most of the cities have ~3 outlier venues.  Therefore, the three most distant venues were excluded from each city's MDMC calculation so as not to skew the results.

In [130]:
# The following is a consolidated dataframe listing each city, and for each one, the number of concert halls and mean distance from the mean coordinate/centroid of said cluster.

df_master['Num Venues'] = list_numvenues
df_master['MDMC'] = list_MDMC
df_master.sort_values(by = ['MDMC'])

|   | City | Num Venues | MDMC |
|---|------|------------|------|
| 2 | San Francisco, CA | 26 | 0.011451 |
| 4 | Boston, MA | 25 | 0.013597 |
| 3 | Seattle, WA | 17 | 0.034525 |
| 0 | New York, NY | 80 | 0.037085 |
| 1 | Chicago, IL | 45 | 0.048596 |
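
Because the MDMC values are computed from latitude/longitude differences, they are in degrees. A rough way to read them as distances is to bracket each value between a purely east-west and a purely north-south offset. The sketch below assumes about 111 km per degree of latitude and uses rounded city latitudes that I supply by hand (they are not taken from the API output); the helper name is hypothetical.

```python
import math

KM_PER_DEG_LAT = 111.0  # approximate length of one degree of latitude


def degree_distance_bounds_km(dist_deg, latitude_deg):
    """Rough (low, high) km range for a distance expressed in degrees.

    The low end assumes a purely east-west offset, the high end a purely north-south one.
    """
    km_per_deg_lng = KM_PER_DEG_LAT * math.cos(math.radians(latitude_deg))
    return (dist_deg * km_per_deg_lng, dist_deg * KM_PER_DEG_LAT)


# MDMC values from the table, paired with approximate city latitudes (assumed)
mdmc_deg = {
    "San Francisco, CA": (0.011451, 37.8),
    "Boston, MA": (0.013597, 42.4),
    "Seattle, WA": (0.034525, 47.6),
    "New York, NY": (0.037085, 40.7),
    "Chicago, IL": (0.048596, 41.9),
}

for city, (dist_deg, lat) in mdmc_deg.items():
    low, high = degree_distance_bounds_km(dist_deg, lat)
    print(f"{city}: roughly {low:.1f} to {high:.1f} km")
```

These ranges are only an order-of-magnitude check; they do not change the ranking of the cities.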


## 5. Discussion

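New York, NY clearly has the most concert halls of the five cities: 80, nearly twice as many as Chicago (45) and more than three times as many as San Francisco (26), Boston (25), or Seattle (17).  Its mean distance to the mean coordinate (MDMC) of 0.037 is higher than that of every city except Chicago (0.049).  Note that the MDMC is computed directly from latitude and longitude differences, so it is expressed in degrees rather than kilometers; 0.037 degrees corresponds to roughly 3 to 4 km at New York's latitude.  San Francisco, CA and Boston, MA each have ~25 venues and a similar MDMC of ~0.01 degrees (roughly 1 to 1.5 km).  All cities have several venues within a convenient distance of the mean coordinate of their concert halls.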

## 6. Conclusion

New York, NY appears to be the best choice for my next vacation.  It offers nearly twice as many concert halls as Chicago, the next largest city in this project, and at least three times as many as the others.  While New York's MDMC is higher than that of three other cities (San Francisco, Boston and Seattle), the difference is at most ~0.026 degrees (roughly 2 to 3 km), which is small.  Therefore, I will keep an eye on when concert venues in New York, NY start to reopen post COVID-19 and, when they do, reserve a hotel near the center of New York's concert venues.