# Peer-graded Assignment Capstone Project - The Battle of Neighborhoods (Week 2)

## Introduction/Problem

Yesterday I received an email from an entrepreneur who wants to open a bar in New York. The city already has many bars, so competition is strong. To get ahead of that competition, the entrepreneur is looking for the most suitable location and has already raised questions such as: Are there hotels nearby? How many other bars are in the area? Are there other places of entertainment nearby?

The aim of this project is therefore to find the best location for the entrepreneur: one where competition is as low as possible (not too many bars in the area) and where hotels and other places of entertainment are nearby.

The entrepreneur wants to open his bar in the borough of Queens because he lives there and doesn't want to drive far to get to his own business. The scope of this project is therefore Queens, and the most suitable location within Queens will be selected for the entrepreneur.

Queens is the largest borough geographically and is adjacent to the borough of Brooklyn, at the western end of Long Island. To its east is Nassau County. Queens also shares water borders with the boroughs of Manhattan, the Bronx, and Staten Island (via the Rockaways). It is the second-largest borough in population.

Queens has the most diversified economy of the five boroughs of New York City. It is home to John F. Kennedy International Airport and LaGuardia Airport, both among the world's busiest, which in turn makes the airspace above Queens among the busiest in the United States. Landmarks in Queens include Flushing Meadows–Corona Park; Citi Field, home to the New York Mets baseball team; the USTA Billie Jean King National Tennis Center, site of the US Open tennis tournament; Kaufman Astoria Studios; Silvercup Studios; and Aqueduct Racetrack.

## Data needed for project

We need data about venues in the neighborhoods of Queens, New York City. To obtain it we will use Foursquare, a location data provider with information about all manner of venues and events within an area of interest. Such information includes venue names, locations, menus and even photos. The Foursquare location platform will be the sole data source, since all the required information can be obtained through its API.

The data retrieved from Foursquare contains information about venues within Queens. The information obtained per venue is as follows:

1. Neighborhood (taken from the venue's city field)
2. Name of the venue e.g. the name of a store or restaurant
3. Venue Latitude
4. Venue Longitude
5. Venue Category

This data will be used to find the most suitable location for the entrepreneur (see the introduction).
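The five fields listed above can be pulled out of a single response item roughly as follows. This is a minimal sketch: `flatten_item` is a hypothetical helper, the nested layout (`venue`, `location`, `categories`) is assumed from the column names the notebook prints further down, and the sample item is made up for illustration rather than real API output.

```python
def flatten_item(item):
    """Reduce one explore-response item to the five fields used in this project."""
    venue = item.get("venue", {})
    location = venue.get("location", {})
    categories = venue.get("categories", [])
    return {
        "neighborhood": location.get("city"),  # the city field stands in for the neighborhood
        "name": venue.get("name"),
        "lat": location.get("lat"),
        "lng": location.get("lng"),
        # Use the first category as the venue category, if there is one
        "category": categories[0]["name"] if categories else None,
    }


# Hypothetical item, made up for illustration (not real API output)
sample_item = {
    "venue": {
        "name": "Example Tavern",
        "categories": [{"id": "abc123", "name": "Bar"}],
        "location": {"lat": 40.75, "lng": -73.78, "city": "Bayside"},
    }
}

record = flatten_item(sample_item)
print(record)
```

Using `.get` with defaults means an item with a missing field yields `None` for that field instead of raising an error.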

### Using the data
In this section the data is used to find the best place for a new bar.

In [1]:
import json, requests
import pandas as pd
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
import folium

### Plot all the bars on a folium map

In [2]:
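url = 'https://api.foursquare.com/v2/venues/explore'

# Search for bars around a central point in Queens
params = dict(
  client_id= '*****',
  client_secret= '*****',
  v='20180323',                # API version date
  ll='40.742054, -73.769417',  # latitude, longitude of the search center
  query= 'Bar',
  limit=500
)
resp = requests.get(url=url, params=params)
resp.raise_for_status()  # stop early on an HTTP error
results = json.loads(resp.text)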

In [3]:
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues)
nearby_venues.columns

Index(['flags.outsideRadius', 'reasons.count', 'reasons.items', 'referralId',
       'venue.beenHere.count', 'venue.beenHere.lastCheckinExpiredAt',
       'venue.beenHere.marked', 'venue.beenHere.unconfirmedCount',
       'venue.categories', 'venue.delivery.id',
       'venue.delivery.provider.icon.name',
       'venue.delivery.provider.icon.prefix',
       'venue.delivery.provider.icon.sizes', 'venue.delivery.provider.name',
       'venue.delivery.url', 'venue.hereNow.count', 'venue.hereNow.groups',
       'venue.hereNow.summary', 'venue.id', 'venue.location.address',
       'venue.location.cc', 'venue.location.city', 'venue.location.country',
       'venue.location.crossStreet', 'venue.location.distance',
       'venue.location.formattedAddress', 'venue.location.labeledLatLngs',
       'venue.location.lat', 'venue.location.lng',
       'venue.location.neighborhood', 'venue.location.postalCode',
       'venue.location.state', 'venue.name', 'venue.photos.count',
       'venue.photos.gr

In [4]:
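def get_category_type(row):
    # The category list sits under 'categories' or 'venue.categories',
    # depending on whether the column names have been cleaned yet
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # Return the name of the first category, or None if there is none
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']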

In [5]:
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng', 'venue.location.address', 'venue.location.city']
nearby_venues = nearby_venues.loc[:, filtered_columns]
nearby_venues.head()

| | venue.name | venue.categories | venue.location.lat | venue.location.lng | venue.location.address | venue.location.city |
|---|---|---|---|---|---|---|
| 0 | Press 195 | [{'id': '4bf58dd8d48988d116941735', 'name': 'B... | 40.763947 | -73.770881 | 4011 Bell Blvd | Bayside |
| 1 | Fillmore's Tavern | [{'id': '4bf58dd8d48988d116941735', 'name': 'B... | 40.736552 | -73.802801 | 166-02 65th Ave | Fresh Meadows |
| 2 | BB's Pub & Grill | [{'id': '4bf58dd8d48988d116941735', 'name': 'B... | 40.753585 | -73.794235 | 17157 46th Ave | Flushing |
| 3 | Fiamma 41 | [{'id': '4bf58dd8d48988d116941735', 'name': 'B... | 40.763707 | -73.769949 | 214-26 41st Ave | Bayside |
| 4 | Brian Dempsey's | [{'id': '4bf58dd8d48988d116941735', 'name': 'B... | 40.764426 | -73.771369 | 3931 Bell Blvd | Bayside |


In [6]:
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

| | name | categories | lat | lng | address | city |
|---|---|---|---|---|---|---|
| 0 | Press 195 | Bar | 40.763947 | -73.770881 | 4011 Bell Blvd | Bayside |
| 1 | Fillmore's Tavern | Bar | 40.736552 | -73.802801 | 166-02 65th Ave | Fresh Meadows |
| 2 | BB's Pub & Grill | Bar | 40.753585 | -73.794235 | 17157 46th Ave | Flushing |
| 3 | Fiamma 41 | Bar | 40.763707 | -73.769949 | 214-26 41st Ave | Bayside |
| 4 | Brian Dempsey's | Bar | 40.764426 | -73.771369 | 3931 Bell Blvd | Bayside |


In [7]:
m_bars = folium.Map(location=[40.742054, -73.769417], zoom_start = 12)

markers_colors = []
for lat, lon, name in zip(nearby_venues['lat'], 
                                   nearby_venues['lng'], 
                                   nearby_venues['name']):
    label = folium.Popup(str(name), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        fill=True,
        fill_opacity=0.7).add_to(m_bars)
       
m_bars

### Plot all hotels on a folium map

In [8]:
url = 'https://api.foursquare.com/v2/venues/explore'

params = dict(
  client_id= '*****',
  client_secret= '*****',
  v='20180323',
  ll='40.742054, -73.769417',
  query= 'Hotel',
  limit=500
)
resp2 = requests.get(url=url, params=params)
hotels = json.loads(resp2.text)

In [9]:
hotels = hotels['response']['groups'][0]['items']
nearby_hotels = json_normalize(hotels)
nearby_hotels.columns

Index(['flags.outsideRadius', 'reasons.count', 'reasons.items', 'referralId',
       'venue.beenHere.count', 'venue.beenHere.lastCheckinExpiredAt',
       'venue.beenHere.marked', 'venue.beenHere.unconfirmedCount',
       'venue.categories', 'venue.delivery.id',
       'venue.delivery.provider.icon.name',
       'venue.delivery.provider.icon.prefix',
       'venue.delivery.provider.icon.sizes', 'venue.delivery.provider.name',
       'venue.delivery.url', 'venue.hereNow.count', 'venue.hereNow.groups',
       'venue.hereNow.summary', 'venue.id', 'venue.location.address',
       'venue.location.cc', 'venue.location.city', 'venue.location.country',
       'venue.location.crossStreet', 'venue.location.distance',
       'venue.location.formattedAddress', 'venue.location.labeledLatLngs',
       'venue.location.lat', 'venue.location.lng',
       'venue.location.neighborhood', 'venue.location.postalCode',
       'venue.location.state', 'venue.name', 'venue.photos.count',
       'venue.photos.gr

In [10]:
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng', 'venue.location.address', 'venue.location.city']
nearby_hotels = nearby_hotels.loc[:, filtered_columns]
nearby_hotels.head()

| | venue.name | venue.categories | venue.location.lat | venue.location.lng | venue.location.address | venue.location.city |
|---|---|---|---|---|---|---|
| 0 | Wyndham Garden Fresh Meadows | [{'id': '4bf58dd8d48988d1fa931735', 'name': 'H... | 40.739418 | -73.787829 | 6127 186th Street | New York |
| 1 | Fairfield Inn & Suites by Marriott New York Qu... | [{'id': '4bf58dd8d48988d1fa931735', 'name': 'H... | 40.740296 | -73.790168 | 183-31 Horace Harding Expy | Fresh Meadows |
| 2 | Anchor Inn | [{'id': '4bf58dd8d48988d1fa931735', 'name': 'H... | 40.760724 | -73.766404 | 215-34 Northern Blvd | Bayside |
| 3 | Hyatt Place Flushing/Laguardia Airport | [{'id': '4bf58dd8d48988d1fa931735', 'name': 'H... | 40.75946 | -73.832799 | 133-42 39th Ave | Flushing |
| 4 | Wyndham Garden Mayflower | [{'id': '4bf58dd8d48988d1fa931735', 'name': 'H... | 40.736984 | -73.78387 | 64-45 188th Street | Fresh Meadows |


In [11]:
nearby_hotels['venue.categories'] = nearby_hotels.apply(get_category_type, axis=1)

# clean columns
nearby_hotels.columns = [col.split(".")[-1] for col in nearby_hotels.columns]

nearby_hotels.head()

| | name | categories | lat | lng | address | city |
|---|---|---|---|---|---|---|
| 0 | Wyndham Garden Fresh Meadows | Hotel | 40.739418 | -73.787829 | 6127 186th Street | New York |
| 1 | Fairfield Inn & Suites by Marriott New York Qu... | Hotel | 40.740296 | -73.790168 | 183-31 Horace Harding Expy | Fresh Meadows |
| 2 | Anchor Inn | Hotel | 40.760724 | -73.766404 | 215-34 Northern Blvd | Bayside |
| 3 | Hyatt Place Flushing/Laguardia Airport | Hotel | 40.75946 | -73.832799 | 133-42 39th Ave | Flushing |
| 4 | Wyndham Garden Mayflower | Hotel | 40.736984 | -73.78387 | 64-45 188th Street | Fresh Meadows |


In [12]:
m_hotels = folium.Map(location=[40.742054, -73.769417], zoom_start = 12)

markers_colors = []
for lat, lon, name in zip(nearby_hotels['lat'], 
                            nearby_hotels['lng'], 
                            nearby_hotels['name']):
    label = folium.Popup(str(name), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='#F41616',
        fill=True,
        fill_opacity=0.7).add_to(m_hotels)
       
m_hotels

### Plot all hotels and bars in one folium map

In [13]:
m_hotelsandbars = folium.Map(location=[40.742054, -73.769417], zoom_start = 12)

markers_colors = []
for lat, lon, name in zip(nearby_hotels['lat'], 
                            nearby_hotels['lng'], 
                            nearby_hotels['name']):
    label = folium.Popup(str(name) + ' (Hotel)', parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='#F41616',
        fill=True,
        fill_opacity=0.7).add_to(m_hotelsandbars)

for lat, lon, name in zip(nearby_venues['lat'], 
                            nearby_venues['lng'], 
                            nearby_venues['name']):
    label = folium.Popup(str(name) + ' (Bar)', parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        fill=True,
        fill_opacity=0.7).add_to(m_hotelsandbars)
       
m_hotelsandbars

In the folium map above, the blue dots are bars and the red dots are hotels.

### Checking venues other than bars/hotels in Queens
I had to do this in parts because I couldn't get all of the venues in Queens in one go. Even when I changed the limit to 10000, a single request still returned only 100-200 venues, so the five queries below use different center points.
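The query centers in this notebook were chosen by hand. One way to generate such centers systematically is a regular grid over a bounding box. The following is a minimal sketch, not the method used here: `grid_centers` is a hypothetical helper, and the bounding box is an assumption picked by eye so that it encloses the five centers used in this notebook.

```python
def grid_centers(south, west, north, east, rows, cols):
    """Return 'lat,lng' strings for the cell centers of a rows x cols grid."""
    lat_step = (north - south) / rows
    lng_step = (east - west) / cols
    centers = []
    for r in range(rows):
        for c in range(cols):
            # Place each point in the middle of its grid cell
            lat = south + (r + 0.5) * lat_step
            lng = west + (c + 0.5) * lng_step
            centers.append(f"{lat:.6f},{lng:.6f}")
    return centers


# Assumed bounding box around the part of Queens covered in this notebook
centers = grid_centers(south=40.70, west=-73.83, north=40.78, east=-73.76,
                       rows=2, cols=3)
for ll in centers:
    print(ll)
```

Each string can then be passed as the `ll` parameter of a request. A finer grid gives better coverage at the cost of more requests and more overlap between results.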

First part of the venues

In [14]:
url = 'https://api.foursquare.com/v2/venues/explore'

params = dict(
  client_id= '*****',
  client_secret= '*****',
  v='20180323',
  ll='40.742054, -73.769417',
  limit=5000
)
resp = requests.get(url=url, params=params)
commonvenues = json.loads(resp.text)

commonvenues = commonvenues['response']['groups'][0]['items']
common_venues = json_normalize(commonvenues)

filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng', 'venue.location.address', 'venue.location.city']
common_venues = common_venues.loc[:, filtered_columns]

common_venues['venue.categories'] = common_venues.apply(get_category_type, axis=1)

# clean columns
common_venues.columns = [col.split(".")[-1] for col in common_venues.columns]

common_venues.head()

| | name | categories | lat | lng | address | city |
|---|---|---|---|---|---|---|
| 0 | Cunningham Park North Woods Mountain Bike Trails | Bike Trail | 40.742346 | -73.765238 | 210 Street & 67th Avenue | Queens |
| 1 | Fresh Meadows Pizzeria and Restaurant | Pizza Place | 40.73711 | -73.778166 | 19509 69th Ave | Fresh Meadows |
| 2 | Cunningham Park Trail | Trail | 40.734473 | -73.773975 | NaN | Fresh Meadows |
| 3 | Sweet Adele's | Snack Place | 40.73883 | -73.762745 | 73-10 Bell Blvd | Oakland Gardens |
| 4 | AMC Fresh Meadows 7 | Movie Theater | 40.741098 | -73.784097 | 190-02 Horace Harding Expy | Fresh Meadows |


Second part of the venues

In [15]:
url = 'https://api.foursquare.com/v2/venues/explore'

params = dict(
  client_id= '*****',
  client_secret= '*****',
  v='20180323',
  ll='40.744574, -73.825109',
  limit=5000
)
resp = requests.get(url=url, params=params)
commonvenues2 = json.loads(resp.text)

commonvenues2 = commonvenues2['response']['groups'][0]['items']
common_venues2 = json_normalize(commonvenues2)

filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng', 'venue.location.address', 'venue.location.city']
common_venues2 = common_venues2.loc[:, filtered_columns]

common_venues2['venue.categories'] = common_venues2.apply(get_category_type, axis=1)

# clean columns
common_venues2.columns = [col.split(".")[-1] for col in common_venues2.columns]

common_venues2.head()

| | name | categories | lat | lng | address | city |
|---|---|---|---|---|---|---|
| 0 | Kung Fu Xiao Long Bao | Dumpling Restaurant | 40.74338 | -73.825741 | 59-16 Main St | Flushing |
| 1 | Yeh's Bakery 紅葉 | Bakery | 40.745714 | -73.825912 | 5725 Main St | Flushing |
| 2 | New Bodai Vegetarian 新菩提素食 | Vegetarian / Vegan Restaurant | 40.74363 | -73.825807 | 5908 Main St | Flushing |
| 3 | Tea Shop 168 & Bakery | Bakery | 40.743241 | -73.825726 | 5920 Main St | Flushing |
| 4 | Main Street Taiwanese Gourmet 北港台菜 | Chinese Restaurant | 40.743538 | -73.825825 | 59-10 Main St | Flushing |


Third part of the venues

In [16]:
url = 'https://api.foursquare.com/v2/venues/explore'

params = dict(
  client_id= '*****',
  client_secret= '*****',
  v='20180323',
  ll='40.770260, -73.810282',
  limit=5000
)
resp = requests.get(url=url, params=params)
commonvenues3 = json.loads(resp.text)

commonvenues3 = commonvenues3['response']['groups'][0]['items']
common_venues3 = json_normalize(commonvenues3)

filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng', 'venue.location.address', 'venue.location.city']
common_venues3 = common_venues3.loc[:, filtered_columns]

common_venues3['venue.categories'] = common_venues3.apply(get_category_type, axis=1)

# clean columns
common_venues3.columns = [col.split(".")[-1] for col in common_venues3.columns]

common_venues3.head()

| | name | categories | lat | lng | address | city |
|---|---|---|---|---|---|---|
| 0 | Bowne Park | Park | 40.7702 | -73.807699 | 29th Ave | Flushing |
| 1 | Mad For Chicken | Korean Restaurant | 40.763426 | -73.807724 | 15718 Northern Blvd | Flushing |
| 2 | Hahm Ji Bach - 함지박 | Korean Restaurant | 40.763022 | -73.815042 | 40-11 149th Pl | Flushing |
| 3 | NY Puppy Club | Pet Service | 40.765407 | -73.817102 | 149-05 Northern Blvd | Flushing |
| 4 | Mapo BBQ | Korean Restaurant | 40.762309 | -73.81488 | 14924 41st Ave | Flushing |


Fourth part of the venues

In [17]:
url = 'https://api.foursquare.com/v2/venues/explore'

params = dict(
  client_id= '*****',
  client_secret= '*****',
  v='20180323',
  ll='40.705672, -73.811867',
  limit=5000
)
resp = requests.get(url=url, params=params)
commonvenues4 = json.loads(resp.text)

commonvenues4 = commonvenues4['response']['groups'][0]['items']
common_venues4 = json_normalize(commonvenues4)

filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng', 'venue.location.address', 'venue.location.city']
common_venues4 = common_venues4.loc[:, filtered_columns]

common_venues4['venue.categories'] = common_venues4.apply(get_category_type, axis=1)

# clean columns
common_venues4.columns = [col.split(".")[-1] for col in common_venues4.columns]

common_venues4.head()

| | name | categories | lat | lng | address | city |
|---|---|---|---|---|---|---|
| 0 | Punto Rojo | South American Restaurant | 40.705917 | -73.809005 | 14716 Hillside Ave | Jamaica |
| 1 | Hado Sushi | Sushi Restaurant | 40.707861 | -73.81707 | 138-40 86th Ave | Briarwood |
| 2 | iLoveKickboxing | Boxing Gym | 40.702594 | -73.818747 | 132-40 Metropolitan Ave | Richmond Hill |
| 3 | Pupusa Market | Latin American Restaurant | 40.70195 | -73.809273 | 14516 Jamaica Ave | Jamaica |
| 4 | El Rey Restaurant | Spanish Restaurant | 40.706008 | -73.809158 | 15000 Hillside Ave | Jamaica |


Fifth part of the venues

In [18]:
url = 'https://api.foursquare.com/v2/venues/explore'

params = dict(
  client_id= '*****',
  client_secret= '*****',
  v='20180323',
  ll='40.732000, -73.814583',
  limit=5000
)
resp = requests.get(url=url, params=params)
commonvenues5 = json.loads(resp.text)

commonvenues5 = commonvenues5['response']['groups'][0]['items']
common_venues5 = json_normalize(commonvenues5)

filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng', 'venue.location.address', 'venue.location.city']
common_venues5 = common_venues5.loc[:, filtered_columns]

common_venues5['venue.categories'] = common_venues5.apply(get_category_type, axis=1)

# clean columns
common_venues5.columns = [col.split(".")[-1] for col in common_venues5.columns]

common_venues5.head()

| | name | categories | lat | lng | address | city |
|---|---|---|---|---|---|---|
| 0 | Micro Center | Electronics Store | 40.72898 | -73.814787 | 71-43 Kissena Blvd | Flushing |
| 1 | Valentino's Pizzeria & Restaurant | Pizza Place | 40.728667 | -73.815084 | 71-47 Kissena Blvd | Flushing |
| 2 | The Oneness-Fountain-Heart | Vegetarian / Vegan Restaurant | 40.727897 | -73.811311 | 157-19 72nd Ave | Flushing |
| 3 | Gino's Pizzeria | Pizza Place | 40.737097 | -73.814561 | 6501 Kissena Blvd | Flushing |
| 4 | Naomi's Kosher Pizza | Pizza Place | 40.732449 | -73.825176 | 6828 Main St | Flushing |


In [19]:
df_allvenues = pd.concat([common_venues, common_venues2, common_venues3, common_venues4, common_venues5])
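
The five search centers lie fairly close together, so the same venue may appear in more than one result set. The notebook does not check for this. A minimal sketch of how such duplicates could be dropped after concatenation, using two small made-up frames rather than the real results:

```python
import pandas as pd

# Hypothetical results of two overlapping queries (values made up for illustration)
part_a = pd.DataFrame({
    "name": ["Cafe One", "Park Two"],
    "lat": [40.74, 40.73],
    "lng": [-73.77, -73.78],
})
part_b = pd.DataFrame({
    "name": ["Park Two", "Diner Three"],
    "lat": [40.73, 40.75],
    "lng": [-73.78, -73.80],
})

# Stack the parts, then keep one row per (name, lat, lng) combination
combined = pd.concat([part_a, part_b], ignore_index=True)
deduped = combined.drop_duplicates(subset=["name", "lat", "lng"]).reset_index(drop=True)

print(len(combined), "rows before,", len(deduped), "rows after")
```

Matching on name and coordinates together is a design choice made for this sketch: two different venues can share a name, and matching on coordinates alone could merge venues in the same building.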

In [20]:
m_othervenues = folium.Map(location=[40.742054, -73.769417], zoom_start = 12)

markers_colors = []
for lat, lon, name in zip(df_allvenues['lat'], 
                            df_allvenues['lng'], 
                            df_allvenues['name']):
    label = folium.Popup(str(name), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        fill=True,
        fill_opacity=0.7).add_to(m_othervenues)
       
m_othervenues

### Combine bars, hotels and all other venues in one map

In [21]:
m_everything = folium.Map(location=[40.742054, -73.769417], zoom_start = 12)

markers_colors = []
for lat, lon, name in zip(df_allvenues['lat'], 
                            df_allvenues['lng'], 
                            df_allvenues['name']):
    label = folium.Popup(str(name) + ' (Other)', parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='#14B400',
        fill=True,
        fill_opacity=0.7).add_to(m_everything)
    
for lat, lon, name in zip(nearby_hotels['lat'], 
                            nearby_hotels['lng'], 
                            nearby_hotels['name']):
    label = folium.Popup(str(name) + ' (Hotel)', parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='#F41616',
        fill=True,
        fill_opacity=0.7).add_to(m_everything)

for lat, lon, name in zip(nearby_venues['lat'], 
                            nearby_venues['lng'], 
                            nearby_venues['name']):
    label = folium.Popup(str(name) + ' (Bar)', parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        fill=True,
        fill_opacity=0.7).add_to(m_everything)
       
m_everything

Green dots are other venues, red dots are hotels and blue dots are bars.

### Find the best neighbourhood for a new bar
The best spot is one with many hotels and other venues but few bars. Venues are grouped by their city field, which serves as the neighbourhood name here.

In [22]:
df_totalhotelsneigh = nearby_hotels['city'].value_counts()

In [23]:
df_totalbarsneigh = nearby_venues['city'].value_counts()

In [24]:
df_totalothervenues = df_allvenues['city'].value_counts()

In [25]:
df_everything = pd.concat([df_totalbarsneigh, df_totalhotelsneigh, df_totalothervenues], axis=1, sort=False)
df_everything.columns = ['bars', 'hotels', 'other_venues']
df_everything

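| | bars | hotels | other_venues |
|---|---|---|---|
| Bayside | 26.0 | 4.0 | 45.0 |
| Flushing | 25.0 | 16.0 | 253.0 |
| Jamaica | 9.0 | 17.0 | 69.0 |
| Forest Hills | 6.0 | 1.0 | NaN |
| Fresh Meadows | 5.0 | 3.0 | 45.0 |
| Queens | 5.0 | 5.0 | 19.0 |
| Bellerose | 4.0 | 1.0 | NaN |
| New York | 4.0 | 4.0 | 13.0 |
| Little Neck | 3.0 | NaN | NaN |
| Douglaston | 3.0 | NaN | NaN |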


In [26]:
df_everything['total'] = df_everything['hotels'] + df_everything['other_venues'] + df_everything['bars']

In [27]:
df_everything

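| | bars | hotels | other_venues | total |
|---|---|---|---|---|
| Bayside | 26.0 | 4.0 | 45.0 | 75.0 |
| Flushing | 25.0 | 16.0 | 253.0 | 294.0 |
| Jamaica | 9.0 | 17.0 | 69.0 | 95.0 |
| Forest Hills | 6.0 | 1.0 | NaN | NaN |
| Fresh Meadows | 5.0 | 3.0 | 45.0 | 53.0 |
| Queens | 5.0 | 5.0 | 19.0 | 29.0 |
| Bellerose | 4.0 | 1.0 | NaN | NaN |
| New York | 4.0 | 4.0 | 13.0 | 21.0 |
| Little Neck | 3.0 | NaN | NaN | NaN |
| Douglaston | 3.0 | NaN | NaN | NaN |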


In [28]:
df_everything['Percentage_Bars'] = (df_everything['bars'] / df_everything['total']) *100

In [29]:
df_everything

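| | bars | hotels | other_venues | total | Percentage_Bars |
|---|---|---|---|---|---|
| Bayside | 26.0 | 4.0 | 45.0 | 75.0 | 34.666667 |
| Flushing | 25.0 | 16.0 | 253.0 | 294.0 | 8.503401 |
| Jamaica | 9.0 | 17.0 | 69.0 | 95.0 | 9.473684 |
| Forest Hills | 6.0 | 1.0 | NaN | NaN | NaN |
| Fresh Meadows | 5.0 | 3.0 | 45.0 | 53.0 | 9.433962 |
| Queens | 5.0 | 5.0 | 19.0 | 29.0 | 17.241379 |
| Bellerose | 4.0 | 1.0 | NaN | NaN | NaN |
| New York | 4.0 | 4.0 | 13.0 | 21.0 | 19.047619 |
| Little Neck | 3.0 | NaN | NaN | NaN | NaN |
| Douglaston | 3.0 | NaN | NaN | NaN | NaN |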
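
Several rows in the table above have no total because adding a number to NaN gives NaN. If a missing count is taken to mean zero venues of that type, the total and the percentage can be computed for every row. That reading is an assumption: a count may also be missing simply because the five queries did not cover that area. A minimal sketch using four rows copied from the table:

```python
import numpy as np
import pandas as pd

# Counts copied from the table above (a subset of the rows)
counts = pd.DataFrame(
    {
        "bars": [26, 25, 9, 6],
        "hotels": [4, 16, 17, 1],
        "other_venues": [45, 253, 69, np.nan],
    },
    index=["Bayside", "Flushing", "Jamaica", "Forest Hills"],
)

# Assumption: a missing count means zero venues of that type
filled = counts.fillna(0)
filled["total"] = filled[["bars", "hotels", "other_venues"]].sum(axis=1)
filled["Percentage_Bars"] = filled["bars"] / filled["total"] * 100

print(filled.round(2))
```

Under this assumption Forest Hills gets a total of 7 venues, 6 of them bars, which illustrates how strongly small or incomplete counts can distort the percentage.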


### Conclusion
Flushing has the lowest share of bars relative to all venues (about 8.5%). It has many bars in absolute terms (25), but they still make up the smallest share of its total venues. Neighbourhoods with missing counts (Forest Hills, Bellerose, Little Neck and Douglaston) have no percentage and could not be compared.

If the entrepreneur prefers a neighbourhood with few bars but plenty of hotels and other venues, Jamaica is an option: it has only 9 bars against 17 hotels and 69 other venues.
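
The ranking behind this conclusion can be read off by sorting the percentage column. A minimal sketch, using only the rows of the final table that have a complete count:

```python
import pandas as pd

# Percentage_Bars values copied from the final table (rows without NaN)
shares = pd.Series(
    {
        "Bayside": 34.666667,
        "Flushing": 8.503401,
        "Jamaica": 9.473684,
        "Fresh Meadows": 9.433962,
        "Queens": 17.241379,
        "New York": 19.047619,
    },
    name="Percentage_Bars",
)

# Lowest share of bars first
ranking = shares.sort_values()
print(ranking.head(3))
```

Flushing, Fresh Meadows and Jamaica come out as the three lowest, all below 10%.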
