# BATTLE OF THE NEIGHBORHOODS PROJECT
## For the Coursera IBM Data Science Course Capstone Project (Week 2 Assignment) 

## Background of the Problem Statement

<p>Building a strong chain of outlets is essential for any business that wants to survive as a brand today. Many brands fail to sustain themselves in the market despite offering good products and services, often for one key reason: an outlet or branch was placed badly, or the brand did not expand to the right location at the right time.</p>
    <p>The right time to expand depends on the brand itself, and usually on the quality of its products and services, since good quality brings in the funding needed to expand. After that, it is up to the brand to invest its resources either in expansion or in modifying existing outlets.</p>
    <p>Modifying an existing outlet or branch is a worthwhile step, but in most cases its effect on profits is much smaller than that of expansion.</p>
    <p>Choosing the right neighborhood for a new outlet or branch therefore requires thorough background research, because a single wrong placement can cause a heavy loss. This is the problem we address in this project. Our area of study is New York City.</p>
    

## Problem Statement

Keeping in mind the problem described in the background, we consider a sample client: the owner of a pizza place. The problem statement is therefore:<br>
**"To find the best locations in New York City for the expansion of a Pizza Place based in Carnegie Hill, Manhattan, NY."**

## Data Description

<p>Our data is a JSON file that lists all the boroughs of New York City, their neighborhoods, and the location of each neighborhood as latitude and longitude.</p>
<p>We will also use the Foursquare API to retrieve the common venues in every neighborhood. For a given location and search radius, the API returns the nearby venues (name, category and latitude/longitude) as a JSON response.</p>

## Data Usage

<p>The retrieved data will be used to list the venues near each neighborhood, and then evaluated to find which areas have too few pizza places and/or too few venues overall.</p>
<p>The result will be presented to the client as the areas best suited for expanding the business.</p>
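The selection criterion above can be sketched in pandas. This is a minimal illustration, assuming a venues table with `Neighborhood` and `Venue Category` columns like the ones built later in this notebook. The helper name `rank_neighborhoods` and the toy rows are hypothetical, and ranking by fewest pizza places first, then fewest venues, is one possible ordering rather than the project's final method.

```python
import pandas as pd

def rank_neighborhoods(venues, category='Pizza Place'):
    """Count venues of `category` and all venues per neighborhood, least saturated first."""
    summary = venues.groupby('Neighborhood').agg(
        total_venues=('Venue Category', 'size'),
        category_venues=('Venue Category', lambda s: (s == category).sum()),
    )
    # fewer competing pizza places first, then fewer venues overall
    return summary.sort_values(['category_venues', 'total_venues'])

# toy data for illustration only, not Foursquare results
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B', 'B', 'C'],
    'Venue Category': ['Pizza Place', 'Pizza Place', 'Coffee Shop',
                       'Pizza Place', 'Gym', 'Diner'],
})
print(rank_neighborhoods(venues))
```

Neighborhood `C` comes first here because it has no pizza place at all.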

# Project Code

## Importing all required dependencies

In [1]:
```python
import json
import pandas as pd
import numpy as np
import requests
import folium
from geopy.geocoders import Nominatim  # geocoder used to look up New York City's coordinates
```

#### Load the JSON dataset of New York City neighborhoods

Load the dataset

In [2]:
```python
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
```

Look into the dataset for further use

In [3]:
```python
newyork_data
```

```
{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
```

Notice how all the relevant data is in the *features* key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [4]:
```python
neighborhoods_data = newyork_data['features']
```

Let's take a look at the first item in this list.

In [5]:
```python
neighborhoods_data[0]
```

```
{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}
```

#### Transform the data into a *pandas* dataframe

In [6]:
```python
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
```

Take a look at the empty dataframe to confirm that the columns are as intended.

In [7]:
```python
neighborhoods
```

| Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|


Next, let's loop through the features, collect one row per neighborhood, and then quickly check that the resulting DataFrame contains all 5 boroughs and 306 neighborhoods.

In [8]:
```python
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_lon, neighborhood_lat = data['geometry']['coordinates']

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build the frame once from the collected rows
neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods.head()
```

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


In [9]:
```python
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)
```

```
The dataframe has 5 boroughs and 306 neighborhoods.
```
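As an alternative to the row-by-row loop, `pandas.json_normalize` can flatten the list of features in one call. The sketch below uses only two features copied from the output above rather than the full file; on the real data you would pass `neighborhoods_data` instead. The name `neighborhoods_alt` is hypothetical.

```python
import pandas as pd

# two features in the same shape as newyork_data['features'] (values copied from above)
features = [
    {'geometry': {'type': 'Point', 'coordinates': [-73.84720052054902, 40.89470517661]},
     'properties': {'name': 'Wakefield', 'borough': 'Bronx'}},
    {'geometry': {'type': 'Point', 'coordinates': [-73.82993910812398, 40.87429419303012]},
     'properties': {'name': 'Co-op City', 'borough': 'Bronx'}},
]

# nested keys become dotted column names such as 'properties.name'
flat = pd.json_normalize(features)

# GeoJSON coordinates are [longitude, latitude]
coords = pd.DataFrame(flat['geometry.coordinates'].tolist(), columns=['Longitude', 'Latitude'])

neighborhoods_alt = pd.DataFrame({
    'Borough': flat['properties.borough'],
    'Neighborhood': flat['properties.name'],
    'Latitude': coords['Latitude'],
    'Longitude': coords['Longitude'],
})
print(neighborhoods_alt)

# the same sanity check as above
print('{} boroughs, {} neighborhoods'.format(
    neighborhoods_alt['Borough'].nunique(), len(neighborhoods_alt)))
```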


#### Use the geopy library to get the latitude and longitude of New York City.

In [10]:
```python
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))
```

```
The geographical coordinates of New York City are 40.7127281, -74.0060152.
```


#### Create a map of New York with neighborhoods superimposed on top.


In [11]:
```python
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add a circle marker with a "Neighborhood, Borough" popup for every neighborhood
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_newyork)

map_newyork
```

#### Define Foursquare Credentials and Version

In [12]:
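```python
import os

# read the credentials from environment variables instead of hard-coding secrets in the notebook
# (the variable names are arbitrary; set them before running this cell)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID')  # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET')  # your Foursquare Secret
VERSION = '20180605'  # Foursquare API version
```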

#### Let's explore all boroughs in our DataFrame

In [13]:
```python
neighborhoods['Borough'].unique()
```

```
array(['Bronx', 'Manhattan', 'Brooklyn', 'Queens', 'Staten Island'],
      dtype=object)
```

New York City has 5 boroughs, each of which can be analysed to select the best location for our client to expand their business.

Let's split the data into a separate DataFrame per borough for easier analysis of the neighborhoods in each borough.

In [14]:
```python
bronx_data = neighborhoods[neighborhoods['Borough'] == 'Bronx'].reset_index(drop=True)
bronx_data.head()
```

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


In [15]:
```python
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()
```

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 1 | Manhattan | Chinatown | 40.715618 | -73.994279 |
| 2 | Manhattan | Washington Heights | 40.851903 | -73.9369 |
| 3 | Manhattan | Inwood | 40.867684 | -73.92121 |
| 4 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 |


In [16]:
```python
brooklyn_data = neighborhoods[neighborhoods['Borough'] == 'Brooklyn'].reset_index(drop=True)
brooklyn_data.head()
```

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Brooklyn | Bay Ridge | 40.625801 | -74.030621 |
| 1 | Brooklyn | Bensonhurst | 40.611009 | -73.99518 |
| 2 | Brooklyn | Sunset Park | 40.645103 | -74.010316 |
| 3 | Brooklyn | Greenpoint | 40.730201 | -73.954241 |
| 4 | Brooklyn | Gravesend | 40.59526 | -73.973471 |


In [17]:
```python
queens_data = neighborhoods[neighborhoods['Borough'] == 'Queens'].reset_index(drop=True)
queens_data.head()
```

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Queens | Astoria | 40.768509 | -73.915654 |
| 1 | Queens | Woodside | 40.746349 | -73.901842 |
| 2 | Queens | Jackson Heights | 40.751981 | -73.882821 |
| 3 | Queens | Elmhurst | 40.744049 | -73.881656 |
| 4 | Queens | Howard Beach | 40.654225 | -73.838138 |


In [18]:
```python
staten_island_data = neighborhoods[neighborhoods['Borough'] == 'Staten Island'].reset_index(drop=True)
staten_island_data.head()
```

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Staten Island | St. George | 40.644982 | -74.079353 |
| 1 | Staten Island | New Brighton | 40.640615 | -74.087017 |
| 2 | Staten Island | Stapleton | 40.626928 | -74.077902 |
| 3 | Staten Island | Rosebank | 40.615305 | -74.069805 |
| 4 | Staten Island | West Brighton | 40.631879 | -74.107182 |
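The five filtering cells above repeat the same expression with a different borough name. A more compact sketch groups once by `Borough` and keeps the pieces in a dictionary. The toy rows reuse values from the tables above, and `sample_neighborhoods` and `borough_data` are hypothetical names chosen so the notebook's own variables are not overwritten.

```python
import pandas as pd

# toy rows standing in for the `neighborhoods` DataFrame
sample_neighborhoods = pd.DataFrame({
    'Borough': ['Bronx', 'Bronx', 'Manhattan', 'Queens'],
    'Neighborhood': ['Wakefield', 'Co-op City', 'Marble Hill', 'Astoria'],
    'Latitude': [40.894705, 40.874294, 40.876551, 40.768509],
    'Longitude': [-73.847201, -73.829939, -73.91066, -73.915654],
})

# one DataFrame per borough, keyed by borough name, each re-indexed from 0
borough_data = {
    borough: group.reset_index(drop=True)
    for borough, group in sample_neighborhoods.groupby('Borough')
}

print(sorted(borough_data))
print(borough_data['Bronx'])
```

With the real data, `borough_data['Bronx']` would hold the same rows as `bronx_data`.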


Next, let's define a function that sends a GET request to the Foursquare `explore` endpoint for each neighborhood, using a search radius of 500 m and a limit of 100 venues, and collects the results in a single DataFrame.

In [19]:
```python
LIMIT = 100  # maximum number of venues returned per neighborhood

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return a DataFrame of Foursquare venues within `radius` metres of each neighborhood."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # build the API request; requests URL-encodes the query parameters
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request and stop on HTTP errors
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into one table
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood',
                 'Neighborhood Latitude',
                 'Neighborhood Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])

    return nearby_venues
```
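Because the list comprehension in `getNearbyVenues` indexes `categories[0]`, a venue whose `categories` list comes back empty would raise an `IndexError`. The sketch below isolates the parsing step so it can be checked offline against a hand-written fragment shaped like the `response → groups[0] → items` structure the function reads. The helper `parse_explore_items` and the second venue (`Example Venue`) are hypothetical, not real Foursquare data.

```python
import pandas as pd

VENUE_COLUMNS = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
                 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']

def parse_explore_items(name, lat, lng, items):
    """Turn the `items` list of one explore response into rows (hypothetical helper)."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # record None for a venue without a category instead of raising IndexError
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# hand-written fragment in the shape getNearbyVenues indexes into
items = [
    {'venue': {'name': "Arturo's", 'location': {'lat': 40.874412, 'lng': -73.910271},
               'categories': [{'name': 'Pizza Place'}]}},
    {'venue': {'name': 'Example Venue', 'location': {'lat': 40.8770, 'lng': -73.9060},
               'categories': []}},
]

venues = pd.DataFrame(parse_explore_items('Marble Hill', 40.876551, -73.91066, items),
                      columns=VENUE_COLUMNS)
print(venues)
```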

#### Now we run the above function on each neighborhood and create a new DataFrame for each borough, named `<borough>_venues`.

In [20]:
```python
bronx_venues = getNearbyVenues(names=bronx_data['Neighborhood'],
                               latitudes=bronx_data['Latitude'],
                               longitudes=bronx_data['Longitude'])
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude'])
brooklyn_venues = getNearbyVenues(names=brooklyn_data['Neighborhood'],
                                  latitudes=brooklyn_data['Latitude'],
                                  longitudes=brooklyn_data['Longitude'])
queens_venues = getNearbyVenues(names=queens_data['Neighborhood'],
                                latitudes=queens_data['Latitude'],
                                longitudes=queens_data['Longitude'])
staten_island_venues = getNearbyVenues(names=staten_island_data['Neighborhood'],
                                       latitudes=staten_island_data['Latitude'],
                                       longitudes=staten_island_data['Longitude'])
```

```
Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Claremont Village
Concourse Village
Mount Eden
Mount Hope
Bronxdale
Allerton
Kingsbridge Heights
Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery
```

Let's look at the first rows of each borough's new venues table.

In [21]:
```python
bronx_venues.head()
```

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Wakefield | 40.894705 | -73.847201 | Lollipops Gelato | 40.894123 | -73.845892 | Dessert Shop |
| 1 | Wakefield | 40.894705 | -73.847201 | Carvel Ice Cream | 40.890487 | -73.848568 | Ice Cream Shop |
| 2 | Wakefield | 40.894705 | -73.847201 | Walgreens | 40.896528 | -73.8447 | Pharmacy |
| 3 | Wakefield | 40.894705 | -73.847201 | Rite Aid | 40.896649 | -73.844846 | Pharmacy |
| 4 | Wakefield | 40.894705 | -73.847201 | Dunkin' | 40.890459 | -73.849089 | Donut Shop |


In [22]:
```python
manhattan_venues.head()
```

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | 40.876551 | -73.91066 | Arturo's | 40.874412 | -73.910271 | Pizza Place |
| 1 | Marble Hill | 40.876551 | -73.91066 | Bikram Yoga | 40.876844 | -73.906204 | Yoga Studio |
| 2 | Marble Hill | 40.876551 | -73.91066 | Tibbett Diner | 40.880404 | -73.908937 | Diner |
| 3 | Marble Hill | 40.876551 | -73.91066 | Starbucks | 40.877531 | -73.905582 | Coffee Shop |
| 4 | Marble Hill | 40.876551 | -73.91066 | Dunkin' | 40.877136 | -73.906666 | Donut Shop |


In [23]:
```python
brooklyn_venues.head()
```

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Bay Ridge | 40.625801 | -74.030621 | Pilo Arts Day Spa and Salon | 40.624748 | -74.030591 | Spa |
| 1 | Bay Ridge | 40.625801 | -74.030621 | Bagel Boy | 40.627896 | -74.029335 | Bagel Shop |
| 2 | Bay Ridge | 40.625801 | -74.030621 | Leo's Casa Calamari | 40.6242 | -74.030931 | Pizza Place |
| 3 | Bay Ridge | 40.625801 | -74.030621 | Pegasus Cafe | 40.623168 | -74.031186 | Breakfast Spot |
| 4 | Bay Ridge | 40.625801 | -74.030621 | The Bookmark Shoppe | 40.624577 | -74.030562 | Bookstore |


In [24]:
```python
queens_venues.head()
```

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Astoria | 40.768509 | -73.915654 | Favela Grill | 40.767348 | -73.917897 | Brazilian Restaurant |
| 1 | Astoria | 40.768509 | -73.915654 | Orange Blossom | 40.769856 | -73.917012 | Gourmet Shop |
| 2 | Astoria | 40.768509 | -73.915654 | Titan Foods Inc. | 40.769198 | -73.919253 | Gourmet Shop |
| 3 | Astoria | 40.768509 | -73.915654 | CrossFit Queens | 40.769404 | -73.918977 | Gym |
| 4 | Astoria | 40.768509 | -73.915654 | Off The Hook | 40.7672 | -73.918104 | Seafood Restaurant |


In [25]:
```python
staten_island_venues.head()
```

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | St. George | 40.644982 | -74.079353 | A&S Pizzeria | 40.64394 | -74.077626 | Pizza Place |
| 1 | St. George | 40.644982 | -74.079353 | Beso | 40.643306 | -74.076508 | Tapas Restaurant |
| 2 | St. George | 40.644982 | -74.079353 | Staten Island September 11 Memorial | 40.646767 | -74.07651 | Monument / Landmark |
| 3 | St. George | 40.644982 | -74.079353 | Richmond County Bank Ballpark | 40.645056 | -74.076864 | Baseball Stadium |
| 4 | St. George | 40.644982 | -74.079353 | Shake Shack | 40.64366 | -74.075891 | Burger Joint |


All the venue information sits in the `items` key of the Foursquare response. Before we proceed, let's borrow the `get_category_type` function from the Foursquare lab, which extracts the primary category of a venue.

In [26]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
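This helper is typically applied row by row to a table flattened with `pandas.json_normalize`, where the category list ends up in a `venue.categories` column. The sketch below shows that usage on two hand-written items; the second item (`Example Venue`) is made up and has no category, and the coordinates are illustrative.

```python
import pandas as pd

def get_category_type(row):
    """Return the first category name of a venue row, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# hand-written `items` in the explore-response shape (illustrative values)
items = [
    {'venue': {'name': 'A&S Pizzeria', 'categories': [{'name': 'Pizza Place'}],
               'location': {'lat': 40.64394, 'lng': -74.077626}}},
    {'venue': {'name': 'Example Venue', 'categories': [],
               'location': {'lat': 40.6440, 'lng': -74.0780}}},
]

# nested keys become columns such as 'venue.name' and 'venue.categories'
flat = pd.json_normalize(items)
flat['category'] = flat.apply(get_category_type, axis=1)
print(flat[['venue.name', 'category']])
```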

#### Let's check the sizes of the resulting dataframes

In [27]:
```python
print('Bronx Venues DataFrame size: ', bronx_venues.shape)
print('Manhattan Venues DataFrame size: ', manhattan_venues.shape)
print('Brooklyn Venues DataFrame size: ', brooklyn_venues.shape)
print('Queens Venues DataFrame size: ', queens_venues.shape)
print('Staten Island Venues DataFrame size: ', staten_island_venues.shape)
```

```
Bronx Venues DataFrame size:  (1206, 7)
Manhattan Venues DataFrame size:  (3077, 7)
Brooklyn Venues DataFrame size:  (2733, 7)
Queens Venues DataFrame size:  (2062, 7)
Staten Island Venues DataFrame size:  (829, 7)
```


Let's check how many venues were returned for each neighborhood in each borough (only the first 10 neighborhoods of each result are shown).

In [28]:
```python
bronx_venues.groupby('Neighborhood').count()
```

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Allerton | 33 | 33 | 33 | 33 | 33 | 33 |
| Baychester | 22 | 22 | 22 | 22 | 22 | 22 |
| Bedford Park | 35 | 35 | 35 | 35 | 35 | 35 |
| Belmont | 96 | 96 | 96 | 96 | 96 | 96 |
| Bronxdale | 12 | 12 | 12 | 12 | 12 | 12 |
| Castle Hill | 9 | 9 | 9 | 9 | 9 | 9 |
| City Island | 28 | 28 | 28 | 28 | 28 | 28 |
| Claremont Village | 16 | 16 | 16 | 16 | 16 | 16 |
| Clason Point | 10 | 10 | 10 | 10 | 10 | 10 |
| Co-op City | 17 | 17 | 17 | 17 | 17 | 17 |


In [29]:
```python
manhattan_venues.groupby('Neighborhood').count()
```

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Battery Park City | 67 | 67 | 67 | 67 | 67 | 67 |
| Carnegie Hill | 85 | 85 | 85 | 85 | 85 | 85 |
| Central Harlem | 46 | 46 | 46 | 46 | 46 | 46 |
| Chelsea | 100 | 100 | 100 | 100 | 100 | 100 |
| Chinatown | 100 | 100 | 100 | 100 | 100 | 100 |
| Civic Center | 96 | 96 | 96 | 96 | 96 | 96 |
| Clinton | 100 | 100 | 100 | 100 | 100 | 100 |
| East Harlem | 42 | 42 | 42 | 42 | 42 | 42 |
| East Village | 100 | 100 | 100 | 100 | 100 | 100 |
| Financial District | 100 | 100 | 100 | 100 | 100 | 100 |


In [30]:
```python
brooklyn_venues.groupby('Neighborhood').count()
```

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Bath Beach | 46 | 46 | 46 | 46 | 46 | 46 |
| Bay Ridge | 81 | 81 | 81 | 81 | 81 | 81 |
| Bedford Stuyvesant | 29 | 29 | 29 | 29 | 29 | 29 |
| Bensonhurst | 31 | 31 | 31 | 31 | 31 | 31 |
| Bergen Beach | 6 | 6 | 6 | 6 | 6 | 6 |
| Boerum Hill | 94 | 94 | 94 | 94 | 94 | 94 |
| Borough Park | 20 | 20 | 20 | 20 | 20 | 20 |
| Brighton Beach | 44 | 44 | 44 | 44 | 44 | 44 |
| Broadway Junction | 18 | 18 | 18 | 18 | 18 | 18 |
| Brooklyn Heights | 100 | 100 | 100 | 100 | 100 | 100 |


In [31]:
```python
queens_venues.groupby('Neighborhood').count()
```

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Arverne | 18 | 18 | 18 | 18 | 18 | 18 |
| Astoria | 100 | 100 | 100 | 100 | 100 | 100 |
| Astoria Heights | 10 | 10 | 10 | 10 | 10 | 10 |
| Auburndale | 18 | 18 | 18 | 18 | 18 | 18 |
| Bay Terrace | 38 | 38 | 38 | 38 | 38 | 38 |
| Bayside | 70 | 70 | 70 | 70 | 70 | 70 |
| Bayswater | 2 | 2 | 2 | 2 | 2 | 2 |
| Beechhurst | 16 | 16 | 16 | 16 | 16 | 16 |
| Bellaire | 14 | 14 | 14 | 14 | 14 | 14 |
| Belle Harbor | 17 | 17 | 17 | 17 | 17 | 17 |


In [32]:
```python
staten_island_venues.groupby('Neighborhood').count()
```

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Annadale | 11 | 11 | 11 | 11 | 11 | 11 |
| Arden Heights | 5 | 5 | 5 | 5 | 5 | 5 |
| Arlington | 7 | 7 | 7 | 7 | 7 | 7 |
| Arrochar | 20 | 20 | 20 | 20 | 20 | 20 |
| Bay Terrace | 8 | 8 | 8 | 8 | 8 | 8 |
| Bloomfield | 5 | 5 | 5 | 5 | 5 | 5 |
| Bulls Head | 44 | 44 | 44 | 44 | 44 | 44 |
| Butler Manor | 7 | 7 | 7 | 7 | 7 | 7 |
| Castleton Corners | 15 | 15 | 15 | 15 | 15 | 15 |
| Charleston | 29 | 29 | 29 | 29 | 29 | 29 |


<strong>Note</strong>: The number of venues in each neighborhood is a key factor for this project: we assume that neighborhoods with fewer venues are more open to a new business than neighborhoods with many venues. Keep in mind that each request returns at most `LIMIT = 100` venues, so a count of 100 means "100 or more".
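Since every request is capped at `LIMIT = 100` venues, a neighborhood reporting exactly 100 may in fact have more. A minimal sketch that ranks neighborhoods by venue count and flags capped counts, using a few Manhattan values copied from the table above; in the notebook the counts would come from `manhattan_venues.groupby('Neighborhood').size()`. The names `counts` and `summary` are illustrative.

```python
import pandas as pd

LIMIT = 100  # same cap as the Foursquare request above

# per-neighborhood venue counts taken from the Manhattan table above
counts = pd.Series({'Battery Park City': 67, 'Carnegie Hill': 85, 'Central Harlem': 46,
                    'Chelsea': 100, 'Chinatown': 100, 'East Harlem': 42})

summary = pd.DataFrame({
    'Venues': counts,
    # a count equal to LIMIT only means "at least LIMIT", not a true total
    'Capped': counts >= LIMIT,
}).sort_values('Venues')
print(summary)
```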

### Now we will analyze each neighborhood in each borough to find which neighborhoods are good spots for a new outlet of our client's chain.

#### One-hot encode the venue categories of each borough

In [33]:
```python
# one hot encoding (dtype=int keeps 0/1 values; pandas 2.0+ defaults to booleans)
bronx_onehot = pd.get_dummies(bronx_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add neighborhood column back to dataframe
bronx_onehot['Neighborhood'] = bronx_venues['Neighborhood']

# move neighborhood column to the first column
col_name = 'Neighborhood'
neighborhood_col = bronx_onehot.pop(col_name)
bronx_onehot.insert(0, col_name, neighborhood_col)

bronx_onehot.head()
```

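| | Neighborhood | Accessories Store | African Restaurant | American Restaurant | Arcade | Arepa Restaurant | Art Gallery | Art Museum | Asian Restaurant | BBQ Joint | ... | Trail | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Video Store | Vietnamese Restaurant | Warehouse Store | Waste Facility | Wings Joint | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Wakefield | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Wakefield | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Wakefield | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Wakefield | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Wakefield | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |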


In [34]:
```python
# one hot encoding (dtype=int keeps 0/1 values; pandas 2.0+ defaults to booleans)
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood']

# move neighborhood column to the first column
col_name = 'Neighborhood'
neighborhood_col = manhattan_onehot.pop(col_name)
manhattan_onehot.insert(0, col_name, neighborhood_col)

manhattan_onehot.head()
```

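| | Neighborhood | Accessories Store | Adult Boutique | Afghan Restaurant | African Restaurant | American Restaurant | Antique Shop | Arcade | Arepa Restaurant | Argentinian Restaurant | ... | Video Store | Vietnamese Restaurant | Volleyball Court | Waterfront | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |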


In [35]:
```python
# one hot encoding (dtype=int keeps 0/1 values; pandas 2.0+ defaults to booleans)
brooklyn_onehot = pd.get_dummies(brooklyn_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add neighborhood column back to dataframe
brooklyn_onehot['Neighborhood'] = brooklyn_venues['Neighborhood']

# move neighborhood column to the first column
col_name = 'Neighborhood'
neighborhood_col = brooklyn_onehot.pop(col_name)
brooklyn_onehot.insert(0, col_name, neighborhood_col)

brooklyn_onehot.head()
```

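| | Neighborhood | Accessories Store | African Restaurant | Airport Terminal | American Restaurant | Animal Shelter | Antique Shop | Arepa Restaurant | Argentinian Restaurant | Art Gallery | ... | Video Game Store | Video Store | Vietnamese Restaurant | Waterfront | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bay Ridge | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Bay Ridge | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Bay Ridge | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Bay Ridge | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Bay Ridge | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |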


In [36]:
```python
# one hot encoding (dtype=int keeps 0/1 values; pandas 2.0+ defaults to booleans)
queens_onehot = pd.get_dummies(queens_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add neighborhood column back to dataframe
queens_onehot['Neighborhood'] = queens_venues['Neighborhood']

# move neighborhood column to the first column
col_name = 'Neighborhood'
neighborhood_col = queens_onehot.pop(col_name)
queens_onehot.insert(0, col_name, neighborhood_col)

queens_onehot.head()
```

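| | Neighborhood | Accessories Store | Afghan Restaurant | American Restaurant | Antique Shop | Arepa Restaurant | Argentinian Restaurant | Art Gallery | Art Museum | Arts & Crafts Store | ... | Video Game Store | Video Store | Vietnamese Restaurant | Warehouse Store | Weight Loss Center | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Astoria | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Astoria | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Astoria | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Astoria | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Astoria | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |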


In [37]:
```python
# one hot encoding (dtype=int keeps 0/1 values; pandas 2.0+ defaults to booleans)
staten_island_onehot = pd.get_dummies(staten_island_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add neighborhood column back to dataframe
staten_island_onehot['Neighborhood'] = staten_island_venues['Neighborhood']

# move neighborhood column to the first column
col_name = 'Neighborhood'
neighborhood_col = staten_island_onehot.pop(col_name)
staten_island_onehot.insert(0, col_name, neighborhood_col)

staten_island_onehot.head()
```

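| | Neighborhood | Accessories Store | American Restaurant | Arcade | Art Gallery | Art Museum | Arts & Crafts Store | Asian Restaurant | Athletics & Sports | BBQ Joint | ... | Toy / Game Store | Trail | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Video Store | Vietnamese Restaurant | Wine Bar | Wings Joint | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | St. George | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | St. George | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | St. George | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | St. George | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | St. George | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |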
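The five one-hot cells above differ only in the input table. A sketch of a single helper that performs the same steps for any borough follows; `one_hot_venues` is a hypothetical name and the toy rows are illustrative. One behavioral difference from the cells above: `DataFrame.insert` raises a `ValueError` if a venue category happens to be named exactly `Neighborhood`, whereas the original assignment would silently overwrite that category column.

```python
import pandas as pd

def one_hot_venues(venues):
    """One-hot encode 'Venue Category' and put 'Neighborhood' first (hypothetical helper)."""
    # dtype=int keeps 0/1 values; pandas 2.0+ returns booleans by default
    onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)
    onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
    return onehot

# toy venues table for illustration
venues = pd.DataFrame({
    'Neighborhood': ['St. George', 'St. George', 'Rosebank'],
    'Venue Category': ['Pizza Place', 'Burger Joint', 'Pizza Place'],
})
onehot = one_hot_venues(venues)
print(onehot)
```

Each borough cell would then reduce to a call such as `bronx_onehot = one_hot_venues(bronx_venues)`.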


#### Next, let's group the rows by neighborhood and take the mean of the frequency of occurrence of each category

In [38]:
```python
bronx_grouped = bronx_onehot.groupby('Neighborhood').mean().reset_index()
bronx_grouped
```

| | Neighborhood | Accessories Store | African Restaurant | American Restaurant | Arcade | Arepa Restaurant | Art Gallery | Art Museum | Asian Restaurant | BBQ Joint | ... | Trail | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Video Store | Vietnamese Restaurant | Warehouse Store | Waste Facility | Wings Joint | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Allerton | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Baychester | 0.0 | 0.0 | 0.0 | 0.045455 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Bedford Park | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.028571 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Belmont | 0.0 | 0.0 | 0.010417 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.010417 | 0.0 |
| 4 | Bronxdale | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | Castle Hill | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | City Island | 0.0 | 0.0 | 0.035714 | 0.0 | 0.0 | 0.035714 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | Claremont Village | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | Clason Point | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | Co-op City | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [39]:
```python
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
manhattan_grouped
```

| | Neighborhood | Accessories Store | Adult Boutique | Afghan Restaurant | African Restaurant | American Restaurant | Antique Shop | Arcade | Arepa Restaurant | Argentinian Restaurant | ... | Video Store | Vietnamese Restaurant | Volleyball Court | Waterfront | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Battery Park City | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.014925 | 0.044776 | 0.0 | 0.014925 | 0.0 |
| 1 | Carnegie Hill | 0.0 | 0.0 | 0.0 | 0.0 | 0.011765 | 0.0 | 0.0 | 0.0 | 0.011765 | ... | 0.0 | 0.023529 | 0.0 | 0.0 | 0.0 | 0.011765 | 0.035294 | 0.0 | 0.0 | 0.035294 |
| 2 | Central Harlem | 0.0 | 0.0 | 0.0 | 0.065217 | 0.043478 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Chelsea | 0.0 | 0.0 | 0.0 | 0.0 | 0.03 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.01 | 0.0 |
| 4 | Chinatown | 0.0 | 0.0 | 0.0 | 0.0 | 0.03 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 |
| 5 | Civic Center | 0.0 | 0.0 | 0.0 | 0.0 | 0.052083 | 0.010417 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.010417 | 0.0 | 0.0 | 0.010417 | 0.020833 | 0.020833 | 0.0 | 0.0 | 0.03125 |
| 6 | Clinton | 0.0 | 0.0 | 0.0 | 0.0 | 0.03 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | 0.03 | 0.0 | 0.0 | 0.0 |
| 7 | East Harlem | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | East Village | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.01 | 0.01 | ... | 0.0 | 0.02 | 0.0 | 0.0 | 0.0 | 0.03 | 0.01 | 0.0 | 0.0 | 0.0 |
| 9 | Financial District | 0.0 | 0.0 | 0.0 | 0.0 | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | 0.0 | 0.01 | 0.01 |


In [40]:
```python
brooklyn_grouped = brooklyn_onehot.groupby('Neighborhood').mean().reset_index()
brooklyn_grouped
```

| | Neighborhood | Accessories Store | African Restaurant | Airport Terminal | American Restaurant | Animal Shelter | Antique Shop | Arepa Restaurant | Argentinian Restaurant | Art Gallery | ... | Video Game Store | Video Store | Vietnamese Restaurant | Waterfront | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bath Beach | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.00 | 0.000000 | 0.000000 | ... | 0.021739 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 1 | Bay Ridge | 0.000000 | 0.000000 | 0.000000 | 0.037037 | 0.00 | 0.000000 | 0.00 | 0.000000 | 0.000000 | ... | 0.012346 | 0.000000 | 0.012346 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.012346 |
| 2 | Bedford Stuyvesant | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.00 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.034483 | 0.034483 | 0.000000 | 0.000000 | 0.000000 |
| 3 | Bensonhurst | 0.000000 | 0.000000 | 0.000000 | 0.032258 | 0.00 | 0.000000 | 0.00 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 4 | Bergen Beach | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.00 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 5 | Boerum Hill | 0.000000 | 0.000000 | 0.000000 | 0.010638 | 0.00 | 0.010638 | 0.00 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.010638 | 0.000000 | 0.000000 | 0.021277 |
| 6 | Borough Park | 0.000000 | 0.000000 | 0.000000 | 0.050000 | 0.00 | 0.000000 | 0.00 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 7 | Brighton Beach | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.00 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 8 | Broadway Junction | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.00 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 9 | Brooklyn Heights | 0.000000 | 0.000000 | 0.000000 | 0.020000 | 0.00 | 0.000000 | 0.00 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.010000 | 0.000000 | 0.000000 | 0.010000 | 0.020000 | 0.000000 | 0.010000 | 0.040000 |


In [41]:
queens_grouped = queens_onehot.groupby('Neighborhood').mean().reset_index()
queens_grouped

,Neighborhood,Accessories Store,Afghan Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Weight Loss Center,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Arverne,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.055556,0.0,0.000000,0.000000
1,Astoria,0.000000,0.000000,0.010000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.020000,0.0,0.000000,0.000000
2,Astoria Heights,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000
3,Auburndale,0.000000,0.000000,0.055556,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000
4,Bay Terrace,0.026316,0.000000,0.052632,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.026316,0.000000,0.000000,0.0,0.026316,0.000000,0.000000,0.0,0.052632,0.000000
5,Bayside,0.000000,0.000000,0.042857,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.014286,0.0,0.000000,0.014286,0.000000,0.0,0.000000,0.014286
6,Bayswater,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000
7,Beechhurst,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.062500
8,Bellaire,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000
9,Belle Harbor,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000


In [42]:
staten_island_grouped = staten_island_onehot.groupby('Neighborhood').mean().reset_index()
staten_island_grouped

,Neighborhood,Accessories Store,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,Annadale,0.00,0.090909,0.00,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.0,0.090909,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000
1,Arden Heights,0.00,0.000000,0.00,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000
2,Arlington,0.00,0.142857,0.00,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000
3,Arrochar,0.00,0.000000,0.00,0.000000,0.0,0.000000,0.000000,0.050000,0.000000,...,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000
4,Bay Terrace,0.00,0.000000,0.00,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.0,0.125000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000
5,Bloomfield,0.00,0.000000,0.00,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000
6,Bulls Head,0.00,0.022727,0.00,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.0,0.000000,0.000000,0.000000,0.022727,0.000000,0.00,0.022727,0.000000
7,Butler Manor,0.00,0.000000,0.00,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000
8,Castleton Corners,0.00,0.000000,0.00,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000
9,Charleston,0.00,0.034483,0.00,0.000000,0.0,0.034483,0.000000,0.000000,0.000000,...,0.000000,0.0,0.000000,0.000000,0.034483,0.000000,0.000000,0.00,0.000000,0.000000


### Let's find the most common venues in each neighborhood of every borough to better understand the neighborhoods

In [43]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
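To see what `return_most_common_venues` does, here is a minimal sketch on a hypothetical single-row input (the neighborhood name and frequencies are made up, not Foursquare data). The function is repeated so the snippet runs on its own: it drops the first entry, which holds the neighborhood name, sorts the remaining mean frequencies in descending order and returns the first `num_top_venues` category names.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Drop the first entry (the neighborhood name), keeping only venue frequencies
    row_categories = row.iloc[1:]
    # Highest frequency first
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical mean one-hot values for a single neighborhood
demo_row = pd.Series({'Neighborhood': 'Toyville',
                      'Coffee Shop': 0.10,
                      'Pizza Place': 0.30,
                      'Bank': 0.05,
                      'Gym': 0.20})

demo_top3 = list(return_most_common_venues(demo_row, 3))
print(demo_top3)  # ['Pizza Place', 'Gym', 'Coffee Shop']
```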

Most Common Venues for Bronx Borough

In [44]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 4th to 10th take the "th" suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
bronx_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
bronx_neighborhoods_venues_sorted['Neighborhood'] = bronx_grouped['Neighborhood']

for ind in np.arange(bronx_grouped.shape[0]):
    bronx_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bronx_grouped.iloc[ind, :], num_top_venues)

bronx_neighborhoods_venues_sorted.head()

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allerton,Pizza Place,Supermarket,Deli / Bodega,Chinese Restaurant,Bus Station,Fast Food Restaurant,Martial Arts Dojo,Spanish Restaurant,Pharmacy,Food
1,Baychester,Electronics Store,Donut Shop,Bank,Baseball Field,Bus Station,Mattress Store,Mexican Restaurant,Pet Store,Spanish Restaurant,Fast Food Restaurant
2,Bedford Park,Diner,Chinese Restaurant,Deli / Bodega,Pizza Place,Mexican Restaurant,Spanish Restaurant,Sandwich Place,Supermarket,Fried Chicken Joint,Bus Station
3,Belmont,Italian Restaurant,Pizza Place,Deli / Bodega,Bakery,Donut Shop,Bank,Dessert Shop,Grocery Store,Bar,Mexican Restaurant
4,Bronxdale,Mexican Restaurant,Spanish Restaurant,Pizza Place,Performing Arts Venue,Breakfast Spot,Italian Restaurant,Gym,Paper / Office Supplies Store,Chinese Restaurant,Bank


Most Common Venues for Manhattan Borough

In [45]:
manhattan_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
manhattan_neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    manhattan_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

manhattan_neighborhoods_venues_sorted.head()

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Park,Coffee Shop,Hotel,Memorial Site,Gym,Wine Shop,Shopping Mall,Gourmet Shop,Plaza,Boat or Ferry
1,Carnegie Hill,Coffee Shop,Pizza Place,Café,Yoga Studio,Gym,Wine Shop,Bar,Bookstore,Japanese Restaurant,Grocery Store
2,Central Harlem,African Restaurant,Chinese Restaurant,French Restaurant,American Restaurant,Bar,Cosmetics Shop,Fried Chicken Joint,Seafood Restaurant,Food Truck,Market
3,Chelsea,Art Gallery,Coffee Shop,Café,Bakery,Ice Cream Shop,American Restaurant,Italian Restaurant,Theater,Seafood Restaurant,Hotel
4,Chinatown,Chinese Restaurant,Optical Shop,Bakery,Cocktail Bar,Bubble Tea Shop,Salon / Barbershop,Spa,Ice Cream Shop,American Restaurant,Coffee Shop


Most Common Venues for Brooklyn Borough

In [46]:
brooklyn_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
brooklyn_neighborhoods_venues_sorted['Neighborhood'] = brooklyn_grouped['Neighborhood']

for ind in np.arange(brooklyn_grouped.shape[0]):
    brooklyn_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(brooklyn_grouped.iloc[ind, :], num_top_venues)

brooklyn_neighborhoods_venues_sorted.head()

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bath Beach,Pharmacy,Chinese Restaurant,Sushi Restaurant,Pizza Place,Italian Restaurant,Bubble Tea Shop,Gas Station,Fast Food Restaurant,Ice Cream Shop,Sandwich Place
1,Bay Ridge,Spa,Italian Restaurant,Pizza Place,Greek Restaurant,American Restaurant,Bar,Thai Restaurant,Ice Cream Shop,Playground,Pharmacy
2,Bedford Stuyvesant,Bus Stop,Café,Pizza Place,Coffee Shop,Bar,Discount Store,Cocktail Bar,Thrift / Vintage Store,Gourmet Shop,Basketball Court
3,Bensonhurst,Chinese Restaurant,Ice Cream Shop,Sushi Restaurant,Italian Restaurant,Bakery,Donut Shop,Pizza Place,Noodle House,Cosmetics Shop,Butcher
4,Bergen Beach,Harbor / Marina,Donut Shop,Baseball Field,Playground,Athletics & Sports,Filipino Restaurant,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant


Most Common Venues for Queens Borough

In [47]:
queens_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
queens_neighborhoods_venues_sorted['Neighborhood'] = queens_grouped['Neighborhood']

for ind in np.arange(queens_grouped.shape[0]):
    queens_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(queens_grouped.iloc[ind, :], num_top_venues)

queens_neighborhoods_venues_sorted.head()

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Arverne,Surf Spot,Metro Station,Sandwich Place,Playground,Wine Shop,Bed & Breakfast,Pizza Place,Thai Restaurant,Bus Stop,Beach
1,Astoria,Middle Eastern Restaurant,Bar,Indian Restaurant,Hookah Bar,Mediterranean Restaurant,Greek Restaurant,Bakery,Deli / Bodega,Café,Seafood Restaurant
2,Astoria Heights,Burger Joint,Italian Restaurant,Bakery,Supermarket,Bus Station,Pizza Place,Bowling Alley,Hostel,Playground,Plaza
3,Auburndale,Mobile Phone Shop,Italian Restaurant,Supermarket,Korean Restaurant,Bar,Fast Food Restaurant,Furniture / Home Store,Toy / Game Store,Noodle House,Athletics & Sports
4,Bay Terrace,Clothing Store,Women's Store,Mobile Phone Shop,American Restaurant,Lingerie Store,Cosmetics Shop,Donut Shop,Kids Store,Shoe Store,Gift Shop


Most Common Venues for Staten Island Borough

In [48]:
staten_island_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
staten_island_neighborhoods_venues_sorted['Neighborhood'] = staten_island_grouped['Neighborhood']

for ind in np.arange(staten_island_grouped.shape[0]):
    staten_island_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(staten_island_grouped.iloc[ind, :], num_top_venues)

staten_island_neighborhoods_venues_sorted.head()

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Annadale,Pizza Place,Train Station,Sports Bar,Bakery,Liquor Store,Restaurant,Diner,Dance Studio,Pub,American Restaurant
1,Arden Heights,Pharmacy,Bus Stop,Lawyer,Coffee Shop,Pizza Place,Falafel Restaurant,Food & Drink Shop,Food,Flower Shop,Fish & Chips Shop
2,Arlington,Intersection,Grocery Store,Coffee Shop,Bus Stop,Boat or Ferry,Deli / Bodega,American Restaurant,Hotel,Falafel Restaurant,Food Truck
3,Arrochar,Bus Stop,Italian Restaurant,Deli / Bodega,Supermarket,Mediterranean Restaurant,Middle Eastern Restaurant,Food Truck,Outdoors & Recreation,Pizza Place,Polish Restaurant
4,Bay Terrace,Supermarket,Insurance Office,Train Station,Sushi Restaurant,Donut Shop,Salon / Barbershop,Shipping Store,Diner,Farmers Market,Food


Now we narrow the list down to the neighborhoods where our venue category of concern, i.e., Pizza Place, is not among the 10 most common venues, and combine them into one table. The reasoning is that where pizza places are uncommon, our client faces less competition and, as one of few such venues in the area, has a better chance of winning over local customers.
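The "not among the top 10" test can also be written without an explicit inner loop. A minimal sketch on a hypothetical three-neighborhood table (names and venues are made up) in the same layout as `bronx_neighborhoods_venues_sorted`: `isin` marks every cell equal to "Pizza Place", `any(axis=1)` collapses that into one flag per neighborhood, and negating the flag keeps the candidates.

```python
import pandas as pd

# Hypothetical ranked-venue table: a Neighborhood column followed by ranked venues
demo_sorted = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    '1st Most Common Venue': ['Pizza Place', 'Bank', 'Diner'],
    '2nd Most Common Venue': ['Bank', 'Gym', 'Pizza Place'],
    '3rd Most Common Venue': ['Diner', 'Park', 'Bar'],
})

# True for each neighborhood whose ranked venues include "Pizza Place"
demo_has_pizza = demo_sorted.iloc[:, 1:].isin(['Pizza Place']).any(axis=1)

# Keep only neighborhoods where Pizza Place is NOT among the top venues
demo_candidates = demo_sorted.loc[~demo_has_pizza, 'Neighborhood'].tolist()
print(demo_candidates)  # ['B']
```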

In [49]:
suggestedAreas = pd.DataFrame(columns=['Borough', 'Neighborhood', 'Latitude', 'Longitude'])

Now we check every neighborhood in the Bronx and add those whose 10 most common venues do not include our venue category of concern, i.e., Pizza Place.

In [50]:
rows = []
for _, row in bronx_neighborhoods_venues_sorted.iterrows():
    # keep the neighborhood only if "Pizza Place" is not among its top 10 venues
    if "Pizza Place" not in row.iloc[1:].tolist():
        latlng = bronx_data.loc[bronx_data['Neighborhood'] == row['Neighborhood'], ['Latitude', 'Longitude']].values[0]
        rows.append({'Borough': 'Bronx', 'Neighborhood': row['Neighborhood'], 'Latitude': latlng[0], 'Longitude': latlng[1]})
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
suggestedAreas = pd.concat([suggestedAreas, pd.DataFrame(rows)], ignore_index=True)
suggestedAreas

,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Baychester,40.866858,-73.835798
1,Bronx,Claremont Village,40.831428,-73.901199
2,Bronx,Clason Point,40.806551,-73.854144
3,Bronx,Co-op City,40.874294,-73.829939
4,Bronx,Concourse,40.834284,-73.915589
5,Bronx,Concourse Village,40.82478,-73.915847
6,Bronx,Country Club,40.844246,-73.824099
7,Bronx,Eastchester,40.887556,-73.827806
8,Bronx,Fieldston,40.895437,-73.905643
9,Bronx,Hunts Point,40.80973,-73.883315


Similarly, let's add the neighborhoods of all the other boroughs of New York City where "Pizza Place" is not among the 10 most common venues.
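The five cells below repeat the same steps with different variable names. As a sketch of an alternative, a hypothetical helper `suggest_areas` (not part of the original notebook) could apply the filter to one borough and attach coordinates with a merge, after which `pd.concat` combines the boroughs in a single call. The inputs below are made-up stand-ins for the notebook's `*_neighborhoods_venues_sorted` and `*_data` tables.

```python
import pandas as pd

def suggest_areas(borough, venues_sorted, borough_data, category='Pizza Place'):
    """Hypothetical helper: neighborhoods of one borough whose top venues exclude `category`."""
    has_category = venues_sorted.iloc[:, 1:].isin([category]).any(axis=1)
    candidates = venues_sorted.loc[~has_category, ['Neighborhood']]
    # Attach the neighborhood coordinates from the borough's location table
    merged = candidates.merge(borough_data[['Neighborhood', 'Latitude', 'Longitude']],
                              on='Neighborhood', how='left')
    merged.insert(0, 'Borough', borough)
    return merged

# Made-up inputs for two boroughs
demo_sorted_1 = pd.DataFrame({'Neighborhood': ['A', 'B'],
                              '1st Most Common Venue': ['Pizza Place', 'Bank']})
demo_data_1 = pd.DataFrame({'Neighborhood': ['A', 'B'],
                            'Latitude': [40.80, 40.81], 'Longitude': [-73.90, -73.91]})
demo_sorted_2 = pd.DataFrame({'Neighborhood': ['C'],
                              '1st Most Common Venue': ['Gym']})
demo_data_2 = pd.DataFrame({'Neighborhood': ['C'],
                            'Latitude': [40.70], 'Longitude': [-73.80]})

# One pd.concat call instead of repeated DataFrame.append (removed in pandas 2.0)
demo_suggested = pd.concat([suggest_areas('Bronx', demo_sorted_1, demo_data_1),
                            suggest_areas('Queens', demo_sorted_2, demo_data_2)],
                           ignore_index=True)
print(demo_suggested)
```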

In [51]:
rows = []
for _, row in manhattan_neighborhoods_venues_sorted.iterrows():
    # keep the neighborhood only if "Pizza Place" is not among its top 10 venues
    if "Pizza Place" not in row.iloc[1:].tolist():
        latlng = manhattan_data.loc[manhattan_data['Neighborhood'] == row['Neighborhood'], ['Latitude', 'Longitude']].values[0]
        rows.append({'Borough': 'Manhattan', 'Neighborhood': row['Neighborhood'], 'Latitude': latlng[0], 'Longitude': latlng[1]})
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
suggestedAreas = pd.concat([suggestedAreas, pd.DataFrame(rows)], ignore_index=True)
suggestedAreas



In [52]:
rows = []
for _, row in brooklyn_neighborhoods_venues_sorted.iterrows():
    # keep the neighborhood only if "Pizza Place" is not among its top 10 venues
    if "Pizza Place" not in row.iloc[1:].tolist():
        latlng = brooklyn_data.loc[brooklyn_data['Neighborhood'] == row['Neighborhood'], ['Latitude', 'Longitude']].values[0]
        rows.append({'Borough': 'Brooklyn', 'Neighborhood': row['Neighborhood'], 'Latitude': latlng[0], 'Longitude': latlng[1]})
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
suggestedAreas = pd.concat([suggestedAreas, pd.DataFrame(rows)], ignore_index=True)
suggestedAreas



In [53]:
rows = []
for _, row in queens_neighborhoods_venues_sorted.iterrows():
    # keep the neighborhood only if "Pizza Place" is not among its top 10 venues
    if "Pizza Place" not in row.iloc[1:].tolist():
        latlng = queens_data.loc[queens_data['Neighborhood'] == row['Neighborhood'], ['Latitude', 'Longitude']].values[0]
        rows.append({'Borough': 'Queens', 'Neighborhood': row['Neighborhood'], 'Latitude': latlng[0], 'Longitude': latlng[1]})
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
suggestedAreas = pd.concat([suggestedAreas, pd.DataFrame(rows)], ignore_index=True)
suggestedAreas



In [54]:
rows = []
for _, row in staten_island_neighborhoods_venues_sorted.iterrows():
    # keep the neighborhood only if "Pizza Place" is not among its top 10 venues
    if "Pizza Place" not in row.iloc[1:].tolist():
        latlng = staten_island_data.loc[staten_island_data['Neighborhood'] == row['Neighborhood'], ['Latitude', 'Longitude']].values[0]
        rows.append({'Borough': 'Staten Island', 'Neighborhood': row['Neighborhood'], 'Latitude': latlng[0], 'Longitude': latlng[1]})
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
suggestedAreas = pd.concat([suggestedAreas, pd.DataFrame(rows)], ignore_index=True)
suggestedAreas



Now let's plot all these areas on a map to visualize the data we have gathered and processed.

In [55]:
# create map of New York using latitude and longitude values
newyork_map = folium.Map(location=[latitude, longitude], zoom_start=10, width=900, height=500)

# add markers to map
for lat, lng, borough, neighborhood in zip(suggestedAreas['Latitude'], suggestedAreas['Longitude'], suggestedAreas['Borough'], suggestedAreas['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
        [lat, lng],
        radius=500,
        color=None,
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.5,
        parse_html=False).add_to(newyork_map)
    folium.Circle(
        [lat, lng],
        radius=1000,
        popup=label,
        color=None,
        fill=True,
        fill_color='#e8f21d',
        fill_opacity=0.3,
        parse_html=False).add_to(newyork_map) 
    
newyork_map

Let's make a dataframe of all the Pizza Places in New York City, with their LatLng values, so that we can see how densely they are distributed and find the areas with fewer pizza places.
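A quick numeric complement to the map is a count of pizza places per neighborhood. A minimal sketch on made-up rows shaped like `pizzaPlaces` (not real Foursquare results); note that because each neighborhood was searched within a radius, a venue near a boundary may be listed under more than one neighborhood.

```python
import pandas as pd

# Made-up rows in the layout of pizzaPlaces
demo_pizza = pd.DataFrame({
    'Borough': ['Bronx', 'Bronx', 'Bronx', 'Queens'],
    'Neighborhood': ['Kingsbridge', 'Kingsbridge', 'Eastchester', 'Astoria'],
    'Venue': ['P1', 'P2', 'P3', 'P4'],
})

# Number of pizza places per (borough, neighborhood), fewest first
demo_counts = (demo_pizza.groupby(['Borough', 'Neighborhood'])
                         .size()
                         .sort_values()
                         .rename('Pizza Places'))
print(demo_counts)
```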

In [56]:
pizzaPlaces = pd.DataFrame(columns=['Borough', 'Neighborhood', 'Venue', 'Venue Latitude', 'Venue Longitude'])

In [57]:
columns = ['Borough', 'Neighborhood', 'Venue', 'Venue Latitude', 'Venue Longitude']
borough_venues = {'Bronx': bronx_venues,
                  'Manhattan': manhattan_venues,
                  'Brooklyn': brooklyn_venues,
                  'Queens': queens_venues,
                  'Staten Island': staten_island_venues}
frames = []
for borough, venues in borough_venues.items():
    # .assign() returns a new DataFrame, so no SettingWithCopyWarning is raised
    pizza = venues[venues['Venue Category'] == 'Pizza Place'].assign(Borough=borough)
    frames.append(pizza[columns])
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
pizzaPlaces = pd.concat(frames, ignore_index=True)
pizzaPlaces


,Borough,Neighborhood,Venue,Venue Latitude,Venue Longitude
0,Bronx,Co-op City,Capri II Pizza,40.876374,-73.829940
1,Bronx,Eastchester,Mario's Pizza,40.888628,-73.831260
2,Bronx,Kingsbridge,Kingsbridge Social Club,40.884545,-73.901964
3,Bronx,Kingsbridge,Sam's Pizza,40.879435,-73.905859
4,Bronx,Kingsbridge,Broadway Pizza & Pasta,40.878822,-73.904494
5,Bronx,Kingsbridge,Little Caesars Pizza,40.880002,-73.904140
6,Bronx,Kingsbridge,Domino's Pizza,40.884200,-73.902400
7,Bronx,Kingsbridge,Tony & Cyndi's Pizzeria & Restaurants,40.883566,-73.901809
8,Bronx,Kingsbridge,Papa John's,40.884015,-73.903083
9,Bronx,Kingsbridge,Acapella Gourmet Pizza & Restaurant,40.883504,-73.897901


Now let's plot all the Pizza Places in New York City to see where they are located relative to the areas we shortlisted based on common venues.

In [58]:
for lat, lng, borough, neighborhood, venue in zip(pizzaPlaces['Venue Latitude'], pizzaPlaces['Venue Longitude'], pizzaPlaces['Borough'], pizzaPlaces['Neighborhood'], pizzaPlaces['Venue']):
    label = '{}, {}, {}'.format(venue, neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
        [lat, lng],
        radius=2,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=1,
        parse_html=False).add_to(newyork_map)  
    
newyork_map

Now let's plot a heat map to show the density of pizza places across New York City.

In [59]:
from folium import plugins
# as_matrix() was removed in pandas 1.0; to_numpy() replaces it
locationsArr = pizzaPlaces[['Venue Latitude', 'Venue Longitude']].to_numpy()
newyork_map.add_child(plugins.HeatMap(locationsArr, radius=15))
newyork_map

  


In the map above, the circles (500 m and 1 km radii) mark the neighborhoods where our venue of concern, i.e., Pizza Place, is not among the 10 most common venues. The red dots mark all the pizza places found in New York City, and the heat map shows how densely they are concentrated.

Since we still have many options, we narrow the list down further by considering only the neighborhoods that have no pizza place, or just one, within a 1 km radius.
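The notebook applies this rule using Foursquare's 1 km radius search, but the same criterion can be checked directly from coordinates. A minimal sketch, assuming a spherical Earth of radius 6371 km and the haversine formula; `haversine_km` and `count_within` are hypothetical helpers, not part of the notebook. The coordinates are copied from the tables above (Eastchester's center and two Bronx pizza places).

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def count_within(center, points, radius_km=1.0):
    """Count the (lat, lon) points lying within radius_km of center."""
    return sum(1 for lat, lon in points
               if haversine_km(center[0], center[1], lat, lon) <= radius_km)

demo_center = (40.887556, -73.827806)          # Eastchester neighborhood center
demo_pizza_places = [(40.888628, -73.831260),  # Mario's Pizza (about 0.31 km away)
                     (40.876374, -73.829940)]  # Capri II Pizza (about 1.26 km away)

demo_n = count_within(demo_center, demo_pizza_places)
print(demo_n)  # 1
```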

Since our data are limited, we assume that all the neighborhoods are public, residential areas where people live.

Based on these criteria and the heat map, we can build a new dataframe of the best candidate locations for our client's new branch/outlet.

In [60]:
refinedSuggestedAreas = pd.DataFrame(columns=['Borough', 'Neighborhood', 'Latitude', 'Longitude'])

In [61]:
refinedVenues = getNearbyVenues(names=suggestedAreas['Neighborhood'],
                                   latitudes=suggestedAreas['Latitude'],
                                   longitudes=suggestedAreas['Longitude'],
                                   radius = 1000
                                  )

Baychester
Claremont Village
Clason Point
Co-op City
Concourse
Concourse Village
Country Club
Eastchester
Fieldston
Hunts Point
Longwood
Olinville
Pelham Bay
Pelham Gardens
Port Morris
Riverdale
Soundview
Spuyten Duyvil
Unionport
Wakefield
Williamsbridge
Battery Park City
Central Harlem
Chelsea
Chinatown
Civic Center
Clinton
Flatiron
Hudson Yards
Lincoln Square
Lower East Side
Manhattanville
Marble Hill
Midtown South
Roosevelt Island
Stuyvesant Town
Sutton Place
Tribeca
Tudor City
Turtle Bay
Upper East Side
Upper West Side
Washington Heights
Bergen Beach
Boerum Hill
Brighton Beach
Brownsville
Canarsie
City Line
Coney Island
Dumbo
Dyker Heights
East Flatbush
East New York
East Williamsburg
Flatbush
Flatlands
Georgetown
Manhattan Beach
Mill Island
Ocean Hill
Ocean Parkway
Paerdegat Basin
Remsen Village
Sea Gate
Vinegar Hill
Weeksville
Williamsburg
Windsor Terrace
Wingate
Astoria
Auburndale
Bay Terrace
Bayswater
Bellaire
Belle Harbor
Blissville
Breezy Point
Briarwood
Brookville
Cambria He

Let's make a copy of our retrieved data to process.

In [62]:
refinedVenuesUsable = refinedVenues.copy(deep=True)

Let's add the borough name of each neighborhood to the dataframe.

In [63]:
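all_borough = []
for neigh, lat, lng in zip(refinedVenuesUsable['Neighborhood'],
                           refinedVenuesUsable['Neighborhood Latitude'],
                           refinedVenuesUsable['Neighborhood Longitude']):
    # match on coordinates as well as name: some names (e.g. Bay Terrace) occur in two boroughs
    match = suggestedAreas[(suggestedAreas['Neighborhood'] == neigh) &
                           (suggestedAreas['Latitude'] == lat) &
                           (suggestedAreas['Longitude'] == lng)]
    all_borough.append(match['Borough'].values[0])
refinedVenuesUsable.insert(0, 'Borough', all_borough)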

In [64]:
refinedVenuesUsable.head()

,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bronx,Baychester,40.866858,-73.835798,Caridad & Louie,40.865843,-73.837707,Spanish Restaurant
1,Bronx,Baychester,40.866858,-73.835798,Panera Bread,40.867866,-73.827845,Bakery
2,Bronx,Baychester,40.866858,-73.835798,Fratelli's,40.863019,-73.843607,Italian Restaurant
3,Bronx,Baychester,40.866858,-73.835798,Four Seasons Nails,40.869285,-73.844468,Spa
4,Bronx,Baychester,40.866858,-73.835798,Nicks Pizza,40.870352,-73.846171,Pizza Place


Now we divide all the identified neighborhoods into best, moderate and worst categories based on the number of our client's competitors (venues in the Pizza Place category) found within 1 km: fewer than 2 is best, 2 to 4 is moderate, and 5 or more is worst.
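The three thresholds can also be expressed as bins. A minimal sketch using `pd.cut` on made-up competitor counts: with the default right-inclusive intervals, (-1, 1] maps to best, (1, 4] to moderate and (4, inf) to worst, which matches the rule above for non-negative integer counts.

```python
import numpy as np
import pandas as pd

# Made-up competitor counts (pizza places within 1 km) per neighborhood
demo_counts = pd.Series({'A': 0, 'B': 1, 'C': 2, 'D': 4, 'E': 5, 'F': 9})

# Right-inclusive bins: 0-1 -> best, 2-4 -> moderate, 5+ -> worst
demo_categories = pd.cut(demo_counts, bins=[-1, 1, 4, np.inf],
                         labels=['best', 'moderate', 'worst'])
print(demo_categories.to_dict())
```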

In [65]:
cols = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']
best_rows, moderate_rows, worst_rows = [], [], []
# group by coordinates as well as name, so same-named neighborhoods (e.g. Bay Terrace) stay separate
group_keys = ['Borough', 'Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude']
for (borough, neigh, lat, lng), df in refinedVenuesUsable.groupby(group_keys, sort=False):
    # number of competitors (Pizza Place venues) returned within the 1 km search radius
    n_pizza = (df['Venue Category'] == 'Pizza Place').sum()
    record = {'Borough': borough, 'Neighborhood': neigh, 'Latitude': lat, 'Longitude': lng}
    if n_pizza < 2:
        best_rows.append(record)
    elif n_pizza < 5:
        moderate_rows.append(record)
    else:
        worst_rows.append(record)
bestSuggested = pd.DataFrame(best_rows, columns=cols)
moderateSuggested = pd.DataFrame(moderate_rows, columns=cols)
worstSuggested = pd.DataFrame(worst_rows, columns=cols)

In [66]:
bestSuggested.head()

,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Clason Point,40.806551,-73.854144
1,Bronx,Hunts Point,40.80973,-73.883315
2,Bronx,Soundview,40.821012,-73.865746
3,Bronx,Wakefield,40.894705,-73.847201
4,Manhattan,Chelsea,40.744035,-74.003116


In [67]:
moderateSuggested.head()

,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Baychester,40.866858,-73.835798
1,Bronx,Claremont Village,40.831428,-73.901199
2,Bronx,Concourse,40.834284,-73.915589
3,Bronx,Concourse Village,40.82478,-73.915847
4,Bronx,Country Club,40.844246,-73.824099


In [68]:
worstSuggested.head()

,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Co-op City,40.874294,-73.829939
1,Bronx,Fieldston,40.895437,-73.905643
2,Bronx,Olinville,40.871371,-73.863324
3,Bronx,Unionport,40.829774,-73.850535
4,Bronx,Williamsbridge,40.881039,-73.857446


Now we plot the best, moderate and worst places on a map to visualize our data.

In [69]:
suggestedMap = folium.Map(location=[latitude, longitude], zoom_start=10, width=900, height=500)
for lat, lng, borough, neighborhood in zip(bestSuggested['Latitude'], bestSuggested['Longitude'], bestSuggested['Borough'], bestSuggested['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
        [lat, lng],
        radius=500,
        color='green',
        fill=True,
        fill_color='green',
        fill_opacity=1,
        opacity=0.3,
        parse_html=False).add_to(suggestedMap)
for lat, lng, borough, neighborhood in zip(moderateSuggested['Latitude'], moderateSuggested['Longitude'], moderateSuggested['Borough'], moderateSuggested['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
        [lat, lng],
        radius=500,
        color='orange',
        fill=True,
        fill_color='orange',
        fill_opacity=0.5,
        opacity=0.3,
        parse_html=False).add_to(suggestedMap) 
for lat, lng, borough, neighborhood in zip(worstSuggested['Latitude'], worstSuggested['Longitude'], worstSuggested['Borough'], worstSuggested['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
        [lat, lng],
        radius=500,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.3,
        opacity=0.3,
        parse_html=False).add_to(suggestedMap) 
    
suggestedMap

<h3>Now we will use the <u>DBSCAN clustering</u> method to group the <i>best</i> neighborhoods for our client into geographic clusters.
    We do this to find large areas, made up of several neighborhoods, that lack a sufficient number of <i>Pizza Places</i>, which increases our client's chances of success.</h3>
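DBSCAN treats a point as a *core* point when at least `min_samples` points (itself included) lie within distance `eps` of it; clusters grow outward from core points, and any point that cannot be reached from a core point is labeled -1 (noise). The minimal pure-Python sketch below illustrates that logic. It is not scikit-learn's implementation (which uses a ball tree for the neighbor search); the helper names `haversine_angle` and `dbscan` are hypothetical, and the four coordinates are the Clason Point, Hunts Point, Soundview and Wakefield rows from the cluster table further below.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_angle(p, q):
    """Central angle in radians between two (lat, lng) points given in degrees."""
    lat1, lng1, lat2, lng2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * asin(sqrt(a))


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN sketch (hypothetical helper): one label per point, -1 marks noise."""
    labels = [None] * len(points)  # None = not visited yet

    def region(i):
        # Indices within eps (radians) of point i, including i itself
        return [j for j in range(len(points)) if haversine_angle(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point (not expanded)
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbors = region(j)
            if len(neighbors) >= min_samples:
                queue.extend(neighbors)  # j is also a core point: keep expanding
    return labels


# Clason Point, Hunts Point, Soundview, Wakefield (Bronx)
points = [(40.806551, -73.854144), (40.80973, -73.883315),
          (40.821012, -73.865746), (40.894705, -73.847201)]
print(dbscan(points, eps=0.0005, min_samples=3))  # prints [0, 0, 0, -1]
```

With `eps=0.0005` radians (about 3.2 km) and `min_samples=3`, the three Bronx neighborhoods lying within roughly 2.5 km of one another form cluster 0, while Wakefield, more than 8 km away, is noise. This toy run agrees with the labels in the notebook's table, but on the full data the result also depends on every other neighborhood passed to DBSCAN.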

In [70]:
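from sklearn.cluster import DBSCAN
# as_matrix() was removed in pandas 1.0; to_numpy() is the replacement
coords = bestSuggested[['Latitude', 'Longitude']].to_numpy()
# The haversine metric expects radians, so eps is an angle: 0.0005 rad * 6371 km is about 3.2 km
db = DBSCAN(eps=0.0005, min_samples=3, metric='haversine', metric_params=None, algorithm='ball_tree',
          leaf_size=30, p=None, n_jobs=1).fit(np.radians(coords))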
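Because the haversine metric works on coordinates in radians, `eps` is an angle on the unit sphere rather than a distance in degrees or kilometers. Multiplying by the Earth's mean radius (6371 km, the same value the `distance()` helper below uses) converts between the two. A small sketch, with hypothetical helper names `eps_to_km` and `km_to_eps`:

```python
EARTH_RADIUS_KM = 6371.0  # mean Earth radius in kilometers


def eps_to_km(eps_radians):
    """Convert a haversine DBSCAN eps (radians of arc) into kilometers."""
    return eps_radians * EARTH_RADIUS_KM


def km_to_eps(km):
    """Convert a distance in kilometers into a haversine DBSCAN eps (radians)."""
    return km / EARTH_RADIUS_KM


print(round(eps_to_km(0.0005), 2))  # prints 3.19
```

Choosing the neighborhood radius in kilometers first and deriving `eps` with `km_to_eps` makes the clustering parameter easier to reason about than a raw angle.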

  


In [71]:
clusters = db.labels_

In [72]:
bestSuggested.insert(0, 'Cluster Labels', db.labels_)
bestSuggested

|   | Cluster Labels | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | 0 | Bronx | Clason Point | 40.806551 | -73.854144 |
| 1 | 0 | Bronx | Hunts Point | 40.80973 | -73.883315 |
| 2 | 0 | Bronx | Soundview | 40.821012 | -73.865746 |
| 3 | -1 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 4 | 1 | Manhattan | Chelsea | 40.744035 | -74.003116 |
| 5 | 1 | Manhattan | Civic Center | 40.715229 | -74.005415 |
| 6 | 1 | Manhattan | Clinton | 40.759101 | -73.996119 |
| 7 | 1 | Manhattan | Flatiron | 40.739673 | -73.990947 |
| 8 | 1 | Manhattan | Hudson Yards | 40.756658 | -74.000111 |
| 9 | 1 | Manhattan | Lincoln Square | 40.773529 | -73.985338 |


We keep only the labels of neighborhoods that were assigned to a cluster (i.e., whose cluster label is not -1, the label DBSCAN gives to noise points).

In [73]:
from scipy import stats
clusters = clusters[clusters >= 0]  # drop noise points (label -1)

### Now we will plot all the clusters, and the suggested neighborhoods within them, on a map to show which areas are best suited for our client's expansion.

We build lists of all cluster centers and radii, as well as the latitude, longitude and label of every neighborhood in the identified clusters.

In [74]:
circleRadius = []
circleLatLng = []
labels = []
allLat = []
allLng = []

In [75]:
from math import radians, cos, sin, asin, sqrt

def distance(lat1, lat2, lon1, lon2):
    """Great-circle (haversine) distance in kilometers between two points given in degrees.

    Note the argument order: both latitudes first, then both longitudes.
    """
    # Convert degrees to radians with math.radians
    lon1 = radians(lon1)
    lon2 = radians(lon2)
    lat1 = radians(lat1)
    lat2 = radians(lat2)

    # Haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * asin(sqrt(a))

    # Radius of the Earth in kilometers (use 3956 for miles)
    r = 6371

    return c * r
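For reference, `distance()` evaluates the haversine formula, where $\varphi_1, \varphi_2$ are the latitudes and $\lambda_1, \lambda_2$ the longitudes in radians, and $r = 6371$ km:

$$
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \, \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right), \qquad
d = 2r \arcsin\!\left(\sqrt{a}\right)
$$

As a quick check, two points one degree of latitude apart on the same meridian give $\lambda_2 - \lambda_1 = 0$ and $a = \sin^2(\pi/360)$, so $d = 2r \cdot \pi/360 = r\pi/180 \approx 111.2$ km.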

In [76]:
for i in range(bestSuggested['Cluster Labels'].max() + 1):
    # All neighborhoods assigned to cluster i
    members = bestSuggested[bestSuggested['Cluster Labels'] == i]
    print('Cluster ', str(i + 1), ': ', str(len(members)))
    latitudes = members['Latitude'].to_numpy()
    longitudes = members['Longitude'].to_numpy()
    allLat.append(latitudes)
    allLng.append(longitudes)
    # Cluster center: mean latitude and longitude of its members
    meanLat = latitudes.mean()
    meanLng = longitudes.mean()
    circleLatLng.append((meanLat, meanLng))
    # Cluster radius (km): distance from the center to the farthest member
    distances = [distance(lat, meanLat, lng, meanLng) for lat, lng in zip(latitudes, longitudes)]
    circleRadius.append(max(distances))
    # "Neighborhood, Borough" labels used for the map markers
    labels.append([', '.join([neighborhood, borough])
                   for neighborhood, borough in zip(members['Neighborhood'], members['Borough'])])

Cluster  1 :  3
Cluster  2 :  12
Cluster  3 :  6
Cluster  4 :  4
Cluster  5 :  4
Cluster  6 :  11
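The center and radius computed for each cluster above come down to two steps: average the members' coordinates, then take the largest haversine distance from that average. The stdlib-only sketch below assumes a plain list of `(lat, lng)` tuples instead of a DataFrame; `haversine_km` and `cluster_circle` are hypothetical helper names, and the coordinates are the three cluster-0 Bronx rows (Clason Point, Hunts Point, Soundview) from the table above.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lng1, lat2, lng2, r=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * r * asin(sqrt(a))


def cluster_circle(points):
    """Return ((center_lat, center_lng), radius_km) for a list of (lat, lng) points.

    The center is the plain mean of latitudes and longitudes; the radius is the
    distance from that center to the farthest member.
    """
    center_lat = sum(lat for lat, _ in points) / len(points)
    center_lng = sum(lng for _, lng in points) / len(points)
    radius = max(haversine_km(lat, lng, center_lat, center_lng) for lat, lng in points)
    return (center_lat, center_lng), radius


# Clason Point, Hunts Point, Soundview (Bronx)
bronx = [(40.806551, -73.854144), (40.80973, -73.883315), (40.821012, -73.865746)]
center, radius_km = cluster_circle(bronx)
print(center, round(radius_km, 2))
```

Averaging raw latitudes and longitudes is a reasonable approximation for clusters only a few kilometers across; very large clusters, or ones spanning the antimeridian, would need a proper spherical mean.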


We plot the clusters, and the neighborhoods within them, that are best suited by location for our client's new outlet/branch.

In [77]:
finalMap = folium.Map(location=[latitude, longitude], zoom_start=10, width=900, height=500)
for i in range(len(circleRadius)):
    folium.Circle(
        [circleLatLng[i][0], circleLatLng[i][1]],
        radius=circleRadius[i]*1000,  # cluster radius is in km; folium.Circle expects meters
        color='green',
        fill=True,
        fill_color='green',
        fill_opacity=0.3,
        opacity=0.3,
        parse_html=False).add_to(finalMap)
finalMap

In [78]:
for i in range(len(labels)):
    for j in range(len(labels[i])):
        folium.Marker([allLat[i][j], allLng[i][j]], tooltip=labels[i][j], popup=labels[i][j], icon=folium.Icon(color='white', icon_color='green', icon='thumbs-o-up', prefix='fa')).add_to(finalMap)
finalMap

Now we find the biggest cluster of neighborhoods satisfying all our criteria, and suggest it as the best region for our client to establish their new branch/outlet.

In [79]:
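# Most common non-noise label; np.bincount avoids stats.mode(...).mode[0], which breaks on newer SciPy
mostDenseCluster = np.bincount(clusters).argmax()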
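Taking the mode is only meaningful once the noise label -1 has been removed, since noise can easily be the single most frequent label. A dependency-free sketch of the same selection, using a hypothetical helper `most_dense_cluster` and made-up labels (not the notebook's data):

```python
from collections import Counter


def most_dense_cluster(labels):
    """Return the most frequent non-noise DBSCAN label.

    Noise points (label -1) are dropped first; ties go to the smallest label,
    as with np.bincount(labels).argmax() or scipy.stats.mode.
    """
    counts = Counter(label for label in labels if label >= 0)
    if not counts:
        raise ValueError("every point was labeled as noise")
    best = max(counts.values())
    return min(label for label, n in counts.items() if n == best)


# Made-up labels in which noise (-1) is the most common value
example_labels = [0, 0, 0, -1, 1, 1, 1, 1, 2, -1, -1, -1, -1]
print(most_dense_cluster(example_labels))            # prints 1
print(Counter(example_labels).most_common(1)[0][0])  # prints -1 (noise wins without the filter)
```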

In [80]:
finalColumns = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']
# Neighborhoods belonging to the densest cluster
inBest = bestSuggested['Cluster Labels'] == mostDenseCluster
bestSuggestedFinalPlaces = bestSuggested.loc[inBest, finalColumns].reset_index(drop=True)
# All remaining suggested neighborhoods (other clusters and noise points)
otherFinalPlaces = bestSuggested.loc[~inBest, finalColumns].reset_index(drop=True)

In [81]:
bestSuggestedFinalPlaces

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Manhattan | Chelsea | 40.744035 | -74.003116 |
| 1 | Manhattan | Civic Center | 40.715229 | -74.005415 |
| 2 | Manhattan | Clinton | 40.759101 | -73.996119 |
| 3 | Manhattan | Flatiron | 40.739673 | -73.990947 |
| 4 | Manhattan | Hudson Yards | 40.756658 | -74.000111 |
| 5 | Manhattan | Lincoln Square | 40.773529 | -73.985338 |
| 6 | Manhattan | Tribeca | 40.721522 | -74.010683 |
| 7 | Manhattan | Turtle Bay | 40.752042 | -73.967708 |
| 8 | Brooklyn | Boerum Hill | 40.685683 | -73.983748 |
| 9 | Brooklyn | Vinegar Hill | 40.703321 | -73.981116 |


# Conclusion

Based on the data processing and analysis above, we conclude that the following neighborhoods are the best candidates for our client. Within them, the client should carry out further research on residents' food preferences and average spending on eating out to narrow down the location of the new outlet:

In [82]:
print(bestSuggestedFinalPlaces[['Neighborhood', 'Borough']].to_string(index=False))

   Neighborhood    Borough
        Chelsea  Manhattan
   Civic Center  Manhattan
        Clinton  Manhattan
       Flatiron  Manhattan
   Hudson Yards  Manhattan
 Lincoln Square  Manhattan
        Tribeca  Manhattan
     Turtle Bay  Manhattan
    Boerum Hill   Brooklyn
   Vinegar Hill   Brooklyn
  Hunters Point     Queens
     Ravenswood     Queens


The client may also want to look into the following neighborhoods. However, they belong to smaller clusters or to no cluster at all, so they may be less worthwhile places for our client to invest their resources:

In [83]:
print(otherFinalPlaces[['Neighborhood', 'Borough']].to_string(index=False))

        Neighborhood        Borough
        Clason Point          Bronx
         Hunts Point          Bronx
           Soundview          Bronx
           Wakefield          Bronx
        Bergen Beach       Brooklyn
      Brighton Beach       Brooklyn
       East Flatbush       Brooklyn
     Manhattan Beach       Brooklyn
            Sea Gate       Brooklyn
           Bayswater         Queens
        Belle Harbor         Queens
        Breezy Point         Queens
          Brookville         Queens
          Douglaston         Queens
     Jamaica Estates         Queens
           Laurelton         Queens
         Lefrak City         Queens
            Neponsit         Queens
     Oakland Gardens         Queens
     Queensboro Hill         Queens
            Rosedale         Queens
             Roxbury         Queens
          Somerville         Queens
 Springfield Gardens         Queens
           Arlington  Staten Island
          Bloomfield  Staten Island
        Butler Manor  Staten