# COURSERA CAPSTONE ASSIGNMENT - BATTLE OF NEIGHBORHOODS

##### _By Sean Morris_

## WEEK ONE: BACKGROUND AND DATA

### WEEK ONE PART ONE - BACKGROUND AND BUSINESS PROBLEM

#### Ray Kroc, the founder of McDonald’s, once famously quipped that he was in the business of “real estate” (Forbes). Although it sounds like a joke, Kroc was a shrewd businessman who meant what he said. McDonald’s became an iconic brand because of the way its businesses were set up. Kroc owned the land and set up the establishments as franchises that would be managed by others. This arrangement allowed Kroc to act as a pseudo-landlord to his franchisees without the headache of managing local operations beyond basic performance standards. Ultimately, this model led to the development of one of the most iconic food industry brands in America.

#### To follow Kroc’s lead, one might leverage location and geographic data to try to identify new real estate opportunities that could serve as growth areas for new markets. This is a ripe area for data science to explore. 

#### The Foursquare database contains extensive data that maps food industry locations. It is an ideal resource for determining the density and concentration of specific types of food industry locations. 

#### With a powerful scripting and data science language like Python, large location databases like Foursquare’s can be leveraged to help identify possible new opportunities in key growth areas. However, data scientists would do well to use data like this to check whether specific locations are already saturated with existing competitors. If one neighborhood already has three hamburger joints, it might be more difficult to attract customers to another.

#### This final project will use geographic data for the city of Boston to identify food establishments of a particular type (in this case what Foursquare refers to as “Pizza Places”) and determine whether certain neighborhoods may already be showing some saturation. In addition, we will use this same information to understand whether the existing neighborhood boundaries serve as sufficient categories for grouping these pizza places. Are all pizza places in the Financial District alike, or is there more variation between neighborhoods than within them?

#### These questions are difficult to answer completely because of the limited nature of the data that we will be pulling from Foursquare for this assignment. However, many of them can be explored in detail even with the data that we have access to. This is an exciting area of data science research, and it could be extremely useful to the corporate headquarters of any number of brick-and-mortar service industries, including food service, retail, fitness centers, and grocery stores.


### WEEK ONE PART TWO: DATA

#### The city of Boston has more than 20 distinct neighborhoods – each with a unique character. You can read more about each neighborhood [here](https://www.boston.gov/neighborhoods). For example, the North End was originally a neighborhood of Italian immigrants but more recently has become an upscale location for tourists with plenty of Italian-style dining options. The neighborhood of Brighton is known for being a slightly more affordable area and typically caters to young professionals. 

#### The city of Boston posts its location data publicly on its website so that real estate developers, researchers, and city planners have access to this information. In this lab, we will download one of the city’s more popular neighborhood datasets, simply called “Boston Neighborhoods”, available [here](https://data.boston.gov/dataset/boston-neighborhoods/resource/13ee2b65-6547-4168-b112-83995f138602).

#### In this project we will also be downloading and accessing data from Foursquare. Foursquare is a company that provides location data and intelligence to its customers, primarily web developers, for use in their applications. In this project, we will be making a limited number of calls to Foursquare’s API to pull data about pizza places in areas of the city of particular interest. We will be using the Foursquare [search endpoint](https://developer.foursquare.com/docs/places-api/endpoints/) to find venues that match a “Pizza” query, most of which fall in the “Pizza Place” category.

#### To begin, we will pull up to 50 pizza places within a fixed radius of the geographic center of each neighborhood in question. We will identify these geographic centers through Google.

#### To make this assignment more practical, we will be excluding several neighborhoods from our analysis. These include most of the larger, outlying neighborhoods outside of the city center (including Brighton, Allston, Dorchester, Roxbury, Mattapan, the Harbor Islands, Roslindale, West Roxbury, and others).


### Dictionaries and Libraries

#### To complete this project, we will also be importing a number of Python libraries. These include the pandas library to work with tabular data, the NumPy library to work with vectorized data, the json library to work with the JSON data that we will be downloading from the City of Boston, the geopy library to find location data for Boston, the Matplotlib library to plot our data, Folium to render our data in a map, and scikit-learn to run a k-means clustering algorithm that groups the restaurant data into clusters.

## METHODS

#### Clustering is one of the more popular data science methods for exploring and categorizing data. Clustering is considered an unsupervised learning method because the data carry no labels on which a model could be trained. The k-means algorithm partitions the dataset into k clusters, starting from an initially randomized placement of k centroids. In short, the algorithm alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its member points, until the assignments stop changing. Because the k-means centroids are initially selected randomly, the resulting clusters are only guaranteed to be a local optimum, not a global one. However, k-means remains one of the most popular data science algorithms, particularly for data exploration.
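
#### The assign-then-update loop described above can be sketched in plain Python. This is an illustration only, not the scikit-learn `KMeans` implementation that the notebook imports: the function name `kmeans`, the fixed `seed`, and the toy points are assumptions made for the example.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Minimal k-means (Lloyd's algorithm) for 2-D points given as (x, y) tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids drawn from the data
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        if new_assignments == assignments:
            break  # assignments stopped changing, so the loop has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, assignments


# Toy data (hypothetical): two well-separated groups of points.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_centroids, toy_labels = kmeans(toy_points, k=2)
print(toy_centroids, toy_labels)
```

#### Running the function with different `seed` values on harder data can produce different clusterings, which is the local-optimum behavior noted above.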

## Import Necessary Libraries

In [7]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe
# (pandas.io.json.json_normalize is deprecated; use pandas.json_normalize instead)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')


### The code section below was an early attempt to use the latitude and longitude coordinates provided in the JSON file, but these were not stored in a consistent way that could be easily accessed with a loop. This code was not used, but we save it here for reference.

!wget -q -O 'boston_data.json' http://bostonopendata-boston.opendata.arcgis.com/datasets/3525b0ee6e6b427f9aab5d0a1d0a1a28_0.geojson?outSR=%7B%22latestWkid%22%3A2249%2C%22wkid%22%3A102686%7D
print('Data downloaded')
with open('boston_data.json') as json_data:
    boston_data = json.load(json_data)
    #boston_data
neighborhoods_data = boston_data['features']
neighborhoods_data[0]

_______________________________________
#### define the dataframe columns
column_names = ['Neighborhood', 'CoordinateURL', 'Latitude', 'Longitude'] 

#### instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

_____________________________________________________________________________
rows = []
for data in neighborhoods_data:
    neighborhood_name = data['properties']['Name']

    neighborhood_latlon = data['geometry']['coordinates']

    # COULDN'T FIGURE OUT HOW TO PULL IN THE LATITUDE AND LONGITUDE WITH THIS JSON data, so we commented this section out
    #neighborhood_lat = neighborhood_latlon[1]
    #neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Neighborhood': neighborhood_name})
                 #'Latitude': neighborhood_lat,
                 #'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so the rows are collected in a list first
neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods
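
#### One likely reason the indexing above failed is that boundary geometries in GeoJSON store nested lists of positions rather than a single point. The sketch below, which was not part of the original analysis, walks any nesting depth and returns the center of the bounding box as a rough neighborhood center. It assumes each position is `[longitude, latitude]`, as the GeoJSON specification prescribes; whether the downloaded file actually uses that coordinate system would need to be checked. The function names and the toy geometry are hypothetical.

```python
def flatten_positions(coords):
    """Yield every [x, y] position from arbitrarily nested GeoJSON coordinate lists."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords  # reached a single position
    else:
        for part in coords:
            yield from flatten_positions(part)


def bounding_box_center(geometry):
    """Return (latitude, longitude) of the center of the geometry's bounding box."""
    positions = list(flatten_positions(geometry['coordinates']))
    lons = [position[0] for position in positions]
    lats = [position[1] for position in positions]
    return (min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2


# Toy geometry (hypothetical values): a MultiPolygon made of two small triangles.
toy_geometry = {
    'type': 'MultiPolygon',
    'coordinates': [
        [[[0, 0], [2, 0], [2, 2], [0, 0]]],
        [[[4, 4], [6, 4], [6, 6], [4, 4]]],
    ],
}
print(bounding_box_center(toy_geometry))
```

#### A bounding-box center is only an approximation and can fall outside an irregularly shaped neighborhood, which is one reason hand-picked center points may still be preferable.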

## Set the source of the Boston Neighborhoods Data and read the data into a Dataframe

In [8]:
GeographicURL = 'http://bostonopendata-boston.opendata.arcgis.com/datasets/3525b0ee6e6b427f9aab5d0a1d0a1a28_0.csv?outSR=%7B%22latestWkid%22%3A2249%2C%22wkid%22%3A102686%7D'

In [9]:
df_geogs = pd.read_csv(GeographicURL) # To read CSV file

In [10]:
df_geogs.head()

| | OBJECTID | Name | Acres | Neighborhood_ID | SqMiles | ShapeSTArea | ShapeSTLength |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 27 | Roslindale | 1605.568237 | 15 | 2.51 | 69938270.0 | 53563.912597 |
| 1 | 28 | Jamaica Plain | 2519.245394 | 11 | 3.94 | 109737900.0 | 56349.937161 |
| 2 | 29 | Mission Hill | 350.853564 | 13 | 0.55 | 15283120.0 | 17918.724113 |
| 3 | 30 | Longwood | 188.611947 | 28 | 0.29 | 8215904.0 | 11908.757148 |
| 4 | 31 | Bay Village | 26.539839 | 33 | 0.04 | 1156071.0 | 4650.635493 |


## Drop unnecessary neighborhoods, reset the index, and drop several unnecessary columns

In [11]:
# neighborhoods excluded from the analysis
excluded_neighborhoods = ['Harbor Islands', 'West Roxbury', 'Mattapan', 'Hyde Park', 'Roslindale',
                          'Brighton', 'Jamaica Plain', 'Dorchester', 'Roxbury', 'Mission Hill',
                          'Longwood', 'Fenway', 'Allston']
df_geogs = df_geogs[~df_geogs['Name'].isin(excluded_neighborhoods)]

BostonHoods = df_geogs.reset_index(drop=True)
BostonHoods = BostonHoods.sort_values('Name')
BostonHoods = BostonHoods.drop(columns=['ShapeSTArea', 'ShapeSTLength', 'OBJECTID', 'Neighborhood_ID'])

In [12]:
BostonHoods.head()

| | Name | Acres | SqMiles |
| --- | --- | --- | --- |
| 5 | Back Bay | 399.314411 | 0.62 |
| 0 | Bay Village | 26.539839 | 0.04 |
| 9 | Beacon Hill | 200.156904 | 0.31 |
| 7 | Charlestown | 871.541223 | 1.36 |
| 2 | Chinatown | 76.32441 | 0.12 |


## Enter neighborhood center-point latitudes and longitudes from Google

In [13]:
#Enter in Data from Google Search API
LatLong = pd.DataFrame({'Name':["South Boston", "Allston","Roslindale", "Jamaica Plain", "Mission Hill", "Longwood", "Bay Village", "Leather District", "Chinatown", "North End", "Roxbury", "South End", "Back Bay", "East Boston", "Charlestown", "West End", "Beacon Hill", "Downtown", "Fenway", "Brighton", "West Roxbury", "Hyde Park", "Mattapan", "Dorchester", "South Boston Waterfront"],
                           'Latitude': [42.3381, 42.3539, 42.2832, 42.3097, 42.3296, 42.3389, 42.349, 42.3511, 42.3501, 42.3647, 42.3152, 42.3388, 42.3503, 42.3702, 42.3782, 42.3644, 42.3588, 42.3557, 42.3429, 42.3464, 42.2798, 42.2565, 42.2771, 42.3016, 42.3492],
                           'Longitude':[-71.0476, -71.1337, -71.127, -71.1151, -71.1062, -71.1073, -71.0698, -71.0579, -71.0624, -71.0542, -71.0914, -71.0765, -71.081, -71.0389, -71.0602, -71.0661, -71.0707, -71.0572, -71.1003, -71.1627, -71.1627, -71.1241, -71.0914, -71.0676, -71.0432]})

In [14]:
LatLong

| | Name | Latitude | Longitude |
| --- | --- | --- | --- |
| 0 | South Boston | 42.3381 | -71.0476 |
| 1 | Allston | 42.3539 | -71.1337 |
| 2 | Roslindale | 42.2832 | -71.127 |
| 3 | Jamaica Plain | 42.3097 | -71.1151 |
| 4 | Mission Hill | 42.3296 | -71.1062 |
| 5 | Longwood | 42.3389 | -71.1073 |
| 6 | Bay Village | 42.349 | -71.0698 |
| 7 | Leather District | 42.3511 | -71.0579 |
| 8 | Chinatown | 42.3501 | -71.0624 |
| 9 | North End | 42.3647 | -71.0542 |


## Merge the CSV Data with the central latitude and longitude data

In [15]:
merged_df = pd.merge(BostonHoods, LatLong, 
                     left_on = 'Name', 
                     right_on = 'Name', 
                     how='left')

In [16]:
merged_df

| | Name | Acres | SqMiles | Latitude | Longitude |
| --- | --- | --- | --- | --- | --- |
| 0 | Back Bay | 399.314411 | 0.62 | 42.3503 | -71.081 |
| 1 | Bay Village | 26.539839 | 0.04 | 42.349 | -71.0698 |
| 2 | Beacon Hill | 200.156904 | 0.31 | 42.3588 | -71.0707 |
| 3 | Charlestown | 871.541223 | 1.36 | 42.3782 | -71.0602 |
| 4 | Chinatown | 76.32441 | 0.12 | 42.3501 | -71.0624 |
| 5 | Downtown | 397.472846 | 0.62 | 42.3557 | -71.0572 |
| 6 | East Boston | 3012.059593 | 4.71 | 42.3702 | -71.0389 |
| 7 | Leather District | 15.639908 | 0.02 | 42.3511 | -71.0579 |
| 8 | North End | 126.910439 | 0.2 | 42.3647 | -71.0542 |
| 9 | South Boston | 1439.888807 | 2.25 | 42.3381 | -71.0476 |


In [17]:
merged_df.shape

(13, 5)

## Create a map of Boston with the selected neighborhoods superimposed on top of it for reference

In [18]:
address = 'Boston, MA'
geolocator = Nominatim(user_agent="boston_explorer")
location = geolocator.geocode(address)
latitude1 = location.latitude
longitude1 = location.longitude
print('The geographical coordinates of Boston, MA are {}, {}.'.format(latitude1, longitude1))

The geographical coordinates of Boston, MA are 42.3602534, -71.0582912.


In [19]:
# create map of Boston using latitude and longitude values
map_boston = folium.Map(location=[latitude1, longitude1], zoom_start=14)

# add markers to map
for latitude, longitude, name in zip(merged_df['Latitude'], merged_df['Longitude'], merged_df['Name']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_boston)  
    
map_boston

## Specify Foursquare credentials for Foursquare API Calls

In [20]:
CLIENT_ID = '23BVCUH52BR2HTRZUUKALEFIRE2WZPB2BODRIDXNC2DO2AH2' # your Foursquare ID
CLIENT_SECRET = 'B3TI2FKFXH0NI3F3TZ3DU5XXIA5XIM0J1SGSBP5ICII4RDIK' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 50
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 23BVCUH52BR2HTRZUUKALEFIRE2WZPB2BODRIDXNC2DO2AH2
CLIENT_SECRET:B3TI2FKFXH0NI3F3TZ3DU5XXIA5XIM0J1SGSBP5ICII4RDIK
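
#### Hard-coding and printing API credentials exposes them to anyone who can read the notebook. As a hedged alternative that the original notebook does not use, the sketch below reads the credentials from environment variables and masks the secret before printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions chosen for this example.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment-style variables."""
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError(
            'Set the {} environment variable first'.format(missing.args[0])
        ) from None


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)


# Demo with a stand-in mapping so the example runs without real credentials.
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id', 'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID: ' + demo_id)
print('CLIENT_SECRET: ' + mask(demo_secret))
```

#### Calling `load_foursquare_credentials()` with no argument reads from the real process environment instead of the demo mapping.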


## Test out the procedure for pulling pizza place data from a latitude and longitude

> `https://api.foursquare.com/v2/venues/`**search**`?client_id=`**CLIENT_ID**`&client_secret=`**CLIENT_SECRET**`&ll=`**LATITUDE**`,`**LONGITUDE**`&v=`**VERSION**`&query=`**QUERY**`&radius=`**RADIUS**`&limit=`**LIMIT**

In [23]:
radius = 500
search_query = 'Pizza'
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude1, longitude1, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=23BVCUH52BR2HTRZUUKALEFIRE2WZPB2BODRIDXNC2DO2AH2&client_secret=B3TI2FKFXH0NI3F3TZ3DU5XXIA5XIM0J1SGSBP5ICII4RDIK&ll=42.3602534,-71.0582912&v=20180604&query=Pizza&radius=500&limit=50'
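
#### String formatting works for a simple query such as `Pizza`, but a search term containing spaces or symbols would need to be escaped. A minimal sketch, assuming the same endpoint and parameter names shown in the URL pattern above, builds the query string with the standard library's `urlencode`. The function name `build_search_url` and the placeholder credentials are hypothetical.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Build a venue search URL with every parameter value safely escaped."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma between latitude and longitude readable
    return SEARCH_ENDPOINT + '?' + urlencode(params, safe=',')


example_url = build_search_url('ID', 'SECRET', 42.3503, -71.081, '20180604', 'Pizza', 500, 50)
print(example_url)
```

#### Keeping the parameters in a dictionary also makes it easy to change `radius` or `limit` in one place when experimenting.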

In [24]:
results = requests.get(url).json()
#results

# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.crossStreet,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,delivery.id,delivery.url,delivery.provider.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.icon.name,venuePage.id,location.neighborhood
0,4accca58f964a52087c920e3,Ernesto's Pizza,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1606617682,False,69 Salem St,btwn Morton & Stillman,42.36337,-71.055998,"[{'label': 'display', 'lat': 42.36337010778569...",394,2113,US,Boston,MA,United States,"[69 Salem St (btwn Morton & Stillman), Boston,...",299737.0,https://www.grubhub.com/restaurant/ernestos-69...,grubhub,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_grubhub_20180129.png,,
1,5287a20411d2a7ff9cc0351d,Crush Pizza,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1606617682,False,107 State St,,42.359068,-71.054923,"[{'label': 'display', 'lat': 42.35906808216016...",306,2109,US,Boston,MA,United States,"[107 State St, Boston, MA 02109, United States]",295803.0,https://www.grubhub.com/restaurant/crush-pizza...,grubhub,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_grubhub_20180129.png,,
2,4b8dd656f964a520fd0f33e3,Adam's Pizza,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1606617682,False,120 Blackstone St,,42.361592,-71.056545,"[{'label': 'display', 'lat': 42.361592, 'lng':...",206,2109,US,Boston,MA,United States,"[120 Blackstone St, Boston, MA 02109, United S...",,,,,,,,
3,4aa98799f964a5201a5420e3,Haymarket Pizza,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1606617682,False,106 Blackstone St.,btwn Hanover & North,42.361602,-71.056465,"[{'label': 'display', 'lat': 42.36160188912569...",212,2109,US,Boston,MA,United States,"[106 Blackstone St. (btwn Hanover & North), Bo...",,,,,,,,
4,4b6c4a4bf964a520352e2ce3,Domino's Pizza,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1606617682,False,64 Staniford St,Merrimac St,42.362858,-71.064173,"[{'label': 'display', 'lat': 42.36285766489993...",563,2114,US,Boston,MA,United States,"[64 Staniford St (Merrimac St), Boston, MA 021...",,,,,,,,
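
#### The cell above indexes straight into `results['response']['venues']`, which raises an unhelpful `KeyError` if the request fails. The sketch below is a defensive variant that is not in the original notebook. It assumes the response carries a `meta` object with a numeric `code` and an optional `errorDetail`, which should be confirmed against the Foursquare documentation; the function name `extract_venues` is hypothetical.

```python
def extract_venues(payload):
    """Return the list of venues from a parsed search response, or raise on failure."""
    meta = payload.get('meta', {})
    code = meta.get('code')
    if code != 200:
        # Assumed envelope: failed requests report a non-200 code and an error detail.
        raise ValueError(
            'Foursquare request failed: code={}, detail={}'.format(code, meta.get('errorDetail'))
        )
    return payload.get('response', {}).get('venues', [])


# Stand-in payloads (hypothetical) that mimic a successful and an empty response.
ok_payload = {'meta': {'code': 200}, 'response': {'venues': [{'name': "Ernesto's Pizza"}]}}
empty_payload = {'meta': {'code': 200}, 'response': {}}
print(extract_venues(ok_payload))
print(extract_venues(empty_payload))
```

#### Returning an empty list for a neighborhood with no matches lets later steps skip it instead of failing.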


In [25]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # copy so later assignments do not modify a view

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered.head()

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,id
0,Ernesto's Pizza,Pizza Place,69 Salem St,btwn Morton & Stillman,42.36337,-71.055998,"[{'label': 'display', 'lat': 42.36337010778569...",394,2113,US,Boston,MA,United States,"[69 Salem St (btwn Morton & Stillman), Boston,...",,4accca58f964a52087c920e3
1,Crush Pizza,Pizza Place,107 State St,,42.359068,-71.054923,"[{'label': 'display', 'lat': 42.35906808216016...",306,2109,US,Boston,MA,United States,"[107 State St, Boston, MA 02109, United States]",,5287a20411d2a7ff9cc0351d
2,Adam's Pizza,Pizza Place,120 Blackstone St,,42.361592,-71.056545,"[{'label': 'display', 'lat': 42.361592, 'lng':...",206,2109,US,Boston,MA,United States,"[120 Blackstone St, Boston, MA 02109, United S...",,4b8dd656f964a520fd0f33e3
3,Haymarket Pizza,Pizza Place,106 Blackstone St.,btwn Hanover & North,42.361602,-71.056465,"[{'label': 'display', 'lat': 42.36160188912569...",212,2109,US,Boston,MA,United States,"[106 Blackstone St. (btwn Hanover & North), Bo...",,4aa98799f964a5201a5420e3
4,Domino's Pizza,Pizza Place,64 Staniford St,Merrimac St,42.362858,-71.064173,"[{'label': 'display', 'lat': 42.36285766489993...",563,2114,US,Boston,MA,United States,"[64 Staniford St (Merrimac St), Boston, MA 021...",,4b6c4a4bf964a520352e2ce3


## Add a new column to the merged location data that contains the API URL to pull the pizza places from this location

In [26]:
merged_df["PizzaURL"] = (
    'https://api.foursquare.com/v2/venues/search?client_id=' + CLIENT_ID
    + '&client_secret=' + CLIENT_SECRET
    + '&ll=' + merged_df["Latitude"].astype(str) + "," + merged_df["Longitude"].astype(str)
    + '&v=' + VERSION + '&query=' + search_query + '&radius=' + str(radius) + '&limit=' + str(LIMIT)
)

In [27]:
merged_df.head()

| | Name | Acres | SqMiles | Latitude | Longitude | PizzaURL |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Back Bay | 399.314411 | 0.62 | 42.3503 | -71.081 | https://api.foursquare.com/v2/venues/search?cl... |
| 1 | Bay Village | 26.539839 | 0.04 | 42.349 | -71.0698 | https://api.foursquare.com/v2/venues/search?cl... |
| 2 | Beacon Hill | 200.156904 | 0.31 | 42.3588 | -71.0707 | https://api.foursquare.com/v2/venues/search?cl... |
| 3 | Charlestown | 871.541223 | 1.36 | 42.3782 | -71.0602 | https://api.foursquare.com/v2/venues/search?cl... |
| 4 | Chinatown | 76.32441 | 0.12 | 42.3501 | -71.0624 | https://api.foursquare.com/v2/venues/search?cl... |


## Check that a single API URL can be read back from the dataframe

In [51]:
merged_df.iloc[0,5]

'https://api.foursquare.com/v2/venues/search?client_id=23BVCUH52BR2HTRZUUKALEFIRE2WZPB2BODRIDXNC2DO2AH2&client_secret=B3TI2FKFXH0NI3F3TZ3DU5XXIA5XIM0J1SGSBP5ICII4RDIK&ll=42.3503,-71.081&v=20180604&query=Pizza&radius=500&limit=50'
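
#### Each of these URLs searches a 500-meter circle around one neighborhood center, so centers that sit close together can return the same venue more than once. As a rough check that is not part of the original analysis, the sketch below uses the haversine formula to estimate the distance between two of the center points entered earlier. The function name `haversine_m` and the mean Earth radius constant are assumptions introduced for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # approximate mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Center points taken from the latitude/longitude table entered earlier.
chinatown = (42.3501, -71.0624)
leather_district = (42.3511, -71.0579)

search_radius_m = 500
gap_m = haversine_m(*chinatown, *leather_district)
# Two search circles overlap when their centers are closer than twice the radius.
print('Centers are {:.0f} m apart; search circles overlap: {}'.format(gap_m, gap_m < 2 * search_radius_m))
```

#### When the circles overlap, venues in the shared area can appear in both result sets, which matters for any per-neighborhood count.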

## Create dictionaries for each of the API calls for each neighborhood and clean the json data and put it into a dataframe

In [30]:
results0 = requests.get(merged_df.iloc[0,5]).json()
results1 = requests.get(merged_df.iloc[1,5]).json()
results2 = requests.get(merged_df.iloc[2,5]).json()
results3 = requests.get(merged_df.iloc[3,5]).json()
results4 = requests.get(merged_df.iloc[4,5]).json()
results5 = requests.get(merged_df.iloc[5,5]).json()
results6 = requests.get(merged_df.iloc[6,5]).json()
results7 = requests.get(merged_df.iloc[7,5]).json()
results8 = requests.get(merged_df.iloc[8,5]).json()
results9 = requests.get(merged_df.iloc[9,5]).json()
results10 = requests.get(merged_df.iloc[10,5]).json()
results11 = requests.get(merged_df.iloc[11,5]).json()
results12 = requests.get(merged_df.iloc[12,5]).json()

In [31]:
venues0 = results0['response']['venues']
dataframe0 = json_normalize(venues0)
venues1 = results1['response']['venues']
dataframe1 = json_normalize(venues1)
venues2 = results2['response']['venues']
dataframe2 = json_normalize(venues2)
venues3 = results3['response']['venues']
dataframe3 = json_normalize(venues3)
venues4 = results4['response']['venues']
dataframe4 = json_normalize(venues4)
venues5 = results5['response']['venues']
dataframe5 = json_normalize(venues5)
venues6 = results6['response']['venues']
dataframe6 = json_normalize(venues6)
venues7 = results7['response']['venues']
dataframe7 = json_normalize(venues7)
venues8 = results8['response']['venues']
dataframe8 = json_normalize(venues8)
venues9 = results9['response']['venues']
dataframe9 = json_normalize(venues9)
venues10 = results10['response']['venues']
dataframe10 = json_normalize(venues10)
venues11 = results11['response']['venues']
dataframe11 = json_normalize(venues11)
venues12 = results12['response']['venues']
dataframe12 = json_normalize(venues12)

In [32]:
filtered_columns0 = ['name', 'categories'] + [col for col in dataframe0.columns if col.startswith('location.')] + ['id']
dataframe_filtered0 = dataframe0.loc[:, filtered_columns0]
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
dataframe_filtered0['categories'] = dataframe_filtered0.apply(get_category_type, axis=1)
dataframe_filtered0.columns = [column.split('.')[-1] for column in dataframe_filtered0.columns]
filtered_columns1 = ['name', 'categories'] + [col for col in dataframe1.columns if col.startswith('location.')] + ['id']
dataframe_filtered1 = dataframe1.loc[:, filtered_columns1]
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
dataframe_filtered1['categories'] = dataframe_filtered1.apply(get_category_type, axis=1)
dataframe_filtered1.columns = [column.split('.')[-1] for column in dataframe_filtered1.columns]
filtered_columns2 = ['name', 'categories'] + [col for col in dataframe2.columns if col.startswith('location.')] + ['id']
dataframe_filtered2 = dataframe2.loc[:, filtered_columns2]
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
dataframe_filtered2['categories'] = dataframe_filtered2.apply(get_category_type, axis=1)
dataframe_filtered2.columns = [column.split('.')[-1] for column in dataframe_filtered2.columns]
filtered_columns3 = ['name', 'categories'] + [col for col in dataframe3.columns if col.startswith('location.')] + ['id']
dataframe_filtered3 = dataframe3.loc[:, filtered_columns3]
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
dataframe_filtered3['categories'] = dataframe_filtered3.apply(get_category_type, axis=1)
dataframe_filtered3.columns = [column.split('.')[-1] for column in dataframe_filtered3.columns]
filtered_columns4 = ['name', 'categories'] + [col for col in dataframe4.columns if col.startswith('location.')] + ['id']
dataframe_filtered4 = dataframe4.loc[:, filtered_columns4]
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
dataframe_filtered4['categories'] = dataframe_filtered4.apply(get_category_type, axis=1)
dataframe_filtered4.columns = [column.split('.')[-1] for column in dataframe_filtered4.columns]
filtered_columns5 = ['name', 'categories'] + [col for col in dataframe5.columns if col.startswith('location.')] + ['id']
dataframe_filtered5 = dataframe5.loc[:, filtered_columns5]
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
dataframe_filtered5['categories'] = dataframe_filtered5.apply(get_category_type, axis=1)
dataframe_filtered5.columns = [column.split('.')[-1] for column in dataframe_filtered5.columns]
filtered_columns6 = ['name', 'categories'] + [col for col in dataframe6.columns if col.startswith('location.')] + ['id']
dataframe_filtered6 = dataframe6.loc[:, filtered_columns6]
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
dataframe_filtered6['categories'] = dataframe_filtered6.apply(get_category_type, axis=1)
dataframe_filtered6.columns = [column.split('.')[-1] for column in dataframe_filtered6.columns]
filtered_columns7 = ['name', 'categories'] + [col for col in dataframe7.columns if col.startswith('location.')] + ['id']
dataframe_filtered7 = dataframe7.loc[:, filtered_columns7]
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
dataframe_filtered7['categories'] = dataframe_filtered7.apply(get_category_type, axis=1)
dataframe_filtered7.columns = [column.split('.')[-1] for column in dataframe_filtered7.columns]
filtered_columns8 = ['name', 'categories'] + [col for col in dataframe8.columns if col.startswith('location.')] + ['id']
dataframe_filtered8 = dataframe8.loc[:, filtered_columns8]
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
dataframe_filtered8['categories'] = dataframe_filtered8.apply(get_category_type, axis=1)
dataframe_filtered8.columns = [column.split('.')[-1] for column in dataframe_filtered8.columns]
filtered_columns9 = ['name', 'categories'] + [col for col in dataframe9.columns if col.startswith('location.')] + ['id']
dataframe_filtered9 = dataframe9.loc[:, filtered_columns9]
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
dataframe_filtered9['categories'] = dataframe_filtered9.apply(get_category_type, axis=1)
dataframe_filtered9.columns = [column.split('.')[-1] for column in dataframe_filtered9.columns]
filtered_columns10 = ['name', 'categories'] + [col for col in dataframe10.columns if col.startswith('location.')] + ['id']
dataframe_filtered10 = dataframe10.loc[:, filtered_columns10]
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
dataframe_filtered10['categories'] = dataframe_filtered10.apply(get_category_type, axis=1)
dataframe_filtered10.columns = [column.split('.')[-1] for column in dataframe_filtered10.columns]
filtered_columns11 = ['name', 'categories'] + [col for col in dataframe11.columns if col.startswith('location.')] + ['id']
dataframe_filtered11 = dataframe11.loc[:, filtered_columns11]
# get_category_type is already defined above and is reused here
dataframe_filtered11['categories'] = dataframe_filtered11.apply(get_category_type, axis=1)
dataframe_filtered11.columns = [column.split('.')[-1] for column in dataframe_filtered11.columns]
filtered_columns12 = ['name', 'categories'] + [col for col in dataframe12.columns if col.startswith('location.')] + ['id']
dataframe_filtered12 = dataframe12.loc[:, filtered_columns12]
# get_category_type is already defined above and is reused here
dataframe_filtered12['categories'] = dataframe_filtered12.apply(get_category_type, axis=1)
dataframe_filtered12.columns = [column.split('.')[-1] for column in dataframe_filtered12.columns]
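
#### The filtering steps above repeat the same pattern for every dataframe: keep the name, category, location, and id columns, reduce the category list to a single name, and strip the `location.` prefix. The sketch below folds that pattern into one function. The helper name `filter_venues` and the toy input frame are hypothetical, and the sketch assumes each `dataframeN` has the same flattened column layout used above.

```python
import pandas as pd


def get_category_type(row):
    """Return the name of a venue's first category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


def filter_venues(df):
    """Hypothetical helper: apply the per-dataframe filtering steps once."""
    keep = (['name', 'categories']
            + [col for col in df.columns if col.startswith('location.')]
            + ['id'])
    out = df.loc[:, keep].copy()  # copy so the assignment below is unambiguous
    out['categories'] = out.apply(get_category_type, axis=1)
    out.columns = [column.split('.')[-1] for column in out.columns]
    return out


# Toy stand-in for one flattened search result (made-up rows, for illustration only)
toy = pd.DataFrame({
    'id': ['a1', 'b2'],
    'name': ['Example Pizza', 'Example Trattoria'],
    'categories': [[{'name': 'Pizza Place'}], []],
    'location.lat': [42.35, 42.36],
    'location.lng': [-71.06, -71.05],
    'referralId': ['x', 'y'],
})

filtered = filter_venues(toy)
print(filtered)
```

#### With a helper like this, each `dataframe_filteredN` could be produced by a single call such as `filter_venues(dataframe8)`, which keeps the thirteen blocks consistent with one another.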

In [34]:
concat = pd.concat([dataframe_filtered0, dataframe_filtered1, dataframe_filtered2, dataframe_filtered3, dataframe_filtered4, dataframe_filtered5, dataframe_filtered6, dataframe_filtered7, dataframe_filtered8, dataframe_filtered9, dataframe_filtered10, dataframe_filtered11, dataframe_filtered12])
# keep only the columns needed for mapping and clustering
concat.drop(columns=concat.columns.difference(['name', 'categories', 'lat', 'lng']), inplace=True)
concat.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | California Pizza Kitchen | Pizza Place | 42.34731 | -71.080086 |
| 1 | Crazy Dough's Pizza-Artisan Slice Bar | Pizza Place | 42.346363 | -71.082405 |
| 2 | Pizza al Taglio | Pizza Place | 42.350063 | -71.081703 |
| 3 | La Pizza & La Pasta | Italian Restaurant | 42.347787 | -71.082444 |
| 4 | Pizza Depot | Pizza Place | 42.347677 | -71.076148 |


## Create a map of all of the pizza place venues

In [35]:
venues_map = folium.Map(location=[latitude1, longitude1], zoom_start=14)  # generate map centred on (latitude1, longitude1)

# add the pizza venues as circle markers (blue outline, red fill), labelled by category
for lat, lng, label in zip(concat.lat, concat.lng, concat.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill=True,
        fill_color='red',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

## Format the data for running kmeans

In [36]:
# k-means only needs the coordinates, so drop the descriptive columns
concat2 = concat.drop(columns=['name', 'categories'])
concat2.head()

| | lat | lng |
|---|---|---|
| 0 | 42.34731 | -71.080086 |
| 1 | 42.346363 | -71.082405 |
| 2 | 42.350063 | -71.081703 |
| 3 | 42.347787 | -71.082444 |
| 4 | 42.347677 | -71.076148 |


In [37]:
# set number of clusters
kclusters = 12

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(concat2)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 9, 9, 1, 9, 9, 7], dtype=int32)

In [38]:
# add clustering labels
concat.insert(0, 'Cluster Labels', kmeans.labels_)
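
#### Once the labels are attached, a per-cluster summary shows how many venues fall in each cluster and where each cluster sits on average. The sketch below uses only the first five rows of `concat` together with the first five labels printed above, so it illustrates the technique rather than the full result. The variable names `sample` and `summary` are hypothetical.

```python
import pandas as pd

# First five rows of `concat` with the first five k-means labels shown above
sample = pd.DataFrame({
    'Cluster Labels': [0, 0, 0, 0, 9],
    'name': ['California Pizza Kitchen',
             "Crazy Dough's Pizza-Artisan Slice Bar",
             'Pizza al Taglio',
             'La Pizza & La Pasta',
             'Pizza Depot'],
    'lat': [42.34731, 42.346363, 42.350063, 42.347787, 42.347677],
    'lng': [-71.080086, -71.082405, -71.081703, -71.082444, -71.076148],
})

# Count venues and average the coordinates within each cluster
summary = sample.groupby('Cluster Labels').agg(
    venues=('name', 'count'),
    mean_lat=('lat', 'mean'),
    mean_lng=('lng', 'mean'),
)
print(summary)
```

#### Running the same `groupby` on the full `concat` dataframe would give one row per cluster, which makes very small or very large clusters easy to spot before mapping them.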

In [39]:
# create map
map_clusters = folium.Map(location=[latitude1, longitude1], zoom_start=13)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

In [40]:

# add markers to the map, colored by cluster label
for lat, lon, poi, cluster in zip(concat['lat'], concat['lng'], concat['name'], concat['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run from 0 to kclusters - 1, so index directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

## Results

#### As the two maps above show, even though 12 clusters were specified in the k-means algorithm, these 12 clusters do NOT correspond to the neighborhoods mapped at the start of this project. The downtown core consists of 4 major neighborhoods (North End, West End, Downtown, Beacon Hill), yet 4 clusters cut across neighborhood boundaries: Cluster 10 spans the North End and Downtown, Cluster 3 spans the West End and the North End, Cluster 0 spans Chinatown and Downtown, and Cluster 2 spans Downtown and the South Boston Waterfront (Seaport).
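
#### The mismatch between clusters and neighborhoods can also be checked in a table rather than by eye. The sketch below cross-tabulates cluster labels against neighborhoods and lists the clusters that span more than one neighborhood. It assumes a `Neighborhood` column recording which neighborhood search returned each venue, which the `concat` dataframe above does not contain, and the rows are made up for illustration only.

```python
import pandas as pd

# Made-up illustrative rows, NOT results from this analysis.
# ASSUMPTION: a 'Neighborhood' column exists alongside the cluster labels.
labelled = pd.DataFrame({
    'Cluster Labels': [10, 10, 10, 3, 3, 7, 7],
    'Neighborhood': ['North End', 'North End', 'Downtown',
                     'West End', 'North End',
                     'Beacon Hill', 'Beacon Hill'],
})

# Rows are clusters, columns are neighborhoods, cells are venue counts
table = pd.crosstab(labelled['Cluster Labels'], labelled['Neighborhood'])

# A cluster straddles a boundary when it has venues in more than one neighborhood
neighborhoods_per_cluster = (table > 0).sum(axis=1)
straddling = neighborhoods_per_cluster[neighborhoods_per_cluster > 1].index.tolist()

print(table)
print("Clusters spanning more than one neighborhood:", straddling)
```

#### If the neighborhood of each venue were carried through the earlier filtering steps, the same two lines could be run on the real data to confirm which clusters cross neighborhood lines.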

## Discussion

#### One major takeaway from this analysis is that the South Boston Waterfront (otherwise known as the Seaport District) has a dearth of pizza restaurants. Although this section of the city has been under extensive development recently, anyone in the area who wanted pizza would have to cross the Fort Point Channel to get downtown.
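
#### One way to put a number on this kind of gap is to count the venues within a fixed distance of a point in the area. The sketch below does this with the haversine (great-circle) formula. The Seaport coordinates are an approximate point chosen for illustration, the 1 km radius is arbitrary, and the five venues are only the sample rows printed earlier, so the count it prints says nothing about the full dataset.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def venues_within(venues, center, radius_km):
    """Return the (name, lat, lng) tuples lying within radius_km of center."""
    return [v for v in venues
            if haversine_km(v[1], v[2], center[0], center[1]) <= radius_km]


# First five rows of `concat` as (name, lat, lng)
venues = [
    ('California Pizza Kitchen', 42.34731, -71.080086),
    ("Crazy Dough's Pizza-Artisan Slice Bar", 42.346363, -71.082405),
    ('Pizza al Taglio', 42.350063, -71.081703),
    ('La Pizza & La Pasta', 42.347787, -71.082444),
    ('Pizza Depot', 42.347677, -71.076148),
]

# ASSUMPTION: approximate point in the Seaport District, for illustration only
seaport_center = (42.3519, -71.0445)

nearby = venues_within(venues, seaport_center, radius_km=1.0)
print(len(nearby), "of", len(venues), "sample venues lie within 1 km")
```

#### Passing every row of `concat` to `venues_within`, and repeating the count for a point in each neighborhood, would allow the Seaport to be compared with the rest of the city on the same footing.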

#### Another key takeaway is that neighborhood boundaries are not a particularly useful starting point for this analysis, since the clusters of pizza places cut across them. To get a fuller picture of the most promising development opportunities for pizza restaurants around the city, more research would be needed on zoning restrictions and on the general socioeconomic status of any development area in question. Like McDonald’s, a pizza chain might fit best along heavily trafficked corridors in sprawling suburban areas.

#### Thank you for reading and reviewing this assignment! 
