# A Boston Brewery - What Location is Most Ideal?

### for Capstone Project - The Battle of the Neighborhoods

### Applied Data Science Capstone by IBM/Coursera

#### by Carina DeBarcelos

#### Note: For the purposes of this project, Country Roads Brewery and DeBarcelos Contractors are fictional and are not affiliated with any other companies with the same name. 

## Table of contents
* [Introduction/Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction/Business Problem <a name="introduction"></a>

Boston, Massachusetts is one of the most populous cities in the United States of America, with nearly 700,000 residents across 26 official neighborhoods. Boston is also a central hub for higher education institutions, including Boston University, Emerson College, Northeastern University, Suffolk University, and UMass Boston, as well as multiple community colleges. These institutions are dispersed across the city, and the students, recent graduates, and other young professionals who live in Boston are spread across a wide range of neighborhoods. Living in a newly developed neighborhood such as the Seaport District in South Boston exposes young adults to a different mixture of venues than living in West Roxbury, one of the neighborhoods furthest from Downtown Boston.

Country Roads Brewery (CRB), based in Western Massachusetts, is looking to expand its operations with a location in Boston. CRB has emerged as one of the highest-rated breweries in New England for its eclectic mixture of beers, and its flagship location in South Deerfield has attracted customers from many states. It also has a strong reputation with young adults after holding promotions at restaurants near college towns like Amherst. The owners of CRB know the cities and towns of Western Massachusetts well, but are unfamiliar with Boston’s neighborhoods outside of the downtown area. As a result, they hired DeBarcelos Contractors to help them determine an ideal location for their second brewery. The owners prefer a neighborhood that lacks breweries but is close to a university area that attracts students, with residential housing and student centers. Beyond helping Country Roads Brewery, DeBarcelos Contractors hopes this project will attract future clients, such as other emerging or trending restaurants, who want to strategically choose locations that fit their customer demographics.

With this, we ask the following question: <b>When controlling for university proximity and proximity to other breweries, what neighborhood would be most ideal for Country Roads Brewery to open a location in Boston?</b>

## Data <a name="data"></a>


There are a few factors that will influence the answer to our business problem, such as:
* the number of existing breweries in each Boston neighborhood; and
* the number of university venues (such as classrooms, residence halls, and student centers) in each neighborhood and their proximity.

The official list of Boston's 26 neighborhoods will be obtained from the <a href="https://data.boston.gov/dataset/boston-neighborhoods">City of Boston's Analytics Team</a>, and each neighborhood's coordinates will be obtained with the **`geocoder` Python library using its ArcGIS provider** (the central coordinates of Boston itself come from geopy's **Nominatim** geocoder).

Further, the project will use the **Foursquare API** `venues/explore` endpoint to explore and visualize location and venue data within the neighborhoods of Boston, filtered by the following venue categories:
* the **College & University category**, which will help identify higher education locations and the various venue categories these institutions hold; and
* the **Brewery category**, to determine how many breweries are located in each neighborhood.

## Methodology <a name="methodology"></a>

Using the data sources identified above, I will scrape, extract, and clean the necessary data for this project, and will then group the neighborhoods via k-means clustering. Each code step below is followed by my analysis of its output.
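For reference, k-means (as implemented by scikit-learn's `KMeans`, which this notebook uses later) picks $k$ cluster centers $\mu_1, \dots, \mu_k$ and assigns each neighborhood's feature vector $x_i$ to its nearest center so as to minimize the within-cluster sum of squared distances, which scikit-learn reports as `inertia_`:

$$
\min_{C_1,\dots,C_k} \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2, \qquad \mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
$$

Here each $x_i$ is a neighborhood's row of venue-category counts, so neighborhoods with a similar mix *and* a similar number of venues tend to land in the same cluster.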

## Analysis <a name="analysis"></a>

Let's perform some preliminary data analysis to learn more about the neighborhoods in Boston. First, I will import the libraries required for the analysis:

In [1]:
```python
# Importing required libraries
import numpy as np

import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

import requests
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes

import folium

print('Libraries imported.')
```


Next, I will open my data file of Boston neighborhoods and load it into a pandas DataFrame:

In [2]:
```python
# Opening CSV file of Boston neighborhoods
import os, types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.

if os.environ.get('RUNTIME_ENV_LOCATION_TYPE') == 'external':
    endpoint_0013ae3e51984ad1b886bff8038b1d26 = 'https://s3.us.cloud-object-storage.appdomain.cloud'
else:
    endpoint_0013ae3e51984ad1b886bff8038b1d26 = 'https://s3.private.us.cloud-object-storage.appdomain.cloud'

client_0013ae3e51984ad1b886bff8038b1d26 = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='<your-IBM-Cloud-API-key>',  # credential redacted
    ibm_auth_endpoint="https://iam.cloud.ibm.com/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url=endpoint_0013ae3e51984ad1b886bff8038b1d26)

body = client_0013ae3e51984ad1b886bff8038b1d26.get_object(Bucket='battleoftheneighborhoods-donotdelete-pr-mgkmdfu9icfohr',Key='Boston_Neighborhoods.csv')['Body']
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

# Obtaining dataframe and viewing first few rows
df_data_1 = pd.read_csv(body)
df_data_1.head()
```


| | OBJECTID | Name | Acres | Neighborhood_ID | SqMiles | ShapeSTArea | ShapeSTLength |
|---|---|---|---|---|---|---|---|
| 0 | 27 | Roslindale | 1605.568237 | 15 | 2.51 | 69938270.0 | 53563.912597 |
| 1 | 28 | Jamaica Plain | 2519.245394 | 11 | 3.94 | 109737900.0 | 56349.937161 |
| 2 | 29 | Mission Hill | 350.853564 | 13 | 0.55 | 15283120.0 | 17918.724113 |
| 3 | 30 | Longwood | 188.611947 | 28 | 0.29 | 8215904.0 | 11908.757148 |
| 4 | 31 | Bay Village | 26.539839 | 33 | 0.04 | 1156071.0 | 4650.635493 |


This shows the official Boston neighborhood data, including the size of each neighborhood.

In [3]:
```python
# Viewing content of all neighborhoods in Boston
df_data_1
```

| | OBJECTID | Name | Acres | Neighborhood_ID | SqMiles | ShapeSTArea | ShapeSTLength |
|---|---|---|---|---|---|---|---|
| 0 | 27 | Roslindale | 1605.568237 | 15 | 2.51 | 69938270.0 | 53563.912597 |
| 1 | 28 | Jamaica Plain | 2519.245394 | 11 | 3.94 | 109737900.0 | 56349.937161 |
| 2 | 29 | Mission Hill | 350.853564 | 13 | 0.55 | 15283120.0 | 17918.724113 |
| 3 | 30 | Longwood | 188.611947 | 28 | 0.29 | 8215904.0 | 11908.757148 |
| 4 | 31 | Bay Village | 26.539839 | 33 | 0.04 | 1156071.0 | 4650.635493 |
| 5 | 32 | Leather District | 15.639908 | 27 | 0.02 | 681271.7 | 3237.140537 |
| 6 | 33 | Chinatown | 76.32441 | 26 | 0.12 | 3324678.0 | 9736.590413 |
| 7 | 34 | North End | 126.910439 | 14 | 0.2 | 5527506.0 | 16177.826815 |
| 8 | 35 | Roxbury | 2108.469072 | 16 | 3.29 | 91844550.0 | 49488.800485 |
| 9 | 36 | South End | 471.535356 | 32 | 0.74 | 20540000.0 | 17912.333569 |


Let's clean the data up! I will remove the unnecessary columns and store the result in a new dataframe called BostonNeighborhoods.

In [4]:
```python
# Cleaning data into new dataframe
BostonNeighborhoods = df_data_1.drop(columns=['OBJECTID', 'Neighborhood_ID', 'ShapeSTArea', 'ShapeSTLength'])
```

In [5]:
```python
# Rename the Name column to Neighborhood
BostonNeighborhoods = BostonNeighborhoods.rename(columns={"Name": "Neighborhood"})
```

In [6]:
```python
# Let's view the new dataframe
BostonNeighborhoods.head()
```

| | Neighborhood | Acres | SqMiles |
|---|---|---|---|
| 0 | Roslindale | 1605.568237 | 2.51 |
| 1 | Jamaica Plain | 2519.245394 | 3.94 |
| 2 | Mission Hill | 350.853564 | 0.55 |
| 3 | Longwood | 188.611947 | 0.29 |
| 4 | Bay Village | 26.539839 | 0.04 |


Now we have a dataframe that just has the neighborhood names, as well as their size in terms of acres and square miles. Let's see what neighborhoods are largest in the city:

In [7]:
```python
# Determining 5 largest neighborhoods in Boston
BostonNeighborhoods.sort_values(by='Acres', ascending=False).head(5)
```

| | Neighborhood | Acres | SqMiles |
|---|---|---|---|
| 21 | Dorchester | 4662.879457 | 7.29 |
| 18 | West Roxbury | 3516.421786 | 5.49 |
| 11 | East Boston | 3012.059593 | 4.71 |
| 19 | Hyde Park | 2927.221168 | 4.57 |
| 1 | Jamaica Plain | 2519.245394 | 3.94 |


Now let's look at the smallest neighborhoods in the city:

In [8]:
```python
# Determining 5 smallest neighborhoods in Boston
BostonNeighborhoods.sort_values(by='Acres', ascending=True).head(5)
```

| | Neighborhood | Acres | SqMiles |
|---|---|---|---|
| 5 | Leather District | 15.639908 | 0.02 |
| 4 | Bay Village | 26.539839 | 0.04 |
| 6 | Chinatown | 76.32441 | 0.12 |
| 7 | North End | 126.910439 | 0.2 |
| 3 | Longwood | 188.611947 | 0.29 |


It looks like Dorchester, West Roxbury, East Boston, Hyde Park, and Jamaica Plain have the largest land footprints in Boston, compared to the Leather District, Bay Village, Chinatown, the North End, and Longwood. This matters for proximity: in a small neighborhood, a single brewery or college campus is within easy reach of most residents, whereas one brewery or college in a neighborhood as large as Dorchester might not be close to many of its residents.
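To put rough numbers on this, the sketch below treats each neighborhood as a circle with the same area (a simplifying assumption, since real boundaries are irregular) and compares that radius with the 500 m search radius used later for the Foursquare queries. The helpers `equivalent_radius_m` and `search_coverage` are hypothetical, and the areas are the `SqMiles` values from the tables above.

```python
import math

METERS_PER_MILE = 1609.344
SEARCH_RADIUS_M = 500  # radius used for the Foursquare queries later on

def equivalent_radius_m(sq_miles):
    """Radius in meters of a circle with the same area as the neighborhood."""
    area_m2 = sq_miles * METERS_PER_MILE ** 2
    return math.sqrt(area_m2 / math.pi)

def search_coverage(sq_miles, radius_m=SEARCH_RADIUS_M):
    """Fraction of the neighborhood's area covered by one search circle (capped at 1)."""
    area_m2 = sq_miles * METERS_PER_MILE ** 2
    return min(1.0, math.pi * radius_m ** 2 / area_m2)

for name, sq_miles in [('Dorchester', 7.29), ('Leather District', 0.02)]:
    print('{}: ~{:.0f} m radius, {:.0%} covered by a 500 m search'.format(
        name, equivalent_radius_m(sq_miles), search_coverage(sq_miles)))
```

With these areas, Dorchester's equivalent radius is roughly 2.5 km, so a single 500 m search circle covers only about 4% of it, while the Leather District fits entirely inside one circle.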

Let's continue by installing the `geocoder` library, whose ArcGIS provider we will use to obtain coordinates for these neighborhoods.

In [9]:
```python
# Installing the geocoder library used to look up neighborhood coordinates
!pip install geocoder
```

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


In [10]:
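```python
# Importing geocoder library
import time
import geocoder

# Storing latitude and longitude data for each neighborhood
latitude = []
longitude = []
for neighborhood in BostonNeighborhoods['Neighborhood']:
    g = geocoder.arcgis('{}, Boston, Massachusetts'.format(neighborhood))
    # Retry a few times if ArcGIS returns no result, instead of looping forever
    attempts = 1
    while g.latlng is None and attempts < 5:
        time.sleep(1)
        g = geocoder.arcgis('{}, Boston, Massachusetts'.format(neighborhood))
        attempts += 1
    if g.latlng is None:
        raise ValueError('Could not geocode {}'.format(neighborhood))
    print(neighborhood, g.latlng)
    latitude.append(g.latlng[0])
    longitude.append(g.latlng[1])
```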

Roslindale [42.28182009628248, -71.13710364030405]
Jamaica Plain [42.30584890846422, -71.11909201668144]
Mission Hill [42.33571000000006, -71.10979999999995]
Longwood [42.339821816258834, -71.10879160520724]
Bay Village [42.34816503121898, -71.06846991510525]
Leather District [42.32522395859234, -71.06224830446084]
Chinatown [42.35251000000005, -71.06089999999995]
North End [42.36549000000008, -71.05296999999996]
Roxbury [42.330303515648225, -71.08946869163574]
South End [42.34256000000005, -71.07357999999994]
Back Bay [42.34999000000005, -71.08764999999994]
East Boston [42.35141817326235, -71.05671435784329]
Charlestown [42.3677501180056, -71.05905551335397]
West End [42.36394000000007, -71.06738999999999]
Beacon Hill [42.35842000000008, -71.06859999999995]
Downtown [42.35829000000007, -71.05662999999998]
Fenway [42.34552077464411, -71.09050686448579]
Brighton [42.35213365368456, -71.12492527560583]
West Roxbury [42.28220076055744, -71.14599982157858]
Hyde Park [42.27477303496225, -71

Now, let's store these coordinates in the BostonNeighborhoods dataframe and verify that they were added correctly.

In [11]:
```python
# Inserting latitude and longitude data into dataframe
BostonNeighborhoods.insert(3, "Latitude", latitude)
BostonNeighborhoods.insert(4, "Longitude", longitude)
```

In [12]:
```python
# Viewing first 5 rows of dataframe to see if actions worked
BostonNeighborhoods.head()
```

| | Neighborhood | Acres | SqMiles | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Roslindale | 1605.568237 | 2.51 | 42.28182 | -71.137104 |
| 1 | Jamaica Plain | 2519.245394 | 3.94 | 42.305849 | -71.119092 |
| 2 | Mission Hill | 350.853564 | 0.55 | 42.33571 | -71.1098 |
| 3 | Longwood | 188.611947 | 0.29 | 42.339822 | -71.108792 |
| 4 | Bay Village | 26.539839 | 0.04 | 42.348165 | -71.06847 |


In [13]:
```python
# Double-checking that coordinates were added for every neighborhood:
# count the unique neighborhood names in the dataframe
print('The dataframe has {} neighborhoods.'.format(
    BostonNeighborhoods['Neighborhood'].nunique()))
```

The dataframe has 26 neighborhoods.


I will now visualize the 26 Boston neighborhoods on a map to show how far Boston extends beyond its downtown center. First, we will get the central coordinates of Boston.

In [14]:
```python
# Identifying central coordinates of Boston
address = 'Boston, Massachusetts'

geolocator = Nominatim(user_agent="T_locator")
location = geolocator.geocode(address)
# Note: this reuses the names latitude/longitude; the per-neighborhood lists are already stored in the dataframe
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Boston are {}, {}.'.format(latitude, longitude))
```

The geographical coordinates of Boston are 42.3602534, -71.0582912.


In [15]:
```python
# Installing Folium to visualize neighborhoods and future clusters
!pip install folium
```



In [16]:
```python
# Visualizing the different neighborhoods in Boston
boston_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# Add a marker for each neighborhood to the map
for lat, lng, neighborhood in zip(BostonNeighborhoods['Latitude'], BostonNeighborhoods['Longitude'], BostonNeighborhoods['Neighborhood']):
    label = folium.Popup('{}'.format(neighborhood), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=7,
        popup=label,
        color='purple',
        fill=True,
        fill_color='#800080',
        fill_opacity=0.7).add_to(boston_map)

boston_map
```

Now, we will explore the college and university venues across Boston's neighborhoods. I will use the Foursquare API to look up as many as 100 venues within 500 meters of each neighborhood's coordinates, and store each venue's name, category, latitude, and longitude in a dataframe called `venues`.


In [17]:
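```python
import os

# Foursquare credentials are read from environment variables instead of being hard-coded
# (the environment variable names are placeholders)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'your-client-id')  # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'your-client-secret')  # your Foursquare Secret
VERSION = '20200605'  # Foursquare API version
LIMIT = 100  # A default Foursquare API limit value
```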

In [18]:
```python
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query Foursquare's venues/explore endpoint for College & University venues
    within `radius` meters of each neighborhood's coordinates."""
    colleges_list = []
    college_category_id = '4d4b7105d754a06372d81259'  # Foursquare College & University category

    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # Query parameters for the explore request
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'll': '{},{}'.format(lat, lng),
            'v': VERSION,
            'categoryId': college_category_id,
            'radius': radius,
            'limit': LIMIT,
        }
        # Making the GET request and failing loudly on HTTP errors
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']
        # Keeping the relevant information for each venue
        colleges_list.append([(name,
                               lat,
                               lng,
                               v['venue']['name'],
                               v['venue']['location']['lat'],
                               v['venue']['location']['lng'],
                               v['venue']['categories'][0]['name']) for v in results])

    # Passing the column names here also works when no venues are returned
    nearby_colleges = pd.DataFrame([item for college_list in colleges_list for item in college_list],
                                   columns=['Area',
                                            'Neighborhood Latitude',
                                            'Neighborhood Longitude',
                                            'Venue',
                                            'Venue Latitude',
                                            'Venue Longitude',
                                            'Venue Category'])
    return nearby_colleges
```

In [19]:
```python
LIMIT = 100
venues = getNearbyVenues(names=BostonNeighborhoods['Neighborhood'],
                         latitudes=BostonNeighborhoods['Latitude'],
                         longitudes=BostonNeighborhoods['Longitude'])
```

Roslindale
Jamaica Plain
Mission Hill
Longwood
Bay Village
Leather District
Chinatown
North End
Roxbury
South End
Back Bay
East Boston
Charlestown
West End
Beacon Hill
Downtown
Fenway
Brighton
West Roxbury
Hyde Park
Mattapan
Dorchester
South Boston Waterfront
South Boston
Allston
Harbor Islands


In [20]:
```python
# Printing the shape and first 5 rows of the venue dataframe
print(venues.shape)
venues.head()
```

(561, 7)


| | Area | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Roslindale | 42.28182 | -71.137104 | Washington Beech Computer Learning Center | 42.277622 | -71.136947 | College Classroom |
| 1 | Jamaica Plain | 42.305849 | -71.119092 | Mission Hill School | 42.307343 | -71.114391 | General College & University |
| 2 | Mission Hill | 42.33571 | -71.1098 | Shambala Center | 42.334004 | -71.112822 | Student Center |
| 3 | Mission Hill | 42.33571 | -71.1098 | Manville School | 42.331322 | -71.109969 | General College & University |
| 4 | Mission Hill | 42.33571 | -71.1098 | Bornstein Amphitheater | 42.336066 | -71.105947 | College Auditorium |


Let's count how many higher education venues are in each neighborhood.

In [21]:
```python
venues.groupby('Area').count()
```

| Area | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Allston | 3 | 3 | 3 | 3 | 3 | 3 |
| Back Bay | 73 | 73 | 73 | 73 | 73 | 73 |
| Bay Village | 39 | 39 | 39 | 39 | 39 | 39 |
| Beacon Hill | 22 | 22 | 22 | 22 | 22 | 22 |
| Brighton | 35 | 35 | 35 | 35 | 35 | 35 |
| Charlestown | 6 | 6 | 6 | 6 | 6 | 6 |
| Chinatown | 100 | 100 | 100 | 100 | 100 | 100 |
| Dorchester | 10 | 10 | 10 | 10 | 10 | 10 |
| Downtown | 51 | 51 | 51 | 51 | 51 | 51 |
| East Boston | 28 | 28 | 28 | 28 | 28 | 28 |


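It appears that Chinatown, Back Bay, Fenway, and Longwood have the most higher education venues. Chinatown's count of 100 equals the request `LIMIT`, so its true number of venues may be higher than reported. On the other hand, Jamaica Plain, Mattapan, and Roslindale each have just 1 higher education venue.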
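Because the explore request returns at most `LIMIT` venues per neighborhood, a count that reaches the limit is a lower bound rather than an exact figure. A minimal sketch for flagging such neighborhoods, assuming a Series of per-neighborhood counts like the `Venue` column above (the helper name `capped_neighborhoods` is hypothetical):

```python
import pandas as pd

def capped_neighborhoods(venue_counts, limit=100):
    """Return the neighborhoods whose venue count reached the API limit.

    venue_counts: Series indexed by neighborhood, holding the number of venues returned.
    """
    return sorted(venue_counts[venue_counts >= limit].index)

# Counts copied from the grouped table above (a subset of neighborhoods)
counts = pd.Series({'Allston': 3, 'Back Bay': 73, 'Chinatown': 100, 'Downtown': 51})
print(capped_neighborhoods(counts))
```

In the notebook itself this could be called as `capped_neighborhoods(venues.groupby('Area').size(), LIMIT)`; any neighborhood it returns would need a larger limit or several smaller searches to be counted fully.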

Let's see how many venue categories the Foursquare API returned under the College & University category.

In [22]:
```python
print('There are {} unique categories.'.format(len(venues['Venue Category'].unique())))
```

There are 52 unique categories.


In [23]:
```python
# Identifying the different venue categories returned for the College & University category
venues['Venue Category'].unique()
```

array(['College Classroom', 'General College & University',
       'Student Center', 'College Auditorium',
       'College Administrative Building', 'College Library',
       'College Science Building', 'Medical School', 'College Stadium',
       'Medical Lab', 'College Rec Center', 'Athletics & Sports',
       'University', 'College Academic Building', 'College Arts Building',
       'College Residence Hall', 'College Lab', 'College Quad',
       'College Cafeteria', 'College Gym', 'College Bookstore',
       'College & University', 'Community College', 'Language School',
       'Trade School', 'School', 'Office', 'Library', 'Law School',
       'TV Station', 'Dance Studio', 'College Theater',
       'Performing Arts Venue', 'College Communications Building', 'Gym',
       'Fraternity House', 'General Travel', 'Track', 'Music Venue',
       'College Technology Building', 'Social Club', 'Bookstore',
       'Speakeasy', 'Theater', 'Coworking Space',
       'College Baseball Diamond', 'C

Now I will analyze the venue categories further using one-hot encoding. I will create a separate dataframe with one row per venue and one column per venue category, where a 1 marks that venue's category, and then move the neighborhood column to the front. Summing these rows by neighborhood then gives how many venues of each category are in each neighborhood.
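The idea can be illustrated on a tiny hand-made example (the venue rows are hypothetical, not Foursquare results): `pd.get_dummies` turns the category column into indicator columns, `groupby(...).sum()` turns them into per-neighborhood counts, which is what this notebook uses, and `groupby(...).mean()` would instead give each category's share of a neighborhood's venues.

```python
import pandas as pd

# Hypothetical venue rows, one per venue
toy = pd.DataFrame({
    'Area': ['Allston', 'Allston', 'Back Bay', 'Back Bay', 'Back Bay'],
    'Venue Category': ['Medical School', 'College Lab', 'Student Center',
                       'Student Center', 'College Library'],
})

# One indicator column per category (cast to int so sums and means are numeric)
onehot = pd.get_dummies(toy[['Venue Category']], prefix="", prefix_sep="").astype(int)
onehot.insert(0, 'Neighborhood', toy['Area'])

counts = onehot.groupby('Neighborhood').sum()   # number of venues per category
shares = onehot.groupby('Neighborhood').mean()  # proportion of each category
print(counts)
print(shares)
```

Counts keep the overall volume of venues in the features, whereas shares describe only the mix; which one suits the clustering depends on whether a neighborhood's sheer number of college venues should matter.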


In [24]:
```python
# one hot encoding
collegevenues_onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
collegevenues_onehot['Neighborhood'] = venues['Area']

# move neighborhood column to the first column
fixed_columns = [collegevenues_onehot.columns[-1]] + list(collegevenues_onehot.columns[:-1])
collegevenues_onehot = collegevenues_onehot[fixed_columns]

collegevenues_onehot.head()
```

Unnamed: 0,Neighborhood,Athletics & Sports,Bookstore,College & University,College Academic Building,College Administrative Building,College Arts Building,College Auditorium,College Baseball Diamond,College Bookstore,College Cafeteria,College Classroom,College Communications Building,College Engineering Building,College Gym,College Hockey Rink,College Lab,College Library,College Quad,College Rec Center,College Residence Hall,College Science Building,College Stadium,College Technology Building,College Theater,College Track,Community College,Coworking Space,Dance Studio,Entertainment Service,Fraternity House,General College & University,General Travel,Gym,Health & Beauty Service,Language School,Law School,Library,Medical Lab,Medical School,Music Venue,Office,Performing Arts Venue,Pool,School,Social Club,Speakeasy,Student Center,TV Station,Theater,Track,Trade School,University
0,Roslindale,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Jamaica Plain,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Mission Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
3,Mission Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Mission Hill,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [25]:
```python
collegevenues_onehot.shape
```

(561, 53)

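It looks like there are a total of 561 college and university venue entries across the Boston neighborhoods. Because the 500 m search circles of nearby neighborhoods (for example, Chinatown and Bay Village) can overlap, some venues may be counted in more than one neighborhood. Let's consolidate the dataframe by grouping the one-hot rows so that each neighborhood gets a single row of category counts.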
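One way to check which search circles overlap is to compute the great-circle (haversine) distance between neighborhood centers and flag pairs that are closer than twice the search radius. A minimal sketch, assuming a mean Earth radius of 6,371 km; the coordinates are the geocoded values printed earlier, and `haversine_m` and `overlapping_pairs` are hypothetical helpers:

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (assumption)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def overlapping_pairs(centers, radius_m=500):
    """Pairs of neighborhoods whose search circles of radius_m overlap."""
    pairs = []
    for (name1, (lat1, lon1)), (name2, (lat2, lon2)) in combinations(centers.items(), 2):
        if haversine_m(lat1, lon1, lat2, lon2) < 2 * radius_m:
            pairs.append((name1, name2))
    return pairs

centers = {
    'Chinatown': (42.35251000000005, -71.06089999999995),
    'Bay Village': (42.34816503121898, -71.06846991510525),
    'Roslindale': (42.28182009628248, -71.13710364030405),
}
print(overlapping_pairs(centers))
```

In the notebook, `centers` could be built with `dict(zip(BostonNeighborhoods['Neighborhood'], zip(BostonNeighborhoods['Latitude'], BostonNeighborhoods['Longitude'])))`. Alternatively, repeated venues can be spotted directly with `venues.duplicated(subset=['Venue', 'Venue Latitude', 'Venue Longitude'], keep=False)`.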

In [26]:
```python
collegevenues_grouped = collegevenues_onehot.groupby('Neighborhood').sum().reset_index()
collegevenues_grouped
```

Unnamed: 0,Neighborhood,Athletics & Sports,Bookstore,College & University,College Academic Building,College Administrative Building,College Arts Building,College Auditorium,College Baseball Diamond,College Bookstore,College Cafeteria,College Classroom,College Communications Building,College Engineering Building,College Gym,College Hockey Rink,College Lab,College Library,College Quad,College Rec Center,College Residence Hall,College Science Building,College Stadium,College Technology Building,College Theater,College Track,Community College,Coworking Space,Dance Studio,Entertainment Service,Fraternity House,General College & University,General Travel,Gym,Health & Beauty Service,Language School,Law School,Library,Medical Lab,Medical School,Music Venue,Office,Performing Arts Venue,Pool,School,Social Club,Speakeasy,Student Center,TV Station,Theater,Track,Trade School,University
0,Allston,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Back Bay,0,0,1,9,9,12,0,0,1,2,1,0,0,1,0,1,2,1,0,3,0,0,2,1,0,0,0,0,0,9,5,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,7,0,0,0,0,3
2,Bay Village,0,0,3,2,3,2,1,0,1,1,1,0,0,0,0,2,3,0,0,1,0,0,0,2,0,1,0,1,0,0,6,0,0,0,1,1,1,0,0,0,2,0,0,1,0,0,0,1,0,0,1,1
3,Beacon Hill,0,1,0,5,1,0,0,0,0,1,1,1,0,0,0,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,4,0,1,0,0,1
4,Brighton,0,0,0,0,7,1,0,1,0,1,0,0,0,2,1,0,1,0,0,5,0,3,0,0,1,0,0,0,0,2,6,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,2
5,Charlestown,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,2,1
6,Chinatown,0,0,3,9,9,2,3,0,1,0,4,1,0,1,0,10,4,1,0,7,2,0,0,1,0,0,0,0,0,1,16,0,1,0,0,4,0,0,4,0,3,1,0,1,0,0,6,1,0,0,3,1
7,Dorchester,0,0,0,3,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0
8,Downtown,0,0,0,5,11,0,0,0,1,1,1,0,0,0,0,1,3,0,0,2,1,0,0,0,0,0,1,0,0,0,7,0,0,0,0,3,0,0,1,0,0,0,0,0,0,0,5,0,0,0,3,5
9,East Boston,0,0,1,2,4,0,1,0,1,0,1,0,0,2,0,1,1,0,0,0,2,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,3,0,1,0,0,0,0,0,2,0,0,0,2,0


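Next, I will print each Boston neighborhood with its 10 most common college and university venue categories, sorted in descending order. Note that the `freq` column holds venue counts, not proportions, because the grouped dataframe was built with `sum()`.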

In [27]:
```python
# Printing each neighborhood with its 10 most common venue categories
num_top_venues = 10

for hood in collegevenues_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = collegevenues_grouped[collegevenues_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
```

----Allston----
                          venue  freq
0      College Science Building   1.0
1  College Engineering Building   1.0
2                Medical School   1.0
3            Athletics & Sports   0.0
4                   Music Venue   0.0
5              Fraternity House   0.0
6  General College & University   0.0
7                General Travel   0.0
8                           Gym   0.0
9       Health & Beauty Service   0.0


----Back Bay----
                             venue  freq
0            College Arts Building  12.0
1        College Academic Building   9.0
2  College Administrative Building   9.0
3                 Fraternity House   9.0
4                   Student Center   7.0
5     General College & University   5.0
6                       University   3.0
7           College Residence Hall   3.0
8                  College Library   2.0
9                College Cafeteria   2.0


----Bay Village----
                             venue  freq
0     General College & Universit

At first glance, a few neighborhoods have high counts of academic buildings, residence halls, and student centers, including:
* **Back Bay,**  
* **Chinatown,** 
* **Downtown,** 
* **Fenway,** and
* **Longwood.**

**Brighton** and the **North End** have a notable number of residence halls.

**Beacon Hill, Charlestown, East Boston, Leather District, Mission Hill, and South Boston** each have some student centers.

Before making any further assumptions, there is a good chance that Country Roads Brewery will end up in one of the 5 neighborhoods listed above, but the other neighborhoods may still be in the running. Let's continue our analysis.

In [28]:
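```python
# Return the names of the num_top_venues categories with the highest counts,
# in descending order (the first entry of the row is the neighborhood name)
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]
```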
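One limitation of this function is that neighborhoods with fewer than `num_top_venues` distinct categories get their remaining slots filled with zero-count categories in arbitrary order. A possible variant, sketched here as a hypothetical alternative rather than the function this notebook uses, keeps only categories that actually occur and pads the rest with empty strings:

```python
import pandas as pd

def return_most_common_nonzero_venues(row, num_top_venues):
    """Like return_most_common_venues, but ignores categories with a count of 0."""
    counts = row.iloc[1:].astype(float)  # skip the neighborhood name
    counts = counts[counts > 0].sort_values(ascending=False)
    top = list(counts.index[:num_top_venues])
    # pad so that every neighborhood row has the same number of entries
    return top + [''] * (num_top_venues - len(top))

# Hypothetical row in the same layout as collegevenues_grouped
row = pd.Series({'Neighborhood': 'Allston', 'Medical School': 1,
                 'College Lab': 0, 'College Science Building': 1})
print(return_most_common_nonzero_venues(row, 3))
```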

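Now I will create a new dataframe with each neighborhood's top 10 venue categories, which will describe the neighborhoods once they are clustered. For neighborhoods with fewer than 10 categories present (Allston, for example, has only 3 venues), the lower ranks are filled with categories that have a count of 0, so those entries carry no information.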

In [29]:
```python
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
collegevenues_sorted = pd.DataFrame(columns=columns)
collegevenues_sorted['Neighborhood'] = collegevenues_grouped['Neighborhood']

# fill in the top venue categories for each neighborhood
for ind in np.arange(collegevenues_sorted.shape[0]):
    collegevenues_sorted.iloc[ind, 1:] = return_most_common_venues(collegevenues_grouped.iloc[ind, :], num_top_venues)

collegevenues_sorted.head()
```

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Allston | Medical School | College Science Building | College Engineering Building | College Track | College Technology Building | College Stadium | College Residence Hall | College Rec Center | College Quad | College Library |
| 1 | Back Bay | College Arts Building | College Academic Building | College Administrative Building | Fraternity House | Student Center | General College & University | University | College Residence Hall | College Cafeteria | College Library |
| 2 | Bay Village | General College & University | College Library | College & University | College Administrative Building | College Academic Building | College Arts Building | College Theater | College Lab | Office | University |
| 3 | Beacon Hill | College Academic Building | Student Center | College Lab | University | College Cafeteria | General College & University | College Library | College Communications Building | College Classroom | Theater |
| 4 | Brighton | College Administrative Building | General College & University | College Residence Hall | College Stadium | University | Fraternity House | College Gym | College Cafeteria | College Library | College Hockey Rink |


I will now group the neighborhoods into 5 clusters with k-means, and then build a new dataframe containing each neighborhood's cluster label and its top 10 venue categories.
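The choice of 5 clusters is not derived from the data here. One common heuristic for checking it is the elbow method: fit k-means for a range of k and look for the point where the within-cluster sum of squares (`inertia_`) stops dropping sharply. A sketch, assuming scikit-learn's `KMeans`; the helper `inertia_by_k` is hypothetical, and the toy matrix stands in for `collegevenues_grouped_clustering`:

```python
import numpy as np
from sklearn.cluster import KMeans

def inertia_by_k(features, k_values, random_state=0):
    """Fit k-means for each k and return {k: within-cluster sum of squares}."""
    return {
        k: KMeans(n_clusters=k, random_state=random_state, n_init=10).fit(features).inertia_
        for k in k_values
    }

# Toy stand-in for the neighborhood-by-category count matrix (hypothetical values)
toy_features = np.array([[0, 1, 0], [0, 2, 0], [9, 0, 7], [10, 1, 8], [3, 3, 3], [4, 3, 2]])
for k, inertia in inertia_by_k(toy_features, range(1, 5)).items():
    print(k, round(inertia, 2))
```

In the notebook this would be `inertia_by_k(collegevenues_grouped_clustering, range(1, 11))`, possibly plotted with matplotlib to locate the bend in the curve.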

In [30]:
```python
# set number of clusters
kclusters = 5

# keep only the numeric category counts for clustering
collegevenues_grouped_clustering = collegevenues_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init=10 matches the older scikit-learn default)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(collegevenues_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]
```

array([0, 1, 2, 0, 2, 0, 3, 0, 2, 2], dtype=int32)

In [31]:
```python
# add clustering labels
collegevenues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

colleges_merged = BostonNeighborhoods

# join each neighborhood's cluster label and top venues onto its size and coordinates
colleges_merged = colleges_merged.join(collegevenues_sorted.set_index('Neighborhood'), on='Neighborhood')

colleges_merged.head()  # check the last columns!
```

| | Neighborhood | Acres | SqMiles | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Roslindale | 1605.568237 | 2.51 | 42.28182 | -71.137104 | 0.0 | College Classroom | University | College Gym | College Theater | College Technology Building | College Stadium | College Science Building | College Residence Hall | College Rec Center | College Quad |
| 1 | Jamaica Plain | 2519.245394 | 3.94 | 42.305849 | -71.119092 | 0.0 | General College & University | University | College Track | College Technology Building | College Stadium | College Science Building | College Residence Hall | College Rec Center | College Quad | College Library |
| 2 | Mission Hill | 350.853564 | 0.55 | 42.33571 | -71.1098 | 0.0 | Medical School | Athletics & Sports | College Auditorium | College Stadium | College Science Building | College Rec Center | General College & University | College Library | Medical Lab | Student Center |
| 3 | Longwood | 188.611947 | 0.29 | 42.339822 | -71.108792 | 4.0 | College Residence Hall | College Cafeteria | Medical School | College Gym | College Academic Building | General College & University | College Classroom | College Quad | College Administrative Building | College Arts Building |
| 4 | Bay Village | 26.539839 | 0.04 | 42.348165 | -71.06847 | 2.0 | General College & University | College Library | College & University | College Administrative Building | College Academic Building | College Arts Building | College Theater | College Lab | Office | University |


Let's clean the dataframe to remove any null values to prepare for visualization of the neighborhood clusters.

In [32]:
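# drop neighborhoods with no cluster label (they are missing from collegevenues_sorted, so the join left NaN)
colleges_merged = colleges_merged.dropna(subset=['Cluster Labels']).copy()

# the NaNs turned the labels into floats; convert them back to integers
colleges_merged['Cluster Labels'] = colleges_merged['Cluster Labels'].astype(int)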
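Why do null cluster labels appear at all? DataFrame.join defaults to a left join, so every neighborhood in the left table is kept, and any neighborhood absent from the right table gets NaN. A small sketch with hypothetical tables (hoods stands in for BostonNeighborhoods, labels for the sorted venue table):

```python
import pandas as pd

# hypothetical neighborhood table with coordinates
hoods = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Latitude': [42.35, 42.34, 42.28],
    'Longitude': [-71.06, -71.09, -71.12],
})

# hypothetical cluster labels: neighborhood C returned no venues, so it has no row here
labels = pd.DataFrame({'Neighborhood': ['A', 'B'], 'Cluster Labels': [0, 1]})

# left join: C is kept, with NaN in 'Cluster Labels'
merged = hoods.join(labels.set_index('Neighborhood'), on='Neighborhood')
print(merged['Cluster Labels'].dtype)  # float64, because of the NaN

# drop unlabeled neighborhoods, then restore integer labels
merged = merged.dropna(subset=['Cluster Labels']).copy()
merged['Cluster Labels'] = merged['Cluster Labels'].astype(int)
print(merged)
```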

In [33]:
# Visualize the clusters
# create map centered on Boston (latitude and longitude were defined earlier in the notebook)
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per neighborhood, colored by its cluster label
for lat, lon, poi, cluster in zip(colleges_merged['Latitude'], colleges_merged['Longitude'], colleges_merged['Neighborhood'], colleges_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

The clusters vary quite a bit. Let's examine each higher education cluster carefully.

In [34]:
#Examining College Cluster 1
colleges_merged.loc[colleges_merged['Cluster Labels'] == 0, colleges_merged.columns[[0] + list(range(5, colleges_merged.shape[1]))]]

|  | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Roslindale | 0 | College Classroom | University | College Gym | College Theater | College Technology Building | College Stadium | College Science Building | College Residence Hall | College Rec Center | College Quad |
| 1 | Jamaica Plain | 0 | General College & University | University | College Track | College Technology Building | College Stadium | College Science Building | College Residence Hall | College Rec Center | College Quad | College Library |
| 2 | Mission Hill | 0 | Medical School | Athletics & Sports | College Auditorium | College Stadium | College Science Building | College Rec Center | General College & University | College Library | Medical Lab | Student Center |
| 5 | Leather District | 0 | Trade School | College Lab | Student Center | College Engineering Building | College Theater | College Technology Building | College Stadium | College Science Building | College Residence Hall | College Rec Center |
| 7 | North End | 0 | Trade School | College Residence Hall | General Travel | College Classroom | Law School | College Gym | College Technology Building | College Stadium | College Science Building | College Rec Center |
| 8 | Roxbury | 0 | Track | General College & University | College Classroom | University | College Gym | College Technology Building | College Stadium | College Science Building | College Residence Hall | College Rec Center |
| 9 | South End | 0 | Dance Studio | College Arts Building | General College & University | University | College Gym | College Technology Building | College Stadium | College Science Building | College Residence Hall | College Rec Center |
| 12 | Charlestown | 0 | Trade School | Student Center | College Classroom | University | College Academic Building | College Hockey Rink | College Theater | College Technology Building | College Stadium | College Science Building |
| 13 | West End | 0 | Medical School | College Lab | Bookstore | College Academic Building | College Bookstore | College Library | College Gym | College Technology Building | College Stadium | College Science Building |
| 14 | Beacon Hill | 0 | College Academic Building | Student Center | College Lab | University | College Cafeteria | General College & University | College Library | College Communications Building | College Classroom | Theater |


### The first cluster shows more variety in the types of college and university venues, ranging from trade schools and medical schools to dance studios and tracks. However, residence halls rarely rank near the top, so there appear to be fewer residence halls for students.

In [35]:
#Examining College Cluster 2
colleges_merged.loc[colleges_merged['Cluster Labels'] == 1, colleges_merged.columns[[0] + list(range(5, colleges_merged.shape[1]))]]

|  | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | Back Bay | 1 | College Arts Building | College Academic Building | College Administrative Building | Fraternity House | Student Center | General College & University | University | College Residence Hall | College Cafeteria | College Library |
| 16 | Fenway | 1 | College Academic Building | College Arts Building | College Residence Hall | Student Center | General College & University | College Administrative Building | University | College Cafeteria | College Quad | College Library |


### There is a stronger common thread between Back Bay and Fenway: their most common venues are college arts and academic buildings. Both also have student centers, residence halls, and quads, areas where students may gather.

In [36]:
#Examining College Cluster 3
colleges_merged.loc[colleges_merged['Cluster Labels'] == 2, colleges_merged.columns[[0] + list(range(5, colleges_merged.shape[1]))]]

|  | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | Bay Village | 2 | General College & University | College Library | College & University | College Administrative Building | College Academic Building | College Arts Building | College Theater | College Lab | Office | University |
| 11 | East Boston | 2 | College Administrative Building | General College & University | Medical School | College Gym | Trade School | College Academic Building | Student Center | College Science Building | Office | College Library |
| 15 | Downtown | 2 | College Administrative Building | General College & University | University | College Academic Building | Student Center | Law School | College Library | Trade School | College Residence Hall | Coworking Space |
| 17 | Brighton | 2 | College Administrative Building | General College & University | College Residence Hall | College Stadium | University | Fraternity House | College Gym | College Cafeteria | College Library | College Hockey Rink |
| 23 | South Boston | 2 | General College & University | Trade School | College Academic Building | College Administrative Building | College Gym | Office | Student Center | College Classroom | College Lab | College Library |


### This cluster tends to combine general higher education buildings with administrative facilities. This group of neighborhoods does not have as many places for students to congregate, such as residence halls or student centers.

In [37]:
#Examining College Cluster 4
colleges_merged.loc[colleges_merged['Cluster Labels'] == 3, colleges_merged.columns[[0] + list(range(5, colleges_merged.shape[1]))]]

|  | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | Chinatown | 3 | General College & University | College Lab | College Academic Building | College Administrative Building | College Residence Hall | Student Center | Medical School | College Library | College Classroom | Law School |


In [38]:
#Examining College Cluster 5
colleges_merged.loc[colleges_merged['Cluster Labels'] == 4, colleges_merged.columns[[0] + list(range(5, colleges_merged.shape[1]))]]

|  | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Longwood | 4 | College Residence Hall | College Cafeteria | Medical School | College Gym | College Academic Building | General College & University | College Classroom | College Quad | College Administrative Building | College Arts Building |


### The remaining two clusters each contain a single neighborhood: Chinatown and Longwood. These are interesting because they have places for students to congregate, but, given their medical school venues, they are more likely to house older students attending medical school.

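Another thing to notice is that some neighborhoods, such as **Hyde Park**, are missing from the clusters. Since k-means assigns a label to every neighborhood it is given, they were not left out for lacking a common thread with the other clusters. Instead, the Foursquare search returned no higher education venues for them, which left their cluster labels empty after the join. Without nearby higher education venues, these neighborhoods are unlikely to suit Country Roads Brewery's target customers.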

Now let's look at the presence of breweries in these neighborhoods.

Again, I will use the Foursquare API to look up as many as 100 breweries within 500 meters of each neighborhood's coordinates. I will store the venues, along with their latitude and longitude, in a dataframe named breweries.
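The request URL in the next cell is assembled with str.format. As a sketch, the same query can be built with urllib.parse.urlencode, which also escapes the values. The parameter names are copied from the notebook's URL; build_explore_url is a hypothetical helper, the credentials and version string below are placeholders (assumptions), and no request is sent:

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Foursquare v2 "explore" endpoint used in the notebook
EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng,
                      category_id, radius=500, limit=100):
    """Return the explore URL for one neighborhood (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': f'{lat},{lng}',
        'v': version,
        'categoryId': category_id,
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_URL + '?' + urlencode(params)

# placeholder credentials and version; the category ID is the one used in the notebook
url = build_explore_url('MY_ID', 'MY_SECRET', '20180605',
                        42.348165, -71.06847, '50327c8591d4c4b30a586d5d')

# decode the query string again to check what would be sent
query = parse_qs(urlparse(url).query)
print(query['ll'], query['radius'])
```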

In [39]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return brewery-category venues near each neighborhood as a dataframe.

    Uses CLIENT_ID, CLIENT_SECRET, VERSION, and LIMIT defined elsewhere in the notebook.
    """
    breweries_list = []
    # Foursquare category ID used for breweries
    brewery_category_id = '50327c8591d4c4b30a586d5d'

    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # explore endpoint, filtered to the brewery category within the given radius
        brewery_url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            lat,
            lng,
            VERSION,
            brewery_category_id,
            radius,
            LIMIT)

        # recommended venues for this neighborhood
        brewresults = requests.get(brewery_url).json()['response']['groups'][0]['items']

        # keep the neighborhood info plus each venue's name, coordinates, and primary category
        breweries_list.append([(name,
                                lat,
                                lng,
                                v['venue']['name'],
                                v['venue']['location']['lat'],
                                v['venue']['location']['lng'],
                                v['venue']['categories'][0]['name']) for v in brewresults])

    # flatten the per-neighborhood lists into one dataframe
    nearby_breweries = pd.DataFrame([item for brewery_list in breweries_list for item in brewery_list])
    nearby_breweries.columns = ['Area',
                                'Neighborhood Latitude',
                                'Neighborhood Longitude',
                                'Venue',
                                'Venue Latitude',
                                'Venue Longitude',
                                'Venue Category']

    return nearby_breweries

In [40]:
LIMIT = 100
breweries = getNearbyVenues(names=BostonNeighborhoods['Neighborhood'],
                                   latitudes=BostonNeighborhoods['Latitude'],
                                   longitudes=BostonNeighborhoods['Longitude'],
                                  )

Roslindale
Jamaica Plain
Mission Hill
Longwood
Bay Village
Leather District
Chinatown
North End
Roxbury
South End
Back Bay
East Boston
Charlestown
West End
Beacon Hill
Downtown
Fenway
Brighton
West Roxbury
Hyde Park
Mattapan
Dorchester
South Boston Waterfront
South Boston
Allston
Harbor Islands


In [41]:
#printing first 5 rows of dataframe with venue information
print(breweries.shape)
breweries.head()

(31, 7)


|  | Area | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Bay Village | 42.348165 | -71.06847 | Rock Bottom Restaurant & Brewery | 42.351329 | -71.065327 | Brewery |
| 1 | Bay Village | 42.348165 | -71.06847 | Picco | 42.34478 | -71.070467 | Pizza Place |
| 2 | Bay Village | 42.348165 | -71.06847 | Parish Cafe & Bar | 42.351746 | -71.071525 | Sandwich Place |
| 3 | Leather District | 42.325224 | -71.062248 | Dorchester Brewing Company | 42.322017 | -71.062638 | Brewery |
| 4 | Leather District | 42.325224 | -71.062248 | M&M BBQ | 42.321992 | -71.062505 | Brewery |


Let's count how many brewery-like venues there are in each neighborhood. 

In [42]:
breweries.groupby('Area').count()

| Area | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Allston | 1 | 1 | 1 | 1 | 1 | 1 |
| Back Bay | 1 | 1 | 1 | 1 | 1 | 1 |
| Bay Village | 3 | 3 | 3 | 3 | 3 | 3 |
| Beacon Hill | 2 | 2 | 2 | 2 | 2 | 2 |
| Charlestown | 3 | 3 | 3 | 3 | 3 | 3 |
| Chinatown | 4 | 4 | 4 | 4 | 4 | 4 |
| Dorchester | 3 | 3 | 3 | 3 | 3 | 3 |
| Downtown | 3 | 3 | 3 | 3 | 3 | 3 |
| Fenway | 4 | 4 | 4 | 4 | 4 | 4 |
| Harbor Islands | 2 | 2 | 2 | 2 | 2 | 2 |


Interestingly, there are fewer breweries across Boston than expected. The neighborhoods with the most brewery-like venues include:
* **Fenway**,
* **Chinatown**,
* **Bay Village**, 
* **Charlestown**,
* **Dorchester**, and
* **Downtown**.

On the other hand, **Allston, Back Bay,** and the **South End** have just one brewery-like venue each.

Let's see which categories appear in the breweries dataframe.

In [43]:
breweries['Venue Category'].unique()

array(['Brewery', 'Pizza Place', 'Sandwich Place', 'Burger Joint', 'Bar',
       'American Restaurant', 'Pub', 'Mexican Restaurant', 'Gastropub'],
      dtype=object)

### As we can see here, Foursquare's brewery search also returns venues whose primary category is something else, such as pubs, gastropubs, and bars, as well as other restaurant types that may also be listed as breweries.

As I did for the higher education venues, I will analyze these venues further through one-hot encoding. I will create a separate dataframe for this and move the neighborhood column to the front so the dataframe is easier to read. Each row marks the category of one venue; grouping the rows by neighborhood will then show how many venues of each category each neighborhood has.
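The encode-then-sum steps can be sketched on a few made-up rows (the areas and categories below are hypothetical). Passing dtype=int keeps the dummies as 0/1 integers, since pandas 2.x returns booleans by default, and pd.crosstab is shown only as an equivalent one-step way to get the same per-neighborhood counts:

```python
import pandas as pd

# hypothetical venue rows in the same shape as the breweries dataframe
rows = pd.DataFrame({
    'Area': ['X', 'X', 'Y', 'Y', 'Y'],
    'Venue Category': ['Brewery', 'Pub', 'Brewery', 'Brewery', 'Bar'],
})

# one-hot encode the category, then put the neighborhood in front
onehot = pd.get_dummies(rows[['Venue Category']], prefix='', prefix_sep='', dtype=int)
onehot.insert(0, 'Neighborhood', rows['Area'])

# summing the dummies per neighborhood gives venue counts per category
grouped = onehot.groupby('Neighborhood').sum()

# crosstab produces the same count table directly
counts = pd.crosstab(rows['Area'], rows['Venue Category'])
print(grouped)
```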

In [44]:
# one hot encoding (dtype=int keeps 0/1 values; pandas 2.x returns booleans by default)
brewvenues_onehot = pd.get_dummies(breweries[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add neighborhood column back to dataframe
brewvenues_onehot['Neighborhood'] = breweries['Area'] 

# move neighborhood column to the first column
fixed_columns = [brewvenues_onehot.columns[-1]] + list(brewvenues_onehot.columns[:-1])
brewvenues_onehot = brewvenues_onehot[fixed_columns]

brewvenues_onehot.head()

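|  | Neighborhood | American Restaurant | Bar | Brewery | Burger Joint | Gastropub | Mexican Restaurant | Pizza Place | Pub | Sandwich Place |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bay Village | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Bay Village | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | Bay Village | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | Leather District | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Leather District | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |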


In [45]:
brewvenues_onehot.shape

(31, 10)

As seen here, the search returned only **31 brewery-like venues** across all of Boston's neighborhoods.

Let's consolidate the dataframe by summing the one-hot columns, so that each neighborhood gets a single row of category counts.

In [46]:
brewvenues_grouped = brewvenues_onehot.groupby('Neighborhood').sum().reset_index()
brewvenues_grouped

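|  | Neighborhood | American Restaurant | Bar | Brewery | Burger Joint | Gastropub | Mexican Restaurant | Pizza Place | Pub | Sandwich Place |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Allston | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1 | Back Bay | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Bay Village | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| 3 | Beacon Hill | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | Charlestown | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | Chinatown | 0 | 0 | 3 | 1 | 0 | 0 | 0 | 0 | 0 |
| 6 | Dorchester | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | Downtown | 0 | 0 | 2 | 0 | 0 | 1 | 0 | 0 | 0 |
| 8 | Fenway | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | Harbor Islands | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |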


Counting only venues in the Brewery category shows that there are just **20 true breweries** in the Boston results. **Fenway, Chinatown, and Dorchester** have the most actual breweries, whereas **Allston, Beacon Hill, the South End, and the West End** have none.
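The distinction between brewery-like venues and true breweries amounts to a filter on the Venue Category column. A sketch on hypothetical rows and area names; reindexing against a full neighborhood list is an added step (not in the notebook) so that neighborhoods with zero breweries stay visible instead of disappearing from the count:

```python
import pandas as pd

# hypothetical search results (stands in for the breweries dataframe)
results = pd.DataFrame({
    'Area': ['X', 'X', 'Y', 'Y', 'Z'],
    'Venue Category': ['Brewery', 'Pizza Place', 'Brewery', 'Brewery', 'Gastropub'],
})
all_areas = ['W', 'X', 'Y', 'Z']  # W returned no venues at all

# every returned venue counts as "brewery-like"
brewery_like = results.groupby('Area').size()

# only venues whose category is exactly 'Brewery' count as true breweries;
# reindex so areas with none show 0
true_breweries = (results[results['Venue Category'] == 'Brewery']
                  .groupby('Area').size()
                  .reindex(all_areas, fill_value=0))

print(int(true_breweries.sum()))
print(true_breweries[true_breweries == 0].index.tolist())
```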

Next, I will print each neighborhood's brewery venue categories, sorted in descending order of count (up to the 10 most common).

In [47]:
num_top_venues = 10

for hood in brewvenues_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = brewvenues_grouped[brewvenues_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Allston----
                 venue  freq
0            Gastropub   1.0
1  American Restaurant   0.0
2                  Bar   0.0
3              Brewery   0.0
4         Burger Joint   0.0
5   Mexican Restaurant   0.0
6          Pizza Place   0.0
7                  Pub   0.0
8       Sandwich Place   0.0


----Back Bay----
                 venue  freq
0              Brewery   1.0
1  American Restaurant   0.0
2                  Bar   0.0
3         Burger Joint   0.0
4            Gastropub   0.0
5   Mexican Restaurant   0.0
6          Pizza Place   0.0
7                  Pub   0.0
8       Sandwich Place   0.0


----Bay Village----
                 venue  freq
0              Brewery   1.0
1          Pizza Place   1.0
2       Sandwich Place   1.0
3  American Restaurant   0.0
4                  Bar   0.0
5         Burger Joint   0.0
6            Gastropub   0.0
7   Mexican Restaurant   0.0
8                  Pub   0.0


----Beacon Hill----
                 venue  freq
0  American Restaurant

# What happens if we cluster neighborhoods based on colleges AND breweries?

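First I will merge the two grouped one-hot encoded dataframes, one for brewery venues and one for higher education venues, into a new dataframe. Because the brewery dataframe is on the left side of the join, only neighborhoods with at least one brewery-like venue are kept.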
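A small sketch of how the left side of the join decides which neighborhoods survive (both tables are hypothetical). An outer merge is shown only for comparison; it is not what the notebook does:

```python
import pandas as pd

# hypothetical grouped counts: only neighborhoods with a brewery-like venue appear
brew = pd.DataFrame({'Neighborhood': ['X', 'Y'], 'Brewery': [1, 2]})
# hypothetical grouped college counts, covering more neighborhoods
college = pd.DataFrame({'Neighborhood': ['X', 'Y', 'Z'], 'Student Center': [3, 0, 5]})

# DataFrame.join is a left join by default: Z is dropped
combined = brew.join(college.set_index('Neighborhood'), on='Neighborhood')
print(combined['Neighborhood'].tolist())

# for comparison, an outer merge keeps Z and fills its missing brewery count with 0
outer = brew.merge(college, on='Neighborhood', how='outer').fillna(0)
print(outer['Neighborhood'].tolist())
```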

In [48]:
#merging grouped higher education and brewery venue dataframes into one for k-means clustering
collegebrew_merged = brewvenues_grouped.join(collegevenues_grouped.set_index('Neighborhood'), on='Neighborhood')
collegebrew_merged.head()

Unnamed: 0,Neighborhood,American Restaurant,Bar,Brewery,Burger Joint,Gastropub,Mexican Restaurant,Pizza Place,Pub,Sandwich Place,Athletics & Sports,Bookstore,College & University,College Academic Building,College Administrative Building,College Arts Building,College Auditorium,College Baseball Diamond,College Bookstore,College Cafeteria,College Classroom,College Communications Building,College Engineering Building,College Gym,College Hockey Rink,College Lab,College Library,College Quad,College Rec Center,College Residence Hall,College Science Building,College Stadium,College Technology Building,College Theater,College Track,Community College,Coworking Space,Dance Studio,Entertainment Service,Fraternity House,General College & University,General Travel,Gym,Health & Beauty Service,Language School,Law School,Library,Medical Lab,Medical School,Music Venue,Office,Performing Arts Venue,Pool,School,Social Club,Speakeasy,Student Center,TV Station,Theater,Track,Trade School,University
0,Allston,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Back Bay,0,0,1,0,0,0,0,0,0,0,0,1,9,9,12,0,0,1,2,1,0,0,1,0,1,2,1,0,3,0,0,2,1,0,0,0,0,0,9,5,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,7,0,0,0,0,3
2,Bay Village,0,0,1,0,0,0,1,0,1,0,0,3,2,3,2,1,0,1,1,1,0,0,0,0,2,3,0,0,1,0,0,0,2,0,1,0,1,0,0,6,0,0,0,1,1,1,0,0,0,2,0,0,1,0,0,0,1,0,0,1,1
3,Beacon Hill,1,0,0,0,0,0,0,1,0,0,1,0,5,1,0,0,0,0,1,1,1,0,0,0,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,4,0,1,0,0,1
4,Charlestown,0,1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,2,1


From here, let's list the 10 most common venue categories in each neighborhood, sorted in descending order, and see which neighborhoods have brewery venues that are more frequent than their higher education venues.
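A more direct way to answer that question is to total the brewery columns and the higher education columns separately for each neighborhood. A sketch on hypothetical rows (df stands in for collegebrew_merged); treating the nine categories returned by the brewery search as the brewery group, and every other numeric column as higher education, is an assumption:

```python
import pandas as pd

# the nine categories returned by the brewery search
BREW_COLS = ['American Restaurant', 'Bar', 'Brewery', 'Burger Joint', 'Gastropub',
             'Mexican Restaurant', 'Pizza Place', 'Pub', 'Sandwich Place']

# hypothetical merged counts
df = pd.DataFrame({
    'Neighborhood': ['X', 'Y'],
    'Brewery': [3, 1],
    'Pub': [1, 0],
    'Student Center': [1, 7],
    'College Library': [0, 2],
})

# split the count columns into the two groups
brew_cols = [c for c in df.columns if c in BREW_COLS]
college_cols = [c for c in df.columns if c not in BREW_COLS and c != 'Neighborhood']

summary = pd.DataFrame({
    'Neighborhood': df['Neighborhood'],
    'Brewery venues': df[brew_cols].sum(axis=1),
    'College venues': df[college_cols].sum(axis=1),
})
summary['More breweries'] = summary['Brewery venues'] > summary['College venues']
print(summary)
```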

In [49]:
#top 10 common higher education and brewery venues in Boston neighborhoods
num_top_venues = 10

for hood in collegebrew_merged['Neighborhood']:
    print("----"+hood+"----")
    temp = collegebrew_merged[collegebrew_merged['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Allston----
                          venue  freq
0                Medical School   1.0
1      College Science Building   1.0
2                     Gastropub   1.0
3  College Engineering Building   1.0
4           American Restaurant   0.0
5                General Travel   0.0
6               Language School   0.0
7       Health & Beauty Service   0.0
8                           Gym   0.0
9              Fraternity House   0.0


----Back Bay----
                             venue  freq
0            College Arts Building  12.0
1                 Fraternity House   9.0
2  College Administrative Building   9.0
3        College Academic Building   9.0
4                   Student Center   7.0
5     General College & University   5.0
6           College Residence Hall   3.0
7                       University   3.0
8      College Technology Building   2.0
9                  College Library   2.0


----Bay Village----
                             venue  freq
0     General College & Universit

In [50]:
#venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
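Many of the lower-ranked venues in these lists have a count of zero, so their order among themselves is only a tie-break, not a meaningful ranking. A sketch of a variant (the name top_venues is hypothetical) that uses a stable sort and keeps only categories with a nonzero count:

```python
import pandas as pd

def top_venues(row, num_top_venues):
    """Return up to num_top_venues categories with a nonzero count, highest first.

    Assumes the first entry of the row is the neighborhood name,
    as in return_most_common_venues.
    """
    counts = row.iloc[1:].astype(float)
    # a stable sort keeps tied categories in their original column order
    ranked = counts[counts > 0].sort_values(ascending=False, kind='stable')
    return ranked.index.tolist()[:num_top_venues]

row = pd.Series({'Neighborhood': 'X', 'Bar': 1, 'Brewery': 3, 'Pub': 0, 'Student Center': 1})
print(top_venues(row, 10))
```

With this variant a neighborhood may return fewer than 10 venues, so padding (for example with None) would be needed before filling a fixed set of "Most Common Venue" columns.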

Now I will create a new dataframe listing the 10 most common venues for each neighborhood, which will later be joined with the cluster labels.

In [52]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 4th and later ranks use the "th" suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
collegebrewvenues_sorted = pd.DataFrame(columns=columns)
collegebrewvenues_sorted['Neighborhood'] = collegebrew_merged['Neighborhood']

for ind in np.arange(collegebrewvenues_sorted.shape[0]):
    collegebrewvenues_sorted.iloc[ind, 1:] = return_most_common_venues(collegebrew_merged.iloc[ind, :], num_top_venues)

collegebrewvenues_sorted.head()

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Allston | College Engineering Building | College Science Building | Medical School | Gastropub | University | College Bookstore | College Cafeteria | College Classroom | College Communications Building | College Gym |
| 1 | Back Bay | College Arts Building | College Administrative Building | College Academic Building | Fraternity House | Student Center | General College & University | College Residence Hall | University | College Library | College Technology Building |
| 2 | Bay Village | General College & University | College & University | College Library | College Administrative Building | Office | College Theater | College Arts Building | College Academic Building | College Lab | Pizza Place |
| 3 | Beacon Hill | College Academic Building | Student Center | College Lab | University | General College & University | Pub | Bookstore | College Administrative Building | College Cafeteria | College Classroom |
| 4 | Charlestown | Trade School | Student Center | Brewery | College Classroom | University | Bar | Burger Joint | Gastropub | College Rec Center | College Quad |


As before, I will group the neighborhoods into 5 clusters with k-means, then build a dataframe containing each neighborhood's cluster label and its top 10 venue categories.

In [53]:
# set number of clusters
kclusters = 5

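# keep only the numeric venue counts for clustering
collegebrew_grouped_clustering = collegebrew_merged.drop(columns='Neighborhood')

# run k-means clustering (n_init=10 matches the older scikit-learn default; newer versions default to 'auto')
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(collegebrew_grouped_clustering)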

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 4, 0, 0, 0, 2, 0, 3, 1, 0], dtype=int32)

In [54]:
# add clustering labels
collegebrewvenues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

collegebreweries = BostonNeighborhoods

# join the cluster labels and top venues onto BostonNeighborhoods, which holds each neighborhood's latitude/longitude
collegebreweries = collegebreweries.join(collegebrewvenues_sorted.set_index('Neighborhood'), on='Neighborhood')

collegebreweries.head() # check the last columns!

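|  | Neighborhood | Acres | SqMiles | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Roslindale | 1605.568237 | 2.51 | 42.28182 | -71.137104 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | Jamaica Plain | 2519.245394 | 3.94 | 42.305849 | -71.119092 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | Mission Hill | 350.853564 | 0.55 | 42.33571 | -71.1098 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | Longwood | 188.611947 | 0.29 | 42.339822 | -71.108792 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | Bay Village | 26.539839 | 0.04 | 42.348165 | -71.06847 | 0.0 | General College & University | College & University | College Library | College Administrative Building | Office | College Theater | College Arts Building | College Academic Building | College Lab | Pizza Place |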


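This is interesting: some neighborhoods have no cluster label in this round. Because k-means assigns every row it is given to a cluster, these neighborhoods were not left out for lacking similarities between higher education **and** brewery venues. Instead, the brewery search returned no venues within 500 meters of their coordinates, so they never entered the combined dataframe (the brewery counts were on the left side of the join). These neighborhoods are: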
* **Brighton**,
* **East Boston**,
* **Hyde Park**, 
* **Jamaica Plain**,
* **Longwood**,
* **Mattapan**,
* **Mission Hill**,
* the **North End**,
* **Roslindale**,
* **Roxbury**, 
* **South Boston**, 
* **South Boston Waterfront**, and
* **West Roxbury**.

These neighborhoods are therefore set aside as candidate locations for Country Roads Brewery's new facility.

Let's clean the dataframe to remove any null values to prepare for visualization of the clusters from the remaining neighborhoods.

In [55]:
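# drop neighborhoods with no cluster label (they are missing from collegebrewvenues_sorted, so the join left NaN)
collegebreweries = collegebreweries.dropna(subset=['Cluster Labels']).copy()

# the NaNs turned the labels into floats; convert them back to integers
collegebreweries['Cluster Labels'] = collegebreweries['Cluster Labels'].astype(int)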

In [56]:
#checking dataframe again to ensure no null values remain!
collegebreweries.head()

|  | Neighborhood | Acres | SqMiles | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | Bay Village | 26.539839 | 0.04 | 42.348165 | -71.06847 | 0 | General College & University | College & University | College Library | College Administrative Building | Office | College Theater | College Arts Building | College Academic Building | College Lab | Pizza Place |
| 5 | Leather District | 15.639908 | 0.02 | 42.325224 | -71.062248 | 0 | Trade School | Brewery | College Lab | Student Center | College Auditorium | College Rec Center | College Quad | College Library | College Hockey Rink | College Gym |
| 6 | Chinatown | 76.32441 | 0.12 | 42.35251 | -71.0609 | 2 | General College & University | College Lab | College Academic Building | College Administrative Building | College Residence Hall | Student Center | Medical School | College Classroom | College Library | Law School |
| 9 | South End | 471.535356 | 0.74 | 42.34256 | -71.07358 | 0 | College Arts Building | Pizza Place | Dance Studio | General College & University | College Auditorium | College Quad | College Library | College Lab | College Hockey Rink | College Gym |
| 10 | Back Bay | 399.314411 | 0.62 | 42.34999 | -71.08765 | 4 | College Arts Building | College Administrative Building | College Academic Building | Fraternity House | Student Center | General College & University | College Residence Hall | University | College Library | College Technology Building |


Visualizing the clusters on a map will also show which neighborhoods are excluded from them.

In [57]:
# Visualize the clusters
# create map centered on Boston (latitude and longitude were defined earlier in the notebook)
collegebrew_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per neighborhood, colored by its cluster label
for lat, lon, poi, cluster in zip(collegebreweries['Latitude'], collegebreweries['Longitude'], collegebreweries['Neighborhood'], collegebreweries['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(collegebrew_clusters)

collegebrew_clusters

Let's examine each cluster carefully. 

In [58]:
#Examining College Brewery Cluster 1
collegebreweries.loc[collegebreweries['Cluster Labels'] == 0, collegebreweries.columns[[0] + list(range(5, collegebreweries.shape[1]))]]

|  | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | Bay Village | 0 | General College & University | College & University | College Library | College Administrative Building | Office | College Theater | College Arts Building | College Academic Building | College Lab | Pizza Place |
| 5 | Leather District | 0 | Trade School | Brewery | College Lab | Student Center | College Auditorium | College Rec Center | College Quad | College Library | College Hockey Rink | College Gym |
| 9 | South End | 0 | College Arts Building | Pizza Place | Dance Studio | General College & University | College Auditorium | College Quad | College Library | College Lab | College Hockey Rink | College Gym |
| 12 | Charlestown | 0 | Trade School | Student Center | Brewery | College Classroom | University | Bar | Burger Joint | Gastropub | College Rec Center | College Quad |
| 13 | West End | 0 | College Lab | Medical School | American Restaurant | College Academic Building | Bookstore | College Bookstore | College Library | Bar | College Hockey Rink | College Baseball Diamond |
| 14 | Beacon Hill | 0 | College Academic Building | Student Center | College Lab | University | General College & University | Pub | Bookstore | College Administrative Building | College Cafeteria | College Classroom |
| 21 | Dorchester | 0 | Brewery | College Academic Building | Trade School | General College & University | Health & Beauty Service | Entertainment Service | College Classroom | College Bookstore | College Cafeteria | University |
| 24 | Allston | 0 | College Engineering Building | College Science Building | Medical School | Gastropub | University | College Bookstore | College Cafeteria | College Classroom | College Communications Building | College Gym |
| 25 | Harbor Islands | 0 | University | Brewery | College Administrative Building | Trade School | Burger Joint | College Bookstore | College Residence Hall | College Rec Center | College Quad | College Library |


### In this first cluster, each neighborhood has a brewery or brewery-like venue ranking among its most common venues, alongside its college and university venues. We can remove these neighborhoods from contention.

In [59]:
#Examining College Brewery Cluster 2
collegebreweries.loc[collegebreweries['Cluster Labels'] == 1, collegebreweries.columns[[0] + list(range(5, collegebreweries.shape[1]))]]

|  | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16 | Fenway | 1 | College Academic Building | College Arts Building | College Residence Hall | Student Center | General College & University | Brewery | College Administrative Building | College Cafeteria | University | College Library |


### Interestingly, Fenway, one of the neighborhoods we identified earlier as ripe for student interaction, already has a high number of breweries, and a brewery ranks among its most common venues. We can eliminate this neighborhood from contention.

In [60]:
#Examining College Brewery Cluster 3
collegebreweries.loc[collegebreweries['Cluster Labels'] == 2, collegebreweries.columns[[0] + list(range(5, collegebreweries.shape[1]))]]

|  | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | Chinatown | 2 | General College & University | College Lab | College Academic Building | College Administrative Building | College Residence Hall | Student Center | Medical School | College Classroom | College Library | Law School |


In [61]:
#Examining College Brewery Cluster 4
collegebreweries.loc[collegebreweries['Cluster Labels'] == 3, collegebreweries.columns[[0] + list(range(5, collegebreweries.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,Downtown,3,College Administrative Building,General College & University,Student Center,College Academic Building,University,Trade School,College Library,Law School,Brewery,College Residence Hall


In [62]:
#Examining College Brewery Cluster 5
collegebreweries.loc[collegebreweries['Cluster Labels'] == 4, collegebreweries.columns[[0] + list(range(5, collegebreweries.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Back Bay,4,College Arts Building,College Administrative Building,College Academic Building,Fraternity House,Student Center,General College & University,College Residence Hall,University,College Library,College Technology Building

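The cluster cells above repeat the same selection with a different label each time. A minimal sketch of a helper that does this in one loop is below. It assumes `collegebreweries` holds a `Neighborhood` column, a `Cluster Labels` column, and the ranked venue columns, and it selects them by name instead of by position (`columns[[0] + list(range(5, ...))]`). The toy frame copies only the first three venue columns of the outputs shown, and `cluster_view` is a hypothetical helper name.

```python
import pandas as pd

# Toy copy of four cluster outputs above; only the first three ranked venue columns are kept.
collegebreweries = pd.DataFrame({
    "Neighborhood": ["Fenway", "Chinatown", "Downtown", "Back Bay"],
    "Cluster Labels": [1, 2, 3, 4],
    "1st Most Common Venue": ["College Academic Building", "General College & University",
                              "College Administrative Building", "College Arts Building"],
    "2nd Most Common Venue": ["College Arts Building", "College Lab",
                              "General College & University", "College Administrative Building"],
    "3rd Most Common Venue": ["College Residence Hall", "College Academic Building",
                              "Student Center", "College Academic Building"],
})


def cluster_view(df, label):
    """Hypothetical helper: rows for one cluster label, with columns selected by name."""
    venue_cols = [c for c in df.columns if c.endswith("Most Common Venue")]
    return df.loc[df["Cluster Labels"] == label, ["Neighborhood", "Cluster Labels"] + venue_cols]


# The notebook names clusters from 1 while the labels start at 0, hence label + 1.
for label in sorted(collegebreweries["Cluster Labels"].unique()):
    print(f"Cluster {label + 1}")
    print(cluster_view(collegebreweries, label).to_string(index=False))
```

Selecting columns by name keeps the view correct even if extra columns are later added before `Cluster Labels`.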

The last three clusters are notable because each contains a single neighborhood.

Cluster 4, which contains the **Downtown** neighborhood, can be removed from contention because a brewery is among its most common venues (9th). That leaves **Back Bay and Chinatown** as our potential locations for Country Roads Brewery. Neither neighborhood has a brewery or similar venue among its most common venues, but the two differ in important ways.
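The elimination rule applied so far (drop any neighborhood whose top ten venues include a brewery) can be sketched in pandas. The venue lists are copied from the Fenway, Chinatown, Downtown, and Back Bay outputs above. Treating only the `Brewery` category as competition is an assumption, and `BREWERY_LIKE` is a hypothetical name that could be extended with categories such as `Gastropub`.

```python
import pandas as pd

# Top-10 venue lists copied from the cluster outputs above.
top_venues = {
    "Fenway": ["College Academic Building", "College Arts Building", "College Residence Hall",
               "Student Center", "General College & University", "Brewery",
               "College Administrative Building", "College Cafeteria", "University",
               "College Library"],
    "Chinatown": ["General College & University", "College Lab", "College Academic Building",
                  "College Administrative Building", "College Residence Hall", "Student Center",
                  "Medical School", "College Classroom", "College Library", "Law School"],
    "Downtown": ["College Administrative Building", "General College & University",
                 "Student Center", "College Academic Building", "University", "Trade School",
                 "College Library", "Law School", "Brewery", "College Residence Hall"],
    "Back Bay": ["College Arts Building", "College Administrative Building",
                 "College Academic Building", "Fraternity House", "Student Center",
                 "General College & University", "College Residence Hall", "University",
                 "College Library", "College Technology Building"],
}

ordinals = ["1st", "2nd", "3rd"] + [f"{i}th" for i in range(4, 11)]
ranked = pd.DataFrame.from_dict(
    top_venues, orient="index", columns=[f"{o} Most Common Venue" for o in ordinals]
)

# Categories treated as competition (an assumption; e.g. add "Gastropub" to widen it).
BREWERY_LIKE = ["Brewery"]

# True for each neighborhood with a brewery-like venue anywhere in its top ten.
has_brewery = ranked.isin(BREWERY_LIKE).any(axis=1)
candidates = ranked.index[~has_brewery].tolist()
print(has_brewery)
print("Remaining candidates:", candidates)
```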

For example, Chinatown has venues for student interaction, such as a student center, but appears to have more medical school campuses. Our brewery count from earlier in the analysis also shows that Chinatown has 3 breweries.

Meanwhile, Back Bay has a venue composition similar to Fenway's; the two were clustered together when we looked solely at higher education venues. Unlike Fenway, however, Back Bay does not have breweries among its most common venues, and its brewery count lists only 1.
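The brewery counts referred to here can be reproduced with a `groupby`. This sketch assumes `breweries` holds one row per venue returned by the brewery search, as in the Back Bay and Chinatown tables later in this section. It reports a strict count (category `Brewery` only) and a looser brewery-like count (every returned venue), which is why Chinatown shows 3 breweries but 4 brewery-like venues.

```python
import pandas as pd

# Brewery search results for the two finalists, copied from the tables later in this section.
breweries = pd.DataFrame({
    "Area": ["Back Bay", "Chinatown", "Chinatown", "Chinatown", "Chinatown"],
    "Venue": ["Edge!", "Democracy Brewing Company", "Rock Bottom Restaurant & Brewery",
              "Boston beer company", "Back Deck"],
    "Venue Category": ["Brewery", "Brewery", "Brewery", "Brewery", "Burger Joint"],
})

# Strict count: only venues whose Foursquare category is exactly "Brewery".
strict = breweries[breweries["Venue Category"] == "Brewery"].groupby("Area").size()
# Loose count: every venue the brewery search returned ("brewery-like").
loose = breweries.groupby("Area").size()

summary = (pd.DataFrame({"Breweries": strict, "Brewery-like": loose})
           .fillna(0).astype(int))
print(summary)
```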

To continue our analysis, let's take a quick look at higher education venues in Back Bay. What universities and breweries are located there? 

In [66]:
#Count and explore venues in Back Bay
backbay = venues.loc[venues["Area"] == "Back Bay"]
backbay

Unnamed: 0,Area,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
232,Back Bay,42.34999,-71.08765,The BIRN,42.348241,-71.084504,Music Venue
233,Back Bay,42.34999,-71.08765,Harvard Commonwealth Room,42.348427,-71.089066,University
234,Back Bay,42.34999,-71.08765,The Boston Architectural College - Practice De...,42.348531,-71.085696,University
235,Back Bay,42.34999,-71.08765,Berklee College of Music,42.347031,-71.087606,University
236,Back Bay,42.34999,-71.08765,Boston Architectural College,42.34839,-71.085813,College Academic Building
237,Back Bay,42.34999,-71.08765,BAC 951 Boylston,42.348105,-71.085775,College Academic Building
238,Back Bay,42.34999,-71.08765,New England College of Optometry,42.351773,-71.086667,Medical School
239,Back Bay,42.34999,-71.08765,Berklee College of Music - 136 Massachusetts Ave,42.347003,-71.087561,College Academic Building
240,Back Bay,42.34999,-71.08765,Berklee College of Music Building 18,42.346639,-71.086457,College Academic Building
241,Back Bay,42.34999,-71.08765,Berklee College Of Music - 150 Mass Ave,42.34647,-71.087171,College Academic Building


In [67]:
print(backbay.shape)

(73, 7)


In [68]:
#What about breweries?
backbaybreweries = breweries.loc[breweries["Area"] == "Back Bay"]
backbaybreweries

Unnamed: 0,Area,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
10,Back Bay,42.34999,-71.08765,Edge!,42.345795,-71.087747,Brewery


There is a variety of colleges here, including the Boston Architectural College, Berklee College of Music, the New England College of Optometry, the Boston Conservatory, and a Boston University sub-campus. Several of these institutions span multiple buildings, and the query returned 73 venues for the neighborhood in total.
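Tallying categories and institution names makes this breakdown concrete. The sketch below uses only the ten Back Bay rows displayed above (the full frame has 73), so the counts illustrate the method rather than reproduce the full result. Matching institutions by a substring of the venue name is an assumption about how one might group buildings.

```python
import pandas as pd

# The ten Back Bay rows displayed above; the full frame has 73 rows.
backbay = pd.DataFrame({
    "Venue": ["The BIRN", "Harvard Commonwealth Room",
              "The Boston Architectural College - Practice De...", "Berklee College of Music",
              "Boston Architectural College", "BAC 951 Boylston",
              "New England College of Optometry",
              "Berklee College of Music - 136 Massachusetts Ave",
              "Berklee College of Music Building 18", "Berklee College Of Music - 150 Mass Ave"],
    "Venue Category": ["Music Venue", "University", "University", "University",
                       "College Academic Building", "College Academic Building",
                       "Medical School", "College Academic Building",
                       "College Academic Building", "College Academic Building"],
})

# How many venues fall into each Foursquare category.
counts = backbay["Venue Category"].value_counts()
print(counts)

# One institution can appear as several venues (separate buildings or addresses).
berklee = backbay["Venue"].str.contains("Berklee", case=False).sum()
print("Berklee venues:", berklee)
```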

There is also one brewery: Edge!

Let's do the same for Chinatown: what venues are there?

In [69]:
#What about Chinatown?
chinatown = venues.loc[venues["Area"] == "Chinatown"]
chinatown

Unnamed: 0,Area,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
119,Chinatown,42.35251,-71.0609,Paramount Center,42.353621,-71.062405,Performing Arts Venue
120,Chinatown,42.35251,-71.0609,Emerson College Bike Room-Boylston Place,42.352032,-71.066338,College & University
121,Chinatown,42.35251,-71.0609,Emerson College - Cultural Center,42.352184,-71.066483,College & University
122,Chinatown,42.35251,-71.0609,Emerson College,42.3523,-71.06573,University
123,Chinatown,42.35251,-71.0609,Emerson College - Iwasaki Library,42.352522,-71.065593,College Library
124,Chinatown,42.35251,-71.0609,Stafford House Boston,42.353897,-71.05985,General College & University
125,Chinatown,42.35251,-71.0609,Emerson College - Little Building,42.352158,-71.064593,College Residence Hall
126,Chinatown,42.35251,-71.0609,Emerson College Writing and Academic Resource ...,42.351487,-71.064319,College Academic Building
127,Chinatown,42.35251,-71.0609,General Assembly,42.353015,-71.057263,College Academic Building
128,Chinatown,42.35251,-71.0609,216 Tremont Street,42.351559,-71.064415,College Academic Building


In [70]:
print(chinatown.shape)

(100, 7)


In [71]:
#Breweries in Chinatown?
chinatownbreweries = breweries.loc[breweries["Area"] == "Chinatown"]
chinatownbreweries

Unnamed: 0,Area,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
5,Chinatown,42.35251,-71.0609,Democracy Brewing Company,42.355073,-71.062135,Brewery
6,Chinatown,42.35251,-71.0609,Rock Bottom Restaurant & Brewery,42.351329,-71.065327,Brewery
7,Chinatown,42.35251,-71.0609,Boston beer company,42.352899,-71.064451,Brewery
8,Chinatown,42.35251,-71.0609,Back Deck,42.354448,-71.061898,Burger Joint


Chinatown, on the other hand, houses Emerson College, New England College of Finance, Babson College, Tufts Medical School, New England Law, and Suffolk University Law School. The Foursquare query returned 100 venues for Chinatown, compared with 73 for Back Bay.

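There are also four brewery-like venues: Democracy Brewing Company, Rock Bottom Restaurant & Brewery, and Boston Beer Company, which Foursquare categorizes as breweries, plus Back Deck, which it categorizes as a burger joint.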

## Results and Discussion <a name="results"></a>

Based on our analysis, we discovered that higher education venues are spread across a wide swath of Boston's neighborhoods. That gave us the flexibility to look beyond the immediate downtown area, knowing that there were colleges and universities in some of the smaller neighborhoods. Conversely, there were fewer brewery venues in Boston than anticipated. This surprised our client, who, given the wide variety of breweries across New England, had assumed they were entering a crowded brewery market in the city.

It was no surprise that the neighborhoods with the highest frequency of higher education venues were among the most popular in the city, with access to public transit, restaurants, historical places, and various tourist attractions. Brewery locations, by contrast, were spread across the city regardless of popularity. Clustering neighborhoods on both of these factors helped weed out neighborhoods that would not be ideal.

Filtering out neighborhoods was easier than expected: nearly half of Boston's neighborhoods were not included in these new clusters because they lacked brewery categories, higher education venue categories, or both. We would not want to place Country Roads Brewery in a neighborhood without many higher education venues that promote student interaction. A caveat of this approach, however, is that it may also exclude neighborhoods that have plenty of higher education venues but simply do not have ***any brewery***, which is precisely the kind of market the client prefers.

This brought us to our final cluster groups, which were easy to identify. We eliminated each cluster based on its frequency of breweries and on a lack of venues that promote student interaction, such as student centers. The final two clusters, Back Bay and Chinatown, had the most higher education venues, with Back Bay having the edge in venues that add student interaction.
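The caveat above comes from how the two frequency tables are joined. The sketch assumes the tables were combined with an inner merge on `Neighborhood` (the original merge code is not shown in this section), and the neighborhoods and values are hypothetical. A left merge that fills missing brewery frequencies with 0 would keep a neighborhood that has higher education venues but no brewery.

```python
import pandas as pd

# HYPOTHETICAL frequency tables; neighborhoods and values are made up for illustration.
higher_ed = pd.DataFrame({
    "Neighborhood": ["Neighborhood A", "Neighborhood B", "Neighborhood C"],
    "University": [0.30, 0.10, 0.25],
})
brewery_freq = pd.DataFrame({
    "Neighborhood": ["Neighborhood A", "Neighborhood B"],  # C has no brewery venues
    "Brewery": [0.05, 0.20],
})

# Inner merge: only neighborhoods present in both tables survive, so C is dropped.
inner = higher_ed.merge(brewery_freq, on="Neighborhood", how="inner")

# Left merge: keep every higher-ed neighborhood and record "no brewery" as 0.
left = (higher_ed.merge(brewery_freq, on="Neighborhood", how="left")
        .fillna({"Brewery": 0.0}))

print(inner["Neighborhood"].tolist())
print(left)
```

With the left merge, Neighborhood C stays in the candidate pool with a brewery frequency of 0, which is exactly the untapped profile the client is looking for.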

Ultimately, DeBarcelos Contractors will select **Back Bay** as the ideal neighborhood for Country Roads Brewery's new facility, due to the scarcity of brewery venues in the area (only one, Edge!) despite its large student population from colleges such as Berklee College of Music and the Boston Conservatory.

It would be interesting to see how the results would differ if we focused on ***all nightlife venues*** and not just brewery-like venues. What would the implications be? Would the ideal location be in a neighborhood that lacks nightlife overall? Would that neighborhood already have multiple breweries? Back Bay has many other restaurants and may have some nightlife as well, so it might not have been selected as the final location had our analysis been approached differently.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify an ideal location for Country Roads Brewery to open in Boston, accounting for proximity to universities and to other breweries. Through the Foursquare API, we identified various higher education venues, as well as brewery-like facilities, across Boston's 26 neighborhoods, and applied k-means clustering to find common threads among the neighborhoods. By merging one-hot encoded groups for higher education institutions and breweries, we identified neighborhoods that were more likely to have a student presence, as well as those that may have a potential brewery following. We wanted a location that the brewery industry has not yet saturated, but that would attract the young adult crowd that made Country Roads Brewery so popular in Western Massachusetts.
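The one-hot encoding and venue-ranking steps summarized here can be sketched in pandas. This assumes the common approach of averaging one-hot venue categories per neighborhood and then sorting each row; the input is the ten Back Bay and ten Chinatown rows displayed earlier, and `top_venues` is a hypothetical helper. Only two ranks are kept because the toy data has ties beyond second place.

```python
import pandas as pd

# Venue categories of the ten Back Bay and ten Chinatown rows displayed earlier.
venues = pd.DataFrame({
    "Area": ["Back Bay"] * 10 + ["Chinatown"] * 10,
    "Venue Category": [
        "Music Venue", "University", "University", "University",
        "College Academic Building", "College Academic Building", "Medical School",
        "College Academic Building", "College Academic Building", "College Academic Building",
        "Performing Arts Venue", "College & University", "College & University",
        "University", "College Library", "General College & University",
        "College Residence Hall", "College Academic Building", "College Academic Building",
        "College Academic Building",
    ],
})

# One-hot encode the categories, then average per neighborhood to get frequencies.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Area", venues["Area"])
freq = onehot.groupby("Area").mean()


def top_venues(row, n=2):
    """Hypothetical helper: the n most frequent categories in one row, highest first."""
    return row.sort_values(ascending=False).index[:n].tolist()


ranked = pd.DataFrame(
    [top_venues(freq.loc[area]) for area in freq.index],
    index=freq.index,
    columns=["1st Most Common Venue", "2nd Most Common Venue"],
)
print(ranked)
```

The `freq` table is the kind of numeric matrix a k-means model would be fit on; the clustering call itself is omitted from this sketch.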

We are comfortable selecting **Back Bay** as the ideal neighborhood for Country Roads Brewery's new location, as it fulfills our criteria: abundant higher education venues and a brewery market that has not yet been saturated. We hope that this exercise inspires other breweries and restaurants to reach out to us so we can continue strategizing potential markets for them.