# **Capstone Final Project: Battle of the Neighborhoods**

## Introduction/Business Problem


Portland, Oregon is home to over 50 craft breweries in the city and 84 in the metro area, approximately 9 breweries per 50,000 people. With that many options, how do you know which brewery to visit? What if you want to visit a few breweries without worrying about driving? An eco-friendly, net-zero carbon brewery tasting shuttle bike tour is the answer!
The tasting shuttle bike tour will guide guests through the neighborhood of their choice, stopping at 5 breweries where guests can try a sample flight.
This study will analyze the neighborhoods in Portland, Oregon to determine the best areas to open a brewery tasting tour shuttle. Craft breweries are popular places to hang out, hold a lunch meeting, or visit as a tourist attraction, and their popularity in the USA is at an all-time high. People from all walks of life gather at breweries to indulge, relax, and build a welcoming community. Millennials, Gen Z, and even baby boomers enjoy craft beer and would use a service like the shuttle bike.

## Data

Portland, Oregon city data will be imported into a dataframe from postal code data sets. These provide the borough, neighborhood, postal code, location coordinates, and population. Any rows with incomplete data will be removed from the dataframe.

Foursquare will be used to get information on the neighborhoods, specifically the breweries in each neighborhood. Venue information such as popularity and ratings will help determine the best spots to open a shuttle bike tour. The brewery locations will be grouped with the k-means clustering algorithm. A brewery cluster does not have to contain breweries from a single neighborhood: two breweries can sit next to each other in different neighborhoods, and it would not make sense to put them in separate tours. Each brewery cluster will be a tour offered by the company. Within each cluster there could be different brewery tours depending on which neighborhood the guest starts nearest.
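
As a rough illustration of the clustering step, the sketch below groups brewery coordinates with plain k-means (Lloyd's algorithm), the method this notebook later imports from scikit-learn as `KMeans`. The function name `kmeans`, the seeding with the first `k` points, and the use of straight-line distance on latitude/longitude degrees are simplifying assumptions for illustration, not the project's final implementation. The ten coordinates are the Foursquare results shown later in this notebook.

```python
import math

# Brewery coordinates as returned by Foursquare later in this notebook.
breweries = {
    "Deschutes Brewery Portland Public House": (45.524544, -122.681982),
    "Cascade Brewing Barrel House": (45.516603, -122.655837),
    "Breakside Brewery": (45.533924, -122.696465),
    "Base Camp Brewing": (45.519896, -122.656464),
    "Hair of the Dog Brewery & Tasting Room": (45.515866, -122.665682),
    "Hopworks Urban Brewery": (45.496928, -122.634908),
    "Gigantic Brewing Company": (45.48506, -122.639577),
    "Labrewatory": (45.540887, -122.673251),
    "Ecliptic Brewing": (45.547326, -122.675073),
    "Ex Novo Brewing": (45.540049, -122.668583),
}


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (an assumption)."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.hypot(p[0] - centroids[c][0],
                                                   p[1] - centroids[c][1]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return labels, centroids


names = list(breweries)
labels, centroids = kmeans([breweries[n] for n in names], k=2)
for name, label in zip(names, labels):
    print(label, name)
```

Note that k-means does not force clusters to a fixed size, so producing tours of exactly 5 breweries would need an additional step after clustering.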

## Methodology


In [2]:

#importing Libraries

import requests
import pandas as pd
import numpy as np


Import Neighborhood Data from csv

In [3]:

import types
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_ac5e268c01f14815b0ced2e4f0132259 = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='S1dV3RosxfGtPkdB9g77es-RoM-uyQsAoGuPuE5W0wuN',
    ibm_auth_endpoint="https://iam.cloud.ibm.com/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

body = client_ac5e268c01f14815b0ced2e4f0132259.get_object(Bucket='capstonefinalproject-donotdelete-pr-dz2by4svg8todw',Key='Zip_Neighborhood.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_data_1 = pd.read_csv(body)
df_data_1.head()


Unnamed: 0,ZIP Code,Borough,Neighborhood
0,97212,NE,Alameda
1,97217,NoPo,Arbor Lodge
2,97222,,Ardenwald
3,97230,East,Argay
4,97201,NW,Arlington Heights


In [4]:
body = client_ac5e268c01f14815b0ced2e4f0132259.get_object(Bucket='capstonefinalproject-donotdelete-pr-dz2by4svg8todw',Key='Zip_Coordinates.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_data_2 = pd.read_csv(body)
df_data_2.head()

Unnamed: 0,ZIP Code,Latitude,Longitude
0,97201,45.508,-122.69
1,97204,45.518,-122.674
2,97205,45.521,-122.689
3,97209,45.527,-122.685
4,97210,45.53,-122.703


In [5]:
body = client_ac5e268c01f14815b0ced2e4f0132259.get_object(Bucket='capstonefinalproject-donotdelete-pr-dz2by4svg8todw',Key='Zip_Population.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_data_3 = pd.read_csv(body)
df_data_3.head()


Unnamed: 0,ZIP Code,Population
0,97034,18905
1,97035,23912
2,97080,40888
3,97086,26010
4,97201,15484


Join dataframes together to create one dataframe with the Portland Neighborhood data

In [6]:
# join neighborhoods with the latitude/longitude CSV on ZIP Code
df_joined = df_data_1.join(df_data_2.set_index('ZIP Code'), on='ZIP Code')
df_joined.head()

Unnamed: 0,ZIP Code,Borough,Neighborhood,Latitude,Longitude
0,97212,NE,Alameda,45.544,-122.642
1,97217,NoPo,Arbor Lodge,45.574,-122.684
2,97222,,Ardenwald,,
3,97230,East,Argay,45.547,-122.5
4,97201,NW,Arlington Heights,45.508,-122.69


In [7]:
df_master=df_joined.join(df_data_3.set_index('ZIP Code'), on='ZIP Code')
df_master.head()

Unnamed: 0,ZIP Code,Borough,Neighborhood,Latitude,Longitude,Population
0,97212,NE,Alameda,45.544,-122.642,24126.0
1,97217,NoPo,Arbor Lodge,45.574,-122.684,31438.0
2,97222,,Ardenwald,,,34979.0
3,97230,East,Argay,45.547,-122.5,39752.0
4,97201,NW,Arlington Heights,45.508,-122.69,15484.0


In [8]:
df_master.shape

(94, 6)

In [9]:
df_master.dtypes

ZIP Code          int64
Borough          object
Neighborhood     object
Latitude        float64
Longitude       float64
Population      float64
dtype: object

Remove rows with missing data

In [10]:
df_master.dropna(inplace=True)

df_master

Unnamed: 0,ZIP Code,Borough,Neighborhood,Latitude,Longitude,Population
0,97212,NE,Alameda,45.544,-122.642,24126.0
1,97217,NoPo,Arbor Lodge,45.574,-122.684,31438.0
3,97230,East,Argay,45.547,-122.500,39752.0
4,97201,NW,Arlington Heights,45.508,-122.690,15484.0
5,97219,SW,Arnold,45.458,-122.707,38709.0
...,...,...,...,...,...,...
88,97211,NE,Vernon,45.565,-122.645,31254.0
89,97219,SW,West Portland Park,45.458,-122.707,38709.0
90,97230,East,Wilkes,45.547,-122.500,39752.0
92,97211,NE,Woodlawn,45.565,-122.645,31254.0


In [11]:
df_master.shape

(84, 6)

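The current dataframe, df_master, has 84 rows after dropna removed every row with a missing value in any column (for example, a missing Borough or missing Latitude and Longitude). The original dataframe had 94 rows, so 10 rows were dropped.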
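
To see why rows disappear, the library-free sketch below mimics the left join on `ZIP Code` followed by `dropna` for the five rows shown in the earlier outputs. ZIP 97222 has no entry in the coordinates table and no borough, so Ardenwald ends up with missing values and is dropped. The variable names are illustrative only; the notebook itself does this with pandas.

```python
# Rows from the neighborhood table shown above: (ZIP Code, Borough, Neighborhood).
neighborhoods = [
    (97212, "NE", "Alameda"),
    (97217, "NoPo", "Arbor Lodge"),
    (97222, None, "Ardenwald"),  # Borough is missing in the source data
    (97230, "East", "Argay"),
    (97201, "NW", "Arlington Heights"),
]

# Coordinates keyed by ZIP Code; 97222 has no entry, as in the joined output.
coordinates = {
    97212: (45.544, -122.642),
    97217: (45.574, -122.684),
    97230: (45.547, -122.5),
    97201: (45.508, -122.69),
}

# Left join: keep every neighborhood, fill None where the ZIP is unknown.
joined = []
for zip_code, borough, name in neighborhoods:
    lat, lng = coordinates.get(zip_code, (None, None))
    joined.append({"ZIP Code": zip_code, "Borough": borough,
                   "Neighborhood": name, "Latitude": lat, "Longitude": lng})

# dropna equivalent: keep only rows with no missing value in any column.
complete = [row for row in joined if None not in row.values()]
dropped = [row["Neighborhood"] for row in joined if None in row.values()]

print(len(joined), len(complete), dropped)
```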

### Install package to create a map


In [12]:
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means for the clustering stage
from sklearn.cluster import KMeans
!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library

print('Packages Installed')


In [13]:
address = 'Portland, Oregon'

geolocator = Nominatim(user_agent="or_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Portland are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Portland are 45.5202471, -122.6741949.


In [14]:
# create map of Portland Neighborhoods using latitude and longitude values
map_portland = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_master['Latitude'], df_master['Longitude'], df_master['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_portland)  
    
map_portland

### Foursquare API for venue (brewery) data

Foursquare credentials

In [15]:
CLIENT_ID = '3BQOUGP0C03BI2IYK4IEZUQYP2ZYLP4KCUU33OU5IECK2XO2' # your Foursquare ID
CLIENT_SECRET = 'OZDZ0IV1ZX4KPUPBMUH4BNBOS21IQXTJ2KWBTPUJAPHCWZ3J' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
ACCESS_TOKEN = 'N3UXL2GUX4H1SSNXEVWYMV4D5HN1WU23LQRT30BYEOY2HMCH' # your FourSquare Access Token
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)
print('ACCESS_TOKEN:' + ACCESS_TOKEN)

Your credentials:
CLIENT_ID: 3BQOUGP0C03BI2IYK4IEZUQYP2ZYLP4KCUU33OU5IECK2XO2
CLIENT_SECRET:OZDZ0IV1ZX4KPUPBMUH4BNBOS21IQXTJ2KWBTPUJAPHCWZ3J
ACCESS_TOKEN:N3UXL2GUX4H1SSNXEVWYMV4D5HN1WU23LQRT30BYEOY2HMCH


In [16]:
df_master['Neighborhood']

0                Alameda
1            Arbor Lodge
3                  Argay
4      Arlington Heights
5                 Arnold
             ...        
88                Vernon
89    West Portland Park
90                Wilkes
92              Woodlawn
93             Woodstock
Name: Neighborhood, Length: 84, dtype: object

In [17]:
neighborhood_latitude = df_master['Latitude'] # neighborhood latitude value
neighborhood_longitude = df_master['Longitude'] # neighborhood longitude value

neighborhood_name = pd.DataFrame(df_master['Neighborhood']) # neighborhood name

neighborhood_name

Unnamed: 0,Neighborhood
0,Alameda
1,Arbor Lodge
3,Argay
4,Arlington Heights
5,Arnold
...,...
88,Vernon
89,West Portland Park
90,Wilkes
92,Woodlawn


In [18]:
search_query = 'Brewery'
radius = 500
print(search_query + ' .... OK!')

Brewery .... OK!


In [19]:
LIMIT = 50000 # requested maximum number of venues returned by the Foursquare API

radius = 50000 # search radius in meters (overrides the 500 set above)

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        latitude, 
        longitude,
        VERSION, 
        search_query, 
        radius, 
        LIMIT)
url # display URL


'https://api.foursquare.com/v2/venues/explore?client_id=3BQOUGP0C03BI2IYK4IEZUQYP2ZYLP4KCUU33OU5IECK2XO2&client_secret=OZDZ0IV1ZX4KPUPBMUH4BNBOS21IQXTJ2KWBTPUJAPHCWZ3J&ll=45.5202471,-122.6741949&v=20180605&query=Brewery&radius=50000&limit=50000'
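
The URL above is assembled with `str.format`, which leaves escaping of the query values to the caller. As an alternative sketch, the standard library's `urllib.parse.urlencode` can build the same query string and escape values automatically. The credential strings here are placeholders, and the helper name `build_explore_url` is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng, version,
                      query, radius, limit):
    """Return the explore URL with all query values percent-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


example_url = build_explore_url(
    client_id="YOUR_CLIENT_ID",          # placeholder, not a real credential
    client_secret="YOUR_CLIENT_SECRET",  # placeholder, not a real credential
    lat=45.5202471, lng=-122.6741949,
    version="20180605", query="Brewery", radius=50000, limit=50000,
)

# Round-trip the query string to confirm the values survive encoding.
decoded = parse_qs(urlsplit(example_url).query)
print(decoded["ll"], decoded["query"])
```

Passing the same dictionary to `requests.get(url, params=...)` performs this encoding as part of the request.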

In [20]:
##GET request 
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '603436601a62722185e827cc'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Portland',
  'headerFullLocation': 'Portland',
  'headerLocationGranularity': 'city',
  'query': 'brewery',
  'totalResults': 171,
  'suggestedBounds': {'ne': {'lat': 45.97024755000045,
    'lng': -122.0331392384063},
   'sw': {'lat': 45.07024664999955, 'lng': -123.3152505615937}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '48837e4ef964a52036511fe3',
       'name': 'Deschutes Brewery Portland Public House',
       'location': {'address': '210 NW 11th Ave',
        'crossStreet': 'at NW Davis St',
        'lat': 45.524544086316

 Function that extracts the category of the venue

In [21]:
# json_normalize flattens nested JSON into a pandas dataframe
# (available as pd.json_normalize in pandas >= 1.0)
json_normalize = pd.json_normalize

In [22]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Returns a dataframe containing the Breweries and Coordinates in Portland

In [23]:
import json # library to handle JSON files

venues = results['response']['groups'][0]['items']

breweries = pd.json_normalize(venues) # flatten JSON into a dataframe

# keep only the name and coordinate columns
filtered_columns = ['venue.name', 'venue.location.lat', 'venue.location.lng']
breweries = breweries.loc[:, filtered_columns]

# filter the category for each row
#breweries['venue.categories'] = breweries.apply(get_category_type, axis=1)

# clean columns
#breweries.columns = [col.split(".")[-1] for col in breweries.columns]

breweries.head(15)



Unnamed: 0,venue.name,venue.location.lat,venue.location.lng
0,Deschutes Brewery Portland Public House,45.524544,-122.681982
1,Cascade Brewing Barrel House,45.516603,-122.655837
2,Breakside Brewery,45.533924,-122.696465
3,Base Camp Brewing,45.519896,-122.656464
4,Hair of the Dog Brewery & Tasting Room,45.515866,-122.665682
5,Hopworks Urban Brewery,45.496928,-122.634908
6,Gigantic Brewing Company,45.48506,-122.639577
7,Labrewatory,45.540887,-122.673251
8,Ecliptic Brewing,45.547326,-122.675073
9,Ex Novo Brewing,45.540049,-122.668583
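
For readers unfamiliar with `json_normalize`, the sketch below shows the idea on a single item shaped like the Foursquare response above: nested dictionaries are flattened into dotted column names such as `venue.location.lat`. The `flatten` helper is a simplified stand-in written for illustration; it is not how pandas implements the function, and the sample item keeps only a few fields.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into one dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=full_key + "."))
        else:
            flat[full_key] = value
    return flat


# One item in the shape of results['response']['groups'][0]['items'],
# reduced to a few fields from the output above.
item = {
    "venue": {
        "id": "48837e4ef964a52036511fe3",
        "name": "Deschutes Brewery Portland Public House",
        "location": {"address": "210 NW 11th Ave",
                     "lat": 45.524544, "lng": -122.681982},
    }
}

row = flatten(item)
wanted = ["venue.name", "venue.location.lat", "venue.location.lng"]
print({column: row[column] for column in wanted})
```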


In [24]:
print('# of Breweries:', breweries.count())

print('Shape:', breweries.shape)

print('Data Types:', breweries.dtypes)    


# of Breweries: venue.name            100
venue.location.lat    100
venue.location.lng    100
dtype: int64
Shape: (100, 3)
Data Types: venue.name             object
venue.location.lat    float64
venue.location.lng    float64
dtype: object


Rename the column headers to Brewery, Latitude, Longitude

In [25]:
breweries.rename(columns={'venue.name': 'Brewery', 'venue.location.lat': 'Latitude', 'venue.location.lng': 'Longitude'}, inplace=True)
breweries.head()

Unnamed: 0,Brewery,Latitude,Longitude
0,Deschutes Brewery Portland Public House,45.524544,-122.681982
1,Cascade Brewing Barrel House,45.516603,-122.655837
2,Breakside Brewery,45.533924,-122.696465
3,Base Camp Brewing,45.519896,-122.656464
4,Hair of the Dog Brewery & Tasting Room,45.515866,-122.665682


Rename df_master column to Postal Code

In [26]:
df_master.rename(columns={'ZIP Code':'Postal Code'}, inplace=True)
df_master.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Population
0,97212,NE,Alameda,45.544,-122.642,24126.0
1,97217,NoPo,Arbor Lodge,45.574,-122.684,31438.0
3,97230,East,Argay,45.547,-122.5,39752.0
4,97201,NW,Arlington Heights,45.508,-122.69,15484.0
5,97219,SW,Arnold,45.458,-122.707,38709.0


Create a map of breweries in Portland

In [27]:
# create map of Portland Breweries using latitude and longitude values
map_breweries = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(breweries['Latitude'], breweries['Longitude'], breweries['Brewery']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='orange',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_breweries)  
    
map_breweries

 Code below will create a dataframe containing the breweries and coordinates

In [29]:
import json # library to handle JSON files

venues = results['response']['groups'][0]['items']

nearby_venues = pd.json_normalize(venues) # flatten JSON into a dataframe

# filter columns
filtered_columns = ['venue.name', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
#nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



Unnamed: 0,name,lat,lng
0,Deschutes Brewery Portland Public House,45.524544,-122.681982
1,Cascade Brewing Barrel House,45.516603,-122.655837
2,Breakside Brewery,45.533924,-122.696465
3,Base Camp Brewing,45.519896,-122.656464
4,Hair of the Dog Brewery & Tasting Room,45.515866,-122.665682


In [30]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


We will now create a function to get the venues near each neighborhood; the breweries are selected from the result afterwards.

In [31]:
def getNearbyBreweries(names, latitudes, longitudes, radius=5000):
    
    brewery_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        brewery_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_breweries = pd.DataFrame([item for venue_list in brewery_list for item in venue_list])
    nearby_breweries.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_breweries)
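
One fragile spot in the function above is `v['venue']['categories'][0]`, which raises `IndexError` if a venue comes back with an empty category list (the case that `get_category_type` guards against). A hedged sketch of a more defensive row builder follows; the helper name `venue_row` and the choice to store `None` for a missing category are assumptions, not part of the original notebook.

```python
def venue_row(neighborhood, lat, lng, item):
    """Build one output row from a Foursquare 'items' entry."""
    venue = item["venue"]
    categories = venue.get("categories") or []
    # Use the first category name when present, otherwise None.
    category = categories[0]["name"] if categories else None
    return (neighborhood, lat, lng, venue["name"],
            venue["location"]["lat"], venue["location"]["lng"], category)


# Two hand-written items: one with a category, one without.
with_category = {"venue": {"name": "Great Notion Brewing",
                           "location": {"lat": 45.558881, "lng": -122.642771},
                           "categories": [{"name": "Brewery"}]}}
# Hypothetical venue, made up here to exercise the empty-category path.
without_category = {"venue": {"name": "Unnamed venue",
                              "location": {"lat": 45.5, "lng": -122.6},
                              "categories": []}}

rows = [venue_row("Alameda", 45.544, -122.642, item)
        for item in (with_category, without_category)]
print(rows)
```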

In [32]:
# Create new dataframe
portland_breweries = getNearbyBreweries(names=df_master['Neighborhood'],
                                   latitudes=df_master['Latitude'],
                                   longitudes=df_master['Longitude']
                                  )

Alameda
Arbor Lodge
Argay
Arlington Heights
Arnold
Ash
Beaumont
Boise
Brentwood Darlington
Bridgeton
Bridlemile
Brooklyn
Buckman
Centennial
Collins
Concordia
Creston Kenilworth
Cully
Downtown Portland
East Columbia
Eastmoreland
Eliot
Far Southwest
Forest Park
Foster Powell
Cathedral Park
Glenfair
Goose Hollow
Grant Park
Hayden Island
Hayhurst
Hazelwood
Hillsdale 
Hillside
Hollywood
Hosford Abernethy
Humboldt
Irvington
Kenton
Kerns
King
Laurelhurst
Lents
Linnton
Lloyd District
Madison South
Maplewood
Markham 
Marshall Park
Mill Park
Montavilla
Mt Scott Arieta
Mt Tabor
Multnomah
Northwest Heights
Old Town Chinatown
Overlook
Parkrose Heights
Pearl District
Piedmont
Pleasant Valley
Portsmouth
Powellhurst Gilbert
Reed
Richmond
Rose City Park
Roseway
Russell 
Sabin
Sellwood
South Burlingame
South Tabor
Southwest Hills
St Johns
Sullivans
Summer
Sunnyside
Sylvan Highlands
University Park
Vernon
West Portland Park
Wilkes
Woodlawn
Woodstock


In [33]:
print(portland_breweries.shape)
portland_breweries.head()

(8272, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Alameda,45.544,-122.642,Saint Simon Coffee Co.,45.535182,-122.64574,Coffee Shop
1,Alameda,45.544,-122.642,Zama Massage,45.53511,-122.643812,Massage Studio
2,Alameda,45.544,-122.642,Hale Pele,45.535264,-122.637343,Tiki Bar
3,Alameda,45.544,-122.642,Pets on Broadway,45.535045,-122.636965,Pet Store
4,Alameda,45.544,-122.642,Helen Bernhard Bakery,45.535152,-122.648294,Bakery


The request did not filter by category, so the result contains venues of every category. We will need to select only the rows with a venue category of "Brewery".

In [34]:
pdx_breweries=portland_breweries.loc[portland_breweries['Venue Category'] == 'Brewery']
print(pdx_breweries.shape)
pdx_breweries.head()

(299, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
20,Alameda,45.544,-122.642,Great Notion Brewing,45.558881,-122.642771,Brewery
22,Alameda,45.544,-122.642,Culmination Brewing,45.528877,-122.64369,Brewery
39,Alameda,45.544,-122.642,Ex Novo Brewing,45.540049,-122.668583,Brewery
69,Alameda,45.544,-122.642,Ecliptic Brewing,45.547326,-122.675073,Brewery
78,Alameda,45.544,-122.642,StormBreaker Brewing,45.549539,-122.675435,Brewery


The filter returned 299 rows. The earlier Foursquare search returned only 100 brewery venues for all of Portland, so some breweries must appear more than once because they are close to multiple neighborhood coordinates.
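
To make the overlap concrete, the sketch below computes the great-circle (haversine) distance between a neighborhood coordinate and a brewery and compares it with the 5000 m default radius of `getNearbyBreweries`. The haversine formula and the mean Earth radius of 6,371 km are standard; the helper name `haversine_m` is hypothetical, and Foursquare's own radius matching may differ in detail.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Coordinates taken from the tables above.
great_notion = (45.558881, -122.642771)
neighborhoods = {
    "Alameda": (45.544, -122.642),
    "Arbor Lodge": (45.574, -122.684),
}

for name, (lat, lng) in neighborhoods.items():
    distance = haversine_m(lat, lng, *great_notion)
    print(name, round(distance), distance <= 5000)
```

Both neighborhood coordinates fall within the radius of the same brewery, which is how one brewery ends up in several neighborhoods' rows.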

In [35]:
pdx_breweries.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Alameda,7,7,7,7,7,7
Arbor Lodge,7,7,7,7,7,7
Argay,1,1,1,1,1,1
Arlington Heights,4,4,4,4,4,4
Arnold,1,1,1,1,1,1
...,...,...,...,...,...,...
Vernon,5,5,5,5,5,5
West Portland Park,1,1,1,1,1,1
Wilkes,1,1,1,1,1,1
Woodlawn,5,5,5,5,5,5


In [36]:
print('There are {} unique breweries.'.format(len(pdx_breweries['Venue'].unique())))

There are 25 unique breweries.


Our dataframe contains 25 unique breweries within range of the neighborhoods. We will create brewery clusters containing 5 breweries each to determine areas for the brewery bike tour.

### Analyzing the Neighborhoods

In [37]:
# one hot encoding
portland_onehot = pd.get_dummies(pdx_breweries[['Venue']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
portland_onehot['Neighborhood'] = pdx_breweries['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [portland_onehot.columns[-1]] + list(portland_onehot.columns[:-1])
portland_onehot = portland_onehot[fixed_columns]

portland_onehot.head()

Unnamed: 0,Neighborhood,10 Barrel Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery,Cascade Brewing Barrel House,Culmination Brewing,Deschutes Brewery Portland Public House,Double Mountain Brewery & Taproom,Ecliptic Brewing,Ex Novo Brewing,...,Modern Times Belmont Fermentorium,Montavilla Brew Works,Occidental Brewing Company,Old Market Pub & Brewery,Ruse Brewing,Sasquatch Brewery,St Johns Beer Porch,StormBreaker Brewing,StormBreaker St. Johns,Zoiglhaus
20,Alameda,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
22,Alameda,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
39,Alameda,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
69,Alameda,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
78,Alameda,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0


Take the mean frequency and group rows by neighborhood

In [38]:
portland_grouped = portland_onehot.groupby('Neighborhood').mean().reset_index()
portland_grouped

Unnamed: 0,Neighborhood,10 Barrel Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery,Cascade Brewing Barrel House,Culmination Brewing,Deschutes Brewery Portland Public House,Double Mountain Brewery & Taproom,Ecliptic Brewing,Ex Novo Brewing,...,Modern Times Belmont Fermentorium,Montavilla Brew Works,Occidental Brewing Company,Old Market Pub & Brewery,Ruse Brewing,Sasquatch Brewery,St Johns Beer Porch,StormBreaker Brewing,StormBreaker St. Johns,Zoiglhaus
0,Alameda,0.00,0.0,0.142857,0.142857,0.142857,0.00,0.00,0.142857,0.142857,...,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0
1,Arbor Lodge,0.00,0.0,0.285714,0.000000,0.000000,0.00,0.00,0.142857,0.142857,...,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0
2,Argay,0.00,0.0,0.000000,0.000000,0.000000,0.00,0.00,0.000000,0.000000,...,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0
3,Arlington Heights,0.25,0.0,0.250000,0.000000,0.000000,0.25,0.00,0.000000,0.000000,...,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0
4,Arnold,0.00,0.0,0.000000,0.000000,0.000000,0.00,0.00,0.000000,0.000000,...,0.00,0.0,0.0,1.0,0.0,0.0,0.0,0.000000,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
78,Vernon,0.00,0.0,0.200000,0.000000,0.000000,0.00,0.00,0.200000,0.200000,...,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.200000,0.0,0.0
79,West Portland Park,0.00,0.0,0.000000,0.000000,0.000000,0.00,0.00,0.000000,0.000000,...,0.00,0.0,0.0,1.0,0.0,0.0,0.0,0.000000,0.0,0.0
80,Wilkes,0.00,0.0,0.000000,0.000000,0.000000,0.00,0.00,0.000000,0.000000,...,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0
81,Woodlawn,0.00,0.0,0.200000,0.000000,0.000000,0.00,0.00,0.200000,0.200000,...,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.200000,0.0,0.0


In [39]:
portland_grouped.shape

(83, 26)

We now have 83 neighborhoods and 25 different breweries (the 26 columns include the Neighborhood column).
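
The grouped means are easier to interpret with a small worked example. Because every row of the one-hot table has a single 1, the per-neighborhood mean of a brewery column equals that brewery's share of the neighborhood's brewery rows. The library-free sketch below reproduces two values from the table above; the row lists are reconstructed from the displayed frequencies (for example, 0.285714 of 7 rows means two rows named Breakside Brewery), and the variable names are illustrative.

```python
from collections import Counter

# Brewery rows per neighborhood, reconstructed from the outputs above.
rows = {
    "Alameda": [
        "Great Notion Brewing", "Culmination Brewing", "Ex Novo Brewing",
        "Ecliptic Brewing", "StormBreaker Brewing", "Breakside Brewery",
        "Cascade Brewing Barrel House",
    ],
    "Arbor Lodge": [
        "Breakside Brewery", "Breakside Brewery",
        "Great Notion Brewing", "Great Notion Brewing",
        "StormBreaker Brewing", "Ecliptic Brewing", "Ex Novo Brewing",
    ],
}

# Mean of a one-hot column = count of that brewery / rows in the neighborhood.
frequency = {
    hood: {brewery: count / len(venues)
           for brewery, count in Counter(venues).items()}
    for hood, venues in rows.items()
}

print(round(frequency["Alameda"]["Breakside Brewery"], 6))
print(round(frequency["Arbor Lodge"]["Breakside Brewery"], 6))
```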

For each neighborhood, we will list the 5 most frequent breweries.

In [40]:
num_top_breweries = 5

for hood in portland_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = portland_grouped[portland_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_breweries))
    print('\n')

----Alameda----
                          venue  freq
0             Breakside Brewery  0.14
1  Cascade Brewing Barrel House  0.14
2           Culmination Brewing  0.14
3          StormBreaker Brewing  0.14
4              Ecliptic Brewing  0.14


----Arbor Lodge----
                  venue  freq
0     Breakside Brewery  0.29
1  Great Notion Brewing  0.29
2  StormBreaker Brewing  0.14
3      Ecliptic Brewing  0.14
4       Ex Novo Brewing  0.14


----Argay----
                    venue  freq
0              Level Beer   1.0
1       10 Barrel Brewing   0.0
2  StormBreaker St. Johns   0.0
3    StormBreaker Brewing   0.0
4     St Johns Beer Porch   0.0


----Arlington Heights----
                                     venue  freq
0                        10 Barrel Brewing  0.25
1                        Breakside Brewery  0.25
2  Deschutes Brewery Portland Public House  0.25
3        Modern Times Belmont Fermentorium  0.25
4         Little Beast Brewing Beer Garden  0.00


----Arnold----
       

                          venue  freq
0             Breakside Brewery  0.14
1  Cascade Brewing Barrel House  0.14
2           Culmination Brewing  0.14
3          StormBreaker Brewing  0.14
4              Ecliptic Brewing  0.14


----Hosford Abernethy----
                               venue  freq
0             Hopworks Urban Brewery   0.2
1   Little Beast Brewing Beer Garden   0.2
2  Double Mountain Brewery & Taproom   0.2
3                       Ruse Brewing   0.2
4           Gigantic Brewing Company   0.2


----Humboldt----
                  venue  freq
0     Breakside Brewery  0.29
1  Great Notion Brewing  0.29
2  StormBreaker Brewing  0.14
3      Ecliptic Brewing  0.14
4       Ex Novo Brewing  0.14


----Irvington----
                          venue  freq
0             Breakside Brewery  0.14
1  Cascade Brewing Barrel House  0.14
2           Culmination Brewing  0.14
3          StormBreaker Brewing  0.14
4              Ecliptic Brewing  0.14


----Kenton----
                  venu

                               venue  freq
0             Hopworks Urban Brewery   0.2
1   Little Beast Brewing Beer Garden   0.2
2       Cascade Brewing Barrel House   0.2
3                Culmination Brewing   0.2
4  Modern Times Belmont Fermentorium   0.2


----Sylvan Highlands----
                    venue  freq
0       Sasquatch Brewery   1.0
1       10 Barrel Brewing   0.0
2              Level Beer   0.0
3  StormBreaker St. Johns   0.0
4    StormBreaker Brewing   0.0


----University Park----
                        venue  freq
0      StormBreaker St. Johns  0.33
1         St Johns Beer Porch  0.33
2  Occidental Brewing Company  0.33
3           10 Barrel Brewing  0.00
4                  Level Beer  0.00


----Vernon----
                  venue  freq
0     Breakside Brewery   0.2
1  StormBreaker Brewing   0.2
2      Ecliptic Brewing   0.2
3       Ex Novo Brewing   0.2
4  Great Notion Brewing   0.2


----West Portland Park----
                      venue  freq
0  Old Market Pub & B

Now we will create a dataframe that lists the five most common breweries for each neighborhood.

In [41]:
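def return_most_common_breweries(row, num_top_breweries):
    """Return the names of the num_top_breweries highest-frequency breweries in a row."""
    # skip the first entry, which holds the neighborhood name
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_breweries]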

In [42]:
num_top_breweries = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top breweries
columns = ['Neighborhood']
for ind in np.arange(num_top_breweries):
    try:
        columns.append('{}{} Most Common Brewery'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions 4 and up take the 'th' suffix
        columns.append('{}th Most Common Brewery'.format(ind+1))

# create a new dataframe
neighborhoods_breweries_sorted = pd.DataFrame(columns=columns)
neighborhoods_breweries_sorted['Neighborhood'] = portland_grouped['Neighborhood']

for ind in np.arange(portland_grouped.shape[0]):
    neighborhoods_breweries_sorted.iloc[ind, 1:] = return_most_common_breweries(portland_grouped.iloc[ind, :], num_top_breweries)

neighborhoods_breweries_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Brewery,2nd Most Common Brewery,3rd Most Common Brewery,4th Most Common Brewery,5th Most Common Brewery
0,Alameda,Great Notion Brewing,StormBreaker Brewing,Breakside Brewery,Cascade Brewing Barrel House,Culmination Brewing
1,Arbor Lodge,Great Notion Brewing,Breakside Brewery,StormBreaker Brewing,Ecliptic Brewing,Ex Novo Brewing
2,Argay,Level Beer,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery
3,Arlington Heights,10 Barrel Brewing,Breakside Brewery,Deschutes Brewery Portland Public House,Modern Times Belmont Fermentorium,Great Notion Brewing
4,Arnold,Old Market Pub & Brewery,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery
5,Ash,Old Market Pub & Brewery,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery
6,Beaumont,Great Notion Brewing,StormBreaker Brewing,Breakside Brewery,Cascade Brewing Barrel House,Culmination Brewing
7,Boise,Great Notion Brewing,Breakside Brewery,StormBreaker Brewing,Ecliptic Brewing,Ex Novo Brewing
8,Brentwood Darlington,Hopworks Urban Brewery,Double Mountain Brewery & Taproom,Little Beast Brewing Beer Garden,Gigantic Brewing Company,Great Notion Brewing
9,Bridgeton,Great Notion Brewing,Breakside Brewery,StormBreaker Brewing,Ecliptic Brewing,Ex Novo Brewing
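
The ranking step above can be sketched without pandas on a single toy row. This is a minimal illustration, not the notebook's implementation: the helper names `ordinal` and `top_breweries` and the frequencies in `toy_row` are hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def top_breweries(freqs, n):
    """Return the n brewery names with the highest frequency, highest first."""
    # sorted() is stable, so tied breweries keep the dict's insertion order
    ranked = sorted(freqs, key=freqs.get, reverse=True)
    return ranked[:n]


# Hypothetical frequencies for one neighborhood (illustrative values only)
toy_row = {
    "Brewery A": 0.1,
    "Brewery B": 0.4,
    "Brewery C": 0.0,
    "Brewery D": 0.3,
    "Brewery E": 0.2,
}

labels = [f"{ordinal(i + 1)} Most Common Brewery" for i in range(3)]
ranking = dict(zip(labels, top_breweries(toy_row, 3)))
print(ranking)
```

Ties matter when reading the table. In the printed frequencies, Argay has one brewery at 1.0 and the rest at 0.0, so its 2nd through 5th entries are zero-frequency ties and their order carries no information.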


### Cluster the Neighborhoods

K-means clustering groups neighborhoods with similar brewery frequency profiles.

In [46]:
# set number of clusters
kclusters = 5

portland_grouped_clustering = portland_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
means = KMeans(n_clusters=kclusters, random_state=0).fit(portland_grouped_clustering)

# check cluster labels generated for each row in the dataframe
means.labels_[0:10] 

array([1, 1, 4, 1, 2, 2, 1, 1, 0, 1], dtype=int32)
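
To show what the clustering step does to the frequency vectors, here is a plain-Python sketch of the core k-means loop (Lloyd's algorithm). It is only an illustration under simplifying assumptions: scikit-learn's `KMeans` uses smarter initialization and convergence checks, and the function name `kmeans` and the `toy` vectors below are hypothetical.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain-Python Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k rows as starting centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: each point joins its nearest centroid
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical brewery-frequency vectors for four neighborhoods
toy = [
    (0.9, 0.1, 0.0),
    (0.8, 0.2, 0.0),
    (0.0, 0.1, 0.9),
    (0.0, 0.3, 0.7),
]

labels, centroids = kmeans(toy, k=2)
print(labels)
```

The label numbers themselves are arbitrary. What matters is which rows share a label, so neighborhoods with near-identical frequency profiles end up in the same cluster.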

New Dataframe with breweries and clusters

In [54]:
# start from the master dataframe, which already carries coordinates, population, and cluster labels

portland_merged = df_master

# join the ranked brewery columns onto each neighborhood row
portland_merged = portland_merged.join(neighborhoods_breweries_sorted.set_index('Neighborhood'), on='Neighborhood')

portland_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Population,Cluster Labels,1st Most Common Brewery,2nd Most Common Brewery,3rd Most Common Brewery,4th Most Common Brewery,5th Most Common Brewery
0,97212,NE,Alameda,45.544,-122.642,24126.0,1.0,Great Notion Brewing,StormBreaker Brewing,Breakside Brewery,Cascade Brewing Barrel House,Culmination Brewing
1,97217,NoPo,Arbor Lodge,45.574,-122.684,31438.0,1.0,Great Notion Brewing,Breakside Brewery,StormBreaker Brewing,Ecliptic Brewing,Ex Novo Brewing
3,97230,East,Argay,45.547,-122.5,39752.0,4.0,Level Beer,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery
4,97201,NW,Arlington Heights,45.508,-122.69,15484.0,1.0,10 Barrel Brewing,Breakside Brewery,Deschutes Brewery Portland Public House,Modern Times Belmont Fermentorium,Great Notion Brewing
5,97219,SW,Arnold,45.458,-122.707,38709.0,2.0,Old Market Pub & Brewery,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery


Visualize the clusters

In [100]:

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster
# (assumes every row has a cluster label; labels are stored as floats, so cast to int)
for lat, lon, poi, cluster in zip(portland_merged['Latitude'], portland_merged['Longitude'], portland_merged['Neighborhood'], portland_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
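
One way to give each cluster its own marker color is to build a palette indexed by cluster label. The sketch below uses only the standard library rather than matplotlib's `cm.rainbow`; the function name `cluster_palette` and the saturation and brightness values are assumptions chosen for illustration.

```python
import colorsys


def cluster_palette(k):
    """Return k evenly spaced hues as '#rrggbb' strings, one per cluster label."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return palette


palette = cluster_palette(5)

# Cluster labels in the merged table are floats (e.g. 1.0), so cast before indexing
label = 1.0
print(palette[int(label)])
```

Each hex string can then be passed as the `color` and `fill_color` of a marker so that neighborhoods in the same cluster share a color on the map.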

### **Examine Each Cluster**

Cluster 1

In [110]:
cluster_1=portland_merged.loc[portland_merged['Cluster Labels'] == 0, portland_merged.columns[[1,2,3,4] + list(range(5, portland_merged.shape[1]))]]
print(cluster_1.shape)
cluster_1

(32, 11)


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Population,Cluster Labels,1st Most Common Brewery,2nd Most Common Brewery,3rd Most Common Brewery,4th Most Common Brewery,5th Most Common Brewery
9,SE,Brentwood Darlington,45.484,-122.597,47596.0,0.0,Hopworks Urban Brewery,Double Mountain Brewery & Taproom,Little Beast Brewing Beer Garden,Gigantic Brewing Company,Great Notion Brewing
11,SW,Bridlemile,45.492,-122.727,11630.0,0.0,Sasquatch Brewery,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery
12,SE,Brooklyn,45.484,-122.637,38762.0,0.0,Hopworks Urban Brewery,Ruse Brewing,Double Mountain Brewery & Taproom,Little Beast Brewing Beer Garden,Gigantic Brewing Company
13,SE,Buckman,45.514,-122.636,23813.0,0.0,Hopworks Urban Brewery,Cascade Brewing Barrel House,Culmination Brewing,Modern Times Belmont Fermentorium,Little Beast Brewing Beer Garden
19,SE,Creston Kenilworth,45.484,-122.637,38762.0,0.0,Hopworks Urban Brewery,Ruse Brewing,Double Mountain Brewery & Taproom,Little Beast Brewing Beer Garden,Gigantic Brewing Company
24,SE,Eastmoreland,45.484,-122.637,38762.0,0.0,Hopworks Urban Brewery,Ruse Brewing,Double Mountain Brewery & Taproom,Little Beast Brewing Beer Garden,Gigantic Brewing Company
27,NW,Forest Park,45.548,-122.828,58217.0,0.0,Great Notion Beaverton,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery
28,SE,Foster Powell,45.484,-122.597,47596.0,0.0,Hopworks Urban Brewery,Double Mountain Brewery & Taproom,Little Beast Brewing Beer Garden,Gigantic Brewing Company,Great Notion Brewing
29,NoPo,Cathedral Park,45.589,-122.735,31042.0,0.0,St Johns Beer Porch,Occidental Brewing Company,StormBreaker St. Johns,Zoiglhaus,Great Notion Beaverton
31,NW,Goose Hollow,45.518,-122.674,1036.0,0.0,Cascade Brewing Barrel House,Deschutes Brewery Portland Public House,Modern Times Belmont Fermentorium,Zoiglhaus,Great Notion Brewing


Cluster 2

In [111]:
cluster_2=portland_merged.loc[portland_merged['Cluster Labels'] == 1, portland_merged.columns[[1,2,3,4] + list(range(5, portland_merged.shape[1]))]]
print(cluster_2.shape)
cluster_2


(28, 11)


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Population,Cluster Labels,1st Most Common Brewery,2nd Most Common Brewery,3rd Most Common Brewery,4th Most Common Brewery,5th Most Common Brewery
0,NE,Alameda,45.544,-122.642,24126.0,1.0,Great Notion Brewing,StormBreaker Brewing,Breakside Brewery,Cascade Brewing Barrel House,Culmination Brewing
1,NoPo,Arbor Lodge,45.574,-122.684,31438.0,1.0,Great Notion Brewing,Breakside Brewery,StormBreaker Brewing,Ecliptic Brewing,Ex Novo Brewing
4,NW,Arlington Heights,45.508,-122.69,15484.0,1.0,10 Barrel Brewing,Breakside Brewery,Deschutes Brewery Portland Public House,Modern Times Belmont Fermentorium,Great Notion Brewing
7,NE,Beaumont,45.544,-122.642,24126.0,1.0,Great Notion Brewing,StormBreaker Brewing,Breakside Brewery,Cascade Brewing Barrel House,Culmination Brewing
8,NoPo,Boise,45.55,-122.674,3847.0,1.0,Great Notion Brewing,Breakside Brewery,StormBreaker Brewing,Ecliptic Brewing,Ex Novo Brewing
10,NoPo,Bridgeton,45.574,-122.684,31438.0,1.0,Great Notion Brewing,Breakside Brewery,StormBreaker Brewing,Ecliptic Brewing,Ex Novo Brewing
17,NE,Concordia,45.565,-122.645,31254.0,1.0,Great Notion Brewing,StormBreaker Brewing,Breakside Brewery,Ecliptic Brewing,Ex Novo Brewing
22,PDX,Downtown Portland,45.521,-122.689,7688.0,1.0,Breakside Brewery,Deschutes Brewery Portland Public House,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod
23,NoPo,East Columbia,45.565,-122.645,31254.0,1.0,Great Notion Brewing,StormBreaker Brewing,Breakside Brewery,Ecliptic Brewing,Ex Novo Brewing
25,NoPo,Eliot,45.544,-122.642,24126.0,1.0,Great Notion Brewing,StormBreaker Brewing,Breakside Brewery,Cascade Brewing Barrel House,Culmination Brewing


Cluster 3

In [112]:
cluster_3=portland_merged.loc[portland_merged['Cluster Labels'] == 2, portland_merged.columns[[1,2,3,4] + list(range(5, portland_merged.shape[1]))]]
print(cluster_3.shape)
cluster_3

(10, 11)


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Population,Cluster Labels,1st Most Common Brewery,2nd Most Common Brewery,3rd Most Common Brewery,4th Most Common Brewery,5th Most Common Brewery
5,SW,Arnold,45.458,-122.707,38709.0,2.0,Old Market Pub & Brewery,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery
6,SW,Ash,45.458,-122.707,38709.0,2.0,Old Market Pub & Brewery,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery
16,SW,Collins,45.458,-122.707,38709.0,2.0,Old Market Pub & Brewery,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery
26,SW,Far Southwest,45.458,-122.707,38709.0,2.0,Old Market Pub & Brewery,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery
52,SW,Maplewood,45.458,-122.707,38709.0,2.0,Old Market Pub & Brewery,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery
53,SW,Markham,45.458,-122.707,38709.0,2.0,Old Market Pub & Brewery,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery
54,SW,Marshall Park,45.458,-122.707,38709.0,2.0,Old Market Pub & Brewery,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery
59,SW,Multnomah,45.458,-122.707,38709.0,2.0,Old Market Pub & Brewery,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery
79,SW,South Burlingame,45.458,-122.707,38709.0,2.0,Old Market Pub & Brewery,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery
89,SW,West Portland Park,45.458,-122.707,38709.0,2.0,Old Market Pub & Brewery,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery


Cluster 4

In [113]:
cluster_4=portland_merged.loc[portland_merged['Cluster Labels'] == 3, portland_merged.columns[[1,2,3,4] + list(range(5, portland_merged.shape[1]))]]
print(cluster_4.shape)
cluster_4

(9, 11)


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Population,Cluster Labels,1st Most Common Brewery,2nd Most Common Brewery,3rd Most Common Brewery,4th Most Common Brewery,5th Most Common Brewery
21,NE,Cully,45.56,-122.6,14561.0,3.0,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Zoiglhaus,Breakside Brewery,Cascade Brewing Barrel House
35,East,Hazelwood,45.514,-122.557,15594.0,3.0,Baerlic Brewing Beer Hall at the Barley Pod,Montavilla Brew Works,Zoiglhaus,Great Notion Brewing,Breakside Brewery
51,NE,Madison South,45.541,-122.557,28495.0,3.0,Baerlic Brewing Beer Hall at the Barley Pod,Montavilla Brew Works,Level Beer,Zoiglhaus,Great Notion Beaverton
55,East,Mill Park,45.514,-122.557,15594.0,3.0,Baerlic Brewing Beer Hall at the Barley Pod,Montavilla Brew Works,Zoiglhaus,Great Notion Brewing,Breakside Brewery
56,SE,Montavilla,45.537,-122.599,29219.0,3.0,Baerlic Brewing Beer Hall at the Barley Pod,Zoiglhaus,Great Notion Brewing,Breakside Brewery,Cascade Brewing Barrel House
65,East,Parkrose Heights,45.541,-122.557,28495.0,3.0,Baerlic Brewing Beer Hall at the Barley Pod,Montavilla Brew Works,Level Beer,Zoiglhaus,Great Notion Beaverton
74,NE,Rose City Park,45.537,-122.599,29219.0,3.0,Baerlic Brewing Beer Hall at the Barley Pod,Zoiglhaus,Great Notion Brewing,Breakside Brewery,Cascade Brewing Barrel House
75,NE,Roseway,45.537,-122.599,29219.0,3.0,Baerlic Brewing Beer Hall at the Barley Pod,Zoiglhaus,Great Notion Brewing,Breakside Brewery,Cascade Brewing Barrel House
84,NE,Summer,45.541,-122.557,28495.0,3.0,Baerlic Brewing Beer Hall at the Barley Pod,Montavilla Brew Works,Level Beer,Zoiglhaus,Great Notion Beaverton


Cluster 5

In [114]:
cluster_5=portland_merged.loc[portland_merged['Cluster Labels'] == 4, portland_merged.columns[[1,2,3,4] + list(range(5, portland_merged.shape[1]))]]
print(cluster_5.shape)
cluster_5

(4, 11)


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Population,Cluster Labels,1st Most Common Brewery,2nd Most Common Brewery,3rd Most Common Brewery,4th Most Common Brewery,5th Most Common Brewery
3,East,Argay,45.547,-122.5,39752.0,4.0,Level Beer,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery
30,East,Glenfair,45.547,-122.5,39752.0,4.0,Level Beer,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery
76,East,Russell,45.547,-122.5,39752.0,4.0,Level Beer,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery
90,East,Wilkes,45.547,-122.5,39752.0,4.0,Level Beer,Zoiglhaus,Great Notion Brewing,Baerlic Brewing Beer Hall at the Barley Pod,Breakside Brewery
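
The five cells above apply the same filter with a different label each time. As a sketch of the same idea in one pass, the rows can be grouped by label in a loop; in pandas the equivalent would be a `groupby('Cluster Labels')`. The `rows` list below copies a few neighborhood and label pairs from the tables above, and the variable names are illustrative.

```python
from collections import defaultdict

# A few (neighborhood, cluster label) pairs copied from the tables above
rows = [
    ("Alameda", 1.0),
    ("Arbor Lodge", 1.0),
    ("Argay", 4.0),
    ("Arnold", 2.0),
    ("Brentwood Darlington", 0.0),
    ("Cully", 3.0),
]

by_cluster = defaultdict(list)
for neighborhood, label in rows:
    by_cluster[int(label)].append(neighborhood)

# "Cluster N" in the headings corresponds to label N - 1
for label in sorted(by_cluster):
    print(f"Cluster {label + 1}: {by_cluster[label]}")
```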


## Results and Discussion


Portland, Oregon is a city with 5 boroughs, 93 neighborhoods, and over 50 craft breweries (84 in the wider metro area). After removing entries with missing latitude or longitude values, we were able to analyze 83 neighborhoods containing 26 breweries.  

I used the k-means algorithm with k set to 5 to group the neighborhoods into clusters suitable for operating a brewery bike tour. Looking at the brewery map, you can see that some pockets of the city have more breweries than others. The algorithm returned 5 clusters, each containing a different set of neighborhoods, and the top 5 breweries were then ranked for each neighborhood. 

Cluster 1 returned 32 neighborhoods, Cluster 2 returned 28, Cluster 3 returned 10, Cluster 4 returned 9, and Cluster 5 returned 4. Judged on neighborhood count alone, Cluster 1 or Cluster 2 would be the preferred choice for opening a brewery bike tour. The results were separated by neighborhood, and the breweries were ranked by their proximity to each other and by how frequently each one appears in the results for that neighborhood. So, each row in a cluster dataframe gives us the stops on the brewery tour depending on which neighborhood we are starting in. 
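
As a quick arithmetic check on the cluster sizes reported above, the counts can be summed and turned into shares of the analyzed neighborhoods. The variable names are illustrative; the counts are the ones stated in the text.

```python
# Neighborhood counts per cluster, as reported above
cluster_sizes = {1: 32, 2: 28, 3: 10, 4: 9, 5: 4}

total = sum(cluster_sizes.values())
shares = {c: round(100 * n / total, 1) for c, n in cluster_sizes.items()}

print(total)   # 83, matching the number of neighborhoods analyzed
print(shares)  # percentage of neighborhoods in each cluster
```

Clusters 1 and 2 together hold 60 of the 83 neighborhoods, which is the basis for preferring them on count alone.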



## Conclusion 

The purpose of this study was to determine where in Portland, Oregon would be the best spot to open a brewery bike tour. The study returned promising results, with one cluster containing 32 neighborhoods, or 32 different brewery tour options. 

Why stop in Portland? The same analysis can be applied to any city in the U.S. to determine the best spot to operate a brewery bike tour. With the popularity of craft beer throughout the country, it would be advantageous to open brewery bike tours in multiple cities. The next steps would be to compare data city to city and neighborhood to neighborhood to build a business plan for expanding the brewery bike tour nationally.
