# Introduction

## Problem

About a quarter million people are moving to Colorado every year. Many of these people are looking to set up new business and may not know what city they would want to set up their business. Specifically where in each city. This is important so people who are starting up their business are in a great location that will make them successful.

### Data

The data I will be using are zip codes of all the cities in colorado with a population count. I will be joining these with other zip code data sets that are specific to cities. I will be using this data to find optimal cities to build a specific business in and what areas in those cities are best. I will be using foursquare to find information on each cities and see what kind of business are in each zip codes in order to provide feedback on what businesses would be ideal in each area. Some of the features we will get from this data are the different types of business in an area and how many of them there are in that area. This can be a clear indicator of what business would be best in what areas.

In [105]:
# importing packages
import pandas as pd
import numpy as np
from geopy.geocoders import Nominatim 
import requests 
from pandas.io.json import json_normalize 
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium 
import seaborn as sns
from geopy.geocoders import Nominatim
import sklearn
from sklearn.neighbors import NearestNeighbors

from sklearn.linear_model import LogisticRegression

In [2]:
# Importing data

df_nb = pd.read_csv ('/Users/jairgalindoflores/Documents/GitHub/IBM-Capstone/De\
nver_neighborhoods_data.csv', index_col=0) 

df_zip = pd.read_csv ('/Users/jairgalindoflores/Documents/GitHub/IBM-Capstone/u\
szips.csv')

In [43]:
# joining data

df_merged= pd.merge(df_nb, df_zip, on='zip')

df = df_merged[['Neighborhood', 'zip', 'lat', 'lng']]
df.head(10)

(78, 4)

## Exploratory Data Analysis

#### Understanding data

In [4]:
# getting the shape of the dataframe
print('The data frame has', df.shape[0], 'neighborhoods and a total of',
      df.shape[1], 'columns.')

The data frame has 78 neighborhoods and a total of 4 columns.


In [5]:
# data types of the columns 
df.dtypes 

Neighborhood     object
zip               int64
lat             float64
lng             float64
dtype: object

In [6]:
df.nunique(axis=0)


Neighborhood    78
zip             32
lat             32
lng             32
dtype: int64

What we can see from this function is that a lot of the neighborhoods are very close to each other. What we might have to do is remove or merge together some of the neighborhoods that have data that are the same.

In [7]:
# get the coordinatesg for Denver

geolocator = Nominatim(user_agent='co_explorer')
location = geolocator.geocode('Denver, Colorado')
lat = location.latitude
lng = location.longitude
print('The geographical coordinate of Denver is {} {}.'.format(
    location.latitude, location.longitude))

The geographical coordinate of Denver is 39.7392364 -104.9848623.


In [8]:

# create map of Toronto using latitude and longitude values
denver_map = folium.Map(location=[lat, lng], zoom_start=10)
# add markers to map
for lat, lng, neighborhood in zip(df['lat'], df['lng'], df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(denver_map)  
    
denver_map


In [9]:
# Define Foursquare Credentials and Version
CLIENT_ID = '0BKDKQNCTVRN2LTH0WDXJ4FDSHNE3KMHCMZWK5HM3GXTD5KU'
CLIENT_SECRET = '53QE3SDM52XLBB3EUB4PSEFVRUYNSOXQP3XCNJ0FV3KLEQK4'
VERSION = '20180605'
LIMIT = 100

In [10]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [11]:
# function to apply to individual neighborhoods in toronto
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [12]:
denver_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['lat'],
                                   longitudes=df['lng']
                                  )

Athmar Park 
Overland 
Valverde 
Auraria 
Barnum 
Lincoln Park 
Platt Park 
Sun Valley 
Villa Park 
West Colfax 
Baker 
Belcaro 
Cherry Creek 
Washington Park 
Washington Park West 
Barnum West 
Harvey Park 
Mar Lee 
Westwood 
Bear Valley 
Berkeley 
Highland 
Regis 
Sloan Lake 
Capitol Hill 
Speer 
North Capitol Hill 
City Park West 
Country Club 
CBD (downtown) 
Civic Center 
Union Station (Lodo) 
Central Park 
Chaffee Park 
Ruby Hill 
Cheesman Park 
Congress Park 
City Park 
Clayton 
Cole 
Skyland 
Whittier 
College View / South Platte 
Harvey Park South 
Cory-Merrill  
Rosedale 
University 
University Park 
Wellshire 
DIA 
Gateway / Green Valley Ranch 
East Colfax 
Hale 
Montclair 
South Park Hill
Elyria-Swansea 
Globeville 
Five Points 
Fort Logan 
Goldsmith 
University Hills 
Hampden 
Southmoor Park 
Hampden South 
Hilltop 
Virginia Village 
Washington Virginia Vale 
Indian Creek 
Jefferson Park 
Sunnyside 
West Highland 
Kennedy 
Lowry Field 
Marston 
Montbello 
Northeast Park Hi

In [13]:
denver_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Athmar Park,39.69617,-105.00186,Chain Reaction Brewery,39.699577,-105.001335,Brewery
1,Athmar Park,39.69617,-105.00186,Rome's Saloon,39.696839,-104.996446,Sports Bar
2,Athmar Park,39.69617,-105.00186,National Barricade Co Of Denver Inc,39.695881,-105.000135,Clothing Store
3,Athmar Park,39.69617,-105.00186,Kroger Central Fill,39.694432,-105.001271,Pharmacy
4,Athmar Park,39.69617,-105.00186,Tony's Handy Man Services,39.696418,-105.004287,Home Service
...,...,...,...,...,...,...,...
1351,Windsor,39.69715,-104.88179,Breakers Cafe,39.699197,-104.883001,Bar
1352,Windsor,39.69715,-104.88179,Sunfish Lake Trail,39.699125,-104.879405,Lake
1353,Windsor,39.69715,-104.88179,Windsor Lake,39.699067,-104.879317,Lake
1354,Windsor,39.69715,-104.88179,The Pool,39.694125,-104.883602,Pool


In [14]:
denver_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Athmar Park,10,10,10,10,10,10
Auraria,14,14,14,14,14,14
Baker,3,3,3,3,3,3
Barnum West,3,3,3,3,3,3
Barnum,14,14,14,14,14,14
...,...,...,...,...,...,...
West Colfax,14,14,14,14,14,14
West Highland,22,22,22,22,22,22
Westwood,3,3,3,3,3,3
Whittier,9,9,9,9,9,9


In [15]:
print('There are {} uniques categories.'.format(len(denver_venues['Venue Category'].unique())))

There are 162 uniques categories.


In [32]:
# one hot encoding
denver_onehot = pd.get_dummies(denver_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
denver_onehot['Neighborhood'] = denver_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [denver_onehot.columns[0]] + list(denver_onehot.columns[:-1])
denver_onehot = denver_onehot[fixed_columns]

denver_onehot.head()

Unnamed: 0,Acupuncturist,Acupuncturist.1,Alternative Healer,American Restaurant,Antique Shop,Arcade,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,...,Track,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [17]:
denver_grouped = denver_onehot.groupby('Neighborhood').mean().reset_index()
denver_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,Acupuncturist,Alternative Healer,American Restaurant,Antique Shop,Arcade,Art Gallery,Arts & Crafts Store,Arts & Entertainment,...,Track,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar
0,Athmar Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Auraria,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Baker,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Barnum West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Barnum,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [18]:
num_top_venues = 5

# function with neghbor goods top 5 most common venues
for hood in denver_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = denver_grouped[denver_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Athmar Park ----
                        venue  freq
0           Convenience Store   0.2
1                  Sports Bar   0.1
2                        Park   0.1
3                     Brewery   0.1
4  Construction & Landscaping   0.1


----Auraria ----
        venue  freq
0  Print Shop  0.07
1    Pharmacy  0.07
2        Café  0.07
3  Taco Place  0.07
4         Gym  0.07


----Baker ----
                           venue  freq
0                    Pizza Place  0.33
1                    Flower Shop  0.33
2                 Scenic Lookout  0.33
3                    Yoga Studio  0.00
4  Paper / Office Supplies Store  0.00


----Barnum West ----
                           venue  freq
0                           Park  0.33
1                      Disc Golf  0.33
2                      Locksmith  0.33
3  Paper / Office Supplies Store  0.00
4                      Nightclub  0.00


----Barnum ----
        venue  freq
0  Print Shop  0.07
1    Pharmacy  0.07
2        Café  0.07
3  Taco Place  0.0

In [19]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [42]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = denver_grouped['Neighborhood']

for ind in np.arange(denver_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(denver_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Athmar Park,Convenience Store,Gym,Pharmacy,Clothing Store,Park,Brewery,Construction & Landscaping,Sports Bar,Home Service,Disc Golf
1,Auraria,Convenience Store,Distillery,Mexican Restaurant,Spa,Pharmacy,Baseball Field,Taco Place,Café,Light Rail Station,Discount Store
2,Baker,Pizza Place,Scenic Lookout,Flower Shop,Wine Bar,Distillery,Fast Food Restaurant,Eye Doctor,Event Space,Dog Run,Doctor's Office
3,Barnum West,Locksmith,Disc Golf,Park,Wine Bar,Dive Bar,Flea Market,Fast Food Restaurant,Eye Doctor,Event Space,Dog Run
4,Barnum,Convenience Store,Distillery,Mexican Restaurant,Spa,Pharmacy,Baseball Field,Taco Place,Café,Light Rail Station,Discount Store
...,...,...,...,...,...,...,...,...,...,...,...
72,West Colfax,Convenience Store,Distillery,Mexican Restaurant,Spa,Pharmacy,Baseball Field,Taco Place,Café,Light Rail Station,Discount Store
73,West Highland,Marijuana Dispensary,Convenience Store,Chinese Restaurant,Breakfast Spot,Bed & Breakfast,Gym / Fitness Center,Grocery Store,Liquor Store,Mexican Restaurant,Event Space
74,Westwood,Locksmith,Disc Golf,Park,Wine Bar,Dive Bar,Flea Market,Fast Food Restaurant,Eye Doctor,Event Space,Dog Run
75,Whittier,Park,Food,French Restaurant,Rental Car Location,Trail,Bar,Dog Run,Grocery Store,Dive Bar,Fast Food Restaurant


## Clustering Neighborhoods

In [21]:
# set number of clusters
kclusters = 5

denver_grouped_clustering = denver_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(
    denver_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 1, 4, 2, 1, 0, 4, 1, 1, 1, 1, 1, 2, 4, 1, 0, 1, 0, 0, 1, 2, 1,
       1, 3, 1, 0, 1, 0, 3, 0, 1, 1, 0, 1, 1, 2, 1, 0, 1, 1, 1, 1, 2, 1,
       2, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 0, 4, 4, 0, 1, 1, 1, 2, 0, 1], dtype=int32)

In [50]:
denver_merged = df

# merge manhattan_grouped with manhattan_data to add latitude/longitude for each neighborhood
denver_merged = denver_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

denver_merged.head() # check the last columns!

denver_merged.dropna(inplace=True)

denver_merged["Cluster Labels"] = kmeans.labels_

denver_merged.drop(['lat', 'lng'], inplace=True, axis = 1)

denver_merged.head() # check the last columns!



Unnamed: 0,Neighborhood,zip,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,Athmar Park,80223,Convenience Store,Gym,Pharmacy,Clothing Store,Park,Brewery,Construction & Landscaping,Sports Bar,Home Service,Disc Golf,1
1,Overland,80223,Convenience Store,Gym,Pharmacy,Clothing Store,Park,Brewery,Construction & Landscaping,Sports Bar,Home Service,Disc Golf,1
2,Valverde,80223,Convenience Store,Gym,Pharmacy,Clothing Store,Park,Brewery,Construction & Landscaping,Sports Bar,Home Service,Disc Golf,4
3,Auraria,80204,Convenience Store,Distillery,Mexican Restaurant,Spa,Pharmacy,Baseball Field,Taco Place,Café,Light Rail Station,Discount Store,2
4,Barnum,80204,Convenience Store,Distillery,Mexican Restaurant,Spa,Pharmacy,Baseball Field,Taco Place,Café,Light Rail Station,Discount Store,1


In [23]:
map_clusters = folium.Map(location=[lat, lng], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(denver_merged['lat'], 
                                  denver_merged['lng'], 
                                  denver_merged['Neighborhood'], 
                                  denver_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [24]:
denver_merged.loc[denver_merged['Cluster Labels'] == 0, denver_merged.columns[[0] + list(range(4, denver_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
5,Lincoln Park,Convenience Store,Distillery,Mexican Restaurant,Spa,Pharmacy,Baseball Field,Taco Place,Café,Light Rail Station,Discount Store,0
15,Barnum West,Locksmith,Disc Golf,Park,Wine Bar,Dive Bar,Flea Market,Fast Food Restaurant,Eye Doctor,Event Space,Dog Run,0
17,Mar Lee,Locksmith,Disc Golf,Park,Wine Bar,Dive Bar,Flea Market,Fast Food Restaurant,Eye Doctor,Event Space,Dog Run,0
18,Westwood,Locksmith,Disc Golf,Park,Wine Bar,Dive Bar,Flea Market,Fast Food Restaurant,Eye Doctor,Event Space,Dog Run,0
25,Speer,Nightclub,Italian Restaurant,Coffee Shop,Bar,Pizza Place,Mexican Restaurant,Convenience Store,Jazz Club,Karaoke Bar,Marijuana Dispensary,0
27,City Park West,Coffee Shop,Park,Pilates Studio,Gym / Fitness Center,Pet Store,Asian Restaurant,Grocery Store,Track,Hardware Store,Video Store,0
29,CBD (downtown),Hotel,Coffee Shop,Mexican Restaurant,American Restaurant,Seafood Restaurant,Plaza,New American Restaurant,Asian Restaurant,Bar,Cocktail Bar,0
32,Central Park,Gym,Coffee Shop,American Restaurant,Breakfast Spot,Wine Bar,Flea Market,Fast Food Restaurant,Eye Doctor,Event Space,Dog Run,0
37,City Park,Park,Food,French Restaurant,Rental Car Location,Trail,Bar,Dog Run,Grocery Store,Dive Bar,Fast Food Restaurant,0
47,University Park,Breakfast Spot,Sandwich Place,Café,Japanese Restaurant,Coffee Shop,Bar,Ice Cream Shop,Performing Arts Venue,Burrito Place,Chinese Restaurant,0


In [25]:
denver_merged.loc[denver_merged['Cluster Labels'] == 1, denver_merged.columns[[0] + list(range(4, denver_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,Athmar Park,Convenience Store,Gym,Pharmacy,Clothing Store,Park,Brewery,Construction & Landscaping,Sports Bar,Home Service,Disc Golf,1
1,Overland,Convenience Store,Gym,Pharmacy,Clothing Store,Park,Brewery,Construction & Landscaping,Sports Bar,Home Service,Disc Golf,1
4,Barnum,Convenience Store,Distillery,Mexican Restaurant,Spa,Pharmacy,Baseball Field,Taco Place,Café,Light Rail Station,Discount Store,1
7,Sun Valley,Convenience Store,Distillery,Mexican Restaurant,Spa,Pharmacy,Baseball Field,Taco Place,Café,Light Rail Station,Discount Store,1
8,Villa Park,Convenience Store,Distillery,Mexican Restaurant,Spa,Pharmacy,Baseball Field,Taco Place,Café,Light Rail Station,Discount Store,1
9,West Colfax,Convenience Store,Distillery,Mexican Restaurant,Spa,Pharmacy,Baseball Field,Taco Place,Café,Light Rail Station,Discount Store,1
10,Baker,Pizza Place,Scenic Lookout,Flower Shop,Wine Bar,Distillery,Fast Food Restaurant,Eye Doctor,Event Space,Dog Run,Doctor's Office,1
11,Belcaro,Pizza Place,Scenic Lookout,Flower Shop,Wine Bar,Distillery,Fast Food Restaurant,Eye Doctor,Event Space,Dog Run,Doctor's Office,1
14,Washington Park West,Pizza Place,Scenic Lookout,Flower Shop,Wine Bar,Distillery,Fast Food Restaurant,Eye Doctor,Event Space,Dog Run,Doctor's Office,1
16,Harvey Park,Locksmith,Disc Golf,Park,Wine Bar,Dive Bar,Flea Market,Fast Food Restaurant,Eye Doctor,Event Space,Dog Run,1


In [26]:
denver_merged.loc[denver_merged['Cluster Labels'] == 2, denver_merged.columns[[0] + list(range(4, denver_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
3,Auraria,Convenience Store,Distillery,Mexican Restaurant,Spa,Pharmacy,Baseball Field,Taco Place,Café,Light Rail Station,Discount Store,2
12,Cherry Creek,Pizza Place,Scenic Lookout,Flower Shop,Wine Bar,Distillery,Fast Food Restaurant,Eye Doctor,Event Space,Dog Run,Doctor's Office,2
20,Berkeley,Coffee Shop,Grocery Store,Sushi Restaurant,Pizza Place,Music Venue,Mexican Restaurant,Breakfast Spot,Brewery,Liquor Store,Gym / Fitness Center,2
35,Cheesman Park,Park,Wine Bar,Dive Bar,Flower Shop,Flea Market,Fast Food Restaurant,Eye Doctor,Event Space,Dog Run,Doctor's Office,2
42,College View / South Platte,Coffee Shop,Fast Food Restaurant,Big Box Store,Furniture / Home Store,Pizza Place,Bakery,Pet Store,Pharmacy,Mobile Phone Shop,Salon / Barbershop,2
44,Cory-Merrill,Breakfast Spot,Sandwich Place,Café,Japanese Restaurant,Coffee Shop,Bar,Ice Cream Shop,Performing Arts Venue,Burrito Place,Chinese Restaurant,2
75,Northeast Park Hill,Trail,Park,Wine Bar,Distillery,Flea Market,Fast Food Restaurant,Eye Doctor,Event Space,Dog Run,Doctor's Office,2


In [27]:
denver_merged.loc[denver_merged['Cluster Labels'] == 3, denver_merged.columns[[0] + list(range(4, denver_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
23,Sloan Lake,Coffee Shop,Grocery Store,Sushi Restaurant,Pizza Place,Music Venue,Mexican Restaurant,Breakfast Spot,Brewery,Liquor Store,Gym / Fitness Center,3
28,Country Club,Coffee Shop,Park,Pilates Studio,Gym / Fitness Center,Pet Store,Asian Restaurant,Grocery Store,Track,Hardware Store,Video Store,3


In [28]:
denver_merged.loc[denver_merged['Cluster Labels'] == 4, denver_merged.columns[[0] + list(range(4, denver_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
2,Valverde,Convenience Store,Gym,Pharmacy,Clothing Store,Park,Brewery,Construction & Landscaping,Sports Bar,Home Service,Disc Golf,4
6,Platt Park,Convenience Store,Distillery,Mexican Restaurant,Spa,Pharmacy,Baseball Field,Taco Place,Café,Light Rail Station,Discount Store,4
13,Washington Park,Pizza Place,Scenic Lookout,Flower Shop,Wine Bar,Distillery,Fast Food Restaurant,Eye Doctor,Event Space,Dog Run,Doctor's Office,4
68,Jefferson Park,Marijuana Dispensary,Convenience Store,Chinese Restaurant,Breakfast Spot,Bed & Breakfast,Gym / Fitness Center,Grocery Store,Liquor Store,Mexican Restaurant,Event Space,4
69,Sunnyside,Marijuana Dispensary,Convenience Store,Chinese Restaurant,Breakfast Spot,Bed & Breakfast,Gym / Fitness Center,Grocery Store,Liquor Store,Mexican Restaurant,Event Space,4


## K Nearest Neighbor

In [84]:
df_dg = pd.merge(denver_grouped, df, on='Neighborhood')
df_dg.drop(['lat', 'lng'], inplace=True, axis = 1)
cols = df_dg.columns[1:162]
features_matix = df_dg.iloc[:,1:162].values

knn =  NearestNeighbors(n_neighbors=1).fit(features_matix)

## Logistic Regression

In [239]:
f = denver_merged.iloc[:,2:5]

f

df_hot = pd.get_dummies(f.iloc[:,[0,1,2]], prefix="", prefix_sep="")

df_hot = df_hot.merge(denver_merged['zip'], left_index=True, right_index=True)


X = df_hot.iloc[:,0:74]

y = df_hot['zip']

lr_model = LogisticRegression()
lr_model.fit(X,y)

df_hot.columns



cols = df_hot.columns[0:74]

venues = [[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
           0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
           0,0,0,0,0,0]

]
i = 0
for x in venues:
    for y in x:
        print(cols[i])
        if y == 0:
            venues[0][i] = 1

        else:
            venues[0][i] = 1
            venues[0][i-1] = 0
            
        y_pred = lr_model.predict(venues)
        print(y_pred)
        i+=1     
    


Alternative Healer
[80216]
Beer Garden
[80216]
Breakfast Spot
[80210]
Burger Joint
[80210]
Coffee Shop
[80210]
Construction & Landscaping
[80210]
Convenience Store
[80204]
Cosmetics Shop
[80204]
Furniture / Home Store
[80204]
Gym
[80204]
Hotel
[80210]
Lake
[80210]
Locksmith
[80210]
Marijuana Dispensary
[80210]
Mexican Restaurant
[80210]
Nightclub
[80210]
Other Repair Shop
[80210]
Park
[80210]
Pizza Place
[80204]
Rock Club
[80204]
Sporting Goods Shop
[80204]
Trail
[80204]
Art Gallery
[80123]
Bar
[80247]
Baseball Stadium
[80236]
Café
[80221]
Cocktail Bar
[80221]
Coffee Shop
[80220]
Convenience Store
[80220]
Disc Golf
[80219]
Discount Store
[80219]
Distillery
[80204]
Fast Food Restaurant
[80204]
Food
[80204]
Grocery Store
[80204]
Gym
[80205]
Italian Restaurant
[80205]
Mexican Restaurant
[80205]
Park
[80205]
Pharmacy
[80205]
Pool
[80204]
Recreation Center
[80204]
Sandwich Place
[80204]
Scenic Lookout
[80204]
Trail
[80204]
Train Station
[80204]
Wine Bar
[80204]
American Restaurant
[80238]
A

In [241]:
df_hot = pd.get_dummies(denver_merged.iloc[:,2], prefix="", prefix_sep="")

X = df_dg.iloc[:, 1:162]

df_hot = df_hot.merge(denver_merged['zip'], left_index=True, right_index=True)

df_hot


X = df_hot.iloc[:,0:22]

y = df_hot['zip']

lr_model = LogisticRegression()
lr_model.fit(X,y)

df_hot.columns


cols = df_hot.columns.drop('zip')

venues = [[1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0]]

            
y_pred = lr_model.predict(venues)
print(y_pred)
     
        



[80204]


## Methodology

Some of the exploratory data analsis that I completed was learning a little more about the data I had create. I found out a little about each city, the size of the dataframe, and learning more about the data types. I obersved some of the dublicate values that were in the data set and was able to handle them accordingling, 

## Results