<h2> Data science Capstone Project </h2>
<h3>Opening a new coffee shop in Cincinnati</h3>

Build a dataframe of neighborhoods in Cincinnati, Ohio by web scraping the data from Wikipedia page<br>
Get the geographical coordinates of the neighborhoods<br>
Obtain the venue data for the neighborhoods from Foursquare API<br>
Explore and cluster the neighborhoods<br>
Select the best cluster to open a new coffee shop<br>

<b>Importing Libraries</b>

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

!conda install -c conda-forge geocoder --yes
import geocoder

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe
! conda install -c conda-forge bs4 --yes
from bs4 import BeautifulSoup # library to parse HTML and XML documents

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-2.0.0                |     pyh9f0ad1d_0          63 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          97 KB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::geographiclib-1.50-py_0
  geopy              conda-forge/noarch::geopy-2.0.0-pyh9f0ad1d_0



Downloading and Extracting Packages
geopy-2.0.0          | 63 KB     | ##################################### | 100% 
geographiclib-1.50   | 34 KB     | ################################

<b>Scrapping data from wikipedia to a data frame</b>

In [3]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighborhoods_in_Cincinnati").text

In [4]:
soup = BeautifulSoup(data, 'html.parser')

In [5]:
neighborhoodList = []

In [6]:
for row in soup.find_all("div", class_="mw-category")[0].findAll("li"):
    neighborhoodList.append(row.text)

In [9]:
cn_df = pd.DataFrame({"Neighborhood": neighborhoodList})

cn_df.shape

(53, 1)

<b> Getting the geographical coordinates</b>

In [10]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Ohio'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords

In [11]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in cn_df["Neighborhood"].tolist() ]

In [12]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [13]:
# merge the coordinates into the original dataframe
cn_df['Latitude'] = df_coords['Latitude']
cn_df['Longitude'] = df_coords['Longitude']
cn_df.drop([0], axis = 0, inplace = True)

In [14]:
# check the neighborhoods and the coordinates
print(cn_df.shape)
cn_df.head()

(52, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
1,"Avondale, Cincinnati",39.14771,-84.4949
2,"Bond Hill, Cincinnati",39.1746,-84.46715
3,"California, Cincinnati",39.06536,-84.42365
4,"Camp Washington, Cincinnati",39.13691,-84.5373
5,"Carthage, Cincinnati",39.19733,-84.48062


In [15]:
# save the DataFrame as CSV file
cn_df.to_csv("cn_df.csv", index=False)

<b> Creating a Map of neignborhoods of Cincinnati </b>

In [17]:
# get the coordinates of Kuala Lumpur
address = 'Cincinnati, Ohio'

geolocator = Nominatim(user_agent="myapplication")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Cincinnati, Ohio {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Cincinnati, Ohio 39.1014537, -84.5124602.


In [18]:
# create map of Toronto using latitude and longitude values
map_cn = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(cn_df['Latitude'], cn_df['Longitude'], cn_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_cn)  
    
map_cn

In [19]:
# save the map as HTML file
map_cn.save('map_cn.html')

<b> Use the Foursquare API to explore the neighborhoods </b> 

In [20]:
# define Foursquare Credentials and Version
CLIENT_ID = '2VQLJ5RKLN0E2WO5FVGVSMWUF1BFMBIY5MZDA0HJKTHHOBZT' # your Foursquare ID
CLIENT_SECRET = 'KZCKC4NL4GYQBEJZO3JMYFGDEK4XFY4YZQRWSSZZPKIRBDNZ' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 2VQLJ5RKLN0E2WO5FVGVSMWUF1BFMBIY5MZDA0HJKTHHOBZT
CLIENT_SECRET:KZCKC4NL4GYQBEJZO3JMYFGDEK4XFY4YZQRWSSZZPKIRBDNZ


<b> Getting top 200 locations(venues) of the particular neighborhood within 2000 meters radius</b> 

In [21]:
radius = 2000
LIMIT = 200

venues = []

for lat, long, neighborhood in zip(cn_df['Latitude'], cn_df['Longitude'], cn_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()['response']['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
        


In [22]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head(50)

(2979, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,"Avondale, Cincinnati",39.14771,-84.4949,Hippo Cove,39.145257,-84.506104,Zoo Exhibit
1,"Avondale, Cincinnati",39.14771,-84.4949,Dobsa Giraffe Ridge,39.143495,-84.506975,Zoo Exhibit
2,"Avondale, Cincinnati",39.14771,-84.4949,Jungle Trails,39.146071,-84.506643,Zoo Exhibit
3,"Avondale, Cincinnati",39.14771,-84.4949,Cincinnati Zoo & Botanical Garden,39.14274,-84.509266,Zoo
4,"Avondale, Cincinnati",39.14771,-84.4949,Marge Schott-Unnewehr Elephant Reserve,39.143109,-84.508114,Zoo Exhibit
5,"Avondale, Cincinnati",39.14771,-84.4949,Kroger Lords of the Arctic,39.145949,-84.507424,Zoo Exhibit
6,"Avondale, Cincinnati",39.14771,-84.4949,Cincinnati Children's Hospital Cafeteria,39.140524,-84.501905,Cafeteria
7,"Avondale, Cincinnati",39.14771,-84.4949,Cat Canyon,39.14513,-84.509434,Zoo Exhibit
8,"Avondale, Cincinnati",39.14771,-84.4949,Otto M Budig Manatee Springs,39.146575,-84.509065,Zoo Exhibit
9,"Avondale, Cincinnati",39.14771,-84.4949,Skyline Chili,39.143275,-84.508362,Hot Dog Joint


<b> Check how many venues were returned for a particular neighborhood </b> 

In [23]:
venues_df.groupby(["Neighborhood"]).count().head()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Avondale, Cincinnati",77,77,77,77,77,77
"Bond Hill, Cincinnati",47,47,47,47,47,47
"CUF, Cincinnati",92,92,92,92,92,92
"California, Cincinnati",34,34,34,34,34,34
"Camp Washington, Cincinnati",88,88,88,88,88,88


In [24]:
print('There are {} uniques categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 248 uniques categories.


In [25]:
# print out the list of categories
venues_df['VenueCategory'].unique()

array(['Zoo Exhibit', 'Zoo', 'Cafeteria', 'Hot Dog Joint', 'Soup Place',
       'Gift Shop', 'Brewery', 'German Restaurant', 'Exhibit', 'Bar',
       'American Restaurant', 'Hotel', 'Coffee Shop',
       'Mexican Restaurant', 'High School', 'Discount Store', 'Hotel Bar',
       'Fried Chicken Joint', 'Pizza Place', 'Sandwich Place',
       'Donut Shop', 'Supermarket', 'Burrito Place', 'Train Station',
       'ATM', 'Ice Cream Shop', 'College Cafeteria', 'Video Store',
       'Bank', 'Pharmacy', 'Fast Food Restaurant',
       'Eastern European Restaurant', 'Gas Station', 'Golf Course',
       'Bakery', 'Movie Theater', 'Athletics & Sports',
       'Convenience Store', 'Mediterranean Restaurant', 'Theater',
       'Basketball Court', 'Food Truck', 'Grocery Store',
       'Motorcycle Shop', 'Breakfast Spot', 'Food', 'Burger Joint',
       'Planetarium', 'Roller Rink', 'Furniture / Home Store', 'Diner',
       'Chinese Restaurant', 'Seafood Restaurant', 'Italian Restaurant',
       'Bus St

In [26]:
# check if the results contain "Shopping Mall"
"Coffee Shop" in venues_df['VenueCategory'].unique()

True

<b> One hot encoding of the venues by venue categories </b>

In [39]:
# one hot encoding
cn_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
cn_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [cn_onehot.columns[-1]] + list(cn_onehot.columns[:-1])
cn_onehot = cn_onehot[fixed_columns]

print(kl_onehot.shape)
cn_onehot.head()

(2979, 249)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Airport,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Bavarian Restaurant,Beach,Beer Bar,Beer Garden,Big Box Store,Bistro,Boat or Ferry,Bookstore,Border Crossing,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Buffet,Building,Burger Joint,Burrito Place,Bus Station,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Casino,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Baseball Diamond,College Basketball Court,College Cafeteria,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Distillery,Dive Bar,Dog Run,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Service,Exhibit,Eye Doctor,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flower Shop,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,General Travel,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Harbor / Marina,Hardware Store,Health & Beauty Service,Herbs & Spices Store,High School,Historic Site,History Museum,Home Service,Hookah Bar,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Lake,Latin American Restaurant,Lawyer,Library,Lingerie Store,Liquor Store,Lounge,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Motel,Motorcycle Shop,Movie Theater,Moving Target,Museum,Music School,Music Store,Music Venue,Nail Salon,New American Restaurant,Nightclub,Noodle House,Observatory,Office,Optical Shop,Other Nightlife,Other Repair Shop,Outlet Store,Paper / Office Supplies Store,Park,Pawn Shop,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Planetarium,Playground,Plaza,Pool,Post Office,Pub,Record Shop,Rental Car Location,Restaurant,River,Rock Club,Roller Rink,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Smoothie Shop,Soccer Field,Soup Place,South American Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Storage Facility,Student Center,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Thrift / Vintage Store,Toy / Game Store,Trail,Train,Train Station,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,"Avondale, Cincinnati",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
1,"Avondale, Cincinnati",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,"Avondale, Cincinnati",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
3,"Avondale, Cincinnati",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
4,"Avondale, Cincinnati",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1


<b> Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category</b>

In [40]:
cn_grouped = cn_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(cn_grouped.shape)
cn_grouped.head()

(52, 249)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Airport,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Bavarian Restaurant,Beach,Beer Bar,Beer Garden,Big Box Store,Bistro,Boat or Ferry,Bookstore,Border Crossing,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Buffet,Building,Burger Joint,Burrito Place,Bus Station,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Casino,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Baseball Diamond,College Basketball Court,College Cafeteria,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Distillery,Dive Bar,Dog Run,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Service,Exhibit,Eye Doctor,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flower Shop,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,General Travel,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Harbor / Marina,Hardware Store,Health & Beauty Service,Herbs & Spices Store,High School,Historic Site,History Museum,Home Service,Hookah Bar,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Lake,Latin American Restaurant,Lawyer,Library,Lingerie Store,Liquor Store,Lounge,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Motel,Motorcycle Shop,Movie Theater,Moving Target,Museum,Music School,Music Store,Music Venue,Nail Salon,New American Restaurant,Nightclub,Noodle House,Observatory,Office,Optical Shop,Other Nightlife,Other Repair Shop,Outlet Store,Paper / Office Supplies Store,Park,Pawn Shop,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Planetarium,Playground,Plaza,Pool,Post Office,Pub,Record Shop,Rental Car Location,Restaurant,River,Rock Club,Roller Rink,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Smoothie Shop,Soccer Field,Soup Place,South American Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Storage Facility,Student Center,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Thrift / Vintage Store,Toy / Game Store,Trail,Train,Train Station,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,"Avondale, Cincinnati",0.025974,0.0,0.0,0.0,0.025974,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.012987,0.012987,0.012987,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038961,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.012987,0.0,0.012987,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.025974,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.038961,0.0,0.0,0.0,0.025974,0.0,0.0,0.0,0.012987,0.012987,0.025974,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.012987,0.051948,0.012987,0.025974,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.025974,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025974,0.025974,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038961,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.220779
1,"Bond Hill, Cincinnati",0.021277,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.06383,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.085106,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.021277,0.0,0.06383,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06383,0.042553,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"CUF, Cincinnati",0.0,0.01087,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.032609,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.01087,0.021739,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.043478,0.0,0.021739,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.01087,0.021739,0.0,0.01087,0.0,0.0,0.01087,0.0,0.01087,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.01087,0.0,0.0,0.01087,0.0,0.0,0.01087,0.01087,0.021739,0.0,0.032609,0.0,0.0,0.0,0.0,0.01087,0.01087,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032609,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.01087,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.043478,0.01087,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.021739,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.032609,0.01087,0.0,0.0
3,"California, Cincinnati",0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.029412,0.029412,0.0,0.029412,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.088235,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.029412,0.0,0.029412,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.029412,0.088235,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Camp Washington, Cincinnati",0.011364,0.0,0.0,0.0,0.034091,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.056818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.022727,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.011364,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.022727,0.0,0.011364,0.022727,0.022727,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.011364,0.0,0.0,0.034091,0.0,0.0,0.011364,0.0,0.0,0.022727,0.0,0.0,0.011364,0.0,0.0,0.0,0.022727,0.056818,0.011364,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.011364,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.011364,0.022727,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.034091,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.011364,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.011364,0.011364,0.0,0.011364,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.011364,0.0,0.0


In [41]:
len(cn_grouped[cn_grouped["Coffee Shop"] > 0])

37

<b>Create a new DataFrame for Coffee shop data only</b>

In [42]:
cn_mall = cn_grouped[["Neighborhoods","Coffee Shop"]]
cn_mall.head()

Unnamed: 0,Neighborhoods,Coffee Shop
0,"Avondale, Cincinnati",0.038961
1,"Bond Hill, Cincinnati",0.042553
2,"CUF, Cincinnati",0.021739
3,"California, Cincinnati",0.0
4,"Camp Washington, Cincinnati",0.022727


<b> Cluster Neighborhoods </b><br>
k means algorithm with k =3

In [43]:
# set number of clusters
kclusters = 3

cn_clustering = cn_mall.drop(["Neighborhoods"], 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kl_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 2, 0, 2, 0, 2, 2, 2, 2], dtype=int32)

In [44]:
# create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
cn_merged = cn_mall.copy()

# add clustering labels
cn_merged["Cluster Labels"] = kmeans.labels_

In [45]:
cn_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
cn_merged.head()

Unnamed: 0,Neighborhood,Coffee Shop,Cluster Labels
0,"Avondale, Cincinnati",0.038961,1
1,"Bond Hill, Cincinnati",0.042553,1
2,"CUF, Cincinnati",0.021739,2
3,"California, Cincinnati",0.0,0
4,"Camp Washington, Cincinnati",0.022727,2


In [46]:
# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
cn_merged = cn_merged.join(cn_df.set_index("Neighborhood"), on="Neighborhood")

print(cn_merged.shape)
cn_merged.head() # check the last columns!

(52, 5)


Unnamed: 0,Neighborhood,Coffee Shop,Cluster Labels,Latitude,Longitude
0,"Avondale, Cincinnati",0.038961,1,39.14771,-84.4949
1,"Bond Hill, Cincinnati",0.042553,1,39.1746,-84.46715
2,"CUF, Cincinnati",0.021739,2,39.162,-84.45689
3,"California, Cincinnati",0.0,0,39.06536,-84.42365
4,"Camp Washington, Cincinnati",0.022727,2,39.13691,-84.5373


In [47]:
# sort the results by Cluster Labels
print(cn_merged.shape)
cn_merged.sort_values(["Cluster Labels"], inplace=True)
cn_merged

(52, 5)


Unnamed: 0,Neighborhood,Coffee Shop,Cluster Labels,Latitude,Longitude
51,"Winton Hills, Cincinnati",0.0,0,39.18932,-84.51107
15,"English Woods, Cincinnati",0.0,0,39.14022,-84.55721
14,"East Westwood, Cincinnati",0.0,0,39.14991,-84.56419
29,"North Fairmount, Cincinnati",0.0,0,39.13589,-84.55687
39,"Riverside, Cincinnati",0.0,0,39.07746,-84.60223
40,"Roselawn, Cincinnati",0.0,0,39.19478,-84.46243
10,"Covedale, Cincinnati",0.0,0,39.12143,-84.60408
22,"Millvale, Cincinnati",0.0,0,39.14443,-84.55207
41,"Sayler Park, Cincinnati",0.0,0,39.10291,-84.68131
44,"South Fairmount, Cincinnati",0.0,0,39.12756,-84.55538


<b> Lets visualize the clusters </b>

In [49]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(cn_merged['Latitude'], cn_merged['Longitude'], cn_merged['Neighborhood'], cn_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [50]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

<b> Examining Clusters </b><br>
Cluster 0

In [51]:
cn_merged.loc[cn_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Coffee Shop,Cluster Labels,Latitude,Longitude
51,"Winton Hills, Cincinnati",0.0,0,39.18932,-84.51107
15,"English Woods, Cincinnati",0.0,0,39.14022,-84.55721
14,"East Westwood, Cincinnati",0.0,0,39.14991,-84.56419
29,"North Fairmount, Cincinnati",0.0,0,39.13589,-84.55687
39,"Riverside, Cincinnati",0.0,0,39.07746,-84.60223
40,"Roselawn, Cincinnati",0.0,0,39.19478,-84.46243
10,"Covedale, Cincinnati",0.0,0,39.12143,-84.60408
22,"Millvale, Cincinnati",0.0,0,39.14443,-84.55207
41,"Sayler Park, Cincinnati",0.0,0,39.10291,-84.68131
44,"South Fairmount, Cincinnati",0.0,0,39.12756,-84.55538


<b> cluster 1 </b>

In [53]:
cn_merged.loc[cn_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Coffee Shop,Cluster Labels,Latitude,Longitude
32,Over-the-Rhine,0.04,1,39.11145,-84.51522
31,"Oakley, Cincinnati",0.06,1,39.15333,-84.42486
30,"Northside, Cincinnati",0.043478,1,39.16547,-84.54113
33,"Paddock Hills, Cincinnati",0.054545,1,39.16437,-84.47839
37,"Price Hill, Cincinnati",0.060606,1,39.11034,-84.57603
35,"Pill Hill, Cincinnati",0.045455,1,39.166175,-84.54628
36,"Pleasant Ridge, Cincinnati",0.042553,1,39.18358,-84.42469
42,"Sedamsville, Cincinnati",0.041667,1,39.09304,-84.57292
43,"South Cumminsville, Cincinnati",0.04,1,39.15346,-84.55005
48,"West End, Cincinnati",0.04,1,39.10882,-84.53272


<b> Cluster 2 </b>

In [54]:
kl_merged.loc[kl_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Coffee Shop,Cluster Labels
2,"CUF, Cincinnati",0.021739,2
4,"Camp Washington, Cincinnati",0.022727,2
6,"Clifton, Cincinnati",0.014493,2
7,"College Hill, Cincinnati",0.022222,2
8,"Columbia-Tusculum, Cincinnati",0.028571,2
9,"Corryville, Cincinnati",0.030928,2
12,"East End, Cincinnati",0.022727,2
17,"Hartwell, Cincinnati",0.025,2
20,"Linwood, Cincinnati",0.025641,2
27,"Mount Washington, Cincinnati",0.032258,2


<b> Observations: </b>

Most of the Coffee Shops are concentrated in the central area of Cincinnati, with the highest number in cluster 1 and moderate number in cluster 2. On the other hand, cluster 0 has very low number to totally no Coffee Shops in the neighborhoods. This represents a great opportunity and high potential areas to open new Coffee Shops as there is very little to no competition from existing malls. Meanwhile, Coffee Shops in cluster 1 are likely suffering from intense competition due to oversupply and high concentration of shopping malls. From another perspective, this also shows that the oversupply of shopping malls mostly happened in the central area of the city, with the suburb area still have very few shopping malls. Therefore, this project recommends property developers to capitalize on these findings to open new Coffee Shops in neighborhoods in cluster 0 with little to no competition. Property developers with unique selling propositions to stand out from the competition can also open new Coffee Shops in neighborhoods in cluster 2 with moderate competition. Lastly, property developers are advised to avoid neighborhoods in cluster 1 which already have high concentration of Coffee Shops and suffering from intense competition.