# The Battle of the Neighborhoods

By Cristina Aledo González

## Introduction: Business Problem

In this project, I will try to find the optimal pet-friendly areas in different cities. 
People who have pets and are moving, either to a new city or to a new place in the same city, probably want to live in a pet-friendly area. In this work, we are going to find the best areas to find trendy venues for pets. We are going to analyse these pet-friendly areas in the following cities:
- Detroit, MI
- Chicago, IL
- Boston, MA
- Philadelphia, PA

To look for the most trendy pet venues, we are going to explore the different categories of venues related to pets, such as stores, veterinarians or medical centers. We are going to detect the areas with most common types of venues. The areas will be found by the postal codes and the different type of venues will form different clusters. Depending on the needs of the people, we will suggest moving to an area organized in one cluster or by another. For instance, a couple with an old dog will prefer move closer to an area where they could easily find a veterinarian or a medical center.

## Data description

Taking into account all the above described, in order to make a suggestion of a pet-friendly area to move in, first we need to know the kind of pet venues in the area, such as pet stores, pet clinics, veterinarians, and so on.   

So we are going to **Foursquare API** to look for the trendy pet venues in the cities of Detroit, Chicago, Boston and Philadelphia within a radius of 10 km from the downtown. This API will give the number of the existing pet venues in the area, their locations and the category of each of them.
   
With this information and after processing the extracted data, we will be able to plot the maps of each city with the different venues around the downtown. We use **geopy** library to get the latitude and longitude values for each of the cities and **follium** library to generate a map. We will add to the maps the location of the venues. 
   
Also, for each city, the top 5 most common venues for each postal code are examined. 
   
Finally, k-means clustering by category of pet venue will be done by using **sklearn** package. A new map is created for each of the cities. Clusters, which are represented with different colors, are analyzed and compared.

## Import neccesary libraries

In [1]:
import pandas as pd # library for data analsysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
import requests # library to handle requests
# from geopy.geocoders import Nominatim #convert an address into latitude and longitude values
from IPython.display import Image #for displaying images
from IPython.core.display import HTML #for displaying images
import json # library to handle JSON files
from pandas.io.json import json_normalize #tranforming json file into a pandas dataframe library
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium #plotting library
import matplotlib.cm as cm #Matplotlib and associated plotting modules
import matplotlib.colors as colors #Matplotlib and associated plotting modules
from sklearn.cluster import KMeans #import k-means from clustering stage
print('Libraries imported.')

Libraries imported.


## Data acquisition for pet venues

Since we are going to use **Foursquare API**, we have to define the Foursquare credentials and version:

In [2]:
CLIENT_ID = 'HY4XQAE4NJ11C22HOXQAYS33Q1NUTFAL1JCQ5CG005KYIFZB' # your Foursquare ID
CLIENT_SECRET = 'BXZ2YNLXCNGVH3HKGZIFRTKWEUWV4FC2VHR20UBMBXOATLJ5' # your Foursquare Secret
VERSION = '20180604'
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: HY4XQAE4NJ11C22HOXQAYS33Q1NUTFAL1JCQ5CG005KYIFZB
CLIENT_SECRET:BXZ2YNLXCNGVH3HKGZIFRTKWEUWV4FC2VHR20UBMBXOATLJ5


In [None]:
CLIENT_ID = 'your_Foursquare_ID' # your Foursquare ID
CLIENT_SECRET = 'your_Foursquare_Secret' # your Foursquare Secret
VERSION = '20180604'

### Using *search* request

Assume we have a pet and want to move to a pet-friendly area. A pet-friendly area will probably have many pet stores, pet clinics,... So, let's find for venues for pets in some cities within a radius of 10 km and compare the cities. To do this, we get data from Foursquare:

In [4]:
search_query = 'pet'
search_cities = ['Detroit, MI', 'Chicago, IL', 'Boston, MA', 'Philadelphia, PA']
radius = 10000
print('Let us look for ' + search_query + ' venues in the following cities:')
print(search_cities)

Let us look for pet venues in the following cities:
['Detroit, MI', 'Chicago, IL', 'Boston, MA', 'Philadelphia, PA']


Let's define the url for the **search** query and a limit of 500 results. We'll use a loop for to go through all the cities.

In [5]:
LIMIT = 500 # Maximum is 500

# Send the search query and examine the results
results = {}
for city in search_cities:
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&query={}&near={}&radius={}&limit={}'.format(
        CLIENT_ID, CLIENT_SECRET, VERSION, search_query, city, radius, LIMIT)
    results[city] = requests.get(url).json()

results

{'Detroit, MI': {'meta': {'code': 200,
   'requestId': '5e6525de29ce6a001b397000'},
  'response': {'venues': [{'id': '542a0d8e498e4ff3ed300cdc',
     'name': 'Pet Value',
     'location': {'lat': 42.31764810184536,
      'lng': -83.04583660843393,
      'labeledLatLngs': [{'label': 'display',
        'lat': 42.31764810184536,
        'lng': -83.04583660843393}],
      'cc': 'CA',
      'country': 'Canada',
      'formattedAddress': ['Canada']},
     'categories': [{'id': '4bf58dd8d48988d100951735',
       'name': 'Pet Store',
       'pluralName': 'Pet Stores',
       'shortName': 'Pet Store',
       'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/pet_store_',
        'suffix': '.png'},
       'primary': True}],
     'referralId': 'v-1583687443',
     'hasPerk': False},
    {'id': '51a0ba58498eb2aca65b1d75',
     'name': 'Urban Pet Shoppe',
     'location': {'address': '2472 Riopelle St',
      'lat': 42.34624812300687,
      'lng': -83.03839009271893,
      'labeledLat

We get the relevant part of JSON and transform it into a *pandas* dataframe. See the results first for the city of Detroit, MI:

In [6]:
city = search_cities[0]
print('Let us first see the results for ' + city + ':')

#Asign relevant part of JSON to venues
venues = results[city]['response']['venues']

#Tranform venues into a dataframe
df_venues_norm = json_normalize(venues)
df_venues_norm

Let us first see the results for Detroit, MI:


Unnamed: 0,id,name,categories,referralId,hasPerk,location.lat,location.lng,location.labeledLatLngs,location.cc,location.country,...,location.state,location.crossStreet,location.neighborhood,delivery.id,delivery.url,delivery.provider.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.icon.name,venuePage.id
0,542a0d8e498e4ff3ed300cdc,Pet Value,"[{'id': '4bf58dd8d48988d100951735', 'name': 'P...",v-1583687443,False,42.317648,-83.045837,"[{'label': 'display', 'lat': 42.31764810184536...",CA,Canada,...,,,,,,,,,,
1,51a0ba58498eb2aca65b1d75,Urban Pet Shoppe,"[{'id': '5032897c91d4c4b30a586d69', 'name': 'P...",v-1583687443,False,42.346248,-83.03839,"[{'label': 'display', 'lat': 42.34624812300687...",US,United States,...,MI,,,,,,,,,
2,4da0b73ec6e96ea85f95b05d,Corbret's Pet Depot,"[{'id': '4bf58dd8d48988d100951735', 'name': 'P...",v-1583687443,False,42.28101,-82.981633,"[{'label': 'display', 'lat': 42.28100989056196...",CA,Canada,...,ON,,,,,,,,,
3,53221b9b498eaeeccca5d690,Painted petalZ,"[{'id': '4bf58dd8d48988d1e2931735', 'name': 'A...",v-1583687443,False,42.330171,-83.047628,"[{'label': 'display', 'lat': 42.33017113582424...",US,United States,...,MI,,,,,,,,,
4,4bacf892f964a520ac1f3be3,Pete's Barbershop,"[{'id': '4bf58dd8d48988d110951735', 'name': 'S...",v-1583687443,False,42.335375,-83.043566,"[{'label': 'display', 'lat': 42.33537473307644...",US,United States,...,MI,Beaubien St.,,,,,,,,
5,4da70022ab5241bd2ed83e63,Pet Valu,"[{'id': '4bf58dd8d48988d100951735', 'name': 'P...",v-1583687443,False,42.298502,-83.020557,"[{'label': 'display', 'lat': 42.29850165074885...",CA,Canada,...,ON,,,,,,,,,
6,5b3e2fa7916bc1002c4c6417,Pet Valu,"[{'id': '4bf58dd8d48988d100951735', 'name': 'P...",v-1583687443,False,42.288329,-83.059693,"[{'label': 'display', 'lat': 42.28832881608442...",CA,Canada,...,ON,,,,,,,,,
7,517168b6498e19604bf5aac4,Peter Pan @ Detroit,"[{'id': '4bf58dd8d48988d137941735', 'name': 'T...",v-1583687443,False,42.338235,-83.053838,"[{'label': 'display', 'lat': 42.33823534516725...",US,United States,...,MI,,,,,,,,,
8,4c4f6d818b5520a1c2e57504,Pet Valu,"[{'id': '4bf58dd8d48988d100951735', 'name': 'P...",v-1583687443,False,42.290019,-83.05821,"[{'label': 'display', 'lat': 42.29001916015181...",CA,Canada,...,ON,Huron Church Rd,,,,,,,,
9,55b3921f498ea8d6526fc5c6,le petit dejeuner,"[{'id': '4bf58dd8d48988d143941735', 'name': 'B...",v-1583687443,False,42.338174,-83.062604,"[{'label': 'display', 'lat': 42.3381739839191,...",US,United States,...,MI,,,,,,,,,


Let's filter the dataframe and select the columns we are interesting in, which are the name of the pet venues, its address, latitude, longitude and postal code:

In [7]:
df_venues = df_venues_norm[['name', 'location.address', 'location.lat', 'location.lng', 'location.postalCode']]
df_venues.columns = ['Name', 'Address', 'Latitude', 'Longitude', 'PostalCode']
df_venues

Unnamed: 0,Name,Address,Latitude,Longitude,PostalCode
0,Pet Value,,42.317648,-83.045837,
1,Urban Pet Shoppe,2472 Riopelle St,42.346248,-83.03839,48207
2,Corbret's Pet Depot,3165 Walker Rd,42.28101,-82.981633,N8W 3R6
3,Painted petalZ,,42.330171,-83.047628,
4,Pete's Barbershop,438 Macomb St,42.335375,-83.043566,48226
5,Pet Valu,300 Tecumseh Road East,42.298502,-83.020557,N8X 5E8
6,Pet Valu,1556 Huron Church Road,42.288329,-83.059693,N9C 3Z3
7,Peter Pan @ Detroit,,42.338235,-83.053838,
8,Pet Valu,Tecumseh Rd W,42.290019,-83.05821,N9C 3Z3
9,le petit dejeuner,2548 Grand River Ave,42.338174,-83.062604,


For all the considered cities:

In [8]:
df_venues={}
for city in search_cities:
    venues = results[city]['response']['venues']
    df_venues_norm = json_normalize(venues)
    df_venues[city] = df_venues_norm[['name', 'location.address', 'location.lat', 'location.lng', 'location.postalCode']]
    df_venues[city].columns = ['Name', 'Address', 'Latitude', 'Longitude', 'PostalCode']
df_venues

{'Detroit, MI':                                       Name  \
 0                                Pet Value   
 1                         Urban Pet Shoppe   
 2                      Corbret's Pet Depot   
 3                           Painted petalZ   
 4                        Pete's Barbershop   
 5                                 Pet Valu   
 6                                 Pet Valu   
 7                      Peter Pan @ Detroit   
 8                                 Pet Valu   
 9                        le petit dejeuner   
 10      Law Offices of Peter C Rageas P.C.   
 11                   Pet Wise Pet Supplies   
 12     Saints Peter And Paul Jesuit Church   
 13  Filthy Phil's Pet Grooming and Day Spa   
 14                       Le Petit Dejeuner   
 15                                PetSmart   
 16                            Pete's Place   
 17                           Le Petit Zinc   
 18                                Pet Valu   
 19                         Petes Pipe Shop  

Print results for each of the cities:

In [108]:
city = search_cities[0]
print(city)
df_venues[city]

Detroit, MI


Unnamed: 0,Name,Address,Latitude,Longitude,PostalCode
0,Pet Value,,42.317648,-83.045837,
1,Urban Pet Shoppe,2472 Riopelle St,42.346248,-83.03839,48207
2,Corbret's Pet Depot,3165 Walker Rd,42.28101,-82.981633,N8W 3R6
3,Painted petalZ,,42.330171,-83.047628,
4,Pete's Barbershop,438 Macomb St,42.335375,-83.043566,48226
5,Pet Valu,300 Tecumseh Road East,42.298502,-83.020557,N8X 5E8
6,Pet Valu,1556 Huron Church Road,42.288329,-83.059693,N9C 3Z3
7,Peter Pan @ Detroit,,42.338235,-83.053838,
8,Pet Valu,Tecumseh Rd W,42.290019,-83.05821,N9C 3Z3
9,le petit dejeuner,2548 Grand River Ave,42.338174,-83.062604,


In [106]:
city = search_cities[1]
print(city)
df_venues[city]

Chicago, IL


Unnamed: 0,Name,Address,Latitude,Longitude,PostalCode
0,Out-U-Go! Pet Care,1100 W Cermak Road Suite 111,41.853193,-87.653539,60608
1,Pet Supplies Plus,3145 S Ashland Ave,41.837128,-87.663911,60608
2,Banfield Pet Hospital,1101 S Canal St,41.868224,-87.63853,60607
3,Kriser's Natural Pet,1103 S. State St.,41.869137,-87.627229,60605
4,Pet Supplies Plus Wicker Park,1289 North Milwaukee,41.905233,-87.669375,60622
5,"AMSTAPHY, Senior Pet Photography",1200 W 35th St #290,41.831633,-87.656597,60609
6,PetSmart,1101 S Canal St,41.867523,-87.638877,60607
7,Pet Supplies Plus N. Lincoln,3757 N Lincoln Ave,41.950083,-87.675568,60613
8,Vianey's Pet Salon,1824 S Ashland Ave,41.856996,-87.666533,60608
9,PetSmart,1415 N Kingsbury St,41.906464,-87.650007,60642


In [107]:
city = search_cities[2]
print(city)
df_venues[city]

Boston, MA


Unnamed: 0,Name,Address,Latitude,Longitude,PostalCode
0,Pet Supplies Plus Medford,630 Fellsway,42.406831,-71.083285,2155.0
1,"Peter L. Stern & Company, Inc.",15 Court Sq Lbby 101,42.358613,-71.059011,2108.0
2,Red Dog Pet Resort & Spa,274 Southampton St,42.331243,-71.063957,2118.0
3,PETROCELLI LAW,1 Boston St Suite 2600,42.358891,-71.058508,2127.0
4,Charlestown Pet Clinic,,42.378468,-71.068389,2129.0
5,D'Tails Pet Boutique,73 Berkeley St,42.346844,-71.07068,2116.0
6,Boston Pet Sitters,144 Commonwealth Ave Apt 3,42.351577,-71.077642,2116.0
7,PetSmart,160 Alewife Brook Pkwy,42.390646,-71.140164,2138.0
8,Peters Park,1205 Washington,42.342662,-71.067686,2118.0
9,BluePearl Pet Hospital,56 Roland St,42.381259,-71.080117,2129.0


In [105]:
city = search_cities[3]
print(city)
df_venues[city]

Philadelphia, PA


Unnamed: 0,Name,Address,Latitude,Longitude,PostalCode
0,Banfield Pet Hospital,1112 Chestnut St Spc 1120,39.950042,-75.159325,19107.0
1,Rittenhouse Pet Supply,135 S 20th St,39.950758,-75.17375,19103.0
2,Litterpaw Pet Supply,267 S 10th St,39.946476,-75.157771,19107.0
3,The Pet Snobs Boutique,534 S 4th St,39.941778,-75.149323,19147.0
4,Fairmount Pet Shoppe,2024 Fairmount Ave,39.96728,-75.171041,19130.0
5,Fetch! Pet Care,1229 Chestnut St,39.950823,-75.163651,19107.0
6,BONeJOUR Pet Supply,53 N 3rd St,39.951846,-75.145359,19106.0
7,Pet Cemetery,,39.959384,-75.161552,
8,The Pet Mechanic,920 South St,39.942714,-75.158006,19147.0
9,PetSmart,2360 W Oregon Ave,39.918076,-75.188899,19145.0


### Using *explore* request

We can use also define an **explore** request to get trending venues around the cities of interest:

In [9]:
LIMIT = 500 # Maximum is 500

# Send the explore query and examine the results
results_ex = {}
for city in search_cities:
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&query={}&near={}&radius={}&limit={}'.format(
        CLIENT_ID, CLIENT_SECRET, VERSION, search_query, city, radius, LIMIT)
    results_ex[city] = requests.get(url).json()

results_ex

{'Detroit, MI': {'meta': {'code': 200,
   'requestId': '5e6527b7b57e88001b27350c'},
  'response': {'suggestedFilters': {'header': 'Tap to show:',
    'filters': [{'name': 'Open now', 'key': 'openNow'}]},
   'geocode': {'what': '',
    'where': 'detroit mi',
    'center': {'lat': 42.33143, 'lng': -83.04575},
    'displayString': 'Detroit, MI, United States',
    'cc': 'US',
    'geometry': {'bounds': {'ne': {'lat': 42.45023, 'lng': -82.910451},
      'sw': {'lat': 42.255192, 'lng': -83.287803}}},
    'slug': 'detroit-michigan',
    'longId': '72057594042918665'},
   'headerLocation': 'Detroit',
   'headerFullLocation': 'Detroit',
   'headerLocationGranularity': 'city',
   'query': 'pet',
   'totalResults': 59,
   'suggestedBounds': {'ne': {'lat': 42.35692227, 'lng': -82.92484304248781},
    'sw': {'lat': 42.22905233, 'lng': -83.21632414083392}},
   'groups': [{'type': 'Recommended Places',
     'name': 'recommended',
     'items': [{'reasons': {'count': 0,
        'items': [{'summary': 

Obtain the JSON and transform it into a **pandas** dataframe. See the most popular spots around the city of Detroit, MI: 

In [10]:
city = search_cities[0]
print('Let us first see the results for ' + city + ':')

# Total number of venues
df_response_norm_ex = json_normalize(results_ex[city]['response'])
total_ex = df_response_norm_ex['totalResults'][0]
print('Total results:' + str(total_ex))

#Asign relevant part of JSON to venues
venues_ex = results_ex[city]['response']['groups'][0]['items']
venues_ex

#Tranform venues into a dataframe
df_venues_norm_ex = json_normalize(venues_ex)

# #Filter dataframe
df_venues_ex = df_venues_norm_ex[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng', 'venue.location.postalCode']]
df_venues_ex.columns = ['Name', 'Address', 'Latitude', 'Longitude', 'PostalCode']
df_venues_ex

Let us first see the results for Detroit, MI:
Total results:59


Unnamed: 0,Name,Address,Latitude,Longitude,PostalCode
0,Cass Corridog,4240 Cass Ave,42.35111,-83.063349,48201
1,City Bark,1222 Griswold st,42.332935,-83.04928,48226
2,PetSmart,5650 Mercury Dr,42.330686,-83.203075,48126
3,3Dogs1Cat,2472 Riopelle St,42.346373,-83.038368,48207
4,Urban Pet Shoppe,2472 Riopelle St,42.346248,-83.03839,48207
5,PetSmart,3164 Dougall Ave.,42.269737,-83.009991,N9E 1S6
6,Pet Valu,300 Tecumseh Road East,42.298502,-83.020557,N8X 5E8
7,Pet Valu,Tecumseh Rd W,42.290019,-83.05821,N9C 3Z3
8,Pet Valu,1556 Huron Church Road,42.288329,-83.059693,N9C 3Z3
9,Corbret's Pet Depot,3165 Walker Rd,42.28101,-82.981633,N8W 3R6


We can extract also the category of the venue with the following function:

In [11]:
#Function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

For all the considered cities we get:

In [12]:
df_venues_ex = {}
for city in search_cities:
    
    # Total number of venues
    df_response_norm_ex = json_normalize(results_ex[city]['response'])
    total_ex = df_response_norm_ex['totalResults'][0]
    print('Total results in '+ city + ': ' + str(total_ex))
    
    #Asign relevant part of JSON to venues
    venues_ex = results_ex[city]['response']['groups'][0]['items']

    #Tranform venues into a dataframe
    df_venues_norm_ex = json_normalize(venues_ex)
    
    #Filter dataframe
    df_venues_ex[city] = df_venues_norm_ex[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng', 'venue.location.postalCode']]
    df_venues_ex[city].columns = ['Name', 'Address', 'Latitude', 'Longitude', 'PostalCode']

    df_venues_ex[city]['Category'] = df_venues_norm_ex.apply(get_category_type, axis=1)

df_venues_ex

Total results in Detroit, MI: 59
Total results in Chicago, IL: 151
Total results in Boston, MA: 85
Total results in Philadelphia, PA: 119


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


{'Detroit, MI':                        Name                     Address   Latitude  Longitude  \
 0             Cass Corridog               4240 Cass Ave  42.351110 -83.063349   
 1                 City Bark            1222 Griswold st  42.332935 -83.049280   
 2                  PetSmart             5650 Mercury Dr  42.330686 -83.203075   
 3                 3Dogs1Cat            2472 Riopelle St  42.346373 -83.038368   
 4          Urban Pet Shoppe            2472 Riopelle St  42.346248 -83.038390   
 5                  PetSmart           3164 Dougall Ave.  42.269737 -83.009991   
 6                  Pet Valu      300 Tecumseh Road East  42.298502 -83.020557   
 7                  Pet Valu               Tecumseh Rd W  42.290019 -83.058210   
 8                  Pet Valu      1556 Huron Church Road  42.288329 -83.059693   
 9       Corbret's Pet Depot              3165 Walker Rd  42.281010 -82.981633   
 10              Detroit K-9           7030 Michigan Ave  42.331336 -83.131077   
 

Print results for each of the cities:

In [114]:
city = search_cities[0]
print(city)
df_venues_ex[city][0:10]

Detroit, MI


Unnamed: 0,Name,Address,Latitude,Longitude,PostalCode,Category
0,Cass Corridog,4240 Cass Ave,42.35111,-83.063349,48201,Pet Store
1,City Bark,1222 Griswold st,42.332935,-83.04928,48226,Pet Store
2,PetSmart,5650 Mercury Dr,42.330686,-83.203075,48126,Pet Store
3,3Dogs1Cat,2472 Riopelle St,42.346373,-83.038368,48207,Pet Store
4,Urban Pet Shoppe,2472 Riopelle St,42.346248,-83.03839,48207,Pet Service
5,PetSmart,3164 Dougall Ave.,42.269737,-83.009991,N9E 1S6,Pet Store
6,Pet Valu,300 Tecumseh Road East,42.298502,-83.020557,N8X 5E8,Pet Store
7,Pet Valu,Tecumseh Rd W,42.290019,-83.05821,N9C 3Z3,Pet Store
8,Pet Valu,1556 Huron Church Road,42.288329,-83.059693,N9C 3Z3,Pet Store
9,Corbret's Pet Depot,3165 Walker Rd,42.28101,-82.981633,N8W 3R6,Pet Store


In [115]:
city = search_cities[1]
print(city)
df_venues_ex[city][0:10]

Chicago, IL


Unnamed: 0,Name,Address,Latitude,Longitude,PostalCode,Category
0,Kriser's Natural Pet,1103 S. State St.,41.869137,-87.627229,60605,Pet Service
1,Bark N' Bites,702 W 35th St,41.830842,-87.64393,60616,Pet Store
2,Paw Naturals,932 W Monroe St,41.880418,-87.65093,60607,Pet Store
3,The Anti-Cruelty Society,169 West Grand Avenue,41.891375,-87.632548,60654,Animal Shelter
4,Tucker Pup's Pet Resort,219 North Carpenter Street,41.886281,-87.653276,60607,Pet Service
5,Kriser's Natural Pet,1658 N. Milwaukee Ave.,41.911733,-87.679972,60647,Pet Service
6,Doggy Style Pet Shop,2023 W Division St,41.903182,-87.678284,60622,Pet Store
7,K9 University Chicago,2945 W Lake St,41.884079,-87.700532,60612,Pet Store
8,PetSmart,1101 S Canal St,41.867523,-87.638877,60607,Pet Store
9,VCA Lake Shore Animal Hospital,960 W Chicago Ave,41.896566,-87.652178,60642,Veterinarian


In [116]:
city = search_cities[2]
print(city)
df_venues_ex[city][0:10]

Boston, MA


Unnamed: 0,Name,Address,Latitude,Longitude,PostalCode,Category
0,The Fish & Bone,217 Newbury St,42.350022,-71.081334,2116,Pet Store
1,The Urban Hound,129 Malden St,42.339344,-71.066116,2118,Pet Store
2,Polka Dog,256 Shawmut Ave,42.34334,-71.068586,2118,Pet Store
3,Pawsh Dog Boutique,31 Gloucester St,42.349394,-71.084243,2115,Pet Store
4,Red Dog Pet Resort & Spa,274 Southampton St,42.331243,-71.063957,2118,Pet Service
5,Unleashed by Petco,1310 Washington St,42.341936,-71.068186,2118,Pet Store
6,The Pet Shop,165 Harvard Ave,42.351437,-71.131609,2134,Pet Store
7,Polka Dog Bakery,42 South St,42.308198,-71.115421,2130,Pet Store
8,LaundroMutt,489 Concord Ave,42.386901,-71.140664,2138,Pet Store
9,Unleashed by Petco,5 Austin St,42.375598,-71.065395,2129,Pet Store


In [117]:
city = search_cities[3]
print(city)
df_venues_ex[city][0:10]

Philadelphia, PA


Unnamed: 0,Name,Address,Latitude,Longitude,PostalCode,Category
0,BONeJOUR Pet Supply,53 N 3rd St,39.951846,-75.145359,19106,Pet Store
1,Doggie Style,1635 Spruce St,39.947662,-75.169523,19103,Pet Store
2,Baltimore Pet Shoppe,4532 Baltimore Ave,39.948954,-75.213978,19143,Pet Store
3,Doggie Style,315 Market St,39.950295,-75.146308,19106,Pet Store
4,Fairmount Pet Shoppe,2024 Fairmount Ave,39.96728,-75.171041,19130,Pet Store
5,Doggie Style,2101 South St,39.945219,-75.177205,19146,Pet Store
6,Doggie Style,1700 E Passyunk Ave,39.928551,-75.165086,19148,Pet Store
7,PetSmart,1415 Washington Ave,39.938397,-75.166976,19146,Pet Store
8,PetSmart,1112 Chestnut St,39.950128,-75.159289,19107,Pet Store
9,Unleashed by Petco,1939 Callowhill St,39.960682,-75.170396,19130,Pet Store


In [13]:
for i in range(len(search_cities)):
    city = search_cities[i]
    print(city + ':')
    print('Number of rows in df_venues for city ' + city + ': ' + str(df_venues[city].shape[0]))
    print('Number of rows in df_venues_ex for city ' + city + ': ' + str(df_venues_ex[city].shape[0]))
#     print(df_venues[city])
#     print(df_venues_ex[city])
    print('---------------------')

Detroit, MI:
Number of rows in df_venues for city Detroit, MI: 50
Number of rows in df_venues_ex for city Detroit, MI: 21
---------------------
Chicago, IL:
Number of rows in df_venues for city Chicago, IL: 47
Number of rows in df_venues_ex for city Chicago, IL: 100
---------------------
Boston, MA:
Number of rows in df_venues for city Boston, MA: 50
Number of rows in df_venues_ex for city Boston, MA: 85
---------------------
Philadelphia, PA:
Number of rows in df_venues for city Philadelphia, PA: 50
Number of rows in df_venues_ex for city Philadelphia, PA: 92
---------------------


## Visualization of pet venues locations in the selected cities

Let's start focusing on data from Detroit.

### Detroit, MI

Use geopy library to get the latitude and longitude values of the city. Let's define a function to do this:

In [15]:
def search4coord(city_address):
    address = city_address
    #In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent city_explorer 
    geolocator = Nominatim(user_agent = "city_explorer") 
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    print('The geograpical coordinate of ' + city_address + ' are {}, {}.'.format(latitude, longitude))
    return latitude, longitude

Now we can obtain the latitude and longitude values of Detroit, MI:

In [119]:
city = 'Detroit, MI'
latitude, longitude = search4coord(city_address = city)

The geograpical coordinate of Detroit, MI are 42.3315509, -83.0466403.


Generate map centred around Detroit, MI:

In [120]:
venues_map = {}
venues_map = folium.Map(location = [latitude, longitude], zoom_start = 10)

#Display map
venues_map

Add a red circle marker to represent the center of Detroit, MI:

In [121]:
folium.CircleMarker( [latitude, longitude], radius = 10, color = 'red', popup = city, fill = True,
                    fill_color = 'red', fill_opacity = 0.6).add_to(venues_map)
venues_map

Add the pet venues as blue circle markers:

In [122]:
for lat, lng, label in zip(df_venues_ex[city]['Latitude'], df_venues_ex[city]['Longitude'], df_venues_ex[city]['Name']):
    folium.CircleMarker([lat, lng], radius = 5, color = 'blue', popup = label, fill = True,
                        fill_color='blue', fill_opacity=0.6).add_to(venues_map)
venues_map

### All cities

Define a function for creating the map:

In [123]:
def create_map(dataframe_venues, city, latitude, longitude): 
    venues_map = {}
    
    #Generate map centered around the city
    venues_map = folium.Map(location = [latitude, longitude], zoom_start = 9)
    venues_map
    
    #Add a red circle marker to represent the center of the city
    folium.CircleMarker([latitude, longitude], radius = 10, color = 'red', popup = city, fill = True,
                        fill_color = 'red', fill_opacity = 0.6).add_to(venues_map)
    
    #Add the pet shops as blue circle markers
    for lat, lng, label in zip(dataframe_venues[city]['Latitude'], dataframe_venues[city]['Longitude'], dataframe_venues[city]['Name']):
        folium.CircleMarker([lat, lng], radius = 5, color = 'blue', popup = label, fill = True,
                        fill_color='blue', fill_opacity=0.6).add_to(venues_map)
    #Display map
    display(venues_map)
    
    return venues_map

Create the map for every city:

In [124]:
maps = {}
for city in search_cities:
    print('City ' + city)
    latit, long = search4coord(city_address = city)

    # Total number of venues
    df_response_norm_ex = json_normalize(results_ex[city]['response'])
    total_ex = df_response_norm_ex['totalResults'][0]
    print('Total number of pet venues in '+ city + ': ' + str(total_ex))
    
    #Create map
    vm = create_map(dataframe_venues = df_venues_ex, city = city, latitude = latit, longitude = long)  
    maps[city] = vm

    print('-------------------------------------------------------------------------------')

City Detroit, MI
The geograpical coordinate of Detroit, MI are 42.3315509, -83.0466403.
Total number of pet venues in Detroit, MI: 59


-------------------------------------------------------------------------------
City Chicago, IL
The geograpical coordinate of Chicago, IL are 41.8755616, -87.6244212.
Total number of pet venues in Chicago, IL: 151


-------------------------------------------------------------------------------
City Boston, MA
The geograpical coordinate of Boston, MA are 42.3602534, -71.0582912.
Total number of pet venues in Boston, MA: 85


-------------------------------------------------------------------------------
City Philadelphia, PA
The geograpical coordinate of Philadelphia, PA are 39.9527237, -75.1635262.
Total number of pet venues in Philadelphia, PA: 119


-------------------------------------------------------------------------------


In [30]:
maps

{'Detroit, MI': <folium.folium.Map at 0x1d93243a808>,
 'Chicago, IL': <folium.folium.Map at 0x1d93243d148>,
 'Boston, MA': <folium.folium.Map at 0x1d932490708>,
 'Philadelphia, PA': <folium.folium.Map at 0x1d932445d48>}

## Analysis of each pet venue category

### Detroit, MI

Let's see the venues in Detroit, MI:

In [32]:
city = search_cities[0]
print(city)
detroit_venues = df_venues_ex[city]
detroit_venues

Detroit, MI


Unnamed: 0,Name,Address,Latitude,Longitude,PostalCode,Category
0,Cass Corridog,4240 Cass Ave,42.35111,-83.063349,48201,Pet Store
1,City Bark,1222 Griswold st,42.332935,-83.04928,48226,Pet Store
2,PetSmart,5650 Mercury Dr,42.330686,-83.203075,48126,Pet Store
3,3Dogs1Cat,2472 Riopelle St,42.346373,-83.038368,48207,Pet Store
4,Urban Pet Shoppe,2472 Riopelle St,42.346248,-83.03839,48207,Pet Service
5,PetSmart,3164 Dougall Ave.,42.269737,-83.009991,N9E 1S6,Pet Store
6,Pet Valu,300 Tecumseh Road East,42.298502,-83.020557,N8X 5E8,Pet Store
7,Pet Valu,Tecumseh Rd W,42.290019,-83.05821,N9C 3Z3,Pet Store
8,Pet Valu,1556 Huron Church Road,42.288329,-83.059693,N9C 3Z3,Pet Store
9,Corbret's Pet Depot,3165 Walker Rd,42.28101,-82.981633,N8W 3R6,Pet Store


Now we are going to group by 'Category' feature:

In [33]:
print('There are {} uniques categories.'.format(len(detroit_venues['Category'].unique())))
detroit_venues.groupby('Category').count()

There are 4 uniques categories.


Unnamed: 0_level_0,Name,Address,Latitude,Longitude,PostalCode
Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Animal Shelter,1,1,1,1,1
Pet Service,1,1,1,1,1
Pet Store,18,17,18,18,17
Veterinarian,1,1,1,1,1


Transform categorical variable 'Category' into one hot encoding features:

In [34]:
#One-hot encoding
detroit_venues_ohe = pd.get_dummies(detroit_venues[['Category']], prefix = "", prefix_sep = "")
detroit_venues_ohe.head()

Unnamed: 0,Animal Shelter,Pet Service,Pet Store,Veterinarian
0,0,0,1,0
1,0,0,1,0
2,0,0,1,0
3,0,0,1,0
4,0,1,0,0


Add column 'PostalCode' to the new dataframe:

In [35]:
detroit_venues_ohe['PostalCode'] = detroit_venues['PostalCode']
detroit_venues_ohe.head()

Unnamed: 0,Animal Shelter,Pet Service,Pet Store,Veterinarian,PostalCode
0,0,0,1,0,48201
1,0,0,1,0,48226
2,0,0,1,0,48126
3,0,0,1,0,48207
4,0,1,0,0,48207


Let's move column 'PostalCode' to the first column:

In [36]:
ordered_columns = [detroit_venues_ohe.columns[-1]] + list(detroit_venues_ohe.columns[:-1])
detroit_venues_ohe = detroit_venues_ohe[ordered_columns]
detroit_venues_ohe.head()

Unnamed: 0,PostalCode,Animal Shelter,Pet Service,Pet Store,Veterinarian
0,48201,0,0,1,0
1,48226,0,0,1,0
2,48126,0,0,1,0
3,48207,0,0,1,0
4,48207,0,1,0,0


The shape of the new dataframe is:

In [37]:
detroit_venues_ohe.shape

(21, 5)

Now let's see the analyis of postal codes:

In [38]:
print('There are {} uniques postal codes.'.format(len(detroit_venues['PostalCode'].unique())))
detroit_venues.groupby('PostalCode').count()

There are 19 uniques postal codes.


Unnamed: 0_level_0,Name,Address,Latitude,Longitude,Category
PostalCode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
48126,1,1,1,1,1
48201,1,1,1,1,1
48207,2,2,2,2,2
48210,1,1,1,1,1
48226,1,1,1,1,1
N8S 1T6,1,1,1,1,1
N8S 3M8,1,1,1,1,1
N8T 1C1,1,1,1,1,1
N8W 3R6,1,1,1,1,1
N8X 0A8,1,1,1,1,1


In [39]:
detroit_venues[detroit_venues['PostalCode'] == '48207']
# detroit_venues[detroit_venues['PostalCode'] == '48073']

Unnamed: 0,Name,Address,Latitude,Longitude,PostalCode,Category
3,3Dogs1Cat,2472 Riopelle St,42.346373,-83.038368,48207,Pet Store
4,Urban Pet Shoppe,2472 Riopelle St,42.346248,-83.03839,48207,Pet Service


Now we are going to group rows by postal code and by taking the mean of the frequency of occurrence of each category:

In [40]:
detroit_grouped = detroit_venues_ohe.groupby('PostalCode').mean().reset_index()
detroit_grouped

Unnamed: 0,PostalCode,Animal Shelter,Pet Service,Pet Store,Veterinarian
0,48126,0.0,0.0,1.0,0.0
1,48201,0.0,0.0,1.0,0.0
2,48207,0.0,0.5,0.5,0.0
3,48210,0.0,0.0,1.0,0.0
4,48226,0.0,0.0,1.0,0.0
5,N8S 1T6,0.0,0.0,1.0,0.0
6,N8S 3M8,0.0,0.0,1.0,0.0
7,N8T 1C1,0.0,0.0,1.0,0.0
8,N8W 3R6,0.0,0.0,1.0,0.0
9,N8X 0A8,0.0,0.0,1.0,0.0


Confirm the size of the dataframe again:

In [41]:
detroit_grouped.shape

(18, 5)

Let's print each postal code along with the top 5 most common pet venues. In the case of Detroit, we have only three different categories of pet venues:

In [42]:
def show_most_common_venues(df_grouped, num_top_venues):
    for pc in detroit_grouped['PostalCode']:
        print("Postal Code = " + pc)
        temp = df_grouped[df_grouped['PostalCode'] == pc].T.reset_index()
        temp.columns = ['Venue','freq']
        temp = temp.iloc[1:]
        temp['freq'] = temp['freq'].astype(float)
        temp = temp.round({'freq': 2})
        print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
        print('--------------------')

show_most_common_venues(df_grouped = detroit_grouped, num_top_venues = 5)

Postal Code = 48126
            Venue  freq
0       Pet Store   1.0
1  Animal Shelter   0.0
2     Pet Service   0.0
3    Veterinarian   0.0
--------------------
Postal Code = 48201
            Venue  freq
0       Pet Store   1.0
1  Animal Shelter   0.0
2     Pet Service   0.0
3    Veterinarian   0.0
--------------------
Postal Code = 48207
            Venue  freq
0     Pet Service   0.5
1       Pet Store   0.5
2  Animal Shelter   0.0
3    Veterinarian   0.0
--------------------
Postal Code = 48210
            Venue  freq
0       Pet Store   1.0
1  Animal Shelter   0.0
2     Pet Service   0.0
3    Veterinarian   0.0
--------------------
Postal Code = 48226
            Venue  freq
0       Pet Store   1.0
1  Animal Shelter   0.0
2     Pet Service   0.0
3    Veterinarian   0.0
--------------------
Postal Code = N8S 1T6
            Venue  freq
0       Pet Store   1.0
1  Animal Shelter   0.0
2     Pet Service   0.0
3    Veterinarian   0.0
--------------------
Postal Code = N8S 3M8
          

Let's create a new dataframe to display the most common venues for each postal code. To do this, we create a first function to sort the venues in descending order and a second function to create the new dataframe with the top venues.

In [43]:
#Define a function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False) 
    return row_categories_sorted.index.values[0:num_top_venues]

#Create a new dataframe with the top venues
def return_df_top_venues(df_grouped, num_top_venues):
    indicators = ['st', 'nd', 'rd']
    columns = ['PostalCode']
    for ind in np.arange(num_top_venues):
        try:
            columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
        except:
            columns.append('{}th Most Common Venue'.format(ind+1))

    #Create a new dataframe
    postalcode_venues_sorted = pd.DataFrame(columns = columns)
    postalcode_venues_sorted['PostalCode'] = df_grouped['PostalCode']

    for ind in np.arange(df_grouped.shape[0]):
        postalcode_venues_sorted.iloc[ind, 1:] = return_most_common_venues(df_grouped.iloc[ind, :], num_top_venues)

    return(postalcode_venues_sorted)

Now we are going to create the new dataframe and display the top 3 venues for each postal code:

In [126]:
detroit_postalcode_top_venues = return_df_top_venues(df_grouped = detroit_grouped, num_top_venues = 3)
detroit_postalcode_top_venues

Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,48126,Pet Store,Veterinarian,Pet Service
1,48201,Pet Store,Veterinarian,Pet Service
2,48207,Pet Store,Pet Service,Veterinarian
3,48210,Pet Store,Veterinarian,Pet Service
4,48226,Pet Store,Veterinarian,Pet Service
5,N8S 1T6,Pet Store,Veterinarian,Pet Service
6,N8S 3M8,Pet Store,Veterinarian,Pet Service
7,N8T 1C1,Pet Store,Veterinarian,Pet Service
8,N8W 3R6,Pet Store,Veterinarian,Pet Service
9,N8X 0A8,Pet Store,Veterinarian,Pet Service


Let's create a function for processing the rest of the cities easily:

In [55]:
def processing_venues(city, city_venues, num_top_venues):
    
    print(city)
    print('\n')
    
    #Group by 'Category' feature:
    print('Show city venues grouped by Category:')
    print('There are {} uniques categories.'.format(len(city_venues['Category'].unique())))
    display(city_venues.groupby('Category').count())
    print('\n')

    #Transform categorical variable 'Category' into one hot encoding feature:
    city_venues_ohe = pd.get_dummies(city_venues[['Category']], prefix="", prefix_sep="")

    #Add columns 'PostalCode' to the new dataframe
    city_venues_ohe['PostalCode'] = city_venues['PostalCode']

    #Let's move the columns 'PostalCode' to the first column
    ordered_columns = [city_venues_ohe.columns[-1]] + list(city_venues_ohe.columns[:-1])
    city_venues_ohe = city_venues_ohe[ordered_columns]
    print('Show new dataframe with feature Category one-hot encoded and PostalCode:')
    display(city_venues_ohe)
    
    #See the shape of the new dataframe
    print('Shape of dataframe: ')
    print(city_venues_ohe.shape)
    print('\n')

    #Group by 'PostalCode' feature:
    print('Show city venues grouped by PostalCode:')
    print('There are {} uniques postal codes.'.format(len(city_venues['PostalCode'].unique())))
    display(city_venues.groupby('PostalCode').count())
    print('\n')
    
    #Group rows by postal code and by taking the mean of the frequency of occurrence of each category
    print('Show city venues grouped by PostalCode and by taking the mean of the frequency of ocurrence of each category:')
    city_grouped = city_venues_ohe.groupby('PostalCode').mean().reset_index()
    display(city_grouped)
    #Confirm the shape of the new dataframe
    print('Shape of dataframe: ')
    print(city_venues_ohe.shape)
    print('\n')
    
#     #Print each postal code along with the top 5 most common pet venues
#     print('Show most common pet venues by postal code:')
#     show_most_common_venues(df_grouped = city_grouped, num_top_venues = num_top_venues)
#     print('\n')
    
    #Create a new dataframe with the top 3 venues for each postal code
    city_postalcode_top_venues = return_df_top_venues(df_grouped = city_grouped, num_top_venues = num_top_venues)
    print('Show top ' + str(num_top_venues) + ' most common pet venues by postal code:')
    display(city_postalcode_top_venues)
    print('\n')
    
    return(city_grouped, city_postalcode_top_venues)

### Chicago, IL

We repeat the same procedure for the city of Chicago:

In [59]:
city = search_cities[1]
city_venues = df_venues_ex[city] #Load venues for the city
num_top_venues = 5 #Number of top most common pet venues

chicago_grouped, chicago_postalcode_top_venues = processing_venues(city, city_venues, num_top_venues)

Chicago, IL


Show city venues grouped by Category:
There are 7 uniques categories.


Unnamed: 0_level_0,Name,Address,Latitude,Longitude,PostalCode
Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Animal Shelter,1,1,1,1,1
Aquarium,1,1,1,1,1
Dog Run,1,1,1,1,1
Pet Service,10,10,10,10,10
Pet Store,82,77,82,82,79
Salon / Barbershop,1,1,1,1,1
Veterinarian,4,4,4,4,4




Show new dataframe with feature Category one-hot encoded and PostalCode:


Unnamed: 0,PostalCode,Animal Shelter,Aquarium,Dog Run,Pet Service,Pet Store,Salon / Barbershop,Veterinarian
0,60605,0,0,0,1,0,0,0
1,60616,0,0,0,0,1,0,0
2,60607,0,0,0,0,1,0,0
3,60654,1,0,0,0,0,0,0
4,60607,0,0,0,1,0,0,0
...,...,...,...,...,...,...,...,...
95,60647,0,0,0,0,1,0,0
96,60614,0,0,0,0,1,0,0
97,60611,0,0,0,0,1,0,0
98,60632,0,0,0,0,1,0,0


Shape of dataframe: 
(100, 8)


Show city venues grouped by PostalCode:
There are 25 uniques postal codes.


Unnamed: 0_level_0,Name,Address,Latitude,Longitude,Category
PostalCode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
60601,1,1,1,1,1
60605,3,3,3,3,3
60607,6,6,6,6,6
60608,4,4,4,4,4
60610,9,9,9,9,9
60611,4,4,4,4,4
60612,4,4,4,4,4
60613,1,1,1,1,1
60614,13,12,13,13,13
60615,2,2,2,2,2




Show city venues grouped by PostalCode and by taking the mean of the frequency of ocurrence of each category:


Unnamed: 0,PostalCode,Animal Shelter,Aquarium,Dog Run,Pet Service,Pet Store,Salon / Barbershop,Veterinarian
0,60601,0.0,0.0,1.0,0.0,0.0,0.0,0.0
1,60605,0.0,0.333333,0.0,0.333333,0.333333,0.0,0.0
2,60607,0.0,0.0,0.0,0.166667,0.833333,0.0,0.0
3,60608,0.0,0.0,0.0,0.0,0.75,0.25,0.0
4,60610,0.0,0.0,0.0,0.0,0.888889,0.0,0.111111
5,60611,0.0,0.0,0.0,0.0,1.0,0.0,0.0
6,60612,0.0,0.0,0.0,0.0,0.75,0.0,0.25
7,60613,0.0,0.0,0.0,0.0,1.0,0.0,0.0
8,60614,0.0,0.0,0.0,0.076923,0.923077,0.0,0.0
9,60615,0.0,0.0,0.0,0.0,1.0,0.0,0.0


Shape of dataframe: 
(100, 8)


Show top 5 most common pet venues by postal code:


Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,60601,Dog Run,Veterinarian,Salon / Barbershop,Pet Store,Pet Service
1,60605,Pet Store,Pet Service,Aquarium,Veterinarian,Salon / Barbershop
2,60607,Pet Store,Pet Service,Veterinarian,Salon / Barbershop,Dog Run
3,60608,Pet Store,Salon / Barbershop,Veterinarian,Pet Service,Dog Run
4,60610,Pet Store,Veterinarian,Salon / Barbershop,Pet Service,Dog Run
5,60611,Pet Store,Veterinarian,Salon / Barbershop,Pet Service,Dog Run
6,60612,Pet Store,Veterinarian,Salon / Barbershop,Pet Service,Dog Run
7,60613,Pet Store,Veterinarian,Salon / Barbershop,Pet Service,Dog Run
8,60614,Pet Store,Pet Service,Veterinarian,Salon / Barbershop,Dog Run
9,60615,Pet Store,Veterinarian,Salon / Barbershop,Pet Service,Dog Run






### Boston, MA

In [60]:
city = search_cities[2]
city_venues = df_venues_ex[city] #Load venues for the city
num_top_venues = 5 #Number of top most common pet venues

boston_grouped, boston_postalcode_top_venues = processing_venues(city, city_venues, num_top_venues)

Boston, MA


Show city venues grouped by Category:
There are 8 uniques categories.


Unnamed: 0_level_0,Name,Address,Latitude,Longitude,PostalCode
Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Animal Shelter,1,1,1,1,1
Aquarium,2,2,2,2,2
Dog Run,2,0,2,2,0
Office,1,1,1,1,1
Park,1,1,1,1,1
Pet Service,6,6,6,6,6
Pet Store,67,62,67,67,63
Veterinarian,5,5,5,5,5




Show new dataframe with feature Category one-hot encoded and PostalCode:


Unnamed: 0,PostalCode,Animal Shelter,Aquarium,Dog Run,Office,Park,Pet Service,Pet Store,Veterinarian
0,02116,0,0,0,0,0,0,1,0
1,02118,0,0,0,0,0,0,1,0
2,02118,0,0,0,0,0,0,1,0
3,02115,0,0,0,0,0,0,1,0
4,02118,0,0,0,0,0,1,0,0
...,...,...,...,...,...,...,...,...,...
80,01906,0,0,0,0,0,0,1,0
81,02144,0,0,0,0,0,0,0,1
82,02116,1,0,0,0,0,0,0,0
83,,0,0,1,0,0,0,0,0


Shape of dataframe: 
(85, 9)


Show city venues grouped by PostalCode:
There are 37 uniques postal codes.


Unnamed: 0_level_0,Name,Address,Latitude,Longitude,Category
PostalCode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
01906,1,1,1,1,1
02108,1,1,1,1,1
02110,2,2,2,2,2
02111,1,1,1,1,1
02113,1,1,1,1,1
02114,1,1,1,1,1
02115,1,1,1,1,1
02116,4,4,4,4,4
02118,6,6,6,6,6
02122,3,3,3,3,3




Show city venues grouped by PostalCode and by taking the mean of the frequency of ocurrence of each category:


Unnamed: 0,PostalCode,Animal Shelter,Aquarium,Dog Run,Office,Park,Pet Service,Pet Store,Veterinarian
0,01906,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
1,02108,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
2,02110,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
3,02111,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
4,02113,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
5,02114,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
6,02115,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
7,02116,0.25,0.0,0.0,0.0,0.0,0.0,0.75,0.0
8,02118,0.0,0.0,0.0,0.0,0.0,0.166667,0.833333,0.0
9,02122,0.0,0.0,0.0,0.0,0.0,0.666667,0.333333,0.0


Shape of dataframe: 
(85, 9)


Show top 5 most common pet venues by postal code:


Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,01906,Pet Store,Veterinarian,Pet Service,Park,Office
1,02108,Pet Store,Veterinarian,Pet Service,Park,Office
2,02110,Aquarium,Veterinarian,Pet Store,Pet Service,Park
3,02111,Pet Store,Veterinarian,Pet Service,Park,Office
4,02113,Pet Store,Veterinarian,Pet Service,Park,Office
5,02114,Park,Veterinarian,Pet Store,Pet Service,Office
6,02115,Pet Store,Veterinarian,Pet Service,Park,Office
7,02116,Pet Store,Animal Shelter,Veterinarian,Pet Service,Park
8,02118,Pet Store,Pet Service,Veterinarian,Park,Office
9,02122,Pet Service,Pet Store,Veterinarian,Park,Office






### Philadelphia, PA

In [61]:
city = search_cities[3]
city_venues = df_venues_ex[city] #Load venues for the city
num_top_venues = 5 #Number of top most common pet venues

philly_grouped, philly_postalcode_top_venues = processing_venues(city, city_venues, num_top_venues)

Philadelphia, PA


Show city venues grouped by Category:
There are 6 uniques categories.


Unnamed: 0_level_0,Name,Address,Latitude,Longitude,PostalCode
Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Aquarium,2,2,2,2,2
Miscellaneous Shop,2,2,2,2,2
Park,2,2,2,2,2
Pet Service,8,7,8,8,8
Pet Store,72,69,72,72,66
Veterinarian,6,5,6,6,6




Show new dataframe with feature Category one-hot encoded and PostalCode:


Unnamed: 0,PostalCode,Aquarium,Miscellaneous Shop,Park,Pet Service,Pet Store,Veterinarian
0,19106,0,0,0,0,1,0
1,19103,0,0,0,0,1,0
2,19143,0,0,0,0,1,0
3,19106,0,0,0,0,1,0
4,19130,0,0,0,0,1,0
...,...,...,...,...,...,...,...
87,19127,0,0,1,0,0,0
88,08103,1,0,0,0,0,0
89,08109,0,0,0,0,0,1
90,08105,0,1,0,0,0,0


Shape of dataframe: 
(92, 7)


Show city venues grouped by PostalCode:
There are 37 uniques postal codes.


Unnamed: 0_level_0,Name,Address,Latitude,Longitude,Category
PostalCode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
8002,2,2,2,2,2
8003,1,1,1,1,1
8030,3,3,3,3,3
8103,2,2,2,2,2
8105,1,1,1,1,1
8106,2,2,2,2,2
8108,1,1,1,1,1
8109,3,3,3,3,3
8110,1,1,1,1,1
8332,1,1,1,1,1




Show city venues grouped by PostalCode and by taking the mean of the frequency of ocurrence of each category:


Unnamed: 0,PostalCode,Aquarium,Miscellaneous Shop,Park,Pet Service,Pet Store,Veterinarian
0,8002,0.0,0.0,0.0,0.0,1.0,0.0
1,8003,0.0,0.0,0.0,1.0,0.0,0.0
2,8030,0.0,0.0,0.0,0.0,1.0,0.0
3,8103,1.0,0.0,0.0,0.0,0.0,0.0
4,8105,0.0,1.0,0.0,0.0,0.0,0.0
5,8106,0.0,0.0,0.0,0.0,1.0,0.0
6,8108,0.0,0.0,1.0,0.0,0.0,0.0
7,8109,0.0,0.0,0.0,0.0,0.666667,0.333333
8,8110,0.0,0.0,0.0,0.0,0.0,1.0
9,8332,0.0,0.0,0.0,0.0,1.0,0.0


Shape of dataframe: 
(92, 7)


Show top 5 most common pet venues by postal code:


Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,8002,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
1,8003,Pet Service,Veterinarian,Pet Store,Park,Miscellaneous Shop
2,8030,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
3,8103,Aquarium,Veterinarian,Pet Store,Pet Service,Park
4,8105,Miscellaneous Shop,Veterinarian,Pet Store,Pet Service,Park
5,8106,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
6,8108,Park,Veterinarian,Pet Store,Pet Service,Miscellaneous Shop
7,8109,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
8,8110,Veterinarian,Pet Store,Pet Service,Park,Miscellaneous Shop
9,8332,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop






Finally, let's check the size of the output dataframes:

In [62]:
print(philly_grouped.shape)
print('There are {} uniques postal codes.'.format(len(philly_grouped['PostalCode'].unique())))

(36, 7)
There are 36 uniques postal codes.


In [63]:
print(philly_postalcode_top_venues.shape)
print('There are {} uniques postal codes.'.format(len(philly_postalcode_top_venues['PostalCode'].unique())))

(36, 6)
There are 36 uniques postal codes.


## Clustering

We are going to consider now the Philadelphia data and group them into 5 different clusters.

### Processing data

In [131]:
city = search_cities[3]
city_venues = df_venues_ex[city] #Venues data
city_grouped = philly_grouped #Grouped data
city_postalcode_sorted = philly_postalcode_top_venues #Most common venues

print(city)
print('\n')
print('City venues:')
display(city_venues)
print('City grouped data:')
display(city_grouped)

Philadelphia, PA


City venues:


Unnamed: 0,Name,Address,Latitude,Longitude,PostalCode,Category
0,BONeJOUR Pet Supply,53 N 3rd St,39.951846,-75.145359,19106,Pet Store
1,Doggie Style,1635 Spruce St,39.947662,-75.169523,19103,Pet Store
2,Baltimore Pet Shoppe,4532 Baltimore Ave,39.948954,-75.213978,19143,Pet Store
3,Doggie Style,315 Market St,39.950295,-75.146308,19106,Pet Store
4,Fairmount Pet Shoppe,2024 Fairmount Ave,39.967280,-75.171041,19130,Pet Store
...,...,...,...,...,...,...
87,Pretzel Park,4300 Silverwood St.,40.025643,-75.221071,19127,Park
88,Zone A,1 Riverside Dr,39.945018,-75.131537,08103,Aquarium
89,Pennpet Clinic,3495 Haddonfield Rd,39.961259,-75.025555,08109,Veterinarian
90,Cleaning Revolution,3087 Stevens St,39.945255,-75.078951,08105,Miscellaneous Shop


City grouped data:


Unnamed: 0,PostalCode,Aquarium,Miscellaneous Shop,Park,Pet Service,Pet Store,Veterinarian
0,8002,0.0,0.0,0.0,0.0,1.0,0.0
1,8003,0.0,0.0,0.0,1.0,0.0,0.0
2,8030,0.0,0.0,0.0,0.0,1.0,0.0
3,8103,1.0,0.0,0.0,0.0,0.0,0.0
4,8105,0.0,1.0,0.0,0.0,0.0,0.0
5,8106,0.0,0.0,0.0,0.0,1.0,0.0
6,8108,0.0,0.0,1.0,0.0,0.0,0.0
7,8109,0.0,0.0,0.0,0.0,0.666667,0.333333
8,8110,0.0,0.0,0.0,0.0,0.0,1.0
9,8332,0.0,0.0,0.0,0.0,1.0,0.0


Let's run now a k-means algorithm to cluster the postal codes into 5 different clusters:

In [132]:
#Set number of clusters
kclusters = 5

#Drop column 'PostalCode'
city_grouped_clustering = city_grouped.drop('PostalCode', 1)
display(city_grouped_clustering)

#Run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(city_grouped_clustering)
display(kmeans)

#Check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

Unnamed: 0,Aquarium,Miscellaneous Shop,Park,Pet Service,Pet Store,Veterinarian
0,0.0,0.0,0.0,0.0,1.0,0.0
1,0.0,0.0,0.0,1.0,0.0,0.0
2,0.0,0.0,0.0,0.0,1.0,0.0
3,1.0,0.0,0.0,0.0,0.0,0.0
4,0.0,1.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,1.0,0.0
6,0.0,0.0,1.0,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0,0.666667,0.333333
8,0.0,0.0,0.0,0.0,0.0,1.0
9,0.0,0.0,0.0,0.0,1.0,0.0


KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=5, n_init=10, n_jobs=None, precompute_distances='auto',
       random_state=0, tol=0.0001, verbose=0)

array([0, 1, 0, 2, 3, 0, 2, 0, 4, 0])

Now we create a new dataframe which includes the cluster and the top 5 venues for each postal code:

In [134]:
#Add clustering labels
city_postalcode_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
display(city_postalcode_sorted)

Unnamed: 0,Cluster Labels,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,0,8002,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
1,1,8003,Pet Service,Veterinarian,Pet Store,Park,Miscellaneous Shop
2,0,8030,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
3,2,8103,Aquarium,Veterinarian,Pet Store,Pet Service,Park
4,3,8105,Miscellaneous Shop,Veterinarian,Pet Store,Pet Service,Park
5,0,8106,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
6,2,8108,Park,Veterinarian,Pet Store,Pet Service,Miscellaneous Shop
7,0,8109,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
8,4,8110,Veterinarian,Pet Store,Pet Service,Park,Miscellaneous Shop
9,0,8332,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop


Let's add the latitude and longitude coordinates for the postal codes to the city_postalcode_top_venues dataframe. First, we have to delete 'Name', 'Address' and 'Category' columns from city_venues dataframe. Now we have the city_venues_proc dataframe with postal codes and their corresponding latitude and longitude values:

In [75]:
city_venues_proc = city_venues
city_venues_proc = city_venues_proc.drop(['Name', 'Address', 'Category'], 1)
print('There are {} uniques postal codes.'.format(len(city_venues_proc['PostalCode'].unique())))
display(city_venues_proc)

There are 37 uniques postal codes.


Unnamed: 0,Latitude,Longitude,PostalCode
0,39.951846,-75.145359,19106
1,39.947662,-75.169523,19103
2,39.948954,-75.213978,19143
3,39.950295,-75.146308,19106
4,39.967280,-75.171041,19130
...,...,...,...
87,40.025643,-75.221071,19127
88,39.945018,-75.131537,08103
89,39.961259,-75.025555,08109
90,39.945255,-75.078951,08105


Now we drop the duplicated postal codes:

In [76]:
city_venues_proc = city_venues_proc.sort_values('PostalCode', ascending = True)
city_venues_proc = city_venues_proc.drop_duplicates(subset = 'PostalCode', keep = 'first')
display(city_venues_proc)
print(city_venues_proc.shape)

Unnamed: 0,Latitude,Longitude,PostalCode
15,39.936052,-75.025914,8002.0
84,39.93484,-75.03073,8003.0
22,39.879206,-75.111969,8030.0
19,39.945932,-75.131219,8103.0
90,39.945255,-75.078951,8105.0
83,39.890333,-75.066386,8106.0
30,39.911612,-75.08129,8108.0
25,39.929749,-75.083877,8109.0
86,39.96327,-75.050756,8110.0
78,39.910248,-75.049236,8332.0


(37, 3)


Let's merge city_postalcode_top_venues with city_venues_proc to add latitude and longitude vaues for each postal code:

In [77]:
city_merged = city_venues_proc.join(city_postalcode_sorted.set_index('PostalCode'), on = 'PostalCode')
display(city_merged)
print(city_merged.shape)

Unnamed: 0,Latitude,Longitude,PostalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
15,39.936052,-75.025914,8002.0,0.0,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
84,39.93484,-75.03073,8003.0,1.0,Pet Service,Veterinarian,Pet Store,Park,Miscellaneous Shop
22,39.879206,-75.111969,8030.0,0.0,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
19,39.945932,-75.131219,8103.0,2.0,Aquarium,Veterinarian,Pet Store,Pet Service,Park
90,39.945255,-75.078951,8105.0,3.0,Miscellaneous Shop,Veterinarian,Pet Store,Pet Service,Park
83,39.890333,-75.066386,8106.0,0.0,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
30,39.911612,-75.08129,8108.0,2.0,Park,Veterinarian,Pet Store,Pet Service,Miscellaneous Shop
25,39.929749,-75.083877,8109.0,0.0,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
86,39.96327,-75.050756,8110.0,4.0,Veterinarian,Pet Store,Pet Service,Park,Miscellaneous Shop
78,39.910248,-75.049236,8332.0,0.0,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop


(37, 9)


Let's drop the rows with NaN postal code:

In [78]:
city_merged = city_merged.dropna()
display(city_merged)
print(city_merged.shape)

Unnamed: 0,Latitude,Longitude,PostalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
15,39.936052,-75.025914,8002,0.0,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
84,39.93484,-75.03073,8003,1.0,Pet Service,Veterinarian,Pet Store,Park,Miscellaneous Shop
22,39.879206,-75.111969,8030,0.0,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
19,39.945932,-75.131219,8103,2.0,Aquarium,Veterinarian,Pet Store,Pet Service,Park
90,39.945255,-75.078951,8105,3.0,Miscellaneous Shop,Veterinarian,Pet Store,Pet Service,Park
83,39.890333,-75.066386,8106,0.0,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
30,39.911612,-75.08129,8108,2.0,Park,Veterinarian,Pet Store,Pet Service,Miscellaneous Shop
25,39.929749,-75.083877,8109,0.0,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
86,39.96327,-75.050756,8110,4.0,Veterinarian,Pet Store,Pet Service,Park,Miscellaneous Shop
78,39.910248,-75.049236,8332,0.0,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop


(36, 9)


### Creating map of clusters

We visualize the resulting clusters in a map. First, we create a new map centred around the city:

In [135]:
city = search_cities[3]
print(city)
latit, long = search4coord(city_address = city)

map_clusters = folium.Map(location = [latit, long], zoom_start = 11)
map_clusters

Philadelphia, PA
The geograpical coordinate of Philadelphia, PA are 39.9527237, -75.1635262.


We set a different color for each of the clusters:

In [80]:
print('Number of clusters: ' + str(kclusters))
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

Number of clusters: 5


Finally, we add the markers of the venues to the map:

In [136]:
markers_colors = []
for lat, lon, poi, cluster in zip(city_merged['Latitude'], city_merged['Longitude'], city_merged['PostalCode'], city_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html = True)
    folium.CircleMarker( [lat, lon], radius = 5, popup = label, 
        color = rainbow[int(cluster)-1],
        fill = True,
        fill_color = rainbow[int(cluster)-1], 
        fill_opacity = 0.7).add_to(map_clusters)
    
map_clusters

In [98]:
def create_map_clusters(city_merged, latitude, longitude, kclusters): 
    map_clusters = {}
    
    #Create a new map centered around the city:
    map_clusters = folium.Map(location = [latitude, longitude], zoom_start = 11)
    map_clusters

    #Set the different colors for the clusters
    print('Number of clusters: ' + str(kclusters))
    x = np.arange(kclusters)
    ys = [i + x + (i*x)**2 for i in range(kclusters)]
    colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
    rainbow = [colors.rgb2hex(i) for i in colors_array]

    #Let's add the markers to the map
    markers_colors = []
    for lat, lon, poi, cluster in zip(city_merged['Latitude'], city_merged['Longitude'], city_merged['PostalCode'], city_merged['Cluster Labels']):
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html = True)
        folium.CircleMarker( [lat, lon], radius = 5, popup = label, 
            color = rainbow[int(cluster)-1],
            fill = True,
            fill_color = rainbow[int(cluster)-1], 
            fill_opacity = 0.7).add_to(map_clusters)

    #Display map
    display(map_clusters)

In [99]:
city = search_cities[3]
print(city)
latit, long = search4coord(city_address = city)

create_map_clusters(city_merged = city_merged, latitude = latit, longitude = long, kclusters = kclusters)

Philadelphia, PA
The geograpical coordinate of Philadelphia, PA are 39.9527237, -75.1635262.
Number of clusters: 5


## Examine clusters

Now we can examine each of the clusters and see the venues categories distinguish by each cluster.

In [84]:
city_merged.head()

Unnamed: 0,Latitude,Longitude,PostalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
15,39.936052,-75.025914,8002,0.0,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
84,39.93484,-75.03073,8003,1.0,Pet Service,Veterinarian,Pet Store,Park,Miscellaneous Shop
22,39.879206,-75.111969,8030,0.0,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
19,39.945932,-75.131219,8103,2.0,Aquarium,Veterinarian,Pet Store,Pet Service,Park
90,39.945255,-75.078951,8105,3.0,Miscellaneous Shop,Veterinarian,Pet Store,Pet Service,Park


In [85]:
city_merged.columns

Index(['Latitude', 'Longitude', 'PostalCode', 'Cluster Labels',
       '1st Most Common Venue', '2nd Most Common Venue',
       '3rd Most Common Venue', '4th Most Common Venue',
       '5th Most Common Venue'],
      dtype='object')

### Cluster 1

In [86]:
city_merged.loc[ city_merged['Cluster Labels'] == 0, city_merged.columns[[2] +  list(range(4, city_merged.shape[1])) ]]

Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
15,8002,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
22,8030,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
83,8106,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
25,8109,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
78,8332,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
85,19003,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
67,19004,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
75,19050,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
14,19072,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop
23,19096,Pet Store,Veterinarian,Pet Service,Park,Miscellaneous Shop


### Cluster 2

In [87]:
city_merged.loc[ city_merged['Cluster Labels'] == 1, city_merged.columns[[2] +  list(range(4, city_merged.shape[1])) ]]

Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
84,8003,Pet Service,Veterinarian,Pet Store,Park,Miscellaneous Shop
11,19079,Pet Service,Veterinarian,Pet Store,Park,Miscellaneous Shop
79,19083,Pet Service,Veterinarian,Pet Store,Park,Miscellaneous Shop


### Cluster 3

In [88]:
city_merged.loc[ city_merged['Cluster Labels'] == 2, city_merged.columns[[2] +  list(range(4, city_merged.shape[1])) ]]

Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
19,8103,Aquarium,Veterinarian,Pet Store,Pet Service,Park
30,8108,Park,Veterinarian,Pet Store,Pet Service,Miscellaneous Shop
77,19127,Pet Store,Pet Service,Park,Veterinarian,Miscellaneous Shop


### Cluster 4

In [89]:
city_merged.loc[ city_merged['Cluster Labels'] == 3, city_merged.columns[[2] +  list(range(4, city_merged.shape[1])) ]]

Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
90,8105,Miscellaneous Shop,Veterinarian,Pet Store,Pet Service,Park
20,19082,Pet Store,Miscellaneous Shop,Veterinarian,Pet Service,Park


### Cluster 5

In [90]:
city_merged.loc[ city_merged['Cluster Labels'] == 4, city_merged.columns[[2] +  list(range(4, city_merged.shape[1])) ]]

Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
86,8110,Veterinarian,Pet Store,Pet Service,Park,Miscellaneous Shop


## Conclusion

In summary,
- Cluster 1 corresponds to areas that have mostly pet stores, followed by veterinarians.
- Cluster 2 has mostly pet services, followed by veterinarians.
- Cluster 3 most common venues are a mix of aquarium, park and pet store. 
- Cluster 4 most common venues are miscellaneous shops and pet stores in first place.
- Cluster 5 is composed of an only postal code, where most common venue is veterinarian.

In conclusion, most pet-friendly postal codes are in clusters 1 and 2, since they have much more pet venues around. If you want to move to an area plenty of pet stores, cluster 1 should be a good choice, and if you prefer to live near venues with pet services, cluster 2 will be recommended. However, if you want to live close to a park, postal codes in cluster 3 will be optimal for you. Moreover, miscellaneous shops are closer to areas of postal codes of cluster 4. Finally, veterinarians are in areas of cluster 5.