# The Battle of Neighborhood 2

## Tourist guide for all over the world

### INTRODUCTION

Where should an art lover travel to satisfy a hunger for varied works of art? The world is full of wonders, and perhaps the most ambiguous of them are the ones we call art. Many people dedicate themselves to traveling the world to discover these ambiguities. This project looks for art-related venues in the capital cities of all countries, clusters them, and reveals how cities differ in their art venues even when they are geographically close. Travel companies and their customers may therefore find this project useful for identifying similar and different places around the world.

### DATA

To accomplish this goal, the latitudes and longitudes of the capital cities of all countries are required. simplemaps.com offers a simple, accurate and up-to-date database of the world's cities and their locations. From this data, I select the capital cities and, using their latitude and longitude values, explore the venues in the "art" section around each city with the Foursquare API.

The simplemaps data contains the city name, its latitude and longitude, and the country name, along with many other features, for cities around the world. I keep only these four features and then filter the capital city of each country. After cleaning, the data covers 225 countries with 225 capital cities.
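
The filtering step described above can be sketched as follows. This is a minimal illustration rather than the code actually used: the column names (`city`, `lat`, `lng`, `country`, `capital`) and the `"primary"` marker for national capitals are assumptions about the file layout, and the four rows are illustrative stand-ins for the real file.

```python
import pandas as pd

# Illustrative rows only; the real file would be loaded with pd.read_csv.
# Column names and the "primary" capital marker are assumptions.
cities = pd.DataFrame(
    {
        "city": ["Paris", "Lyon", "Tokyo", "Osaka"],
        "lat": [48.8566, 45.7600, 35.6897, 34.6937],
        "lng": [2.3522, 4.8400, 139.6922, 135.5022],
        "country": ["France", "France", "Japan", "Japan"],
        "capital": ["primary", "admin", "primary", "admin"],
    }
)

# Keep only national capitals and the four features used later,
# with at most one capital per country.
capitals = (
    cities.loc[cities["capital"] == "primary", ["city", "lat", "lng", "country"]]
    .drop_duplicates(subset="country")
    .reset_index(drop=True)
)
print(capitals)
```

Dropping duplicates on the country column is one way to end up with exactly one capital per country, which matches the 225-to-225 count reported above.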

I then used the location data to explore nearby art venues through the Foursquare API. Using the "explore" option, I look for the top 25 art venues within a 10 km radius of each city center, which yields the "Venue", "Venue Latitude", "Venue Longitude" and "Venue Category" columns with 3316 rows in total.
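
As a rough sketch of that request, the query parameters for one city could be assembled as below. The helper name `build_explore_params` is hypothetical, the placeholder credentials are not real, and the `section="arts"` value is an assumption about how the art filter is spelled in the legacy v2 `venues/explore` endpoint.

```python
def build_explore_params(lat, lng, client_id, client_secret,
                         version="20181224", section="arts",
                         radius_m=10_000, limit=25):
    """Return query parameters for one 'explore' request around a point."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",   # latitude,longitude of the city center
        "section": section,     # assumed name of the art section
        "radius": radius_m,     # 10 km expressed in meters
        "limit": limit,         # top 25 venues
    }

params = build_explore_params(48.8566, 2.3522, "YOUR_ID", "YOUR_SECRET")
print(params["ll"], params["radius"], params["limit"])
```

A dictionary like this could be passed to `requests.get(url, params=params)`, which avoids formatting the query string by hand.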

### METHODOLOGY

This analysis builds on the basic skills from the week 3 lab. It relies mainly on the Foursquare API to retrieve the venues around each neighborhood, then groups the venues by neighborhood, counts them, and extracts the 10 most common venue categories for each neighborhood.
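
Before the full notebook cells, the grouping, counting and ranking steps can be sketched on a toy table. The neighborhood names and categories here are illustrative, and the example keeps the top two categories instead of ten to stay short.

```python
import pandas as pd

# Toy venue table; neighborhood and category values are illustrative.
venues = pd.DataFrame(
    {
        "Neighborhood": ["A", "A", "A", "B", "B"],
        "Venue Category": ["Café", "Café", "Park", "Bar", "Park"],
    }
)

# Count venues per neighborhood.
venue_counts = venues.groupby("Neighborhood").size()

# One-hot encode the categories, then average per neighborhood to get
# the share of each category within that neighborhood.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
frequencies = onehot.groupby("Neighborhood").mean()

# Rank categories by share and keep the top two for each neighborhood.
top_two = {
    name: row.sort_values(ascending=False).index[:2].tolist()
    for name, row in frequencies.iterrows()
}
print(venue_counts.to_dict())
print(top_two["A"])
```

Averaging one-hot columns is what turns raw venue rows into per-neighborhood frequencies, so neighborhoods with very different venue counts remain comparable.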

In [6]:
# Import the required libraries
import numpy as np
import pandas as pd


In [7]:
# Load data for Providence, RI
df_pvd=pd.read_excel("Book1.xlsx")
df_pvd.head()

Unnamed: 0.1,Unnamed: 0,Zipcode,Neighborhoods,Latitude,Longitude
0,,2906,Blackstone,41.846388,-71.385406
1,,2904,Charles,41.812104,-71.429089
2,,2906,College Hill,41.830157,-71.403219
3,,2903,Downtown,41.822533,-71.415094
4,,2908,Elmhurst,41.840911,-71.438842


In [8]:
df_pvd=df_pvd.drop(["Unnamed: 0"],axis=1)

In [9]:
# Add '0' before '2' of each zipcode
df_pvd['Zipcode']='0'+ df_pvd['Zipcode'].astype(str)

In [10]:
df_pvd.head()

Unnamed: 0,Zipcode,Neighborhoods,Latitude,Longitude
0,2906,Blackstone,41.846388,-71.385406
1,2904,Charles,41.812104,-71.429089
2,2906,College Hill,41.830157,-71.403219
3,2903,Downtown,41.822533,-71.415094
4,2908,Elmhurst,41.840911,-71.438842
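
Prepending a single `'0'` works here because every ZIP code in the sheet lost exactly one leading zero. A slightly more defensive variant, sketched below on illustrative values, pads each code to five characters so that codes which are already five digits long are left alone.

```python
import pandas as pd

# Illustrative ZIP codes as they might appear after leading zeros are dropped.
zipcodes = pd.Series([2906, 2904, 6105, 10001])

# Pad to five characters instead of always prepending one '0'.
padded = zipcodes.astype(str).str.zfill(5)
print(padded.tolist())
```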


In [11]:
# Load data for Hartford, CT
df_bdl=pd.read_excel("Book2.xlsx")
df_bdl.head()

Unnamed: 0.1,Unnamed: 0,Zipcode,Neighborhoods,Latitude,Longitude
0,,6105,Asylum Hill,41.773149,-72.694937
1,,6106,Barry Square,41.747158,-72.683114
2,,6114,Barry Square,41.747158,-72.683114
3,,6106,Behind The Rocks,41.745071,-72.700849
4,,6112,Blue Hills,41.812877,-72.697593


In [12]:
df_bdl=df_bdl.drop(["Unnamed: 0"],axis=1)

In [13]:
# Add '0' before '6' of each zipcode
df_bdl['Zipcode']='0'+ df_bdl['Zipcode'].astype(str)

In [14]:
df_bdl.head()

Unnamed: 0,Zipcode,Neighborhoods,Latitude,Longitude
0,6105,Asylum Hill,41.773149,-72.694937
1,6106,Barry Square,41.747158,-72.683114
2,6114,Barry Square,41.747158,-72.683114
3,6106,Behind The Rocks,41.745071,-72.700849
4,6112,Blue Hills,41.812877,-72.697593


In [15]:
#!conda install -c conda-forge folium=0.5.0 --yes
#!conda install -c conda-forge geopy --yes
# Import mapping libraries
from geopy.geocoders import Nominatim
import folium

In [16]:
# Importing libraries
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [17]:
# Working with Foursquare
CLIENT_ID = '3Y2UIXY513PP1Z4BP0Z5LEQZP1RQXTY5N1KL5L33NQXIHSJS' # your Foursquare ID
CLIENT_SECRET = '4IJ0F2MIVMNBA3EMF1JSTDF10JWSB5UZ2UF2VQKEDVAG4JLX' # your Foursquare Secret
VERSION = '20181224' # Foursquare API version

print('My credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

My credentials:
CLIENT_ID: 3Y2UIXY513PP1Z4BP0Z5LEQZP1RQXTY5N1KL5L33NQXIHSJS
CLIENT_SECRET:4IJ0F2MIVMNBA3EMF1JSTDF10JWSB5UZ2UF2VQKEDVAG4JLX


In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return a DataFrame of Foursquare venues within `radius` meters of each location."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit=100'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius)

        # make the GET request and keep the list of recommended items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into one table
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
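
The function above indexes straight into the response, so a neighborhood with no `groups` entry or a venue with an empty `categories` list would raise an error. The sketch below shows one way the parsing step could tolerate those cases. The helper name `extract_venues` is hypothetical, and the payload is hand-written in the shape the function above expects, with "Uncategorized Spot" being an invented venue.

```python
def extract_venues(name, lat, lng, payload):
    """Flatten one 'explore' response into rows; tolerate missing pieces."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when a venue carries no category.
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows

# Hand-written payload in the same shape that getNearbyVenues indexes into.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Armory Park",
               "location": {"lat": 41.812957, "lng": -71.431897},
               "categories": [{"name": "Plaza"}]}},
    {"venue": {"name": "Uncategorized Spot",
               "location": {"lat": 41.81, "lng": -71.43},
               "categories": []}},
]}]}}

rows = extract_venues("Charles", 41.812104, -71.429089, sample)
empty = extract_venues("Nowhere", 0.0, 0.0, {"response": {}})
print(rows)
```

Returning an empty list for a neighborhood without results keeps the later flattening step unchanged.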

In [19]:
pvd_venues=getNearbyVenues(df_pvd['Neighborhoods'], df_pvd['Latitude'], df_pvd['Longitude'])
bdl_venues=getNearbyVenues(df_bdl['Neighborhoods'], df_bdl['Latitude'], df_bdl['Longitude'])

In [20]:
pvd_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Charles,41.812104,-71.429089,Armory Park,41.812957,-71.431897,Plaza
1,Charles,41.812104,-71.429089,Long Live Beerworks,41.809406,-71.42595,Brewery
2,Charles,41.812104,-71.429089,Hudson Street Delicatessen,41.813256,-71.434625,Deli / Bodega
3,Charles,41.812104,-71.429089,Family Dollar,41.813167,-71.427751,Discount Store
4,Charles,41.812104,-71.429089,Tropical Liquors,41.812658,-71.429144,Liquor Store


In [21]:
bdl_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Asylum Hill,41.773149,-72.694937,Au Bon Pain,41.773804,-72.699015,Café
1,Asylum Hill,41.773149,-72.694937,Saint Francis Main Cafeteria,41.774191,-72.698254,Café
2,Asylum Hill,41.773149,-72.694937,Sigourney Square Park,41.775729,-72.693938,Park
3,Asylum Hill,41.773149,-72.694937,Women's Auxiliary Gift Shop,41.773952,-72.698893,Gift Shop
4,Asylum Hill,41.773149,-72.694937,Saint Francis Fitness Center,41.773976,-72.700199,Gym


In [22]:
# Number of venues in each neighborhood
pvd_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Charles,10,10,10,10,10,10
College Hill,51,51,51,51,51,51
Downtown,45,45,45,45,45,45
Elmhurst,6,6,6,6,6,6


In [24]:
# one hot encoding
pvd_onehot = pd.get_dummies(pvd_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
pvd_onehot['Neighborhood'] = pvd_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [pvd_onehot.columns[-1]] + list(pvd_onehot.columns[:-1])
pvd_onehot = pvd_onehot[fixed_columns]

pvd_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Bistro,Bookstore,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Café,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Bookstore,College Gym,Creperie,Dance Studio,Deli / Bodega,Dessert Shop,Discount Store,Donut Shop,Food,Food Truck,Frozen Yogurt Shop,Gay Bar,Gift Shop,Greek Restaurant,Hockey Arena,Hookah Bar,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Jewelry Store,Juice Bar,Korean Restaurant,Liquor Store,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Music Venue,New American Restaurant,Nightclub,Park,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Pool,Ramen Restaurant,Recording Studio,Restaurant,Salon / Barbershop,Sandwich Place,Scenic Lookout,Shipping Store,Skating Rink,Snack Place,Speakeasy,Steakhouse,Tea Room,Thai Restaurant,Theater,Vegetarian / Vegan Restaurant,Wine Bar,Yoga Studio
0,Charles,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Charles,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Charles,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Charles,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Charles,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [25]:
pvd_grouped = pvd_onehot.groupby('Neighborhood').mean().reset_index()
pvd_grouped.head()

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Bistro,Bookstore,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Café,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Bookstore,College Gym,Creperie,Dance Studio,Deli / Bodega,Dessert Shop,Discount Store,Donut Shop,Food,Food Truck,Frozen Yogurt Shop,Gay Bar,Gift Shop,Greek Restaurant,Hockey Arena,Hookah Bar,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Jewelry Store,Juice Bar,Korean Restaurant,Liquor Store,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Music Venue,New American Restaurant,Nightclub,Park,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Pool,Ramen Restaurant,Recording Studio,Restaurant,Salon / Barbershop,Sandwich Place,Scenic Lookout,Shipping Store,Skating Rink,Snack Place,Speakeasy,Steakhouse,Tea Room,Thai Restaurant,Theater,Vegetarian / Vegan Restaurant,Wine Bar,Yoga Studio
0,Charles,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,College Hill,0.019608,0.019608,0.019608,0.019608,0.019608,0.0,0.0,0.0,0.019608,0.0,0.0,0.019608,0.019608,0.019608,0.0,0.019608,0.0,0.058824,0.0,0.019608,0.019608,0.019608,0.019608,0.019608,0.0,0.0,0.0,0.019608,0.019608,0.0,0.019608,0.019608,0.0,0.019608,0.019608,0.019608,0.019608,0.019608,0.019608,0.019608,0.058824,0.0,0.019608,0.039216,0.019608,0.019608,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.019608,0.019608,0.019608,0.0,0.0,0.019608,0.019608,0.019608,0.019608,0.0,0.0,0.019608,0.0,0.019608,0.019608,0.0,0.019608,0.0,0.019608
2,Downtown,0.044444,0.0,0.022222,0.0,0.0,0.0,0.111111,0.022222,0.0,0.0,0.022222,0.0,0.0,0.022222,0.022222,0.022222,0.022222,0.044444,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.044444,0.022222,0.022222,0.044444,0.0,0.088889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.044444,0.022222,0.022222,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.022222,0.0,0.0,0.022222,0.022222,0.0,0.022222,0.0,0.0,0.044444,0.0,0.022222,0.0
3,Elmhurst,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [26]:
# one hot encoding
bdl_onehot = pd.get_dummies(bdl_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
bdl_onehot['Neighborhood'] = bdl_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [bdl_onehot.columns[-1]] + list(bdl_onehot.columns[:-1])
bdl_onehot = bdl_onehot[fixed_columns]

bdl_onehot.head()

Unnamed: 0,Neighborhood,Arts & Entertainment,Baseball Stadium,Burger Joint,Café,College Gym,Donut Shop,Gas Station,Gift Shop,Gym,Park,Pizza Place,Restaurant,Rock Club,Sandwich Place
0,Asylum Hill,0,0,0,1,0,0,0,0,0,0,0,0,0,0
1,Asylum Hill,0,0,0,1,0,0,0,0,0,0,0,0,0,0
2,Asylum Hill,0,0,0,0,0,0,0,0,0,1,0,0,0,0
3,Asylum Hill,0,0,0,0,0,0,0,1,0,0,0,0,0,0
4,Asylum Hill,0,0,0,0,0,0,0,0,1,0,0,0,0,0


In [27]:
bdl_grouped = bdl_onehot.groupby('Neighborhood').mean().reset_index()
bdl_grouped.head()

Unnamed: 0,Neighborhood,Arts & Entertainment,Baseball Stadium,Burger Joint,Café,College Gym,Donut Shop,Gas Station,Gift Shop,Gym,Park,Pizza Place,Restaurant,Rock Club,Sandwich Place
0,Asylum Hill,0.166667,0.0,0.0,0.333333,0.0,0.0,0.0,0.166667,0.166667,0.166667,0.0,0.0,0.0,0.0
1,Barry Square,0.0,0.0,0.0,0.0,0.166667,0.0,0.333333,0.0,0.0,0.0,0.166667,0.0,0.333333,0.0
2,Behind The Rocks,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
3,Blue Hills,0.0,0.25,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25


In [28]:
# Define a function that returns the top venues for a neighborhood
num_top_venues=10
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [29]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
pvd_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
pvd_neighborhoods_venues_sorted['Neighborhood'] = pvd_grouped['Neighborhood']

for ind in np.arange(pvd_grouped.shape[0]):
    pvd_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(pvd_grouped.iloc[ind, :], num_top_venues)

pvd_neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Charles,Park,Plaza,Food,Deli / Bodega,Discount Store,Donut Shop,Brewery,Breakfast Spot,Liquor Store,Food Truck
1,College Hill,Korean Restaurant,Coffee Shop,Mexican Restaurant,Pizza Place,Dessert Shop,Juice Bar,Jewelry Store,Indie Movie Theater,Ice Cream Shop,Hotel
2,Downtown,Bar,Hotel,New American Restaurant,American Restaurant,Nightclub,Restaurant,Coffee Shop,Hockey Arena,Theater,Gay Bar
3,Elmhurst,Playground,Clothing Store,Bakery,Recording Studio,Liquor Store,Music Venue,Dessert Shop,College Gym,Creperie,Dance Studio
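
The column labels above are built by catching the error raised when the suffix list runs out. An alternative that avoids exceptions for control flow is a small helper, sketched here under the hypothetical name `ordinal`, which also handles values such as 11 or 21 should the number of top venues grow.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, 11)
]
print(columns[1], columns[4], columns[10])
```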


In [30]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

bdl_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
bdl_neighborhoods_venues_sorted['Neighborhood'] = bdl_grouped['Neighborhood']

for ind in np.arange(bdl_grouped.shape[0]):
    bdl_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bdl_grouped.iloc[ind, :], num_top_venues)

bdl_neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Asylum Hill,Café,Park,Gym,Gift Shop,Arts & Entertainment,Sandwich Place,Rock Club,Restaurant,Pizza Place,Gas Station
1,Barry Square,Rock Club,Gas Station,Pizza Place,College Gym,Sandwich Place,Restaurant,Park,Gym,Gift Shop,Donut Shop
2,Behind The Rocks,Restaurant,Sandwich Place,Rock Club,Pizza Place,Park,Gym,Gift Shop,Gas Station,Donut Shop,College Gym
3,Blue Hills,Sandwich Place,Donut Shop,Burger Joint,Baseball Stadium,Rock Club,Restaurant,Pizza Place,Park,Gym,Gift Shop
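
Note that a neighborhood with few venues still gets ten entries: "Behind The Rocks" has a single category with non-zero frequency, so its remaining nine slots are filled with categories that never occur there. One possible refinement, sketched below with a hypothetical helper named `top_nonzero_categories`, keeps only categories that actually appear. The row is a shortened stand-in modeled on that neighborhood.

```python
import pandas as pd

def top_nonzero_categories(row, num_top_venues):
    """Top categories for one row of a grouped frequency table, skipping zeros."""
    freqs = row.iloc[1:].astype(float)  # first entry is the neighborhood name
    ranked = freqs[freqs > 0].sort_values(ascending=False)
    return ranked.index.tolist()[:num_top_venues]

# Shortened row modeled on "Behind The Rocks" in the output above.
row = pd.Series(
    {"Neighborhood": "Behind The Rocks", "Park": 0.0,
     "Restaurant": 1.0, "Sandwich Place": 0.0}
)
print(top_nonzero_categories(row, 10))
```

Because the result can be shorter than ten items, it would need padding before being written into the fixed-width table used above.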


In [37]:
# set number of clusters
from sklearn.cluster import KMeans
kclusters = 2

pvd_grouped_clustering = pvd_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
pvd_kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(pvd_grouped_clustering)

# check cluster labels generated for each row in the dataframe
pvd_kmeans.labels_[0:4]

array([0, 0, 0, 1], dtype=int32)
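
A natural next step is to attach the labels to the neighborhood names so that each cluster can be inspected. The sketch below does this on a toy frequency table. The values are illustrative rather than taken from the notebook, the `Cluster Labels` column name is an assumption, and `n_init` is set explicitly only to keep the behavior the same across scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy frequency table; values are illustrative, not the notebook's data.
grouped = pd.DataFrame(
    {
        "Neighborhood": ["A", "B", "C", "D"],
        "Bar": [0.9, 0.8, 0.1, 0.0],
        "Park": [0.1, 0.2, 0.9, 1.0],
    }
)

features = grouped.drop(columns="Neighborhood")
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

# Attach the labels so each neighborhood can be read next to its cluster.
labeled = grouped.assign(**{"Cluster Labels": model.labels_})
print(labeled[["Neighborhood", "Cluster Labels"]])
```

The numeric label assigned to each cluster is arbitrary, so only the grouping itself should be interpreted.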