# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

The problem I am trying to solve is one of correlating user data with venue data: what types of users frequent which kinds of restaurants? To narrow the scope and keep it relevant to the audience, I will restrict the analysis to Japanese restaurants. This type of analysis could aid the marketing efforts of a variety of locations in the area of focus.
   
For this analysis I will focus on Manhattan, New York, specifically the area around the Empire State Building, where foot traffic and tourism are likely to be most densely concentrated.

My audience of stakeholders would be owners of Japanese cuisine restaurants in the Manhattan area, both local and chain restaurants, who could benefit from this data to improve their targeted marketing.

## Data <a name="data"></a>

Data will be taken from the Foursquare database, primarily through the Venues and Users endpoints. Check-in data will be used to match the details for the restaurant and the user in order to establish a connection. We will then establish counts for things like male/female patrons, home cities, friend counts, and so on.

Based on our definition of the problem, the following data sources will be needed:
* Candidate areas based on a radius surrounding a central location (in our case the Empire State Building)
* Number of restaurants and their locations based on the Foursquare API calls
* User data matched from check-ins to those restaurants, matched by unique ids provided by Foursquare

Let's start by getting our necessary libraries ready.

In [2]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
import folium # plotting library
from pandas import json_normalize # top-level import; pandas.io.json.json_normalize is deprecated since pandas 1.0
from IPython.display import Image 
from IPython.core.display import HTML

In [3]:
CLIENT_ID = '14VG1SGJ4TAUZTPTD0D250U2SO2FYVOJMQGQDOI0SCMPK4WU' # your Foursquare ID
CLIENT_SECRET = '0JWMWUD3QFFTLYJF2FXZOGEZ12QPKTO4INZTU30RQKOKBYAF' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30

In [4]:
address = '20 W 34th St, New York, NY'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

40.7486538125 -73.9853043125


In [5]:
search_query = 'Japanese'
radius = 2000
print(search_query + ' .... OK!')

Japanese .... OK!


In [6]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=14VG1SGJ4TAUZTPTD0D250U2SO2FYVOJMQGQDOI0SCMPK4WU&client_secret=0JWMWUD3QFFTLYJF2FXZOGEZ12QPKTO4INZTU30RQKOKBYAF&ll=40.7486538125,-73.9853043125&v=20180604&query=Japanese&radius=2000&limit=30'
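The URL above is assembled by hand with `str.format`. As a minimal sketch of an alternative, the parameters can be kept in a dictionary and encoded with the standard library. The helper name `build_search_url` and the placeholder credentials are assumptions for illustration, not part of the original notebook.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Hypothetical helper: assemble the venues/search URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    # urlencode escapes each value, so a query containing spaces or commas stays valid
    return SEARCH_ENDPOINT + '?' + urlencode(params)

# Placeholder credentials; the coordinates and settings match the cells above
example_url = build_search_url('YOUR_ID', 'YOUR_SECRET', 40.7486538125, -73.9853043125,
                               '20180604', 'Japanese', 2000, 30)
print(example_url)
```

The same dictionary could also be passed directly as `requests.get(SEARCH_ENDPOINT, params=params)`, which performs the encoding itself.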

In [7]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5cf4bceef594df57e9913cd1'},
 'response': {'venues': [{'id': '4ef0d81fbe7ba3ed7c2296d2',
    'name': 'Gyu-Kaku Japanese BBQ',
    'location': {'address': '321 W 44th St',
     'crossStreet': 'btwn 8th & 9th Ave',
     'lat': 40.75904160050783,
     'lng': -73.99004327793013,
     'labeledLatLngs': [{'label': 'display',
       'lat': 40.75904160050783,
       'lng': -73.99004327793013}],
     'distance': 1223,
     'postalCode': '10036',
     'cc': 'US',
     'city': 'New York',
     'state': 'NY',
     'country': 'United States',
     'formattedAddress': ['321 W 44th St (btwn 8th & 9th Ave)',
      'New York, NY 10036',
      'United States']},
    'categories': [{'id': '4bf58dd8d48988d111941735',
      'name': 'Japanese Restaurant',
      'pluralName': 'Japanese Restaurants',
      'shortName': 'Japanese',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/japanese_',
       'suffix': '.png'},
      'primary': True}],
    'referra

In [8]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,categories,delivery.id,delivery.provider.icon.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.name,delivery.url,hasPerk,id,location.address,...,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d111941735', 'name': 'J...",,,,,,,False,4ef0d81fbe7ba3ed7c2296d2,321 W 44th St,...,"[321 W 44th St (btwn 8th & 9th Ave), New York,...","[{'label': 'display', 'lat': 40.75904160050783...",40.759042,-73.990043,,10036,NY,Gyu-Kaku Japanese BBQ,v-1559543022,
1,"[{'id': '4bf58dd8d48988d111941735', 'name': 'J...",,,,,,,False,48208c8cf964a520894f1fe3,805 3rd Ave,...,"[805 3rd Ave (btwn E 49th & E 50th St), New Yo...","[{'label': 'display', 'lat': 40.75573050425004...",40.755731,-73.970897,,10022,NY,Gyu-Kaku Japanese BBQ,v-1559543022,
2,"[{'id': '4bf58dd8d48988d111941735', 'name': 'J...",,,,,,,False,427ff980f964a520b7211fe3,34 Cooper Sq,...,"[34 Cooper Sq (btwn E 5th & E 6th St.), New Yo...","[{'label': 'display', 'lat': 40.72821272776984...",40.728213,-73.990913,,10003,NY,Gyu-Kaku Japanese BBQ,v-1559543022,
3,"[{'id': '4bf58dd8d48988d171941735', 'name': 'E...",,,,,,,False,4be20ec01dd22d7ffa3c93bd,,...,"[New York, NY 10036, United States]","[{'label': 'display', 'lat': 40.75544369348851...",40.755444,-73.980611,,10036,NY,Japanese American Association Of New York,v-1559543022,
4,"[{'id': '4bf58dd8d48988d199941735', 'name': 'C...",,,,,,,False,4d5b24ce6b2e3704ed237eee,227 W 27th St,...,"[227 W 27th St (btwn 7th & 8th), New York, NY ...","[{'label': 'display', 'lat': 40.75384633914933...",40.753846,-73.985761,,10001,NY,FIT Japanese Class,v-1559543022,


In [9]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # copy so the column assignment below does not trigger SettingWithCopyWarning

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the prefixed column name if 'categories' is absent
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Gyu-Kaku Japanese BBQ,Japanese Restaurant,321 W 44th St,US,New York,United States,btwn 8th & 9th Ave,1223,"[321 W 44th St (btwn 8th & 9th Ave), New York,...","[{'label': 'display', 'lat': 40.75904160050783...",40.759042,-73.990043,,10036.0,NY,4ef0d81fbe7ba3ed7c2296d2
1,Gyu-Kaku Japanese BBQ,Japanese Restaurant,805 3rd Ave,US,New York,United States,btwn E 49th & E 50th St,1447,"[805 3rd Ave (btwn E 49th & E 50th St), New Yo...","[{'label': 'display', 'lat': 40.75573050425004...",40.755731,-73.970897,,10022.0,NY,48208c8cf964a520894f1fe3
2,Gyu-Kaku Japanese BBQ,Japanese Restaurant,34 Cooper Sq,US,New York,United States,btwn E 5th & E 6th St.,2324,"[34 Cooper Sq (btwn E 5th & E 6th St.), New Yo...","[{'label': 'display', 'lat': 40.72821272776984...",40.728213,-73.990913,,10003.0,NY,427ff980f964a520b7211fe3
3,Japanese American Association Of New York,Event Space,,US,New York,United States,,853,"[New York, NY 10036, United States]","[{'label': 'display', 'lat': 40.75544369348851...",40.755444,-73.980611,,10036.0,NY,4be20ec01dd22d7ffa3c93bd
4,FIT Japanese Class,College Arts Building,227 W 27th St,US,New York,United States,btwn 7th & 8th,579,"[227 W 27th St (btwn 7th & 8th), New York, NY ...","[{'label': 'display', 'lat': 40.75384633914933...",40.753846,-73.985761,,10001.0,NY,4d5b24ce6b2e3704ed237eee
5,Japanese Culinary Center,Furniture / Home Store,711 3rd Ave,US,New York,United States,at E 45th St.,1136,"[711 3rd Ave (at E 45th St.), New York, NY 100...","[{'label': 'display', 'lat': 40.75265843016500...",40.752658,-73.97291,,10017.0,NY,4a8c4222f964a5207a0d20e3
6,Atami Japanese Fusion,Japanese Restaurant,1167 2nd Ave,US,New York,United States,,2396,"[1167 2nd Ave, New York, NY 10065, United States]","[{'label': 'display', 'lat': 40.76215844475948...",40.762158,-73.963174,,10065.0,NY,5006eaf1e4b0d9eeb5cd67ef
7,Ichiban Japanese Restaurant,Sushi Restaurant,409 8th Ave,US,New York,United States,31st Street,834,"[409 8th Ave (31st Street), New York, NY 10001...","[{'label': 'display', 'lat': 40.75008204022070...",40.750082,-73.995017,,10001.0,NY,4bde28866c1b9521ac51ad0f
8,Ruby's Japanese & Thai,Sushi Restaurant,259 1st Ave,US,New York,United States,at E 15th St.,1874,"[259 1st Ave (at E 15th St.), New York, NY 100...","[{'label': 'display', 'lat': 40.73199537280791...",40.731995,-73.982072,,10003.0,NY,4a8cb3fff964a520ed0e20e3
9,Edo Japanese Restaurant,Sushi Restaurant,9 E 17th St,US,New York,United States,btw 5th Ave & Broadway,1340,"[9 E 17th St (btw 5th Ave & Broadway), New Yor...","[{'label': 'display', 'lat': 40.73754004956548...",40.73754,-73.99143,,10003.0,NY,4a78718ff964a520a4e51fe3
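The table above shows that a keyword search for "Japanese" also returns venues that are not restaurants (an event space, a college building, a home store). A minimal sketch of one way to drop them is to keep only venues whose first listed category name ends in "Restaurant". The helper `is_restaurant` and the trimmed sample list are assumptions for illustration; the sample rows reuse names and categories from the table.

```python
def is_restaurant(venue):
    """Return True when the venue's first listed category name ends with 'Restaurant'."""
    categories = venue.get('categories', [])
    if not categories:
        return False
    return categories[0]['name'].endswith('Restaurant')

# Trimmed sample shaped like results['response']['venues']
sample_venues = [
    {'name': 'Gyu-Kaku Japanese BBQ', 'categories': [{'name': 'Japanese Restaurant'}]},
    {'name': 'FIT Japanese Class', 'categories': [{'name': 'College Arts Building'}]},
    {'name': 'Ichiban Japanese Restaurant', 'categories': [{'name': 'Sushi Restaurant'}]},
    {'name': 'Japanese Culinary Center', 'categories': [{'name': 'Furniture / Home Store'}]},
]

restaurants = [v for v in sample_venues if is_restaurant(v)]
print([v['name'] for v in restaurants])
```

Matching on the category name is a simple heuristic; it would miss food venues whose category does not contain the word "Restaurant".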


In [11]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred around the Empire State Building

# add a red circle marker to represent the Empire State Building
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Empire State Building',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the Japanese restaurants as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

## Methodology <a name="methodology"></a>

In this project we will direct efforts towards finding the ideal customer profile for the typical patron of a Japanese restaurant in the most tourist-heavy areas of Manhattan. The analysis is limited to a 2 km radius around the Empire State Building, one of the most iconic tourist destinations in New York City.

In the previous steps, we compiled a list of Japanese cuisine restaurants in the area described, and set up an example profile of a patron of one of the restaurants closest to the tower. We identified the center point of the area using the geopy Nominatim geocoder. Using the Foursquare API, we were then able to find Japanese restaurants within a suitable radius of that point.

The next step of the analysis will be to profile the users who liked each restaurant. The reasoning is that if a user took the time to "like" a restaurant, this would indicate an exceptionally good experience.

The third step would be to cluster the user profiles to determine which factors, including gender, friend count, and home city, are associated with liking the restaurants. This would be presented to the stakeholders in order to inform them of their target audience for marketing activities.
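Clustering algorithms generally expect numeric input, so the profile fields named above would first need to be encoded. The following is a minimal sketch of that preparation step only, assuming each profile is a dict with `gender`, `friend_count`, and `homeCity` keys. The function name and the example profiles are made up for illustration and are not results from this analysis.

```python
def encode_profiles(profiles):
    """Turn profile dicts into numeric rows: friend count plus one-hot gender and home city."""
    genders = sorted({p['gender'] for p in profiles})
    cities = sorted({p['homeCity'] for p in profiles})
    columns = (['friend_count']
               + ['gender_' + g for g in genders]
               + ['city_' + c for c in cities])
    rows = []
    for p in profiles:
        row = [float(p['friend_count'])]
        row += [1.0 if p['gender'] == g else 0.0 for g in genders]
        row += [1.0 if p['homeCity'] == c else 0.0 for c in cities]
        rows.append(row)
    return columns, rows

# Made-up profiles for illustration only
example_profiles = [
    {'gender': 'male', 'friend_count': 151, 'homeCity': 'New York, NY'},
    {'gender': 'female', 'friend_count': 12, 'homeCity': 'Boston, MA'},
    {'gender': 'none', 'friend_count': 40, 'homeCity': 'New York, NY'},
]
columns, rows = encode_profiles(example_profiles)
print(columns)
print(rows[0])
```

The resulting rows could then be handed to a clustering routine of choice; the notebook does not name one. Friend counts would likely need scaling first so that they do not dominate the one-hot columns.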

## Analysis <a name="analysis"></a>

To start, we will use the ids of the restaurants we found to connect them to user activity (here, likes).

In [10]:
dataframe_filtered.name

0                             Gyu-Kaku Japanese BBQ
1                             Gyu-Kaku Japanese BBQ
2                             Gyu-Kaku Japanese BBQ
3         Japanese American Association Of New York
4                                FIT Japanese Class
5                          Japanese Culinary Center
6                             Atami Japanese Fusion
7                       Ichiban Japanese Restaurant
8                            Ruby's Japanese & Thai
9                           Edo Japanese Restaurant
10                           Asahi Japanese Cuisine
11    Mount Sinai Doctors Japanese Medical Practice
12                  Japanese American United Church
13                   Ai's Sushi Japanese Restaurant
14                                     Japanese Pub
15                     Mizu Japanese & Thai Cuisine
16                            Japanese Medical Care
17                            Japanese Eyelash Perm
18                        Japanese Medical Practice
19          

In [12]:
dataframe_filtered.id

0     4ef0d81fbe7ba3ed7c2296d2
1     48208c8cf964a520894f1fe3
2     427ff980f964a520b7211fe3
3     4be20ec01dd22d7ffa3c93bd
4     4d5b24ce6b2e3704ed237eee
5     4a8c4222f964a5207a0d20e3
6     5006eaf1e4b0d9eeb5cd67ef
7     4bde28866c1b9521ac51ad0f
8     4a8cb3fff964a520ed0e20e3
9     4a78718ff964a520a4e51fe3
10    4e4e4aa6bd4101d0d7a6fc27
11    59e4e32a67e5f2780fbc998c
12    4da090c831a6b60cb6227618
13    4fd91c44e4b0d9021d3ee7ca
14    5202fac2498ebd638045bf59
15    45127b37f964a520a0391fe3
16    5b3bdbf3535d6f002c31c06d
17    51fc1ba7498e519f6f136456
18    4bbca5ae07809521f486d991
19    52e55531498e7a42cd0c25b1
20    49c2ede7f964a52044561fe3
21    4e4c4f09bd413c4cc66864b3
22    5a34365a47f87647f1aa7b90
23    4ea1ad4e6c25b62f3d0e6cef
24    4e4e4ad5bd4101d0d7a7002d
25    4b9fbebdf964a520c83a37e3
26    4f1dd96fe4b0288a058c5441
27    470363eaf964a5203e4b1fe3
28    4c2a71ebbbc7e21e1d423582
29    4f44870f19836ed00194b271
Name: id, dtype: object

Let's get an example of the users who liked Gyu-Kaku Japanese BBQ on 44th St.

In [17]:
venue_id = '4ef0d81fbe7ba3ed7c2296d2'
url = 'https://api.foursquare.com/v2/venues/{}/likes?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
#url
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5cf4c2ac4434b9213f7e00f5'},
 'response': {'likes': {'count': 627,
   'summary': '627 Likes',
   'items': [{'id': '34929',
     'firstName': 'Amol',
     'lastName': 'Sarva',
     'gender': 'male',
     'photo': {'prefix': 'https://fastly.4sqi.net/img/user/',
      'suffix': '/4a8ff7dee1a99.jpg'}},
    {'id': '815298',
     'firstName': 'RauwCC | Maarten',
     'lastName': 'Reijgersberg',
     'gender': 'male',
     'photo': {'prefix': 'https://fastly.4sqi.net/img/user/',
      'suffix': '/815298-FBM3AT04C5SPJBEA.jpg'}}]}}}

Here, the response lists two of the 627 people who liked the place, both male. One might consider this a restaurant that could benefit from advertising to males, although let's see if we can get more details on these two.
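As a minimal sketch of how this tally could be automated, the `gender` field can be counted across the users listed in a likes response. The helper name `count_genders` is an assumption, and the sample is a trimmed copy of the response shown above.

```python
from collections import Counter

def count_genders(likes_response):
    """Tally the 'gender' field across the users listed in a venues/<id>/likes response."""
    items = likes_response['response']['likes']['items']
    return Counter(user.get('gender', 'unknown') for user in items)

# Trimmed copy of the response shown above
sample = {'response': {'likes': {'count': 627, 'items': [
    {'id': '34929', 'gender': 'male'},
    {'id': '815298', 'gender': 'male'},
]}}}

gender_counts = count_genders(sample)
listed = len(sample['response']['likes']['items'])
total = sample['response']['likes']['count']
print(gender_counts, '{} of {} likes listed'.format(listed, total))
```

Comparing `listed` with `total` makes it explicit how small a share of the likes the tally is based on.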

In [18]:
user_id = '34929'
url = 'https://api.foursquare.com/v2/users/{}?client_id={}&client_secret={}&v={}'.format(user_id, CLIENT_ID, CLIENT_SECRET, VERSION)
#url
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5cf4c4fa1ed21914bb667fbc'},
 'notifications': [],
 'response': {'user': {'id': '34929',
   'firstName': 'Amol',
   'lastName': 'S',
   'gender': 'male',
   'canonicalUrl': 'https://foursquare.com/amol',
   'photo': {'prefix': 'https://fastly.4sqi.net/img/user/',
    'suffix': '/4a8ff7dee1a99.jpg'},
   'friends': {'count': 151,
    'groups': [{'type': 'others',
      'name': 'Other friends',
      'count': 151,
      'items': [{'id': '293091',
        'firstName': 'Rachel',
        'lastName': 'W',
        'gender': 'female',
        'photo': {'prefix': 'https://fastly.4sqi.net/img/user/',
         'suffix': '/OPVVFY51Z0VQZST3.jpg'},
        'tips': {'count': 0},
        'lists': {'groups': [{'type': 'created', 'count': 2, 'items': []}]},
        'homeCity': 'New York, NY',
        'bio': '',
        'contact': {}},
       {'id': '19509500',
        'firstName': 'Milos',
        'lastName': 'P',
        'gender': 'male',
        'photo': {'prefix': '

Above we can see that one of the likes was from Amol Sarva, a male with quite a few friends (151) who hails from Long Island, NY. These details can be useful in determining a profile of people who are attracted to Japanese cuisine, or who are frequent patrons of Gyu-Kaku in particular.
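The fields mentioned here can be pulled out of the user response into a flat record. This is a sketch under assumptions: the helper name `extract_profile` is hypothetical, and `homeCity` is assumed to sit at the top level of the user object, since the output above is truncated before that field would appear.

```python
def extract_profile(user_response):
    """Flatten a users/<id> response into the fields used for profiling."""
    user = user_response['response']['user']
    return {
        'id': user['id'],
        'gender': user.get('gender'),
        'friend_count': user.get('friends', {}).get('count'),
        # Assumed location of homeCity; .get() returns None if it is absent
        'homeCity': user.get('homeCity'),
    }

# Trimmed copy of the response shown above (homeCity is not visible in the truncated output)
sample_user = {'response': {'user': {'id': '34929', 'gender': 'male',
                                     'friends': {'count': 151}}}}
profile = extract_profile(sample_user)
print(profile)
```

Using `.get()` keeps the extraction from failing on users who have not filled in every field.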

Moving on, let's expand this analysis to the other restaurants. To start with, we can grab a list of the user ids that liked each restaurant in our list.

In [19]:
venue_list = dataframe_filtered.id
results = []
for venue in venue_list:
    url = 'https://api.foursquare.com/v2/venues/{}/likes?client_id={}&client_secret={}&v={}'.format(venue, CLIENT_ID, CLIENT_SECRET, VERSION)
    result = requests.get(url).json()
    results.append(result)

KeyError: 'user'
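The loop above stores every response without checking whether the request succeeded. A minimal sketch of a more defensive version keeps only responses whose `meta` code is 200 and records the venue ids that failed. The function `collect_likes` and the stand-in `fake_fetch` are assumptions for illustration; in the notebook, `fetch` would wrap the `requests.get(url).json()` call.

```python
def collect_likes(venue_ids, fetch):
    """Call fetch(venue_id) for each id and keep only responses whose meta code is 200."""
    likes_by_venue = {}
    failed = []
    for venue_id in venue_ids:
        result = fetch(venue_id)
        if result.get('meta', {}).get('code') == 200:
            likes_by_venue[venue_id] = result['response']['likes']
        else:
            failed.append(venue_id)
    return likes_by_venue, failed

# Stand-in for the real HTTP call, so the sketch runs without network access
def fake_fetch(venue_id):
    if venue_id == 'bad-id':
        return {'meta': {'code': 400}, 'response': {}}
    return {'meta': {'code': 200}, 'response': {'likes': {'count': 0, 'items': []}}}

likes_by_venue, failed = collect_likes(['4ef0d81fbe7ba3ed7c2296d2', 'bad-id'], fake_fetch)
print(list(likes_by_venue), failed)
```

Keying the results by venue id also preserves which likes belong to which restaurant, which a plain list does not.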

In [42]:
user_ids = []
for result in results:
    user_ids.append(result['response']['likes'])#['items']['id'])
    
user_ids

[{'count': 627,
  'summary': '627 Likes',
  'items': [{'id': '34929',
    'firstName': 'Amol',
    'lastName': 'Sarva',
    'gender': 'male',
    'photo': {'prefix': 'https://fastly.4sqi.net/img/user/',
     'suffix': '/4a8ff7dee1a99.jpg'}},
   {'id': '815298',
    'firstName': 'RauwCC | Maarten',
    'lastName': 'Reijgersberg',
    'gender': 'male',
    'photo': {'prefix': 'https://fastly.4sqi.net/img/user/',
     'suffix': '/815298-FBM3AT04C5SPJBEA.jpg'}}]},
 {'count': 471,
  'summary': '471 Likes',
  'items': [{'id': '378576637',
    'firstName': 'Sam',
    'lastName': 'Scariot',
    'gender': 'male',
    'photo': {'prefix': 'https://fastly.4sqi.net/img/user/',
     'suffix': '/378576637-NGC2K4FMBMKDAYGW.jpg'}},
   {'id': '134732237',
    'firstName': 'Takashi',
    'lastName': 'Arima',
    'gender': 'none',
    'photo': {'prefix': 'https://fastly.4sqi.net/img/user/',
     'suffix': '/134732237-WNE3CBFOQGAEXTV0.jpg'}}]},
 {'count': 773,
  'summary': '773 Likes',
  'items': [{'id': '
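The cell above appends the whole `likes` object for each venue, so `user_ids` ends up holding nested dictionaries rather than ids. A minimal sketch of the flattening step that the commented-out `['items']['id']` seems to aim for is shown here. The helper name `extract_user_ids` is an assumption, and the sample reuses ids from the output above.

```python
def extract_user_ids(likes_results):
    """Collect the id of every user listed under response.likes.items, skipping duplicates."""
    seen = set()
    user_ids = []
    for result in likes_results:
        # 'items' is a list of user dicts, so it has to be iterated rather than indexed by key
        for user in result['response']['likes'].get('items', []):
            if user['id'] not in seen:
                seen.add(user['id'])
                user_ids.append(user['id'])
    return user_ids

# Trimmed copies of the first two responses shown above
sample_results = [
    {'response': {'likes': {'count': 627, 'items': [{'id': '34929'}, {'id': '815298'}]}}},
    {'response': {'likes': {'count': 471, 'items': [{'id': '378576637'}, {'id': '134732237'}]}}},
]
flat_ids = extract_user_ids(sample_results)
print(flat_ids)
```

Each id in the resulting list could then be passed to the Users endpoint in the same way as the single example earlier.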

## Results and Discussion <a name="results"></a>

Discussion on results will be here.

## Conclusion <a name="conclusion"></a>

To conclude, the results we achieved are not conclusive by any means, but they do offer insight into the target audience for these Japanese restaurants and into the marketing strategy they could pursue in order to retain customers and attract new ones to their establishments.