# Where to Open a Japanese Restaurant in San Diego?

# Introduction

A recent study by Zagat (https://qz.com/657013/what-americans-are-willing-to-pay-for-ethnic-restaurant-food-reveals-some-pretty-deep-prejudices/) found that Japanese food is the second most expensive ethnic cuisine that Americans are willing to pay for.  Because of this, restaurateurs deciding where to open a new Japanese restaurant should look for zip codes with a household income above the US average and a low number of existing Japanese restaurants.

It is also important to consider whether enough people live in a zip code to support the business.  An article detailing the population base needed to support restaurants (http://andrewwoodinc.com/will-the-population-base-support-your-business-2/#:~:text=A%20population%20base%20of%2020%2C000,to%20work%20or%20to%20school.) suggests a minimum of 20,000 people. Therefore, I will also limit my search to zip codes with at least 20,000 residents.

I set out to determine which zip codes in San Diego combine three properties: few competing Japanese restaurants, a household income above the US average, and at least 20,000 inhabitants. Zip codes that meet all three would be good locations to start a Japanese restaurant.

# Data Collection


The following APIs and online resources serve as the sources of my data:

- Foursquare venue API - https://api.foursquare.com/v2/venues
- Foursquare venue categories API - https://api.foursquare.com/v2/venues/categories
- US Zip Code, household income & Census data - https://www.psc.isr.umich.edu/dis/census/Features/tract2zip/
- Latitude/Longitude information - Geocoder API / https://cocl.us/Geospatial_data

The data collected from Foursquare will give me location information on the density of Japanese restaurants in San Diego.  
The household income data will give me the mean and median household income and the population of every zip code in the US, which I will then narrow to San Diego County.
Lastly, the geocoder will supply the latitude and longitude used to plot my data points.

# Methodology

I will begin by importing the libraries used in the analysis. 
I will then set up credentials to connect to the Foursquare API.  
This will allow me to run a detailed search for locations centered on downtown San Diego (I will use the courthouse as the center point).  The resulting data will be in JSON format, from which I will extract the elements that matter to my analysis and load them into a dataframe. 

Since the Foursquare search will use the keyword "Japanese", I will need to filter the data to restaurants only, excluding anything else that matches the keyword "Japanese" but is not a restaurant. 

I will then plot the locations of the Japanese restaurants in San Diego County.  This will give me a high-level view of the distribution of restaurants.  However, it will not tell me how the restaurants are clustered relative to one another.  One of the tenets of my analysis is that I want to look at zip codes which do not already have many Japanese restaurants, in order to keep competition to a minimum. Therefore, I will perform clustering on the data to identify which zip codes contain the most Japanese restaurants, and I will then exclude those from further analysis. 

The next part of my analysis will focus only on the zip codes with a mean household income greater than the US average.  This is because Japanese cuisine is rated as the second most costly cuisine that Americans will spend money on.  Therefore, if a zip code has an income that exceeds the US average, the restaurant will be located in an area with a better opportunity to support good sales. 

Lastly, I will use the same census data used for mean household income to look at population.  A recent article suggests that a minimum of 20,000 residents will provide the sales needed to support a business. 

#### Import Necessary Libraries

In [2]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation


!pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
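# function for transforming a JSON structure into a pandas dataframe
# (exposed at the top level of pandas since version 1.0; the pandas.io.json import path is deprecated)
from pandas import json_normalize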


! pip install folium==0.5.0
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


#### Credentials for using FourSquare

In [3]:
CLIENT_ID = 'BZAOTE3PBPFSEZ4MBS44FD124XOQP1Z2R4DGJENDAYT1CTB0' 
CLIENT_SECRET = '2PAQDTDAMNIPMDD0PPOBN5JUENEJTHCAPSGGYOHUS0YRJV0I'
CODE = 'CULEFRQGLJRYLKIRI0ESVLS3QZPYFJ2QQVEQWCSN5P3STWX1#_=_'
ACCESS_TOKEN = 'HAVG3E4KDUFCRZUBUMTQQVFSZFWXVBD1I1MIF41K0U1I3J3K'
VERSION = '20201229'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)
print('ACCESS_TOKEN:' + ACCESS_TOKEN)

Your credentials:
CLIENT_ID: BZAOTE3PBPFSEZ4MBS44FD124XOQP1Z2R4DGJENDAYT1CTB0
CLIENT_SECRET:2PAQDTDAMNIPMDD0PPOBN5JUENEJTHCAPSGGYOHUS0YRJV0I
ACCESS_TOKEN:HAVG3E4KDUFCRZUBUMTQQVFSZFWXVBD1I1MIF41K0U1I3J3K


#### I will start by looking at Japanese restaurants, using the downtown San Diego courthouse as the center of the map search.  I will start with a 50-mile radius, as San Diego County is one of the 10 most populous counties in the US

To geocode the address, the Nominatim geocoder needs a user agent string, which I will call "foursquare_agent"

In [4]:
address = '1100 Union Street, San Diego, CA'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

32.716855 -117.165704


In [5]:
search_query = 'Japanese'
radius = 80500
print(search_query + ' .... OK!')

Japanese .... OK!
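
As a quick check on the radius above, the sketch below converts 50 miles to meters. The helper name `miles_to_meters` is my own and is not part of the notebook. The exact figure is a little under the 80,500 used in the search, so the value above appears to be a rounded one.

```python
METERS_PER_MILE = 1609.344  # international mile, exact by definition

def miles_to_meters(miles):
    """Convert a distance in miles to meters (hypothetical helper)."""
    return miles * METERS_PER_MILE

radius_m = miles_to_meters(50)
print(round(radius_m))  # compare with the 80500 passed to Foursquare
```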


In [6]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=BZAOTE3PBPFSEZ4MBS44FD124XOQP1Z2R4DGJENDAYT1CTB0&client_secret=2PAQDTDAMNIPMDD0PPOBN5JUENEJTHCAPSGGYOHUS0YRJV0I&ll=32.716855,-117.165704&oauth_token=HAVG3E4KDUFCRZUBUMTQQVFSZFWXVBD1I1MIF41K0U1I3J3K&v=20201229&query=Japanese&radius=80500&limit=30'
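
An alternative way to assemble the same kind of search URL is to build the query string from a dictionary, which keeps each parameter named. This is only a sketch: `build_search_url` is a hypothetical helper, the credentials are placeholders, and the `oauth_token` parameter is left out for brevity.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Assemble a venue search URL from named parameters (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': f'{lat},{lng}',
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma inside the ll value unescaped
    return BASE_URL + '?' + urlencode(params, safe=',')

# placeholder credentials, not real ones
example_url = build_search_url('YOUR_ID', 'YOUR_SECRET', 32.716855, -117.165704,
                               '20201229', 'Japanese', 80500, 30)
print(example_url)
```

Building the URL this way also avoids pasting credentials directly into the URL string.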

#### Send the GET request and examine the results

In [7]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ff3bc4c119a9f2907b7c512'},
 'notifications': [{'type': 'notificationTray', 'item': {'unreadCount': 0}}],
 'response': {'venues': [{'id': '483d01cef964a52019501fe3',
    'name': 'Japanese Friendship Garden',
    'location': {'address': '2215 Pan American Rd E',
     'crossStreet': 'in Balboa Park',
     'lat': 32.73005388762262,
     'lng': -117.14998483657837,
     'labeledLatLngs': [{'label': 'display',
       'lat': 32.73005388762262,
       'lng': -117.14998483657837},
      {'label': 'entrance', 'lat': 32.730109, 'lng': -117.15003}],
     'distance': 2079,
     'postalCode': '92101',
     'cc': 'US',
     'city': 'San Diego',
     'state': 'CA',
     'country': 'United States',
     'formattedAddress': ['2215 Pan American Rd E (in Balboa Park)',
      'San Diego, CA 92101']},
    'categories': [{'id': '4bf58dd8d48988d15a941735',
      'name': 'Garden',
      'pluralName': 'Gardens',
      'shortName': 'Garden',
      'icon': {'prefix': 'https:/
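
The `distance` field in the response above looks like the distance in meters from the search center. As a rough sanity check, the sketch below computes the great-circle distance between the courthouse coordinates and the first venue using the haversine formula. The function name and the Earth radius constant are my own assumptions, and the result should only be expected to land close to the reported 2079, not match it exactly.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters; an approximation

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

center = (32.716855, -117.165704)                  # courthouse, from the geocoder output
garden = (32.73005388762262, -117.14998483657837)  # Japanese Friendship Garden
distance_m = haversine_m(*center, *garden)
print(round(distance_m))
```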

#### Get the relevant parts of the JSON file and convert to a pandas dataframe

In [8]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()



Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.crossStreet,location.lat,location.lng,location.labeledLatLngs,...,location.country,location.formattedAddress,venuePage.id,location.neighborhood,delivery.id,delivery.url,delivery.provider.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.icon.name
0,483d01cef964a52019501fe3,Japanese Friendship Garden,"[{'id': '4bf58dd8d48988d15a941735', 'name': 'G...",v-1609808972,False,2215 Pan American Rd E,in Balboa Park,32.730054,-117.149985,"[{'label': 'display', 'lat': 32.73005388762262...",...,United States,"[2215 Pan American Rd E (in Balboa Park), San ...",45465452.0,,,,,,,
1,4dcc330dd22d480bd645169b,Katsu Japanese Cuisine,"[{'id': '4bf58dd8d48988d111941735', 'name': 'J...",v-1609808972,False,4th Ave,,32.727264,-117.161042,"[{'label': 'display', 'lat': 32.7272640624101,...",...,United States,"[4th Ave, San Diego, CA 92101]",,Park West,,,,,,
2,5912fef1a6fe4d28dcf09b80,Japanese Auto Sales,"[{'id': '4eb1c1623b7b52c0e1adc2ec', 'name': 'A...",v-1609808972,False,4825 A El Cajon Blvd,,32.717219,-117.159705,"[{'label': 'display', 'lat': 32.7172188597113,...",...,United States,"[4825 A El Cajon Blvd, San Diego, CA 92115]",,,,,,,,
3,50651697e4b0d3feb102b8f7,Sora Italian Japanese Influenced Cuisine,"[{'id': '4bf58dd8d48988d111941735', 'name': 'J...",v-1609808972,False,655 W Broadway,Kettner,32.715151,-117.168581,"[{'label': 'display', 'lat': 32.715151, 'lng':...",...,United States,"[655 W Broadway (Kettner), San Diego, CA 92101]",,Central San Diego,,,,,,
4,5f93a630aecea61d0fa246ce,Gyu-Kaku Japanese BBQ,"[{'id': '4bf58dd8d48988d111941735', 'name': 'J...",v-1609808972,False,801 5th Ave,F St.,32.713734,-117.159976,"[{'label': 'display', 'lat': 32.713734, 'lng':...",...,United States,"[801 5th Ave (F St.), San Diego, CA 92101]",,,2339200.0,https://www.grubhub.com/restaurant/gyu-kaku-80...,grubhub,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_grubhub_20180129.png


In [9]:

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,id
0,Japanese Friendship Garden,Garden,2215 Pan American Rd E,in Balboa Park,32.730054,-117.149985,"[{'label': 'display', 'lat': 32.73005388762262...",2079,92101.0,US,San Diego,CA,United States,"[2215 Pan American Rd E (in Balboa Park), San ...",,483d01cef964a52019501fe3
1,Katsu Japanese Cuisine,Japanese Restaurant,4th Ave,,32.727264,-117.161042,"[{'label': 'display', 'lat': 32.7272640624101,...",1238,92101.0,US,San Diego,CA,United States,"[4th Ave, San Diego, CA 92101]",Park West,4dcc330dd22d480bd645169b
2,Japanese Auto Sales,Auto Dealership,4825 A El Cajon Blvd,,32.717219,-117.159705,"[{'label': 'display', 'lat': 32.7172188597113,...",563,92115.0,US,San Diego,CA,United States,"[4825 A El Cajon Blvd, San Diego, CA 92115]",,5912fef1a6fe4d28dcf09b80
3,Sora Italian Japanese Influenced Cuisine,Japanese Restaurant,655 W Broadway,Kettner,32.715151,-117.168581,"[{'label': 'display', 'lat': 32.715151, 'lng':...",329,92101.0,US,San Diego,CA,United States,"[655 W Broadway (Kettner), San Diego, CA 92101]",Central San Diego,50651697e4b0d3feb102b8f7
4,Gyu-Kaku Japanese BBQ,Japanese Restaurant,801 5th Ave,F St.,32.713734,-117.159976,"[{'label': 'display', 'lat': 32.713734, 'lng':...",639,92101.0,US,San Diego,CA,United States,"[801 5th Ave (F St.), San Diego, CA 92101]",,5f93a630aecea61d0fa246ce
5,The Tea Pavillion at the Japanese Friendship G...,Tea Room,2215 Pan American Rd E,in Balboa Park,32.730113,-117.149981,"[{'label': 'display', 'lat': 32.73011346770401...",2084,92101.0,US,San Diego,CA,United States,"[2215 Pan American Rd E (in Balboa Park), San ...",Balboa Park,4ba00ebef964a5205e5637e3
6,Koi Japanese Restaurant,Japanese Restaurant,744 Market St,8th Ave,32.71165,-117.15773,"[{'label': 'display', 'lat': 32.71165, 'lng': ...",945,92101.0,US,San Diego,CA,United States,"[744 Market St (8th Ave), San Diego, CA 92101]",,48701183f964a52004511fe3
7,Yoshino Japanese Restaurant,Japanese Restaurant,1790 W Washington St,at India St,32.743662,-117.181338,"[{'label': 'display', 'lat': 32.74366238711516...",3323,92103.0,US,San Diego,CA,United States,"[1790 W Washington St (at India St), San Diego...",Mission Hills,44533d88f964a520ab321fe3
8,SanSai Japanese Grill,Japanese Restaurant,7710 Hazard Center Dr,at Frazee Rd,32.771254,-117.155877,"[{'label': 'display', 'lat': 32.77125422339901...",6125,92108.0,US,San Diego,CA,United States,"[7710 Hazard Center Dr (at Frazee Rd), San Die...",,4ad3e184f964a520c2e620e3
9,Oriental Treasure Box: Japanese Antiques & Arts,Antique Shop,2310 Kettner Blvd,,32.728222,-117.171661,"[{'label': 'display', 'lat': 32.72822189331055...",1382,92101.0,US,San Diego,CA,United States,"[2310 Kettner Blvd, San Diego, CA 92101]",,4ec813d09adf9c7bf3c4e6b0


In [116]:
dataframe_filtered.shape

(30, 16)

#### A number of the results for 'Japanese' are not restaurants, so I need to filter the results to only those that are restaurants

In [10]:
dataframe_restaurant = dataframe_filtered.loc[dataframe_filtered.categories.str.contains('Restaurant', na=False)]

In [11]:
dataframe_restaurant

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,id
1,Katsu Japanese Cuisine,Japanese Restaurant,4th Ave,,32.727264,-117.161042,"[{'label': 'display', 'lat': 32.7272640624101,...",1238,92101,US,San Diego,CA,United States,"[4th Ave, San Diego, CA 92101]",Park West,4dcc330dd22d480bd645169b
3,Sora Italian Japanese Influenced Cuisine,Japanese Restaurant,655 W Broadway,Kettner,32.715151,-117.168581,"[{'label': 'display', 'lat': 32.715151, 'lng':...",329,92101,US,San Diego,CA,United States,"[655 W Broadway (Kettner), San Diego, CA 92101]",Central San Diego,50651697e4b0d3feb102b8f7
4,Gyu-Kaku Japanese BBQ,Japanese Restaurant,801 5th Ave,F St.,32.713734,-117.159976,"[{'label': 'display', 'lat': 32.713734, 'lng':...",639,92101,US,San Diego,CA,United States,"[801 5th Ave (F St.), San Diego, CA 92101]",,5f93a630aecea61d0fa246ce
6,Koi Japanese Restaurant,Japanese Restaurant,744 Market St,8th Ave,32.71165,-117.15773,"[{'label': 'display', 'lat': 32.71165, 'lng': ...",945,92101,US,San Diego,CA,United States,"[744 Market St (8th Ave), San Diego, CA 92101]",,48701183f964a52004511fe3
7,Yoshino Japanese Restaurant,Japanese Restaurant,1790 W Washington St,at India St,32.743662,-117.181338,"[{'label': 'display', 'lat': 32.74366238711516...",3323,92103,US,San Diego,CA,United States,"[1790 W Washington St (at India St), San Diego...",Mission Hills,44533d88f964a520ab321fe3
8,SanSai Japanese Grill,Japanese Restaurant,7710 Hazard Center Dr,at Frazee Rd,32.771254,-117.155877,"[{'label': 'display', 'lat': 32.77125422339901...",6125,92108,US,San Diego,CA,United States,"[7710 Hazard Center Dr (at Frazee Rd), San Die...",,4ad3e184f964a520c2e620e3
11,Niban Japanese Cuisine,Sushi Restaurant,7081 Clairemont Mesa Blvd,at Shawline St,32.83155,-117.164777,"[{'label': 'display', 'lat': 32.83155002015418...",12768,92111,US,San Diego,CA,United States,"[7081 Clairemont Mesa Blvd (at Shawline St), S...",,4297b480f964a5206b241fe3
13,Rakitori - Japanese Pub & Grill,Ramen Restaurant,530 University Ave,Fifth,32.748467,-117.159972,"[{'label': 'display', 'lat': 32.74846692082438...",3559,92103,US,San Diego,CA,United States,"[530 University Ave (Fifth), San Diego, CA 92103]",,562c3f00498ece4c631a3eb4
15,Ai Sushi & Teriyaki,Asian Restaurant,1139 6th Ave,,32.717401,-117.1591,"[{'label': 'display', 'lat': 32.71740131905021...",621,92101,US,San Diego,CA,United States,"[1139 6th Ave, San Diego, CA 92101]",,4bd1ff0ccaff9521f2d2d1f0
16,Furasshu Japanese Cuisine,Japanese Restaurant,Blvd. Cuauhtemoc 10237 Zona Río,A Un Lado De Total Fitness,32.523962,-117.019051,"[{'label': 'display', 'lat': 32.52396194685493...",25497,22010,MX,Tijuana,Baja California,México,[Blvd. Cuauhtemoc 10237 Zona Río (A Un Lado De...,,5029544fe4b00bbc7b40d9f8


In [12]:
dataframe_restaurant.shape

(16, 16)
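
Note that matching on the substring 'Restaurant' also keeps broader categories such as 'Asian Restaurant'. A stricter option would be an explicit allow-list of categories. The sketch below contrasts the two on a few (name, category) pairs taken from the tables above; the allow-list itself is my own assumption about which categories count as Japanese.

```python
# Assumed allow-list; adjust to taste
JAPANESE_CATEGORIES = {'Japanese Restaurant', 'Sushi Restaurant', 'Ramen Restaurant'}

# (name, category) pairs copied from the results shown above
venues = [
    ('Japanese Friendship Garden', 'Garden'),
    ('Katsu Japanese Cuisine', 'Japanese Restaurant'),
    ('Japanese Auto Sales', 'Auto Dealership'),
    ('Niban Japanese Cuisine', 'Sushi Restaurant'),
    ('Rakitori - Japanese Pub & Grill', 'Ramen Restaurant'),
    ('Ai Sushi & Teriyaki', 'Asian Restaurant'),
]

def keep_substring(category):
    """Same idea as the notebook's filter: any category containing 'Restaurant'."""
    return category is not None and 'Restaurant' in category

def keep_allow_list(category):
    """Stricter variant: only categories on the allow-list."""
    return category in JAPANESE_CATEGORIES

broad = [name for name, cat in venues if keep_substring(cat)]
strict = [name for name, cat in venues if keep_allow_list(cat)]
print(len(broad), len(strict))
```

The substring filter is the one used in this analysis; the stricter variant is shown only for comparison.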

#### Visualize the location of the restaurants

In [13]:
dataframe_restaurant.postalCode.unique()

array(['92101', '92103', '92108', '92111', '22010', '92110', '91945',
       '91950', '92071'], dtype=object)

In [16]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=11) # generate map centred around Downtown San Diego

# add a red circle marker to represent downtown San Diego
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Downtown',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the Japanese restaurants as blue circle markers
for lat, lng, label in zip(dataframe_restaurant.lat, dataframe_restaurant.lng, dataframe_restaurant.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

### I want to identify different clusters based on Japanese restaurant densities, so I will run clustering on the data to identify the location of areas with fewer Japanese restaurants.  

### Clustering - Add an integer column

In [22]:
# Add an integer label for each postal code so that the restaurants can be grouped by zip code
dataframe_restaurant = dataframe_restaurant.copy()  # work on a copy to avoid pandas' SettingWithCopyWarning
dataframe_restaurant['Index'] = dataframe_restaurant['postalCode'].replace(to_replace=['92101','92103','92108','92111','91945','22010','92110','91950','92071'], value=[1,2,3,4,5,6,7,8,9])
dataframe_restaurant.head()


Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,id,Index
1,Katsu Japanese Cuisine,Japanese Restaurant,4th Ave,,32.727264,-117.161042,"[{'label': 'display', 'lat': 32.7272640624101,...",1238,92101,US,San Diego,CA,United States,"[4th Ave, San Diego, CA 92101]",Park West,4dcc330dd22d480bd645169b,1
3,Sora Italian Japanese Influenced Cuisine,Japanese Restaurant,655 W Broadway,Kettner,32.715151,-117.168581,"[{'label': 'display', 'lat': 32.715151, 'lng':...",329,92101,US,San Diego,CA,United States,"[655 W Broadway (Kettner), San Diego, CA 92101]",Central San Diego,50651697e4b0d3feb102b8f7,1
4,Gyu-Kaku Japanese BBQ,Japanese Restaurant,801 5th Ave,F St.,32.713734,-117.159976,"[{'label': 'display', 'lat': 32.713734, 'lng':...",639,92101,US,San Diego,CA,United States,"[801 5th Ave (F St.), San Diego, CA 92101]",,5f93a630aecea61d0fa246ce,1
6,Koi Japanese Restaurant,Japanese Restaurant,744 Market St,8th Ave,32.71165,-117.15773,"[{'label': 'display', 'lat': 32.71165, 'lng': ...",945,92101,US,San Diego,CA,United States,"[744 Market St (8th Ave), San Diego, CA 92101]",,48701183f964a52004511fe3,1
7,Yoshino Japanese Restaurant,Japanese Restaurant,1790 W Washington St,at India St,32.743662,-117.181338,"[{'label': 'display', 'lat': 32.74366238711516...",3323,92103,US,San Diego,CA,United States,"[1790 W Washington St (at India St), San Diego...",Mission Hills,44533d88f964a520ab321fe3,2


In [23]:
#Import Libraries for the map
!pip install geopy
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt
import numpy as np 



In [24]:
#coordinates of San Diego to begin the clustering

address = 'San Diego'
geolocator = Nominatim(user_agent="SD_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(f'The geographical coordinates of San Diego are {latitude}, {longitude}.')


The geographical coordinates of San Diego are 32.7174202, -117.1627728.


In [25]:
#set the cluster number as the Index number created above
kclusters=len(dataframe_restaurant.Index.unique())

#create map of San Diego
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the zip code clusters in San Diego
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add the zip code clusters to the map of San Diego
for lat, lon, postal_code, cluster in zip(dataframe_restaurant['lat'], dataframe_restaurant['lng'], dataframe_restaurant['postalCode'], dataframe_restaurant['Index']):
    # label each marker with its own postal code rather than the whole postalCode column
    label = folium.Popup(str(postal_code) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

In [26]:
map_clusters

### Based on clustering, there are 8 distinct areas, with the downtown area the most densely populated with Japanese restaurants.  Zip code 92101 should be excluded as a location for a new Japanese restaurant
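
The density by zip code can also be checked with a simple count. The sketch below tallies the postal codes of the ten restaurant rows displayed earlier (not the full set of 16, which is not shown in full), so the counts are illustrative only.

```python
from collections import Counter

# postal codes of the ten restaurant rows displayed above, in order
displayed_zips = ['92101', '92101', '92101', '92101', '92103',
                  '92108', '92111', '92103', '92101', '22010']

counts = Counter(displayed_zips)
densest_zip, densest_count = counts.most_common(1)[0]
print(densest_zip, densest_count)
```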

## Download median and mean household income data for the US to compare with the Japanese restaurant distribution

In [27]:
#Download household income by zip code
!wget -q -O 'income' https://www.psc.isr.umich.edu/dis/census/Features/tract2zip/MedianZIP-3.xlsx
print('Data downloaded!')

Data downloaded!


In [28]:
# read the excel file and convert into a dataframe object 
df_income = pd.DataFrame(pd.read_excel("income"))

In [29]:
df_income.head()

Unnamed: 0,Zip,Median,Mean,Pop
0,1001,56662.5735,66687.8,16445
1,1002,49853.4177,75062.6,28069
2,1003,28462.0,35121.0,8491
3,1005,75423.0,82442.0,4798
4,1007,79076.354,85802.0,12962


### Add the Median income to the restaurants dataframe

In [30]:
#Convert the column name postalCode to Zip so that I can perform a join between both datasets
dataframe_restaurant = dataframe_restaurant.rename(columns = {"postalCode":"Zip"})
dataframe_restaurant.head()

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,Zip,cc,city,state,country,formattedAddress,neighborhood,id,Index
1,Katsu Japanese Cuisine,Japanese Restaurant,4th Ave,,32.727264,-117.161042,"[{'label': 'display', 'lat': 32.7272640624101,...",1238,92101,US,San Diego,CA,United States,"[4th Ave, San Diego, CA 92101]",Park West,4dcc330dd22d480bd645169b,1
3,Sora Italian Japanese Influenced Cuisine,Japanese Restaurant,655 W Broadway,Kettner,32.715151,-117.168581,"[{'label': 'display', 'lat': 32.715151, 'lng':...",329,92101,US,San Diego,CA,United States,"[655 W Broadway (Kettner), San Diego, CA 92101]",Central San Diego,50651697e4b0d3feb102b8f7,1
4,Gyu-Kaku Japanese BBQ,Japanese Restaurant,801 5th Ave,F St.,32.713734,-117.159976,"[{'label': 'display', 'lat': 32.713734, 'lng':...",639,92101,US,San Diego,CA,United States,"[801 5th Ave (F St.), San Diego, CA 92101]",,5f93a630aecea61d0fa246ce,1
6,Koi Japanese Restaurant,Japanese Restaurant,744 Market St,8th Ave,32.71165,-117.15773,"[{'label': 'display', 'lat': 32.71165, 'lng': ...",945,92101,US,San Diego,CA,United States,"[744 Market St (8th Ave), San Diego, CA 92101]",,48701183f964a52004511fe3,1
7,Yoshino Japanese Restaurant,Japanese Restaurant,1790 W Washington St,at India St,32.743662,-117.181338,"[{'label': 'display', 'lat': 32.74366238711516...",3323,92103,US,San Diego,CA,United States,"[1790 W Washington St (at India St), San Diego...",Mission Hills,44533d88f964a520ab321fe3,2


I need to convert data types so that I can complete the join on zip code (the Zip column must be an integer in both dataframes)

In [31]:
dataframe_restaurant.dtypes

name                 object
categories           object
address              object
crossStreet          object
lat                 float64
lng                 float64
labeledLatLngs       object
distance              int64
Zip                  object
cc                   object
city                 object
state                object
country              object
formattedAddress     object
neighborhood         object
id                   object
Index                 int64
dtype: object

In [32]:
dataframe_restaurant['Zip']= dataframe_restaurant['Zip'].astype(str).astype(int)

In [33]:
dataframe_restaurant.dtypes

name                 object
categories           object
address              object
crossStreet          object
lat                 float64
lng                 float64
labeledLatLngs       object
distance              int64
Zip                   int64
cc                   object
city                 object
state                object
country              object
formattedAddress     object
neighborhood         object
id                   object
Index                 int64
dtype: object
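
Joining on integers works here because San Diego zip codes do not start with zero. The first rows of the income table (1001, 1002, ...) suggest that leading zeros were dropped for zip codes such as 01001, so an alternative is to normalize both sides to five-character strings instead. A minimal sketch, where `normalize_zip` is a hypothetical helper:

```python
def normalize_zip(value):
    """Return a zip code as a five-character, zero-padded string.

    Accepts ints (1001), floats (92101.0), or numeric strings ('92101').
    """
    return str(int(float(value))).zfill(5)

print(normalize_zip(1001))
print(normalize_zip('92101'))
print(normalize_zip(92101.0))
```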

In [35]:
#Join on postal Code
#Merge data frames

df_restaurant = pd.merge(dataframe_restaurant, df_income, on = 'Zip')
df_restaurant.head()

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,Zip,cc,city,state,country,formattedAddress,neighborhood,id,Index,Median,Mean,Pop
0,Katsu Japanese Cuisine,Japanese Restaurant,4th Ave,,32.727264,-117.161042,"[{'label': 'display', 'lat': 32.7272640624101,...",1238,92101,US,San Diego,CA,United States,"[4th Ave, San Diego, CA 92101]",Park West,4dcc330dd22d480bd645169b,1,39102.091,64852.7,32507
1,Sora Italian Japanese Influenced Cuisine,Japanese Restaurant,655 W Broadway,Kettner,32.715151,-117.168581,"[{'label': 'display', 'lat': 32.715151, 'lng':...",329,92101,US,San Diego,CA,United States,"[655 W Broadway (Kettner), San Diego, CA 92101]",Central San Diego,50651697e4b0d3feb102b8f7,1,39102.091,64852.7,32507
2,Gyu-Kaku Japanese BBQ,Japanese Restaurant,801 5th Ave,F St.,32.713734,-117.159976,"[{'label': 'display', 'lat': 32.713734, 'lng':...",639,92101,US,San Diego,CA,United States,"[801 5th Ave (F St.), San Diego, CA 92101]",,5f93a630aecea61d0fa246ce,1,39102.091,64852.7,32507
3,Koi Japanese Restaurant,Japanese Restaurant,744 Market St,8th Ave,32.71165,-117.15773,"[{'label': 'display', 'lat': 32.71165, 'lng': ...",945,92101,US,San Diego,CA,United States,"[744 Market St (8th Ave), San Diego, CA 92101]",,48701183f964a52004511fe3,1,39102.091,64852.7,32507
4,Ai Sushi & Teriyaki,Asian Restaurant,1139 6th Ave,,32.717401,-117.1591,"[{'label': 'display', 'lat': 32.71740131905021...",621,92101,US,San Diego,CA,United States,"[1139 6th Ave, San Diego, CA 92101]",,4bd1ff0ccaff9521f2d2d1f0,1,39102.091,64852.7,32507


The median household income in the US in 2019 was $68,703 (https://www.census.gov/library/publications/2020/demo/p60-270.html#:~:text=Median%20household%20income%20was%20%2468%2C703,and%20Table%20A%2D1).  I use this figure as the benchmark: zip codes for a new Japanese restaurant need to have a mean household income higher than this.

In [36]:
df_filtered = df_restaurant[df_restaurant['Mean'] > 68703]

In [37]:
df_filtered.Zip.unique()

array([92103, 92108, 92110, 92071])

In [38]:
df_filtered.head()

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,Zip,cc,city,state,country,formattedAddress,neighborhood,id,Index,Median,Mean,Pop
5,Yoshino Japanese Restaurant,Japanese Restaurant,1790 W Washington St,at India St,32.743662,-117.181338,"[{'label': 'display', 'lat': 32.74366238711516...",3323,92103,US,San Diego,CA,United States,"[1790 W Washington St (at India St), San Diego...",Mission Hills,44533d88f964a520ab321fe3,2,60669.8772,85051.3,31046
6,Rakitori - Japanese Pub & Grill,Ramen Restaurant,530 University Ave,Fifth,32.748467,-117.159972,"[{'label': 'display', 'lat': 32.74846692082438...",3559,92103,US,San Diego,CA,United States,"[530 University Ave (Fifth), San Diego, CA 92103]",,562c3f00498ece4c631a3eb4,2,60669.8772,85051.3,31046
7,SanSai Japanese Grill,Japanese Restaurant,7710 Hazard Center Dr,at Frazee Rd,32.771254,-117.155877,"[{'label': 'display', 'lat': 32.77125422339901...",6125,92108,US,San Diego,CA,United States,"[7710 Hazard Center Dr (at Frazee Rd), San Die...",,4ad3e184f964a520c2e620e3,3,63885.2291,74585.2,18858
8,Fuji Japanese Steakhouse & Sushi,Japanese Restaurant,911 Camino del Rio S,,32.763673,-117.155412,"[{'label': 'display', 'lat': 32.76367258150530...",5300,92108,US,San Diego,CA,United States,"[911 Camino del Rio S, San Diego, CA 92108]",,4ac82360f964a520c9bb20e3,3,63885.2291,74585.2,18858
9,Osaka Japanese Food & Sushi,Sushi Restaurant,4242 Camino del Rio N Ste 26,,32.780261,-117.103173,"[{'label': 'display', 'lat': 32.7802613750425,...",9170,92108,US,San Diego,CA,United States,"[4242 Camino del Rio N Ste 26, San Diego, CA 9...",,4aa56694f964a520304820e3,3,63885.2291,74585.2,18858


The filter on US household income also excluded zip code 92101, which I was going to exclude anyway because of its dense concentration of Japanese restaurants

### Lastly, I will filter the results to zip codes with a population of at least 20,000 so the restaurant has enough potential customers

In [39]:
df_zip = df_filtered[df_filtered['Pop'] >= 20000]

In [40]:
df_zip.Zip.unique()

array([92103, 92110, 92071])

In [41]:
df_zip.head()

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,Zip,cc,city,state,country,formattedAddress,neighborhood,id,Index,Median,Mean,Pop
5,Yoshino Japanese Restaurant,Japanese Restaurant,1790 W Washington St,at India St,32.743662,-117.181338,"[{'label': 'display', 'lat': 32.74366238711516...",3323,92103,US,San Diego,CA,United States,"[1790 W Washington St (at India St), San Diego...",Mission Hills,44533d88f964a520ab321fe3,2,60669.8772,85051.3,31046
6,Rakitori - Japanese Pub & Grill,Ramen Restaurant,530 University Ave,Fifth,32.748467,-117.159972,"[{'label': 'display', 'lat': 32.74846692082438...",3559,92103,US,San Diego,CA,United States,"[530 University Ave (Fifth), San Diego, CA 92103]",,562c3f00498ece4c631a3eb4,2,60669.8772,85051.3,31046
11,Buta Japanese Ramen,Ramen Restaurant,5201 Linda Vista Rd Ste 103,,32.765363,-117.196404,"[{'label': 'display', 'lat': 32.765363, 'lng':...",6117,92110,US,San Diego,CA,United States,"[5201 Linda Vista Rd Ste 103, San Diego, CA 92...",,5d0bf2a5e937f3002306f505,7,56962.9362,71842.8,25341
14,Oishii Japanese Thai Restaurant,Japanese Restaurant,10251 Mast Blvd,,32.854823,-116.973405,"[{'label': 'display', 'lat': 32.85482333796696...",23659,92071,US,Santee,CA,United States,"[10251 Mast Blvd, Santee, CA 92071]",,4b622ba4f964a520fa392ae3,9,70200.987,78183.5,53422
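
The two screening rules can be summarized in one small function. The sketch below applies them to the five zip codes whose mean income and population appear in the tables above; the function and variable names are my own, and zip codes whose figures are not displayed are left out.

```python
US_MEDIAN_INCOME = 68703   # 2019 US median household income, used as the benchmark
MIN_POPULATION = 20000     # minimum population needed to support the business

# mean household income and population copied from the merged tables above
zip_stats = {
    92101: {'mean_income': 64852.7, 'population': 32507},
    92103: {'mean_income': 85051.3, 'population': 31046},
    92108: {'mean_income': 74585.2, 'population': 18858},
    92110: {'mean_income': 71842.8, 'population': 25341},
    92071: {'mean_income': 78183.5, 'population': 53422},
}

def passes_screen(stats):
    """True when a zip code clears both the income and the population threshold."""
    return (stats['mean_income'] > US_MEDIAN_INCOME
            and stats['population'] >= MIN_POPULATION)

candidates = [zip_code for zip_code, stats in zip_stats.items() if passes_screen(stats)]
print(candidates)  # [92103, 92110, 92071]
```

Here 92101 fails on income and 92108 fails on population, which matches the filtering steps above.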


# Results

In [121]:
#Initial data pull Zip codes
dataframe_filtered.postalCode.unique()

array(['92101', '92115', '92103', '92108', nan, '92111', '92102', '92109',
       '22010', '92131', '92119', '92110', '91945', '91950', '92121',
       '92071'], dtype=object)

In [123]:
#Filtered Zip codes to only restaurants
dataframe_restaurant.Zip.unique()

array([92101, 92103, 92108, 92111, 22010, 92110, 91945, 91950, 92071])

In [125]:
#Zip codes filtered to mean household income above the US benchmark (this also excludes downtown San Diego)
df_filtered.Zip.unique()

array([92103, 92108, 92110, 92071])

In [126]:
#Final results: zip codes with a population of at least 20,000
df_zip.Zip.unique()

array([92103, 92110, 92071])

# Discussion 

The original Foursquare pull returned 30 locations across 15 zip codes; however, many of them were not actually restaurants, because the Foursquare query was only for 'Japanese' without any qualifier.  I then filtered the results to only those with 'Restaurant' in the category, which brought the total to 16 locations in 9 zip codes. I ran the analysis on these 16 locations. 

I ran clustering on these to identify locations with high density of Japanese restaurants and then filtered out that zip code (92101). 

I then merged this data with census data on mean household income and population by zip code.  After comparing mean household income by zip code in San Diego to the US benchmark, the data was reduced to 4 zip codes. 

I then looked for zip codes with a population of 20,000 or more.  This brought the final list to 3 zip codes: 92103, 92110, 92071.

### Map of San Diego with restaurants highlighted in suitable Zip Codes

In [129]:
#Map of San Diego
SDmap = folium.Map(location = [32.743662, -117.181338], zoom_start=13)

# add the restaurants in Zip codes where more restaurants could be opened
for lat, lng, label in zip(df_zip.lat, df_zip.lng, df_zip.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=50,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(SDmap)

SDmap

# Conclusion

After considering the location of existing Japanese restaurants in San Diego, mean household income by zip code, and a population of at least 20,000 residents, there are 3 zip codes in the San Diego area that would be good locations for a new Japanese restaurant: 92103, 92110, and 92071.
