# The Battle of the Neighborhoods part 2
### Applied Data Science Capstone by Andrew Holman

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>
This project is aimed to recommend a new location based on zip code for a restaurant type that is already present in Orange County. This will be done by gathering a list of all current restaurants in the area. From this list, a venue category can be picked to analyze. Ratings of restaurants in that venue category will be retrieved and used to create a ranking system to find the best city to build a new restaurant of the same type. Then, all the zip codes within the city will be listed as possible new locations for the restaurant.

In [1]:
#imports

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

from bs4 import BeautifulSoup #BeautifulSoup

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

from os import environ

print('Libraries imported.')

Libraries imported.


## Data <a name="data"></a>
First, we must gather location data about Orange County. It will be taken from the opendatasoft website. A link is provided in part 1. We can then gather location of restaurants and important information, such as rating, using the Foursquare API. The location of Orange County will be generated algorithmically and approximate addresses of the center will be obtained using **Google Maps API reverse geocoding**. We will also be using folium and geospatial data in order to create choropleth maps to show the frequency of restaurants within a city's limits.

In [2]:
#data set of all US zip codes
df = pd.read_csv('us-zip-code-latitude-and-longitude.csv', sep=';')

#obtain only CA zip codes
df = df[df['State'] == 'CA']

#Restrict to only Orange County cities
df = df[(df['City'] == 'Aliso Viejo') | 
        (df['City'] == 'Anaheim') | 
        (df['City'] == 'Brea') | 
        (df['City'] == 'Buena Park') | 
        (df['City'] == 'Costa Mesa') | 
        (df['City'] == 'Cypress') | 
        (df['City'] == 'Dana Point') | 
        (df['City'] == 'Fountain Valley') | 
        (df['City'] == 'Fullerton') | 
        (df['City'] == 'Garden Grove') | 
        (df['City'] == 'Huntington Beach') | 
        (df['City'] == 'Irvine') | 
        (df['City'] == 'La Habra') | 
        (df['City'] == 'La Palma') | 
        (df['City'] == 'Laguna Beach') | 
        (df['City'] == 'Laguna Hills') | 
        (df['City'] == 'Laguna Niguel') | 
        (df['City'] == 'Laguna Woods') | 
        (df['City'] == 'Lake Forest') | 
        (df['City'] == 'Los Alamitos') | 
        (df['City'] == 'Mission Viejo') | 
        (df['City'] == 'Newport Beach') | 
        (df['City'] == 'Orange') | 
        (df['City'] == 'Placentia') | 
        (df['City'] == 'Rancho Santa Margarita') | 
        (df['City'] == 'San Clemente') | 
        (df['City'] == 'San Juan Capistrano') | 
        (df['City'] == 'Santa Ana') | 
        (df['City'] == 'Seal Beach') | 
        (df['City'] == 'Stanton') | 
        (df['City'] == 'Tustin') | 
        (df['City'] == 'Villa Park') | 
        (df['City'] == 'Westminster') | 
        (df['City'] == 'Yorba Linda') ]

df.reset_index(inplace=True, drop=True)
df = df[['Zip', 'City', 'Latitude', 'Longitude']]
df.head(12)

Unnamed: 0,Zip,City,Latitude,Longitude
0,92709,Irvine,33.640302,-117.769442
1,92693,San Juan Capistrano,33.555323,-117.564
2,92837,Fullerton,33.640302,-117.769442
3,92840,Garden Grove,33.785166,-117.93406
4,92856,Orange,33.640302,-117.769442
5,92616,Irvine,33.640302,-117.769442
6,92801,Anaheim,33.844814,-117.95381
7,92841,Garden Grove,33.786915,-117.98224
8,92604,Irvine,33.68762,-117.78852
9,92815,Anaheim,33.640302,-117.769442


In [3]:
address = 'Orange County, CA'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of Orange County are {}, {}.'.format(latitude, longitude))

The geograpical coordinates of Orange County are 33.7500378, -117.8704931.


We create a map of Orange County with the locations of all the zip code addresses.

In [4]:
# create map of Orange County using latitude and longitude values
map_OC = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, city, zipCode in zip(df['Latitude'], df['Longitude'], df['City'], df['Zip']):
    label = '{}, {}'.format(city, zipCode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_OC)  
    
map_OC

The credentials and version are stored here for using the Foursquare API.

In [25]:
CLIENT_ID = environ.get('FOURSQUARE_CLIENT_ID') # your Foursquare ID
CLIENT_SECRET = environ.get('FOURSQUARE_CLIENT_SECRET') # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

#print('Your credentials:')
#print('CLIENT_ID: ' + CLIENT_ID)
#print('CLIENT_SECRET:' + CLIENT_SECRET)

We create a function in order to make a dataframe of all the nearby venues to each zip code location within a 2km radius. Then, the function is used to create the dataframe for all the zip codes in Orange County. From this, we can group the data both by zip code and city to see where more restaurants lie.

In [7]:
catId = '4d4b7105d754a06374d81259' # food category
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 2000 # define radius

def getNearbyVenues(names, zipcodes, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, zipcode, lat, lng in zip(names, zipcodes, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&categoryId={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION,
            catId,
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            zipcode,
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['id'],
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City',
                  'Zip',
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue',
                  'Venue ID',
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [8]:
OC_venues = getNearbyVenues(names=df['City'],
                            zipcodes=df['Zip'],
                            latitudes=df['Latitude'],
                            longitudes=df['Longitude']
                            )


Irvine
San Juan Capistrano
Fullerton
Garden Grove
Orange
Irvine
Anaheim
Garden Grove
Irvine
Anaheim
Brea
Garden Grove
Placentia
Fullerton
Yorba Linda
Cypress
Westminster
Huntington Beach
Santa Ana
Villa Park
Irvine
Costa Mesa
San Juan Capistrano
Orange
Fountain Valley
Irvine
Fullerton
Newport Beach
Huntington Beach
Lake Forest
Anaheim
Santa Ana
Santa Ana
Anaheim
Orange
Santa Ana
Buena Park
Santa Ana
Rancho Santa Margarita
Costa Mesa
Huntington Beach
Los Alamitos
Garden Grove
Anaheim
San Clemente
Huntington Beach
Westminster
Huntington Beach
Newport Beach
Fullerton
Anaheim
Santa Ana
Irvine
Anaheim
La Habra
Stanton
Newport Beach
Garden Grove
Brea
San Clemente
Santa Ana
Irvine
Tustin
Anaheim
Aliso Viejo
Anaheim
Anaheim
Costa Mesa
Orange
Laguna Niguel
San Clemente
Fountain Valley
Fullerton
Laguna Beach
Orange
Dana Point
Garden Grove
Laguna Hills
Orange
Westminster
Irvine
Buena Park
Tustin
Irvine
Laguna Niguel
Irvine
Placentia
Garden Grove
Fullerton
Orange
Seal Beach
Mission Viejo
Anaheim
M

In [9]:
print(OC_venues.shape)
OC_venues.head()

(891, 9)


Unnamed: 0,City,Zip,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue ID,Venue Latitude,Venue Longitude,Venue Category
0,Irvine,92709,33.640302,-117.769442,Solaria,5337b91d498e75abcd5eff08,33.641483,-117.7728,Café
1,Irvine,92709,33.640302,-117.769442,Zafferano coffee bar,52c6a940498ea6e7b1710fdc,33.641483,-117.7728,Italian Restaurant
2,Irvine,92709,33.640302,-117.769442,Marie Callendar's,4f324dcf19836c91c7ca7dff,33.643339,-117.772006,Food
3,Fullerton,92837,33.640302,-117.769442,Solaria,5337b91d498e75abcd5eff08,33.641483,-117.7728,Café
4,Fullerton,92837,33.640302,-117.769442,Zafferano coffee bar,52c6a940498ea6e7b1710fdc,33.641483,-117.7728,Italian Restaurant


In [10]:
OC_venues[['Zip', 'Venue']].groupby('Zip').count()

Unnamed: 0_level_0,Venue
Zip,Unnamed: 1_level_1
90620,2
90621,9
90622,3
90623,9
90624,3
90630,8
90631,5
90632,3
90633,3
90680,5


In [11]:
venue_count = OC_venues[['City', 'Venue Category']].groupby('City').count().rename(columns={'Venue Category' : 'Count'}).sort_values(['Count'], ascending=False)
#OC_venues[(OC_venues['Venue Category'] == 'Food') & (OC_venues['Venue'] != 'Marie Callendar\'s')]
venue_count.reset_index(inplace=True)
venue_count['City'] = venue_count['City'].str.upper()
venue_count

Unnamed: 0,City,Count
0,ANAHEIM,120
1,IRVINE,97
2,TUSTIN,74
3,SANTA ANA,67
4,SAN CLEMENTE,65
5,NEWPORT BEACH,64
6,FULLERTON,52
7,GARDEN GROVE,45
8,ORANGE,43
9,SAN JUAN CAPISTRANO,38


#### The GeoJSON file for Orange County is downloaded

In [12]:
!wget --quiet https://opendata.arcgis.com/datasets/60119fce76d74dc08c3aa455f34b2b4d_0.geojson -O orange_county.json

print('GeoJSON file downloaded!')

GeoJSON file downloaded!


A choropleth map is created to show the relative frequency of venues for each city and the locations of the zip codes are also displayed.

In [13]:
OC_geo = r'orange_county.json' # geojson file

# create a plain world map
OC_map = folium.Map(location=[latitude, longitude], zoom_start=10)

OC_map.choropleth(
    geo_data=OC_geo,
    data=venue_count,
    columns=['City', 'Count'],
    key_on = 'feature.properties.CITY', 
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Orange County'
)

# add markers to map
for lat, lng, city, zipCode in zip(df['Latitude'], df['Longitude'], df['City'], df['Zip']):
    label = '{}, {}'.format(city, zipCode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(OC_map)  

# display map
OC_map

A dataframe is used to show the frequency of the different venue categories in Orange County.

In [14]:
venue_types = OC_venues[['Venue Category', 'Venue']].groupby('Venue Category').count().rename(columns={'Venue' : 'Count'}).sort_values(['Count'], ascending=False)
venue_types.reset_index(inplace=True)
venue_types

Unnamed: 0,Venue Category,Count
0,Food,72
1,Café,69
2,Italian Restaurant,67
3,Mexican Restaurant,63
4,Fast Food Restaurant,61
5,Pizza Place,51
6,American Restaurant,48
7,Bakery,35
8,Sandwich Place,32
9,Korean Restaurant,26


## Methodology <a name="methodology"></a>
This project's main focus is to find the best location to start a new restaurant based on information obtained for Foursquare. We will mainly be looking at the ratings and the frequency at which they appear in the different cities.

In our first step, we collected data on location and type of every venue within 2km from each zip code geospatial coordinate and identified all the venues that are restaurants.

The second step is to choose a restaurant type from the list above to analyze. For an example, we will be using Sushi Restaurant. With this, we can create a choropleth map to visualize what locations have higher density of sushi restaurants and where they are located using their geospatial data gathered from the Foursquare API.

Finally, we will be creating a ranking system in order to find the best city to start a new sushi restaurant in. This will be done using the rating data gathered from the Foursquare API. In order to find the best restuarant, we will be looking at two different factors: 
* The number of venues that have an above average rating in the city
* The average rating of all the locations within the city.
With the best city chosen, we can then generate a list of all zip codes within that city.

## Analysis <a name="analysis"></a>
We can now choose which type of venue we want(Sushi Restaurant for our example) to analyze and a dataframe of all the locations and the city they are in will be generated.

In [15]:
# Obtaining venue categories into an array of strings instead of objects
ObjArray = venue_types[['Venue Category']].to_numpy()
venue_array=[]
for i in ObjArray:
    venue_array = np.append(ObjArray, i[0])

    
print("Enter a venue category listed above")    

flag=0
while flag == 0:
    user_input = input()
    if(user_input in venue_array):
        print('The venue you chose is: ' + user_input)
        flag = 1
    else:
        print('Either the venue you chose is not listed or the venue may not have been spelled correctly. Keep in mind this is case sensitive.')
    




selected_venues = OC_venues[(OC_venues['Venue Category'] == user_input)].reset_index(drop=True)
selected_count = selected_venues[['City', 'Venue']].groupby('City').count().rename(columns={'Venue' : 'Count'}).sort_values(['Count'], ascending=False)
selected_count.reset_index(inplace=True)
selected_count['City'] = selected_count['City'].str.upper()
selected_count

Enter a venue category listed above
Sushi Restaurant
The venue you chose is: Sushi Restaurant


Unnamed: 0,City,Count
0,SAN CLEMENTE,7
1,TUSTIN,5
2,FULLERTON,2
3,IRVINE,2
4,COSTA MESA,1
5,FOUNTAIN VALLEY,1
6,HUNTINGTON BEACH,1
7,LAGUNA BEACH,1
8,NEWPORT BEACH,1
9,SAN JUAN CAPISTRANO,1


We can now create a Choropleth map to visualize where the locations are at. The locations of the restaurants are also shown as well.

In [16]:
# create a plain world map
selected_map = folium.Map(location=[latitude, longitude], zoom_start=10)

selected_map.choropleth(
    geo_data=OC_geo,
    data=selected_count,
    columns=['City', 'Count'],
    key_on = 'feature.properties.CITY', 
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Orange County'
)

# add markers to map
for lat, lng, city, zipCode in zip(selected_venues['Venue Latitude'], selected_venues['Venue Longitude'], selected_venues['City'], selected_venues['Zip']):
    label = '{}, {}'.format(city, zipCode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=1.0,
        parse_html=False).add_to(selected_map)  

# display map
selected_map

Now we will be creating an array of the ratings for each of the venues and appending to the dataframe of the locations. If there is no rating, then NaN will be added.

In [17]:
venues_ids= selected_venues['Venue ID']
ratings=[]
for venue_id in venues_ids.values.tolist():
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(
                venue_id, 
                CLIENT_ID, 
                CLIENT_SECRET, 
                VERSION)
    
    result = requests.get(url).json()
    try:
        venues_rating=result['response']['venue']['rating']
        ratings=np.append(ratings, venues_rating)
    except KeyError:
        ratings = np.append(ratings, np.nan)
ratings

array([7.3, nan, nan, 6.7, nan, nan, nan, 6.9, 9. , 6.8, 5.9, nan, nan,
       7.8, 7.1, 7.3, nan, nan, 7.7, 7.9, 5.5, 8. ])

In [18]:
selected_venue_df = selected_venues.assign(Rating = ratings)
selected_venue_df

Unnamed: 0,City,Zip,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue ID,Venue Latitude,Venue Longitude,Venue Category,Rating
0,San Juan Capistrano,92675,33.500843,-117.65866,Oeeshi Japanese Grill,4af62fd6f964a5200a0222e3,33.497791,-117.660919,Sushi Restaurant,7.3
1,Fountain Valley,92708,33.708618,-117.95629,Duo Sushi,4dee9aea7d8be635ea67ed68,33.710296,-117.954844,Sushi Restaurant,
2,Huntington Beach,92649,33.720017,-118.04614,Shogun Sushi,4b842fc4f964a5204a2631e3,33.721658,-118.042222,Sushi Restaurant,
3,Irvine,92623,33.686519,-117.830788,Kura Sushi Bar,4ab19cd4f964a520386a20e3,33.688542,-117.833569,Sushi Restaurant,6.7
4,Irvine,92623,33.686519,-117.830788,Kiku Sushi,57c9dbde498e0875ec2dcd57,33.688826,-117.834683,Sushi Restaurant,
5,San Clemente,92674,33.438428,-117.623131,Nobu Sushi,548de7a4498edc77fcf008b6,33.437515,-117.62458,Sushi Restaurant,
6,San Clemente,92674,33.438428,-117.623131,Nobu Sushi,4c255b28f7ced13acb85246d,33.437275,-117.624895,Sushi Restaurant,
7,Costa Mesa,92627,33.647028,-117.91506,Sushi Wave,4ba04360f964a520ea6437e3,33.650178,-117.911336,Sushi Restaurant,6.9
8,San Clemente,92672,33.427078,-117.61401,9 Style Sushi,4b6b9fe7f964a5201f132ce3,33.426197,-117.612431,Sushi Restaurant,9.0
9,San Clemente,92672,33.427078,-117.61401,Sushi Taka-O,4bc13b28f8219c748e28b310,33.429774,-117.614935,Sushi Restaurant,6.8


To make the data easier to analyze, we can clean up the data by removing all the NaN values and grouping by city. This will allow us to see how many restaurants with valid ratings are in each city

In [19]:
clean_venues = selected_venue_df.dropna()

#Check if clean_venues is empty after getting rid of NaN values
if clean_venues.empty:
    print('There are no venues of this type that have ratings. Thus, there is not enough information to properly address the problem. \n It is recommended to choose a different venue category.')

clean_count = clean_venues[['City', 'Rating']].groupby('City').count().rename(columns={'Rating' : 'Count'}).sort_values(['Count'], ascending=False)
clean_count.reset_index(inplace=True)
clean_count['City'] = clean_count['City'].str.upper()

clean_count

Unnamed: 0,City,Count
0,SAN CLEMENTE,3
1,TUSTIN,3
2,FULLERTON,2
3,COSTA MESA,1
4,IRVINE,1
5,LAGUNA BEACH,1
6,NEWPORT BEACH,1
7,SAN JUAN CAPISTRANO,1


Here we can create a choropleth map of the remaining locations to visualize which areas have more of this type of venue.

In [20]:
# create a plain world map
selected_map = folium.Map(location=[latitude, longitude], zoom_start=10)

selected_map.choropleth(
    geo_data=OC_geo,
    data=clean_count,
    columns=['City', 'Count'],
    key_on = 'feature.properties.CITY', 
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Orange County'
)

# add markers to map
for lat, lng, city, zipCode in zip(clean_venues['Venue Latitude'], clean_venues['Venue Longitude'], clean_venues['City'], clean_venues['Zip']):
    label = '{}, {}'.format(city, zipCode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=1.0,
        parse_html=False).add_to(selected_map)  

# display map
selected_map

We can calculate the average rating over all the locations to have a reference point on how each individual restaurant is performing compared to all.

In [21]:
avg_Selected_Rating= round(selected_venue_df['Rating'].mean(), 2)
avg_Selected_Rating

7.22

### Recommendation System
This is where we create the ranking system to find the best location for a new restaurant based on the performance of known restaurants. The ranking system takes into account two factors: 
* The number of venues that have an above average rating in the city
* The average rating of all the locations within the city.

These values are normalized using min-max normalization and the two numbers are averaged. The higher the number, the better it is to be recommended. The values of the final score will range from 0 to 1 where 0 is not likely to recommend and 1 is highly likely to recommend.


In [22]:
#Averge rating per city
rating_avg = selected_venue_df[['City', 'Rating']].groupby('City').mean().rename(columns={'Rating' : 'Rating Average'})

#Number of rated venues per city
rating_count = selected_venue_df[['City', 'Rating']].groupby('City').count().rename(columns={'Rating' : 'Number of venues with ratings'})

#Merging both dataframes into one and dropping NaN rows
rating_df = pd.merge(rating_avg, rating_count, on=['City']).dropna()

#Number of venues greater than overall average rating
abv_avg_rating = selected_venue_df[['City', 'Rating']][(selected_venue_df['Rating'] > avg_Selected_Rating)].groupby('City').count().rename(columns={'Rating' : 'Number of venues with ratings above average'})

#Merging dataframe with raitng_df
rating_df = pd.merge(rating_df, abv_avg_rating, on=['City'], how='left').fillna(0)

#Calculating Normalization of venues are above average and appending to dataframe
rating_df['Normalized venues above average'] = ((rating_df['Number of venues with ratings above average'] - rating_df['Number of venues with ratings above average'].min()) / (rating_df['Number of venues with ratings above average'].max() - rating_df['Number of venues with ratings above average'].min()))

#Calculating Normalization of Rating Average and appending to dataframe
rating_df['Normalized rating average'] = ((rating_df['Rating Average'] - rating_df['Rating Average'].min()) / (rating_df['Rating Average'].max() - rating_df['Rating Average'].min()))

rating_df

Unnamed: 0_level_0,Rating Average,Number of venues with ratings,Number of venues with ratings above average,Normalized venues above average,Normalized rating average
City,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Costa Mesa,6.9,1,0.0,0.0,0.153846
Fullerton,6.7,2,1.0,0.5,0.0
Irvine,6.7,1,0.0,0.0,0.0
Laguna Beach,8.0,1,1.0,0.5,1.0
Newport Beach,7.7,1,1.0,0.5,0.769231
San Clemente,7.233333,3,1.0,0.5,0.410256
San Juan Capistrano,7.3,1,1.0,0.5,0.461538
Tustin,7.4,3,2.0,1.0,0.538462


In [23]:
final_rating_score = ( (rating_df['Normalized venues above average'] + rating_df['Normalized rating average']) / 2).to_frame()
final_rating_score.columns = ['Final Score']
final_rating_score = final_rating_score.sort_values(['Final Score'], ascending=False)
final_rating_score

Unnamed: 0_level_0,Final Score
City,Unnamed: 1_level_1
Tustin,0.769231
Laguna Beach,0.75
Newport Beach,0.634615
San Juan Capistrano,0.480769
San Clemente,0.455128
Fullerton,0.25
Costa Mesa,0.076923
Irvine,0.0


## Results and Discussion <a name="results"></a>
From our recommendation system, we found that the best location for a new sushi restaurant is Tustin with a final score of about 0.78. We also can see that Laguna Beach is close behind with 0.75 and Newport Beach is 0.63. With this information we can conclude that the sushi restaurants in Tustin perform the highest, but we can also see that there is a higher frequency of sushi restaurants that are in coastal cities versus inland cities. This can mean that in addition to Tustin as a recommended location, we could also recommend that coastal cities are a good place for sushi restaurants. This makes sense as the fish will be more fresh and higher quality if they are closer to the ocean. Finally, we can create a list of all the zip codes in Tustin along with their latitude and longitudes, which is shown in the following cell.

However, our recommendation system only takes into account a few factors that should be accounted for. Therefore, this should only be taken as a staring point from which to start a new restaurant from where more consideration should be taken before settling on a final location.

In [24]:
try:
    best_location = final_rating_score.index[0]
    print(df[df['City'] == best_location].head())
except IndexError:
    print('There is not enough data to determine the best location.')

       Zip    City   Latitude   Longitude
62   92781  Tustin  33.640302 -117.769442
82   92780  Tustin  33.741651 -117.821270
126  92782  Tustin  33.739571 -117.786180


## Conclusion <a name="conclusion"></a>
The purpose of this project was to recommend a city location in Orange County for a specific type of restaurant as well as zip codes within the city limits. As an example, the restaurant type chosen was sushi restaurants. We first gathered restaurants of all types using the Foursquare API and then filtered out by our choice(Sushi Restaurant). To gather rating information on each location, we again used the Foursquare API and then proceeded to filter out restaurants with NaN values for their rating as they were not useful to our analysis. We then were able to create a ranking system based off the rating data and find the best location. For our example(Sushi Restaurant), the best location we found was Tustin. With this information, the zip codes in Tustin were given as potential new locations to start a new sushi restaurant.

The recommendation system is very basic and can be improved to be more sucessful by adding more relevant factors into account. Some factors that could provide to be useful are demographics of the region, locations of vacant buildings and their pricing, and using different performance metrics for restaurants such as total profit. In the end, this should only be used as a starting point as to which a new restaurant location will be chosen.