# Battle of Neighborhoods Coding Notebook
## OPENING A NEW RESTAURANT IN SAN FRANCISCO

### [ INTRODUCTION ]
### Our company wants to open a new restaurant in San Francisco. The goal is a premium location within the city that would bring good revenue for the new business. The property will be rented rather than bought.

In [21]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, available in pandas 1.0 and later)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')


Libraries imported.


### Importing the rent dataset from my GitHub account into the notebook.

In [111]:
git = 'https://raw.githubusercontent.com/Auxilin/Battle-of-Neighborhoods-for-new-Restaurant-location/master/SFRent_Dataset.csv'
sfran_Data = pd.read_csv(git)
sfran_Data

Unnamed: 0,Neighborhood,City,State,2018-07
0,Hayes Valley,San Francisco,CA,3030
1,Van Ness - Civic Center,San Francisco,CA,2977
2,Tenderloin,San Francisco,CA,2977
3,Downtown,San Francisco,CA,3040
4,Western Addition,San Francisco,CA,2989
5,Marina,San Francisco,CA,2948
6,Russian Hill,San Francisco,CA,2954
7,Lower Pacific Heights,San Francisco,CA,2963
8,Nob Hill,San Francisco,CA,3050
9,Pacific Heights,San Francisco,CA,2973


### For further analysis we need the co-ordinates of each neighborhood.

In [112]:
# Create the geocoder once and collect one coordinate pair per neighborhood.
geolocator = Nominatim(user_agent="my-application")
rows = []

# Loop over each neighborhood and look up its coordinates through geocoding.
for row, neighborhood in sfran_Data.iterrows():
    address = neighborhood['Neighborhood'] + ',' + neighborhood['City'] + ',' + neighborhood['State']
    try:
        location = geolocator.geocode(address)
        latitude = location.latitude
        longitude = location.longitude
    except Exception:
        # Report the address that could not be geocoded and keep a placeholder row,
        # so the coordinates stay aligned with sfran_Data when they are joined by index.
        print(address)
        latitude, longitude = np.nan, np.nan
    rows.append({'Latitude': latitude, 'Longitude': longitude})

# DataFrame.append was removed in pandas 2.0, so build the dataframe in one step.
coordinates = pd.DataFrame(rows, columns=['Latitude', 'Longitude'], index=sfran_Data.index)

coordinates

Merced Heights,San Francisco,CA


Unnamed: 0,Latitude,Longitude
0,37.776685,-122.422936
1,37.77519,-122.419266
2,37.784249,-122.413993
3,37.787514,-122.407159
4,37.779559,-122.42981
5,37.799793,-122.435205
6,37.797707,-122.414971
7,37.785767,-122.438904
8,37.793262,-122.415249
9,37.792717,-122.435644


### Reduce the sfran_Data dataframe to the Neighborhood and rent columns only, then combine it with the co-ordinates.

In [113]:
# Keep only the Neighborhood and rent columns for further analysis.
sfran_Data= sfran_Data[['Neighborhood','2018-07']]

# Combine sfran_Data and coordinates into one new dataframe.
sf_Neighborhood = sfran_Data.join(coordinates, how='outer')
sf_Neighborhood

Unnamed: 0,Neighborhood,2018-07,Latitude,Longitude
0,Hayes Valley,3030,37.776685,-122.422936
1,Van Ness - Civic Center,2977,37.77519,-122.419266
2,Tenderloin,2977,37.784249,-122.413993
3,Downtown,3040,37.787514,-122.407159
4,Western Addition,2989,37.779559,-122.42981
5,Marina,2948,37.799793,-122.435205
6,Russian Hill,2954,37.797707,-122.414971
7,Lower Pacific Heights,2963,37.785767,-122.438904
8,Nob Hill,3050,37.793262,-122.415249
9,Pacific Heights,2973,37.792717,-122.435644


### Now let's calculate rentScore and ratingScore for our formula, i.e. finalScore = 0.6 × rentScore + 0.4 × ratingScore
### First, the rentScore formula is (maxrentofN - currentrentofN) / (maxrentofN - minrentofN), so the cheapest neighborhood scores 1 and the most expensive scores 0. The maximum and minimum values can be retrieved with the built-in max and min functions.

In [114]:
# Max value in '2018-07'
maxrentofN = max(sf_Neighborhood['2018-07'])
print('Max rent value of SF Neighborhood is $',maxrentofN)
minrentofN = min(sf_Neighborhood['2018-07'])
print('Min rent value SF Neighborhood is $',minrentofN)

Max rent value of SF Neighborhood is $ 3711
Min rent value SF Neighborhood is $ 2640
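
As a quick check of the rent formula, the sketch below applies it to the maximum and minimum printed above and to the Hayes Valley rent of 3030. The function name `rent_score` and the variable names are assumptions introduced for illustration; they are not part of the original notebook.

```python
def rent_score(current_rent, min_rent, max_rent):
    """Inverted min-max scaling: the cheapest rent scores 1.0, the most expensive 0.0."""
    if max_rent == min_rent:
        raise ValueError("max_rent and min_rent must differ")
    return (max_rent - current_rent) / (max_rent - min_rent)


# Values taken from the printed output above.
max_rent_of_n = 3711
min_rent_of_n = 2640

hayes_valley = rent_score(3030, min_rent_of_n, max_rent_of_n)
print(round(hayes_valley, 6))  # 0.635854
```

The result matches the RentScore reported for Hayes Valley, which suggests the formula is being read correctly.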


In [115]:
# Create a new dataframe to store the rent score.
# DataFrame.append was removed in pandas 2.0, so the column is computed in one vectorized step.
rent_Score = pd.DataFrame({'RentScore': (maxrentofN - sf_Neighborhood['2018-07']) / (maxrentofN - minrentofN)})

rent_Score

Unnamed: 0,RentScore
0,0.635854
1,0.685341
2,0.685341
3,0.626517
4,0.674136
5,0.712418
6,0.706816
7,0.698413
8,0.61718
9,0.689076


In [116]:
#adding rent score information in sf_Neighborhood dataframe. 
sf_Neighborhood = sf_Neighborhood.join(rent_Score, how='outer')
sf_Neighborhood

Unnamed: 0,Neighborhood,2018-07,Latitude,Longitude,RentScore
0,Hayes Valley,3030,37.776685,-122.422936,0.635854
1,Van Ness - Civic Center,2977,37.77519,-122.419266,0.685341
2,Tenderloin,2977,37.784249,-122.413993,0.685341
3,Downtown,3040,37.787514,-122.407159,0.626517
4,Western Addition,2989,37.779559,-122.42981,0.674136
5,Marina,2948,37.799793,-122.435205,0.712418
6,Russian Hill,2954,37.797707,-122.414971,0.706816
7,Lower Pacific Heights,2963,37.785767,-122.438904,0.698413
8,Nob Hill,3050,37.793262,-122.415249,0.61718
9,Pacific Heights,2973,37.792717,-122.435644,0.689076


#### Let's review the San Francisco neighborhood dataframe with the rent scores added

In [117]:
sf_Neighborhood

Unnamed: 0,Neighborhood,2018-07,Latitude,Longitude,RentScore
0,Hayes Valley,3030,37.776685,-122.422936,0.635854
1,Van Ness - Civic Center,2977,37.77519,-122.419266,0.685341
2,Tenderloin,2977,37.784249,-122.413993,0.685341
3,Downtown,3040,37.787514,-122.407159,0.626517
4,Western Addition,2989,37.779559,-122.42981,0.674136
5,Marina,2948,37.799793,-122.435205,0.712418
6,Russian Hill,2954,37.797707,-122.414971,0.706816
7,Lower Pacific Heights,2963,37.785767,-122.438904,0.698413
8,Nob Hill,3050,37.793262,-122.415249,0.61718
9,Pacific Heights,2973,37.792717,-122.435644,0.689076


### Now let's calculate the second part of the formula, i.e. ratingScore. In our project we need only relevant restaurant data, so let's use categoryId in the search endpoint URL.

### Define Foursquare Credentials and Version

In [30]:
CLIENT_ID = 'BABRZ03NUDLFKCNORYZDD0GJDF5TDUDLV4HD3DYOHI0V4TOK' # your Foursquare ID
CLIENT_SECRET = 'FQJFZP3JHSMXGOVYTUAYBKFG404KPUWNDGKYDVAESHKAIOZL' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: BABRZ03NUDLFKCNORYZDD0GJDF5TDUDLV4HD3DYOHI0V4TOK
CLIENT_SECRET:FQJFZP3JHSMXGOVYTUAYBKFG404KPUWNDGKYDVAESHKAIOZL


In [130]:
# Indian restaurant categoryId of Foursquare
categoryId= '4bf58dd8d48988d10f941735'
# the url will search within a 500 meter radius of the latitude and longitude.
radius=500
api_endpoint = 'https://api.foursquare.com/'
# the url below is used instead, to cache the requests made to the Foursquare API (it overrides the endpoint above)
api_endpoint = 'http://cladiusfernando-eval-test.apigee.net/foursquare/'

#dataframe to save venue information.
venue_Details = pd.DataFrame(columns=['VenueNeighborhoodName','VenueName','VenueRating'])
#dataframe to save count of good rating information.
goodRating =pd.DataFrame(columns=['Neighborhood','GoodRatingRestaurant'])
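
The request code itself is not shown in this notebook, so the following is only a sketch of how the settings above could be combined into a category-restricted search URL. The helper name `build_search_url`, the `v2/venues/search` path and the placeholder credentials are assumptions, not the author's code; the query parameter names follow the legacy Foursquare v2 API.

```python
from urllib.parse import urlencode


def build_search_url(api_endpoint, client_id, client_secret, version,
                     lat, lng, category_id, radius, limit):
    """Build a venue search URL restricted to one category around a point."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "categoryId": category_id,
        "radius": radius,
        "limit": limit,
    }
    # The 'v2/venues/search' path is an assumption about the endpoint used.
    return f"{api_endpoint}v2/venues/search?{urlencode(params)}"


url = build_search_url(
    "https://api.foursquare.com/",
    "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180604",  # placeholder credentials
    37.776685, -122.422936,                              # Hayes Valley, from the table above
    "4bf58dd8d48988d10f941735", 500, 30,
)
print(url)
```

Passing the endpoint as an argument makes it easy to switch between the direct API host and a caching proxy without touching the rest of the code.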


In [63]:
!wget -q -O 'Good_Ratings.csv' https://raw.githubusercontent.com/Sizo-Dlodlo/Coursera_Capstone/master/Good_Ratings.csv
print('Data downloaded!')

Data downloaded!


In [118]:
df = pd.read_csv("https://raw.githubusercontent.com/Sizo-Dlodlo/Coursera_Capstone/master/Good_Ratings.csv")
df

Unnamed: 0,Neighborhood,2018-07,Latitude,Longitude,RentScore,GoodRatingRestaurant,RatingScore,FinalScore
0,Hayes Valley,3030,37.776685,-122.422936,0.635854,2,0.75,0.681513
1,Van Ness - Civic Center,2977,37.77519,-122.419266,0.685341,3,0.625,0.661204
2,Tenderloin,2977,37.784249,-122.413993,0.685341,8,0.0,0.411204
3,Downtown,3040,37.787514,-122.407159,0.626517,6,0.25,0.47591
4,Western Addition,2989,37.779559,-122.42981,0.674136,0,1.0,0.804482
5,Marina,2948,37.799793,-122.435205,0.712418,0,1.0,0.82745
6,Russian Hill,2954,37.797707,-122.414971,0.706816,1,0.875,0.77409
7,Lower Pacific Heights,2963,37.785767,-122.438904,0.698413,2,0.75,0.719048
8,Nob Hill,3050,37.793262,-122.415249,0.61718,1,0.875,0.720308
9,Pacific Heights,2973,37.792717,-122.435644,0.689076,0,1.0,0.813445


In [102]:
df.size

288

### Below are the good-rating restaurant counts and the resulting scores for each neighborhood

In [103]:
df

Unnamed: 0,Neighborhood,2018-07,Latitude,Longitude,RentScore,GoodRatingRestaurant,RatingScore,FinalScore
0,Hayes Valley,3030,37.776685,-122.422936,0.635854,2,0.75,0.681513
1,Van Ness - Civic Center,2977,37.77519,-122.419266,0.685341,3,0.625,0.661204
2,Tenderloin,2977,37.784249,-122.413993,0.685341,8,0.0,0.411204
3,Downtown,3040,37.787514,-122.407159,0.626517,6,0.25,0.47591
4,Western Addition,2989,37.779559,-122.42981,0.674136,0,1.0,0.804482
5,Marina,2948,37.799793,-122.435205,0.712418,0,1.0,0.82745
6,Russian Hill,2954,37.797707,-122.414971,0.706816,1,0.875,0.77409
7,Lower Pacific Heights,2963,37.785767,-122.438904,0.698413,2,0.75,0.719048
8,Nob Hill,3050,37.793262,-122.415249,0.61718,1,0.875,0.720308
9,Pacific Heights,2973,37.792717,-122.435644,0.689076,0,1.0,0.813445


### Second, the RatingScore formula is (maxgoodrest - currentrestofN) / (maxgoodrest - mingoodrest), where currentrestofN is the GoodRatingRestaurant count of the neighborhood

In [129]:
maxgoodrest = max(df['GoodRatingRestaurant'])
print('Maximum good restaurant count',maxgoodrest)

Maximum good restaurant count 8
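
To make the two remaining formulas concrete, here is a small worked sketch for Hayes Valley. The function names `rating_score` and `final_score` are hypothetical, and the minimum count of 0 is an assumption read off the table above (several neighborhoods show a count of 0); the maximum of 8 is the value just printed, and the 0.6 and 0.4 weights come from the finalScore formula stated earlier.

```python
def rating_score(good_count, min_good, max_good):
    """Fewer well-rated competitors nearby gives a higher score."""
    return (max_good - good_count) / (max_good - min_good)


def final_score(rent, rating, rent_weight=0.6, rating_weight=0.4):
    """Weighted sum of the rent score and the rating score."""
    return rent_weight * rent + rating_weight * rating


# Hayes Valley: rent 3030 and 2 well-rated restaurants (from the tables above).
hayes_rent = (3711 - 3030) / (3711 - 2640)
hayes_rating = rating_score(2, min_good=0, max_good=8)  # min_good=0 is an assumption

print(hayes_rating)                                 # 0.75
print(round(final_score(hayes_rent, hayes_rating), 6))  # 0.681513
```

Both values agree with the RatingScore and FinalScore columns shown for Hayes Valley.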


### Now we have the rent score and the rating score, so let's calculate the final score to decide which neighborhoods are suitable for the new restaurant.

### Below is the list of suitable neighborhoods (FinalScore of at least 0.8) with affordable rent in San Francisco for opening a new restaurant.

In [125]:
Results = df[(df['FinalScore'] >= 0.8)].sort_values('FinalScore', ascending=False)
Results

Unnamed: 0,Neighborhood,2018-07,Latitude,Longitude,RentScore,GoodRatingRestaurant,RatingScore,FinalScore
11,Stonestown,2640,37.727446,-122.474895,1.0,0,1.0,1.0
12,Merced Heights,2640,37.717507,-122.470281,1.0,0,1.0,1.0
13,Lakeside,2640,37.731967,-122.474257,1.0,0,1.0,1.0
10,Noe Valley,2818,37.751591,-122.432081,0.8338,1,0.875,0.85028
5,Marina,2948,37.799793,-122.435205,0.712418,0,1.0,0.82745
19,Glen Park,2967,37.733108,-122.433784,0.694678,0,1.0,0.816807
9,Pacific Heights,2973,37.792717,-122.435644,0.689076,0,1.0,0.813445
4,Western Addition,2989,37.779559,-122.42981,0.674136,0,1.0,0.804482


### Folium map of the best location for the new restaurant

In [134]:
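# look up Stonestown, one of the top-scoring neighborhoods in Results, so the map is centred on it
# (latitude and longitude would otherwise still hold the values from the last geocoding call)
stonestown = df[df['Neighborhood'] == 'Stonestown'].iloc[0]
latitude, longitude = stonestown['Latitude'], stonestown['Longitude']

venues_map = folium.Map(location=[latitude, longitude], zoom_start=15) # generate map centred around Stonestown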


# add Stonestown as a red circle mark
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    popup='Stonestown',
    fill=True,
    color='red',
    fill_color='red',
    fill_opacity=0.6
    ).add_to(venues_map)


# add other popular spots to the map as blue circle markers
for lat, lng, label in zip(df.Latitude, df.Longitude, df.Neighborhood):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        fill=True,
        color='blue',
        fill_color='blue',
        fill_opacity=0.6
        ).add_to(venues_map)

# display map
venues_map

In [137]:
from IPython.display import Image # Image is not imported earlier in this notebook
Image(url= "https://drive.google.com/file/d/1BdnNPG9U6JoBIHIPWXNiSWS4zx7rzfv_/view?usp=sharing")

### The ideal place to rent for the new restaurant is the area of Mission Rock and UCSF Mission Bay, San Francisco. Across the neighborhoods analyzed, rents range from a minimum of 2,640 USD to a maximum of 3,711 USD.

#### The best features of this location are as follows: it is near a large parking lot (Oracle Parking), public transport, and hangout spots such as parks and the Boom & Bust Course.