# Capstone project

This notebook will be mainly used for the capstone project of IBM data science capstone project.

In [36]:
import pandas as pd
import numpy as np

## Hello Capstone Project Course!

In this project we are going to talk about opening a new mexican food restaurant in the city of San Diego, CA. Why is this something useful? well, San Diego is a city full of visitors from Mexico and has a high number of mexicans currently living or working there. Because of this, there a variety of mexican restaurants distributed along the city. So our work here is to use the foursquare data and machine learning algorithms to find out where would be the best places to put a new mexican restaurant. For the data, we will be using only the best neighborhood of San Diego according to the page: https://www.zumper.com/blog/2018/05/7-best-san-diego-neighborhoods/. So let's get started.

After getting the name of the best neighborhood, I got into the task of getting the latitude and longitude of them so I can link them with foursquare

In [37]:
d={'Neighborhood':["Hillcrest", "Little Italy", "North Park", "Gaslamp Quarter",\
                   "Ocean Beach", "La Jolla", "Normal Heigths"], 'Latitude': [32.749997, \
                   32.721163782, 32.7408842, 32.7075795, 32.741947, 32.842674, 32.7580119679], 'Longitude':\
                   [-117.166666, -117.166999332, -117.1305877, -117.1601285, -117.239571, -117.257767 , -117.117999528]}


In [38]:
df=pd.DataFrame(data=d)
df.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Hillcrest,32.749997,-117.166666
1,Little Italy,32.721164,-117.166999
2,North Park,32.740884,-117.130588
3,Gaslamp Quarter,32.70758,-117.160128
4,Ocean Beach,32.741947,-117.239571


So now let's import the rest of the libraries that we will be using for the rest of the project.

In [39]:
import requests # library to handle requests

from sklearn.cluster import KMeans # import k-means from clustering stage

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

import folium # map rendering library

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

Let's load our credentials of foursquare.

In [40]:
CLIENT_ID = 'GTNIX443DWLBJ3GJADTUMKBVAELTUXTK4S2OG4H1BFR0DXGI' 
CLIENT_SECRET = 'ZH1EY4UNZL3VMUFCRG0O20M3OBYEYRNED0QC1EH340CJBG3B' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: GTNIX443DWLBJ3GJADTUMKBVAELTUXTK4S2OG4H1BFR0DXGI
CLIENT_SECRET:ZH1EY4UNZL3VMUFCRG0O20M3OBYEYRNED0QC1EH340CJBG3B


And now let's get the geographical data of Sand Diego so we can create a folium map.

In [41]:
address = 'San Diego, CA'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of San Diego are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of San Diego are 32.7174209, -117.1627714.


And now, we create a folium with all the top neighborhoods that we selected at the start marked.

In [42]:
map_sd = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_sd)  
    
map_sd

Now we define a function that extract the venues' name, latitude and longitude as well as category.

In [43]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

And now we call the previous function with the data from our dataframe.

In [49]:
sd_venues_best_neighborhood = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude'])

Hillcrest
Little Italy
North Park
Gaslamp Quarter
Ocean Beach
La Jolla
Normal Heigths


Now let's take a look at the shape of our dataframe and the information we got from our neighborhoods,

In [50]:
print(sd_venues_best_neighborhood.shape)
sd_venues_best_neighborhood.head()

(143, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Hillcrest,32.749997,-117.166666,Toma Sol Tavern,32.749877,-117.166341,Sports Bar
1,Hillcrest,32.749997,-117.166666,Lazy Acres Natural Market,32.75021,-117.167797,Organic Grocery
2,Hillcrest,32.749997,-117.166666,Vons,32.74938,-117.168194,Grocery Store
3,Hillcrest,32.749997,-117.166666,RK Sushi,32.749992,-117.16707,Sushi Restaurant
4,Hillcrest,32.749997,-117.166666,Sushi Deli 1,32.74995,-117.165757,Sushi Restaurant


Now let's see how mane unique categories exist in our dataframe.

In [47]:
print('There are {} uniques categories.'.format(len(sd_venues_best_neighborhood['Venue Category'].unique())))

There are 72 uniques categories.


This will be all the information needed to continue with our capstone project. See you in the next lab!