<h1>Capstone Project Notebook</h1>

The Idea behind the topic of my Capstone Project came up from my real life situation. My wife wants to open a bakery and now we are spending pretty much time seeking the good location to open. It turns out that this is a challenging task. We are living in a big city, I work in an office and trying to learn the Data Science in my spare time, my wife is in parental leave, we have a wonderful 2 year old daughter, so in between of all these activities we are searching for time to sleep and for the best spot in the nearest (not always) neighborhoods.

## Indroduction

### Best location for the Bakery

Well, let's get back to the problem - for now in a nutshell the problem is to find the best spot for the bakery.
One of the requirements is to use Foursquare API.

Since actual solution of this problem will take a lot of time and data (not always public free), I have to simplify the task. For this capstone project to apply the skills I trained during this course I decided to try to find the best neighborhood in NY to open a bakery using the Foursquare API . 

I would like to proceed as follows: I'll start to explore the neighborhoods of New York city and try to find the ones with least bakeries. Let's assume that this will help me to find the neighborhoods with lowest competition. Then I'll try to apply some conditions (such as quantity of parks, schools, tourist areas, etc.) to find the best neighborhood for new bakery.

This little research may be useful for those want to open a small coffee shop, bakery, store, etc. I suppose this can be used for any kind of small business.

## Example

Let's import all the libraries we need:

In [1]:
import numpy as np
import pandas as pd #
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json
!conda install -c conda-forge geopy --yes

from geopy.geocoders import Nominatim
import requests
from pandas.io.json import json_normalize

import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes 
import folium

print('Libraries imported.')

Solving environment: done


  current version: 4.5.11
  latest version: 4.7.11

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geopy-1.20.0               |             py_0          57 KB  conda-forge
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          90 KB

The following NEW packages will be INSTALLED:

    geographiclib: 1.49-py_0   conda-forge
    geopy:         1.20.0-py_0 conda-forge


Downloading and Extracting Packages
geopy-1.20.0         | 57 KB     | ##################################### | 100% 
geographiclib-1.49   | 32 KB     | ##

I'm going to use the json data of New York from this course

In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [3]:
#open the downloaded dataset
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [5]:
# all the necessary info is in 'features' key
neighborhoods_data = newyork_data['features']

In [6]:
#create the data frame with Boroughs of NY
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [7]:
#check the DF we have so far
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


Now let's loop through the data and fill the DF:

In [8]:
for data in neighborhoods_data:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

In [9]:
#let's check what we got
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


Now lets get the coordinates of NY to visualize the map

In [10]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of New York City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of New York City are 40.7127281, -74.0060152.


In [11]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

Now it's time to use the Foursquare API to explore the neighborhoods

In [13]:
CLIENT_ID = 'CURM53KZRBFPGSX4A0PMZTJPST5OTTF1LRH3C4KURRVPCCBT' # your Foursquare ID
CLIENT_SECRET = '0KTI2VDAKFMBDWP450JHZYN3ATMCOTJVJIIP4QCNS0VSKHKK' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

Let's find the bakeries in one of the neighborhood in our dataframe

In [16]:
neighborhoods.loc[86, 'Neighborhood']

'Downtown'

In [15]:
neighborhood_latitude = neighborhoods.loc[86, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = neighborhoods.loc[86, 'Longitude'] # neighborhood longitude value

neighborhood_name = neighborhoods.loc[86, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Downtown are 40.69084402109802, -73.98346337431099.


In [18]:
#query the bakeries in the chosen neighborhood
search_query = 'Bakery'
radius = 500
LIMIT = 100

In [19]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, neighborhood_latitude, neighborhood_longitude, VERSION, search_query, radius, LIMIT)

In [20]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d518382342adf0038d1ff0c'},
 'response': {'venues': [{'id': '4a14b462f964a5206a781fe3',
    'name': 'Betty Bakery',
    'location': {'address': '448 Atlantic Ave',
     'crossStreet': 'btwn Bond & Nevins',
     'lat': 40.6863845552732,
     'lng': -73.98335021420114,
     'labeledLatLngs': [{'label': 'display',
       'lat': 40.6863845552732,
       'lng': -73.98335021420114}],
     'distance': 496,
     'postalCode': '11217',
     'cc': 'US',
     'city': 'Brooklyn',
     'state': 'NY',
     'country': 'United States',
     'formattedAddress': ['448 Atlantic Ave (btwn Bond & Nevins)',
      'Brooklyn, NY 11217',
      'United States']},
    'categories': [{'id': '4bf58dd8d48988d16a941735',
      'name': 'Bakery',
      'pluralName': 'Bakeries',
      'shortName': 'Bakery',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/bakery_',
       'suffix': '.png'},
      'primary': True}],
    'delivery': {'id': '486627',
     'url': 'h

In [21]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# tranform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.crossStreet,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,delivery.id,delivery.url,delivery.provider.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.icon.name,location.neighborhood,venuePage.id
0,4a14b462f964a5206a781fe3,Betty Bakery,"[{'id': '4bf58dd8d48988d16a941735', 'name': 'B...",v-1565623170,False,448 Atlantic Ave,btwn Bond & Nevins,40.686385,-73.98335,"[{'label': 'display', 'lat': 40.6863845552732,...",496,11217,US,Brooklyn,NY,United States,"[448 Atlantic Ave (btwn Bond & Nevins), Brookl...",486627.0,https://www.seamless.com/menu/betty-bakery-448...,seamless,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_seamless_20180129.png,,
1,3fd66200f964a5207cf11ee3,Junior's Restaurant,"[{'id': '4bf58dd8d48988d147941735', 'name': 'D...",v-1565623170,False,386 Flatbush Avenue Ext,at DeKalb Ave,40.690011,-73.981734,"[{'label': 'display', 'lat': 40.69001145156075...",172,11201,US,Brooklyn,NY,United States,"[386 Flatbush Avenue Ext (at DeKalb Ave), Broo...",291315.0,https://www.seamless.com/menu/juniors-restaura...,seamless,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_seamless_20180129.png,Downtown Brooklyn,66852513.0
2,4c176f06f256a5939e84ec3e,Broadway Bakery,"[{'id': '4bf58dd8d48988d1e2931735', 'name': 'A...",v-1565623170,False,379 Bridge St,,40.691667,-73.985063,"[{'label': 'display', 'lat': 40.691667, 'lng':...",163,11201,US,Brooklyn,NY,United States,"[379 Bridge St, Brooklyn, NY 11201, United Sta...",,,,,,,,
3,4f32270719836c91c7bb53c0,Flaky Crust Caribbean Bakery and Restaurant,"[{'id': '4bf58dd8d48988d16a941735', 'name': 'B...",v-1565623170,False,255 Livingston St,,40.688449,-73.982708,"[{'label': 'display', 'lat': 40.688449, 'lng':...",274,11217,US,Brooklyn,NY,United States,"[255 Livingston St, Brooklyn, NY 11217, United...",,,,,,,,
4,4b5a15c5f964a5203cac28e3,Golden Krust Caribbean Restaurant,"[{'id': '4bf58dd8d48988d144941735', 'name': 'C...",v-1565623170,False,139 Lawrence Street,,40.691881,-73.986214,"[{'label': 'display', 'lat': 40.69188104944479...",259,11201,US,Brooklyn,NY,United States,"[139 Lawrence Street, Brooklyn, NY 11201, Unit...",,,,,,,,


In [22]:
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,id
0,Betty Bakery,Bakery,448 Atlantic Ave,btwn Bond & Nevins,40.686385,-73.98335,"[{'label': 'display', 'lat': 40.6863845552732,...",496,11217,US,Brooklyn,NY,United States,"[448 Atlantic Ave (btwn Bond & Nevins), Brookl...",,4a14b462f964a5206a781fe3
1,Junior's Restaurant,Diner,386 Flatbush Avenue Ext,at DeKalb Ave,40.690011,-73.981734,"[{'label': 'display', 'lat': 40.69001145156075...",172,11201,US,Brooklyn,NY,United States,"[386 Flatbush Avenue Ext (at DeKalb Ave), Broo...",Downtown Brooklyn,3fd66200f964a5207cf11ee3
2,Broadway Bakery,Art Gallery,379 Bridge St,,40.691667,-73.985063,"[{'label': 'display', 'lat': 40.691667, 'lng':...",163,11201,US,Brooklyn,NY,United States,"[379 Bridge St, Brooklyn, NY 11201, United Sta...",,4c176f06f256a5939e84ec3e
3,Flaky Crust Caribbean Bakery and Restaurant,Bakery,255 Livingston St,,40.688449,-73.982708,"[{'label': 'display', 'lat': 40.688449, 'lng':...",274,11217,US,Brooklyn,NY,United States,"[255 Livingston St, Brooklyn, NY 11217, United...",,4f32270719836c91c7bb53c0
4,Golden Krust Caribbean Restaurant,Caribbean Restaurant,139 Lawrence Street,,40.691881,-73.986214,"[{'label': 'display', 'lat': 40.69188104944479...",259,11201,US,Brooklyn,NY,United States,"[139 Lawrence Street, Brooklyn, NY 11201, Unit...",,4b5a15c5f964a5203cac28e3
5,Patty Plus,Caribbean Restaurant,324 Livingston St,,40.68806,-73.981711,"[{'label': 'display', 'lat': 40.68805995558156...",343,11217,US,Brooklyn,NY,United States,"[324 Livingston St, Brooklyn, NY 11217, United...",,4bae7519f964a520d1b43be3


In [23]:
#visualize the bakeries we found
venues_map = folium.Map(location=[neighborhood_latitude, neighborhood_longitude], zoom_start=13) # generate map centred around the choosen neighborhood 

# add a red circle marker to represent the neighborhood
folium.features.CircleMarker(
    [neighborhood_latitude, neighborhood_longitude],
    radius=10,
    color='red',
    popup=neighborhood_name,
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the bakeries as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

So we got 6 bakeries in Downtown, Brooklyn 

In [25]:
dataframe_filtered.shape

(6, 16)

This is an example of what I'm going to do. I'll check all the neighborhoods, find the quantity of the bakeries, apply some additional filters (quantity of kinder gardens, schools, parks) and suppose that the neighborhood with lowest quantity of bakeries and highest quantities of such publics places may be best suited for the bakery.    