# The Battle of Neighborhoods by BBogmakSit

## Introduction/Business Problem

The target group of this project is entrepreneurs who are planning to open a small cafe serving coffee and snacks.

The analysis covers two crowded cities on opposite coasts of the US: New York, NY and San Francisco, CA. The aim is to help potential entrepreneurs decide in which of the two cities to open the cafe.

Especially since last year, people have generally preferred to spend their time in the open air, and they sometimes grab something to eat or drink on the way to a park. Therefore, the analysis focuses on areas close to parks and other open-air places. In addition to being close to parks, the location should have fewer potential rivals than the alternatives, so that the potential customers in the area are divided among fewer venues.

## Data

To compare New York and San Francisco in terms of parks and the number of cafes, Foursquare data for both cities is needed. The venue information, including each venue's location, will be collected. This is enough to compare the cities within the scope explained above, because exploring venues through the Foursquare API returns both the location and the type of each venue.
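Each item returned by the explore endpoint bundles a venue's name, location and categories. The sketch below pulls these out of a single item using only the standard library. The `venue.name` and `venue.location.*` fields are the ones the notebook selects later; the `categories` list with `name` and `primary` keys is an assumption about the v2 response format, `venue_summary` is a hypothetical helper, and the sample item is hand-written for illustration rather than a real API response.

```python
def venue_summary(item):
    """Return (name, lat, lng, category) for one item of an explore response."""
    venue = item["venue"]
    location = venue.get("location", {})
    categories = venue.get("categories", [])
    # Prefer the category flagged as primary; otherwise fall back to the first one.
    primary = next((c for c in categories if c.get("primary")), None)
    if primary is None and categories:
        primary = categories[0]
    category = primary["name"] if primary else None
    return venue["name"], location.get("lat"), location.get("lng"), category


# Hand-written example item (illustrative, not a real API response).
sample_item = {
    "venue": {
        "name": "Duboce Park",
        "location": {"lat": 37.769578, "lng": -122.43296, "city": "San Francisco"},
        "categories": [{"name": "Park", "primary": True}],
    }
}

print(venue_summary(sample_item))
```

Missing fields come back as `None`, so a venue without a location or category does not raise an error.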

Let's start by retrieving the necessary data.

First, define the credentials and some parameters.

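```python
# Foursquare API credentials (removed after running the notebook)
CLIENT_ID = 'Deleted after run'
CLIENT_SECRET = 'Deleted after run'
VERSION = '20210101'  # API version date
LIMIT = 100           # maximum number of venues requested per call
```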
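Since the credentials had to be deleted by hand after running, one alternative is to read them from environment variables so they never appear in the notebook. This is a minimal sketch; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical and can be anything your setup uses.

```python
import os

# Hypothetical environment variable names; set them in your shell before starting Jupyter.
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")
VERSION = "20210101"  # API version date, as in the cell above
LIMIT = 100

if not (CLIENT_ID and CLIENT_SECRET):
    print("Foursquare credentials are not set; API requests will fail.")
```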

The coordinates of the two cities were looked up online and are defined below.

```python
#coordinates of New York
ny_latitude=40.730610
ny_longitude=-73.935242

#coordinates of San Francisco
sf_latitude=37.773972
sf_longitude=-122.431297
```

```python
# Create the URLs for the parks in New York and San Francisco
url_ny_park = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&limit={}&query=park'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    ny_latitude, 
    ny_longitude, 
    LIMIT)

url_sf_park = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&limit={}&query=park'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    sf_latitude, 
    sf_longitude, 
    LIMIT)
```

```python
# Create the URLs for the coffee places in New York and San Francisco
url_ny_coff = 'https://api.foursquare.com/v2/venues/explore?&section=coffee&client_id={}&client_secret={}&v={}&ll={},{}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    ny_latitude, 
    ny_longitude, 
    LIMIT)

url_sf_coff = 'https://api.foursquare.com/v2/venues/explore?&section=coffee&client_id={}&client_secret={}&v={}&ll={},{}&limit={}&cat=coffee'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    sf_latitude, 
    sf_longitude, 
    LIMIT)
```
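The four URLs above are assembled by hand, and the two coffee URLs are not identical (only the San Francisco one carries `&cat=coffee`). Building every URL from one helper keeps the parameters consistent across cities. This is a minimal standard-library sketch: `build_explore_url` is a hypothetical helper, and the optional `radius` parameter (in meters) is an assumption based on the v2 `venues/explore` documentation, not something the notebook uses.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, limit,
                      query=None, section=None, radius=None):
    """Build an explore URL with the same base parameters for every city."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "limit": limit,
    }
    # Optional filters are only added when they are set.
    if query is not None:
        params["query"] = query
    if section is not None:
        params["section"] = section
    if radius is not None:
        params["radius"] = radius  # meters (assumed parameter)
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Same filter for both cities, so the two searches are directly comparable.
url_ny = build_explore_url("ID", "SECRET", "20210101", 40.730610, -73.935242, 100, section="coffee")
url_sf = build_explore_url("ID", "SECRET", "20210101", 37.773972, -122.431297, 100, section="coffee")
print(url_sf)
```

With `requests`, the same effect can be had by passing the dictionary as `requests.get(EXPLORE_URL, params=params)` instead of formatting the string yourself.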

To make the data easier to work with, convert the JSON responses into pandas DataFrames.

```python
# First, import the necessary libraries
import pandas as pd
import numpy as np
import requests
from pandas import json_normalize
```
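`json_normalize` turns each nested JSON item into one flat row by joining nested keys with dots; that is where column names such as `venue.location.lat` in the next cells come from. Below is a rough standard-library illustration of that naming rule. The `flatten` helper is hypothetical: it handles nested dicts only and leaves lists untouched, as `json_normalize` does by default for list values, and the sample item is hand-written.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into one dict with dot-separated keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the key path.
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            # Scalars and lists are kept as they are.
            flat[name] = value
    return flat


# Hand-written example item (illustrative, not a real API response).
item = {
    "venue": {
        "name": "Alamo Square",
        "location": {"lat": 37.775881, "lng": -122.434412, "city": "San Francisco"},
        "categories": [],
    }
}
print(flatten(item))
```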

Create the park DataFrame for New York:

```python
results_ny_p = requests.get(url_ny_park).json()["response"]['groups'][0]['items']
ny_parks=json_normalize(results_ny_p)

# return only relevant information
ny_parks=ny_parks.loc[:,['venue.name','venue.location.lat', 'venue.location.lng', 'venue.location.city']]
ny_parks.columns = ['Park', 'Park Latitude', 'Park Longitude', 'City']
ny_parks.head(10)
```


| | Park | Park Latitude | Park Longitude | City |
|---|---|---|---|---|
| 0 | Monsignor McGolrick Park | 40.724546 | -73.943654 | Brooklyn |
| 1 | McCarren Park | 40.72164 | -73.952579 | Brooklyn |
| 2 | WNYC Transmitter Park | 40.729958 | -73.960733 | Brooklyn |
| 3 | Hunter's Point South Park | 40.742632 | -73.960701 | Queens |
| 4 | Cooper Park | 40.71594 | -73.93728 | Brooklyn |
| 5 | Bushwick Inlet Park | 40.722559 | -73.961549 | Brooklyn |
| 6 | Thomas P. Noonan, Jr. Playground | 40.741053 | -73.922213 | Sunnyside |
| 7 | Msgr. McGolrick Park Dog Run | 40.723291 | -73.943459 | Brooklyn |
| 8 | Four Freedoms Park | 40.750744 | -73.960465 | New York |
| 9 | John F Murray Playground | 40.747272 | -73.948608 | Queens |


Now, create the park DataFrame for San Francisco:

```python
results_sf_p = requests.get(url_sf_park).json()["response"]['groups'][0]['items']
sf_parks=json_normalize(results_sf_p)

# return only relevant information
sf_parks=sf_parks.loc[:,['venue.name','venue.location.lat', 'venue.location.lng', 'venue.location.city']]
sf_parks.columns = ['Park', 'Park Latitude', 'Park Longitude', 'City']

sf_parks.head(10)
```

| | Park | Park Latitude | Park Longitude | City |
|---|---|---|---|---|
| 0 | Alamo Square | 37.775881 | -122.434412 | San Francisco |
| 1 | Painted Ladies | 37.77612 | -122.433389 | San Francisco |
| 2 | Duboce Park | 37.769578 | -122.43296 | San Francisco |
| 3 | Patricia's Green | 37.776369 | -122.424479 | San Francisco |
| 4 | Alamo Square Dog Park | 37.775878 | -122.43574 | San Francisco |
| 5 | Waller Park | 37.771648 | -122.426626 | San Francisco |
| 6 | Buena Vista Park | 37.768338 | -122.440501 | San Francisco |
| 7 | Dog Park | 37.769503 | -122.432709 | San Francisco |
| 8 | Corona Heights Park | 37.765023 | -122.438831 | San Francisco |
| 9 | Daniel E. Koshland Community Park | 37.773218 | -122.427146 | San Francisco |


Next, create the DataFrames for the coffee places.

New York's coffee places:

```python
results_ny_c = requests.get(url_ny_coff).json()["response"]['groups'][0]['items']
ny_coffee=json_normalize(results_ny_c)

# return only relevant information
ny_coffee=ny_coffee.loc[:,['venue.name','venue.location.lat', 'venue.location.lng', 'venue.location.city']]
ny_coffee.columns = ['Venue', 'Venue Latitude', 'Venue Longitude', 'City']
ny_coffee.head(10)
```

| | Venue | Venue Latitude | Venue Longitude | City |
|---|---|---|---|---|
| 0 | Alkemy | 40.725925 | -73.939136 | Brooklyn |
| 1 | Variety Coffee Roasters | 40.723169 | -73.944223 | Brooklyn |
| 2 | Crema BK | 40.723001 | -73.945727 | Brooklyn |
| 3 | Café Grumpy | 40.728563 | -73.948639 | Brooklyn |
| 4 | Doughnut Plant | 40.742989 | -73.935292 | Queens |
| 5 | Peter Pan Donut & Pastry Shop | 40.726102 | -73.952252 | Brooklyn |
| 6 | Moe’s Doughs | 40.724532 | -73.948517 | Brooklyn |
| 7 | Charlotte Patisserie | 40.723051 | -73.950386 | Brooklyn |
| 8 | Tar Pit | 40.717615 | -73.941401 | Brooklyn |
| 9 | The Blue Stove | 40.717512 | -73.944908 | Brooklyn |


San Francisco's coffee places:

```python
results_sf_c = requests.get(url_sf_coff).json()["response"]['groups'][0]['items']
sf_coffee=json_normalize(results_sf_c)

# return only relevant information
sf_coffee=sf_coffee.loc[:,['venue.name','venue.location.lat', 'venue.location.lng', 'venue.location.city']]
sf_coffee.columns = ['Venue', 'Venue Latitude', 'Venue Longitude', 'City']
sf_coffee.head(10)
```

| | Venue | Venue Latitude | Venue Longitude | City |
|---|---|---|---|---|
| 0 | The Center SF | 37.774545 | -122.43073 | San Francisco |
| 1 | Réveille Coffee Co. | 37.770978 | -122.432029 | San Francisco |
| 2 | Lady Falcon Coffee Club | 37.777039 | -122.431946 | San Francisco |
| 3 | Cafe International | 37.772156 | -122.430691 | San Francisco |
| 4 | Sightglass Coffee | 37.772287 | -122.437419 | San Francisco |
| 5 | The Mill | 37.776425 | -122.43797 | San Francisco |
| 6 | Ritual Coffee Roasters | 37.776476 | -122.424281 | San Francisco |
| 7 | Wise Sons Bagel & Coffee | 37.777284 | -122.424958 | San Francisco |
| 8 | Blue Bottle Coffee | 37.77643 | -122.423224 | San Francisco |
| 9 | Duboce Park Cafe | 37.769334 | -122.431561 | San Francisco |


## Data Analysis
Now, explore and analyze the data.

First, let's see how many parks and coffee shops our search parameters return. Note that even if there are more than 100 venues in an area, at most 100 results are returned, since the limit is set to 100.

```python
print('The number of parks in NY:', ny_parks.shape[0])
print('The number of coffee shops in NY:', ny_coffee.shape[0])
print('The number of parks in SF:', sf_parks.shape[0])
print('The number of coffee shops in SF:', sf_coffee.shape[0])
```

```
The number of parks in NY: 100
The number of coffee shops in NY: 50
The number of parks in SF: 100
The number of coffee shops in SF: 32
```
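Both park queries returned exactly `LIMIT` results, which suggests those counts were cut off and only show that each search area holds at least 100 parks; only the coffee shop counts fall below the limit. A small check like the one below makes this explicit. It is a sketch that assumes `limit` is the only cap on the number of results; `capped` is a hypothetical helper, and the counts are the ones printed above.

```python
LIMIT = 100

# Counts printed by the cell above.
counts = {
    "parks in NY": 100,
    "coffee shops in NY": 50,
    "parks in SF": 100,
    "coffee shops in SF": 32,
}


def capped(count, limit=LIMIT):
    """A count that reaches the request limit may be truncated (a lower bound)."""
    return count >= limit


for label, n in counts.items():
    note = " (reached the limit; the true number may be higher)" if capped(n) else ""
    print(f"{label}: {n}{note}")
```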


At a glance, New York has more coffee places. However, their density also matters, so let's look at the locations of the venues on a map.
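The map gives a visual impression of density. The business criterion from the introduction (close to a park, few rivals nearby) can also be put into numbers by counting the coffee shops within a fixed walking distance of each park. Below is a minimal sketch using the haversine great-circle distance; the 300 m radius and the helper names `haversine_m` and `rivals_near` are illustrative assumptions, and the coordinates are a handful of San Francisco rows copied from the tables above, so the counts say nothing about the full data.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def rivals_near(park, cafes, radius_m=300):
    """Count cafes within radius_m meters of a park; all points are (lat, lng)."""
    return sum(1 for cafe in cafes if haversine_m(*park, *cafe) <= radius_m)


# A few San Francisco coordinates copied from the tables above.
sf_cafes = [
    (37.774545, -122.43073),   # The Center SF
    (37.770978, -122.432029),  # Réveille Coffee Co.
    (37.772156, -122.430691),  # Cafe International
    (37.772287, -122.437419),  # Sightglass Coffee
    (37.769334, -122.431561),  # Duboce Park Cafe
]
sf_parks_sample = {
    "Alamo Square": (37.775881, -122.434412),
    "Duboce Park": (37.769578, -122.43296),
    "Buena Vista Park": (37.768338, -122.440501),
}

# Parks with the fewest nearby rivals come first.
ranking = sorted(sf_parks_sample, key=lambda p: rivals_near(sf_parks_sample[p], sf_cafes))
for name in ranking:
    print(name, rivals_near(sf_parks_sample[name], sf_cafes))
```

With the notebook's DataFrames, the cafe list could be built as `list(zip(sf_coffee['Venue Latitude'], sf_coffee['Venue Longitude']))` and the same function applied to every row of `sf_parks` and `ny_parks`.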

```python
import folium
```

Parks (green) and coffee shops (red) around the chosen New York coordinates:

```python
# create map
map_ny = folium.Map(location=[ny_latitude, ny_longitude], zoom_start=13)

# add park markers to map (green)
for park, lat, lng, city in zip(ny_parks['Park'], ny_parks['Park Latitude'], ny_parks['Park Longitude'], ny_parks['City']):
    label = '{}, {}'.format(park, city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green').add_to(map_ny)

# add coffee markers to map (red)
for venue, lat, lng, city in zip(ny_coffee['Venue'], ny_coffee['Venue Latitude'], ny_coffee['Venue Longitude'], ny_coffee['City']):
    label = '{}, {}'.format(venue, city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red').add_to(map_ny)

map_ny
```

Parks (green) and coffee shops (red) around the chosen San Francisco coordinates:

```python
# create map
map_sf = folium.Map(location=[sf_latitude, sf_longitude], zoom_start=13)

# add park markers to map (green)
for park, lat, lng, city in zip(sf_parks['Park'], sf_parks['Park Latitude'], sf_parks['Park Longitude'], sf_parks['City']):
    label = '{}, {}'.format(park, city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green').add_to(map_sf)

# add coffee markers to map (red)
for venue, lat, lng, city in zip(sf_coffee['Venue'], sf_coffee['Venue Latitude'], sf_coffee['Venue Longitude'], sf_coffee['City']):
    label = '{}, {}'.format(venue, city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red').add_to(map_sf)

map_sf
```

At the same zoom level, the parks appear to be more densely located in San Francisco. Taking both this observation and the smaller number of coffee shops into account, San Francisco is recommended as the location for the new cafe.
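The comparison above relies on a visual impression at a fixed zoom level. One rough way to put a number on it is the count of points per square kilometre of their bounding box. The sketch below uses an equirectangular approximation, which is adequate at city scale; `bbox_area_km2` and `density_per_km2` are hypothetical helpers, and a bounding box is sensitive to a single outlying venue, so the result is indicative only.

```python
import math

KM_PER_DEG = 111.195  # km per degree of latitude (mean Earth radius 6371 km)


def bbox_area_km2(points):
    """Approximate area in km^2 of the lat/lng bounding box of (lat, lng) points.

    Uses an equirectangular approximation, which is reasonable at city scale.
    """
    lats = [p[0] for p in points]
    lngs = [p[1] for p in points]
    height = (max(lats) - min(lats)) * KM_PER_DEG
    # Longitude degrees shrink with the cosine of the latitude.
    mean_lat = math.radians((max(lats) + min(lats)) / 2)
    width = (max(lngs) - min(lngs)) * KM_PER_DEG * math.cos(mean_lat)
    return height * width


def density_per_km2(points):
    """Number of points per km^2 of their bounding box (inf for a degenerate box)."""
    area = bbox_area_km2(points)
    return len(points) / area if area > 0 else float("inf")
```

For example, `density_per_km2(list(zip(sf_parks['Park Latitude'], sf_parks['Park Longitude'])))` could be compared with the same call on `ny_parks`. Because the explore endpoint may pick a different search radius for each city when none is given, a density of this kind is arguably more comparable than the raw counts.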