# IBM Data Science: Capstone Project
## Battle of the Neighborhoods - Week 1
## William Windsor
**+++++++++++++**
### Battle of the Neighborhoods -- Business Decision:
### Where to Locate a Professional Services Business: San Francisco SoMa Area or Seattle Pioneer Square?

**+++++++++++++**
### Part 1: Description of the Problem and a Discussion of the Background
#### Business Problem to Be Addressed: this data science project addresses the decision of where to locate a professional services business that serves a geographical concentration of high-tech companies, including a burgeoning high-tech startup hub.
#### Examples of professional services businesses include data science and data research services for tech companies (as motivated by this IBM Data Science Professional Certification program!); facilities planning and facilities location research services; professional human resources contract services; professional legal services; on-site health care and monitoring services; and security services such as IT software hosting and security, or personnel for on-site security monitoring.
#### This project will focus on two neighborhoods:
####   * San Francisco, CA: SoMa (South of Market Street) region
####   * Seattle, WA: Pioneer Square region
#### Both regions have a strong technology company presence, and both are noted for their growing number of technology startup companies.
### Target Audience and Why They Would Care About This Problem:
#### The target audience for this study is CEOs, entrepreneurs, and venture capital companies evaluating where to establish a new professional services business. They would care about this analysis because it provides data sources and a data science evaluation of several factors important to locating the business, on the premise that professional services businesses are more successful when they are located near their client base.

**+++++++++++++**
### Part 2: Description of the Data and How It Will Be Used to Solve the Problem
#### Decision Methodology and Datasets Planned to Be Used: the decision of where to locate a professional services business servicing a high-tech company hub depends on many factors. In this study, I focus on two major sets of factors:
####   (1) Financial impact to the company;  (2) Employee quality of life. 

### (1) Financial Impact: the study integrates financial costs and impact of locating in each area, focusing on:
###   (a) Commercial Office Real Estate lease and rent costs, based on current commercial office properties available at the time of this study.
####    These datasets, one for each city and locale, will be structured in the following fields for comparison: Property Address, Neighborhood or Zip Code, Rental Price per Month, Property Area in square feet, Price per Square Foot. 
#### I utilize the following commercial real estate companies' websites to extract these targeted commercial property data. The sites below provide these data for San Francisco and Seattle commercial properties:
#### https://www.loopnet.com ,  https://www.cityfeet.com ,  https://42floors.com
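
The field layout described above for the commercial property datasets could be sketched as a small pandas DataFrame. This is only an illustration under stated assumptions: the column names are my own choice, and the three listings are placeholders, not data extracted from the sites listed.

```python
import pandas as pd

# Hypothetical listings: addresses, rents, and areas are placeholders, not scraped data.
listings = pd.DataFrame(
    [
        {"city": "San Francisco", "address": "Example Address A", "zip_code": "94103",
         "rent_per_month": 12000.0, "area_sqft": 2000.0},
        {"city": "San Francisco", "address": "Example Address B", "zip_code": "94103",
         "rent_per_month": 9000.0, "area_sqft": 1200.0},
        {"city": "Seattle", "address": "Example Address C", "zip_code": "98104",
         "rent_per_month": 6000.0, "area_sqft": 1500.0},
    ]
)

# Derived field: monthly price per square foot.
listings["price_per_sqft"] = listings["rent_per_month"] / listings["area_sqft"]

# Median price per square foot for each city, for a like-for-like comparison.
median_by_city = listings.groupby("city")["price_per_sqft"].median()
print(median_by_city)
```

Comparing medians rather than means is one reasonable choice here, since a single very large or very expensive property would otherwise dominate a small sample.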

###   (b) Business density concentration, utilizing the Foursquare dataset.
#### This section of the study identifies which neighborhood has the highest density of businesses (the highest numerical concentration of businesses), on the premise that a denser client field gives our proposed business a better chance of reaching clients located close to it.
#### I use Foursquare to count the business venues within an approximately 1-mile (1,500-meter) radius of each targeted locality, then use venue grouping and cluster-density machine learning to determine which region, San Francisco / SoMa or Seattle / Pioneer Square, has the higher density of businesses.
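
As a simple baseline for the density comparison described above, venues can be counted inside the search radius and divided by the circle's area. This is a plain count-per-area sketch, not the clustering approach planned for the study. The function names are hypothetical. The two nearby sample points are coordinates from the first rows of the San Francisco venue table later in this notebook, and the third point is the Seattle center, included only to show a venue outside the radius.

```python
import math

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def venue_density(center, venues, radius_m):
    """Venues per square kilometer within radius_m of center."""
    inside = [v for v in venues
              if haversine_m(center[0], center[1], v[0], v[1]) <= radius_m]
    area_km2 = math.pi * (radius_m / 1000.0) ** 2
    return len(inside) / area_km2

center = (37.775316, -122.419626)   # San Francisco / SoMa center used in this notebook
venues = [
    (37.77635, -122.421539),        # sample venue close to the center
    (37.776286, -122.416867),       # sample venue close to the center
    (47.601954, -122.329204),       # Seattle center: far outside the radius
]
density = venue_density(center, venues, 1500)
print(round(density, 4), "venues per square km")
```

Because each request in this notebook sets `limit = 100`, raw counts may saturate at that limit in both cities, so a count-based density is only informative when fewer venues than the limit are returned.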
###   (c) Business Taxes
#### To document the financial impact of business taxes, I extract the business tax rate (as a percent) that applies to locating the professional services business in either San Francisco, CA or Seattle, WA. The governmental sites below provide these data for their respective locations:
####    * San Francisco city and State of California business taxes:  https://sftreasurer.org/business ,  https://www.ftb.ca.gov/businesses/index.shtml
####    * Seattle city and State of Washington business taxes:  https://www.seattle.gov/business-licenses-and-taxes ,  https://dor.wa.gov/find-taxes-rates/business-occupation-tax

### (2) Employee Quality of Life, utilizing the Foursquare dataset. 
#### I use the Foursquare venues data to analyze local venues that improve quality of life in the vicinity. "Quality of life" is clearly a subjective concept, so I have selected the following three factors affecting quality of life from the Foursquare data:
####    (a) Availability of Public Transportation: specify how many public transportation (subway) stations are located in each region, and the distance of each from the center of the neighborhood.
####    (b) Availability of Exercise Facilities: specify how many gyms or exercise facilities are located in each region.
####    (c) Availability of Restaurants: specify how many restaurants are located in each region.
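
The three quality-of-life counts listed above could be derived from venue category names along these lines. This is a rough sketch: the keyword rules and the sample category names are assumptions for illustration, and real Foursquare category names may need different rules.

```python
from collections import Counter

# Assumed keyword rules; real Foursquare category names may differ.
FACTOR_KEYWORDS = {
    "public_transportation": ("station", "metro", "light rail"),
    "exercise": ("gym", "fitness", "yoga"),
    "restaurants": ("restaurant",),
}

def classify(category_name):
    """Map a category name to a quality-of-life factor, or None if no rule matches."""
    name = category_name.lower()
    for factor, keywords in FACTOR_KEYWORDS.items():
        if any(keyword in name for keyword in keywords):
            return factor
    return None

def count_factors(category_names):
    """Count venues per factor; factors with no venues are reported as 0."""
    counts = Counter({factor: 0 for factor in FACTOR_KEYWORDS})
    for name in category_names:
        factor = classify(name)
        if factor is not None:
            counts[factor] += 1
    return dict(counts)

# Hypothetical category names standing in for one neighborhood's venues.
sample_categories = ["Jazz Club", "Coffee Shop", "Ramen Restaurant",
                     "Mexican Restaurant", "Gym / Fitness Center", "Metro Station"]
factor_counts = count_factors(sample_categories)
print(factor_counts)
```

Substring matching is deliberately crude: a category such as "Gas Station" would be counted as public transportation, so an explicit list of category names or IDs would be a safer rule for the final analysis.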
**+++++++++++++**
### Below is the start of this work, to extract the Foursquare venues data for each location: San Francisco / SoMa and Seattle / Pioneer Square regions.

In [1]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.cm as cm
import matplotlib.colors as colors

import requests
from bs4 import BeautifulSoup
import csv
import xml
import random          # library for random number generation
print("Initial packages imported: \nNumPy, Pandas, Matplotlib, Requests, bs4.BeautifulSoup, CSV, XML, Random.")

Initial packages imported: 
NumPy, Pandas, Matplotlib, Requests, bs4.BeautifulSoup, CSV, XML, Random.


In [2]:
import json 
# Transform JSON data into a pandas DataFrame
# (json_normalize is importable from the top-level pandas namespace since pandas 1.0)
from pandas import json_normalize

# module to convert an address into latitude and longitude values
# !conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim 
print("GeoPy.Geocoders.Nominatim installed.")

# Libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
print("IPython Image and HTML installed.")

from sklearn.cluster import KMeans

# !conda install -c conda-forge folium=0.5.0 --yes
import folium 
from folium import plugins

print("Second set of libraries imported: \nJSON, JSON_Normalize, Nominatim, Image, HTML, KMeans, Folium.")

GeoPy.Geocoders.Nominatim installed.
IPython Image and HTML installed.
Second set of libraries imported: 
JSON, JSON_Normalize, Nominatim, Image, HTML, KMeans, Folium.


In [3]:
# Specify Location (Latitude and Longitude Coordinates) of:
#  San Francisco / SoMa Area (South of Market Street): AddressSFSoMa
#  Seattle / Pioneer Square Area: AddressSeattlePS

AddressSFSoMa = 'San Francisco, CA / SoMa Area'
#location_SFSoMa = getlocation(AddressSFSoMa)
latitude_SFSoMa = 37.775316
longitude_SFSoMa = -122.419626

AddressSeattlePS = 'Seattle, WA / Pioneer Square Area'
#location_SeattlePS = getlocation(AddressSeattlePS)
latitude_SeattlePS = 47.601954
longitude_SeattlePS = -122.329204

print('The geographical coordinates of', AddressSFSoMa, 'are    : {}  {}'.format(latitude_SFSoMa, longitude_SFSoMa))
print('The geographical coordinates of', AddressSeattlePS, 'are: {}  {}'.format(latitude_SeattlePS, longitude_SeattlePS))

The geographical coordinates of San Francisco, CA / SoMa Area are    : 37.775316  -122.419626
The geographical coordinates of Seattle, WA / Pioneer Square Area are: 47.601954  -122.329204
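
The cell above hard-codes the coordinates and leaves the `getlocation` calls commented out. One way such a helper might look is sketched below. The name `get_location` and its signature are hypothetical, and the geopy usage is shown only in comments because it needs network access. The lookup is passed in as a callable so the helper can be tried without calling the geocoding service.

```python
def get_location(address, geocode):
    """Return (latitude, longitude) for address, or None if no match is found.

    geocode is any callable that takes an address string and returns an object
    with .latitude and .longitude attributes (or None), for example the
    geocode method of a geopy Nominatim instance.
    """
    location = geocode(address)
    if location is None:
        return None
    return (location.latitude, location.longitude)

# Usage with geopy (requires network access; the user_agent string is a placeholder):
# from geopy.geocoders import Nominatim
# geolocator = Nominatim(user_agent="capstone_notebook")
# print(get_location("Pioneer Square, Seattle, WA", geolocator.geocode))
```

Returning `None` for an unmatched address makes the failure explicit, so the notebook can fall back to hard-coded coordinates as it does above.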


In [5]:
# Display map of the region to show the two cities
map_all = folium.Map(location=[(latitude_SFSoMa+latitude_SeattlePS)/2, ((longitude_SFSoMa+longitude_SeattlePS)/2)+5], tiles='Stamen Terrain', zoom_start=5)

folium.Marker(location=[latitude_SFSoMa, longitude_SFSoMa], popup='San Francisco').add_to(map_all)
folium.CircleMarker(location=[latitude_SFSoMa, longitude_SFSoMa], radius=10,
popup='San Francisco / SoMa Area', color='#3186cc', fill_color='#3186cc').add_to(map_all)

folium.Marker(location=[latitude_SeattlePS, longitude_SeattlePS], popup='Seattle').add_to(map_all)
folium.CircleMarker(location=[latitude_SeattlePS, longitude_SeattlePS], radius=10,
popup='Seattle / Pioneer Square', color='#3186cc', fill_color='#3186cc').add_to(map_all)

map_all

In [6]:
# Display map of the San Francisco / SoMa Area (South of Market Street)
map_SFSoMa = folium.Map(location=[latitude_SFSoMa, longitude_SFSoMa], tiles='Stamen Terrain', zoom_start=12)

folium.Marker(location=[latitude_SFSoMa, longitude_SFSoMa], popup='San Francisco').add_to(map_SFSoMa)
folium.CircleMarker(location=[latitude_SFSoMa, longitude_SFSoMa], radius=10,
popup='San Francisco / SoMa Area', color='#3186cc', fill_color='#3186cc').add_to(map_SFSoMa)

map_SFSoMa

In [19]:
# Please Note: this is the cell where I provide my Foursquare confidential customer credentials
# My Foursquare confidential customer credentials are REDACTED here.

In [8]:
# San Francisco / SoMa area:
# Analyze venues / local businesses within 1500 meters of the target neighborhood (approx. 1 mile radius)
radius = 1500
limit = 100
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, latitude_SFSoMa, longitude_SFSoMa, radius, limit)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=REDACTED&client_secret=REDACTED&v=20180605&ll=37.775316,-122.419626&radius=1500&limit=100'
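
Because the request URL is built by string formatting, displaying it also displays the credentials it contains. One possible alternative, sketched here with hypothetical helper names and placeholder credentials, keeps the query parameters in a dict and prints only a masked version of the URL.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_params(client_id, client_secret, version, lat, lng, radius, limit):
    """Collect the query parameters used by the explore request in this notebook."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }

def masked_url(params, secret_keys=("client_id", "client_secret")):
    """Build a display-only URL with the credential values replaced."""
    safe = {k: ("REDACTED" if k in secret_keys else v) for k, v in params.items()}
    return EXPLORE_ENDPOINT + "?" + urlencode(safe, safe=",")

# "MY_ID" and "MY_SECRET" are placeholders for the real credentials.
params = build_explore_params("MY_ID", "MY_SECRET", "20180605",
                              37.775316, -122.419626, 1500, 100)
print(masked_url(params))

# The actual request would pass the unmasked dict instead of a formatted string:
# results = requests.get(EXPLORE_ENDPOINT, params=params).json()
```

Passing `params=` to `requests.get` also lets the library handle URL encoding, which avoids the stray `?&` at the start of the hand-built query string.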

In [9]:
results = requests.get(url).json()
# results

# Test selected objects and venues from the dataset - focus on the 'items' object
# print("meta: requestId: ", results['meta']['requestId'])
# print("response: suggestedBounds: 'ne', 'sw' \n", results['response']['suggestedBounds']['ne'], '\n', results['response']['suggestedBounds']['sw'])

# results['response']['groups'][0]['items']
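
The cell above indexes straight into `results['response']['groups'][0]['items']`, which raises an unhelpful `KeyError` if the request failed (for example, with bad credentials). A small guard like the one below may make failures easier to diagnose. The function name is hypothetical, and the sample response is a minimal stand-in that mirrors only the keys this notebook already reads.

```python
def extract_items(results):
    """Return the list of recommended items from an explore response dict.

    Raises ValueError if the response reports a non-200 status code.
    """
    code = results.get("meta", {}).get("code")
    if code != 200:
        raise ValueError("Foursquare request failed with meta code {}".format(code))
    groups = results.get("response", {}).get("groups", [])
    return groups[0].get("items", []) if groups else []

# Minimal stand-in for a parsed response; the single venue is a placeholder.
sample = {
    "meta": {"code": 200},
    "response": {"groups": [{"items": [{"venue": {"name": "Example Venue"}}]}]},
}
names = [item["venue"]["name"] for item in extract_items(sample)]
print(names)
```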

In [10]:
# Print selected Venue Name results, to verify the object referencing:
# results['response']['groups'][0]['items'][0]['venue']['name']  --> 'SFJazz Center'
# Slice instead of indexing by position, so fewer than 30 results does not raise IndexError
for item in results['response']['groups'][0]['items'][:30]:
    print(item['venue']['name'])

SFJazz Center
New Conservatory Theatre Center
Blue Bottle Coffee
Nojo Ramen Tavern
Cala
Sydney Goldstein Theater
Louise M. Davies Symphony Hall
Fatted Calf
Hotel Biron
The Beer Hall
Rich Table
Fitness SF
Veer & Wander
Blue Bottle Coffee
Siam Orchid Traditional Thai Massage
Warby Parker
Linden Room
Otoro Sushi
Maker & Moss
Zuni Café
Linden Alley
War Memorial Opera House
Suginami Aikikai
San Francisco Ballet
Fig & Thistle Wine Bar
Arlequin Wine Merchant
Welcome Stranger
Ritual Coffee Roasters
Urban Pharm
Smitten Ice Cream


In [11]:
# From our Foursquare lab, we know that all the information is in the items key. 
# Define function get_category_type to retrieve the venue's category
def get_category_type(row):
    """Return the name of the venue's first category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        # Column not yet renamed: fall back to the flattened JSON column name
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [13]:
# Now we clean the JSON object and structure it into a pandas dataframe.
search_query_SF = 'venues'
venues_SF = results['response']['groups'][0]['items']
nearby_venues_SF = json_normalize(venues_SF)     # flatten JSON
nearby_venues_SF.head()

Unnamed: 0,reasons.count,reasons.items,referralId,venue.categories,venue.delivery.id,venue.delivery.provider.icon.name,venue.delivery.provider.icon.prefix,venue.delivery.provider.icon.sizes,venue.delivery.provider.name,venue.delivery.url,venue.id,venue.location.address,venue.location.cc,venue.location.city,venue.location.country,venue.location.crossStreet,venue.location.distance,venue.location.formattedAddress,venue.location.labeledLatLngs,venue.location.lat,venue.location.lng,venue.location.neighborhood,venue.location.postalCode,venue.location.state,venue.name,venue.photos.count,venue.photos.groups,venue.venuePage.id
0,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-50f21340e4b036c5cc0d7c7d-0,"[{'id': '4bf58dd8d48988d1e7931735', 'name': 'J...",,,,,,,50f21340e4b036c5cc0d7c7d,201 Franklin St,US,San Francisco,United States,at Fell St,203,"[201 Franklin St (at Fell St), San Francisco, ...","[{'label': 'display', 'lat': 37.77635001768186...",37.77635,-122.421539,Civic Center,94102,CA,SFJazz Center,0,[],
1,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4a346043f964a520279c1fe3-1,"[{'id': '4bf58dd8d48988d137941735', 'name': 'T...",,,,,,,4a346043f964a520279c1fe3,25 Van Ness Ave,US,San Francisco,United States,Oak St,32,"[25 Van Ness Ave (Oak St), San Francisco, CA 9...","[{'label': 'display', 'lat': 37.77559103972081...",37.775591,-122.419753,,94102,CA,New Conservatory Theatre Center,0,[],75569151.0
2,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-5560dbdb498e91a2bcde84f6-2,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",,,,,,,5560dbdb498e91a2bcde84f6,1355 Market St Ste 190,US,San Francisco,United States,at 10th St,265,"[1355 Market St Ste 190 (at 10th St), San Fran...","[{'label': 'display', 'lat': 37.77628641647586...",37.776286,-122.416867,,94103,CA,Blue Bottle Coffee,0,[],
3,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4d8eabc7d265236af9a71017-3,"[{'id': '55a59bace4b013909087cb24', 'name': 'R...",,,,,,,4d8eabc7d265236af9a71017,231 Franklin St,US,San Francisco,United States,at Linden St,206,"[231 Franklin St (at Linden St), San Francisco...","[{'label': 'display', 'lat': 37.77663659663732...",37.776637,-122.42127,,94102,CA,Nojo Ramen Tavern,0,[],
4,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-5600850a498edff486fdfe89-4,"[{'id': '4bf58dd8d48988d1c1941735', 'name': 'M...",,,,,,,5600850a498edff486fdfe89,149 Fell St,US,San Francisco,United States,btwn Franklin and Van Ness,106,"[149 Fell St (btwn Franklin and Van Ness), San...","[{'label': 'display', 'lat': 37.77606310699734...",37.776063,-122.420386,,94102,CA,Cala,0,[],
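
The `get_category_type` helper defined earlier is not yet applied at this point in the notebook. A minimal sketch of how it might be used to reduce the flattened frame to a few readable columns follows. The two rows are stand-ins shaped like the table above (placeholder names and categories), and the choice of columns to keep is an assumption.

```python
import pandas as pd

def get_category_type(row):
    """Return the name of the venue's first category, or None if it has none."""
    try:
        categories_list = row["categories"]
    except KeyError:
        categories_list = row["venue.categories"]
    if len(categories_list) == 0:
        return None
    return categories_list[0]["name"]

# Two stand-in rows shaped like the flattened frame; names and categories are placeholders.
nearby = pd.DataFrame({
    "venue.name": ["Venue A", "Venue B"],
    "venue.categories": [[{"id": "1", "name": "Category A"}], []],
    "venue.location.lat": [37.77635, 37.775591],
    "venue.location.lng": [-122.421539, -122.419753],
    "referralId": ["e-0-a", "e-0-b"],
})

# Keep only the columns needed for the analysis.
keep = ["venue.name", "venue.categories", "venue.location.lat", "venue.location.lng"]
nearby = nearby.loc[:, keep].copy()

# Replace each list of category dicts with the first category's name.
nearby["venue.categories"] = nearby.apply(get_category_type, axis=1)

# Shorten column names: 'venue.location.lat' -> 'lat', and so on.
nearby.columns = [col.split(".")[-1] for col in nearby.columns]
print(nearby)
```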


In [14]:
# Repeat the analysis for Seattle / Pioneer Square area

# Display map of the Seattle / Pioneer Square Area
map_SeattlePS = folium.Map(location=[latitude_SeattlePS, longitude_SeattlePS], tiles='Stamen Terrain', zoom_start=12)

folium.Marker(location=[latitude_SeattlePS, longitude_SeattlePS], popup='Seattle').add_to(map_SeattlePS)
folium.CircleMarker(location=[latitude_SeattlePS, longitude_SeattlePS], radius=10,
popup='Seattle / Pioneer Square', color='#3186cc', fill_color='#3186cc').add_to(map_SeattlePS)

map_SeattlePS

In [15]:
# Seattle / Pioneer Square area:
# Analyze venues / local businesses within 1500 meters of the target neighborhood (approx. 1 mile radius)
radius = 1500
limit = 100
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, latitude_SeattlePS, longitude_SeattlePS, radius, limit)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=REDACTED&client_secret=REDACTED&v=20180605&ll=47.601954,-122.329204&radius=1500&limit=100'

In [16]:
results = requests.get(url).json()
# results

# Test selected objects and venues from the dataset - focus on the 'items' object
# print("meta: requestId: ", results['meta']['requestId'])
# print("response: suggestedBounds: 'ne', 'sw' \n", results['response']['suggestedBounds']['ne'], '\n', results['response']['suggestedBounds']['sw'])
# results['response']['groups'][0]['items']

In [17]:
# Print selected Venue Name results, to verify the object referencing:
# results['response']['groups'][0]['items'][0]['venue']['name']  --> 'Il Corvo'
# Slice instead of indexing by position, so fewer than 30 results does not raise IndexError
for item in results['response']['groups'][0]['items'][:30]:
    print(item['venue']['name'])

Il Corvo
Flatstick Pub
Elm Coffee Roasters
Biscuit B*tch
Tat's Delicatessen
Tsukushinbo
Cherry Street Public House
Columbia Tower Club
Smith Tower
Juicy Cafe
Good Bar
Smith Tower Observation Deck
Damn the Weather
The Bar Shoppe
Delicatus
Maneki
Top Pot Doughnuts
Salumi
Caffè Umbria
Zeitgeist Kunst & Kaffee
Intrigue Chocolates
Casco Antiguo
Cafe Nordo
Metropolitan Grill
Kinokuniya Book Store
Columbia Center Observation Deck (Sky View Observatory)
Red Bowls
The London Plane
KOBO
Pioneer Pet Feed & Supply


In [18]:
# Now we clean the JSON object and structure it into a pandas dataframe.
search_query_SEA = 'venues'
venues_SEA = results['response']['groups'][0]['items']
nearby_venues_SEA = json_normalize(venues_SEA)     # flatten JSON
nearby_venues_SEA.head()

Unnamed: 0,reasons.count,reasons.items,referralId,venue.categories,venue.delivery.id,venue.delivery.provider.icon.name,venue.delivery.provider.icon.prefix,venue.delivery.provider.icon.sizes,venue.delivery.provider.name,venue.delivery.url,venue.id,venue.location.address,venue.location.cc,venue.location.city,venue.location.country,venue.location.crossStreet,venue.location.distance,venue.location.formattedAddress,venue.location.labeledLatLngs,venue.location.lat,venue.location.lng,venue.location.neighborhood,venue.location.postalCode,venue.location.state,venue.name,venue.photos.count,venue.photos.groups,venue.venuePage.id
0,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4db71ec1a86ed8d46c6e179c-0,"[{'id': '4bf58dd8d48988d110941735', 'name': 'I...",,,,,,,4db71ec1a86ed8d46c6e179c,217 James St,US,Seattle,United States,btwn 2nd & 3rd Ave,215,"[217 James St (btwn 2nd & 3rd Ave), Seattle, W...","[{'label': 'display', 'lat': 47.60252187796595...",47.602522,-122.331952,,98104,WA,Il Corvo,0,[],
1,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-56b101bb498e4905510a5798-1,"[{'id': '56aa371ce4b08b9a8d57356c', 'name': 'B...",,,,,,,56b101bb498e4905510a5798,240 2nd Ave S,US,Seattle,United States,Main,249,"[240 2nd Ave S (Main), Seattle, WA 98104, Unit...","[{'label': 'display', 'lat': 47.60022, 'lng': ...",47.60022,-122.33131,Pioneer Square,98104,WA,Flatstick Pub,0,[],154127947.0
2,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-545803de498e7e758ac5605e-2,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",,,,,,,545803de498e7e758ac5605e,240 2nd Avenue Ext S Ste 103,US,Seattle,United States,,239,"[240 2nd Avenue Ext S Ste 103, Seattle, WA 981...","[{'label': 'display', 'lat': 47.60015237080675...",47.600152,-122.330944,,98104,WA,Elm Coffee Roasters,0,[],
3,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-5762cc9ccd1085a720b1433e-3,"[{'id': '4bf58dd8d48988d143941735', 'name': 'B...",,,,,,,5762cc9ccd1085a720b1433e,621 3rd Ave,US,Seattle,United States,James St & Cherry St,254,"[621 3rd Ave (James St & Cherry St), Seattle, ...","[{'label': 'display', 'lat': 47.603237, 'lng':...",47.603237,-122.33201,,98104,WA,Biscuit B*tch,0,[],
4,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-455c500ef964a5206e3d1fe3-4,"[{'id': '4bf58dd8d48988d1c5941735', 'name': 'S...",,,,,,,455c500ef964a5206e3d1fe3,159 Yesler Way,US,Seattle,United States,at 2nd Ave,241,"[159 Yesler Way (at 2nd Ave), Seattle, WA 9810...","[{'label': 'display', 'lat': 47.60190101610654...",47.601901,-122.332423,Pioneer Square,98104,WA,Tat's Delicatessen,0,[],
