# Description of Problem and Background

San Francisco is one of the most vibrant cities and one of the most popular destinations in the world. What makes it unique is its mix of nature and city, which gives residents and visitors alike a great deal to explore. For this submission, I focus on San Francisco's city parks in order to determine where one might open a dog walking business in the city.

I will study the parks in the city's neighborhoods in depth, using data currently available online, and check whether dog walking businesses already operate in those areas. It would also be interesting to understand how far residents currently travel to walk their dogs and, based on that research, how large an area the business would need to cover to attract customers.

# Data Description and Problem Solving

To build a list of the parks in the city of San Francisco, we will explore a publicly available park list. The data preparation steps are as follows:

1. Scrape the city park list from pages available online
2. Use libraries such as geopy to obtain the coordinates (latitude, longitude) of the parks
3. Use the Foursquare API to find all available businesses in those areas and narrow them down to pet businesses
4. Group the pet-related businesses into categories
5. Explore the competition that directly relates to dog walking
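
Steps 4 and 5 could be sketched roughly as follows. This is only an illustration under assumptions: the venue records, the category names, and the keyword-to-group mapping (`PET_GROUPS`) are hypothetical and are not taken from a real Foursquare response.

```python
from collections import defaultdict

# Hypothetical keyword -> group mapping; adjust it to the categories actually returned.
PET_GROUPS = {
    "dog walk": "Dog walking",
    "pet store": "Pet supplies",
    "pet service": "Pet services",
    "veterinarian": "Veterinary",
    "dog run": "Dog parks",
}

def pet_group(category):
    """Return the pet-business group for a category name, or None if unrelated."""
    name = category.lower()
    for keyword, group in PET_GROUPS.items():
        if keyword in name:
            return group
    return None

def group_pet_venues(venues):
    """Map each pet group to the list of venue names that fall into it."""
    grouped = defaultdict(list)
    for venue in venues:
        group = pet_group(venue["category"])
        if group is not None:
            grouped[group].append(venue["name"])
    return dict(grouped)

# Made-up sample records standing in for API results.
sample_venues = [
    {"name": "Venue A", "category": "Pet Store"},
    {"name": "Venue B", "category": "Coffee Shop"},
    {"name": "Venue C", "category": "Dog Walking Service"},
    {"name": "Venue D", "category": "Veterinarian"},
]

grouped = group_pet_venues(sample_venues)
# Step 5: the direct competition is whatever landed in the dog walking group.
competitors = grouped.get("Dog walking", [])
print(grouped)
print("Direct competitors:", competitors)
```

Matching on keywords keeps the grouping easy to adjust once the real category names are known; an exact lookup on category IDs would be stricter if the API provides them.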

### Data Preparation

In [4]:
# Install folium before importing it
!pip install folium

import sys
import json
import subprocess
import urllib.request
from urllib.request import urlopen

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
%matplotlib inline
import seaborn as sns

from geopy.geocoders import Nominatim
import folium
from bs4 import BeautifulSoup



In [5]:
url = "https://en.wikipedia.org/wiki/List_of_parks_in_San_Francisco#City"
# Send an explicit User-Agent header; a bare urlopen(url) may be rejected by the server
req = urllib.request.Request(url, headers={'User-Agent': "Magic Browser"})
html = urllib.request.urlopen(req)

In [6]:
soup = BeautifulSoup(html, 'html.parser')
type(soup)

bs4.BeautifulSoup

In [7]:
# Extract the visible text of the page and inspect it
text = soup.get_text()
print(text)






List of parks in San Francisco - Wikipedia
document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"f87f1d98-651f-4276-befd-3c6549afde0f","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_parks_in_San_Francisco","wgTitle":"List of parks in San Francisco","wgCurRevisionId":957391499,"wgRevisionId":957391499,"wgArticleId":1487596,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with short description","Dynamic lists","Lists of parks in the United States","Parks in San Francisco","San Francisco-related lists","Sports venues in San Francisco"],"wgPageContentLanguage":"en","wgPageContentModel":"wik

In [8]:
soup.find_all('a')

[<a id="top"></a>,
 <a class="mw-jump-link" href="#mw-head">Jump to navigation</a>,
 <a class="mw-jump-link" href="#p-search">Jump to search</a>,
 <a href="/wiki/San_Francisco" title="San Francisco">San Francisco</a>,
 <a href="#Federal"><span class="tocnumber">1</span> <span class="toctext">Federal</span></a>,
 <a href="#State"><span class="tocnumber">2</span> <span class="toctext">State</span></a>,
 <a href="#City"><span class="tocnumber">3</span> <span class="toctext">City</span></a>,
 <a href="#Private"><span class="tocnumber">4</span> <span class="toctext">Private</span></a>,
 <a href="#Privately-Owned_Public_Open_Spaces"><span class="tocnumber">4.1</span> <span class="toctext">Privately-Owned Public Open Spaces</span></a>,
 <a href="#See_also"><span class="tocnumber">5</span> <span class="toctext">See also</span></a>,
 <a href="#References"><span class="tocnumber">6</span> <span class="toctext">References</span></a>,
 <a href="/w/index.php?title=List_of_parks_in_San_Francisco&amp

In [9]:
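# Collect the text of the anchor tags at positions 18-37, which held the park names
# when this was run. The indices are hard-coded and will shift if the page changes.
links = soup.find_all('a')
list_district = [link.text for link in links[18:38]]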

In [10]:
df = pd.DataFrame(list_district)
df.rename(columns={0 : 'Parks'}, inplace=True)
df.head()

Unnamed: 0,Parks
0,Golden Gate National Recreation Area
1,Alcatraz
2,China Beach
3,Fort Funston
4,Fort Mason


In [11]:
column_names = ['Parks', 'Latitude', 'Longitude'] 

sf_parks = pd.DataFrame(columns=column_names)
sf_parks['Parks'] = df['Parks']

In [12]:
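from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="sf_explorer")

lat = []
long = []

# Geocode each park name. geocode() returns None when nothing matches,
# so reading .latitude raises AttributeError for unmatched names.
for address in sf_parks['Parks']:
    try:
        location = geolocator.geocode(address + ', sf', timeout=None)
        lat.append(location.latitude)
        long.append(location.longitude)
    except AttributeError:
        # Unmatched parks get the placeholder coordinates (0, 0)
        lat.append(0)
        long.append(0)

sf_parks['Latitude'] = lat
sf_parks['Longitude'] = long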
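
The loop above stores (0, 0) for any park that cannot be geocoded, and (0, 0) is itself a valid coordinate, so a failed lookup can be hard to tell apart from a real one. One alternative is to record misses as NaN and to pause between requests, since the public Nominatim service asks for at most one request per second. The sketch below assumes a hypothetical helper, `geocode_all`, that accepts any geocoding callable (for example `geolocator.geocode`). The stub geocoder is only there so that the example runs offline.

```python
import math
import time
from types import SimpleNamespace

def geocode_all(names, geocode, suffix=", sf", delay_seconds=1.0):
    """Geocode each name; return a list of (lat, lon), with NaN for misses."""
    coords = []
    for name in names:
        location = geocode(name + suffix)
        if location is None:
            # No match: keep the row but mark it as missing.
            coords.append((math.nan, math.nan))
        else:
            coords.append((location.latitude, location.longitude))
        time.sleep(delay_seconds)  # pause between requests
    return coords

# Stub geocoder (hypothetical) standing in for geolocator.geocode.
def fake_geocode(query):
    known = {
        "Alcatraz, sf": SimpleNamespace(latitude=37.826721, longitude=-122.422759),
    }
    return known.get(query)

coords = geocode_all(["Alcatraz", "No Such Park"], fake_geocode, delay_seconds=0)
# Positions of the names that could not be geocoded.
missing = [i for i, (lat, lon) in enumerate(coords) if math.isnan(lat)]
print(coords)
print("Missing rows:", missing)
```

With NaN in place of 0, pandas methods such as `dropna()` can later remove or flag the unmatched parks without a special case for zero.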



In [13]:
sf_parks.head(20)

Unnamed: 0,Parks,Latitude,Longitude
0,Golden Gate National Recreation Area,37.849927,-122.517752
1,Alcatraz,37.826721,-122.422759
2,China Beach,37.788123,-122.490762
3,Fort Funston,37.719104,-122.503299
4,Fort Mason,37.806283,-122.428992
5,Fort Miley,37.782805,-122.505639
6,Lands End,37.783887,-122.506829
7,Ocean Beach,37.760314,-122.508219
8,The Presidio,37.798746,-122.464589
9,Baker Beach,37.793109,-122.483842
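
With coordinates in hand, the distance between two parks gives a first idea of how large an area a dog walking business would need to cover. The sketch below uses the haversine formula for great-circle distance. The function name `haversine_km` and the choice of the two parks are illustrative assumptions; the coordinates are the ones listed in the table above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Coordinates taken from the table above.
fort_mason = (37.806283, -122.428992)
presidio = (37.798746, -122.464589)

distance = haversine_km(*fort_mason, *presidio)
print(f"Fort Mason to The Presidio: {distance:.2f} km")
```

This is a straight-line distance, so actual walking routes will be longer.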


In [14]:
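# Save the table as HTML, then render it to a PNG.
# The second step requires the wkhtmltoimage binary to be installed and on the PATH.
sf_parks.to_html('sf_parks_table.html')
subprocess.call(
    'wkhtmltoimage -f png --width 0 sf_parks_table.html sf_parks_table.png', shell=True)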

127
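
The `127` returned above is the shell's exit status for a command that could not be found, which suggests that `wkhtmltoimage` was not installed in this environment and that no PNG was written. A minimal sketch of a guard follows. `html_to_png` is a hypothetical helper name, and the flags are the ones used in the cell above.

```python
import shutil
import subprocess

def html_to_png(html_path, png_path, tool="wkhtmltoimage"):
    """Convert an HTML file to PNG; return the exit code, or None if the tool is missing."""
    exe = shutil.which(tool)
    if exe is None:
        print(f"{tool} not found on PATH; skipping image export")
        return None
    # Passing the arguments as a list avoids going through the shell.
    result = subprocess.run([exe, "-f", "png", "--width", "0", html_path, png_path])
    return result.returncode

status = html_to_png("sf_parks_table.html", "sf_parks_table.png")
```

Checking for the binary first turns a silent failure into an explicit message, and a return code of 0 then indicates that the image was actually produced.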

In [15]:
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="sf_explorer")
address = 'San Francisco'

location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of San Francisco are {}, {}.'.format(latitude, longitude))

The geographical coordinates of San Francisco are 37.7790262, -122.4199061.


In [16]:
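# Read the saved HTML table back in. to_html() also wrote the index, so it
# comes back as an extra 'Unnamed: 0' column; passing index_col=0 would avoid that.
url='sf_parks_table.html'
sf_data=pd.read_html(url, header=0)[0]
sf_data.head()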

Unnamed: 0.1,Unnamed: 0,Parks,Latitude,Longitude
0,0,Golden Gate National Recreation Area,37.849927,-122.517752
1,1,Alcatraz,37.826721,-122.422759
2,2,China Beach,37.788123,-122.490762
3,3,Fort Funston,37.719104,-122.503299
4,4,Fort Mason,37.806283,-122.428992
