# Applied Data Science Capstone
___

The objective of this notebook is to carry out a capstone project using location data provided by the Foursquare API.

## Import Primary Libraries

Import basic data science libraries:

In [1]:
import pandas as pd
import numpy as np

print('Hello Capstone Project Course!')

Hello Capstone Project Course!


Import libraries for web scraping, API requests, data visualization and clustering:

In [2]:
import folium
import requests, json
from bs4 import BeautifulSoup
from sklearn.cluster import KMeans

## Web Scraping

Request web page from Wikipedia:

In [3]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
site = requests.get(url)

Create a `BeautifulSoup` HTML parser as `soup`:

In [4]:
soup = BeautifulSoup(site.content, 'html.parser')

Use `soup` with a CSS selector to pick the table rows, then collect the cells of each row into the `ds` list:

In [5]:
# scrape table rows
trs = soup.select(".wikitable tr")

# function to split a row into a list of cells and drop the empty cells
def clear_row(row):
    return list(filter(lambda cell: cell != '', row.split('\n')))
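
To make the behaviour of `clear_row` concrete, here is a minimal sketch that runs it on a made-up string shaped like the text of one table row. The sample string is an assumption for illustration, not scraped data.

```python
def clear_row(row):
    # split the row text on newlines and drop the empty strings
    return list(filter(lambda cell: cell != '', row.split('\n')))

# hypothetical text of one table row, with blank lines between the cells
sample = '\nM3A\n\nNorth York\n\nParkwoods\n'

cells = clear_row(sample)
print(cells)  # ['M3A', 'North York', 'Parkwoods']
```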

In [6]:
# create dataset
ds = list()

# iterate through each table row, except the heading
for tr in trs[1:]:
    ds.append(clear_row(tr.text)) # append row to dataset

Create and populate the `df` dataframe with the rows collected in `ds`:

In [7]:
arr = np.array(ds)

In [8]:
df = pd.DataFrame(arr, columns = clear_row(trs[0].text))
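
As a side note, `pd.DataFrame` also accepts a list of equal-length lists directly, so the intermediate NumPy array is optional. The sketch below assumes a header and two rows shaped like the output of `clear_row`; the names `header`, `rows` and `sample_df` are illustrative only.

```python
import pandas as pd

# header and rows shaped like the output of clear_row
header = ['Postal Code', 'Borough', 'Neighbourhood']
rows = [
    ['M3A', 'North York', 'Parkwoods'],
    ['M4A', 'North York', 'Victoria Village'],
]

# a list of equal-length lists can be passed to DataFrame directly
sample_df = pd.DataFrame(rows, columns=header)
print(sample_df.shape)  # (2, 3)
```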

## Geolocation Data

In [9]:
geo = pd.read_csv('Geospatial_Coordinates.csv')

In [10]:
df = pd.merge(df,geo, on='Postal Code')
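
`pd.merge` performs an inner join by default, so postal codes without a match in the coordinates file are dropped silently. The following sketch shows one possible way to check for that, using toy frames and a made-up postal code `M9Z`; it is not part of the original pipeline.

```python
import pandas as pd

# toy frames; 'M9Z' is a hypothetical code with no coordinates
left = pd.DataFrame({'Postal Code': ['M3A', 'M4A', 'M9Z'],
                     'Borough': ['North York', 'North York', 'Not assigned']})
right = pd.DataFrame({'Postal Code': ['M3A', 'M4A'],
                      'Latitude': [43.753259, 43.725882],
                      'Longitude': [-79.329656, -79.315572]})

# how='left' keeps every row of `left`; indicator=True adds a `_merge` column
checked = pd.merge(left, right, on='Postal Code', how='left', indicator=True)

# rows that found no partner in `right`
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postal Code'].tolist()
print(unmatched)  # ['M9Z']
```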

In [11]:
df.head()

| | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |


## Data Wrangling

Drop the rows whose "Borough" value is "Not assigned":

In [12]:
# assign the result back instead of calling replace in place on a column selection
df['Borough'] = df['Borough'].replace('Not assigned', np.nan)

In [13]:
df.dropna(subset = ['Borough'],inplace = True)

In [14]:
df.rename(columns = {'Neighbourhood': 'Neighborhood'},inplace = True)

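Replace "Not assigned" cells in the "Neighborhood" column with the borough name:

In [15]:
# where Neighborhood is 'Not assigned', take the value from Borough
df['Neighborhood'] = df['Neighborhood'].mask(df['Neighborhood'] == 'Not assigned', df['Borough'])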
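
The effect of `mask` is easier to see on a tiny example. This sketch uses a two-row toy frame (made up for illustration) in which one neighborhood is "Not assigned".

```python
import pandas as pd

# toy frame; the second row has no neighborhood assigned
toy = pd.DataFrame({
    'Borough': ['North York', "Queen's Park"],
    'Neighborhood': ['Parkwoods', 'Not assigned'],
})

# where the condition is True, mask takes the value from the second argument
toy['Neighborhood'] = toy['Neighborhood'].mask(
    toy['Neighborhood'] == 'Not assigned', toy['Borough']
)
print(toy['Neighborhood'].tolist())  # ['Parkwoods', "Queen's Park"]
```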

In [16]:
df.shape

(103, 5)

In [17]:
df.head()

| | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |


## Explore and Cluster

In [18]:
X = df[['Latitude','Longitude']]
k_means = KMeans(init="k-means++", n_clusters=4, n_init=12)
k_means.fit(X)

KMeans(n_clusters=4, n_init=12)

In [19]:
print('labels:\n',k_means.labels_)
print('label counts:\n',np.unique(k_means.labels_,return_counts=True))
print('centers:\n',k_means.cluster_centers_)
print('inertia:\n',k_means.inertia_)

labels:
 [2 2 3 0 3 1 2 0 3 3 0 1 2 3 3 3 0 1 2 3 3 0 2 3 3 3 2 0 0 3 3 3 2 0 0 3 3
 3 2 0 0 3 3 3 2 0 1 3 3 1 1 2 0 1 3 0 1 1 2 0 1 0 0 1 1 2 0 0 0 1 1 2 0 0
 3 1 1 1 2 3 3 1 2 3 3 2 3 3 1 1 2 3 3 1 1 2 3 3 1 3 3 1 1]
label counts:
 (array([0, 1, 2, 3]), array([22, 25, 19, 37], dtype=int64))
centers:
 [[ 43.74254792 -79.41366641]
 [ 43.68059059 -79.52478493]
 [ 43.76342274 -79.25682511]
 [ 43.66807421 -79.37315645]]
inertia:
 0.27346033391168917
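
The `inertia_` value is the sum of squared distances from each sample to its closest cluster center. Because the features here are latitude and longitude in degrees, the number is small. The sketch below recomputes the same quantity with NumPy on four made-up points and two made-up centers; it illustrates the definition and is not a re-run of the clustering above.

```python
import numpy as np

# made-up points and centers, chosen so the result is easy to verify by hand
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
centers = np.array([[0.0, 1.0], [10.0, 1.0]])

# squared distance from every point to every center, shape (n_points, n_centers)
sq_dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

labels = sq_dist.argmin(axis=1)      # index of the closest center for each point
inertia = sq_dist.min(axis=1).sum()  # sum of squared distances to the closest center

print(labels)   # [0 0 1 1]
print(inertia)  # 4.0
```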


## Foursquare Location Data

**Please insert your own credentials.**

Define the Foursquare API credentials and version as constants:

In [20]:
CLIENT_ID = 'insert your client id here'
CLIENT_SECRET = 'insert your client secret here'
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

Define the parameters that select the Foursquare API service to request:

In [21]:
group = 'venues'
endpoint = 'search'

Request venues near each cluster center from the Foursquare API and collect the response data:

In [22]:
url = f'https://api.foursquare.com/v2/{group}/{endpoint}'
data = list()

# one request per cluster center
for center in k_means.cluster_centers_:
    params = dict(
        client_id = CLIENT_ID,
        client_secret = CLIENT_SECRET,
        v = VERSION,
        ll = f'{center[0]},{center[1]}', # latitude,longitude of the center
        limit = 3,
        radius = 4000
    )
    response = requests.get(url = url, params = params)
    response.raise_for_status() # stop early on an HTTP error (e.g. invalid credentials)
    data.append(response.json()) # decode the JSON body
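
The request loop can also be split into two small helpers, which makes the parameter building and the response parsing testable without network access. This is a minimal sketch: `build_params`, `extract_venues` and the sample payload are hypothetical names and data introduced here, and the payload only mirrors the `response` → `venues` → `location` nesting that the visualization step of this notebook reads.

```python
def build_params(center, client_id, client_secret, version, limit=3, radius=4000):
    """Query parameters for one cluster center given as (latitude, longitude)."""
    return dict(
        client_id=client_id,
        client_secret=client_secret,
        v=version,
        ll=f'{center[0]},{center[1]}',
        limit=limit,
        radius=radius,
    )

def extract_venues(payload):
    """Flatten one decoded response into (name, lat, lng) tuples.

    Returns an empty list when the expected keys are missing.
    """
    venues = payload.get('response', {}).get('venues', [])
    return [(v['name'], v['location']['lat'], v['location']['lng']) for v in venues]

# made-up payload for illustration only
sample_payload = {'response': {'venues': [
    {'name': 'Example Cafe', 'location': {'lat': 43.74, 'lng': -79.41}},
]}}

params = build_params((43.74254792, -79.41366641), 'ID', 'SECRET', '20180605')
venues = extract_venues(sample_payload)

print(params['ll'])  # 43.74254792,-79.41366641
print(venues)        # [('Example Cafe', 43.74, -79.41)]
```

Returning an empty list for a payload without venues lets the plotting loop skip a failed request instead of raising a `KeyError`.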

## Data Visualization

In [23]:
city_map = folium.Map(location=[43.7, -79.4], zoom_start=12)

names = ['A','B','C','D']

# add markers to map, pairing each cluster center with its name and its API response
for name, (lat, lng), result in zip(names, k_means.cluster_centers_, data):

    # one marker per venue returned for this cluster center
    for venue in result['response']['venues']:
        folium.Marker(
            [venue['location']['lat'], venue['location']['lng']],
            popup = venue['name']
        ).add_to(city_map)

    # circle marking the cluster center itself
    folium.CircleMarker(
        [lat, lng],
        radius=12,
        tooltip=name,
        color='yellow',
        fill=True,
        fill_color='black',
        fill_opacity=0.3
    ).add_to(city_map)
    
city_map