# Coursera Capstone Project

This notebook will be used for the capstone project.

### Criteria

For the **first week**, you will be required to submit the following:
* A description of the problem and a discussion of the background. (15 marks)
* A description of the data and how it will be used to solve the problem. (15 marks)

For the **second week**, the final deliverables of the project will be:
* A link to your Notebook on your GitHub repository, showing your code. (15 marks)
* A full report consisting of all of the following components (15 marks):
    * Introduction where you discuss the business problem and who would be interested in this project.
    * Data where you describe the data that will be used to solve the problem and the source of the data.
    * Methodology section which represents the main component of the report, where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, and which machine learning techniques were used and why.
    * Results section where you discuss the results.
    * Discussion section where you discuss any observations you noted and any recommendations you can make based on the results.
    * Conclusion section where you conclude the report.
* Your choice of a presentation or blog post. (10 marks)

In [9]:
# Importing libraries

import pandas as pd
import numpy as np
import random
import requests

# module to convert an address into latitude and longitude values
from geopy.geocoders import Nominatim 

# modules to work with geodata
import geopandas as gp
from geopandas.tools import geocode
import folium

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# transforming JSON data into a pandas DataFrame
# (json_normalize is exposed at the top level of pandas since version 1.0)
from pandas import json_normalize
import json

# import tools for webscraping
from bs4 import BeautifulSoup
from urllib.request import urlopen
import urllib

## 1. Problem Description and Background Discussion

### 1.1 Problem Description
As part of the Capstone Project for the Applied Data Science Coursera course, I have chosen to analyze the effectiveness of the Business Improvement Area (BIA) program of Toronto, ON, Canada. The question I will answer is: **"Does the BIA help venues get better ratings on Foursquare?"** To answer this question, I will compare the ratings of venues that lie within the boundaries of BIAs with the ratings of venues in the areas surrounding the BIAs.
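
One possible way to make this comparison concrete is to compute the difference in mean rating between the two groups together with a Welch t statistic. The sketch below is only an illustration, not the method fixed for this project: the function name `welch_t` and the rating values are hypothetical, and it assumes each group has at least two ratings that are not all identical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(inside, outside):
    """Return (mean difference, Welch t statistic) for two rating samples."""
    n_in, n_out = len(inside), len(outside)
    diff = mean(inside) - mean(outside)
    # standard error of the difference, allowing unequal variances
    std_err = sqrt(variance(inside) / n_in + variance(outside) / n_out)
    return diff, diff / std_err


# Hypothetical ratings for illustration only (not real Foursquare data)
inside_ratings = [8.0, 7.5, 9.0, 8.5]
outside_ratings = [7.0, 6.5, 8.0, 7.5]

diff, t_stat = welch_t(inside_ratings, outside_ratings)
print("Mean difference: {:.2f}, Welch t: {:.2f}".format(diff, t_stat))
```

A positive difference would mean that venues inside BIAs are rated higher on average; whether that difference is meaningful would still have to be judged against the sample sizes.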

### 1.2 Background Discussion
A **Business Improvement Area (BIA)** is an association of commercial property owners and tenants within a defined area who work in partnership with the City to create thriving, competitive, and safe business areas that attract shoppers, diners, tourists, and new businesses. The question is how effective these associations and the areas they cover are at attracting shoppers, diners, tourists, and new businesses.

## 2. Data Description 

### 2.1 Description of Data and Data Source
The BIA layer represents the active BIAs in the City of Toronto that have been enacted by Council. Each BIA is defined by a by-law and is represented by a Board of Management. The layer is updated as BIAs are created, amended, or deleted by Council. The file is a polygon file that shows the boundaries of the BIAs.
The second part of the data for the analysis comes from the Foursquare API. This dataset contains venues located in Toronto along with their location, name, venue category, and user rating.
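
To combine the two datasets, each venue has to be classified as lying inside or outside a BIA polygon. The following is a minimal sketch of such a test using ray casting on a GeoJSON geometry. The helper names (`venue_in_bia`, `_in_ring`, `_in_polygon`) are hypothetical, and a geospatial library such as geopandas could be used instead; the sketch assumes coordinates are given as `[longitude, latitude]` pairs in closed rings, as GeoJSON specifies.

```python
import json


def _in_ring(lon, lat, ring):
    """Ray casting: is the point inside a closed ring of [lon, lat] positions?"""
    inside = False
    for p, q in zip(ring, ring[1:]):
        x1, y1, x2, y2 = p[0], p[1], q[0], q[1]
        # does the edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def _in_polygon(lon, lat, rings):
    """First ring is the outer boundary, any further rings are holes."""
    if not _in_ring(lon, lat, rings[0]):
        return False
    return not any(_in_ring(lon, lat, hole) for hole in rings[1:])


def venue_in_bia(lon, lat, geometry):
    """Test a point against a GeoJSON Polygon or MultiPolygon (dict or JSON string)."""
    geom = json.loads(geometry) if isinstance(geometry, str) else geometry
    if geom["type"] == "Polygon":
        return _in_polygon(lon, lat, geom["coordinates"])
    if geom["type"] == "MultiPolygon":
        return any(_in_polygon(lon, lat, poly) for poly in geom["coordinates"])
    raise ValueError("Unsupported geometry type: {}".format(geom["type"]))


# Hypothetical square used for illustration (not a real BIA boundary)
square = '{"type": "Polygon", "coordinates": [[[0, 0], [4, 0], [4, 4], [0, 4], [0, 0]]]}'
print(venue_in_bia(2, 2, square), venue_in_bia(5, 2, square))
```

Points that fall exactly on a boundary are not handled specially here, which should matter little for venue coordinates.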

### 2.2 How will the Data be used to solve the Problem
...


### Getting the BIAs Data

The BIA data is retrieved via the open data API provided by the City of Toronto.

In [97]:
# Get the dataset metadata by passing the package id to the package_show endpoint
# For example, to retrieve the metadata for this dataset:

url = "https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/package_show"
params = { "id": "9edb9628-1213-42bd-8352-5c4ed28e9e42"}
response = urllib.request.urlopen(url, data=bytes(json.dumps(params), encoding="utf-8"))
package = json.loads(response.read())

# Get the data by passing the resource_id to the datastore_search endpoint
# See https://docs.ckan.org/en/latest/maintaining/datastore.html for detailed parameters options
# For example, to retrieve the data content for the first resource in the datastore:

for idx, resource in enumerate(package["result"]["resources"]):
    if resource["datastore_active"]:
        url = "https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/datastore_search"
        p = { "id": resource["id"] }
        r = urllib.request.urlopen(url, data=bytes(json.dumps(p), encoding="utf-8"))
        data = json.loads(r.read())
        df_BIAs = pd.DataFrame(data["result"]["records"])
        break
df_BIAs.head()

| | _id | AREA_ID | DATE_EFFECTIVE | AREA_ATTR_ID | PARENT_AREA_ID | AREA_SHORT_CODE | AREA_LONG_CODE | AREA_NAME | AREA_DESC | X | Y | LONGITUDE | LATITUDE | OBJECTID | Shape__Area | Shape__Length | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3162 | 2481845 | 2020-02-04T17:20:36 | 26006945 | | 059-00 | 059-00 | College Promenade | College Promenade | 310882.714 | 4834896.895 | -79.424591 | 43.653991 | 17569265 | 121142.2 | 2553.921411 | `{"type": "Polygon", "coordinates": [[[-79.4203...` |
| 1 | 3163 | 2481844 | 2020-02-04T17:20:36 | 26006944 | | 109-01 | 109-01 | CityPlace and Fort York | CityPlace and Fort York | 312780.016 | 4833181.751 | -79.401095 | 43.638534 | 17569281 | 1192955.0 | 5327.234526 | `{"type": "Polygon", "coordinates": [[[-79.4103...` |
| 2 | 3164 | 2481843 | 2020-02-04T17:20:36 | 26006943 | | 065-00 | 065-00 | Chinatown | Chinatown | 313028.145 | 4834884.119 | -79.397994 | 43.653855 | 17569297 | 293221.4 | 4749.59543 | `{"type": "Polygon", "coordinates": [[[-79.4006...` |
| 3 | 3165 | 2481842 | 2020-02-04T17:20:36 | 26006942 | | 012-02 | 012-02 | Cabbagetown | Cabbagetown | 315349.971 | 4836048.305 | -79.369187 | 43.664305 | 17569313 | 315613.3 | 5933.750217 | `{"type": "MultiPolygon", "coordinates": [[[[-7...` |
| 4 | 3166 | 2481841 | 2020-02-04T17:20:36 | 26006941 | | 027-00 | 027-00 | Broadview Danforth | Broadview Danforth | 316338.822 | 4837427.069 | -79.356896 | 43.676701 | 17569329 | 194080.8 | 3085.887431 | `{"type": "Polygon", "coordinates": [[[-79.3528...` |
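
The request above reads `data["result"]["records"]` directly. A slightly more defensive variant could check the `success` flag of the CKAN response and compare the number of returned rows with the reported `total`, since `datastore_search` returns a limited number of rows per call. This is a sketch under those assumptions; the helper name `records_from_ckan` and the sample payload are hypothetical.

```python
def records_from_ckan(payload):
    """Return the records of a CKAN datastore_search response dict."""
    if not payload.get("success", False):
        raise RuntimeError("CKAN request failed: {}".format(payload.get("error")))
    result = payload["result"]
    records = result["records"]
    # 'total' is the number of matching rows; fall back to what was received
    total = result.get("total", len(records))
    if len(records) < total:
        print("Warning: received {} of {} records".format(len(records), total))
    return records


# Hypothetical payload mimicking the shape of a CKAN response
sample = {"success": True, "result": {"records": [{"_id": 1}, {"_id": 2}], "total": 2}}
print(len(records_from_ckan(sample)))
```

With this helper, the DataFrame could be built as `pd.DataFrame(records_from_ckan(data))`.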


In [98]:
df_BIAs.shape

(83, 17)

In [99]:
df_BIAs.sort_values(by = ['Shape__Area'], ascending = False, inplace = True)

In [100]:
# Drop BIAs that lie outside the central area of Toronto, by index label:
# Albion Islington Square (79), Wilson Village (28), Sheppard East Village (15),
# Kennedy Road (53), Wexford Heights (20), Crossroads of the Danforth (76);
# the rows with index labels 65 and 70 are dropped as well.

df_BIAs.drop([65, 70, 15, 20, 28, 53, 76, 79], inplace = True)

In [110]:
df_BIAs.reset_index(drop = True, inplace = True)

In [111]:
df_BIAs.shape

(75, 17)
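
Dropping rows by index label depends on the order in which the API returns the records. An alternative sketch filters by `AREA_NAME` instead. The names below are taken from the comment in the drop cell and are assumed to match the `AREA_NAME` values exactly; they cover only the six BIAs named there, not the rows with index labels 65 and 70. The helper name `drop_outlying` is hypothetical.

```python
# BIA names taken from the comment in the drop cell (assumed exact spellings)
OUTLYING_BIAS = {
    "Albion Islington Square",
    "Wilson Village",
    "Sheppard East Village",
    "Kennedy Road",
    "Wexford Heights",
    "Crossroads of the Danforth",
}


def drop_outlying(records, names=OUTLYING_BIAS):
    """Keep only the records whose AREA_NAME is not in the given set of names."""
    return [rec for rec in records if rec.get("AREA_NAME") not in names]


# Hypothetical records for illustration
records = [{"AREA_NAME": "Chinatown"}, {"AREA_NAME": "Kennedy Road"}]
print(drop_outlying(records))
```

On the DataFrame itself, the same filter could be written as `df_BIAs[~df_BIAs['AREA_NAME'].isin(OUTLYING_BIAS)]`.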

### Plotting the BIA-Areas on a map

In [105]:
# get location for map centering from Downtown Toronto
address = 'Toronto, Downtown'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto Downtown are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto Downtown are 43.6541737, -79.38081164513409.


In [142]:
# create an OpenStreetMap-based map, center it on a location using latitude and longitude, and give it a starting zoom level
m = folium.Map(location = [latitude, longitude], tiles = 'Stamen Toner', zoom_start = 12)

# create a feature group for the map
fg = folium.map.FeatureGroup(name='BIAs').add_to(m)

# add geojson data for the BIAs to map
for geometry, area_name in zip(df_BIAs['geometry'], df_BIAs['AREA_NAME']):
    b = folium.GeoJson(geometry)
    b.add_child(folium.Popup(area_name))
    fg.add_child(b)
    
#folium.features.GeoJsonPopup(
#    fields=[df_BIAs['AREA_NAME']],
#    labels=False 
#    ).add_to(m)    
    
folium.LayerControl().add_to(m)
    
# display the map
m
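
The commented-out `GeoJsonPopup` attempt in the cell above would need the area name to be stored as a property of each GeoJSON feature, because its `fields` argument refers to property names rather than DataFrame columns. A minimal sketch of building such a FeatureCollection from the records is shown below. The helper name `to_feature_collection` is hypothetical, and the geometry is assumed to be either a JSON string or an already parsed dict.

```python
import json


def to_feature_collection(records):
    """Wrap BIA records into a GeoJSON FeatureCollection with AREA_NAME as a property."""
    features = []
    for rec in records:
        geom = rec["geometry"]
        if isinstance(geom, str):
            geom = json.loads(geom)
        features.append({
            "type": "Feature",
            "geometry": geom,
            "properties": {"AREA_NAME": rec["AREA_NAME"]},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical record for illustration (not a real BIA boundary)
demo = [{
    "AREA_NAME": "Example BIA",
    "geometry": '{"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}',
}]
collection = to_feature_collection(demo)
print(collection["features"][0]["properties"])
```

The records could come from `df_BIAs.to_dict("records")`, and the resulting collection should then be usable as a single `folium.GeoJson` layer with a popup configured via `fields=["AREA_NAME"]`.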

### API for accessing foursquare data

In [143]:
CLIENT_ID = '5MEM4YM205NTQBOMWUQX00NHLMW2GJGAV2OPGIHK55JSJKFU' # your Foursquare ID
CLIENT_SECRET = 'XQ34UGNCZTPZWQFKCIVSYLXHK533UR24OSHJ1BKLE2SSZTT3' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 5MEM4YM205NTQBOMWUQX00NHLMW2GJGAV2OPGIHK55JSJKFU
CLIENT_SECRET:XQ34UGNCZTPZWQFKCIVSYLXHK533UR24OSHJ1BKLE2SSZTT3
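
Hard-coding and printing API credentials makes them visible to anyone who can read the notebook. One alternative is to read them from environment variables and print only a masked version. The sketch below assumes the hypothetical variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`; the helper names are also hypothetical.

```python
import os


def load_foursquare_credentials(env=None):
    """Read the Foursquare credentials from environment variables (assumed names)."""
    env = os.environ if env is None else env
    names = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: {}".format(", ".join(missing)))
    return env[names[0]], env[names[1]]


def mask(secret, visible=4):
    """Show only the first few characters of a secret."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Illustration with a stand-in mapping instead of the real environment
fake_env = {"FOURSQUARE_CLIENT_ID": "abcdefgh", "FOURSQUARE_CLIENT_SECRET": "12345678"}
client_id, client_secret = load_foursquare_credentials(fake_env)
print("CLIENT_ID: " + mask(client_id))
print("CLIENT_SECRET: " + mask(client_secret))
```

In the notebook, calling `load_foursquare_credentials()` without arguments would read from the actual environment.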
