# Capstone Project - The Battle of Neighborhoods

## Introduction / Business Problem section

#### Circumstance: Accommodation problem in London, Ontario.

- In this capstone, we work with stakeholders who are looking for convenient, well-located areas of London, Ontario with good public amenities and services. Specifically, we explore the neighborhoods of London and identify the residential areas with a prosperous local economy.

- Our stakeholders do not yet know what kind of business they should start, so they also want us to suggest potential business patterns in the recommended areas. The most influential factors would normally be **the stakeholders' ability to afford** **the cost of living** and **the price of real estate in the desired areas**. Since this capstone focuses on using Foursquare to find the most suitable living area, we assume these costs are within our stakeholders' budget.

- Working as data scientists, we will use data to identify the most feasible and promising neighborhoods based on the criteria listed above. The upsides and downsides of each option will also be listed so that our stakeholders can make a well-informed final decision.

- This project targets stakeholders who want to settle down and run their own business in a residential area with good living conditions.

**Problem**
1. Which area is the best to settle down in London, Ontario?
2. How close should the living area be to the surrounding public services?
3. What kind of potential business pattern should be recommended?

## Data section

Given the circumstance defined above, several factors will affect our decision:
- The number of neighborhoods in London, Ontario that need to be examined.
- The distance from the living areas to the other venues within the neighborhood.
- The number of available business patterns within the neighborhood.

To obtain this information, the following data sources are needed:
- The location and coordinates of each neighborhood in London, Ontario will be scraped from this **[webpage](http://www.geonames.org/postalcode-search.html?q=london&country=CA&adminCode1=ON&fbclid=IwAR2XipWkuSm3F9YSjjVvFqp7SfYCPl9_XaxiehoPnn-7XmsjtnJBrbKh31g)** using **pandas / BeautifulSoup**.
- The number of venues and their categories within each neighborhood are extracted using the **Foursquare API**.
- **The extracted venue categories** can be used as a **separate dataset** to build the **recommender system** that suggests business patterns.

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests
import seaborn as sb
import folium
from geopy.geocoders import Nominatim
from bs4 import BeautifulSoup

In [2]:
url = 'http://www.geonames.org/postalcode-search.html?q=london&country=CA&adminCode1=ON&fbclid=IwAR2XipWkuSm3F9YSjjVvFqp7SfYCPl9_XaxiehoPnn-7XmsjtnJBrbKh31g'
html_file = requests.get(url).text
soup = BeautifulSoup(html_file,'html.parser')

In [15]:
pc = []
borough = []
neigh = []
lat = []
lng = []

# Skip the header row and take every second row of the results table
table_rows = soup.find('table', class_ = 'restable').find_all('tr')
for row in table_rows[1::2]:
    if len(row) > 1:
        temp = row.find_all('td')[1:9]
        lst_rows = temp[0:2] + [temp[-1]]
        pc.append(lst_rows[1].text)
        # Place names look like 'Borough (Neighborhood)'; some have no parentheses
        place = lst_rows[0].text.split('(')
        borough.append(place[0].strip())
        if len(place) > 1:
            neigh.append(place[1].strip(')'))
        else:
            neigh.append(place[0].strip())
        # Coordinates come as 'lat/lng'; parse them with float() instead of eval()
        coords = lst_rows[-1].small.text.split('/')
        lat.append(float(coords[0]))
        lng.append(float(coords[1]))
#         for i in lst_rows:
#             print(i.text)
#         print(lst_rows)
            
    
# This Code is used to test for the data getting from the webpage
# rows = soup.find('table', class_ = 'restable').find_all('tr')
# temp = rows[1].find_all('td')[1:9]
# lst = temp[0:2] + [temp[-1]]
# lst[0].text.split('(')
# eval(rows[2].small.text.split('/')[0])

# Check the total length in each rows
# for i in rows[1:]:
#     temp = i.find_all('td')
#     print(len(i.find_all('td')))

london_df = pd.DataFrame({'Postal Code': pc,'Borough': borough,'Neighborhood': neigh,'Latitude':lat,'Longitude':lng})
london_df.to_csv('London_ON_Canada.csv')
london_df

| | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | N5Y | London | West Huron Heights / Carling | 43.012 | -81.231 |
| 1 | N5Z | London | Glen Cairn | 42.966 | -81.205 |
| 2 | N6L | London | East Tempo | 42.872 | -81.247 |
| 3 | N5W | London East | SW Argyle / Hamilton Road | 42.986 | -81.182 |
| 4 | N6H | London West | Central Hyde Park / Oakridge | 42.991 | -81.34 |
| 5 | N6J | London | Southcrest / East Westmount / West Highland | 42.955 | -81.273 |
| 6 | N6M | London | Jackson / Old Victoria / Bradley / North Highbury | 42.963 | -81.139 |
| 7 | N5V | London | YXU / North and East Argyle / East Huron Heights | 43.023 | -81.164 |
| 8 | N5X | London | Fanshawe / Stoneybrook / Stoney Creek / Upland... | 43.044 | -81.239 |
| 9 | N6C | London South | East Highland / North White Oaks / North Westm... | 42.958 | -81.238 |
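
The two string-handling steps in the scraping loop (splitting a place name into borough and neighborhood, and turning a `lat/lng` string into numbers) can be isolated and checked without touching the network. The sketch below assumes the place text has the form `Borough (Neighborhood)` or just a bare name, which is inferred from the table above rather than confirmed against the page; the helper names `split_place` and `parse_coords` are hypothetical.

```python
def split_place(place):
    """Split 'Borough (Neighborhood)' into its two parts.

    When there are no parentheses, the bare name is used for both fields,
    mirroring the fallback in the scraping loop.
    """
    borough, sep, rest = place.partition('(')
    borough = borough.strip()
    neighborhood = rest.strip().rstrip(')').strip() if sep else borough
    return borough, neighborhood


def parse_coords(text):
    """Convert a 'lat/lng' string into a pair of floats."""
    lat_text, lng_text = text.split('/')
    return float(lat_text), float(lng_text)


# Illustrative inputs (assumed format, values taken from the table above)
example_place = split_place('London East (SW Argyle / Hamilton Road)')
example_bare = split_place('UWO')
example_coords = parse_coords('42.986/-81.182')
```

Using `float()` here avoids evaluating arbitrary text from a web page, which is the main reason to prefer it over `eval()`.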


In [13]:
address = 'London, ON, Canada'
geolocator = Nominatim(user_agent="lon_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London, Ontario are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London, Ontario are 42.9836747, -81.2496068.


In [24]:
london_map = folium.Map([latitude, longitude], zoom_start = 11)
# Distinct loop names keep the lat/lng/pc/neigh lists built earlier from being overwritten
for n_lat, n_lng, n_pc, n_name in zip(london_df.Latitude, london_df.Longitude, london_df['Postal Code'], london_df.Neighborhood):
    label = f'{n_name}\n({n_pc})'
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [n_lat, n_lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(london_map)

london_map

In [25]:
import os

# Read the Foursquare credentials from environment variables instead of hardcoding them.
# The variable names below are an assumption; use whatever names you export locally.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
ACCESS_TOKEN = os.environ['FOURSQUARE_ACCESS_TOKEN']
# Do not print the secrets themselves; only confirm that they were found
print('Credentials loaded.')

Credentials loaded.


In [28]:
def getNearbyVenues(postal_codes, names, latitudes, longitudes, radius = 500):
    """Return one row per venue found within `radius` metres of each neighborhood."""
    venues_lst = []
    # Loop names differ from the parameter names so the inputs are not shadowed
    for pc, name, lat, lng in zip(postal_codes, names, latitudes, longitudes):
        print(name)
        url = f'https://api.foursquare.com/v2/venues/explore?client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&v={VERSION}&ll={lat},{lng}&oauth_token={ACCESS_TOKEN}&radius={radius}&limit={LIMIT}'
        results = requests.get(url).json()["response"]['groups'][0]['items']
        venues_lst.append([(
            pc,
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])
    # Flatten the per-neighborhood lists into a single table
    nearby_venues = pd.DataFrame([item for venue_lst in venues_lst for item in venue_lst])
    nearby_venues.columns = ['Postal Code',
                            'Neighborhood',
                            'Neighborhood Latitude',
                            'Neighborhood Longitude',
                            'Venue',
                            'Venue Latitude',
                            'Venue Longitude',
                            'Venue Category']
    return nearby_venues
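
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, so a venue returned without any category would raise an `IndexError`. The sketch below shows one way the extraction step could be made more tolerant. It is not part of the notebook: `extract_venue_rows` and the `'Uncategorized'` placeholder are hypothetical, and `fake_items` is a hand-written stand-in for `response['groups'][0]['items']`, not real API output.

```python
def extract_venue_rows(items):
    """Flatten Foursquare-style 'items' into (name, lat, lng, category) tuples."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        # Fall back to a placeholder when a venue carries no category
        if categories:
            category = categories[0].get('name', 'Uncategorized')
        else:
            category = 'Uncategorized'
        rows.append((venue.get('name'),
                     location.get('lat'),
                     location.get('lng'),
                     category))
    return rows


# Hand-written sample in the shape the explore endpoint is assumed to return
fake_items = [
    {'venue': {'name': 'Ace Mini Mart',
               'location': {'lat': 43.010576, 'lng': -81.226004},
               'categories': [{'name': 'Convenience Store'}]}},
    {'venue': {'name': 'Unlabelled Place',
               'location': {'lat': 43.0, 'lng': -81.2},
               'categories': []}},
]
rows = extract_venue_rows(fake_items)
```

Keeping the parsing separate from the HTTP call also makes it possible to test the row-building logic without spending API quota.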

In [29]:
venues_df = getNearbyVenues(london_df['Postal Code'], london_df.Neighborhood, london_df.Latitude, london_df.Longitude)
venues_df.head()

West Huron Heights / Carling
Glen Cairn
East Tempo
SW Argyle / Hamilton Road
Central Hyde Park / Oakridge
Southcrest / East Westmount / West Highland
Jackson / Old Victoria / Bradley / North Highbury
YXU / North and East Argyle / East Huron Heights
Fanshawe / Stoneybrook / Stoney Creek / Uplands / East Masonville
East Highland / North White Oaks / North Westminster
South White Oaks / Central Westminster / East Longwoods / West Brockley
Sunningdale / West Masonville / Medway / NE Hyde Park / East Fox Hollow
Riverbend / Woodhull / North Sharon Creek / Byron / West Westmount
South Highbury / Glanworth / East Brockley / SE Westminster
Talbot / Lambeth / West Tempo / South Sharon Creek
London Central
UWO


| | Postal Code | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|
| 0 | N5Y | West Huron Heights / Carling | 43.012 | -81.231 | Ace Mini Mart | 43.010576 | -81.226004 | Convenience Store |
| 1 | N5Y | West Huron Heights / Carling | 43.012 | -81.231 | Signal88 Security of London Ontario | 43.008695 | -81.227249 | Home Service |
| 2 | N5Y | West Huron Heights / Carling | 43.012 | -81.231 | huron heights park | 43.01053 | -81.225235 | Park |
| 3 | N5Z | Glen Cairn | 42.966 | -81.205 | South West Agents | 42.968419 | -81.208899 | Business Service |
| 4 | N6L | East Tempo | 42.872 | -81.247 | Murphy Contracting Co | 42.872958 | -81.252495 | Construction & Landscaping |
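
One of the stated criteria is the distance from the living area to nearby venues, and the table above holds both coordinate pairs needed to compute it. A minimal sketch using the haversine great-circle formula follows; `haversine_m` is a hypothetical helper, and the mean Earth radius of 6,371 km is the usual spherical approximation rather than anything taken from the notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Neighborhood centre for N5Y and the first venue listed for it in the table above
distance_m = haversine_m(43.012, -81.231, 43.010576, -81.226004)
print(f'Distance: {distance_m:.0f} m')
```

Applied row by row to `venues_df`, a helper like this would give a per-venue distance column that can be averaged per neighborhood.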


In [43]:
cate_df = venues_df.groupby('Venue Category').count().sort_values(by='Venue',ascending=False)
cate_df.to_csv('Venue_Categories_in_London.csv')
cate_df.head(10)

| Venue Category | Postal Code | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude |
|---|---|---|---|---|---|---|---|
| Insurance Office | 12 | 12 | 12 | 12 | 12 | 12 | 12 |
| Coffee Shop | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| Hotel | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| Park | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| Pharmacy | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| Construction & Landscaping | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| Home Service | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| Lawyer | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| Gas Station | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| Pet Store | 2 | 2 | 2 | 2 | 2 | 2 | 2 |


In [44]:
print(f'Number of unique categories: {len(venues_df["Venue Category"].unique())}')

Number of unique categories: 63


In [45]:
onehot = pd.get_dummies(venues_df[['Venue Category']], prefix = "", prefix_sep = "")
onehot

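| | African Restaurant | Asian Restaurant | Athletics & Sports | Auto Garage | Bar | Beer Store | Breakfast Spot | Burger Joint | Business Service | Café | ... | Restaurant | Salon / Barbershop | Sandwich Place | Soup Place | Sporting Goods Shop | Sports Bar | Storage Facility | Supermarket | Thai Restaurant | Vegetarian / Vegan Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 96 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 97 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |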
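
A one-hot table like the one above has one row per venue, so it is usually aggregated before neighborhoods are compared. The sketch below shows one common way to do that on a tiny hand-made frame: re-attach the neighborhood label and average the indicator columns, giving each category's share of venues per neighborhood. The sample data, the neighborhood labels `A` and `B`, and the `sample_*` names are illustrative assumptions, not the notebook's results.

```python
import pandas as pd

# Tiny hand-made sample in the same shape as venues_df (values are illustrative)
sample = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Coffee Shop', 'Park', 'Pharmacy', 'Coffee Shop'],
})

# One-hot encode the category column; dtype=int keeps 0/1 values on any pandas version
sample_onehot = pd.get_dummies(sample[['Venue Category']],
                               prefix='', prefix_sep='', dtype=int)

# Re-attach the neighborhood label, then average the indicators per neighborhood
sample_onehot.insert(0, 'Neighborhood', sample['Neighborhood'])
sample_freq = sample_onehot.groupby('Neighborhood').mean()

# Most frequent category for each neighborhood (ties resolve to the first column)
sample_top = sample_freq.idxmax(axis=1)
print(sample_freq)
```

Each row of `sample_freq` sums to 1, so neighborhoods with very different venue counts remain comparable.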
