**CAPSTONE PROJECT: BATTLE OF THE NEIGHBORHOODS**

United Kingdom Visitors and Expatriates Venue Recommendation

I. **PURPOSE**

This document describes my final peer-reviewed assignment for the IBM Data Science Professional Certificate program.

**INTRODUCTION** 

The United Kingdom, commonly known as the UK or Britain, is one of the most visited countries in Europe. Many websites let travelers look up recommendations for places to stay or visit. However, most of these websites base their recommendations on the usual tourist attractions or on well-known residential areas, which are mostly expensive, and match on simple keywords such as "Hotel" or "Backpackers". The aim of this project is to provide a data-driven recommendation that supplements such suggestions with statistical data. It uses data retrieved from UK open data sources together with FourSquare API venue recommendations.

The sample recommender in this notebook addresses the following use case:

* A person is planning to visit the UK as a tourist or an expat and is looking for reasonable accommodation.
* The user wants venue recommendations for where to stay or rent an apartment in close proximity to places of interest or a chosen search category.
* The recommendation should present not only the most viable option but also a comparison table of all candidate towns.

For this demonstration, this notebook will make use of the following data:
* The UK Median Rental Prices by town.
* Popular Food venues in the vicinity. (Sample category selection)
    
Note: While this demo uses the Food venue category, the same implementation works for other categories, such as:
* Outdoors and Recreation
* Nightlife
* Nearby Schools, etc.
            
            
I will limit the scope of this search because the FourSquare API allows only 50 venue queries per day with free user access.

**DATA ACQUISITION**

This demonstration will make use of the following data sources:

The UK towns' median house prices.
Data will be retrieved from the UK open dataset <a href='https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/49810/582.xls'>median rent by town and flattype</a>, hosted on the https://www.gov.uk website.

The original data source contains median house prices from 1996 up to the 2nd quarter of 2013. I will retrieve the most recent recorded prices from this data source (Q2 2013), as they are the most relevant prices available at this time. For this demonstration, I will simplify the analysis by using the average price across all available flat types.

UK towns location data retrieved using the Google Maps API.
Coordinates of town venues will be retrieved using the Google API. I also use transit station coordinates as a more meaningful center point for every town included in the venue recommendations.
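
The lookup of town coordinates could be sketched as follows. This is only an illustration, and it assumes the `geopy` Nominatim geocoder (which this notebook imports) as a stand-in for the Google Maps API named above. The function name `geocode_towns`, the `user_agent` string and the injectable `geocode` parameter are hypothetical and do not come from the original notebook.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def geocode_towns(
    towns: Iterable[str],
    geocode: Optional[Callable] = None,
    country: str = "United Kingdom",
) -> Dict[str, Optional[Tuple[float, float]]]:
    """Map each town name to (latitude, longitude), or None if not found.

    `geocode` is any callable that takes an address string and returns an
    object with `.latitude` and `.longitude` (or None). When omitted, a
    Nominatim geocoder is created; that path needs network access.
    """
    if geocode is None:
        # Imported lazily so the function can be exercised offline
        # with a stand-in geocoder.
        from geopy.geocoders import Nominatim

        # The user_agent value is an illustrative assumption.
        geocode = Nominatim(user_agent="uk_venue_recommender_demo").geocode

    coordinates = {}
    for town in towns:
        location = geocode(f"{town}, {country}")
        if location is None:
            coordinates[town] = None
        else:
            coordinates[town] = (location.latitude, location.longitude)
    return coordinates
```

When running against the public Nominatim service, requests should be spaced out (its usage policy asks for at most one request per second), so a short pause between towns may be needed.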

The UK Top Venue Recommendations from FourSquare API
(FourSquare website: www.foursquare.com)

I will use the FourSquare API to explore neighborhoods in selected UK towns. The FourSquare explore function will be used to get the most common venue categories in each neighborhood, and these categories will then be used as features to group the neighborhoods into clusters. The following information is retrieved in the first query:
* Venue ID
* Venue Name
* Coordinates : Latitude and Longitude
* Category Name
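
The four fields listed above can be pulled out of an explore response roughly as shown here. This is a minimal sketch that assumes the legacy FourSquare v2 response layout (`response.groups[].items[].venue`); the helper names `parse_explore_response` and `most_common_categories` are hypothetical, and the sample payload is made-up data used only to exercise the code.

```python
from collections import Counter, defaultdict


def parse_explore_response(payload, neighborhood):
    """Flatten one explore response into a list of venue records."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            # A venue may have no category; fall back to an empty dict.
            categories = venue.get("categories") or [{}]
            rows.append({
                "neighborhood": neighborhood,
                "venue_id": venue["id"],
                "venue_name": venue["name"],
                "latitude": venue["location"]["lat"],
                "longitude": venue["location"]["lng"],
                "category": categories[0].get("name"),
            })
    return rows


def most_common_categories(rows, top_n=3):
    """Return the top_n most frequent categories per neighborhood."""
    counts = defaultdict(Counter)
    for row in rows:
        if row["category"]:
            counts[row["neighborhood"]][row["category"]] += 1
    return {
        hood: [name for name, _ in counter.most_common(top_n)]
        for hood, counter in counts.items()
    }


# Made-up payload in the assumed v2 layout, for illustration only.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"id": "v1", "name": "Example Cafe",
               "location": {"lat": 55.41, "lng": -1.70},
               "categories": [{"name": "Cafe"}]}},
    {"venue": {"id": "v2", "name": "Another Cafe",
               "location": {"lat": 55.42, "lng": -1.71},
               "categories": [{"name": "Cafe"}]}},
    {"venue": {"id": "v3", "name": "Example Pub",
               "location": {"lat": 55.43, "lng": -1.72},
               "categories": [{"name": "Pub"}]}},
]}]}}

venue_rows = parse_explore_response(sample_payload, "Alnwick")
top_categories = most_common_categories(venue_rows, top_n=1)
```

The per-neighborhood category lists produced this way are the kind of feature that can later feed the clustering step.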

Another venue query will be performed to retrieve the rating of each venue. Note that rating information is a paid FourSquare service, and we are limited to only 50 queries per day. Given this constraint, we limit the category analysis to a single category type for this demo. I will try to retrieve as many ratings as possible for the retrieved venue IDs.
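
One way to stay inside the 50-query limit is to cap the number of detail requests explicitly. The sketch below assumes that the details payload exposes the rating at `response.venue.rating` (the legacy v2 layout); `fetch_ratings` and the caller-supplied `fetch_details` callable, which would perform the actual HTTP request, are hypothetical names.

```python
def fetch_ratings(venue_ids, fetch_details, daily_quota=50):
    """Fetch ratings for at most `daily_quota` distinct venue IDs.

    `fetch_details` is a callable taking a venue ID and returning the
    decoded JSON payload. Venues without a rating map to None.
    """
    # Remove duplicate IDs while keeping their original order,
    # so no query is spent on the same venue twice.
    unique_ids = list(dict.fromkeys(venue_ids))

    ratings = {}
    for venue_id in unique_ids[:daily_quota]:
        details = fetch_details(venue_id)
        venue = details.get("response", {}).get("venue", {})
        ratings[venue_id] = venue.get("rating")
    return ratings
```

Passing the request function in as a parameter keeps the quota logic testable without network access or API credentials.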

**METHODOLOGY**

The UK towns list with median house prices.
The source data contains median prices for the UK from 1996 up to the 2nd quarter of 2013. I will retrieve the most recent recorded prices from this data source (Q2 2013), as they are the most relevant prices available at this time. For this demonstration, I will simplify the analysis by using the average price across all available flat types.

**Data Cleanup and re-grouping.** The retrieved table contains some unwanted entries and needs some cleanup.

The following tasks will be performed:
* Drop/ignore cells with missing data.
* Use most current data record.
* Fix data types.

**Importing Python Libraries**

This section imports the Python libraries required for processing data. <br>
Although this first part of the notebook covers data acquisition, some of the libraries will also be used for data visualization.

In [29]:
!conda install -c conda-forge folium=0.5.0 --yes # comment/uncomment if not yet installed.
!conda install -c conda-forge geopy --yes        # comment/uncomment if not yet installed
!pip install xlrd

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis

# Show every column and row when displaying dataframes.
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
try:
    from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)
except ImportError:
    from pandas.io.json import json_normalize # fallback for older pandas versions
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means for the clustering stage
from sklearn.cluster import KMeans
import folium # map rendering library


print('Libraries imported.')

Solving environment: done


  current version: 4.5.11
  latest version: 4.7.12

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.

Collecting xlrd
  Downloading https://files.pythonhosted.org/packages/b0/16/63576a1a001752e34bf8ea62e367997530dc553b689356b9879339cf45a4/xlrd-1.2.0-py2.py3-none-any.whl (103kB)
Installing collected packages: xlrd
Successfully installed xlrd-1.2.0
Libraries imported.


In [56]:
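df_can = pd.read_excel(
    'https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/49810/582.xls',
    sheet_name="Median_house_price",  # the keyword is lowercase `sheet_name`
    skiprows=range(20),               # skip the first 20 rows of the sheet
    skipfooter=2)                     # `skip_footer` is deprecated; use `skipfooter`
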


In [57]:
df_can.head()

Unnamed: 0.1,Unnamed: 0,Hartlepool UA,Unnamed: 2,37500,37450,40500,44500,39000,41391,40250,45950,40000,44750,44000,43950,46995,47000,45950.1,48000,49000,49000.1,42500,47500,42000,45000,51500,51500.1,45000.1,55750,55000,55000.1,43250,59500,58000,58950,47250,50000,58000.1,53000,53500,67250,75000,80000,83250,88950,93250,95000,95000.1,106750,110000,107750,109997.5,115000,101326,112750,86000,103000,107500,114975,104000,112000,110000.1,109000,106132.5,108975,102600,99850,95000.2,105000,109360,117750,113288,114575
0,,Middlesbrough UA,,43875,43000,44000.0,42000,44000,46500,47000,43000,43850,40975,42500,45500,44500,46500,48000,45000,44550,45000,46000.0,45000,42075,44000,46500,50250,45000,50000,45000,47975,40000,41000,57950,59500,50000,70000,70000,63998,70000,80500,90000,92000,97000,99450,100000,110000,105000,105000,110000,104250,99725,100296,90000,110000,108000,110000,110000,114000,89350,105000,110000,105000,99695,109972,101500,102553,98975,110000,113952,108000,100000,108500
1,,Northumberland UA,,47000,50000,50872.5,51000,51000,53500,53000,54975,53000,53500,55000,57000,52500,58750,59995,59950,58500,59900,64000.0,60000,55000,65000,67000,67325,61975,69950,75000,80000,74950,87000,93000,92500,96350,115500,125000,125998,116000,116000,125100,130000,122750,134750,147000,135000,135000,139000,141750,145000,137000,148000,150000,133750,127000,135000,140000,145000,140000,145000,147000,145000,131950,146500,134000,135500,130000,136498,145000,136421,127000,135000
2,,,Alnwick,46000,57000,57000.0,59950,55000,56250,58650,57500,55000,60000,56500,69750,50100,66500,72000,71975,65175,66350,89490.5,70000,73000,73000,85000,76000,85000,89995,95000,95000,119000,116000,130000,129995,124975,159000,178975,173000,170000,169000,160000,175000,154950,179500,190000,170000,180000,209248,185000,201000,186000,180000,199950,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..
3,,,Berwick-upon-Tweed,43250,43500,46250.0,48000,45675,50000,44625,58000,55875,51000,51500,55250,55000,58500,57000,59000,56000,54000,66000.0,57500,59000,65000,67000,71000,62000,60000,79500,81500,68100,92000,90000,96950,110000,131750,141750,137000,107500,101500,116500,142500,125000,137500,150000,139950,147250,156000,165000,179000,185000,172000,163000,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..
4,,,Blyth Valley,42000,41500,44000.0,43000,41000,43000,41000,47975,46000,44950,44000,48000,47950,49000,47950,49000,50250,49000,47750.0,49000,45975,53000,49950,54725,53950,57500,63500,67995,65000,70000,79250,73250,78500,89950,100000,93000,89950,92250,95000,104000,105998,114000,119000,123995,115875,121500,119950,124995,112000,118000,112000,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..
