# The Battle of the Neighborhoods (Week 1)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [A: Introduction](#introduction)
* [B: Data](#data)
* [C: References](#references)

## A: Introduction <a name="introduction"></a>

### A.1: Background

London is considered to be one of the world's most important global cities and has been called the world's most powerful, most desirable, most influential, most visited, most expensive, innovative, sustainable, most investment-friendly, and most-popular-for-work city.[1]

Every year, thousands of people move to London, both from within the UK and from overseas. They settle in London for many reasons, such as a change of job or the search for better living conditions. However, there are several things to consider before moving. London housing and rental prices are among the highest in the world and can eat up a significant portion of a household's income. Other living costs, such as public transport fares and owning and driving a vehicle, are not cheap either. Given these and many other factors, deciding where to settle within London is a difficult matter.

### A.2: Problem

London has 32 boroughs that differ from one another in many respects, such as cost of living, housing prices, and crime rates. Our problem is therefore to **find the best London borough to live in**, considering the factors mentioned above.

### A.3: Interest

Any newcomer needs to learn about the factors mentioned above in order to decide on the best place to settle. It is also to any real estate agent's advantage to be well informed on these matters when a client makes such an enquiry. Property developers will welcome this knowledge too, as it helps them decide on the best places to build their next housing scheme.

## B: Data <a name="data"></a>

### B.1: Data Description

Based on the definition of our problem, we need the following data sets:
* Various facts about the 32 London boroughs, e.g. cost of living, house prices, crime rates
* Other facilities that make them desirable neighbourhoods, e.g. parks, shopping malls, attractions
* The boundaries of the London boroughs, since we intend to present these data on a choropleth map
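
A choropleth only works if every boundary feature can be matched to a row of the borough table. The following is a minimal sketch of that check, not the author's code. It assumes the boundary file is GeoJSON whose features carry a `GSS_CODE` property matching the `Code` column of the borough profiles; the property name, the helper `unmatched_codes`, and the placeholder geometries are all illustrative assumptions.

```python
import json

# Tiny stand-in for the boundary file. The geometries are placeholders, and
# the "GSS_CODE" property name is an assumption about the real file.
sample_geojson = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"NAME": "Barnet", "GSS_CODE": "E09000003"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0, 0], [0, 1], [1, 1], [0, 0]]]}},
    {"type": "Feature",
     "properties": {"NAME": "Bexley", "GSS_CODE": "E09000004"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[1, 1], [1, 2], [2, 2], [1, 1]]]}}
  ]
}
""")


def unmatched_codes(geojson, table_codes, key="GSS_CODE"):
    """Return (codes only in the boundaries, codes only in the table)."""
    boundary_codes = {f["properties"][key] for f in geojson["features"]}
    table_codes = set(table_codes)
    return (sorted(boundary_codes - table_codes),
            sorted(table_codes - boundary_codes))


# Codes as they would appear in the `Code` column of the borough profiles.
profile_codes = ["E09000003", "E09000004", "E09000005"]
only_in_boundaries, only_in_table = unmatched_codes(sample_geojson, profile_codes)
print("Only in boundaries:", only_in_boundaries)
print("Only in table:", only_in_table)
```

Any code reported in either list would show up as an unshaded or missing area on the map, so it is worth running such a check before plotting.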

### B.2: Data Sources

The following data sources will be used to extract or generate the required information:
* Data and facts about London boroughs will be obtained as a .csv file from the **London Borough Profiles and Atlas, London Data Store** website [2]
* Important venues and other desired locations will be obtained using the **Foursquare API** [3]
* London borough boundaries will be obtained as a .json file from the **Statistical GIS Boundary Files for London, London Data Store** website [4]
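
As a rough illustration of the venue lookup, the sketch below builds the query parameters for a venue search around a coordinate and flattens a response into rows. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its response layout; the helper names, the credentials, and the sample payload are hypothetical, and the current Foursquare documentation should be checked before relying on any of it.

```python
from urllib.parse import urlencode

# Assumption: legacy Foursquare Places API (v2) explore endpoint.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_params(lat, lng, client_id, client_secret,
                         radius=1000, limit=50, version="20180605"):
    """Build the query parameters for one explore request (hypothetical helper)."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date expected by the v2 API
        "ll": f"{lat},{lng}",    # latitude,longitude of the search centre
        "radius": radius,        # search radius in metres
        "limit": limit,          # maximum number of venues to return
    }


def parse_venues(payload):
    """Flatten an explore response into a list of plain dictionaries."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or [{}]
        rows.append({
            "name": venue["name"],
            "category": categories[0].get("name"),
            "lat": venue["location"]["lat"],
            "lng": venue["location"]["lng"],
        })
    return rows


# Made-up payload in the assumed v2 layout, used only to exercise the parser.
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Example Park",
                   "categories": [{"name": "Park"}],
                   "location": {"lat": 51.5, "lng": -0.12}}},
    ]}]}
}

params = build_explore_params(51.5074, -0.1278, "YOUR_ID", "YOUR_SECRET")
print(EXPLORE_URL + "?" + urlencode(params))
venues = parse_venues(sample_payload)
print(venues)
```

In a notebook the parameters could be passed to `requests.get(EXPLORE_URL, params=params).json()` and the result handed to `parse_venues`, whose output loads directly into `pd.DataFrame`.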

### B.3: Obtaining Data and Fine Tuning

Let us first download the London borough profiles data set and load it into a DataFrame:

In [2]:
import pandas as pd
import numpy as np
import requests
from geopy.geocoders import Nominatim

In [5]:
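# Download the borough profiles once (uncomment to run inside a notebook)
#!wget -O londonborough.csv https://data.london.gov.uk/download/london-borough-profiles/c1693b82-68b1-44ee-beb2-3decf17dc1f8/london-borough-profiles.csv

# Read the file as windows-1252 so that characters such as "£" in the column names decode correctly
df = pd.read_csv('londonborough.csv', encoding='windows-1252')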
df.head()

Unnamed: 0,Code,Area_name,Inner/_Outer_London,GLA_Population_Estimate_2017,GLA_Household_Estimate_2017,Inland_Area_(Hectares),Population_density_(per_hectare)_2017,"Average_Age,_2017","Proportion_of_population_aged_0-15,_2015","Proportion_of_population_of_working-age,_2015",...,Happiness_score_2011-14_(out_of_10),Anxiety_score_2011-14_(out_of_10),Childhood_Obesity_Prevalance_(%)_2015/16,People_aged_17+_with_diabetes_(%),Mortality_rate_from_causes_considered_preventable_2012/14,Political_control_in_council,Proportion_of_seats_won_by_Conservatives_in_2014_election,Proportion_of_seats_won_by_Labour_in_2014_election,Proportion_of_seats_won_by_Lib_Dems_in_2014_election,Turnout_at_2014_local_elections
0,E09000001,City of London,Inner London,8800,5326,290,30.3,43.2,11.4,73.1,...,6.0,5.6,,2.6,129,.,.,.,.,.
1,E09000002,Barking and Dagenham,Outer London,209000,78188,3611,57.9,32.9,27.2,63.1,...,7.1,3.1,28.5,7.3,228,Lab,0,100,0,36.5
2,E09000003,Barnet,Outer London,389600,151423,8675,44.9,37.3,21.1,64.9,...,7.4,2.8,20.7,6.0,134,Cons,50.8,.,1.6,40.5
3,E09000004,Bexley,Outer London,244300,97736,6058,40.3,39.0,20.6,62.9,...,7.2,3.3,22.7,6.9,164,Cons,71.4,23.8,0,39.6
4,E09000005,Brent,Outer London,332100,121048,4323,76.8,35.6,20.9,67.8,...,7.2,2.9,24.3,7.9,169,Lab,9.5,88.9,1.6,36.3


We then drop the columns that are not needed for our analysis:

In [7]:
# Columns that are not needed for comparing the boroughs
cols_to_drop = [
    'GLA_Population_Estimate_2017',
    'GLA_Household_Estimate_2017',
    'Inland_Area_(Hectares)',
    'Population_density_(per_hectare)_2017',
    'Average_Age,_2017',
    'Proportion_of_population_aged_0-15,_2015',
    'Proportion_of_population_of_working-age,_2015',
    'Proportion_of_population_aged_65_and_over,_2015',
    'Net_internal_migration_(2015)',
    'Net_international_migration_(2015)',
    'Net_natural_change_(2015)',
    '%_of_resident_population_born_abroad_(2015)',
    'Largest_migrant_population_by_country_of_birth_(2011)',
    '%_of_largest_migrant_population_(2011)',
    'Second_largest_migrant_population_by_country_of_birth_(2011)',
    '%_of_second_largest_migrant_population_(2011)',
    'Third_largest_migrant_population_by_country_of_birth_(2011)',
    '%_of_third_largest_migrant_population_(2011)',
    '%_of_population_from_BAME_groups_(2016)',
    '%_people_aged_3+_whose_main_language_is_not_English_(2011_Census)',
    'Overseas_nationals_entering_the_UK_(NINo),_(2015/16)',
    'New_migrant_(NINo)_rates,_(2015/16)',
    'Largest_migrant_population_arrived_during_2015/16',
    'Second_largest_migrant_population_arrived_during_2015/16',
    'Third_largest_migrant_population_arrived_during_2015/16',
    'Employment_rate_(%)_(2015)',
    'Male_employment_rate_(2015)',
    'Female_employment_rate_(2015)',
    'Unemployment_rate_(2015)',
    'Youth_Unemployment_(claimant)_rate_18-24_(Dec-15)',
    'Proportion_of_16-18_year_olds_who_are_NEET_(%)_(2014)',
    'Proportion_of_the_working-age_population_who_claim_out-of-work_benefits_(%)_(May-2016)',
    '%_working-age_with_a_disability_(2015)',
    'Proportion_of_working_age_people_with_no_qualifications_(%)_2015',
    'Proportion_of_working_age_with_degree_or_equivalent_and_above_(%)_2015',
    'Gross_Annual_Pay,_(2016)',
    'Gross_Annual_Pay_-_Male_(2016)',
    'Gross_Annual_Pay_-_Female_(2016)',
    'Modelled_Household_median_income_estimates_2012/13',
    '%_adults_that_volunteered_in_past_12_months_(2010/11_to_2012/13)',
    'Number_of_jobs_by_workplace_(2014)',
    '%_of_employment_that_is_in_public_sector_(2014)',
    'Jobs_Density,_2015',
    'Two-year_business_survival_rates_(started_in_2013)',
    'Ambulance_incidents_per_hundred_population_(2014)',
    'New_Homes_(net)_2015/16_(provisional)',
    'Homes_Owned_outright,_(2014)_%',
    'Being_bought_with_mortgage_or_loan,_(2014)_%',
    'Rented_from_Local_Authority_or_Housing_Association,_(2014)_%',
    'Rented_from_Private_landlord,_(2014)_%',
    'Total_carbon_emissions_(2014)',
    'Household_Waste_Recycling_Rate,_2014/15',
    'Number_of_cars,_(2011_Census)',
    'Number_of_cars_per_household,_(2011_Census)',
    '%_of_adults_who_cycle_at_least_once_per_month,_2014/15',
    'Rates_of_Children_Looked_After_(2016)',
    '%_of_pupils_whose_first_language_is_not_English_(2015)',
    '%_children_living_in_out-of-work_households_(2015)',
    'Male_life_expectancy,_(2012-14)',
    'Female_life_expectancy,_(2012-14)',
    'Teenage_conception_rate_(2014)',
    'Anxiety_score_2011-14_(out_of_10)',
    'Childhood_Obesity_Prevalance_(%)_2015/16',
    'People_aged_17+_with_diabetes_(%)',
    'Mortality_rate_from_causes_considered_preventable_2012/14',
    'Political_control_in_council',
    'Proportion_of_seats_won_by_Conservatives_in_2014_election',
    'Proportion_of_seats_won_by_Labour_in_2014_election',
    'Proportion_of_seats_won_by_Lib_Dems_in_2014_election',
    'Turnout_at_2014_local_elections',
]
ft1_df = df.drop(cols_to_drop, axis=1)
ft1_df.head()

Unnamed: 0,Code,Area_name,Inner/_Outer_London,"Number_of_active_businesses,_2015",Crime_rates_per_thousand_population_2014/15,Fires_per_thousand_population_(2014),"Median_House_Price,_2015","Average_Band_D_Council_Tax_charge_(£),_2015/16","%_of_area_that_is_Greenspace,_2005","Average_Public_Transport_Accessibility_score,_2014","Achievement_of_5_or_more_A*-_C_grades_at_GCSE_or_equivalent_including_English_and_Maths,_2013/14",Life_satisfaction_score_2011-14_(out_of_10),Worthwhileness_score_2011-14_(out_of_10),Happiness_score_2011-14_(out_of_10)
0,E09000001,City of London,Inner London,26130,.,12.3,799999,931.2,4.8,7.9,78.6,6.6,7.1,6.0
1,E09000002,Barking and Dagenham,Outer London,6560,83.4,3.0,243500,1354.03,33.6,3.0,58.0,7.1,7.6,7.1
2,E09000003,Barnet,Outer London,26190,62.7,1.6,445000,1397.07,41.3,3.0,67.3,7.5,7.8,7.4
3,E09000004,Bexley,Outer London,9075,51.8,2.3,275000,1472.43,31.7,2.6,60.3,7.4,7.7,7.2
4,E09000005,Brent,Outer London,15745,78.8,1.8,407250,1377.24,21.9,3.7,60.1,7.3,7.4,7.2
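
When far more columns are dropped than kept, an alternative is to name the columns to keep and select only those. The sketch below illustrates this on a small stand-in frame built from values shown in the previews above. It is not the approach used in this notebook, and `keep_cols` is an illustrative name.

```python
import pandas as pd

# Stand-in frame with a handful of the original columns
# (values copied from the first rows of the previews above).
sample = pd.DataFrame({
    "Code": ["E09000002", "E09000003"],
    "Area_name": ["Barking and Dagenham", "Barnet"],
    "GLA_Population_Estimate_2017": [209000, 389600],
    "Median_House_Price,_2015": [243500, 445000],
    "Turnout_at_2014_local_elections": [36.5, 40.5],
})

# Name the columns of interest instead of every column to discard.
keep_cols = ["Code", "Area_name", "Median_House_Price,_2015"]

# Fail early with a readable message if a name is misspelled.
missing = [c for c in keep_cols if c not in sample.columns]
if missing:
    raise KeyError(f"Columns not found: {missing}")

kept = sample[keep_cols].copy()
print(kept.columns.tolist())
```

The keep-list is shorter to read and review, but it silently ignores any new column added to the source file, whereas the drop-list keeps it. Which behaviour is preferable depends on how the data set is expected to change.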


Next, we rename the long column names to make them shorter and easier to read:

In [8]:
# Map the long source column names to short labels
ft2_df = ft1_df.rename(columns={
    "Area_name": "Borough",
    "Inner/_Outer_London": "Inner_Outer",
    "Number_of_active_businesses,_2015": "Businesses",
    "Crime_rates_per_thousand_population_2014/15": "Crime_rate",
    "Fires_per_thousand_population_(2014)": "Fires_rate",
    "Median_House_Price,_2015": "House_Price",
    "Average_Band_D_Council_Tax_charge_(£),_2015/16": "Council_tax",
    "%_of_area_that_is_Greenspace,_2005": "Greenspace",
    "Average_Public_Transport_Accessibility_score,_2014": "Public_Transport",
    "Achievement_of_5_or_more_A*-_C_grades_at_GCSE_or_equivalent_including_English_and_Maths,_2013/14": "GCSE",
    "Life_satisfaction_score_2011-14_(out_of_10)": "Life_satisfaction",
    "Worthwhileness_score_2011-14_(out_of_10)": "Worthiness",
    "Happiness_score_2011-14_(out_of_10)": "Happiness",
})
ft2_df.head()

Unnamed: 0,Code,Borough,Inner_Outer,Businesses,Crime_rate,Fires_rate,House_Price,Council_tax,Greenspace,Public_Transport,GCSE,Life_satisfaction,Worthiness,Happiness
0,E09000001,City of London,Inner London,26130,.,12.3,799999,931.2,4.8,7.9,78.6,6.6,7.1,6.0
1,E09000002,Barking and Dagenham,Outer London,6560,83.4,3.0,243500,1354.03,33.6,3.0,58.0,7.1,7.6,7.1
2,E09000003,Barnet,Outer London,26190,62.7,1.6,445000,1397.07,41.3,3.0,67.3,7.5,7.8,7.4
3,E09000004,Bexley,Outer London,9075,51.8,2.3,275000,1472.43,31.7,2.6,60.3,7.4,7.7,7.2
4,E09000005,Brent,Outer London,15745,78.8,1.8,407250,1377.24,21.9,3.7,60.1,7.3,7.4,7.2
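
The preview above shows the City of London crime rate recorded as `.`, which suggests the source uses a dot as a missing-value placeholder. A column containing such entries is likely to be loaded as text rather than numbers. The sketch below shows one way to coerce such columns to numeric types with `pd.to_numeric`. The stand-in frame and the choice of columns are assumptions for illustration, not a step taken in the notebook so far.

```python
import pandas as pd

# Stand-in frame mimicking columns that were read as text because of "."
sample = pd.DataFrame({
    "Code": ["E09000001", "E09000002", "E09000003"],
    "Crime_rate": [".", "83.4", "62.7"],
    "House_Price": ["799999", "243500", "445000"],
})

numeric_cols = ["Crime_rate", "House_Price"]
cleaned = sample.copy()
for col in numeric_cols:
    # errors="coerce" turns unparseable entries such as "." into NaN
    cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")

print(cleaned.dtypes)
print("Missing crime rates:", cleaned["Crime_rate"].isna().sum())
```

The same effect can usually be achieved at load time by passing `na_values='.'` to `pd.read_csv`, which avoids a separate conversion pass.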


## C: References <a name="references"></a>

* [1] London Wikipedia Page, https://en.wikipedia.org/wiki/London
* [2] London Borough Profiles and Atlas, London Data Store, https://data.london.gov.uk/dataset/london-borough-profiles
* [3] Foursquare API, https://developer.foursquare.com/
* [4] Statistical GIS Boundary Files for London, London Data Store, https://data.london.gov.uk/dataset/statistical-gis-boundary-files-london