# Capstone Project - The battle of the neighborhood - week 1

#### 1. A description of the problem and a discussion of the background.
The problem this data science project addresses is finding good schools in safe boroughs in London. London comprises 32 boroughs.  
Finding a good school in a safe borough is an important task for many families with young children and teenagers. Many families decide to move to a different borough when it comes to choosing a school.  
Although there is a lot of interest in finding good state schools in London, it is difficult to get a clear picture of the situation. Detailed reports on this topic are available, but they are not easy to read, especially for people moving from countries with a different school system or where English is not the native language. The information and statistics available are not presented in a user-friendly manner, which can be frustrating for families moving to London.


#### 2. A description of the data and how it will be used to solve the problem.
This project uses open data from London government authorities on youth crime and high-school General Certificate of Secondary Education (GCSE) achievements to find the best-performing schools in the safest London boroughs.

Each borough comprises various neighborhoods. The neighborhoods within a borough that meet the conditions (i.e. good schools and safety) will be explored using the Foursquare API, aiming to highlight the most common venue categories (e.g. restaurants, pharmacies, train stations) in each neighborhood. The neighborhoods will then be clustered by similar characteristics (e.g. good transport links) using the k-means algorithm, and the Folium library will be used to visualise the clusters on the London borough map. This should give an insight into the quality of life in the relevant neighborhoods and thereby help families with their choices.
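
As a rough illustration of the clustering step described above, the sketch below turns venue records into per-neighborhood category frequencies and groups them with k-means. The neighborhoods `A` to `D`, their venues, and the choice of two clusters are all invented for illustration; they are not Foursquare output and not results of this project.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical venue records, invented for illustration (not Foursquare output).
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"],
    "Venue Category": [
        "Train Station", "Train Station", "Cafe",
        "Train Station", "Train Station", "Bus Stop",
        "Restaurant", "Restaurant", "Pub",
        "Restaurant", "Pub", "Pub",
    ],
})

# Share of each venue category per neighborhood (each row sums to 1).
frequencies = pd.crosstab(
    venues["Neighborhood"], venues["Venue Category"], normalize="index"
)

# Group neighborhoods with similar venue profiles.
# n_clusters=2 is an arbitrary choice for this toy example.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(frequencies)
clusters = pd.Series(kmeans.labels_, index=frequencies.index, name="Cluster")
print(clusters)
```

In this toy data the transport-heavy neighborhoods (`A`, `B`) should end up in one cluster and the restaurant and pub neighborhoods (`C`, `D`) in the other. The cluster numbers themselves carry no meaning.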

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item2">Download and explore London government authorities data</a>

2. <a href="#item1">Scraping data from the web to create a dataset with London Borough and corresponding coordinates</a>  

3. <a href="#item3">Explore the safest Boroughs and their neighborhoods with highest achieving high schools in London</a>  

4. <a href="#item4">Analyze Each Neighborhood for the top borough (i.e. high achieving schools in a safe borough)</a>  

5. <a href="#item5">Cluster Neighborhoods</a>  

6. <a href="#item6">Examine Clusters and draw conclusions</a>  
</font>
</div>

### Import the main libraries 

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # to get latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0; pandas.io.json.json_normalize is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

#import the libraries BeautifulSoup and io
import bs4 as bs
from bs4 import BeautifulSoup
import io
import urllib.request


print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Libraries imported.


## Download and explore London government authorities data

In [2]:
#Loading youth knife crime figures per London borough from a local CSV into the youth_knife_crime dataframe
#data available at https://www.london.gov.uk/mopac-disclosure-log/london-knife-crime-statistics
youth_knife_crime=pd.read_csv('London_borough_knife_crime .csv') 
youth_knife_crime

Unnamed: 0,London Borough,2017/18 Total,2017/18 With Injury,2018/2019 Total,2018/2019 With Injury,% Change in total 2017/18 - 2018/19
0,Westminster,650.0,169.0,985.0,199.0,0.5154
1,Southwark,866.0,318.0,777.0,239.0,-0.1028
2,Haringey,794.0,229.0,764.0,179.0,-0.0378
3,Newham,787.0,235.0,696.0,197.0,-0.1156
4,Brent,766.0,241.0,680.0,191.0,-0.1123
5,Tower Hamlets,715.0,217.0,667.0,221.0,-0.0671
6,Hackney,578.0,192.0,650.0,169.0,0.1246
7,Enfield,589.0,182.0,617.0,167.0,0.0475
8,Islington,631.0,188.0,578.0,153.0,-0.084
9,Lewisham,566.0,196.0,574.0,165.0,0.0141
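
The last column appears to be the relative change in total offences between the two years, stored as a fraction rather than a percentage. A minimal check, using only the Westminster and Southwark totals shown above (the helper name `pct_change` is ours and is not part of the dataset):

```python
def pct_change(previous, current):
    """Relative change from previous to current, as a fraction (0.5 means +50%)."""
    return (current - previous) / previous

# Totals copied from the table above: (2017/18 Total, 2018/2019 Total)
totals = {"Westminster": (650.0, 985.0), "Southwark": (866.0, 777.0)}

changes = {
    borough: round(pct_change(prev, cur), 4)
    for borough, (prev, cur) in totals.items()
}
print(changes)  # {'Westminster': 0.5154, 'Southwark': -0.1028}
```

Both values match the table, so 0.5154 for Westminster should be read as an increase of about 51.5%.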


In [3]:
#Loading GCSE achievement per London borough from a local CSV into the GCSEs_borough dataframe
#data available at https://data.london.gov.uk/dataset/gcse-results-by-borough

GCSEs_borough=pd.read_csv('GCSEs_results_borough.csv')
GCSEs_borough

Unnamed: 0,London Borough,% pupils achieving Math & English strong 9-5 pass,% pupils achieving Math & English standard 9-4 pass
0,Barking and Dagenham,40.2,60.0
1,Barnet,60.6,76.0
2,Bexley,51.0,69.1
3,Brent,50.9,69.5
4,Bromley,49.3,70.0
5,Camden,47.0,66.0
6,Croydon,42.2,63.1
7,Ealing,51.9,69.2
8,Enfield,41.4,61.3
9,Greenwich,39.0,58.0


## Scraping data from the web to create a dataset with London Borough and corresponding coordinates (latitude, longitude)

Using BeautifulSoup to scrape the <a href="https://en.wikipedia.org/wiki/List_of_London_boroughs">**List of London boroughs**</a> from Wikipedia.

In [4]:
#Fetching the raw HTML of the Wikipedia page that contains the "List of London boroughs" table
source = urllib.request.urlopen('https://en.wikipedia.org/wiki/List_of_London_boroughs').read()

In [5]:
#Parsing the page with BeautifulSoup (soup holds the whole document, not just the table)
soup=BeautifulSoup(source, 'lxml')
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of London boroughs - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_London_boroughs","wgTitle":"List of London boroughs","wgCurRevisionId":881899861,"wgRevisionId":881899861,"wgArticleId":28092685,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Use dmy dates from August 2015","Use British English from August 2015","Lists of coordinates","Geographic coordinate lists","Articles with Geo","London boroughs","Lists of places in London"],"wgBreakFrames":!1,"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June

In [6]:
# Getting the first sortable wikitable on the page (the list of boroughs)
table = soup.find('table',{'class':'wikitable sortable'})
print(table)

<table class="wikitable sortable" style="font-size:100%" width="100%">
<tbody><tr>
<th>Borough
</th>
<th>Inner
</th>
<th>Status
</th>
<th>Local authority
</th>
<th>Political control
</th>
<th>Headquarters
</th>
<th>Area (sq mi)
</th>
<th>Population (2013 est)<sup class="reference" id="cite_ref-1"><a href="#cite_note-1">[1]</a></sup>
</th>
<th>Co-ordinates
</th>
<th><span style="background:#67BCD3"> Nr. in map </span>
</th></tr>
<tr>
<td><a href="/wiki/London_Borough_of_Barking_and_Dagenham" title="London Borough of Barking and Dagenham">Barking and Dagenham</a> <sup class="reference" id="cite_ref-2"><a href="#cite_note-2">[note 1]</a></sup>
</td>
<td>
</td>
<td>
</td>
<td><a href="/wiki/Barking_and_Dagenham_London_Borough_Council" title="Barking and Dagenham London Borough Council">Barking and Dagenham London Borough Council</a>
</td>
<td><a href="/wiki/Labour_Party_(UK)" title="Labour Party (UK)">Labour</a>
</td>
<td><a class="new" href="/w/index.php?title=Barking_Town_Hall&amp;action

In [7]:
#Scraping the column headers of the "List of London boroughs" table from Wikipedia
table_header=table.find_all('th')
headers=[]
for th in table_header:
    headers.append(th.text.strip('\n'))

df_1=pd.DataFrame(headers)
#Keeping just two headers out of ten: Borough (position 0) and Co-ordinates (position 8)
df_2=df_1.iloc[[0, 8]]
df_2.reset_index(drop=True)

Unnamed: 0,0
0,Borough
1,Co-ordinates


In [8]:
#Getting the table rows (one list of cell texts per <tr>)
table_rows=table.find_all('tr')

list_of_rows=[]
for tr in table_rows:
    td=tr.find_all('td')
    row = [item.text.strip('\n') for item in td]
    list_of_rows.append(row)
    
df_3=pd.DataFrame(list_of_rows)
#Keeping only column 0 (Borough) and column 8 (Co-ordinates)
df_3=df_3.drop(df_3.columns[[1, 2, 3, 4, 5, 6, 7, 9]], axis=1)
#Dropping row 0: the header row has <th> cells only, so its <td> list is empty
df_4=df_3.drop([0], axis=0)
df_4.columns=['Borough', 'coordinates']
#The index is left as-is, so it starts at 1 in the output below
df_4

Unnamed: 0,Borough,coordinates
1,Barking and Dagenham [note 1],51°33′39″N 0°09′21″E﻿ / ﻿51.5607°N 0.1557°E﻿ /...
2,Barnet,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W﻿ /...
3,Bexley,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E﻿ /...
4,Brent,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W﻿ /...
5,Bromley,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E﻿ /...
6,Camden,51°31′44″N 0°07′32″W﻿ / ﻿51.5290°N 0.1255°W﻿ /...
7,Croydon,51°22′17″N 0°05′52″W﻿ / ﻿51.3714°N 0.0977°W﻿ /...
8,Ealing,51°30′47″N 0°18′32″W﻿ / ﻿51.5130°N 0.3089°W﻿ /...
9,Enfield,51°39′14″N 0°04′48″W﻿ / ﻿51.6538°N 0.0799°W﻿ /...
10,Greenwich [note 2],51°29′21″N 0°03′53″E﻿ / ﻿51.4892°N 0.0648°E﻿ /...
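
The scraped cells still need cleaning before they can be used on a map: the borough names carry footnote markers such as `[note 1]`, and each coordinate cell mixes a degrees/minutes/seconds form with a decimal form. The sketch below shows one way this could be handled with regular expressions. The helper names `parse_coordinates` and `clean_borough` are hypothetical, and the pattern assumes the decimal pair always looks like `51.5607°N 0.1557°E`, as in the preview above.

```python
import re

# Decimal pair such as "51.5607°N 0.1557°E" (assumed format, taken from the preview).
COORD_RE = re.compile(r"(\d+\.\d+)°([NS])\s+(\d+\.\d+)°([EW])")
# Footnote markers such as " [note 1]".
NOTE_RE = re.compile(r"\s*\[note \d+\]")

def parse_coordinates(text):
    """Return (latitude, longitude) in signed decimal degrees, or None if no match."""
    match = COORD_RE.search(text)
    if match is None:
        return None
    lat, ns, lon, ew = match.groups()
    latitude = float(lat) if ns == "N" else -float(lat)
    longitude = float(lon) if ew == "E" else -float(lon)
    return latitude, longitude

def clean_borough(name):
    """Remove Wikipedia footnote markers from a borough name."""
    return NOTE_RE.sub("", name).strip()

# Sample cell modelled on the Barnet row above; \ufeff is the invisible
# character Wikipedia places around the "/" separators.
sample = "51°37′31″N 0°09′06″W\ufeff / \ufeff51.6252°N 0.1517°W\ufeff /..."
print(parse_coordinates(sample))                       # (51.6252, -0.1517)
print(clean_borough("Barking and Dagenham [note 1]"))  # Barking and Dagenham
```

West longitudes and south latitudes come out negative, which is the convention Folium and geopy expect. On the dataframe, the helpers could be applied with `df_4['coordinates'].apply(parse_coordinates)` and `df_4['Borough'].apply(clean_borough)`.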


In [9]:
# To be continued in The battle of the Neighborhoods - week2