# Ron Daniel Analysis of East London Suburbs with Parks and Amenities

## Introduction

### The Covid crisis has highlighted the need for easy access to open spaces for citizens, especially those who live in apartments and may not have a private garden.  As a result, many are looking to move to neighbourhoods with parks and leisure facilities, especially those that can be reached by walking.  

### In this project, I plan to scrape location data for the suburbs of East London, where I live, and use the Foursquare API to get venues information on those suburbs.  I will then use KMeans to cluster the data to identify which areas have the most parks and open spaces.  If time permits, I will also do a cluster analysis using Random Forest.

## Import Libraries

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 


import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: C:\Users\rbd63\anaconda3

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    conda-4.10.0               |   py38haa244fe_1         3.1 MB  conda-forge
    ------------------------------------------------------------
                                           Total:         3.1 MB

The following packages will be UPDATED:

  conda                                4.9.2-py38haa244fe_0 --> 4.10.0-py38haa244fe_1



Downloading and Extracting Packages

conda-4.10.0         | 3.1 MB    |            |   0% 
conda-4.10.0         | 3.1 MB    |            |   1% 
conda-4.10.0         | 3.1 MB    | 6          |   7% 
conda-4.10.0         | 3.1 MB    | #          |  11% 
conda-4.10.0         | 3.1 MB    | #5     

## Scrape London Suburbs data from Wikipedia

In [7]:
! pip install Wikipedia-API

import wikipediaapi as wp



In [10]:
## import library to open URLs
import urllib.request
print("urllib imported")

urllib imported


In [11]:
## create variable url for desired webpage
url = "https://en.wikipedia.org/wiki/List_of_areas_of_London"

In [12]:

## open url and load into variable toronto_postcodes
suburbs = urllib.request.urlopen(url)

In [13]:

## import beautifulsoup for parsing HTML
from bs4 import BeautifulSoup

In [14]:

## parse the HTML page from our url into BeautifulSoup parse tree format
suburbs_soup = BeautifulSoup(suburbs, "lxml")

In [15]:
## view html page
print(suburbs_soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of areas of London - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"fd24c8fb-13f0-422a-8bef-59f8f64f8a7b","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_areas_of_London","wgTitle":"List of areas of London","wgCurRevisionId":1001821721,"wgRevisionId":1001821721,"wgArticleId":11915713,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with short description","Short description is different from Wikidata","Use dmy dates from August 2015","Use 

In [19]:
## retrieve all tables and store in variable all_tables
all_tables = suburbs_soup.find_all('table')
all_tables

[<table class="noprint infobox" id="GeoGroup" style="width: 23em; font-size: 88%; line-height: 1.5em">
 <tbody><tr>
 <td><b>Map all coordinates in "Category:Areas of London" using:</b> <a class="external text" href="//tools.wmflabs.org/osm4wiki/cgi-bin/wiki/wiki-osm.pl?project=en&amp;article=Category%3AAreas+of+London">OpenStreetMap</a> 
 </td></tr>
 <tr>
 <td><b>Download coordinates as:</b> <a class="external text" href="//tools.wmflabs.org/kmlexport?article=Category%3AAreas+of+London">KML</a>
 </td></tr></tbody></table>,
 <table class="wikitable sortable" style="clear:both;">
 <tbody><tr>
 <th>Location</th>
 <th>London borough</th>
 <th>Post town</th>
 <th>Postcode district</th>
 <th>Dial code</th>
 <th>OS grid ref
 </th></tr>
 <tr>
 <td><a href="/wiki/Abbey_Wood" title="Abbey Wood">Abbey Wood</a></td>
 <td>Bexley,  Greenwich <sup class="reference" id="cite_ref-mills1_7-0"><a href="#cite_note-mills1-7">[7]</a></sup></td>
 <td>LONDON</td>
 <td>SE2</td>
 <td>020</td>
 <td><span class="

In [29]:

## retrieve london suburbs table
suburbs_table = suburbs_soup.find_all('table', class_='wikitable sortable', style="clear:both;")
suburbs_table

[<table class="wikitable sortable" style="clear:both;">
 <tbody><tr>
 <th>Location</th>
 <th>London borough</th>
 <th>Post town</th>
 <th>Postcode district</th>
 <th>Dial code</th>
 <th>OS grid ref
 </th></tr>
 <tr>
 <td><a href="/wiki/Abbey_Wood" title="Abbey Wood">Abbey Wood</a></td>
 <td>Bexley,  Greenwich <sup class="reference" id="cite_ref-mills1_7-0"><a href="#cite_note-mills1-7">[7]</a></sup></td>
 <td>LONDON</td>
 <td>SE2</td>
 <td>020</td>
 <td><span class="plainlinks nourlexpansion" style="white-space: nowrap"><a class="external text" href="https://geohack.toolforge.org/geohack.php?pagename=List_of_areas_of_London&amp;params=51.48648031512_N_0.10859224316653_E_region:GB" rel="nofollow">TQ465785</a></span>
 </td></tr>
 <tr>
 <td><a href="/wiki/Acton,_London" title="Acton, London">Acton</a></td>
 <td>Ealing, Hammersmith and Fulham<sup class="reference" id="cite_ref-mills2_8-0"><a href="#cite_note-mills2-8">[8]</a></sup></td>
 <td>LONDON</td>
 <td>W3, W4</td>
 <td>020</td>
 <td>

In [34]:
## Set up six empty lists to store data in and loop through data and assign to empty lists
location=[]
borough=[]
post_town=[]
postcode=[]
dialcode =[]
osgrid=[]


for row in suburbs_table[0].findAll('tr'):
    cells = row.findAll('td')
    if len(cells)==3:
        location.append(cells[0].find(text=True))
        borough.append(cells[1].find(text=True))
        post_town.append(cells[2].find(text=True))
        postcode.append(cells[3].find(text=True))
        dialcode.append(cells[4].find(text=True))
        osgrid.append(cells[5].find(text=True))

In [32]:
location

[]

In [28]:
## assign each list to a column in dataframe suburbs_df
column_names = ["Location", "Borough", "Town", "Post_Code", "Dial_Code", "os_grid"]
suburbs_df = pd.DataFrame(columns=column_names)
suburbs_df["Location"] = Location
suburbs_df["Borough"] = borough
suburbs_df["Town"] = post_town
suburbs_df["Post_Code"] = postcode
suburbs_df["Dial_Code"] = dialcode
suburbs_df["os_grid"] = osgrid
suburbs_df.head()

Unnamed: 0,Location,Borough,Town,Post_Code,Dial_Code,os_grid


In [44]:
import wikipediaapi as wp
print(wikipediaapi.__version__)
print(dir(wikipediaapi))


(0, 5, 4)
['Any', 'Dict', 'ExtractFormat', 'IntEnum', 'List', 'Namespace', 'Optional', 'PagesDict', 'RE_SECTION', 'Union', 'WikiNamespace', 'Wikipedia', 'WikipediaPage', 'WikipediaPageSection', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'log', 'logging', 'namespace2int', 'parse', 're', 'requests']


In [47]:
html = wp.table("List_of_areas_of_London").html().encode("UTF-8")
try: 
    df = pd.read_html(html)[1]  # Try 2nd table first as most pages contain contents table first
except IndexError:
    df = pd.read_html(html)[0]
print(df.to_string())

AttributeError: module 'wikipediaapi' has no attribute 'table'