
# Similar neighbourhoods in Milwaukee and London

## Overview

### Background
In this work I will be comparing and contrasting neighbourhoods in two cities.
I come from London while my wife comes from Milwaukee. When we visit, it would be nice to know which neighbourhoods in her city are similar to the one in which she was raised. It would also be nice to know which neighbourhoods in Milwaukee are most similar to the one in which we live in London.
Finally, we are considering moving (within London) fairly soon, so a categorisation of localities could potentially help us narrow down our search for a new home.

### Approach
When thinking about neighbourhoods, there are as many definitions of their borders and extent as there are people who know the areas. I decided to use a different definition for each city.

There were 76 neighbourhoods in Milwaukee that were officially recognised by the city council in 1990. The most recent city council definitions I could find, from 2000, had 190 neighbourhoods listed[^1][^2]. Milwaukee is not a large city compared with London, and this number of neighbourhoods seemed excessive. In the end I decided to use a list prepared by the University of Wisconsin–Milwaukee[^3]. It lists 48 neighbourhoods and roughly matches the list on Wikipedia[^4], but its more historical focus means it is likely to match my wife's and her family's notions more closely.

[^1]: https://data.milwaukee.gov/dataset/0f5695f6-bca1-46e9-832b-54d1d906d28e/resource/f69d2c4b-e0d7-406e-98fe-1e888a3304b1/download/neighborhood.zip
[^2]: https://city.milwaukee.gov/Neighborhoods
[^3]: https://guides.library.uwm.edu/c.php?g=56373&p=362687
[^4]: https://en.wikipedia.org/wiki/List_of_neighborhoods_of_Milwaukee

London is divided into 32 boroughs plus the City of London (the City of Westminster is also a borough). Because the boroughs can be very diverse internally, and given London's size, I decided to represent neighbourhoods in London as postcode districts (e.g. SW5 or SW19). This is also a fairly common way for Londoners to refer to individual neighbourhoods. There are 121 postcode districts in London, each of which is still likely to have a larger population than a Milwaukee neighbourhood.

To measure similarity I decided to use two main metrics. The first is the mix of consumer businesses and leisure venues as indexed by Foursquare. The second, if feasible, is the average house price for each neighbourhood (relative to the overall average house price). This will be simple enough in London: splitting by postcode makes it easy to aggregate the Land Registry's records of house sales. Real estate sales in Milwaukee are listed by address, so for this to work I will need a way to allocate Milwaukee addresses to the correct neighbourhoods.
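As a minimal sketch of the relative house-price metric, assuming each sale has already been tagged with a neighbourhood, the mean sale price per neighbourhood can be divided by the mean over all sales in the city. The `sales` table, its column names, the area names and the prices below are made up for illustration; the Land Registry and Milwaukee records will each need their own loading and cleaning.

```python
import pandas as pd

# Made-up sales records: one row per sale, already tagged with a neighbourhood.
sales = pd.DataFrame({
    "neighbourhood": ["Area A", "Area A", "Area B", "Area B", "Area B"],
    "price": [900_000, 1_100_000, 600_000, 700_000, 650_000],
})


def relative_mean_price(sales: pd.DataFrame) -> pd.Series:
    """Mean sale price per neighbourhood divided by the mean over all sales."""
    overall_mean = sales["price"].mean()
    return sales.groupby("neighbourhood")["price"].mean() / overall_mean


print(relative_mean_price(sales))
```

A value above 1 marks a neighbourhood that is more expensive than its city's average. Because each city is normalised by its own mean, the ratios can be compared between London and Milwaukee without any currency conversion; the median is a possible alternative if a handful of very expensive sales skew the means.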

### Method
My plan is to do the following:
#### For Milwaukee
* Scrape a source for the list of Milwaukee neighbourhoods
* Geocode to find their 'centres' according to Google
* Retrieve property sales from City Council records
* Allocate sold properties to neighbourhoods if possible
* Calculate mean house prices for neighbourhoods if possible
* Pull data for the top 100 (say) venues within 500m or 1000m of the centre
* Split venue numbers for each neighbourhood by type
* Find proportions of each type of venue in/near the neighbourhood
* Find clusters of similar neighbourhoods
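One simple way to allocate a sold Milwaukee property to a neighbourhood, assuming its address has already been geocoded to a latitude and longitude, is to assign it to the nearest geocoded neighbourhood 'centre' by great-circle (haversine) distance. This is a stand-in technique rather than a real boundary lookup: it effectively draws straight borders halfway between centres, so properties near irregular boundaries may be misallocated. The function names, neighbourhood names and coordinates below are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_centre(lat, lon, centres):
    """Name of the neighbourhood whose centre is closest to the point (lat, lon).

    centres maps neighbourhood name -> (lat, lon) of its geocoded centre.
    """
    return min(centres, key=lambda name: haversine_m(lat, lon, *centres[name]))


# Hypothetical centres and a hypothetical geocoded sale.
centres = {"Neighbourhood A": (43.07, -87.90), "Neighbourhood B": (42.99, -87.90)}
print(nearest_centre(43.06, -87.91, centres))
```

If neighbourhood boundary polygons can be found, a point-in-polygon test would be a more faithful allocation than nearest centre.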
#### For London
* Scrape sources for a list of London postcode districts
* Geocode for the 'centres' of the postcode districts according to Google
* Retrieve property sales from the Land Registry
* Allocate sold properties to postcode districts
* Calculate mean house prices for neighbourhoods if possible
* Pull data for the top 100 (say) venues within 500m or 1000m of the centre
* Split venue numbers for each neighbourhood by type
* Find proportions of each type of venue in/near the neighbourhood
* Find clusters of similar neighbourhoods
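For London, allocating a sale to a postcode district only needs the outward part of its full postcode. A minimal sketch, relying on the fact that the inward code of a UK postcode is always three characters (a digit followed by two letters), so everything before it is the district. The helper name is hypothetical and it assumes a complete, valid postcode.

```python
def postcode_district(postcode: str) -> str:
    """Return the outward code (postcode district) of a full UK postcode.

    Assumes a complete, valid postcode, with or without the space,
    e.g. 'SW19 1AA' or 'sw191aa'.
    """
    compact = postcode.replace(" ", "").upper()
    # The inward code is always the last three characters.
    return compact[:-3]


for pc in ["SW19 1AA", "sw5 9aa", "EC1A 1BB"]:
    print(pc, "->", postcode_district(pc))
```

Central London codes such as `EC1A` or `W1D` are sub-districts; whether to collapse them to `EC1` or `W1` depends on which list of districts is used to define the neighbourhoods.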
#### Together
* Find common clusters of neighbourhoods across both cities
* Check against the intra-city clusters to ensure reliability
* Classify neighbourhood types and see where I should visit or live!
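The venue-proportion and clustering steps could look something like the sketch below. It assumes the Foursquare results have already been flattened into one row per venue with a neighbourhood name and a category (the table here is made up), uses `pd.crosstab(..., normalize="index")` to turn counts into per-neighbourhood proportions, and clusters the resulting rows with a plain k-means (Lloyd's algorithm) written in NumPy. k-means is my choice of clustering method for illustration, not something fixed by the plan; scikit-learn's `KMeans` would do the same job.

```python
import numpy as np
import pandas as pd

# Made-up venue table: one row per venue returned for a neighbourhood.
venues = pd.DataFrame({
    "neighbourhood": ["London A"] * 3 + ["London B"] * 3 + ["Milwaukee A"] * 3 + ["Milwaukee B"] * 3,
    "category": [
        "Coffee Shop", "Coffee Shop", "Pub",  # London A
        "Pub", "Pub", "Bar",                  # London B
        "Coffee Shop", "Coffee Shop", "Bar",  # Milwaukee A
        "Pub", "Pub", "Pub",                  # Milwaukee B
    ],
})

# Proportion of each venue category per neighbourhood (each row sums to 1).
proportions = pd.crosstab(venues["neighbourhood"], venues["category"], normalize="index")


def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows of X chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if the cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


labels = kmeans(proportions.to_numpy(), k=2)
print(pd.Series(labels, index=proportions.index))
```

Because neighbourhoods from both cities sit in the same table, a cluster can contain neighbourhoods from London and Milwaukee together, which is the cross-city matching the last three steps are after. With real data the number of clusters `k` has to be chosen (for example by trying several values), and the result can depend on the random initial centroids, so comparing a few seeds is worthwhile.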

### Caveats & Notes
The latest property data I can obtain is from 2018. Given that few areas are likely to change substantially in two to three years, this should not be a problem.

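Venue types in the UK and the US may be categorised quite differently, which would make cross-city comparison harder. This seems unlikely to be a major problem, as Foursquare appears to apply the same categories in both countries, e.g. 'Coffee Shop' rather than 'Café' in the UK.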

London postcode districts and Milwaukee neighbourhoods can differ markedly in size (compare SW5 with SW19, or Buchell Park with Bay View). This means that a search radius of around 300m may suit one neighbourhood while 1500m would be more reasonable for another. I am not going to try to address this directly, but I will try a range of radii in each city. It does mean that venues in a cluster of small neighbourhoods may be counted multiple times.
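To see the radius trade-off concretely, the sketch below lists which neighbourhood centres lie within a given radius of a single venue, using great-circle (haversine) distance. The two centres (roughly 900m apart), the venue halfway between them, and the function names are all hypothetical. At 300m the venue is counted for neither neighbourhood; at 1500m it is counted for both.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def neighbourhoods_within(venue, centres, radius_m):
    """Names of all neighbourhood centres within radius_m metres of a venue (lat, lon)."""
    return [
        name
        for name, (lat, lon) in centres.items()
        if haversine_m(venue[0], venue[1], lat, lon) <= radius_m
    ]


# Hypothetical centres roughly 900m apart, and a venue halfway between them.
centres = {"A": (51.490, -0.190), "B": (51.495, -0.180)}
venue = (51.4925, -0.185)

for radius in (300, 1500):
    print(radius, neighbourhoods_within(venue, centres, radius))
```

Summing, over all venues, the number of neighbourhoods each one is counted in gives a quick measure of how much double counting a chosen radius introduces in each city.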

Moreover, some London postcode districts and Milwaukee neighbourhoods are far from circular (e.g. Riverwest and W13). With a circular search area, this will lead to more overlap at large radii and to missed venues at small radii.

Despite these limitations, I hope that some of the results will tally with local knowledge.

## Data acquisition

### Library loading
I will put *all* my imports here to avoid scattering them throughout the document.

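```python
import numpy as np               # numerical arrays
import pandas as pd              # tabular data
import matplotlib as mpl         # plotting configuration
import matplotlib.pyplot as plt  # plotting
import seaborn as sb             # statistical plots
import requests                  # HTTP requests (scraping, geocoding, Foursquare)
import math                      # distance calculations
```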

### Milwaukee neighbourhoods