# Brummy or Mancunian - Which one are you? [Birmingham|Manchester]

## Introduction

When we think of the UK, we think of London. Although a great place to visit, London isn’t necessarily affordable for everyone as a place to live. Because of this, many prefer living in a city close to London: they can rent a bigger house for less and save on many other things while earning a decent income, and they can still drop into London within an hour or two. Smart.

A problem with this approach is that if you just want a quick night out, maybe dinner at a fancy restaurant or a visit to a museum or a park, you shouldn’t have to sit on a train for over an hour every single time. The city you choose should itself have the things you like to do.

What then, is the next best thing? 

A quick Google search revealed people prefer either Birmingham or Manchester. Many say Birmingham’s better, even better than London, while many side with Manchester. 

But which one’s better for you?


## Data

I will use Foursquare to explore both cities. This should help a person select the city they’d like to live in.

I will then use Foursquare to explore the neighborhoods to help the person select a place to call home.

I will get the list of postal codes and neighborhoods from the following:

Birmingham - https://en.wikipedia.org/wiki/B_postcode_area

Manchester - https://en.wikipedia.org/wiki/M_postcode_area

I will use geopy's Nominatim geocoder to get the latitudes and longitudes needed.

## Disclaimer

I am doing this project to learn. Needless to say, I strongly advise you against using this to make a major life decision!

But yes, feel free to go through the notebook and let me know what you think.

In [None]:
# import bs4 as bs
# import urllib.request
import pandas as pd
import numpy as np
import requests
from pandas import json_normalize  # pandas >= 1.0; the old pandas.io.json import path is deprecated
# import sklearn
from sklearn.cluster import KMeans
from matplotlib import cm
from matplotlib import colors

# !conda install geopandas
# !conda install geopy
# import geopandas
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

# !conda install -c conda-forge folium=0.5.0 --yes
import folium

In [None]:
# @hidden_cell
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
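
Rather than pasting keys into a cell that gets shared, one option is to read them from environment variables. This is only a minimal sketch: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are my own assumptions, not something Foursquare prescribes.

```python
import os

def load_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    """
    env = os.environ if env is None else env
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail loudly instead of sending requests with empty credentials
        raise RuntimeError(f"Set the environment variable {missing} first") from None
```

In the notebook this would be used as `CLIENT_ID, CLIENT_SECRET = load_credentials()`.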

## Prepare Birmingham dataset

In [None]:
# read every table on the page; the postcode table is picked out below
dfs = pd.read_html('https://en.wikipedia.org/wiki/B_postcode_area',header=0)
bir = dfs[1]
bir.head()

In [None]:
bir['Post town'].value_counts()

Let's select only Birmingham.

In [None]:
# .copy() so later in-place edits act on an independent frame, not a view of the original
bir = bir[bir['Post town'] == 'BIRMINGHAM'].copy()
bir['Post town'].value_counts()

In [None]:
bir.isnull().sum()

There is one null value. Drop it.

In [None]:
bir.dropna(subset=['Coverage'],inplace=True)
bir.isnull().sum()

Now, let's add the coordinates.

In [None]:
locator = Nominatim(user_agent="myGeocoder")

# 1 - convenient wrapper that waits at least 1 second between geocoding calls
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)

# 2 - create location column
bir['location'] = bir['Coverage'].apply(geocode)
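
A bare place name can match more than one location, which may be one reason some points land in the wrong spot. A minimal sketch of a more specific query, assuming the `Coverage` cells are comma-separated lists of place names; `build_query` is a hypothetical helper of mine, not part of geopy.

```python
def build_query(coverage, city, country="UK"):
    """Turn a Coverage cell into a more specific geocoding query."""
    # Use only the first place name listed in the cell
    first_place = coverage.split(",")[0].strip()
    # Avoid "Birmingham City Centre, Birmingham, UK" style repetition
    if city.lower() in first_place.lower():
        return f"{first_place}, {country}"
    return f"{first_place}, {city}, {country}"

print(build_query("Handsworth, Handsworth Wood", "Birmingham"))
```

It could then be plugged in as `bir['Coverage'].apply(lambda c: geocode(build_query(c, 'Birmingham')))`.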

In [None]:
# 3 - create a (latitude, longitude, altitude) tuple from the location column
bir['point'] = bir['location'].apply(lambda loc: tuple(loc.point) if loc else None)

# 4 - Drop null values
bir.dropna(subset=['location','point'],inplace=True)

# 5 - split point column into latitude, longitude and altitude columns
bir[['latitude', 'longitude', 'altitude']] = pd.DataFrame(bir['point'].tolist(), index=bir.index)

In [None]:
bir.head(3)

In [None]:
bir.isnull().sum()

Let's drop 'Postcode district', 'Post town', 'Local authority area', 'location', 'point' and 'altitude', and rename 'Coverage' to 'Neighborhood'.

In [None]:
bir.drop(columns=['Postcode district','Post town','Local authority area','location','point','altitude'], inplace=True)
bir.rename({'Coverage':'Neighborhood'},axis=1,inplace=True)
bir.head()

## Prepare Manchester dataset

In [None]:
# read every table on the page; the postcode table is picked out below
dfs = pd.read_html('https://en.wikipedia.org/wiki/M_postcode_area',header=0)
man = dfs[1]
man.head()

In [None]:
man['Post town'].value_counts()

In [None]:
# .copy() so later in-place edits act on an independent frame, not a view of the original
man = man[man['Post town'] == 'MANCHESTER'].copy()
man['Post town'].value_counts()

In [None]:
man.isnull().sum()

No nulls. Now add coordinates.

In [None]:
man['location'] = man['Coverage'].apply(geocode)
man['point'] = man['location'].apply(lambda loc: tuple(loc.point) if loc else None)
man[['latitude', 'longitude', 'altitude']] = pd.DataFrame(man['point'].tolist(), index=man.index)

In [None]:
man.drop(columns=['Postcode district','Post town','Local authority area','location','point','altitude'], inplace=True)
man.head(3)

In [None]:
man.isnull().sum()

In [None]:
man.dropna(subset=['latitude','longitude'],inplace=True)
man.isnull().sum()

In [None]:
man.rename({'Coverage':'Neighborhood'},axis=1,inplace=True)

In [None]:
man.head()

## Let's explore!

First, let's explore each city: mainly, what is trending there, or what its top venues are. This will help you understand what kind of city it is, and whether it is for you. For example, if you are a foodie and Manchester has many good restaurants, then you know where to go.

In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return one row per venue found within `radius` metres of each neighborhood."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']

        # keep only the name and primary category of each nearby venue
        # (venues without any category are skipped to avoid an IndexError)
        venues_list.append([(
            v['venue']['name'],
            v['venue']['categories'][0]['name']) for v in results if v['venue']['categories']])

    # passing the column names here also works when no venues were found
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Venue', 'Venue Category'])

    return nearby_venues
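
The function above receives each neighborhood's name but does not store it with the venues. A small sketch of how the name could travel with every venue row, so later grouping does not have to rely on row order. `venues_to_rows` is a hypothetical helper and the sample items are made up; they only mimic the nesting (`venue`, `name`, `categories`) that the function above indexes into.

```python
def venues_to_rows(neighborhood, items):
    """Flatten the items of one explore response into (neighborhood, venue, category) rows."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Some venues may have no category; keep them with None instead of failing
        category = categories[0]["name"] if categories else None
        rows.append((neighborhood, venue["name"], category))
    return rows

# Made-up data, not real Foursquare output
sample_items = [
    {"venue": {"name": "Example Pub", "categories": [{"name": "Pub"}]}},
    {"venue": {"name": "Example Cafe", "categories": []}},
]
rows = venues_to_rows("Digbeth", sample_items)
print(rows)
```

Rows in this shape can go straight into `pd.DataFrame(rows, columns=['Neighborhood', 'Venue', 'Venue Category'])`.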

## Top 10 categories in Birmingham

In [None]:
bir_venues = getNearbyVenues(names=bir['Neighborhood'],
                             latitudes=bir['latitude'],
                             longitudes=bir['longitude']
                            )

In [None]:
bir_venues.head()

In [None]:
print("Number of places of interest:", bir_venues.shape[0])

In [None]:
bir_top = bir_venues['Venue Category'].value_counts()[0:10]
bir_top

## Top 10 categories in Manchester

In [None]:
man_venues = getNearbyVenues(names=man['Neighborhood'],
                             latitudes=man['latitude'],
                             longitudes=man['longitude']
                            )

In [None]:
man_venues.head()

In [None]:
print("Number of places of interest:", man_venues.shape[0])

In [None]:
man_top = man_venues['Venue Category'].value_counts()[0:10]
man_top

In [None]:
(bir_venues['Venue Category']=="Zoo").sum(), (man_venues['Venue Category']=="Zoo").sum() 

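No zoo?! Dealbreaker. Haha, I'm sure both cities have one. The search only covers a 500 m radius around each neighborhood's coordinates, with at most 30 venues per request, so plenty of places never show up.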

## Observations:

#### **Similarities**
* Pubs
* Indian Restaurants
* Coffee Shops & Cafes
* Bars
* Restaurants
* Supermarkets

#### **Differences**
* Birmingham has more Fast Food Restaurants, Italian Restaurants and Soccer Stadiums.
* Manchester has more Parks, Hotels and Grocery Stores.
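
Because the two cities return different numbers of venues, raw counts can mislead when saying that one city "has more" of something. A small sketch, using made-up category lists rather than the real results, that compares each category's share of all venues instead. The helper names are my own.

```python
from collections import Counter

def category_shares(categories):
    """Map each category to its fraction of all venues in the list."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

def compare_cities(cats_a, cats_b):
    """Return (shared categories, share difference a - b for each shared one)."""
    shares_a, shares_b = category_shares(cats_a), category_shares(cats_b)
    shared = sorted(set(shares_a) & set(shares_b))
    diff = {cat: shares_a[cat] - shares_b[cat] for cat in shared}
    return shared, diff

# Toy inputs, not real Foursquare output
city_a = ["Pub", "Pub", "Park", "Bar"]
city_b = ["Pub", "Park", "Park", "Hotel", "Park"]
shared, diff = compare_cities(city_a, city_b)
print(shared)
print(diff)
```

With the notebook's dataframes, the same shares come from `bir_venues['Venue Category'].value_counts(normalize=True)`.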


## Cluster the neighborhoods

I will cluster the neighborhoods by how often each venue category appears in them, then describe each neighborhood by its top 10 venue categories. For that, I need to

- One-hot encode the venue categories
- Group by neighborhood
- Find the top 10 categories
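
The steps in this list can be sketched on a toy table. The data below is made up, and it assumes every venue row already carries its own neighborhood name, which makes the grouping unambiguous.

```python
import pandas as pd

# Toy venue table, not real Foursquare output
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Pub", "Pub", "Park", "Cafe", "Pub"],
})

# One 0/1 column per category
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of the 0/1 columns = share of each category within a neighborhood
grouped = onehot.groupby("Neighborhood").mean()

# Top 2 categories per neighborhood, most frequent first
top2 = {name: row.sort_values(ascending=False).index[:2].tolist()
        for name, row in grouped.iterrows()}
print(grouped)
print(top2)
```

The `grouped` frame is the kind of frequency table that k-means is later fitted on.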

## Birmingham

In [None]:
bir_onehot = pd.get_dummies(bir_venues[['Venue Category']], prefix="", prefix_sep="")

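# add neighborhood column back to dataframe
# NOTE: pandas aligns this assignment on the row index, so a venue row only gets the
# right name if bir_venues and bir share an index. getNearbyVenues does not return the
# neighborhood each venue came from, so check these labels before trusting them.
bir_onehot['Neighborhood'] = bir['Neighborhood'] 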

bir_onehot.set_index("Neighborhood",inplace=True)

bir_onehot.head(3)

In [None]:
# Group by neighborhood
bir_grouped = bir_onehot.groupby('Neighborhood').mean().reset_index()

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = bir_grouped['Neighborhood']

for ind in np.arange(bir_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bir_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

In [None]:
kclusters = 3

bir_grouped_clustering = bir_grouped.drop(columns='Neighborhood')

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bir_grouped_clustering)

kmeans.labels_
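
The value `kclusters = 3` is picked by hand. One common heuristic, which this notebook does not use, is to compare the mean silhouette score for several values of k and keep the best one. A hedged sketch on synthetic data; `pick_k` is a hypothetical helper and the blobs below are not the notebook's data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def pick_k(features, candidates=range(2, 7), random_state=0):
    """Return (best_k, scores) where scores maps k -> mean silhouette score."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Synthetic data: three tight, well-separated blobs
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 10], [-10, 10]])
toy = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

best_k, scores = pick_k(toy)
print(best_k, scores)
```

On the real data it would be called as `pick_k(bir_grouped_clustering)`. Scores on sparse category frequencies tend to be much less clear-cut than on this toy example.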

In [None]:
# New dataframe that includes the cluster as well as the top 10 venues for each neighborhood

# Drop old 'Cluster Labels' column if it exists
if "Cluster Labels" in neighborhoods_venues_sorted.columns:
    neighborhoods_venues_sorted = neighborhoods_venues_sorted.drop('Cluster Labels', axis=1)

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

bir_merged = bir

# join the cluster labels and top venues onto bir to keep latitude/longitude for each neighborhood
bir_merged = bir_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

bir_merged.head()

After running the visualization below, I noticed two markers well outside Birmingham. They are most likely geocoding mistakes, so let's drop them.

In [None]:
bir_merged.set_index('Neighborhood',inplace=True)
bir_merged.drop(['Handsworth','Yardley'],inplace=True)
bir_merged.reset_index(inplace=True)
bir_merged.head(2)
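
Instead of spotting stray markers on the map and dropping them by name, they could be flagged by their distance from the city centre. A minimal sketch using the haversine formula; the 30 km threshold, the helper names and the sample coordinates are assumptions for illustration only.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km: mean Earth radius

def far_from_centre(points, centre, max_km=30):
    """Return the names whose coordinates lie more than max_km from the centre."""
    return [name for name, (lat, lon) in points.items()
            if haversine_km(lat, lon, *centre) > max_km]

birmingham = (52.4862, -1.8904)   # the centre coordinates used for the map
points = {
    "Near": (52.50, -1.90),        # illustrative point close to the centre
    "Far": (53.4808, -2.2426),     # Manchester's centre, clearly outside
}
print(far_from_centre(points, birmingham))
```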

In [None]:
# Birmingham's coordinates
lat = 52.4862
lon = -1.8904
map_clusters = folium.Map(location=[lat,lon], zoom_start=10)

# set color scheme for the clusters (one color per cluster)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(bir_merged['latitude'], bir_merged['longitude'], bir_merged['Neighborhood'], bir_merged['Cluster Labels']):
    if pd.isna(cluster):
        continue  # skip neighborhoods that never received a cluster label
    cluster = int(cluster)  # labels become floats when the join introduces NaN
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [None]:
# bir_merged.loc[bir_merged['Cluster Labels'] == 0, bir_merged.columns[[1] + list(range(5, bir_merged.shape[1]))]]
b1 = bir_merged.loc[bir_merged['Cluster Labels'] == 0, :]
b1 = b1[['Neighborhood','1st Most Common Venue','2nd Most Common Venue','3rd Most Common Venue','4th Most Common Venue']]
b1

In [None]:
b2 = bir_merged.loc[bir_merged['Cluster Labels'] == 1, :]
b2 = b2[['Neighborhood','1st Most Common Venue','2nd Most Common Venue','3rd Most Common Venue','4th Most Common Venue']]
b2

In [None]:
b3 = bir_merged.loc[bir_merged['Cluster Labels'] == 2, :]
b3 = b3[['Neighborhood','1st Most Common Venue','2nd Most Common Venue','3rd Most Common Venue','4th Most Common Venue']]
b3
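
Reading three tables by eye is fine at this size, but a one-line summary per cluster can make it easier to see what sets the clusters apart. A sketch on a toy stand-in for `bir_merged`; the rows are made up and only the two column names match the notebook.

```python
import pandas as pd

# Toy stand-in for bir_merged
merged = pd.DataFrame({
    "Cluster Labels": [0, 0, 1, 1, 1, 2],
    "1st Most Common Venue": ["Pub", "Pub", "Coffee Shop", "Park", "Coffee Shop", "Hotel"],
})

# For each cluster, the category that most often ranks first in its neighborhoods
summary = (
    merged.groupby("Cluster Labels")["1st Most Common Venue"]
    .agg(lambda s: s.value_counts().idxmax())
)
print(summary)
```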

## Observations

- If you are Indian, or simply love Indian food, check out Edgbaston, Buckland End, Kingshurst and Hamstead.
- If close access to coffee shops is a must for you, check out the City Centre, Vauxhall, Stirchley and West Heath.
- If you like non-vegetarian food, many neighborhoods in Birmingham have Fried Chicken Joints.

## Manchester

In [None]:
man_onehot = pd.get_dummies(man_venues[['Venue Category']], prefix="", prefix_sep="")
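# NOTE: pandas aligns this assignment on the row index, so a venue row only gets the
# right name if man_venues and man share an index. getNearbyVenues does not return the
# neighborhood each venue came from, so check these labels before trusting them.
man_onehot['Neighborhood'] = man['Neighborhood'] 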
man_onehot.set_index("Neighborhood",inplace=True)
man_grouped = man_onehot.groupby('Neighborhood').mean().reset_index()
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)   
    return row_categories_sorted.index.values[0:num_top_venues]
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = man_grouped['Neighborhood']
for ind in np.arange(man_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(man_grouped.iloc[ind, :], num_top_venues)

kclusters = 3
man_grouped_clustering = man_grouped.drop(columns='Neighborhood')
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(man_grouped_clustering)
if "Cluster Labels" in neighborhoods_venues_sorted.columns:
    neighborhoods_venues_sorted = neighborhoods_venues_sorted.drop('Cluster Labels', axis=1)
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
man_merged = man
man_merged = man_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# Manchester's coordinates
lat = 53.4808
lon = -2.2426
map_clusters = folium.Map(location=[lat,lon], zoom_start=10)

# set color scheme for the clusters (one color per cluster)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(man_merged['latitude'], man_merged['longitude'], man_merged['Neighborhood'], man_merged['Cluster Labels']):
    if pd.isna(cluster):
        continue  # skip neighborhoods that never received a cluster label
    cluster = int(cluster)  # labels become floats when the join introduces NaN
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [None]:
m1 = man_merged.loc[man_merged['Cluster Labels'] == 0, :]
m1 = m1[['Neighborhood','1st Most Common Venue','2nd Most Common Venue','3rd Most Common Venue','4th Most Common Venue']]
m1

In [None]:
m2 = man_merged.loc[man_merged['Cluster Labels'] == 1, :]
m2 = m2[['Neighborhood','1st Most Common Venue','2nd Most Common Venue','3rd Most Common Venue','4th Most Common Venue']]
m2

In [None]:
m3 = man_merged.loc[man_merged['Cluster Labels'] == 2, :]
m3 = m3[['Neighborhood','1st Most Common Venue','2nd Most Common Venue','3rd Most Common Venue','4th Most Common Venue']]
m3

## Observations

- If you want close access to a Bar and a Flea Market, check out Deansgate, Tyldesley and Whitefield.
- If you're just visiting Manchester and want to stay in a hotel, check out Stretford and Trafford Park.
- If you're vegan, almost all neighborhoods will be fine.

## Conclusion