## Proportional representation of UK constituencies

The idea behind this project is to see how a proportional representation system would have altered the results of the last UK general election. The plan is to merge neighbouring UK constituencies into 'super' constituencies made up of 2, 3, 4 or more of them, and then to use the D'Hondt method to allocate the seats within each 'super' constituency.
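
As a rough illustration of the allocation step, here is a minimal sketch of the D'Hondt highest-averages rule. The function name `dhondt`, the dict-based input format and the party names are assumptions made for this example, and ties are broken arbitrarily by dict order.

```python
def dhondt(votes, seats):
    """Allocate `seats` among parties with the D'Hondt highest-averages rule.

    votes: dict mapping party name -> vote total (assumed input format).
    Returns a dict mapping party name -> number of seats won.
    """
    allocation = {party: 0 for party in votes}
    for _ in range(seats):
        # A party's current quotient is votes / (seats already won + 1).
        # Ties go to whichever party comes first in the dict (arbitrary choice).
        winner = max(votes, key=lambda p: votes[p] / (allocation[p] + 1))
        allocation[winner] += 1
    return allocation


# Made-up vote totals for a 'super' constituency with 8 seats.
example = {"A": 100_000, "B": 80_000, "C": 30_000, "D": 20_000}
print(dhondt(example, 8))  # {'A': 4, 'B': 3, 'C': 1, 'D': 0}
```

In a 'super' constituency the number of seats would be the number of original constituencies merged into it.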

In [1]:
import numpy as np
import pandas as pd
import geopandas as gp

# import seaborn as sns
# import matplotlib.pyplot as plt

# %matplotlib inline

In [2]:
df = gp.read_file("../Data/Westminster_Parliamentary_Constituencies_December_2017_UK_BFC/Westminster_Parliamentary_Constituencies_December_2017_UK_BFC.shp")
df = df.rename(columns={"PCON17NM": "Name"})

Now let's scrape Wikipedia for details about the UK constituencies, namely which region each one is in.

In [3]:
import requests
import urllib.request
import time
from bs4 import BeautifulSoup

In [4]:
url = "https://en.wikipedia.org/wiki/United_Kingdom_Parliament_constituencies"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
constituencies_table = soup.findAll('table',{'class':'wikitable sortable'})

In [5]:
const_dict = {}
countries = ['England','Scotland','Wales','Northern Ireland']
for i in range(len(constituencies_table)):
    for items in constituencies_table[i].findAll('tr')[1:]:
        data = items.find_all(['th','td'])
        try:
            constituency = data[0].a.text
            electorate_2017 = data[1].text
            county = data[2].a.text
            if i == 0:
                region = data[3].text[:-1] # Remove the "\n"
            else:
                region = countries[i]
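        except (IndexError, AttributeError):
            # Skip malformed rows instead of recording stale values left over from the previous row.
            continue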
        const_dict[constituency] = pd.DataFrame.from_records(
            [{'constituency': constituency, 'electorate_2017': electorate_2017, 'county': county, 'region': region}] #, index=constituency
        )
const_df = pd.concat(const_dict).reset_index(drop=True)

In [6]:
# Bit of string replacement so that we can merge the datasets.
# Vectorised .str.replace avoids the chained-assignment writes of an element-by-element loop.
const_df['constituency'] = (
    const_df['constituency']
    .str.replace("St ", "St. ", regex=False)
    # Also need to remove the 'ô' in "Ynys Môn"
    .str.replace("ô", "o", regex=False)
)
# Now merge new information
# df['constituency'] = [x.rsplit(' ', 1)[0].rsplit(' ', 1)[0] for x in df['name']]
df = df.merge(const_df, left_on='Name', right_on='constituency')

In [9]:
# The plan is to take every single constituency and find its 'neighbouring' constituencies. We will use the 'disjoint' method on the
# geometries to find out whether two constituencies are not bordering.
# This will only be done on a region by region basis, as we are not interested in, say, a Scottish constituency that borders one in
# North East England.
# Create the pairs and then see if we can use Algorithm X to fit them into a region.
pair_const = {}
k = 0
for i in range(len(df)):
    iregion = df['region'][i]
    for j in range(i+1, len(df)):
        if (df['region'][j] == iregion):
            if not df.geometry[i].disjoint(df.geometry[j]):
                k += 1
                pair_const[k] = pd.DataFrame({'region': [iregion], 'pairing': k, 'name1': [df['Name'][i]], 'name2': [df['Name'][j]]})
const_pairs = pd.concat(pair_const).reset_index(drop=True)

In [11]:
# Now need to find all constituencies which haven't got a neighbouring constituency
paired_const = set(const_pairs['name1']).union(set(const_pairs['name2']))
unpaired_const = set(df['Name']).difference(paired_const)
print(unpaired_const)
# For the moment we will leave these out. 
# One reason is that 'Isle of Wight', 'Na h-Eileanan an Iar' and 'Orkney and Shetland' have protected status so that they have constituency
# boundaries defined exclusively by geography rather than by (or partly by) size of electorate.

{'Isle of Wight', 'Orkney and Shetland', 'Na h-Eileanan an Iar', 'Ynys Mon'}


To find merged constituencies of size 2 we can filter the data frame `const_pairs` to a single region and then build a dictionary directly from its rows, mapping each pairing id to the two constituencies it contains.

In [39]:
from algo_x import *

def all_solns(const_pairs, region):
    # Use the `region` argument rather than a global variable.
    df = const_pairs[const_pairs['region'] == region]
    # Map each pairing id to the set of two constituencies it covers.
    Y = {}
    for i in range(len(df)):
        Y[df['pairing'].iloc[i]] = {df['name1'].iloc[i], df['name2'].iloc[i]}

    solutions = ExactCover(Y, random = True)
    i = 0
    for a in solutions:
        i += 1
    # Find out how many constituencies there are in the dictionary.
    X = set([x for y in Y.values() for x in y])
    print(f"For the {region} region there are {i} solutions when there are {len(X)} constituencies.")

In [40]:
region_name = 'Northern Ireland'
start = time.time()
all_solns(const_pairs, region_name)
end = time.time()
print(f"The time taken is {end - start:.4f}s")

For the Northern Ireland region there are 129 solutions when there are 18 constituencies.
The time taken is 0.0119s


That works out quite nicely. However, Northern Ireland has only 18 constituencies, and that number is even. If we repeat the above for the 'North East', which has an odd number of constituencies, there cannot be any solutions made up only of pairs.

In [51]:
region_name = 'North East'
start = time.time()
all_solns(const_pairs, region_name)
end = time.time()
print(f"The time taken is {end - start:.4f}s")

For the North East region there are 0 solutions when there are 29 constituencies.
The time taken is 0.0108s
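
A cheap check before running the solver can flag regions where a pairing is impossible. The sketch below tests two necessary (but not sufficient) conditions: an even number of constituencies, and every constituency having at least one neighbour. The function name `pairing_obstacles` and the toy data are assumptions made for illustration.

```python
def pairing_obstacles(names, pairs):
    """Return reasons why `names` cannot be split entirely into neighbouring pairs.

    An empty list does not guarantee that a solution exists; these are only
    necessary conditions.
    """
    problems = []
    if len(names) % 2 == 1:
        problems.append(f"odd number of constituencies ({len(names)})")
    paired = {name for pair in pairs for name in pair}
    isolated = sorted(set(names) - paired)
    if isolated:
        problems.append(f"no neighbours: {isolated}")
    return problems


# Toy data with hypothetical names: three constituencies in a row.
names = ["A", "B", "C"]
pairs = [("A", "B"), ("B", "C")]
print(pairing_obstacles(names, pairs))  # ['odd number of constituencies (3)']
```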


One possible fix is to remove one constituency at random. However, this can seriously affect the number of solutions, especially when the remaining constituencies have few neighbours. For example, York Central is completely surrounded by York Outer, so if we removed York Outer then York Central would have no partner and there would be no solutions.

An initial thought is to pick a three-way neighbouring group at random and make it part of the solution. This shouldn't be too hard to implement if we want the 'super' constituencies to be of size 2. However, if we want the majority to be of size 3, then for London, with 73 constituencies, we would end up with 23 three-merged constituencies and 2 two-merged constituencies (23 × 3 + 2 × 2 = 73), which may be tougher to implement. There is also the question of whether this three-merged constituency stays constant across all of our solutions (definitely not) or changes with every solution (it should, but how would that affect the time taken?).
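
The arithmetic for mixing group sizes can be sketched as follows. The helper name `split_into_threes_and_twos` is an assumption for this example, and it only counts groups; it says nothing about whether a geographically valid grouping exists.

```python
def split_into_threes_and_twos(n):
    """Return (threes, twos) with 3 * threes + 2 * twos == n, using as few twos as possible."""
    if n < 2:
        raise ValueError("need at least 2 constituencies")
    # n % 3 == 0 needs no pairs, a remainder of 2 needs one pair,
    # and a remainder of 1 needs two pairs (4 = 2 + 2).
    twos = {0: 0, 1: 2, 2: 1}[n % 3]
    threes = (n - 2 * twos) // 3
    return threes, twos


print(split_into_threes_and_twos(73))  # (23, 2)
print(split_into_threes_and_twos(29))  # (9, 1)
```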

One other issue with the ExactCover code is that it returns every solution. This isn't a problem for some of the regions, but for London, for example, it ran for several hours without completing. Ideally we would want at least 10,000 solutions for each region (where that many exist). We would also want to change the code so that the starting point is more random. At present it appears to start in the same place every time, since the code is written to find every solution, which is sub-optimal for what we want to do.
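
One way to address both points is a generator that randomises its branching order and is consumed lazily, so we can stop after a fixed number of solutions. The following is a standalone sketch of dict-based Algorithm X, not the `algo_x` implementation; the names `random_exact_covers`, `_solve`, `_select` and `_deselect` are assumptions, and elements and keys must be sortable for seeded runs to be reproducible.

```python
import random
from itertools import islice


def random_exact_covers(Y, rng=None):
    """Lazily yield exact covers (lists of keys of Y) in a randomised search order."""
    rng = rng or random.Random()
    Y = {key: tuple(members) for key, members in Y.items()}  # freeze iteration order
    # X maps each element to the set of keys whose subset contains it.
    X = {}
    for key, members in Y.items():
        for m in members:
            X.setdefault(m, set()).add(key)
    yield from _solve(X, Y, [], rng)


def _solve(X, Y, partial, rng):
    if not X:
        yield list(partial)
        return
    # Branch on a most-constrained element, breaking ties at random.
    fewest = min(len(rows) for rows in X.values())
    col = rng.choice(sorted(c for c, rows in X.items() if len(rows) == fewest))
    rows = sorted(X[col])
    rng.shuffle(rows)
    for r in rows:
        partial.append(r)
        removed = _select(X, Y, r)
        yield from _solve(X, Y, partial, rng)
        _deselect(X, Y, r, removed)
        partial.pop()


def _select(X, Y, r):
    """Choose subset r: drop its elements and every subset that clashes with it."""
    cols = []
    for j in Y[r]:
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].remove(i)
        cols.append(X.pop(j))
    return cols


def _deselect(X, Y, r, cols):
    """Undo _select, restoring X exactly."""
    for j in reversed(Y[r]):
        X[j] = cols.pop()
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].add(i)


# Toy pairings (hypothetical): key -> the two constituencies it merges.
Y = {1: {"A", "B"}, 2: {"C", "D"}, 3: {"A", "C"}, 4: {"B", "D"}, 5: {"B", "C"}}
# Stop after at most 10 solutions instead of enumerating everything.
print(list(islice(random_exact_covers(Y, random.Random(0)), 10)))
```

Note that the first N solutions from a single randomised depth-first search tend to share their early choices, so they are not a uniform sample. Restarting the generator with a fresh seed and taking only the first solution each time gives more varied results, at the cost of possible duplicates.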

#### Future plans:
Once the issues with the Exact Cover code are solved, we will need to get all of the results of the 2019 UK general election (and possibly of previous elections).

Using the solutions, we would aggregate the constituency election results within each 'super' constituency, apply the D'Hondt method for up to, say, 10,000 simulations, and report the results.
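
The aggregation step might look like the sketch below. The results table, its column names (`constituency`, `party`, `votes`), the party names and the vote counts are all made up for illustration; real election data would need to be loaded and cleaned first.

```python
import pandas as pd

# Hypothetical per-constituency results (not real data).
results = pd.DataFrame({
    "constituency": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "party": ["Red", "Blue"] * 4,
    "votes": [10, 20, 30, 5, 8, 12, 25, 15],
})

# One exact-cover solution: each 'super' constituency is a set of members.
solution = [{"A", "B"}, {"C", "D"}]

# Map every constituency to the index of its 'super' constituency.
membership = {name: idx for idx, group in enumerate(solution) for name in group}

# Sum each party's votes within every 'super' constituency.
merged = (
    results.assign(super_id=results["constituency"].map(membership))
    .groupby(["super_id", "party"], as_index=False)["votes"]
    .sum()
)

# Seats available in each 'super' constituency: one per merged constituency.
seats_available = {idx: len(group) for idx, group in enumerate(solution)}

print(merged)
print(seats_available)
```

Each `super_id` group, together with its entry in `seats_available`, is then the input to the seat-allocation rule.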