# Algorithm X application to constituency data

Previously we found all sets of 2 / 3 / 4 neighbouring constituencies, i.e. constituencies that share a border. We call these *sets*, and each has a unique identifier `set_no`. We will now apply Algorithm X to these sets to find (a subset of) the solutions in which every constituency is selected once and only once, i.e. exact covers of the constituencies. We shall do this on a region-by-region basis for two reasons:

1. it substantially reduces the number of possible combinations;
1. it also (mostly) keeps the political parties consistent within a merged seat, e.g. we won't merge a constituency in England with one in Wales, which could potentially halve the Plaid Cymru vote.
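Algorithm X is Knuth's backtracking algorithm for the exact cover problem. Here the items to cover are the constituencies and the candidate subsets are the sets. The sketch below is a minimal illustration using the dict-of-sets formulation popularised by Ali Assaf rather than Knuth's dancing links; the function names and the `{set_no: [constituencies]}` input layout are assumptions for illustration, not the code in `algox_modules.py`.

```python
from typing import Dict, Hashable, Iterable, Iterator, List, Set

Sets = Dict[Hashable, List[Hashable]]   # set_no -> constituencies in that set
Index = Dict[Hashable, Set[Hashable]]   # constituency -> set_nos containing it


def build_index(y: Sets, constituencies: Iterable[Hashable]) -> Index:
    """Map every constituency (even ones in no set) to the set_nos containing it."""
    x: Index = {const: set() for const in constituencies}
    for set_no, members in y.items():
        for const in members:
            x.setdefault(const, set()).add(set_no)
    return x


def select(x: Index, y: Sets, set_no) -> List[Set[Hashable]]:
    """Choose set_no: drop its constituencies and every set that overlaps them."""
    removed = []
    for const in y[set_no]:
        for other in x[const]:
            for other_const in y[other]:
                if other_const != const:
                    x[other_const].remove(other)
        removed.append(x.pop(const))
    return removed


def deselect(x: Index, y: Sets, set_no, removed: List[Set[Hashable]]) -> None:
    """Undo select() exactly, in reverse order."""
    for const in reversed(y[set_no]):
        x[const] = removed.pop()
        for other in x[const]:
            for other_const in y[other]:
                if other_const != const:
                    x[other_const].add(other)


def algorithm_x(x: Index, y: Sets, partial=None) -> Iterator[List[Hashable]]:
    """Yield every list of set_nos that covers each constituency exactly once."""
    if partial is None:
        partial = []
    if not x:  # every constituency is covered
        yield list(partial)
        return
    # Knuth's heuristic: branch on the constituency with the fewest candidate sets
    const = min(x, key=lambda c: len(x[c]))
    for set_no in list(x[const]):
        partial.append(set_no)
        removed = select(x, y, set_no)
        yield from algorithm_x(x, y, partial)
        deselect(x, y, set_no, removed)
        partial.pop()


# Toy "region": six constituencies in a ring, to be merged into pairs
toy_sets = {1: ["A", "B"], 2: ["C", "D"], 3: ["E", "F"],
            4: ["B", "C"], 5: ["D", "E"], 6: ["F", "A"]}
index = build_index(toy_sets, "ABCDEF")
print(sorted(sorted(s) for s in algorithm_x(index, toy_sets)))
# [[1, 2, 3], [4, 5, 6]]
```

Because `algorithm_x` is a generator, solutions are produced lazily, so a caller can stop after any number of them instead of enumerating every exact cover.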

The total number of constituencies in a region is often not divisible by 2 / 3 / 4. In these cases we remove sets of a different size until the remaining number is divisible. For example, the North East has 29 constituencies, so to find solutions that merge constituencies in pairs we pick one of the 3-constituency sets at random and remove its constituencies from the analysis, leaving 26. We then repeat the analysis with other randomly chosen 3-constituency sets removed until we have a large enough sample.
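The divisibility fix is a small search: how many sets of another size must be removed before the remaining count divides by the target size, and which non-overlapping sets to remove. A minimal sketch, with hypothetical helpers `sets_to_remove` and `drop_random_sets` (not from `algox_modules.py`) and sets held as `{set_no: frozenset of constituencies}`:

```python
import random
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


def sets_to_remove(n_consts: int, seats: int, other_size: int) -> Optional[int]:
    """Smallest number of `other_size` sets to remove so the rest divides by `seats`."""
    for r in range(n_consts // other_size + 1):
        if (n_consts - r * other_size) % seats == 0:
            return r
    return None  # impossible with this other_size, e.g. 29 with pairs when seats=4


def drop_random_sets(other_sets: Dict[int, FrozenSet[str]], r: int,
                     rng: random.Random) -> Optional[Tuple[List[int], Set[str]]]:
    """Greedily pick r mutually disjoint sets in random order.

    Returns (chosen set_nos, constituencies removed), or None if this
    random attempt could not find r disjoint sets.
    """
    order = list(other_sets)
    rng.shuffle(order)
    chosen: List[int] = []
    used: Set[str] = set()
    for set_no in order:
        if len(chosen) == r:
            break
        if used.isdisjoint(other_sets[set_no]):
            chosen.append(set_no)
            used |= other_sets[set_no]
    return (chosen, used) if len(chosen) == r else None


# North East: 29 constituencies
print(sets_to_remove(29, seats=2, other_size=3))  # 1 (26 left)
print(sets_to_remove(29, seats=4, other_size=3))  # 3 (20 left)
print(sets_to_remove(29, seats=4, other_size=2))  # None

# Toy sets. Assumption: once a 3-set is dropped, any pair touching its
# constituencies must also be excluded from the exact cover.
tris = {10: frozenset("ABC"), 11: frozenset("CDE"), 12: frozenset("FGH")}
pairs = {1: frozenset("AB"), 2: frozenset("DE"), 3: frozenset("GH")}
picked = drop_random_sets(tris, 1, random.Random(0))
if picked is not None:
    chosen, used = picked
    remaining_pairs = {k: v for k, v in pairs.items() if v.isdisjoint(used)}
```

The greedy random pick can fail when few disjoint sets exist, in which case a caller would simply retry with a different seed. Removing pairs can never turn an odd total into a multiple of 4, which is why the size of the removed sets matters.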

Some regions have a very large number of solutions, so we only keep a subset of them. In these cases we rerun the analysis with the dataframe resampled (its rows shuffled), which can change which solutions are found first and therefore which ones are kept.
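Resampling changes the kept subset when the solver's search order depends on the order of its input: shuffling the rows and taking only the first `max_solns` solutions of each run then yields a different sample each time. A minimal sketch, assuming a long-format table with hypothetical columns `set_no` and `constituency` and any solver that yields solutions lazily (`sets_from_df`, `sample_solutions` and `dummy_solver` are illustrative names):

```python
from itertools import islice
from typing import Callable, Dict, FrozenSet, Iterable, List, Set

import pandas as pd


def sets_from_df(df: pd.DataFrame) -> Dict[int, List[str]]:
    """Group (set_no, constituency) rows into {set_no: [constituencies]}, in row order."""
    y: Dict[int, List[str]] = {}
    # tolist() converts numpy scalars to plain Python values
    for set_no, const in zip(df["set_no"].tolist(), df["constituency"].tolist()):
        y.setdefault(set_no, []).append(const)
    return y


def sample_solutions(df: pd.DataFrame,
                     solver: Callable[[Dict[int, List[str]]], Iterable[List[int]]],
                     max_solns: float, n_reruns: int, seed: int = 0) -> Set[FrozenSet[int]]:
    """Rerun `solver` on row-shuffled copies of df, keeping at most max_solns per run."""
    kept: Set[FrozenSet[int]] = set()
    for run in range(n_reruns):
        shuffled = df.sample(frac=1, random_state=seed + run)  # all rows, new order
        y = sets_from_df(shuffled)
        # islice needs an int stop, so cast values written as e.g. 5e5
        for soln in islice(solver(y), int(max_solns)):
            kept.add(frozenset(soln))  # order-free, so duplicates across reruns collapse
    return kept


# Stand-in solver for illustration only: each set_no on its own is a "solution"
def dummy_solver(y: Dict[int, List[str]]) -> Iterable[List[int]]:
    for set_no in y:
        yield [set_no]


toy = pd.DataFrame({"set_no": [1, 1, 2, 2, 3, 3],
                    "constituency": ["A", "B", "B", "C", "C", "A"]})
print(sample_solutions(toy, dummy_solver, max_solns=1, n_reruns=3))
```

Storing each solution as a `frozenset` means the same exact cover found in two different reruns is only counted once.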

The (sampled) solutions will be saved as gzip-compressed CSV files, e.g. `Solutions/solns_London_4.csv.gz`.
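pandas infers gzip compression from a `.csv.gz` extension both when writing and when reading, so no extra arguments are needed. A minimal sketch, assuming a hypothetical layout of one row per solution with columns `set_0`, `set_1` and so on holding the chosen `set_no`s (the actual files may be laid out differently; `save_solutions` and its file-name pattern are illustrative):

```python
import os
import tempfile
from typing import Iterable, List

import pandas as pd


def save_solutions(solutions: Iterable[List[int]], region: str, seats: int,
                   folder: str = "Solutions") -> str:
    """Write one row per solution (its set_nos, sorted) to folder/solns_<region>_<seats>.csv.gz."""
    rows = [sorted(soln) for soln in solutions]
    n_sets = len(rows[0]) if rows else 0
    df = pd.DataFrame(rows, columns=[f"set_{i}" for i in range(n_sets)])
    path = os.path.join(folder, f"solns_{region}_{seats}.csv.gz")
    df.to_csv(path, index=False)  # compression="infer" (the default) -> gzip
    return path


with tempfile.TemporaryDirectory() as tmp:
    path = save_solutions([[3, 1, 2], [6, 4, 5]], "Toy", 2, folder=tmp)
    back = pd.read_csv(path)  # gzip is inferred on read as well
    print(f"{back.shape[0]:,}")  # 2
```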

All functions used are stored in the `algox_modules.py` file.


In [1]:
import numpy as np
import pandas as pd
from joblib import Parallel, delayed
from algox_modules import *  # provides get_solns

In [2]:
const_pairs = pd.read_csv("../Analysis/Data/const_pairs.csv.gz")
const_tris = pd.read_csv("../Analysis/Data/const_tris.csv.gz")
const_quads = pd.read_csv("../Analysis/Data/const_quads.csv.gz")

In [3]:
regions = np.unique(const_pairs['region'])

In [4]:
# Set up folders used to store logs and info during the run
import os
import glob

# exist_ok avoids failing when "Logs" exists but a subfolder does not
for folder in ["Logs/check", "Logs/DataFrames", "Logs/solns", "Solutions"]:
    os.makedirs(folder, exist_ok=True)

# Remove any files that were created in a previous run
def del_files(pattern):
    """Delete every file matching the glob `pattern`."""
    for f in glob.glob(pattern):
        os.remove(f)

for pattern in ["Logs/solns/soln_*.csv",
                "Logs/log_*.log",
                "Logs/DataFrames/df_*.csv.gz",
                "Logs/solns_*.csv.gz",
                "Logs/check/df_*.csv",
                "Solutions/solns_*.csv.gz"]:
    del_files(pattern)


In [5]:
# Find solutions for every (region, seats) combination in parallel with joblib
element_information = Parallel(n_jobs=5, verbose=10)(
    delayed(get_solns)(const_pairs, const_tris, const_quads, seats, region, max_solns=5e5) 
        for region in regions for seats in [2,3,4])

[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Done   3 tasks      | elapsed:  2.3min
[Parallel(n_jobs=5)]: Done   8 tasks      | elapsed: 17.0min
[Parallel(n_jobs=5)]: Done  15 tasks      | elapsed: 32.5min
[Parallel(n_jobs=5)]: Done  22 tasks      | elapsed: 135.1min
[Parallel(n_jobs=5)]: Done  31 out of  36 | elapsed: 194.0min remaining: 31.3min
[Parallel(n_jobs=5)]: Done  36 out of  36 | elapsed: 376.4min finished


In [7]:
# Rerun all (region, seats) combinations in parallel with max_solns=2.5e5
element_information = Parallel(n_jobs=4, verbose=10)(
    delayed(get_solns)(const_pairs, const_tris, const_quads, seats, region, max_solns=2.5e5) 
        for region in regions for seats in [2,3,4])

[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   5 tasks      | elapsed: 10.1min
[Parallel(n_jobs=4)]: Done  10 tasks      | elapsed: 17.0min
[Parallel(n_jobs=4)]: Done  17 tasks      | elapsed: 21.6min
[Parallel(n_jobs=4)]: Done  24 tasks      | elapsed: 42.7min
[Parallel(n_jobs=4)]: Done  33 out of  36 | elapsed: 71.1min remaining:  6.5min
[Parallel(n_jobs=4)]: Done  36 out of  36 | elapsed: 97.9min finished


In [10]:
test = pd.read_csv("Solutions/solns_London_4.csv.gz")
f"{test.shape[0]:,}"

'525,000'

In [None]:
# Rerun all (region, seats) combinations in parallel with max_solns=1.5e5
element_information = Parallel(n_jobs=4, verbose=10)(
    delayed(get_solns)(const_pairs, const_tris, const_quads, seats, region, max_solns=15e4) 
        for region in regions for seats in [2,3,4])

In [None]:
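# Reload the helper modules after editing them. Reload AlgorithmX_timeout
# (assumed to be a dependency of algox_modules) first, then algox_modules,
# then re-run the star import so the notebook picks up the new definitions
import sys
import importlib
importlib.reload(sys.modules['AlgorithmX_timeout'])
importlib.reload(sys.modules['algox_modules'])
from algox_modules import *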

In [None]:
import gc
gc.collect()

In [None]:
len(gc.get_objects())

In [12]:
137500/11

12500.0

In [13]:
5e5*0.025

12500.0

In [14]:
525000/21

25000.0