# Applying Algorithm X to constituency data

Previously we found all groups of 2, 3 or 4 neighbouring constituencies, i.e. constituencies which share a border. We shall call each such group a set, and each has a unique identifier `set_no`. We will now apply Algorithm X to these sets and find (a subset of) the solutions in which every constituency is selected once and only once. We shall do this on a region-by-region basis for two reasons:

1. It will substantially reduce the number of possible combinations.
1. It also (mostly) ensures consistency of political parties within a merged seat. For example, we would not merge one constituency in England with one in Wales, which could potentially halve Plaid Cymru's share of the vote in that seat.
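To make "selected once and only once" concrete, here is a minimal sketch of Algorithm X on plain dictionaries and sets. It is not the `AlgorithmX` module imported below, whose interface is not shown in this notebook; the function name `exact_covers` and the toy ring of four constituencies are illustrative assumptions.

```python
def exact_covers(universe, subsets):
    """Yield every exact cover of `universe` as a sorted list of set ids.

    `subsets` maps a set id (e.g. a set_no) to the constituencies it contains.
    """
    # cols[c] = ids of the sets that contain constituency c
    cols = {c: set() for c in universe}
    for set_id, members in subsets.items():
        for c in members:
            cols[c].add(set_id)

    def search(cols, partial):
        if not cols:  # every constituency is covered exactly once
            yield sorted(partial)
            return
        # Branch on the constituency with the fewest candidate sets.
        c = min(cols, key=lambda k: len(cols[k]))
        for set_id in sorted(cols[c]):
            members = subsets[set_id]
            # Sets sharing a constituency with set_id (including set_id itself).
            clashing = set().union(*(cols[m] for m in members))
            # Drop the covered constituencies and the clashing sets.
            reduced = {k: v - clashing for k, v in cols.items() if k not in members}
            yield from search(reduced, partial + [set_id])

    yield from search(cols, [])


# Toy example (hypothetical): four constituencies in a ring, merged in pairs.
universe = ["A", "B", "C", "D"]
pairs = {1: {"A", "B"}, 2: {"B", "C"}, 3: {"C", "D"}, 4: {"D", "A"}}
print(sorted(exact_covers(universe, pairs)))  # [[1, 3], [2, 4]]
```

Branching on the constituency with the fewest candidate sets is the usual heuristic for keeping the search tree small; a constituency with no remaining candidates ends that branch immediately.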

The total number of constituencies in a region is often not divisible by 2, 3 or 4. In these cases we shall remove sets of a different size until the remainder is divisible. For example, the North East has 29 constituencies, so to find all solutions where we merge 2 constituencies we shall pick at random one of the sets where 3 constituencies have been merged and remove its constituencies from our initial analysis, leaving 26. We shall repeat this, removing a different 3-way merged set each time, until we get a large enough sample.
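The arithmetic behind the North East example can be sketched as follows. The helper `n_sets_to_remove` is hypothetical (it is not part of `algox_modules`); it only counts how many sets of another size would need to be set aside, and does not choose which ones.

```python
def n_sets_to_remove(n_const, seats, other_size):
    """Smallest k such that (n_const - k * other_size) is divisible by `seats`.

    Returns None when no such k exists.
    """
    for k in range(n_const // other_size + 1):
        if (n_const - k * other_size) % seats == 0:
            return k
    return None


# North East: 29 constituencies.
print(n_sets_to_remove(29, seats=2, other_size=3))  # 1 (26 left)
print(n_sets_to_remove(29, seats=3, other_size=2))  # 1 (27 left)
print(n_sets_to_remove(29, seats=4, other_size=3))  # 3 (20 left)
print(n_sets_to_remove(29, seats=4, other_size=2))  # None (remainder is always odd)
```

When more than one set has to be removed, the removed sets would also need to be disjoint from each other, which this count does not check.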

For some of the sets there is a very large number of solutions, so we will only keep a subset of them. In these cases we shall rerun the analysis with the dataframe resampled, which can change which solutions are found first.
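One way to picture the resampling step, assuming the solver tries candidate sets in row order: shuffling the rows changes which solutions are reached first, and a cap (similar in spirit to `max_solns`) then keeps a different subset. The dataframe, its column names and the helper `first_n` below are made up for illustration.

```python
from itertools import islice

import pandas as pd

# Toy stand-in for one region's candidate sets.
sets_df = pd.DataFrame({
    "set_no": [1, 2, 3, 4],
    "members": [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")],
})

# Shuffle the rows; a different random_state gives a different order.
resampled = sets_df.sample(frac=1, random_state=1).reset_index(drop=True)


def first_n(solutions, max_solns):
    """Keep at most `max_solns` items from a (possibly huge) iterator."""
    return list(islice(solutions, int(max_solns)))


# The cap stops early instead of exhausting the iterator.
print(first_n(iter(range(10**9)), 3))  # [0, 1, 2]
```

The `int(...)` conversion is there because caps such as `1e5` are floats, which `islice` does not accept.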

The (sampled) solutions will be saved as csv files.
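A minimal sketch of the round trip, assuming each solution is stored as a list of `set_no` values in a `soln` column (as in the file read further down). The file name and toy values are illustrative; pandas infers gzip compression from the `.gz` suffix, and `literal_eval` turns the stored string back into a list.

```python
from ast import literal_eval
import os
import tempfile

import pandas as pd

# Two toy solutions for a made-up region.
solns = pd.DataFrame({"soln": [[1, 3], [2, 4]], "region": ["Toy", "Toy"]})

path = os.path.join(tempfile.mkdtemp(), "solns_Toy_2.csv.gz")
solns.to_csv(path, index=False)  # compression inferred from the suffix

# Without the converter, the `soln` column would come back as strings.
loaded = pd.read_csv(path, dtype={"region": str}, converters={"soln": literal_eval})
print(loaded["soln"].tolist())  # [[1, 3], [2, 4]]
```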


In [1]:
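import numpy as np
import pandas as pd
from AlgorithmX import *
from joblib import Parallel, delayed
from random import random, sample
from algox_modules import *
import os
# Imported explicitly because the timing cells below call datetime.now().
from datetime import datetime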

In [2]:
const_pairs = pd.read_csv("../Analysis/Data/const_pairs.csv.gz")
const_tris = pd.read_csv("../Analysis/Data/const_tris.csv.gz")
const_quads = pd.read_csv("../Analysis/Data/const_quads.csv.gz")

In [57]:
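import sys
import importlib
# Reload algox_modules after editing it. Names already pulled in with
# `from algox_modules import *` are not rebound by the reload, so re-run
# that import afterwards to pick up the changes.
importlib.reload(sys.modules['algox_modules'])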

<module 'algox_modules' from '/home/work/AlgorithmX/algox_modules.py'>

In [None]:
# Command to run with joblib.
element_information = Parallel(n_jobs=4, verbose=10)(
    delayed(get_solns)(const_pairs, const_tris, const_quads, seats, region, max_solns=1e7) 
        for seats in [2,3,4] for region in regions)


In [3]:
get_solns(const_pairs, const_tris, const_quads, 3, 'Yorkshire and the Humber', max_solns=1e5)

INFO: 2020-06-05 17:56:26,568: Starting code for region Yorkshire and the Humber with 3 seats.
INFO: 2020-06-05 17:56:26,630: Finished getting solutions for region Yorkshire and the Humber with 3 seats


In [None]:
for i in [2,3,4]:
    get_solns(const_pairs, const_tris, const_quads, i, 'Wales', max_solns=1e5)

INFO: 2020-06-05 17:56:53,230: Starting code for region Wales with 2 seats.
INFO: 2020-06-05 17:56:55,714: Finished getting solutions for region Wales with 2 seats
INFO: 2020-06-05 17:56:55,748: Starting code for region Wales with 3 seats.
INFO: 2020-06-05 17:57:00,379: Finished getting solutions for region Wales with 3 seats
INFO: 2020-06-05 17:57:00,518: Starting code for region Wales with 4 seats.


In [15]:
get_solns(const_pairs, const_tris, const_quads, 3, 'Wales', max_solns=1e7)

INFO: 2020-06-05 15:36:03,112: Starting code for region Wales with 3 seats.
INFO: 2020-06-05 15:36:03,112: Starting code for region Wales with 3 seats.
134675
134675
INFO: 2020-06-05 15:36:07,731: Finished getting solutions for region Wales with 3 seats
INFO: 2020-06-05 15:36:07,731: Finished getting solutions for region Wales with 3 seats


In [None]:
# Command to run with joblib.
# Need to sort out a few things
element_information = Parallel(n_jobs=4, verbose=10)(
    delayed(get_solns)(const_pairs, const_tris, const_quads, seats, region, max_solns=1e7) for seats in [2,3,4] for region in regions)


#### Ignore
Some code left over from the initial work. It is left in for the moment as we may want to look at it later.

In [60]:
from ast import literal_eval
test = pd.read_csv("Solutions/solns_East_3.csv.gz", dtype={'region': str}, converters={'soln': literal_eval})
test.shape

(312500, 3)

In [77]:
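# Expand each solution into columns and flag rows that exactly repeat an earlier row.
tf = test['soln'].apply(pd.Series).duplicated()
# Series.value_counts() replaces the top-level pd.value_counts, which is deprecated in recent pandas.
tf.value_counts()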


False    191560
True     120940
dtype: int64

In [61]:
test = test.assign(sorted_soln = [list(np.sort(t)) for t in test['soln']])

In [62]:
test.sample(10)

| | soln | quad | region | sorted_soln |
|---|---|---|---|---|
| 39624 | [198, 328, 169, 17, 64, 184, 345, 93, 336, 143... | 560 | East | [17, 26, 64, 93, 129, 143, 161, 169, 184, 198,... |
| 242776 | [198, 350, 157, 24, 171, 394, 152, 174, 349, 9... | 1051 | East | [7, 24, 66, 96, 117, 152, 157, 171, 174, 198, ... |
| 170581 | [22, 281, 198, 17, 169, 328, 194, 63, 213, 311... | 2094 | East | [17, 22, 55, 63, 103, 138, 155, 169, 188, 194,... |
| 308636 | [401, 198, 87, 113, 169, 8, 192, 50, 131, 406,... | 4435 | East | [8, 18, 50, 87, 101, 113, 131, 157, 169, 192, ... |
| 10574 | [198, 396, 169, 157, 252, 131, 210, 403, 101, ... | 715 | East | [3, 21, 101, 126, 131, 157, 169, 197, 198, 205... |
| 272010 | [198, 350, 159, 189, 169, 17, 64, 24, 316, 135... | 4480 | East | [17, 24, 64, 100, 126, 135, 159, 169, 189, 198... |
| 150483 | [22, 281, 198, 171, 394, 1, 253, 160, 191, 311... | 3685 | East | [1, 22, 58, 101, 118, 142, 160, 171, 174, 191,... |
| 1555 | [198, 171, 394, 156, 252, 19, 131, 210, 369, 9... | 888 | East | [11, 19, 63, 99, 131, 156, 171, 192, 198, 210,... |
| 76889 | [315, 198, 253, 396, 169, 348, 91, 336, 164, 2... | 2212 | East | [2, 26, 39, 91, 113, 164, 169, 191, 198, 225, ... |
| 212789 | [396, 169, 199, 203, 193, 174, 130, 281, 345, ... | 5 | East | [21, 51, 93, 130, 143, 169, 174, 193, 199, 203... |


In [84]:
start1 = datetime.now()
for seats in [2,3,4]:
    for region in regions:
        count_n(const_pairs, const_tris, const_quads, seats, region)
end1 = datetime.now()
print(f"Time taken = {(end1-start1).total_seconds():.4f}s")

Time taken = 1.3800s


In [85]:
start = datetime.now() 
element_information = Parallel(n_jobs=4, verbose=10)(
    delayed(count_n)(const_pairs, const_tris, const_quads, seats, region) for seats in [2,3,4] for region in regions)
end = datetime.now() 
print(f"The total time taken is {(end-start).total_seconds():.4f}s")

[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   5 tasks      | elapsed:    2.8s
[Parallel(n_jobs=4)]: Done  10 tasks      | elapsed:    3.3s
[Parallel(n_jobs=4)]: Done  17 tasks      | elapsed:    4.5s
[Parallel(n_jobs=4)]: Done  24 tasks      | elapsed:    5.7s
[Parallel(n_jobs=4)]: Done  33 out of  36 | elapsed:    7.3s remaining:    0.7s


The total time taken is 7.7780s


[Parallel(n_jobs=4)]: Done  36 out of  36 | elapsed:    7.8s finished
