# Algorithm X application to constituency data

Previously we found all groups of 2 / 3 / 4 neighbouring constituencies, i.e. constituencies that share a border. We call each group a set and give it a unique identifier, `set_no`. We will now apply Algorithm X to these sets to find (a subset of) the solutions in which every constituency is selected once and only once. We shall do this on a region-by-region basis for two reasons:

1. it substantially reduces the number of possible combinations
1. it also (mostly) keeps the political parties consistent within a merged seat, e.g. we would not merge one constituency in England with one in Wales, which could potentially halve the Plaid Cymru vote.

The total number of constituencies in a region is often not divisible by 2 / 3 / 4. In these cases we remove sets of a different size until the number remaining is divisible. For example, the North East has 29 constituencies, so to find the solutions that merge constituencies in pairs we pick at random one of the sets of 3 merged constituencies and remove its members from the analysis, leaving 26. We repeat this with another randomly chosen 3-way set until we have a large enough sample.
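The removal rule can be written down as a small lookup. The sketch below is plain Python and mirrors the cases handled by `remove_random_const` further down; the function name `removal_plan` is hypothetical and is only here to illustrate the arithmetic.

```python
def removal_plan(n, seats):
    """Return (set_size, how_many) to remove so that the remaining number of
    constituencies is divisible by seats, or None if n already is."""
    remainder = n % seats
    if remainder == 0:
        return None
    plans = {
        (2, 1): (3, 1),  # odd count: remove one triplet
        (3, 1): (4, 1),  # remove one quadruplet
        (3, 2): (2, 1),  # remove one pair
        (4, 1): (3, 3),  # remove three triplets (9 constituencies)
        (4, 2): (3, 2),  # remove two triplets (6 constituencies)
        (4, 3): (3, 1),  # remove one triplet
    }
    return plans[(seats, remainder)]


# North East example from the text: 29 constituencies, merging in pairs
size, count = removal_plan(29, 2)
print(size, count, 29 - size * count)  # 3 1 26
```

Every entry leaves a total that is divisible by `seats`, which is what the `while n2 % seats != 0` checks rely on.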

For some regions there are a very large number of solutions, so we only keep a subset of them. When the solver reaches the solution limit we rerun the analysis with the rows of the dataframe reshuffled, which can change which solutions are found first.

The (sampled) solutions will be saved as csv files.
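To make the problem concrete, here is a minimal plain-Python sketch of the exact-cover search that Algorithm X performs. It is not the `AlgorithmX` module imported below; the function `exact_covers` and the toy constituencies `A` to `D` are assumptions made for illustration only.

```python
def exact_covers(items, sets):
    """Yield every list of set ids that covers each item exactly once.

    items: iterable of sortable, hashable items (the constituencies)
    sets:  dict mapping a set id to a frozenset of the items it contains
    """
    def search(remaining, chosen):
        if not remaining:
            yield list(chosen)
            return
        # For each uncovered item, list the sets that still fit entirely
        candidates = {
            item: [s for s, members in sets.items()
                   if item in members and members <= remaining]
            for item in sorted(remaining)
        }
        # Branch on the item with the fewest options (Knuth's heuristic)
        item = min(candidates, key=lambda i: len(candidates[i]))
        for s in candidates[item]:
            chosen.append(s)
            yield from search(remaining - sets[s], chosen)
            chosen.pop()

    yield from search(set(items), [])


# Four constituencies in a row: A-B, B-C and C-D share borders
pairs = {1: frozenset("AB"), 2: frozenset("BC"), 3: frozenset("CD")}
print([sorted(s) for s in exact_covers("ABCD", pairs)])  # [[1, 3]]
```

Set 2 can never be used here, because choosing it would leave `A` and `D` without a partner. The same thing happens at scale when a region has no solution for a given seat size.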


In [2]:
import numpy as np
import pandas as pd
from AlgorithmX import *
from joblib import Parallel, delayed
from random import random, sample
import os

In [3]:
const_pairs = pd.read_csv("../Analysis/Data/const_pairs.csv.gz")
const_tris = pd.read_csv("../Analysis/Data/const_tris.csv.gz")
const_quads = pd.read_csv("../Analysis/Data/const_quads.csv.gz")

In [4]:
def const_mapper(df):
    """
    As the AlgorithmX code requires inputs starting from zero we shall take all values in the dataframes
    and map them to ints. This function will return the solver required.
    The df is always randomly resampled when we run this so that we get a different initial answer each time.
    """
    df = df.sample(len(df))
    name_cols = get_name_cols(df)
    const_list = np.unique(df[name_cols].stack())
    n = len(const_list)
    mapping = {}
    for i in range(n):
        mapping[const_list[i]] = i
    for col in name_cols:
        df = df.replace({col: mapping})
    solver = AlgorithmX(n)
    for index, row in df.iterrows():
        solver.appendRow([r for r in row[name_cols]], row['set_no'])
    return solver
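The name-to-integer step in `const_mapper` can be illustrated without pandas. This is a sketch with made-up rows and a hypothetical helper `build_mapping`; like `np.unique`, it numbers the names in sorted order starting from zero.

```python
def build_mapping(rows):
    """Map every distinct name appearing in rows to an integer 0..n-1."""
    names = sorted({name for row in rows for name in row})
    return {name: i for i, name in enumerate(names)}


# Hypothetical pairs of neighbouring constituencies
rows = [("B", "C"), ("A", "B"), ("C", "D")]
mapping = build_mapping(rows)
encoded = [[mapping[name] for name in row] for row in rows]
print(mapping)  # {'A': 0, 'B': 1, 'C': 2, 'D': 3}
print(encoded)  # [[1, 2], [0, 1], [2, 3]]
```

Each encoded row is what would be passed to the solver as the list of columns covered by that set.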

In [5]:
from interruptingcow import timeout

try:
    with timeout(5, exception=RuntimeError):
        # perform a potentially very slow operation
        pass
except RuntimeError:
    print("didn't finish within 5 seconds")

In [6]:
from interruptingcow import timeout
try:
    with timeout(5, exception=RuntimeError):
        while True:
            test = 0
            if test == 5:
                break
            test = test - 1
except RuntimeError:
    print("Error")

Error
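An alternative to a signal-based timeout is a cooperative deadline that is checked each time a solution arrives. The sketch below uses only the standard library; `collect_until` is a hypothetical helper, not part of `interruptingcow`, and unlike the timeout above it cannot interrupt a search that never yields a first solution.

```python
import itertools
import time


def collect_until(iterable, seconds, clock=time.monotonic):
    """Collect items until the iterable ends or `seconds` have elapsed.

    Returns (items, timed_out). The deadline is only checked between
    items, so a step that never yields is not interrupted.
    """
    deadline = clock() + seconds
    items = []
    for item in iterable:
        items.append(item)
        if clock() >= deadline:
            return items, True
    return items, False


# An endless generator stands in for a solver with too many solutions
items, timed_out = collect_until(itertools.count(), 0.01)
print(timed_out)  # True
```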


In [54]:
def return_solutions(df, prop = None, max_soln = 1e7, resampled=False):
    """
    This function returns the solutions from the AlgorithmX code.
    prop - proportion of the solutions to return (currently unused)
    max_soln - maximum number of solutions to derive before stopping
    resampled - True if this is a rerun on a reshuffled dataframe
    """
    max_returned = int(2.5e6)  # must be an int: random.sample needs an integer sample size
    
    solver = const_mapper(df)
    solns = 0
    dict_solns = {}
    try:
        with timeout(90, exception=RuntimeError): 
            # Stop calculations if taking too long, either there is no solution or having difficulty finding first one
            for solution in solver.solve():
                dict_solns[solns] = solution
                solns += 1
                if solns == max_soln:
                    resampled = True # We hit the limit, so this will be rerun on a reshuffled dataframe
                    break
            soln_returned = solns > 0

            # If the result is too big take a sample. If the solution is going to be resampled take a small proportion
            # otherwise take a larger one
            if soln_returned:
                if not resampled and solns <= max_returned:
                    sampled_solns = pd.DataFrame({'soln': dict_solns}).reset_index(drop=True)
                else:
                    if not resampled:
                        keys = sample(list(dict_solns.keys()), max_returned)
                    else:
                        keys = sample(list(dict_solns.keys()), int(max_soln*0.0025))
                    dict_solns2 = {}
                    for k in keys:
                        dict_solns2[k] = dict_solns[k]
                    sampled_solns = pd.DataFrame({'soln': dict_solns2}).reset_index(drop=True)
                return soln_returned, sampled_solns, resampled
            else:
                soln_returned = False
                return soln_returned, None, None
    except RuntimeError:
        soln_returned = False
        return soln_returned, None, None
        
    # Need to add in the following:
    # 1. Stop when solutions become too big, rerun with resampled df and take sample of that - DONE
    # 2. when we remove some other random constituencies how do we rerun it and run it multiple times
    #        - need a counter to ensure we get a solution too - DONE
    # 3. how do we cope with zero solutions, e.g. Yorkshire when we have triplets
    #        - Have added a check to ensure that number of remaining constituencies is equal to no of merged seats
    #          This won't solve the problem with the triplets and Yorkshire, but should ensure that when we
    #          have to remove multiple triplets that it should get a result.
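`return_solutions` stores every solution in a dictionary before sampling from it. If memory becomes a problem, one possible alternative is reservoir sampling, which keeps a uniform random sample of fixed size while the solutions stream past. This is a sketch of that swapped-in technique (Algorithm R) using only the standard library; it is not what the function above does.

```python
import random


def reservoir_sample(iterable, k, rng=random):
    """Return up to k items chosen uniformly at random from iterable,
    holding at most k items in memory at any time."""
    reservoir = []
    for seen, item in enumerate(iterable, start=1):
        if len(reservoir) < k:
            reservoir.append(item)       # fill the reservoir first
        else:
            j = rng.randrange(seen)      # uniform in 0 .. seen-1
            if j < k:
                reservoir[j] = item      # keep item with probability k/seen
    return reservoir


# range(1000) stands in for the solver's stream of solutions
kept = reservoir_sample(range(1000), 10, random.Random(0))
print(len(kept))  # 10
```

The trade-off is that the total number of solutions must still be counted separately if it is needed for the `max_soln` check.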
        

In [8]:
def to_remove_names(df):
    """
    from the randomly selected 'set_no' put the names that will be removed into a list
    """
    return df.loc[:, df.columns.str.startswith('name')].values.tolist()[0]

In [9]:
def get_n(df, name_cols):
    """
    Find how many different constituencies there are in a data frame.
    """
    const_list = np.unique(df[name_cols].stack())
    return len(const_list)

In [10]:
def remove_consts(df, to_remove, name_cols):
    """
    Given a list of constituencies (to_remove) remove all rows from dataframe which contain them
    """
    for name in name_cols:
        df = df[~df[name].isin(to_remove)]
    return df

In [11]:
def get_name_cols(df):
    """
    Return all columns that start with the word 'name'
    """
    return df.columns[df.columns.str.startswith('name')]

In [12]:
def remove_random_const(const_pairs, const_tris, const_quads, seats, region, n):
    """
    This function removes randomly selected pairs / triplets / quadruplets to make sure
    that the number of constituencies left is divisible by the number of seats.
    """
    n2 = n # Number of remaining constituencies; loop until it is divisible by seats
    if seats == 2:
        name_cols = get_name_cols(const_pairs)
    elif seats == 3:
        name_cols = get_name_cols(const_tris)
    elif seats == 4:
        name_cols = get_name_cols(const_quads)
    removed = {}
    if seats == 2:
        while n2 % seats != 0:
            df = const_pairs.copy()
            random_const = const_tris.sample(1)
            removed['triplet'] = random_const['set_no'].iloc[0]
            to_remove = to_remove_names(random_const)
            df = remove_consts(df, to_remove, name_cols)
            n2 = get_n(df, name_cols)
    elif seats == 3:
        while n2 % seats != 0:
            df = const_tris.copy()
            if (seats == 3) & (n % seats == 1):
                random_const = const_quads.sample(1)
                removed['quad'] = random_const['set_no'].iloc[0]
                to_remove = to_remove_names(random_const)
            elif (seats == 3) & (n % seats == 2):
                random_const = const_pairs.sample(1)
                removed['pair'] = random_const['set_no'].iloc[0]
                to_remove = to_remove_names(random_const)
            df = remove_consts(df, to_remove, name_cols)
            n2 = get_n(df, name_cols)
    elif seats == 4:
        while n2 % seats != 0:
            df = const_quads.copy()
            # Need to ensure that when we remove multiple triplets that none of the elements are repeated
            if (n % seats == 2) or (n % seats == 1):
                df2 = const_tris.copy()
                name_cols2 = df2.columns[df2.columns.str.startswith('name')]
                if n % seats == 1:
                    # remove 3 triplets
                    trips = 3
                elif n % seats == 2:
                    # remove 2 triplets
                    trips = 2
                to_remove = []
                for i in range(trips):
                    random_const = df2.sample(1)
                    if i == 0:
                        removed['triplet'] = [random_const['set_no'].iloc[0]]
                    else:
                        removed['triplet'] = [*removed['triplet'], random_const['set_no'].iloc[0]]
                    to_remove = to_remove + to_remove_names(random_const)
                    for name in name_cols2:
                        df2 = df2[~df2[name].isin(to_remove)]
            elif n % seats == 3:
                random_const = const_tris.sample(1)
                removed['triplet'] = random_const['set_no'].iloc[0]
                to_remove = to_remove_names(random_const)
            df = remove_consts(df, to_remove, name_cols)
            n2 = get_n(df, name_cols)

    return df, removed
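When several triplets are removed for the 4-seat case, they must not share a constituency. The idea can be sketched in plain Python as follows; `pick_disjoint` and the toy sets are hypothetical, and the guard against running out of candidates is an addition that the function above does not have.

```python
import random


def pick_disjoint(sets, k, rng=random):
    """Pick k set ids at random so that no constituency appears twice.

    sets maps a set id to a frozenset of constituency names.
    Raises ValueError if no non-overlapping set is left to pick.
    """
    available = dict(sets)
    picked, used = [], set()
    for _ in range(k):
        if not available:
            raise ValueError("not enough disjoint sets")
        set_id = rng.choice(sorted(available))
        picked.append(set_id)
        used |= available[set_id]
        # Drop every set that overlaps with what has been removed so far
        available = {s: m for s, m in available.items() if not (m & used)}
    return picked, used


triplets = {1: frozenset("ABC"), 2: frozenset("CDE"),
            3: frozenset("DEF"), 4: frozenset("GHI")}
picked, used = pick_disjoint(triplets, 2, random.Random(1))
print(len(used))  # 6
```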

In [13]:
from datetime import datetime,timedelta

In [55]:
def get_solns(const_pairs, const_tris, const_quads, seats, region, max_solns=1e6):
    """
    Find the solutions, or a subset of them, and save them to a csv file
    """
    const_pairs2 = const_pairs.query("region == @region")
    const_tris2 = const_tris.query("region == @region")
    const_quads2 = const_quads.query("region == @region")
    if seats == 2:
        df = const_pairs2
    elif seats == 3:
        df = const_tris2
    elif seats == 4:
        df = const_quads2
    name_cols = get_name_cols(df)
    n = get_n(df, name_cols)
    r = region.replace(" ", "_")
    file_name = f"Solutions/solns_{r}_{seats}.csv.gz"
    if n % seats == 0:
        soln_returned, solns, resampled = return_solutions(df, resampled=False, max_soln=max_solns)
        if soln_returned:
            if len(solns) == 0:
                print(f"For the {region} region, when we have {seats} seats there are no solutions.")
            # If we're unable to get all solutions rerun multiple times to get a further subset of them.
            if resampled:
                d = {}
                d[0] = solns.copy()
                for j in range(1, 10):
                    soln_returned, d[j], resampled = return_solutions(df, resampled=True, max_soln=max_solns)
                solns = pd.concat(d)
        else:
            print(f"Issue with the {region} region, when we have {seats} seats there are no solutions.")
            solns = pd.DataFrame()  # return_solutions gave None; keeps the len(solns) check below from failing
    else:
        # Get the solutions multiple times with different random elements removed.
        soln_dict = {}
        i = 0
        while i < 10:
            print(f"i: {i}")
            start = datetime.now() 
            df, removed = remove_random_const(const_pairs2, const_tris2, const_quads2, seats, region, n)
            soln_returned, soln_dict[i], resampled = return_solutions(df, resampled=False, max_soln=max_solns)
            if soln_returned == False:
                print(f"Is solution returned: {soln_returned}")
                print(f"Is going to be resampled: {resampled}")
            if soln_returned and not resampled:
                file = "Logs/df_" + datetime.now().strftime('%Y-%m-%d-%H-%M-%S') + ".csv"
                df.to_csv(file, index=False)
                file = "Logs/soln_" + datetime.now().strftime('%Y-%m-%d-%H-%M-%S') + ".csv"
                soln_dict[i].to_csv(file, index=False)
            if soln_returned:
                end = datetime.now() 
#                 print(f"The time taken is {end - start}s")
#                 print(soln_dict[i].shape)
                if resampled:
                    d = {}
                    d[0] = soln_dict[i].copy()
                    j = 1
                    while j < 25 and soln_returned:
                        print(f"j: {j}")
                        start = datetime.now() 
                        if soln_returned:
                            j += 1
                            soln_returned, d[j], resampled = return_solutions(df, resampled=True, max_soln=max_solns)
                            end = datetime.now() 
                            if soln_returned == False:
                                print(f"Is solution returned: {soln_returned}")
                                print(f"Is going to be resampled: {resampled}")
#                             print(f"The time taken is {end - start}s")
                        else:
                            break
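            if soln_returned and resampled:
                # d is only built when the reruns above took place; otherwise keep soln_dict[i] as returned
                soln_dict[i] = pd.concat(d)
                print(soln_dict[i].shape)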
            if soln_returned:
                # Add in the set_no's that were removed from the solutions
                soln_dict[i][list(removed.keys())[0]] = str(list(removed.values())[0])
                i += 1
                print("Done getting solutions")
                solns = pd.concat(soln_dict)
    if len(solns) > 0:
        solns = solns.assign(region = region)
        solns.to_csv(file_name, index=False, compression='gzip')

In [61]:
get_solns(const_pairs, const_tris, const_quads, 4, 'London', max_solns=1e5)

i: 0
j: 1
j: 2
j: 3
j: 4
j: 5
j: 6
j: 7
j: 8
j: 9
j: 10
j: 11
j: 12
j: 13
j: 14
j: 15
j: 16
j: 17
j: 18
j: 19
j: 20
j: 21
j: 22
j: 23
j: 24
(6250, 1)
Done getting solutions
i: 1
j: 1
j: 2
j: 3
Is solution returned: False
Is going to be resampled: None
i: 1
j: 1
j: 2
Is solution returned: False
Is going to be resampled: None
i: 1
j: 1
j: 2
j: 3
j: 4
j: 5
j: 6
j: 7
j: 8
j: 9
j: 10
j: 11
j: 12
j: 13
j: 14
j: 15
j: 16
j: 17
j: 18
j: 19
j: 20
j: 21
j: 22
j: 23
j: 24
(6250, 1)
Done getting solutions
i: 2
Is solution returned: False
Is going to be resampled: None
i: 2
j: 1
Is solution returned: False
Is going to be resampled: None
i: 2
Is solution returned: False
Is going to be resampled: None
i: 2
j: 1
j: 2
j: 3
j: 4
j: 5
j: 6
j: 7
j: 8
j: 9
j: 10
j: 11
j: 12
j: 13
j: 14
j: 15
j: 16
j: 17
j: 18
j: 19
j: 20
j: 21
j: 22
j: 23
j: 24
(6250, 1)
Done getting solutions
i: 3
j: 1
j: 2
j: 3
j: 4
j: 5
j: 6
j: 7
j: 8
j: 9
j: 10
j: 11
j: 12
j: 13
j: 14
j: 15
j: 16
j: 17
j: 18
j: 19
j: 20
j: 21
j: 22
j: 

In [None]:
# Command to run with joblib.
# Need to sort out a few things, e.g. `regions` must be defined first (np.unique(const_pairs['region']))
element_information = Parallel(n_jobs=4, verbose=10)(
    delayed(get_solns)(const_pairs, const_tris, const_quads, seats, region) for seats in [2,3,4] for region in regions)


#### Ignore
Some code left over from initial work. Leaving in for the moment as may want to look at it later.

In [104]:
x = [10]
for i in range(3):
    x = [*x, 11 + i]
x

[10, 11, 12, 13]

In [90]:
# test = pd.read_csv("Solutions/solns_London_4.csv.gz")
# test.shape
from ast import literal_eval
# df.to_csv("test.csv", index=False)
test = pd.read_csv("Solutions/solns_London_4.csv.gz", 
                   dtype={'region': str}, converters={'soln': literal_eval, 'triplet': literal_eval})
test.shape


(62500, 3)

In [91]:
test.shape[0] / 25

2500.0

In [92]:
test.sample(10)

Unnamed: 0,soln,triplet,region
1884,"[12500, 16743, 17249, 13276, 15864, 15283, 136...","[1111, 1192, 1009]",London
37729,"[18796, 13119, 17717, 14563, 15396, 17262, 141...","[898, 786, 1221]",London
21771,"[13431, 17154, 18857, 12685, 18964, 17567, 155...","[1072, 1035, 977]",London
51504,"[13417, 14895, 15127, 19034, 18955, 17001, 134...","[807, 816, 1204]",London
27226,"[12448, 15435, 17069, 17367, 16110, 13741, 144...","[1006, 1282, 855]",London
31710,"[12980, 14849, 16597, 12666, 15092, 18899, 170...","[1028, 951, 904]",London
27092,"[12448, 15435, 17069, 17367, 16110, 13741, 144...","[1006, 1282, 855]",London
55888,"[13417, 13010, 19029, 16628, 14567, 17002, 184...","[807, 816, 1204]",London
18571,"[13401, 14945, 15120, 13091, 12638, 17563, 189...","[961, 1135, 1042]",London
54922,"[13402, 15097, 18970, 16623, 14589, 16942, 135...","[807, 816, 1204]",London


In [100]:
test['triplet'][10] == test['triplet'][0]

True

In [102]:
t2 = np.unique(test['triplet'])[0]
test2 = test[[t == t2 for t in test['triplet']]]

In [119]:
test = test.assign(sorted_soln = [list(np.sort(t)) for t in test['soln']])
# np.sort(test2['soln'].iloc[0])

In [120]:
test2.sample(10)

Unnamed: 0,soln,triplet,region,sorted_soln
53627,"[13417, 14895, 18970, 15115, 17576, 14103, 184...","[807, 816, 1204]",London,"[12503, 13134, 13417, 14103, 14895, 15115, 157..."
52511,"[13402, 15097, 19025, 16623, 14573, 17002, 141...","[807, 816, 1204]",London,"[12461, 13238, 13402, 14051, 14187, 14573, 150..."
55870,"[13417, 13010, 19029, 16628, 14567, 17002, 184...","[807, 816, 1204]",London,"[12521, 13010, 13342, 13417, 13710, 14567, 154..."
56213,"[13404, 13009, 18746, 15086, 18949, 16678, 155...","[807, 816, 1204]",London,"[13009, 13219, 13404, 13553, 14387, 15086, 153..."
51873,"[13417, 14895, 18868, 15125, 18966, 14378, 142...","[807, 816, 1204]",London,"[12469, 13417, 13804, 14229, 14378, 14895, 151..."
51813,"[13417, 14895, 18868, 15125, 18966, 14378, 142...","[807, 816, 1204]",London,"[12469, 13417, 14088, 14229, 14378, 14895, 151..."
52934,"[13404, 13009, 19028, 16631, 18666, 17097, 156...","[807, 816, 1204]",London,"[12466, 13009, 13128, 13404, 13451, 14548, 156..."
54003,"[13404, 16467, 18971, 15168, 18936, 15508, 170...","[807, 816, 1204]",London,"[12466, 13404, 13694, 14200, 15168, 15331, 155..."
56196,"[13404, 13009, 18746, 15086, 18949, 16678, 155...","[807, 816, 1204]",London,"[13009, 13219, 13404, 13796, 14398, 15086, 153..."
54642,"[13417, 14895, 18970, 18678, 15165, 16682, 153...","[807, 816, 1204]",London,"[12499, 13417, 14046, 14323, 14408, 14895, 151..."


In [123]:
f"{len(np.unique(test['sorted_soln'])):,}"

'62,500'

In [124]:
const_tris[const_tris['set_no'] == 807]

Unnamed: 0,region,name1,name2,name3,set_no
806,London,Battersea,Vauxhall,Dulwich and West Norwood,807


In [125]:
const_tris[const_tris['set_no'] == 816]

Unnamed: 0,region,name1,name2,name3,set_no
815,London,Beckenham,Bromley and Chislehurst,Orpington,816


In [126]:
const_tris[const_tris['set_no'] == 1204]

Unnamed: 0,region,name1,name2,name3,set_no
1203,London,Greenwich and Woolwich,Lewisham East,"Lewisham, Deptford",1204


In [76]:
np.unique(test['triplet'])

array([list([807, 816, 1204]), list([811, 970, 1129]),
       list([898, 786, 1221]), list([961, 1135, 1042]),
       list([962, 860, 900]), list([1006, 1282, 855]),
       list([1028, 951, 904]), list([1072, 1035, 977]),
       list([1111, 1192, 1009]), list([1151, 1198, 818])], dtype=object)

In [11]:
regions = np.unique(const_pairs['region'])

In [82]:
def count_n(const_pairs, const_tris, const_quads, seats, region):
    if seats == 2:
        orig = const_pairs.copy()
    elif seats == 3:
        orig = const_tris.copy()
    elif seats == 4:
        orig = const_quads.copy()
    col_names = orig.columns[orig.columns.str.startswith('name')]
    df = orig.query("region == @region")
    const_list = np.unique(df[col_names].stack())
    n = len(const_list)
    return n


In [84]:
start1 = datetime.now()
for seats in [2,3,4]:
    for region in regions:
        count_n(const_pairs, const_tris, const_quads, seats, region)
end1 = datetime.now()
print(f"Time taken = {(end1-start1).total_seconds():.4f}s")

Time taken = 1.3800s


In [85]:
start = datetime.now() 
element_information = Parallel(n_jobs=4, verbose=10)(
    delayed(count_n)(const_pairs, const_tris, const_quads, seats, region) for seats in [2,3,4] for region in regions)
end = datetime.now() 
print(f"The total time taken is {(end-start).total_seconds():.4f}s")

[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   5 tasks      | elapsed:    2.8s
[Parallel(n_jobs=4)]: Done  10 tasks      | elapsed:    3.3s
[Parallel(n_jobs=4)]: Done  17 tasks      | elapsed:    4.5s
[Parallel(n_jobs=4)]: Done  24 tasks      | elapsed:    5.7s
[Parallel(n_jobs=4)]: Done  33 out of  36 | elapsed:    7.3s remaining:    0.7s


The total time taken is 7.7780s


[Parallel(n_jobs=4)]: Done  36 out of  36 | elapsed:    7.8s finished


In [86]:
from datetime import datetime,timedelta
start = datetime.now() 
dtypes={'x': int, 'y': float}
test = pd.read_csv("Solutions/test.csv", low_memory=False) #, dtype=dtypes)
end = datetime.now() 
print(f"The time taken to concatenate is {end - start}s")

The time taken to concatenate is 0:00:03.046728s


In [87]:
from datetime import datetime,timedelta
start = datetime.now() 
dtypes={'x': int, 'y': float}
test = pd.read_csv("Solutions/test.csv") #, low_memory=False, dtype=dtypes)
end = datetime.now() 
print(f"The time taken to concatenate is {end - start}s")

The time taken to concatenate is 0:00:02.798389s


In [7]:
d = {}
start = datetime.now() 
for i in range(100000):
    d[i+1] = pd.DataFrame(data={'region': random.choice(region), 
                                'solns': [[random.sample(popn1, 100)]], 
                                'trips': [random.choice(popn2)]})
end = datetime.now() 
print(f"The time taken to create the dataframe is {end - start}s")

start = datetime.now() 
df = pd.concat(d, ignore_index=True)
end = datetime.now() 
print(f"The time taken to concatenate is {end - start}s")

The time taken to create the dataframe is 0:04:58.224902s
The time taken to concatenate is 0:01:03.580685s


In [70]:
from datetime import datetime,timedelta

In [81]:
start = datetime.now() 
df = pd.concat(d, ignore_index=True)
end = datetime.now() 
print(f"The time taken is {end - start}s")

The time taken is 0:02:04.179022s


In [77]:
df.head(10)

Unnamed: 0,region,solns,trips
0,North West,"[[1792, 1402, 2454, 2025, 2018, 618, 450, 997,...",1.0
1,Scotland,"[[771, 1374, 99, 1496, 527, 1020, 1393, 271, 3...",1.0
2,South West,"[[233, 769, 983, 2231, 2267, 1739, 611, 2084, ...",1.0
3,Scotland,"[[2310, 201, 1768, 431, 2450, 926, 368, 775, 1...",1.0
4,Scotland,"[[1939, 135, 1235, 374, 417, 982, 2106, 827, 6...",1.0
5,Northern Ireland,"[[2067, 877, 1493, 123, 542, 939, 1750, 893, 9...",1.0
6,East,"[[204, 2371, 656, 1651, 194, 530, 1798, 468, 1...",
7,Scotland,"[[55, 38, 58, 1948, 176, 50, 1349, 210, 1649, ...",
8,Wales,"[[1475, 1748, 469, 7, 539, 1338, 1557, 425, 72...",1.0
9,West Midlands,"[[456, 1459, 1138, 1827, 762, 1468, 2057, 313,...",1.0


In [8]:
df.head(10)

Unnamed: 0,region,solns,trips
0,Yorkshire and The Humber,"[[1602, 2187, 2307, 263, 1311, 231, 932, 687, ...",24
1,North West,"[[1445, 1013, 2026, 1009, 1230, 2003, 2429, 14...",85
2,Wales,"[[1760, 1659, 1134, 911, 508, 1623, 999, 2066,...",62
3,Northern Ireland,"[[290, 1898, 244, 701, 2424, 2067, 1670, 1451,...",30
4,South West,"[[1392, 1740, 2179, 46, 1818, 578, 1304, 1801,...",63
5,West Midlands,"[[935, 993, 2076, 2352, 419, 626, 2365, 1719, ...",90
6,South East,"[[2030, 1052, 1197, 225, 904, 807, 1318, 2491,...",68
7,Yorkshire and The Humber,"[[1287, 382, 1714, 347, 394, 749, 642, 188, 22...",60
8,Northern Ireland,"[[1173, 661, 2177, 592, 873, 304, 2279, 250, 1...",12
9,Wales,"[[1719, 1819, 637, 29, 111, 1190, 2375, 467, 4...",38


In [38]:
from ast import literal_eval


In [44]:
# df.to_csv("test.csv", index=False)
df2 = pd.read_csv("test.csv", dtype={'region': str}, converters={'solns': literal_eval})

In [47]:
df2.dtypes

region    object
solns     object
trips      int64
dtype: object

In [46]:
for col in df.columns:
    print(df[col].equals(df2[col]))

True
True
True


In [50]:
df[col][0], df2[col][0]

([0, 1, 2, 3, 4], [0, 1, 2, 3, 4])

In [52]:
df2[col][4]

8

In [53]:
df2[df2.trips.isin(df2.solns[0])]

Unnamed: 0,region,solns,trips
0,London,"[0, 1, 2, 3, 4]",0
1,London,"[2, 3, 4, 5, 6]",1
2,London,"[4, 5, 6, 7, 8]",2
3,London,"[6, 7, 8, 9, 10]",3
4,London,"[8, 9, 10, 11, 12]",4


In [18]:
const_pairs = pd.read_csv("../Analysis/Data/const_pairs.csv")
const_tris = pd.read_csv("../Analysis/Data/const_tris.csv")
const_quads = pd.read_csv("../Analysis/Data/const_quads.csv")

In [19]:
const_pairs.shape, const_tris.shape, const_quads.shape

((1476, 4), (4714, 5), (75048, 6))