In [1]:
CODE_DIR := "gap/";;

In [2]:
Read(Concatenation(CODE_DIR, "pmatmul.g"));

# Simple speedups for unsophisticated users on multicore workstations

## Multiplying matrices

We create two random 400×400 rational matrices and time GAP's default serial multiplication.

In [5]:
m1 := RandomMat(400, 400, Rationals);;
m2 := RandomMat(400, 400, Rationals);;
ShowBench(\*, m1, m2);

wall time: 10.67s cpu time: 9.23s memory allocated: 1006.69MB result returned


In a few lines of code, we can implement a simple blocked matrix multiply based on $$(A*B)_{ik} = \sum_j A_{ij}*B_{jk},$$ where $A_{ij}$ and $B_{jk}$ denote blocks of the two matrices rather than single entries.

In [6]:
MatMulWithTasks := function(m1, m2, chop1, chop2, chop3)
    local  A, B, prodtasks, sumtasks, C;
    
    # divide matrices into blocks
    A := block(m1, chop1, chop2);
    B := block(m2, chop2, chop3);

    # Start chop1*chop2*chop3 multiply tasks
    prodtasks := List([1..chop1], i-> List([1..chop2], j-> 
        List([1..chop3], k -> RunTask(\*, A[i][j],B[j][k]))));
    # And chop1 * chop3 tasks to do the summations
    sumtasks := List([1..chop1], i -> List([1..chop3], k-> 
        RunTask(Accumulate,AddMat,ShallowCopyMat, 
                prodtasks[i]{[1..chop2]}[k])));
    # Finally wait for the summations to complete and assemble the result
    C := List(sumtasks, row -> List(row, TaskResult));
    return unblock(C);
end;


function( m1, m2, chop1, chop2, chop3 ) ... end

In [7]:
ShowBench(MatMulWithTasks, m1, m2, 4, 4, 4);

wall time: 2.44s cpu time: 14.54s memory allocated: 458.3MB result returned
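The `block` and `unblock` helpers are loaded from `pmatmul.g` and are not shown here. As a minimal sketch of what the blocking step does, the following serial version uses hypothetical stand-ins (`BlockSketch`, `UnblockSketch`, `SerialBlockMul` are our names, not the ones in the loaded file) and assumes the chop counts divide the matrix dimensions exactly. It runs in ordinary GAP, without tasks.

```gap
# Hypothetical stand-in for block: cut mat into an nr x nc grid of blocks.
# Assumes nr divides the number of rows and nc the number of columns.
BlockSketch := function(mat, nr, nc)
    local h, w;
    h := Length(mat) / nr;       # rows per block
    w := Length(mat[1]) / nc;    # columns per block
    return List([1 .. nr], i -> List([1 .. nc], j ->
        mat{[(i-1)*h + 1 .. i*h]}{[(j-1)*w + 1 .. j*w]}));
end;

# Hypothetical stand-in for unblock: glue a grid of blocks back together.
UnblockSketch := function(blocks)
    return Concatenation(List(blocks, blockrow ->
        List([1 .. Length(blockrow[1])], r ->
            Concatenation(List(blockrow, b -> b[r])))));
end;

# Serial blocked multiply: block (i,k) of the product is the sum over j
# of A[i][j] * B[j][k].
SerialBlockMul := function(m1, m2, chop1, chop2, chop3)
    local A, B, C;
    A := BlockSketch(m1, chop1, chop2);
    B := BlockSketch(m2, chop2, chop3);
    C := List([1 .. chop1], i -> List([1 .. chop3], k ->
        Sum([1 .. chop2], j -> A[i][j] * B[j][k])));
    return UnblockSketch(C);
end;
```

Comparing `SerialBlockMul(m, m, 2, 4, 2)` with `m * m` for a small random matrix `m` is a quick way to check that the blocking is consistent before tasks are introduced.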


The `Accumulate` function takes a list of tasks and combines their results as they become available, allowing memory to be recovered quickly.

In [8]:
Display(Accumulate);

function ( op, makebase, tasks )
    local i, acc;
    i := WaitAnyTask( tasks );
    acc := makebase( TaskResult( tasks[i] ) );
    Remove( tasks, i );
    while Length( tasks ) > 0 do
        i := WaitAnyTask( tasks );
        op( acc, TaskResult( tasks[i] ) );
        Remove( tasks, i );
    od;
    return acc;
end
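Note that `op` is called for its side effect on `acc`, so `makebase` has to return a mutable copy that is safe to overwrite. The `AddMat` and `ShallowCopyMat` helpers passed to `Accumulate` earlier are not shown in this notebook; the sketch below gives hypothetical versions of them (our guesses, not the loaded definitions) together with a serial analogue of `Accumulate` that takes plain results instead of tasks.

```gap
# Hypothetical stand-in for ShallowCopyMat: copy each row, so the
# accumulator can be modified without touching the original matrix.
ShallowCopyMatSketch := m -> List(m, ShallowCopy);

# Hypothetical stand-in for AddMat: add m into acc in place, row by row.
AddMatSketch := function(acc, m)
    local i;
    for i in [1 .. Length(acc)] do
        AddRowVector(acc[i], m[i]);
    od;
end;

# Serial analogue of Accumulate: same fold, but over a non-empty list of
# already computed results rather than a list of tasks.
SerialAccumulate := function(op, makebase, results)
    local acc, i;
    acc := makebase(results[1]);
    for i in [2 .. Length(results)] do
        op(acc, results[i]);
    od;
    return acc;
end;
```

With these definitions, `SerialAccumulate(AddMatSketch, ShallowCopyMatSketch, mats)` should equal `Sum(mats)` for a list `mats` of equally sized matrices, and `mats[1]` is left unchanged because only the copy is updated.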


## A Simple Search

We search for *association schemes* preserved by interesting permutation groups. Our initial filter selects, from GAP's primitive groups database, the groups for which the problem is non-trivial.

In [11]:
c := cands([136,165],[1..13]);; List(c, x -> x.g); List(c, x-> x.rank);

[ PSL(2, 17), M_11 ]

[ 11, 7 ]

For now, we apply a brute-force search over all partitions of the set $\{2, \ldots, r\}$, where $r$ is the rank of the permutation action. The search space is fairly large and grows very rapidly with $r$.

In [12]:
NrPartitionsSet([2..11]);

115975
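To see the growth concretely, we can compare the two ranks found above (7 and 11) with a slightly larger one. The number of partitions of an $n$-element set is the Bell number $B_n$, so the set $\{2, \ldots, r\}$ has $B_{r-1}$ partitions. The variable names below are ours, and rank 13 is included only for comparison.

```gap
# Search-space sizes for ranks 7 and 11 (the two groups found above),
# with rank 13 added purely for comparison.
ranks := [7, 11, 13];
sizes := List(ranks, r -> NrPartitionsSet([2 .. r]));
# sizes = [ 203, 115975, 4213597 ]

# The same counts via GAP's Bell function: {2..r} has r-1 elements.
bells := List(ranks, r -> Bell(r - 1));
```

Going from rank 11 to rank 13 multiplies the number of candidate partitions by roughly 36.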

In [15]:
BruteForceSearch := function ( s )
    return Filtered( PartitionsSet( [ 2 .. s.rank ] ), 
        p -> TestPartition( s, p ));        
end;

function( s ) ... end

In [None]:
ShowBench(BruteForceSearch, c[1]);

A very simple approach to parallelising this brute-force search produces a useful speedup.

In [None]:
ParBruteForceSearch := function ( s )
    return ParFiltered( PartitionsSet( [ 2 .. s.rank ] ), 
        p -> TestPartition( s, p ));        
end;

In [6]:
ShowBench(ParBruteForceSearch, c[1]);

wall time: 21.15s cpu time: 71.66s memory allocated: 14.43GB result returned
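The definition of `ParFiltered` is not shown in this notebook. One plausible way to build such a function from the task primitives used earlier (`RunTask` and `TaskResult`) is to split the list into chunks, filter each chunk in its own task, and concatenate the results in order. The sketch below is only that: `ParFilteredSketch` and its `nchunks` parameter are hypothetical, it requires HPC-GAP, and the real `ParFiltered` may work quite differently. It also assumes that anything captured by `pred` can be read from the worker threads.

```gap
# Hypothetical chunked parallel filter (HPC-GAP only).
ParFilteredSketch := function(list, pred, nchunks)
    local n, size, chunks, tasks;
    n := Length(list);
    if n = 0 then
        return [];
    fi;
    # chunk length: ceiling of n / nchunks
    size := QuoInt(n + nchunks - 1, nchunks);
    # consecutive slices of the input; the last one may be shorter
    chunks := List([1 .. QuoInt(n + size - 1, size)],
        c -> list{[(c-1)*size + 1 .. Minimum(c*size, n)]});
    # one task per chunk, each running the ordinary serial Filtered
    tasks := List(chunks, ch -> RunTask(Filtered, ch, pred));
    # collecting results chunk by chunk preserves the input order
    return Concatenation(List(tasks, TaskResult));
end;
```

For example, `ParFilteredSketch([1 .. 100], IsPrimeInt, 7)` should return the same list as `Filtered([1 .. 100], IsPrimeInt)`. Choosing more chunks than worker threads helps balance the load when the predicate's cost varies between elements.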


Other things -- some of the atomic data structures?