# Matrix vector multiplication in SHEEP
Since inputs can be packed into "slots", i.e. vectors of values, and many libraries support SIMD operations (operating on many slots is just as quick as operating on a single value), component-wise multiplication is easy.  However, as with vector dot products, we generally also need to sum over slots, which can be done with a sequence of "ROTATE" and "ADD" operations.
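
The rotate-and-add idea can be illustrated on cleartext lists before involving any encryption. This is a minimal sketch in plain Python, not SHEEP code: the helper names `rotate`, `add` and `sum_over_slots` are made up for illustration, and it assumes that rotating by -1 moves every value one slot to the left, which is the convention consistent with the results shown later in this notebook.

```python
def rotate(slots, k):
    """Cyclically rotate a list of slots (assumed convention: k=-1 shifts values one slot left)."""
    n = len(slots)
    return [slots[(i - k) % n] for i in range(n)]


def add(a, b):
    """Slot-wise addition of two equally long lists."""
    return [x + y for x, y in zip(a, b)]


def sum_over_slots(slots):
    """Sum all slots using only rotations and slot-wise additions."""
    total = list(slots)
    rotated = list(slots)
    for _ in range(len(slots) - 1):
        rotated = rotate(rotated, -1)
        total = add(total, rotated)
    return total


# Component-wise product of [1,2,3,4] with itself, then summed over slots.
products = [1 * 1, 2 * 2, 3 * 3, 4 * 4]
print(sum_over_slots(products))  # [30, 30, 30, 30]
```

Every slot ends up holding the full sum, which is why a mask is needed later when only one slot should be kept.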

In [1]:
import os
if "SHEEP_HOME" in os.environ.keys():
  SHEEP_HOME = os.environ["SHEEP_HOME"]
else:
  SHEEP_HOME = os.path.join(os.environ["HOME"],"SHEEP","frontend")
import sys
sys.path.append(SHEEP_HOME)

from pysheep import sheep_client

## Multiplying a 4x4 matrix with a 4-component vector


Let's do the following calculation:
\begin{equation*}
\begin{pmatrix}
1 & 2 & 3 & 4 \\
5 & 6 & 7 & 8 \\
9 & 10 & 11 & 12 \\
13 & 14 & 15 & 16
\end{pmatrix}
\begin{pmatrix}
1 \\
2 \\
3 \\
4
\end{pmatrix}
\end{equation*}

which should give us the answer {30, 70, 110, 150}
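
As a quick cleartext reference, the same product can be computed in plain Python. This is only a sanity check for the encrypted results further down and uses no SHEEP functionality.

```python
matrix = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
vec = [1, 2, 3, 4]

# Each output element is the dot product of one matrix row with the vector.
expected = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
print(expected)  # [30, 70, 110, 150]
```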


### a) The straightforward (but non-optimal) way

Essentially, each element of the output vector is the dot product of the corresponding row of the matrix with the vector.  We can therefore do this in the same way as was demonstrated in the vector_dot_product notebook, i.e. component-wise multiplication followed by a sequence of rotations and additions.

The circuit will look like:

In [2]:
circuit = """
INPUTS mrow_0 mrow_1 mrow_2 mrow_3 vec mask
CONST_INPUTS rotate_by_minus1 rotate_by_plus1 rotate_by_plus2 rotate_by_plus3
OUTPUTS output_vec  sum_0 msum_0 
# dot product of the first row with the vector
mrow_0 vec MULTIPLY prod_00
prod_00 rotate_by_minus1 ROTATE prod_01
prod_00 prod_01 ADD sum_00
prod_01 rotate_by_minus1 ROTATE prod_02
sum_00 prod_02 ADD sum_01
prod_02 rotate_by_minus1 ROTATE prod_03
sum_01 prod_03 ADD sum_0
# dot product of the second row with the vector
mrow_1 vec MULTIPLY prod_10
prod_10 rotate_by_minus1 ROTATE prod_11
prod_10 prod_11 ADD sum_10
prod_11 rotate_by_minus1 ROTATE prod_12
sum_10 prod_12 ADD sum_11
prod_12 rotate_by_minus1 ROTATE prod_13
sum_11 prod_13 ADD sum_1
# dot product of the third row with the vector
mrow_2 vec MULTIPLY prod_20
prod_20 rotate_by_minus1 ROTATE prod_21
prod_20 prod_21 ADD sum_20
prod_21 rotate_by_minus1 ROTATE prod_22
sum_20 prod_22 ADD sum_21
prod_22 rotate_by_minus1 ROTATE prod_23
sum_21 prod_23 ADD sum_2
# dot product of the fourth row with the vector
mrow_3 vec MULTIPLY prod_30
prod_30 rotate_by_minus1 ROTATE prod_31
prod_30 prod_31 ADD sum_30
prod_31 rotate_by_minus1 ROTATE prod_32
sum_30 prod_32 ADD sum_31
prod_32 rotate_by_minus1 ROTATE prod_33
sum_31 prod_33 ADD sum_3
# now we have four vectors, sum_0, sum_1, sum_2 and sum_3, where the first element is 
# the dot product of that row.  We need to isolate just this element, using mask, which is [1,0,0,0]
sum_0 mask MULTIPLY msum_0
sum_1 mask MULTIPLY msum_10
msum_10 rotate_by_plus1 ROTATE msum_1
sum_2 mask MULTIPLY msum_20
msum_20 rotate_by_plus2 ROTATE msum_2
sum_3 mask MULTIPLY msum_30
msum_30 rotate_by_plus3 ROTATE msum_3
# now we should have four vectors with one non-zero element each in the right place - need to sum them
msum_0 msum_1 ADD out_01
out_01 msum_2 ADD out_02
out_02 msum_3 ADD output_vec
"""
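
The circuit above can be mirrored step by step on cleartext lists, which may help when checking the wiring. This is a sketch in plain Python rather than anything run by SHEEP: `rotate`, `multiply`, `add` and `matvec_row_by_row` are illustrative names, and the rotation convention (negative means shift left, positive means shift right) is an assumption chosen to match the outputs shown in this notebook.

```python
def rotate(slots, k):
    """Cyclic rotation (assumed convention: k<0 shifts left, k>0 shifts right)."""
    n = len(slots)
    return [slots[(i - k) % n] for i in range(n)]


def multiply(a, b):
    return [x * y for x, y in zip(a, b)]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def matvec_row_by_row(matrix, vec):
    n = len(vec)
    mask = [1] + [0] * (n - 1)
    output = [0] * n
    for i, row in enumerate(matrix):
        # Dot product of row i with vec: multiply, then rotate-and-add.
        prod = multiply(row, vec)
        total = prod
        for _ in range(n - 1):
            prod = rotate(prod, -1)
            total = add(total, prod)
        # Keep only the first slot, then move it to position i.
        masked = multiply(total, mask)
        output = add(output, rotate(masked, i))
    return output


matrix = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(matvec_row_by_row(matrix, [1, 2, 3, 4]))  # [30, 70, 110, 150]
```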

In [3]:
sheep_client.new_job()
sheep_client.set_context("HElib_Fp")
sheep_client.set_input_type("int16_t")
# set the "Levels" parameter so we can do more multiplications without getting the wrong answer.
sheep_client.set_parameters({"Levels": 30})
sheep_client.set_circuit_text(circuit)
sheep_client.get_inputs()

{'content': ['mrow_0', 'mrow_1', 'mrow_2', 'mrow_3', 'vec', 'mask'],
 'status_code': 200}

In [4]:
sheep_client.get_const_inputs()

{'content': ['rotate_by_minus1',
  'rotate_by_plus1',
  'rotate_by_plus2',
  'rotate_by_plus3'],
 'status_code': 200}

In [5]:
sheep_client.set_inputs({"mrow_0": [1,2,3,4], "mrow_1": [5,6,7,8], "mrow_2": [9,10,11,12], "mrow_3": [13,14,15,16],"vec": [1,2,3,4], "mask": [1,0,0,0]})


{'content': '', 'status_code': 200}

In [6]:
sheep_client.set_const_inputs({"rotate_by_minus1": -1, "rotate_by_plus1": 1, "rotate_by_plus2": 2, "rotate_by_plus3":3})

{'content': '', 'status_code': 200}

In [7]:
sheep_client.run_job()

{'content': '', 'status_code': 200}

In [8]:
sheep_client.get_results()

{'content': {'cleartext check': {'is_correct': True},
  'outputs': {'msum_0': ['30,0,0,0'],
   'output_vec': ['30,70,110,150'],
   'sum_0': ['30,30,30,30']},
  'timings': {'decryption': '3592.900000',
   'encryption': '17918.700000',
   'evaluation': '250372.500000',
   'msum_0': '197.500000',
   'msum_1': '189.300000',
   'msum_10': '202.200000',
   'msum_2': '197.100000',
   'msum_20': '198.700000',
   'msum_3': '206.400000',
   'msum_30': '186.500000',
   'out_01': '197.700000',
   'out_02': '185.800000',
   'output_vec': '197.200000',
   'prod_00': '5272.400000',
   'prod_01': '17289.300000',
   'prod_02': '11895.800000',
   'prod_03': '17369.400000',
   'prod_10': '205.200000',
   'prod_11': '303.500000',
   'prod_12': '3507.100000',
   'prod_13': '12176.500000',
   'prod_20': '3563.700000',
   'prod_21': '15683.800000',
   'prod_22': '12409.600000',
   'prod_23': '15374.700000',
   'prod_30': '12173.800000',
   'prod_31': '3355.500000',
   'prod_32': '11979.800000',
   'prod_33':

### b) A better way -  doing the same calculation with fewer ROTATEs.

The paper describing the [GAZELLE](https://eprint.iacr.org/2018/073.pdf) framework includes a clever method for speeding up matrix-vector multiplication with SIMD operations.
To minimize the number of rotations (the operations that move values between slots), we can supply the matrix as diagonal strips rather than as rows.
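
To see why this works, here is a short derivation using notation that is not in the original notebook: indices start at 0, $n$ is the matrix dimension, $d_k$ is the $k$-th diagonal strip and $v^{(k)}$ is the vector rotated $k$ slots to the left. With $d_k[i] = M_{i,(i+k) \bmod n}$ and $v^{(k)}[i] = v_{(i+k) \bmod n}$, summing the component-wise products over all strips gives

\begin{equation*}
\sum_{k=0}^{n-1} d_k[i] \, v^{(k)}[i]
= \sum_{k=0}^{n-1} M_{i,(i+k) \bmod n} \, v_{(i+k) \bmod n}
= \sum_{j=0}^{n-1} M_{i,j} \, v_j
\end{equation*}

which is exactly element $i$ of the matrix-vector product. Only the vector needs rotating, and no masking is required because each slot already holds the right element.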

In this case, the circuit will look like:

In [93]:
new_circuit = """
INPUTS mstrip_0 mstrip_1 mstrip_2 mstrip_3 vec 
CONST_INPUTS rotate_minus1
OUTPUTS output_vec 
mstrip_0 vec MULTIPLY prod_0
vec rotate_minus1 ROTATE vec_r1
mstrip_1 vec_r1 MULTIPLY prod_1
vec_r1 rotate_minus1 ROTATE vec_r2
mstrip_2 vec_r2 MULTIPLY prod_2
vec_r2 rotate_minus1 ROTATE vec_r3
mstrip_3 vec_r3 MULTIPLY prod_3
prod_0 prod_1 ADD sum_0
sum_0 prod_2 ADD sum_1
sum_1 prod_3 ADD output_vec
"""
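
To make the saving concrete, the gate counts of the two circuits can be tallied for a general n x n matrix. The formulas below are read off the two circuit texts in this notebook; the function names are made up for illustration.

```python
def row_by_row_gate_counts(n):
    """Gate counts for the circuit in a): one dot product per row, then mask and merge."""
    return {
        "MULTIPLY": n + n,                # n row products + n mask products
        "ROTATE": n * (n - 1) + (n - 1),  # n-1 per dot product, n-1 to reposition
        "ADD": n * (n - 1) + (n - 1),     # n-1 per dot product, n-1 to merge
    }


def diagonal_gate_counts(n):
    """Gate counts for the circuit in b): one product per diagonal strip."""
    return {"MULTIPLY": n, "ROTATE": n - 1, "ADD": n - 1}


print(row_by_row_gate_counts(4))  # {'MULTIPLY': 8, 'ROTATE': 15, 'ADD': 15}
print(diagonal_gate_counts(4))    # {'MULTIPLY': 4, 'ROTATE': 3, 'ADD': 3}
```

The number of rotations grows quadratically with n in the row-by-row circuit but only linearly in the diagonal one.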

In [94]:
sheep_client.new_job()
sheep_client.set_context("HElib_Fp")
sheep_client.set_input_type("int16_t")
sheep_client.set_circuit_text(new_circuit)


{'content': '', 'status_code': 200}

We can set some of the inputs similarly to above.

In [95]:
const_input_vals = {"rotate_minus1": -1}
input_vals = {"vec": [1,2,3,4]}

So now we have to set the remaining inputs to be diagonal strips of the matrix:

In [96]:
input_vals["mstrip_0"] = [1,6,11,16]
input_vals["mstrip_1"] = [2,7,12,13]
input_vals["mstrip_2"] = [3,8,9,14]
input_vals["mstrip_3"] = [4,5,10,15]
sheep_client.set_inputs(input_vals)
sheep_client.set_const_inputs(const_input_vals)

{'content': '', 'status_code': 200}
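
Rather than typing the strips by hand, they can be derived from the matrix. A minimal sketch in plain Python, where `diagonal_strips` is an illustrative helper and not part of pysheep: strip k takes, from each row i, the element in column (i + k) mod n.

```python
def diagonal_strips(matrix):
    """Return the n wrapped diagonals of a square matrix, starting with the main one."""
    n = len(matrix)
    return [[matrix[i][(i + k) % n] for i in range(n)] for k in range(n)]


matrix = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
for k, strip in enumerate(diagonal_strips(matrix)):
    print("mstrip_%d" % k, strip)
# mstrip_0 [1, 6, 11, 16]
# mstrip_1 [2, 7, 12, 13]
# mstrip_2 [3, 8, 9, 14]
# mstrip_3 [4, 5, 10, 15]
```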

In [97]:
sheep_client.run_job()

{'content': '', 'status_code': 200}

In [98]:
sheep_client.get_results()

{'content': {'cleartext check': {'is_correct': True},
  'outputs': {'output_vec': ['30,70,110,150']},
  'timings': {'decryption': '425.600000',
   'encryption': '4726.200000',
   'evaluation': '22525.600000',
   'output_vec': '5236.400000',
   'prod_0': '93.900000',
   'prod_1': '1607.600000',
   'prod_2': '1622.600000',
   'prod_3': '101.800000',
   'sum_0': '4356.100000',
   'sum_1': '5685.600000',
   'vec_r1': '1356.600000',
   'vec_r2': '1701.300000',
   'vec_r3': '65.200000'}},
 'status_code': 200}
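
The diagonal circuit can also be mirrored on cleartext lists to confirm that it reproduces the result above. This is a plain-Python sketch with illustrative names (`rotate`, `diagonal_matvec`), assuming that a rotation by -1 shifts every value one slot to the left.

```python
def rotate(slots, k):
    """Cyclic rotation (assumed convention: k=-1 shifts values one slot left)."""
    n = len(slots)
    return [slots[(i - k) % n] for i in range(n)]


def diagonal_matvec(strips, vec):
    """Multiply strip k by the vector rotated k times, and accumulate the products."""
    output = [0] * len(vec)
    rotated = list(vec)
    for k, strip in enumerate(strips):
        if k > 0:
            rotated = rotate(rotated, -1)
        output = [o + s * r for o, s, r in zip(output, strip, rotated)]
    return output


strips = [[1, 6, 11, 16], [2, 7, 12, 13], [3, 8, 9, 14], [4, 5, 10, 15]]
result = diagonal_matvec(strips, [1, 2, 3, 4])
print(result)  # [30, 70, 110, 150]
```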

Note that there is a function in ```pysheep.mid_level_benchmarks``` called ```generate_matrix_vector_mult``` that takes a matrix (as a list of lists) and a vector (as a list), and returns a circuit, a dict of input values, and a dict of constant-input values, for use in this scheme.

Let's try it out:

In [52]:
sheep_client.new_job()

{'content': '', 'status_code': 200}

In [53]:
sheep_client.set_context("HElib_Fp")

{'content': '', 'status_code': 200}

In [54]:
sheep_client.set_input_type("int16_t")

{'content': '', 'status_code': 200}

In [55]:
sheep_client.set_parameters({"Levels": 30})

{'content': '', 'status_code': 200}

In [58]:
matrix = [[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]]
vec = [1,2,3,4]

In [59]:
from pysheep.mid_level_benchmarks import generate_matrix_vector_mult


In [60]:
circ, inputs, const_inputs = generate_matrix_vector_mult(matrix,vec)

In [61]:
sheep_client.set_circuit_text(circ)

{'content': '', 'status_code': 200}

In [62]:
print(circ)

OUTPUTS output_vec
CONST_INPUTS rotate_minus1
INPUTS input_vec mstrip_0 mstrip_1 mstrip_2 mstrip_3 
input_vec ALIAS vec_r0
mstrip_0 vec_r0 MULTIPLY prod_0
vec_r0 rotate_minus1 ROTATE vec_r1
mstrip_1 vec_r1  MULTIPLY prod_1
vec_r1 rotate_minus1 ROTATE vec_r2
mstrip_2 vec_r2  MULTIPLY prod_2
vec_r2 rotate_minus1 ROTATE vec_r3
mstrip_3 vec_r3  MULTIPLY prod_3
prod_0 prod_1 ADD sum_0
sum_0 prod_2 ADD sum_1
sum_1 prod_3 ADD sum_2
sum_2 ALIAS output_vec
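
For illustration, a circuit with the same shape as the one printed above could be produced by a generator along the following lines. This is a hypothetical sketch, not the pysheep source: `sketch_matrix_vector_circuit` is an invented name, it assumes a square matrix with at least two rows, and it simply reproduces the gate layout and input names seen in the printed circuit.

```python
def sketch_matrix_vector_circuit(matrix, vec):
    """Hypothetical stand-in for generate_matrix_vector_mult (not the pysheep code)."""
    n = len(vec)
    # Inputs: the vector plus one wrapped diagonal strip per value of k.
    inputs = {"input_vec": list(vec)}
    for k in range(n):
        inputs["mstrip_%d" % k] = [matrix[i][(i + k) % n] for i in range(n)]
    const_inputs = {"rotate_minus1": -1}

    lines = [
        "OUTPUTS output_vec",
        "CONST_INPUTS rotate_minus1",
        "INPUTS input_vec " + " ".join("mstrip_%d" % k for k in range(n)),
        "input_vec ALIAS vec_r0",
        "mstrip_0 vec_r0 MULTIPLY prod_0",
    ]
    # One rotation of the vector and one product per remaining strip.
    for k in range(1, n):
        lines.append("vec_r%d rotate_minus1 ROTATE vec_r%d" % (k - 1, k))
        lines.append("mstrip_%d vec_r%d MULTIPLY prod_%d" % (k, k, k))
    # Accumulate the products.
    lines.append("prod_0 prod_1 ADD sum_0")
    for k in range(2, n):
        lines.append("sum_%d prod_%d ADD sum_%d" % (k - 2, k, k - 1))
    lines.append("sum_%d ALIAS output_vec" % (n - 2))
    return "\n".join(lines) + "\n", inputs, const_inputs


matrix = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
sketch_circ, sketch_inputs, sketch_const_inputs = sketch_matrix_vector_circuit(matrix, [1, 2, 3, 4])
print(sketch_circ)
```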



In [63]:
sheep_client.set_inputs(inputs)

{'content': '', 'status_code': 200}

In [64]:
sheep_client.set_const_inputs(const_inputs)

{'content': '', 'status_code': 200}

In [65]:
sheep_client.run_job()

{'content': '', 'status_code': 200}

In [66]:
sheep_client.get_results()

{'content': {'cleartext check': {'is_correct': True},
  'outputs': {'output_vec': ['30,70,110,150']},
  'timings': {'decryption': '1011.900000',
   'encryption': '14887.200000',
   'evaluation': '53811.800000',
   'output_vec': '12352.100000',
   'prod_0': '3479.900000',
   'prod_1': '4077.400000',
   'prod_2': '158.600000',
   'prod_3': '222.500000',
   'sum_0': '87.900000',
   'sum_1': '10386.800000',
   'sum_2': '12506.400000',
   'vec_r0': '85.500000',
   'vec_r1': '4332.300000',
   'vec_r2': '4222.800000',
   'vec_r3': '237.200000'}},
 'status_code': 200}
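
The evaluation timings reported by the three runs in this notebook can be put side by side. The numbers below are copied from the `get_results()` outputs above; the labels are descriptive only, and since timings vary from run to run and the runs did not all set the same parameters, the ratios should be read as rough indications rather than benchmarks.

```python
# Evaluation timings as reported by get_results() in this notebook.
evaluation = {
    "row-by-row (Levels=30)": 250372.5,
    "diagonal strips (Levels not set)": 22525.6,
    "diagonal strips (Levels=30)": 53811.8,
}

baseline = evaluation["row-by-row (Levels=30)"]
speedups = {name: baseline / t for name, t in evaluation.items()}
for name, ratio in speedups.items():
    print(f"{name}: {evaluation[name]:.1f}, {ratio:.1f}x relative to row-by-row")
```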