# The Path Forward
If the WASM slowdown is generically a factor of 3.4, then our heuristics say we should aim for the following time on 16 cores for an x86 build:


In [49]:
TARGET_TIME = 60/4.25
TARGET_TIME 

14.117647058823529

Set up the basic utility class to compute and display the effects of our optimizations.

In [50]:
from prettytable import PrettyTable

FIELD_NAMES = ["-", "total", "rels", "msm", "trace", "etc"]
NUM_RELATIONS = 10
NUM_THREADS = 16


def normalize(d):
    """Rescale the values of d to whole-number percentages of their sum."""
    total = sum(d.values())
    return {k: round(100*v/total) for (k, v) in d.items()}


class TableData:
    """Per-component percentages and times (in seconds) of proof construction."""

    def __init__(self, percs, times):
        # Copy so that updates do not mutate the caller's dictionaries.
        self.percs = dict(percs)
        self.times = dict(times)

    def to_table(self):
        result = PrettyTable()
        result.field_names = FIELD_NAMES

        percs = list(self.percs.values())
        times = list(self.times.values())
        result.add_row(["%"] + [round(sum(percs), 2)] + percs)
        result.add_row(["#(s)"] + [round(sum(times), 2)] + times)
        return result

    def update_perc(self, k, v):
        # Scale one component's share, then renormalize to percentages.
        self.percs[k] = self.percs[k] * v
        self.percs = normalize(self.percs)

    def update_time(self, k, v):
        self.times[k] = round(self.times[k] * v, 2)

    def update(self, k, v):
        """Scale component k by the factor v in both rows of the table."""
        self.update_perc(k, v)
        self.update_time(k, v)

    def __str__(self):
        return str(self.to_table())
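
To make the percentage bookkeeping concrete, here is a minimal sketch of what `normalize` does on its own. The input dictionaries are illustrative values chosen for this example, not measurements. Because each entry is rounded independently, the resulting shares are whole numbers and need not add up to exactly 100.

```python
def normalize(d):
    """Rescale the values of d to whole-number percentages of their sum."""
    total = sum(d.values())
    return {k: round(100 * v / total) for (k, v) in d.items()}


# Illustrative input: one component shrunk to a tenth of its original share.
shares = normalize({"rels": 3.5, "msm": 35, "trace": 20, "etc": 10})
print(shares)

# Per-entry rounding can lose a point: three equal parts give 33 + 33 + 33.
thirds = normalize({"a": 1, "b": 1, "c": 1})
print(thirds, sum(thirds.values()))
```

Since `update_perc` renormalizes after every call, this rounding is applied at each step, so the percentage row should be read as approximate.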

Set the initial distribution using the numbers from https://github.com/AztecProtocol/aztec-packages/pull/5211

In [51]:
INITIAL_TIME = 23.7  # seconds
INITIAL_PERCS = {"rels": 35, "msm": 35, "trace": 20, "etc": 10}
INITIAL_TIMES = {key: INITIAL_TIME * val / 100
                 for (key, val) in INITIAL_PERCS.items()}
T = TableData(INITIAL_PERCS, INITIAL_TIMES)
print(T)

+------+-------+-------+-------+-------+------+
|  -   | total |  rels |  msm  | trace | etc  |
+------+-------+-------+-------+-------+------+
|  %   |  100  |   35  |   35  |   20  |  10  |
| #(s) |  23.7 | 8.295 | 8.295 |  4.74 | 2.37 |
+------+-------+-------+-------+-------+------+


## Structuring the execution trace

By structuring the execution trace, we can reduce the time to execute the relations by a factor of $\text{NUM\_RELATIONS}$, i.e. to $1/\text{NUM\_RELATIONS}$ of its current value.

In [52]:
REDUCTION_DUE_TO_TRACE_STRUCTURING = 1/NUM_RELATIONS
T.update("rels", REDUCTION_DUE_TO_TRACE_STRUCTURING)
print(T)


+------+-------+------+-------+-------+------+
|  -   | total | rels |  msm  | trace | etc  |
+------+-------+------+-------+-------+------+
|  %   |  100  |  5   |   51  |   29  |  15  |
| #(s) | 16.23 | 0.83 | 8.295 |  4.74 | 2.37 |
+------+-------+------+-------+-------+------+
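
The effect on the total is an instance of Amdahl's law: speeding up one component only helps in proportion to that component's share of the run time. The sketch below redoes the arithmetic with the initial times from the table; the helper name `total_after_speedup` is made up for this illustration.

```python
def total_after_speedup(times, key, factor):
    """Total time if only times[key] is multiplied by factor."""
    return sum(t * factor if k == key else t for k, t in times.items())


# Initial per-component times in seconds (35/35/20/10 % of 23.7 s).
times = {"rels": 8.295, "msm": 8.295, "trace": 4.74, "etc": 2.37}

before = sum(times.values())
after = total_after_speedup(times, "rels", 1 / 10)

print(f"before: {before:.2f} s, after: {after:.2f} s")
print(f"overall speedup: {before / after:.2f}x")
```

A tenfold reduction on a component that accounts for 35% of the time therefore yields well under a twofold improvement overall, which is why the remaining sections look at the other components.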


## Improved multithreading

OMP did about 35% better on commitments at the sizes we care about. Assume that holds and that we can get the corresponding reduction on x86. The update is left commented out here, so the tables that follow do not include it.

In [53]:
# REDUCTION_DUE_TO_OMP = 0.65
# T.update("msm", REDUCTION_DUE_TO_OMP)
# print(T)
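
For reference, this is a rough sketch of what enabling the update would project, assuming the 35% improvement carries over to x86 unchanged. It starts from the times in the table after trace structuring and is a hypothetical projection, not a measurement.

```python
REDUCTION_DUE_TO_OMP = 0.65  # assumption: MSM time drops by 35%

# Times in seconds from the table after trace structuring.
times = {"rels": 0.83, "msm": 8.295, "trace": 4.74, "etc": 2.37}

# Scale only the MSM entry, rounding to 2 decimals as update_time does.
projected = dict(times, msm=round(times["msm"] * REDUCTION_DUE_TO_OMP, 2))
projected_total = sum(projected.values())

print(projected)
print(f"projected total: {projected_total:.2f} s")
```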


## Better field arithmetic

Heuristically, using Karatsuba in our WASM build could give a 25% reduction in the cost of field arithmetic. Let's be pessimistic and say that's actually 15%. Last I checked, field arithmetic accounts for 80% of our proof construction time, so we'd hope for a $0.15 \times 0.80 = 12\%$ reduction across the board.

In [54]:
REDUCTION_DUE_TO_USING_NIM = 1-0.12

T.update("msm", REDUCTION_DUE_TO_USING_NIM)
T.update("rels", REDUCTION_DUE_TO_USING_NIM)
T.update("trace", REDUCTION_DUE_TO_USING_NIM)
T.update("etc", REDUCTION_DUE_TO_USING_NIM)
print(T)

+------+-------+------+-----+-------+------+
|  -   | total | rels | msm | trace | etc  |
+------+-------+------+-----+-------+------+
|  %   |  100  |  4   |  51 |   30  |  15  |
| #(s) | 14.29 | 0.73 | 7.3 |  4.17 | 2.09 |
+------+-------+------+-----+-------+------+
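
As a quick sanity check, the projected total can be compared against the target computed at the start of this section. The values below are copied from the cells and tables above; the variable names `projected_total` and `gap` are introduced only for this sketch.

```python
TARGET_TIME = 60 / 4.25   # target from the start of this section, in seconds
projected_total = 14.29   # total from the last table, in seconds

gap = projected_total - TARGET_TIME
print(f"target {TARGET_TIME:.2f} s, projected {projected_total:.2f} s")
print(f"gap {gap:.2f} s ({100 * gap / projected_total:.1f}% of projected total)")
```

Under these estimates the projection lands slightly above the target, so a small further reduction would still be needed.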
