# CGARMv8

## Prototype Work Overview
The main purpose of the prototype is to add enough of an ARMv8 backend to be able to test code generation for array element-wise multiplication.

## General Changes
This codegen file is derived from CGMips and implements a portion of the codegen procedures in ARMv8 assembly. The generated assembly is currently validated by assembling it with a GCC build targeting AArch64, using the command:
>aarch64-linux-gnu-gcc -static out.S -o out

Unfortunately, due to unforeseen circumstances, we have not validated the generated code by running it: our method of choice (Unicorn-Engine's CPU emulation) currently raises a CPU exception on any SIMD instruction.

## Missing Features and Future Work
The codegen is missing calling conventions and therefore does not handle procedure calls properly at the moment. Similarly, the new codegen requires a new way to trigger syscalls to implement builtin methods such as exit, read, and write. Finally, miscellaneous instructions and sequences are still missing, because effort went first into adding codegen for the new array element-wise operations.

For the final submission, we would like to finish the codegen, including syscalls, procedures and calling conventions, minor optimizations, and more testing against P0test and new SIMD test cases. Ideally the code would also be emulated successfully under Unicorn, to validate that the generated code is correct. If time permits at the end of the project, we would port the new codegen functions to CGMips and CGast so that all backends support the new additions to the frontend.

In [None]:
"""
Pascal0 Code Generator for ARMv8, Emil Sekerinski, Gabriel Dalimonte, Gavin Johnson, March 2017.
Using delayed code generation for a one-pass compiler. The types of symbol
table entries for expressions, Var, Ref, Const, are extended by two more
types Reg for expression results in a register, and Cond, for short-circuited
Boolean expressions with two branch targets.
"""

import SC  #  used for SC.error
from SC import TIMES, DIV, MOD, AND, PLUS, MINUS, OR, EQ, NE, LT, GT, LE, \
     GE, NOT, mark
from ST import Var, Ref, Const, Type, Proc, StdProc, Int, Bool

# w31's value is context dependent based on the instruction
ZR = 'wzr'; FP = 'w29'; SP = 'wsp'; LNK = 'w30'  # reserved registers

class Reg:
    """
    For integers or booleans stored in a register;
    register can be wzr for constants '0' and 'false'
    """
    def __init__(self, tp, reg):
        self.tp, self.reg = tp, reg

class Cond:
    """
    For a boolean resulting from comparing left and right by cond:
    left, right are either registers or constants, but one has to be a register;
    cond is one of 'EQ', 'NE', 'LT', 'GT', 'LE', 'GE';
    labA, labB are lists of branch targets for when the result is true or false;
    if right is wzr, then cond 'EQ' and 'NE' can be used for branching depending
    on register left.
    """
    count = 0
    def __init__(self, cond, left, right):
        self.tp, self.cond, self.left, self.right = Bool, cond, left, right
        self.labA = ['.C' + str(Cond.count)]; Cond.count += 1
        self.labB = ['.C' + str(Cond.count)]; Cond.count += 1

## DeferredBlocks

Deferred blocks are a concept introduced to the codegen to abstract the implementation of array element-wise operations. Code generation for SIMD blocks is deferred until the Becomes (assignment) statement, which supplies the destination memory location. In addition to passing around a "working register", each block needs an accumulator register that holds the loop offset. The tp of a DeferredBlock is the Array type of the operation's result, and func is a list of functions that are called in order to emit the deferred code.
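
As a rough illustration of the chaining idea, the sketch below models a deferred block in plain Python. ToyDeferredBlock, emit_chain, and the string "registers" are hypothetical stand-ins and not the backend's real types; the only behaviour taken from the codegen is that the first function receives just the loop offset and each later function also receives the previous function's result.

```python
class ToyDeferredBlock:
    """Hypothetical stand-in for DeferredBlock: a type tag plus a function chain."""
    def __init__(self, tp, func):
        self.tp = tp
        self.func = [func]


def emit_chain(block, offset_reg):
    """Call every function in order, feeding each result into the next call."""
    result = None
    for i, f in enumerate(block.func):
        result = f(offset_reg) if i == 0 else f(offset_reg, result)
    return result


log = []  # records the order in which "code" would be emitted


def first(offset_reg):
    # First link: loads both operands itself and returns its output register.
    log.append('load a, b at ' + offset_reg + '; mul v2, v0, v1')
    return 'v2'


def second(offset_reg, prev):
    # Later link: reuses the previous output register as its left operand.
    log.append('load c at ' + offset_reg + '; mul v4, ' + prev + ', v3')
    return 'v4'


block = ToyDeferredBlock('Array', first)
block.func.append(second)
final_reg = emit_chain(block, 'x9')
```

Here final_reg names the register that the deferred assignment would store to memory, and log shows that the second link consumed the first link's output.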

In [None]:
class DeferredBlock:
    """
    Builds a deferred block, which allows the code generation of array
    operations to be deferred until a destination register is available.
    Deferred blocks form a chain. For their current use (SIMD), the
    deferred assignment generates the loop prologue and epilogue.
    Each func in the function chain is then called in order, appending its
    code after that of the previous one. The return value of func is the
    output register.
    """
    def __init__(self, tp, func):
        self.func = [func]
        self.tp = tp

# curlev is the current level of nesting of procedures
# regs is the set of available registers for expression evaluation
# asm is a list of triples; each triple consists of three strings
# - a label
# - an instruction, possibly with operands
# - a target (for branch and jump instructions)
# each of them can be the empty string

def obtainReg():
    if len(regs) == 0: mark('out of registers'); return ZR
    else: return regs.pop()

    

### ObtainVectorReg
Returns an unused SIMD register, taken from the set of free SIMD registers (vregs).

In [None]:
def obtainVectorReg():
    if len(vregs) == 0: mark('out of SIMD registers'); return ZR
    else: return vregs.pop()

### ReleaseReg
Releases a register (of any type) back to the appropriate list. Removes the need for separate release functions.
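
The dispatch on the register name can be sketched in isolation as follows. The function release, its pool arguments, and the normalisation of an x name back to its w name are assumptions made for this example (the normalisation is one possible way to keep a single name per register in the pool), not a description of the code below.

```python
RESERVED = ('wzr', 'wsp', 'w29', 'w30')


def release(reg, scalar_pool, vector_pool):
    """Return reg to the pool matching its name prefix; reserved names are ignored."""
    if reg in RESERVED:
        return
    if reg[0] == 'x':
        # Assumption: store the 32-bit name so x9 and w9 never coexist in the pool.
        reg = 'w' + reg[1:]
    if reg[0] == 'w':
        scalar_pool.add(reg)
    else:
        vector_pool.add(reg)


scalar_pool, vector_pool = {'w9'}, {'v0'}
release('x10', scalar_pool, vector_pool)  # 64-bit view of w10
release('v1', scalar_pool, vector_pool)   # SIMD register
release('wzr', scalar_pool, vector_pool)  # reserved, silently skipped
```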

In [None]:
def releaseReg(r):
    if r not in (ZR, SP, FP, LNK): (regs if r[0] == 'w' or r[0] == 'x' else vregs).add(r)

def putLab(lab, instr = ''):
    """Emit label lab with optional instruction; lab may be a single
    label or a list of labels"""
    if type(lab) == list:
        for l in lab[:-1]: asm.append((l, '', ''))
        asm.append((lab[-1], instr, ''))
    else: asm.append((lab, instr, ''))

def putInstr(instr, target = ''):
    """Emit an instruction"""
    asm.append(('', instr, target))

### put*
These functions emit instructions with different numbers of operands. In the future, they could be merged into one put function with optional arguments to handle opcodes with two to four operands.
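
One possible shape for such a merged function is sketched here. The name format_instr is hypothetical, and it returns the formatted string instead of appending to asm so that the example stays self-contained.

```python
def format_instr(op, *operands):
    """Format a mnemonic followed by any number of comma-separated operands."""
    if not operands:
        return op
    # str() lets callers mix register names with integer operands.
    return op + ' ' + ', '.join(str(o) for o in operands)


two = format_instr('mov', 'w9', '#1')
three = format_instr('add', 'w9', 'w9', 4)
four = format_instr('madd', 'w9', 'w10', 'w11', 'w12')
```

A variadic signature avoids one function per operand count, at the cost of no longer checking that an opcode received the number of operands it expects.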

In [None]:
def put(op, a, b, c):
    """Emit instruction op with three operands, a, b, c"""
    putInstr(op + ' ' + a + ', ' + str(b) + ', ' + str(c))

def put2(op, a, b):
    """Emit instruction op with two operands, a, b"""
    putInstr(op + ' ' + a + ', ' + str(b))

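def put4(op, a, b, c, d):
    """Emit instruction op with four operands, a, b, c, d"""
    # str() lets callers pass integer operands, as genIndex does for 'madd'
    putInstr(op + ' ' + a + ', ' + str(b) + ', ' + str(c) + ', ' + str(d))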


def putB(op, a, b, c):
    putInstr(op + ' ' + a + ', ' + str(b), str(c))

def putM(op, a, b, c):
    """Emit load/store instruction at location or register b + offset c"""
    # TODO: Hack, should calculate base register where this is called
    # Hack, up cast this to 64-bit reg
    if b == ZR: putInstr(op + ' ' + a + ', [' + c + ']')
    else: putInstr(op + ' ' + a + ', [' + c + ', ' + b + ']')

def testRange(x):  # TODO
    """Check if x is suitable for immediate addressing"""
    if x.val >= 0x8000 or x.val < -0x8000: mark('value too large')
    
def loadItemReg(x, r):
    """Assuming item x is Var, Const, or Reg, loads x into register r"""
    if type(x) == Var:
        s = obtainReg()
        s = 'x' + s[1:]
        put2('adrp', s, x.adr)
        put('add', s, s, ':lo12:' + x.adr)
        putM('ldr', r, x.reg, s); s = 'w' + s[1:]; releaseReg(s); releaseReg(x.reg)
    elif type(x) == Const:
        testRange(x); put2('mov', r, '#' + str(x.val))
    elif type(x) == Reg: # move to register r
        put2('mov', r, x.reg)
    else: assert False

def loadItem(x):
    """Assuming item x is Var or Const, loads x into a new register and
    returns a new Reg item"""
    if type(x) == Const and x.val == 0: r = ZR # use ZR for "0"
    else: r = obtainReg(); loadItemReg(x, r)
    return Reg(x.tp, r)

def loadBool(x):
    """Assuming x is Var or Const and x has type Bool, loads x into a
    new register and returns a new Cond item"""
    # improve by allowing c.left to be a constant
    if type(x) == Const and x.val == 0: r = ZR # use ZR for "false"
    else: r = obtainReg(); loadItemReg(x, r)
    c = Cond(NE, r, ZR)
    return c

def putOp(cd, x, y):
    """For operation op with mnemonic cd, emit code for x op y, assuming
    x, y are Var, Const, Reg"""
    if type(x) != Reg: x = loadItem(x)
    if x.reg == ZR: x.reg, r = obtainReg(), ZR
    else: r = x.reg # r is source, x.reg is destination
    if type(y) == Const:
        testRange(y); put(cd, r, x.reg, y.val)
    else:
        if type(y) != Reg: y = loadItem(y)
        put(cd, x.reg, r, y.reg); releaseReg(y.reg)
    return x

def assembly(l, i, t):
    """Convert label l, instruction i, target t to assembly format"""
    return (l + ':\t' if l else '\t') + i + (', ' + t if t else '')

# public functions

def init():
    """initializes the code generator"""
    global asm, curlev, regs, vregs
    asm, curlev = [], 0
    regs = {('w' + str(i)) for i in range(9,16)}
    vregs = {('v' + str(i)) for i in range(31)}
                                
def genRec(r):
    """Assuming r is Record, determine fields offsets and the record size"""
    s = 0
    for f in r.fields:
        f.offset, s = s, s + f.tp.size
    r.size = s
    return r

def genArray(a):
    """Assuming a is Array, determine its size"""
    # adds size
    a.size = a.length * a.base.size
    return a

def genLocalVars(sc, start):
    """For list sc of local variables, starting at index start, determine the
    $fp-relative addresses of variables"""
    s = 0 # local block size
    for i in range(start, len(sc)):
        if type(sc[i]) == Var:
            s = s + sc[i].tp.size
            sc[i].adr = - s - 8
    return s

def genGlobalVars(sc, start):
    """For list sc of global variables, starting at index start, determine the
    address of each variable, which is its name with a trailing _"""
    for i in range(len(sc) - 1, start - 1, - 1):
        if type(sc[i]) == Var:
            sc[i].adr = sc[i].name + '_'
            putLab(sc[i].adr, '.space ' + str(sc[i].tp.size))

def progStart():
    putInstr('.data')

def progEntry(ident):
    putInstr('.text')
    putInstr('.global main')
    #putInstr('.entry')
    putLab('main')

### progExit
This method still needs major work: syscalls are not implemented, so it currently emits a nop in place of an exit call. The final submission should emit the actual syscall instructions.
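
As a sketch of what the missing sequence might look like, assuming the target is Linux on AArch64: the system call number goes in x8, arguments go in x0 to x5, and svc #0 traps into the kernel; exit is call number 93. The helper syscall_sequence is hypothetical and returns instruction strings instead of calling putInstr.

```python
ARG_REGS = ['x0', 'x1', 'x2', 'x3', 'x4', 'x5']


def syscall_sequence(number, *args):
    """Return instructions for a Linux AArch64 system call.

    Each arg is an operand string, either a register name or '#imm'.
    """
    if len(args) > len(ARG_REGS):
        raise ValueError('at most six system call arguments')
    lines = ['mov ' + reg + ', ' + str(arg) for reg, arg in zip(ARG_REGS, args)]
    lines.append('mov x8, #' + str(number))  # x8 selects the system call
    lines.append('svc #0')                   # trap into the kernel
    return lines


exit_code = syscall_sequence(93, '#0')  # exit(0)
```

The read and write calls operate on byte buffers, so integer input and output would additionally need conversion code that is not shown here.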

In [None]:
def progExit(x):
    # TODO: Syscalls
    putInstr("nop")
    #putInstr('li $v0, 10')
    #putInstr('syscall')
    #putInstr('.end main')
    return '\n'.join(assembly(l, i, t) for (l, i, t) in asm)
        
def procStart():
    global curlev, parblocksize
    curlev = curlev + 1
    putInstr('.text')

def genFormalParams(sc):
    """For list sc with formal procedure parameters, determine the FP-relative
    address of each parameter; each parameter must be of type integer or
    boolean, or must be a reference parameter"""
    s = 0 # parameter block size
    for p in reversed(sc):
        if p.tp == Int or p.tp == Bool or type(p) == Ref:
            p.adr, s = s, s + 4
        else: mark('no structured value parameters')
    return s

### genProc*
These methods are currently broken and incomplete. Procedure calls and the AArch64 calling conventions are not implemented at the moment; they should be implemented by the final submission.
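
For orientation, a common AArch64 frame setup saves the frame pointer (x29) and link register (x30) as a pair and keeps the stack pointer 16-byte aligned. The sketch below emits such a prologue and epilogue as strings; frame_entry and frame_exit are hypothetical helpers, and parameter passing (which the standard does in registers x0 to x7) is not covered.

```python
def frame_entry(name, local_size):
    """Return a prologue that saves x29/x30 and reserves local_size bytes."""
    aligned = (local_size + 15) // 16 * 16  # sp must stay 16-byte aligned
    lines = [
        name + ':',
        'stp x29, x30, [sp, #-16]!',  # push frame pointer and return address
        'mov x29, sp',                # new frame pointer
    ]
    if aligned:
        # Assumes aligned fits the 12-bit immediate of sub (at most 4095).
        lines.append('sub sp, sp, #' + str(aligned))
    return lines


def frame_exit():
    """Return an epilogue that undoes frame_entry and returns to the caller."""
    return [
        'mov sp, x29',              # drop the locals
        'ldp x29, x30, [sp], #16',  # pop frame pointer and return address
        'ret',                      # branch to the address in x30
    ]


prologue = frame_entry('p', 20)
epilogue = frame_exit()
```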

In [None]:
def genProcEntry(ident, parsize, localsize):  # TODO
    """Declare procedure name, generate code for procedure entry"""
    putInstr('.globl ' + ident)        # global declaration directive
    putInstr('.ent ' + ident)          # entry point directive
    putLab(ident)                      # procedure entry label
    # TODO: AArch64 calling conventions
    #putM('sw', FP, SP, - parsize - 4)  # push frame pointer
    #putM('sw', LNK, SP, - parsize - 8) # push return address
    put('sub', FP, SP, parsize)        # set frame pointer
    put('sub', SP, FP, localsize + 8)  # set stack pointer

def genProcExit(x, parsize, localsize): # generates return code  #TODO
    global curlev
    curlev = curlev - 1
    # TODO: Need to see AArch64 calling conventions
    put('add', SP, FP, parsize)
    #putM('lw', LNK, FP, - 8)
    #putM('lw', FP, FP, - 4)
    #putInstr('jr $ra')
    putInstr('ret')  # 'ret' defaults to x30; a w register is not a valid operand

def genSelect(x, f):
    # x.f, assuming f is one of x.fields
    x.tp, x.adr = f.tp, x.adr + f.offset if type(x.adr) == int else \
                        x.adr + '+' + str(f.offset)
    return x

def genIndex(x, y):
    # x[y], assuming x is ST.Var or ST.Ref, x.tp is ST.Array, y.tp is ST.Int
    # assuming y is Const and y.val is valid index, or Reg integer
    if type(y) == Const:
        offset = (y.val - x.tp.lower) * x.tp.base.size
        x.adr = x.adr + (offset if type(x.adr) == int else '+' + str(offset))
    else:
        if type(y) != Reg: y = loadItem(y)
        put('sub', y.reg, y.reg, x.tp.lower)
        if x.reg != ZR:
            put4('madd', y.reg, x.tp.base.size, y.reg, x.reg); releaseReg(x.reg)
        else:
            put('mul', y.reg, y.reg, x.tp.base.size)
        x.reg = y.reg
    x.tp = x.tp.base
    return x


def genVar(x):
    # assuming x is ST.Var, ST.Ref, ST.Const
    # for ST.Const: no code, x.val is constant
    # for ST.Var: x.reg is FP for local, 0 for global vars,
    #   x.adr is relative or absolute address
    # for ST.Ref: address is loaded into register
    # returns ST.Var, ST.Const
    if type(x) == Const: y = x
    else:
        if x.lev == 0: s = ZR
        elif x.lev == curlev: s = FP
        else: mark('level!'); s = ZR
        y = Var(x.tp); y.lev = x.lev
        if type(x) == Ref: # reference is loaded into register
            t = obtainReg()
            t = 'x' + t[1:]
            put2('adrp', t, x.adr)
            put('add', t, t, ':lo12:' + x.adr)
            r = obtainReg(); putM('ldr', r, s, t)
            t = 'w' + t[1:]
            releaseReg(t)
            y.reg, y.adr = r, 0
        elif type(x) == Var:
            y.reg, y.adr = s, x.adr
        else: y = x # error, pass dummy item
    return y

def genConst(x):
    # assumes x is ST.Const
    return x

def genUnaryOp(op, x):
    """If op is MINUS, NOT, x must be an Int, Bool, and op x is returned.
    If op is AND, OR, x is the first operand (in preparation for the second
    operand)"""
    if op == MINUS: # subtract from 0
        if type(x) == Var: x = loadItem(x)
        put2('neg', x.reg, x.reg)
    elif op == NOT: # switch condition and branch targets, no code
        if type(x) != Cond: x = loadBool(x)
        x.cond = negate(x.cond); x.labA, x.labB = x.labB, x.labA
    elif op == AND: # load first operand into register and branch
        if type(x) != Cond: x = loadBool(x)
        putB(condOp(negate(x.cond)), x.left, x.right, x.labA[0])  # TODO
        releaseReg(x.left); releaseReg(x.right); putLab(x.labB)
    elif op == OR: # load first operand into register and branch
        if type(x) != Cond: x = loadBool(x)
        putB(condOp(x.cond), x.left, x.right, x.labB[0])  # TODO
        releaseReg(x.left); releaseReg(x.right); putLab(x.labA)
    else: assert False
    return x

### genBinaryOp
This method is currently missing a valid 'mod' implementation. AArch64 has no 'mod' instruction, and a manual implementation has not been written yet. If a 'mod' is emitted, the resulting code will not assemble. By the final submission, we would like to implement the 'mod' operation as a series of instructions.
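
One candidate sequence is a signed divide followed by a multiply-subtract: sdiv computes the quotient q = a / b truncated toward zero, and msub computes a - q * b. The sketch below emits that pair as strings and models the result in Python; mod_sequence and truncated_remainder are hypothetical names, and the model assumes b is nonzero and ignores 32-bit overflow.

```python
def mod_sequence(dst, a, b, tmp):
    """Return instructions computing dst = a - (a / b) * b with a scratch register."""
    return [
        'sdiv ' + tmp + ', ' + a + ', ' + b,              # tmp = a / b, truncated
        'msub ' + dst + ', ' + tmp + ', ' + b + ', ' + a  # dst = a - tmp * b
    ]


def truncated_remainder(a, b):
    """Python model of the value the two instructions produce (b nonzero)."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q  # sdiv rounds toward zero, unlike Python's floor division
    return a - q * b


code = mod_sequence('w9', 'w9', 'w10', 'w11')
```

With truncation the remainder takes the sign of the dividend, so truncated_remainder(-7, 3) is -1; whether that matches the intended semantics of Pascal0's mod for negative operands would need to be checked against the frontend.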

In [None]:
def genBinaryOp(op, x, y):
    """assumes x.tp == Int == y.tp and op is PLUS, MINUS, TIMES, DIV, MOD
    or op is AND, OR"""
    if op == PLUS: y = putOp('add', x, y)
    elif op == MINUS: y = putOp('sub', x, y)
    elif op == TIMES: y = putOp('mul', x, y)
    elif op == DIV: y = putOp('sdiv', x, y)
    elif op == MOD: y = putOp('mod', x, y)  # TODO
    elif op == AND: # load second operand into register 
        if type(y) != Cond: y = loadBool(y)
        y.labA += x.labA # update branch targets
    elif op == OR: # load second operand into register
        if type(y) != Cond: y = loadBool(y)
        y.labB += x.labB # update branch targets
    else: assert False
    return y

def negate(cd):
    """Assume cd in {EQ, NE, LT, LE, GT, GE}, return not cd"""
    return NE if cd == EQ else \
           EQ if cd == NE else \
           GE if cd == LT else \
           GT if cd == LE else \
           LE if cd == GT else \
           LT

def condOp(cd):  # TODO only allows checking for zero and not zero on branch ops. Comparisons will have to be done elsewhere
    """Assumes cd in {EQ, NE, LT, LE, GT, GE}, return instruction mnemonic"""
    return 'beq' if cd == EQ else \
           'bne' if cd == NE else \
           'blt' if cd == LT else \
           'ble' if cd == LE else \
           'bgt' if cd == GT else \
           'bge'

def genRelation(cd, x, y):
    """Assumes x, y are Int and cd is EQ, NE, LT, LE, GT, GE;
    x and y cannot be both constants; return Cond for x cd y"""
    if type(x) != Reg: x = loadItem(x)
    if type(y) != Reg: y = loadItem(y)
    return Cond(cd, x.reg, y.reg)

assignCount = 0

def genAssign(x, y):
    """Assume x is Var, generate x := y"""
    global assignCount, regs
    if type(y) == Cond:
        putB(condOp(negate(y.cond)), y.left, y.right, y.labA[0])
        releaseReg(y.left); releaseReg(y.right); r = obtainReg()
        putLab(y.labB); put2('mov', r, '#1') # load true
        lab = '.A' + str(assignCount); assignCount += 1
        putInstr('B', lab)
        putLab(y.labA); put2('mov', r, '#0') # load false 
        putLab(lab)
    elif type(y) != Reg: y = loadItem(y); r = y.reg
    else: r = y.reg
    s = obtainReg()
    s = 'x' + s[1:]
    put2('adrp', s, x.adr)
    put('add', s, s, ':lo12:' + x.adr)
    putM('str', r, x.reg, s)
    s = 'w' + s[1:]
    releaseReg(s)
    releaseReg(r)

def genActualPara(ap, fp, n):  # TODO
    """Pass parameter, ap is actual parameter, fp is the formal parameter,
    either Ref or Var, n is the parameter number"""
    if type(fp) == Ref:  #  reference parameter, assume p is Var
        if ap.adr != 0:  #  load address in register
            r = obtainReg(); putM('la', r, ap.reg, ap.adr)  # TODO
        else: r = ap.reg  #  address already in register
        putM('str', r, SP, - 4 * (n + 1)); releaseReg(r)
    else:  #  value parameter
        if type(ap) != Cond:
            if type(ap) != Reg: ap = loadItem(ap)
            putM('str', ap.reg, SP, - 4 * (n + 1)); releaseReg(ap.reg)
        else: mark('unsupported parameter type')

def genCall(pr):  # TODO?
    """Assume pr is Proc"""
    putInstr('bl', pr.name)

### gen*
The next three gen functions (genRead, genWrite, and genWriteln) are stubs that emit a nop, because syscalls are not yet implemented.

In [None]:
def genRead(x):  # TODO
    """Assume x is Var"""
    putInstr("nop")
    #putInstr('li $v0, 5'); putInstr('syscall')
    #putM('sw', '$v0', x.reg, x.adr)

def genWrite(x):  # TODO
    """Assumes x is Ref, Var, Reg"""
    putInstr("nop")
    #loadItemReg(x, '$a0'); putInstr('li $v0, 1'); putInstr('syscall')

def genWriteln():  # TODO, Prologue needs to generate syscall table
    putInstr("nop")
    #putInstr('li $v0, 11'); putInstr("li $a0, '\\n'"); putInstr('syscall')

def genSeq(x, y):
    """Assume x and y are statements, generate x ; y"""
    pass

def genCond(x):
    """Assume x is Bool, generate code for branching on x"""
    if type(x) != Cond: x = loadBool(x)
    putB(condOp(negate(x.cond)), x.left, x.right, x.labA[0])
    releaseReg(x.left); releaseReg(x.right); putLab(x.labB)
    return x

def genIfThen(x, y):
    """Generate code for if-then: x is condition, y is then-statement"""
    putLab(x.labA)

ifCount = 0

def genThen(x, y):  # TODO
    """Generate code for if-then-else: x is condition, y is then-statement"""
    global ifCount
    lab = '.I' + str(ifCount); ifCount += 1
    putInstr('b', lab)
    putLab(x.labA); 
    return lab

def genIfElse(x, y, z):
    """Generate code of if-then-else: x is condition, y is then-statement,
    z is else-statement"""
    putLab(y)

loopCount = 0

def genTarget():
    """Return target for loops with backward branches"""
    global loopCount
    lab = '.L' + str(loopCount); loopCount += 1
    putLab(lab)
    return lab

def genWhile(lab, x, y):
    """Generate code for while: lab is target, x is condition, y is body"""
    putInstr('b', lab)
    putLab(x.labA); 

### genArrayVectorOp
Emits a deferred code block that implements part of the loop body for multiplying two arrays element-wise. Parts of this method could be refactored together with the scalar variant; that was left out of this submission because time went into getting Unicorn running for testing. Only multiplication is implemented so far, but this proof of concept shows that the method can be extended to other binary operations in the same way as genBinaryOp.

deferred_block is the inner function that is called later to emit the actual code. If operand x is a register, the body uses it as the previous block's destination register; otherwise each iteration loads x from memory into a vector register.
Finally, the method returns the DeferredBlock to the parser. If operand x is already a DeferredBlock, the body is appended to its function chain; otherwise (operand x is an Array) a new DeferredBlock is created.
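
What the emitted loop is meant to compute can be modelled in plain Python. This is only a behavioural sketch: elementwise_mul is a hypothetical name, lists stand in for memory, each slice stands in for one 128-bit register holding four 32-bit lanes, and 32-bit wraparound is not modelled.

```python
LANES = 4  # one 128-bit register holds four 32-bit integers (.4S)


def elementwise_mul(a, b):
    """Multiply two equal-length lists four elements at a time."""
    if len(a) != len(b) or len(a) % LANES != 0:
        raise ValueError('lengths must match and be a multiple of 4')
    out = [0] * len(a)
    for i in range(0, len(a), LANES):         # the loop offset advances 16 bytes
        va = a[i:i + LANES]                   # ld1 {vA.4S}, [x_loc]
        vb = b[i:i + LANES]                   # ld1 {vB.4S}, [x_loc]
        vr = [p * q for p, q in zip(va, vb)]  # mul vR.4S, vA.4S, vB.4S
        out[i:i + LANES] = vr                 # st1 {vR.4S}, [x_loc]
    return out


product = elementwise_mul([1, 2, 3, 4, 5, 6, 7, 8], [2, 2, 2, 2, 3, 3, 3, 3])
```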

In [None]:
def genArrayVectorOp(op, x, y):  # TODO
    def deferred_block(op, iterate, x, y):
        x_loc = obtainReg()
        x_loc = 'x' + x_loc[1:]
        r = obtainVectorReg()
        # t is obtained below only when x is not already in a register
        put2('adrp', x_loc, y.adr)
        put('add', x_loc, x_loc, iterate)
        put('add', x_loc, x_loc, ':lo12:' + y.adr)
        putM('ld1', '{' + r + '.4S}', ZR, x_loc)
        opcode = 'nop'
        if op == TIMES: opcode = 'mul'
        else: assert False
        if type(x) == Reg:
            t = x.reg
        else:
            t = obtainVectorReg()
            put2('adrp', x_loc, x.adr)
            put('add', x_loc, x_loc, iterate)
            put('add', x_loc, x_loc, ':lo12:' + x.adr)
            putM('ld1', '{' + t + '.4S}', ZR, x_loc)
        s = obtainVectorReg()
        put(opcode, s + '.4S', t + '.4S', r + '.4S')
        releaseReg(r)
        releaseReg(t)
        releaseReg(x_loc)
        return Reg(x.tp, s)
    if type(x) == DeferredBlock:
        x.func.append(lambda iterate, z: deferred_block(op, iterate, z, y))
        a = x
    else:
        a = DeferredBlock(x.tp, lambda iterate: deferred_block(op, iterate, x, y))
    return a

### genDeferredAssign
This function emits DeferredBlocks in a loop to realize array element-wise operations. It could be merged into genAssign in the future; at present it is kept separate for debugging and clarity. The body first sets up the loop prologue, then calls every func in the DeferredBlock in order to emit the code they describe. Finally, the store to the destination memory is generated before the loop epilogue is emitted.

#### Limitations
At the moment, the genDeferredAssign and genArray* methods assume that array sizes are multiples of 16 bytes (four 4-byte values). This limitation should be removed by the final submission so that arrays of any matching length can be operated on element-wise. Additionally, neither this method nor the genArray* ops validate that the array lengths match, and the frontend does not validate that both operands are Arrays of Integers (or of the same type) before calling the genArray* methods. Both the backend methods and the frontend code that calls them should validate types more strictly in the final submission.
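
One way to lift the size restriction would be to run the vector loop over the largest multiple of four elements and handle the remainder with scalar code. The sketch below only computes that split and a basic operand check; split_for_simd and check_operands are hypothetical helpers, not part of the current backend.

```python
def split_for_simd(length, lanes=4):
    """Return (elements handled by the vector loop, elements left for a scalar tail)."""
    vector_elems = length - length % lanes
    return vector_elems, length - vector_elems


def check_operands(len_x, len_y, base_x, base_y):
    """Return an error message if the operands cannot be combined, else None."""
    if base_x != base_y:
        return 'array element types differ'
    if len_x != len_y:
        return 'array lengths differ'
    return None


body, tail = split_for_simd(10)  # 8 elements vectorised, 2 handled one by one
```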

In [None]:
def genDeferredAssign(x, y):  # TODO
    inc = obtainReg()
    inc = 'x' + inc[1:]
    bc = obtainReg()
    bc = 'x' + bc[1:]
    put2("mov", inc, 'xzr')
    length = genArray(x.tp)
    put2("mov", bc, '#' + str(length.size))
    tar = genTarget()
    ret = None
    for i in range(len(y.func)):
        if i > 0:
            ret = y.func[i](inc, ret)
        else:
            ret = y.func[i](inc)
    x_loc = obtainReg()
    x_loc = 'x' + x_loc[1:]
    put2('adrp', x_loc, x.adr)
    put('add', x_loc, x_loc, inc)
    put('add', x_loc, x_loc, ':lo12:' + x.adr)
    putM('st1', '{' + ret.reg + '.4S}', ZR, x_loc)
    x_loc = 'w' + x_loc[1:]
    releaseReg(x_loc)
    put("add", inc, inc, '#16')  # 4 bytes per value, 4 values
    put2("cmp", inc, bc)
    putInstr('blt ' + tar)
    inc = 'w' + inc[1:]
    bc = 'w' + bc[1:]
    releaseReg(inc)
    releaseReg(bc)

### genArrayScalarOp
Generates a deferred block similar to genArrayVectorOp, except that the vector load of y is replaced by loading the scalar y into a general-purpose register and duplicating it into every lane of a vector register (dup).
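
The effect of the duplication can be modelled as a broadcast followed by a lane-wise multiply. In this sketch dup4, wrap32, and broadcast_mul are hypothetical names; wrap32 models that each 32-bit lane keeps only the low 32 bits of the product, interpreted as a signed value.

```python
def wrap32(x):
    """Reduce x to a signed 32-bit value, as a 32-bit lane would hold it."""
    return (x + 2**31) % 2**32 - 2**31


def dup4(scalar):
    """Model of dup vN.4S, wM: copy one scalar into all four lanes."""
    return [scalar] * 4


def broadcast_mul(a, k):
    """Multiply every element of a by scalar k, four lanes at a time."""
    if len(a) % 4 != 0:
        raise ValueError('length must be a multiple of 4')
    vk = dup4(k)
    out = []
    for i in range(0, len(a), 4):
        out.extend(wrap32(p * q) for p, q in zip(a[i:i + 4], vk))
    return out


scaled = broadcast_mul([1, 2, 3, 4], 3)
```

Because the broadcast value does not change between iterations, the dup could in principle be emitted once before the loop instead of inside the loop body.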

In [None]:
def genArrayScalarOp(op, x, y): # TODO optimize with chained operations
    def deferred_block(op, iterate, x, y):
        x_loc = obtainReg()
        x_loc = 'x' + x_loc[1:]
        scalar_op = obtainVectorReg()
        y = loadItem(y)
        put2('dup', scalar_op + '.4S', y.reg)
        releaseReg(y.reg)
        opcode = 'nop'
        if op == TIMES: opcode = 'mul'
        else: assert False
        if type(x) == Reg:
            t = x.reg
        else:
            t = obtainVectorReg()
            put2('adrp', x_loc, x.adr)
            put('add', x_loc, x_loc, iterate)
            put('add', x_loc, x_loc, ':lo12:' + x.adr)
            putM('ld1', '{' + t + '.4S}', ZR, x_loc)
        s = obtainVectorReg()
        put(opcode, s + '.4S', t + '.4S', scalar_op + '.4S')
        releaseReg(scalar_op)
        releaseReg(t)
        releaseReg(x_loc)
        return Reg(x.tp, s)
    if type(x) == DeferredBlock:
        x.func.append(lambda iterate, z: deferred_block(op, iterate, z, y))
        a = x
    else:
        a = DeferredBlock(x.tp, lambda iterate: deferred_block(op, iterate, x, y))
    return a