Assembly Language Examples

Basic Add Function

This basic add function is a didactic example of using inline assembly at the simplest level: a single add instruction adds b to a, and the function returns a.

add :: (a: int, b: int) -> int {
    #asm {
       add a, b;
    }
    return a;
}

Basic Sub Function

This basic sub function demonstrates how to subtract using assembly language. Like the add function, it is a didactic example: a single sub instruction subtracts b from a, and the function returns a.

sub :: (a: int, b: int) -> int {
    #asm {
       sub a, b;
    }
    return a;
}

Basic Multiply Function

Multiplying two numbers with imul is slightly more involved than the basic add. The one-operand form of imul multiplies RAX by its operand and leaves the 128-bit product in the register pair RDX:RAX, whereas the two-operand form used here writes the low 64 bits of the product to its destination operand. To specify which register a variable occupies, we can take advantage of 'pinning'. Here we pin variables a and b to the registers RAX and RDX respectively, so the product ends up in a (RAX).

mul :: (a: int, b: int) -> int {
    #asm {
        a === a; // a = RAX register
        b === d; // b = RDX register
        imul.64 a, b;
    }
    return a;
}

Basic Divide Function

Dividing two numbers with idiv is more involved than the basic add. idiv takes its dividend from the register pair RDX:RAX, divides it by its operand, and leaves the quotient in RAX and the remainder in RDX. To specify which register a variable occupies, we can take advantage of 'pinning'. We pin variable a to RAX, declare a dummy variable rdx pinned to RDX, and set it to zero. We then perform the division and return the quotient in a. Note that zeroing RDX is only correct when a is non-negative: for a negative dividend, RDX must hold the sign extension of RAX (which the x86-64 cqo instruction produces), otherwise the quotient is wrong or the division faults.

div :: (a: int, b: int) -> int {
    #asm {
        rdx: gpr === d;
        a === a;
        xor.64  rdx, rdx;
        idiv.64 rdx, a, b;
    }
    return a;
}
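As a cross-check for the div example above, here is a minimal reference sketch of the results idiv is expected to produce. It is written in C rather than the language used on this page so it can be compiled anywhere; the names div_result and div_ref are hypothetical and not part of the original examples.

```c
#include <stdint.h>

/* Hypothetical helper type: the two values idiv produces. */
typedef struct {
    int64_t quotient;   /* what idiv leaves in RAX */
    int64_t remainder;  /* what idiv leaves in RDX */
} div_result;

/* Reference signed division.
   The caller must ensure b != 0 and not (a == INT64_MIN && b == -1);
   these are the same inputs on which idiv raises a divide error. */
div_result div_ref(int64_t a, int64_t b) {
    div_result r;
    r.quotient  = a / b;  /* C99 truncates toward zero, as idiv does */
    r.remainder = a % b;  /* takes the sign of the dividend, as idiv does */
    return r;
}
```

Comparing the assembly version against this reference with a negative dividend (for example -7 / 2) is a quick way to see whether RDX was set up correctly before the division.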

Addps

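This code uses the addps instruction to add two four-element float arrays in parallel. The arrays are loaded into vector registers with movups, added with a single addps, and the sum is stored back to memory.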

movps :: (a: [4] float, b: [4] float) -> [4] float {
    c: [4] float;
    pointer_a := a.data;
    pointer_b := b.data;
    pointer_c := c.data;

    #asm {
        xmm0: vec;
        xmm1: vec;
        movups.128 xmm0, [pointer_a];
        movups.128 xmm1, [pointer_b];
        addps.128  xmm0, xmm1;
        movups.128 [pointer_c], xmm0;
    }

    return c;

}

Here is another way to write the same function. The inline assembly has a by-reference / by-value distinction (like high-level code) and also allows by-value moves of structs into and out of vector registers. The goal here is to let the compiler manage some moves so that they can be avoided during code generation. This lets you drop single #asm instructions into small composable functions that will be properly collapsed by LLVM (release mode).

addps :: (a: [4] float, b: [4] float) -> [4] float {
    c := a;
    #asm {
        addps c, b;
    }
    return c;
}
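Both versions above should produce the same four sums as a plain element-by-element loop. The sketch below shows that scalar equivalent in C (not the language used on this page); addps_ref is a hypothetical name used only for illustration.

```c
#include <stddef.h>

/* Scalar reference for addps: c[i] = a[i] + b[i] for the four lanes. */
void addps_ref(const float a[4], const float b[4], float c[4]) {
    for (size_t i = 0; i < 4; i++) {
        c[i] = a[i] + b[i];  /* addps performs these four adds in one instruction */
    }
}
```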

Min using cmovle

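This code example uses the cmovle instruction to return the minimum of two 64-bit integers a and b. It compares a with b, loads b into the result, and then conditionally replaces the result with a when a is less than or equal to b. Because the selection is a conditional move rather than a jump, it can help reduce branch prediction misses.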

min :: (a: int, b: int) -> int {
    ret: int;
    #asm {
        cmp.64     a, b;
        mov.64     ret, b;
        cmovle.64  ret, a;
    }
    return ret;
}

Max using cmovge

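This code example uses the cmovge instruction to return the maximum of two 64-bit integers a and b. It compares a with b, loads b into the result, and then conditionally replaces the result with a when a is greater than or equal to b. Because the selection is a conditional move rather than a jump, it can help reduce branch prediction misses.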

max :: (a: int, b: int) -> int {
    ret: int;
    #asm {
        cmp.64     a, b;
        mov.64     ret, b;
        cmovge.64  ret, a;
    }
    return ret;
}
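The min and max examples above follow the same load-then-conditionally-overwrite pattern. The C sketch below (C rather than the language used on this page) mirrors that pattern line by line so the logic can be checked independently; min_ref and max_ref are hypothetical names, and a C compiler may or may not turn these into conditional moves.

```c
#include <stdint.h>

/* Mirrors: cmp a, b; mov ret, b; cmovle ret, a */
int64_t min_ref(int64_t a, int64_t b) {
    int64_t ret = b;       /* mov ret, b (mov does not change the flags) */
    if (a <= b) ret = a;   /* cmovle ret, a */
    return ret;
}

/* Mirrors: cmp a, b; mov ret, b; cmovge ret, a */
int64_t max_ref(int64_t a, int64_t b) {
    int64_t ret = b;       /* mov ret, b */
    if (a >= b) ret = a;   /* cmovge ret, a */
    return ret;
}
```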

Abs with cmovs

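This code example uses the cmovs instruction to return the absolute value of an integer. It negates a copy of the value; if the negated copy is negative (the sign flag is set), the original value was positive, so cmovs moves the original back into the result. Because the selection is a conditional move rather than a jump, it can help reduce branch prediction misses.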

abs :: (a: int) -> int {
    ret: int;
    #asm {
        mov     ret, a;
        neg     ret;
        cmovs   ret, a;
    }
    return ret;
}
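The negate-then-select logic above can be traced in a small C sketch (C rather than the language used on this page). The name abs_ref is hypothetical. The negation is done in unsigned arithmetic so that the most negative value does not trigger signed overflow in C; on two's-complement targets that input comes back unchanged, which matches what the neg instruction does.

```c
#include <stdint.h>

/* Mirrors: mov ret, a; neg ret; cmovs ret, a */
int64_t abs_ref(int64_t a) {
    /* neg ret: two's-complement negation, computed without signed overflow */
    int64_t ret = (int64_t)(0 - (uint64_t)a);
    /* cmovs ret, a: if the negation is negative, a itself was the positive one */
    if (ret < 0) ret = a;
    return ret;
}
```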

Read and Write to an Array

In the following example, we start with high-level code and then translate it into assembly in order to show how to read from and write to a memory address in assembly language.

This is the high-level example. The code adds 10 to each element of the array.

high_level_code :: () {

    array := int.[1,2,3,4];

    for i: 0..3 {
        array[i] = array[i] + 10;
    }

    print("%\n", array);
}

This is the low-level assembly example. It adds 10 to each element of the array and produces exactly the same output as the previous example. The difference is that it reads and writes memory through an address explicitly: each iteration loads the element at array_data into a register, adds 10, stores it back, and then advances array_data by 8 bytes (the size of one int) to reach the next element.

assembly_language_code :: () {

    array := int.[1,2,3,4];

    array_data := array.data;
    for 0..3 {
        #asm {
            register: gpr;
            mov.64 register, [array_data];
            add.64 register, 10;
            mov.64 [array_data], register;
            add.64 array_data, 8;
        }
    }

    print("%\n", array);
}
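For readers more familiar with pointers than with memory operands, the loop above corresponds roughly to the following C sketch (C rather than the language used on this page). The function name add_ten_ref and its count parameter are hypothetical; the element type is assumed to be a 64-bit integer, matching the 8-byte step in the assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds 10 to each element by walking a pointer, one load and one store per element. */
void add_ten_ref(int64_t *array_data, size_t count) {
    for (size_t i = 0; i < count; i++) {
        int64_t reg = *array_data;  /* mov.64 register, [array_data] */
        reg += 10;                  /* add.64 register, 10 */
        *array_data = reg;          /* mov.64 [array_data], register */
        array_data += 1;            /* add.64 array_data, 8 (one 8-byte element) */
    }
}
```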

Popcount

One can use the CPU's built-in popcnt instruction to speed up counting the set bits in an integer.

Popcount u8

This code example utilizes the x86-64 assembly language to do a popcount on a u8. There is no 8-bit form of popcnt, so the value is first zero-extended to 16 bits with movzxbw.

popcount_u8 :: (value: u8) -> int {
    result: int;
    #asm {
        bytes: gpr;                // declare a general-purpose register
        movzxbw    bytes,  value;  // bytes = value, zero-extended from 8 to 16 bits
        popcnt.16  result, bytes;  // result = popcount(bytes);
    }
    return result;
}

Popcount u16

This code example utilizes the x86-64 assembly language to do a popcount on a u16.

popcount_u16 :: (value: u16) -> int {
    result: int;
    #asm {
        popcnt.16  result, value;  // result = popcount(value);
    }
    return result;
}

Popcount u32

This code example utilizes the x86-64 assembly language to do a popcount on a u32.

popcount_u32 :: (value: u32) -> int {
    result: int;
    #asm {
        popcnt.32  result, value;  // result = popcount(value);
    }
    return result;
}

Popcount u64

This code example utilizes the x86-64 assembly language to do a popcount on a u64.

popcount_u64 :: (value: u64) -> int {
    result: int;
    #asm {
        popcnt.64  result, value;  // result = popcount(value);
    }
    return result;
}

Polymorphic Popcount

One can combine popcount_u8, popcount_u16, popcount_u32, and popcount_u64 into a single polymorphic popcount function that handles every case. The u8 case is still special-cased because there is no 8-bit popcnt; for the other types, the ?T suffix takes the operand size from the type T. This removes the redundant code across the integer data types.

popcount :: (value: $T) -> int {
    result: int;
    assert(CPU == .X64);
    #if T == u8 {
        #asm {
            // There is no popcnt.8, so we need to move into 16 bits.
            movzxbw    two_bytes:, value;
            popcnt.16  result, two_bytes;
        }
    } else {
        #asm {
            popcnt?T   result, value;
        }
    }

    return result;
}
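To check any of the popcount variants above, it helps to have a slow but obviously correct reference. The sketch below is in C (not the language used on this page) and uses the well-known trick of clearing the lowest set bit until the value is zero; popcount_ref is a hypothetical name. Narrower inputs can be passed directly, since widening an unsigned value does not add set bits.

```c
#include <stdint.h>

/* Portable reference bit count: one loop iteration per set bit. */
int popcount_ref(uint64_t value) {
    int count = 0;
    while (value != 0) {
        value &= value - 1;  /* clears the lowest set bit */
        count++;
    }
    return count;
}
```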

Bit Scan Forward

One can use the CPU's built-in bsf instruction to speed up a bit scan forward, which finds the index of the lowest set bit in an integer.

Bit Scan Forward u8

This code example utilizes the x86-64 assembly language to do a bit scan forward on a u8. There is no 8-bit form of bsf, so the value is first zero-extended to 16 bits with movzxbw.

bit_scan_forward_u8 :: (number: u8) -> int {
    result: int;
    #asm {
        temp: gpr;
        movzxbw   temp, number;
        bsf.16    result, temp;
    }
    return result;
}

Bit Scan Forward u16

This code example utilizes the x86-64 assembly language to do a bit scan forward on a u16.

bit_scan_forward_u16 :: (number: u16) -> int {
    result: int;
    #asm {
        bsf.16    result, number;
    }
    return result;
}

Bit Scan Forward u32

This code example utilizes the x86-64 assembly language to do a bit scan forward on a u32.

bit_scan_forward_u32 :: (number: u32) -> int {
    result: int;
    #asm {
        bsf.32    result, number;
    }
    return result;
}

Bit Scan Forward u64

This code example utilizes the x86-64 assembly language to do a bit scan forward on a u64.

bit_scan_forward_u64 :: (number: u64) -> int {
    result: int;
    #asm {
        bsf.64    result, number;
    }
    return result;
}

Polymorphic Bit Scan Forward

One can combine bit_scan_forward_u8, bit_scan_forward_u16, bit_scan_forward_u32, and bit_scan_forward_u64 into a single polymorphic bit_scan_forward function that handles every case. The u8 case is still special-cased because there is no 8-bit bsf; for the other types, the ?T suffix takes the operand size from the type T. This removes the redundant code across the integer data types.

bit_scan_forward :: (input: $T) -> int {
    assert(CPU == .X64);
    result: int = -1;
    #if T == u8 {  // There's no bsf for 8 bits. Sad.
        #asm {
            movzxbw   temp:, input;
            bsf.16    result, temp;
        }
    } else {
        #asm {
            bsf?T     result, input;
        }
    }

    return result;
}
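A simple reference for checking the bit scan forward variants is a loop that shifts right until the lowest bit is set. The sketch below is in C (not the language used on this page), and bit_scan_forward_ref is a hypothetical name. It returns -1 for a zero input as an assumed convention, chosen to match the -1 initial value in the polymorphic version above; the bsf instruction itself does not define its destination when the source is zero, so that case deserves an explicit test.

```c
#include <stdint.h>

/* Returns the index of the lowest set bit, or -1 if no bit is set. */
int bit_scan_forward_ref(uint64_t input) {
    if (input == 0) {
        return -1;  /* assumed convention for "no bit found" */
    }
    int index = 0;
    while ((input & 1) == 0) {
        input >>= 1;  /* drop the lowest bit and look at the next one */
        index++;
    }
    return index;
}
```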
