
Assembly Language Examples

danieltan1517 edited this page Jan 4, 2026 · 21 revisions

Basic Add Function

This basic add function is a didactic example of how to use inline assembly at a basic level. The add instruction adds b to a in place, and the function then returns a.

add :: (a: int, b: int) -> int {
    #asm {
       add a, b;
    }
    return a;
}

Basic Sub Function

This basic sub function is a didactic example of how to subtract using inline assembly. The sub instruction subtracts b from a in place, and the function then returns a.

sub :: (a: int, b: int) -> int {
    #asm {
       sub a, b;
    }
    return a;
}

Basic Multiply Function

Multiplying two numbers with imul is a little more involved than the basic add. In its one-operand form, imul writes the 128-bit product to the RDX:RAX register pair; the two-operand form used here writes the low 64 bits of the product to its destination operand. To specify which register a variable occupies, we can take advantage of 'pinning'. We pin variables a and b to the registers RAX and RDX respectively.

mul :: (a: int, b: int) -> int {
    #asm {
        a === a; // a = RAX register
        b === d; // b = RDX register
        imul.64 a, b;
    }
    return a;
}

Basic Divide Function

Dividing two numbers with idiv is more involved than the basic add. idiv divides the 128-bit value in RDX:RAX by its operand, leaving the quotient in RAX and the remainder in RDX. To specify which register a variable occupies, we can take advantage of 'pinning'. We pin variable a to RAX, declare a scratch register rdx pinned to RDX, and set it to zero. We then perform the division and return the quotient in a. Note that zeroing RDX only forms the correct dividend when a is non-negative; for a negative a, RDX must hold the sign extension of RAX (which the x86-64 cqo instruction produces) before idiv runs.

div :: (a: int, b: int) -> int {
    #asm {
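        rdx: gpr === d;      // declare a scratch register and pin it to RDX
        a === a;             // pin a to RAX
        xor.64  rdx, rdx;    // zero RDX, the high half of the dividend (correct for a >= 0 only)
        idiv.64 rdx, a, b;   // RDX:RAX / b: quotient in RAX (a), remainder in RDX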
    }
    return a;
}
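
When checking the assembly version, a reference without inline assembly can be handy. The following is a minimal sketch and not part of the original page: div_reference is a hypothetical name, and the sketch assumes Jai's built-in / and % operators on signed integers truncate toward zero, as idiv does.

```jai
#import "Basic";

// Hypothetical reference helper (not from the original page).
// Returns the values idiv.64 is expected to leave in RAX (quotient)
// and RDX (remainder), assuming / and % truncate toward zero.
div_reference :: (a: int, b: int) -> quotient: int, remainder: int {
    return a / b, a % b;
}

main :: () {
    q, r := div_reference(17, 5);
    print("17 / 5: quotient %, remainder %\n", q, r);

    // A negative dividend is the case where zeroing RDX before idiv
    // no longer forms the right 128-bit dividend.
    q, r = div_reference(-17, 5);
    print("-17 / 5: quotient %, remainder %\n", q, r);
}
```

Comparing div against such a reference for both positive and negative inputs is one way to see where the zeroed-RDX version stops being valid.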

Min using cmovle

This code example uses the cmovle (conditional move if less than or equal) instruction to return the minimum of the 64-bit integers a and b. After cmp sets the flags, ret is loaded with b and then overwritten with a only if a <= b. Code such as this can help avoid branch mispredictions.

min :: (a: int, b: int) -> int {
    ret: int;
    #asm {
        cmp.64     a, b;
        mov.64     ret, b;
        cmovle.64  ret, a;
    }
    return ret;
}

Max using cmovge

This code example uses the cmovge (conditional move if greater than or equal) instruction to return the maximum of the 64-bit integers a and b. After cmp sets the flags, ret is loaded with b and then overwritten with a only if a >= b. Code such as this can help avoid branch mispredictions.

max :: (a: int, b: int) -> int {
    ret: int;
    #asm {
        cmp.64     a, b;
        mov.64     ret, b;
        cmovge.64  ret, a;
    }
    return ret;
}
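
For comparison, the branching equivalents of min and max can be sketched as below. This is an illustrative sketch only: min_reference and max_reference are hypothetical names that do not appear on the original page, chosen so they do not clash with any existing min or max.

```jai
#import "Basic";

// Hypothetical reference helpers (not from the original page).
min_reference :: (a: int, b: int) -> int {
    if a <= b { return a; }   // the same condition cmovle tests after cmp a, b
    return b;
}

max_reference :: (a: int, b: int) -> int {
    if a >= b { return a; }   // the same condition cmovge tests after cmp a, b
    return b;
}

main :: () {
    assert(min_reference(3, 7) == 3);
    assert(min_reference(-2, -9) == -9);
    assert(max_reference(3, 7) == 7);
    assert(max_reference(-2, -9) == -2);
    print("min/max reference checks passed\n");
}
```

The conditional-move versions compute the same results, but replace the conditional jump with a data dependency.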

Abs with cmovs

This code example uses the cmovs (conditional move if sign flag set) instruction to compute the absolute value of an integer. It negates a copy of the value; if the negated copy is negative, the original was positive, so cmovs restores the original. Code such as this can help avoid branch mispredictions.

abs :: (a: int) -> int {
    ret: int;
    #asm {
        mov     ret, a;
        neg     ret;
        cmovs   ret, a;
    }
    return ret;
}

Popcount

The CPU's built-in popcnt instruction counts the set bits in its source operand, which can speed up bit counting compared with doing it in software.

Popcount u8

This code example uses x86-64 assembly to do a popcount on a u8. Because popcnt has no 8-bit form, the value is first zero-extended to 16 bits.

popcount_u8 :: (value: u8) -> int {
    result: int;
    #asm {
        bytes: gpr;                // declare a register
        movzxbw    bytes,  value;  // bytes = value, zero-extended from 8 to 16 bits
        popcnt.16  result, bytes;  // result = popcount(bytes);
    }
    return result;
}

Popcount u16

This code example utilizes the x86-64 assembly language to do a popcount on a u16.

popcount_u16 :: (value: u16) -> int {
    result: int;
    #asm {
        popcnt.16  result, value;  // result = popcount(value);
    }
    return result;
}

Popcount u32

This code example utilizes the x86-64 assembly language to do a popcount on a u32.

popcount_u32 :: (value: u32) -> int {
    result: int;
    #asm {
        popcnt.32  result, value;  // result = popcount(value);
    }
    return result;
}

Popcount u64

This code example utilizes the x86-64 assembly language to do a popcount on a u64.

popcount_u64 :: (value: u64) -> int {
    result: int;
    #asm {
        popcnt.64  result, value;  // result = popcount(value);
    }
    return result;
}

Polymorphic Popcount

One can combine popcount_u8, popcount_u16, popcount_u32, and popcount_u64 into a single polymorphic popcount function. The u8 case is handled separately because popcnt has no 8-bit form; for every other type, popcnt?T takes its operand size from T. This removes the duplicated code across integer types.

popcount :: (value: $T) -> int {
    result: int;
    assert(CPU == .X64);
    #if T == u8 {
        #asm {
            // There is no popcnt.8, so we need to move into 16 bits.
            movzxbw    two_bytes:, value;
            popcnt.16  result, two_bytes;
        }
    } else {
        #asm {
            popcnt?T   result, value;
        }
    }

    return result;
}
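
A reference implementation without inline assembly is useful for testing the versions above, or as a fallback on other CPUs. The following is a minimal sketch under stated assumptions: popcount_reference is a hypothetical name not found on the original page, and it uses the common trick of clearing the lowest set bit once per iteration.

```jai
#import "Basic";

// Hypothetical reference helper (not from the original page).
// Counts set bits by clearing the lowest set bit until none remain.
popcount_reference :: (value: u64) -> int {
    count := 0;
    remaining := value;
    while remaining != 0 {
        remaining &= remaining - 1;  // clear the lowest set bit
        count += 1;
    }
    return count;
}

main :: () {
    assert(popcount_reference(0) == 0);
    assert(popcount_reference(0xFF) == 8);
    assert(popcount_reference(0x8000000000000001) == 2);
    print("popcount_reference(0xF0F0) = %\n", popcount_reference(0xF0F0));
}
```

The loop runs once per set bit, whereas popcnt does the same work in a single instruction.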

Bit Scan Forward

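The CPU's built-in bsf (bit scan forward) instruction returns the index of the least significant set bit of its source operand, which is faster than scanning the bits in a loop. If the source is zero, bsf sets the zero flag and, per Intel's documentation, leaves the destination undefined, so a zero input should be treated as a special case.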

Bit Scan Forward u8

This code example uses x86-64 assembly to do a bit scan forward on a u8. Because bsf has no 8-bit form, the value is first zero-extended to 16 bits.

bit_scan_forward_u8 :: (number: u8) -> int {
    result: int;
    #asm {
        temp: gpr;
        movzxbw   temp, number;
        bsf.16    result, temp;
    }
    return result;
}

Bit Scan Forward u16

This code example utilizes the x86-64 assembly language to do a bit scan forward on a u16.

bit_scan_forward_u16 :: (number: u16) -> int {
    result: int;
    #asm {
        bsf.16    result, number;
    }
    return result;
}

Bit Scan Forward u32

This code example utilizes the x86-64 assembly language to do a bit scan forward on a u32.

bit_scan_forward_u32 :: (number: u32) -> int {
    result: int;
    #asm {
        bsf.32    result, number;
    }
    return result;
}

Bit Scan Forward u64

This code example utilizes the x86-64 assembly language to do a bit scan forward on a u64.

bit_scan_forward_u64 :: (number: u64) -> int {
    result: int;
    #asm {
        bsf.64    result, number;
    }
    return result;
}

Polymorphic Bit Scan Forward

One can combine bit_scan_forward_u8, bit_scan_forward_u16, bit_scan_forward_u32, and bit_scan_forward_u64 into a single polymorphic bit_scan_forward function. The u8 case is handled separately because bsf has no 8-bit form; for every other type, bsf?T takes its operand size from T. This removes the duplicated code across integer types.

bit_scan_forward :: (input: $T) -> int {
    assert(CPU == .X64);
    result: int = -1;
    #if T == u8 {  // There's no bsf for 8 bits. Sad.
        #asm {
            movzxbw   temp:, input;
            bsf.16    result, temp;
        }
    } else {
        #asm {
            bsf?T     result, input;
        }
    }

    return result;
}
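
To test the bit scan functions, a loop-based reference can be sketched as follows. This is an illustration only: bit_scan_forward_reference is a hypothetical name not found on the original page, and returning -1 for a zero input is an assumption that mirrors the initial value of result in the polymorphic version above.

```jai
#import "Basic";

// Hypothetical reference helper (not from the original page).
// Returns the index of the least significant set bit, or -1 if value is zero.
bit_scan_forward_reference :: (value: u64) -> int {
    if value == 0 { return -1; }
    index := 0;
    remaining := value;
    while (remaining & 1) == 0 {
        remaining >>= 1;   // drop the lowest bit and look at the next one
        index += 1;
    }
    return index;
}

main :: () {
    assert(bit_scan_forward_reference(1) == 0);
    assert(bit_scan_forward_reference(8) == 3);
    assert(bit_scan_forward_reference(0x8000000000000000) == 63);
    assert(bit_scan_forward_reference(0) == -1);
    print("bit_scan_forward_reference checks passed\n");
}
```

Handling zero explicitly in the reference makes it easy to check how the assembly versions behave on that input.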
