Assembly Language Examples
This basic add function shows how to add two integers with inline assembly. It is a didactic example of using assembly at the most basic level.
add :: (a: int, b: int) -> int {
    #asm {
        add a, b;
    }
    return a;
}
This basic sub function shows how to subtract two integers with inline assembly. Like add, it is a didactic example of using assembly at the most basic level.
sub :: (a: int, b: int) -> int {
    #asm {
        sub a, b;
    }
    return a;
}
Multiplying two numbers with imul is more involved than the basic add. In its one-operand form, imul writes the 128-bit product to the register pair RDX:RAX (high half in RDX, low half in RAX); in the two-operand form used below, the truncated 64-bit product goes to the destination operand. To specify which variable name represents a particular register, we can take advantage of 'pinning'. Here we pin variables a and b to the registers RAX and RDX respectively.
mul :: (a: int, b: int) -> int {
    #asm {
        a === a; // a = RAX register
        b === d; // b = RDX register
        imul.64 a, b;
    }
    return a;
}
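To make the difference between the two forms of imul concrete, here is a minimal sketch in portable C rather than Jai, so it can be compiled and checked on its own. The names wide_product, imul_wide, and imul_truncated are hypothetical and not part of the original example, and __int128 is a GCC/Clang extension that is assumed to be available.

```c
#include <stdint.h>

/* Hypothetical type: the 128-bit result of one-operand imul,
   split the way the CPU splits it (RDX = high half, RAX = low half). */
typedef struct {
    int64_t  high; /* what would land in RDX */
    uint64_t low;  /* what would land in RAX */
} wide_product;

/* Models one-operand imul: full signed 64x64 -> 128-bit product.
   Assumes the GCC/Clang __int128 extension. */
wide_product imul_wide(int64_t a, int64_t b) {
    __int128 p = (__int128)a * (__int128)b;
    wide_product r;
    r.low  = (uint64_t)p;          /* low 64 bits */
    r.high = (int64_t)(p >> 64);   /* high 64 bits, sign included */
    return r;
}

/* Models two-operand imul: only the low 64 bits of the product are kept.
   The multiply is done on unsigned values to avoid signed-overflow UB. */
int64_t imul_truncated(int64_t a, int64_t b) {
    return (int64_t)((uint64_t)a * (uint64_t)b);
}
```

For products that fit in 64 bits the two models agree on the low half, which is why the mul function above can simply return a.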
Dividing two numbers with idiv is more involved than the basic add. idiv divides the 128-bit value in RDX:RAX by its operand, then places the quotient in RAX and the remainder in RDX. To specify which variable name represents a particular register, we can take advantage of 'pinning'. We pin variable a to RAX, declare a temporary rdx pinned to RDX, and set it to zero. We then perform the division and return the quotient in a, in accordance with the behavior of the x86-64 idiv instruction. Note that zeroing RDX forms the correct dividend only when a is non-negative; a negative a needs its sign extended into RDX (the job of the x86 cqo instruction), otherwise idiv sees a large positive dividend and produces a wrong quotient or a divide error.
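div :: (a: int, b: int) -> int {
    #asm {
        rdx: gpr === d;     // rdx = RDX register (high half of the dividend)
        a === a;            // a = RAX register (low half of the dividend)
        xor.64 rdx, rdx;    // zero RDX; correct only for non-negative a
        idiv.64 rdx, a, b;  // quotient -> a (RAX), remainder -> rdx (RDX)
    }
    return a;
}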
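As a cross-check for the division example, the following is a small reference model in portable C, not Jai. It relies on the fact that C's / and % truncate toward zero, which matches idiv. The names idiv_result, idiv_model, and sign_extend_high are hypothetical helpers introduced only for this sketch.

```c
#include <stdint.h>

/* Hypothetical type: the two results idiv leaves in RAX and RDX. */
typedef struct {
    int64_t quotient;  /* what would land in RAX */
    int64_t remainder; /* what would land in RDX */
} idiv_result;

/* Models idiv for a dividend that fits in 64 bits.
   The caller must ensure b != 0 and not (a == INT64_MIN && b == -1);
   the hardware raises a divide error in both cases. */
idiv_result idiv_model(int64_t a, int64_t b) {
    idiv_result r;
    r.quotient  = a / b; /* truncates toward zero, like idiv */
    r.remainder = a % b; /* takes the sign of the dividend, like idiv */
    return r;
}

/* The value RDX must hold before idiv so that RDX:RAX equals a:
   all zeros for non-negative a, all ones for negative a. */
int64_t sign_extend_high(int64_t a) {
    return a < 0 ? -1 : 0;
}
```

sign_extend_high shows why a zeroed RDX is only valid for non-negative dividends: for a negative a the high half must be all ones.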
This example uses the cmovle (conditional move if less or equal) instruction to return the minimum of two 64-bit integers a and b without branching. Code like this can help reduce branch mispredictions.
min :: (a: int, b: int) -> int {
    ret: int;
    #asm {
        cmp.64 a, b;
        mov.64 ret, b;
        cmovle.64 ret, a;
    }
    return ret;
}
This example uses the cmovge (conditional move if greater or equal) instruction to return the maximum of two 64-bit integers a and b without branching. Code like this can help reduce branch mispredictions.
max :: (a: int, b: int) -> int {
    ret: int;
    #asm {
        cmp.64 a, b;
        mov.64 ret, b;
        cmovge.64 ret, a;
    }
    return ret;
}
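The min and max examples follow the same pattern: load one candidate unconditionally, then conditionally overwrite it with the other. A rough model of that pattern in portable C (not Jai) is sketched below; min_model and max_model are hypothetical names, and whether a C compiler actually emits a conditional move for this code is not guaranteed.

```c
#include <stdint.h>

/* Mirrors: cmp a, b; mov ret, b; cmovle ret, a */
int64_t min_model(int64_t a, int64_t b) {
    int64_t ret = b;            /* mov ret, b */
    ret = (a <= b) ? a : ret;   /* cmovle ret, a */
    return ret;
}

/* Mirrors: cmp a, b; mov ret, b; cmovge ret, a */
int64_t max_model(int64_t a, int64_t b) {
    int64_t ret = b;            /* mov ret, b */
    ret = (a >= b) ? a : ret;   /* cmovge ret, a */
    return ret;
}
```

These models can serve as a reference when testing the assembly versions, including the case where both inputs are equal.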
This example uses the cmovs (conditional move if sign) instruction to return the absolute value of an integer. It negates a copy of the value, and if the negated result is negative (meaning the original was positive), it moves the original back. Code like this can help reduce branch mispredictions.
abs :: (a: int) -> int {
    ret: int;
    #asm {
        mov ret, a;
        neg ret;
        cmovs ret, a;
    }
    return ret;
}
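The neg/cmovs sequence has one edge case worth spelling out: the most negative 64-bit integer has no positive counterpart, so negating it wraps back to itself. The sketch below models the sequence in portable C (not Jai); abs_model is a hypothetical name, and the conversion back to int64_t assumes the usual two's-complement wrapping that GCC and Clang provide.

```c
#include <stdint.h>

/* Mirrors: mov ret, a; neg ret; cmovs ret, a */
int64_t abs_model(int64_t a) {
    /* neg ret: negate with wraparound, done on unsigned to avoid UB */
    int64_t negated = (int64_t)((uint64_t)0 - (uint64_t)a);
    /* cmovs ret, a: if the negation is negative, keep the original */
    return negated < 0 ? a : negated;
}
```

Under these assumptions abs_model(INT64_MIN) returns INT64_MIN, which is the same result the assembly sequence produces.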
The x86-64 popcnt instruction counts the set bits of an integer in a single instruction, which avoids counting them in a loop.
This example uses popcnt on a u8. Because popcnt has no 8-bit form, the value is first zero-extended to 16 bits.
popcount_u8 :: (value: u8) -> int {
    result: int;
    #asm {
        bytes: gpr;              // declare a register
        movzxbw bytes, value;    // bytes = value
        popcnt.16 result, bytes; // result = popcount(bytes);
    }
    return result;
}
This example uses popcnt on a u16.
popcount_u16 :: (value: u16) -> int {
    result: int;
    #asm {
        popcnt.16 result, value; // result = popcount(value);
    }
    return result;
}
This example uses popcnt on a u32.
popcount_u32 :: (value: u32) -> int {
    result: int;
    #asm {
        popcnt.32 result, value; // result = popcount(value);
    }
    return result;
}
This example uses popcnt on a u64.
popcount_u64 :: (value: u64) -> int {
    result: int;
    #asm {
        popcnt.64 result, value; // result = popcount(value);
    }
    return result;
}
popcount_u8, popcount_u16, popcount_u32, and popcount_u64 can be combined into a single polymorphic popcount function. The ?T suffix picks the operand size from the type T, and the u8 case is handled separately at compile time with #if. This removes the redundant code across the integer types.
popcount :: (value: $T) -> int {
    result: int;
    assert(CPU == .X64);
    #if T == u8 {
        #asm {
            // There is no popcnt.8, so we need to move into 16 bits.
            movzxbw two_bytes:, value;
            popcnt.16 result, two_bytes;
        }
    } else {
        #asm {
            popcnt?T result, value;
        }
    }
    return result;
}
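When testing the popcount variants above, it helps to have a slow but obviously correct reference to compare against. The sketch below is such a reference in portable C (not Jai), using the well-known trick of clearing the lowest set bit on each iteration; popcount_reference is a hypothetical name. Narrower inputs can be passed directly, since zero-extension to 64 bits does not change the number of set bits.

```c
#include <stdint.h>

/* Counts set bits by repeatedly clearing the lowest one.
   The loop runs once per set bit, so at most 64 times. */
int popcount_reference(uint64_t value) {
    int count = 0;
    while (value != 0) {
        value &= value - 1; /* clear the lowest set bit */
        count += 1;
    }
    return count;
}
```

Comparing the assembly versions against this function over a range of inputs, including 0 and the all-ones value for each width, is a simple way to validate them.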
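The x86-64 bsf (bit scan forward) instruction returns the index of the least significant set bit of an integer in a single instruction. Intel documents the destination as undefined when the input is zero, so a zero input should be handled separately.
This example uses bsf on a u8. Because bsf has no 8-bit form, the value is first zero-extended to 16 bits.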
bit_scan_forward_u8 :: (number: u8) -> int {
    result: int;
    #asm {
        temp: gpr;
        movzxbw temp, number;
        bsf.16 result, temp;
    }
    return result;
}
This example uses bsf on a u16.
bit_scan_forward_u16 :: (number: u16) -> int {
    result: int;
    #asm {
        bsf.16 result, number;
    }
    return result;
}
This example uses bsf on a u32.
bit_scan_forward_u32 :: (number: u32) -> int {
    result: int;
    #asm {
        bsf.32 result, number;
    }
    return result;
}
This example uses bsf on a u64.
bit_scan_forward_u64 :: (number: u64) -> int {
    result: int;
    #asm {
        bsf.64 result, number;
    }
    return result;
}
bit_scan_forward_u8, bit_scan_forward_u16, bit_scan_forward_u32, and bit_scan_forward_u64 can be combined into a single polymorphic bit_scan_forward function. As with popcount, the ?T suffix picks the operand size from the type T, and the u8 case is handled separately with #if. This removes the redundant code across the integer types.
bit_scan_forward :: (input: $T) -> int {
    assert(CPU == .X64);
    result: int = -1;
    #if T == u8 { // There's no bsf for 8 bits. Sad.
        #asm {
            movzxbw temp:, input;
            bsf.16 result, temp;
        }
    } else {
        #asm {
            bsf?T result, input;
        }
    }
    return result;
}
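A simple loop-based reference is useful for checking the bit scan functions above, especially around the zero input, where the hardware result should not be relied on. The sketch below is written in portable C (not Jai); bit_scan_forward_reference is a hypothetical name, and returning -1 for zero is an assumption chosen to match the initial value of result in the polymorphic version.

```c
#include <stdint.h>

/* Returns the index of the least significant set bit,
   or -1 when no bit is set (an assumed convention for this sketch). */
int bit_scan_forward_reference(uint64_t value) {
    if (value == 0) {
        return -1; /* bsf leaves its destination undefined here */
    }
    int index = 0;
    while ((value & 1) == 0) {
        value >>= 1; /* drop the lowest bit and try the next one */
        index += 1;
    }
    return index;
}
```

For example, an input of 12 (binary 1100) has its lowest set bit at index 2.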