Skip to content

Bools, Bool Arrays, and Bitfields

amirroth edited this page Sep 20, 2021 · 1 revision

char. int. long. unsigned. float. double. bool. One of these things is not like the others. Hint: it's bool. All the others are hardware data types. They have standard binary representations and the processor has instructions for loading, storing, and manipulating those representations. bool (conceptually a single bit) is a C++ data type. The processor has bitwise/boolean instructions like and, or, xor, not, etc. but they operate on 64-bit integers (or whatever the native integer width of the processor is), not on individual bits. There is no way to load or store an individual bit from or to memory. If that is what you want to do, you have to load or store the char or int that contains the bit and then do the bit manipulation on either end to make sure that you are only manipulating the intended bit.

With no fixed hardware representation, the representation of bool is implementation dependent. Most compilers will implement the a bool as a char (g++ does). If you type sizeof(bool) you will get 1 (byte). That may seem like a waste of seven bits, but for a single bool it's the best you can do.

Arrays of bools

That waste of space may start to seem a little more significant if you have an array of bool's, and so the standard template library (and EPVector and ObjexxFCL) implement arrays of bools as "packed" with each char holding eight bool's. This eliminates the space overhead, but unfortunately TANSTAAFL and the price for this space savings is additional operations (i.e., time) to read and write the individual bits. Here are the basic pseudo-instructions for reading a bit for element I from array BA into register R1 if each bool is represented as a char.

ADD BA, I -> R2
LOAD R2 -> R1

Here is the equivalent sequence if eight bool's are packed into a char.

SHIFT-RIGHT I, 3 -> R2 // find the character that contains the bit: shifting right by 3 is the faster equivalent of dividing by 8 
ADD BA, R2 -> R2
LOAD R2 -> R2
AND I, 7 -> R1 // find the bit within the character: and'ing with 7 is the faster equivalent of mod'ing by 8 
SHIFT-RIGHT R2, R1 -> R1
AND R1, 1 -> R1

Setting or resetting the bit takes a similar sequences of steps. The trade-off is essentially 8X in space savings for a 3X in instructions and time. Is this worth it? Maybe, maybe not. Is there a better way? There is, especially when you are dealing with multiple (arrays of) bool's as one often does. Consider this realistic example from DataSurfaces.hh:

Array1D<bool> SurfEMSConstructionOverrideON;
Array1D<bool> SurfEMSOverrideIntConvCoef;
Array1D<bool> SurfEMSOverrideExtConvCoef;
Array1D<bool> SurfOutDryBulbTempEMSOverrideOn;
Array1D<bool> SurfOutWetBulbTempEMSOverrideOn;
Array1D<bool> SurfWindSpeedEMSOverrideOn;
Array1D<bool> SurfWindDirEMSOverrideOn;
Array1D<bool> SurfViewFactorGroundEMSOverrideOn;

SurfWindSpeedEMSOverrideOn.allocate(NumOfSurfaces); // etc.

bool ThisSurfWindSpeedEMSOverride = SurfWindSpeedEMSOverrideOn(SurfNum); // etc.

Ignore the naming inconsistency. If these arrays are accessed in separate contexts, each access requires six instructions. If they are all accessed in the same context, some of the indexing calculations can be shared and so each array can be accessed using only two instructions once the two indices have been calculated. However, this is not likely to be the case.

The "bitfield" or "flags register" pattern

Here is an alternative implementation, which also saves space and has a much faster implementation. Instead of packing bool's corresponding to different surfaces into a single char, we pack bool's corresponding to different conditions for the same surface into a single char. This is called the "bitfield" or "flags register" pattern. Ready?

class enum SurfEMSOverrideType : int {
   Construction = 0,
   IntConvCoef,
   ExtConvCoef,
   OutDryBulbTemp,
   OutWebBulbTemp,
   WindSpeed,
   WindDir,
   ViewFactorGround
};

// These are generic bit manipulation functions, they are not specific to this example
constexpr char bitf(int b) { return 1 << b; } // b is the bit position
constexpr bool read_bool(char c, int b) { return (c & bitf(b)) != 0; }
constexpr void set_bool(char c, int b) { c |= bitf(b); }
constexpr void clear_bool(char c, int b) { c &= ~bitf(b); }

Array1D<char> SurfEMSOverrides;

SurfEMSOverrides.allocate(NumOfSurfaces); // only one array

bool ThisSurfWindSpeedEMSOverride = read_bool(SurfEMSOverrides[SurfNum], static_cast<int>(SurfEMSOverrideType::WindSpeed));

What does read_bool look like in terms of instructions?

ADD SurfEMSOverrides, SurfNum -> R1
LOAD R1 -> R1
AND R1, 32 -> R1 // Read the bit corresponding to WindSpeed, WindSpeed is bit 5, 32 is 1 << 5 which the compiler evaluated at compile time.
NOTZERO R1 -> R1

Hmmmm. Only four instructions rather than six. Why? Enumerations are compile time constants, and so because the WindSpeed bit is always in the same position in the char the compiler can partially evaluate the math for extracting that bit at compile time! The position of SurfNum is not a compile time constant and so the bit manipulation math behind has to be done at runtime.

This pattern also works well if you have a bunch of bool's in a struct/class. You can represent them all in a single char (or an int if you have more than eight).

The moral? If you find yourself using a bunch of bool's, either in arrays or even in a struct/class, consider using the bitfield/flag register pattern instead.

More bitfield tricks

While we are on the subject of bitfields, here is another fun trick.

enum class WinShadingType {
   IntShade,
   ExtShade, 
   SwitchableGlazing,
   ExtScreen,
   IntBlind,
   ExtBlind,
   BGBlind
};

This is a simplified version of the WinShadingType enumeration from EnergyPlus. What if you wanted to test the shading variable to see if any type of blind was present? One way to do this is:

if (shading == WinShadingType::IntBlind || 
    shading == WinShadingType::ExtBlind || 
    shading == WinShadingType::BGBlind)

Assuming shading is in register R1, this compiles to something like the following sequence of pseudo-instructions:

COMPARE-EQ R1, 4 -> R2 // 4 is the value of IntBlind
BRANCH-IF-TRUE R2, THEN-LABEL: // branch to the "if" block since the IntBlind comparison succeeded
COMPARE-EQ R1, 5 -> R2 // 5 is ExtBlind
BRANCH-IF-TRUE R2, THEN-LABEL: // branch to the "if" block since the ExtBlind comparison hit
COMPARE-EQ R1, 5 -> R2 // 6 is BGBlind
BRANCH-IF-TRUE R2, THEN-LABEL: // branch to the "if" block since the BGBlind comparison hit

That's six instructions including three "data-dependent" branches that the processor will have a hard time predicting correctly (here is a tutorial about data-dependent branches, what makes them hard and how to minimize them). Not great!

You could also do this:

if (shading >= WinShadingType::IntBlind && shading <= WinShadingType::BGBlind)

Which would compile to this:

COMPARE-GEQ R1, 4 -> R2 // 4 is the value of IntBlind
BRANCH-IF-FALSE R2, ELSE-LABEL: // branch to the "else" block since the IntBlind comparison failed
COMPARE-LEQ R1, 6 -> R2 // 6 is BGBlind
BRANCH-IF-FALSE R2, ELSE-LABEL: // branch to the "else" block since the BGBlind comparison failed

That's somewhat better, but doing arithmetic on enum values (i.e., relying on them to be specific values) is considered poor form by the cognescenti, i.e., StackOverflow.

Here's an even better solution however that is not frowned upon. Remember bitf? Check this out:

if (bitf(shading) & (bitf(WinShadingFlag::IntBlind) | bitf(WinShadingFlag::ExtBlind) | bitf(ShadingFlag::BGBlind)) != 0)

Which compiles to:

SHIFT-RIGHT 1, R1 -> R2 // represent shading as a bitfield
AND R2, 112 -> R2 // 112 in binary is 01110000, i.e., the 4th, 5th, and 6th bits corresponding to IntBlind, ExtBlind, and BGBlind are set
BRANCH-IF-TRUE R2, THEN-BLOCK: 

What just happened here? Basically we converted the enum values to bitfields in a single char, converted shading to a bitfield in the same manner, and then tested shading against all three possibilities at once using and. Oh, and since all of the enum values are compile time constants, the compiler was able to evaluate this entire expression (bitf(WinShadingFlag::IntBlind) | bitf(WinShadingFlag::ExtBlind) | bitf(ShadingFlag::BGBlind) to 112 at compile-time. Bang!

Clone this wiki locally