```cpp
  static_assert(width > 0, "Width must be positive");

 public:
  static constexpr int kBits = 8 * sizeof(T);
  static constexpr int kWidth = width;
  using Pack_t = ap_uint<kBits>;
  using Internal_t = ap_uint<width * kBits>;
```
This is of course necessary on a CPU, since it can only access memory at byte granularity.
However, on an FPGA this introduces significant storage inefficiency. I want to store an array of fixed-point data in the URAM of the FPGA, and I have some liberty to decide the bit width of a single data element. Since power-of-two shaped accesses work best, we want to choose a bit width such that a power-of-two number of elements fits into the 72 bits of one URAM word. This means we can choose 9, 18, 36 or 72 as the bit width for best efficiency in the URAM. In our case, we were able to make do with 9 bits.
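The arithmetic behind those four candidates can be checked at compile time. This is only a sketch: `kUramWordBits`, `BitsPerElement` and `FillsWord` are names made up for illustration, and the 72-bit word width is taken from the paragraph above.

```cpp
// Width of one URAM word in bits, as stated in the text above.
constexpr int kUramWordBits = 72;

// Whole bits available per element when `elements` values share one word.
constexpr int BitsPerElement(int elements) {
  return kUramWordBits / elements;
}

// True if `elements` values of `bits` bits fill the word with no padding.
constexpr bool FillsWord(int elements, int bits) {
  return elements * bits == kUramWordBits;
}

static_assert(BitsPerElement(8) == 9 && FillsWord(8, 9), "8 x 9 bits");
static_assert(BitsPerElement(4) == 18 && FillsWord(4, 18), "4 x 18 bits");
static_assert(BitsPerElement(2) == 36 && FillsWord(2, 36), "2 x 36 bits");
static_assert(BitsPerElement(1) == 72 && FillsWord(1, 72), "1 x 72 bits");
// 16 elements would get only 4 whole bits each, leaving 8 bits unused.
static_assert(BitsPerElement(16) == 4 && !FillsWord(16, 4),
              "16 does not divide 72");
```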
My first attempt was therefore something like the following:
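A minimal sketch of such an attempt and of the width bookkeeping behind it, presumably equivalent to declaring a `DataPack<ap_fixed<9, 1>, 8>`. `Fixed9`, `NaivePack` and `UramWords` are hypothetical stand-ins so that the snippet compiles without the Xilinx headers; `NaivePack` mirrors only the `kBits` computation of `DataPack`, not the real class.

```cpp
#include <cstdint>

// Hypothetical stand-in for ap_fixed<9, 1>: 9 significant bits, but the
// C++ object occupies 2 bytes because sizeof counts whole bytes.
struct Fixed9 {
  static constexpr int width = 9;
  std::uint16_t raw;
};

// Hypothetical stand-in that mirrors only DataPack's width computation.
template <typename T, int width>
struct NaivePack {
  static_assert(width > 0, "Width must be positive");
  static constexpr int kBits = 8 * sizeof(T);          // 16 for Fixed9
  static constexpr int kInternalBits = width * kBits;  // 128 for 8 elements
};

// Number of 72-bit URAM words needed to hold `bits` bits, rounded up.
constexpr int UramWords(int bits) { return (bits + 71) / 72; }

using FirstTry = NaivePack<Fixed9, 8>;
static_assert(FirstTry::kBits == 16, "sizeof-based width per element");
static_assert(FirstTry::kInternalBits == 128, "internal word is 128 bits");
static_assert(UramWords(FirstTry::kInternalBits) == 2, "two URAM words");
static_assert(UramWords(8 * Fixed9::width) == 1, "tight packing needs one");
```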
However, due to line 31 in the code I cited above, the DataPack internally uses an `ap_uint<128>` to store the 8 fixed-point numbers, not an `ap_uint<72>`.
Since 128 bits span two 72-bit URAM words instead of one, I need twice as much URAM for this type of array.
To solve this for my case, I made my own tiny class (taking the important parts from your DataPack):
```cpp
template <int width>
class TightPack {
 public:
  using Data_t = ap_fixed<9, 1>;
  static constexpr int kWidth = width;
#if defined(HLSLIB_SYNTHESIS)
  static constexpr int kBits = Data_t::width;
#else
  static constexpr int kBits = 8 * sizeof(Data_t);
#endif

 private:
  ap_uint<width * kBits> data_;
  using Pack_t = ap_uint<kBits>;

 public:
  Data_t Get(int i) const {
    // Just like regular DataPack
  }
  void Set(int i, Data_t val) {
    // Just like regular DataPack
  }
};
```
The only difference (aside from a lot of missing functionality) is the calculation of `kBits`, which I made depend on whether we are in synthesis or not.
It is also specialized for `ap_fixed`.
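To illustrate what tight packing means for element access, here is a small software model. It is a sketch under assumptions, not the hlslib implementation: `TightWord` and `RoundTripOk` are hypothetical names, `std::bitset` stands in for `ap_uint`, and fields are handled as raw unsigned values of at most 32 bits rather than as `ap_fixed`.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical software model of a tightly packed word: `width` fields of
// `bits` bits each, stored back to back with no byte padding.
template <int bits, int width>
class TightWord {
 public:
  static constexpr int kBits = bits;
  static constexpr int kWidth = width;
  static constexpr int TotalBits() { return bits * width; }

  // Read field i as a raw unsigned value.
  std::uint32_t Get(int i) const {
    assert(i >= 0 && i < kWidth);
    std::uint32_t value = 0;
    for (int b = 0; b < kBits; ++b) {
      if (data_[i * kBits + b]) value |= (std::uint32_t{1} << b);
    }
    return value;
  }

  // Write the low kBits bits of val into field i.
  void Set(int i, std::uint32_t val) {
    assert(i >= 0 && i < kWidth);
    for (int b = 0; b < kBits; ++b) {
      data_[i * kBits + b] = ((val >> b) & 1u) != 0;
    }
  }

 private:
  std::bitset<bits * width> data_;  // 72 bits for bits = 9, width = 8
};

// Fills every field with a distinct 9-bit pattern and reads it back.
inline bool RoundTripOk() {
  TightWord<9, 8> word;
  for (int i = 0; i < 8; ++i) {
    word.Set(i, 0x155u ^ static_cast<std::uint32_t>(i));
  }
  for (int i = 0; i < 8; ++i) {
    if (word.Get(i) != (0x155u ^ static_cast<std::uint32_t>(i))) return false;
  }
  return true;
}
```

Neighbouring fields do not disturb each other even though they no longer start on byte boundaries, which is the property the tight layout relies on.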
Ideally, I would like to make a small template specialization of your DataPack class instead, as in
```cpp
template <int W, int I, int width>
class DataPack<ap_fixed<W, I>, width> {
  // Only change kBits implementation
};
```
However, this would mean I would need to copy the entire DataPack class content in there, defeating the purpose of specialization.
Maybe putting a helper class like the following in between might improve this:
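A minimal sketch of what such a helper could look like. The name `WidthCalculator` is the one used in this issue; its shape as a traits struct with a `value` member is an assumption, and the default simply reproduces the current `8 * sizeof(T)` behaviour.

```cpp
// Hypothetical helper: reports how many bits one element of type T occupies
// inside the pack. The default matches the sizeof-based computation.
template <typename T>
struct WidthCalculator {
  static constexpr int value = 8 * sizeof(T);
};

static_assert(WidthCalculator<char>::value == 8, "one byte");
static_assert(WidthCalculator<short>::value == 8 * sizeof(short),
              "byte-padded width");
```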
In `DataPack`, the data is stored in an `ap_uint` of bit width `8 * <bytes in T>` (hlslib/include/hlslib/xilinx/DataPack.h, lines 25 to 34 in 54b23af).
`DataPack` could use this class to calculate `kBits` in line 31. Then I could subclass the `WidthCalculator` to my liking, for example for `ap_fixed`, and use `DataPack<ap_fixed<9, 1>, 8>` for optimal storage in URAM (and BRAM for that matter).

Actually, might this specialization (and similar ones for the other fixed-width types in HLS) be important enough to include in the library itself?
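Putting the pieces together, the proposal could be sketched as follows. Everything here is hypothetical: `FixedStandIn` replaces `ap_fixed` so the snippet compiles without the Xilinx headers, `PackWidths` contains only the width bookkeeping of a DataPack-like class, and the override is written as a template specialization rather than a subclass. The default `WidthCalculator` is repeated so that the block compiles on its own.

```cpp
#include <cstdint>

// Hypothetical default, matching the current sizeof-based computation.
template <typename T>
struct WidthCalculator {
  static constexpr int value = 8 * sizeof(T);
};

// Hypothetical stand-in for ap_fixed<W, I>; only the static width matters.
template <int W, int I>
struct FixedStandIn {
  static constexpr int width = W;
  std::uint16_t raw;  // byte-padded storage, 16 bits regardless of W
};

// Specialization that reports the true bit width instead of the padded one.
template <int W, int I>
struct WidthCalculator<FixedStandIn<W, I>> {
  static constexpr int value = W;
};

// Only the width bookkeeping of a DataPack-like class, not the real one.
template <typename T, int width>
struct PackWidths {
  static_assert(width > 0, "Width must be positive");
  static constexpr int kBits = WidthCalculator<T>::value;
  static constexpr int kWidth = width;
  static constexpr int kInternalBits = width * kBits;
};

// Fixed-point elements are packed tightly: 8 x 9 = 72 bits.
static_assert(PackWidths<FixedStandIn<9, 1>, 8>::kInternalBits == 72,
              "one URAM word");
// Other types keep the existing byte-based behaviour: 8 x 16 = 128 bits.
static_assert(PackWidths<std::uint16_t, 8>::kInternalBits == 128,
              "unchanged default");
```

With this split, types without a specialization keep their current layout, so existing users of the pack would not be affected.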