Thanks for your kind reply. I finally found the problem. I used Maple to obtain the coefficients C2, C1, and C0 (the same approach as in this paper: https://ieeexplore.ieee.org/abstract/document/1388195). The code is listed below:
numapprox:-minimax(tanh(interval_start+interval_width*i/interval_num+x), x = 0 .. interval_width/interval_num, [2, 0], 1, 'err'),
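For readers without Maple, the per-entry fitting step can be sketched in Python. This is only an illustrative stand-in: `np.polyfit` does a least-squares fit, which minimizes L2 error rather than the true minimax (Chebyshev) error that `numapprox:-minimax` computes, but it produces coefficients of the same shape. The names `fit_entry`, `interval_start`, and `interval_width` values are assumptions for the sketch.

```python
import numpy as np

# Assumed parameters mirroring the Maple call; the real values come
# from the generator script, not from this sketch.
interval_start = 0.0
interval_width = 1.0
interval_num = 128   # number of LUT entries, as stated in the comment

def fit_entry(i, samples=64):
    """Degree-2 fit of tanh on the i-th sub-interval.

    Least squares here, not true minimax: it minimizes the L2 error
    instead of the max error, but yields (c0, c1, c2) for the same
    evaluation form c0 + c1*x + c2*x*x.
    """
    h = interval_width / interval_num
    x = np.linspace(0.0, h, samples)
    y = np.tanh(interval_start + i * h + x)
    c2, c1, c0 = np.polyfit(x, y, 2)  # polyfit returns highest degree first
    return c0, c1, c2

# Fit the first entry and measure the residual on a dense grid.
c0, c1, c2 = fit_entry(0)
h = interval_width / interval_num
xs = np.linspace(0.0, h, 1000)
err = np.max(np.abs((c0 + c1 * xs + c2 * xs**2) - np.tanh(xs)))
```

Because each sub-interval is only `interval_width / 128` wide, a degree-2 fit of the smooth tanh curve leaves a very small residual per entry, which is consistent with the ~1 ulp total error mentioned below.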
where interval_num is 128 and i is the index of the i-th LUT entry. The key point is that the x used in c0 + c1 * x + c2 * x * x should be the least-significant 16 bits of the mantissa, shifted right by (127 - s_in_exponent). The code should be:
assign s_X2_for_tanh = {1'b0,s_in_mantiza[15:0]} >> (127 - s_in_exponent);
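The same bit extraction can be checked in software. Below is a minimal Python sketch of what the RTL line does, assuming the input is an IEEE 754 single-precision (float32) value with a biased exponent no larger than 127; the function name `tanh_arg_fixed_point` is made up for illustration.

```python
import struct

def tanh_arg_fixed_point(value):
    """Software model of the RTL extraction of x for c0 + c1*x + c2*x*x.

    Mirrors (assuming float32 / IEEE 754 single layout):
        s_X2 = {1'b0, s_in_mantiza[15:0]} >> (127 - s_in_exponent)
    The low 16 mantissa bits are taken, then shifted right so inputs
    with a smaller biased exponent contribute fewer fraction bits.
    """
    bits = struct.unpack('<I', struct.pack('<f', value))[0]
    exponent = (bits >> 23) & 0xFF       # biased exponent field
    mantissa_low16 = bits & 0xFFFF       # least-significant 16 mantissa bits
    shift = 127 - exponent               # assumes exponent <= 127 in the RTL
    return mantissa_low16 >> shift if shift >= 0 else mantissa_low16 << -shift
```

For example, 1.0 + 2**-10 has biased exponent 127 and mantissa field 2**13, so the extraction returns 8192 unshifted, while 0.5 + 2**-11 has the same mantissa field but biased exponent 126, so it is shifted right by one.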
It is a tricky point of the IEEE 754 format. This story also tells us that the hardware must be co-designed with the software!
Now the problem is that the error is too small: it is about 1 ulp, while the error of the SFU in Nvidia's GPU is roughly 130 ulp. We need to match Nvidia's error in order to accurately run inference on a model that was trained on Nvidia GPUs with that larger error. So we plan to reduce the number of entries in the generated LUTs to enlarge the error. But with fewer LUT index bits, the bit width of uFusedAccTree must be larger. As I am not familiar with the adder, could you please give more advice on how to enlarge the input bit width of the adder?
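To verify whether a reduced-entry LUT actually lands near the ~130 ulp target, the error can be measured in float32 ulps against a double-precision reference. This is a sketch with made-up helper names (`f32`, `ulp32`, `max_ulp_error`); it assumes positive, finite float32 values.

```python
import math
import struct

def f32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def ulp32(x):
    """Size of one ulp at a positive, finite float32 value x."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    nxt = struct.unpack('<f', struct.pack('<I', bits + 1))[0]
    return nxt - f32(x)

def max_ulp_error(approx_fn, ref_fn, xs):
    """Worst-case error of approx_fn against ref_fn, in float32 ulps."""
    worst = 0.0
    for x in xs:
        ref = f32(ref_fn(x))
        worst = max(worst, abs(f32(approx_fn(x)) - ref) / ulp32(ref))
    return worst

# Example: a crude truncated-Taylor tanh has a huge ulp error,
# while the reference against itself measures exactly zero.
xs = [0.25, 0.5, 1.0]
exact_err = max_ulp_error(math.tanh, math.tanh, xs)
crude_err = max_ulp_error(lambda x: x - x**3 / 3, math.tanh, xs)
```

Sweeping such a measurement over the whole input range, for each candidate LUT size, would show how many index bits can be dropped before the error overshoots the Nvidia-like 130 ulp budget.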
Thanks in advance!