Thanks for your kind reply. I finally found the problem. I used Maple to obtain the coefficients C2, C1, and C0 (the same approach as in this paper: https://ieeexplore.ieee.org/abstract/document/1388195). The code is listed below:
numapprox:-minimax(tanh(interval_start+interval_width*i/interval_num+x), x = 0 .. interval_width/interval_num, [2, 0], 1, 'err'),
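For readers without Maple, the per-entry fitting step can be sketched in Python. This is only an illustrative stand-in: `np.polyfit` does a least-squares fit, which minimizes L2 error rather than the true minimax (Chebyshev) error that `numapprox:-minimax` computes, but it produces coefficients of the same shape. The names `fit_entry`, `interval_start`, and `interval_width` values are assumptions for the sketch.

```python
import numpy as np

# Assumed parameters mirroring the Maple call; the real values come
# from the generator script, not from this sketch.
interval_start = 0.0
interval_width = 1.0
interval_num = 128   # number of LUT entries, as stated in the comment

def fit_entry(i, samples=64):
    """Degree-2 fit of tanh on the i-th sub-interval.

    Least squares here, not true minimax: it minimizes the L2 error
    instead of the max error, but yields (c0, c1, c2) for the same
    evaluation form c0 + c1*x + c2*x*x.
    """
    h = interval_width / interval_num
    x = np.linspace(0.0, h, samples)
    y = np.tanh(interval_start + i * h + x)
    c2, c1, c0 = np.polyfit(x, y, 2)  # polyfit returns highest degree first
    return c0, c1, c2

# Fit the first entry and measure the residual on a dense grid.
c0, c1, c2 = fit_entry(0)
h = interval_width / interval_num
xs = np.linspace(0.0, h, 1000)
err = np.max(np.abs((c0 + c1 * xs + c2 * xs**2) - np.tanh(xs)))
```

Because each sub-interval is only `interval_width / 128` wide, a degree-2 fit of the smooth tanh curve leaves a very small residual per entry, which is consistent with the ~1 ulp total error mentioned below.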
where interval_num is 128 and i is the index of the i-th LUT entry. The key point is that the x used in c0 + c1 * x + c2 * x * x should be the least-significant 16 bits of the mantissa, shifted right by (127 - s_in_exponent). The code should be:
assign s_X2_for_tanh = {1'b0,s_in_mantiza[15:0]} >> (127 - s_in_exponent);
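The same bit extraction can be checked in software. Below is a minimal Python sketch of what the RTL line does, assuming the input is an IEEE 754 single-precision (float32) value with a biased exponent no larger than 127; the function name `tanh_arg_fixed_point` is made up for illustration.

```python
import struct

def tanh_arg_fixed_point(value):
    """Software model of the RTL extraction of x for c0 + c1*x + c2*x*x.

    Mirrors (assuming float32 / IEEE 754 single layout):
        s_X2 = {1'b0, s_in_mantiza[15:0]} >> (127 - s_in_exponent)
    The low 16 mantissa bits are taken, then shifted right so inputs
    with a smaller biased exponent contribute fewer fraction bits.
    """
    bits = struct.unpack('<I', struct.pack('<f', value))[0]
    exponent = (bits >> 23) & 0xFF       # biased exponent field
    mantissa_low16 = bits & 0xFFFF       # least-significant 16 mantissa bits
    shift = 127 - exponent               # assumes exponent <= 127 in the RTL
    return mantissa_low16 >> shift if shift >= 0 else mantissa_low16 << -shift
```

For example, 1.0 + 2**-10 has biased exponent 127 and mantissa field 2**13, so the extraction returns 8192 unshifted, while 0.5 + 2**-11 has the same mantissa field but biased exponent 126, so it is shifted right by one.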
It is a tricky point of the IEEE 754 format. This story also tells us that the hardware must be co-designed with the software!
Now the problem is that the error is too small: it is about 1 ulp, while the error of the SFU in Nvidia's GPU is roughly 130 ulp. We need to match Nvidia's error in order to accurately run inference on a model that was trained on Nvidia GPUs with that larger error. So we plan to reduce the number of entries in the generated LUTs to enlarge the error. But with fewer LUT index bits, the bit width of uFusedAccTree must be larger. As I am not familiar with the adder, could you please give more advice on how to enlarge the input bit width of the adder?
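To verify whether a reduced-entry LUT actually lands near the ~130 ulp target, the error can be measured in float32 ulps against a double-precision reference. This is a sketch with made-up helper names (`f32`, `ulp32`, `max_ulp_error`); it assumes positive, finite float32 values.

```python
import math
import struct

def f32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def ulp32(x):
    """Size of one ulp at a positive, finite float32 value x."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    nxt = struct.unpack('<f', struct.pack('<I', bits + 1))[0]
    return nxt - f32(x)

def max_ulp_error(approx_fn, ref_fn, xs):
    """Worst-case error of approx_fn against ref_fn, in float32 ulps."""
    worst = 0.0
    for x in xs:
        ref = f32(ref_fn(x))
        worst = max(worst, abs(f32(approx_fn(x)) - ref) / ulp32(ref))
    return worst

# Example: a crude truncated-Taylor tanh has a huge ulp error,
# while the reference against itself measures exactly zero.
xs = [0.25, 0.5, 1.0]
exact_err = max_ulp_error(math.tanh, math.tanh, xs)
crude_err = max_ulp_error(lambda x: x - x**3 / 3, math.tanh, xs)
```

Sweeping such a measurement over the whole input range, for each candidate LUT size, would show how many index bits can be dropped before the error overshoots the Nvidia-like 130 ulp budget.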
Thanks in advance!