extend FusedAccTree #6

Open
xiaolingyi opened this issue Jul 28, 2023 · 0 comments


Thanks for your kind reply. I finally found the problem. I used Maple to obtain the coefficients C2, C1, and C0 (the same approach as in this paper: https://ieeexplore.ieee.org/abstract/document/1388195). The code is listed below:

numapprox:-minimax(tanh(interval_start+interval_width*i/interval_num+x), x = 0 .. interval_width/interval_num, [2, 0], 1, 'err'),

where interval_num is 128 and i is the index of the LUT entry. The key point is that the x used in c0 + c1 * x + c2 * x * x must be the least significant 16 bits of the mantissa, right-shifted by (127 - s_in_exponent). The code should be:

assign s_X2_for_tanh = {1'b0,s_in_mantiza[15:0]} >> (127 - s_in_exponent);
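A bit-level software model can help check this alignment against the RTL. The Python sketch below mirrors only the single assign statement above. It assumes the input is an IEEE 754 single-precision value with an 8-bit biased exponent and a 23-bit mantissa. The function names are hypothetical, and rejecting exponents above 127 is an assumption about the intended input range, not something stated in this thread.

```python
import struct


def float32_fields(value):
    """Split a value, rounded to float32, into (sign, biased exponent, mantissa)."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent
    mantissa = bits & 0x7FFFFF       # 23-bit fraction field
    return sign, exponent, mantissa


def x2_for_tanh(value):
    """Model of {1'b0, mantissa[15:0]} >> (127 - exponent)."""
    _, exponent, mantissa = float32_fields(value)
    low16 = mantissa & 0xFFFF        # least significant 16 mantissa bits
    shift = 127 - exponent
    if shift < 0:
        # Assumption: inputs with exponent > 127 (|value| >= 2.0) are
        # handled elsewhere, so the model rejects them here.
        raise ValueError("exponent above 127 is outside the modelled range")
    return low16 >> shift


def from_bits(bits):
    """Helper for experiments: build a float from a raw 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

For example, `x2_for_tanh(from_bits(0x3F00FFFF))` takes an input with biased exponent 126, so the 16 low mantissa bits are shifted right by one position before they are used as x.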

This alignment is a subtle point of the IEEE 754 format. The lesson is that the hardware must be co-designed with the software!
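For readers without Maple, the per-entry coefficient generation can be approximated in plain Python. Note that this sketch swaps in a different technique: it interpolates at three Chebyshev nodes per interval, which is close to but not the same as the true minimax fit that `numapprox:-minimax` computes. The values 0.0 and 1.0 used for interval_start and interval_width are placeholders, since the thread does not give them, and all function names are hypothetical.

```python
import math


def quadratic_coeffs(func, start, width):
    """Return (c0, c1, c2) so that c0 + c1*x + c2*x*x approximates
    func(start + x) for x in [0, width].

    Uses interpolation at the 3 Chebyshev nodes of the interval,
    a near-minimax stand-in, not an exact minimax fit.
    """
    half = width / 2.0
    nodes = [half * (1.0 + math.cos((2 * k + 1) * math.pi / 6.0)) for k in range(3)]
    x0, x1, x2 = nodes
    y0, y1, y2 = (func(start + x) for x in nodes)
    # Newton divided differences, then expansion into monomial form.
    d01 = (y1 - y0) / (x1 - x0)
    d12 = (y2 - y1) / (x2 - x1)
    d012 = (d12 - d01) / (x2 - x0)
    c2 = d012
    c1 = d01 - d012 * (x0 + x1)
    c0 = y0 - d01 * x0 + d012 * x0 * x1
    return c0, c1, c2


def build_lut(func, interval_start, interval_width, interval_num):
    """One (c0, c1, c2) triple per LUT entry i, as in the Maple call."""
    step = interval_width / interval_num
    return [quadratic_coeffs(func, interval_start + step * i, step)
            for i in range(interval_num)]


def max_abs_error(func, lut, interval_start, interval_width, samples_per_entry=16):
    """Largest sampled absolute error of the piecewise quadratic."""
    step = interval_width / len(lut)
    worst = 0.0
    for i, (c0, c1, c2) in enumerate(lut):
        for s in range(samples_per_entry + 1):
            x = step * s / samples_per_entry
            approx = c0 + c1 * x + c2 * x * x
            exact = func(interval_start + step * i + x)
            worst = max(worst, abs(approx - exact))
    return worst


# Placeholder range [0, 1) with 128 entries, matching interval_num above.
lut = build_lut(math.tanh, 0.0, 1.0, 128)
```

Calling `max_abs_error` on tables built with different `interval_num` values gives a quick view of how the approximation error grows as the table shrinks, before any fixed-point quantization is applied.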

Now the problem is that the error is too small, at about 1 ulp. The error of the SFU in NVIDIA's GPUs is roughly 130 ulp. We need to match NVIDIA's error in order to run accurate inference on a model that was trained on NVIDIA GPUs with that larger error. So we plan to reduce the number of entries in the generated LUTs to enlarge the error. But with fewer LUT index bits, the input bit width of uFusedAccTree must be larger. As I am not familiar with the adder, could you please advise on how to enlarge the input bit width of the adder?
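When tuning the LUT size toward a target error, it helps to measure the error directly in ulps. The sketch below counts how many representable single-precision values lie between two results. It assumes both results are finite float32 values (NaN is not handled), and the function names are hypothetical.

```python
import struct


def float32_ordered(value):
    """Map a value, rounded to float32, to an integer that increases
    monotonically with the float, so integer differences count ulps."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    magnitude = bits & 0x7FFFFFFF
    # float32 is sign-magnitude: negate the magnitude for negative values.
    return -magnitude if bits & 0x80000000 else magnitude


def ulp_distance(a, b):
    """Number of float32 steps between a and b."""
    return abs(float32_ordered(a) - float32_ordered(b))


def float32_from_bits(bits):
    """Helper for experiments: build a float from a raw 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

Comparing a hardware result against a reference with `ulp_distance(hw_result, reference)` over a sweep of inputs gives the worst-case ulp error for a given LUT size.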

Thanks in advance!
