Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tight packing on hardware for DataPack #44

Merged
merged 13 commits into from
Sep 16, 2021

Conversation

derpda
Copy link
Contributor

@derpda derpda commented Sep 2, 2021

This pull request attempts to solve #43 by implementing tight packing for the DataPack class.
Mainly this is done through a small proxy template class that can easily be specialized for any datatype who's bit width is not a multiple of 8.

I also added a few sample specializations for the ap_uint and the ap_fixed types, but they are not ideal yet by any means.

One problem that this may introduce is that the code will not be equivalent between CPU and FPGA, and thus testing is difficult (impossible?) without actually running tests on an FPGA.

@derpda
Copy link
Contributor Author

derpda commented Sep 2, 2021

Thinking about this a bit more, maybe the reason why it wasn't tightly packed even on hardware was so that it can easily be used for memory access from external DDR?
I don't think I understand this well enough yet...

@definelicht
Copy link
Owner

Hey! I think your solution with the width calculator helper class is the right approach.

Did you experience that it does not work in simulation? I can't immediately see why it shouldn't. The reinterpret_cast in the Get and Set methods could be a little fishy when dealing with non-byte aligned types, so we might need to specialize this type conversion for these types -- but I can't tell without testing it.

Could you add some tests that use DataPack of ap_uint and ap_fixed, then we can take it from there?

@derpda
Copy link
Contributor Author

derpda commented Sep 2, 2021

I have tried a pure software run and a hardware build, both were successful.
I currently don't use hardware emulation in my workflow since I need to use a large amount of data. Just building the hardware ends up being faster and more useful.
I'll try to make a small test for the hw_emu case!

@derpda
Copy link
Contributor Author

derpda commented Sep 2, 2021

Thinking about it a little more, the ap_uint type used internally is implemented correctly on CPUs (because of course it is, otherwise sw_emu would not work), and will also work correctly for hw_emu. Since that type and it's functions are the only way to actually access the data, we should be able to rely on this guarantee by Xilinx.
I thus removed the #if defined(HLSLIB_SYNTHESIS) checks and now always use the non-byte-multiple bit width as kWidth.
It works fine as far as I can tell.

I have taken the liberty to update catch so that i can use TEMPLATE_TEST_CASE to just use the same DataPack tests that are already there, just with different value types.
We can roll this back and find another way if you don't want to update catch.

Now I am running some trouble. I get segfaults on all tests that use non-const DataPacks... Can't quite tell why.
I know that the operator[] is different for non-const packs, returning a proxy instead, but I don't know how that would cause the issue.
Furthermore, using .Get directly, thus avoiding the proxy, does not fix the problem.

@derpda
Copy link
Contributor Author

derpda commented Sep 2, 2021

Some further checking shows that this may be down to faulty type resolution.
Not sure why this happens, and due to the templates around the code my IDE fails to resolve the function at all.
The whole issue might also just be related to the catch2 macro TEMPLATE_TEST_CASE and how it provides the TestType.

I have found a (somewhat ugly) workaround for this in the last commit.
All tests are passing now

@derpda
Copy link
Contributor Author

derpda commented Sep 3, 2021

So exactly as you were guessing, the reinterpret_cast did cause problems.
I changed the name of the WidthCalculator to TypeHandler for now, and have it also contain functions to translate from and to ap_uint<width>.
Sadly reinterpret_cast cannot be changed/specialized, so I had to go for this route.

As far as I can tell, everything is working now: Tests are passing in the library, and the code produced is correct for my application case.

@derpda derpda marked this pull request as ready for review September 3, 2021 08:13
xilinx_test/test/TestDataPack.cpp Outdated Show resolved Hide resolved
xilinx_test/test/TestDataPack.cpp Outdated Show resolved Hide resolved
xilinx_test/test/TestDataPack.cpp Outdated Show resolved Hide resolved
include/hlslib/xilinx/DataPack.h Outdated Show resolved Hide resolved
include/hlslib/xilinx/DataPack.h Outdated Show resolved Hide resolved
@definelicht
Copy link
Owner

My PhD dissertation is due on Wednesday, so I might respond a bit slowly over the next week :-)

Again, I like your solution with specializing the bit reinterpretation -- well done. I did a first review and had some comments.

@derpda
Copy link
Contributor Author

derpda commented Sep 3, 2021

Ok I think I've fixed all the issues, thanks for your feedback!

So you're almost done with your PhD work, awesome!
I ended up quitting my PhD recently to start working more on FPGAs instead of supercomputers...
Don't worry about answering late, I'm using my own fork for my project in the meantime.

@derpda
Copy link
Contributor Author

derpda commented Sep 9, 2021

Actually there may still be issues with the PR on hardware, my code seems to produce wrong results.
It might still be due to other factors, but this needs a bit more testing I think.

Hope your Thesis submission went well, I hope you can take some time off and enjoy the outdoors before autumn starts!

@definelicht
Copy link
Owner

Hey, I'm out on the other side now :-)

Ok -- let me know when you're confident that it works!

include/hlslib/xilinx/DataPack.h Outdated Show resolved Hide resolved
include/hlslib/xilinx/DataPack.h Outdated Show resolved Hide resolved
include/hlslib/xilinx/DataPack.h Outdated Show resolved Hide resolved
include/hlslib/xilinx/DataPack.h Outdated Show resolved Hide resolved
include/hlslib/xilinx/DataPack.h Outdated Show resolved Hide resolved
include/hlslib/xilinx/DataPack.h Outdated Show resolved Hide resolved
include/hlslib/xilinx/DataPack.h Outdated Show resolved Hide resolved
include/hlslib/xilinx/DataPack.h Outdated Show resolved Hide resolved
include/hlslib/xilinx/DataPack.h Outdated Show resolved Hide resolved
include/hlslib/xilinx/DataPack.h Outdated Show resolved Hide resolved
@definelicht
Copy link
Owner

We should also have a test that runs this through HLS!

@derpda
Copy link
Contributor Author

derpda commented Sep 10, 2021

I can see that in normal C++ code passing by reference would be preferable, but I have been wondering for a while how stuff like that translates to hardware.
My assumption was that it ends up being the same if its a pure input argument (ie no modification to it), but I never tested this.

I have quite a lot on my plate right now, so if possible I would like some help with the HLS tests!

@definelicht
Copy link
Owner

I can see that in normal C++ code passing by reference would be preferable, but I have been wondering for a while how stuff like that translates to hardware.
My assumption was that it ends up being the same if its a pure input argument (ie no modification to it), but I never tested this.

It will make literally no difference in hardware, and in most cases not in software either (any data type of a size <= to a pointer will not be more expensive to copy than the pointer). It's mostly a convention thing :-)

I have quite a lot on my plate right now, so if possible I would like some help with the HLS tests!

Ok, we can merge it into a branch

@definelicht definelicht changed the base branch from master to tight-packing September 13, 2021 09:19
@definelicht
Copy link
Owner

Could you remove the update of catch.hpp in this PR? I would like to handle this separately. Then I will merge this into a branch.

@derpda
Copy link
Contributor Author

derpda commented Sep 13, 2021

Sure!
Should I remove the tests I wrote or copy the whole testing code a few times for the different types?

@definelicht
Copy link
Owner

There's nothing wrong with the existing tests, I just want to add tests in synthesis also

@derpda
Copy link
Contributor Author

derpda commented Sep 13, 2021

I only updated the version because types tests were not available in 1.x. the catch macro I use for the typed tests was introduced in 2.x, so reverting to the original version means I either need to delete the tests or copy the entire code (or slide some other trick maybe?) to implement it for other types.

@definelicht
Copy link
Owner

I already updated master, so just merge/rebase and you should have the newest release of catch

@derpda
Copy link
Contributor Author

derpda commented Sep 15, 2021

Sorry it took me so long, this branch is now rebased onto master!

@definelicht definelicht changed the base branch from tight-packing to master September 16, 2021 12:01
@definelicht definelicht changed the base branch from master to tight-packing September 16, 2021 12:01
@definelicht definelicht merged commit 13d65e3 into definelicht:tight-packing Sep 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants