External weights vivado accelerator #646

Draft
wants to merge 5 commits into main

Conversation

GiuseppeDiGuglielmo (Contributor)

Extend the VivadoAccelerator backend to support programmable weights. The ideal setup is a Xilinx Zynq/ZynqMP board (ARM core + programmable logic).

The backend generates code for Vivado HLS, Vivado, and Vivado SDK:

  • the hls4ml IP with AXI master interfaces to move input features, output predictions, and weights between off-chip RAM and the FPGA
  • a complete Vivado project that integrates the hls4ml IP (for the target board/chip)
  • a complete bare-metal application to control and program the accelerator

Type of change

Some changes to the signature of the function convert_from_keras_model():

    hls_model = hls4ml.converters.convert_from_keras_model(
            model=model,
            clock_period=CLOCK_PERIOD,
            backend='VivadoAccelerator',
            board=BOARD_NAME,
            part=FPGA_PART,
            io_type='io_stream',
            interface='axi_master',
            driver='c',
            input_data_tb=DATA_DIR+'/X_test.npy',
            output_data_tb=DATA_DIR+'/y_test.npy',
            hls_config=config,
            output_dir=OUTPUT_DIR)

and a new function write_header_file() to write a header file with a hardcoded dataset:

    hls4ml.writer.vivado_accelerator_writer.VivadoAcceleratorWriter.write_header_file(
            hls_model, X_test, y_test, y_qkeras, y_hls, 64,
            OUTPUT_DIR + '/sdk/common/data.h')

Tests

You can test it with the example at this repo: https://github.com/GiuseppeDiGuglielmo/test-hls4ml-backend

Right now, we support Ultra96v2, but more Zynq/ZynqMP boards can be added.

jmitrevs and others added 4 commits August 17, 2022 17:37
The VivadoAccelerator backend creates a Vivado IP that can be easily integrated in a Zynq/ZynqMP setup. The input features, the output predictions, and the weights sit in main memory and are moved via AXI-Master interfaces.
@thesps (Contributor) left a comment:

Thanks for this, that's a nice feature! I have some general feedback here and some specific points that I'll address inline.

  • Can we decouple ‘reprogrammable weights’ from ‘AXI Master interface’? I’d like to be able to do both:
    — AXI Master interface without reprogrammable weights
    — AXI Stream interface with reprogrammable weights (maybe only the NN in/out on AXI Stream interface while weights use AXI Master?)
  • Each weight of the model will have its own top-level interface/port. How about a mode with a single 'weights' port whose type can be configured (e.g. some ap_fixed, float, or double), with casting to the proper weight types in the HLS? This would be similar to what we already allow for inputs/outputs with VivadoAccelerator
  • Can we have a C driver with a function that takes the data as argument and returns the NN prediction, and another function to reprogram the weights? We could call those from the standalone example that puts the data in header files, but they would be helpful for anyone integrating an hls4ml NN in a real application
  • Can we have a Python driver for this? - I had some pynq Python for an AXI Master interface before, so I can help with this
  • Can we have the design for pynq-z2 as well? - I can help with this also

It might be easier to split these contributions into different PRs, roughly:

  • AXI Master interface
  • C Driver
  • Ultra96 support
  • Reprogrammable weights

Also, Xilinx SDK is a legacy tool, superseded by Vitis. Can we do it with Vitis instead?

Comment on lines +608 to +610
io_type = self.config.get_config_value('IOType')
interface = self.config.get_config_value('AcceleratorConfig')['Interface'] if self.config.get_config_value('AcceleratorConfig') else None
config_weights = (io_type == 'io_stream') and (interface == 'axi_master')
Contributor:

Two things:

  • I don't think IOType of io_stream and Interface of axi_master should be the trigger for reprogrammable weights; I think it would be better as a new configuration parameter (of the AcceleratorConfig). Then there would probably need to be an assert somewhere to allow only the combinations of those other parameters that have been implemented (i.e. right now it also depends on the board); see the sketch after this list
  • As written this can't go here in ModelGraph as it's backend specific
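
A minimal sketch of what such a dedicated trigger and validation could look like (the parameter name ReprogrammableWeights and the supported-combinations list are assumptions for illustration, not the actual hls4ml API):

    # Hypothetical: a dedicated AcceleratorConfig flag instead of inferring
    # reprogrammable weights from IOType + Interface.
    SUPPORTED_COMBINATIONS = [
        # (io_type, interface, board) tuples implemented so far (assumed)
        ('io_stream', 'axi_master', 'ultra96v2'),
    ]

    def validate_accelerator_config(accel_cfg, io_type):
        # Only validate when the flag is actually requested
        if not accel_cfg or not accel_cfg.get('ReprogrammableWeights', False):
            return
        combo = (io_type, accel_cfg.get('Interface'), accel_cfg.get('Board'))
        assert combo in SUPPORTED_COMBINATIONS, \
            'Reprogrammable weights not implemented for {}'.format(combo)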

Contributor Author:

I agree. This was mostly a placeholder, and I was going to ask what should be better.

As in the main thread, do we already have a matrix or list of the existing possible combinations (what goes with what)? I am not sure how to properly set up the configuration/trigger. Do you have suggestions?

Contributor Author:

I started putting together a document, which is far from final. @thesps and @jmitrevs should be able to edit. Not sure if that may help or if it is better to keep it to the comments here. A bit of both, I guess.

If I understand it right, the additional configuration parameters (to enable programmable weights) should be passed via the function create_initial_config for the VivadoAccelerator backend.

If you agree, then I would add:

  • configurable_weights
    • bool, optional
    • weights are configurable at run time and thus they are exposed to the wrapper interface
    • defaults to False
  • weight_type
    • dict, optional
    • a dictionary that specifies the data type for a set of layer weights (can be float or an ap_type); if the type is not specified for a layer, then it defaults to float
    • defaults to an empty dictionary, i.e. all of the weights are float
    • note: VivadoAcceleratorBackend will round the number of bits used to the next power-of-2 value.

Please let me know if that makes sense. A sketch of the intended call is below.
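
For illustration only, this is how the proposed parameters might look in a convert_from_keras_model() call (configurable_weights and weight_type are the proposed names from the list above, not an existing API; the layer name 'fc1' is hypothetical):

    hls_model = hls4ml.converters.convert_from_keras_model(
            model=model,
            backend='VivadoAccelerator',
            board=BOARD_NAME,
            part=FPGA_PART,
            io_type='io_stream',
            interface='axi_master',
            # proposed parameters, not yet implemented:
            configurable_weights=True,
            weight_type={'fc1': 'ap_fixed<16,6>'},  # unlisted layers default to float
            hls_config=config,
            output_dir=OUTPUT_DIR)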

Comment on lines +664 to +666
io_type = self.config.get_config_value('IOType')
interface = self.config.get_config_value('AcceleratorConfig')['Interface'] if self.config.get_config_value('AcceleratorConfig') else None
config_weights = (io_type == 'io_stream') and (interface == 'axi_master')
Contributor:

Same comment as above

@@ -6,7 +6,7 @@ def match(self, node):
     cast = False
     if isinstance(node, Activation):
         cast = node.get_input_variable().type.precision != node.get_output_variable().type.precision
-    return isinstance(node, Activation) and node.get_attr('activation') == 'linear' and not cast
+    return isinstance(node, Activation) and node.get_attr('activation') == 'linear' # and not cast
Contributor:

? I don't think this should be included in this PR

Contributor:

That was a quick and dirty hack to get some models to optimize better, but it really was meant only for that branch, where we had checked the correctness. I agree that it should not be in the PR.

Comment on lines -25 to +27
#include <stdlib.h>
#include <math.h>
#include <cstdlib>
#include <cmath>
#include <cfloat>
Contributor:

?

Contributor Author:

I had to include cfloat at a certain point, but I do not need it in the current version of the code.

While I was at it, I changed to cmath and cstdlib, which I usually include that way when working in C++. We can discard this change.

Comment on lines +12 to +14
rm -f $(DESIGN)_standalone/src/helloworld.c
cd $(DESIGN)_standalone/src && ln -s ../../common/main.c main.c
cd $(DESIGN)_standalone/src && ln -s ../../common/data.h data.h
Contributor:

I didn't try running anything yet, but it seems like these files might not have the same names in the driver & standalone example

Comment on lines +20 to +22
io_type = model.config.get_config_value('IOType')
interface = model.config.get_config_value('AcceleratorConfig')['Interface'] if model.config.get_config_value('AcceleratorConfig') else None
config_weights = (io_type == 'io_stream') and (interface == 'axi_master')
Contributor:

Same comment as before that I think a different trigger for reprogrammable weights is needed

newline = ''
for v in model.get_weight_variables():
    newline += indent + ', model_axi_t {name} [{shape}]\n'.format(name=v.name, shape=v.data_length)
newline += indent + ', char load_weights'
Contributor:

Why not bool for load_weights?

Contributor Author:

It came from a previous iteration. For cleaner/safer C++ code, bool is the right choice. In hardware it would become a single-bit register if it were internal to the module, while on an AXI interface you can only use 8-, 16-, 32-bit, etc. sized words.

def write_new_tar(self, model):
    os.remove(model.config.get_output_dir() + '.tar.gz')
    super(VivadoAcceleratorWriter, self).write_tar(model)

def write_standalone_app(self, model):
Contributor:

Maybe the standalone app should be put in the examples instead?

Contributor Author:

This function and write_header_file can either be part of the example or come with the hls4ml code base.

    f.close()
    fout.close()

def write_header_file(model, X, y, y_keras, y_hls, n_samples, filename='data.h'):
Contributor:

Related to the comment above - maybe this should be an example?

Contributor Author:

I am good with moving those functions to the examples, and later on, we can "promote" them to the hls4ml codebase if necessary.

@GiuseppeDiGuglielmo (Contributor Author):

@thesps thank you for all of the comments; they help a lot!

I will reply both here and to the other inline questions.

Can we decouple ‘reprogrammable weights’ from ‘AXI Master interface’? I’d like to be able to do both:
— AXI Master interface without reprogrammable weights
— AXI Stream interface with reprogrammable weights (maybe only the NN in/out on AXI Stream interface while weights use AXI Master?)

We need the greater flexibility that you suggest. I agree.

Do we have a matrix or list of the existing possible combinations (what goes with what)?

We may also define how to pass this configuration information to the function convert_from_keras_model(). I am unsure if we should keep adding function parameters or if some nested struct/dictionary would be better. We should discuss this as a very first step, since you have a better global and local view of the current status.

Each weight of the model will have its own top-level interface/port. How about a mode with a single 'weights' port with one type that can be configured (so could be some ap_fixed or float, double for example) with casting to the proper weights types in the HLS. This would be similar to what we allow now for inputs/outputs with VivadoAccelerator

We could fine-tune the weight ports to properly sized types; I am not sure if that would significantly increase the logic. As you noticed, for simplicity, we currently bring in all the weights as float (32b) and convert them in the accelerator to the expected fixed precision.

Can we have a C driver that has some function taking the data as argument and returning the NN prediction? And some other function to reprogram the weights. We can call those from the standalone example that puts the data in header files, but they would be helpful for anyone integrating an hls4ml NN in a real application

Definitely. The current software application was mostly for proof-of-concept and debugging. We can break it down into a more structured library.

Can we have a Python driver for this? - I had some pynq Python for an AXI Master interface before, so I can help with this

Yes. Python would require an OS (or a lighter-weight RTOS). We are not booting an OS right now, so a C/C++ bare-metal application was sufficient, as we had for the MLPerf Tiny submission.

I will comment a little more about this question.
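
For reference, a minimal pynq-based sketch of what a Python driver for an AXI-Master hls4ml IP could look like (the bitstream name, the IP instance name, and the register offsets are hypothetical; the real offsets come from the Vivado HLS-generated driver headers, and N_INPUTS/N_OUTPUTS/x_test are assumed defined):

    import numpy as np
    from pynq import Overlay, allocate

    ol = Overlay('hls4ml_accel.bit')        # hypothetical bitstream name
    ip = ol.hls4ml_axi_m_0                  # hypothetical IP instance name

    # Physically contiguous buffers the accelerator can master over AXI
    x_buf = allocate(shape=(N_INPUTS,), dtype=np.float32)
    y_buf = allocate(shape=(N_OUTPUTS,), dtype=np.float32)
    x_buf[:] = x_test

    # Hypothetical AXI-Lite offsets for the m_axi pointer arguments
    ip.write(0x10, x_buf.physical_address)  # input features
    ip.write(0x18, y_buf.physical_address)  # output predictions

    ip.write(0x00, 1)                       # ap_start
    while not (ip.read(0x00) & 0x2):        # poll ap_done
        pass
    y_pred = np.array(y_buf)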

Can we have the design for pynq-z2 as well? - I can help with this also

Definitely.

I have a few more boards in my working directory that I did not push yet. Essentially there are three classes of chips/boards:

  • "Pure FPGA" that for us are Microblaze-based systems (see MLPerf Tiny submission ort ARTY)
  • Zynq-based systems (like pynq-z2, pynq-z1, etc)
  • ZynqMP-based systems (like Ultra96v2, ZCU106, ZCU102, etc)

It might be easier to split these contributions into different PRs, roughly:

AXI Master interface
C Driver
Ultra96 support
Reprogrammable weights

I like the breakdown.

Also, Xilinx SDK is a legacy tool, superseded by Vitis. Can we do it with Vitis instead?

This is a good question, and @jmitrevs brought it up as well.

I have not used Vitis seriously enough, but I assume it comes with a similar flow and equivalent tools (Vivado HLS -> Vivado -> Vivado SDK). Given Vitis, I am unsure how much we should push on the software side. Do you have any experience with the software stack in Vitis? In the longer term, I would also like to close the loop and create software applications/libraries for Linux. I started something based on PetaLinux etc., but that may once again overlap with what Vitis is doing.

Similarly, there is an ongoing effort to use the PYNQ software stack and overlays. I am wondering if we should look at that as well (I do not have experience with it, but I guess there should be a good degree of compatibility).

@jmitrevs marked this pull request as draft on September 15, 2022
@GiuseppeDiGuglielmo (Contributor Author):

  • AXI Master interface
  • C Driver
  • Ultra96 support
  • Reprogrammable weights

Breaking down this draft into 4 PRs.

Let's start with PR #653.

@jmduarte mentioned this pull request on Nov 2, 2022