External weights vivado accelerator #646

Draft
wants to merge 5 commits into main

Conversation

GiuseppeDiGuglielmo (Contributor)

Extend the VivadoAccelerator backend to support programmable weights. The ideal setup is a Xilinx Zynq/ZynqMP board (ARM core + programmable logic).

The backend generates code for Vivado HLS, Vivado, and Vivado SDK:

  • the hls4ml IP with AXI master interfaces to move input features, output predictions, and weights between off-chip RAM and the FPGA
  • a complete Vivado project that integrates the hls4ml IP (for the target board/chip)
  • a complete bare-metal application to control and program the accelerator

Type of change

Some changes to the signature of the function convert_from_keras_model():

    hls_model = hls4ml.converters.convert_from_keras_model(
            model=model,
            clock_period=CLOCK_PERIOD,
            backend='VivadoAccelerator',
            board=BOARD_NAME,
            part=FPGA_PART,
            io_type='io_stream',
            interface='axi_master',
            driver='c',
            input_data_tb=DATA_DIR+'/X_test.npy',
            output_data_tb=DATA_DIR+'/y_test.npy',
            hls_config=config,
            output_dir=OUTPUT_DIR)

and a new function write_header_file() to write a header file with a hardcoded dataset:

    hls4ml.writer.vivado_accelerator_writer.VivadoAcceleratorWriter.write_header_file(
            hls_model, X_test, y_test, y_qkeras, y_hls, 64,
            OUTPUT_DIR + '/sdk/common/data.h')

Tests

You can test it with the example at this repo: https://github.com/GiuseppeDiGuglielmo/test-hls4ml-backend

Right now, we support Ultra96v2, but more Zynq/ZynqMP boards can be added.

jmitrevs and others added 4 commits August 17, 2022 17:37
The VivadoAccelerator backend creates a Vivado IP that can be easily integrated in a Zynq/ZynqMP setup. The input features, the output predictions, and the weights sit in main memory and are moved via AXI-Master interfaces.
@thesps (Contributor) left a comment:

Thanks for this, that's a nice feature! I have some general feedback here and some specific points that I'll address inline.

  • Can we decouple ‘reprogrammable weights’ from ‘AXI Master interface’? I’d like to be able to do both:
    — AXI Master interface without reprogrammable weights
    — AXI Stream interface with reprogrammable weights (maybe only the NN in/out on AXI Stream interface while weights use AXI Master?)
  • Each weight of the model will have its own top-level interface/port. How about a mode with a single 'weights' port whose type can be configured (e.g. some ap_fixed, float, or double), with casting to the proper weight types in the HLS? This would be similar to what we already allow for inputs/outputs with VivadoAccelerator
  • Can we have a C driver with a function that takes the data as argument and returns the NN prediction, and another function to reprogram the weights? We could call those from the standalone example that puts the data in header files, but they would be helpful for anyone integrating an hls4ml NN in a real application
  • Can we have a Python driver for this? - I had some pynq Python for an AXI Master interface before, so I can help with this
  • Can we have the design for pynq-z2 as well? - I can help with this also

It might be easier to split these contributions into different PRs, roughly:

  • AXI Master interface
  • C Driver
  • Ultra96 support
  • Reprogrammable weights

Also, Xilinx SDK is a legacy tool, superseded by Vitis. Can we do it with Vitis instead?

Comment on lines +608 to +610
io_type = self.config.get_config_value('IOType')
interface = self.config.get_config_value('AcceleratorConfig')['Interface'] if self.config.get_config_value('AcceleratorConfig') else None
config_weights = (io_type == 'io_stream') and (interface == 'axi_master')
Contributor:

Two things:

  • I don't think IOType of io_stream and Interface of axi_master should be the trigger for reprogrammable weights; I think it would be better as a new configuration parameter (of the AcceleratorConfig). Then there would probably need to be an assert somewhere to allow only the combinations of those other parameters that have been implemented (i.e. right now it also depends on the board); see the sketch after this list
  • As written this can't go here in ModelGraph as it's backend specific
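
A minimal sketch of what such a dedicated trigger and validation could look like (the parameter name ReprogrammableWeights and the supported-combinations list are assumptions for illustration, not the actual hls4ml API):

    # Hypothetical: a dedicated AcceleratorConfig flag instead of inferring
    # reprogrammable weights from IOType + Interface.
    SUPPORTED_COMBINATIONS = [
        # (io_type, interface, board) tuples implemented so far (assumed)
        ('io_stream', 'axi_master', 'ultra96v2'),
    ]

    def validate_accelerator_config(accel_cfg, io_type):
        # Only validate when the flag is actually requested
        if not accel_cfg or not accel_cfg.get('ReprogrammableWeights', False):
            return
        combo = (io_type, accel_cfg.get('Interface'), accel_cfg.get('Board'))
        assert combo in SUPPORTED_COMBINATIONS, \
            'Reprogrammable weights not implemented for {}'.format(combo)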

Contributor Author:

I agree. This was mostly a placeholder, and I was going to ask what should be better.

As in the main thread, do we already have a matrix or list of the existing possible combinations (what goes with what)? I am not sure how to properly set up the configuration/trigger. Do you have suggestions?

Contributor Author:

I started putting together a document, which is far from final. @thesps and @jmitrevs should be able to edit. Not sure if that may help or if it is better to keep it to the comments here. A bit of both, I guess.

If I understand it right, the additional configuration parameters (to enable programmable weights) should be passed via the function create_initial_config for the VivadoAccelerator backend.

If you agree, then I would add:

  • configurable_weights
    • bool, optional
    • weights are configurable at run time and thus they are exposed to the wrapper interface
    • defaults to False
  • weight_type
    • dict, optional
    • a dictionary that specifies the data type for a set of layer weights (can be float or an ap_type); if the type is not specified for a layer, then it defaults to float
    • defaults to an empty dictionary, i.e. all of the weights are float
    • note: VivadoAcceleratorBackend will round the number of bits used to the next power-of-2 value.

Please let me know if that makes sense. A sketch of the intended call is below.
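
For illustration only, this is how the proposed parameters might look in a convert_from_keras_model() call (configurable_weights and weight_type are the proposed names from the list above, not an existing API; the layer name 'fc1' is hypothetical):

    hls_model = hls4ml.converters.convert_from_keras_model(
            model=model,
            backend='VivadoAccelerator',
            board=BOARD_NAME,
            part=FPGA_PART,
            io_type='io_stream',
            interface='axi_master',
            # proposed parameters, not yet implemented:
            configurable_weights=True,
            weight_type={'fc1': 'ap_fixed<16,6>'},  # unlisted layers default to float
            hls_config=config,
            output_dir=OUTPUT_DIR)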

Comment on lines +664 to +666
io_type = self.config.get_config_value('IOType')
interface = self.config.get_config_value('AcceleratorConfig')['Interface'] if self.config.get_config_value('AcceleratorConfig') else None
config_weights = (io_type == 'io_stream') and (interface == 'axi_master')
Contributor:

Same comment as above

@@ -6,7 +6,7 @@ def match(self, node):
     cast = False
     if isinstance(node, Activation):
         cast = node.get_input_variable().type.precision != node.get_output_variable().type.precision
-    return isinstance(node, Activation) and node.get_attr('activation') == 'linear' and not cast
+    return isinstance(node, Activation) and node.get_attr('activation') == 'linear' # and not cast
Contributor:

? I don't think this should be included in this PR

Contributor:

That was a quick and dirty hack to get some models to optimize better, but it really was meant only for that branch, where we had checked the correctness. I agree that it should not be in the PR.

Comment on lines -25 to +27
#include <stdlib.h>
#include <math.h>
#include <cstdlib>
#include <cmath>
#include <cfloat>
Contributor:

?

Contributor Author:

I had to include cfloat at a certain point, but I do not need it in the current version of the code.

While I was at it, I changed to cmath and cstdlib, which I usually include that way when working in C++. We can discard this change.

Comment on lines +12 to +14
rm -f $(DESIGN)_standalone/src/helloworld.c
cd $(DESIGN)_standalone/src && ln -s ../../common/main.c main.c
cd $(DESIGN)_standalone/src && ln -s ../../common/data.h data.h
Contributor:

I didn't try running anything yet, but it seems like these files might not have the same names in the driver & standalone example

Comment on lines +20 to +22
io_type = model.config.get_config_value('IOType')
interface = model.config.get_config_value('AcceleratorConfig')['Interface'] if model.config.get_config_value('AcceleratorConfig') else None
config_weights = (io_type == 'io_stream') and (interface == 'axi_master')
Contributor:

Same comment as before that I think a different trigger for reprogrammable weights is needed

newline = ''
for v in model.get_weight_variables():
    newline += indent + ', model_axi_t {name} [{shape}]\n'.format(name=v.name, shape=v.data_length)
newline += indent + ', char load_weights'
Contributor:

Why not bool for load_weights?

Contributor Author:

It came from a previous iteration. For cleaner/safer C++ code, bool is the right choice. In hardware it would become a single-bit register if it were internal to the module, while on an AXI interface you can only use 8-, 16-, 32-bit, etc. sized words.

def write_new_tar(self, model):
    os.remove(model.config.get_output_dir() + '.tar.gz')
    super(VivadoAcceleratorWriter, self).write_tar(model)

def write_standalone_app(self, model):
Contributor:

Maybe the standalone app should be put in the examples instead?

Contributor Author:

This function and write_header_file can either be part of the example or come with the hls4ml code base.

    f.close()
    fout.close()

def write_header_file(model, X, y, y_keras, y_hls, n_samples, filename='data.h'):
Contributor:

Related to the comment above - maybe this should be an example?

Contributor Author:

I am good with moving those functions to the examples, and later on, we can "promote" them to the hls4ml codebase if necessary.

@GiuseppeDiGuglielmo (Contributor Author):

@thesps thank you for all of the comments; they help a lot!

I will reply both here and to the other inline questions.

Can we decouple ‘reprogrammable weights’ from ‘AXI Master interface’? I’d like to be able to do both:
— AXI Master interface without reprogrammable weights
— AXI Stream interface with reprogrammable weights (maybe only the NN in/out on AXI Stream interface while weights use AXI Master?)

We need the greater flexibility that you suggest. I agree.

Do we have a matrix or list of the existing possible combinations (what goes with what)?

We may also define how to pass this configuration information to the function convert_from_keras_model(). I am unsure if we should keep adding function parameters or if some nested struct/dictionary would be better. We should discuss this as a very first step, since you have a better global and local view of the current status.

Each weight of the model will have its own top-level interface/port. How about a mode with a single 'weights' port with one type that can be configured (so could be some ap_fixed or float, double for example) with casting to the proper weights types in the HLS. This would be similar to what we allow now for inputs/outputs with VivadoAccelerator

We could fine-tune the weight ports to properly sized types; I am not sure if that would significantly increase the logic. As you noticed, for simplicity, we currently bring in all the weights as float (32b) and convert them in the accelerator to the expected fixed precision.

Can we have a C driver that has some function taking the data as argument and returning the NN prediction? And some other function to reprogram the weights. We can call those from the standalone example that puts the data in header files, but they would be helpful for anyone integrating an hls4ml NN in a real application

Definitely. The current software application was mostly for proof-of-concept and debugging. We can break it down into a more structured library.

Can we have a Python driver for this? - I had some pynq Python for an AXI Master interface before, so I can help with this

Yes. Python would require an OS (or a lighter-weight RTOS). We are not booting an OS right now, so a C/C++ bare-metal application was sufficient, as we had for the MLPerf Tiny submission.

I will comment a little more about this question.
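
For reference, a minimal pynq-based sketch of what a Python driver for an AXI-Master hls4ml IP could look like (the bitstream name, the IP instance name, and the register offsets are hypothetical; the real offsets come from the Vivado HLS-generated driver headers, and N_INPUTS/N_OUTPUTS/x_test are assumed defined):

    import numpy as np
    from pynq import Overlay, allocate

    ol = Overlay('hls4ml_accel.bit')        # hypothetical bitstream name
    ip = ol.hls4ml_axi_m_0                  # hypothetical IP instance name

    # Physically contiguous buffers the accelerator can master over AXI
    x_buf = allocate(shape=(N_INPUTS,), dtype=np.float32)
    y_buf = allocate(shape=(N_OUTPUTS,), dtype=np.float32)
    x_buf[:] = x_test

    # Hypothetical AXI-Lite offsets for the m_axi pointer arguments
    ip.write(0x10, x_buf.physical_address)  # input features
    ip.write(0x18, y_buf.physical_address)  # output predictions

    ip.write(0x00, 1)                       # ap_start
    while not (ip.read(0x00) & 0x2):        # poll ap_done
        pass
    y_pred = np.array(y_buf)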

Can we have the design for pynq-z2 as well? - I can help with this also

Definitely.

I have a few more boards in my working directory that I did not push yet. Essentially there are three classes of chips/boards:

  • "Pure FPGA" that for us are Microblaze-based systems (see MLPerf Tiny submission ort ARTY)
  • Zynq-based systems (like pynq-z2, pynq-z1, etc)
  • ZynqMP-based systems (like Ultra96v2, ZCU106, ZCU102, etc)

It might be easier to split these contributions into different PRs, roughly:

AXI Master interface
C Driver
Ultra96 support
Reprogrammable weights

I like the breakdown.

Also, Xilinx SDK is a legacy tool, superseded by Vitis. Can we do it with Vitis instead?

This is a good question, and @jmitrevs brought it up as well.

I have not used Vitis seriously enough, but I assume it comes with a similar flow and equivalent tools (Vivado HLS -> Vivado -> Vivado SDK). Given Vitis, I am unsure how much we should push on the software side. Do you have any experience with the software stack in Vitis? In the longer term, I would also like to close the loop and create software applications/libraries for Linux. I started something based on PetaLinux etc., but that may once again overlap with what Vitis is doing.

Similarly, there is an ongoing effort to use the PYNQ software stack and overlays. I am wondering if we should look at that as well (I do not have experience with it, but I guess there should be a good degree of compatibility).

@jmitrevs marked this pull request as draft on September 15, 2022
@GiuseppeDiGuglielmo (Contributor Author):

  • AXI Master interface
  • C Driver
  • Ultra96 support
  • Reprogrammable weights

Breaking down this draft into 4 PRs.

Let's start with PR #653.

@jmduarte mentioned this pull request on Nov 2, 2022