
Different results after calling an IP several times; internal data seems to persist between calls. #796

Open
KokyoK opened this issue Apr 8, 2023 · 12 comments
Labels
bug Something isn't working

Comments

@KokyoK

KokyoK commented Apr 8, 2023

Hello, we are facing a strange problem.
What we have done:

  • Created an .onnx model file.
  • Created an IP with no warnings.
  • Ran the C simulation and RTL simulation; every test passed.

What we are currently doing:
We are trying to use the IP on a Pynq-Z2 board, but when we call the IP several times with the same input, it outputs different results each time:
[Screenshot 2023-04-08 17:09:17: repeated calls on the same input yield different outputs]
As you can see, the inputs to this IP are exactly the same, yet the outputs are similar but different on every call; the first output is exactly correct, but the following ones are not. It seems the IP reuses some internal data left over from the previous input. However, if we reinitialize the Overlay every time before using it, it works correctly but is much slower, as shown in the following screenshot:
[Screenshot 2023-04-08 17:18:40: reinitializing the Overlay before each call gives correct but slower results]
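For reference, the two call patterns look roughly like this (a minimal sketch, not the exact notebook code; the bitfile name, DMA instance name, dtypes, and output shape are assumptions):

    import numpy as np
    from pynq import Overlay, allocate

    def run_once(ol, x):
        # Stream one input through the accelerator's DMA and read the result back.
        in_buf = allocate(shape=x.shape, dtype=np.uint8)   # dtype assumed
        out_buf = allocate(shape=(1, 10), dtype=np.uint8)  # output shape assumed
        in_buf[:] = x
        ol.axi_dma_0.sendchannel.transfer(in_buf)          # DMA name assumed
        ol.axi_dma_0.recvchannel.transfer(out_buf)
        ol.axi_dma_0.sendchannel.wait()
        ol.axi_dma_0.recvchannel.wait()
        return np.array(out_buf)

    x = np.load("input_permute.npy")

    # Fast but wrong: reuse one Overlay; outputs drift after the first call.
    ol = Overlay("design.bit")                             # bitfile name assumed
    repeated = [run_once(ol, x) for _ in range(5)]

    # Slow but correct: reprogram the FPGA before every call.
    reloaded = [run_once(Overlay("design.bit"), x) for _ in range(5)]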

Here are the ONNX file and the final test notebook we used:
ipynb&onnx.zip

KokyoK added the bug label Apr 8, 2023
@auphelia
Collaborator

Hi @KokyoK , could you please provide more information?

  • Which FINN version and which tool versions (Vivado/Vitis) are you using?
  • Is that one of our example networks or one you created/trained yourself?

@KokyoK
Author

KokyoK commented Apr 13, 2023

Hello @auphelia ,

  • The versions are:
    PYNQ: v2.6.1
    FINN: latest (main branch)
    FINN-examples: v0.0.5
    Vivado: 2022.1

  • It is a network we created and trained ourselves.

Thanks

@auphelia
Collaborator

Hi @KokyoK,
While we check on our side whether we can reproduce your issue, could you please update to PYNQ 3.0.1 and see if the error persists?
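A one-line check on the board confirms the installed release (this assumes the pynq package exposes __version__, which recent releases do):

    import pynq
    print(pynq.__version__)  # expect '3.0.1' after the update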

@KokyoK
Author

KokyoK commented Apr 24, 2023

Hi @auphelia ,
I have tried to use PYNQ 3.0.1 and the same issue still occurs.

@fionnodonohoe-xlnx
Collaborator

Hi @KokyoK,
I am unable to build the bitstream from the provided ONNX file; could you please provide the original trained model?
Would it also be possible to share the 'input_permute.npy' file that is used by the notebook? Thanks.

@KokyoK
Author

KokyoK commented May 8, 2023

Hi @fionnodonohoe-xlnx , here are the files:
model.py contains the model structure, an ordinary convolutional network. weight.pt contains the trained weights that the model loads. input_permute.npy is also provided.

Since we were able to build a bitstream from the provided ONNX, I guess something may have gone wrong during the bitstream build.

model.zip

Thanks for your effort!

@fionnodonohoe-xlnx
Collaborator

Hi @KokyoK,

I tried creating the ONNX file from model.py. When adding model.save(is_onnx=1) after model.eval(), I get the following error:
RuntimeError: Given groups=1, weight of size [16, 40, 1, 3], expected input[16, 30, 1, 101] to have 40 channels, but got 30 channels instead

I then changed the expected input to have 40 channels instead of 30, only to get this error:
RuntimeError: input_shape.size() > 0 || reshape.size() > 0INTERNAL ASSERT FAILED at "../torch/csrc/jit/passes/onnx/shape_type_inference.cpp":448, please report a bug to PyTorch. Reshape node should have at least one input size > 0 when constant folding.

Are you also seeing this error? Perhaps you could send your TCResNet8.onnx file created from your script; I can then try putting that ONNX file through the bitstream generation stage.
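For reference, a dummy forward pass pins down this kind of channel mismatch before any export (a minimal sketch, assuming a PyTorch model; 'model' stands for the instantiated network, and the shape follows the error message above):

    import torch

    # The first conv's weight is [16, 40, 1, 3], so it expects 40 input
    # channels: the dummy input must be (N, 40, H, W), not (N, 30, H, W).
    dummy = torch.randn(1, 40, 1, 101)
    with torch.no_grad():
        out = model(dummy)  # 'model' is the network under test (hypothetical)
    print(out.shape)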

@KokyoK
Author

KokyoK commented May 22, 2023

Hello @fionnodonohoe-xlnx ,

  1. The correct way to save the .onnx file is to put the following lines in the main function:

    model = QuantizedTCResNet8(1, 40, 10)
    model.load("weight.pt")
    model.eval()

    import brevitas.onnx as bo
    export_onnx_path = "8b_weight_act_bias_net.onnx"
    input_shape = (1, 40, 1, 101)
    bo.export_finn_onnx(model, input_shape, export_onnx_path)

    and run the main function. The resulting ONNX file should be identical to the one I uploaded before; sorry for the previous confusion. (A quick load check for the exported file is sketched after this list.)

  2. We built the bitstream with the attached modified build_dataflow_steps.py; essentially, we added line 327 and commented out line 338.

build_dataflow_steps.py.zip
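As referenced above, a quick sanity check before running the build flow is to load the exported file with FINN's qonnx ModelWrapper (a minimal sketch; the expected shape is the input_shape used in the export):

    from qonnx.core.modelwrapper import ModelWrapper

    # Load the exported model and confirm its top-level input shape.
    wrapped = ModelWrapper("8b_weight_act_bias_net.onnx")
    in_name = wrapped.graph.input[0].name
    print(wrapped.get_tensor_shape(in_name))  # expect [1, 40, 1, 101]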

@fionnodonohoe-xlnx
Collaborator

Hi @KokyoK,
Thank you for that. Unfortunately, I hit another bitstream generation error. It turns out the TLastMarker insertion added in the provided code causes it:
+ model = model.transform(InsertTLastMarker(both=True))
... because the TLastMarker class has no get_input_datatype() method. Do you have edits elsewhere in your local clone that circumvent this issue?

Here is what I see on the command line when I use the provided build_dataflow_steps.py:

Running step: step_qonnx_to_finn [1/17]
Running step: step_tidy_up [2/17]
Running step: step_streamline [3/17]
Running step: step_convert_to_hls [4/17]
Running step: step_create_dataflow_partition [5/17]
Running step: step_target_fps_parallelization [6/17]
Running step: step_apply_folding_config [7/17]
Running step: step_generate_estimate_reports [8/17]
Running step: step_hls_codegen [9/17]
Running step: step_hls_ipgen [10/17]
Running step: step_set_fifo_depths [11/17]
Running step: step_create_stitched_ip [12/17]
Running step: step_measure_rtlsim_performance [13/17]
Running step: step_out_of_context_synthesis [14/17]
Running step: step_synthesize_bitfile [15/17]
Traceback (most recent call last):
  File "~/workspace/src/finn/builder/build_dataflow.py", line 168, in build_dataflow_cfg
    model = transform_step(model, cfg)
  File "~/workspace/src/finn/builder/build_dataflow_steps.py", line 772, in step_synthesize_bitfile
    model = model.transform(
  File "~/workspace/deps/qonnx/src/qonnx/core/modelwrapper.py", line 140, in transform
    (transformed_model, model_was_changed) = transformation.apply(transformed_model)
  File "~/workspace/src/finn/transformation/fpgadataflow/make_zynq_proj.py", line 350, in apply
    kernel_model = kernel_model.transform(InsertFIFO())
  File "~/workspace/deps/qonnx/src/qonnx/core/modelwrapper.py", line 140, in transform
    (transformed_model, model_was_changed) = transformation.apply(transformed_model)
  File "~/workspace/src/finn/transformation/fpgadataflow/insert_fifo.py", line 199, in apply
    dtype = n0.get_input_datatype(inp_ind)
  File "~/workspace/src/finn/custom_op/fpgadataflow/hlscustomop.py", line 711, in get_input_datatype
    raise Exception("get_input_datatype not implemented for this op")
Exception: get_input_datatype not implemented for this op
> ~/workspace/src/finn/custom_op/fpgadataflow/hlscustomop.py(711)get_input_datatype()
-> raise Exception("get_input_datatype not implemented for this op")

@KokyoK
Author

KokyoK commented May 23, 2023

Hi @fionnodonohoe-xlnx ,
You can simply remove all TLAST-related code. Delete this line:
model = model.transform(InsertTLastMarker(both=True))
It is not related to this issue; we have tried removing it and the problem still occurs.

@fionnodonohoe-xlnx
Collaborator

Hi @KokyoK ,
I went ahead and removed the TLastMarker insertion. This time the bitstream failed to generate due to insufficient resources on the Pynq FPGA part. I then removed all changes from the modified build_dataflow_steps.py and retried the build, but to no avail. I have attached the DRC report.
As you were able to generate a bitstream for this model, how did you get around this particular resource issue? Thanks.
top_wrapper_drc_opted.txt
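For what it's worth, resource overflow on a part this small usually traces back to the parallelization settings. A minimal sketch of the builder knobs that control this, assuming the standard FINN builder API (file and directory names are placeholders):

    import finn.builder.build_dataflow as build
    import finn.builder.build_dataflow_config as build_cfg

    cfg = build_cfg.DataflowBuildConfig(
        output_dir="output_tcresnet8",
        synth_clk_period_ns=10.0,
        board="Pynq-Z2",
        shell_flow_type=build_cfg.ShellFlowType.VIVADO_ZYNQ,
        target_fps=1000,  # lower target -> less parallelism -> fewer LUTs/BRAMs
        generate_outputs=[build_cfg.DataflowOutputType.BITFILE],
    )
    build.build_dataflow_cfg("8b_weight_act_bias_net.onnx", cfg)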

@KokyoK
Copy link
Author

KokyoK commented May 25, 2023

Hi @fionnodonohoe-xlnx ,
We tried again and did not encounter your problem. I have attached the build_customize folder; we simply ran this build and saw no errors.
This is the attachment: https://drive.google.com/file/d/1yfMkpSVOmBp5GrzpWn62daF1Um2d_-RO/view?usp=sharing
