Latency difference with the same model but different versions of hls4ml #889

Open
sparajul opened this issue Oct 17, 2023 · 1 comment

Quick summary

I was trying to find the FPGA resource usage and latency of a CNN model I built. Using the exact same settings, I got completely different results with hls4ml versions 0.6.0 and 0.7.1:
With 0.6.0 the latency was around 1.2 us, and
with 0.7.1 the latency was around 7 us, which is a huge difference.

Steps to Reproduce

I worked in a Jupyter notebook. If needed, the complete notebook is here:
https://github.com/sparajul/fastmachinelearning/blob/main/TrainCNN.ipynb

import os
import hls4ml
from tensorflow.keras.models import load_model  # needed for load_model below

# Load the trained CNN and make the Vivado binaries visible on PATH
model_cnn = load_model('cnn.h5')
os.environ['PATH'] = '/tools/Xilinx/Vivado/2018.3/bin:' + os.environ['PATH']

# Per-layer configuration derived from the Keras model
hls_config = hls4ml.utils.config_from_keras_model(model_cnn, granularity='name')

hls_config['Model']['Precision'] = 'ap_fixed<16,8>'
hls_config['Model']['ReuseFactor'] = 10

# Vivado backend configuration
cfg = hls4ml.converters.create_config(backend='Vivado')
cfg['IOType'] = 'io_stream'
cfg['HLSConfig'] = hls_config
cfg['KerasModel'] = model_cnn
cfg['OutputDir'] = 'keras_cnn/vu13p'
cfg['XilinxPart'] = 'xcvu13p-flga2577-2L-e'

# Convert, compile, and run C synthesis plus Vivado logic synthesis
hls_model_aq = hls4ml.converters.keras_to_hls(cfg)
hls_model_aq.compile()
hls_model_aq.build(csim=False, synth=True, vsynth=True)

hls4ml.report.read_vivado_report('keras_cnn/vu13p')

Actual behavior

Different latency with different versions of hls4ml.

#Saved model here
cnn.h5.zip

sparajul added the bug label Oct 17, 2023
calad0i (Contributor) commented Nov 2, 2023

It seems that you are using parallel IO. In the newer versions of hls4ml, conv unrolling is controlled by ParallelizationFactor in the config (hls_config). (A full unroll was done for the latency strategy.) This value defaults to one, and you will need to set it to match the number of times your kernel is applied to get the whole convolution done in parallel.
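
For reference, a minimal sketch of what setting this could look like. The layer-name check and the value 64 are assumptions for illustration; the actual layer names come from hls_config['LayerName'] for your model, and the factor should match the number of output positions the kernel slides over.

import hls4ml
from tensorflow.keras.models import load_model

model_cnn = load_model('cnn.h5')
hls_config = hls4ml.utils.config_from_keras_model(model_cnn, granularity='name')

# Set ParallelizationFactor on every conv layer. The value below is only an
# example and should match how many times the kernel is applied per image
# (roughly output height * output width) to run the whole convolution in parallel.
for layer_name, layer_cfg in hls_config['LayerName'].items():
    if layer_name.startswith('conv'):
        layer_cfg['ParallelizationFactor'] = 64  # assumed value, model-dependent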
