## hls4ml: Bridging Machine Learning and FPGAs for Ultra-Fast Inference  


💡 **High-Level Synthesis for Machine Learning (hls4ml)** is an open-source library that transforms machine learning models into hardware descriptions optimized for FPGA deployment.

**Key Features of hls4ml:** 

- Converts models from Keras, TensorFlow, PyTorch, and ONNX into High-Level Synthesis (HLS) projects.

- Generates C++ code that HLS tools such as Xilinx Vitis HLS and the Intel HLS Compiler then synthesize into hardware (RTL) for FPGA implementation.

- Targets low-latency, low-power inference, which makes it well suited to edge computing and other real-time AI applications.

- Supports quantization and pruning techniques to shrink model size while maintaining accuracy.


For further details:

- GitHub: https://github.com/fastmachinelearning/hls4ml

- Web site: https://fastmachinelearning.org/hls4ml/

---

In [1]:
import os
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import load_model
from qkeras import QActivation, QDense, QConv2DBatchnorm
import hls4ml
import matplotlib.pyplot as plt


### Setting the Vitis HLS path


As a first step, add the `bin` directory of the Vivado HLS or Vitis HLS installation (or of another supported tool) to `PATH` so that hls4ml can invoke it.

In [2]:
# Prepend the Vitis HLS bin directory to PATH so hls4ml can launch vitis_hls
os.environ['PATH'] = '/tools/Xilinx/XilinxUnified_2022/Vitis_HLS/2022.2/bin:' + os.environ['PATH']
os.environ['PATH']

'/tools/Xilinx/XilinxUnified_2022/Vitis_HLS/2022.2/bin:/tools/anaconda3/envs/neuralEnv/bin:/tools/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin'
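Re-running the cell above prepends the same directory again each time. A minimal sketch, assuming only the Python standard library (`add_to_path` and `find_hls_tool` are hypothetical helpers, not hls4ml functions), that adds the directory once and then uses `shutil.which` to confirm that a `vitis_hls` or `vivado_hls` executable is actually reachable before synthesis is attempted:

```python
import os
import shutil
from typing import Optional, Sequence


def add_to_path(bin_dir: str) -> None:
    """Prepend bin_dir to PATH unless it is already one of the entries."""
    current = os.environ.get("PATH", "")
    if bin_dir not in current.split(os.pathsep):
        os.environ["PATH"] = bin_dir + (os.pathsep + current if current else "")


def find_hls_tool(candidates: Sequence[str] = ("vitis_hls", "vivado_hls")) -> Optional[str]:
    """Return the full path of the first candidate executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    add_to_path("/tools/Xilinx/XilinxUnified_2022/Vitis_HLS/2022.2/bin")
    tool = find_hls_tool()
    print(tool if tool else "No Vitis/Vivado HLS executable found on PATH")
```

Checking up front gives a clear message instead of a failure deep inside the build step.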

#### Load the model (.h5)

In [3]:

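from tensorflow.keras.models import load_model
from qkeras.utils import _add_supported_quantized_objects

# Register the QKeras layer and quantizer classes so Keras can deserialize the .h5 file
co = {}
_add_supported_quantized_objects(co)
model = load_model('../models/mnistPQKD.h5', custom_objects=co)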


In [4]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 fc1_input (QDense)          (None, 5)                 3925      
                                                                 
 relu_input (QActivation)    (None, 5)                 0         
                                                                 
 fc1 (QDense)                (None, 7)                 42        
                                                                 
 relu1 (QActivation)         (None, 7)                 0         
                                                                 
 fc2 (QDense)                (None, 10)                80        
                                                                 
 relu2 (QActivation)         (None, 10)                0         
                                                                 
 output (QDense)             (None, 2)                 2
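The parameter counts in the summary follow directly from the layer widths: a Dense (or QDense) layer with n_in inputs and n_out units has n_in·n_out weights plus n_out biases, and activation layers have none. A minimal check in plain Python, using the 784-input width from the topology printed in the configuration cells (the helper names are illustrative):

```python
from typing import List


def dense_params(n_in: int, n_out: int, use_bias: bool = True) -> int:
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + (n_out if use_bias else 0)


def param_counts(widths: List[int]) -> List[int]:
    """Parameters of each Dense layer in a stack with the given layer widths."""
    return [dense_params(a, b) for a, b in zip(widths[:-1], widths[1:])]


if __name__ == "__main__":
    # 784 inputs -> fc1_input (5) -> fc1 (7) -> fc2 (10) -> output (2)
    counts = param_counts([784, 5, 7, 10, 2])
    print(counts, "total:", sum(counts))
```

This prints `[3925, 42, 80, 22] total: 4069`.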

### High-Level Synthesis for Machine Learning (hls4ml)

Configuration - Granularity: Model

In [5]:
# granularity='model'

hls_config = hls4ml.utils.config_from_keras_model(model, granularity='model')


# User Configuration 

hls_config['Model']['Precision'] = 'ap_fixed<8, 6>'
hls_config['Model']['ReuseFactor'] = 16
hls_config['Model']['Strategy'] = 'Latency'  # or 'Resource'

# Local helper module (plotting.py, not part of hls4ml) used to pretty-print the config
import plotting

print("-----------------------------------")
plotting.print_dict(hls_config)
print("-----------------------------------")




Interpreting Sequential
Topology:
Layer name: fc1_input_input, layer type: InputLayer, input shapes: [[None, 784]], output shape: [None, 784]
Layer name: fc1_input, layer type: QDense, input shapes: [[None, 784]], output shape: [None, 5]
Layer name: relu_input, layer type: Activation, input shapes: [[None, 5]], output shape: [None, 5]
Layer name: fc1, layer type: QDense, input shapes: [[None, 5]], output shape: [None, 7]
Layer name: relu1, layer type: Activation, input shapes: [[None, 7]], output shape: [None, 7]
Layer name: fc2, layer type: QDense, input shapes: [[None, 7]], output shape: [None, 10]
Layer name: relu2, layer type: Activation, input shapes: [[None, 10]], output shape: [None, 10]
Layer name: output, layer type: QDense, input shapes: [[None, 10]], output shape: [None, 2]
Layer name: sigmoid, layer type: Activation, input shapes: [[None, 2]], output shape: [None, 2]
-----------------------------------
Model
  Precision:         ap_fixed<8, 6>
  ReuseFactor:       16
  Stra
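In an `ap_fixed<W, I>` type, W is the total number of bits and I the number of integer bits including the sign bit, leaving W − I fractional bits. By default the type truncates toward minus infinity (AP_TRN) and wraps around on overflow (AP_WRAP); suffixes such as `RND_CONV,SAT` in the hls4ml printout select convergent rounding and saturation instead. A minimal pure-Python model of the precisions used in this notebook, assuming those defaults (`ap_fixed_range` and `quantize` are illustrative helpers; rounding modes other than truncation are not modeled):

```python
import math
from typing import Tuple


def ap_fixed_range(width: int, integer: int) -> Tuple[float, float, float]:
    """Return (step, min, max) of a signed ap_fixed<width, integer> type."""
    frac = width - integer              # number of fractional bits
    step = 2.0 ** -frac                 # value of one LSB
    lo = -(2.0 ** (integer - 1))        # most negative value (only the sign bit set)
    hi = 2.0 ** (integer - 1) - step    # most positive value
    return step, lo, hi


def quantize(x: float, width: int, integer: int, saturate: bool = False) -> float:
    """Software model of ap_fixed<width, integer> with truncation (AP_TRN).

    Overflow wraps around (AP_WRAP) unless saturate=True (AP_SAT).
    """
    frac = width - integer
    code = math.floor(x * 2 ** frac)    # truncate toward minus infinity
    lo_code, hi_code = -(2 ** (width - 1)), 2 ** (width - 1) - 1
    if saturate:
        code = max(lo_code, min(hi_code, code))
    else:
        code = (code - lo_code) % (2 ** width) + lo_code  # two's-complement wrap
    return code / 2 ** frac


if __name__ == "__main__":
    for w, i in [(8, 6), (16, 6), (8, 4)]:
        step, lo, hi = ap_fixed_range(w, i)
        print(f"ap_fixed<{w},{i}>: step={step}, range=[{lo}, {hi}]")
    print(quantize(1.3, 8, 6), quantize(40.0, 8, 6), quantize(40.0, 8, 6, saturate=True))
```

For `ap_fixed<8, 6>` this gives a step of 0.25 and a range of [-32, 31.75]: 1.3 is stored as 1.25, and 40.0 wraps to -24.0 unless saturation clamps it to 31.75.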

Configuration - Granularity: Name

In [6]:
# granularity='name'

hls_config = hls4ml.utils.config_from_keras_model(model, granularity='name')

for layer in hls_config['LayerName'].keys():
    # to collect the output from each layer
    # hls_config['LayerName'][layer]['Trace'] = True  
    
    hls_config['LayerName'][layer]['ReuseFactor'] = 16

hls_config['LayerName']['fc1_input_input']['Precision'] = 'ap_fixed<16, 6>'   
hls_config['LayerName']['fc1']['Precision'] = 'ap_fixed<8, 4>'   

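# 'Stable' is a softmax implementation option in hls4ml; it likely has no effect on this sigmoid layer
hls_config['LayerName']['sigmoid']['Strategy'] = 'Stable'

# To optimize DSP usage, select the 'Unrolled' Dense multiplication strategy before running HLS synthesis
hls_config['Model']['Strategy'] = 'Unrolled'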


Interpreting Sequential
Topology:
Layer name: fc1_input_input, layer type: InputLayer, input shapes: [[None, 784]], output shape: [None, 784]
Layer name: fc1_input, layer type: QDense, input shapes: [[None, 784]], output shape: [None, 5]
Layer name: relu_input, layer type: Activation, input shapes: [[None, 5]], output shape: [None, 5]
Layer name: fc1, layer type: QDense, input shapes: [[None, 5]], output shape: [None, 7]
Layer name: relu1, layer type: Activation, input shapes: [[None, 7]], output shape: [None, 7]
Layer name: fc2, layer type: QDense, input shapes: [[None, 7]], output shape: [None, 10]
Layer name: relu2, layer type: Activation, input shapes: [[None, 10]], output shape: [None, 10]
Layer name: output, layer type: QDense, input shapes: [[None, 10]], output shape: [None, 2]
Layer name: sigmoid, layer type: Activation, input shapes: [[None, 2]], output shape: [None, 2]


In [7]:
print("-----------------------------------")
plotting.print_dict(hls_config)
print("-----------------------------------")


-----------------------------------
Model
  Precision:         fixed<16,6>
  ReuseFactor:       1
  Strategy:          Unrolled
  BramFactor:        1000000000
  TraceOutput:       False
LayerName
  fc1_input_input
    Trace:           False
    Precision:       ap_fixed<16, 6>
    ReuseFactor:     16
  fc1_input
    Trace:           False
    Precision
      result:        fixed<16,6>
      weight:        fixed<8,4>
      bias:          fixed<8,4>
    ReuseFactor:     16
  fc1_input_linear
    Trace:           False
    Precision
      result:        fixed<16,6>
    ReuseFactor:     16
  relu_input
    Trace:           False
    Precision
      result:        fixed<16,7,RND_CONV,SAT>
    ReuseFactor:     16
  fc1
    Trace:           False
    Precision:       ap_fixed<8, 4>
    ReuseFactor:     16
  fc1_linear
    Trace:           False
    Precision
      result:        fixed<16,6>
    ReuseFactor:     16
  relu1
    Trace:           False
    Precision
      result:        fixed<16
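The `ReuseFactor` set above controls how many times each hardware multiplier is reused when computing a layer's outputs. As a first-order estimate (an assumption for illustration, not hls4ml's exact resource model), a Dense layer with n_in inputs and n_out outputs then needs about ⌈n_in·n_out / R⌉ multipliers. The sketch below applies this to the layer widths in the topology above; the helper names are hypothetical:

```python
from typing import Dict, List, Tuple

# (layer name, n_in, n_out) for the QDense layers in the printed topology
LAYERS: List[Tuple[str, int, int]] = [
    ("fc1_input", 784, 5),
    ("fc1", 5, 7),
    ("fc2", 7, 10),
    ("output", 10, 2),
]


def estimate_multipliers(n_in: int, n_out: int, reuse_factor: int) -> int:
    """Ceiling of n_in * n_out / reuse_factor, computed with integers only."""
    if reuse_factor < 1:
        raise ValueError("reuse_factor must be >= 1")
    return (n_in * n_out + reuse_factor - 1) // reuse_factor


def multiplier_table(reuse_factor: int) -> Dict[str, int]:
    """Estimated multiplier count per layer for a uniform reuse factor."""
    return {name: estimate_multipliers(n_in, n_out, reuse_factor)
            for name, n_in, n_out in LAYERS}


if __name__ == "__main__":
    for rf in (1, 16):
        table = multiplier_table(rf)
        print(f"ReuseFactor={rf}: {table} total={sum(table.values())}")
```

With R = 16 this gives 245, 3, 5 and 2 multipliers (255 in total) versus 4045 fully parallel multipliers with R = 1; how those multipliers are mapped onto DSPs or LUTs is decided by the HLS tool.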

### hls4ml with Vitis HLS as backend

In [8]:
cfg = hls4ml.converters.create_config(backend='vitis')

# cfg['IOType']   = 'io_stream'     # recommended when converting CNNs
cfg['HLSConfig']  = hls_config      # HLS configuration
cfg['KerasModel'] = model           # Keras model to be converted
cfg['OutputDir']  = 'hw/'           # output directory of the generated HLS project
# Target FPGA part. Zedboard: xc7z020clg484-1 (PYNQ-Z1: xc7z020clg400-1),
# Artix-7: xc7a35tcsg325-1, Zynq UltraScale+ MPSoC: xczu4eg-sfvc784-2-e or xczu3eg-sfvc784-1-e
cfg['Part'] = 'xc7z020clg484-1'

In [9]:
hls_model = hls4ml.converters.keras_to_hls(cfg)

Interpreting Sequential
Topology:
Layer name: fc1_input_input, layer type: InputLayer, input shapes: [[None, 784]], output shape: [None, 784]
Layer name: fc1_input, layer type: QDense, input shapes: [[None, 784]], output shape: [None, 5]
Layer name: relu_input, layer type: Activation, input shapes: [[None, 5]], output shape: [None, 5]
Layer name: fc1, layer type: QDense, input shapes: [[None, 5]], output shape: [None, 7]
Layer name: relu1, layer type: Activation, input shapes: [[None, 7]], output shape: [None, 7]
Layer name: fc2, layer type: QDense, input shapes: [[None, 7]], output shape: [None, 10]
Layer name: relu2, layer type: Activation, input shapes: [[None, 10]], output shape: [None, 10]
Layer name: output, layer type: QDense, input shapes: [[None, 10]], output shape: [None, 2]
Layer name: sigmoid, layer type: Activation, input shapes: [[None, 2]], output shape: [None, 2]
Creating HLS model


In [10]:
hls_model.compile()

Writing HLS project


Done
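`compile()` builds a C++ emulation library of the project, after which `hls_model.predict(x)` runs the fixed-point model on the CPU. Comparing its outputs with `model.predict(x)` is a cheap way to check whether the chosen precisions lose accuracy before launching synthesis. A minimal NumPy sketch of two comparison metrics; `x_test` in the comments is a hypothetical (N, 784) input array, and the helper names are illustrative:

```python
import numpy as np


def class_agreement(y_ref: np.ndarray, y_hls: np.ndarray) -> float:
    """Fraction of samples whose predicted class (argmax over outputs) matches."""
    if y_ref.shape != y_hls.shape:
        raise ValueError(f"shape mismatch: {y_ref.shape} vs {y_hls.shape}")
    return float(np.mean(np.argmax(y_ref, axis=1) == np.argmax(y_hls, axis=1)))


def max_abs_error(y_ref: np.ndarray, y_hls: np.ndarray) -> float:
    """Largest absolute difference between corresponding outputs."""
    return float(np.max(np.abs(y_ref - y_hls)))


# Typical use in the notebook (x_test is hypothetical):
# y_keras = model.predict(x_test)
# y_hls = hls_model.predict(np.ascontiguousarray(x_test))
# print(class_agreement(y_keras, y_hls), max_abs_error(y_keras, y_hls))

if __name__ == "__main__":
    # Synthetic outputs for 3 samples and 2 classes, only to exercise the helpers
    y_ref = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
    y_hls = np.array([[0.7, 0.3], [0.4, 0.6], [0.3, 0.7]])
    print(class_agreement(y_ref, y_hls), max_abs_error(y_ref, y_hls))
```

Agreement close to 1.0 suggests the fixed-point types are adequate; a large maximum error may point to a layer with too few integer bits.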


#### Hardware synthesis

In [11]:
hls_model.build(csim=False, export=False)

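# Note: bitfile=True is an option of hls4ml's VivadoAccelerator backend; the plain Vitis backend may reject it
# hls_model.build(csim=False, export=True, bitfile=True)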


****** Vitis HLS - High-Level Synthesis from C, C++ and OpenCL v2022.2 (64-bit)
  **** SW Build 3670227 on Oct 13 2022
  **** IP Build 3669848 on Fri Oct 14 08:30:02 MDT 2022
    ** Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.

source /tools/Xilinx/XilinxUnified_2022/Vitis_HLS/2022.2/scripts/vitis_hls/hls.tcl -notrace
INFO: [HLS 200-10] Running '/tools/Xilinx/XilinxUnified_2022/Vitis_HLS/2022.2/bin/unwrapped/lnx64.o/vitis_hls'
INFO: [HLS 200-10] For user 'ro' on host 'mareKaleido' (Linux_x86_64 version 5.15.0-139-generic) on Thu Oct 16 00:38:51 CEST 2025
INFO: [HLS 200-10] On os Ubuntu 20.04.6 LTS
INFO: [HLS 200-10] In directory '/home/ro/kaleido/repo/github/training-AI-Embedded-UTP-Peru/demos/hls4ml/hw'
Sourcing Tcl script 'build_prj.tcl'
IN

{'CSynthesisReport': {'TargetClockPeriod': '5.00',
  'EstimatedClockPeriod': '4.332',
  'BestLatency': '24',
  'WorstLatency': '24',
  'IntervalMin': '7',
  'IntervalMax': '7',
  'BRAM_18K': '1',
  'DSP': '4',
  'FF': '35096',
  'LUT': '51833',
  'URAM': '0',
  'AvailableBRAM_18K': '280',
  'AvailableDSP': '220',
  'AvailableFF': '106400',
  'AvailableLUT': '53200',
  'AvailableURAM': '0'}}
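The synthesis report is easier to judge as percentages and nanoseconds. A minimal sketch that assumes the dictionary shape printed above (string values under `'CSynthesisReport'`); `utilization`, `latency_ns` and `interval_ns` are illustrative helpers, not hls4ml functions:

```python
from typing import Dict, Optional

# Values copied from the CSynthesisReport printed above
report = {'CSynthesisReport': {
    'TargetClockPeriod': '5.00', 'EstimatedClockPeriod': '4.332',
    'BestLatency': '24', 'WorstLatency': '24',
    'IntervalMin': '7', 'IntervalMax': '7',
    'BRAM_18K': '1', 'DSP': '4', 'FF': '35096', 'LUT': '51833', 'URAM': '0',
    'AvailableBRAM_18K': '280', 'AvailableDSP': '220', 'AvailableFF': '106400',
    'AvailableLUT': '53200', 'AvailableURAM': '0'}}

RESOURCES = ('BRAM_18K', 'DSP', 'FF', 'LUT', 'URAM')


def utilization(csynth: Dict[str, str]) -> Dict[str, Optional[float]]:
    """Percentage of each resource used; None if the device has none of it."""
    usage: Dict[str, Optional[float]] = {}
    for res in RESOURCES:
        used = float(csynth[res])
        available = float(csynth['Available' + res])
        usage[res] = 100.0 * used / available if available > 0 else None
    return usage


def latency_ns(csynth: Dict[str, str], use_estimated_clock: bool = False) -> float:
    """Worst-case latency in ns = latency in cycles x clock period in ns."""
    key = 'EstimatedClockPeriod' if use_estimated_clock else 'TargetClockPeriod'
    return float(csynth['WorstLatency']) * float(csynth[key])


def interval_ns(csynth: Dict[str, str]) -> float:
    """Time between accepted inputs at the target clock (worst-case interval)."""
    return float(csynth['IntervalMax']) * float(csynth['TargetClockPeriod'])


if __name__ == '__main__':
    csr = report['CSynthesisReport']
    for res, pct in utilization(csr).items():
        text = 'n/a' if pct is None else f'{pct:.2f}%'
        print(f'{res}: {text}')
    print(f'latency: {latency_ns(csr):.1f} ns (target clock), '
          f'{latency_ns(csr, use_estimated_clock=True):.1f} ns (estimated clock)')
    print(f'interval: {interval_ns(csr):.1f} ns')
```

With these numbers the design uses about 97.4% of the xc7z020 LUTs but under 2% of its DSPs; the worst-case latency is 24 cycles × 5 ns = 120 ns at the 200 MHz target clock, and a new input can be accepted every 7 cycles (35 ns).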

In [None]:

# FIFO depth optimization (applies to io_stream designs). Vivado version
# hls_config['Flows'] = ['vivado:fifo_depth_optimization']
# hls4ml.model.optimizer.get_optimizer('vivado:fifo_depth_optimization').configure(profiling_fifo_depth=100_000)


# Vitis version
# hls_config['Flows'] = ['vitis:fifo_depth_optimization']
# hls4ml.model.optimizer.get_optimizer('vitis:fifo_depth_optimization').configure(profiling_fifo_depth=100_000)


---
#### UTP - Perú - 2025

Romina Soledad Molina, Ph.D. - MLab/STI ICTP, Trieste, Italy