
SPynq: Spark on Pynq


SPynq is a framework for the efficient deployment of Spark data analytics applications on Zynq, the heterogeneous MPSoC FPGA on the Xilinx Pynq platform. Mapping Spark onto Pynq allows Spark applications to be accelerated by seamlessly utilizing the programmable logic of the FPGA. Below we describe the configuration steps for deploying Spark on Pynq, as well as the actions needed to access the built-in Xilinx libraries from PySpark.

In this project we have developed a library of hardware accelerators that can be used to speed up Spark applications such as machine learning. Specifically, we have developed a hardware accelerator that can speed up a logistic regression application by up to 22x, as described in the SPynq paper (bibtex).

Deploying Apache Spark

The Zynq FPGA hosts a dual-core 32-bit ARM Cortex-A9 processor and programmable logic. The Pynq platform hosts the Zynq SoC and 512MB of DDR3 memory. Because these resources are limited, building Spark from source on Pynq could take too long or even fail. For that reason, a pre-built version of Spark is used for the deployment. After connecting to Pynq over SSH and extracting Spark to the directory of your choice, a few steps are needed to make it work.

First, edit the following three files under Spark's conf/ directory.

  1. spark-defaults.conf (Add or edit the following lines)
spark.eventLog.enabled           true  
spark.eventLog.dir               {path/to/the/dir/where/log/files/will/be/stored}  
spark.driver.memory              505M  
spark.executor.memory            505M 
spark.executor.cores             1
spark.executor.instances         1
  2. spark-env.sh (Add or edit the following lines)
export PYSPARK_PYTHON=python3
export PYTHONHASHSEED=0
export SPARK_DAEMON_MEMORY=505m
export SPARK_DRIVER_MEMORY=505m
export SPARK_EXECUTOR_MEMORY=505m
export SPARK_WORKER_MEMORY=505m
export SPARK_WORKER_CORES=1

The Xilinx libraries for accessing Pynq's modules are built using python3. Omitting the PYSPARK_PYTHON line from the configuration would prevent applications that use those libraries from running.

  3. log4j.properties (Should look like the file uploaded to this project. This file is optional and does not affect Spark execution; it is only used to hide unwanted log lines from the console.)
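
As a quick sanity check on the settings listed for spark-defaults.conf, the sketch below parses text in that layout (one key and one value per line, separated by whitespace, with # starting a comment) and reports every key whose value differs from the listing. The names parse_spark_defaults, mismatches and EXPECTED are hypothetical helpers, not part of Spark or SPynq. spark.eventLog.dir is left out because its value depends on your setup.

```python
# Hypothetical helpers for illustration; not part of Spark or SPynq.
EXPECTED = {
    "spark.eventLog.enabled": "true",
    "spark.driver.memory": "505M",
    "spark.executor.memory": "505M",
    "spark.executor.cores": "1",
    "spark.executor.instances": "1",
}


def parse_spark_defaults(text):
    """Parse spark-defaults.conf style text into a dict of key -> value."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # key, then the rest of the line
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def mismatches(settings, expected=EXPECTED):
    """Return {key: (found, wanted)} for each expected key that differs."""
    return {
        key: (settings.get(key), wanted)
        for key, wanted in expected.items()
        if settings.get(key) != wanted
    }


if __name__ == "__main__":
    sample = """
    # values taken from the listing above
    spark.eventLog.enabled           true
    spark.driver.memory              505M
    spark.executor.memory            505M
    spark.executor.cores             1
    spark.executor.instances         1
    """
    # An empty dict means every checked setting matches.
    print(mismatches(parse_spark_defaults(sample)))
```

To check a real file, read it with open(path).read() and pass the text to parse_spark_defaults.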

Once these configurations are in place, Spark is ready to run on Pynq. Before running a Spark application, a JDK must be installed and JAVA_HOME must be set correctly.
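
The JDK requirement can be verified before launching Spark. The following is a minimal sketch, assuming a Linux layout where the Java launcher lives at bin/java under JAVA_HOME; the function name java_problems is hypothetical and not part of Spark or SPynq.

```python
import os


def java_problems(env=None):
    """Return a list of problems with the Java setup; empty means OK."""
    env = os.environ if env is None else env
    problems = []
    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isfile(os.path.join(java_home, "bin", "java")):
        # Assumes the usual JDK layout: $JAVA_HOME/bin/java
        problems.append("JAVA_HOME does not contain bin/java: " + java_home)
    return problems


if __name__ == "__main__":
    for problem in java_problems():
        print("warning:", problem)
```

Passing a dictionary as env makes it possible to test the check without touching the real environment.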

(All three files mentioned above can be found under the project's conf directory.)

Xilinx libraries for python applications

To use Pynq's board peripherals or the PL, always remember to append the path to Xilinx's pre-built libraries. Every Python application should include the following two lines:

import sys
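
Appending a directory to Python's module search path is normally done through sys.path. The sketch below shows that pattern under an explicit assumption: XILINX_LIB_PATH is a hypothetical placeholder and must be replaced with the directory that holds the Xilinx pre-built libraries on your board.

```python
import sys

# Hypothetical placeholder: replace with the directory that holds the
# Xilinx pre-built libraries on your board.
XILINX_LIB_PATH = "/path/to/xilinx/libraries"

# Append only once, so running this code repeatedly does not grow sys.path.
if XILINX_LIB_PATH not in sys.path:
    sys.path.append(XILINX_LIB_PATH)
```

Imports of the Xilinx libraries must come after this code, since Python consults sys.path at import time.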

Contact Details:

Christoforos Kachris, Microprocessors and Digital Systems Lab (Microlab), School of Electrical & Computer Engineering, National Technical University of Athens (NTUA), Athens, Greece


Ioannis Stamelos, Elias Koromilas