DEDALE Distributed Learning Toolbox (D4.3)
Toolbox Main Authors: Nancy Panousopoulou (FORTH-ICS), Samuel Farrens (CEA), Konstantina Fotiadou (FORTH-ICS), Greg Tsagkatakis (FORTH-ICS), Arnauld Woiselle (SAFRAN)
Corresponding Author Email: email@example.com Website: https://github.com/dedale-fet
This repository contains the Python toolbox for distributed sparsity-based learning architectures, along with benchmarking imaging test sets.
The toolbox implements the DEDALE Distributed Learning Architecture for solving large-scale imaging problems, associated with:
- Space variant deconvolution of galaxy survey images (package: Distributed Space Variant Deconvolution)
- Hyperspectral and color image super resolution (package: Distributed Sparse Coupled Dictionary Learning)
Prior to consulting the documentation in each sub-folder on how to use the toolbox for each application, please read the following guidelines on the prerequisites of the toolbox.
Sample datasets are available here.
Prerequisites of the toolbox
The implementation of the Distributed Learning Architecture considers the use of the Apache Spark distributed computing framework.
Software packages and Operating System
The prerequisites for installing a Spark-compliant cluster over a set of working terminals are:
Linux OS (tested with Ubuntu 16.04.3 LTS)
SSH client-server packages (e.g., the openssh packages available with the Linux distribution).
Apache Spark. Tested with version 2.1.1 (pre-built for Apache Hadoop 2.7 and later), which is available for download here.
Python. Tested with version 2.7.12.
Java JDK/JRE. Tested with SE version 1.8.0.
Each of these packages should be installed on all terminals that will comprise the Spark cluster.
Spark cluster configuration
On each terminal of your cluster, extract the pre-built version of Spark into a preferred folder with read/write/execute permissions. In this guide, the preselected folder for all terminals is denoted $SPARK (e.g., /usr/local/spark).
On the master node:
Download the folder spark-configurations into a local folder.
Copy the contents of spark-configurations into $SPARK/conf.
Define the master host: Edit line 50 of the file $SPARK/conf/spark-env.sh to bind the master of the cluster to the IP of the master terminal. For example, if the IP of the master terminal is XXX.XXX.XXX.XXX, set SPARK_MASTER_HOST='XXX.XXX.XXX.XXX'. Save and close the file.
Define the slave nodes: Open and edit the file $SPARK/conf/slaves to indicate the IPs of the worker nodes (line 19 and onwards). Save and close the file.
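For concreteness, the edited lines of the two files might look as follows. The addresses are illustrative placeholders for your own network, not values from the repository:

```shell
# $SPARK/conf/spark-env.sh, line 50: bind the master to its own IP
SPARK_MASTER_HOST='192.168.1.100'

# $SPARK/conf/slaves, line 19 onwards: one worker IP per line, e.g.
#   192.168.1.101
#   192.168.1.102
```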
Cluster configuration parameters: The configuration and environmental parameters for the cluster can be tuned in the file $SPARK/conf/spark-defaults.conf:
Define the port number for the Spark cluster web interface: Edit line 28 of the file $SPARK/conf/spark-defaults.conf to indicate the URL of the Spark cluster web interface. For example, if the IP of the master terminal is XXX.XXX.XXX.XXX, the web interface is at http://XXX.XXX.XXX.XXX:8080/.
Define the location of the logging configuration: Edit the value of -Dlog4j.configuration at line 34 to indicate the location of the log4j configuration file.
(If needed:) Define the memory size allocated at the master for Spark calculations by changing the value of the variable spark.driver.memory at line 32 (in the current configuration: 8GB of RAM).
(If needed:) Define the memory size allocated at each worker for Spark calculations by changing the value of the variable spark.executor.memory at line 35 (in the current configuration: 2GB of RAM).
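Putting these settings together, the relevant excerpt of the file might look like the sketch below. The memory values are the defaults described above; the master IP and port are placeholders:

```
# $SPARK/conf/spark-defaults.conf (illustrative excerpt)
spark.master           spark://192.168.1.100:7077
spark.driver.memory    8g
spark.executor.memory  2g
```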
Save and close the file.
Note: For a complete list of tunable parameters for the cluster configuration consult the documentation available here
Launching/Stopping the cluster
- For starting the cluster: Open a command terminal at the master node and type:
$ $SPARK/sbin/start-master.sh; $SPARK/sbin/start-slaves.sh
where $SPARK is the location of the Spark pre-built files (e.g., /usr/local/spark).
To check whether the cluster configuration and launch were successful, open a web browser and navigate to http://XXX.XXX.XXX.XXX:8080/, where XXX.XXX.XXX.XXX is the IP of the master node. For example, for a cluster with master node IP 220.127.116.11, the web interface is available at http://220.127.116.11:8080/.
- For shutting down the cluster: Open a command terminal at the master node and type:
$ $SPARK/sbin/stop-master.sh; $SPARK/sbin/stop-slaves.sh
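The launch and stop commands above can be wrapped in a small helper function, sketched below. The /usr/local/spark default is a hypothetical install path; override it via the SPARK environment variable:

```shell
# spark_cluster: run the standalone start/stop scripts shown above.
spark_cluster() {
  SPARK=${SPARK:-/usr/local/spark}   # hypothetical default install path
  case "$1" in
    start) "$SPARK/sbin/start-master.sh" && "$SPARK/sbin/start-slaves.sh" ;;
    stop)  "$SPARK/sbin/stop-master.sh"  && "$SPARK/sbin/stop-slaves.sh" ;;
    *)     echo "usage: spark_cluster start|stop"; return 1 ;;
  esac
}
```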
Note: The configuration, launching, and stopping of the cluster can also be handled through an SSH connection to the master node.
For the complete documentation on spark standalone clusters please refer here.
References

A. Panousopoulou, S. Farrens, K. Fotiadou, A. Woiselle, G. Tsagkatakis, J.-L. Starck, P. Tsakalides, "A Distributed Learning Architecture for Scientific Imaging Problems," arXiv preprint arXiv:1809.05956, 2018.
K. Fotiadou, G. Tsagkatakis, P. Tsakalides, "Linear Inverse Problems with Sparsity Constraints," DEDALE DELIVERABLE 3.1, 2016.
A. Panousopoulou, S. Farrens, Y. Mastorakis, J.-L. Starck, P. Tsakalides, "A distributed learning architecture for big imaging problems in astrophysics," in 2017 25th European Signal Processing Conference (EUSIPCO), pp. 1440-1444, IEEE.
S. Farrens, F.M. Ngole Mboula, and J.-L. Starck, "Space variant deconvolution of galaxy survey images," Astronomy & Astrophysics, vol. 601, A66, 2017.