Skip to content
Branch: master
Find file History
Tomasz Grel
Tomasz Grel Initial commit
Latest commit 159cc5c Jun 22, 2017
Permalink
Type Name Latest commit message Commit time
..
Failed to load latest commit information.
README.md Initial commit Jun 22, 2017
download_data.sh Initial commit Jun 22, 2017
job.sh Initial commit Jun 22, 2017
main.py Initial commit Jun 22, 2017
my_job.sh Initial commit Jun 22, 2017

README.md

CIFAR-10 Example

This directory contains an example job training a simple CNN model for CIFAR-10 on a Slurm cluster with distributed TensorFlow. Please note that the example is meant only to illustrate the usage of this package, the accuracy of the model could certainly be easily improved.

Running the example

Firstly, create a virtualenv and install TensorFlow and tensorflow_on_slurm in it with:

virtualenv cifar-10-job-venv
source venv/bin/activate
pip install tensorflow
pip install git+https://github.com/deepsense-io/tensorflow_on_slurm

The virtualenv is necessary to make sure the packages are there on all the nodes taking part in the job. Now to run the example script on 5 nodes (4 workers + 1 parameter server) issue:

srun  -N 5 -n 5 -t 24:00:00 job.sh cifar-10-job-venv
You can’t perform that action at this time.