This repository has been archived by the owner. It is now read-only.

#Distributed TensorFlow on Spark

First presented at the 2016 Spark Summit East: [Slide deck], [Presentation video], [Blog post]

##TensorSpark productionalized in yarn-cluster mode

This latest version contains modifications and improvements that are mostly relevant to anyone taking TensorSpark to production in yarn-cluster mode (tested on a Hortonworks distribution [HDP 2.4] with CPU machines). For other deployment modes and machine types, the earlier version as of [Commit #62] might still be a better option.

###Summary of changes since [Commit #62]

There are a few minor improvements (see the commits for details) and the following two major changes:

  • Reading the test set from HDFS instead of local disk (this avoids having to copy the test set to local disk; the training and test sets are stored at the same HDFS location)
  • Finding the machine that hosts the Spark driver in yarn-cluster mode (either way, some configuration is required here)
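
The two changes above can be illustrated with a minimal sketch. It assumes the test set is stored as CSV lines with the label in the first column (the file format is not described here) and that the driver advertises its own host name to the workers. `parse_example`, `load_test_set`, `driver_endpoint`, and the example path are hypothetical names chosen for illustration, not the project's actual code; only `SparkContext.textFile` and `socket.gethostname` are real APIs.

```python
import socket


def parse_example(line):
    """Split one CSV line into (label, features); assumes the label comes first."""
    values = [float(v) for v in line.strip().split(",")]
    return values[0], values[1:]


def load_test_set(sc, hdfs_path):
    """Read the test set from HDFS through an existing SparkContext `sc`.

    `hdfs_path` is a hypothetical location such as "hdfs:///data/test.csv";
    nothing has to be copied to local disk on the driver machine.
    """
    return sc.textFile(hdfs_path).map(parse_example).collect()


def driver_endpoint(port):
    """Return "host:port" for the machine this code runs on.

    In yarn-cluster mode YARN picks the driver machine, so the driver looks up
    its own host name at startup instead of relying on a hard-coded address.
    """
    return "%s:%d" % (socket.gethostname(), port)
```

Calling `driver_endpoint` from the driver process and passing the result to the workers is one possible way to avoid a fixed driver address; it still assumes the chosen port is reachable from the executors.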

###To run

  1. zip ./ ./ ./ ./ ./ ./
  2. spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --queue default \
    --num-executors 3 \
    --driver-memory 20g \
    --executor-memory 60g \
    --executor-cores 8 \
    --py-files ./
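
For scripted deployments, the submit step can also be driven from Python. The sketch below assembles the same `spark-submit` options listed above into an argument list. The function names and the `py_files` and `entry_point` arguments are hypothetical: the archive and script names are not given here, so they are left as parameters rather than guessed.

```python
import subprocess


def build_submit_command(py_files, entry_point, queue="default"):
    """Assemble the spark-submit argument list with the settings listed above."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--queue", queue,
        "--num-executors", "3",
        "--driver-memory", "20g",
        "--executor-memory", "60g",
        "--executor-cores", "8",
        "--py-files", py_files,
        entry_point,  # the application script follows all options
    ]


def submit(py_files, entry_point):
    """Run spark-submit and raise CalledProcessError if it exits non-zero."""
    subprocess.check_call(build_submit_command(py_files, entry_point))
```

For example, `build_submit_command("deps.zip", "main.py")` returns the full argument list, where both file names are placeholders for the zip archive from step 1 and the driver script.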


Partial project layout:
tensorspark/ - script to build TensorFlow from source with GPU support for AWS
tensorspark/simple_websocket_*.py - simple Tornado WebSocket example
tensorspark/ - "abstract" model class that implements all the methods TensorSpark requires
tensorspark/* - fully connected models for specific datasets
tensorspark/ - convolutional model for MNIST
tensorspark/ - Spark worker code
tensorspark/ - entry point and Spark driver code
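
The split between an "abstract" model class and dataset-specific models in the layout above might look roughly like the following sketch. It is an assumption-laden illustration, not the project's code: the methods TensorSpark actually requires are not listed here, so `DistributedModel`, `LinearModel`, and every method name are hypothetical, and a plain-Python linear model stands in for a TensorFlow graph.

```python
from abc import ABC, abstractmethod


class DistributedModel(ABC):
    """Hypothetical base class; the method names are illustrative only."""

    @abstractmethod
    def get_parameters(self):
        """Return the current parameters as a flat list of floats."""

    @abstractmethod
    def set_parameters(self, parameters):
        """Overwrite the parameters with values received from elsewhere."""

    @abstractmethod
    def compute_gradients(self, features, label):
        """Return one gradient value per parameter for a single example."""

    def apply_gradients(self, gradients, learning_rate=0.1):
        """Shared update step: plain gradient descent on the flat parameters."""
        updated = [p - learning_rate * g
                   for p, g in zip(self.get_parameters(), gradients)]
        self.set_parameters(updated)


class LinearModel(DistributedModel):
    """Toy dataset-specific subclass: least-squares linear model without bias."""

    def __init__(self, size):
        self.weights = [0.0] * size

    def get_parameters(self):
        return list(self.weights)

    def set_parameters(self, parameters):
        self.weights = list(parameters)

    def compute_gradients(self, features, label):
        prediction = sum(w * x for w, x in zip(self.weights, features))
        error = prediction - label
        # Gradient of 0.5 * error**2 with respect to each weight.
        return [error * x for x in features]
```

The point of the design is that shared logic such as the update step lives in the base class once, while each dataset-specific model only supplies its own parameters and gradient computation.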

