Distributed-Deep-Learning/raytf

TensorFlow Cluster on Ray

How to use

python -m pip install raytf

from raytf.tf_cluster_driver import TensorflowCluster
# For local, single-machine debugging, start Ray first:
# import ray
# ray.init()
tf_cluster = TensorflowCluster.build(resources=
    {
        "ps": {"cores": "2", "memory": "2", "gpu": "2", "instances": "1"},
        "worker": {"cores": "2", "memory": "2", "gpu": "2", "instances": "1"},
        "chief": {"cores": "2", "memory": "2", "gpu": "2", "instances": "1"}
    },
    event_log="/tmp/opal/4"
)
# `process` is your own training function; it is not defined in this snippet.
tf_cluster.start(model_process=process, args=None)

This training code attaches to an existing permanent Ray cluster. To debug locally, call ray.init() first to start a Ray cluster on your own machine.
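
The resources mapping passed to TensorflowCluster.build repeats the same four keys for each role, so it can be convenient to generate it. The following is a minimal sketch, not part of raytf: make_resources and ROLES are hypothetical names, and the values are emitted as strings only because the example above passes strings (whether integers are also accepted is not stated here).

```python
from typing import Dict, Optional

# Role names taken from the usage example; this helper itself is hypothetical.
ROLES = ("ps", "worker", "chief")


def make_resources(
    cores: int = 2,
    memory: int = 2,
    gpu: int = 2,
    instances: Optional[Dict[str, int]] = None,
) -> Dict[str, Dict[str, str]]:
    """Build a resources mapping shaped like the one in the usage example.

    Values are converted to strings to mirror the example.
    Roles missing from `instances` default to one instance.
    """
    instances = instances or {}
    return {
        role: {
            "cores": str(cores),
            "memory": str(memory),
            "gpu": str(gpu),
            "instances": str(instances.get(role, 1)),
        }
        for role in ROLES
    }


# Example: same per-role sizes as the usage example, but with two workers.
print(make_resources(instances={"worker": 2}))
```

The result can be passed as the resources argument of TensorflowCluster.build in place of the literal dictionary.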

When you pass event_log to TensorflowCluster.build, a sidecar TensorBoard is started on one worker.
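
If the sidecar is not reachable from your machine, the same event log directory can be opened with a TensorBoard you start yourself. The sketch below only builds the command line; it assumes the event_log path is readable from where you run it and that the tensorboard CLI is installed. tensorboard_command is a hypothetical helper, not part of raytf.

```python
import shlex
from typing import List


def tensorboard_command(event_log: str, port: int = 6006) -> List[str]:
    """Return the argv for a TensorBoard process that reads `event_log`."""
    return ["tensorboard", "--logdir", event_log, "--port", str(port)]


# Print a copy-pasteable command for the path used in the usage example.
print(shlex.join(tensorboard_command("/tmp/opal/4")))
```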

How to build

[Requirement] python -m pip install twine (used for the upload in step 3)

  1. python setup.py bdist_wheel --universal
  2. python -m pip install xxxxxx.whl
  3. twine upload dist/*
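
Step 2 needs the name of the wheel produced by step 1, which bdist_wheel writes into the dist directory. As a small convenience, the sketch below looks up the most recently modified wheel there and prints the matching install command. newest_wheel is a hypothetical helper, and it assumes you run it from the repository root.

```python
from pathlib import Path
from typing import Optional


def newest_wheel(dist_dir: str = "dist") -> Optional[Path]:
    """Return the most recently modified .whl file in dist_dir, or None."""
    directory = Path(dist_dir)
    if not directory.is_dir():
        return None
    wheels = sorted(directory.glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


wheel = newest_wheel()
if wheel is None:
    print("No wheel found in dist; run step 1 first.")
else:
    print(f"python -m pip install {wheel}")
```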

Tips

  1. To work around Python module import problems on a permanent Ray cluster, this project requires Ray 1.5 or later; see the RFC ray-project/ray#14019.
  2. This project has only been tested with TensorFlow Estimator training.
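
The Ray version requirement in the first tip can be checked before submitting a job. The following is a minimal sketch using only the standard library (Python 3.8+); version_tuple and ray_is_supported are hypothetical helpers, and the parsing is deliberately simple, keeping only the leading numeric components of the version string.

```python
import re
from importlib import metadata
from typing import Tuple

MIN_RAY = (1, 5)  # minimum Ray version stated in the tips


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse leading numeric components, e.g. '1.5.0rc1' gives (1, 5, 0)."""
    numbers = []
    for part in version.split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        numbers.append(int(match.group()))
    return tuple(numbers)


def ray_is_supported(version: str) -> bool:
    """Return True if `version` is at least MIN_RAY."""
    return version_tuple(version) >= MIN_RAY


try:
    installed = metadata.version("ray")
except metadata.PackageNotFoundError:
    print("ray is not installed")
else:
    status = "ok" if ray_is_supported(installed) else "too old, need 1.5+"
    print(f"ray {installed}: {status}")
```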
