```shell
python -m pip install raytf
```
```python
from raytf.tf_cluster_driver import TensorflowCluster

# When running on a single local machine, initialize Ray first:
# ray.init()
tf_cluster = TensorflowCluster.build(
    resources={
        "ps":     {"cores": "2", "memory": "2", "gpu": "2", "instances": "1"},
        "worker": {"cores": "2", "memory": "2", "gpu": "2", "instances": "1"},
        "chief":  {"cores": "2", "memory": "2", "gpu": "2", "instances": "1"}
    },
    event_log="/tmp/opal/4"
)
# `process` is your training function; a sketch of it is shown below.
tf_cluster.start(model_process=process, args=None)
```
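For completeness, here is a minimal sketch of what the `process` function passed as `model_process` might look like. It assumes raytf prepares the distributed environment (for example `TF_CONFIG`) for each ps/worker/chief instance and simply invokes the function on every node; the Estimator, feature columns, toy dataset, and the reuse of the `event_log` path as `model_dir` are illustrative assumptions, not part of the raytf API.

```python
import tensorflow as tf

def process(args=None):
    # Hypothetical Estimator-based training function (raytf is assumed to call
    # this on every ps/worker/chief instance with the cluster already set up).
    feature_columns = [tf.feature_column.numeric_column("x", shape=[1])]

    def input_fn():
        # Toy dataset: learn y = 2x.
        features = {"x": [[1.0], [2.0], [3.0], [4.0]]}
        labels = [[2.0], [4.0], [6.0], [8.0]]
        return tf.data.Dataset.from_tensor_slices((features, labels)).repeat().batch(2)

    estimator = tf.estimator.LinearRegressor(
        feature_columns=feature_columns,
        # Writing checkpoints/summaries under the event_log path is an
        # assumption made so the sidecar TensorBoard can find them.
        model_dir="/tmp/opal/4",
    )
    tf.estimator.train_and_evaluate(
        estimator,
        tf.estimator.TrainSpec(input_fn=input_fn, max_steps=1000),
        tf.estimator.EvalSpec(input_fn=input_fn, steps=10),
    )
```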
This training code will be attached to the existing permanent Ray cluster. If you want to debug, you can use `ray.init()` to start a local Ray cluster instead.
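As a point of reference, a minimal sketch of the two ways the driver script can obtain a Ray runtime is shown below; `address="auto"` is an assumption about how the script would attach to an existing cluster and is not a raytf-specific API.

```python
import ray

# Local debugging: start a throwaway single-machine Ray runtime.
ray.init()

# Attaching to an existing permanent cluster (assumption: the standard Ray
# mechanism is used, e.g. when the script runs on a node of that cluster):
# ray.init(address="auto")
```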
When you specify `event_log` in the TensorFlow cluster builder, a sidecar TensorBoard will be started on one of the workers.
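If you also want to inspect the same events outside the sidecar, TensorBoard can be launched programmatically against the `event_log` directory, assuming that directory is accessible from where you run it; this is plain TensorBoard usage, not part of raytf.

```python
from tensorboard import program

# Point a locally launched TensorBoard at the event_log directory used above.
tb = program.TensorBoard()
tb.configure(argv=[None, "--logdir", "/tmp/opal/4"])
url = tb.launch()
print(f"TensorBoard is listening on {url}")
```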
[Requirement] `python -m pip install twine`
- `python setup.py bdist_wheel --universal` (build the wheel)
- `python -m pip install xxxxxx.whl` (install the built wheel locally to verify it)
- `twine upload dist/*` (publish the release)
- To solve the Python module import problem on a permanent Ray cluster, this project requires Ray 1.5+; see this RFC (ray-project/ray#14019).
- This project has only been tested with TensorFlow Estimator training.