I cannot reproduce this on a single node running 4 pservers and 4 trainers on different ports, but the issue does reproduce on two of our test Kubernetes clusters, both using hostNetwork mode.
Background: distributed vgg16 runs fine, but this model:
https://github.com/typhoonzero/fluid_gpu_benchmark/blob/master/text_fluid.py results in the following error the first time a trainer tries to send variables.