No ServerUI information with SharedTrainingMaster #5835

Closed
newinai opened this issue Jul 6, 2018 · 2 comments

Comments

@newinai
commented Jul 6, 2018

Hi,
I am reporting this behavior:

The ServerUI is running in a separate JVM.

Running SparkComputationGraph with a ParameterAveragingTrainingMaster sends information to the ServerUI, but after switching to a SharedTrainingMaster, no information is reported to the ServerUI.

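// jsc, config and voidConfiguration are defined elsewhere in the application.

// Training master for which nothing is reported to the ServerUI
val tm1 = new SharedTrainingMaster.Builder(voidConfiguration, 1)
  .updatesThreshold(1e-3)
  .collectTrainingStats(true)
  .rngSeed(1)
  .rddTrainingApproach(RDDTrainingApproach.Direct)
  .batchSizePerWorker(1)
  .workersPerNode(1)
  .build()

// Training master for which stats reach the ServerUI as expected
val tm2 = new ParameterAveragingTrainingMaster.Builder(1)
  .workerPrefetchNumBatches(1)
  .averagingFrequency(1)
  .batchSizePerWorker(1)
  .build()

// Pass either tm1 or tm2 here to compare the two cases
val net = new SparkComputationGraph(jsc, config, tm1)
net.setCollectTrainingStats(true)

// Route training stats to the ServerUI running in the other JVM
val remoteUIRouter = new RemoteUIStatsStorageRouter("http://localhost:9000")
net.setListeners(
  remoteUIRouter,
  Collections.singletonList(new StatsListener(null, 1))
)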
AlexDBlack added this to the DL4J/Arbiter/DataVec Next Steps milestone Jul 23, 2018
AlexDBlack self-assigned this Jul 23, 2018
AlexDBlack added a commit that referenced this issue Jul 23, 2018
…outer serializable
@AlexDBlack

Contributor

commented Jul 23, 2018

Thanks for reporting. Fixed here, will be merged soon: #5947

AlexDBlack added a commit that referenced this issue Jul 24, 2018
* #5255 Add pretrain overloads with epoch arg

* Add SingletonDataSetIterator; delete bad test

* #5238 global pooling layer no-arg constructor

* Fix failing test due to softmax backprop issue

* #5789 Conv1d config

* #5519 fix index out of bounds issue with UI

* Fix DataSet CNN mask merging + test

* Test + fixes for 4d mask MultiDataSet merging

* #5765 Fix OCNNOutputLayer issue

* Basic listener support for SharedTrainingMaster

* #5835 SharedTrainingMaster support for UI; make RemoteUIStatsStorageRouter serializable
@lock

commented Sep 21, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Sep 21, 2018