Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UIServer.detach(statsStorage) fails for trying to remove listeners twice #6859

Closed
printomi opened this issue Dec 14, 2018 · 5 comments

Comments

Projects
None yet
2 participants
@printomi
Copy link
Contributor

commented Dec 14, 2018

Use case

I would like to run multiple trainings, one after the other, but only display the current training on a UIServer. I think that detaching the StatsStorage from UIServer after each training is what I need, but I get an exception in this scenario. (Further plan: if this works, I would like to store the stats with my models to display them later.)

How to reproduce the error

Insert this line:

uiServer.detach(statsStorage);

into UIExample.java, after this line:

net.fit(trainData);

The error

This will cause the following error:

---java.lang.UnsupportedOperationException
	at java.util.concurrent.CopyOnWriteArrayList$COWIterator.remove(CopyOnWriteArrayList.java:1176)
	at org.deeplearning4j.ui.play.PlayUIServer.detach(PlayUIServer.java:324)
	at org.deeplearning4j.examples.userInterface.UIExample.main(UIExample.java:41)

Discussion

I think that the following two lines of PlayUIServer are responsible for the above error:



The two lines try to remove the same element from listeners.
See also this line in BaseCollectionStatsStorage

Should we change the code of PlayUIServer? Or is there another solution for my use case?

@printomi printomi changed the title UIServer.detach(statsStorage) tries to remove listeners twice UIServer.detach(statsStorage) fails for trying to remove listeners twice Dec 15, 2018

AlexDBlack added a commit that referenced this issue Dec 21, 2018

AlexDBlack added a commit that referenced this issue Dec 22, 2018

AlexDBlack added a commit that referenced this issue Dec 22, 2018

[WIP] Misc fixes (#6858)
* Javadoc fixes and small cleanup

* More minor fixes

* #6790 Nd4j.arange javadoc

* Clean up + MLN javadoc pass

* #6890 Nd4j.writeAsNumpy javadoc

* #6859 Fix UIServer.detach

* #6811 ElementWiseVertex single input fix

* #6804 Transforms.dot, Transforms.cross

* #6852 Throw original exception on I18N loading errors

* #6841 Fix dropout instances not being cloned when config is cloned

* #6475 INDArray.isInfinite, INDArray.isNaN

* #6856 Allow scoped out in feedForwardToActivationDetached

* Minor test fixes
@printomi

This comment has been minimized.

Copy link
Contributor Author

commented Jan 3, 2019

@AlexDBlack I was wrong with my first insight, and your fix did not fix all the problems of attaching and detaching.
The two lines actually removed element from different listeners objects. The correct way to remove StatsStorageListener from PlayUIServer.listeners is
listeners.remove(p); instead of iterator.remove(). (iterator.remove() was actually not supported, it threw UnsupportedOperationException, this was my original problem.)

    @Override
    public synchronized void detach(StatsStorage statsStorage) {
        if (statsStorage == null)
            throw new IllegalArgumentException("StatsStorage cannot be null");
        if (!statsStorageInstances.contains(statsStorage))
            return; //No op
        boolean found = false;
        for (Iterator<Pair<StatsStorage, StatsStorageListener>> iterator = listeners.iterator(); iterator.hasNext();) {
            Pair<StatsStorage, StatsStorageListener> p = iterator.next();
            if (p.getFirst() == statsStorage) { //Same object, not equality
                statsStorage.deregisterStatsStorageListener(p.getSecond());
                listeners.remove(p);
                found = true;
            }
        }
        statsStorageInstances.remove(statsStorage);
        for (UIModule uiModule : uiModules) {
            uiModule.onDetach(statsStorage);
        }
        if (found) {
            log.info("StatsStorage instance detached from UI: {}", statsStorage);
        }
    }

And there is more:
I debugged my use case and found out three more problems in TrainModule.onDetach():

  1. currentSessionID is not updated. I handled this with the following code modification:
    @Override
    public void onDetach(StatsStorage statsStorage) {
        for (String s : knownSessionIDs.keySet()) {
            if (knownSessionIDs.get(s) == statsStorage) {
                knownSessionIDs.remove(s);
                currentSessionID = null;
                getDefaultSession();
            }
        }
    }

This way my code started to work: show only current/last training by detaching last before attaching new.
It was not crucial in my current code, but I think that removing fully the previous session would be nice
2. workerIdxCount still contains the old sessionId
3. workerIdxToName still contains the old sessionId
This version of code handles these too:

    @Override
    public void onDetach(StatsStorage statsStorage) {
        for (String s : knownSessionIDs.keySet()) {
            if (knownSessionIDs.get(s) == statsStorage) {
                knownSessionIDs.remove(s);
                workerIdxCount.remove(s);
                workerIdxToName.remove(s);
                currentSessionID = null;
                getDefaultSession();
            }
        }
    }

Now, I would like to share my above described solution in a new PR. OK?

@AlexDBlack

This comment has been minimized.

Copy link
Member

commented Jan 7, 2019

Good catch. Feel free to send us a pull request with that fix, it looks good to me at first glance.

@AlexDBlack AlexDBlack reopened this Jan 7, 2019

@printomi

This comment has been minimized.

Copy link
Contributor Author

commented Jan 7, 2019

I created a new test case based on testUIMultipleSessions, but it seems that I missed to include an ND4J backend. Got this exception:

org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html

If I add nd4j-native to the pom.xml of deeplearning4j-play, I can run these tests.

But I wonder if there is a solution without adding this dependency, which is only needed by the test codes.

Shall I add this changed pom.xml also to the PR?

@printomi

This comment has been minimized.

Copy link
Contributor Author

commented Jan 7, 2019

Some background info: I use IntelliJ IDEA for running the test.
I have just installed IDEA for the first time, so I would appreciate any suggestions. I opened the maven project deeplearning4j-play from the cloned github repo folder.

See this screenshot of the my project setup and the selected/available maven profiles in IDEA. (I selected x86-64.)
dl4j-intellij_idea-maven_profiles

printomi added a commit to printomi/deeplearning4j that referenced this issue Jan 7, 2019

[WIP] Misc fixes (deeplearning4j#6858)
* Javadoc fixes and small cleanup

* More minor fixes

* deeplearning4j#6790 Nd4j.arange javadoc

* Clean up + MLN javadoc pass

* deeplearning4j#6890 Nd4j.writeAsNumpy javadoc

* deeplearning4j#6859 Fix UIServer.detach

* deeplearning4j#6811 ElementWiseVertex single input fix

* deeplearning4j#6804 Transforms.dot, Transforms.cross

* deeplearning4j#6852 Throw original exception on I18N loading errors

* deeplearning4j#6841 Fix dropout instances not being cloned when config is cloned

* deeplearning4j#6475 INDArray.isInfinite, INDArray.isNaN

* deeplearning4j#6856 Allow scoped out in feedForwardToActivationDetached

* Minor test fixes
@lock

This comment has been minimized.

Copy link

commented Feb 7, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Feb 7, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.