TSNE - BarnesHutTsne fit() error #6058

Closed
newinai opened this issue Aug 2, 2018 · 4 comments

Comments


newinai commented Aug 2, 2018

Hi,

I'm using the snapshot version of DL4J (1.0.0-SNAPSHOT) to generate a t-SNE CSV, but an error occurs.
The vocab I load is a standard word2vec-generated file with 4693 words.

Code is here: https://gist.github.com/newinai/f95d4b37660b97bf4cda092a710170c9

Error:
    18:33:03.411 [main] INFO o.deeplearning4j.plot.BarnesHutTsne - Calculating probabilities of data similarities...
    18:33:03.412 [main] INFO o.deeplearning4j.plot.BarnesHutTsne - Handled 0 records
    Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at java.util.ArrayList.rangeCheck(ArrayList.java:657)
        at java.util.ArrayList.get(ArrayList.java:433)
        at org.deeplearning4j.clustering.vptree.VPTree.buildFromData(VPTree.java:194)
        at org.deeplearning4j.plot.BarnesHutTsne.computeGaussianPerplexity(BarnesHutTsne.java:246)
        at org.deeplearning4j.plot.BarnesHutTsne.fit(BarnesHutTsne.java:527)
        at org.deeplearning4j.plot.BarnesHutTsne.fit(BarnesHutTsne.java:752)
        at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
        at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
        at scala.App$$anonfun$main$1.apply(App.scala:76)
        at scala.App$$anonfun$main$1.apply(App.scala:76)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
        at scala.App$class.main(App.scala:76)

AlexDBlack (Contributor) commented Aug 4, 2018

I can't run your code directly, as clearly I don't have your word vectors.
Is it possible for you to give me something complete that I can run to reproduce this?

I'm unable to reproduce it with something that I think is equivalent, however:

    @Test
    public void test() throws Exception {

        //STEP 1: Initialization
        val iterations = 100;

        //use double precision for the ND4J arrays created below
        DataTypeUtil.setDTypeForContext(DataBuffer.Type.DOUBLE);

        val cacheList = new ArrayList<String>(); //cacheList holds one label per word (one per row passed to t-SNE)

        int nWords  = 4693;
        for(int i=0; i<nWords; i++ ) {  //generate placeholder labels in place of the real vocabulary
            cacheList.add("word_" + i);
        }

        //STEP 3: build a dual-tree tsne to use later
        System.out.println("Build model....");
        val tsne = new BarnesHutTsne.Builder()
                .setMaxIter(iterations)
                .theta(0.5)
                .normalize(false)
                .learningRate(1000)
                .useAdaGrad(false)
                //.usePca(false)
                .build();

        //STEP 4: establish the tsne values and save them to a file
        System.out.println("Store TSNE Coordinates for Plotting....");
        val outputFile = "tsne.csv";

        System.out.println("fit");
        INDArray weights = Nd4j.rand(new int[]{nWords, 300});
        tsne.fit(weights);

        System.out.println("save");
        tsne.saveAsFile(cacheList, outputFile);
    }

newinai (Author) commented Aug 6, 2018

So, after some checks and tests, I have found the problem:

I opened the dictionary file, and the second line was the "stop word" entry with only 0.0 values (0.0 0.0 .... 0.0).
After removing this line, the code runs without any problem.

You can reproduce it easily :)
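
For anyone hitting the same error, the workaround above (dropping the all-zero line) can also be applied in code before calling `fit()`. The following is only a minimal sketch in plain Java: `ZeroRowFilter`, `isAllZero` and `nonZeroRowIndices` are hypothetical names, not part of DL4J, and it assumes the word vectors are available as a `double[][]` with one row per word.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a DL4J class): finds the rows that are safe to pass to t-SNE.
class ZeroRowFilter {

    // Returns true when every component of the vector is exactly 0.0.
    static boolean isAllZero(double[] row) {
        for (double v : row) {
            if (v != 0.0) {
                return false;
            }
        }
        return true;
    }

    // Returns the indices of the rows that contain at least one non-zero value.
    static List<Integer> nonZeroRowIndices(double[][] vectors) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < vectors.length; i++) {
            if (!isAllZero(vectors[i])) {
                keep.add(i);
            }
        }
        return keep;
    }
}
```

Selecting both the word labels and the vector rows with the same index list keeps the two aligned when the result is saved to CSV.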

AlexDBlack (Contributor) commented Aug 6, 2018

Thanks for reporting. Fixed here; the code now throws a useful exception:
#6094

Note that cosine similarity (the default distance metric) is undefined if either argument is all zeros, because the vector norm in its denominator is then zero.
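
To illustrate the point about zero vectors, here is a small plain-Java sketch of the textbook cosine similarity formula. It is not DL4J's implementation, and `CosineDemo` is a hypothetical name; it only shows why an all-zero argument has no meaningful result.

```java
// Plain-Java illustration of the standard formula; not taken from DL4J.
class CosineDemo {

    // cos(a, b) = (a . b) / (|a| * |b|), assuming a and b have the same length.
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // With an all-zero argument this evaluates 0.0 / 0.0, which is NaN for Java doubles.
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Calling `CosineDemo.cosineSimilarity(new double[]{0.0, 0.0}, new double[]{1.0, 2.0})` yields `NaN`, so any distance derived from it cannot be ordered against other distances.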


lock bot commented Sep 21, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Sep 21, 2018