Using labels in combination with length filter seems to cause problems #3

Closed
claczny opened this issue Sep 25, 2014 · 2 comments

claczny commented Sep 25, 2014

When I load a FASTA file that contains sequences shorter than the length threshold (e.g., 1 knt) together with a matching labels file, I get the following error:

2014-09-25 16:56:35,995 DEBUG [TSNERunner] (DataSetUtils.java:318) - TSNE: Fitting performed in 0.00 seconds.
2014-09-25 16:56:36,426 DEBUG [TSNERunner] (DataSetUtils.java:318) - TSNE: Wrote the 60079 x 2 data matrix successfully!
2014-09-25 16:56:36,426 DEBUG [TSNERunner] (DataSetUtils.java:318) - TSNE:
2014-09-25 16:56:36,525 DEBUG [Thread-0] (ProcessInput.java:88) - Points created.
java.lang.IndexOutOfBoundsException: Index: 60079, Size: 60079
    at java.util.ArrayList.rangeCheck(ArrayList.java:638)
    at java.util.ArrayList.get(ArrayList.java:414)
    at lcsb.vizbin.service.DataSetFactory.createDataSetFromPointFile(DataSetFactory.java:75)
    at lu.uni.lcsb.vizbin.ProcessInput$2.run(ProcessInput.java:184)
2014-09-25 16:56:36,759 DEBUG [Thread-0] (ProcessInput.java:88) - Error! Check the logs.
2014-09-25 16:56:36,761 ERROR [Thread-0] (ProcessInput.java:250) - Index: 60079, Size: 60079
java.lang.IndexOutOfBoundsException: Index: 60079, Size: 60079
    at java.util.ArrayList.rangeCheck(ArrayList.java:638)
    at java.util.ArrayList.get(ArrayList.java:414)
    at lcsb.vizbin.service.DataSetFactory.createDataSetFromPointFile(DataSetFactory.java:75)
    at lu.uni.lcsb.vizbin.ProcessInput$2.run(ProcessInput.java:184)

When I filter the sequences by length beforehand and supply a labels file that matches the filtered set, VizBin runs through.
This is probably a bug in how the labels are connected to the sequences: the labels are not properly matched to the length-selected sequences.
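
A minimal sketch of how such a mismatch could produce the exception above. It assumes (this is not confirmed from the VizBin source) that one label is read per input sequence while points exist only for sequences that pass the length filter. All names are hypothetical, and the toy data is made up for illustration.

```python
def filter_with_labels(sequences, labels, min_length):
    """Keep sequences of at least min_length together with their labels.

    Filtering (sequence, label) pairs as a unit keeps both lists aligned.
    """
    if len(sequences) != len(labels):
        raise ValueError("need exactly one label per sequence")
    kept = [(s, l) for s, l in zip(sequences, labels) if len(s) >= min_length]
    kept_sequences = [s for s, _ in kept]
    kept_labels = [l for _, l in kept]
    return kept_sequences, kept_labels


# Toy input: the second sequence is below the threshold.
sequences = ["ACGTACGT", "ACG", "TTGACCAA"]
labels = ["binA", "binB", "binC"]

points, point_labels = filter_with_labels(sequences, labels, min_length=5)

# Mismatched variant (assumed failure mode): looping over ALL labels while
# indexing the FILTERED points runs past the end of the shorter list,
# analogous to "Index: 60079, Size: 60079" in the log.
try:
    for i, label in enumerate(labels):
        _ = points[i]
    overran = False
except IndexError:
    overran = True
```

Carrying each label through the filter with its sequence avoids any index bookkeeping between the two lists.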

claczny added the bug label Sep 25, 2014
piotr-gawron added a commit that referenced this issue Apr 16, 2015

fwhelan commented Aug 25, 2015

I might be running into this issue as well:

grep ">" genus_idtxt.fna | wc -l
20282
wc -l genus_idtxt.ann
20283

less confirms that the first line of genus_idtxt.ann is "label"; each following line is the name of the bin for the corresponding sequence in the fna file.
When I load the fna and ann files as the File to visualize and the Annotation file, respectively, I do not get an error, but the annotation file is ignored (all dots are blue). I do have some contigs < 1000. When I restart VizBin and set the minimal contig length to 10, everything goes through as expected.
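
The counts above, plus the number of contigs below the threshold, can be gathered in one pass. This is a small sketch, assuming the annotation file has one header line followed by one label per sequence, as described above. The function names are hypothetical and the in-memory data stands in for the real files.

```python
import io


def fasta_lengths(handle):
    """Return the sequence length of every FASTA record, in file order."""
    lengths = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)          # start a new record
        elif line and lengths:
            lengths[-1] += len(line)   # a sequence may span several lines
    return lengths


def summarize(fasta_handle, annotation_handle, min_length=1000):
    """Count sequences, labels, and sequences shorter than min_length."""
    lengths = fasta_lengths(fasta_handle)
    labels = annotation_handle.read().splitlines()[1:]  # drop the header line
    return {
        "sequences": len(lengths),
        "labels": len(labels),
        "below_threshold": sum(1 for n in lengths if n < min_length),
    }


# Toy stand-ins for the .fna and .ann files.
fasta = io.StringIO(">c1\nACGTACGT\n>c2\nACG\n>c3\nTTGA\nCCAA\n")
annotation = io.StringIO("label\nbinA\nbinB\nbinC\n")
report = summarize(fasta, annotation, min_length=5)
```

With real data, pass open file handles for the fna and ann files instead of the StringIO objects. A non-zero "below_threshold" together with equal sequence and label counts would match the situation described here.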

claczny commented Aug 26, 2015

Thank you for confirming this 👍

It is easy to work around this issue by ensuring that all input sequences are >= 1,000 nt and that the annotation file matches the filtered set.
The size filtering integrated into VizBin then has no effect, though.
IMO this integrated filtering is often a very convenient feature ;)
Still, if the user already creates an annotation file, size-filtering the sequences before using VizBin shouldn't be much of a burden.
Nevertheless, it is inconvenient and will be corrected in a future release. :)
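
The workaround can be sketched as a pre-filtering step that writes a new FASTA file and a matching annotation file. This assumes the annotation file consists of a header line followed by one label per sequence, in FASTA order. The names parse_fasta and prefilter are hypothetical helpers, not part of VizBin.

```python
import io


def parse_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def prefilter(fasta_in, ann_in, fasta_out, ann_out, min_length=1000):
    """Copy records of at least min_length, and their labels, to the outputs.

    Returns the number of records kept.
    """
    ann_lines = ann_in.read().splitlines()
    if not ann_lines:
        raise ValueError("annotation file is empty")
    header_line, labels = ann_lines[0], ann_lines[1:]
    records = list(parse_fasta(fasta_in))
    if len(records) != len(labels):
        raise ValueError(f"{len(records)} sequences but {len(labels)} labels")
    ann_out.write(header_line + "\n")
    kept = 0
    for (header, seq), label in zip(records, labels):
        if len(seq) >= min_length:
            fasta_out.write(f"{header}\n{seq}\n")  # sequence on a single line
            ann_out.write(label + "\n")
            kept += 1
    return kept


# Toy data standing in for the real files; threshold lowered for the demo.
fasta_in = io.StringIO(">c1\nACGTAC\nGT\n>c2\nACG\n>c3\nTTGACCAA\n")
ann_in = io.StringIO("label\nbinA\nbinB\nbinC\n")
fasta_out, ann_out = io.StringIO(), io.StringIO()
kept = prefilter(fasta_in, ann_in, fasta_out, ann_out, min_length=5)
```

For real input, replace the StringIO objects with opened files and leave min_length at 1000 so that the output matches the threshold discussed in this thread.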
