Cedric LACZNY edited this page Dec 1, 2014 · 26 revisions

# Welcome to the VizBin wiki!

General information

This wiki covers the usage of VizBin as well as setting up a minimal build environment.

Using an efficient version of PCA

We integrate Principal Component Analysis (PCA) via MTJ as well as EJML. In our experience, the MTJ-based version is faster and requires less memory, so it is the default. Note that it is based on netlib-java (i.e., on so-called "native code") and therefore might require the installation of dependencies to reach its maximal performance (see also MTJ - Machine Optimised System Libraries). Without these dependencies installed, the PCA step will still run, but (much) slower, and it might consume more memory.

Mac OS X

No installation is required (see also here).

Linux

On Debian- or Ubuntu-based systems, running sudo apt-get install libatlas3-base libopenblas-base should be enough to install the necessary dependencies for executing the "native code".
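To check whether the dynamic linker can actually see such libraries after the installation, one option is to search the linker cache. The following is only a sketch: the helper name list_blas_libs and the library name patterns are assumptions derived from the package names above, not something VizBin or netlib-java prescribes.

```shell
# Hypothetical helper: read an `ldconfig -p` listing on stdin and print the
# names of cached libraries that look like BLAS, LAPACK, ATLAS or OpenBLAS.
# The name patterns are an assumption based on the packages named above.
list_blas_libs() {
    awk '$1 ~ /^lib(blas|lapack|atlas|openblas)/ { print $1 }' | sort -u
}

# Example usage on a Linux system (ldconfig may live in /sbin):
# ldconfig -p | list_blas_libs
```

An empty result suggests that no matching library is registered with the linker, in which case the PCA step would presumably fall back to the slower code path described above.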

Windows

Unfortunately, the installation is not that easy under Windows. However, all necessary dependencies can be downloaded from http://icl.cs.utk.edu/lapack-for-windows/lapack/. Detailed instructions will follow.

Out-of-memory, or how to increase Java's heap size

Should VizBin notify you about too little available memory and restart, this usually means that the Java Virtual Machine has too little heap memory available. In this case, you have to let Java know that it can use more heap memory by running:

java -jar -Xmx3g VizBin-dist.jar

in the console, where -Xmx3g tells Java that it may use up to 3 GB of heap memory. This works on Linux and Mac OS X; instructions for Windows will follow soon. There are several possible reasons for running out of memory: you might have a large number of sequences and/or very long sequences. The latter is less frequent; usually, a large number of sequences is what triggers an Out-of-memory error. In such cases, the majority of the sequences may be relatively short (i.e., much shorter than 1,000 nt). It can then help to remove the short sequences by other means (e.g., pullseq) before loading them into VizBin.
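If pullseq is not at hand, the length filtering can also be approximated with awk. This is a minimal sketch, not part of VizBin: the function name filter_fasta_by_length and the file names in the usage line are placeholders, and the input is assumed to be plain FASTA (sequences may span several lines).

```shell
# Hypothetical helper: keep only FASTA records whose sequence has at least
# MIN_LEN characters. Reads FASTA on stdin, writes FASTA on stdout.
# Each kept sequence is written on a single line.
filter_fasta_by_length() {
    awk -v min_len="$1" '
        function flush_record() {
            if (header != "" && length(seq) >= min_len) {
                print header
                print seq
            }
        }
        # A header line starts a new record: emit the previous one first.
        /^>/ { flush_record(); header = $0; seq = ""; next }
        # Any other line is sequence data: strip whitespace and append.
             { gsub(/[ \t\r]/, ""); seq = seq $0 }
        END  { flush_record() }
    '
}

# Example usage (file names are placeholders):
# filter_fasta_by_length 1000 < contigs.fa > contigs.min1000.fa
```

The filtered file can then be loaded into VizBin in place of the original one.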

Alternatively, by clicking on Show additional options, you can set a smaller k-mer size (i.e., Kmer length: 4), which may help to mitigate Out-of-memory errors. However, this is only suggested when your dataset is relatively large, i.e., contains around 50,000 to 100,000 sequences or more. For very large datasets, this may be combined with the -Xmx option above.

Please also check that you have selected "MTJ" as the PCA library and installed the dependencies needed to make use of the "native code" (see above).
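The effect of the k-mer size can be illustrated with a little arithmetic: over the alphabet A, C, G, T there are 4^k distinct k-mers, so every step down in k shrinks the number of values per sequence by a factor of four. The sketch below assumes a dense table with one 8-byte value per k-mer per sequence; this layout is an assumption made for illustration, not VizBin's documented internals, and the function names are hypothetical.

```shell
# Number of distinct k-mers over {A, C, G, T}, i.e. 4^k.
kmer_count() {
    count=1
    i=0
    while [ "$i" -lt "$1" ]; do
        count=$(( count * 4 ))
        i=$(( i + 1 ))
    done
    echo "$count"
}

# Rough size in MiB (rounded down) of a dense table holding one 8-byte
# value per k-mer per sequence. The 8 bytes are an assumption.
# Arguments: k, number of sequences.
table_mib() {
    echo $(( $(kmer_count "$1") * $2 * 8 / 1024 / 1024 ))
}

for k in 4 5 6; do
    echo "k=$k: $(kmer_count "$k") k-mers, about $(table_mib "$k" 100000) MiB for 100000 sequences"
done
```

Under these assumptions, going from k=5 to k=4 reduces such a table for 100,000 sequences from roughly 781 MiB to roughly 195 MiB.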

Save/Load embeddings

Documentation on how to save/load embeddings is provided here.

A future version will provide a pretty-much-one-click solution for loading/saving a run (see also Issue #11).

Creating screenshots

Creating an image of the actual visualization is also easy with VizBin. A tutorial is given in the screenshot documentation.