Help and Frequently Asked Questions
Table of Contents
- Common Install Issues
- Extending MAVIS
- Interpreting the Output
Common Install Issues
Outdated Versions of setuptools and/or pip
Setuptools and pip have changed a lot, so older versions often have trouble installing all the required packages. If you see an error like any of the following:
pkg_resources.VersionConflict: (pip 8.1.1 ... Requirement.parse('pip>=9.0.0'))
ImportError: 'module' object has no attribute 'check_specifier'
error: Setup script exited with Missing required dependency NumPy (Numerical Python)
it is likely that pip and/or setuptools need to be upgraded. Upgrade them before installing mavis, either to the latest versions
pip install --upgrade pip setuptools
or to exact version numbers
pip install pip==9.0.0 setuptools==36.0.0
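To see whether an environment is affected before installing, the installed versions can be compared against the versions named above. The following is a minimal sketch, not part of MAVIS: the helper names are hypothetical, and treating 9.0.0 and 36.0.0 as minimum versions is an assumption based on the pinned versions in the command above.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumed minimums, taken from the pinned versions in the pip command above.
MINIMUMS = {"pip": (9, 0, 0), "setuptools": (36, 0, 0)}


def parse_version(text, width=3):
    """Turn a version string such as '9.0.1' into a tuple of integers.

    Only the leading digits of each dot-separated piece are used, and the
    result is padded with zeros so that '21.3' compares as (21, 3, 0).
    """
    numbers = []
    for piece in text.split(".")[:width]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    numbers += [0] * (width - len(numbers))
    return tuple(numbers)


def needs_upgrade(installed, minimum):
    """Return True when the installed version is older than the minimum."""
    return parse_version(installed) < minimum


def report():
    """Print the status of each package listed in MINIMUMS."""
    for name, minimum in MINIMUMS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            print(f"{name}: not installed")
            continue
        status = "upgrade needed" if needs_upgrade(installed, minimum) else "ok"
        print(f"{name} {installed}: {status}")


if __name__ == "__main__":
    report()
```

Run it with the Python interpreter of the virtual environment you plan to install into, since each environment carries its own pip and setuptools.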
Could not find Library
This indicates that a shared library required by one of the packages is missing from your system. You will need to install the required library using your system package manager (e.g. apt-get) before continuing with the install.
Issues with Buildout and virtual environments
Buildout can have issues if you have already installed mavis in the Python virtual environment. If you see an error like the following:
TypeError: can't pickle zipimport.zipimporter objects
you will need to use a clean virtualenv and run the buildout before installing mavis with pip.
Extending MAVIS
How can I use an unsupported tool?
MAVIS supports many SV callers natively, meaning that it can read their output files directly using built-in conversion options. However, tools are constantly evolving and new ones are being created. To let users stay up to date with the latest tools, MAVIS defines a standard input file format. Users who wish to use an unsupported input file type can convert it to this format without editing the MAVIS code base.
What to do when the target tool doesn’t output all the necessary information?
Don’t worry, this is the case for a lot of tools. For this reason, MAVIS accepts unknown values for some of the required columns. These unknown or not-specified values are then expanded to all possible combinations during the clustering step. Because a single input call can be expanded into several, it is often helpful to use the tracking_id column to track your calls through the MAVIS pipeline.
See Writing A Custom Conversion Script for more details.
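As a rough illustration of what such a conversion script might look like, the sketch below turns the output of a made-up caller into a tab-delimited file with breakpoint columns and a tracking_id. Everything here is an assumption for illustration: the input columns (chrom_a, pos_a, chrom_b, pos_b, call_id) belong to a hypothetical tool, and the output column names and the "?" placeholder for unknown orientation should be checked against the standard input format described in the MAVIS documentation.

```python
import csv
import io

# Assumed output columns; verify against the MAVIS standard input format.
MAVIS_COLUMNS = [
    "break1_chromosome", "break1_position_start", "break1_position_end",
    "break2_chromosome", "break2_position_start", "break2_position_end",
    "break1_orientation", "break2_orientation",
    "tracking_id",
]
UNKNOWN = "?"  # assumed placeholder for a value the tool does not report


def convert_row(row):
    """Map one row of the hypothetical tool's output to the output columns."""
    return {
        "break1_chromosome": row["chrom_a"],
        "break1_position_start": row["pos_a"],
        "break1_position_end": row["pos_a"],  # exact position: start == end
        "break2_chromosome": row["chrom_b"],
        "break2_position_start": row["pos_b"],
        "break2_position_end": row["pos_b"],
        "break1_orientation": UNKNOWN,  # not reported by the hypothetical tool
        "break2_orientation": UNKNOWN,
        "tracking_id": row["call_id"],  # keeps the caller's own identifier
    }


def convert(in_handle, out_handle):
    """Read tab-delimited tool output and write the converted table."""
    reader = csv.DictReader(in_handle, delimiter="\t")
    writer = csv.DictWriter(
        out_handle, fieldnames=MAVIS_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(convert_row(row))


if __name__ == "__main__":
    example = io.StringIO("chrom_a\tpos_a\tchrom_b\tpos_b\tcall_id\n1\t100\t2\t200\tsv1\n")
    result = io.StringIO()
    convert(example, result)
    print(result.getvalue())
```

Carrying the caller's identifier into tracking_id is what makes it possible to find a call again after the unknown values have been expanded.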
How to build the reference annotations input file?
Instructions on generating annotations from Ensembl can be found on the reference annotations page. It is also possible to use other sources/databases for the annotations, but it is left up to the user to convert them to the expected format. See an example here.
Interpreting the Output
How can I track my SV calls?
The MAVIS pipeline has many steps, and calls may be collapsed or expanded throughout the process. To ensure you can trace your original calls through to the final output, MAVIS assigns UUID identifiers at the clustering stage. These are mapped to your original calls through the assignment mapping file, which is output during clustering.
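The lookup this enables can be sketched as follows. The layout used here is an assumption for illustration only: a tab-delimited table with a tracking_id column and a cluster_ids column holding ";"-separated identifiers. Check the header of the mapping file written by your MAVIS version before relying on these names.

```python
import csv
import io


def clusters_by_tracking_id(handle):
    """Build a dict from each original tracking id to its cluster ids.

    Assumes a tab-delimited table with 'tracking_id' and 'cluster_ids'
    columns, where cluster ids are separated by ';' (hypothetical layout).
    """
    mapping = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        cluster_ids = [item for item in row["cluster_ids"].split(";") if item]
        mapping.setdefault(row["tracking_id"], []).extend(cluster_ids)
    return mapping


if __name__ == "__main__":
    # Made-up content standing in for a real mapping file.
    example = io.StringIO("tracking_id\tcluster_ids\ncallA\tc1;c2\ncallB\tc2\n")
    for tracking_id, cluster_ids in clusters_by_tracking_id(example).items():
        print(tracking_id, "->", ", ".join(cluster_ids))
```

In this made-up example, callA was expanded into two clusters, and callA and callB were both collapsed into c2, which is why the mapping is many-to-many.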
Why Does My Event Have 0 Spanning Reads?
When looking at the evidence columns of the MAVIS output files it is important to look first at the call_method column. Some evidence types do not apply to certain call methods so they will always be 0 or None. For example, if a small indel is called by contig, we would first look at the contig_remapped_reads column. This is the evidence that was used in determining whether or not to include the call in the output file.
If the break1_split_reads column is 0 and the call method is not split reads, this does not mean the call has low evidence.
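One way to apply this rule when scanning the output is to pick the evidence columns to read from the call_method value. The sketch below is illustrative only: the call_method values ("contig", "split reads") and the break2_split_reads column are assumptions, so compare them with the values and headers in your own output files.

```python
# Assumed mapping from call method to the evidence columns that apply to it.
EVIDENCE_BY_CALL_METHOD = {
    "contig": ["contig_remapped_reads"],
    "split reads": ["break1_split_reads", "break2_split_reads"],
}


def relevant_evidence(row):
    """Return only the evidence columns that apply to this row's call method.

    Rows with a call method missing from the mapping yield an empty dict,
    which signals that the mapping needs to be extended.
    """
    columns = EVIDENCE_BY_CALL_METHOD.get(row["call_method"], [])
    return {column: row.get(column) for column in columns}


if __name__ == "__main__":
    row = {
        "call_method": "contig",
        "contig_remapped_reads": "12",
        "break1_split_reads": "0",
    }
    # The zero in break1_split_reads is ignored for a contig call.
    print(relevant_evidence(row))
```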
I See Split Reads in IGV, Why Does MAVIS Call 0 Split Reads?
If the event was called by contig, for example, the breakpoint positions will be based on the alignment of the contig. Only split reads which exactly match this breakpoint will be counted as split read evidence.
If the event is not an exact breakpoint call, only flanking evidence will be given.
Why is the Breakpoint Called by MAVIS Different From What I See in IGV?
MAVIS normalizes the read alignments before calling events. This is especially important in repeat regions. Aligners like bwa mem align deletions to the start of a repeat span, whereas MAVIS follows the HGVS standard and aligns deletions to the end of a repeat span.
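The effect of this kind of normalization can be shown on a small string. The function below is a generic sketch of shifting a deletion to the end of a repeat, not the MAVIS implementation; the function name and the 0-based coordinates are choices made for this example.

```python
def shift_deletion_right(ref, start, length):
    """Move a deletion as far right as possible without changing the result.

    The deletion removes ref[start:start + length] (0-based). It can slide
    one base to the right whenever the base entering the deleted window on
    the right equals the base leaving it on the left.
    """
    end = start + length
    while end < len(ref) and ref[start] == ref[end]:
        start += 1
        end += 1
    return start


def apply_deletion(ref, start, length):
    """Return the sequence that remains after the deletion."""
    return ref[:start] + ref[start + length:]


if __name__ == "__main__":
    ref = "GGATATATCC"  # an AT repeat flanked by GG and CC
    left = 2  # deletion of the first AT copy, at the start of the repeat
    right = shift_deletion_right(ref, left, 2)
    print(left, right)
    # Both placements describe the same event: the remaining sequence matches.
    print(apply_deletion(ref, left, 2) == apply_deletion(ref, right, 2))
```

Because both placements leave the same sequence behind, a viewer showing the raw alignment and a caller reporting the normalized position can disagree on coordinates while describing the same deletion.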
Does the aligner affect the results?
Although MAVIS attempts to standardize alignments, there will still be some differences in the coordinates of the final call set depending on the aligner used to align putative contigs. The aligner used on the input bam will have a more significant impact, as it affects the reads collected as well as the coordinates of all non-contig calls.
Can I run MAVIS locally?
Although the default pipeline builds scripts to submit to a compute cluster, the same scripts can also be run locally. They should be executed in the same order as the main submit script would submit them: validation, annotation, pairing, then summary. For example, if your MAVIS output looked something like this
.
|-- lib/
|   |-- validate/.../submit.sh
|   `-- annotate/.../submit.sh
|-- pairing/submit.sh
`-- summary/submit.sh
you would run them in the following order
bash ./lib/validate/.../submit.sh; \
bash ./lib/annotate/.../submit.sh; \
bash ./pairing/submit.sh; \
bash ./summary/submit.sh
As MAVIS splits clusters into multiple jobs by default, there may be many scripts to run. If you want to avoid this (and your machine has sufficient memory), then set MAVIS_MAX_FILES=1 to restrict the number of jobs/cluster files to 1.
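When there are many scripts, the ordering can be automated. The following is a minimal sketch rather than a MAVIS feature: it sorts a list of script paths by pipeline stage, using the directory names from the layout above, and runs them one after another. The helper names are hypothetical, and you supply the list of paths yourself.

```python
import subprocess
from pathlib import PurePosixPath

# Stage order as described above: validation, annotation, pairing, summary.
STAGE_ORDER = ["validate", "annotate", "pairing", "summary"]


def stage_of(path):
    """Return the position of the stage whose name appears in the path."""
    parts = PurePosixPath(path).parts
    for index, stage in enumerate(STAGE_ORDER):
        if stage in parts:
            return index
    raise ValueError(f"no known stage directory in path: {path}")


def order_scripts(paths):
    """Sort script paths by stage, then by path for a stable order."""
    return sorted(paths, key=lambda path: (stage_of(path), path))


def run_all(paths):
    """Run each script with bash in stage order, stopping at the first failure."""
    for script in order_scripts(paths):
        subprocess.run(["bash", script], check=True)
```

Unlike the `;`-separated shell commands, `check=True` stops at the first failing script, which avoids running later stages on incomplete output. Call `run_all` with the list of submit.sh paths found in your own output directory.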