STAR-Fusion Requires Missing File in CTAT Resource #15
Hi The various index files are very large (including the STAR index), and some best, ~b On Wed, Oct 12, 2016 at 4:00 AM, DarioS notifications@github.com wrote:
Brian J. Haas |
The first step of preparation is creating indices for STAR with
whereas your preparation script has a read length option (default 100) and passes it unmodified to STAR. Perhaps
Although distributing the large STAR index would cause the download to become too large, the indexes of the BLAST pairs are only 42 MB. The BLAST result, which is already included in the download, is 174 MB, so including those indexes would only increase the download modestly. |
I agree that the .idx file should be easily distributable. Note, it only The .idx file is generated using DB_file (berkeley DB) and this is highly cheers, ~b On Thu, Oct 13, 2016 at 4:00 AM, DarioS notifications@github.com wrote:
Brian J. Haas |
I see. Is it best to set |
probably. I'll check with Alex about this again. The default setting On Thu, Oct 13, 2016 at 7:00 AM, DarioS notifications@github.com wrote:
Brian J. Haas |
Brian, I asked Alex about this years ago, and your intuition is correct.
https://groups.google.com/forum/#!searchin/rna-star/blachly%7Csort:relevance/rna-star/h9oh10UlvhI/WqaVMukuGDMJ
Bottom line: if you want to have only one index, then make the sjdbOverhang one less than your longest expected read length. There is only a minor speed penalty for this, whereas if it is too short (e.g., 49 when you have 75 or 100 nt reads) you face the prospect of false negatives (missed mappings). So, if you are going to distribute an index or build only once, I would recommend 149 or 150 (in this era of 2x150 RNA-seq; note that in the thread Alex said sjdb 100 was fine).
James Blachly
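The rule of thumb above (sjdbOverhang equal to the longest expected read length minus one) can be sketched as a small shell helper. The function name `sjdb_overhang` is hypothetical and is not part of STAR or STAR-Fusion; only the `--sjdbOverhang` option itself belongs to STAR.

```shell
# Hypothetical helper: print the --sjdbOverhang value for a given
# longest expected read length (read length minus one).
sjdb_overhang() {
    max_read_len=$1
    # Reject empty or non-numeric input.
    case "$max_read_len" in
        ''|*[!0-9]*)
            echo "usage: sjdb_overhang <max_read_length>" >&2
            return 1
            ;;
    esac
    # A read must be at least 2 nt long for the overhang to be positive.
    if [ "$max_read_len" -lt 2 ]; then
        echo "read length must be at least 2" >&2
        return 1
    fi
    echo $((max_read_len - 1))
}
```

For 2x150 reads, `sjdb_overhang 150` prints 149, which could then be passed as `--sjdbOverhang "$(sjdb_overhang 150)"` when generating the STAR index.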
|
Thanks, James! That was the discussion that I was very vaguely remembering. Much appreciated! ~brian
Brian J. Haas |
Hi, It starts by generating the star index files which I have done already once for getting star alignments and it stops there. Thanks, |
Hi,
Are you using the 'source_data' that we provide for building the repo, or
are you using your own?
Also, are you using the very latest software release?
Finally, are there error messages being generated from the failed process?
best,
~b
…On Wed, Mar 7, 2018 at 4:20 PM, arnavaz ***@***.***> wrote:
Hi,
When I follow the instructions on fusion filter wiki page for making the
index files and run the command it takes forever to run and it eventually
crashes bc of memory. The command I am running is:/
/FusionFilter-master/prep_genome_lib.pl --genome_fa $REF --gtf $GTF
--pfam_db PFAM.domtblout.dat.gz --CPU 10
It starts by generating the star index files which I have done already
once for getting star alignments and it stops there.
Can you tell me what is wrong with what I am doing?
Thanks,
Arna
--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
|
Thanks for your prompt response. And I ran the command one more time increasing memory to 60G but still got the following error:
-found STAR at /cluster/tools/software/star/2.4.2a/STAR
-found makeblastdb at /cluster/tools/software/blast+/2.2.30/bin/makeblastdb
-found blastn at /cluster/tools/software/blast+/2.2.30/bin/blastn
-- Skipping CMD: cp /cluster/tools/data/genomes/human/hg38/iGenomes/Sequence/WholeGenomeFasta/genome.fa /cluster/projects/pughlab/projects/Neurofibroma/CTAT_resource_lib/GRCh38_v27_CTAT_lib_Feb092018/ctat_genome_lib_build_dir/ref_genome.fa, checkpoint [/cluster/projects/pughlab/projects/Neurofibroma/CTAT_resource_lib/GRCh38_v27_CTAT_lib_Feb092018/ctat_genome_lib_build_dir/__chkpts/ref_genome.fa.ok] exists.
|
Additionally, in the above thread you indicate that this should take a minute or so, but it takes a few hours to get the above error message. |
Looks like you're missing another module here:
Can't locate PerlIO/gzip.pm in @INC (you may need
to install the PerlIO::gzip module)
As you can see, though, the system will resume where it left off.
If you're running the tutorial data set through, then it should only take a
small amount of time as indicated. If you're running this with the full
source data (for use beyond the tutorial data), then it will take a
while... hours, easily, depending on your setup.
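The resume behaviour described above (a step is skipped when its checkpoint file already exists) can be illustrated with a small shell wrapper. This is only a sketch of the general pattern, not the code used by the preparation script; the function name `run_with_checkpoint` is hypothetical.

```shell
# Illustrative checkpoint wrapper (not the pipeline's actual code):
# run a command once and record success in a marker file, so that a
# rerun skips steps that already finished.
run_with_checkpoint() {
    checkpoint=$1
    shift
    if [ -e "$checkpoint" ]; then
        echo "-- Skipping CMD: $*, checkpoint [$checkpoint] exists."
        return 0
    fi
    # Run the step; only mark it done if it exits successfully.
    if "$@"; then
        mkdir -p "$(dirname "$checkpoint")"
        : > "$checkpoint"
    else
        status=$?
        echo "CMD failed with exit code $status: $*" >&2
        return "$status"
    fi
}
```

Because a failed step leaves no marker behind, rerunning after fixing the cause (for example, installing a missing Perl module) repeats only the step that failed and those after it.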
|
In case it's helpful, this is what you need wrt perl modules:
A typical perl module installation may involve:
perl -MCPAN -e shell
install DB_File
install URI::Escape
install Set::IntervalTree
install Carp::Assert
install JSON::XS
install PerlIO::gzip
If there are others that crop up, I can extend the list.
best,
~b
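Before starting a long build, it may help to check up front which of the modules listed above are missing, rather than discovering them one crash at a time. A minimal sketch, assuming `perl` is on the PATH; the function name `missing_perl_modules` is hypothetical.

```shell
# Sketch: print each Perl module that fails to load, one per line.
# Empty output means every listed module is available.
missing_perl_modules() {
    for mod in "$@"; do
        # `perl -MModule -e 1` exits non-zero if the module cannot be loaded.
        if ! perl "-M$mod" -e 1 >/dev/null 2>&1; then
            echo "$mod"
        fi
    done
}
```

For the list above, this could be called as `missing_perl_modules DB_File URI::Escape Set::IntervalTree Carp::Assert JSON::XS PerlIO::gzip`, and any names it prints are the ones still to install.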
|
Thanks for your help! I will install those modules. |
I ran the form of STAR-Fusion that uses existing STAR outputs. It runs for a couple of minutes, outputs candidate fusions, then terminates showing an error looking for an index file:
Indeed, there is no index provided by the downloaded resource:
Perhaps the resources should be updated to contain all necessary files for fusion finding.
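A quick way to confirm which files a downloaded resource directory lacks, before running fusion finding, is sketched below. The function name `missing_files` is hypothetical, and the file names to check must be supplied by the caller; the name of the missing index is not preserved in this thread, so none is assumed here.

```shell
# Sketch: list which of the expected files are absent from a resource
# directory. File names are supplied by the caller; none are assumed.
missing_files() {
    lib_dir=$1
    shift
    for name in "$@"; do
        if [ ! -e "$lib_dir/$name" ]; then
            echo "$name"
        fi
    done
}
```

It would be called with the resource directory followed by the file names the tool reports looking for; empty output means all of them are present.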