Cufflinks incompatibility to STAR output #1012

Closed
schelhorn opened this issue Sep 14, 2015 · 9 comments

@schelhorn

Cufflinks 2.2.1 has an acknowledged bug that makes it incompatible with certain standard-compliant SAM inputs. Specifically, when bias modeling is turned on (as it should be), cufflinks can fail with the critical error "Error (GFaSeqGet): subsequence cannot be larger than 16569". The error results from cufflinks being unable to handle soft-clipped bases that extend past the end of a chromosome, a situation the SAM spec does not forbid. STAR sometimes generates such alignments when a read best fits the chromosomal end (this can happen in general, and specifically in cancer genomes due to chromosomal rearrangements). There are two ways to post-process STAR BAMs to cut off the ends and make cufflinks accept the input file. Still, I would like to avoid generating yet another copy of a BAM file just for cufflinks in bcbio (since other quantitation methods deal with soft-clipped bases just fine), so I am escalating the bug to cufflinks and tracking its resolution here.
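
To make the failing condition concrete, here is a minimal sketch that flags alignments whose soft-clipped bases reach past either end of the reference. It is not taken from cufflinks, STAR, or bcbio; the function names (`unclipped_span`, `overhangs_reference`, `find_overhanging_reads`) are hypothetical, and it reads plain SAM text rather than BAM. The example uses 16569 as a reference length only because that number appears in the error message; treating it as the length of the affected chromosome is an assumption.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def unclipped_span(pos, cigar):
    """Return the 1-based (start, end) reference span of an alignment,
    extended by its leading and trailing soft clips."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Operations that consume reference bases.
    consumed = sum(n for n, op in ops if op in "MDN=X")
    start, end = pos, pos + consumed - 1
    # Hard clips carry no sequence, so ignore them when looking for soft clips.
    ops = [(n, op) for n, op in ops if op != "H"]
    if ops and ops[0][1] == "S":
        start -= ops[0][0]
    if len(ops) > 1 and ops[-1][1] == "S":
        end += ops[-1][0]
    return start, end


def overhangs_reference(pos, cigar, reference_length):
    """True if soft-clipped bases extend before base 1 or past the reference end."""
    start, end = unclipped_span(pos, cigar)
    return start < 1 or end > reference_length


def find_overhanging_reads(sam_lines):
    """Yield names of reads in SAM text whose soft clips overhang the reference."""
    lengths = {}
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            if fields[0] == "@SQ":
                tags = dict(f.split(":", 1) for f in fields[1:])
                lengths[tags["SN"]] = int(tags["LN"])
            continue
        name, rname, pos, cigar = fields[0], fields[2], int(fields[3]), fields[5]
        if rname == "*" or cigar == "*":
            continue  # unmapped read, nothing to check
        if overhangs_reference(pos, cigar, lengths[rname]):
            yield name
```

For example, an alignment starting at position 16500 with CIGAR 69M7S on a 16569-base reference ends its aligned part at 16568, but the 7 soft-clipped bases reach 16575, so it would be flagged.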

@schelhorn
Author

This issue has likely been fixed in a development release upstream. I will test it with the next data set I run. I expect the new cufflinks to be incorporated into the biolinux distribution mechanism at some point, which will resolve this issue completely.

@roryk
Collaborator

roryk commented Sep 29, 2015

Thanks @schelhorn for keeping an eye on this.

@csardas

csardas commented Jul 11, 2016

It seems the bug was fixed and committed in the development version of cufflinks. Is there any way to replace the current version in bcbio? Or could bcbio embed the awk script mentioned above, generating a special BAM file for cufflinks only while other tools use the original BAM file?

@roryk
Collaborator

roryk commented Jul 11, 2016

Hi @csardas,

Are you seeing this same issue?

@csardas

csardas commented Jul 11, 2016

Yes, I got the same error message, "Error (GFaSeqGet): subsequence cannot be larger than 16569", in some cases (not every case).

@roryk
Collaborator

roryk commented Jul 12, 2016

OK, I think there is a flag we can set in STAR that will make it produce Cufflinks-compatible files. What are you using Cufflinks for? We've been thinking about deprecating it for a while.

@csardas

csardas commented Jul 12, 2016

I want to enable the transcript assembly function in the RNA-seq pipeline to find novel transcripts.

@roryk
Collaborator

roryk commented Jul 12, 2016

Hi @csardas,

Gotcha. There has been some buzz on the Cufflinks issue list about them releasing a fixed version soon. In the meantime, if you set the transcript_assembler to stringtie, that should work.

@csardas

csardas commented Jul 12, 2016

OK, I will try it. Thanks for the response.

