trouble running the pipeline for a bacterial species #99
Comments
Hm. What's causing your specific error is this:
I'm not sure what your GTF file looks like, but this must work (for
|
Thank you Charlotte, and sorry for the late reply; I was on holiday last week. I got the GTF from NCBI (tried both a GenBank and a RefSeq assembly). The GTF looks reasonably standard, but I guess it isn't. I will look into the details this week and report any solutions I find. We are interested in using the pipeline with plant genomes as well (not all of them are in Ensembl), but if the adjustments turn out to be too much trouble, we will probably look for other options. |
So the issue indeed seems to be that the GTF file can not be read with |
I haven’t seen this one yet. Let me know if there are ways I can help. |
I've now tried an Ensembl genome annotation of a parent strain (with more genes), available in GFF3, which I converted to GTF before running. This has led me to a completely different error: Verifying validity of the information in the database: which I currently interpret as being caused by a loss of gene-level data when going from GFF3 to GTF. Here's the GTF file, with only exon and CDS data: So, no solution yet, but working on it. |
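One quick way to confirm whether gene-level records survived a GFF3-to-GTF conversion is to count the feature types in column 3 of the GTF. The following is only a minimal sketch in plain Python, not part of ARMOR or tximeta; the function name and the example lines are hypothetical.

```python
from collections import Counter


def count_feature_types(gtf_lines):
    """Count the values found in column 3 (feature type) of GTF lines."""
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A valid GTF record has 9 tab-separated columns.
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


# Hypothetical example lines (not taken from the file discussed here).
example = [
    "#!genome-build hypothetical\n",
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chr1\tsrc\tCDS\t1\t99\t.\t+\t0\tgene_id "g1"; transcript_id "t1";\n',
    'chr1\tsrc\texon\t200\t300\t.\t-\t.\tgene_id "g2"; transcript_id "t2";\n',
]
feature_counts = count_feature_types(example)
print(feature_counts)
```

For a real file, passing an open file handle (for example `count_feature_types(open(path))`, with `path` being your own GTF) should work the same way. If `gene` and `transcript` never show up among the keys, the conversion dropped those records.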
Did you requantify the data with the new annotation? It looks like the transcript IDs are not matching. |
As for the original GTF file, somehow all the transcripts with annotated CDSs are called "unknown_transcript_1" (regardless of the associated gene), which seems a bit strange. |
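A check along these lines can be sketched in a few lines of Python: collect the `gene_id` values seen for each `transcript_id` and report any transcript ID that is attached to more than one gene. This is an illustrative sketch only; the helper names and the example records are hypothetical, and the attribute parsing assumes the usual `key "value";` GTF layout.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(attr_field):
    """Turn the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(attr_field))


def transcripts_shared_across_genes(gtf_lines):
    """Return transcript IDs that appear under more than one gene ID."""
    genes_by_tx = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = parse_attributes(fields[8])
        tx = attrs.get("transcript_id")
        gene = attrs.get("gene_id")
        if tx and gene:
            genes_by_tx[tx].add(gene)
    return {tx: sorted(genes) for tx, genes in genes_by_tx.items() if len(genes) > 1}


# Hypothetical records: two genes sharing one transcript ID, one gene with its own.
example = [
    'chr1\tsrc\tCDS\t1\t99\t.\t+\t0\tgene_id "g1"; transcript_id "unknown_transcript_1";\n',
    'chr1\tsrc\tCDS\t200\t299\t.\t+\t0\tgene_id "g2"; transcript_id "unknown_transcript_1";\n',
    'chr1\tsrc\tCDS\t400\t499\t.\t-\t0\tgene_id "g3"; transcript_id "t3";\n',
]
shared = transcripts_shared_across_genes(example)
print(shared)
```

An empty result would mean every transcript ID maps to a single gene; anything else points at IDs that cannot be used to tell transcripts apart.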
Yes, the original GTF really is strange. And yes, I did requantify: I always delete all outputs and the Salmon and STAR indexes, then run the pipeline on a small part of my dataset. My understanding is that this forces requantification, so I don't think this is the reason the transcript IDs are not matching. |
Could you post just the first lines of one of the quant.sf files from Salmon? |
Name Length EffectiveLength TPM NumReads |
Right. So the transcript IDs in the GTF file are of the form |
I see, here's where my utter lack of understanding of how GTF files work let me down. So, if I just change the form of the transcript IDs in the GTF file, it should work. I will try later today, thanks! |
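Before rerunning the whole pipeline, the match between the two ID sets can be checked directly. The sketch below compares the `Name` column of a Salmon `quant.sf` file with the `transcript_id` values in a GTF; it is a rough illustration under stated assumptions, not ARMOR code. The function names, the example IDs, and the `transcript:` prefix are hypothetical and are not the actual IDs from this thread.

```python
import re

TX_ID_RE = re.compile(r'transcript_id "([^"]*)"')


def gtf_transcript_ids(gtf_lines):
    """Collect all transcript_id values found in GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        match = TX_ID_RE.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def quant_names(quant_lines):
    """Collect the first column (Name) of a tab-separated quant.sf file."""
    rows = iter(quant_lines)
    next(rows, None)  # skip the header line
    return {row.split("\t")[0] for row in rows if row.strip()}


def compare_ids(gtf_lines, quant_lines):
    """Report which IDs are shared and which exist on only one side."""
    gtf_ids = gtf_transcript_ids(gtf_lines)
    names = quant_names(quant_lines)
    return {
        "shared": sorted(gtf_ids & names),
        "only_in_gtf": sorted(gtf_ids - names),
        "only_in_quant": sorted(names - gtf_ids),
    }


# Hypothetical data: one ID carries a prefix in the GTF but not in quant.sf.
gtf_example = [
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "transcript:tx1";\n',
    'chr1\tsrc\texon\t200\t300\t.\t+\t.\tgene_id "g2"; transcript_id "tx2";\n',
]
quant_example = [
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n",
    "tx1\t100\t50.0\t10.0\t5.0\n",
    "tx2\t101\t51.0\t20.0\t8.0\n",
]
report = compare_ids(gtf_example, quant_example)
print(report)
```

If `only_in_gtf` and `only_in_quant` are both large and look like the same IDs written differently, the annotation and the quantification are using different naming schemes for the same transcripts.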
Ran the whole thing overnight, and the Salmon-to-edgeR part is solved now; only the Shiny part still fails. Since this is another topic, I'll try to figure it out myself, and if that does not work, I will open a new issue. I have a much better idea now of what to look out for when using ARMOR for non-model organisms, thanks for all your help!!! |
Hi,
after successfully running the pipeline for human and zebrafish, I've encountered some problems with a bacterial species. After going through (hopefully all) issues here that were already encountered and solved on github, I can't really find a solution for this one, so maybe you can help?
The error occurs at the tximeta step; here's the error output. Apparently there is a problem with .makesplicing. Any ideas on what may be causing this?
best
Anze