All transcripts/genes ineligible? #64
Hi Brett!
Right off the bat, I can see that the transcript IDs in the data don't match the IDs in the annotation. RATs should have given you a warning or error about that. Did you force it through? Your
As stated in its own section of the input vignette, the set of transcript IDs in the annotation look-up table must match exactly the set of transcript IDs in the data, so that RATs knows how to group transcripts. If the set of IDs is different between the two, RATs can't match the data to the annotation. So you get 0 counts for all the IDs in the annotation, while IDs in the data not found in the annotation lack grouping information and are simply ignored and lost.
You'll have to either clean up the IDs in the data (before importing it, otherwise it seems
I hope this helps, good luck!
Kimon
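The exact-match requirement above can be checked by hand before running RATs. The following is a minimal sketch in Python, not part of RATs; the function name `compare_id_sets` and the sample IDs are hypothetical, and the long pipe-delimited ID is only an assumed example of what a mismatch could look like:

```python
def compare_id_sets(annotation_ids, data_ids):
    """Return (IDs only in the annotation, IDs only in the data), each sorted."""
    annot = set(annotation_ids)
    data = set(data_ids)
    return sorted(annot - data), sorted(data - annot)


# Hypothetical IDs: the annotation holds plain transcript IDs,
# while one ID in the data carries extra pipe-delimited fields.
only_annot, only_data = compare_id_sets(
    ["ENST00000000001.1", "ENST00000000002.1"],
    ["ENST00000000001.1|ENSG00000000001.1|", "ENST00000000002.1"],
)
print("only in annotation:", only_annot)
print("only in data:", only_data)
```

If both lists come back empty, the two ID sets are identical; anything else points at the IDs that would be zeroed or dropped.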
No warning, I did not force it through
Yes, I thought that was suspicious too, but I guess not suspicious enough 👍
Ok great! Sorry I did not see that section and issue #40 until after posting. Really sorry about that.
So you are talking about parsing the abundance.tsv file and converting, e.g.:
to just include the ENST id:
for every row? If so, that is a pretty straightforward fix.
Thank you, I really appreciate the feedback and help
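The per-row conversion described here could look roughly like the sketch below. It assumes the long IDs are pipe-delimited with the ENST ID as the first field (as in GENCODE transcript FASTA headers) and that the table has kallisto's usual `target_id` column; the function names and sample values are hypothetical. It only rewrites the ID strings in a tab-separated table and does not touch any other kallisto output.

```python
import csv
import io


def strip_to_transcript_id(target_id):
    """Keep only the text before the first '|' (assumed to be the ENST ID)."""
    return target_id.split("|", 1)[0]


def clean_abundance_tsv(src, dst):
    """Copy a tab-separated abundance table, shortening the target_id column."""
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row["target_id"] = strip_to_transcript_id(row["target_id"])
        writer.writerow(row)


# Hypothetical one-row table standing in for a real abundance.tsv.
src = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "ENST00000000001.1|ENSG00000000001.1|GENE1|\t1000\t800\t10\t1.5\n"
)
dst = io.StringIO()
clean_abundance_tsv(src, dst)
cleaned = dst.getvalue()
print(cleaned)
```

For real files, `src` and `dst` would be file handles opened with `newline=""`, as the `csv` module recommends.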
Yep, that should do it. :) I am however concerned that you did not get a warning about this. RATs should have simply not gone through with the run at all. Maybe the ID check is not as thorough as I thought it was.
Ok I will give this a shot tonight. You can probably close this issue, but if you want I can come back with a confirmation that this was the solution once I get a chance to try this out.
On second thought, no, a bit more complicated than that. You'll have to go into the abundance.h5, not the .tsv
The tsv does not contain the bootstraps, only the averages.
For the amount of time it could take to figure out the h5 format it may be simpler to re-run kallisto instead, ensuring the IDs in the gtf are clean from the start. Or to create the lookup table from the same gtf used for kallisto, keeping the long format throughout.
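As a rough illustration of the second option, a transcript-to-gene lookup table can be pulled from the `transcript` records of a GTF. This is a sketch under assumptions, not RATs' own annotation code: the function name `transcript_to_gene` and the sample record are hypothetical, and whether the resulting IDs match the quantification IDs depends on the FASTA headers used for the kallisto index, so that still needs checking.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR = re.compile(r'(\w+) "([^"]*)"')


def transcript_to_gene(gtf_lines):
    """Collect (transcript_id, gene_id) pairs from 'transcript' records."""
    pairs = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript-level records carry one ID pair each
        attrs = dict(ATTR.findall(fields[8]))
        if "transcript_id" in attrs and "gene_id" in attrs:
            pairs.append((attrs["transcript_id"], attrs["gene_id"]))
    return pairs


# Hypothetical GTF content: one comment, one gene record, one transcript record.
sample = [
    "##description: example\n",
    'chr1\tSRC\tgene\t1\t100\t.\t+\t.\tgene_id "ENSG00000000001.1";\n',
    'chr1\tSRC\ttranscript\t1\t100\t.\t+\t.\t'
    'gene_id "ENSG00000000001.1"; transcript_id "ENST00000000001.1";\n',
]
lookup = transcript_to_gene(sample)
print(lookup)
```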
I ended up just changing the h5 files. It wasn't too bad and I learned more about the format. I was able to generate some plots after that with RATS and things look good so far. Thank you again. I am working with a gene that has many different isoforms, so this tool is very interesting to me. Thank you for making RATS and maintaining it. Go ahead and close this if you want.
Great! Glad it is working for you now. Thank you for using RATs.
Hi,
I am having some issues, but would like to try and give as much information as possible about my workflow.
I am doing pseudo alignment with kallisto. I get the transcriptome from GENCODE by following the "transcript sequences" (CHR regions) link under fasta files heading in https://www.gencodegenes.org/human/:
ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.transcripts.fa.gz
I build the rna index for kallisto like so:
./kallisto/kallisto index -i ./transcriptome/rna_index_gencode ./transcriptome/gencode.v29.transcripts.fa.gz
I then run kallisto to do pseudo alignment on paired end fastq files similar to the code shown below:
The files are actually a subset from this dataset: https://www.ebi.ac.uk/ena/data/view/PRJNA347513
I then run a little script with RATS to try things out. I use an annotation file from GENCODE by following the "Comprehensive gene annotation" (CHR regions) link under GTF/GFF3 heading in https://www.gencodegenes.org/human/:
ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz
and run the script:
The strange thing is that after running
print( dtu_summary(mydtu) )
I get this:
It seems strange that all of the transcripts/genes are ineligible. I'm not sure if this is a bug or just a negative result with my data.
this is what myannot looks like:
this is what mydtu looks like:
this is what mydata$boot_data_A looks like:
mydata$boot_data_B:
session info:
I have also tried running kallisto using Ensembl's cDNA
file: ftp://ftp.ensembl.org/pub/release-95/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
then running either of these annotation files, but got a similar result:
ftp://ftp.ensembl.org/pub/release-95/gtf/homo_sapiens/Homo_sapiens.GRCh38.95.chr_patch_hapl_scaff.gtf.gz
ftp://ftp.ensembl.org/pub/release-95/gtf/homo_sapiens/Homo_sapiens.GRCh38.95.gtf.gz
Sorry to bother you, and sorry if I have missed something obvious, but I am pretty interested in your method and would really like to see this work.