Sample order matters? #222
Hi @louislamont, could you run it again with the --no-temp-splicesite option?
Thanks for the reply. When running with the --no-temp-splicesite option, I get the same alignment regardless of file order. Here is the summary output from both scripts:
I am using HISAT2 as part of the StringTie lncRNA discovery pipeline... Do you have any recommendations on whether or not to include the --no-temp-splicesite flag?
Hi, I ran hisat2 with both the --no-temp-splicesite and --novel-splicesite-outfile options, and I notice that read order matters again. I use a different test design than @louislamont. I have one fastq file with 2 million single-end reads of different lengths (from 14 to 64 bp) and split it into two: a file with the top 1 million reads and a file with the bottom 1 million reads. Then I run hisat2 v2.2.1 on these three files (with the --reorder option on) and compare the output for the individual reads. The top 1 million reads map identically whether they are processed as a separate file or as the beginning of the larger library. However, the mapping of the bottom 1 million reads seems to depend on the splice sites found in the reads processed upstream of them. I compare four commands:
To give you some more numbers: if I compare the output of --no-softclip with and without --novel-splicesite-outfile, around 0.43% of the mapping annotations differ. Do you recommend using --novel-splicesite-outfile with --no-temp-splicesite? How do I report splice sites and make sure that the result does not depend on the order of the reads, and on the upstream processed reads in particular?
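For anyone who wants to repeat this kind of per-read comparison, here is a minimal sketch in Python. It assumes single-end reads and plain-text SAM output, and it compares only the flag, reference name, position, and CIGAR of each read's primary alignment. The function names are hypothetical and this is not part of HISAT2.

```python
def read_alignments(handle):
    """Map read name -> (flag, rname, pos, cigar) for primary SAM records."""
    records = {}
    for line in handle:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory fields
            continue
        flag = int(fields[1])
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800)
            continue
        records[fields[0]] = (flag, fields[2], int(fields[3]), fields[5])
    return records


def differing_reads(first, second):
    """Return sorted names of reads present in both runs that align differently."""
    shared = first.keys() & second.keys()
    return sorted(name for name in shared if first[name] != second[name])


def compare_sam_files(path_a, path_b):
    """Convenience wrapper: compare two SAM files on disk."""
    with open(path_a) as a, open(path_b) as b:
        return differing_reads(read_alignments(a), read_alignments(b))
```

Calling compare_sam_files on the SAM file from the bottom-half run and the SAM file from the full run would list the reads whose alignment changed; reads that appear in only one file are ignored, so the full library can be compared directly against either half.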
@agalitsyna Currently, there is no way to get the same result regardless of the order of the input reads when using the --novel-splicesite-outfile option.
Hi, I noticed some strange behavior when running HISAT2 (v2.1.0) on many files at once.
Some background: The data I'm working with are from 2 flow cells (fcA/B), 4 lanes each (L001/2/3/4), paired-end, so we have 16 files per sample (e.g., sample1.L001.fcA.R1.fq, sample1.L001.fcA.R2.fq ... sample1.L004.fcB.R1.fq, sample1.L004.fcB.R2.fq).
The issue: When setting up the HISAT2 pipeline, I noticed differences in the number of aligned reads depending on the order in which the fq files for a given sample were supplied to HISAT2. When I set up the script manually, the files were entered by flowcell (L001.fcA, L002.fcA, etc.). The bash script supplied the same files, but ordered alphabetically (L001.fcA, L001.fcB, etc.). Here are the two summary files that were produced. By flowcell:
By lane:
I set up two scripts that explicitly gave the same sets of reads in different order and was able to reproduce the results. When I used featureCounts to get counts from each bam file, they were pretty close to one another (only 3191/60623 genes had different counts), but it seemed odd to me that specifying the same files in two different orders would lead to differences in alignment and counts. Has anyone else noticed anything like this? Am I doing something wrong, is this a bug, or is HISAT functioning properly? Does this post even make sense?
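To make the two orderings explicit, here is a minimal sketch in Python that builds the comma-separated file lists in either order. It assumes the file naming pattern described above (sample1.L001.fcA.R1.fq and so on); the function name and its order argument are hypothetical, for illustration only.

```python
LANES = ["L001", "L002", "L003", "L004"]
FLOWCELLS = ["fcA", "fcB"]


def fastq_lists(sample, order):
    """Return (R1 list, R2 list) as comma-separated strings in the given order."""
    if order == "flowcell":
        # All lanes of fcA first, then all lanes of fcB.
        pairs = [(lane, fc) for fc in FLOWCELLS for lane in LANES]
    elif order == "lane":
        # Alphabetical by file name: both flowcells of L001, then L002, ...
        pairs = [(lane, fc) for lane in LANES for fc in FLOWCELLS]
    else:
        raise ValueError("order must be 'flowcell' or 'lane'")
    r1 = ",".join(f"{sample}.{lane}.{fc}.R1.fq" for lane, fc in pairs)
    r2 = ",".join(f"{sample}.{lane}.{fc}.R2.fq" for lane, fc in pairs)
    return r1, r2


if __name__ == "__main__":
    for order in ("flowcell", "lane"):
        r1, r2 = fastq_lists("sample1", order)
        print(order, r1, r2, sep="\n")
```

The two strings can be passed to HISAT2's -1 and -2 arguments, which accept comma-separated file lists. Both orderings contain the same 8 file pairs, so any difference in the alignment summary comes from the order alone.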
Here are the commands I ran. Samples ordered by flowcell:
Samples ordered by lane: