Reads Can Map Entirely Beyond Reference Sequence End #48
Comments
|
That's strange. We have to be able to reproduce before we can investigate, though. I know your attempt to find a minimal example wasn't successful, but can you share any example (reads+reference) where this happens? |
|
Is it possible you have multiple sequences with the same name, and that that is confusing IGV? |
DarioS
commented
Apr 26, 2017
|
No, it's recorded like that in the SAM file. What is strange about the FASTA reference file is that it contains entries with different names but identical nucleotide sequences. The alleles have only the protein-coding region (CDS) sequence recorded, but the difference between two alleles may occur biologically in the UTRs, so they end up with different allele IDs but identical nucleotide sequences. Could |
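For anyone who wants to see how many reference entries share a sequence, here is a minimal sketch in plain Python. It is not part of bowtie; the function names `read_fasta` and `identical_sequence_groups` are made up for illustration, and it assumes an ordinary FASTA file with `>` header lines.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def identical_sequence_groups(lines):
    """Return lists of entry names whose sequences are identical (case-insensitive)."""
    by_sequence = defaultdict(list)
    for name, seq in read_fasta(lines):
        by_sequence[seq.upper()].append(name)
    return [names for names in by_sequence.values() if len(names) > 1]
```

Passing an open file handle for the reference FASTA to `identical_sequence_groups` returns one list of names per group of duplicated sequences.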
|
Can you share any example (reads+reference) where this happens? |
DarioS
commented
Apr 27, 2017
|
Yes. Step 1: Download reference file from ftp://ftp.ebi.ac.uk/pub/databases/ipd/imgt/hla/hla_nuc.fasta Generate a Bowtie index for it. |
BenLangmead
assigned BenLangmead and ch4rr0 and unassigned BenLangmead
May 2, 2017
|
Hello, I am looking into this issue. The link provided is taking me to a website that requires registration. Would it be possible to share the FASTQ file on dropbox? |
DarioS
commented
Jun 6, 2017
|
I changed the permissions to make it publicly accessible. Please click the link again. |
DarioS commented Apr 24, 2017
I have a set of thousands of reference sequences with high sequence similarity (i.e. alleles of HLA genes). I notice that `bowtie` sometimes maps reads beyond the ends of a small number of reference sequences. If I make a minimal example with only one reference sequence and one pair of reads that were mapped beyond the boundary of that particular reference sequence, then `bowtie` doesn't align the read pair to the reference sequence it mistakenly did before (the read is reported as unaligned). I used the command `bowtie -v 0 -a -S indexes/IMGT-HLA/hla -1 R1.fq -2 R2.fq test.sam`. I used version 1.2 downloaded from the website which is pre-compiled for Linux.
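To list the affected alignments in a SAM file such as `test.sam`, something like the following sketch may help. It assumes the SAM header contains `@SQ` lines with `SN` and `LN` tags, and it computes each alignment's end from `POS` plus the reference-consuming CIGAR operations. The function names `reference_span` and `reads_beyond_reference_end` are hypothetical helpers, not part of bowtie or any library.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar):
    """Number of reference bases covered by a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)


def reads_beyond_reference_end(sam_lines):
    """Yield (read name, reference name, 1-based end, reference length)
    for alignments whose last base lies past the end of the reference."""
    ref_lengths = {}
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Collect reference lengths from @SQ header lines.
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.split("\t")[1:])
                ref_lengths[tags["SN"]] = int(tags["LN"])
            continue
        fields = line.split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        # Skip unaligned records (flag bit 0x4) and records without a CIGAR.
        if flag & 0x4 or rname == "*" or cigar == "*":
            continue
        end = pos + reference_span(cigar) - 1
        # References missing from the header are never reported.
        if end > ref_lengths.get(rname, float("inf")):
            yield qname, rname, end, ref_lengths[rname]
```

Iterating over `reads_beyond_reference_end(open("test.sam"))` and printing each tuple gives the read name, the reference name, the computed end coordinate and the reference length for every alignment that runs past the end.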