Genome patching effect on alignment #673
I'm creating this issue not to report a problem, but to get your opinion on a question I could not find any guidance on by searching. We use the
Additionally, you specify the GENCODE gene set as one of the gene models you recommend. If you go to the most current release of the GENCODE gene model (v30 at the time of writing), you will see that the gene set is based on
Keeping in mind that the coordinates remain backward compatible between patches but the underlying sequence may change, do you have thoughts on what effect (if any) might occur because of the apparent mismatch between the
As far as I can tell, the community is not overly concerned:
But before moving forward, I wanted to get your thoughts. We're considering using the latest GENCODE release for our pipeline (v30), so it's quite a bit more divergent from the original GRCh38 release than versions 22 and 24.
If the patches substantially change the sequence in the exons of expressed genes, both gene expression and splice junction detection may be affected. This is probably not easy to quantify; the simplest way may be to map a few samples to both p0 and p12 and compare the results.
For consistency's sake, I think the best approach is to use the GENCODE GTF and FASTA (i.e. p12 for v30).
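A minimal sketch of the suggested comparison, assuming each sample was mapped twice (once per genome build) with `--quantMode GeneCounts` and the same GTF, so that gene IDs match between runs. It reads the two `ReadsPerGene.out.tab` files and flags genes whose counts differ noticeably. The function names and the `min_count` and `min_abs_log2fc` thresholds are hypothetical choices for illustration, not anything STAR provides or recommends.

```python
import math


def parse_reads_per_gene(lines, column=1):
    """Parse STAR ReadsPerGene.out.tab lines into {gene_id: count}.

    column=1 is the unstranded count; 2 and 3 are the stranded counts.
    The leading N_* summary rows (N_unmapped, ...) are skipped.
    """
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0].startswith("N_"):
            continue
        counts[fields[0]] = int(fields[column])
    return counts


def compare_counts(p0, p12, min_count=10, min_abs_log2fc=1.0):
    """Return {gene: (p0_count, p12_count, log2fc)} for genes that differ.

    Thresholds are illustrative assumptions; tune them for your data.
    A pseudocount of 1 avoids division by zero.
    """
    flagged = {}
    for gene in sorted(set(p0) | set(p12)):
        a, b = p0.get(gene, 0), p12.get(gene, 0)
        if max(a, b) < min_count:
            continue  # too few reads to say anything meaningful
        log2fc = math.log2((b + 1) / (a + 1))
        if abs(log2fc) >= min_abs_log2fc:
            flagged[gene] = (a, b, round(log2fc, 2))
    return flagged


def compare_files(p0_path, p12_path, **kwargs):
    """Convenience wrapper: compare two ReadsPerGene.out.tab files on disk."""
    with open(p0_path) as f0, open(p12_path) as f12:
        return compare_counts(
            parse_reads_per_gene(f0), parse_reads_per_gene(f12), **kwargs
        )
```

Calling `compare_files("p0/ReadsPerGene.out.tab", "p12/ReadsPerGene.out.tab")` (hypothetical paths) returns only the genes that changed; an empty result on a few representative samples would suggest the patches have little effect on gene-level counts. The same idea could be applied to `SJ.out.tab` to compare detected splice junctions.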
Thank you for your reply, Dr. Dobin. My feeling is that it would be most consistent to use the supplied FASTA file, as you say. However, I do think most of our users would prefer the benefits afforded by the no-alt analysis set.
We will do an analysis on the effect of this discordance and either (a) use the latest GENCODE gene set if the differences are small or (b) roll back to the latest gene set based on patch 0 of