QDNAseq.hg38: Bin annotation for hg38? #59
Comments
Ditto above-- would like to use QDNAseq with hg38 aligned data. I'd be happy to build references if somebody wants to provide comment or recipe on what's required to do so. |
Regarding creating hg38 bin annotations: Section 'Generating bin annotations' of the 'Introduction to QDNAseq' vignette describes how to build the bin annotation data. Some quick comments: It looks pretty straightforward once one has identified exactly which genome reference files to download. It appears that only the last step, on 'median residuals of the LOESS fit', requires some serious modeling (done on a set of 1000 Genomes Project reference samples). It would be awesome if someone could:
BTW, @lbeltrame mentions in #57 (comment) they have built these based on "1000G fastq files aligned against hg38". |
I need PE150 hg38 annotations for QDNAseq, so I've had a go at reproducing the hg19 SR50 annotations. The result is not quite the same, I think because my version of bwa is slightly different, but there is very good correlation in the residuals (Pearson's r = 0.994). Everything else is identical. I will report back on the hg38 attempt. I may not be able to share it, as I'm basing the LOESS residuals on a set of downsampled WGS normals from an internal project to keep them as closely matched as possible to my samples. |
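For anyone wanting to repeat this kind of check, here is a minimal sketch of comparing two residual vectors bin by bin with a Pearson correlation. It assumes the residuals have already been exported from R as plain numbers in the same bin order; the function name `pearson` and the toy values are made up for illustration and are not taken from the comment above. Bins with a missing residual in either set are skipped.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Pairs where either value is None or NaN are skipped, since bins
    without a usable residual cannot be compared.
    """
    pairs = [
        (x, y)
        for x, y in zip(xs, ys)
        if x is not None and y is not None
        and not math.isnan(x) and not math.isnan(y)
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two complete pairs")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Toy residual vectors (made-up numbers, not real QDNAseq output).
original = [0.10, -0.20, float("nan"), 0.05, 0.30]
rebuilt = [0.11, -0.18, 0.02, 0.04, 0.31]
r = pearson(original, rebuilt)
print(f"Pearson r over comparable bins: {r:.4f}")
```

A high correlation only says that the two sets of residuals rank bins similarly; it does not by itself show that the same source files were used.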
Thank you. I think it would also be useful if you shared how you re-generated the hg19 SR50 annotations. Knowing exactly which URLs you used to download the reference annotation data will help others who might attempt the same thing. Feel free to cut'n'paste your commands/scripts here. It will also be valuable because it might help us to re-identify exactly which source data was used in the QDNAseq.hg19 data, cf. #80 (comment). |
Script submit_process_1kg_hg19_samples.sh
Generate sample list from sample/file map and run processing steps.
|
38 1000G samples, 241 single end FASTQ files for each, reproducible by following step 1 above.
|
Thank you very much for this |
The hg38 PE150 bin generation worked fine following the same recipe as for hg19, but with my own internal set of control BAMs downsampled to 3.5X. I've attached plots comparing the stats - mappability of course is going to be quite different, but the G+C and residuals should and do look similar. Also a similar proportion of usable bins. I didn't compare the blacklisting, happy to trust the input resource files on that.
hg19_SR50_vs_hg38_PE150_qqplot_mappability.pdf |
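As a rough illustration of the downsampling step mentioned above, the sketch below works out which fraction of reads to keep to bring a BAM from its current mean coverage down to a target such as 3.5X. The helper names and the round 3 Gbp genome size are assumptions for the example, not values from the thread. A fraction like this is what one would typically hand to a subsampling tool (for example the `-s` option of `samtools view`).

```python
def mean_coverage(n_reads, read_length, genome_size):
    """Approximate mean coverage: total sequenced bases over genome size.

    For paired-end data, n_reads counts individual reads (both mates).
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


def downsample_fraction(current_coverage, target_coverage):
    """Fraction of reads to keep to reach target_coverage.

    Returns 1.0 when the sample is already at or below the target,
    because subsampling cannot add coverage.
    """
    if current_coverage <= 0 or target_coverage <= 0:
        raise ValueError("coverages must be positive")
    return min(1.0, target_coverage / current_coverage)


# Hypothetical sample: 600 million 150 bp reads on a ~3 Gbp genome.
current = mean_coverage(600_000_000, 150, 3_000_000_000)
fraction = downsample_fraction(current, 3.5)
print(f"current coverage ~{current:.1f}X, keep fraction {fraction:.4f}")
```

Keeping controls and test samples at comparable depth is the motivation given in this thread; the arithmetic above does not say how much a mismatch would matter.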
Are you planning to provide the annotations for hg38 in a package similar to QDNAseq.hg19? |
QDNAseq.hg38 would be really helpful, please. |
I have hg38 PE150 5kbp and 10kbp bins generated. I'll have to check with the PI about the normal samples used for the residual generation step, as they are from a cancer WGS study, but yes, technically I could share these. It wasn't at all difficult to generate the bins though, just took time and the right set of normal samples. |
Any news on this? Will you be able to share the hg38 bin annotations? |
Unfortunately no, the information governance for the samples I used for the panel is quite strict.
Alison Meynert, IGMM Bioinformatics Analysis Core Manager, The University of Edinburgh
|
The example as shown in #59 (comment): Step 4 (generation of bins) needs R 3.4.4 at most, because the packages BSgenome.Hsapiens.UCSC.hg19 / BSgenome.Hsapiens.UCSC.hg38 were only released for Bioconductor 3.6 / R 3.4.4 or so. My resources (hg38, poorly tested) should fail : |
Is there any chance to get the bin annotations for ref genome hg38? I'm using @ameynert steps #59 (comment) but it is taking time. |
I've created a package. Install it from here: https://github.com/asntech/QDNAseq.hg38 |
Hi, I'm trying to create bins of size 200kb. I'm following the steps, but how do you get from |
@sxu922 I still have the processed bams and other annotations and happy to create 200kb and add to the package. This is because I want to make sure all the bins are computationally reproducible. |
Wow thank you so much for the reply! I didn't expect to get a response so fast! I created the 200kb bin and figured out that I can just read the rds file and use that as the bin. No need to create 200kb for me. Still, thank you so much! |
Hi Aziz, Thank you |
I have this query too. Did you manage to run QDNAseq.hg38 for your 150bp reads? |
Hi, I'm analyzing data aligned to GRCh38 using QDNAseq.hg38. The system works with no errors, however, when I run How does this impact the re-segmentation and calls? What function pulls the centromere coordinates? Is there a way to change the assembly for the centromere positions to GRCh38 or T2T? e.g. either passing an argument or automatically retrieving the argument from the Please note that the Thank you, any help is much appreciated,
|
It seems there isn't really a PE100 experiment type in the package. When I tried to use it I got the following error: QDNAseq bin annotations for 100 or 150 bp paired-end reads would be very much appreciated if available. |
Dear Alison @ameynert, Bests, |
Hi @kandabarau unfortunately the project that the samples came from did not allow for sharing even the derived data. |
asntech/QDNAseq.hg38#2 is the same issue in a different flavour. Maybe using a single end for alignment and analysis, together with the workflow in the asntech/QDNAseq.hg38 repo adapted to SE150, might work. |
I was thinking about this. Unfortunately https://github.com/asntech/QDNAseq.hg38 has no SE150, only SE50 (the 38 1000G samples I could find have PE100 at most). I doubt that cropping my reads to SE50 would leave the performance unharmed. Would it? Anyway, thank you! Bests, |
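For reference, cropping reads to 50 bp before alignment is mechanically simple. The sketch below trims the sequence and quality lines of plain four-line FASTQ records; the function name `crop_fastq` and the example read are made up for illustration, and nothing here says whether cropping is a good idea for copy-number calling, which is the open question in this comment.

```python
import io


def crop_fastq(lines, length=50):
    """Yield FASTQ lines with sequence and quality cropped to `length`.

    Assumes plain four-line records (header, sequence, '+', quality)
    with no line wrapping inside a record.
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed FASTQ record near {header!r}")
            yield header
            yield seq[:length]
            yield plus
            yield qual[:length]
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")


# One made-up 100 bp read, cropped to its first 50 bases.
example = io.StringIO("@read1\n" + "ACGT" * 25 + "\n+\n" + "I" * 100 + "\n")
cropped = list(crop_fastq(example))
print("\n".join(cropped))
```

To mimic a single-end 50 bp workflow from paired-end data, one would presumably apply this to the first mate only and align it as single-end, but that is an assumption rather than something confirmed in this thread.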
Some regions will likely drop out due to lack of mappability (see the post above with hg19_SR50_vs_hg38_PE150_qqplot_mappability.pdf). Note that the control residuals are probably used as a weight in the normalization of the test dataset. This is why you will likely prefer your own control samples or a similarly prepped dataset to improve the fit (I'm not an expert on how this works). |
Thank you @mmterpstra, it does not make my life easier, but it was important to hear an opinion from someone who has already done it. What do you think: how important is downsampling of the control WGS? If both control and test WGS are PCR-free, how important is it to keep their depths comparable? Bests, fyi @Overnightology |
Hi everyone, I want to use QDNAseq on my T2T CHM13 v2 genome. Can I use hg38, or do I need the T2T annotation? Is there any ready-made package for the T2T annotation? Thank you! Best, |
Hi @liuyangzzu, you'll need to use the T2T annotations because the coordinates aren't the same. I've put together T2T annotations and recently uploaded the repo. At the moment the subset of 1000 Genomes samples for the residuals estimation is still processing; however, the repo is here if you want to begin using it for development. I'll be updating the residuals once the 1000 Genomes data is processed. |
Great! Looking forward to your updates for T2T genome. I will check the repo also. Thank you! |
Hi.
I am trying to use QDNAseq with hg38 and thus need an hg38 rds file. Any suggestions on how to create the hg38 file and how to use it?
Thank you very much in advance.
H.