
QDNAseq.hg38: Bin annotation for hg38? #59

Open
nikleotide opened this issue Jan 25, 2019 · 33 comments
@nikleotide

Hi.
I am trying to use QDNAseq for hg38 and thus need to use hg38 rds file. Any suggestion how to create the hg38 file and how to use it?

Thank you very much in advance.
H.

@AaronHechmer

Ditto above-- would like to use QDNAseq with hg38 aligned data. I'd be happy to build references if somebody wants to provide comment or recipe on what's required to do so.

@HenrikBengtsson
Collaborator

Regarding creating hg38 bin annotations:

Section 'Generating bin annotations' of the 'Introduction to QDNAseq' vignette describes how to build the bin annotation data.

Some quick comments: it looks pretty straightforward once one has identified exactly which genome reference files to download. It appears that only the last step, the 'median residuals of the LOESS fit', requires some serious modeling (done on a set of 1000 Genomes Project reference samples).

It would be awesome if someone could:

  1. Reproduce the existing instructions for hg19 and validate that we can get numerically reproducible bin annotations matching those in the QDNAseq.hg19 package.
  2. After confirming that hg19 can be reproduced correctly, it should be "easier" to attack hg38.

BTW, @lbeltrame mentions in #57 (comment) they have built these based on "1000G fastq files aligned against hg38".
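For context, a minimal sketch of how the finished bin annotations get consumed downstream; the BAM file name is a placeholder:

```r
library(QDNAseq)

# Fetch precomputed bin annotations (hg19 at the time of writing; an hg38
# equivalent is what this issue asks for) and count reads per bin.
bins <- getBinAnnotations(binSize = 15)
readCounts <- binReadCounts(bins, bamfiles = "tumor.bam")  # placeholder BAM
```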

@HenrikBengtsson HenrikBengtsson changed the title rds file for hg38 QDNAseq.hg38: Bin annotation for hg38? Sep 17, 2019
@ameynert

  1. Reproduce the existing instructions for hg19 and validate that we can get numerically reproducible bin annotations matching those in the QDNAseq.hg19 package.
  2. After confirming that hg19 can be reproduced correctly, it should be "easier" to attack hg38.

I need PE150 hg38 annotations for QDNAseq, so I've had a go at reproducing the hg19 SR50. It's not quite the same, I think because my version of bwa is slightly different, but the correlation in the residuals is very good (Pearson's r = 0.994). Everything else is identical. I'll report back on the hg38 attempt; I may not be able to share it, as I'm basing the LOESS residuals on a set of downsampled WGS normals from an internal project, to keep them as closely matched as possible to my samples.

@HenrikBengtsson

Thank you. I think it would also be useful if you shared how you re-generated the hg19 SR50 annotations. Knowing exactly which URLs you used to download the reference annotation data will help others who might attempt the same thing. Feel free to cut'n'paste your commands/scripts here. It will also be valuable because it might help us re-identify exactly which source data was used for the QDNAseq.hg19 data, cf. #80 (comment).

@ameynert

  1. Download 1000 Genomes files for the baseline, following the instructions at http://bioconductor.org/packages/release/bioc/vignettes/QDNAseq/inst/doc/QDNAseq.pdf with a modified top-level URL:
urlroot <- "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/historical_data/former_toplevel"
g1k <- read.table(file.path(urlroot, "sequence.index"), header=TRUE, sep="\t", as.is=TRUE, fill=TRUE)
g1k <- g1k[g1k$INSTRUMENT_PLATFORM == "ILLUMINA", ]
g1k <- g1k[g1k$ANALYSIS_GROUP == "low coverage", ]
g1k <- g1k[g1k$LIBRARY_LAYOUT == "SINGLE", ]
g1k <- g1k[g1k$WITHDRAWN == 0, ]
g1k <- g1k[!g1k$BASE_COUNT %in% c("not available", ""), ]
g1k$BASE_COUNT <- as.numeric(g1k$BASE_COUNT)
g1k$READ_COUNT <- as.integer(g1k$READ_COUNT)
g1k$readLength <- g1k$BASE_COUNT / g1k$READ_COUNT
g1k <- g1k[g1k$readLength > 50, ]
readCountPerSample <- aggregate(g1k$READ_COUNT, by=list(sample=g1k$SAMPLE_NAME), FUN=sum)
g1k <- g1k[g1k$SAMPLE_NAME %in% readCountPerSample$sample[readCountPerSample$x >= 1e6], ]
g1k$fileName <- basename(g1k$FASTQ_FILE)

seqroot <- "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/"
for (i in rownames(g1k)) {
  sourceFile <- file.path(seqroot, g1k[i, "FASTQ_FILE"])
  destFile <- g1k[i, "fileName"]
  if (!file.exists(destFile))
    download.file(sourceFile, destFile, mode="wb")
}
x <- subset(g1k, select=c(SAMPLE_NAME, fileName))
write.table(x, "sample_file_map.txt", col.names=FALSE, quote=FALSE, row.names=FALSE, sep="\t")
  2. Trim reads to 50 bp with TrimGalore, align with bwa aln and samse, and merge runs for the same sample into a single BAM file. Confirmed 38 samples, which matches the number mentioned in the QDNAseq instructions linked above.

Script submit_process_1kg_hg19_samples.sh

#!/bin/bash

# Requires: trim_galore, samtools, bwa (0.7.16 used, QDNAseq instructions used 0.5.9)
#$ -N qdnaseq_process_samples
#$ -j y
#$ -S /bin/bash
#$ -cwd
#$ -l h_vmem=8G
#$ -l h_rt=24:00:00

unset MODULEPATH
. /etc/profile.d/modules.sh

IDS=$1
MAP=$2

SAMPLE=`head -n $SGE_TASK_ID $IDS | tail -n 1`

# bcbio bwa-indexed hg19 reference
HG19_REF=/path/to/hg19.fa

for FASTQ in `grep ^$SAMPLE $MAP | cut -f 2`
do
  trim_galore --hardtrim5 50 $FASTQ
  BFASTQ=${FASTQ%.fastq.gz}
  COMBINED=${SAMPLE}_${BFASTQ}
  bwa aln -n 2 -q 40 $HG19_REF $BFASTQ.50bp_5prime.fq.gz > $COMBINED.sai
  bwa samse $HG19_REF $COMBINED.sai $BFASTQ.50bp_5prime.fq.gz | samtools view -hb | samtools sort - > ${COMBINED}.bam
  samtools index $COMBINED.bam
done

samtools merge $SAMPLE.bam ${SAMPLE}_*.bam
samtools index $SAMPLE.bam

Generate sample list from sample/file map and run processing steps.

cut -f 1 sample_file_map.txt | sort -u > samples.txt
qsub -t 1-38 submit_process_1kg_hg19_samples.sh samples.txt sample_file_map.txt
  3. Download resource files:
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/wgEncodeCrgMapabilityAlign50mer.bigWig
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/wgEncodeDacMapabilityConsensusExcludable.bed.gz
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/wgEncodeDukeMapabilityRegionsExcludable.bed.gz
gunzip *.gz
  4. Create bins, add mappability and blacklist:
library(QDNAseq)
library(Biobase)
library(BSgenome.Hsapiens.UCSC.hg19)

binSize <- 15
bins <- createBins(bsgenome = BSgenome.Hsapiens.UCSC.hg19, binSize = binSize)
bins$mappability <- calculateMappability(bins, bigWigFile = "wgEncodeCrgMapabilityAlign50mer.bigWig", bigWigAverageOverBed = "/path/to/bigWigAverageOverBed")
bins$blacklist <- calculateBlacklist(bins, bedFiles = c("wgEncodeDacMapabilityConsensusExcludable.bed", "wgEncodeDukeMapabilityRegionsExcludable.bed"))
saveRDS(bins, file = "hg19_bins_before_loess.rds")
  5. Calculate residuals:
samples <- read.table("1000G/samples.txt", header=FALSE, stringsAsFactors=FALSE)
bams <- paste0("1000G/", samples$V1, ".bam")
ctrl <- binReadCounts(bins, bamfiles = bams)
ctrl <- applyFilters(ctrl, residual=FALSE, blacklist=FALSE, mappability=FALSE, bases=FALSE)
bins$residual <- iterateResiduals(ctrl)
bins$use <- bins$chromosome %in% as.character(1:22) & bins$bases > 0
saveRDS(bins, file = "hg19_bins.rds")
  6. Compare to the pre-calculated SR50 annotations:
library(Biobase)
bins_precalc <- getBinAnnotations(binSize=15)
sum((pData(bins_precalc)$bases - bins$bases))
[1] 0
sum((pData(bins_precalc)$gc - bins$gc), na.rm=TRUE)
[1] 0
sum((pData(bins_precalc)$mappability - bins$mappability))
[1] 0
sum((pData(bins_precalc)$blacklist - bins$blacklist))
[1] 0
cor.test(pData(bins_precalc)$residual, bins$residual)

        Pearson's product-moment correlation

data:  pData(bins_precalc)$residual and bins$residual
t = 3877.7, df = 167510, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.9944231 0.9945286
sample estimates:
      cor 
0.9944761 

sum(pData(bins_precalc)$use)
[1] 179187
sum(bins$use)
[1] 179187

@ameynert

38 1000G samples, 241 single-end FASTQ files in total, reproducible by following step 1 above.

HG01101
HG01204
HG01495
HG01522
NA06994
NA07051
NA11830
NA11832
NA11918
NA11919
NA11992
NA11994
NA12005
NA18489
NA18498
NA18504
NA18623
NA18624
NA18632
NA18633
NA18636
NA18867
NA18912
NA18960
NA18982
NA18984
NA18986
NA19058
NA19063
NA19064
NA19066
NA19116
NA19138
NA19474
NA19703
NA19707
NA19789
NA19901

@HenrikBengtsson

Thank you very much for this

@ameynert

The hg38 PE150 bin generation worked fine following the same recipe as for hg19, but with my own internal set of control BAMs downsampled to 3.5X. I've attached plots comparing the stats: mappability, of course, is going to be quite different, but the G+C and residuals should, and do, look similar. There is also a similar proportion of usable bins. I didn't compare the blacklisting; I'm happy to trust the input resource files on that.

sum(bins19$use) / length(bins19$start)
[1] 0.8681919
sum(bins38$use) / length(bins38$start)
[1] 0.8931605

hg19_SR50_vs_hg38_PE150_qqplot_mappability.pdf
hg19_vs_hg38_qqplot_GC.pdf
hg19_vs_hg38_qqplot_residual.pdf

@leachim

leachim commented Nov 1, 2019

Are you planning to provide the annotations for hg38 in a package similar to QDNAseq.hg19?

@Jane-Gibson

QDNAseq.hg38 would be really helpful, please!

@ameynert

ameynert commented Nov 4, 2019

I have hg38 PE150 5 kbp and 10 kbp bins generated. I'll have to check with the PI about the normal samples used for the residual-generation step, as they are from a cancer WGS study, but yes, technically I could share these. It wasn't at all difficult to generate the bins, though; it just took time and the right set of normal samples.

@leachim

leachim commented Dec 3, 2019

Any news on this? Will you be able to share the hg38 bin annotations?

@ameynert

ameynert commented Jan 6, 2020 via email

@mmterpstra

mmterpstra commented May 20, 2020

Regarding the example shown in #59 (comment):

Step 4 (generation of bins) needs R 3.4.4 at most, because the packages BSgenome.Hsapiens.UCSC.hg19 / BSgenome.Hsapiens.UCSC.hg38 were only released for Bioconductor 3.6 / R 3.4.4 or so.

My resources (hg38, poorly tested, so they may fail):
https://gist.github.com/mmterpstra/44f55b589852708fa1bdc0cb09bde517
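For what it's worth, on a current R/Bioconductor setup the genome packages install directly via BiocManager, so the old R 3.4.4 constraint may no longer apply; a sketch:

```r
# Install the hg38 BSgenome package on a current R/Bioconductor installation
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("BSgenome.Hsapiens.UCSC.hg38")
```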

@asntech

asntech commented Sep 9, 2020

Is there any chance of getting the bin annotations for the hg38 reference genome? I'm following @ameynert's steps in #59 (comment), but it is taking time.

@asntech

asntech commented Oct 22, 2020

I've created a package, QDNAseq.hg38, for the hg38 build using the steps mentioned above, the QDNAseq vignette, and #80.
It has bin sizes of 5, 10, 15, 30, 50, 100, 500 and 1000 kb. The 1 kb bins took longer, but I'm happy to generate them if anyone needs them.

Install it from here: https://github.com/asntech/QDNAseq.hg38
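A quick install-and-use sketch (assuming the remotes package is available; getBinAnnotations accepts a genome argument):

```r
# Install the annotation package from GitHub, then load hg38 bins through QDNAseq
remotes::install_github("asntech/QDNAseq.hg38")
library(QDNAseq.hg38)
library(QDNAseq)
bins <- getBinAnnotations(binSize = 15, genome = "hg38")
```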

@AaronHechmer

AaronHechmer commented Oct 22, 2020 via email

asntech added a commit to asntech/QDNAseq.hg38 that referenced this issue Oct 24, 2020
@sxu922

sxu922 commented Mar 1, 2021

I've created a package, QDNAseq.hg38, for the hg38 build using the steps mentioned above, the QDNAseq vignette, and #80.
It has bin sizes of 5, 10, 15, 30, 50, 100, 500 and 1000 kb. The 1 kb bins took longer, but I'm happy to generate them if anyone needs them.

Install it from here: https://github.com/asntech/QDNAseq.hg38

Hi, I'm trying to create a 200 kb bin size. I'm following the steps, but how do you get from saveRDS(bins, file = "hg19_bins.rds") to hg38.200kbp.SR50.rda? Also, if I want to add 200 kb to your existing package, do I just add the rda file to the data folder? I'm really new to R, so please forgive me for asking a silly question.

@asntech

asntech commented Mar 2, 2021

@sxu922 I still have the processed BAMs and other annotations, and I'm happy to create the 200 kb bins and add them to the package; that way I can make sure all the bins are computationally reproducible.
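For anyone packaging their own bins, one way to go from the saved RDS to a package-style .rda is roughly the following; the object name hg38.200kbp.SR50 is an assumption based on the existing naming scheme, where the data object's name matches the file name:

```r
# Load the bins saved earlier and re-save them as a named .rda for the data/ folder.
# The object name hg38.200kbp.SR50 is assumed from the package's naming convention.
bins <- readRDS("hg38_bins.rds")
hg38.200kbp.SR50 <- bins
save(hg38.200kbp.SR50, file = "hg38.200kbp.SR50.rda", compress = "xz")
```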

@sxu922

sxu922 commented Mar 2, 2021

@sxu922 I still have the processed BAMs and other annotations, and I'm happy to create the 200 kb bins and add them to the package; that way I can make sure all the bins are computationally reproducible.

Wow, thank you so much for the reply! I didn't expect a response so fast! I created the 200 kb bin and figured out that I can just read the rds file and use that as the bin annotation, so there's no need to create 200 kb for me. Still, thank you so much!

@francesca-guana

I've created a package, QDNAseq.hg38, for the hg38 build using the steps mentioned above, the QDNAseq vignette, and #80. It has bin sizes of 5, 10, 15, 30, 50, 100, 500 and 1000 kb. The 1 kb bins took longer, but I'm happy to generate them if anyone needs them.

Install it from here: https://github.com/asntech/QDNAseq.hg38

Hi Aziz,
I've installed the QDNAseq.hg38 package. I have paired-end data with a read length of 150 bp. Can I use the package even though there is no matching experiment type (I saw only SR50 and PE100)?

Thank you

@aliahhawari

I've created a package, QDNAseq.hg38, for the hg38 build using the steps mentioned above, the QDNAseq vignette, and #80. It has bin sizes of 5, 10, 15, 30, 50, 100, 500 and 1000 kb. The 1 kb bins took longer, but I'm happy to generate them if anyone needs them.
Install it from here: https://github.com/asntech/QDNAseq.hg38

Hi Aziz, I've installed the QDNAseq.hg38 package. I have paired-end data with a read length of 150 bp. Can I use the package even though there is no matching experiment type (I saw only SR50 and PE100)?

Thank you

I have this query too. Did you manage to run QDNAseq.hg38 on your 150 bp reads?

@RodrigoGM

RodrigoGM commented Oct 24, 2023

Hi, I'm analyzing data aligned to GRCh38 using QDNAseq.hg38. The system runs with no errors; however, when I run callBins it pulls centromeres from b37 (line 87 below), and I hoped to get some clarification on this.

How does this impact the re-segmentation and calls? Which function pulls the centromere coordinates? Is there a way to change the assembly for the centromere positions to GRCh38 or T2T, e.g. by passing an argument or by automatically retrieving it from the featureData?

Please note that the featureData slot in both the segmented-only object (line 77) and the copy-number-called object (line 3747) suggests hg38 was used.

Thank you, any help is much appreciated,

77   > lpsB.segmented.copynumbers@featureData
78   QDNAseq bin annotations for Hsapiens, build hg38.
79   Created by Aziz Khan with QDNAseq 1.22.0, 2020-09-11 05:45:57.
80   An object of class 'AnnotatedDataFrame'
81     featureNames: 1:1-100000 1:100001-200000 ... Y:57200001-57227415 (30894 total)
82     varLabels: chromosome start ... use (9 total)
83     varMetadata: labelDescription
84
85  > lpsB.copynumber.calls <- lpsB.segmented.copynumbers %>%                                                                                                                                                                                                                                                                 
86  +     callBins()                                                                                                                                                                                                                                                                                                          
87  + Dividing chromosomes into arms using centromere positions from GRCh37
88 
89  [1] 1
90  [1] 2
...
113 EM algorithm started ...
114 
115 [1] "Total number of segments present in the data: 34434"
116 [1] "Number of segments used for fitting the model: 3046"

...

3747 > lpsB.copynumber.calls@featureData
     QDNAseq bin annotations for Hsapiens, build hg38.
     Created by Aziz Khan with QDNAseq 1.22.0, 2020-09-11 05:45:57.
     An object of class 'AnnotatedDataFrame'
       featureNames: 1:1-100000 1:100001-200000 ... Y:57200001-57227415 (30894 total)
       varLabels: chromosome start ... use (9 total)
       varMetadata: labelDescription

@anagrei

anagrei commented Oct 26, 2023

I've created a package, QDNAseq.hg38, for the hg38 build using the steps mentioned above, the QDNAseq vignette, and #80. It has bin sizes of 5, 10, 15, 30, 50, 100, 500 and 1000 kb. The 1 kb bins took longer, but I'm happy to generate them if anyone needs them.
Install it from here: https://github.com/asntech/QDNAseq.hg38

Hi Aziz, I've installed the QDNAseq.hg38 package. I have paired-end data with a read length of 150 bp. Can I use the package even though there is no matching experiment type (I saw only SR50 and PE100)?
Thank you

I have this query too. Did you manage to run QDNAseq.hg38 on your 150 bp reads?

It seems there isn't really a PE100 experiment type in the package. When I tried to use it, I got the following error:
" QDNAseq bin annotation ‘hg38.1000kbp.PE100’ was not found in package ‘QDNAseq.hg38’. Available bin annotations are: ‘hg38.1000kbp.SR50’, ‘hg38.100kbp.SR50’, ‘hg38.10kbp.SR50’, ‘hg38.15kbp.SR50’, ‘hg38.1kbp.SR50’, ‘hg38.30kbp.SR50’, ‘hg38.500kbp.SR50’, ‘hg38.50kbp.SR50’, ‘hg38.5kbp.SR50’ ".

A QDNAseq bin annotation for 100 or 150 bp paired-end reads would be very much appreciated, if available.
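In the meantime, you can check which annotations an installed copy of the package actually ships; a small base-R sketch:

```r
# List the data objects shipped with the installed package; names such as
# "hg38.50kbp.SR50" encode the bin size and the experiment type.
available <- data(package = "QDNAseq.hg38")$results[, "Item"]
print(available)
```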

@kandabarau

  1. Reproduce the existing instructions for hg19 and validate that we can get numerically reproducible bin annotations matching those in the QDNAseq.hg19 package.
  2. After confirming that hg19 can be reproduced correctly, it should be "easier" to attack hg38.

I need PE150 hg38 annotations for QDNAseq, so I've had a go at reproducing the hg19 SR50. It's not quite the same, I think because my version of bwa is slightly different, but the correlation in the residuals is very good (Pearson's r = 0.994). Everything else is identical. I'll report back on the hg38 attempt; I may not be able to share it, as I'm basing the LOESS residuals on a set of downsampled WGS normals from an internal project, to keep them as closely matched as possible to my samples.

Dear Alison @ameynert,
Are you able to share your hg38 PE150 bin annotations? I am currently looking into how to analyze our PE150 5x data, and the 1000 Genomes data does not seem to be a proper reference, as its reads are shorter.

Bests,
Sergey

@ameynert

ameynert commented Mar 6, 2024

Hi @kandabarau, unfortunately the project that the samples came from did not allow sharing even the derived data.

@mmterpstra

asntech/QDNAseq.hg38#2 is the same issue in a different flavour. Aligning and analyzing the data as single-end, and then adapting the workflow in the asntech/QDNAseq.hg38 repo to SE150, might work.
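If trimming is the route taken, trim_galore --hardtrim5 50 from the hg19 recipe above is the robust option; a dependency-free sketch of the same hard-trim on an uncompressed FASTQ:

```shell
# Hard-trim each FASTQ record to its first 50 bp.
# In a 4-line FASTQ record the sequence and quality strings are lines 2 and 4,
# i.e. the even-numbered lines of the file.
awk 'NR % 2 == 0 { print substr($0, 1, 50); next } { print }' reads.fastq > reads.50bp.fastq
```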

@kandabarau

kandabarau commented Mar 6, 2024

asntech/QDNAseq.hg38#2 is the same issue in a different flavour. Aligning and analyzing the data as single-end, and then adapting the workflow in the asntech/QDNAseq.hg38 repo to SE150, might work.

I was thinking about this. Unfortunately https://github.com/asntech/QDNAseq.hg38 has no SE150, only SE50 (the 38 1kg samples I could find have PE100 at most). I'm not convinced that cropping my reads to SE50 wouldn't harm performance. Would it?

Anyway thank you!

Bests,
Sergey

@mmterpstra

Some regions will likely drop off due to lack of mappability (see the hg19_SR50_vs_hg38_PE150_qqplot_mappability.pdf posted above). Note that the control residuals are probably used as a weight in the normalization of the test dataset. This is why you will likely prefer your own control samples, or a similarly prepped dataset, to improve the fit (I'm not an expert on how this works).
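Concretely, the precomputed residual column feeds into the bin-filtering step; a sketch of where it is used (the BAM name is a placeholder):

```r
# The 'residual' column of the bin annotation drives bin exclusion:
# applyFilters(residual = TRUE) drops bins whose control residuals are
# missing or extreme, alongside the blacklist filter.
library(QDNAseq)
bins <- getBinAnnotations(binSize = 50)                     # ships the control residuals
readCounts <- binReadCounts(bins, bamfiles = "sample.bam")  # placeholder BAM
readCounts <- applyFilters(readCounts, residual = TRUE, blacklist = TRUE)
```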

@kandabarau

kandabarau commented Mar 7, 2024

Some regions will likely drop off due to lack of mappability (see the hg19_SR50_vs_hg38_PE150_qqplot_mappability.pdf posted above). Note that the control residuals are probably used as a weight in the normalization of the test dataset. This is why you will likely prefer your own control samples, or a similarly prepped dataset, to improve the fit (I'm not an expert on how this works).

thank you @mmterpstra ,

It does not ease my life, but it was important to hear an opinion from someone who has done it already.

What do you think: how important is downsampling of the control WGS? If both the control and test WGS are PCR-free, how important is it to keep their depths comparable?

Bests,
Sergey

fyi @Overnightology

@liuyang2006

Hi everyone,

I want to use QDNAseq on the T2T-CHM13 v2 genome. Can I use hg38, or do I need T2T annotations? Is there a ready-made package for the T2T annotations? Thank you!

Best,
Yang

@RodrigoGM

Hi @liuyangzzu, you'll need to use the T2T annotations because the coordinates aren't the same. I've put together T2T annotations and recently uploaded the repo. At the moment the subset of 1000 Genomes samples for the residuals estimation is still processing; however, the repo is available if you want to begin using it for development. I'll update the residuals once the 1000 Genomes data is processed.

@liuyang2006

Great! Looking forward to your updates for the T2T genome. I will check out the repo as well. Thank you!
