comprehension for 'Using prephased SNP profiles' #158

hanjun98 · 2024-01-25T06:31:24Z

Hi, first, Thank you for awesome tool to create new insight!
I felt person who made numbat is so meticulous.

I have a question to 'Using prephased SNP profiles'

'''
Using DNA-derived genotype information is another way to improve SNP density and phasing. If you have SNP calls from DNA genotyping (e.g. WGS/WES), you can first perform phasing on the DNA-derived VCF. Then run cellsnp-lite on scRNA-seq BAMs against the DNA-derived VCF to generate allele counts (only include heterozygous SNPs). Finally, merge the phased GT fields (from phased DNA-derived VCF) with the obtained allele counts to produce an allele dataframe in the format of df_allele (see section Preparing data).
'''

If you have SNP calls from DNA genotyping (e.g. WGS/WES), you can first perform phasing on the DNA-derived VCF. Then run cellsnp-lite on scRNA-seq BAMs against the DNA-derived VCF to generate allele counts (only include heterozygous SNPs).
--> I already have VCF(derived from WES) so I performed phasing on the DNA-derived VCF like this

`pileup_R='/mnt/gmi-l1/_90.User_Data/dhthxkr/codes/MDS/sc-seq/numbat/pileup_and_phase.R'
gmap='/mnt/gmi-l1/_90.User_Data/dhthxkr/tools/Eagle_v2.4.1/tables/genetic_map_hg38_withX.txt.gz'
#snpvcf='/mnt/gmi-l1/_90.User_Data/dhthxkr/ref/numbat/genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf'
paneldir='/mnt/gmi-l1/_90.User_Data/dhthxkr/ref/numbat/1000G_hg38'
eagle='/mnt/gmi-l1/_90.User_Data/dhthxkr/tools/Eagle_v2.4.1/eagle'

sample=$1
WES=$2

mkdir -p ${sample}/Numbat_WES

Rscript ${pileup_R}
--label ${sample}
--samples ${sample}
--bams ${sample}_cellranger/outs/possorted_genome_bam.bam \ #single cell cellranger ouput bam
--barcodes ${sample}/leiden_v2.4/matrix_files/barcodes.tsv.gz \ #single cell barcodes
--outdir ${sample}/Numbat_WES
--gmap ${gmap}
--eagle ${eagle}
--snpvcf ${sample}/WES/${WES}.vcf.gz \ #WES derived VCF(contains germlines and somtics, only filter='PASS')
--paneldir ${paneldir}
--ncores 5 &`

so I can get a '${sample}/Numbat_WES/MDSC02_allele_counts.tsv.gz'

Finally, merge the phased GT fields (from phased DNA-derived VCF) with the obtained allele counts to produce an allele dataframe in the format of df_allele.
--> what's mean 'merge the phased GT fields'? Actually, I inferred that merging allele_counts.tsv(Pileup output with option(--snpvcf ${sample}/WES/${WES}.vcf.gz)) with allele_counts.tsv(Pileup output with option(--snpvcf genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf))

so, I make a merged_allele_counts.tsv(from option(--snpvcf ${sample}/WES/${WES}.vcf.gz) allele_counts.tsv and option(--snpvcf genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf)) allele_counts.tsv

And then run 'run_numbat' with merged_allele_counts.tsv like this

`#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
sample <- args[1]

library(glue)
library(stringr)
library(data.table)
library(dplyr)
library(vcfR)
library(Matrix)
library(numbat)
library(pagoda2)
library(ggplot2)
library(ggtree)
library(tidygraph)
library(patchwork)

df_allele=fread(paste0('/mnt/gmi-l1/_90.User_Data/dhthxkr/MDS/',sample,'/Numbat_WES_v2/',sample,'_merged_allele_counts.tsv.gz'), header = T)
count_mat = readMM(paste0('/mnt/gmi-l1/_90.User_Data/dhthxkr/MDS/',sample,'/QC_output/matrix_files_v2.4/matrix.mtx.gz'))
cells = fread(paste0('/mnt/gmi-l1/_90.User_Data/dhthxkr/MDS/',sample,'/QC_output/matrix_files_v2.4/barcodes.tsv.gz'), header = F)$V1
genes = fread(paste0('/mnt/gmi-l1/_90.User_Data/dhthxkr/MDS/',sample,'/QC_output/matrix_files_v2.4/features.tsv.gz'), header = F)$V2
colnames(count_mat) = cells
rownames(count_mat) = genes
count_mat = as.matrix(count_mat)
count_mat = rowsum(count_mat, rownames(count_mat))
count_mat = as(count_mat, "dgCMatrix")

out = run_numbat(
count_mat, # gene x cell integer UMI count matrix
ref_hca, # reference expression profile, a gene x cell type normalized expression level matrix
df_allele, # allele dataframe generated by pileup_and_phase script
genome = "hg38",
t = 1e-5,
ncores = 20,
plot = TRUE,
out_dir = paste0('/mnt/gmi-l1/_90.User_Data/dhthxkr/MDS/',sample,'/Numbat_WES_v2')
)`

But I got this error
"Warning message:
In asMethod(object) :
sparse->dense coercion: allocating vector of size 2.4 GiB
Error in check_allele_df(df_allele) :
Inconsistent SNP genotypes; Are cells from two different individuals mixed together?
Calls: run_numbat -> check_allele_df
Execution halted"

So, I don't know when it wrong?

Thank you for reading!
I'll be waiting for your answer!

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

comprehension for 'Using prephased SNP profiles' #158

comprehension for 'Using prephased SNP profiles' #158

hanjun98 commented Jan 25, 2024 •

edited

comprehension for 'Using prephased SNP profiles' #158

comprehension for 'Using prephased SNP profiles' #158

Comments

hanjun98 commented Jan 25, 2024 • edited

hanjun98 commented Jan 25, 2024 •

edited