Hi @brentp ,
thanks again for developing such a fantastic tool. We use it for nearly every project in my group!
I'm curious if it would be possible to add a new property/method to VariantInfo that returns a sparse representation of genotypes. Ideally, something like var.sparse_genotypes that returns a (values, indices) pair: the non-zero genotype values and the sample indices where they occur.
This is already achievable with numpy filtering of var.gt_types, but it is somewhat slow, and I'm curious whether doing this in Cython space would be faster.
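For reference, the numpy filtering mentioned above might look roughly like the sketch below. The helper name `sparse_genotypes_np` is hypothetical, and the code assumes the VCF was opened with `gts012=True`, so that `gt_types` is coded 0=HOM_REF, 1=HET, 2=HOM_ALT, 3=UNKNOWN. It takes a plain array so it can be tried without a VCF file.

```python
import numpy as np

UNKNOWN = 3  # assumed code for missing calls (cyvcf2 opened with gts012=True)

def sparse_genotypes_np(gt_types, include_missing=False):
    """Hypothetical helper: return (values, indices) of non-zero genotypes."""
    gt_types = np.asarray(gt_types)
    keep = gt_types != 0                 # drop homozygous-reference calls
    if not include_missing:
        keep &= gt_types != UNKNOWN      # optionally drop missing calls
    idxs = np.flatnonzero(keep)          # sample indices that survive the filter
    return gt_types[idxs], idxs

# stand-in for var.gt_types of one variant over five samples
vals, idxs = sparse_genotypes_np([0, 1, 3, 2, 0])
```

This allocates a boolean mask and two new arrays per variant, which is the per-variant overhead a Cython implementation could presumably avoid.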
The overall goal is to be able to build a sparse genotype matrix across all variants, which would look something like,
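import numpy as np
from cyvcf2 import VCF
from scipy.sparse import coo_matrix

vcf = VCF(...)
data = []
indices = []
for vdx, var in enumerate(vcf):
    # proposed API: non-zero genotype values and the sample indices where they occur
    _data, _idxs = var.sparse_genotypes(include_missing=False)
    # construct local (sample, variant) index pairs
    _idx = np.column_stack((_idxs, np.full_like(_idxs, vdx)))
    data.append(_data)
    indices.append(_idx)
data = np.concatenate(data)
indices = np.concatenate(indices)
n = len(vcf.samples)
p = vdx + 1  # number of variants (vdx is the last zero-based index)
# coo_matrix takes (data, (row, col)): rows are samples, columns are variants
sp_geno_mat = coo_matrix((data, (indices[:, 0], indices[:, 1])), shape=(n, p))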
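Until something like this exists, the same assembly can be sketched with plain numpy. The example below uses made-up genotype arrays in place of `var.gt_types` and assumes 0/1/2 coding with 3 for missing calls (as with `gts012=True`); the variable names are illustrative only. Note that `coo_matrix` expects its input as `(data, (row, col))`.

```python
import numpy as np
from scipy.sparse import coo_matrix

UNKNOWN = 3  # assumed code for missing calls

# made-up stand-ins for var.gt_types: three variants over four samples
gt_per_variant = [
    np.array([0, 1, 0, 2]),
    np.array([0, 0, 0, 0]),
    np.array([3, 1, 1, 0]),
]

data, rows, cols = [], [], []
for vdx, gt in enumerate(gt_per_variant):
    # keep non-reference, non-missing calls
    idxs = np.flatnonzero((gt != 0) & (gt != UNKNOWN))
    data.append(gt[idxs])
    rows.append(idxs)                       # sample index -> row
    cols.append(np.full(idxs.shape, vdx))   # variant index -> column

n = 4                      # number of samples
p = len(gt_per_variant)    # number of variants
sp_geno_mat = coo_matrix(
    (np.concatenate(data), (np.concatenate(rows), np.concatenate(cols))),
    shape=(n, p),
)
```

Keeping rows and columns as separate lists avoids the `column_stack` step; which layout is faster in practice would need to be measured.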