# admix.Dataset

We introduce the central data structures used in this package.

- `dset.geno`: genotype (``n_snp``, ``n_indiv``, ``n_ploidy``)
- `dset.lanc`: local ancestry (``n_snp``, ``n_indiv``, ``n_ploidy``)
- `dset.snp`: information about SNPs (``n_snp``, ``n_snp_feature``)
- `dset.indiv`: information about individuals (``n_indiv``, ``n_indiv_feature``)
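To make the shared layout concrete, the sketch below builds plain NumPy arrays with the same axis order for a tiny hypothetical data set (3 SNPs, 2 diploid individuals). The values are made up. The coding is an assumption, not something stated above: genotypes are 0/1 per haplotype (0 = REF, 1 = ALT), and local ancestry is an integer ancestry index (0, 1, ...).

```python
import numpy as np

n_snp, n_indiv, n_ploidy = 3, 2, 2

# geno[s, i, h]: allele carried by individual i on haplotype h at SNP s
# (assumed coding: 0 = REF, 1 = ALT)
geno = np.array(
    [
        [[0, 1], [1, 1]],
        [[0, 0], [1, 0]],
        [[1, 1], [0, 1]],
    ]
)

# lanc[s, i, h]: ancestry of that same haplotype at that SNP
# (assumed coding: 0 = first ancestry, 1 = second ancestry)
lanc = np.array(
    [
        [[0, 1], [0, 0]],
        [[0, 1], [1, 0]],
        [[1, 1], [1, 0]],
    ]
)

# both arrays share the (n_snp, n_indiv, n_ploidy) shape, so geno[s, i, h]
# and lanc[s, i, h] describe the same haplotype at the same SNP
assert geno.shape == lanc.shape == (n_snp, n_indiv, n_ploidy)
```

Because the two arrays are aligned element by element, any per-ancestry computation can pair them directly without reindexing.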

Central to the Python API is the `admix.Dataset` class, which supports various convenient operations for manipulating large on-disk data sets.

In [None]:
import admix

# load toy data
dset = admix.load_toy_admix()

In [None]:
# overview of data set
dset

In [None]:
# SNP attributes: CHROM (chromosome), POS (position), REF (reference allele), ALT (alternate allele), etc.
# FREQ1 and FREQ2 have also been precomputed as ancestry-specific allele frequencies
dset.snp

In [None]:
# individual attributes
dset.indiv

In [None]:
# phased genotype (n_snp, n_indiv, 2)
dset.geno
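Since the last axis holds the two phased haplotypes, summing over it collapses the phased genotype into the usual unphased allele count (dosage) per SNP and individual. The minimal NumPy sketch below uses a hypothetical toy array and assumes the 0/1 allele coding; it illustrates the array layout rather than any specific `admix` function.

```python
import numpy as np

# hypothetical phased genotypes, shape (n_snp=2, n_indiv=3, n_ploidy=2),
# assumed coding: 0 = REF allele, 1 = ALT allele on each haplotype
geno = np.array(
    [
        [[0, 1], [1, 1], [0, 0]],
        [[1, 0], [0, 0], [1, 1]],
    ]
)

# sum over the ploidy axis: number of ALT alleles per SNP and individual
dosage = geno.sum(axis=2)  # shape (n_snp, n_indiv), values in {0, 1, 2}
print(dosage)
# [[1 2 0]
#  [1 0 2]]
```

Summing discards phase, so the (n_snp, n_indiv, 2) form is the one to keep whenever alleles must be matched to their local ancestry.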

In [None]:
# local ancestry (n_snp, n_indiv, 2)
dset.lanc
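Local ancestry has the same shape as the phased genotype, so counting how many of an individual's two haplotypes come from each ancestry is a comparison followed by a sum over the ploidy axis. The sketch below assumes ancestries are coded as integers 0..`n_anc - 1` and uses made-up values; averaging those counts over SNPs gives a rough per-individual ancestry proportion.

```python
import numpy as np

# hypothetical local ancestry, shape (n_snp=2, n_indiv=3, n_ploidy=2),
# assumed coding: 0 = first ancestry, 1 = second ancestry
lanc = np.array(
    [
        [[0, 0], [0, 1], [1, 1]],
        [[0, 1], [1, 1], [0, 0]],
    ]
)
n_anc = 2

# copies of each ancestry per SNP and individual,
# shape (n_snp, n_indiv, n_anc); sums to n_ploidy (= 2) over the last axis
anc_count = np.stack([(lanc == k).sum(axis=2) for k in range(n_anc)], axis=2)

# fraction of each individual's haplotypes from each ancestry, averaged over SNPs
global_anc = anc_count.mean(axis=0) / lanc.shape[2]  # shape (n_indiv, n_anc)
print(global_anc)
```

Here `global_anc` works out to 0.75/0.25, 0.25/0.75, and 0.5/0.5 for the three individuals.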

In [None]:
# subset the first 50 SNPs
dset[0:50, :]

In [None]:
# subset the first 50 individuals
dset[:, 0:50]

In [None]:
# subset the first 50 SNPs and first 50 individuals
dset[0:50, 0:50]
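The point of indexing with `[snp_index, indiv_index]` is that one expression subsets all four parts consistently. The sketch below uses a hypothetical `ToyDataset` class (not part of `admix`). It assumes the SNP and individual tables behave like pandas DataFrames and shows one way such alignment can be kept: slice the genotype and local ancestry arrays on their first two axes and the two tables by row position.

```python
import numpy as np
import pandas as pd


class ToyDataset:
    """Hypothetical stand-in for admix.Dataset, used only to illustrate
    how one [snp_index, indiv_index] subset keeps all four parts aligned."""

    def __init__(self, geno, lanc, snp, indiv):
        self.geno = geno    # (n_snp, n_indiv, n_ploidy)
        self.lanc = lanc    # (n_snp, n_indiv, n_ploidy)
        self.snp = snp      # DataFrame with n_snp rows
        self.indiv = indiv  # DataFrame with n_indiv rows

    def __getitem__(self, index):
        snp_idx, indiv_idx = index
        return ToyDataset(
            geno=self.geno[snp_idx, indiv_idx],
            lanc=self.lanc[snp_idx, indiv_idx],
            snp=self.snp.iloc[snp_idx],
            indiv=self.indiv.iloc[indiv_idx],
        )


n_snp, n_indiv = 100, 80
rng = np.random.default_rng(0)
toy = ToyDataset(
    geno=rng.integers(0, 2, size=(n_snp, n_indiv, 2)),
    lanc=rng.integers(0, 2, size=(n_snp, n_indiv, 2)),
    snp=pd.DataFrame({"POS": np.arange(n_snp)}),
    indiv=pd.DataFrame({"ID": [f"indiv{i}" for i in range(n_indiv)]}),
)

sub = toy[0:50, 0:50]
print(sub.geno.shape, len(sub.snp), len(sub.indiv))  # (50, 50, 2) 50 50
```

This toy version is only correct for slices. Passing two integer arrays at once would trigger NumPy's pairwise advanced indexing instead of selecting a rectangular block, which a fuller implementation would handle, for example with `np.ix_`.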

In [None]:
# count alleles on each ancestry background
dset.allele_per_anc()
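Conceptually, counting alleles per ancestry background pairs each haplotype's allele in `geno` with its ancestry in `lanc` and tallies ALT alleles separately for each ancestry. The plain-NumPy sketch below illustrates that idea under the assumed 0/1 coding. The helper name `allele_per_anc_sketch` is hypothetical, and the exact return shape and type of `dset.allele_per_anc()` may differ from this illustration.

```python
import numpy as np


def allele_per_anc_sketch(geno, lanc, n_anc):
    """Hypothetical helper: count ALT alleles on each ancestry background.

    geno, lanc: arrays of shape (n_snp, n_indiv, n_ploidy)
    returns:    array of shape (n_snp, n_indiv, n_anc)
    """
    # for ancestry k, keep only haplotypes that carry ALT *and* have ancestry k
    counts = [((geno == 1) & (lanc == k)).sum(axis=2) for k in range(n_anc)]
    return np.stack(counts, axis=2)


# one SNP, two individuals
geno = np.array([[[1, 0], [1, 1]]])
lanc = np.array([[[0, 1], [1, 0]]])
result = allele_per_anc_sketch(geno, lanc, n_anc=2)
print(result)
# [[[1 0]
#   [1 1]]]
```

Summing `result` over its last axis recovers the ordinary dosage, since every haplotype is assigned to exactly one ancestry.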

In [None]:
# calculate allele frequencies on each ancestry background
dset.af_per_anc()
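An ancestry-specific allele frequency can be read as a ratio. At each SNP, divide the number of ALT alleles on haplotypes of ancestry k by the number of haplotypes of ancestry k, pooling over all individuals. The sketch below computes that ratio with NumPy under the assumed 0/1 coding. The helper `af_per_anc_sketch` is hypothetical and is not claimed to match `dset.af_per_anc()` in every detail, such as how empty ancestries are reported.

```python
import numpy as np


def af_per_anc_sketch(geno, lanc, n_anc):
    """Hypothetical helper: ALT allele frequency on each ancestry background.

    For SNP s and ancestry k: (# ALT alleles on haplotypes with ancestry k)
    divided by (# haplotypes with ancestry k), pooled over individuals.
    Returns shape (n_snp, n_anc); NaN where no haplotype has ancestry k.
    """
    af = np.full((geno.shape[0], n_anc), np.nan)
    for k in range(n_anc):
        on_k = lanc == k
        n_alt = ((geno == 1) & on_k).sum(axis=(1, 2))  # per SNP
        n_hap = on_k.sum(axis=(1, 2))                  # per SNP
        with np.errstate(invalid="ignore", divide="ignore"):
            af[:, k] = np.where(n_hap > 0, n_alt / n_hap, np.nan)
    return af


# two SNPs, two individuals
geno = np.array([[[1, 0], [1, 1]], [[0, 0], [1, 0]]])
lanc = np.array([[[0, 1], [1, 0]], [[0, 0], [0, 0]]])
af = af_per_anc_sketch(geno, lanc, n_anc=2)
# af[0] == [1.0, 0.5]; af[1] == [0.25, nan] (no ancestry-1 haplotype at SNP 1)
```

Returning NaN rather than 0 when an ancestry is absent at a SNP avoids reporting a frequency that was never observed.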