SantosJGND/Imputation

Imputation repository.

This sub-directory holds work on imputation. As usual, this research is done with population genetic data in mind.

The data consist of a single genotype data set. Variables are variant-count features ranging from 0 to 2. Samples are designed to derive from a semi-consistent population network. "Semi-consistent" is used here to indicate that certain observations have variable probability density functions (pdfs) and that the characteristics of the structure vary (for example, cluster distance may change).

Data generation

VCF files are generated with the Genome Simulator tool from the first Tools repository (link).

  • The generation steps are replicated here for the specific data sets used: notebook.
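As a rough illustration of how VCF genotype calls map onto the 0 to 2 variant-count features described above, the following sketch converts the `GT` field of one VCF record into alternate-allele counts. The function names are hypothetical and not taken from the notebooks. Counting every non-reference allele as 1 is an assumption that fits biallelic sites, and missing calls are returned as `None`.

```python
def gt_to_count(gt):
    """Convert a VCF GT string such as '0/1' or '1|1' to an alternate-allele count.

    Returns None when any allele is missing ('.').
    Assumption: every non-reference allele counts as 1 (biallelic sites).
    """
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None
    return sum(1 for a in alleles if a != "0")


def vcf_line_to_counts(line):
    """Extract per-sample variant counts from a single tab-separated VCF record."""
    fields = line.rstrip("\n").split("\t")
    # Column 9 (index 8) is FORMAT; sample columns start at index 9.
    gt_index = fields[8].split(":").index("GT")
    return [gt_to_count(sample.split(":")[gt_index]) for sample in fields[9:]]


# Toy record with four samples, made up for illustration.
line = "1\t1000\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0|1\t1/1\t./."
counts = vcf_line_to_counts(line)
print(counts)
```

Stacking these per-record lists over all sites gives a sites-by-samples matrix; transposing it yields the samples-by-features layout assumed in the sections below.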

I. Distances / Dimensionality reduction.

Window-based analysis constructs data sets of distance data, which are then used to predict the position of a missing observation in an incomplete data set.

notebook
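A minimal sketch of the window-based step, under stated assumptions: the genotype matrix is samples by SNPs, windows are non-overlapping and of fixed size, and each window is reduced by PCA (computed here through an SVD) before pairwise Euclidean distances are taken. The names `pca_project`, `window_distances` and `window_size` are illustrative and do not come from the notebook, which may window and reduce the data differently.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def window_distances(genotypes, window_size, n_components=2):
    """Return one sample-by-sample distance matrix per non-overlapping window."""
    n_samples, n_snps = genotypes.shape
    matrices = []
    for start in range(0, n_snps - window_size + 1, window_size):
        window = genotypes[:, start:start + window_size].astype(float)
        coords = pca_project(window, n_components)
        # Pairwise Euclidean distances in the reduced space.
        diff = coords[:, None, :] - coords[None, :, :]
        matrices.append(np.sqrt((diff ** 2).sum(axis=-1)))
    return np.stack(matrices)


# Toy data: 12 samples, 40 SNPs, counts in {0, 1, 2}.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(12, 40))
dists = window_distances(genotypes, window_size=10)
print(dists.shape)
```

Each sample's row of distances in the windows around a gap is the kind of feature vector that could then be used to place a missing observation, though that prediction step is not shown here.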

II. PCA inverse transformation.

An aside on the accuracy of PCA inverse transformation.

notebook
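One way to look at the accuracy of the inverse transformation is to project the data onto the first k principal components, map it back to feature space, and measure the reconstruction error as k grows. The sketch below does this with a plain NumPy SVD on random toy genotypes; the function names are hypothetical, and the notebook may use a library PCA implementation and a different error measure.

```python
import numpy as np


def pca_reconstruct(X, n_components):
    """Project X onto n_components principal axes and map back to feature space."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T           # forward transform
    return scores @ components + mean    # inverse transform


def reconstruction_rmse(X, n_components):
    """Root mean squared error between X and its PCA reconstruction."""
    X_hat = pca_reconstruct(X, n_components)
    return float(np.sqrt(np.mean((X - X_hat) ** 2)))


# Toy data: 30 samples, 20 variant-count features.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(30, 20)).astype(float)
errors = {k: reconstruction_rmse(X, k) for k in (1, 5, 10, 20)}
for k, err in errors.items():
    print(k, err)
```

The error cannot increase as components are added, and it drops to numerical zero once all components are kept, so any loss in an imputation setting comes from truncation rather than from the inverse mapping itself.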

III. Cluster search.

Dimensionality reduction followed by maximum-likelihood cluster classification. Used for summary statistics and imputation.

notebook
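The classification step can be sketched as follows, assuming the samples have already been reduced to a few coordinates and that each reference cluster is modelled by a Gaussian kernel density estimate (KDE). The isotropic kernel, the fixed `bandwidth`, and the function names are assumptions made for illustration; the notebook's estimator and bandwidth selection may differ.

```python
import numpy as np


def gaussian_kde_logpdf(points, reference, bandwidth):
    """Log density at `points` under an isotropic Gaussian KDE built on `reference`."""
    d = reference.shape[1]
    diff = points[:, None, :] - reference[None, :, :]
    sq = (diff ** 2).sum(axis=-1) / (2.0 * bandwidth ** 2)
    log_norm = -0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2) - np.log(len(reference))
    # Log-sum-exp over reference points for numerical stability.
    m = (-sq).max(axis=1, keepdims=True)
    return log_norm + m[:, 0] + np.log(np.exp(-sq - m).sum(axis=1))


def classify_max_likelihood(points, clusters, bandwidth=0.5):
    """Assign each point to the cluster whose KDE gives the highest log-likelihood."""
    labels = sorted(clusters)
    log_liks = np.column_stack(
        [gaussian_kde_logpdf(points, clusters[label], bandwidth) for label in labels]
    )
    best = log_liks.argmax(axis=1)
    return [labels[i] for i in best], log_liks


# Toy reference clusters standing in for reduced (e.g. PCA) coordinates.
rng = np.random.default_rng(2)
clusters = {
    "A": rng.normal([0.0, 0.0], 0.3, size=(50, 2)),
    "B": rng.normal([4.0, 4.0], 0.3, size=(50, 2)),
}
queries = np.array([[0.1, -0.2], [3.9, 4.2]])
assigned, log_liks = classify_max_likelihood(queries, clusters)
print(assigned)
```

The per-cluster log-likelihoods in `log_liks` can be kept alongside the hard assignment, which is useful when they feed later statistics or an imputation step.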

Application to rice data.

i. Haplotype imputation

Based on the method described in section I. Additions include composite likelihood, control for distance, and exclusion of observations carrying missing or heterozygous calls from local distance calculations.

data requirement: haplotype, phased, or nearly homozygous data.

validation: benchmark included.

notebook
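The exclusion rule can be illustrated with a small nearest-haplotype sketch. It assumes calls are coded 0 and 2 for the two homozygous states, 1 for heterozygous, and -1 for missing (the -1 code is hypothetical). The exclusion is applied per site here, which is one reading of the description; the notebook may instead drop whole observations. Composite likelihood and distance control are not covered.

```python
MISSING = -1  # hypothetical code for a missing call


def masked_distance(a, b):
    """Mean mismatch over sites where both haplotypes carry a homozygous (0 or 2) call.

    Returns None when no site is usable.
    """
    usable = [(x, y) for x, y in zip(a, b) if x in (0, 2) and y in (0, 2)]
    if not usable:
        return None
    return sum(x != y for x, y in usable) / len(usable)


def impute_site(target, references, site):
    """Copy the call at `site` from the reference closest to `target` elsewhere."""
    best, best_d = None, None
    for ref in references:
        if ref[site] not in (0, 2):
            continue  # a reference without a usable call cannot donate one
        d = masked_distance(
            target[:site] + target[site + 1:], ref[:site] + ref[site + 1:]
        )
        if d is not None and (best_d is None or d < best_d):
            best, best_d = ref, d
    return None if best is None else best[site]


# Toy haplotypes, made up for illustration.
refs = [
    [0, 0, 2, 2, 0, 2],
    [2, 2, 0, 0, 2, 0],
    [0, 1, 2, 2, MISSING, 2],
]
target = [0, 0, 2, MISSING, 0, 2]
imputed = impute_site(target, refs, site=3)
print(imputed)
```

Ties are resolved in favour of the first reference encountered; a fuller version would weight several neighbours rather than copying from one.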

ii. Cluster distance and imputation

Application of the cluster search and inference pipeline to 3000 Rice Genomes data, with a focus on Japonica and cBasmati variation. Distance inference is now performed within 1 Mb of the focal target.

notebook
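Restricting inference to the neighbourhood of a focal target can be sketched as a simple filter on window coordinates. The helper below keeps windows whose midpoint lies within 1 Mb of the focal position; using midpoints, the name `windows_near`, and the 500 kb toy windows are assumptions for illustration only.

```python
def windows_near(windows, focal_pos, max_dist=1_000_000):
    """Return the (start, end) windows whose midpoint is within max_dist bp of focal_pos."""
    selected = []
    for start, end in windows:
        mid = (start + end) // 2
        if abs(mid - focal_pos) <= max_dist:
            selected.append((start, end))
    return selected


# Toy layout: ten consecutive 500 kb windows along one chromosome.
windows = [(s, s + 500_000) for s in range(0, 5_000_000, 500_000)]
near = windows_near(windows, focal_pos=2_250_000)
print(near)
```

Only the distance data from the selected windows would then be passed to the inference step.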

iii. Targeted Ne estimation at local windows

notebook

About

VCF, dimensionality reduction, distances, KDE
