Exploring refine.bio species compendia
Once refine.bio reaches production, we will periodically release compendia composed of all the samples from a species that we are able to process.
We refer to these as species compendia and envision that these collections will be useful for extracting features from a diverse set of biological conditions.
Creating a species compendia pipeline required us to tackle various problems such as selecting a method for imputing missing values.
This repository holds a series of analyses related to refine.bio species compendia, organized into modules.
See the README files in the individual directories for more information.
select_imputation_method
- A series of experiments and evaluations for selecting a method for imputing missing values.
human_missingness
- Genes that are measured in fewer than 30% of samples are typically removed before imputing missing values in gene expression data. How many genes would be left in the human compendium using this cutoff?
impute_requirements
- How long does it take to run KNN impute? (Too long for our use case.)
quality_check
- Exploring test zebrafish compendia.
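
To make the 30% cutoff from human_missingness concrete, here is a minimal Python sketch of that kind of filter. It is an illustration only, not the code used in this repository: the function name `genes_passing_cutoff`, the dictionary layout (gene ID mapped to one value per sample, with `None` or NaN marking a missing measurement), and the toy values are all assumptions made for the example.

```python
import math


def is_measured(value):
    """Treat None and NaN as missing; anything else counts as measured."""
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    return True


def genes_passing_cutoff(expression, min_fraction=0.3):
    """Return IDs of genes measured in at least `min_fraction` of samples.

    `expression` is a hypothetical layout: {gene_id: [value per sample]}.
    """
    kept = []
    for gene, values in expression.items():
        if not values:
            continue  # no samples at all, nothing to keep
        measured = sum(1 for v in values if is_measured(v))
        if measured / len(values) >= min_fraction:
            kept.append(gene)
    return kept


# Toy data (made up): three genes across five samples.
toy = {
    "GENE_A": [1.2, 3.4, None, 2.2, 0.9],      # measured in 4 of 5 samples
    "GENE_B": [None, None, None, None, 5.1],   # measured in 1 of 5 samples
    "GENE_C": [None, 2.0, float("nan"), None, 1.1],  # measured in 2 of 5
}

kept = genes_passing_cutoff(toy)
print(f"{len(kept)} of {len(toy)} genes kept: {kept}")
```

With the toy data, GENE_B falls below the cutoff and is dropped. The same counting applied to a real compendium would answer the "how many genes would be left" question, although a matrix that size would more likely be handled with a data frame library than with plain dictionaries.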