ancestry
Brent Pedersen edited this page Jan 5, 2022
Note: this feature works but is still experimental; it may change in future versions.
somalier can predict the ancestry of a set of query samples given a set of labeled samples, for example samples from Thousand Genomes along with their ancestry labels.
This would look like:
somalier ancestry --labels ancestry-labels-1kg.tsv 1kg-somalier/*.somalier ++ query-samples-somalier/*.somalier
Here, ++ separates the labeled samples (listed first) from the query samples. The command creates an HTML report along with a text file of the predictions.
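Because only the position of the ++ separator tells somalier which samples are labeled, it can help to build the argument list programmatically when driving somalier from a script. The Python sketch below is a hypothetical helper (build_ancestry_argv is not part of somalier). It uses only the --labels and --output-prefix options from the usage text, sorts each glob for a reproducible order, and places the literal ++ between the labeled and query files.

```python
import glob


def build_ancestry_argv(labels_tsv, labeled_glob, query_glob,
                        output_prefix="somalier-ancestry"):
    """Hypothetical helper: assemble argv for `somalier ancestry`.

    Labeled .somalier files go first, then the literal '++' separator,
    then the query .somalier files.
    """
    labeled = sorted(glob.glob(labeled_glob))  # sorted for a reproducible order
    query = sorted(glob.glob(query_glob))
    if not labeled or not query:
        raise ValueError("need at least one labeled and one query .somalier file")
    return ["somalier", "ancestry",
            "--labels", labels_tsv,
            "--output-prefix", output_prefix,
            *labeled, "++", *query]
```

For example, build_ancestry_argv("ancestry-labels-1kg.tsv", "1kg-somalier/*.somalier", "query-samples-somalier/*.somalier") returns a list that can be passed to subprocess.run(argv, check=True). This mirrors the shell command above, except that the output prefix is set explicitly and an empty glob fails early instead of silently shifting the separator.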
The ancestry-labels-1kg.tsv file is here, and the somalier files for Thousand Genomes can be downloaded from here. These were created from the Thousand Genomes high-coverage data available here.
Note that these files work with either GRCh37 or hg38, as long as you use the most recent sites files distributed with somalier.
Example output is here.
Usage:
somalier ancestry [options] [extracted ...]
Arguments:
[extracted ...] $sample.somalier files for each sample. Place labeled samples first, followed by '++', then the *.somalier files for query samples
Options:
--labels=LABELS file with ancestry labels
-o, --output-prefix=OUTPUT_PREFIX
prefix for output files (default: somalier-ancestry)
--n-pcs=N_PCS number of principal components to use in the reduced dataset (default: 5)
--nn-hidden-size=NN_HIDDEN_SIZE
shape of hidden layer in neural network (default: 16)
--nn-batch-size=NN_BATCH_SIZE
batch size for training neural network (default: 32)
--nn-test-samples=NN_TEST_SAMPLES
number of labeled samples to test for NN convergence (default: 101)
-h, --help Show this help
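The options above describe a two-stage model. First, the data is reduced to --n-pcs principal components. Then a small neural network is trained (a hidden layer of --nn-hidden-size units, mini-batches of --nn-batch-size), with --nn-test-samples labeled samples held out to check convergence. The NumPy sketch below illustrates that idea only. It is not somalier's implementation (somalier is written in Nim) and may differ from it in many details.

The sketch assumes each sample is already a numeric row vector, for example alt-allele counts (0, 1, 2) at the extracted sites. It fits the PCA on the labeled samples only. The function names, learning rate, epoch count, ReLU activation and per-PC scaling are all choices made for this sketch.

```python
import numpy as np


def project_pcs(X_labeled, X_query, n_pcs=5):
    """Fit PCA on the labeled samples only and project both sets onto it."""
    mean = X_labeled.mean(axis=0)
    # Rows of Vt are principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_labeled - mean, full_matrices=False)
    W = Vt[:n_pcs].T
    Z_lab, Z_query = (X_labeled - mean) @ W, (X_query - mean) @ W
    # Scale each PC to unit variance on the labeled set (a choice of this sketch).
    scale = Z_lab.std(axis=0) + 1e-8
    return Z_lab / scale, Z_query / scale


def train_mlp(Z, y, n_classes, hidden=16, batch=32, n_test=101,
              epochs=200, lr=0.05, seed=0):
    """One-hidden-layer ReLU network with softmax output, trained by mini-batch SGD.

    n_test labeled samples are held out and used only to report accuracy.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(Z))
    test_idx, train_idx = order[:n_test], order[n_test:].copy()
    W1 = rng.normal(0.0, 0.1, (Z.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, n_classes))
    b2 = np.zeros(n_classes)

    def forward(A):
        H = np.maximum(A @ W1 + b1, 0.0)             # ReLU hidden layer
        logits = H @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        return H, P / P.sum(axis=1, keepdims=True)   # softmax probabilities

    for _ in range(epochs):
        rng.shuffle(train_idx)
        for start in range(0, len(train_idx), batch):
            idx = train_idx[start:start + batch]
            A = Z[idx]
            H, P = forward(A)
            # Gradient of the mean cross-entropy with respect to the logits.
            G = P.copy()
            G[np.arange(len(idx)), y[idx]] -= 1.0
            G /= len(idx)
            dH = (G @ W2.T) * (H > 0)  # computed before W2 is updated
            W2 -= lr * (H.T @ G)
            b2 -= lr * G.sum(axis=0)
            W1 -= lr * (A.T @ dH)
            b1 -= lr * dH.sum(axis=0)

    held_out = forward(Z[test_idx])[1].argmax(axis=1)
    test_acc = float((held_out == y[test_idx]).mean())
    return (lambda A: forward(A)[1]), test_acc


def predict_ancestry(X_labeled, labels, X_query, n_pcs=5, hidden=16,
                     batch=32, n_test=101):
    """Return (predicted label per query sample, class probabilities, held-out accuracy)."""
    classes, y = np.unique(np.asarray(labels), return_inverse=True)
    Z_lab, Z_query = project_pcs(X_labeled, X_query, n_pcs)
    predict_proba, test_acc = train_mlp(Z_lab, y, len(classes), hidden, batch, n_test)
    P = predict_proba(Z_query)
    return classes[P.argmax(axis=1)], P, test_acc
```

Because the PCA is fitted on the labeled samples alone, the query samples cannot shift the axes. Each query sample is therefore placed in the same reference space regardless of how many query samples are run together.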