You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
we are using Adam to count the total number of Genotypes for the input file “ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf” from 1000 Genome project stored on s3
I would guess this has to do with how ADAM handles multiallelic variants. You'll notice that the ADAM Variant schema only supports a single alternate allele:
Hello,
we are using Adam to count the total number of Genotypes for the input file “ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf” from 1000 Genome project stored on s3
With Adam we run these 2 steps:
Step 1: convert to Adam with vcf2adam :
adam-submit --master yarn-client --driver-memory 8g
--num-executors $TOTAL_EXECUTORS
--executor-cores $CORES_PER_EXECUTOR
--executor-memory $MEMORY_PER_EXECUTOR
--
vcf2adam
-parquet_compression_codec SNAPPY
hdfs:///user/ec2-user/1kg/chr1.vcf \
hdfs:///user/ec2-user/1kg/chr1.adam
then we start Step2: Adam-Shell
adam-shell --master yarn-client --driver-memory 8g
--num-executors $TOTAL_EXECUTORS
--executor-cores $CORES_PER_EXECUTOR
--executor-memory $MEMORY_PER_EXECUTOR
scala> import org.bdgenomics.adam.rdd.ADAMContext
scala> import org.bdgenomics.formats.avro._
scala> val ac = new ADAMContext(sc)
scala> val genotypes = ac.loadGenotypes("/user/ec2-user/1kg/chr1.adam")
scala> genotypes.count
the result : 16277357168
As a comparision we are also using the HTSJDK with this script:
And this shows a different result:
16196107376
Any idea ? Are we using the wrong commands ?
greetings
-Jerry
The text was updated successfully, but these errors were encountered: