[ADAM-940] Fix adam2vcf -sort_on_save flag #949

Closed
wants to merge 1 commit into
base: master

@massie
Member

massie commented Feb 18, 2016

The -sort_on_save flag allows users to sort an ADAM file
before saving it in VCF format.

The ADAMVCFOutputFormat stores VariantContextWritable objects,
which are not Serializable. If a user requests a sort, this
triggers a shuffle which, prior to this commit, would exit
with an NPE at serialization.

The fix is to convert to VariantContextWritable objects only
immediately before saving, since the Hadoop output
format understands how to write Writables.
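
The ordering of steps described above can be sketched in a few lines. This is only an analogy in plain Python, not the project's Scala code and not the Spark or Hadoop API: `WritableWrapper` and `shuffle_sort` are hypothetical stand-ins, a pickle round trip stands in for the shuffle, and the failure shows up as a `TypeError` rather than the NPE reported above.

```python
import pickle


class WritableWrapper:
    """Hypothetical stand-in for a wrapper that the output format can write
    but that cannot be serialized during a shuffle."""

    def __init__(self, record):
        self.record = record

    def __reduce__(self):
        # Refuse pickling, to mimic a class that is not Serializable.
        raise TypeError("WritableWrapper is not serializable")


def shuffle_sort(items, key):
    """Mimic a shuffle: serialize and deserialize every item, then sort."""
    moved = [pickle.loads(pickle.dumps(item)) for item in items]
    return sorted(moved, key=key)


records = [("2", 150), ("1", 900), ("1", 40)]  # (contig, position) pairs

# Wrapping first and sorting afterwards fails when the shuffle serializes.
try:
    shuffle_sort([WritableWrapper(r) for r in records], key=lambda w: w.record)
    wrapped_first_failed = False
except TypeError:
    wrapped_first_failed = True

# Sorting the plain records and wrapping last never serializes the wrapper.
sorted_records = shuffle_sort(records, key=lambda r: r)
to_write = [WritableWrapper(r) for r in sorted_records]
```

The point of the sketch is only the order of operations: anything that must cross the shuffle stays in a serializable form, and the wrapper is created as the final step before writing.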


Member

massie commented Feb 18, 2016

This fixes #940

AmplabJenkins commented Feb 18, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1086/

Build result: FAILURE

[...truncated 32 lines...]
Triggering ADAM-prb » 2.3.0,2.10,1.5.2,centos
Triggering ADAM-prb » 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb » 1.0.4,2.11,1.2.1,centos
Triggering ADAM-prb » 2.3.0,2.11,1.5.2,centos
Triggering ADAM-prb » 2.3.0,2.11,1.3.1,centos
ADAM-prb » 2.3.0,2.10,1.3.1,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.11,1.3.1,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.10,1.6.0,centos completed with result SUCCESS
ADAM-prb » 1.0.4,2.10,1.2.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.3.1,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,1.6.0,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,1.4.1,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.10,1.5.2,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.10,1.4.1,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.10,1.2.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.6.0,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,1.2.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.6.0,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.10,1.5.2,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.11,1.5.2,centos completed with result SUCCESS
ADAM-prb » 1.0.4,2.11,1.2.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.5.2,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,1.3.1,centos completed with result SUCCESS
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

Member

massie commented Feb 18, 2016

retest this please

AmplabJenkins commented Feb 18, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1087/

Build result: FAILURE

[...truncated 32 lines...]
Triggering ADAM-prb » 2.3.0,2.10,1.5.2,centos
Triggering ADAM-prb » 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb » 1.0.4,2.11,1.2.1,centos
Triggering ADAM-prb » 2.3.0,2.11,1.5.2,centos
Triggering ADAM-prb » 2.3.0,2.11,1.3.1,centos
ADAM-prb » 2.3.0,2.10,1.3.1,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.11,1.3.1,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.10,1.6.0,centos completed with result SUCCESS
ADAM-prb » 1.0.4,2.10,1.2.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.3.1,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,1.6.0,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,1.4.1,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.10,1.5.2,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.10,1.4.1,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.10,1.2.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.6.0,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,1.2.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.6.0,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.10,1.5.2,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.11,1.5.2,centos completed with result SUCCESS
ADAM-prb » 1.0.4,2.11,1.2.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.5.2,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,1.3.1,centos completed with result SUCCESS
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@heuermh heuermh modified the milestone: 0.19.0 Feb 18, 2016

Member

heuermh commented Feb 18, 2016

Looks like some methods are missing from Spark version 1.2.1:

[ERROR] /home/jenkins/workspace/ADAM-prb/HADOOP_VERSION/1.0.4/SCALAVER/2.10/SPARK_VERSION/1.2.1/label/centos/adam-core/src/main/scala/org/bdgenomics/adam/rdd/variation/VariationRDDFunctions.scala:48: error: value leftOuterJoin is not a member of org.apache.spark.rdd.RDD[(org.bdgenomics.adam.rich.RichVariant, org.bdgenomics.adam.models.VariantContext)]
[ERROR] possible cause: maybe a semicolon is missing before `value leftOuterJoin'?
[ERROR]       .leftOuterJoin(ann.keyBy(_.getVariant))
[ERROR]        ^
[ERROR] /home/jenkins/workspace/ADAM-prb/HADOOP_VERSION/1.0.4/SCALAVER/2.10/SPARK_VERSION/1.2.1/label/centos/adam-core/src/main/scala/org/bdgenomics/adam/rdd/variation/VariationRDDFunctions.scala:110: error: value sortByKey is not a member of org.apache.spark.rdd.RDD[(org.bdgenomics.adam.models.ReferencePosition, org.bdgenomics.adam.models.VariantContext)]
[ERROR]       keyByPosition.sortByKey()
[ERROR]                     ^
[ERROR] /home/jenkins/workspace/ADAM-prb/HADOOP_VERSION/1.0.4/SCALAVER/2.10/SPARK_VERSION/1.2.1/label/centos/adam-core/src/main/scala/org/bdgenomics/adam/rdd/variation/VariationRDDFunctions.scala:129: error: value saveAsNewAPIHadoopFile is not a member of org.apache.spark.rdd.RDD[(org.apache.hadoop.io.LongWritable, org.seqdoop.hadoop_bam.VariantContextWritable)]
[ERROR]     writableVCs.saveAsNewAPIHadoopFile(
[ERROR]                 ^
[ERROR] /home/jenkins/workspace/ADAM-prb/HADOOP_VERSION/1.0.4/SCALAVER/2.10/SPARK_VERSION/1.2.1/label/centos/adam-core/src/main/scala/org/bdgenomics/adam/rdd/variation/VariationRDDFunctions.scala:143: error: value groupByKey is not a member of org.apache.spark.rdd.RDD[(org.bdgenomics.adam.rich.RichVariant, org.bdgenomics.formats.avro.Genotype)]
[ERROR] possible cause: maybe a semicolon is missing before `value groupByKey'?
[ERROR]       .groupByKey
[ERROR]        ^
[ERROR] four errors found

Member

massie commented Feb 22, 2016

retest this please

Member

massie commented Feb 23, 2016

retest this please

AmplabJenkins commented Feb 23, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1093/

Build result: FAILURE

[...truncated 32 lines...]
Triggering ADAM-prb » 2.3.0,2.10,1.6.0,centos
Triggering ADAM-prb » 2.3.0,2.11,1.2.1,centos
Triggering ADAM-prb » 2.6.0,2.11,1.6.0,centos
Triggering ADAM-prb » 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb » 2.3.0,2.11,1.5.2,centos
ADAM-prb » 1.0.4,2.10,1.2.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.3.1,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.10,1.5.2,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.10,1.6.0,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.10,1.2.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.3.1,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,1.6.0,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.11,1.3.1,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.10,1.4.1,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,1.4.1,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.10,1.3.1,centos completed with result SUCCESS
ADAM-prb » 1.0.4,2.11,1.2.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.5.2,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.10,1.6.0,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,1.2.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.6.0,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.11,1.5.2,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,1.5.2,centos completed with result SUCCESS
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

Member

andrewmchen commented Feb 23, 2016

Hi @massie!
Before you commit this, there seems to be a small problem with this code. With the change, my VCFs are now being sorted, but in a different order from the convention that GATK follows. For example, when evaluating my variants I got:

##### ERROR   /home/eecs/amchen/data/working/NA12878.gmm.vcf contigs = [1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, 22, 3, 4, 5, 6, 7, 8, 9, GL000191.1, GL000192.1, GL000193.1, GL000194.1, GL000195.1, GL000196.1, GL000197.1, GL000198.1, GL000199.1, GL000200.1, GL000201.1, GL000202.1, GL000203.1, GL000204.1, GL000205.1, GL000206.1, GL000207.1, GL000208.1, GL000209.1, GL000210.1, GL000211.1, GL000212.1, GL000213.1, GL000214.1, GL000215.1, GL000216.1, GL000217.1, GL000218.1, GL000219.1, GL000220.1, GL000221.1, GL000222.1, GL000223.1, GL000224.1, GL000225.1, GL000226.1, GL000227.1, GL000228.1, GL000229.1, GL000230.1, GL000231.1, GL000232.1, GL000233.1, GL000234.1, GL000235.1, GL000236.1, GL000237.1, GL000238.1, GL000239.1, GL000240.1, GL000241.1, GL000242.1, GL000243.1, GL000244.1, GL000245.1, GL000246.1, GL000247.1, GL000248.1, GL000249.1, MT, NC_007605, X, Y, hs37d5]
##### ERROR   reference contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, GL000207.1, GL000226.1, GL000229.1, GL000231.1, GL000210.1, GL000239.1, GL000235.1, GL000201.1, GL000247.1, GL000245.1, GL000197.1, GL000203.1, GL000246.1, GL000249.1, GL000196.1, GL000248.1, GL000244.1, GL000238.1, GL000202.1, GL000234.1, GL000232.1, GL000206.1, GL000240.1, GL000236.1, GL000241.1, GL000243.1, GL000242.1, GL000230.1, GL000237.1, GL000233.1, GL000204.1, GL000198.1, GL000208.1, GL000191.1, GL000227.1, GL000228.1, GL000214.1, GL000221.1, GL000209.1, GL000218.1, GL000220.1, GL000213.1, GL000211.1, GL000199.1, GL000217.1, GL000216.1, GL000215.1, GL000205.1, GL000219.1, GL000224.1, GL000223.1, GL000195.1, GL000212.1, GL000222.1, GL000200.1, GL000193.1, GL000194.1, GL000225.1, GL000192.1, NC_007605, hs37d5]

I'll try to put in a fix, but the second part with the GL contigs confuses me. It seems like they are sorted neither alphabetically nor numerically.
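
For illustration, the first list in the error above is what a plain string sort of contig names produces ("10" sorts before "2"), while the second follows the order in which the reference lists its contigs. A minimal sketch, in Python rather than the project's Scala, of ordering contigs by their position in a reference list: the function names are hypothetical, and the sample lists are short excerpts of the two orderings above, not the full sets.

```python
def reference_rank(reference_contigs):
    """Map each contig name to its index in the reference ordering."""
    return {name: i for i, name in enumerate(reference_contigs)}


def sort_by_reference(contigs, reference_contigs):
    """Sort contigs to match the reference order.

    Contigs missing from the reference are placed last, ordered by name.
    """
    rank = reference_rank(reference_contigs)
    unknown = len(rank)
    return sorted(contigs, key=lambda c: (rank.get(c, unknown), c))


# Short excerpts of the two orderings reported in the error message.
reference = ["1", "2", "3", "10", "X", "Y", "MT", "GL000207.1", "GL000226.1"]
vcf_contigs = ["1", "10", "2", "3", "GL000207.1", "GL000226.1", "MT", "X", "Y"]

lexicographic = sorted(vcf_contigs)  # plain string comparison
by_reference = sort_by_reference(vcf_contigs, reference)
```

Here `lexicographic` reproduces the VCF ordering and `by_reference` reproduces the reference ordering, which is consistent with the GL contigs looking unsorted: they simply follow whatever order the reference lists them in.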

Member

fnothaft commented Feb 24, 2016

@andrewmchen neither sort order is wrong per se, but it means that the VCF you are recalibrating and the reference are sorted differently. Which reference build are you using? HG19 or GRCh38? If GRCh38, I can send you a link to a reference build with the same sort order. If HG19, I'll be generating a properly sorted reference tomorrow.

Also, see #952 and associated issues.

Member

andrewmchen commented Feb 24, 2016

Oh right, I'm using a reference file called hs37d5.fa. Not sure which of the ones above it corresponds to. Also, thanks for offering to send me the correct reference! However, since most of my other stuff is using this reference (I think?!), it would be safer to stick with this one. I'm currently planning to just use Picard to sort my VCF.

In the meantime, I can try taking a look at issue #952 if no one else is.

Member

fnothaft commented Feb 24, 2016

hs37 is a specific build of b37 (which is approximately HG19). The problem will be your VCF header lines and associated bogus stuff.

Member

massie commented Feb 24, 2016

Thanks for finding this incongruity, @andrewmchen.

As an aside, one of our general community rules is that the person submitting the PR isn't allowed to merge it. Only once you and others feel it's ready will it be merged by someone else on the team.

Let me know if you need any help "sorting" this out.

Member

heuermh commented Feb 24, 2016

Jenkins, retest this please.

AmplabJenkins commented Feb 24, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1096/
Test PASSed.

Member

heuermh commented Feb 24, 2016

Thank you, @massie! Merged manually.

@heuermh heuermh closed this Feb 24, 2016
