New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAM-937] Adding check for aligned read predicate or limit projection flags and non-parquet input path #938

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
4 participants
@heuermh
Member

heuermh commented Feb 11, 2016

Fixes #937

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Feb 11, 2016

Member

I'm not sure this is comprehensive enough, e.g. sometimes even Parquet files won't have .seqdict files

$ ./bin/adam-submit fasta2adam \
  adam-cli/src/test/resources/contigs.fa contigs.contig.adam

$ ./bin/adam-submit transform -aligned_read_predicate \
  contigs.contig.adam output.adam

Command body threw exception:
java.io.FileNotFoundException: File contigs.contig.adam.seqdict does not exist
  Exception in thread "main" java.io.FileNotFoundException:
    File contigs.contig.adam.seqdict does not exist
Member

heuermh commented Feb 11, 2016

I'm not sure this is comprehensive enough, e.g. sometimes even Parquet files won't have .seqdict files

$ ./bin/adam-submit fasta2adam \
  adam-cli/src/test/resources/contigs.fa contigs.contig.adam

$ ./bin/adam-submit transform -aligned_read_predicate \
  contigs.contig.adam output.adam

Command body threw exception:
java.io.FileNotFoundException: File contigs.contig.adam.seqdict does not exist
  Exception in thread "main" java.io.FileNotFoundException:
    File contigs.contig.adam.seqdict does not exist
@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Feb 11, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1079/
Test PASSed.

AmplabJenkins commented Feb 11, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1079/
Test PASSed.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Feb 11, 2016

Member

Sorry; should be checking that the file is a Parquet file of alignment records.

Alternatively, we could make this a ValidationStringency effect. E.g., if you have ValidationStringency.LENIENT, we give a RecordGroupDictionary.empty/SequenceDictionary.empty if they're missing.

Member

fnothaft commented Feb 11, 2016

Sorry; should be checking that the file is a Parquet file of alignment records.

Alternatively, we could make this a ValidationStringency effect. E.g., if you have ValidationStringency.LENIENT, we give a RecordGroupDictionary.empty/SequenceDictionary.empty if they're missing.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Feb 13, 2016

Member

Sorry; should be checking that the file is a Parquet file of alignment records.

What do you mean by this, flipping the logic of isNonParquet around?

Member

heuermh commented Feb 13, 2016

Sorry; should be checking that the file is a Parquet file of alignment records.

What do you mean by this, flipping the logic of isNonParquet around?

@heuermh heuermh changed the title from Adding check for aligned read predicate or limit projection flags and non-parquet input path to [ADAM-937] Adding check for aligned read predicate or limit projection flags and non-parquet input path Feb 13, 2016

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Feb 13, 2016

Member

What do you mean by this, flipping the logic of isNonParquet around?

Oh, I just meant that it shouldn't check for any Parquet dataset. Rather, the check should be more specific and check that it is a Parquet dataset of AlignmentRecords.

Member

fnothaft commented Feb 13, 2016

What do you mean by this, flipping the logic of isNonParquet around?

Oh, I just meant that it shouldn't check for any Parquet dataset. Rather, the check should be more specific and check that it is a Parquet dataset of AlignmentRecords.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Feb 13, 2016

Member

Can't do that by filename check, so it would have to come later, instead of fail fast in the cli class.

Member

heuermh commented Feb 13, 2016

Can't do that by filename check, so it would have to come later, instead of fail fast in the cli class.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Feb 13, 2016

Member

Ah, yes, fair. I think we suggest "*.contig.adam" as a convention for files of NucleotideContigFragments, but we don't enforce it anywhere. We can patch that later, if needed.

Do you want to do any of the ValidationStringency stuff I suggested earlier, or shall we skip that for now and possibly come back to it later?

Member

fnothaft commented Feb 13, 2016

Ah, yes, fair. I think we suggest "*.contig.adam" as a convention for files of NucleotideContigFragments, but we don't enforce it anywhere. We can patch that later, if needed.

Do you want to do any of the ValidationStringency stuff I suggested earlier, or shall we skip that for now and possibly come back to it later?

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Feb 13, 2016

Member

I think we suggest "*.contig.adam" as a convention for files of NucleotideContigFragments, but we don't enforce it anywhere.

Yep, I cribbed from here
https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/rdd/ADAMContext.scala#L858

I can add the check for *.contig.adam if useful. I think validation strategy might be overkill.

Member

heuermh commented Feb 13, 2016

I think we suggest "*.contig.adam" as a convention for files of NucleotideContigFragments, but we don't enforce it anywhere.

Yep, I cribbed from here
https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/rdd/ADAMContext.scala#L858

I can add the check for *.contig.adam if useful. I think validation strategy might be overkill.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Feb 13, 2016

Member

I can add the check for *.contig.adam if useful. I think validation strategy might be overkill.

This is probably sufficient for now. We can add the two later if need be. I'll merge this shortly.

Member

fnothaft commented Feb 13, 2016

I can add the check for *.contig.adam if useful. I think validation strategy might be overkill.

This is probably sufficient for now. We can add the two later if need be. I'll merge this shortly.

@heuermh heuermh modified the milestone: 0.19.0 Feb 16, 2016

@massie

This comment has been minimized.

Show comment
Hide comment
@massie

massie Feb 24, 2016

Member

Thanks, Michael.

Member

massie commented Feb 24, 2016

Thanks, Michael.

@massie

This comment has been minimized.

Show comment
Hide comment
@massie

massie Feb 24, 2016

Member

Merged manually.

Member

massie commented Feb 24, 2016

Merged manually.

@massie massie closed this Feb 24, 2016

@heuermh heuermh deleted the heuermh:issue-937 branch Jun 28, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment