Refactor ADAMContext loadXxx methods for consistency #1481
```
$ javap -classpath adam-core/target/adam-core_2.10-0.23.0-SNAPSHOT.jar org.bdgenomics.adam.rdd.ADAMContext | grep load | grep -v default | sort
public <T extends org.apache.avro.specific.SpecificRecordBase> scala.collection.Seq<T> org$bdgenomics$adam$rdd$ADAMContext$$loadAvro(java.lang.String, org.apache.avro.Schema, scala.reflect.ClassTag<T>);
public <T> org.apache.spark.rdd.RDD<T> loadParquet(java.lang.String, scala.Option<org.apache.parquet.filter2.predicate.FilterPredicate>, scala.Option<org.apache.avro.Schema>, scala.Function1<T, org.apache.avro.specific.SpecificRecord>, scala.reflect.Manifest<T>);
public org.bdgenomics.adam.models.RecordGroupDictionary loadAvroReadGroupMetadata(java.lang.String);
public org.bdgenomics.adam.models.RecordGroupDictionary loadBamReadGroups(htsjdk.samtools.SAMFileHeader);
public org.bdgenomics.adam.models.RecordGroupDictionary org$bdgenomics$adam$rdd$ADAMContext$$loadAvroReadGroupMetadataFile(java.lang.String);
public org.bdgenomics.adam.models.SequenceDictionary loadAvroSequences(java.lang.String);
public org.bdgenomics.adam.models.SequenceDictionary loadBamDictionary(htsjdk.samtools.SAMFileHeader);
public org.bdgenomics.adam.models.SequenceDictionary org$bdgenomics$adam$rdd$ADAMContext$$loadAvroSequencesFile(java.lang.String);
public org.bdgenomics.adam.rdd.contig.NucleotideContigFragmentRDD loadFasta(java.lang.String, long);
public org.bdgenomics.adam.rdd.contig.NucleotideContigFragmentRDD loadParquetContigFragments(java.lang.String, scala.Option<org.apache.parquet.filter2.predicate.FilterPredicate>, scala.Option<org.apache.avro.Schema>);
public org.bdgenomics.adam.rdd.contig.NucleotideContigFragmentRDD loadSequences(java.lang.String, scala.Option<org.apache.avro.Schema>, long);
public org.bdgenomics.adam.rdd.feature.CoverageRDD loadCoverage(java.lang.String);
public org.bdgenomics.adam.rdd.feature.CoverageRDD loadParquetCoverage(java.lang.String, scala.Option<org.apache.parquet.filter2.predicate.FilterPredicate>);
public org.bdgenomics.adam.rdd.feature.FeatureRDD loadBed(java.lang.String, scala.Option<org.apache.spark.storage.StorageLevel>, scala.Option<java.lang.Object>, htsjdk.samtools.ValidationStringency);
public org.bdgenomics.adam.rdd.feature.FeatureRDD loadFeatures(java.lang.String, scala.Option<org.apache.spark.storage.StorageLevel>, scala.Option<org.apache.avro.Schema>, scala.Option<java.lang.Object>);
public org.bdgenomics.adam.rdd.feature.FeatureRDD loadGff3(java.lang.String, scala.Option<org.apache.spark.storage.StorageLevel>, scala.Option<java.lang.Object>, htsjdk.samtools.ValidationStringency);
public org.bdgenomics.adam.rdd.feature.FeatureRDD loadGtf(java.lang.String, scala.Option<org.apache.spark.storage.StorageLevel>, scala.Option<java.lang.Object>, htsjdk.samtools.ValidationStringency);
public org.bdgenomics.adam.rdd.feature.FeatureRDD loadIntervalList(java.lang.String, scala.Option<java.lang.Object>, htsjdk.samtools.ValidationStringency);
public org.bdgenomics.adam.rdd.feature.FeatureRDD loadNarrowPeak(java.lang.String, scala.Option<org.apache.spark.storage.StorageLevel>, scala.Option<java.lang.Object>, htsjdk.samtools.ValidationStringency);
public org.bdgenomics.adam.rdd.feature.FeatureRDD loadParquetFeatures(java.lang.String, scala.Option<org.apache.parquet.filter2.predicate.FilterPredicate>, scala.Option<org.apache.avro.Schema>);
public org.bdgenomics.adam.rdd.fragment.FragmentRDD loadFragments(java.lang.String);
public org.bdgenomics.adam.rdd.fragment.FragmentRDD loadInterleavedFastqAsFragments(java.lang.String);
public org.bdgenomics.adam.rdd.fragment.FragmentRDD loadParquetFragments(java.lang.String, scala.Option<org.apache.parquet.filter2.predicate.FilterPredicate>, scala.Option<org.apache.avro.Schema>);
public org.bdgenomics.adam.rdd.read.AlignmentRecordRDD loadAlignments(java.lang.String, scala.Option<org.apache.avro.Schema>, scala.Option<java.lang.String>, scala.Option<java.lang.String>, htsjdk.samtools.ValidationStringency);
public org.bdgenomics.adam.rdd.read.AlignmentRecordRDD loadBam(java.lang.String, htsjdk.samtools.ValidationStringency);
public org.bdgenomics.adam.rdd.read.AlignmentRecordRDD loadFastq(java.lang.String, scala.Option<java.lang.String>, scala.Option<java.lang.String>, htsjdk.samtools.ValidationStringency);
public org.bdgenomics.adam.rdd.read.AlignmentRecordRDD loadIndexedBam(java.lang.String, org.bdgenomics.adam.models.ReferenceRegion);
public org.bdgenomics.adam.rdd.read.AlignmentRecordRDD loadIndexedBam(java.lang.String, scala.collection.Iterable<org.bdgenomics.adam.models.ReferenceRegion>, scala.Predef$DummyImplicit);
public org.bdgenomics.adam.rdd.read.AlignmentRecordRDD loadInterleavedFastq(java.lang.String);
public org.bdgenomics.adam.rdd.read.AlignmentRecordRDD loadPairedFastq(java.lang.String, java.lang.String, scala.Option<java.lang.String>, htsjdk.samtools.ValidationStringency);
public org.bdgenomics.adam.rdd.read.AlignmentRecordRDD loadParquetAlignments(java.lang.String, scala.Option<org.apache.parquet.filter2.predicate.FilterPredicate>, scala.Option<org.apache.avro.Schema>);
public org.bdgenomics.adam.rdd.read.AlignmentRecordRDD loadUnpairedFastq(java.lang.String, scala.Option<java.lang.String>, boolean, boolean, htsjdk.samtools.ValidationStringency);
public org.bdgenomics.adam.rdd.variant.GenotypeRDD loadGenotypes(java.lang.String, scala.Option<org.apache.avro.Schema>, htsjdk.samtools.ValidationStringency);
public org.bdgenomics.adam.rdd.variant.GenotypeRDD loadParquetGenotypes(java.lang.String, scala.Option<org.apache.parquet.filter2.predicate.FilterPredicate>, scala.Option<org.apache.avro.Schema>);
public org.bdgenomics.adam.rdd.variant.VariantContextRDD loadIndexedVcf(java.lang.String, org.bdgenomics.adam.models.ReferenceRegion);
public org.bdgenomics.adam.rdd.variant.VariantContextRDD loadIndexedVcf(java.lang.String, scala.collection.Iterable<org.bdgenomics.adam.models.ReferenceRegion>, htsjdk.samtools.ValidationStringency, scala.Predef$DummyImplicit);
public org.bdgenomics.adam.rdd.variant.VariantContextRDD loadVcf(java.lang.String, htsjdk.samtools.ValidationStringency);
public org.bdgenomics.adam.rdd.variant.VariantRDD loadParquetVariants(java.lang.String, scala.Option<org.apache.parquet.filter2.predicate.FilterPredicate>, scala.Option<org.apache.avro.Schema>);
public org.bdgenomics.adam.rdd.variant.VariantRDD loadVariants(java.lang.String, scala.Option<org.apache.avro.Schema>, htsjdk.samtools.ValidationStringency);
public org.bdgenomics.adam.util.ReferenceFile loadReferenceFile(java.lang.String, long);
public scala.Tuple3<org.bdgenomics.adam.models.SequenceDictionary, scala.collection.Seq<org.bdgenomics.formats.avro.Sample>, scala.collection.Seq<htsjdk.variant.vcf.VCFHeaderLine>> loadVcfMetadata(java.lang.String);
public scala.Tuple3<org.bdgenomics.adam.models.SequenceDictionary, scala.collection.Seq<org.bdgenomics.formats.avro.Sample>, scala.collection.Seq<htsjdk.variant.vcf.VCFHeaderLine>> org$bdgenomics$adam$rdd$ADAMContext$$loadSingleVcfMetadata(java.lang.String);
public scala.collection.Seq<org.bdgenomics.formats.avro.Sample> loadAvroSampleMetadata(java.lang.String);
```
Per https://github.com/databricks/scala-style-guide#default-parameter-values perhaps it would be better to not use default parameter values at all.
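To illustrate the style-guide concern, a minimal sketch (method and type names here are hypothetical, not taken from ADAMContext): instead of a default argument, the guide prefers explicit overloads, where the shorter form delegates to the fullest form:

```scala
// Hypothetical example only: the style guide discourages default
// parameter values because they interact badly with overloading
// and with Java interoperability.

// With a default value (discouraged by the guide):
def loadThing(pathName: String, stringency: String = "STRICT"): String =
  s"loading $pathName with $stringency"

// Preferred: explicit overloads, the shorter delegating to the longer.
def loadThingExplicit(pathName: String): String =
  loadThingExplicit(pathName, "STRICT")

def loadThingExplicit(pathName: String, stringency: String): String =
  s"loading $pathName with $stringency"
```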
These two overloads both need default parameter values, but the Scala compiler disallows default arguments on more than one overloaded alternative of the same method:

```scala
def loadIndexedVcf(
  pathName: String,
  viewRegion: ReferenceRegion,
  stringency: ValidationStringency = ValidationStringency.STRICT): VariantContextRDD = {
  loadIndexedVcf(pathName, Iterable(viewRegion))
}

def loadIndexedVcf(
  pathName: String,
  viewRegions: Iterable[ReferenceRegion],
  stringency: ValidationStringency = ValidationStringency.STRICT)(implicit s: DummyImplicit): VariantContextRDD = {
  // ...
```
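A minimal sketch of the restriction and one possible workaround (names and types simplified; this is not the actual ADAMContext code). The compiler rejects a class in which two overloads of the same method both declare default arguments ("multiple overloaded alternatives ... define default arguments"), so one option is to keep the default on a single overload and have the other delegate with all arguments spelled out:

```scala
// Sketch only: simplified stand-in types, not the real ADAM API.
// The Scala compiler rejects two overloads that both define defaults,
// so only the Iterable overload carries the default argument here.
class Loader {
  // Overload 1: single region, no default; delegates to the Iterable
  // form and forwards the stringency explicitly.
  def load(path: String, region: String, stringency: String): Seq[String] =
    load(path, Iterable(region), stringency)

  // Overload 2: the only overload that declares a default argument.
  def load(path: String,
           regions: Iterable[String],
           stringency: String = "STRICT"): Seq[String] =
    regions.map(r => s"$path:$r:$stringency").toSeq
}
```

Note that making the single-region overload take `stringency` without a default keeps call sites like `load(path, region)` from compiling, which is part of the trade-off the style guide accepts in exchange for unambiguous overload resolution.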
Suggestions:

- `@param` doc strings should be consistent
- `@return` doc strings should be consistent
- `log.info` messages across load methods should be consistent
- Method parameters of type `Path` should be named `path`
- Method parameters of type `String` should be named `pathName`
- Method parameters of type `Option` should be named `optXxx`
- Include a `StorageLevel` parameter if `cache()` is called
- All load methods should include a `stringency` parameter with a `STRICT` default
- All load methods should handle lenient and silent stringency consistently
- All load methods should instrument the RDD if `Metrics.isRecording`
- All load methods should be wrapped in `LoadXxx.time { ... }` metrics
- Parquet load methods should include `optPredicate` and `optProjection` parameters with `None` defaults
- Type-guessing methods should use methods similar to `isVcfExt` for consistency
- Type-guessing methods should explicitly document support for compressed files
- Type-guessing load methods should include `optPredicate` and `optProjection` parameters with `None` defaults