All file-based input methods should support running on directories, compressed files, and wildcards #993

Closed
heuermh opened this Issue Apr 7, 2016 · 4 comments

heuermh commented Apr 7, 2016

Following up on a question frequently asked on IRC: would it make sense to follow Spark's lead here?

All of Spark’s file-based input methods, including textFile, support running on directories, compressed files, and wildcards as well. For example, you can use textFile("/my/directory"), textFile("/my/directory/*.txt"), and textFile("/my/directory/*.gz").

https://spark.apache.org/docs/latest/programming-guide.html#using-the-shell

Wildcard globs are supported in a few places (e.g. vcf2adam, perhaps others), and compressed files are supported where Hadoop-BAM supports them. I've never tried using a directory.
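As a rough illustration of the behaviour being requested, the sketch below resolves a single path argument that may be a directory, a glob, or a plain file, and reads gzip-compressed files transparently. It is a local-filesystem sketch in plain Python, not ADAM's or Spark's implementation; `elaborate_paths` and `read_lines` are hypothetical names, and skipping entries whose names start with `_` or `.` is an assumption borrowed from Hadoop's hidden-file convention.

```python
import glob
import gzip
import os


def elaborate_paths(path):
    """Expand a directory, glob pattern, or plain file path into a sorted list of files.

    Hypothetical helper for illustration only.
    """
    if os.path.isdir(path):
        # A directory stands for every entry directly inside it.
        candidates = [os.path.join(path, name) for name in os.listdir(path)]
    else:
        # glob.glob returns [path] for an existing plain path and [] otherwise.
        candidates = glob.glob(path)

    files = sorted(
        p for p in candidates
        if os.path.isfile(p)
        # Assumption: skip marker/hidden files such as _SUCCESS or .part.crc.
        and not os.path.basename(p).startswith(("_", "."))
    )
    if not files:
        raise FileNotFoundError("No input files match: " + path)
    return files


def read_lines(path):
    """Yield lines from every file matched by `path`, decompressing *.gz on the fly."""
    for file_path in elaborate_paths(path):
        opener = gzip.open if file_path.endswith(".gz") else open
        with opener(file_path, "rt") as handle:
            for line in handle:
                yield line.rstrip("\n")
```

With this, `read_lines("/my/directory")`, `read_lines("/my/directory/*.txt")`, and `read_lines("/my/directory/*.gz")` all go through the same code path, which is the uniformity the Spark documentation describes for `textFile`.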

fnothaft commented Apr 7, 2016

Yeah, I would +1 that.

heuermh commented Apr 7, 2016

The only place I think this might fall down is on paired FASTQ files, or maybe we can rely on convention for that.
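To make the "rely on convention" idea concrete, here is a minimal sketch that pairs FASTQ files matched by a glob or directory listing using only their names. The naming convention is an assumption (a mate number `1` or `2`, optionally prefixed with `R`, just before an optional `_NNN` chunk suffix and the `.fastq`/`.fq` extension); the thread does not say which convention, if any, ADAM would adopt, and `pair_fastq` is a hypothetical name.

```python
import re

# Assumed convention: <prefix>[_ or .][R]<1|2>[_<digits>].fastq|.fq[.gz]
_MATE = re.compile(
    r"^(?P<prefix>.*[_.]R?)(?P<mate>[12])"
    r"(?P<suffix>(?:_\d+)?\.(?:fastq|fq)(?:\.gz)?)$"
)


def pair_fastq(paths):
    """Split paths into (first, second) mate pairs and a list of unpaired files."""
    mates = {}
    unpaired = []
    for path in sorted(paths):
        match = _MATE.match(path)
        if match is None:
            # Name carries no mate number, so treat it as a single-ended file.
            unpaired.append(path)
            continue
        # Two files are mates when everything except the mate number is identical.
        key = (match.group("prefix"), match.group("suffix"))
        mates.setdefault(key, {})[match.group("mate")] = path

    pairs = []
    for found in mates.values():
        if "1" in found and "2" in found:
            pairs.append((found["1"], found["2"]))
        else:
            # Only one mate was present; report it rather than guess.
            unpaired.extend(found.values())
    return sorted(pairs), sorted(unpaired)
```

For example, `pair_fastq(["s1_R1_001.fastq.gz", "s1_R2_001.fastq.gz", "s3.fastq"])` pairs the two `s1` files and reports `s3.fastq` as unpaired. Names that fit the pattern in more than one way (such as `sample_1_2.fastq`) are ambiguous under any purely name-based scheme, which is one reason convention alone may fall short.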

fnothaft commented Apr 7, 2016

FASTQ makes me sad, but paired FASTQ makes me sadder.

laserson commented Apr 11, 2016

+1 on supporting this

@fnothaft fnothaft added this to the 0.21.0 milestone Jul 20, 2016

fnothaft added a commit to fnothaft/adam that referenced this issue Aug 18, 2016

[ADAM-993] Support loading files using globs and from directory paths.
Resolves #993.

* Add private helper functions in ADAMContext to elaborate out globs and
  directory paths when loading files.
* Eliminate unused functions for elaborating paths and loading mixtures
  of read files, and some redundant dictionary loading functions.
* Add tests to cover loading directories/globs of:
  * Parquet files
  * BAM files (with/without using indices)
  * VCF files

fnothaft added a commit to fnothaft/adam that referenced this issue Aug 31, 2016

fnothaft added a commit to fnothaft/adam that referenced this issue Sep 1, 2016

@heuermh heuermh closed this in 08950d4 Sep 6, 2016

@heuermh heuermh modified the milestones: 0.21.0, 0.20.0 Oct 13, 2016
