Reading whole file or FileIO integration ? #17

Open
jonathanBieler opened this issue Dec 9, 2022 · 2 comments
Labels
enhancement New feature or request I/O File reading / writing

Comments

jonathanBieler commented Dec 9, 2022

In 99% of my use cases I just want to read the whole BED file and get a vector of records. Doing so requires quite a bit of boilerplate:

# Import the BED module.
using BED

# Open a BED file.
reader = open(BED.Reader, "data.bed")

# Iterate over records.
for record in reader
    # Do something on record (see Accessors section).
    chrom = BED.chrom(record)
    # ...
end

# Finally, close the reader.
close(reader)

This is boilerplate that every user has to write (possibly several times). By comparison, in Python you can do pr.read_bed(path). This seems like an important usability issue.

The solution would be either to add an internal BED.load("file.bed") or to integrate the FileIO interface. I don't have a strong preference, but I would do the same for other "small" file formats that typically fit in memory, such as VCF, so it would be better to be consistent about it. Note that FileIO also has a streaming interface for large files, so it could also be used for BAM and FASTQ files.

Member

kescobo commented Dec 9, 2022

There was a lot of discussion of a similar nature over at FASTX.jl (see e.g. BioJulia/FASTX.jl#76), and I think @jakobnissen has started putting in some work on that in BioGenerics.jl.

In short, you are completely correct 😉

kescobo added the "enhancement" (New feature or request) and "I/O" (File reading / writing) labels on Dec 9, 2022
Member

CiaranOMara commented Dec 11, 2022

I'm for FileIO integration, but I think it should be done in a new BEDFiles.jl package.

As a result of @jakobnissen's work, it's possible to load all records with the following:

records = open(collect, BED.Reader, "data.bed")

This approach also closes the reader.

For completeness, below is a longhand variant using the do-block syntax.

records = open(BED.Reader, "data.bed") do reader
    return collect(reader)
end
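
For anyone who wants something closer to the BED.load("file.bed") call proposed earlier in the thread, the one-liner above could be wrapped in a small helper. This is only a sketch under assumptions: `load_bed` is a hypothetical name and not a function that BED.jl provides, and it assumes the whole file fits comfortably in memory.

```julia
using BED

# Hypothetical convenience wrapper; `load_bed` is NOT part of BED.jl.
# `open(collect, BED.Reader, path)` opens the reader, collects every
# record into a vector, and closes the reader afterwards.
load_bed(path::AbstractString) = open(collect, BED.Reader, path)

# Usage (assuming a file named "data.bed" exists):
#   records = load_bed("data.bed")
#   chrom = BED.chrom(first(records))
```

Keeping the helper a thin wrapper around `open(collect, ...)` means it inherits the reader's existing behaviour, including closing the file.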
