
Single-pass file reading #138

@Jelinek-J


Consider a simple script test.R:
#! /usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
Biostrings::readDNAStringSet(args[1])

and a simple FASTA file, test.fa:
>test
ACGT

If I run it directly, $ ./test.R test.fa, it works correctly and prints:
DNAStringSet object of length 1:
    width seq                 names
[1]     4 ACGT                test

But if I use process substitution (which I need because the file has to be preprocessed first), $ ./test.R <( cat test.fa ), it prints only:
DNAStringSet object of length 0
Warning messages:
1: In file(fp) : using 'raw = TRUE' because '/dev/fd/63' is a fifo or pipe
2: In file(fp) : using 'raw = TRUE' because '/dev/fd/63' is a fifo or pipe

or, if the FASTA file is at least 4096 bytes long, it throws an error:
Error in .Call2("fasta_index", filexp_list, nrec, skip, seek.first.rec, :
reading FASTA file /dev/fd/63: ">" expected at beginning of line 1
Calls: <Anonymous> -> .read_XStringSet -> fasta.index -> .Call2
In addition: Warning messages:
1: In file(fp) : using 'raw = TRUE' because '/dev/fd/63' is a fifo or pipe
2: In file(fp) : using 'raw = TRUE' because '/dev/fd/63' is a fifo or pipe
Execution halted

From the warnings (and the fasta.index call in the traceback), I gather that Biostrings reads files in two passes, which is not possible with process substitution because a pipe cannot be rewound and read a second time. Is there any combination of arguments that forces single-pass loading? If not, would it be possible to implement such an option?
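Until a single-pass option exists, one possible workaround is to write the preprocessed output to a regular temporary file and pass that path instead of /dev/fd/63. This is a sketch under the assumption that the two-pass reader works on any seekable file; it is not a Biostrings feature. The helper name materialize is hypothetical, and the sketch relies only on bash and mktemp.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Biostrings): run a preprocessing command,
# save its stdout to a regular temporary file, and print that file's path.
# Unlike a /dev/fd/NN pipe, a regular file can be read more than once.
materialize() {
  local tmp
  tmp=$(mktemp) || return 1
  # Run the command given as arguments; its output goes to the temp file.
  if ! "$@" > "$tmp"; then
    # Preprocessing failed: clean up and report failure to the caller.
    rm -f "$tmp"
    return 1
  fi
  printf '%s\n' "$tmp"
}

# Usage (assumes test.R and test.fa from above; replace `cat` with the
# real preprocessing command):
#   fa=$(materialize cat test.fa) || exit 1
#   trap 'rm -f "$fa"' EXIT
#   ./test.R "$fa"
```

The trade-off is that the preprocessed data is written to disk once; the trap in the usage comment deletes the temporary file when the calling script exits.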
