Skip to content

Speed up read.FCS() #280

@SamGG

Description

@SamGG

In my hands, most current FCS files use float (or double?) representation for intensity. I think that reading their matrix of intensities uses the following call.

flowCore/R/IO.R

Lines 906 to 907 in e898b05

readBin(con = con, what = dattype, n = count, size = size,
signed = signed, endian=endian)

readBin() is doing so much stuff that it makes reading a matrix from file very slow. As a workaround, I replaced the matrix reading with a Cpp code. In my workflow, reading and sampling 69 FCS goes from 180 sec to 30 sec. I use ff <- read.FCS(..., which.lines = 1:5) to get a flowFrame and all information from a FCS file. Then I get intensities matrix using Cpp code and replace the exprs(ff) of the flowFrame with the matrix. Unfortunately, it is sound difficult/dangereous to replace flowCore code above by infering the C file pointer from the R connection.
The following Cpp code works fine for float and should be adapted for double (or integers).
I think the great developpers of flowCore will perform a better job than me.

#include <Rcpp.h>
#include <fstream>
#include <stdexcept>

using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix read_fcs_data(
        const std::string& file_path,
        long byte_offset,
        long n_row,
        long n_par,
        bool swap
) {
    // open file in binary mode
    std::ifstream con(file_path, std::ios::binary);
    if (!con.is_open())
        stop("Cannot open file: " + file_path);
    
    // seek to byte offset
    con.seekg(byte_offset, std::ios::beg);
    if (con.fail())
        stop("Failed to seek to offset " + std::to_string(byte_offset));
    
    // read n_vals float32 values into a flat buffer
    int n_vals = n_row * n_par;
    std::vector<float> buf(n_vals);
    con.read(reinterpret_cast<char*>(buf.data()), n_vals * sizeof(float));
    if (con.fail())
        stop("Failed to read " + std::to_string(n_vals) + " values from file");
    
    // swap bytes if file endian differs from host
    if (swap) {
        for (int k = 0; k < n_vals; k++) {
            char* p = reinterpret_cast<char*>(&buf[k]);
            std::swap(p[0], p[3]);
            std::swap(p[1], p[2]);
        }
    }
    
    // fill matrix row-by-row (byrow = TRUE means values are row-major)
    NumericMatrix data_mat(n_row, n_par);
    for (int row = 0; row < n_row; row++) {
        for (int col = 0; col < n_par; col++) {
            data_mat(row, col) = static_cast<double>(buf[row * n_par + col]);
        }
    }
    
    return data_mat;
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions