Flexible matrix for 2D data with annotated rows and columns

dgront/datamatrix

datamatrix provides a lightweight and efficient Rust implementation of a two-dimensional matrix of numeric values (f64) with labeled rows and columns. It is particularly suited for datasets where elements are naturally accessed by meaningful names rather than numeric indices. In addition to in-memory construction, the crate offers utilities to read matrices directly from structured text files.

  • Storage of 2D numeric data with row and column labels.

  • Indexing by position or by label.

  • Simple and expressive builder API for constructing matrices.

  • Reading from the following text file formats:

    • Three-column format: (row_label, column_label, value).
    • Single column of values: for square matrices.
    • Indexed format: explicit row/column indices with labels.
  • Optional symmetric filling, automatically populating both (i, j) and (j, i) for symmetric data (e.g., distances or correlations).

  • Transparent reading of compressed files (.gz, .bz2, .xz).
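To make the idea of label-based access and symmetric filling concrete, here is a minimal standard-library sketch. It does not use the crate and is not its implementation: `SymmetricSketch`, `from_records`, and the choice of 0.0 for cells that were never set are assumptions made purely for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a labeled, symmetric matrix (not the crate's type).
struct SymmetricSketch {
    /// Maps each label to its row/column position.
    index: HashMap<String, usize>,
    /// Dense square storage; cells that were never set stay at 0.0 (an assumption).
    data: Vec<Vec<f64>>,
}

impl SymmetricSketch {
    /// Builds the matrix from (row_label, column_label, value) records.
    fn from_records(records: &[(&str, &str, f64)]) -> Self {
        // First pass: give every label an index in order of first appearance.
        let mut index: HashMap<String, usize> = HashMap::new();
        for (a, b, _) in records {
            for label in [a, b] {
                let next = index.len();
                index.entry(label.to_string()).or_insert(next);
            }
        }
        // Second pass: write each value to both (i, j) and (j, i).
        let n = index.len();
        let mut data = vec![vec![0.0; n]; n];
        for (a, b, v) in records {
            let (i, j) = (index[*a], index[*b]);
            data[i][j] = *v;
            data[j][i] = *v;
        }
        Self { index, data }
    }

    /// Looks a value up by labels; returns None if either label is unknown.
    fn get_by_label(&self, row: &str, col: &str) -> Option<f64> {
        let i = *self.index.get(row)?;
        let j = *self.index.get(col)?;
        Some(self.data[i][j])
    }
}

fn main() {
    let records = [("Alice", "Bob", 1.5), ("Alice", "Carol", 2.0)];
    let m = SymmetricSketch::from_records(&records);
    // Only (Alice, Bob) was given, but the mirrored cell is filled as well.
    println!("{:?}", m.get_by_label("Bob", "Alice"));
}
```

Because every record is written twice, an input file only needs to list each pair once, which is the usual layout for distance or correlation tables.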

The following example_input.tsv input file with 3 columns:

gene sample value
G1 S1 0.81
G1 S2 0.93
G2 S1 0.72
G2 S2 1.00

can be loaded with the code given below:

use data_matrix::{DataMatrixBuilder, Error};

fn main() -> Result<(), Error> {
    let dm = DataMatrixBuilder::new()
        .label_columns(0, 1)          // 0-based column indexes for row and column labels
        .data_column(2)               // numeric data column
        .separator('\t')              // optional; inferred from file extension if omitted
        .symmetric(false)             // this is the default behaviour
        .skip_header(true)            // the first line holds column names, not data
        .from_file("./tests/test_files/example_input.tsv")?;
    println!("Matrix shape: {} × {}", dm.nrows(), dm.ncols());
    // access by labels
    println!("Value at (G1,S1): {:?}", dm.get_by_label("G1", "S1"));
    // access by indexes
    println!("Value at [0,1]: {:?}", dm.get(0, 1));
    Ok(())
}

By default, DataMatrixBuilder expects labels to be in the first two columns and the data in the third. The code above can therefore be shortened to:

use data_matrix::{DataMatrixBuilder, Error};
let matrix = DataMatrixBuilder::new().skip_header(true).from_file("./tests/test_files/example_input.tsv")?;
let value = matrix.get_by_label("G1", "S1");

Single-column, three-column, and five-column input files are supported. Alternatively, a DataMatrix struct can be created from raw data.
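The single-column format is limited to square matrices because the dimension has to be inferred from the number of values alone. The sketch below shows that inference using only the standard library. The helper `square_from_flat` is hypothetical, and the row-major ordering is an assumption, not a documented property of the crate.

```rust
/// Reshapes a flat list of n*n values into an n x n matrix.
/// Row-major order is assumed here; returns None if the length is not a perfect square.
fn square_from_flat(values: &[f64]) -> Option<Vec<Vec<f64>>> {
    if values.is_empty() {
        // An empty input maps to an empty 0 x 0 matrix.
        return Some(Vec::new());
    }
    // Infer the dimension and verify that it matches the input length exactly.
    let n = (values.len() as f64).sqrt().round() as usize;
    if n * n != values.len() {
        return None;
    }
    // Each consecutive chunk of n values becomes one row.
    Some(values.chunks(n).map(|row| row.to_vec()).collect())
}

fn main() {
    let flat = [0.0, 1.5, 1.5, 0.0];
    println!("{:?}", square_from_flat(&flat));
}
```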

Add the following lines to your Cargo.toml file and let Cargo do the rest:

[dependencies]
datamatrix = "0.2"

The project also provides Python bindings to the datamatrix crate, so it can be used in Python scripts as shown below:

from datamatrix import DataMatrixBuilder

dmatrix = (DataMatrixBuilder()
    .label_columns(0, 1)
    .data_column(4)
    .index_columns(2, 3)
    .symmetric(True)
    .from_file("../../../tests/test_files/five_columns_short.txt"))
assert dmatrix.ncols() == 3
assert dmatrix.get_by_label("Bob", "Alice") == 1.5

You need maturin, which runs in a virtual environment, to compile the datamatrix Python module. You can use the requirements.txt file provided in ./bindings/python to ease the installation:

cd bindings/python

python3 -m venv .venv-maturin
source .venv-maturin/bin/activate

pip install -U pip
pip install -r requirements.txt

This compiles the Rust extension and installs the Python package into the active venv:

maturin develop --release
python -c "import datamatrix; print(datamatrix.__doc__[:60])"

Build wheels into target/wheels/:

maturin build --release

Licensed under the Apache License, Version 2.0 (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0)
