
Question: How to read a non-scalar string attribute using hdf5-rust #274

Closed
gpcureton opened this issue Mar 15, 2024 · 3 comments

@gpcureton

I've got an HDF5 file with the following structure (viewed with h5dump):

❯ h5dump -n GMTCO_npp_d20181005_t2022358_e2024003_b35959_c20181008035331888329_cspp_dev.h5
HDF5 "GMTCO_npp_d20181005_t2022358_e2024003_b35959_c20181008035331888329_cspp_dev.h5" {
FILE_CONTENTS {
 group      /
 group      /All_Data
 group      /All_Data/VIIRS-MOD-GEO-TC_All
 dataset    /All_Data/VIIRS-MOD-GEO-TC_All/Height
 dataset    /All_Data/VIIRS-MOD-GEO-TC_All/Latitude
 dataset    /All_Data/VIIRS-MOD-GEO-TC_All/Longitude
 ...
 group      /Data_Products
 group      /Data_Products/VIIRS-MOD-GEO-TC
 dataset    /Data_Products/VIIRS-MOD-GEO-TC/VIIRS-MOD-GEO-TC_Aggr
 dataset    /Data_Products/VIIRS-MOD-GEO-TC/VIIRS-MOD-GEO-TC_Gran_0
 }
}

I am interested in using the hdf5-rust crate to read string attributes of both the root group / and the dataset /Data_Products/VIIRS-MOD-GEO-TC/VIIRS-MOD-GEO-TC_Gran_0. The h5dump description of the dataset attribute is:

ATTRIBUTE "N_Granule_ID" {
   DATATYPE  H5T_STRING {
      STRSIZE 16;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_ASCII;
      CTYPE H5T_C_S1;
   }
   DATASPACE  SIMPLE { ( 1, 1 ) / ( 1, 1 ) }
   DATA {
   (0,0): "NPP002194429582"
   }
}
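For orientation: with STRSIZE 16 and H5T_STR_NULLTERM, each element of this attribute occupies exactly 16 bytes, and the stored value "NPP002194429582" is 15 ASCII characters followed by one NUL terminator. In other words, it is a fixed-length ASCII string, not a variable-length one. The sketch below models that byte layout in plain Rust, independent of hdf5-rust; `decode_nullterm_ascii` is a hypothetical helper written only for illustration.

```rust
/// Decode one fixed-length, NUL-terminated ASCII element (STRSIZE bytes long).
/// Hypothetical helper for illustration; not part of hdf5-rust.
fn decode_nullterm_ascii(element: &[u8]) -> Result<String, String> {
    // The text ends at the first NUL byte, or fills the whole element if there is none.
    let end = element.iter().position(|&b| b == 0).unwrap_or(element.len());
    let bytes = &element[..end];
    // CSET H5T_CSET_ASCII: every byte must be 7-bit ASCII.
    if !bytes.is_ascii() {
        return Err("element contains non-ASCII bytes".to_string());
    }
    // ASCII bytes are always valid UTF-8, so this conversion cannot fail.
    Ok(String::from_utf8(bytes.to_vec()).expect("ASCII is valid UTF-8"))
}
```

Fed the 16 bytes `b"NPP002194429582\0"`, it returns "NPP002194429582".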

I tried the following...

use anyhow::Result;
use hdf5::types::VarLenUnicode;
use hdf5::File;

fn main() -> Result<()> {
    let filename = "GMTCO_npp_d20181005_t2022358_e2024003_b35959_c20181008035331888329_cspp_dev.h5";
    let file = File::open(filename)?;

    // Open the granule dataset and its (1, 1) string attribute.
    let dataset = file.dataset("Data_Products/VIIRS-MOD-GEO-TC/VIIRS-MOD-GEO-TC_Gran_0")?;
    let attribute = dataset.attr("N_Granule_ID")?;
    let _datatype = attribute.dtype()?;
    let _dims = attribute.ndim();

    // Attempt to read the 2-D attribute as variable-length Unicode strings.
    let v_reader = attribute.as_reader();
    let _v = v_reader.read::<VarLenUnicode, ndarray::Dim<[usize; 2]>>()?;

    Ok(())
}

The .read() call then fails with Error: no conversion paths found. I get the same error if I use

let v = attribute.read_2d::<VarLenUnicode>()?;

or

let v = attribute.read_2d::<FixedUnicode<16_usize>>()?;

In each of these cases, v would have the type ArrayBase<OwnedRepr<T>, Dim<[usize; 2]>> (i.e. Array2<T>), where T is the requested string type (VarLenUnicode or FixedUnicode<16>).

Looking through the hdf5-rust examples and tests, I haven't been able to find any examples of reading a non-scalar string attribute with anything like a high-level interface, and I suspect the stumbling block is that the attribute DATASPACE is an array rather than a scalar.

@mulimoen
Collaborator

Could you try reading as FixedAscii<16> instead of VarLenUnicode?
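The suggestion lines up with the DATATYPE block above: STRSIZE 16 means a fixed length, and CSET H5T_CSET_ASCII means ASCII. hdf5-rust has four string types covering fixed vs. variable length and ASCII vs. UTF-8 (FixedAscii<N>, FixedUnicode<N>, VarLenAscii, VarLenUnicode). The sketch below writes that correspondence out as a plain-Rust lookup. The `StrSize` and `Cset` enums and the `rust_string_type` function are hypothetical illustrations, not hdf5-rust API, and the library's own conversion rules may accept or reject other combinations.

```rust
/// h5dump's STRSIZE: a byte count for fixed-length strings, or H5T_VARIABLE.
#[derive(Clone, Copy, Debug)]
enum StrSize {
    Fixed(usize),
    Variable,
}

/// h5dump's CSET field.
#[derive(Clone, Copy, Debug)]
enum Cset {
    Ascii, // H5T_CSET_ASCII
    Utf8,  // H5T_CSET_UTF8
}

/// Name the hdf5-rust string type whose layout matches a stored string datatype.
/// Hypothetical helper: it mirrors the type names, not the library's conversion logic.
fn rust_string_type(size: StrSize, cset: Cset) -> String {
    match (size, cset) {
        (StrSize::Fixed(n), Cset::Ascii) => format!("FixedAscii<{}>", n),
        (StrSize::Fixed(n), Cset::Utf8) => format!("FixedUnicode<{}>", n),
        (StrSize::Variable, Cset::Ascii) => "VarLenAscii".to_string(),
        (StrSize::Variable, Cset::Utf8) => "VarLenUnicode".to_string(),
    }
}
```

For N_Granule_ID, `rust_string_type(StrSize::Fixed(16), Cset::Ascii)` gives "FixedAscii<16>", the type suggested above.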

@gpcureton
Author

gpcureton commented Mar 16, 2024

Thanks for your reply @mulimoen. For the root group attribute

let root_attr = file.attr("Mission_Name")?;

I tried

let v_reader = root_attr.as_reader();
let v = v_reader.read::<FixedAscii<4>, ndarray::Dim<[usize; 2]>>()?;
println!("\tv = {:?}", v);

and

let v = root_attr.read_2d::<FixedAscii<4>>()?;
println!("\tv = {:?}", v);

and they both gave the result

v = [["NPP"]], shape=[1, 1], strides=[1, 1], layout=CFcf (0xf), const ndim=2

and I got to the attribute payload with

if let Some(x) = v.first() {
    println!("\tx = {:?}", x.to_string());
}

which is what I was after. Luckily the attributes I am interested in have fixed sizes which I know ahead of time. I'm going to check a string attribute which is a "vector" of strings, and then close this issue.
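FixedAscii<N> takes its capacity as a const generic parameter, so the size has to be chosen at compile time, which is why knowing the attribute sizes ahead of time matters. The sketch below is a minimal, hypothetical analog of such a type (`FixedStr<N>`), written in plain Rust to show the const-generic idea; it is not hdf5-rust's implementation. It stores up to N ASCII bytes with NUL padding and rejects anything longer.

```rust
/// Minimal, hypothetical fixed-capacity ASCII string (not hdf5-rust's FixedAscii).
#[derive(Clone, Copy, Debug, PartialEq)]
struct FixedStr<const N: usize> {
    buf: [u8; N], // unused trailing bytes are NUL padding
}

impl<const N: usize> FixedStr<N> {
    /// Copy `s` into the buffer; fail if it is not ASCII or needs more than N bytes.
    fn new(s: &str) -> Result<Self, String> {
        if !s.is_ascii() {
            return Err(format!("{:?} is not ASCII", s));
        }
        if s.len() > N {
            return Err(format!("{:?} needs {} bytes but capacity is {}", s, s.len(), N));
        }
        let mut buf = [0u8; N];
        buf[..s.len()].copy_from_slice(s.as_bytes());
        Ok(Self { buf })
    }

    /// View the stored text, stopping at the first NUL byte.
    fn as_str(&self) -> &str {
        let end = self.buf.iter().position(|&b| b == 0).unwrap_or(N);
        // Only ASCII bytes were copied in, so this slice is valid UTF-8.
        std::str::from_utf8(&self.buf[..end]).expect("ASCII is valid UTF-8")
    }

    /// Number of meaningful bytes, excluding NUL padding.
    fn len(&self) -> usize {
        self.as_str().len()
    }
}
```

For example, `FixedStr::<4>::new("NPP")` stores "NPP" plus one byte of NUL padding, whereas `FixedStr::<2>::new("NPP")` returns an error.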

@gpcureton
Author

I was also able to read a "vector" string attribute (something like a list of filenames). The filenames have differing lengths, but as long as the size parameter N of FixedAscii<N> is equal to or greater than the length of the longest filename, it works:

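println!("\n\nReading dataset (15, 1) attribute...\n");

// N_Anc_Filename is a (15, 1) array of fixed-length ASCII strings.
let dset_attr = dataset.attr("N_Anc_Filename")?;

// N = 104 was chosen to be at least as long as the longest filename.
let v = dset_attr.read_2d::<FixedAscii<104>>()?;

println!("\tv = {:?}", v);
println!("\tv.shape() = {:?}", v.shape());
println!("\tv.strides() = {:?}", v.strides());
println!("\tv.ndim() = {:?}", v.ndim());

// Flatten the 2-D array into a Vec of references, in logical (row-major) order.
let arr = v.iter().collect::<Vec<_>>();

// Print each element's Debug representation.
arr.iter()
    .enumerate()
    .for_each(|(idx, val)| println!("\tarr[{}] = {:?}", idx, val));

// Print each element as a String together with its length in bytes.
for (idx, val) in arr.iter().enumerate() {
    println!("\tarr[{}] = {:?} ({})", idx, val.to_string(), val.len());
}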
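In a contiguous read buffer, a (15, 1) attribute of fixed-length strings is 15 × 1 consecutive elements of STRSIZE bytes each, in row-major order, which is also the order v.iter() yields them above. The sketch below decodes such a buffer into a Vec<String> in plain Rust. `decode_fixed_array` is a hypothetical helper, and the test data (two 8-byte elements) is made up for illustration; the real N_Anc_Filename values are not shown in this thread.

```rust
/// Split a contiguous buffer of fixed-length string elements into Strings.
/// Each element is `strsize` bytes; its text ends at the first NUL or fills the element.
/// Hypothetical helper for illustration; not part of hdf5-rust.
fn decode_fixed_array(buf: &[u8], strsize: usize) -> Result<Vec<String>, String> {
    // chunks_exact panics on a zero size, and a ragged tail would mean a wrong STRSIZE.
    if strsize == 0 || buf.len() % strsize != 0 {
        return Err(format!(
            "buffer of {} bytes is not a whole number of {}-byte elements",
            buf.len(),
            strsize
        ));
    }
    buf.chunks_exact(strsize)
        .map(|element| {
            let end = element.iter().position(|&b| b == 0).unwrap_or(strsize);
            String::from_utf8(element[..end].to_vec()).map_err(|e| e.to_string())
        })
        // Collecting into Result stops at the first element that is not valid UTF-8.
        .collect()
}
```

In this model, the largest decoded length is the smallest capacity that holds every element without truncation.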

This basically covers the most complicated use case for the files I am reading, so I'm closing this issue. Thanks again for your tip, @mulimoen!
