Support fixed-length strings with UTF-8 character set #270
Comments
Are they supported in the library?
Yes, string encoding and how many bytes are reserved for its storage are decoupled.
Yep, see here for an example of fixed-length Unicode strings being used in datasets/attributes. The native VOL passes both of these tests.
The question may be more related to how h5py treats HDF5 strings, where this combo is not really supported. Any fixed-length string is treated as
A fixed-width Unicode encoding would be UTF-32, but as @ajelenak says, it's not explicitly supported by the library (or HSDS).
I think there's a confusion in terminology here. The request is not for a Unicode character encoding in which each character has a fixed width in bytes (e.g. UTF-32), but for string datatypes that have a fixed total length in bytes (fixed-length strings) and use the UTF-8 character set/encoding, in which a character does not occupy a fixed number of bytes. I've updated the title of this issue to be clearer. The library does support fixed-length strings in UTF-8 (see the tests I linked above).
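To make the distinction concrete, here is a minimal sketch in plain Python (no HDF5, h5py, or HSDS calls involved). The helper names `fit_utf8_fixed` and `read_utf8_fixed` are hypothetical, and the null padding is an assumption chosen to resemble HDF5's null-padded string storage; it is not taken from any of the code discussed in this thread. It shows that the character count, the UTF-8 byte count, and the size of the fixed-length field are three separate quantities.

```python
def fit_utf8_fixed(text: str, nbytes: int) -> bytes:
    """Encode text as UTF-8 and null-pad it to exactly nbytes bytes.

    Hypothetical helper: mimics a fixed-length string field whose
    size is counted in bytes, not in characters.
    """
    raw = text.encode("utf-8")
    if len(raw) > nbytes:
        raise ValueError(
            f"{text!r} needs {len(raw)} bytes, but the field holds {nbytes}"
        )
    return raw.ljust(nbytes, b"\x00")


def read_utf8_fixed(field: bytes) -> str:
    """Strip the trailing null padding and decode the field as UTF-8."""
    return field.rstrip(b"\x00").decode("utf-8")


if __name__ == "__main__":
    for word in ("abc", "héé", "日本語"):
        field = fit_utf8_fixed(word, 10)
        print(
            word,
            "chars:", len(word),
            "utf-8 bytes:", len(word.encode("utf-8")),
            "utf-32 bytes:", len(word.encode("utf-32-le")),
            "field bytes:", len(field),
        )
```

All three words have three characters and fit in the same 10-byte field, yet they occupy 3, 5, and 9 bytes of it in UTF-8. In UTF-32 each would take 12 bytes, which is the "fixed width per character" case this issue is not asking about.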
Implemented in #278
HSDS currently does not support these (see hdf5dtype.py:617).