This issue was moved to a discussion.

Discussion: dtype system and integrating record types #254

Closed

aldanor opened this issue Jan 9, 2022 · 58 comments

@aldanor
Contributor

aldanor commented Jan 9, 2022

I've been looking at how record types can be integrated in rust-numpy and here's an unsorted collection of thoughts for discussion.

Let's look at Element:

pub unsafe trait Element: Clone + Send {
    const DATA_TYPE: DataType;
    fn is_same_type(dtype: &PyArrayDescr) -> bool;
    fn npy_type() -> NPY_TYPES { ... }
    fn get_dtype(py: Python) -> &PyArrayDescr { ... }
}
  • npy_type() is used in PyArray::new() and the like. Instead, one should use PyArray_NewFromDescr() to make use of the custom descriptor. Should all the places where npy_type() is used be split between "simple type, use New" and "user type, use NewFromDescr"? Or, alternatively, should arrays always be constructed from a descriptor? (in which case npy_type() becomes redundant and should be removed)
  • Why is same_type() needed at all? It is only used in FromPyObject::extract where one could simply use PyArray_EquivTypes (like it's done in pybind11). Isn't it largely redundant? (or does it exist for optimization purposes? In which case, is it even noticeable performance-wise?)
  • DATA_TYPE constant is really only used to check if it's an object or not in 2 places, like this:
    if T::DATA_TYPE != DataType::Object
    Isn't this redundant as well? Given that one can always do
    T::get_dtype().get_datatype() != Some(DataType::Object)
    // or, can add something like: T::get_dtype().is_object()
  • With all the notes above, Element essentially is just
     pub unsafe trait Element: Clone + Send {
         fn get_dtype(py: Python) -> &PyArrayDescr;
     }
  • For structured types, do we want to stick the type descriptor into DataType? E.g.:
    enum DataType { ..., Record(RecordType) }
    Or, alternatively, just keep it as DataType::Void? In which case, how does one recover record type descriptor? (it can always be done through numpy C API of course, via PyArrayDescr).
  • In order to enable user-defined record dtypes, having to return &PyArrayDescr would probably require:
    • Maintaining a global static thread-safe registry of registered dtypes (kind of like it's done in pybind11)
    • Initializing this registry somewhere
    • Any other options?
  • Element should probably be implemented for tuples and fixed-size arrays.
  • In order to implement structured dtypes, we'll inevitably have to resort to proc-macros. A few random thoughts and examples of how it can be done (any suggestions?):
    • #[numpy(record)]
      #[derive(Clone, Copy)]
      #[repr(packed)]
      struct Foo { x: i32, u: Bar } // where Bar is a registered numpy dtype as well
      // dtype = [('x', '<i4'), ('u', ...)]
    • We probably have to require either of #[repr(C)], #[repr(packed)] or #[repr(transparent)]
    • If repr is required, it can be an argument of the macro, e.g. #[numpy(record, repr = "C")]. (or not)
    • We also have to require Copy? (or not? technically, you could have object-type fields inside)
    • For wrapper types, we can allow something like this:
    • #[numpy(transparent)]
      #[repr(transparent)]
      struct Wrapper(pub i32);
      // dtype = '<i4'
    • For object types, the current suggestion in the docs is to implement a wrapper type and then impl Element for it manually. This seems largely redundant, given that the DATA_TYPE will always be Object. It would be nice if any #[pyclass]-wrapped types could automatically implement Element, but that's impossible due to the orphan rule. An alternative would be something like this:
      #[pyclass]
      #[numpy] // i.e., #[numpy(object)]
      struct Foo {}
    • How does one register dtypes for foreign (remote) types? I.e., OrderedFloat<f32> or Wrapping<u64> or some PyClassFromOtherCrate? We can try doing something like what serde does for remote types.
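For reference, the remote-type pattern serde uses (alluded to in the last bullet) looks roughly like the following - this is serde's own API, shown only as an analogy; a rust-numpy equivalent does not exist yet, and other_crate::Duration is the placeholder serde's docs use for a foreign type:

use serde::{Deserialize, Serialize};
use other_crate::Duration; // the foreign ("remote") type we can't implement traits for

// A local mirror with the same (accessible) fields; serde generates the
// conversions against `Duration`, and the mirror is referenced at use sites
// via `#[serde(with = "DurationDef")]`.
#[derive(Serialize, Deserialize)]
#[serde(remote = "Duration")]
struct DurationDef {
    secs: i64,
    nanos: i32,
}

A numpy analogue could similarly pair a registered mirror struct with the path of the remote type.
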
@aldanor
Contributor Author

aldanor commented Jan 9, 2022

Related: #234, #186.

@adamreichold
Member

W.r.t. the question of why Element looks the way it does, I can only answer

DATA_TYPE constant is really only used to check if it's an object or not in 2 places, like this:

for lack of involvement in the other design decisions. Here, we discussed that a simple const TRIVIALLY_COPYABLE: bool; could serve the same purpose, but having the enum does not really hurt either. In any case, this being a compile-time constant is certainly nice, as it will completely remove one of the two branches from the generated code where this is checked.

Maintaining a global static thread-safe registry of registered dtypes (kind of like it's done in pybind11)

I think having a GIL-protected lazily initialized static like in

fn type_object_raw(py: pyo3::Python) -> *mut ffi::PyTypeObject {
    static TYPE_OBJECT: LazyStaticType = LazyStaticType::new();
    TYPE_OBJECT.get_or_init::<Self>(py)
}

might be preferable to a single global data structure?
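For illustration, a per-type lazily initialized descriptor along those lines could look roughly like the sketch below (assumptions: pyo3's GILOnceCell is used for the GIL-protected static, and the initializer simply reuses an existing dtype as a stand-in for whatever would build the real record descriptor):

use numpy::PyArrayDescr;
use pyo3::once_cell::GILOnceCell; // `pyo3::sync::GILOnceCell` in newer pyo3 versions
use pyo3::{Py, Python};

fn foo_dtype<'py>(py: Python<'py>) -> &'py PyArrayDescr {
    static DTYPE: GILOnceCell<Py<PyArrayDescr>> = GILOnceCell::new();
    DTYPE
        // Placeholder initializer: a real record type would construct its own
        // descriptor here (e.g. through the NumPy C API) instead of reusing i32's.
        .get_or_init(py, || numpy::dtype::<i32>(py).into())
        .as_ref(py)
}
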

Should all places where npy_type() is used split between "simple type, use New" and "user type, use NewFromDescr"?

While the overhead of producing a PyArrayDescr from a suitably synchronized data structure is probably small, I suspect that not having to produce it is still faster. Or does NumPy resort to PyArray_DescrFromType internally in any case?

@adamreichold
Member

Or does NumPy resort to PyArray_DescrFromType internally in any case?

That seems to be the case indeed:

https://github.com/numpy/numpy/blob/1798a7d463590de6dc0dfdad69735d92288313f3/numpy/core/src/multiarray/ctors.c#L1105

So we can probably replace our calls to PyArray_New. The only remaining difference I see is that we have to go through PY_ARRAY_API which implies an additional acquire load.

In order to implement structured dtypes, we'll inevitably have to resort to proc-macros.

I think it would actually be helpful to ignore this part completely for now and design the necessary types without any macro support first. The proc-macros should essentially only relieve programmers from writing repetitive and error prone code, shouldn't they?

@aldanor
Contributor Author

aldanor commented Jan 9, 2022

we discussed that a simple const TRIVIALLY_COPYABLE: bool; could serve the same purpose but having the enum does not really hurt either. In any case, this being a compile-time constant is certainly nice as it will completely remove one of the two branches from the generated code where this is checked

Then my initial guess was correct. I wonder though whether it's actually noticeable, i.e. has anyone tried to measure it? With all the other Python cruft on top, I would strongly suspect there'd be no visible difference at all. I understand where it comes from; it's just that there's another implicit logical requirement in the Element trait - that Self::get_dtype(py).get_datatype() == Some(Self::DATA_TYPE) - and this requirement is imposed only (?) to have DATA_TYPE as a compile-time thing instead of a runtime one (where at runtime it's a single load of dtype_ptr->type_num).

I think it would actually be helpful to ignore this part completely for now and design the necessary types without any macro support first. The proc-macros should essentially only relieve programmers from writing repetitive and error prone code, shouldn't they?

With record types, it's pretty much a requirement; anything else would be extremely boilerplate-ish and unsafe (with the user having to provide all offsets and descriptors for all fields manually). It might be nice to leave an option to construct descriptors manually at runtime, but I can't see a single real use case for it off the top of my head. So, the necessary types and the infrastructure have to be designed from the get-go in a way that makes it trivial to integrate them into proc macros.

Or does NumPy resort to PyArray_DescrFromType internally in any case?

Yes, I think it does (and so does pybind11).

I think having a GIL-protected lazily initialized static like in ... might be preferable to a single global data structure?

Yep - thanks for pointing that out, something like that should be sufficient.

@aldanor
Contributor Author

aldanor commented Jan 9, 2022

Just checked it out: if one replaces

if T::DATA_TYPE != DataType::Object {

with something runtime like

if !T::get_dtype(py).is_object() {

the difference in from_slice runtime for an array of 5 elements is 1ns (20ns vs 21ns).

For comparison, creating arange(1, 5, 1) is 31ns.

So yea... you could say it's noticeable (but only if you create hundreds of millions of arrays).

@adamreichold
Member

and this requirement is imposed only (?) to have DATA_TYPE as a compile time thing instead of runtime

ATM, the DATA_TYPE member is used to generate default implementations for the other methods, i.e. it is a utility to simplify the implementation. If same_type and npy_type are dropped in favour of using get_dtype, we could certainly remove this member. But I also do not really see a significant cost to having at least a const TRIVIAL: bool; member.

(where at runtime it's a single load dtype_ptr->type_num).

I think if anything has measurable overhead, then it is acquiring the reference to a PyArrayDescr itself, either by accessing PY_ARRAY_API.PyArray_DescrFromType or by accessing a lazily initialized static.

It might be nice to leave an option to construct descriptors manually at runtime, but I can't see a single real use case for it off top of my head. So, the necessary types and the infrastructure have to be designed in a way that would make it trivial to integrate them into proc macros, from the get go.

proc-macros generate code that is injected into the dependent crate, i.e. they have to use this crate's public API, and all the code they produce could just as well be written by hand. The point is not that I expect user code to manually produce those trait implementations (although avoiding the build time overhead of proc-macros would be one reason to do so), but that we can focus the discussion on the mechanisms and leave the proc-macro UI for later.

@adamreichold
Member

So yea... you could say it's noticeable (but only if you create hundreds millions of arrays).

One possible caveat with a targeted benchmark here is that deciding this at compile time is about removing code and thereby reducing instruction cache pressure which could only be noticeable when other code shares the address space.

@adamreichold
Member

As a first actionable result of your findings, why not start with a PR to simplify the Element trait to avoid having same_type and npy_type and directly producing a descriptor via PyArrayDescr::from_npy_type? I think that would certainly be a nice simplification. Personally, I would like this combined with a const TRIVIAL: bool flag, but I understand that I am most likely not the only one with an opinion on this.

@adamreichold
Member

adamreichold commented Jan 9, 2022

For object types, the current suggestion in the docs is to implement a wrapper type and then impl Element for it manually. This seems largely redundant, given that the DATA_TYPE will always be Object.

I fear that it might actually be unsound: during extraction, we only check the data type to be object, but not whether all the elements are of type Py<T> for the given T: PyClass. So e.g. .readonly().as_array() would enable safe code to call methods on T even though completely different - possibly heterogeneously typed - objects are stored in the array.

I fear we need to remove this suggestion and limit ourselves to Py<PyAny> as the only reasonable Element implementation for now. I also wonder how checking for homogeneity during extraction is best handled if we do want to properly support PyArray<Py<T>, D>.

@aldanor
Contributor Author

aldanor commented Jan 9, 2022

One possible caveat with a targeted benchmark here is that deciding this at compile time is about removing code and thereby reducing instruction cache pressure which could only be noticeable when other code shares the address space.

While I'm often guilty of micro-optimizing Rust code down to every single branch hint myself, I just don't think it matters here: the last thing you're going to want to do in numpy bindings is instantiate millions of array objects in a loop - that was kind of my point.

Here's a few more thoughts on what would need to happen if const DATA_TYPE remains in place:

  • NB: DATA_TYPE != DataType::Object would no longer be sufficient - because record dtypes can be either trivially copyable (which requires all fields to be trivially copyable) or not.
  • If we want to keep the data type const, this would imply DataType::Record(<...>) would need to be const-constructible as well. This implies that e.g. all fields names need to be &'static, all offsets are const and so on. It's probably doable but will be tough (see below).
  • Note also that our MSRV of 1.48.0 doesn't give us access to min const generics (stabilized in 1.51.0), so messing with consts is going to be even harder. Also, std::ptr::addr_of!() was only stabilized in that same release (1.51.0), if I remember correctly.

Oh yea, and also, record types can be nested - that would be fun to implement as const as well (at runtime we can just box it all on the heap).

@aldanor
Contributor Author

aldanor commented Jan 9, 2022

The note above only applies if it remains as DATA_TYPE, of course.

If it's changed to IS_POD: bool, then DataType can become a complex non-const type (to host the record dtype descriptor), etc.

@aldanor
Contributor Author

aldanor commented Jan 9, 2022

all the elements are of type Py<T> for the given T: PyClass. So e.g. .readonly().as_array() would enable safe code to call methods on T even though completely different - possibly heterogeneously typed - objects are stored in the array.

I fear we need to remove this suggestion and limit ourselves to Py<PyAny> as the only reasonable Element implementation for now. I also wonder how checking for homogeneity during extraction is best handled if we do want to properly support PyArray<Py<T>, D>.

Yea, I was asking myself whether I was reading that code correctly - it only checks for Object dtype but not the type of objects themselves.

There are three options:

  • Always safe: Py<PyAny>, in which case it's always good and safe.
  • Py<T>, in which case, unfortunately, we need to isinstance-check every single element in order to keep it safe (see the sketch after this list).
  • Unchecked and unsafe version of Py<T> which would be nice to keep available.
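For option 2, the per-element check would have to look roughly like this (a hypothetical helper, not existing rust-numpy API; extract::<Py<T>>() effectively performs the isinstance check):

use pyo3::prelude::*;

// Every stored object must actually be an instance of `T` before the array
// could soundly be exposed as `PyArray<Py<T>, D>`.
fn all_elements_are<T: PyClass>(py: Python<'_>, elements: &[Py<PyAny>]) -> bool {
    elements.iter().all(|obj| obj.extract::<Py<T>>(py).is_ok())
}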

@aldanor
Contributor Author

aldanor commented Jan 9, 2022

As a first actionable result of your findings, why not start with a PR to simplify the Element trait to avoid having same_type and npy_type and directly producing a descriptor via PyArrayDescr::from_npy_type? I think that would certainly be a nice simplification. Personally, I would like this combined with a const TRIVIAL: bool flag, but I understand that I am most likely not the only one with an opinion on this.

Yes, I've already started on this as a first step and will try to push something shortly. There's actually a few more subtle dtype-related bugs I've uncovered in the process that I'll fix as well (the tests are very sparse, so I guess no one noticed anything so far...).

@aldanor
Contributor Author

aldanor commented Jan 10, 2022

(This is a bit of a wall of text, so thanks in advance to whoever reads through it in its entirety. I tried to keep it as organized as possible.)

tagging @adamreichold @kngwyu @davidhewitt

Ok, now that step 1 (#256) is ready to be merged, let's think about next steps. I'll use pybind11 as reference here since that's what I'm most familiar with having implemented a big chunk of that myself in the past.

Descriptor (format) strings and PEP 3118

In pybind11, dtypes can be constructed from buffer descriptor strings (see PEP 3118 for details). It actually calls back to NumPy's Python API to do that (numpy.core._internal._dtype_from_pep3118), but it's possible to reimplement it manually if we want to - I would actually lean towards the latter, provided we borrow and verify all the related tests from numpy itself.

The benefit of using descriptor strings: they're easy to hash if a registry is used, there's no nesting, etc. - it's just a simple string that can cover any dtype possible, including scalar types, object types, multi-dimensional subarrays, recarrays, and any combination of the above. We can also generate them with proc-macros at compile time and make them &'static if need be.

Now, if we go this route, we'll have to delve into "format chars", "type kinds", "buffer descriptor strings" etc. There's obviously a big overlap with pyo3::buffer module as there's ElementType::from_format and all the stuff around it, but it won't be sufficient without some reworking. Which raises the question: do we want to copy/paste stuff in pyo3::buffer and reimplement it to suit our needs? Or do we want to make it pybind11-like so that numpy arrays and buffers are tightly bound together, sharing the same infrastructure? See the next section for more details.

Buffers, pyo3::buffer and rust-numpy

In pybind11, buffers and arrays are bound together (like they should be), i.e. you can instantiate one from another (both ways) while controlling who owns the data. There's no such thing in rust-numpy and I think it's a pretty big gap.

There's pyo3::buffer that we can try to integrate with, but, off the top of my head, there are some issues with it. But before we continue, consider an example:

>>> dt = np.dtype([('x', '<i8'), ('y', '<U1'), ('z', 'O', (1, 2))])

>>> arr = np.rec.fromarrays(
...     [[1, 2], ['a', 'b'], [[[int, str]], [[float, list]]]], 
...     dtype=dt,
... )

>>> arr
rec.array([(1, 'a', [[<class 'int'>, <class 'str'>]]),
           (2, 'b', [[<class 'float'>, <class 'list'>]])],
          dtype=[('x', '<i8'), ('y', '<U1'), ('z', 'O', (1, 2))])

>>> arr.data.format
'T{=q:x:@1w:y:(1,2)O:z:}'

Here, arr.data is the matching Python buffer (which you can also get via memoryview(arr)). Normally, you can do np.array(arr.data) and get the identical array back, but funnily enough, this exact example triggers the padding bug I reported in numpy upstream 6 years ago (numpy/numpy#7797), which was then patched by hand in pybind11.

Back to pyo3::buffer:

pub enum ElementType {
    SignedInteger { bytes: usize },
    UnsignedInteger { bytes: usize },
    Bool,
    Float { bytes: usize },
    Unknown,
}

Here's what's wrong with this enum:

  • It allows unsupported types, like Float(3)
  • It doesn't support complex types
  • It doesn't support struct types
  • It doesn't support subarrays
  • It doesn't support object types

It's basically a distant relative of DataType that we have just removed from rust-numpy in favor of a single dtype pointer. With these limitations, there's no way we can make proper use of pyo3::buffer and we'll end up duplicating lots of its logic in rust-numpy.

And here's the trait:

pub unsafe trait Element: Copy {
    fn is_compatible_format(format: &CStr) -> bool;
}

Note that this is similar to numpy::dtype::Element, and is_compatible_format is equivalent to the same_type() which we have just cut out. And again, there are some issues with it:

  • Copy is unnecessary as it blocks object types
    • ... and if Copy bound is removed, then some methods like PyBuffer::copy_to_slice() are no longer valid and should be rewritten same way as in rust-numpy (taking IS_COPY into account)
  • It's impossible to recover the format descriptor from a type (e.g. u32) - kind of like we can do with numpy::dtype::<u32>(py).

One thing that could be done to improve it would be this:

pub unsafe trait Element: Clone + Send {
    const IS_COPY: bool;
    fn format() -> &'static CStr;
}

fn can_convert(from: &CStr, to: &CStr) -> bool { ... } // or whichever other way it's done

And then ElementType can be removed. Alternatively, it can be replaced with FormatString(&CStr) which would then be returned by Element::format().

Note how close it is now to the newly reworked numpy::Element - the only difference being that a nul-terminated format string is returned instead of the dtype pointer.

Note also that any valid T: pyo3::buffer::Element is (logically speaking) also a valid numpy::dtype::Element since dtype ptr can always be built from a valid format string (the inverse is not always true, e.g. due to M and m types; but in most cases it can be worked around if need be).
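For concreteness, an implementation of the trait sketched above for a primitive might look like the following - purely illustrative, since the trait doesn't exist in this form yet ("<I" is the PEP 3118 code for a little-endian, 4-byte unsigned integer):

use std::ffi::CStr;

unsafe impl Element for u32 {
    const IS_COPY: bool = true;
    fn format() -> &'static CStr {
        // "<" = little-endian, standard size; "I" = unsigned int (4 bytes)
        CStr::from_bytes_with_nul(b"<I\0").unwrap()
    }
}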

What to do next

Should the buffers be touched up in pyo3 first, so we can then use that in rust-numpy to add support for record types, subarrays, etc? In which case, what exactly should be done?

One interesting point for discussion may be - could we share some of the functionality between buffers and numpy arrays? (In pybind11, array is derived from buffer and array_t is derived from array - which is quite nice and avoids lots of duplication.) Just some random thinking, e.g. maybe buffer-like types like the buffer itself or a numpy array could deref to some common base type which would provide shared functionality. Or maybe we could somehow try and connect the two Element traits together.

@adamreichold
Member

Just my first thoughts after reading:

  • Personally, I am neither fond of stringly-typed mini languages nor of global registries. (I don't think hashing is an issue given derive(Hash), and while nesting is usually handled via Box, I think it could also be handled via references to produce static data.)
  • I do like the idea of integrating the Element trait here with the Element trait from pyo3::buffer, even though I would prefer to keep using a strongly-typed representation of the format like ElementType.

@kngwyu
Member

kngwyu commented Jan 11, 2022

Yeah, I really don't like the current disagreement between PyBuffer and rust-numpy.
I actually tried to refactor PyBuffer years ago, but I didn't have a good idea about how to merge the two parts.
So,

  • About Element: I like the idea of using the same Element trait between two crates. Then the refactoring of PyO3's buffer module should be prioritized.
  • About formatting: I also don't like the format string since it looks too complicated for me, but can we say that ElementType is strongly typed? As @aldanor points out, it allows some invalid configs, which I really don't like. Anyway some improvements are required here.

(in pybind11, array is derived from buffer and array_t is derived from array - which is quite nice and avoid lots of duplication) Just some random thinking, e.g. maybe buffer-like types like the buffer itself or numpy array could deref to some common base type which would provide shared functionality. Or maybe we could somehow try and connect the two Element traits together.

I don't think using Deref to emulate OOP is a recommended approach, so I prefer abstraction via traits.

@adamreichold
Member

but can we say that ElementType is strongly typed? As @aldanor points out, it allows some invalid configs, which I really don't like. Anyway some improvements are required here.

One could maybe say it is strongly wrongly typed. ;-) More seriously though, I just meant to say that it would be nice to represent the descriptors within the type system directly, which we will probably do anyway if we do not consistently delegate their interpretation to Python code.

@aldanor
Contributor Author

aldanor commented Jan 11, 2022

Ok then. I think I'll try to sketch a type descriptor system (focused on buffers to start with, not numpy, as the only real difference would be the datetime/timedelta types) in a separate temporary crate and then share it here for further discussion if it works out.

Here's a few random points, assuming there's some magic TypeDescriptor<T> where T is the "leaf type" (i.e. enum with a bunch of type primitives):

  • It should be possible to construct it statically (so we can have some consts for std types and to be used in proc-macro codegen). This is not very trivial given that it contains nesting, repetitions etc - but doable.
  • The only type information we have about Python buffers is their format strings. So if want to check "is type T compatible with buffer X", or maybe, "is buffer A type-equivalent to buffer B", we will have to first convert the format string to the TypeDescriptor (i.e. implement proper parser).
  • Because of the above, it needs to be possible to instantiate the type descriptor dynamically as well, not only statically.
  • To compare equivalence of two type descriptors, we would need to normalise them (i.e., for struct types, infer all offsets of all fields based on padding, alignment etc). There's no way around reimplementing numpy.core._internal._dtype_from_pep3118 (while fixing the trailing-padding bug). I'm planning to import all related tests from ctypes, all related tests from numpy, add corner-case tests and maybe generate random tests (with answers generated via numpy); it's a very dodgy piece of logic and easy to mess up unless tested well.
  • The normalisation would also allow creating dtypes out of type descriptors (i.e. by first converting them to Python tuples of field names, offsets and dtypes). However, it'd be nice to keep the 'fast path' for trivial types like integers etc. like it's currently done (via typenums).
  • On numpy side, instead of doing PyArray_EquivTypes which is a very complicated procedure (as it goes through the casting rules), we could compare normalised type descriptors ourselves which might end up being simpler / faster. Would be interesting to run some benchmarks (and, we can always stick PyArray_EquivTypes in a debug assert for some peace of mind).
  • Numpy doesn't deviate much from the base buffer protocol; however, there are M and m types that are not part of PEP 3118. You can't create a Python buffer from a numpy array if it has those field types. However, technically they act like simple integers, e.g. for alignment purposes. Not sure what the best plan of action is.
  • How to link Element traits? Well, for pyo3::buffer one could do something like:
    // pyo3::buffer
    pub unsafe trait Element {
        const IS_COPY: bool;
        fn type_descriptor() -> &'static TypeDescriptor<BufferType>;
        // where BufferType includes all of the PEP3118 types supported by numpy except some weird ones 
    }
    For numpy::dtype::Element, we would ideally want to extend the API to something like
    // numpy::dtype
    pub unsafe trait Element {
        const IS_COPY: bool;
        fn dtype_ptr(py: Python) -> &PyArrayDescr {
            // we may provide default impl here, creating the dtype from type_descriptor()
        }
        fn type_descriptor() -> &'static TypeDescriptor<NumpyType>;
        // where NumpyType is either BufferType, or M/m
    }
    Need to think a bit more how to link it all here.

@kngwyu
Member

kngwyu commented Jan 12, 2022

Sorry but I still don't understand what kind of recursiveness/hierarchy should be allowed to support record types... I'll come back to discussion after reading some PEPs 😓

@adamreichold
Member

Because of the above, it needs to be possible to instantiate the type descriptor dynamically as well, not only statically.

I see two options off the top of my head: using arena allocation if recursion is always handled by references, or using a Cow-like type for static references or boxes. Admittedly, the ergonomics of neither approach are particularly nice, as Rust is highly explicit about memory management as usual... (If we are out for efficiency, one could play tricks with the lower bits of the pointer to indicate whether it is owned or static, so there could be no space overhead at all.)
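For what it's worth, such a Cow-like type (what the BoxCow mentioned below amounts to) could be as simple as this sketch - either a borrowed 'static reference or an owned box, both dereferencing to the inner type:

use std::ops::Deref;

pub enum BoxCow<'a, T: ?Sized> {
    Borrowed(&'a T),
    Owned(Box<T>),
}

impl<'a, T: ?Sized> Deref for BoxCow<'a, T> {
    type Target = T;

    fn deref(&self) -> &T {
        match self {
            BoxCow::Borrowed(r) => r,
            BoxCow::Owned(b) => &**b, // deref the box down to &T
        }
    }
}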

@aldanor
Contributor Author

aldanor commented Jan 12, 2022

@kngwyu See the sketch below for where the type recursion comes in (sub-array types - an array may contain any type as its element type, and that element type may be another subarray).

@adamreichold Yes, that's what I've already implemented (BoxCow). I wouldn't say the ergonomics are "not nice", it's pretty much like a normal Cow, and it all derefs to the base type when you use it so you don't really care about it much except for the cases when you need to clone/convert it.

pub struct FieldDescriptor<T: ValueType> {
    ty: TypeDescriptor<T>,
    name: Option<Cow<'static, str>>,
    offset: usize,
}

pub struct RecordDescriptor<T: ValueType> {
    fields: Cow<'static, [FieldDescriptor<T>]>,
    itemsize: usize,
}

pub struct ArrayDescriptor<T: ValueType> {
    ty: BoxCow<'static, TypeDescriptor<T>>,
    shape: Cow<'static, [usize]>,
}

pub enum TypeDescriptor<T: ValueType> {
    Object,
    Value(T),
    Record(RecordDescriptor<T>),
    Array(ArrayDescriptor<T>),
}
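To illustrate the "construct it statically" point from the earlier list, a hand-written static descriptor for a simple two-field record (say x: i32 at offset 0 and y: f64 at offset 8 under #[repr(C)]) could look roughly like this - purely illustrative, with S standing in for some scalar enum implementing ValueType and the struct fields written as if they were public:

const POINT: TypeDescriptor<S> = TypeDescriptor::Record(RecordDescriptor {
    fields: Cow::Borrowed(&[
        FieldDescriptor {
            ty: TypeDescriptor::Value(S::I32),
            name: Some(Cow::Borrowed("x")),
            offset: 0,
        },
        FieldDescriptor {
            ty: TypeDescriptor::Value(S::F64),
            name: Some(Cow::Borrowed("y")),
            offset: 8,
        },
    ]),
    itemsize: 16,
});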

@aldanor
Contributor Author

aldanor commented Jan 13, 2022

Ok, I have started sketching a few things out, please feel free to browse around (although it's obviously a very early WIP sketch, I figured I'd better share it asap so as not to spend too much time on something that will be thrown away):

What's implemented so far:

  • Type descriptors (desc::TypeDescriptor)
  • Standard scalar types as the mix of struct spec and PEP 3118 (scalar::Scalar)
  • Format string parser (parse::parse_type_descriptor) that mimics numpy.core._internal._dtype_from_pep3118 but with a few bug fixes - this seems to work on a few examples but would need a very extensive suite of tests; there's also a whole bunch of todos scattered around the code with various questions
  • Element trait that's implemented for built-in types
    • This is something in between rust-numpy's and pyo3's Element - basically, there's a single const TypeDescriptor, which would also remove the need for the IS_COPY flag

I'm currently pondering how to link the two Element traits as discussed above in this thread (see numpy-type-desc/src/element.rs), but I'm facing some design problems, especially with the lack of min const generics since our MSRV is lower than 1.51... Maybe the design can be changed so it works out - any ideas are welcome.

What I've tried to do:

// pyo3
pub enum Scalar {
    // all the PEP 3118 types: bools, ints, floats, complex, char strings, etc
}
pub unsafe trait Element: Clone + Send {
    const TYPE: TypeDescriptor<Scalar>;
    // this Scalar covers the most of PEP 3118 type system
}

// numpy
pub enum Scalar {
    Base(pyo3_type_desc::Scalar),
    Datetime,  // 'M'
    Timedelta, // 'm'
}
pub unsafe trait Element: Clone + Send {
    const TYPE: TypeDescriptor<Scalar>;
    // this Scalar is either the PEP 3118 or M/m types
}

The problem is in implementing a blanket impl like this:

impl<T: pyo3_type_desc::Element> Element for T {
    const TYPE: TypeDescriptor<Scalar> = ???;
    // How do we convert TypeDescriptor<A> to TypeDescriptor<B> here?
    // See TypeDescriptor::map() - but it would not work in const context :(
}

Having a const type descriptor is super nice, but maybe for numpy::Element it's impossible? (In which case you could of course have a type_descriptor() -> TypeDescriptor<Scalar>.) Or maybe I'm just stuck on a wrong design and this could be done completely differently?

Anyways, any thoughts welcome. I have a feeling we can do it 😺

@aldanor
Contributor Author

aldanor commented Jan 13, 2022

Just for completeness, the hack/solution I've mentioned in the last paragraph is basically

// numpy
pub trait Element: Clone + Send {
    fn type_descriptor() -> TypeDescriptor<Scalar>;
}
impl<T: pyo3_type_desc::Element> Element for T {
    fn type_descriptor() -> TypeDescriptor<Scalar> {
        T::TYPE.map(|&scalar| Scalar::Base(scalar))
    }
}

So, type_descriptor() technically stops being a const and can no longer be used in const context (which is not nice), but, in practice, it will be pretty much always evaluated as a const and the whole thing is going to be inlined; there's no overhead here. Also, TypeDescriptor<Scalar> will be still constructible in const contexts (e.g. in proc-macros), so for generated code you could always have e.g.

unsafe impl Element for MyType {
    fn type_descriptor() -> TypeDescriptor<Scalar> {
        const TYPE: TypeDescriptor<Scalar> = ...;
        TYPE
    }
}

@aldanor
Contributor Author

aldanor commented Jan 14, 2022

And here's another problem with the above system :(

// pyo3
unsafe impl<T: Element> Element for (T,) {
    const TYPE: TypeDescriptor<Scalar> = T::TYPE;
}
// numpy
unsafe impl<T: Element> Element for (T,) {
    fn type_descriptor() -> TypeDescriptor<Scalar> {
        T::type_descriptor()
    }
}
error[E0119]: conflicting implementations of trait `element::Element` for type `(_,)`
  --> numpy-type-desc/src/element.rs:36:1
   |
30 | unsafe impl<T: pyo3_type_desc::Element> Element for T {
   | ----------------------------------------------------- first implementation here
...
36 | unsafe impl<T: Element> Element for (T,) {
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ conflicting implementation for `(_,)`

So now I'm even more unsure of how to proceed.

@adamreichold
Member

adamreichold commented Jan 14, 2022

It's just my first thought, so it might be completely bogus: could we make Element generic over Scalar, e.g.

pub unsafe trait Element<S>: Clone + Send {
    const TYPE: TypeDescriptor<S>;
}

so that there is only impl for tuples like

unsafe impl<T: Element, S> Element<S> for (T,) {

and rust-numpy would not define another Element trait at all?

I am not sure what a trait bound for S would look like, though. Or, as written in the beginning, whether this works at all. Actually, I am mainly trying to say that I am thinking about it but didn't find the time to actually work with the linked repository yet. Sorry.

@aldanor
Contributor Author

aldanor commented Jan 14, 2022

And testing a few things out further, in order to e.g. map generic tuples to type descriptors, const TYPE may not be sufficient (at least as of today), as we would need to compute offsets. E.g., as an example:

use memoffset::*;

fn main() {
    const O: usize = offset_of_tuple!((u8, i32, u8, i32), 0); // note: this const doesn't actually compile on stable - offset_of_tuple! can't be used in a const context (see below)
    dbg!(offset_of_tuple!((u8, i32, u8, i32), 0));
    dbg!(offset_of_tuple!((u8, i32, u8, i32), 1));
    dbg!(offset_of_tuple!((u8, i32, u8, i32), 2));
    dbg!(offset_of_tuple!((u8, i32, u8, i32), 3));
}

outputs

[src/main.rs:4] offset_of_tuple!((u8, i32, u8, i32), 0) = 4
[src/main.rs:5] offset_of_tuple!((u8, i32, u8, i32), 1) = 0
[src/main.rs:6] offset_of_tuple!((u8, i32, u8, i32), 2) = 5
[src/main.rs:7] offset_of_tuple!((u8, i32, u8, i32), 3) = 8

So while all this will probably get inlined to a compile-time constant in the end, we can't force it into a const context (which implies it should probably be a type_descriptor() trait method).

@aldanor
Contributor Author

aldanor commented Jan 14, 2022

@adamreichold Thanks for the input - I had this idea in the beginning but for some reason it was discarded and forgotten. Any ideas are really welcome - I think I can handle most of the technical numpy/rust details and quirks myself, but brainstorming design decisions together is exponentially more efficient.

The Scalar (as in, the type parameter in TypeDescriptor<>) is already bound by a trait (maybe there will be more methods, but it seems sufficient for now):

pub trait ScalarDescriptor: 'static + Clone {
    fn itemsize(&self) -> usize;
    fn alignment(&self) -> usize;
}

I guess we could further restrict it by From<pyo3_type_desc::Scalar> (and maybe even Into<Option<pyo3_type_desc::Scalar>> although that I'm unsure about), because 'base' scalars should always remain valid. Also, really, 'static + Clone might be just Copy (it's over-constraining it but may simplify a few things).

@aldanor
Contributor Author

aldanor commented Jan 14, 2022

@adamreichold So, at first glance, this actually seems to work, see below (need to dig around it further to confirm).

The magic key here is the From<Scalar>:

// pyo3

pub trait ScalarDescriptor: 'static + Clone + From<Scalar> {
    fn itemsize(&self) -> usize;
    fn alignment(&self) -> usize;
}
pub unsafe trait Element<S: ScalarDescriptor>: Clone + Send {
    fn type_descriptor() -> TypeDescriptor<S>;
}
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub enum Scalar { ... }

// implement `Element` for built-in int/float types etc like so:
macro_rules! impl_element {
    ($ty:ty, $expr:expr) => {
        unsafe impl<S: ScalarDescriptor> Element<S> for $ty {
            #[inline]
            fn type_descriptor() -> TypeDescriptor<S> {
                const TYPE: Scalar = $expr;
                debug_assert_eq!(std::mem::size_of::<$ty>(), TYPE.itemsize());
                TypeDescriptor::Scalar(S::from(TYPE))
            }
        }
    };
    ...
}

// <--- here (in pyo3) is where we can implement generic logic for arrays/tuples etc
// (proc-macro for record types can also be generic and numpy-independent)

// numpy

#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub enum Scalar {
    Base(BaseScalar),
    Datetime,
    Timedelta,
}
impl From<BaseScalar> for Scalar {
    fn from(scalar: BaseScalar) -> Self {
        Self::Base(scalar)
    }
}
impl ScalarDescriptor for Scalar { ... }

@aldanor
Contributor Author

aldanor commented Jan 14, 2022

To think about: alternatively, we could not require From<Scalar> in ScalarDescriptor but only add it as a bound when defining those impls for basic builtin types. Hmm.

@aldanor
Contributor Author

aldanor commented Jan 15, 2022

For scalar types, alignment is trivial. For object types, it's pointer-width, I guess? (size_t) For array types, it's the alignment of the underlying element.

But for record types... I'm a bit lost. Given that we will likely never call np.dtype() constructor ourselves manually, we don't really care about numpy's logic of producing aligned c-like struct types for us from a dict/list/tuple-like description. We will generate those ourselves via proc-macros for structs, but what should we do then? A few options/thoughts:

  • Can just set the alignment to 1 and forget about it.
  • We can respect #[repr(packed / C / transparent / align(...))] and set the alignment to that, and otherwise... set it to 1? Which means that RecordDescriptor should have an alignment: Option<usize>, I guess?

Again, I'm not exactly sure about how this should be handled, so any ideas welcome.

P.S. this is the last bit left for me to finish the entire TypeDescriptor<Scalar> -> &PyArrayDescr conversion logic which I will be able to push shortly. Then I could probably work on the reverse.

@adamreichold
Member

We will generate those ourselves via proc-macros for structs, but what should we do then?

I am probably missing something, but isn't calling mem::align_of::<T>() to determine the struct's alignment an option?

@aldanor
Contributor Author

aldanor commented Jan 15, 2022

We will generate those ourselves via proc-macros for structs, but what should we do then?

I am probably missing something, but isn't calling mem::align_of::<T>() to determine the struct's alignment an option?

Yea. It's an option for when we're going down the "Rust type -> TypeDescriptor -> numpy dtype" path. In order to enable this, there would definitely need to be an alignment field (optional, I guess? can always default it to 1) in the RecordDescriptor. I can add it and then add some validation in the record dtype building logic (like itemsize is divisible by alignment, fields don't overlap etc).

There are more paths, like buffer format string -> TypeDescriptor -> numpy dtype, but I guess the best we can do there is mimic precisely what numpy is doing (which, from what I'm seeing, is to just leave the alignment field as 1). So, there are cases where alignment information will be lost, not by us, but by numpy being stupid and the buffer protocol being inconsistent. Again, not sure whether it's relevant, just a note.

One more catch is that PyObject* pointers may be unaligned inside structs, but I guess that's fine?


@aldanor
Contributor Author

aldanor commented Jan 15, 2022

Another thing that all of this is slowly leading to, and that's one of the most interesting parts - the logic of when to treat two type descriptors as "equivalent" or "compatible". IIUC, numpy ignores the alignment there. So the alignment information we provide to numpy will be mostly useful if new dtypes are created out of the ones we export (e.g. wrapping them in rec types).

@aldanor
Contributor Author

aldanor commented Jan 15, 2022

We will generate those ourselves via proc-macros for structs, but what should we do then?

I am probably missing something, but isn't calling mem::align_of::<T>() to determine the struct's alignment an option?

@adamreichold Ah, yea, I forgot the confusing part - here's a quote from the dtype creation logic in descriptor.c:

    /* Structured arrays get a sticky aligned bit */ 
    if (align) {
        new->flags |= NPY_ALIGNED_STRUCT;
    }

The NPY_ALIGNED_STRUCT flag is what's reflected in dtype.isalignedstruct.

Normally, it will only be set when manually requested via passing dtype(..., align=True). But what should we do?

@adamreichold
Member

One more catch is that PyObject* pointers may be unaligned inside structs, but I guess that's fine?

I think that is as good/bad as any other repr(packed) struct in Rust, as long as the value of the pointer respects alignment.

Normally, it will only be set when manually requested via passing dtype(..., align=True). But what should we do?

If I understand this somewhat correctly, then I think we should compare the alignment reported by mem::align_of::<T> with the alignment that NumPy considers "natural" for a given struct, which I would expect to be the maximum of the alignments of its fields?
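As a plain-Rust illustration of the two quantities being compared (no numpy involved): align_of reports 1 for a packed struct, while the "natural" alignment is the maximum of the field alignments.

#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u64,
}

#[repr(C)]
struct Natural {
    a: u8,
    b: u64,
}

fn main() {
    assert_eq!(std::mem::align_of::<Packed>(), 1);  // packed: offsets are not "natural"
    assert_eq!(std::mem::align_of::<Natural>(), 8); // max of field alignments (u64 -> 8)
}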

@aldanor
Contributor Author

aldanor commented Jan 15, 2022

If I understand this somewhat correctly, then I think we should compare the alignment reported by mem::align_of:: with the alignment that NumPy considers "natural" for a given struct which I would expect to be the maximum of the alignment of its fields?

It's not that simple, I think. I'll try to explain how I understand what I know:

  • IIUC, NPY_ALIGNED_STRUCT being set (via passing align=True) is a kind of a guarantee from numpy to the user (since it cannot be modified from within Python) that:

    • offsets of all fields in a struct are divisible by alignments of types of those fields
    • alignment of the dtype is maximum (or lcm?..) of alignments of its fields
    • size of the dtype is divisible by its alignment

    If offsets/itemsize are provided by the user, the dtype constructor validates the above. If not, they are being calculated so that all fields are aligned. (we will always provide our offsets/itemsize ourselves though)

  • Note that you can have technically-aligned identical struct dtypes (see d1 and d2 examples above) with or without the NPY_ALIGNED_STRUCT flag set. Having it set or not, however, may affect how this struct will behave when stuffed into rec dtype fields (in numpy, from within Python)

So, I guess the question is, in the TypeDescriptor, should we just check whether the struct type is potentially an "aligned struct" and, if so, just set NPY_ALIGNED_STRUCT flag because we can?..
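For concreteness, the "potentially an aligned struct" check would boil down to the three criteria above, roughly like this (a hypothetical helper over the earlier TypeDescriptor sketch; it assumes an alignment() accessor on type descriptors and public fields, neither of which is settled):

fn is_aligned_struct<T: ValueType>(desc: &RecordDescriptor<T>) -> bool {
    // Criterion 2: the struct's alignment is the maximum of its field alignments.
    let align = desc.fields.iter().map(|f| f.ty.alignment()).max().unwrap_or(1);
    // Criterion 1: every field offset is a multiple of that field's alignment.
    desc.fields.iter().all(|f| f.offset % f.ty.alignment() == 0)
        // Criterion 3: the itemsize is a multiple of the overall alignment.
        && desc.itemsize % align == 0
}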

@aldanor
Contributor Author

aldanor commented Jan 16, 2022

So far I went with the latter (leaving some todos around) - if a struct layout fits all three criteria, we will just mark it as aligned. One quirk, in 'unaligned' mode, I think numpy always just sets the alignment to 1 (regardless of struct layout). We have a choice there, either set it to 1 or set it to the actual alignment (I went with the latter for now but we can switch it if we want).

Anyways, some good progress - untested but it already compiles: dtype_from_type_descriptor() in here. Lots of testing ahead. Surprisingly small amount of code, too - a few hundred lines for what takes a few (quite a few) thousand in numpy itself.

@aldanor
Contributor Author

aldanor commented Jan 17, 2022

Just reporting on a bit more progress: started writing tests (everything works so far), and, as an experiment, implemented Element<S> for arbitrary Rust tuples (impl here). Rust tuples don't have a guaranteed layout, obviously, but that's not our concern - at least now we can automatically generate a record-dtype descriptor. I've added some tests for tuple dtypes and everything seems to work so far - this confirms that both scalar and record dtypes already seem to work as expected 🎉

A few questions that came up related to arrays/tuples (any thoughts welcome):

  • Do we want to implement Element<S> for [T; N] where T: Element<S>? That would be very logical and nice. Here's a few immediate issues though:
    • This would require min-const-generics, and so until the pyo3 MSRV is >=1.51.0, we can hide it behind a min_const_generics optional feature
    • If we do this, we can't have a separate impl for [u8; N] (byte string, S dtype). What is more important? My vote would be to have a generic impl for any array, and then have newtypes for those string types (see below)
      • If we have a generic [T; N] implementation, do we want to add a Bytes<N>([u8; N]) wrapper? Similarly, optionally, also a Ucs4<N>([u8; 4 * N])
    • Should we automatically 'collapse' nested arrays into a single multi-dimensional descriptor?
      • E.g. should [[[u32; 3]; 4]; 5] produce a [{5, 4, 3}; u32] descriptor?
      • This would probably make the most sense, because why wouldn't you? (and this is an example of why you shouldn't simply compare dtypes for equality when casting - since np.dtype(((int, 3), 4)) != np.dtype((int, (4, 3))) but they are essentially equivalent)
      • If we don't want to do this... I don't know, use some sort of newtype? Not sure it's doable. Or maybe (in the upcoming proc-macros) add a flag to collapse or not collapse this.
      • Note that this implies that if T itself produces an array-like descriptor, doing [T; N] will essentially prepend N to the shape that it already has.
      • Note also that numpy sort-of-does this, but in the array constructor, not in dtype constructor (i.e., you create array of shape (2, 3) and subarray dtype (4, 5), then it collapses it into a (2, 3, 4, 5) array without asking you)
  • Do we want to allow zero-dimensional arrays? (similar to other zero-dim, zero-size questions here, my vote would be no, but this is up to discussion, do we want to exclude it or not)
  • Do we want to allow zero-sized arrays? (either of the dimensions is equal to zero)

@davidhewitt
Member

Just wanted to say I've finally had a chance to read this thread. There's a huge amount of content here and I think you're both more knowledgeable than I on this topic. Thank you so much for working on it 👍

Regarding the PyO3 parts - yes I would definitely welcome some love to the PyBuffer code! And we already have some implementations for const generics which automatically get enabled on new enough compilers - see https://github.com/PyO3/pyo3/blob/91648b23159acbf6b44d1245060efa86cfbdf73f/src/conversions/array.rs#L3

If there's anything you need specifically from me, please ping and I'll do my best to comment. My laptop is currently OOO so I'm quite behind on more complex tasks, am limited to when my family doesn't need me and I can be at a desktop. Please have my blessing to be free to design this as appropriate; I'll try to drop in to read in future again (hopefully it's an easier read second time around 😂).

@aldanor
Contributor Author

aldanor commented Jan 19, 2022

@davidhewitt Cheers & no worries, thanks for taking time to skim through this, there's quite a lot indeed :) There's no rush since there's still quite a lot left to do and to test, but it's gradually converging.

For now, I'm working on a shared pyo3/numpy type descriptor prototype in here. The code in src/ has nothing to do with numpy in particular and should probably at some point get integrated into the buffer part of pyo3 if everyone's happy with it. When the core numpy part is done, I'll take a separate look at what else we can do with PyBuffer (other than linking the two Element traits, which has already been done), and whether there are any parts we can deduplicate that don't involve ndarray or dtypes.

Re: const generics, yes, I've started implementing them in exactly the same way (and even named the feature identically). Should be all good then.

@adamreichold
Member

(Just wanted to say that I know that I have unread stuff here since 2022-01-15. I just have not yet found the time to go through it.)

@aldanor
Contributor Author

aldanor commented Jan 23, 2022

A little bit of a progress report as I found a bit of time to work on it today (all of it can be seen in the recent commits in the repo, link above):

  • Tuples (up to size 12) are now supported.
  • [T; N] types are now supported on 1.51+.
  • The dtype of [[[T; N1]; N2]; N3] will now be automatically set to (T, (N3, N2, N1)) - that's pretty cool? So you can create multi-dimensional subarrays just by using built-in nested array types
  • Any other builtin types to support?
  • Added BufferElement and ArrayElement trait "aliases" (in pyo3 and in rust-numpy respectively) so as to avoid generics all over the place

Next thing, I'll probably work a bit on converting dtype -> type descriptor (the other direction). Also maybe sketch the #[numpy] proc macro for generating record dtype descriptors for custom user structs. And there's still a ton of todos in the code that need to be resolved.

@aldanor
Contributor Author

aldanor commented Jan 24, 2022

Some more progress - due to datetime64 and timedelta64 support being previously requested, and them being the only two types that diverge from the stdlib buffer format, I think it makes sense to add proper support for them. There's multiple ways to do it, I've sketched one of them here.

In a nutshell, you can just use Datetime64<units::Nanosecond> and it will translate to datetime64[ns] automatically. We can add a little bit of functionality later on, like basic arithmetics for datetime/timedelta within linear time units, maybe conversions to stdlib and/or time crate types, etc.

@aldanor
Contributor Author

aldanor commented Jan 24, 2022

Some more progress updates (this is kinda where all of this has been heading from the start) - here's one of the most recent tests.

This compiles and runs:

#[derive(Clone, numpy::Record)]
#[repr(C)]
struct Foo<T, const N: usize> {
    x: [[T; N]; 3],
    y: (PyObject, Timedelta64<units::Nanosecond>),
}

assert_eq!(
    Foo::<i64, 2>::type_descriptor(),
    td!({"x":0 => [(3, 2); <i64], "y":48 => {0 => O, 8 => m8[ns] [16, 8]} [64, 8]})
);

There's still lots of edge cases and bugs to be squashed and tests to be written etc, but on the surface, the core of it seems to be working quite nicely.

@adamreichold
Member

adamreichold commented Jan 27, 2022

So, I guess the question is, in the TypeDescriptor, should we just check whether the struct type is potentially an "aligned struct" and, if so, just set NPY_ALIGNED_STRUCT flag because we can?..

So far I went with the latter (leaving some todos around) - if a struct layout fits all three criteria, we will just mark it as aligned.

If the semantics of the same type definition with and without the flag are different on the Python side, maybe we should give the user access to it, e.g. make it a parameter to dtype_from_type_descriptor()? Or do we need it as part of TypeDescriptor when converting a dtype into it?

Do we want to implement Element<S> for [T; N] where T: Element<S>? That would be very logical and nice. Here's a few immediate issues though:

And we already have some implementations for const generics which automatically get enabled on new enough compilers

I think we should definitely add this and do it in the same way as std or PyO3: Add impls up to length 32 using macros on older compilers and use const generics on newer ones dropping the small array macros when our MSRV includes support for const generics.

like basic arithmetics for datetime/timedelta within linear time units, maybe conversions to stdlib and/or time crate types, etc.

Personally, I would very much like this crate to provide only a minimal and efficient conversion API without providing any operations on the wrapper types itself. (I think this also applies to arrays itself and we should aim to focus the API to getting from/to ndarray and nalgebra types.)

@aldanor
Contributor Author

aldanor commented Jan 27, 2022

Personally, I would very much like this crate to provide only a minimal and efficient conversion API without providing any operations on the wrapper types itself. (I think this also applies to arrays itself and we should aim to focus the API to getting from/to ndarray and nalgebra types.)

Yea, I had the same feeling. For now I've just added transparent wrapper types themselves that can only do two things (aside from being registered in the dtype system) - be constructed from i64 and be converted into i64; the rest is up to whoever's using them.
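Roughly, such a wrapper could be as minimal as the following sketch (an assumption about its shape, not the prototype's actual code): a transparent newtype over the raw i64 count, convertible from and into i64 and nothing else.

use std::marker::PhantomData;

#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct Datetime64<U>(i64, PhantomData<U>);

impl<U> From<i64> for Datetime64<U> {
    fn from(value: i64) -> Self {
        Self(value, PhantomData)
    }
}

impl<U> From<Datetime64<U>> for i64 {
    fn from(value: Datetime64<U>) -> i64 {
        value.0
    }
}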

I think we should definitely add this and do it in the same way as std or PyO3: Add impls up to length 32 using macros on older compilers and use const generics on newer ones dropping the small array macros when our MSRV includes support for const generics.

I've added the min-const-generics part of it, but I can also add the macro-based impl for up-to-length-32 arrays for older compiler versions. Personally, I'm not a huge fan of min_const_generics-like features all over the place, as I think it's cleaner to add a temporary dependency (until MSRV is high enough) and do it like so:

#[rustversion::since(1.51)]
// impl via min const geneics

#[rustversion::before(1.51)]
// impl via macros

The main benefit is that we won't break some dependent crates the day we kill the min_const_generics feature, as all of this logic stays completely internal to our crates and can be viewed as an implementational detail.

If the semantics of the same type definition with and without the flag are different on the Python side, maybe we should give the user access to it, e.g. make it a parameter to dtype_from_type_descriptor()? Or do we need it is part of TypeDescriptor when converting a dtype into it?

I think, first of all, we need to categorise how a type descriptor can be attached to a type. There will be two ways:

  1. #[derive(numpy::Record)] - this is probably what would be used 99% of time.
  2. Same as above but for remote types (see serde docs for remote/foreign types as an example, we might copy that)
  3. unsafe impl manually. Not sure why you'd want to do that, but perhaps there's some reason (e.g. you're codegen-ing something).

There's an alignment: Option<usize> field in RecordDescriptor which we allow to be set to None - this should cover case 3 as you're in control of that yourself. In cases 1 and 2 the derive macro would automatically set it to Some(std::mem::align_of::<T>()). So perhaps, if you really want a numpy dtype with align = False, we can allow something like a #[record(align = false)] attribute in the derive macro - this would cover cases 1 and 2. All that's left then is to ensure NPY_ALIGNED_STRUCT is not set when alignment.is_none()?

@adamreichold
Member

Personally, I'm not a huge fan of min_const_generics-like features all over the place, as I think it's cleaner to add a temporary dependency (until MSRV is high enough) and do it like so:

Yeah, I think using a crate like rustversion instead of a public feature is fine.

There's an alignment: Option<usize> field in RecordDescriptor which we allow to be set to None

I guess this is where I am not sure ATM. Reading the NumPy C API, I get the impression that the idea is that PyArray_Descr::alignment (the one stored in fields or parent) should always be set to the necessary alignment even if the field offset does not respect that. Which is why I am not sure whether tying alignment == None to NPY_ALIGNED_STRUCT makes sense (or actually whether alignment should be an Option at all).

I think we should always have an alignment (even if trivial, i.e. one) and we should set the NPY_ALIGNED_STRUCT flag whenever the contents of a RecordDescriptor are indeed aligned (by checking the alignment of the field type descriptors against the offsets of the fields).
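A self-contained sketch of that check, assuming each field's offset and the alignment of its type descriptor are available (the plain tuples here are just for illustration):

// Each tuple is (offset, alignment) of one field of the record.
fn is_aligned_struct(fields: &[(usize, usize)]) -> bool {
    fields
        .iter()
        .all(|&(offset, align)| align != 0 && offset % align == 0)
}

fn main() {
    // 'u2' at offset 0 and 'u8' at offset 8: aligned.
    assert!(is_aligned_struct(&[(0, 2), (8, 8)]));
    // 'u8' at offset 5 violates its 8-byte alignment: not aligned.
    assert!(!is_aligned_struct(&[(0, 2), (5, 8)]));
}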

My understanding of the align argument to the dtype constructor is at this moment basically like "here is a record type but I can't be bothered to work out the correct offsets so please do the padding for me" which would be a courtesy that TypeDescriptor just does not need to provide.

@aldanor
Contributor Author

aldanor commented Jan 27, 2022

I guess this is where I am not sure ATM. Reading the NumPy C API, I get the impression that the idea is that PyArray_Descr::alignment (that one stored in fields or parent) should always be set to the necessary alignment even if the field offset does not respect that. Which is why I am not sure whether tying alignment == None to NPY_ALIGNED_STRUCT makes sense (or actually whether alignment should be an Option at all).

The way it works in numpy itself (_convert_from_dict),

PyArray_Descr *new = PyArray_DescrNewFromType(NPY_VOID); // <-- new->alignment := 1
...
if (align) { // <-- align comes from either align=True or from being called recursively
    new->alignment = maxalign; // <-- maxalign is computed C-alignment if offsets are not provided
}
...    
/* Structured arrays get a sticky aligned bit */
if (align) {
    new->flags |= NPY_ALIGNED_STRUCT;
}

IIUC, descr->alignment will be set to something different from 1 only if align=True is passed in explicitly, and in that case NPY_ALIGNED_STRUCT flag will always be set.

There are 4 cases in the numpy constructor itself:

  • You provide no offsets and align=False; it's treated as 'packed', alignment is 1, flag is not set
  • You provide no offsets and align=True; numpy simulates C and computes repr(C) offsets, flag is set
  • You provide offsets and align=False; nothing is verified, alignment is 1, flag is not set
  • You provide offsets and align=True; numpy checks that your offsets match alignment (but may be different from what it would have computed since there may be gaps etc), flag is set

So, the following must hold true: "if descr->alignment != 1, then NPY_ALIGNED_STRUCT is set". The reverse, obviously, does not need to hold true.

Back to our business - I guess we can do something like this?

struct RecordDescriptor<T> {
    ...
    alignment: usize,  // always set
    is_aligned: Option<bool>,
}

Where is_aligned can be:

  • Some(true) = set the flag in descr->flags; set descr->alignment; when creating the dtype, verify that all fields are indeed C-aligned
  • Some(false) = don't set the flag; don't check anything; (set descr->alignment or not? probably set it to 1?)
  • None = current behaviour and the default: if all fields are C-aligned, then set the flag and alignment field; otherwise, don't set the flag (and leave descr->alignment as 1?)
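A minimal sketch of one possible reading of those three cases (names are placeholders; whether the fields are C-aligned is assumed to be computed elsewhere by checking offsets against field alignments):

/// Returns (value for descr->alignment, whether to set NPY_ALIGNED_STRUCT),
/// or an error when Some(true) is requested but the offsets don't fit.
fn resolve_alignment(
    alignment: usize,
    is_aligned: Option<bool>,
    fields_are_c_aligned: bool,
) -> Result<(usize, bool), String> {
    match is_aligned {
        Some(true) if !fields_are_c_aligned => {
            Err("field offsets do not respect C alignment".to_string())
        }
        Some(true) => Ok((alignment, true)),
        Some(false) => Ok((1, false)),
        None if fields_are_c_aligned => Ok((alignment, true)),
        None => Ok((1, false)),
    }
}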

@aldanor
Contributor Author

aldanor commented Jan 27, 2022

Here's an illustration:

import numpy as np

def check_alignment(*, offsets=None, align=False):
    print(f'offsets={offsets}, align={align} => ', end='')
    try:
        args = {
            'names': ['x', 'y'],
            'formats': ['u2', 'u8'],
        }
        if offsets is not None:
            args['offsets'] = offsets
        dtype = np.dtype(args, align=align)
        dtype_offsets = [dtype.fields[n][1] for n in args['names']]
        print(
            f'alignment={dtype.alignment}, '
            f'isalignedstruct={dtype.isalignedstruct}, '
            f'offsets={dtype_offsets}'
        )
    except BaseException as e:
        print(f'error: "{e}"')
check_alignment(offsets=None, align=False)
check_alignment(offsets=None, align=True)
check_alignment(offsets=[0, 8], align=False)
check_alignment(offsets=[0, 8], align=True)
check_alignment(offsets=[0, 5], align=False)
check_alignment(offsets=[0, 5], align=True)

which outputs

offsets=None, align=False => alignment=1, isalignedstruct=False, offsets=[0, 2]
offsets=None, align=True => alignment=8, isalignedstruct=True, offsets=[0, 8]
offsets=[0, 8], align=False => alignment=1, isalignedstruct=False, offsets=[0, 8]
offsets=[0, 8], align=True => alignment=8, isalignedstruct=True, offsets=[0, 8]
offsets=[0, 5], align=False => alignment=1, isalignedstruct=False, offsets=[0, 5]
offsets=[0, 5], align=True => error: "offset 5 for NumPy dtype with fields is not divisible by the field alignment 8 with align=True"

@adamreichold
Member

Back to our business - I guess we can do something like this?

The semantics look reasonable to me. I do still wonder whether is_aligned should be part of RecordDescriptor instead of an argument to dtype_from_type_descriptor? For round-tripping from dtype to dtype via TypeDescriptor?

offsets=[0, 8], align=False => alignment=1, isalignedstruct=False, offsets=[0, 8]

So this means that NumPy treats this like a repr(packed) struct even if the fields are indeed aligned? Weird. (And does this mean the relevant fields will be read using the moral equivalent of read_unaligned, e.g. through a char* pointer?)

@aldanor
Contributor Author

aldanor commented Jan 27, 2022

The semantics look reasonable to me. I do still wonder whether is_aligned should be part of RecordDescriptor instead of an argument to dtype_from_type_descriptor? For round-tripping from dtype to dtype via TypeDescriptor?

That part I'm unsure about myself. On the one hand, stuffing an extra argument which would almost never be used into dtype_from_type_descriptor is not very nice. As for roundtripping...

  • We can fish out whether align=True or not from the isalignedstruct flag.
  • There's no way to figure out RecordDescriptor::alignment from a numpy dtype though. If descr->alignment == 1, it does not mean a single thing, it's simply an alignment of char.
  • If you want roundtripping, I think we're back to a single alignment: Option<usize> which is what I suggested initially 😄 If isalignedstruct is false, then descr->alignment is surely 1 which corresponds to None in our world. Otherwise, it's Some(descr->alignment) which is now a meaningful value. And vice versa, so it should roundtrip, no?
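A sketch of that roundtrip, with alignment: Option<usize> standing in for whatever field RecordDescriptor ends up with (the function names here are illustrative only):

// TypeDescriptor -> dtype direction: what to write into the descriptor.
fn to_dtype_alignment(alignment: Option<usize>) -> (usize, bool) {
    match alignment {
        Some(a) => (a, true), // descr->alignment = a, NPY_ALIGNED_STRUCT set
        None => (1, false),   // packed-style descriptor, flag unset
    }
}

// dtype -> TypeDescriptor direction: recover the Option from the dtype.
fn from_dtype_alignment(descr_alignment: usize, isalignedstruct: bool) -> Option<usize> {
    if isalignedstruct {
        Some(descr_alignment)
    } else {
        None // descr->alignment is 1 here and carries no information
    }
}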

So this means that NumPy treats this like a repr(packed) struct even if the fields are indeed aligned?

Yes, it sort of does. But I don't think it makes a distinction between aligned and unaligned reads, all that is left to the compiler. You can see that the usage of NPY_ALIGNED_STRUCT doesn't go too far beyond descriptor.c and creating dtypes.

@aldanor
Contributor Author

aldanor commented Jan 27, 2022

Ok, re: the last comment, maybe I wasn't entirely clear. In some cases, NumPy does distinguish between aligned and unaligned arrays.

array_method.c:

        /* TODO: We may need to distinguish aligned and itemsize-aligned */
        aligned &= PyArray_ISALIGNED(arrays[i]);
    }
    if (!aligned && !(self->method->flags & NPY_METH_SUPPORTS_UNALIGNED)) {
        PyErr_SetString(PyExc_ValueError,
                "method does not support unaligned input.");
        return NULL;
    }

Whether the array is aligned or not (e.g. if you select a field like arr['y'] from one of the examples above) does not depend on the NPY_ALIGNED_STRUCT flag, but rather on the actual pointer offsets, strides, etc.

So, in the example above, cases 2, 3 and 4 will have arr.flags.aligned = True for field 'y' (regardless of what descr->alignment is set to), whereas cases 1 and 5 will have arr.flags.aligned = False.

@adamreichold
Member

If descr->alignment == 1, it does not mean a single thing, it's simply an alignment of char.

My impression is that indeed this means that NumPy will store these records unaligned? So the resulting type descriptor would match a Rust type with repr(packed) if the offsets are indeed packed, or something that cannot be written down as a struct at all (basically "repr(packed)" but with offsets as if without it). (Which seems to be true generally, i.e. there are a lot of NumPy dtypes and TypeDescriptor instances which cannot be produced via #[derive(numpy::record)] on any Rust struct.)

@adamreichold
Member

(Sorry, didn't mention that: Personally, I don't think we need roundtripping. I am basically only interested in the TypeDescriptor to dtype direction which is why I am asking if we can simplify things by ignoring the reverse mapping.)

@aldanor
Contributor Author

aldanor commented Jan 27, 2022

Which seems to be true generally, i.e. there are a lot of NumPy dtypes and TypeDescriptor instances which cannot be produced via #[derive(numpy::record)] on any Rust struct.

Well, to start with... NumPy allows overlapping fields. As long as neither of them contains PyObject*.

if we can simplify things by ignoring the reverse mapping

I think we have to do the reverse mapping, which will be used in FromPyObject and the like. Instead of the pre-existing "EquivTypes" logic in the numpy API (which is overburdened, slow and extremely spaghetti), we can do whatever we want. So, it could work like this:

  • For the array object coming from outside, decode its dtype into a type descriptor. If we can't decode it, we can't use it
  • Compare it to our type descriptor and return true/false
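A toy self-contained sketch of that flow (the enum here is only a stand-in for the crate's eventual TypeDescriptor, and decoding the incoming dtype is assumed to happen elsewhere):

#[derive(PartialEq)]
enum Descr {
    UInt16,
    UInt64,
    Record(Vec<(String, usize, Descr)>), // (field name, offset, field descriptor)
}

// `decoded` is None when the incoming dtype could not be decoded at all.
fn can_extract(decoded: Option<&Descr>, expected: &Descr) -> bool {
    matches!(decoded, Some(d) if d == expected)
}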

My impression is that indeed this means that NumPy will store these records unaligned

What exactly do you mean by that? You can pass offsets [0, 8] with align=False as in the example above - and the resulting dtype will have descr->alignment=1, but, in reality, it will be aligned. Also, the array made of this dtype will have the NPY_ARRAY_ALIGNED flag on. And both of the field slices will have this flag on.

It will only be 'packed' if you don't specify offsets. In this case the align flag lets you choose either way of computing them (this is the part we really don't care about, as we have our own offsets).

@adamreichold
Member

Instead of the pre-existing "EquivTypes" logic in the numpy API (which is overburdened, slow and extremely spaghetti), we can do whatever we want.

I have to think about this. My first instinct is that this would rather be an argument for improving the NumPy code instead of rolling our own.

What exactly do you mean by that? You can pass offsets [0, 8] with align=False as in the example above - and the resulting dtype will have descr->alignment=1, but, in reality, it will be aligned. Also, the array made of this dtype will have the NPY_ARRAY_ALIGNED flag on. And both of the field slices will have this flag on.

From your example in #254 (comment), I got the impression that align=True is not necessarily recursive. So if you have a record type with actually aligned offsets but alignment=1 and put it into an outer record with align=True, this subrecord could be placed misaligned, thereby also misaligning its fields; whereas if the subrecord type itself had alignment > 1, padding would be added as necessary and the alignment of the subrecord field would imply alignment of its fields.

@PyO3 PyO3 locked and limited conversation to collaborators Apr 12, 2022
@adamreichold adamreichold converted this issue into discussion #321 Apr 12, 2022
