From 2d6d3702e4be8e38d8f430bdec25d15e4f491acc Mon Sep 17 00:00:00 2001 From: Marten Henric van Kerkwijk Date: Tue, 28 Oct 2025 20:12:07 -0400 Subject: [PATCH 1/4] Cycle 5: Quantities with Array API support, Improved Support for Masks and Uncertainties --- .../cycle5/mhvk-units-masked-uncertainties.md | 133 ++++++++++++++++++ 1 file changed, 133 insertions(+) create mode 100644 finance/proposal-calls/cycle5/mhvk-units-masked-uncertainties.md diff --git a/finance/proposal-calls/cycle5/mhvk-units-masked-uncertainties.md b/finance/proposal-calls/cycle5/mhvk-units-masked-uncertainties.md new file mode 100644 index 00000000..8fc95d40 --- /dev/null +++ b/finance/proposal-calls/cycle5/mhvk-units-masked-uncertainties.md @@ -0,0 +1,133 @@ +### Title + +Quantities with Array API support, Improved Support for Masks and Uncertainties + +### Project Team + +Marten van Kerkwijk + +### Project Description / Scope of Work + +I request continued partial buy-out from my professorship at UofT to be able +to work one day a week on projects that are too large for the time I can +otherwise commit for astropy. Specifically, I propose, + +- Facilitate Quantity becoming a container class that can handle not just + ndarray but any type of array, i.e., also dask, jax, etc. +- Ensure Quantity is fully compliant with the new Quantity API being developped. +- Extend the same machinery to Masked and Distribution so that all main astropy + classes can use arbitrary array classes. +- Also extend the machinery to the internal arrays used by Time. +- Speed up unit conversion and thus all of astropy by smarter conversion functions + and caching. +- Finish my implementation of a Variable class that tracks uncertainties and + their correlations analytically (based on the uncertainties package). + +#### Roadmap Items + +I split these into direct goals of my work and pieces that will be enabled by +it. Here, note that my goal of adding quantity support for non-numpy arrays +includes support for JAX and Dask arrays, which would thus provide a major +requirement for astropy as a whole having support for those. + +Directly addressed: + +- :green_circle: Add quantity support for non-NumPy arrays. + +- :large_orange_diamond: Improve interoperability between unit packages (e.g., + `astropy.units`, `pint`, `unyt`). + +Provides a major requiremeent for: + +- :red_square: Support JIT compilation (e.g., numba, JAX, etc.) throughout + Astropy core and coordinated packages. + +- :large_orange_diamond: Improve and/or maintain interoperability with + performant I/O file formats and libraries such as HDF5 and Dask. + +#### Project / Work / Deliverables + +Prior to cycle 4, I spent about a day per week on astropy core, in reviews, +bug fixes, and development. I managed to use extra time for fairly large +developments (Quantity historically and Masked and Uncertainty more recently, +with also fairly major contributions to Time, Table, Representation and +numpy), but it was difficult to find enough time to actually wrap up larger +projects (at least outside sabbaticals). This changed with cycle 4 funding, +and a major part of this request is to complete some of the main parts of the +project I proposed for that cycle. + +In particular, in the current cycle I have started to develop Quantity 2.0. +As proposed in [APE 25](https://github.com/astropy/astropy-APEs/pull/91), this +follows the [Array API](https://data-apis.org/array-api/), ensuring +that the new Quantity class will work with any array that supports that API, +which includes those that really matter, like Dask for large, disk-based data +sets and JAX for GPU acceleration. There is a +[prototype](https://github.com/astropy/quantity-2.0), which already supports a +large part of the Array API (basically, those provided by numpy ufuncs) for JAX +and Dask. The work has been waylaid a little in a good way: during this period, +serious discussions started between the various units packages about a shared +[Quantity API](https://github.com/quantity-dev), which we would of course want +to follow. + +The primary goal of my proposal here is to finish the implementation, make it +compatible with the new Quantity API, ensure there are no performance +regressions, and of course document it all. + +A nice benefit of the approach laid out in +[APE 25](https://github.com/astropy/astropy-APEs/pull/91) is that it will be +very easy to extend it to Masked and Distribution (and possibly Variable), as +those basically are already the type of container classes that APE 25 +envisions. + +Furthermore, a direct benefit of Quantity being able to use other array types +than ndarray is that this will nearly automatically extend to coordinates +(since those use quantities almost exclusively; I foresee little more work +than adjusting tests!). Time will be slightly more work, as it works directly +with ndarray, but also here the path is straightforward: I can just follow my +earlier work on ensuring Time can work with Masked. + +Most of the above would benefit application of astropy on large arrays, by +allowing disk-based ones, and analysis via GPUs. But astropy is often used on +small arrays too, and while reviewing our own Quantity code as well as the +code for ndarray that it relies on, I realized there are a number of ways in +which we can improve the performance of Quantity and Unit operations for +scalars and small arrays, mostly by reducing overhead. Some initial PRs on +the numpy side add a [fast path for +scalars](https://github.com/numpy/numpy/pull/29819) and [include array storage +in the object](https://github.com/numpy/numpy/pull/29878). On the Quantity +side proper, I have a skeleton of code that would make unit conversion +substantially faster, especially if combined with caching. This would again +mostly benefit small arrays. Also for larger ones, I see a nice path forward: +the new dtype machinery of numpy provides a way to do the scaling needed for +unit conversion as part of an operation, thus avoiding the need to create +large temporary arrays. + +Finally, an undergraduate I was taught that a number without a unit or an +uncertainty is meaningless. Quantity provides the former, and Distribution +provides a monto-carlo like method for the latter. But often we just would +like to have error propagation, but including covariance. More than a decade +ago, I made a [PR](https://github.com/astropy/astropy/pull/3715) to introduce +a Variable class that tracks uncertainties and covariances (based on the +[uncertainties package](https://pythonhosted.org/uncertainties/), but extended +it to deal natively with arrays). This has been stalled since, but I believe +would still be super useful. A stretch goal of the current proposal is to +finally finish it. + +### Approximate Budget + +I request funding to replace salary equivalent to one day a week, reducing my +regular employment at the University of Toronto correspondingly. At a +standard rate of USD 150/hour for 8 hours per week and 45 weeks, this +corresponds to USD $54000 per year. + +### Period of Performance + +Ideally, I would be covered until June 2027, which is the end of an academic +year. + +I note that the funding provides me with teaching relief, which is for one +semester of an academic year (where thus more than 1 day/week is spent on +astropy, while less time is spent when I teach). So far, the relevant +semesters have been July-December 2024 and January to June 2026. It may be +possible to ensure the next semester will be July to December 2026, so that +most work is finished in 2026. From 69228d5cd8b088b56acf91a1b77230a0063e8f2e Mon Sep 17 00:00:00 2001 From: "Marten H. van Kerkwijk" Date: Wed, 12 Nov 2025 16:48:43 -0500 Subject: [PATCH 2/4] Update to finance part: total and schedule. --- .../cycle5/mhvk-units-masked-uncertainties.md | 21 ++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/finance/proposal-calls/cycle5/mhvk-units-masked-uncertainties.md b/finance/proposal-calls/cycle5/mhvk-units-masked-uncertainties.md index 8fc95d40..eadd1d74 100644 --- a/finance/proposal-calls/cycle5/mhvk-units-masked-uncertainties.md +++ b/finance/proposal-calls/cycle5/mhvk-units-masked-uncertainties.md @@ -118,16 +118,23 @@ finally finish it. I request funding to replace salary equivalent to one day a week, reducing my regular employment at the University of Toronto correspondingly. At a standard rate of USD 150/hour for 8 hours per week and 45 weeks, this -corresponds to USD $54000 per year. +corresponds to USD $54,000 per year. + +I note that my approved cycle 4 proposal was for two years, but the funding +cycle means that only 16 months were used. Here, I request funding both to +finish that approved part of my project ($36,000), and in addition request +funding for another year to finish the whole ($54,000). The total request is +thus $90,000. ### Period of Performance Ideally, I would be covered until June 2027, which is the end of an academic year. -I note that the funding provides me with teaching relief, which is for one -semester of an academic year (where thus more than 1 day/week is spent on -astropy, while less time is spent when I teach). So far, the relevant -semesters have been July-December 2024 and January to June 2026. It may be -possible to ensure the next semester will be July to December 2026, so that -most work is finished in 2026. +I note that the funding provides me with teaching relief for one semester of +an academic year. During those semesters I spend more than 1 day/week on +astropy, while I spend less time when I teach. The first semester I had +teaching relief was July-December 2024, and the second will be January-June +2026 (some of which will thus be covered by funds already paid). I think it +will be possible to ensure the next semester will be July-December 2026, so +that most work is finished in 2026. From 122937ad20b8a249342f4d71a6c3b29e7216ab89 Mon Sep 17 00:00:00 2001 From: "Marten H. van Kerkwijk" Date: Tue, 18 Nov 2025 12:42:17 -0500 Subject: [PATCH 3/4] Give possbile partial funding options --- .../cycle5/mhvk-units-masked-uncertainties.md | 20 ++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/finance/proposal-calls/cycle5/mhvk-units-masked-uncertainties.md b/finance/proposal-calls/cycle5/mhvk-units-masked-uncertainties.md index eadd1d74..87832be0 100644 --- a/finance/proposal-calls/cycle5/mhvk-units-masked-uncertainties.md +++ b/finance/proposal-calls/cycle5/mhvk-units-masked-uncertainties.md @@ -37,7 +37,7 @@ Directly addressed: - :large_orange_diamond: Improve interoperability between unit packages (e.g., `astropy.units`, `pint`, `unyt`). -Provides a major requiremeent for: +Provides a major requirement for: - :red_square: Support JIT compilation (e.g., numba, JAX, etc.) throughout Astropy core and coordinated packages. @@ -126,6 +126,24 @@ finish that approved part of my project ($36,000), and in addition request funding for another year to finish the whole ($54,000). The total request is thus $90,000. +Given partial funding, I could do: + +- $36,000 (remaining approved in cycle 4): Wrap up making Quantity + itself a container class compliant with Array API; implement some of + the performance improvements in unit conversion. + +- $72,000 (above plus minimum amount for some teaching relief): + produce a Quantity replacement without performance regressions, + fully integrated in astropy, and fully compliant with the Quantity + API; implement full set of performance improvements in units + conversion; probably able to ensure Masked can also hold arbitrary + Array API compatible arrays, and possibly Time as well. + +- $90,000 (full request): ensure Masked, Distribution, and Time all + can hold arbitray Array API compatiblle arrays, making astropy as a + whole able to deal with JAX, dask, etc., arrays. Hopefully also + produce a properly functioning Variable class. + ### Period of Performance Ideally, I would be covered until June 2027, which is the end of an academic From f0696b8eaa6ce388b8664c44414bbae44d2df61c Mon Sep 17 00:00:00 2001 From: "Marten H. van Kerkwijk" Date: Tue, 18 Nov 2025 15:08:00 -0500 Subject: [PATCH 4/4] Small clarifications and edits --- .../cycle5/mhvk-units-masked-uncertainties.md | 45 ++++++++++--------- 1 file changed, 23 insertions(+), 22 deletions(-) diff --git a/finance/proposal-calls/cycle5/mhvk-units-masked-uncertainties.md b/finance/proposal-calls/cycle5/mhvk-units-masked-uncertainties.md index 87832be0..58c6dd2b 100644 --- a/finance/proposal-calls/cycle5/mhvk-units-masked-uncertainties.md +++ b/finance/proposal-calls/cycle5/mhvk-units-masked-uncertainties.md @@ -18,8 +18,8 @@ otherwise commit for astropy. Specifically, I propose, - Extend the same machinery to Masked and Distribution so that all main astropy classes can use arbitrary array classes. - Also extend the machinery to the internal arrays used by Time. -- Speed up unit conversion and thus all of astropy by smarter conversion functions - and caching. +- Speed up unit conversion and thus all of astropy by smarter conversion + functions and caching. - Finish my implementation of a Variable class that tracks uncertainties and their correlations analytically (based on the uncertainties package). @@ -120,29 +120,30 @@ regular employment at the University of Toronto correspondingly. At a standard rate of USD 150/hour for 8 hours per week and 45 weeks, this corresponds to USD $54,000 per year. -I note that my approved cycle 4 proposal was for two years, but the funding -cycle means that only 16 months were used. Here, I request funding both to -finish that approved part of my project ($36,000), and in addition request -funding for another year to finish the whole ($54,000). The total request is -thus $90,000. +I note that my initial cycle 4 proposal was for three years, but that made it +extend beyond the cycle 4 funding cycle and hence the initial allocation was +for two years. With the cycle changing, so far only 16 months have been used. +Here, I request funding both to complete the two years previously approved +($36,000), and in addition request funding for the third year required to +finish the project ($54,000). The total request is thus $90,000. Given partial funding, I could do: -- $36,000 (remaining approved in cycle 4): Wrap up making Quantity - itself a container class compliant with Array API; implement some of - the performance improvements in unit conversion. - -- $72,000 (above plus minimum amount for some teaching relief): - produce a Quantity replacement without performance regressions, - fully integrated in astropy, and fully compliant with the Quantity - API; implement full set of performance improvements in units - conversion; probably able to ensure Masked can also hold arbitrary - Array API compatible arrays, and possibly Time as well. - -- $90,000 (full request): ensure Masked, Distribution, and Time all - can hold arbitray Array API compatiblle arrays, making astropy as a - whole able to deal with JAX, dask, etc., arrays. Hopefully also - produce a properly functioning Variable class. +- $36,000 (remainder approved in cycle 4): Wrap up making Quantity a container + class compliant with Array API; implement some of the performance + improvements in unit conversion. + +- $72,000 (above plus minimum amount to get partial teaching relief): Produce + a Quantity replacement without performance regressions, fully integrated in + astropy (i.e., usable inside SkyCoord, etc.), and fully compliant with the + Quantity API; implement full set of performance improvements in units + conversion; probably able to ensure Masked can also hold arbitrary Array API + compatible arrays, and possibly Time as well. + +- $90,000 (full request): Ensure Masked, Distribution, and Time all can hold + arbitray Array API compatiblle arrays, making astropy as a whole able to + deal with JAX, dask, etc., arrays. Hopefully also produce a properly + functioning Variable class. ### Period of Performance