
add preliminary broadcasting support #54

Closed

Conversation

nbren12 commented Jan 31, 2017

Hi. I did some work and got a basic prototype of broadcasting working with AxisArray objects. Basically, the code permutes the dimensions of each AxisArray's data field to match a parent list of Axis objects, adds singleton dimensions to each data array, and then passes these modified arrays to broadcast!. I only rely on Julia's broadcast.jl code for this final step. It might be worthwhile to use the core language machinery a bit more, but I honestly had some trouble understanding that code.
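The permute-then-pad strategy above can be sketched in plain Python (illustrative only, not the actual AxisArrays.jl code; `broadcast_axes` and `align_shape` are hypothetical helper names):

```python
# Sketch of name-based broadcast alignment: build a "parent" axis order,
# then permute each array's shape into that order, padding missing axes
# with singleton (length-1) dimensions so ordinary broadcasting applies.

def broadcast_axes(axes_a, axes_b):
    """Union of axis names, keeping the order of the first argument."""
    return list(axes_a) + [ax for ax in axes_b if ax not in axes_a]

def align_shape(axes, shape, parent):
    """Permute `shape` into `parent` order, padding missing axes with 1."""
    lookup = dict(zip(axes, shape))
    return tuple(lookup.get(ax, 1) for ax in parent)

parent = broadcast_axes(("time", "lat", "lon"), ("time", "lon", "lat"))
print(parent)                                             # ['time', 'lat', 'lon']
print(align_shape(("time", "lon", "lat"), (10, 5, 4), parent))  # (10, 4, 5)
print(align_shape(("time",), (10,), parent))              # (10, 1, 1)
```

Once every argument has been reshaped like this, a plain elementwise `broadcast!` over the padded arrays does the rest.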

So far my method has a couple of issues.

  1. Broadcasting only works when an AxisArray is the first argument. This means that 1 + A will return an Array, while A + 1 will return an AxisArray object. This could be fixed by defining a pairwise broadcast(A::AxisArray, b) method and then performing a reduction.
  2. The resulting AxisArrays are all coerced to have elements of type Real.

mbauman (Member) commented Feb 1, 2017

Thanks for doing this! I'll try to find some time to more thoroughly evaluate this within a week, but some initial comments:

  1. Yes, this is a known limitation with dispatch in general. You simply can't specify "at least one arg is an AxisArray" in a function signature. The base broadcast code is complicated in part because they try to support this case. It's not an official interface, but I'd be okay coding against it, knowing that it may break without warning in the future. In this case, it means extending containertype and promote_containertype to return AxisArray. That'll then dispatch to Base.broadcast_c.

  2. Yes, this is another hard problem in Julia, and it's the other half of why Base broadcast is complicated. This is what's happening in broadcast_c. It's annoying since we can't fully trust inference to figure out the exact return type, but we don't want the empty case to mess up inference of the general case. We could start by using Core.Inference.return_type unconditionally, and then later move to doing this more "correctly".

It looks like you're trying to match the axes and re-order them as needed. I'm not sure that's the behavior we want here — am I reading this correctly that it'd mean that A .+ A == A .+ A' for matrix A?

nbren12 (Author) commented Feb 1, 2017

No problem! Yeah, all the tail recursion and multiple dispatch routines in broadcast.jl were pretty hard for me to parse. We could definitely plug into the existing infrastructure, but I just wanted to get a working prototype ready.

As for your last question: A .+ A == A .+ A' is pretty much the kind of behavior I want, and it is how xarray works in Python, although that example is a bit of a special case. If I loaded two datasets temperatureA(time, lat, lon) and temperatureB(time, lon, lat), it would be convenient to be able to subtract them without permuting dimensions manually. The operation A' doesn't mean much to me if A has 3 or more dimensions.
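The A .+ A == A .+ A' behavior can be demonstrated with a tiny stand-in for a "labeled matrix" in plain Python (a sketch, not xarray or AxisArrays code; `transpose` and `add_aligned` are hypothetical helpers):

```python
# A "labeled matrix" is (axis_names, nested_list). With name-based
# alignment, the transposed operand is permuted back into the other
# operand's axis order before adding, so A + A' behaves like A + A.

def transpose(labeled):
    axes, data = labeled
    rows, cols = len(data), len(data[0])
    return (axes[::-1],
            [[data[i][j] for i in range(rows)] for j in range(cols)])

def add_aligned(a, b):
    axes_a, data_a = a
    axes_b, _ = b
    if set(axes_a) != set(axes_b):
        raise ValueError("axis names must match")
    if axes_a != axes_b:        # permute b into a's axis order
        b = transpose(b)
    _, data_b = b
    return (axes_a, [[x + y for x, y in zip(ra, rb)]
                     for ra, rb in zip(data_a, data_b)])

A = (("lat", "lon"), [[1, 2, 3], [4, 5, 6]])
print(add_aligned(A, transpose(A)) == add_aligned(A, A))  # True
```

Generalizing `transpose` to an arbitrary permutation of named axes gives the temperatureA/temperatureB subtraction described above.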

However, if that doesn't make any sense to you, I guess it wouldn't be too painful to force the user to do some kind of align!(A,B,...) call before broadcasting.

GordStephen commented
👍 for automatically aligning arrays. As it says in the README: "This permits one to implement algorithms that are oblivious to the storage order of the underlying arrays."

From my perspective there should be no difference in the information represented by a general 2D AxisArray versus its transpose - the order of dimensions is just an implementation detail. For performance-sensitive tasks where you know and care about your dimension ordering, it's easy enough to pull out A.data and work with that.

nbren12 (Author) commented Mar 27, 2017

FWIW, xarray basically takes this automatic dimension reshaping and reordering strategy as well. See xarray's code for broadcasting support. With xarray, I only really use transposing when I want to switch the axes of a two dimensional image or contour plot quickly.

lamorton commented
From my experience using the xarray approach to broadcasting, I would like to encourage some serious thinking about how to test compatibility of dimensions and how to respond to incompatibilities.

The xarray philosophy relies on float comparisons between the values along a dimension to ensure that each point in the array has the same coordinates in space. This could present problems when interacting with unit conversions that are inexact, or in any other case where the axis values might be computed in different ways so as to differ by epsilon.

As far as I can tell, there are several possible solutions to this issue:

  1. Provide a method for selecting some tolerance, but keep exact equality as the default.
  2. More radically, provide a method of interpolating/extrapolating from one discretization to another.
  3. More conservatively, demand identity between the dimension arrays that share names in the two AxisArrays.

I'm not sure how Julia handles questions of identity, but this third approach doesn't work in Python due to the way identity is handled there. Personally, I favor the third approach because it reduces comparison overhead and entails exact equality of the coordinates. It amounts to a guarantee that any new array introduced into a series of algebraic operations really does belong to the same 'namespace' (if you will) of dimensions as the rest of the calculation.
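The three compatibility checks can be contrasted in a few lines of Python (a hypothetical `axes_compatible` helper, not part of any library under discussion):

```python
import math

def axes_compatible(coords_a, coords_b, mode="identity", tol=0.0):
    """Compare two coordinate vectors under one of three policies."""
    if mode == "identity":       # same object: zero comparison overhead
        return coords_a is coords_b
    if mode == "exact":          # elementwise float equality
        return coords_a == coords_b
    if mode == "tolerance":      # equality up to an absolute tolerance
        return len(coords_a) == len(coords_b) and all(
            math.isclose(x, y, abs_tol=tol)
            for x, y in zip(coords_a, coords_b))
    raise ValueError(mode)

lat = [0.0, 0.5, 1.0]
lat_eps = [0.0, 0.5, 1.0 + 1e-12]   # epsilon drift, e.g. from a unit round-trip
print(axes_compatible(lat, lat))                          # True
print(axes_compatible(lat, lat_eps, "exact"))             # False
print(axes_compatible(lat, lat_eps, "tolerance", 1e-9))   # True
```

Note that under the identity policy even an equal copy of the coordinates is rejected, which is exactly the "same namespace of dimensions" guarantee described above.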

The second issue concerns what happens if there is a mismatch in coordinates along some dimension. Presently, xarray silently drops any points along a shared dimension if the exact coordinate value doesn't appear in both versions. This can result in some really bizarre failure modes when two arrays have some shared axis with values that only differ by a small amount at each point. A more sensible approach would be to raise an error if there were any mismatch at all, just as one would if two standard arrays had incompatible shapes.
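The two mismatch behaviors can be contrasted in plain Python (illustrative only; `inner_join` mimics xarray-style silent dropping, `strict_align` the proposed error; neither is real library code):

```python
def inner_join(coords_a, coords_b):
    """Silently keep only coordinates that appear exactly in both."""
    return [c for c in coords_a if c in coords_b]

def strict_align(coords_a, coords_b):
    """Raise on any mismatch, like a shape error for plain arrays."""
    if coords_a != coords_b:
        raise ValueError("coordinate mismatch along shared dimension")
    return coords_a

t_a = [0.0, 1.0, 2.0]
t_b = [0.0, 1.0 + 1e-12, 2.0]   # one point differs by epsilon

print(inner_join(t_a, t_b))     # [0.0, 2.0] -- a point silently vanishes

try:
    strict_align(t_a, t_b)
except ValueError as err:
    print(err)                  # coordinate mismatch along shared dimension
```

The silent shrinking in the first case is the bizarre failure mode described above; the strict version surfaces it immediately.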

nbren12 (Author) commented Nov 15, 2017

I generally agree that errors are better than magic for dealing with non-identical coordinates, and that there is a fair amount of magic under the hood of xarray. On the other hand, I have never really had any problems with the way xarray does broadcasting. For me, it just works. But I think it is fine to just demand equality, and give a helpful error message indicating whether the coordinates have slightly different numerical values or missing values.

nbren12 (Author) commented Dec 17, 2020

I'm closing this since it seems pretty dead.

nbren12 closed this Dec 17, 2020