Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Scan, generalizing Reduction to accumulating scans like cumsum! #3590

Merged
merged 20 commits into from May 10, 2024

Conversation

glwagner
Copy link
Member

@glwagner glwagner commented May 7, 2024

This PR generalizes Reduction to also support accumulating "scanning" operations like cumsum!.

This should not change the existing user interface but instead add the new feature, something like

cumulative_u = Accumulation(cumsum!, u, dims=3)

and

cumint_u = CumulativeIntegral(u, dims=3)

Previously I don't think this was possible on GPU because cumsum! was not supported for CuArray (?) But it is now it seems. To support this feature, we've implemented kernels for forward and reverse accumulation.

I also have used the generalization of Scan to clean up the internals / user interface for Average and Integral (now they are proper type aliases).

cc @hdrake @iuryt

@glwagner
Copy link
Member Author

glwagner commented May 7, 2024

Here we go:

julia> using Oceananigans

julia> grid = RectilinearGrid(size=(1, 1, 3), extent=(1, 1, 1))
1×1×3 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 1×1×3 halo
├── Periodic x  [0.0, 1.0)  regularly spaced with Δx=1.0
├── Periodic y  [0.0, 1.0)  regularly spaced with Δy=1.0
└── Bounded  z  [-1.0, 0.0] regularly spaced with Δz=0.333333

julia> c = CenterField(grid)
1×1×3 Field{Center, Center, Center} on RectilinearGrid on CPU
├── grid: 1×1×3 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 1×1×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: Periodic, east: Periodic, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 3×3×9 OffsetArray(::Array{Float64, 3}, 0:2, 0:2, -2:6) with eltype Float64 with indices 0:2×0:2×-2:6
    └── max=0.0, min=0.0, mean=0.0

julia> set!(c, (x, y, z) -> rand())
1×1×3 Field{Center, Center, Center} on RectilinearGrid on CPU
├── grid: 1×1×3 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 1×1×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: Periodic, east: Periodic, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 3×3×9 OffsetArray(::Array{Float64, 3}, 0:2, 0:2, -2:6) with eltype Float64 with indices 0:2×0:2×-2:6
    └── max=0.925176, min=0.853073, mean=0.877648

julia> cumsum_op = Accumulation(cumsum!, c, dims=3)
Accumulating cumsum! over dims 3 of 1×1×3 Field{Center, Center, Center} on RectilinearGrid on CPU
└── operand: 1×1×3 Field{Center, Center, Center} on RectilinearGrid on CPU
    └── grid: 1×1×3 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 1×1×3 halo

julia> cumsum_c = Field(cumsum_op)
1×1×3 Field{Center, Center, Center} on RectilinearGrid on CPU
├── data: OffsetArrays.OffsetArray{Float64, 3, Array{Float64, 3}}, size: (1, 1, 3)
├── grid: 1×1×3 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 1×1×3 halo
├── operand: Accumulating cumsum! over dims 3 of 1×1×3 Field{Center, Center, Center} on RectilinearGrid on CPU
├── status: time=0.0
└── data: 3×3×9 OffsetArray(::Array{Float64, 3}, 0:2, 0:2, -2:6) with eltype Float64 with indices 0:2×0:2×-2:6
    └── max=0.0, min=0.0, mean=0.0

julia> compute!(cumsum_c)
1×1×3 Field{Center, Center, Center} on RectilinearGrid on CPU
├── data: OffsetArrays.OffsetArray{Float64, 3, Array{Float64, 3}}, size: (1, 1, 3)
├── grid: 1×1×3 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 1×1×3 halo
├── operand: Accumulating cumsum! over dims 3 of 1×1×3 Field{Center, Center, Center} on RectilinearGrid on CPU
├── status: time=0.0
└── data: 3×3×9 OffsetArray(::Array{Float64, 3}, 0:2, 0:2, -2:6) with eltype Float64 with indices 0:2×0:2×-2:6
    └── max=2.63294, min=0.925176, mean=1.77933

@glwagner
Copy link
Member Author

glwagner commented May 7, 2024

Cool. Getting to the finish line will be a bit of work I guess. I think we usually want to integrate downwards so that's annoying. Not sure if that exists exactly, or we have to implement some lazy version of reverse to achieve it

@iuryt
Copy link
Collaborator

iuryt commented May 7, 2024

Isn't that a way to obtain the reverse cumulative integration by simply adding something?
Something like cumint[end] - cumint + cumint[1]
I don't remember exactly, but I think that there is a way to do it.

@glwagner
Copy link
Member Author

glwagner commented May 7, 2024

Ah that is a nice trick! I Think its something like

cumsum!(b, a, dims=3)
A = view(b, :, :, Nz) # total sum of a over 3rd dimension
b .= A .- b .+ a # compute the reversed cumsum

@glwagner
Copy link
Member Author

glwagner commented May 7, 2024

Dang, pretty cool:

julia> grid = RectilinearGrid(size=(1, 1, 3), extent=(1, 1, 1))
1×1×3 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 1×1×3 halo
├── Periodic x  [0.0, 1.0)  regularly spaced with Δx=1.0
├── Periodic y  [0.0, 1.0)  regularly spaced with Δy=1.0
└── Bounded  z  [-1.0, 0.0] regularly spaced with Δz=0.333333

julia> c = CenterField(grid); set!(c, (x, y, z) -> rand())
1×1×3 Field{Center, Center, Center} on RectilinearGrid on CPU
├── grid: 1×1×3 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 1×1×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: Periodic, east: Periodic, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 3×3×9 OffsetArray(::Array{Float64, 3}, 0:2, 0:2, -2:6) with eltype Float64 with indices 0:2×0:2×-2:6
    └── max=0.53406, min=0.317935, mean=0.436216

julia> ci_op = CumulativeIntegral(c, dims=3)
CumulativeIntegral of BinaryOperation at (Center, Center, Center) over dims 3
└── operand: BinaryOperation at (Center, Center, Center)
    └── grid: 1×1×3 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 1×1×3 halo

julia> ci = compute!(Field(ci_op))
1×1×3 Field{Center, Center, Center} on RectilinearGrid on CPU
├── data: OffsetArrays.OffsetArray{Float64, 3, Array{Float64, 3}}, size: (1, 1, 3)
├── grid: 1×1×3 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 1×1×3 halo
├── operand: CumulativeIntegral of BinaryOperation at (Center, Center, Center) over dims 3
├── status: time=0.0
└── data: 3×3×9 OffsetArray(::Array{Float64, 3}, 0:2, 0:2, -2:6) with eltype Float64 with indices 0:2×0:2×-2:6
    └── max=0.436216, min=0.105978, mean=0.275398

julia> rci_op = CumulativeIntegral(c, dims=3, reverse=true)
CumulativeIntegral of BinaryOperation at (Center, Center, Center) over dims 3
└── operand: BinaryOperation at (Center, Center, Center)
    └── grid: 1×1×3 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 1×1×3 halo

julia> rci = compute!(Field(rci_op))
1×1×3 Field{Center, Center, Center} on RectilinearGrid on CPU
├── data: OffsetArrays.OffsetArray{Float64, 3, Array{Float64, 3}}, size: (1, 1, 3)
├── grid: 1×1×3 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 1×1×3 halo
├── operand: CumulativeIntegral of BinaryOperation at (Center, Center, Center) over dims 3
├── status: time=0.0
└── data: 3×3×9 OffsetArray(::Array{Float64, 3}, 0:2, 0:2, -2:6) with eltype Float64 with indices 0:2×0:2×-2:6
    └── max=0.436216, min=0.152218, mean=0.306224

@glwagner
Copy link
Member Author

glwagner commented May 7, 2024

@jagoosw thoughts?

@glwagner glwagner requested a review from navidcy May 7, 2024 20:28
@jagoosw
Copy link
Collaborator

jagoosw commented May 8, 2024

This looks great! I think we can re-state light integration as a field reduction with this.

@glwagner glwagner added the abstractions 🎨 Whatever that means label May 8, 2024
@tomchor
Copy link
Collaborator

tomchor commented May 9, 2024

I think we can use this to compute a cumulative integral as well, adding the feature to metric field reductions.

That's a great feature to add! I can remember it being requested more than a few times over the years.

Copy link
Collaborator

@tomchor tomchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! My only question is: is "scan" a default / frequently-used name for what we're doing here? I ask because I had never heard of it and when I think of a "scan" of a Field, I don't picture what is implemented here. That said I do agree the name makes sense after seeing what it does.

@glwagner
Copy link
Member Author

This looks great! My only question is: is "scan" a default / frequently-used name for what we're doing here? I ask because I had never heard of it and when I think of a "scan" of a Field, I don't picture what is implemented here. That said I do agree the name makes sense after seeing what it does.

That's fair, I found it on wikipedia: https://en.wikipedia.org/wiki/Prefix_sum#:~:text=In%20computer%20science%2C%20the%20prefix,y0%20%3D%20x&text=y1%20%3D%20x0%20%2B%20x

That said it is meant in a more specific context there...

However, I found it to be descriptive, since the operations that Scan represents involve "scanning" one or more dimensions. That's why the object has to have a dims property, and also explains why it is a "non-local operation", in contrast to the other abstract operations that can be evaluted locally.

That said I do think we can come up with a better name, maybe we can continue to discuss here even after this is merged.

@glwagner glwagner merged commit c91df92 into main May 10, 2024
46 checks passed
@glwagner glwagner deleted the glw/scans branch May 10, 2024 12:39
@tomchor
Copy link
Collaborator

tomchor commented May 10, 2024

That said I do think we can come up with a better name, maybe we can continue to discuss here even after this is merged.

Would CumulativeScan work?

@glwagner
Copy link
Member Author

glwagner commented May 10, 2024

Ok here's an idea: replace Scan with Accumulation. I think a sum can be thought of as "accumulating" without any leaps. We then would have reducing accumulations (intermediate accumulations are not stored), and cumulative accumulations (where the result is not reduced).

We then have the mapping:

  • Scan to Accumulation
  • Accumulation to Cumulation (but we only really need the cumulative sum / integral anyways probably, so this need not be user facing anyways)
  • Accumulating to Cumulating

Honestly discussing this, the phrase "cumulative sum" is not the most clear, either. But, history.

@glwagner
Copy link
Member Author

That said I do think we can come up with a better name, maybe we can continue to discuss here even after this is merged.

Would CumulativeScan work?

Couldn't this be confusing though, if the "cumulative scan of a sum" is in fact just a sum (and not a cumulative sum)?

@tomchor
Copy link
Collaborator

tomchor commented May 10, 2024

Ok here's an idea: replace Scan with Accumulation. I think a sum can be thought of as "accumulating" without any leaps. We then would have reducing accumulations (intermediate accumulations are not stored), and cumulative accumulations (where the result is not reduced).

Pardon my being pedantic, but "cumulative accumulations" sounds pretty redundant. We might not have enough words in the English language to describe this functionality precisely 😆

@tomchor
Copy link
Collaborator

tomchor commented May 10, 2024

That said I do think we can come up with a better name, maybe we can continue to discuss here even after this is merged.

Would CumulativeScan work?

Couldn't this be confusing though, if the "cumulative scan of a sum" is in fact just a sum (and not a cumulative sum)?

I thought a cumulative scan of a sum would be a "double sum". I thought a cumulative scan of a field would be a sum. No? Maybe I'm misunderstanding what Scan does after all.

The way I understood it (based off of the wikipedia page primarily):

  • A scan of a field is a cumsum
  • A reducing scan of a field (like a scan but we only get the last index) is a sum

Is this not correct?

@glwagner
Copy link
Member Author

Pardon my being pedantic, but "cumulative accumulations" sounds pretty redundant.

I think "cumulative sum" is redundant too. If I say I have an "accumulation of beans", I'm talking about the whole pile of beans.

It doesn't seem to me that terminology exists, to answer your first question about whether "scan" is commonly used. There isn't any term that is commonly used.

@iuryt
Copy link
Collaborator

iuryt commented May 11, 2024

I am not very familiar with this use of scan, and despite "cumulative sum" sounding redundant, it express to me better the idea that I am not simply summing all values of a series. I think that this is the reason why cumsum is so common.
I agree that we might not find a perfect word for that... haha
However, I was thinking that CumulativeScan also works. Scan gives this idea of an action moving through the series and Cumulative suggests that we are summing terms as we move through.

@iuryt
Copy link
Collaborator

iuryt commented May 11, 2024

Could SerialAccumulation, SequentialSum or ProgressiveTotaling also work? Or something with Aggregation?
Not sure those are good though, just brainstorming.

@glwagner
Copy link
Member Author

glwagner commented May 13, 2024

Well to be clear, the name we are searching for is something that can describe both reductions like sum, maximum, minimum (and average and integral, which are similar to sum but involve grid metrics) --- and operations like cumsum.

The things these have in common (in contrast to local operations like derivative, plus, minus, etc) is that they involve a loop over one or more dimensions (which is why they have a dims property) and therefore cannot be evaluated locally, they have to be precomputed.

For example, we could envision also supporting sort! through the Scan abstraction...

The key property of a reduction is that its output is also lower dimensionality than the input, the dims are collapsed. On the other hand sort! and cumsum! have output with the same dimension as the input.

So we need a few things to build the abstraction. First we need a name that encompasses reductions, plus cumsum and sort. Second, we need another pair of names that express the distinction between a "reduction" and a non-dimensionality-changing-yet-still-scanning operation like cumsum and sort.

I meant the term "scan" simply to mean literally that we are going to traverse one or more dimensions (eg scanning the dimension). No doubt it can be improved but the improvement needs to be sufficiently general. that's the whole challenge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
abstractions 🎨 Whatever that means
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants