New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Simplify bounds checks for multi-dimensional array accesses #18064

Merged
merged 1 commit into from Aug 23, 2016

Conversation

10 participants
@MatthiasJReisinger
Contributor

MatthiasJReisinger commented Aug 16, 2016

This simplifies the array bounds check code that is emitted for
multi-dimensional array accesses that use "regular" indexing, i.e.
accesses of the form A[i1,i2,...,iN] to some N-dimensional array
A.

For example, with this change, the access A[i,j,k] to an array
with the three dimensions m, n and o now leads to bounds checks
that correspond to the following pseudo code:

if (i >= m)
  out_of_bounds_error();
else if (j >= n)
  out_of_bounds_error();
else if (k >= o)
  out_of_bounds_error();

So far, the following more complicated bounds checks would have been
emitted:

if (i >= m)
  out_of_bounds_error();
else if (j >= n)
  out_of_bounds_error();
else if (((k * n + j) * m + i) >= m * n * o)
  out_of_bounds_error();

Julia also allows one-dimensional and "partial" linear indexing
(see #14770), i.e. the number of indices used to access an array does
not have to match the actual number of dimensions of the accessed array.
For this case we still have use this old scheme.

One motivation for this change was the following: expressions like
((k * n + j) * m + i) are non-affine and Polly would not be able
to analyze them. This change therefore also facilitates Polly's bounds
check elimination logic, which would hoist such checks out of loops
or may remove them entirely where possible.

@blakejohnson

This comment has been minimized.

Contributor

blakejohnson commented Aug 17, 2016

The existing code is kind of strange. What do we get by checking the total linearized index instead of just checking the last dimension?

@MatthiasJReisinger

This comment has been minimized.

Contributor

MatthiasJReisinger commented Aug 17, 2016

I think that we could save a load with the linearized check because we do not need to access the last array dimension (m * n * o isn't actually computed but loaded from memory). But I don't know if that's the real reason. Sorry, that thought was contradictory, since m * n * o has to be loaded too we don't save a load.

@timholy

This comment has been minimized.

Member

timholy commented Aug 17, 2016

@blakejohnson, that's probably "partial linear indexing" (see #14770).

@tobig

This comment has been minimized.

Contributor

tobig commented Aug 17, 2016

Thinking about this partial indexing, we should be able to handle this case without special-casing Polly by generating the last check as:

if (last_index >= product_of_remaining_dimensions())
  out_of_bound_error();

In case dimensions and indexes match, we get the simple checks Matthias is looking for. In case of partial linear indexing, we still get checks that ensure we do not access outside of allocated memory. This would make Polly work in the common case without partial indexing, but partial indexing would continue to work.

@timholy

This comment has been minimized.

Member

timholy commented Aug 17, 2016

Since there seems to be a consensus about getting rid of partial indexing (just needs someone to actually implement the change), another option would be to say you can't use it in @polly-annotated blocks.

But if it's easy to keep, that would be fine, too.

@MatthiasJReisinger

This comment has been minimized.

Contributor

MatthiasJReisinger commented Aug 17, 2016

Thank you for your comments! @tobig With this approach we would have one load for each remaining dimension, whereas with the existing linearized check we only have a single one to retrieve the linearized array length (I hope I didn't mix that up again). Would it be okay to accept this? I just wanted to ask to be one the safe side.

@tobig

This comment has been minimized.

Contributor

tobig commented Aug 17, 2016

On Wed, Aug 17, 2016, at 09:36 PM, Matthias J. Reisinger wrote:

Thank you for your comments! @tobig With this approach we would have one
load for each remaining dimension, whereas with the existing linearized
check we only have a single one to retrieve the linearized array length
(I hope I didn't mix that up again). Would it be okay to accept this? I
just wanted to ask to be one the safe side.

Right. I personally would be OK with this, but I am not part of the
Yulia side. However, I realized that this might not work. Do we actually
have the remaining dimensions? AFAIK the array only saves the full
length.

Best,
Tobias

@blakejohnson

This comment has been minimized.

Contributor

blakejohnson commented Aug 18, 2016

Do we actually have the remaining dimensions?

Yes, emit_arraysize can be used to look up the dimensions.

@ViralBShah ViralBShah added the codegen label Aug 18, 2016

@MatthiasJReisinger MatthiasJReisinger force-pushed the MatthiasJReisinger:mjr/polly/boundchecks branch from 32d9e28 to 1fc813d Aug 18, 2016

@MatthiasJReisinger

This comment has been minimized.

Contributor

MatthiasJReisinger commented Aug 18, 2016

Thank you for your thoughts again! I now updated the PR in the following way: As soon as we have an array access that uses (1D or partial) linear indexing we still use the existing bounds checking, but in the case of regular indexing the simpler check is used (to be on the safe side here: is the check nidxs < nd sufficient to determine the presence of partial indexing?). That also allows us to remove the special-handling for Polly and still gives us the simpler bounds checks. Then, for code parts that use @polly it would be sufficient to require the absence of partial indexing as @timholy suggested.

Do you think this is the direction we could choose?

builder.CreateCondBr(builder.CreateICmpULT(i, alen), endBB, failBB);
} else {
// We have already emitted a bounds check for each index except for
// the last one which we therefore emit here, i.e.

This comment has been minimized.

@tobig

tobig Aug 18, 2016

Contributor

This sentence matches both branches and should be moved before the if (bc). Please also add a comment discussing the difference of the two checks and why each is used. Something like:

For the last dimension there are two options of how to emit bounds checks. In case only a single dimension remains (the number of subscript expressions in the array matches the number of array dimensions), we check if the outermost subscript is smaller than the size of the outermost dimension. In case more than one array dimension is left when we reach the last subscript (partial indexing), we verify if the computed element address is within the bounds of the allocated memory range.

@tobig

This comment has been minimized.

Contributor

tobig commented Aug 18, 2016

Hi Matthias,

thanks for the update. I like this change a lot. Two remarks:

  1. The first commit in this PR seems now unnecessary.

  2. The commit message in the second commit is not fully up-to-date. It currently states we use simpler checks when Polly is used, but we now generate them as soon as no partial indexing is used. This means, Polly is now only a motivating use for this change.

@MatthiasJReisinger MatthiasJReisinger force-pushed the MatthiasJReisinger:mjr/polly/boundchecks branch from 1fc813d to 39e1dc5 Aug 18, 2016

@MatthiasJReisinger MatthiasJReisinger changed the title from WIP: Emit simpler array bounds checks when using Polly to WIP: Simplify bounds checks for multi-dimensional array accesses Aug 18, 2016

@MatthiasJReisinger MatthiasJReisinger force-pushed the MatthiasJReisinger:mjr/polly/boundchecks branch from 39e1dc5 to ecd4a37 Aug 18, 2016

@MatthiasJReisinger

This comment has been minimized.

Contributor

MatthiasJReisinger commented Aug 18, 2016

Thank you for the feedback! I addressed it in the updated PR.

Best,
Matthias

@tobig

This comment has been minimized.

Contributor

tobig commented Aug 18, 2016

This looks great from my side. What do Julia people think?

Simplify bounds checks for multi-dimensional array accesses
This simplifies the array bounds check code that is emitted for
multi-dimensional array accesses that use "regular" indexing, i.e.
accesses of the form `A[i1,i2,...,iN]` to some `N`-dimensional array
`A`.

For example, with this change, the access `A[i,j,k]` to an array
with the three dimensions `m`, `n` and `o` now leads to bounds checks
that correspond to the following pseudo code:

```
if (i >= m)
  out_of_bounds_error();
else if (j >= n)
  out_of_bounds_error();
else if (k >= o)
  out_of_bounds_error();
```

So far, the following more complicated bounds checks would have been
emitted:

```
if (i >= m)
  out_of_bounds_error();
else if (j >= n)
  out_of_bounds_error();
else if (((k * n + j) * m + i) >= m * n * o)
  out_of_bounds_error();
```

Julia also allows one-dimensional and "partial" linear indexing
(see #14770), i.e. the number of indices used to access an array does
not have to match the actual number of dimensions of the accessed array.
For this case we still have use this old scheme.

One motivation for this change was the following: expressions like
`((k * n + j) * m + i)` are non-affine and Polly would not be able
to analyze them. This change therefore also facilitates Polly's bounds
check elimination logic, which would hoist such checks out of loops
or may remove them entirely where possible.

@MatthiasJReisinger MatthiasJReisinger force-pushed the MatthiasJReisinger:mjr/polly/boundchecks branch from ecd4a37 to 9f531d9 Aug 19, 2016

@MatthiasJReisinger MatthiasJReisinger changed the title from WIP: Simplify bounds checks for multi-dimensional array accesses to RFC: Simplify bounds checks for multi-dimensional array accesses Aug 22, 2016

@vtjnash

This comment has been minimized.

Member

vtjnash commented Aug 22, 2016

lgtm. this seems like a good find and a good fix. I'll merge if nanosoldier agrees (which I expect it will).

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier

This comment has been minimized.

nanosoldier commented Aug 22, 2016

Something went wrong when running your job: RemoteException(3,CapturedException(NanosoldierError: failed to run benchmarks against primary commit: ErrorException("failed process: Process(make, ProcessExited(2)) [2]"),Any[( in pipeline_error at process.jl:612,1),( in run at process.jl:588,1),( in build_julia! at build.jl:54,1),( in execute_benchmarks! at BenchmarkJob.jl:251,1),( in run at BenchmarkJob.jl:178,1),( in #503 at multi.jl:1410,1),( in run_work_thunk at multi.jl:996,1),( in macro expansion at multi.jl:1410 [inlined],1),( in #502 at event.jl:46,1)]))
cc @jrevels

@blakejohnson

This comment has been minimized.

Contributor

blakejohnson commented Aug 22, 2016

LGTM, too.

@jrevels

This comment has been minimized.

Member

jrevels commented Aug 22, 2016

Looks like doing a clean build from master is recently broken (is that correct @tkelman?), so Nanosoldier isn't going to be able to build Julia until that gets resolved.

@kshyatt kshyatt added the arrays label Aug 22, 2016

@tkelman

This comment has been minimized.

Contributor

tkelman commented Aug 23, 2016

the nightly build hit a different failure, which may or may not happen again...

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier

This comment has been minimized.

nanosoldier commented Aug 23, 2016

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@vtjnash vtjnash merged commit d474ca3 into JuliaLang:master Aug 23, 2016

2 of 3 checks passed

Nanosoldier possible performance regressions were detected
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment