Commit

Migrate dask-image org's subpackages (#10)

* Try to import tifffile if possible

First try to import `tifffile` directly. As this is a dependency of
`pims`, in most cases this should work. However some older versions of
`pims` (particularly for Python 3.4) pull in `scikit-image` instead for
some reason. So if `tifffile` can't be imported directly, try importing
it from `scikit-image`'s `external` package.

* Monkey patch pims to use the available tifffile

This issue seems to occur on Python 3.4, as the pims 0.3.3 package on
conda-forge pulls in scikit-image instead of tifffile, but then seems
unable to use it. To fix that we import any available tifffile and
monkey patch pims only if it didn't find any tifffile. However there is
a chance we failed to find tifffile as well. In that case this amounts
to a no-op.

* Ignore test coverage when scikit-image is missing

Even though it is extremely unlikely that both `tifffile` and
`scikit-image` will not be present, we would still like to gracefully
handle this case. Instead of dropping the exception handling here
because it is technically not covered in our testing, simply tell
coverage that it should not bother with this case.
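
A minimal sketch of the resulting import chain, with the last-resort branch marked for coverage to skip; the `skimage.external.tifffile` path reflects how older scikit-image releases bundled it and is an assumption here:

```python
try:
    import tifffile
except ImportError:
    try:
        # Older scikit-image releases shipped a bundled copy (assumed path).
        from skimage.external import tifffile
    except ImportError:  # pragma: no cover
        tifffile = None
```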

* Rename `imread`'s `fn` to `fname`

Changes the filename parameter of `imread` from `fn` to `fname`. This is
done to be more consistent with SciPy and scikit-image, which both use
`fname`. Though this differs from Dask, which uses `filename`.

* Add a docstring for imread

Provides a brief explanation of how `imread` works and what it
returns.

* Test another negative number of frames

Also test -2 as an argument to `nframes` to make sure it also fails.

* Drop ValueError if `nframes` is -1

As having `nframes` be `-1` will mean having one chunk, we no longer
should raise a `ValueError` when that value occurs. Also drop the test
for a `ValueError` if `nframes` is `-1`.

* Treat `nframes` equal to `-1` as one big chunk

If the user provides `nframes` equal to `-1`, simply replace `nframes`
with the number of frames in the image. In other words, the entire image
will be loaded into memory for computation. This may be handy for image
data that is spread out amongst many small image files. For cases like
this, it may be completely reasonable to read the whole image into
memory as IO may be more expensive than the memory consumed. Include
tests to make sure that this value of `nframes` behaves as expected, both
in terms of being able to load the data and in providing the expected number
of chunks, namely `1`.
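
A hedged usage sketch (the glob pattern is illustrative): with `nframes=-1` the whole stack ends up in a single chunk along the frame axis.

```python
from dask_image.imread import imread

arr = imread("frames_*.tif", nframes=-1)  # hypothetical file pattern
assert arr.numblocks[0] == 1  # one chunk along the leading (frame) axis
```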

* Drop test directory removal

Windows seems to complain if we try to remove the test directory at the
end of a test run. So simply drop the directory removal.

* Add a note about our contributions to _compat

These are all borrowed from Dask for compatibility purposes until we can
move to a newer version of Dask. This will be possible once we drop
Python 3.4 as a deployment target.

* Add a note about our contributions to test__compat

These are all borrowed from Dask for compatibility purposes until we can
move to a newer version of Dask. This will be possible once we drop
Python 3.4 as a deployment target.

* Borrow our changes to fftfreq from Dask

Handles an issue where the chunking of the resultant array may not match
the chunking the user specified. Fixes this by simply rechunking if the
chunks don't match at the end of the function.

ref: dask/dask@6dc9e07
ref: dask/dask@1ddb383
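
An illustrative sketch of the guard, not the vendored code: however the frequencies are assembled, rechunk at the end if the result's chunks drifted from what the caller asked for.

```python
import numpy as np
import dask.array as da
from dask.array.core import normalize_chunks

def fftfreq(n, d=1.0, chunks=None):
    # Build the frequencies somehow (illustrative stand-in)...
    result = da.from_array(np.fft.fftfreq(n, d=d), chunks=n)
    # ...then rechunk only if the chunks don't match the request.
    if chunks is not None:
        chunks = normalize_chunks(chunks, (n,))
        if result.chunks != chunks:
            result = result.rechunk(chunks)
    return result
```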

* Backport fftfreq test change for handling chunks

This is a backported change that we submitted to Dask to test different
kinds of chunks that could be specified to `fftfreq`. The change to
`fftfreq` has already been incorporated. So this just exercises it a bit
more.

ref: dask/dask@6a7a507

* Backports test changes for fftfreq with chunks

Backports some changes we submitted to Dask to test that `fftfreq` would
return the expected chunking in the resultant array.

ref: dask/dask@fb72d81

* Test all Fourier filters preserve chunking

Updates all of the Fourier filter tests to also check that the resultant
array has the same chunking as the input array. This is important to
make sure that it is easy to do an inverse FFT without needing to
rechunk first.

* Test Fourier filters with a chunked `s`

Just to make sure that chunking doesn't get messed up, add a test with a
chunked `s`. Really this shouldn't affect anything, as the Fourier filter
is normally reduced along `s`. So any chunking of `s` should already be a
non-issue by the time the Fourier filter is applied to the data. Still,
this is a good sanity check.

* Add a test for different shapes and chunks

Includes a test that tries different combinations of shapes and chunks
with the Fourier filters. The purpose of this test is both to ensure the
chunking remains unchanged and to ensure that the results remain
accurate. Though as these operations are element-wise, chunking is
unlikely to be a cause of inaccuracies, unlike with other filters, which
require an overlap.

* Pull in our fftfreq simplifications from upstream

Backports some changes we made to Dask's `fftfreq` into our `_compat`
version of `fftfreq` to generally clean up the `fftfreq` implementation.
Not only does it simplify the implementation, it seems to speed it up in
some cases.

Switched to using `map_blocks` and a custom Python function, which is
applied over `arange` to implement `_fftfreq` internally. This makes the
overall implementation cleaner and simpler. Also avoids needing various
chunking related tricks. Plus it appears this goes a bit faster than the
old implementation. How much faster seems to depend on the size of the
frequency range and the size of the chunks.

ref: dask/dask@2b2d9f9
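
Roughly what that looks like, reconstructed from the description above (treat it as a sketch rather than the exact vendored code):

```python
import dask.array as da

def _fftfreq_block(i, n, d):
    # Indices in the upper half of the range become negative frequencies.
    r = i.copy()
    r[i >= (n + 1) // 2] -= n
    r /= n * d
    return r

def fftfreq(n, d=1.0, chunks=None):
    n = int(n)
    r = da.arange(n, dtype=float, chunks=(chunks if chunks is not None else n))
    return r.map_blocks(_fftfreq_block, dtype=float, n=n, d=d)
```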

* Drop unused `itertools` import in `_compat`

* Drop unused `collections` import in `_utils`

* Drop unused `numpy` import in `test__utils`

* Drop unused dask.array.utils import in test__utils

* Remove core module

* Drop import of core module

* Fix import test

Now that core is removed we need to import the top-level module instead.

* Add compatibility module stub

Provides a compatibility module stub to contain various functions that
ensure compatibility with the much older Dask 0.13.0. This is needed as
that is the newest version of Dask for Python 3.4 on conda-forge. Once
we drop Python 3.4, we can use a newer version of Dask and drop most if
not all of this module's contents.

* Vendor indices based on our Dask contribution

This is just our code that we contributed to Dask. So there is no issue
with us vendoring it here. That said, we use a BSD 3-Clause license just
like Dask. So even if there were any issue, we are basically using the
same license anyway. This is just borrowed from the same vendoring in
`dask-ndfourier`.

ref: dask-image/dask-ndfourier@039862e

* Vendor _isnonzero_vec from our Dask contribution

Not really of much use itself, `_isnonzero_vec` is necessary to help
determine non-zero values in a Dask Array where type conversion to
`bool` does not work (e.g. `str`s). This is our contribution to Dask so
we are free to include it.

* Vendor _isnonzero from our Dask contribution

Not really of much use itself, `_isnonzero` is necessary to help
determine non-zero values in Dask Arrays. This is our contribution to
Dask so we are free to include it.

* Workaround missing asarray in Dask 0.13.0

There is no `asarray` in Dask 0.13.0. So work around this by coercing all
non-Dask arrays (or non-arrays) into NumPy arrays and then Dask arrays.

* Vendor argwhere from our Dask contribution

Basically verbatim the same as what we submitted upstream to Dask for
the `argwhere` function. Though in Dask 0.13.0 this is not a lazy
implementation due to `compress` being non-lazy. Support for unknown
dimensions in Dask 0.13.0 is in its nascent stages. So it may be
difficult to get a lazy implementation that works ok there, but we will
give that a try. Fortunately newer versions of Dask (not yet released)
should have this built in.

* Parameterize argwhere test

* Restrict some tests to certain Dask versions

Not all of the tests for the vendored compatibility functions can be run
with all versions of Dask. This is due to some missing features and/or
bugs in previous versions of Dask. As these tests were never designed to
run on such an old version, we simply skip some old versions of Dask for
some tests if they don't work. There is no point in trying to "fix" them
as the code was designed with a newer version of Dask in mind. Also, at
some point, we intend to drop the compatibility functions once a newer
version of Dask is required.

* Workaround reshaping constraint w.r.t. chunks

Seems there are situations where reshaping used to require only 1 chunk.
So this fixes the non-trivial test to have one chunk.

* Replace compress with a lazy equivalent

Makes use of `atop` to wrap up NumPy's `compress` and apply it to
different blocks. Adjusts the chunks afterwards with a hack so as to
note that one of the dimensions is actually unknown.

* Move array check to argwhere

It is needed here to ensure that chunks is properly defined. Also by
doing this conversion first, we are sure it is satisfied for the
non-zero step as well.

* Add a test for argwhere and NumPy arrays

Provides a simple test to ensure argwhere converts NumPy arrays to Dask
arrays internally.

* Refactor out _asarray

Similar in nature to Dask's own `asarray` function. Since it is not
available in Dask 0.13.0, we add our own vendored copy based on code
refactored out of `argwhere`.
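
A minimal sketch of `_asarray` under the Dask 0.13.0 constraint: anything that is not already a Dask Array goes through NumPy first.

```python
import numpy as np
import dask.array as da

def _asarray(a):
    if not isinstance(a, da.Array):
        a = np.asarray(a)
        a = da.from_array(a, chunks=a.shape)
    return a
```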

* Include some tests for _asarray

Make sure that it correctly converts everything to a Dask Array. Try
using a Python list, NumPy array, and Dask Array. Also make sure the
final array has the same contents as the original.

* Add another newline before author/date info

* Make compatibility tests run with Python

* Add a stub for some utility functions

Creates stub module, `_utils`, for a few internal utility functions that
we don't want to expose as part of the API.

* Add _assert_eq_nan to compare arrays that have NaN

As comparisons with NaN are false even if both values are NaN, using
`assert_eq` does not work correctly in this case. To fix it, we add this
shim function around `assert_eq`. First we verify that they have the
same NaN values using a duck type friendly strategy (comparing the
arrays to themselves) and then comparing the masks of each array to the
other. Second we zero out all the NaN values and compare the resulting
arrays, which should now work as normal. This works just as well whether
the arrays are from NumPy, Dask, or some other library, provided they
support these basic functions.
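
A minimal sketch based on the description above; the actual shim may differ in detail (`assert_eq` is Dask's array comparison helper).

```python
import dask.array as da
from dask.array.utils import assert_eq

def _assert_eq_nan(a, b, **kwargs):
    # NaN is the only value that does not equal itself, so these masks are
    # duck-type friendly for NumPy and Dask alike.
    a_nan = (a != a)
    b_nan = (b != b)

    # Same NaN layout in both arrays...
    assert_eq(a_nan, b_nan, **kwargs)

    # ...and equal everywhere else once the NaNs are zeroed out.
    assert_eq(da.where(a_nan, 0, a), da.where(b_nan, 0, b), **kwargs)
```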

* Include tests for _assert_eq_nan

Just provides some basic tests to make sure it can compare two arrays
that are the same and find that they are equal. Also checks that when
arrays shouldn't be equal that the function raises an `AssertionError`.
Include arrays with and without NaN. Also includes different types of
arrays to make sure it behaves ok.

* Implement center_of_mass using Dask Arrays

Provides a basic implementation of `center_of_mass` akin to SciPy
ndimage's function except that it uses Dask Arrays instead of NumPy
arrays. Also as SciPy's implementation seems to support multiple
indices, we try to do the same here. SciPy uses a variety of lists and
arrays in its result. We do not bother to do this here for a few
reasons. First it makes it harder to manipulate the data. Second it
makes evaluation of the result more cumbersome. Third the structure is
not of particular value.

* Include tests for center_of_mass

Compares the result between SciPy's implementation and Dask's
implementation. Some of the tests include just providing the array,
including a label image, and including a label image and label indices.
For label indices a variety of formats are tried including scalar, 1-D
array, and various N-D arrays. These are important to test as SciPy
allows all of them and formats the output differently depending on how
the user provides them. Also we test the case where the label index
provided is not present (results in `nan`s for the center of mass).

* Test error cases from center_of_mass

Only one error case is raised currently. So we go ahead and test that error
in this unit test. Though we could test other cases in this function in
the future as well.

* Rename some variables in center_of_mass

Should make things a little clearer about what the different variables
mean.

* Shorten a long line in center_of_mass

* Delay conversion of label matches to int

* Rename input_ind* to input_i*

Shortens the variable name and reduces the ambiguity with `index`, which
shared an unfortunately similar name.

* Use Dask Array's where instead of multiplying

Ideally using `where` should be faster than casting an array of `bool`s
to `int`s and multiplying them. Plus it should be more robust to things
like `nan`, which multiplying would just propagate.

* Find matching indices based on labels specified

In an attempt to make it easier to refactor more code, mask out the
indices of interest based on the masks for each selected label. Then use
this in the computation of the weighted average to find the center of
mass.

* Rename input_mtch_ind_wt to input_i_mtch_wt

Fits closer with the naming scheme used earlier in `center_of_mass`.
Also makes the variable's purpose a little clearer.

* Reorder two lines

Appears better stylistically with code later in `center_of_mass`.

* Add stub modules to hold utility functions

Creating these modules to hold some useful internal functions to be
refactored out. Should make it easier to maintain the API exposed
functions.

* Add arg normalizing function for labels and index

Refactored from code in `center_of_mass`, the `_norm_input_labels_index`
function handles the normalization of `input`, `labels`, and `index`
arguments. As these particular arguments will show up repeatedly in the
API, it will be very helpful to normalize them in the general case
so the API functions can be focused on their particular computations.

* Use arg normalizing function in center_of_mass

As this normalization code has been refactored out into an independent
internal utility function, simply make use of it to implement
`center_of_mass`.

* Add tests for argument normalization function

Exercise a failing and passing case in two tests.

* Adds a function to get label matches

Provides a simple refactored internal utility function that uses the
input array, label array, and selected indices to find masks of selected
labels in order and uses the masks to select relevant regions in the
input data. Both the masks and selected input data are returned to the
caller.

* Tests the label matching function

Provides a parameterized test to exercise the comparison of selected
indices to the label image and the application of these masks to the
input data. Tries different formulations of indices. Includes a NumPy
variant whose results are compared against our Dask Array
implementation.

* Make use of label matching function

Clean up `center_of_mass` to make use of `_get_label_matches`. This
should help reduce code duplication and keep `center_of_mass` focused on
the relevant computation.

* Drop unused imports

* Add stub modules for Python 2/3 compatibility

Should provide the bare minimum code to smooth over the differences
between Python 2 and Python 3. Also should help keep their performance
comparable.

* Store iterable range as irange

Basically pick up `xrange` on Python 2 and `range` on Python 3. So as
not to override a builtin name, store the result as `irange`. This also
makes it clearer how this `range` will behave. Plus it avoids using the
removed `xrange` name on Python 3.
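
The shim itself reduces to a small try/except; the same pattern appears in `imread` in the diff below.

```python
try:
    irange = xrange   # Python 2: lazy range without building a list
except NameError:
    irange = range    # Python 3: `range` is already lazy
```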

* Add a basic test for irange

Make sure `irange` is there, it doesn't return a list, and it acts like
`range` on some test arguments.

* Make use of irange in place of range

Using `irange` to get an iterable should have some performance
improvements on Python 2 as much of the code assumed `range` was an
iterable. So replacing it with an alias to `xrange` should help Python 2
out. Though there is no expected improvement for Python 3 where `range`
was already an iterable.

* Vendor/refactor lazy implementation of compress

This is needed for some other use cases that `dask-ndmeasure` is trying
to cover. Filtering out labels that are not being used is essential in
these cases and replacing them with 0 simply won't work. Though it
should be able to speed up cases where we did replace values with 0.

While `compress` will be included in newer (not yet released) versions
of Dask, it is not present in any released version and it is especially
not in Dask 0.13.0, which is needed for Python 3.4 support. So for now
we include a simple implementation of `compress` that is based on code
in the vendored `argwhere` with a few tweaks and validation checks.

* Include some tests for compress

Covers various error cases that may occur if preconditions are not met.
Also tests a few different correct cases to validate correctness against
the NumPy implementation.

* Use compress in argwhere implementation

* Simplify the label matches function

Focus only on creating a mask of the selected labels and none of the other
products that this function was creating before.

* Update the label matching tests

* Update center of mass w.r.t. label matching

Include some more specialized code back into the center of mass
computation and update the computation to use the changed label matching
function.

* Move index transpose into center_of_mass

The index transposing behavior (i.e. the selected labels) appears to be
unique to the `center_of_mass` computation or at least it is not present
in `sum`. So this moves that tweak into the `center_of_mass`
computation. That way it doesn't contaminate other computations that do
not need these. Further it allows the argument normalization function to
be reused repeatedly without ill effect.

* Add sum function for label images

Provides a basic implementation for summing over values matching labels
of interest. Similar to the SciPy ndimage function of the same name.

* Test the sum implementation

Make sure this matches up with SciPy's implementation for a variety of
different input parameters. Borrows from the `center_of_mass` tests to
create these tests.

* Rewrite center_of_mass using sum

As the behavior in `center_of_mass` can basically be seen as computing a
few arrays and summing them, just simplify the summing portions by
reusing our `sum` function for them. This significantly reduces the
complexity of the `center_of_mass` function.
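
The idea, sketched with SciPy's labeled `sum` standing in for our Dask-backed `sum` (illustrative, not the actual implementation):

```python
import numpy as np
import scipy.ndimage

def center_of_mass_via_sum(input, labels, index):
    grid = np.indices(input.shape)              # grid[d] = d-th coordinate
    total = scipy.ndimage.sum(input, labels, index)
    coords = [scipy.ndimage.sum(input * g, labels, index) / total
              for g in grid]
    return np.stack(coords, axis=-1)            # one row per label index
```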

* Add mean

Provides an implementation of `mean` akin to SciPy ndimage's `mean` for
use with label images.

* Include tests for mean

Provides some tests for our `mean` implementation that compare it to
SciPy's implementation. Tries with a variety of parameters to make sure
it behaves ok. Borrows from the testing used for `sum`.

* Add an implementation of variance

Provides a Dask Array implementation of variance, which is designed to
be similar to SciPy's ndimage implementation. Simply computes the
variance over each label. Reuses our mean implementation to construct
this easily.

* Include tests for variance

Simply exercises our implementation of variance and compares it against
the implementation included in SciPy. Largely borrows these tests from
those used for sum.

* Add standard_deviation

Provides a Dask Array implementation of SciPy ndimage's
`standard_deviation` function. Computes the standard deviation over the
selected labels in the array. Leverages our `variance` implementation to
make this work easily.
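
One plausible way to express the relationships (the real implementations may differ, e.g. by computing E[(x - mean)^2] directly); SciPy again stands in for the Dask-backed functions.

```python
import numpy as np
import scipy.ndimage

def variance_via_mean(input, labels, index):
    mean = scipy.ndimage.mean(input, labels, index)
    mean_sq = scipy.ndimage.mean(input * input, labels, index)
    return mean_sq - np.square(mean)

def standard_deviation_via_variance(input, labels, index):
    return np.sqrt(variance_via_mean(input, labels, index))
```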

* Includes test of standard_deviation

Simply exercises our `standard_deviation` implementation for Dask Arrays
and compares the results with those from SciPy ndimage. The tests are
borrowed from other similar tests that we have for API functions.

* Parameterize `center_of_mass` test

Configure this test with the ability to parameterize the function being
tested. This will help consolidate other similar tests into this one to
avoid code duplication and bit rot.

* Consolidate all sum-based property tests

* Test more functions for error cases

Simply iterates over all the functions currently in the API to see if
they all correctly raise the same error.

* Pass Dask Array of labels to Dask function

This was turned into a Dask Array before anyways, but it makes more
sense to use the Dask Array that we created for the label image.

* Rename variable in label match computation test

These names are holdovers from when the test was just for the center of
mass computation. However that doesn't make sense now that it covers
many different computations. So this renames the variables to be a bit
more general in this context.

* Handle scalar values in _assert_eq_nan

Add some special handling in `_assert_eq_nan` to handle having scalar
values passed in. Basically ensure that everything provided is an array.
This is a no-op for arrays, but converts scalars into 0-D arrays. By
doing this, we are able to use the same `nan` handling code. Also
convert the 0-D arrays back to scalars. This means a 0-D array will be
treated as a scalar in the end. However Dask doesn't really have a way
to differentiate the two. So this is fine.

* Test _assert_eq_nan with scalars and 0-D arrays

Make sure that the fix included works for scalar and 0-D arrays both
from NumPy and Dask.

* Add _ravel_shape_indices

Provides a utility function that computes `indices` for the raveled
case, but ensures that the result is shaped and chunked so as to match the
`input` array provided.
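
A hedged sketch of the helper (the real one may build the indices differently): every element holds its own raveled index, shaped and chunked like the target array.

```python
import dask.array as da

def _ravel_shape_indices(shape, chunks, dtype=int):
    size = 1
    for s in shape:
        size *= s
    return da.arange(size, dtype=dtype, chunks=size).reshape(shape).rechunk(chunks)
```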

* Test _ravel_shape_indices

Provide some simple tests that verify this matches what one would expect
from the intended operation with NumPy.

* Fix test label array construction

Was using Dask Arrays in the construction of the label array, which
works on old versions of Dask, but not newer versions. As we really want
to use NumPy arrays in the construction of all test data and that will
fix the problem, go ahead and switch the label array construction to use
NumPy arrays exclusively.

* Fix testing of empty arrays with compress

Before, the test data generated was being sliced if `axis` was specified.
However this makes no sense if the test data has an empty dimension, as
it may mean slicing along the 0 length axis. Even if slicing doesn't
occur along the 0 length axis, we already know the result will be a
length 0 1-D array. We can get to this result much faster by simply
flattening the array instead of slicing. Hence we use this strategy to
fix the tests with newer versions of Dask.

* Add labeled_comprehension

Provides an implementation of `labeled_comprehension` for use with Dask
Arrays. Mimics the implementation from SciPy ndimage. Should be possible
to rewrite everything in terms of `labeled_comprehension` and make use
of it in implementations of other computations that have not yet been
added. Also this should make it easy to support all other manner of
computations over label images that are not already (or will not be)
covered by the API.
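
A rough sketch of the mechanism only (the real implementation also handles `pass_positions`, N-D `index`, and more careful dtype/default handling): one delayed call of the user function per label, wrapped back into Dask Arrays.

```python
import numpy as np
import dask
import dask.array as da

def _run_func(func, values, out_dtype, default):
    # Runs lazily via dask.delayed; `values` is a concrete NumPy array here.
    return out_dtype.type(func(values) if len(values) else default)

def labeled_comprehension(input, labels, index, func, out_dtype, default):
    out_dtype = np.dtype(out_dtype)
    results = []
    for i in np.atleast_1d(index):
        values = input[labels == i]                       # lazy selection
        r = dask.delayed(_run_func)(func, values, out_dtype, default)
        results.append(da.from_delayed(r, (), out_dtype))
    return da.stack(results)
```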

* Include tests for labeled_comprehension

Simply compare the results from our implementation of
`labeled_comprehension` with Dask Arrays to that of SciPy ndimage's.
Tries a variety of different parameters to ensure the original behavior
is preserved.

* Handle `labeled_comprehension` comparison directly

As Dask Arrays using delayed appear to be invalid according to Dask's own
array comparison test (particularly in Dask 0.14.0+), simply handle the
comparison of the two results ourselves. This way we can be sure that
the two results are treated correctly regardless of which version of
Dask is in use.

* Rework core tests without `assert_eq` comparisons

As newer versions of Dask appear to have issues using `assert_eq` on
`delayed`-based Dask Arrays, `assert_eq` cannot be used anywhere
`delayed` might be used. Since `labeled_comprehension` will be needed to
implement some new functionality and it makes use of `delayed`,
`assert_eq` cannot be used in `test_core`. So we replace all usages of
`assert_eq` and related functions with our own direct comparisons.

* Add minimum function for label images

Computes the smallest value within each label.

* Run parameterized tests on minimum as well

Provides simple tests for errors and comparisons between our Dask Array
implementations and SciPy.

* Add maximum function for label images

Computes the largest value within each label.

* Run parameterized tests on maximum as well

Provides simple tests for errors and comparisons between our Dask Array
implementations and SciPy.

* Add maximum_position

Provides a Dask Array implementation of `maximum_position`, which
behaves similarly to SciPy ndimage's `maximum_position`.

* Test maximum_position

Simply add `maximum_position` to our parameterized test to compare
results with SciPy ndimage's implementation.

* Add minimum_position

Provides a Dask Array implementation of `minimum_position`, which
behaves similarly to SciPy ndimage's `minimum_position`.

* Test minimum_position

Simply add `minimum_position` to our parameterized test to compare
results with SciPy ndimage's implementation.

* Add extrema

Provides a Dask Array implementation of `extrema`, which behaves
similarly to SciPy ndimage's `extrema`.

* Test extrema

Simply add `extrema` to our parameterized test to compare results with
SciPy ndimage's implementation.

* Rewrite of minimum that uses nanmin

This basically is a clever trick to avoid using `labeled_comprehension`
in the computation of `minimum`. It works by converting the data to
`float64` and placing `nan` over all unneeded values. This way `nanmin`
can be run and all of the `nan` values will be ignored (unless there are
only `nan` values). Using info about which labels could be computed,
`where` is run over the final result to replace those values that could
not be computed with `0`. The result is then cast back to its original
type. This change up should allow the computation to proceed faster than
it would by going through `delayed` in `labeled_comprehension`.
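
A hedged sketch of the trick as described (simplified; the code was organized differently and, per later commits, has since been rewritten around `labeled_comprehension`):

```python
import numpy as np
import dask.array as da

def minimum_sketch(input, labels, index):
    input_f = input.astype(np.float64)
    results = []
    for i in index:
        mask = (labels == i)
        masked = da.where(mask, input_f, np.nan)   # hide everything else
        # Labels that never occur fall back to 0 instead of NaN.
        results.append(da.where(mask.any(), da.nanmin(masked), 0))
    return da.stack(results).astype(input.dtype)
```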

* Rewrite of maximum that uses nanmax

This basically is a clever trick to avoid using `labeled_comprehension`
in the computation of `maximum`. It works by converting the data to
`float64` and placing `nan` over all unneeded values. This way `nanmax`
can be run and all of the `nan` values will be ignored (unless there are
only `nan` values). Using info about which labels could be computed,
`where` is run over the final result to replace those values that could
not be computed with `0`. The result is then cast back to its original
type. This change up should allow the computation to proceed faster than
it would by going through `delayed` in `labeled_comprehension`.

* Add a 3-D test case to the mix

It is pretty easy to mess things up, but still get the right result in
the 2-D test case (e.g. indexing by 1 or -1 is the same in 2-D). Having
a 3-D test case breaks symmetry for some cases, which makes it a bit
easier to check if the code is doing the right thing.

* Tweak ordering in labeled_comprehension

Compute the label index matches before packing the arguments for use
with the user function.

* labeled_comprehension: `indices` -> `positions`

Rename `indices` to `positions` in `labeled_comprehension` to match more
closely with SciPy's terminology. Plus it clashes with `index` less, so
it should avoid some confusion on that front.

* Unwrap line in labeled_comprehension

Fits on one line now (with PEP8 constraints). Makes the code more
compact so more fits on a page.

* Name all arguments in _labeled_comprehension_func

As the user specified function, `func`, needs to take a particular
argument structure that is known, there is no need to leave this so
vague in `_labeled_comprehension_func`. Doing this does mean there is an
additional check each time the function is run. However having the
arguments specified will help us remove things like `compute`, which
should speed up the computation on the Dask end.

* Drop compute from _labeled_comprehension_func

There is no need to separately check whether the computation should
proceed or not. This information should already be known based simply on
whether the input array selection is non-empty. Thus we can avoid a
computation on the Dask end and the comprehension can proceed just the
same.

* labeled_comprehension: Get position before `for`

Handle the check as to whether to compute or include the position before
entering the label comprehension `for`-loop. This not only avoids
repeating this check inside the `for`-loop, but it avoids constructing
the raveled indices if they won't be used. Should be more efficient when
the position is not needed in the user function.

* Fused delayed steps in labeled_comprehension

Instead of calling `dask.delayed` on the function and storing it before
converting it to a Dask Array (and storing it again), simply pass the
result of `dask.delayed` directly into `dask.array.from_delayed` before
storing it in the NumPy object array. This cuts down on the
retrievals/stores to the NumPy object array and should help streamline
things a bit. May want to consider refactoring this into `_utils` a bit
to cut down on the visual clutter.

* Refactor out labeled_comprehension delay handling

Moves the utility functions involved in wrapping up the user provided
reduction function for use with Dask into `_utils`. This cleans up the
code in `labeled_comprehension` a bit. Also it avoids some small
overhead incurred in delaying the utility function.

* Test more functions with 3-D cases

In particular, tests `extrema` and `labeled_comprehension` with 3-D test
data. Should help validate that these functions are really behaving well
in N-D cases.

* Avoid coercing out_dtype in comprehension wrappers

The `out_dtype` argument should already have a proper dtype due to the
similar coercing in `labeled_comprehension`. So simply use `out_dtype`
as if it is correct.

* Skip converting default in comprehension wrapper

This conversion of `default` to the expected type, `out_dtype`, already
happens in `labeled_comprehension`. So there is no need to perform this
conversion a second time in the wrapper. Simply use the value and return
it.

* Skip storing result in comprehension wrapper

No need to store the Dask Array result as it is merely going to be
returned immediately afterwards. So simply return the result as soon as
`from_delayed` finishes.

* Coerce default using out_dtype in comprehensions

Simply use `out_dtype`'s `type` to coerce `default` to the expected
return type. This is much simpler than going through a NumPy array to
create the scalar.

* Construct index ranges once in comprehensions

The `range`s were being constructed each time the `for`-loop went
through an iteration previously. By moving the construction of the
`range`s out of the `for`-loop, each `range` is constructed once. Then
only the product of `range`s need be constructed on entering the
`for`-loop (as this is dependent on how many `range`s are used). Also
reuse these constructed `range`s in the first `for`-loop as well.

* Store each indexed label match in comprehensions

To avoid having to potentially index the label match twice during graph
generation (like when the user function needs values and positions),
store each index selection temporarily in the `for`-loop so as to reuse
it.

* Rewrite minimum to use labeled_comprehension

Based on recent optimizations to `labeled_comprehension`, it appears
that finding the minimum with `labeled_comprehension` is faster than
the original implementation of `minimum`. So simply rewrite
`minimum` to use `labeled_comprehension` instead. This will also improve
the test coverage of the `labeled_comprehension` code path.

* Rewrite maximum to use labeled_comprehension

Based on recent optimizations to `labeled_comprehension`, it appears
that finding the maximum with `labeled_comprehension` is faster than
the original implementation of `maximum`. So simply rewrite
`maximum` to use `labeled_comprehension` instead. This will also improve
the test coverage of the `labeled_comprehension` code path.

* Add median

Provides a Dask Array implementation of `median`, which behaves
similarly to SciPy ndimage's `median`.

* Test median

Simply add `median` to our parameterized test to compare results with
SciPy ndimage's implementation.

* Use float64 in median always

Seems SciPy's `median` always uses `float64`. So we switch over as well.

* Return nan for unknown values with median

This is more consistent with how `median` normally behaves in NumPy.
Doing this even though SciPy is a bit inconsistent when it comes to
returning a value for a missing label.

* Add histogram

Provides a Dask Array implementation of `histogram`, which behaves
similarly to SciPy ndimage's `histogram`.

* Test histogram

Simply add `histogram` to our parameterized test to compare results with
SciPy ndimage's implementation.

* Add label

Provides a Dask Array implementation of `label`, which behaves similarly
to SciPy ndimage's `label`. This implementation is restricted to 1 chunk
ATM. Even though this would effectively be enforced by using `delayed`,
it is nice to require this of the user outright so that they know what
will happen.
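
A sketch of the single-chunk approach (argument handling simplified; SciPy's `label` returns the labeled array plus the feature count):

```python
import numpy as np
import dask
import dask.array as da
import scipy.ndimage

def label_sketch(input):
    labeled, num_features = dask.delayed(scipy.ndimage.label, nout=2)(input)
    labeled = da.from_delayed(labeled, input.shape, np.int32)
    return labeled, num_features
```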

* Test label

Simply add `label` to our parameterized test to compare results with
SciPy ndimage's implementation.

* Simplify maximum and minimum position functions

After performing `where` calls to get the maximum and minimum positions
over labels respectively, convert the results back to the same type used
for `indices` instead of specifying the integral type independently.

* Use int in argwhere, minimum, and maximum position

This ends up matching behavior on Windows better. Plus the behavior is
no different on Unixes.

* Drop unneeded vendored functions

The vendored functions `_argwhere` and `_compress` were expected to be
useful in writing this library early on. However it seems they are not
needed after all. So this strips them from the compatibility module and
associated tests.

* Use Python int in center_of_mass

* Use Python int in maximum_position

* Use Python int in minimum_position

* Use Python int in labeled_comprehension

* Use Python int in histogram

* Use Python int for default labels and index

* Fix expected array type for default index

* Use intp in center_of_mass

* Use intp in maximum_position

* Use intp in minimum_position

* Fix expected array type for labels w/default index

* Revert use of intp

On Windows, this seems to only work correctly for Python 2.7 64-bit, but
fails everywhere else. As such, it would be better to have mostly
correct behavior on a wide range of architectures and Pythons (including
Python 3) than to fix only Python 2.7 64-bit. Hence we revert the use of
`intp`.

* Soften type constraints tests

Unfortunately it is too difficult to match type behavior on Windows
without making a mess of the code. So simply relax type checking in the
tests to work around these issues.

* Drop unused functools import from _compat

* Drop unused numbers import from _compat

* Drop unneeded _test_utils import from test_core

* Drop _test_utils module

As we are no longer making use of `_assert_eq_nan` and nothing else is
present in the `_test_utils` module, simply drop the module and
associated tests.

* Raise FutureWarning for index with rank above 1

If the `index` has a rank greater than 1 (e.g. a matrix), SciPy
ndimage's measurement functions often handle it in some fashion. Though
they
don't always handle this case nor do they handle it consistently. We
have tried to mimic their behavior here, but it is sometimes a little
strange. So as to discourage users from making use of this behavior, we
raise a `FutureWarning` to point out that it is subject to change.
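
A sketch of the guard (the message wording is illustrative):

```python
import warnings
import numpy as np

def _warn_on_high_rank_index(index):
    index = np.asarray(index)
    if index.ndim > 1:
        warnings.warn(
            "Having an `index` with rank greater than 1 is undefined and "
            "may change in the future.",
            FutureWarning
        )
    return index
```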

* Test warning for index of rank higher than 1

Provides a simple test to verify that the `FutureWarning` is raised when
an `index` of rank higher than 1 (e.g. a matrix) is provided. Also
ensures that no warning is raised when an `index` of rank 1 or lower is
passed.

* Simplify handling of default labels and index

Avoid performing extra computations on the default cases of `labels` and
`index` by simply filling them with their default values. Also most
SciPy documentation mentions that `labels` defaults to the whole image.
We have been restricting it to the non-zero portion, but that is
technically incorrect. Changing it to be the whole image hasn't had any
negative effects before, but we go ahead and do it now to match SciPy's
docs more accurately.

* Use Dask Array form of index in utils tests

Make sure to use the Dask Array form of `index` in the `utils` tests.

* Drop some unneeded code from utils tests

This code was used to compare results of the `_get_label_matches`
function. However that function no longer returns these other values. So
there is no need to generate something similar in the tests as the
comparison is not being made.

* Wrap long line in the docs

* Freshen up API documentation

Try to fix up the API documentation to be more consistent with how the
Dask-based functions are written. Mainly this means pointing out things
are arrays. Though it also means streamlining documentation in terms of
how common default parameters are handled and how the results are
explained. Documentation is aimed to be clear, but also brief to avoid
redundancy or the potential of confusion.

* Update some doc API headings

Missed fixing these to match the new API doc format when refreshing the
docs. So this tidies them up after the fact.

* Fix min and max position unraveling

Appears the remainder was only getting the singleton dimension added to
it. However the original positions should have gotten a singleton
dimension as well.

* Rewrite maximum_position to use argmax

Makes use of `labeled_comprehension` to rewrite `maximum_position`
around `argmax`. This way there is no need to compute `maximum` first to
find `maximum_position`. Also this removes a bunch of other steps that
`maximum_position` had added in previously to reuse `maximum` to get the
final result. The net result should be a cleaner implementation that is
easier to understand and maintain. Plus this ends up being ~33% faster
than the original implementation.

* Rewrite minimum_position to use argmin

Makes use of `labeled_comprehension` to rewrite `minimum_position`
around `argmin`. This way there is no need to compute `minimum` first to
find `minimum_position`. Also this removes a bunch of other steps that
`minimum_position` had added in previously to reuse `minimum` to get the
final result. The net result should be a cleaner implementation that is
easier to understand and maintain. Plus this ends up being ~33% faster
than the original implementation.

* Drop now unused operator import

* Use an iterable range in core tests

Was using `range` on both Python 2 and Python 3. The problem being that
on Python 2 this allocates a `list` with the contents of everything
included by `range`'s bounds and step size. To fix this, simply use a
Python 2/3 compatibility trick to get an iterable range on Python 2.
Behavior on Python 3 is unchanged (excepting an alias).

* Use labeled_comprehension in sum

Rewrites our `sum` implementation to make use of `labeled_comprehension`
for performing the computation. This should provide some more coverage
of `labeled_comprehension` and give us more confidence in using it in
other cases. Also as masks are used to select out the data of interest,
we are able to avoid performing computations on values that are not of
interest (i.e. not in the mask). As such we are able to achieve a not
insignificant speedup (~40%).

* Update __init__.py

label works correctly with multiple chunks, but just consolidates the image before running the computation

* Update __init__.py

fixing message

* Update test_core.py

fixing label test since chunk limit has been removed

* Bump PIMS requirement to 0.4.1

As PIMS 0.4.1 has better handling in terms of finding tifffile and fixes
a six-related bug (amongst other things), go ahead and bump the PIMS
version and do some cleaning.

* Bump SciPy CI requirement to 0.19.1

Appears there are some issues with SciPy 0.18.1 from `defaults`. So this
bumps the version in the hopes that these issues were fixed in a
slightly later version of SciPy.

jakirkham committed Aug 30, 2018
1 parent efbccac commit d81498a
Showing 54 changed files with 5,747 additions and 15 deletions.
9 changes: 9 additions & 0 deletions .appveyor_support/environments/tst_py27.yml
@@ -8,3 +8,12 @@ dependencies:
- pip==18.0
- wheel==0.31.1
- coverage==4.5.1
- pytest==3.0.5
- dask==0.13.0
- numpy==1.11.3
- scipy==0.19.1
- scikit-image=0.12.3
- pims==0.4.1
- slicerator==0.9.8
- pip:
  - slicerator==0.9.8
9 changes: 9 additions & 0 deletions .appveyor_support/environments/tst_py35.yml
@@ -8,3 +8,12 @@ dependencies:
- pip==18.0
- wheel==0.31.1
- coverage==4.5.1
- pytest==3.0.5
- dask==0.13.0
- numpy==1.11.3
- scipy==0.19.1
- scikit-image=0.12.3
- pims==0.4.1
- slicerator==0.9.8
- pip:
  - slicerator==0.9.8
9 changes: 9 additions & 0 deletions .appveyor_support/environments/tst_py36.yml
@@ -8,3 +8,12 @@ dependencies:
- pip==18.0
- wheel==0.31.1
- coverage==4.5.1
- pytest==3.0.5
- dask==0.13.0
- numpy==1.11.3
- scipy==0.19.1
- scikit-image=0.12.3
- pims==0.4.1
- slicerator==0.9.8
- pip:
  - slicerator==0.9.8
9 changes: 9 additions & 0 deletions .circleci/environments/tst_py27.yml
@@ -8,3 +8,12 @@ dependencies:
- pip==18.0
- wheel==0.31.1
- coverage==4.5.1
- pytest==3.0.5
- dask==0.13.0
- numpy==1.11.3
- scipy==0.19.1
- scikit-image=0.12.3
- pims==0.4.1
- slicerator==0.9.8
- pip:
  - slicerator==0.9.8
9 changes: 9 additions & 0 deletions .circleci/environments/tst_py35.yml
@@ -8,3 +8,12 @@ dependencies:
- pip==18.0
- wheel==0.31.1
- coverage==4.5.1
- pytest==3.0.5
- dask==0.13.0
- numpy==1.11.3
- scipy==0.19.1
- scikit-image=0.12.3
- pims==0.4.1
- slicerator==0.9.8
- pip:
  - slicerator==0.9.8
9 changes: 9 additions & 0 deletions .circleci/environments/tst_py36.yml
@@ -8,3 +8,12 @@ dependencies:
- pip==18.0
- wheel==0.31.1
- coverage==4.5.1
- pytest==3.0.5
- dask==0.13.0
- numpy==1.11.3
- scipy==0.19.1
- scikit-image=0.12.3
- pims==0.4.1
- slicerator==0.9.8
- pip:
  - slicerator==0.9.8
9 changes: 9 additions & 0 deletions .travis_support/environments/tst_py27.yml
@@ -8,3 +8,12 @@ dependencies:
- pip==18.0
- wheel==0.31.1
- coverage==4.5.1
- pytest==3.0.5
- dask==0.13.0
- numpy==1.11.3
- scipy==0.19.1
- scikit-image=0.12.3
- pims==0.4.1
- slicerator==0.9.8
- pip:
  - slicerator==0.9.8
9 changes: 9 additions & 0 deletions .travis_support/environments/tst_py35.yml
@@ -8,3 +8,12 @@ dependencies:
- pip==18.0
- wheel==0.31.1
- coverage==4.5.1
- pytest==3.0.5
- dask==0.13.0
- numpy==1.11.3
- scipy==0.19.1
- scikit-image=0.12.3
- pims==0.4.1
- slicerator==0.9.8
- pip:
  - slicerator==0.9.8
9 changes: 9 additions & 0 deletions .travis_support/environments/tst_py36.yml
@@ -8,3 +8,12 @@ dependencies:
- pip==18.0
- wheel==0.31.1
- coverage==4.5.1
- pytest==3.0.5
- dask==0.13.0
- numpy==1.11.3
- scipy==0.19.1
- scikit-image=0.12.3
- pims==0.4.1
- slicerator==0.9.8
- pip:
  - slicerator==0.9.8
93 changes: 93 additions & 0 deletions dask_image/imread.py
@@ -0,0 +1,93 @@
# -*- coding: utf-8 -*-


__author__ = """John Kirkham"""
__email__ = "kirkhamj@janelia.hhmi.org"


import itertools
import numbers
import warnings

import dask
import dask.array
import dask.delayed
import numpy
import pims


def imread(fname, nframes=1):
    """
    Read image data into a Dask Array.

    Provides a simple, fast mechanism to ingest image data into a
    Dask Array.

    Parameters
    ----------
    fname : str
        A glob like string that may match one or multiple filenames.
    nframes : int, optional
        Number of the frames to include in each chunk (default: 1).

    Returns
    -------
    array : dask.array.Array
        A Dask Array representing the contents of all image files.
    """

    try:
        irange = xrange
    except NameError:
        irange = range

    try:
        izip = itertools.izip
    except AttributeError:
        izip = zip

    if not isinstance(nframes, numbers.Integral):
        raise ValueError("`nframes` must be an integer.")
    if (nframes != -1) and not (nframes > 0):
        raise ValueError("`nframes` must be greater than zero.")

    with pims.open(fname) as imgs:
        shape = (len(imgs),) + imgs.frame_shape
        dtype = numpy.dtype(imgs.pixel_type)

    if nframes == -1:
        nframes = shape[0]

    if nframes > shape[0]:
        warnings.warn(
            "`nframes` larger than number of frames in file."
            " Will truncate to number of frames in file.",
            RuntimeWarning
        )
    elif shape[0] % nframes != 0:
        warnings.warn(
            "`nframes` does not nicely divide number of frames in file."
            " Last chunk will contain the remainder.",
            RuntimeWarning
        )

    def _read_frame(fn, i):
        with pims.open(fn) as imgs:
            return numpy.asanyarray(imgs[i])

    lower_iter, upper_iter = itertools.tee(itertools.chain(
        irange(0, shape[0], nframes),
        [shape[0]]
    ))
    next(upper_iter)

    a = []
    for i, j in izip(lower_iter, upper_iter):
        a.append(dask.array.from_delayed(
            dask.delayed(_read_frame)(fname, slice(i, j)),
            (j - i,) + shape[1:],
            dtype
        ))
    a = dask.array.concatenate(a)

    return a
61 changes: 61 additions & 0 deletions dask_image/ndfilters/__init__.py
@@ -0,0 +1,61 @@
# -*- coding: utf-8 -*-

__author__ = """John Kirkham"""
__email__ = "kirkhamj@janelia.hhmi.org"


from ._conv import (
    convolve,
    correlate,
)
convolve.__module__ = __name__
correlate.__module__ = __name__

from ._diff import (
    laplace,
)
laplace.__module__ = __name__


from ._edge import (
    prewitt,
    sobel,
)
prewitt.__module__ = __name__
sobel.__module__ = __name__


from ._gaussian import (
    gaussian_filter,
    gaussian_gradient_magnitude,
    gaussian_laplace,
)
gaussian_filter.__module__ = __name__
gaussian_gradient_magnitude.__module__ = __name__
gaussian_laplace.__module__ = __name__


from ._generic import (
    generic_filter,
)
generic_filter.__module__ = __name__


from ._order import (
    minimum_filter,
    median_filter,
    maximum_filter,
    rank_filter,
    percentile_filter,
)
minimum_filter.__module__ = __name__
median_filter.__module__ = __name__
maximum_filter.__module__ = __name__
rank_filter.__module__ = __name__
percentile_filter.__module__ = __name__


from ._smooth import (
    uniform_filter,
)
uniform_filter.__module__ = __name__
54 changes: 54 additions & 0 deletions dask_image/ndfilters/_conv.py
@@ -0,0 +1,54 @@
# -*- coding: utf-8 -*-


import scipy.ndimage.filters

from . import _utils


@_utils._update_wrapper(scipy.ndimage.filters.convolve)
def convolve(input,
             weights,
             mode='reflect',
             cval=0.0,
             origin=0):
    origin = _utils._get_origin(weights.shape, origin)
    depth = _utils._get_depth(weights.shape, origin)
    depth, boundary = _utils._get_depth_boundary(input.ndim, depth, "none")

    result = input.map_overlap(
        scipy.ndimage.filters.convolve,
        depth=depth,
        boundary=boundary,
        dtype=input.dtype,
        weights=weights,
        mode=mode,
        cval=cval,
        origin=origin
    )

    return result


@_utils._update_wrapper(scipy.ndimage.filters.correlate)
def correlate(input,
              weights,
              mode='reflect',
              cval=0.0,
              origin=0):
    origin = _utils._get_origin(weights.shape, origin)
    depth = _utils._get_depth(weights.shape, origin)
    depth, boundary = _utils._get_depth_boundary(input.ndim, depth, "none")

    result = input.map_overlap(
        scipy.ndimage.filters.correlate,
        depth=depth,
        boundary=boundary,
        dtype=input.dtype,
        weights=weights,
        mode=mode,
        cval=cval,
        origin=origin
    )

    return result
22 changes: 22 additions & 0 deletions dask_image/ndfilters/_diff.py
@@ -0,0 +1,22 @@
# -*- coding: utf-8 -*-


import numbers

import scipy.ndimage.filters

from . import _utils


@_utils._update_wrapper(scipy.ndimage.filters.laplace)
def laplace(input, mode='reflect', cval=0.0):
    result = input.map_overlap(
        scipy.ndimage.filters.laplace,
        depth=(input.ndim * (1,)),
        boundary="none",
        dtype=input.dtype,
        mode=mode,
        cval=cval
    )

    return result
