Migrate dask-image org's subpackages (#10)
* Try to import tifffile if possible

  First try to import `tifffile` directly. As this is a dependency of `pims`, in most cases this should work. However some older versions of `pims` (particularly for Python 3.4) pull in `scikit-image` instead for some reason. So if `tifffile` can't be imported directly, try importing it from `scikit-image`'s `external` package.

* Monkey patch pims to use the available tifffile

  This seems to occur on Python 3.4 as the pims 0.3.3 package on conda-forge pulls in scikit-image instead of tifffile, but then seems unable to use it. To fix that, we import any available tifffile and monkey patch pims only if it didn't find any tifffile. However there is a chance we failed to find tifffile as well. In that case this amounts to a no-op. (A sketch follows at the end of this group.)

* Ignore test coverage when scikit-image is missing

  Even though it is extremely unlikely that both `tifffile` and `scikit-image` will not be present, we would still like to gracefully handle this case. Instead of dropping the exception handling here because it is technically not covered in our testing, simply note to coverage that it should not bother with this case.

* Rename `imread`'s `fn` to `fname`

  Changes the filename parameter of `imread` from `fn` to `fname`. This is done to be more consistent with SciPy and scikit-image, which both use `fname`. Though this differs from Dask, which uses `filename`.

* Add a docstring for imread

  Provides some brief explanation of how `imread` works and what it returns.

* Test another negative number of frames

  Also test -2 as an argument to `nframes` to make sure it also fails.

* Drop ValueError if `nframes` is -1

  As having `nframes` be `-1` will mean having one chunk, we no longer should raise a `ValueError` when that value occurs. Also drop the test for a `ValueError` if `nframes` is `-1`.

* Treat `nframes` equal to `-1` as one big chunk

  If the user provides `nframes` equal to `-1`, simply replace `nframes` with the number of frames in the image. In other words, the entire image will be loaded into memory for computation. This may be handy for image data that is spread out amongst many small image files. For cases like this, it may be completely reasonable to read the whole image into memory as IO may be more expensive than the memory consumed. Include tests to make sure that this value of `nframes` behaves as expected both in terms of being able to load the data and provide the expected number of chunks, namely `1`.

* Drop test directory removal

  Windows seems to complain if we try to remove the test directory at the end of a test run. So simply drop the directory removal.

* Add a note about our contributions to _compat

  These are all borrowed from Dask for compatibility purposes until we can move to a newer version of Dask. This will be possible once we drop Python 3.4 as a deployment target.

* Add a note about our contributions to test__compat

  These are all borrowed from Dask for compatibility purposes until we can move to a newer version of Dask. This will be possible once we drop Python 3.4 as a deployment target.

* Borrow our changes to fftfreq from Dask

  Handles an issue where the chunking of the resultant array may not match the chunking the user specified. Fixes this by simply rechunking if the chunks don't match at the end of the function.

  ref: dask/dask@6dc9e07
  ref: dask/dask@1ddb383

* Backport fftfreq test change for handling chunks

  This is a backported change that we submitted to Dask to test different kinds of chunks that could be specified to `fftfreq`. The change to `fftfreq` has already been incorporated. So this just exercises it a bit more.

  ref: dask/dask@6a7a507
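As a rough illustration of the import fallback and monkey patch described above, a minimal sketch might look as follows. The `pims.tiff_stack.tifffile` attribute is an assumption for illustration, not necessarily the exact attribute patched:

```python
import pims.tiff_stack

# Prefer tifffile directly; fall back to the copy vendored by older
# versions of scikit-image.
try:
    import tifffile
except ImportError:
    try:
        from skimage.external import tifffile
    except ImportError:  # pragma: no cover
        tifffile = None

# Monkey patch pims only if it did not find a tifffile of its own.
# If we found none either, this amounts to a no-op.
if tifffile is not None and getattr(pims.tiff_stack, "tifffile", None) is None:
    pims.tiff_stack.tifffile = tifffile
```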
* Backports test changes for fftfreq with chunks

  Backports some changes we submitted to Dask to test that `fftfreq` would return the expected chunking in the resultant array.

  ref: dask/dask@fb72d81

* Test all Fourier filters preserve chunking

  Updates all of the Fourier filter tests to also check that the resultant array has the same chunking as the input array. This is important to make sure that it is easy to do an inverse FFT without needing to rechunk first.

* Test Fourier filters with a chunked `s`

  Just to make sure that chunking doesn't get messed up, add a test with a chunked `s`. Really this shouldn't affect anything as the Fourier filter is normally reduced along `s`. So any chunking of `s` should already be a non-issue by the time the Fourier filter is applied on the data. Still this is a good sanity check.

* Add a test for different shapes and chunks

  Includes a test that tries different combinations of shapes and chunks with the Fourier filters. The purpose of this test is both to ensure the chunking remains unchanged and to ensure that the results remain accurate. Though as these operations are element-wise, chunking is unlikely to be a cause of inaccuracies, unlike with other filters, which require an overlap.

* Pull in our fftfreq simplifications from upstream

  Backports some changes we made to Dask's `fftfreq` into our `_compat` version of `fftfreq` to generally clean up the `fftfreq` implementation. Not only does it simplify the implementation, it seems to speed it up in some cases. Switched to using `map_blocks` and a custom Python function, which is applied over `arange`, to implement `_fftfreq` internally (sketched below). This makes the overall implementation cleaner and simpler. Also avoids needing various chunking related tricks. Plus it appears this goes a bit faster than the old implementation. How much faster seems to depend on the size of the frequency range and the size of the chunks.

  ref: dask/dask@2b2d9f9

* Drop unused `itertools` import in `_compat`

* Drop unused `collections` import in `_utils`

* Drop unused `numpy` import in `test__utils`

* Drop unused dask.array.utils import in test__utils

* Remove core module

* Drop import of core module

* Fix import test

  Now that core is removed we need to import the top-level module instead.

* Add compatibility module stub

  Provides a compatibility module stub for containing various functions to ensure compatibility with the much older Dask 0.13.0. This is needed as this is the newest version of Dask for Python 3.4 on conda-forge. Once we drop Python 3.4, we can use a newer version of Dask and drop most if not all of this module's contents.

* Vendor indices based on our Dask contribution

  This is just our code that we contributed to Dask. So there is no issue with us vendoring it here. That said, we use a BSD 3-Clause license just like Dask. So if there were any issue, we are basically using the same license as well. This is just borrowed from the same vendoring in `dask-ndfourier`.

  ref: dask-image/dask-ndfourier@039862e

* Vendor _isnonzero_vec from our Dask contribution

  Not really of much use itself, `_isnonzero_vec` is necessary to help determine non-zero values in a Dask Array where type conversion to `bool` does not work (e.g. `str`s). This is our contribution to Dask so we are free to include it.
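For reference, the simplified `fftfreq` described in the "Pull in our fftfreq simplifications from upstream" item above amounts to roughly the following sketch: one plain `arange` plus a block-wise function, so the result's chunking simply follows the `arange`'s.

```python
import dask.array as da


def _fftfreq(i, n, d):
    # Fold the upper half of the index range into negative frequencies.
    r = i.copy()
    r[i >= (n + 1) // 2] -= n
    r /= n * d
    return r


def fftfreq(n, d=1.0, chunks=None):
    # A single arange plus map_blocks; no chunking tricks needed.
    n = int(n)
    if chunks is None:
        chunks = n  # one chunk by default in this sketch
    r = da.arange(n, dtype=float, chunks=chunks)
    return r.map_blocks(_fftfreq, dtype=r.dtype, n=n, d=d)
```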
* Vendor _isnonzero from our Dask contribution

  Not really of much use itself, `_isnonzero` is necessary to help determine non-zero values in Dask Arrays. This is our contribution to Dask so we are free to include it.

* Workaround missing asarray in Dask 0.13.0

  There is no asarray in Dask 0.13.0. So work around this by coercing all non-Dask arrays (or non-arrays) into NumPy arrays and then Dask arrays.

* Vendor argwhere from our Dask contribution

  Basically verbatim the same as what we submitted upstream to Dask for the `argwhere` function. Though in Dask 0.13.0 this is not a lazy implementation due to `compress` being non-lazy. Support for unknown dimensions in Dask 0.13.0 is in its nascent stages. So it may be difficult to get a lazy implementation that works ok there, but will give that a try. Fortunately newer versions of Dask (not yet released) should have this built in.

* Parameterize argwhere test

* Restrict some tests to certain Dask versions

  Not all of the tests for the vendored compatibility functions can be run with all versions of Dask. This is due to some missing features and/or bugs in previous versions of Dask. As these tests were never designed to run on such an old version, we simply skip some old versions of Dask for some tests if they don't work. There is no point in trying to "fix" them as the code was designed with a newer version of Dask in mind. Also, at some point, we intend to drop the compatibility functions once a newer version of Dask is required.

* Workaround reshaping constraint w.r.t. chunks

  Seems there are situations where reshaping used to require only 1 chunk. So this fixes the non-trivial test to have one chunk.

* Replace compress with a lazy equivalent

  Makes use of `atop` to wrap up NumPy's `compress` and apply it to different blocks. Adjusts the chunks afterwards with a hack so as to note one of the dimensions is actually unknown.

* Move array check to argwhere

  It is needed here to ensure that chunks is properly defined. Also by doing this conversion first, we are sure it is satisfied for the non-zero step as well.

* Add a test for argwhere and NumPy arrays

  Provides a simple test to ensure argwhere converts NumPy arrays to Dask arrays internally.

* Refactor out _asarray

  Similar in nature to Dask's own `asarray` function. Since it is not available in Dask 0.13.0, we add our own vendored copy based on code refactored out of `argwhere`.

* Include some tests for _asarray

  Make sure that it correctly converts everything to a Dask Array. Try using a Python list, NumPy array, and Dask Array. Also make sure the final array has the same contents as the original.

* Add another newline before author/date info

* Make compatibility tests run with Python

* Add a stub for some utility functions

  Creates a stub module, `_utils`, for a few internal utility functions that we don't want to expose as part of the API.

* Add _assert_eq_nan to compare arrays that have NaN

  As comparisons with NaN are false even if both values are NaN, using `assert_eq` does not work correctly in this case. To fix it, we add this shim function around `assert_eq`. First we verify that the arrays have the same NaN values using a duck type friendly strategy: comparing each array to itself and then comparing the resulting masks to each other. Second we zero out all the NaN values and compare the resulting arrays, which should now work as normal. This works just as well whether the arrays are from NumPy, Dask, or some other library, given they support these basic functions.
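A minimal sketch of the `_assert_eq_nan` strategy just described, assuming `assert_eq` from `dask.array.utils`:

```python
import numpy as np
import dask.array as da
from dask.array.utils import assert_eq


def _assert_eq_nan(a, b, **kwargs):
    # NaN != NaN, so comparing an array with itself exposes its NaN
    # mask in a duck-type friendly way (NumPy and Dask alike).
    a_nan = (a != a)
    b_nan = (b != b)
    assert_eq(a_nan, b_nan, **kwargs)

    # Zero out the NaNs; the remaining values compare as usual.
    a_where = da.where if isinstance(a, da.Array) else np.where
    b_where = da.where if isinstance(b, da.Array) else np.where
    assert_eq(a_where(a_nan, 0, a), b_where(b_nan, 0, b), **kwargs)
```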
* Include tests for _assert_eq_nan

  Just provides some basic tests to make sure it can compare two arrays that are the same and find that they are equal. Also checks that when arrays shouldn't be equal, the function raises an `AssertionError`. Include arrays with and without NaN. Also includes different types of arrays to make sure it behaves ok.

* Implement center_of_mass using Dask Arrays

  Provides a basic implementation of `center_of_mass` akin to SciPy ndimage's function, except that it uses Dask Arrays instead of NumPy arrays. Also as SciPy's implementation seems to support multiple indices, we try to do the same here. SciPy uses a variety of lists and arrays in their result. We do not bother to do this here for a few reasons. First it makes it harder to manipulate the data. Second it makes evaluation of the result more cumbersome. Third the structure is not of particular value.

* Include tests for center_of_mass

  Compares the result between SciPy's implementation and Dask's implementation. Some of the tests include just providing the array, including a label image, and including a label image and label indices. For label indices a variety of formats are tried, including scalar, 1-D array, and various N-D arrays. These are important to test as SciPy allows all of them and formats the output differently depending on how the user provides them. Also we test the case where the label index provided is not present (results in `nan`s for the center of mass).

* Test error cases from center_of_mass

  Only one error case is raised currently. So we go ahead and test that error in this unit test. Though we could test other cases in this function in the future as well.

* Rename some variables in center_of_mass

  Should make things a little clearer about what the different variables mean.

* Shorten a long line in center_of_mass

* Delay conversion of label matches to int

* Rename input_ind* to input_i*

  Shortens the variable name and reduces the ambiguity with `index`, which shared an unfortunately similar name.

* Use Dask Array's where instead of multiplying

  Ideally using `where` should be faster than casting an array of `bool`s to `int`s and multiplying them. Plus it should be more robust to things like `nan`, which multiplying would just propagate.

* Find matching indices based on labels specified

  In an attempt to make it easier to refactor more code, mask out the indices of interest based on the masks for each selected label. Then use this in the computation of the weighted average to find the center of mass.

* Rename input_mtch_ind_wt to input_i_mtch_wt

  Fits closer with the naming scheme used earlier in `center_of_mass`. Also makes the variable's purpose a little clearer.

* Reorder two lines

  Appears better stylistically with code later in `center_of_mass`.

* Add stub modules to hold utility functions

  Creating these modules to hold some useful internal functions to be refactored out. Should make it easier to maintain the API exposed functions.

* Add arg normalizing function for labels and index

  Refactored from code in `center_of_mass`, the `_norm_input_labels_index` function handles the normalization of `input`, `labels`, and `index` arguments. As these particular arguments will show up repeatedly in the API, it will be very helpful to normalize them in the general case so the API functions can be focused on their particular computations.
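A sketch of what `_norm_input_labels_index` might look like; the defaults shown (whole-image labels, index of 1) are assumptions based on the descriptions here, not the exact shipped code:

```python
import numpy as np
import dask.array as da


def _norm_input_labels_index(input, labels=None, index=None):
    # Coerce everything to Dask Arrays, filling in assumed defaults.
    input = da.asarray(input)
    if labels is None:
        # Default labels cover the whole image (matching SciPy's docs).
        labels = da.ones(input.shape, dtype=int, chunks=input.chunks)
    if index is None:
        # Default index selects the single label 1.
        index = np.array(1, dtype=int)
    labels = da.asarray(labels)
    index = da.asarray(index)
    if input.shape != labels.shape:
        raise ValueError("input and labels arrays must have the same shape")
    return input, labels, index
```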
* Use arg normalizing function in center_of_mass

  As this normalization code has been refactored out into an independent internal utility function, simply make use of it to implement `center_of_mass`.

* Add tests for argument normalization function

  Exercise a failing and passing case in two tests.

* Adds a function to get label matches

  Provides a simple refactored internal utility function that uses the input array, label array, and selected indices to find masks of selected labels in order and use the masks to select relevant regions in the input data. Both the masks and selected input data are returned to the caller.

* Tests the label matching function

  Provides a parameterized test to exercise the comparison of selected indices to the label image and the application of these masks to the input data. Tries different formulations of indices. Includes a NumPy variant whose results it compares to our Dask Array implementation.

* Make use of label matching function

  Clean up `center_of_mass` to make use of `_get_label_matches`. This should help reduce code duplication and keep `center_of_mass` focused on the relevant computation.

* Drop unused imports

* Add stub modules for Python 2/3 compatibility

  Should provide the bare minimum code to smooth over the differences between Python 2 and Python 3. Also should help keep their performance comparable.

* Store iterable range as irange

  Basically pick up `xrange` on Python 2 and `range` on Python 3. So as not to override a builtin name, store the result as `irange`. This also makes it clearer how this `range` will behave. Plus it avoids using the removed `xrange` name on Python 3.

* Add a basic test for irange

  Make sure `irange` is there, it doesn't return a list, and it acts like `range` on some test arguments.

* Make use of irange in place of range

  Using `irange` to get an iterable should have some performance improvements on Python 2, as much of the code assumed `range` was an iterable. So replacing it with an alias to `xrange` should help Python 2 out. Though there is no expected improvement for Python 3, where `range` was already an iterable.

* Vendor/refactor lazy implementation of compress

  This is needed for some other use cases that `dask-ndmeasure` is trying to cover. Filtering out labels that are not being used is essential in these cases and replacing them with 0 simply won't work. Though it should be able to speed up cases where we did replace values with 0. While `compress` will be included in newer (not yet released) versions of Dask, it is not present in any released version and it is especially not in Dask 0.13.0, which is needed for Python 3.4 support. So for now we include a simple implementation of `compress` that is based on code in the vendored `argwhere` with a few tweaks and validation checks.

* Include some tests for compress

  Covers various error cases that may occur if preconditions are not met. Also tests a few different correct cases to validate correctness against the NumPy implementation.

* Use compress in argwhere implementation

* Simplify the label matches function

  Focus only on creating a mask of selected labels and none of the other products that this function was creating before (sketched below).

* Update the label matching tests

* Update center of mass w.r.t. label matching

  Include some more specialized code back into the center of mass computation and update the computation to use the changed label matching function.

* Move index transpose into center_of_mass

  The index transposing behavior (i.e. the selected labels) appears to be unique to the `center_of_mass` computation, or at least it is not present in `sum`. So this moves that tweak into the `center_of_mass` computation. That way it doesn't contaminate other computations that do not need it. Further it allows the argument normalization function to be reused repeatedly without ill effect.
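The simplified label matching might be sketched as a single broadcast comparison, giving one boolean mask per selected label; the shape handling here is an assumption:

```python
import dask.array as da


def _get_label_matches(labels, index):
    # Compare every selected index value against the whole label image.
    # The result has shape index.shape + labels.shape: one mask of the
    # label image per selected label.
    labels = da.asarray(labels)
    index = da.asarray(index)
    sel = index[(Ellipsis,) + labels.ndim * (None,)]
    return sel == labels
```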
* Add sum function for label images

  Provides a basic implementation for summing over values matching labels of interest. Similar to the SciPy ndimage function of the same name.

* Test the sum implementation

  Make sure this matches up with SciPy's implementation for a variety of different input parameters. Borrows from the `center_of_mass` tests to create these tests.

* Rewrite center_of_mass using sum

  As the behavior in `center_of_mass` can basically be seen as computing a few arrays and summing them, just simplify the summing portions by reusing our `sum` function for them. This significantly reduces the complexity of the `center_of_mass` function.

* Add mean

  Provides an implementation of `mean` akin to SciPy ndimage's `mean` for use with label images.

* Include tests for mean

  Provides some tests for our `mean` implementation that compare it to SciPy's implementation. Tries a variety of parameters to make sure it behaves ok. Borrows from the testing used for `sum`.

* Add an implementation of variance

  Provides a Dask Array implementation of variance, which is designed to be similar to SciPy's ndimage implementation. Simply computes the variance over each label. Reuses our mean implementation to construct this easily.

* Include tests for variance

  Simply exercises our implementation of variance and compares it against the implementation included in SciPy. Largely borrows these tests from those used for sum.

* Add standard_deviation

  Provides a Dask Array implementation of SciPy ndimage's `standard_deviation` function. Computes the standard deviation over the selected labels in the array. Leverages our `variance` implementation to make this work easily. (How these measurements build on one another is sketched below.)

* Include tests of standard_deviation

  Simply exercises our `standard_deviation` implementation for Dask Arrays and compares the results with those from SciPy ndimage. The tests are borrowed from other similar tests that we have for API functions.

* Parameterize `center_of_mass` test

  Configure this test with the ability to parameterize the function being tested. This will help consolidate other similar tests into this one to avoid code duplication and bit rot.

* Consolidate all sum-based property tests

* Test more functions for error cases

  Simply iterates over all the functions currently in the API to see if they all correctly raise the same error.

* Pass Dask Array of labels to Dask function

  This was turned into a Dask Array before anyways, but it makes more sense to use the Dask Array that we created for the label image.

* Rename variable in label match computation test

  These names are holdovers from when the test was just for the center of mass computation. However that doesn't make sense now that it covers many different computations. So this renames the variables to be a bit more general in this context.

* Handle scalar values in _assert_eq_nan

  Add some special handling in `_assert_eq_nan` to handle having scalar values passed in. Basically ensure that everything provided is an array. This is a no-op for arrays, but converts scalars into 0-D arrays. By doing this, we are able to use the same `nan` handling code. Also convert the 0-D arrays back to scalars. This means a 0-D array will be treated as a scalar in the end. However Dask doesn't really have a way to differentiate the two. So this is fine.
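How these measurements build on one another can be sketched as follows; `sum` here is the label-image `sum` added above (shadowing the builtin, as the API does), and `_norm_input_labels_index` is the assumed normalization helper from the earlier sketch:

```python
import dask.array as da


def mean(input, labels=None, index=None):
    # Mean per label: sum of values over the count of values.
    input, labels, index = _norm_input_labels_index(input, labels, index)
    ones = da.ones(input.shape, dtype=input.dtype, chunks=input.chunks)
    return sum(input, labels, index) / sum(ones, labels, index)


def variance(input, labels=None, index=None):
    # Variance per label: E[x**2] - E[x]**2.
    return mean(input ** 2, labels, index) - mean(input, labels, index) ** 2


def standard_deviation(input, labels=None, index=None):
    return da.sqrt(variance(input, labels, index))
```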
* Test _assert_eq_nan with scalars and 0-D arrays

  Make sure that the fix included works for scalars and 0-D arrays, both from NumPy and Dask.

* Add _ravel_shape_indices

  Provides a utility function that computes `indices` for the raveled case, but ensures that it is shaped and chunked so as to match the `input` array provided.

* Test _ravel_shape_indices

  Provide some simple tests that verify this matches what one would expect from the intended operation with NumPy.

* Fix test label array construction

  Was using Dask Arrays in the construction of the label array, which works on old versions of Dask, but not newer versions. As we really want to use NumPy arrays in the construction of all test data and that will fix the problem, go ahead and switch the label array construction to use NumPy arrays exclusively.

* Fix testing of empty arrays with compress

  Before, the test data generated was being sliced if `axis` was specified. However this makes no sense if the test data has an empty dimension, as it may mean slicing along the 0-length axis. Even if slicing doesn't occur along the 0-length axis, we already know the result will be a length 0 1-D array. We can get to this result much faster by simply flattening the array instead of slicing. Hence we use this strategy to fix the tests with newer versions of Dask.

* Add labeled_comprehension

  Provides an implementation of `labeled_comprehension` for use with Dask Arrays. Mimics the implementation from SciPy ndimage. Should be possible to rewrite everything in terms of `labeled_comprehension` and make use of it in implementations of other computations that have not yet been added. Also this should make it easy to support all other manner of computations over label images that are not already (or will not be) covered by the API. (A usage sketch follows below.)

* Include tests for labeled_comprehension

  Simply compare the results from our implementation of `labeled_comprehension` with Dask Arrays to that of SciPy ndimage's. Tries a variety of different parameters to ensure the original behavior is preserved.

* Handle `labeled_comprehension` comparison directly

  As Dask Arrays using delayed appear to be invalid according to Dask's own array comparison test (particularly in Dask 0.14.0+), simply handle the comparison of the two results ourselves. This way we can be sure that the two results are treated correctly regardless of which version of Dask is in use.

* Rework core tests without `assert_eq` comparisons

  As newer versions of Dask appear to have issues using `assert_eq` on `delayed`-based Dask Arrays, `assert_eq` cannot be used anywhere `delayed` might be used. Since `labeled_comprehension` will be needed to implement some new functionality and it makes use of `delayed`, `assert_eq` cannot be used in `test_core`. So we replace all usages of `assert_eq` and related functions with our own direct comparisons.

* Add minimum function for label images

  Computes the smallest value within each label.

* Run parameterized tests on minimum as well

  Provides simple tests for errors and comparisons between our Dask Array implementations and SciPy.

* Add maximum function for label images

  Computes the largest value within each label.

* Run parameterized tests on maximum as well

  Provides simple tests for errors and comparisons between our Dask Array implementations and SciPy.
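A usage sketch, assuming the SciPy-compatible signature `labeled_comprehension(input, labels, index, func, out_dtype, default, pass_positions=False)` that the description above implies:

```python
import numpy as np
import dask.array as da

# An arbitrary reduction (here np.median) applied over each selected label.
a = da.from_array(np.arange(25, dtype=float).reshape(5, 5), chunks=(2, 5))
lbl = da.from_array(np.tile(np.array([0, 1, 1, 2, 2]), (5, 1)), chunks=(2, 5))

medians = labeled_comprehension(a, lbl, [1, 2], np.median, np.float64, np.nan)
print(medians.compute())  # one median per selected label
```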
* Add maximum_position

  Provides a Dask Array implementation of `maximum_position`, which behaves similarly to SciPy ndimage's `maximum_position`.

* Test maximum_position

  Simply add `maximum_position` to our parameterized test to compare results with SciPy ndimage's implementation.

* Add minimum_position

  Provides a Dask Array implementation of `minimum_position`, which behaves similarly to SciPy ndimage's `minimum_position`.

* Test minimum_position

  Simply add `minimum_position` to our parameterized test to compare results with SciPy ndimage's implementation.

* Add extrema

  Provides a Dask Array implementation of `extrema`, which behaves similarly to SciPy ndimage's `extrema`.

* Test extrema

  Simply add `extrema` to our parameterized test to compare results with SciPy ndimage's implementation.

* Rewrite of minimum that uses nanmin

  This is basically a clever trick to avoid using `labeled_comprehension` in the computation of `minimum`. It works by converting the data to `float64` and placing `nan` over all unneeded values. This way `nanmin` can be run and all of the `nan` values will be ignored (unless there are only `nan` values). Using info about which labels could be computed, `where` is run over the final result to replace those values that could not be computed with `0`. The result is then cast back to its original type. This change should allow the computation to proceed faster than it would by going through `delayed` in `labeled_comprehension`. (The trick is sketched below.)

* Rewrite of maximum that uses nanmax

  This is basically a clever trick to avoid using `labeled_comprehension` in the computation of `maximum`. It works by converting the data to `float64` and placing `nan` over all unneeded values. This way `nanmax` can be run and all of the `nan` values will be ignored (unless there are only `nan` values). Using info about which labels could be computed, `where` is run over the final result to replace those values that could not be computed with `0`. The result is then cast back to its original type. This change should allow the computation to proceed faster than it would by going through `delayed` in `labeled_comprehension`.

* Add a 3-D test case to the mix

  It is pretty easy to mess things up, but still get the right result in the 2-D test case (e.g. indexing by 1 or -1 is the same in 2-D). Having a 3-D test case breaks symmetry for some cases, which makes it a bit easier to check if the code is doing the right thing.

* Tweak ordering in labeled_comprehension

  Compute the label index matches before packing the arguments for use with the user function.

* labeled_comprehension: `indices` -> `positions`

  Rename `indices` to `positions` in `labeled_comprehension` to match more closely with SciPy's terminology. Plus it clashes with `index` less, so it should avoid some confusion on that front.

* Unwrap line in labeled_comprehension

  Fits on one line now (with PEP8 constraints). Makes the code more compact so more fits on a page.

* Name all arguments in _labeled_comprehension_func

  As the user-specified function, `func`, needs to take a particular argument structure that is known, there is no need to leave this so vague in `_labeled_comprehension_func`. Doing this does mean there is an additional check each time the function is run. However having the arguments specified will help us remove things like `compute`, which should speed up the computation on the Dask end.

* Drop compute from _labeled_comprehension_func

  There is no need to separately check whether the computation should proceed or not. This information should already be known based simply on whether the input array selection is non-empty. Thus we can avoid a computation on the Dask end and the comprehension can proceed just the same.
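The nanmin trick described above, sketched with an assumed stack of per-label boolean masks (`matches`, shaped index + image, like the one from the label matching sketch):

```python
import numpy as np
import dask.array as da


def _nan_trick_minimum(input, matches):
    # Push NaN over everything outside each label's mask, then nanmin
    # ignores those values. Reduce over the trailing image axes only,
    # keeping one result per selected label.
    img_axes = tuple(range(matches.ndim - input.ndim, matches.ndim))
    masked = da.where(matches, input.astype(np.float64), np.nan)
    result = da.nanmin(masked, axis=img_axes)
    # Labels that matched nothing could not be computed; replace those
    # with 0 and cast back to the original type.
    found = matches.any(axis=img_axes)
    return da.where(found, result, 0).astype(input.dtype)
```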
* labeled_comprehension: Get position before `for`

  Handle the check as to whether to compute or include the position before entering the label comprehension `for`-loop. This not only avoids repeating this check inside the `for`-loop, but it avoids constructing the raveled indices if they won't be used. Should be more efficient when the position is not needed in the user function.

* Fused delayed steps in labeled_comprehension

  Instead of calling `dask.delayed` on the function and storing it before converting it to a Dask Array (and storing it again), simply pass the result of `dask.delayed` directly into `dask.array.from_delayed` before storing it in the NumPy object array. This cuts down on the retrievals/stores to the NumPy object array and should help streamline things a bit. May want to consider refactoring this into `_utils` a bit to cut down on the visual clutter.

* Refactor out labeled_comprehension delay handling

  Moves the utility functions involved in wrapping up the user-provided reduction function for use with Dask into `_utils`. This cleans up the code in `labeled_comprehension` a bit. Also it avoids some small overhead incurred in delaying the utility function.

* Test more functions with 3-D cases

  In particular, tests `extrema` and `labeled_comprehension` with 3-D test data. Should help validate that these functions are really behaving well in N-D cases.

* Avoid coercing out_dtype in comprehension wrappers

  The `out_dtype` argument should already have a proper dtype due to the similar coercing in `labeled_comprehension`. So simply use `out_dtype` as if it is correct.

* Skip converting default in comprehension wrapper

  This conversion of `default` to the expected type, `out_dtype`, already happens in `labeled_comprehension`. So there is no need to perform this conversion a second time in the wrapper. Simply use the value and return it.

* Skip storing result in comprehension wrapper

  No need to store the Dask Array result as it is merely going to be returned immediately afterwards. So simply return the result as soon as `from_delayed` finishes.

* Coerce default using out_dtype in comprehensions

  Simply use `out_dtype`'s `type` to coerce `default` to the expected return type. This is much simpler than going through a NumPy array to create the scalar.

* Construct index ranges once in comprehensions

  The `range`s were previously being constructed each time the `for`-loop went through an iteration. By moving the construction of the `range`s out of the `for`-loop, each `range` is constructed once. Then only the product of `range`s need be constructed on entering the `for`-loop (as this is dependent on how many `range`s are used). Also reuse these constructed `range`s in the first `for`-loop as well.

* Store each indexed label match in comprehensions

  To avoid having to potentially index the label match twice during graph generation (like when the user function needs values and positions), store each index selection temporarily in the `for`-loop so as to reuse it.

* Rewrite minimum to use labeled_comprehension

  Based on recent optimizations to `labeled_comprehension`, it appears that finding the minimum with `labeled_comprehension` is faster than the original implementation of `minimum`. So simply rewrite `minimum` to use `labeled_comprehension` instead. This will also improve the test coverage of the `labeled_comprehension` code path.
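With the helpers above, the rewrite is pleasantly small; a sketch using the assumed names from the earlier sketches:

```python
import numpy as np


def minimum(input, labels=None, index=None):
    input, labels, index = _norm_input_labels_index(input, labels, index)
    return labeled_comprehension(
        input, labels, index, np.min, input.dtype, input.dtype.type(0)
    )
```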
* Rewrite maximum to use labeled_comprehension

  Based on recent optimizations to `labeled_comprehension`, it appears that finding the maximum with `labeled_comprehension` is faster than the original implementation of `maximum`. So simply rewrite `maximum` to use `labeled_comprehension` instead. This will also improve the test coverage of the `labeled_comprehension` code path.

* Add median

  Provides a Dask Array implementation of `median`, which behaves similarly to SciPy ndimage's `median`.

* Test median

  Simply add `median` to our parameterized test to compare results with SciPy ndimage's implementation.

* Use float64 in median always

  Seems SciPy's `median` always uses `float64`. So we switch over as well.

* Return nan for unknown values with median

  This is more consistent with how `median` normally behaves in NumPy. We do this even though SciPy is a bit inconsistent when it comes to returning a value for a missing label.

* Add histogram

  Provides a Dask Array implementation of `histogram`, which behaves similarly to SciPy ndimage's `histogram`.

* Test histogram

  Simply add `histogram` to our parameterized test to compare results with SciPy ndimage's implementation.

* Add label

  Provides a Dask Array implementation of `label`, which behaves similarly to SciPy ndimage's `label`. This implementation is restricted to 1 chunk ATM. Even though this would effectively be enforced by using `delayed`, it is nice to require this of the user outright so that they know what will happen. (A sketch follows below.)

* Test label

  Simply add `label` to our parameterized test to compare results with SciPy ndimage's implementation.

* Simplify maximum and minimum position functions

  After performing `where` calls to get the maximum and minimum positions over labels respectively, convert the results back to the same type used for `indices` instead of specifying the integral type independently.

* Use int in argwhere, minimum, and maximum position

  This ends up matching behavior on Windows better. Plus the behavior is no different on Unixes.

* Drop unneeded vendored functions

  The vendored functions `_argwhere` and `_compress` were expected to be useful in writing this library early on. However it seems they are not needed after all. So this strips them from the compatibility module and associated tests.

* Use Python int in center_of_mass

* Use Python int in maximum_position

* Use Python int in minimum_position

* Use Python int in labeled_comprehension

* Use Python int in histogram

* Use Python int for default labels and index

* Fix expected array type for default index

* Use intp in center_of_mass

* Use intp in maximum_position

* Use intp in minimum_position

* Fix expected array type for labels w/default index

* Revert use of intp

  On Windows, this seems to only work correctly for Python 2.7 64-bit, but fails everywhere else. As such, it would be better to have mostly correct behavior on a wide range of architectures and Pythons (including Python 3) than to fix only Python 2.7 64-bit. Hence we revert the use of `intp`.

* Soften type constraints in tests

  Unfortunately it is too difficult to match type behavior on Windows without making a mess of the code. So simply relax type checking in the tests to work around these issues.
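The delayed-based `label` might be sketched as follows; the single-chunk check and output dtype are assumptions based on the description, not the shipped code:

```python
import dask
import dask.array as da
import numpy as np
import scipy.ndimage


def label(input, structure=None):
    input = da.asarray(input)
    # SciPy needs the whole image at once, hence the one-chunk rule.
    if input.numblocks != (1,) * input.ndim:
        raise ValueError("`input` must consist of a single chunk")
    labeled, num_features = dask.delayed(scipy.ndimage.label, nout=2)(
        input, structure
    )
    labeled = da.from_delayed(labeled, shape=input.shape, dtype=np.int32)
    num_features = da.from_delayed(num_features, shape=tuple(), dtype=int)
    return labeled, num_features
```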
* Drop unused functools import from _compat

* Drop unused numbers import from _compat

* Drop unneeded _test_utils import from test_core

* Drop _test_utils module

  As we are no longer making use of `_assert_eq_nan` and nothing else is present in the `_test_utils` module, simply drop the module and associated tests.

* Raise FutureWarning for index with rank above 1

  If the `index` has a rank greater than 1 (e.g. a matrix), SciPy ndimage measurements' functions often handle these in some fashion. Though they don't always handle this case nor do they handle it consistently. We have tried to mimic their behavior here, but it is sometimes a little strange. So as to discourage users from making use of this behavior, we raise a `FutureWarning` to point out that it is subject to change.

* Test warning for index of rank higher than 1

  Provides a simple test to verify that the `FutureWarning` is raised when an `index` of rank higher than 1 (e.g. a matrix) is provided. Also ensure that no warning is raised when an `index` of rank 1 or lower is passed.

* Simplify handling of default labels and index

  Avoid performing extra computations on the default cases of `labels` and `index` by simply filling them with their default values. Also most SciPy documentation mentions that `labels` defaults to the whole image. We have been restricting it to the non-zero portion, but that is technically incorrect. Changing it to be the whole image hasn't had any negative effects before, but we go ahead and do it now to match SciPy's docs more accurately.

* Use Dask Array form of index in utils tests

  Make sure to use the Dask Array form of `index` in the `utils` tests.

* Drop some unneeded code from utils tests

  This code was used to compare results of the `_get_label_matches` function. However that function no longer returns these other values. So there is no need to generate something similar in the tests as the comparison is not being made.

* Wrap long line in the docs

* Freshen up API documentation

  Try to fix up the API documentation to be more consistent with how the Dask-based functions are written. Mainly this means pointing out that things are arrays. Though it also means streamlining documentation in terms of how common default parameters are handled and how the results are explained. Documentation aims to be clear, but also brief, to avoid redundancy or the potential for confusion.

* Update some doc API headings

  Missed fixing these to match the new API doc format when refreshing the docs. So this tidies them up after the fact.

* Fix min and max position unraveling

  Appears the remainder was only getting the singleton dimension added to it. However the original positions should have gotten a singleton dimension as well.

* Rewrite maximum_position to use argmax

  Makes use of `labeled_comprehension` to rewrite `maximum_position` around `argmax` (sketched below). This way there is no need to compute `maximum` first to find `maximum_position`. Also this removes a bunch of other steps that `maximum_position` had added in previously to reuse `maximum` to get the final result. The net result should be a cleaner implementation that is easier to understand and maintain. Plus this ends up being ~33% faster than the original implementation.

* Rewrite minimum_position to use argmin

  Makes use of `labeled_comprehension` to rewrite `minimum_position` around `argmin`. This way there is no need to compute `minimum` first to find `minimum_position`. Also this removes a bunch of other steps that `minimum_position` had added in previously to reuse `minimum` to get the final result. The net result should be a cleaner implementation that is easier to understand and maintain. Plus this ends up being ~33% faster than the original implementation.
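A sketch of the argmax rewrite and the unraveling it relies on (helper names are assumptions carried over from the earlier sketches):

```python
import numpy as np
import dask.array as da


def _argmax(a, positions):
    # Raveled position of the largest value among one label's values.
    return positions[np.argmax(a)]


def maximum_position(input, labels=None, index=None):
    input, labels, index = _norm_input_labels_index(input, labels, index)
    pos_1d = labeled_comprehension(
        input, labels, index, _argmax, int, 0, pass_positions=True
    )
    # Unravel into N-D coordinates, peeling off one axis at a time.
    coords = []
    rem = pos_1d
    for s in input.shape[::-1]:
        coords.append(rem % s)
        rem = rem // s
    return da.stack(coords[::-1], axis=-1)
```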
* Drop now unused operator import

* Use an iterable range in core tests

  Was using `range` on both Python 2 and Python 3. The problem is that on Python 2 this allocates a `list` with the contents of everything included by `range`'s bounds and step size. To fix this, simply use a Python 2/3 compatibility trick to get an iterable range on Python 2. Behavior on Python 3 is unchanged (excepting an alias).

* Use labeled_comprehension in sum

  Rewrites our `sum` implementation to make use of `labeled_comprehension` for performing the computation. This should provide some more coverage of `labeled_comprehension` and give us more confidence in using it in other cases. Also as masks are used to select out the data of interest, we are able to avoid performing computations on values that are not of interest (i.e. not in the mask). As such we are able to achieve a not insignificant speedup (~40%).

* Update __init__.py

  label works correctly with multiple chunks, but just consolidates the image before running the computation (see the sketch below).

* Update __init__.py

  Fixing the message.

* Update test_core.py

  Fixing the label test since the chunk limit has been removed.

* Bump PIMS requirement to 0.4.1

  As PIMS 0.4.1 has better handling in terms of finding tifffile and fixes a six related bug (amongst other things), go ahead and bump the PIMS version and do some cleaning.

* Bump SciPy CI requirement to 0.19.1

  Appears there are some issues with SciPy 0.18.1 from `defaults`. So this bumps the version in the hopes that these issues were fixed in a slightly later version of SciPy.
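The consolidation mentioned for `label` might look like this small change relative to the earlier `label` sketch (an assumption, not the shipped diff):

```python
import dask.array as da


def _consolidate(input):
    # Rather than raising on multi-chunk input, bring the whole image
    # into a single chunk before handing it to SciPy's label.
    if input.numblocks != (1,) * input.ndim:
        input = input.rechunk(input.shape)
    return input
```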