Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplified code in lib.distances a bit #2048

Merged

Conversation

zemanj
Copy link
Member

@zemanj zemanj commented Aug 14, 2018

Fixes (partially) #2046, this is just the beginning of a series of related PRs.

Changes made in this Pull Request:

  • Renamed _box_check() -> _check_box() to be consistent with naming of other helper functions.
  • _check_box() now also returns the box in the format expected by low-level C functions in lib.c_distances, which reduces code duplications.
  • Removed now obsolete box type 'tri_vecs_bad' from _check_box().
  • Renamed _check_array() -> _check_coord_array() to better reflect its purpose.
  • Renamed _check_results_array() -> _check_result_array() since the checked parameter is always named result.
  • Moved coord array shape checks to separate functions.
  • _check_result_array() now returns a numpy array of correct shape and dtype if result is None, which simplifies the code of several functions.
  • Improved docstrings of transform_RtoS() and transform_StoR() and added input coordinate shape check. Adjusted test_augment.py accordingly.
  • Added shape check for input coordinates of calc_distance(), calc_angle(), and calc_dihedral().

Question:

  • Before I can continue to fix the rest of the docs, I'd like to hear further opinions on how to proceed with the requirements for the box arrays. Should the documented requirements for the box format be less strict or should we rather make the checks in _check_box() stricter?

PR Checklist

  • Tests? (Do we need additional tests? Additional tests needed)
  • Docs? (More to do)
  • CHANGELOG updated?
  • Issue raised/referenced?

* Renamed `_box_check()` -> `_check_box()` to be consistent with naming of other helper functions.
* `_check_box()` now also returns the box in the format expected by low-level C functions in `lib.c_distances`.
* Removed obsolete box type `'tri_vecs_bad'`.
* Renamed `_check_array()` -> `_check_coord_array()` to better reflect its purpose.
* Renamed `_check_results_array()` -> `_check_result_array()` since the checked parameter is always named `result`.
* `_check_result_array()` now returns a numpy array of correct shape and dtype if `result` is `None`.
* Improved docstrings of `transform_RtoS()` and `transform_StoR()` and added input coordinate shape check. Adjusted `test_augment.py` accordingly.
* Added shape check for input coordinates of `calc_distance()`, `clac_angle()`, and `calc_dihedral()`.
@codecov
Copy link

codecov bot commented Aug 14, 2018

Codecov Report

Merging #2048 into develop will increase coverage by 0.31%.
The diff coverage is 98.07%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2048      +/-   ##
===========================================
+ Coverage    88.93%   89.24%   +0.31%     
===========================================
  Files          144      159      +15     
  Lines        17490    18703    +1213     
  Branches      2693     2682      -11     
===========================================
+ Hits         15554    16691    +1137     
- Misses        1323     1407      +84     
+ Partials       613      605       -8
Impacted Files Coverage Δ
package/MDAnalysis/lib/distances.py 95.34% <97.56%> (+7.23%) ⬆️
package/MDAnalysis/lib/util.py 88.17% <98.64%> (+1.3%) ⬆️
package/MDAnalysis/analysis/waterdynamics.py 80.94% <0%> (-3.51%) ⬇️
package/MDAnalysis/lib/pkdtree.py 90.1% <0%> (-1.1%) ⬇️
package/MDAnalysis/core/selection.py 99.31% <0%> (-0.05%) ⬇️
package/MDAnalysis/analysis/rdf.py 100% <0%> (ø) ⬆️
package/MDAnalysis/analysis/data/filenames.py 100% <0%> (ø) ⬆️
formats/__init__.py 100% <0%> (ø)
__init__.py 91.89% <0%> (ø)
topology/base.py 97.67% <0%> (ø)
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d2e22ff...f6d946e. Read the comment docs.

@richardjgowers
Copy link
Member

Looks like the Mac tests are failing on the last decimal place of precision. I think this is expected with a float32 triclinic box.

_check_coord_array_shape_2d(coords, desc)
if coords.dtype != np.float32:
raise TypeError("{0} must be of type numpy.float32".format(desc))
# The following two lines would break a lot of tests. WHY?!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What tests fail here? Is it that some coords arrays are often non C contiguous?

Copy link
Member Author

@zemanj zemanj Aug 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore it, that's just a personal note I forgot to delete. Requiring coordinate arrays to be C-contiguous doesn't really make sense here since all the functions have calls to np.astype(...) which will return a contiguous copy anyway.

@richardjgowers
Copy link
Member

@zemanj these changes look good, thanks!

@richardjgowers
Copy link
Member

@zemanj ok if you get the precision on the macos tests working we can merge this. I'd rather chip away at 2046 in small parts to keep the PRs smaller. I added a WIP tag, remove it once you're happy for us to pull the trigger on this PR.

@zemanj
Copy link
Member Author

zemanj commented Aug 14, 2018

@richardjgowers Thanks for taking a look! I'll add some more tests to increase coverage tomorrow and decrease the precision in the tests for float32 to 6 decimals.

@zemanj
Copy link
Member Author

zemanj commented Aug 15, 2018

@orbeckst @richardjgowers I'm still unsure how we should handle the requirements for the box input.
AFAIK (correct me if I'm wrong) the docs of pretty much all functions in MDAnalysis state that the format of the box input must correspond to the format returned by MDAnalysis.coordinates.base.Timestep.dimensions.
However, under the hood, the requirements are often weaker, meaning that one can also supply arrays with shape (3,) for orthogonal boxes or with shape (3, 3) for triclinic ones.

There are two proper solutions to this discrepancy:

  1. Actually enforce correspondence of boxes with MDAnalysis.coordinates.base.Timestep.dimensions
  2. Change the docstrings to properly describe which box formats can be handled. In that case, one should also get rid of all requirements that can be handled automatically, meaning that any array_like box with compatible shape should be accepted as input and converted internally.

And two rather ugly "solutions":

  1. Leave it as is.
  2. Lower the requirements as suggested in solution 2 and leave the docs as they are.

My preference is definitely solution 1, since including a proper description of all possible box input formats in the docstrings would quickly get out of hand. Additionally, the overhead caused by _check_box() will might be reduced (not sure if that's an issue).

If other @MDAnalysis/coredevs have an opinion on that issue, please let me know!

@richardjgowers
Copy link
Member

@zemanj yeah let's tidy box inputs to the first option. If the docs are requesting the .dimensions format even better, as it's technically not an API break....

@orbeckst
Copy link
Member

orbeckst commented Aug 15, 2018 via email

@zemanj
Copy link
Member Author

zemanj commented Aug 16, 2018

Ok so I changed the requirements for boxes to correspond to the docs. This means:

  • The format has to be [lx, ly, lz, alpha, beta, gamma]
  • The container class and dtype is arbitrary and is converted to np.ndarray with dtype=np.float32 within _check_box().

I fixed the tests accordingly.

I also fiddled with some decorator tricks to outsource the coordinate array checking. That's what the new decorator MDAnalysis.lib.util.check_coords(*coord_names, **options) is good for.
This decorator also enables functions such as calc_dihedrals(), calc_bonds(), or apply_PBC() to accept single coordinate arrays and return single angles, distances, or coordinates, respectively.

The function calc_distance(), which has been added in PR #1939, therefore becomes somewhat obsolete since calc_bonds() now provides the same functionality. (The fact that calc_distance overwrites a previously imported low-level Cython function c_distances.calc_distance is not so pretty anyway...)

@richardjgowers richardjgowers merged commit 5eebb18 into MDAnalysis:develop Aug 18, 2018
@zemanj
Copy link
Member Author

zemanj commented Aug 18, 2018

@richardjgowers Thanks for the merge, CHANGELOG and some additional tests are still missing, though.

@zemanj zemanj deleted the issue-2046-lib-distances-rework branch August 20, 2018 06:10
zemanj added a commit to zemanj/mdanalysis that referenced this pull request Aug 20, 2018
richardjgowers pushed a commit that referenced this pull request Sep 7, 2018
* added missing CHANGELOG entry for PR #2048

* ensured `lib.distances.distance_array()` always returns an array
zemanj added a commit to zemanj/mdanalysis that referenced this pull request Sep 25, 2018
@zemanj zemanj mentioned this pull request Sep 26, 2018
4 tasks
richardjgowers pushed a commit that referenced this pull request Sep 29, 2018
* fix bug introduced in PR #2048

* allow for empty input coord arrays in lib.distances

* tests for empty input coord arrays in lib.distances

* fixed _bruteforce_capped() when no pairs are found; fixed _nsgrid_capped() for zero pseudobox size

* return type testing for all functions in lib.distances

* docs: more precise return type specifications

* ensured _nsgrid_capped*() always returns pairs of dtype=np.int64

* added comments and removed unnecessary return type conversion from *capped_distance()

* added out-of-box coordinate to TestOutputType tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants