Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Differentiated dimension-types #113

Merged
merged 41 commits into from
Nov 1, 2018
Merged

Conversation

scanny
Copy link
Contributor

@scanny scanny commented Oct 29, 2018

This PR builds on the work done in the refactor-dimension branch.

I recommend reviewing it commit-by-commit as it is pretty wide, but the commits are well-groomed (no later rework) and I expect much easier to follow one-by-one.

At a high level, it extracts dimension type from a str object like 'multiple_response' to an enumeration object like DIMENSION_TYPE.MULTIPLE_RESPONSE (commonly DT.MR in the code). The enumeration has distinct types for CA_CAT, MR_CAT, (regular) CAT, and LOGICAL, all of which used to report 'categorical' as their dimension type. This full differentiation allows several helper properties that were needed to distinguish those to go away, simplifying the interface and reducing the support burden as well as making the code more explicit and therefore easier to reason about.

In addition, all the dimension-type discovery logic is now localized in a single place. As a result, several bugs were revealed and fixed. Many other improvements were made along the way to docstrings, naming, and tests, and this refactoring paves the way for the next round. The API for CrunchCube was also narrowed, which improves the API documentation and makes the library easier to learn for newcomers.

All tests pass on every commit. New code (AllDimensions and downward) were developed strictly test-driven. The NewDimension object was a stub used for the TDD and then its (changed) signature and .dimension_type property incorporated into Dimension once AllDimensions was completed.

@coveralls
Copy link

coveralls commented Oct 29, 2018

Coverage Status

Coverage decreased (-0.02%) to 97.143% when pulling c7073c9 on differentiated-dimtypes into 926e156 on master.

* Rename CrunchCube.all_dimensions to ._all_dimensions and move it to
  down to the implementation method section of the class.

* Update all internal references.

* Update references in tests.
CrunchCube.ca_dim_ind is not called by any other part of cr.cube or by
current table-writer. Remove it as it will be made obsolete by other
factoring.
CrunchCube.col_direction_axis is not called outside cr.cube.crunch_cube.
CrunchCube.data is used only internally to the CrunchCube class.
CrunchCube.flat_values is only used internally by CrunchCube and is not
a required interface method.
CrunchCube.prune_indices is not used outside CrunchCube.
CrunchCube.valid_indices_with_selections is not required outside
CrunchCube. Make it private.
* Define DIMENSION_TYPE enumeration, defining all distinct types, even
  though current Dimension.dimension_type cannot differentiate all of
  them yet.

* Change each use of `str` dimension-type key in code and tests to
  actual DIMENSION_TYPE member.

* Remove some cruft from Dimension.dimension_type to bring it consistent
  with current cube response form.

* Update a legacy fixture to all now-required subtype-class.
Use pytest parameterized fixtures to gather test cases into a single
test plus a parameterized fixture. This clarifies the test and greatly
shortens the test file making it easier to understand and maintain.
Add integration tests to drive test-driven development of dimension
collections objects, including the core aspect of dimension type
discovery.

Along the way:

* Normalize CAT_X_CAT fixture to allow uniform access to result dict.

* Fix unit test that depended on CAT_X_CAT being a shojified cube response.
* Move NewDimension methods into Dimension, replacing old
  implementations.

* Update CrunchCube dimensions methods to use new dimension collections.

* Update CrunchCube and Dimension tests as required.

* Mark newly failing tests xfail for individual resolution in following
  commits.
With the new fully-differentiated dimension types, a CA dimension pair
has a CA_CAT dimension where it used to have an (undifferentiated) CAT
dimension.
Main axis of univariate CA is now CA_CAT type where before it was
undifferentiated CAT type.
Move CrunchCube.dimensions integration test to test_crunch_cube.py and
update it to reflect differentiated dimension type for CA_CAT.
* Canonicalize unpunctuated logical-univariate JSON fixture.

* Move test to CrunchCube integration test module since it exercises
  CrunchCube methods.

* Test requires DIMENSION_TYPE.LOGICAL instead of CATEGORICAL now that
  categorical dimension types are differentiated.
JSON fixture named LOGICAL_X_CAT turns out to have CAT_X_LOGICAL
dimensions. Canonicalize and rename that JSON fixture and move test to
cube integration test module since that's its entry point. Use LOGICAL
dimension-type to match now differentiated category type.
CrunchCube.is_mr_selections() can now be replaced by
`dimension.dimension_type == DT.MR_CAT`. Remove .is_mr_selections() and
replace its usage with that snippet.
Differentiated dimension-types make this API property unnecessary and
making selection types explicit in client code makes it easier to reason
about.
CrunchCube.mr_selections_indices is unused and should not be required as
an API property. Remove it.
Dimension.alias is no longer used. Remove it on YAGNI rationale.
Returning a mutable type on an @lazyproperty is dangerous because it
permits the caller to change the contents which would be returned to
the next caller. Make CrunchCube.dim_types return a tuple rather than
a list.
@scanny scanny changed the base branch from refactor-dimension to master October 30, 2018 19:02
collection.
"""
return tuple(
d for d in self._all_dimensions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d seems a bit too short for me personally. I know that linter used to complain, but I didn't check it now. It's super minor, you don't have to fix it, but just wanted to point out that it might need to be fixed if we firm our position on code formatting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, my rationale here is that the scope is so very, very small (one line) and the contents of ._all_dimensions is clear.

We could go with dimension for dimension in self._all_dimensions, but for whatever reason I find this expression clearer, like it would be in math "Let d represent a dimension". There's some tendency in functional programs to use very short names like these in a short scope (like 3 or 4 lines, no more), and I've been experimenting a little to see when I find they read better. I understand of course that readability is somewhat subjective at this level of detail.

I would suggest we let this one stand and see how we like it when returning back to it after a few weeks or whatever. It's difficult to distinguish 'different' from 'worse' at first, so if we age it a while and see what we think then we can make a change if we want.

@slobodan-ilic slobodan-ilic merged commit 4090b27 into master Nov 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants