-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Differentiated dimension-types #113
Conversation
* Rename CrunchCube.all_dimensions to ._all_dimensions and move it to down to the implementation method section of the class. * Update all internal references. * Update references in tests.
CrunchCube.ca_dim_ind is not called by any other part of cr.cube or by current table-writer. Remove it as it will be made obsolete by other factoring.
CrunchCube.col_direction_axis is not called outside cr.cube.crunch_cube.
CrunchCube.data is used only internally to the CrunchCube class.
CrunchCube.flat_values is only used internally by CrunchCube and is not a required interface method.
CrunchCube.prune_indices is not used outside CrunchCube.
CrunchCube.valid_indices_with_selections is not required outside CrunchCube. Make it private.
* Define DIMENSION_TYPE enumeration, defining all distinct types, even though current Dimension.dimension_type cannot differentiate all of them yet. * Change each use of `str` dimension-type key in code and tests to actual DIMENSION_TYPE member. * Remove some cruft from Dimension.dimension_type to bring it consistent with current cube response form. * Update a legacy fixture to all now-required subtype-class.
Use pytest parameterized fixtures to gather test cases into a single test plus a parameterized fixture. This clarifies the test and greatly shortens the test file making it easier to understand and maintain.
Add integration tests to drive test-driven development of dimension collections objects, including the core aspect of dimension type discovery. Along the way: * Normalize CAT_X_CAT fixture to allow uniform access to result dict. * Fix unit test that depended on CAT_X_CAT being a shojified cube response.
* Move NewDimension methods into Dimension, replacing old implementations. * Update CrunchCube dimensions methods to use new dimension collections. * Update CrunchCube and Dimension tests as required. * Mark newly failing tests xfail for individual resolution in following commits.
With the new fully-differentiated dimension types, a CA dimension pair has a CA_CAT dimension where it used to have an (undifferentiated) CAT dimension.
Main axis of univariate CA is now CA_CAT type where before it was undifferentiated CAT type.
Move CrunchCube.dimensions integration test to test_crunch_cube.py and update it to reflect differentiated dimension type for CA_CAT.
* Canonicalize unpunctuated logical-univariate JSON fixture. * Move test to CrunchCube integration test module since it exercises CrunchCube methods. * Test requires DIMENSION_TYPE.LOGICAL instead of CATEGORICAL now that categorical dimension types are differentiated.
JSON fixture named LOGICAL_X_CAT turns out to have CAT_X_LOGICAL dimensions. Canonicalize and rename that JSON fixture and move test to cube integration test module since that's its entry point. Use LOGICAL dimension-type to match now differentiated category type.
CrunchCube.is_mr_selections() can now be replaced by `dimension.dimension_type == DT.MR_CAT`. Remove .is_mr_selections() and replace its usage with that snippet.
Differentiated dimension-types make this API property unnecessary and making selection types explicit in client code makes it easier to reason about.
CrunchCube.mr_selections_indices is unused and should not be required as an API property. Remove it.
Dimension.alias is no longer used. Remove it on YAGNI rationale.
Returning a mutable type on an @lazyproperty is dangerous because it permits the caller to change the contents which would be returned to the next caller. Make CrunchCube.dim_types return a tuple rather than a list.
c22bea3
to
c7073c9
Compare
collection. | ||
""" | ||
return tuple( | ||
d for d in self._all_dimensions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
d
seems a bit too short for me personally. I know that linter used to complain, but I didn't check it now. It's super minor, you don't have to fix it, but just wanted to point out that it might need to be fixed if we firm our position on code formatting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, my rationale here is that the scope is so very, very small (one line) and the contents of ._all_dimensions
is clear.
We could go with dimension for dimension in self._all_dimensions
, but for whatever reason I find this expression clearer, like it would be in math "Let d represent a dimension". There's some tendency in functional programs to use very short names like these in a short scope (like 3 or 4 lines, no more), and I've been experimenting a little to see when I find they read better. I understand of course that readability is somewhat subjective at this level of detail.
I would suggest we let this one stand and see how we like it when returning back to it after a few weeks or whatever. It's difficult to distinguish 'different' from 'worse' at first, so if we age it a while and see what we think then we can make a change if we want.
This PR builds on the work done in the
refactor-dimension
branch.I recommend reviewing it commit-by-commit as it is pretty wide, but the commits are well-groomed (no later rework) and I expect much easier to follow one-by-one.
At a high level, it extracts dimension type from a str object like 'multiple_response' to an enumeration object like
DIMENSION_TYPE.MULTIPLE_RESPONSE
(commonlyDT.MR
in the code). The enumeration has distinct types forCA_CAT
,MR_CAT
, (regular)CAT
, andLOGICAL
, all of which used to report'categorical'
as their dimension type. This full differentiation allows several helper properties that were needed to distinguish those to go away, simplifying the interface and reducing the support burden as well as making the code more explicit and therefore easier to reason about.In addition, all the dimension-type discovery logic is now localized in a single place. As a result, several bugs were revealed and fixed. Many other improvements were made along the way to docstrings, naming, and tests, and this refactoring paves the way for the next round. The API for
CrunchCube
was also narrowed, which improves the API documentation and makes the library easier to learn for newcomers.All tests pass on every commit. New code (
AllDimensions
and downward) were developed strictly test-driven. TheNewDimension
object was a stub used for the TDD and then its (changed) signature and.dimension_type
property incorporated intoDimension
onceAllDimensions
was completed.