Differentiated dimension-types #113

scanny · 2018-10-29T19:18:25Z

This PR builds on the work done in the refactor-dimension branch.

I recommend reviewing it commit-by-commit. The PR is pretty wide, but the commits are well-groomed (no later rework) and, I expect, much easier to follow one by one.

At a high level, it changes the dimension type from a `str` value like 'multiple_response' to an enumeration member like DIMENSION_TYPE.MULTIPLE_RESPONSE (commonly DT.MR in the code). The enumeration has distinct types for CA_CAT, MR_CAT, (regular) CAT, and LOGICAL, all of which used to report 'categorical' as their dimension type. This full differentiation allows several helper properties that were needed to distinguish those types to go away, which simplifies the interface, reduces the support burden, and makes the code more explicit and therefore easier to reason about.
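As a rough illustration of the change described above, the sketch below models the differentiated types with Python's standard `enum` module. The use of `enum.Enum` and the string values are assumptions made for illustration; only the member names mentioned in this description come from the PR, and the actual cr.cube definition may be built differently.

```python
import enum


class DIMENSION_TYPE(enum.Enum):
    """Illustrative stand-in; the string values are assumed, not taken from cr.cube."""

    CATEGORICAL = "categorical"
    CA_CAT = "ca_cat"
    MR_CAT = "mr_cat"
    LOGICAL = "logical"
    MULTIPLE_RESPONSE = "multiple_response"

    # Repeating an existing value creates an alias for that member.
    CAT = "categorical"
    MR = "multiple_response"


DT = DIMENSION_TYPE  # short name, as in "DT.MR"

# Each flavor that used to report 'categorical' is now its own member.
formerly_categorical = (DT.CAT, DT.CA_CAT, DT.MR_CAT, DT.LOGICAL)
```

With distinct members, client code can compare against `DT.CA_CAT` or `DT.MR_CAT` directly instead of combining a 'categorical' check with a separate helper property.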

In addition, all the dimension-type discovery logic is now localized in a single place. As a result, several bugs were revealed and fixed. Many other improvements were made along the way to docstrings, naming, and tests, and this refactoring paves the way for the next round. The API for CrunchCube was also narrowed, which improves the API documentation and makes the library easier to learn for newcomers.

All tests pass on every commit. New code (AllDimensions and downward) was developed strictly test-driven. The NewDimension object was a stub used for the TDD; once AllDimensions was completed, its (changed) signature and .dimension_type property were incorporated into Dimension.

coveralls · 2018-10-29T19:21:41Z

Coverage decreased (-0.02%) to 97.143% when pulling c7073c9 on differentiated-dimtypes into 926e156 on master.

* Define DIMENSION_TYPE enumeration, defining all distinct types, even though current Dimension.dimension_type cannot differentiate all of them yet.
* Change each use of `str` dimension-type key in code and tests to actual DIMENSION_TYPE member.
* Remove some cruft from Dimension.dimension_type to make it consistent with current cube response form.
* Update a legacy fixture to add now-required subtype-class.

Add integration tests to drive test-driven development of dimension collections objects, including the core aspect of dimension type discovery. Along the way:
* Normalize CAT_X_CAT fixture to allow uniform access to result dict.
* Fix unit test that depended on CAT_X_CAT being a shojified cube response.

* Move NewDimension methods into Dimension, replacing old implementations.
* Update CrunchCube dimensions methods to use new dimension collections.
* Update CrunchCube and Dimension tests as required.
* Mark newly failing tests xfail for individual resolution in following commits.

* Canonicalize unpunctuated logical-univariate JSON fixture.
* Move test to CrunchCube integration test module since it exercises CrunchCube methods.
* Test requires DIMENSION_TYPE.LOGICAL instead of CATEGORICAL now that categorical dimension types are differentiated.

JSON fixture named LOGICAL_X_CAT turns out to have CAT_X_LOGICAL dimensions. Canonicalize and rename that JSON fixture and move test to cube integration test module since that's its entry point. Use LOGICAL dimension-type to match now differentiated category type.

slobodan-ilic · 2018-11-01T14:23:26Z

src/cr/cube/dimension.py

+        collection.
+        """
+        return tuple(
+            d for d in self._all_dimensions


`d` seems a bit too short for me personally. I know that the linter used to complain, but I didn't check it now. It's super minor and you don't have to fix it; I just wanted to point out that it might need to be fixed if we firm up our position on code formatting.

scanny · 2018-11-01

Well, my rationale here is that the scope is so very, very small (one line) and the contents of ._all_dimensions are clear.

We could go with `dimension for dimension in self._all_dimensions`, but for whatever reason I find this expression clearer, as it would be in math: "Let d represent a dimension". There's some tendency in functional programs to use very short names like these in a short scope (like 3 or 4 lines, no more), and I've been experimenting a little to see when I find they read better. I understand of course that readability is somewhat subjective at this level of detail.

I would suggest we let this one stand and see how we like it when returning back to it after a few weeks or whatever. It's difficult to distinguish 'different' from 'worse' at first, so if we age it a while and see what we think then we can make a change if we want.

scanny requested review from slobodan-ilic and percious October 29, 2018 19:18

scanny added 27 commits October 30, 2018 12:00

rfctr: improve deprecation warning

c38a066

rfctr: normalize ordering of new CubeSlice methods

7e2fad6

cube: make CrunchCube._all_dimensions private

828549e

* Rename CrunchCube.all_dimensions to ._all_dimensions and move it down to the implementation method section of the class.
* Update all internal references.
* Update references in tests.

cube: remove dead CrunchCube.ca_dim_ind

fbe683a

CrunchCube.ca_dim_ind is not called by any other part of cr.cube or by current table-writer. Remove it as it will be made obsolete by other factoring.

cube: make CrunchCube.col_direction_axis private

e060402

CrunchCube.col_direction_axis is not called outside cr.cube.crunch_cube.

cube: make CrunchCube.data private

8a49d08

CrunchCube.data is used only internally to the CrunchCube class.

cube: make CrunchCube.flat_values private

1978f4c

CrunchCube.flat_values is only used internally by CrunchCube and is not a required interface method.

cube: make CrunchCube.prune_indices private

afe5478

CrunchCube.prune_indices is not used outside CrunchCube.

cube: make .valid_indices_with_selections private

1b2afe4

CrunchCube.valid_indices_with_selections is not required outside CrunchCube. Make it private.

cube: parameterize CrunchCube unit tests

10a93ad

Use pytest parameterized fixtures to gather test cases into a single test plus a parameterized fixture. This clarifies the test and greatly shortens the test file making it easier to understand and maintain.

dim: add sequence behaviors to _BaseDimensions

94a0ebc

dim: add AllDimensions._dimensions

4b9f479

dim: add _DimensionFactory.iter_dimensions()

6425eb9

dim: add _DimensionFactory._iter_dimensions()

7538608

dim: add _DimensionFactory._raw_dimensions

54b476e

dim: add _RawDimension.dimension_dict

f0d08d5

dim: add _RawDimension.dimension_type

fc065fb

dim: add _RawDimension._base_type

3c76720

dim: add _RawDimension._resolve_categorical()

252466f

dim: add _RawDimension._is_array_cat

e036050

dim: add _RawDimension._has_selected_category

b8bc449

dim: reimplement Dimension.dimension_type

7cc9814

dim: add _RawDimension._resolve_array_type()

eea9c34

dim: add _RawDimension._next_raw_dimension

6ad3053

dim: add _RawDimension._alias

903dcce

scanny added 14 commits October 30, 2018 12:00

dim: add AllDimensions.apparent_dimensions

fc5f12d

dim: add _ApparentDimensions._dimensions

355cb48

cube: reimplement CrunchCube.is_univariate_ca

821a37d

With the new fully-differentiated dimension types, a CA dimension pair has a CA_CAT dimension where it used to have an (undifferentiated) CAT dimension.

cube: reimplement CrunchCube.univariate_ca_main_axis

b671c3d

Main axis of univariate CA is now CA_CAT type where before it was undifferentiated CAT type.

test: rework CrunchCube.dimensions integration test

d8db2b5

Move CrunchCube.dimensions integration test to test_crunch_cube.py and update it to reflect differentiated dimension type for CA_CAT.

rfctr: remove CrunchCube.is_mr_selections()

6ed25b1

CrunchCube.is_mr_selections() can now be replaced by `dimension.dimension_type == DT.MR_CAT`. Remove .is_mr_selections() and replace its usage with that snippet.
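A minimal sketch of what the replacement comparison might look like in client code. `StubDimension`, `mr_cat_indices`, and the enum values here are hypothetical stand-ins for illustration, not cr.cube objects; only the `dimension.dimension_type == DT.MR_CAT` comparison comes from the commit message.

```python
import enum


class DT(enum.Enum):
    # Hypothetical subset of the dimension-type enumeration; values are assumed.
    CAT = "categorical"
    MR = "multiple_response"
    MR_CAT = "mr_cat"


class StubDimension:
    """Minimal stand-in exposing only .dimension_type (not the real Dimension)."""

    def __init__(self, dimension_type):
        self.dimension_type = dimension_type


def mr_cat_indices(dimensions):
    """Return the positions of MR-selections dimensions (hypothetical helper)."""
    return tuple(
        i for i, d in enumerate(dimensions) if d.dimension_type == DT.MR_CAT
    )


dimensions = (
    StubDimension(DT.MR),
    StubDimension(DT.MR_CAT),
    StubDimension(DT.CAT),
)
selections_positions = mr_cat_indices(dimensions)
```

The comparison states which type is being tested at the call site, so no dedicated predicate method is needed on the cube.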

rfctr: remove Dimension.is_selections

34aa402

Differentiated dimension-types make this API property unnecessary and making selection types explicit in client code makes it easier to reason about.

rfctr: remove CrunchCube.mr_selections_indices

b3d883d

CrunchCube.mr_selections_indices is unused and should not be required as an API property. Remove it.

rfctr: remove Dimension.alias

4f3461b

Dimension.alias is no longer used. Remove it on YAGNI rationale.

cube: make CrunchCube.dim_types immutable

b374ec1

Returning a mutable type on an @lazyproperty is dangerous because it permits the caller to change the contents which would be returned to the next caller. Make CrunchCube.dim_types return a tuple rather than a list.
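The hazard can be reproduced in a small sketch. It uses `functools.cached_property` (Python 3.8+) as a stand-in for `@lazyproperty`, on the assumption that both compute the value once and return the same object on every later access; the class names and values are illustrative only.

```python
from functools import cached_property  # stand-in for @lazyproperty (assumption)


class ListCube:
    @cached_property
    def dim_types(self):
        # A list is mutable, so callers can alter the cached object in place.
        return ["CAT", "MR"]


class TupleCube:
    @cached_property
    def dim_types(self):
        # A tuple cannot be changed after creation.
        return ("CAT", "MR")


list_cube = ListCube()
list_cube.dim_types.append("oops")  # first caller mutates the cached list
leaked = list_cube.dim_types  # the next caller sees the mutation

tuple_cube = TupleCube()
try:
    tuple_cube.dim_types.append("oops")
    mutation_blocked = False
except AttributeError:  # tuples have no .append()
    mutation_blocked = True
```

Returning a tuple turns an accidental in-place change into an immediate error instead of a silent change visible to every later caller.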

rfctr: improve docstrings and naming

c7073c9

scanny force-pushed the differentiated-dimtypes branch from c22bea3 to c7073c9 Compare October 30, 2018 19:01

scanny changed the base branch from refactor-dimension to master October 30, 2018 19:02

slobodan-ilic reviewed Nov 1, 2018

slobodan-ilic approved these changes Nov 1, 2018

slobodan-ilic merged commit 4090b27 into master Nov 1, 2018
