Refactor dimension #112

scanny · 2018-10-21T07:18:55Z

This is a fairly thorough refactor of cr.cube.dimension.

Let's discuss when we speak next.

This refactoring introduced a policy clarification in that any elements representing missing data are excluded from a subtotal. This changed the expectations on a couple of integration tests as they included item 8 "Skipped" from the fixture. * Add _Subtotals class and move logic previously in Dimension.subtotals into the collection. * Remove `.is_valid` flag and perform validation in _Subtotals such that only valid _Subtotal objects are instantiated. * Rework _Subtotal and its unit tests to suit.

* Rename Dimension.type to .dimension_type. In general, an object should never have an attribute or property (and probably method either) named `type` or `id`. It seems like a good concise choice considering the broader identifier is likely to be `thing.type` or `widget.id` and that reads nicely. But the negatives soon outweigh, in particular that it collides with a built-in method or type, which causes spurious syntax highlighting, also its ambiguous because things can have more than one kind of type and more that one kind of id. This required a handful of changes to exporter code. * Localize all type-discovery code in the `.dimension_type` property. This logic will need to move to the `Dimensions` object when its implemented so having it all in one place will make that move easier and avoids cluttering the namespace with helpers. classmethods are unnecessary here so removed those and improved name of `._next_dimension_dict` to clarify its role. * Unify unit tests for dimension type-discovery.

Reimplement `.labels()` making best use of newly extracted element and subtotal objects, using iterators for interleaving.

Makes good use of the new `._iter_interleaved_items()` method.

coveralls · 2018-10-21T07:22:10Z

Coverage increased (+0.2%) to 97.165% when pulling dd32932 on refactor-dimension into e10e155 on master.

scanny

I left a few comments for you here to get you started following along with what I did. We can discuss in more detail this coming week.

scanny · 2018-10-21T07:21:29Z

src/cr/cube/crunch_cube.py

@@ -149,7 +149,7 @@ def as_array(self, include_missing=False, weighted=True, adjusted=False,
    @lazyproperty
    def ca_dim_ind(self):
        for (i, dim) in enumerate(self.dimensions):
-            if dim.type == 'categorical_array':
+            if dim.dimension_type == 'categorical_array':


I renamed Dimension.type to .dimension_type. There are some comments in the commit going into the rationale. Short version is having a property or method named .type or .id is to be avoided. Among other benefits this lets the reader know what kind of type it is when context can't make that clear.

scanny · 2018-10-21T07:23:50Z

src/cr/cube/crunch_cube.py

        return dim_types.index('categorical')

    def valid_indices_with_selections(self, include_missing=False):
        """Get all valid indices (including MR selections)."""
        return [
-            dim.valid_indices(include_missing)
+            dim.element_indices(include_missing)


I renamed Dimension.valid_indices() to .element_indices() since it can be used to get both all and only valid indices. This eliminates a jarring contradiction in the naming.

scanny · 2018-10-21T07:26:17Z

src/cr/cube/crunch_cube.py


-        return insertions
+        return [insertion for insertion in iter_insertions()]


I extracted the body of the code into the local method iter_insertions() to make this simpler to understand.

scanny · 2018-10-21T07:28:33Z

src/cr/cube/dimension.py

-        self._type = self._get_type(dim, selections)
+    def __init__(self, dimension_dict, next_dimension_dict=None):
+        self._dimension_dict = dimension_dict
+        self._next_dimension_dict = next_dimension_dict


I renamed dim to the more descriptive _dimension_dict, especially since dim is so often used as a shorthand for a Dimension object. I renamed selections to the more precise next_dimension_dict for clarity.

scanny · 2018-10-21T07:36:14Z

src/cr/cube/dimension.py

+        return description if description else ''
+
+    @lazyproperty
+    def dimension_type(self):


This is the refactored form of .type, ._type, _get_type(), and ._is_multiple_response(). Calling ._get_type() in the constructor represented "real work", which is why you had to mock it so often in unit tests even when it didn't have anything to do with that unit. The classmethods were unnecessary once next_dimension_dict was stored as an instance var and .dimension_type was made lazy. All this logic really belongs in a Dimensions collection object anyway, which we'll probably get to before too long.

scanny · 2018-10-21T07:38:20Z

src/cr/cube/dimension.py

+        """_Elements object providing access to elements of this dimension."""
+        return (
+            self._all_elements if include_missing else self._valid_elements
+        )


One of the big items in this PR is extracting _Elements and _Element (and their object hierarchy siblings) as full-fledged objects. This removes all the widely distributed dict manipulation that is hard to follow and very error prone. Using dicts instead of real objects for jobs like these is an example of the "Primitive Obsession" anti-pattern, and this refactoring provides a good example of how to eliminate that and the benefits in clarity, conciseness, and testability that can bring.

scanny · 2018-10-21T07:45:50Z

src/cr/cube/dimension.py

+        appear after subtotals have been inserted at their anchor locations.
+        Where more than one subtotal is anchored to the same location, they
+        appear in their document order in the cube response. pairs for this
+        dimension.


This last sentence fragment is an editing error. I've fixed it in my copy and will push with any other changes before we proceed.

slobodan-ilic · 2018-10-24T15:54:47Z

src/cr/cube/dimension.py

+        from the collection; this sequence represents a subset of that
+        provided by `._all_elements`.
+        """
+        return self._all_elements.valid_elements


From this expression it seems that having "elements" repeated two times is redundant. Maybe something like

self._elements.all

vs

self._elements.valid

would be more concise and/or readable.
But this is just a comment, not sure it's good/easy to change it like that.

I think this is the right naming. This location is the only place we expect this expression to arise and I'm strongly inclined to keep things explicit. If we neglect to distinguish all from valid here it makes readability poor because you have to remember (or more likely go look) to determine what that simplified name means.

End users will have no direct access to the _AllElements object so they will not need to use the _AllElements.valid_elements property.

slobodan-ilic · 2018-10-24T16:37:51Z

src/cr/cube/dimension.py

+
+            categories = self._next_dimension_dict['type'].get('categories')
+            if not categories:
+                return False


This line is untested. Is there an easy way of testing it?

Ok, this is fixed. Interestingly it takes a pretty special case of CA_SUBVAR_X_MR to trigger this particular branch.

slobodan-ilic · 2018-10-24T16:39:00Z

src/cr/cube/dimension.py

-            for label in labels_with_cat_ids
-            if self._include_in_labels(label, valid_indices)
-        ]
+            return list(zip(labels, element_ids))


This (under if) is untested

Fixed, dimension.py is 100% coverage now.

slobodan-ilic · 2018-10-24T16:40:14Z

src/cr/cube/dimension.py

-            return name
+        Must be implemented by each subclass.
+        """
+        raise NotImplementedError('must be implemented by each subclass')


Also this, although it's understandable why :)

This one is fixed too.

* Make a few small documentation fixes. * Bring coverage for cr.cube.dimensions up to 100%. * Add `make coverage` to Makefile. * Add integration test for `Dimension.dimension_type` * Replace tautalogical unit test for `Dimension.has_mr` with one of full edge-case variety. * Add unit test for `_BaseElements._elements`.

scanny · 2018-10-26T09:57:36Z

Ok, I think this is ready to go. I merged master to resolves a parallel development conflict.

We'll need make some changes in our exporter code to make this work over there. Once this is released I'll prepare a commit containing those changes along with the new version bump.

scanny added 5 commits October 18, 2018 17:25

rfctr: extract _Elements and _Element objects

df5c569

rfctr: reimplement Dimension.labels()

e8557c2

Reimplement `.labels()` making best use of newly extracted element and subtotal objects, using iterators for interleaving.

rfctr: reimplement Dimension.inserted_hs_indices

4dab599

Makes good use of the new `._iter_interleaved_items()` method.

scanny commented Oct 21, 2018

View reviewed changes

scanny requested a review from slobodan-ilic October 21, 2018 07:56

slobodan-ilic reviewed Oct 24, 2018

View reviewed changes

slobodan-ilic approved these changes Oct 24, 2018

View reviewed changes

slobodan-ilic reviewed Oct 24, 2018

View reviewed changes

scanny added 2 commits October 26, 2018 05:30

Merge branch 'master' into refactor-dimension

912ae55

slobodan-ilic approved these changes Oct 26, 2018

View reviewed changes

Bump version and update readme

dd32932

slobodan-ilic merged commit 926e156 into master Oct 29, 2018

scanny deleted the refactor-dimension branch May 23, 2019 23:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor dimension #112

Refactor dimension #112

scanny commented Oct 21, 2018

coveralls commented Oct 21, 2018 •

edited

Loading

scanny left a comment

scanny Oct 21, 2018

scanny Oct 21, 2018

scanny Oct 21, 2018

scanny Oct 21, 2018

scanny Oct 21, 2018

scanny Oct 21, 2018

slobodan-ilic Oct 24, 2018

scanny Oct 21, 2018

slobodan-ilic Oct 24, 2018

scanny Oct 26, 2018

slobodan-ilic Oct 24, 2018

scanny Oct 26, 2018 •

edited

Loading

slobodan-ilic Oct 24, 2018

scanny Oct 26, 2018

slobodan-ilic Oct 24, 2018

scanny Oct 26, 2018

scanny commented Oct 26, 2018


		return insertions
		return [insertion for insertion in iter_insertions()]

Refactor dimension #112

Refactor dimension #112

Conversation

scanny commented Oct 21, 2018

coveralls commented Oct 21, 2018 • edited Loading

scanny left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

scanny Oct 26, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

scanny commented Oct 26, 2018

coveralls commented Oct 21, 2018 •

edited

Loading

scanny Oct 26, 2018 •

edited

Loading