-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor dimension #112
Refactor dimension #112
Conversation
This refactoring introduced a policy clarification in that any elements representing missing data are excluded from a subtotal. This changed the expectations on a couple of integration tests as they included item 8 "Skipped" from the fixture. * Add _Subtotals class and move logic previously in Dimension.subtotals into the collection. * Remove `.is_valid` flag and perform validation in _Subtotals such that only valid _Subtotal objects are instantiated. * Rework _Subtotal and its unit tests to suit.
* Rename Dimension.type to .dimension_type. In general, an object should never have an attribute or property (and probably method either) named `type` or `id`. It seems like a good concise choice considering the broader identifier is likely to be `thing.type` or `widget.id` and that reads nicely. But the negatives soon outweigh, in particular that it collides with a built-in method or type, which causes spurious syntax highlighting, also its ambiguous because things can have more than one kind of type and more that one kind of id. This required a handful of changes to exporter code. * Localize all type-discovery code in the `.dimension_type` property. This logic will need to move to the `Dimensions` object when its implemented so having it all in one place will make that move easier and avoids cluttering the namespace with helpers. classmethods are unnecessary here so removed those and improved name of `._next_dimension_dict` to clarify its role. * Unify unit tests for dimension type-discovery.
Reimplement `.labels()` making best use of newly extracted element and subtotal objects, using iterators for interleaving.
Makes good use of the new `._iter_interleaved_items()` method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few comments for you here to get you started following along with what I did. We can discuss in more detail this coming week.
@@ -149,7 +149,7 @@ def as_array(self, include_missing=False, weighted=True, adjusted=False, | |||
@lazyproperty | |||
def ca_dim_ind(self): | |||
for (i, dim) in enumerate(self.dimensions): | |||
if dim.type == 'categorical_array': | |||
if dim.dimension_type == 'categorical_array': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed Dimension.type
to .dimension_type
. There are some comments in the commit going into the rationale. Short version is having a property or method named .type
or .id
is to be avoided. Among other benefits this lets the reader know what kind of type it is when context can't make that clear.
return dim_types.index('categorical') | ||
|
||
def valid_indices_with_selections(self, include_missing=False): | ||
"""Get all valid indices (including MR selections).""" | ||
return [ | ||
dim.valid_indices(include_missing) | ||
dim.element_indices(include_missing) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed Dimension.valid_indices()
to .element_indices()
since it can be used to get both all and only valid indices. This eliminates a jarring contradiction in the naming.
|
||
return insertions | ||
return [insertion for insertion in iter_insertions()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I extracted the body of the code into the local method iter_insertions()
to make this simpler to understand.
self._type = self._get_type(dim, selections) | ||
def __init__(self, dimension_dict, next_dimension_dict=None): | ||
self._dimension_dict = dimension_dict | ||
self._next_dimension_dict = next_dimension_dict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed dim
to the more descriptive _dimension_dict
, especially since dim
is so often used as a shorthand for a Dimension
object. I renamed selections
to the more precise next_dimension_dict
for clarity.
return description if description else '' | ||
|
||
@lazyproperty | ||
def dimension_type(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the refactored form of .type
, ._type
, _get_type()
, and ._is_multiple_response()
. Calling ._get_type()
in the constructor represented "real work", which is why you had to mock it so often in unit tests even when it didn't have anything to do with that unit. The classmethods were unnecessary once next_dimension_dict
was stored as an instance var and .dimension_type
was made lazy. All this logic really belongs in a Dimensions
collection object anyway, which we'll probably get to before too long.
"""_Elements object providing access to elements of this dimension.""" | ||
return ( | ||
self._all_elements if include_missing else self._valid_elements | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the big items in this PR is extracting _Elements
and _Element
(and their object hierarchy siblings) as full-fledged objects. This removes all the widely distributed dict manipulation that is hard to follow and very error prone. Using dicts instead of real objects for jobs like these is an example of the "Primitive Obsession" anti-pattern, and this refactoring provides a good example of how to eliminate that and the benefits in clarity, conciseness, and testability that can bring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
src/cr/cube/dimension.py
Outdated
appear after subtotals have been inserted at their anchor locations. | ||
Where more than one subtotal is anchored to the same location, they | ||
appear in their document order in the cube response. pairs for this | ||
dimension. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This last sentence fragment is an editing error. I've fixed it in my copy and will push with any other changes before we proceed.
from the collection; this sequence represents a subset of that | ||
provided by `._all_elements`. | ||
""" | ||
return self._all_elements.valid_elements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From this expression it seems that having "elements" repeated two times is redundant. Maybe something like
self._elements.all
vs
self._elements.valid
would be more concise and/or readable.
But this is just a comment, not sure it's good/easy to change it like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the right naming. This location is the only place we expect this expression to arise and I'm strongly inclined to keep things explicit. If we neglect to distinguish all from valid here it makes readability poor because you have to remember (or more likely go look) to determine what that simplified name means.
End users will have no direct access to the _AllElements object so they will not need to use the _AllElements.valid_elements
property.
|
||
categories = self._next_dimension_dict['type'].get('categories') | ||
if not categories: | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is untested. Is there an easy way of testing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this is fixed. Interestingly it takes a pretty special case of CA_SUBVAR_X_MR to trigger this particular branch.
for label in labels_with_cat_ids | ||
if self._include_in_labels(label, valid_indices) | ||
] | ||
return list(zip(labels, element_ids)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This (under if) is untested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, dimension.py is 100% coverage now.
return name | ||
Must be implemented by each subclass. | ||
""" | ||
raise NotImplementedError('must be implemented by each subclass') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this, although it's understandable why :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is fixed too.
* Make a few small documentation fixes. * Bring coverage for cr.cube.dimensions up to 100%. * Add `make coverage` to Makefile. * Add integration test for `Dimension.dimension_type` * Replace tautalogical unit test for `Dimension.has_mr` with one of full edge-case variety. * Add unit test for `_BaseElements._elements`.
Ok, I think this is ready to go. I merged master to resolves a parallel development conflict. We'll need make some changes in our exporter code to make this work over there. Once this is released I'll prepare a commit containing those changes along with the new version bump. |
This is a fairly thorough refactor of cr.cube.dimension.
Let's discuss when we speak next.