Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor dimension #112

Merged
merged 8 commits into from
Oct 29, 2018
Merged

Refactor dimension #112

merged 8 commits into from
Oct 29, 2018

Conversation

scanny
Copy link
Contributor

@scanny scanny commented Oct 21, 2018

This is a fairly thorough refactor of cr.cube.dimension.

Let's discuss when we speak next.

This refactoring introduced a policy clarification in that any elements
representing missing data are excluded from a subtotal. This changed the
expectations on a couple of integration tests as they included item
8 "Skipped" from the fixture.

* Add _Subtotals class and move logic previously in Dimension.subtotals
  into the collection.

* Remove `.is_valid` flag and perform validation in _Subtotals such that
  only valid _Subtotal objects are instantiated.

* Rework _Subtotal and its unit tests to suit.
* Rename Dimension.type to .dimension_type. In general, an object should
  never have an attribute or property (and probably method either) named
  `type` or `id`. It seems like a good concise choice considering the
  broader identifier is likely to be `thing.type` or `widget.id` and
  that reads nicely. But the negatives soon outweigh, in particular that
  it collides with a built-in method or type, which causes spurious
  syntax highlighting, also its ambiguous because things can have more
  than one kind of type and more that one kind of id.

  This required a handful of changes to exporter code.

* Localize all type-discovery code in the `.dimension_type` property.
  This logic will need to move to the `Dimensions` object when its
  implemented so having it all in one place will make that move easier
  and avoids cluttering the namespace with helpers. classmethods are
  unnecessary here so removed those and improved name of
  `._next_dimension_dict` to clarify its role.

* Unify unit tests for dimension type-discovery.
Reimplement `.labels()` making best use of newly extracted element and
subtotal objects, using iterators for interleaving.
Makes good use of the new `._iter_interleaved_items()` method.
@coveralls
Copy link

coveralls commented Oct 21, 2018

Coverage Status

Coverage increased (+0.2%) to 97.165% when pulling dd32932 on refactor-dimension into e10e155 on master.

Copy link
Contributor Author

@scanny scanny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments for you here to get you started following along with what I did. We can discuss in more detail this coming week.

@@ -149,7 +149,7 @@ def as_array(self, include_missing=False, weighted=True, adjusted=False,
@lazyproperty
def ca_dim_ind(self):
for (i, dim) in enumerate(self.dimensions):
if dim.type == 'categorical_array':
if dim.dimension_type == 'categorical_array':
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed Dimension.type to .dimension_type. There are some comments in the commit going into the rationale. Short version is having a property or method named .type or .id is to be avoided. Among other benefits this lets the reader know what kind of type it is when context can't make that clear.

return dim_types.index('categorical')

def valid_indices_with_selections(self, include_missing=False):
"""Get all valid indices (including MR selections)."""
return [
dim.valid_indices(include_missing)
dim.element_indices(include_missing)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed Dimension.valid_indices() to .element_indices() since it can be used to get both all and only valid indices. This eliminates a jarring contradiction in the naming.


return insertions
return [insertion for insertion in iter_insertions()]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extracted the body of the code into the local method iter_insertions() to make this simpler to understand.

self._type = self._get_type(dim, selections)
def __init__(self, dimension_dict, next_dimension_dict=None):
self._dimension_dict = dimension_dict
self._next_dimension_dict = next_dimension_dict
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed dim to the more descriptive _dimension_dict, especially since dim is so often used as a shorthand for a Dimension object. I renamed selections to the more precise next_dimension_dict for clarity.

return description if description else ''

@lazyproperty
def dimension_type(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the refactored form of .type, ._type, _get_type(), and ._is_multiple_response(). Calling ._get_type() in the constructor represented "real work", which is why you had to mock it so often in unit tests even when it didn't have anything to do with that unit. The classmethods were unnecessary once next_dimension_dict was stored as an instance var and .dimension_type was made lazy. All this logic really belongs in a Dimensions collection object anyway, which we'll probably get to before too long.

"""_Elements object providing access to elements of this dimension."""
return (
self._all_elements if include_missing else self._valid_elements
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the big items in this PR is extracting _Elements and _Element (and their object hierarchy siblings) as full-fledged objects. This removes all the widely distributed dict manipulation that is hard to follow and very error prone. Using dicts instead of real objects for jobs like these is an example of the "Primitive Obsession" anti-pattern, and this refactoring provides a good example of how to eliminate that and the benefits in clarity, conciseness, and testability that can bring.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

appear after subtotals have been inserted at their anchor locations.
Where more than one subtotal is anchored to the same location, they
appear in their document order in the cube response. pairs for this
dimension.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This last sentence fragment is an editing error. I've fixed it in my copy and will push with any other changes before we proceed.

from the collection; this sequence represents a subset of that
provided by `._all_elements`.
"""
return self._all_elements.valid_elements
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From this expression it seems that having "elements" repeated two times is redundant. Maybe something like

self._elements.all

vs

self._elements.valid

would be more concise and/or readable.
But this is just a comment, not sure it's good/easy to change it like that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the right naming. This location is the only place we expect this expression to arise and I'm strongly inclined to keep things explicit. If we neglect to distinguish all from valid here it makes readability poor because you have to remember (or more likely go look) to determine what that simplified name means.

End users will have no direct access to the _AllElements object so they will not need to use the _AllElements.valid_elements property.


categories = self._next_dimension_dict['type'].get('categories')
if not categories:
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is untested. Is there an easy way of testing it?

Copy link
Contributor Author

@scanny scanny Oct 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this is fixed. Interestingly it takes a pretty special case of CA_SUBVAR_X_MR to trigger this particular branch.

for label in labels_with_cat_ids
if self._include_in_labels(label, valid_indices)
]
return list(zip(labels, element_ids))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (under if) is untested

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, dimension.py is 100% coverage now.

return name
Must be implemented by each subclass.
"""
raise NotImplementedError('must be implemented by each subclass')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this, although it's understandable why :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is fixed too.

* Make a few small documentation fixes.
* Bring coverage for cr.cube.dimensions up to 100%.
* Add `make coverage` to Makefile.
* Add integration test for `Dimension.dimension_type`
* Replace tautalogical unit test for `Dimension.has_mr` with one of full
  edge-case variety.
* Add unit test for `_BaseElements._elements`.
@scanny
Copy link
Contributor Author

scanny commented Oct 26, 2018

Ok, I think this is ready to go. I merged master to resolves a parallel development conflict.

We'll need make some changes in our exporter code to make this work over there. Once this is released I'll prepare a commit containing those changes along with the new version bump.

@slobodan-ilic slobodan-ilic merged commit 926e156 into master Oct 29, 2018
@scanny scanny deleted the refactor-dimension branch May 23, 2019 23:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants