-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tstats correct for CATxMRxITSELF Cubes #186
Conversation
ernestoarbitrio
commented
Nov 28, 2019
- New tstats measure in MRxMR Matrix
- new pairwise sig test for CATxMRxITSELF 5d cubes
- new factory condition for base matrix creation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A number of changes requested.
src/cr/cube/cube.py
Outdated
@lazyproperty | ||
def is_mr_by_itself(self): | ||
""" | ||
It identify if the cube contains MRxItself as last 2 dimensions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a PEP convention for docstring layout: https://www.python.org/dev/peps/pep-0257/
The first line goes on the same line as the opening """:
def is_mr_by_itself(self):
"""True if cube contains MRxItself as last 2 dimensions.
...
"""
src/cr/cube/cube.py
Outdated
return ( | ||
True | ||
if len(set([dimension.alias for dimension in self.dimensions[-2:]])) == 1 | ||
and all(dim_type == DT.MR for dim_type in self.dimension_types[-2:]) | ||
and self.ndim >= 3 | ||
else False | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The True
, False
, if
, and else
are redundant here:
return (
# ---there are at least three dimensions---
self.ndim >= 3
# ---the last two are both MR---
and all(dim_type == DT.MR for dim_type in self.dimension_types[-2:])
# ---and they both have the same alias---
and len(set([dimension.alias for dimension in self.dimensions[-2:]])) == 1
)
Also note the use of comments to clarify your intent with otherwise somewhat complex expressions.
Also, the most general qualification (self.ndim >= 3
) is moved to the top. The other two predicates would be True on a 1D MR array because [x][-2:] == [x]
.
src/cr/cube/cubepart.py
Outdated
return self._matrix.overlaps_tstats | ||
# return self._matrix.overlaps_tstats[self._slice_idx] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't leave commented code in a commit.
src/cr/cube/cubepart.py
Outdated
@lazyproperty | ||
def cube_is_mr_by_itself(self): | ||
return self._cube.is_mr_by_itself | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Respect alphabetical ordering when adding new methods; do not locate them randomly. This is especially important with a value object like this having many external methods. Locating a method in a long list by scanning is tedious and error-prone.
src/cr/cube/cubepart.py
Outdated
@lazyproperty | ||
def cube_is_mr_by_itself(self): | ||
return False | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this property required in a Strand
or Nub
? If it is, place this default case in the base class (in alphabetical order among public instance methods).
tests/integration/test_cubepart.py
Outdated
def it_knows_cube_is_mr_by_itself(self): | ||
cube = Cube(CR.EDU_FAV5_FAV5) | ||
slice_ = cube.partitions[0] | ||
assert slice_.table_name == cube.name | ||
assert slice_.cube_is_mr_by_itself is True | ||
|
||
def it_knows_cube_is_not_mr_by_itself(self): | ||
cube = Cube(CR.AGE_FAVMR) | ||
slice_ = cube.partitions[0] | ||
assert slice_.table_name is None | ||
assert slice_.cube_is_mr_by_itself is False | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two are the same test with a different parameter. Parameterize a single test with @pytest.mark.parametrize
, grep for examples in exporter if you need them.
# def test_dummy(): | ||
# slice_ = Cube(CR.GGG).partitions[0] | ||
# import ipdb | ||
# | ||
# ipdb.set_trace() | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented lines before committing.
tests/unit/test_cube.py
Outdated
@@ -47,6 +49,34 @@ def but_it_raises_on_other_cube_response_types( | |||
|
|||
assert str(exception) == expected_value | |||
|
|||
def it_knows_if_it_is_mr_by_itself_to_help( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _to_help
suffix is used to indicate a private (helper) method/property. Since this is a public method it is not needed here.
tests/unit/test_cube.py
Outdated
@pytest.fixture( | ||
params=[ | ||
((DT.CAT, DT.MR, DT.MR), ("alias1", "alias2", "alias2"), True), | ||
((DT.CAT, DT.MR), ("alias1", "alias2"), False), | ||
((DT.CAT, DT.MR, DT.MR), ("alias1", "alias2", "alias3"), False), | ||
((DT.CAT, DT.TEXT, DT.TEXT), ("alias1", "alias2", "alias2"), False), | ||
] | ||
) | ||
def cube_dimensions_fixture(self, request): | ||
dimension_types, aliases, expected_value = request.param | ||
return dimension_types, aliases, expected_value | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Place fixtures in alphabetical order. This aids locating them when search is unavailable.
@pytest.fixture | ||
def cube_dimensions_prop_(self, request): | ||
return property_mock(request, Cube, "dimensions") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same. respect alphabetical order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, this pr is huge but it all appears to be a fixture that we actually simplified — the age-favmr-favmr is fully superseded by the smaller fav5 version. If any tests depend on the huge favmr one, please refactor to use only the fav5 verison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docstrings added.
@@ -182,6 +190,35 @@ def inserted_column_idxs(self): | |||
def inserted_row_idxs(self): | |||
return tuple(i for i, row in enumerate(self._matrix.rows) if row.is_insertion) | |||
|
|||
@lazyproperty | |||
def insertions(self): | |||
"""Returns masked array with residuals for insertions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, this makes more sense now — even compress_rowcols would leave this whole thing intact. this mask is fine, but we may want to provide a way to get just rows or just columns. 🤔
What do we serialize in second-order for inf, since json is deficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use "Infinity" string for np.inf
... in the next release we can provide a way to provide only rows, only cols or both; also we can change the mask using the compress thing of numpy
src/cr/cube/matrix.py
Outdated
|
||
# These are apparent dimensions (user dimensions). Do we need to get all the dims? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# These are apparent dimensions (user dimensions). Do we need to get all the dims? | |
# These are apparent dimensions which hide 'selections' dims behind 'MR' |
src/cr/cube/matrix.py
Outdated
|
||
# These are apparent dimensions (user dimensions). Do we need to get all the dims? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(consider it resolved)
src/cr/cube/matrix.py
Outdated
@@ -667,6 +712,10 @@ def columns(self): | |||
for counts, base_counts, element, table_margin in self._column_generator | |||
) | |||
|
|||
@lazyproperty | |||
def _mr_shadow_proportions(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def _mr_shadow_proportions(self): | |
def _mr_shadow_proportions(self): | |
""" | |
Cube containing item-wise selections, overlap, and nonoverlap | |
with all other items in a multiple response dimension, for each | |
element of any prepended dimensions: | |
A 1d interface to a 4d hypercube of underlying counts. | |
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrong place
@@ -725,6 +821,10 @@ def _column_generator(self): | |||
self.table_margin.T, | |||
) | |||
|
|||
@lazyproperty | |||
def _pairwise_overlap_total(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def _pairwise_overlap_total(self): | |
def _pairwise_overlap_total(self): | |
""" | |
Given a 4d hypercube of multiple response items, return the | |
symmetric square matrix of valid observations between all pairs. | |
n1 = 2; n2 = 2; n12 = 1; overlap total = 3 | |
""" | |
return np.sum(np.sum(self._counts, axis=1), axis=2) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstrings should be of the following PEP 257 form:
def method(self):
"""Return 2D ndarray symmetric-square matrix of valid observations.
Given a 4D hypercube of multiple-response (MR) items, this method calculates the
*symmetric square matrix* which is a fancy word for {this simple idea}.
"""
The key features are a single line to start, short enough not to wrap, that gives some gist of what to expect but should always start by mentioning the return-type, since that is the effective "type" of the function. Then, when more is called for, a separate passage, separated by a blank line that fills in any important remaining details. The key thing is to describe the contract of the method, i.e. what it gives back and what you need to give it, in that order.