Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tstats correct for CATxMRxITSELF Cubes #186

Merged
merged 6 commits into from
Dec 5, 2019

Conversation

ernestoarbitrio
Copy link
Contributor

  • New tstats measure in MRxMR Matrix
  • new pairwise sig test for CATxMRxITSELF 5d cubes
  • new factory condition for base matrix creation

@coveralls
Copy link

coveralls commented Nov 28, 2019

Coverage Status

Coverage remained the same at 100.0% when pulling 8c168c6 on correct-t-test-for-multiple-response into acbcf0a on master.

Copy link
Contributor

@scanny scanny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A number of changes requested.

@lazyproperty
def is_mr_by_itself(self):
"""
It identify if the cube contains MRxItself as last 2 dimensions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a PEP convention for docstring layout: https://www.python.org/dev/peps/pep-0257/

The first line goes on the same line as the opening """:

def is_mr_by_itself(self):
    """True if cube contains MRxItself as last 2 dimensions.

    ...
    """

Comment on lines 279 to 285
return (
True
if len(set([dimension.alias for dimension in self.dimensions[-2:]])) == 1
and all(dim_type == DT.MR for dim_type in self.dimension_types[-2:])
and self.ndim >= 3
else False
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The True, False, if, and else are redundant here:

        return (
            # ---there are at least three dimensions---
            self.ndim >= 3
            # ---the last two are both MR---
            and all(dim_type == DT.MR for dim_type in self.dimension_types[-2:])
            # ---and they both have the same alias---
            and len(set([dimension.alias for dimension in self.dimensions[-2:]])) == 1
        )

Also note the use of comments to clarify your intent with otherwise somewhat complex expressions.

Also, the most general qualification (self.ndim >= 3) is moved to the top. The other two predicates would be True on a 1D MR array because [x][-2:] == [x].

Comment on lines 211 to 212
return self._matrix.overlaps_tstats
# return self._matrix.overlaps_tstats[self._slice_idx]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't leave commented code in a commit.

Comment on lines 189 to 192
@lazyproperty
def cube_is_mr_by_itself(self):
return self._cube.is_mr_by_itself

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Respect alphabetical ordering when adding new methods; do not locate them randomly. This is especially important with a value object like this having many external methods. Locating a method in a long list by scanning is tedious and error-prone.

Comment on lines 578 to 581
@lazyproperty
def cube_is_mr_by_itself(self):
return False

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this property required in a Strand or Nub? If it is, place this default case in the base class (in alphabetical order among public instance methods).

Comment on lines 15 to 26
def it_knows_cube_is_mr_by_itself(self):
cube = Cube(CR.EDU_FAV5_FAV5)
slice_ = cube.partitions[0]
assert slice_.table_name == cube.name
assert slice_.cube_is_mr_by_itself is True

def it_knows_cube_is_not_mr_by_itself(self):
cube = Cube(CR.AGE_FAVMR)
slice_ = cube.partitions[0]
assert slice_.table_name is None
assert slice_.cube_is_mr_by_itself is False

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two are the same test with a different parameter. Parameterize a single test with @pytest.mark.parametrize, grep for examples in exporter if you need them.

Comment on lines 8 to 14
# def test_dummy():
# slice_ = Cube(CR.GGG).partitions[0]
# import ipdb
#
# ipdb.set_trace()


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented lines before committing.

@@ -47,6 +49,34 @@ def but_it_raises_on_other_cube_response_types(

assert str(exception) == expected_value

def it_knows_if_it_is_mr_by_itself_to_help(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _to_help suffix is used to indicate a private (helper) method/property. Since this is a public method it is not needed here.

Comment on lines 108 to 119
@pytest.fixture(
params=[
((DT.CAT, DT.MR, DT.MR), ("alias1", "alias2", "alias2"), True),
((DT.CAT, DT.MR), ("alias1", "alias2"), False),
((DT.CAT, DT.MR, DT.MR), ("alias1", "alias2", "alias3"), False),
((DT.CAT, DT.TEXT, DT.TEXT), ("alias1", "alias2", "alias2"), False),
]
)
def cube_dimensions_fixture(self, request):
dimension_types, aliases, expected_value = request.param
return dimension_types, aliases, expected_value

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Place fixtures in alphabetical order. This aids locating them when search is unavailable.

Comment on lines +130 to +133
@pytest.fixture
def cube_dimensions_prop_(self, request):
return property_mock(request, Cube, "dimensions")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same. respect alphabetical order.

Copy link
Contributor

@malecki malecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, this pr is huge but it all appears to be a fixture that we actually simplified — the age-favmr-favmr is fully superseded by the smaller fav5 version. If any tests depend on the huge favmr one, please refactor to use only the fav5 verison.

Copy link
Contributor

@malecki malecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docstrings added.

@@ -182,6 +190,35 @@ def inserted_column_idxs(self):
def inserted_row_idxs(self):
return tuple(i for i, row in enumerate(self._matrix.rows) if row.is_insertion)

@lazyproperty
def insertions(self):
"""Returns masked array with residuals for insertions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, this makes more sense now — even compress_rowcols would leave this whole thing intact. this mask is fine, but we may want to provide a way to get just rows or just columns. 🤔

What do we serialize in second-order for inf, since json is deficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use "Infinity" string for np.inf ... in the next release we can provide a way to provide only rows, only cols or both; also we can change the mask using the compress thing of numpy

Comment on lines 234 to 235

# These are apparent dimensions (user dimensions). Do we need to get all the dims?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# These are apparent dimensions (user dimensions). Do we need to get all the dims?
# These are apparent dimensions which hide 'selections' dims behind 'MR'

Comment on lines 234 to 235

# These are apparent dimensions (user dimensions). Do we need to get all the dims?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(consider it resolved)

@@ -667,6 +712,10 @@ def columns(self):
for counts, base_counts, element, table_margin in self._column_generator
)

@lazyproperty
def _mr_shadow_proportions(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _mr_shadow_proportions(self):
def _mr_shadow_proportions(self):
"""
Cube containing item-wise selections, overlap, and nonoverlap
with all other items in a multiple response dimension, for each
element of any prepended dimensions:
A 1d interface to a 4d hypercube of underlying counts.
"""

Copy link
Contributor

@malecki malecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong place

@@ -725,6 +821,10 @@ def _column_generator(self):
self.table_margin.T,
)

@lazyproperty
def _pairwise_overlap_total(self):
Copy link
Contributor

@malecki malecki Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _pairwise_overlap_total(self):
def _pairwise_overlap_total(self):
"""
Given a 4d hypercube of multiple response items, return the
symmetric square matrix of valid observations between all pairs.
n1 = 2; n2 = 2; n12 = 1; overlap total = 3
"""
return np.sum(np.sum(self._counts, axis=1), axis=2)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstrings should be of the following PEP 257 form:

def method(self):
    """Return 2D ndarray symmetric-square matrix of valid observations.

    Given a 4D hypercube of multiple-response (MR) items, this method calculates the
    *symmetric square matrix* which is a fancy word for {this simple idea}.
    """

The key features are a single line to start, short enough not to wrap, that gives some gist of what to expect but should always start by mentioning the return-type, since that is the effective "type" of the function. Then, when more is called for, a separate passage, separated by a blank line that fills in any important remaining details. The key thing is to describe the contract of the method, i.e. what it gives back and what you need to give it, in that order.

@ernestoarbitrio ernestoarbitrio merged commit 5f17fd2 into master Dec 5, 2019
@scanny scanny deleted the correct-t-test-for-multiple-response branch May 11, 2020 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants