-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zscore for headers and subtotals #182
Conversation
src/cr/cube/matrix.py
Outdated
@@ -426,7 +426,7 @@ def columns(self): | |||
element, | |||
self.table_margin, | |||
zscore, | |||
np.sum(self._counts, axis=1), | |||
**{"opposite_margins": np.sum(self._counts, axis=1)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to be a redundant expression. Isn't opposite_margins=np.sum(self._counts, axis=1),
equivalent? All the **
says is "take these out of the dict I've put them in". This dict packing can be useful when the contents are unknown (like received from elsewhere as a dict), but in this case they are well defined and defined entirely by this method.
src/cr/cube/matrix.py
Outdated
@@ -441,7 +441,7 @@ def rows(self): | |||
self.table_margin, | |||
zscore, | |||
column_index, | |||
np.sum(self._counts, axis=0), | |||
**{"opposite_margins": np.sum(self._counts, axis=0)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
src/cr/cube/matrix.py
Outdated
@@ -1469,14 +1451,14 @@ def __init__( | |||
table_margin, | |||
zscore=None, | |||
column_index=None, | |||
opposite_margins=None, | |||
**kwargs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using **kwargs
seems unnecessary here. What's wrong with opposite_margins=None
? It accomplishes the same thing and is more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used kwargs
because I would propagate opposit margins to be used in _addend_vectors
like here
crunch-cube/src/cr/cube/matrix.py
Line 994 in a135ca6
@lazyproperty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is different about this behavior (using **kwargs
) than just having opposite_margins=None
in there like you did before? The x=y
argument syntax makes the argument optional, so adding the kwargs
layer just obscures what is a simple and common pattern of an optional parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmmm U totally right, I don't remenber why that param was None
when I define that new property. I'll change it in the next commit
src/cr/cube/matrix.py
Outdated
@@ -260,14 +260,17 @@ def _columns_inserted_at_right(self): | |||
def _iter_columns(self): | |||
"""Generate all column vectors with insertions interleaved at right spot.""" | |||
opposing_insertions = self._all_inserted_rows | |||
column_vector_index = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the first thing I saw, and I had to think: "what is the column vector index?". Then I looked above to see what the name of the method was, and then I remembered that I originally wrote i
. Maybe change it to just column_index
, since the name of the method implies that (and not _iter_column_vectors
).
src/cr/cube/matrix.py
Outdated
@@ -282,14 +285,15 @@ def _iter_inserted_rows_anchored_at(self, anchor): | |||
def _iter_rows(self): | |||
"""Generate all row vectors with insertions interleaved at right spot.""" | |||
opposing_insertions = self._all_inserted_columns | |||
|
|||
row_vector_index = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as ☝️
src/cr/cube/matrix.py
Outdated
# ---subtotals inserted at top--- | ||
for row in self._rows_inserted_at_top: | ||
yield _AssembledVector(row, opposing_insertions) | ||
|
||
yield _AssembledVector(row, opposing_insertions, row_vector_index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same. we're passing it row
, not the row_vector
. the index should conform
src/cr/cube/matrix.py
Outdated
element, | ||
self.table_margin, | ||
zscore, | ||
opposite_margins=np.sum(self._counts, axis=1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's easy to miss the =1
here. I'd create a private property of _rows_margin
, and just pass it here. Later, when we get to more complex cases with MR, this will come in handy.
src/cr/cube/matrix.py
Outdated
self.table_margin, | ||
zscore, | ||
column_index, | ||
opposite_margins=np.sum(self._counts, axis=0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as ☝️
src/cr/cube/matrix.py
Outdated
@@ -975,11 +1029,11 @@ def pruned(self): | |||
|
|||
@lazyproperty | |||
def pvals(self): | |||
return np.array([np.nan] * len(self._matrix.columns)) | |||
return self._pvals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are absolutely the same between _InsertionRow
and _InsertionColumn
. They should just be moved one level up and public.
@@ -957,6 +999,18 @@ def _addend_vectors(self): | |||
if i in self._subtotal.addend_idxs | |||
) | |||
|
|||
@lazyproperty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These also seem as if they belong to the _InsertionVector
and not row or column
@@ -989,6 +1043,18 @@ def _addend_vectors(self): | |||
if i in self._subtotal.addend_idxs | |||
) | |||
|
|||
@lazyproperty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These too, absolutely the same... should go up one level...
src/cr/cube/matrix.py
Outdated
@@ -1027,6 +1093,10 @@ def means(self): | |||
def numeric(self): | |||
return self._base_vector.numeric | |||
|
|||
@lazyproperty | |||
def opposite_margins(self): | |||
return self._base_vector._opposite_margins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be made public on the base vector class... we shouldn't access private properties.
src/cr/cube/matrix.py
Outdated
@@ -1424,7 +1529,25 @@ def values(self): | |||
|
|||
@lazyproperty | |||
def zscore(self): | |||
return self._zscore | |||
variance = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These also look exactly the same as the above calculations... But I'm not sure if they can be moved up the hierarchical structure. If they can, please move them. If not, this is fine too.
np.testing.assert_almost_equal( | ||
slice_.pvals, | ||
[ | ||
[0.00000000e00, np.nan, 0.00000000e00, np.nan, 0.00000000e00], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unnecessary :) Can you try and make it with just normal zeros? Maybe even use the reshape, so you only focus on important entries, like so:
np.array([0, np.nan, 0, np.nan, 0] * 10).reshape((10, 5))
or something similar...
tests/unit/legacy/test_cube_slice.py
Outdated
@@ -8,9 +8,10 @@ | |||
|
|||
from cr.cube.crunch_cube import CrunchCube | |||
from cr.cube.cube_slice import CubeSlice | |||
from cr.cube.enum import DIMENSION_TYPE as DT | |||
from cr.cube.dimension import Dimension | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why empty line here?
first implementation of zscore for categorical vector
final goal is to add zscore and pvals for insertions