Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zscore for headers and subtotals #182

Merged
merged 11 commits into from
Oct 22, 2019
Merged

Conversation

ernestoarbitrio
Copy link
Contributor

first implementation of zscore for categorical vector
final goal is to add zscore and pvals for insertions

@coveralls
Copy link

coveralls commented Oct 16, 2019

Coverage Status

Coverage remained the same at 100.0% when pulling ae7dee9 on residual-of-subtotals-169063630 into e1adf9d on master.

@@ -426,7 +426,7 @@ def columns(self):
element,
self.table_margin,
zscore,
np.sum(self._counts, axis=1),
**{"opposite_margins": np.sum(self._counts, axis=1)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be a redundant expression. Isn't opposite_margins=np.sum(self._counts, axis=1), equivalent? All the ** says is "take these out of the dict I've put them in". This dict packing can be useful when the contents are unknown (like received from elsewhere as a dict), but in this case they are well defined and defined entirely by this method.

@@ -441,7 +441,7 @@ def rows(self):
self.table_margin,
zscore,
column_index,
np.sum(self._counts, axis=0),
**{"opposite_margins": np.sum(self._counts, axis=0)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@@ -1469,14 +1451,14 @@ def __init__(
table_margin,
zscore=None,
column_index=None,
opposite_margins=None,
**kwargs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using **kwargs seems unnecessary here. What's wrong with opposite_margins=None? It accomplishes the same thing and is more explicit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used kwargs because I would propagate opposit margins to be used in _addend_vectors like here

@lazyproperty

Copy link
Contributor

@scanny scanny Oct 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is different about this behavior (using **kwargs) than just having opposite_margins=None in there like you did before? The x=y argument syntax makes the argument optional, so adding the kwargs layer just obscures what is a simple and common pattern of an optional parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmmm U totally right, I don't remenber why that param was None when I define that new property. I'll change it in the next commit

@ernestoarbitrio ernestoarbitrio changed the title zscore for categorical vector zscore for headers and subtotals Oct 19, 2019
@@ -260,14 +260,17 @@ def _columns_inserted_at_right(self):
def _iter_columns(self):
"""Generate all column vectors with insertions interleaved at right spot."""
opposing_insertions = self._all_inserted_rows
column_vector_index = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the first thing I saw, and I had to think: "what is the column vector index?". Then I looked above to see what the name of the method was, and then I remembered that I originally wrote i. Maybe change it to just column_index, since the name of the method implies that (and not _iter_column_vectors).

@@ -282,14 +285,15 @@ def _iter_inserted_rows_anchored_at(self, anchor):
def _iter_rows(self):
"""Generate all row vectors with insertions interleaved at right spot."""
opposing_insertions = self._all_inserted_columns

row_vector_index = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as ☝️

# ---subtotals inserted at top---
for row in self._rows_inserted_at_top:
yield _AssembledVector(row, opposing_insertions)

yield _AssembledVector(row, opposing_insertions, row_vector_index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same. we're passing it row, not the row_vector. the index should conform

element,
self.table_margin,
zscore,
opposite_margins=np.sum(self._counts, axis=1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's easy to miss the =1 here. I'd create a private property of _rows_margin, and just pass it here. Later, when we get to more complex cases with MR, this will come in handy.

self.table_margin,
zscore,
column_index,
opposite_margins=np.sum(self._counts, axis=0),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as ☝️

@@ -975,11 +1029,11 @@ def pruned(self):

@lazyproperty
def pvals(self):
return np.array([np.nan] * len(self._matrix.columns))
return self._pvals
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are absolutely the same between _InsertionRow and _InsertionColumn. They should just be moved one level up and public.

@@ -957,6 +999,18 @@ def _addend_vectors(self):
if i in self._subtotal.addend_idxs
)

@lazyproperty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These also seem as if they belong to the _InsertionVector and not row or column

@@ -989,6 +1043,18 @@ def _addend_vectors(self):
if i in self._subtotal.addend_idxs
)

@lazyproperty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These too, absolutely the same... should go up one level...

@@ -1027,6 +1093,10 @@ def means(self):
def numeric(self):
return self._base_vector.numeric

@lazyproperty
def opposite_margins(self):
return self._base_vector._opposite_margins
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be made public on the base vector class... we shouldn't access private properties.

@@ -1424,7 +1529,25 @@ def values(self):

@lazyproperty
def zscore(self):
return self._zscore
variance = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These also look exactly the same as the above calculations... But I'm not sure if they can be moved up the hierarchical structure. If they can, please move them. If not, this is fine too.

np.testing.assert_almost_equal(
slice_.pvals,
[
[0.00000000e00, np.nan, 0.00000000e00, np.nan, 0.00000000e00],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unnecessary :) Can you try and make it with just normal zeros? Maybe even use the reshape, so you only focus on important entries, like so:

np.array([0, np.nan, 0, np.nan, 0] * 10).reshape((10, 5))

or something similar...

@@ -8,9 +8,10 @@

from cr.cube.crunch_cube import CrunchCube
from cr.cube.cube_slice import CubeSlice
from cr.cube.enum import DIMENSION_TYPE as DT
from cr.cube.dimension import Dimension

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why empty line here?

@slobodan-ilic slobodan-ilic merged commit a8631a8 into master Oct 22, 2019
@ernestoarbitrio ernestoarbitrio deleted the residual-of-subtotals-169063630 branch June 3, 2020 12:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants