zscore for headers and subtotals #182

ernestoarbitrio · 2019-10-16T18:26:17Z

First implementation of zscore for the categorical vector.
The final goal is to add zscore and pvals for insertions.

coveralls · 2019-10-16T18:30:12Z

Coverage remained the same at 100.0% when pulling ae7dee9 on residual-of-subtotals-169063630 into e1adf9d on master.

scanny · 2019-10-18T16:54:55Z

src/cr/cube/matrix.py

@@ -426,7 +426,7 @@ def columns(self):
                element,
                self.table_margin,
                zscore,
-                np.sum(self._counts, axis=1),
+                **{"opposite_margins": np.sum(self._counts, axis=1)}


This appears to be a redundant expression. Isn't opposite_margins=np.sum(self._counts, axis=1) equivalent? All the ** says is "take these out of the dict I've put them in". This kind of dict packing can be useful when the contents are unknown (e.g. received from elsewhere as a dict), but here they are well defined, and defined entirely by this method.
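A minimal sketch of the equivalence the reviewer describes, using a hypothetical make_vector function as a stand-in for the real vector constructor (its name and return value are assumptions for illustration only):

```python
def make_vector(element, table_margin, zscore, opposite_margins=None):
    """Hypothetical stand-in for the vector constructor; returns what it received."""
    return (element, table_margin, zscore, opposite_margins)


margins = [3, 7]  # stands in for np.sum(self._counts, axis=1)

# Building a one-entry dict only to unpack it again with ** ...
packed = make_vector("e", 10, None, **{"opposite_margins": margins})
# ... is the same call as naming the keyword argument directly.
direct = make_vector("e", 10, None, opposite_margins=margins)
```

Both calls bind exactly the same arguments, so the direct keyword form is simply the more readable one.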

scanny · 2019-10-18T16:55:08Z

src/cr/cube/matrix.py

@@ -441,7 +441,7 @@ def rows(self):
                self.table_margin,
                zscore,
                column_index,
-                np.sum(self._counts, axis=0),
+                **{"opposite_margins": np.sum(self._counts, axis=0)}


scanny · 2019-10-18T17:01:40Z

src/cr/cube/matrix.py

@@ -1469,14 +1451,14 @@ def __init__(
        table_margin,
        zscore=None,
        column_index=None,
-        opposite_margins=None,
+        **kwargs


Using **kwargs seems unnecessary here. What's wrong with opposite_margins=None? It accomplishes the same thing and is more explicit.

I used kwargs because I wanted to propagate opposite margins for use in _addend_vectors, like here:

crunch-cube/src/cr/cube/matrix.py

Line 994 in a135ca6

@lazyproperty

What is different about this behavior (using **kwargs) than just having opposite_margins=None in there like you did before? The x=y argument syntax makes the argument optional, so adding the kwargs layer just obscures what is a simple and common pattern of an optional parameter.

Mmm, you're totally right. I don't remember why that param was None when I defined that new property. I'll change it in the next commit.
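One concrete cost of hiding an optional parameter behind **kwargs is that typos are no longer caught. A minimal sketch with two hypothetical classes (not the project's real ones) that differ only in how they accept opposite_margins:

```python
class ExplicitVector:
    """Hypothetical vector taking opposite_margins as an explicit optional parameter."""

    def __init__(self, counts, table_margin, zscore=None, column_index=None,
                 opposite_margins=None):
        self._counts = counts
        self._opposite_margins = opposite_margins


class KwargsVector:
    """Same constructor, but with the optional parameter hidden behind **kwargs."""

    def __init__(self, counts, table_margin, zscore=None, column_index=None, **kwargs):
        self._counts = counts
        self._opposite_margins = kwargs.get("opposite_margins")


# A misspelled keyword is silently swallowed by **kwargs ...
silent = KwargsVector([1, 2], 3, oposite_margins=[4, 5])

# ... whereas the explicit signature rejects it immediately.
error = None
try:
    ExplicitVector([1, 2], 3, oposite_margins=[4, 5])
except TypeError as exc:
    error = str(exc)
```

With the explicit signature the misspelling raises a TypeError at the call site; with **kwargs the vector quietly ends up with opposite_margins set to None.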

slobodan-ilic · 2019-10-21T14:18:54Z

src/cr/cube/matrix.py

@@ -260,14 +260,17 @@ def _columns_inserted_at_right(self):
    def _iter_columns(self):
        """Generate all column vectors with insertions interleaved at right spot."""
        opposing_insertions = self._all_inserted_rows
+        column_vector_index = 0


This was the first thing I saw, and I had to think: "what is the column vector index?". Then I looked above to see the name of the method, and remembered that I originally wrote i. Maybe change it to just column_index, since the method's name (_iter_columns, not _iter_column_vectors) already implies it.

slobodan-ilic · 2019-10-21T14:19:21Z

src/cr/cube/matrix.py

@@ -282,14 +285,15 @@ def _iter_inserted_rows_anchored_at(self, anchor):
    def _iter_rows(self):
        """Generate all row vectors with insertions interleaved at right spot."""
        opposing_insertions = self._all_inserted_columns
-
+        row_vector_index = 0


same as ☝️

slobodan-ilic · 2019-10-21T14:20:05Z

src/cr/cube/matrix.py

        # ---subtotals inserted at top---
        for row in self._rows_inserted_at_top:
-            yield _AssembledVector(row, opposing_insertions)
-
+            yield _AssembledVector(row, opposing_insertions, row_vector_index)


Same. We're passing it row, not row_vector, so the index name should conform.

slobodan-ilic · 2019-10-21T14:21:51Z

src/cr/cube/matrix.py

+                element,
+                self.table_margin,
+                zscore,
+                opposite_margins=np.sum(self._counts, axis=1),


It's easy to miss the =1 here. I'd create a private property of _rows_margin, and just pass it here. Later, when we get to more complex cases with MR, this will come in handy.

slobodan-ilic · 2019-10-21T14:22:07Z

src/cr/cube/matrix.py

+                self.table_margin,
+                zscore,
+                column_index,
+                opposite_margins=np.sum(self._counts, axis=0),


same as ☝️
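A minimal sketch of the suggested margin properties. MatrixSketch is a hypothetical class, and functools.cached_property stands in for the project's @lazyproperty. Following the diffs above, the column vectors would receive self._rows_margin (axis=1) and the row vectors self._columns_margin (axis=0):

```python
from functools import cached_property

import numpy as np


class MatrixSketch:
    """Hypothetical matrix holding a 2D array of counts.

    functools.cached_property stands in for the project's @lazyproperty.
    """

    def __init__(self, counts):
        self._counts = np.asarray(counts)

    @cached_property
    def _rows_margin(self):
        # One total per row: sum across the columns (axis=1).
        return np.sum(self._counts, axis=1)

    @cached_property
    def _columns_margin(self):
        # One total per column: sum down the rows (axis=0).
        return np.sum(self._counts, axis=0)


m = MatrixSketch([[1, 2, 3], [4, 5, 6]])
```

Naming the axis once, in a property, means a call site reads opposite_margins=self._rows_margin and the easily missed =1 or =0 disappears from it.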

slobodan-ilic · 2019-10-21T14:24:11Z

src/cr/cube/matrix.py

@@ -975,11 +1029,11 @@ def pruned(self):

    @lazyproperty
    def pvals(self):
-        return np.array([np.nan] * len(self._matrix.columns))
+        return self._pvals


These are absolutely the same between _InsertionRow and _InsertionColumn. They should just be moved one level up and made public.

slobodan-ilic · 2019-10-21T14:24:49Z

src/cr/cube/matrix.py

@@ -957,6 +999,18 @@ def _addend_vectors(self):
            if i in self._subtotal.addend_idxs
        )

+    @lazyproperty


These also seem to belong to _InsertionVector rather than to the row or column class.

slobodan-ilic · 2019-10-21T14:26:12Z

src/cr/cube/matrix.py

@@ -989,6 +1043,18 @@ def _addend_vectors(self):
            if i in self._subtotal.addend_idxs
        )

+    @lazyproperty


These too, absolutely the same... should go up one level...
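A minimal sketch of moving identical properties up the hierarchy. The *Sketch class names are hypothetical stand-ins for _InsertionVector, _InsertionRow and _InsertionColumn, and a plain @property replaces @lazyproperty:

```python
class InsertionVectorSketch:
    """Hypothetical base class for inserted (subtotal) vectors.

    Properties that were identical in the row and column subclasses live here once.
    """

    def __init__(self, pvals):
        self._pvals = pvals

    @property
    def pvals(self):
        return self._pvals


class InsertionRowSketch(InsertionVectorSketch):
    """Only row-specific behavior would remain in this subclass."""


class InsertionColumnSketch(InsertionVectorSketch):
    """Only column-specific behavior would remain in this subclass."""
```

Both subclasses inherit pvals, so there is a single definition to maintain.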

slobodan-ilic · 2019-10-21T14:37:05Z

src/cr/cube/matrix.py

@@ -1027,6 +1093,10 @@ def means(self):
    def numeric(self):
        return self._base_vector.numeric

+    @lazyproperty
+    def opposite_margins(self):
+        return self._base_vector._opposite_margins


These should be made public on the base vector class... we shouldn't access private properties.
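A minimal sketch of the suggested change, with hypothetical classes: the base vector exposes opposite_margins as a public property, and the wrapping vector delegates through it instead of reaching into _opposite_margins:

```python
class BaseVectorSketch:
    """Hypothetical base vector that exposes opposite_margins publicly."""

    def __init__(self, opposite_margins=None):
        self._opposite_margins = opposite_margins

    @property
    def opposite_margins(self):
        # Public accessor: collaborators read this instead of the private attribute.
        return self._opposite_margins


class AssembledVectorSketch:
    """Hypothetical wrapper that delegates to its base vector."""

    def __init__(self, base_vector):
        self._base_vector = base_vector

    @property
    def opposite_margins(self):
        # Goes through the public interface, not base_vector._opposite_margins.
        return self._base_vector.opposite_margins


v = AssembledVectorSketch(BaseVectorSketch([5, 7, 9]))
```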

slobodan-ilic · 2019-10-21T14:47:46Z

src/cr/cube/matrix.py

@@ -1424,7 +1529,25 @@ def values(self):

    @lazyproperty
    def zscore(self):
-        return self._zscore
+        variance = (


These also look exactly the same as the above calculations... But I'm not sure if they can be moved up the hierarchical structure. If they can, please move them. If not, this is fine too.
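The full variance expression isn't visible in this hunk. As a rough sketch, assuming the vector z-score is the adjusted standardized residual (a common definition for contingency tables), a subtotal vector can be treated as one merged category: its counts and margin are the sums over its addends, and opposite_margins supplies the totals on the other axis. vector_zscore and two_sided_pvals are hypothetical helpers, not the project's code:

```python
import math

import numpy as np


def vector_zscore(counts, margin, opposite_margins, table_margin):
    """Adjusted standardized residuals for one row (or column) of a 2D table."""
    counts = np.asarray(counts, dtype=float)
    opposite_margins = np.asarray(opposite_margins, dtype=float)
    # Expected counts under independence.
    expected = margin * opposite_margins / table_margin
    variance = (
        margin
        * opposite_margins
        * (table_margin - margin)
        * (table_margin - opposite_margins)
        / table_margin**3
    )
    return (counts - expected) / np.sqrt(variance)


def two_sided_pvals(zscores):
    """Two-sided p-values from standard-normal z-scores: erfc(|z| / sqrt(2))."""
    return np.array([math.erfc(abs(z) / math.sqrt(2)) for z in zscores])


counts = np.array([[10, 20], [30, 40], [50, 60]])
# Subtotal row: sum of the addend rows (here rows 0 and 1).
subtotal_counts = counts[0] + counts[1]
z = vector_zscore(
    subtotal_counts,
    margin=subtotal_counts.sum(),
    opposite_margins=counts.sum(axis=0),
    table_margin=counts.sum(),
)
p = two_sided_pvals(z)
```

If row and column vectors both call a helper like this with their own margin and opposite margins, the identical calculations could live in one place up the hierarchy.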

slobodan-ilic · 2019-10-21T14:51:13Z

tests/integration/test_headers_and_subtotals.py

+        np.testing.assert_almost_equal(
+            slice_.pvals,
+            [
+                [0.00000000e00, np.nan, 0.00000000e00, np.nan, 0.00000000e00],


This seems unnecessary :) Can you try writing it with plain zeros? Maybe even use reshape, so you only focus on the important entries, like so:

np.array([0, np.nan, 0, np.nan, 0] * 10).reshape((10, 5))

or something similar...
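A short sketch of building that expected array. The 10x5 shape comes from the suggestion above; np.tile is shown as an alternative way to get the same array:

```python
import numpy as np

# One row of the expected pattern: only the entries that matter are spelled out.
row_pattern = [0, np.nan, 0, np.nan, 0]

# Repeat the row 10 times and fold the flat list back into a 10x5 grid.
expected = np.array(row_pattern * 10).reshape((10, 5))

# np.tile builds the same array without the list multiplication.
tiled = np.tile(row_pattern, (10, 1))

# assert_almost_equal treats NaNs in matching positions as equal.
np.testing.assert_almost_equal(tiled, expected)
```

In the test itself, the comparison would then read np.testing.assert_almost_equal(slice_.pvals, expected).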

slobodan-ilic · 2019-10-21T14:52:21Z

tests/unit/legacy/test_cube_slice.py

@@ -8,9 +8,10 @@

 from cr.cube.crunch_cube import CrunchCube
 from cr.cube.cube_slice import CubeSlice
-from cr.cube.enum import DIMENSION_TYPE as DT
 from cr.cube.dimension import Dimension



why empty line here?

first implementation of zscore for categorical vector

a699ad7

ernestoarbitrio added 3 commits October 17, 2019 17:03

zscor calc attempt for inserted vectors

182877c

ipdb typo

9939174

zscore for row insertions, new tests expectations

a135ca6

scanny reviewed Oct 18, 2019


New tests for zscore on insertions col and rows, refactoring code

62c7acb

ernestoarbitrio changed the title ~~zscore for categorical vector~~ zscore for headers and subtotals Oct 19, 2019

ernestoarbitrio added 2 commits October 19, 2019 08:46

New tests for zscore coverage cases

781e038

pvals measures for subtotals

b0c3040

slobodan-ilic reviewed Oct 21, 2019


ernestoarbitrio added 3 commits October 21, 2019 19:51

code refactoring accorging to PR comments, zscore hack for MR vecotrs

536a6fb

code refactoring, test for MR cat arrays

cd12999

Hack for MR fixed, test refactoring

c8901f0

slobodan-ilic approved these changes Oct 22, 2019


pragma no cover removed

ae7dee9

slobodan-ilic merged commit a8631a8 into master Oct 22, 2019

ernestoarbitrio deleted the residual-of-subtotals-169063630 branch June 3, 2020 12:36
