Matrix refactoring #183

scanny · 2019-10-22T16:48:54Z

Some refactorings to trim-down and clarify transform-application matrix objects.

Moving OrderedMatrix to the bottom of the three places them in the order in which each is depended on by the one above it. This prepares the way for letting each take responsibility for its own dependencies rather than having that centralized in TransformedMatrix.

Let _MatrixWithHidden create its own dependencies.

For better clarity, but also to prepare for what comes next.

After prior refactorings, TransformedMatrix became a straight pass-through to _MatrixWithHidden and its methods could be replaced directly with those from _MatrixWithHidden and the latter class removed.

Since everything now has access to unordered_matrix, there's no reason to go to the start of the pipeline to get those values.

I'm on the fence about this one as the +/- is so close, but thought I'd leave it in as a worthwhile extraction.

As a precursor to reducing vector interleaving to a sort, provide interleaved vectors with an `.ordering` property. The (position, idx, vector) value returned can be sorted to produce vectors in interleaved order.
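A minimal sketch of how a sortable `(position, idx, vector)` tuple can reduce interleaving to a sort. `SketchVector`, `interleave()`, and the specific position and idx values are assumptions made up for illustration; they are not the classes or values used in `matrix.py`. The sketch also assumes each `(position, idx)` pair is unique, so the sort never has to compare the vectors themselves.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchVector:
    """Hypothetical stand-in for a base or inserted vector."""

    label: str
    position: int  # where the vector belongs in the interleaved sequence
    idx: int  # tie-breaker; assumed negative (end-based) for insertions

    @property
    def ordering(self):
        # Sortable key; the vector rides along as the payload.
        return (self.position, self.idx, self)


def interleave(base, inserted):
    """Return base and inserted vectors in interleaved order."""
    return [
        vector
        for _, _, vector in sorted(
            itertools.chain(
                (v.ordering for v in inserted),
                (v.ordering for v in base),
            )
        )
    ]


# Three base rows; each uses its own index as both position and idx.
base_rows = [SketchVector("A", 0, 0), SketchVector("B", 1, 1), SketchVector("C", 2, 2)]
# Two insertions: one assumed to land after "A", one after "C".
inserted_rows = [SketchVector("S1", 1, -2), SketchVector("S2", 3, -1)]

interleaved_labels = [v.label for v in interleave(base_rows, inserted_rows)]
```

Because the negative idx sorts below any base idx, an insertion that shares a position with a base vector lands just before it, which is why "S1" is given position 1 to appear after "A".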

.iter_rows and ._iter_columns can now be easily subsumed into .rows and .columns respectively. Rename ._all_inserted_rows/columns to just ._inserted_rows/columns

Give _BaseMatrixInsertionVector what it needs on construction to avoid circular dependence on the matrix that constructed them.
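As a rough illustration of the idea (not the actual `_BaseMatrixInsertionVector` signature), an insertion vector can be handed the vectors it depends on when it is constructed, so it never needs a reference back to the matrix. The names `SketchInsertionVector`, `SketchMatrix`, `addend_vectors`, and `subtotal_row()` are hypothetical.

```python
class SketchInsertionVector:
    """Hypothetical insertion (subtotal) vector with no matrix reference."""

    def __init__(self, label, addend_vectors):
        self._label = label
        # Everything needed to compute values is supplied up front.
        self._addend_vectors = tuple(tuple(v) for v in addend_vectors)

    @property
    def values(self):
        # Cell-wise sum of the addend vectors.
        return [sum(cells) for cells in zip(*self._addend_vectors)]


class SketchMatrix:
    """Hypothetical matrix that builds insertion vectors from its base rows."""

    def __init__(self, base_rows):
        self._base_rows = base_rows

    def subtotal_row(self, label, addend_idxs):
        # The matrix passes the rows themselves, not `self`, so the
        # dependency runs one way only: matrix -> vector.
        addends = [self._base_rows[i] for i in addend_idxs]
        return SketchInsertionVector(label, addends)


matrix = SketchMatrix([[1, 2], [3, 4], [5, 6]])
row = matrix.subtotal_row("A+C", [0, 2])
```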

Collapse the few remaining distinct properties of _OrderedMatrix directly into _MatrixWithInsertions.

Merge remaining properties of _MatrixWithInsertions directly into TransformedMatrix. Merge subclass properties into TransformedMatrix since there's only the one subclass now. Rename ._rows_ind and ._cols_ind to the more descriptive ._visible_rows_mask and ._visible_cols_mask.

These two methods extract the interleaving boilerplate that will be reused for each _AssembledVector measure.

It turns out the `fbase()` function is the same in all cases. Move that logic into `._apply_interleaved()` and simplify callers.

scanny · 2019-10-27T18:29:06Z

Hi @slobodan-ilic I think this one is ready to go. It gets the same seven test failures on exporter that master does, so I expect those are unresolved integrations for the new pval work Ernesto has done.

Anyway, let me know what you think. I was able to trim out a lot of code and I think clarify things along the way, but I'll let you be the judge of that :)

scanny · 2019-10-27T18:48:01Z

src/cr/cube/matrix.py

+            _AssembledVector(row, opposing_insertions, 0 if idx < 0 else idx)
+            for _, idx, row in sorted(
+                itertools.chain(
+                    (row.ordering for row in self._inserted_rows),
+                    (row.ordering for row in self._base_rows),
+                )
+            )


@slobodan-ilic @ernestoarbitrio It smells to me like a bug that an assembled row that is an insertion row relies on its idx value being 0. Why would the "intersection" pval always be the first value of .pvals on the intersecting column? I suppose if that .pvals were always a sequence of exactly one value that would make sense of it, but it sounds a little odd to me and at the very least is obscure without a comment.

This is probably indicative of needing two extractions (either as classes or as mechanisms within one class). The assembled row that is not an insertion row itself needs this index. The assembled row that is an insertion row doesn't need it (but currently has it defaulting to 0).

Well, the interesting thing is that at least one test fails (its expectation doesn't match) if you replace `0 if idx < 0 else idx` with `idx`. `idx` always has a legitimate value; it's just end-based (like -2) and indexes into the insertions collection if the vector is an insertion.

I took a quick look, and it appears this idx value is directly involved in the pvals calculation for an insertion, which just seems odd to me. I can't think of what calculation would need a col_idx but work correctly with the same value (0) for all insertions.
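To make the concern concrete, here is a small sketch of what the clamp in the diff does to end-based indexes. The `insertions` list and the `clamp_idx()` helper are hypothetical; only the expression `0 if idx < 0 else idx` comes from the diff.

```python
def clamp_idx(idx):
    """Mirror the diff's `0 if idx < 0 else idx` expression."""
    return 0 if idx < 0 else idx


# Hypothetical insertions collection; the names are illustrative only.
insertions = ["subtotal-a", "subtotal-b", "subtotal-c"]

# End-based indexes for the three insertions, counted from the end.
neg_idxs = [i - len(insertions) for i in range(len(insertions))]

# A negative idx is a legitimate index into the insertions collection ...
looked_up = [insertions[i] for i in neg_idxs]

# ... but after clamping, every insertion ends up with the same value, 0,
# while base (non-negative) indexes pass through unchanged.
clamped = [clamp_idx(i) for i in neg_idxs]
base_clamped = [clamp_idx(i) for i in range(3)]
```

So whatever downstream calculation receives this value cannot tell one insertion from another by it, which is the part that looks odd.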

@ernestoarbitrio can you weigh in on this one? ^^^

scanny added 19 commits October 26, 2019 20:35

rfctr: delegate dependencies to _MatrixWithHidden

2f4d5f7

Let _MatrixWithHidden create its own dependencies.

rfctr: delegate dependencies to _MatrixWithInsertions

24afab0

rfctr: normalize internal matrix naming

f28336c

For better clarity, but also to prepare for what comes next.

rfctr: collapse _MatrixWithHidden

a16d466

After prior refactorings, TransformedMatrix became a straight pass-through to _MatrixWithHidden and its methods could be replaced directly with those from _MatrixWithHidden and the latter class removed.

rfctr: collapse unrequired delegations

dda4a50

Since everything now has access to unordered_matrix, there's no reason to go to the start of the pipeline to get those values.

rfctr: extract base class

74c5d78

I'm on the fence about this one as the +/- is so close, but thought I'd leave it in as a worthwhile extraction.

rfctr: add .ordering to interleaved vectors

4ce6905

As a precursor to reducing vector interleaving to a sort, provide interleaved vectors with an `.ordering` property. The (position, idx, vector) value returned can be sorted to produce vectors in interleaved order.

rfctr: interleave vectors using .ordering

e301873

.iter_rows and ._iter_columns can now be easily subsumed into .rows and .columns respectively. Rename ._all_inserted_rows/columns to just ._inserted_rows/columns

rfctr: break InsertionVector dependence on matrix

61f9c56

Give _BaseMatrixInsertionVector what it needs on construction to avoid circular dependence on the matrix that constructed them.

rfctr: factor out _OrderedMatrix

4ec3e1c

Collapse the few remaining distinct properties of _OrderedMatrix directly into _MatrixWithInsertions.

rfctr: add vector "cell" interleave machinery

742abf8

These two methods extract the interleaving boilerplate that will be reused for each _AssembledVector measure.

rfctr: interleave base values

ae8bcc8

rfctr: interleave column_index values

834a10c

rfctr: interleave means values

a61d846

rfctr: interleave pvals values

39cbd8a

rfctr: interleave assembled-vector values

e2c03a9

rfctr: interleave zscore values

d5040ce

scanny force-pushed the matrix-refactoring branch from a5deaa6 to d5040ce Compare October 27, 2019 07:13

rfctr: factor out fbase() function from assembly

54c118c

It turns out the `fbase()` function is the same in all cases. Move that logic into `._apply_interleaved()` and simplify callers.

scanny requested a review from slobodan-ilic October 27, 2019 18:26

scanny commented Oct 27, 2019

slobodan-ilic approved these changes Oct 30, 2019

scanny merged commit 54c118c into master Oct 31, 2019

scanny added a commit that referenced this pull request Oct 31, 2019

Merge pull request #183 branch 'matrix-refactoring'

56594b3

ernestoarbitrio deleted the matrix-refactoring branch August 31, 2020 07:09
