Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Matrix refactoring #183

Merged
merged 20 commits into from
Oct 31, 2019
Merged

Matrix refactoring #183

merged 20 commits into from
Oct 31, 2019

Conversation

scanny
Copy link
Contributor

@scanny scanny commented Oct 22, 2019

Some refactorings to trim-down and clarify transform-application matrix objects.

This moving OrderedMatrix to the bottom of the three places them in the
order they are depended on by the one above. This prepares the way for
letting each take responsibility for their own dependencies rather than
having that centralized in TransformedMatrix.
Let _MatrixWithHidden create its own dependencies.
For better clarity, but also to prepare for what comes next.
After prior refactorings, TransformedMatrix became a straight
pass-through to _MatrixWithHidden and its methods could be replaced
directly with those from _MatrixWithHidden and the latter class removed.
Since everything now has access to unordered_matrix, there's no reason
to go to the start of the pipeline to get those values.
I'm on the fence about this one as the +/- is so close, but thought I'd
leave it in as a worthwhile extraction.
As a precursor to reducing vector interleaving to a sort, provide
interleaved vectors with an `.ordering` property.

The (position, idx, vector) value returned can be sorted to produce
vectors in interleaved order.
.iter_rows and ._iter_columns can now be easily subsumed into .rows and
.columns respectively.

Rename ._all_inserted_rows/columns to just ._inserted_rows/columns
Give _BaseMatrixInsertionVector what it needs on construction to avoid
circular dependence on the matrix that constructed them.
Collapse few remaining distinct properties of _OrderedMatrix directly
into _MatrixWithInsertions.
Merge remaining properties of _MatrixWithInsertions directly into
TransformedMatrix.

Merge subclass properties into TransformedMatrix since there's only the
one subclass now.

Rename ._rows_ind and ._cols_ind to the more descriptive
._visible_rows_mask and ._visible_cols_mask.
These two methods extract the interleaving boilerplate that will be
reused for each _AssembledVector measure.
It turns out the `fbase()` function is the same in all cases. Move that
logic into `._apply_interleaved()` and simplify callers.
@scanny
Copy link
Contributor Author

scanny commented Oct 27, 2019

Hi @slobodan-ilic I think this one is ready to go. It gets the same seven test failures on exporter that master does, so I expect those are unresolved integrations for the new pval work Ernesto has done.

Anyway, let me know what you think. I was able to trim out a lot of code and I think clarify things along the way, but I'll let you be the judge of that :)

Comment on lines +95 to +101
_AssembledVector(row, opposing_insertions, 0 if idx < 0 else idx)
for _, idx, row in sorted(
itertools.chain(
(row.ordering for row in self._inserted_rows),
(row.ordering for row in self._base_rows),
)
)
Copy link
Contributor Author

@scanny scanny Oct 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@slobodan-ilic @ernestoarbitrio It smells to me like a bug that an assembled row that is an insertion row relies on its idx value to be 0. Why would it be that the the "intersection" pval would always be the first value of .pvals on the intersecting column? I suppose if that .pvals was always a sequence of exactly one value that would make sense of it, but it sounds a little odd to me and at the very least is obscure without a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably indicative of needing two extractions (either as classes or as mechanisms within one class). The assembled row that is not an insertion row itself needs this index. The assembled row that is an insertion row, doesn't need it (but currently has it defaulting to 0).

Copy link
Contributor Author

@scanny scanny Oct 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the interesting thing is that at least one test fails (on expectation doesn't match) if you replace 0 if idx < 0 else idx with idx. idx always has a legitimate value, it's just end-based (like -2) and indexes into the insertions-collection if the vector is an insertion.

I took a quick look, and it appears this idx value is directly involved in the pvals calculation for an insertion, which just seems odd to me. I can't think of what calculation would need a col_idx but work correctly with the same value (0) for all insertions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ernestoarbitrio can you weigh in on this one? ^^^

@scanny scanny merged commit 54c118c into master Oct 31, 2019
scanny added a commit that referenced this pull request Oct 31, 2019
@ernestoarbitrio ernestoarbitrio deleted the matrix-refactoring branch August 31, 2020 07:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants