Improvements to SplitMatrix #91

MarcAntoineSchmidtQC · 2021-07-22T00:31:52Z

- Allow SplitMatrix to be constructed from another SplitMatrix.
- Allow inputs of SplitMatrix to be 1-d.
- Implement __getitem__ for column subsets.
- This also required implementing column subsetting for CategoricalMatrix.
- __repr__ uses the __repr__ method of components instead of str().
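One way to support constructing a SplitMatrix from another SplitMatrix is to splice the inner components into the outer component list and remap their column positions. The sketch below assumes a split is nothing more than a list of component matrices plus, for each component, the list of global columns it occupies; `ToySplit` and `flatten_components` are hypothetical names used for illustration, not the quantcore.matrix API.

```python
from typing import Any, List, Sequence, Tuple


class ToySplit:
    """Hypothetical stand-in for a SplitMatrix: component matrices plus,
    for each component, the global column positions it occupies."""

    def __init__(self, matrices: Sequence[Any], indices: Sequence[Sequence[int]]):
        self.matrices = list(matrices)
        self.indices = [list(idx) for idx in indices]


def flatten_components(
    matrices: Sequence[Any], indices: Sequence[Sequence[int]]
) -> Tuple[List[Any], List[List[int]]]:
    """Splice nested ToySplit components into one flat component list."""
    flat_mats: List[Any] = []
    flat_idx: List[List[int]] = []
    for mat, idx in zip(matrices, indices):
        if isinstance(mat, ToySplit):
            # Inner indices are positions *within* `mat`; map them through
            # `idx` to get positions in the outer matrix.
            inner_mats, inner_idx = flatten_components(mat.matrices, mat.indices)
            flat_mats.extend(inner_mats)
            flat_idx.extend([[idx[k] for k in sub] for sub in inner_idx])
        else:
            flat_mats.append(mat)
            flat_idx.append(list(idx))
    return flat_mats, flat_idx


# The inner split has 3 columns: "B" holds its column 1, "C" its columns 0 and 2.
inner = ToySplit(["B", "C"], [[1], [0, 2]])
mats, idx = flatten_components(["A", inner], [[0, 3], [1, 2, 4]])
print(mats, idx)  # ['A', 'B', 'C'] [[0, 3], [2], [1, 4]]
```

The recursion also handles splits nested more than one level deep; a real constructor would additionally need to check that the remapped indices still cover every column exactly once.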

ToDo:

- Fix the bug in _split_col_subsets (first confirm that it is a bug).
- Add tests for the new features.

Checklist

Added a CHANGELOG.rst entry


lbittarello

🎉

src/quantcore/matrix/split_matrix.py

lbittarello · 2021-07-22T21:02:56Z


+                colmap[idx] = [i, j]
+        return colmap
+
+    def _split_col_subsets_unordered(self, cols):


This function seems to occasionally return empty lists when columns are indexed with a list, which later causes is_sorted to throw an error.

MRE:

```python
import pandas as pd
import quantcore.matrix as mx

df = pd.DataFrame({"u": ["a", "b"], "v": ["a", "b"]})
X = mx.from_pandas(df, cat_threshold=False, object_as_cat=True)
X[:, [0, 1]]
```
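A minimal sketch of an unordered column split that never hands back an empty group, in the spirit of the later "filter out empty matrices" commit. It assumes the same layout as the `colmap[idx] = [i, j]` diff: `indices[i]` lists the global columns held by component `i`. The function name mirrors `_split_col_subsets_unordered`, but the body is an illustration, not the PR's code.

```python
from typing import Dict, List, Sequence, Tuple


def split_col_subsets_unordered(
    indices: Sequence[Sequence[int]], cols: Sequence[int]
) -> List[Tuple[int, List[int], List[int]]]:
    """Group requested global columns by component, keeping request order.

    Returns (component, local_cols, output_positions) triples and drops
    components that receive no columns, so callers never see empty lists.
    """
    # Global column -> (component number, column within that component)
    colmap: Dict[int, Tuple[int, int]] = {
        idx: (i, j)
        for i, component_cols in enumerate(indices)
        for j, idx in enumerate(component_cols)
    }
    local: List[List[int]] = [[] for _ in indices]
    out_pos: List[List[int]] = [[] for _ in indices]
    for k, col in enumerate(cols):
        i, j = colmap[col]  # raises KeyError for an unknown column
        local[i].append(j)
        out_pos[i].append(k)
    # Skip components that were not selected at all.
    return [(i, local[i], out_pos[i]) for i in range(len(indices)) if local[i]]


# Component 0 holds global columns 0 and 2; component 1 holds 1 and 3.
print(split_col_subsets_unordered([[0, 2], [1, 3]], [3, 0, 1]))
# [(0, [0], [1]), (1, [1, 0], [0, 2])]
print(split_col_subsets_unordered([[0, 2], [1, 3]], [0, 2]))
# [(0, [0, 1], [0, 1])]
```

The output positions let the caller put the selected columns back in the requested order, which is what the non-monotonic case in #92 needs.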

I can't replicate this on macOS. I assume you tried this on Windows. Is that correct? Can you give me some info on your setup?

Also, you said "occasionally". Does it always throw an error when you try your MRE or not?

Does it always throw an error when you try your MRE or not?

Yes. But it doesn't throw an error if we reduce the number of rows from two to one, for example.

I assume you tried this on Windows. Is that correct?

Yes. I'm not embarrassed.

Can you give me some info on your setup?

python : 3.8.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : AMD64 Family 23 Model 96 Stepping 1, AuthenticAMD
byteorder : little
pandas : 1.3.0
numpy : 1.20.3

Is this helpful? Do you need something else?

Should we perhaps add unit tests for indexing? The Windows CI could come in handy.

Yes, unit testing will be very valuable. I'll temporarily add the Windows CI to this PR for all the pushes. And yes, this is helpful.

👍 on indexing unit tests.

tbenthompson

This is huge!! Thanks Marc for making a really big improvement to SplitMatrix. Let me know if there are any pieces you would like to hand off. It feels like you started what seemed like a small project and it has crept outward a couple of times now. So don't feel committed to finishing this if you have other important things going on.

tbenthompson · 2021-07-26T04:36:31Z

src/quantcore/matrix/categorical_matrix.py

+                return CategoricalMatrix(self.cat[row])
+            else:
+                # return a SparseMatrix if we subset columns
+                return SparseMatrix(self.tocsr()[row, col], dtype=self.dtype)


This is quite inefficient because we construct the full sparse matrix with self.tocsr() before subsetting it. I'm fine leaving it like this for now, but I think it'd be good to at least leave a TODO comment or open an issue mentioning this performance bug. Fixing this for the single-element case is quite easy. For the [:, cols] case, I guess we need to construct the sparse matrix element by element, which will probably be easiest to do in Cython.
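A sketch of the element-by-element construction suggested in this comment, assuming the categorical matrix is stored as integer category codes (as in a pandas Categorical's codes, where -1 marks a missing value), so that row r has a single 1 in column codes[r]. It builds the CSR arrays for the requested columns directly instead of calling tocsr() on the full matrix. The function name is hypothetical, and the code is pure Python for clarity rather than the Cython version the comment anticipates.

```python
from typing import Dict, List, Sequence, Tuple


def categorical_cols_to_csr(
    codes: Sequence[int], cols: Sequence[int]
) -> Tuple[List[float], List[int], List[int]]:
    """Return CSR arrays (data, indices, indptr) for the one-hot matrix
    of `codes` restricted to `cols`, without building every column."""
    # Old column -> new position(s); a column may be requested twice.
    new_pos: Dict[int, List[int]] = {}
    for new, old in enumerate(cols):
        new_pos.setdefault(old, []).append(new)
    data: List[float] = []
    indices: List[int] = []
    indptr: List[int] = [0]
    for code in codes:
        # Rows whose category was not selected (or is missing) stay empty.
        for new in sorted(new_pos.get(code, [])):
            data.append(1.0)
            indices.append(new)
        indptr.append(len(indices))
    return data, indices, indptr


# The full one-hot matrix has 3 columns; keep columns [2, 0] in that order.
print(categorical_cols_to_csr([0, 2, 1, 2], [2, 0]))
# ([1.0, 1.0, 1.0], [1, 0, 0], [0, 1, 2, 2, 3])
```

The three lists can be passed to scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(codes), len(cols))). The work scales with the number of rows and requested columns rather than with the full width of the one-hot matrix.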

src/quantcore/matrix/split_matrix.py

tbenthompson · 2021-07-26T04:41:06Z


+                colmap[idx] = [i, j]
+        return colmap
+
+    def _split_col_subsets_unordered(self, cols):


👍 on indexing unit tests.


This is a big commit with many changes:
- partial support of integer indexing
- removed advanced indexing for densematrix and sparsematrix
- ensure that indexing of splitmatrix generates basematrix type
- partial fix of standardizedmatrix indexing
- added indexing tests (currently failing)
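For the "partial support of integer indexing" item, one possible approach is to normalize every column key to a list of non-negative positions before splitting, so that an integer key becomes a one-element list and the result can stay a 2-D matrix object. This is a hypothetical helper, not code from the PR; booleans are treated as integers here, so boolean masks are deliberately not supported.

```python
import operator
from typing import List, Sequence, Union

ColKey = Union[int, slice, Sequence[int]]


def normalize_col_key(key: ColKey, n_cols: int) -> List[int]:
    """Return the selected columns as non-negative integer positions."""
    if isinstance(key, slice):
        # Let Python resolve start/stop/step, including negative values.
        return list(range(n_cols))[key]
    try:
        # A single integer (Python int or NumPy integer scalar) ...
        keys = [operator.index(key)]
    except TypeError:
        # ... otherwise treat the key as a sequence of integers.
        keys = [operator.index(k) for k in key]
    out: List[int] = []
    for k in keys:
        if not -n_cols <= k < n_cols:
            raise IndexError(f"column index {k} is out of bounds for {n_cols} columns")
        out.append(k % n_cols)  # map negative positions to positive ones
    return out


print(normalize_col_key(-1, 5))                 # [4]
print(normalize_col_key(slice(1, None, 2), 5))  # [1, 3]
print(normalize_col_key([4, 0, -2], 5))         # [4, 0, 3]
```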

waclawkusnierczykqc · 2021-08-26T12:55:21Z

Quick thought: it looks like several loosely related updates within one PR.
Perhaps worth considering pushing them in a few more focused PRs?

MarcAntoineSchmidtQC · 2021-08-26T20:14:20Z

Quick thought: it looks like several loosely related updates within one PR.
Perhaps worth considering pushing them in a few more focused PRs?

Great idea. There are finished features that I would like to merge soon so I'll create more focused PRs with them.

waclawkusnierczykqc · 2021-08-27T07:54:16Z

Quick thought: it looks like several loosely related updates within one PR.
Perhaps worth considering pushing them in a few more focused PRs?

Great idea. There are finished features that I would like to merge soon so I'll create more focused PRs with them.

Yes, being able to push some of the changes independently of others is one advantage.
Among the others, there is clear focus, easier reviewing, and easier reverting if need be.

MarcAntoineSchmidtQC · 2021-08-28T18:01:03Z

This PR has been separated into chunks. See PR #109, PR #110, and PR #111.

What remains is the big mess that is column indexing with splitmatrix.

MarcAntoineSchmidtQC · 2021-10-04T17:31:17Z

Closing. Most changes were implemented in other PRs, and we will clearly take a different approach to this.

MarcAntoineSchmidtQC added 2 commits July 21, 2021 20:28

removed test checking not 1d

2c75a4f

MarcAntoineSchmidtQC requested a review from tbenthompson as a code owner July 22, 2021 00:43

MarcAntoineSchmidtQC mentioned this pull request Jul 22, 2021

_split_col_subsets ignores columns when non-monotonic #92

Open

column mapping and unordered split_col_subsets

9f44f58

lbittarello reviewed Jul 22, 2021


testing split matrix creation

831ef6d

tbenthompson reviewed Jul 26, 2021


MarcAntoineSchmidtQC and others added 7 commits July 26, 2021 12:09

Merge remote-tracking branch 'origin/master' into SplitMatrix-improve…

85f6dc6

…d-usability

don't modify in place + windows CI

dc7b2a9

add Luca's test (temporary)

f45a69c

filter out empty matrices

8ddb627

Update src/quantcore/matrix/split_matrix.py

42b78eb

added docstring Co-authored-by: Ben Thompson <t.ben.thompson@gmail.com>

docstring formatting

cc7afec

MarcAntoineSchmidtQC closed this Oct 4, 2021
