Allow passing of sparse matrices #29
Comments
Do you know how this data format interacts with pandas? Right now, TPOT uses pandas to pass around the data sets between pipeline operators.
@msjgriffiths Yes, I think sparse matrices shouldn't be a problem anymore -- in the majority of cases. Even the random forest has sparse matrix support now.
As far as I can tell you are using numpy arrays as input to the scikit-learn pipelines, right? E.g.,
So, in this case, we could add an intermediate transformer step for the conversion
(Note that it is not necessary for the RandomForest anymore. And the
I think we'd need to check which scikit-learn transformers this would be necessary for, but I could add the general functionality if you like.
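The intermediate conversion step suggested in this comment could look roughly like the following. This is only a minimal sketch, not the code from the original comment: `DenseTransformer` is a hypothetical name, and `GaussianNB` is used purely as an example of an estimator that rejects sparse input.

```python
import numpy as np
from scipy import sparse
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline


class DenseTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical pipeline step that turns sparse input into a dense array."""

    def fit(self, X, y=None):
        # Nothing to learn; the step is stateless.
        return self

    def transform(self, X):
        # Convert only when needed so dense input passes through unchanged.
        return X.toarray() if sparse.issparse(X) else np.asarray(X)


# Small random sparse data set, for illustration only.
X = sparse.random(20, 5, density=0.3, format="csr", random_state=0)
y = np.array([0, 1] * 10)

# GaussianNB needs dense input, so the conversion step goes in front of it.
pipeline = make_pipeline(DenseTransformer(), GaussianNB())
pipeline.fit(X, y)
predictions = pipeline.predict(X)
```

Placing the conversion inside the pipeline keeps the data sparse for as long as possible, so only the steps that really need a dense array pay the memory cost.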
+1 to this, it would make it even closer to sklearn
Now that we're looking at adding support for We may have to rework TPOT's internals entirely to work with numpy arrays/matrices in place of pandas DataFrames. I think I'm okay with that -- it would actually eliminate the pandas dependency -- but that means this would entail a significant rework of TPOT.
Would using some of the There is a lot of talk about a rework of TPOT in the issues.
We're actually heading toward a major refactor of TPOT after the next release (v0.4). :-)
Are y'all talking about that in an issue somewhere?
@teaearlgraycold should be raising that issue soon (we just finished meeting). He's been working on a PR with a base implementation of it.
@teaearlgraycold, can you please write up a script here to pull all operators in TPOT and pass a sparse matrix to each of them individually? I'd like to see what operators we need to work on to natively support sparse matrices.
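A rough sketch of what such a probe script might look like is below. It does not enumerate TPOT's actual operators, since the thread does not show how to do that; instead it assumes a hand-picked list of scikit-learn estimators (`CANDIDATES`, a hypothetical name) and simply tries to fit each one on a sparse matrix.

```python
import numpy as np
from scipy import sparse
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Assumption: a stand-in list of estimators, not TPOT's real operator set.
CANDIDATES = [
    RandomForestClassifier(n_estimators=10, random_state=0),
    LogisticRegression(),
    GaussianNB(),
]


def accepts_sparse(estimator, X, y):
    """Return True if the estimator can be fitted on sparse X directly."""
    try:
        estimator.fit(X, y)
    except (TypeError, ValueError):
        # scikit-learn raises one of these when dense data is required.
        return False
    return True


# Small random sparse data set, for illustration only.
X = sparse.random(30, 4, density=0.5, format="csr", random_state=0)
y = np.array([0, 1] * 15)

results = {type(est).__name__: accepts_sparse(est, X, y) for est in CANDIDATES}
for name, ok in results.items():
    print(f"{name}: {'ok' if ok else 'needs dense input'}")
```

Estimators reported as needing dense input are the ones that would require a conversion step or native sparse support.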
The 0.9 release added sparse matrix support via the
In some situations, it's easier to pass a scipy.sparse matrix object to sklearn model objects. This would reduce the memory requirements when fitting larger datasets.
Most sklearn models will accept a sparse matrix. For those that do not, checking for sparsity in the method call and calling matrix.todense() would work.
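The sparsity check described here might be sketched as a small helper like the one below. `ensure_dense` is a hypothetical name, not part of TPOT or scikit-learn. The sketch uses `toarray()` rather than `todense()` because `toarray()` returns a plain `numpy.ndarray`, while `todense()` returns a `numpy.matrix`; either would match the idea in the issue.

```python
import numpy as np
from scipy import sparse


def ensure_dense(X):
    """Return X as a dense ndarray, converting only if it is sparse."""
    if sparse.issparse(X):
        return X.toarray()
    return np.asarray(X)


# A tiny sparse matrix, for illustration only.
X_sparse = sparse.csr_matrix([[0, 1, 0], [2, 0, 0]])
X_dense = ensure_dense(X_sparse)
```

Calling the helper only inside the methods of models that cannot handle sparse input keeps the memory savings for every other model.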