Slow machine construction for large number of features #428
The slowdown is due to the fact that the constructor checks the number of rows of the data, which goes through a column table:

using Tables
X = Tables.table(randn(200,10000));
col_table = Tables.columntable(X);

Is this a Tables issue? No, not really. The problem is that Julia named tuples with a large number of keys take forever to construct, and I guess they are a bad idea anyway (I think large tuples are generally a bad idea in Julia?). So, for example, this hangs:

names = tuple((Symbol(string("Column", j)) for j in 1:10000)...);
values = rand(10000);
NamedTuple{names}(values);

MLJBase uses column tables not just in

See also: #309

Note that in the Slack user's use case the data is not sparse. In any case, for dimension reduction models (e.g., PCA) we would not want to restrict to sparse tables.

cc @OkonSamuel
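To make the point above concrete, here is a minimal sketch on a deliberately small table (20×50 rather than 200×10000, so that it runs quickly). It only illustrates that a column table is a `NamedTuple` with one key per column, and that the row count of a matrix-backed table can be read without building one. The variable `nrows_direct` is introduced here for illustration and does not appear in MLJ.

```julia
using Tables

# Small stand-in for the table in the report; the real case has 10_000 columns.
X = Tables.table(randn(20, 50))

# `Tables.columntable` returns a NamedTuple of column vectors, so every
# column name becomes part of the type. This is what gets slow for wide data.
col_table = Tables.columntable(X)
@assert col_table isa NamedTuple
@assert length(keys(col_table)) == 50

# Assumption for this sketch: the table wraps a matrix, so the row count
# can be taken from the matrix form instead of from a column table.
nrows_direct = size(Tables.matrix(X), 1)
```

Whether this shortcut is appropriate inside MLJ depends on which table types need to be supported; the sketch only covers tables created with `Tables.table`.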
@ablaom Yes, this is a big issue. Maybe for now we could choose a Table type, e.g.
Sounds like a good workaround for now, thanks!
FWIW, I have been using the following snippet (based on @OkonSamuel's suggestion) to run things locally as a temporary workaround:

using Tables
import MLJModelInterface
const MMI = MLJModelInterface

# Row count is read from the matrix form of the table
MMI.nrows(X::Tables.MatrixTable) = size(MMI.matrix(X), 1)

# Selecting all rows returns the table unchanged
MMI.selectrows(X::Tables.MatrixTable, ::Colon) = X

# A single row index is treated as a one-element range
MMI.selectrows(X::Tables.MatrixTable, r::Integer) =
    MMI.selectrows(X, r:r)

# General case: index the matrix and re-wrap it with the original column names
function MMI.selectrows(X::Tables.MatrixTable, r)
    new_matrix = MMI.matrix(X)[r, :]
    _names = getfield(X, :names)
    MMI.table(new_matrix; names = _names)
end
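The idea behind this workaround can also be sketched with Tables.jl alone, which may help when experimenting outside MLJ. The helper name `select_matrix_rows` is hypothetical (it is not part of Tables.jl or MLJModelInterface); the sketch assumes the table was created with `Tables.table` and uses only `Tables.matrix`, `Tables.columnnames`, and the `header` keyword of `Tables.table`.

```julia
using Tables

# Hypothetical helper: select rows of a matrix-backed table without ever
# converting it to a column table (NamedTuple).
function select_matrix_rows(X::Tables.MatrixTable, r)
    M = Tables.matrix(X)                  # matrix form of the table
    colnames = Tables.columnnames(X)      # keep the original column names
    return Tables.table(M[r, :]; header = colnames)
end

# A single integer index is widened to a range so the result stays a table.
select_matrix_rows(X::Tables.MatrixTable, r::Integer) =
    select_matrix_rows(X, r:r)

X = Tables.table(randn(4, 5))
Y = select_matrix_rows(X, 2:3)   # rows 2 and 3, all columns
Z = select_matrix_rows(X, 1)     # a one-row table
```

Indexing the matrix copies the selected rows, so this is a convenience for experimentation and not a zero-copy view.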
Arghh, I just ran into this. For a not so large DataFrame with fewer rows than columns (708×4870) Julia felt like hanging whenever I ran
Is there anything speaking against the workaround mentioned above?
@jbrea Thanks for reporting! I suspect a different cause here because you are using a
Thanks for your response. The data is all Why isn't just
Your slow code is because MLJ's
Thanks to @OkonSamuel, the following addresses the
Yes, my issue is resolved. Thanks a lot for the quick fix!
As reported by a Slack user, this code is taking way too long: