Unable to process large data set?? #373
Comments
Thank you very much for reporting this, it seems like there is a bug. It appears that the bug is triggered from none of the evaluations of the machine learning algorithm actually succeeding. When you go from 20000 to 100000 samples, you should increase the time you give the ML algorithms to fit the data (increase |
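As a rough illustration of the advice above, here is a minimal sketch that scales a time budget linearly with the number of samples. The thread does not show which settings were meant, so the keys `time_left_for_this_task` and `per_run_time_limit` are an assumption (they are the auto-sklearn constructor arguments for the overall and per-model time limits). The helper `scaled_time_budget` and the base values are hypothetical and only serve the example.

```python
def scaled_time_budget(n_samples, base_samples=20_000,
                       base_total_s=3600, base_per_run_s=360):
    """Hypothetical helper: scale time limits linearly with dataset size.

    The base values are illustrative, not recommendations from the thread.
    The budget never drops below the base values for small data sets.
    """
    factor = max(1.0, n_samples / base_samples)
    return {
        # Assumed auto-sklearn argument names (not shown in the thread).
        "time_left_for_this_task": int(base_total_s * factor),
        "per_run_time_limit": int(base_per_run_s * factor),
    }


budget = scaled_time_budget(100_000)
print(budget)
```

The resulting dictionary could then be passed on as keyword arguments, for example `AutoSklearnRegressor(**budget)`. Linear scaling is only a starting point; some models scale worse than linearly with the number of samples.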
Tried to increase both The warning message is
Try setting |
Tried the above code, but the warning is thrown after 1 to 2 minutes.
I can neither reproduce this with |
I can reproduce the given example (boston dataset) with |
I would like to add that my sklearn version is 0.19.1.
I am surprised that you are able to run auto-sklearn with scikit-learn 0.19.1, or are you using the development branch? The issue with the tree builder looks weird. Does it occur on one machine after fitting is done, or is fitting done on a different machine from the one on which you try to load the data? Anyway, it might be best to open a new issue for that, because it seems unrelated to the actual issue of this thread. Regarding the actual issue, I have come to believe that this is indeed a bug. I will have a closer look with the current development version on Monday and will hopefully be able to fix this for the upcoming release.
scikit-learn 0.19.1 was released in Oct 2017. As I am eager to try new features, I will also try the latest version. :) Currently, I am using a single VM to execute the code.
Created another thread for tracking this issue: #390
Please excuse the delay with the promised release. I'm actually having problems getting started on a big dataset myself at the moment and am hunting down a memory leak. @mkcedward did you ever encounter this?
Okay, I finally managed to push a new release. Could you please check if the issue is still there?
Good job. Running the example code given in the issue description with |
Does it give reasonable results? If yes, could you please close the issue?
I tried auto-sklearn some months ago and ran into this issue reported by @mkcedward.
I just pushed a new release. Please reopen if this issue is still present with the latest version.
I have a data set with more than 100k records. When I try to fit it with AutoSklearnRegressor, it always throws a warning, which seems to prevent me from getting the expected output.
However, if the number of records is small enough (say, fewer than 20k), it runs without any warning or error. Could you advise on this situation? I am using version 0.2.
Sample code
The exception is