OverflowError: Python int too large to convert to C long #84
Interesting... this appears to be an issue with sklearn's … Is the data set you're using publicly available (or can it be)? I'd be interested to see whether this bug is reproducible on a simple … Regardless, I think your issue raises a broader point: we should put exception handling around the pipeline evaluation function so that one invalid or faulty pipeline doesn't crash all of TPOT. I've filed this issue as a bug in #85.
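A minimal sketch of the kind of guard proposed here. It is not TPOT's actual code: `safe_evaluate`, `toy_evaluate`, and the idea of scoring a failed pipeline as `-inf` are illustrative assumptions. Any exception raised while scoring one pipeline is turned into a warning plus the worst possible score, so the search can carry on.

```python
import warnings


def safe_evaluate(evaluate_pipeline, pipeline, worst_score=float("-inf")):
    """Score one pipeline without letting a failure abort the whole search.

    evaluate_pipeline and pipeline are hypothetical placeholders; any
    Exception raised while scoring becomes a warning plus worst_score.
    """
    try:
        return evaluate_pipeline(pipeline)
    except Exception as exc:  # a faulty pipeline should not crash the optimizer
        warnings.warn("Pipeline {!r} failed: {!r}".format(pipeline, exc))
        return worst_score


# Hypothetical scorer: the "bad" pipeline raises, the others get a score.
def toy_evaluate(pipeline):
    if pipeline == "bad":
        raise OverflowError("Python int too large to convert to C long")
    return len(pipeline) / 10.0


scores = [safe_evaluate(toy_evaluate, p) for p in ["good", "bad", "better"]]
print(scores)
```

Catching `Exception` rather than a bare `except:` leaves `KeyboardInterrupt` and `SystemExit` alone, so a long run can still be stopped by hand.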
Hopefully in a few months the paper I'm working on with this dataset will be published, and then I can publish the dataset as well ;) I've noticed this bug pops up a lot on Python 2 installs (where functions such as `range()` return lists rather than lazy iterators). Would switching to Python 3 be a potential fix?
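Well, I always recommend upgrading to Python 3. ;-) Try running this code on your data set and see if it generates an error:

```python
from sklearn.ensemble import GradientBoostingClassifier
# sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split

# my_data_features / my_data_targets: your own feature matrix and class labels
X_train, X_test, y_train, y_test = train_test_split(my_data_features, my_data_targets,
                                                    train_size=0.75, test_size=0.25)
clf = GradientBoostingClassifier()
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on the held-out 25%
```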
I updated the code above to do the CV split.
That seems to work just fine; it spits out an accuracy with no problems (in both Python 2 and 3). |
Interesting. TPOT must have been performing some feature transformations …

On Thursday, February 18, 2016, Shannon <notifications@github.com> wrote:

Randal S. Olson, Ph.D. E-mail: rso@randalolson.com | Twitter: @randal_olson
Since #85 should address this issue, I'm going to close this one. Please feel free to reopen it if you need further assistance.
SGTM. I've been running the same thing but on Python 3 and haven't run into …
I tried running TPOT on a reasonably small dataset (141 data points, each with 78 features) for 10-way classification. In the interest of nearly brute-forcing it, I set TPOT to run for 1000 generations with a population of 1000 pipelines in each generation.
Unfortunately, it only made it through 27 generations before crashing with the error:
```
OverflowError: Python int too large to convert to C long
```
The full stack trace is reproduced below:

I'm not familiar enough with TPOT's internals to diagnose this on my own, though I have a fair idea of the basic problem: some Python integer grows too large to fit in a C long at the point where it is converted to one. As for why, I'm unsure and could use some suggestions there.
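The error message itself comes from CPython converting an arbitrary-precision Python int into a fixed-width C `long`. The sketch below triggers the same exception using only the standard library, with the `array` module's `'l'` (signed C long) typecode standing in for whatever compiled code (for example inside NumPy or scikit-learn) received the oversized value. Which call actually overflowed in this TPOT run is not known from the thread.

```python
import array

# Python ints have no fixed size, but a C long does (typically 32 or 64 bits).
c_long_max = 2 ** (8 * array.array("l").itemsize - 1) - 1

# A value that fits is converted without complaint.
ok = array.array("l", [c_long_max])

# One past the maximum cannot be represented, so the conversion fails.
try:
    array.array("l", [c_long_max + 1])
    overflowed = False
except OverflowError as exc:
    overflowed = True
    print(exc)
```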