Unresponsive on "Quick Search" stage with simple dataset #187

Open
garrettjoecox opened this issue Nov 13, 2021 · 5 comments
@garrettjoecox

Hey there, I have a dataset that I've stripped down to be pretty bare while trying to get this library working:

```
df.dtypes
TXNS               int64
VOLUME           float64
ANNUAL_VOLUME    float64
```
The dataframe has 350,000 rows. I figured maybe the size was causing it to be slow, but it's been sitting like this for about 15 minutes now with "kernel busy":
[Screenshot: notebook stuck on the "Quick Search" stage with the kernel busy]
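For reference, the call being described is presumably along these lines (a minimal sketch, assuming the standard blobcity alias, DataFrame input via the df parameter, and a hypothetical file and target column; the issue doesn't name either):

```python
import blobcity as bc
import pandas as pd

df = pd.read_csv("transactions.csv")  # hypothetical file name

# ANNUAL_VOLUME as the target is an assumption -- the issue does not
# say which of the three columns is being predicted.
model = bc.train(df=df, target="ANNUAL_VOLUME", features=["TXNS", "VOLUME"])
```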

I'm sort of new to this tech, so I'm not even sure how I would go about further debugging. Any ideas?

@Hetarth02 commented Nov 13, 2021

Can you try it as mentioned in the repo? I think the "features" part should not be in the same code line. Sample
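Presumably meaning something like the following (a sketch; splitting the feature list onto its own line shouldn't change behavior, but it rules out a syntax issue):

```python
# Define the feature list as its own variable, then pass it to train()
features = ["TXNS", "VOLUME"]
model = bc.train(file="data.csv", target="Y_value", features=features)
```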

@sanketsarang (Contributor)

@garrettjoecox, considering the size of the data, you might actually need GPU acceleration. To ensure that it is not stuck, can you try with just the first 1000 rows of the DataFrame?
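A quick way to do that (sketch; assumes DataFrame input via the df parameter and reuses the hypothetical column names from above):

```python
# Keep only the first 1000 rows to distinguish "stuck" from "slow"
sample = df.head(1000)
model = bc.train(df=sample, target="ANNUAL_VOLUME", features=["TXNS", "VOLUME"])
```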

@garrettjoecox (Author)

> Can you try it as mentioned in the repo? I think the "features" part should not be in the same code line. Sample

According to the docs this is valid, as I only want to train on those two columns:

> There might be scenarios where you want to explicitly exclude some columns, or only use a subset of columns in the training. Manually specify the features to be used. AutoAI will still perform a feature selection within the list of features provided to improve effective model accuracy.
>
> model = bc.train(file="data.csv", target="Y_value", features=["col1", "col2", "col3"])

However, since I have already stripped everything else from my dataset, those are the only columns remaining and I don't need to specify them. So I tried removing the features argument and got the same result.
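In other words, both forms were tried (sketch; file and target names are placeholders):

```python
# With the feature columns listed explicitly
model = bc.train(file="data.csv", target="Y_value", features=["TXNS", "VOLUME"])

# And without -- AutoAI should then use all remaining columns as features
model = bc.train(file="data.csv", target="Y_value")
```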

@garrettjoecox (Author) commented Nov 13, 2021

> @garrettjoecox, considering the size of the data, you might actually need GPU acceleration. To ensure that it is not stuck, can you try with just the first 1000 rows of the DataFrame?

I tried with less data (1000, 100, 50, and 10 rows) and it seems to have gotten a bit further, but the kernel dies every time around 30-40% of the way through this step:

[Screenshots: notebook output at the step where the kernel dies]

@sanketsarang (Contributor)

Thank you for the update @garrettjoecox. My best guess is that it is crashing on one of the models, and the problem is most likely data-specific. Is it possible for you to email the first 1000 rows of the data to support@blobcity.com? Trying it on your data will allow us to diagnose it better.
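One way to produce that sample (sketch; the output file name is arbitrary):

```python
# Export the first 1000 rows to share for debugging
df.head(1000).to_csv("sample_1000.csv", index=False)
```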
