Error when importing bqfetch #5
Hello there,
Thank you so much for your quick answer!! I need to read a table with 122,602,779 rows as a dataframe in Google Colab so that I can use it with machine-learning algorithms. Which parallel_backend do you recommend: billiard, joblib, or multiprocessing? And which fetching approach do you recommend? Thank you!!
Before dividing your big dataset into relatively small chunks using one of the functions available in bqfetch, did you verify that your data are independent? That is, that you can divide the whole dataset into multiple batches and train on them independently. Next, for the backend, I recommend using the default one, so there is no need to set the parallel_backend argument for the moment; if the default backend leads to issues, then try one of the other available backends. For the fetching itself, I recommend fetching by chunk size, as it lets you easily manage your memory and avoid memory overflows in your Colab environment. However, you need to specify an index column in your dataset on which the table can be partitioned; I give some examples of index columns in the README. A rough sketch of this workflow is shown below. Do not hesitate to ask if you have any other questions!
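Here is a minimal sketch of what fetching by chunk size could look like, based on the import names mentioned in this issue (BigQueryFetcher, BigQueryTable). The method and argument names (chunks, chunk_size_in_GB), the project/dataset/table names, the credentials path, and the index column are all placeholders for illustration, so please check the README for the real signatures:

```python
from bqfetch import BigQueryFetcher, BigQueryTable

# Placeholder project, dataset, table names and credentials path.
table = BigQueryTable("my-gcp-project", "my_dataset", "my_table")
fetcher = BigQueryFetcher("/path/to/service_account.json", table)

# Partition the table on an index column (e.g. a user id) into chunks of
# roughly fixed size so that each chunk fits comfortably in Colab's memory.
chunks = fetcher.chunks(column="user_id", chunk_size_in_GB=2)

for chunk in chunks:
    # Fetch one chunk as a pandas DataFrame; the default parallel backend
    # is used here, as recommended above.
    df = fetcher.fetch(chunk)
    print(df.shape)
```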
I understand that you want to split your dataset into multiple parts; however, there is no point in reconstructing the whole dataframe, because you will still end up with a memory overflow: the dataframe is simply too large to fit on your machine. What you can do instead is run the training loop on each part of the data rather than on the whole dataframe, as in mini-batch training. To summarize: fetch one chunk, train on it, discard it, and move on to the next chunk, as sketched below.
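As a concrete illustration, here is a rough sketch of such a per-chunk training loop using scikit-learn's partial_fit. It assumes the `fetcher` and `chunks` objects from the sketch above and a hypothetical target column named "label"; it is only meant to show the pattern of fetching a chunk, training on it, and releasing it before the next one:

```python
from sklearn.linear_model import SGDClassifier

# Incremental model that supports mini-batch updates via partial_fit.
model = SGDClassifier()
classes = [0, 1]  # all possible label values must be declared up front for partial_fit

for chunk in chunks:                   # `chunks` from the bqfetch sketch above
    df = fetcher.fetch(chunk)          # load only one chunk into memory
    X = df.drop(columns=["label"]).to_numpy()  # "label" is a hypothetical target column
    y = df["label"].to_numpy()
    model.partial_fit(X, y, classes=classes)   # mini-batch style update
    del df                             # free memory before fetching the next chunk
```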
Your error comes from the BigQuery API; can you provide the full code you used to fetch? I think your chunk size is too big, so the fetching took more than 10 minutes and raised a timeout error.
I'll think about what you said and will probably come back with more questions, haha. Thank you!!! This is my code:
Hi everyone!
I'm trying to read a big table from BigQuery using Python in Google Colab and I found bqfetch; however, when I try to import BigQueryFetcher and BigQueryTable I get an error.
I installed it by doing:
But when running the second command, I get this error:
ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'
Then, my code is this:
Am I doing something wrong? Because this is what I get:
Some help would be appreciated! I cannot run anything, so I can't get the table I need as a dataframe in Python :(
Thank you in advance!
Marina