Q: Fastest way to load multiple DataFrames at the same time #41

Open
cgi1 opened this issue May 4, 2020 · 0 comments

cgi1 commented May 4, 2020

Awesome project.

Just a short question: I have about 2,000 stored DataFrames and would like to load 500 of them into one Python process as fast as possible. Is there a batch-load function?

I wrote a loader using ThreadPoolExecutor. With 5 threads it loads about 3 GB on disk into a DataFrame of roughly 40 GB in memory (which is pretty heavy) in under four minutes.
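The loader code itself wasn't posted, so the following is only a minimal sketch of the thread-pool approach, not the original. It assumes a hypothetical `load_one(name)` callable that returns the DataFrame for one stored item (for example a small wrapper around `item.to_pandas()`), and stacks the results with `pd.concat`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

import pandas as pd


def load_many_threaded(
    names: Iterable[str],
    load_one: Callable[[str], pd.DataFrame],
    max_workers: int = 5,
) -> pd.DataFrame:
    """Load every named item on a thread pool and stack them into one DataFrame.

    `load_one` is hypothetical: any callable that returns the DataFrame for one
    stored item, e.g. a small wrapper around `item.to_pandas()`.
    """
    names = list(names)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, regardless of which thread finishes first.
        frames = list(pool.map(load_one, names))
    # keys= adds an outer index level recording which item each row came from.
    return pd.concat(frames, keys=names)
```

Usage would look like `df = load_many_threaded(names[:500], load_one)`, where `names` and `load_one` are placeholders. Note that `pd.concat` copies the data into new arrays while the list of per-item frames is still alive, so peak memory during the final step can roughly double.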

Does anyone see a faster approach? The SSD is barely busy; the bottleneck seems to be `df = item.to_pandas()`, which is CPU-intensive.
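Since the bottleneck appears to be CPU-bound conversion rather than disk I/O, one variant worth measuring is a process pool. Whether it helps is an open question. If the conversion holds Python's GIL for much of its runtime, threads cannot spread that work across cores while processes can. On the other hand, every resulting DataFrame has to be pickled and copied back to the parent process, which for ~40 GB may cost more than it saves. A sketch under those assumptions, again with a hypothetical `load_one` that must be a module-level (picklable) function:

```python
import multiprocessing
import multiprocessing.context
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, Iterable, Optional

import pandas as pd


def load_many_in_processes(
    names: Iterable[str],
    load_one: Callable[[str], pd.DataFrame],
    max_workers: int = 5,
    mp_context: Optional[multiprocessing.context.BaseContext] = None,
) -> pd.DataFrame:
    """Load every named item in a separate worker process, then concatenate.

    `load_one` is hypothetical and must be picklable (a module-level function),
    because it is sent to the workers. Each returned DataFrame is pickled and
    copied back to this process, which is the main overhead of this approach.
    """
    names = list(names)
    with ProcessPoolExecutor(max_workers=max_workers, mp_context=mp_context) as pool:
        # Results come back in input order; each task converts one item.
        frames = list(pool.map(load_one, names))
    return pd.concat(frames, keys=names)
```

Call it from inside an `if __name__ == "__main__":` block: with the spawn start method (the default on Windows and macOS), each worker re-imports the main module. Passing `mp_context=None` uses the platform's default start method.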
