Increase reading speed of for read_secuTrial() #204

Open
PatrickRWright opened this issue Jul 2, 2020 · 10 comments

Comments

@PatrickRWright
Collaborator

Is your feature request related to a problem? Please describe.
Very large exports (e.g. tens of thousands of entries) take a long time to read. Maybe it's possible to boost the performance.

@PatrickRWright PatrickRWright changed the title from "Increase reading spead of for read_secuTrial()" to "Increase reading speed of for read_secuTrial()" on Jul 2, 2020
@aghaynes
Member

aghaynes commented Jul 3, 2020

I tried a little profiling... It seems to be converting the dates that's primarily causing the lag (at least in the dataset I've looked at)

image

@aghaynes
Member

aghaynes commented Jul 3, 2020

Specifically, it's a merge in .convert_dates
image
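
One possible direction, as a rough sketch only: if the merge in .convert_dates is effectively a key lookup against a small table (an assumption, the internals are not shown in this thread), a match()-based lookup usually does the same job with less overhead than merge(). The data frames and column names below are made up for illustration and are not taken from the package.

```r
# Hypothetical data: a large table and a small lookup table sharing a key.
set.seed(1)
n <- 1e5
dat <- data.frame(
  id = seq_len(n),
  code = sample(c("a", "b", "c"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  code = c("a", "b", "c"),
  label = c("Alpha", "Beta", "Gamma"),
  stringsAsFactors = FALSE
)

# merge(): a general join; it sorts the result by the key column.
t_merge <- system.time(
  merged <- merge(dat, lookup, by = "code", all.x = TRUE)
)

# match(): a vectorised lookup that keeps the original row order.
t_match <- system.time({
  matched <- dat
  matched$label <- lookup$label[match(dat$code, lookup$code)]
})

# Bring the merge() result back into the original row order for comparison.
merged <- merged[order(merged$id), c("id", "code", "label")]

# Elapsed seconds for each approach (values depend on the machine).
c(merge = t_merge[["elapsed"]], match = t_match[["elapsed"]])
```

Whether this applies depends on what the merge actually joins; if it is a many-to-many join, match() is not a drop-in replacement.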

@PatrickRWright
Collaborator Author

What's the tool you are using there? Looks super useful.

@aghaynes
Member

profvis - it integrates with RStudio's IDE
see here for tutorial on RStudio's support forum

@aghaynes
Member

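# Load the package under development so that read_secuTrial() is available
devtools::load_all()
# Profile a full read; path_to_export is the path to a secuTrial export
profvis::profvis(read_secuTrial(path_to_export))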

I find it easier to look at the Data tab
image

@PatrickRWright
Collaborator Author

image

@PatrickRWright
Collaborator Author

We could benchmark the tidyverse reading functions to see if it's worthwhile switching.

@PatrickRWright
Collaborator Author

In the spirit of structured procrastination I prepared a small benchmark:
https://gist.github.com/PatrickRWright/4ed5d4e5b5aed03b7a1aa5b593dd9b64

readr is faster, but it's not exactly light speed either.

@aghaynes
Member

aghaynes commented Jul 16, 2020

There is also data.table::fread and vroom::vroom which would be worth looking at. vroom is apparently the fastest, although I've never used it... (screenshot from the vroom readme)

image
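
A minimal sketch of how the readers mentioned in this thread could be timed side by side. The test file is synthetic and comma-separated, which is an assumption; a real secuTrial export may use a different delimiter and far more columns. Packages that are not installed are simply skipped.

```r
# Write a hypothetical test file to a temporary location.
set.seed(42)
n <- 1e4
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    id = seq_len(n),
    value = rnorm(n),
    visit_date = format(as.Date("2020-01-01") + sample(0:365, n, replace = TRUE))
  ),
  tmp,
  row.names = FALSE
)

# Candidate readers; base R is always available, the rest are optional.
readers <- list(base = function(f) utils::read.csv(f))
if (requireNamespace("readr", quietly = TRUE)) {
  readers$readr <- function(f) {
    readr::read_csv(f, col_types = readr::cols(), progress = FALSE)
  }
}
if (requireNamespace("data.table", quietly = TRUE)) {
  readers$fread <- function(f) data.table::fread(f)
}
if (requireNamespace("vroom", quietly = TRUE)) {
  readers$vroom <- function(f) {
    vroom::vroom(f, delim = ",", col_types = vroom::cols(), progress = FALSE)
  }
}

# Time a single read per candidate (elapsed seconds).
elapsed <- vapply(
  readers,
  function(read) system.time(res <- read(tmp))[["elapsed"]],
  numeric(1)
)
sort(elapsed)

unlink(tmp)
```

One caveat when interpreting the numbers: vroom reads lazily, so a single call may understate the cost of actually touching every value later on. Repeating each read several times would also give more stable timings than one run.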

@PatrickRWright
Collaborator Author

If it were just about the numbers, I would agree. Let's have a look at the dependency consequences it has. The speedup looks pretty impressive though.
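
A rough way to compare the dependency footprint of the candidates, as a sketch: count the recursive hard dependencies (Depends, Imports, LinkingTo) of each package. The helper name count_deps is made up for this example; the actual numbers depend on the current state of CRAN, so none are given here.

```r
# Count recursive hard dependencies for each package in pkgs.
# db is a package database matrix as returned by available.packages().
count_deps <- function(pkgs, db) {
  deps <- tools::package_dependencies(
    pkgs,
    db = db,
    which = c("Depends", "Imports", "LinkingTo"),
    recursive = TRUE
  )
  vapply(deps, length, integer(1))
}

# Usage (needs network access to a CRAN mirror):
# db <- available.packages(repos = "https://cloud.r-project.org")
# count_deps(c("readr", "data.table", "vroom"), db)
```

The count includes base and recommended packages where they are listed, so it is an upper bound on what would be newly installed.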
