Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a progress bar indicator for record processing #111

Merged
merged 1 commit into from May 20, 2022

Conversation

pablogsal
Copy link
Member

As record processing is an operation that can take an arbitrary amount
of time depending on the size of the file, some users can be frustrated
because they don't know how much time the processing is going to take.

To help with this, add a progress bar indicator that periodically
updates while we are processing records in the FileReader. The class is
updated strategically to avoid impacting the performance of the actual
processing.

Copy link

@gaborbernat gaborbernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a news entry 👍

@pablogsal pablogsal force-pushed the progressbar branch 2 times, most recently from 626c0b9 to bf27253 Compare May 17, 2022 15:46
@pablogsal pablogsal enabled auto-merge (rebase) May 17, 2022 16:04
@pablogsal pablogsal force-pushed the progressbar branch 2 times, most recently from ecb37f5 to 58f68a5 Compare May 19, 2022 17:13
@godlygeek godlygeek force-pushed the progressbar branch 2 times, most recently from 669bde5 to 99ff50a Compare May 19, 2022 22:22
@godlygeek
Copy link
Contributor

I played with this for a while locally. I've pushed some fixup commits that manage to make it much faster, especially in the incomplete file / OOM killer case. Rather than letting Rich decide when to redraw the screen, my version calculates that ourselves, using calls to clock_gettime (which is in the VDSO) rather than making calls into Python code. This winds up being substantially faster, especially when a too-low value was chosen for _update_interval.

Incidentally, I also raised _update_interval from 200 to 100,000. On my WSL box with its relatively slow disk IO, we still process approximately 7,000,000 allocations per second, so at 10 refreshes per second we process ~700k allocations per refresh, which works out to 7 calls to clock_gettime per refresh, which seems like a fair enough price to pay for snappier updates of the progress bar.

src/memray/_memray.pyx Outdated Show resolved Hide resolved
As record processing is an operation that can take an arbitrary amount
of time depending on the size of the file, some users can be frustrated
because they don't know how much time the processing is going to take.

To help with this, add a progress bar indicator that periodically
updates while we are processing records in the `FileReader`. The class is
updated strategically to avoid impacting the performance of the actual
processing.

Co-authored-by: Matt Wozniski <godlygeek@gmail.com>
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
Copy link
Contributor

@godlygeek godlygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pablogsal pablogsal merged commit 199ff48 into bloomberg:main May 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants