
Implement backpressure #106

Open
FlipB opened this issue Nov 5, 2023 · 9 comments

@FlipB

FlipB commented Nov 5, 2023

Is your feature request related to a problem? Please describe.
Minus becomes unresponsive when attempting to page big files (gigabytes).

Describe the solution you'd like
It seems like the pager's buffer is unbounded and keeps growing in an attempt to fit the entire input.
Instead, the pager ought to limit the buffer to e.g. 1,000 lines and block write_str when it is full.
Scrolling the output would consume the buffer, allowing new lines to be written by write_str.

Additional context
It spends virtually all of its time in PagerState::append_str, pegging the CPU at 100%.
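
Roughly what I have in mind (not minus's actual API, just a sketch of the blocking behaviour; all names here are made up):

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};

// Hypothetical bounded line buffer: writers block once `cap` lines are
// queued, and scrolling (consuming) makes room and wakes them up again.
struct BoundedLines {
    lines: Mutex<VecDeque<String>>,
    not_full: Condvar,
    cap: usize,
}

impl BoundedLines {
    fn new(cap: usize) -> Self {
        Self {
            lines: Mutex::new(VecDeque::new()),
            not_full: Condvar::new(),
            cap,
        }
    }

    // What a blocking `write_str` could do: wait until there is room.
    fn push_line(&self, line: String) {
        let mut lines = self.lines.lock().unwrap();
        while lines.len() >= self.cap {
            lines = self.not_full.wait(lines).unwrap();
        }
        lines.push_back(line);
    }

    // Called as the user scrolls past a line; frees room for the writer.
    fn consume_line(&self) -> Option<String> {
        let mut lines = self.lines.lock().unwrap();
        let line = lines.pop_front();
        self.not_full.notify_one();
        line
    }
}
```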

@AMythicDev
Owner

AMythicDev commented Nov 5, 2023

Thanks for filing the issue. Now, talking about it:

Scrolling the output consumes the buffer, allowing new lines to be written by write_str.

I don't think this could be done. If we "consume" the buffer as the user scrolls down the page, it would be impossible to show the same data again if the user decides to go back up.

I knew this problem would surely arise someday, and I have yet to figure out the correct solution to it. Here are some of the ideas I have thought of:

  1. Applications should buffer the data themselves and also override the default key bindings, so that whenever the user presses the down arrow, page down, G, or any other key that requires more data, the application fetches and pushes it.
  2. Get Feature Request: Hooks #95 working. This will also require applications to buffer the data themselves, but it does not require overriding every key binding that needs more data to be fetched.
  3. Format PagerState::rows lines of data initially and then use another thread to format the rest of the data so that the UI always remains responsive (a rough sketch of this is below).

I also believe that the solution isn't just one of them but rather a mixture of bits and pieces from all of these ideas.
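
Very roughly, (3) could look something like this (the names are illustrative, not the actual minus internals; `format_line` stands in for whatever per-line processing append_str does):

```rust
use std::sync::mpsc;
use std::thread;

// Placeholder for the real per-line formatting work.
fn format_line(raw: &str) -> String {
    raw.to_string()
}

// Sketch of idea (3): format only the first `rows` lines up front so the UI
// can be drawn immediately, and format the remainder on a worker thread.
fn format_in_background(
    input: Vec<String>,
    rows: usize,
) -> (Vec<String>, mpsc::Receiver<String>) {
    let mut it = input.into_iter();
    let visible: Vec<String> = it.by_ref().take(rows).map(|l| format_line(&l)).collect();

    // The rest is formatted off the UI thread and handed back over a channel;
    // the UI thread drains the receiver whenever it is idle.
    let rest: Vec<String> = it.collect();
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for line in rest {
            if tx.send(format_line(&line)).is_err() {
                break;
            }
        }
    });

    (visible, rx)
}
```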

Anyway, if you could share more about how exactly you are using minus, that would be quite helpful.

If you want to contribute, you are more than welcome to open a PR. Comment below with any suggestions you have.

EDIT: Give numbers to each point

@FlipB
Author

FlipB commented Nov 5, 2023

I don't think this could be done. If we "consume" the buffer as the user scrolls down the page it would be impossible to show the same data again if the user decides to go back up.

You are correct that you cannot scroll back up to the start if we limit the pager's buffer. However, if the input file is big enough, that's the only reasonable thing we can do, no?
I've looked into how less works: it has an option to limit the buffer to 64 KiB (so limited scrollback), but by default it uses unbounded buffers when reading from a pipe (see -B in the less man page).

I tried less (via the pager crate) with the same file that minus had problems with and it worked fine; it ended up using 1.8 GiB of memory.
So in this case, maybe the problem wasn't memory usage but rather performance.

However, I still think this feature is a good idea for my use case.
I'm parsing log files and want to provide a pager for the output (similar to how journalctl works). These files can surely get bigger, and I'm not too worried about having to provide complete scrollback for insanely large files (the user can just restart manually in that scenario).

Format PagerState::rows lines of data initially and then use another thread to format rest of the data so that the UI always remains responsive.

As I mentioned above, less worked fine even though it allocated memory for the entire input, so the problem in minus could very well primarily have been an unresponsive UI.

Applications should buffer the data themselves and also override the default key bindings the so that whenever the user presses the down arrow or page down or G or any key that requires more data to be fetched.

Get #95 working. Although this too will require applications to buffer the data themselves but it does not require overriding every key binding that will require more data to be fetched.

I think this could work, but minus has a rather simple interface today, and these API additions would have to be designed carefully so they are not too difficult to use.

@AMythicDev
Owner

I spent last night giving thought to each of these ideas; that's why I am replying quite late today.

I know about less's 64 KB approach, but the problem with implementing it is that many applications depending on minus right now might be benefiting from the current unbounded buffer.
For example, many applications use a pager to show real-time data on the screen, such as htop. If such an application pushes a lot of data to the pager at once (more than 64 KB), minus would cut the data off at the 64 KB boundary, which could cause a broken or stale display. When applications use less for showing real-time data they pipe the data, but minus doesn't care about the data source; that is the job of the applications depending on it. So introducing this could break the display in certain applications. I know we could add a function to control the buffer size, but that would still require applications to change their codebases, which in turn requires a major release.

(3) is another solution for applications, but I fear that it might require more knowledge of the data. I want minus to know about the data only quantitatively.

(1) can make the application code quite repetitive, since you have to call whatever your fetch function is in every key/mouse binding that moves the screen down.

I think the best option would be to implement (2) and have a hook like EOF. Applications first send a certain amount of data to the pager and attach their data-fetching function to this hook. Whenever the hook is activated by a user event like scrolling downward, the data-fetching function is called and more data is pulled in.
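
To illustrate the rough shape of (2) (none of these names exist in minus today; this is purely a hypothetical sketch):

```rust
use std::sync::mpsc;

// Hypothetical events the pager could emit to its hooks.
enum Event {
    NearEof, // the user scrolled close to the end of the buffered data
    Quit,
}

// Sketch of an EOF-like hook: instead of the application pushing everything
// up front, the pager asks for another batch whenever the user nears the end.
fn run_with_eof_hook(
    events: mpsc::Receiver<Event>,
    mut fetch_more: impl FnMut() -> Option<String>,
    mut append: impl FnMut(&str),
) {
    for event in events {
        match event {
            Event::NearEof => {
                if let Some(batch) = fetch_more() {
                    append(&batch);
                }
            }
            Event::Quit => break,
        }
    }
}
```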

@AMythicDev AMythicDev added O-dynamic Dynamic output A-output Output displayed S: vertical Vertical scrolling A: perf Performance improvements labels Nov 6, 2023
@TornaxO7
Contributor

I may have an idea for this:

What if minus internally uses a temporary file once its buffer reaches a specific limit? Then minus would only read some blocks of the temporary file rather than the whole file.

@AMythicDev
Owner

AMythicDev commented Dec 18, 2023

Could you describe it a bit more?
What I am currently understanding from your explanation is that we take the data from the application, write it into a temporary file, and also keep some part of the data in memory. As the user goes down and we need more data, we read more from the temp file and load it into the memory buffer. If this is exactly what you were saying, then I don't think it is a good idea, since we would have to take all the data from the application and write it into the temp file, which would already take a large amount of memory just to pass it from the application side to minus's side. I also don't want to use things like memory maps, if that's what you're suggesting, because anonymous memory maps basically don't exist on Windows.

@AMythicDev
Owner

AMythicDev commented Feb 6, 2024

I pushed some commits, specifically c0475fd and 4a015a8, which greatly improve text append throughput. I will get some concrete benchmarks posted tomorrow.

EDIT: Here are the benchmarks after the changes:
test state::bench::bench_append_str ... bench: 2,778,838,569 ns/iter (+/- 329,823,431)

test result: ok. 0 passed; 0 failed; 63 ignored; 1 measured; 0 filtered out; finished in 842.37s

I wasn't able to get benchmarks from before these changes, as the benchmark process just didn't finish even after 4 hours.

It still takes about 2.7 seconds, so there's still a lot to improve upon.

I have created #127 to track performance improvements all over the project.

@AMythicDev
Owner

@FlipB can you do this test again against the latest main and post the results?

@TornaxO7
Contributor

TornaxO7 commented Mar 15, 2024

@arijit79 Sorry for the late reply, I forgot to respond to this.

Could you describe a bit more?
What I am currently understanding from your explanation is that we take the data from the application and write it into a temporary file and also keep some part of the data in memory. As the user goes down and we need more data, we will read more data from the temp file and save it into the memory buffer. If this is exactly what you were saying then I don't think this is a good idea since we have to take all the data from the application and write into the temp file which would already take a large space on memory to pass from the application side to minus's side. I also don't want to use stuff like memory maps if you're talking about that because memory maps don't exist in Windows.

I think it's clearer if we use an image here:

[image: diagram of the chunking idea]

My idea is the following:
We split the whole document into smaller chunks. Some chunks are loaded into memory for quick access, while the rest reside in file buffers (either in memory or on disk).
The idea is that we only need to keep, for example, 3 chunks ahead of and behind the chunk the user is currently reading.
If the user moves into the next chunk while reading, we can discard the topmost in-memory chunk, since it's unlikely that the user will jump back to it quickly, and load the next chunk into memory instead.
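
In code, something like this (a rough sketch only, not the actual minus internals; the chunk size and window size are placeholders):

```rust
use std::collections::VecDeque;
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

const CHUNK_SIZE: u64 = 64 * 1024; // illustrative chunk size
const WINDOW: usize = 7;           // e.g. current chunk + 3 ahead + 3 behind

// The full text lives in a temporary file; only a small window of chunks
// around the user's position is kept in memory.
struct ChunkWindow {
    backing: File,
    first_loaded: u64,         // index of the first chunk currently in memory
    chunks: VecDeque<Vec<u8>>, // at most WINDOW chunks
}

impl ChunkWindow {
    fn load_chunk(&mut self, index: u64) -> io::Result<Vec<u8>> {
        let mut buf = vec![0u8; CHUNK_SIZE as usize];
        self.backing.seek(SeekFrom::Start(index * CHUNK_SIZE))?;
        let n = self.backing.read(&mut buf)?;
        buf.truncate(n);
        Ok(buf)
    }

    // As the user scrolls down past the window, drop the topmost chunk and
    // pull the next one in from the backing file.
    fn advance(&mut self) -> io::Result<()> {
        if self.chunks.len() >= WINDOW {
            self.chunks.pop_front();
            self.first_loaded += 1;
        }
        let next = self.first_loaded + self.chunks.len() as u64;
        let chunk = self.load_chunk(next)?;
        self.chunks.push_back(chunk);
        Ok(())
    }
}
```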

@AMythicDev
Owner

AMythicDev commented Mar 15, 2024

@arijit79 Sorry for the late reply, I've forgot to respond on this.

It's alright.

@TornaxO7 I really appreciate your efforts. I have created a branch called faster-append with some benchmarking code which creates a ~1GB buffer in minus. This will be our starting point for further development.
