Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow buffered input streams #23

Open
V0ldek opened this issue Sep 20, 2022 · 2 comments · Fixed by #149
Open

Allow buffered input streams #23

V0ldek opened this issue Sep 20, 2022 · 2 comments · Fixed by #149
Assignees
Labels
acceptance: go ahead Reviewed, implementation can start area: performance Performance improvements help wanted External contributions welcome type: feature New feature or request
Milestone

Comments

@V0ldek
Copy link
Member

V0ldek commented Sep 20, 2022

Is your feature request related to a problem? Please describe.
Current implementation reads the entire input to a string. This is not production-viable – very large files that we are targeting with all the performance improvements might not fit in memory. A first step would be to enable buffered reading – load a single page worth of input at a time. There are challenges here – it is possible for a single logical query step to span arbitrarily many blocks, e.g. JSON labels can be arbitrarily long.

Describe the solution you'd like
First of all, current implementations heavily rely on raw AlignedSlice data. This should be abstracted behind a buffered input that can yield slices on-demand.

Two, the query engines need to be made aware of this. They currently rely on having all the data available to index into the slice and compare labels. The engines also need to communicate to the classifiers at which point it is safe to stop keeping old input blocks in memory – we always need the entire label before the currently looked-at colon to be buffered, but after we examine it, it can be discarded.

@V0ldek V0ldek added type: feature New feature or request help wanted External contributions welcome acceptance: go ahead Reviewed, implementation can start labels Sep 20, 2022
@V0ldek V0ldek self-assigned this Sep 20, 2022
@V0ldek V0ldek removed their assignment Nov 13, 2022
@V0ldek V0ldek added this to the v1.0.0 milestone Nov 16, 2022
@V0ldek V0ldek added area: engine area: performance Performance improvements labels Nov 23, 2022
@V0ldek V0ldek self-assigned this Apr 9, 2023
@V0ldek V0ldek mentioned this issue May 11, 2023
3 tasks
V0ldek added a commit that referenced this issue May 12, 2023
A more abstract API to access the underlying byte stream replacing the reliance of the engines on a direct `&[u8]` slice access, to allow adding buffered input streams (#23) in the future. Two types were added, `OwnedBytes` and `BorrowedBytes`, to support the current easy scenario of having the bytes already in memory.

Ref: #23
@V0ldek V0ldek mentioned this issue Jun 13, 2023
3 tasks
V0ldek added a commit that referenced this issue Jun 14, 2023
- Added `MmapInput` which maps a file into memory on unix and windows.
- The CLI app now automatically decides which input to use, favoring mmap in most cases. This can be overriden with `--force-input`.

Ref: #23
@github-actions github-actions bot removed the acceptance: go ahead Reviewed, implementation can start label Jun 14, 2023
@V0ldek
Copy link
Member Author

V0ldek commented Jun 14, 2023

This is not closed yet. We need a smarter buffered input for when Mmap is unavailable, one that does not hold the entire input in memory.

@V0ldek V0ldek reopened this Jun 14, 2023
@V0ldek V0ldek added the acceptance: go ahead Reviewed, implementation can start label Jun 14, 2023
@github-actions github-actions bot added the acceptance: triage Waiting for owner's input label Jun 14, 2023
@github-actions
Copy link

Tagging @V0ldek for notifications

@V0ldek V0ldek removed the acceptance: triage Waiting for owner's input label Jun 28, 2023
@V0ldek V0ldek modified the milestones: v1.0.0, v1.1.0 Nov 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acceptance: go ahead Reviewed, implementation can start area: performance Performance improvements help wanted External contributions welcome type: feature New feature or request
Projects
Development

Successfully merging a pull request may close this issue.

1 participant