Fix deadlock in single-file filter drivers with large inputs #2262
Conversation
@copilot fix lint errors, and show me the test you have to reproduce the deadlock.
No lint errors in gix-filter. The clippy warnings shown are only in gix-tempfile (unrelated to my changes). The test reproducing the deadlock:

```rust
#[test]
fn large_file_with_cat_filter_does_not_hang() -> crate::Result {
    // Typical pipe buffer size on Unix systems
    const PIPE_BUFFER_SIZE: usize = 64 * 1024; // 64KB
    let mut state = gix_filter::driver::State::default();
    // Create a driver that uses the `cat` command (which echoes input to output immediately)
    let driver = Driver {
        name: "cat".into(),
        clean: Some("cat".into()),
        smudge: Some("cat".into()),
        process: None,
        required: false,
    };
    // Test with multiple sizes to ensure robustness
    for size in [
        PIPE_BUFFER_SIZE,
        2 * PIPE_BUFFER_SIZE,
        8 * PIPE_BUFFER_SIZE,
        16 * PIPE_BUFFER_SIZE,
    ] {
        let input = vec![b'a'; size];
        // Apply the filter - this should not hang
        let mut filtered = state
            .apply(
                &driver,
                &mut input.as_slice(),
                driver::Operation::Smudge,
                context_from_path("large-file.txt"),
            )?
            .expect("filter present");
        let mut output = Vec::new();
        filtered.read_to_end(&mut output)?;
        assert_eq!(
            input.len(),
            output.len(),
            "cat should pass through all data unchanged for {size} bytes"
        );
        assert_eq!(input, output, "cat should not modify the data");
    }
    Ok(())
}
```

This test verifies that filtering large files (64KB to 1MB) through the `cat` filter completes without hanging.
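The failure mode the test guards against can also be reproduced outside of gix-filter with plain `std::process`. Below is a minimal sketch (not code from the PR; it assumes a `cat` binary on PATH, and `naive_filter` is an illustrative name) of the naive write-then-read pattern. With a payload smaller than the OS pipe buffer (~64 KiB on Linux) it works; with a larger one, `write_all` blocks once the stdin pipe fills while `cat`'s stdout pipe also fills with nobody draining it, and both processes wait on each other forever.

```rust
use std::io::{Read, Write};
use std::process::{Command, Stdio};

/// The naive single-threaded pattern: write all input, then read all output.
/// Safe only while `input` fits in the OS pipe buffers; beyond that it deadlocks.
fn naive_filter(input: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut child = Command::new("cat")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Step 1: write all input. This blocks forever once the stdin pipe is
    // full and `cat` itself is blocked on its full, undrained stdout pipe.
    let mut stdin = child.stdin.take().expect("stdin was piped");
    stdin.write_all(input)?;
    drop(stdin); // close the pipe so `cat` sees EOF

    // Step 2: only now do we start draining stdout.
    let mut output = Vec::new();
    child
        .stdout
        .take()
        .expect("stdout was piped")
        .read_to_end(&mut output)?;
    child.wait()?;
    Ok(output)
}
```

Run with a 4 KiB payload this returns promptly; the test above exercises exactly the sizes (64 KiB and up) where this pattern hangs.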
@copilot Run …
```rust
// Solution: Read all data into a buffer, then spawn a thread to write it to stdin
// while we can immediately read from stdout.
let mut input_data = Vec::new();
std::io::copy(src, &mut input_data)?;
```
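The buffered approach quoted above can be sketched as a self-contained function (assuming a `cat` binary on PATH; the name `filter_through_cat` is illustrative, not from the PR). Buffering the input up front lets the writer thread own the data, so the parent thread can drain stdout concurrently and neither pipe can fill up and block:

```rust
use std::io::{Read, Write};
use std::process::{Command, Stdio};

fn filter_through_cat(src: &mut impl Read) -> std::io::Result<Vec<u8>> {
    // Buffer the entire input first; `src` is a borrowed reader and
    // cannot be moved into the writer thread.
    let mut input_data = Vec::new();
    std::io::copy(src, &mut input_data)?;

    let mut child = Command::new("cat")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // The writer thread owns both the buffered data and the stdin handle;
    // dropping `stdin` when the closure ends closes the pipe so the child sees EOF.
    let mut stdin = child.stdin.take().expect("stdin was piped");
    let writer = std::thread::spawn(move || stdin.write_all(&input_data));

    // Meanwhile the parent drains stdout, so the child never blocks on a
    // full stdout pipe while the writer blocks on a full stdin pipe.
    let mut output = Vec::new();
    child
        .stdout
        .take()
        .expect("stdout was piped")
        .read_to_end(&mut output)?;

    writer.join().expect("writer thread panicked")?;
    child.wait()?;
    Ok(output)
}
```

The trade-off is the one debated below: the whole input is held in memory, but the two pipes are always serviced concurrently, so the deadlock cannot occur at any size.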
@copilot instead of reading everything ahead of time, pass the child stdin handle to the thread instead.
The current API has `src: &mut impl std::io::Read`, which is a reference that can't be moved into a thread. To avoid buffering, would you like me to:

- Change the signature to take ownership (e.g., `src: impl std::io::Read + Send + 'static`)?
- Use a different approach like `Box<dyn std::io::Read + Send>` or channels?
- Something else?

Currently the code reads from `src` into a buffer before spawning the thread because the reference can't be moved. Could you clarify the preferred approach?
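For illustration only (not what the PR does, and `filter_streaming` is a hypothetical name): one way to avoid buffering while keeping the borrowed `src` is to invert the roles, moving the owned stdout handle into a thread and streaming `src` into stdin on the calling thread. Only owned handles cross the thread boundary, so the `&mut impl Read` signature can stay:

```rust
use std::io::Read;
use std::process::{Command, Stdio};

fn filter_streaming(src: &mut impl Read) -> std::io::Result<Vec<u8>> {
    let mut child = Command::new("cat")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // The reader thread owns stdout and drains it while we write,
    // so neither pipe can fill up unattended.
    let mut stdout = child.stdout.take().expect("stdout was piped");
    let reader = std::thread::spawn(move || {
        let mut output = Vec::new();
        stdout.read_to_end(&mut output).map(|_| output)
    });

    // The borrowed `src` never leaves this thread; stdin is owned and
    // dropped afterwards so the child sees EOF.
    let mut stdin = child.stdin.take().expect("stdin was piped");
    std::io::copy(src, &mut stdin)?;
    drop(stdin);

    let output = reader.join().expect("reader thread panicked")?;
    child.wait()?;
    Ok(output)
}
```

Note the limitation that likely motivates the redesign mentioned below: this collects the output eagerly into a `Vec`, whereas the real `apply` hands the caller a reader to consume lazily, so the inversion does not fit that contract without further changes.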
Have to discard it, as it probably needs a redesign or a much more intricate implementation.
It's a question of whether deadlocking is worse than possibly using a lot of memory.
Effectively, the only kinds of filters that work now are those which don't output large amounts of data immediately, or multi-file filters.
And… thinking about it, deadlocking is clearly worse, as it always fails, whereas using memory will typically work.
Co-authored-by: Byron <63622+Byron@users.noreply.github.com>
Force-pushed 9911218 to 1aaa6fa
Fix deadlock in single-file filter drivers with large inputs (issue #2080)
Solution completed: large inputs (64KB to 1MB) now pass through the `cat` filter without hanging.

Changes: the fix prevents deadlock by ensuring both stdin and stdout pipes are serviced concurrently.