Skip to content

feat: implement app/utils#269

Merged
emlautarom1 merged 32 commits into
mainfrom
emlautarom1/app-utils
Mar 13, 2026
Merged

feat: implement app/utils#269
emlautarom1 merged 32 commits into
mainfrom
emlautarom1/app-utils

Conversation

@emlautarom1
Copy link
Copy Markdown
Collaborator

Closes #245


The implementation is a lot shorter since the Rust libraries handle several edge cases that are hand-rolled in the reference Go implementation.

Only two dependencies are added to handle tar and gz files:

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements the pluto-app utils module to provide helper functions for short hex formatting, bundling an output directory into a .tar.gz, extracting an archive, and recursively comparing directories (with accompanying unit tests), as requested in #245.

Changes:

  • Added crates/app/src/utils.rs with hex_7, bundle_output, extract_archive, and compare_directories plus tests.
  • Exposed the new module via crates/app/src/lib.rs.
  • Added tar, flate2, tempfile, and test-case dependencies (workspace + crate).

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
crates/app/src/utils.rs New utilities for tar/gzip bundling/extraction and directory comparison; includes tests.
crates/app/src/lib.rs Exposes the new utils module from the crate root.
crates/app/Cargo.toml Adds dependencies required by utils and its tests.
Cargo.toml Adds workspace dependency versions for tar and flate2.
Cargo.lock Locks new dependency graph for tar/flate2 and transitive updates.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/app/src/utils.rs
Comment thread crates/app/src/utils.rs
Comment on lines +56 to +62
// Delete all files from the `target_dir`
fs::remove_dir_all(&target_path)?;
fs::create_dir_all(&target_path)?;

// Move the created tarball to the target location
let output_path = path::Path::new(target_path.as_ref()).join(filename.as_ref());
fs::rename(tar_file_path, output_path)?;
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bundle_output deletes target_path (removing the original output) before it knows the archive can be moved into place. If rename fails (e.g., due to an open handle or permissions), this will lose the original directory contents and leave only the temp archive behind. Safer flow is: fully finalize/close the archive, move it into target_path, then delete everything except the archive (or otherwise ensure the move cannot fail before deleting originals).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

@emlautarom1 emlautarom1 Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The move operation should not fail since we have permissions to manipulate the target_path, and we assume that we can manipulate any temporary file. Nevertheless, this is indeed an unsafe operation.

The original implementation uses the suggested approach, but it can also end up corrupted due to partially deleted data. I think my proposed implementation is better due to its brevity but I'm not against changing it if it's considered appropriate.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also the chance that fs::create_dir_all fails and we potentially lose data. I would suggest saving everything into target_path and then deleting old files manually, and not by remove_dir_all

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The challenge here is with the implementation: if we put the .tar.gz file in the target directory then we cannot use tar.append_dir_all which does a lot of heavy lifting. I don't see how operations could fail here though (if we can remove the dir then we can create it again, right?)

Comment thread crates/app/src/utils.rs Outdated
Comment thread crates/app/src/utils.rs
Comment thread crates/app/src/utils.rs Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 3, 2026 00:47
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/app/src/utils.rs
Comment thread crates/app/src/utils.rs Outdated
Comment thread crates/app/src/utils.rs
Comment thread crates/app/src/utils.rs Outdated
Comment thread crates/app/src/utils.rs
dir1: impl AsRef<path::Path>,
dir2: impl AsRef<path::Path>,
) -> Result<()> {
let mut entries1 = fs::read_dir(dir1)?.collect::<std::result::Result<Vec<_>, _>>()?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if it's worth to use tokio::fs here

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It gets complicated quickly due to recursive async. Given that the extract_archive function is already sync I would leave this one also sync, and if needed we can wrap it in a tokio::spawn_blocking.

Comment thread crates/app/src/utils.rs
Comment on lines +33 to +37
/// Returns the first 7 (or less) hex chars of the provided bytes.
pub fn hex_7(input: &[u8]) -> String {
let as_string = hex::encode(input);
as_string.chars().take(7).collect()
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same function is used in the peerinfo it would be great to port it there.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, but I don't immediately see a way to extract the common code (does not make sense for the example to depend on the entire pluto_app crate for a single function).
Any suggestions?

Comment thread crates/app/src/utils.rs Outdated
Comment thread crates/app/src/utils.rs
Comment on lines +56 to +62
// Delete all files from the `target_dir`
fs::remove_dir_all(&target_path)?;
fs::create_dir_all(&target_path)?;

// Move the created tarball to the target location
let output_path = path::Path::new(target_path.as_ref()).join(filename.as_ref());
fs::rename(tar_file_path, output_path)?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also the chance that fs::create_dir_all fails and we potentially lose data. I would suggest saving everything into target_path and then deleting old files manually, and not by remove_dir_all

Comment thread crates/app/src/utils.rs
Comment thread crates/app/src/utils.rs
Comment thread crates/app/src/utils.rs
Copilot AI review requested due to automatic review settings March 6, 2026 14:34
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread crates/app/src/utils.rs
Comment thread crates/app/src/utils.rs
Comment thread crates/app/src/utils.rs
varex83 and others added 3 commits March 10, 2026 09:05
* feat: add quic upgrade

* feat: add comment about peer addresses

* fix: linter

* fix: clean up tests

* fix: linter

* fix: review comments

* fix: review comments
* feat: add quic upgrade

* feat: add comment about peer addresses

* fix: linter

* fix: clean up tests

* fix: linter

* feat: add force direct connections

* fix: linter

* fix: review comments

* fix: review comments

* fix: review comments
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread crates/app/src/utils.rs
Copilot AI review requested due to automatic review settings March 12, 2026 14:31
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread crates/app/src/utils.rs
Comment thread crates/app/src/utils.rs
Comment thread crates/app/src/utils.rs
Comment on lines +164 to +170
if path1.is_dir() && path2.is_dir() {
compare_directories(&path1, &path2)?;
} else if path1.is_file() && path2.is_file() {
compare_file_contents(&path1, &path2)?;
} else {
return Err(UtilsError::TypeMismatch { path1, path2 });
}
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compare_directories uses PathBuf::is_dir() / is_file(), which follow symlinks. A symlink loop (or a symlink pointing outside the compared trees) can cause infinite recursion or unintended reads outside the directory being compared. Consider using fs::symlink_metadata and explicitly handling symlinks (e.g., treat as a distinct type mismatch, or compare link targets without following).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

@varex83 varex83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@emlautarom1
Copy link
Copy Markdown
Collaborator Author

@claude Impress me

@claude
Copy link
Copy Markdown

claude Bot commented Mar 13, 2026

Claude finished @emlautarom1's task in 2m 59s —— View job


Code Review — feat: implement app/utils

Summary: The implementation is clean and well-structured, with good test coverage for the happy paths. I found one correctness bug that can produce false mismatches, one incomplete decompression-bomb guard, a real data-loss scenario on cross-filesystem setups, and a handful of medium/low issues. Two of these warrant fixing before merge.


Findings

[Critical] Streaming file comparison is not read-safe

Impact: compare_file_contents can return a false ContentMismatch for two identical files that are >5MB, nondeterministically.
Evidence: crates/app/src/utils.rs:209-215

let n1 = file1.read(&mut buf1)?;
let n2 = file2.read(&mut buf2)?;

if n1 != n2 || buf1[..n1] != buf2[..n2] {
    return error;
}

std::io::Read::read() is not required to fill the buffer — it may return fewer bytes than available (partial reads from the OS). If file1.read() returns 8192 bytes but file2.read() returns 4096 bytes on the same iteration (both valid), n1 != n2 fires and reports ContentMismatch on two identical files.

Since the files' sizes are already verified equal before entering this branch, the fix is simple:

loop {
    let n = file1.read(&mut buf1)?;
    if n == 0 { break; }
    // read_exact is safe here because sizes are equal and we haven't reached EOF
    file2.read_exact(&mut buf2[..n])?;
    if buf1[..n] != buf2[..n] {
        return error;
    }
}

Fix this →


[High] Decompression-bomb guard checks attacker-controlled data

Impact: The MAX_FILE_SIZE guard on line 119 does not prevent a decompression bomb.
Evidence: crates/app/src/utils.rs:117-121

const MAX_FILE_SIZE: u64 = 100 * 1024 * 1024;
if entry.size() > MAX_FILE_SIZE {
    return Err(UtilsError::FileTooLarge(...));
}
entry.unpack_in(&target_path)?;

entry.size() returns the value from the tar header, which is attacker-controlled. A malicious archive can set the header size to 0 while containing many gigabytes of actual data. The guard passes, unpack_in reads the full content, and the 100MB limit is bypassed completely.

The only reliable approach is to wrap the extraction in a counted writer or limit via the OS (e.g., disk quota / filesystem limits). The current guard provides false security — the comment "to prevent decompression bombs" is inaccurate. It's worth either removing the false assurance or documenting that this is header-only and does not defend against adversarial archives.


[High] Data-loss window in bundle_output on cross-filesystem setups

Impact: If target_path is on a different filesystem than /tmp, fs::rename fails with EXDEV after the original directory has already been deleted.
Evidence: crates/app/src/utils.rs:91-95

fs::remove_dir_all(&target_path)?;   // original data gone
fs::create_dir_all(&target_path)?;
fs::rename(tar_file_path, output_path)?;   // <- EXDEV here on different FS

The archive is intact in /tmp, but the caller's output directory contents are permanently deleted. This is common in containerized environments where /tmp is tmpfs and the workspace is a bind-mounted volume.

The fix is to create the temp file in the same directory as the destination (same filesystem):

let tar_file = tempfile::NamedTempFile::new_in(target_path.as_ref())?;

Note: the target_path is the directory being archived, so the temp file would appear in the listing. A workaround is to use the parent of target_path, or a sibling temp directory. Either way, the archive must live on the same filesystem as the destination before any destructive operations.


[Medium] &PathBuf as function parameter is an anti-pattern

Evidence: crates/app/src/utils.rs:177

fn compare_file_contents(path1: &path::PathBuf, path2: &path::PathBuf) -> Result<()>

&PathBuf should almost always be &Path in Rust. &PathBuf coerces to &Path through Deref, but accepting &Path directly is more flexible (callers can pass &Path, &PathBuf, string references via Path::new). This is enforced by clippy's ptr_arg lint. Worth fixing while the code is new.


[Medium] Symlink cycles can cause infinite recursion in compare_directories

Evidence: crates/app/src/utils.rs:164-170

if path1.is_dir() && path2.is_dir() {
    compare_directories(&path1, &path2)?;

Path::is_dir() follows symlinks. A symlink to a parent directory (e.g., dir/link -> dir/) will recurse infinitely until stack overflow. Consider using symlink_metadata() + FileType::is_symlink() to detect and skip (or error on) symlinks, consistent with how extract_archive already handles them explicitly.


[Low] tempfile is a redundant dev-dependency

Evidence: crates/app/Cargo.toml:33 and :42

tempfile appears in both [dependencies] and [dev-dependencies]. Since bundle_output uses NamedTempFile in production code, tempfile correctly belongs in [dependencies]. The [dev-dependencies] entry is redundant and can be removed.


[Low] hex_7 code duplication with peerinfo

hex_7 is also implemented inline in crates/peerinfo/examples/peerinfo.rs (noted by @varex83). This PR doesn't resolve the duplication, and since peerinfo cannot reasonably depend on pluto-app for a single small function, this suggests hex_7 might belong in a lower-level crate (e.g., pluto-core or a new utility module). Not blocking, but worth tracking.


Parity Matrix (vs Go reference charon@v1.7.1)

Component Go Rust Match Notes
hex_7 present present
bundle_output present present EXDEV risk is Rust-specific
extract_archive present present symlink skip matches Go
compare_directories present present streaming compare is new; Go reads all to memory
Decompression guard not present present N/A Guard is incomplete (see above)

Tests

Tests are comprehensive for the happy paths and error conditions of compare_directories. The bundle_output integration test (bundle → extract → compare) is a good round-trip check. Missing:

  • A test exercising the streaming comparison branch (>5MB files) — this would expose the read() bug above
  • A test for FileTooLarge via a crafted tar header (acknowledged by author as intentional, but worth a note)

Summary

Two issues should be addressed before merge:

  1. The read() partial-fill bug in compare_file_contents — this is a real correctness defect that will cause nondeterministic failures for large files in production use.
  2. The EXDEV data-loss risk — straightforward to fix by placing the temp file on the same filesystem as the target.

The decompression-bomb guard deserves at minimum a comment clarifying its actual scope of protection. Everything else is polish.
| Branch: emlautarom1/app-utils

@emlautarom1 emlautarom1 merged commit 6ce3eac into main Mar 13, 2026
9 checks passed
@emlautarom1 emlautarom1 deleted the emlautarom1/app-utils branch March 13, 2026 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement app/utils

4 participants