
Add support for compressed QMDL #970

Draft
wgreenberg wants to merge 5 commits into main from qmdl-gzip

Conversation

@wgreenberg
Collaborator

this is prerequisite work for #81, since the diag logs for things like RSSI massively increase the size of QMDL files. in my experience, simply gzipping the qmdls reduces their size by 4-5x, which i think should be sufficient for our purposes.

this PR reworks QmdlWriter to output gzipped QMDL files by default, and allows QmdlReader to operate on either compressed or uncompressed QMDLs.

QmdlReader has been significantly rewritten to expose a single AsyncRead interface to both compressed and uncompressed QMDL sources.

i'd still like to do some more in-depth testing of this, but in the meantime i'd love a review on it
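For readers skimming the thread, here is a minimal sketch of what "a single read interface over both compressed and uncompressed sources" can look like, assuming enum dispatch. It is not the PR's code: the PR works with tokio's `AsyncRead`, while this uses `std::io::Read` to stay self-contained, and `QmdlSource` and the decoder parameter `D` are hypothetical names. In real code `D` would be a gzip decoder (for example from the `flate2` or `async-compression` crates) wrapped around the same file.

```rust
use std::io::{self, Read};

/// Hypothetical: one reader type for both kinds of QMDL input.
/// `D` stands in for a gzip decoder wrapped around the underlying file.
enum QmdlSource<R: Read, D: Read> {
    /// A plain `.qmdl` stream, read as-is.
    Uncompressed(R),
    /// A `.qmdl.gz` stream behind a decompressing reader.
    Compressed(D),
}

impl<R: Read, D: Read> Read for QmdlSource<R, D> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Callers always receive uncompressed bytes, whichever variant is active.
        match self {
            QmdlSource::Uncompressed(r) => r.read(buf),
            QmdlSource::Compressed(d) => d.read(buf),
        }
    }
}
```

Code that parses messages can then take any `Read` and never needs to know which variant it was handed.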

untitaker mentioned this pull request Apr 1, 2026
Comment thread lib/src/qmdl.rs
self.total_written += msg.data.len();
// for a gzipped file, we can't use `msg.data.len()` to
// determine the number of bytes written, so we have to
// manually do a `write_all()` type loop
Member

i'm not understanding this and i'm hoping you can help me

if what we're tracking here is the number of uncompressed bytes, why can't we still use write_all and msg.data.len()?

Collaborator Author

ah you're totally right, this is an outdated comment/implementation from when i was trying to track total_bytes_written rather than total_uncompressed_bytes
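The reasoning in the question can be made concrete: if the counter tracks uncompressed bytes, it can be bumped by `data.len()` right after `write_all`, because `write_all` either hands the whole slice to the underlying writer or returns an error. A minimal sketch using `std::io::Write` (the PR uses async writers); `CountingQmdlWriter` is a hypothetical name and only the `write_container` method name mirrors the thread.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for QmdlWriter. `W` would be the gzip encoder in the
/// compressed case, or the file itself in the uncompressed case.
struct CountingQmdlWriter<W: Write> {
    inner: W,
    total_uncompressed_bytes: usize,
}

impl<W: Write> CountingQmdlWriter<W> {
    fn new(inner: W) -> Self {
        Self { inner, total_uncompressed_bytes: 0 }
    }

    /// `write_all` only returns Ok once every byte of `data` has been accepted
    /// by `inner`, so `data.len()` is exactly the number of uncompressed bytes
    /// written, regardless of how many compressed bytes end up on disk.
    fn write_container(&mut self, data: &[u8]) -> io::Result<()> {
        self.inner.write_all(data)?;
        self.total_uncompressed_bytes += data.len();
        Ok(())
    }
}
```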

Member

ah ok. that makes sense

i do want to flag tho that when i try changing this back to use write_all, the tests fail so there may be another reason to use this approach which i also don't understand right now lol

Collaborator Author

i wager it's due to a missing .flush() call -- i'll push a commit w/ working tests in a sec
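Compressing writers share a hazard with buffered writers: bytes passed to `write_all` can sit inside the writer until it is flushed. A gzip stream is also only complete once its trailer (a CRC-32 and the uncompressed length, per RFC 1952) has been written, which encoders typically do when the stream is finished or shut down rather than on an ordinary flush. So a test that reads the output back too early can see a truncated or empty file. This sketch shows the buffering effect with std's `BufWriter` rather than a real gzip encoder; it illustrates the general hazard and is not a diagnosis of this PR's failing tests.

```rust
use std::io::{BufWriter, Write};

/// Writes `data` through a buffering writer and reports how many bytes had
/// reached the underlying Vec before and after flushing.
fn visible_before_and_after_flush(data: &[u8]) -> (usize, usize) {
    let mut writer = BufWriter::new(Vec::new());
    writer.write_all(data).unwrap();
    // Small writes stay in BufWriter's internal buffer; the Vec is still empty.
    let before = writer.get_ref().len();
    writer.flush().unwrap();
    // After flushing, the buffered bytes have been pushed into the Vec.
    let after = writer.get_ref().len();
    (before, after)
}
```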

Member

bmw left a comment

i didn't make the time to really test this (at least not yet), but i wanted to share the comments i had after reading the code and playing with things just a little bit so you could continue working on this

for my benefit at least as much as yours for when i come back to this, my comment above about continuing to use write_all in write_container has not been fully addressed yet

Comment thread check/src/main.rs
Comment on lines +140 to +142
let compressed = qmdl_path.ends_with(".gz");
let qmdl_file_size = qmdl_file.metadata().await.unwrap().len();
let mut qmdl_reader = QmdlReader::new(qmdl_file, Some(qmdl_file_size as usize));
let mut qmdl_reader = QmdlReader::new(qmdl_file, compressed, Some(qmdl_file_size as usize));
Member

the qmdl_file_size value won't work here if compressed is true right? it should probably be omitted like it was above
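The suggestion amounts to passing `None` for compressed files, since for a `.qmdl.gz` the on-disk length is the compressed size and says nothing useful about how many bytes the reader will produce. A hypothetical helper (the name `max_bytes_hint` is illustrative) that could feed the third argument of the PR's `QmdlReader::new(qmdl_file, compressed, ...)`:

```rust
/// Hypothetical helper: only a plain QMDL's on-disk size equals the number of
/// bytes the reader will produce. For a .qmdl.gz file the metadata length is
/// the compressed size, so no upper bound is passed at all.
fn max_bytes_hint(compressed: bool, file_size_on_disk: u64) -> Option<usize> {
    if compressed {
        None
    } else {
        Some(file_size_on_disk as usize)
    }
}
```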

Comment thread lib/src/qmdl.rs
Comment thread daemon/src/qmdl_store.rs
start_time: start_time.into(),
last_message_time: Some(last_message_time.into()),
qmdl_size_bytes: metadata.size() as usize,
uncompressed_qmdl_size_bytes: metadata.size() as usize,
Member

isn't metadata.size() going to be the compressed size for qmdl.gz files?

fixing this seems tricky. it seems to me like we either have to

  1. read each file in its entirety to determine the real uncompressed file size
  2. do the refactoring i believe you were talking about of removing the need for tracking max file size entirely by working at the level of HDLC containers

the latter seems much cleaner, but idk how much work it is

what do you think?
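Option 1 in the list above is short to write but costs a full decompression pass per file. A minimal sketch, assuming the file has already been wrapped in a gzip decoder (here any `Read` stands in for it): stream everything into `io::sink()` and use the byte count `io::copy` returns. Memory use stays constant because the bytes are discarded.

```rust
use std::io::{self, Read};

/// Counts the bytes produced by a (decompressing) reader without keeping them.
/// In practice `decompressed` would be a gzip decoder around the .qmdl.gz file.
fn uncompressed_len<R: Read>(mut decompressed: R) -> io::Result<u64> {
    io::copy(&mut decompressed, &mut io::sink())
}
```

Run once per stored file, this is the scan cost option 2 would avoid.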

Comment thread lib/src/qmdl.rs
reader: BufReader::new(reader),
bytes_read: 0,
max_bytes,
buf_reader: BufReader::new(QmdlAsyncReader::new(
Member

if we're going to continue accepting max_uncompressed_bytes in this function, what about using an approach something like this and dropping the handling of max bytes inside QmdlAsyncReader entirely? i think it'd allow us to simplify the code significantly
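The approach linked in the comment isn't captured on this page, so this is only one guess at the idea: cap the byte count at the outer layer with std's `Read::take` (tokio's `AsyncReadExt::take` is the async counterpart), applied after decompression so the limit counts uncompressed bytes. `limit_uncompressed` is a hypothetical name. Note that `Take` silently stops at the limit, unlike the `error!` log in the PR's current max-bytes check.

```rust
use std::io::{Read, Take};

/// Caps how many bytes callers can read from `reader`, so the inner byte
/// source needn't track max bytes itself. Apply it to the decompressed stream
/// so the limit is in uncompressed bytes. `None` means no cap.
fn limit_uncompressed<R: Read>(reader: R, max_uncompressed_bytes: Option<usize>) -> Take<R> {
    reader.take(max_uncompressed_bytes.map_or(u64::MAX, |n| n as u64))
}
```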

Comment thread daemon/src/server.rs
{
let entry =
ZipEntryBuilder::new(format!("{qmdl_idx}.qmdl").into(), Compression::Stored);
let extension = if compressed { "qmdl.gz" } else { "qmdl" };
Member

i didn't test this, but won't the resulting file here always be a qmdl file as reading using QmdlReader below will decompress it?

i personally think only including qmdl files in the zip is the nicest behavior for users anyway, but we should make sure the file extension is right regardless. if i'm correct and it should always just be qmdl, that'll simplify the code and diff here

Collaborator Author

yup, i ran into this while refactoring the write_all implementation, and am currently stuck rooting out the empty zip bug you mentioned below

Member

bmw, Apr 8, 2026

ah ok. let me know if you'd like a 2nd set of eyes on the empty zip problem. i hit it when trying to verify my understanding of the code here, but otherwise didn't really dig into it

Comment thread daemon/src/server.rs
let body_bytes = axum::body::to_bytes(body, usize::MAX).await.unwrap();

let zip_reader = ZipFileReader::new(body_bytes.to_vec()).await.unwrap();

Member

i think something is going wrong somewhere in here as the qmdl.gz file added to the zip here is empty. if i add code like this, tests pass on main but fail on this branch

        for entry in zip_reader.file().entries() {
            assert_ne!(entry.uncompressed_size(), 0); 
        }

Comment thread lib/src/qmdl.rs
if self.uncompressed_bytes_read > max_bytes {
error!(
"warning: {} bytes read, but max_bytes was {}",
self.uncompressed_bytes_read, max_bytes
Collaborator

uncompressed_bytes_read never gets incremented afaict
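If the counter stays in the reader, it has to be updated on every successful read, by the number of bytes that call actually returned (which can be less than the buffer size). A minimal synchronous sketch with a hypothetical `CountingReader`; in the PR's async reader the equivalent would be comparing `buf.filled().len()` before and after the inner `poll_read` completes.

```rust
use std::io::{self, Read};

/// Hypothetical wrapper that counts bytes as they come out of the
/// (already decompressed) inner reader.
struct CountingReader<R> {
    inner: R,
    uncompressed_bytes_read: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        // The increment the comment says is missing: add what this call
        // actually returned, which may be less than buf.len().
        self.uncompressed_bytes_read += n;
        Ok(n)
    }
}
```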

Comment thread lib/src/qmdl.rs
}

#[derive(Debug)]
struct QmdlAsyncReader<T> {
Collaborator

I struggle to understand why we have both QmdlReader and QmdlAsyncReader. can you rename one of them? I assume we need this layering, but i'm not sure why.
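One plausible reason for two types is a split between a byte-level source (plain or gunzipped bytes) and a message-level parser on top of it. This sketch shows that split with std's `BufRead`, assuming HDLC-style frames terminated by `0x7E` and ignoring escape handling; `FrameReader` is hypothetical, and how the PR actually divides work between `QmdlReader` and `QmdlAsyncReader` may differ. Names that describe each layer's role could make the layering easier to follow.

```rust
use std::io::{self, BufRead};

/// Frame-level layer (hypothetical): splits a byte stream into frames ending
/// in 0x7E. The layer underneath only supplies uncompressed bytes and knows
/// nothing about frames.
struct FrameReader<R: BufRead> {
    bytes: R,
}

impl<R: BufRead> FrameReader<R> {
    /// Returns the next frame including its trailing 0x7E, or None at EOF.
    /// A final chunk with no terminator is returned as-is.
    fn next_frame(&mut self) -> io::Result<Option<Vec<u8>>> {
        let mut frame = Vec::new();
        let n = self.bytes.read_until(0x7e, &mut frame)?;
        Ok(if n == 0 { None } else { Some(frame) })
    }
}
```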

Comment thread check/src/main.rs
.expect("failed to get QMDL file metadata")
.len();
let mut qmdl_reader = QmdlReader::new(qmdl_file, Some(file_size as usize));
let compressed = qmdl_path.ends_with(".gz");
Collaborator

I believe it would be easier to sniff gzip magic bytes. Then the distinction between gzip vs non-gzip can happen within the reader entirely, it doesn't need extra params, and no other code needs to be touched.
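A sketch of the sniffing idea: every gzip member starts with the bytes `0x1f 0x8b` (RFC 1952), and `BufRead::fill_buf` lets the reader peek at them without consuming anything, so the same stream can then go down either the plain or the gzip path. This uses std's synchronous `BufRead`; tokio's `AsyncBufReadExt::fill_buf` should allow the same peek in async code. `looks_like_gzip` is a hypothetical name.

```rust
use std::io::{self, BufRead};

/// Gzip members start with these two magic bytes (RFC 1952).
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];

/// Peeks at the start of the stream without consuming it.
/// Caveat: `fill_buf` may return fewer than 2 bytes even when more are
/// available; a robust version would retry, and this sketch treats that
/// case as "not gzip".
fn looks_like_gzip<R: BufRead>(reader: &mut R) -> io::Result<bool> {
    let peeked = reader.fill_buf()?;
    Ok(peeked.starts_with(&GZIP_MAGIC))
}
```

With this, the `compressed` flag and the `.gz` extension check could live inside the reader rather than at every call site.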

cooperq mentioned this pull request Apr 24, 2026