Version 2.8.1#64
Existing Link integration tests only asserted structure/finiteness, which the broken parser coincidentally satisfied despite returning 24-hour log durations and all-zero values. Tighten to catch that. Currently failing; will be made to pass by the next commit.
Preparation for replacing the byte-scan heuristic in parse_binary with proper block-based parsing. Introduces iter_blocks() for walking the `<u32 size><3-byte marker><u8 type>` block framing, and parse_ds3_channel() which locates channel ID/name/unit inside a ds3 block's 645-byte content region. No behavior change yet.
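A minimal sketch of the block framing this commit introduces, assuming the `<u32 size><3-byte marker><u8 type>` layout described above; the names `BlockHeader` and `parse_block_header` are illustrative, not the actual `iter_blocks()` API:

```rust
// Hypothetical decoder for one block header: a little-endian u32 size,
// a 3-byte ASCII marker (e.g. "ds3", "lf3"), and a u8 type tag.
#[derive(Debug, PartialEq)]
struct BlockHeader {
    size: u32,
    marker: [u8; 3],
    type_tag: u8,
}

fn parse_block_header(buf: &[u8]) -> Option<BlockHeader> {
    // A header is exactly 8 bytes; anything shorter cannot be framed.
    if buf.len() < 8 {
        return None;
    }
    let size = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]);
    Some(BlockHeader {
        size,
        marker: [buf[4], buf[5], buf[6]],
        type_tag: buf[7],
    })
}

fn main() {
    // Fabricated 8-byte header: size = 645, marker = "ds3", type = 0x01.
    let mut bytes = 645u32.to_le_bytes().to_vec();
    bytes.extend_from_slice(b"ds3");
    bytes.push(0x01);
    let hdr = parse_block_header(&bytes).unwrap();
    assert_eq!(hdr.size, 645);
    assert_eq!(&hdr.marker, b"ds3");
    assert_eq!(hdr.type_tag, 0x01);
}
```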
The previous parse_binary used a byte-scan heuristic for channel discovery that produced many false positives, read each 8-byte data pair as (value, time) when the actual format is (time, value), silently skipped the first 8 bytes of every channel's data section, and filtered out pre-trigger samples with negative times. Rewrite around proper block-based parsing: walk the file's <u32 size><3-byte marker><u8 type> framing, collect ds3 blocks for channel definitions, and read the (time_f32_LE, value_f32_LE) pair stream between consecutive ds3 blocks. iter_blocks advances block-to-block via scan-forward rather than fixed stride, which handles both the contiguous header blocks (lf3/ld2/lm1) and the variable-length data regions that follow each ds3. Verified against all four example Link files in the repo (one .llg, three .llg5) and additional .llgx files. Before: time range 0 to ~89477s (24.8 hours) and all-zero values on every file. After: physically plausible time ranges of 21-52 seconds and real RPM/MAP/lambda values.
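The (time, value) pair decoding can be sketched as follows; this is an illustrative fragment, not the shipped `parse_binary`, and `read_pairs` is a hypothetical helper name:

```rust
// Decode a channel's data region as (time_f32_LE, value_f32_LE) pairs --
// the field order the rewrite fixed (the old code read (value, time)).
fn read_pairs(data: &[u8]) -> Vec<(f32, f32)> {
    data.chunks_exact(8)
        .map(|c| {
            let time = f32::from_le_bytes([c[0], c[1], c[2], c[3]]);
            let value = f32::from_le_bytes([c[4], c[5], c[6], c[7]]);
            (time, value)
        })
        .collect()
}

fn main() {
    // Two fabricated samples: RPM-like values at t = 0.00s and t = 0.01s.
    let mut bytes = Vec::new();
    for (t, v) in [(0.0f32, 850.0f32), (0.01, 862.5)] {
        bytes.extend_from_slice(&t.to_le_bytes());
        bytes.extend_from_slice(&v.to_le_bytes());
    }
    let pairs = read_pairs(&bytes);
    assert_eq!(pairs, vec![(0.0, 850.0), (0.01, 862.5)]);
}
```

`chunks_exact(8)` also drops any trailing partial pair, so a truncated data region degrades gracefully instead of misaligning every subsequent read.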
Locate the ld2 metadata block via the blocks vector rather than using hardcoded absolute file offsets (survives any future change to lf3 size). Pre-allocate the all_times vector with the known total sample count to skip reallocations during push. Truncate the f64 times vector at the same 50_001-sample ceiling the inner loop enforces on data_matrix, so large logs don't pay the cost of a timeline allocation that gets discarded. Drop the now-redundant times[..data_matrix.len()].to_vec() slice at the end. No behavior change on any file in exampleLogs/link/ or on the three .llgx files from a real G4X install; same sample counts, same channel min/max values. All tests still pass.
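The allocation changes can be illustrated with a small sketch; `MAX_SAMPLES` and `build_timeline` are hypothetical names standing in for the commit's pre-allocation and 50_001-sample ceiling:

```rust
// Pre-size the timeline from the known total sample count, capped at the
// same ceiling the inner loop enforces on data_matrix, so oversized logs
// never allocate a timeline that would be discarded.
const MAX_SAMPLES: usize = 50_001;

fn build_timeline(total_samples: usize, sample_times: impl Iterator<Item = f64>) -> Vec<f64> {
    let mut all_times: Vec<f64> = Vec::with_capacity(total_samples.min(MAX_SAMPLES));
    for t in sample_times.take(MAX_SAMPLES) {
        all_times.push(t);
    }
    all_times
}

fn main() {
    // A fabricated 100k-sample log at 100 Hz gets truncated to the ceiling.
    let times = build_timeline(100_000, (0..100_000).map(|i| i as f64 * 0.01));
    assert_eq!(times.len(), 50_001);
}
```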
Signed-off-by: Cole Gentry <peapod2007@gmail.com>
Code Review
This pull request updates the ultralog version to 2.8.1 and significantly refactors the Link binary parser to improve robustness across various file subtypes. The parser now utilizes a block-based iteration strategy for metadata and channel extraction, and includes a 50,000-sample truncation limit for the data matrix. New test assertions have been added to verify the plausibility of timestamps and the presence of non-zero data. Review feedback suggests optimizing the data_matrix memory allocation by capping its capacity and replacing repeated binary searches with a linear merge approach for improved performance.
```rust
// Build the data matrix with last-observation-carried-forward semantics.
let row_cap = channels.len();
let mut data_matrix: Vec<Vec<Value>> = Vec::with_capacity(all_times.len());
```
The data_matrix is allocated with a capacity of all_times.len(), but the subsequent loop truncates processing at 50,001 samples (line 350). For logs with a high sampling rate or long duration, all_times.len() could be significantly larger than the truncation limit, leading to unnecessary memory allocation. Capping the capacity to the maximum possible number of rows is more efficient.
```diff
-let mut data_matrix: Vec<Vec<Value>> = Vec::with_capacity(all_times.len());
+let mut data_matrix: Vec<Vec<Value>> = Vec::with_capacity(all_times.len().min(50_001));
```
```diff
 match points.binary_search_by(|probe| {
     probe
         .0
         .partial_cmp(&time)
         .unwrap_or(std::cmp::Ordering::Equal)
 }) {
-    Ok(idx) => ch_data[idx].1,
+    Ok(idx) => points[idx].1,
     Err(idx) => {
         if idx == 0 {
-            ch_data[0].1
+            points[0].1
         } else {
-            ch_data[idx - 1].1
+            points[idx - 1].1
         }
     }
 }
};
```
Since both all_times and points are sorted, the current nested loop performs a binary search per timestamp, resulting in O(n log m) work per channel. A single linear merge, advancing one index through points as the outer loop walks all_times, would compute the same last-observation-carried-forward values in O(n + m).
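A sketch of the suggested linear merge; `locf_merge` is an illustrative name and the slices stand in for the actual `all_times`/`points` types:

```rust
// Two-pointer sweep: because both inputs are sorted, one forward pass
// yields the last-observation-carried-forward value per timestamp in
// O(n + m) instead of n binary searches.
fn locf_merge(all_times: &[f32], points: &[(f32, f32)]) -> Vec<f32> {
    if points.is_empty() {
        return Vec::new();
    }
    let mut out = Vec::with_capacity(all_times.len());
    let mut j = 0usize;
    for &t in all_times {
        // Advance while the next point is at or before t.
        while j + 1 < points.len() && points[j + 1].0 <= t {
            j += 1;
        }
        // Before the first sample this carries the first value forward,
        // matching the Err(0) branch of the binary-search version.
        out.push(points[j].1);
    }
    out
}

fn main() {
    let points = vec![(0.0f32, 1.0f32), (1.0, 2.0), (2.0, 3.0)];
    let times = vec![-0.5f32, 0.5, 1.0, 2.5];
    assert_eq!(locf_merge(&times, &points), vec![1.0, 1.0, 2.0, 3.0]);
}
```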