
Version 2.8.1 #64

Merged

SomethingNew71 merged 6 commits into main from dev on Apr 14, 2026
Conversation

@SomethingNew71 (Collaborator)

No description provided.

07-C9 and others added 6 commits on April 14, 2026 at 08:45
Existing Link integration tests only asserted structure/finiteness, which
the broken parser coincidentally satisfied despite returning 24-hour log
durations and all-zero values. Tighten the assertions to catch that.

Currently failing; will be made to pass by the next commit.
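
A minimal sketch of the kind of plausibility checks this tightening describes; the helper name, signature, and the ten-minute threshold are illustrative assumptions, not the repo's actual test code:

```rust
/// Sketch of tightened plausibility assertions. `times` and `values`
/// stand in for one parsed channel; the thresholds mirror the symptoms
/// described above (24-hour spans, all-zero samples).
fn assert_plausible(times: &[f32], values: &[f32]) {
    let min_t = times.first().copied().unwrap_or(0.0);
    let max_t = times.last().copied().unwrap_or(0.0);
    // The broken parser produced ~89,477 s (24.8 h) ranges; the example
    // logs run tens of seconds. Reject anything past ten minutes.
    assert!(max_t - min_t < 600.0, "implausible log duration");
    // The broken parser also returned all-zero values on every file.
    assert!(values.iter().any(|&v| v != 0.0), "all samples are zero");
}
```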
Preparation for replacing the byte-scan heuristic in parse_binary with
proper block-based parsing. Introduces iter_blocks() for walking the
`<u32 size><3-byte marker><u8 type>` block framing, and parse_ds3_channel()
which locates channel ID/name/unit inside a ds3 block's 645-byte content
region. No behavior change yet.
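
A simplified sketch of walking that framing. The header layout follows the description above, but the assumption that `size` counts only the content is a guess for illustration, and the real iter_blocks() scans forward to the next marker rather than striding by `size`:

```rust
/// One framed block: `<u32 size><3-byte marker><u8 type>` header,
/// followed by `size` bytes of content (an assumption here).
struct Block<'a> {
    marker: [u8; 3],
    block_type: u8,
    content: &'a [u8],
}

fn iter_blocks<'a>(data: &'a [u8]) -> impl Iterator<Item = Block<'a>> + 'a {
    let mut pos = 0usize;
    std::iter::from_fn(move || {
        // Header: u32 size (little-endian), 3-byte marker, u8 type.
        let header = data.get(pos..pos + 8)?;
        let size = u32::from_le_bytes(header[..4].try_into().ok()?) as usize;
        let marker = [header[4], header[5], header[6]];
        let block_type = header[7];
        let content = data.get(pos + 8..pos + 8 + size)?;
        pos += 8 + size;
        Some(Block { marker, block_type, content })
    })
}
```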
The previous parse_binary used a byte-scan heuristic for channel discovery
that produced many false positives, read each 8-byte data pair as
(value, time) when the actual format is (time, value), silently skipped
the first 8 bytes of every channel's data section, and filtered out
pre-trigger samples with negative times.

Rewrite around proper block-based parsing: walk the file's
<u32 size><3-byte marker><u8 type> framing, collect ds3 blocks for channel
definitions, and read the (time_f32_LE, value_f32_LE) pair stream between
consecutive ds3 blocks. iter_blocks advances block-to-block via
scan-forward rather than fixed stride, which handles both the contiguous
header blocks (lf3/ld2/lm1) and the variable-length data regions that
follow each ds3.

Verified against all four example Link files in the repo (one .llg, three
.llg5) and additional .llgx files. Before: time range 0 to ~89477s
(24.8 hours) and all-zero values on every file. After: physically
plausible time ranges of 21-52 seconds and real RPM/MAP/lambda values.
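
A sketch of decoding the (time_f32_LE, value_f32_LE) pair stream found between consecutive ds3 blocks; `region` stands for the raw bytes of one channel's data section and the name is illustrative:

```rust
/// Decode 8-byte pairs as (time, value), both little-endian f32.
fn read_pair_stream(region: &[u8]) -> Vec<(f32, f32)> {
    region
        .chunks_exact(8)
        .map(|pair| {
            // The old parser read each pair as (value, time); the actual
            // order is (time, value).
            let time = f32::from_le_bytes(pair[..4].try_into().unwrap());
            let value = f32::from_le_bytes(pair[4..].try_into().unwrap());
            (time, value)
        })
        .collect()
}
```

Note that, per the commit message, pre-trigger samples with negative times are kept rather than filtered out.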
Locate the ld2 metadata block via the blocks vector rather than using
hardcoded absolute file offsets (survives any future change to lf3 size).
Pre-allocate the all_times vector with the known total sample count to
skip reallocations during push. Truncate the f64 times vector at the
same 50_001-sample ceiling the inner loop enforces on data_matrix, so
large logs don't pay the cost of a timeline allocation that gets
discarded. Drop the now-redundant times[..data_matrix.len()].to_vec()
slice at the end.

No behavior change on any file in exampleLogs/link/ or on the three
.llgx files from a real G4X install; same sample counts, same channel
min/max values. All tests still pass.
Signed-off-by: Cole Gentry <peapod2007@gmail.com>
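
An illustrative sketch of the allocation changes in that last commit, with assumed names and types rather than the file's own: pre-size the merged timeline from the known sample total, and truncate at the same 50_001-sample ceiling the data-matrix loop enforces.

```rust
/// Build the merged f64 timeline from per-channel f32 timestamps.
fn build_timeline(channel_times: &[Vec<f32>]) -> Vec<f64> {
    let total: usize = channel_times.iter().map(Vec::len).sum();
    // Pre-allocating with the known total means pushes never reallocate.
    let mut all_times: Vec<f32> = Vec::with_capacity(total);
    for ch in channel_times {
        all_times.extend_from_slice(ch);
    }
    all_times.sort_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal));
    all_times.dedup();
    // Truncate before the f64 conversion so large logs never build a
    // timeline that the row cap would immediately discard.
    all_times.truncate(50_001);
    all_times.into_iter().map(f64::from).collect()
}
```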
@SomethingNew71 merged commit d6429f9 into main on Apr 14, 2026
1 check passed

@gemini-code-assist (Bot) left a comment


Code Review

This pull request updates the ultralog version to 2.8.1 and significantly refactors the Link binary parser to improve robustness across various file subtypes. The parser now utilizes a block-based iteration strategy for metadata and channel extraction, and includes a 50,000-sample truncation limit for the data matrix. New test assertions have been added to verify the plausibility of timestamps and the presence of non-zero data. Review feedback suggests optimizing the data_matrix memory allocation by capping its capacity and replacing repeated binary searches with a linear merge approach for improved performance.

Comment thread src/parsers/link.rs

```rust
// Build the data matrix with last-observation-carried-forward semantics.
let row_cap = channels.len();
let mut data_matrix: Vec<Vec<Value>> = Vec::with_capacity(all_times.len());
```


Severity: medium

The data_matrix is allocated with a capacity of all_times.len(), but the subsequent loop truncates processing at 50,001 samples (line 350). For logs with a high sampling rate or long duration, all_times.len() could be significantly larger than the truncation limit, leading to unnecessary memory allocation. Capping the capacity to the maximum possible number of rows is more efficient.

Suggested change:

```diff
-let mut data_matrix: Vec<Vec<Value>> = Vec::with_capacity(all_times.len());
+let mut data_matrix: Vec<Vec<Value>> = Vec::with_capacity(all_times.len().min(50_001));
```

Comment thread src/parsers/link.rs
Comment on lines +331 to 346
```diff
 match points.binary_search_by(|probe| {
     probe
         .0
         .partial_cmp(&time)
         .unwrap_or(std::cmp::Ordering::Equal)
 }) {
-    Ok(idx) => ch_data[idx].1,
+    Ok(idx) => points[idx].1,
     Err(idx) => {
         if idx == 0 {
-            ch_data[0].1
+            points[0].1
         } else {
-            ch_data[idx - 1].1
+            points[idx - 1].1
         }
     }
 }
};
```


Severity: medium

Since both all_times and points are sorted, the current nested loop performs repeated binary searches, resulting in $O(T \cdot C \cdot \log S)$ complexity. This can be optimized to $O(T \cdot C)$ by using a two-pointer approach (or maintaining a current index per channel) to linearly merge the sorted streams. While the 50,000 sample limit keeps this manageable, a linear merge would provide better performance as the number of channels or samples increases.
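
A sketch of the linear merge the comment suggests: one forward-only cursor per channel replaces the per-cell binary search while keeping the same last-observation-carried-forward semantics. Names and the simplified (f64, f64) point type are illustrative, not the PR's actual code:

```rust
/// Merge sorted per-channel (time, value) streams onto a sorted master
/// timeline. Cursors only ever advance, so the merge is O(T * C) overall
/// instead of O(T * C * log S). Assumes every channel is non-empty and
/// sorted by time, matching the original's points[0] fallback.
fn locf_merge(all_times: &[f64], channels: &[Vec<(f64, f64)>]) -> Vec<Vec<f64>> {
    let mut cursors = vec![0usize; channels.len()];
    all_times
        .iter()
        .map(|&t| {
            channels
                .iter()
                .zip(cursors.iter_mut())
                .map(|(points, cur)| {
                    // Step to the last sample at or before `t`
                    // (last observation carried forward).
                    while *cur + 1 < points.len() && points[*cur + 1].0 <= t {
                        *cur += 1;
                    }
                    points[*cur].1
                })
                .collect()
        })
        .collect()
}
```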

