Skip to content

Tag: add correlate(other) — toolbox-free Pearson correlation between two tags (first relational/two-tag analysis primitive) #340

Description

@HanSur94

Problem / motivation

FastSense's Tag analysis family has grown a rich set of primitives — getStats (#223), percentile/quantile (#339), spectrum (#338), derivative (#326), cumulativeIntegral (#327), crossings (#328), findPeaks (#329), exceedance (#316), movingStat (#312), findGaps (#306), resampleUniform (#308), stateDurations (#258). Every one of these is unary — it reduces or transforms a single series.

There is no primitive that relates two tags. DerivedTag/CompositeTag do combine tags, but arithmetically (a formula/expression) — they answer "what is A+B?", not "how strongly does A track B?". Confirmed absent: grep -rniE "correlat|xcorr|coheren|covarian|corrcoef" libs/**.m returns nothing (only cache-incoherence prose), and no existing issue proposes correlating/comparing two tags statistically.

"Do these two channels move together, and by how much?" is a canonical sensor-analysis question:

  • Redundant-sensor agreement — two thermocouples that should read the same.
  • Drift vs. reference — is a channel diverging from a known-good twin?
  • Cause/effect screening — does vibration rise with load?
  • Feature selection before deeper analysis.

Today a user must drop to getXY on both tags, hand-align the timestamps, drop NaNs, and hand-roll the Pearson formula — exactly the boilerplate the rest of the analysis family exists to eliminate.

Proposed feature

A base-Tag convenience method returning the Pearson correlation coefficient of two tags' time-aligned resolved series:

r = tagA.correlate(tagB);            % Pearson r in [-1, 1] over the overlapping window
r = tagA.correlate(tagB, t0, t1);    % over a time window (mirrors getStats/percentile range args)
[r, n] = tagA.correlate(tagB);       % optional: also return the aligned sample count

Rough sketch

Single method on libs/SensorThreshold/Tag.m (base class → every subclass inherits it, same placement as getStats #223 / percentile #339 / spectrum #338). The alignment machinery already exists — every Tag implements valueAt(t) (ZOH/right-biased lookup, e.g. DerivedTag.m:246), so no new resampling code is needed:

  1. [X, YA] = obj.getXYRange(t0, t1) (or getXY when no range) — reuse the range plumbing Tag: add a public getStats() statistics primitive (N/Min/Max/Mean/Rms/Std over a series or time range) #223/Tag: add percentile()/quantile() — toolbox-free order-statistics primitive (P50/P95/P99 & IQR, order-stat sibling to getStats #223) #339 use (Tag.m:125).
  2. Sample the other tag onto A's timestamps via ZOH: YB = arrayfun(@(t) other.valueAt(t), X) — reuses the existing per-subclass valueAt alignment primitive.
  3. Drop pairwise NaNs; guard n < 2 (return NaN, documented); guard zero-variance / constant channel (NaN, documented).
  4. Pearson r, toolbox-free:
    r = sum((a-mean(a)).*(b-mean(b))) / sqrt(sum((a-mean(a)).^2) * sum((b-mean(b)).^2))
    (equivalently core corrcoef, which is base MATLAB and Octave — not a toolbox function).

Pure-function and read-only: no new property, no toStruct/fromStruct change, no Tag/DashboardWidget contract touched. Output is a plain scalar (+ optional n).

Design note for triage: correlation requires time-aligned samples. The sketch aligns via ZOH valueAt onto the first tag's timestamps over the overlapping window — the same convention DerivedTag uses for mismatched parents (cf. the ZOH work in #323). The method header should state this policy explicitly. This is a documented design choice, not a blocker, and it reuses machinery already in the repo.

Value

High — opens an entirely new relational analysis dimension the library currently lacks, directly serving redundant-sensor agreement, drift, and cause/effect screening workflows that today force the user out of FastSense. It is the missing binary/relational sibling of an otherwise unary analysis family.

Constraints check

Effort estimate

S — one method in a single file (libs/SensorThreshold/Tag.m), plus a focused test: two identical tags → r≈1; A vs −A → r≈−1; uncorrelated → r≈0; mismatched-timestamp alignment via ZOH; NaN handling; n < 2 and zero-variance guards; time-range path.

Next-in-line siblings (out of scope here)

crossCorrelation(other) / lag(other) time-delay estimation is the natural relational follow-on (argmax-lag → transport delay / propagation), but needs a uniform grid (resample, #308) and a max-lag window, so it should follow this scalar-r primitive rather than bundle with it.


AI-proposed via /feature-scout — needs a human product decision before implementation.

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions