Skip to content

Issues with aggregate functions, date handling, null handling: GREATEST, LEAST, MAX, MIN, VAR, STDEV, VARP, QUART/2/3 #2481

@gabrielmongefranco

Description

@gabrielmongefranco

Bug Reports

Hello!

I have been using alasql for one of my projects (DataLaVista dashboard designer), and I noticed some issues with certain functions. I did some digging to find the definitions and then ran them by Claude Opus (LLM) to document the issues and propose solutions. I don't know if this is correct or not, but I figured I would post it here just in case these issues do indeed need fixing.

Bug: GREATEST / LEAST (scalar MAX / MIN) incorrect NULL handling and unreliable Date comparison

Functions affected: stdlib.GREATEST, stdlib.MAX, stdlib.LEAST, stdlib.MIN

Problem:

  • null and undefined arguments are not skipped. JavaScript coerces null to 0 in comparisons, producing wrong results (e.g., GREATEST(-1, null) returns null instead of -1).
  • Date objects are compared without .getTime(), which is unreliable.
    (Note: This bug was first reported on MIN() and MAX() functions not returning values for dates #2453)

Expected behavior (ANSI SQL): Skip NULLs; return NULL only if all arguments are NULL.

Fix:

stdlib.GREATEST = stdlib.MAX = function () {
    var args = Array.prototype.slice.call(arguments);
    // CHANGED: was a simple reduce with no null filtering and no Date handling
    return '(function(){ ' +
        'var vals = [' + args.join(',') + '].filter(function(x){ return x !== null && typeof x !== "undefined"; }); ' +
        'if(!vals.length) return undefined; ' +
        'return vals.reduce(function(a,b){ ' +
            // CHANGED: use .getTime() for Date comparison
            'var av = a instanceof Date ? a.getTime() : a; ' +
            'var bv = b instanceof Date ? b.getTime() : b; ' +
            'return av > bv ? a : b; }); ' +
    '})()';
};

stdlib.LEAST = stdlib.MIN = function () {
    var args = Array.prototype.slice.call(arguments);
    // CHANGED: was a simple reduce with no null filtering and no Date handling
    return '(function(){ ' +
        'var vals = [' + args.join(',') + '].filter(function(x){ return x !== null && typeof x !== "undefined"; }); ' +
        'if(!vals.length) return undefined; ' +
        'return vals.reduce(function(a,b){ ' +
            // CHANGED: use .getTime() for Date comparison
            'var av = a instanceof Date ? a.getTime() : a; ' +
            'var bv = b instanceof Date ? b.getTime() : b; ' +
            'return av < bv ? a : b; }); ' +
    '})()';
};

Bug: MIN / MAX with OVER (PARTITION BY ...) returns NaN for Date objects

Functions affected: Window aggregate handling in 40select.js, inside the compile() method of yy.Select

File to fix: src/40select.js

Problem: When MIN or MAX is used as a window function with an OVER (PARTITION BY ...) clause, the value is computed using Math.max.apply(null, values) and Math.min.apply(null, values). These functions coerce their arguments via the abstract ToNumber operation, which does not reliably convert Date objects to timestamps in this context, returning NaN instead of the correct date value.

Additionally, when the partition contains no non-null values, the result is null rather than undefined, which is inconsistent with how alasql represents NULL elsewhere.

Expected behavior (ANSI SQL): Return the maximum or minimum value in the partition, correctly handling Date objects; return NULL for an empty partition.

Current code in 40select.js:

case 'MAX':
    aggregateValue = values.length > 0 ? Math.max.apply(null, values) : null;
    break;
case 'MIN':
    aggregateValue = values.length > 0 ? Math.min.apply(null, values) : null;
    break;

Fix:

case 'MAX':
    // CHANGED: was Math.max.apply(null, values) which returns NaN for Date objects;
    // use reduce with explicit Date-aware comparison instead.
    // CHANGED: was returning null for empty partition; changed to undefined for
    // consistency with how alasql represents NULL elsewhere.
    if (values.length === 0) {
        aggregateValue = undefined;
    } else {
        aggregateValue = values.reduce(function(a, b) {
            var av = a instanceof Date ? a.getTime() : a;
            var bv = b instanceof Date ? b.getTime() : b;
            return av > bv ? a : b;
        });
    }
    break;
case 'MIN':
    // CHANGED: was Math.min.apply(null, values) which returns NaN for Date objects;
    // use reduce with explicit Date-aware comparison instead.
    // CHANGED: was returning null for empty partition; changed to undefined for
    // consistency with how alasql represents NULL elsewhere.
    if (values.length === 0) {
        aggregateValue = undefined;
    } else {
        aggregateValue = values.reduce(function(a, b) {
            var av = a instanceof Date ? a.getTime() : a;
            var bv = b instanceof Date ? b.getTime() : b;
            return av < bv ? a : b;
        });
    }
    break;

Bug: VAR returns 0 instead of NULL for fewer than 2 non-null values

Functions affected: alasql.aggr.VAR, alasql.aggr.STDEV (which delegates to VAR)

Problem: Returns 0 when fewer than 2 non-null values are present. ANSI SQL VAR_SAMP requires NULL in this case. STDEV inherits this bug, returning Math.sqrt(0) = 0 instead of NULL.

Expected behavior (ANSI SQL): VAR_SAMP and STDDEV_SAMP return NULL for fewer than 2 non-null values.

Fix:

alasql.aggr.VAR = function (v, s, stage) {
    if (stage === 1) {
        return v === null ? {sum: 0, sumSq: 0, count: 0} : {sum: v, sumSq: v * v, count: 1};
    } else if (stage === 2) {
        if (v !== null) {
            s.sum += v;
            s.sumSq += v * v;
            s.count++;
        }
        return s;
    } else {
        if (s.count > 1) {
            return (s.sumSq - (s.sum * s.sum) / s.count) / (s.count - 1);
        } else {
            // CHANGED: was return 0, which is wrong; ANSI SQL VAR_SAMP requires NULL for n < 2
            return undefined;
        }
    }
};

alasql.aggr.STDEV = function (v, s, stage) {
    if (stage === 1 || stage === 2) {
        return alasql.aggr.VAR(v, s, stage);
    } else {
        // NOTE: remove the duplicate definition of STDEV that appears after this one in the source
        return Math.sqrt(alasql.aggr.VAR(v, s, stage));
    }
};

Bug: STDEV is defined twice; second definition silently overwrites the first

Functions affected: alasql.aggr.STDEV

Problem: alasql.aggr.STDEV is defined twice in functions.js with identical bodies. The second definition silently overwrites the first. This is a latent maintenance hazard: any future change made to one copy but not the other will introduce a silent bug.

Expected behavior: STDEV should be defined exactly once.

Fix: Remove the second definition of alasql.aggr.STDEV. The single remaining definition is:

alasql.aggr.STDEV = function (v, s, stage) {
    if (stage === 1 || stage === 2) {
        return alasql.aggr.VAR(v, s, stage);
    } else {
        return Math.sqrt(alasql.aggr.VAR(v, s, stage));
    }
};
// REMOVED: the following duplicate block that appeared immediately after the above
// alasql.aggr.STDEV = function (v, s, stage) {
//     if (stage === 1 || stage === 2) {
//         return alasql.aggr.VAR(v, s, stage);
//     } else {
//         return Math.sqrt(alasql.aggr.VAR(v, s, stage));
//     }
// };

Bug: VARP does not skip NULL values and returns 0 instead of NULL for empty input

Functions affected: alasql.aggr.VARP, alasql.aggr.STDEVP, alasql.aggr.STD, alasql.aggr.STDDEV

Problem:

  • A null input at stage 1 computes null * null = 0, silently corrupting sumSq.
  • An undefined input at stage 1 computes undefined * undefined = NaN, which propagates silently through all subsequent calculations.
  • Neither null nor undefined is skipped in stage 2.
  • Returns 0 for an empty set instead of NULL.

ANSI SQL VAR_POP requires NULLs to be skipped and NULL returned for an empty set. STDEVP, STD, and STDDEV inherit all of these bugs via VARP.

Expected behavior (ANSI SQL): Skip NULLs; return NULL for an empty set.

Fix:

alasql.aggr.VARP = function (value, accumulator, stage) {
    if (stage === 1) {
        // CHANGED: was missing null/undefined guard; null*null=0 and undefined*undefined=NaN
        // both silently corrupt sumSq
        if (value === null || value === undefined)
            return {count: 0, sum: 0, sumSq: 0};
        return {count: 1, sum: value, sumSq: value * value};
    } else if (stage === 2) {
        // CHANGED: was missing null/undefined guard
        if (value !== null && value !== undefined) {
            accumulator.count++;
            accumulator.sum += value;
            accumulator.sumSq += value * value;
        }
        return accumulator;
    } else {
        // CHANGED: was return 0 for empty set; ANSI SQL VAR_POP requires NULL
        if (accumulator.count === 0) return undefined;
        var mean = accumulator.sum / accumulator.count;
        return accumulator.sumSq / accumulator.count - mean * mean;
    }
};

// STDEVP, STD, STDDEV delegate to VARP and inherit the fix automatically
alasql.aggr.STD =
    alasql.aggr.STDDEV =
    alasql.aggr.STDEVP =
        function (v, s, stage) {
            if (stage == 1 || stage == 2) {
                return alasql.aggr.VARP(v, s, stage);
            } else {
                return Math.sqrt(alasql.aggr.VARP(v, s, stage));
            }
        };

Bug: QUART / QUART2 / QUART3 use wrong formula and no interpolation; not ANSI SQL compliant

Functions affected: alasql.aggr.QUART, alasql.aggr.QUART2, alasql.aggr.QUART3

Problem:

  • The formula p = nth * (n+1) / 4 is not the correct base formula for ANSI SQL PERCENTILE_CONT.
  • The result is a step function: only Math.floor(p) is taken with no interpolation between adjacent values. ANSI SQL PERCENTILE_CONT requires linear interpolation.
  • undefined values are not excluded in stage 2 (only null is checked), allowing them to enter the sort.
  • Returns [] instead of undefined (NULL) for an empty set.

Expected behavior (ANSI SQL): Linear interpolation per PERCENTILE_CONT; skip NULLs; return NULL for empty input.

Fix:

alasql.aggr.QUART = function (v, s, stage, nth) {
    // CHANGED: was missing undefined guard in stage 1
    if (stage === 1) return (v === null || v === undefined) ? [] : [v];
    // CHANGED: was missing undefined guard in stage 2
    if (stage === 2) { if (v !== null && v !== undefined) s.push(v); return s; }
    // CHANGED: was returning [] for empty set; ANSI SQL requires NULL
    if (!s.length) return undefined;
    nth = nth || 1;
    var r = s.slice().sort(function(a, b) { return a - b; });
    var n = r.length;
    // CHANGED: was p = nth * (n+1) / 4 with no interpolation (step function);
    // ANSI SQL PERCENTILE_CONT uses linear interpolation on a 0-indexed scale
    var h = (nth / 4) * (n - 1);
    var hf = Math.floor(h);
    var frac = h - hf;
    if (frac === 0) return r[hf];
    return r[hf] + frac * (r[hf + 1] - r[hf]);
};

alasql.aggr.QUART2 = function (v, s, stage) {
    return alasql.aggr.QUART(v, s, stage, 2);
};

alasql.aggr.QUART3 = function (v, s, stage) {
    return alasql.aggr.QUART(v, s, stage, 3);
};

Regards,

Gabriel Mongefranco
Mobile Data Architect
Eisenberg Family Depression Center, University of Michigan

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions