FAQ

Gordon Woodhull edited this page May 14, 2018 · 53 revisions

dc.js Frequently Asked Questions

Why are some of my charts not filtering?

Two charts on the same dimension will not filter each other. More precisely, a group will not observe filters on its own dimension. This is the design of crossfilter:

Note: a grouping intersects the crossfilter's current filters, except for the associated dimension's filter. Thus, group methods consider only records that satisfy every filter except this dimension's filter. So, if the crossfilter of payments is filtered by type and total, then group by total only observes the filter by type.

https://github.com/square/crossfilter/wiki/API-Reference#dimension_group

The assumption is that you don't want to remove data from the current chart when you filter within it. (Instead, dc.js will draw filtered-out data in grey on the filtering chart, but it's still there.)

If you want to have two charts tracking the same data and filtering each other, create a duplicate dimension and give each chart its own dimension and group. Except with range/focus charts, you almost always want each chart to have its own dimension, with its group created from the dimension.

How does dc.js work?

There really is no magic here - when you filter a chart, it sets the filter on the corresponding dimension object. Then the chart broadcasts a redraw message to the other charts in the chart group using the Chart Registry. Then all charts in the chart group pull new data from their crossfilter groups and animate from the old data to the new data.

(These are two separate meanings of the word "group". A crossfilter group is really a grouping or binning of the data. A chart group is a set of charts that respond to each other. Usually a chart group is associated with a crossfilter object.)

In almost all cases, a dimension is write-only and a group is read-only. The only exception in standard dc.js usage is that the data chart pulls ungrouped data directly from its dimension.

Why doesn't elasticX respond when groups become empty?

As far as crossfilter is concerned, a bin still exists if its value is zero - and dc.js will happily draw empty bins. See remove empty bins for a "fake group" that will remove these bins dynamically, causing the domain to get smaller.

How do I reuse custom reduce functions?

dc.js uses Crossfilter's generic group reduce to let you specify initialize, add, and remove functions for custom aggregation of groups. Typically, these are anonymous inline functions with field names hardcoded, but you can instead use a closure to return such a function with custom parameters (thanks @jefffriesen):

// create functions to generate averages for any attribute
function reduceAddAvg(attr) {
  return function(p,v) {
    if (_.isLegitNumber(v[attr])) {
      ++p.count;
      p.sums += v[attr];
      p.averages = (p.count === 0) ? 0 : p.sums/p.count; // guard against dividing by zero
    }
    return p;
  };
}
function reduceRemoveAvg(attr) {
  return function(p,v) {
    if (_.isLegitNumber(v[attr])) {
      --p.count;
      p.sums -= v[attr];
      p.averages = (p.count === 0) ? 0 : p.sums/p.count;
    }
    return p;
  };
}
function reduceInitAvg() {
  return {count:0, sums:0, averages:0};
}
...
var group = dim.group().reduce(reduceAddAvg(attr), reduceRemoveAvg(attr), reduceInitAvg);

Or, check out Ethan Jewett's reductio library.
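
To see what these reducers do without setting up crossfilter, you can call them by hand. This is only a sketch: isLegitNumber is a hypothetical stand-in for the _.isLegitNumber mixin used above, and the speed field and rows are made up.

```javascript
// Hypothetical stand-in for the _.isLegitNumber mixin used above.
function isLegitNumber(x) {
  return typeof x === 'number' && isFinite(x);
}

function reduceAddAvg(attr) {
  return function(p, v) {
    if (isLegitNumber(v[attr])) {
      ++p.count;
      p.sums += v[attr];
      p.averages = p.sums / p.count;
    }
    return p;
  };
}
function reduceRemoveAvg(attr) {
  return function(p, v) {
    if (isLegitNumber(v[attr])) {
      --p.count;
      p.sums -= v[attr];
      p.averages = (p.count === 0) ? 0 : p.sums / p.count;
    }
    return p;
  };
}
function reduceInitAvg() {
  return {count: 0, sums: 0, averages: 0};
}

// Drive the reducers roughly the way crossfilter would for a single bin.
var add = reduceAddAvg('speed'), remove = reduceRemoveAvg('speed');
var rows = [{speed: 10}, {speed: 20}, {speed: 'n/a'}]; // made-up rows
var bin = rows.reduce(add, reduceInitAvg());
// bin is now {count: 2, sums: 30, averages: 15}; the 'n/a' row was skipped
bin = remove(bin, rows[0]);
// bin is now {count: 1, sums: 20, averages: 20}
```

Because attr is captured by the closure, the same three functions can be reused for any numeric field.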

How do I reduce multiple values at once?

There are lots of ways to do reductions with crossfilter. I'll just cover the two most common cases here: rows that contain a single value but a different kind of value per row, and rows that contain multiple values. Both use the general form of group.reduce.

What if rows contain a single value but a different value per row?

Say each row has a type field that determines which total the row's value field contributes to.

var group = dimension.group().reduce(
    function(p, v) { // add
        p[v.type] = (p[v.type] || 0) + v.value;
        return p;
    },
    function(p, v) { // remove
        p[v.type] -= v.value;
        return p;
    },
    function() { // initial
        return {};
    });

This reduces the sum of any field types it finds; if you want a count, use 1 instead of v.value.
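
Here is a sketch of what one bin looks like as rows are added and removed, calling the same three functions by hand. The rows and type names are made up for illustration.

```javascript
var add = function(p, v) {
  p[v.type] = (p[v.type] || 0) + v.value;
  return p;
};
var remove = function(p, v) {
  p[v.type] -= v.value;
  return p;
};
var initial = function() { return {}; };

// Made-up rows that would all fall into the same bin.
var rows = [
  {type: 'cash', value: 5},
  {type: 'card', value: 7},
  {type: 'cash', value: 3}
];
var bin = rows.reduce(add, initial()); // {cash: 8, card: 7}
bin = remove(bin, rows[1]);            // {cash: 8, card: 0}
```

Note that a type stays in the object with a value of 0 once all its rows are removed, and a type that never appeared in a bin is simply missing, so an accessor such as function(d) { return d.value.card || 0; } is a safe way to read it.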

What if the rows contain multiple values?

Here we use the reusable reduce function pattern from above, to avoid tying the functions to global variables:

function reduceFieldsAdd(fields) {
    return function(p, v) {
        fields.forEach(function(f) {
            p[f] += v[f];
        });
        return p;
    };
}
function reduceFieldsRemove(fields) {
    return function(p, v) {
        fields.forEach(function(f) {
            p[f] -= v[f];
        });
        return p;
    };
}
function reduceFieldsInitial(fields) {
    return function() {
        var ret = {};
        fields.forEach(function(f) {
            ret[f] = 0;
        });
        return ret;
    };
}

var fields = ['a', 'b', 'c' /* , ... */]; // whatever fields you need
var group = dimension.group().reduce(reduceFieldsAdd(fields), reduceFieldsRemove(fields), reduceFieldsInitial(fields));

As above, if you want a count instead of a sum, use 1 instead of v[f].

How do I tell whether my groups are functioning correctly / whether my input data is good?

Set a breakpoint on the chart initialization, after the groups are created. Run group.all() in the debug console and see whether the keys and values make sense. In particular, take a look at whether the value member of each item in the array matches what the accessors (which usually take those array items as input, not just the value part) expect.
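
If you would rather check this in code than by eye, a small helper can flag bins whose accessor result is not a usable number. This is a sketch only: suspicious_bins is a hypothetical helper, and the group here is a stand-in object with made-up data rather than a real crossfilter group.

```javascript
// Hypothetical helper: list the keys of bins whose value is not a finite number.
function suspicious_bins(group, valueAccessor) {
  return group.all().filter(function(kv) {
    var y = valueAccessor(kv);
    return typeof y !== 'number' || !isFinite(y);
  }).map(function(kv) { return kv.key; });
}

// Stand-in for a crossfilter group; only .all() is needed here.
var group = {
  all: function() {
    return [
      {key: 'a', value: {total: 3}},
      {key: 'b', value: {total: NaN}}, // e.g. a sum over unparsed strings
      {key: 'c', value: {}}            // field missing entirely
    ];
  }
};

// The accessor takes the whole {key, value} item, as chart accessors usually do.
var bad = suspicious_bins(group, function(kv) { return kv.value.total; });
// bad is ['b', 'c']
```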

How do I filter the data before it's charted?

(E.g. to completely remove empty groups, or create a cumulative line chart or bar chart.) There are two ways to do this.

  1. One way is to use the .data() function. However, this currently won't work with charts that use .data() internally, which is most of them; see #584.
  2. Another way is to create a "fake group". The idea is to wrap the original group from crossfilter in another object which will first fetch the results from the original group and then do something to them: add bins, remove bins, manipulate keys or values.

Fake Groups

dc.js uses a very limited part of the crossfilter API - in fact, it really only uses dimension.filter() and group.all(). (It also currently uses group.top() but this will go away in v2.1. And it uses crossfilter.quicksort.)

So to change the way dc.js pulls data, create an object with a .all() method and pass this "fake group" to your chart where you would have passed the original group, and your chart will read from it instead.
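
As a minimal sketch of the idea, here is a fake group that rescales values. scale_group is a hypothetical example, and the source group is a stand-in object with made-up data.

```javascript
// Stand-in for a crossfilter group (made-up data).
var group = {
  all: function() {
    return [{key: 'a', value: 2}, {key: 'b', value: 4}];
  }
};

// A fake group: fetch from the source on every call, then transform.
function scale_group(source_group, factor) { // hypothetical example
  return {
    all: function() {
      return source_group.all().map(function(kv) {
        // return new objects so the source group's results are left untouched
        return {key: kv.key, value: kv.value * factor};
      });
    }
  };
}

var percent = scale_group(group, 100);
var scaled = percent.all();
// scaled is [{key: 'a', value: 200}, {key: 'b', value: 400}]
```

The work happens inside .all() rather than once up front, so the fake group reflects the current filters each time a chart redraws.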

Some fake group generation functions are shown below. Each takes a group and produces a fake group which you pass to dc.js instead of the original group.

Add them to your usual crossfilter code like this:

var ndx = crossfilter(...)
var dim = ndx.dimension(...)
var group = dim.group(...) ... 

var filtered_group = remove_empty_bins(group) // or filter_bins, or whatever

chart.dimension(dim)
    .group(filtered_group)
    ...

Some examples of "fake groups" follow.

Remove empty bins

function remove_empty_bins(source_group) {
    return {
        all:function () {
            return source_group.all().filter(function(d) {
                //return Math.abs(d.value) > 0.00001; // if using floating-point numbers
                return d.value !== 0; // if integers only
            });
        }
    };
}

Filter out bins by a predicate function on the values

function filter_bins(source_group, f) {
    return {
        all:function () {
            return source_group.all().filter(function(d) {
                return f(d.value);
            });
        }
    };
}
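
A quick usage sketch, with a stand-in group and made-up data. It also shows that remove_empty_bins is just filter_bins with a particular predicate.

```javascript
function filter_bins(source_group, f) {
    return {
        all: function () {
            return source_group.all().filter(function(d) {
                return f(d.value);
            });
        }
    };
}

// Stand-in for a crossfilter group (made-up data).
var group = {
    all: function() {
        return [{key: 'a', value: 0}, {key: 'b', value: 12}, {key: 'c', value: 3}];
    }
};

var big = filter_bins(group, function(v) { return v >= 5; }).all();
// big is [{key: 'b', value: 12}]

var non_empty = filter_bins(group, function(v) { return v !== 0; }).all();
// non_empty is [{key: 'b', value: 12}, {key: 'c', value: 3}]
```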

Ensure that bins exist even if there are no values in them

function ensure_group_bins(source_group) { // (source_group, bins...)
    var bins = Array.prototype.slice.call(arguments, 1);
    return {
        all:function () {
            var result = source_group.all().slice(0), // copy original results (we mustn't modify them)
                found = {};
            result.forEach(function(d) {
                found[d.key] = true;
            });
            bins.forEach(function(d) {
                if(!found[d])
                    result.push({key: d, value: 0});
            });
            return result;
        }
    };
}
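
A usage sketch with a stand-in group and made-up keys. The function is repeated so that the example runs on its own.

```javascript
function ensure_group_bins(source_group) { // (source_group, bins...)
    var bins = Array.prototype.slice.call(arguments, 1);
    return {
        all: function () {
            var result = source_group.all().slice(0), // copy; don't modify the original
                found = {};
            result.forEach(function(d) {
                found[d.key] = true;
            });
            bins.forEach(function(d) {
                if(!found[d])
                    result.push({key: d, value: 0});
            });
            return result;
        }
    };
}

// Stand-in for a crossfilter group that only has one bin so far.
var group = {
    all: function() { return [{key: 'banana', value: 3}]; }
};

var ensured = ensure_group_bins(group, 'apple', 'banana', 'cherry').all();
// ensured is [{key: 'banana', value: 3}, {key: 'apple', value: 0}, {key: 'cherry', value: 0}]
```

Missing bins are appended at the end rather than inserted in key order, so if the order matters for your chart you may need to sort the result as well.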

Ensure that all (minutes, hours, days) exist in a group

This takes a d3 interval for the second parameter, e.g. d3.timeHour. Explanation.

function fill_intervals(group, interval) {
  return {
    all: function() {
      var orig = group.all().map(kv => ({key: new Date(kv.key), value: kv.value}));
      var target = interval.range(orig[0].key, orig[orig.length-1].key);
      var result = [];
      for(var oi = 0, ti = 0; oi < orig.length && ti < target.length;) {
        if(orig[oi].key <= target[ti]) {
          result.push(orig[oi]);
          if(orig[oi++].key.valueOf() === target[ti].valueOf())
            ++ti;
        } else {
          result.push({key: target[ti], value: 0});
          ++ti;
        }
      }
      if(oi<orig.length)
        Array.prototype.push.apply(result, orig.slice(oi));
      if(ti<target.length)
        Array.prototype.push.apply(result, target.slice(ti).map(t => ({key: t, value: 0})));
      return result;
    }
  }
}

Remove particular bins

(Really just a specialization of filter_bins, but a common one.)

function remove_bins(source_group) { // (source_group, bins...)
    var bins = Array.prototype.slice.call(arguments, 1);
    return {
        all:function () {
            return source_group.all().filter(function(d) {
                return bins.indexOf(d.key) === -1;
            });
        }
    };
}

Combine groups

Say we have a few groups we want to stack, but they have different X values, so the stack mixin won't display them properly as they are.

function combine_groups() { // (groups...)
    var groups = Array.prototype.slice.call(arguments);
    return {
        all: function() {
            var alls = groups.map(function(g) { return g.all(); });
            var gm = {};
            alls.forEach(function(a, i) {
                a.forEach(function(b) {
                    if(!gm[b.key]) {
                        gm[b.key] = new Array(groups.length);
                        for(var j=0; j<groups.length; ++j)
                            gm[b.key][j] = 0;
                    }
                    gm[b.key][i] = b.value;
                });
            });
            var ret = [];
            for(var k in gm)
                ret.push({key: k, value: gm[k]});
            return ret;
        }
    };
}

The stacks can be accessed by index:

var combined = combine_groups(group1, group2, ...);

chart
    .group(combined, "1", function(d) { return d.value[0]; })
    .stack(combined, "2", function(d) { return d.value[1]; })
    ...
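
Here is a sketch of what the combined group returns, using stand-in groups with made-up numeric keys. One thing to watch for: because the keys pass through a plain object, they come back as strings.

```javascript
function combine_groups() { // (groups...)
    var groups = Array.prototype.slice.call(arguments);
    return {
        all: function() {
            var alls = groups.map(function(g) { return g.all(); });
            var gm = {};
            alls.forEach(function(a, i) {
                a.forEach(function(b) {
                    if(!gm[b.key]) {
                        gm[b.key] = new Array(groups.length);
                        for(var j=0; j<groups.length; ++j)
                            gm[b.key][j] = 0;
                    }
                    gm[b.key][i] = b.value;
                });
            });
            var ret = [];
            for(var k in gm)
                ret.push({key: k, value: gm[k]});
            return ret;
        }
    };
}

// Stand-ins for two crossfilter groups with partly different keys.
var g1 = {all: function() { return [{key: 1, value: 10}, {key: 2, value: 20}]; }};
var g2 = {all: function() { return [{key: 2, value: 5}, {key: 3, value: 7}]; }};

var combined = combine_groups(g1, g2).all();
// combined is [{key: '1', value: [10, 0]}, {key: '2', value: [20, 5]}, {key: '3', value: [0, 7]}]

// If the chart needs numeric keys (e.g. for a linear scale), convert them back.
var numeric = combined.map(function(kv) {
    return {key: +kv.key, value: kv.value};
});
```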

Snap to zero

Sometimes crossfilter groups with floating point values don't cancel out to zero when the same values are added and then removed. This can cause strange artifacts, like negative bars when there are no negative numbers, or the "blank" color not showing for ordinal colors.

In mathematical terms, floating-point addition is not associative, so e.g.

1 + .2 - 1 - .2 === -5.551115123125783e-17

This fake group will "snap" values to zero when they get close:

function snap_to_zero(source_group) {
    return {
        all:function () {
            return source_group.all().map(function(d) {
                return {key: d.key, 
                        value: (Math.abs(d.value)<1e-6) ? 0 : d.value};
            });
        }
    };
}
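
A small sketch of the effect, using a stand-in group whose first bin holds exactly the leftover from the sum above:

```javascript
function snap_to_zero(source_group) {
    return {
        all: function () {
            return source_group.all().map(function(d) {
                return {key: d.key,
                        value: (Math.abs(d.value) < 1e-6) ? 0 : d.value};
            });
        }
    };
}

var leftover = 1 + .2 - 1 - .2; // tiny negative number, not 0

// Stand-in for a crossfilter group (made-up data).
var group = {
    all: function() {
        return [{key: 'a', value: leftover}, {key: 'b', value: 0.5}];
    }
};

var snapped = snap_to_zero(group).all();
// snapped is [{key: 'a', value: 0}, {key: 'b', value: 0.5}]
```

The 1e-6 threshold is a judgment call; it should be well below the smallest real value you expect in a bin.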

Accumulate values

(thanks Xavier Dutoit!)

function accumulate_group(source_group) {
    return {
        all:function () {
            var cumulate = 0;
            return source_group.all().map(function(d) {
                cumulate += d.value;
                return {key:d.key, value:cumulate};
            });
        }
    };
}
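
A usage sketch with a stand-in group and made-up data:

```javascript
function accumulate_group(source_group) {
    return {
        all: function () {
            var cumulate = 0; // reset on every call, so repeated redraws don't double-count
            return source_group.all().map(function(d) {
                cumulate += d.value;
                return {key: d.key, value: cumulate};
            });
        }
    };
}

// Stand-in for a crossfilter group (made-up data).
var group = {
    all: function() {
        return [{key: 1, value: 1}, {key: 2, value: 2}, {key: 3, value: 3}];
    }
};

var running = accumulate_group(group).all();
// running is [{key: 1, value: 1}, {key: 2, value: 3}, {key: 3, value: 6}]
```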

Sort a group

Sometimes you may need to sort your bins manually. In particular, the line chart can get messed up if you need an ordering different from the natural order of keys, which is what crossfilter will provide through .all().

So here is sort_group:

function sort_group(group, order) {
    return {
        all: function() {
            var g = group.all(), map = {};
         
            g.forEach(function(kv) {
                map[kv.key] = kv.value;
            });
            return order.map(function(k) {
                return {key: k, value: map[k]};
            });
        }
    };
}
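
A usage sketch with a stand-in group and made-up keys, where the natural (alphabetical) order is not the one you want:

```javascript
function sort_group(group, order) {
    return {
        all: function() {
            var g = group.all(), map = {};

            g.forEach(function(kv) {
                map[kv.key] = kv.value;
            });
            return order.map(function(k) {
                return {key: k, value: map[k]};
            });
        }
    };
}

// Stand-in for a crossfilter group; keys arrive in alphabetical order.
var group = {
    all: function() {
        return [{key: 'high', value: 2}, {key: 'low', value: 7}, {key: 'medium', value: 4}];
    }
};

var sorted = sort_group(group, ['low', 'medium', 'high']).all();
// sorted is [{key: 'low', value: 7}, {key: 'medium', value: 4}, {key: 'high', value: 2}]
```

Any key listed in order but absent from the group comes back with an undefined value, and any key in the group but not in order is dropped, so keep the two in sync.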

... but I need .top()?

If you are using one of the capped charts (e.g. the row chart with .rowsCap()), you may also need to define a top method on the fake group, which returns items in sorted order just like crossfilter's group.top does.

Here is an example expanding remove_empty_bins with .top(). The process is similar for the other fake groups:

function remove_empty_bins(source_group) {
    function non_zero_pred(d) {
        //return Math.abs(d.value) > 0.00001; // if using floating-point numbers
        return d.value !== 0; // if integers only
    }
    return {
        all: function () {
            return source_group.all().filter(non_zero_pred);
        },
        top: function(n) {
            return source_group.top(Infinity)
                .filter(non_zero_pred)
                .slice(0, n);
        }
    };
}
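
For fake groups that transform values, so that the source group's own .top() would be wrong, one option is to sort the fake group's .all() results instead. This is a sketch only: with_top is a hypothetical helper, and it assumes plain numeric values ordered largest first.

```javascript
// Hypothetical helper: add .top(n) to any fake group that only has .all().
function with_top(fake_group) {
    return {
        all: function() {
            return fake_group.all();
        },
        top: function(n) {
            return fake_group.all()
                .slice() // sort a copy so the .all() results are not reordered
                .sort(function(a, b) { return b.value - a.value; })
                .slice(0, n);
        }
    };
}

// Stand-in for a fake group (made-up data).
var fake = {
    all: function() {
        return [{key: 'a', value: 3}, {key: 'b', value: 9}, {key: 'c', value: 5}];
    }
};

var top2 = with_top(fake).top(2);
// top2 is [{key: 'b', value: 9}, {key: 'c', value: 5}]
```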

Do reductions that require all the row data

Note: reductio does min, max, and median out of the box, so if that's all you need, you should use reductio. This section is for when you need to do something more complicated, don't want the dependency, or want to understand how this works.

In order to calculate the minimum, maximum, or median, among other things, you need to maintain an array of all the rows in each bin.

There is no way around this. For example, you might think that to calculate the maximum, all you need to do is see whether each added row's value is greater than the current maximum. But what do you do when that row is removed? Do you know if there were multiple rows with that value? What was the second-to-maximum value to restore? What should you do once that value is removed? Etc.

Crossfilter does not provide access to the underlying rows in each bin. It probably could do this, but it doesn't. So you'll need to keep track of the arrays of rows yourself.

The best way to do this is to maintain each array sorted on some unique key, so that you can remove the entry for a row when you see a reduceRemove for it. (Or you can maintain an array of just the values that you need for your metric, but we won't show that here.)

This code shows how to maintain an array of the rows themselves, inside your reduce functions. Since JavaScript uses references for objects, nothing is copied and this is reasonably efficient. It's also the most general solution, allowing multiple metrics to be calculated for each row.

  function groupArrayAdd(keyfn) {
      var bisect = d3.bisector(keyfn);
      return function(elements, item) {
          var pos = bisect.right(elements, keyfn(item));
          elements.splice(pos, 0, item);
          return elements;
      };
  }

  function groupArrayRemove(keyfn) {
      var bisect = d3.bisector(keyfn);
      return function(elements, item) {
          var pos = bisect.left(elements, keyfn(item));
          if(pos < elements.length && keyfn(elements[pos]) === keyfn(item)) // guard against running off the end
              elements.splice(pos, 1);
          return elements;
      };
  }

  function groupArrayInit() {
      return [];
  }

Give these functions a key function which provides a unique key, and they will return a function you can use for your custom reduction:

var runAvgGroup = runDimension.group().reduce(groupArrayAdd(exptKey), groupArrayRemove(exptKey), groupArrayInit);

Then you'll provide accessors which actually calculate the metric:

      function medianSpeed(kv) {
          return d3.median(kv.value, speedValue);
      }
      rowChart.valueAccessor(medianSpeed)

Complete example here.
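
For the other variant mentioned earlier, keeping only the values you need for your metric, here is a dependency-free sketch. All of the names (lower_bound, valuesAdd, valuesRemove, valuesInit, median) and the speed field are hypothetical, and the reducers are driven by hand instead of through crossfilter.

```javascript
// Binary search: first index whose value is >= x (array sorted ascending).
function lower_bound(values, x) {
    var lo = 0, hi = values.length;
    while (lo < hi) {
        var mid = (lo + hi) >>> 1;
        if (values[mid] < x) lo = mid + 1; else hi = mid;
    }
    return lo;
}
function valuesAdd(accessor) {
    return function(values, row) {
        var x = accessor(row);
        values.splice(lower_bound(values, x), 0, x); // insert, keeping the array sorted
        return values;
    };
}
function valuesRemove(accessor) {
    return function(values, row) {
        var x = accessor(row), pos = lower_bound(values, x);
        if (pos < values.length && values[pos] === x)
            values.splice(pos, 1); // equal values are interchangeable, so remove any one
        return values;
    };
}
function valuesInit() {
    return [];
}
function median(values) { // values must already be sorted
    if (!values.length) return undefined;
    var mid = values.length >> 1;
    return values.length % 2 ? values[mid] : (values[mid - 1] + values[mid]) / 2;
}

// Drive the reducers by hand for one bin (made-up rows).
var speed = function(row) { return row.speed; };
var rows = [{speed: 30}, {speed: 10}, {speed: 20}, {speed: 40}];
var bin = rows.reduce(valuesAdd(speed), valuesInit()); // [10, 20, 30, 40]
var medianBefore = median(bin);                        // 25
bin = valuesRemove(speed)(bin, rows[3]);               // [10, 20, 30]
var medianAfter = median(bin);                         // 20
var maxAfter = bin[bin.length - 1];                    // 30
```

With crossfilter these would be passed to group.reduce in the same way as the row-array functions above, with a value accessor such as function(kv) { return median(kv.value); }.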

How do I replace crossfilter with a server-side solution?

Alternatives to crossfilter

Crossfilter runs in the browser and the practical limit is somewhere around half a million to a million rows of data. If you are binning your data properly (so that you aren't drawing thousands of bars, lines, or dots), the drawing is usually not the bottleneck. The bottleneck is usually the download of large data files and the memory usage of large data sets. It depends on the complexity of the row too (number of columns, data types).

If the data size is okay but it's hurting the interactivity of your page, you can try crossfilter-async to put crossfilter in a webworker.

If, however, you start hitting hard limits, you may want to consider a server-side solution. The response time will not be quite as good, but if your data is that big, then network latency is probably less of a problem than processing the data.

Here are some third-party solutions for using dc.js with a server-based data store. Note: these will probably require some modification of your dc.js configuration. To our knowledge, there is currently no drop-in replacement. If you run into trouble, the dc.js users group is probably the best place to ask (in addition to any forums associated with the projects themselves).

Why is the data returned from d3.csv or d3.json undefined?

Make sure you are using the data within the callback passed to the function. These functions return immediately, and the data will not be defined outside of the callback. Once the data has been fetched and parsed, the callback will be called.
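
The timing can be sketched without d3. Here load_rows is a hypothetical stand-in for a callback-style loader like the ones described above, and the file name and rows are made up.

```javascript
// Hypothetical stand-in for a callback-style loader:
// it returns immediately and calls back later with the rows.
function load_rows(url, callback) {
    setTimeout(function() {
        callback([{date: '2018-05-14', value: 1}]); // made-up rows
    }, 0);
}

var rows;
load_rows('data.csv', function(data) {
    rows = data;
    // Build the crossfilter, dimensions, groups and charts in here.
    console.log('inside callback:', rows.length, 'row(s)');
});
var seenTooEarly = rows; // still undefined: the callback has not run yet
```

Anything that needs the data has to be called from inside the callback, or from a function the callback calls.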

Why are my bars one pixel wide?

Check that you have set .xUnits on your chart. The parameter should be one of the dc.units helpers.

Why does everything break after a call to .xAxis or .yAxis?

Although most dc.js methods chain, some do not chain to the same object. xAxis returns a d3 axis object which is not the chart. If you access the axis objects of a chart, do it last or do it in a separate line:

var chart = dc.barChart(...).this(...).that(...);
var xAxis = chart.xAxis().tickFormat(...).ticks(...);
var yAxis = chart.yAxis().tickFormat(...).ticks(...);

Why does a function I created in a loop not work?

(This is a general JavaScript question, but it comes up a lot with d3 & dc because a lot of function objects are used.)

If you refer to a variable outside the body of the lambda function, you are using the variable by reference, not by value. So when the function is run, it will have the current value of the variable, not the value the variable had when you created the function:

var a = [];
for(var i = 0; i < 10; ++i) a.push(function() { return i;});
a.map(function(f) { return f(); }); // returns [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]

One way around this is to "capture" the value using an auxiliary function:

var a = [];
function helper(i) { return function() { return i; }; }
for(var i = 0; i < 10; ++i) a.push(helper(i));
a.map(function(f) { return f(); }); // returns [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Or to keep the code in one place, you can use an IIFE:

var a = [];
for(var i = 0; i < 10; ++i) 
  a.push(function(i) { 
    return function() { return i; }; 
  }(i));
a.map(function(f) { return f(); }); // returns [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
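
If your target browsers support ES2015, another option is to declare the loop variable with let, which gives each iteration its own binding:

```javascript
var a = [];
for (let i = 0; i < 10; ++i) a.push(function() { return i; });
var result = a.map(function(f) { return f(); });
// result is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```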

Why don't date scales work for me?

Often a chart will work with ordinal or linear scales, but not work when using d3.time scales.

The usual reason for this is that the dates need to be parsed as JavaScript Date Objects in order to be used with d3 time scales.

Before passing your data to crossfilter, do something like this:

    data.forEach(function(d) {
        d.date = new Date(d.date);
    });
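
It can also help to check that every date actually parsed, since an unparseable string silently becomes an invalid Date. A small sketch with made-up rows:

```javascript
var data = [
    {date: '2018-05-14', value: 3}, // made-up rows
    {date: 'not a date', value: 4}
];

data.forEach(function(d) {
    d.date = new Date(d.date);
});

// An invalid Date has a time value of NaN.
var bad = data.filter(function(d) { return isNaN(d.date.getTime()); });
// bad contains only the second row
```

If your dates are in a format that new Date does not parse reliably, a format-specific parser is the safer choice.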