Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Mongoose object instantiation takes longer than expected for nested schema #950

Closed
brettkiefer opened this Issue · 18 comments

2 participants

@brettkiefer

First: thanks for Mongoose!

We're having some performance problems with Mongoose and Trello; we're starting to have some medium-sized payloads coming back from the database (~50kB) and we're noticing that most of our page serve time is actually spent initing Mongoose objects after they have arrived from the DB (specifically in the 'cb' function in execFind in lib/query.js) . I made a little standalone benchmark to test Mongoose doc init, and it is here:

https://gist.github.com/2877516

On my machine, for that JSON object, it takes 1ms to load the object from disk and parse the JSON, and 75ms to build the Mongoose object. That seems like a lot for ~50kB of data; I ran the profiler but couldn't find any obvious big wins, the time seemed pretty well distributed throughout the init code; I think we're instantiating about 400 objects, total.

I know that we can skip the Mongoose init overhead by going to the raw driver or by using the new 'lean' query option when we don't need any ODM functionality, so we may try that route, but since you understand the schema init code much better than anyone over here, maybe you can shed some light?

If our expectations for init time are unreasonable, then let us know, it just seemed a like a lot of time.

Another interesting thing: if you rip out the top-level doc (the board) and just do checklists, that gets rid of about a third of the total init time; that's odd because there's really nothing in the top-level doc in this example, but maybe it's to do with setting hooks on the top-level doc?

@aheckmann
Owner

my gut says its all the arrays. mongoose arrays are subclasses of native arrays and super slow at init time. they're the biggest bottleneck in mongoose.

taking a look now

@brettkiefer

Thanks for checking it out! The closest thing I could find to a hot spot was v8::internal::JSObject::LocalLookupRealNamedProperty, due to EmbeddedDocument.commit.EmbeddedDocument.markModified and Document.set, but that only accounted for ~5% of the total ticks, so I'm not sure it's relevant.

@aheckmann
Owner
@aheckmann
Owner

the problem is the subclassed arrays. when casting to MongooseDocumentArray we set arr.proto to ensure Array.isArray still works. In the past I've tried unsuccessfully to improve this. In Mongoose 2.4.4 or so we had an alternate implementation but it caused bad behavior (dont remember what) so we removed it.

At this point there is a trade-off between having Array.isArray compatibility and speed. I'll take another look at this for v3

@brettkiefer

Interesting; how did you figure that out? Is there a way we can see it without the array prototype inheritance? When I
git checkout 4edd44f
and then run the big.js benchmark, it takes about the same amount of time, though that may be for a different reason.

@aheckmann
Owner

i figured it out by commenting out chunks in document.set, which led to the casting, so i popped open lib/types/documentarray and commented chunks out of the constructor.

removing the inheritance worked for your script but causes massive test errors.

@aheckmann
Owner

the reason you aren't seeing a perf boost in 4edd44f is b/c that branch only implemented the fix for non-document arrays.

@brettkiefer

Oh! I missed that. I'll try to apply that change to document arrays, just to see what kind of a speedup it would yield if it were usable.

@brettkiefer

I got SOMETHING similar to 4edd44f working (with some help from @bpollack), using the exact same trick with some copy/paste to mimic without calling the MongooseArray constructor in lib/types/documentarray.js, but it didn't help the performance of the benchmark. That's probably because I did it utterly wrong and without an understanding of what you were trying to accomplish. Is the slow thing that the VM can't use the fast path on a subclassed array?

/**
 * Module dependencies.
 */
var EmbeddedDocument = require('./embedded');
var Document = require('../document');
var ObjectId = require('./objectid');
var fs = require('fs');
var vm = require('vm');

var MongooseArray = require('./array')
  , driver = global.MONGOOSE_DRIVER_PATH || '../drivers/node-mongodb-native'
  , ObjectId = require(driver + '/objectid')
  , ObjectIdSchema = require('../schema/objectid');

var script = vm.createScript(fs.readFileSync(__dirname +'/subarray.js', 'utf8'), 'subarray.js')
var scope = {
    EmbeddedDocument: EmbeddedDocument
  , Document: Document
  , ObjectId: ObjectId
  , exports: null
  , console: console
  , Date: Date
  , Array2: Array
}
script.runInNewContext(scope);
var SubArray = scope.exports;

/**
 * Array of embedded documents
 * Values always have to be passed to the constructor to initialize, since
 * otherwise MongooseArray#push will mark the array as modified to the parent.
 *
 * @param {Array} values
 * @param {String} key path
 * @param {Document} parent document
 * @api private
 * @see http://bit.ly/f6CnZU
 */

function MongooseDocumentArray (values, path, doc) {
  var arr = new SubArray;
  Array.prototype.push.apply(arr, values);

  arr._atomics = {};
  arr.validators = [];
  arr._path = path;

  if (doc) {
    arr._parent = doc;
    arr._schema = doc.schema.path(path);
    doc.on('save', arr.notify('save'));
    doc.on('isNew', arr.notify('isNew'));
  }

  return arr;
};


/**
 * Inherits from MongooseArray
 */

SubArray.prototype.__proto__ = MongooseDocumentArray.prototype = new Array;
@brettkiefer

I also tried to repro the performance improvement DocumentArray constructor, but the way I did it (also probably wrong) seemed to help because it was causing the checkItems arrays in the checklists to not init as DocumentArrays (so we were calling the MongooseDocumentArray constructor far fewer times). Is your fix different?

var timesCalled = 0;
function MongooseDocumentArray (values, path, doc) {
  console.log("construct MongooseDocumentArray + " + path + ", # calls is " + ++timesCalled);
  var arr = [];
  arr.push.apply(arr, values);
  //arr.__proto__ = MongooseDocumentArray.prototype;

  arr._atomics = {};
  arr.validators = [];
  arr._path = path;

  /*
  if (doc) {
    arr._parent = doc;
    arr._schema = doc.schema.path(path);
    doc.on('save', arr.notify('save'));
    doc.on('isNew', arr.notify('isNew'));
  }
  */

  return arr;
};
@aheckmann
Owner

yeah i didn't get much benefit either. still researching.

@aheckmann aheckmann referenced this issue from a commit in aheckmann/mongoose
@aheckmann aheckmann only create _id if necessary
relates to #950
6be45fc
@aheckmann aheckmann referenced this issue from a commit in aheckmann/mongoose
@aheckmann aheckmann avoid de-optimized fns
relates to #950
4730b53
@aheckmann
Owner

i've got it down to ~35ms on my loops branch. this branch is based on master. there are a couple other areas i'm looking at to improve things more (hooks and a possibly custom objectid class) but they'll take a while.

@brettkiefer

Hey, thanks! I got the same result for the benchmark, and then was able to run a test version of Trello against that change (after modifying it to accept array on the field specifier again - is that an intentional change?) and saw the Mongoose cb time drop by ~30% in real testing there, too, for a significant improvement in overall page times. Again, thanks; we'll be testing with it, and any other improvements will have a direct positive effect on overall page performance for us, so let us know where we can help.

@aheckmann aheckmann referenced this issue from a commit in aheckmann/mongoose
@aheckmann aheckmann another benchmark from #950 dc64cb6
@aheckmann
Owner

great! what is the field specifier issue?
i pushed some more improvements to that branch this morning. objectids are about as good as they're gonna get now for doc creation and save.

@brettkiefer

The issue was that I couldn't specify fields to return with an array of strings on that branch because I would hit

throw new TypeError('Invalid select() argument. Must be a string or object.');"

so I just added

 } else if (1 === arguments.length && Array.isArray(arg)) {
    for (var i =0; i < arg.length; i++) {
      fields[arg[i]] = 1;
  }

because that was easier than looking for the change that had caused it and I was eager to try the branch out. =)

@aheckmann
Owner

oh right. yeah this is based on master which deprecated a bunch of sugar for the upcoming v3

@aheckmann
Owner

its all in master now. basically we have faster doc creation and save() due to ObjectId optimization and removal of several getters/setters from the document and model prototypes: all internal stuff so you shouldn't notice anything but better performance.

thanks for being a guinea pig! gonna close this one.

@aheckmann aheckmann closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.