Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Synchronous validation on single field containing large array of objects? #5191

Closed
justinklemm opened this issue Apr 20, 2017 · 9 comments
Milestone

Comments

@justinklemm
Copy link

I have a schema with a field that contains an array of fairly complex objects. It looks something like this (in CoffeeScript):

Schema
  steps: [
    sequence:
      type: Number
      default: "0"
    url: String
    command: String
    target: String
    value: String
    extracted: String
    variableName: String
    notes: String
    private:
      type: Boolean
      default: false
    optional:
      type: Boolean
      default: false
    file:
      filename: String
    passing:
      type: Boolean
      default: null
    error: String
    extra: Schema.Types.Mixed
    dateExecuted: Date
  ]

An instance of this model will have a steps field with an array of objects matching the outlined schema. I've found that when I create an instance of this model that has a large number (1,000+) of elements in the steps array, my application locks up for 30 seconds when save() is called on the instance... It appears that Mongoose is looping through the steps array synchronously as it applies the validation (as outlined in the schema) to each entry.

When I checked the docs on validation, it says Validation is asynchronously recursive, but that doesn't appear to be the case here in validating a single array field.

When I simplify the schema to:

Schema
  steps: [Schema.Types.Mixed]

then the issue goes away, since (I assume) the validation of each entry is no longer being done.

I'd expect the validation on the array field entries to be applied asynchronous?

Node.js: 4.8.1
Mongoose 4.9.5
MongoDB: 3.4.3

@justinklemm
Copy link
Author

If the solution is to use steps: [Schema.Types.Mixed] and add my own asynchronous validation, I'm fine with that. I'm just trying to understand if this is expected, and pointing it out, in case it's something we want to address.

@sobafuchs
Copy link
Contributor

yeah this is something worth exploring, will get back to you on this

@sobafuchs sobafuchs added this to the 4.9.7 milestone Apr 27, 2017
@justinklemm
Copy link
Author

Cool -- FWIW, when I wrote my own function to loop through and validate the array, even processing 1,000 items synchronously completed quickly (< 1 second). I think there must be some kind of funkiness that causes Mongoose's logic to get really bogged down in this scenario.

@vkarpov15 vkarpov15 modified the milestones: 4.9.8, 4.9.7 May 1, 2017
@vkarpov15
Copy link
Collaborator

So the below script runs much faster (6.6 seconds vs 14 seconds) on my machine after a5f5069 above:

const mongoose = require('mongoose');

mongoose.connect('mongodb://localhost:27017/test');

const M = mongoose.model('Test', new mongoose.Schema({
  steps: [{
    sequence: {
      type: Number,
      default: "0"
    },
    url: String,
    command: String,
    target: String,
    value: String,
    extracted: String,
    variableName: String,
    notes: String,
    private: {
      type: Boolean,
      default: false
    },
    optional: {
      type: Boolean,
      default: false
    },
    file: {
      filename: String
    },
    passing: {
      type: Boolean,
      default: null
    },
    error: String,
    extra: mongoose.Schema.Types.Mixed,
    dateExecuted: Date
  }]
}));

var m = new M();

for (var i = 0; i < 1001; ++i) {
  m.steps.push({
    url: 'a',
    command: 'b',
    target: 'c',
    value: 'd',
    extracted: 'e',
    variableName: 'f',
    notes: 'g',
    file: { filename: 'c' },
    error: 'd',
    extra: { test: i },
    dateExecuted: new Date()
  });
}

console.log('saving');
var start = Date.now();
m.save(function(error) {
  console.log(error);
  console.log('elapsed', Date.now() - start);
  process.exit(0);
});

Going to do some more research into the sync bit tomorrow but it looks like we were doing a lot of unnecessary work when validating subdocs...

@justinklemm
Copy link
Author

Nice. 50%+ perf improvement ain't bad. I poked around a little bit, but it wasn't obvious to me where the subdoc validation code lives. If you could point me to it, I'm happy to tinker.

@vkarpov15
Copy link
Collaborator

It's certainly better than a punch in the face :) Benefit is bigger for longer arrays. Subdoc validation is recursive so it's hard to point out exactly where it lives but this is where we call save() on every subdoc and therefore do validation

@vkarpov15
Copy link
Collaborator

I removed some more duplicate work in 65dc6c0, so now the given script runs in 305ms instead of 6.6s, 40x perf improvement on this one case. Much better.

Also, you are right that there's some sync code in there, specifically validating a document array synchronously loops over the array. The majority of time was spent blocking on the unnecessary getAllSubdocs() calls removed in ☝️ but shouldn't be the case anymore. I'll double check and see if there's anything else we can trim, and also make that async.

@justinklemm
Copy link
Author

justinklemm commented May 5, 2017

Awesome - nice work! That was definitely the slow piece. For my purposes, the sync loop isn't a big deal. I suppose if someone gets crazy and adds 10k+ subdocs to an array, it could come into play, but that seems like an extreme edge case. I really appreciate your efforts here (and with the Mongoose package as a whole). Thanks! Looking forward to v4.9.8 😀

@vkarpov15
Copy link
Collaborator

I refactored some stuff to remove more unnecessary work and shaved it down to ~250ms, but not gonna get much faster because ~150 of that is waiting on the database. Thanks for your patience in digging out this perf issue 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants