Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

model.save is slower if schema has sub-document or sub-document array #5925

Closed
nilesh27602816 opened this issue Dec 20, 2017 · 8 comments
Closed
Labels
can't reproduce Mongoose devs have been unable to reproduce this issue. Close after 14 days of inactivity.

Comments

@nilesh27602816
Copy link

nilesh27602816 commented Dec 20, 2017

Do you want to request a feature or report a bug?
I need some clarification on why save is taking more time if my Schema has sub-document/sub-document array?

Example:

var Person = new mongoose.Schema({
    Name: {
            type: String
    },
    Addresses: [{
        streetaddress1: {
            type: String
        },
        streetaddress1: {
            type: String
        },
        city: {
            type: Number
        }
    }]
});

Person.save takes more time if Addresses has 10000 entries.

Whereas below nested schema takes 1/100th of commit time with same payload

var AddressSchema = new mongoose.Schema({
        streetaddress1: {
            type: String
        },
        streetaddress1: {
            type: String
        },
        city: {
            type: Number
        }
});

var Person = new mongoose.Schema({
    Name: {
            type: String
        }
    },
    Addresses: [AddressSchema]
});

What is the current behavior?

If the current behavior is a bug, please provide the steps to reproduce.

What is the expected behavior?

Please mention your node.js, mongoose and MongoDB version.
mongoose version is 4.9.4

@vkarpov15
Copy link
Collaborator

Thanks for the information, I don't know why off the top of my head but will investigate this ASAP. We're currently working on 5.0.0-rc0 so please be patient.

@vkarpov15 vkarpov15 added this to the 4.13.9 milestone Dec 27, 2017
@sobafuchs
Copy link
Contributor

sobafuchs commented Jan 6, 2018

I'm not able to reproduce the performance issue you are reporting. Here is a sample script to profile the creation of 10000 documents. I modified your schemas because a) you duplicated streetaddress in the schema.

const mongoose = require('mongoose');
const co = require('co');
mongoose.Promise = global.Promise;
const GITHUB_ISSUE = `gh-5925`;


exec()
  .then(() => {
    console.log('successfully ran program');
    process.exit(0);
  })
  .catch(error => {
    console.error(`Error: ${error}\n${error.stack}`);
  });


function exec() {
  return co(function* () {
    yield mongoose.connect(`mongodb://localhost:27017/${GITHUB_ISSUE}`);

    console.log(`==============`);
    console.log(`Benchmarking test with sub doc array`);
    yield testWithSubDocArray();

    console.log(`==============`);
    console.log(`Benchmarking test with nested schema`);
    yield testWithNestedSchema();
  });
}

function addressDefinition() {
  return {
    streetaddress1: {
      type: String
    },
    streetaddress2: {
      type: String
    },
    city: {
      type: Number
    }
  };
}

function testWithSubDocArray() {
  return co(function* () {
    const personSchema = new mongoose.Schema({
      name: String,
      addresses: [addressDefinition()]
    });

    const Person = mongoose.model('Person1', personSchema);

    const person = new Person({
      name: 'test sub doc',
      addresses: generateAddresses()
    });

    yield benchmark(function persistWithSubDoc() {
      return person.save();
    });
  });
}

function testWithNestedSchema() {
  return co(function* () {
    const personSchema = new mongoose.Schema({
      name: String,
      addresses: [new mongoose.Schema(addressDefinition())]
    });

    const Person = mongoose.model('Person2', personSchema);

    const person = new Person({
      name: 'test sub doc',
      addresses: generateAddresses()
    });

    yield benchmark(function persistWithNested() {
      return person.save();
    });
  });
}

function benchmark(func) {
  return co(function* () {
    const start = Date.now();
    yield func();
    const timeTaken = Date.now() - start;
    console.log(`Time taken for ${func.name}: 
                 ms: ${timeTaken} ms
                 seconds: ${timeTaken / 1000 }`
               );
  });
}

function generateAddresses() {
  const addresses = [];
  const address = {
    streetaddress1: 'street address 1',
    streetaddress2: 'street address 2',
    city: 1
  };
  return Array.from({ length: 10000 }, (x => address))
}

Here are some sample benchmark profiles from running multiple times:

test run 1

==============
Benchmarking test with sub doc array
Time taken for persistWithSubDoc:
                 ms: 1055 ms
                 seconds: 1.055
==============
Benchmarking test with nested schema
Time taken for persistWithNested:
                 ms: 1049 ms
                 seconds: 1.049
successfully ran program

test run 2

==============
Benchmarking test with sub doc array
Time taken for persistWithSubDoc:
                 ms: 1080 ms
                 seconds: 1.08
==============
Benchmarking test with nested schema
Time taken for persistWithNested:
                 ms: 947 ms
                 seconds: 0.947

test run 3

==============
Benchmarking test with sub doc array
Time taken for persistWithSubDoc:
                 ms: 1088 ms
                 seconds: 1.088
==============
Benchmarking test with nested schema
Time taken for persistWithNested:
                 ms: 1024 ms
                 seconds: 1.024

@sobafuchs sobafuchs added the can't reproduce Mongoose devs have been unable to reproduce this issue. Close after 14 days of inactivity. label Jan 6, 2018
@vkarpov15 vkarpov15 removed this from the 4.13.9 milestone Jan 7, 2018
@TheYuriG
Copy link

TheYuriG commented Mar 14, 2023

I'm sorry for the 4+ years necro!

I'm having this issue using a Map, instead of an Array.
I'm using a map so I can avoid duplicate items.
It legit takes 6+ minutes to save an item that has 18k+ entries in one property and around 1.3k on the other.

Should I just convert to array to save, but then convert back to map to use in my application? I don't know what I could do to have this simply save faster.

Here is some simple time-logging of the process running:

Fetching data for Game ID # new ObjectId("6409c1fa9067c6df2bfffba0")
Trophy log: 0.316s

(nearly 600 of those later)

Fetching data for Game ID # new ObjectId("6409c1fc9067c6df2bfffbbf")
Trophy log: 38.880s
saving on database
Trophy log: 38.941s
Completed Trophy Log creation for USERNAME new ObjectId("6409bcf79067c6df2bff1735")
Trophy log: 6:50.555 (m:ss.mmm)

EDIT: several hours later, many introduced and squashed bugs later, saving stuff through mongoose as arrays work pretty fast.

started creating trophy log for USERNAME 
first trophy timestamp: 1676878658
last trophy timestamp: 1366406565
Completed Trophy Log creation for USERNAME 6409bcf79067c6df2bff1735
Trophy log: 43.734s

I assume that Mongoose somehow will try to double-check if every key is unique under the hood if you set the type as a map. That lookup for gigantic maps (mine was around 21k items overall) will severely slow down the saving. The solution, and the best of both worlds here, is to save as Array, convert it to Map, process your data as a Map, then convert it back to an Array before saving back with Mongoose.

TLDR: Using maps will make Mongoose slow down to a crawl when saving large objects. Just use arrays instead.

@vkarpov15
Copy link
Collaborator

@TheYuriG thanks for reporting this issue. Can you please come up with a repro script or project that demonstrates this slowness? That would really help us debug and fix this performance bottleneck.

You can use getters and setters to automate converting array to map and vice-versa.

@TheYuriG
Copy link

TheYuriG commented Mar 15, 2023

@TheYuriG thanks for reporting this issue. Can you please come up with a repro script or project that demonstrates this slowness? That would really help us debug and fix this performance bottleneck.

Yeah, if you don't mind waiting a day or two? I just want to wrap up these (unrelated) files I'm modifying so I don't jump in between things. Do you want me to open a separate issue? Would you like me to open a separate repo with the reproduction files so you can easily clone, then npm start it?

You can use getters and setters to automate converting array to map and vice-versa.

Nice, I'll probably take a look at it when I get to your requested script then. Thanks for letting me know!

@vkarpov15
Copy link
Collaborator

Opening a separate issue would be great, happy to wait a day or two. Separate repo that I can git clone, npm install, npm start would be great 👍

@TheYuriG
Copy link

TheYuriG commented Mar 20, 2023

The repro script is basically done, I just want to tweak a few things before making the new issue. Sorry for the delay and hope you had a good weekend!

Didn't have time to wrap it up today because I got caught up doing intl stuff after the gym. I'll 100% finish it tomorrow morning.

@TheYuriG
Copy link

A separate issue was created (including reproduction script) at #13191.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
can't reproduce Mongoose devs have been unable to reproduce this issue. Close after 14 days of inactivity.
Projects
None yet
Development

No branches or pull requests

4 participants