
jStringify performance bottleneck #172

Open
KrishnaPG opened this issue Oct 21, 2016 · 9 comments

@KrishnaPG
I've started to see the webpage freeze for a noticeable amount of time while doing a collection save (for 2k records), and analysing the performance profile indicates the below:

[screenshot: performance profile]

The jStringify method is taking a significant amount of time, and what is to be noted is the below:

[screenshot: call timings for jStringify vs JSON.stringify]

The time taken by JSON.stringify is just 1.2 seconds, while the time taken by the wrapper jStringify is about 9.5 seconds.

@Irrelon (Owner) commented Oct 21, 2016

Hi ya,

Hmm, I think that wrapper can be removed and native JSON.stringify can be used in its place now. I am making that change and checking performance... I'll push an update shortly if it passes unit testing.
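
In spirit, the change is just this (a sketch; the exact call sites in the codebase may differ):

    // before: serialisation went through the wrapper
    // var dataStr = this.jStringify(collectionData);

    // after: use the native method directly
    var dataStr = JSON.stringify(collectionData);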

You can see more details on why we had wrapping here: https://github.com/Irrelon/ForerunnerDB/wiki/Serialiser-&-Performance-Benchmarks

Performance is very important. There is an update in the works to significantly overhaul the way the current persistent storage system works so that this sort of serialisation on save would be either redundant (because we store the actual data straight into persistent tables) or would be massively reduced (because we only save updates via delta vs saving all data on every save() call which is how it happens at the moment).

Effectively, this situation happened because the persistent storage part of ForerunnerDB was not part of the original project and was added via a plugin later. This meant that we could have persistence but it wasn't baked in at the core level so it is very basic persistence.

Optimisations we are planning include:

  • Persistence built-in at the core
  • When a database is saved, it will KNOW it has persistent storage on boot-up and auto-load its data (pretty much exactly how you would expect it to work)
  • CRUD operations are persisted individually instead of in bulk, and calling load() and save() will no longer be required

These features are part of ForerunnerDB 2.0 which is rewritten in ES6 to take advantage of class-based code and lovely @ decorators that allow us to augment parts of the system with mixins without any fancy voodoo code.

Bottom Line

Give me a sec and I'll see if unwrapped it is much faster.

@Irrelon (Owner) commented Oct 21, 2016

I cannot see any measurable difference between the wrapped and direct usage with 25,000 records. The change passed all unit testing so I've pushed it up to the dev branch.
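
For reference, a rough sketch of that kind of comparison (not the project's actual unit tests; jStringifyLike here is just a stand-in for the wrapper):

    // Build 25,000 small documents and time a pass-through wrapper
    // against direct JSON.stringify.
    var records = [];
    for (var i = 0; i < 25000; i++) {
        records.push({ _id: i, name: 'doc' + i, value: Math.random() });
    }

    function jStringifyLike(data) {
        // stand-in wrapper; the real one applied a custom replacer
        return JSON.stringify(data);
    }

    console.time('wrapped');
    jStringifyLike(records);
    console.timeEnd('wrapped');

    console.time('direct');
    JSON.stringify(records);
    console.timeEnd('direct');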

Would you download the dev branch build and try it please? Let me know what your results are.

@Irrelon (Owner) commented Oct 21, 2016

PS you can install the dev branch via NPM:

npm install forerunnerdb --tag dev

@KrishnaPG (Author)

Thanks @Irrelon. This is what it looks like now:

[screenshot: updated performance profile]

Persist._encode has now become the show-stopper, with functions from localforage (such as the then promise resolver) taking the runner-up positions, followed by jParse:

[screenshot: profile detail for Persist._encode and localforage]

The page is becoming unresponsive at times. Is there any way the page freezing can be addressed?

The main concern is: if it takes a couple of seconds to save the data to disk, that is fine, as long as it does not freeze the page and affect the user experience.

I think some of the ideas you pointed out earlier are good options:

  • Saving only the differences
  • Making the storage a part of 'core' etc.

Also, is there any way this save could be done 'asynchronously' so that it would not block page rendering?

  • When combined with the 'delta saves', this could be effective

Worker threads with a full data save may not be a good option (since inter-thread bandwidth could once again pose a bottleneck). However, 'delta saves' + 'worker threads' may be something worth considering.
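
For illustration, the delta-plus-worker idea might look something like this (persist-worker.js and the message shape are hypothetical):

    // main thread: ship only the changed documents to a worker for persistence
    var persistWorker = new Worker('persist-worker.js');

    function saveDelta(collectionName, changedDocs) {
        // only the delta crosses the thread boundary, so the structured-clone
        // cost scales with the change set rather than the whole collection
        persistWorker.postMessage({ collection: collectionName, docs: changedDocs });
    }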

Delta saves could also be very important for addressing issues such as #171 (to synchronize the data across tabs).

@Irrelon (Owner) commented Oct 26, 2016

Hi ya,

The only way to stop thread blocking is to make the whole process automatically break into small operations that all tie up at the end.

It is possible. It is how collection CRUD operations work at the moment, where large operations are broken into multiple ops so that the thread doesn't block.
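
The pattern being described looks roughly like this (an illustrative sketch, not ForerunnerDB internals):

    // Process `docs` in fixed-size chunks, yielding to the event loop
    // between chunks so the UI thread never blocks for long.
    function processInChunks(docs, chunkSize, handleChunk, done) {
        var index = 0;

        function next() {
            handleChunk(docs.slice(index, index + chunkSize));
            index += chunkSize;

            if (index < docs.length) {
                setTimeout(next, 0); // hand control back to the browser
            } else {
                done();
            }
        }

        next();
    }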

I agree that when combined with delta saves this will have a massive impact on performance of the persistence system.

@Irrelon (Owner) commented Oct 26, 2016

Thinking about it more: the way it currently works, it takes the entire collection data as an array, stringifies it, and then saves it to persistent storage. This was done to maintain compatibility with localStorage, which only allowed string data and had no concept of tables, rows, etc.

What we could do instead is effectively simulate rows by saving the entire collection as a load of keys, each key representing a row / document in the collection.

That way, we could process say 100 rows at a time, give some time back to the thread and then carry on to the next 100.

The code was written with callbacks everywhere so we could support an async architecture easily.

My only concern is what the performance hit of breaking up each document into its own key would be on the browser's persistent storage times. This will require a test.
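
Sketched out, that per-document layout plus batching might look like this (the key scheme and function names are hypothetical; localforage's setItem returns a promise):

    // One key per row / document, saved 100 at a time, yielding to the
    // thread between batches.
    function saveCollection(name, docs, done) {
        var batchSize = 100;
        var index = 0;

        function nextBatch() {
            var batch = docs.slice(index, index + batchSize);

            Promise.all(batch.map(function (doc) {
                // each document gets its own key instead of one giant string
                return localforage.setItem(name + '::' + doc._id, doc);
            })).then(function () {
                index += batchSize;
                if (index < docs.length) {
                    setTimeout(nextBatch, 0); // give some time back to the thread
                } else {
                    done();
                }
            });
        }

        nextBatch();
    }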

@KrishnaPG (Author)

Thanks @Irrelon. Perhaps the approach taken by https://github.com/JakSprats/datanet could offer some pointers.

@dmccarrick commented Dec 6, 2017

Hi @Irrelon!

Any news on when the performance improvements may be ready? I see version 2 is already out there. Are they part of this? Will the changes be compatible with existing, persisted DBs?

Thanks

@Irrelon (Owner) commented Dec 11, 2017

@dmccarrick Hey ya. We are either going to have to assume the data doesn't exist if it is not in the expected format, or detect "v1" data and do a conversion automatically. The second option is obviously better for users and developers.
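
A rough sketch of what that v1 detection could look like, assuming the legacy format is the single stringified array described earlier in this thread (key names are illustrative):

    // If the stored value is still one stringified array ("v1"), fan it
    // out into per-document keys and remove the old blob.
    function migrateIfV1(collectionName) {
        return localforage.getItem(collectionName).then(function (raw) {
            if (typeof raw !== 'string') { return; } // already migrated or absent

            var docs = JSON.parse(raw);
            if (!Array.isArray(docs)) { return; }

            return Promise.all(docs.map(function (doc) {
                return localforage.setItem(collectionName + '::' + doc._id, doc);
            })).then(function () {
                return localforage.removeItem(collectionName);
            });
        });
    }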

The short answer is, when the performance updates are ready, I'll make an announcement about the transition to the new version's data structure.
