Investigate IO queuing and its benefits #23

HouzuoGuo · 2013-10-01T23:26:18Z

Investigate how introducing IO queues can benefit tiedot's operation.

The IO queues will operate on:

document CRUD
hash bucket CRU

alexandrestein · 2013-10-05T10:38:27Z

I made a quick documentation on it: https://drive.google.com/folderview?id=0B9CkwEErGkG1WVVmZkJHMXNackU&usp=sharing ("docs" folder)

As I said to you, I'm not an expert in database but I'm in touch with people who deal with evy bandwidth and concurrent access. It sounds to me like databases are facing the same kind of problems than big storage. ;-)

Tell me how it sounds to you.
I hope it will help.

Regards,
Alexandre Stein

HouzuoGuo · 2013-10-05T23:01:51Z

Hello Alex.

Those materials are indeed very helpful, thank you very much!

I am also studying Mongo's queue implementation and see if it helps.

NoahShen · 2013-11-07T07:47:26Z

I wrote a key-value db based on B+Tree in Java, maybe it's helpful
But don't know if you can read Chinese. I wrote the code comments and doc in Chinese.
I'll take time to rewrite the doc and translate it into English.

project location: https://code.google.com/p/jue/source/browse#svn%2Ftrunk%2Fjue%2Fsrc%2Fcom%2Fgooglecode%2Fjue
doc: https://drive.google.com/folderview?id=0B2aM_tq3elwXMjJlMjkyMzktNjFmYS00YjBjLWIyYzEtMzUxYWNlZTM2NjNl&usp=sharing

Thanks.

NoahShen · 2013-12-18T06:37:10Z

Hi Houzuo,
I use three data sync strategies in my project now. It's like Redis.

Always sync, sync every time when writing new data. Very slow but very safe.
Buffer sync, never sync unless buffer is full or the file is closed. Very fast, less safe.
Sync every sec, only sync data when latest sync time is more than 1 sec from now.

Maybe you can use this as reference.

NoahShen · 2013-12-18T06:41:19Z

Sorry, clicked wrong button, made a mistake.

HouzuoGuo · 2013-12-18T07:13:35Z

Thanks mate, what about:

Having an HTTP API to flush all buffers
Use a command line parameter to configure how often buffers are flushed automatically, by default it is 1 minute.

HouzuoGuo · 2013-12-29T02:25:47Z

I am not sure if the second point is useful or not, but the first point has been implemented: bfbc18d

In the meanwhile, I have not forgotten IO queue! Combining with another known issue #39 I now have a new idea...

Our experience tells us that tiedot scales well enough from 1 to 4 threads, and it takes advantage of hyperthreading as well. However, going beyond 8 threads does not improve performance at all.

And we know that:

goroutines are extremely lightweight, it does not hurt performance by spawning 20k of them
go channels scale almost linearly as number of hardware threads goes up

So how about:

Split each collection into 16MB chunks
Each chunk maintains its own index structure and data file
Each chunk has its own IO queue (a channel) and a single thread working on CRUD operations
All collection operations (including queries) are sent to a central IO queue (a channel), then dispatched to chunk(s) to work on

That will give us several advantages:

Even better data safety - every chunk can function independently - in case of a broken index structure from one chunk, others can still function at their maximum availability.
Better scalability - independent chunks do not interfere with one another's locks

But there is a disadvantage - the "tiedot new generation" will no longer be compatible with the current tiedot.

What do you think?

alexandrestein · 2013-12-29T08:12:04Z

I think this is a VERY VERY good idea. 👍

In my cases the retrocompatibility is not a problem. Offline procedure and other tricks are OK to me.

I don't have much time to play with Tiedot this time but I still can do some benchmarks for you.

Kind regards

NoahShen · 2013-12-30T02:33:36Z

Good idea! 👍
Collection operations are more like Map-Reduce, right?
Maybe you can also write a tool to convert old tiedot file to new generation file.

HouzuoGuo · 2013-12-30T22:26:35Z

OK, let's go ahead with that.

HouzuoGuo · 2014-01-04T14:39:43Z

Good news - the new idea works pretty well! I setup proof of concept benchmarks in nextgen branch, and the scalability of simple document CRUD operations went beyond my expectation.
I am still working on improving hash scan performance, UID operations have really poor performance at the moment.

Stay tuned.

alexandrestein · 2014-01-05T22:59:53Z

Maybe use B+Tree instead of hash could help improve UID operations.
This is an implementation of B+Tree for Go you could try: https://github.com/cznic/b

HouzuoGuo · 2014-01-06T06:02:46Z

Will do...

HouzuoGuo · 2014-02-27T14:38:27Z

Just an update on this issue:

By partitioning collection index/data and executing IO operations in parallel, tiedot still cannot achieve linear scalability (no speed up past 12 cores).

But don't worry - the "nextgen" branch brings a totally different approach (sorry - it happened again). It borrows the "single thread" idea from Redis, plus the coordination mechanism used by Cassandra, and tiedot runs each collection partition in an independent process, uses Unix domain socket as IPC mechanism.

Let's see...

HouzuoGuo · 2014-06-30T06:59:57Z

I have experimented with different implementations of using message passing mechanism to implement collection partitions, however none of them brought any performance improvement at all. Perhaps I could use more HPC programming exercises in Go. For now, I do not think that IO queueing will help improving tiedot scalability. Let's leave this one for later, ok?

NoahShen closed this as completed Dec 18, 2013

NoahShen reopened this Dec 18, 2013

HouzuoGuo mentioned this issue Dec 29, 2013

Why 130MB Initial Bucket on windows ? #39

Closed

ghost assigned HouzuoGuo Dec 30, 2013

HouzuoGuo closed this as completed Jun 30, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigate IO queuing and its benefits #23

Investigate IO queuing and its benefits #23

HouzuoGuo commented Oct 1, 2013

alexandrestein commented Oct 5, 2013

HouzuoGuo commented Oct 5, 2013

NoahShen commented Nov 7, 2013

NoahShen commented Dec 18, 2013

NoahShen commented Dec 18, 2013

HouzuoGuo commented Dec 18, 2013

HouzuoGuo commented Dec 29, 2013

alexandrestein commented Dec 29, 2013

NoahShen commented Dec 30, 2013

HouzuoGuo commented Dec 30, 2013

HouzuoGuo commented Jan 4, 2014

alexandrestein commented Jan 5, 2014

HouzuoGuo commented Jan 6, 2014

HouzuoGuo commented Feb 27, 2014

HouzuoGuo commented Jun 30, 2014

Investigate IO queuing and its benefits #23

Investigate IO queuing and its benefits #23

Comments

HouzuoGuo commented Oct 1, 2013

alexandrestein commented Oct 5, 2013

HouzuoGuo commented Oct 5, 2013

NoahShen commented Nov 7, 2013

NoahShen commented Dec 18, 2013

NoahShen commented Dec 18, 2013

HouzuoGuo commented Dec 18, 2013

HouzuoGuo commented Dec 29, 2013

alexandrestein commented Dec 29, 2013

NoahShen commented Dec 30, 2013

HouzuoGuo commented Dec 30, 2013

HouzuoGuo commented Jan 4, 2014

alexandrestein commented Jan 5, 2014

HouzuoGuo commented Jan 6, 2014

HouzuoGuo commented Feb 27, 2014

HouzuoGuo commented Jun 30, 2014