Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate IO queuing and its benefits #23

Closed
HouzuoGuo opened this issue Oct 1, 2013 · 15 comments
Closed

Investigate IO queuing and its benefits #23

HouzuoGuo opened this issue Oct 1, 2013 · 15 comments

Comments

@HouzuoGuo
Copy link
Owner

Investigate how introducing IO queues can benefit tiedot's operation.

The IO queues will operate on:

  • document CRUD
  • hash bucket CRU
@alexandrestein
Copy link

I made a quick documentation on it: https://drive.google.com/folderview?id=0B9CkwEErGkG1WVVmZkJHMXNackU&usp=sharing ("docs" folder)

As I said to you, I'm not an expert in database but I'm in touch with people who deal with evy bandwidth and concurrent access. It sounds to me like databases are facing the same kind of problems than big storage. ;-)

Tell me how it sounds to you.
I hope it will help.

Regards,
Alexandre Stein

@HouzuoGuo
Copy link
Owner Author

Hello Alex.

Those materials are indeed very helpful, thank you very much!

I am also studying Mongo's queue implementation and see if it helps.

@NoahShen
Copy link
Collaborator

NoahShen commented Nov 7, 2013

I wrote a key-value db based on B+Tree in Java, maybe it's helpful
But don't know if you can read Chinese. I wrote the code comments and doc in Chinese.
I'll take time to rewrite the doc and translate it into English.

project location: https://code.google.com/p/jue/source/browse#svn%2Ftrunk%2Fjue%2Fsrc%2Fcom%2Fgooglecode%2Fjue
doc: https://drive.google.com/folderview?id=0B2aM_tq3elwXMjJlMjkyMzktNjFmYS00YjBjLWIyYzEtMzUxYWNlZTM2NjNl&usp=sharing

Thanks.

@NoahShen
Copy link
Collaborator

Hi Houzuo,
I use three data sync strategies in my project now. It's like Redis.

  • Always sync, sync every time when writing new data. Very slow but very safe.
  • Buffer sync, never sync unless buffer is full or the file is closed. Very fast, less safe.
  • Sync every sec, only sync data when latest sync time is more than 1 sec from now.

Maybe you can use this as reference.

@NoahShen NoahShen reopened this Dec 18, 2013
@NoahShen
Copy link
Collaborator

Sorry, clicked wrong button, made a mistake.

@HouzuoGuo
Copy link
Owner Author

Thanks mate, what about:

  • Having an HTTP API to flush all buffers
  • Use a command line parameter to configure how often buffers are flushed automatically, by default it is 1 minute.

@HouzuoGuo
Copy link
Owner Author

I am not sure if the second point is useful or not, but the first point has been implemented: bfbc18d

In the meanwhile, I have not forgotten IO queue! Combining with another known issue #39 I now have a new idea...

Our experience tells us that tiedot scales well enough from 1 to 4 threads, and it takes advantage of hyperthreading as well. However, going beyond 8 threads does not improve performance at all.

And we know that:

  • goroutines are extremely lightweight, it does not hurt performance by spawning 20k of them
  • go channels scale almost linearly as number of hardware threads goes up

So how about:

  • Split each collection into 16MB chunks
  • Each chunk maintains its own index structure and data file
  • Each chunk has its own IO queue (a channel) and a single thread working on CRUD operations
  • All collection operations (including queries) are sent to a central IO queue (a channel), then dispatched to chunk(s) to work on

That will give us several advantages:

  • Even better data safety - every chunk can function independently - in case of a broken index structure from one chunk, others can still function at their maximum availability.
  • Better scalability - independent chunks do not interfere with one another's locks

But there is a disadvantage - the "tiedot new generation" will no longer be compatible with the current tiedot.

What do you think?

@alexandrestein
Copy link

I think this is a VERY VERY good idea. 👍

In my cases the retrocompatibility is not a problem. Offline procedure and other tricks are OK to me.

I don't have much time to play with Tiedot this time but I still can do some benchmarks for you.

Kind regards

@NoahShen
Copy link
Collaborator

Good idea! 👍
Collection operations are more like Map-Reduce, right?
Maybe you can also write a tool to convert old tiedot file to new generation file.

@ghost ghost assigned HouzuoGuo Dec 30, 2013
@HouzuoGuo
Copy link
Owner Author

OK, let's go ahead with that.

@HouzuoGuo
Copy link
Owner Author

Good news - the new idea works pretty well! I setup proof of concept benchmarks in nextgen branch, and the scalability of simple document CRUD operations went beyond my expectation.
I am still working on improving hash scan performance, UID operations have really poor performance at the moment.

Stay tuned.

@alexandrestein
Copy link

Maybe use B+Tree instead of hash could help improve UID operations.
This is an implementation of B+Tree for Go you could try: https://github.com/cznic/b

@HouzuoGuo
Copy link
Owner Author

Will do...

@HouzuoGuo
Copy link
Owner Author

Just an update on this issue:

By partitioning collection index/data and executing IO operations in parallel, tiedot still cannot achieve linear scalability (no speed up past 12 cores).

But don't worry - the "nextgen" branch brings a totally different approach (sorry - it happened again). It borrows the "single thread" idea from Redis, plus the coordination mechanism used by Cassandra, and tiedot runs each collection partition in an independent process, uses Unix domain socket as IPC mechanism.

Let's see...

@HouzuoGuo
Copy link
Owner Author

I have experimented with different implementations of using message passing mechanism to implement collection partitions, however none of them brought any performance improvement at all. Perhaps I could use more HPC programming exercises in Go. For now, I do not think that IO queueing will help improving tiedot scalability. Let's leave this one for later, ok?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants