Sharding support #104

Open
jeffdoolittle opened this Issue Jan 7, 2016 · 13 comments

7 participants

jeremydmiller Jan 7, 2016

Contributor

@jeffdoolittle Can you elaborate on that? What are you envisioning?

jeffdoolittle Jan 7, 2016

Contributor

I was curious to see what others might be thinking along such lines. I know RavenDB has sharding capabilities like this. It would be nice to spin up some AWS Postgres instances (https://aws.amazon.com/rds/postgresql/) and be able to both persist and query across sharded instances. It would also be nice to configure sharding using the strategies I mentioned in the original description of this issue.

Anyone else have feedback/ideas on the utility, feasibility, and/or demand for such a feature?

--Jeff

maggiepint Apr 3, 2016

I'd be interested to know where you guys want to go with this.

As a casual observer, I'll note that I would use a shard strategy similar to RavenDB's. This was something I really liked about RavenDB: the shard implementation is quite clean. I don't see the need for blind sharding though. IMO, if you're going to shard, you might as well define where your data is going.

It's worth noting that I kinda like the similar thing Microsoft did with Elastic Database tools on Azure for multi-shard data routing and querying. In particular, the practice of pulling a shard map into the client, and then having multiple ways to open database connections based on that shard map, makes sense to me. I haven't used that in production though.
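
To make the "shard map in the client" idea a bit more concrete, here is a minimal Python sketch of a range-based map that resolves a key to a connection string. It is only an illustration of the concept. It is not the Elastic Database tools API and not anything Marten provides, and the class name, host names, and key ranges are all made up.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeShardMap:
    """Hypothetical client-side shard map: each lower bound opens a key range
    that runs up to (but not including) the next lower bound."""

    lower_bounds: tuple        # sorted ascending
    connection_strings: tuple  # one connection string per lower bound

    def connection_for(self, key: int) -> str:
        # Find the last range whose lower bound is <= key.
        index = bisect_right(self.lower_bounds, key) - 1
        if index < 0:
            raise KeyError(f"no shard covers key {key}")
        return self.connection_strings[index]


# Assumed layout: ids 0-999 on shard-a, 1000-4999 on shard-b, 5000 and up on shard-c.
shard_map = RangeShardMap(
    lower_bounds=(0, 1000, 5000),
    connection_strings=(
        "host=shard-a dbname=app",
        "host=shard-b dbname=app",
        "host=shard-c dbname=app",
    ),
)
```

Because the map is explicit, the application decides where each range of data lives, which is the opposite of blind sharding.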

jeremydmiller Apr 3, 2016

Contributor

I honestly haven't done more on this than browse the links @jeffdoolittle put up above. My thought for us at work was to just use something like Citus.

jeffdoolittle Apr 4, 2016

Contributor

Well goodness, Citus looks pretty awesome.

tim-cools Apr 5, 2016

Contributor

Wow, indeed!

👍 for not making this a Marten responsibility

@jeremydmiller jeremydmiller modified the milestone: 1.1 Jun 20, 2016

khalidabuhakmeh Aug 16, 2016

Contributor

I would agree that leaning on the tooling and ecosystem Postgres provides should be the first option to solving any problem with management.

jeremydmiller Aug 25, 2016

Contributor

I'm shutting this one down, at least until someone can prove that Citus or pg_shard isn't enough.

pruiz Aug 25, 2016

Hi @jeremydmiller, well, to be honest, Citus does have some limitations (including cross-shard ACID) which are in fact an issue.

Also, sharding with a specific product (like Citus) that provides distributed Postgres does have some advantages, but its performance scalability is limited compared to sharding across completely isolated Postgres instances.

I think we still need some kind of minimal support from Marten by which we could provide some sort of ShardingStrategy in order to support the latter case (i.e. distributing entries among different [and independent] Postgres instances). This way, an object would be sharded by its identity key (or by some other column/property, like the object's tenant identifier property).

Just my two cents.
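
As a rough illustration of what such a ShardingStrategy could look like, here is a small Python sketch that picks one of several independent Postgres instances from a document's identity key or tenant identifier. This is not Marten's API. The class, the connection strings, and the `id` / `tenant_id` field names are hypothetical, and hashing modulo the shard count is just one possible placement rule.

```python
import hashlib
from typing import Callable, Sequence


class ShardingStrategy:
    """Hypothetical strategy: hash a chosen document property to pick a shard."""

    def __init__(self, connection_strings: Sequence[str],
                 shard_key: Callable[[dict], str]):
        if not connection_strings:
            raise ValueError("at least one shard is required")
        self._connection_strings = list(connection_strings)
        self._shard_key = shard_key

    def shard_index(self, document: dict) -> int:
        # Use a stable digest; Python's built-in hash() of str varies per process.
        key = self._shard_key(document).encode("utf-8")
        digest = hashlib.sha256(key).digest()
        return int.from_bytes(digest[:8], "big") % len(self._connection_strings)

    def connection_for(self, document: dict) -> str:
        return self._connection_strings[self.shard_index(document)]


# Assumed, made-up instances that know nothing about each other.
SHARDS = ["host=pg-0 dbname=app", "host=pg-1 dbname=app", "host=pg-2 dbname=app"]

# Shard by identity key, or by the tenant identifier property.
by_id = ShardingStrategy(SHARDS, shard_key=lambda doc: str(doc["id"]))
by_tenant = ShardingStrategy(SHARDS, shard_key=lambda doc: doc["tenant_id"])
```

With `by_tenant`, every document of one tenant lands on the same instance, so single-tenant queries never cross shards. The trade-off of plain modulo hashing is that changing the number of shards moves most keys.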

jeremydmiller Aug 25, 2016

Contributor

@pruiz K. I'm reopening this then. Might just end up being closely related to the multi-tenancy issue.

@jeremydmiller jeremydmiller reopened this Aug 25, 2016

@jeremydmiller jeremydmiller modified the milestones: 1.1, 1.2 Oct 4, 2016

@jeremydmiller jeremydmiller modified the milestone: 1.2 Oct 18, 2016

jaydanielian Jan 10, 2017

@jeremydmiller, this library is outstanding. As far as the sharding discussion goes, I would point you to another fabulous library called Sequel, from the Ruby community and maintained by Jeremy Evans: http://sequel.jeremyevans.net/rdoc/files/doc/sharding_rdoc.html

His library allows multiple server nodes to be specified in the connection pool. Those server nodes can be tagged as read-only (nice for directing select queries to read replicas automatically), or the application can do database-level sharding, and it's smart enough to round robin queries to the appropriate "node" in the server list. Extensions are also available, meaning you can always instruct the session to go to a particular node.

Anyway, I hope you will see this example and be able to extract some inspiration in terms of applying this concept to Marten. Using Ruby Sequel, I am able to shard the data at the application level across several Amazon Aurora instances, thus getting horizontal scalability and nice automatic HA in the event of data/instance failure. It's a pretty powerful feature for the data driver to include.
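
To show the general shape of that routing idea, here is a minimal Python sketch: reads go round robin across nodes tagged read-only, everything else goes to the primary, and a caller can pin a statement to a specific node. This is not Sequel's API or anything in Marten. The class and node names are invented, and detecting reads by a leading `select` is a deliberately naive assumption.

```python
from itertools import cycle
from typing import Optional, Sequence


class NodeRouter:
    """Hypothetical router: primary for writes, round robin over read replicas."""

    def __init__(self, primary: str, read_replicas: Sequence[str] = ()):
        self._primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._replicas = cycle(list(read_replicas) or [primary])

    def node_for(self, sql: str, pinned: Optional[str] = None) -> str:
        if pinned is not None:
            # The caller explicitly asked for a particular node.
            return pinned
        # Naive check: treats any statement starting with SELECT as a read.
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._replicas) if is_read else self._primary


# Assumed, made-up node names.
router = NodeRouter("aurora-writer", ["aurora-reader-1", "aurora-reader-2"])
```

A real driver would need a better read/write test than this (for example, `SELECT ... FOR UPDATE` should not go to a replica), so treat the check as a placeholder.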

jeremydmiller May 6, 2017

Contributor

This one might be much more viable after we get the multi-tenancy support going.
