
[WIP] Add shards_cluster for PG support in OTP23 #49

Closed
wants to merge 5 commits

Conversation


@alisinabh alisinabh commented Sep 20, 2020

#47

  • Change all pg2 usages to shards_cluster in other modules and tests.
  • Implement pg2 proxy functions in shards_cluster.
  • Implement local pg operations in shards_cluster.
  • Implement remote pg operations to handle remote join/leave.
  • Handle join/2 and leave/2 on lists of pids.
  • Implement delete/1 for pg using leave/2 on all pids (see the sketch after this list).
  • Wait for completion of remote join and leave, since otherwise concurrency makes the return value of shards:join/leave invalid. Not possible due to the propagation behavior of the pg implementation.
  • Write tests for shards_cluster.
  • Make sure pg is started before shards.
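
As a rough illustration of the delete/1 idea above, a minimal sketch (the dispatch logic and names here are assumptions, not the PR's actual shards_cluster code; pg only accepts pids local to the calling node, hence the per-node handling):

  %% Sketch: emulate a group "delete" by leaving every current member.
  delete(Group) ->
    lists:foreach(
      fun(Pid) ->
          case node(Pid) of
            Node when Node =:= node() ->
              pg:leave(Group, Pid);
            OwnerNode ->
              %% leave remote members via the node that owns them
              rpc:call(OwnerNode, pg, leave, [Group, Pid])
          end
      end, pg:get_members(Group)),
    ok.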

join(Group, Pid) ->
  %% HACK: Maybe implement apply_on_target?
  OwnerNode = node(Pid),
  if

Owner

Let's use case instead
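
For reference, the suggested case form could look roughly like this (a sketch only; the branch bodies are assumptions about how local vs. remote joins might be dispatched, not the PR's final code):

  join(Group, Pid) ->
    case node(Pid) of
      Node when Node =:= node() ->
        %% Pid lives on this node, join the group locally
        pg:join(Group, Pid);
      OwnerNode ->
        %% pg only accepts local pids, so run the join on the node that
        %% owns Pid (the apply_on_target idea from the comment above)
        rpc:call(OwnerNode, pg, join, [Group, Pid])
    end.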

@alisinabh
Author

Hi @cabol,

So everything is implemented except the tests, but there are some issues I need your help with.

The biggest issue is that when we call leave or join on pg, the members list is not updated immediately, because pg is optimized for speed and does not guarantee consistency. So one test was failing because the list of nodes was not updated right after the leave. The failing test was shards_dist_SUITE.erl:146. I saw you had a sleep in there too, and I assume you anticipated this issue well before pg was even a thing :))
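
One way the test could cope with this (a sketch, assuming a hypothetical wait_until helper rather than anything already in the suite) is to poll until the membership converges instead of sleeping a fixed amount of time:

  %% Hypothetical test helper: retry a condition until it holds or retries run out.
  wait_until(Condition) ->
    wait_until(Condition, 50, 100).

  wait_until(Condition, 0, _Delay) ->
    Condition();
  wait_until(Condition, Retries, Delay) ->
    case Condition() of
      true -> true;
      _    -> timer:sleep(Delay), wait_until(Condition, Retries - 1, Delay)
    end.

  %% Usage idea: true = wait_until(fun() -> CurrentMembers() =:= ExpectedMembers end).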

Another issue I'm having is that pg does not start automatically; its documentation says we should set kernel parameters for that. Can you give me some hints on how to do that in this project?
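
For what it's worth, two possible ways to make sure pg is up before shards (sketches under assumptions, not something the project already does; the start_pg kernel parameter is quoted from memory, so double-check the kernel docs):

  %% Option 1 (sys.config): ask kernel to start the default pg scope.
  %%   [{kernel, [{start_pg, true}]}].

  %% Option 2: start the default scope from the app's own top supervisor,
  %% before any shards children.
  init([]) ->
    PgSpec = #{id => pg, start => {pg, start_link, []}},
    {ok, {#{strategy => one_for_one, intensity => 5, period => 10}, [PgSpec]}}.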

P.S. I think the CI system may be malfunctioning!? I'm not quite sure!

Thanks.

@@ -143,7 +143,10 @@ t_join_leave_ops(Config) ->

  % leave node E from SET
  OkNodes2 = lists:usort([node() | lists:droplast(OkNodes1)]),
  OkNodes2 = shards:leave(?SET, [ENode]),

Author

This line sometimes fails due to the pg propagation issue. I've just ignored the result for now!

@cabol
Owner

cabol commented Oct 2, 2020

Hi @alisinabh, first of all, thanks a lot for the effort 😄!!

> The biggest issue is that when we call leave or join on pg, the members list is not updated immediately, because pg is optimized for speed and does not guarantee consistency. So one test was failing because the list of nodes was not updated right after the leave. The failing test was shards_dist_SUITE.erl:146. I saw you had a sleep in there too, and I assume you anticipated this issue well before pg was even a thing :))

Yeah, agreed, the join and leave functions have to be very different, considering all that you mention. And about the test, I had to put the sleep in because, yes, I was having a sync issue since the group was not updated immediately across the cluster after leave, though I wasn't anticipating this 😂 .

> Another issue I'm having is that pg does not start automatically; its documentation says we should set kernel parameters for that. Can you give me some hints on how to do that in this project?

Overall, I was checking out the issues you mention and also the pg docs, and I think this is going to be a bit more complicated than I expected, because there are significant differences in the implementation, so there are several parts we have to refactor. Given this, I'd like to propose moving shards_dist into a separate repository and continuing from there. It will be easier because we'll have more flexibility to make changes without affecting shards. Once we have shards_dist in its own repository, I think we can either continue with the current approach or change it completely if that turns out to be best; the point is, we will have more flexibility in the implementation. What do you think? I'd really like to know your thoughts.

P.S. I could work on this refactor this weekend, moving shards_dist to another repo and leaving shards completely isolated, only for local sharding.

@alisinabh
Author

Thank you @cabol,

pg is more complicated than it seemed initially. I completely agree with you on separating the repositories, since a quick fix at this stage is not possible without messing with the shards project configuration, and that is not necessary.

I will still be happy to help with this. Please let me know if there is anything I can do.

Thanks again.

@alisinabh alisinabh closed this Oct 2, 2020