Add mechanism for recovering from disconnections in conferences #1069

zugz · 2018-08-12T11:55:28Z

This change is

CLAassistant · 2018-08-12T11:55:36Z

All committers have signed the CLA.

iphydf

I'll review the logic more in-depth later.

iphydf · 2018-08-12T11:58:50Z

toxcore/group.c

+        return -1;
+    }
+
+    memcpy(&temp[g->numpeers], &g->frozen[frozen_index], sizeof(Group_Peer));


This can be a regular assignment.

iphydf · 2018-08-12T11:59:30Z

toxcore/group.c

+        g->frozen = nullptr;
+    } else {
+        if (g->numfrozen != (uint32_t)frozen_index) {
+            memcpy(&g->frozen[frozen_index], &g->frozen[g->numfrozen], sizeof(Group_Peer));


Also regular assignment.

iphydf · 2018-08-12T12:01:50Z

toxcore/group.c

 {
    Group_c *g = get_group_c(g_c, groupnumber);

    if (!g) {
        return -1;
    }

-    // TODO(irungentoo):
-    int peer_index = peer_in_chat(g, real_pk);
+    int peer_index = -1;


This can be "const int" if you use a ternary expression.

iphydf · 2018-08-12T12:03:08Z

toxcore/group.c

        }
+
+        memcpy(&temp[g->numfrozen], &g->group[peer_index], sizeof(Group_Peer));


Use assignment.

iphydf · 2018-08-12T12:04:04Z

toxcore/group.c

+        return -1;
+    }
+
+    int friendcon_id = getfriend_conn_id_pk(g_c->fr_c, real_pk);


iphydf · 2018-08-12T12:04:12Z

toxcore/group.c

+        return -1;
+    }
+
+    int close_index = add_conn_to_groupchat(g_c, friendcon_id, groupnumber, 0, 1);


zugz · 2018-08-12T13:54:26Z

Done all. Thanks. I also made some more things const.

iphydf · 2018-08-12T15:51:24Z

docs/minpgc.md

+
+# Discussion
+## Overview
+The intention is to recover seemlessly from splits in the group, the most 


zugz · 2018-08-13T19:43:02Z

+The intention is to recover seemlessly from splits in the group, the most seamlessly

Oops! Done.

iphydf

Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @iphydf and @zugz)

auto_tests/conference_test.c, line 107 at r4 (raw file):

    uint16_t disconnect_toxes[NUM_DISCONNECT];

    for (uint16_t i = 0; i < NUM_DISCONNECT; ++i) {

NUM_DISCONNECT is actually MAX_DISCONNECT? Since numbers may repeat?

auto_tests/conference_test.c, line 112 at r4 (raw file):

    }

    for (uint16_t j = 0; j < 70 * 5; ++j) {

What's 70 * 5? It looks like we wait for a fixed amount of time. Perhaps we should wait for events to happen? I.e. wait until all the toxes we expect to be disconnected are in fact disconnected.

auto_tests/conference_test.c, line 128 at r4 (raw file):

        }

        c_sleep(200);

I think the test can be a little faster by using 100 here. I think we have an ITERATION_INTERVAL constant for that, which was chosen to iterate as few times as possible while giving about the same test timing. 200, I think (I might be wrong), was increasing test times significantly. The reason I kept 200 here is because in some loops we print something on every loop iteration. That can be fixed, and then we can iterate more frequently, speeding up this test (which currently times out).

auto_tests/conference_test.c, line 141 at r4 (raw file):

    printf("reconnecting toxes\n");

    for (uint16_t j = 0; j < 120 * 5; ++j) {

Should we also wait for an event here?

iphydf · 2018-08-15T23:28:50Z

auto_tests/conference_test.c

+    bool disconnected[NUM_GROUP_TOX];
+
+    for (uint16_t i = 0; i < NUM_GROUP_TOX; ++i) {
+        disconnected[i] = false;


I don't know.. bool disconnected[NUM_GROUP_TOX] = {0}; does the same. You may have a reason to have this explicit loop. What's the reason?

No good reason. Fixed.

iphydf · 2018-08-15T23:29:58Z

auto_tests/conference_test.c

+        uint16_t disconnect;
+
+        do {
+            disconnect = random_u16() % NUM_GROUP_TOX;


It took me about 30 seconds to understand this loop. I might be incompetent, but more incompetent people are going to read this and not understand it within 10 seconds. Perhaps you can factor this out into a function with a name (and maybe a comment) to clarify what this does.

iphydf · 2018-08-15T23:33:25Z

toxcore/group.c

+        g->peer_on_join(g->object, groupnumber, g->numpeers - 1);
+    }
+
+    --g->numfrozen;


It might be worth asserting that this isn't 0 before decrementing. I'm conflicted, because I soon want assert to mean "provably true" and this assertion won't be easy to prove. Just something to consider.

Instead, I rearranged things to call the callbacks after we deal with g->frozen. So now we know g->numfrozen > 0 because we just checked that get_frozen_index didn't return -1. It makes more sense this way anyway.

iphydf

Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @iphydf)

toxcore/group.c, line 1269 at r1 (raw file):

Previously, iphydf wrote…

const

No const?

zugz · 2018-08-19T17:46:49Z

No const?

Done.

anthonybilinski · 2018-08-19T19:34:55Z

I tried running it now and ran into a few problems:

On start of client, I see these warning logs:

Warning: "ERROR:Messenger.c:m_send_message_generic:508: Message length 1374 is too large"
Warning: "ERROR:Messenger.c:m_send_message_generic:508: Message length 1378 is too large"
Warning: "ERROR:Messenger.c:m_send_message_generic:508: Message length 1378 is too large"
Warning: "ERROR:Messenger.c:m_send_message_generic:508: Message length 1378 is too large"

I see this warning log very frequently on my client running non-pgc toxcore which makes sense, but I see it occasionally on clients running pgc toxcore as well, looks like at times when I would expect Peer Query.

core/toxlogger.cpp:57 : Warning: "WARN :network.c:networking_poll:654: [33] -- Packet has no handler"

I also see the ID warning during auto_conference_test after "letting random toxes timeout".

I see a lot of this during the auto_conference_test, but seems less serious:

WARNING network.c:550	sendpacket:	unknown address type: 0

qTox segfaults on close (minpgc toxcore), only if there are still others in the group, see:

double free or corruption (out)
Aborted (core dumped)

Probably a qTox issue, qTox segfaults on leaving group (old toxcore, pgc group members)

zugz · 2018-08-20T16:03:19Z

Thanks for testing.

Warning: "ERROR:Messenger.c:m_send_message_generic:508: Message length 1378 is too large"

I don't see what would cause this. Is it really introduced by minpgc?

core/toxlogger.cpp:57 : Warning: "WARN :network.c:networking_poll:654: [33] -- Packet has no handler" WARNING network.c:550 sendpacket: unknown address type: 0

Interesting. I don't see either of these during the auto_conference_test, but travis gets the latter at least. Do you get either with master?

4. qTox segfaults on close (minpgc toxcore) double free or corruption (out)

Aargh. That one was my fault. Thanks for finding it! Should be fixed now.

5. Probably a qTox issue, qTox segfaults on leaving group (old toxcore, pgc group members)

Well that certainly isn't minpgc's fault - the client shouldn't segfault whatever peers do. I can't imagine what could be causing it.

anthonybilinski · 2018-08-22T20:30:05Z

Is it really introduced by minpgc?

hard to say, but going back and forth I do not see it with master but do see it with this branch. I will test a bit more to make sure.

Do you get either with master?

no, didn't see either with master. I think the "Packet has no handler" is actually blocking functionality, if I close and re-open my client using this branch, I don't rejoin the group, but I see those warnings instead (I think). I have some non-default network settings in client, i.e. ipv4 only, I will test full default and see if I can still repro.

whatever peers do. I can't imagine what could be causing it.

Agreed, just got a repro on stock toxcore, guaranteed qTox issue.

zugz · 2018-08-22T21:51:25Z

no, didn't see either with master. [...] "Packet has no handler"

That one really makes no sense to me. It should be triggered only if you receive a packet with an unknown packet kind. minpgc doesn't add any new packet kinds (only a new crypto packet id), and nor can I imagine why anything it does would solicit any non-standard packets from other peers.

if I close and re-open my client using this branch, I don't rejoin the group

That's expected! You need to have a friend in the group who's also running minpgc for the persistence to work, and even then it doesn't work if you restart the client rather than just lose your network connection, because saving and loading isn't implemented (yet).

anthonybilinski · 2018-08-23T06:44:36Z

Ok more testing and some root causing:

I do seem this with stock toxcore master tip. I didn't see it before I guess because my toxcore was older, created Message length is too large log spam #1116
Same, created Add empty handler for LAN discovery packets when LAN discovery is disabled #1115
May have been the same as 5? Saw this on stock toxcore.
Looks like it actually is toxcore, not qTox. See it on stock toxcore too, issue opened: Segfault on group quit, free of invalid audio_decoder #1114

I tested again stopping one of the clients in the group with gdb until they were viewed as offline instead of restarting the client, and you're right that that works. I was under the impression that the other minpgc peer would re-include the re-connecting minpgc peer, but it makes sense that the peer who left needs to maintain the state as well.

In that case testing looks good from my end. I'll continue reading through the changes tomorrow.

zugz · 2018-08-23T18:59:54Z

* Wednesday, 2018-08-22 at 23:44 -0700 - Anthony Bilinski <notifications@github.com>:

4. May have been the same as 5? Saw this on stock toxcore.

Ah yes, that would make sense. Well, it made me find and fix a bug in minpgc anyway!

5. Looks like it actually is toxcore

Yes. #1117 should fix it.

In that case testing looks good from my end. I'll continue reading through the changes tomorrow.

Great. Thanks for testing and reading! I've just pushed a couple more little changes, and there's one more thing I know needs doing (there's a FIXME in the code for it), but otherwise to my knowledge it's bugfree and working.

codecov · 2018-09-01T16:51:58Z

Codecov Report

Merging #1069 into master will increase coverage by 0.5%.
The diff coverage is 94.1%.

@@           Coverage Diff            @@
##           master   #1069     +/-   ##
========================================
+ Coverage    82.5%   83.1%   +0.5%     
========================================
  Files          82      82             
  Lines       14476   14593    +117     
========================================
+ Hits        11948   12131    +183     
+ Misses       2528    2462     -66

Impacted Files	Coverage Δ
auto_tests/overflow_sendq_test.c	`100% <100%> (ø)`	⬆️
auto_tests/run_auto_test.h	`100% <100%> (ø)`	⬆️
auto_tests/reconnect_test.c	`100% <100%> (ø)`	⬆️
auto_tests/conference_test.c	`99.1% <100%> (+1.4%)`	⬆️
auto_tests/friend_connection_test.c	`100% <100%> (ø)`	⬆️
auto_tests/overflow_recvq_test.c	`100% <100%> (ø)`	⬆️
auto_tests/send_message_test.c	`94.1% <100%> (ø)`	⬆️
auto_tests/conference_peer_nick_test.c	`100% <100%> (ø)`	⬆️
auto_tests/conference_double_invite_test.c	`100% <100%> (ø)`	⬆️
auto_tests/lossy_packet_test.c	`100% <100%> (ø)`	⬆️
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6872c14...1b23222. Read the comment docs.

zugz · 2018-09-01T17:02:58Z

This is now ready for final review.

sudden6

Reviewed 3 of 15 files at r6.
Reviewable status: 1 change requests, 0 of 1 approvals obtained (waiting on @iphydf and @zugz)

toxcore/group.c, line 1320 at r6 (raw file):

 * return -1 otherwise.
 */
static int try_send_rejoin(Group_Chats *g_c, uint32_t groupnumber, const uint8_t *real_pk)

use true/false for success/failure?

toxcore/group.c, line 1898 at r6 (raw file):

}

static int handle_packet_rejoin(Group_Chats *g_c, int friendcon_id, const uint8_t *data, uint16_t length,

use true/false for success/failure?

zugz

Reviewable status: 2 change requests, 0 of 1 approvals obtained (waiting on @iphydf, @zugz, and @sudden6)

auto_tests/conference_test.c, line 107 at r4 (raw file):

Previously, iphydf wrote…

NUM_DISCONNECT is actually MAX_DISCONNECT? Since numbers may repeat?

Not any more.

auto_tests/conference_test.c, line 112 at r4 (raw file):

Previously, iphydf wrote…

What's 70 * 5? It looks like we wait for a fixed amount of time. Perhaps we should wait for events to happen? I.e. wait until all the toxes we expect to be disconnected are in fact disconnected.

Done.

auto_tests/conference_test.c, line 128 at r4 (raw file):

Previously, iphydf wrote…

I think the test can be a little faster by using 100 here. I think we have an ITERATION_INTERVAL constant for that, which was chosen to iterate as few times as possible while giving about the same test timing. 200, I think (I might be wrong), was increasing test times significantly. The reason I kept 200 here is because in some loops we print something on every loop iteration. That can be fixed, and then we can iterate more frequently, speeding up this test (which currently times out).

Done.

auto_tests/conference_test.c, line 141 at r4 (raw file):

Previously, iphydf wrote…

Should we also wait for an event here?

Done.

toxcore/group.c, line 1320 at r6 (raw file):

Previously, sudden6 wrote…

use true/false for success/failure?

Done.

toxcore/group.c, line 1898 at r6 (raw file):

Previously, sudden6 wrote…

use true/false for success/failure?

Making this bool would introduce an ugly mismatch with g_handle_packet. The whole file should be converted to use boolean returns where possible, but this is outside the scope of this PR.

sudden6

Reviewable status: 2 change requests, 0 of 1 approvals obtained (waiting on @iphydf)

toxcore/group.c, line 1898 at r6 (raw file):

Previously, zugz (zugz) wrote…

Making this bool would introduce an ugly mismatch with g_handle_packet. The whole file should be converted to use boolean returns where possible, but this is outside the scope of this PR.

ok

robinlinden

Reviewed 2 of 5 files at r1, 1 of 4 files at r3, 2 of 3 files at r5, 14 of 15 files at r6, 1 of 1 files at r7, 1 of 1 files at r8.
Reviewable status: 2 change requests, 0 of 1 approvals obtained (waiting on @iphydf)

sudden6

Reviewable status: complete! 2 change requests, 1 of 1 approvals obtained (waiting on @iphydf)

toxcore/group.c

* add freezing and unfreezing of peers * add rejoin packet * revise handling of temporary invited connections * rename "peer kill" packet to "peer leave" packet * test rejoining in conference test * use custom clock in conference test

iphydf requested changes Aug 12, 2018

View reviewed changes

iphydf added this to the v0.2.x milestone Aug 12, 2018

iphydf reviewed Aug 12, 2018

View reviewed changes

docs/minpgc.md Outdated

# Discussion

## Overview

The intention is to recover seemlessly from splits in the group, the most

Copy link

Member

iphydf Aug 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seamlessly

iphydf mentioned this pull request Aug 12, 2018

WIP: Make conferences persistent across restarts and reconnects. #826

Closed

zugz force-pushed the minpgc branch 2 times, most recently from a14947c to 1c1ebd5 Compare August 13, 2018 19:41

iphydf requested changes Aug 13, 2018

View reviewed changes

zugz force-pushed the minpgc branch 5 times, most recently from 1fc620c to 93b6d54 Compare August 19, 2018 10:51

iphydf requested changes Aug 19, 2018

View reviewed changes

zugz force-pushed the minpgc branch from d2ffce1 to bc82582 Compare August 19, 2018 21:26

This was referenced Aug 23, 2018

Group chats lost on restart qTox/qTox#5301

Closed

[Feature Request] Group Chat is essential for its futural popularity qTox/qTox#5300

Closed

zugz force-pushed the minpgc branch 3 times, most recently from cb731c6 to e3441ea Compare August 31, 2018 16:44

zugz force-pushed the minpgc branch 2 times, most recently from c718ae0 to 14a58e4 Compare September 1, 2018 16:51

sudden6 requested changes Sep 2, 2018

View reviewed changes

zugz commented Sep 2, 2018

View reviewed changes

sudden6 reviewed Sep 3, 2018

View reviewed changes

robinlinden approved these changes Sep 4, 2018

View reviewed changes

sudden6 approved these changes Sep 4, 2018

View reviewed changes

anthonybilinski mentioned this pull request Sep 4, 2018

how to use persistent chats? qTox/qTox#5326

Closed

iphydf approved these changes Sep 4, 2018

View reviewed changes

iphydf requested changes Sep 4, 2018

View reviewed changes

toxcore/group.c Outdated Show resolved Hide resolved

iphydf approved these changes Sep 5, 2018

View reviewed changes

zugz changed the title ~~minimal conference persistence~~ Add mechanism for recovering from disconnections in conferences Sep 5, 2018

zugz force-pushed the minpgc branch from df1863f to 1b23222 Compare September 5, 2018 19:00

zugz merged commit 1b23222 into TokTok:master Sep 5, 2018

zugz deleted the minpgc branch September 5, 2018 19:11

whoizit mentioned this pull request Sep 6, 2018

future plans (full feature persistent chats, without bots) tox-rs/tox#213

Closed

robinlinden modified the milestones: v0.2.x, v0.2.8 Oct 7, 2018

NulAsh mentioned this pull request Oct 24, 2018

Preserve group chats after restart/disconnect (persistent) & notify on new messages qTox/qTox#896

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add mechanism for recovering from disconnections in conferences #1069

Add mechanism for recovering from disconnections in conferences #1069

zugz commented Aug 12, 2018 •

edited by iphydf

Loading

CLAassistant commented Aug 12, 2018 •

edited

Loading

iphydf left a comment

iphydf Aug 12, 2018

iphydf Aug 12, 2018

iphydf Aug 12, 2018

iphydf Aug 12, 2018

iphydf Aug 12, 2018

zugz Aug 19, 2018

iphydf Aug 12, 2018

zugz commented Aug 12, 2018 via email

iphydf Aug 12, 2018

zugz commented Aug 13, 2018 via email

iphydf left a comment

iphydf Aug 15, 2018

zugz Aug 19, 2018

iphydf Aug 15, 2018

zugz Aug 19, 2018

iphydf Aug 15, 2018

zugz Aug 19, 2018

iphydf left a comment

zugz commented Aug 19, 2018 via email

anthonybilinski commented Aug 19, 2018

zugz commented Aug 20, 2018 via email

anthonybilinski commented Aug 22, 2018 •

edited

Loading

zugz commented Aug 22, 2018 via email

anthonybilinski commented Aug 23, 2018

zugz commented Aug 23, 2018 via email

codecov bot commented Sep 1, 2018 •

edited

Loading

zugz commented Sep 1, 2018

sudden6 left a comment

zugz left a comment

sudden6 left a comment

robinlinden left a comment

sudden6 left a comment

		}

		memcpy(&temp[g->numfrozen], &g->group[peer_index], sizeof(Group_Peer));

Add mechanism for recovering from disconnections in conferences #1069

Add mechanism for recovering from disconnections in conferences #1069

Conversation

zugz commented Aug 12, 2018 • edited by iphydf Loading

CLAassistant commented Aug 12, 2018 • edited Loading

iphydf left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zugz commented Aug 12, 2018 via email

Choose a reason for hiding this comment

zugz commented Aug 13, 2018 via email

iphydf left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

iphydf left a comment

Choose a reason for hiding this comment

zugz commented Aug 19, 2018 via email

anthonybilinski commented Aug 19, 2018

zugz commented Aug 20, 2018 via email

anthonybilinski commented Aug 22, 2018 • edited Loading

zugz commented Aug 22, 2018 via email

anthonybilinski commented Aug 23, 2018

zugz commented Aug 23, 2018 via email

codecov bot commented Sep 1, 2018 • edited Loading

Codecov Report

zugz commented Sep 1, 2018

sudden6 left a comment

Choose a reason for hiding this comment

zugz left a comment

Choose a reason for hiding this comment

sudden6 left a comment

Choose a reason for hiding this comment

robinlinden left a comment

Choose a reason for hiding this comment

sudden6 left a comment

Choose a reason for hiding this comment

zugz commented Aug 12, 2018 •

edited by iphydf

Loading

CLAassistant commented Aug 12, 2018 •

edited

Loading

anthonybilinski commented Aug 22, 2018 •

edited

Loading

codecov bot commented Sep 1, 2018 •

edited

Loading