Confirm contact list creation during account creation #2057

Closed
Tracked by #2163
alltheseas opened this issue Mar 15, 2024 · 42 comments
Labels: bug (Something is not working, or not working as intended), Needs recreation (Issues requires concrete steps for recreation), onboarding, technical

Comments

@alltheseas
Collaborator

alltheseas commented Mar 15, 2024

solution

We should only continue onboarding if we get a contact list creation confirmation from the server.

https://damus.io/nevent1qqswkugx9lh2lye8snjxgmwl70p85qeanhe99erm49al04qa9nptsec46djng

problem observation

I suspect that some unhappy path in Damus onboarding leads to a limbo state where the account has no relay list or contact list.

I could not recreate it with two new test profiles.

diagnosis

This can happen if they create an account when they are not connected to
the internet. We should only continue onboarding if we get a contact
list creation confirmation from the server.
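
A minimal sketch of what such a confirmation gate could look like, assuming relays answer a published event with the NIP-01 `["OK", <event id>, <accepted>, <message>]` message. The function names are hypothetical, and this is Python for illustration only, not Damus code:

```python
import json


def relay_accepted(raw_message: str, event_id: str) -> bool:
    """True only if raw_message is an OK response for event_id with accepted == true."""
    try:
        msg = json.loads(raw_message)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(msg, list)
        and len(msg) >= 3
        and msg[0] == "OK"
        and msg[1] == event_id
        and msg[2] is True
    )


def can_continue_onboarding(relay_messages, event_id: str) -> bool:
    """Onboarding proceeds only once at least one relay has confirmed the event."""
    return any(relay_accepted(m, event_id) for m in relay_messages)
```

With no messages at all (for example, when offline), `can_continue_onboarding` returns `False`, so the flow would wait or retry instead of moving on.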

@alltheseas alltheseas added the bug, Needs recreation, and onboarding labels Mar 15, 2024
@alltheseas
Collaborator Author

@ericholguin independently confirms an additional in-person bug report

@alltheseas
Collaborator Author

alltheseas commented Mar 15, 2024

Bug reporter confirmed they have only used Damus, and no other nostr apps.

@alltheseas
Collaborator Author

I am concerned that this is not a one-off, and new folks are dropping off Damus without having a chance to test it.

@alltheseas
Collaborator Author

@jb55 advises

This can happen if they create an account when they are not connected to
the internet. We should only continue onboarding if we get a contact
list creation confirmation from the server.

It can also happen if they are using an older key and it can't find a
contact list

@alltheseas alltheseas changed the title from "Troubleshoot limbo onboarding state (relays not displayed on profile)" to "only continue onboarding if we get a contact list creation confirmation from the server. Troubleshoot limbo onboarding state (relays not displayed on profile)" Mar 17, 2024
@alltheseas
Collaborator Author

alltheseas commented Mar 17, 2024

@danieldaquino @jb55 @ericholguin @kernelkind

Can one of yall grab this for the current sprint? The unhappy path here is a shit experience for new damus users. We have seen an uptick in new folks, and I don't want to keep scaring them off due to bread and butter stuff not working.

Thank yalls 🙏

@jb55
Collaborator

jb55 commented Mar 17, 2024 via email

@jb55 jb55 changed the title from "only continue onboarding if we get a contact list creation confirmation from the server. Troubleshoot limbo onboarding state (relays not displayed on profile)" to "Confirm contact list creation during account creation" Mar 17, 2024
@jb55 jb55 self-assigned this Mar 17, 2024
@alltheseas alltheseas added this to the 1.8 post Madeira milestone Mar 17, 2024
@alltheseas
Collaborator Author

fyi @ericholguin

@alltheseas
Collaborator Author

> I doubt that he had a bad connection. He’s in a regular city afaik. I wonder how many people that has happened to

Feedback from the bug reporter when I suggested trying key creation on stable wifi

@jb55
Collaborator

jb55 commented Apr 9, 2024

> I doubt that he had a bad connection. He’s in a regular city afaik. I wonder how many people that has happened to
>
> Feedback from the bug reporter when I suggested trying key creation on stable wifi

fwiw I've never been able to reproduce this. anyone else?

@alltheseas
Collaborator Author

I've created dozens of test accounts. Has not happened to me

@jb55
Collaborator

jb55 commented Apr 9, 2024

this is why I suspect it must have happened when the connection to the relay was down. I can't think of anything else. "stable wifi" or not.

@alltheseas
Collaborator Author

> this is why I suspect it must have happened when the connection to the relay was down. I can't think of anything else. "stable wifi" or not.

When you say relay, do you mean specifically damus relay?

@jb55
Collaborator

jb55 commented Apr 9, 2024 via email

@alltheseas
Collaborator Author

alltheseas commented Apr 11, 2024

Feedback from a new person onboarding was that the limbo profile state (no following list, no relay list) was caused by switching to the Bitwarden app during onboarding. The user switched to Bitwarden to save the key, and then went back to Damus.

When user stays in damus throughout onboarding flow it works.

@alltheseas
Collaborator Author

@danieldaquino this is the onboarding limbo ticket we discussed in today's standup

@alltheseas
Collaborator Author

As discussed moved to 1.8 milestone

cc @jb55

@danieldaquino
Contributor

> In the pictured profile I shared there is a followers list. Relay list is missing.

Oh I see, thanks for clarifying! I can confirm that the profile created with my hacky repro is also missing the relay list

@danieldaquino
Contributor

I am investigating how to fix this without having a 100% accurate repro. Since my hacky repro worked, I am assuming that one of two things is happening:

  • Something goes wrong during the creation of the Nostr event itself, causing it to never be sent, or
  • The program fails to send the event, or the relay fails to receive it.

I also verified that if — for whatever reason — the relays lose or delete the contact event from the user (to the point that the user cannot retrieve their contact event), the user will get stuck in this state.

With that in mind, I am organizing this fix into two parts, in order:

  1. Improving robustness around the contact management functionality to make sure that we can get users out of this state if they ever fall in it.
  2. Improving robustness around onboarding to prevent this from happening in the first place

@jb55
Collaborator

jb55 commented Apr 17, 2024

> Improving robustness around the contact management functionality to make sure that we can get users out of this state if they ever fall in it.

keep in mind this is very dangerous, which is why I never did it. If connectivity is ever lost and damus thinks it can't find the contact list then overrides it, it can wipe an entire contact list which is realllly bad.

There are cases where people login with an old key from other apps that never had a contact list to begin with, so I agree it's still an important thing to fix.

If we save and retrieve contact lists from nostrdb:

Then this at least eliminates the problem on our end during account creation if the issue is in fact relay connectivity.

I will look into splitting off the nostrdb-update branch so that we can do this quicker instead of having to wait for the full

@danieldaquino
Contributor

> keep in mind this is very dangerous, which is why I never did it. If connectivity is ever lost and damus thinks it can't find the contact list then overrides it, it can wipe an entire contact list which is realllly bad.

Thanks for letting me know about this sharp edge, I will put some extra care into preventing that.

> If we save and retrieve contact lists from nostrdb:
>
> #1734
> Then this at least eliminates the problem on our end during account creation if the issue is in fact relay connectivity.
>
> I will look into splitting off the nostrdb-update branch so that we can do this quicker instead of having to wait for the full
>
> #2121

Thanks! I will wait for this, and in the meantime I will work on my item 2:

  2. Improving robustness around onboarding to prevent this from happening in the first place

@danieldaquino
Contributor

> I will look into splitting off the nostrdb-update branch so that we can do this quicker instead of having to wait for the full
> #2121

> Thanks! I will wait for this, and in the meantime I will work on my item 2:

Actually, @jb55, do we have the mechanism to subscribe to NostrDB merged into master? If so, I can probably get item 1 (Saving contact lists locally) done too

@danieldaquino
Contributor

>  2. Improving robustness around onboarding to prevent this from happening in the first place

Regarding this item, I am more and more convinced that simply hooking up to NostrDB and making sure this contact list event is saved locally will solve most or all the problems here.

Most of the solutions I have in mind for this item will likely become obsolete once we have the ability to save and read this contact list directly to/from NostrDB.


On the repro side, I have good news! I was able to more reliably (3/3 tries) repro this issue (or at least something very similar) without resorting to hacky changes of code.

More stable, less hacky repro

Device: iPhone 15 simulator
iOS: 17.4
Setup:

  • Network link conditioner setup with the following profile:
    • Download and upload bandwidth: 50 kbps (awfully slow)
    • Packets dropped: 75% (Awful)
    • Connection delay: 1000ms
    • DNS delay: 400ms

Steps:

  1. Delete app and reinstall to start from scratch
  2. Turn the Network link conditioner ON
  3. Go through onboarding
  4. When reaching the home screen, quit the app (to prevent sending that contact list, if it's sitting in any queue somewhere)
  5. Turn the Network link conditioner OFF.
  6. Open the app again.
  7. Check if the symptoms are present (unable to get a follow to stick, absence of relay list on profile)

Results: Symptoms are present 3 out of 3 tries

@danieldaquino
Contributor

@jb55 I believe my fix is still incomplete. When I go through the test (similar to the repro in the previous comment) I am now able to actually follow users and get that list persistently saved across app restarts, but I still see some weirdness like posts from those follows not showing up on the home feed. Currently investigating it

@danieldaquino
Contributor

(I forgot to post it here, but I sent a draft for this fix on Friday: https://groups.google.com/a/damus.io/g/patches/c/B0H8UK5HQNE)

@danieldaquino
Contributor

> @jb55 I believe my fix is still incomplete. When I go through the test (similar to the repro in the previous comment) I am now able to actually follow users and get that list persistently saved across app restarts, but I still see some weirdness like posts from those follows not showing up on the home feed. Currently investigating it

I fixed it! Re-testing with new version

@jb55
Collaborator

jb55 commented Apr 22, 2024 via email

@danieldaquino
Contributor

Sent the patch!

Details in https://groups.google.com/a/damus.io/g/patches/c/a8kI0CO2yOc

@jb55, @alltheseas, if you don't get a chance to see the contact list First Aid in action, here is what the flow looks like:

[Four simulator screenshots of the contact list First Aid flow, iPhone 15 Plus, 2024-04-22]

The right-most picture is the one people without this issue will see. The first 3 pictures are what people who have the issue will see and go through (if they choose to reset).

@alltheseas
Collaborator Author

Is the first aid always a manual action? Does it help the current folks who did not have successfully formed relay lists, follow lists?

During onboarding did the changes you made prevent limbo state, or reduce chance of limbo state happening in the first place?

@danieldaquino
Contributor

> Is the first aid always a manual action?

Yes, I made it a manual action for two reasons:

  1. Due to the decentralized nature of Nostr, there is no good way to tell if a user really does not have any contact list, or if a contact list is just temporarily unavailable due to network issues or relay outages.
  2. On top of the above, if we try to automatically perform a reset based on some rules, and we get a false negative (i.e. the app thinks they don't have a contact list when they actually do), the consequence will be the loss of their entire follow list.

All in all, making an automated decision on this could lead to some people inadvertently losing their entire contact list, which would be really, really bad, so I chose to leave it a manual action to prevent this.
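
To illustrate the reasoning above, here is a rough sketch (Python for illustration, with hypothetical names, not the code in the patch) that keeps "no relay had a list" separate from "could not find out", and requires explicit user consent before any reset. Note that even the "confirmed missing" case only covers the relays that were asked, which is why it still does not trigger anything automatically:

```python
from enum import Enum


class Lookup(Enum):
    FOUND = "found"
    CONFIRMED_MISSING = "confirmed_missing"  # every queried relay answered, none had a list
    UNKNOWN = "unknown"  # at least one relay was unreachable or timed out


def classify(relay_results):
    """relay_results holds one entry per relay: True (has a list),
    False (answered, no list) or None (no answer)."""
    if any(r is True for r in relay_results):
        return Lookup.FOUND
    if relay_results and all(r is False for r in relay_results):
        return Lookup.CONFIRMED_MISSING
    return Lookup.UNKNOWN


def may_reset_contact_list(lookup, user_confirmed):
    """A reset is never automatic: it always needs the user's explicit confirmation."""
    return lookup is not Lookup.FOUND and user_confirmed
```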

> Does it help the current folks who did not have successfully formed relay lists, follow lists?

Yes, this First Aid option was made specifically for those folks!

> During onboarding did the changes you made prevent limbo state, or reduce chance of limbo state happening in the first place?

Yes, I also made changes to increase robustness and prevent this issue in the first place. To summarize those changes:

  1. The initial contact list is now saved locally to NostrDB during onboarding even before the app connects to any relay (to make sure we have one even if the network connections fail to send it to the network)
  2. The latest known contact list is now also loaded directly from NostrDB during app startup, to make sure it gets loaded even if there are network or relay issues.
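
The two changes above amount to a local-first pattern. A simplified sketch, in Python for illustration: `LocalStore` is a hypothetical in-memory stand-in for NostrDB (not its real API), and the events are unsigned dictionaries rather than real Nostr events:

```python
class LocalStore:
    """In-memory stand-in for a local event database (hypothetical API)."""

    def __init__(self):
        self._latest = {}  # pubkey -> newest contact list event seen so far

    def save(self, event):
        current = self._latest.get(event["pubkey"])
        if current is None or event["created_at"] > current["created_at"]:
            self._latest[event["pubkey"]] = event

    def latest_contact_list(self, pubkey):
        return self._latest.get(pubkey)


def onboard(store, pubkey, follows, now, send_to_relays):
    """Create the initial contact list and persist it before any network I/O."""
    event = {
        "kind": 3,  # contact list event kind
        "pubkey": pubkey,
        "created_at": now,
        "tags": [["p", followed] for followed in follows],
    }
    store.save(event)
    try:
        send_to_relays(event)
    except ConnectionError:
        pass  # the local copy survives; sending can be retried later
    return event


def load_on_startup(store, pubkey):
    """Read the latest known contact list locally, without waiting on relays."""
    return store.latest_contact_list(pubkey)
```

Even if `send_to_relays` fails for every relay, `load_on_startup` still returns the list created during onboarding.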

@alltheseas
Collaborator Author

Thanks Daniel for the comprehensive and thoughtful approach.

@jb55
Collaborator

jb55 commented Apr 23, 2024 via email

@danieldaquino
Contributor

@jb55 we forgot to push this to v1.8_relay_fix_and_video_player.

Can I go ahead and cherry-pick the commits from master to that branch?

@danieldaquino danieldaquino reopened this Apr 30, 2024
@jb55
Collaborator

jb55 commented Apr 30, 2024 via email

@danieldaquino
Contributor

> On Mon, Apr 29, 2024 at 05:17:59PM -0700, Daniel D’Aquino wrote:
>> @jb55 we forgot to push this to v1.8_relay_fix_and_video_player. Can I go ahead and cherry-pick the commits from master to that branch?
>
> sure!

Sounds good, I will do that now and then close this ticket

> I'm not sure what that branch is.

That branch is the one for the 1.8 AppStore release

danieldaquino added a commit to danieldaquino/damus that referenced this issue May 1, 2024
This commit adds a mechanism to add the contact list to storage as soon
as it is generated, and thus it reduces the risk of poor network
conditions causing issues.

Changelog-Fixed: Improve reliability of contact list creation during onboarding
Closes: damus-io#2057
Signed-off-by: Daniel D’Aquino <daniel@daquino.me>
Reviewed-by: William Casarin <jb55@jb55.com>
Link: 20240422230912.65056-3-daniel@daquino.me
Signed-off-by: William Casarin <jb55@jb55.com>
@danieldaquino
Contributor

Cherry-picked the commits associated with this ticket:

to the 1.8 release branch at https://github.com/damus-io/damus/commits/v1.8_relay_fix_and_video_player/

Closing this ticket!

danieldaquino added a commit to danieldaquino/damus that referenced this issue May 3, 2024
This commit fixes an issue where the Damus relay (Or other bootstrap relays) would be added to the user's relay list even though they explicitly removed it.

The root cause of the issue lies in the way we load bootstrap relays. The default bootstrap relays would be initially loaded even though the user already has a bootstrap list stored, just in case all the relays on the user list fail. This would cause the app to inadvertently connect to relays that the user did not select whenever there is a connectivity issue with all their listed relays.

The fix is to simply not add the default bootstrap list when the user already has a list stored. We do not need to use bootstrap relays in order to get our relay list, because that list is already stored in both UserDefaults as well as on NostrDB through the user's contact list event. (A contact list which is also locally loaded on startup since the fix related to damus-io#2057)
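
As a rough illustration of the rule described above (a Python sketch with a hypothetical function name and placeholder relay URLs, not the actual change in this commit):

```python
def relays_to_connect(stored_relays, bootstrap_relays):
    """Fall back to bootstrap relays only when the user has no stored relay list."""
    if stored_relays:
        # Respect the user's choice, even if none of these relays is reachable.
        return list(stored_relays)
    return list(bootstrap_relays)
```

For example, `relays_to_connect(["ws://localhost:7777"], ["wss://bootstrap.example"])` returns only the local relay, while an empty stored list returns the bootstrap list.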

Issue reproduction + Testing
----------------------------

Procedure:
1. Disconnect from all relays, and disconnect from the Damus relay last.
2. Connect to a local relay (that you control). Connection should be successful.
3. Quit the app completely.
4. Stop the local relay.
5. Restart the app.
6. Go to the relay list view.
7. Check the relay list. It should list the one local relay selected by the user

Issue reproduction:
- Device: iPhone 15 simulator
- iOS: 17.4
- Damus: 1.8 (`97169f4fa276723bfab28ca304953ec206c904d2`)
- Result: ISSUE REPRODUCED
- Details: On step 7, the relay list only lists the Damus relay

Fix test:
- Device: iPhone 15 simulator
- iOS: 17.4
- Damus: This commit
- Result: PASS
- Details: On step 7, the local relay is listed even though connection is unsuccessful. No notes are loaded since no relays were able to connect successfully

Quick regression check
----------------------

PASS

Device: iPhone 15 simulator
iOS: 17.4
Damus: This commit
Steps:
1. Reinstall app from scratch
2. Create a new account, go through onboarding
3. Make sure that new account connects to bootstrap relays. PASS
4. Sign out
5. Sign in with previously existing account (The one from the previous test) (Notice no UserDefaults exists for this user at that point)
6. Make sure relay list is loaded to the latest relay list known to the bootstrap relays (i.e. connects only to the Damus relay) (It cannot recover the latest relay list pointing only to the local relay, since the bootstrap relays have no knowledge about that relay or the contact lists stored there.). PASS

Note: The behavior on step 6 is not a bug, it is an expected limitation. In fact, this behavior is privacy protecting, as the user may not want those public relays to know about their connection preference for the local relay (and its address)

Other information
------------------

Q: How is this test using local relays related or equivalent to Tor relay list described in damus-io#2186?
A: Those Tor relays need dedicated software (such as Orbot VPN) to be running successfully in order for Damus to make a successful connection to them. If at any moment that VPN stops working, it would trigger the same situation as described in the test above, where all relay connections fail at once.

Q: In damus-io#2186, the user reports that the Damus relay is added, but does not describe the Damus relay replacing existing relays. What is the difference?
A: I believe the difference is in the order in which relays are added or removed. We have to remember that the relay we just disconnected from will likely still have version N-1 of our contact list event, where it still includes itself on that list.
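
To make the "version N-1" point concrete: contact lists are replaceable events, so when several copies are available (for example one from a relay and one stored locally), the newest `created_at` should win. A minimal sketch in Python, assuming events are plain dictionaries and using a hypothetical helper name; the tie-break on lowest id follows the NIP-01 rule for replaceable events:

```python
def newest_contact_list(events):
    """Pick the newest kind-3 event; ties on created_at go to the lowest id."""
    candidates = [e for e in events if e["kind"] == 3]
    if not candidates:
        return None
    return min(candidates, key=lambda e: (-e["created_at"], e["id"]))
```

A relay that only holds version N-1 cannot override a newer local copy under this rule, but if the stale copy is the only one available, it is still the one that gets used.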

Changelog-Fixed: Fix issue where bootstrap relays would inadvertently be added to the user's list on connectivity issues
Closes: damus-io#2186
Signed-off-by: Daniel D’Aquino <daniel@daquino.me>
Reviewed-by: William Casarin <jb55@jb55.com>
Projects
Status: Done
Development

No branches or pull requests

3 participants