Skip to content

Conversation

@r10s
Copy link
Member

@r10s r10s commented May 3, 2025

the fix could have been avoid by checking for null or not being overly specific with the exception.

as this is a very sensible area,
where any failure is dramatic,
we do both.

as said, this error pops up on monikas device,
unclear if related to an app update (she is on 1.58.2) or by some message.

did not test yet, still trying to get this patch on the phone EDIT: tested, it fixes the issue at hand

the fix could have been avoid by checking for null
or not being overly specific with the exception.

as this is a very sensible area,
where any failure is dramatic,
we do both.
@r10s r10s requested review from Hocuri and adbenitez May 3, 2025 18:42
@r10s r10s added the bug label May 3, 2025
@github-actions
Copy link

github-actions bot commented May 3, 2025

To test the changes in this pull request, install this apk:
📦 app-preview.apk

Copy link
Contributor

@hpk42 hpk42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

@Hocuri
Copy link
Collaborator

Hocuri commented May 3, 2025

While developing chatmail/core#6818, I had a bug with the same symptom: DC crashed on startup with exactly this error message. The problem turned out to be:

  • Android tries to get an account manager (via CFFI), which in turn tries to open the context. Opening the context fails because the migration fails (because of a bug, which I fixed since).
  • The error message can't be printed to the logs, since there is no event loop yet; this is why there is no helpful error message in the logs.
  • get_dc_accounts() returns null
  • dc_jsonrpc_init() (i.e. getJsonrpcInstanceCPtr()) returns null
  • get_dc_jsonrpc_instance() returns null
  • dc_jsonrpc_next_response() (Rust) / getNextResponse() (Java) returns an empty string

If you can reproduce the problem and can install a re-compiled DC Android, I recommend replacing all calls to eprintln!(() in deltachat-cffi/lib.rs with panic!(). This way, the root error cause will show in the logs. You may be able to copy out DeltaChat's folder via adb.

About this PR: If the problem you had is similar to the problem I had, then this PR would not have made anything better. Instead of crashing on startup, DC would just hang forever on startup. I'm afraid there is not a lot the UI can do when get_dc_accounts() errors. This doesn't mean I'm against this PR, it just means it seems unlikely to me that it would have helped.

What core could do is, make get_dc_accounts() return successfully if one of the profiles failed to open, so that the other profiles can be used. Not sure how much effort it would be to make this work nicely.

@Hocuri
Copy link
Collaborator

Hocuri commented May 3, 2025

I need to think a bit about general ways to improve this, maybe I'll come up with something.

@r10s
Copy link
Member Author

r10s commented May 3, 2025

If the problem you had is similar to the problem I had, then this PR would not have made anything better.

then i assume, you have another issue. the PR actually solves the issue at hand, things are working again as expected.

here was also nothing super-fancy on that account, so i assume, it can happen also in other cases

@Hocuri
Copy link
Collaborator

Hocuri commented May 4, 2025

Ok, that's great. As commented on the code, we need to log a warning / error when we ignore a null response, and should also log what jsonResponse for easier debugging (usually, it's probably an empty string, but maybe sometimes it's something else that fails to parse as JSON).

Still, there is some root underlying problem which led to response being null in the first place; would be nice to find out what it is in order not to get some subtle bug there.

@adbenitez
Copy link
Member

If the problem you had is similar to the problem I had, then this PR would not have made anything better.

then i assume, you have another issue. the PR actually solves the issue at hand, things are working again as expected.

here was also nothing super-fancy on that account, so i assume, it can happen also in other cases

this is a bit bad, this didn't "solved the problem", in the end we don't even know now what the problem was, instead of just ignoring it and losing the opportunity to debug it and fix the actual problem it would be better to know why core returned null in this case, this shouldn't happen to start with if I got it right, and as @Hocuri said in other similar bugs where core would return null we then would just ignore the problem and potentially it can be a problem you can't just ignore :(

@r10s
Copy link
Member Author

r10s commented May 5, 2025

"better crash, the user will report and we will fix" is nice in theory, but unfortunately also comes with "we do not care if the user's device is unusable for weeks" ;)

i could not make more sense out of the error, though these were mostly perfect debugging conditions, which are usually not the case

i agree that in many flows it is better to crash, eg. if there is a concrete user action the user can not do as a workaround. but for such basic things, it just makes everything unusable. there may be tons of reasons for a bad value, including some temporary memory flaw ...

we had that discussion often, i do not think, it makes much sense to warm that up again.

additional logging as @Hocuri suggested makes sense, though

@adbenitez
Copy link
Member

(Meant it in this particular case that you had access to the device)

@Hocuri
Copy link
Collaborator

Hocuri commented May 5, 2025

@r10s in any case, could you try to find out the root cause of this particular problem? You can try replacing all calls to eprintln!(() in deltachat-cffi/lib.rs with panic!(). This way, the root error cause should show in the logs.

@r10s
Copy link
Member Author

r10s commented May 5, 2025

yeah, but i could not make more sense of that even then. what could we expect from a random user?

Copy link
Member

@adbenitez adbenitez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides not silently ignoring null without logging, LGTM

@r10s
Copy link
Member Author

r10s commented May 5, 2025

@adbenitez @Hocuri i added a commit to log the error case. if there are more things, please add to this PR, otherwise please merge; i am not really on android atm

@adbenitez adbenitez enabled auto-merge May 5, 2025 11:56
@adbenitez adbenitez merged commit ce89535 into main May 5, 2025
3 of 4 checks passed
@adbenitez adbenitez deleted the r10s/fix-jsonrpc-crash branch May 5, 2025 14:31
@r10s r10s mentioned this pull request May 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants