Skip to content

Conversation

@FranAguilera
Copy link
Contributor

@FranAguilera FranAguilera commented Jun 17, 2025

What

Resolves BIT-4913

This follow similar approach from @snowp in #125 and moves the time out logic as part of bd_completion

To avoid potentially blocking permanently on no network situations upon logger blocking calls adding timeout to those calls

See related doc

Verification

Steps

  1. Run Android gradle test app
  2. Set app in airplane mode
  3. Trigger artificial crash
  4. App will continue crashing after the defined thresholds (4seconds) to avoid ANRs on android. Before will be deadlocked an leading to ANR dialog

Temporary logs with airplane mode

See demo here https://drive.google.com/file/d/12ui3gyApvHdl37YGi4V6-36K-ZL41MQ0/view?usp=drive_link

06-17 15:46:10.449  9324  9324 D bd_logger::async_log_buffer: ANDROID_ANR: "enqueue_log entry"
06-17 15:46:10.449  9324  9324 D bd_logger::async_log_buffer: ANDROID_ANR: "about to call tokio::sync::oneshot::channel::<()>()"
06-17 15:46:11.454  9324  9324 D bd_logger::async_log_buffer: ANDROID_ANR: "enqueue_log: error waiting for completion: Timeout"
06-17 15:46:11.455  9324  9324 D bd_logger::async_log_buffer: ANDROID_ANR: "enqueue_log: OK report success 333"
06-17 15:46:11.455  9324  9324 D bd_logger::async_log_buffer: ANDROID_ANR: "flush_state: entering flush blocking portion"
06-17 15:46:11.455  9324  9324 D bd_logger::async_log_buffer: ANDROID_ANR: "flush_state: about to call blocking_recv_with_timeout. 386"
06-17 15:46:12.456  9324  9324 D bd_logger::async_log_buffer: ANDROID_ANR: "flush_state: error waiting for completion: Timeout"
06-17 15:46:12.456  9324  9324 D bd_logger::async_log_buffer: ANDROID_ANR: "flush_state: Success flush_state. 398"

Temporary logs with valid connection

See demo here https://drive.google.com/file/d/13ViBmsQDUjLb59iIqZjb3thWFMDuzi1O/view?usp=drive_link

06-17 15:48:04.940  9603  9603 D bd_logger::async_log_buffer: ANDROID_ANR: "enqueue_log entry"
06-17 15:48:04.940  9603  9603 D bd_logger::async_log_buffer: ANDROID_ANR: "about to call tokio::sync::oneshot::channel::<()>()"
06-17 15:48:04.951  9603  9603 D bd_logger::async_log_buffer: ANDROID_ANR: "enqueue_log: OK within blocking_wait_with_timeout 305"
06-17 15:48:04.951  9603  9603 D bd_logger::async_log_buffer: ANDROID_ANR: "enqueue_log: OK report success 333"
06-17 15:48:04.951  9603  9603 D bd_logger::async_log_buffer: ANDROID_ANR: "flush_state: entering flush blocking portion"
06-17 15:48:04.951  9603  9603 D bd_logger::async_log_buffer: ANDROID_ANR: "flush_state: about to call blocking_recv_with_timeout. 386"
06-17 15:48:04.961  9603  9603 D bd_logger::async_log_buffer: ANDROID_ANR: "flush_state: completion received. 389"
06-17 15:48:04.961  9603  9603 D bd_logger::async_log_buffer: ANDROID_ANR: "flush_state: Success flush_state. 398"

@FranAguilera FranAguilera force-pushed the franjam/prevent-deadlock-no-network branch from a08eb68 to 080231b Compare June 17, 2025 14:45
@FranAguilera FranAguilera force-pushed the franjam/prevent-deadlock-no-network branch from 080231b to 3b5b149 Compare June 17, 2025 14:48
// on the log being pushed through the whole log processing pipeline.
let (tx, rx) = tokio::sync::oneshot::channel::<()>();
(Some(tx), Some(rx))
let bd_rx = bd_completion::Receiver::to_bd_completion_rx(rx);
Copy link
Contributor Author

@FranAguilera FranAguilera Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit nasty but couldn't find a better way to handle proper receiver type without further refactors given LogLine is pub log_processing_completed_tx: Option<oneshot::Sender<()>>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just change the type to use bd_completion in LogLine as well, seems like an appropriate usage of this type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, will give it a try

Copy link
Contributor Author

@FranAguilera FranAguilera Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I gave this a try and requires major changes, functions like write_log, replay_log, process_log, etc will required to be updated as well. It seems a bit risky to do in this one. I can create a follow up task to consolidate all this tokio::sync::oneshot with bd_completion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added ticket here BIT-5624

// on the log being pushed through the whole log processing pipeline.
let (tx, rx) = tokio::sync::oneshot::channel::<()>();
(Some(tx), Some(rx))
let bd_rx = bd_completion::Receiver::to_bd_completion_rx(rx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just change the type to use bd_completion in LogLine as well, seems like an appropriate usage of this type

@FranAguilera FranAguilera merged commit 6a412c4 into main Jun 17, 2025
5 of 6 checks passed
@FranAguilera FranAguilera deleted the franjam/prevent-deadlock-no-network branch June 17, 2025 16:04
@github-actions github-actions bot locked and limited conversation to collaborators Jun 17, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants