CI: Timeouts on Windows Unit Tests #16253

Closed
3 of 5 tasks
david-allison opened this issue Apr 23, 2024 · 16 comments · Fixed by #16289
Labels
Dev (Development, testing & CI), Priority-High
Milestone

Comments

@david-allison
Member

david-allison commented Apr 23, 2024

old

First failure seems to be ~3 days ago

https://github.com/ankidroid/Anki-Android/actions/runs/8767647456/job/24061139040

But we had a similar issue with dependency-updates


My suspicion is that the cause is:

But this needs triage


Things to do to close this:

  • Add a "flake hunter" mode to CI, triggered by manual dispatch and/or a schedule, that runs a large number of iterations on all operating systems or just one, to give statistical evidence for flake discovery and flake fixing (Ci flake finder for unit tests #16279)
  • BackendCollectionAlreadyOpenException needs to be caught and re-thrown as an Error when running in the Robolectric execution context, so that this becomes fail-fast. The thinking is that this hangs because it is an Exception instead of an Error. The commit from "fix: Use flags from col, dynamically loading them into the menu" (#16218) can be used provisionally, in combination with a matrix of 10+ iterations running on Windows, to trigger the condition. Update: this didn't work
  • Something about Robolectric + Windows + multiple collection opens seems to cause non-serialized collection access, leading to this hang? That looks like it will be the root cause?
david-allison added the Priority-High, Help Wanted (Requesting Pull Requests from volunteers), and Dev (Development, testing & CI) labels Apr 23, 2024
@mikehardy
Member

@david-allison - are the logs still uploading? I don't see Windows log artifacts, and those would be very useful for troubleshooting this

@david-allison
Member Author

I suspect the failure reason isn't 'failed', so logs are skipped

Perhaps the log upload step needs: if: ${{ cancelled() || failure() }}

david-allison removed the Help Wanted (Requesting Pull Requests from volunteers) label Apr 23, 2024
@david-allison
Member Author

I see this resolution as being 4-pronged:

  • Fix the CI issue
  • Fix so the backend requires a lock on creation
  • A thrown error crashes rather than deadlocks
  • A timeout crashes after a reasonable period of time
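
The last two prongs can be sketched in plain Kotlin. This is only a sketch under assumptions, not AnkiDroid code: runOrFailFast is a hypothetical helper, and it assumes the work under test can be moved to a worker thread. It turns both a hang and a thrown exception into an AssertionError on the calling thread.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.ExecutionException
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeoutException

/**
 * Hypothetical helper (not part of AnkiDroid): runs [block] on a daemon worker
 * thread and fails fast if it throws or does not finish within [timeoutMs].
 */
fun <T> runOrFailFast(timeoutMs: Long, block: () -> T): T {
    val executor = Executors.newSingleThreadExecutor { r ->
        // Daemon thread, so a stuck block cannot keep the JVM alive
        Thread(r, "fail-fast-worker").apply { isDaemon = true }
    }
    try {
        val future = executor.submit(Callable { block() })
        return try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS)
        } catch (e: TimeoutException) {
            // Prong 4: a timeout becomes a crash instead of an endless wait
            future.cancel(true)
            throw AssertionError("timed out after $timeoutMs ms", e)
        } catch (e: ExecutionException) {
            // Prong 3: anything thrown inside the block is surfaced as an Error
            throw AssertionError("block threw", e.cause ?: e)
        }
    } finally {
        executor.shutdownNow()
    }
}

fun main() {
    println(runOrFailFast(1_000) { 21 * 2 })
    val failure = runCatching { runOrFailFast(100) { Thread.sleep(10_000) } }
    println(failure.exceptionOrNull()?.message)
}
```

In a JUnit 4 suite the same effect is usually approached with the Timeout rule or @Test(timeout = ...); whether either behaves well with Robolectric's looper in this case is untested here.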

@david-allison
Member Author

david-allison commented Apr 25, 2024

My read (from intuition) is that withQueue is failing to uphold invariants

This may just require runTest to fix

lukstbit added a commit to criticalAY/Anki-Android that referenced this issue Apr 25, 2024
@mikehardy
Member

> A thrown error crashes rather than deadlocks

This one here is what I would prioritize first, fail-fast and (IMNSHO) fail-correctly vs fail-ambiguously

lukstbit added a commit to criticalAY/Anki-Android that referenced this issue Apr 25, 2024
mikehardy pushed a commit to criticalAY/Anki-Android that referenced this issue Apr 25, 2024
mikehardy pushed a commit to criticalAY/Anki-Android that referenced this issue Apr 25, 2024
@david-allison
Member Author

My read (from intuition) is that withQueue is failing.

This may just require runTest to fix


To get an exception from the .bin file:

  • Run strings on the .bin file
  • Filter to lines containing --
  • Trim the crud before -- on each line
  • Trim the started/stopped prefixes of the lines so you just get the test name
  • Run uniq and you get the test which started but didn't fail
  • grep for that test name
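
The steps above could be approximated in Kotlin roughly as follows. This is a sketch under assumptions: the exact record format inside the .bin file is not shown in this thread, so the "--" marker handling and the "started"/"stopped" prefixes are guesses taken from the wording of the list, and "run uniq" is read as "keep names that occur exactly once". Both function names are hypothetical.

```kotlin
import java.io.File

/** Rough stand-in for `strings`: runs of at least [minLength] printable ASCII bytes. */
fun printableRuns(bytes: ByteArray, minLength: Int = 4): List<String> {
    val runs = mutableListOf<String>()
    val current = StringBuilder()
    for (b in bytes) {
        val c = b.toInt() and 0xFF
        if (c in 0x20..0x7E) {
            current.append(Char(c))
        } else {
            if (current.length >= minLength) runs.add(current.toString())
            current.setLength(0)
        }
    }
    if (current.length >= minLength) runs.add(current.toString())
    return runs
}

/**
 * Names left over exactly once after stripping everything up to [marker] and
 * the given prefixes, i.e. a start record without a matching stop record.
 * The default marker and prefixes are assumptions about the log format.
 */
fun unfinishedTests(
    lines: List<String>,
    marker: String = "--",
    prefixes: List<String> = listOf("started", "stopped") // hypothetical values
): List<String> =
    lines.filter { marker in it }
        .map { it.substringAfter(marker).trim() }
        .map { line -> prefixes.fold(line) { acc, p -> acc.removePrefix(p) }.trim() }
        .groupingBy { it }
        .eachCount()
        .filterValues { it == 1 }
        .keys
        .toList()

fun main(args: Array<String>) {
    val path = args.firstOrNull() ?: return
    val runs = printableRuns(File(path).readBytes())
    for (name in unfinishedTests(runs)) {
        println(name)
        // The final `grep` step: every extracted string mentioning that test
        runs.filter { name in it }.forEach { println("  $it") }
    }
}
```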

@david-allison
Member Author

net.ankiweb.rsdroid.exceptions.BackendInvalidInputException$BackendCollectionAlreadyOpenException: CollectionAlreadyOpen
at net.ankiweb.rsdroid.exceptions.BackendInvalidInputException$Companion.fromInvalidInputError(BackendInvalidInputException.kt:34)
at net.ankiweb.rsdroid.BackendException$Companion.fromError(BackendException.kt:114)
at net.ankiweb.rsdroid.BackendKt.unpackResult(Backend.kt:271)
at net.ankiweb.rsdroid.BackendKt.access$unpackResult(Backend.kt:1)
at net.ankiweb.rsdroid.Backend$runMethodRaw$1.invoke(Backend.kt:118)
at net.ankiweb.rsdroid.Backend$runMethodRaw$1.invoke(Backend.kt:117)
at net.ankiweb.rsdroid.Backend.withBackend(Backend.kt:131)
at net.ankiweb.rsdroid.Backend.runMethodRaw(Backend.kt:117)
at anki.backend.GeneratedBackend.openCollectionRaw(GeneratedBackend.kt:102)
at anki.backend.GeneratedBackend.openCollection(GeneratedBackend.kt:109)
at net.ankiweb.rsdroid.Backend.openCollection(Backend.kt:98)
at net.ankiweb.rsdroid.Backend.openCollection(Backend.kt:57)
at com.ichi2.libanki.Storage.openDB$AnkiDroid_playDebug(Storage.kt:52)
at com.ichi2.libanki.Collection.reopen(Collection.kt:206)
at com.ichi2.libanki.Collection.reopen$default(Collection.kt:203)
at com.ichi2.libanki.Collection.<init>(Collection.kt:138)
at com.ichi2.libanki.Storage.collection(Storage.kt:40)
at com.ichi2.anki.CollectionManager.ensureOpenInner(CollectionManager.kt:230)
at com.ichi2.anki.CollectionManager.access$ensureOpenInner(CollectionManager.kt:39)
at com.ichi2.anki.CollectionManager$withCol$2.invoke(CollectionManager.kt:101)
at com.ichi2.anki.CollectionManager$withCol$2.invoke(CollectionManager.kt:100)
at com.ichi2.anki.CollectionManager$withQueue$2.invokeSuspend(CollectionManager.kt:86)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
at kotlinx.coroutines.EventLoop.processUnconfinedEvent(EventLoop.common.kt:65)
at kotlinx.coroutines.internal.DispatchedContinuationKt.resumeCancellableWith(DispatchedContinuation.kt:371)
at kotlinx.coroutines.intrinsics.CancellableKt.startCoroutineCancellable(Cancellable.kt:26)
at kotlinx.coroutines.intrinsics.CancellableKt.startCoroutineCancellable$default(Cancellable.kt:21)
at kotlinx.coroutines.CoroutineStart.invoke(CoroutineStart.kt:88)
at kotlinx.coroutines.AbstractCoroutine.start(AbstractCoroutine.kt:123)
at kotlinx.coroutines.BuildersKt__Builders_commonKt.launch(Builders.common.kt:52)
at kotlinx.coroutines.BuildersKt.launch(Unknown Source)
at kotlinx.coroutines.BuildersKt__Builders_commonKt.launch$default(Builders.common.kt:43)
at kotlinx.coroutines.BuildersKt.launch$default(Unknown Source)
at com.ichi2.anki.Reviewer.addFlags(Reviewer.kt:660)
at com.ichi2.anki.Reviewer.onCreateOptionsMenu(Reviewer.kt:677)
at android.app.Activity.$$robo$$android_app_Activity$onCreatePanelMenu(Activity.java:4343)
at android.app.Activity.onCreatePanelMenu(Activity.java)
at androidx.activity.ComponentActivity.onCreatePanelMenu(ComponentActivity.java:520)
at androidx.appcompat.view.WindowCallbackWrapper.onCreatePanelMenu(WindowCallbackWrapper.java:94)
at androidx.appcompat.app.AppCompatDelegateImpl$AppCompatWindowCallback.onCreatePanelMenu(AppCompatDelegateImpl.java:3442)
at androidx.appcompat.app.ToolbarActionBar.populateOptionsMenu(ToolbarActionBar.java:458)
at androidx.appcompat.app.ToolbarActionBar$1.run(ToolbarActionBar.java:58)
at android.os.Handler.$$robo$$android_os_Handler$handleCallback(Handler.java:942)
at android.os.Handler.handleCallback(Handler.java)
at android.os.Handler.$$robo$$android_os_Handler$dispatchMessage(Handler.java:99)
at android.os.Handler.dispatchMessage(Handler.java)
at org.robolectric.shadows.ShadowPausedLooper$IdlingRunnable.doRun(ShadowPausedLooper.java:573)
at org.robolectric.shadows.ShadowPausedLooper$ControlRunnable.run(ShadowPausedLooper.java:536)
at org.robolectric.shadows.ShadowPausedLooper.executeOnLooper(ShadowPausedLooper.java:629)
at org.robolectric.shadows.ShadowPausedLooper.idle(ShadowPausedLooper.java:104)
at org.robolectric.shadows.ShadowPausedLooper.idleIfPaused(ShadowPausedLooper.java:177)
at org.robolectric.android.controller.ActivityController.visible(ActivityController.java:232)
at com.ichi2.anki.RobolectricTest$Companion.startActivityNormallyOpenCollectionWithIntent(RobolectricTest.kt:281)
at com.ichi2.anki.RobolectricTest.startActivityNormallyOpenCollectionWithIntent(RobolectricTest.kt)
at com.ichi2.anki.RobolectricTest.startActivityNormallyOpenCollectionWithIntent$AnkiDroid_playDebugUnitTest(RobolectricTest.kt:339)
at com.ichi2.anki.WhiteboardDefaultForegroundColorTest.getForegroundColor(WhiteboardDefaultForegroundColorTest.kt:43)
at com.ichi2.anki.WhiteboardDefaultForegroundColorTest.testDefaultForegroundColor(WhiteboardDefaultForegroundColorTest.kt:38)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
at java.base/java.lang.reflect.Method.invoke(Method.java:580)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.robolectric.RobolectricTestRunner$HelperTestRunner$1.evaluate(RobolectricTestRunner.java:588)
at org.robolectric.internal.SandboxTestRunner$2.lambda$evaluate$2(SandboxTestRunner.java:290)
at org.robolectric.internal.bytecode.Sandbox.lambda$runOnMainThread$0(Sandbox.java:101)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)

@david-allison
Member Author

The following does NOT make tests fail early

Index: AnkiDroid/src/main/java/com/ichi2/libanki/Storage.kt
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/AnkiDroid/src/main/java/com/ichi2/libanki/Storage.kt b/AnkiDroid/src/main/java/com/ichi2/libanki/Storage.kt
--- a/AnkiDroid/src/main/java/com/ichi2/libanki/Storage.kt	(revision 638209ed4989872d528fedd845772b4187d80df2)
+++ b/AnkiDroid/src/main/java/com/ichi2/libanki/Storage.kt	(revision 0daf248ba9a11f454b15ec2dadb783294b84d7c3)
@@ -18,8 +18,10 @@
 import com.ichi2.anki.getDayStart
 import com.ichi2.libanki.utils.Time
 import com.ichi2.libanki.utils.TimeManager.time
+import com.ichi2.utils.isRobolectric
 import net.ankiweb.rsdroid.Backend
 import net.ankiweb.rsdroid.BackendFactory
+import net.ankiweb.rsdroid.exceptions.BackendInvalidInputException.BackendCollectionAlreadyOpenException
 import java.io.File
 
 object Storage {
@@ -49,7 +51,12 @@
         if (afterFullSync) {
             create = false
         } else {
-            backend.openCollection(if (isInMemory) ":memory:" else path)
+            try {
+                backend.openCollection(if (isInMemory) ":memory:" else path)
+            } catch (e: BackendCollectionAlreadyOpenException) {
+                if (!isRobolectric) throw e
+                throw Error("BackendCollectionAlreadyOpenException", e)
+            }
         }
         val db = DB.withRustBackend(backend)

See: https://github.com/david-allison/Anki-Android/actions/runs/8855405150

@david-allison
Member Author

Reproduction of hang (runs on macOS)

Index: AnkiDroid/src/main/java/com/ichi2/anki/Reviewer.kt
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/AnkiDroid/src/main/java/com/ichi2/anki/Reviewer.kt b/AnkiDroid/src/main/java/com/ichi2/anki/Reviewer.kt
--- a/AnkiDroid/src/main/java/com/ichi2/anki/Reviewer.kt	(revision 9036f1611380e2952630527dd26483e9d4772805)
+++ b/AnkiDroid/src/main/java/com/ichi2/anki/Reviewer.kt	(date 1714178146303)
@@ -40,7 +40,9 @@
 import androidx.appcompat.widget.Toolbar
 import androidx.core.app.ActivityCompat
 import androidx.core.content.ContextCompat
+import androidx.lifecycle.lifecycleScope
 import androidx.vectordrawable.graphics.drawable.VectorDrawableCompat
+import anki.backend.BackendError
 import anki.frontend.SetSchedulingStatesRequest
 import com.google.android.material.color.MaterialColors
 import com.google.android.material.snackbar.Snackbar
@@ -87,6 +89,8 @@
 import com.ichi2.utils.Permissions.canRecordAudio
 import com.ichi2.utils.ViewGroupUtils.setRenderWorkaround
 import com.ichi2.widget.WidgetStatus.updateInBackground
+import kotlinx.coroutines.launch
+import net.ankiweb.rsdroid.exceptions.BackendInvalidInputException
 import timber.log.Timber
 import java.io.File
 
@@ -682,6 +686,9 @@
     @NeedsTest("Order of operations needs Testing around Menu (Overflow) Icons and their colors.")
     override fun onCreateOptionsMenu(menu: Menu): Boolean {
         Timber.d("onCreateOptionsMenu()")
+        lifecycleScope.launch {
+            throw BackendInvalidInputException.BackendCollectionAlreadyOpenException(BackendError.getDefaultInstance())
+        }
         // NOTE: This is called every time a new question is shown via invalidate options menu
         menuInflater.inflate(R.menu.reviewer, menu)
         displayIcons(menu)

@david-allison
Member Author

The hang appears to be: https://github.com/ACRA/acra/blob/0c2b36ceb873f75b7528cd76f2fefec3f0d5ac23/acra-core/src/main/java/org/acra/interaction/ReportInteractionExecutor.kt#L50

It is triggered by an uncaught exception reaching the handler installed via Thread.setDefaultUncaughtExceptionHandler
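
For context, here is a minimal sketch of how the JVM routes an uncaught exception to the default handler. It does not reproduce ACRA or Robolectric; captureUncaught is a hypothetical helper. The point it illustrates is that the handler runs on the thread that threw, before that thread terminates, so a handler that blocks will stall that thread, whereas a recording handler like this one lets a test fail fast.

```kotlin
import java.util.concurrent.atomic.AtomicReference

/**
 * Hypothetical helper: runs [block] on a fresh thread with a temporary default
 * uncaught-exception handler and returns whatever that handler received.
 */
fun captureUncaught(timeoutMs: Long = 5_000, block: () -> Unit): Throwable? {
    val previous = Thread.getDefaultUncaughtExceptionHandler()
    val seen = AtomicReference<Throwable?>(null)
    // Stand-in for a crash reporter: record the throwable instead of blocking
    Thread.setDefaultUncaughtExceptionHandler { _, e -> seen.set(e) }
    try {
        val worker = Thread { block() }
        worker.start()
        // The handler is invoked on the worker before it exits, so join() is enough
        worker.join(timeoutMs)
        return seen.get()
    } finally {
        // Always restore whatever handler was installed before
        Thread.setDefaultUncaughtExceptionHandler(previous)
    }
}

fun main() {
    val seen = captureUncaught { throw IllegalStateException("CollectionAlreadyOpen") }
    println("default handler saw: $seen")
}
```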

@david-allison
Member Author

Applied a fix; this run will tell us for certain: https://github.com/david-allison/Anki-Android/actions/runs/8859500839/job/24329333389

@david-allison
Member Author

Proposal on the anti-flake script: fail if the branch is main.

Avoids user error when forgetting to select a branch

@mikehardy
Member

> Proposal on the anti-flake script: fail if the branch is main.
>
> Avoids user error when forgetting to select a branch

But... I want to run it on main sometimes? If a flake slips through, we will want to run it on main to verify?

@github-actions github-actions bot added this to the 2.18 release milestone Apr 29, 2024
@david-allison
Member Author

Any way we can confirm that we want to run it on main?

I currently have "ad blindness" when it comes to branch selection; this might go away with time now that it's a necessary field to change
