Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pytest: Mark tests using space_time_raster_dataset as needs_solo_run #3939

Merged
merged 1 commit into from
Jul 9, 2024

Conversation

echoix
Copy link
Member

@echoix echoix commented Jun 28, 2024

The test fixture space_time_raster_dataset used in some pytest tests, including in some of the Jupyter tests, is unexpectedly slow at the setup stage when running in pytest with multiple workers. There might be a deadlock problem, or might just use many processes or threads by itself.

Since it causes some intermittent timeouts, this PR adds the marker to run them separately, without other tests in parallel. The setup stage is still slow, but I highly expect after my couple tries that it will prevent some more timeouts that were still present after #3879.

On the positive side, there's (hopefully) less flaky tests to retry, that can cause delays when there is a lot of activity. On the negative side, since these tests are slow AND that the setup phase of that fixture is also slow, the overall time taken for the tests increases compared to the sum of parallel+serial test steps durations. So that means that there was some speedup of running them in parallel that we are temporarily letting go. It is still better than no parallel workers as before. And either way, some of deadlock issues, if they are caused by the multiprocessing using the fork startup method, will need to be fixed in order to run any of these tests on macOS and Windows. It's a work in progress (not this PR, this PR is ready)

@echoix echoix added this to the 8.5.0 milestone Jun 28, 2024
@github-actions github-actions bot added temporal Related to temporal data processing Python Related code is in Python libraries module notebook labels Jun 28, 2024
@echoix
Copy link
Member Author

echoix commented Jun 28, 2024

Also note that the "coverage" change that you might see is still flawed, as this PR will add new files "seen" by the coverage tool. It doesn't know about all the files in the repo yet, so it now sees 54 more new files.

@neteler
Copy link
Member

neteler commented Jun 29, 2024

The test fixture space_time_raster_dataset used in some pytest tests, including in some of the Jupyter tests, is unexpectedly slow at the setup stage when running in pytest with multiple workers. There might be a deadlock problem, or might just use many processes or threads by itself.

Wild guess without checking the code: could it be a SQLite locking issue due to SQLite concurrent (read/) write access happening?

@echoix
Copy link
Member Author

echoix commented Jun 29, 2024

The test fixture space_time_raster_dataset used in some pytest tests, including in some of the Jupyter tests, is unexpectedly slow at the setup stage when running in pytest with multiple workers. There might be a deadlock problem, or might just use many processes or threads by itself.

Wild guess without checking the code: could it be a SQLite locking issue due to SQLite concurrent (read/) write access happening?

I'm not sure, as trying to follow around what is called within the temporal modules is a nightmare, everything gets invoked/imported. Technically, it is supposed to be 7 rasters, and that fixture valid for all the tests. But something isn't quite right.

Once coverage is properly set up (maybe once C-code is also tracked), it will be a good thing to incrementally change the tests and see what changes, to simplify redundant tests (by only using the coverage provided by one test vs another).

In a separate attempt last weekend to profile some tests (gunittest), I started with only a subfolder in temporal, t.rast.algebra, and a lot of time (18%) was spent on a single inner loop of PLY that checked if an element was in a list. Other than that, one big time wasted at importing tgis, as it imported almost everything with star imports.

So the problem can be anywhere.

For now, the goal is only to prevent useless failures to have correct feedback earlier (without waiting to retry a job)

@echoix
Copy link
Member Author

echoix commented Jun 29, 2024

I have other changes queued to these files (dating from the last day of the sprint), so I'm waiting for this to be merged first.

@echoix echoix requested a review from wenzeslaus June 29, 2024 20:33
@echoix
Copy link
Member Author

echoix commented Jul 1, 2024

I had to rerun about 4-5 pytest failures on main that worked on a second try. I would've hoped that this would have fixed it

@echoix echoix requested a review from ninsbl July 1, 2024 22:45
@echoix echoix enabled auto-merge (squash) July 2, 2024 09:55
@echoix
Copy link
Member Author

echoix commented Jul 5, 2024

Now I'm trying to push for this PR to be merged, as we still have some pytest flaky timeouts that I'm convinced this will help a bit. And also was blocking me from continuing on other pytest improvements the last weekend. This weekend will be more quiet on my end though.

There's nothing left to change in this PR

Copy link
Member

@ninsbl ninsbl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also guess SQLite DB locking may cause
the issues with long runtimes/timeouts.

The changes seem fine, and at very least help identifying the root cause of long run times...

@echoix echoix merged commit 18d11ad into OSGeo:main Jul 9, 2024
28 checks passed
@echoix echoix deleted the extra-solo-tests branch July 9, 2024 01:28
@echoix
Copy link
Member Author

echoix commented Jul 9, 2024

In curious to see if the dbif.close() changes from https://github.com/OSGeo/grass/pull/3996/files#diff-30a61f23aa90129beedbff09dd723876e2a54703c7355eaef4e45a8827eaecda would change something…

It hasn’t been there enough for it, and for the CI on my side on my PR that is before all of them, I think I still caught some failures..

@ninsbl
Copy link
Member

ninsbl commented Jul 9, 2024

I agree, and for me it is not clear either how TGIS temporal database interactions are managed in total, and what is the overarching concept in dB handling, esp. in combination with TGIS python objects for example...

@wenzeslaus wenzeslaus removed their request for review July 9, 2024 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libraries module notebook Python Related code is in Python temporal Related to temporal data processing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants