Support TBB on non-Linux, skip PyPy #73

henryiii · 2021-03-14T18:10:12Z

Explicit skip for PyPy added, testing out including TBB (closes #43), and update the test line for TBB (closes #63). Also fixed the homepage link; GitHub will turn off the old redirects at some point.

feat: add tbb on macOS/win
fix: tbb min version match
fix: explicit skip for PyPy

conda-forge-linter · 2021-03-14T18:10:16Z

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

henryiii · 2021-03-14T20:46:10Z

@conda-forge-admin, please rerender

…da-forge-pinning 2021.03.13.19.08.27

henryiii · 2021-03-14T21:55:34Z

I think this is working. Can I get another pair of eyes or two?

henryiii · 2021-03-15T02:03:55Z

@stuartarchibald might be interested in verifying, too?

stuartarchibald

Thanks for the patch.

I think the original restrictions around TBB were from a time when Numba supported a lot more architectures than were widely supported in the ecosystem. For what conda-forge builds against, removal of these restrictions seems valid.

Numba doesn't work on PyPy so that restriction makes sense and the GH homepage redirect seems fine.

Thanks again!

henryiii · 2021-03-16T15:34:16Z

The Drone aarch64 builds are randomly hanging. I'm going to restore the aarch64 limiting for now; macOS and Windows seem to be just fine. (might be a Drone issue, or might be TBB related).

henryiii · 2021-03-16T17:26:44Z

@conda-forge-admin, please rerender

henryiii · 2021-03-16T18:38:36Z

Ahh, looking in the history, it looks like Drone has been randomly hanging for a while, doesn't appear to have anything to do with the addition of TBB. I'll revert back to the original, tbb everywhere version.

henryiii · 2021-03-17T17:27:31Z

Yes, I think @jakirkham's analysis is correct. The higher versions of Python sometimes take 55 mins instead of 20 mins, which causes them to pass. 3.6 and 3.7 seem to be most vulnerable to failing with a 1 hour timeout. Is there some other infrastructure we can move to here? @chrisburr, do you know if there's an alternative to Drone like there is for Travis?

Another idea could be to skip running the tests on 3.6 and 3.7 for Drone, I think that frees up several minutes.

jakirkham · 2021-03-17T17:38:27Z

For context, I think Henry is referring to the discussion in issue ( #74 )

Am not aware of an alternative for aarch64 hardware. We can just restart on master. We could use emulated builds on Azure, but this may be slower and am a little hesitant to use emulation with something as sophisticated as Numba. Alternatively maybe we skip some tests on aarch64

cc @isuruf (in case you have other thoughts)

henryiii · 2021-03-17T17:41:44Z

We would only need to skip for py<38, since it seems the newer Pythons are a bit faster - the failures are almost always 3.6 and 3.7, and I can see 3.8 and 3.9 running "slow" (about 55 mins) sometimes, but they slip in under the limit usually.

Currently, it seems to only make it through 1/4 of the time or maybe less. I've restarted this about 6 times, always fails on one of 3.6, 3.7, and at least once failed with both.

isuruf · 2021-03-17T17:43:02Z

You can try cross compilation and running tests using QEMU, or full emulation. Those are the two choices right now. Skipping a python version wouldn't work, because it depends on which machine gets assigned in drone CI (They have different types of machines and some are slower) and even if this version passes just under 55 mins, a new version might not.

jakirkham · 2021-03-17T17:47:23Z

Alternatively maybe we skip some tests on aarch64

Just want to highlight this option again. For example do we need to run all of the TBB tests there?

Edit: Or could we just run them on Python 3.8 given that seems to work reliably and these tests are not likely too dependent on Python versions?

henryiii · 2021-03-17T18:11:31Z

even if this version passes just under 55 mins

It's possible Drone will update all their machines by the time the next version is released? Guessing the future of both the CI and the changes to Numba isn't very reliable. We can always change in the future if it fails on a new version.

We are currently, if I understand correctly, only running 20% of the tests on aarch64, maybe a first step would be to reduce that fraction? I wonder if that's why it's variable (3.6 and 3.7 do pass sometimes), it depends on which 20% it selects. Maybe there's a long test or two that can always be excluded?

henryiii · 2021-03-17T18:37:06Z

numba-feedstock/recipe/run_test.sh

Lines 40 to 41 in 9796deb

    
           echo 'Running only a slice of tests' 
        
           $SEGVCATCH python -m numba.runtests -b -j --random='0.20' --exclude-tags='long_running' -m $TEST_NPROCS -- numba.tests

stuartarchibald · 2021-03-17T19:25:16Z

even if this version passes just under 55 mins

It's possible Drone will update all their machines by the time the next version is released? Guessing the future of both the CI and the changes to Numba isn't very reliable. We can always change in the future if it fails on a new version.

We are currently, if I understand correctly, only running 20% of the tests on aarch64, maybe a first step would be to reduce that fraction? I wonder if that's why it's variable (3.6 and 3.7 do pass sometimes), it depends on which 20% it selects. Maybe there's a long test or two that can always be excluded?

IIRC for a given environment the selection will be consistent across runs.

stuartarchibald · 2021-03-17T19:35:08Z

Alternatively maybe we skip some tests on aarch64

Just want to highlight this option again. For example do we need to run all of the TBB tests there?

I think the really heavy threading layer specific tests should already be skipped by virtue of --exclude-tags='long_running' on the test runner line.

Another option, instead of picking a percentage of "random" tests, is to "slice" through the test suite with the slice syntax as used here:
https://github.com/numba/numba/blob/5f7bcce048b9c0a7fd02376ca8a2e3574625c99f/buildscripts/incremental/test.sh#L84

-j "$TEST_START_INDEX,None,$TEST_COUNT" is interpreted as

run_these_tests = list_of_all_tests[slice($TEST_START_INDEX,None,$TEST_COUNT)]

and in the case of Numba's own CI, there's TEST_COUNT CI build combinations and each one starts at a linearly increasing index from 0..TEST_COUNT, i.e. bound through the test suite running every TEST_COUNT^th test starting from index TEST_START_INDEX. This is not great but realistically the only way public CI can cope with Numba's test suite.

More info here:
https://numba.readthedocs.io/en/stable/developer/contributing.html#continuous-integration-testing

henryiii · 2021-03-17T21:54:48Z

10% of the tests passed on the first try. It cut the "long" time for Python 3.7 to 42-ish minutes, and 3.9 to 39 minutes. For Python 3.6 and 3.8, it ran on the newer hardware, taking 12.5 and <12 minutes, respectively. Almost tempted to try 15%. I think I will.

IIRC for a given environment the selection will be consistent across runs.

It could very well be, this was running at 59 minutes when it was not timing out, so it was easily possible the differences causing pass/fail were CI load related. If these are consistent, I think it's fine as long as they easily pass.

recipe/run_test.sh

henryiii · 2021-03-18T00:46:57Z

Haha, of course, that's the one time in the last 10 runs or so that py36 and py37 both get a fast node. 🤦 18 mins for 3.6 & 3.7, and 17 mins for 3.8. Looks like 3.9 got a slow node, and it took 53 minutes. Guessing around 55 mins for 3.6 and 3.7. I think that's pretty good, let's go with this for now, we can always refer back here and modify if things get more challenging in the future (or less, too).

Thanks everyone!

jakirkham · 2021-03-18T00:56:00Z

Sounds great! Thanks for working on this Henry 😄

henryiii added 3 commits March 14, 2021 13:55

feat: add tbb on macOS/win

240c61b

fix: tbb min version match

b7409ef

fix: explicit skip for PyPy

1ec63fd

henryiii requested review from jakirkham, marcelotrevisani, mbargull, souravsingh and step21 as code owners March 14, 2021 18:10

MNT: Re-rendered with conda-build 3.21.4, conda-smithy 3.9.0, and con…

2afc959

…da-forge-pinning 2021.03.13.19.08.27

henryiii changed the title ~~Support TBB, skip PyPy~~ Support TBB on non-Linux, skip PyPy Mar 15, 2021

stuartarchibald approved these changes Mar 16, 2021

View reviewed changes

henryiii mentioned this pull request Mar 16, 2021

DroneIO hangs randomly #74

Closed

henryiii force-pushed the henryiii/fix/tbb branch from e9b3bec to 2afc959 Compare March 16, 2021 18:44

tests: reduce to 10% of tests run on aarch64

a630da9

henryiii commented Mar 17, 2021

View reviewed changes

recipe/run_test.sh Outdated Show resolved Hide resolved

tests: try 15%

ff2bdaa

henryiii merged commit aa5136c into conda-forge:master Mar 18, 2021

henryiii deleted the henryiii/fix/tbb branch March 18, 2021 00:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support TBB on non-Linux, skip PyPy #73

Support TBB on non-Linux, skip PyPy #73

henryiii commented Mar 14, 2021 •

edited

Loading

conda-forge-linter commented Mar 14, 2021

henryiii commented Mar 14, 2021

henryiii commented Mar 14, 2021

henryiii commented Mar 15, 2021

stuartarchibald left a comment

henryiii commented Mar 16, 2021

henryiii commented Mar 16, 2021

henryiii commented Mar 16, 2021

henryiii commented Mar 17, 2021 •

edited

Loading

jakirkham commented Mar 17, 2021

henryiii commented Mar 17, 2021

isuruf commented Mar 17, 2021

jakirkham commented Mar 17, 2021 •

edited

Loading

henryiii commented Mar 17, 2021

henryiii commented Mar 17, 2021

stuartarchibald commented Mar 17, 2021

stuartarchibald commented Mar 17, 2021

henryiii commented Mar 17, 2021

henryiii commented Mar 18, 2021 •

edited

Loading

jakirkham commented Mar 18, 2021

Support TBB on non-Linux, skip PyPy #73

Support TBB on non-Linux, skip PyPy #73

Conversation

henryiii commented Mar 14, 2021 • edited Loading

conda-forge-linter commented Mar 14, 2021

henryiii commented Mar 14, 2021

henryiii commented Mar 14, 2021

henryiii commented Mar 15, 2021

stuartarchibald left a comment

Choose a reason for hiding this comment

henryiii commented Mar 16, 2021

henryiii commented Mar 16, 2021

henryiii commented Mar 16, 2021

henryiii commented Mar 17, 2021 • edited Loading

jakirkham commented Mar 17, 2021

henryiii commented Mar 17, 2021

isuruf commented Mar 17, 2021

jakirkham commented Mar 17, 2021 • edited Loading

henryiii commented Mar 17, 2021

henryiii commented Mar 17, 2021

stuartarchibald commented Mar 17, 2021

stuartarchibald commented Mar 17, 2021

henryiii commented Mar 17, 2021

henryiii commented Mar 18, 2021 • edited Loading

jakirkham commented Mar 18, 2021

henryiii commented Mar 14, 2021 •

edited

Loading

henryiii commented Mar 17, 2021 •

edited

Loading

jakirkham commented Mar 17, 2021 •

edited

Loading

henryiii commented Mar 18, 2021 •

edited

Loading