Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support TBB on non-Linux, skip PyPy #73

Merged
merged 6 commits into from
Mar 18, 2021

Conversation

henryiii
Copy link
Contributor

@henryiii henryiii commented Mar 14, 2021

Explicit skip for PyPy added, testing out including TBB (closes #43), and update the test line for TBB (closes #63). Also fixed the homepage link; GitHub will turn off the old redirects at some point.

  • feat: add tbb on macOS/win
  • fix: tbb min version match
  • fix: explicit skip for PyPy

@conda-forge-linter
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@henryiii
Copy link
Contributor Author

@conda-forge-admin, please rerender

@henryiii
Copy link
Contributor Author

I think this is working. Can I get another pair of eyes or two?

@henryiii
Copy link
Contributor Author

@stuartarchibald might be interested in verifying, too?

@henryiii henryiii changed the title Support TBB, skip PyPy Support TBB on non-Linux, skip PyPy Mar 15, 2021
Copy link
Contributor

@stuartarchibald stuartarchibald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch.

I think the original restrictions around TBB were from a time when Numba supported a lot more architectures than were widely supported in the ecosystem. For what conda-forge builds against, removal of these restrictions seems valid.

Numba doesn't work on PyPy so that restriction makes sense and the GH homepage redirect seems fine.

Thanks again!

@henryiii
Copy link
Contributor Author

The Drone aarch64 builds are randomly hanging. I'm going to restore the aarch64 limiting for now; macOS and Windows seem to be just fine. (might be a Drone issue, or might be TBB related).

@henryiii
Copy link
Contributor Author

@conda-forge-admin, please rerender

@henryiii
Copy link
Contributor Author

Ahh, looking in the history, it looks like Drone has been randomly hanging for a while, doesn't appear to have anything to do with the addition of TBB. I'll revert back to the original, tbb everywhere version.

@henryiii
Copy link
Contributor Author

henryiii commented Mar 17, 2021

Yes, I think @jakirkham's analysis is correct. The higher versions of Python sometimes take 55 mins instead of 20 mins, which causes them to pass. 3.6 and 3.7 seem to be most vulnerable to failing with a 1 hour timeout. Is there some other infrastructure we can move to here? @chrisburr, do you know if there's an alternative to Drone like there is for Travis?

Another idea could be to skip running the tests on 3.6 and 3.7 for Drone, I think that frees up several minutes.

@jakirkham
Copy link
Member

For context, I think Henry is referring to the discussion in issue ( #74 )

Am not aware of an alternative for aarch64 hardware. We can just restart on master. We could use emulated builds on Azure, but this may be slower and am a little hesitant to use emulation with something as sophisticated as Numba. Alternatively maybe we skip some tests on aarch64

cc @isuruf (in case you have other thoughts)

@henryiii
Copy link
Contributor Author

We would only need to skip for py<38, since it seems the newer Pythons are a bit faster - the failures are almost always 3.6 and 3.7, and I can see 3.8 and 3.9 running "slow" (about 55 mins) sometimes, but they slip in under the limit usually.

Currently, it seems to only make it through 1/4 of the time or maybe less. I've restarted this about 6 times, always fails on one of 3.6, 3.7, and at least once failed with both.

@isuruf
Copy link
Member

isuruf commented Mar 17, 2021

You can try cross compilation and running tests using QEMU, or full emulation. Those are the two choices right now. Skipping a python version wouldn't work, because it depends on which machine gets assigned in drone CI (They have different types of machines and some are slower) and even if this version passes just under 55 mins, a new version might not.

@jakirkham
Copy link
Member

jakirkham commented Mar 17, 2021

Alternatively maybe we skip some tests on aarch64

Just want to highlight this option again. For example do we need to run all of the TBB tests there?

Edit: Or could we just run them on Python 3.8 given that seems to work reliably and these tests are not likely too dependent on Python versions?

@henryiii
Copy link
Contributor Author

even if this version passes just under 55 mins

It's possible Drone will update all their machines by the time the next version is released? Guessing the future of both the CI and the changes to Numba isn't very reliable. We can always change in the future if it fails on a new version.

We are currently, if I understand correctly, only running 20% of the tests on aarch64, maybe a first step would be to reduce that fraction? I wonder if that's why it's variable (3.6 and 3.7 do pass sometimes), it depends on which 20% it selects. Maybe there's a long test or two that can always be excluded?

@henryiii
Copy link
Contributor Author

echo 'Running only a slice of tests'
$SEGVCATCH python -m numba.runtests -b -j --random='0.20' --exclude-tags='long_running' -m $TEST_NPROCS -- numba.tests

@stuartarchibald
Copy link
Contributor

even if this version passes just under 55 mins

It's possible Drone will update all their machines by the time the next version is released? Guessing the future of both the CI and the changes to Numba isn't very reliable. We can always change in the future if it fails on a new version.

We are currently, if I understand correctly, only running 20% of the tests on aarch64, maybe a first step would be to reduce that fraction? I wonder if that's why it's variable (3.6 and 3.7 do pass sometimes), it depends on which 20% it selects. Maybe there's a long test or two that can always be excluded?

IIRC for a given environment the selection will be consistent across runs.

@stuartarchibald
Copy link
Contributor

Alternatively maybe we skip some tests on aarch64

Just want to highlight this option again. For example do we need to run all of the TBB tests there?

I think the really heavy threading layer specific tests should already be skipped by virtue of --exclude-tags='long_running' on the test runner line.

Another option, instead of picking a percentage of "random" tests, is to "slice" through the test suite with the slice syntax as used here:
https://github.com/numba/numba/blob/5f7bcce048b9c0a7fd02376ca8a2e3574625c99f/buildscripts/incremental/test.sh#L84

-j "$TEST_START_INDEX,None,$TEST_COUNT" is interpreted as

run_these_tests = list_of_all_tests[slice($TEST_START_INDEX,None,$TEST_COUNT)]

and in the case of Numba's own CI, there's TEST_COUNT CI build combinations and each one starts at a linearly increasing index from 0..TEST_COUNT, i.e. bound through the test suite running every TEST_COUNT^th test starting from index TEST_START_INDEX. This is not great but realistically the only way public CI can cope with Numba's test suite.

More info here:
https://numba.readthedocs.io/en/stable/developer/contributing.html#continuous-integration-testing

@henryiii
Copy link
Contributor Author

10% of the tests passed on the first try. It cut the "long" time for Python 3.7 to 42-ish minutes, and 3.9 to 39 minutes. For Python 3.6 and 3.8, it ran on the newer hardware, taking 12.5 and <12 minutes, respectively. Almost tempted to try 15%. I think I will.

IIRC for a given environment the selection will be consistent across runs.

It could very well be, this was running at 59 minutes when it was not timing out, so it was easily possible the differences causing pass/fail were CI load related. If these are consistent, I think it's fine as long as they easily pass.

recipe/run_test.sh Outdated Show resolved Hide resolved
@henryiii
Copy link
Contributor Author

henryiii commented Mar 18, 2021

Haha, of course, that's the one time in the last 10 runs or so that py36 and py37 both get a fast node. 🤦 18 mins for 3.6 & 3.7, and 17 mins for 3.8. Looks like 3.9 got a slow node, and it took 53 minutes. Guessing around 55 mins for 3.6 and 3.7. I think that's pretty good, let's go with this for now, we can always refer back here and modify if things get more challenging in the future (or less, too).

Thanks everyone!

@henryiii henryiii merged commit aa5136c into conda-forge:master Mar 18, 2021
@henryiii henryiii deleted the henryiii/fix/tbb branch March 18, 2021 00:47
@jakirkham
Copy link
Member

Sounds great! Thanks for working on this Henry 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

For the next build testing needs to use a newer TBB Reenable tbb on macOS? (and maybe windows?)
5 participants