Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Version 0.47 with llvmlite 0.31 #41

Merged
merged 13 commits into from
Jan 9, 2020
Merged

Conversation

henryiii
Copy link
Contributor

@henryiii henryiii commented Jan 4, 2020

Checklist

  • Used a fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Closes #39

Also closes #38 by including the commit from that PR.

Uses latest llvmlite.

@henryiii henryiii changed the title Version 0.31 Version 0.47 with llvmlite 0.31 Jan 4, 2020
@conda-forge-linter
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@xhochy
Copy link
Member

xhochy commented Jan 4, 2020

Can you also include https://github.com/conda-forge/numba-feedstock/pull/38/files#diff-074883f7d957e193a6bcdd3c93e34b76 and rerender so that this also solves #38 ?

@henryiii
Copy link
Contributor Author

henryiii commented Jan 4, 2020

Sure, wasn't sure it would be picked up in the migration properly if I did.

@xhochy
Copy link
Member

xhochy commented Jan 4, 2020

The migration will continue once #38 is closed for some time.

recipe/meta.yaml Outdated Show resolved Hide resolved
recipe/meta.yaml Outdated Show resolved Hide resolved
@henryiii
Copy link
Contributor Author

henryiii commented Jan 4, 2020

For linux, numba.tests.test_parallel_backend.TestSpecificBackend.test_random_concurrect_vectorize_tbb is timing out. Is tbb still optional?

And for windows, llvmlite doesn't support 2.7 I believe.

@marcelotrevisani
Copy link
Member

For linux, numba.tests.test_parallel_backend.TestSpecificBackend.test_random_concurrect_vectorize_tbb is timing out. Is tbb still optional?

And for windows, llvmlite doesn't support 2.7 I believe.

According to the documentation it is optional
https://github.com/numba/numba#dependencies

@marcelotrevisani
Copy link
Member

@henryiii
Copy link
Contributor Author

henryiii commented Jan 4, 2020

  • Python 3.6, 3.7 Linux: numba.tests.test_parallel_backend.TestSpecificBackend.test_random_concurrent_jit_tbb timed out
  • Python 3.8: test_exercise_code_path_with_lifted_loop (numba.tests.test_annotations.TestAnnotation) fails (all 3 OS's) (also warns about switching to Python mode a couple of times)
  • Python 2.7 on Windows fails to solve

The others pass.

@henryiii
Copy link
Contributor Author

henryiii commented Jan 4, 2020

Why does build.sh use $TESTS_TO_RUN when it is not defined/empty? Not related, just curious.

@mbargull
Copy link
Member

mbargull commented Jan 4, 2020

Is tbb still optional?

Yes, but we want to have it in the tests to make sure it works if installed.

  • Python 3.6, 3.7 Linux: numba.tests.test_parallel_backend.TestSpecificBackend.test_random_concurrent_jit_tbb timed out

This one is odd and probably needs some more investigation...

  • Python 3.8: test_exercise_code_path_with_lifted_loop (numba.tests.test_annotations.TestAnnotation) fails (all 3 OS's) (also warns about switching to Python mode a couple of times)

This seems to be a failure that hadn't been caught by Numba's CI due to some unfortunate circumstances. I'll open a PR for Numba; not to fix it, but to make them aware of this.

  • Python 2.7 on Windows fails to solve

IMHO, we can ignore this for now. If there is demand to get this working, someone can tackle this on the llvmlite-feedstock (possibly with help from someone from Anaconda since the defaults and numba channels have builds for Windows and PY2).

Why does build.sh use $TESTS_TO_RUN when it is not defined/empty? Not related, just curious.

This comes from numba/numba@fed89e3:

[...] if the env var is not set it defaults to all tests.

@mbargull
Copy link
Member

mbargull commented Jan 6, 2020

I couldn't reproduce the tbb-related timeouts locally. Maybe it was just a hiccup (though, strange it happens on two runs..) or something specific to the CI machine? Let's see if it happens again.

@mbargull
Copy link
Member

mbargull commented Jan 6, 2020

The previous timeouts were indeed only hiccups, apparently.
This seems to be good to go and we can merge, IMHO, albeit with the following shortcomings:

  1. Annotations might not work as expected on Python 3.8.
  2. Windows build for Python 2.7 fails.

re 1.: This is an upstream issue and possibly (probably?!) not that important that it warrants blocking Numba builds for Python 3.8 on conda-forge.

re 2.: We can either merge as is and just rerun the CI once Python 2.7 builds of llvm 0.31 are available on Windows, or we can add skip: True # [win and py=27]. I'm fine with merging as-is.
(NB: numba 0.48 will likely remove Python 2.7 support, which can mean one of two things: 1. We could go ahead and drop those builds already for 0.47 because the "ship is sailed" anyway. 2. But there might be increased demand for the "last Numba version that supports Python 2.7". ¯\_(ツ)_/¯ )

@stuartarchibald
Copy link
Contributor

The previous timeouts were indeed only hiccups, apparently.

The threading backend tests are compute heavy (runs many forking/threading/multiprocessing things along with the compiler and also threaded execution, all at the same time), Numba does not run them on public CI https://github.com/numba/numba/blob/88e3dad3d43a59c79e6295f72fc344d77f13330c/buildscripts/incremental/test.sh#L75 (--exclude-tags='long_running').

This seems to be good to go and we can merge, IMHO, albeit with the following shortcomings:

1. Annotations might not work as expected on Python 3.8.

The fail is because a loop didn't lift and the test was asserting that a loop did, annotations should work fine still. This is the sample code, with the annotations call added, runs fine and reports correctly:

from numba import jit

def bar(x):
    return x

@jit
def foo(x):
    h = 0.
    for k in range(x):
        h = h + k
    if x:
        h = h - bar(x)
    return h


foo(10)

cres = foo.overloads[foo.signatures[0]]
ta = cres.type_annotation
with open("annotated.html", 'wt') as f:
    ta.html_annotate(f)

will fix for 0.48.0.

2. Windows build for Python 2.7 fails.

re 1.: This is an upstream issue and possibly (probably?!) not that important that it warrants blocking Numba builds for Python 3.8 on conda-forge.

re 2.: We can either merge as is and just rerun the CI once Python 2.7 builds of llvm 0.31 are available on Windows, or we can add skip: True # [win and py=27]. I'm fine with merging as-is.
(NB: numba 0.48 will likely remove Python 2.7 support, which can mean one of two things: 1. We could go ahead and drop those builds already for 0.47 because the "ship is sailed" anyway. 2. But there might be increased demand for the "last Numba version that supports Python 2.7". ¯_(ツ)_/¯ )

@stuartarchibald
Copy link
Contributor

For linux, numba.tests.test_parallel_backend.TestSpecificBackend.test_random_concurrect_vectorize_tbb is timing out. Is tbb still optional?

And for windows, llvmlite doesn't support 2.7 I believe.

TBB is optional, however if you build Numba without tbb-devel then obviously at run time it cannot use TBB regardless of whether a tbb package is present. Lack of TBB would lead to some parallel execution features being unavailable on certain platforms.

@stuartarchibald
Copy link
Contributor

I assume this conda-forge/llvmlite-feedstock#13 is where Python 2.7 got removed as a result of not being able to resolve these build problems: https://ci.appveyor.com/project/conda-forge/llvmlite-feedstock/builds/22085156/job/7vrdo4g30l9wwesn ? The errors there look like they could be from mixed tooling? Is perhaps conda-forge LLVM built with VS2017 and then llvmlite with VS2015?

@mbargull
Copy link
Member

mbargull commented Jan 6, 2020

The threading backend tests are compute heavy

Yeah, I figured the CI might've had a general slowdown and/or didn't handle the concurrency well. Locally, the two reported tests ran in about 2.5 and 3.5 seconds with all tests from numba.tests.test_parallel_backend.TestSpecificBackend taking about 95.5 seconds in total (i.e., far lower than the 600 (540) second timeout for a single test).

annotations should work fine still.

Great, thanks for taking a look at it!

Is perhaps conda-forge LLVM built with VS2017 and then llvmlite with VS2015?

I have very limited knowledge on that topic. If he's available, @isuruf would know, I guess.

@mbargull
Copy link
Member

mbargull commented Jan 6, 2020

@henryiii, you may want to take a look at those two:

Those seem to additionally add vs2015_runtime to the host (not build) requirements. As to why and what the details are, I can't say. But I'd just try to match those as a start.

@henryiii
Copy link
Contributor Author

henryiii commented Jan 6, 2020

I think we should drop Windows + 2.7 for now, then add it back if llvmlite gets a Windows 2.7 build in the future - I don't want to block this for 2.7 on Windows in 2020. We shouldn't even need to bump the build number, I believe (if that's all that changes).

Copy link
Member

@mbargull mbargull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Does anyone else have objections?

@henryiii
Copy link
Contributor Author

henryiii commented Jan 6, 2020

At least one test needs to be restarted to avoid the timeout. Is there a way to increase that timeout?

@henryiii
Copy link
Contributor Author

henryiii commented Jan 6, 2020

The timeout seems to be 600 seconds / 10 minutes?

https://github.com/numba/numba/blob/0.47.0/numba/testing/main.py#L699-L700

Edit: I apparently can't read, this is listed above.

@henryiii
Copy link
Contributor Author

henryiii commented Jan 6, 2020

numba-feedstock (linux linux_python3.7) needs to be restarted.

@jakirkham
Copy link
Member

Thanks restarted.

@jakirkham
Copy link
Member

@henryiii @mbargull, would either or both of you be interested in being maintainers here? 🙂

@henryiii
Copy link
Contributor Author

henryiii commented Jan 6, 2020

would either or both of you be interested in being maintainers here?

Sure, I'd be okay with it.

@jakirkham
Copy link
Member

Great! Could you please add yourself to the maintainers' list in the recipe?

@henryiii
Copy link
Contributor Author

henryiii commented Jan 7, 2020

Let's see what @mbargull says, then I'll add either myself or both of us.

I'm especially interested in the result of his PR conda-forge/llvmlite-feedstock#26, looks like that is building for Windows + 2.7.

Edit: Looks like that works, but would require a CFEP to be written in order to be merged (VS 17 + Python 2.7 is not allowed). So this cannot be provided for Windows + 2.7 unless it can be built (it == LLVM) with MSVC 2008 - which I assume is not likely.

@mbargull
Copy link
Member

mbargull commented Jan 8, 2020

Given the time the tests take on my local run, I'm really baffled that we are hitting that timeout...
I'll add a workaround to increase the timeout.
And yes, given my interest in the project, it would make sense to add me as a maintainer here. Will do.

@stuartarchibald
Copy link
Contributor

Note that Numba CI does not run the tests that are timing out (--exclude-tags='long_running')
https://github.com/numba/numba/blob/709967b7a6ce6c036c70f5eb99b06a8ef8c7172d/buildscripts/incremental/test.sh#L73-L75 because they have prohibitively long run times on public CI hardware. The Numba build farm does run these tests.

@stuartarchibald
Copy link
Contributor

There's a bug, that we (Numba core devs) have internally named "Frosty Thompson" after the docker container that first got stuck with it. Frosty Thompson's are basically where something goes wrong in TBB and it "gets stuck", we've been trying to pin it down for a while. It seems to happen most often on heavily loaded systems and always on linux. The place it "gets stuck" is here: https://github.com/intel/tbb/blob/18070344d755ece04d169e6cc40775cae9288cee/src/tbbmalloc/backend.cpp#L301-L327, essentially the loop invokes longer and longer pauses (and eventually ends up yielding) whilst waiting for a condition to change that never does. Needless to say we're trying to debug and get a reproducer for this. If longer timeouts still don't "fix" the failing test case(s) I'd be very suspicious that you've hit the first case of a Frosty Thompson outside the Numba build farm. It should also be noted that the test cases in numba.tests.test_parallel_backend deliberately place a huge amount of strain on a system, there's concurrent compilation and execution (both single and multithreaded Numba code some of which has nested parallelism), Python based fork/forkserver/spawn/threading all happening at the same time!

@stuartarchibald
Copy link
Contributor

The way in which 3.6/3.7/3.8 are all failing, different TBB tests timing out, suggests Frosty Thompson. CC @seibert.

@mbargull
Copy link
Member

mbargull commented Jan 8, 2020

If longer timeouts still don't "fix" the failing test case(s) I'd be very suspicious that you've hit the first case of a Frosty Thompson outside the Numba build farm.

I was thinking an additional 50% / 5 minutes timeout increase should really suffice in case things were only "slow". But I was also suspicious if some potential locking or contention issue may be at hand and whether we should just --exclude-tags='long_running' here, too.

"Frosty Thompson" it is then :D. Thanks for your clarifications!

@stuartarchibald
Copy link
Contributor

I've no idea if you can shell into the build containers, but if you can, Frosty Thompson symptoms include:

  • the machine having no load but the test process(es) still running
  • if gdb is attached to each of the child processes of the TBB tests, one will be stuck in that loop described above, most likely with a backtrace starting at sched_yield syscall.

@mbargull
Copy link
Member

mbargull commented Jan 9, 2020

conda-forge/llvmlite-feedstock#26 is being blocked and I can't tell what will come of it.
Merging this as is, i.e. without Py2.7 build on Windows.
Thanks everyone.

@mbargull mbargull merged commit c2846ee into conda-forge:master Jan 9, 2020
@henryiii henryiii deleted the henryiii-0.31 branch January 9, 2020 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants