Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix threading when building OpenBLAS #9

Merged
merged 1 commit into from
Jul 26, 2016
Merged

fix threading when building OpenBLAS #9

merged 1 commit into from
Jul 26, 2016

Conversation

zym1010
Copy link
Contributor

@zym1010 zym1010 commented Jul 5, 2016

fix threading.

@conda-forge-linter
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@ocefpaf
Copy link
Member

ocefpaf commented Jul 5, 2016

@zym1010 please bump the build number in the meta.yaml file to get a new build.

@zym1010 zym1010 changed the title Update build.sh fix threading when building OpenBLAS Jul 5, 2016
@zym1010
Copy link
Contributor Author

zym1010 commented Jul 5, 2016

@ocefpaf Done!

@ocefpaf
Copy link
Member

ocefpaf commented Jul 5, 2016

Also, if you are interested in this recipe, consider adding yourself in the maintainers list 😉

@jakirkham
Copy link
Member

So, we had initially disabled threading as there is a lot of bad behavior in OpenBLAS w.r.t. and it has several fun bugs as a consequence.

@jakirkham
Copy link
Member

See these two comments and related xref'd issue for details.

@zym1010
Copy link
Contributor Author

zym1010 commented Jul 5, 2016

Thanks. But I think those issues are fixed right? Also, without threading, it's basically useless...

@zym1010
Copy link
Contributor Author

zym1010 commented Jul 5, 2016

I used to compile OpenBLAS myself under Ubuntu 14.04 with default settings and they worked without a problem.

@zym1010
Copy link
Contributor Author

zym1010 commented Jul 5, 2016

Or, if really necessary, open a new fork of multithreaded OpenBLAS... I switched to conda-forge instead of Anaconda because sometimes they make crappy numpy and scipy packages for various reasons, yet their OpenBLAS is multithreaded. I don't want to lose the benefit of multithreaded OpenBLAS.

@zym1010
Copy link
Contributor Author

zym1010 commented Jul 5, 2016

@jakirkham I checked those issues you mentioned. Can't they be avoided by setting environment variable at runtime?

@jakirkham
Copy link
Member

Thanks. But I think those issues are fixed right?

As @njsmith says in his comment, they added a workaround. The problem is how they are scheduling threading and that has not changed. For instance, I'm looking at this issue ( OpenMathLib/OpenBLAS#900 ) and it looks like a variant on the exact same problem that he experienced.

@zym1010
Copy link
Contributor Author

zym1010 commented Jul 5, 2016

@jakirkham I see. But can they be avoided by setting flags to restrict threading at runtime, rather than falling back to single-thread for everyone when building? Since I think most people won't encounter such issue...

@patricksnape
Copy link

Just to weigh in, given that the user can set the behaviour both via an environment variable or a call into OpenBLAS (in C) it seems sensible that we would build with threading and let the user avoid their own pathological cases?

@@ -10,7 +10,7 @@ source:
md5: 805e7f660877d588ea7e3792cda2ee65

build:
number: 2
number: 3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please bump to 4.

Copy link
Contributor Author

@zym1010 zym1010 Jul 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakirkham sorry for delay since I was traveling. It's done now.

@jakirkham
Copy link
Member

jakirkham commented Jul 23, 2016

@njsmith, do you have any idea as to the frequency that sched_yield problems occur for the user? In other words, how easy is it to land in a pathological threading case?

@njsmith
Copy link

njsmith commented Jul 23, 2016

Well, I did it once, and then they fixed that case. So no, not really. But presumably if it were happening to people every day (and in obvious ways), then they would be getting bug reports about it constantly, which doesn't seem to be the case.

@njsmith
Copy link

njsmith commented Jul 24, 2016

I guess another data point is that the numpy wheels are using openblas with threading enabled for a few months now, and we haven't gotten any complaints.

@jakirkham
Copy link
Member

Thanks for the data points and thanks for the proposal @zym1010. Given this seems to be working fine for a large number of users, I'm more then happy to enable it.

@jakirkham jakirkham merged commit 18a1d3a into conda-forge:master Jul 26, 2016
@jakirkham
Copy link
Member

jakirkham commented Jul 26, 2016

Appears the whole stack linking to OpenBLAS directly needs to be rebuilt. Unfortunately I missed this before merging. Sorry about that. Perhaps future BLAS changes should be enhancement proposals.

Trying to help move the stack forward to avoid brokenness. Though I can really use others help. Here is what I have done so far. If you depend on NumPy, you should be fine to proceed. If you depend on SciPy, please wait for the deployment of its rebuild you should be fine. Really this is basically done at this point.

cc @conda-forge/core @conda-forge/numpy @conda-forge/scipy @conda-forge/scikit-learn @conda-forge/suitesparse @conda-forge/bob-math @conda-forge/hmat-oss @conda-forge/openturns @conda-forge/coincbc @conda-forge/caffe @conda-forge/cvxopt @conda-forge/simbody

@jakirkham
Copy link
Member

Should add that if any builds are getting running into timeouts or getting segmentation faults. It may be that it is assuming to many threads on the CI. As we know 1 thread has worked in the past for these feedstocks (how they were built originally), please trying setting OPENBLAS_NUM_THREADS to 1 before building or testing so as to avoid using too many threads.

@jakirkham
Copy link
Member

jakirkham commented Jul 26, 2016

A question came up in another thread about whether this breaks NumPy extensions that don't use a BLAS.

I don't think so. It seems only things linking to the BLAS need to be rebuilt. This ends up being a much shorter list. That being said, I'm not thrilled with this breakage either. As a test, I have tried a couple numpy extensions (rank_filter, pyfftw) and they seem fine without a rebuild. Would be happy to test (or help others test) more things.

cc @moorepants @ocefpaf

@183amir
Copy link

183amir commented Jul 26, 2016

Since old packages that were built with older openblas do not work with this new openblas build, we need to hotfix all the old ones to make sure they don't get installed with this new openblas.

@jakirkham
Copy link
Member

jakirkham commented Jul 26, 2016

So as we have not decided as a community to go through with hotfixing ATM. See discussion in this PR ( conda-forge/conda-forge.github.io#170 ), I'm not really sure we can do that until we agree on hotfixing as a strategy. The old packages shouldn't be installed by default as the newer build number of OpenBLAS and the rebuilt packages is preferred. Thus by default the user gets a working environment once all PRs are merged.

However, I'm aware that you do like to use your old packages, @183amir. Pinning the build number would make this possible and this would be doable if hotfixing were permissible. Though it is not. I can see two other options.

  1. Delete the packages built with the old OpenBLAS and the old OpenBLAS.
  2. Relabel the old packages.

1 seems totally opposite to what you are trying to accomplish even though it "fixes" the problem. 2 could provide a legitimate fix that would work for you without trying to use hotfixing. If 2 is permissible, we should discuss how that relabeling is happening and to what.

This is a bit off topic, but I could see value in stamping packages with labels based on date or some similar granularity. This way people wanting old packages could pull everything in from that date without changing the packages themselves. Maybe worth a bullet point in PR ( conda-forge/conda-forge.github.io#170 )'s overview of strategies.

@jakirkham
Copy link
Member

It seems the Travis CI OS X workers are saturated. Not seeing anything building on our end (though it is not 100% obvious from the web interface for me so feel free to correct me). Maybe there is an issue they need to fix. This is probably going to hold up resolution of this issue.

@jakirkham
Copy link
Member

🎉 SciPy has been rebuilt! 🎉

@ocefpaf
Copy link
Member

ocefpaf commented Jul 26, 2016

A question came up in another thread about whether this breaks NumPy extensions that don't use a BLAS.

I don't think so.

You cannot know that. Some extensions may depend on numpy and openblas but have only numpy specified as dependency while it try to auto-discover the "blas" that numpy brought with it if any. (And note that is is fair as packaging BTW!)

Luckily it seems that scikit-learn had openblas specified and you found it. We will only know if there are others if people report breakages.

@jakirkham
Copy link
Member

jakirkham commented Jul 26, 2016

Some extensions may depend on numpy and openblas but have only numpy specified as dependency while it try to auto-discover the "blas" that numpy brought with it if any.

It is possible there are packages linking to the BLAS and not specifying it. However, this is no longer a BLAS specific problem. This is a general packaging problem on our end that needs addressing. Namely packages should list the dependencies they link to. Not doing so seems to have caused the issues we experienced with Qt. Though I agree that we should remain vigilant for similar issues.

Luckily it seems that scikit-learn had openblas specified and you found it. We will only know if there are others if people report breakages.

There was no luck involved. I investigated this when packaging scikit-learn and made sure that it was packaged appropriately.

@jakirkham
Copy link
Member

So basically we are down to openturns (of things we know of). The PR for it works. Simply a matter of merging it and getting it built at this point. Everything else has been rebuilt.

As mentioned by @ocefpaf, there could be packages that we did not find depending on OpenBLAS that do. Keep your eyes open for loading issues with OpenBLAS. It may indicate such a package is involved. Please let us know so we can help you fix it.

Thanks everyone for your help.

@jakirkham
Copy link
Member

openturns was rebuilt earlier today. Considering this resolved.

skupr-anaconda pushed a commit to AnacondaRecipes/openblas-feedstock that referenced this pull request Jan 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants