build 1.8.0 #198

Merged: 14 commits merged on Feb 6, 2022

h-vetinari
Member

More submodules! 🙃

@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@h-vetinari
Member Author

I updated the import tests in light of scipy/scipy#14360.

All the previously tested underscore-less imports are mentioned under PRIVATE_BUT_PRESENT_MODULES, and therefore can be removed. The only exception is scipy.signal.sigtools, which presumably was an oversight. I commented upstream.
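For context on what these import tests do: conda-build imports every module listed under `test: imports:` in the recipe's meta.yaml and fails the build if any import raises. A rough Python equivalent of that smoke test is sketched below. The helper name `check_imports` and the module list are illustrative assumptions, not the recipe's actual list.

```python
import importlib


def check_imports(module_names):
    """Import each module by name; return {name: error message} for the ones that fail."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            failures[name] = str(exc)
    return failures


# Illustrative list only: a few public (underscore-less) subpackages.
PUBLIC_MODULES = ["scipy", "scipy.linalg", "scipy.signal", "scipy.sparse.linalg"]

if __name__ == "__main__":
    failed = check_imports(PUBLIC_MODULES)
    for name, msg in failed.items():
        print(f"FAILED: {name}: {msg}")
    print("all imports OK" if not failed else f"{len(failed)} import(s) failed")
```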

@h-vetinari
Member Author

@rgommers @tylerjereddy

There seems to be a segfault in the Windows builds related to PROPACK, and moving to vs2019 doesn't help.

[...]
sparse/linalg/_eigen/tests/test_svds.py::Test_SVDS_PROPACK::test_svd_simple[aslinearoperator-True-False-3-A1] PASSED [ 63%]
sparse/linalg/_eigen/tests/test_svds.py::Test_SVDS_PROPACK::test_svd_simple[aslinearoperator-True-False-4-A0] SKIPPED [ 63%]
sparse/linalg/_eigen/tests/test_svds.py::Test_SVDS_PROPACK::test_svd_simple[aslinearoperator-True-False-4-A1] PASSED [ 63%]
Windows fatal exception: access violation

Thread 0x00001a0c (most recent call first):
  File "C:\bld\scipy_1639306043924\_test_env\lib\threading.py", line 306 in wait
  File "C:\bld\scipy_1639306043924\_test_env\lib\threading.py", line 558 in wait
Windows fatal exception: access violation
  File "C:\bld\scipy_1639306043924\_test_env\lib\threading.py", line 1252 in run
  File "C:\bld\scipy_1639306043924\_test_env\lib\threading.py", line 932 in _bootstrap_inner
  File "C:\bld\scipy_1639306043924\_test_env\lib\threading.py", line 890 in _bootstrap

Current thread 0x00001570 (most recent call first):
  File "C:\bld\scipy_1639306043924\_test_env\lib\site-packages\scipy\sparse\linalg\_svdp.py", line 307 in _svdp
  File "C:\bld\scipy_1639306043924\_test_env\lib\site-packages\scipy\sparse\linalg\_eigen\_svds.py", line 316 in svds
  File "C:\bld\scipy_1639306043924\_test_env\lib\site-packages\scipy\sparse\linalg\_eigen\tests\test_svds.py", line 579 in test_svd_linop
  File "C:\bld\scipy_1639306043924\_test_env\lib\site-packages\_pytest\python.py", line 183 in pytest_pyfunc_call
  File "C:\bld\scipy_1639306043924\_test_env\lib\site-packages\pluggy\_callers.py", line 39 in _multicall
  File "C:\bld\scipy_1639306043924\_test_env\lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
  File "C:\bld\scipy_1639306043924\_test_env\lib\site-packages\pluggy\_hooks.py", line 265 in __call__
  File "C:\bld\scipy_1639306043924\_test_env\lib\site-packages\_pytest\python.py", line 1641 in runtest
  File "C:\bld\scipy_1639306043924\_test_env\lib\site-packages\_pytest\runner.py", line 162 in pytest_runtest_call
  File "C:\bld\scipy_1639306043924\_test_env\lib\site-packages\pluggy\_callers.py", line 39 in _multicall
  File "C:\bld\scipy_1639306043924\_test_env\lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
  File "C:\bld\scipy_1639306043924\_test_env\lib\site-packages\pluggy\_hooks.py", line 265 in __call__
  File "C:\bld\scipy_1639306043924\_test_env\lib\site-packages\scipy\_lib\_testutils.py", line 69 in __call__
  File "<string>", line 1 in <module>
sparse/linalg/_eigen/tests/test_svds.py::Test_SVDS_PROPACK::test_svd_linop 
(%PREFIX%) %SRC_DIR%>IF -1073741819 NEQ 0 exit /B 1

I don't have time to investigate this right now, but wanted to let you know.
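The cmd exit code on the last line of the log is a signed 32-bit value. Reinterpreted as unsigned, -1073741819 is 0xC0000005, the Windows NTSTATUS code for an access violation, which matches the "Windows fatal exception: access violation" message. A minimal decoding sketch follows; the lookup table is a small, partial selection chosen for illustration.

```python
# Partial, illustrative table of well-known Windows NTSTATUS crash codes.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def decode_exit_code(code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned NTSTATUS value."""
    unsigned = code & 0xFFFFFFFF
    name = NTSTATUS_NAMES.get(unsigned, "unknown status")
    return f"0x{unsigned:08X} ({name})"


print(decode_exit_code(-1073741819))  # 0xC0000005 (STATUS_ACCESS_VIOLATION)
```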

@rgommers
Contributor

Argh. Cc @mckib2, @mdhaber. Maybe one of you has seen this before or has an idea?

@mdhaber

mdhaber commented Dec 12, 2021

I don't remember segfaults while working with PROPACK. Do you, @mckib2?
I'll check the development issues/PRs though.
Update: we did get segfaults on Windows during development, but we thought they were fixed. I'm not sure if this is the same kind. We'll take a closer look.

@mckib2

mckib2 commented Dec 12, 2021

No, these segfaults look like a new development to me. I'm hobbled on Windows right now (I can't figure out how to build scipy), but I'll try again this evening to see if I can move the needle. Are there any known updates to the existing Windows build guide that just haven't been written up yet?

@mckib2

mckib2 commented Dec 12, 2021

This seems very related: scipy/scipy#15108

These jobs are using conda-forge, which looks like it doesn't use OpenBLAS.

@h-vetinari
Member Author

OK, the bug seems to be related to blas/lapack, and more specifically to MKL. In particular, this combination

    libblas:                 3.9.0-12_win64_mkl        conda-forge
    libcblas:                3.9.0-12_win64_mkl        conda-forge
    liblapack:               3.9.0-12_win64_mkl        conda-forge
    mkl:                     2021.4.0-h0e2418a_729     conda-forge

failed, while the reference lapack works:

    libblas:                 3.9.0-5_hd5c7e75_netlib   conda-forge
    libcblas:                3.9.0-5_hd5c7e75_netlib   conda-forge
    liblapack:               3.9.0-5_hd5c7e75_netlib   conda-forge

I'll check if the issue also appears with openblas, but since MKL is the default in conda-forge for windows, this definitely needs fixing, IMO.
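The listings above show that conda-forge encodes the BLAS implementation as the suffix of the libblas/libcblas/liblapack build string (`12_win64_mkl` vs. `5_hd5c7e75_netlib`), which is also what a `libblas=*=*openblas` selector matches on. A small sketch that pulls the flavour out of lines in the `name: version-build channel` format shown above; the function name is hypothetical, and it deliberately skips packages like `mkl` whose build strings carry no flavour.

```python
# Metapackages whose build string ends in the BLAS implementation name.
BLAS_METAPACKAGES = {"libblas", "libcblas", "liblapack"}


def blas_flavours(lines):
    """Map libblas/libcblas/liblapack to the flavour suffix of their build string.

    Expects lines shaped like 'libblas:  3.9.0-12_win64_mkl  conda-forge'.
    """
    flavours = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        name = fields[0].rstrip(":")
        if name not in BLAS_METAPACKAGES:
            continue  # e.g. the `mkl` package itself has an unrelated build string
        build = fields[1].partition("-")[2]  # '3.9.0-12_win64_mkl' -> '12_win64_mkl'
        flavours[name] = build.rsplit("_", 1)[-1]
    return flavours


if __name__ == "__main__":
    failing = [
        "libblas:                 3.9.0-12_win64_mkl        conda-forge",
        "liblapack:               3.9.0-12_win64_mkl        conda-forge",
        "mkl:                     2021.4.0-h0e2418a_729     conda-forge",
    ]
    print(blas_flavours(failing))  # {'libblas': 'mkl', 'liblapack': 'mkl'}
```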

@h-vetinari
Member Author

The segfault also appears with openblas, but - interestingly - not with blis.

@mckib2

mckib2 commented Dec 15, 2021

From the CI failures, it looks like only builds on Windows fail? I've been at it for hours again and still haven't been able to resurrect my Windows build, so I'm not sure what kind of help I can be until I get this sorted on my end.

@h-vetinari
Member Author

> From the CI failures, it looks like only builds on Windows fail?

Yes, it's only windows. Even more specifically: only with the OpenBLAS or MKL flavours of blas/lapack; Netlib and BLIS don't run into the same issues for some reason.

@h-vetinari
Member Author

New segfault on osx-64, but only for Python 3.10. I haven't verified everything that changed in the last 2 days, but at least pythran 0.11 was released, which might be a hint?

sparse/linalg/_eigen/tests/test_svds.py::Test_SVDS_LOBPCG::test_svd_simple[asarray-False-True-4-A0] SKIPPED [ 62%]
sparse/linalg/_eigen/tests/test_svds.py::Test_SVDS_LOBPCG::test_svd_simple[asarray-False-True-4-A1] SKIPPED [ 62%]
sparse/linalg/_eigen/tests/test_svds.py::Test_SVDS_LOBPCG::test_svd_simple[asarray-False-False-1-A0] PASSED [ 62%]
Fatal Python error: Segmentation fault

Current thread 0x000000010b4bfdc0 (most recent call first):
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/scipy/linalg/_decomp.py", line 547 in eigh
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/scipy/sparse/linalg/_eigen/lobpcg/lobpcg.py", line 361 in lobpcg
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/scipy/sparse/linalg/_eigen/_svds.py", line 309 in svds
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/scipy/sparse/linalg/_eigen/tests/test_svds.py", line 497 in test_svd_simple
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/_pytest/python.py", line 183 in pytest_pyfunc_call
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/_pytest/python.py", line 1641 in runtest
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/_pytest/runner.py", line 162 in pytest_runtest_call
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/_pytest/runner.py", line 255 in <lambda>
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/_pytest/runner.py", line 311 in from_call
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/_pytest/runner.py", line 254 in call_runtest_hook
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/_pytest/runner.py", line 215 in call_and_report
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/_pytest/runner.py", line 126 in runtestprotocol
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/_pytest/runner.py", line 109 in pytest_runtest_protocol
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/_pytest/main.py", line 348 in pytest_runtestloop
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/_pytest/config/__init__.py", line 162 in main
  File "/Users/runner/miniforge3/conda-bld/[...]/lib/python3.10/site-packages/scipy/_lib/_testutils.py", line 69 in __call__
  File "<string>", line 1 in <module>

CC @serge-sans-paille

@rgommers
Contributor

There is no Pythran usage in sparse.linalg. Also, we haven't enabled Pythran 0.11 yet in the 1.8.x release branch, because it was released after we created the branch; the current constraint is >=0.9.12,<0.11. We may bump it now.
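To make the point about the pin concrete: a release-branch constraint of >=0.9.12,<0.11 excludes 0.11.0, so Pythran 0.11 could not have been pulled into these builds through that constraint. A simplified sketch of the range check, assuming purely numeric version strings; it is not conda's or PEP 440's full matching logic (no pre-releases, local tags, or wildcards).

```python
def parse_version(s):
    """'0.9.12' -> (0, 9, 12); trailing zeros dropped so '0.11' == '0.11.0'.

    Simplified: numeric release segments only, no pre-releases or local tags.
    """
    parts = [int(p) for p in s.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def in_range(version, lower, upper):
    """True if lower <= version < upper, i.e. the constraint '>=lower,<upper'."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


if __name__ == "__main__":
    # The 1.8.x branch constraint quoted above: >=0.9.12,<0.11
    for candidate in ["0.9.12", "0.10.0", "0.11.0"]:
        print(candidate, in_range(candidate, "0.9.12", "0.11"))
    # prints: 0.9.12 True / 0.10.0 True / 0.11.0 False
```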

@h-vetinari
Member Author

@martin-frbg, maybe you could opine here? There's a segfault in the new scipy release with openblas (also happens for MKL), but not with netlib.

The segfault occurs in the test suite, but the following is enough to trigger what is very likely the same problem on my Windows machine (using the artefacts from this PR**; adapted from here):

import numpy as np
from scipy.sparse.linalg import svds

a = np.array([[1, 2, 4], [4, 5, 4], [4, 4, 1]], dtype=np.complex64)
result = svds(a, k=1, solver='propack')

** to use the build from this PR, download the artefact, unzip it, then unzip it again, and once you have a folder containing channeldata.json, do:

conda create -n test_env -c "path/to/unpacked/folder" -c conda-forge scipy libblas=*=*openblas
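When iterating on a crash like this, it can help to run the reproducer in a child interpreter with faulthandler enabled, so the segfault only kills the child and its Python-level traceback is captured from stderr. A minimal sketch, assuming the test environment above is active; `run_isolated` is a hypothetical helper name, and the crash return codes mentioned in the comments are typical values, not guaranteed ones.

```python
import subprocess
import sys

# The reproducer from above, run in a separate interpreter.
REPRO = """
import numpy as np
from scipy.sparse.linalg import svds
a = np.array([[1, 2, 4], [4, 5, 4], [4, 4, 1]], dtype=np.complex64)
svds(a, k=1, solver='propack')
"""


def run_isolated(code, timeout=600):
    """Run `code` in a fresh interpreter with faulthandler on; return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    rc, err = run_isolated(REPRO)
    # Nonzero means the child raised or crashed; a segfault typically shows up as
    # 3221225477 (0xC0000005) on Windows or -11 (SIGSEGV) on Linux/macOS.
    print("return code:", rc)
    print(err)  # faulthandler writes its traceback to stderr
```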

@WarrenWeckesser

> From the CI failures, it looks like only builds on Windows fail?
>
> Yes, it's only windows.

At this point, we don't know if it is the same issue as the crashes encountered here, but the segfault that I reported in scipy/scipy#15108 is on Linux. I get the segfault when I build SciPy with conda-installed numpy (so it gets MKL), but not when I pip-install everything (which uses OpenBLAS).

In my few experiments, the crashes only occurred with complex input. If that is the case in general, a pragmatic work-around might be to disallow complex input when the PROPACK method is used in SciPy 1.8.0, and see if the problem with complex input can be fixed for 1.8.1.
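The proposed stopgap would amount to an input check before dispatching to PROPACK. A minimal sketch of such a guard, assuming NumPy is available; `check_propack_input` is a hypothetical name, and this is not SciPy's actual code or the fix that was eventually adopted.

```python
import numpy as np


def check_propack_input(A, solver):
    """Hypothetical guard (not SciPy's code): refuse complex input for solver='propack'."""
    # np.iscomplexobj inspects A.dtype, so it also works for objects like sparse matrices.
    if solver == "propack" and np.iscomplexobj(A):
        raise ValueError(
            "complex input is not supported with solver='propack' (temporary workaround)"
        )
    return A


if __name__ == "__main__":
    a = np.array([[1, 2, 4], [4, 5, 4], [4, 4, 1]], dtype=np.complex64)
    try:
        check_propack_input(a, solver="propack")
    except ValueError as exc:
        print("rejected:", exc)
    check_propack_input(a.real, solver="propack")  # real (float32) input passes through
```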

@martin-frbg

Any chance to see a backtrace (faulthandler.py or whatever)? I am not too keen to install the whole thing, conda and all, on Windows right now.

@tylerjereddy

One thing I'd appreciate here is a quick ping letting me know if, in your opinion, I should delay the release of the next SciPy release candidate (scheduled Monday December 20th, tentatively) based on the observations here? Or if we should continue approximately on that schedule but likely consider an rc3.

@h-vetinari
Member Author

> Any chance to see a backtrace (faulthandler.py or whatever)

Not much info I'm afraid...

>python -q -X faulthandler
>>> import numpy as np
>>> from scipy.sparse.linalg import svds
>>> a = np.array([[1, 2, 4], [4, 5, 4], [4, 4, 1]], dtype=np.complex64)
>>> result = svds(a, k=1, solver='propack')
Windows fatal exception: access violation

Current thread 0x000013e4 (most recent call first):
  File "C:\Users\[xxx]\.conda\envs\test\lib\site-packages\scipy\sparse\linalg\_svdp.py", line 307 in _svdp
  File "C:\Users\[xxx]\.conda\envs\test\lib\site-packages\scipy\sparse\linalg\_eigen\_svds.py", line 316 in svds
  File "<stdin>", line 1 in <module>

@mckib2

mckib2 commented Dec 19, 2021

> One thing I'd appreciate here is a quick ping letting me know if, in your opinion, I should delay the release of the next SciPy release candidate (scheduled Monday December 20th, tentatively) based on the observations here? Or if we should continue approximately on that schedule but likely consider an rc3.

Going to spin up a VM, try to replicate the issue, and get a backtrace. I'll let you know what my opinion is, hopefully by the end of tonight.

@h-vetinari
Member Author

I tried running this with the Propack PR mentioned in scipy/scipy#15108, but the segfault persists.

@h-vetinari
Member Author

> One thing I'd appreciate here is a quick ping letting me know if, in your opinion, I should delay the release of the next SciPy release candidate (scheduled Monday December 20th, tentatively) based on the observations here? Or if we should continue approximately on that schedule but likely consider an rc3.

Currently, the status is "unreleasable" from the conda-forge side. Unfortunately, it's still unclear what's causing the issue, or what the fix would be. However, it's not like the scipy release needs to wait for conda-forge.

More specifically to your question: I think more RCs are good, i.e. release rc2 soon and keep rc3 as an option. I see some PROPACK changes in the rc2 PR, and as soon as there's a tag, I'll get it tested. Bringing in the accumulated changes will hopefully help chip away at the problem.

@mckib2

mckib2 commented Dec 19, 2021

Assuming this is an MKL issue and not Windows error, I left an update here on current progress at resolving these segfaults. What I've found is that all the changes I've tested in scipy/PROPACK#1 are good (I want to test all of them with something other than MKL and get the present issues resolved before I approve and merge), and a simple macro definition seems to solve the complex64 issues completely. I have not attempted a Windows build again, but I will put up a PR with my edits in case someone wants to check whether the fixes hold there as well.

@h-vetinari
Member Author

> Assuming this is an MKL issue and not Windows error

So far it only appears on Windows (and not just with MKL, but also with openblas), so I think it is definitely at least partly a Windows issue. In any case, I've now also added testing of linux/osx against the various blas flavours; then we'll see if it appears for MKL on other platforms as well.
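Testing each platform against each BLAS flavour yields a small grid of jobs. The sketch below enumerates that grid and builds the per-flavour environment command, reusing the `libblas=*=*<flavour>` selector and the local-channel placeholder from the reproducer instructions earlier in the thread. The platform/flavour lists and the `env_command` helper are illustrative assumptions, not the feedstock's actual CI configuration; in POSIX shells, quote the `libblas=*=*...` spec to be safe from glob expansion.

```python
from itertools import product

# Hypothetical CI grid: x64 platforms times the BLAS flavours discussed in this thread.
PLATFORMS = ["linux-64", "osx-64", "win-64"]
FLAVOURS = ["blis", "mkl", "netlib", "openblas"]
JOBS = list(product(PLATFORMS, FLAVOURS))


def env_command(flavour, channel="path/to/unpacked/folder"):
    """conda command pinning the BLAS flavour via the libblas build-string suffix."""
    return (
        f'conda create -n test_{flavour} -c "{channel}" -c conda-forge '
        f"scipy libblas=*=*{flavour}"
    )


if __name__ == "__main__":
    for platform, flavour in JOBS:
        print(f"[{platform}] {env_command(flavour)}")
```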

Given that you mention complex128 problems in the upstream issue, it could be that something about that data-type isn't working correctly on windows?

@mckib2

mckib2 commented Dec 19, 2021

> Given that you mention complex128 problems in the upstream issue, it could be that something about that data-type isn't working correctly on windows?

I was doing all testing on Linux, so if complex128 data types are an issue, it's a cross-platform one.

@h-vetinari
Member Author

h-vetinari commented Dec 19, 2021

So MKL also segfaults on osx, but not on linux, where it's "merely" a single test failure. The matrix (for x64 only) is:

| blas flavour | blis | mkl | netlib | openblas | comment |
|---|---|---|---|---|---|
| linux | ✔️ | one test failure | ✔️ | ✔️ | one test failure: test_x0_equals_Mb[bicgstab]; fails in the presence of AVX512, passes for non-AVX512 |
| osx | 3 test failures | ❌❌ | ✔️ | ✔️ | segfaults for mkl; 3 test failures for blis |
| windows | ✔️ | ❌❌ | ✔️ | ❌❌ | segfaults for mkl & openblas |

@martin-frbg

> Any chance to see a backtrace (faulthandler.py or whatever)
>
> Not much info I'm afraid...

Weird. I would have hoped to see which BLAS or LAPACK function is failing. Right now it is not even certain that MKL and OpenBLAS hit the same problem. What I can say for OpenBLAS is that 0.3.18 has a potential out-of-bounds access bug in ?TRSV, which could be on the code path of the SVD algorithm you (or Eigen) use; I am about to release 0.3.19.

@rgommers
Contributor

rgommers commented Feb 4, 2022

thanks @h-vetinari!

@tylerjereddy

fyi, I pushed up the v1.8.0 tag on our end; working through release process..

h-vetinari changed the title from "WIP: build 1.8.0rc's" to "build 1.8.0" on Feb 6, 2022
h-vetinari marked this pull request as ready for review on February 6, 2022 01:26
@h-vetinari
Member Author

> fyi, I pushed up the v1.8.0 tag on our end; working through release process..

I'm ready to push the merge button any time. From then on, it takes about 3h until the packages become installable.

@h-vetinari
Member Author

OK, given that I see the wheels on the GH release page, I think there's no need to wait anymore.

h-vetinari merged commit 152b1db into conda-forge:master on Feb 6, 2022
h-vetinari deleted the 1.8 branch on February 6, 2022 02:47