
Conversation

@garrettwrong
Collaborator

For testing purposes.

@garrettwrong garrettwrong added the extern (Relating to external changes) label Aug 21, 2024
@garrettwrong garrettwrong self-assigned this Aug 21, 2024
@garrettwrong
Collaborator Author

Initial findings:

The osx-arm runner is reporting Fatal Python error: Illegal instruction. Likely something is wrong with that wheel. I can test manually on an M1 in both arm and intel conda installs when I'm home to confirm.

The intel osx wheel appears not to crash, but required loosening the epsilon tolerance check for FLE from 1 * epsilon to 1.5 * epsilon.

The Linux wheel was fine on caf, and actually did not require changing the eps tolerance. This suggests the accuracy issue may be specific to OSX. I'll resubmit with the eps change only applied to darwin.
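For reference, a minimal sketch of what the darwin-only tolerance loosening could look like (the helper name is illustrative, not the actual ASPIRE test code):

import sys

def utest_tolerance(eps):
    # Loosen the FLE accuracy check on macOS only; keep it tight elsewhere.
    return 1.5 * eps if sys.platform == "darwin" else eps

print(utest_tolerance(1e-8))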

@garrettwrong
Collaborator Author

Going to mark ready for review so the CI jobs run. No intention of merging at this time.

@garrettwrong garrettwrong marked this pull request as ready for review August 21, 2024 17:24
@garrettwrong garrettwrong requested a review from janden as a code owner August 21, 2024 17:24
@garrettwrong
Collaborator Author

I'm seeing weird test failures related to basic logging failing to be emitted/captured on Linux and Mac. You can see below that the captured log is empty. I've never seen this in our suite, and the failures go away if I back down to 2.2.0 (or rerun the test in isolation).

FAILED tests/test_utils.py::test_log_filter_by_count - AssertionError: assert 'A is for ASCII' in ''
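For context, the failing test follows the usual pytest caplog pattern; a simplified stand-in (not the actual test_log_filter_by_count) looks like:

import logging

def test_log_capture(caplog):
    # Emit a record and assert it shows up in the captured text.
    # With finufft 2.3.0rc1 installed, caplog.text came back empty.
    with caplog.at_level(logging.DEBUG):
        logging.getLogger(__name__).debug("A is for ASCII")
    assert "A is for ASCII" in caplog.text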

@garrettwrong
Collaborator Author

Minor wrinkle: my M1 Macs are all on some form of Ventura (13.x), but the ARM wheel only supports 14+. I'm going to upgrade one and try again.

Might consider supporting older versions, as I think they are more common than 14 at this time...

@garrettwrong
Collaborator Author

Upgraded to the new OS (14.6.1)... In brief, it breaks Python 3.8; numpy/scipy have BLAS/LAPACK issues on this newer OS. A bummer (not related to finufft, but a general problem with this combo)...

Tried 3.9. Initially hit problems with pyshtools. I should drop that package; it's not maintained well enough for what we need... my mistake for bringing it in. I'll make an issue (we've already discussed it, will just reprioritize refactoring the sph_harm calls in FB). Removed pyshtools and associated code; everything else seems fine for Python 3.9 on M1.

Unfortunately this has not reproduced the specific issue on the osx-arm runner, which appears to use a different OS minor version than the one my machine upgraded to... I'll need to look into that further.

@lu1and10

@garrettwrong I see the osx-arm CI failing because it picks up the x86 wheel, not the arm wheel. Any idea why? On my local M1 or M2, it picks the correct arm wheel from https://pypi.org/project/finufft/2.3.0rc1/#files
[Screenshot: pip selecting the x86_64 finufft wheel on the osx-arm runner]
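A quick way to check which architecture the runner's Python reports (and hence which wheel tags pip will accept):

import platform
import sysconfig

# Under an x86_64 conda install on Apple Silicon (Rosetta), these report
# x86_64; under a native arm install, arm64.
print(platform.machine())
print(sysconfig.get_platform())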

@lu1and10

lu1and10 commented Aug 22, 2024

@garrettwrong It seems the old successful osx-arm CI runs also picked the x86 wheels for all the packages, but they worked. And the conda env is set up with architecture: x64. I'm a bit confused why the CI runs on an arm image but the conda env is set to x64.
[Screenshot: CI setup log showing the conda env configured with architecture: x64]

@garrettwrong
Collaborator Author

Hi Libin, yes, that is what we've recommended users run, since more packages are supported via x86 conda. My expectation is that it should work... (as it has been).

On my personal M1 machine I am running conda using the ARM install. I ran into issues using Python 3.8 after upgrading my OS to use the FI arm wheel. 3.9 was okay.

@lu1and10

lu1and10 commented Aug 22, 2024

> Hi Libin, yes, that is what we've recommended users run, since more packages are supported via x86 conda. My expectation is that it should work... (as it has been).

It seems to be failing with the new wheel. If an x86_64 wheel, which contains x86_64 instructions, can run, does that mean the conda env is running under some x86_64 virtual machine on the arm GitHub-hosted machine? The illegal instruction error in the CI seems to mean the arm machine is trying to run x86_64 instructions from the x86 wheel?

> On my personal M1 machine I am running conda using the ARM install. I ran into issues using Python 3.8 after upgrading my OS to use the FI arm wheel. 3.9 was okay.

I'm able to run Python 3.8, 3.9, 3.11, and 3.12 on my local M1 machine. What is the error for your 3.8?

@garrettwrong
Collaborator Author

>> Hi Libin, yes, that is what we've recommended users run, since more packages are supported via x86 conda. My expectation is that it should work... (as it has been).

> It seems to be failing with the new wheel. If an x86_64 wheel, which contains x86_64 instructions, can run, does that mean the conda env is running under some x86_64 virtual machine on the arm GitHub-hosted machine? The illegal instruction error in the CI seems to mean the arm machine is trying to run x86_64 instructions from the x86 wheel?

>> On my personal M1 machine I am running conda using the ARM install. I ran into issues using Python 3.8 after upgrading my OS to use the FI arm wheel. 3.9 was okay.

> I'm able to run Python 3.8, 3.9, 3.11, and 3.12 on my local M1 machine. What is the error for your 3.8?

Hi Libin, the issue with 3.8 on the latest OSX relates to some LAPACK calls in scipy. The issue is known and was fixed in scipy 1.13, but that is not available for Python 3.8. It is not related to FINUFFT; just an unfortunate casualty of the ARM wheel not supporting the OS I had available... It's not something that I need help with, but the wheel probably should have supported the OS I had... (it's more common than the latest).

The x86_64 wheel should run on the osx_arm platform (or any M1 machine) using the x86_64-based conda install. An arm wheel should run on osx_arm (or any M1 machine) using an arm-based conda install.

Currently we only automatically check the former in our CI. Until recently our code depended on (py)FFTW, which was causing a lot of trouble on M1 native arm installs... trouble like illegal instructions. This is mostly because they just don't support that platform, but it was not a problem under x86 using the x86 installs. To use M1 natively I had to alter sources and manually build FFTW and pyFFTW, etc. Further, they were not the only packages to have issues on native arm, so the safer route was to recommend and test x86_64.

I'd like to support and put both under test, but we'll need to remove at least one other problem package first. Thanks!

@lu1and10

> The x86_64 wheel should run on the osx_arm platform (or any M1 machine) using the x86_64-based conda install.

Do you know which CPU instruction sets the conda x86_64 env supports when it's the conda x86_64 install running on a Mac arm machine? By CPU instruction sets I mean Intel SSE, AVX, AVX2, AVX-512, etc.

@lu1and10

@garrettwrong I guess the conda x86 env on Mac arm doesn't support AVX2 instructions. That's why the illegal instruction error occurs.

I'm guessing that running lower-end x86 instructions through an emulator on a Mac arm machine will be slow compared to running everything with native arm instructions.

@garrettwrong
Collaborator Author

Correct, Rosetta does not provide AVX emulation. Is there not a way to provide an SSE fallback, or to just not use AVX in the OSX x86_64 code/wheels? I'm not convinced a minor speedup or slowdown on a Mac laptop is that important. All the prior releases were functioning for that platform combo...
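For what it's worth, on macOS you can check both whether the current process is translated by Rosetta and what the (emulated) CPU reports via sysctl; a small sketch:

import subprocess

def sysctl(name):
    # Query a macOS sysctl value; returns None if the key doesn't exist.
    r = subprocess.run(["sysctl", "-n", name], capture_output=True, text=True)
    return r.stdout.strip() if r.returncode == 0 else None

# 1 when running under Rosetta 2 translation, 0 when native.
print("translated:", sysctl("sysctl.proc_translated"))
# Under Rosetta, the emulated Intel feature list should not include AVX/AVX2.
print("features:", sysctl("machdep.cpu.features"))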

So the other main thing remaining to sort out is how this FINUFFT release is interacting with logging. I suspect it has something to do with the combo of multiprocessing/threading, but I haven't looked into it further yet. Were there any logging code changes?

@janden
Collaborator

janden commented Aug 27, 2024

We're recompiling the wheels without -march=native now. Somehow we missed that this was going on.

Regarding the logging, we're not sure what's happening there. There is no logging in the finufft package. Could it be that it triggers updates of some of the dependencies that may interfere with the logging?

@garrettwrong
Collaborator Author

> We're recompiling the wheels without -march=native now. Somehow we missed that this was going on.

Thanks.

Cool, I think that happened a few times before, with the x86_64 wheels getting compiled on a specific machine and then not working on our x86_64 Linux machine. I didn't think of it this time... the Linux wheels worked :).
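A cheap way to catch this class of problem early is a smoke test that imports the wheel in a subprocess, so an illegal instruction kills the child rather than the test runner; a rough sketch:

import subprocess
import sys

# Import the wheel in a child process; a bad wheel dies with SIGILL
# (negative returncode) instead of taking down the whole test session.
result = subprocess.run([sys.executable, "-c", "import finufft"])
if result.returncode < 0:
    print(f"import crashed with signal {-result.returncode}")
else:
    print("import OK")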

> Regarding the logging, we're not sure what's happening there. There is no logging in the finufft package. Could it be that it triggers updates of some of the dependencies that may interfere with the logging?

I'm not sure about that. There is definitely a warning-capture context (which is most likely not thread safe...). I think I might have put that in before because the last wheel was swallowing all the other packages' warnings/logs :D :D ... The context patch did work as expected when tested before, with no side effects then. Tomorrow I can try to check whether it is that warning context, but I'll need to build finufft locally to do so.
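(For reference, the kind of warning-capture context I mean is roughly this shape; illustrative only, not our actual code. catch_warnings swaps process-global filter state in and out, which is why it is not thread safe.)

import warnings

# warnings.catch_warnings() mutates process-global filter state, so
# concurrent threads can observe each other's filter changes.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    warnings.warn("suppressed inside the context", DeprecationWarning)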

Good idea. I checked, but it does not appear that any other packages change.

(finup) ➜  ~ pip freeze > f2220     
(finup) ➜  ~ pip freeze | grep finuf
finufft==2.2.0
(finup) ➜  ~ pip install finufft==2.3.0rc1
Collecting finufft==2.3.0rc1
  Using cached finufft-2.3.0rc1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.7 kB)
Requirement already satisfied: numpy>=1.12.0 in /scratch/gbwright/miniconda3/envs/finup/lib/python3.8/site-packages (from finufft==2.3.0rc1) (1.24.4)
Using cached finufft-2.3.0rc1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
Installing collected packages: finufft
  Attempting uninstall: finufft
    Found existing installation: finufft 2.2.0
    Uninstalling finufft-2.2.0:
      Successfully uninstalled finufft-2.2.0
Successfully installed finufft-2.3.0rc1
(finup) ➜  ~ pip freeze > f2230rc1        
(finup) ➜  ~ diff f2220 f2230rc1 
48c48
< finufft==2.2.0
---
> finufft==2.3.0rc1
(finup) ➜  ~ 

@garrettwrong
Collaborator Author

@janden After our discussion this morning, I found that commenting out the following lines resolves the logging issue in our pytest suite. Obviously this breaks finufft, but maybe it helps narrow things down.

diff --git a/python/finufft/finufft/_interfaces.py b/python/finufft/finufft/_interfaces.py
index 80b3b8b9..ec2edcb6 100644
--- a/python/finufft/finufft/_interfaces.py
+++ b/python/finufft/finufft/_interfaces.py
@@ -148,12 +148,12 @@ class Plan:
             self._execute = _finufft._execute
             self._destroy = _finufft._destroy
 
-        ier = self._makeplan(nufft_type, dim, n_modes, isign, n_trans, eps,
-                             byref(plan), opts)
+        # ier = self._makeplan(nufft_type, dim, n_modes, isign, n_trans, eps,
+        #                      byref(plan), opts)
 
-        # check error
-        if ier != 0:
-            err_handler(ier)
+        # # check error
+        # if ier != 0:
+        #     err_handler(ier)

@janden
Collaborator

janden commented Aug 29, 2024

Thanks for this. I'll take a closer look. Both sets of lines have to be commented out then? Very strange.

@garrettwrong
Collaborator Author

No, it appears to be the _makeplan line. One could alternatively set ier = 0.

@janden
Collaborator

janden commented Aug 29, 2024

Ok. I'll continue investigating.

@lu1and10

It seems the np.ctypeslib.load_library call in finufft 2.3.0rc1 resets the root log level to WARNING; I guess ASPIRE initializes the root log level to DEBUG.

This only happens for numpy < 1.25.0; for numpy >= 1.25.0 it's fine. But for Python 3.8, the latest supported numpy version is 1.24.4.

Tested with the following tweak in _finufft.py with numpy == 1.24.4, and it seems to work. Not sure whether finufft should include the tweak, since numpy >= 1.25.0 fixed np.ctypeslib.load_library so that it no longer resets the log level to WARNING.

import logging

import numpy as np

# Remember the root log level before attempting to load the library.
# (This snippet lives inside finufft's existing search over candidate
# library paths, which supplies lib_name and lib_paths; hence the break.)
log_level = logging.getLevelName(logging.root.level)
for path in lib_paths:
    try:
        print("log level before: " + log_level)
        lib = np.ctypeslib.load_library(lib_name, path)
        print("log level after: " + logging.getLevelName(logging.root.level))
        break
    except OSError:
        # Paranoid, in case lib is set to something and then an exception is thrown
        lib = None
# Restore the saved level in case numpy < 1.25 reset it to WARNING.
logging.root.setLevel(log_level)
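A minimal repro of the side effect, independent of finufft (the library name and path here are placeholders; on numpy < 1.25 the reset appears to come from machinery inside load_library itself, so it may show up even when the load fails):

import logging

import numpy as np

logging.basicConfig(level=logging.DEBUG)
print(logging.getLevelName(logging.root.level))  # DEBUG

try:
    np.ctypeslib.load_library("libfinufft", ".")  # placeholder name/path
except OSError:
    pass

# Prints WARNING on affected numpy versions, DEBUG on numpy >= 1.25.
print(logging.getLevelName(logging.root.level))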

janden added a commit to janden/finufft that referenced this pull request Sep 4, 2024
For older (<1.25) versions of NumPy, calling `np.ctypeslib.load_library`
results in the logging level being reset. This workaround checks for
that bug and preserves the logging level.

For more info, see discussion at

        ComputationalCryoEM/ASPIRE-Python#1168

Co-authored-by: Libin Lu <llu@flatironinstitute.org>
DiamonDinoia pushed a commit to flatironinstitute/finufft that referenced this pull request Sep 4, 2024
For older (<1.25) versions of NumPy, calling `np.ctypeslib.load_library`
results in the logging level being reset. This workaround checks for
that bug and preserves the logging level.

For more info, see discussion at

        ComputationalCryoEM/ASPIRE-Python#1168

Co-authored-by: Libin Lu <llu@flatironinstitute.org>
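The workaround described in the commit amounts to a save/restore around the library load; a minimal sketch (not the literal finufft patch):

import logging

import numpy as np

def load_library_preserving_log_level(lib_name, path):
    # Save the root logger level, load the library, then restore the
    # level in case numpy < 1.25 reset it to WARNING as a side effect.
    level = logging.root.level
    try:
        return np.ctypeslib.load_library(lib_name, path)
    finally:
        logging.root.setLevel(level)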
@garrettwrong
Collaborator Author

Cool. I'm wondering whether we still need the different tolerance for OSX now that the compile options were changed for those wheels. I'll check, but otherwise this is looking good. Thanks for the help!

@garrettwrong
Collaborator Author

All checks passed at 4962417.

Squashing all the rc and debug commits down to 3 main commits.

@garrettwrong
Collaborator Author

Merging 🎆 🚀
