Description
For the last few days I have been working on and off towards fixing the GPU unit tests. Since they were last maintained, a wide variety of failures have developed across the codebase.
The current state of the work is that I have tracked all the unit test failures down to the anufft (adjoint) in single precision.
The following patch will restore all the unit tests at the expense of invoking the anufft (used in evaluate) in doubles. Well, all the unit tests except the 32-bit cufinufft adjoint. So that is good.
The simulated experimental pipeline seemed to be working okay (in both singles and doubles), so perhaps this only affects some small problems in singles. Not sure yet.
I am narrowing this down to a small standalone problem, which hopefully we can tackle to fix everything up.
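As a rough idea of what such a standalone problem could look like, here is a NumPy-only sketch that evaluates a slow, direct 1D adjoint in both single and double precision and reports the relative difference. It is not cufinufft or ASPIRE code: the names `direct_adjoint_1d` and `relative_error` are hypothetical, and the sign of the exponent and the `[-pi, pi)` range for the points are assumptions that would need to match whatever convention the GPU plan uses.

```python
import numpy as np


def direct_adjoint_1d(fourier_pts, signal, n, dtype=np.float64):
    """Slow O(n * M) direct adjoint, evaluated in the requested real precision.

    Assumptions (not taken from cufinufft): points lie in [-pi, pi) and the
    exponent uses a +i sign. Flip the sign if the library under test differs.
    """
    real = np.dtype(dtype)
    cplx = np.complex64 if real == np.float32 else np.complex128

    pts = np.asarray(fourier_pts, dtype=real)
    coef = np.asarray(signal, dtype=cplx)

    # Centered integer frequencies, e.g. -8..7 for n = 16.
    k = np.arange(-(n // 2), (n + 1) // 2).astype(real)

    # (n, M) phase matrix, computed in the requested precision.
    phase = np.outer(k, pts)
    kernel = np.exp(1j * phase).astype(cplx)

    return kernel @ coef


def relative_error(approx, reference):
    """Relative 2-norm error of `approx` against `reference`, in doubles."""
    approx = np.asarray(approx, dtype=np.complex128)
    reference = np.asarray(reference, dtype=np.complex128)
    return np.linalg.norm(approx - reference) / np.linalg.norm(reference)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_pts, n = 200, 16
    pts = rng.uniform(-np.pi, np.pi, num_pts)
    sig = rng.standard_normal(num_pts) + 1j * rng.standard_normal(num_pts)

    ref = direct_adjoint_1d(pts, sig, n, dtype=np.float64)
    single = direct_adjoint_1d(pts, sig, n, dtype=np.float32)
    print("single vs double relative error:", relative_error(single, ref))
```

The double-precision result can serve as a reference for the GPU adjoint on the same small inputs. If the single-precision GPU result differs from it by much more than this CPU single-precision result does, that points at the plan rather than at ordinary rounding.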
(gpu_utests) [gbwright@caf ASPIRE-Python]$ git diff develop | cat
diff --git a/setup.py b/setup.py
index b1eb913..b5533f8 100644
--- a/setup.py
+++ b/setup.py
@@ -47,7 +47,7 @@ setup(
     #   for example gpu packages which may not install for all users,
     #   or developer tools that are handy but not required for users.
     extras_require={
-        "gpu": ["pycuda", "cupy", "cufinufft==1.2"],
+        "gpu_102": ["pycuda", "cupy-cuda102", "cufinufft==1.2"],
         "dev": [
             "black",
             "bumpversion",
diff --git a/src/aspire/nufft/cufinufft.py b/src/aspire/nufft/cufinufft.py
index c2e83dd..56985ab 100644
--- a/src/aspire/nufft/cufinufft.py
+++ b/src/aspire/nufft/cufinufft.py
@@ -72,16 +72,20 @@ class CufinufftPlan(Plan):
             self.ntransforms,
             self.epsilon,
             1,
-            dtype=self.dtype,
+            dtype=np.float64,
             **self.adjoint_opts,
         )
 
         # Note, I store self.fourier_pts_gpu so the GPUArrray life
         #   is tied to instance, instead of this method.
         self.fourier_pts_gpu = gpuarray.to_gpu(self.fourier_pts)
-
         self._transform_plan.set_pts(*self.fourier_pts_gpu)
-        self._adjoint_plan.set_pts(*self.fourier_pts_gpu)
+
+        self.afourier_pts_gpu = self.fourier_pts_gpu
+        if self.dtype == np.float32:
+            self.afourier_pts_gpu = gpuarray.to_gpu(self.fourier_pts.astype(np.float64))
+
+        self._adjoint_plan.set_pts(*self.afourier_pts_gpu)
 
     def transform(self, signal):
         """
@@ -162,11 +166,14 @@ class CufinufftPlan(Plan):
             ), "For multiple transforms, signal stack length should match ntransforms {self.ntransforms}."
             res_shape = (self.ntransforms, *self.sz)
 
         signal_gpu = gpuarray.to_gpu(
-            np.ascontiguousarray(signal, dtype=self.complex_dtype)
+            np.ascontiguousarray(signal, dtype=np.complex128)
         )
 
-        result_gpu = gpuarray.GPUArray(res_shape, dtype=self.complex_dtype)
+        result_gpu = gpuarray.GPUArray(res_shape, dtype=np.complex128)
 
         self._adjoint_plan.execute(signal_gpu, result_gpu)