# About

Code snippet for finding the closest image dimensions for which cuFFT uses its faster Cooley-Tukey implementation (as opposed to Bluestein's algorithm).
See https://docs.nvidia.com/cuda/cufft/index.html

Quote from there:
_Algorithms highly optimized for input sizes that can be written in the form $2^a \times 3^b \times 5^c \times 7^d$. In general the smaller the prime factor, the better the performance, i.e., powers of two are fastest._
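
To make the quoted form concrete, here are two sizes factorized by hand (the numbers are illustrative and match the values used in the examples further down):

$$120 = 2^3 \times 3^1 \times 5^1 \times 7^0, \qquad 123 = 3 \times 41$$

The first size fits the form with $(a, b, c, d) = (3, 1, 1, 0)$. The second contains the prime factor 41, so by the quoted criterion it falls outside the optimized sizes.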

# Implementation

In [1]:
from sympy.ntheory import factorint
import warnings
import numpy as np

def is_optimal_for_cuFFT(n: int, allowed_factors) -> bool:
    factorization = factorint(n)
    if len(factorization) == 0: # factorint(1) returns an empty dict
        return False
    factors = set(factorization.keys())
    return factors.issubset(set(allowed_factors))
    
def _closest_optimal(n: int, search_next_largest: bool, allowed_factors) -> int:
    while n >= 1 and not is_optimal_for_cuFFT(n, allowed_factors):
        if search_next_largest:
            n += 1
        else:
            n -= 1
    # edge case: the search stopped below the smallest allowed factor,
    # e.g. a decreasing search that started from a value smaller than that factor
    if n < min(allowed_factors):
        warnings.warn(
            "One provided dimension is smaller than the smallest allowed factor and the search "
            f"direction is decreasing; returning {min(allowed_factors)}."
        )
        return min(allowed_factors)
    return n

def closest_optimal(n, search_next_largest: bool=True, allowed_factors=(2,3,5,7)):
    """ Finds closest optimal array dimensions for cuFFT
    
    Parameters
    ----------
    n : iterable of integers
        Input dimensions
    search_next_largest : bool
        If True (default), search for the closest optimal dimensions that are larger than or
        equal to the original; otherwise search for ones that are smaller than or equal to it.
    allowed_factors : tuple of integers
        Allowed factors in the decomposition. Defaults to (2,3,5,7), which are the factors
        listed in the cuFFT documentation.
    
    Returns
    -------
    np.array of ints
        optimal dimensions for cuFFT
        
        
    See also
    --------
    https://docs.nvidia.com/cuda/cufft/index.html
    
    """
    n = np.asarray(n)
    scalar_input = False
    if n.ndim == 0:
        n = n[None] 
        scalar_input = True
    ret = np.array([_closest_optimal(ni, search_next_largest, allowed_factors) for ni in n])
    if scalar_input:
        return np.squeeze(ret)
    return ret

# Examples

In [2]:
# Simple case, single number
closest_optimal(123)

array(125)

In [9]:
# find a smaller optimal dimension
closest_optimal(123, search_next_largest=False)

array(120)

In [10]:
# don't allow all factors
closest_optimal(123, search_next_largest=False, allowed_factors=(2,3))

array(108)

In [11]:
# only allow a single factor
# use a comma to make it a tuple, otherwise it will throw an error!
closest_optimal(123, search_next_largest=False, allowed_factors=(2,))

array(64)

In [12]:
# apply to multiple dimensions
closest_optimal((123, 23, 615))

array([125,  24, 625])

In [13]:
# edge case, one dimension smaller than smallest factor and decreasing search should generate a warning
closest_optimal((1, 23, 615), search_next_largest=False)



array([  2,  21, 600])

In [14]:
# one dimension smaller than smallest factor and increasing search should not generate a warning
closest_optimal((1, 23, 615))

array([  2,  24, 625])

# Todo

* Could allow `search_next_largest` to be an iterable of bools, to apply a different strategy (rounding up or rounding down) to each dimension.
* Could remove the `sympy` dependency by implementing recursive modulo tests as in `notGoodDimension` from https://github.com/dmilkie/cudaDecon/blob/master/RL-Biggs-Andrews.cpp. However, I find the explicit factorization more readable than the recursion.
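
Both items above can be sketched together in plain Python. This is only a rough illustration, not a port of `notGoodDimension`: the names `has_only_allowed_factors` and `closest_optimal_per_dim` are hypothetical, the check uses iterative division rather than recursion, and it assumes that all entries of `allowed_factors` are primes greater than or equal to 2. Unlike `closest_optimal`, it returns a plain list and does not emit a warning.

```python
from typing import Iterable, List, Sequence


def has_only_allowed_factors(n: int, allowed_factors: Sequence[int] = (2, 3, 5, 7)) -> bool:
    """Return True if n >= 2 and n has no prime factor outside allowed_factors.

    Assumes every entry of allowed_factors is a prime >= 2.
    """
    if n < 2:  # mirror is_optimal_for_cuFFT: 1 is not considered optimal
        return False
    for f in allowed_factors:
        while n % f == 0:  # strip every occurrence of this factor
            n //= f
    return n == 1  # anything left over is a disallowed prime factor


def closest_optimal_per_dim(
    dims: Iterable[int],
    search_next_largest: Iterable[bool],
    allowed_factors: Sequence[int] = (2, 3, 5, 7),
) -> List[int]:
    """Round each dimension up or down according to its own flag."""
    dims = list(dims)
    directions = list(search_next_largest)
    if len(dims) != len(directions):
        raise ValueError("need exactly one search direction per dimension")

    results = []
    for n, go_up in zip(dims, directions):
        step = 1 if go_up else -1
        while n >= 1 and not has_only_allowed_factors(n, allowed_factors):
            n += step
        # a decreasing search may run below the smallest factor; clamp it
        results.append(max(n, min(allowed_factors)))
    return results


# round the first and last dimension up, the middle one down
print(closest_optimal_per_dim((123, 23, 615), (True, False, True)))
```

The division-based check keeps the loop structure of `_closest_optimal` unchanged, so swapping it in for `is_optimal_for_cuFFT` would be the only change needed to drop `sympy`.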