Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to realise convolution? #518

Closed
wujilingfeng opened this issue Apr 27, 2022 · 13 comments
Closed

How to realise convolution? #518

wujilingfeng opened this issue Apr 27, 2022 · 13 comments

Comments

@wujilingfeng
Copy link

wujilingfeng commented Apr 27, 2022

Below is my code

gauss_convotion::(Int,Int,Acc (Array DIM2 Double))->(Int,Int,Acc (Array DIM2 Double))-> Acc (Array DIM2 Double)
gauss_convotion (m,n,gauss_op) (w,h,image)= generate (A.constant (Z :. h :. w)) (compute_re.compute_plat )
  where
    yuanx= A.floor  $  (A.fromIntegral  $  A.constant m ::Exp Double)/2.0 ::Exp Int
    yuany= A.floor  $  (A.fromIntegral  $  A.constant n ::Exp Double)/2.0 ::Exp Int

    compute_plat::Exp DIM2 ->Acc (Array DIM2 Double)
    compute_plat index= A.generate (A.constant (Z :. n :. m)) myfun   
      where 
        (Z :. y :. x)= unlift index:: (Z:. Exp Int :. Exp Int)
        myfun :: Exp DIM2-> Exp Double 
        myfun index1= image A.! (lift $ (Z :. ( A.min (A.constant h-1)$  A.max 0 (y+y1-yuany)  ) :. (A.min (A.constant w -1) $ A.max 0 (x+x1-yuanx))) ) 
          where 
            (Z :. y1 :. x1)=unlift index1
    compute_re:: Acc (Array DIM2 Double) ->Exp Double
    compute_re pla = A.the $A.foldAll (A.+) 0 (A.zipWith (A.*) gauss_op pla)

There throwed an error
Screenshot from 2022-04-27 09-48-59
Maybe you don't need to read my code, I just want to know how to realize convolution?
Thank you!

@tomsmeding
Copy link
Member

First let me explain the error. Accelerate has a very strict separation between array computations (Acc) and scalar-level computations (Exp). You cannot perform array computations within an Exp context. This is also explained under "Avoiding nested parallelism" in the documentation. In your program, you're entering the Exp context by writing generate (A.constant (Z :. h :. w)); the operation that you then perform in that context is compute_re . compute_plat, which does array operations. This is disallowed, hence the error.

As for how to implement convolution: if the kernel size (m,n in your code) is known at Haskell compile time, then a 2-dimensional stencil (for usage, see the docs for 1-dimensional stencils) is without doubt the most runtime-efficient way of implementing this. Stencils in general (see I guess this wikipedia article, but not a super clear source) are literally made for this. However, with Accelerate's built-in stencil operations, the size of the kernel influences the type of the stencil operation, meaning that you cannot just have a size in two Int values and do a stencil of that size. Arbitrary-size built-in stencil operations have been asked for earlier (#426); they just haven't been implemented yet.

@tomsmeding
Copy link
Member

tomsmeding commented Apr 27, 2022

About how to implement a convolution: I rewrote your code to the following, and this seems to work, but I don't really know much about Gaussian convolutions, so please test if this indeed does what you intended.

EDIT: as correctly observed by robbert-vdh below, if the kernel array is non-symmetric, this is technically cross-correlation, not convolution. (This is the same behaviour as the poster's original code.) Either reverse the kernel array in both dimensions, or replace ker{x,y} - {m,n} `A.div` 2 by {m,n} `A.div` 2 - ker{x,y}, to get convolution even for non-symmetric kernel arrays.

module Main where

import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Exp, Acc, Array, DIM2, DIM3, DIM4, Z(..), (:.)(..))
import qualified Data.Array.Accelerate.Interpreter as I


-- Note that I've removed the additional Int parameters; you can use `shape` to
-- request the size of an array.
gauss_convolution :: Acc (Array DIM2 Double) -> Acc (Array DIM2 Double) -> Acc (Array DIM2 Double)
gauss_convolution gauss_op image = compute_re gauss_op (compute_plat gauss_op image)

-- Add two dimensions on the bottom. This function replaces every cell in the
-- input image with a 2-dimensional array the size of gauss_op, thereby making
-- a 4-dimensional array.
compute_plat :: Acc (Array DIM2 Double) -> Acc (Array DIM2 Double) -> Acc (Array DIM4 Double)
compute_plat gauss_op image = A.generate (A.I4 h w n m) myfun
  where
    -- Size of the gaussian kernel. Use the pattern synonyms instead of `lift`
    -- and `unlift`; they are much nicer. Alternative notation here would be
    -- `Z_ ::. n ::. m`.
    A.I2 n m = A.shape gauss_op

    -- Size of the full image
    A.I2 h w = A.shape image

    -- Now this function is going to be called for the whole 4-dimensional
    -- array, which means that it gets the index in the image as the first two
    -- components and the index in the kernel as the last two components.
    --
    -- About yuan{x,y}:  `floor ((fromIntegral x :: Double) / 2)` is precisely
    --                   what `div` does. ;)
    --
    -- kery and kerx are what you called y1 and x1.
    myfun :: Exp DIM4 -> Exp Double
    myfun (A.I4 y x kery kerx) =
        image A.! A.I2 (A.min (h - 1) (A.max 0 (y + kery - n `A.div` 2)))
                       (A.min (w - 1) (A.max 0 (x + kerx - m `A.div` 2)))

-- Fold the 2-dimensional subarrays into a scalar again, bringing the
-- 4-dimensional array down to 2 dimensions again. The first argument is the
-- gaussian kernel.
compute_re :: Acc (Array DIM2 Double) -> Acc (Array DIM4 Double) -> Acc (Array DIM2 Double)
compute_re gauss_op pla =
    let A.I4 h w n m = A.shape pla
        -- Replicate the gaussian kernel over all positions in the image
        replicated_gauss_op :: Acc (Array DIM4 Double)
        replicated_gauss_op = A.replicate (A.I4 h w A.All_ A.All_) gauss_op
        -- Multiply the kernel elementwise with each neighbourhood in the image
        multiplied :: Acc (Array DIM4 Double)
        multiplied = A.zipWith (A.*) replicated_gauss_op pla
        -- Contract each neighbourhood (2-dimensional subarray) down to a
        -- 1-dimensional vector, so that we can fold it away in one go
        contracted :: Acc (Array DIM3 Double)
        contracted = A.reshape (A.I3 h w (n * m)) multiplied
    in -- Finally, sum this contracted vector to get the 2-dimensional result, the size of the original image.
       A.fold (A.+) 0 contracted

main :: IO ()
main = do
    let image = A.fromList (Z :. (7 :: Int) :. (8 :: Int))
                  [1, 0, 0, 0, 0, 0, 0, 1
                  ,0, 1, 0, 0, 0, 0, 1, 0
                  ,0, 0, 1, 0, 0, 1, 0, 0
                  ,0, 0, 0, 1, 1, 0, 0, 0
                  ,0, 0, 1, 0, 0, 1, 0, 0
                  ,0, 1, 0, 0, 0, 0, 1, 0
                  ,1, 0, 0, 0, 0, 0, 0, 1]
        gauss_op = A.fromList (Z :. (3 :: Int) :. (5 :: Int))
                     (map (/ 17)
                        [0, 0, 2, 0, 0
                        ,1, 3, 5, 3, 1
                        ,0, 0, 2, 0, 0])

    print $ I.runN gauss_convolution gauss_op image

@wujilingfeng
Copy link
Author

the

First let me explain the error. Accelerate has a very strict separation between array computations (Acc) and scalar-level computations (Exp). You cannot perform array computations within an Exp context. This is also explained under "Avoiding nested parallelism" in the documentation. In your program, you're entering the Exp context by writing generate (A.constant (Z :. h :. w)); the operation that you then perform in that context is compute_re . compute_plat, which does array operations. This is disallowed, hence the error.

As for how to implement convolution: if the kernel size (m,n in your code) is known at Haskell compile time, then a 2-dimensional stencil (for usage, see the docs for 1-dimensional stencils) is without doubt the most runtime-efficient way of implementing this. Stencils in general (see I guess this wikipedia article, but not a super clear source) are literally made for this. However, with Accelerate's built-in stencil operations, the size of the kernel influences the type of the stencil operation, meaning that you cannot just have a size in two Int values and do a stencil of that size. Arbitrary-size built-in stencil operations have been asked for earlier (#426); they just haven't been implemented yet.

Thanks, It's effective!

@wujilingfeng
Copy link
Author

About how to implement a convolution: I rewrote your code to the following, and this seems to work, but I don't really know much about Gaussian convolutions, so please test if this indeed does what you intended.

module Main where

import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Exp, Acc, Array, DIM2, DIM3, DIM4, Z(..), (:.)(..))
import qualified Data.Array.Accelerate.Interpreter as I


-- Note that I've removed the additional Int parameters; you can use `shape` to
-- request the size of an array.
gauss_convolution :: Acc (Array DIM2 Double) -> Acc (Array DIM2 Double) -> Acc (Array DIM2 Double)
gauss_convolution gauss_op image = compute_re gauss_op (compute_plat gauss_op image)

-- Add two dimensions on the bottom. This function replaces every cell in the
-- input image with a 2-dimensional array the size of gauss_op, thereby making
-- a 4-dimensional array.
compute_plat :: Acc (Array DIM2 Double) -> Acc (Array DIM2 Double) -> Acc (Array DIM4 Double)
compute_plat gauss_op image = A.generate (A.I4 h w n m) myfun
  where
    -- Size of the gaussian kernel. Use the pattern synonyms instead of `lift`
    -- and `unlift`; they are much nicer. Alternative notation here would be
    -- `Z_ ::. n ::. m`.
    A.I2 n m = A.shape gauss_op

    -- Size of the full image
    A.I2 h w = A.shape image

    -- Now this function is going to be called for the whole 4-dimensional
    -- array, which means that it gets the index in the image as the first two
    -- components and the index in the kernel as the last two components.
    --
    -- About yuan{x,y}:  `floor ((fromIntegral x :: Double) / 2)` is precisely
    --                   what `div` does. ;)
    --
    -- kery and kerx are what you called y1 and x1.
    myfun :: Exp DIM4 -> Exp Double
    myfun (A.I4 y x kery kerx) =
        image A.! A.I2 (A.min (h - 1) (A.max 0 (y + kery - n `A.div` 2)))
                       (A.min (w - 1) (A.max 0 (x + kerx - m `A.div` 2)))

-- Fold the 2-dimensional subarrays into a scalar again, bringing the
-- 4-dimensional array down to 2 dimensions again. The first argument is the
-- gaussian kernel.
compute_re :: Acc (Array DIM2 Double) -> Acc (Array DIM4 Double) -> Acc (Array DIM2 Double)
compute_re gauss_op pla =
    let A.I4 h w n m = A.shape pla
        -- Replicate the gaussian kernel over all positions in the image
        replicated_gauss_op :: Acc (Array DIM4 Double)
        replicated_gauss_op = A.replicate (A.I4 h w A.All_ A.All_) gauss_op
        -- Multiply the kernel elementwise with each neighbourhood in the image
        multiplied :: Acc (Array DIM4 Double)
        multiplied = A.zipWith (A.*) replicated_gauss_op pla
        -- Contract each neighbourhood (2-dimensional subarray) down to a
        -- 1-dimensional vector, so that we can fold it away in one go
        contracted :: Acc (Array DIM3 Double)
        contracted = A.reshape (A.I3 h w (n * m)) multiplied
    in -- Finally, sum this contracted vector to get the 2-dimensional result, the size of the original image.
       A.fold (A.+) 0 contracted

main :: IO ()
main = do
    let image = A.fromList (Z :. (7 :: Int) :. (8 :: Int))
                  [1, 0, 0, 0, 0, 0, 0, 1
                  ,0, 1, 0, 0, 0, 0, 1, 0
                  ,0, 0, 1, 0, 0, 1, 0, 0
                  ,0, 0, 0, 1, 1, 0, 0, 0
                  ,0, 0, 1, 0, 0, 1, 0, 0
                  ,0, 1, 0, 0, 0, 0, 1, 0
                  ,1, 0, 0, 0, 0, 0, 0, 1]
        gauss_op = A.fromList (Z :. (3 :: Int) :. (5 :: Int))
                     (map (/ 17)
                        [0, 0, 2, 0, 0
                        ,1, 3, 5, 3, 1
                        ,0, 0, 2, 0, 0])

    print $ I.runN gauss_convolution gauss_op image

WOW! Beautiful job! I need to retry it.

@wujilingfeng
Copy link
Author

About how to implement a convolution: I rewrote your code to the following, and this seems to work, but I don't really know much about Gaussian convolutions, so please test if this indeed does what you intended.

module Main where

import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Exp, Acc, Array, DIM2, DIM3, DIM4, Z(..), (:.)(..))
import qualified Data.Array.Accelerate.Interpreter as I


-- Note that I've removed the additional Int parameters; you can use `shape` to
-- request the size of an array.
gauss_convolution :: Acc (Array DIM2 Double) -> Acc (Array DIM2 Double) -> Acc (Array DIM2 Double)
gauss_convolution gauss_op image = compute_re gauss_op (compute_plat gauss_op image)

-- Add two dimensions on the bottom. This function replaces every cell in the
-- input image with a 2-dimensional array the size of gauss_op, thereby making
-- a 4-dimensional array.
compute_plat :: Acc (Array DIM2 Double) -> Acc (Array DIM2 Double) -> Acc (Array DIM4 Double)
compute_plat gauss_op image = A.generate (A.I4 h w n m) myfun
  where
    -- Size of the gaussian kernel. Use the pattern synonyms instead of `lift`
    -- and `unlift`; they are much nicer. Alternative notation here would be
    -- `Z_ ::. n ::. m`.
    A.I2 n m = A.shape gauss_op

    -- Size of the full image
    A.I2 h w = A.shape image

    -- Now this function is going to be called for the whole 4-dimensional
    -- array, which means that it gets the index in the image as the first two
    -- components and the index in the kernel as the last two components.
    --
    -- About yuan{x,y}:  `floor ((fromIntegral x :: Double) / 2)` is precisely
    --                   what `div` does. ;)
    --
    -- kery and kerx are what you called y1 and x1.
    myfun :: Exp DIM4 -> Exp Double
    myfun (A.I4 y x kery kerx) =
        image A.! A.I2 (A.min (h - 1) (A.max 0 (y + kery - n `A.div` 2)))
                       (A.min (w - 1) (A.max 0 (x + kerx - m `A.div` 2)))

-- Fold the 2-dimensional subarrays into a scalar again, bringing the
-- 4-dimensional array down to 2 dimensions again. The first argument is the
-- gaussian kernel.
compute_re :: Acc (Array DIM2 Double) -> Acc (Array DIM4 Double) -> Acc (Array DIM2 Double)
compute_re gauss_op pla =
    let A.I4 h w n m = A.shape pla
        -- Replicate the gaussian kernel over all positions in the image
        replicated_gauss_op :: Acc (Array DIM4 Double)
        replicated_gauss_op = A.replicate (A.I4 h w A.All_ A.All_) gauss_op
        -- Multiply the kernel elementwise with each neighbourhood in the image
        multiplied :: Acc (Array DIM4 Double)
        multiplied = A.zipWith (A.*) replicated_gauss_op pla
        -- Contract each neighbourhood (2-dimensional subarray) down to a
        -- 1-dimensional vector, so that we can fold it away in one go
        contracted :: Acc (Array DIM3 Double)
        contracted = A.reshape (A.I3 h w (n * m)) multiplied
    in -- Finally, sum this contracted vector to get the 2-dimensional result, the size of the original image.
       A.fold (A.+) 0 contracted

main :: IO ()
main = do
    let image = A.fromList (Z :. (7 :: Int) :. (8 :: Int))
                  [1, 0, 0, 0, 0, 0, 0, 1
                  ,0, 1, 0, 0, 0, 0, 1, 0
                  ,0, 0, 1, 0, 0, 1, 0, 0
                  ,0, 0, 0, 1, 1, 0, 0, 0
                  ,0, 0, 1, 0, 0, 1, 0, 0
                  ,0, 1, 0, 0, 0, 0, 1, 0
                  ,1, 0, 0, 0, 0, 0, 0, 1]
        gauss_op = A.fromList (Z :. (3 :: Int) :. (5 :: Int))
                     (map (/ 17)
                        [0, 0, 2, 0, 0
                        ,1, 3, 5, 3, 1
                        ,0, 0, 2, 0, 0])

    print $ I.runN gauss_convolution gauss_op image

I read your code, it's good job. But the process of creating a four-dimensional array consumes a lot of memory, especially when I work with large images

@tomsmeding
Copy link
Member

tomsmeding commented Apr 27, 2022

But the process of creating a four-dimensional array consumes a lot of memory, especially when I work with large images

This is an understandable worry. However, have you tried it? Does it indeed consume much more memory at runtime than expected?

The reason I'm asking is that Accelerate will not actually create ("manifest") this intermediate 4-dimensional array in memory. (At least, if all is right.) The reason is array fusion: see the "Fusion" subheading in the documentation for Acc.

See the image (I hope it's readable...) for a diagram showing how this works for the program that I wrote above. In black is the original program, in red some of the fusion rules, and in blue the fused program. Note that Accelerate can actually give you the fused program: show gauss_convolution is a String that shows what it will actually compute. Notice that there is just one fold over a delayed array, which contains an expression computing each element of the three-dimensional array. (That expression is kind of unreadable, which is unfortunate, but at least you can see that there are no other array primitives left.)

gauss_convolution_fusion

@tomsmeding tomsmeding changed the title [BUG] How to realisze convolution? How to realise convolution? Apr 27, 2022
@robbert-vdh
Copy link
Contributor

robbert-vdh commented Apr 27, 2022

About how to implement a convolution: I rewrote your code to the following, and this seems to work, but I don't really know much about Gaussian convolutions, so please test if this indeed does what you intended.

<snip>

In case anyone runs into this in the future, keep in mind that does not actually compute convolution. Instead it computes cross correlation. Since the kernel is symmetrical those two will yield the same result, but you need to change the {kerx,kery} - n `A.div` 2 into n `A.div` 2 - {kerx,kery} to get convolution.

@wujilingfeng
Copy link
Author

wujilingfeng commented Apr 29, 2022

But the process of creating a four-dimensional array consumes a lot of memory, especially when I work with large images

This is an understandable worry. However, have you tried it? Does it indeed consume much more memory at runtime than expected?

The reason I'm asking is that Accelerate will not actually create ("manifest") this intermediate 4-dimensional array in memory. (At least, if all is right.) The reason is array fusion: see the "Fusion" subheading in the documentation for Acc.

See the image (I hope it's readable...) for a diagram showing how this works for the program that I wrote above. In black is the original program, in red some of the fusion rules, and in blue the fused program. Note that Accelerate can actually give you the fused program: show gauss_convolution is a String that shows what it will actually compute. Notice that there is just one fold over a delayed array, which contains an expression computing each element of the three-dimensional array. (That expression is kind of unreadable, which is unfortunate, but at least you can see that there are no other array primitives left.)

gauss_convolution_fusion

Thank you for your professional help.I solved it.
By the way , what functions should I use to extract some rows and columns of the matrix?
Do I need to call slice function repeatedly?

@tomsmeding
Copy link
Member

You can indeed use calls to slice to get a few rows and columns of a matrix; does that not work?

@wujilingfeng
Copy link
Author

The problem is that I need to call it repeatly, I just wan to sample every other row and col .

@tomsmeding
Copy link
Member

if you want, for example, the 0th, 2nd, 4th, 6th, row etc. of a matrix m :: Acc (Matrix a), why not use:

let I2 h w = shape m
in backpermute (I2 ((h + 1) `div` 2) w)         -- new shape: height goes 0->0, 1->1, 2->1, 3->2, 4->2, 5->3, etc.
               (\(I2 y x) -> m ! I2 (2 * y) x)  -- given index in new array, produce index in old array
               m                                -- array to read values from

Getting the 1st, 3rd, 5th, etc. rows is an exercise to the reader. :)

@wujilingfeng
Copy link
Author

wujilingfeng commented Apr 29, 2022

backpermute

Thank you!
This question seems to go beyond the subject, I think it's time to finish it perfectly.
Thank you once again. I will study it.

@tomsmeding
Copy link
Member

Hope this was useful :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants