Dynamic Generic Reduction Implementation #53

Closed
qianfengz wants to merge 26 commits into master from dynamic-reduce-pr

@qianfengz
Collaborator

@qianfengz qianfengz commented Jul 29, 2021

Here, "dynamic" means that the lengths of the input and output tensors for the reduction are passed to the GPU kernels as run-time parameters.
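
To make the distinction concrete, here is a minimal host-side sketch (not code from this PR) that contrasts lengths fixed at compile time with lengths passed at run time. The function names `reduce_sum_static` and `reduce_sum_dynamic` are hypothetical, and the example reduces only the last dimension of a row-major 2-D tensor on the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Static flavour (hypothetical name): the lengths are template parameters,
// so every new tensor shape needs its own instantiation.
template <std::size_t Rows, std::size_t Cols>
std::vector<float> reduce_sum_static(const std::vector<float>& in)
{
    assert(in.size() == Rows * Cols);
    std::vector<float> out(Rows, 0.0f);
    for(std::size_t r = 0; r < Rows; ++r)
        for(std::size_t c = 0; c < Cols; ++c)
            out[r] += in[r * Cols + c]; // sum over the last dimension
    return out;
}

// Dynamic flavour (hypothetical name): the same loop, but the lengths are
// ordinary run-time arguments, so one compiled function serves every shape.
inline std::vector<float>
reduce_sum_dynamic(const std::vector<float>& in, std::size_t rows, std::size_t cols)
{
    assert(in.size() == rows * cols);
    std::vector<float> out(rows, 0.0f);
    for(std::size_t r = 0; r < rows; ++r)
        for(std::size_t c = 0; c < cols; ++c)
            out[r] += in[r * cols + c]; // sum over the last dimension
    return out;
}
```

With the static version, a call such as `reduce_sum_static<2, 3>(in)` only compiles when the shape is known at compile time; `reduce_sum_dynamic(in, rows, cols)` accepts whatever lengths the caller reads at run time.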

Use the following scripts to test performance and to compare against static reduction:
  • Performance testing of dynamic reduction
#!/bin/bash

PRECISION=                             # --double,  --half

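# Quote the variable: an unquoted empty $PRECISION turns `test -n` into a one-argument test that is always true.
if test -n "$PRECISION" && test "$PRECISION" = "--half"; then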
   CTYPE="-C 1"
else
   CTYPE=""
fi

if [ $# -ge 1 ] ; then
    NREPEAT=$1
else
    NREPEAT=1
fi

for op in 0 5 6 7; do
    set -x
    ./host/driver_online/reduce_driver_online $PRECISION -D 64,3,280,81 -R 0 -O $op $CTYPE -v 1 1 $NREPEAT
    ./host/driver_online/reduce_driver_online $PRECISION -D 64,3,280,81 -R 1 -O $op $CTYPE -v 1 1 $NREPEAT
    ./host/driver_online/reduce_driver_online $PRECISION -D 64,3,280,81 -R 2 -O $op $CTYPE -v 1 1 $NREPEAT
    ./host/driver_online/reduce_driver_online $PRECISION -D 64,3,280,81 -R 3 -O $op $CTYPE -v 1 1 $NREPEAT
    ./host/driver_online/reduce_driver_online $PRECISION -D 64,3,280,81 -R 0,1 -O $op $CTYPE -v 1 1 $NREPEAT
    ./host/driver_online/reduce_driver_online $PRECISION -D 64,3,280,81 -R 1,2 -O $op $CTYPE -v 1 1 $NREPEAT
    ./host/driver_online/reduce_driver_online $PRECISION -D 64,3,280,81 -R 2,3 -O $op $CTYPE -v 1 1 $NREPEAT
    ./host/driver_online/reduce_driver_online $PRECISION -D 64,3,280,81 -R 0,2 -O $op $CTYPE -v 1 1 $NREPEAT
    ./host/driver_online/reduce_driver_online $PRECISION -D 64,3,280,81 -R 1,3 -O $op $CTYPE -v 1 1 $NREPEAT
    ./host/driver_online/reduce_driver_online $PRECISION -D 64,3,280,81 -R 0,3 -O $op $CTYPE -v 1 1 $NREPEAT
    ./host/driver_online/reduce_driver_online $PRECISION -D 64,3,280,81 -R 0,1,2 -O $op $CTYPE -v 1 1 $NREPEAT
    ./host/driver_online/reduce_driver_online $PRECISION -D 64,3,280,81 -R 0,1,3 -O $op $CTYPE -v 1 1 $NREPEAT
    ./host/driver_online/reduce_driver_online $PRECISION -D 64,3,280,81 -R 1,2,3 -O $op $CTYPE -v 1 1 $NREPEAT
    ./host/driver_online/reduce_driver_online $PRECISION -D 64,3,280,81 -R 0,2,3 -O $op $CTYPE -v 1 1 $NREPEAT
    ./host/driver_online/reduce_driver_online $PRECISION -D 64,3,280,81 -R 0,1,2,3 -O $op $CTYPE -v 1 1 $NREPEAT
    set +x
done
  • Performance testing of static reduction
#!/bin/bash

PRECISION=reduce    ## reducefp64 reducefp16

# Quote the variable so the emptiness check stays meaningful if PRECISION is ever unset.
if test -n "$PRECISION" && test "$PRECISION" = "reducefp16"; then
   CTYPE="-C 1"
else
   CTYPE=""
fi

if [ $# -ge 1 ] ; then
    NREPEAT=$1
else
    NREPEAT=1
fi

for op in 0 5 6 7; do
    set -x
    /opt/rocm/miopen/bin/MIOpenDriver $PRECISION -D 64,3,280,81 -R 0 -O $op $CTYPE  -t 1 -i $NREPEAT
    /opt/rocm/miopen/bin/MIOpenDriver $PRECISION -D 64,3,280,81 -R 1 -O $op $CTYPE  -t 1 -i $NREPEAT
    /opt/rocm/miopen/bin/MIOpenDriver $PRECISION -D 64,3,280,81 -R 2 -O $op $CTYPE  -t 1 -i $NREPEAT
    /opt/rocm/miopen/bin/MIOpenDriver $PRECISION -D 64,3,280,81 -R 3 -O $op $CTYPE  -t 1 -i $NREPEAT
    /opt/rocm/miopen/bin/MIOpenDriver $PRECISION -D 64,3,280,81 -R 0,1 -O $op $CTYPE  -t 1 -i $NREPEAT
    /opt/rocm/miopen/bin/MIOpenDriver $PRECISION -D 64,3,280,81 -R 1,2 -O $op $CTYPE  -t 1 -i $NREPEAT
    /opt/rocm/miopen/bin/MIOpenDriver $PRECISION -D 64,3,280,81 -R 2,3 -O $op $CTYPE  -t 1 -i $NREPEAT
    /opt/rocm/miopen/bin/MIOpenDriver $PRECISION -D 64,3,280,81 -R 0,2 -O $op $CTYPE  -t 1 -i $NREPEAT
    /opt/rocm/miopen/bin/MIOpenDriver $PRECISION -D 64,3,280,81 -R 1,3 -O $op $CTYPE  -t 1 -i $NREPEAT
    /opt/rocm/miopen/bin/MIOpenDriver $PRECISION -D 64,3,280,81 -R 0,3 -O $op $CTYPE  -t 1 -i $NREPEAT
    /opt/rocm/miopen/bin/MIOpenDriver $PRECISION -D 64,3,280,81 -R 0,1,2 -O $op $CTYPE  -t 1 -i $NREPEAT
    /opt/rocm/miopen/bin/MIOpenDriver $PRECISION -D 64,3,280,81 -R 0,1,3 -O $op $CTYPE  -t 1 -i $NREPEAT
    /opt/rocm/miopen/bin/MIOpenDriver $PRECISION -D 64,3,280,81 -R 1,2,3 -O $op $CTYPE  -t 1 -i $NREPEAT
    /opt/rocm/miopen/bin/MIOpenDriver $PRECISION -D 64,3,280,81 -R 0,2,3 -O $op $CTYPE  -t 1 -i $NREPEAT
    /opt/rocm/miopen/bin/MIOpenDriver $PRECISION -D 64,3,280,81 -R 0,1,2,3 -O $op $CTYPE  -t 1 -i $NREPEAT
    set +x
done

@asroy
Owner

asroy commented Jul 29, 2021

@qianfengz Formatting issue. Please use clang-format-10.

@qianfengz qianfengz closed this Jul 30, 2021
@qianfengz qianfengz reopened this Jul 31, 2021
@qianfengz qianfengz requested review from asroy August 3, 2021 04:47
{
using type = T;

using rawType = typename std::remove_const<T>::type;
Owner

Please limit the use of std as much as possible; use ck::remove_cv_t instead of std::remove_const.
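
As an illustration of what such a std-free trait could look like, here is a minimal sketch. It assumes ck::remove_cv_t behaves like std::remove_cv_t (it strips both const and volatile); the actual definition in composable_kernel is not shown in this thread. The `sketch` namespace, the `is_same` helper, and the `Wrapper` struct are hypothetical stand-ins.

```cpp
namespace sketch {

// Hand-rolled remove_cv: partial specializations peel off the qualifiers.
template <typename T> struct remove_cv                   { using type = T; };
template <typename T> struct remove_cv<const T>          { using type = T; };
template <typename T> struct remove_cv<volatile T>       { using type = T; };
template <typename T> struct remove_cv<const volatile T> { using type = T; };

template <typename T>
using remove_cv_t = typename remove_cv<T>::type;

// Tiny is_same, only so the checks below need no <type_traits>.
template <typename A, typename B> struct is_same { static constexpr bool value = false; };
template <typename A> struct is_same<A, A>       { static constexpr bool value = true; };

// Hypothetical stand-in for the struct under review.
template <typename T>
struct Wrapper
{
    using type    = T;
    using rawType = remove_cv_t<T>; // no std:: dependency
};

} // namespace sketch

// Compile-time checks: both const and volatile are stripped.
static_assert(sketch::is_same<sketch::Wrapper<const float>::rawType, float>::value,
              "const is removed");
static_assert(sketch::is_same<sketch::Wrapper<const volatile int>::rawType, int>::value,
              "const volatile is removed");
```

Unlike a remove_const-style trait, this one also strips volatile, which is harmless when the goal is to recover the plain element type.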

@asroy
Owner

asroy commented Aug 16, 2021

@qianfengz

The PR ROCm/composable_kernel#8 adds support in DynamicBuffer and StaticBuffer for reading a customized value for "invalid" elements.

I'm going to merge that PR soon.

Please retarget your PR to the ROCmSoftwarePlatform/composable_kernel repo.
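
One plausible reason a customized "invalid" value matters for reductions, offered here as an assumption rather than something stated in the thread: out-of-range or padded reads should return the identity of the reduction operation (for example, the lowest float for a max-reduction), not zero. The sketch below uses a hypothetical `BufferViewSketch` type; it is not the real DynamicBuffer or StaticBuffer API.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical stand-in for a buffer whose invalid reads return a chosen value.
template <typename T>
struct BufferViewSketch
{
    const T* data;
    std::size_t size;
    T invalid_value; // returned for any index outside [0, size)

    T get(std::size_t i) const { return i < size ? data[i] : invalid_value; }
};

// Identity element of a float max-reduction.
inline float max_identity() { return std::numeric_limits<float>::lowest(); }

// Max-reduce `padded_len` slots of a buffer that holds only `buf.size` real
// elements, as a tiled kernel might do when the length is not a tile multiple.
inline float max_reduce_padded(const BufferViewSketch<float>& buf, std::size_t padded_len)
{
    assert(padded_len >= buf.size);
    float acc = max_identity();
    for(std::size_t i = 0; i < padded_len; ++i)
    {
        const float v = buf.get(i); // padded slots read as invalid_value
        acc           = v > acc ? v : acc;
    }
    return acc;
}
```

For an all-negative input, a buffer whose invalid value is `max_identity()` yields the true maximum, while a zero-filling buffer would wrongly report 0.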

@qianfengz qianfengz closed this Aug 18, 2021
@qianfengz qianfengz deleted the dynamic-reduce-pr branch August 18, 2021 06:20
@qianfengz qianfengz restored the dynamic-reduce-pr branch August 18, 2021 12:51