
List of GEMM + epilogue fusions needed for Transformer Engine #674

@bghimireamd

The cuBLASLt epilogues we need fused with GEMM for Transformer Engine are the following:

  1. CUBLASLT_EPILOGUE_GELU_AUX
        step 1 : matrix multiplication
        step 2 : apply GELU to the result
        step 3 : store the GELU input (the pre-activation GEMM result) in a separate auxiliary matrix (basically a matrix copy), so the backward pass can use it

    • I think we can use CK's gemm_fastgelu for this.
  2. CUBLASLT_EPILOGUE_DGELU
        step 1 : matrix multiplication
        step 2 : apply the derivative of GELU to the result (this needs the auxiliary GELU input as an extra input)

  3. CUBLASLT_EPILOGUE_BIAS
        step 1 : matrix multiplication A(M, K) x B(K, N) = C(M, N)
        step 2 : obtain the bias vector (1, N), e.g. [0.1, 0.2, 0.3]
        step 3 : broadcast the bias vector:
                    we can simply replicate it M times along the rows,
                    resulting in a broadcast bias matrix with dimensions (M x N)
                e.g.
                A(4, 2), B(2, 3), C(4, 3)
                Broadcast bias:

                |0.1 0.2 0.3|
                |0.1 0.2 0.3|
                |0.1 0.2 0.3|
                |0.1 0.2 0.3|
        step 4 : C = C + Broadcast bias
        note : cuBLASLt states this in column-major terms (bias length = rows of D, broadcast to all columns),
               which matches the example above when C is viewed as the transpose of D.

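  4. CUBLASLT_EPILOGUE_BGRADB
        Compute the bias gradient from the input matrix B. The bias size corresponds to the number of columns of the matrix D.
        The reduction happens over the GEMM's "k" dimension. Store the bias gradient in the bias buffer.

  5. CUBLASLT_EPILOGUE_GELU_AUX_BIAS
        * fusion of bias and GELU_AUX: add the bias first, then apply GELU; the GELU input is stored in the auxiliary matrix

  6. CUBLASLT_EPILOGUE_DGELU_BGRAD
        * fusion of DGELU and BGRAD: the GELU gradient is stored in the output matrix and the bias gradient in the bias buffer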
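
To pin down what the forward epilogues in the list (BIAS, GELU_AUX, GELU_AUX_BIAS) are expected to compute, here is a minimal pure-Python reference sketch. It is not CK or cuBLASLt code: `matmul`, `gelu_tanh` and `gemm_epilogue` are hypothetical helper names, the layout is row-major with one bias value per column of C as in the A(4, 2), B(2, 3) example, and the tanh approximation of GELU is an assumption (the library may use a different GELU form).

```python
import math


def matmul(a, b):
    """Row-major GEMM: a is (M, K), b is (K, N); returns (M, N)."""
    k_dim, n_dim = len(b), len(b[0])
    assert all(len(row) == k_dim for row in a), "inner dimensions must match"
    return [[sum(row[k] * b[k][n] for k in range(k_dim)) for n in range(n_dim)]
            for row in a]


def gelu_tanh(x):
    """tanh approximation of GELU (assumed here; the erf form is also common)."""
    u = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(u))


def gemm_epilogue(a, b, bias=None, gelu=False):
    """Hypothetical reference helper; returns (d, aux).

    bias=[...]            -> BIAS
    gelu=True             -> GELU_AUX
    bias=[...], gelu=True -> GELU_AUX_BIAS (bias first, then GELU)
    aux holds the GELU input when gelu=True, otherwise None.
    """
    c = matmul(a, b)
    if bias is not None:
        assert len(bias) == len(c[0]), "one bias value per column of C"
        # Broadcast: the same bias row is added to every row of C.
        c = [[v + bias[n] for n, v in enumerate(row)] for row in c]
    if not gelu:
        return c, None
    aux = [row[:] for row in c]  # copy of the pre-activation values
    d = [[gelu_tanh(v) for v in row] for row in c]
    return d, aux


if __name__ == "__main__":
    a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]  # A(4, 2)
    b = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]                # B(2, 3)
    d, aux = gemm_epilogue(a, b, bias=[0.1, 0.2, 0.3], gelu=True)
    for row in d:
        print(row)
```

A fused kernel could be checked against this element by element, with a tolerance that allows for the GELU variant and the data type in use.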

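A minimal pure-Python reference sketch of the backward epilogues in the list (DGELU, BGRADB, DGELU_BGRAD), under stated assumptions. The helper names are hypothetical, the layout is row-major, and GELU uses the tanh approximation, which may not be the form the library uses. DGELU is read here as "multiply the GEMM result element-wise by GELU'(aux)", where aux is the saved GELU input. BGRADB sums B over the k dimension, giving one value per column of D. For DGELU_BGRAD the bias gradient is assumed to be the column-wise sum of the DGELU output; the reduction direction in particular should be confirmed against the cuBLASLt documentation.

```python
import math


def matmul(a, b):
    """Row-major GEMM: a is (M, K), b is (K, N); returns (M, N)."""
    k_dim, n_dim = len(b), len(b[0])
    assert all(len(row) == k_dim for row in a), "inner dimensions must match"
    return [[sum(row[k] * b[k][n] for k in range(k_dim)) for n in range(n_dim)]
            for row in a]


def dgelu_tanh(x):
    """Derivative of the tanh approximation of GELU (assumed GELU form)."""
    c = math.sqrt(2.0 / math.pi)
    t = math.tanh(c * (x + 0.044715 * x ** 3))
    return 0.5 * (1.0 + t) + 0.5 * x * (1.0 - t * t) * c * (1.0 + 3.0 * 0.044715 * x * x)


def gemm_dgelu(a, b, aux):
    """DGELU: scale each GEMM output element by GELU'(aux) at the same position."""
    c = matmul(a, b)
    return [[v * dgelu_tanh(x) for v, x in zip(c_row, aux_row)]
            for c_row, aux_row in zip(c, aux)]


def gemm_bgradb(a, b):
    """BGRADB: plain GEMM output plus the bias gradient reduced from B over k."""
    d = matmul(a, b)
    bgrad = [sum(b[k][n] for k in range(len(b))) for n in range(len(b[0]))]
    return d, bgrad


def gemm_dgelu_bgrad(a, b, aux):
    """DGELU_BGRAD: DGELU output plus its column-wise sum (assumed reduction)."""
    d = gemm_dgelu(a, b, aux)
    bgrad = [sum(row[n] for row in d) for n in range(len(d[0]))]
    return d, bgrad
```

The aux argument is the matrix that GELU_AUX (or GELU_AUX_BIAS) writes in the forward pass, which is why the forward and backward epilogues need to be supported as a pair.
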