Faster Implicit update #5

@jdonners

Massimiliano Fatica (NVIDIA) wrote:

While adding the GPU path to the Implicit update routines, I found a good improvement (2x-3x) for the CPU code by making better use of the dgttrs call.

Basically, after the dgttrf call, instead of solving each vertical line separately:

      do ic=xstart(3),xend(3)
        do jc=xstart(2),xend(2)

!     Normalize RHS of equation

          fkl(1)=real(0.,fp_kind)
          do kc=2,nxm
            ackl_b=real(1.0,fp_kind)/(real(1.0,fp_kind)-ac3ssk(kc)*betadx)
            fkl(kc)=rhs(kc,jc,ic)*ackl_b
          end do
          fkl(nx)=real(0.,fp_kind)

!     Solve equation using LAPACK library

          call dgttrs('N',nx,1,amkT,ackT,apkT,appk,ipkv,fkl,nx,info)

!     Update temperature field

          do kc=2,nxm
            temp(kc,jc,ic)=temp(kc,jc,ic)+fkl(kc)
          end do

        end do
      end do

you can solve all of them together with a single call:

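      nrhs=(xend(3)-xstart(3)+1)*(xend(2)-xstart(2)+1)

!     Normalize RHS (this should be moved into the main loop of the
!     corresponding ImplicitUpdate routine)
      do ic=xstart(3),xend(3)
        do jc=xstart(2),xend(2)
!     Zero the boundary rows to match fkl(1) and fkl(nx) in the
!     per-line version (not needed if rhs is already zero there)
          rhs(1,jc,ic)=real(0.,fp_kind)
          do kc=2,nxm
            ackl_b=real(1.0,fp_kind)/(real(1.0,fp_kind)-ac3ssk(kc)*betadx)
            rhs(kc,jc,ic)=rhs(kc,jc,ic)*ackl_b
          end do
          rhs(nx,jc,ic)=real(0.,fp_kind)
        end do
      end do

!     Solve all nrhs systems in one LAPACK call; this assumes rhs is
!     stored contiguously with leading dimension nx

      call dgttrs('N',nx,nrhs,amkT,ackT,apkT,appk,ipkv,rhs,nx,info)

!     Update temperature field (OpenMP directives can also be added to
!     these loops)
      do ic=xstart(3),xend(3)
        do jc=xstart(2),xend(2)
          do kc=2,nxm
            temp(kc,jc,ic)=temp(kc,jc,ic)+rhs(kc,jc,ic)
          end do
        end do
      end do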
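
The equivalence of the two approaches can be checked with a small stand-alone program. The following is only a sketch under stated assumptions, not code from the repository: the sizes `nx` and `nrhs`, the matrix entries, and all variable names are made up for illustration, and it must be linked against LAPACK. It stores the right-hand sides as an `nx`-by-`nrhs` array, which has the same memory layout as a contiguous `rhs(nx,ny,nz)` array with `nrhs = ny*nz`.

```fortran
! Sketch: compare one dgttrs call per column with a single batched call.
! All sizes and values are hypothetical. Build e.g.: gfortran check.f90 -llapack
program batched_dgttrs_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nx = 8, nrhs = 12        ! made-up problem sizes
  real(dp) :: dl(nx-1), d(nx), du(nx-1), du2(nx-2)
  real(dp) :: b(nx,nrhs), ref(nx,nrhs), col(nx,1)
  integer  :: ipiv(nx), info, j
  external :: dgttrf, dgttrs

  ! Diagonally dominant tridiagonal matrix, factorized once
  dl = -1.0_dp
  d  =  4.0_dp
  du = -1.0_dp
  call dgttrf(nx, dl, d, du, du2, ipiv, info)
  if (info /= 0) error stop 'dgttrf failed'

  call random_number(b)

  ! Reference: one dgttrs call per right-hand side (per vertical line)
  do j = 1, nrhs
    col(:,1) = b(:,j)
    call dgttrs('N', nx, 1, dl, d, du, du2, ipiv, col, nx, info)
    if (info /= 0) error stop 'dgttrs failed'
    ref(:,j) = col(:,1)
  end do

  ! Batched: all nrhs columns solved in a single call, in place
  call dgttrs('N', nx, nrhs, dl, d, du, du2, ipiv, b, nx, info)
  if (info /= 0) error stop 'batched dgttrs failed'

  print *, 'max abs difference:', maxval(abs(b - ref))
  if (maxval(abs(b - ref)) > 1.0e-12_dp) error stop 'results differ'
end program batched_dgttrs_check
```

The batched call needs every column to share the same factorized matrix and the right-hand sides to be contiguous with leading dimension `nx`. Both appear to hold in the snippet above, since the factorization from dgttrf is reused for all lines.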
