RPP Tensor Voxel Support - slice and fmadd on HOST and HIP (#223)

* Remove ci
* HIP Test suite changes Use pinned memory for roi pointers and image sizes Remove the HOST mem allocation and memcopy for the 2 variables
* Minor fix HOST Test suite Release memory for image sizes buffers
* Change name of ROI variable
* Add changes to Non linear blend test cases in HIP test suite
* Minor fix - Rotate affine matrix
* initial working commit for merged test suite
* added helper function for converting PLN3 to PKD3 in test suite
* fixed the issue with CMN outputs
* minor changes
* Add new HIP merged test suite
* Test suite merge for HIP backend
* Add function to check the correctness of unit-tests output
* add a function to check the correctness of HIP unit-test output
* removed the dependency of hardcoded path in test suite
* fixed the issue with noise outputs from HOST test suite minor cleanup in shell script files for unittests and performance tests
* fix the output difference between Avx and c outputs
* Add function to convert PKD3 input to PLN3
* fix errors in common test suite
* Add three augmentation reference pixel values for unit tests
* minor changes in PKD3 to PLN3 conversion function removed unnecessary variables in helpers in test suite
* Add rotate test cases in new test suite
* Minor changes
* fixed issue with HIP backend test suite changed few variable names to camelCase minor code cleanup
* minor change in compare_output function
* added a Map for finding augmentation name from case number
* minor changes
* removed the usage of device i8, f32, f16 buffers
* Add goldens outputs for some augmentations
* avoided the usage of temporary buffer for HIP test suite
* Made changes in HOST test suite similar to HIP test suite removed unnecessary buffer usage in HOST test suite
* minor changes
* made changes to pass list of cases for test suite
* fixed the issue with folder dump for case list
* added support to generate kernel times for HIP test suite
* minor change
* fix seg fault in compare output function
* minor change in compare_output function
* restructure compare output function and code cleanup
* add common function for opencv dump
* Add common code for HOST and HIP in a common file
* Add common functions for similar code in HOST and HIP backends
* changes in write_output function
* made changes in timer variable names converted measured time in ms
* Add readme file for new test suite
* combined set_strides() and set_nchw_values() functions into single function
* reverted changes happened with rotate merge
* readded spatter vector code renamed reference output file names
* made changes to common ROI updation in test suite added function to read images from imread
* updated readme
* Fixed the issue with bulleting in readme
* fix error in runTest file
* fix errors in runTest.py in Hip backend
* minor changes
* fix file ordering issue
* change file ordering logic
* minor changes in parser and validator functions
* minor changes
* Bug fix for HIP Kernel time calculations
* added validation cases for PLN1
* fixed incorrect reference output for CMN PLN1
* Change ONE_OVER_255 precision
* change refFile path for interpolation functions
* Add new option to dump output array into csv file
* change golden outputs for color_twist
* Add Nearbyintf in Uchar store functions
* Add new Option QA_mode to compare the output with golden outputs
* Add new common function to set layout for descriptor pointers Change logic in compare output function
* Remove function calls for augmentations with missing golden outputs
* change color_cast logic for pkd3-pln3 and pkd3-pkd3 conversions change golden outputs of pkd3-pkd3 conversion of color_cast augmentation
* bug fix for CMN HIP
* minor changes
* Fix formatting in readme
* minor changes to parameter names in python
* Add TurboJpeg decoder to read Input data Resolve exist_ok issue in python2 Change QA function logic
* minor code cleanup
* added modes of operation for test suite readme
* modified case numbers in augmentation map
* removed nearbyintf() changes in rpp_hip_common.hpp
* changed max case (case end) in new test suite based on current functions supported minor code cleanup
* added changes to print the summary of qa tests in a text file
* minor changes in README.md and runTests.py
* Minor changes w.r.t turbojpeg installation in test suite CMakeLists.txt
* added instructions for installing turbojpeg
* Codacy fixes
* Codacy fix
* Codacy fix
* Remove C-style pointer casting and codacy fixes
* updated instructions for installing turbojpeg
* removed reinterpret cast for u8 conversions
* Codacy fixes
* codacy fixes
* Remove C style type casting
* Add new option num_iterations
* made changes to display the total valid cases being run in QA mode
* minor formatting changes
* minor changes
* codacy fixes
* change TurboJpeg read from BGR to RGB change reference outputs
* minor changes
* Turbojpeg find package mods
* funcCount to str
* Add new option preserve output fix tee warnings changes in readme
* fix tee warnings fix PLN1 image dump issue cleared mkdir warnings
* minor changes
* minor changes
* Add recursive file search Add new option batch Size ROI strides updation
* Flag for handling dual Input cases
* minor change
* Changes in batchSize implementation
* Minor Change
* Add Validation check for DST_FOLDER path
* Add Validation checks for all options in testAllScript.sh
* Add sanity check for dual Input cases Set Max Dimension and Max Image Dump Replaced Fast DCT tag with Accurate DCT
* Regenerate golden outputs using accurate dct Flag Add golden outputs for some new augmentations
* Fix Flip golden outputs mismatch Fix PLN3 variants mismatch in QA mode
* Add MAX_BATCH_SIZE check removed Augmentations function calls for failing Qa modes code cleanup
* Add crop and gamma correction augmentations code cleanup
* Add comments to functions in rpp_test_suite_common.h
* minor change
* code cleanup
* minor code changes
* Change roi and Image sizes for crop augmentation
* Change numIterations option to numRuns Addressed PR comments
* Add turboJpeg header to update maxHeight and maxWidth values
* Change the performance Timings logic
* minor change to support u8_f16 and u8_f32 cases
* Regenerate LUT golden outputs with ACCURATE_DCT tag
* Minor code changes
* Update testAllScript.sh in HIP - quotes to prevent globbing
* Update testAllScript.sh in HOST - quotes to prevent globbing
* Add initial nii file sample
* Add initial mods to read nii files
* test suite update for voxel processing
* Update readme on test_suite preReqs
* Add system command for gif creation
* Rearrange cmake
* RPP integration updates
* Add fmadd_scalar on hip
* Fix RPP_ERROR_INVALID_SRC_DATA_TYPE
* test_suite hip update for voxel processing support with RPP
* remove first version
* Fix rois for correct 3D xyz slicing
* Add slice hip implementations
* Add slice support in test suite hip
* Add sample 3d voxel kernel outputs to readme
* mods for clarity
* mods for clarity
* Add line breaks
* HOST test suite update for voxel processing
* Initial commit - Implements PLN1 Fmadd Kernel
* Add dependencies for fmadd kernel
* Implement NDHWC variant for Fmadd Includes testsuite changes to support 3 channels
* Implement Slice HOST Kernel Includes testsuite changes
* Fix NCDHW variant for Slice
* Cleanup
* Fix NDHWC variant
* Fix stride used for NDHWC
* Fix NDHWC layout handling in testsuite Temporarily converts pln3 inputs into pkd3 inputs later stores them as pln3 after processing
* Add sample input .nii file Also fixes build error in testsuite
* Fix NDHWC layout for fmadd and slice Also includes fixes in voxel testsuite
* Optimize slice using memcpy for HOST
* Add QA test support Also includes golden outputs
* Change enum name
* Add Voxel unittest change testSuite CMakeLists
* Add Doxygen Voxel augmentations
* minor change
* Add readme for Voxel test suite
* Cleanup Includes modification in function naming for fmadd operation
* Modify HIP testsuite
* Fix output dump issue in HIP and profiler logs
* Move __AVX2__ flag
* Changes to remove localThreads definitions, add _hip to kernel names
* Fix codacy warnings
* Fix conflicting ROI types in API between HIP and HOST Also includes U8 support for slice
* resolved issue with QA mode after U8 addition
* made changes to display the exact variant being run in QA mode and performance test mode
* minor change
* resolved issue with unit test mode changed few variables from snake_case to camel case
* reset DEBUG_MODE flag
* resolved issue with HIP profiler tests
* Update docs Also adds error code for invalid datatype for Slice kernel
* CMakeLists - Version Update 1.5.0 - TOT Version
* CHANGELOG Updates Version 1.5.0 placeholder
* Update rppdefs.h for comments on 2D/3D types
* Rename to fused_multiply_add_scalar
* Change names of ref outputs
* Fix host test suite cmake
* Add Voxel tests for ctest and CI
* Remove boost deps and change name fmadd to fused_multiply_add_scalar
* Add project name to remove warning
* Add scriptPath variable usage to make paths generic for CI
* Move CHECK to header
* Add C++17 warning fix
* Add clarity in final QA result display - match voxel tests with other tensor tests
* Build fixes
* Fix merge issue of double call to set_max_dimensions
* Add clarity on QA test final result
* Add references for sample nii image usage
* Remove tensor voxel slice augmentation output sample from main ReadMe
* Codacy fix
* Update README.md - Codacy Fix
* Update CMakeLists.txt bugfix for HOST/HIP tests
* Update CMakeLists.txt - parenthesis fix
* removed redundant condition checks added separate functions for running unit tests and performance tests
* Pass variable in function
* CTest - Voxel Test Updates
* CI - Update common.groovy Add audio test support
* CI - Update common.groovy
* Codacy fix
* Update CMakeLists.txt - Fix Failure
* Remove boost deps
* Cmake changes

---------

Co-authored-by: fiona-gladwin <fionagladwin@multicorewareinc.com>
Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>
Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
Co-authored-by: Snehaa Giridharan <snehaa@multicorewareinc.com>
Co-authored-by: sampath1117 <89194005+sampath1117@users.noreply.github.com>
Co-authored-by: Snehaa-Giridharan <118163708+snehaa8@users.noreply.github.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
Co-authored-by: Kiriti Gowda <kiriti.nageshgowda@amd.com>
ROCm · Nov 21, 2023 · fd59a91
1 parent 1d5058d
commit fd59a91
Showing 47 changed files with 3,878 additions and 44 deletions.
diff --git a/.Doxyfile b/.Doxyfile
@@ -967,6 +967,7 @@ INPUT                  = README.md \
                         include/rppt_tensor_geometric_augmentations.h \
                         include/rppt_tensor_morphological_operations.h \
                         include/rppt_tensor_statistical_operations.h \
+                        include/rppt_tensor_arithmetic_operations.h \
                         include/rppt_tensor_audio_augmentations.h
 
 

diff --git a/.gitignore b/.gitignore
@@ -12,6 +12,7 @@ hip_build/
 OUTPUT_IMAGES*
 OUTPUT_PERFORMANCE*
 QA_RESULTS*
+OUTPUT_VOXEL*
 doxygen_output*
 
 # documentation artifacts

diff --git a/.jenkins/common.groovy b/.jenkins/common.groovy
@@ -9,19 +9,22 @@ def runCompileCommand(platform, project, jobName, boolean debug=false, boolean s
     String backend = 'HIP'
     String enableSCL = 'echo build-rpp'
     String enableAudioTesting = 'echo audio-tests-not-supported'
+    String enableVoxelTesting = 'echo voxel-tests-not-supported'
 
     if (platform.jenkinsLabel.contains('centos')) {
         backend = 'CPU'
         enableSCL = 'source scl_source enable llvm-toolset-7'
     }
     else if (platform.jenkinsLabel.contains('ubuntu')) {
         enableAudioTesting = 'sudo apt-get install -y libsndfile1-dev'
+        enableVoxelTesting = '(git clone https://github.com/NIFTI-Imaging/nifti_clib.git; cd nifti_clib; mkdir build; cd build; cmake ../; sudo make -j$(nproc) install)'
         if (platform.jenkinsLabel.contains('ubuntu20')) {
             backend = 'OCL'
         }
     }
     else if (platform.jenkinsLabel.contains('rhel')) {
         enableAudioTesting = 'sudo yum install -y libsndfile-devel'
+        enableVoxelTesting = '(git clone https://github.com/NIFTI-Imaging/nifti_clib.git; cd nifti_clib; mkdir build; cd build; cmake ../; sudo make -j$(nproc) install)'
     }
 
 
@@ -36,6 +39,7 @@ def runCompileCommand(platform, project, jobName, boolean debug=false, boolean s
                 mkdir -p build/${buildTypeDir} && cd build/${buildTypeDir}
                 ${enableSCL}
                 ${enableAudioTesting}
+                ${enableVoxelTesting}
                 cmake -DBACKEND=${backend} ${buildTypeArg} ../..
                 make -j\$(nproc)
                 sudo make install

diff --git a/README.md b/README.md
@@ -16,6 +16,12 @@ The latest RPP release is: [![GitHub tag (latest SemVer)](https://img.shields.io
 
 <p align="center"><img width="90%" src="https://github.com/GPUOpen-ProfessionalCompute-Libraries/rpp/raw/master/docs/data/supported_functionalities_samples.jpg" /></p>
 
+### Supported 3D Functionalities Samples
+
+Input<br>(nifti1 .nii medical image) | fused_multiply_add_scalar<br>(brightened 3D image)
+:-------------------------:|:-------------------------:
+![](docs/data/niftiInput.gif)  |  ![](docs/data/niftiOutputBrightened.gif)
+
 ## Documentation
 
 You can build our documentation locally using the following code:
@@ -163,7 +169,52 @@ To use RPP, you must have installed the following:
   sudo apt-get install libsndfile1-dev
   ```
 
-## Build and install RPP
+* Imagemagick
+  ```
+  sudo apt-get install imagemagick
+  ```
+
+* Nifti-Imaging nifti_clib
+  ```
+  git clone https://github.com/NIFTI-Imaging/nifti_clib.git
+  cd nifti_clib
+  mkdir build
+  cd build
+  cmake ..
+  sudo make -j$(nproc) install
+  ```
+
+## Build & Install RPP
+
+The ROCm Performance Primitives (RPP) library has support for three backends: HIP, OpenCL, and CPU:
+
+* Building RPP with the **HIP** backend **(default)**:
+```
+$ git clone https://github.com/GPUOpen-ProfessionalCompute-Libraries/rpp.git
+$ mkdir build && cd build
+$ cmake -DBACKEND=HIP ../rpp
+$ make -j8
+$ sudo make install
+```
+
+* Building RPP with the **OPENCL** backend:
+```
+$ git clone https://github.com/GPUOpen-ProfessionalCompute-Libraries/rpp.git
+$ mkdir build && cd build
+$ cmake -DBACKEND=OCL ../rpp
+$ make -j8
+$ sudo make install
+```
+
+* Building RPP with the **CPU** backend:
+```
+$ git clone https://github.com/GPUOpen-ProfessionalCompute-Libraries/rpp.git
+$ mkdir build && cd build
+$ cmake -DBACKEND=CPU ../rpp
+$ make -j8
+$ sudo make install
+```
+## Test Functionalities
 
 To build and install RPP, run the code shown for your backend:
 

diff --git a/docs/data/niftiInput.gif b/docs/data/niftiInput.gif
diff --git a/docs/data/niftiOutputBrightened.gif b/docs/data/niftiOutputBrightened.gif
diff --git a/docs/data/niftiOutputBrightenedSlicedXYZ.gif b/docs/data/niftiOutputBrightenedSlicedXYZ.gif
diff --git a/docs/data/niftiOutputSlicedXZ.gif b/docs/data/niftiOutputSlicedXZ.gif
diff --git a/include/rppdefs.h b/include/rppdefs.h
@@ -35,10 +35,12 @@ THE SOFTWARE.
 #include <CL/cl.h>
 #endif
 
-/*! \brief 8 bit unsigned char minimum \ingroup group_rppdefs \page subpage_rpp asdfsf*/
+/*! \brief 8 bit unsigned char minimum \ingroup group_rppdefs \page subpage_rpp */
 #define RPP_MIN_8U      ( 0 )
-/*! \brief 8 bit unsigned char maximum \ingroup group_rppdefs \page subpage_rppi asdfsfasdfds */
+/*! \brief 8 bit unsigned char maximum \ingroup group_rppdefs \page subpage_rppi */
 #define RPP_MAX_8U      ( 255 )
+/*! \brief RPP maximum dimensions in tensor \ingroup group_rppdefs \page subpage_rppt */
+#define RPPT_MAX_DIMS   ( 5 )
 
 const float ONE_OVER_6 = 1.0f / 6;
 const float ONE_OVER_3 = 1.0f / 3;
@@ -99,15 +101,15 @@ typedef enum
     /*! \brief Invalid dst tensor layout. (Needs to adhere to function specification.) \ingroup group_rppdefs */
     RPP_ERROR_INVALID_DST_LAYOUT        = -10,
     /*! \brief Invalid src tensor datatype. (Needs to adhere to function specification.) \ingroup group_rppdefs */
-    RPP_ERROR_INVALID_SRC_DATATYPE     = -11,
+    RPP_ERROR_INVALID_SRC_DATATYPE      = -11,
     /*! \brief Invalid dst tensor datatype. (Needs to adhere to function specification.) \ingroup group_rppdefs */
-    RPP_ERROR_INVALID_DST_DATATYPE     = -12,
+    RPP_ERROR_INVALID_DST_DATATYPE      = -12,
     /*! \brief Invalid src/dst tensor datatype. (Needs to adhere to function specification.) \ingroup group_rppdefs */
-    RPP_ERROR_INVALID_SRC_OR_DST_DATATYPE      = -13,
+    RPP_ERROR_INVALID_SRC_OR_DST_DATATYPE       = -13,
     /*! \brief Insufficient dst buffer length provided. (Needs to adhere to function specification.) \ingroup group_rppdefs */
     RPP_ERROR_INSUFFICIENT_DST_BUFFER_LENGTH    = -14,
     /*! \brief Invalid datatype \ingroup group_rppdefs */
-    RPP_ERROR_INVALID_DATATYPE          = -15,
+    RPP_ERROR_INVALID_PARAMETER_DATATYPE        = -15,
     /*! \brief Not enough memory \ingroup group_rppdefs */
     RPP_ERROR_NOT_ENOUGH_MEMORY         = -16,
     /*! \brief Out of bound source ROI \ingroup group_rppdefs */
@@ -297,6 +299,13 @@ typedef struct
 /*! \brief RPPI Image 2D Rectangle (XYWH format) type struct
  * \ingroup group_rppdefs
  */
+typedef struct
+{
+    int x;
+    int y;
+    int z;
+} RppiPoint3D;
+
 typedef struct
 {
     int x;
@@ -335,21 +344,29 @@ typedef enum
 typedef enum
 {
     NCHW,
-    NHWC
+    NHWC,
+    NCDHW,
+    NDHWC
 } RpptLayout;
 
 /*! \brief RPPT Tensor 2D ROI type enum
  * \ingroup group_rppdefs
  */
 typedef enum
 {
-    LTRB,
-    XYWH
+    LTRB,    // Left-Top-Right-Bottom
+    XYWH     // X-Y-Width-Height
 } RpptRoiType;
 
 /*! \brief RPPT Tensor subpixel layout type enum
  * \ingroup group_rppdefs
  */
+typedef enum
+{
+    LTFRBB,    // Left-Top-Front-Right-Bottom-Back
+    XYZWHD     // X-Y-Z-Width-Height-Depth
+} RpptRoi3DType;
+
 typedef enum
 {
     RGBtype,
@@ -374,10 +391,19 @@ typedef enum
  */
 typedef struct
 {
-    RppiPoint lt, rb;
+    RppiPoint lt, rb;    // Left-Top point and Right-Bottom point
 
 } RpptRoiLtrb;
 
+/*! \brief RPPT Tensor 3D ROI LTFRBB struct
+ * \ingroup group_rppdefs
+ */
+typedef struct
+{
+    RppiPoint3D ltf, rbb; // Left-Top-Front point and Right-Bottom-Back point
+
+} RpptRoiLtfrbb;
+
 /*! \brief RPPT Tensor 2D ROI XYWH struct
  * \ingroup group_rppdefs
  */
@@ -388,16 +414,36 @@ typedef struct
 
 } RpptRoiXywh;
 
+/*! \brief RPPT Tensor 3D ROI XYZWHD struct
+ * \ingroup group_rppdefs
+ */
+typedef struct
+{
+    RppiPoint3D xyz;
+    int roiWidth, roiHeight, roiDepth;
+
+} RpptRoiXyzwhd;
+
 /*! \brief RPPT Tensor 2D ROI union
  * \ingroup group_rppdefs
  */
 typedef union
 {
-    RpptRoiLtrb ltrbROI;
-    RpptRoiXywh xywhROI;
+    RpptRoiLtrb ltrbROI;    // ROI defined as Left-Top-Right-Bottom
+    RpptRoiXywh xywhROI;    // ROI defined as X-Y-Width-Height
 
 } RpptROI, *RpptROIPtr;
 
+/*! \brief RPPT Tensor 3D ROI union
+ * \ingroup group_rppdefs
+ */
+typedef union
+{
+    RpptRoiLtfrbb ltfrbbROI;    // ROI defined as Left-Top-Front-Right-Bottom-Back
+    RpptRoiXyzwhd xyzwhdROI;    // ROI defined as X-Y-Z-Width-Height-Depth
+
+} RpptROI3D, *RpptROI3DPtr;
+
 /*! \brief RPPT Tensor strides type struct
  * \ingroup group_rppdefs
  */
@@ -417,14 +463,24 @@ typedef struct
     RppSize_t numDims;
     Rpp32u offsetInBytes;
     RpptDataType dataType;
-    RpptLayout layout;
     Rpp32u n, c, h, w;
     RpptStrides strides;
+    RpptLayout layout;
 } RpptDesc, *RpptDescPtr;
 
 /*! \brief RPPT Tensor 8-bit uchar RGB type struct
  * \ingroup group_rppdefs
  */
+typedef struct
+{
+    RppSize_t numDims;
+    Rpp32u offsetInBytes;
+    RpptDataType dataType;
+    Rpp32u dims[RPPT_MAX_DIMS];
+    Rpp32u strides[RPPT_MAX_DIMS];
+    RpptLayout layout;
+} RpptGenericDesc, *RpptGenericDescPtr;
+
 typedef struct
 {
     Rpp8u R;
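
The two 3D ROI representations added to `rppdefs.h` above (`RpptRoiLtfrbb` and `RpptRoiXyzwhd`) describe the same box in different ways. The following is a minimal sketch of converting between them, not code from this commit. The struct definitions are local copies of the ones in the diff so the snippet compiles on its own; the helper names `ltfrbb_to_xyzwhd` and `xyzwhd_to_ltfrbb` are hypothetical, and the sketch assumes the right-bottom-back corner is inclusive (so width = rbb.x - ltf.x + 1), which this diff does not confirm.

```cpp
#include <cassert>

// Local copies of the 3D ROI types added to rppdefs.h in this commit,
// reproduced here only so that the sketch is self-contained.
typedef struct { int x; int y; int z; } RppiPoint3D;
typedef struct { RppiPoint3D ltf, rbb; } RpptRoiLtfrbb;
typedef struct { RppiPoint3D xyz; int roiWidth, roiHeight, roiDepth; } RpptRoiXyzwhd;
typedef union { RpptRoiLtfrbb ltfrbbROI; RpptRoiXyzwhd xyzwhdROI; } RpptROI3D;

// Hypothetical helper: convert an LTFRBB ROI to an XYZWHD ROI.
// ASSUMPTION: the rbb corner is inclusive, hence the "+ 1" in each extent.
RpptROI3D ltfrbb_to_xyzwhd(const RpptROI3D &roi)
{
    const RpptRoiLtfrbb in = roi.ltfrbbROI;  // copy out before writing the other union member
    assert(in.rbb.x >= in.ltf.x && in.rbb.y >= in.ltf.y && in.rbb.z >= in.ltf.z);
    RpptROI3D out;
    out.xyzwhdROI.xyz = in.ltf;
    out.xyzwhdROI.roiWidth  = in.rbb.x - in.ltf.x + 1;
    out.xyzwhdROI.roiHeight = in.rbb.y - in.ltf.y + 1;
    out.xyzwhdROI.roiDepth  = in.rbb.z - in.ltf.z + 1;
    return out;
}

// Hypothetical helper: the inverse conversion, under the same inclusive-corner assumption.
RpptROI3D xyzwhd_to_ltfrbb(const RpptROI3D &roi)
{
    const RpptRoiXyzwhd in = roi.xyzwhdROI;
    assert(in.roiWidth > 0 && in.roiHeight > 0 && in.roiDepth > 0);
    RpptROI3D out;
    out.ltfrbbROI.ltf = in.xyz;
    out.ltfrbbROI.rbb.x = in.xyz.x + in.roiWidth  - 1;
    out.ltfrbbROI.rbb.y = in.xyz.y + in.roiHeight - 1;
    out.ltfrbbROI.rbb.z = in.xyz.z + in.roiDepth  - 1;
    return out;
}
```

Because `RpptROI3D` is a union, the `roiType` argument (`RpptRoi3DType::LTFRBB` or `RpptRoi3DType::XYZWHD`) is the only thing that tells a callee which member is valid, so a caller would normalise to one form before indexing into the volume.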

diff --git a/include/rppt.h b/include/rppt.h
@@ -41,6 +41,7 @@ extern "C" {
 #include "rppt_tensor_filter_augmentations.h"
 #include "rppt_tensor_geometric_augmentations.h"
 #include "rppt_tensor_morphological_operations.h"
+#include "rppt_tensor_arithmetic_operations.h"
 #include "rppt_tensor_statistical_operations.h"
 #include "rppt_tensor_audio_augmentations.h"
 

diff --git a/include/rppt_tensor_arithmetic_operations.h b/include/rppt_tensor_arithmetic_operations.h
@@ -0,0 +1,89 @@
+/*
+Copyright (c) 2019 - 2023 Advanced Micro Devices, Inc. All rights reserved.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+*/
+
+#ifndef RPPT_TENSOR_ARITHMETIC_OPERATIONS_H
+#define RPPT_TENSOR_ARITHMETIC_OPERATIONS_H
+
+/*!
+ * \file
+ * \brief RPPT Tensor Arithmetic operation Functions.
+ *
+ * \defgroup group_tensor_arithmetic Operations: AMD RPP Tensor Arithmetic Operations
+ * \brief Tensor Arithmetic Operations.
+ */
+
+#include "rpp.h"
+#include "rppdefs.h"
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*! \brief Fused multiply add scalar augmentation on HOST backend
+ * \details This function performs the fused multiply-add (fmadd) operation on a batch of 4D tensors.
+ *          It multiplies each element of a source tensor by the 'mulTensor' scalar for that batch entry,
+ *          adds the 'addTensor' scalar for that batch entry, and stores the result in the destination tensor.
+ *          Supports the f32 -> f32 datatype.
+ * \param[in] srcPtr source tensor memory
+ * \param[in] srcGenericDescPtr source tensor descriptor
+ * \param[out] dstPtr destination tensor memory
+ * \param[in] dstGenericDescPtr destination tensor descriptor
+ * \param[in] mulTensor mul values for fmadd calculation (1D tensor of batchSize Rpp32f values)
+ * \param[in] addTensor add values for fmadd calculation (1D tensor of batchSize Rpp32f values)
+ * \param[in] roiGenericPtrSrc ROI data for each image in source tensor (tensor of batchSize RpptROI3D values)
+ * \param[in] roiType ROI type used (RpptRoi3DType::XYZWHD or RpptRoi3DType::LTFRBB)
+ * \param[in] rppHandle Host-handle
+ * \return <tt> RppStatus enum</tt>.
+ * \returns RPP_SUCCESS <tt>\ref RppStatus</tt> on successful completion.
+ * Otherwise returns an RPP_ERROR status.
+ * \ingroup group_tensor_arithmetic
+ */
+RppStatus rppt_fused_multiply_add_scalar_host(RppPtr_t srcPtr, RpptGenericDescPtr srcGenericDescPtr, RppPtr_t dstPtr, RpptGenericDescPtr dstGenericDescPtr, Rpp32f *mulTensor, Rpp32f *addTensor, RpptROI3DPtr roiGenericPtrSrc, RpptRoi3DType roiType, rppHandle_t rppHandle);
+
+
+/*! \brief Fused multiply add scalar augmentation on GPU backend
+ * \details This function performs the fused multiply-add (fmadd) operation on a batch of 4D tensors.
+ *          It multiplies each element of a source tensor by the 'mulTensor' scalar for that batch entry,
+ *          adds the 'addTensor' scalar for that batch entry, and stores the result in the destination tensor.
+ *          Supports the f32 -> f32 datatype.
+ * \param[in] srcPtr source tensor memory
+ * \param[in] srcGenericDescPtr source tensor descriptor
+ * \param[out] dstPtr destination tensor memory
+ * \param[in] dstGenericDescPtr destination tensor descriptor
+ * \param[in] mulTensor mul values for fmadd calculation (1D tensor of batchSize Rpp32f values)
+ * \param[in] addTensor add values for fmadd calculation (1D tensor of batchSize Rpp32f values)
+ * \param[in] roiGenericPtrSrc ROI data for each image in source tensor (tensor of batchSize RpptROI3D values)
+ * \param[in] roiType ROI type used (RpptRoi3DType::XYZWHD or RpptRoi3DType::LTFRBB)
+ * \param[in] rppHandle Hip-handle
+ * \return <tt> RppStatus enum</tt>.
+ * \returns RPP_SUCCESS <tt>\ref RppStatus</tt> on successful completion.
+ * Otherwise returns an RPP_ERROR status.
+ * \ingroup group_tensor_arithmetic
+ */
+
+#ifdef GPU_SUPPORT
+RppStatus rppt_fused_multiply_add_scalar_gpu(RppPtr_t srcPtr, RpptGenericDescPtr srcGenericDescPtr, RppPtr_t dstPtr, RpptGenericDescPtr dstGenericDescPtr, Rpp32f *mulTensor, Rpp32f *addTensor, RpptROI3DPtr roiGenericPtrSrc, RpptRoi3DType roiType, rppHandle_t rppHandle);
+#endif // GPU_SUPPORT
+
+#ifdef __cplusplus
+}
+#endif
+#endif // RPPT_TENSOR_ARITHMETIC_OPERATIONS_H
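
To make the documented behaviour of `rppt_fused_multiply_add_scalar_host` concrete, here is a scalar reference sketch of the arithmetic it describes: `dst = src * mulTensor[n] + addTensor[n]` for every voxel inside the ROI of batch entry `n`. It is an illustration, not the kernel from this commit. The types `GenericDesc` and `RoiXyzwhd` and both function names are hypothetical stand-ins; the sketch assumes a densely packed NCDHW f32 tensor with strides counted in elements, and it assumes results are written at the same coordinates as the inputs while voxels outside the ROI are left untouched. The real kernel may place ROI output differently.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the descriptor and ROI types in rppdefs.h.
struct Point3D { int x, y, z; };
struct RoiXyzwhd { Point3D xyz; int roiWidth, roiHeight, roiDepth; };
struct GenericDesc
{
    unsigned dims[5];     // ASSUMPTION: NCDHW order -> {n, c, d, h, w}
    unsigned strides[5];  // ASSUMPTION: strides in elements, same order as dims
};

// Build a descriptor for a densely packed NCDHW tensor.
GenericDesc make_packed_ncdhw_desc(unsigned n, unsigned c, unsigned d, unsigned h, unsigned w)
{
    GenericDesc desc = {{n, c, d, h, w}, {0, 0, 0, 0, 1}};
    for (int i = 3; i >= 0; i--)
        desc.strides[i] = desc.strides[i + 1] * desc.dims[i + 1];
    return desc;
}

// Reference loop: one mul/add scalar per batch entry, applied inside that entry's ROI.
void fused_multiply_add_scalar_ref(const float *src, float *dst, const GenericDesc &desc,
                                   const float *mulTensor, const float *addTensor,
                                   const RoiXyzwhd *roiTensor)
{
    for (unsigned n = 0; n < desc.dims[0]; n++)
    {
        const RoiXyzwhd &roi = roiTensor[n];
        for (unsigned c = 0; c < desc.dims[1]; c++)
            for (int z = roi.xyz.z; z < roi.xyz.z + roi.roiDepth; z++)
                for (int y = roi.xyz.y; y < roi.xyz.y + roi.roiHeight; y++)
                    for (int x = roi.xyz.x; x < roi.xyz.x + roi.roiWidth; x++)
                    {
                        // Flatten (n, c, z, y, x) into a linear element offset.
                        std::size_t idx = static_cast<std::size_t>(n) * desc.strides[0]
                                        + static_cast<std::size_t>(c) * desc.strides[1]
                                        + static_cast<std::size_t>(z) * desc.strides[2]
                                        + static_cast<std::size_t>(y) * desc.strides[3]
                                        + static_cast<std::size_t>(x) * desc.strides[4];
                        dst[idx] = src[idx] * mulTensor[n] + addTensor[n];
                    }
    }
}
```

A loop like this can serve as a slow oracle when checking an optimized HOST or HIP kernel against golden outputs, since a `mul` above 1 with a positive `add` corresponds to the "brightened 3D image" sample shown in the README.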