RPP Tensor Voxel Support - slice and fmadd on HOST and HIP (#223)
* Remove ci

* HIP Test suite changes

Use pinned memory for roi pointers and image sizes
Remove the HOST mem allocation and memcopy for the 2 variables

* Minor fix HOST Test suite

Release memory for image sizes buffers

* Change name of ROI variable

* Add changes to Non linear blend test cases in HIP test suite

* Minor fix - Rotate affine matrix

* initial working commit for merged test suite

* added helper function for converting PLN3 to PKD3 in test suite

* fixed the issue with CMN outputs

* minor changes

* Add new HIP merged test suite

* Test suite merge for HIP backend

* Add function to check the correctness of unit-tests output

* add a function to check the correctness of HIP unit-test output

* removed the dependency of hardcoded path in test suite

* fixed the issue with noise outputs from HOST test suite

minor cleanup in shell script files for unittests and performance tests

* fix the output difference between Avx and c outputs

* Add function to convert PKD3 input to PLN3

* fix errors in common test suite

* Add three augmentation reference pixel values for unit tests

* minor changes in PKD3 to PLN3 conversion function

removed unnecessary variables in helpers in test suite

* Add rotate test cases in new test suite

* Minor changes

* fixed issue with HIP backend test suite

changed few variable names to camelCase

minor code cleanup

* minor change in compare_output function

* added a Map for finding augmentation name from case number

* minor changes

* removed the usage of device i8, f32,f16 buffers

* Add goldens outputs for some augmentations

* avoided the usage of temporary buffer for HIP test suite

* Made changes in HOST test suite similar to HIP test suite

removed unnecessary buffer usage in HOST test suite

* minor changes

* made changes to pass list of cases for test suite

* fixed the issue with folder dump for case list

* added support to generate kernel times for HIP test suite

* minor change

* fix seg fault in compare output function

* minor change in compare_output function

* restructure compare output function and code cleanup

* add common function for opencv dump

* Add common code for HOST and HIP in a common file

* Add common functions for similar code in HOST and HIP backends

* changes in write_output function

* made changes in timer variable names

converted measured time in ms

* Add readme file for new test suite

* combined set_strides() and set_nchw_values() functions into single function

* reverted changes that happened with rotate merge

* readded spatter vector code

renamed reference output file names

* made changes to common ROI updation in test suite

added function to read images from imread

* updated readme

* Fixed the issue with bulleting in readme

* fix error in runTest file

* fix errors in runTest.py in Hip backend

* minor changes

* fix file ordering issue

* change file ordering logic

* minor changes in parser and validator functions

* minor changes

* Bug fix for HIP Kernel time calculations

* added validation cases for PLN1

* fixed incorrect reference output for CMN PLN1

* Change ONE_OVER_255 precision

* change refFile path for interpolation functions

* Add new option to dump output array into csv file

* change golden outputs for color_twist

* Add Nearbyintf in Uchar store functions

* Add new Option QA_mode to compare the output with golden outputs

* Add new common function to set layout for descriptor pointers
Change logic in compare output function

* Remove function calls for augmentations with missing golden outputs

* change color_cast logic for pkd3-pln3 and pkd3-pkd3 conversions
change golden outputs of pkd3-pkd3 conversion of color_cast augmentation

* bug fix for CMN HIP

* minor changes

* Fix formatting in readme

* minor changes to parameter names in python

* Add TurboJpeg decoder to read Input data
Resolve exist_ok issue in python2
Change QA function logic

* minor code cleanup

* added modes of operation for test suite readme

* modified case numbers in augmentation map

* removed nearbyintf() changes in rpp_hip_common.hpp

* changed max case (case end) in new test suite based on current functions supported

minor code cleanup

* added changes to print the summary of qa tests in a text file

* minor changes in README.md and runTests.py

* Minor changes w.r.t turbojpeg installation in test suite CMakeLists.txt

* added instructions for installing turbojpeg

* Codacy fixes

* Codacy fix

* Codacy fix

* Remove C-style pointer casting and codacy fixes

* updated instructions for installing turbojpeg

* removed reinterpret cast for u8 conversions

* Codacy fixes

* codacy fixes

* Remove C style type casting

* Add new option num_iterations

* made changes to display the total valid cases being run in QA mode

* minor formatting changes

* minor changes

* codacy fixes

* change TurboJpeg read from BGR to RGB
change reference outputs

* minor changes

* Turbojpeg find package mods

* funcCount to str

* Add new option preserve output
fix tee warnings
changes in readme

* fix tee warnings
fix PLN1 image dump issue
cleared mkdir warnings

* minor changes

* minor changes

* Add recursive file search
Add new option batch Size
ROI strides updation

* Flag for handling dual Input cases

* minor change

* Changes in batchSize implementation

* Minor Change

* Add Validation check for DST_FOLDER path

* Add Validation checks for all options in testAllScript.sh

* Add sanity check for dual Input cases
Set Max Dimension and Max Image Dump
Replaced Fast DCT tag with Accurate DCT

* Regenerate golden outputs using accurate dct Flag
Add golden outputs for some new augmentations

* Fix Flip golden outputs mismatch
Fix PLN3 variants mismatch in QA mode

* Add MAX_BATCH_SIZE check
removed Augmentations function calls for failing Qa modes
code cleanup

* Add crop and gamma correction augmentations
code cleanup

* Add comments to functions in rpp_test_suite_common.h

* minor change

* code cleanup

* minor code changes

* Change roi and Image sizes for crop augmentation

* Change numIterations option to numRuns
Addressed PR comments

* Add turboJpeg header to update maxHeight and maxWidth values

* Change the performance Timings logic

* minor change to support u8_f16 and u8_f32 cases

* Regenerate LUT golden outputs with ACCURATE_DCT tag

* Minor code changes

* Update testAllScript.sh in HIP - quotes to prevent globbing

* Update testAllScript.sh in HOST - quotes to prevent globbing

* Add initial nii file sample

* Add initial mods to read nii files

* test suite update for voxel processing

* Update readme on test_suite preReqs

* Add system command for gif creation

* Rearrange cmake

* RPP integration updates

* Add fmadd_scalar on hip

* Fix RPP_ERROR_INVALID_SRC_DATA_TYPE

* test_suite hip update for voxel processing support with RPP

* remove first version

* Fix rois for correct 3D xyz slicing

* Add slice hip implementations

* Add slice support in test suite hip

* Add sample 3d voxel kernel outputs to readme

* mods for clarity

* mods for clarity

* Add line breaks

* HOST test suite update for voxel processing

* Initial commit - Implements PLN1 Fmadd Kernel

* Add dependencies for fmadd kernel

* Implement NDHWC variant for Fmadd

Includes testsuite changes to support 3 channels

* Implement Slice HOST Kernel

Includes testsuite changes

* Fix NCDHW variant for Slice

* Cleanup

* Fix NDHWC variant

* Fix stride used for NDHWC

* Fix NDHWC layout handling in testsuite

Temporarily converts pln3 inputs into pkd3 inputs
later stores them as pln3 after processing

* Add sample input .nii file

Also fixes build error in testsuite

* Fix NDHWC layout for fmadd and slice

Also includes fixes in voxel testsuite

* Optimize slice using memcpy for HOST

* Add QA test support

Also includes golden outputs

* Change enum name

* Add Voxel unittest
change testSuite CMakeLists

* Add Doxygen Voxel augmentations

* minor change

* Add readme for Voxel test suite

* Cleanup

Includes modification in function naming for fmadd operation

* Modify HIP testsuite

* Fix output dump issue in HIP and profiler logs

* Move __AVX2__ flag

* Changes to remove localThreads definitions, add _hip to kernel names

* Fix codacy warnings

* Fix conflicting ROI types in API between HIP and HOST

Also includes U8 support for slice

* resolved issue with QA mode after U8 addition

* made changes to display the exact variant being run in QA mode and performance test mode

* minor change

* resolved issue with unit test mode

changed few variables from snake_case to camel case

* reset DEBUG_MODE flag

* resolved issue with HIP profiler tests

* Update docs

Also adds error code for invalid datatype for Slice kernel

* CMakeLists - Version Update

1.5.0 - TOT Version

* CHANGELOG Updates

Version 1.5.0 placeholder

* Update rppdefs.h for comments on 2D/3D types

* Rename to fused_multiply_add_scalar

* Change names of ref outputs

* Fix host test suite cmake

* Add Voxel tests for ctest and CI

* Remove boost deps and change name fmadd to fused_multiply_add_scalar

* Add project name to remove warning

* Add scriptPath variable usage to make paths generic for CI

* Move CHECK to header

* Add C++17 warning fix

* Add clarity in final QA result display - match voxel tests with other tensor tests

* Build fixes

* Fix merge issue of double call to set_max_dimensions

* Add clarity on QA test final result

* Add references for sample nii image usage

* Remove tensor voxel slice augmentation output sample from main ReadMe

* Codacy fix

* Update README.md - Codacy Fix

* Update CMakeLists.txt bugfix for HOST/HIP tests

* Update CMakeLists.txt - parenthesis fix

* removed redundant condition checks

added separate functions for running unit tests and performance tests

* Pass variable in function

* CTest - Voxel Test Updates

* CI - Update common.groovy

Add audio test support

* CI - Update common.groovy

* Codacy fix

* Update CMakeLists.txt - Fix Failure

* Remove boost deps

* Cmake changes

---------

Co-authored-by: fiona-gladwin <fionagladwin@multicorewareinc.com>
Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>
Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
Co-authored-by: Snehaa Giridharan <snehaa@multicorewareinc.com>
Co-authored-by: sampath1117 <89194005+sampath1117@users.noreply.github.com>
Co-authored-by: Snehaa-Giridharan <118163708+snehaa8@users.noreply.github.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
Co-authored-by: Kiriti Gowda <kiriti.nageshgowda@amd.com>
9 people committed Nov 21, 2023
1 parent 1d5058d commit fd59a91
Showing 47 changed files with 3,878 additions and 44 deletions.
1 change: 1 addition & 0 deletions .Doxyfile
Expand Up @@ -967,6 +967,7 @@ INPUT = README.md \
include/rppt_tensor_geometric_augmentations.h \
include/rppt_tensor_morphological_operations.h \
include/rppt_tensor_statistical_operations.h \
include/rppt_tensor_arithmetic_operations.h
include/rppt_tensor_audio_augmentations.h


Expand Down
1 change: 1 addition & 0 deletions .gitignore
Expand Up @@ -12,6 +12,7 @@ hip_build/
OUTPUT_IMAGES*
OUTPUT_PERFORMANCE*
QA_RESULTS*
OUTPUT_VOXEL*
doxygen_output*

# documentation artifacts
Expand Down
4 changes: 4 additions & 0 deletions .jenkins/common.groovy
Expand Up @@ -9,19 +9,22 @@ def runCompileCommand(platform, project, jobName, boolean debug=false, boolean s
String backend = 'HIP'
String enableSCL = 'echo build-rpp'
String enableAudioTesting = 'echo audio-tests-not-supported'
String enableVoxelTesting = 'echo voxel-tests-not-supported'

if (platform.jenkinsLabel.contains('centos')) {
backend = 'CPU'
enableSCL = 'source scl_source enable llvm-toolset-7'
}
else if (platform.jenkinsLabel.contains('ubuntu')) {
enableAudioTesting = 'sudo apt-get install -y libsndfile1-dev'
enableVoxelTesting = '(git clone https://github.com/NIFTI-Imaging/nifti_clib.git; cd nifti_clib; mkdir build; cd build; cmake ../; sudo make -j$nproc install)'
if (platform.jenkinsLabel.contains('ubuntu20')) {
backend = 'OCL'
}
}
else if (platform.jenkinsLabel.contains('rhel')) {
enableAudioTesting = 'sudo yum install -y libsndfile-devel'
enableVoxelTesting = '(git clone https://github.com/NIFTI-Imaging/nifti_clib.git; cd nifti_clib; mkdir build; cd build; cmake ../; sudo make -j$nproc install)'
}


Expand All @@ -36,6 +39,7 @@ def runCompileCommand(platform, project, jobName, boolean debug=false, boolean s
mkdir -p build/${buildTypeDir} && cd build/${buildTypeDir}
${enableSCL}
${enableAudioTesting}
${enableVoxelTesting}
cmake -DBACKEND=${backend} ${buildTypeArg} ../..
make -j\$(nproc)
sudo make install
Expand Down
53 changes: 52 additions & 1 deletion README.md
Expand Up @@ -16,6 +16,12 @@ The latest RPP release is: [![GitHub tag (latest SemVer)](https://img.shields.io

<p align="center"><img width="90%" src="https://github.com/GPUOpen-ProfessionalCompute-Libraries/rpp/raw/master/docs/data/supported_functionalities_samples.jpg" /></p>

### Supported 3D Functionalities Samples

Input<br>(nifti1 .nii medical image) | fused_multiply_add_scalar<br>(brightened 3D image)
:-------------------------:|:-------------------------:
![](docs/data/niftiInput.gif) | ![](docs/data/niftiOutputBrightened.gif)

## Documentation

You can build our documentation locally using the following code:
Expand Down Expand Up @@ -163,7 +169,52 @@ To use RPP, you must have installed the following:
sudo apt-get install libsndfile1-dev
```

## Build and install RPP
* Imagemagick
```
sudo apt-get install imagemagick
```

* Nifti-Imaging nifti_clib
```
git clone git@github.com:NIFTI-Imaging/nifti_clib.git
cd nifti_clib
mkdir build
cd build
cmake ..
sudo make -j$(nproc) install
```

## Build & Install RPP

The ROCm Performance Primitives (RPP) library has support for three backends: HIP, OpenCL, and CPU:

* Building RPP with the **HIP** backend **(default)**:
```
$ git clone https://github.com/GPUOpen-ProfessionalCompute-Libraries/rpp.git
$ mkdir build && cd build
$ cmake -DBACKEND=HIP ../rpp
$ make -j8
$ sudo make install
```

* Building RPP with the **OPENCL** backend:
```
$ git clone https://github.com/GPUOpen-ProfessionalCompute-Libraries/rpp.git
$ mkdir build && cd build
$ cmake -DBACKEND=OCL ../rpp
$ make -j8
$ sudo make install
```

* Building RPP with the **CPU** backend:
```
$ git clone https://github.com/GPUOpen-ProfessionalCompute-Libraries/rpp.git
$ mkdir build && cd build
$ cmake -DBACKEND=CPU ../rpp
$ make -j8
$ sudo make install
```
## Test Functionalities

To build and install RPP, run the code shown for your backend:

Expand Down
Binary file added docs/data/niftiInput.gif
Binary file added docs/data/niftiOutputBrightened.gif
Binary file added docs/data/niftiOutputBrightenedSlicedXYZ.gif
Binary file added docs/data/niftiOutputSlicedXZ.gif
82 changes: 69 additions & 13 deletions include/rppdefs.h
Expand Up @@ -35,10 +35,12 @@ THE SOFTWARE.
#include <CL/cl.h>
#endif

/*! \brief 8 bit unsigned char minimum \ingroup group_rppdefs \page subpage_rpp asdfsf*/
/*! \brief 8 bit unsigned char minimum \ingroup group_rppdefs \page subpage_rpp */
#define RPP_MIN_8U ( 0 )
/*! \brief 8 bit unsigned char maximum \ingroup group_rppdefs \page subpage_rppi asdfsfasdfds */
/*! \brief 8 bit unsigned char maximum \ingroup group_rppdefs \page subpage_rppi */
#define RPP_MAX_8U ( 255 )
/*! \brief RPP maximum dimensions in tensor \ingroup group_rppdefs \page subpage_rppt */
#define RPPT_MAX_DIMS ( 5 )

const float ONE_OVER_6 = 1.0f / 6;
const float ONE_OVER_3 = 1.0f / 3;
Expand Down Expand Up @@ -99,15 +101,15 @@ typedef enum
/*! \brief Invalid dst tensor layout. (Needs to adhere to function specification.) \ingroup group_rppdefs */
RPP_ERROR_INVALID_DST_LAYOUT = -10,
/*! \brief Invalid src tensor datatype. (Needs to adhere to function specification.) \ingroup group_rppdefs */
RPP_ERROR_INVALID_SRC_DATATYPE = -11,
RPP_ERROR_INVALID_SRC_DATATYPE = -11,
/*! \brief Invalid dst tensor datatype. (Needs to adhere to function specification.) \ingroup group_rppdefs */
RPP_ERROR_INVALID_DST_DATATYPE = -12,
RPP_ERROR_INVALID_DST_DATATYPE = -12,
/*! \brief Invalid src/dst tensor datatype. (Needs to adhere to function specification.) \ingroup group_rppdefs */
RPP_ERROR_INVALID_SRC_OR_DST_DATATYPE = -13,
RPP_ERROR_INVALID_SRC_OR_DST_DATATYPE = -13,
/*! \brief Insufficient dst buffer length provided. (Needs to adhere to function specification.) \ingroup group_rppdefs */
RPP_ERROR_INSUFFICIENT_DST_BUFFER_LENGTH = -14,
/*! \brief Invalid datatype \ingroup group_rppdefs */
RPP_ERROR_INVALID_DATATYPE = -15,
RPP_ERROR_INVALID_PARAMETER_DATATYPE = -15,
/*! \brief Not enough memory \ingroup group_rppdefs */
RPP_ERROR_NOT_ENOUGH_MEMORY = -16,
/*! \brief Out of bound source ROI \ingroup group_rppdefs */
Expand Down Expand Up @@ -297,6 +299,13 @@ typedef struct
/*! \brief RPPI Image 2D Rectangle (XYWH format) type struct
* \ingroup group_rppdefs
*/
typedef struct
{
int x;
int y;
int z;
} RppiPoint3D;

typedef struct
{
int x;
Expand Down Expand Up @@ -335,21 +344,29 @@ typedef enum
typedef enum
{
NCHW,
NHWC
NHWC,
NCDHW,
NDHWC
} RpptLayout;

/*! \brief RPPT Tensor 2D ROI type enum
* \ingroup group_rppdefs
*/
typedef enum
{
LTRB,
XYWH
LTRB, // Left-Top-Right-Bottom
XYWH // X-Y-Width-Height
} RpptRoiType;

/*! \brief RPPT Tensor subpixel layout type enum
* \ingroup group_rppdefs
*/
typedef enum
{
LTFRBB, // Left-Top-Front-Right-Bottom-Back
XYZWHD // X-Y-Z-Width-Height-Depth
} RpptRoi3DType;

typedef enum
{
RGBtype,
Expand All @@ -374,10 +391,19 @@ typedef enum
*/
typedef struct
{
RppiPoint lt, rb;
RppiPoint lt, rb; // Left-Top point and Right-Bottom point

} RpptRoiLtrb;

/*! \brief RPPT Tensor 3D ROI LTFRBB struct
* \ingroup group_rppdefs
*/
typedef struct
{
RppiPoint3D ltf, rbb; // Left-Top-Front point and Right-Bottom-Back point

} RpptRoiLtfrbb;

/*! \brief RPPT Tensor 2D ROI XYWH struct
* \ingroup group_rppdefs
*/
Expand All @@ -388,16 +414,36 @@ typedef struct

} RpptRoiXywh;

/*! \brief RPPT Tensor 3D ROI XYZWHD struct
* \ingroup group_rppdefs
*/
typedef struct
{
RppiPoint3D xyz;
int roiWidth, roiHeight, roiDepth;

} RpptRoiXyzwhd;

/*! \brief RPPT Tensor 2D ROI union
* \ingroup group_rppdefs
*/
typedef union
{
RpptRoiLtrb ltrbROI;
RpptRoiXywh xywhROI;
RpptRoiLtrb ltrbROI; // ROI defined as Left-Top-Right-Bottom
RpptRoiXywh xywhROI; // ROI defined as X-Y-Width-Height

} RpptROI, *RpptROIPtr;

/*! \brief RPPT Tensor 3D ROI union
* \ingroup group_rppdefs
*/
typedef union
{
RpptRoiLtfrbb ltfrbbROI; // ROI defined as Left-Top-Front-Right-Bottom-Back
RpptRoiXyzwhd xyzwhdROI; // ROI defined as X-Y-Z-Width-Height-Depth

} RpptROI3D, *RpptROI3DPtr;

/*! \brief RPPT Tensor strides type struct
* \ingroup group_rppdefs
*/
Expand All @@ -417,14 +463,24 @@ typedef struct
RppSize_t numDims;
Rpp32u offsetInBytes;
RpptDataType dataType;
RpptLayout layout;
Rpp32u n, c, h, w;
RpptStrides strides;
RpptLayout layout;
} RpptDesc, *RpptDescPtr;

/*! \brief RPPT Tensor 8-bit uchar RGB type struct
* \ingroup group_rppdefs
*/
typedef struct
{
RppSize_t numDims;
Rpp32u offsetInBytes;
RpptDataType dataType;
Rpp32u dims[RPPT_MAX_DIMS];
Rpp32u strides[RPPT_MAX_DIMS];
RpptLayout layout;
} RpptGenericDesc, *RpptGenericDescPtr;

typedef struct
{
Rpp8u R;
Expand Down
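The `RpptGenericDesc` added above stores per-dimension extents and strides in fixed arrays of `RPPT_MAX_DIMS` entries. As a minimal sketch of how such a descriptor might be filled for a densely packed 5D tensor (for example NCDHW), the snippet below derives strides from the extents. `GenericDescSketch`, `set_contiguous_strides` and `element_offset` are hypothetical names used only for illustration and are not part of RPP. The sketch also assumes strides are counted in elements rather than bytes, which the diff does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DIMS 5 /* mirrors RPPT_MAX_DIMS from the diff */

/* Hypothetical stand-in for RpptGenericDesc; only the fields used here. */
typedef struct
{
    size_t   numDims;
    uint32_t dims[MAX_DIMS];
    uint32_t strides[MAX_DIMS];
} GenericDescSketch;

/* Fill strides for a densely packed tensor: the innermost dimension gets
   stride 1 and every outer stride is the product of all inner extents.
   Assumption: strides are counted in elements, not bytes. */
static void set_contiguous_strides(GenericDescSketch *desc)
{
    assert(desc->numDims >= 1 && desc->numDims <= MAX_DIMS);
    uint32_t stride = 1;
    for (size_t i = desc->numDims; i-- > 0; )
    {
        desc->strides[i] = stride;
        stride *= desc->dims[i];
    }
}

/* Element offset of the coordinate idx[0..numDims-1] inside the tensor. */
static size_t element_offset(const GenericDescSketch *desc, const uint32_t *idx)
{
    size_t offset = 0;
    for (size_t i = 0; i < desc->numDims; i++)
        offset += (size_t)idx[i] * desc->strides[i];
    return offset;
}
```

The same helper would cover an NDHWC tensor by listing the extents in that order (N, D, H, W, C), since the strides depend only on the order in which the extents are stored.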
1 change: 1 addition & 0 deletions include/rppt.h
Expand Up @@ -41,6 +41,7 @@ extern "C" {
#include "rppt_tensor_filter_augmentations.h"
#include "rppt_tensor_geometric_augmentations.h"
#include "rppt_tensor_morphological_operations.h"
#include "rppt_tensor_arithmetic_operations.h"
#include "rppt_tensor_statistical_operations.h"
#include "rppt_tensor_audio_augmentations.h"

Expand Down
89 changes: 89 additions & 0 deletions include/rppt_tensor_arithmetic_operations.h
@@ -0,0 +1,89 @@
/*
Copyright (c) 2019 - 2023 Advanced Micro Devices, Inc. All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

#ifndef RPPT_TENSOR_ARITHMETIC_OPERATIONS_H
#define RPPT_TENSOR_ARITHMETIC_OPERATIONS_H

/*!
* \file
* \brief RPPT Tensor Arithmetic operation Functions.
*
* \defgroup group_tensor_arithmetic Operations: AMD RPP Tensor Arithmetic Operations
* \brief Tensor Color Augmentations.
*/

#include "rpp.h"
#include "rppdefs.h"
#ifdef __cplusplus
extern "C" {
#endif

/*! \brief Fmadd augmentation HOST
* \details This function performs the fmadd operation on a batch of 4D tensors.
* It multiplies each element of the source tensor by a corresponding element in the 'mulTensor',
* adds a corresponding element from the 'addTensor', and stores the result in the destination tensor.
* Support added for f32 -> f32 datatype.
* \param [in] srcPtr source tensor memory
* \param[in] srcGenericDescPtr source tensor descriptor
* \param[out] dstPtr destination tensor memory
* \param[in] dstGenericDescPtr destination tensor descriptor
* \param[in] mulTensor mul values for fmadd calculation (1D tensor of batchSize Rpp32f values)
* \param[in] addTensor add values for fmadd calculation (1D tensor of batchSize Rpp32f values)
* \param[in] roiGenericPtrSrc ROI data for each image in source tensor (tensor of batchSize RpptRoiGeneric values)
* \param[in] roiType ROI type used (RpptRoi3DType::XYZWHD or RpptRoi3DType::LTFRBB)
* \param [in] rppHandle Host-handle
* \return <tt> RppStatus enum</tt>.
* \returns RPP_SUCCESS <tt>\ref RppStatus</tt> on successful completion.
* Else return RPP_ERROR
* \ingroup group_tensor_arithmetic
*/
RppStatus rppt_fused_multiply_add_scalar_host(RppPtr_t srcPtr, RpptGenericDescPtr srcGenericDescPtr, RppPtr_t dstPtr, RpptGenericDescPtr dstGenericDescPtr, Rpp32f *mulTensor, Rpp32f *addTensor, RpptROI3DPtr roiGenericPtrSrc, RpptRoi3DType roiType, rppHandle_t rppHandle);


/*! \brief Fmadd augmentation GPU
* \details This function performs the fmadd operation on a batch of 4D tensors.
* It multiplies each element of the source tensor by a corresponding element in the 'mulTensor',
* adds a corresponding element from the 'addTensor', and stores the result in the destination tensor.
* Support added for f32 -> f32 datatype.
* \param [in] srcPtr source tensor memory
* \param[in] srcGenericDescPtr source tensor descriptor
* \param[out] dstPtr destination tensor memory
* \param[in] dstGenericDescPtr destination tensor descriptor
* \param[in] mulTensor mul values for fmadd calculation (1D tensor of batchSize Rpp32f values)
* \param[in] addTensor add values for fmadd calculation (1D tensor of batchSize Rpp32f values)
* \param[in] roiGenericPtrSrc ROI data for each image in source tensor (tensor of batchSize RpptRoiGeneric values)
* \param[in] roiType ROI type used (RpptRoi3DType::XYZWHD or RpptRoi3DType::LTFRBB)
* \param [in] rppHandle Hip-handle
* \return <tt> RppStatus enum</tt>.
* \returns RPP_SUCCESS <tt>\ref RppStatus</tt> on successful completion.
* Else return RPP_ERROR
* \ingroup group_tensor_arithmetic
*/

#ifdef GPU_SUPPORT
RppStatus rppt_fused_multiply_add_scalar_gpu(RppPtr_t srcPtr, RpptGenericDescPtr srcGenericDescPtr, RppPtr_t dstPtr, RpptGenericDescPtr dstGenericDescPtr, Rpp32f *mulTensor, Rpp32f *addTensor, RpptROI3DPtr roiGenericPtrSrc, RpptRoi3DType roiType, rppHandle_t rppHandle);
#endif // GPU_SUPPORT

#ifdef __cplusplus
}
#endif
#endif // RPPT_TENSOR_ARITHMETIC_OPERATIONS_H
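The doc comments above describe the operation as a per-sample multiply and add (`dst = src * mulTensor[b] + addTensor[b]`) restricted to a 3D ROI. The following is a plain C reference sketch of that arithmetic, not the RPP kernel itself. The `*Sketch` types and the `roi_from_ltfrbb`, `fmadd_scalar_volume` and `fmadd_scalar_batch` functions are hypothetical stand-ins. It makes three assumptions: each sample is a single-channel, densely packed volume stored as depth, height, width; the right-bottom-back corner of an LTFRBB ROI is inclusive; and voxels outside the ROI are left untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for RppiPoint3D, RpptRoiXyzwhd and RpptRoiLtfrbb. */
typedef struct { int x, y, z; } Point3DSketch;
typedef struct { Point3DSketch xyz; int roiWidth, roiHeight, roiDepth; } RoiXyzwhdSketch;
typedef struct { Point3DSketch ltf, rbb; } RoiLtfrbbSketch;

/* Convert corner form to origin-plus-extent form.
   Assumption: the right-bottom-back corner is inclusive. */
static RoiXyzwhdSketch roi_from_ltfrbb(RoiLtfrbbSketch r)
{
    RoiXyzwhdSketch out;
    out.xyz       = r.ltf;
    out.roiWidth  = r.rbb.x - r.ltf.x + 1;
    out.roiHeight = r.rbb.y - r.ltf.y + 1;
    out.roiDepth  = r.rbb.z - r.ltf.z + 1;
    return out;
}

/* dst = src * mul + add for every voxel inside the ROI of one volume.
   Voxels outside the ROI are not written. */
static void fmadd_scalar_volume(const float *src, float *dst,
                                int depth, int height, int width,
                                RoiXyzwhdSketch roi, float mul, float add)
{
    assert(roi.xyz.x >= 0 && roi.xyz.x + roi.roiWidth  <= width);
    assert(roi.xyz.y >= 0 && roi.xyz.y + roi.roiHeight <= height);
    assert(roi.xyz.z >= 0 && roi.xyz.z + roi.roiDepth  <= depth);

    for (int z = roi.xyz.z; z < roi.xyz.z + roi.roiDepth; z++)
        for (int y = roi.xyz.y; y < roi.xyz.y + roi.roiHeight; y++)
            for (int x = roi.xyz.x; x < roi.xyz.x + roi.roiWidth; x++)
            {
                size_t i = ((size_t)z * height + y) * width + x;
                dst[i] = src[i] * mul + add;
            }
}

/* Apply one (mul, add) pair and one ROI per sample in the batch. */
static void fmadd_scalar_batch(const float *src, float *dst, int batchSize,
                               int depth, int height, int width,
                               const RoiXyzwhdSketch *rois,
                               const float *mulTensor, const float *addTensor)
{
    size_t volume = (size_t)depth * height * width;
    for (int b = 0; b < batchSize; b++)
        fmadd_scalar_volume(src + b * volume, dst + b * volume,
                            depth, height, width,
                            rois[b], mulTensor[b], addTensor[b]);
}
```

A reference of this kind is the sort of thing a QA mode could compare kernel output against. With `mul > 1` it scales intensities up, which is consistent with the "brightened 3D image" sample in the README diff.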