RPP Phase on HIP and HOST (#267)
* RPP RICAP Tensor for HOST and HIP (#213)

* Initial commit - Ricap HOST Tensor

Includes testsuite changes

* Add QA tests for RICAP

Used three_images_224x224_src1 folder to create golden outputs

* Add three_images_224x224_src1 into TEST_IMAGES

* Support HIP Backend for RICAP

* Fix HIP pkd3->pkd3 variant

* regenerated golden outputs for RICAP

minor changes in HOST shell script for handling RICAP in QA mode

* minor bug fix in RICAP HIP kernels

* Improve readability and Cleanup

* Additional cleanup

* Cleanup testsuite

Includes new golden outputs

* Additional testsuite fixes

* Minor cleanup

* Fix codacy warnings

* Address other codacy warnings

* Update ricap.hpp with reference paper

* Add RICAP dataset path in readme

* Make changes to error codes returned

* Modify roi crop region for unit and perf tests

* RPP Tensor Water Augmentation on HOST and HIP (#181)

* added water HOST and HIP codes

* added water case in test suite

* added golden outputs for water

* added omp thread changes for water augmentation

* experimental changes

* fixed output issue with AVX2 instructions

* added AVX2 support for PKD3 load function

minor changes in PLN variant load functions

* nwc commit - added avx2 changes for u8 layout toggle variants but need to add store functions for completion

* Add Avx2 implementation for F32 and U8 toggle variants

* Add AVX2 support for u8 pkd3-pln3 and i8 pkd3-pln3 for water augmentation

* change F32 load and store logic

* optimized the store function for F32 PLN3-PKD3

* reverted irrelevant changes

* minor change

* optimized load and store functions for water U8 and F32 variants in host

removed commented code

* removed golden outputs for water

* minor changes

* renamed few functions and removed unused functions

updated i8 pln1 load as per the optimized u8 pln1 load

* fixed bug in i8 load function

* changed cast to c++ style

resolved spacing issues and added comments for AVX codes for better understanding

made changes to handle cases where QA Tests are not supported

* added golden outputs for water

* updated golden outputs with latest changes

* modified the u8, i8 pkd3-pln3 function and added comments for the vectorized code

* fixed minor bug in I8 variants

* made changes to resolve codacy warnings

* changed cast to c++ style in hip kernel

* changed generic nn F32 loads using gather and setr instructions

* added comments for latest changes

* minor change

* added definition for storing 32 and 64 bits from a 128-bit register

---------

Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>
Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>

* Fix build error

* CMakeLists - Version Update

1.5.0 - TOT Version

* CHANGELOG Updates

Version 1.5.0 placeholder

* Boost deps fix for test suite

---------

Co-authored-by: Snehaa Giridharan <snehaa@multicorewareinc.com>
Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>
Co-authored-by: Snehaa-Giridharan <118163708+snehaa8@users.noreply.github.com>
Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>

* Documentation - Readme & changelog updates (#251)

* readme and changelog updates for 6.0

* minor update

* Documentation - Bump rocm-docs-core[api_reference] from 0.26.0 to 0.27.0 in /docs/sphinx (#253)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.26.0 to 0.27.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.26.0...v0.27.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* RPP Resize Mirror Normalize Bugfix (#252)

* added fix for hipMemset

* remove pixel check for U8-F32 and U8-F16 for HOST codes

---------

Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>

* Sphinx - updates (#257)

* Sphinx - updates

* Doxygen - Updates

* Docs - Remove index.md

* Initial commit - Phase HOST Tensor

* Add QA outputs

* Update QA reference outputs for phase kernel

Also fixes mismatch

* Replace SSE optimization with AVX

Fixes QA reference outputs

* Fix f16 and i8 datatype variants

* Initial commit - Phase HIP Tensor

* Fix pixel mismatch between HIP and HOST

* Use reinterpret_cast instead of typecast

* Optimize HIP kernel

* Cleanup and few optimizations for HOST

* Replace all occurrences of static_cast with reinterpret_cast

Includes using cvtps_ph instead of direct typecasting

* Optimize HIP pkd3 loads by loading 24 pixels in one shot

* Fix f16 PLN3->PKD3 variant

* Merge branch 'master' of https://github.com/r-abishek/rpp into sn/phase_tensor

* Remove ricap mods from phase PR branch

* Remove ricap changes from phase PR branch

* Change to small f

* minor bug fix for phase hip pkd kernel

* Minor docs fix

* Add fix for warnings on hip build

* Fix static_cast used in f16 HOST backend

* Cleanup

Enable AVX optimization only when AVX flag is enabled

* Fix constant definition

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Snehaa Giridharan <snehaa@multicorewareinc.com>
Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>
Co-authored-by: Snehaa-Giridharan <118163708+snehaa8@users.noreply.github.com>
Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
Co-authored-by: Lisa <lisajdelaney@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
8 people committed Dec 22, 2023
1 parent 81823d5 commit 8999c22
Showing 17 changed files with 1,603 additions and 15 deletions.
44 changes: 44 additions & 0 deletions include/rppt_tensor_geometric_augmentations.h
@@ -398,6 +398,50 @@ RppStatus rppt_rotate_host(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dst
RppStatus rppt_rotate_gpu(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, Rpp32f *angle, RpptInterpolationType interpolationType, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);
#endif // GPU_SUPPORT

/*! \brief Phase augmentation on HOST backend for an NCHW/NHWC layout tensor
 * \details The phase augmentation computes the phase of each pair of corresponding pixels across the two source tensors, for a batch of RGB (3 channel) / greyscale (1 channel) images with an NHWC/NCHW tensor layout.<br>
* - srcPtr depth ranges - Rpp8u (0 to 255), Rpp16f (0 to 1), Rpp32f (0 to 1), Rpp8s (-128 to 127).
* - dstPtr depth ranges - Will be same depth as srcPtr.
* \image html img150x150.jpg Sample Input1
* \image html img150x150_2.jpg Sample Input2
* \image html phase_augmentation_img150x150.jpg Sample Output
* \param [in] srcPtr1 source1 tensor in HOST memory
* \param [in] srcPtr2 source2 tensor in HOST memory
* \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = 1/3)
* \param [out] dstPtr destination tensor in HOST memory
* \param [in] dstDescPtr destination tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = same as that of srcDescPtr)
 * \param [in] roiTensorPtrSrc ROI data in HOST memory, for each image in source tensor (2D tensor of size batchSize * 4, in either format - XYWH(xy.x, xy.y, roiWidth, roiHeight) or LTRB(lt.x, lt.y, rb.x, rb.y))
* \param [in] roiType ROI type used (RpptRoiType::XYWH or RpptRoiType::LTRB)
* \param [in] rppHandle RPP HOST handle created with <tt>\ref rppCreateWithBatchSize()</tt>
* \return A <tt> \ref RppStatus</tt> enumeration.
* \retval RPP_SUCCESS Successful completion.
* \retval RPP_ERROR* Unsuccessful completion.
*/
RppStatus rppt_phase_host(RppPtr_t srcPtr1, RppPtr_t srcPtr2, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);
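For orientation, a minimal HOST-side usage sketch follows (illustrative only, not part of this commit). It assumes the RpptDesc/RpptROI member names from rppdefs.h and a single 150x150 RGB NHWC image per source; exact handle-creation signatures can vary between RPP versions.

#include <rpp.h>
#include <vector>

int main()
{
    const Rpp32u batchSize = 1, height = 150, width = 150, channels = 3;

    // Identical U8/NHWC descriptors for both sources and the destination,
    // per the restrictions documented above.
    RpptDesc srcDesc = {}, dstDesc = {};
    srcDesc.dataType = RpptDataType::U8;
    srcDesc.layout = RpptLayout::NHWC;
    srcDesc.numDims = 4;
    srcDesc.offsetInBytes = 0;
    srcDesc.n = batchSize; srcDesc.h = height; srcDesc.w = width; srcDesc.c = channels;
    srcDesc.strides.nStride = height * width * channels;
    srcDesc.strides.hStride = width * channels;
    srcDesc.strides.wStride = channels;
    srcDesc.strides.cStride = 1;
    dstDesc = srcDesc;

    // Two source images and one destination buffer, batchSize * nStride pixels each.
    std::vector<Rpp8u> src1(batchSize * srcDesc.strides.nStride);
    std::vector<Rpp8u> src2(src1.size()), dst(src1.size());
    // ... fill src1/src2 with image data ...

    RpptROI roi;                                    // full-image ROI in XYWH form
    roi.xywhROI = {{0, 0}, (Rpp32s)width, (Rpp32s)height};

    rppHandle_t handle;
    rppCreateWithBatchSize(&handle, batchSize);     // HOST handle, as documented above
    rppt_phase_host(src1.data(), src2.data(), &srcDesc, dst.data(), &dstDesc,
                    &roi, RpptRoiType::XYWH, handle);
    rppDestroyHost(handle);
    return 0;
}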

#ifdef GPU_SUPPORT
/*! \brief Phase augmentation on HIP backend for an NCHW/NHWC layout tensor
 * \details The phase augmentation computes the phase of each pair of corresponding pixels across the two source tensors, for a batch of RGB (3 channel) / greyscale (1 channel) images with an NHWC/NCHW tensor layout.<br>
* - srcPtr depth ranges - Rpp8u (0 to 255), Rpp16f (0 to 1), Rpp32f (0 to 1), Rpp8s (-128 to 127).
* - dstPtr depth ranges - Will be same depth as srcPtr.
* \image html img150x150.jpg Sample Input1
* \image html img150x150_2.jpg Sample Input2
* \image html phase_augmentation_img150x150.jpg Sample Output
* \param [in] srcPtr1 source1 tensor in HIP memory
* \param [in] srcPtr2 source2 tensor in HIP memory
* \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = 1/3)
* \param [out] dstPtr destination tensor in HIP memory
* \param [in] dstDescPtr destination tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = same as that of srcDescPtr)
 * \param [in] roiTensorPtrSrc ROI data in HIP memory, for each image in source tensor (2D tensor of size batchSize * 4, in either format - XYWH(xy.x, xy.y, roiWidth, roiHeight) or LTRB(lt.x, lt.y, rb.x, rb.y))
* \param [in] roiType ROI type used (RpptRoiType::XYWH or RpptRoiType::LTRB)
* \param [in] rppHandle RPP HIP handle created with <tt>\ref rppCreateWithStreamAndBatchSize()</tt>
* \return A <tt> \ref RppStatus</tt> enumeration.
* \retval RPP_SUCCESS Successful completion.
* \retval RPP_ERROR* Unsuccessful completion.
*/
RppStatus rppt_phase_gpu(RppPtr_t srcPtr1, RppPtr_t srcPtr2, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);
#endif // GPU_SUPPORT
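A matching HIP-side sketch (again illustrative, reusing the descriptors, ROI, and host buffers from the HOST example above): the pixel buffers and the ROI array must live in HIP memory, and the handle comes from rppCreateWithStreamAndBatchSize() as documented.

#include <hip/hip_runtime.h>

// Hypothetical device-side setup; srcDesc/dstDesc/roi/src1/src2/dst as in the HOST sketch.
hipStream_t stream;
hipStreamCreate(&stream);

size_t bytes = batchSize * srcDesc.strides.nStride * sizeof(Rpp8u);
Rpp8u *d_src1, *d_src2, *d_dst;
hipMalloc(&d_src1, bytes); hipMalloc(&d_src2, bytes); hipMalloc(&d_dst, bytes);
hipMemcpy(d_src1, src1.data(), bytes, hipMemcpyHostToDevice);
hipMemcpy(d_src2, src2.data(), bytes, hipMemcpyHostToDevice);

RpptROI *d_roi;                                     // ROI data must also be in HIP memory
hipMalloc(&d_roi, sizeof(RpptROI));
hipMemcpy(d_roi, &roi, sizeof(RpptROI), hipMemcpyHostToDevice);

rppHandle_t handle;
rppCreateWithStreamAndBatchSize(&handle, stream, batchSize);
rppt_phase_gpu(d_src1, d_src2, &srcDesc, d_dst, &dstDesc, d_roi, RpptRoiType::XYWH, handle);
hipMemcpy(dst.data(), d_dst, bytes, hipMemcpyDeviceToHost);  // result back to host
rppDestroyGPU(handle);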

/*! \brief Slice augmentation HOST
* \details This function performs slice augmentation on a generic 4D tensor.
* Slice augmentation involves selecting a region of interest (ROI) from the source tensor
2 changes: 2 additions & 0 deletions src/include/cpu/rpp_cpu_common.hpp
@@ -69,6 +69,8 @@ typedef halfhpp Rpp16f;
#define RPP_2POW32_INV_DIV_2 1.164153218e-10f // RPP_2POW32_INV / 2
#define RPP_2POW32_INV_MUL_2PI 1.46291812e-09f // (1 / 2^32) * 2PI
#define RPP_2POW32_INV_MUL_2PI_DIV_2 7.3145906e-10f // RPP_2POW32_INV_MUL_2PI / 2
#define RPP_255_OVER_1PT57 162.3380757272f // (255 / 1.570796) - multiplier used in phase computation
#define ONE_OVER_1PT57 0.6366199048f // (1 / 1.570796) i.e. 2/pi - multiplier used in phase computation
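A scalar illustration of what these two multipliers imply (a hedged sketch, not the committed kernel; the vectorized HOST/HIP implementations are authoritative, including the exact atan2 argument order and range handling):

#include <cmath>

// Hypothetical per-pixel reference: scale atan2's [0, pi/2] output for
// non-negative inputs onto full scale - 255 for U8, 1.0 for F32/F16.
inline float phasePixelU8(float src1Pix, float src2Pix)
{
    return 162.3380757272f /* RPP_255_OVER_1PT57 */ * std::atan2(src2Pix, src1Pix);
}

inline float phasePixelF32(float src1Pix, float src2Pix)
{
    return 0.6366199048f /* ONE_OVER_1PT57 */ * std::atan2(src2Pix, src1Pix);
}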

const __m128 xmm_p2Pow32 = _mm_set1_ps(RPP_2POW32);
const __m128 xmm_p2Pow32Inv = _mm_set1_ps(RPP_2POW32_INV);
129 changes: 123 additions & 6 deletions src/include/cpu/rpp_cpu_simd.hpp
@@ -1891,13 +1891,13 @@ static const __m128 _ps_coscof_p1 = _mm_set1_ps(-1.388731625493765E-003f);
static const __m128 _ps_coscof_p2 = _mm_set1_ps( 4.166664568298827E-002f);
static const __m128 _ps_cephes_FOPI = _mm_set1_ps(1.27323954473516f); // 4 / M_PI

static const __m256 _ps_1_avx = _mm256_set1_ps(1.f);
static const __m256 _ps_0p5_avx = _mm256_set1_ps(0.5f);
static const __m256 _ps_n0p5_avx = _mm256_set1_ps(-0.5f);
static const __m256 _ps_1p5_avx = _mm256_set1_ps(1.5f);
static const __m256 _ps_min_norm_pos_avx = set1_ps_hex_avx(0x00800000);
static const __m256 _ps_inv_mant_mask_avx = set1_ps_hex_avx(~0x7f800000);
static const __m256 _ps_sign_mask_avx = set1_ps_hex_avx(0x80000000);
static const __m256 _ps_inv_sign_mask_avx = set1_ps_hex_avx(~0x80000000);

static const __m256i _pi32_1_avx = _mm256_set1_epi32(1);
static const __m256i _pi32_inv1_avx = _mm256_set1_epi32(~1);
@@ -1969,7 +1969,7 @@ static inline void sincos_ps(__m256 x, __m256 *s, __m256 *c)
y = _mm256_mul_ps(y, _mm256_mul_ps(z, z));
__m256 tmp = _mm256_mul_ps(z, _ps_0p5_avx);
y = _mm256_sub_ps(y, tmp);
- y = _mm256_add_ps(y, _ps_1_avx);
+ y = _mm256_add_ps(y, avx_p1);

// Evaluate the second polynom (Pi/4 <= x <= 0)

@@ -2083,9 +2083,9 @@ static const __m128 _ps_cephes_PIO2F = _mm_set1_ps(1.5707963267948966192);
static const __m128 _ps_cephes_PIO4F = _mm_set1_ps(0.7853981633974483096);

static const __m128 _ps_atancof_p0 = _mm_set1_ps(8.05374449538e-2);
- static const __m128 _ps_atancof_p1 = _mm_set1_ps(1.38776856032E-1);
- static const __m128 _ps_atancof_p2 = _mm_set1_ps(1.99777106478E-1);
- static const __m128 _ps_atancof_p3 = _mm_set1_ps(3.33329491539E-1);
+ static const __m128 _ps_atancof_p1 = _mm_set1_ps(1.38776856032e-1);
+ static const __m128 _ps_atancof_p2 = _mm_set1_ps(1.99777106478e-1);
+ static const __m128 _ps_atancof_p3 = _mm_set1_ps(3.33329491539e-1);

static inline __m128 atan_ps( __m128 x )
{
@@ -2195,12 +2195,129 @@ static inline __m128 atan2_ps( __m128 y, __m128 x )
return result;
}

static const __m256 _ps_atanrange_hi_avx = _mm256_set1_ps(2.414213562373095);
static const __m256 _ps_atanrange_lo_avx = _mm256_set1_ps(0.4142135623730950);
static const __m256 _ps_cephes_PIF_avx = _mm256_set1_ps(3.141592653589793238);
static const __m256 _ps_cephes_PIO2F_avx = _mm256_set1_ps(1.5707963267948966192);
static const __m256 _ps_cephes_PIO4F_avx = _mm256_set1_ps(0.7853981633974483096);

static const __m256 _ps_atancof_p0_avx = _mm256_set1_ps(8.05374449538e-2);
static const __m256 _ps_atancof_p1_avx = _mm256_set1_ps(1.38776856032e-1);
static const __m256 _ps_atancof_p2_avx = _mm256_set1_ps(1.99777106478e-1);
static const __m256 _ps_atancof_p3_avx = _mm256_set1_ps(3.33329491539e-1);

// AVX2 version of the atan_ps() SSE version
static inline __m256 atan_ps(__m256 x)
{
__m256 sign_bit, y;

sign_bit = x;
// Take the absolute value
x = _mm256_and_ps(x, _ps_inv_sign_mask_avx);
// Extract the sign bit (upper one)
sign_bit = _mm256_and_ps(sign_bit, _ps_sign_mask_avx);

// Range reduction, init x and y depending on range
// x > 2.414213562373095
__m256 cmp0 = _mm256_cmp_ps(x, _ps_atanrange_hi_avx, _CMP_GT_OS);
// x > 0.4142135623730950
__m256 cmp1 = _mm256_cmp_ps(x, _ps_atanrange_lo_avx, _CMP_GT_OS);

// x > 0.4142135623730950 && !(x > 2.414213562373095)
__m256 cmp2 = _mm256_andnot_ps(cmp0, cmp1);

// -(1.0/x);
__m256 y0 = _mm256_and_ps(cmp0, _ps_cephes_PIO2F_avx);
__m256 x0 = _mm256_div_ps(avx_p1, x);
x0 = _mm256_xor_ps(x0, _ps_sign_mask_avx);

__m256 y1 = _mm256_and_ps(cmp2, _ps_cephes_PIO4F_avx);
// (x-1.0)/(x+1.0)
__m256 x1_o = _mm256_sub_ps(x, avx_p1);
__m256 x1_u = _mm256_add_ps(x, avx_p1);
__m256 x1 = _mm256_div_ps(x1_o, x1_u);

__m256 x2 = _mm256_and_ps(cmp2, x1);
x0 = _mm256_and_ps(cmp0, x0);
x2 = _mm256_or_ps(x2, x0);
cmp1 = _mm256_or_ps(cmp0, cmp2);
x2 = _mm256_and_ps(cmp1, x2);
x = _mm256_andnot_ps(cmp1, x);
x = _mm256_or_ps(x2, x);

y = _mm256_or_ps(y0, y1);

__m256 zz = _mm256_mul_ps(x, x);
__m256 acc = _ps_atancof_p0_avx;
acc = _mm256_fmsub_ps(acc, zz, _ps_atancof_p1_avx);
acc = _mm256_fmadd_ps(acc, zz, _ps_atancof_p2_avx);
acc = _mm256_fmsub_ps(acc, zz, _ps_atancof_p3_avx);
acc = _mm256_mul_ps(acc, zz);
acc = _mm256_fmadd_ps(acc, x, x);
y = _mm256_add_ps(y, acc);

// Update the sign
y = _mm256_xor_ps(y, sign_bit);

return y;
}
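A quick lane-by-lane comparison against libm is an easy way to sanity-check this path (a test sketch written here for illustration; it is not part of the commit):

#include <cmath>
#include <cstdio>

static void checkAtanPsAvx()
{
    float in[8] = {-10.0f, -2.0f, -0.5f, -0.1f, 0.1f, 0.5f, 2.0f, 10.0f}, out[8];
    _mm256_storeu_ps(out, atan_ps(_mm256_loadu_ps(in)));    // AVX2 overload above
    for (int i = 0; i < 8; i++)
        printf("atan(%6.2f): simd = %.7f  libm = %.7f\n", in[i], out[i], std::atan(in[i]));
}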

// AVX2 version of the atan2_ps() SSE version
static inline __m256 atan2_ps(__m256 y, __m256 x)
{
__m256 x_eq_0 = _mm256_cmp_ps(x, avx_p0, _CMP_EQ_OQ);
__m256 x_gt_0 = _mm256_cmp_ps(x, avx_p0, _CMP_GT_OS);
__m256 x_le_0 = _mm256_cmp_ps(x, avx_p0, _CMP_LE_OS);
__m256 y_eq_0 = _mm256_cmp_ps(y, avx_p0, _CMP_EQ_OQ);
__m256 x_lt_0 = _mm256_cmp_ps(x, avx_p0, _CMP_LT_OS);
__m256 y_lt_0 = _mm256_cmp_ps(y, avx_p0, _CMP_LT_OS);

// Computes a zero mask, set if either both x=y=0 or y=0&x>0
__m256 zero_mask = _mm256_and_ps(x_eq_0, y_eq_0);
__m256 zero_mask_other_case = _mm256_and_ps(y_eq_0, x_gt_0);
zero_mask = _mm256_or_ps(zero_mask, zero_mask_other_case);

// Computes pio2 intermediate result, set if (y != 0 and x = 0) & (pi/2 XOR (upper bit y<0))
__m256 pio2_mask = _mm256_andnot_ps(y_eq_0, x_eq_0);
__m256 pio2_mask_sign = _mm256_and_ps(y_lt_0, _ps_sign_mask_avx);
__m256 pio2_result = _ps_cephes_PIO2F_avx;
pio2_result = _mm256_xor_ps(pio2_result, pio2_mask_sign);
pio2_result = _mm256_and_ps(pio2_mask, pio2_result);

// Computes pi intermediate result, set if y=0&x<0 and pi
__m256 pi_mask = _mm256_and_ps(y_eq_0, x_le_0);
__m256 pi_result = _mm256_and_ps(pi_mask, _ps_cephes_PIF_avx);

// Computes swap_sign_mask_offset - sign bit (uppermost bit) set if x<0 & y<0
__m256 swap_sign_mask_offset = _mm256_and_ps(x_lt_0, y_lt_0);
swap_sign_mask_offset = _mm256_and_ps(swap_sign_mask_offset, _ps_sign_mask_avx);

// Computes offset, set based on pi, swap_sign_mask_offset and x<0
__m256 offset0 = _mm256_xor_ps(_ps_cephes_PIF_avx, swap_sign_mask_offset);
__m256 offset = _mm256_andnot_ps(x_lt_0, avx_p0);
offset = _mm256_and_ps(x_lt_0, offset0);

// Computes division of x,y
__m256 arg = _mm256_div_ps(y, x);
__m256 atan_result = atan_ps(arg);
atan_result = _mm256_add_ps(atan_result, offset);

// Select between zero_result, pio2_result and atan_result
__m256 result = _mm256_andnot_ps(zero_mask, pio2_result);
atan_result = _mm256_andnot_ps(pio2_mask, atan_result);
atan_result = _mm256_andnot_ps(pio2_mask, atan_result);
result = _mm256_or_ps(result, atan_result);
result = _mm256_or_ps(result, pi_result);

return result;
}
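Similarly, a hedged sanity sketch for this atan2_ps overload (reusing the includes from the sketch above), exercising all four quadrants plus the x = 0 and y = 0 special cases the masks handle:

static void checkAtan2PsAvx()
{
    float ys[8] = {1.0f, 1.0f, -1.0f, -1.0f, 2.0f, -2.0f, 0.0f, 0.0f};
    float xs[8] = {1.0f, -1.0f, 1.0f, -1.0f, 0.0f, 0.0f, 3.0f, -3.0f};
    float out[8];
    _mm256_storeu_ps(out, atan2_ps(_mm256_loadu_ps(ys), _mm256_loadu_ps(xs)));
    for (int i = 0; i < 8; i++)
        printf("atan2(%5.2f, %5.2f): simd = %.7f  libm = %.7f\n",
               ys[i], xs[i], out[i], std::atan2(ys[i], xs[i]));
}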

// Modified AVX2 version of the original SSE version at https://github.com/RJVB/sse_mathfun/blob/master/sse_mathfun.h
static inline __m256 log_ps(__m256 x)
{
__m256 e;
__m256i emm0;
- __m256 one = *(__m256 *)&_ps_1_avx;
+ __m256 one = *(__m256 *)&avx_p1;
__m256 invalid_mask = _mm256_cmp_ps(x, avx_p0, _CMP_LE_OQ);

// cut off denormalized stuff
15 changes: 11 additions & 4 deletions src/include/hip/rpp_hip_common.hpp
@@ -136,6 +136,8 @@ struct RPPTensorFunctionMetaData
#define RPP_2POW32_INV_DIV_2 1.164153218e-10f // RPP_2POW32_INV / 2
#define RPP_2POW32_INV_MUL_2PI 1.46291812e-09f // (1 / 2^32) * 2PI
#define RPP_2POW32_INV_MUL_2PI_DIV_2 7.3145906e-10f // RPP_2POW32_INV_MUL_2PI / 2
#define RPP_255_OVER_1PT57 162.3380757272f // (255 / 1.570796) - multiplier used in phase computation
#define ONE_OVER_1PT57 0.6366199048f // (1 / 1.570796) i.e. 2/pi - multiplier used in phase computation
#define SMEM_LENGTH_X 128 // Shared memory length of 128 cols to efficiently utilize all 16 LOCAL_THREADS_X as 16 * 8-byte vectorized global read/writes per thread = 128 bytes, fitting in 32 banks 4 byte wide
#define SMEM_LENGTH_Y_1C 16 // Shared memory length of 16 rows to efficiently utilize all 16 LOCAL_THREADS_Y as 1 128-byte-long row per thread (single channel greyscale)
#define SMEM_LENGTH_Y_3C 48 // Shared memory length of 48 rows to efficiently utilize all 16 LOCAL_THREADS_Y as 3 128-byte-long rows per thread (three channel rgb)
@@ -218,10 +218,15 @@ __device__ __forceinline__ void rpp_hip_pixel_check_and_store(float pixel, half*

__device__ __forceinline__ float4 rpp_hip_pixel_check_0to255(float4 src_f4)
{
- return make_float4(fminf(fmaxf(src_f4.x, 0), 255),
-                    fminf(fmaxf(src_f4.y, 0), 255),
-                    fminf(fmaxf(src_f4.z, 0), 255),
-                    fminf(fmaxf(src_f4.w, 0), 255));
+ return make_float4(fminf(fmaxf(src_f4.x, 0.0f), 255.0f),
+                    fminf(fmaxf(src_f4.y, 0.0f), 255.0f),
+                    fminf(fmaxf(src_f4.z, 0.0f), 255.0f),
+                    fminf(fmaxf(src_f4.w, 0.0f), 255.0f));
}

__device__ __forceinline__ float rpp_hip_pixel_check_0to255(float src_f1)
{
return fminf(fmaxf(src_f1, 0), 255);
}

// float4 pixel check for 0-1 range
1 change: 1 addition & 0 deletions src/modules/cpu/host_tensor_geometric_augmentations.hpp
@@ -30,6 +30,7 @@ THE SOFTWARE.
#include "kernel/resize_mirror_normalize.hpp"
#include "kernel/resize_crop_mirror.hpp"
#include "kernel/warp_affine.hpp"
#include "kernel/phase.hpp"
#include "kernel/slice.hpp"

#endif // HOST_TENSOR_GEOMETRIC_AUGMENTATIONS_HPP
