RPP Remap on HOST and HIP #338

Merged
merged 89 commits into from Jun 5, 2024
Changes from 79 commits (89 commits total)
eddf955
Add Remap Tensor HOST and HIP implementation
HazarathKumarM Jan 17, 2024
c893816
Add testsuite support
snehaa8 Jan 17, 2024
2527978
Fix non layout toggle PKD3 HOST variants
snehaa8 Jan 18, 2024
ca0fde5
Update check condition while comparing QA outputs to support remap
snehaa8 Jan 18, 2024
408f2dd
Fixing naming of HIP tensor as per latest format
snehaa8 Jan 19, 2024
a1f4213
License - updates to 2024 and consistency changes (#298)
r-abishek Jan 31, 2024
7096c1d
Test - Update README.md for test_suite (#299)
r-abishek Jan 31, 2024
07a5f66
Bump rocm-docs-core[api_reference] from 0.33.0 to 0.33.1 in /docs/sph…
dependabot[bot] Feb 6, 2024
e93c591
Fix mismatch between HIP and HOST
snehaa8 Feb 7, 2024
a5e5679
Bump rocm-docs-core[api_reference] from 0.33.1 to 0.33.2 in /docs/sph…
dependabot[bot] Feb 7, 2024
30a6fa0
Change typecast into reinterpret_cast
snehaa8 Feb 8, 2024
3c4b6a4
Modify PLN3 reference output
snehaa8 Feb 8, 2024
0c46d84
Merge branch 'master' of https://github.com/ROCm/rpp into sn/remap
snehaa8 Feb 8, 2024
40073fa
Merge branch 'develop' of https://github.com/ROCm/rpp into sn/remap
snehaa8 Feb 8, 2024
43ed4d5
Update reference outputs
snehaa8 Feb 8, 2024
e8aa6b2
Update doc codeowners (#303)
samjwu Feb 8, 2024
a921332
Documentation - Bump rocm-docs-core[api_reference] from 0.33.2 to 0.3…
dependabot[bot] Feb 9, 2024
54d16d1
Fix PLN3 output corruption of remap
snehaa8 Feb 9, 2024
0f83b1d
Cleanup comments
snehaa8 Feb 9, 2024
30bed4e
Test suite - upgrade 5 qa perf (#305)
kiritigowda Feb 9, 2024
5c423ab
RPP Color Temperature on HOST and HIP (#271)
r-abishek Feb 9, 2024
df6e2c9
RPP Voxel 3D Tensor Add/Subtract scalar on HOST and HIP (#272)
r-abishek Feb 9, 2024
61f56e7
Merge branch 'develop' into sn/remap
r-abishek Feb 13, 2024
a4ed137
RPP Magnitude on HOST and HIP (#278)
r-abishek Feb 14, 2024
1976cbf
Bump rocm-docs-core[api_reference] from 0.34.0 to 0.34.2 in /docs/sph…
dependabot[bot] Feb 16, 2024
ec8f2f0
RPP Tensor Audio Support - Down Mixing (#296)
r-abishek Feb 16, 2024
29a5c82
RPP Voxel 3D Tensor Multiply scalar on HOST and HIP (#306)
r-abishek Feb 16, 2024
98a3c82
Test Suite Bugfix (#307)
r-abishek Feb 16, 2024
c455555
Revert commit "Fix mismatch between HIP and HOST"
snehaa8 Feb 20, 2024
c750beb
Merge branch 'sn/remap' of https://github.com/snehaa8/rpp into sn/remap
snehaa8 Feb 20, 2024
3a5579b
Modify roi input format
snehaa8 Feb 22, 2024
a5937b0
Merge branch 'develop' into sn/remap
snehaa8 Feb 22, 2024
cfebb41
Modify HOST to use numThreads for openMP parallelization
snehaa8 Feb 22, 2024
b69456e
Add tableDescPtr into doxygen docs
snehaa8 Feb 22, 2024
608225b
Bump rocm-docs-core[api_reference] from 0.34.2 to 0.35.0 in /docs/sph…
dependabot[bot] Feb 23, 2024
a7ef385
RPP Reduction - Tensor min and Tensor max on HOST and HIP (#260)
r-abishek Feb 24, 2024
473cde4
CI - Update precheckin.groovy
kiritigowda Feb 24, 2024
368c0d5
Cleanup
snehaa8 Feb 29, 2024
91e56fa
Merge branch 'develop' into sn/remap
snehaa8 Feb 29, 2024
c33af22
Bump rocm-docs-core[api_reference] from 0.35.0 to 0.35.1 in /docs/sph…
dependabot[bot] Mar 6, 2024
14f6334
Bump rocm-docs-core[api_reference] from 0.35.1 to 0.36.0 in /docs/sph…
dependabot[bot] Mar 12, 2024
95c3272
Merge branch 'master' into develop
kiritigowda Mar 12, 2024
641f653
Docs - Bump rocm-docs-core[api_reference] from 0.36.0 to 0.37.0 in /d…
dependabot[bot] Mar 20, 2024
5568573
Link cleanup (#326)
LisaDelaney Mar 20, 2024
a6749ba
Update notes
LisaDelaney Mar 20, 2024
a255906
Docs - Bump rocm-docs-core[api_reference] from 0.37.0 to 0.37.1 in /d…
dependabot[bot] Mar 22, 2024
d3df761
RPP Voxel Flip on HIP and HOST (#285)
r-abishek Mar 23, 2024
ebecb42
RPP Vignette Tensor on HOST and HIP (#311)
r-abishek Mar 23, 2024
f9519be
added missing outputs for image augmentations
sampath1117 Mar 25, 2024
d309411
added gif for voxel input and outputs
sampath1117 Mar 25, 2024
918a297
Merge branch 'master' into sn/remap
snehaa8 Mar 26, 2024
46467c8
modified the output images for water, resize_crop_mirror and resize_m…
sampath1117 Mar 26, 2024
2f6ba34
Merge pull request #253 from sampath1117/sr/doxygen_outputs
r-abishek Mar 27, 2024
0f95723
Merge branch 'ar/doxygen_update_4' of https://github.com/r-abishek/rp…
snehaa8 Mar 28, 2024
58c6b1b
Add doc outputs for remap
snehaa8 Mar 28, 2024
1147bfe
Update CMakeLists.txt
kiritigowda Apr 12, 2024
352fb22
Merge branch 'develop' into sn/remap
snehaa8 Apr 16, 2024
bd6a6c3
Address review comments
snehaa8 Apr 18, 2024
2b80df8
Revert changes in common file
snehaa8 Apr 18, 2024
5e3fc7a
Bump rocm-docs-core[api_reference] from 0.38.1 to 1.0.0 in /docs/sphi…
dependabot[bot] Apr 18, 2024
6c4e179
Fix doxygen comments
snehaa8 Apr 23, 2024
4536723
Fix datatype of remap tables in doxygen comments
snehaa8 Apr 23, 2024
50435f1
Merge pull request #233 from snehaa8/sn/remap
r-abishek Apr 23, 2024
b74a4dc
Revert removal of nearbyintf
snehaa8 Apr 24, 2024
860749c
Merge pull request #266 from snehaa8/sn/remap
r-abishek Apr 25, 2024
77e14ef
Minor common-fixes for HIP (#345)
r-abishek May 7, 2024
e365141
Merge branch 'develop' of https://github.com/ROCm/rpp into ar/opt_remap
r-abishek May 7, 2024
49e12b1
merge fix
r-abishek May 7, 2024
34f3f6d
Readme Updates: --usecase=rocm (#349)
kiritigowda May 8, 2024
ab52683
RPP Tensor Audio Support - Spectrogram (#312)
r-abishek May 8, 2024
ee0d6fe
Update CHANGELOG.md (#352)
r-abishek May 8, 2024
2decd32
RPP Tensor Audio Support - Slice (#325)
r-abishek May 8, 2024
30ce1d6
RPP Tensor Audio Support - MelFilterBank (#332)
r-abishek May 8, 2024
64ae74f
RPP Tensor Normalize ND on HOST and HIP (#335)
r-abishek May 9, 2024
1a3015c
SWDEV-459739 - Remove the package obsolete setting (#353)
raramakr May 9, 2024
4cb8d4b
Audio support merge commit fixes (#354)
r-abishek May 9, 2024
e7b6f9b
Merge branch 'develop' of https://github.com/ROCm/rpp into ar/opt_remap
r-abishek May 14, 2024
d4b6a41
Merge branch 'develop' of https://github.com/ROCm/rpp into ar/opt_remap
r-abishek May 28, 2024
e6d04f5
Merge branch 'develop' into ar/opt_remap
kiritigowda May 29, 2024
a8ede4d
removed unnecesesary inline helper functions
sampath1117 May 31, 2024
69aed52
Merge pull request #275 from sampath1117/remap_pr_changes
r-abishek May 31, 2024
d02a4c3
removed further unncessary inline helpers
sampath1117 Jun 4, 2024
14d5c8f
Merge pull request #276 from sampath1117/sr/remap_changes
r-abishek Jun 4, 2024
6eea0c6
Merge branch 'develop' into ar/opt_remap
r-abishek Jun 4, 2024
ac6b0c7
Update rppt_tensor_geometric_augmentations.h
r-abishek Jun 4, 2024
fcb1939
Update remap.hpp
r-abishek Jun 4, 2024
9643655
Merge branch 'develop' into ar/opt_remap
r-abishek Jun 5, 2024
ed7bd73
Merge branch 'develop' into ar/opt_remap
r-abishek Jun 5, 2024
f4a2f02
Merge branch 'develop' into ar/opt_remap
kiritigowda Jun 5, 2024
50 changes: 50 additions & 0 deletions include/rppt_tensor_geometric_augmentations.h
@@ -584,6 +584,56 @@ RppStatus rppt_flip_voxel_host(RppPtr_t srcPtr, RpptGenericDescPtr srcGenericDes
RppStatus rppt_flip_voxel_gpu(RppPtr_t srcPtr, RpptGenericDescPtr srcGenericDescPtr, RppPtr_t dstPtr, RpptGenericDescPtr dstGenericDescPtr, Rpp32u *horizontalTensor, Rpp32u *verticalTensor, Rpp32u *depthTensor, RpptROI3DPtr roiGenericPtrSrc, RpptRoi3DType roiType, rppHandle_t rppHandle);
#endif // GPU_SUPPORT

/*! \brief Remap augmentation on HOST backend for an NCHW/NHWC layout tensor
* \details Performs a remap operation using user-specified remap tables for a batch of RGB (3-channel) / greyscale (1-channel) images with an NHWC/NCHW tensor layout. For each image, output(x, y) = input(mapx(x, y), mapy(x, y)) for every (x, y) in the destination image.<br>
* - srcPtr depth ranges - Rpp8u (0 to 255), Rpp16f (0 to 1), Rpp32f (0 to 1), Rpp8s (-128 to 127).
* - dstPtr depth ranges - Will be the same depth as srcPtr.
* \image html img150x150.png Sample Input
* \image html geometric_augmentations_remap_img150x150.png Sample Output
* \param [in] srcPtr source tensor in HOST memory
* \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = 1/3)
* \param [out] dstPtr destination tensor in HOST memory
* \param [in] dstDescPtr destination tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = same as that of srcDescPtr)
* \param [in] rowRemapTable Rpp32f row numbers in HOST memory for every pixel in the input batch of images (1D tensor of size width * height * batchSize)
* \param [in] colRemapTable Rpp32f column numbers in HOST memory for every pixel in the input batch of images (1D tensor of size width * height * batchSize)
* \param [in] tableDescPtr table tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = F32, layout = NHWC, c = 1)
* \param [in] interpolationType Interpolation type used in <tt>\ref RpptInterpolationType</tt> (Restrictions - supports only NEAREST_NEIGHBOR and BILINEAR)
* \param [in] roiTensorPtrSrc ROI data in HOST memory, for each image in the source tensor (2D tensor of size batchSize * 4, in either format - XYWH (xy.x, xy.y, roiWidth, roiHeight) or LTRB (lt.x, lt.y, rb.x, rb.y))
* \param [in] roiType ROI type used (RpptRoiType::XYWH or RpptRoiType::LTRB)
* \param [in] rppHandle RPP HOST handle created with <tt>\ref rppCreateWithBatchSize()</tt>
* \return A <tt>\ref RppStatus</tt> enumeration.
* \retval RPP_SUCCESS Successful completion.
* \retval RPP_ERROR* Unsuccessful completion.
* \ingroup group_tensor_geometric
*/
RppStatus rppt_remap_host(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, Rpp32f *rowRemapTable, Rpp32f *colRemapTable, RpptDescPtr tableDescPtr, RpptInterpolationType interpolationType, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);

#ifdef GPU_SUPPORT
/*! \brief Remap augmentation on HIP backend for an NCHW/NHWC layout tensor
* \details Performs a remap operation using user-specified remap tables for a batch of RGB (3-channel) / greyscale (1-channel) images with an NHWC/NCHW tensor layout. For each image, output(x, y) = input(mapx(x, y), mapy(x, y)) for every (x, y) in the destination image.<br>
* - srcPtr depth ranges - Rpp8u (0 to 255), Rpp16f (0 to 1), Rpp32f (0 to 1), Rpp8s (-128 to 127).
* - dstPtr depth ranges - Will be the same depth as srcPtr.
* \image html img150x150.jpg Sample Input
* \image html geometric_augmentations_remap_img150x150.jpg Sample Output
* \param [in] srcPtr source tensor in HIP memory
* \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = 1/3)
* \param [out] dstPtr destination tensor in HIP memory
* \param [in] dstDescPtr destination tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = same as that of srcDescPtr)
* \param [in] rowRemapTable Rpp32f row numbers in HIP memory for every pixel in the input batch of images (1D tensor of size width * height * batchSize)
* \param [in] colRemapTable Rpp32f column numbers in HIP memory for every pixel in the input batch of images (1D tensor of size width * height * batchSize)
* \param [in] tableDescPtr table tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = F32, layout = NHWC, c = 1)
* \param [in] interpolationType Interpolation type used in <tt>\ref RpptInterpolationType</tt> (Restrictions - supports only NEAREST_NEIGHBOR and BILINEAR)
* \param [in] roiTensorPtrSrc ROI data in HIP memory, for each image in the source tensor (2D tensor of size batchSize * 4, in either format - XYWH (xy.x, xy.y, roiWidth, roiHeight) or LTRB (lt.x, lt.y, rb.x, rb.y))
* \param [in] roiType ROI type used (RpptRoiType::XYWH or RpptRoiType::LTRB)
* \param [in] rppHandle RPP HIP handle created with <tt>\ref rppCreateWithStreamAndBatchSize()</tt>
* \return A <tt>\ref RppStatus</tt> enumeration.
* \retval RPP_SUCCESS Successful completion.
* \retval RPP_ERROR* Unsuccessful completion.
* \ingroup group_tensor_geometric
*/
RppStatus rppt_remap_gpu(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, Rpp32f *rowRemapTable, Rpp32f *colRemapTable, RpptDescPtr tableDescPtr, RpptInterpolationType interpolationType, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);
#endif // GPU_SUPPORT

/*! @}
*/

17 changes: 17 additions & 0 deletions src/include/cpu/rpp_cpu_common.hpp
@@ -6444,4 +6444,21 @@ inline void reduce_max_i48_host(__m128i *pMaxR, __m128i *pMaxG, __m128i *pMaxB,
result[0] = _mm_max_epi8(_mm_unpacklo_epi64(px[0], zero), _mm_unpackhi_epi64(px[0], zero));
}

inline void compute_remap_src_loc_sse(Rpp32f *rowRemapTablePtr, Rpp32f *colRemapTablePtr, Rpp32s *locArray, __m128 &pStride, __m128 &pWidthLimit, __m128 &pHeightLimit, const __m128 &pChannel = xmm_p1)
{
__m128 pRowRemapVal = _mm_loadu_ps(rowRemapTablePtr);
pRowRemapVal = _mm_max_ps(_mm_min_ps(pRowRemapVal, pHeightLimit), xmm_p0);
__m128 pColRemapVal = _mm_loadu_ps(colRemapTablePtr);
pColRemapVal = _mm_max_ps(_mm_min_ps(pColRemapVal, pWidthLimit), xmm_p0);
__m128i pxRemappedSrcLoc = _mm_cvtps_epi32(_mm_fmadd_ps(pRowRemapVal, pStride, _mm_mul_ps(pColRemapVal, pChannel)));
_mm_storeu_si128((__m128i*) locArray, pxRemappedSrcLoc);
}

inline void compute_remap_src_loc(Rpp32f rowLoc, Rpp32f colLoc, Rpp32s &srcLoc, Rpp32s stride, Rpp32f widthLimit, Rpp32f heightLimit, Rpp32s channels = 1)
{
rowLoc = std::max(0.0f, std::min(rowLoc, heightLimit));
colLoc = std::max(0.0f, std::min(colLoc, widthLimit));
srcLoc = (rowLoc * stride) + colLoc * channels;
}

#endif //RPP_CPU_COMMON_H
39 changes: 25 additions & 14 deletions src/include/cpu/rpp_cpu_simd.hpp
@@ -3085,10 +3085,17 @@ inline void rpp_store24_f32pln3_to_u8pkd3_avx(Rpp8u* dstPtr, __m256* p)
_mm256_storeu_si256((__m256i *)(dstPtr), px1); /* store the 24 U8 pixels in dst */
}

inline void rpp_store8_u8pln1_to_u8pln1_avx(Rpp8u* dstPtr, __m256i &p)
{
__m128i pTemp = _mm256_castsi256_si128(p);
rpp_storeu_si64((__m128i *)(dstPtr), pTemp);
}

inline void rpp_store8_f32pln1_to_u8pln1_avx(Rpp8u* dstPtr, __m256 &p)
{
__m256i px1 = _mm256_permute4x64_epi64(_mm256_packus_epi32(_mm256_cvtps_epi32(p), avx_px0), _MM_SHUFFLE(3,1,2,0));
_mm256_storeu_si256((__m256i *)(dstPtr), _mm256_packus_epi16(px1, avx_px0));
px1 = _mm256_packus_epi16(px1, avx_px0);
rpp_store8_u8pln1_to_u8pln1_avx(dstPtr, px1);
}

inline void rpp_store24_f32pln3_to_u8pln3_avx(Rpp8u* dstRPtr, Rpp8u* dstGPtr, Rpp8u* dstBPtr, __m256* p)
@@ -3249,11 +3256,17 @@ inline void rpp_store24_f32pln3_to_i8pkd3_avx(Rpp8s* dstPtr, __m256* p)
_mm256_storeu_si256((__m256i *)(dstPtr), px1); /* store the 12 U8 pixels in dst */
}

inline void rpp_store8_i8pln1_to_i8pln1_avx(Rpp8s* dstPtr, __m256i &p)
{
__m128i pTemp = _mm256_castsi256_si128(p);
rpp_storeu_si64((__m128i *)(dstPtr), pTemp);
}

inline void rpp_store8_f32pln1_to_i8pln1_avx(Rpp8s* dstPtr, __m256 &p)
{
__m256i px1 = _mm256_permute4x64_epi64(_mm256_packus_epi32(_mm256_cvtps_epi32(p), avx_px0), _MM_SHUFFLE(3,1,2,0));
px1 = _mm256_sub_epi8(_mm256_packus_epi16(px1, avx_px0), avx_pxConvertI8); /* Pack and add I8 conversion param */
_mm256_storeu_si256((__m256i *)(dstPtr), px1); /* store the 4 pixels in dst */
rpp_storeu_si64((__m128i *)(dstPtr), _mm256_castsi256_si128(px1)); /* store the 8 I8 pixels in dst */
}

inline void rpp_store24_f32pln3_to_i8pln3_avx(Rpp8s* dstRPtr, Rpp8s* dstGPtr, Rpp8s* dstBPtr, __m256* p)
@@ -3787,6 +3800,11 @@ inline void rpp_store24_i8pkd3_to_i8pln3_avx(Rpp8s* dstPtrR, Rpp8s* dstPtrG, Rpp
}

inline void rpp_store4_u8_to_u8(Rpp8u* dstPtr, __m128i &p)
{
rpp_storeu_si32((__m128i *)(dstPtr), p);
}

inline void rpp_store12_u8_to_u8(Rpp8u* dstPtr, __m128i &p)
{
_mm_storeu_si128((__m128i *)(dstPtr), p);
}
@@ -3801,18 +3819,6 @@ inline void rpp_store24_i8_to_i8_avx(Rpp8s* dstPtr, __m256i &p)
_mm256_storeu_si256((__m256i *)(dstPtr), p);
}

inline void rpp_store8_u8pln1_to_u8pln1_avx(Rpp8u* dstPtr, __m256i &p)
{
__m128i pTemp = _mm256_castsi256_si128(p);
rpp_storeu_si64((__m128i *)(dstPtr), pTemp);
}

inline void rpp_store8_i8pln1_to_i8pln1(Rpp8s* dstPtr, __m256i &p)
{
__m128i pTemp = _mm256_castsi256_si128(p);
rpp_storeu_si64((__m128i *)(dstPtr), pTemp);
}

inline void rpp_store12_u8pln3_to_u8pkd3(Rpp8u* dstPtr, __m128i *p)
{
__m128i px[4];
@@ -3845,6 +3851,11 @@ inline void rpp_store12_i8pkd3_to_i8pln3(Rpp8s* dstPtrR, Rpp8s* dstPtrG, Rpp8s*
}

inline void rpp_store4_i8_to_i8(Rpp8s* dstPtr, __m128i &p)
{
Reviewer comment: I suggest not to use these kinds of calling conventions, which call single-line inline functions from other functions; this makes the calls inefficient. Please avoid.

rpp_storeu_si32((__m128i *)(dstPtr), p);
}

inline void rpp_store12_i8_to_i8(Rpp8s* dstPtr, __m128i &p)
{
_mm_storeu_si128((__m128i *)(dstPtr), p);
}
1 change: 1 addition & 0 deletions src/modules/cpu/host_tensor_geometric_augmentations.hpp
@@ -28,6 +28,7 @@ SOFTWARE.
#include "kernel/crop.hpp"
#include "kernel/crop_mirror_normalize.hpp"
#include "kernel/flip.hpp"
#include "kernel/remap.hpp"
#include "kernel/resize.hpp"
#include "kernel/resize_mirror_normalize.hpp"
#include "kernel/resize_crop_mirror.hpp"