
Radeon Loom Stitching Library (vx_loomsl)

Radeon Loom Stitching Library (beta preview) is a highly optimized library for 360 degree video stitching applications. This library consists of:

  • Live Stitch API: stitching framework built on top of OpenVX kernels (see live_stitch_api.h for API)
  • OpenVX module [vx_loomsl]: additional OpenVX kernels needed for 360 degree video stitching

The loom_shell command-line tool can be used to build your application quickly. It provides direct access to the Live Stitch API by encapsulating the calls, which makes rapid prototyping easy.
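As an illustration of how loom_shell encapsulates the API, a script is essentially a sequence of Live Stitch API calls written in C-like statements. The sketch below is hypothetical (it simply mirrors the API calls used in the examples later in this document; consult the loom_shell documentation for the exact script grammar):

```
ls_context context;
context = lsCreateContext();
lsSetOutputConfig(context, VX_DF_IMAGE_RGB, 3840, 1920);
lsSetCameraConfig(context, 16, 1, VX_DF_IMAGE_RGB, 1920, 1080 * 16);
lsImportConfiguration(context, "pts", "myrig.pts");
lsInitialize(context);
lsScheduleFrame(context);
lsWaitForCompletion(context);
lsReleaseContext(&context);
```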

This software is provided under an MIT-style license; see the file COPYRIGHT.txt for details.


Features:

  • Real-time live 360 degree video stitching, optimized for Radeon Pro Graphics
  • Up to 31 cameras
  • Up to 7680x3840 output resolution
  • RGB and YUV 4:2:2 image formats
  • Overlay of other videos on top of the stitched video
  • Support for 3rd-party LoomIO plug-ins for camera capture and stitched output
  • PTGui project export/import for camera calibration

Live Stitch API: Simple Example

Consider a 360 degree rig with three 1080p cameras using circular fisheye lenses. The example below shows how to stitch frames from these cameras into a 4K equirectangular buffer.

#include "vx_loomsl/live_stitch_api.h"
#include "utils/loom_shell/loom_shell_util.h"

int main()
{
  // define camera orientation and lens parameters
  camera_params cam1_par = {{ 120,0,90,0,0,0},{176,1094,547,0,-37,ptgui_lens_fisheye_circ,-0.1719,0.1539,1.0177}};
  camera_params cam2_par = {{   0,0,90,0,0,0},{176,1094,547,0,-37,ptgui_lens_fisheye_circ,-0.1719,0.1539,1.0177}};
  camera_params cam3_par = {{-120,0,90,0,0,0},{176,1094,547,0,-37,ptgui_lens_fisheye_circ,-0.1719,0.1539,1.0177}};

  // create a live stitch instance, configure, and initialize
  ls_context context;
  context = lsCreateContext();
  lsSetOutputConfig(context, VX_DF_IMAGE_RGB, 3840, 1920);
  lsSetCameraConfig(context, 3, 1, VX_DF_IMAGE_RGB, 1920, 1080 * 3);
  lsSetCameraParams(context, 0, &cam1_par);
  lsSetCameraParams(context, 1, &cam2_par);
  lsSetCameraParams(context, 2, &cam3_par);
  lsInitialize(context);

  // get the OpenCL context and create OpenCL buffers for input and output
  cl_context opencl_context;
  cl_mem buf[2];
  lsGetOpenCLContext(context, &opencl_context);
  createBuffer(opencl_context, 3 * 1920 * 1080 * 3, &buf[0]);
  createBuffer(opencl_context, 3 * 3840 * 1920  , &buf[1]);

  // load CAM00.bmp, CAM01.bmp, and CAM02.bmp (1920x1080 each) into buf[0]

  // set input and output buffers and stitch a frame
  lsSetCameraBuffer(context, &buf[0]);
  lsSetOutputBuffer(context, &buf[1]);
  lsScheduleFrame(context);
  lsWaitForCompletion(context);

  // save the stitched output in buf[1] into "output.bmp"

  // release resources
  lsReleaseContext(&context);
  return 0;
}

Live Stitch API: Real-time Live Stitch using LoomIO

This example uses 3rd-party LoomIO plug-ins for live camera capture and display.

#include "vx_loomsl/live_stitch_api.h"

int main()
{
    // create context, configure, and initialize
    ls_context context;
    context = lsCreateContext();
    lsSetOutputConfig(context, VX_DF_IMAGE_RGB, 3840, 1920);
    lsSetCameraConfig(context, 16, 1, VX_DF_IMAGE_RGB, 1920, 1080 * 16);
    lsImportConfiguration(context, "pts", "myrig.pts");
    lsSetCameraModule(context, "vx_loomio_bm", "com.amd.loomio_bm.capture", "30,0,0,16");
    lsSetOutputModule(context, "vx_loomio_bm", "com.amd.loomio_bm.display", "30,0,0");
    lsInitialize(context);

    // process frames live from the camera until aborted by the capture plug-in
    for (;;) {
      vx_status status;
      status = lsScheduleFrame(context);
      if (status != VX_SUCCESS) break;
      status = lsWaitForCompletion(context);
      if (status != VX_SUCCESS) break;
    }

    // release the context
    lsReleaseContext(&context);
    return 0;
}