Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Merge AMD-HIP port #6257

Open
VincentSC opened this issue May 15, 2017 · 25 comments
Open

Merge AMD-HIP port #6257

VincentSC opened this issue May 15, 2017 · 25 comments

Comments

@VincentSC
Copy link

AMD has just released their port for MXnet to AMD GPUs, to be found here.
Please merge that code, so MXnet works on both NVidia and AMD hardware.

@piiswrong
Copy link
Contributor

I took a quick look but it looks pretty preliminary. Have you verified that its actually working?

@VincentSC
Copy link
Author

Nope, I was hinted on new additions to Github and reacted on that. So you could be totally correct on this.
We're going to benchmark Torch7, Caffe and the MXnet in the coming weeks, and then we would've found out. Torch7 and Caffe are full ports for sure, so I assumed MXnet would too - it would explain the lack of build-instructions.

The good part is that you can choose to integrate it gradually, or wait until it's finished.

@bensander
Copy link

bensander commented Jun 3, 2017

Hi, this is Ben from AMD. The port is in the early stages of development as you note. Glad to see this issue since we wanted to make sure the community was aware of our plans and get some early feedback before we get too deep into the port. Let me provide a little background on the tools and what we'd like to do, and ask a couple questions:

HIP is an open-source toolkit designed to port CUDA code. After porting, the code can run on AMD ROCm GPU platform or still on the original CUDA (with same performance). HIP runtime and C++ language are familiar to CUDA programmers so the porting process is lightweight and does little damage to the original code. The HIP code is easy to maintain since it is close to CUDA, and provides full C++ support including templates, classes, namespaces, etc. HIP runtime code, headers, and porting tools are maintained on GitHub and updated every 2-4 weeks by AMD developers.

AMD is also developing an open-source optimized machine intelligence library called "MIOpen". This will contain convolution, pooling, and other kernels which have been optimized for AMD GPUs. The MIOpen host runtime API accepts HIP parameters for things like streams, and also defines Tensor and other structures for memory management. MIOpen is currently internal but will be open-sourced shortly.

We'd like to use these tools to enable MXNet to run on the ROCm platform. Overall strategy would be to port the CUDA code to HIP, and add the calls into MIOpen in the appropriate layers so it runs fast. The MXNet port is being developed in the open here.

One early design question we could use some help on: the port currently replaces the CUDA code with HIP code. This was an easy way to start since it requires less initial modification to the MXNet make system - we can just port the .cu files and start building. However, we likely want to instead use a design where the HIP files exist alongside the CUDA ones. What is best way to add a new HIP target to the MXNet code - where should we put the files, and what make changes are required?

Thanks! We look forward to participating in the MXNet community,
-ben

@mli
Copy link
Member

mli commented Jun 5, 2017

I suggest adding a compilation flag USE_HIP. Then in makefile, add the following

ifeq ($(USE_HIP), 1)
	CFLAGS += -DMXNET_USE_HIP=1
	LDFLAGS += -lhip  # if there is a libhip.so
endif

So now we can switch between hip and cuda in source codes by

#if MXNET_USE_HIP == 1
// codes
#endif

@antinucleon
Copy link
Contributor

antinucleon commented Jun 5, 2017 via email

@sriharikarnam
Copy link

Hi,

This is Srihari from AMD:
Above approach is one of the case we considered earlier, but this approach duplicates the code for every runtime API and other CUDA changes.
Need your inputs on how to confine HIP porting changes as a seperate backend alongside CUDA.
For example : Adding a new directory at mxnet/src/hip and confine HIP porting changes to this directory.

@sriharikarnam
Copy link

We are thinking of more modular approach, like introducing another flag MXNET_USE_GPU, this would have an inline function defined that will be common for both CUDA and HIP implementations, we would separate the inline function implementations into two different files specific to CUDA and HIP, based on the flag that is defined, the preprocessor would expand the inline function into the corresponding implementation. We think that this approach would help us in future to separate the implentation into different backends(HIP,CUDA,...). Please comment.

@tqchen
Copy link
Member

tqchen commented Jun 12, 2017

That makes sense.

@yajiedesign
Copy link
Contributor

This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!

@xiaoxinyin
Copy link

Just wonder if MXNet can be used with AMD GPUs now? Thanks!

@yajiedesign yajiedesign reopened this Oct 15, 2017
@resure-tech
Copy link

resure-tech commented Oct 15, 2017 via email

@sriharikarnam
Copy link

@xiaoxinyin This is Work In Progress as a separate package. Very soon we will discuss on refactoring the code and its abstractions(HIP backend) to start merging the code to master.

@szha
Copy link
Member

szha commented Jan 14, 2018

@apache/mxnet-committers: This issue has been inactive for the past 90 days. It has no label and needs triage.

For general "how-to" questions, our user forum (and Chinese version) is a good place to get help.

@sriharikarnam
Copy link

@szha We have created a provision for common backend (CUDA and HIP).We are in the process of creating PR for the same.MxNet has 8 subModules[dlpack, dmlc-core, mshadow, nnvm, ps-lite, cub, openmp, googletest] and backend change should go as atomic commit in all sub-modules.So Please suggest way to raise PR in all sub-modules in one go.

@szha
Copy link
Member

szha commented Jan 16, 2018

@sriharikarnam submodules are versioned by commit hash, so you may go ahead and make changes in the submodules first, and once merged, update the submodules in mxnet.

@sriharikarnam
Copy link

We are in-progress of disscusiion with sub module owners.
We will be ready with HIP port of MXNet, which supports both CUDA and HIP path
Need your inputs to upstream the HIP Port.
We are targeting run time API's, we have made changes accordingly, please review and propose changes if any or shall we goahead for the merge with same approach
please go throgh follwing link to understand design changes
1)changes related to runtime api's and cuda_backend creation approach:
ROCm@3d4d4fc
ROCm@00196d8

example:
cudaMalloc will be replaced by gpuMalloc and based on platform selection in make system it will call
cudaMalloc for cuda and hipMalloc for HIP

@zhreshold
Copy link
Member

This change will break all legacy build systems and documents with USE_CUDA=1 enabled. I suggest keep USE_CUDA and add USE_GPU as a dispatcher to turn on either USE_CUDA or USE_HIP in makefile and cmake

@sriharikarnam
Copy link

@zhreshold
mxnet_upstream

please find below changes to retain legacy build system with USE_CUDA.

1) Build system changes:
USE_CUDA will be used to enable/disable CUDA and HIP
USE_NATIVE And USE_HIP will be use to switch between Native MXNet and HIP port.

2) Deep Learning accelerated library(CUDNN):
USE_CUDNN will be used to enable/disbled CUDNN/MIOPEN API.
USE_NATIVE and USE_HIP flag will be used to check platform and HIP_PLATFORM will be used for HIP/CUDA ,HIP/ROCm to switch between CUDNN and MIOPEN API

3)CUDA runtime API
gpuruntime.h will contain definition for runtime api's
e.g. #define gpuMalloc cudaMalloc
based on USE_NATIVE and USE_HIP flag cudaMalloc or hipMalloc will be invoked

@eric-haibin-lin
Copy link
Member

@sriharikarnam could you add a subpage to apache mxnet wiki and send out an email to dev@ list for the community to review the high level design? Making a PR to reveal the code changes will be also helpful.

@sriharikarnam
Copy link

@eric-haibin-lin
Could you please suggest the procedure to add a subpage in mxnet wiki for the design proposal Couldn't find a link in the page where we can upload the design
We had registered for @dev and got approval

@eric-haibin-lin
Copy link
Member

Sorry I was not aware that you cannot edit the wiki page. What's your cwiki.apache.org account/email? I can add you.

For the initial design let's just send out a link to google doc on dev@ for people to comment, and we can copy paste the final design to cwiki later.

@PistonY
Copy link

PistonY commented Nov 16, 2018

M@sriharikarnam Does this has any progress?Is this work still going on?

@sriharikarnam
Copy link

@PistonY
Yes, it is, please review the design doc at cwiki https://cwiki.apache.org/confluence/display/MXNET/Upstreaming+of+Mxnet+HIP+port

@boniek83
Copy link

Any news on this? It has been over 2 years already...

@Titaniumtown
Copy link

Is this not happening?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests