Skip to content

No GPU backend for Tuna #681

Merged
daniellowell merged 8 commits intodevelopfrom
nogpu_bend
Jan 27, 2021
Merged

No GPU backend for Tuna #681
daniellowell merged 8 commits intodevelopfrom
nogpu_bend

Conversation

@JehandadKhan
Copy link
Copy Markdown
Contributor

@JehandadKhan JehandadKhan commented Jan 14, 2021

The PR adds another backend to MIOpen to support operations that do not require a GPU. This is accomplished by modifying the HIP backend so that the HIP API calls corresponding to those operations are not issued.

Additionally, this PR enables the access to the handle implementation outside MIOpen so that the client application can override Handle behaviors around device properties.

This backend can be used to enable functionality that only depends on the host side code and primarily intended to support testing and Tuna related operation around collection of data and library introspection purposes.

@codecov
Copy link
Copy Markdown

codecov bot commented Jan 15, 2021

Codecov Report

Merging #681 (70cc1fd) into develop (1c46da8) will increase coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #681      +/-   ##
===========================================
+ Coverage    52.40%   52.42%   +0.01%     
===========================================
  Files          298      298              
  Lines        46059    46059              
===========================================
+ Hits         24139    24148       +9     
+ Misses       21920    21911       -9     
Impacted Files Coverage Δ
src/db_record.cpp 87.50% <0.00%> (-1.79%) ⬇️
src/include/miopen/sqlite_db.hpp 85.53% <0.00%> (-0.43%) ⬇️
src/sqlite_db.cpp 85.61% <0.00%> (+2.73%) ⬆️
src/include/miopen/exp_backoff.hpp 94.44% <0.00%> (+38.88%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1c46da8...70cc1fd. Read the comment docs.

@JehandadKhan
Copy link
Copy Markdown
Contributor Author

@atamazov Can I bother you to take a look at this PR

Comment thread CMakeLists.txt Outdated
"Which of MIOpens's backends to use?" )
set_property( CACHE MIOPEN_BACKEND PROPERTY STRINGS
OpenCL HIP HIPOC )
OpenCL HIP HIPOC NOGPU)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So maybe this should be name HIPNOGPU since it still requires hip runtime but no gpu.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment thread src/nogpu/handle.cpp Outdated
const Allocator::ManageDataPtr& /* ddata */,
std::size_t /* sz */) const
{
MIOPEN_HANDLE_LOCK
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This handle lock is actually for the GPU, so I dont think its needed here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

Comment thread src/nogpu/handle.cpp Outdated
std::size_t /* sz */) const
{
MIOPEN_HANDLE_LOCK
this->Finish();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Finish does nothing, this function call could be removed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@pfultz2
Copy link
Copy Markdown
Contributor

pfultz2 commented Jan 19, 2021

We should probably add some tests for this on jenkins so we can make sure someone doesn't accidentally break this in another PR. I guess at first, we could just add jenkins job that just builds this.

Copy link
Copy Markdown
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please merge latest develop here. I am afraid that some of the latest changes (TargetID related) might break functional compatibility of this PR with develop.

This PR looks like a continuation of #307. Two related questions:

  • There is #560 which is essentially leftovers of #307. Are we going to resolve it and when?
  • It seems like this PR makes some changes done in #307 useless. If so, are we going to revert that changes and when?

Comment thread CMakeLists.txt Outdated
Comment thread CMakeLists.txt Outdated
Comment thread CMakeLists.txt Outdated
Comment thread src/CMakeLists.txt
if( MIOPEN_BACKEND STREQUAL "NOGPU")
list(APPEND MIOpen_Source
hip/hiperrors.cpp
nogpu/handle.cpp
Copy link
Copy Markdown
Contributor

@atamazov atamazov Jan 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Notice] AFAICS this PR does not use a mechanism designed to add new backends -- I mean the HandleImpl machinery, -- in the originally intended way. So I am curious, why.

However, I do not have time to consider this issue in detail. If it doesn't break anything, and it works for you, then okay. For now ;)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR does introduce a new implementation here which has been moved out of the cpp file so that the client program can include the header file and override the behavior when desired.

@ghost
Copy link
Copy Markdown

ghost commented Jan 20, 2021

DeepCode's analysis on #70cc1f found:

  • ℹ️ 2 minor issues. 👇

Top issues

Description Example fixes
Unnecessary parens after 'assert' keyword Occurrences: 🔧 Example fixes
No need to call keys, directly check with the in operator. Occurrences: 🔧 Example fixes

👉 View analysis in DeepCode’s Dashboard | Configure the bot

Copy link
Copy Markdown
Contributor Author

@JehandadKhan JehandadKhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add some tests for this on jenkins so we can make sure someone doesn't accidentally break this in another PR. I guess at first, we could just add jenkins job that just builds this.

Updated the Jenkinsfile

@JehandadKhan
Copy link
Copy Markdown
Contributor Author

Please merge latest develop here. I am afraid that some of the latest changes (TargetID related) might break functional compatibility of this PR with develop.

Updated the branch with latest changes from develop.

This PR looks like a continuation of #307.

@atamazov I concur with your observation, there certainly is considerable overlap. In retrospect we should have implemented this PR from the get go and avoided littering the library with numerous env vars and conditionals to check for them. However, recently it became clear to me that we need a more thorough approach to the issue of Tuna ( + fin) trying to override various MIOpen behaviors which is not achievable using env vars. For example an API centric way to override the device name / cu count and in the future, behavior around the program/kernel cache. After discussing with @pfultz2 I decided to implement the first necessary step (this PR) which may be expanded upon later (for binary cache etc).

Two related questions:

I am not sure, perhaps its best to ask the author on the same issue ie #560

  • It seems like this PR makes some changes done in GPU'less compile  #307 useless. If so, are we going to revert that changes and when?

That is correct, we can revert those changes as soon as the corresponding changes ( using the proposed PR) are implemented in Tuna. In that case the leftovers in #560 would be irrelevant and we can close them.

@JehandadKhan
Copy link
Copy Markdown
Contributor Author

@atamazov and @pfultz2 please re-review

@atamazov
Copy link
Copy Markdown
Contributor

@JehandadKhan Thanks for feedback.

atamazov
atamazov previously approved these changes Jan 20, 2021
Copy link
Copy Markdown
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@JehandadKhan
Copy link
Copy Markdown
Contributor Author

@daniellowell The CI passes

@daniellowell daniellowell merged commit f6ad7ed into develop Jan 27, 2021
@atamazov atamazov deleted the nogpu_bend branch April 8, 2021 17:16
Comment thread src/kernels/migrate_db.py
new_cnt = cur.execute("SELECT count(*) from perf_db").fetchone()[0]
assert(new_cnt == total_cnt)
new_slv_cnt = {}
for cnt, slv in cur.execute("select count(*), solver from perf_db group by solver;"):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JehandadKhan should there be a cur.close() somewhere here?

Comment thread src/nogpu/handle.cpp

std::size_t Handle::GetImage3dMaxWidth() const { return this->impl->img3d_max_width; }

std::size_t Handle::GetWavefrontWidth() const { return this->impl->warp_size; }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the NOGPU backend is used for building kernels. GetWavefrontWidth() should return correct value (in accordance to the target GPU type) in order to Solvers generate proper Solutions. Build options may depend on wavesize. This would lead to cache misses when using a cache filled by kernels built with NOGPU.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants