No GPU backend for Tuna by JehandadKhan · Pull Request #681 · ROCm/MIOpen

JehandadKhan · 2021-01-14T16:16:24Z

The PR adds another backend to MIOpen to support operations that do not require a GPU. This is accomplished by modifying the HIP backend so that the HIP API calls corresponding to those operations are not issued.

Additionally, this PR exposes the handle implementation outside MIOpen so that a client application can override Handle behaviors around device properties.

This backend enables functionality that depends only on host-side code. It is primarily intended to support testing and Tuna-related operations such as data collection and library introspection.

codecov · 2021-01-15T09:10:07Z

Codecov Report

Merging #681 (70cc1fd) into develop (1c46da8) will increase coverage by 0.01%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop     #681      +/-   ##
===========================================
+ Coverage    52.40%   52.42%   +0.01%     
===========================================
  Files          298      298              
  Lines        46059    46059              
===========================================
+ Hits         24139    24148       +9     
+ Misses       21920    21911       -9

Impacted Files	Coverage Δ
src/db_record.cpp	`87.50% <0.00%> (-1.79%)`	⬇️
src/include/miopen/sqlite_db.hpp	`85.53% <0.00%> (-0.43%)`	⬇️
src/sqlite_db.cpp	`85.61% <0.00%> (+2.73%)`	⬆️
src/include/miopen/exp_backoff.hpp	`94.44% <0.00%> (+38.88%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1c46da8...70cc1fd.

JehandadKhan · 2021-01-18T18:02:53Z

@atamazov Can I bother you to take a look at this PR?

pfultz2 · 2021-01-19T01:07:56Z

    "Which of MIOpens's backends to use?" )
 set_property( CACHE MIOPEN_BACKEND PROPERTY STRINGS
-    OpenCL HIP HIPOC )
+    OpenCL HIP HIPOC NOGPU)


So maybe this should be named HIPNOGPU, since it still requires the HIP runtime but no GPU.

pfultz2 · 2021-01-19T01:09:13Z

+                    const Allocator::ManageDataPtr& /* ddata */,
+                    std::size_t /* sz */) const
+{
+    MIOPEN_HANDLE_LOCK


This handle lock is actually for the GPU, so I don't think it's needed here.

pfultz2 · 2021-01-19T01:09:52Z

+                    std::size_t /* sz */) const
+{
+    MIOPEN_HANDLE_LOCK
+    this->Finish();


Since Finish does nothing, this function call could be removed.

pfultz2 · 2021-01-19T01:13:00Z

We should probably add some tests for this on Jenkins so we can make sure someone doesn't accidentally break this in another PR. I guess at first, we could just add a Jenkins job that just builds this.

atamazov

Please merge latest develop here. I am afraid that some of the latest changes (TargetID related) might break functional compatibility of this PR with develop.

This PR looks like a continuation of #307. Two related questions:

There is #560 which is essentially leftovers of #307. Are we going to resolve it and when?
It seems like this PR makes some changes done in #307 useless. If so, are we going to revert those changes and when?

atamazov · 2021-01-19T13:15:27Z

+if( MIOPEN_BACKEND STREQUAL "NOGPU")
+    list(APPEND MIOpen_Source
+        hip/hiperrors.cpp
+        nogpu/handle.cpp


[Notice] AFAICS this PR does not use the mechanism designed for adding new backends (I mean the HandleImpl machinery) in the originally intended way. So I am curious why.

However, I do not have time to consider this issue in detail. If it doesn't break anything, and it works for you, then okay. For now ;)

The PR does introduce a new implementation here which has been moved out of the cpp file so that the client program can include the header file and override the behavior when desired.

ghost · 2021-01-20T14:14:35Z

DeepCode's analysis on #70cc1f found:

ℹ️ 2 minor issues. 👇

Top issues

Description	Occurrences
Unnecessary parentheses after the 'assert' keyword	migrate_db.py:24, migrate_db.py:26, migrate_db.py:27
No need to call keys(); check directly with the `in` operator	migrate_db.py:49
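
Both findings are stylistic. A minimal sketch of the suggested forms follows; the flagged lines of migrate_db.py are not shown in this thread, so every variable name and value below is a hypothetical stand-in rather than the script's actual code.

```python
# Hypothetical stand-ins for values the migration script would compute.
total_cnt = 3
new_cnt = 3
old_slv_cnt = {"SolverA": 2, "SolverB": 1}  # illustrative solver names

# `assert` is a statement, not a function, so parentheses are unnecessary.
# They also become a trap once a message is added: `assert(cond, "msg")`
# asserts a non-empty tuple, which is always true.
assert new_cnt == total_cnt, "row count changed during migration"

# Membership tests work on the dict itself; calling `.keys()` is redundant.
found = "SolverA" in old_slv_cnt
missing = "NoSuchSolver" in old_slv_cnt
```

Neither change alters behavior in the simple `assert(a == b)` case; they only remove noise and guard against the tuple pitfall.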

JehandadKhan

We should probably add some tests for this on Jenkins so we can make sure someone doesn't accidentally break this in another PR. I guess at first, we could just add a Jenkins job that just builds this.

Updated the Jenkinsfile

JehandadKhan · 2021-01-20T14:28:00Z

Please merge latest develop here. I am afraid that some of the latest changes (TargetID related) might break functional compatibility of this PR with develop.

Updated the branch with latest changes from develop.

This PR looks like a continuation of #307.

@atamazov I concur with your observation; there certainly is considerable overlap. In retrospect, we should have implemented this PR from the get-go and avoided littering the library with numerous env vars and the conditionals that check for them. However, it recently became clear to me that we need a more thorough approach to Tuna (+ fin) overriding various MIOpen behaviors, which is not achievable using env vars: for example, an API-centric way to override the device name / CU count and, in the future, the behavior around the program/kernel cache. After discussing with @pfultz2, I decided to implement the first necessary step (this PR), which may be expanded upon later (for the binary cache etc.).

Two related questions:

There is #560 (Tracking changes requested in merged PR #307), which is essentially leftovers of #307 (GPU'less compile). Are we going to resolve it and when?

I am not sure; perhaps it's best to ask the author on that issue, i.e. #560.

It seems like this PR makes some changes done in #307 (GPU'less compile) useless. If so, are we going to revert those changes and when?

That is correct. We can revert those changes as soon as the corresponding changes (using the proposed PR) are implemented in Tuna. In that case the leftovers in #560 would be irrelevant and we can close them.

JehandadKhan · 2021-01-20T14:28:20Z

@atamazov and @pfultz2 please re-review

atamazov · 2021-01-20T15:10:14Z

@JehandadKhan Thanks for feedback.

atamazov

LGTM!

JehandadKhan · 2021-01-27T20:08:59Z

@daniellowell The CI passes

alexandraBara · 2021-04-19T16:09:39Z

+new_cnt = cur.execute("SELECT count(*) from perf_db").fetchone()[0]
+assert(new_cnt == total_cnt)
+new_slv_cnt = {}
+for cnt, slv in cur.execute("select count(*), solver from perf_db group by solver;"):


@JehandadKhan should there be a cur.close() somewhere here?
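
One way to make the cleanup explicit, sketched under assumptions: the standard library's `contextlib.closing` calls `cur.close()` when the block exits, even if an assertion or query fails. The helper name `solver_counts` and the single-column `perf_db` schema below are hypothetical and only serve to make the example self-contained; the real script may be organized differently.

```python
import sqlite3
from contextlib import closing


def solver_counts(conn):
    """Return (total rows, {solver: row count}) for the perf_db table."""
    # closing() guarantees cur.close() runs when the block exits.
    with closing(conn.cursor()) as cur:
        total = cur.execute("SELECT count(*) FROM perf_db").fetchone()[0]
        per_solver = {
            slv: cnt
            for cnt, slv in cur.execute(
                "SELECT count(*), solver FROM perf_db GROUP BY solver"
            )
        }
    return total, per_solver


# Throwaway in-memory database with a reduced, hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE perf_db (solver TEXT)")
conn.executemany("INSERT INTO perf_db VALUES (?)", [("A",), ("A",), ("B",)])

total, per_solver = solver_counts(conn)
assert sum(per_solver.values()) == total
conn.close()
```

For a short-lived script the cursor is released when the process exits anyway, so this is mostly about tidiness and about keeping the pattern safe if the code is later reused in a longer-running tool.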

atamazov · 2023-04-28T16:13:53Z

+
+std::size_t Handle::GetImage3dMaxWidth() const { return this->impl->img3d_max_width; }
+
+std::size_t Handle::GetWavefrontWidth() const { return this->impl->warp_size; }


IIRC the NOGPU backend is used for building kernels. GetWavefrontWidth() should return the correct value (in accordance with the target GPU type) so that Solvers generate proper Solutions. Build options may depend on the wavefront size, so an incorrect value would lead to cache misses when using a cache filled with kernels built with NOGPU.
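
To illustrate the concern, here is a toy model of a kernel cache, written in Python for brevity. It is not MIOpen's actual cache: the key layout, the `-DWAVE_SIZE` flag, the file name, the device name, and the widths 64 and 32 are all assumptions chosen for the example.

```python
import hashlib


def build_options(wavefront_width):
    # Hypothetical option string; real solvers assemble their own flags.
    return f"-DWAVE_SIZE={wavefront_width}"


def cache_key(kernel_file, device_name, options):
    # Hypothetical key: a hash over everything that influences the binary.
    blob = "|".join((kernel_file, device_name, options))
    return hashlib.sha256(blob.encode()).hexdigest()


# Cache filled offline by a backend that reports a fixed width of 64.
key_offline = cache_key("conv.cl", "gfx_example", build_options(64))
cache = {key_offline: b"compiled kernel"}

# At run time the real device reports 32, so the options string differs.
key_runtime = cache_key("conv.cl", "gfx_example", build_options(32))
hit = key_runtime in cache

# If the offline build had used the target's real width, the keys would match.
key_fixed = cache_key("conv.cl", "gfx_example", build_options(32))
hit_after_fix = key_fixed == key_runtime
```

In this model the prebuilt entry is never found at run time, which is the cache-miss scenario described above.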

add nogpu backend to MIOpen

00d1e99

JehandadKhan requested review from atamazov, daniellowell and pfultz2 January 14, 2021 16:16

clang format

839ddd6

pfultz2 reviewed Jan 19, 2021

atamazov suggested changes Jan 19, 2021

JehandadKhan added 4 commits January 20, 2021 07:25

address review comments

02eb1a6

Merge branch 'develop' into nogpu_bend

28556c9

Updates for TargetID

4b43988

Correct the NOGPU mode define

3801700

JehandadKhan commented Jan 20, 2021

JehandadKhan requested review from atamazov and pfultz2 January 20, 2021 14:28

atamazov mentioned this pull request Jan 20, 2021

Tracking changes requested in merged PR #307 #560

Closed

atamazov previously approved these changes Jan 20, 2021

Add missing node directive to Jenkinsfile

73de07d

JehandadKhan dismissed atamazov’s stale review via 73de07d January 21, 2021 13:14

Correct MIOPEN_MODE_NOGPU spelling

70cc1fd

atamazov approved these changes Jan 22, 2021

JehandadKhan added the TESTING_CI_PASSED label Jan 27, 2021

daniellowell merged commit f6ad7ed into develop Jan 27, 2021

atamazov deleted the nogpu_bend branch April 8, 2021 17:16

alexandraBara reviewed Apr 19, 2021

atamazov reviewed Apr 28, 2023
