Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSM and Adreno support #204

Merged
merged 2 commits into from
May 21, 2023
Merged

MSM and Adreno support #204

merged 2 commits into from
May 21, 2023

Conversation

Sonicadvance1
Copy link
Contributor

@Sonicadvance1 Sonicadvance1 commented Apr 10, 2023

An image to show it working.
Image_2023-04-09_20-37-40

Mostly based around the amdgpu implementation. Which uses a combination
of the drm and fdinfo interfaces for querying information.

Some things to note.

  • This requires at least kernel v6.0 for the fdinfo interface.
  • No current way to get some information
    • Memory clocks
    • GPU temperature
    • GPU power consumption
    • Fan (May or may not exist on platforms)
    • RX/TX
    • Current clock speed (Clock is visible as maximum)

static const char drm_msm_engine_gpu[] = "drm-engine-gpu";
static const char drm_msm_cycles_gpu[] = "drm-cycles-gpu";
static const char drm_msm_maxfreq_gpu[] = "drm-maxfreq-gpu";
static const char drm_msm_resident_mem[] = "drm-resident-memory";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi, patchset adding drm-resident-memory has not landed yet

https://patchwork.freedesktop.org/series/116216/

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I guess that there is little chance for the name to be changed?

@robclark
Copy link

fwiw, there is a msm_gpu_freq_change tracepoint, which might be a reasonable way to monitor gpu freq. It works in the same way as the i915 tracepoint (not sure if nvtop uses that?)..

As far as temp, I think there are usually one or more tsens's linked to gpu.. and I think they should show up in sysfs. Not sure what the portable way to fish that out would be.. maybe find the cooling-device associated with the gpu and then see what tsens's it uses?

uint64_t gpuid;
if (gpuinfo_msm_query_param(gpu_info->fd, MSM_PARAM_CHIP_ID, &gpuid) == 0) {
// TODO: This might not be correct.
snprintf(static_info->device_name, sizeof(static_info->device_name), "Adreno %ld%ld%ld", gpuid >> 24, (gpuid >> 16) & 0xFF, (gpuid >> 8) & 0xFF);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, if you mask off the upper 32b of the chip-id that will give you something that mostly works. The current exception is:

        GPUId(chip_id=0x00be06030500, name="Adreno 8c Gen 3"),
        GPUId(chip_id=0x007506030500, name="Adreno 7c+ Gen 3"),
        GPUId(chip_id=0x006006030500, name="Adreno 7c+ Gen 3 Lite"),

I guess you try to avoid a dependency on gl/vk driver?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. nvtop doesn't depend on anything beyond libdrm or udev. So it's not creating any GL contexts or anything.
On the amdgpu side it does a table lookup, which I could do the same on the msm side with the chipid. Matching the table lookup in Mesa.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched over to a lookup instead which includes the weirdo exceptions. On unknown gpu_id it will generate a name with the gpu_id in it.

Mostly based around the amdgpu implementation. Which uses a combination
of the drm and fdinfo interfaces for querying information.

Some things to note.
- This requires at least kernel v6.0 for the fdinfo interface.
- No current way to get some information
  - Memory clocks
  - GPU temperature
  - GPU power consumption
  - Fan (May or may not exist on platforms)
  - RX/TX
  - Current clock speed (Clock is visible as maximum)
@Sonicadvance1
Copy link
Contributor Author

Sonicadvance1 commented Apr 11, 2023

fwiw, there is a msm_gpu_freq_change tracepoint, which might be a reasonable way to monitor gpu freq. It works in the same way as the i915 tracepoint (not sure if nvtop uses that?)..

As far as temp, I think there are usually one or more tsens's linked to gpu.. and I think they should show up in sysfs. Not sure what the portable way to fish that out would be.. maybe find the cooling-device associated with the gpu and then see what tsens's it uses?

If there was some portable way to query this but most cases I'm hardcoding the /sys/devices/platform/soc@0/3d00000.gpu/ path in things I'm doing locally.

For "current" frequency through fdinfo it might be reasonable to average all the drm-maxfreq-gpu frequencies? Not sure if actually better.
Nevermind drm-maxfreq-gpu won't work since that is just the max frequency of the GPU?

@Sonicadvance1
Copy link
Contributor Author

diff --git a/src/extract_gpuinfo_msm.c b/src/extract_gpuinfo_msm.c
index 244240e..36c0fb3 100644
--- a/src/extract_gpuinfo_msm.c
+++ b/src/extract_gpuinfo_msm.c
@@ -69,6 +69,8 @@ struct gpu_info_msm {
   struct gpu_info base;
   int fd;

+  int freq_file; // For the current frequency tracking.
+
   struct msm_process_info_cache *last_update_process_cache, *current_update_process_cache; // Cached processes info
 };

@@ -206,6 +208,10 @@ void gpuinfo_msm_shutdown(void) {
   for (unsigned i = 0; i < msm_gpu_count; ++i) {
     struct gpu_info_msm *current = &gpu_infos[i];
     _drmFreeVersion(current->drmVersion);
+
+    if (current->freq_file != -1) {
+      close(current->freq_file);
+    }
   }

   free(gpu_infos);
@@ -421,6 +427,9 @@ static bool gpuinfo_msm_get_device_handles(struct list_head *devices, unsigned *
     gpu_infos[msm_gpu_count].fd = fd;
     gpu_infos[msm_gpu_count].base.vendor = &gpu_vendor_msm;

+    // This path is currently hardcoded for my device.
+    gpu_infos[msm_gpu_count].freq_file = open("/sys/devices/platform/soc@0/3d00000.gpu/devfreq/3d00000.gpu/cur_freq", O_RDONLY);
+
     list_add_tail(&gpu_infos[msm_gpu_count].base.list, devices);
     // Register a fdinfo callback for this GPU
     processinfo_register_fdinfo_callback(parse_drm_fdinfo_msm, &gpu_infos[msm_gpu_count].base);
@@ -433,6 +442,24 @@ static bool gpuinfo_msm_get_device_handles(struct list_head *devices, unsigned *
   return true;
 }

+static int read_pattern_from_fd(int fd, const char *format, ...) {
+  if (fd == -1) {
+    return 0;
+  }
+
+  va_list args;
+  va_start(args, format);
+  char Tmp[16];
+  int Read = pread(fd, Tmp, sizeof(Tmp), 0);
+  if (Read == -1) {
+    return 0;
+  }
+  Tmp[Read] = '\0';
+  int matches = vsscanf(Tmp, format, args);
+  va_end(args);
+  return matches;
+}
+
 static int gpuinfo_msm_query_param(int gpu, uint32_t param, uint64_t *value) {
   struct drm_msm_param req = {
     .pipe = MSM_PIPE_3D0, // Only the 3D pipe.
@@ -479,10 +506,13 @@ void gpuinfo_msm_refresh_dynamic_info(struct gpu_info *_gpu_info) {
   dynamic_info->encode_decode_shared = true;

   // GPU clock
+  uint64_t current_gpu_freq;
+  if (read_pattern_from_fd(gpu_info->freq_file, "%lu", &current_gpu_freq) == 1) {
+    SET_GPUINFO_DYNAMIC(dynamic_info, gpu_clock_speed, current_gpu_freq / 1000000);
+  }
+
   uint64_t val;
   if (gpuinfo_msm_query_param(gpu_info->fd, MSM_PARAM_MAX_FREQ, &val) == 0) {
-    // TODO: No way to query current clock speed.
-    SET_GPUINFO_DYNAMIC(dynamic_info, gpu_clock_speed, val / 1000000);
     SET_GPUINFO_DYNAMIC(dynamic_info, gpu_clock_speed_max, val / 1000000);
   }

If there was some way to correlate to the hardcoded path correctly then this is likely precise enough for this application.

README.markdown Outdated Show resolved Hide resolved
@robclark
Copy link

fwiw, there is a msm_gpu_freq_change tracepoint, which might be a reasonable way to monitor gpu freq. It works in the same way as the i915 tracepoint (not sure if nvtop uses that?)..
As far as temp, I think there are usually one or more tsens's linked to gpu.. and I think they should show up in sysfs. Not sure what the portable way to fish that out would be.. maybe find the cooling-device associated with the gpu and then see what tsens's it uses?

If there was some portable way to query this but most cases I'm hardcoding the /sys/devices/platform/soc@0/3d00000.gpu/ path in things I'm doing locally.

For "current" frequency through fdinfo it might be reasonable to average all the drm-maxfreq-gpu frequencies? Not sure if actually better. Nevermind drm-maxfreq-gpu won't work since that is just the max frequency of the GPU?

yeah, just the max-freq

But it's possible that we could add something in fdinfo..

(Longer term, ideally we could get everything from fdinfo so that tools like nvtop don't need much driver-specific support)

@robclark
Copy link

If there was some way to correlate to the hardcoded path correctly then this is likely precise enough for this application.

there is /proc/device-tree.. but I guess if we can get more from fdinfo or some other standardized way then that might be better for making things work the same way across drivers..

@Syllo
Copy link
Owner

Syllo commented Apr 12, 2023

Hello,
Thanks for this patch, I'll take a look at it shortly.

I generally am not against using a dependency if it simplifies the code and is pretty much guaranteed to be present on the different distributions. Even more so if it can be used by multiple drivers. Although we'd need some wrapper around the lib since nvtop relies on the dynamic loader to support multiple GPUs without requiring all the GPU libs to be present on all the systems.

@robclark you mentioned trace-points, but if I remember correctly they usually require some additional permission from user space and I want to avoid depending on anything that requires elevated privileges.

@robclark
Copy link

I generally am not against using a dependency if it simplifies the code and is pretty much guaranteed to be present on the different distributions. Even more so if it can be used by multiple drivers. Although we'd need some wrapper around the lib since nvtop relies on the dynamic loader to support multiple GPUs without requiring all the GPU libs to be present on all the systems.

The dependency would be libGL, if you wanted to use GL_RENDERER to get the GPU name in a generic way. (Maybe you'd also need gbm to ensure you are getting the correct GPU in the case of multiple GPUs?)

@robclark you mentioned trace-points, but if I remember correctly they usually require some additional permission from user space and I want to avoid depending on anything that requires elevated privileges.

Yeah, I think trace-points would need root.. although fdinfo also needs permissions to access information about other user's processes.

We could probably hold off on freq/temp for now.. there is currently some discussion on dri-devel about how we could expose some of this in a more portable way. (Fwiw, the fdinfo stuff for utilization and memory usage should also work on i915/amdgpu.. my hope is that we can get to the point where tools like nvtop don't need vendor specific stuff, at least for the upstream drm drivers.)

static const char drm_msm_engine_gpu[] = "drm-engine-gpu";
static const char drm_msm_cycles_gpu[] = "drm-cycles-gpu";
static const char drm_msm_maxfreq_gpu[] = "drm-maxfreq-gpu";
static const char drm_msm_resident_mem[] = "drm-resident-memory";
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I guess that there is little chance for the name to be changed?

};

struct __attribute__((__packed__)) unique_cache_id {
unsigned client_id;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can there be multiple Adreno GPUs on one system? If yes, is the client-id global?
Because the doc either propose the client-id to be unique or having the client-id+pdev tuple be unique

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far only a single one, and I wouldn't expect that to change unless qcom decides to start making dgpu's.. if there is no pdev field then the client-id is globally unique.

(In practice, my patchset that adds GPU mem info also makes client-id globally unique in all cases, because that is a simpler implementation on the kernel side.)


struct sysinfo info;
if (sysinfo(&info) == 0) {
SET_GPUINFO_DYNAMIC(dynamic_info, total_memory, info.totalram);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has the GPU access to the entirety of the RAM or is this an approximation to get the info to show up?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, GPU can access all RAM.. or rather any RAM can be mapped to the GPU. You might want to use MSM_PARAM_VA_SIZE to get the size of the GPU's address space, which might be less than "all RAM" (ie. on some devices it is 4GB.. but even in this case each process using the GPU has it's own 4GB address space so a combination of multiple processes could map all RAM).

bool hasAMD = false;
struct gpu_info *gpuinfo;
list_for_each_entry(gpuinfo, devices, list) {
if (strcmp(gpuinfo->vendor->name, "Intel") == 0) {
hasIntel = true;
}
if (strcmp(gpuinfo->vendor->name, "msm") == 0) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood correctly, fdinfo support for this driver was added in Linux version 6. Hence, it may be worth warning about this limitation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct, v6.0 for GPU utilization (and v6.5 hopefully for memory usage)

@Syllo
Copy link
Owner

Syllo commented Apr 16, 2023

@robclark thanks for the precision.
AMDGPU and Intel both have the fdinfo and hwmon support.
AMD exposes temp, fan and power device info through hwmon. Intel only provides power for their dedicated graphics cards.

@robclark
Copy link

Sounds good. I guess that there is little chance for the name to be changed?

There is still some debate, so I'd hold off merging at least the GPU memory info until that is resolved.

@robclark
Copy link

@robclark thanks for the precision.
AMDGPU and Intel both have the fdinfo and hwmon support.
AMD exposes temp, fan and power device info through hwmon. Intel only provides power for their dedicated graphics cards.

There are also hwmon's that expose temp for adreno.. the open question is how for userspace to find them in a portable way, since they aren't directly linked (currently) to the drm device. I guess we need to invent some way to link them so that userspace can figure out which hwmon's look at.. there are a lot, associated with various different hw blocks on the SoC.

@Syllo Syllo merged commit 9b3c12c into Syllo:master May 21, 2023
@Sonicadvance1 Sonicadvance1 deleted the msm_support branch May 21, 2023 11:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants