MSM and Adreno support #204

Sonicadvance1 · 2023-04-10T03:32:53Z

An image to show it working.

Mostly based around the amdgpu implementation. Which uses a combination
of the drm and fdinfo interfaces for querying information.

Some things to note.

This requires at least kernel v6.0 for the fdinfo interface.
No current way to get some information
- Memory clocks
- GPU temperature
- GPU power consumption
- Fan (May or may not exist on platforms)
- RX/TX
- Current clock speed (Clock is visible as maximum)

robclark · 2023-04-10T20:44:51Z

src/extract_gpuinfo_msm.c

+static const char drm_msm_engine_gpu[] = "drm-engine-gpu";
+static const char drm_msm_cycles_gpu[] = "drm-cycles-gpu";
+static const char drm_msm_maxfreq_gpu[] = "drm-maxfreq-gpu";
+static const char drm_msm_resident_mem[] = "drm-resident-memory";


fyi, patchset adding drm-resident-memory has not landed yet

https://patchwork.freedesktop.org/series/116216/

Sounds good. I guess that there is little chance for the name to be changed?

robclark · 2023-04-10T20:49:42Z

fwiw, there is a msm_gpu_freq_change tracepoint, which might be a reasonable way to monitor gpu freq. It works in the same way as the i915 tracepoint (not sure if nvtop uses that?)..

As far as temp, I think there are usually one or more tsens's linked to gpu.. and I think they should show up in sysfs. Not sure what the portable way to fish that out would be.. maybe find the cooling-device associated with the gpu and then see what tsens's it uses?

robclark · 2023-04-10T21:14:41Z

src/extract_gpuinfo_msm.c

+  uint64_t gpuid;
+  if (gpuinfo_msm_query_param(gpu_info->fd, MSM_PARAM_CHIP_ID, &gpuid) == 0) {
+    // TODO: This might not be correct.
+    snprintf(static_info->device_name, sizeof(static_info->device_name), "Adreno %ld%ld%ld", gpuid >> 24, (gpuid >> 16) & 0xFF, (gpuid >> 8) & 0xFF);


hmm, if you mask off the upper 32b of the chip-id that will give you something that mostly works. The current exception is:

GPUId(chip_id=0x00be06030500, name="Adreno 8c Gen 3"), GPUId(chip_id=0x007506030500, name="Adreno 7c+ Gen 3"), GPUId(chip_id=0x006006030500, name="Adreno 7c+ Gen 3 Lite"),

I guess you try to avoid a dependency on gl/vk driver?

Yes. nvtop doesn't depend on anything beyond libdrm or udev. So it's not creating any GL contexts or anything.
On the amdgpu side it does a table lookup, which I could do the same on the msm side with the chipid. Matching the table lookup in Mesa.

Switched over to a lookup instead which includes the weirdo exceptions. On unknown gpu_id it will generate a name with the gpu_id in it.

Mostly based around the amdgpu implementation. Which uses a combination of the drm and fdinfo interfaces for querying information. Some things to note. - This requires at least kernel v6.0 for the fdinfo interface. - No current way to get some information - Memory clocks - GPU temperature - GPU power consumption - Fan (May or may not exist on platforms) - RX/TX - Current clock speed (Clock is visible as maximum)

Sonicadvance1 · 2023-04-11T08:06:25Z

fwiw, there is a msm_gpu_freq_change tracepoint, which might be a reasonable way to monitor gpu freq. It works in the same way as the i915 tracepoint (not sure if nvtop uses that?)..

As far as temp, I think there are usually one or more tsens's linked to gpu.. and I think they should show up in sysfs. Not sure what the portable way to fish that out would be.. maybe find the cooling-device associated with the gpu and then see what tsens's it uses?

If there was some portable way to query this but most cases I'm hardcoding the /sys/devices/platform/soc@0/3d00000.gpu/ path in things I'm doing locally.

~~For "current" frequency through fdinfo it might be reasonable to average all the drm-maxfreq-gpu frequencies? Not sure if actually better.~~
Nevermind drm-maxfreq-gpu won't work since that is just the max frequency of the GPU?

Sonicadvance1 · 2023-04-11T08:51:11Z

diff --git a/src/extract_gpuinfo_msm.c b/src/extract_gpuinfo_msm.c
index 244240e..36c0fb3 100644
--- a/src/extract_gpuinfo_msm.c
+++ b/src/extract_gpuinfo_msm.c
@@ -69,6 +69,8 @@ struct gpu_info_msm {
   struct gpu_info base;
   int fd;

+  int freq_file; // For the current frequency tracking.
+
   struct msm_process_info_cache *last_update_process_cache, *current_update_process_cache; // Cached processes info
 };

@@ -206,6 +208,10 @@ void gpuinfo_msm_shutdown(void) {
   for (unsigned i = 0; i < msm_gpu_count; ++i) {
     struct gpu_info_msm *current = &gpu_infos[i];
     _drmFreeVersion(current->drmVersion);
+
+    if (current->freq_file != -1) {
+      close(current->freq_file);
+    }
   }

   free(gpu_infos);
@@ -421,6 +427,9 @@ static bool gpuinfo_msm_get_device_handles(struct list_head *devices, unsigned *
     gpu_infos[msm_gpu_count].fd = fd;
     gpu_infos[msm_gpu_count].base.vendor = &gpu_vendor_msm;

+    // This path is currently hardcoded for my device.
+    gpu_infos[msm_gpu_count].freq_file = open("/sys/devices/platform/soc@0/3d00000.gpu/devfreq/3d00000.gpu/cur_freq", O_RDONLY);
+
     list_add_tail(&gpu_infos[msm_gpu_count].base.list, devices);
     // Register a fdinfo callback for this GPU
     processinfo_register_fdinfo_callback(parse_drm_fdinfo_msm, &gpu_infos[msm_gpu_count].base);
@@ -433,6 +442,24 @@ static bool gpuinfo_msm_get_device_handles(struct list_head *devices, unsigned *
   return true;
 }

+static int read_pattern_from_fd(int fd, const char *format, ...) {
+  if (fd == -1) {
+    return 0;
+  }
+
+  va_list args;
+  va_start(args, format);
+  char Tmp[16];
+  int Read = pread(fd, Tmp, sizeof(Tmp), 0);
+  if (Read == -1) {
+    return 0;
+  }
+  Tmp[Read] = '\0';
+  int matches = vsscanf(Tmp, format, args);
+  va_end(args);
+  return matches;
+}
+
 static int gpuinfo_msm_query_param(int gpu, uint32_t param, uint64_t *value) {
   struct drm_msm_param req = {
     .pipe = MSM_PIPE_3D0, // Only the 3D pipe.
@@ -479,10 +506,13 @@ void gpuinfo_msm_refresh_dynamic_info(struct gpu_info *_gpu_info) {
   dynamic_info->encode_decode_shared = true;

   // GPU clock
+  uint64_t current_gpu_freq;
+  if (read_pattern_from_fd(gpu_info->freq_file, "%lu", &current_gpu_freq) == 1) {
+    SET_GPUINFO_DYNAMIC(dynamic_info, gpu_clock_speed, current_gpu_freq / 1000000);
+  }
+
   uint64_t val;
   if (gpuinfo_msm_query_param(gpu_info->fd, MSM_PARAM_MAX_FREQ, &val) == 0) {
-    // TODO: No way to query current clock speed.
-    SET_GPUINFO_DYNAMIC(dynamic_info, gpu_clock_speed, val / 1000000);
     SET_GPUINFO_DYNAMIC(dynamic_info, gpu_clock_speed_max, val / 1000000);
   }

If there was some way to correlate to the hardcoded path correctly then this is likely precise enough for this application.

README.markdown

robclark · 2023-04-11T16:08:02Z

fwiw, there is a msm_gpu_freq_change tracepoint, which might be a reasonable way to monitor gpu freq. It works in the same way as the i915 tracepoint (not sure if nvtop uses that?)..
As far as temp, I think there are usually one or more tsens's linked to gpu.. and I think they should show up in sysfs. Not sure what the portable way to fish that out would be.. maybe find the cooling-device associated with the gpu and then see what tsens's it uses?

If there was some portable way to query this but most cases I'm hardcoding the /sys/devices/platform/soc@0/3d00000.gpu/ path in things I'm doing locally.

~~For "current" frequency through fdinfo it might be reasonable to average all the drm-maxfreq-gpu frequencies? Not sure if actually better.~~ Nevermind drm-maxfreq-gpu won't work since that is just the max frequency of the GPU?

yeah, just the max-freq

But it's possible that we could add something in fdinfo..

(Longer term, ideally we could get everything from fdinfo so that tools like nvtop don't need much driver-specific support)

robclark · 2023-04-11T16:15:19Z

If there was some way to correlate to the hardcoded path correctly then this is likely precise enough for this application.

there is /proc/device-tree.. but I guess if we can get more from fdinfo or some other standardized way then that might be better for making things work the same way across drivers..

Syllo · 2023-04-12T11:12:56Z

Hello,
Thanks for this patch, I'll take a look at it shortly.

I generally am not against using a dependency if it simplifies the code and is pretty much guaranteed to be present on the different distributions. Even more so if it can be used by multiple drivers. Although we'd need some wrapper around the lib since nvtop relies on the dynamic loader to support multiple GPUs without requiring all the GPU libs to be present on all the systems.

@robclark you mentioned trace-points, but if I remember correctly they usually require some additional permission from user space and I want to avoid depending on anything that requires elevated privileges.

robclark · 2023-04-12T16:07:18Z

I generally am not against using a dependency if it simplifies the code and is pretty much guaranteed to be present on the different distributions. Even more so if it can be used by multiple drivers. Although we'd need some wrapper around the lib since nvtop relies on the dynamic loader to support multiple GPUs without requiring all the GPU libs to be present on all the systems.

The dependency would be libGL, if you wanted to use GL_RENDERER to get the GPU name in a generic way. (Maybe you'd also need gbm to ensure you are getting the correct GPU in the case of multiple GPUs?)

@robclark you mentioned trace-points, but if I remember correctly they usually require some additional permission from user space and I want to avoid depending on anything that requires elevated privileges.

Yeah, I think trace-points would need root.. although fdinfo also needs permissions to access information about other user's processes.

We could probably hold off on freq/temp for now.. there is currently some discussion on dri-devel about how we could expose some of this in a more portable way. (Fwiw, the fdinfo stuff for utilization and memory usage should also work on i915/amdgpu.. my hope is that we can get to the point where tools like nvtop don't need vendor specific stuff, at least for the upstream drm drivers.)

Syllo · 2023-04-12T09:12:52Z

src/extract_gpuinfo_msm.c

+static const char drm_msm_engine_gpu[] = "drm-engine-gpu";
+static const char drm_msm_cycles_gpu[] = "drm-cycles-gpu";
+static const char drm_msm_maxfreq_gpu[] = "drm-maxfreq-gpu";
+static const char drm_msm_resident_mem[] = "drm-resident-memory";


Sounds good. I guess that there is little chance for the name to be changed?

Syllo · 2023-04-16T11:38:48Z

src/extract_gpuinfo_msm.c

+};
+
+struct __attribute__((__packed__)) unique_cache_id {
+  unsigned client_id;


Can there be multiple Adreno GPUs on one system? If yes, is the client-id global?
Because the doc either propose the client-id to be unique or having the client-id+pdev tuple be unique

So far only a single one, and I wouldn't expect that to change unless qcom decides to start making dgpu's.. if there is no pdev field then the client-id is globally unique.

(In practice, my patchset that adds GPU mem info also makes client-id globally unique in all cases, because that is a simpler implementation on the kernel side.)

Syllo · 2023-04-16T11:46:48Z

src/extract_gpuinfo_msm.c

+
+  struct sysinfo info;
+  if (sysinfo(&info) == 0) {
+    SET_GPUINFO_DYNAMIC(dynamic_info, total_memory, info.totalram);


Has the GPU access to the entirety of the RAM or is this an approximation to get the info to show up?

yes, GPU can access all RAM.. or rather any RAM can be mapped to the GPU. You might want to use MSM_PARAM_VA_SIZE to get the size of the GPU's address space, which might be less than "all RAM" (ie. on some devices it is 4GB.. but even in this case each process using the GPU has it's own 4GB address space so a combination of multiple processes could map all RAM).

Syllo · 2023-04-16T11:50:38Z

src/info_messages_linux.c

  bool hasAMD = false;
  struct gpu_info *gpuinfo;
  list_for_each_entry(gpuinfo, devices, list) {
    if (strcmp(gpuinfo->vendor->name, "Intel") == 0) {
      hasIntel = true;
    }
+    if (strcmp(gpuinfo->vendor->name, "msm") == 0) {


If I understood correctly, fdinfo support for this driver was added in Linux version 6. Hence, it may be worth warning about this limitation.

correct, v6.0 for GPU utilization (and v6.5 hopefully for memory usage)

Syllo · 2023-04-16T12:03:26Z

@robclark thanks for the precision.
AMDGPU and Intel both have the fdinfo and hwmon support.
AMD exposes temp, fan and power device info through hwmon. Intel only provides power for their dedicated graphics cards.

robclark · 2023-04-16T15:23:24Z

Sounds good. I guess that there is little chance for the name to be changed?

There is still some debate, so I'd hold off merging at least the GPU memory info until that is resolved.

robclark · 2023-04-16T15:35:22Z

@robclark thanks for the precision.
AMDGPU and Intel both have the fdinfo and hwmon support.
AMD exposes temp, fan and power device info through hwmon. Intel only provides power for their dedicated graphics cards.

There are also hwmon's that expose temp for adreno.. the open question is how for userspace to find them in a portable way, since they aren't directly linked (currently) to the drm device. I guess we need to invent some way to link them so that userspace can figure out which hwmon's look at.. there are a lot, associated with various different hw blocks on the SoC.

robclark reviewed Apr 10, 2023

View reviewed changes

Sonicadvance1 force-pushed the msm_support branch from 4f1be9c to 2f149e0 Compare April 11, 2023 08:00

robclark reviewed Apr 11, 2023

View reviewed changes

README.markdown Outdated Show resolved Hide resolved

Sonicadvance1 force-pushed the msm_support branch from 2f149e0 to acff34e Compare April 11, 2023 16:56

README MSM information.

6814ba5

Sonicadvance1 force-pushed the msm_support branch from acff34e to 6814ba5 Compare April 11, 2023 16:57

Syllo reviewed Apr 16, 2023

View reviewed changes

Syllo merged commit 9b3c12c into Syllo:master May 21, 2023

Sonicadvance1 deleted the msm_support branch May 21, 2023 11:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MSM and Adreno support #204

MSM and Adreno support #204

Sonicadvance1 commented Apr 10, 2023 •

edited

Loading

robclark Apr 10, 2023

Syllo Apr 12, 2023

robclark commented Apr 10, 2023

robclark Apr 10, 2023

Sonicadvance1 Apr 11, 2023

Sonicadvance1 Apr 11, 2023

Sonicadvance1 commented Apr 11, 2023 •

edited

Loading

Sonicadvance1 commented Apr 11, 2023

robclark commented Apr 11, 2023

robclark commented Apr 11, 2023

Syllo commented Apr 12, 2023

robclark commented Apr 12, 2023

Syllo Apr 12, 2023

Syllo Apr 16, 2023

robclark Apr 16, 2023

Syllo Apr 16, 2023

robclark Apr 16, 2023

Syllo Apr 16, 2023

robclark Apr 16, 2023

Syllo commented Apr 16, 2023

robclark commented Apr 16, 2023

robclark commented Apr 16, 2023

MSM and Adreno support #204

MSM and Adreno support #204

Conversation

Sonicadvance1 commented Apr 10, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robclark commented Apr 10, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Sonicadvance1 commented Apr 11, 2023 • edited Loading

Sonicadvance1 commented Apr 11, 2023

robclark commented Apr 11, 2023

robclark commented Apr 11, 2023

Syllo commented Apr 12, 2023

robclark commented Apr 12, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Syllo commented Apr 16, 2023

robclark commented Apr 16, 2023

robclark commented Apr 16, 2023

Sonicadvance1 commented Apr 10, 2023 •

edited

Loading

Sonicadvance1 commented Apr 11, 2023 •

edited

Loading