Question on benchmarking gordon workload #3

jczaja · 2023-04-14T13:59:58Z

Hi,

My name is Jacek Czaja, and I'm one of the Intel engineers helping JAX projects run efficiently on Intel devices.
I was given this repo link (among others) so that its content could be enabled on JAX with Intel GPU and optimized for performance.
I was able to run the unit tests on an Intel GPU, but I'm struggling to benchmark the gordon functionality.

What do I need?
I would like to benchmark the functionality of this repository on Intel HW. To do so, I need a representative example of gordon usage (something close to a real use case that should be optimized) that runs for at least a minute, so I can look at the bottlenecks and try to optimize them. Please point me to such an example of gordon functionality.

beckermr · 2023-05-02T11:47:36Z

I am so sorry I missed this!

To make this into something useful, set the env var GORDON_NM to something big before running. The default is 500, so you'll want something a lot bigger. Once you do that, run the test suite and it should take a while.
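
A minimal Python sketch of that recipe, assuming pytest is installed and the tests live in `test_gordon.py` in the working directory (that path is an assumption). The helper names `make_benchmark_env`, `pytest_command`, and `run_benchmark` are hypothetical. `--durations=0` is a standard pytest flag that reports the run time of every test, which also shows which test dominates at a given GORDON_NM.

```python
import os
import subprocess
import sys


def make_benchmark_env(nm, base_env=None):
    """Return a copy of the environment with GORDON_NM set to `nm`.

    The base environment (os.environ by default) is copied, not mutated.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GORDON_NM"] = str(nm)
    return env


def pytest_command(test_path="test_gordon.py"):
    """Build the pytest command line; test_path is an assumed location."""
    # --durations=0 makes pytest print the duration of every test.
    return [sys.executable, "-m", "pytest", "-v", "--durations=0", test_path]


def run_benchmark(nm, test_path="test_gordon.py"):
    """Run the test suite with GORDON_NM=nm and return pytest's exit code."""
    completed = subprocess.run(
        pytest_command(test_path),
        env=make_benchmark_env(nm),
        check=False,
    )
    return completed.returncode
```

For example, `run_benchmark(20000)` would run the suite with `GORDON_NM=20000`; 20000 is only an illustrative value, since the thread says only that the default is 500 and that something much bigger is needed.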

jczaja · 2023-06-01T08:26:40Z

@beckermr Hi, thanks for the helpful hint. I'm just resuming this task. I was told that the functionality tested in those unit tests is part of a bigger workload. Do you know which of the three tests (test_gordon.py) takes the most execution time in this bigger workload? That info would help me prioritize optimization efforts.

Anyway, as soon as we have something improved we will keep you posted. Thanks

beckermr · 2023-06-01T14:02:15Z

I do not have an estimate for this.

jczaja · 2023-06-13T15:07:17Z

I have started profiling those unit tests, but the majority of time is spent in single-threaded CPU code rather than on the XPU (Ponte Vecchio). So I'm checking whether the Python code could easily run on the XPU (via JAX) rather than using regular NumPy (CPU). This PR, #4, will shorten the tests' initialization time so I can more easily see the performance of other areas of the code. Please review.
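
One way to see which Python functions account for that single-threaded CPU time is the standard library's `cProfile`. The sketch below profiles one call and returns a report of the top entries sorted by cumulative time. `profile_call` and the toy `slow_sum` are hypothetical stand-ins, not part of gordon.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report lists the `top` entries
    sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return result, stream.getvalue()


def slow_sum(n):
    """Toy stand-in for a CPU-bound routine (hypothetical)."""
    total = 0
    for i in range(n):
        total += i * i
    return total
```

When timing code that has been moved to JAX, keep in mind that JAX dispatches work asynchronously, so call `.block_until_ready()` on the result before stopping a timer; otherwise the measurement can miss the device time.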

beckermr · 2023-06-13T17:44:58Z

Done and thank you!

jczaja · 2023-06-14T15:25:49Z

@beckermr What is the typical GORDON_NM value used in the target workload? I'm asking because I should be optimizing the functionality under conditions as close to your target model as possible.

beckermr · 2023-06-14T15:35:27Z

As big as we can without overflowing the device memory.
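
"As big as fits" can be found mechanically by doubling GORDON_NM until a run fails, then bisecting between the last value that worked and the first that failed. A minimal sketch, assuming a hypothetical `fits(nm)` probe (for example, one that runs the workload in a separate process with that GORDON_NM and reports whether it finished without an out-of-memory error) and assuming that if a value fits, every smaller value fits too.

```python
def largest_fitting(fits, start=500, limit=10**9):
    """Largest n in [start, limit] for which fits(n) is True.

    Assumes fits is monotonic (True up to some threshold, False above).
    Returns None if even `start` does not fit.
    """
    if not fits(start):
        return None
    lo = hi = start  # lo is always a value known to fit
    # Phase 1: double until a value fails or the limit is reached.
    while True:
        nxt = min(hi * 2, limit)
        if nxt == hi:
            return hi  # reached the limit and it still fits
        if fits(nxt):
            lo = hi = nxt
        else:
            hi = nxt  # hi is now a value known to fail
            break
    # Phase 2: bisect between lo (fits) and hi (fails).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

Each probe is a full run of the workload, so this costs roughly two probes per doubling of the answer; in practice one might stop bisecting once the interval is within a few percent.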
