
Question on benchmarking gordon workload #3

Open
jczaja opened this issue Apr 14, 2023 · 7 comments

jczaja commented Apr 14, 2023

Hi,

My name is Jacek Czaja and I'm one of the Intel engineers helping to get JAX projects running efficiently on Intel devices.
I was given this repo link (among others) so that its content can be enabled on JAX with Intel GPU support and optimized for performance.
I was able to run the unit tests on an Intel GPU, but I'm struggling to benchmark this gordon functionality.

What do I need?
I would like to benchmark the functionality of this repository on Intel hardware. To do so, I need a representative example of gordon functionality (something close to a real use case that should be optimized) that runs for at least a minute, so I can look at the bottlenecks and try to optimize them. Please point me to such an example.

beckermr commented May 2, 2023

I am so sorry I missed this!

To make this into something useful, set the env var GORDON_NM to something big before running. The default is 500, so you'll want something a lot bigger. Once you do that, run the test suite and it should take a while.
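Concretely, the step above might look like the following (a hypothetical sketch: the value 50000 and the use of pytest with test_gordon.py are assumptions, not stated in this thread; GORDON_NM being read from the environment with a default of 500 is from the comment above):

```shell
# Assumed sketch: bump GORDON_NM well past the default of 500, then run the suite.
export GORDON_NM=50000
# Sanity check that the variable is visible to Python:
python3 -c 'import os; print(os.environ.get("GORDON_NM", "500"))'
# Then run the tests, e.g.: pytest test_gordon.py
```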

jczaja commented Jun 1, 2023

@beckermr Hi, thanks for the helpful hint. I'm just resuming this task. I was told that the functionality tested in these unit tests will be part of a bigger workload. Do you know which of the three elements (test_gordon.py) takes the most execution time in that bigger workload? That info would help me prioritize optimization efforts.

Anyway, as soon as we have something improved we will keep you posted. Thanks!

beckermr commented Jun 1, 2023

I do not have an estimate for this.

jczaja commented Jun 13, 2023

I have started profiling those unit tests, but the majority of the time is spent on a single-threaded CPU rather than on the XPU (Ponte Vecchio). So I'm looking into whether the Python code could easily be run on the XPU (via JAX) rather than using regular NumPy (CPU). This PR: #4 will shorten the initialization time of the tests so I can more easily see the performance of other areas of the code. Please review.
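The NumPy-to-JAX direction mentioned above can be illustrated with a minimal sketch (hypothetical code, not from this repo; the arrays and shapes are made up). `jax.numpy` mirrors the NumPy API, so a hot spot written against NumPy can often be re-pointed at JAX, which then dispatches it to the default JAX device (the XPU here) instead of running on a single CPU thread:

```python
# Hypothetical sketch: swapping a NumPy computation for jax.numpy so JAX
# can run it on its default device instead of single-threaded CPU NumPy.
import numpy as np

a = np.arange(6.0).reshape(2, 3)
b = np.arange(6.0).reshape(3, 2)

cpu_result = np.dot(a, b)  # plain NumPy: always runs on the CPU

try:
    import jax.numpy as jnp
    from jax import jit

    dot_jit = jit(jnp.dot)                 # compiled once, runs on JAX's default device
    device_result = np.asarray(dot_jit(a, b))
    print(np.allclose(cpu_result, device_result))
except ImportError:
    # JAX not installed in this environment; the NumPy path still works.
    print(True)
```

The appeal of this approach is that the numerical code barely changes; the open question for the thread is whether the repo's hot spots are expressible in `jax.numpy` without host-side Python loops dominating.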

@beckermr

Done and thank you!

jczaja commented Jun 14, 2023

@beckermr What is the typical GORDON_NM value used in the target workload? I'm asking because I should be optimizing the performance of a configuration as close to your target model as possible.

@beckermr

As big as we can without overflowing the device memory.
