Question on benchmarking gordon workload #3
Comments
I am so sorry I missed this! To make this into something useful, set the env var
@beckermr Hi, thanks for the helpful hint. I'm just resuming this task. I was told that the functionality tested in those unit tests is meant to be part of a bigger workload. Do you know which of those three elements (test_gordon.py) takes the most execution time in this bigger workload? That info would help me prioritize optimization efforts. Anyway, as soon as we have something improved, we will keep you posted. Thanks
I do not have an estimate for this.
I have started profiling those unit tests, but the majority of the time is spent on a single CPU thread rather than on the XPU (Ponte Vecchio). So I'm looking into whether the Python code could easily be run on the XPU (via JAX) rather than with regular NumPy on the CPU. PR #4 will shorten the test's initialization time so I can more easily see the performance of other areas of the code. Please review.
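For readers following along, here is a minimal sketch of how this kind of CPU-side profiling can be done with the standard library's `cProfile`. The functions `slow_setup`, `hot_loop`, and `workload` are hypothetical stand-ins, not code from this repository. The sketch only shows how to rank functions by cumulative time so that initialization cost can be told apart from the computation itself.

```python
import cProfile
import io
import pstats


def slow_setup(n):
    # Hypothetical stand-in for test initialization (not from the repo).
    return [i * i for i in range(n)]


def hot_loop(values):
    # Hypothetical stand-in for the computation under test.
    total = 0
    for v in values:
        total += v % 7
    return total


def workload(n=200_000):
    return hot_loop(slow_setup(n))


def profile_top(func, limit=5):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, buf.getvalue()


if __name__ == "__main__":
    _, report = profile_top(workload)
    print(report)
```

In the report, a large cumulative time for the setup function relative to the compute function would suggest that initialization dominates the run, which is the situation described above.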
Done and thank you!
@beckermr What is the typical GORDON_NM value used in the target workload? I'm asking because I should be optimizing performance on something as close to your target model as possible.
As big as we can without overflowing the device memory.
Hi,
My name is Jacek Czaja and I'm one of the Intel engineers helping to get JAX projects running efficiently on Intel devices.
I was given a link to this repo (among others) so that its content can be enabled on JAX with Intel GPUs and optimized for performance.
I was able to run the unit tests on an Intel GPU, but I'm struggling to benchmark the gordon functionality.
What do I need?
I would like to benchmark the functionality of this repository on Intel hardware. To do so, I need a representative example of gordon usage (something close to the real use case that should be optimized) that runs for at least a minute, so I can look for bottlenecks and try to optimize them. Please point me to such an example.
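To make the request concrete, here is a rough sketch of the kind of harness such an example could be dropped into: it repeats a step until a minimum wall-clock time has elapsed and records per-call durations. `run_for_at_least` and `example_step` are hypothetical names for illustration, and the placeholder step is not gordon code. If the step dispatches JAX work, it would also need to wait for the result (for example by calling `.block_until_ready()` on the returned array), since JAX dispatches asynchronously.

```python
import statistics
import time


def run_for_at_least(step, min_seconds=60.0, warmup=1):
    """Call step() repeatedly until min_seconds have elapsed.

    Returns per-call durations in seconds. Warmup calls run first and are
    not timed, so one-off costs (e.g. JIT compilation) stay out of the data.
    """
    for _ in range(warmup):
        step()
    durations = []
    start = time.perf_counter()
    while time.perf_counter() - start < min_seconds:
        t0 = time.perf_counter()
        step()
        durations.append(time.perf_counter() - t0)
    return durations


def example_step():
    # Hypothetical placeholder; a real benchmark would call into the
    # functionality being measured here.
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    times = run_for_at_least(example_step, min_seconds=2.0)
    median_ms = statistics.median(times) * 1e3
    print(f"{len(times)} calls, median {median_ms:.2f} ms per call")
```

Reporting the median rather than the mean keeps occasional slow iterations from skewing the summary.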