Q: are these dmesg messages expected? #343

Closed
preda opened this issue Feb 21, 2018 · 3 comments

preda commented Feb 21, 2018

Ubuntu 17.10, ROCm 1.7, Vega64

When I run my OpenCL app, I see plenty of such entries appearing in dmesg. Are these normal/expected and nothing to worry about, or do they signal some problem?

[ 2258.451590] Evicting PASID 1 queues
[ 2258.459225] Restoring PASID 1 queues
[ 2261.663077] Evicting PASID 1 queues
[ 2261.671167] Restoring PASID 1 queues
[ 2270.003068] Evicting PASID 1 queues
[ 2270.011013] Restoring PASID 1 queues
[ 2273.263280] Evicting PASID 1 queues
[ 2273.271263] Restoring PASID 1 queues
[ 2276.529269] Evicting PASID 1 queues
[ 2276.535340] kfd2kgd: update_invalid_user_pages: Failed to get user pages: -14
[ 2276.535357] kfd2kgd: update_invalid_user_pages: Failed to get user pages: -14
[ 2276.535433] Restoring PASID 1 queues
[ 2279.792647] Evicting PASID 1 queues
[ 2279.799514] kfd2kgd: update_invalid_user_pages: Failed to get user pages: -14
[ 2279.799570] Restoring PASID 1 queues
[ 2283.091064] Evicting PASID 1 queues
[ 2283.099425] Restoring PASID 1 queues

fxkamd (Collaborator) commented Feb 21, 2018

These messages are normal. The evicting/restoring messages are a bit verbose, and we could probably turn them into debug messages that aren't printed in the log by default.

The "Failed to get user pages" message appears when userptr memory is freed while it's still mapped for GPU access. This can result from an optimization in the OpenCL runtime that keeps user pages mapped to avoid repeatedly mapping and unmapping them. These messages aren't a problem as long as the GPU doesn't try to access the invalid memory mapping. Again, these could probably be turned into debug messages.
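
The `-14` at the end of the "Failed to get user pages" lines is a negated errno value, as is usual for Linux kernel return codes. On Linux, errno 14 is `EFAULT`, which fits the explanation that the pages were already freed. Here is a minimal Python sketch for decoding such lines. The function name `decode_user_pages_error` and the regular expression are illustrative assumptions, not part of ROCm or the kernel.

```python
import errno
import os
import re

# Matches the trailing negative errno in lines such as
# "kfd2kgd: update_invalid_user_pages: Failed to get user pages: -14".
# The pattern is an assumption based on the log lines quoted in this issue.
_ERR_RE = re.compile(r"Failed to get user pages: -(\d+)\s*$")


def decode_user_pages_error(line):
    """Return (symbolic_name, description) for the errno in a dmesg line.

    Returns None if the line is not a "Failed to get user pages" message.
    """
    match = _ERR_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numeric values to names such as "EFAULT".
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    sample = (
        "[ 2276.535340] kfd2kgd: update_invalid_user_pages: "
        "Failed to get user pages: -14"
    )
    print(decode_user_pages_error(sample))
```

For the sample line, the first element of the returned tuple is `EFAULT`. The description string comes from the platform's C library, so its wording can vary.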

preda commented Feb 21, 2018

Thanks! I was sort of guessing they are benign, because everything seemed to function correctly when they were present. Closing then.

preda closed this as completed Feb 21, 2018

Sfinx commented Aug 7, 2019

Yep, the AMD driver is too chatty. I'm on Ubuntu kernel 5.2.7-050207-lowlatency + rocm-opencl 1.2.0-2019070446, and every time I run the OpenCL examples I get this in the kernel logs:

[ 3507.287389] Over-subscription is not allowed for SDMA.
[ 3511.201126] Over-subscription is not allowed for SDMA.
[ 3558.449204] Over-subscription is not allowed for SDMA.
[ 3565.242940] Evicting PASID 32784 queues
[ 3565.244793] Restoring PASID 32784 queues
[ 3565.291636] Over-subscription is not allowed for SDMA.

They should only be present in debug builds.
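
To get a sense of how much of a log these messages account for, they can be counted by type. The following is only a sketch: the function name `summarize` and the regular expressions are assumptions derived from the log lines quoted in this thread. PASID values are normalized, so decimal and hexadecimal IDs fall into the same bucket.

```python
import re
from collections import Counter

# Leading dmesg timestamp, e.g. "[ 3507.287389] ".
_TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# PASID printed either in decimal ("PASID 32784") or hex ("PASID 0x8003").
_PASID_RE = re.compile(r"PASID (?:0x[0-9a-fA-F]+|\d+)")


def summarize(lines):
    """Count dmesg messages by text, ignoring timestamps and PASID values."""
    counts = Counter()
    for line in lines:
        text = _TIMESTAMP_RE.sub("", line.strip())
        if not text:
            continue  # skip blank lines
        text = _PASID_RE.sub("PASID <id>", text)
        counts[text] += 1
    return counts


if __name__ == "__main__":
    log = """\
[ 3507.287389] Over-subscription is not allowed for SDMA.
[ 3511.201126] Over-subscription is not allowed for SDMA.
[ 3558.449204] Over-subscription is not allowed for SDMA.
[ 3565.242940] Evicting PASID 32784 queues
[ 3565.244793] Restoring PASID 32784 queues
[ 3565.291636] Over-subscription is not allowed for SDMA.
"""
    for message, count in summarize(log.splitlines()).most_common():
        print(count, message)
```

In practice the input could come from the output of `dmesg` saved to a file. The excerpt above is embedded only to keep the sketch self-contained.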

fengguang pushed a commit to 0day-ci/linux that referenced this issue Feb 3, 2020
During normal usage, especially if jobs are started and stopped in rapid
succession, the kernel log is filled with messages like this:

[38732.522910] Restoring PASID 0x8003 queues
[38732.666767] Evicting PASID 0x8003 queues
[38732.714074] Restoring PASID 0x8003 queues
[38732.815633] Evicting PASID 0x8003 queues
[38732.834961] Restoring PASID 0x8003 queues
[38732.840536] Evicting PASID 0x8003 queues
[38732.869846] Restoring PASID 0x8003 queues
[38732.893655] Evicting PASID 0x8003 queues
[38732.927975] Restoring PASID 0x8003 queues

According to [1], these messages are expected, but they carry little
value for the end user, so turn them into debug messages.

[1] ROCm/ROCm#343

Signed-off-by: Julian Sax <jsbc@gmx.de>
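
The excerpt in the commit message shows evictions and restores alternating within tens of milliseconds. Below is a hedged sketch for measuring how long queues stayed evicted, pairing each "Evicting" line with the next "Restoring" line for the same PASID. The function name `eviction_durations` and the regular expression are illustrative assumptions based on the message format quoted here.

```python
import re

# Timestamp, event kind, and PASID from lines such as
# "[38732.666767] Evicting PASID 0x8003 queues".
_EVENT_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+(Evicting|Restoring) PASID (\S+) queues"
)


def eviction_durations(lines):
    """Return a list of (pasid, seconds_evicted) for each evict/restore pair.

    A "Restoring" line with no earlier "Evicting" line for the same PASID
    (for example at the start of a truncated log) is ignored.
    """
    pending = {}  # pasid -> timestamp of the most recent eviction
    durations = []
    for line in lines:
        match = _EVENT_RE.match(line.strip())
        if match is None:
            continue
        timestamp = float(match.group(1))
        kind, pasid = match.group(2), match.group(3)
        if kind == "Evicting":
            pending[pasid] = timestamp
        elif pasid in pending:
            durations.append((pasid, timestamp - pending.pop(pasid)))
    return durations


if __name__ == "__main__":
    log = """\
[38732.522910] Restoring PASID 0x8003 queues
[38732.666767] Evicting PASID 0x8003 queues
[38732.714074] Restoring PASID 0x8003 queues
[38732.815633] Evicting PASID 0x8003 queues
[38732.834961] Restoring PASID 0x8003 queues
"""
    for pasid, seconds in eviction_durations(log.splitlines()):
        print(f"PASID {pasid}: evicted for {seconds * 1000:.1f} ms")
```

The PASID is kept as a string so that decimal and hexadecimal forms are matched exactly as they are printed.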