-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make output fields self-identifying? #41
Comments
That's a great idea to make backwards compatibility easier. It could also add sonar version to the line so that we can make "database" migrations. |
I am working on this ... |
What is the difference between |
At the moment, gpus= is a binary bitmask of the cards used by the process for this sample. For example, the value for a process that uses cards 0, 1 and 3 would be 1011. This is not the only possible representation and not even necessarily the best one, see comments in the code, but it was pretty convenient to emit and to parse (for up to 64 cards per node...). For zombie processes that are known to use the GPUs, but where it can't be determined which GPUs, it's set to !0, but frankly I don't like this representation much, even if it works in practice. Happy to change/discuss, this is a good time to do so. Some of this discussion may touch on matters of what it is sonar is supposed to detect and report. The difference between gpumem% and gpukib is that on some cards some of the time it is possible to determine one of these but not the other, and vice versa. For example, on the NVIDIA cards we have we can read both quantities for running processes but only gpukib for some zombies. Since we can detect the total amount of memory here we could translate gpukib into gpumem%, though. On the other hand, on our AMD cards there is no support for detecting the absolute amount of memory used, nor the total amount of memory on the cards, only the percentage of gpu memory used. Rather than encoding the logic for dealing with this mess in sonar, it seemed better - certainly for the time being - to report what we can report and let the analyzer sort it out. |
Which also reminds me, there's probably no need to report subsecond precision for the sonar record timestamp, it just takes up space in the log. |
Thanks for explanations! |
This is now fixed in #65 |
We could do it also like this:
|
Pedantically, for CSV the correct format is
That said, I'm fine with this format for the data. |
Thanks for clarification! I am also fine with bitmask but the "csv inside csv" seems more general and less explaining needed but maybe also uglier? But we don't need to parse it by eye, only with scripts. |
I think the generality of list-of-numbers-or-special-token wins here over the bitmask, which is going to bite us at some point. And it's a good time to make the change, when we're introducing field names. |
make output fields self-identifying; closes #41
As we experiment with sonar and add more fields and move it to new systems there will come a time when some log data are older and some are newer and data of different ages will have different columns. For a while it will be good enough to always add new columns at the end of the line, but in the long term this will likely become brittle. It may be good to add an identifier to each field to reduce the impact of this problem. The identifier can be a short name and can prefix the field, for example,
would do it for now. Names (columns) could be removed if they turn out to be worthless but names would never be reused with a different meaning.
The text was updated successfully, but these errors were encountered: