Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i#1948 add libcall args sym printing: syscall args support #1975

Merged
merged 4 commits into from
Apr 14, 2017

Conversation

mxmssh
Copy link
Contributor

@mxmssh mxmssh commented Apr 7, 2017

Added arguments printing of already known syscalls from drsyscall's
tables for Windows and Linux.

Added arguments printing of already known syscalls from drsyscall's
tables for Windows and Linux.
@mxmssh
Copy link
Contributor Author

mxmssh commented Apr 7, 2017

I am planning to use the same syscall_info_t structure to describe other libcalls.

@mxmssh mxmssh changed the title i#1959 add libcall args sym printing: syscall args support i#1948 add libcall args sym printing: syscall args support Apr 7, 2017
Copy link
Contributor

@derekbruening derekbruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to use the drsyscall library through public interfaces, rather than breaking abstractions and using its private, internal tables. It sounds like adding one simple interface would eliminate the need for all of the uses of private members here, at least at first glance.

(DROPTION_SCOPE_CLIENT, "all_args", 2, "Number of library call arguments to print",
"Number of library call arguments to print. Specify 0 to disable. Drltrace will print "
"symbolic representation of arguments in case of known library call including all "
"known arguments.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description is confusing: so if the actual arguments are known, they will all be printed and this option ignored, right? So then this should option should be named "num_unknown_args" or something, right? And the short description should have "unknown" in it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the idea was simple: if user wants to print arguments, he would use -all_args and in this case drltrace will provide as much as possible information. If user doesn't want to print he can specify 0 and disable arguments printing at all (for example to increase performance).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest splitting into two options: one that controls the number of unknown args printed ("num_unknown _args" or sthg), and another that is either a max on args of any type ("max_args" or sthg) or maybe just a boolean controlling whether any arg printing is done (the max seems more useful).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that I understand. For example user specified -num_unknown_args 3 -max_args 7.
Should we print 3 arguments for unknown library calls and 7 arguments for known libcalls or 7 arguments for both types?
What should we have by default for -max_args?
Boolean controlling looks more reasonable for me. For example user can specify -num_unknown_args 5 -print_known_args. So we know that we should print everything for known system calls and maximum 5 arguments for unknown. By default we may set -num_unknown_args to be 0 and -print_known_args to be true ? (or may be false to demonstrate maximum performance by default).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"-num_unknown_args 3 -max_args 7" means never print more than 7, and furthermore don't print more than 3 when the types and # of args are not known.

Definitely there is a use case where no args should be printed. The question is whether some users would want to limit the # of args to some small number for cleaner output: see some but not all. On Linux there are never more than 6, but on Windows where there can be over 20 it may be a different story.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I got it, will implement it this way

#elif MACOS
for (i = 0; i < count_syscall_info_bsd; i++)
hashtable_add_new_name_entry(syscall_info_bsd[i], &syscall_info_bsd[i]);
#else /* LINUX */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Safer to have #elif defined(LINUX) and #else #error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

hashtable_add_new_name_entry(syscall_gdi32_info[i], &syscall_gdi32_info[i]);
for (i = 0; i < num_ntdll_syscalls(); i++)
hashtable_add_new_name_entry(syscall_ntdll_info[i], &syscall_ntdll_info[i]);
#elif MACOS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defined(MACOS)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

print_symbolic_args(func, wrapcxt);

/* XXX: we have to print a module id + offset here instead of an
* absolute address and provide a dump a module list at the end of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"provide a dump a"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

dr_fprintf(outf, ")");
}

void print_symbolic_args(app_pc func, void *wrapcxt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style violation again

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return (syscall_info_t *)hashtable_lookup(&addr2entry_table, func);
}

void print_args(app_pc func, void *wrapcxt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return true; /* keep going */
}

syscall_info_t *lookup_entry(app_pc func)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style violation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return true; /* keep going */
}

syscall_info_t *lookup_entry(app_pc func)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

libcall_entry = lookup_entry(func);
if (libcall_entry != NULL) {
/* drsys_iterate_arg_types employs static enumeration. */
res = drsys_iterate_arg_types((drsys_syscall_t *)libcall_entry,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks the abstraction barrier of the drsyscall library: drsys_syscall_t is an opaque type and a user of the library should not have assumptions about what it looks like. See the suggestion above for using library interfaces instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, implemented a new function: drsys_iterate_arg_types_by_sysname

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO it's cleaner to separate things and have a lookup whose output can be used to the existing iterator, rather than having a multi-operation iterator.

BTW, if drsys_name_to_syscall() can be used statically, which should be true, then no new interface function is needed, right? Call drsys_name_to_syscall() and pass the resulting handle to drsys_iterate_arg_types().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense, what about hashtables initialization ? we have drsyscall_os_init but I need to provide a context for this function...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean: any hashtables inside drsyscall are invisible to the user here and are initialized when drsyscall_init() is called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, drsys_init()/drsys_exit() will do everything. So it works perfect the only one problem is ASSERT at lines 252-257, it raises error when I am sending Zw* syscall. I can replace Zw* with Nt* but probably I am missing something ?

Copy link
Contributor Author

@mxmssh mxmssh Apr 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ASSERT at lines 252-257 in drsyscall.c


/* XXX: We need to add support of printing from secondary table (e.g. ioctl) (xref i#1549) */
#ifdef WINDOWS
extern dr_os_version_info_t win_ver; /* already defined in drsyscall_windows.c */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of these are exported by the drsyscall library, though. They're not part of the library interface! All uses here should be through interfaces.

It sounds like you could add a static drsys_lookup_syscall() routine that takes in a name and returns a drsys_syscall_t "handle", right? Then there's no need to break the library's abstractions in multiple ways here by using its internal, private data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I've created three functions that user can use externally.

Added arguments printing of already known syscalls from drsyscall's
tables for Windows and Linux.
Added two new arguments for drltrace -num_unknown_args and max_args
to limit maximum arguments to be printed for known and unknown
libcalls and syscalls.
Copy link
Contributor

@derekbruening derekbruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much better. I have a few minor comments/suggestions.

@@ -62,7 +62,8 @@
* + Add argument values and return values. The number and type of each
* argument and return would likely come from the filter configuration
* file, or from querying debug information.
* Today we have simple type-blind printing via -all_args.
* Today we have simple type-blind printing via -all_args and usage of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

num_unknown_args

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

#define MAX_LIBCALL_NAME 255

/* XXX: The functions print_simple_value and print_arg were taken from drstrace.
* It would be better to move them in drsyscall and import in drstrace and here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a lot of code to duplicate. Is there an issue # for the XXX?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

static void
print_arg(drsys_arg_t *arg)
{
dr_fprintf(outf, "\n arg %d: ", arg->ordinal);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This format is completely different from the unknown-arg format: the unknown is all one one line, separated by commas. This has a separate line per arg. Shouldn't they be consistent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, do we want to print unknown args one per line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worry about log size here, compact unknown args printing scheme saves a lot of space...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want compact, should the known args be one line (x, y, z, ...) then, closer to Linux ltrace and strace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of known I think we have pretty good scheme of printing. One line would be a bit complicated to read in case of known arguments while for unknown arguments it looks ok. I would prefer to leave current scheme for now...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like log size or screen space is not a concern, then.

Having different formats complicates post-processing the log: now to parse the file there are two different formats to match.

It also seems odd to have a syscall with a known name but unknown types (e.g., NtCreateEnclave) have its 9 arguments all on separate lines with unknown types, while a syscall with an unknown name is printed completely differently even though its arg types are also unknown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, ok let's do one arg per line for all schemes.

if (name[0] == 'Z' && name[1] == 'w')
dr_snprintf(syscall_name, BUFFER_SIZE_ELEMENTS(syscall_name), "Nt%s", name+2);
else
dr_snprintf(syscall_name, BUFFER_SIZE_ELEMENTS(syscall_name), "%s", name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could avoid this redundant copy by having a pointer that points either at name or syscall_name

else
dr_snprintf(syscall_name, BUFFER_SIZE_ELEMENTS(syscall_name), "%s", name);

res = drsys_name_to_syscall(syscall_name, &syscall);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or should we change this routine to succeed with the Zw variant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want additional flag e.g. ignore_zw_nt_mismatch or sthg like that ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought drsyscall had entries for both the Nt and Zw forms. Are you sure it doesn't? I thought that today either query would work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see code in drsyscall_windows.c that is adding duplicate entries for Zw to the name2num_table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is correct both versions work. We have an assert at https://github.com/DynamoRIO/drmemory/blob/master/drsyscall/drsyscall.c#L251

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NtQueryPerformanceCounter ZwQueryPerformanceCounter
ASSERT FAILURE (thread 2760): C:\drmemory\drmemory\drsyscall\drsyscall.c:257: st
ri_eq(sysinfo->name, name) IF_WINDOWS(|| strcasestr(sysinfo->name, name) != NULL
) (name<->num mismatch)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert should be relaxed of course.


/* XXX: We employ two schemes of arguments printing. drsyscall is used
* to get a symbolic representation of arguments for known library calls.
* For the rest of library calls we use type-blind printing and -unknown_args
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

num_

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

*/
print_symbolic_args(name, wrapcxt, func);

/* XXX: we have to print a module id + offset here instead of an
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i#?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created

drsys_syscall_t *syscall;
char syscall_name[MAX_LIBCALL_NAME];

if (options.max_args > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest: "if (options.max_args == 0) return;" to make the rest of the function un-indented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"unknown args printing.");

static droption_t<uint> op_max_args
(DROPTION_SCOPE_CLIENT, "num_max_args", 6, "Maximum number of arguments to print",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not ideal to duplicated these default values. The plan is to switch drltrace.c to use droption also and share these definitions directly, right?

Copy link
Contributor Author

@mxmssh mxmssh Apr 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm preparing a separate commit for that.

Added arguments printing of already known syscalls from drsyscall's
tables for Windows and Linux.
Added two new arguments for drltrace -num_unknown_args and max_args
to limit maximum arguments to be printed for known and unknown
libcalls and syscalls.
#ifdef DEBUG
ASSERT(stri_eq(sysinfo->name, name)
/* ignore possible Nt/Zw mismatch */
if (((sysinfo->name[0] == 'N' && sysinfo->name[1] == 't') ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably be Windows only

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change in the next commit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the next commit? It needs to update to master and thus re-run the checks anyway, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm assuming by "next commit" you mean tacking onto a future pull request? Seems cleaner to be part of this one, not an unrelated pull request, right?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, yes, just need to pass through the local tests.

@derekbruening
Copy link
Contributor

There's no test of the output yet? We should add a test soon.

@mxmssh
Copy link
Contributor Author

mxmssh commented Apr 14, 2017

Yes, it's important.

Added arguments printing of already known syscalls from drsyscall's
tables for Windows and Linux.
Added two new arguments for drltrace -num_unknown_args and max_args
to limit maximum arguments to be printed for known and unknown
libcalls and syscalls.
@mxmssh mxmssh merged commit d63d2a9 into master Apr 14, 2017
@mxmssh mxmssh deleted the i1948-add_syscalls_symarg_printing branch April 14, 2017 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants