Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Callback on include #727

Closed
wants to merge 14 commits into from
Closed

Conversation

edhoedt
Copy link

@edhoedt edhoedt commented Aug 28, 2017

Adds the possibility to fetch 'include' rules files from external data sources (eg: database or remote server) by allowing developers to set a custom callback for the 'include' directive.
This update allows developers to create interfaces between yara and other services without requiring write access on the disk to store rules source files.
If the callback function is not set, yara behaves as in previous versions and tries to get the requested file from the disk. If it is set, yara will not try to read the file from disk.

Setting a callback is done via:
void yr_compiler_set_include_callback(YR_COMPILER* compiler, YR_COMPILER_INCLUDE_CALLBACK_FUNC callback, void* user_data)

The callback itself must have the following signature:

const char* include_callback(
    const char* include_name,
    const char* calling_rule_filename,
    const char* calling_rule_namespace,
    void* user_data);

Parameters

  • include_name: name of the requested file as stated by the 'import' directive.
  • calling_rule_filename: the requesting file name (NULL if not a file, ie: compiling from string).
  • calling_rule_namespace: namespace (NULL if undefined).
  • user_data: additional user data.

Returns:

  • The content of the file to include as a single string

@edhoedt
Copy link
Author

edhoedt commented Aug 31, 2017

Any comments @plusvic ? Anything I should improve to have it merged?

void* user_data)
{
compiler->include_callback = include_callback;
compiler->user_data = user_data;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compiler->user_data is being used by the error callback set with yr_compiler_set_callback, you shouldn't modify its value because both callbacks could be used at the same time.

docs/capi.rst Outdated
*``include_name``: name of the requested file.
*``calling_rule_filename``: the requesting file name (NULL if not a file).
*``calling_rule_namespace``: namespace (NULL if undefined).
And should return the requested file as a string.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no mention to who's responsible for freeing the memory pointed to by the callback's result. I assume that the external code is responsible for doing that, but it's probably a good idea to clearly state it in the documentation.

libyara/lexer.l Outdated
}
// Workaround for flex issue: https://github.com/westes/flex/issues/58
yypush_buffer_state(YY_CURRENT_BUFFER, yyscanner);
yy_scan_string(res, yyscanner);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you are passing the pointer received from the callback to yy_scan_string, this is fine as long as it points to memory allocated in the heap (if the callback returns a pointer to it's own stack the memory will be invalid as soon as the callback returns), but this introduces again the question of who's the responsible for freeing the memory and when it's safe to do it. Those question should be addressed in the documentation.

I'm not sure if the right approach here is making the external code responsible for freeing the memory, forcing the callback to allocate the returned buffer with a specific function provide by YARA (it could be simply yr_malloc), and letting YARA free that buffer when the parsing is done seems like a safer alternative.

With the current design is trickier to get everything right. As an example, I think you have a memory leak in the yara-python code that uses this feature (more details in VirusTotal/yara-python#58)

libyara/lexer.l Outdated
snprintf(buffer, sizeof(buffer),
"can't open include file: %s", yyextra->lex_buf);
yyerror(yyscanner, compiler, buffer);
const char* res = compiler->include_callback(yyextra->lex_buf, current_file_name, compiler->current_namespace->name, compiler->user_data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep line length under 80 characters.

 * Limited lines to 80 chars
 * include callback is now using its own user_data and does not conflict with other callbacks
 * Copying include callback result to the stack before pushing it into the flex scanner
 * yara's memory management functions are now exposed through the API
 * Making the lexer reponsible for freeing the memory allocated by the include callback
 * Updating docs accordingly
@edhoedt
Copy link
Author

edhoedt commented Sep 1, 2017

Thanks for the feedback @plusvic , I believe I fixed everything accordingly. Could you please tell me if you see any other issues? If everything is OK, I will proceed with the corresponding fixes in yara-python.

libyara/lexer.c Outdated
if (res != NULL)
{
int error_code = _yr_compiler_push_file_name(compiler, yyextra->lex_buf);
const char* res_dup = yr_strdup(res);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to duplicate the string, now it's safe to pass the pointer returned by the callback to yara_yy_scan_string and free it after that function returns.

docs/capi.rst Outdated
@@ -695,6 +699,31 @@ Functions
Enables the specified rule. After being disabled with :c:func:`yr_rule_disable`
a rule can be enabled again by using this function.

.. c:function:: void* yr_calloc(size_t count, size_t size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of exposing all these functions as part of the API I would do something simpler but even more flexible. Add an argument to yr_compiler_set_include_callback which is a pointer to the free function that YARA should use to deallocate the memory returned by the callback. This way the program can use whatever allocator it wants, as long as the program provides the appropriate free function. The program could even use a statically allocated buffer and pass YARA a NULL instead of a pointer to the free function, indicating the buffer don't need to be de-allocated at all. For example:

typedef  void (*YR_COMPILER_INCLUDE_FREE_FUNC) (void*)

YR_API void yr_compiler_set_include_callback(
     YR_COMPILER* compiler,
     YR_COMPILER_INCLUDE_CALLBACK_FUNC include_callback,
     YR_COMPILER_INCLUDE_FREE_FUNC include_free,
     void* user_data)

libyara/lexer.l Outdated
#else
if (s != NULL)
#endif
if (compiler->include_callback == NULL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The elegant way of implementing this is handling all includes with the callback, and providing a default callback that implements the standard includes. That way you don't need this if statement here. For the usual includes you can implement a default callback that YARA uses unless some other callback is specified by the user.

* adding a callback to allow external code to free the memory used to return the include callback's result
* API: un-exposing yara's memory management functions
* making include callbacks the only way ton include files
* adding a default include callback to open files from the disk (mimicking original behavior)
* updating docs accordingly
@edhoedt
Copy link
Author

edhoedt commented Sep 5, 2017

@plusvic I followed your suggestion but I changed the free function signature a little bit:
void include_free_func (char* callback_return_ptr, void* user_data)
callback_return_ptr is a pointer to the value returned by the callback. user_data is the same as passed to the callback.
I also updated yara-python: VirusTotal/yara-python#58
Any other issue?

@adulau
Copy link

adulau commented Sep 15, 2017

The patch is quite neat and nifty for https://github.com/CIRCL/yara-validator. Will you plan to merge it soon? Thank you.

Copy link
Member

@plusvic plusvic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's getting better. Just two memory leaks to be fixed.

{
if(file_length != fread(file_buffer, 1, file_length, fh))
{
return NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a memory leak here. If the file can't be fully read for some reason file_buffer is never freed. Add a yr_free(file_buffer) before the return statement. A call to fclose is also required to prevent leaking file handles.

base_err_msg = "callback failed to provide include resource: %s";
}
size_t err_buff_len = sizeof(base_err_msg) + sizeof(yyextra->lex_buf);
char* error_buffer = (char*) yr_malloc(err_buff_len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a memory leak here, error_buffer is never freed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sizeof(base_err_msg) returns the pointer's size, not the buffer's size.

@plusvic
Copy link
Member

plusvic commented Oct 6, 2017

Already merged in c0fa0bf

@plusvic plusvic closed this Oct 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants