You can integrate YARA into your C/C++ project by using the API provided by the libyara library. This API gives you access to every YARA feature and it's the same API used by the command-line tools yara
and yarac
.
The first thing your program must do when using libyara is initialize the library. This is done by calling the :cyr_initialize
function. This function allocates any resources needed by the library and initializes internal data structures. Its counterpart is :cyr_finalize
, which must be called when you are finished using the library.
In a multi-threaded program only the main thread must call :cyr_initialize
and :cyr_finalize
. No additional work is required from other threads using the library.
Before using your rules to scan any data you need to compile them into binary form. For that purpose you'll need a YARA compiler, which can be created with :cyr_compiler_create
. After being used, the compiler must be destroyed with :cyr_compiler_destroy
.
You can use :cyr_compiler_add_file
, :cyr_compiler_add_fd
, or :cyr_compiler_add_string
to add one or more input sources to be compiled. All of these functions receive an optional namespace. Rules added under the same namespace behave as if they were contained within the same source file or string, so rule identifiers must be unique among all the sources sharing a namespace. If the namespace argument is NULL
the rules are put in the default namespace.
The :cyr_compiler_add_file
, :cyr_compiler_add_fd
, and :cyr_compiler_add_string
functions return the number of errors found in the source code. If the rules are correct they will return 0. If any of these functions reports an error, the compiler can no longer be used, either for adding more rules or for getting the compiled rules.
To obtain detailed error information you must set a callback function by using :cyr_compiler_set_callback
before calling any of the compiling functions. The callback function has the following prototype:
void callback_function(
    int error_level,
    const char* file_name,
    int line_number,
    const YR_RULE* rule,
    const char* message,
    void* user_data);
Changed in version 4.0.0.
Possible values for error_level
are YARA_ERROR_LEVEL_ERROR
and YARA_ERROR_LEVEL_WARNING
. The arguments file_name
and line_number
contain the file name and line number where the error or warning occurred. file_name
is the one passed to :cyr_compiler_add_file
or :cyr_compiler_add_fd
. It can be NULL
if you passed NULL
or if you're using :cyr_compiler_add_string
. rule is a pointer to the YR_RULE structure representing the rule that contained the error, but it can be NULL if the error is not contained in a specific rule. The user_data
pointer is the same you passed to :cyr_compiler_set_callback
.
By default, for rules containing references to other files (include "filename.yara"
), YARA will try to find those files on disk. However, if you want to fetch the included rules from another source (e.g. from a database or remote service), a callback function can be set with :cyr_compiler_set_include_callback
.
This callback receives the following parameters:
- include_name: name of the requested file.
- calling_rule_filename: the requesting file name (NULL if not a file).
- calling_rule_namespace: namespace (NULL if undefined).
- user_data: same pointer passed to :cyr_compiler_set_include_callback.
It should return the requested file's content as a null-terminated string. The memory for this string should be allocated by the callback function. Once it is safe to free the memory used to return the callback's result, the include_free function passed to :cyr_compiler_set_include_callback
will be called. If the memory does not need to be freed, NULL can be passed as include_free instead. You can completely disable support for includes by setting a NULL callback function with :cyr_compiler_set_include_callback
.
The callback function has the following prototype:
const char* include_callback(
    const char* include_name,
    const char* calling_rule_filename,
    const char* calling_rule_namespace,
    void* user_data);
The free function has the following prototype:
void include_free(
    const char* callback_result_ptr,
    void* user_data);
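The two prototypes above can be sketched with plain C. This example serves includes from a small in-memory table instead of the disk. The `include_entry` and `include_store` types and both function names are hypothetical. Returning NULL for an unknown include is an assumption about how a failed lookup should be signalled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory table mapping include names to rule source. */
typedef struct
{
  const char* name;
  const char* source;
} include_entry;

typedef struct
{
  const include_entry* entries;
  size_t count;
} include_store;

/* Returns a heap-allocated copy of the requested source, or NULL. */
const char* store_include_callback(
    const char* include_name,
    const char* calling_rule_filename,
    const char* calling_rule_namespace,
    void* user_data)
{
  const include_store* store = (const include_store*) user_data;
  size_t i;

  assert(store != NULL);
  (void) calling_rule_filename;
  (void) calling_rule_namespace;

  for (i = 0; i < store->count; i++)
  {
    if (strcmp(store->entries[i].name, include_name) == 0)
    {
      size_t size = strlen(store->entries[i].source) + 1;
      char* copy = (char*) malloc(size);

      if (copy != NULL)
        memcpy(copy, store->entries[i].source, size);

      return copy;
    }
  }

  /* Assumption: NULL tells the caller the include could not be provided. */
  return NULL;
}

/* Releases the string returned by store_include_callback. */
void store_include_free(const char* callback_result_ptr, void* user_data)
{
  (void) user_data;
  free((void*) callback_result_ptr);
}
```

Both functions, together with a pointer to the store, would be passed to `yr_compiler_set_include_callback`. Copying the source lets the free function release every result uniformly, at the cost of one allocation per include.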
After you have successfully added some sources you can get the compiled rules using the :cyr_compiler_get_rules
function. You'll get a pointer to a :cYR_RULES
structure which can be used to scan your data as described in scanning-data
. Once :cyr_compiler_get_rules
is invoked you cannot add more sources to the compiler, but you can call :cyr_compiler_get_rules
multiple times. Each time this function is called it returns a pointer to the same :cYR_RULES
structure. Note that this behaviour is new in YARA 4.0.0; in YARA 3.X and 2.X :cyr_compiler_get_rules
returned a new copy of the :cYR_RULES
structure.
Instances of :cYR_RULES
must be destroyed with :cyr_rules_destroy
.
If your rules make use of external variables (like in the example below), you must define those variables by using any of the yr_compiler_define_XXXX_variable
functions. Variables must be defined before rules are compiled with yr_compiler_add_XXXX
and they must be defined with a type that matches the context in which the variable is used in the rule. For example, a variable that is used like my_var == 5 can't be defined as a string variable.
While defining external variables with yr_compiler_define_XXXX_variable
you must provide a value for each variable. That value is embedded in the compiled rules and used whenever the variable appears in a rule. However, you can change the value associated with an external variable after the rules have been compiled by using any of the yr_rules_define_XXXX_variable
functions.
Compiled rules can be saved to a file and retrieved later by using :cyr_rules_save
and :cyr_rules_load
. Rules compiled and saved on one machine can be loaded on another machine as long as they have the same endianness, no matter the operating system or if they are 32-bit or 64-bit systems. However, files saved with older versions of YARA may not work with newer versions due to changes in the file layout.
You can also save and retrieve your rules to and from generic data streams by using functions :cyr_rules_save_stream
and :cyr_rules_load_stream
. These functions receive a pointer to a :cYR_STREAM
structure, defined as:
typedef struct _YR_STREAM
{
    void* user_data;
    YR_STREAM_READ_FUNC read;
    YR_STREAM_WRITE_FUNC write;
} YR_STREAM;
You must provide your own implementation for read
and write
functions. The read
function is used by :cyr_rules_load_stream
to read data from your stream and the write
function is used by :cyr_rules_save_stream
to write data into your stream.
Your read
and write
functions must respond to these prototypes:
size_t read(
    void* ptr,
    size_t size,
    size_t count,
    void* user_data);

size_t write(
    const void* ptr,
    size_t size,
    size_t count,
    void* user_data);
The ptr
argument is a pointer to the buffer where the read
function should put the read data, or where the write
function will find the data that needs to be written to the stream. In both cases size
is the size of each element being read or written and count
the number of elements. The total size of the data being read or written is size
* count
. The read
function must return the number of elements read, and the write
function must return the number of elements written.
The user_data
pointer is the same you specified in the :cYR_STREAM
structure. You can use it to pass arbitrary data to your read
and write
functions.
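As a rough illustration of these prototypes, the sketch below implements both functions over a fixed-size memory buffer. The `memory_stream` struct and the function names are hypothetical. Only whole elements are transferred, so the return value is always a count of elements, not bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stream state, passed to the functions as user_data. */
typedef struct
{
  unsigned char* data; /* backing storage */
  size_t capacity;     /* total bytes available in data */
  size_t length;       /* bytes written so far */
  size_t position;     /* next byte to be read */
} memory_stream;

/* Appends up to count elements of size bytes; returns elements written. */
size_t memory_stream_write(
    const void* ptr,
    size_t size,
    size_t count,
    void* user_data)
{
  memory_stream* stream = (memory_stream*) user_data;
  size_t written = 0;

  assert(stream != NULL);

  while (written < count && size <= stream->capacity - stream->length)
  {
    memcpy(
        stream->data + stream->length,
        (const unsigned char*) ptr + written * size,
        size);

    stream->length += size;
    written++;
  }

  return written;
}

/* Copies up to count elements of size bytes; returns elements read. */
size_t memory_stream_read(
    void* ptr,
    size_t size,
    size_t count,
    void* user_data)
{
  memory_stream* stream = (memory_stream*) user_data;
  size_t elements = 0;

  assert(stream != NULL);

  while (elements < count && size <= stream->length - stream->position)
  {
    memcpy(
        (unsigned char*) ptr + elements * size,
        stream->data + stream->position,
        size);

    stream->position += size;
    elements++;
  }

  return elements;
}
```

To use them, the `user_data` field of the stream structure would point at a `memory_stream`, with `read` and `write` set to these two functions.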
Once you have an instance of :cYR_RULES
you can use it directly with one of the yr_rules_scan_XXXX
functions described below, or create a scanner with :cyr_scanner_create
. Let's start by discussing the first approach.
The :cYR_RULES
you got from the compiler can be used with :cyr_rules_scan_file
, :cyr_rules_scan_fd
or :cyr_rules_scan_mem
for scanning a file, a file descriptor, and an in-memory buffer respectively. The results from the scan are returned to your program via a callback function. The callback has the following prototype:
int callback_function(
    YR_SCAN_CONTEXT* context,
    int message,
    void* message_data,
    void* user_data);
Possible values for message
are:
CALLBACK_MSG_RULE_MATCHING
CALLBACK_MSG_RULE_NOT_MATCHING
CALLBACK_MSG_SCAN_FINISHED
CALLBACK_MSG_IMPORT_MODULE
CALLBACK_MSG_MODULE_IMPORTED
CALLBACK_MSG_TOO_MANY_MATCHES
CALLBACK_MSG_CONSOLE_LOG
Your callback function will be called once for each rule with either a CALLBACK_MSG_RULE_MATCHING
or CALLBACK_MSG_RULE_NOT_MATCHING
message, depending on whether the rule matches. In both cases a pointer to the :cYR_RULE
structure associated with the rule is passed in the message_data
argument. You just need to perform a typecast from void*
to YR_RULE*
to access the structure. You can control whether or not YARA calls your callback function with CALLBACK_MSG_RULE_MATCHING
and CALLBACK_MSG_RULE_NOT_MATCHING
messages by using the SCAN_FLAGS_REPORT_RULES_MATCHING
and SCAN_FLAGS_REPORT_RULES_NOT_MATCHING
flags, as described later in this section.
This callback is also called with the CALLBACK_MSG_IMPORT_MODULE
message. All modules referenced by an import
statement in the rules are imported once for every file being scanned. In this case message_data
points to a :cYR_MODULE_IMPORT
structure. This structure contains a module_name
field pointing to a null-terminated string with the name of the module being imported, and two other fields, module_data
and module_data_size
. These fields are initially set to NULL
and 0
, but your program can assign a pointer to some arbitrary data to module_data
while setting module_data_size
to the size of the data. This way you can pass additional data to those modules requiring it, like the Cuckoo-module
for example.
Once a module is imported the callback is called again with the CALLBACK_MSG_MODULE_IMPORTED message. When this happens message_data
points to a :cYR_OBJECT_STRUCTURE
structure. This structure contains all the information provided by the module about the currently scanned file.
If during the scan a string hits the maximum number of matches, your callback will be called once with the CALLBACK_MSG_TOO_MANY_MATCHES
message. When this happens, message_data
is a YR_STRING*
that points to the string that caused the warning. If your callback returns CALLBACK_CONTINUE
, the string will be disabled and scanning will continue; otherwise scanning will be halted.
Your callback will be called from the console module (console-module
) with the CALLBACK_MSG_CONSOLE_LOG
message. When this happens, the message_data
argument will be a char*
that is the string generated by the console module. Your callback can do whatever it wants with this string, including logging it to an external logging source, or printing it to stdout.
Lastly, the callback function is also called with the CALLBACK_MSG_SCAN_FINISHED
message when the scan is finished. In this case message_data
is NULL
.
Notice that you shouldn't call any of the yr_rules_scan_XXXX
functions from within the callback as those functions are not re-entrant.
Your callback function must return one of the following values:
CALLBACK_CONTINUE
CALLBACK_ABORT
CALLBACK_ERROR
If it returns CALLBACK_CONTINUE
YARA will continue normally. CALLBACK_ABORT
will abort the scan but the result from the yr_rules_scan_XXXX
function will be ERROR_SUCCESS
. On the other hand, CALLBACK_ERROR
will also abort the scan, but the result from yr_rules_scan_XXXX
will be ERROR_CALLBACK_ERROR
.
The user_data
argument passed to your callback function is the same you passed to yr_rules_scan_XXXX
. This pointer is not touched by YARA; it's just a way for your program to pass arbitrary data to the callback function.
All yr_rules_scan_XXXX
functions receive a flags
argument that lets you tweak some aspects of the scanning process. The supported flags are:
SCAN_FLAGS_FAST_MODE
SCAN_FLAGS_NO_TRYCATCH
SCAN_FLAGS_REPORT_RULES_MATCHING
SCAN_FLAGS_REPORT_RULES_NOT_MATCHING
The SCAN_FLAGS_FAST_MODE
flag makes the scanning a little faster by avoiding multiple matches of the same string when not necessary. Once the string has been found in the file it's subsequently ignored, implying that you'll have a single match for the string, even if it appears multiple times in the scanned data. This flag has the same effect as the -f
command-line option described in command-line
.
SCAN_FLAGS_REPORT_RULES_MATCHING
and SCAN_FLAGS_REPORT_RULES_NOT_MATCHING
control whether the callback is invoked for rules that are matching or for rules that are not matching respectively. If SCAN_FLAGS_REPORT_RULES_MATCHING
is specified alone, the callback will be called for matching rules with the CALLBACK_MSG_RULE_MATCHING
message but it won't be called for non-matching rules. If SCAN_FLAGS_REPORT_RULES_NOT_MATCHING
is specified alone, the opposite happens, the callback will be called with CALLBACK_MSG_RULE_NOT_MATCHING
messages but not with CALLBACK_MSG_RULE_MATCHING
messages. If both flags are combined together (the default) the callback will be called for both matching and non-matching rules. For backward compatibility, if none of these two flags are specified, the scanner will follow the default behavior.
Additionally, yr_rules_scan_XXXX
functions can receive a timeout
argument which forces the scan to abort after the specified number of seconds (approximately). If timeout
is 0 it means no timeout at all.
The yr_rules_scan_XXXX
functions are enough in most cases, but sometimes you may need fine-grained control over the scanning. In those cases you can create a scanner with :cyr_scanner_create
. A scanner is simply a wrapper around a :cYR_RULES
structure that holds additional configuration like external variables without affecting other users of the :cYR_RULES
structure.
A scanner is particularly useful when you want to use the same :cYR_RULES
with multiple workers (separate threads, coroutines, etc.) and each worker needs to set a different set of values for external variables. In that case you can't use yr_rules_define_XXXX_variable
for setting the values of your external variables, as every worker using the :cYR_RULES
will be affected by such changes. However each worker can have its own scanner, where the scanners share the same :cYR_RULES
, and use yr_scanner_define_XXXX_variable
for setting external variables without affecting the rest of the workers.
This is a better solution than having a separate :cYR_RULES
for each worker, as :cYR_RULES
structures have a large memory footprint (especially if you have a lot of rules) while scanners are very lightweight.
- SCAN_FLAGS_FAST_MODE: Enable fast scan mode.
- SCAN_FLAGS_NO_TRYCATCH: Disable exception handling.
- SCAN_FLAGS_REPORT_RULES_MATCHING: If this
- SCAN_FLAGS_REPORT_RULES_NOT_MATCHING