Yara integrated software to handle archive file data.
C++ C Python JavaScript Makefile M4 Other
Clone or download
Permalink
Failed to load latest commit information.
dockerize dockerize - switch to official Canonical image Mar 7, 2018
libs pdf parser - added support for unencoded plain text streams and dicti… Mar 12, 2018
test update to nose tests since the call to yextend has changed (now use -… Nov 21, 2017
test_files Add dummy rule and rule with comma in string Dec 8, 2017
test_rulesets Add dummy rule and rule with comma in string Dec 8, 2017
.gitignore updated gitignore file Jan 15, 2018
.travis.yml update to travis config file - add comms to slack channel Nov 30, 2017
AUTHORS updates to AUTHORS and CONTRIBUTORS Aug 26, 2016
CONTRIBUTORS update to CONTRIBUTORS list Nov 26, 2017
COPYING updated README & file comments Feb 27, 2017
LICENSE Update LICENSE Jan 31, 2018
Makefile.am Corrections on Makefile.am to include pdf_font.cpp Feb 27, 2018
QUICKSTART updated QUICKSTART Aug 27, 2016
README.md README - removing poppler utils requirement Jan 14, 2018
autogen.sh initial ingest - ywrapper 1.0 Aug 22, 2014
bayshore_content_scan.cpp scan pdf - logic adjustment to handle case where utf8 is at play inst… Mar 2, 2018
bayshore_content_scan.h initial ingest - pdf parser Jan 14, 2018
build.sh initial ingest of build helper shell script Aug 26, 2016
configure.ac updates to README Nov 22, 2017
main.cpp initial ingest - pdf parser Jan 14, 2018
run_yextend changed scan results to array instead of dict Dec 4, 2017
wrapper.cpp initial ingest - pdf parser Jan 14, 2018
wrapper.h initial ingest - pdf parser Jan 14, 2018

README.md

yextend

Coverity Scan Build Status OpenHub-Status

Yara integrated software to handle archive file data.

yextend was written for the sake of augmenting yara. yara by itself is great but we realized that it could not natively handle archived content in the granular way that we needed it to. For instance, if we were hunting for malware and it happened to be buried a few levels into archived content, yara in its native form could not help us. So what we have done is natively handle the inflation of archived content. And we pass the inflated content of each discovered resource to yara so that it can work its magic natively on one file's payload. Then yara does what it does quite well in terms of pattern matching and such based on a given set of rules.

Software Credits

- Yara is originally authored by Victor M. Alvarez (https://github.com/VirusTotal/yara) - License: https://raw.githubusercontent.com/VirusTotal/yara/master/COPYING
- json.hpp is authored by Niels Lohmann (https://github.com/nlohmann/json) - License: https://raw.githubusercontent.com/nlohmann/json/develop/LICENSE.MIT  *** Note that for Yextend to compile I had to make a change to the original json.hpp - function "get_string" was causing name conflicts with function "get_string" from Yara. So I renamed json.hpp's function to "j_get_string"

Notes:

  • (12/08/2017) yextend version 1.6

    • added support for bzip2
    • added numerous new file type detection patterns
    • added macro file detection inside of .docm files
    • added support for JSON output (standard and advanced)
  • (08/30/2016) yextend version 1.5

    • native parsing for PDF documents such that the Yara rulesets are now running against:
      • the target PDF's binary data set
      • the raw text (extracted out of the binary data set) from the target PDF
    • file type detection has been re-written so that it is now performed against a Yara ruleset
    • added some initial debug code (based on bool flag)
    • more test files added
    • nose based unit tests added
    • yextend is now licensed via the BSD (3-clause) license
  • (03/18/2016) yextend version 1.4 - output enhancements and runtime helper prog

    • output now includes the offset and string definition identifier for every hit reported by Yara
    • output now includes the name of the Yara ruleset file at hand
    • initial release of run_yextend prog
  • (10/24/2015) yextend version 1.3 will only work with yara 3.4.

    • if your rules have data in the 'meta' section they will now show up in the output, take a look at 'RULEWITHMETA' below and you will see an example of such output
  • (05/28/2015) yextend version 1.2 will only work with yara 3.3 and above

  • This software was written and tested on Linux (both Fedora and Debian). Ports to other platforms are currently TBD.

  • If a dir (and not a file) is passed in then this version will process all of the files at that top level. Subdirectories are not processed yet.

Requirements to build and run:

  • g++ (GNU c++ compiler - c++11)
  • autoconf 2.69 or above
  • openssl devel lib (sudo dnf install openssl-devel or sudo apt-get install libssl-dev)
  • zlib devel lib (sudo dnf install zlib-devel or sudo apt-get install zlib1g-dev)
  • bzlib devel lib (sudo dnf install bzip2-devel or sudo apt-get install libbz2-dev)
  • libarchive (v4) be installed (sudo dnf install libarchive-devel or sudo apt-get install libarchive-dev)
  • pcrecpp (sudo dnf install pcre-devel or sudo apt-get install libpcre3-dev)
  • uuid (sudo dnf install uuid-devel / libuuid-devel or sudo apt-get install uuid-dev)
  • nose (sudo dnf install python2-nose or sudo apt-get install python-nose)
  • yara v3.X be fully installed
  • if you are running yara pre-version 3.1.X then yara v3 lib header files to be moved to a specific location after a typical yara install, steps: A. cd into the dir where you extracted yara (for this example I will use "/tmp/yara") B. sudo cp /tmp/yara/libyara/include/yara/* /usr/local/include/yara/

Instructions:

1 - Make sure all requirements set forth above are met

2 - Extract our software in the directory of your choice (referred to as THEDIR from now on)

- cd THEDIR
- tar -xvzf yextend.tar.gz

3 - Build (make sure the executable bit is set on the file system for build.sh):

- ./build.sh

4 - Test (optional)

- make unittests

5 - Run:

- Note on run modes:

	- No arguments/switches passed in means the output is intended for human eyes
	- The -j switch means the output is for machine to consume [JSON]
	- The -a switch, used in conjunction -j, means that file structure/hierarchy will be added to the machine consumable JSON output

- 2 options to run:

A. use executable run_yextend - it wraps the native yextend executable. To run:

- the program 'run_yextend' takes in 2 mandatory arguments, 1 optional:

	1. A yara ruleset file or directory of ruleset files
	2. A file name or a directory of target files
	3. '-j' for json output [optional]

usage:
    -r or --ruleset             rule_entity
    -t or --target              target_file_entity
    -j

    examples:
	 ./run_yextend -r rule_entity -t target_file_entity
    ./run_yextend --ruleset rule_entity -t target_file_entity
    ./run_yextend -r rule_entity -t target_file_entity -j

	***** make sure the executable bit is set on the file system for run_yextend *****

B. run yextend executable - prefix the run statement by telling LD_LIBRARY_PATH where the yara shared object lib (or its symlink) is. If you changed nothing during the yara install then that value is '/usr/local/lib'

- the program 'yextend' takes in 2 mandatory arguments, 1 optional:

	1. A yara ruleset file
	2. A file name or a directory name where the target files exist
	3. '-j' for json output [optional]

usage:
    -r rule_entity
    -t target_file_entity
    -j

example:

	- LD_LIBRARY_PATH=/usr/local/lib ./yextend -r ~/Desktop/bayshore.yara.rules -t /tmp/targetfiles/filex
	- LD_LIBRARY_PATH=/usr/local/lib ./yextend -r ~/Desktop/bayshore.yara.rules -t /tmp/targetfiles/
	- LD_LIBRARY_PATH=/usr/local/lib ./yextend -r ~/Desktop/bayshore.yara.rules -t /tmp/targetfiles/filex -j

	*****
		if you don't want to set LD_LIBRARY_PATH on each prog run then you can set it in your .bashrc (or equivalent on your system) as such:

		export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

		then the program runs would be as such:

			- ./yextend -r ~/Desktop/bayshore.yara.rules -t /tmp/targetfiles/filex
			- ./yextend -r ~/Desktop/bayshore.yara.rules -t /tmp/targetfiles/			
	*****

6 - Analyze output. The output will be structured as such (number of result stanzas will obviously vary based on the content at hand):

===============================ALPHA===================================
Ruleset File Name: w
File Name: x
File Size: y
File Signature (MD5): z

=======================================================================

Yara Result(s): RULE1:[detected offsets=0x48:$a,hit_count=1]
Scan Type: a
Parent File Name: b
Child File Name: c
File Signature (MD5): d


Yara Result(s): RULE2, RULEWITHMETA:[type=Simple string search,signature=Looking for 95,author=Dre,testint=9,d=false,detected offsets=0x0:$a-0x1:$a-0x2:$a-0x3:$a,hit_count=4], RULE3
Scan Type: v
Parent File Name: x
Child File Name: y
File Signature (MD5): z


===============================OMEGA===================================

For each result stanza you will be presented with a listing of the relevant Yara rules that hit. This will be in the key/value pair with key "Yara Result(s)". The value here is formatted as follows:

	RULE_ID:[META_DATA,OFFSETS,HIT_COUNT], RULE_ID[OFFSETS,HIT_COUNT], ... RULE_ID[OFFSETS,HIT_COUNT]

	Inside the square brackets:

		The "META_DATA" are comma-separated and represent the data from your rules "meta" section. As this is optional you may see no meta-data output from yextend.
		The OFFSETS are comma-separated. Each offset listing is delimited by a dash and represent the offset in the content where the rule hit took place and the respective string definition identifier from your Yara rule.
		The HIT_COUNT is simply a listing of the number of rule hits.

	Here is an example:

		EXE_DROP:[detected offsets=0x4e:$a,hit_count=1], MZ_PORTABLE_EXE, S95:[detected offsets=0xd94e:$a-0x2c5b5:$b,hit_count=2]

	Breakdown: In this example there are 3 rules that hit:

		RULE_ID: EXE_DROP
		META_DATA: No meta data listed as there is none in the rule
		OFFSETS: detected offsets=0x4e:$a ("0x4e" is the offset in the content and "$a" is the respective string definition identifier)
		HIT_COUNT: hit_count=1

		RULE_ID: MZ_PORTABLE_EXE (Nothing else is listed because this rule only has the required "condition" section)

		RULE_ID: S95
		META_DATA: No meta data listed as there is none in the rule
		OFFSETS: detected offsets=0xd94e:$a-0x2c5b5:$b ("0xd94e" is one offset in the content that hit on string definition "$a". "0x2c5b5" is another offset in the content that hit on string definition "$b")
		HIT_COUNT: hit_count=2



A. example output from one of the test files:

	===============================ALPHA===================================
	Ruleset File Name: test_rules/ruleset_blah
	File Name: test_files/rands_tarball.tar.gz
	File Size: 271386
	File Signature (MD5): 74edc10648f6d65e90cd859120eaa31b

	=======================================================================

	Yara Result(s): JUDO:[detected offsets=0x48:$a,hit_count=1]
	Scan Type: Yara Scan (ASCII Text File) inside GZIP Archive file
	Parent File Name: AAA.gz
	Child File Name: AAA
	File Signature (MD5): bf6aadaf4b6fb726040c1131d809bfc2


	Yara Result(s): EXE_DROP:[detected offsets=0x4e:$a,hit_count=1], MZ_PORTABLE_EXE, S95:[detected offsets=0xd94e:$a-0x2c5b5:$a,hit_count=2]
	Scan Type: Yara Scan (Windows - Portable Executable) inside ZIP 2.0 (deflation) file
	Parent File Name: rand987.zip
	Child File Name: spoolsy.exe
	File Signature (MD5): 25ca7beed707e94ce70137f7d0c7b14e


	===============================OMEGA===================================


	This is based on file "test_files/rands_tarball.tar.gz" that has the following structure:

	rands_tarball.tar.gz
	|_ rands.tar
	   |_ rand123.zip
	      |_ AAA.gz
	         |_ AAA
	      rand987.zip
	      |_ spoolsy.exe


B. another more complex example using one of the test files:


	===============================ALPHA===================================
	Ruleset File Name: test_rules/ruleset_blah
	File Name: test_files/step1-zips.tar.gz
	File Size: 2400255
	File Signature (MD5): 98178b84fd9280fa1ed469c6512cd0ee

	=======================================================================

	Yara Result(s): EXE_DROP:[type=EXE DROP detection,signature=Some test sig,uuid=ce354f47467b31e4addcf4b7e2c79a19,detected offsets=0x4e:$a,hit_count=1], MZ_PORTABLE_EXE, S95:[type=Simple string search,signature=Looking for 95,author=Dre,detected offsets=0xd94e:$a-0x2c5b5:$a,hit_count=2]
	Scan Type: Yara Scan (Windows - Portable Executable) inside POSIX ustar format file
	Parent File Name: test_files/step1-zips.tar
	Child File Name: spoolsy.exe
	File Signature (MD5): 25ca7beed707e94ce70137f7d0c7b14e


	Yara Result(s): CTXTESTXML:[type=nocase 'XML' search,signature=Some test XML sig,testint=9,d=false,detected offsets=0x2:$a-0x40:$a-0x5a:$a-0xbf:$a-0x154:$a-0x175:$a-0x18f:$a-0x1ad:$a-0x1d6:$a-0x1fc:$a-0x235:$a-0x25c:$a-0x282:$a-0x2b4:$a-0x2dc:$a-0x302:$a-0x330:$a-0x359:$a-0x37f:$a-0x3b3:$a-0x3f4:$a-0x453:$a-0x479:$a-0x499:$a-0x4c3:$a-0x4e9:$a-0x51e:$a-0x54a:$a-0x570:$a-0x5a7:$a-0x5d0:$a-0x5f6:$a-0x619:$a,hit_count=33]
	Scan Type: Yara Scan (XML Document) inside ZIP 2.0 (deflation) file
	Parent File Name: Lorem-winlogon-spoolsy-pptx.docx
	Child File Name: [Content_Types].xml
	File Signature (MD5): 64a53ef2e6b1b9ed389232cc69898289

	Yara Result(s): S95:[type=Simple string search,signature=Looking for 95,author=Dre,detected offsets=0x2ee:$a-0x1698:$a,hit_count=2], CTXTESTXML:[type=nocase 'XML' search,signature=Some test XML sig,testint=9,d=false,detected offsets=0x2:$a-0x42:$a-0x5e:$a-0x1980:$a-0x19c4:$a,hit_count=5]
	Scan Type: Yara Scan (XML Document) inside ZIP 2.0 (deflation) file
	Parent File Name: Lorem-winlogon-spoolsy-pptx.docx
	Child File Name: word/theme/theme1.xml
	File Signature (MD5): 3badbd456f490c89a3a9a118c0eb9aca

	...

	Yara Result(s): CTXTESTXML:[type=nocase 'XML' search,signature=Some test XML sig,testint=9,d=false,detected offsets=0x2:$a-0x49:$a-0x65:$a-0x89:$a-0xa5:$a-0xd7:$a-0xf3:$a-0xd0d:$a-0xd16:$a,hit_count=9]
	Scan Type: Yara Scan (XML Document) inside ZIP 2.0 (deflation) file
	Parent File Name: word/embeddings/Microsoft_Office_PowerPoint_Presentation1.pptx
	Child File Name: ppt/presentation.xml
	File Signature (MD5): c312ab4768729234a74c94a67ccce68d


	Yara Result(s): LOREM:[detected offsets=0x31b:$a-0x4881:$a,hit_count=2], CTXTESTXML:[type=nocase 'XML' search,signature=Some test XML sig,testint=9,d=false,detected offsets=0x2:$a-0x40:$a-0x5c:$a-0x80:$a-0x9c:$a-0xce:$a-0xea:$a-0x5d07:$a-0x5d10:$a,hit_count=9]
	Scan Type: Yara Scan (XML Document) inside ZIP 2.0 (deflation) file
	Parent File Name: word/embeddings/Microsoft_Office_PowerPoint_Presentation1.pptx
	Child File Name: ppt/slides/slide2.xml
	File Signature (MD5): f8625ea3f7e5c1e10ca78b9dc72ebcf5


	Yara Result(s): CTXTESTXML:[type=nocase 'XML' search,signature=Some test XML sig,testint=9,d=false,detected offsets=0x2:$a-0x40:$a-0x5c:$a-0x80:$a-0x9c:$a-0xce:$a-0xea:$a-0x587:$a-0x590:$a,hit_count=9]
	Scan Type: Yara Scan (XML Document) inside ZIP 2.0 (deflation) file
	Parent File Name: word/embeddings/Microsoft_Office_PowerPoint_Presentation1.pptx
	Child File Name: ppt/slides/slide3.xml
	File Signature (MD5): 0bd56ae94bb2d043477c4d2189c44723


	Yara Result(s): CTXTESTXML:[type=nocase 'XML' search,signature=Some test XML sig,testint=9,d=false,detected offsets=0x2:$a-0x40:$a-0x5c:$a-0x80:$a-0x9c:$a-0xce:$a-0xea:$a-0x587:$a-0x590:$a,hit_count=9]
	Scan Type: Yara Scan (XML Document) inside ZIP 2.0 (deflation) file
	Parent File Name: word/embeddings/Microsoft_Office_PowerPoint_Presentation1.pptx
	Child File Name: ppt/slides/slide1.xml
	File Signature (MD5): 01b2314a10da34be2161e2057198e5ef

	...

	Yara Result(s): S95:[type=Simple string search,signature=Looking for 95,author=Dre,detected offsets=0xa48:$a-0xd21:$a,hit_count=2], CTXTESTXML:[type=nocase 'XML' search,signature=Some test XML sig,testint=9,d=false,detected offsets=0x2:$a-0x46:$a-0x62:$a-0x86:$a-0xa2:$a-0xd4:$a-0xf0:$a-0x349:$a-0x352:$a,hit_count=9]
	Scan Type: Yara Scan (XML Document) inside ZIP 2.0 (deflation) file
	Parent File Name: word/embeddings/Microsoft_Office_PowerPoint_Presentation1.pptx
	Child File Name: ppt/slideLayouts/slideLayout16.xml
	File Signature (MD5): 51f0848ae9a1c864fe467bddac461cde


	Yara Result(s): S95:[type=Simple string search,signature=Looking for 95,author=Dre,detected offsets=0x6c0:$a-0x999:$a,hit_count=2], CTXTESTXML:[type=nocase 'XML' search,signature=Some test XML sig,testint=9,d=false,detected offsets=0x2:$a-0x46:$a-0x62:$a-0x86:$a-0xa2:$a-0xd4:$a-0xf0:$a-0x33f:$a-0x348:$a,hit_count=9]
	Scan Type: Yara Scan (XML Document) inside ZIP 2.0 (deflation) file
	Parent File Name: word/embeddings/Microsoft_Office_PowerPoint_Presentation1.pptx
	Child File Name: ppt/slideLayouts/slideLayout6.xml
	File Signature (MD5): eefa70fd1d867683d9a4ad3e36b4b9fa


	===============================OMEGA===================================


	This last example is based on file "test_files/step1-zips.tar.gz" that has the following structure:

	step1-zips.tar.gz
	|_ docx-embedded-step3.xlsx.zip
	   |_ docx-embedded-step3.xlsx
	      |_ Lorem-winlogon-spoolsy-pptx.docx
	         |_ winlogon.exe
	         |_ spoolsy.exe
	         |_ pptx-plain.pptx
	|_ Lorem-winlogon-spoolsy-pptx.docx.zip
	   |_ Lorem-winlogon-spoolsy-pptx.docx
	      |_ winlogon.exe
	      |_ spoolsy.exe
	      |_ pptx-plain.pptx
	|_ spoolsy.exe


C. An example of standard JSON output:

	./run_yextend -r test_rulesets/BAYSHORE_Officex1.yar -t test_files/Lorem-winlogon.docx.bz2 -j
	[
		{
    		"file_name": "test_files/Lorem-winlogon.docx.bz2",
    		"file_signature_MD5": "26b485701d6a47618a7f915aa7a38045",
    		"file_size": 208940,
    		"scan_results": [
        		{
            		"child_file_name": "word/document.xml",
            		"file_signature_MD5": "9d58972d9a528da89c971597e4aa1844",
            		"parent_file_name": "test_files/Lorem-winlogon.docx",
            		"scan_type": "Yara Scan (Office Open XML)  inside BZIP2 Archive file",
            		"yara_matches_found": false
        		},
        		{
            		"child_file_name": "word/embeddings/oleObject1.bin",
            		"description": "suspicious marco action",
            		"detected offsets": [
                		"0x12231:$a02",
                		"0x135c5:$a03",
                		"0x55dc7:$a03",
                		"0x5f1e2:$a03",
                		"0x55db9:$a09",
                		"0x5f1d5:$a09",
                		"0x55e31:$a12",
                		"0x1037d:$a15"
            		],
            		"file_signature_MD5": "27250593fccbfb679320a74be9459d17",
            		"hit_count": "8",
            		"parent_file_name": "test_files/Lorem-winlogon.docx",
            		"scan_type": "Yara Scan (Microsoft Office document (DOC PPT XLS)) embedded in an Office Open XML file",
            		"yara_matches_found": true,
            		"yara_rule_id": "maldoc_suspicious_strings"
        		},
        		{
            		"child_file_name": "word/embeddings/oleObject1.bin",
            		"file_signature_MD5": "6674f1535a94304e58f26d06f348190d",
            		"parent_file_name": "test_files/Lorem-winlogon.docx",
            		"scan_type": "Yara Scan (Office Open XML)  inside BZIP2 Archive file",
            		"yara_matches_found": false
        		}
    		],
    		"yara_matches_found": true,
    		"yara_ruleset_file_name": "test_rulesets/BAYSHORE_Officex1.yar"
		}
	]


D. An example of advanced JSON output:

         ./run_yextend -r test_rulesets/BAYSHORE_Officex1.yar -t test_files/Lorem-winlogon.docx.bz2 -j -a
         [
             {
                 "yara_matches_found": true,
                 "scan_results": [
                     {
                         "yara_matches_found": false,
                         "file_name": "test_files/Lorem-winlogon.docx",
                         "file_size": 210193,
                         "children": [
                             {
                                 "yara_matches_found": false,
                                 "file_size": 29639,
                                 "file_name": "word/document.xml",
                                 "file_signature_MD5": "9d58972d9a528da89c971597e4aa1844",
                                 "scan_type": "Yara Scan (Office Open XML)  inside BZIP2 Archive file"
                             },
                             {
                    	     "yara_matches_found": true,
                                 "file_name": "word/embeddings/oleObject1.bin",
                    	     "scan_type": "Yara Scan (Microsoft Office document (DOC PPT XLS)) embedded in an Office Open XML file",
                                 "yara_results": {
                                     "maldoc_suspicious_strings": {
                                         "description": "suspicious marco action",
                                         "hit_count": "8",
                                         "offsets": [
                                            "0x12231:$a02",
                                            "0x135c5:$a03",
                                            "0x55dc7:$a03",
                                            "0x5f1e2:$a03",
                                            "0x55db9:$a09",
                                            "0x5f1d5:$a09",
                                            "0x55e31:$a12",
                                            "0x1037d:$a15"
                                         ]
                                     }
                                 },
                                 "file_size": 392192,
                                 "file_signature_MD5": "27250593fccbfb679320a74be9459d17"
                             },
                             {
                                 "yara_matches_found": false,
                                 "file_size": 8,
                                 "file_name": "word/embeddings/oleObject1.bin",
                                 "file_signature_MD5": "6674f1535a94304e58f26d06f348190d",
                                 "scan_type": "Yara Scan (Office Open XML)  inside BZIP2 Archive file"
                             }
                         ],
                         "file_signature_MD5": "d8f0fab30eae91687c0d80f8dd08218f"
                     }
                 ],
   	             "file_name": "test_files/Lorem-winlogon.docx.bz2",
   	             "yara_ruleset_file_name": "test_rulesets/BAYSHORE_Officex1.yar",
   	             "file_size": 208940,
   	             "children": [
                    {
           	            "yara_matches_found": false,
           	            "file_size": 210193,
                        "file_name": "test_files/Lorem-winlogon.docx",
                        "file_signature_MD5": "d8f0fab30eae91687c0d80f8dd08218f"
                    }
                ],
                "file_signature_MD5": "26b485701d6a47618a7f915aa7a38045"
            }
        ]