This repository contains a collection of reference data that can be used with the MinHash-based Code Relationship & Investigation Toolkit (MCRIT).
The scope is to cover popular, typically statically linked code that is commonly encountered during binary / malware analysis.
This includes both artefacts introduced by compilers themselves as well as (precompiled) third party libraries that provide access to common algorithms and data structures.
The data found in this repository has been processed with the following tool chain:
- Starting with raw data, typically containing `.LIB` (`.A`) or `.OBJ` (`.O`) files, optionally 7z was used to extract the contents, then lib2smda has been used to instrument IDA Pro to parse these files, extract their code and symbols and finally export them into individual SMDA disassembly files.
- These files are then merged into a single SMDA report, performing deduplication per PicHash and Function Symbol if appropriate.
- Alternatively, `.DLL` and `.EXE` files have been directly processed using SMDA or optionally IDA Pro if `*.PDB` files are available.
- Finally, the SMDA reports have been submitted once into a vanilla installation of MCRIT and the MCRIT export functionality has been used to convert to an immediately usable format.
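The merge-and-deduplicate step can be pictured with a minimal sketch. It assumes that a duplicate is a function sharing both its PicHash and its symbol with one already seen; the record layout (`functions`, `pic_hash`, `symbol`) and the helper name `merge_functions` are hypothetical and are not the actual SMDA report schema.

```python
def merge_functions(reports, dedup_by_symbol=True):
    """Merge function records from several reports, dropping duplicates.

    Hypothetical layout: each report is a dict with a "functions" list,
    and each function is a dict with "pic_hash" and "symbol" keys.
    """
    seen = set()
    merged = []
    for report in reports:
        for func in report["functions"]:
            # Assumption: the dedup key is the PicHash, optionally
            # combined with the function symbol.
            symbol = func.get("symbol") if dedup_by_symbol else None
            key = (func["pic_hash"], symbol)
            if key in seen:
                continue
            seen.add(key)
            merged.append(func)
    return merged


if __name__ == "__main__":
    reports = [
        {"functions": [{"pic_hash": 1, "symbol": "memcpy"},
                       {"pic_hash": 2, "symbol": "strlen"}]},
        {"functions": [{"pic_hash": 1, "symbol": "memcpy"},
                       {"pic_hash": 1, "symbol": "memmove"}]},
    ]
    for func in merge_functions(reports):
        print(func["pic_hash"], func["symbol"])
```

With `dedup_by_symbol=False`, functions with identical PicHash collapse into one entry regardless of their names, which trades symbol coverage for a smaller merged report.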
This repository contains both the final SMDA files and the ready-to-import MCRIT files, which can be imported using Data/Import in MCRITweb or using the CLI.
This repository is intended to grow over time, as we find time to process more of the scattered artefacts from several previous endeavors.
If you feel that something especially relevant is missing, please open an issue and/or provide input data and we will see what we can do.
Compilers
Libraries
Reference code extracted from all files containing precompiled code found in installations for various compiler toolchains.
Many thanks to Daniel Enders for creating these reference binaries during his Master's thesis in 2022!
The source file used to compile these included as many Golang standard library files as possible to create coverage for common functions.
When using these with MCRIT, you probably want to import as few versions as possible, ideally only the most fitting one, as you may otherwise run into performance issues. We noticed that the similarity among Golang library functions can lead to huge candidate clusters in which all functions have to be matched against each other.
Name | Date | Version | MCRIT | SMDA |
---|---|---|---|---|
Golang | 2014-05-05 | 1.2.2 | x86 / x64 | x86 / x64 |
Golang | 2014-06-18 | 1.3 | x86 / x64 | x86 / x64 |
Golang | 2014-12-10 | 1.4 | x86 / x64 | x86 / x64 |
Golang | 2015-08-19 | 1.5 | x86 / x64 | x86 / x64 |
Golang | 2016-02-17 | 1.6 | x86 / x64 | x86 / x64 |
Golang | 2016-08-15 | 1.7 | x86 / x64 | x86 / x64 |
Golang | 2017-02-16 | 1.8 | x86 / x64 | x86 / x64 |
Golang | 2017-08-24 | 1.9 | x86 / x64 | x86 / x64 |
Golang | 2018-02-16 | 1.10 | x86 / x64 | x86 / x64 |
Golang | 2018-08-24 | 1.11 | x86 / x64 | x86 / x64 |
Golang | 2019-02-25 | 1.12 | x86 / x64 | x86 / x64 |
Golang | 2019-09-03 | 1.13 | x86 / x64 | x86 / x64 |
Golang | 2020-02-25 | 1.14 | x86 / x64 | x86 / x64 |
Golang | 2020-08-11 | 1.15 | x86 / x64 | x86 / x64 |
Golang | 2021-02-16 | 1.16 | x86 / x64 | x86 / x64 |
Golang | 2021-08-16 | 1.17 | x86 / x64 | x86 / x64 |
Golang | 2022-03-15 | 1.18 | x86 / x64 | x86 / x64 |
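One possible reading of "the most fitting version" is the newest reference version that is not newer than the Go version of the sample under analysis. The sketch below applies that rule to the versions listed in the table above; the selection rule and the helper names are assumptions for illustration, not part of MCRIT.

```python
# Versions taken from the Golang table in this README.
AVAILABLE = [
    "1.2.2", "1.3", "1.4", "1.5", "1.6", "1.7", "1.8", "1.9", "1.10",
    "1.11", "1.12", "1.13", "1.14", "1.15", "1.16", "1.17", "1.18",
]


def parse_version(text):
    """Turn '1.16.5' into the comparable tuple (1, 16, 5)."""
    return tuple(int(part) for part in text.split("."))


def best_reference(sample_version, available=AVAILABLE):
    """Pick the newest reference version not newer than the sample.

    Falls back to the oldest reference if the sample predates all of them.
    """
    target = parse_version(sample_version)
    candidates = [v for v in available if parse_version(v) <= target]
    if not candidates:
        return min(available, key=parse_version)
    return max(candidates, key=parse_version)


if __name__ == "__main__":
    for sample in ("1.16.5", "1.20", "1.1"):
        print(sample, "->", best_reference(sample))
```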
Having used an installer for the respective version of VS, we crawl its directory structure to discover and process all `*.LIB` and `*.OBJ`, sort them by bitness, and merge the code found into a single file.
Thanks to Check Point Research for processing VS 2015, 2017, 2019, and 2022.
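The crawl-and-sort step described above can be sketched as follows. This is an illustration and not the script actually used: it assumes bitness is read from the COFF `Machine` field (0x014C for x86, 0x8664 for x64, as defined in the PE/COFF specification), and for archives it inspects only the first member that looks like a COFF object. Files it cannot classify, such as anonymous or import-only objects, end up under `unknown`. All function names are hypothetical.

```python
import os
import struct

# COFF "Machine" values from the PE/COFF specification.
MACHINE_BITNESS = {0x014C: "x86", 0x8664: "x64"}
ARCHIVE_MAGIC = b"!<arch>\n"
MEMBER_HEADER_SIZE = 60


def coff_bitness(data):
    """Classify a COFF object by the Machine field in its first two bytes."""
    if len(data) < 2:
        return None
    (machine,) = struct.unpack_from("<H", data, 0)
    return MACHINE_BITNESS.get(machine)


def archive_bitness(data):
    """Return the bitness of the first recognizable COFF member."""
    pos = len(ARCHIVE_MAGIC)
    while pos + MEMBER_HEADER_SIZE <= len(data):
        header = data[pos:pos + MEMBER_HEADER_SIZE]
        name = header[:16].rstrip()
        try:
            size = int(header[48:58].decode("ascii").strip())
        except ValueError:
            return None  # malformed header, give up on this archive
        body = data[pos + MEMBER_HEADER_SIZE:pos + MEMBER_HEADER_SIZE + size]
        # "/" (symbol table) and "//" (long names) are not object files.
        if name not in (b"/", b"//"):
            bitness = coff_bitness(body)
            if bitness:
                return bitness
        # Member data is padded to an even offset.
        pos += MEMBER_HEADER_SIZE + size + (size & 1)
    return None


def sort_by_bitness(root, extensions=(".lib", ".obj", ".a", ".o")):
    """Walk root and bucket matching files into x86 / x64 / unknown."""
    buckets = {"x86": [], "x64": [], "unknown": []}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as handle:
                data = handle.read()
            if data.startswith(ARCHIVE_MAGIC):
                bitness = archive_bitness(data)
            else:
                bitness = coff_bitness(data)
            buckets[bitness or "unknown"].append(path)
    return buckets
```

Calling `sort_by_bitness` on an installation directory returns a dict of path lists, one per bitness, which could then be fed to the per-file processing described in the tool chain.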
Name | Version | MCRIT | SMDA |
---|---|---|---|
VS 6 Express | 8168 | x86 | x86 |
VS 2003 Express | 3077 | x86 | x86 |
VS 2005 Express | 50727 | x86 | x86 |
VS 2008 Express | ----- | x86 | x86 |
VS 2010 Express | 30319 | x86 | x86 |
VS 2012 Express | ----- | x86 / x64 | x86 / x64 |
VS 2013 Express | ----- | x86 / x64 | x86 / x64 |
VS 2015 Pro | ----- | x86 / x64 | x86 / x64 |
VS 2017 Pro | ----- | x86 / x86-MFC / x64 / x64-MFC | x86 / x86-MFC / x64 / x64-MFC |
VS 2019 Pro | ----- | x86 / x86-MFC / x64 / x64-MFC | x86 / x86-MFC / x64 / x64-MFC |
VS 2022 Pro | ----- | x86 / x86-MFC / x64 / x64-MFC | x86 / x86-MFC / x64 / x64-MFC |
Having used an installer for the Windows version of a MinGW release, we crawl its directory structure to discover and process all `*.A` and `*.O`, sort them by bitness, and merge the code found into a single file.
Name | Date | Version | MCRIT | SMDA |
---|---|---|---|---|
MinGW r1 | XXXX-XX-XX | - | x86 / x64 | x86 / x64 |
MinGW r2 | XXXX-XX-XX | - | x86 / x64 | x86 / x64 |
MinGW r3 | 2012-07-14 | trunk_r5214 gcc4.7.1 binutils cvs-20120714 | x86 / x64 | x86 / x64 |
MinGW r4 | 2012-10-27 | v2.0.7 gcc4.7.2 binutils2.23 | x86 / x64 | x86 / x64 |
MinGW r5 | 2012-11-04 | v2.0.7 gcc4.7.2 binutils2.23 | x86 / x64 | x86 / x64 |
MinGW r6 | 2013-04-13 | v2.0.8 gcc4.7.3 binutils2.23.2 | x86 / x64 | x86 / x64 |
MinGW r7 | 2013-04-13 | trunk_r5784 gcc4.8.0 binutils2.23.2 | x86 / x64 | x86 / x64 |
MinGW r8 | 2013-06-01 | trunk_r5876 gcc4.8.1 binutils2.23.2 | x86 / x64 | x86 / x64 |
MinGW r9 | - | - | x86 / x64 | x86 / x64 |
MinGW r10 | 2013-11-17 | v3.0.0 gcc4.8.2 binutils2.23.2 | x86 / x64 | x86 / x64 |
MinGW r11 | 2014-05-22 | v3.1.0 gcc4.8.3 binutils2.24 | x86 / x64 | x86 / x64 |
MinGW r12 | 2014-07-30 | v3.1.0 gcc4.9.1 binutils2.24 | x86 / x64 | x86 / x64 |
MinGW r13 | 2014-11-10 | v3.3.0 gcc4.9.2 binutils2.24 | x86 / x64 | x86 / x64 |
MinGW r14 | 2015-06-30 | v4.0.2 gcc4.9.3 binutils2.25 | x86 / x64 | x86 / x64 |
MinGW r15 | 2015-07-10 | v4.0.2 gcc5.1 binutils2.25 | x86 / x64 | x86 / x64 |
MinGW r16 | 2015-07-21 | v4.0.2 gcc5.2 binutils2.25 | x86 / x64 | x86 / x64 |
MinGW r17 | 2015-12-01 | v4.0.4+ gcc5.2 binutils2.25.1 | x86 / x64 | x86 / x64 |
MinGW r18 | 2015-12-05 | v4.0.4+ gcc5.3 binutils2.25.1 | x86 / x64 | x86 / x64 |
MinGW r19 | 2016-06-14 | v4.0.6 gcc5.4 binutils2.25.1 | x86 / x64 | x86 / x64 |
MinGW r20 | 2016-06-14 | v4.0.6 gcc6.1 binutils2.25.1 | x86 / x64 | x86 / x64 |
MinGW r21 | 2016-09-27 | v4.0.6 gcc6.2 binutils2.27 | x86 / x64 | x86 / x64 |
MinGW r22 | 2016-12-29 | v4.0.6 gcc6.3 binutils2.27 | x86 / x64 | x86 / x64 |
MinGW r23 | - | - | x86 / x64 | x86 / x64 |
MinGW r24 | - | - | x86 / x64 | x86 / x64 |
MinGW r25 | 2017-02-20 | v5.0.1+1 gcc6.3 binutils2.27 | x86 / x64 | x86 / x64 |
MinGW r26 | 2017-06-02 | v5.0.2 gcc7.1 binutils2.28 | x86 / x64 | x86 / x64 |
MinGW r27 | 2017-08-16 | v5.0.2 gcc7.2 binutils2.29 | x86 / x64 | x86 / x64 |
MinGW r28 | 2018-02-07 | v5.0.3 gcc7.3 binutils2.29.1 | x86 / x64 | x86 / x64 |
MinGW r29 | 2018-11-01 | v5.0.4 gcc8.2 binutils2.31.1 | x86 / x64 | x86 / x64 |
MinGW r30 | 2019-02-27 | v6.0.0 gcc8.3 binutils2.31.1 | x86 / x64 | x86 / x64 |
MinGW r31 | 2019-10-14 | v6.0.0 gcc9.2 binutils2.32 | x86 / x64 | x86 / x64 |
MinGW r32 | 2020-04-30 | v7.0.0 gcc9.3 binutils2.34 | x86 / x64 | x86 / x64 |
MinGW r33 | 2021-02-27 | v8.0.0 gcc10.2 binutils2.36.1 | x86 / x64 | x86 / x64 |
MinGW r34 | 2021-07-13 | v8.0.2 gcc10.3 binutils2.36.1 | x86 / x64 | x86 / x64 |
MinGW r35 | 2021-08-15 | v9.0.0 gcc11.2 binutils2.36.1 | x86 / x64 | x86 / x64 |
MinGW r36 | - | - | x86 / x64 | x86 / x64 |
MinGW r37 | 2022-04-26 | v10.0.0 gcc11.3 binutils2.38 | x86 / x64 | x86 / x64 |
MinGW r38 | 2022-08-23 | v10.0.0 gcc12.2 binutils2.39 | x86 / x64 | x86 / x64 |
Thanks to Nim-IDA-FLIRT-Generator by @hunterbr72, we were able to produce object files for Nim, which we could then turn into MCRIT symbols.
Name | Version | MCRIT | SMDA |
---|---|---|---|
Nim | 1.2.10 | x86 / x64 | x86 / x64 |
Nim | 1.4.8 | x86 / x64 | x86 / x64 |
Nim | 1.6.14 | x86 / x64 | x86 / x64 |
Ben Herzog wrote a great reverser's guide to Rust and provided some example binaries with full symbols (PDB) that cover different standard library functions.
Name | Date | Version | MCRIT | SMDA |
---|---|---|---|---|
Rust RE-Tour | 2023-06-01 | Rosetta | x86 / x64 | x86 / x64 |
Depending on how the library code is distributed, we extract and convert the code following the methodology outlined above. In some cases, we also processed code found "as-is".
aPLib is a popular compression library implementing an LZ-based algorithm.
Dates are estimates based on file timestamps found in distributed files.
Name | Date | Version | Compiler | MCRIT | SMDA |
---|---|---|---|---|---|
aPLib | 1998-05-03 | 0.12b | as distributed | x86 PE | x86 PE |
aPLib | 1998-09-23 | 0.17b | as distributed | x86 PE | x86 PE |
aPLib | 1998-10-03 | 0.18b | as distributed | x86 PE | x86 PE |
aPLib | 1998-11-05 | 0.19b | as distributed | x86 PE | x86 PE |
aPLib | 1999-01-14 | 0.20b | as distributed | x86 PE | x86 PE |
aPLib | 1999-05-26 | 0.22 | as distributed | x86 PE | x86 PE |
aPLib | 2001-01-24 | 0.26 | as distributed | x86 PE | x86 PE |
aPLib | 2002-04-18 | 0.36 | as distributed | x86 PE / x86 ELF | x86 PE / x86 ELF |
aPLib | 2004-10-16 | 0.42 | as distributed | x86 PE / x86 ELF | x86 PE / x86 ELF |
aPLib | 2005-10-08 | 0.43 | as distributed | x86 PE / x86 ELF | x86 PE / x86 ELF |
aPLib | 2008-06-22 | 0.44 | as distributed | x86 PE / x86 ELF | x86 PE / x86 ELF |
aPLib | 2009-07-29 | 1.01 | as distributed | x86 PE / x86 ELF / x64 PE / x64 ELF | x86 PE / x86 ELF / x64 PE / x64 ELF |
aPLib | 2014-01-20 | 1.10 | as distributed | x86 PE / x86 ELF / x64 PE / x64 ELF | x86 PE / x86 ELF / x64 PE / x64 ELF |
aPLib | 2014-07-21 | 1.11 | as distributed | x86 PE / x86 ELF / x64 PE / x64 ELF | x86 PE / x86 ELF / x64 PE / x64 ELF |
zlib is a popular compression library implementing the Deflate algorithm.
Dates taken from Changelog file / release notes.
Source lib files taken from the Shiftmedia project.
Name | Date | Version | Compiler | MCRIT | SMDA |
---|---|---|---|---|---|
libzlib | 2013-04-28 | 1.2.8 | MSVC12 | x86 PE / x64 PE | x86 PE / x64 PE |
libzlib | 2013-04-28 | 1.2.8 | MSVC14 | x86 PE / x64 PE | x86 PE / x64 PE |
libzlib | 2016-12-31 | 1.2.9 | MSVC12 | x86 PE / x64 PE | x86 PE / x64 PE |
libzlib | 2016-12-31 | 1.2.9 | MSVC14 | x86 PE / x64 PE | x86 PE / x64 PE |
libzlib | 2017-01-02 | 1.2.10 | MSVC12 | x86 PE / x64 PE | x86 PE / x64 PE |
libzlib | 2017-01-02 | 1.2.10 | MSVC14 | x86 PE / x64 PE | x86 PE / x64 PE |
libzlib | 2017-01-15 | 1.2.11 | MSVC12 | x86 PE / x64 PE | x86 PE / x64 PE |
libzlib | 2017-01-15 | 1.2.11 | MSVC14 | x86 PE / x64 PE | x86 PE / x64 PE |
libzlib | 2017-01-15 | 1.2.11 | MSVC15 | x86 PE / x64 PE | x86 PE / x64 PE |