Skip to content

danielplohmann/mcrit-data

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

35 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MCRIT Reference Data

This repository contains a collection of reference data that can be used with the MinHash-based Code Relationship & Investigation Toolkit (MCRIT).
The scope is to cover popular, typically statically linked code that is commonly encountered during binary / malware analysis. This includes both artefacts introduced by compilers themselves as well as (precompiled) third party libraries that provide access to common algorithms and data structures.

The data found in this repository has been processed with the following tool chain:

  • Starting with raw data, typically containing .LIB (.A) or .OBJ (.O), optionally 7z was used to extract the contents, then lib2smda has been used to instrument IDA Pro to parse these files, extract their code and symbols and finally export them into individual SMDA disassembly files.
  • These files are then merged into a single SMDA report, performing deduplication per PicHash and Function Symbol if appropriate.
  • Alternatively, .DLL and .EXE files have been directly processed using SMDA or optionally IDA Pro if *.PDB files are available.
  • Finally, the SMDA reports have been submitted once into a vanilla installation of MCRIT and the MCRIT export functionality has been used to convert to an immediately usable format.

This repository contains both the final SMDA files and the ready-to-import MCRIT files, which can be imported using Data/Import in MCRITweb or using the CLI.

This repository is intended to grow over time, as we find time to process more of the scattered artefacts from several previous endeavors.

If you feel that something especially relevant is missing, please open an issue and/or provide input data and we will see what we can do.

Compilers

Libraries

Compilers

Reference code extracted from all files containing precompiled code found in installations for various compiler toolchains.

Golang

Many thanks to Daniel Enders for creating these reference binaries during his Master thesis in 2022!
The source file used to compile these included as many Golang standard library files as possible to create coverage for common functions.
When using these with MCRIT, you probably want to have as few as possible / the most fitting version only as you may otherwise run into performance issues. We noticed that the similarity in Golang library functions can lead to huge candidate clusters for which all functions will have to be matched.

Name Date Version MCRIT SMDA
Golang 2014-05-05 1.2.2 x86 / x64 x86 / x64
Golang 2014-06-18 1.3 x86 / x64 x86 / x64
Golang 2014-12-10 1.4 x86 / x64 x86 / x64
Golang 2015-08-19 1.5 x86 / x64 x86 / x64
Golang 2016-02-17 1.6 x86 / x64 x86 / x64
Golang 2016-08-15 1.7 x86 / x64 x86 / x64
Golang 2017-02-16 1.8 x86 / x64 x86 / x64
Golang 2017-08-24 1.9 x86 / x64 x86 / x64
Golang 2018-02-16 1.10 x86 / x64 x86 / x64
Golang 2018-08-24 1.11 x86 / x64 x86 / x64
Golang 2019-02-25 1.12 x86 / x64 x86 / x64
Golang 2019-09-03 1.13 x86 / x64 x86 / x64
Golang 2020-02-25 1.14 x86 / x64 x86 / x64
Golang 2020-08-11 1.15 x86 / x64 x86 / x64
Golang 2021-02-16 1.16 x86 / x64 x86 / x64
Golang 2021-08-16 1.17 x86 / x64 x86 / x64
Golang 2022-03-15 1.18 x86 / x64 x86 / x64

Microsoft Visual Studio

Having used an installer for the respective version of VS, we crawl its directory structure to discover and process all *.LIB and *.OBJ, sort them by bitness, and merge the code found into a single file.
Thanks to Check Point Research for processing VS 2015, 2017, 2019, and 2022.

Name Version MCRIT SMDA
VS 6 Express 8168 x86 x86
VS 2003 Express 3077 x86 x86
VS 2005 Express 50727 x86 x86
VS 2008 Express ----- x86 x86
VS 2010 Express 30319 x86 x86
VS 2012 Express ----- x86 / x64 x86 / x64
VS 2013 Express ----- x86 / x64 x86 / x64
VS 2015 Pro ----- x86 / x64 x86 / x64
VS 2017 Pro ----- x86 / x86-MFC / x64 / x64-MFC x86 / x86-MFC / x64 / x64-MFC
VS 2019 Pro ----- x86 / x86-MFC / x64 / x64-MFC x86 / x86-MFC / x64 / x64-MFC
VS 2022 Pro ----- x86 / x86-MFC / x64 / x64-MFC x86 / x86-MFC / x64 / x64-MFC

MinGW

Having used an installer for the Windows version of a MinGW release, we crawl its directory structure to discover and process all *.A and *.O, sort them by bitness, and merge the code found into a single file.

Name Date Version MCRIT SMDA
MinGW r1 XXXX-XX-XX - x86 / x64 x86 / x64
MinGW r2 XXXX-XX-XX - x86 / x64 x86 / x64
MinGW r3 2012-07-14 trunk_r5214 gcc4.7.1 binutils cvs-20120714 x86 / x64 x86 / x64
MinGW r4 2012-10-27 v2.0.7 gcc4.7.2 binutils2.23 x86 / x64 x86 / x64
MinGW r5 2012-11-04 v2.0.7 gcc4.7.2 binutils2.23 x86 / x64 x86 / x64
MinGW r6 2013-04-13 v2.0.8 gcc4.7.3 binutils2.23.2 x86 / x64 x86 / x64
MinGW r7 2013-04-13 trunk_r5784 gcc4.8.0 binutils2.23.2 x86 / x64 x86 / x64
MinGW r8 2013-06-01 trunk_r5876 gcc4.8.1 binutils2.23.2 x86 / x64 x86 / x64
MinGW r9 - - x86 / x64 x86 / x64
MinGW r10 2013-11-17 v3.0.0 gcc4.8.2 binutils2.23.2 x86 / x64 x86 / x64
MinGW r11 2014-05-22 v3.1.0 gcc4.8.3 binutils2.24 x86 / x64 x86 / x64
MinGW r12 2014-07-30 v3.1.0 gcc4.9.1 binutils2.24 x86 / x64 x86 / x64
MinGW r13 2014-11-10 v3.3.0 gcc4.9.2 binutils2.24 x86 / x64 x86 / x64
MinGW r14 2015-06-30 v4.0.2 gcc4.9.3 binutils2.25 x86 / x64 x86 / x64
MinGW r15 2015-07-10 v4.0.2 gcc5.1 binutils2.25 x86 / x64 x86 / x64
MinGW r16 2015-07-21 v4.0.2 gcc5.2 binutils2.25 x86 / x64 x86 / x64
MinGW r17 2015-12-01 v4.0.4+ gcc5.2 binutils2.25.1 x86 / x64 x86 / x64
MinGW r18 2015-12-05 v4.0.4+ gcc5.3 binutils2.25.1 x86 / x64 x86 / x64
MinGW r19 2016-06-14 v4.0.6 gcc5.4 binutils2.25.1 x86 / x64 x86 / x64
MinGW r20 2016-06-14 v4.0.6 gcc6.1 binutils2.25.1 x86 / x64 x86 / x64
MinGW r21 2016-09-27 v4.0.6 gcc6.2 binutils2.27 x86 / x64 x86 / x64
MinGW r22 2016-12-29 v4.0.6 gcc6.3 binutils2.27 x86 / x64 x86 / x64
MinGW r23 - - x86 / x64 x86 / x64
MinGW r24 - - x86 / x64 x86 / x64
MinGW r25 2017-02-20 v5.0.1+1 gcc6.3 binutils2.27 x86 / x64 x86 / x64
MinGW r26 2017-06-02 v5.0.2 gcc7.1 binutils2.28 x86 / x64 x86 / x64
MinGW r27 2017-08-16 v5.0.2 gcc7.2 binutils2.29 x86 / x64 x86 / x64
MinGW r28 2018-02-07 v5.0.3 gcc7.3 binutils2.29.1 x86 / x64 x86 / x64
MinGW r29 2018-11-01 v5.0.4 gcc8.2 binutils2.31.1 x86 / x64 x86 / x64
MinGW r30 2019-02-27 v6.0.0 gcc8.3 binutils2.31.1 x86 / x64 x86 / x64
MinGW r31 2019-10-14 v6.0.0 gcc9.2 binutils2.32 x86 / x64 x86 / x64
MinGW r32 2020-04-30 v7.0.0 gcc9.3 binutils2.34 x86 / x64 x86 / x64
MinGW r33 2021-02-27 v8.0.0 gcc10.2 binutils2.36.1 x86 / x64 x86 / x64
MinGW r34 2021-07-13 v8.0.2 gcc10.3 binutils2.36.1 x86 / x64 x86 / x64
MinGW r35 2021-08-15 v9.0.0 gcc11.2 binutils2.36.1 x86 / x64 x86 / x64
MinGW r36 - - x86 / x64 x86 / x64
MinGW r37 2022-04-26 v10.0.0 gcc11.3 binutils2.38 x86 / x64 x86 / x64
MinGW r38 2022-08-23 v10.0.0 gcc12.2 binutils2.39 x86 / x64 x86 / x64

Nim

Thanks to Nim-IDA-FLIRT-Generator by @hunterbr72, we were able to produce object files for Nim, which we could then turn into MCRIT symbols.

Name Version MCRIT SMDA
Nim 1.2.10 x86 / x64 x86 / x64
Nim 1.4.8 x86 / x64 x86 / x64
Nim 1.6.14 x86 / x64 x86 / x64

Rust

Ben Herzog wrote a great reverser's guide to Rust and provided some example binaries with full symbols (PDB) and covering different standard library functions.

Name Date Version MCRIT SMDA
Rust RE-Tour 2023-06-01 Rosetta x86 / x64 x86 / x64

Libraries

Depending on how the library code is distributed, we extract and convert code similar to the above outlined methodology. In some cases, we also processed code found "as-is".

aPLib

aPLib is a popular compression library implementing LZ.
Dates are estimates based on file timestamps found in distributed files.

Name Date Version Compiler MCRIT SMDA
aPLib 1998-05-03 0.12b as distributed x86 PE x86 PE
aPLib 1998-09-23 0.17b as distributed x86 PE x86 PE
aPLib 1998-10-03 0.18b as distributed x86 PE x86 PE
aPLib 1998-11-05 0.19b as distributed x86 PE x86 PE
aPLib 1999-01-14 0.20b as distributed x86 PE x86 PE
aPLib 1999-05-26 0.22 as distributed x86 PE x86 PE
aPLib 2001-01-24 0.26 as distributed x86 PE x86 PE
aPLib 2002-04-18 0.36 as distributed x86 PE / x86 ELF x86 PE / x86 ELF
aPLib 2004-10-16 0.42 as distributed x86 PE / x86 ELF x86 PE / x86 ELF
aPLib 2005-10-08 0.43 as distributed x86 PE / x86 ELF x86 PE / x86 ELF
aPLib 2008-06-22 0.44 as distributed x86 PE / x86 ELF x86 PE / x86 ELF
aPLib 2009-07-29 1.01 as distributed x86 PE / x86 ELF / x64 PE / x64 ELF x86 PE / x86 ELF / x64 PE / x64 ELF
aPLib 2014-01-20 1.10 as distributed x86 PE / x86 ELF / x64 PE / x64 ELF x86 PE / x86 ELF / x64 PE / x64 ELF
aPLib 2014-07-21 1.11 as distributed x86 PE / x86 ELF / x64 PE / x64 ELF x86 PE / x86 ELF / x64 PE / x64 ELF

libzlib

zlib is a popular compression library implementing the Deflate algorithm.
Dates taken from Changelog file / release notes.
Source lib files taken from the Shiftmedia project.

Name Date Version Compiler MCRIT SMDA
libzlib 2013-04-28 1.2.8 MSVC12 x86 PE / x64 PE x86 PE / x64 PE
libzlib 2013-04-28 1.2.8 MSVC14 x86 PE / x64 PE x86 PE / x64 PE
libzlib 2016-12-31 1.2.9 MSVC12 x86 PE / x64 PE x86 PE / x64 PE
libzlib 2016-12-31 1.2.9 MSVC14 x86 PE / x64 PE x86 PE / x64 PE
libzlib 2017-01-02 1.2.10 MSVC12 x86 PE / x64 PE x86 PE / x64 PE
libzlib 2017-01-02 1.2.10 MSVC14 x86 PE / x64 PE x86 PE / x64 PE
libzlib 2017-01-15 1.2.11 MSVC12 x86 PE / x64 PE x86 PE / x64 PE
libzlib 2017-01-15 1.2.11 MSVC14 x86 PE / x64 PE x86 PE / x64 PE
libzlib 2017-01-15 1.2.11 MSVC15 x86 PE / x64 PE x86 PE / x64 PE

About

A collection of ready-to-use library code and symbols for the MinHash-based Code Relationship & Investigation Toolkit (MCRIT)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages