Assemblage is a distributed tool for discovering, generating, and archiving binary corpora. It provides high-quality labeled metadata for building training data for machine learning applications of binary analysis, as well as for other uses such as static and dynamic analysis and reverse engineering.
You can now find our paper on arXiv.
A brief introduction to the APIs is provided at this link, and deployment instructions can be found here.
We include only the subset of binaries for which permissive licenses can be ascertained; please check out our data sheet for details.
For up-to-date information and downloads, please visit the dataset page.
The code in this repository is published under the MIT license.