This repository aims to help anyone who wants to study malware.
In the datasets folder you can find three different (compressed) datasets:
- labeled_3131_malwares_json.tar.gz - Contains the scans of 3131 malware samples in JSON format.
- labeled_5414malwares.tar.gz - Contains the scans of 5414 malware samples in a raw format.
- labeled_8074malwares_json.tar.gz - Contains the scans, made with VirusTotal, of 8074 malware samples in JSON format.
Across the three datasets there are 16467 unique samples in total.
Another dataset, with the VirusTotal scans and the associated Cuckoo Sandbox reports for 5351 samples, is available here.
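
Since the datasets are shipped as compressed archives, it can help to look at what an archive contains before extracting it. The following is a minimal Python sketch that uses only the standard `tarfile` module; the helper name `list_members` is hypothetical and not part of this repository, and nothing is assumed about the file layout inside the archives.

```python
import tarfile


def list_members(archive_path):
    """Return the names of the regular files stored in a .tar.gz archive.

    Hypothetical helper for illustration; not one of the repository's tools.
    """
    # "r:gz" opens a gzip-compressed tar archive for reading.
    with tarfile.open(archive_path, "r:gz") as archive:
        # Keep regular files only, skipping directory entries.
        return [member.name for member in archive.getmembers() if member.isfile()]
```

For example, `len(list_members("datasets/labeled_3131_malwares_json.tar.gz"))` would give the number of files in the first archive, assuming the script is run from the repository root. Whether that number matches the sample count depends on how the scans are stored inside the archive.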
In the tools folder you can find the following scripts:
- vtextractor.sh - Downloads the reports associated with a list of MD5s in a raw format.
- jsonconverter.sh - Generates JSON files from raw files using a structure compatible with AVclass.
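
To give an idea of what an AVclass-compatible structure can look like, here is a small Python sketch. It assumes the simplified JSON layout that AVclass accepts, with one object per sample carrying a hash field and an `av_labels` list of `[engine, label]` pairs. The exact output of jsonconverter.sh is not described here, so the field choice, the function names and the engine-to-label input mapping are all assumptions made for illustration.

```python
import json


def to_avclass_record(md5, detections):
    """Build one simplified-JSON style record from an engine -> label mapping.

    Hypothetical helper: the input shape is an assumption, not the raw
    format produced by vtextractor.sh.
    """
    # Engines that reported no label (None or empty string) are skipped.
    av_labels = [
        [engine, label]
        for engine, label in sorted(detections.items())
        if label
    ]
    return {"md5": md5, "av_labels": av_labels}


def to_json_line(record):
    """Serialize a record as a single JSON line."""
    return json.dumps(record, sort_keys=True)
```

Calling `to_json_line(to_avclass_record(md5, detections))` for each sample and writing the results one per line would give a file in this assumed layout.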
GPLv3+