Metrics for building better malware ground truths

chubbymaggie/STASE

What is STASE?

STASE provides a set of metrics to describe a dataset of malware labels.

Goal:

  • evaluate the properties of malware datasets
  • identify potential bias in experimental studies
  • analyze the decision and classification of antivirus products

Usage

Input: a dataset of labels formatted as a CSV or CSV.GZ file

  • columns: antivirus products
  • rows: malware files
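As an illustration of the layout described above, the following sketch writes a small gzip-compressed CSV with one column per antivirus product and one row per malware file. Several details are assumptions not confirmed by this README: the leading `sample` identifier column, the use of an empty cell for "no detection", and the helper names `write_label_matrix` and `read_label_matrix`. Check `stase.py` for the exact format it expects.

```python
import csv
import gzip


def write_label_matrix(path, av_names, labels_by_file):
    """Write one row per malware file and one column per antivirus product.

    Assumed layout (hypothetical): a header row, then a first column holding
    a file identifier, followed by one label per antivirus product.
    """
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample"] + list(av_names))
        for sample_id, labels in labels_by_file.items():
            # Assumption: an empty cell means the product gave no label.
            writer.writerow([sample_id] + [labels.get(av, "") for av in av_names])


def read_label_matrix(path):
    """Read the compressed CSV back as a list of rows (lists of strings)."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))
```

A file produced this way could then be passed as the first argument to `stase.py`, provided the assumed layout matches what the tool reads.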

Output: metrics introduced in this research paper (soon to be released)

Example:

python3 stase.py sample.csv.gz output.json

{
    "equiponderance": 0.2422919148,
    "equiponderance_idx": 8.0,
    "exclusivity": 0.2626262626,
    "recognition": 0.1051423324,
    "synchronicity": 0.1677210336,
    "genericity": 0.5233236152,
    "uniformity": 0.2926562999,
    "uniformity_idx": 48.0,
    "divergence": 0.7568027211,
    "consensuality": 0.2227891156,
    "resemblance": 0.6406466991,
    "labels": 328.0,
    "apps": 99.0,
    "avs": 66.0
}
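To work with the resulting file programmatically, a minimal sketch such as the one below loads the JSON and separates the dataset sizes from the remaining metrics. The key names are taken from the example output; treating `labels`, `apps`, and `avs` as counts is an assumption, and `load_report` and `split_report` are hypothetical helper names, not part of STASE.

```python
import json

# Assumption: these three keys describe the dataset size rather than a metric.
SIZE_KEYS = ("labels", "apps", "avs")


def load_report(path):
    """Load the JSON file written by stase.py into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def split_report(report):
    """Return (sizes, metrics): integer counts and every other reported value."""
    sizes = {key: int(report[key]) for key in SIZE_KEYS if key in report}
    metrics = {key: value for key, value in report.items() if key not in SIZE_KEYS}
    return sizes, metrics
```

For example, `sizes, metrics = split_report(load_report("output.json"))` gives the counts in `sizes` and values such as `metrics["genericity"]` for further analysis.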

Technical details:

  • implemented in Python 3 (dependencies in requirements.txt)
  • uses multiprocessing for performance
  • shipped with Ouroboros

TODO

  • Handle more input formats and options

Pull requests are welcome!
