State Change

ember/features.py: change row variables -2018.10
remove resource directory -2018.10
change script files -2018.10
add 01_extract.py, 02_train.py, 03_predict.py, 04_get_accuracy.py -2018.10
(these scripts refer to ember/__init__.py and ember/features.py)
add utils directory -2018.10
add Test directory -2018.10
add output directory -2018.12
add multiprocess job for extracting features - 2019.01
Failed to develop multiprocess predict. The AI framework's developers do not allow it. - 2019.01

# Reference https://github.com/endgameinc/ember

H. Anderson and P. Roth, "EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models", in ArXiv e-prints, Apr. 2018.

@ARTICLE{2018arXiv180404637A,  
  author = {{Anderson}, H.~S. and {Roth}, P.},  
  title = "{EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models}",  
  journal = {ArXiv e-prints},  
  archivePrefix = "arXiv",  
  eprint = {1804.04637},  
  primaryClass = "cs.CR",  
  keywords = {Computer Science - Cryptography and Security},  
  year = 2018,  
  month = apr,  
  adsurl = {http://adsabs.harvard.edu/abs/2018arXiv180404637A},  
}

Install

Requires Python 3.6.8 or later.

sudo apt install python3-pip

;Create and activate a virtualenv
$ virtualenv env -p python3
$ . ./env/bin/activate

;Install python modules
(env)$ pip3 install -r requirements.txt

Prerequisite

The input file is a CSV that includes the label and has no header row (no column names).
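
A headerless CSV can be read by supplying the column names in code. The sketch below assumes two columns, a sample identifier followed by an integer label. That column order and the names `sample_id` and `label` are hypothetical and are not taken from this repository.

```python
import csv
import io

# Hypothetical column layout: the README only says the CSV has a label and no header.
FIELDNAMES = ["sample_id", "label"]


def read_labels(handle):
    """Map each sample id to its integer label from a headerless CSV."""
    reader = csv.DictReader(handle, fieldnames=FIELDNAMES)
    return {row["sample_id"]: int(row["label"]) for row in reader}


# Made-up rows, used only to show the call.
example = io.StringIO("a.exe,1\nb.exe,0\n")
labels = read_labels(example)
```

Passing `fieldnames` explicitly stops `csv.DictReader` from treating the first data row as a header.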

How to Run

Progress

01_extract.py or 01_extract_multi.py
02_train.py
03_predict.py
04_get_accuracy.py

Detail

Extract features from the training sets.
Running this script creates a jsonl file.

(env) python 01_extract.py -d [TrainSet path] -c [TrainSet label path] -o [output path]
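
A jsonl file holds one JSON object per line, so the output can be inspected with the standard library alone. This is a minimal sketch. The keys in the sample records are made up for illustration and may not match what 01_extract.py writes.

```python
import io
import json


def iter_jsonl(handle):
    """Yield one parsed object per non-empty line of a JSONL stream."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical records; real feature records will have different keys.
sample = io.StringIO('{"sha256": "aa", "label": 1}\n\n{"sha256": "bb", "label": 0}\n')
records = list(iter_jsonl(sample))
```

For a real file, replace the `io.StringIO` object with `open(path, encoding="utf-8")`.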

If you want multiprocessing, use 01_extract_multi.py.
My computer has an Intel i7-8700 and does not use a graphics card.
On it, 01_extract_multi.py is 1500% faster than 01_extract.py.

Note that you must change the number of processors and the number of trainsets in 01_extract_multi.py (lines 82 and 88):

82: pool = multiprocessing.Pool(number of processor)

88: for x in tqdm.tqdm(pool.imap_unordered(extract_unpack, extractor_iterator), total=number of trainsets):

(env) python 01_extract_multi.py -d [TrainSet path] -c [TrainSet label path] -o [output path]
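
The two hand-edited values, the pool size and the progress total, could be derived from the machine and the input list. The sketch below is an assumption about how that might look, not the code in 01_extract_multi.py. `extract_one` is a hypothetical stand-in for the real extraction function, and the progress bar is left out so the example needs only the standard library.

```python
import multiprocessing


def extract_one(path):
    """Hypothetical stand-in for the real per-sample feature extraction."""
    return path, len(path)


def pool_settings(paths):
    """Return (worker count, total items) instead of hard-coding both."""
    workers = max(1, min(multiprocessing.cpu_count(), len(paths)))
    return workers, len(paths)


def extract_all(paths):
    """Run extract_one over paths in parallel; results arrive in any order."""
    workers, _total = pool_settings(paths)
    results = []
    with multiprocessing.Pool(workers) as pool:
        for item in pool.imap_unordered(extract_one, paths):
            results.append(item)
    return results


if __name__ == "__main__":
    print(sorted(extract_all(["a.exe", "bb.exe", "ccc.exe"])))
```

Because `imap_unordered` yields results as workers finish, anything that depends on input order has to sort or key the results afterwards.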

02_train.py

(env) python 02_train.py -d [jsonl path] -o [output path]

03_predict.py

(env) python 03_predict.py -m [model.txt path] -d [testdataset path] -o [output path]

04_get_accuracy.py

(env) python 04_get_accuracy.py -c [result of 03_predict.py path] -l [testdataset label path]
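
Accuracy is the share of predictions that match the true label. The sketch below assumes both inputs map a sample identifier to 0 or 1, which is a guess about the file contents and not the script's documented format. `accuracy` is a hypothetical helper.

```python
def accuracy(predicted, actual):
    """Fraction of samples in `actual` whose predicted label matches."""
    if not actual:
        raise ValueError("no labels to compare against")
    correct = sum(1 for key, label in actual.items() if predicted.get(key) == label)
    return correct / len(actual)


# Made-up values for illustration only.
predicted = {"a.exe": 1, "b.exe": 0, "c.exe": 1, "d.exe": 0}
actual = {"a.exe": 1, "b.exe": 0, "c.exe": 0, "d.exe": 0}
score = accuracy(predicted, actual)
```

A sample that has a label but no prediction counts as wrong here, which keeps the denominator tied to the label file.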

To Do

Pipeline from scikit-learn.
GUI or web UI.
Guide videos.
Automatic settings for 01_extract_multi.py.
K-Fold evaluation

Screenshot of run

01_extract.py

03_predict.py

04_get_accuracy.py
