GenRex 🦖

GenRex is a tool that generates regular expressions from groups of similar strings, such as the artifacts that samples generate dynamically at run time.

For more information, check out:

Citation

To cite GenRex, please use the following entry for the original paper:

@misc{genrex-2023,
  doi = {10.1109/TrustCom60117.2023.00123},
  url = {...},
  author = {Regeciova, Dominika and Kolar, Dusan},
  title = {GenRex: Leveraging Regular Expressions for Dynamic Malware Detection},
  publisher = {IEEE},
  year = {2023}
}

🐍 The minimum supported Python version is 3.10.

Installation

pip install -U genrex-py
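
As a quick sanity check after installation, the following sketch verifies the interpreter version and looks up the installed distribution using only the standard library. The helper name `installed_version` is hypothetical and not part of GenRex; the distribution name `genrex-py` is taken from the pip command above.

```python
import sys
from importlib import metadata
from typing import Optional

MIN_PYTHON = (3, 10)  # minimum version stated in this README


def installed_version(dist_name: str = "genrex-py") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it is not part of GenRex.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


python_ok = sys.version_info >= MIN_PYTHON
print("Python version OK:", python_ok)
print("genrex-py version:", installed_version() or "not installed")
```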

How to Use

import genrex

# Pre-process the input as Cuckoo-format reports (default: False)
cuckoo_format = False

# When True, GenRex will store the original strings as well,
# when False, GenRex will store the processed strings only
store_original_strings = True

results = genrex.generate(
    cuckoo_format=cuckoo_format,
    store_original_strings=store_original_strings,
    directory="samples",
)

print("Results:")

for result in results:
    print(result)


"""
Results:
Regex: hello[0-9a-f]                               # regular expression
Ngram: hell                                        # common part of strings in cluster
Unique: 6                                          # how many unique strings are in cluster
Min: 3                                             # minimal number of occurrences in samples
Max: 3                                             # maximal number of occurrences in samples
Average: 3.0                                       # average number of occurrences in samples
Resources: ['hello1', 'hello2', ..., 'helloc']     # list of preprocessed strings from cluster
Originals: []                                      # list of original strings from cluster
                                                   # (if store_original_strings is True)
Input type: ''                                     # input types (if defined)
Hashes: ['source1', 'source2']                     # list of sources

['input_type', 'ngram', 'original_regexes', 'originals', 'regex', 'similar_regex', 'similars', 'unique']
"""

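
The Unique, Min, Max and Average fields in the output above can be read as simple counts over the sources that contributed to a cluster. The sketch below reproduces those numbers for a small hand-made input. It rests on two assumptions: that all strings land in a single cluster, and that the occurrence statistics are per-source string counts. This is an interpretation of the field comments, not GenRex's actual implementation, and `cluster_stats` is a hypothetical helper.

```python
# Hypothetical input: strings observed per source, all assumed to
# belong to the same cluster.
observed = {
    "source1": ["helloa", "hellob", "helloc"],
    "source2": ["hello1", "hello2", "hello3"],
}


def cluster_stats(observed):
    """Compute illustrative cluster statistics (assumed semantics)."""
    # Number of cluster strings seen in each source
    counts = [len(strings) for strings in observed.values()]
    # Distinct strings across all sources
    unique = {s for strings in observed.values() for s in strings}
    return {
        "unique": len(unique),
        "min": min(counts),
        "max": max(counts),
        "average": sum(counts) / len(counts),
    }


stats = cluster_stats(observed)
print(stats)
```

Under these assumptions the result has 6 unique strings and a minimum, maximum and average of 3 occurrences per source, which agrees with the sample output.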
import genrex

cuckoo_format = True

# When True, GenRex will store the original strings as well,
# when False, GenRex will store the processed strings only
store_original_strings = False

results = genrex.generate(
    cuckoo_format=cuckoo_format,
    store_original_strings=store_original_strings,
    input_type=genrex.InputType.MUTEX,
    source={
        "source1": ["helloa", "hellob", "helloc"],
        "source2": ["hello1", "hello2", "hello3"],
    }
)

print("Results:")

for result in results:
    res = result.return_printable_dict()
    print("Regex:", res["regex"])            # regular expression
    print("Ngram:", res["ngram"])            # common part of strings in cluster
    print("Unique:", res["unique"])          # how many unique strings are in cluster
    print("Min:", res["min"])                # minimal number of occurrences in samples
    print("Max:", res["max"])                # maximal number of occurrences in samples
    print("Avg:", res["average"])            # average number of occurrences in samples
    print("Resources:", res["resources"])    # list of preprocessed strings from cluster
    print("Type:", res["input_type"])        # input types (if defined)
    print("Hashes:", res["hashes"])          # list of sources

"""
Results:
Regex: (^|\\)hello[0-9a-f]$
Ngram: hell
Unique: 6
Min: 3
Max: 3
Avg: 3.0
Resources: ['helloc', 'hellob', 'hello3', 'hello1', 'helloa', 'hello2']
Type: 'mutex'
Hashes: ['source1', 'source2']
"""
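
To see what the generated mutex expression accepts, the sketch below applies the regex from the output above with Python's standard `re` module. It does not call GenRex. The extra test strings (`Global\hello1`, `hellog`, `xhello1`, `hello1x`) are made up for illustration, and matching with `re.search` is an assumption about how such a regex would typically be applied.

```python
import re

# Regex copied from the sample output above
pattern = re.compile(r"(^|\\)hello[0-9a-f]$")

# The strings supplied in the `source` dictionary
inputs = ["helloa", "hellob", "helloc", "hello1", "hello2", "hello3"]
matched = [s for s in inputs if pattern.search(s)]

# Hypothetical extra strings, not taken from GenRex output
namespaced = "Global\\hello1"                 # a backslash precedes "hello"
rejected = ["hellog", "xhello1", "hello1x"]   # bad character, prefix, suffix

print("All inputs match:", matched == inputs)
print("Namespaced name matches:", bool(pattern.search(namespaced)))
print("Rejected:", [s for s in rejected if not pattern.search(s)])
```

The `(^|\\)` prefix lets the name start either at the beginning of the string or right after a backslash, and the trailing `$` rules out any suffix after the final hexadecimal character.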

How to Develop

Install GenRex in development mode with all necessary dependencies.

make setup

Tests

You can run tests with the following command:

make tests

License

Copyright (c) 2024 Avast Software, licensed under the MIT license. See the LICENSE file for more details.