MS3 atomic updates of persistent integer variables doesnt work in MT environment #2306

mmelo-yottaa · 2020-05-04T20:40:59Z

Describe the bug

I added the following rule to OWASP CRS v3.0.2 running in Modsecurity 3.

SecRule &ARGS "@eq 0" "id:942001, phase:2,nolog,pass, setvar:'resource.xcount=+1'"

and the following configuration to initialize a resource collection:

SecAction "id:90003,phase:1,nolog,pass,initcol:resource=%{REQUEST_HEADERS:Host}"

Examining LMDB after pushing 1000s of requests that match this rule through modsec shows:

for single threaded modsec, the counter is always correct
for multithreaded modsec, the counter is off by small amount

Which indicates to me the read-modify-write of the counter is not atomic and some threads are very occasionally pushing back an incremented obsolete counter value.

Per https://github.com/SpiderLabs/ModSecurity/wiki/Reference-Manual-(v2.x)#persistent-storage - seems I should be able to atomically update a persistent storage variable in a modsec3 multithreaded environment.

I am running libmodsecurity v3.0.4 with nginx 1.15.8 and v1.0.1 of the nginx modsec connector. libmodsecurity and the LMDB consumer are both built with Version: LMDB 0.9.24: (July 24, 2019)

If I send 1000s of requests to nginx+modsec serially via ' ab -n 5000 -c 1 -H "Host: www.yssltest.com" http://127.0.0.1:80/index.html', the counter is correct after all requests are processed. ab's report matches the number of nginx access log entries, matches the counter in LMDB.

If I use "ab -n 5000 -c 10 -H "Host: www.yssltest.com" (sending requests on 10 concurrent connections) ab's report matches the number of nginx access log entries but the counter in LMDB is behind by a small amount (usually ~5 lost counts for 5000 requests).

The host is a 4 core system, nginx is configured for a master process, 4 worker processes, and 1024 connections.

Logs and dumps

Output of:

DebugLogs (level 9)
AuditLogs
Error logs
If there is a crash, the core dump file.

Notice: Be carefully to not leak any confidential information.

To Reproduce

Steps to reproduce the behavior:

A curl command line that mimics the original request and reproduces the problem. Or a ModSecurity v3 test case.

[e.g: curl "modsec-full/ca/..\..\..\..\..\..\/\etc/\passwd" or issue-394.json]

Expected behavior

I expected the counter in persistent collection in LMDB to match the number of rule hits both in a single thread environment and a multithreaded environment.

Server (please complete the following information):

ModSecurity v3.0.4 with nginx-connector v1.0.1]
nginx-1.15.8
LMDB 0.9.24: (July 24, 2019)
host is CentOS7 7.6.1810

Rule Set (please complete the following information):

OWAS CRS v3.0.2

Additional context

ab output showing 1000 requests:

mike@centos7:~$ ab -n 1000 -c 10 -H "Host: www.waftest.com" http://127.0.0.1:80/index.html
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests

Server Software: nginx/1.15.8
Server Hostname: 127.0.0.1
Server Port: 80

Document Path: /index.html
Document Length: 612 bytes

Concurrency Level: 10
Time taken for tests: 25.736 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 845000 bytes
HTML transferred: 612000 bytes
Requests per second: 38.86 [#/sec] (mean)
Time per request: 257.365 [ms] (mean)
Time per request: 25.736 [ms] (mean, across all concurrent requests)
Transfer rate: 32.06 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 54 256 129.1 220 916
Waiting: 15 166 103.6 130 839
Total: 54 256 129.1 220 916

Percentage of the requests served within a certain time (ms)
50% 220
66% 273
75% 320
80% 348
90% 437
95% 520
98% 596
99% 654
100% 916 (longest request)
mike@centos7:~$

NGINX access logs showing 1000 processed requests, all receiving 200/OK:
mike@centos7:/opt/lmdb_test$ sudo rm /var/log/nginx/access.log
... test runs...
mike@centos7:/opt/lmdb_test$ sudo wc -l /var/log/nginx/access.log
1000 /var/log/nginx/access.log

All lines in access log are "[04/May/2020:16:xx:xx -0400] - www.waftest.com - 127.0.0.1 "GET /index.html HTTP/1.0" 200 "-" "ApacheBench/2.3" - - 0.000"

NGINX error logs:
no entries

Read of LMDB after MT run, counter is off by 1 (it reads 999):

mike@centos7:/opt/ModSecurity$ sudo /opt/lmdb_test/miketest /opt/ModSecurity-nginx/modsec-shared-collections
Environment Info
Map address: 0x7f123b817000
Map size: 10485760
Page size: 4096
Max pages: 2560
Number of pages used: 8
Last transaction ID: 7000
Max readers: 126
Number of readers used: 0
Version: LMDB 0.9.24: (July 24, 2019)
...
key: 0x7f123b819fe4 www.waftest.com::::xcount, data: 0x7f123b819ffd 999

Here is the counter added to CRS REQUEST-942-APPLICATION-ATTACK-SQLI.conf (complete rule file uploaded):
REQUEST-942-APPLICATION-ATTACK-SQLI.conf.txt

SecRule &ARGS "@eq 0" "id:942001,
phase:2,nolog,pass,
setvar:'tx.xstart=%{DURATION}',
setvar:'resource.xcount=+1'"

Collection is initialized in modsecurity.conf (complete modsec conf uploaded):

modsecurity.conf.txt
SecAction "id:90003,phase:1,nolog,pass,initcol:resource=%{REQUEST_HEADERS:Host}"

zimmerle · 2020-05-05T00:12:43Z

Hi @mmelo-yottaa,

We are missing a manual for all the wonders in v3, including the details on the implementation of the collections.

As of v3, the collection backend is the one that is theoretically responsible to handle concurrent access (read and/or write). Therefore, concurrence should be handled on the LMDB, memory, or what else you have selected as a backend. It was made that way because it meant to extend the collections to also be used to external key:value servers (#1139, #1140). Notice that there is a minimal delta from the time that the data is read to the time that the data is written, that is on ModSec and that is likely to be the cause of the discrepancy that you have noticed.

From our experience having a lock in place for "atomicity" may not be a practical approach in a modern server. For an Apache with 4 threads, that is ok, but, image an Nginx with 64+ workers. The lock will be ending up hurting the performance. Looking at that scenario, we have opted to simplify the collections to address possibilities such as collection for your entire data server (remote key:value servers) in the opposite for a collection in a single server or worker.

Usually, the rules that use such increments are made to avoid enumeration (brute force of any flavor); in busy servers, it is acceptable to have the discrepancy as oppose to delivery content slowly to the user. Tell me more about your use case, we may be able to help you with the rules.

Notice: It is possible to reduce the processing time to compute +1 just before save, however, we may have to sacrifice usability (logs) and at the end of the day, won't be a guarantee that discrepancy hasn't occurred. Anyway, tell me more about your use case.

mmelo-yottaa · 2020-05-05T20:16:27Z

Hi Felipe!

What I'm trying to accomplish is to gather metrics from MS3 showing execution times for different categories of CRS rules. I am trying to gather these statistics by origin domain, since we are a service-provider and hence we front-end web traffic for many different origins. To minimize the overhead this metric-gathering could add to MS3 I am looking to create a persistent resource collection that will store only raw data (per-site) - a count and total execution time for each rule category. Processing the data will happen outside the MS3 process.

We also allow our customers to create custom rules that can execute before and/or after the CRS rule sets. The execution-time metrics will allow us to see 1) specific site traffic that might require abnormally long execution times for certain rule subsets and 2) specific site custom rules that take an disproportional amount of execution time. Once we have the metrics we can optimize both rules and client-originated requests.

Another approach to export MS3 metrics to an external collector might be via the modsecurity nginx connector. But from my look at the connector source code it seems NGINX and libmodsecurity are very loosely coupled with respect to being able to push information from libmodsecurity to NGINX for NGINX to process.

Thanks for the info about the atomicity of persistent collections, I dont think it would work for us to collect metrics if we didn't have a high level of confidence the numbers were accurate. It seems LMDB supports transactions (one can 'bracket' calls to mdb_get/mdb_put with txn_begin/txn_commit) but the LMDB API is masked behind MS3's persistent-collection "API". I wouldn't want to direct-access LMDB and potentially contend with MS3 accessing it.

Any further information and/or suggestions are very welcome! Thanks.

zimmerle · 2020-05-11T22:47:17Z

Your suggestion is a very ModSecurity "hackish" way to measure those numbers. I would suggest using something different, something outside ModSecurity. I think Linux instrumentations will be a perfect fit for it, in specific the SystemTap.

Here is a very minimal stap script that may give you an idea for a starting point:
https://github.com/SpiderLabs/ModSecurity-nginx/blob/master/ngx-modsec.stp

With stap it is possible to create more comprehensive scripts that give the power to achieve the numbers that you are looking for, without putting too much overhead in your experiment, by crafting code atop of the rule engine.

Once you have such data captured you can use utilities such as flame graphs to visualize it. A made this script for a talk on nignxconf back in 2015, to use as an example we may be able to develop something fancier. Let me know what do you think.

defanator · 2020-05-12T07:27:08Z

@zimmerle by the way, that script needs some attention after a number of the changes introduced within the library - it doesn't work in its current state. I've noticed that a few weeks ago, haven't had a chance to take a deeper look myself.

zimmerle self-assigned this May 4, 2020

zimmerle added the 3.x Related to ModSecurity version 3.x label May 4, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MS3 atomic updates of persistent integer variables doesnt work in MT environment #2306

MS3 atomic updates of persistent integer variables doesnt work in MT environment #2306

mmelo-yottaa commented May 4, 2020 •

edited

Loading

zimmerle commented May 5, 2020

mmelo-yottaa commented May 5, 2020

zimmerle commented May 11, 2020

defanator commented May 12, 2020

MS3 atomic updates of persistent integer variables doesnt work in MT environment #2306

MS3 atomic updates of persistent integer variables doesnt work in MT environment #2306

Comments

mmelo-yottaa commented May 4, 2020 • edited Loading

zimmerle commented May 5, 2020

mmelo-yottaa commented May 5, 2020

zimmerle commented May 11, 2020

defanator commented May 12, 2020

mmelo-yottaa commented May 4, 2020 •

edited

Loading