
dataset: cleanup datasets that hit the memcap while loading #10155

Closed
wants to merge 1 commit

Conversation

@norg (Member) commented Jan 12, 2024

Datasets that hit the memcap limit while loading need to be discarded; otherwise the dataset stays loaded with partial data even though the signature is not loaded due to the memcap error.

Ticket: #6678

Make sure these boxes are signed before submitting your Pull Request -- thank you.

Link to redmine ticket:

https://redmine.openinfosecfoundation.org/issues/6678

Describe changes:

  • move the memcap check within the DatasetGet function to ensure a proper cleanup

Provide values to any of the below to override the defaults.

To use a pull request use a branch name like pr/N where N is the
pull request number.

Alternatively, SV_BRANCH may also be a link to an
OISF/suricata-verify pull-request.

SV_REPO=
SV_BRANCH=
SU_REPO=
SU_BRANCH=
LIBHTP_REPO=
LIBHTP_BRANCH=

Datasets that hit the memcap limit while loading need to be discarded;
otherwise the dataset stays loaded with partial data even though the
signature is not loaded due to the memcap error.

Ticket: OISF#6678

codecov bot commented Jan 12, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (1dcf69b) 82.19% compared to head (8900975) 82.08%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10155      +/-   ##
==========================================
- Coverage   82.19%   82.08%   -0.12%     
==========================================
  Files         974      974              
  Lines      271825   271825              
==========================================
- Hits       223416   223115     -301     
- Misses      48409    48710     +301     
Flag             Coverage          Δ
fuzzcorpus       62.66% <33.33%>   (-0.35%) ⬇️
suricata-verify  61.39% <33.33%>   (-0.02%) ⬇️
unittests        62.84% <0.00%>    (ø)

Flags with carried forward coverage won't be shown.

@suricata-qa commented:

Information:

ERROR: QA failed on SURI_TLPW2_autofp_suri_time.

field                        baseline  test  %
SURI_TLPW2_autofp_stats_chk
  .uptime                    181       192   106.08%

Pipeline 17495

@catenacyber (Contributor) commented:

How did you test this?

@norg (Member, Author) commented Jan 15, 2024

> How did you test this?

I generated a dataset with random HTTP URLs and kept adding entries until I triggered the warning, but I was still able to check for those URLs via suricatasc, as described in the redmine ticket.

I will add the example set file to redmine.

@inashivb (Member) left a comment:

Your patch makes sense to me. Thanks! :)
Do you have an idea about how to have such stuff tested in our test suite?

@norg
Copy link
Member Author

norg commented Jan 15, 2024

> Your patch makes sense to me. Thanks! :) Do you have an idea about how to have such stuff tested in our test suite?

We could start with what we have on redmine: one single rule and the dataset file. Start Suricata, check for the error message, and afterwards use the socket to check whether an entry is available (it should not be). I'm not sure whether that is fully covered by the suricata-verify options.
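A suricata-verify test along these lines might look like the sketch below. Everything here is an assumption for illustration: the rule follows the documented per-set `memcap` option of the dataset keyword, but the grepped log message is a placeholder and the shell-check layout is not a verified test in OISF/suricata-verify.

```yaml
# test.rules (sketch): a single rule with a deliberately tiny memcap, e.g.
#   alert http any any -> any any (http.uri; \
#     dataset:isset,urls,type string,load urls.lst,memcap 100; sid:1;)
#
# test.yaml (sketch):
args:
  - -k none
checks:
  # placeholder pattern; the real memcap warning text may differ
  - shell:
      args: grep -c "memcap" suricata.log
      expect: 1
```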

@catenacyber (Contributor) commented:

Thanks for the replies.

Can it be tested with a suricata-verify test? With a rule using a dataset with a small memcap?

@inashivb (Member) commented:

> Thanks for the replies.
>
> Can it be tested with a suricata-verify test? With a rule using a dataset with a small memcap?

My guess is no, because we get error messages in both cases (before and after this patch); the only real test is checking whether the dataset still exists after there has been an error (i.e. the hash cleanup wasn't performed). Since we don't integrate suricatasc in any way with s-v, I'm not very clear on how to add this as a test. 🤔

@catenacyber (Contributor) commented:

> the only real test is checking if the dataset still exists after there has been an error (i.e. the hash cleanup wasn't performed)

Is there a memory leak?

@@ -406,10 +406,6 @@ int DetectDatasetSetup (DetectEngineCtx *de_ctx, Signature *s, const char *rawst
SCLogError("failed to set up dataset '%s'.", name);
return -1;
}
if (set->hash && SC_ATOMIC_GET(set->hash->memcap_reached)) {
A review comment on this hunk (Contributor):

What about detect-datarep.c?

@norg (Member, Author) commented Jan 16, 2024

That should be covered as well, since detect-datarep.c also uses the DatasetGet function and would now benefit from the same handling within src/datasets.c, which is a good additional argument for moving the check as we plan to. :)

So detect-datarep.c was previously missing the memcap-reached check entirely.

Contributor reply:

Looks good indeed.

Should that be reflected in the commit message?

@norg (Member, Author) replied:

I think it's already quite generic to datasets

Contributor reply:

> I think it's already quite generic to datasets

From what I understand, it is different:

  • this commit adds the check for datarep, which did not exist before (this should be testable by suricata-verify)
  • this commit moves the check for dataset so the partially loaded set is freed instead of lingering until final cleanup

If this is correct, I think the commit message should reflect the behavior change for the datarep keyword...

@inashivb (Member) commented:

> Is there a memory leak?

Not exactly. See this code path in master: https://github.com/OISF/suricata/blob/master/src/datasets.c#L752
That list should not be updated when the latest set crosses the memcap limit.
So, as a fix, Andreas adds a check right before that update and frees the recently alloc'd set in https://github.com/OISF/suricata/blob/master/src/datasets.c#L680

@catenacyber (Contributor) left a review:

Thanks for the work

CI : ✅
Code : cool
Commits segmentation : ok
Commit messages : 🟠 This fixes more than the commit message says, as it also completely fixes memcap handling for the datarep keyword
Git ID set : looks fine for me
CLA : ok
Doc update : not needed
Redmine ticket : ok
Rustfmt : not needed
Tests : 🟠 maybe some test can be added (SV test with datarep ? )
Dependencies added: none

@catenacyber (Contributor) left a review:

Could you rebase with an improved commit message, Andreas?

@norg (Member, Author) commented Apr 16, 2024

> Could you rebase with an improved commit message, Andreas?

done via #10858

@norg norg closed this Apr 16, 2024
Labels: none yet. 4 participants.