Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

detect-file-data: always inspect http body - v2 #9063

Closed
wants to merge 1 commit into from

Conversation

jasonish
Copy link
Member

Run inspection even if the HTTP response body is not done. This
optimization causes file_data inspection to occur after the file has
been pruned.

Ticket: https://redmine.openinfosecfoundation.org/issues/5868

Previous PR:

Changes from last PR:

  • PR is now against master as this is now an issue on master and 6.0.x
  • Just remove the optimization, instead of abort'ing in it

SV_BRANCH=pr/1225

Run inspection even if the HTTP response body is not done. This
optimization causes file_data inspection to occur after the file has
been pruned.

Ticket: OISF#5868
@codecov
Copy link

codecov bot commented Jun 22, 2023

Codecov Report

Merging #9063 (b56f965) into master (643e674) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9063      +/-   ##
==========================================
- Coverage   82.35%   82.33%   -0.02%     
==========================================
  Files         969      969              
  Lines      273655   273643      -12     
==========================================
- Hits       225359   225314      -45     
- Misses      48296    48329      +33     
Flag Coverage Δ
fuzzcorpus 64.57% <100.00%> (-0.01%) ⬇️
suricata-verify 60.61% <100.00%> (-0.05%) ⬇️
unittests 62.91% <100.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@suricata-qa
Copy link

WARNING:

field baseline test %
SURI_TLPW2_single_stats_chk
.uptime 996 1480 148.59%
SURI_TLPW1_stats_chk
.uptime 219 247 112.79%
SURI_TLPR1_stats_chk
.tcp.overlap 1124925 1040224 92.47%

Pipeline 14779

@victorjulien victorjulien self-assigned this Jun 26, 2023
@victorjulien
Copy link
Member

I'm missing an analysis of the issue.

@victorjulien
Copy link
Member

I've done some analysis and what I'm coming to is the following:

there is a fundamental mismatch between how file_data works for HTTP1 and file tracking, as file_data in this case actually inspects the HTTP body, and not the extracted files. For this reason there is a File::flags flag FILE_USE_DETECT that all implementations use, except HTTP1. We've been tracking the body vs file issue here https://redmine.openinfosecfoundation.org/issues/4141

As a result of this mismatch, the file is "pruned" (meaning buffer slides forward) before the file.data inspection happens. The problem is not that the file.data happens too late, nor is the logic in removed in this PR an "optimization".

Ideally we move to File based inspection, but if we can't do that in time (@jlucovsky what is the status of that?) we should make sure the file prune/slide logic doesn't run too early in case of HTTP1.

@jlucovsky
Copy link
Contributor

I've done some analysis and what I'm coming to is the following:

there is a fundamental mismatch between how file_data works for HTTP1 and file tracking, as file_data in this case actually inspects the HTTP body, and not the extracted files. For this reason there is a File::flags flag FILE_USE_DETECT that all implementations use, except HTTP1. We've been tracking the body vs file issue here https://redmine.openinfosecfoundation.org/issues/4141

As a result of this mismatch, the file is "pruned" (meaning buffer slides forward) before the file.data inspection happens. The problem is not that the file.data happens too late, nor is the logic in removed in this PR an "optimization".

Ideally we move to File based inspection, but if we can't do that in time (@jlucovsky what is the status of that?) we should make sure the file prune/slide logic doesn't run too early in case of HTTP1.

Right now it's looking good except one s-v test fails (http-body-inspect). I posted an update to #8785 that includes performance measurements.

@jasonish
Copy link
Member Author

I've done some analysis and what I'm coming to is the following:

there is a fundamental mismatch between how file_data works for HTTP1 and file tracking, as file_data in this case actually inspects the HTTP body, and not the extracted files. For this reason there is a File::flags flag FILE_USE_DETECT that all implementations use, except HTTP1. We've been tracking the body vs file issue here https://redmine.openinfosecfoundation.org/issues/4141

As a result of this mismatch, the file is "pruned" (meaning buffer slides forward) before the file.data inspection happens. The problem is not that the file.data happens too late, nor is the logic in removed in this PR an "optimization".

Ideally we move to File based inspection, but if we can't do that in time (@jlucovsky what is the status of that?) we should make sure the file prune/slide logic doesn't run too early in case of HTTP1.

Understood, so it needs to be fixed from another direction. Any pointers there? I also wonder if we could get a comment on the special logic for HTTP in detect-file-data.c. Its a little interesting that removing that code didn't break anything, at least not in our basic set of tests.

@jasonish
Copy link
Member Author

I've done some analysis and what I'm coming to is the following:
there is a fundamental mismatch between how file_data works for HTTP1 and file tracking, as file_data in this case actually inspects the HTTP body, and not the extracted files. For this reason there is a File::flags flag FILE_USE_DETECT that all implementations use, except HTTP1. We've been tracking the body vs file issue here https://redmine.openinfosecfoundation.org/issues/4141
As a result of this mismatch, the file is "pruned" (meaning buffer slides forward) before the file.data inspection happens. The problem is not that the file.data happens too late, nor is the logic in removed in this PR an "optimization".
Ideally we move to File based inspection, but if we can't do that in time (@jlucovsky what is the status of that?) we should make sure the file prune/slide logic doesn't run too early in case of HTTP1.

Understood, so it needs to be fixed from another direction. Any pointers there? I also wonder if we could get a comment on the special logic for HTTP in detect-file-data.c. Its a little interesting that removing that code didn't break anything, at least not in our basic set of tests.

Forcing

    ff->flags |= FILE_USE_DETECT;

when http1 seems to do the trick as well.

@victorjulien
Copy link
Member

Forcing

    ff->flags |= FILE_USE_DETECT;

when http1 seems to do the trick as well.

This uses File::content_inspected. How is it updated in this case?

@jasonish
Copy link
Member Author

jasonish commented Jun 27, 2023

Forcing

    ff->flags |= FILE_USE_DETECT;

when http1 seems to do the trick as well.

This uses File::content_inspected. How is it updated in this case?

This would apply different sliding logic in FilePruneFile and does fix the tests - both the 0 size file StreamBufferGrow. Setting FILE_STORE works as well. Anyways, I'm missing a component here so have requested pointers.

For the 6.0 fix.

@jasonish
Copy link
Member Author

jasonish commented Jul 3, 2023

#9140

@jasonish jasonish closed this Jul 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
4 participants