Curl_fnmatch takes too long time on complicated pattern #2291

Closed
bagder opened this Issue Feb 6, 2018 · 7 comments

Comments

Projects
None yet
2 participants
@bagder
Member

bagder commented Feb 6, 2018

I did this

Run test case 1307 with the attached patch applied. This is OSS-fuzz issue 5908 (still not open to the public).

Patch: 0001-unit1307-add-test-that-takes-too-long-time-to-execut.patch

I expected the following

The test should execute swiftly and return error/success.

This happened

The execution of unit1307 takes many seconds. On my 3.5GHz machine, it takes more than 7 seconds. On slower hardware it can take much more time.

curl/libcurl version

Both 7.58.0 and git master work like this

operating system

OS agnostic.

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Feb 6, 2018

Member

ping @monnerat, would you mind taking a look? You fixed it up recently, maybe you have a good idea on how this can be improved...

Member

bagder commented Feb 6, 2018

ping @monnerat, would you mind taking a look? You fixed it up recently, maybe you have a good idea on how this can be improved...

@monnerat

This comment has been minimized.

Show comment
Hide comment
@monnerat

monnerat Feb 6, 2018

Collaborator

would you mind taking a look?

I'm quite busy right now, but I'll do some profiling on it ASAP and report here.
I can already imagine this is probably caused by backtracking for the * processing (even if you have limited occurrences to 5). Code profiling will tell us more.

Your attached patch shows empty here... without patch, runtest.pl 1307 takes less than a second here. If not yet to be disclosed, you can send this patch to my perso e-mail :-)

Collaborator

monnerat commented Feb 6, 2018

would you mind taking a look?

I'm quite busy right now, but I'll do some profiling on it ASAP and report here.
I can already imagine this is probably caused by backtracking for the * processing (even if you have limited occurrences to 5). Code profiling will tell us more.

Your attached patch shows empty here... without patch, runtest.pl 1307 takes less than a second here. If not yet to be disclosed, you can send this patch to my perso e-mail :-)

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Feb 6, 2018

Member
curl -L https://github.com/curl/curl/files/1699088/0001-unit1307-add-test-that-takes-too-long-time-to-execut.patch.txt

... works for me!

Member

bagder commented Feb 6, 2018

curl -L https://github.com/curl/curl/files/1699088/0001-unit1307-add-test-that-takes-too-long-time-to-execut.patch.txt

... works for me!

@monnerat

This comment has been minimized.

Show comment
Hide comment
@monnerat

monnerat Feb 7, 2018

Collaborator

... works for me!

Strange: can get it with curl, but FF does not show it :-(

Now that I can see it, I have a straightforward explanation: this is a pathological case!
There are 3 recursivity levels, always failing at the end of string. In addition, the current algorithm tries to match (recursively) the ?s with each string character; * regrouping is not performed because they are separated by the ?s.

I think I can develop a patch for this particular case, but we can think of other pathological cases, like
"*[A-Z]*[A-Z]*[[:blank:]]", "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"

IMHO, the only way to avoid them is to compile the pattern into a Deterministic Finite Automaton first, then apply this DFA to the string. The drawbacks are: bigger code, possible more time consumed in non-pathological cases and possible high memory needs depending on the pattern. By using such a DFA, backtracking is never needed. But maybe this is a bit "overthought" for such an insignifiant function ...?

I'll submit a PR with a fix for the "*?*?*" case.

Collaborator

monnerat commented Feb 7, 2018

... works for me!

Strange: can get it with curl, but FF does not show it :-(

Now that I can see it, I have a straightforward explanation: this is a pathological case!
There are 3 recursivity levels, always failing at the end of string. In addition, the current algorithm tries to match (recursively) the ?s with each string character; * regrouping is not performed because they are separated by the ?s.

I think I can develop a patch for this particular case, but we can think of other pathological cases, like
"*[A-Z]*[A-Z]*[[:blank:]]", "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"

IMHO, the only way to avoid them is to compile the pattern into a Deterministic Finite Automaton first, then apply this DFA to the string. The drawbacks are: bigger code, possible more time consumed in non-pathological cases and possible high memory needs depending on the pattern. By using such a DFA, backtracking is never needed. But maybe this is a bit "overthought" for such an insignifiant function ...?

I'll submit a PR with a fix for the "*?*?*" case.

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Feb 7, 2018

Member

a pathological case

That's certainly not an understatement. The result of the fuzzer throwing crazy input at our APIs.

Member

bagder commented Feb 7, 2018

a pathological case

That's certainly not an understatement. The result of the fuzzer throwing crazy input at our APIs.

@monnerat monnerat closed this in a0984ea Feb 7, 2018

@monnerat

This comment has been minimized.

Show comment
Hide comment
@monnerat

monnerat Feb 7, 2018

Collaborator

I did not include your patch in a0984ea. I let you do it in case you want to refine it.

Collaborator

monnerat commented Feb 7, 2018

I did not include your patch in a0984ea. I let you do it in case you want to refine it.

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Feb 8, 2018

Member

Thanks, yes you did right. I'll see if I can brush it up a little and commit it separately.

Member

bagder commented Feb 8, 2018

Thanks, yes you did right. I'll see if I can brush it up a little and commit it separately.

@curl curl locked as resolved and limited conversation to collaborators May 9, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.