New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Curl_fnmatch takes too long time on complicated pattern #2291

Closed
bagder opened this Issue Feb 6, 2018 · 7 comments

Comments

Projects
None yet
2 participants
@bagder
Member

bagder commented Feb 6, 2018

I did this

Run test case 1307 with the attached patch applied. This is OSS-fuzz issue 5908 (still not open to the public).

Patch: 0001-unit1307-add-test-that-takes-too-long-time-to-execut.patch

I expected the following

The test should execute swiftly and return error/success.

This happened

The execution of unit1307 takes many seconds. On my 3.5GHz machine, it takes more than 7 seconds. On slower hardware it can take much more time.

curl/libcurl version

Both 7.58.0 and git master work like this

operating system

OS agnostic.

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Feb 6, 2018

Member

ping @monnerat, would you mind taking a look? You fixed it up recently, maybe you have a good idea on how this can be improved...

Member

bagder commented Feb 6, 2018

ping @monnerat, would you mind taking a look? You fixed it up recently, maybe you have a good idea on how this can be improved...

@monnerat

This comment has been minimized.

Show comment
Hide comment
@monnerat

monnerat Feb 6, 2018

Collaborator

would you mind taking a look?

I'm quite busy right now, but I'll do some profiling on it ASAP and report here.
I can already imagine this is probably caused by backtracking for the * processing (even if you have limited occurrences to 5). Code profiling will tell us more.

Your attached patch shows empty here... without patch, runtest.pl 1307 takes less than a second here. If not yet to be disclosed, you can send this patch to my perso e-mail :-)

Collaborator

monnerat commented Feb 6, 2018

would you mind taking a look?

I'm quite busy right now, but I'll do some profiling on it ASAP and report here.
I can already imagine this is probably caused by backtracking for the * processing (even if you have limited occurrences to 5). Code profiling will tell us more.

Your attached patch shows empty here... without patch, runtest.pl 1307 takes less than a second here. If not yet to be disclosed, you can send this patch to my perso e-mail :-)

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Feb 6, 2018

Member
curl -L https://github.com/curl/curl/files/1699088/0001-unit1307-add-test-that-takes-too-long-time-to-execut.patch.txt

... works for me!

Member

bagder commented Feb 6, 2018

curl -L https://github.com/curl/curl/files/1699088/0001-unit1307-add-test-that-takes-too-long-time-to-execut.patch.txt

... works for me!

@monnerat

This comment has been minimized.

Show comment
Hide comment
@monnerat

monnerat Feb 7, 2018

Collaborator

... works for me!

Strange: can get it with curl, but FF does not show it :-(

Now that I can see it, I have a straightforward explanation: this is a pathological case!
There are 3 recursivity levels, always failing at the end of string. In addition, the current algorithm tries to match (recursively) the ?s with each string character; * regrouping is not performed because they are separated by the ?s.

I think I can develop a patch for this particular case, but we can think of other pathological cases, like
"*[A-Z]*[A-Z]*[[:blank:]]", "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"

IMHO, the only way to avoid them is to compile the pattern into a Deterministic Finite Automaton first, then apply this DFA to the string. The drawbacks are: bigger code, possible more time consumed in non-pathological cases and possible high memory needs depending on the pattern. By using such a DFA, backtracking is never needed. But maybe this is a bit "overthought" for such an insignifiant function ...?

I'll submit a PR with a fix for the "*?*?*" case.

Collaborator

monnerat commented Feb 7, 2018

... works for me!

Strange: can get it with curl, but FF does not show it :-(

Now that I can see it, I have a straightforward explanation: this is a pathological case!
There are 3 recursivity levels, always failing at the end of string. In addition, the current algorithm tries to match (recursively) the ?s with each string character; * regrouping is not performed because they are separated by the ?s.

I think I can develop a patch for this particular case, but we can think of other pathological cases, like
"*[A-Z]*[A-Z]*[[:blank:]]", "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"

IMHO, the only way to avoid them is to compile the pattern into a Deterministic Finite Automaton first, then apply this DFA to the string. The drawbacks are: bigger code, possible more time consumed in non-pathological cases and possible high memory needs depending on the pattern. By using such a DFA, backtracking is never needed. But maybe this is a bit "overthought" for such an insignifiant function ...?

I'll submit a PR with a fix for the "*?*?*" case.

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Feb 7, 2018

Member

a pathological case

That's certainly not an understatement. The result of the fuzzer throwing crazy input at our APIs.

Member

bagder commented Feb 7, 2018

a pathological case

That's certainly not an understatement. The result of the fuzzer throwing crazy input at our APIs.

@monnerat monnerat closed this in a0984ea Feb 7, 2018

@monnerat

This comment has been minimized.

Show comment
Hide comment
@monnerat

monnerat Feb 7, 2018

Collaborator

I did not include your patch in a0984ea. I let you do it in case you want to refine it.

Collaborator

monnerat commented Feb 7, 2018

I did not include your patch in a0984ea. I let you do it in case you want to refine it.

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Feb 8, 2018

Member

Thanks, yes you did right. I'll see if I can brush it up a little and commit it separately.

Member

bagder commented Feb 8, 2018

Thanks, yes you did right. I'll see if I can brush it up a little and commit it separately.

@lock lock bot locked as resolved and limited conversation to collaborators May 9, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.