Fix SR-13534: fix normalize_html test #22
Conversation
API change: This adds a new exported function in cmark.h. Closes commonmark#330.
This is taken from GitHub's fix: github@66a0836
This commit ports Vicent Marti's fix in cmark-gfm (384cc9d). His commit message follows:

As explained in the previous commit, it is trivial to DoS the CMark parser by generating a document where all the link reference names hash to the same bucket in the hash table. This causes the lookup for each reference to take time linear in the number of references in the document, and with enough link references to look up, the end result is a pathological O(N^2) that causes medium-sized documents to take 5+ minutes to parse.

To avoid this issue, we propose the present commit. Based on the fact that all reference lookup/resolution in a Markdown document is always performed as a last step during the parse process, we've reimplemented reference storage as follows:

1. New references are always inserted at the end of a linked list. This is an O(1) operation and does not check whether an existing (duplicate) reference with the same label already exists in the document.
2. Upon the first call to `cmark_reference_lookup` (when it is expected that no further references will be added to the reference map), the linked list of references is written into a fixed-size array.
3. The fixed-size array can then be efficiently sorted in place in O(n log n). This operation only happens once. The sort is performed in a _stable_ manner to ensure that the earliest link reference in the document always takes precedence, as the spec dictates. To accomplish this, every reference is tagged with a generation number when initially inserted in the linked list.
4. The sorted array is then compacted in O(n). Since it was sorted stably, the first reference for each label is preserved and the duplicates are removed, matching the spec.
5. We can now simply perform a binary search for the current `cmark_reference_lookup` query in O(log n). Any further lookup calls are also O(log n), since the sorted references table only needs to be generated once.
The resulting implementation is notably simple (as it uses standard library builtins `qsort` and `bsearch`), whilst performing better than the fixed size hash table in documents that have a high number of references and never becoming pathological regardless of the input.
This is kivikakk's commit 62166fe in cmark-gfm.
This reverts commit 9f760ce.
This reverts commit c5732b2.
This reverts commit 745b877.
This reverts commit abc45c5.
This reverts commit 54c990d.
This is illegal according to the C standard, sec. 7.1.4. "If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after promotion) not expected by a function with variable number of arguments, the behavior is undefined." 7.24.1(2): "Where an argument declared as size_t n specifies the length of the array for a function, n can have the value zero […] pointer arguments on such a call shall still have valid values, as described in 7.1.4." See https://www.imperialviolet.org/2016/06/26/nonnull.html
in accordance with spec change.
Otherwise we can get quadratic increase in size with deeply nested structures. See commonmark#355.
We were having failures on some CI machines.
cc @akyrtzi for review
Thank you for looking into this! 🙏 But such fixes should go to upstream.
Force-pushed from 7287cde to 381fb3b
I'll close this PR and open another one against upstream.
PR against upstream: commonmark#360
Jira Link:
https://bugs.swift.org/browse/SR-13534
Description:
The cause of the hang is unknown, but this issue in CTest might somehow be related. The aim of these changes is to avoid using doctest.