-
Notifications
You must be signed in to change notification settings - Fork 886
Fix infinite loop #430 #432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This should fix the remaining corner cases that can cause infinite loops. Previous iterations did not account for scenarios where the “end” index was less than the “start” index. If the “end” index is ever less than or equal to the “start” index, the “end” will be adjusted to to be “start” + 1 allow the full range to be extracted and replaced.
Nice. |
Another option would be: else: # raw html
- if len(items) - right_listindex <= 1: # last element
+ if len(items) - right_listindex <= 1 and right_listindex > i: # last element
right_listindex -= 1 |
That doesn't look like a similar solution. |
Let me rephrase; that doesn't look equivalent. I would have to break down all the cases to see if they were equivalent for sure. |
I believe the only way the |
It's not really a revert at all. If anything, it is doubling down on the same concept. It is actually just increasing the range that 5d91369 was normalizing without the extra variable. |
@ryneeverett You are probably right about the decrement being done improperly. I would need to verify that the issue is because of the decrement, and test out your suggestion to verify that it solves all past problems as well and doesn't cause a regression. If you have evidence that supports it being equivalent, I wouldn't mind moving to it. I just don't have much time right now to look into this further than what I already have. |
Oh, I see. I'm starting to think it just doesn't make much sense any more to assign |
@facelessuser I believe it's equivalent but not better because it requires the confusing |
Ok, here's a suggested diff on this PR: else: # raw html
- if len(items) - right_listindex <= 1: # last element
+ if len(items) == right_listindex: # last element
right_listindex -= 1
- if right_listindex <= i:
+ if right_listindex == i:
right_listindex = i + 1 This avoids the unnecessary decrement, not to mention making the intent a lot clearer. |
Unfortunately, that isn't going to work like you think. The following code: if len(items) == right_listindex: # last element
right_listindex -= 1 This will augment an index range of (2, 3) for an array of length 3 and make it (2,2): >>> [1, 2, 3][2:3]
[3]
>>> [1, 2, 3][2:2]
[] So now we have a regression in the content extraction. HTML tags will go MIA. |
I should add that while your suggestion would actually fix my example above because: if right_listindex == i:
right_listindex = i + 1 would make the adjustment to normalize the end, it would not fix the following example where >>> [1, 2, 3][1:3]
[2, 3]
>>> [1, 2, 3][1:2]
[2] Hope that is a little more clear. |
@ryneeverett I revisited the code this morning. To be honest, the whole conditional decrementing logic is odd. It appears the else: # raw html
- if len(items) - right_listindex <= 1: # last element
- right_listindex -= 1
- if right_listindex <= i:
+ if right_listindex < (i + 1):
right_listindex = i + 1
+ elif right_listindex > (i + 1):
+ right_listindex -= 1
placeholder = self.markdown.htmlStash.store('\n\n'.join(
items[i:right_listindex]))
del items[i:right_listindex] So you see, it appears that What I suspect is that if 'markdown' in attrs.keys():
items[i] = items[i][left_index:] # remove opening tag
placeholder = self.markdown.htmlStash.store_tag(
left_tag, attrs, i + 1, right_listindex + 1)
items.insert(i, placeholder)
if len(items) - right_listindex <= 1: # last nest, no tail
right_listindex -= 1
items[right_listindex] = items[right_listindex][
:-len(right_tag) - 2] # remove closing tag Maybe the real work needs to be done here: def _stringindex_to_listindex(self, stringindex, items):
"""
Same effect as concatenating the strings in items,
finding the character to which stringindex refers in that string,
and returning the index of the item in which that character resides.
"""
items.append('dummy')
i, count = 0, 0
while count <= stringindex:
count += len(items[i])
i += 1
return i - 1 |
The coverage report definitely shows that some parts of this code never get run in the tests. It would be nice to clean that up. If a few of the code branches will never get executed perhaps they could be removed. Or if there are edge cases which they need to be there for, then we should add tests to account for them so we don't break things again later. In any event, the provided PR fixes the known issue, so I'm going to accept it. Thank you @facelessuser for the work and @ryneeverett for reviewing it. If either of you want to clean up this code some more, I'd be happy to accept that as well. |
I think it was a lot more clear what But yes, "a quirk where it becomes too small" is at the bottom of these infinite regressions.
I was able to follow your examples and reasoning, but I believe this behavior would be correct in those circumstances. Can you give a concrete markdown example of when this would be undesirable? There is no such example in the test suite.
Agreed. I'll try to have a look at that some time. |
I believe my current pull request will hang with your changes. I think I even tried it to confirm. We we would still have an infinite loop, as it didn't acquire the full extent of the tag. Basically as I mentioned earlier we need this current logic as shown below or an equivalent (like in this pull request) that achieves the same end result. if right_listindex < (i + 1):
right_listindex = i + 1
elif right_listindex > (i + 1):
right_listindex -= 1 What you suggested is not the same, and in some circumstances will still cause infinite loops as it will insert content over and over before the end without fully removing the end. That is my experience when looking into this issue. |
The tests run fine for me. |
Okay, it didn't hang but the results were very different for me: --- /Users/facelessuser/Desktop/Python-Markdown/tests/extensions/extra/raw-html.html
+++ actual_output.html
@@ -10,23 +10,29 @@
<div name="Example">
<p>The text of the <code>Example</code> element.</p>
<div name="DefaultBlockMode">
-<p>This text gets wrapped in <code>p</code> tags.</p>
+<p>This text gets wrapped in <code>p</code> tags.
+</div></p>
+<p>The tail of the <code>DefaultBlockMode</code> subel</p>
</div>
-<p>The tail of the <code>DefaultBlockMode</code> subelement.</p>
<p name="DefaultSpanMode">
-This text <em>is not</em> wrapped in additional <code>p</code> tags.</p>
-<p>The tail of the <code>DefaultSpanMode</code> subelement.</p>
+This text <em>is not</em> wrapped in additional <code>p</code> tags.
+</p>
+The tail of the <code>DefaultSpanMode</code> subelem</p>
<div name="SpanModeOverride">
This <code>div</code> block is not wrapped in paragraph tags.
-Note: Subelements are not required to have tail text.</div>
+Note: Subelements are not required to have tail text.
+</div>
+hzzhzkh:7</div>
<p name="BlockModeOverride">
-<p>This <code>p</code> block <em>is</em> foolishly wrapped in further paragraph tags.</p>
-</p>
+<p>This <code>p</code> block <em>is</em> foolishly wrapped in further paragraph tags</p>
<p>The tail of the <code>BlockModeOverride</code> subelement.</p>
<div name="RawHtml">
Raw html blocks may also be nested.
</div>
+
+
+</p>
</div>
<p>This text is after the markdown in html.</p>
<div name="issue308">
@@ -39,7 +45,8 @@
Raw html blocks may also be nested.
</div>
-<p>Markdown is <em>still</em> active here.</p>
+Markdown is *still* active here.
+
</div>
<p>Markdown is <em>active again</em> here.</p>
<div> Are you doing something different than just replacing these lines? else: # raw html
- if len(items) - right_listindex <= 1: # last element
+ if len(items) == right_listindex: # last element
right_listindex -= 1
- if right_listindex <= i:
+ if right_listindex == i:
right_listindex = i + 1 Maybe I'm missing something, but this looks very wrong... |
Strange, that's exactly what I'm doing and I get no errors. |
Never mind, you're right, I had one other experimental change that I hadn't removed. Your changes do pass. Interesting. Maybe you're right and my understanding is off. I think we need to stress this portion of the code more to be sure. |
Interestingly, all you need to pass all tests is: else: # raw html
right_listindex = i + 1
placeholder = self.markdown.htmlStash.store('\n\n'.join(
items[i:right_listindex]))
del items[i:right_listindex]
items.insert(i, placeholder) What this says is one of two things, 1: all this extra complexity isn't needed, or 2: we are not testing things nearly well enough. Food for thought...I'm not sure which it is. |
This should fix the remaining corner cases that can cause infinite
loops. Previous iterations did not account for scenarios where the
“end” index was less than the “start” index. If the “end” index is
ever less than or equal to the “start” index, the “end” will be
adjusted to to be “start” + 1 allow the full range to be extracted and
replaced.