IndexError when defining links in nested block elements #584

smathot · 2017-10-02T11:21:31Z

When using nested block elements of the following kind:

<div markdown="1">

[link]: a_link

<div markdown="1">

</div>

</div>

An IndexError occurs:

doc-pelican (0.7)$ python3 test.py 
Markdown 2.6.9
Traceback (most recent call last):
  File "test.py", line 22, in <module>
    print(md.convert(text))
  File "/usr/lib/python3/dist-packages/markdown/__init__.py", line 371, in convert
    root = self.parser.parseDocument(self.lines).getroot()
  File "/usr/lib/python3/dist-packages/markdown/blockparser.py", line 65, in parseDocument
    self.parseChunk(self.root, '\n'.join(lines))
  File "/usr/lib/python3/dist-packages/markdown/blockparser.py", line 80, in parseChunk
    self.parseBlocks(parent, text.split('\n\n'))
  File "/usr/lib/python3/dist-packages/markdown/blockparser.py", line 98, in parseBlocks
    if processor.run(parent, blocks) is not False:
  File "/usr/lib/python3/dist-packages/markdown/extensions/extra.py", line 130, in run
    block = self._process_nests(element, block)
  File "/usr/lib/python3/dist-packages/markdown/extensions/extra.py", line 97, in _process_nests
    self.run(element, block[nest_index[-1][0]:nest_index[-1][1]],  # last
IndexError: list index out of range

This happens only under very specific circumstances (but reliably so). For example, adding an extra blank line below the link definition will fix the error.

Hope this is useful!

facelessuser · 2017-10-02T15:46:32Z

Hmm, I may take a look at this in the next couple of days if I have a chance. Issue doesn't look obvious to me.

waylan · 2017-10-02T19:09:59Z

Hmm, there could be a number of things going on here. Some initial thoughts without verifying anything...

Do you have the extra extension enabled?

I also find it peculiar that the only content within the first <div> is a link reference. Of course, link references get removed from the document, so in the output the <div> would have no content. I realize this may be a minimum test case (which we appreciate receiving, thank you), but this specific scenario would make no sense in the real world. Does the error occur with other content or only a link reference?

Finally, the second div is nested within the first. ~~I don't recall off-hand, but it may not be necessary to use markdown=1 in the nested div.~~ Does the error still occur if that is the case?

facelessuser · 2017-10-02T19:23:57Z

From what I recall, for each level of nested div you want to parse, markdown=1 must be set. Whether or not that contributes to the error yet, I don't know.

smathot · 2017-10-03T08:05:37Z

> Do you have the extra extension enabled?

Yes, that's where the Exception comes from.

> Does the error occur with other content or only a link reference?

This is indeed just a minimal example. I noticed the error in a real Markdown document, and stripped it down to this. You can play around with the script below to find out when the error does, and does not, occur. To me, the pattern is not obvious.

> Finally, the second div is nested within the first. I don't recall off-hand, but it may not be necessary to use markdown=1 in the nested div. Does the error still occur if that is the case?

Removing the markdown=1 from the nested div makes the error go away, but it also prevents the Markdown in the nested div from being parsed!

Here's an executable test script:

#!/usr/bin/env python3
# coding=utf-8

from markdown import Markdown, __version__

print('Markdown %s' % __version__.version)
md = Markdown(extensions=['markdown.extensions.extra'])
text = '''
<div markdown="1">

[link]: a_link

<div markdown="1">

</div>

</div>
'''
print(md.convert(text))

facelessuser · 2017-10-03T13:31:27Z

Is it possible you could give a stripped down practical application? It may prompt a quicker fix. The above example, while the simplest case, does not seem like it needs an urgent fix because it seems impractical.

I plan to look into it regardless, but the critical nature of it may affect how quickly I get to it as the above example doesn't seem like a scenario we should run into very often.

facelessuser · 2017-10-07T03:13:55Z

It seems that preserving the line where a reference was (before it was stripped out) prevents bad indexing of the calculated positions of the content (calculated when parsing raw HTML). Essentially, the reference preprocessor can change the tag_data indexing by removing the line on which the reference existed. I'll have a pull request together shortly.
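
To illustrate the kind of indexing problem described here, the following is a minimal sketch in plain Python. It is not the Python-Markdown implementation: the names `record_positions`, `strip_reference`, and `lookup` are hypothetical, and the document is simplified to a list of lines with the blank lines left out. It only shows how positions recorded before a line is removed can go stale, and how leaving a blank line in its place keeps them valid.

```python
# Illustrative sketch only; this is not the Python-Markdown source.
# Every function name below is hypothetical.

DOC = [
    '<div markdown="1">',
    '[link]: a_link',
    '<div markdown="1">',
    '</div>',
    '</div>',
]


def record_positions(lines):
    """Remember the line index of every opening and closing div tag."""
    return [i for i, line in enumerate(lines)
            if line.startswith(('<div', '</div'))]


def strip_reference(lines, keep_blank):
    """Remove link reference definitions, optionally leaving a blank line."""
    out = []
    for line in lines:
        if line.startswith('[') and ']:' in line:
            if keep_blank:
                out.append('')  # placeholder keeps later indexes stable
            continue
        out.append(line)
    return out


def lookup(lines, positions):
    """Fetch the lines at previously recorded positions."""
    return [lines[i] for i in positions]


# Positions are computed before the reference is stripped.
positions = record_positions(DOC)

# Dropping the line shortens the list, so the last recorded index
# no longer exists and the lookup raises IndexError.
try:
    lookup(strip_reference(DOC, keep_blank=False), positions)
    dropped_ok = True
except IndexError:
    dropped_ok = False

# Leaving a blank line keeps the line count unchanged, so every
# recorded position still points at the tag it was recorded for.
preserved = lookup(strip_reference(DOC, keep_blank=True), positions)

print(dropped_ok)
print(preserved)
```

In this simplified model, the failure depends on the reference sitting before the recorded positions, which fits the report that small changes to the input make the error disappear.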

facelessuser added a commit that referenced this issue Oct 7, 2017

Fix raw html reference issue

9b43efb

Preserve the line which a reference was on to prevent raw HTML indexing issue. Ref #584.

facelessuser mentioned this issue Oct 7, 2017

Fix raw html reference issue #585

Merged

jannschu mentioned this issue Nov 15, 2017

Additional paragraph when using Markdown in raw HTML #595

Closed

waylan added the bug label Nov 28, 2017

waylan closed this as completed in 1de595a Jan 4, 2018

waylan mentioned this issue Sep 2, 2020

Refactor HTML Parser #803

Merged
