IndexError when defining links in nested block elements #584

Closed
smathot opened this issue Oct 2, 2017 · 6 comments
Labels
bug Bug report.

Comments

@smathot

smathot commented Oct 2, 2017

When using nested block elements of the following kind:

<div markdown="1">

[link]: a_link

<div markdown="1">

</div>

</div>

An IndexError occurs:

doc-pelican (0.7)$ python3 test.py 
Markdown 2.6.9
Traceback (most recent call last):
  File "test.py", line 22, in <module>
    print(md.convert(text))
  File "/usr/lib/python3/dist-packages/markdown/__init__.py", line 371, in convert
    root = self.parser.parseDocument(self.lines).getroot()
  File "/usr/lib/python3/dist-packages/markdown/blockparser.py", line 65, in parseDocument
    self.parseChunk(self.root, '\n'.join(lines))
  File "/usr/lib/python3/dist-packages/markdown/blockparser.py", line 80, in parseChunk
    self.parseBlocks(parent, text.split('\n\n'))
  File "/usr/lib/python3/dist-packages/markdown/blockparser.py", line 98, in parseBlocks
    if processor.run(parent, blocks) is not False:
  File "/usr/lib/python3/dist-packages/markdown/extensions/extra.py", line 130, in run
    block = self._process_nests(element, block)
  File "/usr/lib/python3/dist-packages/markdown/extensions/extra.py", line 97, in _process_nests
    self.run(element, block[nest_index[-1][0]:nest_index[-1][1]],  # last
IndexError: list index out of range

This happens only under very specific circumstances (but reliably so). For example, adding an extra blank line below the link definition will fix the error.

Hope this is useful!

@facelessuser
Collaborator

Hmm, I may take a look at this in the next couple of days if I have a chance. Issue doesn't look obvious to me.

@waylan
Member

waylan commented Oct 2, 2017

Hmm, there could be a number of things going on here. Some initial thoughts without verifying anything...

Do you have the extra extension enabled?

I also find it peculiar that the only content within the first <div> is a link reference. Of course, link references get removed from the document, so in the output the <div> would have no content. I realize this may be a minimum test case (which we appreciate receiving, thank you), but this specific scenario would make no sense in the real world. Does the error occur with other content or only a link reference?

Finally, the second div is nested within the first. I don't recall off-hand, but it may not be necessary to use markdown=1 in the nested div. Does the error still occur if that is the case?

@facelessuser
Collaborator

From what I recall, for each level of nested div you want to parse, markdown=1 must be set. Whether or not that contributes to the error yet, I don't know.

@smathot
Author

smathot commented Oct 3, 2017

> Do you have the extra extension enabled?

Yes, that's where the Exception comes from.

> Does the error occur with other content or only a link reference?

This is indeed just a minimal example. I noticed the error in a real Markdown document, and stripped it down to this. You can play around with the script below to find out when the error does, and does not, occur. To me, the pattern is not obvious.

> Finally, the second div is nested within the first. I don't recall off-hand, but it may not be necessary to use markdown=1 in the nested div. Does the error still occur if that is the case?

Removing the markdown=1 from the nested div makes the error go away, but it also prevents the Markdown in the nested div from being parsed!

Here's an executable test script:

#!/usr/bin/env python3
# coding=utf-8

from markdown import Markdown, __version__

print('Markdown %s' % __version__.version)
md = Markdown(extensions=['markdown.extensions.extra'])
text = '''
<div markdown="1">

[link]: a_link

<div markdown="1">

</div>

</div>
'''
print(md.convert(text))
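
To make it easier to see when the error does and does not occur, the script above could be turned into a small harness that runs several variants and records which ones raise. This is only a sketch: `probe`, `VARIANTS`, and `markdown_converter` are hypothetical names introduced here, and the two extra variants are the ones mentioned in this thread (an extra blank line below the link definition, and no markdown="1" on the nested div).

```python
#!/usr/bin/env python3
# coding=utf-8

ORIGINAL = '''
<div markdown="1">

[link]: a_link

<div markdown="1">

</div>

</div>
'''

# Variant: one extra blank line below the link definition.
EXTRA_BLANK_LINE = ORIGINAL.replace('[link]: a_link\n', '[link]: a_link\n\n')

# Variant: drop markdown="1" from the nested (last) opening div only.
_head, _sep, _tail = ORIGINAL.rpartition('<div markdown="1">')
INNER_PLAIN_DIV = _head + '<div>' + _tail

VARIANTS = {
    'original': ORIGINAL,
    'extra_blank_line': EXTRA_BLANK_LINE,
    'inner_plain_div': INNER_PLAIN_DIV,
}


def probe(variants, convert):
    """Call convert() on each variant; map its name to 'ok' or the exception class name."""
    results = {}
    for name, text in variants.items():
        try:
            convert(text)
            results[name] = 'ok'
        except Exception as exc:  # broad on purpose: we only want to record what was raised
            results[name] = type(exc).__name__
    return results


def markdown_converter():
    """Return a convert(text) callable backed by Python-Markdown with the extra extension."""
    # Imported lazily so that probe() itself has no third-party dependency.
    from markdown import Markdown

    def convert(text):
        # A fresh instance per call, so stored references do not leak between variants.
        return Markdown(extensions=['markdown.extensions.extra']).convert(text)

    return convert
```

Calling `print(probe(VARIANTS, markdown_converter()))` prints one entry per variant. Going by the report in this thread, on 2.6.9 only the `original` variant should show `IndexError`; other versions may behave differently.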

@facelessuser
Collaborator

Is it possible you could give a stripped down practical application? It may prompt a quicker fix. The above example, while the simplest case, does not seem like it needs an urgent fix because it seems impractical.

I plan to look into it regardless, but the critical nature of it may affect how quickly I get to it as the above example doesn't seem like a scenario we should run into very often.

@facelessuser
Collaborator

It seems that preserving the line where a reference was (before it was stripped out) prevents bad indexing of the calculated positions of the content (calculated when parsing raw HTML). Essentially, the reference preprocessor can change the tag_data indexing by removing the line on which the reference existed. I'll have a pull request together shortly.
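
The general failure mode can be illustrated with a toy model: positions are recorded in one pass, and a later pass deletes a line, so the recorded positions no longer match the text. The sketch below is not Python-Markdown's actual code; `find_block_spans`, `strip_references`, and `spans_valid` are hypothetical helpers written only to show why leaving a blank line in place of the removed reference keeps earlier indices valid.

```python
import re

# Simplified pattern for a link reference definition such as "[link]: a_link".
REF_RE = re.compile(r'^\[[^\]]+\]:\s*\S+')

DOC = [
    '<div markdown="1">',
    '',
    '[link]: a_link',
    '',
    '<div markdown="1">',
    '',
    '</div>',
    '',
    '</div>',
]


def find_block_spans(lines):
    """Record (start, end) line indices for every <div>...</div> pair."""
    stack, spans = [], []
    for i, line in enumerate(lines):
        if line.startswith('</div'):
            spans.append((stack.pop(), i))
        elif line.startswith('<div'):
            stack.append(i)
    return spans


def strip_references(lines, preserve_line):
    """Remove reference definitions; optionally leave an empty line in their place."""
    out = []
    for line in lines:
        if REF_RE.match(line):
            if preserve_line:
                out.append('')  # keeps every later line at its original index
            continue
        out.append(line)
    return out


def spans_valid(lines, spans):
    """Check that each recorded span still starts and ends on a div tag."""
    for start, end in spans:
        if end >= len(lines):
            return False  # indexing lines[end] here would raise IndexError
        if not (lines[start].startswith('<div') and lines[end].startswith('</div')):
            return False
    return True


spans = find_block_spans(DOC)
print(spans_valid(strip_references(DOC, preserve_line=False), spans))
print(spans_valid(strip_references(DOC, preserve_line=True), spans))
```

In this toy model, dropping the reference line shortens the document by one line, so the span recorded for the outer div points past the end of the list. Replacing the reference with an empty line leaves every recorded index pointing at the same tag as before.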

facelessuser added a commit that referenced this issue Oct 7, 2017
Preserve the line which a reference was on to prevent raw HTML indexing issue. Ref #584.
waylan added the bug (Bug report.) label Nov 28, 2017
waylan closed this as completed in 1de595a Jan 4, 2018