Documentation of Treeprocessor: run() argument and stripTopLevelTags #949
Ignoring Markdown for a moment, an Conversely, Markdown can contain any number of elements at the root level. Therefore, we create a throwaway As for why However, for Markdown, no I don't expect you will ever find a reason for
I appreciate why the single Responding to your final remark, in what I've written I need to set Removing
This is what we expect to happen. If you are trying to output a complete HTML document, not a document fragment, then that is outside the scope of the Markdown library. I recommend using a templating library for that. Regardless, if the documentation is in error, thanks for the report. I wonder if perhaps the issue has been addressed in the proposed changes in #946.
Regarding the actual issue, the documentation is not wrong from a certain perspective. The documentation refers to an "ElementTree object." In other words, an instance of an object from the ElementTree library. It usually would be an So in the end, while it is not wrong, the documentation should be clarified. And I just checked #946 and it appears that the author of that PR made the same mistake you did. They actually changed it so that it can only be interpreted incorrectly. In any event, I've noted the issue, so it should be resolved there.
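For anyone else tripped up by the naming, here is a minimal standard-library sketch of the distinction being discussed. It uses only `xml.etree.ElementTree`, not Python-Markdown itself, so it shows the two classes side by side rather than what the library actually passes to `run()`.

```python
import xml.etree.ElementTree as ET

# An Element is a single node together with its children.
root = ET.Element("div")
ET.SubElement(root, "p").text = "Hello"

# An ElementTree is a thin document-level wrapper around one root Element.
tree = ET.ElementTree(root)

print(type(root).__name__)     # Element
print(type(tree).__name__)     # ElementTree
print(tree.getroot() is root)  # True

# iselement() distinguishes the two: the wrapper is not itself an element.
print(ET.iselement(root), ET.iselement(tree))  # True False
```

Both are "objects from the ElementTree library", which is where the ambiguity in the wording comes from.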
Ah, I see what you mean! But yes, the distinction between the ElementTree library and that library's As a concrete suggestion, how about (for the Treeprocessors section:
Regarding the other point, I'm no fan of templating libraries (largely because they manipulate XML as text). I initially expected the python-Markdown module to produce an ElementTree as output, rather than a string, or at least have that as an option. At the danger of turning a documentation issue into an enhancement request, it would be very useful if the module could optionally make available the result of To be clear, what I would hope to receive from any parsing library is a parse tree. That python-Markdown can go ahead and serialise it is nice, but an extra. If I were planning to do any more with the parsed result than wrap-and-serialise (which at present I can very conveniently do with the Treeprocessor -- nice design), then the first thing I'd do is to re-parse the library's XML output.
While understandable, that is not practical considering the postprocessors. Without those, which only run on the serialized text, the output would be incomplete. Note that at least one of the postprocessors is solely responsible for avoiding the serializer's escaping. Therefore, you should only ever use the final text as output. We have explored other options. But they all required either using our own custom object or adding custom types to ElementTree and altering the serializer to handle them. This has been discussed in detail elsewhere.
Ah, I see – the parser stashes found literal HTML into the All that said, it is still useful in at least some cases (eg, mine!) to do some tree-manipulation in the Treeprocessor step, so I hope that you won't remove the
It doesn't. We are just using plain text strings delimited with uncommon Unicode characters for our index keys. Those Unicode characters ensure the strings are unique so we can find them easily for replacement after serialization. ElementTree doesn't care as they are just valid text.
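To make the mechanism concrete, here is a rough standard-library sketch of the placeholder idea described above. The delimiters, the `stash:N` key format, and the `store` helper are all hypothetical names chosen for illustration; they are not claimed to match what the library uses internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical delimiters and key format, for illustration only.
STX, ETX = "\u0002", "\u0003"

stash = []  # raw HTML fragments, looked up by list index


def store(raw_html):
    """Stash raw HTML and return a plain-text placeholder key for it."""
    stash.append(raw_html)
    return f"{STX}stash:{len(stash) - 1}{ETX}"


root = ET.Element("div")
p = ET.SubElement(root, "p")
p.text = store("<b>raw & unescaped</b>")

# To ElementTree the placeholder is ordinary text, so nothing in it is escaped.
serialized = ET.tostring(root, encoding="unicode")

# The swap happens on the serialized string, after the tree is gone.
for i, raw in enumerate(stash):
    serialized = serialized.replace(f"{STX}stash:{i}{ETX}", raw)

print(serialized)  # <div><p><b>raw & unescaped</b></p></div>
```

Had the raw HTML been placed in `p.text` directly, the serializer would have escaped `<`, `>` and `&`, which is why the final string, not the tree, is the complete output.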
Hmmm... @nxg, would a post processor work for you? Can you tell me more about what you are trying to accomplish? I'll figure out how to document the
Hello @merriam. I'm doing something very simple: I'm just converting Markdown to standalone XHTML. I certainly could achieve this by printing out unbalanced XML fragments before and after the string I get from python-Markdown (I'd feel... dirty, but it wouldn't kill me). I was very pleased to discover the Treeprocessor in the python-Markdown docs (this is the first time I've used this library). That way, I can do essentially all the application processing inside a Treeprocessor (this processing is almost trivial in my present case: simply wrapping the This is not a complaint; I'm describing only how I use the library, and not suggesting that you should change things. Python-Markdown is advertised as a markdown-to-serialised-string library, rather than a markdown-to-parse-tree library, and its design decisions are in the service of that goal (I'm not a particular fan of ElementTree, by the way – any parse-tree would do). But that's OK: it does what I want in this case as long as I can set I hope these remarks are helpful.
Actually that has no bearing on things. As mentioned previously, the placeholders are just plain text and they are swapped out after the tree has been serialized to a string. In fact, we used to use ElementTree's built-in HTML serializer. However, there were a few instances where the general purpose serializer needed to be tweaked. In the end, it was easier to maintain our own simplified fork.
One documentation glitch; one possible omission:
(a) The Treeprocessor documentation says that the `run(root)` method takes an ElementTree as argument. It appears (based on `print("root={}".format(root))`) that it is in fact given an Element (specifically a single `<div>` element). It also seems to demand an Element is returned, and not an ElementTree as documented.
(b) Within `core.py`, the Element returned from `treeprocessor.run()` (is first serialised and then) has its `doc_tag` removed if `self.stripTopLevelTags` is truthy. This `self.stripTopLevelTags` flag is not documented as far as I can see (googling for `site:python-markdown.github.io striptopleveltags` returns nothing), but setting it to False is essential if one wants to have a Treeprocessor restructure the tree.
I am of course reluctant to rely on any undocumented functionality.
(My motivation here was to have a Treeprocessor wrap the parsed tree in a html/body element: I thought I was getting something badly wrong until I found `self.stripTopLevelTags` and discovered that `core.py` was stripping what I'd added.)
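The effect described in (b) can be reproduced with the standard library alone. The sketch below is a stand-in, not the library's code: `strip_top_level` is a hypothetical helper that assumes the stripping is done on the serialized string by cutting between the first `<div>` and the last `</div>`, and `DOC_TAG` is assumed to be `div` as observed in (a).

```python
import xml.etree.ElementTree as ET

DOC_TAG = "div"  # assumed name of the throwaway root element


def strip_top_level(output, doc_tag=DOC_TAG):
    """Keep only what lies between the first <doc_tag> and the last </doc_tag>."""
    try:
        start = output.index(f"<{doc_tag}>") + len(doc_tag) + 2
        end = output.rindex(f"</{doc_tag}>")
    except ValueError:
        # No such tag pair: leave the text alone apart from whitespace.
        return output.strip()
    return output[start:end].strip()


# Roughly what a Treeprocessor is handed: a single root Element.
root = ET.Element(DOC_TAG)
ET.SubElement(root, "p").text = "Hello"

# What a wrapping Treeprocessor might return: html/body around that root.
html = ET.Element("html")
body = ET.SubElement(html, "body")
body.append(root)

serialized = ET.tostring(html, encoding="unicode")
print(serialized)                   # <html><body><div><p>Hello</p></div></body></html>
print(strip_top_level(serialized))  # <p>Hello</p>
```

Under these assumptions the html/body wrapper disappears along with the `<div>`, which matches the behaviour reported above when the flag is left truthy.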