Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How do you get the parent node of a node? #177

Closed
Larivact opened this issue Jun 3, 2017 · 17 comments
Closed

How do you get the parent node of a node? #177

Larivact opened this issue Jun 3, 2017 · 17 comments
Assignees
Milestone

Comments

@Larivact
Copy link
Contributor

@Larivact Larivact commented Jun 3, 2017

import mwparserfromhell as mw
tree = mw.parse('prefix {{foo|http://example.com}} bar')
link = tree.filter_external_links()[0]

How do you get the parent node of link?

@Larivact

This comment has been minimized.

Copy link
Contributor Author

@Larivact Larivact commented Jun 3, 2017

tree.get(tree.index(link, True))

I think this should be documented as it isn't obvious.

@Larivact Larivact closed this Jun 3, 2017
@lahwaacz

This comment has been minimized.

Copy link

@lahwaacz lahwaacz commented Jun 3, 2017

That does not get you the parent of the link, but its greatest-great-grandparent 😉 If you make the input little more complex, such as a link in two nested templates, you'll notice the difference.

Actually, mwparserfromhell has the search implemented internally:

>>> import mwparserfromhell as mw
>>> tree = mw.parse('prefix {{foo|{{bar|prefix http://example.com suffix}}}} bar')
>>> link = tree.filter_external_links()[0]
>>> tree._do_strong_search(link, True)[0]
'prefix http://example.com suffix'

Note that unlike your approach, this will not get you the parent Node instance, but the parent Wikicode instance, which can be either top-level (coinciding with your tree variable) or nested inside other nodes.

@Larivact

This comment has been minimized.

Copy link
Contributor Author

@Larivact Larivact commented Jun 3, 2017

But I want the parent Node not the parent Wikicode.

@Larivact Larivact changed the title How do you get the parent of a node? How do you get the parent node of a node? Jun 3, 2017
@Larivact Larivact reopened this Jun 3, 2017
@earwig

This comment has been minimized.

Copy link
Owner

@earwig earwig commented Jun 3, 2017

_do_strong_search is for finding the Wikicode object that contains a node (or group of nodes). There isn't an easy way to tell it to give you the node that contains a Wikicode object.

The bigger question is, what exactly are you trying to accomplish here?

@Larivact

This comment has been minimized.

Copy link
Contributor Author

@Larivact Larivact commented Jun 3, 2017

I am working on a script that detects dead links and automatically inserts the Dead link template.
When a dead link however is an argument of a particular template I want to insert the Dead link template after the parent template instead of after the argument.

I therefore need a way to get the parent node of a node.
I am frankly a bit baffled that there is no straightforward solution for such a natural use case.

@earwig

This comment has been minimized.

Copy link
Owner

@earwig earwig commented Jun 3, 2017

Ah. So what about a case like {{cite web|url=''http://example.com''}}—the "parent node" of the URL is the italic tag, but I assume you would want to get the {{cite web}} template. So maybe you are looking for the parent template of a node?

But how about {{quote|text=blah blah http://example.com blah blah blah}}? I assume here you wouldn't want to get the {{quote}} template because it's not in a designated list of templates that you should be adding {{dead link}} after. Is it more specific to say you are looking for the parent template of a node in a certain list of template names?

There are more bizarre cases, like {{cite web|url={{quote|text=http://example.com}}}}. I imagine here you would want to find {{cite web}} rather than nothing. Again you can see how the logic gets convoluted. You might consider constructs like this one impossible, but you would be surprised what bizarre wikicode people decide to write. Still, it might be useful to ignore cases like this one for your sanity...

So there's a big difference between finding the arbitrary deepest node that contains a particular node and finding the deepest template (in a set of valid template names) that contains a particular node. I think this is what you are trying to do. Yes, you can use the first one to solve the second one, but solving the second straight away might be easier for you.

@Larivact

This comment has been minimized.

Copy link
Contributor Author

@Larivact Larivact commented Jun 3, 2017

The link wouldn't be wrapped in additional nodes. Isn't finding the parent template just iterating trough the parents? If finding the parent template is easier than finding the parent node how do I do it?

@lahwaacz

This comment has been minimized.

Copy link

@lahwaacz lahwaacz commented Jun 3, 2017

Thinking of mwparser integration, what about writing something like get_ancestors to let the users choose which is the correct node for their use case?

@earwig

This comment has been minimized.

Copy link
Owner

@earwig earwig commented Jun 3, 2017

Isn't finding the parent template just iterating trough the parents?

In theory, yes, but the graph structure isn't represented this way so I wouldn't implement it that way.

This is how I would do it:

def find_containing_template(code, link, names):
    """Return the template matching *names* in *code* that contains *link*."""
    for template in code.filter_templates():
        if link in template and template.name.matches(names):
            return template
    return None

Example:

>>> c = mwp.parse("blah blah <ref>{{cite web|url={{quote|text=http://example.com}}}}</ref> blah blah")
>>> names = ["cite web", "cite book", "whatever"]
>>> t = find_containing_template(c, "http://example.com", names)
>>> print t
u'{{cite web|url={{quote|text=http://example.com}}}}'
>>> c.insert_after(t, "{{dead link}}")
>>> c
u'blah blah <ref>{{cite web|url={{quote|text=http://example.com}}}}{{dead link}}</ref> blah blah'
@earwig

This comment has been minimized.

Copy link
Owner

@earwig earwig commented Jun 3, 2017

@lahwaacz That would be useful. We can keep this issue open for the sake of solving that question.

@Larivact

This comment has been minimized.

Copy link
Contributor Author

@Larivact Larivact commented Jun 3, 2017

But that requires iterating through all templates which surely is less efficient than iterating through just the parents. An abstract syntax tree is inherently hierarchical. Implementing it in a way that doesn't let you efficiently access the parent of a node seems like a fundamentally flawed design decision.

@earwig

This comment has been minimized.

Copy link
Owner

@earwig earwig commented Jun 3, 2017

¯\_(ツ)_/¯

Storing ancestor pointers everywhere would make a lot of insertion/grafting operations more tedious and I've never needed them before, so that's why they don't exist. Parsing is much more expensive than iterating over the tree (not to mention network I/O being the limiting factor with most bots) so whatever efficiency savings you get by going from O(n) to O(1) is probably minimal in the grand scheme of things.

Still, you raise a valid concern—the tree structure itself is likely going to change at some point to accommodate #40, so I might add them then. TBD.

@Larivact

This comment has been minimized.

Copy link
Contributor Author

@Larivact Larivact commented Jun 3, 2017

Well thanks for your support I appreciate it. But I don't see how parent pointers would "make a lot of insertion/grafting operations more tedious".

@lahwaacz

This comment has been minimized.

Copy link

@lahwaacz lahwaacz commented Jun 3, 2017

I have a version of get_ancestors based on Wikicode._get_children, which seems to work:

>>> def get_ancestors(wikicode, obj):
...     for node in wikicode.nodes:
...         for code in node.__children__():
...             for child in code.nodes:
...                 if obj is child:
...                     return [child]
...                 else:
...                     sub = get_ancestors(code, obj)
...                     if sub:
...                         return [node, child] + sub
... 
>>> get_ancestors(tree, link)
['{{foo|{{bar|prefix [http://example.com example] suffix}}}}',
 '{{bar|prefix [http://example.com example] suffix}}',
 '[http://example.com example]']
@earwig

This comment has been minimized.

Copy link
Owner

@earwig earwig commented Jun 3, 2017

Maybe that wasn't clear—I mean internal implementations of insertion/grafting operations, e.g. setters like this, because they would need to update the parent pointer(s) when doing so. I don't think it should change anything for the end user, but it does require going through all the code carefully and adding a bunch of new tests, so it's not a trivial change necessarily.

@Larivact

This comment has been minimized.

Copy link
Contributor Author

@Larivact Larivact commented Jun 3, 2017

@earwig Couldn't the references be property independent?

Your find_containing_template function doesn't work when there are two identical links, it only finds the first occurrence.

earwig added a commit that referenced this issue Jun 3, 2017
@earwig

This comment has been minimized.

Copy link
Owner

@earwig earwig commented Jun 3, 2017

I don't know what you mean by the references being property-independent...

Anyway, the above commit should be an improvement.

I added a modified version of @lahwaacz's get_ancestors function and a get_parent as well. We can solve your original question like this:

>>> import mwparserfromhell as mw
>>> tree = mw.parse('prefix {{foo|http://example.com}} bar')
>>> link = tree.filter_external_links()[0]
>>> tree.get_parent(link)
'{{foo|http://example.com}}'

Here is a fixed version of my find_containing_template function (probably still a little awkward but it does the job):

def find_containing_template(code, link, names):
    """Return the template matching *names* in *code* that contains *link*."""
    for node in reversed(code.get_ancestors(link)):
        if isinstance(node, mwparserfromhell.nodes.Template) and node.name.matches(names):
            return node
    return None

Hope this helps.

@earwig earwig self-assigned this Jun 3, 2017
@earwig earwig added this to the version 0.5 milestone Jun 3, 2017
@earwig earwig added the aspect: tree label Jun 3, 2017
@earwig earwig closed this Jun 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.