Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added google/napoleon docstring support to fix #504 #617

Closed
wants to merge 1 commit into from
Closed

Added google/napoleon docstring support to fix #504 #617

wants to merge 1 commit into from

Conversation

wolph
Copy link
Contributor

@wolph wolph commented Aug 13, 2015

No description provided.

@@ -29,11 +29,13 @@
r'\s*:type\s+%s:\s*([^\n]+)', # Sphinx
r'\s*:param\s+(\w+)\s+%s:[^\n]+', # Sphinx param with type
r'\s*@type\s+%s:\s*([^\n]+)', # Epydoc
'\s*%s\s*\((.+?)\):.+', # Google param with type
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't check for Args. Shouldn't we do that? Otherwise it might just much

Also I don't think you need .+.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's basically a context free language which would require to count the parenthesis, since that's not possible in a regular language it seemed like .+? would be the best solution here. We could try everything except for : instead but I'm not sure that will work correctly in all cases.

As for the Args, while not impossible, that would require a fairly large rewrite. The entire _search_param_in_docstr function wouldn't be usable anymore as it's only allowed to match using this regex when Args: precedes it. Or have I missed something in the code that would easily allow for this?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's basically a context free language

I understand that, but you could check it before hand. That makes it a little bit more complicated (in the Python area), but could still work. Obviously modifying regexes would not be enough anymore.

@davidhalter
Copy link
Owner

@wolph If you want this to get merged, you need to invest some time in writing more detailed regexes. Sorry, but there's too many wildcards up there for me.

@wolph
Copy link
Contributor Author

wolph commented Feb 2, 2016

@davidhalter I'm planning to, I'm just fully booked till march. After that I'll have time to work on this issue.

No worries, I haven't forgotten :)

@davidhalter
Copy link
Owner

Cool! :)

@achalddave
Copy link

@wolph Are you still working on this? I can take a look at it if not.

I can play with the regexes, or we could also modify this bit so we can specify the parse method using a function instead of a regex string. However, I do think that adding sphinxdoc as a dependency wouldn't be that bad of an idea as suggested in issue #504 - this syntax is ambiguous so it'd probably be nicer to just let someone else take care of the parsing (otherwise, the set of docstrings that Jedi handles correctly will likely not be the same as the set that Sphinx handles, and people may well be using both).

If we're sticking with regexes, I think there isn't much to improve to this one (apart from possibly replacing .+? with [^\n]+?)

@wolph
Copy link
Contributor Author

wolph commented Mar 3, 2016

@achalddave If you have time to work on it, please do. I have it planned for this month but I'm currently still finishing my backlog of tasks so help is greatly appreciated.

As for a parsing function instead of a regex, that would be the better solution but it might be a bit of a big patch in that case. To properly parse this you would need a stack based approach, although I think a simple version (count the parentheses) could work too and might be enough already for this case.

That regex will definitely fail since the arguments can actually span multiple lines.

@achalddave
Copy link

The arguments can span multiple lines, but I think it would be safe to say that the type will be on the same line? Maybe that's just the way I've used it - then leaving the regex as is is fine.

Even with a stack-based approach, we'd make the assumption that there are no stray open or close parentheses within the type annotation, which is a fair assumption to make, but may be a different assumption from the one that Sphinx makes (and even if we use the same assumptions as them, their assumptions may change). But if adding Sphinx is a nogo, I can try and work on this at some point after mid March.

@wolph
Copy link
Contributor Author

wolph commented Mar 3, 2016

True, that's probably a fair assumption. If it needs to span multiple lines than perhaps it's just too complex ;)

I'm personally in favour of handling this using the native Sphinx code, but it does add an external dependency and this might be part of a private API which could break if it's somehow out of date or is ever changed. That last bit is an assumption but might be worth investigating, is this part of a stable API or not.

@achalddave
Copy link

@davidhalter What're your thoughts? I don't actually know if this regex can actually be made more specific without rejecting valid strings. I'd say for now something simple like this would still be better than no support for Google style docstrings (at least for me :) ).

@davidhalter
Copy link
Owner

As I've mentioned before, I don't like the current regexes. They are too broad.

However, I don't think we can fix this by just improving regexes. We need to parse the docstring in the case of google docstrings! I would recommend to split this logic from the sphinx/epydoc parsing. It's just something different.

Adding a dependency just for this is really not an option, sorry.

@davidhalter
Copy link
Owner

Closing, because it's been open for quite a long time and no progress has been made. Let me know when you're done!

@davidhalter davidhalter closed this Jul 3, 2016
@wolph
Copy link
Contributor Author

wolph commented Jul 3, 2016

Sure, no problem :)

The proposed pull request works good enough for me anyhow

@Erotemic
Copy link

@davidhalter What exactly needs to be done on this branch to make it an acceptable merge? From what I understand, a good solution should be based on parsing rather than using context insensitive regexes. And using a third party library like pyparsing is out of the question?

If pyparsing is not ok, would a more heuristic based approach based on docstring indentation work? My basic idea is to parse a google-style docstring into blocks based on indentations. Ignoring the first line, the code would search for the first line that matches the Args tag in the correct indentation followed by a colon and then where the next non-empty line is indented. All lines in this indentation scope would then be considered part of the Args block. Then each non-empty line at the root of the Args indentation scope would be considered as new args lines. Google style enforces that continuations of lines should be indented one further, so this should isolate each arg line. Once each arg line is isolated a regex could be used to extract the name, type, and description. I believe this should ameliorate concerns about over-matching.

What do you think?

@davidhalter
Copy link
Owner

And using a third party library like pyparsing is out of the question?

For this issue, yes. It's just too small of an improvement for Jedi.

If pyparsing is not ok, would a more heuristic based approach based on docstring indentation work? My basic idea is to parse a google-style docstring into blocks based on indentations.

That's exactly what I would have done. Happy to merge such a pull request.

The only alternative to that is to use the Jedi parser itself. You could create a grammar file for it and somehow use that. It's probably not even too hard (if you understand parsers). However I think that's almost too much work for something that simple. (Not sure though, the parser is pretty good, might be more of an issue to write a tokenizer that fits).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants