Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing Doc Strings for Classes and Modules #340

Closed
spillz opened this issue Nov 3, 2013 · 46 comments
Closed

Missing Doc Strings for Classes and Modules #340

spillz opened this issue Nov 3, 2013 · 46 comments
Labels

Comments

@spillz
Copy link

spillz commented Nov 3, 2013

Using v0.7.0

Maybe I'm missing something obvious, but

import jedi
s = jedi.Script('import jedi\njed',2,3)
c = s.completions()[0]
print c
print c.doc == ''
print jedi.__doc__ == ''

s = jedi.Script('import jedi\njedi.Scr',2,8)
c = s.completions()[0]
print c
print c.doc == ''
print jedi.Script.__doc__ == ''

returns

<Completion: jedi>
True
False
<Completion: Script>
True
False

Is there some way to enable these doc strings?

I want to be able to display doc popups next to completion popup items as per the screenshot below (which currently only works with functions because of this limitation):

codeblocks-jedi-popup

@davidhalter
Copy link
Owner

definitely a bug, it seems to work in some cases, though?

@spillz
Copy link
Author

spillz commented Nov 3, 2013

definitely a bug, it seems to work in some cases, though?

Sometimes - e.g. with xmlrpclib some of the classes in that module have jedi docs. I have yet to see a module have docs.

@spillz
Copy link
Author

spillz commented Nov 7, 2013

Maybe I'm missing something obvious

The obvious thing I was missing it that it is straightforward enough to use follow_definition to retrieve the doc from a class or module (see code below).

At least for a module, that makes some sense because the completion for say "jed" refers to to an import statement ("import jedi") which then refers to the module ("jedi"). I guess this has the benefit of jedi not needing to actually load any modules until the user tries to retrieve module members (e.g. "jedi."). But it is slightly problematic for showing doc strings in an IDE's completion popup because it could take a long time to load the module just to get a doc.

In the case of a class, it looks like jedi tries to return the docstring for the init statement, which is an empty string if there isn't one. Maybe classes and functions should have calldoc in addition to doc properties. calldoc could be a formatted callspec + docstring (using the init member docstring for a class) and doc could be the formatted class or function docstring (without args). Alternatively, just add a classdoc property and leave doc as is.

import jedi
s = jedi.Script('import jedi\njed',2,3)
c = s.completions()[0]
print c.type
print c.follow_definition()[0].raw_doc
print jedi.Script.__doc__ == ''

s = jedi.Script('import jedi\njedi.Scr',2,8)
c = s.completions()[0]
print c.type
print c.follow_definition()[0].raw_doc
print jedi.Script.__doc__ == ''

s = jedi.Script('import xmlrpclib\nxmlrpclib.Marshall',2,18)
c = s.completions()[0]
print c
print c.type
print c.doc
print c.follow_definition()[0].raw_doc
print jedi.Script.__doc__ == ''

@spillz
Copy link
Author

spillz commented Nov 7, 2013

One thing I find confusing:

s = jedi.Script('import jedi\njedi.Scr',2,8)
c = s.completions()[0]
print c.type, c

s = jedi.Script('import xmlrpclib\nxmlrpclib.Marshall',2,18)
c = s.completions()[0]
print c.type, c

outputs

import <Completion: Script>
class <Completion: Marshaller>

Why a class in one case and an import in the other? (Perhaps this explains why some modules have a doc and others don't?)

@davidhalter
Copy link
Owner

Well - that looks like another bug.

I plan to add a new api command called "documentation" anyway. This call would return a documentation object with different docstrings that describe the class - We could then also integrate that into the Completion object.

One of the problems is that we even have an "import" type. But I'm just not sure if we can even get rid of it. If we cannot resolve the import we will still have to write something and that's obviously import. But I will think about this again probably (also speed could be an issue if you think about libraries such as numpy).

Therefore I would be very careful to just call follow_definition on all "import" things, especially if the completion is on imports - because that would mean that Jedi loads all the libraries (that can be a few hundreds if you type import p<tab>). follow_definition is something that should (could) be used only on entries that are visible on the screen - like 10, because it could significantly slow down things. jedi-vim also doesn't use follow_definition yet.

@spillz
Copy link
Author

spillz commented Nov 7, 2013

Therefore I would be very careful to just call follow_definition on all "import" things

Agree, and if someone can get classes to stop reporting themselves as imports then this takes care of the more important docs anyway.

@davidhalter
Copy link
Owner

Agree, and if someone can get classes to stop reporting themselves as imports

This doesn't actually happen. In the above case Script is an import. You're basically importing an import. But I agree, it's confusing :-) This fact makes me seriously reconsider if we shouldn't always follow them.

@ColinDuquesnoy
Copy link
Contributor

This fact makes me seriously reconsider if we shouldn't always follow them.

I am using follow_definition since a few days in pyqode.python. I haven't noticed any slowdowns until now. (I am working with big libraries such as numpy and PyQt4)

@davidhalter
Copy link
Owner

Hmmm. Thank you for the feedback. I also doubt that it's a problem, normally.

It might lead to problems if you're using a lot of imports, like import wx, numpy, PySide, os which would mean that for an empty completion it would follow all of them, which makes this one time significantly slower. But then again this is not a huge issue, because in 99% of the cases all of those libraries will be loaded at some point.

@ColinDuquesnoy
Copy link
Contributor

for an empty completion it would follow all of them

I've just tested that (following all completion objects for import |) and found a bug:

Traceback (most recent call last):
  File "/home/colin/dev/pyqode/core/pyqode/core/server.py", line 305, in _execWorker
    results = worker(conn, caller_id)
  File "/home/colin/dev/pyqode/core/pyqode/core/modes/code_completion.py", line 88, in __call__
    completions.append(prov.complete(*self.__args))
  File "/home/colin/dev/pyqode/python/pyqode/python/modes/code_completion.py", line 220, in complete
    definition = completion.follow_definition()[0]
  File "/home/colin/dev/jedi/jedi/cache.py", line 129, in wrapper
    result = func(self)
  File "/home/colin/dev/jedi/jedi/api/classes.py", line 408, in follow_definition
    defs = [BaseDefinition(self._evaluator, d, d.start_pos) for d in defs]
  File "/home/colin/dev/jedi/jedi/api/classes.py", line 408, in <listcomp>
    defs = [BaseDefinition(self._evaluator, d, d.start_pos) for d in defs]
AttributeError: 'GlobalNamespace' object has no attribute 'start_pos'

@davidhalter
Copy link
Owner

Hmm could you post the exact call to Jedi?

Because jedi.Script('import ').completions() works for me.

@ColinDuquesnoy
Copy link
Contributor

I am following the definition of every import completion

@ColinDuquesnoy
Copy link
Contributor

Here is the code:

import jedi
script = jedi.Script("import ")

for c in script.completions():
    type = c.type
    # follow import to get the real type
    if c.type == 'import':
        try:
            definition = c.follow_definition()[0]
            type = definition.type
        except IndexError:
            # no definition
            pass
    print(type)  # I use `type` to create the associated icon in pyqode

which outputs:

/usr/bin/python2.7 /home/colin/Dev/jedi/test_follow_def.py
Traceback (most recent call last):
  File "/home/colin/Dev/jedi/test_follow_def.py", line 8, in <module>
    definition = c.follow_definition()[0]
  File "/home/colin/Dev/jedi/jedi/cache.py", line 129, in wrapper
    result = func(self)
  File "/home/colin/Dev/jedi/jedi/api/classes.py", line 408, in
follow_definition
    defs = [BaseDefinition(self._evaluator, d, d.start_pos) for d in defs]
AttributeError: 'GlobalNamespace' object has no attribute 'start_pos'

Process finished with exit code 1

@davidhalter
Copy link
Owner

Arghh... This is painful.

Just calling follow_definition can really slow things down, because if you also follow statements, so well this part is easy, you just don't follow statements. That's not really a problem, since people have never complained when we show them the code of a statement as a description.

e.g. displaying foo = 'asdf' as a description is totally fine and it's probably even questionable to see class str or instance str() there.

So we just follow import types... Well... This also kind of sucks, since an import can be a statement again, which makes our API behave inconsistently. However, this happens rarely (most of the time explicit imports are really classes or functions), we should really care about the most typical use cases for Python. It's not like we will ever have perfect autocompletion anyway.

The big problem is that in your above example (though buggy ATM, but let's suppose we've already fixed it) Jedi would import every single f***ing module in the sys.path. That will undoubtedly slow down the process and use like 10 GB RAM.

Uffff... If only there was a consistent solution?! Maybe we should only follow imports if we're not doing autocompletion on imports? Or maybe we should just follow second level imports? Or a combination of those two?

Any ideas @tkf @dbrgn @asmeurer @ColinDuquesnoy?

I just know that this issue is really blocking me right now, as I'm trying to refactor the API to use Definition classes everywhere, which has been a concern for quite a while.

@tkf
Copy link
Contributor

tkf commented Mar 17, 2014

Jedi would import every single f***ing module in the sys.path

This is not required for just listing modules, right? You can just search through sys.path and list them. It is required to import only if you want to show module document. Then why not just import the module only when it is requested (e.g., .doc is accessed). Or editor can just store .full_name and request document using .full_name just before it pops up document.

@ColinDuquesnoy
Copy link
Contributor

The only place where we should follow definition of imports is in the form from XXX import YYY. XXX will always be a package/module while YYY might be a function or class. As a user I would expect follow_definition of XXX to return XXX immediately without any additional parsing, while following YYY would parse the XXX. Am I missing something?

@davidhalter
Copy link
Owner

@ColinDuquesnoy Well following it is not just important for the type attribute, it also defines the docstring.

@ColinDuquesnoy
Copy link
Contributor

But you don't need to parse the entire module just for the docstring, right?

@davidhalter
Copy link
Owner

Well but we need to at least open the module, parse a part and so on. If you happen to have a few hundred modules installed, that's not good. It's even worse if some of those modules have been compiled - you would import a lot of modules.

Plus at the moment the procedure for getting a docstring is by parsing the whole file, because we probably need it anyway.

@tkf
Copy link
Contributor

tkf commented Mar 17, 2014

If I understand correctly from XXX import YYY is not problem here, since in this case we don't need to search all top level modules in sys.path. You need to parse XXX/__init__.py and search *.py files under XXX/ anyway, even if we don't need YYY.__doc__.

@ColinDuquesnoy
Copy link
Contributor

The issue is that Jedi sometimes reports imported classes and functions as imports. At the moment the only way to find out if an import is a class, a function or a module/package is to use follow_definition.

Maybe we should forget about this solution and fix it internally so that user will have the correct completion type directly from calling Script.completions(). You could have an internal function that check import statements before returning the completions?

That being said, I don't know enough of the Jedi internals so I might be completely off.

@davidhalter
Copy link
Owner

@tkf

  1. Jedi works like you're describing it in your first comment anyway. The problem is that IDE developers tend to want that information (docstrings) in all cases, because that's much easier to process (think list comprehensions).
    Your point is a pretty good one, but plugin APIs sometimes just sucks (e.g. it's not possible to display docstrings in VIM script the lazy way).
  2. Yes, exactly (your second comment). The issue is only with import XXX statements.

@tkf
Copy link
Contributor

tkf commented Mar 17, 2014

@davidhalter

The problem is that IDE developers tend to want that information (docstrings) in all cases, because that's much easier to process (think list comprehensions).

Well I think that's their problem ;) We have nice "lazy" attribute .doc so they should not access it until they need it. I think they just have to improve their programming style, using something like promise/deferred. It is just not a good idea to invoke intensive computation/IO (e.g., accessing .doc) when not needed.

e.g. it's not possible to display docstrings in VIM script the lazy way

Really? I've heard VIM got some async libs too.

At the bottom line, you can combine Jedi with multiprocessing to do the job in background and wait for some time to finish. If it is not done until timeout, you just ignore it. This way, you can use Jedi without blocking even if editor does not have async API.

I really don't think it is a good idea to "take care" of IDE developers problem at Jedi side. If you have to do some hard computation (or IO access) to get some value, let's just do it. User may request it and OK to wait for a while. In that case, truncating the search does not help.

But in the end, I think you are the one to program all of this if you want, so I don't want to say too much. I just want to note this point, since nobody mentioning it.

@tkf
Copy link
Contributor

tkf commented Mar 17, 2014

BTW, it is nice if something like .get_doc(timeout=100) can be implemented. I guess it is possible by measuring time at each method call in evaluator class or something like that.

@davidhalter
Copy link
Owner

Good points.

BTW, it is nice if something like .get_doc(timeout=100) can be implemented. I guess it is possible by measuring time at each method call in evaluator class or something like that.

I plan to add some sort of an asynchronous interface to do things like that, but I guess to do that for all methods would be kind of stupid. (Plus I'm really sad that I'm using properties all over, it would have been easy to just say doc(self, evaluate_all=True) to solve the issue above. Well... Maybe I should just create getters and deprecate certain properties.)

But in the end, I think you are the one to program all of this if you want

Haha, that's not exactly true - you're alive and well again, now :-)

And BTW: VIMs async support is horrible. In general VIM script is the spawn of satan. :-)

@tkf
Copy link
Contributor

tkf commented Mar 17, 2014

some sort of an asynchronous interface

I don't think adding timeout option makes it async. Async interface would be something like .async_doc(callback=...) which returns immediately and callback is called with a string as its argument. But since Python has GIL I don't think it helps much. Well, here I am assuming that Jedi does some hard computation as well as IO access. But if it is bounded by IO access, I guess async interface makes sense.

Haha, that's not exactly true - you're alive and well again, now :-)

You got me... well, not in this case though. I don't think we should add complicated way to truncate evaluation in Jedi (except for the explicit timeout stuff) so I don't want to help in this very case ;-)

@spillz
Copy link
Author

spillz commented Mar 18, 2014

A few thoughts:

  • I don't see how you can get around somehow separately caching the top level module docstrings to get doc hints for say import x|, assuming you do want to be able to provide them. Given that you can't unload "extension" modules, it doesn't make sense to import them, even lazily. Would be good to have an option to do a one time enumeration (and cache) of modules and their docs, which plugins can offer to the user during their initial run.
  • Following imports in the other cases seem less problematic to me, so long as the evaluation of docs is lazy (i.e. don't unnecessarily import until the doc is actually requested -- e.g. in my plugin for Code::Blocks IDE I want the full list of modules content after the user types "." but only fetch docs for the highlighted one)
  • In my plugin for the Code::Blocks IDE all of the calls to jedi made from the IDE are aynchronous (or they soon will be). So the worst case is that there will be no completion hints while jedi is busy.

@tkf
Copy link
Contributor

tkf commented Mar 18, 2014

@spillz I think caching top-level modules and its docstring is a good idea!

@davidhalter
Copy link
Owner

@ColinDuquesnoy I fixed the problem you had, above. Used way more time than I thought, but it should be working now.

@ColinDuquesnoy
Copy link
Contributor

Ok thanks, I will give it a try this weekend and will let you know how it worked.

@ColinDuquesnoy
Copy link
Contributor

@davidhalter I confirm the problem is fixed. Now the results are as you expected, it takes more than 25s to follow every import (when the cache has been cleaned) and the memory used by Jedi rise up to 200MiB on my system.

However following definition of imports when the user typed at least one letter of the module/package is pretty fast (which will happen 99% of the time), e.g. this code will work fast and will collect the correct type for any completion but I feel like I am hacking jedi to get the results I want :/

import jedi

code = "import "
line = 1
script = jedi.Script(code, line)
completions = script.completions()
current_line = code.splitlines()[line-1].strip()
# HERE IS THE IMPORTANT PART
follow_definitions = current_line != 'import' and current_line != 'from'
for c in completions:
    type = c.type
    # follow import to get the real type
    if c.type == 'import' and follow_definitions:
        try:
            definition = c.follow_definition()[0]
            type = definition.type
        except IndexError:
            # no definition
            pass
    print(type, c.name)  # I use `type` to create the associated icon in pyqode

@davidhalter
Copy link
Owner

I finally changed the behavior of Completion.type.

For Completion.doc (and every other documentation object), I have the following suggestion:

  1. I've wanted a Script.documentation() for quite some time now. This function would return a Documentation or Doc object that contains all the gathered docstrings along the line, including rendered call signatures (look at the parser representation). This would also include Attribute Docstrings.
  2. It would make sense to always return Documentation in Completion.doc and Definition.doc. However, since those two already have a defined behavior, we should probably use Definition.get_doc().
  3. The Documentation object would also make it possible to get rid of the ugly doc and raw_doc properties, which are a) poorly documented and b) just not important enough to have two attributes (it's confusing).

Having changed to this behavior, we would just be left with first class Documentation, Definition and Completion objects. In addition to this there would be CallSignature objects, which are essentially sub-classes of Definition with extra information.

Does this make sense? The idea is to really create an easy and understandable API.

@tkf
Copy link
Contributor

tkf commented Mar 24, 2014

+1 but maybe it's a good idea to discuss API in a dedicated issue.

@ColinDuquesnoy
Copy link
Contributor

That looks good to me. I really like theScript.documentation() idea and I also found doc and raw_doc to be confusing, nice you can get rid of those! :)

It would make sense to always return Documentation in Completion.doc and Definition.doc. However, since those two already have a defined behavior, we should probably use Definition.get_doc().

So you would just mark Definiton.doc as deprecated and advice the user to use get_doc instead? What is your position about deprecated functions, will they disappear after a few versions? (If yes how much?)

@davidhalter
Copy link
Owner

@tkf I will probably still start a detailed discussion about Script.documentation, but this is really about the two attributes doc and raw_doc.

What is your position about deprecated functions, will they disappear after a few versions?

@ColinDuquesnoy They will disappear, but we haven't yet removed anything. We still have to come up with a good deprecation process. Currently they are just deprecated.

@tkf
Copy link
Contributor

tkf commented Mar 24, 2014

@davidhalter Well I don't mind how you organize issue list. Just suggesting.

If adding new API does not increase functionality at this point, maybe it is better to stick with what we have now. We may have another set of functionality which does not fit with Documentation in the future.

Another possible API is to add optional arguments to .get_doc(). For example, .get_doc(raw=True) is also does the same job. Since usually you use either raw doc or doc, this could be simple to use for people who think

% python -m this | grep nested
Flat is better than nested.

Brainstorming other possible functionality to consider: fetching and mixing documentation from parent classes, "additional strings" (PEP 257), fetching parameter document and type from :param: and :type:, ...

@davidhalter
Copy link
Owner

True. Thank you for your thoughts.

@davidhalter
Copy link
Owner

I think I'm going directly for the Documentation idea. It's a very simple class with a __str__ and raw function. I don't think there's a big danger if we use it here. Even with @tkf's suggestion of get_doc(raw=True), we might have needed to deprecate raw, which is also bad. The behavior - as of now - is very limited and will be expanded upon arrival of the Script.documentation method.

My only remaining question is: How should we call the documentation method of Definition and Completion?

  • get_doc
  • documentation
  • get_documentation

davidhalter added a commit that referenced this issue Mar 25, 2014
…are in a from clause or if its a longer imnport
@asmeurer
Copy link
Contributor

You can do timeouts in Python using signal handlers http://docs.python.org/2/library/signal.html#signal.setitimer. The bad news is that it doesn't work on Windows. Also, I think it might not work against an atomic (i.e., written in C) operation (I could be wrong about that). That's usually not an issue unless you are using built in Python functions against some large data.

The other gotcha with it is that you have to use one of the standard signals. You can't just invent your own. So e.g., if you set up a handler for sigint, nothing else can use that now. I think Jedi typically runs in its own process so this might not be an issue.

That being said, I've used this in a few applications before, and it seems to work pretty well. I've never used it with subprocesses or async, though. It may actually be easier to have a timeout in that case, as you can work around the atomic operation issue.

OTOH, you could leave it to the plugin authors, as in most cases the editors themselves should have better timeout mechanisms, I'd imagine, since they were designed with interactivity in mind (and Python wasn't so much).

@tkf
Copy link
Contributor

tkf commented Mar 25, 2014

@asmeurer I didn't know that. That's good to know. I think you are right about letting plugin handle timeout.

@davidhalter Isn't description also related to documentation? How about full_name? file path? How about fetching actual source code of the object? Can't you use it as a help too? You may say these are another property but you are already mixing parameters in non-raw document so doc is not "pure" docstring anymore. So even at this stage difference between document object and other objects are not so solid. In the future Document may not be an appropriate name and you'd want to deprecate it. It's no different than get_doc(raw=True). I am just worried that this API is getting in without serious consideration. Other developers might not notice this thread at all since it started at middle of an issue.

@davidhalter
Copy link
Owner

I'm going to start a separate discussion, soon. But to address your concerns right now: Documentation will only be about providing a way to access docstrings (which includes generated function signatures as strings). But it's really about providing a good API for docstrings.

@tkf
Copy link
Contributor

tkf commented Mar 25, 2014

If we are sure that Documentation has only two API methods __str__() [1] and raw(), even in the future versions, then I say it's a bad API. get_doc(raw=bool) is simpler since it is flat. I said +1 since Documentation is more extensible. But making an API extensible does not always make it better. We need to have some idea how we are going to extend it.

[1] BTW, I missed this point. Using __str__? I don't know if it's a good idea. But if you do that, make sure to use __unicode__ for Python 2.

@tkf
Copy link
Contributor

tkf commented Mar 25, 2014

I think __str__ is not a good idea because then plugin authors expect it to have all string methods, like .split(). You use __str__ to allow duck typing, right? Then having Documentation object which is not compatible with str is a cause of trouble.

@davidhalter
Copy link
Owner

I'm closing this one. __str__ might really not be a good idea (at the same time we could also inherit from str... Let's discuss Documentation in #392. I think that is the main discussion point. This issue has been solved.

@asmeurer
Copy link
Contributor

Won't str(object) always coerce the output of __str__ to a string?

@asmeurer
Copy link
Contributor

Oh I guess not. It works if it's a subclass of str, but raises TypeError if it isn't.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants