Request: Instructions for Installing Hooks for Large Packages #102

Closed
blink1073 opened this Issue Dec 30, 2012 · 71 comments

Contributor

blink1073 commented Dec 30, 2012

First, thanks very much for jedi, it is a fantastic library.
I just went through the process of adding jedi to the Spyder IDE in place of rope (for completions, calltips, docstrings, and goto), and it went very smoothly.
However, the one thing holding me back is the >18 sec it takes to complete on numpy on my machine, every time I start the editor, since scientific computing is the target use for Spyder.
You mentioned it was possible to install "hooks" for large dynamic libraries, but I could not figure out how based on poking through the code base.
Can you offer any more insight?

Owner

davidhalter commented Dec 30, 2012

Yeah well, you could write hooks to do this at startup (so you don't have to wait while you're completing). But this does not improve the overall performance: if it needs 18 seconds the first time, it still needs 18 seconds later on.

So as you can see, there is really no way at the moment to improve the situation. It's strange to me that it takes >18 s.
Could you run test/run.py pylab twice and post the output (the first time, the files are not cached by the OS)? Also, how old is your machine? Importing numpy takes just 5 seconds here (the second time; I'm not sure about the first).

I'll probably try to document this further (and maybe there will also be a real interface for this stuff in the future).

Owner

davidhalter commented Dec 30, 2012

Also, of these 5 seconds, parsing the __builtin__ module takes 0.7 seconds.

And this: test/run.py pylab 19 -> Summary: (0 fails of 1 tests) in 3.311s (just one test instead of 20). There seems to be a big difference between your machine and mine (and I'm using a 3.5 year old notebook; it can't be that good).

Contributor

blink1073 commented Dec 30, 2012

I had run it on my netbook running Ubuntu, but I just ran it on my desktop (Windows 7 64-bit, 8 GB memory, quad-core 3.2 GHz), and got the following from the latest repo pull:

C:\Users\...\jedi\test>python run.py pylab
Solution @4 not right, received ['module __init__'], wanted ['module numpy']
Solution @7 not right, received ['module __init__'], wanted ['module random']
run 10 tests with 2 fails (pylab_.py)

Summary: (2 fails of 10 tests) in 15.721s
run 10 tests with 2 fails (pylab_.py)

C:\Users\...\jedi\test>python run.py pylab
Solution @4 not right, received ['module __init__'], wanted ['module numpy']
Solution @7 not right, received ['module __init__'], wanted ['module random']
run 10 tests with 2 fails (pylab_.py)

Summary: (2 fails of 10 tests) in 15.672s
run 10 tests with 2 fails (pylab_.py)
Contributor

blink1073 commented Dec 30, 2012

And for the second:


C:\Users\...\jedi\test>python run.py pylab 19
run 1 tests with 0 fails (pylab_.py)

Summary: (0 fails of 1 tests) in 10.620s
run 1 tests with 0 fails (pylab_.py)
Contributor

blink1073 commented Dec 30, 2012

Subsequent calls are fast, as long as Spyder remains open.

import time
import jedi
code = 'import numpy; numpy.one'
script = jedi.Script(code, 1, len(code) + 1, None)
t0 = time.time()
c = script.complete()
print time.time() - t0, c[0].word
code = 'import numpy; numpy.zer'
script = jedi.Script(code, 1, len(code) + 1, None)
c = script.complete()
print time.time() - t0, c[0].word
17.9330000877 ones
17.9630000591 zeros

And yikes, wx is a beast:

...
code = 'import wx; wx.Fr'
...
code = 'import wx; wx.Men'
...
54.1449999809 FR_DOWN
54.2249999046 Menu

And PyQt4 for good measure:

...
code = 'from PyQt4 import QtGui; QtGui.QMen'
...
code = 'from PyQt4 import QtGui; QtGui.QMessage'
...
code = 'from PyQt4 import QtCore; QtCore.QObjec'
...
code = 'from PyQt4 import QtCore; QtCore.QEvent'
...
27.7739999294 QMenu
27.8039999008 QMessageBox
35.8739998341 QObject
35.893999815 QEvent
Owner

davidhalter commented Dec 30, 2012

Seriously? This is really different from what I'm seeing here, the first time:

4.16713905334 ones

Second time:

2.25945520401 ones

Which I think are acceptable times.

And as I said, my machine is old (2.3 GHz). So now we have to find out why it's so much slower on your machine. I'm not too surprised that it's slow on your netbook, but the desktop? Strange. Do you have a Linux distribution on your desktop? And could you try it there? Maybe the performance really differs between Linux and Windows.

I'm not opposed to using caching for builtins, but I always thought parsing was fast enough. I really want to know what the problem is here.

Collaborator

dbrgn commented Dec 31, 2012

In any case, a few seconds is still a few seconds. You could execute the hook while starting up the editor and run it in a background thread/process. Then it should be ready by the time the user has loaded his file and found his entry point into the code. It is very rare that someone opens the editor and immediately starts using the autocompletion.

Contributor

blink1073 commented Dec 31, 2012

Same computer, with Ubuntu 12.04:

silvester@ubuntu:~/jedi/test$ python run.py pylab 19
run 1 tests with 0 fails (pylab_.py)

Summary: (0 fails of 1 tests) in 4.186s
run 1 tests with 0 fails (pylab_.py)
silvester@ubuntu:~/jedi/test$ python run.py pylab 19
run 1 tests with 0 fails (pylab_.py)

Summary: (0 fails of 1 tests) in 2.292s
run 1 tests with 0 fails (pylab_.py)
silvester@ubuntu:~/jedi/test$ python run.py pylab 
run 10 tests with 0 fails (pylab_.py)

Summary: (0 fails of 10 tests) in 3.520s
run 10 tests with 0 fails (pylab_.py)

@dbrgn, that does sound like a good approach. @davidhalter, I suppose what I'd really like is the option to store a cache for a chosen set of libraries, and manually clear a library cache based on the external library or user's needs.

Owner

davidhalter commented Dec 31, 2012

Argh. Windows really is much slower. It would be interesting to know why! Could you run python -m cProfile ./run.py pylab and send me the output (Windows only)?

I suppose what I'd really like is the option to store a cache for a chosen set of libraries

Yes, I could add that. But the bigger question is whether that's enough; 10 seconds is still too much. I'm thinking about using pickle or something similar to improve the times. What do you think? I'm just not sure temporary caching is a good solution (loading the temporary files might be slow, too).
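The kind of temporary cache being weighed here can be sketched in a few lines (all names are hypothetical; this is not jedi's actual implementation). Parse results are pickled under a key derived from the source path and reused as long as the source's mtime is unchanged:

```python
import os
import pickle

def cached_parse(source_path, cache_dir, parse_fn):
    """Return parse_fn(source_path), reusing a pickled result on disk.

    parse_fn stands in for the (slow) parser; cache_dir is any
    per-user directory. The cache entry is keyed by source path and
    invalidated when the source's mtime changes.
    """
    key = source_path.replace(os.sep, '_').replace(':', '') + '.pkl'
    cache_path = os.path.join(cache_dir, key)
    mtime = os.path.getmtime(source_path)
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as f:
            cached_mtime, tree = pickle.load(f)
        if cached_mtime == mtime:
            return tree  # source unchanged: skip the slow parse
    tree = parse_fn(source_path)
    with open(cache_path, 'wb') as f:
        pickle.dump((mtime, tree), f)
    return tree
```

Loading the pickle still costs time, which is exactly the concern raised above; whether it beats re-parsing depends on the module size.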

and manually clear a library cache based on the external library or user's needs.

What do you mean by that?

Owner

davidhalter commented Dec 31, 2012

Haha I also found out that wx needs almost 1 GB of RAM 😄 I really have to look into that :-)
Time for me there is 11.611s.

Contributor

blink1073 commented Dec 31, 2012

I sent the profile via email. Also, I had assumed you would use a binary database to store the cached results, and that we could remove a single library from the database when a refresh was desired. This could be triggered by a library version change, a file modification date, an md5 sum, or an age-out, for example. But it would be up to the user program to initiate the refresh.
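The refresh triggers listed above can be made concrete with a small, hypothetical staleness check; the mtime/md5 arguments stand for whatever metadata the cache stores per library:

```python
import hashlib
import os

def is_cache_stale(source_path, cached_mtime, cached_md5):
    """True when the cached entry for source_path should be refreshed."""
    # Cheap check first: modification time.
    if os.path.getmtime(source_path) != cached_mtime:
        return True
    # Fall back to a content hash for robustness.
    with open(source_path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest() != cached_md5
```

An age-out policy or a library version check would slot in the same way, as further predicates before the hash comparison.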

Contributor

blink1073 commented Jan 1, 2013

Using a thread to warm up Jedi did not help. Even running in a QThread, it effectively locks the UI while it is building the parse structure for numpy. I was not invoking autocomplete, just trying to type spaces.

import jedi
from PyQt4.QtCore import QThread, QTimer


class Jedi(object):

    def __init__(self): 
        self._is_ready = False
        QTimer.singleShot(2000, self.warm_up)

    def warm_up(self):
        self._startup_thread = JediThread()
        self._startup_thread.parent = self
        self._startup_thread.start()

    def get_completion_list(self, source_code, line, col, filename):
        while not self._is_ready:
            pass
        script = jedi.Script(source_code, line, col, None)
        ...


class JediThread(QThread):

    def run(self):
        code = 'import numpy; numpy.one'
        script = jedi.Script(code, 1, len(code), None)
        script.complete()
        self.parent._is_ready = True
Collaborator

tkf commented Jan 1, 2013

Why not run Jedi in a background process? That is also required to support both Python 2 and 3 and different sys.path values. I am using this async approach in the Emacs Jedi binding, and it is working well with numpy, scipy and matplotlib.

Contributor

blink1073 commented Jan 2, 2013

Thanks @tkf, I'll copy your method and report back.

Collaborator

dbrgn commented Jan 2, 2013

If running a QThread locks the UI, then either QThreads are not meant for such things, or the implementation is not working correctly.

In any case, I think a background process is the better solution.

Contributor

blink1073 commented Jan 2, 2013

It is also unresponsive when using threading.Thread(target=self.warm_up), FWIW.

Owner

davidhalter commented Jan 8, 2013

I'm still working on it...

I'm trying to optimize it for wx, but it's really a beast; by default the following libraries are loaded:

__init__.py - 61 lines
_core.py - 14831 lines
_controls.py - 7517 lines
_gdi.py - 7759 lines
_windows.py - 5028 lines
_misc.py - 6698 lines

So in total there are 41894 lines to process. I'm not exactly sure why this uses almost a GB of RAM... but I'm trying to find out.

davidhalter added a commit that referenced this issue Jan 8, 2013

Owner

davidhalter commented Jan 8, 2013

The first patch was just a little tweak of fast_parser. Now I'm using __slots__ to lower the memory footprint.

Python is simply amazing. It's so easy to get memory footprint details.

e.g. parsing only wx._core: without slots, parsing.Name was using 17012184 bytes; now it's using 1818096 bytes (17 MB down to 1.8 MB).

Here are the detailed stats before:

Partition of a set of 367901 objects. Total size = 66531112 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  16233   4 17012184  26  17012184  26 dict of jedi.parsing.Name
     1 105204  29  7747120  12  24759304  37 tuple
     2  17531   5  6517104  10  31276408  47 unicode
     3  19952   5  5586560   8  36862968  55 dict of jedi.parsing.NamePart
     4  47829  13  5196392   8  42059360  63 list
     5   4124   1  4321952   6  46381312  70 dict of jedi.parsing.Statement
     6   3679   1  3855592   6  50236904  76 dict of jedi.parsing.Param
     7  23423   6  3027288   5  53264192  80 str
     8   8452   2  1960864   3  55225056  83 __builtin__.set
     9  19952   5  1788928   3  57013984  86 jedi.parsing.NamePart
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
    10   1566   0  1641168   2  58655152  88 dict of jedi.parsing.Function
    11  59643  16  1431432   2  60086584  90 int
    12  16233   4  1038912   2  61125496  92 jedi.parsing.Name
    13    644   0   904928   1  62030424  93 dict (no owner)
    14    140   0   423200   1  62453624  94 dict of module
    15    423   0   379056   1  62832680  94 type
    16   2955   1   378240   1  63210920  95 types.CodeType
    17    423   0   356136   1  63567056  96 dict of type
    18   2714   1   325680   0  63892736  96 function
    19    277   0   290296   0  64183032  96 dict of jedi.parsing.Lambda

and after:

Partition of a set of 343915 objects. Total size = 43862576 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 105211  31  7747880  18   7747880  18 tuple
     1  17531   5  6517104  15  14264984  33 unicode
     2  19952   6  5586560  13  19851544  45 dict of jedi.parsing.NamePart
     3  47829  14  5196392  12  25047936  57 list
     4  23446   7  3028408   7  28076344  64 str
     5  16233   5  2467416   6  30543760  70 jedi.parsing.Name
     6   8452   2  1960864   4  32504624  74 __builtin__.set
     7  19952   6  1788928   4  34293552  78 jedi.parsing.NamePart
     8   1566   0  1641168   4  35934720  82 dict of jedi.parsing.Function
     9  59643  17  1431432   3  37366152  85 int
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
    10    644   0   904928   2  38271080  87 dict (no owner)
    11   3679   1   765232   2  39036312  89 jedi.parsing.Param
    12   4124   1   725824   2  39762136  91 jedi.parsing.Statement
    13    140   0   423200   1  40185336  92 dict of module
    14    423   0   380016   1  40565352  92 type
    15   2955   1   378240   1  40943592  93 types.CodeType
    16    423   0   358056   1  41301648  94 dict of type
    17   2713   1   325560   1  41627208  95 function
    18    277   0   290296   1  41917504  96 dict of jedi.parsing.Lambda
    19    248   0   259904   1  42177408  96 dict of jedi.parsing.PyFuzzyParser

So in conclusion: from ~400 MB RAM to 66.5 MB, and now 43.8 MB. I think that's about as good as it gets; a 15k-line Python file now uses only a small memory footprint.
Unfortunately it's not possible to use __slots__ with classes that inherit from str. It would be possible to reduce the memory footprint a little further (15-20%), but that's not on my list right now.

I think parsing is a little bit faster now, but I'll have to test this further. Don't try it yet; I haven't even pushed.
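The mechanism behind that drop can be seen with a toy measurement (illustrative class names, not jedi's): a slotted class has no per-instance __dict__, which is where most of those bytes went.

```python
import sys

class PlainName(object):
    def __init__(self, value):
        self.value = value

class SlottedName(object):
    __slots__ = ('value',)
    def __init__(self, value):
        self.value = value

plain = PlainName(1)
slotted = SlottedName(1)

# For the plain class, the instance dict is a separate allocation
# on top of the object itself; __slots__ removes it entirely.
plain_total = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_total = sys.getsizeof(slotted)
print(plain_total, slotted_total)
```

With thousands of parser objects, as in the heapy tables above, that per-instance saving multiplies into the megabytes shown.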

Collaborator

dbrgn commented Jan 9, 2013

Awesome, I've never heard of __slots__ before... But it makes a lot of sense.

Owner

davidhalter commented Jan 10, 2013

I'm currently evaluating pickle to improve this further. It looks promising: the load time for wx with cPickle is 0.768434047699 s, which would really be an improvement.

Contributor

blink1073 commented Jan 10, 2013

Excellent, it sounds like you're most of the way there. The ability to selectively write to and pull from a pickle for a specific library would be great. These could be stored in ~/.python-jedi. Let me know if there are any Windows-specific tests you'd like me to run to figure out the lack of caching we saw.

Owner

davidhalter commented Jan 10, 2013

These could be stored in ~/.python-jedi

Where would I store these files on Windows?

Contributor

blink1073 commented Jan 10, 2013

Spyderlib gives a good example: http://code.google.com/p/spyderlib/source/browse/spyderlib/userconfig.py, get_home_dir()

Collaborator

dbrgn commented Jan 10, 2013

On Windows, you would put the config files into the %AppData% directory.

Be careful: Windows doesn't properly support directories starting with a dot.

Owner

davidhalter commented Jan 10, 2013

I just realized again that Windows sucks so hard. 💌 I tried to install powershell. Not possible. 💖

Contributor

blink1073 commented Jan 10, 2013

You can create a .folder on Windows in Python, just not from Windows Explorer.


Collaborator

dbrgn commented Jan 10, 2013

Exactly. Or from cmd. But it leads to problems if someone renames it in Explorer and can't rename it back... and other undefined behavior.

Owner

davidhalter commented Jan 10, 2013

Not my problem 😃

Collaborator

dbrgn commented Jan 10, 2013

Yeah, but I still think you should rather use %appdata%/jedi/ to store config data.

Contributor

blink1073 commented Jan 10, 2013

Agreed, unless the user needs to access it directly for configuration purposes.

Collaborator

dbrgn commented Jan 11, 2013

But Windows doesn't provide proper solutions for config files. They assume there's always a GUI to configure stuff.

Users can still edit configuration files inside %appdata%; that's not a problem for people who know what they're doing. I just don't know of any better place to put them. Dotted directories are not hidden on Windows, but can't be renamed or deleted from Windows Explorer either, so that sucks.

Thank God (and Linus) for Linux! :)

Collaborator

tkf commented Jan 11, 2013

Isn't this for the cache? Do we need a config directory for Jedi at all? Isn't the editor the "frontend" for configuration? If it's only a cache, the user (ideally) doesn't need to know the location.

Owner

davidhalter commented Jan 11, 2013

@tkf So what would be the location for that on Windows? I'd want somewhere only the current user has access to.
@dbrgn @blink1073 I have absolutely no idea how to do this stuff on Windows. If someone wants to change the current path (save it to the home directory), I'd really love to merge a pull request :-)

Owner

davidhalter commented Jan 11, 2013

@blink1073 Pickling should work by now. If you can, please run some tests against the dev branch (on Windows and the netbook); I'd be interested in the performance. It should be working (all tests pass on my local machine; travis needs some time).

Performance is slower than before (for my test cases, which don't cover large files). The first time that makes sense, since the pickle files have to be saved; the second time it doesn't, and in my opinion it should be faster. It's not much slower, but still. I need to investigate that a little further (as I said, big files might differ).

Collaborator

dbrgn commented Jan 11, 2013

I don't have any time until Thursday due to upcoming exams next week, but it should be fairly easy...

Nevermind, I did it anyway. PR follows in a minute. Good thing I had a Windows XP VM sitting around :)

Collaborator

dbrgn commented Jan 13, 2013

Ah, I didn't know that .config is part of the XDG specification :)

The appdirs library doesn't look too bad. I'd create a lib or vendor package for such files though.

Owner

davidhalter commented Jan 13, 2013

@tkf How big is the library (if you strip certain parts)? Could you remove the unused parts and put it into Jedi? I don't want to do it, because I don't know this whole thing.

Contributor

blink1073 commented Jan 13, 2013

It boils down to one 300 line file.

Owner

davidhalter commented Jan 13, 2013

This sounds like a little bit too much, doesn't it?

Contributor

blink1073 commented Jan 13, 2013

I agree, @dbrgn had a good method.

Collaborator

tkf commented Jan 13, 2013

Most of the code is for making it work on Windows... It even uses ctypes to get the shit done on Windows! It also has detailed docstrings. That's why it's a bit long for its functionality.

You could eliminate functions such as the one used for getting the logging directory, but I'd say put the file in as-is; it would be easier to update the file when some bug is fixed upstream.

I hope this module goes into the stdlib... It's boring, but everybody needs it.

Collaborator

tkf commented Jan 14, 2013

I just remembered that Python does a similar thing to get the per-user site-packages directory [1]. If you check /usr/lib/python*/site.py, you can see that it does os.environ.get("APPDATA") or "~". It seems that os.environ.get("APPDATA") does not always work; probably that's why the appdirs module implements several methods to fetch system paths.

[1] http://www.python.org/dev/peps/pep-0370/

Owner

davidhalter commented Jan 14, 2013

Most of the code is for making it work in Windows...

Yeah. I really don't care. Calling ctypes for such a simple task is just too complicated. I would just switch back to what we have now (or something a little different). I'm strongly opposed to adding a 300-line file that does nothing but return one path (50 lines would already be too much, imho).

But if @dbrgn also thinks this is the way to go, well then just do it. So he has the last word :-)

Collaborator

dbrgn commented Jan 14, 2013

Let's stay with the current 4 line version as long as we don't get bugreports :)

Collaborator

tkf commented Jan 14, 2013

My initial point was to suggest using XDG-compliant paths on Unix-like systems other than Mac OS (I did not want to talk about Windows!). It looks like you can't use XDG paths on Mac OS, so using the paths specified by Apple there is probably better.

There are two things we can do before getting bug reports from Windows users. It looks like Windows expects a capitalized vendor name and application name under APPDATA; for example, Python uses $APPDATA/Python/Python27. Also, as I said, it looks like APPDATA might sometimes not be defined (though I don't know when that happens). So I suggest using something like this on Windows (untested, as I don't have Windows...):

os.path.expanduser(os.path.join(os.environ.get("APPDATA") or "~", "Jedi", "Jedi"))

This is essentially what happens in site.py.
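Putting the suggestions from this thread together, a minimal hypothetical path picker (untested on Windows, like the snippet above) would be: APPDATA with a home-directory fallback on Windows, ~/Library/Caches on Mac OS, and XDG_CACHE_HOME (defaulting to ~/.cache) elsewhere.

```python
import os
import sys

def default_cache_dir(app_name='Jedi'):
    """Per-user cache directory following each platform's convention."""
    if sys.platform == 'win32':
        # APPDATA may occasionally be undefined; fall back like site.py does.
        base = os.environ.get('APPDATA') or '~'
        return os.path.expanduser(os.path.join(base, app_name, app_name))
    if sys.platform == 'darwin':
        return os.path.expanduser(
            os.path.join('~', 'Library', 'Caches', app_name))
    # XDG Base Directory spec: default to ~/.cache when unset.
    base = os.environ.get('XDG_CACHE_HOME') or os.path.expanduser('~/.cache')
    return os.path.join(base, app_name.lower())
```

The lowercase directory name on Linux follows the usual XDG convention, while Windows and Mac OS keep the capitalized application name.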

Collaborator

dbrgn commented Jan 14, 2013

@davidhalter somehow the merge commit vanished from the dev branch... did you force push again? Never ever force push to upstream repositories!

Suggestion by @tkf sounds good. What do you think about the pull request?

Collaborator

tkf commented Jan 14, 2013

It looks like ~/Library/Caches/Jedi is the recommended place on Mac OS:
http://stackoverflow.com/a/5084892/727827

Owner

davidhalter commented Jan 14, 2013

never ever do force pushes to upstream repositories

No I didn't!! I used it one time and never again. (I'm scared of the -f anywhere...)

Really strange.

Collaborator

dbrgn commented Jan 14, 2013

@davidhalter Oh, strange :/ It's possible that I once force pushed to your repo instead of mine, but that's very improbable, and you would have noticed it when pushing. It doesn't matter anymore now :)

@tkf Alright, I'll fix that.

Contributor

blink1073 commented Jan 27, 2013

Sorry @davidhalter, I did not see where you had asked me to do more testing. I sold the netbook, as its slowness finally outweighed its portability. Here are the test results using the latest dev branch on Windows 7:

c:\Users\silvester\jedi\test>python run.py
run 72 tests with 0 fails (arrays.py)
run 43 tests with 0 fails (basic.py)
run 101 tests with 0 fails (classes.py)
run 1 tests with 0 fails (complex.py)
run 22 tests with 0 fails (decorators.py)
run 13 tests with 0 fails (docstring.py)
run 49 tests with 0 fails (dynamic.py)
run 95 tests with 0 fails (functions.py)
run 14 tests with 0 fails (generators.py)
Solution @83 not right, received ['module __init__'], wanted ['module import_tree']
Solution @93 not right, received ['module __init__'], wanted ['module pkg']
run 46 tests with 2 fails (goto.py)
run 53 tests with 0 fails (imports.py)
run 21 tests with 0 fails (invalid.py)
run 9 tests with 0 fails (isinstance.py)
run 2 tests with 0 fails (keywords.py)
run 1 tests with 0 fails (named_param.py)
run 39 tests with 0 fails (ordering.py)
run 51 tests with 0 fails (renaming.py)
run 18 tests with 0 fails (std.py)
run 4 tests with 0 fails (sys_path.py)
run 25 tests with 0 fails (types.py)
run 0 tests with 0 fails (__init__.py)

Summary: (2 fails of 679 tests) in 32.343s
run 72 tests with 0 fails (arrays.py)
run 43 tests with 0 fails (basic.py)
run 101 tests with 0 fails (classes.py)
run 1 tests with 0 fails (complex.py)
run 22 tests with 0 fails (decorators.py)
run 13 tests with 0 fails (docstring.py)
run 49 tests with 0 fails (dynamic.py)
run 95 tests with 0 fails (functions.py)
run 14 tests with 0 fails (generators.py)
run 46 tests with 2 fails (goto.py)
run 53 tests with 0 fails (imports.py)
run 21 tests with 0 fails (invalid.py)
run 9 tests with 0 fails (isinstance.py)
run 2 tests with 0 fails (keywords.py)
run 1 tests with 0 fails (named_param.py)
run 39 tests with 0 fails (ordering.py)
run 51 tests with 0 fails (renaming.py)
run 18 tests with 0 fails (std.py)
run 4 tests with 0 fails (sys_path.py)
run 25 tests with 0 fails (types.py)
run 0 tests with 0 fails (__init__.py)

c:\Users\silvester\jedi\test>python run.py pylab 19
run 1 tests with 0 fails (pylab_.py)

Summary: (0 fails of 1 tests) in 18.227s
run 1 tests with 0 fails (pylab_.py)

c:\Users\silvester\jedi\test>python run.py pylab 19
run 1 tests with 0 fails (pylab_.py)

Summary: (0 fails of 1 tests) in 5.280s
run 1 tests with 0 fails (pylab_.py)
Owner

davidhalter commented Jan 27, 2013

What do you think? Does this improve the situation? I think 5s is more or less OK. Also, I don't think there's an easy way to be faster than that (I don't think rope is faster there?).

I'm sorry that I'm not doing too much at the moment. I'm pretty busy with other projects. But I'll take care of this one again.

Contributor

blink1073 commented Jan 27, 2013

By the way, I have been using jedi (0.5.5) for over a month in my version of spyder, and it has performed admirably. I moved all calls to jedi to QThreads, which was in keeping with what spyder was doing for other background tasks. Everything other than the initial loading of large libraries is as fast or faster than rope was, and I have only encountered rare instances where jedi could not find the right answer. Should I be keeping track of those situations and reporting them as bugs? For instance, the following does not find a definition, should it?

import numpy as np
ones = getattr(np, 'ones')
ones(

Also, one area where rope seemed to do slightly better is dynamic docstrings. For example, the following prints "name" instead of the usual docstring for functools.partial.

import jedi
code = 'import functools; functools.partial('
script = jedi.Script(code, 1, len(code), None)
print script.get_in_function_call().executable.base.docstr
Contributor

blink1073 commented Jan 27, 2013

On raw startup, I can start autocomplete on numpy instantaneously using rope. It may be hard to convince the spyderlib guys that 5 sec is good enough for their core audience.

Owner

davidhalter commented Jan 27, 2013

and it has performed admirably

Good! I like to hear that. I've used it now for a different project and it really sucked (sometimes). But that may just be the latest revision :-)

Should I be keeping track of those situations and reporting them as bugs?

Yes, please everything! Seriously, my problem is to pin down the situations where it fails. I know it does, but it's hard to find it, unless you're stumbling over these things.

For instance, the following does not find a definition, should it?

Please open two new issues.

On raw startup, I can start autocomplete on numpy instantaneously using rope. It may be hard to convince the spyderlib guys that 5sec is good enough for their core audience.

Yes you're right. But do you know how rope works there? Do you guys preload numpy at startup? Or do you have some hooks to preload numpy in rope? Because on my machine:

time python -c "import numpy"

real    0m2.041s
user    0m0.100s
sys 0m0.024s

the second time:

time python -c "import numpy"

real    0m0.109s
user    0m0.080s
sys 0m0.024s

So once the files are in the cache, it's fast. This could be the reason why rope is so fast: it doesn't analyze the Python code of some libraries. I have to look into that. I don't think rope can do a good job without either analyzing the numpy files (slow, though maybe using the builtin C ast makes it fast) or importing them (fast).

Also I'm still on this. Jedi does some very strange stuff with third-party libraries (which I don't really understand, because I'm not using them -> Tkinter, pyparsing, etc).

Contributor

blink1073 commented Jan 28, 2013

Indeed, they are using the extension_modules option in rope, which tells rope to load the module directly. Here is Spyder's complete list:

"PyQt4", "PyQt4.QtGui", "QtGui", "PyQt4.QtCore", "QtCore",
        "PyQt4.QtScript", "QtScript", "os.path", "numpy", "scipy", "PIL",
        "OpenGL", "array", "audioop", "binascii", "cPickle", "cStringIO",
        "cmath", "collections", "datetime", "errno", "exceptions", "gc",
        "imageop", "imp", "itertools", "marshal", "math", "mmap", "msvcrt",
        "nt", "operator", "os", "parser", "rgbimg", "signal", "strop", "sys",
        "thread", "time", "wx", "wxPython", "xxsubtype", "zipimport", "zlib"

Pierre Raybaut (Spyder's main developer) asked me to ask you whether you have any plans for a Debian release, and what your thoughts are on the overall stability of the project. For the time being, the intent is to use rope as the primary and fall back to jedi when rope cannot find something. I would personally prefer to do the switch all at once, assuming we can find a suitable warm-up workaround and you feel confident in a stable version. I would be willing to assist with the Debian packaging, though I have not created a Debian package personally.

Owner

davidhalter commented Jan 28, 2013

I will look into this speed issue. But I probably need quite some time (debugging performance issues is hard).

Also I think that I need to address the issue of "Why Jedi is better than rope for autocompletion" in a more scientific way. I know many use cases where Rope doesn't even try to complete.

There are already Debian Packages, the discussion is here: davidhalter/jedi-vim#50. (Btw, there's also a bunch of arch packages).

Contributor

blink1073 commented Feb 1, 2013

Yes, we saw the Debian package after the fact. I set it up so that just numpy preloads at startup, right after the main window is loaded. The UI is then slightly slower than usual for 2-3 sec; after that, autocomplete on numpy is instantaneous. I tested on Windows 7 and Ubuntu. I set up the following bake-off and ran it with both the rope version and the jedi version. Based on these results, I am going to recommend that Spyder move to jedi for the next release. The only thing we are really missing is the initial speed of other large libraries like PyQt and wx.

import numpy as np
np.zeros()       # rope: no goto, doc 0.16s
                 # jedi: goto, doc 0.02s
np.zeros_like()  # rope: goto 0.08s, doc 0.14s
                 # jedi: goto 0.04s, doc 0.05s

from traits.api import HasTraits, Instance
HasTraits()  # rope: no doc
             # jedi: doc 0.04s
Instance()   # rope: wrong docstring 0.39s
             # jedi: doc 0.02s

import functools
functools.partial()  # rope: no goto, doc 0.001s
                     # jedi: goto, wrong doc

class Other(object):

    def do_something(self):
        """does something"""
        pass


class Test(object):

    def __init__(self):
        self.str = '123'
        self.other = Other()

    def other(self):
        self.str.capitalize  # rope: complete 0.06s
                             # jedi: complete 0.02s
        self.other.do_something()  # rope: fails to complete
                                   # jedi: complete 0.02s

# code completion in a large file: spyderlib/widgets/sourcecode/codeeditor.py (~2300 lines)
# complete on "rope.": rope 0.4s, jedi 0.2s

Also, adding the following to parser.py in PyFuzzyParser.parser helps in a multithreaded environment by allowing other threads to run. I tested this by completing on jedi in one thread while writing to a file in another, and tuned the sleep amount to roughly optimize the total time for both operations.

import threading
import time  # both at the top of parser.py

self.iterator = iter(self)
# allow time for other threads to run
if threading.current_thread().name != 'MainThread':
    time.sleep(0.0001)
# This iterator stuff is not intentional. It grew historically.
for token_type, tok in self.iterator:
Owner

davidhalter commented Feb 1, 2013

As a short overview I wrote this article a few days ago:

http://jedidjah.ch/code/2013/1/19/why_jedi_not_rope/

I didn't post it here, because it's not specific enough (just bullet points instead of real world examples). I will be more precise in the future.

But I don't think it's a good idea to switch exactly now. I really have to fix a few things that are not working out as they should (depending on which version of Jedi you use, performance could be much better; not startup, but completion...). So I really have to get into a few things again, write another post on where exactly rope is worse (the above is a very good example of that), and then we should propose the switch. At the moment I really don't like the dev branch (there are some huge problems with the parser at times...). That's not a good sign, and I should change it first.

But thank you for the extensive report. I will base my next blog post on that (if I may).

Contributor

blink1073 commented Feb 1, 2013

All right, I will try and contain my enthusiasm for now. By all means use anything I have posted here. Jedi really is awesome!

Collaborator

tkf commented Feb 1, 2013

@davidhalter Nice blog post! Now I have something to link to when I'm asked that question.

@dbrgn dbrgn referenced this issue in tooxie/shiva-server May 2, 2013

Closed

Read config from env #74

Owner

davidhalter commented May 7, 2013

@blink1073 Jedi is pretty fast now. Always. At least on my computer :-) After an initial indexing of the data (to do the caching), you can reopen VIM/whatever and complete numpy in 1.6s on a 4 year old computer. I think that's pretty good. Numpy just takes a little bit of time (a few hundred Python files; think about the IO). Could you try to test it again?

The slow thing (load time) now is pickle:

       29    1.200    0.041    1.369    0.047 {cPickle.load}

I don't think that we can really change that; there are a LOT of objects to load. Getting rid of pickle (using another, faster solution) would probably improve performance only minimally.

Contributor

blink1073 commented May 9, 2013

Great work @davidhalter, here are the results on my machine (about 2.5x faster than the last time):

c:\Users\silvester\Dropbox\workspace\jedi\test>python run.py pylab 19
pylab_.py                1 tests and 0 fails.

Summary: (0 fails of 1 tests) in 8.141s

c:\Users\silvester\Dropbox\workspace\jedi\test>python run.py pylab 19
pylab_.py                1 tests and 0 fails.

Summary: (0 fails of 1 tests) in 2.162s
Owner

davidhalter commented May 11, 2013

@blink1073 For the discussion about the inclusion of Jedi within spyder, wait until we release #145. That might be in 2-3 weeks, when 0.6.1 is being released.

Contributor

blink1073 commented Oct 6, 2013

I am working on the Spyder re-integration now. Two things:

-- If I turn off gc while loading pickles in cache.py, I get a 20% improvement in the import time for preload_module('PyQt4'):

import gc  # at the top of the file
with open(self._get_hashed_path(path), 'rb') as f:
    gc.disable()
    parser_cache_item = pickle.load(f)
    gc.enable()

-- We used to have access to the supers for a class, which enabled us to show the superclasses in the class's documentation via Script().get_in_function_call().executable.base.supers. This behavior is now hidden behind a private variable:

import jedi
code = 'from collections import OrderedDict; OrderedDict'
script = jedi.Script(code, 1, len(code))
supers = script.goto_definitions()[0]._definition.get_super_classes()
Contributor

asmeurer commented Oct 6, 2013

You might re-enable the gc in its own with block, or at least in a finally block.
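A version of the loading snippet that follows this advice might look like the following (the cache.py context is from blink1073's snippet above; the function wrapper is added here for illustration):

```python
import gc
import pickle

def load_pickle_without_gc(path):
    """pickle.load with gc paused; gc is restored even if loading fails."""
    with open(path, 'rb') as f:
        gc.disable()
        try:
            return pickle.load(f)
        finally:
            gc.enable()
```

Without the try/finally, an unpickling error would leave garbage collection permanently disabled for the rest of the process.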

Owner

davidhalter commented Oct 6, 2013

@blink1073 In the case of the performance improvement, just open a Pull Request. I think 20% is definitely worth it!

We used to have access to the supers for a class, which enabled us to show the superclasses in the documentation for the class with Script().get_in_function_call().executable.base.supers.

That's intentional. The definition was never really an intended part of the public API. Sorry for the confusion earlier. If you want to access super classes, please add a feature request.

Contributor

blink1073 commented Feb 2, 2014

I'm calling this fixed by #363 and a slew of other improvements along the way. Great work @davidhalter.

@blink1073 blink1073 closed this Feb 2, 2014

@spyder-bot spyder-bot referenced this issue in spyder-ide/spyder Feb 17, 2015

Closed

Augment or replace rope with Jedi #1213
