Project Scaling #719

Closed
quicknir opened this issue Jun 13, 2016 · 17 comments

Comments

@quicknir

I've had mostly positive but still mixed experiences working with rtags on a very large C++ project. Auto completion is still a bit hit or miss, and flycheck takes a very long time to tag errors. At some point while I was home I pulled the latest rtags, rebuilt it, and played around with the rtags source itself. Within rtags everything works beautifully, so I think a lot of this is related to scale. As changes are made and tested, is there a particular codebase that is used as the standard benchmark for pushing forward?

I'm tempted to grab the largest cmake C++ project on github, and see how well rtags works there. If this project is large enough to reproduce some of the issues I'm seeing, maybe that could be used to help try to fix these issues if possible.

@Andersbakken
Owner

The completion results are kinda completely generated by clang with little
chance for influencing them.

I use rtags daily on a codebase of some ~1800 files at work and it works
great but I appreciate that some operations likely don't scale as well as
they maybe could. I think you guys' project is a lot lot bigger than that.
I have an idea I want to test out that might improve the efficiency of
looking up symbols by name (rtags-find-symbol) but completions are a little
harder to optimize. I don't entirely know why flycheck should take so long
though. Are you finding that it's slow even if everything's fully parsed
and you touch a cpp file?

Anders

@Andersbakken
Owner

The more I think about it the more I don't entirely see why the size of the
project should matter for the two operations you mentioned (diagnostics and
completions). Let me see if I can put something in that might help debug
the situation.

Anders

@Andersbakken
Owner

Actually. I have one that I just added.

If you run rdm like this:

rdm --completion-logs

You will see some info about the generation of completions on the C++ side.
It would be interesting to get some numbers for this on your big project
and compare to numbers for equivalent operations inside a smaller project
like rtags.

Anders

@quicknir
Author

The size of the project generally ends up mattering because of C++'s terrible "module" system. Basically as you write code, some fraction of the code is in header files. All that code just gets copied and pasted into any other file that requires it.

While a larger project should not be #including more files in each file, when you start thinking about transitive includes, it becomes clear that the larger the project, the more transitive includes you get. So you'd actually expect files in the most dependent layer to scale in size (post pre-processor) linearly with the project. C++ makes me sad sometimes.

I happen to be working in the most dependent layer of a very large project.

I will add the flag you suggest and see what I come up with. Interestingly the speed of the database is usually excellent, but every once in a while there is a looooong pause. Not sure what causes this.
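
The transitive-include argument above can be illustrated with a small sketch. The include graph and file names here are hypothetical, made up purely for illustration; the point is only that a file with a single direct include can still pull in every layer beneath it.

```python
from collections import deque


def transitive_includes(include_graph, start):
    """Return every header reachable from `start` via direct or indirect #include."""
    seen = set()
    queue = deque(include_graph.get(start, ()))
    while queue:
        header = queue.popleft()
        if header in seen:
            continue
        seen.add(header)
        queue.extend(include_graph.get(header, ()))
    return seen


# Hypothetical layered project: each file includes only the layer directly below it.
include_graph = {
    "app.cpp": ["ui.h"],
    "ui.h": ["model.h"],
    "model.h": ["core.h"],
    "core.h": [],
}

direct = include_graph["app.cpp"]
reachable = transitive_includes(include_graph, "app.cpp")
print(len(direct), "direct include(s),", len(reachable), "after following transitive includes")
```

Adding another layer to this toy graph leaves the direct include count of `app.cpp` at one while the reachable set keeps growing, which is the scaling behaviour described above.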

@Andersbakken
Owner

I get what you're saying about the transitive includes. I happen to work a
lot on some of our core tooling headers as well and basically every time I
touch them several hundred files get dirtied.

Anders

@Andersbakken
Owner

I just improved the completion logs a little btw.

Anders

@quicknir
Author

So I pulled the latest rtags and rebuilt. Overall it's a bit better behaved now, though auto completion is still sometimes very slow. I get messages like this:

CODE COMPLETION 7.072s reparsing translation unit <redacted>
CODE COMPLETION 16.69s Generated completions for <redacted> successfully in 181 ms

Is this benchmark purely the compilation time for that translation unit? That seems a bit high even for something at the most dependent layer of a large project. 16 seconds is a lot of compiler cycles. Even 7 seconds seems like a lot. I should try compiling it separately and benchmarking it.
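
For comparing numbers between a large and a small project, the log lines can be pulled apart with a short script. This is only a sketch: the line format is inferred from the two messages quoted above, other messages may look different, and treating the leading number as a timestamp in seconds is an assumption.

```python
import re

# Pattern inferred from the two log lines quoted above; other messages may differ.
LINE = re.compile(r"^CODE COMPLETION (?P<stamp>\d+(?:\.\d+)?)s (?P<message>.*)$")
TRAILING_MS = re.compile(r"in (?P<ms>\d+) ms$")


def parse_completion_log(text):
    """Extract the leading seconds value and any trailing 'in N ms' figure per line."""
    entries = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # ignore lines that do not look like completion log output
        message = match.group("message")
        ms = TRAILING_MS.search(message)
        entries.append({
            "stamp_s": float(match.group("stamp")),
            "message": message,
            "reported_ms": int(ms.group("ms")) if ms else None,
        })
    return entries


sample = """\
CODE COMPLETION 7.072s reparsing translation unit <redacted>
CODE COMPLETION 16.69s Generated completions for <redacted> successfully in 181 ms
"""
entries = parse_completion_log(sample)
for entry in entries:
    print(entry["stamp_s"], entry["reported_ms"])
```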

Another issue I'm finding is that because preparing completions can be so slow, and seemingly blocks the server, it also affects other things. For instance, go to definition, which is normally very snappy, is sometimes extremely slow: completions are being prepared in the background and the server simply doesn't answer for many seconds. Maybe completion prep could be done asynchronously in a thread? Though I realize that introduces considerable complexity.
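
The idea of moving completion prep off the request path can be sketched roughly as follows. Nothing here reflects how rdm is actually structured; `prepare_completions` and `goto_definition` are hypothetical stand-ins, and the sleep is a placeholder for the slow reparse.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def prepare_completions(path):
    """Hypothetical stand-in for the slow reparse step; the sleep is a placeholder."""
    time.sleep(0.2)
    return f"completions ready for {path}"


def goto_definition(symbol):
    """Hypothetical stand-in for a fast database lookup."""
    return f"definition of {symbol}"


with ThreadPoolExecutor(max_workers=1) as pool:
    pending = pool.submit(prepare_completions, "foo.cpp")  # runs in the background
    start = time.monotonic()
    answer = goto_definition("Foo::bar")                   # not blocked by the prep
    lookup_seconds = time.monotonic() - start
    result = pending.result()                              # wait only when needed

print(answer, f"({lookup_seconds:.3f}s)")
print(result)
```

The lookup returns while the background job is still sleeping. The complexity mentioned above is real, though: any state shared between the worker and the request handler would need synchronisation.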

@quicknir
Author

I did some benchmarking on my own, and indeed it takes quite a while to rebuild the file. Specifically it takes around 17 seconds to compile my end files, though that is with code generation. Turning off debug symbols reduces this to 14, and adding -fsyntax-only reduced it to 11. Is rtags using -fsyntax-only? As you can see it makes quite a huge difference. I'm guessing you already use this though.
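
A benchmark like this one can be reproduced with a small timing harness. This is a sketch under assumptions: the command being timed is a placeholder so the script runs anywhere, and the real compile line (with the project's own flags) has to be substituted by hand.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Run argv `repeats` times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


# Placeholder command so the sketch runs anywhere. Replace it with the real
# compile line, for example clang++ with -fsyntax-only plus the project's flags.
baseline = time_command([sys.executable, "-c", "pass"])
print(f"{baseline:.3f}s")
```

Taking the fastest of several runs reduces noise from disk caching, which matters when the differences being compared are only a few seconds.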

Is there any way to do any kind of caching of header files? Fundamentally I think this is the only way to solve this. How about precompiled headers, or pretokenized headers? http://clang.llvm.org/docs/PTHInternals.html. In principle these would solve the problem.

@quicknir
Author

Actually, this doesn't seem that terrible: http://clang.llvm.org/docs/PCHInternals.html.

@Andersbakken
Owner

We actually do have some pch-code though it's kinda experimental and I
can't remember to what degree it works.

You can start rdm with --pch-enabled

We've tried and failed at trying to auto-generate sensible pch headers in
the past but I think if your project already has pch headers and you enable
that switch it might have a chance at working.

The times for generating the translation unit do seem long. Due to how
completion works in libclang you also have to reparse the translation unit
after creating it, and it seems that between the two operations it took the
whole 16 seconds. Once that is done (and we try to do it preemptively when
you switch buffers, keeping the result in the cache) completions should be
reasonably snappy (like the 180ms in your log).

Anders

@quicknir
Author

quicknir commented Jul 5, 2016

Is there any way to disable this preemptive caching? For me it has relatively few hits, and it's often caching just when I want to go to definition. In that case I now get as long a pause for go to definition as I do for auto completion, whereas normally it's instant.

I'll try to look into pch and see what the deal is. The problem though is that ultimately for this to work well, rtags itself has to generate the pch's. Otherwise if you modify a file in your project that has precompiled headers, and then switch to another file, you could be using the stale copy of the first.

@quicknir
Author

quicknir commented Jul 5, 2016

I recently came across this link: http://stackoverflow.com/questions/26989374/faster-code-completion-with-clang and now I'm more confused. It seems like there is a specialized precompiled preamble available for parsing translation units in libclang. If that's the case, then I can't understand why auto completion is taking so long. My source file is only about 200 lines, and all of the benchmarks I did were raw compilations without any precompiled preamble. Surely with a precompiled preamble this should be lightning fast?

@quicknir
Author

quicknir commented Jul 6, 2016

Another update: I finally decided to give YouCompleteMe's daemon (ycmd) a shot in Emacs after putting it off for a while. Based on its performance characteristics, it does seem to use this precompiled preamble trick. When you first open a file, there's a decent (async) pause of about 5-10 seconds during which you can't use autocomplete and don't get flycheck errors. After that, though, everything happens pretty much instantly. Auto completion and flycheck respond in under a second, sometimes basically instantly. If you add or remove an include, or if you modify one of the includes and then save the file, it will detect that the preamble has changed and recompile it, giving you that 5-10 second pause again.

In practice, the 5-10 second delay rarely comes up; this system is fantastic for making extended edits to a single file. This approach isn't strictly as good as using precompiled headers, but it's far less work for the user and for the maintainer.
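
The behaviour described above, where the expensive work is redone only when the preamble changes, can be modelled with a small cache. This is a conceptual sketch, not how ycmd or libclang implement it: `split_preamble` uses a crude line-based rule, and `PreambleCache` and its `compile_preamble` callback are hypothetical names.

```python
import hashlib


def split_preamble(source):
    """Split source into (preamble, body). Here the preamble is simply the leading
    run of blank lines, // comments and preprocessor directives, a simplification."""
    lines = source.splitlines(keepends=True)
    end = 0
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped == "" or stripped.startswith("#") or stripped.startswith("//"):
            end = i + 1
        else:
            break
    return "".join(lines[:end]), "".join(lines[end:])


class PreambleCache:
    """Re-run the expensive preamble step only when the preamble text changes."""

    def __init__(self, compile_preamble):
        self._compile = compile_preamble  # expensive callable (hypothetical)
        self._key = None
        self._compiled = None
        self.rebuilds = 0

    def get(self, source):
        preamble, _body = split_preamble(source)
        key = hashlib.sha256(preamble.encode()).hexdigest()
        if key != self._key:
            self._compiled = self._compile(preamble)
            self._key = key
            self.rebuilds += 1
        return self._compiled


cache = PreambleCache(lambda preamble: f"<compiled {len(preamble)} bytes>")
v1 = "#include <vector>\nint main() { return 0; }\n"
v2 = "#include <vector>\nint main() { return 1; }\n"                     # body edit only
v3 = "#include <vector>\n#include <string>\nint main() { return 1; }\n"  # new include
for version in (v1, v2, v3):
    cache.get(version)
print(cache.rebuilds)
```

Editing only the body reuses the cached result, while adding an include triggers a rebuild. This sketch does not notice when an included header itself is modified; covering that case would presumably also need a check on the included files, for example their modification times.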

Right now my approach is to use ycmd for autocompletion/flycheck, and rtags for code navigation, as it has many features that ycmd does not offer (ycmd does not keep a database at all). The combination is working well, but obviously it's more effort as an end user: you have to set up both and deal with the minor quirks of each. For ycmd it took me about a full day to get everything working perfectly, and the friend whose rtags setup I copied had a similar experience getting rtags to work perfectly with our build system.

Anyhow, it's just a thought, I guess: it might be possible with a reasonable dev effort to get the same performance as ycmd, as it's just using built-in libclang features.

@Andersbakken
Owner

There are almost certainly some bugs in the current rtags auto-completion
code. I'll try to dig in to figure out what we're doing wrong but it's a
fair bit of work.

Anders

@Andersbakken
Owner

I've refactored and simplified the completion code a little bit. I think
it's a little bit faster now. I did see that ycm uses the relatively
recently added CXTranslationUnit_CreatePreambleOnFirstParse flag so I added
that too but I find that I still have to do a reparse before the preamble
is generated and as such you won't be able to use completions for several
seconds after opening up a new buffer.

Once it's finished with the reparse I seem to be getting quite snappy
completions though. Can you give it a shot and see if there are any
improvements in your project?

Anders

@Andersbakken
Owner

I made another small change just now that makes completions work again
for completions triggered by ::, ->, ., etc.

Anders

@quicknir
Author

Indeed, the improvement is dramatic! Very, very nice! Flycheck, however, still seems to be running at a similar speed to before. There are some other minor issues with auto completion (showing private members, for instance), but I'll open separate tickets for those.
