Project Scaling #719

quicknir · 2016-06-13T02:46:02Z

So I've had some mostly positive, yet still mixed experiences working with rtags on a very large C++ project. Auto completion is still a bit hit or miss, and flycheck takes very very long periods of time to tag errors. At some point while I was home I decided to pull the latest rtags and rebuild and play around with the rtags source itself. Within rtags everything works beautifully. I think a lot of this is related to scale. As changes are made and tested, is there a particular codebase that is used as kind of the standard for pushing forward?

I'm tempted to grab the largest cmake C++ project on github, and see how well rtags works there. If this project is large enough to reproduce some of the issues I'm seeing, maybe that could be used to help try to fix these issues if possible.

Andersbakken · 2016-06-15T02:47:42Z

The completion results are kinda completely generated by clang with little
chance for influencing them.

I use rtags daily on a codebase of some ~1800 files at work and it works
great but I appreciate that some operations likely don't scale as well as
they maybe could. I think you guys' project is a lot lot bigger than that.
I have an idea I want to test out that might improve the efficiency of
looking up symbols by name (rtags-find-symbol) but completions are a little
harder to optimize. I don't entirely know why flycheck should take so long
though. Are you finding that it's slow even if everything's fully parsed
and you touch a cpp file?

Anders

Andersbakken · 2016-06-16T07:46:38Z

The more I think about it the more I don't entirely see why the size of the
project should matter for the two operations you mentioned (diagnostics and
completions). Let me see if I can put something in that might help debug
the situation.

Anders

Andersbakken · 2016-06-16T07:52:11Z

Actually. I have one that I just added.

If you run rdm like this:

rdm --completion-logs

You will see some info about the generation of completions on the c++ side.
It would be interesting to get some numbers for this on your big project
and compare to numbers for equivalent operations inside a smaller project
like rtags.

Anders

quicknir · 2016-06-16T14:08:41Z

The size of the project generally ends up mattering because of C++'s terrible "module" system. Basically as you write code, some fraction of the code is in header files. All that code just gets copied and pasted into any other file that requires it.

While a larger project should not be #including more files in each file, when you start thinking about transitive includes, it becomes clear that the larger the project, the more transitive includes you get. So you'd actually expect files in the most dependent layer to scale in size (post pre-processor) linearly with the project. C++ makes me sad sometimes.
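To make the transitive-include argument concrete, here is a toy sketch (not rtags code). It assumes a hypothetical layered project in which each header directly includes only the layer beneath it; the names `IncludeGraph`, `transitiveIncludes`, and `makeLayeredProject` are made up for illustration. Even with one direct include per file, the top layer ends up pulling in every layer below it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical include graph: header name -> headers it directly #includes.
using IncludeGraph = std::map<std::string, std::vector<std::string>>;

// Collect every header reachable from `file` through direct or transitive includes.
std::set<std::string> transitiveIncludes(const IncludeGraph &graph, const std::string &file)
{
    std::set<std::string> seen;
    std::vector<std::string> pending{file};
    while (!pending.empty()) {
        const std::string current = pending.back();
        pending.pop_back();
        const auto it = graph.find(current);
        if (it == graph.end())
            continue; // header not present in the graph
        for (const std::string &dep : it->second) {
            if (seen.insert(dep).second) // visit each header only once
                pending.push_back(dep);
        }
    }
    return seen;
}

// Build a toy project where layerN.h directly includes only layer(N-1).h.
IncludeGraph makeLayeredProject(std::size_t layers)
{
    assert(layers > 0);
    IncludeGraph graph;
    graph["layer0.h"] = {};
    for (std::size_t i = 1; i < layers; ++i)
        graph["layer" + std::to_string(i) + ".h"] = {"layer" + std::to_string(i - 1) + ".h"};
    return graph;
}
```

Calling `transitiveIncludes(makeLayeredProject(n), ...)` on the top layer returns n - 1 headers, so in this toy model the preprocessed size of the most dependent file grows linearly with the number of layers, even though no file has more than one direct include.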

I happen to be working in the most dependent layer of a very large project.

I will add the flag you suggest and see what I come up with. Interestingly the speed of the database is usually excellent, but every once in a while there is a looooong pause. Not sure what causes this.

Andersbakken · 2016-06-18T05:58:07Z

I get what you're saying about the transitive includes. I happen to work a
lot on some of our core tooling headers as well and basically every time I
touch them several hundreds of files get dirtied.

Anders

Andersbakken · 2016-06-18T06:19:32Z

I just improved the completion logs a little btw.

Anders

quicknir · 2016-06-28T13:26:12Z

So I pulled the latest rtags and rebuilt. Overall it's a bit better behaved now, though sometimes auto completion is still very slow. I get messages like this:

CODE COMPLETION 7.072s reparsing translation unit <redacted>
CODE COMPLETION 16.69s Generated completions for <redacted> successfully in 181 ms

Is this benchmark purely the compilation time for that translation unit? The truth is that seems a bit high even for something at the most dependent layer of a large project. 16 seconds is a lot of compiler cycles. Even 7 seconds seems like a lot. I should try compiling it separately and benchmarking it.

Other than that, another issue I'm finding is that because preparing completions can be so slow, and seemingly blocks the server, it also affects other things. For instance, go to definition, which is normally very snappy, is sometimes extremely slow because completions are being prepared in the background and the server simply doesn't answer for many seconds. Maybe completion prep could be done asynchronously in a thread? Though I realize that introduces considerable complexity.
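A rough sketch of what "asynchronously in a thread" could look like, using only the C++ standard library. This is not how rdm is structured; `Server`, `prepareCompletions`, and `gotoDefinition` are hypothetical stand-ins, and the expensive reparse is simulated with a sleep. The point is only that a cheap query can be answered while the slow work runs elsewhere.

```cpp
#include <chrono>
#include <future>
#include <string>
#include <thread>

// Hypothetical stand-in for the expensive translation-unit reparse.
std::string prepareCompletions(std::chrono::milliseconds cost)
{
    std::this_thread::sleep_for(cost); // simulate seconds of clang work
    return "completions ready";
}

// Hypothetical stand-in for a cheap database lookup such as go-to-definition.
std::string gotoDefinition(const std::string &symbol)
{
    return "definition of " + symbol;
}

struct Server {
    std::future<std::string> pending;

    // Hand the slow work to another thread so the caller is not blocked.
    void startCompletionPrep(std::chrono::milliseconds cost)
    {
        pending = std::async(std::launch::async, prepareCompletions, cost);
    }

    // Non-blocking check: has the background work finished yet?
    bool completionsReady() const
    {
        return pending.valid()
            && pending.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
    }

    // Blocks until the background work is done, then returns its result.
    std::string completions() { return pending.get(); }
};
```

With this shape, a caller can invoke `startCompletionPrep(...)`, answer `gotoDefinition(...)` immediately, and only block in `completions()` when the result is actually needed. The real complexity mentioned above (shared state, cancellation, stale results) is deliberately left out.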

quicknir · 2016-06-29T19:46:06Z

I did some benchmarking on my own, and indeed it takes quite a while to rebuild the file. Specifically it takes around 17 seconds to compile my end files, though that is with code generation. Turning off debug symbols reduces this to 14, and adding -fsyntax-only reduced it to 11. Is rtags using -fsyntax-only? As you can see it makes quite a huge difference. I'm guessing you already use this though.

Is there any way to do any kind of caching of header files? Fundamentally I think this is the only way to solve this. How about precompiled headers, or pretokenized headers? http://clang.llvm.org/docs/PTHInternals.html. In principle these would solve the problem.

quicknir · 2016-06-29T19:47:53Z

Actually, this doesn't seem that terrible: http://clang.llvm.org/docs/PCHInternals.html.

Andersbakken · 2016-07-02T17:21:44Z

We actually do have some pch-code though it's kinda experimental and I
can't remember to what degree it works.

You can start rdm with --pch-enabled

We've tried and failed at trying to auto-generate sensible pch headers in
the past but I think if your project already has pch headers and you enable
that switch it might have a chance at working.

The times for generating the translation unit do seem long. Due to how
completion works in libclang you also have to reparse the translation unit
after creating it and it seems like between the two operations it took the
whole 16 seconds. Once that is done (and we try to do it preemptively when
you switch buffers and keep it in the cache etc) completions should be
reasonably snappy though (like the 180ms in your log).

Anders

quicknir · 2016-07-05T14:27:18Z

Is there any way to disable this preemptive caching? For me it has relatively few hits, and on the other hand it's often caching right when I want to go to definition; in that case I now get as long a pause for goto def as I do for auto completion, whereas normally it's instant.

I'll try to look into pch and see what the deal is. The problem though is that ultimately for this to work well, rtags itself has to generate the pch's. Otherwise if you modify a file in your project that has precompiled headers, and then switch to another file, you could be using the stale copy of the first.

quicknir · 2016-07-05T16:24:52Z

I recently came across this link: http://stackoverflow.com/questions/26989374/faster-code-completion-with-clang and now I'm more confused. It seems like there is a specialized precompiled preamble available for parsing translation units in libclang. If that's the case, then I can't understand why auto completion is taking so long? My source file is only about 200 lines, all of the benchmarks I did were raw compilations without using any precompiled preambles. Surely with a precompiled preamble this should be lightning fast?

quicknir · 2016-07-06T22:58:09Z

Another update: I finally decided to give the YouCompleteMe daemon (ycmd) a shot on emacs after putting it off for a while. Based on its performance characteristics, it does seem to use this precompiled preamble trick. When you first open a file, there's a decent (async) pause of about 5-10 seconds during which you can't use autocomplete and don't get flycheck errors. After that though, everything happens pretty instantly. Auto completion and flycheck are under a second response time, sometimes basically instant. If you add/remove an include, or if you modify one of the includes and then save the file, it will detect that the preamble has changed and recompile it, giving you that 5-10 seconds again.
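The invalidation behaviour described here can be sketched in a few lines. This is a simplified illustration, not libclang's or ycmd's actual logic: `extractPreamble` and `PreambleCache` are hypothetical names, the preamble is approximated as the leading run of blank lines, `//` comments, and preprocessor directives, and changes inside the included files themselves are not tracked.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Approximate the preamble as the leading lines that are blank, `//` comments,
// or preprocessor directives. Block comments are not handled in this sketch.
std::string extractPreamble(const std::string &source)
{
    std::istringstream in(source);
    std::string line, preamble;
    while (std::getline(in, line)) {
        const std::size_t first = line.find_first_not_of(" \t");
        const bool blank = first == std::string::npos;
        const bool directive = !blank && line[first] == '#';
        const bool comment = !blank && line.compare(first, 2, "//") == 0;
        if (!blank && !directive && !comment)
            break; // the first real declaration ends the preamble
        preamble += line;
        preamble += '\n';
    }
    return preamble;
}

class PreambleCache {
public:
    // Returns true when the (expensive) preamble rebuild would be needed,
    // false when the cached preamble can be reused for this buffer.
    bool update(const std::string &source)
    {
        std::string preamble = extractPreamble(source);
        if (mValid && preamble == mPreamble)
            return false;
        mPreamble = std::move(preamble);
        mValid = true;
        ++mRebuilds;
        return true;
    }

    int rebuilds() const { return mRebuilds; }

private:
    std::string mPreamble;
    bool mValid = false;
    int mRebuilds = 0;
};
```

Editing the body of a file leaves the preamble text untouched, so `update` returns false and the slow step is skipped; adding or removing an `#include` changes the preamble and triggers one rebuild.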

In practice, the 5-10 second delay rarely comes up; this system is fantastic for making extended edits to a single file. This approach isn't strictly as good as using precompiled headers, but it's far less work for the user and for the maintainer.

Right now my approach is to use ycmd for autocompletion/flycheck, and rtags for code navigation, as it has many features that ycmd does not offer (ycmd does not keep a database at all). The combination is working well but obviously it's more effort as an end user, as you have to set up both and deal with the minor quirks of each (for ycmd it took me about a full day to get everything working perfectly; the friend whose rtags setup I copied had a similar experience getting rtags to work perfectly with our build system).

Anyhow it's just a thought I guess, that it might be possible with a reasonable dev effort to get the same performance as ycmd, as it's just using built-in libclang features.

Andersbakken · 2016-07-07T17:27:56Z

There are almost certainly some bugs in the current rtags auto-completion
code. I'll try to dig in to figure out what we're doing wrong but it's a
fair bit of work.

Anders

Andersbakken · 2016-07-08T22:11:59Z

I've refactored and simplified the completion code a little bit. I think
it's a little bit faster now. I did see that ycm uses the relatively
recently added CXTranslationUnit_CreatePreambleOnFirstParse flag so I added
that too but I find that I still have to do a reparse before the preamble
is generated and as such you won't be able to use completions for several
seconds after opening up a new buffer.

Once it's finished with the reparse I seem to be getting quite snappy
completions though. Can you give it a shot and see if there are any
improvements in your project?

Anders

Andersbakken · 2016-07-09T21:52:42Z

I made another small change just now that makes completions work again
for completions that are triggered by ::, ->, . etc.

Anders

quicknir · 2016-07-10T04:24:53Z

Indeed, the improvement is dramatic! Very very nice! flycheck however still seems to be running at a similar speed to before. There are some other minor issues with auto completion (showing private members for instance) but I'll open separate tickets for those.

jbeigel mentioned this issue Jul 11, 2016

Problem w/ automatic company popup #727

Closed

adzenith mentioned this issue Nov 2, 2016

rdm: process requested files first #847

Open

cviebig mentioned this issue Nov 4, 2016

Scaling up to large source base ? #436

Open

quicknir closed this as completed Dec 12, 2016

suo mentioned this issue Sep 6, 2017

Diagnostics slow in large project #1049

Closed
