Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include detection caching #175

Merged
merged 21 commits into from Aug 26, 2016
Merged

Conversation

matthijskooijman
Copy link
Collaborator

This PR adds caching to the include detection mechanism, greatly improving the speed of builds with a lot of files.

This code was only just finished and has not seen a lot of testing yet. Unit tests should be added for it as well. However, I wanted to get the code out for review ASAP to collect some comments.

This PR is made on top of #174, so the first dozen or so commits should be ignored until that is merged.

This function never fails, so no need to return and check for an error
value.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
Originally, this code was written to find all include statements in a
source file, for further processing. However, since a while the
include finding was changed to run the preprocessor to see if any
includes are missing, and this class is used to find the missing
includes in the error message.

In practice, the preprocessor will bail out at the first missing
include, since there is no meaningful way for it to continue. This means
there will be at most one include in the error output. If there
would be multiple includes (either because the preprocessor decided to
continue, or there is some other output that accidentially matches the
regex), it would be harmful to actually process all of them, since the
absence of the first include might influence the presence of the second
(e.g. in the presence of #ifdefs).

This commit enforces that only one include is returned at a time,
slightly simplifying the code in the process. The filename and class
should be renamed to singular as well, but this will be left for a
future commit to reduce noise.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
Previously, during include detection, every pass through
IncludesToIncludeFolders would iterate over all included filenames found
so far, and rebuild the list of include folders from the list of
included libraries again.

This commit consistes of these changes:
 - IncludesToIncludeFolders now looks at ctx.IncludeJustFound to find
   new libraries (instead of ctx.Includes which contains all included
   filenames found so far). To minimize changes in this commit, it still
   creates a slice with just the single include found and processes
   that. A future commit will simplify this.
 - With that change, ctx.Includes is no longer used so it is removed.
 - The resolveIncludeFolders function is removed. It used to build the
   list of include folders, based on the list of imported libaries. This
   list was built from scratch again every time, including the core and
   variant folders. These two folders are now added once, by
   ContainerFindIncludes, and adding the library's source folders to the
   include folder list is done by IncludesToIncludeFolders.
 - ContainerFindIncludes would start by calling IncludesToIncludeFolders
   (even before any includes were detected) to ensure that the include
   path contained the core and variant folders on the first preprocessor
   run already. Now that these paths are added by ContainerFindIncludes,
   there is no need for this extra call to IncludesToIncludeFolders, so
   it is removed.
 - resolveLibraries is modified to actually return only newly imported
   libraries, not all imported libraries.
 - resolveLibrary now returns the library it resolved to, instead of
   just adding it to the markImportedLibrary map.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
This code was written to handle processing of multiple includes after
each other, but it only needs to handle one. This allows simplifying the
code a bit. Behaviour should not be changed.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
The removed code iterates over the detected libraries and adds their
source directories to the queue for running further detection on them.
However, IncludesToIncludeFolders (called by findIncludesUntilDone(),
which is called a bit above the removed code) already does this, so
there is no need to do it again. Since Ctx.FoldersWithSources is a
UniqueStringQueue, this unneeded code was not affecting behaviour in any
way.

It seems likely that this code had a purpose previously, but it was lost
in some refactor. However, a quick scan of the commit that introduced it
(d6e378: Implemented library to library dependency full discovery)
suggests that this code was useless from the start.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
Previously, it was called once for the main cpp file, and then in a loop
for any extra source files from the sketch or dependent libraries.
However, this can be simplified by just adding the main .cpp file to the
queue, and then letting the loop handle it.

This produces a tiny change in the order in which source files are
processed for includes, but any setups affected by this are already
prone to errors.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
This field was used during include detection, to collect the source
folders whose source files should be processed for includes. This field
was handled by the CollectAllSourceFilesFromFoldersWithSources pass,
which finds all source files in these folders and adds them to the
Context.CollectedSourceFiles queue instead.

This commit changes all code that used to add directories to the
FoldersWithSourceFiles queue to instead add all contained source files
to the CollectedSourceFiles queue. For this, the
CollectAllSourceFilesFromFoldersWithSources pass is transformed into
regular function and renamed to QueueSourceFilesInFolder, so it can be
directly called without needing any particular context entries.

Note that the filename that contains the latter function is no longer
entirely accurate, but this will be fixed in a later commit (left
untouched in this commit to simplify review).

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
This prepares for a change in the next commit.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
In previous commits, this method has been signficantly simplified,
reducing it to little more than a wrapper around the resolveLibrary
helper function.

This commit renames resolveLibrary to ResolveLibrary to make it public
and lets findIncludesUntilDone call it directly (instead of going
through IncludesToIncludeFolders). findIncludesUntilDone now also takes
care of processing the ResolveLibrary result, which was previously done
by IncludesToIncludeFolders.

The signature of ResolveLibrary is also changed to accept a Context, so
it can now extract the needed variables itself, instead of needing
IncludesToIncludeFolders to extract them.

Note that the filename that contains ResolveLibrary is no longer
entirely accurate, but this will be fixed in a later commit (to simplify
review of this commit).

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
This function used to be a pass, but it was reduced to a regular
function in 1001916 (Remove Context.FoldersWithSourceFiles). By now, the
function is only called from container_find_includes.go, so it makes
sense to just move it into that file. This does not change any code, it
just moves the function and renames it to be private.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
Since 631b09f (Remove IncludesToIncludeFolders.Run), the primary
function in this file changed to become ResolveLibrary. This updates the
filename to match. No code is changed, this just renames a file.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
This struct contains the path of a source file, together with the
library or sketch it is contained in. This allows the SourceFile methods
to generate the full paths to the source file, object file and
dependency file, making it easier to pass around source files.

This commit only adds the struct. Next commits will start to use this
struct in the include detection, where it fills an immediate need, later
it will be used in more places as well.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
This is a queue of SourceFile objects, similar to UniqueStringQueue. It
is not used yet, only added.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
This helper does not do much, but it will be expanded in a later commit.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
Previously, the include detection kept a queue of strings representing
full paths to source files that still needed to be processed and
iterated over it. This commit turns this into a queue containing the
(new) SourceFile struct. This gives the include detection code a bit
more context about a source file, allowing to also generate the object
and dependency file paths.

Previously this was not feasible, since it did not know the origin
(library/sketch) of a path, so it could not decide where within the
build path the object file should live.

This does not actually use this new context yet, but it will be useful
when introducing caching next.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
This greatly reduces compilation time of big sketches (or sketches using
big libraries) when only a small change has been mad. Instead of
rerunning include detection for *all* source files, it is now only rerun
for changed files (and usually more if the actual list of includes
changed).

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
@ArduinoBot
Copy link
Contributor

✅ Build completed.

⬇️ Build URL: http://downloads.arduino.cc/PR/arduino-builder/arduino-builder-175.zip

ℹ️ To test this build:

  1. Replace arduino-builder binary (you can find it where you installed the IDE) with the provided one

@PaulStoffregen
Copy link
Sponsor

I'm trying this now with the Teensy Audio library. Seems to make no difference. The "g++ -E" step is being run on all the (many) library files every time I click Verify.

@PaulStoffregen
Copy link
Sponsor

It does make a huge speedup when I compile the Ethernet library AdvancedChatServer example for Arduino Mega.

@PaulStoffregen
Copy link
Sponsor

PaulStoffregen commented Aug 2, 2016

Hmmm... it works if I try IRremote's IRrecvDumpV2 example, compiling for Teensy 3.2. So it does seem to work with other libs, but there's something about the Teensy Audio library that's causing it to not use the cache data at all. :(

I'm testing on Linux 64 bit.

To reproduce this, start with a clean copy of 1.6.10 and then run this installer to add all the Teensy stuff into that 1.6.10 copy, then replace arduino-builder. Select Teensy 3.2 from Tools > Boards, and then open File > Examples > Audio > Synthesis > Guitar. Every Verify re-detects all the library dependencies, not using the cached results at all.

@PaulStoffregen
Copy link
Sponsor

When it does work, wow, really speeds things up! Hopefully this can work for the audio lib... it's a very large library with a lot of files, so perhaps a tough test case?

@matthijskooijman
Copy link
Collaborator Author

@PaulStoffregen, thanks for testing and providing a broken testcase, I'll dig into it (but probably not before this weekend or next week).

You link to an installer for Teensyduino, but is there also another installation method available? e.g. just a zip to unpack, or perhaps a boards manager url? I don't really like to run arbitrary binary installers without knowing what they'll do exactly (though it's not a blocking problem, just wondering). Is there anything in teensyduino that cannot be expressed in a board manager json file?

I just ran the installer and tested the example you mention, and I can reproduce the problem.

@matthijskooijman
Copy link
Collaborator Author

I found a bit of time in the train and couldn't resist looking into this already :-) Turns out this caching code exposes two existing bugs in the code, which were conveniently hiding each other:

  1. When compiling a library, its utility directory is added to the include path if it exists (only of the library itself, not of other libraries). However, when doing include detection for a library, the utility directory is not put on the include path. In case of the Teensy Audio library this makes (for example) "sqrt_integer.h" show up as a missing include file, but arduino-builder cannot find any library for it, so it is intended to bail out (but see next).
  2. When no library is found for an include file, arduino-builder is supposed to bail out. It does this by running the preprocessor again, but instead of parsing the error, the error is shown to the user. However, this preprocessor is run on the main sketch file, not on the file that was being processed. If that compiles succesfully, no error is returned so include processing continues with the next source file in the list.

The reason that the second bug doesn't normally cause any actual problems is that if an include is really missing, the error about it will show up later when actually compiling the problematic file.

The reason the first bug doesn't normally cause problems is that when such an include from the utility folder is not found, due to the second bug, this only aborts include detection for the current file, and continues with the rest of the files. When actually compiling the problematic file, the utility directory is added to the include path as expected, and the compile compiles succesfully.

The first bug could cause problems if a utility-path include is above an include for another library, and that other library is not included from other places (such as the sketch). This would cause inlude detection to miss that other library and the build will fail later on. I'll write a testcase to confirm this behaviour.

As for fixing these, I think the fixes are fairly simple:

  1. Add the utility folder to the include path for the current library. The SourceFile class I introduced give the include detection sufficient context to know a source file is from a library and to add the utility path to the include path.
  2. An obvious fix would be to run the preprocessor on the current file, but I think just printing the error output from the previous preprocessor run (which was saved in order to parse it for #include error messages) is even more efficient and true (less likely to suddenly not fail due to subtle differences in the compiler commandline).

Fixing this will probably take a while longer, my train ride is just over :-p

@PaulStoffregen
Copy link
Sponsor

Oh, I didn't realize the utility folder would matter.

I'm continuing to use this as I work on another project. This particular code has a lot of extra .cpp and .c and .h files in the sketch folder. It seems to be caching those very nicely. Saves a lot of time when working with more complex code using a lot of files! Very nice.

Looks like there may be a similar issue with my heavily modified copy of the SD library. Can't reproduce it with the built-in SD lib. If you really want another test case, I could trim this down to something I can share.

@matthijskooijman
Copy link
Collaborator Author

Looks like there may be a similar issue with my heavily modified copy of the SD library.

Does it have a utility folder? If so, it's likely the same issue.

@PaulStoffregen
Copy link
Sponsor

Both have utility folders, but one is src/utility, the other is just the utility without any src folder.

@matthijskooijman
Copy link
Collaborator Author

Hm, src/utility should not be affected, since that isn't added to the include path during compilation. If you could provide a testcase, that would be useful (I'm also seeing if I can fix the other issue now, so perhaps you could await that before spending time on it).

When an include could not be resolved, the preprocessor was ran again to
show the include error as reported by the compiler to the user. However,
the preprocessor was now always run against the main sketch, instead of
the file currently being processed. Unless the problem was in fact in
the main sketch, this caused no error to be shown and include detection
continued with the next file instead of being aborted as intended.

This fixes the problem by running the preprocessor on the right file. In
the future, this error handling might be a bit more improved, but for
now this should be sufficient.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
@PaulStoffregen
Copy link
Sponsor

Yup, the one with src/utility seems to always work.

Even with minor issues, excellent work on this caching. It's going to make a lot of Arduino users much happier.

This path is constructed and checked when loading the library instead of
in various places among the code, slightly simplifying the code.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
When compiling source files from a library, the utility folder of the
library itself is in the include path. When doing include detection,
this did not happen, which could lead to compilation errors. Because of
the broken (but recently fixed) errorhandling in include detection, this
error was not visible to the user. It would only have a visible effect
if a library included another library *after* including a file from its
utility folder through the include path, and that other library was not
otherwise pulled in.

In practice, this would almost never occur, but with the new caching the
cache would be invalidated and this problem became visible.

This is fixed by simply including the utility folder in the include path
during include detection. This required explicitly passing the include
path through the GCCPreprocRunnerForDiscoveringIncludes and
GCCPreprocRunner.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
@matthijskooijman
Copy link
Collaborator Author

I just pushed a fix for the previously discovered issues. With this, caching works as expected for the Audio library example you indicated. Let me know if you find other issues, thanks again for testing!

@ArduinoBot
Copy link
Contributor

✅ Build completed.

⬇️ Build URL: http://downloads.arduino.cc/PR/arduino-builder/arduino-builder-175.zip

ℹ️ To test this build:

  1. Replace arduino-builder binary (you can find it where you installed the IDE) with the provided one

@FrankBoesing
Copy link

Works great !
esp. for my very large projects (with large libs like mp3 and so on, for Teensy) , it is a valuable help.

@matthijskooijman
Copy link
Collaborator Author

I've found one bug/limitation with this implementation: When compilation does not complete successfully, a lot of includes are also re-detected even though nothing changed for them. This is because the caching relies on .d files generated by compilation, so if the compilation finds an error and aborts, the caching works only for file up to and excluding the file with the error.

The fix for this would be to let the include detection gcc command actually generate .d files as well, but that requires modifying the recipes, which might be somewhat tricky in terms of compatibility. I'm not sure if there is another easier fix for this...

@cmaglie
Copy link
Member

cmaglie commented Aug 26, 2016

I've found one bug/limitation with this implementation: When compilation does not complete successfully, a lot of includes are also re-detected even though nothing changed for them. This is because the caching relies on .d files generated by compilation, so if the compilation finds an error and aborts, the caching works only for file up to and excluding the file with the error.

In this case the cache is lost and the library detection is performed again from the beginning, but this is not worse compared to the current code when this happens every time, right?

If yes, I'm for merging anyway since it's a very good improvement, even with this limitation.

@matthijskooijman
Copy link
Collaborator Author

In this case the cache is lost and the library detection is performed again from the beginning, but this is not worse compared to the current code when this happens every time, right?

Correct (it is even still a little bit better, since the cache will still be used up to the file with the error).

@cmaglie cmaglie merged commit 6f5e242 into arduino:master Aug 26, 2016
@cmaglie cmaglie added this to the 1.3.21 milestone Aug 26, 2016
@matthijskooijman matthijskooijman deleted the include-cache branch January 31, 2017 22:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants