diff --git a/_posts/2024/2024-02-06-about-caching-in-eo.md b/_posts/2024/2024-02-06-about-caching-in-eo.md
index d5fa140..1d19460 100644
--- a/_posts/2024/2024-02-06-about-caching-in-eo.md
+++ b/_posts/2024/2024-02-06-about-caching-in-eo.md
@@ -6,21 +6,21 @@ author: Alekseeva Yana
---
-## Introduction
-Wasting a lot of time on building a project is a programming problem. At the moment a programmer starts an
-assembly, he loses focus on a task and spends valuable working time. Different build systems use many tools,
-helping to assemble a project faster, namely caching, task parallelization, distributed building and much more.
-The subject of this article is caching, because completed tasks caching allows not to spend resources again.
-So in [EO](https://github.com/objectionary/eo) caching is used for speeding up programs work.
-While developing [EO](https://github.com/objectionary/eo) we found caching errors in `eo-maven-plugin`
-for EO version `0.34.0`. The error occurred, because using a file name and comparing equality of
-compilation time and caching time is not the most reliable verification. Unit tests were written showing that
-cache does not work correctly. Also reading a file was necessary for getting a programme name
-that slowed down an assembly.
-That we came to conclusion that we need caching with a reliable verification which does not require reading a file
-from disk. And using cache should save us enough time for building a project.
-
-The goal of this article is to research caching in frequently used build systems (`ccache`, `Maven`, `Gradle`)
+## Introduction
+In [EO](https://github.com/objectionary/eo), caching is used to speed up program execution.
+While developing [EO](https://github.com/objectionary/eo) we found a caching
+[error](https://github.com/objectionary/eo/issues/2790) in `eo-maven-plugin`
+for EO version `0.34.0`. The error occurred because the cache was searched for the needed file using
+a comparison of compilation time and caching time.
+This is not the most reliable verification method,
+because caching time does not have to be equal to compilation time.
+[Unit tests](https://github.com/objectionary/eo/pull/2749) were written to show that the
+cache does not work correctly. Additionally, reading a file was necessary to obtain a program name,
+which slowed down the build process.
+Thus, we came to the conclusion that we need caching with a reliable verification method
+that does not require reading from the file system. Using a cache should save us enough time for building a project.
+
+The goal of this blog post is to research caching in frequently used build systems (`ccache`, `Maven`, `Gradle`)
and to create effective caching in [EO](https://github.com/objectionary/eo).
@@ -29,17 +29,18 @@ and to create effective caching in [EO](https://github.com/objectionary/eo).
### ccache/sccache
In compiled programming languages, building a project takes a long time.
-The reason of long compilation is time is spent on preparing, optimizing and checking the code, and so on.
+The reason for the lengthy compilation time is that time is spent on preparing,
+optimizing, checking the code, and so on.
To speed up the assembly of compiled languages, ccache and sccache are used.
-Let's look at the compilation scheme using C++ as an example,
+Let's look at the compilation scheme using C++ as an example
to imagine the build process in compiled languages:
@@ -92,15 +94,13 @@ But in `Maven` there is no build-time caching as such.
`Maven` suggests rebuilding only changed project modules to speed up the build process.
### Gradle
-`Gradle`, like `Maven`, builds a project in
-[LifeCycles Gradle](https://docs.gradle.org/current/userguide/build_lifecycle.html), which consists of phases.
But unlike `Maven`, `Gradle` builds projects using a task graph -
[Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph),
in which some tasks can be executed synchronously.
-To speed up project builds, `Gradle` uses incremental builds
+To speed up project builds, `Gradle` employs incremental builds
[Incremental build](https://docs.gradle.org/current/userguide/incremental_build.html#sec:how_does_it_work).
-For an incremental build to work, the tasks that are used to build the project must have
-source and output files must be specified.
+For an incremental build to work, the tasks used to build the project must have specified
+source and output files.
```
task myTask {
inputs.dir 'src/main/java/MyTask.somebody' // Specify the input directory
@@ -112,14 +112,13 @@ task myTask {
}
}
```
-Every time before executing a task, `Gradle` makes a fingerprint of the path
+Before executing a task, `Gradle` makes a fingerprint of the path
and contents of the source files and saves it.
-If the task completes successfully, then `Gradle` also makes a fingerprint from the resulting files.
+If the task completes successfully, `Gradle` also makes a fingerprint from the resulting files.
To avoid re-fingerprinting the original files, `Gradle` checks the last modification time and the size of the original
-files before reassembling. Thus, when the project is rebuilt, some or all of the tasks may be
-not completed, but to use the results already obtained.
-`Gradle` also stores fingerprints of previous builds so that projects can be built quickly, for example when switching
-from one branch to another - `Build Cache`.
+files before reassembling. This allows `Gradle` to use the results already obtained when the project is rebuilt.
+Additionally, `Gradle` stores fingerprints of previous builds, enabling quick project builds,
+for example when switching from one branch to another. This mechanism is known as the `Build Cache`.
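
The fingerprinting idea described in this hunk can be illustrated with a minimal Java sketch. It is not `Gradle`'s actual implementation: the class `FingerprintSketch`, the nested `Snapshot` type, and the choice of SHA-256 are all assumptions made for illustration. The sketch hashes the path and contents of a file, and consults the cheap metadata (size and last modification time) first so that an unchanged file does not have to be re-hashed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; names and hashing choice are assumptions.
final class FingerprintSketch {

    /** What is remembered about an input file after a successful task run. */
    static final class Snapshot {
        final long size;
        final long modified;
        final String hash;

        Snapshot(long size, long modified, String hash) {
            this.size = size;
            this.modified = modified;
            this.hash = hash;
        }
    }

    /** Hashes the absolute path together with the file contents. */
    static String hash(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        digest.update(file.toAbsolutePath().toString().getBytes(StandardCharsets.UTF_8));
        digest.update(Files.readAllBytes(file));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Records size, modification time and content hash of the file. */
    static Snapshot snapshot(Path file) throws IOException, NoSuchAlgorithmException {
        return new Snapshot(
            Files.size(file),
            Files.getLastModifiedTime(file).toMillis(),
            hash(file)
        );
    }

    /** Returns true if the task that consumed this file could be skipped. */
    static boolean upToDate(Path file, Snapshot previous)
        throws IOException, NoSuchAlgorithmException {
        boolean sameMetadata = Files.size(file) == previous.size
            && Files.getLastModifiedTime(file).toMillis() == previous.modified;
        if (sameMetadata) {
            // Cheap check passed: skip re-hashing the contents.
            return true;
        }
        // Metadata changed: fall back to comparing the content hash.
        return hash(file).equals(previous.hash);
    }
}
```

The metadata shortcut trades a small risk (a change that preserves both size and timestamp goes unnoticed) for avoiding a full read of every input on every build.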
@@ -127,10 +126,10 @@ from one branch to another - `Build Cache`.
### EO build cache
EO code is compiled using the `Maven` build system.
-For this purpose, the `eo-maven-plugin` plugin was written,
-which contains the goals necessary for working with EO code.
-As was written above, the assembly of projects in `Maven` occurs in a certain order of phases.
-In the diagram you can see the main phases and their goals for the EO version of the compiler (specify version):
+For this purpose, the `eo-maven-plugin` plugin was created,
+which contains the necessary goals for working with EO code.
+As mentioned earlier, the assembly of projects in `Maven` occurs in a specific order of phases.
+In the diagram you can observe the main phases and their goals for the latest version of the EO compiler:
@@ -140,8 +139,8 @@ In [Picture 3](/images/EO.svg) the goals from the `eo-maven-plugin`
are highlighted in green.
-But the actual work with EO code takes place in `AssembleMojo`.
-`AssembleMojo` is the goal consisting of other goals that work with the EO file
+However, the actual work with EO code takes place in `AssembleMojo`.
+`AssembleMojo` is the goal consisting of other goals that work with the EO file, as shown in
[Picture 4](/images/AssembleMojo.svg).
@@ -153,30 +152,27 @@ Each goal in `AssembleMojo` is a specific compilation step for EO code, and we n
caching at each step to speed up the assembly of the EO program.
In EO version `0.34.0`,
-caching for different `Mojo` was done using unrelated different `Footprint` and `Optimization` interfaces,
+caching for different `Mojo` was done using unrelated `Footprint` and `Optimization` interfaces,
within which mostly the same methods were used.
The difference between interfaces is that in `Footprint` the EO version of the compiler is checked,
while the rest of the checks are exactly the same.
-Now goals are `ParseMojo`, `OptimazeMojo` и `ShakeMojo` , in which caching can be applied,
+Now the goals `ParseMojo`, `OptimizeMojo` and `ShakeMojo`, in which caching can be applied,
have directory of results and directory of cache.
-The disadvantages of initial caching in EO:
-* the compilation time and the time of saving to the cache must be equal.
-The problem with this verification is that the moment of compilation and the moment of saving to the cache must coincide.
-* verification data is read from a file on disk. This is a long and expensive operation.
-* each purpose uses its own classes and interfaces for data caching.
-This makes the code difficult to extensibility and readability.
+The disadvantages of initial caching in EO include:
+* The compilation time and the time of saving to the cache must be equal, which is unreliable because these two moments do not have to coincide.
+* Verification data is read from a file on disk, which is a long and expensive operation.
+* Each goal uses its own classes and interfaces for data caching, making the code difficult to extend and read.
-Therefore, our target is to create a single class responsible for caching data
-and loading the necessary data from the cache, which can be used for any `Mojo` from the `eo-maven-plugin`.
+To address these disadvantages, the following solutions are proposed:
-How do we want to fix this disadvantages:
-1) Create a new class `Cache` that will be responsible for data verification, saving to cache and loading from cache.
+
+1) Create a new class `Cache` responsible for data verification, saving to cache and loading from cache.
```
public class Cache {