Commit
feat(objectionary#56):fix grammar and text
Yanich96 committed May 21, 2024
1 parent 5c065a5 commit 8f27368
Showing 1 changed file with 18 additions and 40 deletions.
58 changes: 18 additions & 40 deletions _posts/2024/2024-02-06-about-caching-in-eo.md
@@ -14,7 +14,7 @@ for EO version `0.34.0`. The bug occurred because the old verification method
used compilation time and caching time to search for a cached file.
This is not the most reliable verification method,
because caching time does not have to be equal to compilation time.
We came to the conclusion that we need caching with a reliable verification method.
We came to conclusion that we need caching with a reliable verification method.
Furthermore, this verification method should refrain from reading the file content.

The goal is to implement effective caching in EO.
@@ -38,8 +38,7 @@ Let's look at the assembly scheme using C++ as an example [Picture 1](/images/de
which consist of both source files `.cpp` and header files `.h`.
The result is a single file `.cpp` with human-readable code that the compiler will get.
2) The compiler receives the file `.cpp` from the preprocessor and compiles it into an object file - `.obj`.
At the compilation stage, parsing checks whether the code matches rules of a specific programming language.
At the end, the compiler optimizes the resulting machine code and produces an object file.
At the compilation stage, parser checks whether the code matches rules of a specific programming language.
To speed up compilation, different files of the same project might be compiled in parallel.
3) Then, the [Linker](https://en.wikipedia.org/wiki/Linker_(computing)) combines object files
into an executable `.exe` file.
@@ -51,7 +50,7 @@ and [sccache](https://github.com/mozilla/sccache) are used.
When compiling a file, its hash is calculated.
If the file is already present in the registry of compiled files, the file will not be compiled again.
Instead, the previously compiled binary file will be utilized.
This approach can significantly accelerate the build process of certain packages, reducing build times by 5-10 times.
This approach can significantly accelerate the build process of certain packages.
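The idea of a registry keyed by a hash can be illustrated with a minimal sketch. This is not how `ccache` itself is implemented: the class `CompileCache`, its `build` method, and the `flags` parameter are hypothetical names, and the key here covers only the source text and the flags, while a real tool hashes more information.

```python
import hashlib


class CompileCache:
    """In-memory registry that maps a hash of the inputs to a compiled result."""

    def __init__(self, compile_fn):
        self._compile = compile_fn  # stand-in for the real compiler call
        self._registry = {}
        self.misses = 0  # how many times the compiler actually ran

    def build(self, source, flags=""):
        # The key depends on everything that may affect the result.
        key = hashlib.sha256(f"{flags}\0{source}".encode("utf-8")).hexdigest()
        if key not in self._registry:
            self.misses += 1
            self._registry[key] = self._compile(source)
        return self._registry[key]
```

Compiling the same source with the same flags twice calls `compile_fn` only once; changing either the source or the flags produces a new key and a new compilation.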
The [`ccache` hash](https://ccache.dev/manual/4.8.2.html#_common_hashed_information) is
based on:
* the file contents
@@ -81,26 +80,12 @@ be incorporated during the development of the EO caching mechanism.
[Gradle](https://gradle.org) builds projects using a
[task graph](https://docs.gradle.org/current/userguide/build_lifecycle.html) that allows for synchronous execution
of certain tasks. A task represents a unit of work in a `Gradle` project.


`Gradle` employs
[Incremental build](https://docs.gradle.org/current/userguide/incremental_build.html#sec:how_does_it_work),
to speed up project builds.
For an incremental build to work, the tasks used to build the project must have specified
input and output files.
The provided code snippet demonstrates the implementation of a task in Gradle:
```
task myTask {
inputs.file 'src/main/java/MyTask.somebody' // Specify the input file
outputs.file 'build/classes/java/main/MyTask.somebody' // Specify the output file
doLast {
// Task actions go here
// This code will only be executed if the inputs or outputs have changed
}
}
```


To understand how `Incremental build` works, consider the following steps:
To enable an incremental build, the project tasks must specify their input and output files.
`Incremental build` uses a hash to detect changes in the inputs and the outputs.
The single hash contains the paths and the contents of all the input files or output files.
1) Before executing a task, `Gradle` takes a hash of the input files and saves it.
@@ -111,13 +96,13 @@ The single hash contains the paths and the contents of all the input files or ou
In the opposite case, the task performs an action again and rewrites outputs.
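
The up-to-date check described above can be sketched in a few lines. This is a simplified illustration under our own assumptions, not `Gradle` code: the functions `fingerprint` and `run_incrementally` and the JSON state file are hypothetical, and a single SHA-256 digest stands in for whatever `Gradle` uses internally.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(files):
    """Return one SHA-256 digest covering the paths and contents of the files."""
    digest = hashlib.sha256()
    for path in sorted(Path(f) for f in files):
        digest.update(str(path).encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def run_incrementally(inputs, outputs, state_file, action):
    """Run the action only if inputs or outputs differ from the saved state.

    Returns True if the action ran and False if the task was up to date.
    """
    state_file = Path(state_file)
    saved = json.loads(state_file.read_text()) if state_file.exists() else {}
    outputs_exist = all(Path(o).exists() for o in outputs)
    if (
        outputs_exist
        and saved.get("inputs") == fingerprint(inputs)
        and saved.get("outputs") == fingerprint(outputs)
    ):
        return False  # nothing changed, keep the existing outputs
    action()  # the action is expected to (re)write the outputs
    state_file.write_text(
        json.dumps({"inputs": fingerprint(inputs), "outputs": fingerprint(outputs)})
    )
    return True
```

Note that this check reads every input file on every build, which is exactly the cost we would like to avoid in the EO caching mechanism.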


In addition to `Incremental build`, `Gradle` also stores hash of previous each build, enabling quick project builds,
In addition to `Incremental build`, `Gradle` also stores hash of each previous build, enabling quick project builds,
for example when switching from one git branch to another. This feature is known as
the [Build Cache](https://docs.gradle.org/current/userguide/build_cache.html).


`Gradle Incremental build` can manage separate compilation tasks based on inputs and outputs.
And the EO compiler consists from a unit of work in `Maven` (the last section contains a detailed description).
And the EO compiler consists of a unit of work in `Maven` (the last section contains a detailed description).
Steps of the EO compiler can have input and output files.
Building upon the concept of `Gradle Incremental Build`, we can use its principles to develop the EO caching mechanism.

@@ -126,9 +111,7 @@ Building upon the concept of `Gradle Incremental Build`, we can use its principl
[Maven](https://maven.apache.org) automates and manages Java-project builds.
`Maven` is based on the concept of
[Maven LifeCycles](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html),
which include default, clean, and site lifecycles.
Each lifecycle consists of `phases` and these `phases` consist of sets of `goals`.
One `phase` can consist of several `goals`.
which includes default, clean, and site lifecycles.

In Maven, there are default phases and goals for building any project:

@@ -143,10 +126,6 @@ There are `goals` tied to the Maven lifecycle, as shown in [Picture 2](/images/d
It's also possible to add a new `goal` to a desired phase by modifying the `pom.xml` file.
Maven also supports `goals` that are not bound to any build phase
and can be executed outside the build lifecycle, directly through the command line.
The sequence of achieving `goals` is as follows:
1) The `goals` tied to the Maven lifecycle are executed first.
2) The `goals` added to the `pom.xml` file are executed second.
3) The `goals` that are not tied to `phases` can be executed last.


`Maven` can utilize caching mechanisms through the `takari-lifecycle-plugin` and `maven-build-cache-extension`:
@@ -156,9 +135,9 @@ The sequence of achieving `goals` is as follows:
functionality to plugins for the standard lifecycle, but with significantly fewer dependencies. This plugin leverages
[The Takari Incremental API](https://github.com/takari/io.takari.incrementalbuild),
which introduces the concept of `builders`. These `builders` are user-provided public non-abstract
top-level classes that implement specific build actions, denoted as methods annotated `@Builder`.
top-level classes that implement specific build actions.
They can produce various types of outputs, including generated/output files on the filesystem,
build messages, and project model mutations. For each `@Builder` annotated method, a maven mojo,
build messages, and project model mutations. For each `builder` annotated method, a maven mojo,
which represents a maven `goal`, is generated.
When a `builder` is run for a given set of inputs, it produces the same outputs and saves them to the specified directory.
Any changes in the inputs result in the removal of outputs.
@@ -183,7 +162,6 @@ This API can be applied to cache EO compilation stages as it operates with `goal
It does not use hashing algorithms, which can slow down project build times,
and it does not have separate cache directories.
Each `builder` has its own directories for input and output data related to its work.
The operational principle of the Takari Incremental API is similar to the operation of caching in EO.

### EO build cache

@@ -209,9 +187,9 @@ However, the actual work with EO code takes place in `AssembleMojo`.
</p>

Each goal within `AssembleMojo` is a distinct compilation step for EO code.
These tasks happen one after the other, and each task relies on the output of the one before it.
Each task has directories for input and output data, as well as a directory for storing cached data.
Using the program name, each task can receive and store data.
These goals happen one after the other. Each goal has directories for input and output data,
as well as a directory for storing cached data.
Using the program name, each goal can receive and store data.


The previous caching mechanism in EO made use of distinct interfaces, specifically `Footprint` and `Optimization`.
@@ -243,10 +221,10 @@ The logic for checking the relevance of cached data is presented below:
1) We create an EO program, named "example".
Intermediate files during compilation of this program will have the same name, but a different format
(e.g. `example.eo`, `example.xml`).
2) When the EO compiler compiles this program task, it saves intermediate files of compilation steps into cache.
Each compilation step has own caching directory.
2) When the EO compiler compiles this program task, it saves files of compilation steps into cache.
Each compilation step has its own caching directory.
3) When the EO compiler starts a project build again, it will check if there is a file, named "example",
in the cache of each step. If such a file exists,
in the cache of step. If such a file exists,
then it is enough to check that the last modification time of this file at the current step
is later than at the previous step. If this condition is true,
then the finished file can be retrieved from the cache.
@@ -258,7 +236,7 @@ The logic for checking the relevance of cached data is presented below:

4) If the EO program file [Picture 5](/images/RewritingInCacheEO1.svg)
or an intermediate file [Picture 6](/images/RewritingInCacheEO2.svg) has changed,
then the previously cached files becomes invalid.
then the previously cached files become invalid.
In this case, the compilation step performs an action again and rewrites outputs.
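
The check from the steps above can be sketched as follows. This is only an illustration of the idea, not the code of the EO compiler: the functions `cached_is_fresh` and `run_step` and their parameters are hypothetical names, and `transform` stands in for the real work of a compilation step.

```python
import shutil
from pathlib import Path


def cached_is_fresh(source, cached):
    """True when the cached file exists and was modified later than the source."""
    source, cached = Path(source), Path(cached)
    return cached.exists() and cached.stat().st_mtime_ns > source.stat().st_mtime_ns


def run_step(source, target, cached, transform):
    """Produce target from source, reusing the cached file while it is fresh.

    Returns "cache" if the cached file was reused and "rebuilt" otherwise.
    """
    source, target, cached = Path(source), Path(target), Path(cached)
    target.parent.mkdir(parents=True, exist_ok=True)
    if cached_is_fresh(source, cached):
        shutil.copyfile(cached, target)  # no file content is read for the check
        return "cache"
    target.write_text(transform(source.read_text()))
    cached.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(target, cached)  # the copy gets a fresh modification time
    return "rebuilt"
```

The decision relies only on file names and modification times, so the content of the source is read only when the step really has to run again.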

<p align="center">
