[SPIKE] [WIP] Refactor Elasticsearch Exporter tests #9258

Closed · wants to merge 16 commits

Conversation

npepinpe (Member) commented Apr 29, 2022

Description

This spike explores an idea to refactor the Elasticsearch exporter tests. The main goals here were:

  • Cover as much as possible via unit tests alone. This gives us a faster feedback cycle when changing something: developers can focus first on fixing unit tests, which have a smaller blast radius and faster execution, before even looking at integration tests.
  • Narrow the scope of the integration tests as much as possible. This meant getting rid of the broker, workflow engine semantics, etc., and just testing the integration of the exporter instance with Elasticsearch.

In general, I wanted to keep the same coverage as before. I think the coverage is now higher, but that was not the original motivation.

To do this, I introduced controllable test implementations of the exporter API under a new module, exporter-tests. You can see in the first commit that I had some kind of test harness to simulate the exporter director, but I decided to drop it in the end; if we need it, we can introduce it later, and it doesn't seem all that necessary.

I added these as a new module, and not in the test-util module, because I would like to move away from catch-all modules like util and test-util. They not only complicate our dependency tree by being ubiquitous, but they're also shallow modules with no clear boundaries, high coupling to other modules, and low cohesion within themselves. However, this isn't a strict requirement from my POV, so I'm happy to have this challenged.

Note that in there I also started making heavier use of SpotBugs annotations. This is something I would like to try, but I'm still not convinced of the value, so I'm fine with dropping it (though I'd like to first see if we can get some value out of it).

I also made some changes to the ProtocolFactory, which generates random records. The main changes: always produce positive longs, and make every build reproducible. The first is because our records have many long-typed fields that are supposed to have timestamp semantics, where negative values are invalid. It would have been better if the types themselves carried those semantics (e.g. an Instant instead of a long), but as it is, I think it's an acceptable workaround. This came about because Elastic expects these fields to have those semantics and rejects records where a timestamp would be negative.

The second change also came from practical use, though I should've seen it before: it's much more useful if every run of a test produces the same record. We're not doing property-based tests here; we just want a fully filled-out record and a test we can easily re-run. So we always construct the factory with a fixed seed. If the factory is used in property-based tests, a different seed can be provided per run.
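
For illustration, here's how a test can rely on that reproducibility. This is a minimal sketch: the seed constructor and generateRecord method are taken from the description above, but the package names and exact signatures are approximations, not verified against the code.

```java
import io.camunda.zeebe.protocol.record.Record;
import io.camunda.zeebe.protocol.record.RecordValue;
import io.camunda.zeebe.protocol.record.ValueType;
import org.junit.jupiter.api.Test;

final class ProtocolFactoryExampleTest {
  @Test
  void shouldGenerateReproduciblePositiveRecord() {
    // fixed seed => every run generates the exact same record, so a failing
    // test is trivially reproducible
    final ProtocolFactory factory = new ProtocolFactory(1L);
    final Record<RecordValue> record = factory.generateRecord(ValueType.JOB);

    // long fields with timestamp semantics are always positive, so Elastic
    // will not reject the document when parsing dates
    assert record.getTimestamp() > 0;
  }
}
```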

The rest of the changes are then only in the Elasticsearch exporter module. There were some production changes, notably splitting the ElasticsearchClient class and renaming it to ElasticClient. We could revert the renaming, but since we now use the official ElasticsearchClient provided by Elastic for test verifications, there were some ugly name collisions. As I said, though, we can revert it; it's not that big a deal. The class was split into the following:

  • IndexRouter: a poorly named class whose responsibility is to compute a record's ID and its index. It can also compute just a part of these, e.g. only the record's index prefix. The goal is that it encapsulates the part of the exporter's API that defines index names and record IDs, both of which must be stable for consumers (see the sketch after this list).
  • TemplateReader: reads templates from the resources, optionally substituting some of their properties with configuration properties, e.g. the index prefix, the number of shards, etc.
  • RestClientFactory: does what the name says: it takes the ElasticsearchExporterConfiguration and produces a RestClient. This was also helpful for reusing the same factory in tests to create an official ElasticsearchClient on top of the same RestClient.
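
To make the IndexRouter responsibility concrete, here is a rough sketch of what its surface could look like. The method names and the exact ID/index formats are hypothetical; only the responsibility itself (stable IDs and index names) is taken from the description above.

```java
import io.camunda.zeebe.protocol.record.Record;
import java.time.LocalDate;

// Hypothetical sketch of the IndexRouter responsibility; real formats may differ.
final class IndexRouter {
  private final String prefix; // the configured index prefix

  IndexRouter(final String prefix) {
    this.prefix = prefix;
  }

  /** Record IDs must be stable for consumers: partition + position identify a record. */
  String idFor(final Record<?> record) {
    return record.getPartitionId() + "-" + record.getPosition();
  }

  /** Index names must be stable as well; assumed here to be prefix, value type, and date. */
  String indexFor(final Record<?> record, final LocalDate date) {
    return indexPrefixFor(record) + "_" + date;
  }

  /** Computes only the prefix part, as mentioned above. */
  String indexPrefixFor(final Record<?> record) {
    return prefix + "-" + record.getValueType().name().toLowerCase();
  }
}
```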

Splitting allows us to add more unit tests for each of these pieces, which was more complicated to do and verify with one big ElasticClient class. The ElasticClient class is now only responsible for buffering records in memory, flushing them as a bulk request, and sending the PUT /_index_template and PUT /_component_template/ requests, using the classes above. So essentially it mostly wires things together and communicates with Elastic. To be honest, it could be reduced further to only the Elastic communication, with the wiring moved to the exporter; that will be easier to do now.

Two new DTOs were added:

  1. BulkRequestAction: a bulk request in Elastic is a list of newline-delimited pairs of commands. Since we only index documents, each pair consists of an IndexAction and the document to index. The action specifies the ID, index name, and routing of the actual document. Using an explicit domain object instead of just a Map simplifies maintenance, readability, and testability (it's much easier to assert against a known structure); see the sketch after this list.
  2. Template: this represents a deserialized template. You can use the DTO when reading a component or index template from the resources via TemplateReader, or even when parsing the response of a GetIndexTemplate request. This again simplifies writing assertions in tests.
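
For illustration, here's a possible shape for the action DTO and how it maps onto Elastic's bulk format. The field names are assumptions; only the newline-delimited wire format itself is Elastic's documented bulk API.

```java
// Hypothetical shapes; the real DTOs may differ. They mirror Elastic's bulk
// "index" action: {"index": {"_index": "...", "_id": "...", "routing": "..."}}
record IndexAction(String index, String id, String routing) {}

record BulkRequestAction(IndexAction index) {}

// Each buffered record then becomes one action line plus one document line in
// the newline-delimited body that POST /_bulk expects, e.g.:
//   {"index":{"_index":"zeebe-record_job_2022-04-29","_id":"1-42","routing":"1"}}
//   {"partitionId":1,"position":42,...}
```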

Finally, getting to the tests: there is no class hierarchy anymore, and most of the utilities were removed. What we have now are two utilities:

  • TestClient: a simple wrapper around both the official high-level and low-level REST clients from Elastic. I initially used the clients directly, but this had two downsides: the ElasticsearchClient is not closeable (yet it does own resources, e.g. threads, sockets, etc.), and it has some quirks which are not ideal if you're not familiar with Elastic. At times I found using the low-level client simpler (for example, for GetIndexTemplate it is much easier to work with than the DTOs returned by the high-level client).
  • TestSupport: the configuration we use, especially the IndexConfiguration, is nice for user input but terrible for programmatic access. Since we have one fixed field per value type, writing parameterized tests is a pain. We could use reflection, but that would be somewhat brittle. So TestSupport has some methods to deal with these things, e.g. disabling indexing for value type X, or enabling indexing for record type Y. Similarly, it also provides a convenience method to create a default Elastic container with some memory constraints and security settings (see the sketch below).
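
The container convenience method might look roughly like this. This is a sketch using Testcontainers; the image version and the exact memory/security settings are illustrative assumptions, not the actual values.

```java
import org.testcontainers.elasticsearch.ElasticsearchContainer;
import org.testcontainers.utility.DockerImageName;

final class Containers {
  // A "default" Elastic container with bounded memory and security disabled
  // for tests; version and settings here are illustrative only.
  static ElasticsearchContainer createDefaultContainer() {
    return new ElasticsearchContainer(
            DockerImageName.parse("docker.elastic.co/elasticsearch/elasticsearch:7.17.0"))
        .withEnv("ES_JAVA_OPTS", "-Xms512m -Xmx512m") // cap the JVM heap
        .withEnv("xpack.security.enabled", "false"); // no auth needed in tests
  }
}
```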

After that, we have mostly unit tests (everything ending in Test) and integration tests (everything ending in IT). The unit tests are, well, unit tests; not much to say there. I did extend the coverage a little bit, but I think it's not fully there yet. I mostly focused on at least keeping the same coverage as before.

There are three integration tests. Generally, I would propose always keeping a static Elastic container that is reused by all tests, and configuring a unique prefix for each test so the tests are isolated (see the sketch after the list below).

  • ElasticClientIT: tests the integration between the client and an Elastic instance. I think this is useful, as it has an even narrower scope than ElasticsearchExporterIT: you test just the client and the database.
  • ElasticsearchExporterIT: tests the integration between the exporter and Elastic. It doesn't test every configuration option, etc., but just focuses on making sure that whatever is exported can be read back as expected, or that if we create the component template, it exists (but it won't check its contents; that's covered by the client tests, for example).
  • FaultTolerantIT: this is a standalone test class because it needs to start Elastic later in the test, so it cannot be packaged with the others. I can imagine we could add more tests here, but I just refactored the existing one. I don't really see how we could fold it into the other IT class, unless we enforce some ordering, which I'd rather avoid.
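
To make the prefix-based isolation concrete, here is a minimal sketch. The configuration field names are assumptions based on the exporter's configuration class, not verified against it.

```java
import java.util.UUID;

final class ExporterITSetup {
  // Sketch: one static container shared by all tests, one unique index prefix
  // per test, so tests never see each other's indices or templates.
  static ElasticsearchExporterConfiguration isolatedConfig(final String elasticUrl) {
    final var config = new ElasticsearchExporterConfiguration();
    config.url = elasticUrl; // e.g. CONTAINER.getHttpHostAddress() from the shared container
    config.index.prefix = "zeebe-test-" + UUID.randomUUID();
    return config;
  }
}
```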

Related issues

related to #

Definition of Done

Not all items need to be done depending on the issue and the pull request.

Code changes:

  • The changes are backwards compatible with previous versions
  • If it fixes a bug, then PRs are created to backport the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. backport stable/1.3) to the PR; in case that fails, you need to create the backports manually.

Testing:

  • There are unit/integration tests that verify all acceptance criteria of the issue
  • New tests are written to ensure backwards compatibility with future versions
  • The behavior is tested manually
  • The change has been verified by a QA run
  • The impact of the changes is verified by a benchmark

Documentation:

  • The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
  • New content is added to the release announcement
  • If the PR changes how BPMN processes are validated (e.g. supports a new BPMN element), then the Camunda modeling team should be informed to adjust the BPMN linting.

Please refer to our review guidelines.

npepinpe (Member Author) commented Apr 29, 2022

I'm confused as to why the code quality check fails. I did have to change the Checkstyle configuration, but locally it passes fine. Is the CI check not running with the latest configuration? Is it maybe pulling the configuration from the snapshots repository again instead of reading it from the local one?

Yup, that's the one: we only validate but don't install, so the changes to build-tools aren't present in later modules. Unfortunately, if we switch to install, we will also run the checks bound to verify, which we don't want to do here. Any ideas @oleschoenburg or @remcowesterhoud?

npepinpe (Member Author) commented Apr 29, 2022

One nice outcome: the exporter tests used to take 8 minutes to run, and now take a little less than 3 minutes (half of which is just building the module) :)

And I think I actually increased the test coverage, so...

npepinpe (Member Author) commented

Another GHA issue: it seems the unit tests don't find the new module I added. Why? 🤔

lenaschoenburg (Member) commented

> it seems the unit tests don't find the new module I added

Can you elaborate on which module you mean? I'm seeing that the new zeebe-exporter-test module is found: https://github.com/camunda/zeebe/runs/6232882989?check_suite_focus=true

lenaschoenburg (Member) commented

Oh dear, it looks like the Maven command to list all projects failed with an error, and then the error message was used as input to the build matrix. For example: https://github.com/camunda/zeebe/runs/6232883105?check_suite_focus=true

That's not great! We should definitely fail the project list step if maven encounters errors!

npepinpe (Member Author) commented May 2, 2022

Ah, so it found it, but then failed on some other project because the dependency was not installed? 🤔

lenaschoenburg (Member) commented

The only failed jobs are: the Java code formatting job, which is caused by what you described here: #9258 (comment); a couple of jobs that were spawned with invalid input due to this: #9258 (comment);

and two legitimate failures, here: https://github.com/camunda/zeebe/runs/6232882455?check_suite_focus=true and here: https://github.com/camunda/zeebe/runs/6232874252?check_suite_focus=true

I'll fix the issue with the invalid project list. For the Java code formatting job, I think installing the build-tools module first would work and makes sense as a solution 👍 Do you want to do this in this PR?

npepinpe (Member Author) commented May 2, 2022

Let's do it here, and if it works I'll cherry-pick the commit into a separate PR.

ghost pushed a commit that referenced this pull request May 2, 2022
9262: Install build-tools before validating r=npepinpe a=npepinpe

## Description

This PR ensures we install the build-tools module before validating the code base. This is important since the build-tools module contains configuration files for various tools used to verify our code base. Without installing it, these changes are not propagated to the downstream modules.

## Related issues

related to #9258



Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
npepinpe (Member Author) commented May 2, 2022

I can't think of a nice solution for the matrix project other than installing everything, but that's less than optimal 😅

Since this is, what, the third or fourth time we've had issues with things not being installed, what if we did the following:

  1. The first job is build. It builds a clean repo into a local .m2 repository, such that everything we need is contained in the same workspace. Kind of like with Jenkins, though we don't need to worry too much about going offline or the like.
  2. All other jobs depend on build and download the folder, which contains its own .m2 repository.

This would help ensure dependencies from the same monorepo are always up to date, and it might cut down a bit on the time the other jobs spend building again. WDYT?

lenaschoenburg (Member) commented

Yeah, I've been playing around with a few variants and came to the same conclusion. I think we can use actions/cache here.

npepinpe (Member Author) commented May 2, 2022

I tried with upload/download artifact, but oh boy, it takes ~6 minutes to upload the artifact 😅

lenaschoenburg (Member) commented

Ouch :D Maybe we can restrict it to the io/camunda/ path? And maybe actions/cache is a bit faster?

npepinpe (Member Author) commented May 2, 2022

Oof, it's actually still running, so I guess uploading it is a no-go (https://github.com/camunda/zeebe/runs/6257745562?check_suite_focus=true)

I don't know. Maybe it's also because I'm using a self-hosted runner? Would it be faster to upload (though slower to build) on a GitHub-hosted runner? 🤔

lenaschoenburg (Member) commented

Alternatively, we could just go back to finding maven modules by searching for pom.xml files. It'd be a bit faster and wouldn't require a mvn install. WDYT?

npepinpe (Member Author) commented May 2, 2022

How was that done: parsing the root POM file for the directory names, or just grabbing all pom.xml files? It does seem a lot more brittle, unfortunately 😞

lenaschoenburg (Member) commented

It was grabbing all pom.xml files. I agree that asking Maven for a list would be better, but I don't really see how collecting a list of pom.xml files would be more brittle. Every module must always have a pom.xml file, and finding them shouldn't ever fail, right?

npepinpe (Member Author) commented May 2, 2022

We might have POM files which are not part of the root/main module and so shouldn't be picked up. That's the only thing I can imagine 🤷‍♂️

npepinpe (Member Author) commented May 2, 2022

If you tar the archive first, it's considerably faster: it takes < 2 minutes when the whole thing is tar'd, instead of who knows how long (after 10 minutes it was only at 35% 😄).

npepinpe (Member Author) commented May 2, 2022

OK, so I fixed it for the unit tests, but I guess I failed to grab the correct target/ sub-folders, since the distribution wasn't there, which caused all stages that build Docker images to fail. I wonder how I could figure out what was actually tar'd 🤔

npepinpe (Member Author) commented

Closing as this has been split and incorporated into other PRs

@npepinpe npepinpe closed this May 14, 2022
@npepinpe npepinpe deleted the np-es-spike branch July 25, 2022 10:06
@Zelldon Zelldon added the version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0 label Oct 4, 2022
Labels: version:8.1.0-alpha2, version:8.1.0