Adapt testcontainers to be testing framework agnostic #82

Merged
merged 10 commits into from
Nov 24, 2023

Contributor

@pilosus pilosus commented Nov 21, 2023

About

Testing: Adapt "Testcontainers" implementation to unittest

References

Checklist

  • CLA is signed

codecov bot commented Nov 22, 2023

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (037b8bb) 84.81% compared to head (7ae9a4a) 84.93%.

| Files | Patch % | Lines |
|---|---|---|
| cratedb_toolkit/testing/testcontainers/cratedb.py | 91.83% | 4 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #82      +/-   ##
==========================================
+ Coverage   84.81%   84.93%   +0.11%     
==========================================
  Files          48       48              
  Lines        1765     1805      +40     
==========================================
+ Hits         1497     1533      +36     
- Misses        268      272       +4     
| Flag | Coverage Δ |
|---|---|
| influxdb | 42.65% <91.83%> (+1.07%) ⬆️ |
| main | 66.98% <91.83%> (+0.52%) ⬆️ |
| mongodb | 57.28% <91.83%> (+0.74%) ⬆️ |

Member

@amotl amotl left a comment

Dear Vitaly,

thank you very much for your contribution, that looks excellent. Please give us some time to thoroughly review all the details.

On CI, there may be hiccups regarding container boot timing / availability.

E       docker.errors.APIError: 500 Server Error for http+docker://localhost/v1.43/containers/a519443561406c3f60e6d0b429cfd071736191f0149c9f32cc9deb9262818adc/start: Internal Server Error ("driver failed programming external connectivity on endpoint testcontainers-cratedb (68f8422f339f8d6b40efd574ecb54861825d9ccdcbba069cfc96e4a48032833a): Bind for 0.0.0.0:4200 failed: port is already allocated")

-- https://github.com/crate-workbench/cratedb-toolkit/actions/runs/6950127360/job/18922807956#step:6:4850

With kind regards,
Andreas.

Comment on lines 163 to 167
class CrateDBFixture:
    """
    A little helper wrapping Testcontainer's `CrateDBContainer` and
    CrateDB Toolkit's `DatabaseAdapter`, agnostic of the test framework.
    """
Member

I appreciate very much bringing this into the generic cratedb_toolkit.testing module namespace, to make it a re-usable component for other packages.

Member

@amotl amotl Nov 23, 2023

Talking about "naming things": If you have a better idea for the name, in order to better disambiguate from pytest's notion of "fixtures", because it is actually a testframework-agnostic adapter/wrapper around, well, DatabaseAdapter [1] and CrateDBContainer, please let me know [2].

Footnotes

  1. Also eventually to be renamed to CrateDBClientAdapter or something different.

  2. As with my other suggestions, the change itself can easily be done on a subsequent iteration. I am just taking the chance to talk with someone about it ;].

Member

Would CrateDBTestAdapter be a more sensible name?

Contributor Author

Given that we have DatabaseAdapter, I like the CrateDBTestAdapter name!

Member

Wonderful. Let us rename it on behalf of a subsequent patch.

Member

I see you already renamed it properly, thanks.

I proposed to change it later, because I feared this would add too much noise to this patch because the symbol would need to be changed at too many places. But, of course, it turns out that it already has been nicely decoupled from the test cases on behalf of the cratedb and cratedb_service pytest fixtures, so my fears were unfounded.

Comment on lines 140 to 160
class TestDrive:
    """
    Use different schemas for storing the subsystem database tables, and the
    test/example data, so that they do not accidentally touch the default `doc`
    schema.
    """

    EXT_SCHEMA = "testdrive-ext"
    DATA_SCHEMA = "testdrive-data"

    RESET_TABLES = [
        f'"{EXT_SCHEMA}"."retention_policy"',
        f'"{DATA_SCHEMA}"."raw_metrics"',
        f'"{DATA_SCHEMA}"."sensor_readings"',
        f'"{DATA_SCHEMA}"."testdrive"',
        f'"{DATA_SCHEMA}"."foobar"',
        f'"{DATA_SCHEMA}"."foobar_unique_single"',
        f'"{DATA_SCHEMA}"."foobar_unique_composite"',
        # cratedb_toolkit.io.{influxdb,mongodb}
        '"testdrive"."demo"',
    ]
Member

@amotl amotl Nov 22, 2023

First things first: I like that you bundled those bits of information into a little container class.

On the other hand, contrary to CrateDBFixture, this section is specific to the test suite for cratedb_toolkit, and is not meant to be shipped with cratedb_toolkit.testing.

Do you see a chance to decouple this and let it be configured in tests/conftest.py, maybe on behalf of just a bit more pytest .request / _conf / -fixture magic, but loosely coupled, so that there is no dependency path going from cratedb_testing to tests/conftest.py, and the configuration could be somehow elegantly inverted instead?

I am not sure if I am asking for too much here, or if you can follow my thoughts easily, or if the implementation would be too complicated. Please let me know.

Contributor Author

If reset tables aren’t needed in testing module and only needed in conftest, I can simply move them there. But in this case shall I still try to remove explicit import of testing/testcontainers? It’s needed for CrateDBFixture. I probably don’t see the whole picture, but utils module is still explicitly imported there, so I don’t quite get why it’s different for testing module

Member

@amotl amotl Nov 22, 2023

If reset tables aren’t needed in testing module and only needed in conftest, I can simply move them there.

This specific set of tables are only meant to be reset on behalf of the cratedb-toolkit test suite, which is not being shipped as part of the package. On the other hand, cratedb_toolkit.testing intends to bundle generic testing helpers/utilities/fixtures.

In this spirit, you made the right choice to put CrateDBFixture there, but I think the table definition / test suite configuration itself, now excellently bundled into the TestDrive container class, should stay in tests/conftest.py. However, it can't be there in isolation, because it will need to be picked up by the generic CrateDBFixture in some way, because this one actually orchestrates the container lifecycle. Can you figure out a way to make that happen elegantly?

But in this case shall I still try to remove explicit import of testing/testcontainers? It’s needed for CrateDBFixture.

Are you referring to one of those? I think both are fine in general. It should be free for every module to use generic utilities from cratedb_toolkit.testing, but not the other way round, at least import-wise.

# File: cratedb_toolkit/testing/testcontainers/cratedb.py
from cratedb_toolkit.testing.testcontainers.util import KeepaliveContainer, asbool
# File: tests/conftest.py
from cratedb_toolkit.testing.testcontainers.cratedb import CrateDBFixture

I probably don’t see the whole picture, but utils module is still explicitly imported there, so I don’t quite get why it’s different for testing module

Are you referring to this import?

# File: cratedb_toolkit/testing/testcontainers/cratedb.py
from testcontainers.core.waiting_utils import wait_container_is_ready, wait_for_logs

I think it is also perfectly fine. You can pull all desired utilities into conftest.py, but the tricky part will be to define the test suite database connectivity configuration (TestDrive) there, and let it be picked up / consumed by the generic CrateDBFixture to be used properly at runtime, because it can't "import" something from tests/conftest.py.

Contributor Author

@pilosus pilosus Nov 22, 2023

So, basically, I've made CrateDBFixture.reset method to take reset tables in as a parameter without hardcoding them anyhow. That means, CrateDBFixture can still live in the cratedb_toolkit/testing/testcontainers/cratedb.py, but the reset tables are defined in the tests/conftest.py along with the fixtures that use CrateDBFixture. I hope this solves your concern with the loose coupling.
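
The inversion described here can be sketched roughly as follows. This is an illustration under assumptions, not the merged code: `AdapterSketch` and its `run_sql` hook are hypothetical stand-ins, and the real adapter executes statements against CrateDB rather than collecting them in a list.

```python
from typing import Callable, Iterable, List


class AdapterSketch:
    """Hypothetical, framework-agnostic stand-in for the generic test helper."""

    def __init__(self, run_sql: Callable[[str], None]):
        # `run_sql` is an assumed hook; the real helper talks to the database.
        self.run_sql = run_sql

    def reset(self, tables: Iterable[str] = ()) -> None:
        # The caller decides which tables to drop; nothing is hardcoded here.
        for table in tables:
            self.run_sql(f"DROP TABLE IF EXISTS {table};")


# This part would live in the test suite, e.g. in tests/conftest.py.
RESET_TABLES = ['"testdrive-ext"."retention_policy"', '"testdrive-data"."foobar"']

executed: List[str] = []
adapter = AdapterSketch(run_sql=executed.append)
adapter.reset(tables=RESET_TABLES)
```

The generic module never imports anything from the test suite; the table list only travels inwards as an argument.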

Member

@amotl amotl Nov 23, 2023

Yeah this was exactly what I was dreaming up regarding separation of concerns, and where I haven't been able to make any progress so far. Thank you very much for resolving that.

Contributor Author

pilosus commented Nov 22, 2023

Hi @amotl

E       docker.errors.APIError: 500 Server Error for http+docker://localhost/v1.43/containers/a519443561406c3f60e6d0b429cfd071736191f0149c9f32cc9deb9262818adc/start: Internal Server Error ("driver failed programming external connectivity on endpoint testcontainers-cratedb (68f8422f339f8d6b40efd574ecb54861825d9ccdcbba069cfc96e4a48032833a): Bind for 0.0.0.0:4200 failed: port is already allocated")

Ah, this

Bind for 0.0.0.0:4200 failed: port is already allocated

is probably because of the port 4200 being used in the CI this way:

    services:
      cratedb:
        image: crate/crate:nightly
        ports:
          - 4200:4200
          - 5432:5432

and the new container-to-host port mapping in the testcontainers code now uses explicit binding:
https://github.com/crate-workbench/cratedb-toolkit/pull/82/files#diff-a269e42ec7e05ba66ed28ac184deeaa00fb81a0db3ccb47d34635c654f1b68bcR113

that means, previously we had a random port being used on the host machine, but now it's the same as inside the container.
I did it intentionally, to make testing easier in crash PR 408, because one of the tests checks the host port in the logs.

I think I can either revert to the random port binding, or simply use another port in the fixture (which is probably better than the former option)

Contributor Author

pilosus commented Nov 22, 2023

I'll address other points too. Need more time to finalize.

@pilosus pilosus changed the title from "Tech/18 adapt testcontainers to unittest" to "Adapt testcontainers to be testing framework agnostic" Nov 22, 2023
Member

amotl commented Nov 22, 2023

Bind for 0.0.0.0:4200 failed: port is already allocated

is probably because of the port 4200 being used in the CI this way:

Ah I see. That makes sense.

I think I can either revert to the random port binding, or simply use another port in the fixture (which is probably better than the former option)

I think I'd prefer the random port assignment again for now, because it seems to be the standard way Testcontainers is doing it. By keeping up that paradigm, one has to be rigid to exclusively pick up the DBURI provided by the test machinery, and "not just sloppily assume there will be something on localhost:4200 or another fixed port anyway" ;].

What I don't like about the random port assignment paradigm is that it is usually a bit more complicated to re-use test containers across subsequent test runs, and so implementing that is mostly forgotten in the corresponding test container managers.

However, in our little corner, there is the generic TC_KEEPALIVE=true mechanism, which will keep the container running, so there is actually no penalty for subsequent test cycles. Personally, I am keeping this container running, dedicatedly for test harness use, and I still have the freedom to whip up ad hoc instances of CrateDB on the vanilla ports 4200/5432 for other purposes. Restarting the CrateDB container on each test run is unbearable.
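
The keepalive switch mentioned here could be resolved from the environment roughly like this. It is a sketch only: `asbool_sketch` and `keepalive_enabled` are hypothetical names, and the real `asbool` helper in cratedb_toolkit may accept a different set of truthy strings. The lookup order (CRATEDB_KEEPALIVE first, then TC_KEEPALIVE) follows the `KEEPALIVE` class attribute visible in the diff of this pull request.

```python
import os
from typing import Any, Mapping


def asbool_sketch(value: Any) -> bool:
    """Assumed stand-in for `asbool`: interpret common truthy strings."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"1", "true", "yes", "on", "y", "t"}


def keepalive_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    # The CrateDB-specific variable wins; the generic one is the fallback.
    raw = environ.get("CRATEDB_KEEPALIVE", environ.get("TC_KEEPALIVE", False))
    return asbool_sketch(raw)
```

With `TC_KEEPALIVE=true` exported, a container manager built this way would leave the container running after the test session.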

I'll address other points too. Need more time to finalize.

Thank you very much. Please take your time.

@pilosus pilosus marked this pull request as ready for review November 22, 2023 23:45
Contributor Author

pilosus commented Nov 22, 2023

@amotl I've come up with a compromise between hardcoded and random port binding for the testcontainers: the ports are taken in as a dict mapping the port inside the container to the port on the host machine. If the host port is None, a random host port will be assigned. This follows the Docker SDK for Python conventions. A reasonable default is used when no ports dict is supplied (4200 mapped to a random port). See how it works in the crash PR.

Also, I've fixed other points you've given above. Added a bit more docs, prettified the code.
Please, let me know if I missed something.
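
As a rough sketch of that convention, assuming a hypothetical helper `resolve_ports` (not the name used in the patch): the keys are container ports, the values are host ports, and `None` asks Docker to pick a random free host port, which is the same shape the `ports` argument of the Docker SDK for Python accepts.

```python
from typing import Dict, Optional

PortMap = Dict[int, Optional[int]]

# Assumed default: expose the HTTP port 4200 on a random host port.
DEFAULT_PORTS: PortMap = {4200: None}


def resolve_ports(ports: Optional[PortMap] = None) -> PortMap:
    """Return a container-port to host-port mapping; None means 'random host port'."""
    return dict(ports) if ports else dict(DEFAULT_PORTS)


# Pin 4200 explicitly, let 5432 float.
explicit = resolve_ports({4200: 4200, 5432: None})
```

Pinning a host port is what collided with the CI service container earlier in this thread; leaving the value as `None` avoids that.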

@pilosus pilosus requested a review from amotl November 22, 2023 23:56
Member

@amotl amotl left a comment

Thank you again. Approved 💯. I will also add @seut and @matriv as secondary reviewers this time.

@matriv matriv left a comment

thanks for your work @pilosus! I've left a few comments.

:param user: optional username to access the DB; if None, try respective environment variable
:param password: optional password to access the DB; if None, try respective environment variable
:param dbname: optional database name to access the DB; if None, try respective environment variable
:param dialect: a string with the dialect to generate a DB URI

why do we want to expose dialect as parameter? is there a use case to change it?

Member

I am not 100% sure, but I think when passed "http" instead of None, the corresponding utility function will return the HTTP-based connection URL, and not the SQLAlchemy-based connection URL, in the same manner as get_http_url() does it.

The dialect argument has already been part of the implementation before the refactoring, and I think it is part of the official/designated constructor signature. See [1].

Footnotes

  1. https://github.com/crate-workbench/cratedb-toolkit/blob/v0.0.2/cratedb_toolkit/testing/testcontainers/cratedb.py#L61

dialect is normally used to use a DB specific dialect instead of a more generic one, e.g. psql vs jdbc, it's what we use to set crate instead of psql for the cases that we want to use the crate jdbc driver instead of the psql one. http vs binary psql protocol is not a dialect, but a protocol.

Not saying that it's part of the PR, just mentioning that it seems weird to me to expose a "dialect".

Contributor Author

Actually, I've double checked the dialect thing and you are right @matriv - no need to have it in the class constructor. It's enough to have it in the get_connection_url as an argument with the default value crate. Gonna fix that
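
A minimal sketch of that shape, with the scheme as a keyword argument defaulting to `crate`. The function name matches the one mentioned above, but the signature, the defaults, and the exact URL layout are assumptions made for illustration; the real method lives on the container class and may differ.

```python
from urllib.parse import quote


def get_connection_url(
    host: str,
    port: int,
    user: str = "crate",
    password: str = "",
    dbname: str = "doc",
    dialect: str = "crate",
) -> str:
    # Percent-encode credentials so special characters do not break the URL.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"{dialect}://{credentials}@{host}:{port}/{dbname}"


url = get_connection_url("localhost", 4200)
```

Keeping the argument on the URL builder rather than on the constructor means callers who never need a different scheme do not see it at all.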

@pilosus pilosus requested a review from matriv November 24, 2023 10:48
@matriv matriv left a comment

thank you @pilosus !

Member

@seut seut left a comment

Nice! Just added a note about a config entry which is not completely related to this PR.

@@ -51,44 +57,91 @@ class CrateDBContainer(KeepaliveContainer, DbContainer):
    CRATEDB_PASSWORD = os.environ.get("CRATEDB_PASSWORD", "")
    CRATEDB_DB = os.environ.get("CRATEDB_DB", "doc")
    KEEPALIVE = asbool(os.environ.get("CRATEDB_KEEPALIVE", os.environ.get("TC_KEEPALIVE", False)))
    CMD_OPTS = {
        "discovery.type": "single-node",
        "node.attr.storage": "hot",
Member

This looks like a partial hot/cold targeted custom setup which should be irrelevant for generic test clusters and especially in a single-node setup.

@amotl Any reason for this configuration? (as it was in the codebase already).

Suggested change
"node.attr.storage": "hot",

Member

@amotl amotl Nov 24, 2023

Yeah, well spotted. It is indeed a configuration detail which is specific to the test cases for cratedb_toolkit.retention, where this code was initially conceived for.

In this spirit, it should not be part of the generic startup parameters, but at the same time, it shows we need the capacity to configure those details when needed.

@pilosus: Do you think we can improve this spot, so that corresponding configuration settings can be defined on behalf of the snippet in conftest.py? This time, it will probably not be so easy, because the test adapter will already need this information at startup time. Maybe you have an idea how to handle this elegantly?

While being at it: Of course, it would not just be about the specific node.attr.storage parameter, but about any other parameters as well.

Member

@amotl amotl Nov 24, 2023

This time, it will probably not be so easy, because the test adapter will already need this information at startup time. Maybe you have an idea how to handle this elegantly?

While being at it: Of course, it would not just be about the specific node.attr.storage parameter, but about any other parameters as well.

Ah, I see you already added cmd_opts to the constructor. 🙇

https://github.com/crate-workbench/cratedb-toolkit/blob/661370ffb0619e2a4c698c52627c99a1fb726bad/cratedb_toolkit/testing/testcontainers/cratedb.py#L94-L95

So, {"node.attr.storage": "hot", "path.repo": "/tmp/snapshots"} would just need to be moved over to the caller.

Contributor Author

Exactly. And given that we use dict merge, you can even override the default settings

In [1]: from cratedb_toolkit.testing.testcontainers.cratedb  import CrateDBContainer

In [2]: c = CrateDBContainer(cmd_opts={"node.attr.storage": "cold"})

In [3]: c._command
Out[3]: '-Cdiscovery.type=single-node -Cnode.attr.storage=cold -Cpath.repo=/tmp/snapshots'

I'll remove the node.attr.storage from the default though
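
The merge behaviour shown in that session can be approximated with the sketch below. `build_command` and `DEFAULT_OPTS` are hypothetical names, and the defaults are the ones visible in the session (that is, before `node.attr.storage` is dropped from them); the loop mirrors the one under review in this pull request.

```python
from typing import Any, Dict, Optional

# Defaults as they appear in the session above.
DEFAULT_OPTS: Dict[str, Any] = {
    "discovery.type": "single-node",
    "node.attr.storage": "hot",
    "path.repo": "/tmp/snapshots",
}


def build_command(cmd_opts: Optional[Dict[str, Any]] = None) -> str:
    # Later keys win, so caller-supplied options override the defaults.
    opts = {**DEFAULT_OPTS, **(cmd_opts or {})}
    parts = []
    for key, val in opts.items():
        if isinstance(val, bool):
            # Render booleans in lowercase, as the loop under review does.
            val = str(val).lower()
        parts.append(f"-C{key}={val}")
    return " ".join(parts)


command = build_command({"node.attr.storage": "cold"})
```

Because a dict merge keeps the position of an existing key, an overridden option stays where the default was, and only new keys are appended at the end.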

Member

@amotl amotl Nov 24, 2023

Thanks. What do you think about path.repo, @seut? It could be convenient for testing to have it configured by default. Because you didn't mention it in your request, do you think it can stay?

Contributor Author

@pilosus pilosus Nov 24, 2023

@seut @amotl hmm, when I delete node.attr.storage=hot, these tests hang:

  • tests/retention/test_cli.py::test_run_delete_basic
  • tests/retention/test_cli.py::test_run_delete_dryrun
  • tests/retention/test_cli.py::test_run_reallocate

Here are the logs:

2023-11-24 17:39:14,494 [cratedb_toolkit.retention.setup.schema] INFO    : Installing retention policy bookkeeping table at database 'crate://crate:REDACTED@localhost:33018', table TableAddress(schema='testdrive-ext', table='retention_policy')
2023-11-24 17:39:14,902 [cratedb_toolkit.retention.store     ] INFO    : Connecting to database crate://crate:REDACTED@localhost:33018, table "testdrive-ext"."retention_policy"
Waiting to be ready...
2023-11-24 17:39:15,366 [testcontainers.core.waiting_utils   ] INFO    : Waiting to be ready...

Interestingly enough, test_run_delete_with_tags_match works well

Member

Yeah, the test cases defined for cratedb_toolkit.retention need this setting to be configured, so it will need to go into tests/conftest.py somehow. On the other hand, it should not be part of the generic configuration. That's yet another cliff we need to take.

Contributor Author

Given the whole chain of nested fixtures with different scopes, it will either require some more time from me next week, or a simpler solution: a cratedb fixture override in tests/retention/conftest.py that sets node.attr.storage=hot.

Member

@amotl amotl Nov 24, 2023

We can easily handle that on a later iteration, and/or discuss possible solutions beforehand. Thank you.

Member

@amotl amotl left a comment

Hi Vitaly. Thanks once again for your work on this. I've added two more nitpicky suggestions ;].

self._command += " -Cpath.repo=/tmp/snapshots"
self._name = "testcontainers-cratedb"

cmd_opts = cmd_opts if cmd_opts else {}
Member

Just a nit: I usually write this like that. Is it a good idea?

Suggested change
cmd_opts = cmd_opts if cmd_opts else {}
cmd_opts = cmd_opts or {}

Contributor Author

Done

for key, val in opts.items():
    if isinstance(val, bool):
        val = str(val).lower()
    cmd.append("-C{}={}".format(key, val))
Member

Is there a reason not to use f-strings already?

Suggested change
cmd.append("-C{}={}".format(key, val))
cmd.append(f"-C{key}={val}")

Contributor Author

@pilosus pilosus Nov 24, 2023

Fixed.
Although I personally am not a big fan of f-strings, because people sometimes abuse their flexibility with larger expressions and thus damage readability. But it's a personal choice rather than anything else. In this particular case it's just fine to use f-strings.

@amotl amotl merged commit 7f3a493 into crate:main Nov 24, 2023
20 checks passed