Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

live-tests: debug mode and initial regression tests framework #35624

Merged
merged 23 commits into from
Mar 6, 2024

Conversation

clnoll
Copy link
Contributor

@clnoll clnoll commented Feb 26, 2024

Initial implementation of "live tests" - tests that are intended to be run against actual APIs instead of static data. This currently implements a debug command which forms the foundation for regression tests, by providing an interface for running an Airbyte command against multiple versions of a connector and storing output to disk.

To run a regression-style tests with this current version of live-tests, we run, e.g.

live-tests debug read \
--connector-image=airbyte/source-<SOURCE>:dev \
--connector-image=airbyte/source-<SOURCE>:latest \
--output-directory=</path/to/output> \
--config-path=</path/to/config.json> \
--catalog-path=</path/to/configured_catalog.json>

Results will be stored in the output-directory and can be compared by

diff -u </path/to/output>/read/latest/airbyte_messages/<source>_records.jsonl </path/to/output>/read/dev/airbyte_messages/<source>_records.jsonl

Additional details are in the README.

@clnoll clnoll requested a review from a team as a code owner February 26, 2024 06:44
Copy link

vercel bot commented Feb 26, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Mar 6, 2024 7:42am

@clnoll clnoll marked this pull request as draft February 26, 2024 06:44
@clnoll clnoll force-pushed the regression-tests-package branch 3 times, most recently from 0e93700 to 9d1462e Compare February 26, 2024 07:38
logger = logging.getLogger(__name__)


async def _main(
Copy link
Contributor

@alafanechere alafanechere Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to separate connector image retrieval (test bootstrapping) from actual command execution.
I suggest creating a ConnnectorUnderTest dataclass which could have the following attribute:

@dataclass
class ConnectorUnderTest:
   technical_name: str, 
   version: semver.Version # or str 
   container: dagger.Container

We could add other attribute for reporting if needed, but the key here is the container attribute, whose retrieval happens before instantiation.

We can then call dispatch on a ConnectorUnderTest instance and keep the metadata attribute at hand during the whole execution flow.

For connector container retrieval we can implement multiple upstream functions producing ConnectorUnderTest according to the execution context:

  1. You want to run regression test on two already released version: we create the connector container with dagger_client.container().from_(image_address) in a specific function
  2. You want to run regression test on a target which is not released: we build it re-using the airbyte-ci build step, in a specific function again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I've added the ConnectorUnderTest dataclass and added a TODO for handling building of images based on context.

@clnoll clnoll force-pushed the regression-tests-package branch 4 times, most recently from 91d5ee8 to 1a5fd4d Compare February 27, 2024 19:18
raise NotImplementedError(f"{command} is not recognized. Must be one of {', '.join(COMMANDS)}")


@click.command()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we declare the command(s) in a separate cli.py module?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

default=None,
type=str,
)
def main(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use asynclick instead of click so that we can declare async def main and just await on _main

Example of the same pattern here on the QA checks side.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Done.

):
runner = ConnectorRunner(container, backend, f"{output_directory}/{command}")

if command == "check":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we introduce a Command(Enum) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely. Done.

await runner.call_discover(config)

elif command == "read":
runner = ConnectorRunner(container, backend, f"{output_directory}/read")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these command specific runner can be removed right, the runner defined on line 92 can be reused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! Removed.

catalog: dict = None,
state: Union[dict, list] = None,
enable_caching=True,
) -> List[AirbyteMessage]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not returning a list of AirbyteMessage if I'm not mistaken 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about making _run return a tuple with the executed command and the awaited container.

Then this tuple can be passed to a "backend" which would produce the artifact we would need downstream for debugging or testing?

This would help making this class self contained: only running airbyte commands on connector containers

entrypoint = await container.entrypoint()
airbyte_command = entrypoint + airbyte_command
container = container.with_exec(
["sh", "-c", " ".join(airbyte_command) + f" > {self.IN_CONTAINER_OUTPUT_PATH} 2>&1 | tee -a {self.IN_CONTAINER_OUTPUT_PATH}"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this assumes sh and tee are available in the container.
I think we should rather await on a with exec which is not altering the original entrypoint and get stdout + stderr.
And as I suggested above, I think this can happen in a different class, maybe at the backend class level.
E.G:
_run "just" returns:

return command, await container.with_exec(airbyte_command)

Then in a different code path:

command, executed_container = connector_run.call_read(***)

stdout, stderr = await executed_container.stdout(), await executed_container.stderr()
# These stdout stderr can be written to files, db are whatever the "backend" implements

raw_output = await AnyioPath(filepath).read_text()
await self._backend.write(self._raw_output_iter(raw_output))

def _raw_output_iter(self, raw_output):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the runner should not perform any deserialization. I'd suggest deserialization to happen post dispatch when we actually need deserialized stuffed in tests or diff.
My point is that error detection should happen at a more downstream level where actual good or bad connector behavior will be evaluated.

I like the flow of:

  1. run commands on containers and gather raw outputs (stdout, stderr)
  2. build objects from these output with backends.
  3. Use a backend useful for test, another one for debugging, etc.


logger = logging.getLogger(__name__)

COMMANDS = ["check", "discover", "read", "read-with-state", "spec"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would love a Commands enum 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

default=None,
type=str,
)
def main(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call this command run and create it under a regression-test command group?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 done.


# TODO: maybe use proxy to cache the response from the first round and use the cache for the second round
# (this may only make sense for syncs with an input state)
if command == "all":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we instead change the main signature to take a commands list and iterate on commands? It will help you avoid the if/else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! Done.

dagger_client.host().directory(self._test_output_directory)))
os.makedirs(self._in_container_results_directory, exist_ok=True)

async def _diff(self, container: Container, control_connector: ConnectorUnderTest, target_connector: ConnectorUnderTest) -> Container:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we say we are comparing raw outputs. Can we diff stderr and stdout of the control and the target?
let's say you can access the executed container with control_connector.executed_container.
Then you can mount the stdout/stderr as files to you container with diff:

self._container
.with_new_file(f"{control_directory}/stdout.txt", await control_connector.executed_container.stdout())
.with_new_file(f"{target_directory}/stdout.txt", await target_connector.executed_container.stdout())
.with_new_file(f"{control_directory}/stderr.txt", await control_connector.executed_container.stderr())
.with_new_file(f"{target_directory}/stderr.txt", await target_connector.executed_container.stderr())

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting to realize that with this approach the connector output is never transiting on the host filesystem, which can be an interesting security point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed I think it would make sense to put this behind a debug flag. Let me know when your changes to the connector runner are stable and I'll add it.

{ include = "live_tests", from = "src" },
]

[tool.poetry.dependencies]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to add click as a depenency
click = "^8.1.3"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after an update, I now needed to add
asyncclick ="^8.1.3.2"

Copy link
Contributor Author

@clnoll clnoll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alafanechere I've addressed all the comments except the ones touching the ConnectorRunner, which you're taking, and the --debug flag that we discussed for determining whether to store the raw output. I'll add it in when your ConnectorRunner changes are stable. In the mean time I'll add some more unit tests.

raise NotImplementedError(f"{command} is not recognized. Must be one of {', '.join(COMMANDS)}")


@click.command()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

await runner.call_discover(config)

elif command == "read":
runner = ConnectorRunner(container, backend, f"{output_directory}/read")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! Removed.

):
runner = ConnectorRunner(container, backend, f"{output_directory}/{command}")

if command == "check":
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely. Done.

default=None,
type=str,
)
def main(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Done.


logger = logging.getLogger(__name__)

COMMANDS = ["check", "discover", "read", "read-with-state", "spec"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


# TODO: maybe use proxy to cache the response from the first round and use the cache for the second round
# (this may only make sense for syncs with an input state)
if command == "all":
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! Done.

default=None,
type=str,
)
def main(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 done.

dagger_client.host().directory(self._test_output_directory)))
os.makedirs(self._in_container_results_directory, exist_ok=True)

async def _diff(self, container: Container, control_connector: ConnectorUnderTest, target_connector: ConnectorUnderTest) -> Container:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed I think it would make sense to put this behind a debug flag. Let me know when your changes to the connector runner are stable and I'll add it.

Copy link
Contributor

alafanechere commented Mar 4, 2024

@alafanechere alafanechere force-pushed the regression-tests-package branch 3 times, most recently from 54eb1ce to 3c1e213 Compare March 5, 2024 21:10
@clnoll clnoll marked this pull request as ready for review March 6, 2024 00:24
@clnoll clnoll changed the title WIP: regression tests separate package live-tests: debug mode and initial regression tests framework Mar 6, 2024
@clnoll clnoll merged commit 1571dbd into master Mar 6, 2024
35 checks passed
@clnoll clnoll deleted the regression-tests-package branch March 6, 2024 13:18
xiaohansong pushed a commit that referenced this pull request Mar 7, 2024
Co-authored-by: alafanechere <augustin.lafanechere@gmail.com>
Co-authored-by: Augustin <augustin@airbyte.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants