
Add ci/cd support for metadata unit tests with Dagger #24649

Merged: 34 commits from bnchrch/cicd-metadata-test into master on Apr 3, 2023

Conversation

@bnchrch (Contributor) commented Mar 29, 2023

What

We need to run the unit tests for the metadata library.

How

  1. Creates and uses a new Dagger pipeline to accomplish that.
  2. Updates the Dagger command structure to have one top-level command called airbyte-ci-pipelines.
  3. Refactors the context to be slightly more generic.

        python-version: "3.10"
    - name: Install pipeline package
      run: pip install ./tools/ci_connector_ops\[pipelines\]
    - name: Run airbyte-ci-pipeline metadata-service test-metadata-service-lib [PULL REQUESTS]
bnchrch (Contributor Author):

@alafanechere I opted not to add a workflow_dispatch trigger. Good idea or bad idea?

exit_code = await with_exit_code(metadata_lib_module.with_exec(["poetry", "run", "pytest"]))

# Raise an exception if the exit code is not 0
if exit_code != 0:
bnchrch (Contributor Author):

@alafanechere I didn't like that this was where I ended up. Do you have any ideas on how we can make the pipeline context own the handling of the status code?

For the record, the reason I raise is to take advantage of the GitHub commit status error reporting in __aexit__ inside the pipeline class.

alafanechere (Contributor):

I would suggest you implement a Step class like MetadataLibTest(Step) and make its _run return self.get_step_result(metadata_lib_module), like what I've done for QaChecks.
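As a minimal sketch of this suggestion, assuming the Step base class and get_step_result helper from bases.py behave as described in this thread (the import path and the container argument are assumptions, not the PR's final code):

```python
from ci_connector_ops.pipelines.bases import Step  # assumed import path


class MetadataLibTest(Step):
    title = "Metadata Service Lib Unit Tests"

    async def _run(self, metadata_lib_module):
        # metadata_lib_module is assumed to be a dagger container with the
        # metadata lib mounted and poetry installed (built by the pipeline).
        tested = metadata_lib_module.with_exec(["poetry", "run", "pytest"])
        # get_step_result (from bases.py) converts the container's exit code
        # into a StepResult, so no manual raise on a non-zero exit is needed.
        return await self.get_step_result(tested)
```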

alafanechere (Contributor):

I would also suggest you make use of ConnectorTestReport (or a Metadata version of it). This will help normalize the way we report success or failure to users.

@bnchrch requested review from a team as code owners March 29, 2023 02:00
@octavia-squidington-iii added the area/connectors and area/documentation labels Mar 29, 2023
@bnchrch changed the base branch from bnchrch/generate-metadata-files to master March 29, 2023 02:00
@octavia-squidington-iii removed the area/connectors and area/documentation labels Mar 30, 2023
@@ -0,0 +1,55 @@
name: "Run Dagger pipeline"
bnchrch (Contributor Author):

@alafanechere What do you think about the Dagger action? If you're into it I can update the other Dagger workflows.

alafanechere (Contributor):

I think it's a nice initiative.
I think @cpdeethree would like to eventually have a single workflow which will run on every PR.
The logic of which pipeline runs would be inferred from the git diff in the entrypoint.
If we go down this way, declaring an action could be overkill, but I like it for our current way of doing things.

bnchrch (Contributor Author):

I'm thinking I'll add a few more commands before Conor's repo is ready.

Should I just use this action for my new commands, then, and leave the existing ones alone to avoid refactoring twice?

alafanechere (Contributor):

I like the action pattern, so feel free to use it for your command and refactor the existing one in a separate PR.

alafanechere (Contributor) left a comment:

I'm glad you plugged into this codebase!!! #collaboration
I think you can benefit a bit more from the classes I've declared in bases.py.
I would also like to challenge the existence of the subcommand group; I would prefer a simple CLI like airbyte-ci test-connectors or airbyte-ci test-metadata.

@click.option("--ci-context", default="manual", envvar="CI_CONTEXT", type=click.Choice(["manual", "pull_request", "nightly_builds"]))
@click.option("--pipeline-start-timestamp", default=get_current_epoch_time, envvar="CI_PIPELINE_START_TIMESTAMP", type=int)
@click.pass_context
def airbyte_ci_pipeline(
alafanechere (Contributor):

I would suggest having a main command group called airbyte-ci with subcommands like test-connectors. I don't think a connectors-ci subgroup is required.

bnchrch (Contributor Author):

Good idea. My thought with this was that these commands were always going to relate to running Dagger pipelines (hence the prefix), but I'm happy to change it to airbyte-ci.

On the note of groups: I'm thinking that as this grows we will want different command groups? One set specific to connector-related tasks, another specific to metadata-related tasks, and perhaps another for docs-specific tasks?

alafanechere (Contributor):

Yes, sure, but I'd like to keep these commands as concise as possible... Would you mind following my suggestion for now? We'll probably re-evaluate the command structure in the future as use cases grow.

alafanechere (Contributor):

What do you think about keeping the main command group directly under pipelines in an airbyte_ci.py module?

bnchrch (Contributor Author):

I don't exactly have a strong opinion here. I just wanted a separation between the pipeline code and the command that runs it.

Thoughts on:

commands/airbyte_ci.py
commands/airbyte_ci/test_connectors.py
commands/airbyte_ci/metadata_service.py

bnchrch (Contributor Author):

Feel free to say no, do this instead. This is your project, captain!

alafanechere (Contributor):

I say yes :) As long as it makes sense I'm in. Decoupling command code from pipeline code is great. It's not what you did so far, right? This file has both command and pipeline code:

tools/ci_connector_ops/ci_connector_ops/pipelines/commands/metadata_service.py

@@ -19,6 +19,8 @@
def update_commit_status_check(
sha: str, state: str, target_url: str, description: str, context: str, should_send=True, logger: Logger = None
):
logger.info(f"Attempting to create {state} status for commit {sha} on Github in {context} context.")
alafanechere (Contributor):

This call will fail if logger is None, which is the default value.
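One hedged way to avoid the None crash is to fall back to a module-level logger when none is passed (a sketch, not necessarily how the PR resolved it):

```python
import logging
from typing import Optional


def update_commit_status_check(
    sha: str, state: str, target_url: str, description: str, context: str,
    should_send: bool = True, logger: Optional[logging.Logger] = None
):
    # Fall back to a module-level logger so the .info call below
    # cannot crash when the caller passes no logger.
    logger = logger or logging.getLogger(__name__)
    logger.info(f"Attempting to create {state} status for commit {sha} on Github in {context} context.")
```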

bnchrch (Contributor Author):

Ahh, my mistake.

Question: why do we do this above?

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from logging import Logger

alafanechere (Contributor):

I'm not sure it's required in this context, but sometimes we only make imports to use them as type hints, and sometimes we can't import a module because of circular dependencies.
The if TYPE_CHECKING guard only imports a module when mypy is performing type checks, not at runtime.
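A small illustration of the pattern (note the annotation must be quoted, or `from __future__ import annotations` must be used, because Logger does not exist at runtime):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by type checkers such as mypy; at runtime this
    # import never happens, which avoids cost and circular imports.
    from logging import Logger


def report(message: str, logger: "Logger") -> None:
    # The quoted annotation keeps this valid at runtime.
    logger.info(message)
```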

FAILURE = {"github_state": "failure", "description": "Pipeline failed."}


class PipelineContext(ABC):
alafanechere (Contributor):

This is an abstract class, but you created an instance of it in metadata_service.

I suggest you create a MetadataPipelineContext(PipelineContext). I would also declare __aenter__ and __aexit__ as abstract methods, and make your MetadataPipelineContext implement them.
If we eventually spot a pattern in a third child class, we could consider un-abstracting these methods and implementing a sensible default.
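A sketch of that shape, with the GitHub-status side effects in the child class being assumed behavior rather than the PR's actual code:

```python
from abc import ABC, abstractmethod


class PipelineContext(ABC):
    @abstractmethod
    async def __aenter__(self):
        ...

    @abstractmethod
    async def __aexit__(self, exc_type, exc_value, traceback):
        ...


class MetadataPipelineContext(PipelineContext):
    async def __aenter__(self):
        # e.g. report a "pending" commit status to GitHub (assumed behavior)
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        # e.g. report "failure" if exc_type is set, "success" otherwise
        # (assumed behavior); returning False propagates any exception.
        return False
```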

bnchrch (Contributor Author):

Hmm, good callout.

I want to push back slightly, but only because I may not fully understand.

Wouldn't the sensible defaults be what we have:

  1. On error -> set commit status to error
  2. On success -> set commit status to success

For the pipelines I'm intending to implement, I think that's all I need?

These are the tests I'm thinking about:

  1. Run metadata lib unit tests -> pass/fail
  2. Run metadata orchestrator unit tests -> pass/fail
  3. Run metadata validator -> pass/fail

alafanechere (Contributor):

You're right. Feel free to not declare this class as abstract then, if you think the default behavior you've implemented is broad and simple enough to be reusable.

Comment on lines 27 to 29
INSTALL_LOCAL_REQUIREMENTS_CMD = ["python", "-m", "pip", "install", "-r", "requirements.txt"]
INSTALL_CONNECTOR_PACKAGE_CMD = ["python", "-m", "pip", "install", "."]
INSTALL_POETRY_PACKAGE_CMD = ["python", "-m", "pip", "install", "poetry"]
POETRY_INSTALL_DEPENDENCIES_CMD = ["poetry", "install"]

DEFAULT_PYTHON_EXCLUDE = [".venv"]
POETRY_EXCLUDE = ["__pycache__"] + DEFAULT_PYTHON_EXCLUDE
CI_CREDENTIALS_SOURCE_PATH = "tools/ci_credentials"
CI_CONNECTOR_OPS_SOURCE_PATH = "tools/ci_connector_ops"
alafanechere (Contributor):

nit: what do you think about moving these constants directly into the environment functions they're used in? I think it may help readability, and they are not reused.

bnchrch (Contributor Author):

Generally in favor! With the exception of the paths.

I love getting the paths-in-constants pattern set up early, so we don't have to play "find where we hardcoded the path" later.

alafanechere (Contributor):

Sounds good to me.

METADATA_LIB_MODULE_PATH = "airbyte-ci/connectors/metadata_service/lib"


class MetadataLibRunTest(Step):
bnchrch (Contributor Author):

@alafanechere Putting this up now, slightly incomplete (needs some comments, and classes moved to new files), just to get your thoughts on it.

Generally, do you feel like I'm using the classes you defined in the correct spirit of things?

Also, I had an idea while writing this: what if we made the PipelineContext hold onto a collection of steps and then be responsible for running them?

```python
async def run_metadata_pipeline(cli_context):
    async with dagger.Connection(DAGGER_CONFIG) as dagger_client:
        metadata_pipeline_context = PipelineContext(
            pipeline_name="Metadata Service Lib Unit Test Pipeline",
            is_local=cli_context.obj["is_local"],
            git_branch=cli_context.obj["git_branch"],
            git_revision=cli_context.obj["git_revision"],
            gha_workflow_run_url=cli_context.obj.get("gha_workflow_run_url"),
            pipeline_start_timestamp=cli_context.obj.get("pipeline_start_timestamp"),
            ci_context=cli_context.obj.get("ci_context"),
            dagger_client=dagger_client,
            steps=[
                MetadataLibRunTest,
            ],
        )

        async with metadata_pipeline_context:
            test_report = await metadata_pipeline_context.run()
            return test_report
```
Where the pipeline context becomes concerned with everything that happens around a pipeline run:

  1. Logging
  2. Reporting status to GitHub
  3. Ordering steps
  4. Reporting the outcome

alafanechere (Contributor) commented Mar 31, 2023:

> Also, I had an idea while writing this: what if we made the PipelineContext hold onto a collection of steps and then be responsible for running them?

I like it, but it would mean implementing a PipelineContext (we might consider renaming PipelineContext to Pipeline) for each pipeline type, and PipelineContext would need to be abstract (as it was before). I feel like it's too big of a lift to make this happen in the current PR, as it will change quite a lot of things in your PR for adapting the connector pipeline, and in this PR.
Would you mind creating an issue on the "Connectors CI in production" epic to track this future improvement?

alafanechere (Contributor):

> Generally, do you feel like I'm using the classes you defined in the correct spirit of things?

Yep, thank you!!!

alafanechere (Contributor) left a comment:

I approve to unblock you. I shared my last suggestions.
For the command group discussion, on second thought I'm not against subgroups, but I would love to have a concise CLI:

airbyte-ci connectors test --name=source-pokeapi
airbyte-ci connectors publish --name=source-pokeapi
airbyte-ci metadata test lib (lib being an argument)
airbyte-ci cdk test
airbyte-ci cdk publish
airbyte-ci connector-acceptance-test test
etc.
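A minimal click sketch of that shape, using the command and group names from the examples above (the command bodies are placeholders, not the PR's implementation):

```python
import click


@click.group()
def airbyte_ci():
    """Top-level airbyte-ci command group."""


@airbyte_ci.group()
def connectors():
    """Connector-related pipelines."""


@connectors.command()
@click.option("--name", required=True)
def test(name):
    # Placeholder: the real command would kick off a Dagger pipeline.
    click.echo(f"Running tests for {name}")


@airbyte_ci.group()
def metadata():
    """Metadata-service-related pipelines."""


@metadata.command("test")
@click.argument("component", type=click.Choice(["lib", "orchestrator"]))
def metadata_test(component):
    click.echo(f"Running metadata {component} tests")


if __name__ == "__main__":
    airbyte_ci()
```

Invoked as, e.g., `airbyte-ci connectors test --name=source-pokeapi`.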

async with metadata_pipeline_context:
    result = await MetadataLibRunTest(metadata_pipeline_context).run()
    test_report = TestReport(pipeline_context=metadata_pipeline_context, steps_results=[result])
    test_report.print()
alafanechere (Contributor):

I made this print happen in __aexit__ of ConnectorTestContext. I would suggest you print the test report in the PipelineContext.
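A sketch of moving the print into the context's __aexit__, assuming the context holds the TestReport built while the block runs (attribute name and wiring are assumptions):

```python
class PipelineContext:
    def __init__(self):
        self.test_report = None  # assumed: set by the steps run inside the block

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        # Printing here means every pipeline reports results the same way,
        # without each command having to remember to call print().
        if self.test_report is not None:
            self.test_report.print()
        return False
```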

class MetadataLibRunTest(Step):
title = "Run Metadata Service Lib Unit Tests"
alafanechere (Contributor):

nit for later: I'm wondering if we should add a description attribute that could explain to the world the philosophy of the implemented pipeline, with links to the code under test, etc. It would be interesting metadata to propagate to logs or to use to automatically generate docs.
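As a sketch, the hypothetical attribute could look like this (description is this suggestion, not part of the current Step class):

```python
class MetadataLibRunTest(Step):  # Step assumed from bases.py
    title = "Run Metadata Service Lib Unit Tests"
    # Hypothetical attribute: prose that could be propagated to logs
    # or used to auto-generate docs for the pipeline.
    description = (
        "Runs poetry run pytest against "
        "airbyte-ci/connectors/metadata_service/lib."
    )
```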

bnchrch (Contributor Author):

I like that idea! Feel free to add an issue for it!

type=str,
)
@click.option("--gha-workflow-run-id", help="[CI Only] The run id of the GitHub action workflow", default=None, type=str)
@click.option("--ci-context", default="manual", envvar="CI_CONTEXT", type=click.Choice(CI_CONTEXT_VALUES))
alafanechere (Contributor):

I think you can directly use the enum class in Choice. It worked for ConnectorLanguage:
https://github.com/airbytehq/airbyte/blob/master/tools/ci_connector_ops/ci_connector_ops/pipelines/connectors_ci.py#L137
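A sketch of what that could look like with a str-backed enum. Whether the enum class itself can be passed to Choice depends on the click version, so the value-list form below is the conservative variant (CIContext is a hypothetical name mirroring the CI_CONTEXT_VALUES choices):

```python
from enum import Enum

import click


class CIContext(str, Enum):
    MANUAL = "manual"
    PULL_REQUEST = "pull_request"
    NIGHTLY_BUILDS = "nightly_builds"


@click.command()
@click.option(
    "--ci-context",
    default=CIContext.MANUAL.value,
    envvar="CI_CONTEXT",
    # Conservative: pass the values; newer click versions also accept
    # the enum class itself, as the linked ConnectorLanguage example does.
    type=click.Choice([c.value for c in CIContext]),
)
def pipeline(ci_context):
    click.echo(f"CI context: {ci_context}")
```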

alafanechere (Contributor):

@bnchrch As you modified the CLI, could you please update the existing GHA workflows using it in this PR (or refactor them to use the action you wrote)?

@bnchrch force-pushed the bnchrch/cicd-metadata-test branch from ada9db0 to ba5583f on April 3, 2023 21:16
@bnchrch merged commit 082303a into master Apr 3, 2023
@bnchrch deleted the bnchrch/cicd-metadata-test branch April 3, 2023 22:37