
Add ci/cd support for metadata unit tests with Dagger #24649

Merged: 34 commits from bnchrch/cicd-metadata-test into master on Apr 3, 2023

Conversation

@bnchrch (Contributor) commented Mar 29, 2023

What

We need to run the unit tests for the metadata library.

How

  1. Creates and uses a new Dagger pipeline to accomplish that.
  2. Updates the Dagger command structure to have one top-level command called airbyte-ci-pipelines.
  3. Refactors the context to be slightly more generic.

        python-version: "3.10"
    - name: Install pipeline package
      run: pip install ./tools/ci_connector_ops\[pipelines\]
    - name: Run airbyte-ci-pipeline metadata-service test-metadata-service-lib [PULL REQUESTS]
bnchrch (Contributor Author):

@alafanechere I opted not to add a workflow_dispatch trigger. Good idea or bad idea?

exit_code = await with_exit_code(metadata_lib_module.with_exec(["poetry", "run", "pytest"]))

# Raise an exception if the exit code is not 0
if exit_code != 0:
bnchrch (Contributor Author):

@alafanechere I didn't like that this was where I ended up. Do you have any ideas on how we can make the pipeline context own the handling of the status code?

For the record, the reason I raise is to take advantage of the GitHub commit status error reporting in __aexit__ inside the pipeline class.

alafanechere (Contributor):

I would suggest you implement a Step class like MetadataLibTest(Step) and make its _run return self.get_step_result(metadata_lib_module), like what I've done for QaChecks.
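As a minimal sketch of this suggestion, assuming the Step base class and get_step_result helper from bases.py behave as described in this thread (the import path and the container argument are assumptions, not the PR's final code):

```python
from ci_connector_ops.pipelines.bases import Step  # assumed import path


class MetadataLibTest(Step):
    title = "Metadata Service Lib Unit Tests"

    async def _run(self, metadata_lib_module):
        # metadata_lib_module is assumed to be a dagger container with the
        # metadata lib mounted and poetry installed (built by the pipeline).
        tested = metadata_lib_module.with_exec(["poetry", "run", "pytest"])
        # get_step_result (from bases.py) converts the container's exit code
        # into a StepResult, so no manual raise on a non-zero exit is needed.
        return await self.get_step_result(tested)
```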

alafanechere (Contributor):

I would also suggest you make use of ConnectorTestReport (or a Metadata version of it). This will help normalize the way we report success or failure to users.

@bnchrch requested review from a team as code owners March 29, 2023 02:00
@octavia-squidington-iii added the area/connectors and area/documentation labels Mar 29, 2023
@bnchrch changed the base branch from bnchrch/generate-metadata-files to master March 29, 2023 02:00
@octavia-squidington-iii removed the area/connectors and area/documentation labels Mar 30, 2023
@@ -0,0 +1,55 @@
name: "Run Dagger pipeline"
bnchrch (Contributor Author):

@alafanechere What do you think about the Dagger action? If you're into it I can update the other Dagger workflows.

alafanechere (Contributor):

I think it's a nice initiative.
I think @cpdeethree would like to eventually have a single workflow which will run on every PR.
The logic of which pipeline runs would be inferred from the git diff in the entrypoint.
If we go down this way, declaring an action could be overkill, but I like it for our current way of doing things.

bnchrch (Contributor Author):

I'm thinking I'll add a few more commands before Conor's repo is ready.

Should I just use this action for my new commands, then, and leave the existing ones alone to avoid refactoring twice?

alafanechere (Contributor):

I like the action pattern, so feel free to use it for your command and refactor the existing one in a separate PR.

alafanechere (Contributor) left a comment:

I'm glad you plugged into this codebase!!! #collaboration
I think you can benefit a bit more from the classes I've declared in bases.py.
I would also like to challenge the existence of the subcommand group; I would prefer a simple CLI like airbyte-ci test-connectors or airbyte-ci test-metadata.

@click.option("--ci-context", default="manual", envvar="CI_CONTEXT", type=click.Choice(["manual", "pull_request", "nightly_builds"]))
@click.option("--pipeline-start-timestamp", default=get_current_epoch_time, envvar="CI_PIPELINE_START_TIMESTAMP", type=int)
@click.pass_context
def airbyte_ci_pipeline(
alafanechere (Contributor):

I would suggest having a main command group called airbyte-ci with subcommands like test-connectors. I don't think a connectors-ci subgroup is required.

bnchrch (Contributor Author):

Good idea. My thought with this was that these commands were always going to relate to running Dagger pipelines (hence the prefix), but I'm happy to change it to airbyte-ci.

On the note of groups: I'm thinking that as this grows we will want different command groups? One set specific to connector-related tasks, another specific to metadata-related tasks, and perhaps another for docs-specific tasks?

alafanechere (Contributor):

Yes, sure, but I'd like to keep these commands as concise as possible... Would you mind following my suggestion for now? We'll probably re-evaluate the command structure in the future as use cases grow.

alafanechere (Contributor):

What do you think about keeping the main command group directly under pipelines in an airbyte_ci.py module?

bnchrch (Contributor Author):

I don't exactly have a strong opinion here. I just wanted a separation between the pipeline code and the command that runs it.

Thoughts on:

commands/airbyte_ci.py
commands/airbyte_ci/test_connectors.py
commands/airbyte_ci/metadata_service.py

bnchrch (Contributor Author):

Feel free to say no, do this instead. This is your project, captain!

alafanechere (Contributor):

I say yes :) As long as it makes sense I'm in. Decoupling command code from pipeline code is great. It's not what you did so far, right? This file has both command and pipeline code:

tools/ci_connector_ops/ci_connector_ops/pipelines/commands/metadata_service.py

@@ -19,6 +19,8 @@
def update_commit_status_check(
sha: str, state: str, target_url: str, description: str, context: str, should_send=True, logger: Logger = None
):
logger.info(f"Attempting to create {state} status for commit {sha} on Github in {context} context.")
alafanechere (Contributor):

This call will fail if logger is None, which is the default value.
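One hedged way to avoid the None crash is to fall back to a module-level logger when none is passed (a sketch, not necessarily how the PR resolved it):

```python
import logging
from typing import Optional


def update_commit_status_check(
    sha: str, state: str, target_url: str, description: str, context: str,
    should_send: bool = True, logger: Optional[logging.Logger] = None
):
    # Fall back to a module-level logger so the .info call below
    # cannot crash when the caller passes no logger.
    logger = logger or logging.getLogger(__name__)
    logger.info(f"Attempting to create {state} status for commit {sha} on Github in {context} context.")
```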

bnchrch (Contributor Author):

Ahh, my mistake.

Question: why do we do this above?

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from logging import Logger

alafanechere (Contributor):

I'm not sure it's required in this context, but sometimes we only make imports to use them as type hints, and sometimes we can't import a module because of circular dependencies.
The if TYPE_CHECKING guard only imports a module when mypy is performing type checks, not at runtime.
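A small illustration of the pattern (note the annotation must be quoted, or `from __future__ import annotations` must be used, because Logger does not exist at runtime):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by type checkers such as mypy; at runtime this
    # import never happens, which avoids cost and circular imports.
    from logging import Logger


def report(message: str, logger: "Logger") -> None:
    # The quoted annotation keeps this valid at runtime.
    logger.info(message)
```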

FAILURE = {"github_state": "failure", "description": "Pipeline failed."}


class PipelineContext(ABC):
alafanechere (Contributor):

This is an abstract class, but you created an instance of it in metadata_service.

I suggest you create a MetadataPipelineContext(PipelineContext). I would also declare __aenter__ and __aexit__ as abstract methods, and make your MetadataPipelineContext implement them.
If we eventually spot a pattern in a third child class, we could consider un-abstracting these methods and implementing a sensible default.
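A sketch of that shape, with the GitHub-status side effects in the child class being assumed behavior rather than the PR's actual code:

```python
from abc import ABC, abstractmethod


class PipelineContext(ABC):
    @abstractmethod
    async def __aenter__(self):
        ...

    @abstractmethod
    async def __aexit__(self, exc_type, exc_value, traceback):
        ...


class MetadataPipelineContext(PipelineContext):
    async def __aenter__(self):
        # e.g. report a "pending" commit status to GitHub (assumed behavior)
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        # e.g. report "failure" if exc_type is set, "success" otherwise
        # (assumed behavior); returning False propagates any exception.
        return False
```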

bnchrch (Contributor Author):

Hmm, good callout.

I want to push back slightly, but only because I may not fully understand.

Wouldn't the sensible defaults be what we have:

  1. On error -> set commit status to error
  2. On success -> set commit status to success

For the pipelines I'm intending to implement, I think that's all I need?

These are the tests I'm thinking about:

  1. Run metadata lib unit tests -> pass/fail
  2. Run metadata orchestrator unit tests -> pass/fail
  3. Run metadata validator -> pass/fail

alafanechere (Contributor):

You're right. Feel free to not declare this class as abstract then, if you think the default behavior you've implemented is broad and simple enough to be reusable.

Comment on lines 27 to 29
INSTALL_LOCAL_REQUIREMENTS_CMD = ["python", "-m", "pip", "install", "-r", "requirements.txt"]
INSTALL_CONNECTOR_PACKAGE_CMD = ["python", "-m", "pip", "install", "."]
INSTALL_POETRY_PACKAGE_CMD = ["python", "-m", "pip", "install", "poetry"]
POETRY_INSTALL_DEPENDENCIES_CMD = ["poetry", "install"]

DEFAULT_PYTHON_EXCLUDE = [".venv"]
POETRY_EXCLUDE = ["__pycache__"] + DEFAULT_PYTHON_EXCLUDE
CI_CREDENTIALS_SOURCE_PATH = "tools/ci_credentials"
CI_CONNECTOR_OPS_SOURCE_PATH = "tools/ci_connector_ops"
alafanechere (Contributor):

nit: what do you think about moving these constants directly into the environment functions they're used in? I think it may help readability, and they are not reused.

bnchrch (Contributor Author):

Generally in favor! With the exception of the paths.

I love getting the paths-in-constants pattern set up early, so we don't have to play "find where we hardcoded the path" later.

alafanechere (Contributor):

Sounds good to me.

METADATA_LIB_MODULE_PATH = "airbyte-ci/connectors/metadata_service/lib"


class MetadataLibRunTest(Step):
bnchrch (Contributor Author):

@alafanechere Putting this up now, slightly incomplete (needs some comments, and classes moved to new files), just to get your thoughts on it.

Generally, do you feel like I'm using the classes you defined in the correct spirit of things?

Also, I had an idea while writing this: what if we made the PipelineContext hold onto a collection of steps and then be responsible for running them?

```python
async def run_metadata_pipeline(cli_context):
    async with dagger.Connection(DAGGER_CONFIG) as dagger_client:
        metadata_pipeline_context = PipelineContext(
            pipeline_name="Metadata Service Lib Unit Test Pipeline",
            is_local=cli_context.obj["is_local"],
            git_branch=cli_context.obj["git_branch"],
            git_revision=cli_context.obj["git_revision"],
            gha_workflow_run_url=cli_context.obj.get("gha_workflow_run_url"),
            pipeline_start_timestamp=cli_context.obj.get("pipeline_start_timestamp"),
            ci_context=cli_context.obj.get("ci_context"),
            dagger_client=dagger_client,
            steps=[
                MetadataLibRunTest,
            ],
        )

        async with metadata_pipeline_context:
            test_report = await metadata_pipeline_context.run()
            return test_report
```
Where the pipeline context becomes concerned with everything that happens around a pipeline run:

  1. Logging
  2. Reporting status to GitHub
  3. Ordering steps
  4. Reporting the outcome

alafanechere (Contributor) commented Mar 31, 2023:

> Also, I had an idea while writing this: what if we made the PipelineContext hold onto a collection of steps and then be responsible for running them?

I like it, but it would mean implementing a PipelineContext (we might consider renaming PipelineContext to Pipeline) for each pipeline type, and PipelineContext would need to be abstract (as it was before). I feel like it's too big of a lift to make this happen in the current PR, as it will change quite a lot of things in your PR for adapting the connector pipeline, and in this PR.
Would you mind creating an issue on the "Connectors CI in production" epic to track this future improvement?

alafanechere (Contributor):

> Generally, do you feel like I'm using the classes you defined in the correct spirit of things?

Yep, thank you!!!

alafanechere (Contributor) left a comment:

I approve to unblock you. I shared my last suggestions.
For the command group discussion, on second thought I'm not against subgroups, but I would love to have a concise CLI:

airbyte-ci connectors test --name=source-pokeapi
airbyte-ci connectors publish --name=source-pokeapi
airbyte-ci metadata test lib (lib being an argument)
airbyte-ci cdk test
airbyte-ci cdk publish
airbyte-ci connector-acceptance-test test
etc.
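A minimal click sketch of that shape, using the command and group names from the examples above (the command bodies are placeholders, not the PR's implementation):

```python
import click


@click.group()
def airbyte_ci():
    """Top-level airbyte-ci command group."""


@airbyte_ci.group()
def connectors():
    """Connector-related pipelines."""


@connectors.command()
@click.option("--name", required=True)
def test(name):
    # Placeholder: the real command would kick off a Dagger pipeline.
    click.echo(f"Running tests for {name}")


@airbyte_ci.group()
def metadata():
    """Metadata-service-related pipelines."""


@metadata.command("test")
@click.argument("component", type=click.Choice(["lib", "orchestrator"]))
def metadata_test(component):
    click.echo(f"Running metadata {component} tests")


if __name__ == "__main__":
    airbyte_ci()
```

Invoked as, e.g., `airbyte-ci connectors test --name=source-pokeapi`.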

async with metadata_pipeline_context:
    result = await MetadataLibRunTest(metadata_pipeline_context).run()
    test_report = TestReport(pipeline_context=metadata_pipeline_context, steps_results=[result])
    test_report.print()
alafanechere (Contributor):

I made this print happen in __aexit__ of ConnectorTestContext. I would suggest you print the test report in the PipelineContext.
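A sketch of moving the print into the context's __aexit__, assuming the context holds the TestReport built while the block runs (attribute name and wiring are assumptions):

```python
class PipelineContext:
    def __init__(self):
        self.test_report = None  # assumed: set by the steps run inside the block

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        # Printing here means every pipeline reports results the same way,
        # without each command having to remember to call print().
        if self.test_report is not None:
            self.test_report.print()
        return False
```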

class MetadataLibRunTest(Step):
title = "Run Metadata Service Lib Unit Tests"
alafanechere (Contributor):

nit for later: I'm wondering if we should add a description attribute that could explain to the world the philosophy of the implemented pipeline, with links to the code under test, etc. It would be interesting metadata to propagate to logs or to use to automatically generate docs.
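As a sketch, the hypothetical attribute could look like this (description is this suggestion, not part of the current Step class):

```python
class MetadataLibRunTest(Step):  # Step assumed from bases.py
    title = "Run Metadata Service Lib Unit Tests"
    # Hypothetical attribute: prose that could be propagated to logs
    # or used to auto-generate docs for the pipeline.
    description = (
        "Runs poetry run pytest against "
        "airbyte-ci/connectors/metadata_service/lib."
    )
```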

bnchrch (Contributor Author):

I like that idea! Feel free to add an issue for it!

type=str,
)
@click.option("--gha-workflow-run-id", help="[CI Only] The run id of the GitHub action workflow", default=None, type=str)
@click.option("--ci-context", default="manual", envvar="CI_CONTEXT", type=click.Choice(CI_CONTEXT_VALUES))
alafanechere (Contributor):

I think you can directly use the enum class in Choice. It worked for ConnectorLanguage:
https://github.com/airbytehq/airbyte/blob/master/tools/ci_connector_ops/ci_connector_ops/pipelines/connectors_ci.py#L137
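A sketch of what that could look like with a str-backed enum. Whether the enum class itself can be passed to Choice depends on the click version, so the value-list form below is the conservative variant (CIContext is a hypothetical name mirroring the CI_CONTEXT_VALUES choices):

```python
from enum import Enum

import click


class CIContext(str, Enum):
    MANUAL = "manual"
    PULL_REQUEST = "pull_request"
    NIGHTLY_BUILDS = "nightly_builds"


@click.command()
@click.option(
    "--ci-context",
    default=CIContext.MANUAL.value,
    envvar="CI_CONTEXT",
    # Conservative: pass the values; newer click versions also accept
    # the enum class itself, as the linked ConnectorLanguage example does.
    type=click.Choice([c.value for c in CIContext]),
)
def pipeline(ci_context):
    click.echo(f"CI context: {ci_context}")
```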

alafanechere (Contributor):

@bnchrch As you modified the CLI, could you please update the existing GHA workflows using it in this PR (or refactor them to use the action you wrote)?

@bnchrch force-pushed the bnchrch/cicd-metadata-test branch from ada9db0 to ba5583f on April 3, 2023 21:16
@bnchrch merged commit 082303a into master Apr 3, 2023
@bnchrch deleted the bnchrch/cicd-metadata-test branch April 3, 2023 22:37