-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ci/cd support for metadata unit tests with Dagger #24649
Conversation
python-version: "3.10" | ||
- name: Install pipeline package | ||
run: pip install ./tools/ci_connector_ops\[pipelines]\ | ||
- name: Run airbyte-ci-pipeline metadata-service test-metadata-service-lib [PULL REQUESTS] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alafanechere I opted not to have a workflow dispatch, good idea or bad idea?
tools/ci_connector_ops/ci_connector_ops/pipelines/actions/environments.py
Show resolved
Hide resolved
exit_code = await with_exit_code(metadata_lib_module.with_exec(["poetry", "run", "pytest"])) | ||
|
||
# Raise an exception if the exit code is not 0 | ||
if exit_code != 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alafanechere I didnt like that this was where I ended up. Do you have any ideas as to how we can make the pipeline context own the handling of the status code?
For the record the reason I raise is to take advantage of the git error status reporting in __aexit__
inside the pipeline class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest you implement a Step
class like MetadataLibTest(Step)
and make its _run
return self.get_step_result(metadata_lib_module)
.
Like what I've done for QaChecks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also suggest you make use of ConnectorTestReport (or a Metadata version of it). This will help normalize the way we report success or failure to users.
5b1c7bc
to
4026b0a
Compare
4026b0a
to
dae4977
Compare
@@ -0,0 +1,55 @@ | |||
name: "Run Dagger pipeline" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alafanechere What do you think about the dagger action? If your into it I can update the other dagger workflows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a nice initiative.
I think @cpdeethree would like to eventually have a single workflow which will run on every PR.
The logic of what pipeline runs would be inferred by the git diff in the entrypoint.
I we go down this way declaring an action could be overkill, but I like it for our current way of doing things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im thinking Ill add a few more commands before conors repo is ready.
Should I just use this action for my new commands then and leave the existing ones alone to avoid refactoring twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the action pattern so feel free to use it for your command and refactor the existing one in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad you plugged in this codebase!!! #collaboration
I think you can benefit a bit more from the classes I've declared in bases.py
.
I would also like to challenge the existence of subcommand group and would prefer simple CLI like: airbyte-ci test-connectors
or airbyte-ci test-metadata
@click.option("--ci-context", default="manual", envvar="CI_CONTEXT", type=click.Choice(["manual", "pull_request", "nightly_builds"])) | ||
@click.option("--pipeline-start-timestamp", default=get_current_epoch_time, envvar="CI_PIPELINE_START_TIMESTAMP", type=int) | ||
@click.pass_context | ||
def airbyte_ci_pipeline( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest having a main command group called airbyte-ci
with subcommands like test-connectors
. I don't think have a connectors-ci
subgroup is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, my thought with this was that these commands were always going to relate to running dagger pipelines (hence the prefix) but Im happy to change it to airbyte-ci
On the note of groups. Im thinking that as this grows we will want different command groups? one set thats specific for connector related tasks, another specific to metadata related tasks. Another perhaps for docs specific tasks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes sure, but I'd like to keep these commands as concise as possible... Would you mind following my suggestion for now and we'll probably re-evaluate the command structure in the future according to use cases growth?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think keeping the main command group directly under pipelines
in an airbyte_ci.py
module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont have exactly a strong opinion here. Just wanted a separation from this is the pipeline code and this is the command that runs the code.
thoughts about
commands/airbyte_ci.py
commands/airbyte_ci/test_connectors.py
commands/airbyte_ci/metadata_service.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to say no, do this instead. This is your project captain!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I say yes :) As long as it makes sense I'm in. Decoupling command code from pipeline code is great. It's not what you did so far right? This file has both a command and pipeline code:
tools/ci_connector_ops/ci_connector_ops/pipelines/commands/metadata_service.py
tools/ci_connector_ops/ci_connector_ops/pipelines/commands/metadata_service.py
Outdated
Show resolved
Hide resolved
tools/ci_connector_ops/ci_connector_ops/pipelines/commands/metadata_service.py
Outdated
Show resolved
Hide resolved
@@ -0,0 +1,55 @@ | |||
name: "Run Dagger pipeline" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a nice initiative.
I think @cpdeethree would like to eventually have a single workflow which will run on every PR.
The logic of what pipeline runs would be inferred by the git diff in the entrypoint.
I we go down this way declaring an action could be overkill, but I like it for our current way of doing things.
exit_code = await with_exit_code(metadata_lib_module.with_exec(["poetry", "run", "pytest"])) | ||
|
||
# Raise an exception if the exit code is not 0 | ||
if exit_code != 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also suggest you make use of ConnectorTestReport (or a Metadata version of it). This will help normalize the way we report success or failure to users.
@@ -19,6 +19,8 @@ | |||
def update_commit_status_check( | |||
sha: str, state: str, target_url: str, description: str, context: str, should_send=True, logger: Logger = None | |||
): | |||
logger.info(f"Attempting to create {state} status for commit {sha} on Github in {context} context.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call will fail if logger is None, which is the default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh my mistake.
Question, why do we do this above?
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from logging import Logger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is required in this context but sometime we only make imports to use these as type hints. And sometime we can't import a module because of circular dependencies.
if TYPE_CHECKING
allows to only import a module when mypy
is performing type checks.
FAILURE = {"github_state": "failure", "description": "Pipeline failed."} | ||
|
||
|
||
class PipelineContext(ABC): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an abstract class but you created instance of it in metadata_service
.
I suggest you create a MetadataPipelineContext(PipelineContext)
. I would also declare __aenter__
and __aexit__
as abstract methods, and you make your MetadataPipelineContext
implement these methods.
I we eventually spot a pattern on a third child class we could consider unabstracting these methods and implement a sensible default one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm good callout.
I want to push back slightly. But only because I may not fully understand.
But would sensible defaults be what we have
- On error -> set commit status to error
- On success -> set commit status to success
For the pipelines im intending to implement I think thats all I need?
These are the tests im thinking about
- Run metadata lib unit test -> pass/fail
- Run metadata orchestrator unit test -> pass/fail
- Run metadata validator -> pass/fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, feel free to not declare this class as abstract then if you think the default behavior you've implemented is broad and simple enough to be reusable.
tools/ci_connector_ops/ci_connector_ops/pipelines/actions/environments.py
Outdated
Show resolved
Hide resolved
INSTALL_LOCAL_REQUIREMENTS_CMD = ["python", "-m", "pip", "install", "-r", "requirements.txt"] | ||
INSTALL_CONNECTOR_PACKAGE_CMD = ["python", "-m", "pip", "install", "."] | ||
INSTALL_POETRY_PACKAGE_CMD = ["python", "-m", "pip", "install", "poetry"] | ||
POETRY_INSTALL_DEPENDENCIES_CMD = ["poetry", "install"] | ||
|
||
DEFAULT_PYTHON_EXCLUDE = [".venv"] | ||
POETRY_EXCLUDE = ["__pycache__"] + DEFAULT_PYTHON_EXCLUDE | ||
CI_CREDENTIALS_SOURCE_PATH = "tools/ci_credentials" | ||
CI_CONNECTOR_OPS_SOURCE_PATH = "tools/ci_connector_ops" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: what do you think about moving these constant directly to the environment function they're used in. I think it my help for readability and they are not reused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally in favor! With the exception of the paths.
Love getting the path in constants pattern set up early so we dont have to play find where we hardcoded the path later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
e5a6277
to
2aae329
Compare
METADATA_LIB_MODULE_PATH = "airbyte-ci/connectors/metadata_service/lib" | ||
|
||
|
||
class MetadataLibRunTest(Step): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alafanechere putting this up now slightly incomplete (needs some comments and classes moved to new files) just to get your thoughts on this.
Generally do you feel like im using the classes you defined in the correct spirit of things?
Also I had an idea while writing this. What if we made the PipelineContext hold onto a collection of steps and the be responsible for running them?
async def run_metadata_pipeline(cli_context):
async with dagger.Connection(DAGGER_CONFIG) as dagger_client:
metadata_pipleline_context = PipelineContext(
pipeline_name="Metadata Service Lib Unit Test Pipeline",
is_local=cli_context.obj["is_local"],
git_branch=cli_context.obj["git_branch"],
git_revision=cli_context.obj["git_revision"],
gha_workflow_run_url=cli_context.obj.get("gha_workflow_run_url"),
pipeline_start_timestamp=cli_context.obj.get("pipeline_start_timestamp"),
ci_context=cli_context.obj.get("ci_context")
dagger_client = dagger_client
steps = [
MetadataLibRunTest,
]
)
async with metadata_pipleline_context:
test_report = await metadata_pipleline_context.run()
return test_report
```
Where the pipeline context becomes concerned with everything that happens around a pipeline run
1. Logging
2. Reporting status to github
3. Ordering steps
4. Reporting outcome
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I had an idea while writing this. What if we made the PipelineContext hold onto a collection of steps and the be responsible for running them?
I like it but it would mean implementing a PipelineContext (we might consider renaming PipelineContext to Pipeline) for each pipeline type and PipelineContext would need to be abstract (as it was before). I feel like its too big of a lift to make this happen in the current PR as it will change quite a lot of things in your PR for adapting the connector pipeline, and in this PR.
Would you mind creating an issue on the "Connectors CI in production" epic to track this future improvement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally do you feel like im using the classes you defined in the correct spirit of things?
Yep thank you!!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the issue here: Update Dagger pipeline class to manage the running of steps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I approve to unblock you. I shared my last suggestions.
For the command group discussion, on a second thought I'm not against subgroup but I would love to have a concise CLI.
airbyte-ci connectors test --name=source-pokeapi
airbyte-ci connectors publish --name=source-pokeapi
airbyte-ci metadata test lib (lib being an argument)
airbyte-ci cdk test
airbyte-ci cdk publish
airbyte-ci connector-acceptance-test test
etc...
async with metadata_pipeline_context: | ||
result = await MetadataLibRunTest(metadata_pipeline_context).run() | ||
test_report = TestReport(pipeline_context=metadata_pipeline_context, steps_results=[result]) | ||
test_report.print() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this print happen in __aexit__
of ConnectorTestContext. I would suggest you print the test report in the PipelineContext
METADATA_LIB_MODULE_PATH = "airbyte-ci/connectors/metadata_service/lib" | ||
|
||
|
||
class MetadataLibRunTest(Step): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally do you feel like im using the classes you defined in the correct spirit of things?
Yep thank you!!!
|
||
|
||
class MetadataLibRunTest(Step): | ||
title = "Run Metadata Service Lib Unit Tests" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit for later: I'm wondering if we should add a description
attribute that could explain to the world the philosophy of the implemented pipeline, with links to the thing under test code etc. It would be interesting metadata to propagate to logs or automatically generate docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that idea! Feel free to add an issue for it!
type=str, | ||
) | ||
@click.option("--gha-workflow-run-id", help="[CI Only] The run id of the GitHub action workflow", default=None, type=str) | ||
@click.option("--ci-context", default="manual", envvar="CI_CONTEXT", type=click.Choice(CI_CONTEXT_VALUES)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can directly use the enum class in Choice
. It worked for ConnectorLanguage:
https://github.com/airbytehq/airbyte/blob/master/tools/ci_connector_ops/ci_connector_ops/pipelines/connectors_ci.py#L137
@bnchrch as you modified the CLI could you please update the existing GHA workflow using it in this PR? (or refactor them to use the action you wrote) |
2299af8
to
ada9db0
Compare
Co-authored-by: Augustin <augustin@airbyte.io>
ada9db0
to
ba5583f
Compare
What
We need to run our unit tests for the metadata library
How
airbyte-ci-pipelines