Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Airbyte-ci: Add --metadata-query option #30330

Merged
merged 12 commits into from
Sep 13, 2023
Merged

Conversation

bnchrch
Copy link
Contributor

@bnchrch bnchrch commented Sep 12, 2023

Problem(s)

  1. Selecting which category of connectors to run is too rigid and requires boilerplate for each new field.
  2. Complex selections are impossible
  3. We want to run weekly tests based on ab_internal

Solution

Allow for filters based on metadata.yaml contents

related to #30218
e.g.

  1. airbyte-ci connectors --metadata-query="data.name == 'Postgres'" test
  2. airbyte-ci connectors --metadata-query="(data.ab_internal.ql > 100) & (data.ab_internal.sl < 200)" test

@bnchrch bnchrch requested review from alafanechere and a team September 12, 2023 01:22


@pytest.fixture()
def set_working_dir_to_repo_root(monkeypatch):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alafanechere I discovered we are not currently running these tests.

I imagine its because many of them depend on connectors being preset in the main repo.

This was a shortcut I took to have them passing again. But I want to raise a flag that it is a hack. Just not sure if its an acceptable short term hack.

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.
I've set it in airbyte-ci/pipelines in a slighlty different way:

@pytest.fixture(scope="session")
def airbyte_repo_path() -> Path:
    return Path(git.Repo(search_parent_directories=True).working_tree_dir)

@pytest.fixture(autouse=True, scope="session")
def from_airbyte_root(airbyte_repo_path):
    """
    Change the working directory to the root of the Airbyte repo.
    This will make all the tests current working directory to be the root of the Airbyte repo as we've set autouse=True.
    """
    original_dir = Path.cwd()
    os.chdir(airbyte_repo_path)
    yield airbyte_repo_path
    os.chdir(original_dir)

It's a bit less hacky as the git library finds the repo path.
The sessions scope is cool to not re-evaluate this fixture on each test function depending on it.

@@ -41,4 +41,4 @@ jobs:
gcp_gsm_credentials: ${{ secrets.GCP_GSM_CREDENTIALS }}
git_branch: ${{ steps.extract_branch.outputs.branch }}
github_token: ${{ secrets.GITHUB_TOKEN }}
subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --support-level=community' }} test"
subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --metadata-query=\"(data.ab_internal.ql > 100) & (data.ab_internal.sl < 200)\"' }} test"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will reduce our weekly connector tests from over 200 down to ~70.

The question is should we test the 100 level ql community connectors weekly as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if possible, it would still be nice to define this query as a variable that describes what it is for

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's cool! The parsing knows to convert the input into numbers and do math?!
Edit: Thanks, simple_eval!

Copy link
Contributor

@evantahler evantahler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 from me on the basics - I like the metadata query a lot, and you've got good tests around it

@@ -41,4 +41,4 @@ jobs:
gcp_gsm_credentials: ${{ secrets.GCP_GSM_CREDENTIALS }}
git_branch: ${{ steps.extract_branch.outputs.branch }}
github_token: ${{ secrets.GITHUB_TOKEN }}
subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --support-level=community' }} test"
subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --metadata-query=\"(data.ab_internal.ql > 100) & (data.ab_internal.sl < 200)\"' }} test"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's cool! The parsing knows to convert the input into numbers and do math?!
Edit: Thanks, simple_eval!

Copy link
Contributor

@erohmensing erohmensing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't comment on the short term hack - perhaps i'd pull it out for now to get this in and reconsider it later. Otherwise, added some documentation notes, but looks like a really nice solution!

@@ -41,4 +41,4 @@ jobs:
gcp_gsm_credentials: ${{ secrets.GCP_GSM_CREDENTIALS }}
git_branch: ${{ steps.extract_branch.outputs.branch }}
github_token: ${{ secrets.GITHUB_TOKEN }}
subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --support-level=community' }} test"
subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --metadata-query=\"(data.ab_internal.ql > 100) & (data.ab_internal.sl < 200)\"' }} test"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if possible, it would still be nice to define this query as a variable that describes what it is for

Comment on lines +297 to +306
Examples
--------
>>> connector.metadata_query_match("'s3' in data.name")
True

>>> connector.metadata_query_match("data.supportLevel == 'certified'")
False

>>> connector.metadata_query_match("data.ab_internal.ql >= 100")
True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great examples!

Comment on lines +47 to +53
assert connector.icon_path == Path(f"./airbyte-integrations/connectors/{connector.technical_name}/icon.svg")
assert len(connector.version.split(".")) == 3
else:
assert connector.metadata is None
assert connector.support_level is None
assert connector.acceptance_test_config is None
assert connector.icon_path == Path(f"./airbyte-config-oss/init-oss/src/main/resources/icons/{connector.name}.svg")
assert connector.icon_path == Path(f"./airbyte-integrations/connectors/{connector.technical_name}/icon.svg")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess these were tests that weren't being run, and didn't pass after we made them run?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactly! I want to get them run automatically again #30330 (comment)

Ill address that in another PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, thanks for fixing them in any case :D

@@ -122,6 +122,7 @@ Available commands:
| `--use-remote-secrets` | False | True | If True, connectors configuration will be pulled from Google Secret Manager. Requires the GCP_GSM_CREDENTIALS environment variable to be set with a service account with permission to read GSM secrets. If False the connector configuration will be read from the local connector `secrets` folder. |
| `--name` | True | | Select a specific connector for which the pipeline will run. Can be used multiple time to select multiple connectors. The expected name is the connector technical name. e.g. `source-pokeapi` |
| `--support-level` | True | | Select connectors with a specific support level: `community`, `certified`. Can be used multiple times to select multiple support levels. |
| `--metadata-query` | False | | Filter connectors by metadata query using `simpleeval`. e.g. 'data.ab_internal.ql == 200' |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add the link to simpleeval? and will everyone need to backslash the quotes in the usage like here? some more concrete examples of using this in the readme might be helpful

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

especially pointing out that data points to the top level of the metadata file - not sure people will know that.

non_empty_connector_sets = [
connector_set
for connector_set in [
selected_connectors_by_name,
selected_connectors_by_support_level,
selected_connectors_by_language,
selected_connectors_by_query,
selected_modified_connectors,
]
if connector_set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: line 100, add "by query" to the intersection of listed groups comment

@bnchrch bnchrch enabled auto-merge (squash) September 13, 2023 01:34
@bnchrch bnchrch merged commit 13a5a40 into master Sep 13, 2023
20 checks passed
@bnchrch bnchrch deleted the bnchrch/ci/weekly-ql-test branch September 13, 2023 01:53
Copy link
Contributor

@alafanechere alafanechere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool feature! I added a suggestion related to the test execution from the repo root.

@@ -41,4 +41,4 @@ jobs:
gcp_gsm_credentials: ${{ secrets.GCP_GSM_CREDENTIALS }}
git_branch: ${{ steps.extract_branch.outputs.branch }}
github_token: ${{ secrets.GITHUB_TOKEN }}
subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --support-level=community' }} test"
subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --metadata-query=\"(data.ab_internal.ql > 100) & (data.ab_internal.sl < 200)\"' }} test"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bnchrch looks like simple_eval is providing safe eval 👍



@pytest.fixture()
def set_working_dir_to_repo_root(monkeypatch):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.
I've set it in airbyte-ci/pipelines in a slighlty different way:

@pytest.fixture(scope="session")
def airbyte_repo_path() -> Path:
    return Path(git.Repo(search_parent_directories=True).working_tree_dir)

@pytest.fixture(autouse=True, scope="session")
def from_airbyte_root(airbyte_repo_path):
    """
    Change the working directory to the root of the Airbyte repo.
    This will make all the tests current working directory to be the root of the Airbyte repo as we've set autouse=True.
    """
    original_dir = Path.cwd()
    os.chdir(airbyte_repo_path)
    yield airbyte_repo_path
    os.chdir(original_dir)

It's a bit less hacky as the git library finds the repo path.
The sessions scope is cool to not re-evaluate this fixture on each test function depending on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants