Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CAT tools/acceptance_test_config_migration: Make relatively configurable per migration, add (hacky) ability to run CAT tests locally for multiple connectors #24377

Merged
merged 78 commits into from
Apr 18, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
78 commits
Select commit Hold shift + click to select a range
f596e3f
Pass argument along, add test that should pass and test that should fail
erohmensing Mar 13, 2023
c88305e
Add tests with additionalProperties
erohmensing Mar 13, 2023
1f91a59
Set additionalproperties=false when not set|
erohmensing Mar 13, 2023
a8ef276
Parametrize test cases
erohmensing Mar 13, 2023
3e5cf1f
Make the behavior 'optional'
erohmensing Mar 13, 2023
a7eb6f6
Fix parametrization for all combinations
erohmensing Mar 13, 2023
2545f8f
Improve CI credentials README and rename param
erohmensing Mar 14, 2023
356b12a
Update naming to be clearer about columns only
erohmensing Mar 20, 2023
a53c31a
record_has_unexpected_field > record_has_unexpected_column
erohmensing Mar 20, 2023
c82879e
Automated Change
marcosmarxm Mar 20, 2023
646b372
Hacking the CAT dockerfile and run script to test my changes specific…
erohmensing Mar 20, 2023
fa72e00
First crack at running CAT on all connectors
erohmensing Mar 20, 2023
be92aef
Write during instead of after all tests
erohmensing Mar 20, 2023
96eb3fc
Async-ify it
erohmensing Mar 20, 2023
9fc650c
Add ability to define max concurrency
erohmensing Mar 20, 2023
ede50bd
Write successes
erohmensing Mar 20, 2023
cea154a
ci_credentials: fix overwriting 'data' before getting nextPageToken
erohmensing Mar 20, 2023
a2569e4
Adjustible num_semaphores, check to make sure it's an airbyte connect…
erohmensing Mar 20, 2023
63a273b
Automated Change
erohmensing Mar 20, 2023
f729dc2
Make create_issues and create_prs more configurable, add issue for fa…
erohmensing Mar 20, 2023
1aa9644
Add ability to pass in sources as a list or from a txt file
erohmensing Mar 20, 2023
6535110
Add logs to issue, make project nullable
erohmensing Mar 21, 2023
9de27c3
Merge branch 'master' into ella/no-extra-fields
erohmensing Mar 21, 2023
ae1c263
Migrate multiple connectors
erohmensing Mar 22, 2023
b538a8c
Add cli args
erohmensing Mar 22, 2023
358b0f8
use ruamel.yaml to preserve ordering
erohmensing Mar 22, 2023
f42a3a9
Separate config loading from config migration
erohmensing Mar 22, 2023
abe4006
Add ability to pass in lists of sources to test. Sort output by exit …
erohmensing Mar 22, 2023
f61804d
Merge branch 'master' into ella/run-all-CATS
erohmensing Mar 22, 2023
7655714
Default to testing only beta and GA connectors
erohmensing Mar 22, 2023
b460b1c
Always write test output when available
erohmensing Mar 22, 2023
532a35e
Merge branch 'master' into ella/no-extra-fields
erohmensing Mar 22, 2023
c498f9f
Merge branch 'ella/no-extra-fields' into ella/run-all-CATS
erohmensing Mar 22, 2023
6a17ba1
Revert "Add cli args"
erohmensing Mar 22, 2023
92bf491
Remove slash
erohmensing Mar 22, 2023
8d9cfa7
Don't run on alpha connetors, handle older config style
erohmensing Mar 22, 2023
e739e12
Don't migrate to new format, preserve quotes and long lines
erohmensing Mar 22, 2023
fb06bd6
Automated Change
erohmensing Mar 23, 2023
6091f69
Merge branch 'master' into ella/run-all-CATS
erohmensing Mar 23, 2023
9f865fe
Merge branch 'master' into ella/run-all-CATS
erohmensing Mar 24, 2023
a9003ed
Update issue, don't run for alpha connectors
erohmensing Mar 24, 2023
a487f30
Automated Change
erohmensing Mar 24, 2023
1827e5d
Merge branch 'master' into ella/run-all-CATS
erohmensing Mar 26, 2023
027fd23
Add bypass for extra fields test
erohmensing Mar 27, 2023
18b9c10
Add bypass for extra fields test
erohmensing Mar 27, 2023
20f0c96
Rename run_tests script
erohmensing Mar 27, 2023
022ce66
Rename module
erohmensing Mar 27, 2023
7876cdf
Update args usage, small changes
erohmensing Mar 27, 2023
c603e9f
Refactor create_issues.py
erohmensing Mar 28, 2023
73cc08e
Clean up run_tests.py
erohmensing Mar 28, 2023
b8364d1
Sort out arg parsers
erohmensing Mar 28, 2023
99d1100
Pull out get_valid_definitions_from_args
erohmensing Mar 28, 2023
4265ff8
Merge branch 'master' into ella/run-all-CATS
erohmensing Mar 28, 2023
fc3fa2b
Import definitions module instead of methods
erohmensing Mar 28, 2023
42da5f1
Use config files to provide constants for each migration
erohmensing Mar 28, 2023
d0896c5
Handle FileNotFoundError in create_issues.py, improve logging
erohmensing Mar 28, 2023
b1d91fe
Rename to migrations, reference name of folder via utils
erohmensing Mar 29, 2023
1a6157d
Update readmes for migration modules, add script for getting outputs
erohmensing Mar 29, 2023
5858e4d
Use tmp dir, correct path for issue reference
erohmensing Mar 29, 2023
be8b8e2
Fix bash script
erohmensing Mar 29, 2023
ba0b757
Fix create command, pull out test results insertion
erohmensing Mar 29, 2023
94bbf12
Update call to update_configuration
erohmensing Mar 30, 2023
46b9c5e
add precommit to requirements
erohmensing Mar 30, 2023
66446be
Reorder README
erohmensing Mar 30, 2023
887f2dc
README cleanup for test and create issues
erohmensing Mar 30, 2023
6bfeb69
README cleanup for create_prs and config_migration
erohmensing Mar 30, 2023
7fe82b0
More readmes! Readmes galore
erohmensing Mar 30, 2023
d05404e
allow_beta
erohmensing Mar 30, 2023
7d9fb14
Restore hacky changes to dockerfile and acceptance-test-docker
erohmensing Mar 30, 2023
3aa0d33
Handle 'other' release stages
erohmensing Mar 30, 2023
511476b
Update readme
erohmensing Mar 30, 2023
eefc8d2
Remove TODO, add comments to shell script
erohmensing Mar 30, 2023
330b59c
Merge branch 'master' into ella/run-all-CATS
erohmensing Mar 30, 2023
9e7d6ce
format according to gradle
erohmensing Mar 31, 2023
a328ec1
fix merge conflict
erohmensing Apr 18, 2023
dc8d20a
format
erohmensing Apr 18, 2023
5ba051c
Merge branch 'master' into ella/run-all-CATS
erohmensing Apr 18, 2023
7c88cb1
Fix formatting
erohmensing Apr 18, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original file: airbyte-integrations/bases/connector-acceptance-test/tools/strictness_level_migration/README.md

(diff was too large for it to clock a move vs a delete and add)

Original file line number Diff line number Diff line change
@@ -0,0 +1,296 @@
# Tooling for automated migration of `acceptance-test-config.yml` files

This directory contains scripts that can help us manage the migration of connectors' `acceptance-test-config.yml` files.

## Setup
Before running these scripts you need to set up a local virtual environment in the **current directory**:
```bash
python -m venv .venv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small idea / nit here.

What are your thoughts on using poetry here over venv + pip?

It would allow us to reduce the number of steps to install and at the same time remove the whole "whoops I forgot to activate my venv" problem

poetry install
poetry run run_tests.py

This is a tool weve been using with the Dagger and Metadata work and so far its been working well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no preference either way since this is "Connector ops" code - happy to switch over! Especially if in the future people won't be doing much running of the scripts of their own (Ideally there's only one command to perform the testing, migrating failed configs, opening issues, etc.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be too big of a lift. But very optional, up to you in terms of how much other work you have on your plate :)

source .venv/bin/activate
pip install -r requirements.txt
brew install gh
```

Then create a module to contain the information for your migration/issues etc.:
```
mkdir migrations/<migration_name>
touch migrations/<migraton_name>/__init__.py
touch migrations/<migration_name>/config.py
```

Copy a config.py file from another migration and fill in the `MODULE_NAME` variable. The other variables
can be filled in when you use certain scripts.
```python
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#

from typing import Optional, List

# SET THESE BEFORE USING THE SCRIPT
MODULE_NAME: str = "<migration_name>"
# GITHUB_PROJECT_NAME: Optional[str] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these commented out? but not module name?

Copy link
Contributor Author

@erohmensing erohmensing Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MODULE_NAME is required for all scripts, and should match the name of the module you just made (this could be made clearer), so I thought it would make sense to set it while you're doing the project setup.

The others are only used for certain scripts, e.g. you don't need to define them to run run_tests.py or config_migration.py, so the instruction to define them lives under those scripts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# COMMON_ISSUE_LABELS: List[str] = ["area/connectors", "team/connectors-python", "type/enhancement", "column-selection-sources"]
# ISSUE_TITLE: str = "Add undeclared columns to spec"
```

## Scripts

The scripts perform operations on a set of connectors.

### `run_tests.py`: Run CAT tests and save results by exit code

#### What it does

The script will run CAT on all connectors and report the results to `migrations/<migration_name>/output` and classify the
results by exit code.

TODO: Replace this process with Dagger

#### Before running

1. The tests will run on the `latest` version of CAT by default. To run the `dev` version of CAT, and select a specific
test, commit the hacky changes in [this commit](https://github.com/airbytehq/airbyte/pull/24377/commits/7d9fb1414911a512cd5d5ffafe2a384e8004fb1e).
erohmensing marked this conversation as resolved.
Show resolved Hide resolved

2. Give Docker a _lot_ of space to build all the connector images!

3. Make sure you have the secrets downloaded from GSM for all of the connectors you want to run tests on. Please keep
in mind that secrets need to be re-uploaded for connectors with single-use Oauth tokens.

#### How to run

Typical usage:
```
python run_tests.py
```

Full options:
```
usage: run_tests.py [-h] --connectors [CONNECTORS ...] [--allow_alpha | --no-allow_alpha] [--allow_beta | --no-allow_beta] [--max_concurrency MAX_CONCURRENCY]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also is there any benefit to wrapping this in the click library that augustin used in QA checks?


Run connector acceptance tests for a list of connectors.

options:
-h, --help show this help message and exit
--connectors [CONNECTORS ...]
A list of connectors (separated by spaces) to run a script on. (default: all connectors)
--allow_alpha, --no-allow_alpha
Whether to apply the change to alpha connectors, if they are included in the list of connectors. (default: False)
--allow_beta, --no-allow_beta
Whether to apply the change to bets connectors, if they are included in the list of connectors. (default: False)
--max_concurrency MAX_CONCURRENCY
The maximum number of acceptance tests that should happen at once.
```

### `create_issues.py`: Create issues in bulk

#### What it does:
For each connector:
1. Generate an issue content (title, body, labels, project), using an `issue.md.j2` template
2. Find an already existing issue with the same title, if one already exists
3. Create the issue and return its url if it does not exist.

Issues get created with the title according to `ISSUE_TITLE`. Labels are added according to `COMMON_ISSUE_LABELS`. Issues are added to the `GITHUB_PROJECT_NAME` project, if one is provided.

#### Before running

1. Update your config file to define the following variables:

```python
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#

from typing import Optional, List

# SET THESE BEFORE USING THE SCRIPT
MODULE_NAME: str = "<migration_name>"
GITHUB_PROJECT_NAME: Optional[str] = "<project_name>"
erohmensing marked this conversation as resolved.
Show resolved Hide resolved
COMMON_ISSUE_LABELS: List[str] = ["<label_1>", "<label_2>", "..."]
ISSUE_TITLE: str = "<issue_title>"
```
Note that `ISSUE_TITLE` will be prepended with `Source <source name>:` in the actual created issue.

2. Create a template for your issue:

```bash
touch migrations/<migration_name>/issue.md.j2
```

If you need to fill more variables than are currently defined in the call to `template.render()`
in `create_issues.py`, edit the script to allow filling of that variable and define how it should be
filled. Please keep in mind the other migrations when you do this.

3. Update the following line in the script so that it points to the config file from your migration:

```python
## Update this line before running the script
from migrations.<migration_name> import config
```

#### How to run:

Typical usage (dry run):
```
python create_issues.py --connectors <connectors>
```

Typical usage (real execution):
```
python create_issues.py --connectors <connectors> --no-dry
```

Full options:
```
usage: create_issues.py [-h] [-d | --dry | --no-dry] --connectors [CONNECTORS ...] [--allow_beta | --no-allow_beta] [--allow_alpha | --no-allow_alpha]

Create issues for a list of connectors from a template.

options:
-h, --help show this help message and exit
-d, --dry, --no-dry Whether the action performed is a dry run. In the case of a dry run, no git actions will be pushed to the remote. (default: True)
--connectors [CONNECTORS ...]
A list of connectors (separated by spaces) to run a script on. (default: all connectors)
--allow_beta, --no-allow_beta
Whether to apply the change to bets connectors, if they are included in the list of connectors. (default: False)
--allow_alpha, --no-allow_alpha
Whether to apply the change to alpha connectors, if they are included in the list of connectors. (default: False)
```


### `config_migration.py`: Perform migrations on `acceptance-test-config.yml` files

#### What it does:
For each connector:
1. Load the connector's `acceptance-test-config.yml`
2. Migrate the connector to the new format if this option was chosen
3. Apply the given migration to the `acceptance-test-config.yml` file

Note that all changes happen on the working branch.

#### Before running

1. Create a method in `config_migration.py` to perform your migration on a given config. For this,
You can take inspiration from the existing `set_high_strictness_level` and `set_ignore_extra_columns`
migration methods.

2. Update the following line to point to your new migration:
```python
# Update this before running the script
MIGRATION_TO_RUN = <migration_method>
```

#### How to run:

Typical usage:
```
python config_migration.py --connectors <connectors>
```

Full options:
```
usage: config_migration.py [-h] --connectors [CONNECTORS ...] [--allow_alpha | --no-allow_alpha] [--allow_beta | --no-allow_beta] [--migrate_from_legacy | --no-migrate_from_legacy]

Migrate acceptance-test-config.yml files for a list of connectors.

options:
-h, --help show this help message and exit
--connectors [CONNECTORS ...]
A list of connectors (separated by spaces) to run a script on. (default: all connectors)
--allow_alpha, --no-allow_alpha
Whether to apply the change to alpha connectors, if they are included in the list of connectors. (default: False)
--allow_beta, --no-allow_beta
Whether to apply the change to bets connectors, if they are included in the list of connectors. (default: False)
--migrate_from_legacy, --no-migrate_from_legacy
Whether to migrate config files from the legacy format before applying the migration. (default: False)
```


### `create_prs.py`: Create a PR per connector that performs a config migration and pushes it

## Create migration PRs for GA connectors (`create_prs.py`)

#### What it does:
For each connector:
1. Create a branch and check it out
2. Locally apply the migration to `acceptance_test_config.yml` by calling `config_migration.py`
3. Commit and push the changes on this branch
4. Open a PR for this branch
5. Run a connector acceptance test on this branch by posting a `/test` comment on the PR

An example of the PR it creates can be found [here](https://github.com/airbytehq/airbyte/pull/19136).

PRs get created with the title according to `ISSUE_TITLE`. Labels are added according to `COMMON_ISSUE_LABELS`. PRs are added to the `GITHUB_PROJECT_NAME` project, if one is provided.

#### Before running

1. Update your config file to define the following variables:

```python
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#

from typing import Optional, List

# SET THESE BEFORE USING THE SCRIPT
MODULE_NAME: str = "<migration_name>"
GITHUB_PROJECT_NAME: Optional[str] = "<project_name>"
COMMON_ISSUE_LABELS: List[str] = ["<label_1>", "<label_2>", "..."]
ISSUE_TITLE: str = "<issue_title>"
```
Note that `ISSUE_TITLE` will be prepended with `Source <source name>:` in the actual created issue.

2. Create a template for your PR description:

```bash
touch migrations/<migration_name>/pr.md.j2
```

If you need to fill more variables than are currently defined in the call to `template.render()`
in `create_pr.py`, edit the script to allow filling of that variable and define how it should be
filled. Please keep in mind the other migrations when you do this.

3. Update the following line in the `create_prs.py` so that it points to the config file from your migration:

```python
## Update this line before running the script
from migrations.<migration_name> import config
```

4. Ensure that your current git envronment is clean by (perhaps temorarily) committing changes.


#### How to run:

Typical usage (dry run):
```
python create_prs.py --connectors <connectors>
```

Typical usage (real execution):
```
python create_prs.py --connectors <connectors> --no-dry
```

Full options:
```
usage: create_prs.py [-h] [-d | --dry | --no-dry] --connectors [CONNECTORS ...] [--allow_alpha | --no-allow_alpha] [--allow_beta | --no-allow_beta]

Create PRs for a list of connectors from a template.

options:
-h, --help show this help message and exit
-d, --dry, --no-dry Whether the action performed is a dry run. In the case of a dry run, no git actions will be pushed to the remote. (default: True)
--connectors [CONNECTORS ...]
A list of connectors (separated by spaces) to run a script on. (default: all connectors)
--allow_alpha, --no-allow_alpha
Whether to apply the change to alpha connectors, if they are included in the list of connectors. (default: False)
--allow_beta, --no-allow_beta
Whether to apply the change to bets connectors, if they are included in the list of connectors. (default: False)
```

## Existing migrations
* `strictness_level_migration`: Migrates a connector from the old format to the new format, and adds enforcement of high strictness level.
* `fail_on_extra_columns`: Adds `fail_on_extra_columns: false` to connectors which fail the `Additional properties are not allowed` extra column validation.
Supports adding this parameter to configs in the old and new config format.
Copy link
Contributor Author

@erohmensing erohmensing Mar 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original file: airbyte-integrations/bases/connector-acceptance-test/tools/strictness_level_migration/config_migration.py

(diff was too large for it to clock a move vs a delete and add)

Original file line number Diff line number Diff line change
@@ -0,0 +1,88 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#

import argparse
import logging
from pathlib import Path
from typing import Callable

import definitions
import utils
from connector_acceptance_test.config import Config
from ruamel.yaml import YAML

yaml = YAML()
yaml.preserve_quotes = True
yaml.width = 150

parser = argparse.ArgumentParser(description="Migrate acceptance-test-config.yml files for a list of connectors.")
utils.add_connectors_param(parser)
utils.add_allow_alpha_param(parser)
utils.add_allow_beta_param(parser)
parser.add_argument(
"--migrate_from_legacy",
action=argparse.BooleanOptionalAction,
default=False,
help="Whether to migrate config files from the legacy format before applying the migration.",
)


def load_config(config_path: Path) -> Config:
with open(config_path, "r") as file:
config = yaml.load(file)
return config


def migrate_to_new_config_format(config: Config):
if Config.is_legacy(config):
return Config.migrate_legacy_to_current_config(config)
else:
logging.warning("The configuration is not in a legacy format.")
return config


def set_high_test_strictness_level(config):
if Config.is_legacy(config):
raise Exception("You can't set a strictness level on a legacy config. Please use the `--migrate_from_legacy` flag.")
config["test_strictness_level"] = "high"
for basic_read_test in config["acceptance_tests"].get("basic_read", {"tests": []})["tests"]:
basic_read_test.pop("configured_catalog_path", None)
return config


def set_ignore_extra_columns(config):
if Config.is_legacy(config):
for basic_read_test in config["tests"].get("basic_read"):
basic_read_test["fail_on_extra_columns"] = False
else:
for basic_read_test in config["acceptance_tests"].get("basic_read", {"tests": []})["tests"]:
basic_read_test["fail_on_extra_columns"] = False
return config


def write_new_config(new_config, output_path):
with open(output_path, "w") as output_file:
yaml.dump(new_config, output_file)
logging.info("Saved the configuration in its new format")


def update_configuration(config_path, migration: Callable, migrate_from_legacy: bool):
config_to_migrate = load_config(config_path)
if migrate_from_legacy:
config_to_migrate = migrate_to_new_config_format(config_to_migrate)
new_config = migration(config_to_migrate)
write_new_config(new_config, config_path)
logging.info(f"The configuration was successfully updated: {config_path}")
return config_path


if __name__ == "__main__":
args = parser.parse_args()

# Update this before running the script
MIGRATION_TO_RUN = set_high_test_strictness_level
Comment on lines +83 to +84
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, this is something I'd like to both define and set in the config. I.e. in the config, we have a method something like

def run_migration(config): 
   .... 

which we can use via as config.run_migration

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we turn this into an cli argument?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so - sort of a combo of "We should be able to import the module via a cli var" and "we should be able to define the migration in the module in a standardized way" would get us there.


for definition in utils.get_valid_definitions_from_args(args):
config_path = utils.acceptance_test_config_path(definitions.get_airbyte_connector_name_from_definition(definition))
update_configuration(config_path, MIGRATION_TO_RUN, args.migrate_from_legacy)
Loading