-
Notifications
You must be signed in to change notification settings - Fork 13.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrating Google AutoML example_dags to sys tests #32368
Merged
Merged
Changes from 1 commit
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -35,17 +35,14 @@ | |
AutoMLTrainModelOperator, | ||
) | ||
|
||
GCP_PROJECT_ID = os.environ.get("GCP_PROJECT_ID", "your-project-id") | ||
GCP_AUTOML_LOCATION = os.environ.get("GCP_AUTOML_LOCATION", "us-central1") | ||
GCP_AUTOML_TEXT_CLS_BUCKET = os.environ.get("GCP_AUTOML_TEXT_CLS_BUCKET", "gs://INVALID BUCKET NAME") | ||
|
||
# Example values | ||
DATASET_ID = "" | ||
ENV_ID = os.environ.get("SYSTEM_TESTS_ENV_ID") | ||
DAG_ID = "example_automl_classification" | ||
GCP_PROJECT_ID = os.environ.get("SYSTEM_TESTS_GCP_PROJECT", "default") | ||
GCP_AUTOML_LOCATION = "us-central1" | ||
|
||
# Example model | ||
MODEL = { | ||
"display_name": "auto_model_1", | ||
"dataset_id": DATASET_ID, | ||
"text_classification_model_metadata": {}, | ||
} | ||
|
||
|
@@ -55,7 +52,10 @@ | |
"text_classification_dataset_metadata": {"classification_type": "MULTICLASS"}, | ||
} | ||
|
||
IMPORT_INPUT_CONFIG = {"gcs_source": {"input_uris": [GCP_AUTOML_TEXT_CLS_BUCKET]}} | ||
|
||
DATA_SAMPLE_GCS_BUCKET_NAME = f"bucket_{DAG_ID}_{ENV_ID}" | ||
AUTOML_DATASET_BUCKET = f"gs://{DATA_SAMPLE_GCS_BUCKET_NAME}/automl-text/dataset.csv" | ||
IMPORT_INPUT_CONFIG = {"gcs_source": {"input_uris": [AUTOML_DATASET_BUCKET]}} | ||
|
||
extract_object_id = CloudAutoMLHook.extract_object_id | ||
|
||
|
@@ -65,24 +65,23 @@ | |
start_date=datetime(2021, 1, 1), | ||
catchup=False, | ||
tags=["example"], | ||
) as example_dag: | ||
) as dag: | ||
create_dataset_task = AutoMLCreateDatasetOperator( | ||
task_id="create_dataset_task", dataset=DATASET, location=GCP_AUTOML_LOCATION | ||
) | ||
|
||
dataset_id = cast(str, XComArg(create_dataset_task, key="dataset_id")) | ||
MODEL["dataset_id"] = dataset_id | ||
|
||
import_dataset_task = AutoMLImportDataOperator( | ||
task_id="import_dataset_task", | ||
dataset_id=dataset_id, | ||
location=GCP_AUTOML_LOCATION, | ||
input_config=IMPORT_INPUT_CONFIG, | ||
) | ||
|
||
MODEL["dataset_id"] = dataset_id | ||
|
||
create_model = AutoMLTrainModelOperator(task_id="create_model", model=MODEL, location=GCP_AUTOML_LOCATION) | ||
|
||
model_id = cast(str, XComArg(create_model, key="model_id")) | ||
|
||
delete_model_task = AutoMLDeleteModelOperator( | ||
|
@@ -99,10 +98,17 @@ | |
project_id=GCP_PROJECT_ID, | ||
) | ||
|
||
# TEST BODY | ||
import_dataset_task >> create_model | ||
# TEST TEARDOWN | ||
delete_model_task >> delete_datasets_task | ||
|
||
# Task dependencies created via `XComArgs`: | ||
# create_dataset_task >> import_dataset_task | ||
# create_dataset_task >> create_model | ||
# create_dataset_task >> delete_datasets_task | ||
|
||
Comment on lines
106
to
+107
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems like we need a watcher wdyt? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, we can have a watcher too for each of these tests |
||
from tests.system.utils import get_test_run # noqa: E402 | ||
|
||
# Needed to run the example DAG with pytest (see: tests/system/README.md#run_via_pytest) | ||
test_run = get_test_run(dag) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we hardcode these values? Shouldn't it be fetched from env var?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was following the convention of the other system tests written, for example in #30003 and also in the automl folder. Does it make sense to auto hard code them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc: @VladaZakharova might help here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main principle is to keep these tests self-sufficient. But they need an external resource and infra on which those resources run. We can control those resources, like creating a bucket on the fly, but if a user has a certain infra, IMO it would be best to give flexibility to the user to set up the test given the infra rather than looking at the test and setting up the infra. That would end up making these tests less useful wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice point. Any suggestions that i can follow? It will help me tweak the DAG according to your use cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, all the tests look good, except for the first point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Do you mean to say that we should have it as:
And so on?