

feat(query-backfill): adding a more flexible approach to overriding scheduling attributes #5540

Merged
braunreyes merged 7 commits into main from feature/AD-243-backfill-enhancement-duex on May 15, 2024

braunreyes (Contributor) commented on May 9, 2024

This PR further expands support for overriding scheduling attributes when running the bqetl query backfill command. bqetl lets developers set date_partition_parameter to null and manage partitioning manually via the destination table. This makes it impossible to use the backfill DAG bqetl_backfill, because when this parameter is null, the backfill overwrites the entire table.
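
To illustrate why a null partition parameter is destructive, here is a minimal sketch (not bqetl's actual implementation) of how a backfill run might choose its write target. It assumes each run uses a truncating write and that a non-null `date_partition_parameter` maps the run date onto BigQuery's `table$YYYYMMDD` partition decorator. The helper `resolve_destination`, the table name, and the `submission_date` parameter value are hypothetical.

```python
from datetime import date
from typing import Optional


def resolve_destination(
    table: str, date_partition_parameter: Optional[str], run_date: date
) -> str:
    """Return the write target for one backfill run.

    Hypothetical helper, not part of bqetl. Assumes each run truncates
    (overwrites) whatever target it writes to.
    """
    if date_partition_parameter is None:
        # No partition parameter: the target is the whole table, so a
        # truncating write replaces every existing partition.
        return table
    # Partition parameter set: target only the run date's daily partition
    # via BigQuery's partition decorator.
    return f"{table}${run_date:%Y%m%d}"


table = "my-project.my_dataset.example_v1"  # hypothetical table
print(resolve_destination(table, "submission_date", date(2024, 5, 1)))
# my-project.my_dataset.example_v1$20240501
print(resolve_destination(table, None, date(2024, 5, 1)))
# my-project.my_dataset.example_v1
```

Under these assumptions, every day of a multi-day backfill with a null parameter would rewrite the full table rather than one partition.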

For this query: https://github.com/mozilla/private-bigquery-etl/blob/main/sql/moz-fx-data-shared-prod/ads_derived/tiles_addressable_inventory_hourly_v1/metadata.yaml, we need to be able to reprocess a whole day's worth of data and either overwrite the existing partition or add a new one.

This change would allow us to backfill a query like this one with attributes that enable proper daily backfill processing, while the ongoing scheduled task runs keep their custom logic.
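
The override approach described above can be sketched as a pure function that layers backfill-only attributes over a query's scheduling metadata without touching the attributes used by scheduled runs. This is a minimal illustration under stated assumptions: the helper `apply_scheduling_overrides` and the sample values are hypothetical, the merge is shallow (an override replaces a key's whole value), and it does not reflect the exact CLI option or code added in this PR.

```python
import copy
from typing import Any, Dict


def apply_scheduling_overrides(
    scheduling: Dict[str, Any], overrides: Dict[str, Any]
) -> Dict[str, Any]:
    """Return backfill scheduling attributes with overrides applied.

    Hypothetical helper: the input dict is deep-copied so the attributes
    used by ongoing scheduled runs are never modified. The merge is
    shallow, so an override replaces the entire value of its key.
    """
    merged = copy.deepcopy(scheduling)
    merged.update(overrides)
    return merged


# Hypothetical metadata for a query that manages partitioning itself.
ongoing = {"dag_name": "bqetl_example", "date_partition_parameter": None}
backfill = apply_scheduling_overrides(
    ongoing, {"date_partition_parameter": "submission_date"}
)
print(backfill)  # {'dag_name': 'bqetl_example', 'date_partition_parameter': 'submission_date'}
print(ongoing)   # {'dag_name': 'bqetl_example', 'date_partition_parameter': None}
```

Keeping the override a separate, non-mutating step means the backfill can use daily partition semantics while the stored metadata (and therefore the scheduled DAG) keeps its custom partition handling.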

This work will eventually close: #5364

Checklist for reviewer:

  • Commits should reference a bug or GitHub issue, if relevant (if a bug is referenced, the pull request should include the bug number in the title).
  • [N/A] If the PR comes from a fork, trigger integration CI tests by running the Push to upstream workflow and provide the <username>:<branch> of the fork as a parameter. The parameter will also show up in the logs of the manual-trigger-required-for-fork CI task together with more detailed instructions.
  • [N/A] If adding a new field to a query, ensure that the schema and dependent downstream schemas have been updated.
  • [N/A] When adding a new derived dataset, ensure that data is not available already (fully or partially) and recommend extending an existing dataset in favor of creating new ones. Data can be available in the bigquery-etl repository, looker-hub or in looker-spoke-default.

For modifications to schemas in restricted namespaces (see CODEOWNERS):

Issue is synchronized with this Jira Task

Resolved review threads (outdated):
  • tests/cli/test_cli_query.py
  • bigquery_etl/cli/query.py (3 threads)
braunreyes merged commit bd5ffe4 into main on May 15, 2024
22 checks passed
braunreyes deleted the feature/AD-243-backfill-enhancement-duex branch on May 15, 2024 at 20:16
Development: merging this pull request may close the linked issue "Adding the ability to override scheduling parameters on backfill".
3 participants