
Allow partition-wise copy from bq to bq tables. #26884

Closed
bhankit1410 opened this issue Oct 5, 2022 · 10 comments

Labels
area:providers · good first issue · kind:feature (Feature Requests) · provider:google (Google (including GCP) related issues)

Comments

@bhankit1410

Description

As of now, the existing code has no support for copying a partition from a source table into a destination table's partition.

The bq cp CLI gives an option to do this with the partition decorator ($).
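For example (dataset and table names here are placeholders), copying one day's partition with the CLI looks like this:

bq cp 'my_dataset.source_table$20221124' 'my_dataset.dest_table$20221124'

The quotes keep the shell from treating $20221124 as a variable.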

Use case/motivation

We have many daily use cases where we keep different tables across projects in sync by materialising new partitions on a schedule, so this feature would be great. The other option for now would be dbt, but adopting it just for this would be an added task.

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct
bhankit1410 added the kind:feature (Feature Requests) label on Oct 5, 2022
@boring-cyborg

boring-cyborg bot commented Oct 5, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

@o-nikolas
Contributor

@eladkal, @kaxil
Can you add the area:providers/provider:Google label to this one?

Also, it seems like a somewhat straightforward update to the operator (though GCP is not my area of expertise) so perhaps Good First Issue as well?

uranusjr added the provider:google (Google (including GCP) related issues) and area:providers labels on Oct 6, 2022
@kaxil
Member

kaxil commented Oct 6, 2022

@o-nikolas done

@eladkal
Contributor

eladkal commented Oct 6, 2022

@bhankit1410 We only expose capabilities available in the jobs API:
https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.copy
Can you point to this capability in the API?

@bugraoz93
Contributor

Hello, I think I can give the reference. I believe @bhankit1410 is talking about the following operation, which copies individual partitions to another table or to another table's partition:
https://cloud.google.com/bigquery/docs/managing-partitioned-tables#copying_partitions
This is the same job triggered by the copy logic in the BigQuery hook, which the BigQueryToBigQuery operator uses (the original comment linked the relevant hook and operator source snippets).
The only difference is that sourceTables.tableId and destinationTable.tableId can contain partition information for the same copy job: the table ID followed by a $ sign and the partition identifier.
https://cloud.google.com/bigquery/docs/reference/rest/v2/TableReference
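For illustration, here is a minimal sketch of the same partition-wise copy issued directly through the google-cloud-bigquery Python client (project, dataset, and table names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

# The $ decorator after each table ID targets a single partition.
copy_job = client.copy_table(
    "my-project.my_dataset.source_table$20221124",  # source partition
    "my-project.my_dataset.dest_table$20221124",    # destination partition
    job_config=bigquery.CopyJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    ),
)
copy_job.result()  # block until the copy job finishes

With a $ decorator on the destination, WRITE_TRUNCATE replaces only the targeted partition rather than the whole table.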

I am willing to submit a PR! If anyone can assign me the issue, that would be great! Thanks in advance!

@eladkal
Contributor

eladkal commented Nov 22, 2022

Assigned

@bugraoz93
Contributor

Thank you very much @eladkal!

@bugraoz93
Contributor

Hey @bhankit1410, I think this feature is already supported.

I tested this with BigQueryToBigQueryOperator using a simple DAG, shown below. I generated 1000 sample records with several column types (integer, UUID, and timestamp), inserted them into a table, and tested all three write dispositions. Every case I tried worked.

https://gist.github.com/bugraoz93/d3ee6d2d03d1881de4614d1e7c3b8234

import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.bigquery_to_bigquery import BigQueryToBigQueryOperator

with DAG(
    dag_id="test_dag",
    max_active_runs=1,
    start_date=datetime.datetime(2022, 11, 24),
    schedule_interval="@once",
    catchup=False,
    concurrency=1,
) as dag:
    project = "test_project"
    dataset = "bugraoz93_test"
    # The $ decorator selects a single partition of each table.
    source_table = "test_table$20221125"
    destination_table = "test_table_dest$20221124"

    copy_table_to_fact = BigQueryToBigQueryOperator(
        task_id="copy_test",
        gcp_conn_id="gcp_conn",
        source_project_dataset_tables="{project}.{dataset}.{table}".format(
            project=project, dataset=dataset, table=source_table
        ),
        destination_project_dataset_table="{project}.{dataset}.{table}".format(
            project=project, dataset=dataset, table=destination_table
        ),
        write_disposition="WRITE_APPEND",
        create_disposition="CREATE_IF_NEEDED",
        # passing dag=dag is unnecessary inside the `with DAG(...) as dag:` block
    )

I have also checked the code and tested the individual methods to ensure they can process the $ sign within the table name without any exceptions. There is no part of the code that prevents this feature from working.

Could you please expand on your case a little? Which Apache Airflow version are you using, and which version of the Google provider (apache-airflow-providers-google)?

@eladkal
Contributor

eladkal commented Dec 30, 2022

Thank you @bugraoz93
I'm closing this issue as completed then

eladkal closed this as completed on Dec 30, 2022
@bhankit1410
Author

Yes, I realised it works with the '$' decorator and went ahead; it has been in production for two months now. My mistake for not updating this thread and closing it. Sorry about that. Cheers.
