Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-5318] Option to specify location of the new BQ dataset #5923

Conversation

mohannadbanayosi
Copy link
Contributor

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:

The extra str argument gives the option to easily specify the location where the user wants the new BQ dataset to reside. The location will be US if nothing is provided.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

Code Quality

  • Passes flake8

@mohannadbanayosi mohannadbanayosi changed the title AIRFLOW-5318 Option to specify location of the new BQ dataset [AIRFLOW-5318] Option to specify location of the new BQ dataset Aug 27, 2019
@mohannadbanayosi mohannadbanayosi force-pushed the AIRFLOW-5318_Specify_new_dataset_location_option branch from 304d355 to 1e50b9a Compare August 27, 2019 12:10
@mik-laj mik-laj added the provider:google Google (including GCP) related issues label Aug 27, 2019
@ryanyuan
Copy link
Contributor

Can the location just be provided in dataset_reference? It seems redundant to make it a seperate parameter here.

@mohannadbanayosi
Copy link
Contributor Author

Yes the location along with other options like the description and tags can be set using the dataset_reference.

BigQueryCreateEmptyDatasetOperator(
        task_id='create_dataset',
        project_id='project-id',
        dataset_id='new_dataset_id',
        dataset_reference={'location': 'EU'},
        dag=dag
)

I agree it is not a functionality improvements but rather just to make it easier. Nonetheless, the location is slightly more important (in my opinion) because the default if not provided is US and can not be set on project level. So it has to always be explicitly added.

BigQueryCreateEmptyDatasetOperator(
        task_id='create_dataset',
        project_id='project-id',
        dataset_id='new_dataset_id',
        location='EU',
        dag=dag
)

@ryanyuan

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have some effort to synchronize and standardize all the GCP/Google/Alphabet operators and moving them to core, so we might want to improve that part still but it's a good thing to change now so that we can do any refactor. It looks good :).

@potiuk potiuk merged commit 6b82b9e into apache:master Sep 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants