Fix: GCS To BigQuery source_object #16160
Conversation
Fix GCS To BigQuery source_object to accept both str and list
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst).
Could you please add a test for that, @tegardp?
    @@ -286,7 +286,7 @@ def execute(self, context):
             else:
                 schema_fields = self.schema_fields

    -        source_uris = [f'gs://{self.bucket}/{source_object}' for source_object in self.source_objects]
    +        source_uris = [f'gs://{self.bucket}/{source_object}' for source_object in self.source_objects] if type(self.source_objects) is list else [f'gs://{self.bucket}/{self.source_objects}']
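For context (not part of the PR itself), here is a small illustration of why the original line needed a guard: iterating over a plain Python string yields its characters, so passing `source_objects` as a str to the old code would build one URI per character. The bucket and object names below are hypothetical.

```python
# Illustration only: what the old list comprehension does when
# source_objects is a plain string instead of a list.
bucket = "my-bucket"                 # hypothetical value
source_objects = "data/file1.csv"    # passed as a str, not a list

source_uris = [f'gs://{bucket}/{source_object}' for source_object in source_objects]
print(source_uris[:3])  # ['gs://my-bucket/d', 'gs://my-bucket/a', 'gs://my-bucket/t']
```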
Wouldn't it be better to convert the string to list and then the code will work just fine without a special case for string?
But if the input is a list, wouldn't it be converted to a 2-dimensional list?

    self.source_objects = "file1.csv"
    source_object = [self.source_objects]
    # output: ["file1.csv"]

    self.source_objects = ["file1.csv", "file2.csv", "file3.csv"]
    source_object = [self.source_objects]
    # output: [["file1.csv", "file2.csv", "file3.csv"]]
No. I was thinking of converting the type to a list (if needed), so the logic part of the code won't need any modification, as it already works fine with a list. Something like:

    if isinstance(obj, str):
        obj = [obj]
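A self-contained sketch of that idea (the helper name `build_source_uris` is hypothetical; in the operator this would just be a couple of lines inside `execute` before the URI construction shown in the diff above):

```python
from typing import List, Union


def build_source_uris(bucket: str, source_objects: Union[str, List[str]]) -> List[str]:
    # Sketch only: normalize a str value to a single-element list so the
    # existing list-comprehension logic stays unchanged for both input types.
    if isinstance(source_objects, str):
        source_objects = [source_objects]
    return [f'gs://{bucket}/{source_object}' for source_object in source_objects]


assert build_source_uris("b", "f1.csv") == ["gs://b/f1.csv"]
assert build_source_uris("b", ["f1.csv", "f2.csv"]) == ["gs://b/f1.csv", "gs://b/f2.csv"]
```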
Ahh, I was thinking about that solution too. So it doesn't have to be one-line code?
converting source_objects to a list instead of modifying the logic part
Please remove this whitespace.
Also, please add a test to https://github.com/apache/airflow/blob/master/tests/providers/google/cloud/transfers/test_gcs_to_bigquery.py. I think a test that verifies the behavior once with a list and once with a string is enough.
@eladkal is correct - just add a test where the source_objects value is passed as a string.
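A hedged sketch of what such a test could look like in test_gcs_to_bigquery.py. The constants and the exact mock path are assumptions, not copied from the repo; the existing tests in that module should be followed for the real assertion style.

```python
from unittest import mock

from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

TASK_ID = "test-gcs-to-bq"
TEST_BUCKET = "test-bucket"
TEST_DATASET_TABLE = "test_dataset.test_table"
TEST_SCHEMA = [{"name": "id", "type": "STRING", "mode": "NULLABLE"}]


@mock.patch("airflow.providers.google.cloud.transfers.gcs_to_bigquery.BigQueryHook")
def test_source_objects_as_string_and_as_list(bq_hook):
    # Run the operator once with a plain string and once with a one-element
    # list; both should produce the same source URIs.
    for source_objects in ("file1.csv", ["file1.csv"]):
        operator = GCSToBigQueryOperator(
            task_id=TASK_ID,
            bucket=TEST_BUCKET,
            source_objects=source_objects,
            destination_project_dataset_table=TEST_DATASET_TABLE,
            schema_fields=TEST_SCHEMA,
        )
        operator.execute(None)

        # Assumption: the operator issues the load through
        # bq_hook.get_conn().cursor().run_load(...); adjust the mock path to
        # whatever the existing tests in this module already inspect.
        load_call = bq_hook.return_value.get_conn.return_value.cursor.return_value.run_load
        assert load_call.call_args[1]["source_uris"] == [f"gs://{TEST_BUCKET}/file1.csv"]
```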
Looks good :). Running the workflow now!
The PR is likely OK to be merged with just a subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease.
Awesome work, congrats on your first merged pull request!
Hey @tegardp, I noticed that this change could be missing an edge case when retrieving a list of files from [...]. Example:
since the rendering to native object only happens on [...]
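A hedged illustration of the kind of edge case this (truncated) comment appears to point at, assuming `source_objects` is filled from a templated XCom pull: unless the DAG sets `render_template_as_native_obj=True`, the Jinja template renders to the string form of a list rather than a Python list, so the new `str` branch would wrap the whole thing into a single malformed URI. The task ids, bucket, and table below are hypothetical.

```python
# Hypothetical snippet: source_objects comes from a Jinja template.
# Without render_template_as_native_obj=True on the DAG, the rendered value
# is the string "['a.csv', 'b.csv']" (not a Python list), so the
# isinstance(..., str) branch would build one bogus gs:// URI from it.
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

load_to_bq = GCSToBigQueryOperator(
    task_id="gcs_to_bq",
    bucket="my-bucket",
    source_objects="{{ ti.xcom_pull(task_ids='list_files') }}",
    destination_project_dataset_table="my_dataset.my_table",
)
```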
Fixes #16008: change the GCS To BigQuery Operator's source_object parameter to accept both str and list.
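A short usage example of what the change enables (bucket, object names, and table below are hypothetical): `source_objects` can now be given either as a single string or as a list of strings.

```python
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

# Single object as a plain string (newly supported by this change).
load_single = GCSToBigQueryOperator(
    task_id="load_single_file",
    bucket="my-bucket",
    source_objects="data/file1.csv",
    destination_project_dataset_table="my_dataset.my_table",
)

# Multiple objects as a list (existing behaviour).
load_many = GCSToBigQueryOperator(
    task_id="load_many_files",
    bucket="my-bucket",
    source_objects=["data/file1.csv", "data/file2.csv"],
    destination_project_dataset_table="my_dataset.my_table",
)
```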
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.