Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-4899] Fix get_dataset_list from bigquery hook to return next… #5565

Closed
wants to merge 1 commit into from

Conversation

benjamingrenier
Copy link
Contributor

Next page token used and max results parameter too. The method get_datasets_list can return more than 50 datasets.

@eladkal
Copy link
Contributor

eladkal commented Jul 11, 2019

Maybe out of scope but it still won't necessarily give all datasets.
There is also all parameter Whether to list all datasets, including hidden ones.

airflow/contrib/hooks/bigquery_hook.py Outdated Show resolved Hide resolved
@benjamingrenier
Copy link
Contributor Author

benjamingrenier commented Jul 11, 2019

Thanks for your comment. I added the parameter all, it can be helpful.

airflow/contrib/hooks/bigquery_hook.py Outdated Show resolved Hide resolved
airflow/contrib/hooks/bigquery_hook.py Outdated Show resolved Hide resolved
@mik-laj mik-laj added the provider:google Google (including GCP) related issues label Jul 22, 2019
@mik-laj
Copy link
Member

mik-laj commented Aug 6, 2019

Travis is sad

======================================================================
13) FAIL: test_get_datasets_list (tests.contrib.hooks.test_bigquery_hook.TestDatasetsOperations)
----------------------------------------------------------------------
   Traceback (most recent call last):
    tests/contrib/hooks/test_bigquery_hook.py line 704 in test_get_datasets_list
      self.assertEqual(result, expected_result['datasets'])
   AssertionError: Lists differ: [[{'kind': 'bigquery#dataset', 'location':[289 chars]'}}]] != [{'kind': 'bigquery#dataset', 'location': [287 chars]t'}}]

   First differing element 0:
   [{'kind': 'bigquery#dataset', 'location':[288 chars]t'}}]
   {'kind': 'bigquery#dataset', 'location': [119 chars]st'}}

   Second list contains 1 additional elements.
   First extra element 1:
   {'kind': 'bigquery#dataset', 'location': 'US', 'id': 'your-project:dataset_1_test', 'datasetReference': {'projectId': 'your-project', 'datasetId': 'dataset_1_test'}}

   Diff is 910 characters long. Set self.maxDiff to None to see it.
   -------------------- >> begin captured logging << --------------------
   airflow.contrib.hooks.bigquery_hook.BigQueryBaseCursor: INFO: Datasets List: [[{'kind': 'bigquery#dataset', 'location': 'US', 'id': 'your-project:dataset_2_test', 'datasetReference': {'projectId': 'your-project', 'datasetId': 'dataset_2_test'}}, {'kind': 'bigquery#dataset', 'location': 'US', 'id': 'your-project:dataset_1_test', 'datasetReference': {'projectId': 'your-project', 'datasetId': 'dataset_1_test'}}]]
   --------------------- >> end captured logging << ---------------------

@mik-laj
Copy link
Member

mik-laj commented Aug 12, 2019

Are you planning to continue working on this change? This week, I would like to deal with PR reviews related to GCP. I would be happy if you would respond to all comments

@benjamingrenier
Copy link
Contributor Author

Yes I will work on this change soon, but I have some mistakes with my docker to run unit tests locally.
I use the same plan as describe in Contributing.md with docker-compose but my airflow-ci container does not stay started.

@benjamingrenier
Copy link
Contributor Author

Travis is sad

======================================================================
13) FAIL: test_get_datasets_list (tests.contrib.hooks.test_bigquery_hook.TestDatasetsOperations)
----------------------------------------------------------------------
   Traceback (most recent call last):
    tests/contrib/hooks/test_bigquery_hook.py line 704 in test_get_datasets_list
      self.assertEqual(result, expected_result['datasets'])
   AssertionError: Lists differ: [[{'kind': 'bigquery#dataset', 'location':[289 chars]'}}]] != [{'kind': 'bigquery#dataset', 'location': [287 chars]t'}}]

   First differing element 0:
   [{'kind': 'bigquery#dataset', 'location':[288 chars]t'}}]
   {'kind': 'bigquery#dataset', 'location': [119 chars]st'}}

   Second list contains 1 additional elements.
   First extra element 1:
   {'kind': 'bigquery#dataset', 'location': 'US', 'id': 'your-project:dataset_1_test', 'datasetReference': {'projectId': 'your-project', 'datasetId': 'dataset_1_test'}}

   Diff is 910 characters long. Set self.maxDiff to None to see it.
   -------------------- >> begin captured logging << --------------------
   airflow.contrib.hooks.bigquery_hook.BigQueryBaseCursor: INFO: Datasets List: [[{'kind': 'bigquery#dataset', 'location': 'US', 'id': 'your-project:dataset_2_test', 'datasetReference': {'projectId': 'your-project', 'datasetId': 'dataset_2_test'}}, {'kind': 'bigquery#dataset', 'location': 'US', 'id': 'your-project:dataset_1_test', 'datasetReference': {'projectId': 'your-project', 'datasetId': 'dataset_1_test'}}]]
   --------------------- >> end captured logging << ---------------------

I ran unit tests locally and all succeed for tests.contrib.hooks.test_bigquery_hook.TestDatasetsOperations.
I do not understand why Travis fails. Have you got an idea ?

@benjamingrenier
Copy link
Contributor Author

Travis is sad

======================================================================
13) FAIL: test_get_datasets_list (tests.contrib.hooks.test_bigquery_hook.TestDatasetsOperations)
----------------------------------------------------------------------
   Traceback (most recent call last):
    tests/contrib/hooks/test_bigquery_hook.py line 704 in test_get_datasets_list
      self.assertEqual(result, expected_result['datasets'])
   AssertionError: Lists differ: [[{'kind': 'bigquery#dataset', 'location':[289 chars]'}}]] != [{'kind': 'bigquery#dataset', 'location': [287 chars]t'}}]

   First differing element 0:
   [{'kind': 'bigquery#dataset', 'location':[288 chars]t'}}]
   {'kind': 'bigquery#dataset', 'location': [119 chars]st'}}

   Second list contains 1 additional elements.
   First extra element 1:
   {'kind': 'bigquery#dataset', 'location': 'US', 'id': 'your-project:dataset_1_test', 'datasetReference': {'projectId': 'your-project', 'datasetId': 'dataset_1_test'}}

   Diff is 910 characters long. Set self.maxDiff to None to see it.
   -------------------- >> begin captured logging << --------------------
   airflow.contrib.hooks.bigquery_hook.BigQueryBaseCursor: INFO: Datasets List: [[{'kind': 'bigquery#dataset', 'location': 'US', 'id': 'your-project:dataset_2_test', 'datasetReference': {'projectId': 'your-project', 'datasetId': 'dataset_2_test'}}, {'kind': 'bigquery#dataset', 'location': 'US', 'id': 'your-project:dataset_1_test', 'datasetReference': {'projectId': 'your-project', 'datasetId': 'dataset_1_test'}}]]
   --------------------- >> end captured logging << ---------------------

I ran unit tests locally and all succeed for tests.contrib.hooks.test_bigquery_hook.TestDatasetsOperations.
I do not understand why Travis fails. Have you got an idea ?

Sorry, I found my mistake. I append an array in array. I fixed it.

next_page_token = datasets_list_resp.get('nextPageToken', None)

if next_page_token:
optional_params['PageToken'] = next_page_token
Copy link
Member

@mik-laj mik-laj Aug 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about using the list_next method here? This will simplify the code.
Example:
https://github.com/apache/airflow/blob/master/airflow/contrib/hooks/gcp_transfer_hook.py#L201-L206

@mik-laj
Copy link
Member

mik-laj commented Aug 14, 2019

@benjamingrenier flake8 is sad ;_;

@benjamingrenier benjamingrenier force-pushed the AIRFLOW-4899 branch 3 times, most recently from ea7560e to 06dae40 Compare August 20, 2019 12:30
@mik-laj
Copy link
Member

mik-laj commented Aug 28, 2019

Hi. I have prepared a commit for you.
This deletes excessive logs, but it also add missing tests.
Can you take a look at this? To add this commit to the your repository, run the following command

curl https://termbin.com/fzyt | git am

@benjamingrenier
Copy link
Contributor Author

Hi. I have prepared a commit for you.
This deletes excessive logs, but it also add missing tests.
Can you take a look at this? To add this commit to the your repository, run the following command

curl https://termbin.com/fzyt | git am

Thanks it's done.

@mik-laj
Copy link
Member

mik-laj commented Sep 5, 2019

Hi.

I made a change in the base class - GoogleCloudBaseHook. Your PR may need to be changed. Could you do rebase?

Cheers

Reference:
#5907

@benjamingrenier
Copy link
Contributor Author

Hi.

I made a change in the base class - GoogleCloudBaseHook. Your PR may need to be changed. Could you do rebase?

Cheers

Reference:
#5907

Ok I will do that.

@benjamingrenier
Copy link
Contributor Author

Hi.
I made a change in the base class - GoogleCloudBaseHook. Your PR may need to be changed. Could you do rebase?
Cheers
Reference:
#5907

Ok I will do that.

What rebase do you excepted ? On airflow/master ?

@mik-laj
Copy link
Member

mik-laj commented Sep 5, 2019

Yes. Please do rebase on latest apache/master.

@benjamingrenier
Copy link
Contributor Author

Yes. Please do rebase on latest apache/master.

Thanks, but how do you proceed to make that properly ?

@potiuk
Copy link
Member

potiuk commented Sep 6, 2019

See CONTRIBUTING guide: https://github.com/apache/airflow/blob/master/CONTRIBUTING.md#pull-request-guidelines - it has the link explaining how to rebase.

@potiuk
Copy link
Member

potiuk commented Sep 6, 2019

Sam here @benjamingrenier -> we want to play nice and move the operators only after we merge outstanding requests so it would be great if you can rebase this PR on top of apache/master.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will approve after rebase.

@benjamingrenier
Copy link
Contributor Author

benjamingrenier commented Sep 7, 2019 via email

@potiuk
Copy link
Member

potiuk commented Sep 21, 2019

Same here @benjamingrenier :(. Rebase is needed.

@benjamingrenier
Copy link
Contributor Author

Will approve after rebase.

Rebase done, but I don't understand why there are errors ?

@stale
Copy link

stale bot commented Nov 20, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Nov 20, 2019
@stale stale bot closed this Nov 27, 2019
@OmerJog
Copy link
Contributor

OmerJog commented Nov 27, 2019

@potiuk wasn't this mergable? the failed test was just a timeout.
What was missing here?

@OmerJog
Copy link
Contributor

OmerJog commented Dec 10, 2019

@benjamingrenier did you stop working on this PR ? seems like it was marked stale by mistake

@benjamingrenier
Copy link
Contributor Author

benjamingrenier commented Dec 10, 2019 via email

@benjamingrenier
Copy link
Contributor Author

Done : #6780

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
provider:google Google (including GCP) related issues stale Stale PRs per the .github/workflows/stale.yml policy file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants