Skip to content

Comments

[AIRFLOW-7064] Add CloudFirestoreExportDatabaseOperator#7725

Merged
mik-laj merged 6 commits intoapache:masterfrom
PolideaInternal:firestore-gcloud
Mar 18, 2020
Merged

[AIRFLOW-7064] Add CloudFirestoreExportDatabaseOperator#7725
mik-laj merged 6 commits intoapache:masterfrom
PolideaInternal:firestore-gcloud

Conversation

@mik-laj
Copy link
Member

@mik-laj mik-laj commented Mar 14, 2020

It is WIP. I have to prepare documentation, but I would like to ask for opinions. gcloud is not a standard solution in operators. A more complex example that will use BigQuery would also be helpful.


Issue link: AIRFLOW-7064

Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
  • Unit tests coverage for changes (not needed for documentation changes)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • [Xe] I will engage committers as explained in Contribution Workflow Example.

* For document-only changes commit message can start with [AIRFLOW-XXXX].


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@boring-cyborg boring-cyborg bot added area:docs provider:google Google (including GCP) related issues labels Mar 17, 2020
@mik-laj mik-laj force-pushed the firestore-gcloud branch 2 times, most recently from 17fa2e4 to 3582f81 Compare March 17, 2020 19:17
@mik-laj mik-laj changed the title [AIRFLOW-7064][WIP] Add CloudFirestoreExportDatabaseOperator [AIRFLOW-7064] Add CloudFirestoreExportDatabaseOperator Mar 18, 2020
@mik-laj mik-laj requested review from potiuk and turbaszek March 18, 2020 06:18
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comment copy&paste victims to fix 👍


def get_conn(self):
"""
Retrieves the connection to Cloud Build.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Retrieves the connection to Cloud Build.
Retrieves the connection to Cloud Firestore.

"""
Retrieves the connection to Cloud Build.

:return: Google Cloud Build services object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:return: Google Cloud Build services object.
:return: Google Cloud Firestore services object.

# then it will get the message below:
# > Request contains an invalid argument.
# At the same time, the Non-Authorized Client has no problems.
non_authorized_conn = build("firestore", self.api_version, cache_discovery=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a ticket in google-api-python-client.
googleapis/google-api-python-client#845
I am not sure if this is a good place to report a bug, so I do not add a comment in the code. The error is in the service, not in the library, but I don't know a better place to report this error.

self, body: Dict, database_id: str = "(default)", project_id: Optional[str] = None
) -> None:
"""
Starts a build with the specified configuration.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Starts a build with the specified configuration.
Starts export with the specified configuration.

See:
https://firebase.google.com/docs/firestore/reference/rest/v1beta1/projects.databases/exportDocuments
:type body: dict
:param project_id: Optional, Google Cloud Project project_id where the function belongs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:param project_id: Optional, Google Cloud Project project_id where the function belongs.
:param project_id: Optional, Google Cloud Project project_id where the database belongs.

Copy link
Member

@turbaszek turbaszek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some questions and additional copy-past fixes :)

.get(name=operation_name)
.execute(num_retries=self.num_retries)
)
if operation_response.get("done"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ok, however, I wonder isn't if "done" in operation_response more pythonic?

Copy link
Member Author

@mik-laj mik-laj Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to check if the done value is "true"

GCP_SPANNER_KEY = 'gcp_spanner.json'
GCP_TASKS_KEY = 'gcp_tasks.json'
GMP_KEY = 'gmp.json'
G_FIREBASE_KEY = 'g_firebase.json'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
G_FIREBASE_KEY = 'g_firebase.json'
G_FIREBASE_KEY = 'gcp_firebase.json'

Let's keep consistency in case of GCP :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not GCP, but Firebase service. This database is available in the Pantheon, but is most often used by Firebase and was created as part of Firebase. It also has a common name with Firebase.

TEST_PROJECT_ID = "firestore--project-id"


class TestCloudBuildHookWithPassedProjectId(unittest.TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class TestCloudBuildHookWithPassedProjectId(unittest.TestCase):
class TestCloudFirestoreHookWithPassedProjectId(unittest.TestCase):

self.hook.export_documents(body=EXPORT_DOCUMENT_BODY, project_id=TEST_PROJECT_ID)


class TestGcpComputeHookWithDefaultProjectIdFromConnection(unittest.TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class TestGcpComputeHookWithDefaultProjectIdFromConnection(unittest.TestCase):
class TestCloudFirestoreHookWithDefaultProjectIdFromConnection(unittest.TestCase):

self.hook.export_documents(body=EXPORT_DOCUMENT_BODY)


class TestCloudBuildHookWithoutProjectId(unittest.TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class TestCloudBuildHookWithoutProjectId(unittest.TestCase):
class TestCloudFirestoreHookWithoutProjectId(unittest.TestCase):

@pytest.mark.system("google.firebase")
@pytest.mark.system("google.cloud")
@pytest.mark.credential_file(G_FIREBASE_KEY)
class CampaignManagerSystemTest(GoogleSystemTest):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class CampaignManagerSystemTest(GoogleSystemTest):
class CloudFirestoreSystemTest(GoogleSystemTest):

super().tearDown()

def clean_up(self):
self.execute_with_ctx(["gsutil", "rm", "-r", f"{EXPORT_DESTINATION_URL}"], G_FIREBASE_KEY)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use delete_gcs_bucket method of GoogleSystemTest?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not want to delete the entire bucket, but only its contents, because this test has complex requirements for the location of the bucket and permissions. Information is available in example_firestore.py.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants