[AIRFLOW-8664] Postgres to GCS operator - use named cursor #11793
mik-laj merged 1 commit into apache:master
I like this change very much, but I would like to test it before merging. Can you add system tests to make this easier? https://github.com/apache/airflow/blob/7df41f4f14a870ac3f62da5b639768b72ba9f583/tests/providers/google/cloud/transfers/test_postgres_to_gcs_system.py |
|
As @mik-laj mentioned, adding system tests and updating the how-tos would be really great. And I agree that generalizations might be added later. We even have a discussion and a design doc about it. Discussion: https://lists.apache.org/thread.html/rc888a329f1c49622c0123c2ddbcfcc107eead020b774f8a8fab6d7f1%40%3Cdev.airflow.apache.org%3E Design doc: https://docs.google.com/document/d/1o7Ph7RRNqLWkTbe7xkWjb100eFaK1Apjv27LaqHgNkE/edit So you might simply join the discussion and maybe participate later in developing it. Since we are busy with the 2.0 release now, this feature ("generic transfer operators") does not have the highest priority, but it sounds like a solid candidate for 2.1 some time early next year. And thanks for the kind words about Breeze :) |
|
sure, will add the system tests. I'll try to do it in the first half of the week. re generic transfers - @potiuk great to hear that something like that is happening. thank you for the info, i'll look into it :) |
|
hello, I amended the system tests and docs. A few notes: 1. It took quite a lot of time to make it work locally.
2. I'm not really sure what these tests are for. Not sure if 1. should be solved, as I'm not sure about 2. 😄 But if yes, I can fix both later - the gcs_bucket name could be generated and made unique, and the docs could be updated to mention setting up env vars for system tests specifically. |
We are working to make all environmental variables public, but that will take some time. Sorry for the inconvenience.
It looks like you don't have access to the right buckets. Each DAG example should allow the bucket to be changed using an environment variable. In this particular case, it did not happen, and I think it is worth adding this possibility. This will make it easy to switch buckets if a name conflicts.
In most cases, we have a more complicated DAG example and the validation is embedded in the DAG logic, e.g. the first task creates a resource, and the next task uses this resource. If the first task didn't create it, the next task couldn't use it. But even without complicated DAGs, such simple system tests without validation can detect problems with third-party libraries, because when a method is called with wrong parameters, a runtime exception will be thrown. This won't get detected in unit testing, as we use unittest.mock very intensively.
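To illustrate the point about mocks, here is a minimal sketch. The `upload` and `transfer` functions and their parameters are hypothetical stand-ins, not Airflow or Google client APIs. A plain `unittest.mock.Mock` accepts any arguments, so a call with a misspelled parameter passes in a unit test and only fails against the real callable, which is what a system test exercises.

```python
from unittest import mock


def upload(bucket_name, object_name):
    """Hypothetical stand-in for a third-party client method."""
    return f"gs://{bucket_name}/{object_name}"


def transfer(upload_fn):
    # Bug on purpose: the keyword is "bucket" instead of "bucket_name".
    return upload_fn(bucket="my-bucket", object_name="data.json")


# A plain Mock accepts any signature, so the unit test does not notice the bug.
plain_mock = mock.Mock(return_value="ok")
mock_result = transfer(plain_mock)

# Calling the real function, as a system test would, raises TypeError.
try:
    transfer(upload)
    real_call_failed = False
except TypeError:
    real_call_failed = True

# create_autospec copies the real signature, so this mock also rejects the call.
spec_mock = mock.create_autospec(upload)
try:
    transfer(spec_mock)
    autospec_failed = False
except TypeError:
    autospec_failed = True
```

Using `mock.create_autospec` in unit tests narrows this gap, but it still cannot replace a run against the real service.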
We don't want random names generated every time we run the test, as this would make debugging difficult. The bucket name should only be generated once on a given computer to make behavior more predictable. If you want, you can create This quarter, we've already started working on making these system tests run regularly on the CI. Once we're done, everyone will have an easy way to create all the required environment keys and variables. Running system tests in an OSS project is quite complicated for a variety of financial, security and technical reasons. Hope my explanations were helpful for you. Let me know if you want to make any more changes. If not, I will run the system tests (manually) and check if it works properly, then merge the change. |
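One possible way to combine both suggestions (bucket configurable through an environment variable, and a name generated only once per machine) is sketched below. The environment variable name `GCP_GCS_BUCKET_NAME`, the cache file location, and the `get_bucket_name` helper are assumptions made for illustration, not something this PR defines.

```python
import os
import uuid
from pathlib import Path


def get_bucket_name(env_var="GCP_GCS_BUCKET_NAME", cache_file=None):
    """Return the bucket name from the environment, or one generated once per machine."""
    # An explicitly configured bucket always wins.
    from_env = os.environ.get(env_var)
    if from_env:
        return from_env

    # Hypothetical cache location; any stable per-machine path would do.
    cache_path = Path(cache_file or Path.home() / ".airflow_system_test_bucket")
    if cache_path.exists():
        # Reuse the previously generated name so repeated runs stay predictable.
        return cache_path.read_text().strip()

    # First run on this machine: generate a name and remember it.
    name = f"postgres-to-gcs-example-{uuid.uuid4().hex[:8]}"
    cache_path.write_text(name)
    return name
```

An example DAG could then call `get_bucket_name()` at module level, so a conflicting name is fixed by exporting the variable rather than editing the DAG.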
|
thank you very much for such a thorough explanation, it clears up a lot of stuff in my head!
amended with the possibility to take it from env vars, this was missing and that's where I got stuck the most :/
agree. my idea was to append some random suffix when starting breeze and set the env vars, e.g. that'd be all from my side, so you can test it :) |
|
sure, done. |
Thanks! |
|
The PR should be OK to be merged with just a subset of tests, as it does not modify the core of Airflow. The committers might merge it, or they can add the label 'full tests needed' and re-run it to run all tests if they see it is needed! |
Hello!
Implemented a feature for the postgres_to_gcs operator that allows using a server-side cursor. More in the issue #8664
I was thinking about multiple designs to generalize it a bit more, but in the end went with the straightforward solution of doing it only for the Postgres DB, while keeping it consistent with the presto_to_gcs operator. If there is more demand for server-side cursors on other DBs as well (mssql, mysql), the generalization can be done then.
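For readers unfamiliar with the idea, here is a minimal sketch of how a server-side (named) cursor is typically used with psycopg2. It is not the operator's actual code: the `stream_rows` helper, the default cursor name, and the `itersize` value are illustrative assumptions. The only psycopg2 behavior relied on is that `connection.cursor(name=...)` creates a server-side cursor and that `cursor.itersize` controls how many rows are fetched per round trip while iterating.

```python
def stream_rows(conn, sql, cursor_name="postgres_to_gcs_cursor", itersize=2000):
    """Yield rows one by one from a server-side (named) cursor.

    `conn` is expected to be a psycopg2 connection (assumption); the result
    set stays on the server and is pulled in batches of `itersize` rows
    instead of being loaded into client memory at once.
    """
    # Passing name= is what turns this into a server-side cursor in psycopg2.
    cursor = conn.cursor(name=cursor_name)
    try:
        cursor.itersize = itersize
        cursor.execute(sql)
        for row in cursor:
            yield row
    finally:
        # Release the server-side resources even if the consumer stops early.
        cursor.close()


# Usage sketch (needs psycopg2 and a reachable database; the DSN and table
# name are placeholders):
#
#   import psycopg2
#   conn = psycopg2.connect("dbname=example")
#   for row in stream_rows(conn, "SELECT * FROM big_table"):
#       ...
```

Because the helper is a generator, memory use on the client is bounded by `itersize` rows rather than by the size of the whole result set.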
btw, I had a great experience with Breeze, absolutely love it! I was sceptical and was counting on a "this will help you develop, but it will yield 100 errors anyway" experience, but it was really a breeze - install, read docs, run. nice!
Thanks!