
Status of testing Providers that were prepared on October 30, 2021 #19328

Closed
28 of 54 tasks
potiuk opened this issue Oct 30, 2021 · 31 comments
Labels
kind:meta (High-level information important to the community) · testing status (Status of testing releases)

Comments

@potiuk
Member

potiuk commented Oct 30, 2021

Body

I have a kind request for all the contributors to the latest provider packages release.
Could you help us test the RC versions of the providers and let us know in a comment
whether the issue is addressed there?

Providers that need testing

These providers require testing, as substantial changes were introduced:

Provider amazon: 2.4.0rc1

Provider apache.hive: 2.0.3rc1

Provider cncf.kubernetes: 2.1.0rc1

Provider docker: 2.3.0rc1

Provider elasticsearch: 2.1.0rc1

Provider facebook: 2.1.0rc1

Provider google: 6.1.0rc1

Provider jenkins: 2.0.3rc1

Provider microsoft.azure: 3.3.0rc1

Provider mongo: 2.2.0rc1

Provider pagerduty: 2.1.0rc1

Provider salesforce: 3.3.0rc1

Provider samba: 3.0.1rc1

Provider sftp: 2.2.0rc1

Provider snowflake: 2.3.0rc1

Provider ssh: 2.3.0rc1

Provider tableau: 2.1.2rc1

Provider trino: 2.0.2rc1

Providers that do not need testing

Those are providers that were either doc-only or had changes that do not require testing.

Thanks to all who contributed to those providers' releases

@sreenath-kamath @jarfgit @frankcash @deedmitrij @enima2684 @uranusjr @peter-volkov @mariotaddeucci @potatochip @malthe @msumit @dimberman @jameslamb @Brooke-white @GuidoTournois @minu7 @ashb @shadrus @ignaski @eladkal @ephraimbuddy @eskarimov @JavierLopezT @keze @josh-fell @raphaelauv @blag @Aakcht @guotongfei @SamWheating @danarwix
@subkanthi @alexbegg @bhavaniravi @tnyz @Goodkat @fredthomsen @SayTen @kaxil @lwyszomi @ReadytoRocc @baolsen @30blay @nathadfield @xuan616 @RyanSiu1995

Committer

  • I acknowledge that I am a maintainer/committer of the Apache Airflow project.
@potiuk added the kind:meta (High-level information important to the community) label Oct 30, 2021
@raphaelauv
Contributor

for cncf.kubernetes: 2.1.0rc1 it's all good

add more information to PodLauncher timeout error (#17953) -> is working

  File "/usr/local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 374, in execute
   raise AirflowException(f'Pod Launching failed: {ex}')
airflow.exceptions.AirflowException: Pod Launching failed: Pod took longer than 120 seconds to start. Check the pod events in kubernetes to determine why.
[2021-10-30, 14:21:41 UTC] {local_task_job.py:154} INFO - Task exited with return code 1
[2021-10-30, 14:21:41 UTC] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check

Add more type hints to PodLauncher (#18928) -> just typing, so OK
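The improved timeout message shown in the log above comes from a deadline check during pod startup. Here is a minimal, hypothetical sketch of that polling-with-deadline pattern (not the actual PodLauncher code; `wait_for_pod_start` and `PodLaunchTimeoutError` are illustrative names):

```python
import time


class PodLaunchTimeoutError(Exception):
    """Illustrative stand-in for AirflowException."""


def wait_for_pod_start(is_running, startup_timeout: int = 120, poll_interval: float = 1.0):
    # Poll until the pod reports running, or raise with an actionable
    # message (mirroring the error in the log above) once the startup
    # deadline passes.
    deadline = time.monotonic() + startup_timeout
    while time.monotonic() < deadline:
        if is_running():
            return
        time.sleep(poll_interval)
    raise PodLaunchTimeoutError(
        f"Pod took longer than {startup_timeout} seconds to start. "
        "Check the pod events in kubernetes to determine why."
    )
```

The point of the change in #17953 was that the raised exception now names the timeout value and tells the operator where to look next, instead of failing silently or with a generic message.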

@pavelhlushchanka
Contributor

I think #18733 was released in 2.3.0

@raphaelauv
Contributor

for google: 6.1.0rc1

Google provider catch invalid secret name (#18790) -> is working

[2021-10-30 14:56:31,254] {logging_mixin.py:104} INFO - Running <TaskInstance: a_nice_dag.also_run_this 2021-10-20T00:00:00+00:00 [running]> on host 0dbece050201
[2021-10-30 14:56:31,797] {secret_manager_client.py:100} ERROR - Google Cloud API Call Error (InvalidArgument): Invalid secret ID XXXXXXXXXXXXXX-variable-toto.tata.
                Only ASCII alphabets (a-Z), numbers (0-9), dashes (-), and underscores (_)
                are allowed in the secret ID.
                
[2021-10-30 14:56:31,858] {taskinstance.py:1300} INFO - Exporting the following env vars:
...

the error about secrets with an invalid character is now caught
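The validation rule quoted in the log above can be expressed as a simple character-class regex. A sketch, with an illustrative helper name (this is not the provider's actual implementation, just the rule the error message states):

```python
import re

# Allowed characters per the error message above: ASCII letters,
# digits, dashes (-), and underscores (_). Anything else, such as
# dots, makes the secret ID invalid.
_SECRET_ID_RE = re.compile(r"^[a-zA-Z0-9_-]+$")


def is_valid_secret_id(secret_id: str) -> bool:
    """Return True if the ID only uses characters Secret Manager accepts."""
    return bool(_SECRET_ID_RE.match(secret_id))
```

A dotted variable name like `variable-toto.tata` fails this check, which is exactly the case the provider now catches and logs instead of crashing the task.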

@potiuk
Member Author

potiuk commented Oct 30, 2021

I think #18733 was released in 2.3.0

Good spot @codenamestif! It turned out that after last month's release of 2.3.0 as rc2, I set the "2.3.0" tag wrongly to rc1, so the changes added between rc1 and rc2 were duplicated in the issue and changelog (and I missed that when I prepared it today). These are the duplicated issues:

  • Add AWS Fargate profile support (#18645)
  • Add emr cluster link (#18691)
  • AwsGlueJobOperator: add wait_for_completion to Glue job run (#18814)
  • AwsGlueJobOperator: add run_job_kwargs to Glue job run (#16796)
  • Add additional dependency for postgres extra for amazon provider (#18737)
  • Adds an s3 list prefixes operator (#17145)
  • ECSOperator: airflow exception on edge case when cloudwatch log stream is not found (#18733)
  • Amazon Athena Example (#18785)
  • Amazon SQS Example (#18760)

I already corrected the tag and the issue, and I will also correct the changelog (those entries will be missing in the docs), but unless there are other, more serious changes that force a new rc, it will remain in the package README (and technically those changes ARE in 2.4.0 as well as 2.3.0). In future packages the changelog will be corrected.

Sorry everyone involved for spamming!

@Aakcht
Contributor

Aakcht commented Oct 30, 2021

Tested #18854 - all good.

#18331 is already present in 2.1.1 (it was tested in #18638), so I don't think hdfs 2.1.1rc1 should be present here.

@JavierLopezT
Contributor

JavierLopezT commented Oct 31, 2021 via email

@mariotaddeucci
Contributor

#18027 is already present in 2.3.0, tested in #18638
#18671 and #18819 are all good
#18844 got errors using RedshiftSQLHook when executing multiple commands in a prepared statement. I'll fix it.

@enima2684
Contributor

#18990 tested and working

@enima2684
Contributor

#18872 I tested this PR and it is not working as expected.
The DockerSwarmOperator never returns and keeps hanging.
I used the following job for the test:

    from airflow.providers.docker.operators.docker_swarm import DockerSwarmOperator

    task = DockerSwarmOperator(
        task_id="task",
        image="python:3.9",
        enable_logging=True,
        tty=True,
        command=["echo", "hello world !"],
    )

@potiuk
Member Author

potiuk commented Nov 1, 2021

#18331 is already present in 2.1.1 ( it was tested in #18638) , so I don't think hdfs 2.1.1rc1 should be present here.

Right - apparently the issue-generation script has a bug in this case: hdfs is not being released. I will fix it. Removed it from the issue.

@potiuk
Member Author

potiuk commented Nov 1, 2021

#17397 and #18764 were already tested in the last wave

Correct - already removed (that was amazon's later release and wrong tag set - already corrected :)

@potiuk
Member Author

potiuk commented Nov 1, 2021

#18844 got errors using RedshiftSQLHook when executing multiple commands into a prepared statement. I'll fix it.

How serious / how much of a regression is it, @mariotaddeucci?

@Goodkat
Contributor

Goodkat commented Nov 1, 2021

#17850 was already tested within 2.0.0rc2 release
#15016 (comment)

@potiuk
Member Author

potiuk commented Nov 1, 2021

#18872 I tested this PR and it is not working as expected.

@RyanSiu1995 can you please double-check whether you have the same problem?

@potiuk
Member Author

potiuk commented Nov 1, 2021

#17850 was already tested within 2.0.0rc2 release

Correct - exasol is not being released (same issue-generation bug as with hdfs). I will fix it. Sorry for the spam.

@potiuk
Member Author

potiuk commented Nov 1, 2021

I removed a few other providers that suffered from the same issue - sorry :(

@fredthomsen
Contributor

fredthomsen commented Nov 1, 2021 via email

@GuidoTournois
Contributor

I have validated that both changes for pagerduty work as intended!

@Brooke-white
Contributor

#18447 tested and working

@anaynayak
Contributor

#18807 tested and verified both single + multiple s3 prefix matches.
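For context on what "prefix matches" means here: S3 common-prefix listing groups keys by a delimiter under a given prefix. A hypothetical pure-Python sketch of those semantics (illustration only; the operator from #17145 calls the actual S3 API rather than anything like this):

```python
def list_prefixes(keys, prefix="", delimiter="/"):
    # Illustrative stand-in for S3 "common prefixes" semantics: for each
    # key under `prefix`, keep everything up to and including the first
    # `delimiter` after the prefix, and deduplicate.
    prefixes = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
    return sorted(prefixes)
```

Testing both the single-match and multiple-match cases, as done above, matters because the two exercise different branches of the grouping logic.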

@josh-fell
Contributor

Tested and verified #19052, #19062, and #19323. Thanks for organizing this release Jarek!

@tnyz
Contributor

tnyz commented Nov 1, 2021

tested #18676 and working

@potiuk
Member Author

potiuk commented Nov 1, 2021

Good progress so far :)!

@mariotaddeucci
Contributor

mariotaddeucci commented Nov 2, 2021

#18844 got errors using RedshiftSQLHook when executing multiple commands in a prepared statement. I'll fix it.

How serious / how much of a regression is it, @mariotaddeucci?

@potiuk It happened on S3ToRedshiftOperator with a specific configuration. When using "UPSERT" or "REPLACE", it generates an SQL block with multiple queries. RedshiftSQLHook doesn't support executing multiple queries in a single call of execute. To fix it, the single query string just needs to be converted to a list of queries. The fix is available in PR #19358.
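The fix described above boils down to splitting a multi-statement SQL block into a list of statements before handing them to the hook one at a time. A deliberately naive sketch of that idea (illustration only, not the code in #19358; it ignores semicolons inside string literals or comments, which a robust fix must handle):

```python
def split_sql_block(sql: str) -> list[str]:
    # Naively split on ';' and drop empty fragments, so that e.g. an
    # UPSERT block (DELETE + INSERT) becomes two separately executable
    # statements.
    return [stmt.strip() for stmt in sql.split(";") if stmt.strip()]
```

Each element of the returned list can then be passed to the hook individually, avoiding the multiple-statements-per-execute limitation.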

@eskarimov
Contributor

Tested #19048, works correctly :)

@bhavaniravi
Contributor

Tested #19276 MongoSensor picking up the new db param

@mik-laj
Member

mik-laj commented Nov 2, 2021

Provider facebook: 2.1.0rc1
Align the default version with Facebook business SDK (#18883): @RyanSiu1995

It is a breaking change, so it should be in a new release; we should bump the major version.

Provider google: 6.1.0rc1
Replace default api_version of FacebookAdsReportToGcsOperator (#18996): @eladkal

It is a breaking change also.

Provider microsoft.azure: 3.3.0rc1
update azure cosmos to latest version (#18695): @eladkal

I am not sure here, but when we changed the minimum requirements of the google libraries, we always bumped the major version. Airflow is a thin layer between libraries and user, so breaking changes propagate to Airflow very easily.

@potiuk
Member Author

potiuk commented Nov 2, 2021

Provider facebook: 2.1.0rc1
Align the default version with Facebook business SDK (#18883): @RyanSiu1995

It is a breaking change, so it should be in a new release.

Provider google: 6.1.0rc1
Replace default api_version of FacebookAdsReportToGcsOperator (#18996): @eladkal

It is a breaking change also.

Provider microsoft.azure: 3.3.0rc1
update azure cosmos to latest version (#18695): @eladkal

I am not sure here, but when we changed the minimum requirements of the google libraries, we always bumped the major version. Airflow is a thin layer between libraries and user, so breaking changes propagate to Airflow very easily.

Thanks @mik-laj I will take a look and decide

@SamWheating
Contributor

regarding #18992

I've tested this change in our environment with the DataflowCreateJavaJobOperator. I haven't had a chance to set up test jobs for the other operators but based on the similarities in implementation I think that this fix should be applicable to all of them.

This PR introduced a change in behaviour, but I don't think it should be considered a breaking change as it is simply returning to the original / documented behaviour.

@potiuk
Member Author

potiuk commented Nov 3, 2021

Provider facebook: 2.1.0rc1
Provider google: 6.1.0rc1
microsoft.azure 3.3.0rc1

Hey @mik-laj - thanks for raising your concerns. I looked closer, and I do not see those changes as really "breaking" (or at least not breaking enough, and not under our control enough, to justify a major version change), but I am open to hearing arguments if others disagree.

Facebook

Facebook API does not follow SemVer. AT ALL.

They mostly introduce new features and bugfixes when they bump the version. More interestingly, they even CHANGE behaviour of old versions of APIs when this behaviour is working for quite some time in the new versions.

Here are the changes to insights features we are using for example:

Insights
Applies to v9.0+. Will apply to all versions on May 9, 2021.
IG User follower_count values now align more closely with their corresponding values displayed in the Instagram app. In addition, follower_count now returns a maximum of 30 days of data instead of 2 years.

and:

Ads Insights API
Updated date_preset parameter
Applies to v10.0+.

The lifetime parameter (date_preset=lifetime) is disabled and replaced with date_preset=maximum, which can be used to retrieve a maximum of 37 months of data. The API will return an error when requests contain date ranges beyond the 37-month window.
For v9.0 and lower, there will be no change in functionality until May 25, 2021. At that time, date_preset=maximum will be enabled and any lifetime calls will default to maximum and return only 37 months of data.

and:

Deprecation of Store Visit Metrics
Applies to 11.0+. Will apply to all versions on Sept. 6, 2021.

This means that version 6.0 of "insights", which we were using so far, has anyway changed behaviour on Sept 6 this year to match the current 11+ behaviour 😱

So from what we see here, Facebook has chosen a "move fast, break things" approach for their APIs. I assume Facebook users know it, and I think the approach we took is a good one - we should not really try to keep compatibility with version vN of Facebook, because even they don't do or recommend it (and we would not actually be able to, because they can change the behaviour without us knowing it). So keeping the approach where the default API version is "latest" seems good, and I do not see it really breaking anything.

Cosmos

Provider microsoft.azure: 3.3.0rc1
update azure cosmos to latest version (#18695): @eladkal

I looked at the changes again, and I do not see any breaking change. We indeed sometimes (but not always) bumped major versions for our providers when underlying libraries changed, but only when they changed the Airflow API of that provider (format of data returned and similar). The sheer fact of a library bumping its version does not automatically invalidate, or force a new major version of, all the applications and libraries using it. In this particular case, we simply adapted our implementation so that the API remained the same (the breaking change which affected us was simply a different exception being thrown; in our case we caught it and returned None, and this behaviour remained), so there is no reason to bump the major version here.

@potiuk
Member Author

potiuk commented Nov 4, 2021

Closing this one. As described in https://lists.apache.org/thread/6tzsq6xkm5r1q6pqtyq5yhvr5p1jqd13 I am releasing the wave now and we are removing the Amazon Provider from that release due to Redshift regression (and I will release fixed version soon). Thanks @mariotaddeucci for testing and spotting the problem (it was not obvious in the initial change and it's great you tested this case!)

Thanks everyone who helped!
