add newer_than parameter to SFTP sensor #21811
Closed
AdamPaslawski wants to merge 282 commits into apache:main from
Conversation
AdamPaslawski commented on Feb 25, 2022
Comment on lines 66 to 69:

```python
if self.newer_than:
    _mod_time = datetime.strptime(mod_time, '%Y%m%d%H%M%S')
    _newer_than = make_naive(self.newer_than) if not is_naive(self.newer_than) else self.newer_than
    return _newer_than <= _mod_time
```
AdamPaslawski (Contributor, Author):
I decided to assume a naive comparison and allow the sensor to be configured with either an offset-aware or offset-naive newer_than parameter. Thoughts on this, @eladkal?
Member:
I think you should convert all dates to UTC. It's surprisingly hard to find out what the timezone of st_mtime is, but all signs point at UTC.
AdamPaslawski (Contributor, Author):
I agree. Assuming the SFTP site is always in the same timezone as the Airflow instance is a weaker assumption than assuming UTC.
I'll make that change.
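A minimal sketch of the UTC-based comparison discussed above, written as a standalone helper. It assumes st_mtime is reported in UTC and that `airflow.utils.timezone.convert_to_utc` is available; it is not necessarily the exact code that ended up in the PR.

```python
from datetime import datetime, timezone

from airflow.utils.timezone import convert_to_utc


def is_newer_than(mod_time: str, newer_than: datetime) -> bool:
    """mod_time is the hook's '%Y%m%d%H%M%S' string; st_mtime is assumed to be UTC."""
    _mod_time = datetime.strptime(mod_time, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    # convert_to_utc handles both naive and timezone-aware newer_than values.
    return convert_to_utc(newer_than) <= _mod_time
```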
TableConfig requires models.ImportError, while the try/except on line 128 requires the builtin ImportError. This was causing errors when running with the K8s scheduler without the Celery packages installed.
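To illustrate the failure mode being described (a generic sketch, not the actual Airflow code): Airflow's ORM model for DAG import errors is itself named ImportError, so importing it under that name shadows the builtin and breaks a try/except guarding an optional dependency.

```python
# Illustrative only: keep the ORM model under an alias so the builtin stays usable.
from airflow.models.errors import ImportError as AirflowImportError

try:
    from celery import Celery  # optional dependency, absent on K8s-only installs
except ImportError:  # must catch the *builtin* ImportError, not the ORM model
    Celery = None
```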
Co-authored-by: huan.15 <huan.15@kakaocorp.com>
…pache#21546) We are preparing to switch from Buster to Bullseye and this is the second change needed (following apache#21522). This change allows choosing whether we want to use Buster or Bullseye images as a base. We need to be able to choose because: 1) we want to keep backwards compatibility and let our users continue to build Buster-based images, and 2) we cannot yet fully switch to Bullseye because MSSQL's ODBC driver does not yet support Bullseye, and we reached out to the MSSQL maintainers to learn about their plans before deciding when and how we are going to support Bullseye and MSSQL. Details of this discussion are in: https://github.com/MicrosoftDocs/sql-docs/issues/7255#issuecomment-1037097131 This PR adds the capability of choosing the DEBIAN_VERSION in Breeze when building images but does not yet switch from Buster to Bullseye.
Debian 11 Bullseye was released some time ago as the new LTS Debian release, and all our dependencies (including the MySQL and MSSQL ODBC drivers) have caught up with it, so we can finally migrate to it. This change switches the base images to bullseye for our Dockerfiles as well as for the Redis image we are using in CI. The relevant packages have been updated to include that, and the documentation has been updated. Our examples are also updated to use "bullseye" rather than "buster". Closes: apache#18190 Closes: apache#18279
…pache#21378)" (apache#21874) This reverts commit 5d89dea. The issue is not a random IO timeout -- it's a problem with the file in the repo. Reverting this right now as all PRs are failing :(
* Expand mapped tasks in the Scheduler. Technically this is done inside DagRun.task_instance_scheduling_decisions, but the only place that currently calls it is the Scheduler. The way we are getting `upstream_ti` to pass to expand_mapped_task is all sorts of wrong and will need fixing; I think the interface for that method is wrong and the mapped task should be responsible for finding the right upstream TI itself.
* Make UI and tree work with mapped tasks
* Add graph tooltip and map count
* Simplify node label redraw logic
* Add utils.js and map_index to /taskInstances
* Use TaskInstanceState instead of strings
* Move map_index on /taskinstance to a separate PR
* Check whether to use Task or Tasks
* Remove `no_status` and use TaskInstanceState

Co-authored-by: Ash Berlin-Taylor <ash@apache.org>
The method outlined in the current doc results in a 403 error, which can be avoided by following Databricks' documentation on this topic: https://kb.databricks.com/dev-tools/invalid-access-token-airflow.html. The changes suggested here reflect the Databricks doc.
`sql` accepts `Union[str, Iterable[str]]`, not `Optional[Union[Dict, Iterable]]`.
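A minimal sketch of what the corrected annotation looks like; the class name and everything around the parameter are hypothetical stand-ins, not the real operator.

```python
from typing import Iterable, Union


class SomeSqlOperator:
    """Hypothetical stand-in for the operator whose docstring is being corrected."""

    def __init__(self, *, sql: Union[str, Iterable[str]]) -> None:
        # Either a single SQL statement or an iterable of statements.
        self.sql = sql
```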
* Document the Airflow version in each Alembic migration module and use this to autogen the doc
* Update each migration module to have the same description used in the migration ref (so it can be used in autogen)
Fix walking through wildcarded directory in `FileSensor.poke` method
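As a rough illustration of the behaviour being fixed (a hedged sketch, not the actual `FileSensor.poke` implementation): when the path contains a wildcard, matched directories have to be walked so a file anywhere inside them also satisfies the sensor.

```python
import os
from glob import glob


def poke_sketch(full_path: str) -> bool:
    """Return True if the (possibly wildcarded) path resolves to an existing file."""
    for path in glob(full_path, recursive=True):
        if os.path.isfile(path):
            return True
        # A matched directory is walked so that any file inside it also counts.
        for _root, _dirs, files in os.walk(path):
            if files:
                return True
    return False
```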
* Log the Celery task id to correlate logs
* Add a celery.task_timeout_error metric
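A hedged sketch of the two ideas above; `Stats.incr` is Airflow's metrics helper, while the function and argument names here are assumptions, not the real executor code.

```python
import logging

from airflow.stats import Stats

log = logging.getLogger(__name__)


def report_celery_send(task_key, async_result, timed_out: bool) -> None:
    # Log the Celery task id so scheduler logs can be correlated with worker logs.
    log.info("Task %s sent to Celery as task id %s", task_key, async_result.id)
    if timed_out:
        # Dedicated metric so operators can alert on publish timeouts.
        Stats.incr("celery.task_timeout_error")
```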
Need to add the packages in setup.py, otherwise the package cannot be found.
…he#20759) A workaround was added (apache#5731) to handle the refreshing of EKS tokens. It was necessary because of an upstream bug. It has since been fixed (kubernetes-client/python-base@70b78cd) and released in v21.7.0 (https://github.com/kubernetes-client/python/blob/master/CHANGELOG.md#v2170).
When no value was found with `get_conn_value`, the warning was being triggered even though `get_conn_value` was implemented and just returned no value (because there wasn't one). Now we make the logic a little tighter and only raise the deprecation warning when `get_conn_value` is not implemented, which is what we intended to do in the first place.
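Roughly, the tightened check looks at whether the subclass overrides the method rather than at its return value. This is a sketch with assumed names, not the exact Airflow secrets-backend code.

```python
import warnings


class BaseSecretsBackend:
    """Simplified stand-in for the real secrets backend base class."""

    def get_conn_value(self, conn_id):
        raise NotImplementedError

    def _maybe_warn_deprecated(self):
        # Warn only when the subclass has NOT overridden get_conn_value,
        # not merely when an implemented get_conn_value returns None.
        if type(self).get_conn_value is BaseSecretsBackend.get_conn_value:
            warnings.warn(
                "get_conn_value is not implemented; falling back to the deprecated method.",
                DeprecationWarning,
                stacklevel=2,
            )
```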
* Stronger language about Docker Compose customizability

Despite our warnings, our users continue treating the Docker Compose file that we expose as something that should be easy to extend and customize for their own needs, yet they continue to struggle with basic behaviour of containers, Docker Compose, and how they interact. This results in a vast space of potential problems, as Docker Compose gives the user a false premise of something that "just works" when it actually requires quite a deep understanding of how it works. When you get things wrong with Docker Compose, you often end up with extremely confusing messages that might suggest the problem is with Airflow, but really the problem is with how users interact with their custom Docker images, registries, pulling, networking, mounting volumes, and plenty of other things.

While this is the same with Kubernetes and the Helm Chart, the Helm Chart makes it infinitely easier to customize in a declarative way (this is what our values.yaml does), and anything that has not been foreseen by the Helm Chart developers is "hard" by definition. Docker Compose makes no such distinction. You really can't make Docker Compose customizable by configuration, and any customization requires modifying the compose file; people who do not know what they are doing will eventually hit errors that they are not able to diagnose, which leads to the creation of "Airflow issues" that should really be brought up as "Docker Compose issues". An example of that is apache#22301, where there are at least two issues that are not reproducible without knowing in detail what the user has done, how the image was built and distributed, and how the docker-compose installation interacted with them. This is a terrible distraction for those supporting users of Airflow, as the issues are really Docker Compose issues and Airflow maintainers should not be involved in solving them.

This PR adds a bit stronger language and a statement about the scope and customizability of our Quick Start Docker Compose, not only mentioning the lack of production readiness but also the user's responsibility to understand and diagnose Docker Compose errors on their own, and setting the expectation that issues with running Docker Compose should be directed elsewhere.

* Update docs/apache-airflow/start/docker.rst
* Update docs/apache-airflow/start/docker.rst
* Update docs/apache-airflow/start/docker.rst

Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com>
We had disabled this previously (apache#22254), but now the website is up on a different domain, as noted in celery/celeryproject#51 (comment).
The Celery documentation has been moved from https://docs.celeryproject.org/ to https://docs.celeryq.dev/. The old links now lead to a 404 error page; the new links go to the actual documentation.
Prior to SQLAlchemy 1.4 the correct scheme for Postgres was `postgres+psycopg2`, but as of 1.4 it is `postgresql`. Airflow 2.3 updates SQLAlchemy to 1.4, so unless we patch the config for users (or they update their URIs), upgrading to 2.3 will break.
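A minimal sketch of the kind of compatibility shim implied here (an assumed helper, not the actual patch): rewrite legacy `postgres`-style connection strings to the `postgresql` dialect name that SQLAlchemy 1.4 expects.

```python
def normalize_postgres_scheme(conn_uri: str) -> str:
    """Rewrite legacy 'postgres' dialect names to 'postgresql' for SQLAlchemy 1.4."""
    if conn_uri.startswith("postgres://"):
        return conn_uri.replace("postgres://", "postgresql://", 1)
    if conn_uri.startswith("postgres+"):
        return conn_uri.replace("postgres+", "postgresql+", 1)
    return conn_uri


# Example: "postgres+psycopg2://user:pass@host/airflow" -> "postgresql+psycopg2://user:pass@host/airflow"
```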
…ingOperator, S3GetBucketTaggingOperator, S3DeleteBucketTaggingOperator, S3DeleteBucketOperator (apache#22312)
closes: #21655
Add a `newer_than` parameter to the SFTP sensor to allow specifying a datetime that the file on the SFTP site must be newer than.
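A hedged usage sketch of the proposed parameter: the import path matches the SFTP provider, but `newer_than` itself is the new argument this PR adds, so it only works with this change applied.

```python
from datetime import datetime, timezone

from airflow.providers.sftp.sensors.sftp import SFTPSensor

wait_for_fresh_file = SFTPSensor(
    task_id="wait_for_fresh_file",
    sftp_conn_id="sftp_default",
    path="/upload/report.csv",
    newer_than=datetime(2022, 2, 25, tzinfo=timezone.utc),  # only succeed for files modified after this
    poke_interval=60,
)
```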